By Andres Corrada
Privacy standards are evolving in the ad tech industry, and BlueCava has launched many efforts in this area. Those include our adherence to IAB industry standards such as “opt out” for consumer screens. There is another side to privacy, however, that is seldom discussed and deserves attention: building a data processing system with built-in consumer anonymity. In this post, we will discuss the science behind one such effort by BlueCava, as exemplified by a recent U.S. patent application (United States 14/492,332).
Our core data science task at BlueCava consists of building algorithms and processes that can provide persistent, unique identifiers for screens, consumers and households. Our unique identifiers are anonymous. We do not rely on any Personally Identifiable Information (PII) to build them, which raises a measurement paradox for us. If we do not know the “true identity” of the screens we process in our system, how can we know we are delivering a highly accurate, anonymous identifier?
Contrast our situation with that of a closed ad ecosystem such as Facebook. Major ad ecosystems frequently know the full identity of the users on their platform. Consequently, they protect the privacy of their users by anonymizing the identifiers they share with third parties. At BlueCava, we don’t have that information. Our BC ID, for example, is anonymous throughout our system.
Other companies might tackle this measurement conundrum by measuring their identifier accuracy on a small set for which they do have PII. There is one major drawback to this approach: It only measures accuracy on a small percentage of your data. You would be lucky if you could do this on 1% of your data.
As BlueCava does not use PII, our response is to start building the tools that satisfy both sides of this measurement challenge. We want to build measurement protocols that assure our clients we are delivering accurate results while simultaneously protecting the anonymity of our consumer data. This is exemplified by our recent patent application titled “Measuring Web Browser Tag Properties Without True Unique Tags.”
The basic idea behind the patent is that we have two unique identifiers at our disposal – the BC ID and, that workhorse of the Internet, the web browser cookie. Both identifiers are noisy labelers of the uniqueness of a device on the Internet. Cookies get deleted, and our BC ID makes the occasional mistake. These two identifiers will sometimes agree and sometimes disagree.
The mathematical pattern of their joint labels can be turned into an algebraic system of equations. On one side of these equations is the number of times our BC ID agrees or disagrees with the cookie of a third party. On the other side are the average accuracies of our BC ID and of the cookie. Magically, the ensemble of eight equations can be solved to give us exactly what we want: the accuracy of our BC ID.
There is something satisfying about our solution to this measurement problem. We have many individual data points. For any single one of them, we do not know whether our BC ID or the cookie is correct. But taken together, they let us arrive at a statistical measure of accuracy. The math encodes our privacy-respecting protocol: Into it go anonymous decisions, out comes our average accuracy.
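To make the link between agreement counts and accuracies concrete, here is a minimal sketch in Python. It is not the eight-equation system from the patent application, which this post does not spell out. It is a single toy relation under assumptions of our own choosing: each identifier makes a yes/no decision ("same device as before" or not), each has one accuracy that does not depend on the true answer, and their errors are independent. The function names are hypothetical.

```python
import random


def expected_agreement(acc_a, acc_b):
    """Probability that two independent yes/no labelers give the same answer.

    Assumption: each labeler is correct with a fixed probability,
    regardless of what the true answer is.
    """
    # The two labels match when both are right or when both are wrong.
    return acc_a * acc_b + (1 - acc_a) * (1 - acc_b)


def simulate_agreement(acc_a, acc_b, n_events, seed=0):
    """Simulate anonymous decisions and return the observed agreement rate.

    The hidden truth is used only to generate the labels. The returned
    statistic depends on nothing but the two labels themselves.
    """
    rng = random.Random(seed)
    agreements = 0
    for _ in range(n_events):
        truth = rng.random() < 0.5  # hidden: same device or not
        label_a = truth if rng.random() < acc_a else not truth
        label_b = truth if rng.random() < acc_b else not truth
        if label_a == label_b:
            agreements += 1
    return agreements / n_events


if __name__ == "__main__":
    print(expected_agreement(0.99, 0.91))
    print(simulate_agreement(0.99, 0.91, 200_000))
```

With illustrative accuracies of 0.99 and 0.91, the formula gives about 0.9018, and the simulated rate lands close to it. One agreement rate alone cannot separate two unknown accuracies, which is why a larger system of equations is needed to recover them.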
By the way, if you are curious, we typically measure our BC ID to be about 99% accurate on consecutive appearances of a device, while cookies are 91% accurate. That seemingly small difference means that our BC ID lives about 100 days on average, versus cookies that typically live about 100 hours.
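One way to see why a few points of per-appearance accuracy matter so much is a simple geometric model. This is an assumption for illustration, not the calculation behind the figures above: suppose each consecutive appearance is linked correctly with a fixed probability, independently of the others. The expected number of appearances up to and including the first broken link is then 1 / (1 - accuracy). The function name is hypothetical.

```python
def expected_links_until_break(accuracy):
    """Expected number of consecutive appearances up to and including the
    first broken link, assuming independent links with fixed accuracy."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    # Mean of a geometric distribution with failure probability 1 - accuracy.
    return 1.0 / (1.0 - accuracy)


if __name__ == "__main__":
    for name, acc in [("BC ID", 0.99), ("cookie", 0.91)]:
        print(name, expected_links_until_break(acc))
```

Under this model, 99% accuracy corresponds to roughly 100 appearances before a break and 91% to roughly 11. Turning appearance counts into days or hours depends on how often a device shows up, which this post does not specify, so the sketch is not meant to reproduce the 100-day and 100-hour figures.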