Earlier this year, Mezzobit began assessing how websites across the Internet engage in data collection, user tracking and other visitor interactions. We drew on our insight into the billions of transactions that flow through our systems each month, as well as scans of top digital properties.

Our Mezzobit Data Transparency Index is intended to provide consumers, enterprises and regulators a monthly barometer of how the digital world is trending. Here’s a detailed explanation that accompanied our first Index in January, which coincided with Data Privacy Day.

In the second edition of the Index, we saw a small amount of improvement in two areas:

  • Visitor tracking, which measures the use of technologies such as cookies and browser fingerprinting, dropped by one point (meaning that we observed slightly less tracking in our sample set).
  • Tag chaining, which happens when website code calls in other third parties, also decreased slightly month over month.

Together, these changes lowered the composite score, which averages all of the underlying scores, from 42 to 41.
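To see why a single component moving by a point shifts the composite only a little, here is a minimal sketch. It assumes the composite is an unweighted mean of the five underlying scores. The post says only that it "averages" them, so any weighting or rounding Mezzobit applies is unknown. Apart from visitor tracking and tag chaining, the score names are hypothetical placeholders, and all values are invented for illustration.

```python
def composite_score(scores: dict[str, float]) -> float:
    """Unweighted mean of the five underlying scores (an assumption, not Mezzobit's confirmed formula)."""
    if len(scores) != 5:
        raise ValueError("expected exactly five underlying scores")
    return sum(scores.values()) / len(scores)


# Illustrative values only; "score_c" through "score_e" are hypothetical names.
before = {"visitor_tracking": 50, "tag_chaining": 40, "score_c": 45, "score_d": 35, "score_e": 40}
after = dict(before, visitor_tracking=49)  # visitor tracking drops by one point

print(composite_score(before))  # 42.0
print(composite_score(after))   # 41.8
```

Under this assumption, a one-point drop in one of five scores moves the unrounded composite by one-fifth of a point.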

Is this cause for concern or delight among privacy advocates or digital enterprises? Not really, any more than a single hot day makes a heat wave.

Normal website operations cause variations for a number of reasons. When you visit your favorite site, it may show different articles, ads, or products on any given day, which affects the type and number of third parties called into the page. Because our analysis is driven by these small pieces of JavaScript or images, called tags, the scores will fluctuate.

What’s more important is the longer-term trend. As new data collection and tracking technologies are introduced to the Internet, we’ll see upward pressure on some scores. Conversely, some site operators change their strategies to improve performance or scale back third-party tags, which pushes scores the other way. Our database grows every day, and we’ll soon publish breakdowns of these numbers by type of site and tag vendor.

If you have any ideas, or questions you’d like us to answer with this project, please drop us a line.

How we did it

Our study specifically looks at code and images embedded in websites, which are called tags, trackers or beacons. Website operators use tags provided by third parties to power common functions, such as advertising, social sharing, and analytics. Here’s a short article with more information about how they work. Once a visitor’s browser loads a tag, the tag has mostly free rein to collect and transmit data, track the user, and change the website’s content, often without the knowledge of the originating website operator. And tags can breed like rabbits, with one calling another, which calls another, until hundreds may be on a single page.
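The way tags "breed" can be pictured as a graph. The site operator embeds a few tags directly, and each of those may load further tags. The sketch below is illustrative only and does not reproduce Mezzobit's methodology. It assumes we already know which tags each tag loads (in practice a crawler would have to observe this), and every tag name is hypothetical. A breadth-first walk counts every tag that ends up on the page and the deepest level at which a tag first appears.

```python
from collections import deque


def expand_tag_chain(first_party_tags: list[str], calls: dict[str, list[str]]) -> tuple[set[str], int]:
    """Return (all tags loaded on the page, deepest chain level).

    first_party_tags: tags the site operator embeds directly (level 0).
    calls: hypothetical mapping from a tag to the tags it loads in turn.
    """
    level = {tag: 0 for tag in first_party_tags}
    queue = deque(first_party_tags)
    while queue:
        tag = queue.popleft()
        for child in calls.get(tag, []):
            # A tag already seen is not loaded again; this also stops cycles.
            if child not in level:
                level[child] = level[tag] + 1
                queue.append(child)
    return set(level), max(level.values(), default=0)


# Hypothetical page: the operator placed only two tags.
calls = {
    "ad_server.js": ["exchange.js"],
    "exchange.js": ["bidder_a.js", "bidder_b.js"],
    "bidder_a.js": ["sync_pixel.gif"],
}
loaded, depth = expand_tag_chain(["analytics.js", "ad_server.js"], calls)
print(len(loaded), depth)  # 6 3
```

Here two tags placed by the operator turn into six on the page, three levels deep. The operator chose only the first two.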

Using our proprietary data and algorithms, we created a master composite index that rolls up five underlying scores representing the current state of Internet data. All scores are uncapped: as the Internet changes in future months, they can rise without limit. We also calculate these scores for hundreds of thousands of individual websites and tags, although we do not identify them here.

But high scores aren’t necessarily bad, nor are low scores necessarily good. It’s like saying 55 mph is too fast or 20 mph is too slow, when each may be just right for a highway or a school zone, respectively. More important is the value that digital enterprises and consumers receive from the Internet by virtue of these activities, as well as how reality differs from their expectations. Consumers are tiring of slow site performance, coupled with uncertain threats to their privacy, so ad blocking is on the rise. Publishers seek wider sources of revenue, so they invite a greater number of data partners onto their websites.

The scores are also relative to each other: a higher score represents a greater level of a certain activity, and a lower score indicates the opposite. We plan to publish updates every month to track how the Internet is changing.