May 18, 2015

By Joseph Titlebaum
Chief Legal and Privacy Officer

Many websites collect personally identifiable information (PII) from consumers. Examples abound: payment details, sign-in details, consumers using Facebook or LinkedIn to sign in to various services, and in some cases sites simply wanting to treat “returning” visitors differently from new visitors.

Consumers expect that this type of personal data, however collected, will be treated with the highest level of protection, and many companies state in their privacy policies that they do not share the PII they collect.

Other Internet businesses, such as many analytics providers, try not to collect PII to avoid the extra management burden associated with it. Their site terms and conditions may expressly prohibit customers from sharing personal information, and their privacy policies state that they do not collect PII.

But with so much data being collected on websites and mobile apps, there are instances where PII is being collected and shared, perhaps without the knowledge of the business entities involved. This potentially increases a company’s legal and regulatory exposure, particularly for international businesses.

How can this happen?

One culprit is the inclusion of unencrypted PII in a URL, usually through a parameter or query string.

The PII may be there because the site has authentication or a paywall and is designed to remember that a visitor has been there previously. Or the PII may be collected as part of a sweepstakes or other contest. If the PII is included in the URL, when JavaScript or image calls transmit data back to remote servers, the PII will ride along, as shown in this example:

www.domain.com/page.html?name=JohnSmith&[email protected]

Because the URL contains embedded PII, the entire string would also be considered PII, even though neither the publisher nor the third party intended this. This makes the issue of PII management much larger.

At a practical level, the unintended sharing of PII can affect a site’s relationship with analytics providers. For example, Google Analytics’ terms of service expressly prohibit the collection of PII. Google has from time to time notified its users that they have inappropriately provided PII. And Google may delete any data set that improperly includes PII, so it is critical that website owners not share PII unnecessarily.

In most instances, there’s no business requirement that recipients of the URL know the actual PII; they typically just need to understand some broad metadata about the user, such as whether the visitor is new or returning. Data minimization principles would require that this PII be deleted, hashed or otherwise obscured.
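As a rough illustration of the data minimization idea above, the following Python sketch drops PII parameters from a URL and appends a coarse visitor flag in their place. The parameter names in PII_PARAMS, the "visitor" flag and the function name minimize_url are all assumptions made for this example, not part of any particular product or site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of query parameter names that carry PII on this site.
PII_PARAMS = {"name", "email"}


def minimize_url(url, visitor_type):
    """Return the URL with PII parameters removed and a coarse
    visitor flag (for example "new" or "returning") added instead."""
    parts = urlsplit(url)
    # Keep only the parameters that are not on the PII list.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in PII_PARAMS
    ]
    # Replace the identifying details with broad metadata.
    kept.append(("visitor", visitor_type))
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    original = ("https://www.example.com/page.html"
                "?name=JohnSmith&email=jsmith%40example.com&ref=home")
    print(minimize_url(original, "returning"))
```

A site would run something like this before the URL is handed to any third-party script or image call, so that partners see only the metadata they need.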

While it’s unclear how often this type of sharing has resulted in legal or regulatory action, the associated liability increases with the sensitivity of information. In the early days of the federal government’s insurance marketplace, confidential data about a visitor’s health conditions was encoded in the URL without encryption, significantly adding to the beleaguered initiative’s woes.

The risk isn’t just between business partners, as man-in-the-middle exploits anywhere along the communications channel between the user, the website, and its partners can expose this data.

The impending ratification of the W3C Do Not Track standard would raise the bar even higher. If a site or service publicly declares its adherence to the standard, it will have to include checks around embedded PII. Regulators carefully watch companies’ public declarations of their privacy policies and hold them accountable under the “Say what you do and do what you say” standard.

What should a business do?

Site owners should take care to understand exactly what sort of PII they collect, and why and how it is collected. They should be careful to share that information only in accordance with their privacy policy – and in as limited a manner as possible. A tool such as Mezzobit’s Audience Control Module could be used to sniff third-party calls for telltale signs of PII, such as email addresses.
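To give a sense of what checking third-party calls for telltale signs of PII might look like, here is a minimal Python sketch that scans a list of outgoing URLs for anything shaped like an email address. It is not how any particular product works; the regular expression is a deliberately simple approximation, and the function names and sample URLs are invented for the example.

```python
import re
from urllib.parse import unquote

# Simplified email pattern; an assumption for illustration, not a full
# implementation of the email address grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(url):
    """Return email-like strings found in a URL after percent-decoding."""
    return EMAIL_RE.findall(unquote(url))


def flag_calls(urls):
    """Return only the URLs that appear to contain an email address."""
    return [url for url in urls if find_emails(url)]


if __name__ == "__main__":
    # Hypothetical third-party calls observed on a page.
    calls = [
        "https://tracker.example.net/pixel.gif?u=jsmith%40example.com&page=1",
        "https://cdn.example.org/lib.js?v=3",
    ]
    for url in flag_calls(calls):
        print("Possible PII:", url, find_emails(url))
```

A check like this catches only one kind of PII. Names, phone numbers and account identifiers would each need their own rules.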

In instances where PII is now included in URLs, think about alternatives that achieve the same business objective. If internal systems require this information but external partners don’t, a simple level of encryption could permit continued passing of parameters while rendering them gibberish to external viewers.
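One way to render parameters gibberish to outsiders is sketched below in Python. Note that it swaps in a keyed hash (HMAC-SHA256) rather than true encryption: internal systems holding the key can recompute the token to match a known visitor, but the original value cannot be decrypted from it. The key, the parameter names in PII_PARAMS and the function names are assumptions made for the example.

```python
import hashlib
import hmac
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical secret known only to internal systems; a real deployment
# would load this from secure configuration, not source code.
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical set of query parameter names that carry PII.
PII_PARAMS = {"name", "email"}


def pseudonymize(value, key=SECRET_KEY):
    """Return a keyed, one-way token for a value (HMAC-SHA256, hex)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_url(url, key=SECRET_KEY):
    """Replace the values of PII parameters with keyed tokens,
    leaving all other parameters untouched."""
    parts = urlsplit(url)
    pairs = [
        (k, pseudonymize(v, key) if k.lower() in PII_PARAMS else v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


if __name__ == "__main__":
    print(protect_url(
        "https://www.example.com/page.html?name=JohnSmith&ref=home"))
```

If internal systems genuinely need to recover the original value from the URL, a reversible scheme with proper key management would be needed instead of this one-way token.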

As PII is more prevalent than many of us realize, it is critical that website owners really understand what sorts of data they have – and how it is being shared with business partners.