If the Department of Justice (DOJ) has its way, Google would be required to turn over some of its most sensitive data to existing and future rivals for the next ten years. That data includes users’ search history and advertising data; all data Google can obtain from users in the United States; data collected in responding to commercial and other inquiries; the datasets used to train Google’s ranking and retrieval algorithms and the models behind its generative AI tools; and data conveyed in real time, not just data Google retains or stores. The scope of this “remedy” should terrify anyone who has ever used Google to search the internet.
Forced data sharing is among the many “remedies” DOJ has proposed to rectify what it contends is an anti-competitive market for internet search. If the court agrees, DOJ would force Google to share and syndicate specific search data with “rivals and potential rivals.” That data could be used both to reverse engineer Google’s own algorithms and to identify individual users. DOJ delegates the work of protecting users’ privacy to a “technical committee” responsible for determining safeguards and identifying which companies qualify for access to Google’s data. We don’t know who will sit on this small committee, but it’s safe to assume it will not consist solely of experts in anonymizing and aggregating sensitive data, let alone in implementing such measures at scale and speed. The data exposed will be sensitive, private information, and DOJ has no plan to protect it, saying instead that its committee will handle that later.
But it’s virtually certain that forced sharing of search data will expose users and the internet ecosystem to untold privacy and security risks.
Privacy concerns are particularly acute for users. Search history contains some of our innermost thoughts, offering a window into our interests, concerns, and activities – including things we might not want to ask others (doctors, friends, family). Public opinion polls regularly show that a majority of people consider search history to be sensitive. When Google – and, hopefully, any search platform – shares search data today, it does so carefully. Its Google Trends reports, for example, aggregate and anonymize search data to protect against identifying users.
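To make concrete what that kind of careful sharing involves, here is a minimal, invented sketch – not Google’s actual pipeline – of Trends-style reporting: raw logs are aggregated, any query seen from fewer than a threshold of distinct users is suppressed, and only a normalized 0–100 index is published. The K_THRESHOLD value and function name are hypothetical.

```python
# Illustrative sketch only (not Google's actual pipeline): aggregate raw query
# logs into a Trends-style index, suppressing any query searched by fewer than
# K_THRESHOLD distinct users so that rare, identifying queries never surface.
from collections import defaultdict

K_THRESHOLD = 50  # hypothetical minimum number of distinct users per query

def trends_style_report(records):
    """records: iterable of (user_id, query) pairs from a raw log."""
    users_per_query = defaultdict(set)
    for user_id, query in records:
        users_per_query[query].add(user_id)

    # Suppress low-volume queries: they are the ones most likely to identify a person.
    safe = {q: len(u) for q, u in users_per_query.items() if len(u) >= K_THRESHOLD}
    if not safe:
        return {}

    # Normalize to a 0-100 interest index so absolute volumes are never exposed.
    peak = max(safe.values())
    return {q: round(100 * n / peak) for q, n in safe.items()}
```

Even this toy version shows the two moves that make aggregate sharing relatively safe – suppressing rare queries and normalizing away raw volumes – and both are exactly what query-level syndication to rivals would forgo.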
Forced syndication and sharing of search data as DOJ seeks goes well beyond anything that’s been attempted. Even with state-of-the-art anonymization and pseudonymization techniques and robust governance measures, truly anonymizing large, complex datasets while preserving their utility for analysis or use is incredibly difficult.
Linkage attacks, in which anonymized data is combined with other publicly available information, can re-identify individuals. As data analysis and machine learning techniques advance, so do methods for re-identifying anonymized data, requiring continuous updates to anonymization strategies. Similar technical concerns apply to other methods that may be used: differential privacy, secure enclaves, and other privacy-enhancing technologies raise a host of issues involving interoperability, cost, and usability. As Chris DiBona put it, there do not exist “better and more sophisticated privacy protections that would also preserve the value of the data. You can’t have it both ways.”
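To see why linkage attacks are so potent, consider a deliberately simplified sketch with entirely invented data: the shared log below has user IDs stripped, but quasi-identifiers (ZIP code, birth year, gender) survive, and a join against a public record such as a voter roll is enough to put a name back on a search history.

```python
# A toy linkage attack, in the spirit of Latanya Sweeney's re-identification
# work. The "shared" dataset has user IDs removed, but quasi-identifiers
# survive, and joining them against a public record re-identifies the user.
# All data here is invented.

pseudonymized_log = [
    {"zip": "60601", "birth_year": 1984, "gender": "F",
     "queries": ["oncologist near me", "chemo side effects"]},
    {"zip": "73301", "birth_year": 1990, "gender": "M",
     "queries": ["best hiking boots"]},
]

public_records = [  # e.g., a voter roll or people-search site
    {"name": "Jane Doe", "zip": "60601", "birth_year": 1984, "gender": "F"},
]

def link(log, records):
    """Yield (name, queries) for log rows whose quasi-identifiers match exactly one record."""
    for row in log:
        matches = [r for r in records
                   if (r["zip"], r["birth_year"], r["gender"])
                   == (row["zip"], row["birth_year"], row["gender"])]
        if len(matches) == 1:  # a unique match defeats the pseudonymization
            yield matches[0]["name"], row["queries"]

for name, queries in link(pseudonymized_log, public_records):
    print(f"{name} searched for: {queries}")
```

Sweeney famously showed that ZIP code, gender, and date of birth alone uniquely identify a large share of the U.S. population; stripping obvious identifiers while leaving quasi-identifiers intact is precisely the trap DiBona warns against.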
Beyond technical capacity, DOJ’s proposal would require a robust framework of access controls, oversight, standards, definitions, and legal authorities to govern the sharing of search data with third parties. None of this exists. DOJ proposes to defer these questions to a “technical committee.” Its proposed remedy hinges on a promissory note: a committee that will somehow set, apply, and enforce revolutionary digital privacy standards, with no specifics on how it would do so. Among many initial steps, the committee would have to establish a legal definition or technical standard for what constitutes “anonymized” data in the United States – something that has never been done. The government is promising to build a data protection regime unlike anything ever created, and it is betting everyone’s privacy on it.
Yet the new committee’s ability to do all of this effectively will be constrained from its inception. DOJ will likely prioritize competition-driven concerns for utility over privacy and security, perhaps assuming that merely removing personally identifiable information is enough to protect privacy. That much is implied by the brevity of DOJ’s court submissions on forced data sharing. Indeed, DOJ also defers identifying competitors and future competitors to the technical committee. That alone will be a herculean task given the value of Google’s search data and the fact that the generative AI wave has essentially redefined internet search. Once those rivals are identified, it will fall to the committee to ensure that each of them has implemented state-of-the-art governance practices, technical cybersecurity measures, and more.
The risk remains that connections will be made between search data and actual users, exposing users’ information and subjecting them to downstream harm. There is a long history of data like this being re-identified despite anonymization. And even advanced anonymization techniques applied at scale – let alone in real time – are likely no match for the power of AI tools to make sense of large datasets like these, which can contain a wealth of information about user behavior, interests, and preferences. No matter how much effort is put into mitigating these risks, it takes just one weakness to upend the entire delicate balance and expose users to identity theft, phishing attacks and scams, and even physical harm.
The DOJ proposal has enormous gaps; it demands that Google hand over a massive amount of highly sensitive data with no plan in place to protect it and no explanation of how DOJ will determine who gets access. There is also the very real possibility that beneficiaries of DOJ’s largesse, who are not bound by agreements with Google’s users, will use search data for unintended purposes – commercial solicitation, sale to third parties, or worse. There is no federal privacy law to regulate how those companies use and protect the data. And it is unclear how such sharing would be treated under state privacy laws, or how internet users would exercise the rights over their own data that Google provides and that many states’ laws require.
Sharing large amounts of data with multiple companies exponentially increases the data’s vulnerability to security breaches, misuse, and unauthorized access. It would create multiple points of weakness for malicious actors to penetrate and exploit in perpetrating cybercrime, financial theft, and more. And organized actors, potentially including nation states, could harvest user data rich enough to microtarget susceptible populations, sowing discord and spreading propaganda.
Beyond these concerns, the proposal is likely to create a perverse disincentive for companies to innovate to improve their search tools. As the Wall Street Journal described it, DOJ wants “Google to socialize its data.” In practice, that means Google’s rivals can simply free ride on the investments Google has made and the trust it has earned with its user base. Google’s CEO, Sundar Pichai, testified this week that forced data sharing would “allow anyone to completely reverse engineer, end to end, every aspect of our technology stack.”
That’s unlikely to lead to a better consumer experience or alternative ways to search the internet. It’s telling that in the DOJ’s case – which just concluded this week – not a single consumer testified. Indeed, it’s very likely that the multi-billion-dollar companies pushing DOJ’s proposal will be the prime beneficiaries of forced data sharing. It’s unlikely that regular users of the internet will see any benefit.
The DOJ proposal ignores these important issues, almost as if they did not exist, making clear that user welfare and the security of the internet ecosystem are mere afterthoughts. They shouldn’t be. Users need to be able to trust the tools they choose for searching the internet. A hastily drawn-up proposal to exploit users’ data for the benefit of other companies will hurt users, even if steps are taken – and there is no indication they will be – to implement the most robust technical safeguards, guidance, and governance practices.
On the first day of trial, Assistant Attorney General Abigail Slater stated: “The Google search case matters because nothing less than the future of the internet is at stake here.” She is right. As DiBona again warned: “to release this data stream would be such a grievous harm on American citizens[‘] privacy that it would be well nigh unrecoverable. If that’s the actual goal here, the end of privacy, then this is one way to do it? Do you think police overreach, identity theft, scams or predatory corporate behavior is bad now? It’s gonna get so much worse if you release this data.”