
The Privacy Lawyer Who Took on an AI Company — and Won

A four-year GDPR case against an AI training data company resulted in a landmark ruling on legitimate interests. We spoke to the lawyer who built it.

Sarah Okonkwo — no relation to this publication’s correspondent — is a privacy lawyer based in Brussels who spent four years building a case against a major AI training data company. In March, she won.

The case centred on a company that had scraped personal data from social media platforms, news sites, and public records databases to build training datasets for sale to AI laboratories. The company argued that publicly available data was fair game. Okonkwo argued that GDPR’s legitimate interests balancing test required the company to have a specific, proportionate reason for processing personal data — and that “we want to build a business selling training data” did not clear that bar.

The court agreed. The ruling required the company to delete datasets containing personal data of EU residents, to implement a verified opt-out mechanism for future scraping, and to pay €12 million in administrative fines. More significantly, the court held that the legitimate interests test applies to AI training data collection in the same way it applies to any other commercial data processing. If upheld on appeal, that holding would change the economics of training data scraping across the EU.

The Interview

We spoke with Okonkwo for two hours after the ruling was handed down. What follows is an edited excerpt.

You’ve said this case was less about the specific company and more about establishing a principle. What’s the principle?

The principle is that “publicly available” is not a synonym for “legally processable.” GDPR was built on the concept of purpose limitation — you can only use data for the purpose for which it was collected. When someone posts on a social media platform, they are not consenting to having their posts used to train a commercial AI model. The fact that the platform made the data technically accessible doesn’t change the consent analysis.

The company argued their use constituted legitimate interest. Where did that argument fail?

The legitimate interests test has three parts: the interest must be legitimate, the processing must be necessary for that interest, and the interest must not be overridden by the fundamental rights of the data subjects. They cleared the first part. They arguably cleared the second. They failed the third. The court found that the right of individuals to control how their personal expression is used — including to train AI systems that will generate synthetic versions of that expression — is a fundamental right that was not adequately weighed.

What does this mean for AI companies that have already trained on this data?

That’s the uncomfortable question. The ruling applies to the scraping company, not to the AI laboratories that bought the data. Whether those laboratories are themselves liable is a separate legal question, and one that several ongoing cases are specifically designed to answer. I expect we’ll have clearer law on that within two years.

// Author
Theo Wright
