The AI industry spent 2023 pretending copyright law didn’t apply to them. That fantasy died in December when The New York Times filed a lawsuit against OpenAI and Microsoft claiming billions in damages for using millions of articles without permission. What looked like one publisher throwing a tantrum turned out to be the first crack in a dam that’s about to break.
Fast-forward to early 2026, and the legal landscape looks nothing like the Wild West OpenAI enjoyed when it scraped the internet to build GPT-3. Multiple class-action lawsuits have established a framework. The EU’s Digital Single Market Directive isn’t a suggestion anymore — it’s law with teeth. And the question everyone avoided asking — does AI training count as fair use? — is finally getting answered in court. Spoiler: the answers aren’t looking great for Silicon Valley.
Here’s what changed: AI companies are no longer dealing with hypothetical liability. They’re staring down real numbers, real deadlines, and real consequences. The estimated exposure? North of $5 billion, and that’s just from cases already filed. Meanwhile, Anthropic, Google, and xAI are scrambling to reposition their training data policies before judges start handing down rulings that could force them to rebuild their models from scratch.
When the Times filed its lawsuit on December 27, 2023, the complaint didn’t mince words. “The Times brought this case to vindicate the rights of authors and publishers and to prevent Defendants from continuing to copy The Times’ copyrighted content and original reporting wholesale,” their legal team stated. The lawsuit references millions of articles from the Times archives dating back decades — not a handful of blog posts, but the entire institutional memory of American journalism.
OpenAI’s response was textbook Silicon Valley deflection: “Our training approach respects copyright and we believe fair use supports training AI systems. We’re working with publishers and authors to build better licensing relationships.” Translation: we’re going to argue fair use in court while quietly cutting deals with publishers who have lawyers.
The Times case matters because it’s the first major publisher with the resources to actually see this through to trial. Most independent creators can’t afford multi-year litigation against companies valued in the hundreds of billions. But the Times can, and they’re not settling. Motion hearings began in early 2025, and as of April 2026, the case is moving toward trial with no settlement in sight.

The Times lawsuit might be the most visible, but it’s not alone. Over the course of 2023, authors including Sarah Silverman, John Grisham, and Michael Chabon filed class-action lawsuits against OpenAI and Meta, and the Authors Guild brought its own suit that September. Together the plaintiffs represent more than a thousand authors whose works were allegedly used to train AI models without permission or payment.
Why does this matter? Because class actions change the math. Individual authors might settle for modest payouts to avoid legal costs. But a class action representing thousands of creators, with statutory damages ranging from $750 to $30,000 per work (or up to $150,000 for willful infringement), suddenly makes the potential liability astronomical. Between 2023 and the end of 2024, dozens of copyright lawsuits over training data were filed against major tech companies, and the pace has only accelerated since. That’s not noise. That’s a pattern.
The authors’ legal counsel put it bluntly in their September 2023 filing: “These companies have built billion-dollar businesses by copying authors’ work without permission, without payment, and without transparency. This must change.” And unlike individual creators, a class action has the resources to push for discovery, demand internal documents, and force companies to reveal exactly what data went into training their models.
While US courts debate whether AI training qualifies as fair use, Europe has already answered the question: not over rightsholders’ objections. The EU’s Digital Single Market Directive, adopted in 2019, lets rightsholders opt their works out of commercial text and data mining under Article 4, and its Article 17 requires platforms to obtain licenses for copyrighted content or implement filtering systems. This isn’t guidance. It’s law, enforceable across all EU member states, with penalties reaching up to €300,000 per violation or a percentage of annual revenue.
A European Commission official made the stakes clear in enforcement guidance: “Article 17 ensures that right-holders have control over uses of their work and receive fair compensation. This is not optional for platforms operating in the EU.” That last sentence is doing a lot of work. Any AI company that wants to operate in Europe — which is every AI company — has to comply. No fair use arguments. No philosophical debates about innovation. Just: did you get a license or not?
The EU’s approach creates a fragmented global landscape. OpenAI can argue fair use in a Manhattan courtroom all day, but that defense means nothing in Brussels. Google’s training data policies suddenly need to be Europe-specific. And smaller AI startups? They’re looking at legal compliance costs that could sink them before they launch.
Not every AI company is taking the same approach to the copyright minefield. OpenAI doubled down on fair use and selective licensing deals. But watch what Anthropic, Google, and xAI are doing — they’re placing very different bets.
Anthropic published its Constitutional AI training methodology in August 2024, disclosing that it uses licensed content from Stack Overflow, Creative Commons materials, and public domain sources. That’s not altruism — it’s legal strategy. By building transparency into their model documentation, Anthropic is positioning itself as the responsible actor when courts start looking at training practices. If OpenAI’s defense is “fair use covers everything,” Anthropic’s is “we actually paid for our data.”
Google announced training data disclosure policies in September 2024 following regulatory pressure in the EU and UK. The company already learned this lesson the hard way with Google Books, which scanned over 40 million books — more than 20 million still under copyright — and spent years in court defending it. Google’s new approach: partial transparency, selective licensing, and enough documentation to satisfy regulators without revealing the entire training dataset.
Then there’s xAI, which published training data documentation for Grok in late 2024. Elon Musk’s company is playing a different game entirely — positioning Grok as the anti-establishment AI while still documenting enough to avoid the worst regulatory outcomes. It’s transparency as branding, but it’s also legal cover.

The entire AI industry’s defense rests on one legal doctrine: fair use. The argument goes like this — training AI models on publicly available data is transformative research, similar to education or criticism, and therefore protected under US copyright law. Meta’s Chief AI Scientist Yann LeCun stated in 2024: “Training on publicly available data is standard machine learning practice and supported by fair use doctrine.”
There’s one problem: courts haven’t actually decided this yet. Fair use has traditionally covered news reporting, criticism, research, and parody. But commercial AI training at scale, using millions of copyrighted works to build products that generate billions in revenue? That’s not in the case law. As Daphne Keller, an internet law scholar at Stanford, noted in 2024: “The training data question is genuinely unsettled law. Courts will have to balance innovation interests against creator rights.”
The uncertainty is the problem. OpenAI is betting billions that courts will rule in their favor. But if they don’t? Every model trained on unlicensed copyrighted data becomes a liability. Companies would face a choice: pay massive settlements, rebuild models from scratch using only licensed data, or pull out of jurisdictions with strict copyright enforcement. None of those options are cheap.
Meanwhile, creators are stuck waiting for courts to decide whether their work was legally stolen or not. The UK Copyright Hub launched an opt-out registry pilot in 2023, but it’s incomplete and relies on companies honoring opt-outs voluntarily. OpenAI’s opt-out mechanism exists, but it’s reactive — your work might already be in GPT-5’s training data before you even know to opt out.
Legal analysts at Morrison Foerster and Fenwick & West have estimated potential liability exposure at over $5 billion based on current lawsuits. That number assumes statutory damages on the lower end and doesn’t account for willful infringement findings, which could multiply the damages several times over. It also doesn’t include future lawsuits from creators who haven’t filed yet, or regulatory fines from the EU, which can reach a percentage of global annual revenue.
Here’s why that number keeps growing. Under US copyright law (17 U.S.C. § 504), statutory damages range from $750 to $30,000 per work infringed. For willful infringement, that ceiling jumps to $150,000 per work. Now multiply that by the millions of articles in the Times’ archives, or the libraries of works in the authors’ class actions, or the music catalogs, or the photography databases. The math gets ugly fast.
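To see how quickly that multiplication gets out of hand, here is a minimal back-of-the-envelope sketch in Python. The statutory figures come from § 504; the work counts are hypothetical placeholders chosen for illustration, not numbers from any actual complaint.

```python
# Back-of-the-envelope exposure math. Statutory ranges come from
# 17 U.S.C. § 504(c); the work counts below are hypothetical
# placeholders for illustration, not figures from any filing.

STATUTORY_MIN = 750        # minimum statutory damages per infringed work
STATUTORY_MAX = 30_000     # maximum per work for ordinary infringement
WILLFUL_MAX = 150_000      # per-work ceiling if infringement is willful


def exposure(works: int) -> dict:
    """Return the low, high, and willful-ceiling damages for a given work count."""
    return {
        "low": works * STATUTORY_MIN,
        "high": works * STATUTORY_MAX,
        "willful": works * WILLFUL_MAX,
    }


# Hypothetical scenario sizes, chosen only to show how the totals scale.
for label, works in [("10,000 works", 10_000), ("1,000,000 works", 1_000_000)]:
    e = exposure(works)
    print(f"{label}: ${e['low']:,} to ${e['high']:,} "
          f"(willful ceiling ${e['willful']:,})")
```

Even with those made-up counts, the statutory floor alone lands in the hundreds of millions of dollars, and a willful finding across a million works runs to twelve figures. That scaling, not any single verdict, is what the $5 billion exposure estimates are trying to capture.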
And that’s just damages. Settlement costs, legal fees, and the operational expense of rebuilding training pipelines to use only licensed data? That’s separate. Companies are already cutting licensing deals with publishers to avoid this exact scenario. OpenAI signed agreements with Associated Press and other outlets. Google has partnerships with news organizations. These aren’t goodwill gestures — they’re insurance policies.
If you’re a writer, photographer, artist, or anyone whose work might be in an AI training dataset, here’s the reality: you probably don’t have many good options right now. Opt-out registries like the UK Copyright Hub’s pilot exist, but they’re incomplete and rely on companies voluntarily checking them. OpenAI’s opt-out mechanism requires you to know your work is being used and then submit a request — by which point your work is already in the model.
The European enforcement pathway is more robust. If you’re an EU citizen or your work is protected under EU copyright law, Article 17 gives you legal standing to demand licensing or removal. The penalties for non-compliance are real, and EU regulators have shown they’re willing to enforce. But that requires filing complaints, navigating regulatory systems, and waiting for enforcement actions.
For US-based creators, you’re waiting on court rulings. The Times lawsuit and the authors’ class actions are the test cases. If they win, it establishes precedent that could lead to industry-wide licensing requirements and compensation systems. If they lose, fair use becomes the standard and your leverage disappears. Either way, the answer comes from courts, not from AI companies voluntarily doing the right thing.
The reason 2026 feels different from 2023 is simple: cases are reaching trial stages. The New York Times lawsuit has moved past its early motion hearings and is heading toward trial. Class actions are moving through discovery. EU enforcement actions are resulting in real fines. This isn’t theoretical anymore — it’s happening, and the outcomes will set the rules for the next decade of AI development.
For AI companies, the clock is ticking. Every day without a clear legal framework is another day of potential liability accumulating. For creators, it’s a waiting game to see whether courts and regulators force companies to pay for what they’ve already taken. And for users? The AI tools you rely on might look very different in two years if companies lose these cases and have to rebuild models using only licensed, paid-for data.
The training data copyright war isn’t coming. It’s here. And the side that wins will determine whether AI development continues as a free-for-all or becomes a licensed, regulated industry like every other media platform. Given how much money is at stake, expect the fighting to get worse before it gets better.
Right now, AI companies are making a calculated wager: that they can move fast enough, get big enough, and become essential enough that courts and regulators won’t force them to unwind what they’ve already built. It’s the same bet Uber made with taxi regulations, Airbnb made with housing laws, and Facebook made with privacy rules. Sometimes it works. Sometimes it doesn’t.
The difference here is scale. Uber could settle with cities. Facebook could pay privacy fines. But if courts rule that AI training without licensing is copyright infringement, and companies have to compensate rightsholders for every work they used? That’s not a penalty you can write a check for and move on. That’s rebuilding your entire business model from the ground up.
Which is why the next twelve months matter more than the last three years of development. The legal ground is shifting, the regulatory walls are closing in, and the billion-dollar question — who owns the right to train AI? — is finally getting answered. Just don’t expect the AI companies to like the answer.
