A US judge has offered a partial lifeline to Anthropic, ruling that using copyrighted books to train its Claude AI model can qualify as fair use because the training is “transformative” under American law.
That part of the decision has been met with quiet relief by the tech industry. But the court didn’t give Anthropic a free pass. The firm will still have to face trial over its use of pirated copies to build what the judge described as a “central library of all the books in the world.”
This case is part of a wider pattern. The boundaries around copyright, authorship and AI training data are being tested in real time. And it’s not just books. Earlier this month, Disney and Universal filed lawsuits against the AI image generator Midjourney, alleging the same kind of quiet appropriation of their intellectual property.
In the UK, the BBC is now locked in its own standoff with Perplexity AI. The broadcaster says the chatbot has been lifting full articles, word for word, in breach of copyright. It has issued a legal letter demanding the content be removed and that Perplexity pay for what has already been used.
The BBC does not pick fights like this lightly. But it is clear that publishers are starting to push back. Perplexity, for its part, has denied wrongdoing and framed the issue as an attempt to protect Google’s search dominance. That deflection hasn’t gone down especially well.
What makes these disputes so difficult is that they are not just legal questions. The heart of the matter lies in the relationship between original journalism and the systems now built to repurpose it. AI tools don’t produce news. They don’t investigate or verify. They repackage what others have already done, often without asking first.
Judge Alsup acknowledged that the books were used in a way that transformed them rather than simply reproducing them. But he was also clear that storing pirated books in bulk is not a settled matter. If found liable, Anthropic could face up to $150,000 in statutory damages per title.
There’s no precedent to lean on here. Generative AI has moved so quickly that both the courts and the industry are catching up at the same time. Until a clearer framework emerges, the default approach appears to be scrape first, settle later.
Publishers and creators are beginning to see the costs of that approach. Copyright used to mean something quite fixed. Now it’s caught in a technological grey area, one where bots can quietly vacuum up entire catalogues of content and present them back to users without much trace of where the original came from.
There is no easy answer to how AI and media coexist. But the number of legal challenges suggests that relying on goodwill and technical loopholes is no longer enough.