Publishers Sue Meta Over AI Training on Pirated Books as Zuckerberg Faces Personal Liability

Main Takeaway
Major publishers including Hachette and McGraw Hill allege Meta trained Llama on millions of pirated books, claiming Zuckerberg personally authorized the use of those books.
The New Lawsuit
Five major book publishers and bestselling author Scott Turow filed a class action lawsuit against Meta and CEO Mark Zuckerberg on May 5, 2026, alleging one of the largest copyright violations in history. The suit claims Meta illegally used millions of copyrighted books to train its Llama AI language system. Publishers involved include Hachette Book Group, Macmillan, McGraw Hill, Elsevier, and Cengage. According to court filings, Zuckerberg personally authorized the use of these materials, making him individually liable alongside the company.
The lawsuit alleges Meta obtained these works through piracy websites, in what plaintiffs describe as a systematic campaign to copy protected books without permission or compensation. The filing marks a significant escalation in the ongoing battle between content creators and AI companies over training data usage rights.
Previous Legal Wins and Their Limits
Before the publisher lawsuit was filed, Meta scored two major victories in related cases. On June 25-26, 2025, U.S. District Judge Vince Chhabria dismissed separate copyright claims from individual authors including Sarah Silverman and Ta-Nehisi Coates. The judge ruled the authors had made "the wrong arguments" and that their claims under the Digital Millennium Copyright Act failed because they couldn't prove intentional removal of copyright information.
However, these wins came with explicit limitations. Judge Chhabria emphasized his rulings applied only to the specific cases before him, leaving the door open for future lawsuits with stronger legal arguments. This distinction became crucial when publishers filed their more comprehensive complaint using different legal strategies and evidence.
Meta's Defense Strategy
Meta's legal team has consistently argued that using copyrighted books for AI training constitutes "fair use" under copyright law. The company maintains that training on copyrighted material transforms it into something new and different, serving a different purpose than the original works. Internal documents revealed in court show Meta staff evaluated over 7 million books and deemed them to have no "economic value" for individual licensing, suggesting a calculated decision rather than oversight.
This fair use argument has become the cornerstone of defense strategies across the AI industry, with companies claiming the transformative nature of AI training exempts them from traditional copyright requirements. The publishers' lawsuit directly challenges this interpretation, potentially setting precedent for the entire sector.
The Stakes for Publishing and AI
The case could fundamentally reshape how AI companies access training data. Publishers are seeking damages that could reach billions given the scale of alleged infringement and the commercial value of AI models trained on their content. The outcome will likely determine whether AI companies must negotiate licensing agreements with content owners or can continue using copyrighted materials without permission.
For publishers, victory could establish a new revenue stream from AI training licenses. For Meta and other AI companies, defeat could force massive retroactive payments and fundamentally alter their business models. The case also raises questions about the viability of current AI development practices if fair use protections prove insufficient.
What Happens Next
The lawsuit now enters the discovery phase in Manhattan federal court, where publishers will seek internal Meta communications about the decision to use copyrighted books. Zuckerberg's alleged personal involvement adds both legal risk and public relations pressure. The case could take 2-3 years to reach trial, with potential appeals extending the timeline further.
Meanwhile, similar cases against OpenAI, Google, and Anthropic continue through courts, creating a patchwork of precedents. Congress may intervene with new AI-specific copyright legislation if courts deliver conflicting rulings. The ultimate resolution will likely come through either a landmark Supreme Court decision or negotiated industry-wide licensing agreements.
Key Points
Five major publishers and Scott Turow filed class action against Meta and Zuckerberg for AI training on pirated books
Lawsuit alleges Zuckerberg personally authorized massive copyright infringement for Llama training
Meta previously won two related cases brought by individual authors, but the judge noted the limited scope of the rulings
Company argues fair use defense, claiming training transforms copyrighted content into new purpose
Case could force AI industry to negotiate licensing agreements or pay retroactive damages
Questions Answered
Hachette Book Group, Macmillan, McGraw Hill, Elsevier, and Cengage, along with author Scott Turow.
Court filings claim he personally authorized the use of copyrighted books for training Llama AI models.
In June 2025, a federal judge dismissed copyright claims from individual authors including Sarah Silverman, but noted the rulings were case-specific.
Meta argues using copyrighted books for AI training constitutes 'fair use' and transforms the original works into something new.
Potentially billions in damages given the scale of alleged infringement and commercial value of AI models trained on publisher content.
The outcome will set precedent for OpenAI, Google, Anthropic and others facing similar copyright challenges over training data usage.