Authors Accuse Tech Giants of Copyright Violation in AI Training

Landmark Copyright Lawsuit: Authors Challenge Tech Giants Over AI Training Practices 

In the ongoing tussle between technological advancement and intellectual property rights, another significant lawsuit has been filed. A coalition of authors, including former Arkansas governor Mike Huckabee and renowned Christian author Lysa TerKeurst, has accused tech giants Meta, Microsoft, and Bloomberg of copyright infringement. At the heart of the dispute is the use of a controversial dataset called “Books3,” allegedly containing thousands of pirated books, to train artificial intelligence (AI) models.

Key Points: 

  • A group of authors has sued Meta, Microsoft, and Bloomberg for copyright infringement related to AI training.
  • The lawsuit centers on the controversial “Books3” dataset, believed to contain many pirated books.
  • Bloomberg clarified its stance, stating that it used Books3 only for research purposes and not for commercial models.
  • The authors contend that the tech companies derived substantial unauthorized value from their books.
  • The case sits at the intersection of technological progress, intellectual property rights, and the definition of ‘fair use.’

The class-action lawsuit was filed in federal court in New York. The plaintiffs claim that these major corporations trained their large language models on the contentious “Books3” dataset without permission. The case also focuses on EleutherAI, an AI research group that allegedly supplied data, including the disputed contents of Books3, used to train these corporate AI systems.

Through a spokesperson, Bloomberg stated that although the company used the Books3 dataset to train its research model, it did not use the dataset for its commercial large language model, BloombergGPT. Microsoft declined to comment, and Meta had not issued a statement at the time of the initial reporting.

Highlighting the significance of intellectual property, the authors’ legal representatives stated, “We’re not opposed to innovation; we’re opposed to the theft behind the innovation.” This line reflects a growing concern among artists and authors about the unauthorized harnessing of their work for AI system training. 

Many of these copyright infringement suits revolve around the Books3 dataset, which was allegedly sourced from unauthorized online “shadow libraries” and reportedly contains text from a vast number of books. Plaintiffs including Huckabee, TerKeurst, David Kinnaman, Tsh Oxenreider, and John Blase assert that their published works are part of this dataset. They allege that these works were therefore wrongfully used to train Meta’s Llama 2 large language model (developed in association with Microsoft) and BloombergGPT. They are seeking unspecified monetary damages and an injunction against further misuse of their creative work.

As technology progresses at an unprecedented pace, the boundaries of ‘fair use’ under U.S. copyright law are being tested. AI companies defend their practices by invoking the fair-use doctrine, while creators and copyright holders are becoming increasingly vigilant about protecting their rights.
