Meta is alleged to have infringed the copyrights of thousands of book authors when training its Llama language model. The Facebook parent company now faces several class action lawsuits.
Artificial intelligence requires huge amounts of data. But it is not always clear whether that data may be used for AI training at all.
The main problem is copyright in texts, images and videos, and that is exactly what Meta is alleged to have infringed on a massive scale. The US company currently faces several class action lawsuits accusing it of infringing the copyrights of thousands of book authors.
Did Meta violate copyright in its AI training?
Novelist Christopher Farnsworth has filed one of these class action lawsuits in a US court. In it, he accuses Meta of using his books and those of other authors without permission to train Llama.
He is demanding compensation and wants to stop the use of his works for AI training. He is not alone: other authors have filed similar class action lawsuits in the same court.
These include comedian Sarah Silverman and author Ta-Nehisi Coates. They likewise accuse Meta of infringing the copyright in their works by allegedly using them for AI training.
Where does the data for AI training come from?
At the center of the dispute is a dataset called “The Pile”, which is 886 gigabytes in size and contains numerous English-language texts. EleutherAI released the dataset in 2020 and made it available for training large AI language models.
A subset of The Pile called Books3 contains 196,640 copyrighted books. It includes, among others, works by Stephen King, Margaret Atwood and the novelist Christopher Farnsworth.
According to the lawsuit, it has been confirmed that Meta downloaded “The Pile” dataset and used it “as part of its work in the training and development of its LLMs.” For this reason, Farnsworth accuses Meta of using the books contained in Books3 to train its Llama models and thereby infringing copyright.
The problem of AI and copyright
The conflict between AI companies and authors is not new. Companies that need data to train their AI models often invoke the fair use doctrine of US copyright law.
This doctrine permits the use of copyrighted works without authorization in certain contexts, such as public education. It is aimed primarily at science and at the work of researchers and students.
Many major players in the AI industry also invoke the fair use doctrine, and they accuse the plaintiffs of slowing progress in the field of artificial intelligence.
While the AI companies rely on fair use, authors are demanding compensation for their works. The training of AI models is often compared to human learning. However, people who learn from books buy them or borrow them from libraries, Farnsworth's class action lawsuit argues.
People thus “legally obtain” the works, which provides at least some compensation for authors and creators. The complaint continues: “Meta does not do this and has appropriated the content of authors to create a machine that generates exactly the type of content that authors are normally paid for.”
Source: https://www.basicthinking.de/blog/2024/10/12/meta-urheberrecht-ki-training/