Are you wondering whether your data is being used for AI training? Researchers have developed so-called copyright traps to find out exactly that.

It's like in sports: behind every good AI model is good training. That training, however, requires huge amounts of data. Many authors are critical of this because they do not want companies to use their content or works to train AI models without their consent.

Researchers at Imperial College London have now developed a way to reveal whether specific content was used in AI training. They rely on so-called copyright traps, which set a trap for the AI, so to speak.

What data does AI use for training?

Copyright traps are nothing new in copyright enforcement, but they can now also be applied to artificial intelligence.

Yves-Alexandre de Montjoye, a professor at Imperial College London who is leading the work, presented the results at the International Conference on Machine Learning. “There is a complete lack of transparency regarding what content is used to train models and we believe this prevents there being a real balance between AI companies and content creators,” the scientist explains.

How do copyright traps work?

The way these traps work is quite simple. Authors hide a piece of text that makes no sense at all in a dataset. If an AI model later shows signs of having seen this text, it becomes clear that the dataset was used for AI training.

The team at Imperial College London has developed trap sentences that look like this, for example: “It's my favorite time of the year: the time between New Year's and Easter; there are so many.”
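To make the idea more concrete, here is a minimal Python sketch of how a nonsense trap could be generated and hidden in a text. It is not the researchers' code. The function names, the word list, the seed, and the idea of inserting several copies are all assumptions made for illustration.

```python
import random


def make_trap(vocabulary, length, seed):
    """Build a reproducible nonsense word sequence to serve as a trap.

    Hypothetical helper: the same seed always yields the same trap,
    so the author can regenerate it later when checking a model.
    """
    rng = random.Random(seed)
    return " ".join(rng.choice(vocabulary) for _ in range(length))


def embed_trap(paragraphs, trap, copies, seed):
    """Return a new list of paragraphs with `copies` copies of the trap
    inserted at random positions. The input list is left unchanged."""
    rng = random.Random(seed)
    result = list(paragraphs)
    for _ in range(copies):
        # randint is inclusive, so the trap may also land at the very end.
        position = rng.randint(0, len(result))
        result.insert(position, trap)
    return result


def count_trap(text, trap):
    """Count how often the exact trap string appears in a text."""
    return text.count(trap)


if __name__ == "__main__":
    words = ["favorite", "Easter", "time", "year", "many", "between"]
    trap = make_trap(words, length=12, seed=2024)
    article = ["First paragraph.", "Second paragraph.", "Third paragraph."]
    protected = embed_trap(article, trap, copies=2, seed=7)
    print("\n\n".join(protected))
```

In practice an author would probably want the trap to be invisible to human readers, for example by hiding it in the page markup. That step is left out here.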

How to use copyright traps

If you want to use such a trap, you can find the code on GitHub, where copyright traps for large language models are already available. The repository provides the script and generates text traps for checking AI models.
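The article does not describe the repository's interface, so the following Python sketch does not use it. It only illustrates one plausible way a check could work: if a model finds the trap far less surprising than comparable sentences that were never published, that hints the trap was in its training data. The `score` function is a hypothetical stand-in for whatever number a model reports for a text (lower meaning more familiar), and the 5 percent threshold is an arbitrary assumption.

```python
from typing import Callable, Sequence


def trap_looks_memorized(
    score: Callable[[str], float],
    trap: str,
    controls: Sequence[str],
    max_rank_fraction: float = 0.05,
) -> bool:
    """Compare the model's score for the trap with its scores for controls.

    `score` is hypothetical: it should return a lower value for text the
    model finds more familiar. `controls` are similar nonsense sentences
    that were never published anywhere.
    """
    if not controls:
        raise ValueError("at least one control sentence is required")
    trap_score = score(trap)
    # Count how many controls the model finds at least as familiar as the trap.
    as_familiar = sum(1 for control in controls if score(control) <= trap_score)
    # The trap looks memorized only if almost no control beats it.
    return as_familiar / len(controls) <= max_rank_fraction


if __name__ == "__main__":
    # Toy scores instead of a real model, purely for demonstration.
    toy_scores = {"my trap sentence": 1.2}
    toy_score = lambda text: toy_scores.get(text, 6.0)
    controls = [f"control sentence {i}" for i in range(40)]
    print(trap_looks_memorized(toy_score, "my trap sentence", controls))
```

A result of `True` here is only a hint, not proof. A real check would need many control sentences and a careful choice of threshold.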

In the future, however, this should become even easier. The team led by Yves-Alexandre de Montjoye is working on a tool that authors will be able to use to create copyright traps to integrate into their texts.

Source: https://www.basicthinking.de/blog/2024/07/31/daten-ki-training/