The amount of AI-generated content on the internet is increasing significantly, and it is becoming increasingly difficult to identify such photos, texts, or videos. Watermarking could be a solution for labeling AI content.
The topic of artificial intelligence has taken the internet by storm since the introduction of ChatGPT. Large language models have evolved enormously in a short period of time. This repeatedly raises the question of what a labeling requirement for AI-generated content on the internet could look like.
While AI-generated texts and videos keep getting better, people are finding it increasingly difficult to distinguish them from other content. One solution could be a kind of watermark that identifies content created by an AI system.
Labeling requirement: Is a watermark for AI the solution?
A labeling requirement for AI-generated content is to be introduced in the European Union by August 2026 at the latest, when the transition period for the EU AI Act ends.
The Act states that “the outputs of the AI system must be marked in a machine-readable format”. In addition, AI content should be recognizable as “artificially generated or manipulated”.
One solution could be a kind of watermark that identifies AI-generated content as such. Google subsidiary DeepMind has developed one possible method for this, called SynthID.
Google will make the tool available as an open source solution for developers and companies in the future. Alongside the release, DeepMind researchers have published an article in the journal Nature describing SynthID's approach.
SynthID is “the first large-scale application of a generative text watermark for millions of users,” the article says. The tool also offers “better detectability than comparable methods”.
How does SynthID from Google work?
In images, videos, or audio, for example, a watermark is hidden in individual pixels or sounds. The AI system generates these elements so that they form a pattern that is neither visible nor audible.
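The idea of a pattern hidden in pixels can be illustrated with a classic least-significant-bit scheme. This is only a minimal sketch and is not how SynthID works: the function names, the integer key, and the list-of-grayscale-values image format are assumptions made for illustration.

```python
import random


def pattern(n: int, key: int) -> list[int]:
    """Derive a reproducible pseudo-random bit pattern from a secret key."""
    rng = random.Random(key)
    return [rng.randint(0, 1) for _ in range(n)]


def embed(pixels: list[int], key: int) -> list[int]:
    """Overwrite the lowest bit of each 8-bit pixel value with the pattern.
    Each value changes by at most 1, which is invisible to the eye."""
    bits = pattern(len(pixels), key)
    return [(p & ~1) | b for p, b in zip(pixels, bits)]


def match_rate(pixels: list[int], key: int) -> float:
    """Fraction of pixels whose lowest bit agrees with the keyed pattern."""
    bits = pattern(len(pixels), key)
    return sum((p & 1) == b for p, b in zip(pixels, bits)) / len(pixels)


# Hypothetical usage on a random "image" of 1,000 grayscale values.
rng = random.Random(0)
image = [rng.randrange(256) for _ in range(1000)]
marked = embed(image, key=42)
print(match_rate(marked, key=42))  # watermarked image checked with the right key
print(match_rate(image, key=42))   # unmarked image: agreement only by chance
```

With the correct key, every pixel of the marked image matches the pattern, while an unmarked image matches only about half the time. A scheme this simple would not survive compression or cropping, which is one reason production systems use more robust methods.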
Watermarking also works for text, although there are no pixels or sounds to alter. Instead, it exploits the way language models generate text.
AI models build their output token by token, and at each step they favor the tokens that are statistically most likely to come next. For example, given a prompt like “Where are you going…?”, the model would rate “go” as a more likely next token than “shop”.
These probabilities can be adjusted slightly, which in turn creates corresponding patterns in the choice of words. An algorithm can then read these hidden patterns as a watermark.
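How adjusted probabilities can leave a detectable pattern is easiest to see in a toy example. The sketch below uses a simple keyed "green list" bias, a scheme known from the research literature, and not the method the DeepMind article describes for SynthID. The vocabulary, the key, and all function names are hypothetical, and the "model" is just a uniform distribution.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(50)]  # toy vocabulary, purely illustrative
KEY = "demo-key"                      # hypothetical secret watermarking key


def is_green(prev_token: str, token: str, key: str = KEY) -> bool:
    """Pseudo-randomly mark about half of all tokens as 'green',
    depending on the secret key and the previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0


def generate(length: int, bias: float, seed: int = 0) -> list[str]:
    """Sample from a toy uniform 'model'. The weight of each green token is
    multiplied by exp(bias); bias=0.0 means no watermark."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(length):
        weights = [math.exp(bias) if is_green(tokens[-1], t) else 1.0
                   for t in VOCAB]
        tokens.append(rng.choices(VOCAB, weights=weights, k=1)[0])
    return tokens[1:]


def z_score(tokens: list[str]) -> float:
    """Measure how far the number of green tokens deviates from the
    roughly 50 percent expected in text without a watermark."""
    prev = ["<start>"] + tokens[:-1]
    green = sum(is_green(p, t) for p, t in zip(prev, tokens))
    n = len(tokens)
    return (green - 0.5 * n) / math.sqrt(0.25 * n)


# Hypothetical usage: compare watermarked and plain output.
print(z_score(generate(400, bias=2.0)))  # watermarked text
print(z_score(generate(400, bias=0.0)))  # plain text
```

The watermarked sequence contains far more green tokens than chance would allow, so its score is high, while plain text stays near zero. Only someone who knows the key can compute which tokens are green, so the pattern stays hidden from readers.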
Watermarking for AI: Successful test of SynthID
As part of a study, DeepMind researchers examined the effectiveness of watermarking with SynthID. To do so, they watermarked half of 20 million responses from the Gemini chatbot.
Users were then asked to rate the answers positively or negatively. The difference between the ratings for answers with and without a watermark was only 0.01 percent, which suggests that users do not notice the watermark.
Source: https://www.basicthinking.de/blog/2024/10/30/wasserzeichen-ki/