Chip manufacturer Nvidia is said to have secretly used vast numbers of videos for AI training, including videos from YouTube and Netflix. This emerges from internal company documents. Some employees, however, have raised copyright concerns.

Nvidia is said to have secretly used large numbers of videos from YouTube, Netflix and other platforms for AI training. This is according to a report by 404 Media citing internal documents. The operation was reportedly carried out under the code name “Cosmos”.

AI training: Nvidia secretly uses YouTube and Netflix videos

According to the report, Nvidia employees were instructed via Slack and email to download millions of videos using automated programs. Ming-Yu Liu, vice president of research at Nvidia and head of the Cosmos project, wrote in an email in May 2024:

“We are in the process of finalizing the v1 data pipeline and securing the necessary compute resources to build a video data factory capable of delivering a human lifetime's worth of training data per day.”

According to a former employee who wished to remain anonymous, Nvidia asked its employees to scrape videos from Netflix and user-generated videos from YouTube, among other things.

This and other video content was then used to train an AI system for Nvidia's Omniverse 3D world generator, “digital human” products and self-driving car systems. The project has not yet been released to the public.

Copyright: Nvidia employees express concerns

According to the internal messages, some Nvidia employees expressed concerns about the procedure. They are said to have questioned the ethics of the approach, particularly with regard to copyright. According to 404 Media, the responsible project managers addressed these concerns but dismissed them.

Rather, there was reportedly “comprehensive approval” from top management. Nvidia is said to have declared that the project was “in full compliance with the letter and spirit of copyright law.” According to the company, copyright protects only specific forms of expression, not facts or ideas.

The report comes at a time when the issue of AI training and copyright is playing an increasingly important role. Reddit recently blocked numerous search engines so that they could not use its forum content to train their AI systems. Since then, Reddit content has appeared only in Google's search results, because Google pays for access.

Source: https://www.basicthinking.de/blog/2024/08/06/nvidia-heimliches-ki-training-mit-videos-im-wert-eines-menschenleben-pro-tag/
