Google Uses YouTube’s 20 Billion Videos to Train AI Without Creators’ Consent

Google has been using YouTube’s massive video library to train its most powerful artificial intelligence tools, including the Gemini chatbot and Veo 3 video generator, according to new reports. The practice has sparked concerns among content creators who say they were never told their work was being used to build AI systems that could eventually compete with them.

The tech giant confirmed that it uses a subset of YouTube’s estimated 20 billion videos to train its AI models, though it hasn’t disclosed which videos or how many are used. Even if Google used just 1% of YouTube’s catalog, that would still amount to 2.3 billion minutes of content – more than 40 times the training data used by competing AI models, according to experts.

Creators Left in the Dark

Most YouTube creators and media companies weren’t aware their content was being used this way. Multiple leading creators and intellectual property professionals said they had never been informed by YouTube that their videos could be used to train Google’s AI models.

“It’s plausible that they’re taking data from a lot of creators that have spent a lot of time and energy and their own thought to put into these videos,” said Luke Arrigoni, CEO of Loti, a company that protects digital identity for creators. “It’s helping the Veo 3 model make a synthetic version, a poor facsimile, of these creators. That’s not necessarily fair to them.”

The revelation is particularly significant given Google’s recent announcement of Veo 3, one of the most advanced AI video generators on the market. The tool can create cinematic-quality video sequences with synchronized audio, including dialogue and sound effects, all generated entirely by AI.

When users upload videos to YouTube, they agree to terms of service that grant the platform broad rights to their content. The terms state: “By providing Content to the Service, you grant to YouTube a worldwide, non-exclusive, royalty-free, sublicensable and transferable license to use that Content.”

YouTube also stated in a September 2024 blog post that content could be used to “improve the product experience… including through machine learning and AI applications.”

However, experts argue that while Google may have the legal right to use this content, the ethical implications are murky. Creators are essentially helping train systems that could replace them without receiving any credit, consent, or compensation.

No Opt-Out for Google’s Own Models

YouTube does allow creators to opt out of third-party AI training from companies like Amazon, Apple, and Nvidia. But there’s no option to prevent Google from using their content to train its own AI models.

“We’ve always used YouTube content to make our products better, and this hasn’t changed with the advent of AI,” a YouTube spokesperson said. “We also recognize the need for guardrails, which is why we’ve invested in robust protections that allow creators to protect their image and likeness in the AI era.”

Evidence of Direct Content Matching

Some creators have found concerning similarities between their original work and AI-generated content. Vermillio, a company that helps protect individuals from AI misuse, used its proprietary Trace ID tool to analyze potential matches.

In one example, a video from YouTube creator Brodie Moss closely matched content generated by Veo 3. The analysis tool gave the comparison a score of 71 out of 100, with the audio alone scoring over 90 – indicating significant overlap.

Part of a Broader Industry Pattern

Google isn’t alone in mining YouTube for AI training data. Previous reports have shown that OpenAI transcribed over a million hours of YouTube videos to train its language models. Nvidia, Anthropic, Apple, and Salesforce have also used YouTube content for their AI development efforts.

The practice has led to increasing legal challenges. Disney and Universal recently filed a joint lawsuit against AI image generator Midjourney, alleging copyright infringement – the first such lawsuit from major Hollywood studios.

Mixed Reactions From Creators

Not all creators are upset about the practice. Some see it as an opportunity to embrace new technology.

Google has included an indemnification clause for its generative AI products, meaning it will take legal responsibility and cover costs if users face copyright challenges over AI-generated content.

The company has also partnered with Creative Artists Agency to help top talent identify and manage AI-generated content featuring their likeness, and provides tools for creators to request takedowns of videos that abuse their image.

Growing Calls for Regulation

The situation has drawn attention from lawmakers concerned about AI’s impact on creators and artists.

“The people who are losing are the artists and the creators and the teenagers whose lives are upended,” said Senator Josh Hawley during a May Senate hearing about AI replication of human likeness. “We’ve got to give individuals powerful, enforceable rights in their images, in their property, in their lives back again, or this is just never going to stop.”

As AI technology advances, the debate over data usage, creator rights, and fair compensation continues to intensify across the tech industry.
