The New York Times prohibits using its content to train AI models

Refusing to comply with the restrictions could result in unspecified fines or penalties. | Photo by Kena Betancur/VIEWpress

Aug 14, 2023 - 21:02

The New York Times has taken preemptive measures to stop its content from being used to train artificial intelligence models. As reported by Adweek, the NYT updated its Terms of Service on August 3rd to prohibit its content (including text, photographs, images, audio/video clips, “look and feel,” metadata, or compilations) from being used in the development of “any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”

The updated terms now also specify that automated tools like website crawlers designed to use, access, or collect such content cannot be used without written permission from the publication. The NYT says that refusing to comply with these new restrictions could result in unspecified fines or penalties. Despite adding the new rules to its policy, the publication doesn’t appear to have made any changes to its robots.txt, the file that tells search engine crawlers which URLs they may access.

Google recently granted itself permission to train its AI services on public data it collects from the web.

The Times’ move could be a response to a recent update to Google’s privacy policy that discloses the search giant may collect public data from the web to train its various AI services, such as Bard or Cloud AI. Many large language models powering popular AI services like OpenAI’s ChatGPT are trained on vast datasets that could contain copyrighted or otherwise protected materials scraped from the web without the original creator’s permission.

That said, the NYT also signed a $100 million deal with Google back in February that allows the search giant to feature Times content across some of its platforms over the next three years. The publication said that both companies will work together on tools for content distribution, subscriptions, marketing, ads, and “experimentation,” so it’s possible that the changes to the NYT terms of service are directed at other companies like OpenAI or Microsoft. Semafor reported on Sunday that the Times had dropped out of a media coalition attempting to jointly negotiate with tech companies over AI training data, which means that any deals it does strike with companies are more likely to be negotiated on a case-by-case basis.

OpenAI recently announced that website operators can now block its GPTBot web crawler from scraping their websites. Microsoft also added some new restrictions to its own T&Cs that ban people from using its AI products to “create, train, or improve (directly or indirectly) any other AI service,” alongside banning users from scraping or otherwise extracting data from its AI tools.
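For readers curious what blocking GPTBot looks like in practice, here is a minimal sketch in Python. It assumes a hypothetical robots.txt (not the Times’ actual file, which reportedly had not changed) that disallows the `GPTBot` user agent site-wide and leaves other crawlers unrestricted, then uses the standard library’s `urllib.robotparser` to check what a compliant crawler would conclude. The helper name `is_allowed` and the example.com URL are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption for illustration):
# block OpenAI's GPTBot everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL, not a real article.
    url = "https://example.com/2023/08/14/some-article.html"
    for agent in ("GPTBot", "Googlebot"):
        print(f"{agent}: allowed={is_allowed(ROBOTS_TXT, agent, url)}")
```

With these rules, the check reports that GPTBot is not allowed while Googlebot is. Note that robots.txt is advisory: it only works if the crawler chooses to honor it, which is one reason a publisher might also write restrictions into its terms of service.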

Earlier this month, several news organizations including The Associated Press and the European Publishers’ Council signed an open letter calling for global lawmakers to usher in rules that would require transparency into training datasets and consent of rights holders before using data for training.

Update 10:15AM ET: Added report on the Times dropping out of a media coalition for negotiating over AI data use.