Artists can use a data poisoning tool to confuse DALL-E and corrupt AI scraping

The fight against the unauthorized use of creative work to train AI models has taken a poisonous turn.

A new tool called Nightshade lets artists treat their creative work so that it corrupts — or poisons — any training data scraped from that art. Over time, it can damage future versions of image-generating AI models like DALL-E, Stable Diffusion, and Midjourney, degrading their ability to create usable images.

Nightshade adds invisible changes to the pixels of a piece of digital art. When the work is ingested by a model for training, the “poison” exploits a security vulnerability that confuses the model, so that it no longer reads an image of a car as a car and instead produces, say, a cow.
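Nightshade’s actual attack optimizes the perturbation so that a model trained on the image mislearns the concept it depicts; the sketch below is only a toy illustration of the “invisible changes to pixels” idea, using bounded random noise rather than a real poisoning objective. The function name, file paths, and the epsilon bound are illustrative assumptions, not part of the tool.

```python
import numpy as np
from PIL import Image

def add_invisible_perturbation(src: str, dst: str, epsilon: int = 4) -> None:
    """Toy sketch: shift each RGB channel value by at most +/- epsilon.

    A shift of 4 out of 255 is far too small for a viewer to notice,
    which is the property Nightshade relies on. A real poisoning tool
    would optimize this perturbation against a target model instead of
    drawing it at random.
    """
    pixels = np.asarray(Image.open(src).convert("RGB"), dtype=np.int16)
    noise = np.random.randint(-epsilon, epsilon + 1, size=pixels.shape)
    poisoned = np.clip(pixels + noise, 0, 255).astype(np.uint8)
    Image.fromarray(poisoned).save(dst)

# The output looks unchanged to a human, but its pixel values differ.
add_invisible_perturbation("artwork.png", "artwork_shaded.png")
```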

MIT Technology Review reported that Ben Zhao, a professor at the University of Chicago and one of Nightshade’s creators, hopes the tool will tilt the balance of power away from AI companies that have used copyrighted data to train their models. The Nightshade research paper says the training data used in text-to-image AI models is vulnerable to the type of attack the tool unleashes.

“Surprisingly, we show that a moderate number of Nightshade attacks can destabilize general features in a text-to-image generative model, effectively disabling its ability to generate meaningful images,” the paper said. 

Artists vengeful enough to go this route can upload their work to Glaze, a tool made by Nightshade’s creators that masks their art style, turning a normally realistic drawing into something cubist, for example. Nightshade will be integrated into Glaze, letting artists choose whether to deploy the poison pill or settle for stopping models from mimicking their style.

Nightshade’s creators propose in their paper that the tool and others like it be used as “a last defense for content creators against web scrapers” that ignore opt-out rules.

Copyright issues around AI-generated content and training data remain a gray area in the absence of regulation. Many of the lawsuits alleging copyright infringement are still making their way through the courts. Meanwhile, methods to prevent web crawlers from taking data without permission have largely been limited to blocking scrapers’ access outright. Companies like Adobe plan to use a mark that flags when something was AI-generated and shows who owns the image.
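Those opt-out and blocking mechanisms typically hinge on a site’s robots.txt file, which crawlers are expected, but not forced, to honor. As a rough sketch of the mechanism, the snippet below shows how a compliant scraper would check the file before fetching an image; the bot name and URLs are placeholder assumptions, though real AI crawlers do announce user agents such as OpenAI’s GPTBot.

```python
from urllib.robotparser import RobotFileParser

# A well-behaved scraper consults robots.txt before downloading anything.
# "ExampleImageBot" and the URLs below are placeholders for illustration.
parser = RobotFileParser("https://example.com/robots.txt")
parser.read()

target = "https://example.com/gallery/artwork.png"
if parser.can_fetch("ExampleImageBot", target):
    print("robots.txt permits fetching", target)
else:
    print("Site has opted out; a compliant crawler skips", target)
```

Nothing in this protocol compels a scraper to run such a check, which is why Nightshade’s authors frame poisoning as a last line of defense.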

Some of the first lawsuits against generative AI platforms focused on copyrighted material making its way into training data for models, especially for text-to-image platforms. Three artists sued Stability AI (the maker of Stable Diffusion), Midjourney, and the art website DeviantArt in January, alleging that the companies’ models used their art without permission. Getty Images also filed a lawsuit against Stability AI before going on to build its own AI image generator trained on its licensed images.

Google and Microsoft have said they are willing to take the legal heat if customers are sued for copyright infringement while using their generative AI products, though most of those products are text-based.