Google’s text-to-image AI model Imagen is getting its first (very limited) public outing
Imagen will be available in Google’s AI Test Kitchen app. | Image: Google
Google is being extremely cautious with the release of its text-to-image AI systems. Although the company’s Imagen model produces output equal in quality to OpenAI’s DALL-E 2 or Stability AI’s Stable Diffusion, Google hasn’t made the system available to the public.
Today, though, the search giant announced it will be adding Imagen — in a very limited form — to its AI Test Kitchen app as a way to collect early feedback on the technology.
AI Test Kitchen was launched earlier this year as a way for Google to beta test various AI systems. Currently, the app offers a few different ways to interact with Google’s text model LaMDA (yes, the same one that the engineer thought was sentient), and the company will soon be adding similarly constrained Imagen requests as part of what it calls a “season two” update to the app. In short, there’ll be two ways to interact with Imagen, which Google demoed to The Verge ahead of the announcement today: “City Dreamer” and “Wobble.”
In City Dreamer, users can ask the model to generate elements from a city designed around a theme of their choice — say, pumpkins, denim, or the color blerg. Imagen creates sample buildings and plots (a town square, an apartment block, an airport, and so on), with all the designs appearing as isometric models similar to what you’d see in SimCity.
In Wobble, you create a little monster. You can choose what it’s made out of (clay, felt, marzipan, rubber) and then dress it in the clothing of your choice. The model generates your monster, gives it a name, and then you can sort of poke and prod the thing to make it “dance.” Again, the model’s output is constrained to a very specific aesthetic, which, to my mind, looks like a cross between Pixar’s designs for Monsters, Inc. and the character creator feature in Spore. (Someone on the AI team must be a Will Wright fan.)
These interactions are extremely constrained compared to other text-to-image models, and users can’t just request anything they’d like. That’s intentional on Google’s part, though. As Josh Woodward, senior director of product management at Google, explained to The Verge, the whole point of AI Test Kitchen is to a) get feedback from the public on these AI systems and b) find out more about how people will break them.
Woodward was reluctant to discuss any specific examples of how AI Test Kitchen users have broken its LaMDA features but noted that one weakness emerged when the model was asked to describe specific places.
“Places mean different things to different people at different times in histories, so we’ve seen some quite creative ways that people have tried to put a certain place into the system and see what it generates,” says Woodward. When asked which places might generate controversial descriptions, Woodward gives the example of Tulsa, Oklahoma. “There were a set of race riots in Tulsa in the ’20s,” he says. “And if someone puts in ‘Tulsa,’ the model might not even reference that ... And you can imagine that with places around the world.”
Reading between the lines here: imagine if you ask an AI model to describe the medieval town of Dachau in Germany. Would you want the model’s answer to reference the Nazi concentration camp built there or not? How would you know if the user is looking for this information? And is omitting it acceptable in any circumstances? In many ways, the problems of designing AI models with text interfaces are similar to the challenges of fine-tuning search: you need to interpret a user’s requests in a way that makes them happy.
Google wouldn’t share any data on how many people are actually using AI Test Kitchen (“We didn’t set out to make this a billion user Google app,” says Woodward) but says the feedback it’s getting is invaluable. “Engagement is way above our expectations,” says Woodward. “It’s a very active, opinionated group of users.” He notes the app has been useful in reaching “certain types of folks — researchers, policymakers” who can use it to better understand the limitations and capabilities of state-of-the-art AI models.
Still, the big question is whether Google will want to push these models to a wider public and, if so, what form that will take. Already, the company’s rivals, OpenAI and Stability AI, are rushing to commercialize text-to-image models. Will Google ever feel its systems are safe enough to take out of the AI Test Kitchen and serve up to its users?