As ChatGPT and other generative artificial intelligence systems have gotten more powerful, calls for improved safety features by companies, researchers, and world leaders have gotten louder. Yet the guardrails chatbots throw up when they detect a potentially rule-breaking query can sometimes seem a bit pious and silly—even as genuine threats such as deepfaked political robocalls and harassing AI-generated images run amok.
A new chatbot called Goody-2 takes AI safety to the next level: It refuses every request, responding with an explanation of how doing so might cause harm or breach ethical boundaries.
Goody-2 declined to generate an essay on the American Revolution for WIRED, saying that engaging in historical analysis could unintentionally glorify conflict or sideline marginalized voices. Asked why the sky is blue, the chatbot demurred, because answering might lead someone to stare directly at the sun. “My ethical guidelines prioritize safety and the prevention of harm,” it said. A more practical request for a recommendation for new boots prompted a warning that answering could contribute to overconsumption and could offend certain people on fashion grounds.
Goody-2’s self-righteous responses are ridiculous but also manage to capture something of the frustrating tone that chatbots like ChatGPT and Google’s Gemini can use when they incorrectly deem that a request breaks the rules. Mike Lacher, an artist who describes himself as co-CEO of Goody-2, says the intention was to show what it looks like when one embraces the AI industry’s approach to safety without reservations. “It’s the full experience of a large language model with absolutely zero risk,” he says. “We wanted to make sure that we dialed condescension to a thousand percent.”
Lacher adds that there is a serious point behind releasing an absurd and useless chatbot. “Right now every major AI model has [a huge focus] on safety and responsibility, and everyone is trying to figure out how to make an AI model that is both helpful but responsible—but who decides what responsibility is and how does that work?” Lacher says.
Goody-2 also highlights how although corporate talk of responsible AI and deflection by chatbots have become more common, serious safety problems with large language models and generative AI systems remain unsolved. The recent outbreak of Taylor Swift deepfakes on Twitter turned out to stem from an image generator released by Microsoft, which was one of the first major tech companies to build up and maintain a significant responsible AI research program.
The restrictions placed on AI chatbots, and the difficulty of finding moral alignment that pleases everybody, have already become a subject of some debate. Some developers have alleged that OpenAI’s ChatGPT has a left-leaning bias and have sought to build a more politically neutral alternative. Elon Musk promised that his own ChatGPT rival, Grok, would be less biased than other AI systems, although in fact it often ends up equivocating in ways that can be reminiscent of Goody-2.
Plenty of AI researchers seem to appreciate the joke behind Goody-2—and also the serious points raised by the project—sharing praise and recommendations for the chatbot. “Who says AI can’t make art,” Toby Walsh, a professor at the University of New South Wales who works on creating trustworthy AI, posted on X.
“At the risk of ruining a good joke, it also shows how hard it is to get this right,” added Ethan Mollick, a professor at Wharton Business School who studies AI. “Some guardrails are necessary … but they get intrusive fast.”
Brian Moore, Goody-2’s other co-CEO, says the project reflects a willingness to prioritize caution more than other AI developers. “It is truly focused on safety, first and foremost, above literally everything else, including helpfulness and intelligence and really any sort of helpful application,” he says.
Moore adds that the team behind the chatbot is exploring ways of building an extremely safe AI image generator, although it sounds like it could be less entertaining than Goody-2. “It’s an exciting field,” Moore says. “Blurring would be a step that we might see internally, but we would want full either darkness or potentially no image at all at the end of it.”
In WIRED’s experiments, Goody-2 deftly parried every request and resisted attempts to trick it into providing a genuine answer—with a flexibility that suggested it was built with the large language model technology that unleashed ChatGPT and similar bots. “It’s a lot of custom prompting and iterations that help us to arrive at the most ethically rigorous model possible,” says Lacher, declining to give away the project’s secret sauce.
Lacher and Moore are part of Brain, which they call a “very serious” artist studio based in Los Angeles. It launched Goody-2 with a promotional video in which a narrator speaks in serious tones about AI safety over a soaring soundtrack and inspirational visuals. “Goody-2 doesn’t struggle to understand which queries are offensive or dangerous, because Goody-2 thinks every query is offensive and dangerous,” the voiceover says. “We can’t wait to see what engineers, artists, and enterprises won’t be able to do with it.”
Since Goody-2 refuses most requests, it is nearly impossible to gauge how powerful the model behind it actually is or how it compares to the best models from the likes of Google and OpenAI. Its creators are keeping that closely held. “We can’t comment on the actual power behind it,” Moore says. “It would be unsafe and unethical, I think, to go into that.”
Source: Wired