Last week, some voters in New Hampshire received an AI-generated robocall impersonating President Biden, telling them not to vote in the state’s primary election. It’s not clear who was responsible for the call, but two separate teams of audio experts tell WIRED it was likely created using technology from voice-cloning startup ElevenLabs.
ElevenLabs markets its AI tools for uses like audiobooks and video games; it recently achieved “unicorn” status by raising $80 million at a $1.1 billion valuation in a new funding round co-led by venture firm Andreessen Horowitz. Anyone can sign up for the company’s paid service and clone a voice from an audio sample. The company’s safety policy says it is best to obtain someone’s permission before cloning their voice, but that permissionless cloning can be OK for a variety of non-commercial purposes, including “political speech contributing to public debates.” ElevenLabs did not respond to multiple requests for comment.
Pindrop, a security company that develops tools to identify synthetic audio, claimed in a blog post on Thursday that its analysis of audio from the call pointed to ElevenLabs’ technology or a “system using similar components.” The Pindrop research team checked patterns in the audio clip against more than 120 different voice synthesis engines looking for a match, but wasn’t expecting to find one because identifying the provenance of AI-generated audio can be difficult. The results were surprisingly clear, says Pindrop CEO Vijay Balasubramaniyan. “It came back well north of 99 percent that it was ElevenLabs,” he says.
The Pindrop team worked on a 39-second clip the company obtained of one of the AI-generated robocalls. To verify its results, it also analyzed audio samples known to have been created with ElevenLabs’ technology, as well as samples made with another voice synthesis tool, as a check on its methodology.
ElevenLabs offers its own AI speech detector on its website that it says can tell whether an audio clip was created using the company’s technology. When Pindrop ran its sample of the suspect robocall through that system, it came back as 84 percent likely to be generated using ElevenLabs tools. WIRED independently got the same result when checking Pindrop’s audio sample with the ElevenLabs detector.
Hany Farid, a digital forensics specialist at the UC Berkeley School of Information, was initially skeptical of claims that the Biden robocall came from ElevenLabs. “When you hear the audio from a cloned voice from ElevenLabs, it’s really good,” he says. “The version of the Biden call that I heard was not particularly good, but the cadence was really funky. It just didn’t sound of the quality that I would have expected from ElevenLabs.”
But when Farid had his team at Berkeley conduct an independent analysis of the audio sample obtained by Pindrop, it reached the same conclusion. “Our model says with high confidence that it is AI-generated and likely to be ElevenLabs,” he claims.
This is not the first time that researchers have suspected ElevenLabs tools were used for political propaganda. Last September, NewsGuard, a company that tracks online misinformation, claimed that TikTok accounts sharing conspiracy theories using AI-generated voices, including a clone of Barack Obama’s voice, used ElevenLabs’ technology. “Over 99 percent of users on our platform are creating interesting, innovative, useful content,” ElevenLabs said in an emailed statement to The New York Times at the time, “but we recognize that there are instances of misuse, and we’ve been continually developing and releasing safeguards to curb them.”
If the Pindrop and Berkeley analyses are correct, the deepfake Biden robocall was made with technology from one of the tech industry’s most prominent and well-funded AI voice startups. As Farid notes, ElevenLabs is already seen as providing some of the highest-quality synthetic voice offerings on the market.
According to the company’s CEO in a recent Bloomberg article, ElevenLabs is valued by investors at more than $1.1 billion. In addition to Andreessen Horowitz, its investors include prominent individuals like Nat Friedman, former CEO of GitHub, and Mustafa Suleyman, cofounder of the AI lab DeepMind, now part of Alphabet, as well as firms like Sequoia Capital and SV Angel.
With its lavish funding, ElevenLabs is arguably better positioned than other AI startups to pour resources into creating effective safeguards against bad actors—a task made all the more urgent by the upcoming presidential elections in the United States. “Having the right safeguards is important, because otherwise anyone can create any likeness of any person,” Balasubramaniyan says. “As we’re approaching an election cycle, it’s just going to get crazy.”
A Discord server for ElevenLabs enthusiasts features people discussing how they intend to clone Biden’s voice, and sharing links to videos and social media posts that highlight deepfaked content featuring Biden or AI-generated imitations of Donald Trump’s and Barack Obama’s voices.
Although ElevenLabs is a market leader in AI voice cloning, in just a few years the technology has become widely available for companies and individuals to experiment with. That has created new business opportunities, such as creating audiobooks more cheaply, but also increases the potential for malicious use of the technology. “We have a real problem,” says Sam Gregory, program director at the nonprofit Witness, which helps people use technology to promote human rights. “When you have these very broadly available tools, it’s quite hard to police.”
While the Pindrop and Berkeley analyses suggest it could be possible to unmask the source of AI-generated robocalls, the incident also underlines how underprepared authorities, the tech industry, and the public are as the 2024 election season ramps up. It is difficult for people without specialist expertise to confirm the provenance of audio clips or check whether they are AI-generated. And more sophisticated analyses might not be completed quickly enough to offset the damage caused by AI-generated propaganda.
“Journalists and election officials and others don’t have access to reliable tools to be doing this quickly and rapidly when potentially election-altering audio gets leaked or shared,” Gregory says. “If this had been something that was relevant on election day, that would be too late.”
Source: Wired