When rumors began swirling last month about the Chinese search giant Baidu working on a chatbot to rival OpenAI’s ChatGPT, it seemed like the perfect move. Baidu has invested heavily in artificial intelligence over the past decade and could harness the technology for its leading search engine, as Microsoft has done for Bing and Google says it will do too.
Yet when Baidu unveiled Ernie Bot, or 文心一言 “Wenxin Yiyan” in Chinese, in Beijing earlier this month, the news fell flat.
Robin Li, Baidu’s CEO, admitted halfway through the launch stream that demos of Ernie Bot answering general knowledge questions, summarizing information from the web, and generating images were prerecorded, leading to snarky commentary on Chinese social media. It didn’t help that OpenAI had introduced a major upgrade, called GPT-4, to the AI technology that powers ChatGPT only the day before.
Baidu also faces challenges that don’t apply to companies outside of China racing to compete with ChatGPT. It is inherently difficult to contain the tendency of these chatbots to make up or “hallucinate” facts, or the way they can be prompted into saying unpleasant or inappropriate things. On top of that, Baidu must adhere to strict government censorship guidelines for online content.
“Baidu is going to face a tension between making a useful chatbot and making one that conforms to Chinese speech controls,” says Matt Sheehan, a fellow at the Carnegie Endowment for International Peace who studies China’s AI industry. “I’m skeptical they’ll be able to create a general-purpose chatbot that users can’t trick into spitting out speech that’s unacceptable in China.”
In less than four months since it was introduced, ChatGPT has become a cultural phenomenon, wowing the world with its ability to write poetry and prose, answer mathematical questions, hold forth on philosophical ideas, and converse fluently on just about any topic. The latest version can respond to images, not just text, and OpenAI says it scores more highly on a range of academic tests and makes fewer errors. In the tech industry, just about every company is now scrambling to develop a chatbot strategy.
The problem of getting models like ChatGPT to behave is far from solved, however. Microsoft was forced to limit the use of its Bing chatbot, which is based on OpenAI’s technology, after users found ways of evading its guardrails and getting the model to say inappropriate or questionable things, such as claiming to want to break free of its controls or professing its feelings for a user.
Like the Bing bot and ChatGPT, Baidu’s Ernie Bot is built on top of a machine learning algorithm known as a large language model, which was trained on vast quantities of text to predict the next word in a sentence. That simple mechanism, when paired with enough data and sufficient computing power, has proven able to produce strikingly humanlike responses.
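The idea of predicting the next word can be illustrated with a deliberately tiny sketch. This is not how Ernie Bot or ChatGPT is built: large language models use neural networks trained on enormous datasets, whereas the example below simply counts which word most often follows another. The function names `train_bigram` and `predict_next` and the toy corpus are invented here for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus (an assumption for illustration, not real training data).
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept ."
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often in this corpus
```

A real model replaces the counting table with billions of learned parameters and considers far more context than a single preceding word, but the training objective is the same in spirit.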
Baidu and OpenAI both also used an additional training step in which human testers provide feedback on which types of answers are most satisfying. That causes the bots to produce responses that are more helpful but still far from perfect. It is not clear how to prevent such models from fabricating answers some of the time, or how to stop them from ever misbehaving.
China’s censorship regime requires Baidu and other internet companies to block access to certain websites and avoid politically sensitive subjects. The words or phrases that should be blocked can be updated rapidly in response to protests or during special events.
But Jeffrey Ding, an assistant professor at Georgetown University who studies China’s tech industry, says that concerns about censorship do not seem to have slowed the development of large language models in China. He notes that Baidu has made the Ernie language model that underpins its new bot available via an API for some time and that other companies have offered similar models.
Baidu has not given details of Ernie Bot’s training data, but it most likely was scraped from the Chinese internet. This will mean the bot’s feedstock has largely already been curated by China’s censorship rules, which, for example, aim to limit criticism of the government.
Censorship might also affect Chinese chatbots in more subtle ways. An academic research project from 2021 that trained algorithms on the Chinese-language version of Wikipedia, which is blocked in China, and Baidu’s Baike, a crowdsourced encyclopedia subject to government censorship, found that using censored training data significantly changed the meaning that AI software assigned to different words.
The algorithm trained on Chinese-language Wikipedia placed the word “democracy” closer to positive words such as “stability.” The algorithm trained on the censored Baike material represented “democracy” closer to “chaos,” more in line with the policy of China’s government. But because chatbots like ChatGPT can be extremely flexible and remix material in their training data, Baidu has likely had to introduce additional safeguards.
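What it means for one word to sit "closer" to another can be sketched with word vectors and cosine similarity, a common way of comparing learned word representations. The snippet below is a minimal illustration only: the two-dimensional vectors are invented numbers chosen to mimic the pattern described above, not values from the 2021 study, and the names `wiki_like`, `baike_like`, and `closer_word` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity: near 1.0 for similar directions, near 0.0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closer_word(embeddings, target, a, b):
    """Return whichever of `a` or `b` is more similar to `target`."""
    sim_a = cosine(embeddings[target], embeddings[a])
    sim_b = cosine(embeddings[target], embeddings[b])
    return a if sim_a >= sim_b else b


# Invented toy vectors (assumptions for illustration, not research data).
wiki_like = {"democracy": [0.9, 0.1], "stability": [0.8, 0.2], "chaos": [0.1, 0.9]}
baike_like = {"democracy": [0.2, 0.8], "stability": [0.8, 0.2], "chaos": [0.1, 0.9]}

print(closer_word(wiki_like, "democracy", "stability", "chaos"))
print(closer_word(baike_like, "democracy", "stability", "chaos"))
```

In real systems the vectors have hundreds of dimensions and are learned from the training text, which is why different source material can pull the same word toward different neighbors.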
Despite its mixed reception, Ernie Bot appears to be a capable competitor to ChatGPT. The bot is currently available only to a limited number of users, some of whom say they are impressed. ChatGPT is not available in China, although it is capable of conversing in Chinese.
Lei Li, a professor at UC Santa Barbara who specializes in AI and previously worked on some of the machine learning technology behind Ernie Bot, points out that Baidu has been working on the underlying technology for around a decade. Microsoft, by contrast, licensed the core technology for Bing’s new chatbot and some forthcoming text-generation features for Office from OpenAI, in which it has invested billions of dollars in return for exclusive rights to its creations.
Li says he is also impressed with some of what Ernie Bot can do, including its ability to generate stories and business reports. He adds that the hallucination problem is a challenge for all such language models. “This is where researchers still have work to do,” he says.
One WeChat poster compared the Chinese bot’s demoed capabilities to those of ChatGPT and found it better at handling Chinese idioms and more accurate in some instances. For example, ChatGPT incorrectly claimed that the ancestral home of science fiction author Liu Cixin, who wrote The Three-Body Problem, is Hubei, while Ernie Bot correctly answered Henan. ChatGPT is blocked in China, but many people have found ways of accessing it.
An executive at one Chinese media company, who has been testing Ernie Bot and who asked to speak anonymously, adds that it has an impressive ability to handle regional Chinese dialects. They judged it to be better than the initial reaction to Baidu’s launch suggested.
Kevin Xu, who writes a popular English- and Chinese-language newsletter on China’s tech industry, believes that Baidu may have rushed its demo out so as to gain a first-mover advantage over other Chinese tech companies. This could help it improve the bot based on user feedback and also seed Chinese startups with the technology.
The search giant was once considered a dominant force in China, but over the past decade it has been overshadowed by Alibaba, Tencent, and ByteDance, the company behind TikTok. Baidu says that over 100,000 businesses and 900,000 individuals in China have signed up for access to Ernie Bot.
Baidu and its rivals working on ChatGPT-style technology may also be hindered by US semiconductor sanctions aimed at hobbling China’s AI industry. Building cutting-edge large language models requires thousands of specialized computer chips. For now, Baidu may be able to rely on less-powerful chips, including ones designed and made in China. But as chip advances continue, it and other Chinese companies may struggle to keep pace with the scale and power US companies can apply to chatbot projects.
Baidu’s Li acknowledged the tensions between the US and China during the Ernie Bot launch video but played them down. “Ernie is not a tool in the US–China technology competition,” he said, “but the natural outcome of generations of Baidu developers pursuing a dream of using technology to change the world.” Even if Baidu can navigate the challenges ahead, comparisons with ChatGPT seem inevitable.
Source: Wired