Microsoft deleted the over-eager office assistant Clippy some 17 years ago, but the vision for a friendly and optimistic AI helper has apparently found its way out of the Recycle Bin. The company is overhauling Copilot, the text-based artificial intelligence tool bundled with Windows and other software, with the addition of vision, voice, and the ability to solve more complex problems, along with a more “encouraging” personality.
“We really are at this amazing kind of transition point,” says Mustafa Suleyman, CEO of Microsoft AI. “AI companions now see what we see, hear what we hear, and speak in the same language that we use to communicate with one another.”
Copilot has so far met with a mixed response, with some users complaining of lag or vagueness in its responses, but Microsoft is betting that the tool could eventually become an integral part of Windows, Office, and beyond. By incorporating OpenAI’s AI algorithms into software that is used by hundreds of millions of people, the company is also at the forefront of testing the potential for AI to boost productivity in office work. Google, a big rival, is also shoehorning AI into office apps including Gmail and Google Docs.
The new Copilot will be able to converse with users in several humanlike voices, handling interruptions and pauses naturally. “You can interrupt in mid-flow and it can also actively listen,” Suleyman says. “And that’s kind of the art of great conversation.”
Suleyman adds that Copilot has also been tweaked so that it offers more emotional support to users. “It’s on your team, it’s backing you up, it’s your hype man,” he says. Copilot Voice will be available from today in English to users in Australia, Canada, New Zealand, the United Kingdom, and the United States, with more countries to follow, the company says.
Microsoft’s helper Clippy, an anthropomorphized paper clip, was best known for appearing when users opened Word with the infamous line “It looks like you’re writing a letter…” The product was unpopular; Microsoft concluded this was in part because the program failed to deliver on the humanlike intelligence it promised, forgetting users’ preferences and repeating itself endlessly. Large language models are far better at mimicking human intelligence, but their behavior can still be odd and unpredictable, which may prove a factor in Copilot’s popularity.
Copilot Voice will be available in the free version of Copilot for Windows, which is also available in a standalone mobile app and via the web.
Microsoft is introducing some more experimental upgrades to Copilot as well, which will be limited to those who pay for a $20 per month Copilot Pro subscription. An opt-in feature called Copilot Vision will let the AI assistant see users’ screens and react to things that they point to with their cursor. Suleyman says a user can indicate a product, for example, and ask Copilot to offer an opinion based on reviews sourced from the web.
“One of the things that seems to be most common is that people ask it for aesthetic advice,” Suleyman says. “They’re on a fashion website, and they’re like, what do you call that pattern? What do you call that dress?”
Suleyman adds that Copilot might eventually critique a webpage, making a qualitative judgment based on a user’s interests and preferences. “It could actually just read the entire page in an instant and then talk to you about the page,” he says. “Like, you can ask ‘do you think this is an article I would enjoy?’ That’s kind of a different experience.”
Text interactions with Copilot are stored for 18 months, Microsoft says, although users can delete conversations. Copilot Vision will not keep a record of what users ask, Microsoft says, deleting the data at the end of a session. The feature will be limited to certain websites and will also be blocked from accessing copyrighted or NSFW content. It will be rolled out to Copilot Pro users in the US at an undisclosed date, Microsoft says. The company says no data is shared with OpenAI.
Another experimental feature called Think Deeper lets Copilot try to solve more complex problems through a process that mimics step-by-step reasoning. The technology is partly based on a new AI model called OpenAI o1 that was announced earlier this month by OpenAI. Think Deeper will be available to some Copilot Pro users in the US today.
The changes to Copilot are a sign of Microsoft’s desire to experiment with its AI tools and make them more compelling. They also reflect the rapid pace at which AI is developing, with most cutting-edge large language models—as the algorithms that power chatbots are called—now able to handle audio and imagery as well as text. OpenAI, Google, and others have in recent months all given their models the ability to converse naturally with different human voices.
Besides plenty of competition, Microsoft also faces some behind-the-scenes uncertainty.
The company has invested a reported $13 billion in OpenAI and holds a license that grants it access to OpenAI’s models. But although OpenAI is still widely considered a leader in AI, it has been beset by turbulence, most recently the departures last week of CTO Mira Murati and two senior engineers leading research efforts. Suleyman declined to comment on the situation at OpenAI.
Suleyman joined Microsoft in March, after the software giant signed a $650 million licensing deal with his startup, Inflection AI. He previously cofounded the British AI company DeepMind, which was acquired by Google in 2014. Last year, DeepMind was merged with Google’s AI effort to form Google DeepMind, and is now led by another DeepMind cofounder, Demis Hassabis.
Microsoft developed Copilot after seeing success with a tool for coders released in 2021, called GitHub Copilot. It autocompletes blocks of code and can answer programming questions.
Shane Greenstein, a professor at Harvard Business School who has studied Microsoft’s AI strategy, says it will be more challenging to design a useful general purpose helper; he adds that the company’s experimental new features still need to prove their value to users.
“It took five to 10 years of messing around with web interfaces to figure out how to get more than technically savvy people to buy something online,” Greenstein says. “I expect that type of time scale for the iteration here, too.”
Source: Wired