Mustafa Suleyman was at the center of an artificial intelligence revolution once before.
As a cofounder of DeepMind, a British company acquired by Google in 2014, he helped devise a new way for computers to tackle seemingly impossible problems by combining practice with positive and negative feedback. DeepMind demonstrated the approach by developing a superhuman Go-playing program, AlphaGo, which defeated one of the world’s best Go players in 2016.
Now, as CEO of Microsoft AI, Suleyman is talking up a new kind of AI breakthrough. He oversees efforts to integrate the same AI that powers ChatGPT into software—including the Windows operating system—that runs most of the world’s personal computers.
In its latest upgrade, Microsoft announced today that its AI assistant, Copilot, now has a humanlike voice, the ability to see a user’s screen, and better reasoning skills.
Suleyman says it’s all part of a plan to make users fall back in love with the PC. He spoke to WIRED senior writer Will Knight from Redmond, Washington—over Microsoft Teams, naturally. The conversation has been lightly edited.
Will Knight: What’s the new vision for Copilot?
Mustafa Suleyman: We really are at this amazing kind of transition point. AI companions now see what we see, hear what we hear, and speak in the same language that we use to communicate with one another.
There is this new sort of design material that is actually about persistence, about relationship, about emotion. And I’m sort of crafting experiences which are about a kind of lasting, sustained interaction with a companion.
You joined Microsoft from Inflection AI, where the focus was building supportive and empathetic AI. It sounds like you’ve brought that to your new employer.
What I’ve long believed in, even before my DeepMind days, is AI’s potential to provide support. Emotional support is actually one of the things I first worked on as a 19-year-old, when I started a telephone counseling service.
That’s the beauty of this technological moment. To see what it feels like to engage with one of these experiences over a sustained period of time—this companion that really gets to know you. It’s coaching you, encouraging you, supporting you, teaching you. I think that isn’t going to feel like a computer anymore.
What’s the idea with Copilot Vision, the “labs” feature that Pro users will be able to try?
The vision mode enables you to say “what’s that thing over there [on your screen]?” Or, “Wait, what’s that? What do you think of that? Is that cool?”
There’s just so many little moments when you’re sitting at your computer. It’s phenomenal to have this AI companion see whatever you see, and talk to you in real time about what you’re looking at. It sort of changes the route that you take through your digital life, because you don’t have the burden of having to type something in.
This sounds like Recall, the controversial and now opt-in Windows feature that records a user’s on-screen activity.
We don’t save any of the material with Copilot Vision, so once you close the browser after your session, it all just disappears. It fully deletes. But I’m thinking about if and how to introduce it in the future, because a lot of people do want that experience. If you could just say, “What was that picture that I saw online the other day? What was that meme?” I think we’ll have to look into it one day.
At the moment, though, the Copilot Vision tool is ephemeral. We’ll sort of have to experiment over time and see what makes sense on that front.
What about the privacy risks introduced when people share sensitive information with Copilot in other ways?
We store the logs that you generate as a result of your conversation in the most secure way, to the highest standard of Microsoft security. And we do save those, because obviously you want conversation history.
You’re also introducing Think Deeper, which will let Copilot tackle more difficult problems. This is based on OpenAI’s o1 model, aka Strawberry, right?
It’s like Strawberry, yeah. There’s an OpenAI model that we’ve tuned up for our more consumer purposes, and we’ve got it to act in a way that is more consistent with our AI companion theme.
What are the differences?
OpenAI’s is much more focused on pure math and scientific problem-solving. And what we’ve tried to do is have it focus on side-by-side comparisons and sort of consumer analysis, stuff like that.
Or when you get stuck on a hard problem or you want to reason through something, then it can really lay out a side-by-side comparison, or do an analysis at scale.
Are people at Microsoft already using this new version of Copilot?
Yeah, everyone’s using it. We just went to general availability across the company a few days ago, so everybody is giving tons and tons of feedback. Our feedback channels are absolutely slammed. It’s a lot of fun.
People are going to remember Clippy, Microsoft’s last AI helper for Windows. Do people there see parallels?
Ha, well I saw Bill Gates the other day, and he was like, you do realize you’ve misnamed this whole AI thing? It should be called Clippy. I was like, dude!
But I mean, it just shows you how mind-blowing people like Bill are. People who see, not just, you know, two years ahead, but 20 years.
Are the new features a step towards so-called AI agents, which do useful chores on a computer?
Yeah, absolutely. The first stage is AI processing the same information that you process—seeing what you see, hearing what you hear, consuming the text that you consume. The second phase is [AI having] a long-term, persistent memory that creates a shared understanding over time. And the third stage is AI interacting with third parties by sending instructions and taking actions—to buy things, book things, plan a schedule. And we’ve got these two features in an experimental R&D mode that we’re working on.
Wait, you have an AI agent for Windows that can go off and buy things for you?
It’s a way off, but yes, we’ve closed the loop, we’ve done transactions. The problem with this technology is that you can get it working 50, 60 percent of the time, but getting it to 90 percent reliability is a lot of effort. I’ve seen some stunning demos where it can independently go off and make a purchase, and so on. But I’ve also seen some car crash moments where it doesn’t know what it’s doing.
Tell me more about a car crash. Did it go off and buy a Lamborghini on Bill’s credit card?
If it used Bill’s credit card, then I think it would be quite funny. But no, like I said, we’re still figuring it out step by step. It’s still deep in the doldrums of the labs. There’s a long way to go with these things, but you can count it in quarters, not years, I would say.
What’s going to be the biggest challenge for you in making the kind of AI future you’re describing a reality?
The big thing here is figuring out how to craft technology that is trusted, because it is going to feel like a very intimate and personal experience. We’ve got to get the security part right, we’ve got to get the privacy part right, of course. But I think the real thing is trying to design the conversation so that the agent is able to articulate boundaries, so it can say, “That’s not something I’m prepared to engage in.”
If we can nail that, that’s the foundation of a trusted experience, and then I think we can really push into the complicated side of things, which is how do we let it buy things for you on your behalf, or negotiate on your behalf, or enter into a contract on your behalf, or plan a schedule for you that involves three or four different stops over the course of a Saturday afternoon. And you’re like, I trust you, Copilot, you got this, right? That’s really what we’re working towards.
Source: Wired