Chatbots Are Entering Their Stone Age

For all the bluster about generative artificial intelligence upending the world, the technology has yet to meaningfully transform white-collar work. Workers are dabbling with chatbots for tasks such as drafting emails, and companies are launching countless experiments, but office work hasn’t undergone a major AI reboot.

Perhaps that’s only because we haven’t given chatbots like Google’s Gemini and OpenAI’s ChatGPT the right tools for the job yet; they’re generally restricted to taking in and spitting out text via a chat interface. Things might get more interesting in business settings as AI companies start deploying so-called “AI agents,” which can take action by operating other software on a computer or via the internet.

Anthropic, a competitor to OpenAI, announced a major new product today that attempts to prove the thesis that tool use is needed for AI’s next leap in usefulness. The startup is allowing developers to direct its chatbot Claude to access outside services and software in order to perform more useful tasks. Developers can, for instance, let Claude use a calculator to solve the kinds of math problems that vex large language models; require it to access a database containing customer information; or compel it to make use of other programs on a user’s computer when that would help.
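For readers curious what "tool use" can look like in practice, here is a minimal Python sketch of the general pattern: the model emits a structured request naming a tool, and the developer's application runs that tool and hands the result back. This is not Anthropic's API. The request format, the `TOOLS` registry, and the `calculator` and `run_tool_request` names are all assumptions made up for illustration.

```python
import ast
import operator

# Arithmetic operators the hypothetical calculator tool is allowed to apply.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str):
    """Evaluate a basic arithmetic expression without calling eval()."""

    def ev(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body)


# Hypothetical registry: tool names the application exposes to the model.
TOOLS = {"calculator": calculator}


def run_tool_request(request: dict) -> dict:
    """Run a tool the model asked for and package the result.

    `request` uses an assumed shape: {"name": ..., "arguments": {...}}.
    """
    name = request["name"]
    if name not in TOOLS:
        return {"tool": name, "error": "unknown tool"}
    try:
        result = TOOLS[name](**request["arguments"])
    except Exception as exc:  # report the failure back instead of crashing
        return {"tool": name, "error": str(exc)}
    return {"tool": name, "result": result}


if __name__ == "__main__":
    # Pretend the model asked for help with multiplication.
    print(run_tool_request(
        {"name": "calculator", "arguments": {"expression": "1234 * 5678"}}
    ))
```

In a real deployment, the returned dictionary would be passed back to the model as part of the conversation so it can fold the answer into its reply; that round trip is omitted here.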

I’ve written before about how important AI agents that can take action may prove to be, both for the drive to make AI more useful and the quest to create more intelligent machines. Claude’s tool use is a small step toward developing the more useful AI helpers that are being launched into the world right now.

Anthropic has been working with several companies to help them build Claude-based helpers for their workers. Online tutoring company Study Fetch, for instance, has developed a way for Claude to use different features of its platform to modify the user interface and syllabus content a student is shown.

Other companies are also entering the AI Stone Age. Google demonstrated a handful of prototype AI agents at its I/O developer conference earlier this month, among many other new AI doodads. One of the agents was designed to handle online shopping returns, by hunting for the receipt in a person’s Gmail account, filling out the return form, and scheduling a package pickup.

Adept AI, a company cofounded by David Luan, formerly VP of engineering at OpenAI, has been honing AI agents for office work for more than a year. Adept is cagey about who it works with and what its agents do, but the strategy is clear.

“Our agents are already in the 90s [percent] for reliability for our enterprise customers,” Luan says. “The way we did that was to limit the scope of deployment a bit. All the new research we do is to improve reliability for new use cases that we don’t yet do well on.”

A key part of Adept’s plan is to train its AI agents to be better at understanding the goal at hand and the steps required to achieve it. The company hopes that will make the technology flexible enough to help out in all kinds of workplaces. “They need to understand the reward of the actual task at hand,” Luan says. “Not just have the ability to copy existing human behavior.”

The core capabilities needed to make AI agents more useful are also necessary to advance on the grander vision of making machine intelligence more powerful. Right now, the ability to make plans to achieve specific goals is a hallmark of natural intelligence that is notably lacking in LLMs.

It may be an extremely long time before machines attain humanlike intelligence, but the concept of tool use being crucial is evocative given the evolutionary path of Homo sapiens. In the natural world, prehuman hominids began handling crude stone tools for tasks such as cutting animal hides. The fossil record shows how increasingly sophisticated tool use blossomed alongside advancing intelligence, as humans’ dexterity, bipedalism, vision, and brain size progressed. Maybe now it’s time for one of humankind’s most sophisticated tools to develop tool use of its own.

Source: Wired