Meta Releases Llama 3.2—and Gives Its AI a Voice

Mark Zuckerberg announced today that Meta, his social-media-turned-metaverse-turned-artificial-intelligence conglomerate, will upgrade its AI assistants to give them a range of celebrity voices, including those of Dame Judi Dench and John Cena. The more important upgrade for Meta’s long-term ambitions, though, is the new ability of its models to see users’ photos and other visual information.

Meta today also announced Llama 3.2, the first version of its free AI models to have visual abilities, broadening their usefulness and relevance for robotics, virtual reality, and so-called AI agents. Some versions of Llama 3.2 are also the first to be optimized to run on mobile devices. This could help developers create AI-powered apps that run on a smartphone and tap into its camera or watch the screen in order to use apps on your behalf.

“This is our first open source, multimodal model, and it’s going to enable a lot of interesting applications that require visual understanding,” Zuckerberg said on stage at Connect, a Meta event held in California today.

Given Meta’s enormous reach with Facebook, Instagram, WhatsApp, and Messenger, the assistant upgrade could give many people their first taste of a new generation of more vocal and visually capable AI helpers. Meta said today that more than 180 million people already use Meta AI, as the company’s AI assistant is called, every week.

Meta has lately given its AI a more prominent billing in its apps—for example, making it part of the search bar in Instagram and Messenger. The new celebrity voice options available to users will also include Awkwafina, Keegan-Michael Key, and Kristen Bell.

Patrick Wendell, cofounder and VP of engineering at Databricks, a company that hosts AI models including Llama, says many companies are drawn to open models because they allow them to better protect their own data.

Large language models are increasingly becoming “multimodal,” meaning they are trained to handle audio and images as input as well as text. This extends a model’s abilities and allows developers to build new kinds of AI applications on top of it, including so-called AI agents capable of carrying out useful tasks on computers on a user’s behalf. Llama 3.2 should make it easier for developers to build AI agents that can, say, browse the web, perhaps hunting for deals on a particular type of product when given a short description.

Earlier today, the Allen Institute for AI (Ai2), a research institute in Seattle, released an advanced open source multimodal model called Molmo. Molmo was released under a less restrictive license than Llama, and Ai2 is also releasing details of its training data, which can help researchers and developers experiment with and modify the model.

Meta said today that it would release several sizes of Llama 3.2 with corresponding capabilities. Besides two more powerful instantiations with 11 billion and 90 billion parameters—a measure of a model’s complexity as well as its size—Meta is releasing less capable 1 billion and 3 billion parameter versions designed to work well on portable devices. Meta says these versions have been optimized for ARM-based mobile chips from Qualcomm and MediaTek.
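To see why the smaller versions matter for phones, a rough back-of-envelope calculation helps: weight memory scales linearly with parameter count, at roughly two bytes per parameter for 16-bit weights (activations, the KV cache, and any quantization change the totals, so treat this as a sketch, not a spec).

```python
# Rough memory footprint for model weights alone, assuming 16-bit
# (2-byte) weights. Activations and the KV cache add more on top.
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    return num_params * bytes_per_param / 1e9

for name, params in [("1B", 1e9), ("3B", 3e9), ("11B", 11e9), ("90B", 90e9)]:
    print(f"Llama 3.2 {name}: ~{weight_memory_gb(params):.0f} GB of weights")
```

By this estimate the 1B and 3B models need only a few gigabytes for weights, which is why they can plausibly fit on a smartphone, while the 90B model calls for server-class hardware.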

Meta’s AI overhaul comes at a heady time, with tech giants racing to offer the most advanced AI. The company’s decision to release its most prized models for free may give it an edge in providing the foundation for many AI tools and services—especially as companies begin to explore the potential of AI agents.

Source: Wired