Google’s Flagship Gemini AI Model Gets a Major Upgrade

Alphabet’s Gemini AI model has been public for only two months, but the company is already releasing an upgrade. Gemini Pro 1.5, launching with limited availability today, is more powerful than its predecessor and can handle huge amounts of text, video, or audio input at a time.

Demis Hassabis, CEO of Google DeepMind, which developed the new model, compares its vast capacity for input to a person’s working memory, something he explored years ago as a neuroscientist. “The great thing about these core capabilities is that they unlock sort of ancillary things that the model can do,” he says.

In a demo, Google DeepMind showed Gemini Pro 1.5 analyzing a 402-page PDF of the Apollo 11 communications transcript. Asked to find humorous portions, the model highlighted several moments, such as when astronauts said that a communications delay was due to a sandwich break. Another demo showed the model answering questions about specific actions in a Buster Keaton movie. The previous version of Gemini could answer such questions only for much smaller amounts of text or much shorter videos. Google hopes that the new capabilities will allow developers to build new kinds of apps on top of the model.

“It really feels quite magical how the model performs this sort of reasoning across every single page, every single word,” says Oriol Vinyals, a research scientist at Google DeepMind.

Google says Gemini Pro 1.5 can ingest and make sense of an hour of video, 11 hours of audio, 700,000 words, or 30,000 lines of code at once, several times more than other AI models, including OpenAI’s GPT-4, which powers ChatGPT. The company has not disclosed the technical details behind this feat. Hassabis says researchers at Google DeepMind have tested one use for models that can handle large amounts of text: identifying the important takeaways in Discord discussions with thousands of messages.

Source: Wired