With Gemini on Android, Google Points to Mobile Computing’s Future—and Past

Nearly a decade ago, Google showed off a feature called Now on Tap in Android Marshmallow—tap and hold the home button and Google will surface helpful contextual information related to what’s on the screen. Talking about a movie with a friend over text? Now on Tap could get you details about the title without having to leave the messaging app. Looking at a restaurant in Yelp? The phone could surface OpenTable recommendations with just a tap.

I was fresh out of college, and these improvements felt exciting and magical—its ability to understand what was on the screen and predict the actions you might want to take felt future-facing. It was one of my favorite Android features. It slowly morphed into Google Assistant, which was great in its own right, but not quite the same.

Today, at Google’s I/O developer conference in Mountain View, California, the new features Google is touting in its Android operating system feel like the Now on Tap of old—allowing you to harness contextual information around you to make using your phone a bit easier. Except this time, these features are powered by a decade’s worth of advancements in large language models.

“I think what’s exciting is we now have the technology to build really exciting assistants,” Dave Burke, vice president of engineering on Android, tells me over a Google Meet video call. “We need to be able to have a computer system that understands what it sees and I don’t think we had the technology back then to do it well. Now we do.”

I got a chance to speak with Burke and Sameer Samat, president of the Android ecosystem at Google, about what’s new in the world of Android, the company’s new AI assistant Gemini, and what it all holds for the future of the OS. Samat referred to these updates as a “once-in-a-generational opportunity to reimagine what the phone can do, and to rethink all of Android.”

Circle to Search … Your Homework

The upgraded Circle to Search in action.

Courtesy of Google

It starts with Circle to Search, which is Google’s new way of approaching Search on mobile. Much like the experience of Now on Tap, Circle to Search—which the company debuted a few months ago—is more interactive than just typing into a search box. (You literally circle what you want to search on the screen.) Burke says, “It’s a very visceral, fun, and modern way to search … It skews younger as well because it’s so fun to use.”

Samat claims Google has received positive feedback from consumers, but Circle to Search’s latest feature hails specifically from student feedback. Circle to Search can now be used on physics and math problems when a user circles them—Google will spit out step-by-step instructions on completing the problems without the user leaving the syllabus app.

With a coming update, you’ll be able to drag AI-generated images into emails and messages.

Courtesy of Google

The Gemini updates announced at I/O are meant to make the assistant more contextually aware, much as Now on Tap was nearly a decade ago. Later this year, you’ll be able to generate images with Gemini and drag and drop them into apps like Gmail or Google Messages. Burke showed me an example of Gemini generating an image of tennis with pickles; he was responding to someone’s text about playing pickleball. He hailed Gemini—which popped up as an overlay over the messaging app—asked it to generate the image, and then dragged one and dropped it in the chat.

You’ll be able to ask Gemini to pull specific bits of information out of a video.

Courtesy of Google

He then pulled up a YouTube video on pickleball rules. Call up Gemini while watching and you’ll see a prompt to “Ask this video.” This lets you employ Gemini to find specific information in the video without scrubbing through the whole thing yourself. (Who has time for that?) Burke asked about a specific pickleball rule, and Gemini quickly spat out an answer based on the video. This “summarize” functionality has been the hallmark of many AI tools—summarizing PDFs, videos, memos, and news stories (yay).

Text summaries of videos may prove helpful.

Courtesy of Google

Speaking of PDFs, you’ll soon be able to attach a PDF to Gemini (there will be a prompt for “Ask this PDF”) and Gemini can deliver specific information, saving you the need to scroll through several pages. Burke says these features are rolling out to millions of devices over the next few months, though the PDF feature will only be available for Gemini Advanced users—folks paying for the $20 per month subscription to access the cutting-edge capabilities of Google’s AI models.

Gemini in general will show more “dynamic suggestions” based on what’s happening on the screen. These will pop up right above the Gemini overlay when you activate the assistant.

Gemini Nano Gets an Upgrade

Gemini Nano is Google’s large language model powering select on-device features on certain phones, like the Pixel 8 series, Samsung Galaxy S24 range, and even the new Pixel 8A. Running these as on-device features means data does not need to be sent to the cloud, making the features more private. They can even work offline.

Nano currently powers features like Summarize in Google’s Recorder app, which summarizes transcriptions, and Smart Reply in select messaging apps, which offers more contextual auto-replies to messages. Google’s newer version of the model—Gemini Nano with Multimodality—will arrive this year, starting with Pixel phones. It’s a bit of a mouthful, but it more or less means Gemini Nano will be able to do more than just process text.

“It’s a 3.8 billion parameter model and it’s multimodal—this is the first on-device built-in multimodal model,” Burke says. “It’s very powerful. It hits, on academic benchmarks, around 80 percent of Gemini 1.0, which is pretty amazing for a small model.”

Google’s screen reader will get an upgrade to better understand and describe images.

Courtesy of Google

This model will now power Google’s existing TalkBack screen reader feature on Android, which helps blind and low-vision users understand what’s on the screen. Gemini Nano will purportedly offer richer and more precise descriptions of what’s in each image. Google says that, on average, TalkBack users see “90 unlabeled images per day,” but Gemini can fill the gap, as it’ll be able to visualize and understand the images on the screen and describe them even when the user is offline.

Google has poured many of its AI smarts over the past few years into improving its call screening technology to limit robocalls, and Gemini Nano with Multimodality will soon help you avoid phone scams—in real time. A new feature called Scam Detection will have Gemini listening in on your phone calls, and if it picks up on certain phrases or requests from the person on the other end, it will issue an alert that you’re likely in the middle of a scam call. Burke says this model was trained on data from websites like BanksNeverAskThat.com to learn what a bank wouldn’t ask you—and the types of things scammers typically ask for. He says all of this listening and detection happens on-device, so it’s private. We’ll hear more about this “opt-in feature” later this year.

Unusually, Google says it will be unveiling a few new Android features tomorrow rather than compressing all of the new stuff into today’s announcements, so stay tuned for more.

With the rise of AI hardware gadgets vying to replace your smartphone—and the talk of app-less generative interfaces—I asked Samat how he sees Android changing in the next five years. He’s excited to see the innovation from new and existing companies trying new things—and that Google is “trying a lot of things internally” too. But he boiled things down to an analogy with the automotive space.

If you buy a car, you’ve come to expect certain standard features, like a steering wheel. But with AI, one giant leap would be to take away those features—no steering wheel, no interfaces. “Some people would be excited by that, some people would not be excited by that.” He believes certain functions we do on our phones will be more assistive than ever with the help of AI—and we can expect some features to be replaced in that way.

“As that continues, what we will find—and we’re already seeing this in our own testing—is there are opportunities to fundamentally transform the UI in certain areas where it tips over from the point of, ‘OK, that’s really assistive,’ to ‘Actually, there should be an entirely new way of doing this.’ That’s what’s fun and exciting about right now. It’s an amazing time to be working on this technology.”

Source: Wired