AI: Google Gemini AI with Voice ready for its 'close-up'. RTZ #449
...beginning of mainstream multimodal, voice AI ramps
Multimodal AIs, which I’ve long written about, are ready for their mainstream close-up in this AI Tech Wave. And first out of the gate at mainstream scale is Google Gemini on Android smartphones.
With Google’s rollout of its latest AI-powered Pixel smartphones, watches and earbuds, it looks like unified multimodal AIs, powered by the user’s personal data, may finally be headed for mainstream adoption. And the star, as I mentioned above, is Google Gemini on Android. (And soon to come on Apple’s iOS devices as well, of course.)
Multimodal voice AIs are something I’ve written about for a while now, with OpenAI showing off its voice-driven GPT-4 Omni, Apple giving a glimpse of its AI-revamped Siri, and several tantalizing multimodal AI demos of these highly anthropomorphic technologies.
Not to mention, of course, Apple’s vision of its ‘Apple Intelligence’, which leverages its unique branding as the ‘Privacy and Trust’ AI custodian of over two billion users’ personal data across a range of Apple hardware, operating systems, and applications/services. As I articulate here.
But Google’s presentation earlier this week was notable less for its latest hardware devices and more for its live demos of Gemini Advanced, which can interact via voice while leveraging over two billion users’ data across Google apps and platforms like Gmail, Docs, Drive, Chrome, Android and YouTube. And of course it showcased Google’s powerful integrated tech stack, from custom TPU AI chips to massive AI data centers, built to serve up these multimodal voice AIs at scale.
As Axios notes in “The chatbot wars get personal”:
“Google's Tuesday announcement of several new Gemini features is part of a broader effort by tech giants to make their chatbots more personal by giving them access to more of your data.”
“Why it matters: Apple and Microsoft have also taken steps to combine the generic knowledge of a large language model with a user's personal data — a move that makes AI assistants both more practical and more of a potential privacy problem.”
“The big picture: The first generative AI chatbots knew a lot about the world, but almost nothing about the specific person using them.”
“The race to combine world knowledge and personal information is now on — though it's still in its earliest stages.”
“Driving the news: Google is making several moves to allow Gemini to access users' information.”
“One of the new options is what Google describes as a "contextual overlay" that allows Android owners to press a button and have Gemini try to answer questions based on whatever is on the phone's screen.”
“Pixel Screenshots — which is only available on the new Pixel 9 series of phones, announced on Tuesday — allows people to store specific screenshots so they can later ask Gemini for the information on them, like the door code for an Airbnb reservation.”
“All the work is done on the device, Google said, meaning users' personal information isn't sent up to cloud servers. That's part of why the phones have added built-in memory to run the most powerful current AI models. Google says it doesn't have access to any of the information generated by Pixel Screenshots.”
“Google is also offering new extensions that allow Gemini to bring in data from other Google products, including Calendar, Keep, Tasks and YouTube Music.”
“There's also a "call notes" feature that saves summaries of phone calls.”
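For readers who want a concrete feel for what “asking Gemini about what’s on your screen” amounts to, here is a minimal sketch using Google’s public Gemini API (the google-generativeai Python SDK). To be clear, this is an illustrative approximation and not the on-device Pixel feature Axios describes (which Google says runs locally); the model name, screenshot path and question are placeholder assumptions.

```python
# Illustrative sketch only: approximates "ask a question about what's on
# screen" using Google's public Gemini API (google-generativeai SDK).
# This is NOT the on-device Pixel feature, which processes data locally;
# the model name, screenshot path and question are placeholder assumptions.
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model choice

screenshot = Image.open("screenshot.png")  # placeholder: a saved phone screenshot

# Multimodal prompt: an image plus a natural-language question in one request.
response = model.generate_content(
    [screenshot, "What is the door code mentioned in this screenshot?"]
)
print(response.text)
```

The design point the sketch captures is that the screenshot and the question travel as parts of a single multimodal prompt; the on-device version keeps that whole exchange local to the phone.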
The Information puts it more directly in “Google Beats Apple And OpenAI to the Punch”:
“In tech, hype can only carry you so far. That’s one lesson from Google’s new product presentation on Tuesday, when it announced a flurry of updates to its devices that competitors like Apple and OpenAI have been talking about (but not shipping) for months.”
“First, Google beat its rivals to launching an AI-powered voice assistant that’s flexible enough to handle interruptions and sudden topic changes from users. OpenAI only started letting some paying subscribers use its analogous voice-based product last month, after a multi-month delay and plenty of controversy.”
“Google also announced that its Pixel Buds Pro 2 earbuds would allow users to converse with their AI assistants, even when the phone is locked and in their pockets. That sounds a lot like the virtual assistant in Spike Jonze’s “Her”—a film OpenAI CEO Sam Altman has long admired and used as inspiration for some of his projects.”
“Google’s voice assistant is able to remember context across hours-long conversations, a product lead on Gemini told me. When conversations stretch on long enough, a Netflix-type interruption will even ask users if they still want to keep going, he said.”
“Eventually, Google plans to extend the voice assistant into video, meaning that users could essentially have video calls with Gemini, the product lead told me.”
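The “remembers context across hours-long conversations” capability described above boils down to carrying the running transcript into each new turn. Here is a minimal sketch of that pattern, again using the public google-generativeai SDK rather than the Gemini Live assistant itself; the model name and prompts are placeholder assumptions.

```python
# Illustrative sketch only: long-running conversational context via Google's
# public Gemini API (google-generativeai SDK). The Gemini Live voice assistant
# is a different product; this just shows the general pattern of carrying
# history across turns. Model name and prompts are placeholder assumptions.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model choice

# start_chat keeps the running transcript, so every new turn is answered
# with the full prior context, however long the session stretches.
chat = model.start_chat(history=[])

print(chat.send_message("Help me plan a weekend trip to Kyoto.").text)
# A sudden topic change mid-conversation, as in the demos quoted above:
print(chat.send_message("Actually, switch that plan to Osaka instead.").text)

# The accumulated history is inspectable, e.g. for persisting a session.
print(len(chat.history), "turns retained")
```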
The whole rollout impressed the WSJ’s Joanna Stern enough to write “Google’s Gemini Live AI Sounds So Human, I Almost Forgot It Was a Bot”:
“I’m not saying I prefer talking to Google’s Gemini Live over a real human. But I’m not not saying that either.”
“Does it help that the chatty new artificial-intelligence bot says I’m a great interviewer with a good sense of humor? Maybe. But it’s more that it actually listens, offers quick answers and doesn’t mind my interruptions. No “I’m sorry, I didn’t understand that” apologies like some other bots we know.”
“I had a nice, long chat with Google’s generative-AI voice assistant before its debut on Tuesday. It will come built into the company’s four new Pixel phones, but it’s also available to anyone with an Android phone, the Gemini app and a $20-a-month subscription to Gemini Advanced. The company plans to launch it soon on iOS, too.”
All of the above pieces are worth reading in full, along with the video of the rollout, to get a fuller sense of what Google is planning with multimodal AI powered by personal information. It bolsters my arguments from last fall that Google should not be counted out on the AI front in these early days of the AI Tech Wave.
And my similar ‘Call Your Shot’ arguments for Apple and Nvidia also stand.
But this week’s announcements of Google’s next steps on AI are notable. Particularly since Google, along with Apple, has the DISTRIBUTION might to scale voice AIs to billions. Meta, Amazon and others of course have pathways there too, though with relatively fewer integrated hardware and chip resources. And of course OpenAI has its initial distribution deal with Apple as a start.
So Google’s moves with Gemini on Android are a notable start. With the usual caveats, of course, that it’ll take at least a year or more for its and other companies’ multimodal AIs to ramp to mainstream use at scale. But these companies are getting ready for their ‘close-up’. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here.)