Computing over the decades has seen some epochal transitions in the core ways we use these machines. The three main ones in terms of user interfaces/user experiences (aka UI/UX) have been the Command Line, Graphical User Interfaces (GUIs) like Windows and Mac, and Multi-touch as pioneered by the Apple iPhone in 2007.
Of course, there were lots of iterative UI/UX innovations in between, like the BlackBerry ‘Crackberry’ and others, but those three are the biggies. And of course the first generation of ‘smart voice’ devices, like Amazon’s Alexa and Echo, Apple’s Siri, and Google’s Assistant and Nest devices, all powered by deterministic computing software.
The LLM AI Tech Wave since OpenAI’s ‘ChatGPT moment’ of course harks back to the ‘command line’, with users typing what they want into a text box. Kinda like Google, which decades ago merged the command line with a barely graphical interface around a text box, asking if “I’m Feeling Lucky”.
Judging by the last few months, where AI is going next may well be voice. From the Humane AI Pin built by ex-Apple engineers, to the orange Rabbit R1, to whatever OpenAI/Sam Altman and Jony Ive are cooking up in terms of an AI device, the answer being kicked around is Voice. I discussed some of the pros and cons of Voice AI earlier this year.
This Monday of course saw OpenAI release its latest ‘multimodal’ GPT-4o-powered ChatGPT, where the ‘o’ stands for ‘omni’. OpenAI really honed its AI Voice Assistant, as the Information explains here in depth. As the NY Times goes on to explain in “OpenAI Unveils New ChatGPT That Listens, Looks and Talks”:
“Chatbots, image generators and voice assistants are gradually merging into a single technology with a conversational voice.”
“As Apple and Google transform their voice assistants into chatbots, OpenAI is transforming its chatbot into a voice assistant.”
“On Monday, the San Francisco artificial intelligence start-up unveiled a new version of its ChatGPT chatbot that can receive and respond to voice commands, images and videos.”
“The company said the new app — based on an A.I. system called GPT-4o — juggles audio, images and video significantly faster than previous versions of the technology. The app will be available starting on Monday, free of charge, for both smartphones and desktop computers.”
“‘We are looking at the future of the interaction between ourselves and machines,’ said Mira Murati, the company’s chief technology officer.”
The demos were impressive, including this video of education luminary Sal Khan of Khan Academy fame using GPT-4o to help a boy with a geometry problem. A must watch. More video demos here. All a remarkable glimpse into the future, today.
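For the technically curious, here’s a minimal sketch of what that ‘multimodal’ input looks like from the developer side, using OpenAI’s Python SDK to send a text prompt plus an image to GPT-4o. The prompt and image URL below are hypothetical placeholders of mine, not OpenAI’s demo code, and the real-time audio and video in the demos go well beyond this simple request/response pattern:

```python
# Minimal sketch: sending text plus an image to GPT-4o via OpenAI's Python SDK.
# The prompt and image URL are hypothetical placeholders, not from OpenAI's demos.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # A single message can mix text and image parts.
            "content": [
                {"type": "text", "text": "Walk me through this geometry problem step by step."},
                {"type": "image_url", "image_url": {"url": "https://example.com/triangle-problem.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```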
Yesterday at Google I/O, Google showed off its version of AI Voice with Project Astra, which stands for “Advanced Seeing and Talking Responsive Agent”. As the Information explains, the two companies differ in their approach to AI Voice assistants:
“Google has spent the past year and a half chasing OpenAI’s conversational artificial intelligence. But as the technical gap narrows between their products, the companies this week revealed an important difference in how their AIs will interact with people. (To read the five main takeaways from Google’s two-hour spate of announcements, see this article we published last night.)”
“Here’s what’s going on. OpenAI on Monday unveiled an emotionally expressive female-sounding AI voice assistant reminiscent of “Samantha” from the film “Her,” including digressions and humor that made it seem human. On Tuesday at its annual conference for developers, Google took a different approach with an assistant, Project Astra, that was also voiced by a female but spoke in a more matter-of-fact tone and focused on practical tasks. It identified a neighborhood based on a user’s view from their window and named a technical component of a sound speaker, both of which the assistant could see from the user’s phone. Google executives on Tuesday used the adjective “agentive” no fewer than five times, so get ready to hear that word a lot!”
“In other words, OpenAI is stepping into more daring terrain with its AI and Google is taking a more cautious path. In keeping with that divergence, OpenAI’s voice assistant capabilities will launch within ChatGPT in a matter of weeks but Google said it could take months to make Astra available through Gemini, its rival to ChatGPT.”
As I outlined a few weeks ago, Meta is not far behind, releasing its Meta AI service on its core applications Instagram, Facebook and WhatsApp, with over 3 billion users. And it’s making a voice-driven Meta AI available via its Ray-Ban co-branded ‘Smart Glasses’ in a recent update.
And Apple of course is close behind as well, rumored to reveal a deal with OpenAI to power its revamped Siri service, as early as this June at its annual WWDC developer conference.
So voice-driven AI is going to be available to billions of users this year and into next, as I outlined yesterday.
Lots will be learned, both by the companies rolling out these services and of course by their billions of users. Voice may or may not be the next major leap in user interfaces in this AI Wave of computing. We’re all going to be figuring it out together. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here.)