(Update, 10/1/24: New details on the OpenAI management decisions that led to the launch of GPT-4 Omni, announced a day before Google I/O on 5/15/24 and its Gemini announcements, are discussed here.)
As expected, the week of tit-for-tat AI announcements by the leading LLM AI companies did not disappoint. OpenAI debuted its new GPT-4 ‘Omni’ model with ‘enhanced real-time voice abilities’ and demonstrably more fluid multimodal capabilities, all ahead of Google’s annual I/O developer conference tomorrow, which is also expected to be packed with AI announcements around Google Gemini. As Axios summarized:
“OpenAI Monday announced a new flagship model, dubbed GPT-4o, that brings more powerful capabilities to all its customers, including smarter, faster real-time voice interactions.
“Why it matters: Google, Microsoft and Apple are all reorganizing their offerings around a generative-AI based future, and OpenAI, whose ChatGPT kicked off the race, is trying to hold its lead.”
“What they're saying: "The special thing about GPT-4o is it brings GPT-4 level intelligence to everyone, including our free users," CTO Mira Murati said during a livestream presentation.”
“She said the new GPT-4o — that's a letter "o", for "Omni" — is a "huge step forward when it comes to the ease of use" as well as speed.”
“A more major update to the underlying model — i.e. the successor to GPT-4 — is due to be unveiled later this year, Murati told Axios in an interview after the company's livestream announcement.”
“Zoom in: OpenAI showed off real-time interactions with the voice assistant in ChatGPT, including faster responses and the ability to interrupt the AI assistant.”
“In one demo, OpenAI showed one of its workers getting a real-time tutorial on taking deep breaths.”
“Another showed ChatGPT reading an AI-generated story in different voices, including super-dramatic recital, robotic tones and even singing.”
“In a third demo, a user asked ChatGPT to look at an algebra equation and help the person solve it rather than simply providing an answer.”
“In all the demos, GPT-4o showed considerably greater personality and conversational skills than it has previously had.”
“OpenAI showed the new chatbot working simultaneously across languages, in this case helping translate between English and Italian.”
“The demos highlighted ChatGPT's multimodal capabilities across visual, audio and text interactions, with the AI assistant able to use a phone's camera to read written notes and to attempt to detect the emotion of a person.”
“The big picture: The online event comes a day before Google holds its I/O developer conference, where it is expected to make its progress on generative AI a key focus.”
“OpenAI also said it was releasing a desktop version of ChatGPT, initially for Mac users, with paid users getting access starting today. OpenAI told Axios that a Windows version is also in the works, but it started with the Mac because that's where more of its users are.”
“GPT-4o greatly improves the experience in OpenAI’s AI-powered chatbot, ChatGPT. The platform has long offered a voice mode that transcribes the chatbot’s responses using a text-to-speech model, but GPT-4o supercharges this, allowing users to interact with ChatGPT more like an assistant.”
The AI Tech Wave continues its race among the leading AI companies. Tomorrow, it’s Google’s turn to turn on the AI razzle dazzle. Never a dull moment. More tomorrow. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here.)