AI: OpenAI edges ahead with o3 AI Reasoning. RTZ # 579
...while Google edges Sora with Veo 2 AI video
This year is ending with OpenAI and Google racing a bit ahead of the pack on AI announcements heading into the New Year. In particular, both companies have amped up their offerings in AI Reasoning and text-to-video.
OpenAI announced its new o3 model for AI reasoning and a wider release for Sora, while Google showed off its experimental AI Reasoning model built off Gemini, along with its Veo 2 text-to-video offering. Early benchmarks and comparisons give Google the edge in text-to-video over Sora, while OpenAI edges ahead on AI Reasoning with o3. And these races have just begun.
The Information summarized these developments well in “OpenAI Wows the Crowd as New Scaling Law Passes Its First Test”:
“OpenAI’s next-generation reasoning model—called o3 for a funny reason—seriously impressed researchers and developers last week.”
“Specifically, o3 and o3-mini (a smaller version of the model) scored impressively on a number of extremely difficult math and coding benchmarks, beating out competitors like Anthropic and Google by a lot. It also reached human levels of performance on ARC-AGI, a benchmark meant to test how AI models handle tasks that humans excel at such as complex problem-solving and pattern recognition.”
These benchmark results are especially impressive for software developers, and suggest o3 can power mainstream AI applications and services.
And it addresses head-on the concerns over AI Scaling slowing down, a topic I discussed in two parts a few weeks ago.
“This is great news for researchers worried about the slowing improvements they’ve been seeing from pretraining, the process of initially training models on tons of data to help them make sense of the world and the relationships between different concepts. (The idea that AI models improve the more data and compute you give them during pretraining is known as the scaling laws.)”
“Reasoning models, as we noted six weeks ago, were meant to make up for the pretraining slowdown, by having models spend more computing resources “thinking,” or processing customer queries.”
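For readers who want the math behind that idea, a commonly cited form of the pretraining scaling laws is the Chinchilla-style fit sketched below (from Hoffmann et al., 2022); it is included here as background, not as anything the article or OpenAI published about o3. The point of the reasoning-model approach is that it adds a second lever, inference-time compute, which this formula does not capture.

```latex
% Chinchilla-style pretraining scaling law (Hoffmann et al., 2022):
% expected loss L as a function of model parameters N and training tokens D.
\[
  L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
\]
% E is the irreducible loss; A, B, \alpha, \beta are empirically fitted constants.
% Pretraining scaling improves L only by growing N, D, and the training
% compute C \approx 6ND that comes with them. Reasoning models such as
% o1/o3 instead spend extra compute per query at inference ("thinking")
% time -- a separate scaling axis outside this formula.
```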
Indeed, there are reasons to think this approach to AI Scaling may be the better path forward.
“Some OpenAI researchers are now suggesting that the reasoning model approach could not only be a fix for the pretraining issue; it might actually be better than the pretraining approach.”
Also notable is that this version arrives only a quarter after the release of o1 itself, hinting at the exponential pace of improvements ahead.
“More importantly, progress from o1 to o3 was only three months, which shows how fast progress will be in the new paradigm…Way faster than pretraining paradigm of new model every 1-2 years,” OpenAI researcher Jason Wei said on X.”
And OpenAI’s GPT-5, Orion, is yet to come.
“Wei may also be considering the fact that OpenAI hasn’t made use of its next big pretrained model, Orion, yet. Orion could serve as the base model for o4—or whatever OpenAI calls its next reasoning model early next year.”
“Another OpenAI researcher, John Hallman, made an even stronger statement: “When Sam and us researchers say [artificial general intelligence] is coming we aren't doing it to sell you cool aid, a $2000 subscription, or to trick you to invest in our next round. It's actually coming.”
Again, these developments are more for developers than mainstream AI users. But the latter will see the benefits soon enough.
“It’s not clear whether o3 will do much for the average ChatGPT user, for now. Reasoning models have been most helpful for those in coding, math and science fields, including researchers who work on extremely tough problems such as fusion energy. The new models might be overkill for the average person who wants to use conversational AI to draft blog posts or build customer service chatbots.”
Initially there is some sticker shock, given the variable cost of running AI reasoning on today's expensive AI data center compute infrastructure, but those costs will come down exponentially soon enough.
“Reasoning models can also be expensive. To reach human levels of performance on the ARC-AGI benchmark, researchers spent more than $1,000 per task. That’s likely why OpenAI has started charging 10 times more for ChatGPT subscriptions that use a version of its o1 reasoning model that can “think” for even longer to solve more complex problems. (We might soon see that $2,000 per month ChatGPT subscription we first wrote about in September!)”
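To put those numbers in rough perspective, here is a hypothetical back-of-the-envelope comparison using only the figures quoted above plus the $20/month baseline ChatGPT Plus price (the 10x tier, the speculative $2,000 tier, and the reported $1,000-plus per ARC-AGI task in high-compute mode). It is an illustrative sketch, not OpenAI's actual pricing math.

```python
# Back-of-the-envelope on reasoning-model economics, using only the
# figures quoted above. All values are illustrative assumptions.

PLUS_PRICE = 20                # $/month, baseline ChatGPT Plus subscription
PRO_PRICE = PLUS_PRICE * 10    # the "10 times more" tier cited above
SPECULATED_PRICE = 2_000       # the $2,000/month tier The Information floated
COST_PER_HARD_TASK = 1_000     # reported high-compute ARC-AGI cost per task

for label, monthly_price in [("10x tier", PRO_PRICE),
                             ("Speculated tier", SPECULATED_PRICE)]:
    tasks_covered = monthly_price / COST_PER_HARD_TASK
    print(f"{label}: ${monthly_price}/mo covers ~{tasks_covered:.1f} "
          f"ARC-AGI-grade tasks at today's compute cost")

# Rough takeaway: at ~$1,000 per hardest-tier task, even a $2,000/month
# subscription buys only a couple of such tasks -- which is why falling
# inference costs matter so much for mainstream use.
```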
The new o3 has not yet been released; for now it is available only to safety testers and OpenAI partners.
“In any case, everyone will have to wait at least a few weeks to get access to o3, which is currently limited to pre-approved safety testers.”
Lastly, expect Google, Anthropic and others to also make similar announcements early next year.
“In the meantime, I wouldn’t count out OpenAI’s competitors. Google also had a good week after launching new versions of its small “Flash” model and its video-generating Veo 2 AI, as well as its own reasoning model, developed in part by Google’s $3 billion AI researcher, Noam Shazeer. People posted on X pointing out how Google’s video-generating model seemed to have a better understanding of physics than OpenAI’s, perhaps a side effect of being able to train on YouTube videos. Others flagged price-sensitive developers moving in droves to the Flash model.”
Overall, I would reiterate yet again that this AI Tech Wave has barely begun, despite how AI-filled 2024 already feels in these closing weeks of the year. Next year is going to make this year look a bit slow. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here.)