AI: Google swings well at AI Bat. RTZ #357
...stage set for multimodal AI to go mainstream to billions over the next 18 months
Google took its swing at bat on AI announcements yesterday at its Google I/O 2024 Developer Conference, and played a sweepingly good game. Almost an AI tour de force.
Lots of details to absorb here, but after Meta’s Meta AI and open-sourced Llama 3 announcements a few weeks ago, OpenAI’s announcements on Monday, and Apple next up at bat with its AI announcements at WWDC 2024 on June 10, here’s the big takeaway for regular folk at this early point in the AI Tech Wave:
A year and a half after OpenAI’s ‘ChatGPT Moment’ in November 2022, and after hundreds of billions in capex investment in next-gen LLM AI software and AI data center ‘accelerated computing’ hardware by big tech’s ‘Magnificent 7’ and many others, we’re finally set for billions of end users to actually use ‘multimodal’ AI capabilities across many devices over the next eighteen months. Both ‘Big and Small AI’, both in the cloud and locally ‘at the Edge’.
And most of it will be free, as the companies race to get users to try these capabilities, create new habits, and make them part of their daily routines and lives, BEFORE figuring out how to price them, whether directly via subscriptions or, of course, via advertising and transactions.
That’s the big takeaway to absorb from these and the other AI announcements to come in the coming weeks. This stuff is going to be available at scale for regular people to try, in ways that go far beyond just typing ‘prompts’ into a text chatbot. We’re going to be able to leverage our three-plus billion smartphones, point them at things, talk to them while taking pictures and videos in real time, and ask them questions in the language of our choice. And get answers in very human-sounding voices that engage us with their intonations and on-the-fly responsiveness, much as humans do.
Yes, it’ll be increasingly tough to tell the real from the fake, but we humans are adaptable. We’ll figure out how to do it at scale. And the companies will help as well, as Google pointed out with its AI safety efforts around ‘SynthID’, a digital watermarking technology that marks photos, videos and more with tags that identify them as AI-generated. And Google is open-sourcing it to make it ubiquitous industry-wide.
So a lot to absorb here, both in terms of Google’s announcements yesterday and their broader implications. On the former, here are pithy summaries by The Information, TechCrunch, Ben’s Bites, The Verge, and Axios, just to pick a few. As Ben’s Bites outlines:
“GOOGLE I/O 2024. Huff!! There’s too much to cover but let’s start with the key themes:”
“Google is integrating AI into all of its ecosystem. In true Google fashion, many features are “coming later this year”. If they ship and perform like the demos, Google will get a serious upper hand over OpenAI/Microsoft.”
“All of the AI features across Google products will be powered by Gemini 1.5 Pro. It’s Google’s best model and one of the top models. A new Gemini 1.5 Flash model is also launched, which is faster and much cheaper.”
“Google has ambitious projects in the pipeline. Those include a real-time voice assistant called Astra, a long-form video generator called Veo, plans for end-to-end agents, virtual AI teammates and more.”
“A key thing to remember about Gemini models is that all the models in this family are multimodal natively. They can take any form: text, images, audio, or video as input and create any form of output. Ask Photos and Audio outputs in Notebook LM are good examples of the power of multimodality.”
“Also, the 1.5 series had a context window of 1M tokens (2M is in preview now). That means you can add a bunch of (very) long files in any media form and these models should work.”
“OpenAI’s GPT-4o is their first such model. Previously they used to stitch different models to make multimodality work.”
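To make that ‘natively multimodal’ point a bit more concrete, here is a rough sketch of what a single multimodal request to Gemini 1.5 looks like, assuming Google’s google-generativeai Python SDK. The model name, file name, and call details below are illustrative assumptions rather than a definitive recipe; check Google’s current docs before relying on them.

```python
# A rough sketch of a single multimodal Gemini 1.5 call via Google's
# google-generativeai Python SDK (pip install google-generativeai pillow).
# Model name and exact signatures are assumptions based on the announcements.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key, not a real credential

# Gemini 1.5 Flash: the faster, cheaper sibling of 1.5 Pro mentioned above.
model = genai.GenerativeModel("gemini-1.5-flash")

# Multimodal input: an image plus a text question in one request.
# The same call pattern accepts long documents, audio, or video, which is
# where the 1M+ token context window comes into play.
photo = Image.open("storefront.jpg")  # hypothetical local photo

response = model.generate_content(
    [photo, "What is this store selling? Translate any signage to English."]
)
print(response.text)
```

The point for regular folks: one prompt can now mix a picture and a question, and the answer comes back as plain language.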
The buzzwords regular folks should keep in mind: bigger token ‘memory’ windows, more multimodal capabilities, and agents.
Those are the key new capabilities that Google, OpenAI and others are racing to add: natural multimodal capabilities (images, videos and more beyond just text input); bigger ‘token’ windows that let these AI systems absorb larger amounts of user information and remember more; better-integrated memory, so that these systems remember more about you as a person; natural-sounding voices that engage users while understanding text, images, videos and more; AND more ‘smart agent’ and ‘agentic’ services over time.
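And on the ‘agentic’ part, here is a minimal sketch of how that pattern works, assuming the same hypothetical SDK and model name as the sketch above. The calendar ‘tool’ is an invented stand-in, not a real Google service; the idea is simply that the model can decide to call a function, read its result, and answer in natural language.

```python
# A minimal sketch of an "agentic" pattern: the model is handed a plain Python
# function as a tool and decides when to call it to answer a request.
# SDK usage and model name are assumptions; check_calendar is a fake tool.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

def check_calendar(date: str) -> list[str]:
    """Return the (fake) appointments for a given date, e.g. '2024-06-10'."""
    return ["9:00 team standup", "13:00 dentist"]

# The model can plan a call to check_calendar, receive the result, and then
# compose a natural-language answer: a toy version of an AI "teammate".
model = genai.GenerativeModel("gemini-1.5-pro", tools=[check_calendar])
chat = model.start_chat(enable_automatic_function_calling=True)

reply = chat.send_message("Am I free for lunch on June 10th?")
print(reply.text)
```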
All this and more should make these systems a lot more relatable to regular folks amid all this ‘AI fuss’. The software and the hardware are catching up at scale. The next eighteen months and more are going to be the mainstream ‘proof in the pudding’ on whether regular folk like the AI pudding. The mother of all capital investment bets in tech rides on this question.
Google did its best at bat yesterday. And its strategy is also well grounded on the Google Cloud enterprise side versus Amazon AWS, Microsoft Azure, Oracle Cloud and others.
OpenAI, Google, Meta, Nvidia, Microsoft, Amazon, Apple, Tesla, and many others are going to do their best with their best. The pace of invention and innovation is going to be relentless. Good times indeed to be regular folk. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)