AI: Lessons from Amazon Alexa LLM AI retrofit. RTZ # 389
...multi-billion dollar LLM AI makeovers are proving hard for incumbent Big Tech ML Voice Assistants
The Bigger Picture, Sunday, June 15, 2024
It’s time to try “Hello Computer” of Star Trek fame again. Voice Assistants, AI Version 2.0.
For the AI ‘Bigger Picture’ today, I’d like to discuss the challenges of ‘grafting’ new LLM AI capabilities onto existing Voice Assistants like Apple Siri, Amazon Alexa, Google Assistant, and others.
This is timely, right after we’ve seen what Apple has planned for its early-to-market, but now lagging, Siri service: both on its own, in partnership with OpenAI, and potentially with others. And it will all take a bit longer than we’d like.
As I’ve discussed in these pages before, for billions of mainstream users worldwide, the next eighteen-plus months are going to see AI-driven ‘Voice Assistance’ augmenting the text-driven ‘Chatbot’ AI assistance that has been all the rage for the last eighteen months, thanks to OpenAI’s ChatGPT.
New multimodal AI Voice capabilities on everything from OpenAI’s market-leading GPT-4o (Omni), to Anthropic’s Claude, to Google’s Gemini, and many others will be eager to answer user prompts and queries in voices as natural and engaging as possible. And they’ll be ‘Agentic’ to boot, offering personalized ‘smart agent’ ‘reasoning’ AI assistance along the way.
But the key for all of the pure multimodal LLM AI chatbots will be DISTRIBUTION. The next question is how to connect them to the hundreds of millions of mainstream users already on the previous generation of ‘Voice Assistants’, deployed in vast numbers with tens of billions in investment by Amazon for Alexa/Echo, Apple for Siri/HomePods, Google for Google Assistant/Nest, and many others.
Those were all built using older ML (machine learning) technologies, the ‘old AI’ tech before the current LLM (Large Language Model)/Generative AI of OpenAI ChatGPT fame.
Amazon alone has over half a billion Alexa units in homes worldwide. Apple’s existing Siri service sees over a billion and a half user queries a day from Apple users worldwide, via iPhones and other Apple devices like HomePods, CarPlay, and Apple wearables.
And that’s where Apple’s plans for Siri, augmented with ‘Apple Intelligence’ and, when necessary, ‘external’ help from OpenAI’s GPT-4o (Omni), announced at this week’s ‘Dub-Dub’ (Apple WWDC 2024 Developer Conference), show the way. But the strategies for all its peers will have many differences as well as similarities, with open questions about how mainstream audiences will take to them, if at all.
First, it’s not easy to just perform an LLM/Generative AI ‘brain transplant’ on existing Voice Assistants. As the Verge explains in “Amazon is reportedly way behind on its new Alexa”:
“According to Fortune, the more conversational, smarter voice assistant Amazon demoed last year isn’t close to ready and may never be.”
“In the voice assistant arms race, the frontrunner may be about to finish last. On the heels of Apple revealing a new “Apple Intelligence”-powered Siri at its WWDC 2024 conference, a new report from Fortune indicates that Amazon’s Alexa — arguably the most capable of the current voice assistants — is struggling with its own generative AI makeover:”
“... none of the sources Fortune spoke with believe Alexa is close to accomplishing Amazon’s mission of being “the world’s best personal assistant,” let alone Amazon founder Jeff Bezos’ vision of creating a real-life version of the helpful Star Trek computer. Instead, Amazon’s Alexa runs the risk of becoming a digital relic with a cautionary tale— that of a potentially game-changing technology that got stuck playing the wrong game.”
As Fortune itself explains in its detailed “How Amazon blew Alexa’s shot to dominate AI”:
“The new Alexa LLM, the company said, would soon be available as a free preview on Alexa-powered devices in the US. Rohit Prasad, Amazon’s SVP and Alexa leader said the news marked a “massive transformation of the assistant we love,” and called the new Alexa a “super agent.” It was clear the company wanted to refute perceptions that the existing Alexa lacked smarts. (Microsoft CEO Satya Nadella reportedly called it “dumb as a rock” in March 2023 as OpenAI’s ChatGPT rocketed to fame).”
“But after the event, there was radio silence—or digital assistant silence, as the case may be. The traditional Alexa voice never changed on the half-a-billion devices that have been sold globally, and little news emerged over the coming months about the new generative AI Alexa, other than recent reports about a potential launch later this year that could include a subscription charge.”
“The reason, according to interviews with more than a dozen former employees who worked on AI for Alexa, is an organization beset by structural dysfunction and technological challenges that have repeatedly delayed shipment of the new generative AI-powered Alexa. Overall, the former employees paint a picture of a company desperately behind its Big Tech rivals Google (GOOG), Microsoft (MSFT), and Meta (META) in the race to launch AI chatbots and agents, and floundering in its efforts to catch up.”
“The September 2023 demo, the former employees emphasize, was just that—a demo. The new Alexa was not ready for a prime time rollout, and still isn’t. The Alexa large language model (LLM), that sits at the heart of the new Alexa, and which Amazon positioned as taking on OpenAI’s ChatGPT, is, according to former employees, far from state-of-the-art. Research scientists who worked on the LLM said Amazon does not have enough data or access to the specialized computer chips needed to run LLMs to compete with rival efforts at companies like OpenAI.”
“Amazon has also, former employees say, repeatedly deprioritized the new Alexa in favor of building generative AI for Amazon’s cloud computing unit, AWS. And while Amazon has built a partnership and invested $4 billion in AI startup Anthropic, whose LLM model Claude is considered competitive with OpenAI’s models, it has been unable to capitalize on that relationship to build a better Alexa. Privacy concerns have kept Alexa’s teams from using Anthropic’s Claude model, former employees say—but so too have Amazon’s ego-driven internal politics.”
The whole piece is worth reading in full. It has all the makings of a classic HBS (Harvard Business School) case study on the technical, business, and managerial difficulties of building new technologies on top of old. Maybe even a TV show or movie. After all, the original Alexa was championed by Jeff Bezos himself.
And the lessons are very applicable to all of Amazon’s peers, including Apple, Google, Meta and others.
Apple itself has wisely said that the new LLM AI Siri features will likely not be available on traditional Siri devices like HomePods, Apple TVs, and presumably CarPlay and other Apple Siri surfaces. Some of that is due to those devices not having the local AI processing hardware for the new AI services.
Then there’s the difficulty of integrating third-party partnerships into these product ecosystems, across both the new and the old AI/ML technologies, as the Fortune piece illustrates with Amazon Alexa’s travails.
Not to mention the unique additional challenge of securing the immense, ongoing feeds of data these new AI Voice Assistants need. Especially the highly personal and contextual data that also demands the highest adherence to the Trust and Privacy standards expected by users and regulators. It’s something Apple has an edge on for now, as I’ve previously outlined.
The hard challenges of developing and integrating two distinctly different code bases, one built on new, probabilistic LLM AI technologies and the other on traditional ML (machine learning) technologies with largely ‘deterministic’ code paths, are real. So is running the new models locally on devices, which requires upgraded AI GPU and neural-processing chips.
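To make that split concrete, here is a minimal, hypothetical sketch (not Amazon’s or Apple’s actual architecture) of how a hybrid assistant might route each utterance: try the legacy deterministic intent pipeline first, and fall back to a probabilistic LLM call only when the old path isn’t confident. All names here (LegacyIntentEngine, llm_complete, the confidence threshold) are illustrative assumptions, not real APIs.

```python
# Hypothetical sketch of hybrid routing between a legacy deterministic
# intent pipeline and a newer probabilistic LLM back end.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IntentMatch:
    name: str          # e.g. "set_timer"
    confidence: float  # 0.0 - 1.0 from the legacy ML classifier
    handler: Callable[[str], str]


class LegacyIntentEngine:
    """Stand-in for the old deterministic intent/slot pipeline."""

    def match(self, utterance: str) -> Optional[IntentMatch]:
        # Real systems use trained intent classifiers plus slot fillers;
        # here we fake a single high-confidence rule for illustration.
        if "timer" in utterance.lower():
            return IntentMatch("set_timer", 0.95,
                               lambda u: "Timer set for 10 minutes.")
        return None


def llm_complete(utterance: str) -> str:
    """Placeholder for a call to a hosted LLM (OpenAI, Anthropic, etc.)."""
    return f"(LLM answer to: {utterance!r})"


def route(utterance: str, legacy: LegacyIntentEngine,
          threshold: float = 0.8) -> str:
    """Prefer the deterministic path when it is confident; otherwise fall
    back to the probabilistic LLM path for open-ended conversation."""
    match = legacy.match(utterance)
    if match and match.confidence >= threshold:
        return match.handler(utterance)
    return llm_complete(utterance)


if __name__ == "__main__":
    engine = LegacyIntentEngine()
    print(route("Set a timer for ten minutes", engine))  # deterministic path
    print(route("Why is the sky blue?", engine))          # LLM fallback path
```

The point of the sketch is the design choice, not the code itself: the deterministic path stays authoritative for the commands it already handles well, while the LLM absorbs only the open-ended, conversational long tail. Keeping those two worlds coherent across hundreds of millions of devices is exactly the integration problem the Fortune piece describes.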
It’s not just about the large amounts of capital invested, but more about the business and managerial challenges around the ecosystems involved, both first- and third-party. And that’s not something easily understood by mainstream audiences being promised exciting new ‘AI’ capabilities on their existing Voice Assistants.
That’s why a paced, go-slower approach like the one Apple announced this week likely makes sense, even though many would like the new AI to replace the old AI/ML faster. It’ll all likely get sorted out and distributed at scale. But the journey from here to there is likely to be a bumpy one.
That’s the Bigger Picture to keep in mind for now. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)