One of the indelible legacies for millions of Star Trek fans since the original series in 1966 is the “Universal Translator”, accessed by Captain Kirk and others, often with the inimitable assistance of Lieutenant Commander Uhura. I daresay it’s part of the DNA of almost every techie in Silicon Valley and beyond. Alongside, of course, the iconic Star Trek Communicator, helpful with countless ‘new civilizations’ on ‘away’ missions.
These core techie ‘universal communication’ chords were struck when I read of Meta’s latest foray into open-source LLM AI releases. It’s almost easy to overlook, given Meta’s steady flow of AI activities led by Zuckerberg and company, especially after the recent release of Llama 2 and related software tools.
This particular announcement was “SeamlessM4T: the first all-in-one multimodal translation model”:
“SeamlessM4T is a foundational speech/text translation and transcription model that overcomes the limitations of previous systems with state-of-the-art results.”
“SeamlessM4T (Massive Multilingual Multimodal Machine Translation) is the first multimodal model representing a significant breakthrough in speech-to-speech and speech-to-text translation and transcription. Publicly-released under a CC BY-NC 4.0 license, the model supports nearly 100 languages for input (speech + text), 100 languages for text output and 35 languages (plus English) for speech output.”
They had me at ‘Translation’.
As a former Goldman Sachs head of Internet Equities Research in the 1990s, I’ve long followed translation software and services startups and efforts in Silicon Valley and beyond. Companies have ranged from Nuance Communications decades ago to Google of late. Nuance, one of the original speech and language software companies, is now part of Microsoft.
Today there are a range of companies large and small focused on applying the latest Generative and LLM AI technologies to voice and translation opportunities. These include everyone from the big guys like Meta, OpenAI, Google, Microsoft to newer companies like ElevenLabs, Wellsaid Labs, Lyrebird, Papercup and others.
So what caught my eye about Meta’s SeamlessM4T?
For one, their video explanation was short and to the point. It gets across how they’re applying LLM AI to language translation across dozens of languages and beyond. Worth a watch. As is the demo, for those more technically inclined.
What really caught my attention was that one of the Meta engineers involved is a native speaker, from India, of Hindi, Telugu, and English.
Sravya Popuri, Meta Research Engineer, deftly demonstrates both ‘text to text’ and ‘text to speech’ capabilities of one LLM AI model handling all three languages seamlessly, on the fly.
As someone who is myself a native speaker of Hindi, Gujarati, Urdu, and English, along with some dribs and drabs of Arabic and Farsi, I’ve long looked forward to true computer-based universal translators becoming real. My personal driver for years has been aging parents who didn’t speak English.
But this ‘Universal Translation’ problem is a global ‘Universal Opportunity’. Note that fewer than 1.4 billion of the world’s 8 billion people (< 17% in 2022) speak English, with over 5 billion souls now online. So LLM AI can contribute meaningfully to the ‘universal translation’ opportunity. A large ‘total addressable market’, or TAM, in investment vernacular. Especially as ‘AI eats Software eating the World’.
It’s one of the reasons I’ve tinkered with the ‘translation’ capabilities of Amazon Alexa, Google Assistant and Nest, and Apple Siri for the last few years. They’ve been more than a little lacking until now. Amazon Alexa, a Jeff Bezos project of immense promise, with years of effort by thousands of Amazon employees and tens of billions in investment, is still short of its aspirations and ultimate potential.
As Microsoft CEO Satya Nadella candidly said recently of these voice assistants, including Microsoft’s own Cortana, “They were all dumb as a rock”.
Speaking of Satya Nadella, it’s useful to remember how he found the conviction for the historic $13 billion-plus partnership with OpenAI and founder/CEO Sam Altman not too long ago. It was his love for the 13th-century Persian poet Rumi, as he explained to The Verge Editor-in-Chief Nilay Patel in a deeply personal take:
“I’ll never forget my first query I did on the model, which, I think for me, growing up, I always felt, if only I could read Rumi (in Farsi) translated into Urdu translated into English, that is my dream. I just put that in as one long query, and it was magical to see it generated. And I said, ‘Man, this is different.’”
Rumi speaks to us all from the 13th century in this age of AI, where our machines crawl, extract, and process every word we’ve ever uttered and recorded for what we’re deeply looking for:
The search for meaning in every language is personal. In never-ending loops. For billions.
Despite over five decades of hard work, computers have barely made a dent in bringing the world closer together with seamless universal translators, be they text to text, text to voice, or anything in between.
LLM AI can potentially turbo-charge these efforts, along of course with the voice assistants themselves in their core languages. It’s one of the reasons each of the major tech companies, with Google pacing ahead, is urgently updating those assets with LLM AI capabilities. It’s both voice and video… multimodal, in LLM AI parlance.
The goal, of course: the Star Trek Universal Translator. Just to remind all the techie AI engineers currently hard at work on language translation projects anywhere, here’s the spec sheet from the future, via ChatGPT of course:
“The universal translator is a device in the Star Trek universe that translates spoken languages in real-time. It was invented sometime before 2151 to translate any language into the user's native language. The universal translator works by scanning brain-wave frequencies to create a basis for translation.”
“The universal translator was first used in the late 22nd century on Earth to translate well-known Earth languages. It was built into the communications systems of most starships. In 2257, the universal translator was programmed with over 1,000 languages.”
“The universal translator is the reason why aliens in Star Trek speak in English.”
No pressure. Elon Musk can even leverage Neuralink with X, SpaceX, Starlink, and xAI in the years ahead.
Just pull it forward starting in this AI Tech Wave, and ‘Make it So’.
Especially since Meta is now in the universal translator game as well, led by founder/CEO Mark Zuckerberg, and committed to doing it all in an open-source way. It joins Google, Microsoft, Amazon, Apple and many others in big tech already hard at work on the task. As Meta Chief AI Scientist Yann LeCun summarized Meta’s latest effort succinctly on X/Twitter:
“SeamlessM4T: Massive Multilingual Multimodal Machine Translation.”
“Language translation + speech recognition + speech synthesis in a single model: speech-to-speech, text-to-text, speech-to-text and text-to-speech.”
“Works for 100 languages.”
“Code available under CC-BY-NC license.”
“From Meta - Fundamental AI Research”
Links to the blog post, demo, paper, and code are provided as well.
As John McClane would say in Die Hard, “Welcome to the party, pal”!
Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)