As I’ve outlined in my year-end piece for 2024, this year is supposed to be when multimodal LLM AI starts to go mainstream in these early days of the AI Tech Wave. That means AI applications and services that users can interact with beyond just text prompts. So text to voice (and vice versa), text to code, text to photos and text to video are expected to be the next areas to capture lightning in a bottle, as OpenAI’s ChatGPT did in November 2022.
We’ve already seen successful companies like Midjourney emerge in text to images. Next up is text to video, with contenders ranging from private startups like Runway, Pika Labs and others to some of the ‘Magnificent 7’ companies like Google, Microsoft and others. Well, OpenAI is not letting grass grow under its feet, despite its corporate governance drama last November.
This week saw the launch of its text to video service Sora, which is causing a bit of a stir amongst early viewers. Co-founder and CEO Sam Altman commented on Sora on social media as well. As Axios reports:
“OpenAI caused a stir on Thursday with its unveiling of Sora, its first tool that can turn a text prompt into a video of up to one minute in length.
“Why it matters: While others, including Meta, Google and Runway, have their own text-to-video engines, the realism shown in sample videos elicited a powerful range of emotions.”
“Details: Sora is a diffusion model that is able to "generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background," per OpenAI.”
“Sora will be able to understand the nuances of the prompt as well as how various objects behave in the physical world.”
“Sora also generates an entire video at once, rather than creating it frame by frame. That helps avoid what has been a continuity challenge with other approaches — ensuring a subject stays the same even when it goes out of view temporarily.”
“Between the lines: An OpenAI spokesperson stressed that it doesn't plan to make Sora broadly available any time soon as it continues to work on a range of safety issues, including efforts to reduce misinformation, hateful content and bias — and also to clearly label the output as generated by AI.”
With videos of up to a minute generated from text prompts, we’re still a long way from feature-length movies. But this is certainly an important step in that direction over time. Reactions range from fear to excitement, as Axios also highlights:
“What they're saying: Reactions to Sora represented a Rorschach test for how people already view the impact of generative AI, with a mix of excitement, awe, fear and revulsion.”
Even though the service hasn’t been widely released, the reactions from mainstream outlets are notable. Time magazine notes:
“Some users marveled at the superiority of Sora’s videos, noting the pace of AI progress in less than one year. Runway CEO and co-founder Cristóbal Valenzuela posted “game on” on X in response to OpenAI’s announcement.”
MIT Technology Review goes into a bit more technical detail behind Sora, while The Atlantic commented on the broader context:
“The excitement from the press has been reminiscent of the buzz surrounding the image creator DALL-E or ChatGPT in 2022: Sora is described as ‘eye-popping,’ ‘world-changing,’ and ‘breathtaking, yet terrifying.’”
The various examples in the pieces above are all worth a read and a watch.
Overall, this continues to be a positive step toward adding features and capabilities to AI that connect with users better than text alone. A picture, as they say, is worth a thousand words, and moving pictures are likely to add a few more zeroes to that count.
And it’s fitting that it’s OpenAI kicking the ball forward, from text to text and now to text to video. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)