AI: Music, the really emotional AI Modality. RTZ #327

...Udio, Suno and other 'magical' music AIs show their stuff

Apr 15, 2024

As the saying goes, “Music is the strongest form of Magic”.

And AI has been likened to ‘magic’ in so many of its modalities in the current AI Tech Wave: text, images, video, code, and voice. It is now seeing some entrepreneurial and technical innovation around music as well. And music is arguably the most universally emotional form of human communication.

And here too, seemingly overnight again, the state of the art in ‘multimodal’ AI-driven music has come down to a contest between two bicoastal music startups: Udio and Suno, West Coast and East Coast. Echoes of the Tupac vs. Notorious B.I.G./Puff Daddy hip-hop wars of the 1990s.

Udio and Suno are both venture-backed AI phenoms, currently blowing the minds of millions of musicians and music lovers worldwide. As Rolling Stone notes in “AI-Music Arms Race: Meet Udio, the Other ChatGPT for Music”, after a previous piece on rival Suno:

“JUST LAST YEAR, many experts believed an AI model capable of generating complete, high-fidelity songs from text prompts wouldn’t arrive anytime soon, but now, an arms race is on between competing music-making models that do just that. Suno’s v3 model, released to the public just weeks ago, was a remarkable breakthrough, particularly in realistic, human-sounding vocals — and today, a formidable new competitor arrives via the just-launched startup Udio. The two companies’ output seems closely comparable, though some early users have suggested that on average, Udio’s output may sound crisper than Suno’s, with less of the sonic fuzziness that can betray tracks’ machine-created origins.”

Udio is a great collection of ex-Google AI folks, backed by a ‘who’s who’ of venture investors:

“Udio’s product came together remarkably quickly after its founding last December by four former employees of Google’s AI-research wing, DeepMind — David Ding, Conor Durkan, Charlie Nash, and Yaroslav Ganin — along with Andrew Sanchez. They’re backed by a range of tech heavyweights, including a16z (a.k.a. Andreessen Horowitz) and Instagram co-founder and CTO Mike Krieger. “We were very well supported from the day we took investment,” says Sanchez. “So the technical co-founders were sort of able to hit the ground running because we could get that all going pretty quickly.”

Rival Suno also has a good East Coast team of AI experts and venture money, as Rolling Stone notes:

“Over the past year alone, generative AI has made major strides in producing credible text, images (via services like Midjourney), and even video, particularly with OpenAI’s new Sora tool. But audio, and music in particular, has lagged. Suno appears to be cracking the code to AI music, and its founders’ ambitions are nearly limitless — they imagine a world of wildly democratized music making. The most vocal of the co-founders, Mikey Shulman, a boyishly charming, backpack-toting 37-year-old with a Harvard Ph.D. in physics, envisions a billion people worldwide paying 10 bucks a month to create songs with Suno. The fact that music listeners so vastly outnumber music-makers at the moment is “so lopsided,” he argues, seeing Suno as poised to fix that perceived imbalance.”

“Suno is barely two years old. Co-founders Shulman, Freyberg, Georg Kucsko, and Martin Camacho, all machine-learning experts, worked together until 2022 at another Cambridge company, Kensho Technologies, which focused on finding AI solutions to complex business problems.”

“Suno uses the same general approach as large language models like ChatGPT, which break down human language into discrete segments known as tokens, absorb its millions of usages, styles, and structures, and then reconstruct it on demand. But audio, particularly music, is almost unfathomably more complex, which is why, just last year, AI-music experts told Rolling Stone that a service as capable as Suno’s might take years to arrive. “Audio is not a discrete thing like words,” Shulman says. “It’s a wave. It’s a continuous signal.” High-quality audio’s sampling rate is generally 44khz or 48khz, which means “48,000 tokens a second,” he adds. “That’s a big problem, right? And so you need to figure out how to kind of smoosh that down to something more reasonable.” How, though? “A lot of work, a lot of heuristics, a lot of other kinds of tricks and models and stuff like that. I don’t think we’re anywhere close to done.” Eventually, Suno wants to find alternatives to the text-to-music interface, adding more advanced and intuitive inputs — generating songs based on users’ own singing is one idea.”
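
To put Shulman’s “48,000 tokens a second” in perspective, here is a quick back-of-the-envelope sketch in Python. It simply multiplies sample rate by song length, treating each raw audio sample as one token. The 3-minute song length and the 50-tokens-per-second ‘smooshed’ rate are my own illustrative assumptions, not figures from Suno or Rolling Stone.

```python
# Back-of-the-envelope: how many "tokens" would one song need if every
# raw audio sample were treated as a token?

SONG_SECONDS = 180  # assumption: a typical 3-minute song

# Common high-quality sampling rates, in samples per second.
SAMPLE_RATES_HZ = {
    "44.1 kHz (CD quality)": 44_100,
    "48 kHz": 48_000,
}

# Hypothetical compressed token rate, for illustration only.
# This is NOT a figure disclosed by Suno or any other company.
HYPOTHETICAL_COMPRESSED_TOKENS_PER_SECOND = 50


def tokens_for_song(tokens_per_second: int, seconds: int = SONG_SECONDS) -> int:
    """Total tokens needed to represent a song of the given length."""
    return tokens_per_second * seconds


def compression_factor(raw_rate: int, compressed_rate: int) -> float:
    """How many times smaller the compressed token stream is."""
    return raw_rate / compressed_rate


for label, rate in SAMPLE_RATES_HZ.items():
    print(f"{label}: {tokens_for_song(rate):,} raw tokens per song")

compressed = tokens_for_song(HYPOTHETICAL_COMPRESSED_TOKENS_PER_SECOND)
factor = compression_factor(48_000, HYPOTHETICAL_COMPRESSED_TOKENS_PER_SECOND)
print(f"Hypothetical compressed stream: {compressed:,} tokens ({factor:.0f}x smaller)")
```

At 48,000 samples a second, a single 3-minute song works out to over 8.6 million raw tokens, which gives a sense of why the ‘smooshing’ step matters so much.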

“OpenAI faces multiple lawsuits over ChatGPT’s use of books, news articles, and other copyrighted material in its vast corpus of training data. Suno’s founders decline to reveal details of just what data they’re shoveling into their own model, other than the fact that its ability to generate convincing human vocals comes in part because it’s learning from recordings of speech, in addition to music. “Naked speech will help you learn the characteristics of human voice that are difficult,” Shulman says.”

“One of Suno’s earliest investors is Antonio Rodriguez, a partner at the venture-capital firm Matrix. Rodriguez had only funded one previous music venture, the music-categorization firm EchoNest, which was purchased by Spotify to fuel its algorithm. With Suno, Rodriguez got involved before it was even clear what the product would be. “I backed the team,” says Rodriguez, who exudes the confidence of a man who’s made more than his share of successful bets. “I’d known the team, and I’d especially known Mikey, and so I would have backed him to do almost anything that was legal. He’s that creative.”

I highlight these two contrasting companies and their AI music creations to underline the exponential innovation going on in this corner of ‘multimodal AI’, beyond the far more publicized text, image, video and code modalities. And of course there are others on their heels, like YC grad Sonauto. As Y Combinator President and CEO Garry Tan explains, Sonauto takes a different technical AI approach than Udio and Suno to its music magic.

Music does reach mainstream hearts and minds far quicker than most of the other modalities.

And the current possibilities from these two companies are but a sampling of competing efforts by others big and small. Examples include Google DeepMind’s Lyria, OpenAI’s Jukebox, and many others. All of them underscore how music is also likely to see a historic transition from Scarcity to Abundance on the production side, with unexpected but likely net societal benefits on the consumption side globally.

We’re about to see a lot more ‘magical music’ in our lives, created in the same OTSOG (On the Shoulders of Giants, Go-Getter ‘Creators’ and Grunts) driven way as other forms of human creativity.

In the meantime, the debates will accelerate and rage on, with flashes of Napster 2.0 and other copyright/IP legal holy wars to come. For now, just catch a glimpse of what’s possible with Udio, Suno and the like. And bask in the magical AI music to come. Stay tuned.

(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)

AI: Reset to Zero
