It’s early days in the AI Tech Wave, and the big Foundation LLM AI companies are ramping up their next-generation models in earnest: OpenAI’s long-awaited GPT-5, Google’s Gemini Pro, Ultra and beyond, Meta’s successors to Llama 2, the third iteration of Claude from Anthropic in partnership with Amazon AWS and Google, and more. The race to ‘Zig’ vs ‘Zag’ in LLM AI is on.
All of these companies are in the process of rolling out more capable, multimodal versions of their models this year. The science and the scaling laws driving these LLMs are now well understood. For most of them it’s a straightforward matter of adding huge dollops of GPU compute in the form of data centers, power, and chips, funded by ever-increasing capital budgets measured in the tens of billions, driven by heaps of super-smart AI research talent, and letting the LLM scaling laws take care of the rest.
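For the curious, here is a rough sense of what those scaling laws say. The ‘Chinchilla’ fit from Hoffmann et al. (2022) predicts pretraining loss from parameter count and training tokens, which is why ‘just add more compute and data’ has worked so far. The sketch below is illustrative only: it uses the paper’s published constants (approximate) and the common C ≈ 6ND rule of thumb for training compute, not any particular company’s internal numbers.

```python
# Illustrative sketch of the Chinchilla-style scaling law (Hoffmann et al., 2022):
# predicted pretraining loss as a function of parameter count N and training tokens D.
# The constants below are the paper's fitted values, rounded.

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss L(N, D) = E + A / N^alpha + B / D^beta."""
    E, A, B = 1.69, 406.4, 410.7       # irreducible loss and fitted coefficients
    alpha, beta = 0.34, 0.28           # fitted exponents for parameters and data
    return E + A / n_params**alpha + B / n_tokens**beta

def training_flops(n_params: float, n_tokens: float) -> float:
    """Common rule of thumb: training compute C ~= 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens

# Example: a 70B-parameter model trained on 1.4T tokens (roughly Chinchilla-scale).
N, D = 70e9, 1.4e12
print(f"predicted loss ~ {chinchilla_loss(N, D):.2f}")
print(f"training compute ~ {training_flops(N, D):.2e} FLOPs")
```

The takeaway from the curve is the point the incumbents are betting on: loss keeps falling as N and D grow, but only at a power-law crawl, so each increment of quality costs an outsized increment of compute.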
But there’s a new generation of LLM AI startups that think there are better, or at least different, ways to get at the same AI opportunity. As The Information relates in “A New Wave of Foundation Model Developers is Coming”:
“New kinds of AI models are next: those that go beyond the basic transformer-based LLMs that companies like OpenAI and Anthropic are building.”
“For instance, Physical Intelligence, a startup founded by ex-Googlers and Stanford and Berkeley professors that’s building a foundation model for robots, raised $70 million from Thrive Capital, OpenAI and others. Or Sakana AI, a Japanese company bringing the idea of evolution to LLM development, which landed $30 million from Lux Capital and Khosla Ventures. And Symbolica, which is building AI models that require less compute and training data.”
“But there are even more startups that have quietly raised funding in recent months or are currently in conversations with investors. These include Cartesia, a startup developing more efficient state-space models (more on those here) that raised funding from Index Ventures at a $100 million post-investment valuation, according to two people with direct knowledge of the deal. (And, we’ve heard from one of those people that more investors have come in after the fact at an even higher price.)”
“There’s also Contextual AI, a startup founded by one of the original researchers behind retrieval augmented generation (more here), which is raising a round led by Greycroft at a $600 million post-investment valuation, according to a person with direct knowledge of the deal.”
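A quick aside on the ‘state-space models’ Cartesia is built around: they swap the transformer’s attention for a linear recurrence over a hidden state (the S4/Mamba lineage), which keeps per-step memory constant on long sequences. The sketch below is a minimal, generic version of that recurrence, not Cartesia’s actual architecture; the matrices and sizes are purely illustrative.

```python
import numpy as np

# Minimal discrete linear state-space layer: x_{t+1} = A x_t + B u_t, y_t = C x_t.
# This is the generic recurrence behind S4/Mamba-style models; real systems add
# learned discretization, structured A matrices, and selective gating.

def ssm_scan(A: np.ndarray, B: np.ndarray, C: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Run the recurrence over a 1-D input sequence u of shape (T,); return outputs (T,)."""
    x = np.zeros(A.shape[0])            # hidden state carried across time steps
    ys = []
    for u_t in u:                       # O(T) sequential scan, constant memory per step
        x = A @ x + B.flatten() * u_t   # update hidden state with the current input
        ys.append((C @ x).item())       # read out a scalar output
    return np.array(ys)

# Toy example with a small, roughly stable state matrix and a short input sequence.
rng = np.random.default_rng(0)
N = 8                                   # illustrative state size
A = 0.9 * np.eye(N) + 0.01 * rng.standard_normal((N, N))
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
u = rng.standard_normal(32)             # sequence of 32 scalar inputs
print(ssm_scan(A, B, C, u)[:4])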
I’ve talked about companies like Sakana, which is trying a different approach to LLM AIs in Japan. And there is a whole suite of others coming on strong as well, often leveraging the best of what’s come before. The driver for a ‘better LLM mousetrap’ is of course to see if there is a way to reach the same objectives as the big players with different approaches.
An instructive example of zagging while others are zigging is Symbolica, founded by an ex-Tesla AI engineer. As TechCrunch explains:
“In February, Demis Hassabis, the CEO of Google’s DeepMind AI research lab, warned that throwing increasing amounts of compute at the types of AI algorithms in wide use today could lead to diminishing returns. Getting to the “next level” of AI, as it were, Hassabis said, will instead require fundamental research breakthroughs that yield viable alternatives to today’s entrenched approaches.”
Ex-Tesla engineer George Morgan agrees. So he founded a startup, Symbolica AI, to do just that.
“Traditional deep learning and generative language models require unimaginable scale, time and energy to produce useful outcomes,” Morgan told TechCrunch. “By building [novel] models, Symbolica can accomplish greater accuracy with lower data requirements, lower training time, lower cost and with provably correct structured outputs.”
“Morgan dropped out of college at Rochester to join Tesla, where he worked on the team developing Autopilot, Tesla’s suite of advanced driver-assistance features.”
“While at Tesla, Morgan says that he came to realize that current AI methods — most of which revolved around scaling up compute — wouldn’t be sustainable over the long term.”
“Current methods only have one dial to turn: increase scale and hope for emergent behavior,” Morgan said. “However, scaling requires more compute, more memory, more money to train and more data. But eventually, [this] doesn’t get you significantly better performance.”
“Morgan isn’t the only one to reach that conclusion.”
“In a memo this year, two executives at TSMC, the semiconductor fabricator, said that, if the AI trend continues at its current pace, the industry will need a 1-trillion-transistor chip — a chip containing 10x as many transistors as the average chip today — within a decade.”
“It’s unclear whether that’s technologically feasible.”
Thus the quest for a better LLM mousetrap, and the efforts to zig rather than zag. In these early days of the AI Tech Wave, there are going to be other approaches to the same opportunity. Remember how, in the mid-1990s, after Yahoo! became the leading internet ‘search’ portal by 1996 and dozens of search technology startups tried their hand at doing it better, a little company called Google came along with a better mousetrap in 1998, and by 2004 had changed the Search game totally and never looked back.
LLM AI efforts both big and small are seeing the same kind of Cambrian explosion of experiments. Many are now zigging while others zag. It’s early days indeed. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)