AI: China's DeepSeek does more with less. RTZ #612

...a true AI research innovator with global chops

Jan 26, 2025

The Bigger Picture, Sunday, January 26 2025

Long-time readers here know I’ve written often about how China, both as a country and through its extraordinary entrepreneurs, will ‘Find a Way’ to compete globally in this AI Tech Wave, despite the US/China tech trade curbs. And that it’s in the long-term interests of the US, both as a country and for its own extraordinary entrepreneurs, to ‘thread the needle’, competing on a global playing field.

The recent global media attention around Chinese AI company DeepSeek, showing the world how it can do more in AI with less, is a vivid case in point. DeepSeek is going head to head with the best closed and open source LLM AI models globally. What’s more, they’re doing it in rapid fashion on AI Reasoning models as well, the next level on the path to AGI. This is the ‘Bigger Picture’ I’d like to unpack this Sunday.

The FT summarizes it all well in “How small Chinese AI start-up DeepSeek shocked Silicon Valley”:

“Hedge fund billionaire Liang Wenfeng builds model on tight budget despite US attempt to halt China’s high-tech ambitions.”

It goes on to explain how Chinese tech billionaires can go toe to toe with US tech billionaires on AI:

“A small Chinese artificial intelligence lab stunned the world this week by revealing the technical recipe for its cutting-edge model, turning its reclusive leader into a national hero who has defied US attempts to stop China’s high-tech ambitions.”

“DeepSeek, founded by hedge fund manager Liang Wenfeng, released its R1 model on Monday, explaining in a detailed paper how to build a large language model on a bootstrapped budget that can automatically learn and improve itself without human supervision.”

DeepSeek of course built both their first LLM AI and the AI Reasoning R1 product on ‘the Shoulders of Giants’ (OTSOG), a topic I’ve discussed as well:

“US companies including OpenAI and Google DeepMind pioneered developments in reasoning models, a relatively new field of AI research that is attempting to make models match human cognitive capabilities. In December, the San Francisco-based OpenAI released the full version of its o1 model but kept its methods secret. DeepSeek’s R1 release sparked a frenzied debate in Silicon Valley about whether better resourced US AI companies, including Meta and Anthropic, can defend their technical edge.”

DeepSeek’s accomplishments are of course bolstering the AI Space Race nature of the current global AI competition between the US and China:

“Meanwhile, Liang has become a focal point of national pride at home. This week, he was the only AI leader selected to attend a publicised meeting of entrepreneurs with the country’s second-most powerful leader, Li Qiang. The entrepreneurs were told to “concentrate efforts to break through key core technologies.” In 2021, Liang started buying thousands of Nvidia graphic processing units for his AI side project while running his quant trading fund High-Flyer. Industry insiders viewed it as the eccentric actions of a billionaire looking for a new hobby.”

And this founder/CEO also has a track record of going from one field to another:

“At High-Flyer, he built a fortune by using AI and algorithms to identify patterns that could affect stock prices. His team became adept at using Nvidia chips to make money trading stocks. In 2023, he launched DeepSeek, announcing his intention to develop human-level AI. “Liang built an exceptional infrastructure team that really understands how the chips worked,” said one founder at a rival LLM company. “He took his best people with him from the hedge fund to DeepSeek.”

And they figured out how ‘to do more with less’, especially in light of AI tech curbs from the US:

“After Washington banned Nvidia from exporting its most powerful chips to China, local AI companies have been forced to find innovative ways to maximise the computing power of a limited number of onshore chips — a problem Liang’s team already knew how to solve.”

“DeepSeek’s engineers know how to unlock the potential of these GPUs, even if they are not state of the art,” said one AI researcher close to the company. Industry insiders say DeepSeek’s singular focus on research makes it a dangerous competitor because it is willing to share its breakthroughs rather than protect them for commercial gains.”

They’ve also done this without outside capital so far:

“DeepSeek has not raised money from outside funds or made significant moves to monetise its models. “DeepSeek is run like the early days of DeepMind,” said one AI investor in Beijing. “It is purely focused on research and engineering.” Liang, who is personally involved in DeepSeek’s research, uses proceeds from his hedge fund trading to pay top salaries for the best AI talent. Along with TikTok-owner ByteDance, DeepSeek is known for giving the highest remuneration available to AI engineers in China, with staff based in offices in Hangzhou and Beijing.”

And they’re leveraging China’s Math and Quant talent to drive their versions of AI Research innovation:

“DeepSeek’s offices feel like a university campus for serious researchers,” said the business partner. “The team believes in Liang’s vision: to show the world that the Chinese can be creative and build something from zero.”

“Liang has styled DeepSeek as a uniquely “local” company, staffed with PhDs from top Chinese schools, Peking, Tsinghua and Beihang universities rather than experts from US institutions. In an interview with the domestic press last year, he said his core team “did not have people who returned from overseas. They are all local . . . We have to develop the top talent ourselves”. DeepSeek’s identity as a purely Chinese LLM company has won it plaudits at home.”

The tech specs of what they’ve done are already part of global AI tech lore:

“DeepSeek claimed it used just 2,048 Nvidia H800s and $5.6mn to train a model with 671bn parameters, a fraction of what OpenAI and Google spent to train comparably sized models. Ritwik Gupta, AI policy researcher at the University of California, Berkeley, said DeepSeek’s recent model releases demonstrate that “there is no moat when it comes to AI capabilities”.
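For readers who like to see the arithmetic, here is a minimal back-of-envelope sketch of how a figure like $5.6mn can arise. Only the 2,048 H800 count and the roughly $5.6mn total come from the FT excerpt above. The GPU-hour total and the $2 per GPU-hour rental rate are assumptions on my part, taken from the numbers DeepSeek's V3 technical report is widely cited for, and that report frames the figure as the rental cost of the final training run only, not research, hardware purchases, or salaries. The function names are hypothetical and purely illustrative.

```python
# Back-of-envelope check of the training budget quoted above.
# ASSUMPTIONS (not from the FT excerpt): GPU_HOURS and USD_PER_GPU_HOUR are
# the figures DeepSeek's V3 technical report is widely cited for.
GPU_COUNT = 2_048            # Nvidia H800s, as quoted in the FT excerpt
GPU_HOURS = 2_788_000        # assumed total H800 GPU-hours for the run
USD_PER_GPU_HOUR = 2.00      # assumed rental price per GPU-hour


def training_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Rental-style cost: total GPU-hours times the hourly rate."""
    return gpu_hours * usd_per_gpu_hour


def wall_clock_days(gpu_hours: float, gpu_count: int) -> float:
    """Days the run would take if every GPU were busy the whole time."""
    return gpu_hours / gpu_count / 24


cost = training_cost_usd(GPU_HOURS, USD_PER_GPU_HOUR)
days = wall_clock_days(GPU_HOURS, GPU_COUNT)

if __name__ == "__main__":
    print(f"Estimated cost: ${cost / 1e6:.2f}mn")
    print(f"Equivalent wall-clock time: {days:.1f} days on {GPU_COUNT:,} GPUs")
```

Under these assumptions the total lands close to the quoted $5.6mn, and the run works out to a bit under two months of cluster time. Change the hourly rate and the headline number moves in direct proportion, which is worth keeping in mind when comparing it with what US labs are reported to spend.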

And they leveraged China’s gobs of human AI/Tech talent, another key input in short supply in the US:

“The first person to train models has to expend lots of resources to get there,” he said. “But the second mover can get there cheaper and more quickly.” Gupta added that China had a much larger talent pool of systems engineers than the US who understand how to get the best use of computing resources to train and run models more cheaply.”

Again, global competition is good, and DeepSeek’s example should be an impetus AND inspiration for large and small companies in the US, and the world:

“Its US rivals are not standing still. They are building mega “clusters” of Nvidia’s next-generation Blackwell chips, creating the computing power that threatens to once again create a performance gap with Chinese rivals. This week, OpenAI said it was creating a joint venture with Japan’s SoftBank, dubbed Stargate, with plans to spend at least $100bn on AI infrastructure in the US.”

I’ve recently discussed my take on DeepSeek’s path to seemingly quick success on video here. There is also a useful CNBC video on the subject. For deeper textual dives, I would recommend these pieces here and here by Jeffrey Emanuel and Steven Sinofsky respectively. Both are worth reading in detail.

Also worth reading is the SemiAnalysis take on DeepSeek via hardware, software, and geopolitical frameworks.

This DeepSeek development is one of the best case studies of how the AI toothpaste is truly out of the global tube in this AI Tech Wave, and we can see innovations everywhere, both open and closed source, Chinese or otherwise. That is the ‘Bigger Picture’ I’d like to highlight this weekend. Stay tuned.

(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)

AI: Reset to Zero
