AI: AI Inference chips in the Cloud, 'On-Prem', & Local Edge. RTZ #355
...the next AI chip gold rush beyond Training
Hundreds of billions are being committed by Big Tech and investors at the ‘beginning of the beginning’ of this AI Tech Wave. For now, AI GPU chip hardware, data centers, and power (Boxes 1 through 3 below), as well as Data (Box 4), drive this wave to build the applications and services upstream in Box 6. And investors everywhere are scouring the landscape for the next area of focus in AI semiconductors in Box 1, wherever that capital eventually goes to work.
That next area, of course, is semiconductor chips for AI INFERENCE, the lower loops in the chart above. These chips are generally more power- and cost-efficient than AI TRAINING GPU chips like the top-of-the-line Nvidia Hopper H100 AI GPUs (and soon Blackwell B100s) that train LLM AI models in AI data centers everywhere, running at $30,000 or more a clip and ramping up customers’ capex budgets into the billions.
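To see how a $30,000 chip turns into billions of capex, here is a hypothetical back-of-envelope sketch. The cluster size below is an illustrative assumption, not a figure from this post:

```python
# Hypothetical back-of-envelope: illustrative numbers only.
# At ~$30,000 per H100-class GPU, a 100,000-GPU training cluster is ~$3B
# in chips alone, before networking, data center build-out, and power.
gpu_unit_cost = 30_000      # USD, per the ~$30K figure above
cluster_size = 100_000      # hypothetical GPU count for a frontier-scale cluster
print(f"GPU capex alone: ${gpu_unit_cost * cluster_size / 1e9:.1f}B")
```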
But with actual LLM AI use going from hundreds of millions to billions of users soon, the rush is on to build less expensive and far more power-efficient AI chips for Inference calculations: the ‘reinforcement learning loops’ that drive the actual ‘startling’ AI results for end users everywhere.
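For readers who want the mechanics, here is a minimal, illustrative PyTorch sketch (a toy model, not any production system) of why an inference pass is computationally lighter than a training step: training runs a forward pass, a backward pass for gradients, and an optimizer update, and must hold activations and optimizer state in memory; inference is forward-only.

```python
import torch
import torch.nn as nn

# Toy model standing in for a much larger LLM
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 512)                 # dummy input batch
labels = torch.randint(0, 10, (32,))     # dummy targets

# --- Training step: forward + backward + weight update ---
model.train()
optimizer.zero_grad()
loss = loss_fn(model(x), labels)
loss.backward()       # extra compute and memory for gradients
optimizer.step()      # plus optimizer state (AdamW keeps two moments per weight)

# --- Inference step: forward only, no gradient bookkeeping ---
model.eval()
with torch.no_grad():
    predictions = model(x).argmax(dim=-1)
```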
And the companies that can design and deploy these inference AI chips at scale are the usual suspects in tech: Nvidia, Arm, AMD, Qualcomm, Intel, Apple and many others. These chips are deployed end to end, from Cloud to Local. And local can mean on the premises (aka ‘On-Prem’) of customers and businesses, and of course at the Edge in their devices like PCs, laptops, and smartphones, closer to the end customers and the applications and services they use in Box 6 above. Billions of chips for AI training and inference ‘reinforcement learning loops’, end to end, cloud to edge, from Box 1 to Box 6 (a sketch of that cloud-to-edge handoff follows below).
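As a minimal sketch of that ‘cloud to edge’ handoff, assuming the same toy model and a hypothetical file name: once a model is trained in the cloud, it is typically exported to a portable format such as ONNX so local inference runtimes and device NPUs can run it.

```python
import torch
import torch.nn as nn

# Toy stand-in for a cloud-trained model, put into inference mode
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
example_input = torch.randn(1, 512)

# Export to ONNX, a portable format consumed by edge inference runtimes
# (ONNX Runtime, vendor NPU SDKs, etc.). File name is a hypothetical placeholder.
torch.onnx.export(
    model,
    example_input,
    "edge_model.onnx",
    input_names=["features"],
    output_names=["logits"],
)
```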
I’ve already discussed Apple recently, with their Apple Silicon chips powering their two billion plus devices with AI-optimized silicon, and most recently their ACDC Apple chip project for data-center-driven AI inference applications.
Qualcomm is also racing ahead with their next-generation Snapdragon X Elite chips, which will see deployment in the next round of AI PCs everywhere, and are expected to be showcased at the upcoming Computex PC trade show in Taiwan in early June.
And of course Arm, the company that provides the chip design templates for Nvidia, Apple, Qualcomm and others, is also racing to provide next generation AI inference chip designs.
As Nikkei Asia reports in ‘Softbank’s Arm plans to launch AI chips in 2025’:
“SoftBank Group subsidiary Arm will foray into the development of artificial intelligence chips, seeking to launch the first products next year.
“The move is part of SoftBank Group CEO Masayoshi Son's 10 trillion yen ($64 billion) push to transform the group into a sprawling AI powerhouse.”
“U.K.-based Arm will set up an AI chip division, aiming to build a prototype by spring 2025. Mass production, to be handled by contract manufacturers, is expected to start in the fall of that year.”
“Arm already supplies circuit designs called architecture to Nvidia and other chip developers. The company holds an over 90% share in architecture for processors used in smartphones.”
“Arm, in which SoftBank owns a 90% stake, will shoulder initial development costs, expected to reach hundreds of billions of yen, with SoftBank also contributing. Once a mass-production system is established, the AI chip business could be spun off and placed under SoftBank.”
“SoftBank is already negotiating with Taiwan Semiconductor Manufacturing Corp. and others over manufacturing, looking to secure production capacity.”
As AI applications and services scale into the billions of users, driven by both ‘Big and Small AI’ innovations, AI inference chips will play a critical role not just in AI cloud data centers, but increasingly in local ‘On-Prem’ computers, PCs, smartphones and myriad other devices.
That’s the next race that’s on for AI chips.
TSMC of course, with a global foundry (‘Fab’) share of over 60%, along with Samsung (13%) and soon Intel, is in the race. As are their customers Nvidia, Apple, AMD, Qualcomm and many others. Nvidia holds a 70%-plus share in AI ‘Accelerated Computing’ in cloud data centers, with others furiously trying to catch up and compete. And Nvidia’s chips are already seeing over 40% of their usage go to AI Inference as opposed to training.
And those numbers are only going up. As of course will the AI training chip numbers, due to ongoing, exponential AI Scaling improvements in LLM AI models being trained by OpenAI and many others at a rapid pace. Announcements are coming fast and furious, this week being no exception.
So it’s now AI Inference chips as well as Training chips. In cloud data centers, ‘on-prem’, and at the local edge. End to end. For billions of users and applications. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)