AI: OpenAI leaning into AI Humanoid Robots. RTZ #581
...barely begun to capture the sensory data to train them vs. a four-year-old
Foundation LLMs are coming to Robots in this AI Tech Wave, an area I’ve discussed at length. Investments are ramping up, and China is in pole position thanks to its wide and deep tech manufacturing ecosystem. And US tech companies from Tesla to Nvidia to Google have their eyes on AI-driven robots, humanoid or not.
Now OpenAI looks like it wants back into robots as well, in addition to its robot partnerships. The Information lays it out in “OpenAI Has Discussed Making a Humanoid Robot”:
“Over the past year, OpenAI has dropped not-so-subtle hints about its revived interest in robotics: investing in startups developing hardware and software for robots such as Figure and Physical Intelligence and rebooting its internal robotics software team, which it had disbanded four years ago.”
“Now, OpenAI could be taking that interest to the next level. The company has recently considered developing a humanoid robot, according to two people with direct knowledge of the discussions.”
OpenAI is apparently interested in the humanoid versions, the same focus as Elon Musk with his Optimus line of robots.
“As a refresher, humanoid robots typically have two arms and two legs, distinguishing them from typical robots in a warehouse or factory that might have a single arm repeatedly performing the same task on an assembly line. Developers of humanoid robots think it will be easier for them to handle tasks in the physical world—which is tailored to humans—than it would be to change our physical environments to suit new robots.”
OpenAI of course has an ever-fuller plate with its other AI products and services, from AI reasoning models to agents next.
“Don’t get too excited yet, though: Any potential humanoid robot seems to be a lower priority for OpenAI than a number of its other technologies and products, such as its highly praised reasoning models and an agent that could help automate all manner of software engineering and analysis tasks, one of these people said.”
“But the fact that OpenAI is even considering developing a humanoid robot highlights its growing ambition to get into everything from search and a web browser to server chips and data center planning. Only yesterday, my colleagues wrote it was a matter of time before OpenAI (and Google) would develop humanoid robots, given Elon Musk’s comments that such products were a $1 trillion revenue opportunity.”
“It also highlights another pattern: OpenAI’s tendency to compete with some of its largest customers and partners.”
OpenAI of course is also a firm practitioner of the ‘coopetition and frenemies’ approach in the tech industry.
“OpenAI’s interest in humanoid robots could put it head-to-head with Figure and 1X Technologies, two humanoid robot startups in which OpenAI has invested. Earlier this year, OpenAI announced it would partner with Figure to provide the AI models that power the startup’s robots. Since then, Figure has released several demos highlighting the new capabilities of its robots, like being able to have full conversations with humans, thanks to OpenAI’s software. (Figure and 1X Technologies didn’t respond when we asked them how they felt about this.)”
But these are the earliest days of LLM-driven robots, with a lot left to get done.
“There’s still a lot researchers need to do to get humanoid robots working consistently, including navigating unfamiliar environments or knowing how to respond to an unexpected event, like a person throwing a shirt at the robot while it’s trying to fold laundry. Large language models play a role because they help these robots communicate with humans and give them a foundational understanding of the world and the relationship between different concepts. Multimodal models—or those that can understand and produce text, images and audio—can also give robots the ability to “see” and better understand their surroundings. And other techniques borrowed from generative AI, like diffusion, which is used to generate images, help robots overcome obstacles like bumping into walls.”
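The ‘diffusion’ technique the quote mentions is worth a concrete, if toy, illustration. Below is a minimal sketch of diffusion-style sampling adapted to produce a robot action vector instead of an image, in the spirit of so-called ‘diffusion policies.’ Everything here is an illustrative assumption on my part, not anyone’s production system: the `predict_noise` function is a stand-in for a trained neural network, and the action dimension and noise schedule are made up for the example.

```python
import numpy as np

# Toy sketch of diffusion-style action sampling ("diffusion policy" spirit).
# The noise-prediction "model" below is a placeholder for a trained network
# that would condition on camera images and proprioception.

ACTION_DIM = 7                                 # e.g. a 7-DoF arm (assumption)
STEPS = 50                                     # number of denoising steps
betas = np.linspace(1e-4, 0.02, STEPS)         # simple linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(action, t, observation):
    """Stand-in for a trained noise-prediction network."""
    rng = np.random.default_rng(t)             # deterministic placeholder
    return 0.1 * rng.standard_normal(action.shape)

def sample_action(observation):
    # Start from pure Gaussian noise and iteratively denoise into an action.
    a = np.random.standard_normal(ACTION_DIM)
    for t in reversed(range(STEPS)):
        eps = predict_noise(a, t, observation)
        # Standard DDPM-style ancestral sampling update
        a = (a - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                               # add noise except at final step
            a += np.sqrt(betas[t]) * np.random.standard_normal(ACTION_DIM)
    return a

print(sample_action(observation=None))
```

The design point the quote is gesturing at: instead of regressing a single “correct” motion, the robot samples a plausible action from a learned distribution, which in practice handles multi-modal choices (go left around the obstacle, or right) more gracefully.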
And finally, the last big issue for all US companies with ambitions to scale AI robots: China.
“One unintended side effect of OpenAI’s humanoid robot ambitions is a greater dependence on China. Like electric vehicles, China is critical to the supply chain and manufacturing of robots. But working more closely with Chinese companies likely won’t go over well with the Trump Administration. OpenAI has been trying to cozy up to the incoming administration by emphasizing building data centers and AI infrastructure in the U.S. and making sure America stays ahead of China with AI development.”
It’s important to note that all these companies know that we are barely at the beginning in building humanoid robots. The data that we have to train these robots is far scarcer than the data we’ve used to train our best LLM AIs to date.
And that’s because the visual data needed to train these robots simply isn’t present in the text corpora used to train our best AI models to date.
AI ‘Godfather’ Yann LeCun, Meta’s chief AI scientist, makes this point simply in terms of the amount of visual data a four-year-old child trains on (16,000 waking hours) versus all the text data that trains our LLM AIs. And he notes that that much video amounts to only about 30 minutes of YouTube uploads:
“I've made that point before:
- LLM: 1E13 tokens x 0.75 word/token x 2 bytes/token = 2E13 bytes.
- 4 year old child: 16k wake hours x 3600 s/hour x 1E6 optical nerve fibers x 2 eyes x 10 bytes/s = 1E15 bytes.”
“In 4 years, a child has seen 50 times more data than the biggest LLMs. 1E13 tokens is pretty much all the quality text publicly available on the Internet. It would take 170k years for a human to read (8 h/day, 250 word/minute).”
“Text is simply too low bandwidth and too scarce a modality to learn how the world works. Video is more redundant, but redundancy is precisely what you need for Self-Supervised Learning to work well. Incidentally, 16k hours of video is about 30 minutes of YouTube uploads.”
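To make the back-of-envelope math concrete, here’s a minimal Python sketch that just reproduces the arithmetic from LeCun’s post above. All the constants are his; the script only does the multiplication:

```python
# Back-of-envelope comparison of LLM training data vs. a child's visual input,
# using the constants from Yann LeCun's post quoted above.

# LLM side: ~1e13 tokens of public text, ~2 bytes per token
llm_tokens = 1e13
llm_bytes = llm_tokens * 2                         # ~2e13 bytes

# Child side: 16,000 waking hours, 2 eyes, ~1e6 optic nerve fibers each,
# each fiber carrying roughly 10 bytes/s
seconds = 16_000 * 3600                            # ~5.76e7 s awake
child_bytes = seconds * 2 * 1e6 * 10               # ~1.15e15 bytes

print(f"LLM text data:     {llm_bytes:.1e} bytes")
print(f"Child visual data: {child_bytes:.1e} bytes")
print(f"Ratio: ~{child_bytes / llm_bytes:.0f}x")   # ~58x, LeCun's "50 times"

# Time for a human to read 1e13 tokens at 250 words/min, 8 h/day:
words = llm_tokens * 0.75
years = words / 250 / 60 / 8 / 365
print(f"Reading time: ~{years:,.0f} years")        # ~171,000 years, his "170k"
```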
Hear him explain this in his own words. It’s clear that a lot of the ‘essential science’ and data to train the AIs of the future is still missing in 2024.
So these humanoid robots, by Tesla, Google, OpenAI, and dozens of companies around the world, are merely the opening salvo in video data collection in the physical world. They’re not being deployed because they do useful things yet. They’re being deployed to learn how to do useful things in the future.
Along with countless cameras and other sensors in the physical world, they will train the real physical-world LLM AI systems of the future. To do the reasoning, agentic, and eventual ‘AGI’ things we’re all craving for AI to do for us.
It’s all more than a few years away. And billions more in capex investments.
It’s logical for OpenAI to join the AI robots opportunity, even in these early days of the AI Tech Wave. But temper your expectations: mainstream AI robots may take a while. Stay tuned.
(NOTE: The discussions here are for information purposes only, and not meant as investment advice at any time. Thanks for joining us here)