For every kid who’s grown up over the last few decades, and every ‘Techie’, billionaire or wanna-be, it’s always been a given that robots would one day try to emulate humans. From Asimov’s ‘I, Robot’, to Star Trek’s ‘Data’, to of course ‘The Terminator’, robots were always going to be like the best and worst of us when they grew up.
And we’ve also known that the most recent attempts to make that vision real have a long way to go on the AI roadmap, be it Elon’s dancing human in a robot suit, the quasi-working ‘Optimus’ prototypes, SoftBank Masa-san’s robot aspirations, or of course Boston Dynamics’ ‘dance at the drop of a hat for social media virality’ pseudo-robots. (Boston Dynamics, a pioneering US robot company previously owned by Google and then SoftBank, is now owned by Hyundai.)
So at the dawn of LLMs and Generative AI, hopes abound that maybe this budding approach to AI points the way, inching us further along toward that ‘Robots R Us’ future. And the poster child company for this, of course, is Google: Google, with its DeepMind AI subsidiary, its longtime efforts on smart robots, and now its tinkering with RT-2. As the NY Times explains:
“Google has recently begun plugging state-of-the-art language models into its robots, giving them the equivalent of artificial brains. The secretive project has made the robots far smarter and given them new powers of understanding and problem-solving.
I got a glimpse of that progress during a private demonstration of Google’s latest robotics model, called RT-2. The model, which is being unveiled on Friday, amounts to a first step toward what Google executives described as a major leap in the way robots are built and programmed.”
“We’ve had to reconsider our entire research program as a result of this change,” said Vincent Vanhoucke, Google DeepMind’s head of robotics. “A lot of the things that we were working on before have been entirely invalidated.”
We’re at the ‘beginning of the beginning’ of this experimentation:
“Robots still fall short of human-level dexterity and fail at some basic tasks, but Google’s use of A.I. language models to give robots new skills of reasoning and improvisation represents a promising breakthrough, said Ken Goldberg, a robotics professor at the University of California, Berkeley.”
They go on:
“To understand the magnitude of this, it helps to know a little about how robots have conventionally been built.
For years, the way engineers at Google and other companies trained robots to do a mechanical task — flipping a burger, for example — was by programming them with a specific list of instructions. (Lower the spatula 6.5 inches, slide it forward until it encounters resistance, raise it 4.2 inches, rotate it 180 degrees, and so on.) Robots would then practice the task again and again, with engineers tweaking the instructions each time until they got it right.”
“This approach worked for certain, limited uses. But training robots this way is slow and labor-intensive. It requires collecting lots of data from real-world tests. And if you wanted to teach a robot to do something new — to flip a pancake instead of a burger, say — you usually had to reprogram it from scratch.”
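To make the tedium concrete, here’s a rough Python sketch of what one of those hand-coded routines looks like. The ArmCommand structure, function names and numbers are illustrative only, not Google’s actual code:

```python
# A minimal sketch of the old, hard-coded approach: every motion is a fixed,
# hand-tuned instruction, and a new task means writing a new list from scratch.
from dataclasses import dataclass

@dataclass
class ArmCommand:
    """One explicit, pre-programmed motion for the robot arm (hypothetical)."""
    action: str      # e.g. "lower", "slide", "raise", "rotate"
    amount: float    # inches or degrees, hand-tuned by engineers

FLIP_BURGER = [
    ArmCommand("lower", 6.5),     # lower the spatula 6.5 inches
    ArmCommand("slide", 3.0),     # slide forward (until resistance, approximated here)
    ArmCommand("raise", 4.2),     # raise it 4.2 inches
    ArmCommand("rotate", 180.0),  # rotate 180 degrees
]

def run(sequence: list[ArmCommand]) -> None:
    """Replay the fixed instruction list, step by step."""
    for cmd in sequence:
        # A real robot would send motor commands here; we just print them.
        print(f"{cmd.action} by {cmd.amount}")

run(FLIP_BURGER)
# Want pancakes instead of burgers? You write a whole new FLIP_PANCAKE list by hand.
```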
The definition of Tedious. Then, ‘Eureka’:
“In recent years, researchers at Google had an idea. What if, instead of being programmed for specific tasks one by one, robots could use an A.I. language model — one that had been trained on vast swaths of internet text — to learn new skills for themselves?
“We started playing with these language models around two years ago, and then we realized that they have a lot of knowledge in them,” said Karol Hausman, a Google research scientist. “So we started connecting them to robots.”
“Google’s first attempt to join language models and physical robots was a research project called PaLM-SayCan, which was revealed last year. It drew some attention, but its usefulness was limited. The robots lacked the ability to interpret images — a crucial skill, if you want them to be able to navigate the world. They could write out step-by-step instructions for different tasks, but they couldn’t turn those steps into actions.”
Then, the breakthrough:
“Google’s new robotics model, RT-2, can do just that. It’s what the company calls a “vision-language-action” model, or an A.I. system that has the ability not just to see and analyze the world around it, but to tell a robot how to move.
It does so by translating the robot’s movements into a series of numbers — a process called tokenizing — and incorporating those tokens into the same training data as the language model.”
“Eventually, just as ChatGPT or Bard learns to guess what words should come next in a poem or a history essay, RT-2 can learn to guess how a robot’s arm should move to pick up a ball or throw an empty soda can into the recycling bin.”
“In other words, this model can learn to speak robot,” Mr. Hausman said.”
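If ‘speaking robot’ sounds abstract, here’s a toy Python sketch of what tokenizing a robot’s movements could look like, with each motion value squashed into an integer ‘token’. The binning scheme and numbers are my own illustration, not the actual RT-2 recipe:

```python
# Toy sketch: turn a continuous robot action into a string of discrete tokens,
# so a language model can predict motions the same way it predicts words.
# Assumption: each action dimension is normalized to [-1, 1] and split into 256 bins.

def tokenize_action(action: list[float], bins: int = 256) -> str:
    """Map each value in [-1, 1] to an integer bin and emit space-separated tokens."""
    tokens = []
    for value in action:
        clipped = max(-1.0, min(1.0, value))
        bin_index = int((clipped + 1.0) / 2.0 * (bins - 1))  # 0 .. 255
        tokens.append(str(bin_index))
    return " ".join(tokens)

# Example action: [dx, dy, dz, droll, dpitch, dyaw, gripper]
action = [0.1, -0.4, 0.0, 0.02, -0.1, 0.3, 1.0]
print(tokenize_action(action))  # -> "140 76 127 130 114 165 255"
```

Once the arm’s motions live in the same token vocabulary as words, the same training recipe that teaches a model to finish a sentence can teach it to finish a movement, which is exactly the trick the Times is describing.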
The piece then goes on to illustrate RT-2 in all its glory, complete with diagrams, photos and videos. You can go and check it out. And of course, the Google ‘mad scientists’ are doing everything to make sure these things are ‘Safe’ for humans, while they go on tinkering, until we figure out ‘The Three Laws of Robotics’, or something along those lines.
It’s also worth perusing the Google DeepMind paper where they explain their ‘Eureka’ moment of fusing Generative AI into robots. In case you were wondering about the name, RT-2 stands for Robotic Transformer 2. The paper goes into the details of their ‘novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalized instructions for robotic control’. Not as cool a name as R2-D2, but still very cool for 2023.
But for now, it’s cool that LLM AI and robots are being put together like peanut butter and jelly. A delicious step forward. Stay tuned.