Google DeepMind has unveiled artificial intelligence models for robotics that it hailed as a milestone in the long quest to make general-purpose machines more useful and practical in the everyday world.
The company’s new robotics models, called Gemini Robotics and Gemini Robotics-ER, are designed to help robots adapt to complex environments by taking advantage of the reasoning capabilities of large language models to complete complicated real-world tasks.
According to Google DeepMind, a robot trained using its new models was able to fold an origami fox, organise a desk according to verbal instructions, wrap headphone wires and slam dunk a miniature basketball through a hoop. The company is also partnering with start-up Apptronik to build humanoid robots using this technology.
The development comes as tech groups, including Tesla and OpenAI, and start-ups are racing to build the AI “brain” that can autonomously operate robots, in moves that could transform a range of industries, from manufacturing to healthcare.
Jensen Huang, chief executive of chipmaker Nvidia, said this year that the use of generative AI to deploy robots at scale represents a multitrillion-dollar opportunity that will “pave the way to the largest technology industry the world has ever seen”.
Progress in advanced robotics has been painstakingly slow in recent decades, with scientists manually coding each move a robot makes. Thanks to new AI techniques, scientists have been able to train robots to adapt better to their surroundings and learn new skills much faster.
“Gemini Robotics is twice as general as our previous best models, really making a significant leap towards general purpose robots,” said Kanishka Rao, principal software engineer at Google DeepMind.
To create the Gemini Robotics model, Google used its Gemini 2.0 language model and trained it specifically to control robots. This gave robots a boost in performance across three capabilities: adapting to new situations, responding quickly to verbal instructions or changes in their environment, and manipulating objects with dexterity.
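To make that mechanism concrete, the toy sketch below illustrates the general vision-language-action pattern such systems follow: camera images and a verbal instruction go into the model, and low-level motor commands come out, re-queried at a fixed rate so the robot can react to changes mid-task. Every name in it (DummyRobot, predict_action and so on) is a hypothetical placeholder for illustration, not Google’s actual API.

```python
import time

class DummyRobot:
    """Toy stand-in for real hardware so the loop below actually runs."""
    def __init__(self, steps=3):
        self.steps = steps
    def task_done(self):
        return self.steps <= 0
    def get_camera_frame(self):
        return "frame"  # a real system would return camera pixels
    def apply_action(self, action):
        self.steps -= 1
        print("applying:", action)

def predict_action(image, instruction):
    """Placeholder for the model call: image plus words in, motor command out."""
    return {"instruction": instruction, "move": "small-step"}

def control_loop(robot, instruction, hz=10):
    """Re-query the model at a fixed rate so the robot reacts to changes."""
    while not robot.task_done():
        frame = robot.get_camera_frame()
        action = predict_action(frame, instruction)
        robot.apply_action(action)
        time.sleep(1.0 / hz)

control_loop(DummyRobot(), "tidy the desk")
```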
Such adaptability would be a boon for those developing the technology, as one big obstacle for robots is that they tend to perform well in laboratories but poorly in less tightly controlled settings.
To develop Gemini Robotics, Google DeepMind took advantage of the broad understanding of the world exhibited by large language models that are trained on data from the internet. For example, a robot was able to reason that it should grab a coffee cup using two fingers.
“This is certainly an exciting development in the field of robotics that seems to build on Google’s strengths in very large-scale data and computation,” said Ken Goldberg, a robotics professor at the University of California, Berkeley, who was not part of the research.
He added that one of the most novel aspects of these new robotics models is that they run smoothly in the cloud, presumably because they can take advantage of Google’s access to very large language models that require substantial computing power.
“This is an impressively comprehensive effort with convincing results ranging from spatial reasoning to dexterous manipulation. It’s pretty compelling evidence that stronger base [vision-language] models can lead to better manipulation performance,” said Russ Tedrake, a professor at the Massachusetts Institute of Technology and the vice-president of robotics research at the Toyota Research Institute.
“Gemini is an important step,” said Goldberg. However, “much remains to be done before general-purpose robots are ready for adoption”.