Robots That Reason: Google’s Gemini Robotics 1.5 Raises the Bar

Google has introduced Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, new AI models built to power robots that can think, plan, and act responsibly. The release marks what the company calls an “era of physical agents,” where machines move beyond reacting to commands and start reasoning about their environment.

According to a Google DeepMind announcement, the Gemini Robotics models can tackle complex, multi-step tasks by combining vision, language, and reasoning to deliver more general-purpose intelligence in robotics.

The push for general-purpose intelligent robots

Google said the release follows earlier efforts to extend Gemini’s multimodal intelligence into robotics, marking what it called “another step toward advancing intelligent, truly general-purpose robots.” The company positioned the launch as part of a broader effort to equip machines with the autonomy needed to operate in complex real-world settings.

The two models divide the work between high-level planning and direct action, offering complementary capabilities designed to make robots more versatile and adaptable in real-world environments.

Gemini Robotics 1.5

Google describes Gemini Robotics 1.5 as its most capable vision-language-action model, built to help robots think before moving, rather than simply follow instructions. Instead of directly translating a command into motion, the AI model generates a reasoning process in natural language, allowing it to map out each step and make its actions more transparent.

That approach means a robot can handle semantically complex requests, like sorting laundry or organizing items, by breaking them into manageable steps and deciding the best way to carry them out. It also allows the model to adjust mid-task if the environment changes or a user redirects it.
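To make the pattern concrete, here is a minimal sketch in Python of that plan-then-act loop. Everything in it (observe, plan_steps, execute) is a hypothetical stand-in, not Google’s published interface; it only illustrates the shape of reason-first, act-second control.

```python
# A minimal, illustrative sketch of the plan-then-act loop described above.
# All names here are hypothetical stand-ins, not Google's Gemini Robotics API.

def observe() -> str:
    """Stub: return a text description of the current scene."""
    return "pile of laundry: 2 white shirts, 1 dark sock"

def plan_steps(instruction: str, scene: str) -> list[str]:
    """Stub: a real system would ask the model to reason in natural
    language and return an ordered list of sub-steps."""
    return [
        "place white shirt in whites bin",
        "place second white shirt in whites bin",
        "place dark sock in darks bin",
    ]

def execute(step: str) -> None:
    """Stub: a real system would translate one sub-step into motion."""
    print("executing:", step)

def run_task(instruction: str) -> None:
    scene = observe()
    steps = plan_steps(instruction, scene)  # reason first: write out the plan
    while steps:
        execute(steps.pop(0))
        scene = observe()
        # Re-plan mid-task if the scene no longer matches the plan,
        # or if the user redirects the robot.
        if "unexpected" in scene:
            steps = plan_steps(instruction, scene)

run_task("sort the laundry into whites and darks")
```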

Key strengths include:

Multi-level reasoning: The ability to explain and refine actions before execution.

Interactivity: Responding to everyday language and clarifying its approach while working.

Dexterity: Performing tasks that demand fine motor control, such as folding paper or packing a lunch box.

Gemini Robotics 1.5 can also learn across different embodiments, transferring behaviors from one robot form to another, whether a stationary bi-arm platform or a humanoid machine.

Gemini Robotics-ER 1.5

Gemini Robotics-ER 1.5, on the other hand, is designed to think ahead. Google calls it a state-of-the-art embodied reasoning model, essentially a brain that orchestrates a robot’s activities and breaks down broad instructions into detailed plans.

Instead of simply reacting to a command like “clean the kitchen,” ER 1.5 can map the task into steps, such as clearing counters, loading dishes, and wiping surfaces, and then instruct other systems to carry them out. It communicates in natural language, estimates progress, and can even call tools like Google Search to fill in missing knowledge.
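Google has said ER 1.5 is available to developers in preview through the Gemini API. As an illustration, the snippet below uses the google-genai Python SDK to ask for such a decomposition; the preview model ID is taken from Google’s announcement and should be treated as an assumption that may change.

```python
# A minimal sketch using Google's google-genai Python SDK. The model ID
# below is the preview name from Google's announcement; treat it as an
# assumption that may change.
from google import genai

client = genai.Client()  # expects an API key (e.g. GOOGLE_API_KEY) in the env
response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",
    contents=(
        "You orchestrate a home robot. Break the instruction "
        "'clean the kitchen' into a short, ordered list of sub-tasks."
    ),
)
print(response.text)
```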

Its advances include:

Orchestration: Coordinating complex tasks by planning and assigning actions.

Spatial and temporal reasoning: Understanding environments in detail and grasping cause-and-effect as tasks unfold.

Benchmark performance: Achieving top-tier results across 15 embodied reasoning tests, from pointing accuracy to video question answering.

Google says ER 1.5 is the strategic layer of the system, providing the reasoning and foresight that make physical robots more adaptable and reliable in unpredictable real-world settings.

The planning brain and the acting hands

Google developed the two models to function in tandem, with Gemini Robotics-ER 1.5 handling the big-picture planning, and Gemini Robotics 1.5 carrying out the physical steps. The company says this setup allows robots to take a single instruction, break it into smaller goals, and then execute them in sequence.

For example, ER 1.5 might map out how to tidy a room, while Robotics 1.5 translates those plans into specific movements, such as picking up objects or opening containers. According to Google, ER 1.5 can direct tasks as a high-level brain, while Robotics 1.5 can function as the hands and eyes to complete them.
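Here is a rough sketch of that division of labor, with hypothetical Planner and Actor classes standing in for ER 1.5 and Robotics 1.5; Google has not published this exact interface.

```python
# An illustrative sketch of the two-model split: a high-level planner
# (standing in for Gemini Robotics-ER 1.5) decomposes an instruction into
# subgoals, and a low-level actor (standing in for Gemini Robotics 1.5)
# turns each subgoal into motion. These classes are hypothetical.

class Planner:
    def plan(self, instruction: str) -> list[str]:
        # A real planner would query the embodied reasoning model here.
        return ["clear the desk", "shelve the books", "empty the bin"]

class Actor:
    def act(self, subgoal: str) -> None:
        # A real actor would stream motor commands for this subgoal.
        print("performing:", subgoal)

def tidy_room(instruction: str) -> None:
    planner, actor = Planner(), Actor()
    subgoals = planner.plan(instruction)
    for done, subgoal in enumerate(subgoals, start=1):
        actor.act(subgoal)
        # The planner layer also tracks and reports overall progress.
        print(f"progress: {done}/{len(subgoals)} subgoals complete")

tidy_room("tidy the room")
```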

[Diagram: Gemini Robotics-ER 1.5 and Gemini Robotics 1.5 working together to perform complex tasks in the physical world. Source: Google DeepMind]

Solving AGI in the physical world

Google cast Gemini Robotics 1.5 as a milestone toward solving artificial general intelligence (AGI) in the physical world, shifting robots from command-followers to systems that can reason, plan, and act with dexterity.

Safety remains a core part of that vision. Google said the models are aligned with its AI principles, equipped with semantic reasoning to assess risks before acting, and backed by updated ASIMOV benchmarks to test responses in safety scenarios.

Google’s robotics push signals a future in which advanced machines step out of research labs and into the fabric of ordinary living.

