AImpact
High Robotics · 1 min read

RT-2: the robot that reasons with a language model

In one sentence: DeepMind's RT-2 merges vision-language pretraining with robot control, transferring semantic reasoning from the web to a physical arm without separate training for each new task.

Verified Official source

RT-2 is the successor to RT-1 with one fundamental difference: the base model is trained not only on robot data, but also on billions of web images and text. This means the robot "already knows" many things about the world before touching an object.

The practical result is striking: if you ask the robot to "pick up the object used to cut fruit," it does so correctly even without seeing that phrase during training. The language model's semantic reasoning transfers to physical control.

It's like taking a GPT-style model and teaching it to move hands: language becomes the bridge between world knowledge and physical action.
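To make the "bridge" concrete: the RT-2 paper describes writing robot actions as short strings of integer tokens, with each action dimension discretized into 256 bins, so the model can emit an action the same way it emits text. The sketch below is only an illustration of that idea, not DeepMind's code; the function names, the normalized [-1, 1] range, and the space-separated string format are assumptions made for the example.

```python
from typing import List, Sequence

NUM_BINS = 256  # bins per action dimension, as described in the RT-2 paper


def encode_action(values: Sequence[float], low: float = -1.0, high: float = 1.0) -> str:
    """Turn continuous action values into a string of bin indices.

    The [low, high] range and the output format are assumptions for this sketch.
    """
    tokens = []
    for v in values:
        clipped = min(max(v, low), high)  # keep the value inside the range
        idx = round((clipped - low) / (high - low) * (NUM_BINS - 1))
        tokens.append(str(idx))
    return " ".join(tokens)


def decode_action(text: str, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Turn a string of bin indices back into approximate continuous values."""
    return [low + int(tok) / (NUM_BINS - 1) * (high - low) for tok in text.split()]


if __name__ == "__main__":
    # Hypothetical 3-value action, e.g. a small end-effector displacement.
    action = [0.10, -0.25, 0.80]
    as_text = encode_action(action)
    print(as_text)
    print(decode_action(as_text))
```

Because the action is just text, a model that already predicts words can be fine-tuned to predict these strings as well, which is the sense in which language connects world knowledge to motion. The round trip is lossy: each decoded value is only accurate to within about half a bin width.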

Companies

DeepMind, Google

Tools

RT-2, PaLI-X, PaLM-E

Tags

DeepMind, RT-2, VLA, Vision-Language-Action, Embodied AI, Robotics Transformer

Sources