SayCan: grounding LLMs in robot affordances

In one sentence: Google Robotics shows how to combine an LLM for high-level planning with robot value functions that filter out everything except physically executable actions.

A robot that understands natural language can receive commands like "bring me a snack that isn't too sugary", but does it know how to carry them out physically? SayCan addresses this mismatch with a two-part architecture.

The language model proposes a list of candidate actions ("pick up the apple", "open the fridge", "bring the water bottle") and scores how well each one serves the instruction. A value function trained on the real robot assigns each action a probability of physical success in the current environment.

The final score for each action is the product of the language-model score and the value-function score: the robot picks the action with the highest combined score, one that is both linguistically sensible and physically feasible. The tests were run on a mobile robot in a real kitchen across 551 instruction variants.
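
The selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `select_action`, the dictionary layout, and all numeric scores are hypothetical, chosen only to show how multiplying the two scores changes which action wins.

```python
from typing import Dict


def select_action(llm_scores: Dict[str, float],
                  affordances: Dict[str, float]) -> str:
    """Return the action with the highest combined score.

    llm_scores:  how relevant the language model judges each action (assumed in [0, 1]).
    affordances: value-function estimate that the action can succeed right now.
    An action missing from `affordances` is treated as infeasible (score 0).
    """
    combined = {
        action: llm_scores[action] * affordances.get(action, 0.0)
        for action in llm_scores
    }
    return max(combined, key=combined.get)


# Hypothetical scores, for illustration only.
llm_scores = {
    "pick up the apple": 0.5,
    "open the fridge": 0.3,
    "bring the water bottle": 0.2,
}
affordances = {
    "pick up the apple": 0.2,   # e.g. no apple within reach
    "open the fridge": 0.9,
    "bring the water bottle": 0.6,
}

best = select_action(llm_scores, affordances)
print(best)  # open the fridge
```

In this made-up case the language model alone would prefer "pick up the apple", but its low feasibility score pulls the product below that of "open the fridge", which is the behaviour the combined score is meant to produce.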