CrossFormer: a single transformer for 20+ robot embodiments with rigorous scaling analysis

In one sentence: Berkeley and Stanford present CrossFormer, a single transformer policy trained on 900k trajectories from over 20 different robots. It transfers to new robots in minutes with minimal fine-tuning. It is described as the first cross-embodiment robot foundation model with rigorous scaling analysis.


Every type of robot is different: different arms, different sensors, different ways of moving. Traditionally, every robot requires its own policy trained from scratch. CrossFormer changes this paradigm: a single model, trained on many different robots, that works on almost all of them.

The system was trained on 900,000 trajectories collected from more than 20 different types of robots, including fixed arms, mobile robots, and a range of grippers and configurations. The model learns the common structure of physical interaction with the world, regardless of the specific robot's shape.

The most surprising result is adaptation speed: when a completely new robot is introduced, CrossFormer learns to control it after just a few minutes of fine-tuning, instead of the hours or days required by models trained from scratch.

Even more importantly, Berkeley and Stanford conducted a rigorous analysis of how performance improves as data and model size increase. This kind of "scaling analysis" is common for language models but was absent in robotics. Now we know that more data and larger models truly lead to better policies, a finding that can guide investment decisions across the entire field.
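
To make the idea of a scaling analysis concrete, here is a minimal sketch of how such a trend is often summarized: by fitting a power law, error ~ a * N^(-b), to pairs of dataset size and policy error. This is an illustration only, not the authors' method. The function name `fit_power_law`, the power-law form, and every number below are assumptions, and the data points are synthetic, generated from a known curve rather than taken from the CrossFormer results.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-b) by ordinary least squares in log-log space.

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Slope and intercept of the best-fit line log(e) = intercept + slope * log(n).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic points (NOT real results): generated from error = 5 * N**-0.3.
sizes = [1_000, 10_000, 100_000, 900_000]
errors = [5.0 * n ** -0.3 for n in sizes]

a, b = fit_power_law(sizes, errors)
print(f"fitted a = {a:.3f}, b = {b:.3f}")
```

Because the synthetic points lie exactly on the generating curve, the fit recovers its parameters. With real measurements the points would be noisy, and the fitted exponent b would indicate how quickly additional data pays off.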