Reading path
Robotics engineer in the Physical AI era
Foundation models for robots, VLA, Pi0, Figure, Gemini Robotics: the milestones of embodied AI.
You are a robotics engineer or embodied AI researcher who wants to understand how foundation models are radically changing the design of robotic systems: from manual reward engineering to generalist policies trained on heterogeneous data. This path follows the releases that have shifted the boundary between simulation and real-world deployment.
- 01
Why it matters to you
MuJoCo becomes free: the reference physics simulator opens to the entire community, accelerating research on control policies and reinforcement learning for robots.
Medium · Robotics · DeepMind acquires MuJoCo and makes it free
DeepMind announces it has acquired MuJoCo, a physics simulator widely used in RL and robotics research, and commits to making it free for everyone. This was a first step toward the full open-source release in 2022.
- 02
Why it matters to you
Codex shows that transformers trained on code generalize beyond text: the conceptual proof that foundation models can learn behaviors from unstructured data — the premise behind VLAs.
High · AI Coding · Codex paper: OpenAI publishes HumanEval and the model behind Copilot
OpenAI releases the paper "Evaluating Large Language Models Trained on Code", which describes Codex (the model powering GitHub Copilot) and introduces HumanEval, which became a standard benchmark for code generation.
- 03
Why it matters to you
Figure 01 demonstrates a humanoid robot that reasons and plans actions using an LLM in a closed loop: the first convincing deployment of language as a planning layer on real hardware.
High · Robotics · Figure 01 + OpenAI: first end-to-end LLM-driven humanoid demo
Figure publishes a video of its Figure 01 humanoid conversing, recognizing objects, and manipulating them using OpenAI models for language and vision, in an end-to-end pipeline.
- 04
Why it matters to you
Physical Intelligence's π0 is the first true foundation model for generalist robots: a pre-trained cross-embodiment policy that adapts to different tasks with minimal fine-tuning.
High · Robotics · Physical Intelligence's π0: the first cross-embodiment robotic foundation model
Startup Physical Intelligence (co-founders include Karol Hausman and Sergey Levine) releases π0, a 3B-parameter generalist robotic foundation model trained on 10k+ hours of cross-embodiment data, capable of skills such as folding laundry and making coffee.
- 05
Why it matters to you
Figure's Helix introduces an end-to-end VLA (Vision-Language-Action) model on a humanoid: it shows that aligning perception, language, and action scales to complex bodies in unstructured environments.
High · Robotics · Figure Helix: first generalist VLA driving a full-body humanoid
Figure announces Helix, a proprietary Vision-Language-Action model that controls the Figure 02 humanoid at 200 Hz, fingers included, and lets two robots collaborate on a task. Demos: folding laundry and tidying a kitchen from language instructions alone.
- 06
Why it matters to you
π0.5 extends generalization to real domestic scenes across different morphologies: the signal that robotic foundation models are leaving the lab and moving toward in-the-wild deployment.
High · Robotics · Physical Intelligence π0.5: first policy that generalizes to new homes
Physical Intelligence publishes π0.5, an evolution of the π0 VLA. New: zero-shot deployment in homes never seen during training (cleaning unknown kitchens, putting groceries away).
- 07
Why it matters to you
Gemini Robotics integrates Google's multimodal model directly into the control loop: the architecture that unifies visual perception, natural language and motor action in a single model.
High · Robotics · Gemini Robotics: DeepMind brings foundation models into the physical world
Google DeepMind updates Gemini Robotics and Gemini Robotics-ER: generalist models built on a Gemini 2 base that drive industrial arms and humanoids (Apptronik Apollo) zero-shot on previously unseen tasks.