Robotics foundation model: a new step toward the "GPT of manipulation"
A robotics lab (Physical Intelligence or peer) publishes a new multi-embodiment foundation model for general manipulation, trained on cross-robot datasets.
Category
44 entries
A robotics lab (Physical Intelligence or peer) publishes a new multi-embodiment foundation model for general manipulation, trained on cross-robot datasets.
Google DeepMind updates Gemini Robotics and Gemini Robotics-ER: generalist VLAs built on Gemini 2.0 that drive industrial arms and humanoids (Apptronik Apollo) zero-shot on never-seen tasks.
1X (Norway/US, OpenAI-backed) opens Neo Home preorders at $20,000 to buy or $499/month by subscription. The bipedal home robot has a soft cover and is partially controlled by human teleoperators for complex tasks. Shipping is planned for 2026.
DeepMind demonstrates zero-shot generalization of diffusion policies on deformable objects like clothes and dishes, tasks where robots had systematically failed until now.
Hugging Face launches LeRobot: an open-source ML library for robotics with standardized datasets, ACT and Diffusion Policy training, and an Aloha-compatible hardware kit for $100.
Berkeley and Stanford present CrossFormer, a single transformer policy trained on 900k trajectories from over 20 different robots. It transfers to new robots in minutes with minimal fine-tuning. It is among the first cross-embodiment robot foundation models published with a rigorous scaling analysis.
NVIDIA updates GR00T to N1.5 with an industrial synthetic data pipeline, unified training for 10+ robot platforms, and availability on Isaac Lab as an open framework.
Physical Intelligence publishes π0.5, an evolution of the π0 VLA. New: zero-shot deployment in homes never seen during training (cleaning unknown kitchens, putting groceries away).
Figure announces Helix, a proprietary Vision-Language-Action model that controls the Figure 02 humanoid, fingers included, at 200 Hz and can run two robots working in collaboration. Demos: fold laundry and tidy a kitchen from language alone.
Google DeepMind and Stanford release ALOHA 2, an enhanced open-hardware version of the low-cost teleoperated bimanual system used to collect ACT and Diffusion Policy datasets for tasks like cooking and surgery.
Stanford presents HumanPlus, which maps third-person human demonstrations to whole-body robot actions with 40% success on novel tasks. No teleoperation, no robot-specific data collection — just watching humans.
Unitree launches the G1 dual-arm version: 3kg payload per arm, $16,000 price, imitation learning from human demos, available for research.
NVIDIA launches GR00T, a foundation model for humanoids trained on synthetic and human data, released with the Isaac Sim ecosystem for photorealistic simulation and robot training.
Startup Physical Intelligence (co-founders include Karol Hausman and Sergey Levine) releases π0, a 3B-parameter generalist robotic foundation model trained on 10k+ hours of cross-embodiment data, capable of skills like laundry folding and making coffee.
1X Technologies presents an end-to-end world model for humanoid robot EVE: it predicts future video frames from current observations and actions, trained purely on robot data. It enables real-time planning without external compute, a key step toward autonomous household robots.
Figure AI launches Figure 02 with native OpenAI model integration: the robot demonstrates contextual reasoning in an industrial kitchen and responds to questions about its environment.
NVIDIA and UT Austin present DrEureka, which uses GPT-4 to automatically generate domain randomization parameters for sim-to-real transfer. Locomotion and dexterity policies transfer zero-shot to real hardware without manual calibration.
OpenAI advances robotic dexterity research with new results on reduced sim-to-real gap via massive domain randomization and modern RL on the Shadow Hand.
CMU and collaborators introduce RoboGen: an automatic pipeline that uses LLMs to generate robotic tasks, simulated environments, and skill-training setups from a simple text description.
ByteDance presents GR-2, a generalist robot that uses 38,000 hours of human activity videos from the internet as pre-training before robot data. It achieves 88.9% success on 100 tasks, best-in-class at release, demonstrating that internet videos are scalable robot training data.
Boston Dynamics retires the hydraulic Atlas after 11 years and presents its electric successor with greater-than-human range of motion and software APIs for industrial partners.
Figure publishes a video of its Figure 01 humanoid conversing, recognizing objects, and manipulating them using OpenAI models for language and vision, in an end-to-end pipeline.
Unitree launches H1 Ultra at $90,000: a humanoid with RL-based locomotion, capable of backflips and speeds of 3.3 m/s, the first bipedal robot accessible to university labs.
Berkeley and Stanford researchers release OpenVLA, 7B parameters, the first open-source VLA for generalist robot control — a universal controller downloadable from Hugging Face.
Stanford, Berkeley, and CMU release DROID, one of the most diverse robot manipulation datasets collected to date: 76,000 demonstrations across 564 scenes, 86 tasks, and 52 buildings. It supports training policies that generalize across scenes and tasks and serves as a reference dataset for robot foundation models.
Apptronik launches Apollo, a 1.73 m, 73 kg humanoid with a hot-swappable battery, 160W power draw, and an open ROS2 API, with NASA and Mercedes-Benz partnerships already announced.
Tesla shows Optimus Gen 2 with 30% faster movement, per-finger force sensors, and demonstrated ability to manipulate raw eggs without breaking them.
Stanford combines bimanual ALOHA arms with a mobile wheeled platform, creating the first low-cost system for whole-body manipulation. With 50 demonstrations it learns to cook, do laundry, and clean, opening the path to accessible household robots.
Sanctuary AI introduces Phoenix with Carbon AI, a neuro-symbolic system combining symbolic reasoning and neural nets to follow articulated linguistic instructions without explicit programming.
NVIDIA presents Eureka, the first system to use an LLM (GPT-4) to automatically generate reward functions for robotic reinforcement learning. The system achieves expert-level dexterous manipulation, including pen spinning, without manual reward design.
Google DeepMind and 33 labs release Open X-Embodiment, pooling over 1 million episodes covering 527 skills from 22 different robots: the first unified dataset of this scale for training generalist policies that work across multiple platforms.
DeepMind's RT-2 merges vision-language pretraining with robot control, transferring semantic reasoning from the web to a physical arm without task-specific training.
MIT and Columbia apply denoising diffusion models to robot imitation learning, learning multi-modal action distributions instead of deterministic policies. They achieve a 46.9% improvement on manipulation benchmarks.
Stanford presents TidyBot, a robotic system that uses LLMs to personalize household tidying behavior from a few user examples. It achieves 91.2% accuracy on unseen objects in its benchmark, demonstrating the feasibility of LLM-driven personalization in manipulation.
Tesla releases the first video of Optimus Gen 1 walking and performing tasks autonomously in a real factory environment, with a stated target price of $20,000.
Google presents PaLM-E, a 562B-parameter multimodal model that feeds images and robot state directly into the transformer, capable of long-horizon planning on real robots.
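DeepMind introduces RoboCat, a robotic agent that learns from few demonstrations, self-trains by collecting new data, and improves iteratively with little human intervention. It adapts to a new task from as few as 100 demonstrations; an early version reached 36% success on novel tasks, and later self-improvement rounds more than doubled that rate.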
Agility Robotics announces partnership with Amazon for Digit v3, a bipedal warehouse robot — first real-scale industrial deployment of a humanoid.
Google shows how an LLM directly generates executable robot code from natural-language instructions, without robotic fine-tuning, using hierarchical function composition.
DeepMind releases RT-1, a robotics transformer trained on 130,000 real episodes with 13 robots, generalizing to never-seen tasks.
Boston Dynamics' Spot gains advanced autonomous navigation and industrial anomaly detection via visual AI, operating without pre-loaded maps.
Google pre-trains a single policy on over 800 real robot tasks and 57,000 hours of real-world data, demonstrating for the first time zero-shot transfer to new tasks through large-scale multi-task offline learning.
Google Robotics shows how to combine an LLM for high-level planning with robot value functions that filter only physically executable actions.
DeepMind announces it has acquired MuJoCo, the physics simulator used in most RL and robotics research, and commits to making it free for everyone — a first step toward the full open-source release in 2022.
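The entry on combining an LLM for high-level planning with robot value functions can be illustrated with a minimal sketch. It assumes each candidate skill already has two scores: how useful the LLM considers it for the instruction, and how likely the value function thinks it is to succeed from the current state. The function name `select_skill`, the skill names, and all numbers are hypothetical and are not taken from the original system.

```python
from typing import Dict


def select_skill(llm_scores: Dict[str, float],
                 value_scores: Dict[str, float]) -> str:
    """Return the skill with the highest (LLM usefulness x feasibility) product.

    llm_scores:   hypothetical LLM likelihood that a skill helps the instruction.
    value_scores: hypothetical value-function estimate that the skill can
                  succeed from the current state (missing skills count as 0).
    """
    combined = {
        skill: llm_scores[skill] * value_scores.get(skill, 0.0)
        for skill in llm_scores
    }
    # A skill the robot cannot execute gets a product near zero,
    # so it is filtered out even if the LLM ranks it highly.
    return max(combined, key=combined.get)


# Illustrative numbers only: the LLM prefers the sponge,
# but the sponge is out of reach, so its value estimate is low.
llm_scores = {"pick up sponge": 0.6, "pick up apple": 0.3, "go to table": 0.1}
value_scores = {"pick up sponge": 0.1, "pick up apple": 0.9, "go to table": 0.8}

best = select_skill(llm_scores, value_scores)
print(best)
```

Multiplying the two scores is one simple way to let physical feasibility veto a linguistically plausible step; a real system would recompute both scores after every executed skill.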