In practice
It is now a pillar of modern training: large models generate examples that are used to train smaller ones (distillation) or to cover rare cases that real data lacks. The generated data must be filtered carefully, because errors made by the generator are inherited, and compounded, by the final model. NVIDIA, Meta, and Anthropic use it heavily.
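The filtering step can be illustrated with a minimal sketch. It assumes a code-generation setting where each synthetic sample can be checked automatically by running it against known input/output pairs. `fake_teacher_generate`, `passes_tests` and `filter_synthetic` are hypothetical names, and the "teacher" is a stub that returns canned candidates rather than calling a real model. This is one possible filter (exact deduplication plus execution checks), not the pipeline of any of the companies or models named here.

```python
import hashlib
from typing import Callable


def fake_teacher_generate(prompt: str) -> list[str]:
    """Hypothetical stand-in for a large 'teacher' model (a real pipeline would call an LLM)."""
    return [
        "def add(a, b):\n    return a + b\n",
        "def add(a, b):\n    return a - b\n",  # a generator error
        "def add(a, b):\n    return a + b\n",  # an exact duplicate
        "def add(a, b):\n    return b + a\n",
    ]


def passes_tests(source: str, entry_point: str, tests: list[tuple[tuple, object]]) -> bool:
    """Run a candidate and check it against known input/output pairs.

    Note: exec() runs arbitrary code; a real pipeline would sandbox this step.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        fn = namespace[entry_point]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return False


def filter_synthetic(candidates: list[str], verify: Callable[[str], bool]) -> list[str]:
    """Drop exact duplicates, then keep only candidates that pass verification."""
    seen: set[str] = set()
    kept: list[str] = []
    for candidate in candidates:
        digest = hashlib.sha256(candidate.encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        if verify(candidate):
            kept.append(candidate)
    return kept


tests = [((2, 3), 5), ((-1, 1), 0)]
dataset = filter_synthetic(
    fake_teacher_generate("Write add(a, b)"),
    lambda src: passes_tests(src, "add", tests),
)
print(len(dataset))  # → 2 (the buggy candidate and the duplicate are dropped)
```

Deduplicating before verifying avoids running the same candidate twice. An execution check only works where outputs are mechanically verifiable; other domains typically need different filters, such as a scoring model or human review.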
Seen in the wild
6 entries mention it:
- [High] NVIDIA Isaac GR00T N1.5: robotic foundation model with a synthetic data pipeline
- [High] NVIDIA GR00T: foundation model for humanoid robots with Isaac Sim
- [Medium] Nemotron-4 340B: NVIDIA's model for generating synthetic training data
- [Medium] Microsoft RoboGen: generating robot tasks, skills and environments from text
- [High] Phi-1.5: big-model reasoning in just 1.3 billion parameters
- [High] Phi-1: 1.3B parameters, beating models 10x larger on code