Data · Beginner · Also known as: Dati sintetici (Italian for "synthetic data")

Synthetic Data

Training data produced by an AI model or a simulator rather than collected from humans or the real world.
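As a toy illustration of the definition, and of distillation in its simplest form, the sketch below lets a hypothetical `teacher` function stand in for a large model. The teacher labels randomly sampled inputs, and a tiny "student" is fit only on those synthetic labels, with no human annotation involved. The names `teacher`, `make_synthetic_dataset`, and `fit_student` are illustrative assumptions, not any library's API. Real distillation typically trains neural networks, often on the teacher's soft probabilities rather than on hard outputs like these.

```python
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for a large 'teacher' model: here just a fixed function."""
    return 3.0 * x + 2.0


def make_synthetic_dataset(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Sample inputs and let the teacher label them instead of collecting human labels."""
    rng = random.Random(seed)
    xs = [rng.uniform(-10.0, 10.0) for _ in range(n)]
    return [(x, teacher(x)) for x in xs]


def fit_student(data: list[tuple[float, float]]) -> tuple[float, float]:
    """Fit a small 'student' (a 1-D linear model) by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


data = make_synthetic_dataset(100)
slope, intercept = fit_student(data)
print(round(slope, 3), round(intercept, 3))  # 3.0 2.0
```

Because the teacher's labels here are exact and noise-free, the student recovers the teacher's parameters. With a real generator, any systematic mistake in the teacher would be copied just as faithfully.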

In practice

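It is now a pillar of modern training: large models generate examples to train smaller ones (distillation) or to cover rare cases that real data seldom contains. It must be filtered carefully, because the generator's mistakes are learned by the model trained on its output and can accumulate if models are repeatedly trained on generated data. NVIDIA, Meta, and Anthropic use it heavily.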
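The filtering step can be sketched as a generate-then-verify loop. This minimal example assumes a task whose answers can be checked mechanically: a hypothetical `noisy_generator` emits addition problems and deliberately gets about 20% of the answers slightly wrong, and an independent `verify` function recomputes each answer from the question text and discards mismatches. Both functions and the `error_rate` parameter are illustrative assumptions; real pipelines may instead rely on checks such as unit tests for generated code, reward models, or a second model acting as a judge.

```python
import random
import re

# Matches questions of the exact form produced by the generator below.
QUESTION_RE = re.compile(r"What is (\d+) \+ (\d+)\?")


def noisy_generator(n: int, error_rate: float = 0.2, seed: int = 0) -> list[tuple[str, int]]:
    """Hypothetical generator model: emits (question, answer) pairs for addition,
    but corrupts roughly `error_rate` of the answers to mimic model mistakes."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        answer = a + b
        if rng.random() < error_rate:
            answer += rng.choice([-1, 1])  # simulated off-by-one error
        examples.append((f"What is {a} + {b}?", answer))
    return examples


def verify(question: str, answer: int) -> bool:
    """Independent check: recompute the answer from the question text."""
    match = QUESTION_RE.fullmatch(question)
    if match is None:
        return False  # reject anything malformed
    return int(match.group(1)) + int(match.group(2)) == answer


raw = noisy_generator(1000)
kept = [(q, a) for q, a in raw if verify(q, a)]
print(f"generated={len(raw)} kept={len(kept)} dropped={len(raw) - len(kept)}")
```

Without the `verify` step, roughly one example in five would teach the model a wrong sum. The filter is only this reliable because the task has a cheap, exact checker; for open-ended text, filtering is harder and noisier.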

Seen in the wild

6 entries mentioning it

March 18, 2025 · NVIDIA Isaac GR00T N1.5: robotic foundation model with synthetic data pipeline · High
November 5, 2024 · NVIDIA GR00T: foundation model for humanoid robots with Isaac Sim · High
September 25, 2024 · Nemotron-4 340B: NVIDIA's model for generating synthetic training data · Medium
May 14, 2024 · Microsoft RoboGen: generating robot tasks, skills and environments from text · Medium
September 6, 2023 · Phi-1.5: big-model reasoning in just 1.3 billion parameters · High
June 8, 2023 · Phi-1: 1.3B parameters beating models 10x larger on code · High