Training Beginner Also known as: Pre-training · Pre-addestramento

Pretraining

The initial training phase, in which a model learns the structure of language by predicting the next token across very large amounts of general-purpose text.
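To make "predicting the next token" concrete, here is a minimal sketch in Python. It is a toy, not how real pretraining is done: it swaps the neural network and gradient descent for a simple count-based bigram model, and the corpus, the function names, and the add-one smoothing are all assumptions chosen for illustration. What it does share with real pretraining is the objective: the average negative log-likelihood of the token that actually comes next.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev, vocab):
    """Add-one smoothed distribution over the next token, given `prev`."""
    total = sum(counts[prev].values()) + len(vocab)
    return {tok: (counts[prev][tok] + 1) / total for tok in vocab}


def average_loss(counts, tokens, vocab):
    """Mean negative log-likelihood of each actual next token."""
    losses = [
        -math.log(next_token_probs(counts, prev, vocab)[nxt])
        for prev, nxt in zip(tokens, tokens[1:])
    ]
    return sum(losses) / len(losses)


# Hypothetical toy corpus; real pretraining uses billions of tokens.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = sorted(set(corpus))

untrained = defaultdict(Counter)   # no counts yet: predicts uniformly
trained = train_bigram(corpus)     # "training" here is just counting

loss_before = average_loss(untrained, corpus, vocab)
loss_after = average_loss(trained, corpus, vocab)
print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

The untrained model spreads probability evenly over the vocabulary, so its loss equals the log of the vocabulary size; after counting, the loss drops because likely continuations receive more probability. Pretraining a large model pursues the same reduction in loss, only with a neural network and far more text.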

In practice

It is the most expensive step (months of GPU time and millions of dollars) and produces a "base" model that can continue text fluently but does not yet reliably follow instructions. Only big labs run it from scratch; most companies start from an already pretrained model and adapt it with SFT, LoRA, or RLHF.

Seen in the wild

6 entries mentioning it

May 5, 2024

GR-2: ByteDance pre-trains a robot on 38,000 hours of human internet videos

High
June 27, 2022

UL2: Google unifies pretraining paradigms with Mixture-of-Denoisers

Medium
June 1, 2021

The Pile: the 825 GB open dataset that fuels the open LLM era

High
December 31, 2020

The Pile: the open-source 825 GB dataset for training LLMs

High
June 17, 2020

Image GPT: generative pretraining for images

Medium
March 23, 2020

ELECTRA: more efficient NLP pre-training than BERT

Medium
