AImpact
Foundation Models · 1 min read

OLMo: the first truly open model — weights, data, code, and checkpoints

In one sentence: AllenAI has released OLMo together with its weights, the full Dolma training dataset (3 trillion tokens), the training code, and all intermediate checkpoints, making the entire LLM training process scientifically reproducible for the first time.


In AI, "open source" has become a very loosely used term. Meta's Llama is "open" in the sense that you can download the final model weights, but you don't know exactly what data it was trained on, you can't reproduce its training, and you can't inspect the intermediate stages.

AllenAI took a different approach with OLMo and published everything: not only the final model but also the entire training dataset (Dolma, 3 trillion tokens), the source code needed to reproduce training from scratch, and hundreds of intermediate checkpoints that show how the model changes over the course of training.

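This matters because science depends on reproducibility: if you can't repeat an experiment, you can't truly verify its claims. OLMo is the first LLM on which an external researcher can carry out this kind of rigorous analysis, such as rerunning the training or tracing a behavior of the final model back to the data it saw and the checkpoints it passed through.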

Companies

AllenAI

Tags

OLMo, AllenAI, open source, reproducibility, Dolma, transparent AI

Sources