Reading path

Data Scientist

Integrate LLMs into your workflow: RAG, embeddings, benchmarks, and fine-tuning.

For data scientists who want to use language models as real engineering components. This path traces the evolution of architectures, benchmarks, and tools that matter: from evaluating open-source models to building production-ready RAG pipelines.

01

Why it matters to you

The GPT-3 paper popularized in-context few-shot learning as an evaluation paradigm: the model gets a task description and a few examples in the prompt, with no task-specific fine-tuning. It set the baseline for understanding how capabilities scale with model size.

May 28, 2020 · Landmark · Foundation Models

GPT-3: the paper that opens the scaling-laws era

OpenAI publishes 'Language Models are Few-Shot Learners' and shows that at 175B parameters a model learns new tasks from a handful of examples in the prompt.
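
In few-shot learning the whole task lives in the prompt: an instruction, a handful of solved demonstrations, then an unsolved query, and no weights are updated. The sketch below assembles such a prompt, adapted from the paper's English-to-French example. The function name `build_few_shot_prompt` and the `=>` separator are illustrative choices, and the call to a model is left out.

```python
def build_few_shot_prompt(instruction: str,
                          demos: list[tuple[str, str]],
                          query: str,
                          sep: str = " => ") -> str:
    """Assemble a few-shot prompt: instruction, k solved demos, one open query.

    The model is expected to continue the text after the final separator;
    no weights are updated, so the "learning" happens only in context.
    """
    lines = [instruction]
    # Each demonstration shows the input/output format the model should imitate.
    for source, target in demos:
        lines.append(f"{source}{sep}{target}")
    # The query reuses the same format but leaves the output blank.
    lines.append(f"{query}{sep.rstrip()}")
    return "\n".join(lines)


demos = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe peluche"),
]
prompt = build_few_shot_prompt("Translate English to French:", demos, "cheese")
print(prompt)
```

An empty `demos` list gives the zero-shot setting, and a single pair gives one-shot. GPT-3 was evaluated in all three settings.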

02

Why it matters to you

Chinchilla rewrote the scaling laws by showing that, for a fixed compute budget, model size and training data must be scaled together in roughly equal proportion. This is a foundational result for anyone designing models or comparing them on benchmarks.

March 29, 2022 · Landmark · Foundation Models

Chinchilla: the big models were undertrained

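DeepMind publishes the Chinchilla paper and shows that, at equal compute, smaller models trained on far more tokens beat oversized, undertrained ones. Chinchilla itself (70B parameters, 1.4T tokens) outperforms the 280B-parameter Gopher, which was trained with the same compute budget.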
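
A common back-of-the-envelope reading of this result combines two rules of thumb: training cost C ≈ 6·N·D FLOPs (N parameters, D tokens) and roughly 20 training tokens per parameter. Neither is the paper's fitted scaling law, and `compute_optimal` is a hypothetical helper. Still, together they recover Chinchilla's own configuration from its compute budget.

```python
import math

TOKENS_PER_PARAM = 20      # rule-of-thumb ratio commonly derived from Chinchilla
FLOPS_PER_PARAM_TOKEN = 6  # C ≈ 6·N·D for dense transformer training


def compute_optimal(budget_flops: float) -> tuple[float, float]:
    """Return (params, tokens) that roughly spend `budget_flops` Chinchilla-style.

    Solves C = 6·N·D with D = 20·N, i.e. N = sqrt(C / 120).
    """
    n_params = math.sqrt(budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


# Budget implied by Chinchilla itself: 6 * 70e9 params * 1.4e12 tokens.
budget = 6 * 70e9 * 1.4e12
n, d = compute_optimal(budget)
print(f"params ≈ {n / 1e9:.0f}B, tokens ≈ {d / 1e12:.1f}T")
```

Because N grows with √C, doubling the compute budget multiplies both model size and data by about √2 ≈ 1.41. This is the equal-proportion scaling the paper argues for.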

03

Why it matters to you

LangChain made building RAG pipelines and agents accessible, quickly becoming the go-to stack for integrating LLMs into data-intensive production applications.

October 25, 2022 · Landmark · Agents

LangChain: the framework for LLM applications is born

Harrison Chase releases LangChain, an open-source Python library for chaining LLM calls with prompt templates, memory, tools, and external data sources. It goes on to become the default stack for the first wave of LLM apps.
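
The retrieve-then-prompt pattern that LangChain packages up fits in a few lines of plain Python. This sketch does not use LangChain's API. The bag-of-words `embed` function stands in for a real embedding model, and the documents and prompt template are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase term counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


PROMPT_TEMPLATE = (
    "Answer using only the context below.\n"
    "Context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)


def build_rag_prompt(question: str, docs: list[str], k: int = 2) -> str:
    """Retrieve supporting chunks and fill the template; the LLM call is omitted."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(question, docs, k))
    return PROMPT_TEMPLATE.format(context=context, question=question)


docs = [
    "Chinchilla has 70B parameters and was trained on 1.4T tokens.",
    "LLaMA was released in 7B, 13B, 33B and 65B sizes.",
    "Gemini 1.5 Pro offers a 128K standard context window.",
]
print(build_rag_prompt("How many tokens was Chinchilla trained on?", docs, k=1))
```

In a production pipeline, `embed` would typically be replaced by an embedding model and `retrieve` by a vector store. The template-filling step stays essentially the same.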

04

Why it matters to you

LLaMA opened access to foundation models for the research community, which is essential for reproducible fine-tuning experiments, comparative benchmarking, and controlled local deployment.

February 24, 2023 · High · Open Source Models

LLaMA: Meta opens foundation models to research

Meta releases LLaMA in four sizes (7B, 13B, 33B, 65B), available to researchers on request. One week later, the weights leak publicly.
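
For local deployment, the first sizing question is whether the weights fit in memory. A rough estimate is parameter count × bytes per parameter: 2 bytes for FP16/BF16, 1 for 8-bit, and about 0.5 for 4-bit quantization. This sketch uses the nominal sizes above and ignores the KV cache, activations, and quantization overhead, so real requirements are higher. `weight_memory_gib` is a hypothetical helper.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(n_params: float, precision: str = "fp16") -> float:
    """Approximate memory for the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 2**30


for size_b in (7, 13, 33, 65):
    n = size_b * 1e9  # nominal parameter count taken from the model name
    row = ", ".join(
        f"{p}: {weight_memory_gib(n, p):6.1f} GiB" for p in BYTES_PER_PARAM
    )
    print(f"LLaMA-{size_b}B -> {row}")
```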

05

Why it matters to you

Gemini 1.5's 1M-token context window shifted the tradeoffs of RAG: chunking can be less aggressive, long documents can sometimes go into the prompt whole instead of being retrieved piecemeal, and in-context analysis can cover far more data at once.

February 15, 2024 · High · Foundation Models

Gemini 1.5 Pro: 1 million tokens in context

Google announces Gemini 1.5 Pro: a Mixture-of-Experts architecture with a standard 128K-token context window and 1M tokens in limited preview. Headline result: near-perfect 'needle in a haystack' retrieval over long inputs.
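
A needle-in-a-haystack test hides one fact (the needle) at a controlled depth inside long filler text, then asks the model to retrieve it. Sweeping depth and context length shows where recall breaks down. This sketch generates test cases and scores answers. The filler word, needle, question, and exact-match scorer are illustrative assumptions, and length is counted in words rather than model tokens. In practice the filler is usually natural text.

```python
from dataclasses import dataclass


@dataclass
class NeedleCase:
    prompt: str
    expected: str
    depth: float  # fraction of the haystack that precedes the needle
    n_words: int  # haystack length in words (a proxy for tokens)


def make_case(n_words: int, depth: float,
              needle: str = "The secret code is 7421.",
              answer: str = "7421",
              filler_word: str = "lorem") -> NeedleCase:
    """Insert `needle` after `depth * n_words` filler words and append a question."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    filler = [filler_word] * n_words
    cut = round(depth * n_words)
    haystack = " ".join(filler[:cut] + [needle] + filler[cut:])
    prompt = f"{haystack}\n\nQuestion: What is the secret code? Answer with the number only."
    return NeedleCase(prompt, answer, depth, n_words)


def score(case: NeedleCase, model_output: str) -> bool:
    """Exact-substring check; a harness may prefer fuzzier or model-graded scoring."""
    return case.expected in model_output


# Grid of test cases: 3 haystack lengths x 5 needle depths.
grid = [make_case(n, d) for n in (1_000, 10_000, 100_000)
        for d in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(len(grid))
```

Each prompt in `grid` would be sent to the model under test, and `score` would then be tabulated by depth and length to produce the usual recall heatmap.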

06

Why it matters to you

DeepSeek-V2 showed that MoE architectures can reach top-tier performance with only a fraction of their parameters active per token, which matters for anyone weighing cost against performance in production.

May 6, 2024 · High · Open Source Models

DeepSeek-V2: Multi-head Latent Attention and the first highly efficient Chinese open MoE

DeepSeek releases V2, a Mixture-of-Experts model with 236B total and 21B active parameters. Its Multi-head Latent Attention (MLA) drastically shrinks the KV cache, and its launch pricing slashes Chinese API prices by about 90%, igniting a price war.
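
MLA's effect on serving cost comes down to KV-cache arithmetic. Standard multi-head attention (MHA) caches one key vector and one value vector for every head in every layer. MLA instead caches a single compressed latent vector, plus a small decoupled RoPE key, per layer. The sketch compares bytes per token. The configuration values (60 layers, 128 heads of dimension 128, latent dimension 512, RoPE key dimension 64) are assumed to match the DeepSeek-V2 paper and should be checked against it. The MHA figure is a hypothetical same-shape baseline, not a model DeepSeek shipped.

```python
def mha_kv_bytes_per_token(n_layers: int, n_heads: int, d_head: int,
                           bytes_per_elem: int = 2) -> int:
    """Multi-head attention: one key and one value vector per head, per layer."""
    return 2 * n_layers * n_heads * d_head * bytes_per_elem


def mla_kv_bytes_per_token(n_layers: int, d_latent: int, d_rope: int,
                           bytes_per_elem: int = 2) -> int:
    """MLA: one compressed KV latent plus one shared decoupled RoPE key, per layer."""
    return n_layers * (d_latent + d_rope) * bytes_per_elem


# Assumed DeepSeek-V2-like configuration, BF16 cache (2 bytes per element).
LAYERS, HEADS, D_HEAD, D_LATENT, D_ROPE = 60, 128, 128, 512, 64

mha = mha_kv_bytes_per_token(LAYERS, HEADS, D_HEAD)
mla = mla_kv_bytes_per_token(LAYERS, D_LATENT, D_ROPE)
context = 128 * 1024  # tokens
print(f"MHA: {mha * context / 2**30:.1f} GiB, MLA: {mla * context / 2**30:.1f} GiB, "
      f"ratio ≈ {mha / mla:.0f}x")
```

Compute follows the same per-token logic: with 21B of 236B parameters active, each token touches under a tenth of the weights.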

07

Why it matters to you

DeepSeek-R1 showed that advanced reasoning can be distilled into open-weight models, enabling reproducible benchmarks and fine-tuning on reasoning-heavy tasks.

January 20, 2025 · Landmark · Open Source Models

DeepSeek-R1: open reasoning matches o1 at roughly 1/27 the price

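Chinese startup DeepSeek releases R1, a reasoning model with MIT-licensed open weights. Its performance is on par with OpenAI o1 on math, code, and reasoning benchmarks, at API prices of $0.55 input / $2.19 output per 1M tokens (vs. $15 / $60 for o1). AI stocks on the Nasdaq lose about $1T in market value in two days.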
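
The price gap is easiest to see as the cost of a concrete workload. This sketch uses the per-million-token list prices quoted above. The workload size is hypothetical, and the sketch ignores cache-hit discounts. It also ignores that reasoning models bill their hidden reasoning tokens as output, which can dominate real costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """USD cost of a workload at these list prices."""
        return (input_tokens * self.input_per_m + output_tokens * self.output_per_m) / 1e6


R1 = Pricing(input_per_m=0.55, output_per_m=2.19)
O1 = Pricing(input_per_m=15.0, output_per_m=60.0)

# Hypothetical workload: 10M input tokens, 2M output tokens.
r1_cost = R1.cost(10_000_000, 2_000_000)
o1_cost = O1.cost(10_000_000, 2_000_000)
print(f"R1: ${r1_cost:.2f}  o1: ${o1_cost:.2f}  ratio ≈ {o1_cost / r1_cost:.0f}x")
```

Both price lists charge about four times more for output than for input (15/0.55 ≈ 27.3 and 60/2.19 ≈ 27.4), so the overall ratio stays near 27× whatever the input/output mix.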