Reading path
Data Scientist
Integrate LLMs into your workflow: RAG, embeddings, benchmarks, and fine-tuning.
For data scientists who want to use language models as real engineering components. This path traces the evolution of architectures, benchmarks, and tools that matter: from evaluating open-source models to building production-ready RAG pipelines.
- 01
Why it matters to you
The GPT-3 paper established in-context few-shot learning as an evaluation paradigm: the baseline for measuring what a model can do, and how that ability scales, without task-specific fine-tuning.
Landmark
Foundation Models
GPT-3: the paper that opens the scaling-laws era
OpenAI publishes 'Language Models are Few-Shot Learners' and shows that at 175B parameters a model learns new tasks from a handful of examples in the prompt.
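As a rough illustration of what "a handful of examples in the prompt" means in practice, the sketch below assembles a few-shot prompt as plain text. The function name `build_few_shot_prompt`, the `Input:`/`Output:` layout, and the sentiment examples are all hypothetical choices made for this sketch, not the format used in the paper. No model is called; the point is only that the "training data" lives in the prompt itself.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: an instruction, K worked examples, then the open query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")  # blank line between demonstrations
    # The final item has no label: the model is expected to complete it.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical demonstrations for a sentiment task.
examples = [
    ("The plot was gripping.", "positive"),
    ("I want my money back.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review.",
    examples,
    "A solid, well-paced film.",
)
print(prompt)
```

Changing the number of tuples in `examples` moves between zero-shot, one-shot, and few-shot evaluation without touching model weights, which is what makes the setup convenient for benchmarking.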
- 02
Why it matters to you
Chinchilla rewrote the scaling laws, showing that data and parameters must be co-optimized — a foundational result for anyone designing or comparing models against benchmarks.
Landmark
Foundation Models
Chinchilla: the big models were undertrained
DeepMind publishes the Chinchilla paper and shows that, given equal compute, smaller models trained on far more tokens beat oversized undertrained ones.
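The trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below assumes two widely used approximations that are not stated in this entry: training compute of roughly C ≈ 6·N·D FLOPs for N parameters and D tokens, and a compute-optimal ratio of about 20 training tokens per parameter. Both are rules of thumb rather than exact laws, and the 280B-parameter / 300B-token starting point is an illustrative input.

```python
import math

TOKENS_PER_PARAM = 20.0  # assumed rule of thumb, not an exact constant


def train_flops(n_params, n_tokens):
    """Approximate training compute using the common C ~= 6 * N * D estimate."""
    return 6.0 * n_params * n_tokens


def compute_optimal(flops, tokens_per_param=TOKENS_PER_PARAM):
    """Split a FLOP budget into (params, tokens) under C = 6*N*D and D = r*N."""
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    return n_params, tokens_per_param * n_params


# Illustrative oversized, undertrained configuration: 280B params on 300B tokens.
budget = train_flops(280e9, 300e9)
n_opt, d_opt = compute_optimal(budget)

print(f"budget: {budget:.2e} FLOPs")
print(f"compute-optimal split: {n_opt / 1e9:.0f}B params, {d_opt / 1e12:.2f}T tokens")
```

Under these assumptions the same budget is better spent on a model roughly a quarter of the size trained on several times more tokens, which is the qualitative claim of the entry above.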
- 03
Why it matters to you
LangChain made building RAG pipelines and agents accessible, quickly becoming the go-to stack for integrating LLMs into data-intensive production applications.
Landmark
Agents
LangChain: the framework for LLM applications is born
Harrison Chase releases LangChain, an open-source Python library for chaining LLMs with prompt templates, memory, tools, and external data sources. It becomes the default stack for the first wave of LLM apps.
- 04
Why it matters to you
LLaMA opened access to foundation models for the research community — essential for reproducible fine-tuning experiments, comparative benchmarking, and controlled local deployment.
High
Open Source Models
LLaMA: Meta opens foundation models to research
Meta releases LLaMA in four sizes (7B, 13B, 33B, 65B), available to researchers on request. One week later, the weights leak publicly.
- 05
Why it matters to you
Gemini 1.5's 1M-token context window changed the RAG equation: less aggressive chunking, new long-document retrieval strategies, and a higher ceiling for in-context analytics.
High
Foundation Models
Gemini 1.5 Pro: 1 million tokens in context
Google announces Gemini 1.5 Pro: Mixture of Experts architecture, 128K standard context, 1M in preview. It reports near-perfect 'needle in a haystack' retrieval over long inputs.
- 06
Why it matters to you
DeepSeek V2 proved that MoE architectures can reach top-tier performance with only a fraction of their parameters active per token, which is critical for anyone evaluating cost-performance tradeoffs in production.
High
Open Source Models
DeepSeek-V2: Multi-head Latent Attention and the first highly efficient Chinese open MoE
DeepSeek releases V2, a 236B-total / 21B-active MoE with Multi-head Latent Attention (MLA). It drastically cuts the KV cache, slashes Chinese API prices by 90%, and ignites a price war.
- 07
Why it matters to you
DeepSeek R1 showed that advanced reasoning can be distilled into open-weight models, enabling reproducible benchmarks and fine-tuning on reasoning-heavy tasks.
Landmark
Open Source Models
DeepSeek-R1: open reasoning matches o1 at 1/30 the cost
Chinese startup DeepSeek releases R1, a reasoning model with MIT-licensed open weights. Performance is on par with OpenAI o1, with API pricing of $0.55/$2.19 per 1M input/output tokens (vs. $15/$60 for o1). AI-linked Nasdaq stocks lose $1T in market value in two days.
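For a cost-performance comparison, the list prices quoted above can be turned into a per-request estimate. This is a minimal sketch, assuming the two figures in each pair are the input and output prices per 1M tokens; the 2,000-input / 1,000-output request is a hypothetical workload, and list prices change over time.

```python
# USD per 1M tokens, taken from the entry above (assumed to be input / output prices).
r1_price = {"input": 0.55, "output": 2.19}
o1_price = {"input": 15.00, "output": 60.00}


def request_cost(price, input_tokens, output_tokens):
    """Cost in USD of one request, given per-1M-token input and output prices."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Price ratio per token type: how many times more expensive o1 is than R1.
ratios = {kind: o1_price[kind] / r1_price[kind] for kind in r1_price}

# Hypothetical workload: 2,000 prompt tokens and 1,000 generated tokens.
r1_cost = request_cost(r1_price, 2_000, 1_000)
o1_cost = request_cost(o1_price, 2_000, 1_000)

print(f"input ratio:  {ratios['input']:.1f}x")
print(f"output ratio: {ratios['output']:.1f}x")
print(f"per request:  R1 ${r1_cost:.5f} vs o1 ${o1_cost:.5f}")
```

With these figures both ratios come out at roughly 27x, so the "1/30" in the headline is best read as a rounded approximation. Reasoning models also emit long chains of thought, so the output price tends to dominate real workloads.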