DeepSeek-V3: China releases a shockingly cheap open frontier model
In one sentence: DeepSeek publishes V3, a 671B-parameter mixture-of-experts (MoE) model with 37B parameters active per token, competitive with GPT-4o and Claude 3.5 Sonnet. Training took 2.788M H800 GPU-hours at a claimed cost of $5.6M, which challenges the "frontier = billions" narrative.
In late December 2024, DeepSeek, a Chinese startup based in Hangzhou, releases an open-weights model called DeepSeek-V3. It is technically huge: 671 billion total parameters in a mixture-of-experts (MoE) architecture, of which only 37 billion are activated for each token. On coding, math and reasoning benchmarks it competes with GPT-4o and Claude 3.5 Sonnet, while Llama 3.1 405B, the other leading open-weights model, lags behind.
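To make "37B active out of 671B" concrete, here is a minimal toy of sparse top-k expert routing. It is a generic softmax top-k router with hypothetical scalar "experts" (`top_k_route`, `moe_forward` and every number in the setup are illustrative assumptions), not DeepSeek-V3's actual gating scheme, whose details differ. The idea it shows: a router scores every expert for a token, but only the k highest-scoring experts are evaluated, so per-token compute scales with the active parameters rather than the total.

```python
import math

def top_k_route(token_logits, k):
    """Generic softmax top-k routing (not DeepSeek-V3's exact gate):
    return (expert_index, weight) pairs for the k highest-scoring experts,
    with weights normalized to sum to 1 over those k experts only."""
    chosen = sorted(range(len(token_logits)),
                    key=lambda i: token_logits[i], reverse=True)[:k]
    m = max(token_logits[i] for i in chosen)          # for numerical stability
    exps = [math.exp(token_logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the k routed experts' outputs; other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Hypothetical toy setup: 8 scalar "experts", expert i multiplies its input by i + 1.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

routes = top_k_route(router_logits, k=2)
print([i for i, _ in routes])                         # → [4, 1]
print(moe_forward(1.0, experts, router_logits, k=2))

# Share of DeepSeek-V3's parameters active per token (figures from the text above)
print(f"{37 / 671:.1%}")                              # → 5.5%
```

In a real MoE layer each expert is a feed-forward network and routing happens per token in every MoE layer, but the accounting is the same: only the selected experts' weights take part in the computation.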
The shocking part: DeepSeek claims the model was trained for about $5.6 million in compute (2.788 million H800 GPU-hours). OpenAI, Anthropic and Google reportedly spend hundreds of millions of dollars or more on models of comparable capability. Even though $5.6M covers only the final training run (excluding prior research, failed experiments and salaries), the number forces the whole industry to reconsider what frontier models really cost.
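The headline figure is simple arithmetic: GPU-hours multiplied by an hourly rental price. DeepSeek's technical report reaches its number by assuming an H800 rental price of $2 per GPU-hour; the sketch below reproduces that and shows how the claimed cost shifts with the assumed rate. `training_cost` is a hypothetical helper, and the alternative rates in the loop are illustrative, not reported values.

```python
GPU_HOURS = 2_788_000        # H800 GPU-hours, as reported for the final training run
PRICE_PER_GPU_HOUR = 2.00    # USD; rental-price assumption used in DeepSeek's report

def training_cost(gpu_hours, price_per_hour):
    """Compute-only cost of a run: GPU-hours x hourly rental price.
    Excludes research, failed runs, salaries and hardware purchases."""
    return gpu_hours * price_per_hour

cost = training_cost(GPU_HOURS, PRICE_PER_GPU_HOUR)
print(f"${cost / 1e6:.3f}M")   # → $5.576M

# Sensitivity: the headline scales linearly with the assumed price (illustrative rates).
for price in (1.50, 2.00, 3.00):
    print(f"${price:.2f}/GPU-hour -> ${training_cost(GPU_HOURS, price) / 1e6:.2f}M")
```

Because the figure is a rental-price estimate, it says nothing about the capital cost of owning the GPU cluster, which is one reason the $5.6M number is debated.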
The weights are released on Hugging Face under a license that permits commercial use. Many consider it the first open model truly competitive with closed frontier models, and it paves the way for the "DeepSeek shock" of January 2025, triggered by the release of R1.
Companies: DeepSeek
Tools: DeepSeek-V3