DeepSeek-V3: China releases a shockingly cheap open frontier model

In one sentence: DeepSeek publishes V3, a 671B-parameter mixture-of-experts (MoE) model with 37B parameters active per token, competitive with GPT-4o and Claude 3.5 Sonnet. Training took 2.788M H800 GPU-hours, for a claimed cost of $5.6M. It upends the 'frontier = billions' narrative.



In late December 2024, DeepSeek, a Chinese startup based in Hangzhou, releases an open-weights model called DeepSeek-V3. It is technically huge: 671 billion total parameters in a "mixture of experts" (MoE) architecture, of which only 37 billion are activated for each token. On coding, math and reasoning benchmarks it competes with GPT-4o and Claude 3.5 Sonnet, while Llama 3.1 405B, the other open frontier model, lags behind.
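To illustrate what "671B total, 37B active" means, here is a minimal toy sketch of generic top-k mixture-of-experts routing. A router scores every expert for a token, only the k best-scoring experts run, and their outputs are mixed with softmax weights. This is a hypothetical illustration with made-up sizes (`NUM_EXPERTS`, `TOP_K`, `DIM` and the random weights are all assumptions). It is not DeepSeek-V3's actual router, which adds shared experts and its own load-balancing scheme.

```python
import math
import random

# Toy mixture-of-experts layer: a hypothetical illustration, NOT DeepSeek-V3's
# actual router (which adds shared experts and its own load-balancing scheme).
NUM_EXPERTS = 8   # assumption: toy number of experts
TOP_K = 2         # assumption: experts activated per token
DIM = 4           # assumption: toy hidden size

random.seed(0)

def make_matrix(rows, cols):
    # Random toy weights; a real model learns these.
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

experts = [make_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]  # one linear map per expert
router = make_matrix(NUM_EXPERTS, DIM)                          # one scoring vector per expert

def moe_forward(x):
    """Route token x to its TOP_K best-scoring experts and mix their outputs."""
    scores = matvec(router, x)
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the selected experts' scores only (max subtracted for stability).
    m = max(scores[i] for i in chosen)
    exps = {i: math.exp(scores[i] - m) for i in chosen}
    total = sum(exps.values())
    gates = {i: e / total for i, e in exps.items()}
    out = [0.0] * DIM
    for i in chosen:  # only TOP_K experts do any compute for this token
        y = matvec(experts[i], x)
        out = [o + gates[i] * yi for o, yi in zip(out, y)]
    return out, chosen, gates

token = [1.0, -0.5, 0.25, 2.0]
out, chosen, gates = moe_forward(token)
print("experts used:", chosen, "gate weights sum:", round(sum(gates.values()), 6))

# Active-parameter share reported for DeepSeek-V3: 37B of 671B.
print(f"active fraction: {37 / 671:.1%}")  # prints "active fraction: 5.5%"
```

Because each token touches only a small slice of the weights (about 5.5% for DeepSeek-V3), per-token compute scales with the active parameters rather than the total. That is a large part of why such a big model can be comparatively cheap to train and serve.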

The shocking part: DeepSeek claims the model was trained for about $5.6 million in compute, i.e. 2.788 million H800 GPU-hours at an assumed rental price of $2 per GPU-hour. OpenAI, Anthropic and Google spend hundreds of millions or even billions of dollars on models of comparable capability. Even if the $5.6M covers only the final training run (excluding research, failed experiments and salaries), the figure forces the whole industry to reconsider what a frontier model costs.
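The headline number is simple arithmetic: GPU-hours multiplied by a rental price. The sketch below reproduces it from the per-stage GPU-hour breakdown and the $2/GPU-hour price given in DeepSeek's technical report. The stage labels are descriptive only. Changing `PRICE_PER_GPU_HOUR` shows how sensitive the figure is to that assumption.

```python
# Reproduce the headline cost figure from GPU-hours and an assumed rental price.
GPU_HOURS = {
    "pre-training": 2_664_000,
    "context extension": 119_000,
    "post-training": 5_000,
}
PRICE_PER_GPU_HOUR = 2.00  # USD per H800 GPU-hour, the rental price assumed in DeepSeek's report

total_hours = sum(GPU_HOURS.values())
for stage, hours in GPU_HOURS.items():
    print(f"{stage:>18}: {hours:>9,} GPU-h  ${hours * PRICE_PER_GPU_HOUR / 1e6:.3f}M")

total_cost = total_hours * PRICE_PER_GPU_HOUR
print(f"{'total':>18}: {total_hours:>9,} GPU-h  ${total_cost / 1e6:.3f}M")  # 2,788,000 GPU-h, $5.576M
```

Note what the calculation leaves out: buying the cluster, earlier research and ablation runs, and staff. This is exactly why the $5.6M is best read as the cost of the final training run only.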

The weights are released on Hugging Face under a license that permits commercial use. It is the first open model that many consider truly competitive with closed frontier models, and it sets the stage for the "DeepSeek shock" triggered by R1 in January 2025.