DeepSeek-V2: Multi-head Latent Attention and the first highly efficient Chinese open MoE
In one sentence: DeepSeek releases V2, a 236B-total / 21B-active MoE whose Multi-head Latent Attention (MLA) drastically cuts the KV cache, and its rock-bottom API pricing ignites a price war in which Chinese API prices fall by up to 90%.
A Chinese startup called DeepSeek releases, for free, the weights of a large model built around two ideas.
First: like Mixtral, it is a huge "Mixture of Experts" (236 billion parameters in total) that activates only 21 billion of them per token (roughly, per word).
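To make "huge in total, small per token" concrete, here is a minimal sketch of top-k expert routing in plain Python. The router, the toy experts and all sizes are hypothetical; the gating renormalises the selected experts' weights as Mixtral does, while DeepSeek-V2's own DeepSeekMoE layer (with always-on shared experts and many fine-grained routed experts) differs in detail. The point it shows is that only k of the experts actually run for each token.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over router scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def make_expert(scale):
    # Hypothetical toy expert: just scales its input vector.
    return lambda x: [scale * v for v in x]

def moe_forward(x, router, experts, k):
    """Route token vector x to its top-k experts and mix their outputs."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in router]
    probs = softmax(scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalise over the chosen experts
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)  # only these k experts are evaluated
        gate = probs[i] / norm
        out = [o + gate * v for o, v in zip(out, y)]
    return out, top

rng = random.Random(0)
d, n_experts, k = 8, 16, 2
router = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_experts)]
experts = [make_expert(i + 1.0) for i in range(n_experts)]
token = [rng.gauss(0, 1) for _ in range(d)]
out, chosen = moe_forward(token, router, experts, k)
print(chosen)               # indices of the 2 experts that ran for this token
print(round(21 / 236, 3))   # fraction of DeepSeek-V2's parameters active per token
```

With 21B of 236B parameters active, each token touches roughly 9% of the weights, which is where the per-token compute savings come from.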
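Second: a new attention technique called MLA that drastically compresses the "memory" the model has to keep during long conversations (the KV cache). Result: much cheaper to run; the paper reports a 93.3% smaller KV cache and up to 5.76× higher maximum generation throughput than DeepSeek 67B.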
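The core MLA idea can be sketched as follows: instead of caching full keys and values for every head, cache one small latent vector per token and up-project it to keys and values when attention needs them. This toy version uses random weights and hypothetical sizes, and it omits positional encoding (the paper handles RoPE with a separate small "decoupled" key) and the attention computation itself. The per-token size comparison uses the head count, head size, latent size and RoPE-key size reported for DeepSeek-V2 (128, 128, 512, 64), taken here as assumptions.

```python
import random

def matvec(m, v):
    # Multiply matrix m (a list of rows) by vector v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

class ToyMLACache:
    """Caches one low-rank latent per token instead of full K and V."""

    def __init__(self, d_model, d_latent, n_heads, d_head, seed=0):
        rng = random.Random(seed)
        self.w_down = rand_matrix(d_latent, d_model, rng)            # h -> c
        self.w_up_k = rand_matrix(n_heads * d_head, d_latent, rng)   # c -> keys
        self.w_up_v = rand_matrix(n_heads * d_head, d_latent, rng)   # c -> values
        self.latents = []

    def append(self, h):
        # Only the compressed latent is stored for each new token.
        self.latents.append(matvec(self.w_down, h))

    def keys_values(self):
        # Rebuild all heads' keys and values on the fly from the latents.
        keys = [matvec(self.w_up_k, c) for c in self.latents]
        values = [matvec(self.w_up_v, c) for c in self.latents]
        return keys, values

    def cached_floats(self):
        return sum(len(c) for c in self.latents)

def per_token_cache_elems(n_heads, d_head, d_latent, d_rope):
    # Standard multi-head attention caches K and V for every head;
    # MLA caches the latent plus a small shared RoPE key.
    mha = 2 * n_heads * d_head
    mla = d_latent + d_rope
    return mha, mla

cache = ToyMLACache(d_model=64, d_latent=8, n_heads=4, d_head=16)
rng = random.Random(1)
for _ in range(10):
    cache.append([rng.gauss(0, 1) for _ in range(64)])
keys, values = cache.keys_values()
print(cache.cached_floats())  # 10 tokens * 8 latent floats, vs 1280 for full K+V

mha, mla = per_token_cache_elems(128, 128, 512, 64)
print(mha, mla, round(mha / mla, 1))  # elements per token per layer
```

In the paper the up-projections can be absorbed into the query and output projections, so full keys need not be materialised at inference time. Note also that the 93.3% figure above is measured against DeepSeek 67B, a different baseline from plain multi-head attention, so the two ratios are not directly comparable.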
They also offer an API at very low prices (~14× cheaper than GPT-4-Turbo). In China this triggers a price war: Alibaba, Baidu and ByteDance all cut their prices, by up to 90%.
Companies
DeepSeek
Tools
DeepSeek-V2