Cohere Command A: the foundation model that runs on-prem on 2 GPUs

In one sentence Cohere ships Command A: 111B parameters, 256K context, multilingual, deployable on 2 H100/A100 GPUs. Positioned for regulated enterprises (banking, healthcare, government) requiring isolated deployment.

Needs review Official source

ShareLinkedIn X

Cohere, the Canadian AI lab founded by Aidan Gomez (co-author of the Attention Is All You Need paper), ships Command A, a model explicitly designed for on-prem deployment. Key points: 111 billion parameters, 256k token context window, and — most notably — runs on just 2 H100 GPUs thanks to efficient quantization.

Unlike OpenAI/Anthropic, which sell only via cloud APIs, Cohere ships weights to enterprise customers to run inside their own data centers. It's the right model for banks, hospitals, and governments that can't ship data to Microsoft or Google.

Claimed performance in the GPT-4o / Claude 3.5 Sonnet range on RAG and enterprise tasks (not consumer chatbots). Strong multilingual: 23 languages, Italian included. North America's answer to Mistral Medium 3.