Qwen, Mistral, Phi: which local models are worth using in 2025
A guide to the open source models that actually run well locally: Qwen2.5, Mistral Small 3.1, and Phi-3.5-mini. How to choose based on available VRAM, task type, and Italian language support.
Published: June 3, 2025
Not all local models are equal. Some look good on benchmarks and are useless in real use. Here are the ones I actually use.
Qwen2.5: the best for non-English languages
Alibaba has done serious work on multilingual support with Qwen2.5. This isn’t marketing. Test it on any non-English text and you can see it immediately: the grammar holds up, context is maintained, and it doesn’t switch to English mid-paragraph.
If you have 8GB of VRAM or less (an RTX 4060 or the 8GB variant of the RTX 3060, for example), start with Qwen2.5-7B:
ollama pull qwen2.5:7b
With 16GB of VRAM, move up to the 14B, which is noticeably better for long summarization and document Q&A:
ollama pull qwen2.5:14b
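If you have a server with serious GPU hardware (an A100, two RTX 4090s, or similar), the 72B is the choice, with quality comparable to GPT-4o on many tasks. Note that the RTX 4090 has no NVLink connector, so a dual-card setup splits the model across the cards over PCIe.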
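The VRAM guidance above can be sketched as a small lookup. This is only an illustration: the function name `pick_qwen_tag` is hypothetical, and the 48GB cutoff for the 72B model is an assumption based on the "two RTX 4090s" example (2 x 24GB), not a measured requirement.

```python
def pick_qwen_tag(vram_gb: float) -> str:
    """Map available VRAM (in GB) to a Qwen2.5 tag on Ollama.

    The 8GB and 16GB tiers follow the article; the 48GB cutoff for the
    72B model is an assumption (two 24GB cards).
    """
    if vram_gb >= 48:
        return "qwen2.5:72b"
    if vram_gb >= 16:
        return "qwen2.5:14b"
    # 8GB or less: start with the 7B
    return "qwen2.5:7b"


if __name__ == "__main__":
    for vram in (8, 16, 80):
        print(f"{vram}GB -> ollama pull {pick_qwen_tag(vram)}")
```

Treat the thresholds as starting points: long prompts need extra memory for the context cache on top of the weights.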
For code there’s Qwen2.5-Coder-32B, one of the strongest open source models for completion and review. It requires a GPU with 20+ GB of VRAM, or a CPU-only machine with 64GB of RAM (it runs, but slowly):
ollama pull qwen2.5-coder:32b
Mistral Small 3.1: the smart compromise
Mistral Small 3.1 at 24B is excellent if you want something that runs on a Mac M2/M3 with 16GB of unified memory without the fan spinning up. It handles English and Italian without any special configuration and is great for chat, text drafting, and document analysis.
ollama pull mistral-small3.1
The strength here is the quality-to-size ratio: 24B parameters at Q4 quantization weigh around 14GB, which is manageable on consumer hardware but leaves little headroom on a 16GB machine.
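The 14GB figure can be checked with back-of-the-envelope arithmetic: parameter count times bits per weight, divided by 8 to get bytes. The sketch below assumes roughly 4.5 effective bits per weight for a Q4 quantization (4-bit values plus scaling metadata); that figure is an assumption, and the exact value depends on the quantization variant. The function name is hypothetical.

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes).

    Does not include the context (KV) cache or runtime overhead.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # 24B parameters at an assumed ~4.5 bits per weight
    print(f"{quantized_size_gb(24, 4.5):.1f} GB")
```

The same formula explains the other tiers: a 7B model at the same assumed bit width comes in at about 4GB of weights, which is why it fits on an 8GB card with room left for context.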
Phi-3.5-mini: for low-resource setups
If you’re on CPU, a company laptop, or you want a model that responds in 2 seconds without a GPU, Microsoft’s Phi-3.5-mini (3.8B parameters) is the choice:
ollama pull phi3.5
It won’t perform miracles on long texts or complex reasoning, but for simple tasks — answering short questions, formatting output, generating snippets — it’s surprisingly capable.
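To see how fast a small model answers on your own machine, you can time a request against Ollama's local HTTP API. This is a minimal sketch, assuming Ollama is running on its default port (11434) and that `phi3.5` has already been pulled; the helper names `build_request` and `timed_ask` are hypothetical.

```python
import json
import time
import urllib.request

# Default local endpoint of a running Ollama instance (assumed unchanged)
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the given model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_ask(model: str, prompt: str) -> tuple[str, float]:
    """Send the prompt and return (answer text, elapsed seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(model, prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["response"], time.perf_counter() - start


# Example usage (requires a running Ollama server):
#   answer, seconds = timed_ask("phi3.5", "In one word: the capital of Italy?")
#   print(f"{seconds:.1f}s: {answer}")
```

The first call after startup includes model loading time, so run the prompt twice before judging latency.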
A note on Kimi
Kimi k1.5 from Moonshot AI often comes up in discussions about local models. To be clear: it is not a local model. It’s a cloud service accessible via web and API, like GPT-4o or Claude. You can’t download it, and it doesn’t run on Ollama. You can use it as an alternative to OpenAI/Anthropic models for certain tasks (it has a very long context window), but it has no place in a “local AI in the enterprise” conversation.
Summary table
| Model | Min VRAM | Languages | Main use case | Ollama command |
|---|---|---|---|---|
| Qwen2.5-7B | 8GB | EN/ZH/IT+ | Summarization, Q&A, writing | ollama pull qwen2.5:7b |
| Qwen2.5-14B | 16GB | EN/ZH/IT+ | Long documents, analysis | ollama pull qwen2.5:14b |
| Qwen2.5-Coder-32B | 20GB | EN (code) | Code completion | ollama pull qwen2.5-coder:32b |
| Mistral Small 3.1 | 16GB | EN/IT | Chat, drafting, analysis | ollama pull mistral-small3.1 |
| Phi-3.5-mini | CPU ok | EN (basic IT) | Simple tasks, limited hardware | ollama pull phi3.5 |
What to do
- If you don’t know where to start, install Ollama and try qwen2.5:7b. It’s the best starting point for enterprise use with normal hardware.
- Always test the model on your specific task before deciding: a general benchmark tells you nothing about how it behaves with your documents.
- If you have a Mac with an M-series chip, Mistral Small 3.1 is probably the best choice, since it makes good use of unified memory and doesn’t heat up the system.
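Testing on your own task can be as simple as running the same prompts through each candidate model and checking the answers. The harness below is a hypothetical sketch: `ask` is any function you supply that sends a prompt to a model and returns its text (for example, a wrapper around Ollama's local API), and keyword matching is a deliberately crude stand-in for a real quality check.

```python
from typing import Callable


def compare_models(
    ask: Callable[[str, str], str],
    models: list[str],
    cases: list[tuple[str, list[str]]],
) -> dict[str, float]:
    """Return, per model, the fraction of cases passed.

    A case passes when the answer contains every expected keyword
    (case-insensitive). `ask(model, prompt)` must return the answer text.
    """
    if not cases:
        raise ValueError("at least one test case is required")
    scores: dict[str, float] = {}
    for model in models:
        passed = 0
        for prompt, keywords in cases:
            answer = ask(model, prompt).lower()
            if all(keyword.lower() in answer for keyword in keywords):
                passed += 1
        scores[model] = passed / len(cases)
    return scores
```

Build the cases from real documents and questions your team handles, and read a sample of the answers yourself: a keyword match says nothing about tone or fluency in Italian.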