AImpact

Qwen, Mistral, Phi: which local models are worth using in 2025

A guide to the open source models that actually run well locally: Qwen2.5, Mistral Small 3.1, Phi-3.5-mini. How to choose based on available VRAM, task type, and Italian language support.

Published: June 3, 2025

Not all local models are equal. Some look good on benchmarks and are useless in real use. Here are the ones I actually use.

Qwen2.5: the best for non-English languages

Alibaba has done serious work on multilingual support with Qwen2.5. It's not marketing: you can see it immediately by testing it on non-English text. The grammar holds up, context is maintained, and it doesn't switch to English mid-paragraph.

If you have 8GB of VRAM or less (an RTX 4060, or the 8GB variant of the RTX 3060), start with Qwen2.5-7B:

ollama pull qwen2.5:7b

With 16GB of VRAM move up to the 14B, which is noticeably better for long summarization and document Q&A:

ollama pull qwen2.5:14b

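If you have a server with serious GPU hardware (A100, two RTX 4090s, or similar), the 72B is the choice, with quality comparable to GPT-4o on many tasks. Note that the RTX 4090 has no NVLink connector, so a dual-4090 setup shares the model across the cards over PCIe.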

For code there's Qwen2.5-Coder-32B, the strongest open source model I've used for completion and review. It requires a GPU with 20+ GB of VRAM, or a CPU-only machine with 64GB of RAM (it runs, but slowly):

ollama pull qwen2.5-coder:32b

Mistral Small 3.1: the smart compromise

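Mistral Small 3.1 at 24B is excellent if you want something that runs on a Mac M2/M3 without the fan spinning up. Treat 16GB of unified memory as the bare minimum: the quantized weights take up most of it, so more memory gives you real headroom. It handles English and Italian without any special configuration and is great for chat, text drafting, and document analysis.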

ollama pull mistral-small3.1

The strength here is the quality-to-weight ratio: 24B parameters with Q4 quantization weigh around 14GB, manageable on consumer hardware.
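The 14GB figure follows from simple arithmetic: parameters times bits per weight, divided by 8. Here is a minimal sketch of that estimate. The function name and the 4.5 bits-per-weight default are assumptions of this example (Q4 variants store slightly more than 4 bits per weight on average), and the result covers the weights only, not the context cache or runtime overhead.

```shell
# Rough size of quantized weights in GB:
#   parameters (in billions) x bits per weight / 8
# The 4.5 bits-per-weight default is an assumed value for a typical Q4 variant;
# actual file sizes depend on the quantization scheme.
estimate_weights_gb() {
    params_b="$1"
    bits="${2:-4.5}"
    awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

Calling `estimate_weights_gb 24` prints 13.5, in line with the roughly 14GB quoted above. Pass a second argument to try other precisions, for example `estimate_weights_gb 24 8` for an 8-bit variant.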

Phi-3.5-mini: for low-resource setups

If you’re on CPU, a company laptop, or you want a model that responds in 2 seconds without a GPU, Microsoft’s Phi-3.5-mini (3.8B parameters) is the choice:

ollama pull phi3.5

It won’t perform miracles on long texts or complex reasoning, but for simple tasks — answering short questions, formatting output, generating snippets — it’s surprisingly capable.

A note on Kimi

Kimi k1.5 from Moonshot AI often comes up in discussions about local models. To be clear: it is not a local model. It’s a cloud service accessible via web and API, like GPT-4o or Claude. You can’t download it, and it doesn’t run on Ollama. You can use it as an alternative to OpenAI/Anthropic models for certain tasks (it has a very long context window), but it has no place in a “local AI in the enterprise” conversation.

Summary table

| Model | Min VRAM | Languages | Main use case | Ollama command |
| --- | --- | --- | --- | --- |
| Qwen2.5-7B | 8GB | EN/ZH/IT+ | Summarization, Q&A, writing | ollama pull qwen2.5:7b |
| Qwen2.5-14B | 16GB | EN/ZH/IT+ | Long documents, analysis | ollama pull qwen2.5:14b |
| Qwen2.5-Coder-32B | 20GB | EN (code) | Code completion | ollama pull qwen2.5-coder:32b |
| Mistral Small 3.1 | 16GB | EN/IT | Chat, drafting, analysis | ollama pull mistral-small3.1 |
| Phi-3.5-mini | CPU ok | EN (basic IT) | Simple tasks, limited hardware | ollama pull phi3.5 |
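
For general-purpose use, the Min VRAM column can be turned into a small helper that maps available VRAM to a model tag. This is only a sketch: `pick_model` is a hypothetical name, the cut-offs are my reading of the table rather than official requirements, and it deliberately ignores the code-specific and Mac-specific rows.

```shell
# Map available VRAM (whole GB) to an Ollama tag for general-purpose use.
# Pass 0 when there is no usable GPU.
# Thresholds follow the Min VRAM column above; they are a rule of thumb only.
pick_model() {
    vram_gb="$1"
    if [ "$vram_gb" -ge 16 ]; then
        echo "qwen2.5:14b"
    elif [ "$vram_gb" -ge 8 ]; then
        echo "qwen2.5:7b"
    else
        echo "phi3.5"
    fi
}
```

With the function loaded in your shell, `ollama pull "$(pick_model 8)"` would pull `qwen2.5:7b`.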

What to do

  • If you don’t know where to start, install Ollama and try qwen2.5:7b — it’s the best starting point for enterprise use with normal hardware.
  • Always test the model on your specific task before deciding: a general benchmark tells you nothing about how it behaves with your documents.
  • If you have a Mac with an M-series chip, Mistral Small 3.1 is probably the best choice — it makes good use of unified memory and doesn’t heat up the system.
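
To put the second point into practice, one option is to run the same prompt through a few candidate models and read the answers side by side. The sketch below assumes the models are already pulled and relies on `ollama run MODEL "PROMPT"` printing the response to standard output. The function name, the output file layout, and the `OLLAMA_BIN` override are conventions of this example, not part of Ollama.

```shell
# Run one prompt file through several models and save each answer to its own file.
# Usage: compare_models PROMPT_FILE OUT_DIR MODEL [MODEL...]
# OLLAMA_BIN is a hypothetical override for the binary path (defaults to "ollama").
compare_models() {
    prompt_file="$1"
    out_dir="$2"
    shift 2
    mkdir -p "$out_dir"
    for model in "$@"; do
        # Make the tag safe as a file name: qwen2.5:7b -> qwen2.5_7b.txt
        out_file="$out_dir/$(echo "$model" | tr ':/' '__').txt"
        "${OLLAMA_BIN:-ollama}" run "$model" "$(cat "$prompt_file")" > "$out_file"
    done
}
```

For example, `compare_models my_prompt.txt results qwen2.5:7b mistral-small3.1 phi3.5` would leave one text file per model in `results/`, where `my_prompt.txt` holds a prompt built from your own documents.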