AImpact
Medium Local AI · 1 min read

Local AI 2025: Ollama, MLX LM, Apple Foundation Models triple the speed

In one sentence: the local AI stack matures. Ollama accelerates inference with a better scheduler and a compressed KV cache, MLX LM becomes the state of the art on Apple Silicon, and Apple debuts the Foundation Models framework for native apps. Running Llama 3.3 70B on a MacBook becomes daily practice.

Needs review Official source

In 2025 the "local AI" stack (models that run on your own machine without sending anything to the cloud) takes a big leap, driven by three converging factors.

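Ollama, one of the most widely used tools for downloading and running LLMs locally, upgrades its backend with a more efficient scheduler and KV cache compression. Large dense models in the 70B class (Llama 3.3 70B, Qwen 2.5 72B), quantized to roughly 4 bits per weight, now realistically run on M3/M4 Max MacBooks or workstations with 64-96 GB of memory. DeepSeek V3, a mixture-of-experts model with 671B total parameters, still needs far more memory than that.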
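Whether a 70B-class model fits in 64-96 GB comes down to simple arithmetic: quantized weights plus the KV cache for the chosen context length. The sketch below is a rough estimate, not Ollama's own accounting. The layer count, KV-head count, and head dimension are assumed values for a Llama-3-style 70B model, and 4.5 bits per weight is an illustrative quantization level.

```python
# Rough memory sizing for running a dense LLM locally.
# Estimates only: real runtimes add overhead for activations and buffers.

GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory needed for the (quantized) weights alone."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: float) -> float:
    """Keys and values (the factor 2) for every layer and every cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# Assumed shape of a Llama-3-style 70B model (grouped-query attention).
LLAMA_70B = dict(n_layers=80, n_kv_heads=8, head_dim=128)

weights = weight_bytes(70e9, bits_per_weight=4.5)  # assumed ~4.5 bits/weight
kv_fp16 = kv_cache_bytes(**LLAMA_70B, context_tokens=8192, bytes_per_value=2)
# 8-bit cache approximated as 1 byte per value; real formats also store scales.
kv_8bit = kv_cache_bytes(**LLAMA_70B, context_tokens=8192, bytes_per_value=1)

print(f"weights       : {weights / GIB:.1f} GiB")
print(f"KV cache fp16 : {kv_fp16 / GIB:.2f} GiB")
print(f"KV cache 8-bit: {kv_8bit / GIB:.2f} GiB")
```

Under these assumptions the 16-bit KV cache for an 8192-token context takes 2.5 GiB, and an 8-bit cache roughly halves that, which is where cache compression helps most on long contexts.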

MLX LM, the LLM package built on Apple's MLX machine-learning framework, becomes the state of the art for inference on Apple Silicon and is often faster than llama.cpp on Macs. Apple also launches the Foundation Models framework, a native API that lets iOS and macOS apps call Apple Intelligence's on-device LLM (about 3B parameters, tuned for the Neural Engine).
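A claim such as "faster than llama.cpp" is easiest to check by measuring generated tokens per second the same way for each backend. The following is a minimal, backend-agnostic sketch. The names `measure_tokens_per_second`, `speedup`, and `stub_generate` are hypothetical helpers for illustration and are not part of MLX LM, llama.cpp, or Ollama.

```python
import time
from typing import Callable, Sequence


def measure_tokens_per_second(generate: Callable[[str], Sequence],
                              prompt: str,
                              clock: Callable[[], float] = time.perf_counter) -> float:
    """Time one generation call and return generated tokens per second.

    `generate` is any callable that takes a prompt and returns the generated
    tokens; wrapping a real backend is left to the caller.
    """
    start = clock()
    tokens = generate(prompt)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return len(tokens) / elapsed


def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return candidate_tps / baseline_tps


def stub_generate(prompt: str) -> list:
    """Stand-in backend so the sketch runs anywhere: pretends to emit 64 tokens."""
    time.sleep(0.01)
    return list(range(64))


tps = measure_tokens_per_second(stub_generate, "Hello")
print(f"{tps:.0f} tokens/s (stub backend)")
```

For a fair comparison, one would wrap each real backend in a function with the same signature, use the same model, quantization, and prompt, and discard a warm-up run. Note that this timing includes prompt processing as well as token generation.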

Practical outcome: a ChatGPT-like assistant that runs locally, free and private, on consumer laptops. That reshapes the economics of many use cases.

Companies

Ollama, Apple, MLX Project

Tools

Ollama, MLX LM, Apple Foundation Models

Tags

Ollama, MLX, Apple Silicon, Local LLM, Inference Speedup
