AImpact
Landmark Local AI · 1 min read

llama.cpp: LLaMA 7B runs 4-bit on MacBook CPU

In one sentence: Georgi Gerganov brings Meta's LLaMA to consumer CPUs with a C/C++ inference implementation and 4-bit weight quantization, making it the first foundation model practically usable offline on a laptop.


Meta had just released LLaMA, a family of powerful language models that are normally run on GPUs. Within weeks, Georgi Gerganov published llama.cpp: a C/C++ implementation of LLaMA inference that, combined with compressed weights, runs the model on a regular MacBook's CPU.

The key technique is 4-bit quantization: instead of storing each model weight as a 16- or 32-bit floating-point number, the weight is approximated with roughly 4 bits. Quality drops slightly, but the model becomes about four times smaller than its 16-bit form and much faster on common hardware.
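To make the idea concrete, here is a minimal sketch of block-wise 4-bit quantization in Python. It is an illustration under stated assumptions, not llama.cpp's actual code or file format: the block size of 32, the symmetric max-abs scaling, the nibble layout, and all function names are choices made for this example. The size estimate uses a nominal 7 billion parameters and ignores the small overhead of storing one scale per block.

```python
import random

BLOCK_SIZE = 32  # weights per block; an illustrative choice for this sketch


def quantize_block(weights):
    """Map a block of floats to integer codes in [-8, 7] plus one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the shared scale and the 4-bit codes."""
    return [scale * c for c in codes]


def pack_nibbles(codes):
    """Store two 4-bit codes per byte (shifted by 8 so each fits in 0..15)."""
    out = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        out.append((lo + 8) | ((hi + 8) << 4))
    return bytes(out)


def unpack_nibbles(packed):
    """Inverse of pack_nibbles."""
    codes = []
    for byte in packed:
        codes.append((byte & 0x0F) - 8)
        codes.append((byte >> 4) - 8)
    return codes


def model_size_gb(n_params, bits_per_weight):
    """Rough weight storage in gigabytes, ignoring per-block scale overhead."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical weights: small Gaussian values standing in for a real tensor.
random.seed(0)
block = [random.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]

scale, codes = quantize_block(block)
packed = pack_nibbles(codes)
restored = dequantize_block(scale, unpack_nibbles(packed))
max_err = max(abs(a - b) for a, b in zip(block, restored))

print(len(packed), "bytes for", BLOCK_SIZE, "weights, max error", max_err)
print("7B parameters at 16 bits:", model_size_gb(7e9, 16), "GB")
print("7B parameters at 4 bits: ", model_size_gb(7e9, 4), "GB")
```

Each block of 32 weights shrinks to 16 bytes of codes plus one scale, and the rounding error per weight is at most half the scale. At a nominal 7 billion parameters, the weights go from about 14 GB at 16 bits to about 3.5 GB at 4 bits, which is what lets the model fit in a laptop's memory.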

For the first time, a language model with a GPT-3-style architecture could run on anyone's laptop, with no internet connection, no subscription, and no server.

Companies

Georgi Gerganov (independent), Meta AI

Tools

llama.cpp, LLaMA

Tags

LLaMA, llama.cpp, C++, Quantization, Georgi Gerganov

Sources