Skip to content
AImpact
IT EN
Inference Intermediate Also known as: Quantizzazione

Quantization

A technique that reduces the numeric precision of model weights (for example from 16 to 4 bits) so it takes less memory and runs faster.

ShareLinkedInX

In practice

It is what lets you run a Llama 70B on a single GPU or a 7B model on a Mac. You lose a bit of quality but often not much. Typical tools: GGUF, AWQ, GPTQ. Useful for on-prem or edge deployment.

Related terms

← All terms