Skip to content
AImpact
IT EN
Inference Intermediate Also known as: Key-Value Cache · Cache chiavi-valori

KV Cache

/kay-vee cache/

A temporary GPU memory that stores attention computations for tokens already seen, so the model does not recompute them on every new token generated.

ShareLinkedInX

In practice

It is why generating the tenth token costs less than the first: the cache avoids redoing work. It eats a lot of VRAM and grows with context, so it is often the real bottleneck for serving many users in parallel. Optimizing it (paged, quantized) is central to cutting inference cost.

Related terms

← All terms