AImpact
Infrastructure · Beginner · Also known as: Calcolo in inferenza · Test-time compute

Inference compute

The amount of compute a model uses when generating a response, as opposed to during training. More inference compute often yields better answers, at the cost of higher latency and a higher price per request.


In practice

Reasoning models shift part of the compute budget from training to inference: they generate extra reasoning tokens before answering. For anyone deploying a service, this is the most visible cost line, because every call consumes GPU time. Ways to cut it: caching, smaller models, quantization, batching.
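As an illustration of the first option, caching, here is a minimal sketch of an exact-match response cache. It assumes identical prompts may be answered with the same stored response. `CachedModel` and `fake_generate` are hypothetical names introduced for this example; `fake_generate` stands in for a real, GPU-backed model call.

```python
import hashlib
from typing import Callable, Dict


class CachedModel:
    """Wraps a model call with an exact-match response cache."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._cache: Dict[str, str] = {}
        self.model_calls = 0  # how many times the model actually ran

    def ask(self, prompt: str) -> str:
        # Key on a hash of the prompt, ignoring leading/trailing whitespace.
        key = hashlib.sha256(prompt.strip().encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Cache miss: pay for inference once and store the result.
            self.model_calls += 1
            self._cache[key] = self._generate(prompt)
        return self._cache[key]


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for an expensive GPU-backed model call.
    return f"answer to: {prompt}"


model = CachedModel(fake_generate)
model.ask("What is inference compute?")
model.ask("What is inference compute?")  # served from the cache
model.ask("What is quantization?")
print(model.model_calls)  # prints 2: three requests, two model runs
```

An exact-match cache only helps when requests repeat verbatim, so the savings depend on the traffic pattern; the other options in the list (smaller models, quantization, batching) reduce the cost of the calls that do reach the GPU.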
