AImpact
Inference · Beginner · Also known as: Finestra di contesto · Context length

Context window

The maximum number of tokens a model can take into account in a single call, counting both the prompt and the response.
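Because the prompt and the response share the same window, a longer prompt leaves less room for the answer. The arithmetic can be sketched as follows; the function name and the example numbers are illustrative and not tied to any particular model or API.

```python
def remaining_response_tokens(context_window: int, prompt_tokens: int) -> int:
    """Return how many tokens are left for the response.

    The prompt and the response are counted against the same window,
    so the response budget is whatever the prompt does not use.
    """
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt larger than the window leaves no room at all.
    return max(context_window - prompt_tokens, 0)


# Hypothetical numbers: a 200k-token window and a 150k-token prompt.
print(remaining_response_tokens(200_000, 150_000))  # prints 50000
```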


In practice

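If you have a 200-page contract and a 200k-token window, the whole document often fits in a single call. When it does not, you have to split the text into chunks or use retrieval-augmented generation (RAG) to pass only the relevant passages. Filling more of the window also means higher cost and higher response latency.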
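A rough way to decide between sending the whole document and chunking it is to estimate its token count first. The sketch below assumes about 4 characters per token, a common rule of thumb for English text; real counts depend on the model's tokenizer, so treat the result as an estimate. All function names here are hypothetical.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# The true ratio varies by tokenizer and language.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of a text from its character length."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_window(
    text: str,
    context_window: int,
    reserved_for_response: int = 0,
    chars_per_token: float = CHARS_PER_TOKEN,
) -> bool:
    """Check whether the text plus a reserved response budget fits the window."""
    needed = estimate_tokens(text, chars_per_token) + reserved_for_response
    return needed <= context_window


def chunk_text(
    text: str,
    max_tokens_per_chunk: int,
    chars_per_token: float = CHARS_PER_TOKEN,
) -> list[str]:
    """Split text into consecutive pieces of at most the estimated token budget.

    This is a naive fixed-size split by characters; it can cut words and
    sentences in half. Splitting on paragraph or section boundaries is
    usually preferable for real documents.
    """
    if max_tokens_per_chunk <= 0:
        raise ValueError("max_tokens_per_chunk must be positive")
    max_chars = max(1, int(max_tokens_per_chunk * chars_per_token))
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

If `fits_in_window` returns `False`, the text can be passed through `chunk_text` and processed piece by piece, or indexed for retrieval so that only the relevant chunks are sent to the model.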
