Inference Beginner Also known as: Finestra di contesto · Context length

Context window

The maximum number of tokens the model can read and hold in memory in a single call, counting both prompt and response.

In practice

If you have a 200-page contract and a 200k-token window, the whole thing often fits in a single call. Otherwise you have to split the text into chunks or use RAG to retrieve only the relevant passages. More context also means higher cost and higher response latency.
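As a rough illustration of the "does it fit?" check, here is a minimal sketch. It assumes about 4 characters per token, which is only a common rule of thumb for English text; real tokenizers vary by model and language. The function names (`estimate_tokens`, `fits_in_window`, `chunk_text`) are hypothetical and not part of any library.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. ASSUMPTION: ~4 characters per token."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_window(prompt: str, window: int, reserved_for_response: int) -> bool:
    """The window counts prompt and response together, so leave room for the reply."""
    return estimate_tokens(prompt) + reserved_for_response <= window


def chunk_text(text: str, max_tokens: int, chars_per_token: float = 4.0) -> list[str]:
    """Naive fixed-size chunking by character count (ignores sentence boundaries)."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    max_chars = int(max_tokens * chars_per_token)
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


if __name__ == "__main__":
    contract = "x" * 1_000  # stand-in for a long document
    if fits_in_window(contract, window=200, reserved_for_response=50):
        print("Send the whole document in one call.")
    else:
        pieces = chunk_text(contract, max_tokens=100)
        print(f"Split into {len(pieces)} chunks.")
```

In a real application you would count tokens with the tokenizer of the model you are calling, and split on paragraph or section boundaries rather than at fixed character offsets.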

Related terms

Token Attention RAG

Seen in the wild

1 entry mentioning it

February 5, 2026

Claude Opus 4.6: 1M context, agent teams, and leadership on Terminal-Bench 2.0

Landmark
