In practice
LLM APIs charge per token, for both input and output. In English, 1 token is roughly 0.75 words; in Italian, a token covers a bit less. Counting the tokens in your prompt helps you estimate cost and stay within the context limit.
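As a rough illustration of that estimate, here is a minimal sketch that turns a word count into an approximate token count using the 0.75 words-per-token rule of thumb, then derives a cost and a context-limit check. The function names, the prices, and the context limit are hypothetical placeholders, not any provider's real figures. An actual tokenizer will give different counts, especially for non-English text.

```python
import math

# Rule of thumb from the text: 1 token is roughly 0.75 English words.
# This is an approximation; real tokenizers vary by model and language.
WORDS_PER_TOKEN_EN = 0.75


def estimate_tokens(text: str, words_per_token: float = WORDS_PER_TOKEN_EN) -> int:
    """Rough token estimate based on a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / words_per_token)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost in the same currency as the prices (hypothetical values)."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


def fits_context(input_tokens: int, max_output_tokens: int, context_limit: int) -> bool:
    """True if the prompt plus the reserved output fits in the context window."""
    return input_tokens + max_output_tokens <= context_limit


if __name__ == "__main__":
    prompt = "Summarize the following article in three bullet points."
    n_in = estimate_tokens(prompt)
    # Placeholder prices and limit, chosen only for illustration.
    print(n_in, estimate_cost(n_in, 200, 2.0, 4.0), fits_context(n_in, 200, 8_000))
```

For billing-grade numbers, replace `estimate_tokens` with the tokenizer that matches the model you are calling; the heuristic is only useful for quick budgeting.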
Seen in the wild
6 entries mentioning it:
- [Medium] Cerebras hits 2,500+ tok/s on Llama: inference record of the year
- [High] EMU3: a single transformer for text, images, and video
- [Medium] Rebuff: three-layer prompt injection defense with canary tokens
- [High] Gemini 1.5 Pro: 1 million tokens in context
- [High] AudioPaLM: the first LLM that processes and generates audio as text
- [High] HuggingFace Transformers 3.0: Rust tokenizers and the Model Hub