Inference Beginner Also known as: Token

Token

The basic unit the model breaks text into: it can be a whole word, a syllable, or a few characters, depending on the tokenizer.


In practice

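LLM APIs bill per token, counting both input (prompt) and output (completion) tokens. In English, 1 token is roughly 0.75 words; in Italian a token covers slightly less, so the same text takes more tokens. Counting the tokens in your prompt helps you estimate cost and stay within the context limit.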
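
A minimal sketch of that estimate in Python, assuming the 0.75 words-per-token rule of thumb for English. The function names, prices, and context limit below are hypothetical placeholders, not any provider's actual API or rates. For exact counts, use the tokenizer that matches your model.

```python
import math

# Rough English average taken from the rule of thumb above; real tokenizers vary.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Approximate the token count from the whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / words_per_token)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request, given prices per million tokens."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


def fits_context(input_tokens: int, max_output_tokens: int, context_limit: int) -> bool:
    """Check that the prompt plus the reserved output budget fits in the context window."""
    return input_tokens + max_output_tokens <= context_limit


if __name__ == "__main__":
    prompt = "Summarize the following report in three bullet points."
    n_in = estimate_tokens(prompt)  # 8 words -> 11 estimated tokens
    # Hypothetical prices (per million tokens) and context limit, for illustration only.
    cost = estimate_cost(n_in, 200, input_price_per_million=2.0, output_price_per_million=6.0)
    print(n_in, cost, fits_context(n_in, 200, context_limit=8_000))
```

For Italian or other languages, lowering `words_per_token` slightly gives a more conservative estimate. The heuristic is only meant for budgeting, so leave some headroom below the context limit.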

Related terms

Tokenizer, Context window, LLM

Seen in the wild

6 entries mentioning it

June 26, 2025

Cerebras hits 2,500+ tok/s on Llama: inference record of the year

Medium
October 20, 2024

EMU3: a single transformer for text, images, and video

High
June 20, 2024

Rebuff: three-layer prompt injection defense with canary tokens

Medium
February 15, 2024

Gemini 1.5 Pro: 1 million tokens in context

High
September 28, 2023

AudioPaLM: the first LLM that processes and generates audio as text

High
July 9, 2020

HuggingFace Transformers 3.0: Rust tokenizers and the Model Hub

High
