AImpact
Data Intermediate Also known as: Byte Pair Encoding · Codifica a coppie di byte

BPE

/bee-pee-ee/

A tokenization algorithm that starts from single characters (or bytes) and repeatedly merges the most frequent pair of adjacent symbols, building a vocabulary of subwords.
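The merge loop can be sketched in a few lines of Python. This is a toy illustration, not the tokenizer code of any particular model: the names `train_bpe` and `merge_pair` are hypothetical, words are split into characters rather than bytes, and ties between equally frequent pairs are broken arbitrarily but deterministically.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Return `symbols` with every adjacent occurrence of `pair` fused into one symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy sketch)."""
    # Each distinct word starts as a tuple of single characters, with its count.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken by the pair itself
        # (an assumption made here only to keep the output deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite the corpus with the chosen pair merged.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges
```

For example, `train_bpe(["ab", "ab", "abc"], 5)` first merges `a` + `b` (seen three times), then `ab` + `c`, and stops early because no pairs remain.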


In practice

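Variants of it are used by GPT, Llama, Mistral, and nearly every Western LLM. It explains why "playing" may become `play` + `ing`: common pieces get one token, rare words get many. It also directly affects per-token cost and quality on non-English languages: text that is underrepresented in the tokenizer's training data splits into more tokens, so the same content costs more and uses up more of the context window.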
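A minimal sketch of how learned merges turn a word into tokens is shown below. The `encode` function and the merge list are hypothetical and hand-written for illustration; they are not taken from any real model's vocabulary. Applying the rules one after another in learned order is a simplification of what production tokenizers do.

```python
def encode(word, merges):
    """Split `word` into subword tokens by applying merge rules in order (toy sketch)."""
    symbols = list(word)  # start from single characters
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            # Fuse the pair (a, b) wherever it appears side by side.
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


# Hypothetical merge rules, as if learned from a corpus where
# "play" and "ing" are frequent.
merges = [("i", "n"), ("in", "g"), ("p", "l"), ("a", "y"), ("pl", "ay")]

print(encode("playing", merges))  # ['play', 'ing']
print(encode("qix", merges))      # ['q', 'i', 'x']
```

The frequent word collapses to two tokens, while the unfamiliar string `qix` stays at one token per character, which is the cost asymmetry described above.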
