AImpact
Data · Intermediate · Also known as: WordPiece · SentencePiece

WordPiece / SentencePiece

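Two subword tokenization approaches usually presented as alternatives to plain BPE. WordPiece is the algorithm used in BERT. SentencePiece, used in T5 and Gemini, is a tokenizer library that trains unigram or BPE models directly on raw text.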


In practice

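WordPiece picks each merge by how much it raises the likelihood of the training data, not by raw pair frequency: a pair is commonly scored as its count divided by the product of its two parts' counts, so a frequent pair made of individually common symbols is not necessarily merged first. SentencePiece works directly on the raw string and treats whitespace as an ordinary symbol (written ▁) instead of pre-splitting on spaces, so it handles Chinese, Japanese, and other languages written without spaces better. Switching tokenizers generally requires retraining the model, because the learned embeddings are tied to the token IDs of the original vocabulary.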
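
To make the two ideas concrete, here is a minimal sketch in plain Python. It assumes the commonly described WordPiece score, count(ab) / (count(a) × count(b)), and it ignores details of real implementations such as the ## continuation prefix. The toy corpus, its counts, and all function names are hypothetical and chosen only for illustration; the whitespace helpers imitate the ▁ convention and are not the SentencePiece API.

```python
from collections import Counter

# Toy corpus (word -> count); values are made up for illustration.
corpus = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}


def count_symbols_and_pairs(corpus):
    """Split each word into characters; count symbols and adjacent pairs."""
    symbol_freq = Counter()
    pair_freq = Counter()
    for word, count in corpus.items():
        symbols = list(word)
        for s in symbols:
            symbol_freq[s] += count
        for a, b in zip(symbols, symbols[1:]):
            pair_freq[(a, b)] += count
    return symbol_freq, pair_freq


def best_bpe_pair(pair_freq):
    """BPE-style choice: the most frequent adjacent pair."""
    return max(pair_freq, key=pair_freq.get)


def best_wordpiece_pair(symbol_freq, pair_freq):
    """WordPiece-style choice (assumed score): pair count normalised by
    the counts of its two parts."""
    def score(pair):
        a, b = pair
        return pair_freq[pair] / (symbol_freq[a] * symbol_freq[b])
    return max(pair_freq, key=score)


# Whitespace handling in the style of SentencePiece: the space becomes a
# visible symbol, so the text can be split anywhere and still be restored.
WS = "\u2581"  # the "▁" marker


def escape_whitespace(text):
    return text.replace(" ", WS)


def restore_text(pieces):
    return "".join(pieces).replace(WS, " ")


symbol_freq, pair_freq = count_symbols_and_pairs(corpus)
print("BPE would merge:      ", best_bpe_pair(pair_freq))
print("WordPiece would merge:", best_wordpiece_pair(symbol_freq, pair_freq))
print(escape_whitespace("Hello world"))
```

On this toy corpus the two rules disagree: the frequency rule picks ("u", "g"), the most common pair, while the normalised score picks ("g", "s"), whose parts rarely appear apart. The whitespace helpers round-trip any string, which is why no language-specific pre-splitting on spaces is needed.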

