In practice
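WordPiece picks each merge by how much it raises the likelihood of the training data, rather than by raw pair frequency, so it favors pairs whose parts rarely occur apart. SentencePiece operates on the raw character stream and treats whitespace as an ordinary symbol, so it needs no space-based pre-tokenization and copes better with Chinese, Japanese, and other languages written without spaces. Switching tokenizers requires retraining the model, because its embedding table is indexed by the original vocabulary's token IDs.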
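The contrast between frequency-based and likelihood-based merging can be sketched on a toy corpus. This is a minimal sketch, assuming the scoring rule commonly used to explain WordPiece, score(a, b) = freq(ab) / (freq(a) * freq(b)); it approximates the likelihood criterion and is not Google's implementation. The corpus and every function name are hypothetical, and the `##` continuation prefix used by real WordPiece vocabularies is omitted. The last two helpers mimic SentencePiece's convention of escaping spaces as ▁ (U+2581); they are illustrations, not the `sentencepiece` library's API.

```python
from collections import Counter

# Hypothetical toy corpus: each word is pre-split into characters, mapped to its count.
Corpus = dict[tuple[str, ...], int]

def pair_stats(words: Corpus) -> tuple[Counter, Counter]:
    """Count single symbols and adjacent symbol pairs, weighted by word frequency."""
    units: Counter = Counter()
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for s in symbols:
            units[s] += freq
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return units, pairs

def best_pair_by_frequency(words: Corpus) -> tuple[str, str]:
    """BPE-style choice: the most frequent adjacent pair."""
    _, pairs = pair_stats(words)
    return max(pairs, key=pairs.get)

def best_pair_by_score(words: Corpus) -> tuple[str, str]:
    """WordPiece-style choice (assumed scoring rule): freq(ab) / (freq(a) * freq(b))."""
    units, pairs = pair_stats(words)
    return max(pairs, key=lambda p: pairs[p] / (units[p[0]] * units[p[1]]))

# SentencePiece-style whitespace handling: spaces become the visible symbol U+2581,
# so the text is one raw stream and detokenization is just the reverse mapping.
SPACE_MARK = "\u2581"

def mark_whitespace(text: str) -> str:
    return text.replace(" ", SPACE_MARK)

def unmark_whitespace(pieces: list[str]) -> str:
    return "".join(pieces).replace(SPACE_MARK, " ")

corpus: Corpus = {
    ("h", "u", "g"): 10,
    ("p", "u", "g"): 5,
    ("p", "u", "n"): 12,
    ("b", "u", "n"): 4,
    ("h", "u", "g", "s"): 5,
}

print(best_pair_by_frequency(corpus))  # ('u', 'g')
print(best_pair_by_score(corpus))      # ('g', 's')
print(unmark_whitespace(["Hello", SPACE_MARK + "world"]))  # Hello world
```

On this corpus, raw frequency picks ('u', 'g') with 20 occurrences, while the score picks ('g', 's'): 's' never appears except after 'g', so merging them explains the data well even though the pair is rare.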
Related terms
Seen in the wild
No archive entry mentions it explicitly; it appears in broader contexts.