AImpact
Models · Intermediate · Also known as: Autoregressive mask (Italian: Maschera causale, Maschera autoregressiva)

Causal Mask

A filter applied inside attention that prevents each token from seeing tokens that come after it in the sequence.

In practice

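It is what makes a decoder-only Transformer "causal": during training the whole sequence is processed in parallel, and the mask ensures that the prediction at each position depends only on that token and the ones before it, so the model learns to predict the next token without cheating by looking ahead. At inference time, when tokens are generated one at a time, the mask is largely implicit because future tokens do not yet exist. Without it, a GPT-style model could simply read the answer from its own input during training and would learn nothing useful for generation.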
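To make the idea concrete, here is a minimal, dependency-free sketch of how a causal mask is typically applied to attention scores before the softmax. The function names `causal_mask` and `causal_attention_weights` and the toy `scores` matrix are illustrative assumptions, not taken from any particular library, and real implementations operate on batched tensors rather than nested lists.

```python
import math

def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix with future positions masked out."""
    n = len(scores)
    mask = causal_mask(n)
    weights = []
    for i in range(n):
        # Future positions are set to -inf, so they get exactly zero weight after softmax.
        row = [scores[i][j] if mask[i][j] else -math.inf for j in range(n)]
        row_max = max(row)  # always finite: the diagonal (j == i) is never masked
        exps = [math.exp(s - row_max) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical raw attention scores for a 3-token sequence (rows = queries, columns = keys).
scores = [[1.0, 2.0, 3.0],
          [1.0, 1.0, 5.0],
          [0.0, 0.0, 0.0]]
weights = causal_attention_weights(scores)
```

In this sketch the first token can only attend to itself, the second splits its attention between the first two positions regardless of how large the score for the third is, and only the last token sees the full sequence.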
