Models Intermediate Also known as: Maschera causale · Maschera autoregressiva

Causal Mask

A filter applied inside attention that prevents each token from seeing tokens that come after it in the sequence.

In practice

It is what makes a Transformer "causal" or decoder-only: during training the model learns to predict the next token without cheating by looking ahead. At inference time the mask becomes implicit because future tokens do not yet exist. Without it GPT would not make sense.

Seen in the wild

0 entries mentioning it

No archive entry mentions it explicitly. Appears in broader contexts.

← All terms

In practice

Related terms

Seen in the wild