AImpact
Medium Foundation Models · 1 min read

Longformer: sliding-window attention for long documents

In one sentence: the Allen Institute for AI releases Longformer, a transformer that combines local sliding-window attention with global attention on special tokens, so that attention cost scales linearly with sequence length; it handles inputs of up to 4,096 tokens and beats RoBERTa on long-document tasks.

Verified Official source

Models like BERT can read at most 512 tokens at a time, which is a few hundred words. If you want them to read an article, a contract, or a whole PDF, you have to split the text into chunks, and the model loses the context that spans chunk boundaries.

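The Allen Institute for AI presents Longformer, a transformer variant that changes how the model looks at words. Instead of comparing every word with every other word, which gets quadratically more expensive as the text grows, it compares each word only with nearby ones (a sliding window), plus a few "key points" (global tokens) that look at, and are visible from, the whole text.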
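The attention pattern described above can be sketched in a few lines of plain Python. This is a toy illustration, not the Longformer implementation: the function names `longformer_pattern` and `count_pairs`, the symmetric window, and the choice of position 0 as the only global token are all assumptions made for the example.

```python
def longformer_pattern(n, window, global_positions):
    """For each query position, return the set of key positions it attends to.

    Assumptions for this sketch: the window is symmetric (window // 2 tokens
    on each side) and global attention works in both directions.
    """
    half = window // 2
    global_set = set(global_positions)
    pattern = []
    for i in range(n):
        if i in global_set:
            # A global token attends to every position in the sequence.
            keys = set(range(n))
        else:
            # A local token attends to its neighbours inside the window...
            keys = set(range(max(0, i - half), min(n, i + half + 1)))
            # ...and to every global token.
            keys |= global_set
        pattern.append(keys)
    return pattern


def count_pairs(pattern):
    """Number of (query, key) pairs for which a score is computed."""
    return sum(len(keys) for keys in pattern)


if __name__ == "__main__":
    for n in (8, 16, 32):
        sparse = count_pairs(longformer_pattern(n, window=2, global_positions=[0]))
        full = n * n  # full self-attention compares every pair
        print(f"n={n}: sparse={sparse}, full={full}")
```

Doubling the sequence length roughly doubles the number of sparse pairs, while the full-attention count quadruples. That difference is what makes inputs of several thousand tokens affordable.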

Result: a BERT-style model (Longformer starts from RoBERTa's weights) can now read documents of up to 4,096 tokens, several thousand words, in a single pass while keeping performance. It's one of the first practical models for QA, summarization, and classification on real documents.

Companies

Allen Institute for AI

Tools

Longformer

Tags

AllenAI, Longformer, Long Context, Sliding Window Attention, Efficient Transformers
