Longformer: sliding-window attention for long documents
In one sentence: The Allen Institute for AI releases Longformer, a transformer that combines local sliding-window attention with global attention on a few selected tokens, so that attention cost grows linearly with input length; it handles inputs of up to 4,096 tokens and beats RoBERTa on long-document tasks.
Models like BERT can read at most 512 tokens, which is a few hundred words. If you want them to read an article, a contract, or a whole PDF, you have to split the text into chunks, and the model loses the context that spans chunk boundaries.
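The Allen Institute presents Longformer, a variant that changes how the model attends to tokens. Instead of comparing every token with every other token, which costs quadratic time and memory, each token is compared only with its neighbors inside a fixed-size sliding window. A few designated tokens (for example, the classification token or the tokens of a question) get global attention: they see the whole text, and the whole text sees them.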
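To make the attention pattern concrete, here is a minimal sketch in plain Python. It is not the Longformer implementation; the function names `longformer_mask` and `count_allowed` and the tiny sizes are assumptions chosen for illustration. It builds the boolean mask of which token pairs are compared and counts them.

```python
def longformer_mask(seq_len, window, global_idx=()):
    """Return mask[i][j] == True when token i may attend to token j.

    Illustrative only: `window` is the total window width, so each token
    sees window // 2 neighbors on each side, plus itself.
    """
    half = window // 2
    global_set = set(global_idx)
    return [
        [
            # Local band, or either side of the pair is a global token.
            abs(i - j) <= half or i in global_set or j in global_set
            for j in range(seq_len)
        ]
        for i in range(seq_len)
    ]


def count_allowed(mask):
    """Number of (i, j) pairs that the mask lets through."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # 8 tokens, one neighbor on each side, token 0 is global.
    mask = longformer_mask(seq_len=8, window=2, global_idx=[0])
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    print(count_allowed(mask), "of", 8 * 8, "pairs")  # 34 of 64 pairs
```

With a fixed window and a fixed number of global tokens, doubling the sequence length roughly doubles the number of allowed pairs, whereas full attention quadruples it. That is the sense in which the cost is linear in input length.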
Result: a RoBERTa-based model can now read documents of up to 4,096 tokens, eight times the usual 512-token limit, while keeping performance. It's one of the first practical models for QA, summarization, and classification on real documents.
Companies
Allen Institute for AI
Tools
Longformer