Reformer: the transformer that handles very long sequences

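In one sentence: Google Research presents Reformer, a transformer variant that uses locality-sensitive hashing (LSH) attention to cut the cost of attention from O(n²) to O(n log n) in the sequence length n, and reversible layers to save activation memory, so it can handle sequences of up to 64k tokens.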



Standard transformers, like the ones behind GPT-2 or BERT, have a problem: the cost of self-attention grows quadratically with the length of the input, so doubling the text roughly quadruples the memory and compute that attention needs. Reading an entire book would require data-center hardware.

Google shows a way around this bottleneck: instead of comparing every word with every other word, the model groups "similar" ones into buckets using a hashing trick (locality-sensitive hashing), and only compares words inside the same bucket.
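To make the grouping idea concrete, here is a minimal NumPy sketch. It assumes a random-projection (angular) hash and, for simplicity, uses the same vectors as queries, keys, and values. The function names `lsh_buckets` and `bucketed_attention` are hypothetical, and the sketch leaves out the sorting, chunking, and multiple hash rounds a practical implementation would need.

```python
import numpy as np

def lsh_buckets(x, n_buckets, rng):
    """Assign each row of x to one of n_buckets via random projections."""
    # Project onto n_buckets // 2 random directions, then take the argmax
    # over the projections and their negations. Vectors that point in
    # similar directions tend to land in the same bucket.
    proj = x @ rng.standard_normal((x.shape[1], n_buckets // 2))
    return np.argmax(np.concatenate([proj, -proj], axis=1), axis=1)

def bucketed_attention(x, n_buckets=4, seed=0):
    """Softmax attention computed only among tokens sharing a bucket."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    buckets = lsh_buckets(x, n_buckets, rng)
    out = np.zeros_like(x)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        q = x[idx]                        # assumption: queries == keys == values
        scores = q @ q.T / np.sqrt(d)     # small (bucket_size x bucket_size) matrix
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        out[idx] = weights @ q
    return out, buckets

x = np.random.default_rng(1).standard_normal((16, 8))
out, buckets = bucketed_attention(x)
print(out.shape, np.bincount(buckets, minlength=4))
```

Each bucket only builds a score matrix of its own size, so the full n × n comparison table is never materialized.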

Result: the same transformer can read much longer sequences with less memory. A first step toward models that understand entire documents, not just paragraphs.
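The summary also credits reversible layers for part of the memory savings. A minimal sketch of that idea, assuming the usual reversible-residual formulation in which the activations are split into two halves and each half is updated from the other: because the inputs can be recomputed exactly from the outputs, intermediate activations do not have to be stored for every layer. The functions `f` and `g` are hypothetical stand-ins for the attention and feed-forward sublayers.

```python
import numpy as np

def f(x):
    # Hypothetical stand-in for the attention sublayer.
    return np.tanh(x)

def g(x):
    # Hypothetical stand-in for the feed-forward sublayer.
    return 0.5 * x

def reversible_forward(x1, x2):
    """Compute the layer outputs from the two input halves."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def reversible_inverse(y1, y2):
    """Recover the layer inputs from its outputs, with no stored activations."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal((2, 4, 8))
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
print(np.allclose(r1, x1), np.allclose(r2, x2))
```

During backpropagation, a model built this way can rebuild each layer's inputs on the fly instead of keeping them all in memory, trading a little extra compute for a large reduction in stored activations.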