Reformer: the transformer that handles very long sequences
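In one sentence: Google Research presents Reformer, a transformer variant that uses LSH (locality-sensitive hashing) attention to cut the cost of attention from O(n²) to O(n log n), and reversible layers to reduce the memory needed for activations, so it can handle sequences of up to 64k tokens.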
Standard transformers, like the ones behind GPT-2 or BERT, have a problem: attention compares every token with every other token, so memory and compute grow quadratically with the length of the input text. Reading an entire book would require data-center hardware.
Google shows a way around this bottleneck: instead of comparing every word with every other word, the model groups "similar" ones using a hashing trick, and only compares inside each group.
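The grouping idea can be illustrated with a minimal NumPy sketch. It assumes random-projection (angular) hashing and a single hash round, and it uses the same vectors as queries and keys. The function names `lsh_buckets`, `bucketed_attention`, and `comparison_counts` are hypothetical, chosen for this illustration. The sketch leaves out the sorting, chunking, and multiple hash rounds that a practical implementation relies on, so it should be read as a toy model of the idea rather than as Reformer's actual code.

```python
import numpy as np


def lsh_buckets(vectors, n_buckets, rng):
    """Assign each vector a bucket id via random-projection (angular) hashing.

    Assumption for this sketch: n_buckets is even, and a single hash round is used.
    """
    dim = vectors.shape[1]
    projections = rng.standard_normal((dim, n_buckets // 2))
    rotated = vectors @ projections
    # Vectors pointing in similar directions tend to share the same argmax.
    return np.argmax(np.concatenate([rotated, -rotated], axis=1), axis=1)


def bucketed_attention(qk, values, buckets):
    """Softmax attention where each token only attends within its own bucket."""
    out = np.zeros_like(values)
    scale = np.sqrt(qk.shape[1])
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = qk[idx] @ qk[idx].T / scale
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        out[idx] = weights @ values[idx]
    return out


def comparison_counts(buckets):
    """Return (full pairwise comparisons, comparisons done inside buckets)."""
    n = len(buckets)
    sizes = np.bincount(buckets)
    return n * n, int((sizes ** 2).sum())
```

Calling `comparison_counts` on the bucket ids shows how much work is skipped: with n tokens spread over several buckets, the sum of squared bucket sizes is at most n², and it is far smaller when the buckets are balanced.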
Result: the same transformer can read much longer sequences with less memory. A first step toward models that understand entire documents, not just paragraphs.