Microsoft Turing-NLG: 17B parameters and the birth of DeepSpeed
In one sentence: Microsoft Research unveils Turing-NLG, the largest announced language model to date (17B parameters), made possible by the DeepSpeed library and its ZeRO optimizer, which drastically cut per-GPU memory use.
Training ever-larger models needs ever more GPU memory, but GPU memory doesn't grow that fast. Microsoft shows a fix: split the model's training state (weights, gradients, and optimizer state) across many GPUs so that no single GPU has to hold everything.
With this trick they build Turing-NLG, the largest language model so far: 17 billion parameters, roughly ten times the size of GPT-2 (1.5B). It can summarize, answer questions, and write coherent text for pages.
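To make the memory argument concrete, here is a rough sketch of the arithmetic behind splitting training state across GPUs. It assumes the byte counts commonly quoted for mixed-precision Adam training (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state). Those figures, the function name `model_state_gib`, and the 256-GPU cluster size are illustrative assumptions, not details from this article. The sketch ignores activations, buffers, and any additional model parallelism.

```python
# Back-of-the-envelope per-GPU memory for model states under mixed-precision Adam.
# Assumed byte counts (illustrative, not taken from the article):
#   fp16 weights = 2 bytes/param, fp16 gradients = 2 bytes/param,
#   fp32 optimizer state (master weights, momentum, variance) = 12 bytes/param.


def model_state_gib(params: float, gpus: int, stage: int) -> float:
    """Per-GPU model-state memory in GiB for ZeRO-style partitioning.

    stage 0: nothing partitioned (every GPU holds a full replica)
    stage 1: optimizer state partitioned across GPUs
    stage 2: optimizer state and gradients partitioned
    stage 3: optimizer state, gradients, and weights partitioned
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if gpus < 1:
        raise ValueError("gpus must be at least 1")

    weights, grads, optim = 2.0, 2.0, 12.0  # bytes per parameter
    if stage >= 1:
        optim /= gpus
    if stage >= 2:
        grads /= gpus
    if stage >= 3:
        weights /= gpus
    return params * (weights + grads + optim) / 2**30


if __name__ == "__main__":
    PARAMS = 17e9  # Turing-NLG's parameter count
    GPUS = 256     # hypothetical cluster size, chosen only for illustration
    for stage in range(4):
        gib = model_state_gib(PARAMS, GPUS, stage)
        print(f"stage {stage}: {gib:.1f} GiB of model state per GPU")
```

Under these assumptions an unpartitioned replica needs 16 bytes per parameter, or 272 GB for 17 billion parameters, which is far more than any single GPU offers. Each further stage of partitioning moves another share of that total off every individual device.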
The work matters beyond Microsoft: the library that enables it, DeepSpeed, is open-sourced and lets other researchers train giant models too.
Companies
Microsoft
Tools
Turing-NLG, DeepSpeed