February 13, 2020 Medium Foundation Models · 1 min read

Microsoft Turing-NLG: 17B parameters and the birth of DeepSpeed

In one sentence: Microsoft Research unveils Turing-NLG, at 17 billion parameters the largest language model announced to date, made possible by the DeepSpeed library and its ZeRO optimizer, which sharply reduces the memory each GPU needs during training.

Training ever-larger models takes ever more GPU memory, but the memory on a single GPU is not growing nearly as fast. Microsoft shows a fix: partition the model's training state across many GPUs so that no single one has to hold everything.
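To make the memory argument concrete, here is a rough back-of-envelope sketch in Python. It assumes mixed-precision Adam training and uses the per-parameter accounting from the ZeRO paper (2 bytes of fp16 weights, 2 bytes of fp16 gradients, 12 bytes of fp32 optimizer state). The function name and the 256-GPU count are illustrative assumptions, not Microsoft's published configuration, and activations and buffers are ignored.

```python
# Back-of-envelope estimate of per-GPU memory for model states only.
# Assumption: mixed-precision Adam, with the byte counts used in the ZeRO paper.
# Activations, temporary buffers, and fragmentation are NOT included.

GB = 10 ** 9  # decimal gigabytes


def model_state_bytes_per_gpu(num_params: int, num_gpus: int, stage: int) -> float:
    """Return bytes of model state held by each GPU.

    stage 0: plain data parallelism, everything replicated
    stage 1: optimizer states partitioned across GPUs
    stage 2: optimizer states and gradients partitioned
    stage 3: optimizer states, gradients, and weights partitioned
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")

    weights = 2 * num_params   # fp16 weights
    grads = 2 * num_params     # fp16 gradients
    optim = 12 * num_params    # fp32 master weights, momentum, variance

    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        weights /= num_gpus
    return weights + grads + optim


if __name__ == "__main__":
    params = 17_000_000_000  # Turing-NLG parameter count
    gpus = 256               # hypothetical cluster size for illustration
    for stage in range(4):
        gb = model_state_bytes_per_gpu(params, gpus, stage) / GB
        print(f"stage {stage}: {gb:,.2f} GB of model state per GPU")
```

Under these assumptions, the unpartitioned case needs 16 bytes per parameter, or 272 GB per GPU for 17 billion parameters, which is why simple replication cannot work at this scale. Each partitioning stage spreads more of that state across the GPUs.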

With this technique they build Turing-NLG, the largest language model announced so far: 17 billion parameters, more than ten times the size of GPT-2 (1.5 billion). It can summarize documents, answer questions, and write text that stays coherent for pages.

The work matters beyond Microsoft: DeepSpeed, the library that enables it, is open source and lets other researchers train very large models too.

Companies

Microsoft

Tools

Turing-NLG, DeepSpeed