2020

27 entries

December 31, 2020 High

The Pile: the open-source 825 GB dataset for training LLMs

EleutherAI releases The Pile, an 825 GB composite text dataset curated from 22 different sources (arXiv, GitHub, PubMed, books, StackExchange…), designed for pre-training large open-source language models.

Open Source Models | EleutherAI, The Pile, Dataset

December 23, 2020 High

MuZero in Nature: mastering games without knowing the rules

DeepMind publishes MuZero in Nature: the RL agent learns world dynamics on its own and reaches superhuman performance on Go, chess, shogi, and 57 Atari games without being given the rules.

Foundation Models | DeepMind, MuZero, Reinforcement Learning

December 8, 2020 Medium

Big Bird at NeurIPS 2020: sparse attention for sequences up to 4096 tokens

Google Research presents Big Bird at NeurIPS 2020, a transformer whose sparse attention (local + global + random) scales linearly with sequence length. The paper reports SOTA on long-document QA and summarization and proves that the model remains Turing-complete.

Foundation Models | Google, Big Bird, Sparse Attention
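
The local + global + random pattern in the Big Bird entry can be illustrated with a token-level attention mask. This is a simplified sketch, not the paper's implementation (which works on blocks of tokens); the function names, window size, and counts of global and random tokens are illustrative assumptions.

```python
import random


def sparse_attention_mask(seq_len, window=3, num_global=2, num_random=2, seed=0):
    """Build a seq_len x seq_len mask; mask[i][j] is True if query i may attend to key j."""
    rng = random.Random(seed)
    mask = [[False] * seq_len for _ in range(seq_len)]
    half = window // 2
    for i in range(seq_len):
        # Local: a sliding window centred on position i.
        for j in range(max(0, i - half), min(seq_len, i + half + 1)):
            mask[i][j] = True
        # Random: a few keys sampled uniformly for each query.
        for j in rng.sample(range(seq_len), min(num_random, seq_len)):
            mask[i][j] = True
    # Global: the first num_global tokens attend everywhere and are attended to by everyone.
    for g in range(min(num_global, seq_len)):
        for j in range(seq_len):
            mask[g][j] = True
            mask[j][g] = True
    return mask


def attended_pairs(mask):
    """Count the query-key pairs the mask allows."""
    return sum(sum(row) for row in mask)


for n in (64, 128, 256):
    # Allowed pairs grow roughly in proportion to n, while full attention grows as n * n.
    print(n, attended_pairs(sparse_attention_mask(n)), n * n)
```

Each non-global row allows at most `window + num_random + num_global` keys, which is why the total grows linearly rather than quadratically.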

November 30, 2020 Landmark

AlphaFold 2 wins CASP14 and solves protein folding

DeepMind announces that AlphaFold 2 has won the CASP14 assessment with a median GDT score above 90 (92.4 across all targets), on par with experimental methods. The result is widely regarded as solving the 50-year-old protein folding problem.

Foundation Models | DeepMind, AlphaFold, CASP

November 4, 2020 Medium

Bing in production on Turing: deep AI in worldwide-scale search

Microsoft announces a Bing-wide production deployment of Turing-NLR (next-gen NLP) models on Azure GPUs, described as the largest search-quality improvement ever.

Enterprise AI | Microsoft, Bing, Turing

October 26, 2020 Medium

DeepMind acquires MuJoCo and makes it free

DeepMind announces it has acquired MuJoCo, the physics simulator used in most RL and robotics research, and commits to making it free for everyone — a first step toward the full open-source release in 2022.

Robotics | DeepMind, MuJoCo, Physics Simulator

October 23, 2020 Medium

mT5: a multilingual T5 over 101 languages

Google Research publishes mT5, a T5 variant pre-trained on mC4 (multilingual Common Crawl) over 101 languages, which becomes a standard baseline for many cross-lingual NLP tasks.

Foundation Models | Google, T5, mT5

October 22, 2020 Landmark

Vision Transformer (ViT): "An Image is Worth 16x16 Words"

Google Research introduces the Vision Transformer, applying a pure transformer to image patches as if they were tokens, and shows that with enough pre-training it beats CNNs on ImageNet and other vision benchmarks.

Multimodal AI | Google, Vision Transformer, ViT
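
The "patches as tokens" idea in the ViT entry can be sketched in a few lines. This is a minimal illustration under assumptions: the 224x224 RGB input and the function name are chosen for the example, and the learned linear projection and position embeddings that follow in the real model are left out.

```python
def image_to_patches(image, patch_size=16):
    """Split an H x W x C nested-list image into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size x C block into a single vector.
            patch = [
                channel
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for channel in pixel
            ]
            patches.append(patch)
    return patches


# Assumed example input: a blank 224x224 RGB image.
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
patches = image_to_patches(image)
print(len(patches), len(patches[0]))  # 196 768
```

With 16x16 patches, a 224x224 image becomes a sequence of 196 "words", each a 768-value vector, which is the sequence the transformer then processes.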

September 22, 2020 High

Microsoft acquires the exclusive GPT-3 license

Microsoft announces an exclusive license to integrate and redistribute GPT-3 in its products and cloud services, while OpenAI's public API keeps operating. The first major enterprise deal on foundation models.

Enterprise AI | Microsoft, OpenAI, GPT-3

September 9, 2020 High

DeepSpeed ZeRO-3: training models beyond 100 billion parameters

Microsoft announces ZeRO Stage 3 in DeepSpeed: by sharding parameters across GPUs in addition to gradients and optimizer states, it enables training of 100B+ parameter models on reasonable-size clusters.

AI Infrastructure | Microsoft, DeepSpeed, ZeRO-3
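
A rough back-of-the-envelope calculation shows why sharding parameters matters at this scale. The sketch below assumes mixed-precision Adam training with the 16-bytes-per-parameter accounting used in the ZeRO paper (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states); it ignores activations and buffers, and the function name and the 1,024-GPU cluster are illustrative.

```python
def per_gpu_model_state_bytes(num_params, num_gpus, stage):
    """Approximate per-GPU bytes for weights, gradients, and Adam states under ZeRO."""
    params = 2 * num_params   # fp16 weights
    grads = 2 * num_params    # fp16 gradients
    optim = 12 * num_params   # fp32 master weights, momentum, and variance
    if stage >= 1:
        optim /= num_gpus     # stage 1 shards optimizer states
    if stage >= 2:
        grads /= num_gpus     # stage 2 also shards gradients
    if stage >= 3:
        params /= num_gpus    # stage 3 also shards the parameters themselves
    return params + grads + optim


NUM_PARAMS = 100_000_000_000  # a 100B-parameter model
NUM_GPUS = 1024               # assumed cluster size

for stage in range(4):
    gigabytes = per_gpu_model_state_bytes(NUM_PARAMS, NUM_GPUS, stage) / 1e9
    print(f"stage {stage}: {gigabytes:,.2f} GB per GPU")
```

Under these assumptions, only stage 3 brings the model states of a 100B-parameter model within the memory of a single accelerator, because the earlier stages still replicate the full fp16 weights on every GPU.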

August 4, 2020 Medium

PyTorch Lightning 1.0: a boilerplate-free training loop

William Falcon and team ship PyTorch Lightning 1.0, a framework that separates research code (model) from engineering (training loop, distributed, checkpointing, logging) and becomes the de facto standard for many open projects.

AI Infrastructure | PyTorch Lightning, Open Source, Training Loop

July 29, 2020 Medium

Google announces TPU v4 with MLPerf 0.7 records

Posting MLPerf Training 0.7 results, Google reveals TPU v4, a new custom deep-learning accelerator, and claims the "world's fastest training supercomputer", a pod of 4,096 TPU v3 chips.

AI Infrastructure | Google, TPU v4, Pod

July 22, 2020 Medium

Longformer: sliding-window attention for long documents

Allen Institute for AI releases Longformer, a transformer that combines local sliding-window attention with global attention on special tokens. Attention cost scales linearly with sequence length, the pretrained model handles inputs of up to 4096 tokens, and it beats RoBERTa on long-document tasks.

Foundation Models | AllenAI, Longformer, Long Context

July 9, 2020 High

HuggingFace Transformers 3.0: Rust tokenizers and the Model Hub

HuggingFace releases Transformers 3.0 with the Rust-based tokenizers library (up to 100× faster), new NLP pipelines, and tighter Model Hub integration, cementing the library as the de facto standard for using pretrained models in Python.

Open Source Models | HuggingFace, Transformers, Tokenizers

July 3, 2020 High

EleutherAI is founded: a community to replicate GPT-3 in the open

Connor Leahy, Sid Black, and Leo Gao found EleutherAI on Discord with the goal of replicating GPT-3 and releasing models, code, and datasets in the open, kicking off projects like GPT-Neo, GPT-J, and The Pile.

Open Source Models | EleutherAI, GPT-Neo, Open Source

June 20, 2020 High

wav2vec 2.0: Facebook AI's "BERT for speech"

Facebook AI publishes wav2vec 2.0, a self-supervised model that learns representations from raw audio. It reaches SOTA on LibriSpeech when fine-tuned on the full labeled set and remains usable with as little as 10 minutes of labeled data.

Voice & Audio | Facebook AI, wav2vec 2.0, Speech Recognition

June 17, 2020 Medium

Image GPT: generative pretraining for images

OpenAI introduces Image GPT (iGPT), a transformer that treats pixels as tokens and shows that GPT-style sequential generative pretraining works on images too, reaching competitive performance on CIFAR-10.

Multimodal AI | OpenAI, Image GPT, Generative Pretraining

June 11, 2020 Landmark

OpenAI launches the GPT-3 API in private beta

Two weeks after the paper, OpenAI opens a private beta of the first general API for its language models, available to a few hundred developers building applications directly on top of GPT-3.

Foundation Models | OpenAI, GPT-3, API

May 28, 2020 Landmark

GPT-3: the paper that opens the scaling-laws era

OpenAI publishes "Language Models are Few-Shot Learners" and shows that at 175B parameters a model learns new tasks from a handful of examples in the prompt.

Foundation Models | OpenAI, GPT-3, Few-shot Learning

May 22, 2020 Landmark

RAG: Retrieval-Augmented Generation enters the literature

Lewis et al. at Facebook AI publish the RAG paper, combining a dense retriever (DPR) with a seq2seq generator (BART) to answer knowledge-intensive questions without baking all facts into the weights.

Foundation Models | Facebook AI, RAG, Retrieval-Augmented Generation

May 14, 2020 Landmark

NVIDIA A100: Ampere arrives

At GTC 2020 Jensen Huang announces the A100 GPU built on the Ampere architecture: 54 billion transistors, 40 GB of HBM2 at launch (an 80 GB HBM2e version follows in November 2020), TF32, 2:4 structured sparsity, and MIG support.

AI Infrastructure | NVIDIA, A100, Ampere
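
The 2:4 structured sparsity mentioned in the A100 entry means that at most two values in every contiguous group of four are non-zero. A minimal sketch of producing that pattern follows; choosing which values to keep by magnitude is an assumption made for illustration, and the function name is hypothetical.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in every consecutive group of four."""
    if len(weights) % 4:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


print(prune_2_of_4([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]))
# [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because the pattern is fixed per group of four, hardware can store only the two kept values plus their positions, which is what makes the sparsity exploitable.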

April 30, 2020 Medium

OpenAI Jukebox: generating whole songs with vocals

OpenAI releases Jukebox, a generative model that produces songs as raw audio, including rudimentary singing, conditioned on artist, genre, and lyrics. It is built on a hierarchical VQ-VAE with autoregressive transformers.

Voice & Audio | OpenAI, Jukebox, Music Generation

April 9, 2020 Low

fairseq stabilizes modular transformer support

Facebook AI Research consolidates fairseq as the reference sequence-to-sequence framework: it adds modular support for BART, RoBERTa, mBART, wav2vec and becomes the primary codebase for FAIR's 2020 models.

Open Source Models | Meta, Facebook AI, fairseq

March 23, 2020 Medium

ELECTRA: more efficient NLP pre-training than BERT

Clark, Luong, Le, and Manning publish ELECTRA at ICLR 2020: instead of masked language modeling, it trains the model to detect tokens replaced by a small generator, outperforming BERT at equal compute and matching RoBERTa and XLNet with less than a quarter of their compute.

Foundation Models | Google, Stanford, ELECTRA
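
The replaced-token-detection objective in the ELECTRA entry can be sketched as a data-preparation step. This is a toy illustration: a uniform random choice stands in for the small generator network, and the function name, vocabulary, and 15% replacement rate are assumptions for the example.

```python
import random


def corrupt_and_label(tokens, vocab, replace_prob=0.15, seed=0):
    """Return (corrupted tokens, labels), where label 1 marks a token that was replaced."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            sampled = rng.choice(vocab)  # stand-in for a sample from the generator
        else:
            sampled = tok
        corrupted.append(sampled)
        # If the sampled token happens to equal the original, it counts as original.
        labels.append(int(sampled != tok))
    return corrupted, labels


sentence = ["the", "chef", "cooked", "the", "meal"]
vocab = ["the", "chef", "cooked", "ate", "meal", "dog"]
corrupted, labels = corrupt_and_label(sentence, vocab, replace_prob=0.5, seed=1)
print(corrupted)
print(labels)
```

The discriminator is trained to predict `labels` from `corrupted`, so it receives a learning signal from every position rather than only from the masked subset.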

February 13, 2020 Medium

Microsoft Turing-NLG: 17B parameters and the birth of DeepSpeed

Microsoft Research unveils Turing-NLG, the largest announced language model to date (17B), made possible by the DeepSpeed/ZeRO optimizer that drastically cuts GPU memory.

Foundation Models | Microsoft, Turing-NLG, Large Language Models

January 28, 2020 Medium

Google Meena: the 2.6B end-to-end chatbot

Google introduces Meena, a 2.6B-parameter conversational model trained on 341 GB of social dialogue, along with Sensibleness and Specificity Average (SSA), a new human-evaluation metric for chatbot quality.

Foundation Models | Google, Meena, Dialogue

January 13, 2020 Medium

Reformer: the transformer that handles very long sequences

Google Research presents Reformer, a transformer variant that uses LSH attention to cut attention cost from O(n²) to O(n log n) and reversible layers to save activation memory, handling sequences of up to 64k tokens.

Foundation Models | Google, Reformer, Efficient Transformers