GLIDE: OpenAI shifts from autoregressive to guided diffusion
OpenAI publishes GLIDE, a text-to-image diffusion model with classifier-free guidance, which becomes the technical foundation for DALL·E 2 and the models that follow.
OpenAI publishes WebGPT, a GPT-3 fine-tune that learns to use a text browser to search the web for answers with source citations, trained via imitation learning + RLHF.
DeepMind releases Gopher, a 280B-parameter dense model, together with an evaluation across 152 tasks and a companion paper on the ethical and social risks of language models.
DeepMind publishes RETRO, a 7.5B-parameter model that retrieves relevant passages from a 2-trillion-token database and matches GPT-3 and Jurassic-1 on the Pile despite having 25x fewer parameters.
Eighteen months after the GPT-3 paper, OpenAI removes the API access waitlist and lets any developer sign up, accelerating mainstream adoption of foundation models.
First AI coding tool integrated into a browser IDE: intelligent code completion for students and developers with no local configuration required.
Jeff Dean outlines Pathways, Google's unified architecture for sparse, multitask, multimodal models — the infrastructure foundation that will power PaLM and Gemini.
Google's FLAN paper shows that fine-tuning a 137B-parameter model on 60+ NLP datasets phrased as natural-language instructions substantially improves zero-shot performance on unseen task types.
Meta releases PyTorch 1.10 with CUDA Graphs integration, FX-based quantization and TorchScript improvements, consolidating PyTorch's lead as a framework for AI research and production.
Microsoft and NVIDIA announce MT-NLG, a 530B-parameter dense model trained with DeepSpeed and Megatron-LM and the largest dense language model trained up to that point.
GitHub introduces Copilot Labs, a VS Code extension hosting experimental features beyond simple autocomplete: code explanation, translation between programming languages and test generation.
Meta AI publishes HuBERT, a self-supervised speech model trained by masked prediction of discrete cluster targets, an influence on later self-supervised speech models such as w2v-BERT and WavLM.
GitHub extends the Copilot technical preview to the main JetBrains IDEs (IntelliJ, PyCharm, GoLand, WebStorm) and to Neovim, taking AI coding outside the VS Code ecosystem.
Stanford's Center for Research on Foundation Models publishes a 200+ page report coining the term "foundation models", now standard in technical, academic and regulatory discourse.
OpenAI releases the Codex API in private beta, giving developers direct access to the code generation model behind GitHub Copilot, free during the beta.
OpenAI releases Triton, a Python-like language and compiler for writing custom GPU kernels at performance close to hand-written CUDA — dramatically lowering the barrier for model optimization.
DeepMind publishes AlphaFold 2 code and weights on GitHub and, with EMBL-EBI, releases a database with predicted structures for 350,000 human and model-organism proteins.
NVIDIA adds an interleaved pipeline schedule to Megatron-LM's combined pipeline, tensor and data parallelism, enabling training of the 530B-parameter MT-NLG on 2240 A100 GPUs with Microsoft.
OpenAI publishes "Evaluating Large Language Models Trained on Code", describing Codex (the model powering GitHub Copilot) and introducing HumanEval, which becomes the standard benchmark for code generation.
GitHub and OpenAI launch a technical preview of GitHub Copilot, an assistant that suggests entire lines and functions right in the editor, based on a GPT-3-derived model trained on public code.
VITS, an end-to-end text-to-speech model, unifies the acoustic model and vocoder into a single network, achieving quality that surpasses Tacotron 2 with faster inference.
EleutherAI releases GPT-J, a 6B-parameter model trained in JAX on TPUs, with performance comparable to GPT-3 Curie, released under the Apache 2.0 license.
EleutherAI publishes The Pile, an 825 GB dataset built from 22 diverse sub-datasets, which becomes the base for GPT-Neo, GPT-J, Pythia and much of the early open-source ecosystem.
BAAI (Beijing Academy of Artificial Intelligence) introduces Wu Dao 2.0, a 1.75 trillion-parameter multimodal Mixture of Experts model — China's response to GPT-3 and Switch Transformer.
Dario and Daniela Amodei, formerly OpenAI's VP of Research and VP of Safety and Policy respectively, co-found Anthropic with a group of researchers, explicitly focused on AI safety and interpretability.
At Google I/O, Google announces MUM (Multitask Unified Model), built on the T5 architecture, which Google claims is 1,000 times more powerful than BERT; it is trained across 75 languages and handles multimodal content.
At Google I/O, Sundar Pichai introduces LaMDA (Language Model for Dialogue Applications), a 137B-parameter model fine-tuned for dialogue, direct ancestor of Bard.
OpenAI ships the content filter endpoint to classify GPT-3 outputs as safe/sensitive/unsafe — the first integrated moderation tool inside a commercial foundation-model API.
EleutherAI releases GPT-Neo 1.3B and 2.7B, open-source language models trained on The Pile and the first serious attempt to replicate the GPT-3 architecture with public weights.
Google Brain publishes Switch Transformer, a sparse Mixture of Experts model with 1.6 trillion parameters that routes each token to a single expert, showing that sparse routing can scale parameter counts far beyond dense models at comparable compute per token.
OpenAI announces DALL·E (generates images from text) and CLIP (aligns images and text in the same semantic space) side by side. Two pieces of the multimodal puzzle.