2022

52 entries

December 16, 2022 High

Google RT-1: a Robotics Transformer trained on real-world robot data

Google releases RT-1, a Robotics Transformer trained on 130,000 real episodes collected with 13 robots, generalizing to never-seen tasks.

Robotics: Google, RT-1, Robotics Transformer

December 15, 2022 Medium

Constitutional AI: the model self-corrects without humans in the loop

Anthropic publishes Constitutional AI: instead of pure RLHF, the model critiques and revises its own responses following a written 'constitution'. Less human labeling, more transparency.

AI Security: Anthropic, Constitutional AI, RLAIF

December 1, 2022 Medium

Boston Dynamics adds visual AI to Spot: map-free autonomy

Spot gains advanced autonomous navigation and industrial anomaly detection via visual AI, operating without pre-loaded maps.

Robotics: Boston Dynamics, Spot, Autonomous Navigation

November 30, 2022 Landmark ★ On my workflow

ChatGPT: AI lands in everyone's browser

OpenAI launches ChatGPT, a free conversational interface on GPT-3.5 aligned via RLHF. It crosses one million users in five days.

Foundation Models: OpenAI, ChatGPT, GPT-3.5

November 24, 2022 Medium

Stable Diffusion 2.0: new architecture and OpenCLIP encoder

Stability AI releases SD 2.0 with OpenCLIP replacing CLIP, native 768x768 resolution, a new depth2img model, and improved inpainting. The release is controversial because it breaks compatibility with existing fine-tuned checkpoints, embeddings, and prompts.

Image & Video Gen: Stable Diffusion 2.0, Stability AI, OpenCLIP

November 16, 2022 Medium

Notion AI alpha: AI inside the tool you already work in

Notion launches Notion AI in private alpha, GPT integrated inside pages: summarize, rewrite, translate, brainstorm without leaving the document.

Enterprise AI: Notion, Notion AI, Productivity

November 15, 2022 Medium

Galactica: Meta launches (and pulls in three days) a science LLM

Meta unveils Galactica, a 120B-parameter model trained on 48 million scientific papers. The public demo is pulled after three days under a wave of criticism for authoritative hallucinations.

Foundation Models: Meta, Galactica, Science LLM

November 9, 2022 High

NVIDIA Triton Inference Server 2.x: the de facto standard for production inference

NVIDIA consolidates Triton as the open-source platform for serving PyTorch, TensorFlow, and ONNX models in production, with dynamic batching, multi-GPU support, and gRPC/HTTP APIs.

AI Infrastructure: NVIDIA, Triton, Inference Server

November 1, 2022 Medium

HuggingFace Accelerate: One Python Script for CPU, GPU, TPU, and Mixed Precision

HuggingFace Accelerate provides a unified API that runs the same training code on any hardware without changes, becoming the backbone of most open LLM training pipelines.

AI Infrastructure: Accelerate, HuggingFace, multi-GPU

October 25, 2022 Landmark

LangChain: the framework for LLM applications is born

Harrison Chase releases LangChain, an open-source Python library to chain LLMs with prompt templates, memory, tools and external data sources. It will become the default stack of the first LLM apps.

Agents: LangChain, Framework, LLM Apps

October 25, 2022 Medium

Textual Inversion: inject a custom concept into diffusion models

Gal et al. (Tel Aviv University and NVIDIA) publish Textual Inversion: learning a new text-token embedding that represents a custom concept from 3-5 images, without modifying model weights.

Image & Video Gen: Textual Inversion, personalization, embedding

October 24, 2022 High

EnCodec: Meta AI compresses audio with neural networks and beats Opus

EnCodec compresses 24 kHz mono audio to 1.5–24 kbps (and 48 kHz stereo audio to 3–24 kbps) at quality surpassing Opus, and its discrete tokens become a common building block for later neural TTS and audio generation models.

Voice & Audio: EnCodec, Neural Codec, Audio Compression

October 15, 2022 High

MT-OPT: Google trains a single robot policy on 800+ tasks and 57,000 hours of real data

Google pre-trains a single policy on over 800 real robot tasks and 57,000 hours of real-world data, demonstrating for the first time zero-shot transfer to new tasks through large-scale multi-task offline learning.

Robotics: MT-OPT, multi-task robot learning, offline RL

October 12, 2022 High

GPTQ: 4-bit post-training quantization making GPT-scale inference practical

Frantar et al. (IST Austria and ETH Zurich) publish GPTQ: accurate 3- and 4-bit post-training quantization that needs no retraining, making inference of 175B-parameter models feasible on a single GPU.

AI Infrastructure: GPTQ, Quantization, 4-bit

October 6, 2022 Landmark

ReAct: the framework that unites reasoning and acting in LLMs

Yao et al. introduce ReAct, a prompting scheme that interleaves explicit reasoning steps (Thought) with concrete actions (Action) and the observations they return. It becomes a conceptual foundation for many later LLM agents.

Agents: ReAct, Reasoning, Tool Use
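
The Thought/Action alternation described in this entry can be sketched as a small loop. This is only an illustration under stated assumptions: `scripted_policy` is a hypothetical stand-in for an LLM call, `calculate` is a made-up tool, and the transcript format is chosen for readability rather than taken from the paper.

```python
def run_react(question, policy, tools, max_steps=5):
    """Alternate thought/action steps until the policy emits a 'finish' action."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, transcript
        # Run the chosen tool and feed its result back as an observation.
        observation = tools[action](arg)
        transcript.append(f"Observation: {observation}")
    return None, transcript


def scripted_policy(transcript):
    """Hypothetical stand-in for an LLM: decides from the transcript so far."""
    observations = [line for line in transcript if line.startswith("Observation:")]
    if not observations:
        return "I need to compute the product first.", "calculate", "6*7"
    return "I have the result.", "finish", observations[-1].split(": ", 1)[1]


def calculate(expr):
    """Toy tool (an assumption for this sketch): multiply two integers 'a*b'."""
    a, b = expr.split("*")
    return str(int(a) * int(b))


tools = {"calculate": calculate}
answer, transcript = run_react("What is 6 times 7?", scripted_policy, tools)
print("\n".join(transcript))
```

In a real agent the policy would be a model call that reads the whole transcript, which is why each observation is appended before the next step.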

October 5, 2022 Medium

Imagen Video and Phenaki: Google answers on text-to-video

A week after Make-A-Video, Google Research unveils Imagen Video and, around the same time, Phenaki: two different approaches to text-to-video, with longer, more coherent clips.

Image & Video Gen: Google, Imagen Video, Phenaki

September 29, 2022 Medium

Make-A-Video: Meta unveils the first credible text-to-video

Meta AI shows Make-A-Video, a system that generates short animated clips from a text description by reusing a pre-existing text-to-image model.

Image & Video Gen: Meta, Make-A-Video, Text-to-Video

September 27, 2022 Medium

Hugging Face Inference Endpoints: deploy LLMs in two clicks

Hugging Face launches Inference Endpoints, a managed service to deploy Hub models on AWS, Azure or GCP with autoscaling, on-demand GPUs and private endpoints.

AI Infrastructure: Hugging Face, Inference Endpoints, Deployment

September 22, 2022 High

Flan-T5 and Flan-PaLM: instruction tuning scales to 1,800 tasks

Google scales instruction tuning to 1,800 tasks and 540B parameters, open-sources Flan-T5, and proves that chain-of-thought reasoning is teachable via fine-tuning.

Foundation Models: Flan-T5, instruction tuning, chain-of-thought

September 21, 2022 High

Whisper open source: audio transcription becomes a commodity

OpenAI releases Whisper under the MIT license: a speech-to-text model trained on 680,000 hours of multilingual audio that approaches commercial-grade quality and runs locally.

Voice & Audio: OpenAI, Whisper, ASR

September 16, 2022 Medium

Character.AI: persona chatbots from ex-Google founders

Noam Shazeer and Daniel De Freitas, fathers of LaMDA, launch Character.AI: a platform letting anyone create and chat with AI characters, from Einstein to anime personas.

Foundation Models: Character.AI, Chatbot, Persona

September 14, 2022 High

Prompt Injection: when user input hijacks system instructions

Riley Goodside and Perez et al. formalize Prompt Injection: an attack where malicious user input overrides an LLM's system instructions, bypassing policies and guardrails.

AI Security: Prompt Injection, LLM Security, Adversarial Attacks

September 12, 2022 High

AudioLM: Google teaches a language model to listen and continue audio

AudioLM generates long-range coherent audio using two tiers of tokens — semantic and acoustic — with no text or score conditioning.

Voice & Audio: AudioLM, Language Model, Audio Generation

August 25, 2022 High

DreamBooth: generate your subject in any style with 3-5 photos

Google Research publishes DreamBooth: fine-tune a diffusion model on 3-5 images of a specific subject to reproduce it in any context or style. Foundation of all personalized AI image generation.

Image & Video Gen: DreamBooth, personalization, fine-tuning

August 22, 2022 Landmark

Stable Diffusion: image generation goes open

Stability AI publicly releases weights and code of a text-to-image latent diffusion model that runs on a consumer GPU. AI image generation leaves the cloud.

Image & Video Gen: Stable Diffusion, Stability AI, Diffusion Models

August 16, 2022 Medium

GitHub Copilot: 40% of code in active files written by AI

GitHub publishes first real-world data: 40% of code in files with Copilot active is AI-generated. First quantitative benchmark on AI tools' actual impact on developer output.

AI Coding: GitHub Copilot, Developer Productivity, Research

August 16, 2022 High

SayCan: grounding LLMs in robot affordances

Google Robotics shows how to combine an LLM for high-level planning with robot value functions that filter only physically executable actions.

Robotics: Google, SayCan, Embodied AI
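
The combination of LLM planning and value functions in this entry can be illustrated with a toy scorer. This sketch assumes the two signals are combined by multiplication and that the highest product wins. The skill names and numbers are invented for illustration.

```python
def pick_skill(llm_scores, affordances):
    """Pick the skill with the highest (LLM usefulness x affordance value) product."""
    combined = {
        skill: llm_scores[skill] * affordances.get(skill, 0.0)
        for skill in llm_scores
    }
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical scores: how useful the LLM thinks each skill is for the instruction.
llm_scores = {"pick up sponge": 0.6, "go to table": 0.3, "pick up apple": 0.1}
# Hypothetical value-function outputs: how likely each skill is to succeed right now.
affordances = {"pick up sponge": 0.1, "go to table": 0.9, "pick up apple": 0.2}

best, combined = pick_skill(llm_scores, affordances)
print(best)
```

Here the LLM prefers picking up the sponge, but the low affordance (the sponge is not within reach) pushes the robot to walk to the table first.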

July 22, 2022 High

diffusers v0.1: the standard library for diffusion models

Hugging Face releases diffusers, a modular Python library for diffusion models — text-to-image, audio and beyond. It quickly becomes the de facto standard.

Open Source Models: Hugging Face, Diffusers, Library

July 20, 2022 Medium

DALL-E 2 enters beta: generative image AI for the public

OpenAI opens DALL-E 2 in beta to over one million waitlist users, with a pay-per-image credit system. First large-scale consumer product for image generation.

Image & Video Gen: OpenAI, DALL-E 2, Beta

July 12, 2022 High

BLOOM 176B: the first truly open large multilingual LLM

The BigScience collective releases BLOOM, a 176-billion-parameter model trained on 46 human languages and 13 programming languages, under an open RAIL license.

Open Source Models: BigScience, BLOOM, Hugging Face

July 12, 2022 High

Midjourney opens public beta on Discord

Midjourney opens its public beta with a text-to-image model accessible via a Discord bot. Its strong aesthetic default and community turn image generation into a mass phenomenon.

Image & Video Gen: Midjourney, Discord, Text-to-Image

July 6, 2022 High

Red Teaming LLMs with LLMs: the DeepMind paper that changed safety testing

Perez et al. (DeepMind) show that an LLM can be used as an automatic attacker against another LLM, discovering undesired behaviors at a scale impossible for human teams.

AI Security: Red Teaming, DeepMind, LLM Safety

June 27, 2022 Medium

UL2: Google unifies pretraining paradigms with Mixture-of-Denoisers

Google Research combines three major pretraining objectives into a single 20B model, outperforming GPT-3 on many benchmarks at one-eighth the parameters.

Foundation Models: UL2, mixture of denoisers, pretraining

June 23, 2022 Medium

Tabnine 3.0: AI code completion with privacy-first and local models

Tabnine releases version 3.0 with local or cloud model support, becoming the first mature AI code completion product on the market before Copilot's rise.

AI Coding: Tabnine, Code Completion, Local AI

June 21, 2022 Landmark

FlashAttention: IO-aware attention that revolutionizes transformer training

Tri Dao (Stanford) publishes FlashAttention: an IO-aware implementation that avoids materializing the attention matrix in HBM, achieving 2-4x speedup and 10x less GPU memory.

AI Infrastructure: FlashAttention, Attention, Transformer
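
The idea of never materializing the full attention matrix can be shown with a running ("online") softmax over blocks of keys and values. The sketch below is plain Python for a single query, not the fused GPU kernel. The function names and block size are assumptions made for illustration.

```python
import math


def _scores(q, keys):
    """Scaled dot products between one query and a list of keys."""
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(a * b for a, b in zip(q, k)) for k in keys]


def attention_full(q, keys, values):
    """Reference: softmax over all scores at once, then a weighted sum of values."""
    scores = _scores(q, keys)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def attention_tiled(q, keys, values, block=2):
    """Same output, visiting keys/values one block at a time with a running softmax."""
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        scores = _scores(q, keys[start:start + block])
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new maximum.
        rescale = math.exp(running_max - new_max)
        running_sum *= rescale
        acc = [a * rescale for a in acc]
        for s, v in zip(scores, values[start:start + block]):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.5, -0.3], [0.0, 2.0], [-1.0, 0.4], [0.7, 0.7]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
full = attention_full(q, keys, values)
tiled = attention_tiled(q, keys, values, block=2)
print(full, tiled)
```

Only one block of scores exists at any time in the tiled version, which is the property that lets a real kernel keep the working set in fast on-chip memory.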

June 21, 2022 Landmark

GitHub Copilot: AI for code becomes a product for everyone

GitHub announces general availability of Copilot for all developers at $10/month. It's the first mass-market AI tool living inside the daily code editor.

AI Coding: GitHub, Copilot, OpenAI

June 17, 2022 High

SoundStream: Google's first real-time neural audio codec

SoundStream introduces Residual Vector Quantization to compress audio at 3kbps with quality surpassing Opus at 12kbps, founding the architecture of all modern neural codecs used in audio LLMs.

Voice & Audio: SoundStream, neural codec, RVQ
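
Residual Vector Quantization, mentioned in this entry, encodes a vector in stages: each stage quantizes whatever error the previous stages left behind. The sketch below uses tiny hand-written codebooks as an assumption. A real codec learns its codebooks and applies them to encoder outputs.

```python
def nearest(codebook, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def rvq_encode(x, codebooks):
    """Quantize x stage by stage; each stage encodes the previous residual."""
    residual = list(x)
    indices = []
    for codebook in codebooks:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected vector from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Hypothetical two-stage codebooks: a coarse stage and a fine correction stage.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]
codes = rvq_encode([1.3, 0.9], codebooks)
approx = rvq_decode(codes, codebooks)
print(codes, approx)
```

Dropping later stages at decode time lowers the bitrate at the cost of a coarser reconstruction, which is how a single model can serve several bitrates.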

June 6, 2022 Medium

Tortoise TTS: convincing voice cloning from 3 seconds of audio

James Betker releases Tortoise TTS, an open source model with few-second voice cloning and human-like vocal quality — the first real breakthrough in accessible TTS.

Voice & Audio: TTS, Voice Cloning, Open Source

May 23, 2022 High

Imagen: Google enters text-to-image generation

Google Research unveils Imagen, a text-to-image diffusion model that uses a frozen T5 text encoder and beats DALL-E 2 on benchmarks for photorealistic fidelity.

Image & Video Gen: Google, Imagen, Text-to-Image

May 12, 2022 High

Gato: DeepMind tries a single agent for 600+ tasks

DeepMind unveils Gato, a 1.2-billion-parameter Transformer that with the same weights plays Atari games, controls a robot arm, captions images and chats.

Multimodal AI: DeepMind, Gato, Generalist Agent

May 3, 2022 High

Meta OPT-175B: the first 175-billion LLM opened to researchers

Meta AI releases OPT-175B, a language model comparable in size to GPT-3, with weights available to researchers and a public training logbook.

Open Source Models: Meta, OPT, Open Source

April 29, 2022 High

DeepMind Flamingo: the first few-shot visual language model

Flamingo brings few-shot learning to vision: SOTA on VQA and captioning with no task-specific fine-tuning.

Multimodal AI: Visual Language Model, Few-Shot Learning, VQA

April 20, 2022 High

NaturalSpeech: Microsoft achieves human parity on LJSpeech benchmark

NaturalSpeech is the first TTS system to achieve a MOS statistically indistinguishable from recorded human speech on the LJSpeech benchmark, marking a historic milestone for speech synthesis.

Voice & Audio: NaturalSpeech, Microsoft, human parity

April 6, 2022 High

DALL·E 2: the quality leap in image generation

OpenAI announces DALL·E 2, a diffusion-based text-to-image model producing photorealistic 1024×1024 images. Access is initially waitlist-only, with a broader beta opening in July.

Image & Video Gen: OpenAI, DALL-E 2, Diffusion

April 5, 2022 Medium

PaLM 540B: Google's GPT-3 answer brings chain-of-thought

Google publishes PaLM, a 540B-parameter model trained on the new Pathways system. It demonstrates emergent reasoning capabilities when guided with chain-of-thought prompting.

Foundation Models: Google, PaLM, Pathways

March 29, 2022 Landmark

Chinchilla: the big models were undertrained

DeepMind publishes the Chinchilla paper and shows that, given equal compute, smaller models trained on far more tokens beat oversized undertrained ones.

Foundation Models: DeepMind, Chinchilla, Scaling Laws
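
The trade-off in this entry can be made concrete with back-of-the-envelope arithmetic. The sketch below rests on two commonly quoted approximations that are assumptions here, not figures from the entry: training compute C ≈ 6·N·D FLOPs for N parameters and D tokens, and a compute-optimal ratio of roughly 20 tokens per parameter.

```python
import math

TOKENS_PER_PARAM = 20.0       # assumed rule of thumb for the optimal ratio
FLOPS_PER_PARAM_TOKEN = 6.0   # assumed approximation: C ~ 6 * N * D


def compute_optimal(flops):
    """Split a FLOP budget into (parameters, tokens) under the assumptions above."""
    # C = 6 * N * D and D = 20 * N  =>  N = sqrt(C / 120)
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


# A budget of about 5.88e23 FLOPs corresponds to 70B parameters on 1.4T tokens.
params, tokens = compute_optimal(5.88e23)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
```

Under these assumptions, doubling the compute budget grows both the model and the dataset by about 1.4x each, rather than putting everything into parameters.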

March 22, 2022 Landmark

NVIDIA H100 and Hopper architecture: the foundation-model GPU

At GTC 2022 NVIDIA unveils the Hopper architecture and the H100 GPU, with FP8 Transformer Engine and NVLink 4. It will become the hardware substrate for nearly every large LLM of the following years.

AI Infrastructure: NVIDIA, H100, Hopper

March 21, 2022 High

Self-Consistency: sample multiple reasoning paths for better answers

Wang et al. (Google Brain) show that sampling N diverse reasoning paths and taking the most frequent final answer beats greedy chain-of-thought decoding across a range of arithmetic and commonsense reasoning benchmarks.

Foundation Models: Chain of Thought, Self-Consistency, Reasoning
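
The voting step in this entry is simple enough to sketch directly. The example assumes the final answers have already been extracted from each sampled reasoning path. The sample values are invented.

```python
from collections import Counter


def self_consistent_answer(sampled_answers):
    """Return the most frequent final answer and the share of samples that agree."""
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)


# Hypothetical final answers extracted from five sampled chains of thought.
samples = ["18", "18", "26", "18", "17"]
answer, agreement = self_consistent_answer(samples)
print(answer, agreement)
```

The agreement ratio is a useful by-product: a low value signals that the sampled paths disagree and the answer deserves less trust.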

February 2, 2022 High

AlphaCode: DeepMind takes on competitive programmers

DeepMind unveils AlphaCode, a system that generates code for competitive programming problems and ranks around the median of human participants (an estimated top 54%) in Codeforces contests.

AI Coding: DeepMind, AlphaCode, Competitive Programming

January 27, 2022 Medium

Coqui TTS: open source speech synthesis for everyone

Coqui TTS is an open source Python library for quality text-to-speech, forked from Mozilla TTS, supporting over 1100 languages and adopted by the HuggingFace community.

Voice & Audio: Coqui, TTS, Open Source

January 27, 2022 High

InstructGPT: the fine-tuning that teaches GPT to obey

OpenAI introduces InstructGPT: GPT-3 fine-tuned with reinforcement learning from human feedback (RLHF). Human raters prefer the outputs of the 1.3B-parameter InstructGPT model over those of the 175B GPT-3 base model, despite it being over 100x smaller.

Foundation Models: OpenAI, InstructGPT, RLHF

January 24, 2022 Medium

UnifiedIO (AI2): first unified sequence-to-sequence model for text, images, audio, and video

AI2 and University of Washington present UnifiedIO: the first sequence-to-sequence model capable of handling text, images, audio, video, and structured data as both inputs and outputs through a single architecture, trained on 80+ tasks simultaneously.

Multimodal AI: UnifiedIO, multimodal, unified model