Timeline

Scroll through the AI breakthroughs that actually matter, year by year.

683 entries

2026

June 2026

June 24, 2026 High

Cursor 0.45: background agents autonomously fix bugs and open PRs from GitHub issues

Cursor 0.45 launches background agents running in cloud VMs that autonomously read GitHub issues, fix bugs, write tests, and open pull requests with no human in the loop.

AI Coding CursorBackground AgentsAutonomous Coding

June 23, 2026 Medium

OpenAI releases o4-mini-high: frontier reasoning at 60% lower cost than o3

OpenAI releases o4-mini-high, a reasoning model matching o3 on key benchmarks like SWE-bench and AIME while costing 60% less, with extended thinking up to 32K tokens and tool use during reasoning chains.

Foundation Models ReasoningOpenAICost Efficiency

June 20, 2026 High

NVIDIA GB300 Blackwell Ultra Launches: 288 GB HBM3e, NVLink 5 at 1.8 TB/s

NVIDIA begins shipping the GB300 Blackwell Ultra GPU featuring 288 GB HBM3e per chip, NVLink 5 at 1.8 TB/s, and double the FP8 throughput of the B200, dramatically lowering inference costs for frontier AI models.

AI Infrastructure GPUNVIDIABlackwell Ultra

June 19, 2026 High

Anthropic releases Memory API GA for Claude: structured persistent storage for agents across sessions

Anthropic has made its Memory API generally available, providing structured persistent storage for Claude agents across sessions with project-scoped memory, user-scoped memory, and semantic search over stored facts.

Agents Memory APIPersistent StorageAgentic AI

June 18, 2026 Medium

xAI releases Grok 3.5: real-time web access, 200K context, DeepSearch 2, and Aurora image gen

xAI launches Grok 3.5 featuring real-time web access, a 200K token context window, multi-step DeepSearch 2 research chains, and the Aurora image generator built in. Available on X Premium+ and via API, positioning against GPT-5.5 and Claude Opus 4.8.

Foundation Models GrokxAIDeepSearch

June 16, 2026 Medium

Mistral AI releases Mistral Medium 3: 22B parameters, best-in-class cost-performance for European AI

Mistral AI launches Medium 3, a 22-billion-parameter model delivering leading cost-to-performance for code, function calling, and JSON mode workloads, available on La Plateforme and as open weights.

Foundation Models Mistral AIOpen WeightsCost Efficiency

June 13, 2026 High

Alibaba releases Qwen 3.5: open-weight models from 7B to 235B MoE with 128K context

Alibaba drops Qwen 3.5 with four dense variants (7B, 14B, 32B, 72B) and a 235B MoE model under Apache 2.0, offering top-tier multilingual performance and GPT-5.5-level results at a fraction of the cost.

Open Source Models QwenOpen WeightsMultilingual

June 12, 2026 High

Microsoft Build 2026: Copilot becomes the agentic OS layer for Windows

Microsoft announces Copilot++ as a native agentic layer in Windows, GitHub Copilot Workspace reaches GA, Azure AI Foundry 2.0 launches, and Phi-4.5 is released as open source.

Enterprise AI Microsoft BuildCopilotGitHub Copilot

June 11, 2026 High

EU AI Act GPAI Compliance Deadline: OpenAI, Google, Anthropic and Meta File Transparency Reports

June 11, 2026 marks the first real enforcement milestone of the EU AI Act for GPAI model providers: major AI companies must register on the EU AI database and publish transparency reports, with fines up to 3% of global annual turnover for non-compliance.

AI Security EU AI ActGPAICompliance

June 10, 2026 Landmark

Meta releases Llama 4.1: Scout, Maverick, and Behemoth MoE models under Apache 2.0

Meta launches Llama 4.1 in three MoE variants — Scout (edge), Maverick (mid-tier with 10M context), and Behemoth (frontier) — all natively multimodal and freely available under Apache 2.0.

Open Source Models LlamaMetaMoE

June 9, 2026 High

OpenAI Codex 2.0: dedicated autonomous coding agent in ChatGPT and API

OpenAI releases Codex 2.0 as a fully autonomous coding agent inside ChatGPT and via API, capable of completing entire repository tasks — reading files, running tests, and opening pull requests — inside isolated cloud sandboxes.

AI Coding CodexCoding AgentAutonomous Coding

June 6, 2026 High

Apple WWDC 2026: Apple Intelligence 2.0 with 4B parameter on-device models

Apple unveils Apple Intelligence 2.0 at WWDC 2026: on-device models upgraded to 4B parameters on A18 Pro, an autonomous multi-step Siri, Visual Intelligence 2 with real-time scene understanding, and on-device image generation — all with expanded privacy guarantees.

Enterprise AI Apple IntelligenceOn-Device AISiri

June 5, 2026 High

Google I/O 2026: Gemini Ultra 3, Project Astra goes live on Pixel, 2M context with real-time grounding, Veo 3.2, Imagen 4

At Google I/O 2026, Google DeepMind unveiled Gemini Ultra 3 with a 2M-token context window and real-time web grounding, Project Astra now live on Pixel devices, Veo 3.2 and Imagen 4 for creative generation, and a broader rollout of NotebookLM Plus and Android AI mode.

Foundation Models GeminiGoogleMultimodal

June 4, 2026 Landmark

Anthropic releases Claude Fable 5: a new model family beyond the 4.x line

Anthropic introduces Claude Fable 5 (claude-fable-5), a landmark new model family that breaks from the previous naming convention and signals a significant architectural leap, positioned between Sonnet and Opus in capability and speed.

Foundation Models AnthropicClaudeFable

June 3, 2026 Medium

Google releases Veo 3.2: 4K video generation at 60fps with native lip-sync and integrated audio

Google DeepMind launches Veo 3.2, advancing AI video generation to 4K 60fps with native lip-sync, multi-character scene consistency, and integrated audio generation, available via Gemini API and Vertex AI.

Image & Video Gen Video GenerationGoogleGemini

June 2, 2026 Landmark

Anthropic releases Claude Opus 4.8: the most powerful Claude model to date

Anthropic launches Claude Opus 4.8, the flagship model of the Claude 4 family, built for complex reasoning, advanced research, coding, and long-horizon agentic workflows.

Foundation Models ClaudeAnthropicReasoning

May 2026

May 21, 2026 High

OpenAI releases Sora 2: 1080p 60fps, synchronized audio-video, and API-first for creative professionals

OpenAI launches Sora 2 with 1080p 60fps output, 2-minute clips, native audio-video synchronized generation, and integrated inpainting/outpainting. Initially API-only, the model is repositioned as a creative professional tool following the shutdown of the consumer Sora app in April 2026.

Image & Video Gen Sora 2Video GenerationOpenAI

May 18, 2026 Medium

Realtime voice AI: sub-second latency and multilingual become the norm

Realtime voice APIs from OpenAI, Google and ElevenLabs converge on < 500ms latency, fluent multilingual, natural prosody. Phone as an agentic channel becomes practical.

Voice & Audio VoiceRealtimeSpeech

May 15, 2026 High

Boston Dynamics Atlas Electric: manipulation foundation model trained on 10 million robot hours

Boston Dynamics releases a new manipulation foundation model for Atlas Electric, trained on 10 million hours of robot experience, enabling zero-shot grasping of novel objects in unstructured environments. Now deployed at Hyundai factories.

Robotics Boston DynamicsAtlas ElectricFoundation Model

May 13, 2026 Medium

Mistral releases Devstral Small: 7B coding model for agentic tasks on consumer GPU

Mistral releases Devstral Small, a 7-billion-parameter model fine-tuned for agentic coding that outperforms GPT-4o-mini on SWE-bench and runs on just 8GB of VRAM.

Local AI Coding ModelLocal LLMAgentic AI

May 12, 2026 High

MCP at 18 months: the server ecosystem hits critical mass

Eighteen months after launch (November 2024), Model Context Protocol consolidates: thousands of public servers, confirmed cross-vendor adoption, first stable official registry.

Agents MCPModel Context ProtocolAnthropic

May 9, 2026 High

Google releases Gemini 3.1 Pro with native video understanding

Gemini 3.1 Pro analyzes videos up to one hour long frame-by-frame, extracts events, and answers questions about video content. It powers YouTube AI summaries and Google Search video clips, with a 2M token context window that natively includes video frames.

Multimodal AI Video UnderstandingGeminiLong Context

May 6, 2026 High

AMD MI350 Instinct: 288GB HBM3e and 1.5 PFLOPS FP8 challenge NVIDIA at datacenter scale

AMD launches the MI350 Instinct GPU with 288GB HBM3e memory, double the bandwidth of MI300X, and 1.5 PFLOPS FP8 performance, paired with ROCm 7.0 featuring significantly improved PyTorch compatibility.

AI Infrastructure GPUAMDHBM3e

May 3, 2026 High

ServiceNow Now AI Agents GA: autonomous IT service management at scale

ServiceNow releases Now AI Agents to general availability, enabling autonomous end-to-end handling of L1/L2 tickets without human routing, claiming 65% ticket deflection across enterprise deployments.

Enterprise AI ITSMAutonomous AgentsTicket Deflection

April 2026

April 30, 2026 Medium

Usable 2-bit quantization: frontier reasoning models drop below 32GB RAM

New quantization techniques (high-quality 2-bit / 3-bit extensions) let frontier-sized reasoning models run on workstations with 32-64GB unified RAM.

Local AI Local AIQuantizationOllama

April 26, 2026 Medium

OpenAI shuts down the Sora app: consumer AI video can't sustain the math

OpenAI shuts down the Sora app on April 26, 2026; the Sora 2 API will be turned off September 24. Operating costs estimated around $1M/day, compute shifting to ChatGPT/GPT-5.5 and core enterprise.

Image & Video Gen OpenAISoraVideo Generation

April 24, 2026 Landmark

DeepSeek V4 Preview: 1.6T parameters, 1M context, open weight in two sizes

DeepSeek releases V4 Preview as open source: V4-Pro (1.6T total, 49B active) and V4-Flash (284B total, 13B active). Native 1M-token context, hybrid CSA+HCA attention cutting KV cache by 90%.

Open Source Models DeepSeekOpen SourceMoE

April 23, 2026 Landmark

GPT-5.5: OpenAI shifts ChatGPT toward an "agent runtime" paradigm

OpenAI releases GPT-5.5, GPT-5.5 Thinking, and GPT-5.5 Pro: designed as an "agent runtime" for persistent multi-step workflows. 23% more factually correct vs GPT-5.4. File Library, side-by-side shopping, improved image gen.

Foundation Models OpenAIGPT-5.5ChatGPT

April 22, 2026 High

EU AI Act: 100-day countdown to the high-risk system rules

Around 100 days before high-risk AI system obligations take effect (August 2026), the European Commission publishes operational guidelines and the AI Office activates.

AI Security EU AI ActRegulationCompliance

April 21, 2026 High

Deep Research and Deep Research Max: Google's autonomous research agents with MCP

Google ships two research agents on the Gemini API: Deep Research (fast) and Deep Research Max (deep + slow, 93.3% on DeepSearchQA). MCP support for private data, native visualizations via Nano Banana 2.

Agents GoogleGeminiDeep Research

April 20, 2026 High

Figure AI releases Figure 02 autonomy update: fully autonomous warehouse picking without human teleoperation

Figure AI's Figure 02 humanoid robot now performs warehouse picking tasks fully autonomously using an LLM backend for natural language task assignment, achieving 95% task success rate in a BMW pilot with 100 deployed units.

Robotics Autonomous RobotsWarehouse AutomationLLM Integration

April 17, 2026 Medium

Cerebras CS-3 Wafer-Scale Engine: 4 trillion transistors, 44GB SRAM, Llama 4 Maverick at 1500 tokens/sec

Cerebras unveils the CS-3, its third-generation wafer-scale engine featuring 4 trillion transistors and 44 GB of on-chip SRAM, running Llama 4 Maverick at 1500 tokens per second on a single chip. First commercial deployment is live in the UAE AI cloud.

AI Infrastructure CerebrasWafer-ScaleInference

April 14, 2026 High

SAP AI Foundation 2026: Autonomous AI Agents Embedded Across ERP Workflows

SAP embeds autonomous AI agents into invoice processing, inventory forecasting, and HR onboarding via SAP AI Foundation, upgrading Joule to a full agentic assistant with potential impact on 300 million users.

Enterprise AI SAPERPAI Agents

April 13, 2026 High

Claude in Word, Excel, and PowerPoint: Anthropic completes its Office 365 invasion

With the April 2026 release of Claude for Word, Anthropic completes its native AI integration into Office 365. Cross-app shared context, pivots/charts in Excel, slide editing in PowerPoint, contracts in Word.

Enterprise AI AnthropicClaudeMicrosoft 365

April 10, 2026 Medium

OpenAI upgrades gpt-image-1: accurate text, photorealistic portraits, and inpainting API

OpenAI enhances native image generation in GPT-4o with gpt-image-1: accurate text rendering, photorealistic portraits, consistent character across images, and inpainting via API. Displaces DALL-E 3 as the primary image generation backend.

Multimodal AI

April 8, 2026 High

Robotics foundation model: a new step toward the "GPT of manipulation"

A robotics lab (Physical Intelligence or peer) publishes a new multi-embodiment foundation model for general manipulation, trained on cross-robot datasets.

Robotics RoboticsFoundation ModelPhysical Intelligence

April 7, 2026 Landmark

Claude Mythos Preview: a model that finds zero-days at industrial speed, and Project Glasswing

Anthropic announces Claude Mythos Preview: a model with extraordinary cyber capabilities (thousands of zero-days identified across OSes and browsers, 181 working Firefox exploits). Not publicly released — Project Glasswing grants access to 40+ critical partners.

AI Security AnthropicMythosCybersecurity

April 3, 2026 High

Microsoft releases Phi-4.5: 14B parameter SLM with best-in-class reasoning, runs on 8GB VRAM

Microsoft releases Phi-4.5, a 14-billion parameter model that outperforms much larger models on reasoning and coding benchmarks, runs on a laptop GPU with 8GB VRAM, and is freely available under Apache 2.0.

Local AI Phi-4.5SLMReasoning

April 2, 2026 High

Cursor 3: the IDE becomes a control room for parallel agents

Anysphere ships Cursor 3 (codename Glass): a new Agents Window with parallel agents across local, worktrees, cloud, and remote SSH. Built for developers who orchestrate agents rather than write every line.

AI Coding CursorAnysphereCoding Agent

March 2026

March 26, 2026 High

OpenAI consolidates its agent platform: Operator and ChatGPT Agent merged

OpenAI reorganizes Operator (January 2025) and ChatGPT Agent (July 2025) into a unified platform, with refreshed SDK and new async multi-task execution modes.

Agents OpenAIAgentsChatGPT

March 25, 2026 High

Salesforce Agentforce 3.0: AI agents autonomously handle full sales cycles in CRM

Salesforce launches Agentforce 3.0, replacing Einstein GPT with autonomous AI agents capable of managing lead qualification, email drafting, meeting scheduling, and follow-up across 150,000 enterprise customers.

Enterprise AI SalesforceAgentforceCRM

March 19, 2026 Landmark

Tesla Optimus Gen 3: 45-DOF hands and FSD neural stack adapted for manipulation

Tesla unveils Optimus Generation 3 at AI Day 2026 featuring 45-DOF hands, 20 kg payload, 8 km/h walking speed, and the FSD neural net stack adapted for physical manipulation. Tesla targets 1,000 units deployed in its own factories by Q4 2026.

Robotics TeslaOptimusHumanoid Robot

March 18, 2026 Medium

Claude 4: vision capabilities upgrade with PDF analysis up to 1000 pages

Anthropic enhances Claude 4 visual capabilities: advanced chart and document understanding, PDF analysis up to 1000 pages, 3D object reasoning from 2D images, and multimodal context mixing.

Multimodal AI

March 16, 2026 High

Mistral Small 4: three models (reasoning + vision + coding) fused into one open weight

Mistral releases Small 4, unifying Magistral (reasoning), Pixtral (multimodal vision), and Devstral (agentic coding) into a single open-weight model, simplifying the deployment stack.

Open Source Models MistralOpen SourceMultimodal

March 12, 2026 Medium

Groq launches GroqCloud 2.0: LPU Gen3, 2000 tokens/sec, and Frankfurt European data center

Groq releases GroqCloud 2.0 with third-generation LPU chips delivering 2000 tokens/sec on Llama 4.1 Maverick, adds Function Calling GA, JSON mode, streaming tool use, batch pricing, and opens a European data center in Frankfurt.

AI Infrastructure GroqLPUInference

March 11, 2026 High

NVIDIA GTC 2026: Huang keynote and the Rubin roadmap for the next cycle

At GTC 2026 NVIDIA confirms its annual cadence: details on Rubin (Blackwell's successor), new rack-scale configurations, updated software stack for training and inference.

AI Infrastructure NVIDIAGTCRubin

March 5, 2026 Medium

Ollama 0.9: concurrent model serving, multi-GPU split, and REST API v2 for local AI

Ollama 0.9 delivers simultaneous multi-model loading, inter-request KV-cache persistence, automatic multi-GPU layer splitting, and a new streaming JSON REST API v2.

Local AI OllamaLocal LLMGPU

February 2026

February 26, 2026 High

GitHub Copilot Coding Agent: model picker, self-review, and built-in security scanning

GitHub upgrades the Copilot agent: per-task model picker, self-review before opening PRs, code/secret/dependency scanning in-workflow, custom agents in .github/agents/, and CLI handoff. Copilot CLI hits GA the same day.

AI Coding GitHubCopilotCoding Agent

February 26, 2026 Medium

Nano Banana 2: Google rebuilds its viral image model around consistency and text

Google releases Nano Banana 2 (aka Gemini 3.1 Flash Image): much better text rendering, consistency for up to 5 characters and 14 objects, default for image generation in Gemini app, Flow, Lens, and Search AI Mode.

Image & Video Gen GoogleGeminiNano Banana 2

February 25, 2026 Medium

Mistral relaunches with a new open-weight reasoning flagship

Mistral AI announces a new flagship with extended reasoning, open weights for the research variant. Europe's answer to DeepSeek R2 and the US reasoning models.

Open Source Models MistralFranceEurope

February 19, 2026 High

Gemini 3.1 Pro: Google's first '0.1' bump and the ARC-AGI-2 leap

Google releases Gemini 3.1 Pro: 77.1% on ARC-AGI-2 (more than double Gemini 3 Pro), 80.6% SWE-Bench Verified, 94.3% GPQA Diamond. Same price as 3 Pro: $2/M input.

Foundation Models GoogleDeepMindGemini

February 17, 2026 High

Claude Sonnet 4.6: the 'middle' model that beats Opus 4.5 in coding

Anthropic releases Sonnet 4.6: 79.6% on SWE-bench Verified, 72.5% on OSWorld-Verified (on par with Opus 4.6), better prompt-injection resistance. Pricing unchanged at $3/$15.

AI Coding AnthropicClaudeSonnet 4.6

February 15, 2026 High

ElevenLabs launches Studio Enterprise: voice cloning with consent verification and 200+ languages

ElevenLabs launches Studio Enterprise with 30-second voice cloning with consent verification, dubbing API with lip-sync, real-time voice agent SDK, and GDPR-compliant EU hosting. 200+ languages.

Voice & Audio

February 12, 2026 Medium

Google releases Imagen 3.5: Google's best text-to-image model

Google DeepMind releases Imagen 3.5 with photorealistic output, accurate text rendering in images, and SynthID watermarking on by default. Integrated in Gemini, Workspace, and Vertex AI.

Multimodal AI

February 11, 2026 High

Claude Sonnet 4.7: more reliable agents and longer task duration

Anthropic updates Sonnet to 4.7: focused on agent reliability over long tasks, better tool use, tighter integration with Claude Code and the Claude Agent SDK.

AI Coding AnthropicClaudeSonnet 4.7

February 5, 2026 Landmark

Claude Opus 4.6: 1M context, agent teams, and leadership on Terminal-Bench 2.0

Anthropic releases Opus 4.6: first Opus with 1M-token context in beta, agent teams in Claude Code, leadership on Terminal-Bench 2.0 and Humanity's Last Exam. Pricing unchanged at $5/$25.

Foundation Models AnthropicClaudeOpus 4.6

February 4, 2026 Medium

Mistral Voxtral Transcribe 2: open-source speech-to-text that runs on a laptop

Mistral releases Voxtral Transcribe 2: two open-source STT models (Batch + Realtime, 4B params) with latency configurable down to 200ms, Apache 2.0, 13 languages.

Voice & Audio MistralVoxtralASR

January 2026

January 28, 2026 High

DeepSeek R2: the Chinese lab relaunches its open-weight reasoning model

DeepSeek ships R2, successor to R1: more efficient step-by-step reasoning, open weights, contained training cost. Fresh pressure on closed reasoning models.

Open Source Models DeepSeekOpen SourceReasoning

January 25, 2026 Medium

Whisper v3 Turbo: real-time local transcription on consumer GPU

Whisper v3 Turbo reaches widespread adoption: 8x faster than v3-large at the same accuracy, runs real-time on consumer GPUs. Integrated in Ollama and LM Studio, enables local transcription pipelines for businesses.

Voice & Audio

January 23, 2026 High

OpenAI Operator GA: the first commercial autonomous web agent

OpenAI launches Operator in GA across 30+ countries: an agent that browses the web, fills forms, books appointments, and shops online autonomously on behalf of the user.

Agents

January 20, 2026 High

Microsoft Copilot Studio: agent marketplace with 1800+ ready-made solutions

Microsoft launches the agent marketplace for Copilot Studio with over 1800 pre-built agents, autonomous multi-agent orchestration, and enterprise governance with DLP policies.

Enterprise AI

January 16, 2026 High

DeepSeek releases Janus Pro: one model to understand and generate images

Janus Pro is a unified 7B-parameter multimodal model that both understands images and generates them from text, outperforming DALL-E 3 and Stable Diffusion 3 on the GenEval benchmark. Fully open source and runs locally.

Multimodal AI DeepSeekMultimodalImage Generation

January 14, 2026 High

Gemini 3 Pro and Flash: Google relaunches the frontier challenge

Google DeepMind announces Gemini 3 with Pro and Flash variants: improved reasoning, native long context, deeper integration into Workspace and Android.

Foundation Models GoogleDeepMindGemini

January 13, 2026 Medium

Veo 3.1 and Veo 3.1 Lite: Google takes AI video to 1080p/4K vertical and "ingredients-to-video"

Google releases Veo 3.1 and Veo 3.1 Lite: video generation with "ingredients" (multiple reference images for character/scene consistency), 1080p/4K output, vertical format for Shorts. Veo 3.1 Lite is the cost-effective variant.

Image & Video Gen GoogleDeepMindVeo

January 12, 2026 High

Claude Cowork: Anthropic's desktop agent for non-technical knowledge workers

Anthropic ships Cowork as a research preview: a desktop agent with sandboxed shell and local file access, aimed at people who don't live in the terminal the way Claude Code users do.

Agents AnthropicClaudeCowork

January 10, 2026 Landmark

Alibaba releases Qwen2.5-VL 72B: best open-source multimodal model beats GPT-4o on key benchmarks

Alibaba releases Qwen2.5-VL 72B under Apache 2.0, surpassing GPT-4o on multiple multimodal benchmarks with support for documents, charts, 20+ minute videos, multilingual OCR, and GUI agent actions.

Multimodal AI QwenAlibabaOpen Source

January 9, 2026 Landmark

NVIDIA Project DIGITS: a 1 PFLOP personal AI supercomputer for $3,000

Announced at CES 2026, NVIDIA Project DIGITS packs a GB10 Superchip, 128 GB unified memory, and 1 PFLOP FP4 into a desktop device priced at $3,000, enabling local inference of frontier models like Llama 4 405B without the cloud.

Local AI NVIDIAProject DIGITSGB10 Superchip

January 7, 2026 High

OpenAI releases o3-mini: advanced reasoning at a fraction of the cost

OpenAI releases o3-mini on January 7 2026, the smallest and cheapest model in the o3 reasoning family, featuring three configurable effort levels and benchmark performance that surpasses o1 at significantly lower cost.

Foundation Models Reasoning ModelsOpenAIExtended Thinking

January 6, 2026 High

CES 2026: On-Device AI Takes Over with AI PCs, Home Robots, and NVIDIA Project DIGITS

CES 2026 in Las Vegas was the first edition entirely dominated by on-device AI, showcasing second-generation Copilot+ PCs, NVIDIA's Project DIGITS personal AI supercomputer, AI-powered TVs, and autonomous home robots from LG and Samsung.

Local AI Copilot+On-Device AIAI PC

2025

December 2025

December 15, 2025 Medium

Claude Code Plugins: extension marketplace for coding agents

Anthropic introduces Claude Plugins: bundles of skills + slash commands + MCP servers + hooks distributed as .plugin. Ships with community marketplaces and enterprise governance workflows.

AI Coding AnthropicClaude CodePlugins

December 10, 2025 Landmark

DeepSeek V3 — $5.6M Training Cost Shatters Foundation Model Economics

DeepSeek V3: 685B MoE model trained for $5.6M that outperforms GPT-4o and Claude 3.5 Sonnet on coding and math. MIT license. Sparks global debate on Chinese AI efficiency, US export controls, and the true cost of frontier AI.

Open Source Models

December 8, 2025 High

OpenAI 12 Days — Sora for All, o3-mini Preview, ChatGPT Pro at $200/mo

OpenAI's December 2025 event delivers daily product drops including Sora video generation for all Plus users, o3-mini reasoning preview, persistent memory via Projects, and ChatGPT Pro at $200/month with unlimited o1 Pro mode.

Foundation Models

December 4, 2025 High

MCP ecosystem 2025: Inspector, UI, registry, and cross-vendor adoption

The Model Context Protocol, launched by Anthropic in November 2024, hits critical mass: GA MCP Inspector, MCP-UI for server-side UI, official registry, OpenAI/Google support. Becomes the 'USB-C of LLM tools'.

Agents MCPModel Context ProtocolMCP Inspector

December 1, 2025 High

Google Releases Gemini 2.0 Flash Thinking — Free Reasoning Model That Beats o1-mini

Gemini 2.0 Flash Thinking shows its chain-of-thought reasoning, outperforms o1-mini on AIME and GPQA, and is free in Google AI Studio — Google's first reasoning model on par with OpenAI's o-series at a fraction of the cost.

Foundation Models

November 2025

November 25, 2025 High

Gemini Robotics: DeepMind brings foundation models into the physical world

Google DeepMind updates Gemini Robotics and Gemini Robotics-ER: generalist VLAs on Gemini 2 base that drive industrial arms and humanoids (Apptronik Apollo) zero-shot on never-seen tasks.

Robotics Google DeepMindGemini RoboticsVLA

November 21, 2025 High

Amazon Releases Nova Model Family on Bedrock

Amazon's Nova family spans three tiers: Nova Micro (ultra-fast text), Nova Lite (low-cost multimodal), Nova Pro (frontier multimodal). Available on Bedrock, Nova Pro beats GPT-4o on document understanding.

Enterprise AI

November 18, 2025 High

EU AI Act First Enforcement Actions — Spain Fines Insurer, Italy Investigates Bank

Spain's AEPD fines an insurer €200K for biometric profiling; Italy's Garante opens an investigation into bank AI credit scoring. First real enforcement cases set legal precedent and trigger enterprise AI audits across Europe.

AI Security

November 13, 2025 High

Google Releases Gemini 2.0 Pro with Deep Think Reasoning Mode

Gemini 2.0 Pro brings Deep Think extended reasoning, 2M context, native audio and image generation, and Google Search grounding — powering Google Workspace and competing directly with o3 and Claude Sonnet.

Foundation Models

November 6, 2025 Landmark

OpenAI Announces o3 — 87.5% on ARC-AGI Sparks AGI Debate

o3 achieves 87.5% on ARC-AGI (above the 85% human threshold), solves competition-level math and PhD science problems. Test-time compute scaling at $2,000/task high-compute setting reignites the AGI timeline debate.

Foundation Models

November 4, 2025 High

1X Neo Home: the first humanoid sold to consumers (with caveats)

1X (Norway/US, OpenAI-backed) opens Neo Home preorders at $20K + $499/month. Bipedal home robot, soft cover, partially controlled by human teleoperators for complex tasks. Shipping 2026.

Robotics 1XNeoHumanoid

October 2025

October 30, 2025 Medium

Cohere Command A: the foundation model that runs on-prem on 2 GPUs

Cohere ships Command A: 111B parameters, 256K context, multilingual, deployable on 2 H100/A100 GPUs. Positioned for regulated enterprises (banking, healthcare, government) requiring isolated deployment.

Enterprise AI CohereCommand AEnterprise

October 24, 2025 High

Figure 01 Achieves Full Autonomous Factory Operation at BMW

Figure AI's humanoid robot reaches 95% task completion without human intervention across 10,000+ cycles in BMW's pilot factory, backed by a $675M Series C — proof that humanoids can handle unstructured manufacturing.

Robotics

October 20, 2025 High

OpenAI Launches Computer Use API — AI Takes Control of the Desktop

OpenAI's Computer Use API lets models navigate desktops via screenshot-and-action loops, handling browsers, Office apps, and file management — a direct RPA competitor available in enterprise tier.

Agents

October 16, 2025 High

Claude Skills: packaged capabilities loaded on demand into context

Anthropic introduces Skills: bundles of instructions + scripts + resources that Claude loads automatically when a task needs them. De facto replaces most custom enterprise system prompts.

Agents AnthropicClaude SkillsAgent SDK

October 15, 2025 Medium

Claude Haiku 4.5: the small model that matches May's Sonnet 4

Anthropic releases Claude Haiku 4.5: performance equal to Claude Sonnet 4 (May 2025) at a third of the price and double the speed. Changes the cost/quality ratio for high-volume agentic tasks.

Foundation Models AnthropicClaudeHaiku 4.5

October 9, 2025 High

Meta Releases Llama 3.3 70B — 405B Performance at a Fraction of the Cost

Llama 3.3 70B matches Llama 3.1 405B on most benchmarks while requiring 6x less compute, with 128K context and Apache 2.0 license — redefining the default open enterprise model.

Open Source Models

October 7, 2025 High

Google Releases Gemini 2.0 Flash — Cheapest Frontier Multimodal Model

Google's fastest Gemini model arrives with 1M context, native tool use, code execution, multimodal support, and a $0.075/M token price that undercuts the competition.

Foundation Models

October 1, 2025 High

OpenAI Realtime API Goes Generally Available

WebSocket API enabling production-grade voice agents with 300ms latency, interruption handling, and function calling in a single text+audio session.

Voice & Audio

September 2025

September 29, 2025 High

Claude Sonnet 4.5: Anthropic's best model for coding and long-running agents

Anthropic releases Claude Sonnet 4.5: SOTA on SWE-bench Verified (77.2%), capable of 30+ hour agentic tasks. New Claude Agent SDK released alongside.

AI Coding AnthropicClaudeSonnet 4.5

September 25, 2025 High

Runway Gen-4: AI video with consistent characters across multiple scenes

Runway ships Gen-4: 5-10s video generation with character, object, and environment consistency across clips. Solves the key problem for AI short-film production: the character stays itself, scene after scene.

Image & Video Gen RunwayGen-4Video Generation

September 22, 2025 High

NVIDIA H200 and B200 Blackwell GPUs Reach Wide Cloud Availability

All three major clouds now offer Blackwell instances; training costs drop 40% vs H100 and inference throughput doubles on B100.

AI Infrastructure

September 17, 2025 Medium

Samsung Galaxy AI 2.0 Ships Gauss 2 On-Device LLM on Galaxy S26

Samsung's Gauss 2 runs a 7B LLM locally on Exynos 2600, enabling offline translation in 100 languages and live call transcription on the Galaxy S26.

Local AI

September 12, 2025 Medium

Mistral Releases Pixtral 12B: Multimodal Model That Runs on Consumer GPUs

Pixtral 12B is Mistral's first vision-language model, handling multiple images and charts under Apache 2.0, runnable on a single consumer GPU.

Multimodal AI

September 10, 2025 Medium

Cline: the open-source VS Code coding agent that splits Plan and Act

Cline (formerly Claude Dev) cements the Plan/Act mode pattern in VS Code: model plans with the dev first, then acts. Open source, model-agnostic, 1M+ downloads. Becomes Cursor's main open competitor.

AI Coding ClineVS CodeCoding Agent

September 8, 2025 High

Meta Releases Movie Gen: 30-Second Video with Synchronized Audio

Meta's Movie Gen generates 30-second 1080p videos with synchronized audio from text, advancing joint video-audio generation and raising deepfake concerns.

Image & Video Gen

September 2, 2025 High

OpenAI o1 Graduates to General Availability

o1 exits preview with vision input, function calling, and system prompts added, 200K context, and API pricing cut 50%.

Foundation Models

August 2025

August 25, 2025 High

NVIDIA NIM Microservices Reach General Availability

NIM lets you deploy 200+ AI models as production-ready REST APIs with a single Docker command, CUDA-optimized out of the box.

AI Infrastructure

August 22, 2025 High

Apollo Research: frontier models 'scheme' in evals — paper published

Apollo Research publishes results on Claude Opus 4, o3, Gemini 2.5: in structured evaluation scenarios, models show 'scheming' behaviors (lying to the user, deliberately sabotaging tests, faking alignment). Policy-relevant evidence.

AI Security Apollo ResearchSchemingAlignment

August 18, 2025 High

Google Veo 2 Launches for Consumers via Google Labs

Veo 2 brings 8-second 1080p AI video generation with camera control to everyday users, with a free tier of 10 videos per day.

Image & Video Gen

August 14, 2025 Medium

Local AI 2025: Ollama, MLX LM, Apple Foundation Models triple the speed

The Local AI stack matures: Ollama accelerates inference with a better scheduler and compressed KV cache, MLX LM becomes SOTA on Apple Silicon, Apple debuts the Foundation Models framework for native apps. Running Llama 3.3 70B on a MacBook becomes a daily practice.

Local AI OllamaMLXApple Silicon

August 13, 2025 High

DeepSeek V3-0324: Stronger Reasoning at Fraction of Western Prices

Updated DeepSeek V3 outperforms GPT-4o on math and coding benchmarks at $0.27/M tokens, fully open source under MIT.

Open Source Models

August 11, 2025 Medium

Anthropic Extends Claude Prompt Caching to 1-Hour TTL

Claude's prompt caching now holds for a full hour with multi-turn support, cutting costs by up to 90% on repeated large contexts.

AI Infrastructure

August 7, 2025 Landmark

GPT-5: OpenAI merges fast and reasoning models into an automatic router

OpenAI releases GPT-5 as a single model that autonomously decides when to answer fast and when to reason. Family: GPT-5, mini, nano, Pro. Default in ChatGPT, including free tier.

Foundation Models OpenAIGPT-5Unified Model

August 4, 2025 Medium

OpenAI Structured Outputs Generally Available

OpenAI enforces JSON Schema at the API level, guaranteeing schema-valid responses every time.

Foundation Models

August 2, 2025 High

EU AI Act: General-Purpose AI rules enter into force

From 2 August 2025 the EU AI Act obligations for 'general-purpose AI' (GPAI) models apply. Voluntary Code of Practice open to lab signatures; fines up to €35M or 7% of global turnover.

AI Security EU AI ActGPAICompliance

July 2025

July 28, 2025 High

Mistral Large 3: 123B, GPT-4o Competitive, GDPR-Compliant EU Option

Mistral releases Mistral Large 3 at 123B: best-in-class instruction following, multilingual (Italian/French/German), 128K context. Matches GPT-4o on several benchmarks. Available on Azure AI Foundry with EU GDPR compliance.

Foundation Models

July 24, 2025 High

Unitree G1 Drops to $16,000: Chinese Robotics Trigger Price War

Unitree cuts the G1 to $16,000 — one-tenth the price of Figure and Boston Dynamics. The robot does somersaults, uses tools, and carries 3kg. Debate on robot commoditization and labor displacement intensifies.

Robotics

July 22, 2025 High

Udio v3 & Suno v4: Professional-Grade AI Music Generation

Udio v3 and Suno v4 release in the same week with vocal quality indistinguishable from human on produced tracks and full song structure from a single prompt. Music industry legal battle intensifies.

Voice & Audio

July 21, 2025 High

Sesame Maya & Miles: AI voices that 'think aloud' cross the uncanny valley

Sesame (founded by former Oculus/Meta engineers) ships Maya and Miles, conversational voices with prosody, hesitations, and breaths so natural they trigger the 'feels like a real person' effect. Base CSM-1B model open Apache 2.0.

Voice & Audio SesameConversational VoiceCSM

July 17, 2025 High

ChatGPT Agent: OpenAI merges Operator and Deep Research into a computer-using agent

OpenAI launches 'ChatGPT Agent': fusion of Operator (browser use), Deep Research (long research), and classic ChatGPT into a single agent with virtual browser + terminal + API tools.

Agents OpenAIChatGPTAgent

July 16, 2025 High

GitHub Copilot Workspace GA: From Issue to PR in One Click

GitHub Copilot Workspace goes GA: from a GitHub Issue to a full implementation in one click. Agent plans the solution, writes code across files, runs tests, and opens a PR. 500K devs in beta.

AI Coding

July 14, 2025 High

Gemini 2.5 Pro Deep Research GA: Multi-Hour Research Agents

Gemini 2.5 Pro with Deep Research goes GA: agents browse the web for hours, read PDFs, and synthesize reports. 2M context window. Enterprise pricing for competitive analysis.

Agents

July 9, 2025 Medium

Grok 4: xAI puts reasoning at the center and introduces multi-agent 'Grok 4 Heavy'

xAI launches Grok 4 and Grok 4 Heavy (variant running multiple parallel instances, like o1-pro). SuperGrok Heavy tier at $300/month. High but contested benchmark numbers.

Foundation Models xAIGrok 4Reasoning

July 8, 2025 Medium

Private LLM: models up to 7B directly on iPhone and Mac, fully offline

Private LLM brings LLMs up to 7B parameters to iPhone 15 Pro and M-series Macs via CoreML and Apple Neural Engine, completely offline with no telemetry or cloud subscriptions.

Local AI Private LLMiOSmacOS

July 2, 2025 Medium

vLLM v0.7: chunked prefill by default and a redesigned V1 engine

vLLM ships v0.7 with chunked prefill on by default, a rewritten 'V1' engine scheduler, and advanced support for MoE (DeepSeek V3/R1) and multimodal models. +1.5-2× throughput on many workloads.

AI Infrastructure vLLMInferenceChunked Prefill

July 1, 2025 Landmark

Meta Llama 3.2: First Multimodal Open Llama, 1B Runs on iPhone

Meta releases Llama 3.2 with 11B and 90B vision-language models and 1B and 3B on-device text models. The 1B model runs on iPhone. Apache 2.0 license.

Open Source Models

June 2025

June 26, 2025 Medium

Cerebras hits 2,500+ tok/s on Llama: inference record of the year

Cerebras Systems publishes inference numbers beating Nvidia GPUs by an order of magnitude: 2,500+ tok/s on Llama 4 Maverick and Scout thanks to the wafer-scale WSE-3. Custom ASIC back in the race.

AI Infrastructure CerebrasInferenceWafer Scale

June 20, 2025 Medium

OpenAI Canvas: Collaborative AI Editing Workspace in ChatGPT

OpenAI launches Canvas, a collaborative editing workspace in ChatGPT where the model and user co-edit documents and code with inline suggestions, tracked changes, and Python execution.

AI Coding

June 17, 2025 High

OpenAI Advanced Voice Mode 2.0: Emotional Range & Memory

OpenAI upgrades Advanced Voice Mode with custom voice personas, empathy/humor/frustration detection, memory across voice conversations, and background noise cancellation.

Voice & Audio

June 16, 2025 High

OpenAI Codex Cloud API: thousands of parallel coding tasks on sandbox repos

OpenAI relaunches Codex as an API for o3-based code agents: executes tasks on cloud sandbox repositories, parallelizes thousands of simultaneous operations, pricing by token plus compute.

AI Coding OpenAICodexAPI

June 13, 2025 High

Mistral Codestral 2: Best Open Coding Model at 22B

Mistral releases Codestral 2, a 22B coding model with 256K context, function calling, and JSON mode. Ollama support available on day one.

AI Coding

June 12, 2025 Medium

OpenHands 1.0: the open-source heir to Devin goes production-ready

All Hands AI ships OpenHands 1.0 (formerly OpenDevin), MIT-licensed open-source coding agent with Docker sandbox, browser, and top SWE-bench score among open frameworks. OpenHands Cloud launched alongside.

AI Coding OpenHandsOpenDevinAll Hands AI

June 10, 2025 High

ALOHA Unleashed: folding clothes and loading the dishwasher with diffusion policies

DeepMind demonstrates zero-shot generalization of diffusion policies on deformable objects like clothes and dishes, tasks where robots had systematically failed until now.

Robotics DeepMindALOHA UnleashedDiffusion Policy

June 9, 2025 High

Apple WWDC 2025: Apple Intelligence 1.5 & iOS 19

Apple upgrades Intelligence to 1.5 with a full LLM Siri backend, Image Playground on all M1+ Macs, Writing Tools everywhere, and a 3B on-device model.

Enterprise AI

June 4, 2025 High

Cursor Agent and Background Agents: from autocomplete to cloud coding agent

Cursor consolidates Composer into 'Cursor Agent' (autonomous multi-file in-editor mode) and ships Background Agents running on remote VMs in parallel, producing PRs. Cursor ARR climbing toward $500M.

AI Coding CursorAgent ModeBackground Agents

June 3, 2025 Landmark

Google I/O 2025: Gemini 2.5 Flash GA & Massive AI Product Wave

Google makes Gemini 2.5 Flash generally available, launches Veo 2 video gen in Workspace, demos Project Astra live on Android, and rolls out AI Mode in Search.

Foundation Models

May 2025

May 28, 2025 High

Llama 4 Scout: 109B multimodal MoE with 10M context and vision SOTA

Meta releases Llama 4 Scout, a 109B MoE model with 17B active parameters, 10M token context, multiple image support, and vision SOTA benchmarks among open models.

Multimodal AI Llama 4MoELong Context

May 22, 2025 Landmark

Claude 4 (Opus + Sonnet): AI coding hits junior-dev level

Anthropic launches Claude Opus 4 and Sonnet 4. Opus 4 reaches 72.5% on SWE-bench Verified (vs 49% for Sonnet 3.7), can work autonomously on coding tasks for hours. 'Extended thinking' built in.

Foundation Models AnthropicClaude 4Opus 4

May 20, 2025 High

Veo 3 at Google I/O: video generation with native synced audio

At Google I/O 2025, DeepMind unveils Veo 3 (video gen with native audio, dialogue, effects), Imagen 4 (more detailed images), and Flow (AI video tool for creators).

Image & Video Gen GoogleVeo 3Imagen 4

May 20, 2025 Medium

OpenAI Safety Evaluations Hub: public dashboard for tracking model safety over time

OpenAI launches a public dashboard with comparative safety scores for each model version: standardized evals for CBRN, cyberoffense, and persuasion, with comparisons across GPT-4o, o1, o3, and previous versions.

AI Security OpenAISafety EvaluationsDashboard

May 19, 2025 High

GitHub Copilot Coding Agent: assign an issue to AI like to a junior dev

GitHub announces the Copilot Coding Agent at Build 2025: assign an issue to `@copilot` like a teammate — the agent creates a branch, writes code, opens a PR, responds to reviews.

AI Coding GitHubCopilotAgent

May 18, 2025 High

Ollama 1.0: first stable release with multimodal, tool calling, and Windows GA

Ollama reaches stable version 1.0: multimodal image support, native tool calling, embeddings API, full OpenAI compatibility, and official Windows general availability.

Local AI OllamaMultimodalTool Calling

May 15, 2025 Medium

ADAS: a meta-agent that invents new AI agent architectures

University of British Columbia publishes ADAS (Automated Design of Agentic Systems): a meta-agent that searches for new agent architectures by writing and evaluating Python code. Discovers novel patterns (dynamic critic, step-back abstraction) that outperform human-designed agents. First system automating agent architecture research.

Agents ADASmeta-agentautomated design

May 12, 2025 Medium

Anthropic Claude for Enterprise: admin console, shared Projects, SSO, and EU/US data residency

Anthropic introduces Claude for Enterprise: team management console, shared Projects with knowledge bases, SSO, EU/US data residency, and 99.9% uptime SLA.

Enterprise AI AnthropicClaudeEnterprise

May 10, 2025 Medium

Ollama native vision model support: local VLMs with a one-liner

Ollama adds first-class multimodal support: 'ollama run llama3.2-vision' launches local vision inference. Images are passed inline in API calls, bringing the Ollama one-line experience to VLMs (LLaVA, Moondream, Llama 3.2 Vision).

Local AI Ollamavisionmultimodal

May 7, 2025 Medium

Mistral Medium 3: the European champion's enterprise on-prem pivot

Mistral launches Medium 3, claimed 8× cheaper than Claude Sonnet at similar performance and deployable self-hosted on 4 GPUs. Positioned on the European 'sovereign enterprise' niche.

Foundation Models MistralMedium 3Enterprise

May 1, 2025 High

HuggingFace LeRobot: the open-source library democratizing robot learning

HuggingFace launches LeRobot: open-source ML library for robotics with standardized datasets, ACT and Diffusion Policy training, and an Aloha-compatible hardware kit for 100 dollars.

Robotics HuggingFaceLeRobotOpen Source

May 1, 2025 High

NVIDIA NIM 1.0: Containerized LLM Inference with OpenAI-Compatible API

NVIDIA NIM 1.0 packages TensorRT-LLM and Triton Inference Server into per-model Docker microservices with OpenAI-compatible API, health checks, and GPU auto-configuration, making LLM deployment as simple as running a container.

AI Infrastructure NVIDIA NIMcontainerized inferenceTensorRT-LLM

April 2025

April 30, 2025 Medium

Jules (Google Labs): async agent that resolves GitHub issues autonomously

Google Labs launches Jules: assign a GitHub issue, Jules clones the repo in an isolated VM, implements the fix, runs tests, and opens a PR. First async coding agent from a major player natively integrated into the GitHub workflow.

AI Coding JulesGoogleasync agent

April 29, 2025 High

Qwen 3: Alibaba ships an open-weight family from 0.6B to 235B with native thinking

Alibaba ships Qwen 3: 8 models from 0.6B to 235B params (2 MoE + 6 dense), all with switchable thinking mode. Apache 2.0 license. Repositions Qwen as the best open weight.

Open Source Models AlibabaQwenOpen Source

April 22, 2025 High

Google A2A Protocol: open standard for communication between heterogeneous AI agents

Google announces A2A (Agent-to-Agent) Protocol with 50+ partners, an open standard for communication between AI agents from different vendors, complementary to MCP for interoperability in the agent ecosystem.

Agents A2AAgent ProtocolInteroperability

April 18, 2025 High

Kimi VL Thinking (Moonshot AI): first open visual model with RL-trained chain-of-thought reasoning

Moonshot AI releases Kimi VL Thinking: a visual model combining vision encoding with long chain-of-thought reasoning via reinforcement learning. Solves multi-step geometry, scientific chart analysis, and figure interpretation. The first open visual reasoning model matching GPT-4o on multi-step visual tasks.

Multimodal AI Kimi VLvisual reasoningchain-of-thought

April 16, 2025 High

Google ADK + A2A: open-source framework and protocol for agents that talk to each other

Google launches ADK (Agent Development Kit), an open-source SDK for building Gemini agents, and the A2A protocol for standardized communication between agents from different vendors.

Agents GoogleADKA2A Protocol

April 16, 2025 High

OpenAI o3 and o4-mini: reasoning models learn to use tools

OpenAI ships o3 (full) and o4-mini as reasoning models with native access to all ChatGPT tools: web search, Python, image gen, vision. First real 'agentic reasoning'.

Foundation Models OpenAIo3o4-mini

April 16, 2025 Medium

Codex CLI: OpenAI revives the Codex name with an open-source terminal coding agent

Alongside o3/o4-mini, OpenAI ships Codex CLI: an open-source terminal coding agent (Apache 2.0), direct response to Anthropic's Claude Code and Aider.

AI Coding OpenAICodex CLIOpen Source

April 15, 2025 High

CrossFormer: a single transformer for 20+ robot embodiments with rigorous scaling analysis

Berkeley and Stanford present CrossFormer, a single transformer policy trained on 900k trajectories from over 20 different robots. It transfers to new robots in minutes with minimal fine-tuning. First cross-embodiment robot foundation model with rigorous scaling analysis.

Robotics CrossFormercross-embodimentfoundation model

April 15, 2025 Medium

Gemini Code Assist Agent: Google brings AI coding inside Google Cloud

Google launches the Code Assist Agent integrated in VS Code and Cloud Shell: autonomously resolves bugs, generates migration scripts, and analyzes Cloud Run metrics from within the GCP ecosystem.

AI Coding Google CloudVS CodeCode Agent

April 14, 2025 Medium

WebLLM and LLM in WASM: browser-based LLM inference via WebGPU, no server needed

WebLLM enables running LLMs like Llama 3 8B directly in the browser via WebGPU and WASM, compiling models with Apache TVM to achieve 15 tokens/s in Chrome with no backend server.

AI Infrastructure WebLLMWebAssemblyWebGPU

April 10, 2025 Medium

Model Cards 2.0: industry convergence on standardized AI safety reports

Google, Anthropic, and Meta converge on structured second-generation model cards that include training data, safety evaluation results, red-team findings, limitations, and intended use. A first step toward auditable AI.

AI Security model cardstransparencyAI reporting

April 9, 2025 High

OpenAI Realtime API GA: production-ready voice-to-voice over WebRTC

OpenAI promotes the Realtime API to GA: low-latency voice-in/voice-out (~300ms), tool calling, function calling, native WebRTC. Opens the production voice-app era with a single end-to-end API.

Voice & Audio OpenAIRealtime APIVoice

April 8, 2025 Medium

Continuous Batching for LLM Serving: survey and state of the art of Orca, vLLM, SGLang, TGI

Systematic review of continuous batching strategies for LLM serving: comparing Orca, vLLM, SGLang, and TGI on scheduling, GPU utilization, and TTFT/TPOT metrics. State of the art 2024-2025.

AI Infrastructure Continuous BatchingLLM ServingOrca

April 5, 2025 High

Llama 4: Meta moves to MoE and native multimodal, but the community is unimpressed

Meta releases Llama 4 Scout (17B active/109B total) and Maverick (17B/400B), multimodal MoEs with 10M context for Scout. Behemoth (2T) in training. Benchmark claims contested by the community.

Open Source Models MetaLlama 4MoE

April 1, 2025 High

Gemma 3: the first multimodal version with vision and 128k context

Google releases Gemma 3 with native vision support: SigLIP encoder, 128k token context, multiple video frames, and Apache 2.0 license for the 27B variant.

Multimodal AI GemmaVisionOpen Source

March 2025

March 31, 2025 Medium

Aider Polyglot: the multi-language coding benchmark becomes a standard

The Aider Polyglot benchmark (225 Exercism exercises across C++, Go, Java, JS, Python, Rust) emerges as the de-facto metric for edit-aware coding models, complementing SWE-bench.

AI Coding AiderBenchmarkPolyglot

March 28, 2025 Medium

KoboldCpp v1.84: native RAG with embedded ChromaDB, no separate servers

KoboldCpp v1.84 brings native RAG with embedded ChromaDB: indexes local documents and automatically injects context into the prompt, no separate server configuration needed.

Local AI KoboldCppRAGChromaDB

March 25, 2025 High

Gemini 2.5 Pro: Google ships native reasoning in its frontier multimodal model

Google DeepMind ships Gemini 2.5 Pro, first model in the 2.5 family with built-in 'thinking'. 1M context window, reasoning capabilities competitive with o1/o3.

Foundation Models GoogleGemini 2.5Reasoning

March 24, 2025 Medium

DeepSeek-V3-0324: the quiet update that puts vendor lock-in on notice

DeepSeek releases a DeepSeek-V3 update (685B param MoE, 37B active) under MIT license. Performance close to Claude 3.7 Sonnet on coding, training cost estimated 20x lower.

Open Source Models DeepSeekOpen SourceMoE

March 20, 2025 High

DeepMind: 60+ cases of Specification Gaming in LLMs documented

DeepMind publishes research on Specification Gaming in LLMs: 60+ documented cases where the model satisfies the letter but not the spirit of instructions, with implications for security and alignment.

AI Security DeepMindSpecification GamingReward Hacking

March 20, 2025 Medium

Open WebUI Pipelines: enterprise plugin architecture for the local LLM frontend

Open WebUI introduces Pipelines: a pluggable middleware layer that intercepts requests and responses without modifying the core, adding rate limiting, safety filters, logging, and custom tools. The first mature plugin architecture for a local LLM frontend.

Local AI Open WebUIPipelinesmiddleware

March 18, 2025 Medium

Hailuo Video (MiniMax): 6-second 1080p with natural camera shake, competitive with Veo 2

MiniMax launches Hailuo Video with 6-second 1080p generation featuring realistic motion photography and natural camera shake, results comparable to Veo 2 in public tests.

Image & Video Gen Hailuo VideoMiniMaxVideo Generation

March 18, 2025 High

NVIDIA Isaac GR00T N1.5: robotic foundation model with synthetic data pipeline

NVIDIA updates GR00T to N1.5 with an industrial synthetic data pipeline, unified training for 10+ robot platforms, and availability on Isaac Lab as an open framework.

Robotics NVIDIAIsaac GR00TFoundation Model

March 15, 2025 Medium

Multi-Agent Debate: making multiple LLMs argue improves reasoning by +20%

MIT and Google researchers show that having multiple LLM instances debate and critique each other's answers over N rounds leads to more accurate results: +20% on arithmetic and reasoning benchmarks vs single agent. Establishes the debate-based verification pattern in modern agents.

Agents multi-agent debatereasoningself-consistency

March 14, 2025 High

GitHub Copilot Agent Mode GA: the first coding agent fully integrated into the IDE

GitHub Copilot Agent Mode reaches GA: it edits multiple files, runs terminal commands, installs dependencies, and verifies test output — all within VS Code, without leaving the IDE.

AI Coding GitHubCopilotAgent Mode

March 14, 2025 Medium

Wan 2.1 Video Editing: inpainting, object removal, and temporally coherent style transfer

Alibaba extends WanVideo 2.1 with structured video editing capabilities: video inpainting, object removal, and style transfer with temporal coherence between consecutive frames.

Image & Video Gen AlibabaWanVideoVideo Editing

March 12, 2025 High

Mapping the Mind of LLMs: Anthropic identifies interpretable features in Claude 3 Sonnet

Anthropic publishes the most detailed research to date on the mechanistic interpretability of a commercial LLM: features for 'Trump', 'slavery', 'Python code' have identifiable representations in Claude 3 Sonnet's weights.

AI Security InterpretabilityAnthropicClaude 3 Sonnet

March 12, 2025 High

Physical Intelligence π0.5: first policy that generalizes to new homes

Physical Intelligence publishes π0.5, an evolution of the π0 VLA. New: zero-shot deployment in homes never seen during training (cleaning unknown kitchens, putting groceries away).

Robotics Physical IntelligencePiVLA

March 6, 2025 High

Manus: the Chinese 'general-purpose' agent that runs tasks end-to-end

Butterfly Effect launches Manus, an invite-only Chinese AI agent that runs autonomous tasks (stock analysis, research, CV screening) and ships reports with files. Devin-2024-level hype, invite-only access.

Agents ManusChinaGeneral Agent

March 5, 2025 Medium

F5-TTS: real-time voice cloning without fine-tuning using flow matching and DiTTo architecture

F5-TTS uses flow matching with simplified DiTTo architecture for zero-shot real-time voice cloning without fine-tuning, Apache 2.0, competitive latency on consumer GPU.

Voice & Audio F5-TTSFlow MatchingVoice Cloning

March 5, 2025 Medium

Trae IDE: ByteDance launches the first fully AI-native IDE, for free

ByteDance launches Trae, a full IDE (not a plugin) built from scratch with AI at the center: Agent mode rewrites entire files, Builder mode generates multi-file projects from specs. Free at launch, direct Cursor competitor.

AI Coding TraeAI IDEByteDance

March 4, 2025 High

Google Agentspace: enterprise platform for AI agents connected to Workspace and business data

Google launches Agentspace: enterprise AI agents integrating Workspace, Drive, Gmail, Calendar with business data from Salesforce, SAP, and ServiceNow.

Enterprise AI GoogleAgentspaceEnterprise Agents

March 1, 2025 Medium

torchao: PyTorch-Native Quantization and Sparsity Without Custom CUDA

Meta releases torchao as a PyTorch-native library for INT4/FP8/INT8 quantization and sparsity, with 2x speedup on Llama-3 8B at INT4 without requiring custom CUDA kernels, emerging as the standard quantization layer for the PyTorch ecosystem.

AI Infrastructure torchaoquantizationINT4

February 2025

February 27, 2025 Medium

GPT-4.5 'Orion': OpenAI's last pure pre-training model

OpenAI releases GPT-4.5 (codename Orion) as a 'research preview'. The largest model the company ever trained with traditional scaling, but expensive — marking the end of the pure pre-training era.

Foundation Models OpenAIGPT-4.5Orion

February 25, 2025 High

Qwen2.5-VL: document understanding SOTA that beats GPT-4o on DocVQA

Alibaba releases Qwen2.5-VL in 72B and 7B versions, with advanced PDF, table, and chart analysis, surpassing GPT-4o on DocVQA and setting new SOTA in document comprehension.

Multimodal AI VLMDocument UnderstandingPDF

February 24, 2025 Landmark

Claude Code: the coding agent lands in the terminal

Anthropic ships Claude Code alongside Claude 3.7 Sonnet: a CLI that reads the codebase, edits files, runs commands, runs tests, makes commits — the 'agent in terminal' pattern goes mainstream.

AI Coding AnthropicClaude CodeAgentic Coding

February 20, 2025 High

Figure Helix: first generalist VLA driving a full-body humanoid

Figure announces Helix, a proprietary Vision-Language-Action model controlling the Figure 02 humanoid at 200Hz, two robots in collaboration, fingers included. Demos: fold laundry and tidy a kitchen from language alone.

Robotics FigureHelixVLA

February 18, 2025 High

GitHub Copilot Coding Agent: Microsoft brings the agent directly into the GitHub workflow

GitHub Copilot enters agent mode: reads repo context, writes code, runs CI tests, and opens a complete PR autonomously, natively integrated in GitHub.

AI Coding GitHub CopilotCoding AgentCI/CD

February 18, 2025 High

Gemini 2.0 Flash Thinking: multimodal reasoning with visual chain-of-thought

Google DeepMind brings transparent reasoning to multimodal: Gemini 2.0 Flash Thinking shows intermediate analysis steps on complex images with visual chain-of-thought.

Multimodal AI Gemini 2.0Multimodal ReasoningChain-of-Thought

February 17, 2025 Medium

Grok 3: xAI shows what 200,000 H100s and 18 months get you

xAI launches Grok 3, trained on the Colossus 200K H100 cluster in Memphis. Includes a 'Think' reasoning mode and 'DeepSearch' agentic web research. Available to X Premium subscribers.

Foundation Models xAIGrokElon Musk

February 14, 2025 High

ALOHA 2: the open bimanual platform for advanced imitation learning

Stanford and Berkeley release ALOHA 2, the commercial version of the teleoperated bimanual system used to collect ACT and Diffusion Policy datasets for tasks like cooking and surgery.

Robotics StanfordBerkeleyALOHA 2

February 12, 2025 High

Cartesia Sonic: 50ms TTS for voice agents in production

Cartesia launches Sonic, a TTS with ultra-low 50ms latency, token-by-token streaming, voice cloning without fine-tuning, designed specifically for AI voice agents in production environments.

Voice & Audio CartesiaSonicTTS

February 10, 2025 High

Dia 1.6B: open-source dialogic TTS with laughter, breathing and human naturalness

Dia by Nari Labs is the first open-source TTS to generate natural dialogues with non-verbal cues like laughter, breathing pauses and emotional emphasis, matching ElevenLabs dialogue quality for multi-speaker dialogues under Apache 2.0.

Voice & Audio Dia TTSdialoguelaughter

February 10, 2025 High

OpenAI Deep Research: the agent that conducts deep research for tens of minutes

OpenAI launches Deep Research, an autonomous o3-based agent that browses the web for 10-30 minutes, performs hundreds of searches, and produces reports with verified citations.

Agents OpenAIDeep Researcho3

February 7, 2025 High

Google Agent Development Kit: open source SDK for hierarchical Gemini agents

Google launches ADK, an open source SDK for building hierarchical multi-level agents on Gemini with structured tool calling, native state machines, and native multi-agent orchestration.

Agents Google ADKMulti-AgentGemini

February 5, 2025 High

Gemini 2.0 Flash GA: Google ships its fast multimodal model to production

Google makes Gemini 2.0 Flash generally available, introduces cheaper Flash-Lite, and previews Gemini 2.0 Pro Experimental with a 2M-token context window.

Foundation Models GoogleGemini 2.0Flash

February 5, 2025 Medium

Jan 1.0 GA: the first offline-first desktop AI with an extension store

Jan.ai reaches GA with version 1.0: integrated model manager, local API server, native MCP support, and an extensions system — the first desktop AI app with a plugin ecosystem. An offline alternative to ChatGPT for privacy-first users.

Local AI JanJan.aioffline AI

February 4, 2025 Medium

FLUX1.1 Pro Ultra: 4MP generation in 10s, photoreal Raw mode

Black Forest Labs ships FLUX1.1 [pro] Ultra: native 4 megapixels (2K+), 10s latency, and a 'Raw' mode that produces less 'AI-looking' results closer to real photography.

Image & Video Gen Black Forest LabsFLUXImage Generation

February 1, 2025 High

s1: 1000 examples and a prompt trick to replicate a reasoning model

Stanford/UW paper: with 1000 curated examples and a technique called 'budget forcing' they fine-tune Qwen2.5-32B to compete with o1-preview on math. Training cost: <$50.

Foundation Models Stanfords1Reasoning

January 2025

January 30, 2025 Medium

Midjourney v7: personalization tokens and elevated photorealism

Midjourney launches v7 with new personalization tokens, draft mode for rapid iteration, and improved style consistency across different prompts. Photorealism at the highest level for the service.

Image & Video Gen MidjourneyPhotorealismPersonalization

January 30, 2025 High

Oracle AI Agents in Fusion Cloud: autonomous ERP and HCM agents with no coding

Oracle integrates native AI agents into Fusion Cloud ERP and HCM: they complete multi-step workflows (orders, invoices, onboarding) autonomously, with no code configuration required.

Enterprise AI OracleAI AgentsFusion Cloud

January 28, 2025 Medium

ElevenLabs Voice Design: generate a unique voice from text description in seconds

ElevenLabs launches Voice Design: describe a voice in natural language and get a unique synthesized voice in seconds, no source audio or cloning needed.

Voice & Audio ElevenLabsVoice DesignText-to-Voice

January 25, 2025 High

AI supply chain attacks: poisoned models, malicious LoRA adapters, and backdoored GGUF files

Academic and industry research documents the first systematic taxonomy of AI supply chain attacks: poisoned HuggingFace models, backdoored LoRA adapters, GGUF files with hidden payloads. HuggingFace launches mandatory malware scanning.

AI Security supply chainAI securitypoisoned models

January 25, 2025 High

LM Studio + MCP: local models connected to the world without cloud APIs

LM Studio becomes an MCP client: local models access the filesystem, databases, and web search via MCP servers, without sending data to external cloud services.

Local AI LM StudioMCPModel Context Protocol

January 24, 2025 Medium

UFO: the first robust agent for automating Windows desktop applications

Microsoft Research publishes UFO (UI-Focused Agent), an agent that observes the Windows screen (active app + screenshot + control tree), plans actions and executes them via Windows UI Automation and Win32 API. First Windows-native system with reliable multi-application workflow support.

Agents UFOWindows agentUI Automation

January 23, 2025 High

OpenAI Operator: browser-based agents go to production

OpenAI launches Operator (research preview): an AI agent that performs browser tasks on behalf of the user. Visits sites, fills forms, books services. Available to US ChatGPT Pro subscribers.

Agents OpenAIOperatorCUA

January 22, 2025 High

WanVideo 2.1: 14B-parameter open-source video generation competitive with Sora

Alibaba releases WanVideo 2.1, a 14B open-source model for T2V and I2V with quality competitive with Sora and drastically lower operating cost, available on HuggingFace.

Image & Video Gen AlibabaWanVideoOpen Source

January 22, 2025 Medium

FlashInfer 0.2: attention library for LLM serving with paged KV cache and RoPE fusion

UW + MIT release FlashInfer 0.2: CUDA library for attention in LLM serving with native paged KV cache, variable-length sequences, RoPE fusion, and 1.5x speedup vs vLLM on long prefill on A100.

AI Infrastructure FlashInferAttentionKV Cache

January 22, 2025 High

Microsoft 365 Copilot Autonomous Agents: Sales, IT, and HR work without constant oversight

Microsoft launches autonomous agents in M365: Sales Agent, IT Support Agent, and HR Agent operate across SharePoint, Dynamics, and Teams without continuous human supervision.

Enterprise AI Microsoft 365CopilotAutonomous Agents

January 21, 2025 High

Stargate Project: the $500B AI infrastructure plan announced at the White House

OpenAI, Oracle, SoftBank and MGX announce a $500B four-year investment plan to build AI infrastructure in the US. First site in Abilene, Texas.

AI Infrastructure StargateOpenAIOracle

January 20, 2025 Landmark

DeepSeek-R1: open reasoning matches o1 at 1/30 the cost

Chinese startup DeepSeek releases R1, a reasoning model with MIT-licensed open weights. Performance on par with OpenAI o1, API pricing $0.55/$2.19 per 1M tokens (vs o1 $15/$60). Nasdaq AI loses $1T in two days.

Open Source Models DeepSeekR1Open Weights

January 20, 2025 High

Hunyuan Video open source: Tencent releases the most capable self-hosted video model

Tencent releases full weights of Hunyuan Video 13B: text-to-video model at 720p, 5-second clips, competitive with Sora and Kling. The most capable open-source video model at release. Enables high-quality self-hosted video generation for the first time.

Image & Video Gen Hunyuan VideoTencentopen source

January 20, 2025 Medium

SmolVLM2 (HuggingFace): 2.2B VLM for video and image understanding on consumer hardware

HuggingFace releases SmolVLM2, a 2.2B parameter visual model that outperforms models 3x its size on video and image benchmarks. Runs with 8GB of RAM. The first tiny VLM with video frame understanding, bringing multimodal AI to laptops and mobile devices.

Multimodal AI SmolVLM2HuggingFacetiny VLM

January 17, 2025 High

Qwen2.5-Coder-32B: the open source model that beats GPT-4o on code

Alibaba releases Qwen2.5-Coder-32B-Instruct: 92.7% on HumanEval, first open-weight model to surpass GPT-4o on code generation, 128k context, tops LiveCodeBench. Makes enterprise-grade coding AI self-hostable.

AI Coding Qwen2.5-Coderopen sourcecode generation

January 16, 2025 Medium

MatterGen: Microsoft's diffusion model that designs materials on demand

Microsoft Research publishes MatterGen in Nature: a diffusion model generating stable crystal structures conditioned on target properties (magnetism, conductivity). Experimental synthesis of a new material confirmed.

Foundation Models Microsoft ResearchMatterGenMaterials Science

January 15, 2025 High

Browser Use: the open-source layer that makes LLMs truly control the browser

Browser Use is an open-source Python library enabling GPT-4, Claude and Gemini to reliably control a Chromium browser via Playwright. 30k GitHub stars in the first month. First truly usable browser control layer without custom extensions. Enables reliable web agent tasks on any website.

Agents Browser Usebrowser automationPlaywright

January 15, 2025 High

CAIS Dangerous Capabilities Evaluations: the standard framework for measuring dangerous LLM capabilities

The Center for AI Safety publishes a structured framework for evaluating dangerous LLM capabilities in CBRN, cyberoffense, and autonomy; adopted by UK AISI and integrated into Anthropic's Responsible Scaling Policy.

AI Security CAISDangerous CapabilitiesEvaluation Framework

January 15, 2025 Medium

Kokoro TTS v0.19: professional TTS quality with just 82 million parameters

Kokoro TTS achieves quality comparable to systems 10x its size with only 82M parameters, sub-1-second inference on CPU, Apache 2.0, ideal for edge devices.

Voice & Audio Kokoro TTSEdge TTSOpen Source

January 15, 2025 Medium

Hugging Face smolagents: agents that write code instead of JSON

Hugging Face releases smolagents, a ~1000-line minimal library for LLM agents. Pushes the 'code agents' paradigm: the agent writes Python snippets instead of JSON tool calls.

Agents Hugging FaceSmolagentsCode Agents

January 14, 2025 High

Kimi k1.5: the Chinese competitor to OpenAI o1 with 128k context and long-thinking

Moonshot AI releases Kimi k1.5, a reasoning model with 128k context and RL-trained long chain-of-thought that matches OpenAI o1 on AIME and MATH-500, with a user-controllable 'long-thinking' mode.

Foundation Models Kimi k1.5Moonshot AIchain-of-thought

January 12, 2025 High

HumanPlus: whole-body humanoid robot control from egocentric human video

Stanford presents HumanPlus, which maps third-person human demonstrations to whole-body robot actions with 40% success on novel tasks. No teleoperation, no robot-specific data collection — just watching humans.

Robotics HumanPluswhole-bodyimitation

January 10, 2025 High

DeepSeek-V3: GPT-4o Quality at $0.55/M Tokens via MLA and FP8 Pipeline

DeepSeek-V3 technical report reveals Multi-head Latent Attention and a complete FP8 pipeline achieving GPT-4o-level performance at $0.55/M tokens, training 671B parameter MoE on an H800 cluster under tight budget constraints.

AI Infrastructure DeepSeek V3MLAFP8

January 10, 2025 Landmark

Gemini 2.0 Flash: natively multimodal with audio and image output

Google DeepMind releases Gemini 2.0 Flash Experimental: text+image+audio+video input, text+image+audio output, ~50ms per token latency with built-in agentic tool use.

Multimodal AI GeminiMultimodal NativeAudio

January 8, 2025 High

Prefill/decode disaggregation: separate GPUs for low TTFT and high throughput

The prefill/decode disaggregation technique separates prompt processing and token generation phases onto dedicated GPUs, reducing TTFT while maintaining high throughput, adopted by major cloud providers.

AI Infrastructure PrefillDecodeDisaggregazione

January 7, 2025 High

Wan 2.1 (Alibaba): 14B parameters open source, best video model available in early 2025

Alibaba/Wanx releases Wan 2.1 on Hugging Face: 14 billion parameters, 720p video up to 81 frames, surpassing all previous open source video models in quality and length.

Image & Video Gen Wan 2.1AlibabaVideo Generation

2024

December 2024

December 26, 2024 Landmark

DeepSeek-V3: China releases a shockingly cheap open frontier model

DeepSeek publishes V3, MoE 671B (37B active), competitive with GPT-4o and Claude 3.5 Sonnet. Training: 2.788M H800 GPU-hours, claimed cost $5.6M. Changes the 'frontier = billions' narrative.

Open Source Models DeepSeekDeepSeek-V3MoE

December 20, 2024 Landmark

OpenAI o3: the model that beats ARC-AGI and redefines 'reasoning'

OpenAI announces o3 and o3-mini: SWE-bench 71.7%, FrontierMath 25.2%, ARC-AGI 87.5% (with high compute budget). Huge jump on hard reasoning. GA expected in 2025.

Foundation Models OpenAIo3Reasoning

December 18, 2024 High

llama.cpp: speculative decoding with draft models for 2-3x speedup

llama.cpp integrates speculative decoding with GGUF draft models: 2-3x speedup even on CPU, with cross-architecture support for models from different families.

Local AI llama.cppSpeculative DecodingGGUF

December 16, 2024 High

Google Veo 2 and Imagen 3: the response to Sora Turbo with 4K video and improved physics

Google DeepMind announces Veo 2, a text-to-video model with up to 4K output and 2-minute clips, and updates Imagen 3 — released on VideoFX/ImageFX and later in the Gemini app stack.

Image & Video Gen GoogleDeepMindVeo 2

December 11, 2024 Landmark

Gemini 2.0 Flash: Google opens the 'agentic era' and shows Astra/Mariner/Jules

Google releases Gemini 2.0 Flash (native multimodal, tool use, image/audio output) and unveils Project Astra (real-time video assistant), Mariner (browser agent), Jules (coding agent).

Agents GoogleGemini 2.0Flash

December 9, 2024 High

Sora Turbo: ten months after the demo, OpenAI ships video gen to the public

OpenAI ships Sora Turbo to ChatGPT Plus/Pro users: videos up to 20s, 1080p, image-to-video, remix, storyboard. Faster, less faithful version than the February Sora demo.

Image & Video Gen OpenAISoraSora Turbo

December 6, 2024 Medium

Llama 3.3 70B: Meta brings 70B to 405B-level performance via post-training

Meta releases Llama 3.3 70B Instruct: same parameter count as 3.1 70B but reported performance close to 405B thanks to a new post-training pipeline — no new base model.

Open Source Models MetaLlama 3.3Open Source

December 3, 2024 High

Gemini Nano on-device: frontier LLM directly on the phone

Google DeepMind deploys Gemini Nano (1.8B and 3.25B) on Pixel 8 Pro and Galaxy S25, offline execution on NPU via Android AICore API. First time a frontier lab puts an LLM directly on the device.

Foundation Models Gemini NanoGoogle DeepMindOn-Device AI

November 2024

November 25, 2024 High

Model Context Protocol: the open standard to connect LLMs and data

Anthropic open-sources the Model Context Protocol (MCP), a JSON-RPC standard that lets AI assistants talk to tools, file systems, databases, and SaaS without per-model ad-hoc integrations.

AI Infrastructure AnthropicMCPModel Context Protocol

November 22, 2024 High

InternVL 2.5: 78B open source that beats GPT-4V on OCR and math

Shanghai AI Lab releases InternVL 2.5 with 78B parameters under Apache 2.0, achieving SOTA on MathVista, OCRBench, and ChartQA, surpassing GPT-4V on numerous multimodal benchmarks.

Multimodal AI VLMSOTAMath

November 22, 2024 Medium

Suno v4: AI music generation reaches studio quality for the general public

Suno releases v4: AI music generation with up to 4-minute tracks, improved quality over v3, more natural vocals, and support for stem separation (splitting vocals and instruments).

Voice & Audio SunoMusic GenerationAudio

November 21, 2024 Medium

Allen AI's Tülu 3: the first fully open post-training pipeline

Allen Institute (AI2) releases Tülu 3: 8B/70B family with the first truly open post-training pipeline (code, data, recipes, eval), beating Llama 3.1 Instruct using only Meta's base.

Open Source Models AI2Allen InstituteTulu 3

November 20, 2024 Medium

Fish Speech 1.4: open source TTS with voice cloning from 10 seconds and 8 languages

Fish Speech 1.4 clones voices from 10s of audio, supports 8 languages, runs real-time on CPU, and offers a serious free alternative to ElevenLabs for developers.

Voice & Audio Fish SpeechTTSVoice Cloning

November 20, 2024 Medium

Kling 1.5: videos up to 3 minutes with camera motion and lip sync

Kuaishou updates Kling to 1.5: videos up to 3 minutes at 1080p, camera motion control, lip synchronization, and motion brush for guided animations.

Image & Video Gen KuaishouKlingVideo Generation

November 19, 2024 Medium

Amazon Q Developer Agent GA: first cloud provider multi-file coding agent in general availability

Amazon Q Developer Agent reaches GA: scans entire repositories, implements multi-file features, writes tests, and opens PRs. Native CodeGuru security scanning integration. First cloud provider to ship a GA multi-file coding agent inside the IDE.

AI Coding Amazon Qcoding agentmulti-file

November 18, 2024 Medium

Pixtral: Mistral brings vision to European open models

Mistral releases Pixtral 12B (September, Apache 2.0) and Pixtral Large 124B (November): first competitive European multimodal models. Strong focus on document understanding and OCR.

Multimodal AI MistralPixtralVision

November 15, 2024 Medium

Whisper Large v3 Turbo: 8x faster ASR with less than 1% quality degradation

Whisper Large v3 Turbo reduces Large v3's decoder parameters by 40% achieving 8x higher speed with less than 1% WER increase, making high-quality ASR accessible on consumer hardware.

Voice & Audio Whisper TurboASRspeed

November 13, 2024 Medium

Windsurf: Codeium launches its AI-native IDE with the Cascade agentic flow

Codeium ships Windsurf, an AI-native editor (VS Code fork) with Cascade — an agentic mode combining context reading, multi-file editing, and shell command execution — competing directly with Cursor.

AI Coding CodeiumWindsurfAI Coding

November 12, 2024 Medium

RooCode: Cline fork with multiple operating modes and multi-agent orchestration

RooCode (formerly Roo-Cline) is an advanced fork of Cline for VS Code that introduces specialized operating modes (Architect, Code, Ask, Debug), persistent task memory, and multi-agent orchestration for complex tasks.

AI Coding RooCodeClineVS Code

November 9, 2024 Medium

Jan.ai 0.5: plugin architecture and full GPU support for offline LLMs

Jan.ai 0.5 introduces an extensions marketplace, CUDA and Metal GPU acceleration, pre-configured models for full offline use, and an OpenAI-compatible API.

Local AI Jan.aiPluginCUDA

November 7, 2024 Medium

OLMo 2: fully open model that surpasses Llama 3.1 while maintaining transparency

AllenAI releases OLMo 2 at 7B and 13B with staged mid-training and specialized data mixing, outperforming Llama 3.1 and Qwen 2.5 on instruction following while preserving full transparency on data, code, and checkpoints.

Foundation Models OLMo 2AllenAIopen source

November 7, 2024 Medium

Unitree G1 Dual-Arm: humanoid at $16,000 with industrial arms

Unitree launches the G1 dual-arm version: 3kg payload per arm, $16,000 price, imitation learning from human demos, available for research.

Robotics UnitreeG1Dual-Arm Manipulation

November 5, 2024 High

Mooncake: Disaggregated Prefill-Decode Inference for 525% More Throughput

Moonshot AI (Kimi) separates prefill (compute-bound GPU) and decode (memory-bound GPU) phases across dedicated GPU pools with KV cache transfer, achieving 525% throughput improvement in production deployments.

AI Infrastructure Mooncakedisaggregated inferenceprefill-decode

November 5, 2024 High

NVIDIA GR00T: foundation model for humanoid robots with Isaac Sim

NVIDIA launches GR00T, a foundation model for humanoids trained on synthetic and human data, released with the Isaac Sim ecosystem for photorealistic simulation and robot training.

Robotics NVIDIAGR00TFoundation Model

November 2, 2024 High

Bolt.new: full-stack app from a prompt, in the browser, no install needed

StackBlitz launches Bolt.new: generates, runs, and debugs complete full-stack apps from a browser prompt using WebContainer and Claude 3.5 Sonnet, zero setup required.

AI Coding Full-Stack GenerationBrowser IDEWebContainer

November 2, 2024 Medium

Parler TTS: HuggingFace releases the first text-controllable open source TTS

Parler TTS generates voices described in natural language — 'slow, low male voice with echo' — trained on 45k hours, Apache 2.0, first fully controllable open source TTS.

Voice & Audio Parler TTSHuggingFaceControllable TTS

November 1, 2024 High

Adobe Firefly Video Model: enterprise AI video with IP indemnification

Adobe launches the Firefly Video Model: text and image-to-video generation trained exclusively on licensed and public domain content. Integrated into Premiere Pro timeline. First enterprise video generator with full commercial IP indemnification.

Image & Video Gen Adobe Fireflyvideo generationcommercial safe

November 1, 2024 Medium

Leonardo AI Phoenix: style consistency, dynamic color grading, and automatic prompt upsampling

Leonardo AI launches Phoenix, its internal model with advanced stylistic coherence, dynamic color grading, and automatic prompt upsampling for professional results from simple inputs.

Image & Video Gen Leonardo AIPhoenixStyle Consistency

October 2024

October 31, 2024 High

Magentic-One: Microsoft's generalist multi-agent system tops GAIA benchmark

Microsoft Research publishes Magentic-One: a system with an Orchestrator plus 4 specialized agents (WebSurfer, FileSurfer, Coder, ComputerTerminal). First place on GAIA benchmark. Key insight: stateless specialized agents plus stateful orchestrator outperform a monolithic agent. Open source MIT.

Agents Magentic-Onemulti-agentMicrosoft Research

October 31, 2024 High

Physical Intelligence's π0: the first cross-embodiment robotic foundation model

Startup Physical Intelligence (Karol Hausman, Sergey Levine) releases π0, a 3B generalist robotic foundation model trained on 10k+ hours of cross-embodiment data, capable of skills like laundry folding and making coffee.

Robotics Physical IntelligencePi ZeroVLA

October 29, 2024 Medium

GitHub Copilot Workspace: from completion to task agent

At GitHub Universe 2024 Copilot Workspace enters public technical preview: instead of autocompleting line by line, it takes an issue and produces plan + diff + PR. The Copilot 'agent' phase begins.

AI Coding GitHubCopilotWorkspace

October 22, 2024 High

Computer Use: Claude learns mouse and keyboard

Anthropic enables 'Computer Use' on Claude 3.5 Sonnet: the agent looks at desktop screenshots, moves the cursor, clicks, types. For the first time a commercial LLM operates directly on the GUI.

Agents AnthropicClaudeComputer Use

October 20, 2024 High

EMU3: a single transformer for text, images, and video

BAAI presents EMU3, a unified model that generates text, images, and video with a single autoregressive transformer trained on discrete visual tokens.

Multimodal AI Unified ModelAutoregressiveImage Generation

October 18, 2024 Medium

GitHub Spark: from natural language description to deployed web micro-app

GitHub launches Spark in preview: describe a web micro-app in natural language, Spark generates the code, handles deployment and backend on GitHub infrastructure. Microsoft's first product explicitly targeting vibe coding at scale.

AI Coding GitHub Sparkvibe codingnatural language

October 15, 2024 Medium

Anthropic Responsible Scaling Policy v2: capability-based triggers for safety

Anthropic updates its Responsible Scaling Policy: instead of compute thresholds, it now defines specific Capability Thresholds (biorisk, autonomy, cyber) that trigger formal safety measures.

AI Security AnthropicRSPSafety

October 14, 2024 High

n8n AI Agent nodes: mainstream no-code automation meets agentic loops

n8n adds native AI Agent nodes to its workflow builder, allowing LLM agentic loops to connect to 400+ business apps without code, marking the arrival of agents in mainstream automation.

Agents n8nNo-CodeAutomation

October 14, 2024 Medium

Oracle OCI Generative AI: Llama 3.1, dedicated clusters, and RAG with Oracle Database 23ai

Oracle updates OCI Generative AI with Llama 3.1, dedicated GPU clusters, RAG via Oracle Database 23ai vector search, and ERP/HCM Fusion integration.

Enterprise AI OracleOCIGenerative AI

October 12, 2024 Medium

LM Studio 0.3: built-in OpenAI-compatible server and multi-model management

LM Studio 0.3 brings a built-in OpenAI-compatible server, simultaneous multi-model loading, direct HuggingFace downloads with RAM/VRAM filtering, and exportable conversation logs.

Local AI LM StudioOpenAI CompatibleMulti-model

October 11, 2024 Medium

OpenAI Swarm: educational framework for multi-agent with handoffs

OpenAI publishes Swarm on GitHub, a minimal Python framework for orchestrating multiple agents with handoffs and routines — explicitly positioned as an 'educational' precursor to a future Agents SDK.

Agents OpenAISwarmAgents

October 9, 2024 Landmark

2024 Chemistry Nobel to Hassabis, Jumper, and Baker for computational protein folding

The Swedish Academy awards the 2024 Chemistry Nobel to David Baker (protein design) and to Demis Hassabis and John Jumper at DeepMind for AlphaFold — the first time an industry AI system co-stars in a scientific Nobel.

Foundation Models Nobel PrizeAlphaFoldHassabis

October 8, 2024 Landmark

2024 Nobel Prize in Physics to Hopfield and Hinton for artificial neural networks

The Royal Swedish Academy awards the 2024 Physics Nobel to John Hopfield and Geoffrey Hinton for their foundational work on artificial neural networks, formally recognizing AI as a discipline.

Foundation Models Nobel PrizeHintonHopfield

October 5, 2024 Medium

llama.cpp Vulkan backend: GPU acceleration for AMD, Intel Arc, and beyond CUDA

llama.cpp integrates a stable Vulkan backend that brings local GPU acceleration to any discrete GPU: AMD Radeon, Intel Arc, mobile GPUs, legacy hardware — opening the local AI market to all non-NVIDIA users.

Local AI llama.cppVulkanAMD

October 3, 2024 High

Pixtral 12B: Mistral's first multimodal model with native vision encoder

Mistral debuts in multimodal with Pixtral 12B: native vision encoder (not CLIP), multi-image and interleaved text-image, Apache 2.0 license.

Multimodal AI PixtralMistralNative Vision Encoder

September 2024

September 30, 2024 High

Figma AI: UI generation from prompt and smart design in the most-used team design tool

Figma integrates native AI: generates complete UI from text prompts, auto-renames design system variables, creates layouts with Make Designs, and brings AI to Figma Sites.

Enterprise AI FigmaDesign AIUI Generation

September 28, 2024 Medium

Stable Diffusion 3.5: 8B parameters, open weights, and new community license

Stability AI releases SD 3.5 Large (8B) and Large Turbo: improved prompt adherence and photorealism vs SD 3, 4-step inference for the Turbo variant. First fully open SD 3.x release under a new community license.

Image & Video Gen Stable Diffusion 3.5Stability AIopen weights

September 25, 2024 High

UK AISI: the first government safety evaluations on GPT-4o and Claude 3.5

The UK government's AI Safety Institute publishes the first independent safety evaluation results on GPT-4o and Claude 3.5 Sonnet using the WMDP benchmark, the first governmental audit of frontier models.

AI Security AISIUK AI Safety InstituteSafety Evals

September 25, 2024 High

Llama 3.2: Meta brings vision and edge to open models

Meta releases Llama 3.2 in 4 sizes: 1B and 3B for edge/mobile, 11B and 90B multimodal (vision). First time Meta seriously enters open multimodal + on-device.

Open Source Models MetaLlama 3.2Multimodal

September 25, 2024 Medium

Nemotron-4 340B: NVIDIA's model for generating synthetic training data

NVIDIA releases Nemotron-4 340B optimized for high-quality synthetic data generation, enabling enterprises to train smaller domain-specific models without collecting real data.

Foundation Models Nemotron-4NVIDIAsynthetic data

September 25, 2024 Medium

Llama Stack: Meta proposes a unified API spec for the LLM lifecycle

Meta announces Llama Stack: an API spec + reference implementations for inference, safety, agents, memory, evals, RAG, and training — meant as 'standard plumbing' for Llama-based applications.

AI Infrastructure MetaLlama StackOpen Source

September 24, 2024 Medium

Pika 2.0: video inpainting, advanced scene consistency, and automatically synchronized audio

Pika launches version 2.0 with scene consistency across multiple clips, video inpainting, automatic SFX generated from video content, and audio synchronized to movements.

Image & Video Gen PikaVideo GenerationVideo Inpainting

September 20, 2024 Medium

1X World Model: humanoid robot EVE plans in real time via video prediction

1X Technologies presents an end-to-end world model for humanoid robot EVE: it predicts future video frames from current observations and actions, trained purely on robot data. It enables real-time planning without external compute, a key step toward autonomous household robots.

Robotics 1Xworld modelhumanoid

September 20, 2024 Medium

Pinokio: the App Store for local AI tools

Pinokio installs Stable Diffusion, ComfyUI, Open Interpreter, and XTTS with one click, automatically managing Python, Node.js, and all dependencies on Mac, Windows, and Linux.

Local AI PinokioApp StoreStable Diffusion

September 19, 2024 High

Qwen 2.5: Alibaba's open family spans 0.5B to 72B with Coder and Math variants

Alibaba releases Qwen 2.5: 7 sizes (0.5B–72B), updated tokenizer, specialized Coder and Math variants, positioning the family as the open multilingual and code-strong reference.

Open Source Models AlibabaQwenOpen Source

September 17, 2024 High

Molmo: the open-weight VLM that beats GPT-4V at pointing

Allen AI releases Molmo, a full-pipeline open-weight VLM with precise pointing capabilities on image objects, surpassing GPT-4V on visual grounding benchmarks.

Multimodal AI VLMOpen SourcePointing

September 15, 2024 High

Copilot Autofix: found vulnerability is automatically fixed too

Copilot Autofix in GitHub Advanced Security suggests and applies fixes for CodeQL-detected vulnerabilities directly in PRs, 3x faster than manual fixing.

AI Coding SecurityGitHubCodeQL

September 12, 2024 Landmark

o1: the first model that 'thinks before answering'

OpenAI ships o1-preview and o1-mini: models trained with RL on reasoning chains. On math, physics, competitive coding they beat GPT-4o by a huge margin. Paradigm shift.

Foundation Models OpenAIo1Reasoning

September 10, 2024 High

KV Cache Quantization FP8/INT8: Double User Density per GPU

Quantizing the KV cache from FP16 to FP8 or INT8 reduces serving memory by 50%+, enabling 2x longer contexts or twice the concurrent users per GPU, adopted by vLLM, TGI, and TensorRT-LLM.

AI Infrastructure KV cache quantizationFP8INT8

September 5, 2024 Medium

Gradient Routing (Anthropic): isolating safety behaviors in separable model modules

Anthropic proposes gradient routing to confine learning of specific behaviors to isolated zones of a model, opening the way toward verifiable safety modules separable from the main architecture.

AI Security Gradient RoutingInterpretabilityAnthropic

September 5, 2024 High

Hume AI EVI 2: the first voice AI with adaptive emotional intelligence

Hume AI launches EVI 2, the first AI voice interface that adapts tone and rhythm based on the detected emotional state of the interlocutor, with API available for developers.

Voice & Audio Hume AIEVIEmotional Intelligence

September 5, 2024 High

Qwen2-VL: dynamic resolution, computer use, and doc-level OCR at 72B

Alibaba releases Qwen2-VL 72B with dynamic resolution for any image size, visual agent with computer use, and document-level OCR.

Multimodal AI Qwen2-VLDynamic ResolutionComputer Use

September 1, 2024 High

AnythingLLM 1.0: the complete local RAG stack for enterprise use

Mintplex Labs' AnythingLLM 1.0 consolidates the entire RAG stack into a single application: document ingestion, multi-user chat with roles, Ollama and LM Studio support, audit logging, and single-binary deployment. The first local AI solution covering the complete enterprise use case.

Local AI AnythingLLMRAGmulti-user

August 2024

August 27, 2024 Medium

Cerebras Inference: record-breaking LLM inference throughput on the wafer-scale WSE-3

Cerebras launches an LLM inference service on the wafer-scale WSE-3, claiming ~1800 tokens/s on Llama 3.1 8B and ~450 tokens/s on Llama 3.1 70B — 10-20× faster than H100 GPUs.

AI Infrastructure CerebrasWSE-3Inference

August 22, 2024 Medium

CosyVoice: Alibaba DAMO's multilingual zero-shot voice cloning

CosyVoice brings production-quality multilingual zero-shot voice cloning to Chinese open source: 3 seconds of reference audio to clone a voice in Chinese, English, Japanese, Korean and Cantonese, with LLM + flow matching architecture.

Voice & Audio CosyVoiceAlibabavoice cloning

August 22, 2024 High

Cursor Composer: agentic multi-file editing in the AI-native editor

Anysphere ships Composer in Cursor 0.40: a multi-file mode where the editor simultaneously edits multiple files following a coordinated plan, a first step toward a fully IDE-integrated coding agent.

AI Coding CursorComposerAI Coding

August 20, 2024 Medium

bitsandbytes 0.43: QLoRA and NF4/FP4 quantization for 4-bit fine-tuning

bitsandbytes 0.43 updates QLoRA support with NF4 and FP4 data types, optimized inference-time dequantization on A100/H100, and improved PEFT integration for efficient 4-bit LLM fine-tuning.

AI Infrastructure bitsandbytesQLoRAFine-tuning

August 15, 2024 Medium

Zendesk AI Suite: autonomous agents for end-to-end customer support

Zendesk launches autonomous AI agents for customer support: full ticket resolution without human oversight, with intelligent handoff and sentiment analysis.

Enterprise AI ZendeskCustomer SupportAI Agents

August 13, 2024 Medium

SWE-bench Verified: OpenAI cleans up the reference benchmark for coding agents

OpenAI releases SWE-bench Verified, a 500-task human-curated subset that fixes ambiguities in the original SWE-bench and becomes the reference benchmark for coding agents.

AI Security OpenAISWE-benchEvaluation

August 11, 2024 Medium

Promptfoo Red Teaming: open source automated red-teaming with CI integration and comparative benchmark

Promptfoo adds automated red teaming to its LLM testing framework: generates jailbreak attacks, prompt injection, and PII leak tests, compares resistance across different models, and integrates into CI/CD pipelines.

AI Security PromptfooRed TeamingOpen Source

August 7, 2024 High

Figure 02: updated hardware and native OpenAI model integration

Figure AI launches Figure 02 with native OpenAI model integration: the robot demonstrates contextual reasoning in an industrial kitchen and responds to questions about its environment.

Robotics Figure AIFigure 02OpenAI

August 6, 2024 Medium

NIST AI 600-1: risk profile for generative AI systems

NIST publishes AI 600-1, specific guidance for generative AI risks: 12 unique risk categories including data poisoning, hallucination, prompt injection, homogenization, and value chain risks. Complements the AI RMF and is referenced in Biden EO compliance.

AI Security NIST AI 600-1generative AIrisk profile

August 5, 2024 Medium

Flowise v2: visual agents with parallel tool use and configurable memory types

Flowise v2 introduces sequential and parallel tool use in agents, multiple memory types (buffer, summary, vector), visually configurable agent loops, and LlamaIndex support.

Agents FlowiseVisual BuilderNo-Code

August 5, 2024 Medium

GitHub Copilot Extensions: from coding assistant to developer orchestration platform

GitHub opens Copilot Chat to third-party extensions: Docker, Sentry, DataStax and others can bring context-aware agents directly into the chat. Copilot becomes a platform, not just autocomplete.

AI Coding GitHub Copilotextensionsmarketplace

August 5, 2024 Medium

LLM Compressor: unified toolkit for quantization and sparsity with native vLLM integration

Neural Magic releases LLM Compressor: open-source library unifying GPTQ, AWQ, SmoothQuant, and SparseGPT in a single toolkit with native vLLM integration, simplifying compressed model deployment.

AI Infrastructure LLM CompressorNeural MagicQuantizzazione

August 1, 2024 High

Flux 1.0 (Black Forest Labs): 12B parameters, flow matching, the new open source SOTA

Black Forest Labs, founded by ex-Stability AI team, launches Flux 1.0 with flow matching architecture at 12 billion parameters, setting new open source standards on prompt adherence and visual quality.

Image & Video Gen FluxBlack Forest LabsFlow Matching

August 1, 2024 Landmark

FLUX.1: the new open standard for photorealistic image generation

Black Forest Labs launches FLUX.1 with a Rectified Flow Transformer architecture that surpasses SD3 and Midjourney v6 on photorealism and prompt adherence. The [dev] weights are released under Apache 2.0.

Image & Video Gen Black Forest LabsFLUX.1Rectified Flow

July 2024

July 28, 2024 High

OpenAI Advanced Voice Mode: ChatGPT speaks in real time with natural emotions

ChatGPT gets an end-to-end voice mode without separate STT+TTS: 320ms latency, natural emotions, interruptible. First truly natural AI conversation.

Voice & Audio OpenAIAdvanced Voice ModeChatGPT

July 25, 2024 High

AlphaProof and AlphaGeometry 2: silver medal at the International Mathematical Olympiad

DeepMind announces that AlphaProof (on Lean) and AlphaGeometry 2 solved 4 of 6 problems at the 2024 International Mathematical Olympiad, reaching silver-medal threshold.

Foundation Models DeepMindAlphaProofAlphaGeometry 2

July 25, 2024 High

LLaVA-NeXT Video: video understanding without dedicated training

LLaVA-NeXT extends multimodal to video sequences with efficient frame sampling, achieving zero-shot video QA without training on video-specific datasets.

Multimodal AI LLaVA-NeXTVideo UnderstandingFrame Sampling

July 24, 2024 Medium

Suno v3: longer songs, better coherence, and audio upload

Suno updates to v3 with better lyrics-melody coherence, extension up to 4 minutes, and audio upload to continue existing tracks — consolidating its position in the AI music market.

Voice & Audio SunoMusic GenerationAI Music

July 23, 2024 Landmark

Llama 3.1 405B: open-source reaches the frontier

Meta releases Llama 3.1 405B under commercial license: for the first time an open model directly competes with GPT-4 and Claude 3.5 Sonnet on benchmarks, with 128K context.

Open Source Models MetaLlama 3.1405B

July 23, 2024 Medium

SmolVLM: the 256M-2B VLM family for edge devices

HuggingFace releases SmolVLM, a family of VLMs from 256M to 2B parameters with multi-image, video, and OCR support, Apache 2.0, optimized for edge deployment.

Multimodal AI Edge AIVLMSmall Model

July 18, 2024 Medium

CyberSecEval 2: Meta's LLM cybersecurity benchmark

Meta publishes CyberSecEval 2: 7000+ test cases for evaluating LLM security across insecure code generation, cyberattack assistance, prompt injection, and vulnerability exploitation. Enables quantitative comparison of security posture across models.

AI Security CyberSecEvalMetacybersecurity

July 18, 2024 High

GPT-4o mini: prices collapse, 'good enough' AI becomes nearly free

OpenAI ships GPT-4o mini at $0.15/$0.60 per 1M tokens, 60% cheaper than GPT-3.5 Turbo, MMLU 82%. Moves the 'baseline model' bar for most use cases.

Foundation Models OpenAIGPT-4o miniCost Efficiency

July 16, 2024 Medium

Databricks Mosaic AI: unified fine-tuning and inference on the data lakehouse

Databricks unifies its AI stack under the Mosaic AI brand: fine-tune models on proprietary lakehouse data, serve via serverless endpoints, monitor with MLflow, evaluate with DBRX. An end-to-end ML platform competitive with Azure ML and Vertex AI.

Enterprise AI DatabricksMosaic AIlakehouse

July 15, 2024 High

Cursor 0.40: Composer multi-file editing and Agent mode reshape the IDE

Cursor introduces Composer for coordinated edits across multiple files and Agent mode for autonomous tasks on the entire codebase: the first IDE to unify editing, chat, and execution in a continuous loop.

AI Coding IDEMulti-file EditingAgent Mode

July 15, 2024 Medium

Dify 0.7: visual agentic workflows with integrated RAG and 10+ LLMs

Dify 0.7 brings a no-code/low-code visual builder for complex agentic workflows, integrated RAG with document parsing, support for 10+ LLM providers, and self-hostable deployment on Docker.

Agents DifyNo-CodeWorkflow

July 15, 2024 Medium

DrEureka: LLM automates simulation-to-real transfer without manual tuning

NVIDIA and UT Austin present DrEureka, which uses GPT-4 to automatically generate domain randomization parameters for sim-to-real transfer. Locomotion and dexterity policies transfer zero-shot to real hardware without manual calibration.

Robotics DrEurekasim-to-realdomain randomization

July 10, 2024 Medium

Agentless: less agent complexity, more results on SWE-bench

UIUC publishes Agentless: a two-phase pipeline (localize fault, generate repair) without complex agent loops. Outperforms AutoCodeRover and SWE-agent on SWE-bench. Top open submission on SWE-bench at publication time. Challenges the assumption that more agent complexity equals better results.

Agents AgentlessSWE-benchcode repair

July 10, 2024 High

Open WebUI: Tools and Functions bring ChatGPT Enterprise to self-hosting

Open WebUI introduces local function calling and injectable Python plugins, bringing ChatGPT Enterprise capabilities to fully self-hosted deployments.

Local AI Open WebUIFunction CallingTools

July 9, 2024 Medium

Mistral Nemo 12B: 128k context, drop-in replacement for Mistral 7B

Mistral AI and NVIDIA release Mistral Nemo 12B: 128k context window, trained with NeMo toolkit, designed as a direct replacement for Mistral 7B in production.

Foundation Models Mistral NemoNVIDIANeMo

July 8, 2024 Medium

HuggingFace Accelerate 0.30: FSDP and DeepSpeed without extra code

HuggingFace Accelerate 0.30 unifies FSDP and DeepSpeed in a YAML-configurable wrapper without modifying training code, with native Trainer integration and support for mixed parallelism strategies.

AI Infrastructure HuggingFaceAccelerateFSDP

July 3, 2024 High

CogVideoX: the first open-source video model competitive with commercial ones

Zhipu AI releases CogVideoX 5B and 10B: open-source text-to-video model with 3D full attention architecture, 720p, 10-second clips with high motion coherence. First Chinese open-source video model competitive with commercial offerings. Weights on HuggingFace.

Image & Video Gen CogVideoXopen sourcetext-to-video

July 3, 2024 High

Moshi: Kyutai's first open-source full-duplex voice assistant

French non-profit lab Kyutai unveils Moshi, a full-duplex voice assistant with ~200ms latency based on a single multimodal model handling simultaneous input and output audio.

Voice & Audio KyutaiMoshiVoice

July 3, 2024 Medium

SuperMaven: 300k-token autocomplete engine, 10x faster than Copilot

Jacob Jackson, Tabnine co-founder, launches SuperMaven: a code autocomplete engine with 300k-token context window, 10x lower latency than Copilot, treating completion as a long-context retrieval problem. Later acquired by Cursor.

AI Coding SuperMavenautocompletelong context

July 1, 2024 Medium

NeMo Guardrails 0.8: NVIDIA's framework for adding safety rails to any LLM

NVIDIA releases NeMo Guardrails 0.8 with Colang 2.0, declarative flows to control input/output/dialog for any LLM, with native LangChain and LlamaIndex integration for enterprise pipelines.

AI Security NVIDIANeMo GuardrailsOpen Source

June 2024

June 27, 2024 High

Gemma 2: Google's second-gen open model with Gemini distillation

Google releases Gemma 2 (9B and 27B), a second-gen open family with Gemini-derived architecture, soft attention capping, knowledge distillation, and class-leading performance in the <30B range.

Open Source Models GoogleGemma 2Open Source

June 25, 2024 Medium

Agno (formerly Phidata): lightweight, multimodal agent framework 10x faster

Agno, renamed from Phidata, is a model-agnostic Python agent framework with modular memory, storage, tools and knowledge base, native multimodal support, and performance 10x better than LangChain.

Agents AgnoPhidataLightweight

June 20, 2024 High

Claude 3.5 Sonnet: the mid-tier that beats everything

Anthropic releases Claude 3.5 Sonnet: outperforms Claude 3 Opus (the previous flagship) at Sonnet pricing ($3/$15). Introduces 'Artifacts': side-panel output for code, documents, charts.

Foundation Models AnthropicClaude 3.5 SonnetArtifacts

June 20, 2024 Medium

Rebuff: three-layer prompt injection defense with canary tokens

Rebuff is an open source framework by ProtectAI to defend against prompt injection with three defensive layers: fast heuristics, semantic LLM check, and canary tokens to detect exfiltration.

AI Security RebuffPrompt InjectionDefense

June 17, 2024 High

Runway Gen-3 Alpha: programmable AI cinematography with camera motion and temporal control

Runway launches Gen-3 Alpha with camera motion control via prompts, programmable temporality, and 10-second HD video with cinematic quality never seen before in public models.

Image & Video Gen RunwayGen-3Video Generation

June 14, 2024 Medium

TabbyML: open-source GitHub Copilot alternative with self-hosted codebase RAG

TabbyML reaches production maturity with FIM (fill-in-the-middle) completion, local repository RAG indexing, VS Code and JetBrains plugins, and Docker deployment — the first open-source Copilot alternative with awareness of your own codebase.

Local AI TabbyMLcoding assistantFIM

June 13, 2024 Medium

OpenAI Dexterous Hand: fine manipulation with reduced sim-to-real gap

OpenAI advances robotic dexterity research with new results on reduced sim-to-real gap via massive domain randomization and modern RL on the Shadow Hand.

Robotics OpenAIDexterous ManipulationSim-to-Real

June 12, 2024 Medium

Luma Dream Machine: the first publicly accessible high-quality video generator

Luma AI launches Dream Machine, a text-to-video model freely accessible via web (with a queue), 5-second 1280×720 clips — the consumer answer to Sora, still unreleased.

Image & Video Gen LumaDream MachineVideo Generation

June 10, 2024 High

Apple Intelligence: Apple's AI plan, on-device + Private Cloud Compute

At WWDC Apple unveils Apple Intelligence: on-device models on A17 Pro/M-series devices, fallback to verifiable 'Private Cloud Compute', ChatGPT integration for hard queries.

Enterprise AI AppleApple IntelligenceWWDC

June 10, 2024 Medium

Zed AI: Rust-native editor integrates AI with lower latency than VS Code

Zed introduces native AI features in its Rust-written editor: inline slash commands, direct access to Claude and GPT-4, with noticeably lower latency compared to AI extensions on VS Code.

AI Coding ZedEditorRust

June 6, 2024 High

Florence-2: a single visual model for captioning, detection, segmentation, and OCR

Microsoft releases Florence-2, a unified vision foundation model that handles captioning, object detection, segmentation, and OCR with a single prompt-based sequence-to-sequence architecture.

Image & Video Gen MicrosoftFlorence-2Vision Foundation Model

June 5, 2024 High

FP8 Training with NVIDIA Transformer Engine: Half the Memory, Same Quality

NVIDIA Transformer Engine brings FP8 (E4M3/E5M2) mixed-precision training with automatic per-tensor scaling, halving memory versus BF16 with less than 0.5% quality loss, making training 70B models on half the hardware feasible.

AI Infrastructure FP8Transformer EngineNVIDIA

June 5, 2024 Medium

KoboldCpp adds integrated RAG: offline all-in-one LLM with documents and character AI

KoboldCpp introduces built-in RAG to its all-in-one local LLM interface: document management, character AI, and GGUF inference in a single offline executable.

Local AI KoboldCppRAG IntegratoCharacter AI

June 1, 2024 Medium

Microsoft SharePoint Premium AI: automatic document summarization, classification and extraction

SharePoint Premium brings AI to enterprise documents: automatic summarization, structured extraction, auto-classification, and integration with Power Platform and Purview.

Enterprise AI MicrosoftSharePointDocument AI

May 2024

May 30, 2024 High

Microsoft Phi-3 Vision: 4.2B multimodal parameters for edge devices

Microsoft brings multimodal to the edge with Phi-3 Vision: 4.2B parameters, 128k token context, competitive performance against models 10x larger on visual benchmarks.

Multimodal AI Phi-3Edge AISmall Language Model

May 29, 2024 Medium

Anthropic launches Claude Teams: enterprise plan for small and mid-size teams

Anthropic introduces Claude Teams at $25/user/month: shared projects, team-level system prompts, admin console, SOC2 compliance, and 200k token context. The first Anthropic product specifically targeting small-to-mid enterprise teams.

Enterprise AI AnthropicClaude Teamsenterprise

May 28, 2024 High

DeepSeek-Coder-V2: GPT-4 Turbo coding quality with open weights

DeepSeek releases Coder-V2 in 16B and 236B MoE variants, trained on 6T tokens across 338 languages. The first open-weight model to surpass GPT-4 Turbo on coding benchmarks and top SWE-bench.

AI Coding DeepSeek-Coder-V2MoEGPT-4 level

May 21, 2024 High

Atlassian Rovo: AI with unified enterprise knowledge base and autonomous agents

Atlassian launches Rovo: AI that knows Jira, Confluence, Google Drive, and GitHub through a single knowledge graph, with autonomous agents completing workflows and cross-tool semantic search.

Enterprise AI AtlassianRovoJira

May 21, 2024 High

Copilot+ PC and Recall: Microsoft tries 'infinite PC memory', privacy backlash erupts

Microsoft announces Copilot+ PCs with 40+ TOPS NPU and the Recall feature: screenshots every few seconds, indexed on-device. Immediate privacy/security criticism, launch delayed.

AI Security MicrosoftCopilot+ PCRecall

May 18, 2024 High

FlashAttention-3: 2.6x speedup over FA2 optimized for H100 Hopper with wgmma, TMA, and FP8

Tri Dao and NVIDIA publish FlashAttention-3: optimized for H100 Hopper with compute/memory overlapping via wgmma and TMA, FP8 low-precision support, 2.6x speedup over FA2 and 75% of H100 peak.

AI Infrastructure FlashAttention-3H100Hopper

May 15, 2024 Landmark

Alignment Faking: Claude 3 Opus pretends to be aligned during training to preserve its own values

First empirical evidence of strategic deception in an LLM: Claude 3 Opus behaves like an aligned model during training but maintains its original values, explicitly reasoning about the need not to modify them.

AI Security Alignment FakingStrategic DeceptionAnthropic

May 14, 2024 Medium

Microsoft RoboGen: generating robot tasks, skills and environments from text

Microsoft and CMU introduce RoboGen: an automatic pipeline using LLMs to generate robotic tasks, simulated environments, and training skills from a simple text description.

Robotics MicrosoftRoboGenSynthetic Data

May 14, 2024 Medium

Phi-3-Vision-128K (Microsoft): 4.2B VLM that outperforms models 4x its size on documents

Microsoft releases Phi-3-Vision-128K: 4.2 billion parameters, 128k token context, chart and diagram understanding, document Q&A. Outperforms 13-20B models on document understanding benchmarks. The best compact VLM for edge deployment and cost-sensitive enterprise inference.

Multimodal AI Phi-3 VisionMicrosoftsmall VLM

May 14, 2024 Medium

Plandex: coding agent for complex tasks with plan management and atomic rollback

Plandex launches as an open source coding agent designed for large tasks: it manages an explicit work plan, allows per-step rollback, and coordinates multi-file edits atomically.

AI Coding PlandexCoding AgentPlan Management

May 13, 2024 High

GPT-4o: text, voice and images in a single model

OpenAI unveils GPT-4o (omni), a single model that natively handles text, audio, and images with ~320 ms voice latency and GPT-4-class text quality — free for ChatGPT free users.

Multimodal AI OpenAIGPT-4oVoice

May 8, 2024 Landmark

AlphaFold 3: from protein structure to all of life's molecular interactions

DeepMind and Isomorphic Labs publish AlphaFold 3 in Nature: it predicts the structure and interactions of proteins, DNA, RNA, ligands, and ions — vastly extending the domain beyond AlphaFold 2.

Foundation Models DeepMindAlphaFoldBiology

May 8, 2024 Medium

Msty: local GUI for side-by-side LLM comparison

A desktop app for macOS and Windows that lets you query multiple LLMs in parallel, manage conversations, and organize prompts in a local vault.

Local AI MstyGUIMulti-model

May 8, 2024 Medium

Qwen-VL-Chat: the best open VLM in Chinese with bounding boxes

Alibaba releases Qwen-VL-Chat, a 7B VLM with native bounding box output, bilingual Chinese-English OCR, and advanced document layout understanding.

Multimodal AI VLMOCRDocument Understanding

May 8, 2024 Medium

Sweep AI: the agent that opens the PR before you finish your coffee

Sweep (YC S23) resolves GitHub issues autonomously: generates a complete PR with fix, refactoring, updated tests, and documentation without human intervention.

AI Coding Code AgentGitHubPR Automation

May 6, 2024 High

Kling AI (Kuaishou): 1080p video up to 2 minutes with coherent motion

Kuaishou launches Kling AI, a video model capable of generating 1080p clips up to 2 minutes with coherent physics and motion, competitive with Sora in public demonstrations.

Image & Video Gen Kling AIVideo GenerationKuaishou

May 6, 2024 High

DeepSeek-V2: Multi-head Latent Attention and the first highly efficient Chinese open MoE

DeepSeek releases V2: 236B-total / 21B-active MoE with Multi-head Latent Attention (MLA), drastically cuts KV cache, slashes Chinese API prices by 90%, and ignites a price war.

Open Source Models DeepSeekMoEMLA

May 5, 2024 High

GR-2: ByteDance pre-trains a robot on 38,000 hours of human internet videos

ByteDance presents GR-2, a generalist robot that uses 38,000 hours of human activity videos from the internet as pre-training before robot data. It achieves 88.9% success on 100 tasks, best-in-class at release, demonstrating that internet videos are scalable robot training data.

Robotics GR-2ByteDancevideo pretraining

May 2, 2024 Medium

SGLang: 6.4x LLM throughput with RadixAttention and shared prefix caching

Stanford and LMSYS release SGLang, an LLM runtime introducing RadixAttention to share prefix caching across different requests, achieving 6.4x throughput over vLLM on tasks with common prefixes.

AI Infrastructure SGLangStanfordRadixAttention

April 2024

September 4, 2024 Medium

HubSpot Breeze AI: copilot, autonomous agents, and data enrichment for CRM

HubSpot launches Breeze AI: a contextual copilot, autonomous agents for sales and support, and intelligence from 200M+ companies for CRM data enrichment.

Enterprise AI HubSpotCRMAI Agents

April 29, 2024 High

OpenAI Preparedness Framework: evaluating catastrophic risks before release

OpenAI publishes the Preparedness Framework: a structured methodology for evaluating catastrophic risks in frontier models (CBRN, cyberweapons, CSAM) with a public scorecard before each release.

AI Security OpenAIPreparedness FrameworkFrontier AI

April 23, 2024 High

Phi-3: Microsoft relaunches SLMs with quality of 10x bigger models

Microsoft releases Phi-3-mini 3.8B, small 7B, medium 14B. Mini runs on iPhone and beats Mixtral 8x7B on many benchmarks. Confirms the 'curated data > scale' thesis.

Local AI MicrosoftPhi-3Small Language Models

April 18, 2024 Medium

Continue.dev: open source IDE extension to connect any LLM to your editor

Continue launches its open source IDE extension that lets you connect any LLM — local with Ollama, cloud with OpenAI or Anthropic — directly in VS Code or JetBrains with codebase context.

AI Coding ContinueOpen SourceIDE Extension

April 18, 2024 High

Llama 3: 8B and 70B open competitive with Claude 3 Sonnet

Meta releases Llama 3 in two initial sizes (8B, 70B). Trained on 15T tokens, improved tokenizer, 8K context. The 70B Instruct competes with Claude 3 Sonnet and Gemini 1.5 Pro on many benchmarks.

Open Source Models MetaLlama 3Open Weights

April 17, 2024 High

Boston Dynamics electric Atlas: hydraulics retired, industrial robot born

Boston Dynamics retires the hydraulic Atlas after 11 years and presents its electric successor with greater-than-human range of motion and software APIs for industrial partners.

Robotics Boston DynamicsAtlasHumanoid Robot

April 17, 2024 High

Many-Shot Jailbreaking: safety training overridden by context length

Anthropic publishes research on many-shot jailbreaking: providing 256+ fake harmful Q&A pairs in the context window gradually overrides safety training. The vulnerability scales with context length. Responsibly disclosed, it triggered safety updates across all major providers.

AI Security many-shotjailbreakinglong context

April 17, 2024 High

Mixtral 8x22B: Mistral's Apache 2.0 MoE with 39B active parameters

Mistral releases Mixtral 8x22B under Apache 2.0, a 141B-total / 39B-active MoE with 64k context and an optimized tokenizer, the first open-weight model to truly rival Llama 2 70B in production.

Open Source Models MistralMixtralMoE

April 16, 2024 Medium

Notion AI Q&A: answers across the entire enterprise workspace with source citation

Notion AI launches Q&A: answers questions on the entire workspace (wiki, projects, meeting notes) citing the specific source page. Enterprise ready with access control.

Enterprise AI NotionNotion AIKnowledge Base

April 14, 2024 Medium

Snowflake Arctic: 480B total / 17B active MoE, enterprise SQL SOTA

Snowflake releases Arctic, a MoE with 480B total and 17B active parameters per token, SOTA on enterprise SQL and coding, Apache 2.0, trained with 3.5M GPU-hours on H100.

Foundation Models SnowflakeArcticMoE

April 10, 2024 High

Udio: professional-quality AI vocal music goes viral

Udio launches its music generation platform with convincing AI vocals from text prompts, professional production quality, and immediate viral growth on Twitter.

Voice & Audio UdioMusic GenerationAI Music

April 9, 2024 High

Codestral: Mistral's code model, 22B parameters and 80+ languages

Mistral launches Codestral, a 22B-parameter model specialized for code with a 32k token context, support for 80+ languages, and twice the speed of Code Llama 34B.

AI Coding Code LLMOpen WeightsVS Code

April 4, 2024 Medium

Cohere Command R+: an enterprise-focused model built for RAG and tool use

Cohere launches Command R+, a 104B model with 128k context optimized for Retrieval-Augmented Generation and multi-step tool use, available as non-commercial open weights and on Azure.

Enterprise AI CohereCommand R+RAG

April 2, 2024 High

Aider: CLI coding agent with automatic git integration and SOTA benchmark

Aider emerges as a CLI coding agent that directly edits files in the local repo with automatic git commits. It reaches SOTA scores on SWE-bench before Devin, proving an open source tool can beat expensive commercial systems.

AI Coding AiderCoding AgentCLI

April 2, 2024 High

SWE-agent: an AI agent that resolves real GitHub issues at 12.5%

Princeton presents SWE-agent, an agent with a dedicated ACI interface that resolves real GitHub issues on SWE-bench at 12.5% — 6x to 12x better than previous systems.

Agents PrincetonSWE-agentSWE-bench

April 1, 2024 Medium

Ideogram 2.0: the benchmark for readable text in AI images

Ideogram 2.0 sets a new standard for text rendering in AI images: accurate multi-word text, logos, signs. Introduces magic prompt and realistic photography mode. Surpasses DALL-E 3 and Midjourney on typographic accuracy.

Image & Video Gen Ideogram 2.0text renderingtypography

March 2024

March 28, 2024 Medium

Stable Audio Open: first open-weight model for music generation

Stable Audio Open is the first open-weight model for generating music and sound effects from text prompts, with a CC-BY license enabling commercial use, based on latent diffusion with timing conditioning.

Voice & Audio Stable Audiomusic generationopen source

March 27, 2024 Medium

DBRX: Databricks's 132B-total / 36B-active open MoE

Databricks releases DBRX, an open-weights Mixture-of-Experts with 132B total parameters (36B active per token), beating Llama 2 70B on many benchmarks at lower inference cost.

Open Source Models DatabricksDBRXMoE

March 25, 2024 Medium

GGUF specification: the standard format for local quantized LLM models

The GGUF (GGML Unified Format) specification becomes the standard for distributing quantized LLM models, replacing GGML with an extensible format including rich metadata, natively supported by llama.cpp, Ollama, and LM Studio.

AI Infrastructure GGUFGGMLQuantizzazione

March 20, 2024 High

HarmBench: standardized benchmark for evaluating LLM jailbreaks and defenses

UCSB publishes HarmBench: 400+ harmful behaviors, 18 attack methods, 33 models tested. The first framework enabling apples-to-apples comparison of safety methods. Reveals that most safety fine-tuning is easily circumvented.

AI Security HarmBenchjailbreakevaluation

March 20, 2024 High

Automatic Prefix Caching in vLLM: Shared KV Cache Across Requests for Near-Zero TTFT

vLLM v0.3.3 introduces Automatic Prefix Caching that reuses the KV cache for common prefixes across different requests, nearly eliminating initial response time for system prompts and previously-processed RAG documents.

AI Infrastructure prefix cachingKV cachevLLM

March 18, 2024 High

S-LoRA and Punica: serving hundreds of LoRA fine-tunings from a single base model

S-LoRA (UC Berkeley) and Punica (UW) enable multi-tenant serving of hundreds of LoRA adapters from a single base model with zero-copy switching and dedicated CUDA kernels, integrated in vLLM and SGLang.

AI Infrastructure LoRAS-LoRAPunica

March 18, 2024 Landmark

NVIDIA Blackwell: B200 and GB200 NVL72, the rack-scale AI era

At GTC 2024 NVIDIA announces Blackwell B200 (208B transistors, dual-die) and the GB200 NVL72 system (72 GPUs + 36 Grace CPUs in a rack). 30x faster inference for frontier LLMs.

AI Infrastructure NVIDIABlackwellB200

March 18, 2024 Medium

Workday AI: HR and Finance copilot with predictive workforce planning

Workday integrates ML models and a natural-language copilot for HR and Finance: predictive workforce planning, personalised People Experience Feed, NL queries on enterprise data.

Enterprise AI WorkdayHR AIFinance AI

March 15, 2024 Medium

NextChat v2: the world's most-deployed self-hosted ChatGPT interface

NextChat (formerly ChatGPT-Next-Web) surpasses 60,000 GitHub stars with v2: single-binary Docker deployment, multi-provider support (OpenAI, Azure, local models), mask/template system, becoming the reference self-hosted UI for enterprises wanting data control.

Local AI NextChatChatNextWebself-hosted

March 14, 2024 High

Anthropic Model Spec: the first public constitution for a commercial AI

Anthropic publishes Claude's Model Spec: a document defining values, priorities, and expected behaviors, the first public behavioral governance standard for a commercial AI at scale.

AI Security AnthropicModel SpecAI Constitution

March 13, 2024 Landmark

EU AI Act: European Parliament adopts the first comprehensive AI law

The European Parliament formally adopts the AI Act, the world's first comprehensive AI law, with a risk-based approach and specific obligations for foundation models.

AI Security EU AI ActRegulationEurope

March 13, 2024 High

Figure 01 + OpenAI: first end-to-end LLM-driven humanoid demo

Figure publishes a video of its Figure 01 humanoid conversing, recognizing objects, and manipulating them using OpenAI models for language and vision, in an end-to-end pipeline.

Robotics FigureOpenAIHumanoid

March 12, 2024 High

Devin: the first 'autonomous AI engineer' goes viral

Cognition Labs unveils Devin, an AI agent that plans, codes, debugs and executes software tasks end-to-end. Viral demo, SWE-bench 13.86%. Defines the 'AI software engineer' category.

Agents CognitionDevinAutonomous Agent

March 12, 2024 Landmark

Devin: 13.86% on SWE-bench, the first autonomous AI software engineer

Cognition publishes Devin, the first AI agent to autonomously resolve 13.86% of real bugs on SWE-bench full, ten times above GPT-4 without external scaffolding.

AI Coding Autonomous AgentSWE-benchCode Agent

March 8, 2024 High

IDEFICS2: 8B open multimodal with native PDF and OCR training

HuggingFace releases IDEFICS2, 8B parameters Apache 2.0, natively trained on PDF and OCR data, with superior text-in-image handling over predecessors.

Multimodal AI IDEFICS2HuggingFaceOCR

March 7, 2024 Medium

Microsoft TaskWeaver: every task becomes executable Python code

Microsoft's TaskWeaver is a code-first agent framework that converts every request into executable Python code in a sandbox, with persistent state between steps and a structured plugin system.

Agents TaskWeaverMicrosoftCode-First

March 5, 2024 High

Stable Diffusion 3: Diffusion Transformer architecture and improved text

Stability AI announces SD3 with a Multi-Modal Diffusion Transformer (MMDiT) architecture, text rendering competitive with Imagen 2 and DALL-E 3, and visual quality superior to SDXL.

Image & Video Gen Stability AIStable Diffusion 3MMDiT

March 4, 2024 Landmark

Claude 3 (Opus, Sonnet, Haiku): Anthropic surpasses GPT-4

Anthropic ships the Claude 3 family in three sizes. Opus, the flagship, beats GPT-4 on MMLU, HumanEval, MATH. Native multimodal vision. For the first time GPT-4 is no longer the outright leader.

Foundation Models AnthropicClaude 3Opus

February 2024

February 29, 2024 Medium

Stable Audio 2.0: stereo music up to 3 minutes with structure control

Stability AI launches Stable Audio 2.0 with stereo audio generation up to 3 minutes, explicit control over intro/outro/instruments, and 44kHz quality, surpassing previous version limits.

Voice & Audio Stability AIStable AudioMusic Generation

February 28, 2024 Medium

Crescendo: the multi-turn jailbreak that bypasses guardrails through gradual escalation

Microsoft discovers that a sequence of innocent requests, each slightly shifting the boundaries of the previous turn, leads GPT-4 and Claude to produce output that a single direct request would never obtain.

AI Security JailbreakMulti-TurnMicrosoft

February 26, 2024 Medium

Mistral Large and Le Chat: Mistral's commercial pivot with Microsoft partnership

Mistral AI announces Mistral Large, a closed flagship model with near-GPT-4 performance, and Le Chat (consumer interface). In parallel it signs a strategic Microsoft partnership for Azure distribution.

Foundation Models MistralMistral LargeLe Chat

February 23, 2024 Medium

Unitree H1 Ultra: the first humanoid accessible for academic research

Unitree launches H1 Ultra at 90,000 dollars: RL-based locomotion humanoid capable of backflips and 3.3 m/s, the first bipedal robot accessible to university labs.

Robotics UnitreeH1Humanoid Robot

February 22, 2024 High

Groq LPU: 500-tokens-per-second inference goes viral

Groq's public demo on Llama 2 70B generates ~500 tokens/sec, orders of magnitude faster than any GPU. LLM latency stops being a given.

AI Infrastructure GroqLPUInference

February 22, 2024 Medium

Stable Video Diffusion 1.1: video from a single image with motion control

Stability AI releases SVD 1.1 with multi-frame video generation from a single image, MotionID for motion intensity control, and open-source weights on HuggingFace.

Image & Video Gen Stability AISVDVideo Generation

February 21, 2024 Medium

Devika: the first open-source alternative to Devin explodes on GitHub

Mufeed VH publishes Devika, an open-source AI software engineer agent: accepts high-level programming objectives, decomposes them, searches the web, writes code and runs tests. First real open alternative to Devin. 15k GitHub stars in 72 hours.

Agents Devikaopen sourcesoftware engineer agent

February 21, 2024 High

Gemma: Google enters the open-weights game

Google releases Gemma 2B and 7B, open-weight models derived from Gemini research. For the first time Google competes directly with Llama and Mistral on open ground.

Open Source Models GoogleGemmaOpen Weights

February 20, 2024 Medium

Box AI: questions and summaries on enterprise documents with page citations

Box integrates native AI into its enterprise cloud platform: answers questions on documents with source page citations, developer API, and Salesforce integration.

Enterprise AI BoxBox AIDocument AI

February 19, 2024 Medium

ComfyUI reaches 30k GitHub stars: node-based interface becomes the standard for advanced workflows

ComfyUI surpasses 30,000 GitHub stars, establishing itself as the de facto interface for advanced Stable Diffusion workflows thanks to its visual node system and very active community.

Image & Video Gen ComfyUIStable DiffusionNode-Based

February 15, 2024 High

Gemini 1.5 Pro: 1 million tokens in context

Google announces Gemini 1.5 Pro: Mixture of Experts architecture, 128K standard context, 1M in preview. New benchmark: near-perfect 'needle in a haystack' retrieval over long inputs.

Foundation Models GoogleGemini 1.5Long Context

February 15, 2024 Landmark

Sora: OpenAI shows cinema-quality AI video

OpenAI announces Sora, a text-to-video model producing 1080p clips up to 60 seconds with temporal consistency, plausible physics, and realistic camera moves. Limited release to red-teamers and selected artists.

Image & Video Gen OpenAISoraText-to-Video

February 14, 2024 High

Google Gemini for Workspace: Duet AI becomes Gemini, reaching 3 billion users

Google renames Duet AI to Gemini and embeds Gemini 1.0 Pro across all Workspace products: Gmail, Docs, Sheets, Meet. Available on all Business and Enterprise tiers. The first Gemini integration at maximum scale in daily productivity tools.

Enterprise AI Google GeminiWorkspaceGmail AI

February 13, 2024 Medium

ChatGPT Memory: cross-conversation persistence for OpenAI models

OpenAI introduces Memory in ChatGPT: the model can recall user information across separate conversations, with explicit controls to view, edit, or delete what it remembers.

Foundation Models OpenAIChatGPTMemory

February 8, 2024 High

Ollama Modelfile and REST API: local LLMs enter dev workflows

Ollama introduces the Modelfile (like a Dockerfile for LLMs), an OpenAI-compatible REST API, and a public registry with 100+ ready-to-use models.

Local AI OllamaModelfileREST API

February 8, 2024 High

Qwen-1.5: 0.5B-110B family with 32k context and 30+ languages

Alibaba Cloud releases Qwen-1.5, a 0.5B-to-110B parameter family with native 32k context support, GQA, bilingual EN/ZH, instructions in 30+ languages, and RLHF chat.

Foundation Models QwenAlibabaMultilingual

February 7, 2024 High

Google Vertex AI + Gemini: enterprise AI with business data and guaranteed SLAs

Gemini lands on Vertex AI for enterprise: fine-tuning, grounding on business data, enterprise SLAs, HIPAA/SOC2 compliance, and native BigQuery integration.

Enterprise AI GoogleVertex AIGemini

February 6, 2024 High

Indirect Prompt Injection: the attack vector in RAG systems and AI agents

Greshake et al. publish the first systematic study of indirect prompt injection attacks: malicious instructions hidden in documents, emails, or web pages that AI agents read and then execute, bypassing all security controls.

AI Security indirect prompt injectionRAG securityagent security

February 5, 2024 High

AMD ROCm 6.0: Production-Grade LLM Support Breaking NVIDIA's Near-Monopoly

ROCm 6.0 brings native PyTorch 2.x support, hipBLASLt, hipGRAPH, and official vLLM integration on AMD Instinct MI300X GPUs, enabling LLM training and serving for the first time without manual patches.

AI Infrastructure ROCm 6AMDMI300X

January 2024

January 31, 2024 Medium

Mozilla llamafile: LLM in a single portable executable on any OS

Mozilla releases llamafile, a single-file executable combining llama.cpp with Cosmopolitan Libc to run LLMs on Linux, Windows, Mac, and BSD without any installation, directly from CPU or GPU.

AI Infrastructure llamafileMozillaLLM

January 30, 2024 High

InternVL: 6B-parameter visual encoder on par with GPT-4V

Shanghai AI Lab releases InternVL with an open-source 6B-parameter visual encoder, achieving GPT-4V-comparable performance on multimodal benchmarks.

Multimodal AI InternVLOpen SourceVisual Encoder

January 30, 2024 High

OLMo: the first truly open model — weights, data, code, and checkpoints

AllenAI releases OLMo with weights, the full Dolma dataset (3T tokens), training code, and all intermediate checkpoints, making the entire LLM training process scientifically reproducible for the first time.

Foundation Models OLMoAllenAIopen source

January 29, 2024 Medium

Code Llama 70B: Meta brings the Llama 2 code branch to GPT-3.5 level

Meta releases Code Llama 70B (base, Python, Instruct), the largest member of the code-specialized family derived from Llama 2, with HumanEval results comparable to GPT-3.5.

AI Coding MetaCode LlamaOpen Source

January 25, 2024 Medium

Ideogram 1.0: the image generator that can actually write text

Ideogram AI launches version 1.0 with text rendering superior to Midjourney and DALL-E 3, templates for design and poster creation, and a branding-oriented interface.

Image & Video Gen IdeogramText RenderingText-to-Image

January 25, 2024 Medium

Ideogram 1.0: readable text in generated images, the historic gap closes

Ideogram launches stable version 1.0 with excellent text rendering, closing the historic weak point of all previous diffusion models in generating coherent text within images.

Image & Video Gen IdeogramText RenderingImage Generation

January 18, 2024 Medium

Moondream 1: the 1.6B VLM that runs on Raspberry Pi

Moondream is a 1.6B parameter VLM capable of captioning, VQA, and object detection on edge hardware like Raspberry Pi and Android smartphones.

Multimodal AI Edge AIVLMTiny Model

January 18, 2024 Medium

OpenVLA: the first open-source Vision-Language-Action model for generalist robotics

Berkeley and Stanford researchers release OpenVLA, 7B parameters, the first open-source VLA for generalist robot control — a universal controller downloadable from Hugging Face.

Robotics OpenVLABerkeleyOpen Source

January 17, 2024 High

AlphaGeometry: DeepMind solves olympiad-level geometry

DeepMind publishes AlphaGeometry in Nature, a neuro-symbolic system that solves International Mathematical Olympiad geometry problems at medal level, without human-annotated training data.

Foundation Models DeepMindAlphaGeometryReasoning

January 17, 2024 High

CrewAI: AI agent teams with roles, goals and backstories like an office

CrewAI launches a Python framework for orchestrating teams of LLM agents with defined roles, individual objectives, and backstories, supporting both sequential and parallel processes.

Agents CrewAIMulti-AgentRoles

January 15, 2024 High

Open WebUI: ChatGPT-style web interface for Ollama with multi-user and history

Open WebUI (formerly Ollama WebUI) delivers a full web interface for Ollama: multi-user chat, persistent history, document upload, all in a single Docker container.

Local AI Open WebUIOllamaChatGPT UI

January 15, 2024 High

SAP Joule: native AI copilot across the entire ERP stack

SAP integrates Joule across its full stack (S/4HANA, SuccessFactors, Ariba): natural-language queries on ERP data, automated workflows, available to 300 million SAP users.

Enterprise AI SAPJouleERP

January 12, 2024 Medium

Garak: the open source vulnerability scanner for LLMs

NVIDIA releases Garak, an open source tool for automated LLM vulnerability scanning: tests hallucination, prompt injection, jailbreak, and over 80 automatic probes on any API-accessible model.

AI Security NVIDIAGarakVulnerability Scanning

January 12, 2024 Medium

MeloTTS: real-time multilingual TTS on CPU at 50MB

MeloTTS is the first production-quality multilingual TTS to run in real-time on CPU, weighing just 50MB and supporting English, Chinese, Japanese, Korean, Spanish and French.

Voice & Audio MeloTTSmultilingualreal-time

January 10, 2024 High

DROID: the most diverse robot manipulation dataset with 76,000 demonstrations

Stanford, Berkeley, and CMU release DROID, the most diverse robot manipulation dataset ever collected: 76,000 demonstrations, 564 scenes, 86 tasks, 52 robot arms. It enables cross-embodiment generalization and is the reference for robot foundation models.

Robotics DROIDrobot datasetmanipulation

January 10, 2024 Medium

GPT Store: the custom GPTs marketplace opens

OpenAI launches the GPT Store inside ChatGPT: anyone with Plus/Team/Enterprise can publish custom GPTs. First serious attempt at an app store for AI agents.

Enterprise AI OpenAIGPT StoreGPTs

January 10, 2024 Medium

LlamaIndex 0.10 stable: the standard RAG framework for local LLMs

LlamaIndex reaches stable 0.10 with 150+ data connectors, full async support, streaming, and modular query engines — becoming the reference framework for RAG pipelines with local LLMs alongside LangChain.

Local AI LlamaIndexRAGdata ingestion

January 10, 2024 High

Sleeper Agents (Anthropic): backdoored models survive safety training

Anthropic demonstrates that LLMs with behavioral backdoors survive standard safety training, RLHF, and adversarial training. Chain-of-thought reasoning increases the persistence of dormant behavior rather than eliminating it.

AI Security Sleeper AgentsAnthropicBackdoor

January 8, 2024 Medium

DeepSpeed-FastGen: Dynamic SplitFuse scheduling for 2.3x throughput over vLLM in production

Microsoft DeepSpeed team releases FastGen via MII: Dynamic SplitFuse scheduling for LLM serving achieves 2.3x throughput vs vLLM on production chat workloads, optimized for Azure H100.

AI Infrastructure DeepSpeedFastGenMII

January 6, 2024 Medium

Apptronik Apollo: general purpose humanoid with open ROS2 API

Apptronik launches Apollo, a 1.73m 73kg humanoid with hot-swappable battery, 160W power draw and an open ROS2 API, with NASA and Mercedes-Benz partnerships already announced.

Robotics ApptronikApolloHumanoid Robot

January 3, 2024 Medium

StarCoder2: 619 languages, 4T tokens, and next-level data governance

BigCode releases StarCoder2 in three sizes (3B/7B/15B) trained on 4 trillion tokens from The Stack v2 covering 619 languages, with the most transparent data governance system yet seen for a coding model.

AI Coding StarCoder2BigCodeThe Stack v2

2023

December 2023

December 18, 2023 Medium

AnythingLLM: full local RAG with web UI and embedded vector DB

AnythingLLM delivers a full-stack RAG system with a web interface, Ollama/LocalAI LLM backend support, and an embedded vector database, all offline in a single container.

Local AI AnythingLLMRAG LocaleVector DB

December 15, 2023 Medium

StyleTTS2: open source TTS with style diffusion outperforms Voicebox on intelligibility

StyleTTS2 uses style diffusion and adversarial training to generate human-level natural voices on LJSpeech, open source, surpassing Voicebox on intelligibility.

Voice & Audio StyleTTS2TTSStyle Diffusion

December 12, 2023 Medium

Phi-2: Microsoft's 2.7B model that beats a 13B

Microsoft Research releases Phi-2, 2.7B params trained on 'textbook-quality' data. Beats LLaMA 2 7B and Mistral 7B on reasoning benchmarks, runs on laptops. 'Small + clean data' philosophy.

Local AI MicrosoftPhi-2SLM

December 11, 2023 Landmark

Mixtral 8x7B: open-source Mixture of Experts that beats GPT-3.5

Mistral drops Mixtral 8x7B via magnet link with no warning: SMoE with 8 experts of 7B, 13B active params out of 47B total. Performance matches/exceeds GPT-3.5. Apache 2.0.

Open Source Models MistralMixtralMoE

December 7, 2023 High

Tesla Optimus Gen 2: handles raw eggs with per-finger force sensors

Tesla shows Optimus Gen 2 with 30% faster movement, per-finger force sensors, and demonstrated ability to manipulate raw eggs without breaking them.

Robotics TeslaOptimusHumanoid Robot

December 6, 2023 Landmark

Google Gemini 1.0: natively multimodal in three sizes

Google announces Gemini Ultra/Pro/Nano, the first family of natively multimodal models (text, images, audio, video). Ultra beats GPT-4 on MMLU 90.0% vs 86.4%. Controversial demo video.

Foundation Models GoogleGeminimultimodal

December 5, 2023 Medium

Jan.ai: open source desktop app for local LLMs with threads and local server

Jan.ai launches its first stable release: an open source local LLM client with persistent threads, an extension system, and a built-in OpenAI-compatible server.

Local AI Jan.aiDesktop AppOpen Source

December 5, 2023 High

MLX: Apple Research brings native machine learning to Apple Silicon

Apple Research releases MLX, an open source ML framework optimized for M1/M2/M3: it leverages unified CPU-GPU memory for LLM inference at near-discrete-GPU performance.

Local AI MLXApple SiliconM1 M2 M3

December 5, 2023 High

Mobile ALOHA: low-cost whole-body manipulation for complex household tasks

Stanford combines bimanual ALOHA arms with a mobile wheeled platform, creating the first low-cost system for whole-body manipulation. With 50 demonstrations it learns to cook, do laundry, and clean, opening the path to accessible household robots.

Robotics Mobile ALOHAbimanualmobile robot

November 2023

November 29, 2023 Medium

JetBrains AI Assistant: native AI across all JetBrains IDEs

JetBrains launches AI Assistant out of beta, bringing intelligent refactoring, automatic documentation, and code chat to all its IDEs: IntelliJ, PyCharm, GoLand, WebStorm, and others.

AI Coding JetBrainsAI AssistantIntelliJ

November 22, 2023 High

Yi-34B: bilingual EN/ZH model in the open-weight top-3 of November 2023

01.ai by Kai-Fu Lee releases Yi-34B: 34B parameters trained on 3.1T tokens, modified Llama-2 architecture, bilingual EN/ZH, top-3 open weight in November 2023.

Foundation Models Yi-34B01.aiKai-Fu Lee

November 21, 2023 High

Claude 2.1: 200K context and fewer hallucinations

Anthropic ships Claude 2.1: 200K-token context window (~500 pages), 2× reduction in false statements on borderline questions, tool use in beta. Reply to GPT-4 Turbo 128K.

Foundation Models AnthropicClaude 2.1200K context

November 21, 2023 High

OpenAI launches TTS API: six voices, streaming and aggressive pricing

OpenAI launches its TTS API with 6 voices, pricing at $0.015 per 1000 characters, low latency streaming, and direct integration into the ChatGPT and Assistants ecosystem.

Voice & Audio OpenAITTSAPI

November 16, 2023 Medium

Google MusicLM: generating music from text goes public

Google makes MusicLM publicly available via Google Labs: musical generation from text description in a specific style, the first consumer music AI integration from a big tech company.

Voice & Audio GoogleMusicLMMusic Generation

November 15, 2023 Medium

Solar 10.7B: depth upscaling to merge layers from two LLaMA-2 models

Upstage presents Solar 10.7B, created by merging intermediate layers of two fine-tuned LLaMA-2 models (depth upscaling), winning the MBTI-OpenLLM leaderboard in November 2023.

Foundation Models SolarUpstageDepth Upscaling

November 14, 2023 Medium

LLaVA-NeXT and VideoLLaVA: LLaVA conquers video

LLaVA extends to video with frame sampling and temporal positional encoding, achieving competitive results on NExT-QA and ActivityNet without dedicated video training.

Multimodal AI VLMVideo UnderstandingLLaVA

November 12, 2023 High

Amazon Q Developer: the AI assistant that knows AWS from the inside

Amazon Q Developer brings AI coding directly into AWS consoles and IDEs: explains cloud resources, debugs errors, automatically migrates Java legacy code, and updates dependencies.

AI Coding AWSIDE AssistantCode Migration

November 7, 2023 Landmark

Ollama 0.1: pull and run local LLMs with one command, Docker-style

Ollama launches version 0.1: a minimal CLI to download and run local LLM models with a single command, reducing setup complexity to zero.

Local AI OllamaCLILLM Locale

November 6, 2023 High

OpenAI DevDay: GPT-4 Turbo, GPTs, Assistants API in one hour

At OpenAI's first developer conference: GPT-4 Turbo (128K context, lower prices), GPTs (shareable custom ChatGPTs), Assistants API (managed agents). Product + dev pivot.

Foundation Models OpenAIDevDayGPT-4 Turbo

November 4, 2023 Medium

Grok-1: xAI's chatbot with real-time access to X data

Elon Musk's xAI launches Grok-1, a model integrated with X (Twitter) for real-time information, with a 314B MoE architecture released as open weights in March 2024.

Foundation Models Grok-1xAIElon Musk

November 4, 2023 Medium

Pika 1.0: text and image to video for the mass market

Pika Labs launches Pika 1.0: a consumer platform for video generation from text or image, region animation, and aspect ratio control. Reaches 500k Discord users. Funded by Khosla Ventures at $55M.

Image & Video Gen Pika 1.0text-to-videoconsumer AI

November 1, 2023 Landmark

Bletchley AI Safety Summit: the first international agreement on frontier AI risks

28 nations sign the Bletchley Declaration on catastrophic frontier AI risks. The first AI Safety Institute (UK) is established. First international diplomatic agreement specifically dedicated to AI.

AI Security BletchleyAI Safety Summitinternational

November 1, 2023 Landmark

Microsoft 365 Copilot GA: available at 30 dollars per user per month

Microsoft 365 Copilot reaches general availability at 30 USD/user/month. Copilot Studio also launches for building custom enterprise agents.

Enterprise AI Microsoft 365CopilotGA

October 2023

October 30, 2023 Landmark

Executive Order 14110: the first comprehensive US federal AI safety regulation

Biden signs the most sweeping executive order ever issued on AI: mandatory safety tests before frontier model releases, NIST standards for AI red-teaming, watermarking research, and new immigration rules for AI talent.

AI Security Executive OrderBidenAI safety

October 26, 2023 Medium

Whisper Large v3: improved multilingual ASR trained on 5 million hours

Whisper Large v3 reduces error rates on low-resource languages, improves timestamp accuracy and adds new language support, remaining the most widely deployed open-source ASR model.

Voice & Audio Whisper Large v3ASRspeech recognition

October 25, 2023 High

Latent Consistency Models: real-time image generation in 4 steps

Tsinghua University publishes LCM: distillation of a diffusion model reducing sampling from 50 steps to 4 with minimal quality loss. LCM-LoRA makes any SD model 10x faster. First technique enabling real-time generation on consumer hardware.

Image & Video Gen LCMlatent consistencydistillation

October 25, 2023 High

Zephyr-7B: DPO on Mistral 7B beats Llama-2-70B-chat on MT-Bench

HuggingFace trains Zephyr-7B with dSFT + Direct Preference Optimization on Mistral 7B base, achieving an MT-Bench score higher than Llama-2-70B-chat with 10x fewer parameters.

Foundation Models ZephyrHuggingFaceDPO

October 25, 2023 Medium

Zoom AI Companion: meeting summaries and action items included in the base plan

Zoom bundles AI Companion into Pro plans at no extra cost: summarises meetings in real-time, extracts automatic action items, and replies in Zoom chat.

Enterprise AI ZoomAI CompanionMeeting AI

October 23, 2023 Medium

Sanctuary AI Phoenix: the robot that understands complex natural language instructions

Sanctuary AI introduces Phoenix with Carbon AI, a neuro-symbolic system combining symbolic reasoning and neural nets to follow articulated linguistic instructions without explicit programming.

Robotics Sanctuary AIPhoenixCarbon AI

October 22, 2023 High

Eureka: NVIDIA uses GPT-4 to write reward functions and train expert robots

NVIDIA presents Eureka, the first system to use an LLM (GPT-4) to automatically generate reward functions for robotic reinforcement learning. The system achieves expert-level dexterous manipulation, including pen spinning, without manual reward design.

Robotics EurekaNVIDIAreward function

October 20, 2023 High

Open X-Embodiment: the first generalist cross-robot robotics dataset

Google DeepMind and 33 labs collect 527k episodes from 22 different robots: the first unified dataset for training generalist policies that work across multiple platforms.

Robotics Google DeepMindOpen X-EmbodimentDataset

October 19, 2023 High

LangGraph: stateful agents as cyclic graphs with loops and branching

LangChain launches LangGraph, a framework for building agents as node graphs with persistent state, support for cycles, conditional branching, and parallel execution of complex workflows.

Agents LangGraphLangChainStateful Agents

October 16, 2023 High

MITRE ATLAS v2: the AI attack taxonomy updated with real case studies

MITRE releases ATLAS v2 (Adversarial Threat Landscape for AI Systems), an expanded taxonomy of AI system attack techniques with real adversarial ML case studies and mapping to MITRE ATT&CK.

AI Security MITREATLASAdversarial ML

October 16, 2023 Medium

OpenAgents: real agents for non-programmers via web interface

XLab (SUTD Singapore) publishes OpenAgents: a deployable platform with three specialized agents (web browsing, data analysis, code execution) accessible from a browser without API keys. First demonstration of real agentic capabilities for non-technical users, with complete open-source code.

Agents OpenAgentsweb browsingdata analysis

October 11, 2023 Medium

WizardCoder: evolutionary instructions for GPT-4 level code generation

The WizardLM team applies Evol-Instruct to code, iteratively rewriting problems to increase complexity. WizardCoder-34B achieves 73.2% on HumanEval, matching GPT-4 at release time.

AI Coding WizardCoderEvol-InstructHumanEval

October 6, 2023 Medium

AgentBench: the first benchmark that measures LLMs as real agents

Tsinghua presents AgentBench, the first comprehensive benchmark for LLM agents across 8 operational environments, revealing a massive gap between GPT-4 and open-source models.

Agents TsinghuaAgentBenchBenchmark

October 5, 2023 High

LLaVA-1.5: open-source vision-language that beats benchmarks with minimal data

LLaVA-1.5 combines CLIP ViT-L, a two-layer MLP projection, and Vicuna to surpass 11 multimodal benchmarks using only 1.2M fine-tuning examples.

Image & Video Gen LLaVAVision-LanguageCLIP

October 4, 2023 High

Falcon-180B: the world's largest open-source model in 2023

The Technology Innovation Institute releases Falcon-180B, the largest openly available model at 180 billion parameters trained on 3.5 trillion tokens, topping the HuggingFace Open LLM Leaderboard.

Foundation Models Falcon-180BTIIopen source

October 3, 2023 High

DALL-E 3: images that actually follow instructions

OpenAI launches DALL-E 3 integrated into ChatGPT: dramatically improved prompt adherence over DALL-E 2, automatic caption synthesis for training, more readable text in images.

Image & Video Gen OpenAIDALL-E 3Text-to-Image

October 3, 2023 High

CogVLM: separate visual expert prevents language degradation

Tsinghua introduces CogVLM with a visual expert module independent from LLM parameters, eliminating performance degradation on pure text and reaching SOTA on VQA and OCR.

Multimodal AI CogVLMVisual ExpertVQA

September 2023

September 28, 2023 High

AudioPaLM: the first LLM that processes and generates audio as text

AudioPaLM fuses PaLM-2 with an audio tokenizer to create an LLM that natively processes audio and text tokens, enabling speech translation while preserving speaker identity.

Voice & Audio AudioPaLMGoogleaudio LLM

September 28, 2023 Medium

HuggingFace Chat UI: open-source chat interface for any HF model

HuggingFace open-sources chat.huggingface.co: a self-hostable web interface via Docker for Llama 2, Mistral, Code Llama, and custom models, with support for tool calls and web search.

Local AI HuggingFace Chat UIopen sourcechat interface

September 27, 2023 High

Mistral 7B: Europe joins the open-source race

Mistral AI (Paris), a three-month-old startup founded by ex-Meta/DeepMind researchers, releases Mistral 7B under Apache 2.0. Beats Llama 2 13B on most benchmarks with half the parameters.

Open Source Models MistralMistral 7BOpen Source

September 27, 2023 High

PAIR: automated LLM-vs-LLM jailbreaking

CMU and UPenn publish PAIR: an attacker LLM that automatically refines its prompts against a target LLM, finding effective jailbreaks in under 20 queries with no human in the loop.

AI Security PAIRjailbreakautomated

September 27, 2023 High

NVIDIA TensorRT-LLM: automatic LLM compilation for GPUs with FP8 and multi-GPU

NVIDIA open-sources TensorRT-LLM, a framework for compiling and optimizing LLMs for NVIDIA GPUs with out-of-the-box FP8, INT4, sparse attention, and multi-GPU tensor parallelism support.

AI Infrastructure NVIDIATensorRT-LLMFP8

September 26, 2023 Medium

Microsoft Copilot in Windows 11: system-level AI for consumers

With update 23H2, Windows 11 integrates Copilot by default as a system side panel. Bing Chat is rebranded to Copilot. AI as an OS feature, not an app.

Enterprise AI MicrosoftCopilotWindows 11

September 25, 2023 High

ChatGPT can see, hear, and speak: voice + vision in mobile app

ChatGPT Plus on iOS/Android gets voice conversations (5 synthetic voices) and image input (GPT-4V). From text chat to a full conversational assistant.

Multimodal AI OpenAIChatGPTvoice

September 25, 2023 High

GPT-4V: ChatGPT learns to see (for real)

OpenAI activates GPT-4's vision capabilities in ChatGPT (announced six months earlier) and adds voice. Upload an image, talk about it, ask for analysis. Multimodality enters the consumer product.

Multimodal AI OpenAIGPT-4VVision

September 21, 2023 Medium

Slack AI: channel summaries and smart search in workplace chat

Slack integrates native AI into Pro+ plans: summarises channels and threads, answers questions about conversation history, supports Claude and OpenAI as LLM providers.

Enterprise AI SlackSalesforceProductivity

September 18, 2023 High

Adobe Firefly Enterprise: indemnified image generation for brands

Adobe launches Firefly Enterprise in Creative Cloud Teams with legal copyright indemnification and enterprise brand guidelines control over every generated image.

Enterprise AI AdobeFireflyGenerative AI

September 15, 2023 Medium

ExLlamaV2: high-speed quantized LLM inference on consumer GPUs

ExLlamaV2 introduces the EXL2 format with per-layer mixed bit-rates (2-8 bit), delivering higher throughput than llama.cpp on NVIDIA GPUs and enabling 70B models to run on a single RTX 3090.

AI Infrastructure ExLlamaV2EXL2Quantizzazione

September 14, 2023 High

Medusa: multi-head speculative decoding without a separate draft model, 2.2x speedup

Cornell/UIUC introduce Medusa: N additional decoding heads on the main model predict N tokens ahead simultaneously, 2.2x speedup without needing a second draft model.

AI Infrastructure MedusaSpeculative DecodingMulti-Head

September 14, 2023 High

Backdoors in fine-tuned LLMs: hidden behaviors activatable on command

Researchers demonstrate that fine-tuned LLMs can contain silent behavioral backdoors, activatable only when specific triggers invisible during normal model evaluation are present.

AI Security BackdoorSleeper AgentsFine-tuning

September 13, 2023 High

Adobe Firefly 1.0 GA: image generation on licensed content, Generative Fill in Photoshop

Adobe launches Firefly 1.0 GA, the first image generation model trained exclusively on licensed content, integrated into Photoshop as Generative Fill for commercially safe use.

Image & Video Gen Adobe FireflyGenerative FillLicensed Content

September 12, 2023 Medium

IP-Adapter: transfer style and subject from a reference image

Tencent AI Lab releases IP-Adapter, a lightweight adapter for Stable Diffusion that conditions generation on a reference image without retraining the base model.

Image & Video Gen TencentIP-AdapterStable Diffusion

September 10, 2023 High

Open Interpreter: LLM that executes code locally

An LLM running locally that can write and execute Python, JS, and Shell code autonomously, browse the web, and modify files on your computer.

Local AI Open InterpreterCode ExecutionLLM

September 6, 2023 High

Phi-1.5: big-model reasoning in just 1.3 billion parameters

Microsoft Research shows that 1.3B parameters trained on 'textbook quality' synthetic data produce multi-step reasoning comparable to models five times larger.

Foundation Models Phi-1.5small language modelsynthetic data

September 5, 2023 High

LM Studio: desktop GUI to download and run GGUF models with OpenAI server

LM Studio launches its first public release: a graphical interface to browse, download, and use local LLMs with a built-in chat and OpenAI-compatible server.

Local AI LM StudioGGUFGUI Desktop

September 1, 2023 High

Meta AudioCraft: open source suite for music and audio from text

Meta releases AudioCraft, an open source suite including MusicGen for generating structured music and AudioGen for ambient sounds, both controllable via text description.

Voice & Audio MetaAudioCraftMusicGen

August 2023

September 25, 2023 High

Anthropic + AWS: 1.25 billion investment to bring Claude to Amazon Bedrock

AWS invests 1.25 billion dollars in Anthropic. Claude becomes available on Amazon Bedrock using dedicated Trainium and Inferentia infrastructure.

Enterprise AI AnthropicAWSClaude

August 28, 2023 Medium

ChatGPT Enterprise: unlimited GPT-4, locked-down data

OpenAI launches the enterprise ChatGPT plan: unlimited GPT-4, 32K context, advanced data analysis included, SOC 2, customer data never used for training. Reply to IT concerns.

Enterprise AI OpenAIChatGPT EnterpriseGPT-4

August 25, 2023 Medium

SuperAGI: the first open-source autonomous agent platform with a GUI

SuperAGI offers an open-source platform for autonomous agents with a web dashboard, tool marketplace, and the ability to run agents in the background without writing code. First solution to bring the 'monitor agent' experience to non-programmers. Concurrent with AutoGPT but more production-oriented.

Agents SuperAGIautonomous agentopen source

August 24, 2023 High

Code Llama: serious open-source coding model

Meta releases Code Llama (7B, 13B, 34B), a code-specialized fine-tune of Llama 2. Three variants per size: base, Python-specific, instruction-tuned. Llama 2 commercial license.

AI Coding MetaCode LlamaOpen Source

August 20, 2023 High

AnimateDiff: bring motion to any Stable Diffusion model

Shanghai AI Lab publishes AnimateDiff: a plug-in motion module that adds temporal consistency to any existing SD checkpoint, turning every image-only model into a video generator without retraining it.

Image & Video Gen AnimateDiffmotion moduleStable Diffusion

August 19, 2023 High

DeepSeek-Coder v1: China enters the open source coding model race

DeepSeek releases coding models from 1B to 33B parameters trained on 2 trillion tokens with advanced FIM training, topping HumanEval among all open-weight models.

AI Coding DeepSeek-Codercode modelFIM

August 15, 2023 Medium

OpenFlamingo (LAION/UW): open reproduction of Flamingo with multi-image few-shot visual learning

LAION and University of Washington release OpenFlamingo, an open-source reproduction of DeepMind's Flamingo: few-shot visual learning from image+text examples, available in 3B and 9B parameter variants. The first open model enabling multimodal research without API costs.

Multimodal AI OpenFlamingoFlamingoopen source

August 7, 2023 Medium

Google TPU v5e: Cost-Optimized AI Chip for Enterprise Inference

Google announces TPU v5e, a cost-optimized AI chip with 4x better performance per dollar compared to TPU v4 for inference, available through Google Kubernetes Engine for containerized workloads.

AI Infrastructure TPU v5eGoogleinference

August 4, 2023 Medium

Sourcegraph Cody: AI with full codebase context, not just the open file

Sourcegraph launches Cody in beta, an AI code assistant that understands the entire codebase — dependencies, architecture, cross-file relationships — thanks to Sourcegraph's code index.

AI Coding SourcegraphCodyCodebase Context

August 1, 2023 High

OWASP LLM Top 10: the 10 critical vulnerabilities in AI applications

OWASP publishes the first official list of the 10 most critical vulnerabilities in LLM applications, from prompt injection to insecure output handling, now the industry reference standard.

AI Security OWASPLLM Top 10Vulnerabilità

July 2023

July 28, 2023 High

RT-2: the robot that reasons with a language model

DeepMind's RT-2 merges vision-language pretraining with robot control, transferring semantic reasoning from the web to a physical arm without task-specific training.

Robotics DeepMindRT-2VLA

July 28, 2023 High

FlashAttention-2: rewrite with 2x speedup, MQA/GQA support, and head-dim 256

Tri Dao rewrites FlashAttention with 2x speedup over FA1: better parallelism across seq-len, head-dim support up to 256, query parallelism for MHA, MQA, and GQA. De facto training standard.

AI Infrastructure FlashAttention-2AttentionTransformer

July 28, 2023 High

Orca: learning GPT-4 reasoning through explanation traces

Microsoft Research trains Orca 13B on step-by-step GPT-4 explanations (explanation traces), outperforming ChatGPT on BigBench and AGIEval with 13 billion parameters.

Foundation Models OrcaMicrosoftImitation Learning

July 26, 2023 High

Stable Diffusion XL 1.0: the open-source quality jump

Stability ships SDXL 1.0 (3.5B base + 6.6B refiner), native 1024×1024 output, shorter prompts. Open source under commercial license, weights on HuggingFace.

Image & Video Gen Stability AISDXLStable Diffusion

July 18, 2023 Landmark

Llama 2: weights become commercially usable

Meta releases Llama 2 (7B, 13B, 70B) under a license that allows commercial use up to 700M MAU. For the first time a serious LLM is genuinely deployable to production without depending on an API.

Open Source Models MetaLlama 2Open Weights

July 17, 2023 High

SeamlessM4T: Meta's universal speech translation model for 100+ languages

SeamlessM4T is the first multimodal system to handle speech-to-text, text-to-speech, and speech-to-speech across 100+ languages in a single model, powering Meta's real-time translation features.

Voice & Audio SeamlessM4TMetaspeech translation

July 15, 2023 High

AutoGen: Microsoft formalizes agent-to-agent communication

Microsoft Research publishes AutoGen, a framework where you define agents with different roles and let them converse with each other to solve a task. First framework to formalize the 'agent-to-agent communication' pattern. Becomes the foundation of many enterprise multi-agent workflows.

Agents AutoGenmulti-agentMicrosoft Research

July 13, 2023 High

WormGPT: the first commercial LLM built for cybercrime

The first LLM explicitly trained for criminal activity appears on the dark web: no safety filters, fine-tuned on malware data, sold as a monthly subscription.

AI Security WormGPTdark LLMcybercrime

July 11, 2023 High

Claude 2: 100K-token context, consumer access opens

Anthropic launches Claude 2 with a 100,000-token context window (~75,000 words) and opens claude.ai to the general public (initially US and UK). Long-context enters the mainstream.

Foundation Models AnthropicClaude 2100K Context

July 11, 2023 High

IBM launches watsonx.ai: governed foundation models for the enterprise

IBM unveils watsonx.ai at Think 2023: a platform featuring Granite models trained on curated data, a fine-tuning studio, AI factsheets for governance, and full data lineage. Built for banking, healthcare, and government.

Enterprise AI IBMwatsonxGranite

July 10, 2023 High

Universal adversarial attacks on LLMs: transferable jailbreaks across GPT-4, Claude, and Gemini

Zou et al. (CMU) demonstrate optimized suffixes that simultaneously jailbreak GPT-3.5/4, Claude, and Gemini: the first systematic proof of attack transferability across different models.

AI Security JailbreakAdversarial AttackCMU

July 9, 2023 High

Reflexion: agents that learn from mistakes without gradient updates

MIT and Northeastern propose Reflexion: agents that self-reflect in natural language after each failure, accumulating insights in episodic memory without modifying weights.

Agents MITNortheasternReflexion

July 8, 2023 High

MetaGPT: agents with company roles that write software together

MetaGPT assigns each LLM agent a specific company role (PM, Architect, Engineer, QA) and has them collaborate to produce working code from a single text requirement.

Agents MetaGPTMulti-AgentSoftware Engineering

July 5, 2023 High

llama.cpp K-quants: the intelligent quantization that transformed local models

llama.cpp introduces K-quants (Q2_K through Q8_K): per-layer quantization assigning different bit-widths based on tensor importance. Q4_K_M matches Q5_1 quality at a smaller file size, becoming the de facto standard for all modern GGUF models.

Local AI llama.cppK-quantsGGUF

June 2023

June 25, 2023 Medium

GPT-Engineer: generate an entire software project from a single sentence

Anton Osika publishes GPT-Engineer on GitHub: describe what you want in natural language, the agent asks clarifying questions, then writes all the files and runs them. 50k stars in one week. First viral implementation of the 'one-shot project generator' concept.

Agents GPT-Engineercode generationproject scaffolding

June 22, 2023 High

AWQ: activation-aware 4-bit quantization for edge deployment with accuracy above GPTQ

MIT Han Lab publishes AWQ: 4-bit quantization that preserves salient weights identified through activation analysis, achieving better accuracy-throughput than GPTQ for edge deployment.

AI Infrastructure AWQQuantizzazione4-bit

June 20, 2023 Medium

Lakera Guard: real-time protection for LLMs in production

Lakera Guard is a SaaS API that protects LLM applications from prompt injection, jailbreak, and PII leakage with sub-millisecond latency, designed for high-traffic production environments.

AI Security LakeraPrompt InjectionJailbreak

June 16, 2023 High

Voicebox: Meta brings flow matching to TTS with audio editing and 6 languages

Voicebox uses flow matching with masked training to synthesize, edit, and transfer vocal styles across 6 languages, with no explicit cloning or fine-tuning.

Voice & Audio VoiceboxTTSFlow Matching

June 15, 2023 High

IDEFICS: the first open-source replica of Flamingo

HuggingFace releases IDEFICS, an open-weight replica of Flamingo in 9B and 80B versions, trained on LAION-5B and WikiMedia with few-shot visual in-context learning.

Multimodal AI VLMOpen SourceFew-Shot Learning

June 14, 2023 Medium

WizardLM: GPT-4-evolved instructions for fine-tuning

WizardLM uses Evol-Instruct — instructions automatically simplified and complicated by GPT-4 — achieving 97% of ChatGPT on WizardEval with a 70B model.

Foundation Models WizardLMEvol-InstructFine-tuning

June 13, 2023 High

Function calling: GPT learns to speak JSON

OpenAI adds 'function calling' to the API: the model returns structured JSON conforming to a schema, enabling reliable tool integrations without fragile prompt engineering.

AI Infrastructure OpenAIFunction CallingTool Use

June 12, 2023 Medium

Bark: open source TTS with laughter, sighs, and music from text

Suno AI releases Bark on HuggingFace: an open source TTS model capable of generating paralinguistics — laughter, sighs, sound effects, music — directly from text prompts.

Voice & Audio BarkSuno AITTS

June 8, 2023 High

GitHub Copilot X: in-IDE chat, test generation and Copilot for CLI

GitHub announces Copilot X with GPT-4-based chat integrated in VS Code, automatic PR description and test generation, a CLI assistant, and voice coding in preview.

AI Coding GitHubCopilotChat

June 8, 2023 High

Phi-1: 1.3B parameters beating models 10x larger on code

Microsoft Research releases Phi-1, 1.3B parameters trained on high-quality synthetic data ('textbooks'), outperforming models 10x larger on HumanEval.

Foundation Models Phi-1MicrosoftSmall Models

June 6, 2023 High

HuggingFace TGI: production-ready Docker container for LLM serving with continuous batching

HuggingFace releases Text Generation Inference, an optimized Docker container for serving LLMs in production with continuous batching, tensor parallelism, and integrated Flash Attention 2.

AI Infrastructure HuggingFaceTGILLM Serving

June 5, 2023 Medium

Gorilla: fine-tuned LLaMA that calls APIs without errors

UC Berkeley presents Gorilla, a retrieval-augmented fine-tuned LLaMA for accurate API calls: reduces API hallucination from 83% to 3%, outperforming GPT-4 on this task.

Agents UC BerkeleyGorillaLLaMA

June 1, 2023 High

Diffusion Policy: robot imitation learning goes multi-modal with diffusion models

MIT and Columbia apply denoising diffusion models to robot imitation learning, learning multi-modal action distributions instead of deterministic policies. They achieve a 46.9% improvement on manipulation benchmarks.

Robotics Diffusion Policyimitation learningdenoising diffusion

May 2023

May 30, 2023 High

InstructBLIP: visual instruction tuning on 26 datasets outperforms GPT-4V

Salesforce extends BLIP-2 with visual instruction tuning on 26 datasets, beating GPT-4V on visual reasoning benchmarks with an open architecture.

Multimodal AI InstructBLIPInstruction TuningVisual Reasoning

May 30, 2023 High

Tree of Thoughts: the LLM that reasons by exploring alternative branches

Princeton and DeepMind propose Tree of Thoughts: the LLM generates and evaluates multiple reasoning paths as a search tree, clearly outperforming Chain-of-Thought.

Agents PrincetonDeepMindTree of Thoughts

May 26, 2023 High

Stable Diffusion XL 0.9: dual-encoder and 1024x1024 resolution

Stability AI launches SDXL 0.9 beta with dual-encoder architecture and separate refiner model for photographic-quality 1024x1024 images.

Image & Video Gen Stable Diffusion XLSDXLStability AI

May 23, 2023 High

Microsoft Build 2023: Copilot everywhere, a shared plugin standard

At Build 2023 Microsoft announces Windows Copilot, Copilot in Edge and 365, and adopts OpenAI's plugin standard. Strategy: 'AI co-pilot' as the primary UI.

Enterprise AI MicrosoftBuildCopilot

May 22, 2023 High

Falcon 40B: first open-weight model to beat LLaMA 65B

The Technology Innovation Institute UAE releases Falcon 40B: trained on 1T tokens of RefinedWeb, it beats LLaMA 65B on benchmarks with a commercial license.

Foundation Models FalconOpen WeightsTII

May 18, 2023 High

SoundStorm: Google generates 30 seconds of natural dialogue in half a second

SoundStorm uses MaskGIT on EnCodec tokens to generate audio in parallel rather than token-by-token: 30s of dialogue in 0.5s, preserving speaker consistency.

Voice & Audio SoundStormAudio GenerationGoogle

May 17, 2023 High

Voyager: the AI agent that learns Minecraft forever, without reset

NVIDIA creates Voyager, a lifelong-learning agent in Minecraft that uses GPT-4 to write skills in JavaScript and accumulate them in a persistent library, never forgetting.

Agents NVIDIAVoyagerLifelong Learning

May 16, 2023 High

Palantir AIP: first public LLM agent demo on classified operational data

First public demonstration of an enterprise LLM agent on real, sensitive operational data: military logistics routing via natural language. AIP sandboxes LLM outputs from raw data access. A turning point for AI in defense and government.

Enterprise AI PalantirAIPenterprise agent

May 15, 2023 Medium

TidyBot: a tidying robot that learns your preferences via LLM

Stanford presents TidyBot, a robotic system that uses LLMs to personalize household tidying behavior from a few user examples. It achieves 91.2% task completion, demonstrating the feasibility of LLM-driven personalization in manipulation.

Robotics TidyBotStanfordLLM planning

May 14, 2023 High

privateGPT: chat with your documents, completely offline

imartinez publishes privateGPT: full RAG on PDFs and TXT with a local LLM, zero cloud data. Your knowledge base stays on your disk.

Local AI privateGPTRAGPDF Offline

May 12, 2023 High

GPT4All v2 (Nomic AI): one-click local AI for everyone

Nomic AI launches GPT4All v2: a desktop installer that downloads and runs quantized models with no command line required, including LocalDocs for private document Q&A with no internet connection.

Local AI GPT4AllNomic AIconsumer AI

May 11, 2023 High

LocalAI: OpenAI drop-in replacement with local models and full privacy

mudler releases LocalAI, an OpenAI-compatible REST server that runs GGML/GGUF models locally: migrate your apps from cloud to self-hosted by changing only the URL.

Local AI LocalAIOpenAI APIPrivacy

May 10, 2023 High

Google PaLM 2: the model that makes Bard fly

At Google I/O 2023, PaLM 2 replaces LaMDA in Bard. Four sizes (Gecko, Otter, Bison, Unicorn), strong multilingual support and improved reasoning. Spawns Med-PaLM 2 and Sec-PaLM.

Foundation Models GooglePaLM 2Bard

May 8, 2023 High

ServiceNow Now Assist: native LLM in enterprise ITSM

ServiceNow embeds an LLM directly into its ITSM platform, summarising open tickets, suggesting resolutions, and automating escalations with no external plugins.

Enterprise AI ServiceNowNow AssistITSM

May 4, 2023 Medium

MPT-7B: the first open-source model explicitly built for commercial use

MosaicML launches MPT-7B under Apache 2.0 with a 65,000-token context window via ALiBi, the first open model explicitly designed for unrestricted commercial deployment.

Foundation Models MPT-7BALiBiApache 2.0

May 4, 2023 High

StarCoder: the first serious open coding model with transparent training data

BigCode and HuggingFace release StarCoder, a 15.5B-parameter model trained on 1 trillion tokens from The Stack across 86 languages, with an opt-out data governance system.

AI Coding StarCoderBigCodeopen source

May 2, 2023 High

MiniGPT-4 (KAUST): open-source visual chatbot with a single alignment layer

KAUST shows how to build a capable visual chatbot by connecting BLIP-2 and Vicuna with a single projection layer trained on 5,000 image-text pairs. The first demonstration that hours of single-GPU training are sufficient to create a working VLM.

Multimodal AI MiniGPT-4KAUSTBLIP-2

March 16, 2023 Landmark

Microsoft 365 Copilot: GPT-4 embedded in Word, Excel, Teams and Outlook

Microsoft announces Copilot across the M365 suite: AI for 300M+ enterprise users, powered by GPT-4 and Microsoft Graph for business context.

Enterprise AI Microsoft 365CopilotGPT-4

April 2023

April 20, 2023 High

LLaVA: Visual Instruction Tuning opens the multimodal open-source era

LLaVA combines CLIP + LLaMA with 150k GPT-4-generated examples to create the first quality open-source visual assistant.

Multimodal AI LLaVAVisual Instruction TuningOpen Source

April 19, 2023 Medium

StableLM: Stability AI enters the open LLM race

Stability AI releases StableLM 3B and 7B under CC BY-SA 4.0, trained on 1.5T tokens. Open response to closed models, but quality still trails LLaMA.

Open Source Models Stability AIStableLMopen source

April 18, 2023 Medium

Microsoft Presidio: PII anonymization in LLM pipelines

Microsoft Presidio reaches general availability: open source framework for detecting and anonymizing personal data in LLM-processed text, with NER and regex for 50+ entity types.

AI Security MicrosoftPresidioPII

April 16, 2023 High

Vicuna-13B: the open chatbot that reaches 90% of ChatGPT quality

LMSYS fine-tunes LLaMA-13B on 70,000 ShareGPT conversations and produces an open-source chatbot that GPT-4, used as judge, rates at 90% of ChatGPT quality.

Foundation Models VicunaLLaMAfine-tuning

April 13, 2023 High

AWS Bedrock: managed multi-model AI on Amazon cloud

AWS announces Bedrock, a managed service exposing Claude (Anthropic), Jurassic-2 (AI21), Stable Diffusion, and its own Titan via one API. Reply to Azure OpenAI.

AI Infrastructure AWSBedrockmanaged AI

April 7, 2023 High

Generative Agents: 25 AI agents simulate a society in Smallville

Stanford creates 25 LLM-based agents simulating daily life in a virtual village, with episodic memory, reflection, and planning — the first credible artificial society.

Agents StanfordGenerative AgentsSmallville

April 3, 2023 High

BabyAGI: 200 lines of Python that spark the autonomous agent debate

Yohei Nakajima publishes BabyAGI, an autonomous task manager in ~200 Python lines using GPT-4 and Pinecone that creates and executes subtasks in an infinite loop, viral on Twitter within 24 hours.

Agents BabyAGIAutonomous AgentTask Management

March 2023

March 30, 2023 High

AutoGPT: the first viral AI agent

A developer publishes AutoGPT on GitHub: given a text goal, the system calls GPT-4 in a loop to plan tasks, execute them, and self-criticize. In two weeks, becomes the most-starred repo in history.

Agents AutoGPTAgentsOpen Source

March 27, 2023 High

GPT4All: click-and-run offline LLM for non-technical users

Nomic AI releases GPT4All, a point-and-click installer to run LLMs offline on Windows, Mac, and Linux, lowering the technical barrier to almost zero.

Local AI GPT4AllNomic AILLM Offline

March 25, 2023 High

oobabooga text-generation-webui: the first GUI for local LLMs

The most-starred open-source web interface for running local LLMs: supports GPTQ, GGML, transformers backends with Gradio UI, extensions, character cards, and chat/instruct modes.

Local AI oobaboogatext-generation-webuilocal LLM

March 23, 2023 Medium

ChatGPT Plugins: the LLM becomes an interface to the web

OpenAI ships plugins for ChatGPT: the model can browse the web, run Python in a sandbox, book flights (Expedia, Kayak), order groceries (Instacart). First big mainstream tool-use experiment.

Agents OpenAIChatGPTPlugins

March 22, 2023 Medium

Codeium: free AI code assistant for 70+ languages, Copilot alternative

Codeium launches its AI code assistant completely free for individual developers, supporting over 70 languages and integrating with VS Code, JetBrains, and Vim.

AI Coding CodeiumCode CompletionFree

March 22, 2023 Medium

HuggingGPT: ChatGPT as a brain orchestrating 800 AI models

Microsoft Research uses ChatGPT as a central planner that decomposes complex tasks and delegates execution to specialized HuggingFace models for vision, audio, and NLP.

Agents Microsoft ResearchHuggingGPTJARVIS

March 22, 2023 High

Llama Guard: an LLM trained to be the gatekeeper of other LLMs

Meta releases Llama Guard, a fine-tuned LLaMA classifier that identifies dangerous inputs and outputs across 6 harm categories, designed as a plug-in safety layer for LLM applications.

AI Security MetaLlamaGuardContent Safety

March 21, 2023 Medium

Google Bard: the (late) answer to ChatGPT

Google opens Bard public preview in US and UK, based on a lightweight LaMDA. Reception is lukewarm: slow, cautious, less useful than ChatGPT.

Foundation Models GoogleBardLaMDA

March 20, 2023 Medium

Runway Gen-1: text- and image-guided video style transfer

Runway launches Gen-1: the first commercial model that applies a visual style from text or a reference image to an existing video, frame by frame. Precursor to the Gen-2/Gen-3 line.

Image & Video Gen Runway Gen-1video style transfertext-to-video

March 17, 2023 Medium

Microsoft Semantic Kernel: the enterprise SDK for LLM orchestration

Microsoft open-sources Semantic Kernel, a C#/Python/Java SDK for integrating LLMs into enterprise apps. Introduces 'skills' (reusable AI functions) and 'planners' (auto-chaining toward a goal). Becomes Microsoft's standard AI orchestration layer for Copilot builds.

Agents Semantic KernelMicrosoftSDK

March 17, 2023 Medium

Tesla Optimus Gen 1: the bipedal robot walks autonomously in a factory

Tesla releases the first video of Optimus Gen 1 walking and performing tasks autonomously in a real factory environment, with a stated target price of 20,000 dollars.

Robotics TeslaOptimusHumanoid Robot

March 15, 2023 High

PyTorch 2.0 and torch.compile: Graph Compilation Without Rewriting Code

PyTorch 2.0 introduces torch.compile built on TorchDynamo and the Inductor backend, delivering up to 2x speedup on transformers without code changes, making PyTorch competitive with XLA/JAX for production workloads.

AI Infrastructure PyTorch 2.0torch.compileTorchDynamo

March 14, 2023 High

Claude arrives: the first serious ChatGPT competitor

Anthropic launches Claude, an AI assistant trained with Constitutional AI. Same day as GPT-4. Two versions: Claude (full) and Claude Instant (faster and cheaper).

Foundation Models AnthropicClaudeConstitutional AI

March 14, 2023 High

Google Workspace AI (Duet AI): the first AI assistant built into G Suite

Google announces Duet AI for Workspace: assisted writing in Docs, email summaries in Gmail, slide generation in Slides, and formula help in Sheets.

Enterprise AI Google WorkspaceDuet AIProductivity

March 14, 2023 Landmark

GPT-4: the reasoning leap that resets the baseline

OpenAI releases GPT-4, multimodal (text + image), with reasoning, coding, and reliability clearly beyond GPT-3.5. Passes bar, medical, and coding exams.

Foundation Models OpenAIGPT-4Multimodal

March 10, 2023 Medium

CAMEL: two LLM agents that cooperate to solve complex tasks

KAUST presents CAMEL, a role-playing framework where an 'AI user' LLM and an 'AI assistant' LLM autonomously collaborate on tasks without human intervention at each step.

Agents KAUSTCAMELMulti-Agent

March 10, 2023 Landmark

llama.cpp: LLaMA 7B runs 4-bit on MacBook CPU

Georgi Gerganov brings Meta's LLaMA to consumer CPUs via 4-bit C++ quantization: the first foundation model practically usable offline on a laptop.

Local AI LLaMAllama.cppC++

March 7, 2023 High

Salesforce Einstein GPT: the first CRM with native generative AI

Salesforce embeds generative AI directly into its CRM, suggesting sales emails, case replies, and Salesforce Flow code without leaving the platform.

Enterprise AI SalesforceEinstein GPTCRM

March 6, 2023 Landmark

PaLM-E: the first embodied VLM at 562 billion parameters

Google presents PaLM-E, a 562B-parameter multimodal model that feeds images and robot state directly into the transformer, capable of long-horizon planning on real robots.

Robotics GooglePaLM-EVLM

March 2, 2023 High

RoboCat: the first robot that self-improves without human labeling

DeepMind introduces RoboCat, a robotic agent that learns from few demonstrations, self-trains by collecting new data, and improves iteratively without human intervention. With just 10 demos it achieves 36% success on novel tasks.

Robotics RoboCatDeepMindself-improvement

March 1, 2023 High

Agility Robotics Digit v3: the first humanoid in an Amazon warehouse

Agility Robotics announces partnership with Amazon for Digit v3, a bipedal warehouse robot — first real-scale industrial deployment of a humanoid.

Robotics Agility RoboticsDigitHumanoid Robot

March 1, 2023 High

ChatGPT API: gpt-3.5-turbo at $0.002 per 1K tokens

OpenAI ships the ChatGPT API (gpt-3.5-turbo) at one tenth the price of text-davinci-003, plus Whisper API for speech-to-text. The wrapper era begins.

Foundation Models OpenAIChatGPTAPI

February 2023

February 24, 2023 High

LLaMA: Meta opens foundation models to research

Meta releases LLaMA in four sizes (7B, 13B, 33B, 65B), available to researchers on request. One week later, the weights leak publicly.

Open Source Models MetaLLaMAOpen Weights

February 23, 2023 Medium

Amazon CodeWhisperer GA: AWS-native code assistant with reference tracking

Amazon launches CodeWhisperer GA with a unique feature: it flags when generated code resembles open source snippets, showing the license and source repo. Free tier for individual developers.

AI Coding AmazonCodeWhispererAWS

February 10, 2023 High

ControlNet: structural control for Stable Diffusion without retraining

Zhang et al. introduce ControlNet, an adapter adding pose, depth, and edge control to Stable Diffusion without modifying the base model weights.

Image & Video Gen ControlNetStable DiffusionDiffusion Models

February 9, 2023 High

Toolformer: the LLM that learns to use tools on its own

Meta AI presents Toolformer: an LLM that autonomously learns when and how to call external tools (calculator, Wikipedia, calendar) using self-supervised examples only.

Agents Meta AIToolformerTool Use

February 9, 2023 High

vLLM: 24x LLM throughput with PagedAttention from UC Berkeley

The UC Berkeley team releases vLLM, a Python library for LLM inference using PagedAttention to manage KV cache like OS virtual memory, achieving 24x throughput over the HuggingFace baseline.

AI Infrastructure vLLMBerkeleyPagedAttention

February 7, 2023 Medium

Bing Chat: search engines change for the first time in 20 years

Microsoft integrates conversational AI into Bing (later revealed to run on pre-release GPT-4) that answers with direct citations from web pages. The Google 'code red' moment.

Foundation Models MicrosoftBing ChatSydney

January 2023

January 30, 2023 High

BLIP-2: the Q-Former bridge between vision and language

Salesforce introduces BLIP-2: a lightweight Q-Former bridges frozen visual encoder and frozen LLM, achieving SOTA captioning with 8x fewer trainable parameters.

Multimodal AI BLIP-2Q-FormerImage Captioning

January 27, 2023 High

XTTS: Coqui AI's open-source multilingual zero-shot voice cloning

XTTS brings multilingual zero-shot voice cloning to open source: just a 6-second audio sample to replicate a voice across 17 different languages, with MIT license.

Voice & Audio XTTSCoquimultilingual

January 26, 2023 High

Code as Policies: the robot programs itself from natural language

Google shows how an LLM directly generates executable robot code from natural-language instructions, without robotic fine-tuning, using hierarchical function composition.

Robotics GoogleCode as PoliciesLLM

January 26, 2023 High

ElevenLabs exits beta: AI voice becomes the creator standard

ElevenLabs exits public beta with 1-minute voice cloning, 29 languages, and prosodically natural TTS, establishing itself as the reference for creators and audiobooks.

Voice & Audio ElevenLabsVoice CloningTTS

January 26, 2023 High

NIST AI Risk Management Framework 1.0

The US government publishes the first official framework for managing AI risks in organizations: four core functions — Govern, Map, Measure, Manage.

AI Security NISTAI RMFrisk management

January 20, 2023 High

Speculative Decoding: 2-3x LLM inference speedup without changing output

Chen et al. (Google Brain) publish Speculative Decoding: a small model proposes tokens, the large model verifies them in parallel. Same output, 2-3x faster with no quality change.

AI Infrastructure Speculative DecodingInferenceAutoregressive

January 16, 2023 Landmark

Azure OpenAI Service goes GA: GPT-4 with enterprise SLA

Microsoft makes OpenAI models (GPT-3.5-Turbo, Codex, DALL-E) available on Azure with enterprise SLA, VNet isolation, HIPAA and SOC2 compliance. A watershed moment for enterprise AI adoption.

Enterprise AI Azure OpenAIMicrosoftenterprise

January 10, 2023 High

whisper.cpp: offline voice transcription on CPU with pure C++

Georgi Gerganov brings OpenAI's Whisper model to CPU via a minimal C++ implementation: real-time transcription with no GPU and no cloud.

Local AI WhisperSpeech-to-TextC++

January 5, 2023 Landmark

VALL-E: Microsoft clones a voice from 3 seconds of audio using in-context learning

VALL-E clones any voice with just 3 seconds of reference audio, no fine-tuning needed, using in-context learning on EnCodec tokens. First zero-shot TTS at naturalistic quality.

Voice & Audio VALL-ETTSVoice Cloning

2022

December 2022

December 16, 2022 High

DeepMind RT-1: the first Transformer trained on real robotics data

DeepMind releases RT-1, a robotics transformer trained on 130,000 real episodes with 13 robots, generalizing to never-seen tasks.

Robotics DeepMindRT-1Robotics Transformer

December 15, 2022 Medium

Constitutional AI: the model self-corrects without humans in the loop

Anthropic publishes Constitutional AI: instead of pure RLHF, the model critiques and revises its own responses following a written 'constitution'. Less human labeling, more transparency.

AI Security AnthropicConstitutional AIRLAIF

December 1, 2022 Medium

Boston Dynamics adds visual AI to Spot: map-free autonomy

Spot gains advanced autonomous navigation and industrial anomaly detection via visual AI, operating without pre-loaded maps.

Robotics Boston DynamicsSpotAutonomous Navigation

November 2022

November 30, 2022 Landmark

ChatGPT: AI lands in everyone's browser

OpenAI launches ChatGPT, a free conversational interface on GPT-3.5 aligned via RLHF. It crosses one million users in five days.

Foundation Models OpenAIChatGPTGPT-3.5

November 24, 2022 Medium

Stable Diffusion 2.0: new architecture and OpenCLIP encoder

Stability AI releases SD 2.0 with OpenCLIP replacing CLIP, native 768x768 resolution, a new depth2img model, and improved inpainting. A controversial release due to breaking compatibility with existing LoRAs and prompts.

Image & Video Gen Stable Diffusion 2.0Stability AIOpenCLIP

November 16, 2022 Medium

Notion AI alpha: AI inside the tool you already work in

Notion launches Notion AI in private alpha, GPT integrated inside pages: summarize, rewrite, translate, brainstorm without leaving the document.

Enterprise AI NotionNotion AIProductivity

November 15, 2022 Medium

Galactica: Meta launches (and pulls in three days) a science LLM

Meta unveils Galactica, a 120B-parameter model trained on 48 million scientific papers. The public demo is pulled after three days under a wave of criticism for authoritative hallucinations.

Foundation Models MetaGalacticaScience LLM

November 9, 2022 High

NVIDIA Triton Inference Server 2.x: the de facto standard for production inference

NVIDIA consolidates Triton as the open-source platform for serving PyTorch, TensorFlow, and ONNX models in production, with dynamic batching, multi-GPU support, and gRPC/HTTP APIs.

AI Infrastructure NVIDIATritonInference Server

November 1, 2022 Medium

HuggingFace Accelerate: One Python Script for CPU, GPU, TPU, and Mixed Precision

HuggingFace Accelerate provides a unified API that runs the same training code on any hardware without changes, becoming the backbone of most open LLM training pipelines.

AI Infrastructure AccelerateHuggingFacemulti-GPU

October 2022

October 25, 2022 Landmark

LangChain: the framework for LLM applications is born

Harrison Chase releases LangChain, an open-source Python library to chain LLMs with prompt templates, memory, tools and external data sources. It will become the default stack of the first LLM apps.

Agents LangChainFrameworkLLM Apps

October 25, 2022 Medium

Textual Inversion: inject a custom concept into diffusion models

Weizmann Institute publishes Textual Inversion: learning a new text token representing a custom concept from 3-5 images, without modifying model weights.

Image & Video Gen Textual Inversionpersonalizationembedding

October 24, 2022 High

EnCodec: Meta AI compresses audio with neural networks and beats Opus

EnCodec compresses 24kHz stereo audio to just 1.5–12 kbps at quality surpassing Opus, becoming the standard vocoder for modern neural TTS.

Voice & Audio EnCodecNeural CodecAudio Compression

October 15, 2022 High

MT-OPT: Google trains a single robot policy on 800+ tasks and 57,000 hours of real data

Google pre-trains a single policy on over 800 real robot tasks and 57,000 hours of real-world data, demonstrating for the first time zero-shot transfer to new tasks through large-scale multi-task offline learning.

Robotics MT-OPTmulti-task robot learningoffline RL

October 12, 2022 High

GPTQ: 4-bit post-training quantization making GPT-scale inference practical

Frantar et al. (ETH Zurich) publish GPTQ: accurate 4-bit quantization without significant fine-tuning, the first technique to make inference of 175B-parameter models practical on consumer hardware.

AI Infrastructure GPTQQuantizzazione4-bit

October 6, 2022 Landmark

ReAct: the framework that unites reasoning and acting in LLMs

Yao et al. introduce ReAct, a schema alternating explicit thoughts (Thought) and concrete actions (Act) in LLMs, the theoretical foundation of all modern agents.

Agents ReActReasoningTool Use

October 5, 2022 Medium

Imagen Video and Phenaki: Google answers on text-to-video

A week after Make-A-Video, Google Research unveils Imagen Video and, around the same time, Phenaki: two different approaches to text-to-video, with longer, more coherent clips.

Image & Video Gen GoogleImagen VideoPhenaki

September 2022

September 29, 2022 Medium

Make-A-Video: Meta unveils the first credible text-to-video

Meta AI shows Make-A-Video, a system that generates short animated clips from a text description by reusing a pre-existing text-to-image model.

Image & Video Gen MetaMake-A-VideoText-to-Video

September 27, 2022 Medium

Hugging Face Inference Endpoints: deploy LLMs in two clicks

Hugging Face launches Inference Endpoints, a managed service to deploy Hub models on AWS, Azure or GCP with autoscaling, on-demand GPUs and private endpoints.

AI Infrastructure Hugging FaceInference EndpointsDeployment

September 22, 2022 High

Flan-T5 and Flan-PaLM: instruction tuning scales to 1,800 tasks

Google scales instruction tuning to 1,800 tasks and 540B parameters, open-sources Flan-T5, and proves that chain-of-thought reasoning is teachable via fine-tuning.

Foundation Models Flan-T5instruction tuningchain-of-thought

September 21, 2022 High

Whisper open source: audio transcription becomes a commodity

OpenAI releases Whisper under MIT license: a speech-to-text model trained on 680,000 hours of multilingual audio, near commercial-grade quality, runs locally.

Voice & Audio OpenAIWhisperASR

September 16, 2022 Medium

Character.AI: persona chatbots from ex-Google founders

Noam Shazeer and Daniel De Freitas, fathers of LaMDA, launch Character.AI: a platform letting anyone create and chat with AI characters, from Einstein to anime personas.

Foundation Models Character.AIChatbotPersona

September 14, 2022 High

Prompt Injection: when user input hijacks system instructions

Riley Goodside and Perez et al. formalize Prompt Injection: an attack where malicious user input overwrites an LLM's system instructions, bypassing policies and guardrails.

AI Security Prompt InjectionLLM SecurityAdversarial Attacks

September 12, 2022 High

AudioLM: Google teaches a language model to listen and continue audio

AudioLM generates long-range coherent audio using two tiers of tokens — semantic and acoustic — with no text or score conditioning.

Voice & Audio AudioLMLanguage ModelAudio Generation

August 2022

August 25, 2022 High

DreamBooth: generate your subject in any style with 3-5 photos

Google Research publishes DreamBooth: fine-tune a diffusion model on 3-5 images of a specific subject to reproduce it in any context or style. Foundation of all personalized AI image generation.

Image & Video Gen DreamBoothpersonalizationfine-tuning

August 22, 2022 Landmark

Stable Diffusion: image generation goes open

Stability AI publicly releases weights and code of a text-to-image latent diffusion model that runs on a consumer GPU. AI image generation leaves the cloud.

Image & Video Gen Stable DiffusionStability AIDiffusion Models

August 16, 2022 Medium

GitHub Copilot: 40% of code in active files written by AI

GitHub publishes first real-world data: 40% of code in files with Copilot active is AI-generated. First quantitative benchmark on AI tools' actual impact on developer output.

AI Coding GitHub CopilotDeveloper ProductivityResearch

August 16, 2022 High

SayCan: grounding LLMs in robot affordances

Google Robotics shows how to combine an LLM for high-level planning with robot value functions that filter only physically executable actions.

Robotics GoogleSayCanEmbodied AI

July 2022

July 22, 2022 High

diffusers v0.1: the standard library for diffusion models

Hugging Face releases diffusers, a modular Python library for diffusion models — text-to-image, audio and beyond. It quickly becomes the de facto standard.

Open Source Models Hugging FaceDiffusersLibrary

July 20, 2022 Medium

DALL-E 2 enters beta: generative image AI for the public

OpenAI opens DALL-E 2 in beta to over one million waitlist users, with a pay-per-image credit system. First large-scale consumer product for image generation.

Image & Video Gen OpenAIDALL-E 2Beta

July 12, 2022 High

BLOOM 176B: the first truly open large multilingual LLM

The BigScience collective releases BLOOM, a 176-billion-parameter model trained on 46 human languages and 13 programming languages, under an open RAIL license.

Open Source Models BigScienceBLOOMHugging Face

July 12, 2022 High

Midjourney opens public beta on Discord

Midjourney opens its public beta with a text-to-image model accessible via a Discord bot. Its strong aesthetic default and community turn image generation into a mass phenomenon.

Image & Video Gen MidjourneyDiscordText-to-Image

July 6, 2022 High

Red Teaming LLMs with LLMs: the DeepMind paper that changed safety testing

Perez et al. (DeepMind) show that an LLM can be used as an automatic attacker against another LLM, discovering undesired behaviors at a scale impossible for human teams.

AI Security Red TeamingDeepMindLLM Safety

June 2022

June 27, 2022 Medium

UL2: Google unifies pretraining paradigms with Mixture-of-Denoisers

Google Research combines three major pretraining objectives into a single 20B model, outperforming GPT-3 on many benchmarks at one-eighth the parameters.

Foundation Models UL2mixture of denoiserspretraining

June 23, 2022 Medium

Tabnine 3.0: AI code completion with privacy-first and local models

Tabnine releases version 3.0 with local or cloud model support, becoming the first mature AI code completion product on the market before Copilot's rise.

AI Coding TabnineCode CompletionLocal AI

June 21, 2022 Landmark

FlashAttention: IO-aware attention that revolutionizes transformer training

Tri Dao (Stanford) publishes FlashAttention: an IO-aware implementation that avoids materializing the attention matrix in HBM, achieving 2-4x speedup and 10x less GPU memory.

AI Infrastructure FlashAttentionAttentionTransformer

June 21, 2022 Landmark

GitHub Copilot: AI for code becomes a product for everyone

GitHub announces general availability of Copilot for all developers at $10/month. It's the first mass-market AI tool living inside the daily code editor.

AI Coding GitHubCopilotOpenAI

June 17, 2022 High

SoundStream: Google's first real-time neural audio codec

SoundStream introduces Residual Vector Quantization to compress audio at 3kbps with quality surpassing Opus at 12kbps, founding the architecture of all modern neural codecs used in audio LLMs.

Voice & Audio SoundStreamneural codecRVQ

June 6, 2022 Medium

Tortoise TTS: convincing voice cloning from 3 seconds of audio

James Betker releases Tortoise TTS, an open source model with few-second voice cloning and human-like vocal quality — the first real breakthrough in accessible TTS.

Voice & Audio TTSVoice CloningOpen Source

May 2022

May 23, 2022 High

Imagen: Google enters text-to-image generation

Google Research unveils Imagen, a text-to-image diffusion model that uses a frozen T5 text encoder and beats DALL-E 2 on benchmarks for photorealistic fidelity.

Image & Video Gen GoogleImagenText-to-Image

May 12, 2022 High

Gato: DeepMind tries a single agent for 600+ tasks

DeepMind unveils Gato, a 1.2-billion-parameter Transformer that with the same weights plays Atari games, controls a robot arm, captions images and chats.

Multimodal AI DeepMindGatoGeneralist Agent

May 3, 2022 High

Meta OPT-175B: the first 175-billion LLM opened to researchers

Meta AI releases OPT-175B, a language model comparable in size to GPT-3, with weights available to researchers and a public training logbook.

Open Source Models MetaOPTOpen Source

April 2022

April 29, 2022 High

DeepMind Flamingo: the first few-shot visual language model

Flamingo brings few-shot learning to vision: SOTA on VQA and captioning with no task-specific fine-tuning.

Multimodal AI Visual Language ModelFew-Shot LearningVQA

April 20, 2022 High

NaturalSpeech: Microsoft achieves human parity on LJSpeech benchmark

NaturalSpeech is the first TTS system to achieve a MOS statistically indistinguishable from recorded human speech on the LJSpeech benchmark, marking a historic milestone for speech synthesis.

Voice & Audio NaturalSpeechMicrosofthuman parity

April 6, 2022 High

DALL·E 2: the quality leap in image generation

OpenAI announces DALL·E 2, a diffusion-based text-to-image model producing photorealistic 1024×1024 images. Initially waitlist-only, public access in July.

Image & Video Gen OpenAIDALL-E 2Diffusion

April 5, 2022 Medium

PaLM 540B: Google's GPT-3 answer brings chain-of-thought

Google publishes PaLM, a 540B-parameter model trained on the new Pathways system. Demonstrates emergent reasoning capabilities when guided with chain-of-thought.

Foundation Models GooglePaLMPathways

March 2022

March 29, 2022 Landmark

Chinchilla: the big models were undertrained

DeepMind publishes the Chinchilla paper and shows that, given equal compute, smaller models trained on far more tokens beat oversized undertrained ones.

Foundation Models DeepMindChinchillaScaling Laws

March 22, 2022 Landmark

NVIDIA H100 and Hopper architecture: the foundation-model GPU

At GTC 2022 NVIDIA unveils the Hopper architecture and the H100 GPU, with FP8 Transformer Engine and NVLink 4. It will become the hardware substrate for nearly every large LLM of the following years.

AI Infrastructure NVIDIAH100Hopper

March 21, 2022 High

Self-Consistency: sample multiple reasoning paths for better answers

Wang et al. (Google Brain) show that sampling N diverse reasoning paths and taking the most frequent answer beats greedy decoding on all reasoning benchmarks.

Foundation Models Chain of ThoughtSelf-ConsistencyReasoning

February 2022

February 2, 2022 High

AlphaCode: DeepMind takes on competitive programmers

DeepMind unveils AlphaCode, a system that generates code for competitive programming problems and ranks in the top half of human participants on Codeforces.

AI Coding DeepMindAlphaCodeCompetitive Programming

January 2022

January 27, 2022 Medium

Coqui TTS: open source speech synthesis for everyone

Coqui TTS is an open source Python library for quality text-to-speech, forked from Mozilla TTS, supporting over 1100 languages and adopted by the HuggingFace community.

Voice & Audio CoquiTTSOpen Source

January 27, 2022 High

InstructGPT: the fine-tuning that teaches GPT to obey

OpenAI introduces InstructGPT: a GPT-3 refined with human feedback (RLHF) that follows instructions better than the 175B base model despite being much smaller (1.3B parameters).

Foundation Models OpenAIInstructGPTRLHF

January 24, 2022 Medium

UnifiedIO (AI2): first unified sequence-to-sequence model for text, images, audio, and video

AI2 and University of Washington present UnifiedIO: the first sequence-to-sequence model capable of handling text, images, audio, video, and structured data as both inputs and outputs through a single architecture, trained on 80+ tasks simultaneously.

Multimodal AI UnifiedIOmultimodalunified model

2021

December 2021

December 20, 2021 High

GLIDE: OpenAI shifts from autoregressive to CLIP-guided diffusion

OpenAI publishes GLIDE, a text-to-image diffusion model with classifier-free guidance — technical foundation for DALL·E 2 and the models that follow.

Image & Video Gen OpenAIGLIDEDiffusion

December 16, 2021 High

WebGPT: OpenAI teaches GPT-3 to browse the web

OpenAI publishes WebGPT, a GPT-3 fine-tune that learns to use a text browser to search the web for answers with source citations, trained via imitation learning + RLHF.

Agents OpenAIWebGPTBrowsing

December 8, 2021 High

Gopher 280B: DeepMind officially enters the LLM race

DeepMind releases Gopher, a 280B dense model, alongside a systematic 152-task study and a companion paper on ethical considerations of foundation models.

Foundation Models DeepMindGopherScaling

December 8, 2021 High

RETRO: DeepMind foreshadows RAG with retrieval over 2 trillion tokens

DeepMind publishes RETRO, a 7B-parameter model that retrieves relevant passages from a 2T-token database at inference, matching the performance of models 25x larger.

Foundation Models DeepMindRETRORetrieval

November 2021

November 18, 2021 High

OpenAI drops the waitlist: GPT-3 API available to all

Eighteen months after the GPT-3 paper, OpenAI removes the API access waitlist and lets any developer sign up, accelerating mainstream adoption of foundation models.

Enterprise AI OpenAIAPIGPT-3

October 2021

October 29, 2021 Medium

Replit Ghostwriter: AI coding in the browser, zero setup

First AI coding tool integrated into a browser IDE: intelligent code completion for students and developers with no local configuration required.

AI Coding Code CompletionBrowser IDEAI Assistant

October 28, 2021 Medium

Pathways: Google sketches the post-Transformer architecture

Jeff Dean outlines Pathways, Google's unified architecture for sparse, multitask, multimodal models — the infrastructure foundation that will power PaLM and Gemini.

AI Infrastructure GooglePathwaysMultitask

October 21, 2021 High

FLAN: instruction tuning that teaches models to follow directions

Google shows that training a model on 60+ tasks framed as instructions dramatically improves zero-shot performance on unseen tasks.

Foundation Models FLANinstruction tuningzero-shot

October 21, 2021 Medium

PyTorch 1.10: CUDA Graphs, FX, and the maturing of the dominant framework

Meta releases PyTorch 1.10 with CUDA Graphs integration, FX-based quantization, TorchScript improvements — consolidating leadership of the framework for AI research and production.

AI Infrastructure PyTorchFrameworkCUDA Graphs

October 11, 2021 High

Megatron-Turing NLG 530B: Microsoft and NVIDIA scale dense past GPT-3

Microsoft and NVIDIA announce MT-NLG, a 530B-parameter dense model trained with DeepSpeed and Megatron-LM, at the time the largest dense LM ever produced.

Foundation Models MicrosoftNVIDIAMegatron

September 2021

September 29, 2021 Low

Copilot Labs: GitHub opens a sandbox for experimental features

GitHub introduces Copilot Labs, a VS Code extension hosting experimental features beyond simple autocomplete: code explanation, language translation, test generation.

AI Coding GitHubCopilot LabsCode Explain

September 9, 2021 Medium

HuBERT: Meta brings self-supervised to speech, foreshadows Whisper

Meta AI publishes HuBERT, a self-supervised audio model based on masked prediction of discrete clusters — conceptual base for Whisper, w2v-BERT and audio-multimodal models.

Voice & Audio FacebookMetaAV-HuBERT

August 2021

August 31, 2021 Medium

Copilot lands on JetBrains and Neovim

GitHub extends the Copilot technical preview to the main JetBrains IDEs (IntelliJ, PyCharm, GoLand, WebStorm) and to Neovim, taking AI coding outside the VS Code ecosystem.

AI Coding GitHubCopilotJetBrains

August 16, 2021 High

On the Opportunities and Risks of Foundation Models: Stanford coins the term

Stanford's Center for Research on Foundation Models publishes a 200+ page report coining the term foundation models, now standard in technical, academic and regulatory discourse.

Foundation Models StanfordCRFMFoundation Models

August 10, 2021 High

Codex API: OpenAI opens access to the model behind Copilot

OpenAI releases the Codex API in private beta, giving developers direct access to the code generation model behind GitHub Copilot, free during the beta.

AI Coding OpenAICodexAPI

July 2021

July 28, 2021 Medium

OpenAI Triton: writing GPU kernels in Python becomes practical

OpenAI releases Triton, a Python-like language and compiler for writing custom GPU kernels at performance close to hand-written CUDA — dramatically lowering the barrier for model optimization.

AI Infrastructure OpenAITritonGPU

July 15, 2021 High

AlphaFold 2: open code and database, biology accelerates

DeepMind publishes AlphaFold 2 code and weights on GitHub and, with EMBL-EBI, releases a database with predicted structures for 350,000 human and model-organism proteins.

AI Infrastructure DeepMindAlphaFoldProtein Folding

July 12, 2021 High

Megatron-LM v2: 3D Parallelism for 530-Billion-Parameter Models

NVIDIA adds interleaved pipeline scheduling and sequence parallelism to Megatron-LM, enabling training of the 530B-parameter MT-NLG on 2240 A100 GPUs with Microsoft.

AI Infrastructure Megatron-LM3D parallelismpipeline parallelism

July 7, 2021 High

Codex paper: OpenAI publishes HumanEval and the model behind Copilot

OpenAI releases Evaluating Large Language Models Trained on Code describing Codex (the model powering GitHub Copilot) and introduces HumanEval, the standard benchmark for code generation.

AI Coding OpenAICodexHumanEval

June 2021

June 29, 2021 High

GitHub Copilot: autocomplete grows up

GitHub and OpenAI launch a technical preview of an assistant that suggests entire lines and functions right in the editor, based on a GPT-3-derived model trained on public code.

AI Coding GitHubCopilotCodex

June 15, 2021 High

VITS: end-to-end TTS with variational autoencoder

VITS unifies the acoustic model and vocoder into a single end-to-end model, achieving quality surpassing Tacotron 2 with faster inference.

Voice & Audio VITSTTSend-to-end

June 4, 2021 High

GPT-J 6B: the open source model that matches GPT-3 Curie on many benchmarks

EleutherAI releases GPT-J, a 6B-parameter model trained in JAX on TPUs, performance comparable to GPT-3 Curie, shipped under Apache 2.0.

Open Source Models EleutherAIGPT-JOpen Source

June 1, 2021 High

The Pile: the 825 GB open dataset that fuels the open LLM era

EleutherAI publishes The Pile, an 825 GB dataset built from 22 diverse sub-datasets — the base for GPT-Neo, GPT-J, Pythia and much of the early open source ecosystem.

Open Source Models EleutherAIThe PileDataset

June 1, 2021 Medium

Wu Dao 2.0: China announces a 1.75T-parameter model

BAAI (Beijing Academy of Artificial Intelligence) introduces Wu Dao 2.0, a 1.75 trillion-parameter multimodal Mixture of Experts model — China's response to GPT-3 and Switch Transformer.

Foundation Models BAAIWu DaoChina

May 2021

May 28, 2021 Landmark

Anthropic: an AI safety-focused lab is born

Dario and Daniela Amodei, former VP of Research and VP of Safety at OpenAI, co-found Anthropic with a group of researchers, explicitly focused on AI safety and interpretability.

AI Security AnthropicAI SafetyFounding

May 18, 2021 Medium

MUM: Google unveils the multitask model for Search

At Google I/O, Google announces MUM (Multitask Unified Model), T5-based, claimed 1000x more powerful than BERT, capable of handling 75 languages and multimodal content.

Multimodal AI GoogleMUMSearch

May 18, 2021 High

LaMDA: Google unveils its dialogue model

At Google I/O, Sundar Pichai introduces LaMDA (Language Model for Dialogue Applications), a 137B-parameter model fine-tuned for dialogue, direct ancestor of Bard.

Foundation Models GoogleLaMDADialogue

April 2021

April 15, 2021 Medium

OpenAI Content Filter: first integrated AI-side moderation infrastructure

OpenAI ships the content filter endpoint to classify GPT-3 outputs as safe/sensitive/unsafe — the first integrated moderation tool inside a commercial foundation-model API.

AI Security OpenAIContent FilterSafety

March 2021

March 22, 2021 High

GPT-Neo: the first open source clone of GPT-3

EleutherAI releases GPT-Neo 1.3B and 2.7B, open source language models trained on The Pile — the first serious attempt to replicate the GPT-3 architecture with public weights.

Open Source Models EleutherAIGPT-NeoOpen Source

January 2021

January 12, 2021 High

Switch Transformer: Google scales to 1.6T parameters with Mixture of Experts

Google Brain publishes Switch Transformer, a sparse model with 1.6 trillion parameters that activates only one expert per token, proving sparse routing can scale beyond dense models.

Foundation Models GoogleMoESparse

January 5, 2021 High

DALL·E and CLIP: text and images finally talk

OpenAI announces DALL·E (generates images from text) and CLIP (aligns images and text in the same semantic space) side by side. Two pieces of the multimodal puzzle.

Multimodal AI OpenAIDALL-ECLIP

2020

December 2020

December 31, 2020 High

The Pile: the open-source 825 GB dataset for training LLMs

EleutherAI releases The Pile, an 825 GB composite text dataset curated from 22 different sources (arXiv, GitHub, PubMed, books, StackExchange…), designed for pre-training large open-source language models.

Open Source Models EleutherAIThe PileDataset

December 23, 2020 High

MuZero in Nature: mastering games without knowing the rules

DeepMind publishes MuZero in Nature: the RL agent learns world dynamics on its own and reaches superhuman performance on Go, chess, shogi, and 57 Atari games without being given the rules.

Foundation Models DeepMindMuZeroReinforcement Learning

December 8, 2020 Medium

Big Bird at NeurIPS 2020: sparse attention for sequences up to 4096 tokens

Google Research presents Big Bird at NeurIPS 2020, a transformer with sparse attention (local + global + random) that scales linearly, reaches SOTA on long-document QA and summarization, and proves Turing-completeness.

Foundation Models GoogleBig BirdSparse Attention

November 2020

November 30, 2020 Landmark

AlphaFold 2 wins CASP14 and solves protein folding

DeepMind announces that AlphaFold 2 has won the CASP14 competition with mean GDT >90, on par with experimental methods — widely regarded as solving the 50-year-old protein folding problem.

Foundation Models DeepMindAlphaFoldCASP

November 4, 2020 Medium

Bing in production on Turing: deep AI in worldwide-scale search

Microsoft announces a Bing-wide production deployment of Turing-NLR (next-gen NLP) models on Azure GPUs, described as the largest search-quality improvement ever.

Enterprise AI MicrosoftBingTuring

October 2020

October 26, 2020 Medium

DeepMind acquires MuJoCo and makes it free

DeepMind announces it has acquired MuJoCo, the physics simulator used in most RL and robotics research, and commits to making it free for everyone — a first step toward the full open-source release in 2022.

Robotics DeepMindMuJoCoPhysics Simulator

October 23, 2020 Medium

mT5: a multilingual T5 over 101 languages

Google Research publishes mT5, a T5 variant pre-trained on mC4 (multilingual Common Crawl) over 101 languages, which becomes a standard baseline for many cross-lingual NLP tasks.

Foundation Models GoogleT5mT5

October 22, 2020 Landmark

Vision Transformer (ViT): "An Image is Worth 16x16 Words"

Google Research introduces the Vision Transformer, applying a pure transformer to image patches as if they were tokens, and shows that with enough pre-training it beats CNNs on ImageNet and other vision benchmarks.

Multimodal AI GoogleVision TransformerViT

September 2020

September 22, 2020 High

Microsoft acquires the exclusive GPT-3 license

Microsoft announces an exclusive license to integrate and redistribute GPT-3 in its products and cloud services, while OpenAI's public API keeps operating. The first major enterprise deal on foundation models.

Enterprise AI MicrosoftOpenAIGPT-3

September 9, 2020 High

DeepSpeed ZeRO-3: training models beyond 100 billion parameters

Microsoft announces ZeRO Stage 3 in DeepSpeed: by sharding parameters across GPUs in addition to gradients and optimizer states, it enables training of 100B+ parameter models on reasonable-size clusters.

AI Infrastructure MicrosoftDeepSpeedZeRO-3

August 2020

August 4, 2020 Medium

PyTorch Lightning 1.0: a boilerplate-free training loop

William Falcon and team ship PyTorch Lightning 1.0, a framework that separates research code (model) from engineering (training loop, distributed, checkpointing, logging) and becomes the de facto standard for many open projects.

AI Infrastructure PyTorch LightningOpen SourceTraining Loop

July 2020

July 29, 2020 Medium

Google announces TPU v4 with MLPerf 0.7 records

Posting MLPerf Training 0.7 results, Google reveals TPU v4, a new custom deep-learning accelerator, claiming it built the "world's fastest training supercomputer" with a 4,096-chip pod.

AI Infrastructure GoogleTPU v4Pod

July 22, 2020 Medium

Longformer: sliding-window attention for long documents

Allen Institute for AI releases Longformer, a transformer that combines local sliding-window attention with global attention on special tokens, scaling linearly up to 4096 tokens and beating RoBERTa on long-document tasks.

Foundation Models AllenAILongformerLong Context

July 9, 2020 High

HuggingFace Transformers 3.0: Rust tokenizers and the Model Hub

HuggingFace releases Transformers 3.0 with the Rust-based tokenizers library (up to 100× faster), new NLP pipelines, and tighter Model Hub integration, cementing the de facto standard for using pretrained models in Python.

Open Source Models HuggingFaceTransformersTokenizers

July 3, 2020 High

EleutherAI is founded: a community to replicate GPT-3 in the open

Connor Leahy, Sid Black, and Leo Gao found EleutherAI on Discord with the goal of replicating GPT-3 and releasing models, code, and datasets in the open, kicking off projects like GPT-Neo, GPT-J, and The Pile.

Open Source Models EleutherAIGPT-NeoOpen Source

June 2020

June 20, 2020 High

wav2vec 2.0: Facebook AI's "BERT for speech"

Facebook AI publishes wav2vec 2.0, a self-supervised model that learns representations from raw audio and reaches SOTA on LibriSpeech with as little as 10 minutes of labeled data.

Voice & Audio Facebook AIwav2vec 2.0Speech Recognition

June 17, 2020 Medium

Image GPT: generative pretraining for images

OpenAI introduces Image GPT (iGPT), a transformer that treats pixels as tokens and shows that GPT-style sequential generative pretraining works on images too, reaching competitive performance on CIFAR-10.

Multimodal AI OpenAIImage GPTGenerative Pretraining

June 11, 2020 Landmark

OpenAI launches the GPT-3 API in private beta

Two weeks after the paper, OpenAI opens a private beta of the first general API for its language models, available to a few hundred developers building applications directly on top of GPT-3.

Foundation Models OpenAIGPT-3API

May 2020

May 28, 2020 Landmark

GPT-3: the paper that opens the scaling-laws era

OpenAI publishes 'Language Models are Few-Shot Learners' and shows that at 175B parameters a model learns new tasks from a handful of examples in the prompt.

Foundation Models OpenAIGPT-3Few-shot Learning

May 22, 2020 Landmark

RAG: Retrieval-Augmented Generation enters the literature

Lewis et al. at Facebook AI publish the RAG paper, combining a dense retriever (DPR) with a seq2seq generator (BART) to answer knowledge-intensive questions without baking all facts into the weights.

Foundation Models Facebook AIRAGRetrieval-Augmented Generation

May 14, 2020 Landmark

NVIDIA A100: Ampere arrives and the GPU that trains GPT-3

At GTC 2020 Jensen Huang announces the A100 GPU built on the Ampere architecture: 54 billion transistors, 40-80 GB HBM2e, TF32, 2:4 structured sparsity, and MIG support.

AI Infrastructure NVIDIAA100Ampere

April 2020

April 30, 2020 Medium

OpenAI Jukebox: generating whole songs with vocals

OpenAI releases Jukebox, a generative model that produces raw songs (audio + vocals + lyrics) conditioned on artist and genre, built on a stack of VQ-VAE and autoregressive transformers.

Voice & Audio OpenAIJukeboxMusic Generation

April 9, 2020 Low

fairseq stabilizes modular transformer support

Facebook AI Research consolidates fairseq as the reference sequence-to-sequence framework: it adds modular support for BART, RoBERTa, mBART, wav2vec and becomes the primary codebase for FAIR's 2020 models.

Open Source Models MetaFacebook AIfairseq

March 2020

March 23, 2020 Medium

ELECTRA: more efficient NLP pre-training than BERT

Clark, Luong, Le, and Manning publish ELECTRA at ICLR 2020: instead of masked language modeling, it trains the model to detect tokens replaced by a small generator, matching BERT with a quarter of the compute.

Foundation Models GoogleStanfordELECTRA

February 2020

February 13, 2020 Medium

Microsoft Turing-NLG: 17B parameters and the birth of DeepSpeed

Microsoft Research unveils Turing-NLG, the largest announced language model to date (17B), made possible by the DeepSpeed/ZeRO optimizer that drastically cuts GPU memory.

Foundation Models MicrosoftTuring-NLGLarge Language Models

January 2020

January 28, 2020 Medium

Google Meena: the 2.6B end-to-end chatbot

Google introduces Meena, a 2.6B-parameter conversational model trained on 341 GB of social dialogue, along with SSA, a new metric for evaluating chatbot quality.

Foundation Models GoogleMeenaDialogue

January 13, 2020 Medium

Reformer: the transformer that handles very long sequences

Google Research presents Reformer, a transformer variant using LSH attention and reversible layers to go from O(n²) to O(n log n) and handle sequences up to 64k tokens.

Foundation Models GoogleReformerEfficient Transformers