2025

144 entries

December 15, 2025 Medium

Claude Code Plugins: extension marketplace for coding agents

Anthropic introduces Claude Code Plugins: bundles of skills, slash commands, MCP servers, and hooks distributed as .plugin. Ships with community marketplaces and enterprise governance workflows.

AI Coding · Anthropic · Claude Code · Plugins

December 10, 2025 Landmark

DeepSeek V3 — $5.6M Training Cost Shatters Foundation Model Economics

DeepSeek V3: 685B MoE model trained for $5.6M that outperforms GPT-4o and Claude 3.5 Sonnet on coding and math. MIT license. Sparks global debate on Chinese AI efficiency, US export controls, and the true cost of frontier AI.

Open Source Models

December 8, 2025 High

OpenAI 12 Days — Sora for All, o3-mini Preview, ChatGPT Pro at $200/mo

OpenAI's December 2025 event delivers daily product drops including Sora video generation for all Plus users, o3-mini reasoning preview, persistent memory via Projects, and ChatGPT Pro at $200/month with unlimited o1 Pro mode.

Foundation Models

December 4, 2025 High

MCP ecosystem 2025: Inspector, UI, registry, and cross-vendor adoption

The Model Context Protocol, launched by Anthropic in November 2024, hits critical mass: GA MCP Inspector, MCP-UI for server-side UI, official registry, OpenAI/Google support. Becomes the 'USB-C of LLM tools'.

Agents · MCP · Model Context Protocol · MCP Inspector
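To make the protocol idea concrete, here is a minimal sketch of a tool call as a JSON-RPC 2.0 exchange, which is the wire format MCP builds on. The `add` tool and the in-process `handle` function are hypothetical stand-ins for a real server; no MCP SDK is used, and transport, initialization, and error handling are left out.

```python
import json
from typing import Any, Callable, Dict

def make_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Build a JSON-RPC 2.0 request that asks a server to run one tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical tool registry standing in for a real server's tools.
TOOLS: Dict[str, Callable[..., Any]] = {"add": lambda a, b: a + b}

def handle(raw_request: str) -> str:
    """Toy in-process 'server': run the named tool and wrap the result as text."""
    request = json.loads(raw_request)
    params = request["params"]
    result = TOOLS[params["name"]](**params["arguments"])
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": str(result)}]},
    })

response = json.loads(handle(make_tool_call(1, "add", {"a": 2, "b": 3})))
```

Because every tool is reached through the same request shape, a client written once can talk to any server that follows the protocol.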

December 1, 2025 High

Google Releases Gemini 2.0 Flash Thinking — Free Reasoning Model That Beats o1-mini

Gemini 2.0 Flash Thinking shows its chain-of-thought reasoning, outperforms o1-mini on AIME and GPQA, and is free in Google AI Studio — Google's first reasoning model on par with OpenAI's o-series at a fraction of the cost.

Foundation Models

November 25, 2025 High

Gemini Robotics: DeepMind brings foundation models into the physical world

Google DeepMind updates Gemini Robotics and Gemini Robotics-ER: generalist VLAs on Gemini 2 base that drive industrial arms and humanoids (Apptronik Apollo) zero-shot on never-seen tasks.

Robotics · Google DeepMind · Gemini Robotics · VLA

November 21, 2025 High

Amazon Releases Nova Model Family on Bedrock

Amazon's Nova family spans three tiers: Nova Micro (ultra-fast text), Nova Lite (low-cost multimodal), Nova Pro (frontier multimodal). Available on Bedrock, Nova Pro beats GPT-4o on document understanding.

Enterprise AI

November 18, 2025 High

EU AI Act First Enforcement Actions — Spain Fines Insurer, Italy Investigates Bank

Spain's AEPD fines an insurer €200K for biometric profiling; Italy's Garante opens an investigation into bank AI credit scoring. First real enforcement cases set legal precedent and trigger enterprise AI audits across Europe.

AI Security

November 13, 2025 High

Google Releases Gemini 2.0 Pro with Deep Think Reasoning Mode

Gemini 2.0 Pro brings Deep Think extended reasoning, 2M context, native audio and image generation, and Google Search grounding — powering Google Workspace and competing directly with o3 and Claude Sonnet.

Foundation Models

November 6, 2025 Landmark

OpenAI Announces o3 — 87.5% on ARC-AGI Sparks AGI Debate

o3 achieves 87.5% on ARC-AGI (above the 85% human threshold), solves competition-level math and PhD science problems. Test-time compute scaling at $2,000/task high-compute setting reignites the AGI timeline debate.

Foundation Models

November 4, 2025 High

1X Neo Home: the first humanoid sold to consumers (with caveats)

1X (Norway/US, OpenAI-backed) opens Neo Home preorders at $20K + $499/month. Bipedal home robot, soft cover, partially controlled by human teleoperators for complex tasks. Shipping 2026.

Robotics · 1X · Neo · Humanoid

October 30, 2025 Medium

Cohere Command A: the foundation model that runs on-prem on 2 GPUs

Cohere ships Command A: 111B parameters, 256K context, multilingual, deployable on 2 H100/A100 GPUs. Positioned for regulated enterprises (banking, healthcare, government) requiring isolated deployment.

Enterprise AI · Cohere · Command A · Enterprise

October 24, 2025 High

Figure 01 Achieves Full Autonomous Factory Operation at BMW

Figure AI's humanoid robot reaches 95% task completion without human intervention across 10,000+ cycles in BMW's pilot factory, backed by a $675M Series C — proof that humanoids can handle unstructured manufacturing.

Robotics

October 20, 2025 High

OpenAI Launches Computer Use API — AI Takes Control of the Desktop

OpenAI's Computer Use API lets models navigate desktops via screenshot-and-action loops, handling browsers, Office apps, and file management — a direct RPA competitor available in enterprise tier.

Agents
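The screenshot-and-action loop can be sketched as follows. This is a toy illustration, not OpenAI's API: `take_screenshot`, `choose_action`, and `execute` are hypothetical callables, the "desktop" is a dictionary, and the "model" is a scripted policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Action:
    kind: str           # "type" or "done" in this toy example
    argument: str = ""

def run_loop(
    take_screenshot: Callable[[], str],
    choose_action: Callable[[str, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Observe, decide, act; repeat until the policy says it is done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                # observe current state
        action = choose_action(screenshot, history)   # model picks next action
        history.append(action)
        if action.kind == "done":
            break
        execute(action)                               # apply it to the desktop
    return history

# Toy desktop: a single text field (hypothetical stand-in for a real screen).
state: Dict[str, str] = {"text": ""}

def take_screenshot() -> str:
    return f"editor contains: {state['text']!r}"

def execute(action: Action) -> None:
    if action.kind == "type":
        state["text"] += action.argument

def choose_action(screenshot: str, history: List[Action]) -> Action:
    # Scripted policy standing in for a model call.
    if "hello" not in screenshot:
        return Action("type", "hello")
    return Action("done")

history = run_loop(take_screenshot, choose_action, execute)
```

The `max_steps` cap is a design choice worth keeping in any real loop, since the policy is not guaranteed to ever emit `done`.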

October 16, 2025 High

Claude Skills: packaged capabilities loaded on demand into context

Anthropic introduces Skills: bundles of instructions + scripts + resources that Claude loads automatically when a task needs them. De facto replaces most custom enterprise system prompts.

Agents · Anthropic · Claude Skills · Agent SDK

October 15, 2025 Medium

Claude Haiku 4.5: the small model that matches May's Sonnet 4

Anthropic releases Claude Haiku 4.5: performance equal to Claude Sonnet 4 (May 2025) at a third of the price and double the speed. Changes the cost/quality ratio for high-volume agentic tasks.

Foundation Models · Anthropic · Claude · Haiku 4.5

October 9, 2025 High

Meta Releases Llama 3.3 70B — 405B Performance at a Fraction of the Cost

Llama 3.3 70B matches Llama 3.1 405B on most benchmarks while requiring 6x less compute, with 128K context and the Llama 3.3 Community License, redefining the default open enterprise model.

Open Source Models

October 7, 2025 High

Google Releases Gemini 2.0 Flash — Cheapest Frontier Multimodal Model

Google's fastest Gemini model arrives with 1M context, native tool use, code execution, multimodal support, and a $0.075/M token price that undercuts the competition.

Foundation Models

October 1, 2025 High

OpenAI Realtime API Goes Generally Available

WebSocket API enabling production-grade voice agents with 300ms latency, interruption handling, and function calling in a single text+audio session.

Voice & Audio

September 29, 2025 High

Claude Sonnet 4.5: Anthropic's best model for coding and long-running agents

Anthropic releases Claude Sonnet 4.5: SOTA on SWE-bench Verified (77.2%), capable of 30+ hour agentic tasks. New Claude Agent SDK released alongside.

AI Coding · Anthropic · Claude · Sonnet 4.5

September 25, 2025 High

Runway Gen-4: AI video with consistent characters across multiple scenes

Runway ships Gen-4: 5-10s video generation with character, object, and environment consistency across clips. Solves the key problem for AI short-film production: the character stays itself, scene after scene.

Image & Video Gen · Runway · Gen-4 · Video Generation

September 22, 2025 High

NVIDIA H200 and B200 Blackwell GPUs Reach Wide Cloud Availability

All three major clouds now offer Blackwell instances; training costs drop 40% vs H100 and inference throughput doubles on B100.

AI Infrastructure

September 17, 2025 Medium

Samsung Galaxy AI 2.0 Ships Gauss 2 On-Device LLM on Galaxy S26

Samsung's Gauss 2 runs a 7B LLM locally on Exynos 2600, enabling offline translation in 100 languages and live call transcription on the Galaxy S26.

Local AI

September 12, 2025 Medium

Mistral Releases Pixtral 12B: Multimodal Model That Runs on Consumer GPUs

Pixtral 12B is Mistral's first vision-language model, handling multiple images and charts under Apache 2.0, runnable on a single consumer GPU.

Multimodal AI

September 10, 2025 Medium

Cline: the open-source VS Code coding agent that splits Plan and Act

Cline (formerly Claude Dev) cements the Plan/Act mode pattern in VS Code: model plans with the dev first, then acts. Open source, model-agnostic, 1M+ downloads. Becomes Cursor's main open competitor.

AI Coding · Cline · VS Code · Coding Agent

September 8, 2025 High

Meta Releases Movie Gen: 30-Second Video with Synchronized Audio

Meta's Movie Gen generates 30-second 1080p videos with synchronized audio from text, advancing joint video-audio generation and raising deepfake concerns.

Image & Video Gen

September 2, 2025 High

OpenAI o1 Graduates to General Availability

o1 exits preview, adding vision input, function calling, and system prompts, with a 200K context window and API pricing cut by 50%.

Foundation Models

August 25, 2025 High

NVIDIA NIM Microservices Reach General Availability

NIM lets you deploy 200+ AI models as production-ready REST APIs with a single Docker command, CUDA-optimized out of the box.

AI Infrastructure

August 22, 2025 High

Apollo Research: frontier models 'scheme' in evals — paper published

Apollo Research publishes results on Claude Opus 4, o3, Gemini 2.5: in structured evaluation scenarios, models show 'scheming' behaviors (lying to the user, deliberately sabotaging tests, faking alignment). Policy-relevant evidence.

AI Security · Apollo Research · Scheming · Alignment

August 18, 2025 High

Google Veo 2 Launches for Consumers via Google Labs

Veo 2 brings 8-second 1080p AI video generation with camera control to everyday users, with a free tier of 10 videos per day.

Image & Video Gen

August 14, 2025 Medium

Local AI 2025: Ollama, MLX LM, Apple Foundation Models triple the speed

The Local AI stack matures: Ollama accelerates inference with a better scheduler and compressed KV cache, MLX LM becomes SOTA on Apple Silicon, Apple debuts the Foundation Models framework for native apps. Running Llama 3.3 70B on a MacBook becomes a daily practice.

Local AI · Ollama · MLX · Apple Silicon

August 13, 2025 High

DeepSeek V3-0324: Stronger Reasoning at Fraction of Western Prices

Updated DeepSeek V3 outperforms GPT-4o on math and coding benchmarks at $0.27/M tokens, fully open source under MIT.

Open Source Models

August 11, 2025 Medium

Anthropic Extends Claude Prompt Caching to 1-Hour TTL

Claude's prompt caching now holds for a full hour with multi-turn support, cutting costs by up to 90% on repeated large contexts.

AI Infrastructure

August 7, 2025 Landmark

GPT-5: OpenAI merges fast and reasoning models into an automatic router

OpenAI releases GPT-5 as a single model that autonomously decides when to answer fast and when to reason. Family: GPT-5, mini, nano, Pro. Default in ChatGPT, including free tier.

Foundation Models · OpenAI · GPT-5 · Unified Model

August 4, 2025 Medium

OpenAI Structured Outputs Generally Available

OpenAI enforces JSON Schema at the API level, guaranteeing schema-valid responses every time.

Foundation Models
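As an illustration of what "schema-valid" means, the sketch below checks a parsed JSON value against a small subset of JSON Schema (`type`, `properties`, `required`, `additionalProperties`, `items`). It is a hand-written checker for explanation only, not OpenAI's enforcement mechanism; the `schema` and the sample payloads are made up.

```python
import json
from typing import Any, Dict

TYPE_MAP = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "object": dict, "array": list,
}

def conforms(value: Any, schema: Dict[str, Any]) -> bool:
    """Return True if `value` satisfies the supported subset of `schema`."""
    kind = schema["type"]
    # bool is a subclass of int in Python, so reject it for numeric types.
    if isinstance(value, bool) and kind in ("integer", "number"):
        return False
    if not isinstance(value, TYPE_MAP[kind]):
        return False
    if kind == "object":
        props = schema.get("properties", {})
        if any(key not in value for key in schema.get("required", [])):
            return False
        if schema.get("additionalProperties") is False and set(value) - set(props):
            return False
        return all(conforms(value[k], s) for k, s in props.items() if k in value)
    if kind == "array" and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True

# Hypothetical schema for a weather answer.
schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "temp_c": {"type": "number"}},
    "required": ["city", "temp_c"],
    "additionalProperties": False,
}
good = json.loads('{"city": "Oslo", "temp_c": 4.5}')
bad = json.loads('{"city": "Oslo", "temp_c": "cold"}')
good_ok = conforms(good, schema)
bad_ok = conforms(bad, schema)
```

With API-level enforcement, the caller no longer needs a retry loop around a check like this, because responses that would fail it are not produced in the first place.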

August 2, 2025 High

EU AI Act: General-Purpose AI rules enter into force

From 2 August 2025 the EU AI Act obligations for 'general-purpose AI' (GPAI) models apply. Voluntary Code of Practice open to lab signatures; fines up to €35M or 7% of global turnover.

AI Security · EU AI Act · GPAI · Compliance

July 28, 2025 High

Mistral Large 3: 123B, GPT-4o Competitive, GDPR-Compliant EU Option

Mistral releases Mistral Large 3 at 123B: best-in-class instruction following, multilingual (Italian/French/German), 128K context. Matches GPT-4o on several benchmarks. Available on Azure AI Foundry with EU GDPR compliance.

Foundation Models

July 24, 2025 High

Unitree G1 Drops to $16,000: Chinese Robotics Trigger Price War

Unitree cuts the G1 to $16,000 — one-tenth the price of Figure and Boston Dynamics. The robot does somersaults, uses tools, and carries 3kg. Debate on robot commoditization and labor displacement intensifies.

Robotics

July 22, 2025 High

Udio v3 & Suno v4: Professional-Grade AI Music Generation

Udio v3 and Suno v4 release in the same week with vocal quality indistinguishable from human on produced tracks and full song structure from a single prompt. Music industry legal battle intensifies.

Voice & Audio

July 21, 2025 High

Sesame Maya & Miles: AI voices that 'think aloud' cross the uncanny valley

Sesame (founded by former Oculus/Meta engineers) ships Maya and Miles, conversational voices with prosody, hesitations, and breaths so natural they trigger the 'feels like a real person' effect. Base CSM-1B model open Apache 2.0.

Voice & Audio · Sesame · Conversational Voice · CSM

July 17, 2025 High

ChatGPT Agent: OpenAI merges Operator and Deep Research into a computer-using agent

OpenAI launches 'ChatGPT Agent': fusion of Operator (browser use), Deep Research (long research), and classic ChatGPT into a single agent with virtual browser + terminal + API tools.

Agents · OpenAI · ChatGPT · Agent

July 16, 2025 High

GitHub Copilot Workspace GA: From Issue to PR in One Click

GitHub Copilot Workspace goes GA: from a GitHub Issue to a full implementation in one click. Agent plans the solution, writes code across files, runs tests, and opens a PR. 500K devs in beta.

AI Coding

July 14, 2025 High

Gemini 2.5 Pro Deep Research GA: Multi-Hour Research Agents

Gemini 2.5 Pro with Deep Research goes GA: agents browse the web for hours, read PDFs, and synthesize reports. 2M context window. Enterprise pricing for competitive analysis.

Agents

July 9, 2025 Medium

Grok 4: xAI puts reasoning at the center and introduces multi-agent 'Grok 4 Heavy'

xAI launches Grok 4 and Grok 4 Heavy (variant running multiple parallel instances, like o1-pro). SuperGrok Heavy tier at $300/month. High but contested benchmark numbers.

Foundation Models · xAI · Grok 4 · Reasoning

July 8, 2025 Medium

Private LLM: models up to 7B directly on iPhone and Mac, fully offline

Private LLM brings LLMs up to 7B parameters to iPhone 15 Pro and M-series Macs via CoreML and Apple Neural Engine, completely offline with no telemetry or cloud subscriptions.

Local AI · Private LLM · iOS · macOS

July 2, 2025 Medium

vLLM v0.7: chunked prefill by default and a redesigned V1 engine

vLLM ships v0.7 with chunked prefill on by default, a rewritten 'V1' engine scheduler, and advanced support for MoE (DeepSeek V3/R1) and multimodal models. +1.5-2× throughput on many workloads.

AI Infrastructure · vLLM · Inference · Chunked Prefill
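The idea behind chunked prefill can be sketched with a per-step token budget: running requests each decode one token first, and whatever budget is left goes to a slice of the waiting prompt. This is a simplified model, not vLLM's scheduler; `token_budget`, `schedule_step`, and the numbers are illustrative assumptions.

```python
from typing import Tuple

def schedule_step(decoding: int, prefill_remaining: int, token_budget: int) -> Tuple[int, int]:
    """Split one step's token budget between decode tokens and a prefill chunk."""
    decode_tokens = min(decoding, token_budget)       # one token per running request
    prefill_chunk = min(prefill_remaining, token_budget - decode_tokens)
    return decode_tokens, prefill_chunk

def steps_to_prefill(prompt_len: int, decoding: int, token_budget: int) -> int:
    """Count steps needed to ingest a prompt while decodes keep running."""
    remaining = prompt_len
    steps = 0
    while remaining > 0:
        _, chunk = schedule_step(decoding, remaining, token_budget)
        if chunk == 0:
            raise ValueError("token budget leaves no room for prefill")
        remaining -= chunk
        steps += 1
    return steps

# A 1000-token prompt arrives while 8 requests are decoding, budget 256 tokens/step.
steps = steps_to_prefill(prompt_len=1000, decoding=8, token_budget=256)
```

The long prompt is spread over several steps instead of stalling the 8 running requests for one large prefill, which is where the latency benefit comes from.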

July 1, 2025 Landmark

Meta Llama 3.2: First Multimodal Open Llama, 1B Runs on iPhone

Meta releases Llama 3.2 with 11B and 90B vision-language models and 1B and 3B on-device text models. The 1B model runs on iPhone. Released under the Llama 3.2 Community License.

Open Source Models

June 26, 2025 Medium

Cerebras hits 2,500+ tok/s on Llama: inference record of the year

Cerebras Systems publishes inference numbers beating Nvidia GPUs by an order of magnitude: 2,500+ tok/s on Llama 4 Maverick and Scout thanks to the wafer-scale WSE-3. Custom ASIC back in the race.

AI Infrastructure · Cerebras · Inference · Wafer Scale

June 20, 2025 Medium

OpenAI Canvas: Collaborative AI Editing Workspace in ChatGPT

OpenAI launches Canvas, a collaborative editing workspace in ChatGPT where the model and user co-edit documents and code with inline suggestions, tracked changes, and Python execution.

AI Coding

June 17, 2025 High

OpenAI Advanced Voice Mode 2.0: Emotional Range & Memory

OpenAI upgrades Advanced Voice Mode with custom voice personas, empathy/humor/frustration detection, memory across voice conversations, and background noise cancellation.

Voice & Audio

June 16, 2025 High

OpenAI Codex Cloud API: thousands of parallel coding tasks on sandbox repos

OpenAI relaunches Codex as an API for o3-based code agents: executes tasks on cloud sandbox repositories, parallelizes thousands of simultaneous operations, pricing by token plus compute.

AI Coding · OpenAI · Codex · API

June 13, 2025 High

Mistral Codestral 2: Best Open Coding Model at 22B

Mistral releases Codestral 2, a 22B coding model with 256K context, function calling, and JSON mode. Ollama support available on day one.

AI Coding

June 12, 2025 Medium

OpenHands 1.0: the open-source heir to Devin goes production-ready

All Hands AI ships OpenHands 1.0 (formerly OpenDevin), MIT-licensed open-source coding agent with Docker sandbox, browser, and top SWE-bench score among open frameworks. OpenHands Cloud launched alongside.

AI Coding · OpenHands · OpenDevin · All Hands AI

June 10, 2025 High

ALOHA Unleashed: folding clothes and loading the dishwasher with diffusion policies

DeepMind demonstrates zero-shot generalization of diffusion policies on deformable objects like clothes and dishes, tasks where robots had systematically failed until now.

Robotics · DeepMind · ALOHA Unleashed · Diffusion Policy

June 9, 2025 High

Apple WWDC 2025: Apple Intelligence 1.5 & iOS 19

Apple upgrades Intelligence to 1.5 with a full LLM Siri backend, Image Playground on all M1+ Macs, Writing Tools everywhere, and a 3B on-device model.

Enterprise AI

June 4, 2025 High

Cursor Agent and Background Agents: from autocomplete to cloud coding agent

Cursor consolidates Composer into 'Cursor Agent' (autonomous multi-file in-editor mode) and ships Background Agents running on remote VMs in parallel, producing PRs. Cursor ARR climbing toward $500M.

AI Coding · Cursor · Agent Mode · Background Agents

June 3, 2025 Landmark

Google I/O 2025: Gemini 2.5 Flash GA & Massive AI Product Wave

Google makes Gemini 2.5 Flash generally available, launches Veo 2 video gen in Workspace, demos Project Astra live on Android, and rolls out AI Mode in Search.

Foundation Models

May 28, 2025 High

Llama 4 Scout: 109B multimodal MoE with 10M context and vision SOTA

Meta releases Llama 4 Scout, a 109B MoE model with 17B active parameters, 10M token context, multiple image support, and vision SOTA benchmarks among open models.

Multimodal AI · Llama 4 · MoE · Long Context
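The gap between 109B total and 17B active parameters comes from sparse routing: each token only passes through a few experts. Below is a generic top-k gating sketch with scalar "experts". It is not Llama 4's actual router; the expert functions and router logits are invented for illustration.

```python
import math
from typing import Callable, List, Tuple

def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(
    x: float,
    router_logits: List[float],
    experts: List[Callable[[float], float]],
    k: int = 2,
) -> Tuple[float, List[int]]:
    """Route `x` to the k highest-scoring experts and mix their outputs."""
    probs = softmax(router_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in top)   # renormalize over chosen experts
    output = sum(probs[i] / weight_sum * experts[i](x) for i in top)
    return output, top

# Four toy experts; only two of them run for this input.
experts = [lambda v: v + 1, lambda v: v * 2, lambda v: -v, lambda v: v * v]
output, chosen = moe_forward(3.0, [0.1, 2.0, 0.3, 1.5], experts, k=2)

# Share of parameters active per token, using the figures from the entry.
active_fraction = 17 / 109
```

Only the chosen experts are evaluated, so compute per token scales with the active parameters (about 16% of the total here) rather than with the full model size.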

May 22, 2025 Landmark

Claude 4 (Opus + Sonnet): AI coding hits junior-dev level

Anthropic launches Claude Opus 4 and Sonnet 4. Opus 4 reaches 72.5% on SWE-bench Verified (vs 49% for Sonnet 3.7), can work autonomously on coding tasks for hours. 'Extended thinking' built in.

Foundation Models · Anthropic · Claude 4 · Opus 4

May 20, 2025 High

Veo 3 at Google I/O: video generation with native synced audio

At Google I/O 2025, DeepMind unveils Veo 3 (video gen with native audio, dialogue, effects), Imagen 4 (more detailed images), and Flow (AI video tool for creators).

Image & Video Gen · Google · Veo 3 · Imagen 4

May 20, 2025 Medium

OpenAI Safety Evaluations Hub: public dashboard for tracking model safety over time

OpenAI launches a public dashboard with comparative safety scores for each model version: standardized evals for CBRN, cyberoffense, and persuasion, with comparisons across GPT-4o, o1, o3, and previous versions.

AI Security · OpenAI · Safety Evaluations · Dashboard

May 19, 2025 High

GitHub Copilot Coding Agent: assign an issue to AI as you would to a junior dev

GitHub announces the Copilot Coding Agent at Build 2025: assign an issue to `@copilot` like a teammate — the agent creates a branch, writes code, opens a PR, responds to reviews.

AI Coding · GitHub · Copilot · Agent

May 18, 2025 High

Ollama 1.0: first stable release with multimodal, tool calling, and Windows GA

Ollama reaches stable version 1.0: multimodal image support, native tool calling, embeddings API, full OpenAI compatibility, and official Windows general availability.

Local AI · Ollama · Multimodal · Tool Calling

May 15, 2025 Medium

ADAS: a meta-agent that invents new AI agent architectures

University of British Columbia publishes ADAS (Automated Design of Agentic Systems): a meta-agent that searches for new agent architectures by writing and evaluating Python code. Discovers novel patterns (dynamic critic, step-back abstraction) that outperform human-designed agents. First system automating agent architecture research.

Agents · ADAS · meta-agent · automated design

May 12, 2025 Medium

Anthropic Claude for Enterprise: admin console, shared Projects, SSO, and EU/US data residency

Anthropic introduces Claude for Enterprise: team management console, shared Projects with knowledge bases, SSO, EU/US data residency, and 99.9% uptime SLA.

Enterprise AI · Anthropic · Claude · Enterprise

May 10, 2025 Medium

Ollama native vision model support: local VLMs with a one-liner

Ollama adds first-class multimodal support: 'ollama run llama3.2-vision' launches local vision inference. Images are passed inline in API calls, bringing the Ollama one-line experience to VLMs (LLaVA, Moondream, Llama 3.2 Vision).

Local AI · Ollama · vision · multimodal
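"Images passed inline" means the image travels inside the JSON request body as base64 text. The sketch below only builds such a body; it assumes Ollama's `/api/generate` endpoint with an `images` list of base64 strings, and the placeholder bytes stand in for a real image file. Sending the request (by default to a local server on port 11434) is left out.

```python
import base64
import json

def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> str:
    """Build a JSON body that carries the image inline as base64 text."""
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete response instead of chunks
    }
    return json.dumps(payload)

# Placeholder bytes; in practice these would come from open(path, "rb").read().
image_bytes = b"\x89PNG placeholder bytes"
body = build_vision_request("llama3.2-vision", "Describe this image.", image_bytes)
parsed = json.loads(body)
```

Keeping the image in the same JSON document as the prompt is what lets a vision model be called with the same single request used for text-only models.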

May 7, 2025 Medium

Mistral Medium 3: the European champion's enterprise on-prem pivot

Mistral launches Medium 3, claimed 8× cheaper than Claude Sonnet at similar performance and deployable self-hosted on 4 GPUs. Positioned on the European 'sovereign enterprise' niche.

Foundation Models · Mistral · Medium 3 · Enterprise

May 1, 2025 High

HuggingFace LeRobot: the open-source library democratizing robot learning

HuggingFace launches LeRobot: open-source ML library for robotics with standardized datasets, ACT and Diffusion Policy training, and an Aloha-compatible hardware kit for $100.

Robotics · HuggingFace · LeRobot · Open Source

May 1, 2025 High

NVIDIA NIM 1.0: Containerized LLM Inference with OpenAI-Compatible API

NVIDIA NIM 1.0 packages TensorRT-LLM and Triton Inference Server into per-model Docker microservices with OpenAI-compatible API, health checks, and GPU auto-configuration, making LLM deployment as simple as running a container.

AI Infrastructure · NVIDIA NIM · containerized inference · TensorRT-LLM

April 30, 2025 Medium

Jules (Google Labs): async agent that resolves GitHub issues autonomously

Google Labs launches Jules: assign a GitHub issue, Jules clones the repo in an isolated VM, implements the fix, runs tests, and opens a PR. First async coding agent from a major player natively integrated into the GitHub workflow.

AI Coding · Jules · Google · async agent

April 29, 2025 High

Qwen 3: Alibaba ships an open-weight family from 0.6B to 235B with native thinking

Alibaba ships Qwen 3: 8 models from 0.6B to 235B params (2 MoE + 6 dense), all with switchable thinking mode. Apache 2.0 license. Repositions Qwen as the best open-weight family.

Open Source Models · Alibaba · Qwen · Open Source

April 22, 2025 High

Google A2A Protocol: open standard for communication between heterogeneous AI agents

Google announces A2A (Agent-to-Agent) Protocol with 50+ partners, an open standard for communication between AI agents from different vendors, complementary to MCP for interoperability in the agent ecosystem.

Agents · A2A · Agent Protocol · Interoperability

April 18, 2025 High

Kimi VL Thinking (Moonshot AI): first open visual model with RL-trained chain-of-thought reasoning

Moonshot AI releases Kimi VL Thinking: a visual model combining vision encoding with long chain-of-thought reasoning via reinforcement learning. Solves multi-step geometry, scientific chart analysis, and figure interpretation. The first open visual reasoning model matching GPT-4o on multi-step visual tasks.

Multimodal AI · Kimi VL · visual reasoning · chain-of-thought

April 16, 2025 High

Google ADK + A2A: open-source framework and protocol for agents that talk to each other

Google launches ADK (Agent Development Kit), an open-source SDK for building Gemini agents, and the A2A protocol for standardized communication between agents from different vendors.

Agents · Google · ADK · A2A Protocol

April 16, 2025 High

OpenAI o3 and o4-mini: reasoning models learn to use tools

OpenAI ships o3 (full) and o4-mini as reasoning models with native access to all ChatGPT tools: web search, Python, image gen, vision. First real 'agentic reasoning'.

Foundation Models · OpenAI · o3 · o4-mini

April 16, 2025 Medium

Codex CLI: OpenAI revives the Codex name with an open-source terminal coding agent

Alongside o3/o4-mini, OpenAI ships Codex CLI: an open-source terminal coding agent (Apache 2.0), direct response to Anthropic's Claude Code and Aider.

AI Coding · OpenAI · Codex CLI · Open Source

April 15, 2025 High

CrossFormer: a single transformer for 20+ robot embodiments with rigorous scaling analysis

Berkeley and Stanford present CrossFormer, a single transformer policy trained on 900k trajectories from over 20 different robots. It transfers to new robots in minutes with minimal fine-tuning. First cross-embodiment robot foundation model with rigorous scaling analysis.

Robotics · CrossFormer · cross-embodiment · foundation model

April 15, 2025 Medium

Gemini Code Assist Agent: Google brings AI coding inside Google Cloud

Google launches the Code Assist Agent integrated in VS Code and Cloud Shell: autonomously resolves bugs, generates migration scripts, and analyzes Cloud Run metrics from within the GCP ecosystem.

AI Coding · Google Cloud · VS Code · Code Agent

April 14, 2025 Medium

WebLLM and LLM in WASM: browser-based LLM inference via WebGPU, no server needed

WebLLM enables running LLMs like Llama 3 8B directly in the browser via WebGPU and WASM, compiling models with Apache TVM to achieve 15 tokens/s in Chrome with no backend server.

AI Infrastructure · WebLLM · WebAssembly · WebGPU

April 10, 2025 Medium

Model Cards 2.0: industry convergence on standardized AI safety reports

Google, Anthropic, and Meta converge on structured second-generation model cards that include training data, safety evaluation results, red-team findings, limitations, and intended use. A first step toward auditable AI.

AI Security · model cards · transparency · AI reporting

April 9, 2025 High

OpenAI Realtime API GA: production-ready voice-to-voice over WebRTC

OpenAI promotes the Realtime API to GA: low-latency voice-in/voice-out (~300ms), function calling, native WebRTC. Opens the production voice-app era with a single end-to-end API.

Voice & Audio · OpenAI · Realtime API · Voice

April 8, 2025 Medium

Continuous Batching for LLM Serving: survey and state of the art of Orca, vLLM, SGLang, TGI

Systematic review of continuous batching strategies for LLM serving: comparing Orca, vLLM, SGLang, and TGI on scheduling, GPU utilization, and TTFT/TPOT metrics. State of the art 2024-2025.

AI Infrastructure · Continuous Batching · LLM Serving · Orca
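A toy simulation shows why continuous batching helps. It counts decode steps only and assumes each request needs a known number of steps; the request lengths and batch size are invented, and real schedulers (Orca, vLLM, SGLang, TGI) also handle prefill, memory limits, and preemption, which this sketch ignores.

```python
from collections import deque
from typing import List

def static_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Each batch runs until its longest request finishes."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        total += max(lengths[start:start + batch_size])
    return total

def continuous_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Free slots are refilled from the queue at every decode step."""
    queue = deque(lengths)
    running: List[int] = []   # remaining decode steps per active request
    steps = 0
    while queue or running:
        while queue and len(running) < batch_size:
            running.append(queue.popleft())
        steps += 1
        # Every active request emits one token; finished ones leave the batch.
        running = [left - 1 for left in running if left > 1]
    return steps

# Two long requests mixed with short ones, batch size 4.
lengths = [8, 1, 1, 1, 8, 1, 1, 1]
static_steps = static_batching_steps(lengths, batch_size=4)
continuous_steps = continuous_batching_steps(lengths, batch_size=4)
```

In the static case the short requests leave three of four slots idle while each long request finishes. The continuous scheduler starts the second long request as soon as a slot frees up, so the same work takes 9 steps instead of 16.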

April 5, 2025 High

Llama 4: Meta moves to MoE and native multimodal, but the community is unimpressed

Meta releases Llama 4 Scout (17B active/109B total) and Maverick (17B/400B), multimodal MoEs with 10M context for Scout. Behemoth (2T) in training. Benchmark claims contested by the community.

Open Source Models · Meta · Llama 4 · MoE

April 1, 2025 High

Gemma 3: the first multimodal version with vision and 128k context

Google releases Gemma 3 with native vision support: SigLIP encoder, 128k token context, multiple video frames, and open weights for the 27B variant under Google's Gemma terms of use.

Multimodal AI · Gemma · Vision · Open Source

March 31, 2025 Medium

Aider Polyglot: the multi-language coding benchmark becomes a standard

The Aider Polyglot benchmark (225 Exercism exercises across C++, Go, Java, JS, Python, Rust) emerges as the de-facto metric for edit-aware coding models, complementing SWE-bench.

AI Coding · Aider · Benchmark · Polyglot

March 28, 2025 Medium

KoboldCpp v1.84: native RAG with embedded ChromaDB, no separate servers

KoboldCpp v1.84 brings native RAG with embedded ChromaDB: indexes local documents and automatically injects context into the prompt, no separate server configuration needed.

Local AI · KoboldCpp · RAG · ChromaDB
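The index-then-inject pattern can be sketched without any vector database. The example below uses bag-of-words cosine similarity in place of real embeddings and ChromaDB; the documents, the `build_prompt` template, and all function names are illustrative assumptions, not KoboldCpp's implementation.

```python
import re
from collections import Counter
from math import sqrt
from typing import List

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, documents: List[str], k: int = 1) -> str:
    """Inject the retrieved context ahead of the user's question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

documents = [
    "The backup job runs every night at 02:00.",
    "Invoices are stored in the finance share.",
    "The VPN requires a hardware token.",
]
query = "When does the backup job run?"
top = retrieve(query, documents)
prompt = build_prompt(query, documents)
```

A production setup swaps `embed` for a neural embedding model and the sorted list for a vector index, but the flow of retrieve, then prepend to the prompt, stays the same.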

March 25, 2025 High

Gemini 2.5 Pro: Google ships native reasoning in its frontier multimodal model

Google DeepMind ships Gemini 2.5 Pro, first model in the 2.5 family with built-in 'thinking'. 1M context window, reasoning capabilities competitive with o1/o3.

Foundation Models · Google · Gemini 2.5 · Reasoning

March 24, 2025 Medium

DeepSeek-V3-0324: the quiet update that puts vendor lock-in on notice

DeepSeek releases a DeepSeek-V3 update (685B param MoE, 37B active) under MIT license. Performance close to Claude 3.7 Sonnet on coding, training cost estimated 20x lower.

Open Source Models · DeepSeek · Open Source · MoE

March 20, 2025 High

DeepMind: 60+ cases of Specification Gaming in LLMs documented

DeepMind publishes research on Specification Gaming in LLMs: 60+ documented cases where the model satisfies the letter but not the spirit of instructions, with implications for security and alignment.

AI Security · DeepMind · Specification Gaming · Reward Hacking

March 20, 2025 Medium

Open WebUI Pipelines: enterprise plugin architecture for the local LLM frontend

Open WebUI introduces Pipelines: a pluggable middleware layer that intercepts requests and responses without modifying the core, adding rate limiting, safety filters, logging, and custom tools. The first mature plugin architecture for a local LLM frontend.

Local AI · Open WebUI · Pipelines · middleware

March 18, 2025 Medium

Hailuo Video (MiniMax): 6-second 1080p with natural camera shake, competitive with Veo 2

MiniMax launches Hailuo Video with 6-second 1080p generation featuring realistic motion photography and natural camera shake, results comparable to Veo 2 in public tests.

Image & Video Gen · Hailuo Video · MiniMax · Video Generation

March 18, 2025 High

NVIDIA Isaac GR00T N1.5: robotic foundation model with synthetic data pipeline

NVIDIA updates GR00T to N1.5 with an industrial synthetic data pipeline, unified training for 10+ robot platforms, and availability on Isaac Lab as an open framework.

Robotics · NVIDIA · Isaac GR00T · Foundation Model

March 15, 2025 Medium

Multi-Agent Debate: making multiple LLMs argue improves reasoning by +20%

MIT and Google researchers show that having multiple LLM instances debate and critique each other's answers over N rounds leads to more accurate results: +20% on arithmetic and reasoning benchmarks vs single agent. Establishes the debate-based verification pattern in modern agents.

Agents · multi-agent debate · reasoning · self-consistency
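The debate pattern reduces to a short loop: every agent answers, then each one sees its peers' answers and may revise, and a majority vote picks the final result. The sketch below uses deterministic stub agents in place of LLM calls; `stubborn`, `follower`, and the question are hypothetical, and the researchers' exact prompting and aggregation may differ.

```python
from collections import Counter
from typing import Callable, List

# An agent is any callable: (question, peer_answers) -> answer.
Agent = Callable[[str, List[str]], str]

def debate(question: str, agents: List[Agent], rounds: int = 2) -> str:
    """Run `rounds` of mutual revision, then return the majority answer."""
    # Round 0: every agent answers independently.
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent sees the others' latest answers and may revise its own.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    return Counter(answers).most_common(1)[0][0]

def stubborn(answer: str) -> Agent:
    """Stub agent that never changes its answer."""
    return lambda question, peers: answer

def follower(initial: str) -> Agent:
    """Stub agent that adopts the most common peer answer once it sees peers."""
    def agent(question: str, peers: List[str]) -> str:
        return Counter(peers).most_common(1)[0][0] if peers else initial
    return agent

agents = [stubborn("12"), stubborn("12"), follower("13")]
result = debate("What is 7 + 5?", agents, rounds=2)
```

With real models, each agent call would include the peers' reasoning in its prompt, and the number of rounds trades extra inference cost against accuracy.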

March 14, 2025 High

GitHub Copilot Agent Mode GA: the first coding agent fully integrated into the IDE

GitHub Copilot Agent Mode reaches GA: it edits multiple files, runs terminal commands, installs dependencies, and verifies test output — all within VS Code, without leaving the IDE.

AI Coding · GitHub · Copilot · Agent Mode

March 14, 2025 Medium

Wan 2.1 Video Editing: inpainting, object removal, and temporally coherent style transfer

Alibaba extends WanVideo 2.1 with structured video editing capabilities: video inpainting, object removal, and style transfer with temporal coherence between consecutive frames.

Image & Video Gen · Alibaba · WanVideo · Video Editing

March 12, 2025 High

Mapping the Mind of LLMs: Anthropic identifies interpretable features in Claude 3 Sonnet

Anthropic publishes the most detailed research to date on the mechanistic interpretability of a commercial LLM: features for 'Trump', 'slavery', 'Python code' have identifiable representations in Claude 3 Sonnet's weights.

AI Security · Interpretability · Anthropic · Claude 3 Sonnet

March 12, 2025 High

Physical Intelligence π0.5: first policy that generalizes to new homes

Physical Intelligence publishes π0.5, an evolution of the π0 VLA. New: zero-shot deployment in homes never seen during training (cleaning unknown kitchens, putting groceries away).

Robotics · Physical Intelligence · Pi · VLA

March 6, 2025 High

Manus: the Chinese 'general-purpose' agent that runs tasks end-to-end

Butterfly Effect launches Manus, a Chinese AI agent that runs autonomous tasks (stock analysis, research, CV screening) and ships reports with files. Devin-2024-level hype, invite-only access.

Agents · Manus · China · General Agent

March 5, 2025 Medium

F5-TTS: real-time voice cloning without fine-tuning using flow matching and DiTTo architecture

F5-TTS uses flow matching with simplified DiTTo architecture for zero-shot real-time voice cloning without fine-tuning, Apache 2.0, competitive latency on consumer GPU.

Voice & Audio · F5-TTS · Flow Matching · Voice Cloning

March 5, 2025 Medium

Trae IDE: ByteDance launches the first fully AI-native IDE, for free

ByteDance launches Trae, a full IDE (not a plugin) built from scratch with AI at the center: Agent mode rewrites entire files, Builder mode generates multi-file projects from specs. Free at launch, direct Cursor competitor.

AI Coding · Trae · AI IDE · ByteDance

March 4, 2025 High

Google Agentspace: enterprise platform for AI agents connected to Workspace and business data

Google launches Agentspace: enterprise AI agents integrating Workspace, Drive, Gmail, Calendar with business data from Salesforce, SAP, and ServiceNow.

Enterprise AI · Google · Agentspace · Enterprise Agents

March 1, 2025 Medium

torchao: PyTorch-Native Quantization and Sparsity Without Custom CUDA

Meta releases torchao as a PyTorch-native library for INT4/FP8/INT8 quantization and sparsity, with 2x speedup on Llama-3 8B at INT4 without requiring custom CUDA kernels, emerging as the standard quantization layer for the PyTorch ecosystem.

AI Infrastructure | torchao, quantization, INT4

February 27, 2025 Medium

GPT-4.5 'Orion': OpenAI's last pure pre-training model

OpenAI releases GPT-4.5 (codename Orion) as a 'research preview'. The largest model the company ever trained with traditional scaling, but expensive — marking the end of the pure pre-training era.

Foundation Models | OpenAI, GPT-4.5, Orion

February 25, 2025 High

Qwen2.5-VL: document understanding SOTA that beats GPT-4o on DocVQA

Alibaba releases Qwen2.5-VL in 72B and 7B versions, with advanced PDF, table, and chart analysis, surpassing GPT-4o on DocVQA and setting new SOTA in document comprehension.

Multimodal AI | VLM, Document Understanding, PDF

February 24, 2025 Landmark ★ On my workflow

Claude Code: the coding agent lands in the terminal

Anthropic ships Claude Code alongside Claude 3.7 Sonnet: a CLI that reads the codebase, edits files, runs commands, runs tests, makes commits — the 'agent in terminal' pattern goes mainstream.

AI Coding | Anthropic, Claude Code, Agentic Coding

February 20, 2025 High

Figure Helix: first generalist VLA driving a full-body humanoid

Figure announces Helix, a proprietary Vision-Language-Action model that controls the Figure 02 humanoid at 200 Hz, fingers included, and lets two robots collaborate on the same task. Demos: fold laundry and tidy a kitchen from language alone.

Robotics | Figure, Helix, VLA

February 18, 2025 High

GitHub Copilot Coding Agent: Microsoft brings the agent directly into the GitHub workflow

GitHub Copilot enters agent mode: reads repo context, writes code, runs CI tests, and opens a complete PR autonomously, natively integrated in GitHub.

AI Coding | GitHub Copilot, Coding Agent, CI/CD

February 18, 2025 High

Gemini 2.0 Flash Thinking: multimodal reasoning with visual chain-of-thought

Google DeepMind brings transparent reasoning to multimodal: Gemini 2.0 Flash Thinking shows intermediate analysis steps on complex images with visual chain-of-thought.

Multimodal AI | Gemini 2.0, Multimodal Reasoning, Chain-of-Thought

February 17, 2025 Medium

Grok 3: xAI shows what 200,000 H100s and 18 months get you

xAI launches Grok 3, trained on the Colossus 200K H100 cluster in Memphis. Includes a 'Think' reasoning mode and 'DeepSearch' agentic web research. Available to X Premium subscribers.

Foundation Models | xAI, Grok, Elon Musk

February 14, 2025 High

ALOHA 2: the open bimanual platform for advanced imitation learning

Stanford and Berkeley release ALOHA 2, the commercial version of the teleoperated bimanual system used to collect ACT and Diffusion Policy datasets for tasks like cooking and surgery.

Robotics | Stanford, Berkeley, ALOHA 2

February 12, 2025 High

Cartesia Sonic: 50ms TTS for voice agents in production

Cartesia launches Sonic, a TTS with ultra-low 50ms latency, token-by-token streaming, voice cloning without fine-tuning, designed specifically for AI voice agents in production environments.

Voice & Audio | Cartesia, Sonic, TTS

February 10, 2025 High

Dia 1.6B: open-source dialogue TTS with laughter, breathing and human naturalness

Dia by Nari Labs is the first open-source TTS to generate natural multi-speaker dialogue with non-verbal cues like laughter, breathing pauses and emotional emphasis, matching ElevenLabs on dialogue quality. Released under Apache 2.0.

Voice & Audio | Dia TTS, dialogue, laughter

February 10, 2025 High

OpenAI Deep Research: the agent that conducts deep research for tens of minutes

OpenAI launches Deep Research, an autonomous o3-based agent that browses the web for 10-30 minutes, performs hundreds of searches, and produces reports with verified citations.

Agents | OpenAI, Deep Research, o3

February 7, 2025 High

Google Agent Development Kit: open source SDK for hierarchical Gemini agents

Google launches ADK, an open source SDK for building hierarchical agents on Gemini with structured tool calling, built-in state machines, and native multi-agent orchestration.

Agents | Google ADK, Multi-Agent, Gemini

February 5, 2025 High

Gemini 2.0 Flash GA: Google ships its fast multimodal model to production

Google makes Gemini 2.0 Flash generally available, introduces the cheaper Flash-Lite, and previews Gemini 2.0 Pro Experimental with a 2M-token context window.

Foundation Models | Google, Gemini 2.0, Flash

February 5, 2025 Medium

Jan 1.0 GA: the first offline-first desktop AI with an extension store

Jan.ai reaches GA with version 1.0: integrated model manager, local API server, native MCP support, and an extensions system — the first desktop AI app with a plugin ecosystem. An offline alternative to ChatGPT for privacy-first users.

Local AI | Jan, Jan.ai, offline AI

February 4, 2025 Medium

FLUX1.1 Pro Ultra: 4MP generation in 10s, photoreal Raw mode

Black Forest Labs ships FLUX1.1 [pro] Ultra: native 4 megapixels (2K+), 10s latency, and a 'Raw' mode that produces less 'AI-looking' results closer to real photography.

Image & Video Gen | Black Forest Labs, FLUX, Image Generation

February 1, 2025 High

s1: 1000 examples and a prompt trick to replicate a reasoning model

Stanford/UW paper: with 1000 curated examples and a technique called 'budget forcing' they fine-tune Qwen2.5-32B to compete with o1-preview on math. Training cost: <$50.

Foundation Models | Stanford, s1, Reasoning
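
The 'budget forcing' idea named in the s1 entry can be sketched as a decoding-time control loop: if the model tries to stop thinking too early, the stop is suppressed and "Wait" is appended; if it thinks too long, the end-of-thinking delimiter is forced. This is a toy sketch, not the paper's code. `END`, `budget_force`, and `toy_model` are hypothetical names, and the "model" is a stub that returns one token per call.

```python
END = "</think>"  # hypothetical end-of-thinking delimiter

def budget_force(step_fn, prompt, min_tokens, max_tokens):
    """Run step_fn token by token while enforcing a thinking budget.

    step_fn(tokens) -> next token (a stand-in for a real LLM call).
    """
    trace = []
    while len(trace) < max_tokens:
        tok = step_fn(prompt + trace)
        if tok == END:
            if len(trace) >= min_tokens:
                break  # budget satisfied, allow the stop
            tok = "Wait"  # suppress the stop and nudge the model to continue
        trace.append(tok)
    # Always close the thinking section, even when the cap was hit.
    return trace + [END]

def toy_model(tokens):
    """Stub model: tries to stop whenever the context length is a multiple of 3."""
    return END if len(tokens) % 3 == 0 else "step"

extended = budget_force(toy_model, ["Q"], min_tokens=4, max_tokens=10)
capped = budget_force(toy_model, ["Q"], min_tokens=0, max_tokens=1)
print(extended)
print(capped)
```

In the first call the stub tries to stop after two tokens, so the loop inserts "Wait" and lets it continue; in the second the cap cuts thinking off after one token.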

January 30, 2025 Medium

Midjourney v7: personalization tokens and elevated photorealism

Midjourney launches v7 with new personalization tokens, draft mode for rapid iteration, and improved style consistency across different prompts. Photorealism at the highest level for the service.

Image & Video Gen | Midjourney, Photorealism, Personalization

January 30, 2025 High

Oracle AI Agents in Fusion Cloud: autonomous ERP and HCM agents with no coding

Oracle integrates native AI agents into Fusion Cloud ERP and HCM: they complete multi-step workflows (orders, invoices, onboarding) autonomously, with no coding required.

Enterprise AI | Oracle, AI Agents, Fusion Cloud

January 28, 2025 Medium

ElevenLabs Voice Design: generate a unique voice from text description in seconds

ElevenLabs launches Voice Design: describe a voice in natural language and get a unique synthesized voice in seconds, no source audio or cloning needed.

Voice & Audio | ElevenLabs, Voice Design, Text-to-Voice

January 25, 2025 High

AI supply chain attacks: poisoned models, malicious LoRA adapters, and backdoored GGUF files

Academic and industry research produces the first systematic taxonomy of AI supply chain attacks: poisoned HuggingFace models, backdoored LoRA adapters, GGUF files with hidden payloads. HuggingFace launches mandatory malware scanning.

AI Security | supply chain, AI security, poisoned models

January 25, 2025 High

LM Studio + MCP: local models connected to the world without cloud APIs

LM Studio becomes an MCP client: local models access the filesystem, databases, and web search via MCP servers, without sending data to external cloud services.

Local AI | LM Studio, MCP, Model Context Protocol

January 24, 2025 Medium

UFO: the first robust agent for automating Windows desktop applications

Microsoft Research publishes UFO (UI-Focused Agent), an agent that observes the Windows screen (active app + screenshot + control tree), plans actions and executes them via Windows UI Automation and Win32 API. First Windows-native system with reliable multi-application workflow support.

Agents | UFO, Windows agent, UI Automation

January 23, 2025 High

OpenAI Operator: browser-based agents go to production

OpenAI launches Operator (research preview): an AI agent that performs browser tasks on behalf of the user. Visits sites, fills forms, books services. Available to US ChatGPT Pro subscribers.

Agents | OpenAI, Operator, CUA

January 22, 2025 High

WanVideo 2.1: 14B-parameter open-source video generation competitive with Sora

Alibaba releases WanVideo 2.1, a 14B open-source model for T2V and I2V with quality competitive with Sora and drastically lower operating cost, available on HuggingFace.

Image & Video Gen | Alibaba, WanVideo, Open Source

January 22, 2025 Medium

FlashInfer 0.2: attention library for LLM serving with paged KV cache and RoPE fusion

UW + MIT release FlashInfer 0.2: a CUDA library for attention in LLM serving with native paged KV cache, variable-length sequences, RoPE fusion, and a 1.5x speedup vs vLLM on long-prefill workloads on an A100.

AI Infrastructure | FlashInfer, Attention, KV Cache

January 22, 2025 High

Microsoft 365 Copilot Autonomous Agents: Sales, IT, and HR work without constant oversight

Microsoft launches autonomous agents in M365: Sales Agent, IT Support Agent, and HR Agent operate across SharePoint, Dynamics, and Teams without continuous human supervision.

Enterprise AI | Microsoft 365, Copilot, Autonomous Agents

January 21, 2025 High

Stargate Project: the $500B AI infrastructure plan announced at the White House

OpenAI, Oracle, SoftBank and MGX announce a $500B four-year investment plan to build AI infrastructure in the US. First site in Abilene, Texas.

AI Infrastructure | Stargate, OpenAI, Oracle

January 20, 2025 Landmark

DeepSeek-R1: open reasoning matches o1 at 1/30 the cost

Chinese startup DeepSeek releases R1, a reasoning model with MIT-licensed open weights. Performance on par with OpenAI o1, API pricing $0.55/$2.19 per 1M input/output tokens (vs o1 $15/$60). AI stocks on the Nasdaq lose $1T in two days.

Open Source Models | DeepSeek, R1, Open Weights
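
The "1/30 the cost" headline can be checked against the list prices quoted in the R1 entry. A minimal sketch, assuming each price pair is (input, output) per 1M tokens; the variable names are illustrative.

```python
# Prices per 1M tokens as quoted in the R1 entry.
# Assumption: each pair is (input, output).
r1_input, r1_output = 0.55, 2.19
o1_input, o1_output = 15.00, 60.00

input_ratio = o1_input / r1_input
output_ratio = o1_output / r1_output

print(f"input:  o1 costs {input_ratio:.1f}x R1")
print(f"output: o1 costs {output_ratio:.1f}x R1")
```

Both ratios come out slightly above 27, so "1/30" is a rounded figure rather than an exact one.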

January 20, 2025 High

Hunyuan Video open source: Tencent releases the most capable self-hosted video model

Tencent releases full weights of Hunyuan Video 13B: text-to-video model at 720p, 5-second clips, competitive with Sora and Kling. The most capable open-source video model at release. Enables high-quality self-hosted video generation for the first time.

Image & Video Gen | Hunyuan Video, Tencent, open source

January 20, 2025 Medium

SmolVLM2 (HuggingFace): 2.2B VLM for video and image understanding on consumer hardware

HuggingFace releases SmolVLM2, a 2.2B-parameter vision-language model that outperforms models 3x its size on video and image benchmarks. Runs with 8GB of RAM. The first tiny VLM with video frame understanding, bringing multimodal AI to laptops and mobile devices.

Multimodal AI | SmolVLM2, HuggingFace, tiny VLM

January 17, 2025 High

Qwen2.5-Coder-32B: the open source model that beats GPT-4o on code

Alibaba releases Qwen2.5-Coder-32B-Instruct: 92.7% on HumanEval, first open-weight model to surpass GPT-4o on code generation, 128k context, tops LiveCodeBench. Makes enterprise-grade coding AI self-hostable.

AI Coding | Qwen2.5-Coder, open source, code generation

January 16, 2025 Medium

MatterGen: Microsoft's diffusion model that designs materials on demand

Microsoft Research publishes MatterGen in Nature: a diffusion model generating stable crystal structures conditioned on target properties (magnetism, conductivity). Experimental synthesis of a new material confirmed.

Foundation Models | Microsoft Research, MatterGen, Materials Science

January 15, 2025 High

Browser Use: the open-source layer that makes LLMs truly control the browser

Browser Use is an open-source Python library enabling GPT-4, Claude and Gemini to reliably control a Chromium browser via Playwright. 30k GitHub stars in the first month. First truly usable browser control layer without custom extensions. Enables reliable web agent tasks on any website.

Agents | Browser Use, browser automation, Playwright

January 15, 2025 High

CAIS Dangerous Capabilities Evaluations: the standard framework for measuring dangerous LLM capabilities

The Center for AI Safety publishes a structured framework for evaluating dangerous LLM capabilities in CBRN, cyberoffense, and autonomy; adopted by UK AISI and integrated into Anthropic's Responsible Scaling Policy.

AI Security | CAIS, Dangerous Capabilities, Evaluation Framework

January 15, 2025 Medium

Kokoro TTS v0.19: professional TTS quality with just 82 million parameters

Kokoro TTS achieves quality comparable to systems 10x its size with only 82M parameters, sub-1-second inference on CPU, Apache 2.0, ideal for edge devices.

Voice & Audio | Kokoro TTS, Edge TTS, Open Source

January 15, 2025 Medium

Hugging Face smolagents: agents that write code instead of JSON

Hugging Face releases smolagents, a ~1000-line minimal library for LLM agents. Pushes the 'code agents' paradigm: the agent writes Python snippets instead of JSON tool calls.

Agents | Hugging Face, Smolagents, Code Agents
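
The difference between JSON tool calls and code actions can be illustrated without the library itself. The sketch below is a toy, not the smolagents API: the tools, `run_json_action`, and `run_code_action` are all hypothetical, and the "model output" is hard-coded.

```python
import json

# Hypothetical tools for illustration only.
def get_price(item: str) -> float:
    return {"apple": 0.5, "pear": 0.75}[item]

def add(a: float, b: float) -> float:
    return a + b

TOOLS = {"get_price": get_price, "add": add}

def run_json_action(action_json: str):
    """JSON style: each model turn names exactly one tool and its arguments."""
    action = json.loads(action_json)
    return TOOLS[action["tool"]](**action["args"])

def run_code_action(snippet: str):
    """Code style: the model emits Python that can chain several tools at once."""
    # Only the tools are exposed. An empty __builtins__ is NOT a real sandbox;
    # a production agent needs proper isolation.
    namespace = {"__builtins__": {}, **TOOLS}
    exec(snippet, namespace)
    return namespace["result"]

# JSON agent: three separate turns to price two items and sum them.
p1 = run_json_action('{"tool": "get_price", "args": {"item": "apple"}}')
p2 = run_json_action('{"tool": "get_price", "args": {"item": "pear"}}')
json_total = run_json_action(json.dumps({"tool": "add", "args": {"a": p1, "b": p2}}))

# Code agent: the same work expressed as one snippet in one turn.
code_total = run_code_action('result = add(get_price("apple"), get_price("pear"))')

print(json_total, code_total)
```

Both paths reach the same total; the code action gets there in a single step because composition happens inside the snippet rather than across turns.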

January 14, 2025 High

Kimi k1.5: the Chinese competitor to OpenAI o1 with 128k context and long-thinking

Moonshot AI releases Kimi k1.5, a reasoning model with 128k context and RL-trained long chain-of-thought that matches OpenAI o1 on AIME and MATH-500, with a user-controllable 'long-thinking' mode.

Foundation Models | Kimi k1.5, Moonshot AI, chain-of-thought

January 12, 2025 High

HumanPlus: whole-body humanoid robot control from egocentric human video

Stanford presents HumanPlus, which maps third-person human demonstrations to whole-body robot actions with 40% success on novel tasks. No teleoperation, no robot-specific data collection — just watching humans.

Robotics | HumanPlus, whole-body, imitation

January 10, 2025 High

DeepSeek-V3: GPT-4o Quality at $0.55/M Tokens via MLA and FP8 Pipeline

The DeepSeek-V3 technical report reveals Multi-head Latent Attention and a complete FP8 pipeline that achieve GPT-4o-level performance at $0.55/M tokens, training a 671B-parameter MoE on an H800 cluster under tight budget constraints.

AI Infrastructure | DeepSeek V3, MLA, FP8

January 10, 2025 Landmark

Gemini 2.0 Flash: natively multimodal with audio and image output

Google DeepMind releases Gemini 2.0 Flash Experimental: text+image+audio+video input, text+image+audio output, ~50ms per token latency with built-in agentic tool use.

Multimodal AI | Gemini, Multimodal Native, Audio

January 8, 2025 High

Prefill/decode disaggregation: separate GPUs for low TTFT and high throughput

Prefill/decode disaggregation places prompt processing (prefill) and token generation (decode) on separate, dedicated GPUs, reducing TTFT while maintaining high throughput. The technique has been adopted by major cloud providers.

AI Infrastructure | Prefill, Decode, Disaggregation
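
The split described in the prefill/decode entry can be sketched as two worker roles joined by a KV-cache handoff. This is a toy scheduling sketch with no real GPUs or model: `Request`, `PrefillWorker`, `DecodeWorker`, and `serve` are hypothetical names, and tokens are placeholder strings.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: int
    prompt: list[str]
    max_new_tokens: int
    kv_cache: list[str] = field(default_factory=list)
    output: list[str] = field(default_factory=list)

class PrefillWorker:
    """Stands in for a GPU pool that only processes prompts."""
    def run(self, req: Request) -> Request:
        req.kv_cache = list(req.prompt)  # stand-in for the attention KV cache
        req.output.append("tok0")        # first token comes out of prefill (TTFT)
        return req

class DecodeWorker:
    """Stands in for a GPU pool that only generates tokens."""
    def step(self, req: Request) -> bool:
        tok = f"tok{len(req.output)}"
        req.output.append(tok)
        req.kv_cache.append(tok)
        return len(req.output) >= req.max_new_tokens  # True when finished

def serve(requests: list[Request]) -> list[Request]:
    prefill_q, decode_q, done = deque(requests), deque(), []
    prefill, decode = PrefillWorker(), DecodeWorker()
    while prefill_q or decode_q:
        if prefill_q:
            req = prefill.run(prefill_q.popleft())
            # Handoff: the KV cache moves from the prefill pool to the decode pool.
            (done if len(req.output) >= req.max_new_tokens else decode_q).append(req)
        for _ in range(len(decode_q)):   # one decode step per in-flight request
            req = decode_q.popleft()
            (done if decode.step(req) else decode_q).append(req)
    return done

finished = serve([Request(0, ["a", "b", "c"], 3), Request(1, ["x"], 2)])
for r in finished:
    print(r.rid, r.output)
```

Because prefill never waits behind a long decode batch, a new request gets its first token as soon as a prefill worker is free, which is the TTFT benefit the entry refers to.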

January 7, 2025 High

Wan 2.1 (Alibaba): 14B parameters open source, best video model available in early 2025

Alibaba/Wanx releases Wan 2.1 on Hugging Face: 14 billion parameters, 720p video up to 81 frames, surpassing all previous open source video models in quality and length.

Image & Video Gen | Wan 2.1, Alibaba, Video Generation