Agents

58 entries

June 19, 2026 High

Anthropic releases Memory API GA for Claude: structured persistent storage for agents across sessions

Anthropic has made its Memory API generally available, providing structured persistent storage for Claude agents across sessions with project-scoped memory, user-scoped memory, and semantic search over stored facts.

Agents · Memory API · Persistent Storage · Agentic AI

May 12, 2026 High

MCP at 18 months: the server ecosystem hits critical mass

Eighteen months after its November 2024 launch, the Model Context Protocol consolidates: thousands of public servers, confirmed cross-vendor adoption, and a first stable official registry.

Agents · MCP · Model Context Protocol · Anthropic

April 21, 2026 High

Deep Research and Deep Research Max: Google's autonomous research agents with MCP

Google ships two research agents on the Gemini API: Deep Research (fast) and Deep Research Max (deep + slow, 93.3% on DeepSearchQA). MCP support for private data, native visualizations via Nano Banana 2.

Agents · Google · Gemini · Deep Research

March 26, 2026 High

OpenAI consolidates its agent platform: Operator and ChatGPT Agent merged

OpenAI reorganizes Operator (January 2025) and ChatGPT Agent (July 2025) into a unified platform, with refreshed SDK and new async multi-task execution modes.

Agents · OpenAI · Agents · ChatGPT

January 23, 2026 High

OpenAI Operator GA: the first commercial autonomous web agent

OpenAI launches Operator in GA across 30+ countries: an agent that browses the web, fills forms, books appointments, and shops online autonomously on behalf of the user.

Agents

January 12, 2026 High

Claude Cowork: Anthropic's desktop agent for non-technical knowledge workers

Anthropic ships Cowork as a research preview: a desktop agent with sandboxed shell and local file access, aimed at people who don't live in the terminal the way Claude Code users do.

Agents · Anthropic · Claude · Cowork

December 4, 2025 High

MCP ecosystem 2025: Inspector, UI, registry, and cross-vendor adoption

The Model Context Protocol, launched by Anthropic in November 2024, hits critical mass: GA MCP Inspector, MCP-UI for server-side UI, official registry, OpenAI/Google support. Becomes the 'USB-C of LLM tools'.

Agents · MCP · Model Context Protocol · MCP Inspector

October 20, 2025 High

OpenAI Launches Computer Use API — AI Takes Control of the Desktop

OpenAI's Computer Use API lets models navigate desktops via screenshot-and-action loops, handling browsers, Office apps, and file management. Positioned as a direct RPA competitor and available in the enterprise tier.

Agents

October 16, 2025 High

Claude Skills: packaged capabilities loaded on demand into context

Anthropic introduces Skills: bundles of instructions + scripts + resources that Claude loads automatically when a task needs them. De facto replaces most custom enterprise system prompts.

Agents · Anthropic · Claude Skills · Agent SDK

July 17, 2025 High

ChatGPT Agent: OpenAI merges Operator and Deep Research into a computer-using agent

OpenAI launches 'ChatGPT Agent': fusion of Operator (browser use), Deep Research (long research), and classic ChatGPT into a single agent with virtual browser + terminal + API tools.

Agents · OpenAI · ChatGPT · Agent

July 14, 2025 High

Gemini 2.5 Pro Deep Research GA: Multi-Hour Research Agents

Gemini 2.5 Pro with Deep Research goes GA: agents browse the web for hours, read PDFs, and synthesize reports. 2M context window. Enterprise pricing for competitive analysis.

Agents

May 15, 2025 Medium

ADAS: a meta-agent that invents new AI agent architectures

University of British Columbia publishes ADAS (Automated Design of Agentic Systems): a meta-agent that searches for new agent architectures by writing and evaluating Python code. Discovers novel patterns (dynamic critic, step-back abstraction) that outperform human-designed agents. First system automating agent architecture research.

Agents · ADAS · meta-agent · automated design

April 22, 2025 High

Google A2A Protocol: open standard for communication between heterogeneous AI agents

Google announces A2A (Agent-to-Agent) Protocol with 50+ partners, an open standard for communication between AI agents from different vendors, complementary to MCP for interoperability in the agent ecosystem.

Agents · A2A · Agent Protocol · Interoperability

April 16, 2025 High

Google ADK + A2A: open-source framework and protocol for agents that talk to each other

Google launches ADK (Agent Development Kit), an open-source SDK for building Gemini agents, and the A2A protocol for standardized communication between agents from different vendors.

Agents · Google · ADK · A2A Protocol

March 15, 2025 Medium

Multi-Agent Debate: making multiple LLMs argue improves reasoning by 20%

MIT and Google researchers show that having multiple LLM instances debate and critique each other's answers over N rounds leads to more accurate results: +20% on arithmetic and reasoning benchmarks vs single agent. Establishes the debate-based verification pattern in modern agents.

Agents · multi-agent debate · reasoning · self-consistency
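
The debate pattern described in this entry can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the researchers' implementation: `debate` and `make_stub_agent` are hypothetical names, and each "agent" is a deterministic stand-in for an LLM call that would normally receive its peers' answers in the prompt.

```python
from collections import Counter
from typing import Callable, List, Tuple

# An agent maps (question, peers' latest answers) to its own answer.
Agent = Callable[[str, List[str]], str]


def debate(question: str, agents: List[Agent], rounds: int = 2) -> Tuple[str, List[str]]:
    """Run N rounds of debate, then return the majority answer and all final answers."""
    # Round 0: every agent answers independently.
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent revises its answer after reading everyone else's.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    final = Counter(answers).most_common(1)[0][0]
    return final, answers


def make_stub_agent(first_guess: str) -> Agent:
    """Hypothetical stand-in for an LLM: keeps its guess unless most peers disagree."""
    def agent(question: str, peer_answers: List[str]) -> str:
        if not peer_answers:
            return first_guess
        top, count = Counter(peer_answers).most_common(1)[0]
        # Switch only when a strict majority of peers agree on one answer.
        return top if count > len(peer_answers) / 2 else first_guess
    return agent


agents = [make_stub_agent("12"), make_stub_agent("12"), make_stub_agent("15")]
final, answers = debate("What is 3 * 4?", agents, rounds=2)
print(final, answers)
```

With real models, the revision step is a new prompt containing the peers' reasoning, and the final aggregation can be a vote or a judge model; both choices are assumptions here.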

March 6, 2025 High

Manus: the Chinese 'general-purpose' agent that runs tasks end-to-end

Butterfly Effect launches Manus, an invite-only Chinese AI agent that runs autonomous tasks (stock analysis, research, CV screening) and ships reports with files. Devin-2024-level hype.

Agents · Manus · China · General Agent

February 10, 2025 High

OpenAI Deep Research: the agent that conducts deep research for tens of minutes

OpenAI launches Deep Research, an autonomous o3-based agent that browses the web for 10-30 minutes, performs hundreds of searches, and produces reports with verified citations.

Agents · OpenAI · Deep Research · o3

February 7, 2025 High

Google Agent Development Kit: open source SDK for hierarchical Gemini agents

Google launches ADK, an open source SDK for building hierarchical multi-level agents on Gemini with structured tool calling, native state machines, and native multi-agent orchestration.

Agents · Google ADK · Multi-Agent · Gemini

January 24, 2025 Medium

UFO: the first robust agent for automating Windows desktop applications

Microsoft Research publishes UFO (UI-Focused Agent), an agent that observes the Windows screen (active app + screenshot + control tree), plans actions and executes them via Windows UI Automation and Win32 API. First Windows-native system with reliable multi-application workflow support.

Agents · UFO · Windows agent · UI Automation

January 23, 2025 High

OpenAI Operator: browser-based agents go to production

OpenAI launches Operator (research preview): an AI agent that performs browser tasks on behalf of the user. Visits sites, fills forms, books services. Available to US ChatGPT Pro subscribers.

Agents · OpenAI · Operator · CUA

January 15, 2025 High

Browser Use: the open-source layer that makes LLMs truly control the browser

Browser Use is an open-source Python library enabling GPT-4, Claude and Gemini to reliably control a Chromium browser via Playwright. 30k GitHub stars in the first month. First truly usable browser control layer without custom extensions. Enables reliable web agent tasks on any website.

Agents · Browser Use · browser automation · Playwright

January 15, 2025 Medium

Hugging Face smolagents: agents that write code instead of JSON

Hugging Face releases smolagents, a ~1000-line minimal library for LLM agents. Pushes the 'code agents' paradigm: the agent writes Python snippets instead of JSON tool calls.

Agents · Hugging Face · Smolagents · Code Agents
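
To make the "code instead of JSON" contrast concrete, here is a small sketch. It does not use the smolagents API: `run_json_call`, `run_code_action`, and the toy tools are hypothetical, and the two input strings stand in for model output.

```python
import json
from typing import Any, Callable, Dict


def add(a: int, b: int) -> int:
    return a + b


def multiply(a: int, b: int) -> int:
    return a * b


# Toy tool registry shared by both styles (illustrative only).
TOOLS: Dict[str, Callable[..., Any]] = {"add": add, "multiply": multiply}


def run_json_call(message: str) -> Any:
    """JSON style: one tool call per model turn, dispatched by name."""
    call = json.loads(message)
    return TOOLS[call["tool"]](**call["args"])


def run_code_action(snippet: str) -> Any:
    """Code style: the model writes Python that can compose tools in one turn.

    Emptying __builtins__ is NOT a real sandbox; a production system needs
    a restricted interpreter or an isolated process.
    """
    namespace: Dict[str, Any] = dict(TOOLS)
    exec(snippet, {"__builtins__": {}}, namespace)
    return namespace["result"]


# JSON tool call: a single operation.
json_result = run_json_call('{"tool": "add", "args": {"a": 2, "b": 3}}')

# Code action: two tool calls composed in one snippet.
code_result = run_code_action("result = multiply(add(2, 3), 4)")

print(json_result, code_result)
```

The code style needs one model turn where the JSON style would need two, which is the main argument usually made for code agents.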

December 11, 2024 Landmark

Gemini 2.0 Flash: Google opens the 'agentic era' and shows Astra/Mariner/Jules

Google releases Gemini 2.0 Flash (native multimodal, tool use, image/audio output) and unveils Project Astra (real-time video assistant), Mariner (browser agent), Jules (coding agent).

Agents · Google · Gemini 2.0 · Flash

October 31, 2024 High

Magentic-One: Microsoft's generalist multi-agent system tops GAIA benchmark

Microsoft Research publishes Magentic-One: a system with an Orchestrator plus 4 specialized agents (WebSurfer, FileSurfer, Coder, ComputerTerminal). First place on GAIA benchmark. Key insight: stateless specialized agents plus a stateful orchestrator outperform a monolithic agent. Open source under the MIT license.

Agents · Magentic-One · multi-agent · Microsoft Research

October 22, 2024 High ★ On my workflow

Computer Use: Claude learns mouse and keyboard

Anthropic enables 'Computer Use' on Claude 3.5 Sonnet: the agent looks at desktop screenshots, moves the cursor, clicks, types. For the first time a commercial LLM operates directly on the GUI.

Agents · Anthropic · Claude · Computer Use

October 14, 2024 High

n8n AI Agent nodes: mainstream no-code automation meets agentic loops

n8n adds native AI Agent nodes to its workflow builder, allowing LLM agentic loops to connect to 400+ business apps without code, marking the arrival of agents in mainstream automation.

Agents · n8n · No-Code · Automation

October 11, 2024 Medium

OpenAI Swarm: educational framework for multi-agent orchestration with handoffs

OpenAI publishes Swarm on GitHub, a minimal Python framework for orchestrating multiple agents with handoffs and routines — explicitly positioned as an 'educational' precursor to a future Agents SDK.

Agents · OpenAI · Swarm · Agents

August 5, 2024 Medium

Flowise v2: visual agents with parallel tool use and configurable memory types

Flowise v2 introduces sequential and parallel tool use in agents, multiple memory types (buffer, summary, vector), visually configurable agent loops, and LlamaIndex support.

Agents · Flowise · Visual Builder · No-Code

July 15, 2024 Medium

Dify 0.7: visual agentic workflows with integrated RAG and 10+ LLMs

Dify 0.7 brings a no-code/low-code visual builder for complex agentic workflows, integrated RAG with document parsing, support for 10+ LLM providers, and self-hostable deployment on Docker.

Agents · Dify · No-Code · Workflow

July 10, 2024 Medium

Agentless: less agent complexity, more results on SWE-bench

UIUC publishes Agentless: a two-phase pipeline (localize fault, generate repair) without complex agent loops. Outperforms AutoCodeRover and SWE-agent on SWE-bench. Top open submission on SWE-bench at publication time. Challenges the assumption that more agent complexity equals better results.

Agents · Agentless · SWE-bench · code repair

June 25, 2024 Medium

Agno (formerly Phidata): lightweight, multimodal agent framework 10x faster

Agno, renamed from Phidata, is a model-agnostic Python agent framework with modular memory, storage, tools and knowledge base, native multimodal support, and a claimed 10x performance advantage over LangChain.

Agents · Agno · Phidata · Lightweight

April 2, 2024 High

SWE-agent: an AI agent that resolves 12.5% of real GitHub issues

Princeton presents SWE-agent, an agent with a dedicated agent-computer interface (ACI) that resolves 12.5% of the real GitHub issues in SWE-bench, 6x to 12x better than previous systems.

Agents · Princeton · SWE-agent · SWE-bench

March 12, 2024 High

Devin: the first 'autonomous AI engineer' goes viral

Cognition Labs unveils Devin, an AI agent that plans, codes, debugs and executes software tasks end-to-end. Viral demo, SWE-bench 13.86%. Defines the 'AI software engineer' category.

Agents · Cognition · Devin · Autonomous Agent

March 7, 2024 Medium

Microsoft TaskWeaver: every task becomes executable Python code

Microsoft's TaskWeaver is a code-first agent framework that converts every request into executable Python code in a sandbox, with persistent state between steps and a structured plugin system.

Agents · TaskWeaver · Microsoft · Code-First

February 21, 2024 Medium

Devika: the first open-source alternative to Devin explodes on GitHub

Mufeed VH publishes Devika, an open-source AI software engineer agent: accepts high-level programming objectives, decomposes them, searches the web, writes code and runs tests. First real open alternative to Devin. 15k GitHub stars in 72 hours.

Agents · Devika · open source · software engineer agent

January 17, 2024 High

CrewAI: AI agent teams with roles, goals and backstories like an office

CrewAI launches a Python framework for orchestrating teams of LLM agents with defined roles, individual objectives, and backstories, supporting both sequential and parallel processes.

Agents · CrewAI · Multi-Agent · Roles

October 19, 2023 High

LangGraph: stateful agents as cyclic graphs with loops and branching

LangChain launches LangGraph, a framework for building agents as node graphs with persistent state, support for cycles, conditional branching, and parallel execution of complex workflows.

Agents · LangGraph · LangChain · Stateful Agents

October 16, 2023 Medium

OpenAgents: real agents for non-programmers via web interface

XLab (SUTD Singapore) publishes OpenAgents: a deployable platform with three specialized agents (web browsing, data analysis, code execution) accessible from a browser without API keys. First demonstration of real agentic capabilities for non-technical users, with complete open-source code.

Agents · OpenAgents · web browsing · data analysis

October 6, 2023 Medium

AgentBench: the first benchmark that measures LLMs as real agents

Tsinghua presents AgentBench, the first comprehensive benchmark for LLM agents across 8 operational environments, revealing a massive gap between GPT-4 and open-source models.

Agents · Tsinghua · AgentBench · Benchmark

August 25, 2023 Medium

SuperAGI: the first open-source autonomous agent platform with a GUI

SuperAGI offers an open-source platform for autonomous agents with a web dashboard, tool marketplace, and the ability to run agents in the background without writing code. First solution to bring the 'monitor agent' experience to non-programmers. Concurrent with AutoGPT but more production-oriented.

Agents · SuperAGI · autonomous agent · open source

July 15, 2023 High

AutoGen: Microsoft formalizes agent-to-agent communication

Microsoft Research publishes AutoGen, a framework where you define agents with different roles and let them converse with each other to solve a task. First framework to formalize the 'agent-to-agent communication' pattern. Becomes the foundation of many enterprise multi-agent workflows.

Agents · AutoGen · multi-agent · Microsoft Research

July 9, 2023 High

Reflexion: agents that learn from mistakes without gradient updates

MIT and Northeastern propose Reflexion: agents that self-reflect in natural language after each failure, accumulating insights in episodic memory without modifying weights.

Agents · MIT · Northeastern · Reflexion
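
The Reflexion loop summarized in this entry (act, evaluate, reflect in words, retry with the reflection in memory) can be sketched as follows. This is an illustrative sketch, not the paper's code: the actor, evaluator, and reflector are hypothetical deterministic stand-ins for what would be LLM calls and a task-specific checker.

```python
from typing import Callable, List, Optional, Tuple


def reflexion(
    task: List[int],
    actor: Callable[[List[int], List[str]], List[int]],
    evaluator: Callable[[List[int]], Tuple[bool, str]],
    reflect: Callable[[List[int], str], str],
    max_trials: int = 3,
) -> Tuple[Optional[List[int]], List[str], int]:
    """Retry a task, storing a natural-language lesson after each failure."""
    memory: List[str] = []  # episodic memory; no model weights are changed
    for trial in range(1, max_trials + 1):
        attempt = actor(task, memory)
        ok, feedback = evaluator(attempt)
        if ok:
            return attempt, memory, trial
        memory.append(reflect(attempt, feedback))
    return None, memory, max_trials


def stub_actor(task: List[int], memory: List[str]) -> List[int]:
    # Stand-in for an LLM: sorts ascending until a reflection says otherwise.
    reverse = any("descending" in note for note in memory)
    return sorted(task, reverse=reverse)


def stub_evaluator(attempt: List[int]) -> Tuple[bool, str]:
    # Stand-in for unit tests or an environment reward signal.
    ok = attempt == sorted(attempt, reverse=True)
    return ok, "" if ok else "output is not in descending order"


def stub_reflect(attempt: List[int], feedback: str) -> str:
    # Stand-in for the self-reflection prompt.
    return f"Attempt {attempt} failed: {feedback}. Sort in descending order next time."


result, memory, trials = reflexion([3, 1, 2], stub_actor, stub_evaluator, stub_reflect)
print(result, trials, memory)
```

The key design point is that the only thing carried between trials is text in `memory`, which is what "without gradient updates" means in practice.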

July 8, 2023 High

MetaGPT: agents with company roles that write software together

MetaGPT assigns each LLM agent a specific company role (PM, Architect, Engineer, QA) and has them collaborate to produce working code from a single text requirement.

Agents · MetaGPT · Multi-Agent · Software Engineering

June 25, 2023 Medium

GPT-Engineer: generate an entire software project from a single sentence

Anton Osika publishes GPT-Engineer on GitHub: describe what you want in natural language, the agent asks clarifying questions, then writes all the files and runs them. 50k stars in one week. First viral implementation of the 'one-shot project generator' concept.

Agents · GPT-Engineer · code generation · project scaffolding

June 5, 2023 Medium

Gorilla: fine-tuned LLaMA that calls APIs without errors

UC Berkeley presents Gorilla, a retrieval-augmented fine-tuned LLaMA for accurate API calls: reduces API hallucination from 83% to 3%, outperforming GPT-4 on this task.

Agents · UC Berkeley · Gorilla · LLaMA

May 30, 2023 High

Tree of Thoughts: the LLM that reasons by exploring alternative branches

Princeton and DeepMind propose Tree of Thoughts: the LLM generates and evaluates multiple reasoning paths as a search tree, clearly outperforming Chain-of-Thought.

Agents · Princeton · DeepMind · Tree of Thoughts
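
A toy version of the search described in this entry: propose several next "thoughts" from each partial solution, score them, and keep only the best few at each depth (a breadth-first variant with a beam). The problem, the `propose` and `score` functions, and all names are hypothetical; in the real method both would be LLM calls.

```python
from typing import Callable, List

TARGET = "cab"  # toy goal, used only by the stub scorer below


def tree_of_thoughts(
    root: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    depth: int,
    beam_width: int,
) -> str:
    """Breadth-first search over partial solutions, pruned to a beam."""
    frontier = [root]
    for _ in range(depth):
        # Expand every kept state into its candidate continuations.
        candidates = [child for state in frontier for child in propose(state)]
        # Evaluate and keep the most promising ones (stable for ties).
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return frontier[0]


def propose(state: str) -> List[str]:
    # Stand-in for "generate k next thoughts": extend by one letter.
    return [state + letter for letter in "abc"]


def score(state: str) -> float:
    # Stand-in for the LLM value prompt: count positions matching the target.
    return float(sum(1 for got, want in zip(state, TARGET) if got == want))


best = tree_of_thoughts("", propose, score, depth=3, beam_width=2)
print(best)
```

Chain-of-Thought corresponds to `beam_width=1` with a single proposal per step; the tree variant pays for extra model calls in exchange for the ability to discard weak branches.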

May 17, 2023 High

Voyager: the AI agent that learns Minecraft forever, without reset

NVIDIA creates Voyager, a lifelong-learning agent in Minecraft that uses GPT-4 to write skills in JavaScript and accumulate them in a persistent library, never forgetting.

Agents · NVIDIA · Voyager · Lifelong Learning

April 7, 2023 High

Generative Agents: 25 AI agents simulate a society in Smallville

Stanford creates 25 LLM-based agents simulating daily life in a virtual village, with episodic memory, reflection, and planning — the first credible artificial society.

Agents · Stanford · Generative Agents · Smallville

April 3, 2023 High

BabyAGI: 200 lines of Python that spark the autonomous agent debate

Yohei Nakajima publishes BabyAGI, an autonomous task manager in ~200 lines of Python that uses GPT-4 and Pinecone to create and execute subtasks in an infinite loop. It goes viral on Twitter within 24 hours.

Agents · BabyAGI · Autonomous Agent · Task Management

March 30, 2023 High

AutoGPT: the first viral AI agent

A developer publishes AutoGPT on GitHub: given a text goal, the system calls GPT-4 in a loop to plan tasks, execute them, and self-criticize. Within two weeks it becomes one of the fastest-growing repositories in GitHub history.

Agents · AutoGPT · Agents · Open Source

March 23, 2023 Medium

ChatGPT Plugins: the LLM becomes an interface to the web

OpenAI ships plugins for ChatGPT: the model can browse the web, run Python in a sandbox, book flights (Expedia, Kayak), order groceries (Instacart). First big mainstream tool-use experiment.

Agents · OpenAI · ChatGPT · Plugins

March 22, 2023 Medium

HuggingGPT: ChatGPT as a brain orchestrating 800 AI models

Microsoft Research uses ChatGPT as a central planner that decomposes complex tasks and delegates execution to specialized HuggingFace models for vision, audio, and NLP.

Agents · Microsoft Research · HuggingGPT · JARVIS

March 17, 2023 Medium

Microsoft Semantic Kernel: the enterprise SDK for LLM orchestration

Microsoft open-sources Semantic Kernel, a C#/Python/Java SDK for integrating LLMs into enterprise apps. Introduces 'skills' (reusable AI functions) and 'planners' (auto-chaining toward a goal). Becomes Microsoft's standard AI orchestration layer for Copilot builds.

Agents · Semantic Kernel · Microsoft · SDK

March 10, 2023 Medium

CAMEL: two LLM agents that cooperate to solve complex tasks

KAUST presents CAMEL, a role-playing framework where an 'AI user' LLM and an 'AI assistant' LLM autonomously collaborate on tasks without human intervention at each step.

Agents · KAUST · CAMEL · Multi-Agent

February 9, 2023 High

Toolformer: the LLM that learns to use tools on its own

Meta AI presents Toolformer: an LLM that autonomously learns when and how to call external tools (calculator, Wikipedia, calendar) using self-supervised examples only.

Agents · Meta AI · Toolformer · Tool Use

October 25, 2022 Landmark

LangChain: the framework for LLM applications is born

Harrison Chase releases LangChain, an open-source Python library to chain LLMs with prompt templates, memory, tools and external data sources. It becomes the default stack for the first wave of LLM apps.

Agents · LangChain · Framework · LLM Apps

October 6, 2022 Landmark

ReAct: the framework that unites reasoning and acting in LLMs

Yao et al. introduce ReAct, a prompting scheme that interleaves explicit reasoning steps (Thought) with concrete actions (Act) and the observations they return. It becomes the conceptual foundation of most modern agents.

Agents · ReAct · Reasoning · Tool Use
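
The alternation this entry describes can be shown as a short loop. This is a sketch under stated assumptions rather than the paper's prompt format: `react_loop`, `scripted_policy`, and the `calculate` tool are hypothetical, and the policy is a scripted stand-in for an LLM that would normally read the transcript and emit the next thought and action.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A policy reads the transcript and returns (thought, action name, action argument).
Policy = Callable[[List[str]], Tuple[str, str, str]]


def react_loop(
    question: str,
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Alternate Thought -> Action -> Observation until the policy finishes."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, argument = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            transcript.append(f"Answer: {argument}")
            return argument, transcript
        # Execute the chosen tool and feed its result back as an observation.
        observation = tools[action](argument)
        transcript.append(f"Action: {action}[{argument}]")
        transcript.append(f"Observation: {observation}")
    return None, transcript  # step budget exhausted without an answer


def scripted_policy(transcript: List[str]) -> Tuple[str, str, str]:
    # Stand-in for an LLM call: act once, then answer with the observation.
    prefix = "Observation: "
    observations = [line for line in transcript if line.startswith(prefix)]
    if not observations:
        return "I need to compute the product first.", "calculate", "6*7"
    return "I have the result.", "finish", observations[-1][len(prefix):]


def calculate(expression: str) -> str:
    # Toy tool: multiplies two integers written as "a*b".
    left, right = expression.split("*")
    return str(int(left) * int(right))


answer, transcript = react_loop(
    "What is 6 times 7?", scripted_policy, {"calculate": calculate}
)
print(answer)
print("\n".join(transcript))
```

The `max_steps` cap is a design choice added for the sketch; any real loop of this kind needs some budget so a confused model cannot run indefinitely.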

December 16, 2021 High

WebGPT: OpenAI teaches GPT-3 to browse the web

OpenAI publishes WebGPT, a GPT-3 fine-tune that learns to use a text browser to search the web for answers with source citations, trained via imitation learning + RLHF.

Agents · OpenAI · WebGPT · Browsing