2023

158 entries

December 18, 2023 Medium

AnythingLLM: full local RAG with web UI and embedded vector DB

AnythingLLM delivers a full-stack RAG system with a web interface, Ollama/LocalAI LLM backend support, and an embedded vector database, all offline in a single container.

Local AI AnythingLLMRAG LocaleVector DB

December 15, 2023 Medium

StyleTTS2: open source TTS with style diffusion outperforms Voicebox on intelligibility

StyleTTS2 uses style diffusion and adversarial training to generate human-level natural voices on LJSpeech, open source, surpassing Voicebox on intelligibility.

Voice & Audio StyleTTS2TTSStyle Diffusion

December 12, 2023 Medium

Phi-2: Microsoft's 2.7B model that beats a 13B

Microsoft Research releases Phi-2, 2.7B params trained on 'textbook-quality' data. Beats LLaMA 2 7B and Mistral 7B on reasoning benchmarks, runs on laptops. 'Small + clean data' philosophy.

Local AI MicrosoftPhi-2SLM

December 11, 2023 Landmark

Mixtral 8x7B: open-source Mixture of Experts that beats GPT-3.5

Mistral drops Mixtral 8x7B via magnet link with no warning: SMoE with 8 experts of 7B, 13B active params out of 47B total. Performance matches/exceeds GPT-3.5. Apache 2.0.

Open Source Models MistralMixtralMoE

December 7, 2023 High

Tesla Optimus Gen 2: handles raw eggs with per-finger force sensors

Tesla shows Optimus Gen 2 with 30% faster movement, per-finger force sensors, and demonstrated ability to manipulate raw eggs without breaking them.

Robotics TeslaOptimusHumanoid Robot

December 6, 2023 Landmark

Google Gemini 1.0: natively multimodal in three sizes

Google announces Gemini Ultra/Pro/Nano, the first family of natively multimodal models (text, images, audio, video). Ultra beats GPT-4 on MMLU 90.0% vs 86.4%. Controversial demo video.

Foundation Models GoogleGeminimultimodal

December 5, 2023 Medium

Jan.ai: open source desktop app for local LLMs with threads and local server

Jan.ai launches its first stable release: an open source local LLM client with persistent threads, an extension system, and a built-in OpenAI-compatible server.

Local AI Jan.aiDesktop AppOpen Source

December 5, 2023 High

MLX: Apple Research brings native machine learning to Apple Silicon

Apple Research releases MLX, an open source ML framework optimized for M1/M2/M3: it leverages unified CPU-GPU memory for LLM inference at near-discrete-GPU performance.

Local AI MLXApple SiliconM1 M2 M3

December 5, 2023 High

Mobile ALOHA: low-cost whole-body manipulation for complex household tasks

Stanford combines bimanual ALOHA arms with a mobile wheeled platform, creating the first low-cost system for whole-body manipulation. With 50 demonstrations it learns to cook, do laundry, and clean, opening the path to accessible household robots.

Robotics Mobile ALOHAbimanualmobile robot

November 29, 2023 Medium

JetBrains AI Assistant: native AI across all JetBrains IDEs

JetBrains launches AI Assistant out of beta, bringing intelligent refactoring, automatic documentation, and code chat to all its IDEs: IntelliJ, PyCharm, GoLand, WebStorm, and others.

AI Coding JetBrainsAI AssistantIntelliJ

November 22, 2023 High

Yi-34B: bilingual EN/ZH model in the open-weight top-3 of November 2023

01.ai by Kai-Fu Lee releases Yi-34B: 34B parameters trained on 3.1T tokens, modified Llama-2 architecture, bilingual EN/ZH, top-3 open weight in November 2023.

Foundation Models Yi-34B01.aiKai-Fu Lee

November 21, 2023 High

Claude 2.1: 200K context and fewer hallucinations

Anthropic ships Claude 2.1: 200K-token context window (~500 pages), 2× reduction in false statements on borderline questions, tool use in beta. Reply to GPT-4 Turbo 128K.

Foundation Models AnthropicClaude 2.1200K context

November 21, 2023 High

OpenAI launches TTS API: six voices, streaming and aggressive pricing

OpenAI launches its TTS API with 6 voices, pricing at $0.015 per 1000 characters, low latency streaming, and direct integration into the ChatGPT and Assistants ecosystem.

Voice & Audio OpenAITTSAPI

November 16, 2023 Medium

Google MusicLM: generating music from text goes public

Google makes MusicLM publicly available via Google Labs: musical generation from text description in a specific style, the first consumer music AI integration from a big tech company.

Voice & Audio GoogleMusicLMMusic Generation

November 15, 2023 Medium

Solar 10.7B: depth upscaling to merge layers from two LLaMA-2 models

Upstage presents Solar 10.7B, created by merging intermediate layers of two fine-tuned LLaMA-2 models (depth upscaling), winning the MBTI-OpenLLM leaderboard in November 2023.

Foundation Models SolarUpstageDepth Upscaling

November 14, 2023 Medium

LLaVA-NeXT and VideoLLaVA: LLaVA conquers video

LLaVA extends to video with frame sampling and temporal positional encoding, achieving competitive results on NExT-QA and ActivityNet without dedicated video training.

Multimodal AI VLMVideo UnderstandingLLaVA

November 12, 2023 High

Amazon Q Developer: the AI assistant that knows AWS from the inside

Amazon Q Developer brings AI coding directly into AWS consoles and IDEs: explains cloud resources, debugs errors, automatically migrates Java legacy code, and updates dependencies.

AI Coding AWSIDE AssistantCode Migration

November 7, 2023 Landmark ★ On my workflow

Ollama 0.1: pull and run local LLMs with one command, Docker-style

Ollama launches version 0.1: a minimal CLI to download and run local LLM models with a single command, reducing setup complexity to zero.

Local AI OllamaCLILLM Locale

November 6, 2023 High

OpenAI DevDay: GPT-4 Turbo, GPTs, Assistants API in one hour

At OpenAI's first developer conference: GPT-4 Turbo (128K context, lower prices), GPTs (shareable custom ChatGPTs), Assistants API (managed agents). Product + dev pivot.

Foundation Models OpenAIDevDayGPT-4 Turbo

November 4, 2023 Medium

Grok-1: xAI's chatbot with real-time access to X data

Elon Musk's xAI launches Grok-1, a model integrated with X (Twitter) for real-time information, with a 314B MoE architecture released as open weights in March 2024.

Foundation Models Grok-1xAIElon Musk

November 4, 2023 Medium

Pika 1.0: text and image to video for the mass market

Pika Labs launches Pika 1.0: a consumer platform for video generation from text or image, region animation, and aspect ratio control. Reaches 500k Discord users. Funded by Khosla Ventures at $55M.

Image & Video Gen Pika 1.0text-to-videoconsumer AI

November 1, 2023 Landmark

Bletchley AI Safety Summit: the first international agreement on frontier AI risks

28 nations sign the Bletchley Declaration on catastrophic frontier AI risks. The first AI Safety Institute (UK) is established. First international diplomatic agreement specifically dedicated to AI.

AI Security BletchleyAI Safety Summitinternational

November 1, 2023 Landmark

Microsoft 365 Copilot GA: available at 30 dollars per user per month

Microsoft 365 Copilot reaches general availability at 30 USD/user/month. Copilot Studio also launches for building custom enterprise agents.

Enterprise AI Microsoft 365CopilotGA

October 30, 2023 Landmark

Executive Order 14110: the first comprehensive US federal AI safety regulation

Biden signs the most sweeping executive order ever issued on AI: mandatory safety tests before frontier model releases, NIST standards for AI red-teaming, watermarking research, and new immigration rules for AI talent.

AI Security Executive OrderBidenAI safety

October 26, 2023 Medium

Whisper Large v3: improved multilingual ASR trained on 5 million hours

Whisper Large v3 reduces error rates on low-resource languages, improves timestamp accuracy and adds new language support, remaining the most widely deployed open-source ASR model.

Voice & Audio Whisper Large v3ASRspeech recognition

October 25, 2023 High

Latent Consistency Models: real-time image generation in 4 steps

Tsinghua University publishes LCM: distillation of a diffusion model reducing sampling from 50 steps to 4 with minimal quality loss. LCM-LoRA makes any SD model 10x faster. First technique enabling real-time generation on consumer hardware.

Image & Video Gen LCMlatent consistencydistillation

October 25, 2023 High

Zephyr-7B: DPO on Mistral 7B beats Llama-2-70B-chat on MT-Bench

HuggingFace trains Zephyr-7B with dSFT + Direct Preference Optimization on Mistral 7B base, achieving an MT-Bench score higher than Llama-2-70B-chat with 10x fewer parameters.

Foundation Models ZephyrHuggingFaceDPO

October 25, 2023 Medium

Zoom AI Companion: meeting summaries and action items included in the base plan

Zoom bundles AI Companion into Pro plans at no extra cost: summarises meetings in real-time, extracts automatic action items, and replies in Zoom chat.

Enterprise AI ZoomAI CompanionMeeting AI

October 23, 2023 Medium

Sanctuary AI Phoenix: the robot that understands complex natural language instructions

Sanctuary AI introduces Phoenix with Carbon AI, a neuro-symbolic system combining symbolic reasoning and neural nets to follow articulated linguistic instructions without explicit programming.

Robotics Sanctuary AIPhoenixCarbon AI

October 22, 2023 High

Eureka: NVIDIA uses GPT-4 to write reward functions and train expert robots

NVIDIA presents Eureka, the first system to use an LLM (GPT-4) to automatically generate reward functions for robotic reinforcement learning. The system achieves expert-level dexterous manipulation, including pen spinning, without manual reward design.

Robotics EurekaNVIDIAreward function

October 20, 2023 High

Open X-Embodiment: the first generalist cross-robot robotics dataset

Google DeepMind and 33 labs collect 527k episodes from 22 different robots: the first unified dataset for training generalist policies that work across multiple platforms.

Robotics Google DeepMindOpen X-EmbodimentDataset

October 19, 2023 High

LangGraph: stateful agents as cyclic graphs with loops and branching

LangChain launches LangGraph, a framework for building agents as node graphs with persistent state, support for cycles, conditional branching, and parallel execution of complex workflows.

Agents LangGraphLangChainStateful Agents

October 16, 2023 High

MITRE ATLAS v2: the AI attack taxonomy updated with real case studies

MITRE releases ATLAS v2 (Adversarial Threat Landscape for AI Systems), an expanded taxonomy of AI system attack techniques with real adversarial ML case studies and mapping to MITRE ATT&CK.

AI Security MITREATLASAdversarial ML

October 16, 2023 Medium

OpenAgents: real agents for non-programmers via web interface

XLab (SUTD Singapore) publishes OpenAgents: a deployable platform with three specialized agents (web browsing, data analysis, code execution) accessible from a browser without API keys. First demonstration of real agentic capabilities for non-technical users, with complete open-source code.

Agents OpenAgentsweb browsingdata analysis

October 11, 2023 Medium

WizardCoder: evolutionary instructions for GPT-4 level code generation

The WizardLM team applies Evol-Instruct to code, iteratively rewriting problems to increase complexity. WizardCoder-34B achieves 73.2% on HumanEval, matching GPT-4 at release time.

AI Coding WizardCoderEvol-InstructHumanEval

October 6, 2023 Medium

AgentBench: the first benchmark that measures LLMs as real agents

Tsinghua presents AgentBench, the first comprehensive benchmark for LLM agents across 8 operational environments, revealing a massive gap between GPT-4 and open-source models.

Agents TsinghuaAgentBenchBenchmark

October 5, 2023 High

LLaVA-1.5: open-source vision-language that beats benchmarks with minimal data

LLaVA-1.5 combines CLIP ViT-L, a two-layer MLP projection, and Vicuna to surpass 11 multimodal benchmarks using only 1.2M fine-tuning examples.

Image & Video Gen LLaVAVision-LanguageCLIP

October 4, 2023 High

Falcon-180B: the world's largest open-source model in 2023

The Technology Innovation Institute releases Falcon-180B, the largest openly available model at 180 billion parameters trained on 3.5 trillion tokens, topping the HuggingFace Open LLM Leaderboard.

Foundation Models Falcon-180BTIIopen source

October 3, 2023 High

DALL-E 3: images that actually follow instructions

OpenAI launches DALL-E 3 integrated into ChatGPT: dramatically improved prompt adherence over DALL-E 2, automatic caption synthesis for training, more readable text in images.

Image & Video Gen OpenAIDALL-E 3Text-to-Image

October 3, 2023 High

CogVLM: separate visual expert prevents language degradation

Tsinghua introduces CogVLM with a visual expert module independent from LLM parameters, eliminating performance degradation on pure text and reaching SOTA on VQA and OCR.

Multimodal AI CogVLMVisual ExpertVQA

September 28, 2023 High

AudioPaLM: the first LLM that processes and generates audio as text

AudioPaLM fuses PaLM-2 with an audio tokenizer to create an LLM that natively processes audio and text tokens, enabling speech translation while preserving speaker identity.

Voice & Audio AudioPaLMGoogleaudio LLM

September 28, 2023 Medium

HuggingFace Chat UI: open-source chat interface for any HF model

HuggingFace open-sources chat.huggingface.co: a self-hostable web interface via Docker for Llama 2, Mistral, Code Llama, and custom models, with support for tool calls and web search.

Local AI HuggingFace Chat UIopen sourcechat interface

September 27, 2023 High

Mistral 7B: Europe joins the open-source race

Mistral AI (Paris), a three-month-old startup founded by ex-Meta/DeepMind researchers, releases Mistral 7B under Apache 2.0. Beats Llama 2 13B on most benchmarks with half the parameters.

Open Source Models MistralMistral 7BOpen Source

September 27, 2023 High

PAIR: automated LLM-vs-LLM jailbreaking

CMU and UPenn publish PAIR: an attacker LLM that automatically refines its prompts against a target LLM, finding effective jailbreaks in under 20 queries with no human in the loop.

AI Security PAIRjailbreakautomated

September 27, 2023 High

NVIDIA TensorRT-LLM: automatic LLM compilation for GPUs with FP8 and multi-GPU

NVIDIA open-sources TensorRT-LLM, a framework for compiling and optimizing LLMs for NVIDIA GPUs with out-of-the-box FP8, INT4, sparse attention, and multi-GPU tensor parallelism support.

AI Infrastructure NVIDIATensorRT-LLMFP8

September 26, 2023 Medium

Microsoft Copilot in Windows 11: system-level AI for consumers

With update 23H2, Windows 11 integrates Copilot by default as a system side panel. Bing Chat is rebranded to Copilot. AI as an OS feature, not an app.

Enterprise AI MicrosoftCopilotWindows 11

September 25, 2023 High

Anthropic + AWS: 1.25 billion investment to bring Claude to Amazon Bedrock

AWS invests 1.25 billion dollars in Anthropic. Claude becomes available on Amazon Bedrock using dedicated Trainium and Inferentia infrastructure.

Enterprise AI AnthropicAWSClaude

September 25, 2023 High

ChatGPT can see, hear, and speak: voice + vision in mobile app

ChatGPT Plus on iOS/Android gets voice conversations (5 synthetic voices) and image input (GPT-4V). From text chat to a full conversational assistant.

Multimodal AI OpenAIChatGPTvoice

September 25, 2023 High

GPT-4V: ChatGPT learns to see (for real)

OpenAI activates GPT-4's vision capabilities in ChatGPT (announced six months earlier) and adds voice. Upload an image, talk about it, ask for analysis. Multimodality enters the consumer product.

Multimodal AI OpenAIGPT-4VVision

September 21, 2023 Medium

Slack AI: channel summaries and smart search in workplace chat

Slack integrates native AI into Pro+ plans: summarises channels and threads, answers questions about conversation history, supports Claude and OpenAI as LLM providers.

Enterprise AI SlackSalesforceProductivity

September 18, 2023 High

Adobe Firefly Enterprise: indemnified image generation for brands

Adobe launches Firefly Enterprise in Creative Cloud Teams with legal copyright indemnification and enterprise brand guidelines control over every generated image.

Enterprise AI AdobeFireflyGenerative AI

September 15, 2023 Medium

ExLlamaV2: high-speed quantized LLM inference on consumer GPUs

ExLlamaV2 introduces the EXL2 format with per-layer mixed bit-rates (2-8 bit), delivering higher throughput than llama.cpp on NVIDIA GPUs and enabling 70B models to run on a single RTX 3090.

AI Infrastructure ExLlamaV2EXL2Quantizzazione

September 14, 2023 High

Medusa: multi-head speculative decoding without a separate draft model, 2.2x speedup

Cornell/UIUC introduce Medusa: N additional decoding heads on the main model predict N tokens ahead simultaneously, 2.2x speedup without needing a second draft model.

AI Infrastructure MedusaSpeculative DecodingMulti-Head

September 14, 2023 High

Backdoors in fine-tuned LLMs: hidden behaviors activatable on command

Researchers demonstrate that fine-tuned LLMs can contain silent behavioral backdoors, activatable only when specific triggers invisible during normal model evaluation are present.

AI Security BackdoorSleeper AgentsFine-tuning

September 13, 2023 High

Adobe Firefly 1.0 GA: image generation on licensed content, Generative Fill in Photoshop

Adobe launches Firefly 1.0 GA, the first image generation model trained exclusively on licensed content, integrated into Photoshop as Generative Fill for commercially safe use.

Image & Video Gen Adobe FireflyGenerative FillLicensed Content

September 12, 2023 Medium

IP-Adapter: transfer style and subject from a reference image

Tencent AI Lab releases IP-Adapter, a lightweight adapter for Stable Diffusion that conditions generation on a reference image without retraining the base model.

Image & Video Gen TencentIP-AdapterStable Diffusion

September 10, 2023 High

Open Interpreter: LLM that executes code locally

An LLM running locally that can write and execute Python, JS, and Shell code autonomously, browse the web, and modify files on your computer.

Local AI Open InterpreterCode ExecutionLLM

September 6, 2023 High

Phi-1.5: big-model reasoning in just 1.3 billion parameters

Microsoft Research shows that 1.3B parameters trained on 'textbook quality' synthetic data produce multi-step reasoning comparable to models five times larger.

Foundation Models Phi-1.5small language modelsynthetic data

September 5, 2023 High

LM Studio: desktop GUI to download and run GGUF models with OpenAI server

LM Studio launches its first public release: a graphical interface to browse, download, and use local LLMs with a built-in chat and OpenAI-compatible server.

Local AI LM StudioGGUFGUI Desktop

September 1, 2023 High

Meta AudioCraft: open source suite for music and audio from text

Meta releases AudioCraft, an open source suite including MusicGen for generating structured music and AudioGen for ambient sounds, both controllable via text description.

Voice & Audio MetaAudioCraftMusicGen

August 28, 2023 Medium

ChatGPT Enterprise: unlimited GPT-4, locked-down data

OpenAI launches the enterprise ChatGPT plan: unlimited GPT-4, 32K context, advanced data analysis included, SOC 2, customer data never used for training. Reply to IT concerns.

Enterprise AI OpenAIChatGPT EnterpriseGPT-4

August 25, 2023 Medium

SuperAGI: the first open-source autonomous agent platform with a GUI

SuperAGI offers an open-source platform for autonomous agents with a web dashboard, tool marketplace, and the ability to run agents in the background without writing code. First solution to bring the 'monitor agent' experience to non-programmers. Concurrent with AutoGPT but more production-oriented.

Agents SuperAGIautonomous agentopen source

August 24, 2023 High

Code Llama: serious open-source coding model

Meta releases Code Llama (7B, 13B, 34B), a code-specialized fine-tune of Llama 2. Three variants per size: base, Python-specific, instruction-tuned. Llama 2 commercial license.

AI Coding MetaCode LlamaOpen Source

August 20, 2023 High

AnimateDiff: bring motion to any Stable Diffusion model

Shanghai AI Lab publishes AnimateDiff: a plug-in motion module that adds temporal consistency to any existing SD checkpoint, turning every image-only model into a video generator without retraining it.

Image & Video Gen AnimateDiffmotion moduleStable Diffusion

August 19, 2023 High

DeepSeek-Coder v1: China enters the open source coding model race

DeepSeek releases coding models from 1B to 33B parameters trained on 2 trillion tokens with advanced FIM training, topping HumanEval among all open-weight models.

AI Coding DeepSeek-Codercode modelFIM

August 15, 2023 Medium

OpenFlamingo (LAION/UW): open reproduction of Flamingo with multi-image few-shot visual learning

LAION and University of Washington release OpenFlamingo, an open-source reproduction of DeepMind's Flamingo: few-shot visual learning from image+text examples, available in 3B and 9B parameter variants. The first open model enabling multimodal research without API costs.

Multimodal AI OpenFlamingoFlamingoopen source

August 7, 2023 Medium

Google TPU v5e: Cost-Optimized AI Chip for Enterprise Inference

Google announces TPU v5e, a cost-optimized AI chip with 4x better performance per dollar compared to TPU v4 for inference, available through Google Kubernetes Engine for containerized workloads.

AI Infrastructure TPU v5eGoogleinference

August 4, 2023 Medium

Sourcegraph Cody: AI with full codebase context, not just the open file

Sourcegraph launches Cody in beta, an AI code assistant that understands the entire codebase — dependencies, architecture, cross-file relationships — thanks to Sourcegraph's code index.

AI Coding SourcegraphCodyCodebase Context

August 1, 2023 High

OWASP LLM Top 10: the 10 critical vulnerabilities in AI applications

OWASP publishes the first official list of the 10 most critical vulnerabilities in LLM applications, from prompt injection to insecure output handling, now the industry reference standard.

AI Security OWASPLLM Top 10Vulnerabilità

July 28, 2023 High

RT-2: the robot that reasons with a language model

DeepMind's RT-2 merges vision-language pretraining with robot control, transferring semantic reasoning from the web to a physical arm without task-specific training.

Robotics DeepMindRT-2VLA

July 28, 2023 High

FlashAttention-2: rewrite with 2x speedup, MQA/GQA support, and head-dim 256

Tri Dao rewrites FlashAttention with 2x speedup over FA1: better parallelism across seq-len, head-dim support up to 256, query parallelism for MHA, MQA, and GQA. De facto training standard.

AI Infrastructure FlashAttention-2AttentionTransformer

July 28, 2023 High

Orca: learning GPT-4 reasoning through explanation traces

Microsoft Research trains Orca 13B on step-by-step GPT-4 explanations (explanation traces), outperforming ChatGPT on BigBench and AGIEval with 13 billion parameters.

Foundation Models OrcaMicrosoftImitation Learning

July 26, 2023 High

Stable Diffusion XL 1.0: the open-source quality jump

Stability ships SDXL 1.0 (3.5B base + 6.6B refiner), native 1024×1024 output, shorter prompts. Open source under commercial license, weights on HuggingFace.

Image & Video Gen Stability AISDXLStable Diffusion

July 18, 2023 Landmark

Llama 2: weights become commercially usable

Meta releases Llama 2 (7B, 13B, 70B) under a license that allows commercial use up to 700M MAU. For the first time a serious LLM is genuinely deployable to production without depending on an API.

Open Source Models MetaLlama 2Open Weights

July 17, 2023 High

SeamlessM4T: Meta's universal speech translation model for 100+ languages

SeamlessM4T is the first multimodal system to handle speech-to-text, text-to-speech, and speech-to-speech across 100+ languages in a single model, powering Meta's real-time translation features.

Voice & Audio SeamlessM4TMetaspeech translation

July 15, 2023 High

AutoGen: Microsoft formalizes agent-to-agent communication

Microsoft Research publishes AutoGen, a framework where you define agents with different roles and let them converse with each other to solve a task. First framework to formalize the 'agent-to-agent communication' pattern. Becomes the foundation of many enterprise multi-agent workflows.

Agents AutoGenmulti-agentMicrosoft Research

July 13, 2023 High

WormGPT: the first commercial LLM built for cybercrime

The first LLM explicitly trained for criminal activity appears on the dark web: no safety filters, fine-tuned on malware data, sold as a monthly subscription.

AI Security WormGPTdark LLMcybercrime

July 11, 2023 High

Claude 2: 100K-token context, consumer access opens

Anthropic launches Claude 2 with a 100,000-token context window (~75,000 words) and opens claude.ai to the general public (initially US and UK). Long-context enters the mainstream.

Foundation Models AnthropicClaude 2100K Context

July 11, 2023 High

IBM launches watsonx.ai: governed foundation models for the enterprise

IBM unveils watsonx.ai at Think 2023: a platform featuring Granite models trained on curated data, a fine-tuning studio, AI factsheets for governance, and full data lineage. Built for banking, healthcare, and government.

Enterprise AI IBMwatsonxGranite

July 10, 2023 High

Universal adversarial attacks on LLMs: transferable jailbreaks across GPT-4, Claude, and Gemini

Zou et al. (CMU) demonstrate optimized suffixes that simultaneously jailbreak GPT-3.5/4, Claude, and Gemini: the first systematic proof of attack transferability across different models.

AI Security JailbreakAdversarial AttackCMU

July 9, 2023 High

Reflexion: agents that learn from mistakes without gradient updates

MIT and Northeastern propose Reflexion: agents that self-reflect in natural language after each failure, accumulating insights in episodic memory without modifying weights.

Agents MITNortheasternReflexion

July 8, 2023 High

MetaGPT: agents with company roles that write software together

MetaGPT assigns each LLM agent a specific company role (PM, Architect, Engineer, QA) and has them collaborate to produce working code from a single text requirement.

Agents MetaGPTMulti-AgentSoftware Engineering

July 5, 2023 High

llama.cpp K-quants: the intelligent quantization that transformed local models

llama.cpp introduces K-quants (Q2_K through Q8_K): per-layer quantization assigning different bit-widths based on tensor importance. Q4_K_M matches Q5_1 quality at a smaller file size, becoming the de facto standard for all modern GGUF models.

Local AI llama.cppK-quantsGGUF

June 25, 2023 Medium

GPT-Engineer: generate an entire software project from a single sentence

Anton Osika publishes GPT-Engineer on GitHub: describe what you want in natural language, the agent asks clarifying questions, then writes all the files and runs them. 50k stars in one week. First viral implementation of the 'one-shot project generator' concept.

Agents GPT-Engineercode generationproject scaffolding

June 22, 2023 High

AWQ: activation-aware 4-bit quantization for edge deployment with accuracy above GPTQ

MIT Han Lab publishes AWQ: 4-bit quantization that preserves salient weights identified through activation analysis, achieving better accuracy-throughput than GPTQ for edge deployment.

AI Infrastructure AWQQuantizzazione4-bit

June 20, 2023 Medium

Lakera Guard: real-time protection for LLMs in production

Lakera Guard is a SaaS API that protects LLM applications from prompt injection, jailbreak, and PII leakage with sub-millisecond latency, designed for high-traffic production environments.

AI Security LakeraPrompt InjectionJailbreak

June 16, 2023 High

Voicebox: Meta brings flow matching to TTS with audio editing and 6 languages

Voicebox uses flow matching with masked training to synthesize, edit, and transfer vocal styles across 6 languages, with no explicit cloning or fine-tuning.

Voice & Audio VoiceboxTTSFlow Matching

June 15, 2023 High

IDEFICS: the first open-source replica of Flamingo

HuggingFace releases IDEFICS, an open-weight replica of Flamingo in 9B and 80B versions, trained on LAION-5B and WikiMedia with few-shot visual in-context learning.

Multimodal AI VLMOpen SourceFew-Shot Learning

June 14, 2023 Medium

WizardLM: GPT-4-evolved instructions for fine-tuning

WizardLM uses Evol-Instruct — instructions automatically simplified and complicated by GPT-4 — achieving 97% of ChatGPT on WizardEval with a 70B model.

Foundation Models WizardLMEvol-InstructFine-tuning

June 13, 2023 High

Function calling: GPT learns to speak JSON

OpenAI adds 'function calling' to the API: the model returns structured JSON conforming to a schema, enabling reliable tool integrations without fragile prompt engineering.

AI Infrastructure OpenAIFunction CallingTool Use

June 12, 2023 Medium

Bark: open source TTS with laughter, sighs, and music from text

Suno AI releases Bark on HuggingFace: an open source TTS model capable of generating paralinguistics — laughter, sighs, sound effects, music — directly from text prompts.

Voice & Audio BarkSuno AITTS

June 8, 2023 High

GitHub Copilot X: in-IDE chat, test generation and Copilot for CLI

GitHub announces Copilot X with GPT-4-based chat integrated in VS Code, automatic PR description and test generation, a CLI assistant, and voice coding in preview.

AI Coding GitHubCopilotChat

June 8, 2023 High

Phi-1: 1.3B parameters beating models 10x larger on code

Microsoft Research releases Phi-1, 1.3B parameters trained on high-quality synthetic data ('textbooks'), outperforming models 10x larger on HumanEval.

Foundation Models Phi-1MicrosoftSmall Models

June 6, 2023 High

HuggingFace TGI: production-ready Docker container for LLM serving with continuous batching

HuggingFace releases Text Generation Inference, an optimized Docker container for serving LLMs in production with continuous batching, tensor parallelism, and integrated Flash Attention 2.

AI Infrastructure HuggingFaceTGILLM Serving

June 5, 2023 Medium

Gorilla: fine-tuned LLaMA that calls APIs without errors

UC Berkeley presents Gorilla, a retrieval-augmented fine-tuned LLaMA for accurate API calls: reduces API hallucination from 83% to 3%, outperforming GPT-4 on this task.

Agents UC BerkeleyGorillaLLaMA

June 1, 2023 High

Diffusion Policy: robot imitation learning goes multi-modal with diffusion models

MIT and Columbia apply denoising diffusion models to robot imitation learning, learning multi-modal action distributions instead of deterministic policies. They achieve a 46.9% improvement on manipulation benchmarks.

Robotics Diffusion Policyimitation learningdenoising diffusion

May 30, 2023 High

InstructBLIP: visual instruction tuning on 26 datasets outperforms GPT-4V

Salesforce extends BLIP-2 with visual instruction tuning on 26 datasets, beating GPT-4V on visual reasoning benchmarks with an open architecture.

Multimodal AI InstructBLIPInstruction TuningVisual Reasoning

May 30, 2023 High

Tree of Thoughts: the LLM that reasons by exploring alternative branches

Princeton and DeepMind propose Tree of Thoughts: the LLM generates and evaluates multiple reasoning paths as a search tree, clearly outperforming Chain-of-Thought.

Agents PrincetonDeepMindTree of Thoughts

May 26, 2023 High

Stable Diffusion XL 0.9: dual-encoder and 1024x1024 resolution

Stability AI launches SDXL 0.9 beta with dual-encoder architecture and separate refiner model for photographic-quality 1024x1024 images.

Image & Video Gen Stable Diffusion XLSDXLStability AI

May 23, 2023 High

Microsoft Build 2023: Copilot everywhere, a shared plugin standard

At Build 2023 Microsoft announces Windows Copilot, Copilot in Edge and 365, and adopts OpenAI's plugin standard. Strategy: 'AI co-pilot' as the primary UI.

Enterprise AI MicrosoftBuildCopilot

May 22, 2023 High

Falcon 40B: first open-weight model to beat LLaMA 65B

The Technology Innovation Institute UAE releases Falcon 40B: trained on 1T tokens of RefinedWeb, it beats LLaMA 65B on benchmarks with a commercial license.

Foundation Models FalconOpen WeightsTII

May 18, 2023 High

SoundStorm: Google generates 30 seconds of natural dialogue in half a second

SoundStorm uses MaskGIT on EnCodec tokens to generate audio in parallel rather than token-by-token: 30s of dialogue in 0.5s, preserving speaker consistency.

Voice & Audio SoundStormAudio GenerationGoogle

May 17, 2023 High

Voyager: the AI agent that learns Minecraft forever, without reset

NVIDIA creates Voyager, a lifelong-learning agent in Minecraft that uses GPT-4 to write skills in JavaScript and accumulate them in a persistent library, never forgetting.

Agents NVIDIAVoyagerLifelong Learning

May 16, 2023 High

Palantir AIP: first public LLM agent demo on classified operational data

First public demonstration of an enterprise LLM agent on real, sensitive operational data: military logistics routing via natural language. AIP sandboxes LLM outputs from raw data access. A turning point for AI in defense and government.

Enterprise AI PalantirAIPenterprise agent

May 15, 2023 Medium

TidyBot: a tidying robot that learns your preferences via LLM

Stanford presents TidyBot, a robotic system that uses LLMs to personalize household tidying behavior from a few user examples. It achieves 91.2% task completion, demonstrating the feasibility of LLM-driven personalization in manipulation.

Robotics TidyBotStanfordLLM planning

May 14, 2023 High

privateGPT: chat with your documents, completely offline

imartinez publishes privateGPT: full RAG on PDFs and TXT with a local LLM, zero cloud data. Your knowledge base stays on your disk.

Local AI privateGPTRAGPDF Offline

May 12, 2023 High

GPT4All v2 (Nomic AI): one-click local AI for everyone

Nomic AI launches GPT4All v2: a desktop installer that downloads and runs quantized models with no command line required, including LocalDocs for private document Q&A with no internet connection.

Local AI GPT4AllNomic AIconsumer AI

May 11, 2023 High

LocalAI: OpenAI drop-in replacement with local models and full privacy

mudler releases LocalAI, an OpenAI-compatible REST server that runs GGML/GGUF models locally: migrate your apps from cloud to self-hosted by changing only the URL.

Local AI LocalAIOpenAI APIPrivacy

May 10, 2023 High

Google PaLM 2: the model that makes Bard fly

At Google I/O 2023, PaLM 2 replaces LaMDA in Bard. Four sizes (Gecko, Otter, Bison, Unicorn), strong multilingual support and improved reasoning. Spawns Med-PaLM 2 and Sec-PaLM.

Foundation Models GooglePaLM 2Bard

May 8, 2023 High

ServiceNow Now Assist: native LLM in enterprise ITSM

ServiceNow embeds an LLM directly into its ITSM platform, summarising open tickets, suggesting resolutions, and automating escalations with no external plugins.

Enterprise AI ServiceNowNow AssistITSM

May 4, 2023 Medium

MPT-7B: the first open-source model explicitly built for commercial use

MosaicML launches MPT-7B under Apache 2.0 with a 65,000-token context window via ALiBi, the first open model explicitly designed for unrestricted commercial deployment.

Foundation Models MPT-7BALiBiApache 2.0

May 4, 2023 High

StarCoder: the first serious open coding model with transparent training data

BigCode and HuggingFace release StarCoder, a 15.5B-parameter model trained on 1 trillion tokens from The Stack across 86 languages, with an opt-out data governance system.

AI Coding StarCoderBigCodeopen source

May 2, 2023 High

MiniGPT-4 (KAUST): open-source visual chatbot with a single alignment layer

KAUST shows how to build a capable visual chatbot by connecting BLIP-2 and Vicuna with a single projection layer trained on 5,000 image-text pairs. The first demonstration that hours of single-GPU training are sufficient to create a working VLM.

Multimodal AI MiniGPT-4KAUSTBLIP-2

April 20, 2023 High

LLaVA: Visual Instruction Tuning opens the multimodal open-source era

LLaVA combines CLIP + LLaMA with 150k GPT-4-generated examples to create the first quality open-source visual assistant.

Multimodal AI LLaVAVisual Instruction TuningOpen Source

April 19, 2023 Medium

StableLM: Stability AI enters the open LLM race

Stability AI releases StableLM 3B and 7B under CC BY-SA 4.0, trained on 1.5T tokens. Open response to closed models, but quality still trails LLaMA.

Open Source Models Stability AIStableLMopen source

April 18, 2023 Medium

Microsoft Presidio: PII anonymization in LLM pipelines

Microsoft Presidio reaches general availability: open source framework for detecting and anonymizing personal data in LLM-processed text, with NER and regex for 50+ entity types.

AI Security MicrosoftPresidioPII

April 16, 2023 High

Vicuna-13B: the open chatbot that reaches 90% of ChatGPT quality

LMSYS fine-tunes LLaMA-13B on 70,000 ShareGPT conversations and produces an open-source chatbot that GPT-4, used as judge, rates at 90% of ChatGPT quality.

Foundation Models VicunaLLaMAfine-tuning

April 13, 2023 High

AWS Bedrock: managed multi-model AI on Amazon cloud

AWS announces Bedrock, a managed service exposing Claude (Anthropic), Jurassic-2 (AI21), Stable Diffusion, and its own Titan via one API. Reply to Azure OpenAI.

AI Infrastructure AWSBedrockmanaged AI

April 7, 2023 High

Generative Agents: 25 AI agents simulate a society in Smallville

Stanford creates 25 LLM-based agents simulating daily life in a virtual village, with episodic memory, reflection, and planning — the first credible artificial society.

Agents StanfordGenerative AgentsSmallville

April 3, 2023 High

BabyAGI: 200 lines of Python that spark the autonomous agent debate

Yohei Nakajima publishes BabyAGI, an autonomous task manager in ~200 Python lines using GPT-4 and Pinecone that creates and executes subtasks in an infinite loop, viral on Twitter within 24 hours.

Agents BabyAGIAutonomous AgentTask Management

March 30, 2023 High

AutoGPT: the first viral AI agent

A developer publishes AutoGPT on GitHub: given a text goal, the system calls GPT-4 in a loop to plan tasks, execute them, and self-criticize. In two weeks, becomes the most-starred repo in history.

Agents AutoGPTAgentsOpen Source

March 27, 2023 High

GPT4All: click-and-run offline LLM for non-technical users

Nomic AI releases GPT4All, a point-and-click installer to run LLMs offline on Windows, Mac, and Linux, lowering the technical barrier to almost zero.

Local AI GPT4AllNomic AILLM Offline

March 25, 2023 High

oobabooga text-generation-webui: the first GUI for local LLMs

The most-starred open-source web interface for running local LLMs: supports GPTQ, GGML, transformers backends with Gradio UI, extensions, character cards, and chat/instruct modes.

Local AI oobaboogatext-generation-webuilocal LLM

March 23, 2023 Medium

ChatGPT Plugins: the LLM becomes an interface to the web

OpenAI ships plugins for ChatGPT: the model can browse the web, run Python in a sandbox, book flights (Expedia, Kayak), order groceries (Instacart). First big mainstream tool-use experiment.

Agents OpenAIChatGPTPlugins

March 22, 2023 Medium

Codeium: free AI code assistant for 70+ languages, Copilot alternative

Codeium launches its AI code assistant completely free for individual developers, supporting over 70 languages and integrating with VS Code, JetBrains, and Vim.

AI Coding CodeiumCode CompletionFree

March 22, 2023 Medium

HuggingGPT: ChatGPT as a brain orchestrating 800 AI models

Microsoft Research uses ChatGPT as a central planner that decomposes complex tasks and delegates execution to specialized HuggingFace models for vision, audio, and NLP.

Agents Microsoft ResearchHuggingGPTJARVIS

March 22, 2023 High

Llama Guard: an LLM trained to be the gatekeeper of other LLMs

Meta releases Llama Guard, a fine-tuned LLaMA classifier that identifies dangerous inputs and outputs across 6 harm categories, designed as a plug-in safety layer for LLM applications.

AI Security MetaLlamaGuardContent Safety

March 21, 2023 Medium

Google Bard: the (late) answer to ChatGPT

Google opens Bard public preview in US and UK, based on a lightweight LaMDA. Reception is lukewarm: slow, cautious, less useful than ChatGPT.

Foundation Models GoogleBardLaMDA

March 20, 2023 Medium

Runway Gen-1: text- and image-guided video style transfer

Runway launches Gen-1: the first commercial model that applies a visual style from text or a reference image to an existing video, frame by frame. Precursor to the Gen-2/Gen-3 line.

Image & Video Gen Runway Gen-1video style transfertext-to-video

March 17, 2023 Medium

Microsoft Semantic Kernel: the enterprise SDK for LLM orchestration

Microsoft open-sources Semantic Kernel, a C#/Python/Java SDK for integrating LLMs into enterprise apps. Introduces 'skills' (reusable AI functions) and 'planners' (auto-chaining toward a goal). Becomes Microsoft's standard AI orchestration layer for Copilot builds.

Agents Semantic KernelMicrosoftSDK

March 17, 2023 Medium

Tesla Optimus Gen 1: the bipedal robot walks autonomously in a factory

Tesla releases the first video of Optimus Gen 1 walking and performing tasks autonomously in a real factory environment, with a stated target price of 20,000 dollars.

Robotics TeslaOptimusHumanoid Robot

March 16, 2023 Landmark

Microsoft 365 Copilot: GPT-4 embedded in Word, Excel, Teams and Outlook

Microsoft announces Copilot across the M365 suite: AI for 300M+ enterprise users, powered by GPT-4 and Microsoft Graph for business context.

Enterprise AI Microsoft 365CopilotGPT-4

March 15, 2023 High

PyTorch 2.0 and torch.compile: Graph Compilation Without Rewriting Code

PyTorch 2.0 introduces torch.compile built on TorchDynamo and the Inductor backend, delivering up to 2x speedup on transformers without code changes, making PyTorch competitive with XLA/JAX for production workloads.

AI Infrastructure PyTorch 2.0torch.compileTorchDynamo

March 14, 2023 High

Claude arrives: the first serious ChatGPT competitor

Anthropic launches Claude, an AI assistant trained with Constitutional AI. Same day as GPT-4. Two versions: Claude (full) and Claude Instant (faster and cheaper).

Foundation Models AnthropicClaudeConstitutional AI

March 14, 2023 High

Google Workspace AI (Duet AI): the first AI assistant built into G Suite

Google announces Duet AI for Workspace: assisted writing in Docs, email summaries in Gmail, slide generation in Slides, and formula help in Sheets.

Enterprise AI Google WorkspaceDuet AIProductivity

March 14, 2023 Landmark

GPT-4: the reasoning leap that resets the baseline

OpenAI releases GPT-4, multimodal (text + image), with reasoning, coding, and reliability clearly beyond GPT-3.5. Passes bar, medical, and coding exams.

Foundation Models OpenAIGPT-4Multimodal

March 10, 2023 Medium

CAMEL: two LLM agents that cooperate to solve complex tasks

KAUST presents CAMEL, a role-playing framework where an 'AI user' LLM and an 'AI assistant' LLM autonomously collaborate on tasks without human intervention at each step.

Agents KAUSTCAMELMulti-Agent

March 10, 2023 Landmark

llama.cpp: LLaMA 7B runs 4-bit on MacBook CPU

Georgi Gerganov brings Meta's LLaMA to consumer CPUs via 4-bit C++ quantization: the first foundation model practically usable offline on a laptop.

Local AI LLaMAllama.cppC++

March 7, 2023 High

Salesforce Einstein GPT: the first CRM with native generative AI

Salesforce embeds generative AI directly into its CRM, suggesting sales emails, case replies, and Salesforce Flow code without leaving the platform.

Enterprise AI SalesforceEinstein GPTCRM

March 6, 2023 Landmark

PaLM-E: the first embodied VLM at 562 billion parameters

Google presents PaLM-E, a 562B-parameter multimodal model that feeds images and robot state directly into the transformer, capable of long-horizon planning on real robots.

Robotics GooglePaLM-EVLM

March 2, 2023 High

RoboCat: the first robot that self-improves without human labeling

DeepMind introduces RoboCat, a robotic agent that learns from few demonstrations, self-trains by collecting new data, and improves iteratively without human intervention. With just 10 demos it achieves 36% success on novel tasks.

Robotics RoboCatDeepMindself-improvement

March 1, 2023 High

Agility Robotics Digit v3: the first humanoid in an Amazon warehouse

Agility Robotics announces partnership with Amazon for Digit v3, a bipedal warehouse robot — first real-scale industrial deployment of a humanoid.

Robotics Agility RoboticsDigitHumanoid Robot

March 1, 2023 High

ChatGPT API: gpt-3.5-turbo at $0.002 per 1K tokens

OpenAI ships the ChatGPT API (gpt-3.5-turbo) at one tenth the price of text-davinci-003, plus Whisper API for speech-to-text. The wrapper era begins.

Foundation Models OpenAIChatGPTAPI

February 24, 2023 High

LLaMA: Meta opens foundation models to research

Meta releases LLaMA in four sizes (7B, 13B, 33B, 65B), available to researchers on request. One week later, the weights leak publicly.

Open Source Models MetaLLaMAOpen Weights

February 23, 2023 Medium

Amazon CodeWhisperer GA: AWS-native code assistant with reference tracking

Amazon launches CodeWhisperer GA with a unique feature: it flags when generated code resembles open source snippets, showing the license and source repo. Free tier for individual developers.

AI Coding AmazonCodeWhispererAWS

February 10, 2023 High

ControlNet: structural control for Stable Diffusion without retraining

Zhang et al. introduce ControlNet, an adapter adding pose, depth, and edge control to Stable Diffusion without modifying the base model weights.

Image & Video Gen ControlNetStable DiffusionDiffusion Models

February 9, 2023 High

Toolformer: the LLM that learns to use tools on its own

Meta AI presents Toolformer: an LLM that autonomously learns when and how to call external tools (calculator, Wikipedia, calendar) using self-supervised examples only.

Agents Meta AIToolformerTool Use

February 9, 2023 High

vLLM: 24x LLM throughput with PagedAttention from UC Berkeley

The UC Berkeley team releases vLLM, a Python library for LLM inference using PagedAttention to manage KV cache like OS virtual memory, achieving 24x throughput over the HuggingFace baseline.

AI Infrastructure vLLMBerkeleyPagedAttention

February 7, 2023 Medium

Bing Chat: search engines change for the first time in 20 years

Microsoft integrates conversational AI into Bing (later revealed to run on pre-release GPT-4) that answers with direct citations from web pages. The Google 'code red' moment.

Foundation Models MicrosoftBing ChatSydney

January 30, 2023 High

BLIP-2: the Q-Former bridge between vision and language

Salesforce introduces BLIP-2: a lightweight Q-Former bridges frozen visual encoder and frozen LLM, achieving SOTA captioning with 8x fewer trainable parameters.

Multimodal AI BLIP-2Q-FormerImage Captioning

January 27, 2023 High

XTTS: Coqui AI's open-source multilingual zero-shot voice cloning

XTTS brings multilingual zero-shot voice cloning to open source: just a 6-second audio sample to replicate a voice across 17 different languages, with MIT license.

Voice & Audio XTTSCoquimultilingual

January 26, 2023 High

Code as Policies: the robot programs itself from natural language

Google shows how an LLM directly generates executable robot code from natural-language instructions, without robotic fine-tuning, using hierarchical function composition.

Robotics GoogleCode as PoliciesLLM

January 26, 2023 High

ElevenLabs exits beta: AI voice becomes the creator standard

ElevenLabs exits public beta with 1-minute voice cloning, 29 languages, and prosodically natural TTS, establishing itself as the reference for creators and audiobooks.

Voice & Audio ElevenLabsVoice CloningTTS

January 26, 2023 High

NIST AI Risk Management Framework 1.0

The US government publishes the first official framework for managing AI risks in organizations: four core functions — Govern, Map, Measure, Manage.

AI Security NISTAI RMFrisk management

January 20, 2023 High

Speculative Decoding: 2-3x LLM inference speedup without changing output

Chen et al. (Google Brain) publish Speculative Decoding: a small model proposes tokens, the large model verifies them in parallel. Same output, 2-3x faster with no quality change.

AI Infrastructure Speculative DecodingInferenceAutoregressive

January 16, 2023 Landmark

Azure OpenAI Service goes GA: GPT-4 with enterprise SLA

Microsoft makes OpenAI models (GPT-3.5-Turbo, Codex, DALL-E) available on Azure with enterprise SLA, VNet isolation, HIPAA and SOC2 compliance. A watershed moment for enterprise AI adoption.

Enterprise AI Azure OpenAIMicrosoftenterprise

January 10, 2023 High

whisper.cpp: offline voice transcription on CPU with pure C++

Georgi Gerganov brings OpenAI's Whisper model to CPU via a minimal C++ implementation: real-time transcription with no GPU and no cloud.

Local AI WhisperSpeech-to-TextC++

January 5, 2023 Landmark

VALL-E: Microsoft clones a voice from 3 seconds of audio using in-context learning

VALL-E clones any voice with just 3 seconds of reference audio, no fine-tuning needed, using in-context learning on EnCodec tokens. First zero-shot TTS at naturalistic quality.

Voice & Audio VALL-ETTSVoice Cloning