AnythingLLM: full local RAG with web UI and embedded vector DB
158 entries
AnythingLLM delivers a full-stack RAG system with a web interface, Ollama/LocalAI LLM backend support, and an embedded vector database, all offline in a single container.
StyleTTS2 uses style diffusion and adversarial training to generate human-level natural voices on LJSpeech, open source, surpassing Voicebox on intelligibility.
Microsoft Research releases Phi-2, 2.7B params trained on 'textbook-quality' data. Beats LLaMA 2 7B and Mistral 7B on reasoning benchmarks, runs on laptops. 'Small + clean data' philosophy.
Mistral drops Mixtral 8x7B via magnet link with no warning: SMoE with 8 experts of 7B, 13B active params out of 47B total. Performance matches/exceeds GPT-3.5. Apache 2.0.
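The gap between 13B active and 47B total parameters in the Mixtral entry comes from sparse routing: each token is sent to only a few of the 8 experts, while attention layers are shared. A minimal sketch of top-2 routing in plain Python follows. The function names, the toy experts, and the example logits are illustrative assumptions, not Mistral's code.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def top_k_route(router_logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise their logits."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x: Vector, router_logits: Sequence[float],
              experts: Sequence[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Weighted sum of the selected experts' outputs; other experts never run."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def make_expert(scale: float) -> Callable[[Vector], Vector]:
    """Toy stand-in for an expert feed-forward block (assumption for the demo)."""
    return lambda x: [scale * v for v in x]


EXPERTS = [make_expert(i + 1) for i in range(8)]        # 8 experts, as in the entry
LOGITS = [0.1, 2.0, -1.0, 1.5, 0.0, 0.3, -0.5, 0.2]     # hypothetical router scores

if __name__ == "__main__":
    print(top_k_route(LOGITS))
    print(moe_layer([1.0, 2.0], LOGITS, EXPERTS))
```

Only the two selected experts are evaluated per token, which is why inference cost tracks the active parameter count rather than the total.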
Tesla shows Optimus Gen 2 with 30% faster movement, per-finger force sensors, and demonstrated ability to manipulate raw eggs without breaking them.
Google announces Gemini Ultra/Pro/Nano, the first family of natively multimodal models (text, images, audio, video). Ultra beats GPT-4 on MMLU 90.0% vs 86.4%. Controversial demo video.
Jan.ai launches its first stable release: an open source local LLM client with persistent threads, an extension system, and a built-in OpenAI-compatible server.
Apple Research releases MLX, an open source ML framework optimized for M1/M2/M3: it leverages unified CPU-GPU memory for LLM inference at near-discrete-GPU performance.
Stanford combines bimanual ALOHA arms with a mobile wheeled platform, creating the first low-cost system for whole-body manipulation. With 50 demonstrations it learns to cook, do laundry, and clean, opening the path to accessible household robots.
JetBrains launches AI Assistant out of beta, bringing intelligent refactoring, automatic documentation, and code chat to all its IDEs: IntelliJ, PyCharm, GoLand, WebStorm, and others.
01.ai by Kai-Fu Lee releases Yi-34B: 34B parameters trained on 3.1T tokens, modified Llama-2 architecture, bilingual EN/ZH, top-3 open weight in November 2023.
Anthropic ships Claude 2.1: 200K-token context window (~500 pages), 2× reduction in false statements on borderline questions, tool use in beta. A response to GPT-4 Turbo's 128K context.
OpenAI launches its TTS API with 6 voices, pricing at $0.015 per 1000 characters, low latency streaming, and direct integration into the ChatGPT and Assistants ecosystem.
Google makes MusicLM publicly available via Google Labs: musical generation from text description in a specific style, the first consumer music AI integration from a big tech company.
Upstage presents Solar 10.7B, created by merging intermediate layers of two copies of a Llama-2-architecture model (depth up-scaling) and topping the Hugging Face Open LLM Leaderboard at release.
LLaVA extends to video with frame sampling and temporal positional encoding, achieving competitive results on NExT-QA and ActivityNet without dedicated video training.
Amazon Q Developer brings AI coding directly into AWS consoles and IDEs: explains cloud resources, debugs errors, automatically migrates Java legacy code, and updates dependencies.
Ollama launches version 0.1: a minimal CLI to download and run local LLM models with a single command, reducing setup complexity to zero.
At OpenAI's first developer conference: GPT-4 Turbo (128K context, lower prices), GPTs (shareable custom ChatGPTs), Assistants API (managed agents). Product + dev pivot.
Elon Musk's xAI launches Grok-1, a model integrated with X (Twitter) for real-time information, with a 314B MoE architecture released as open weights in March 2024.
Pika Labs launches Pika 1.0: a consumer platform for video generation from text or image, region animation, and aspect ratio control. Reaches 500k Discord users. Funded by Khosla Ventures at $55M.
28 nations sign the Bletchley Declaration on catastrophic frontier AI risks. The first AI Safety Institute (UK) is established. First international diplomatic agreement specifically dedicated to AI.
Microsoft 365 Copilot reaches general availability at 30 USD/user/month. Copilot Studio also launches for building custom enterprise agents.
Biden signs the most sweeping executive order ever issued on AI: mandatory safety tests before frontier model releases, NIST standards for AI red-teaming, watermarking research, and new immigration rules for AI talent.
Whisper Large v3 reduces error rates on low-resource languages, improves timestamp accuracy and adds new language support, remaining the most widely deployed open-source ASR model.
Tsinghua University publishes LCM: distillation of a diffusion model reducing sampling from 50 steps to 4 with minimal quality loss. LCM-LoRA makes any SD model 10x faster. First technique enabling real-time generation on consumer hardware.
HuggingFace trains Zephyr-7B with dSFT + Direct Preference Optimization on Mistral 7B base, achieving an MT-Bench score higher than Llama-2-70B-chat with 10x fewer parameters.
Zoom bundles AI Companion into Pro plans at no extra cost: summarises meetings in real-time, extracts automatic action items, and replies in Zoom chat.
Sanctuary AI introduces Phoenix with Carbon AI, a neuro-symbolic system combining symbolic reasoning and neural nets to follow articulated linguistic instructions without explicit programming.
NVIDIA presents Eureka, the first system to use an LLM (GPT-4) to automatically generate reward functions for robotic reinforcement learning. The system achieves expert-level dexterous manipulation, including pen spinning, without manual reward design.
Google DeepMind and 33 academic labs pool data from 22 different robot types into Open X-Embodiment: more than 1 million episodes covering 527 skills, the first unified dataset for training generalist policies that work across multiple platforms.
LangChain launches LangGraph, a framework for building agents as node graphs with persistent state, support for cycles, conditional branching, and parallel execution of complex workflows.
MITRE releases ATLAS v2 (Adversarial Threat Landscape for AI Systems), an expanded taxonomy of AI system attack techniques with real adversarial ML case studies and mapping to MITRE ATT&CK.
XLab (SUTD Singapore) publishes OpenAgents: a deployable platform with three specialized agents (web browsing, data analysis, code execution) accessible from a browser without API keys. First demonstration of real agentic capabilities for non-technical users, with complete open-source code.
The WizardLM team applies Evol-Instruct to code, iteratively rewriting problems to increase complexity. WizardCoder-34B achieves 73.2% on HumanEval, matching GPT-4 at release time.
Tsinghua presents AgentBench, the first comprehensive benchmark for LLM agents across 8 operational environments, revealing a massive gap between GPT-4 and open-source models.
LLaVA-1.5 combines CLIP ViT-L, a two-layer MLP projection, and Vicuna to set state-of-the-art results on 11 multimodal benchmarks using only 1.2M training examples.
The Technology Innovation Institute releases Falcon-180B, the largest openly available model at 180 billion parameters trained on 3.5 trillion tokens, topping the HuggingFace Open LLM Leaderboard.
OpenAI launches DALL-E 3 integrated into ChatGPT: dramatically improved prompt adherence over DALL-E 2, automatic caption synthesis for training, more readable text in images.
Tsinghua introduces CogVLM with a visual expert module independent from LLM parameters, eliminating performance degradation on pure text and reaching SOTA on VQA and OCR.
AudioPaLM fuses PaLM-2 with an audio tokenizer to create an LLM that natively processes audio and text tokens, enabling speech translation while preserving speaker identity.
HuggingFace open-sources chat.huggingface.co: a self-hostable web interface via Docker for Llama 2, Mistral, Code Llama, and custom models, with support for tool calls and web search.
Mistral AI (Paris), a three-month-old startup founded by ex-Meta/DeepMind researchers, releases Mistral 7B under Apache 2.0. Beats Llama 2 13B on most benchmarks with half the parameters.
CMU and UPenn publish PAIR: an attacker LLM that automatically refines its prompts against a target LLM, finding effective jailbreaks in under 20 queries with no human in the loop.
NVIDIA open-sources TensorRT-LLM, a framework for compiling and optimizing LLMs for NVIDIA GPUs with out-of-the-box FP8, INT4, sparse attention, and multi-GPU tensor parallelism support.
With update 23H2, Windows 11 integrates Copilot by default as a system side panel. Bing Chat is rebranded to Copilot. AI as an OS feature, not an app.
AWS invests 1.25 billion dollars in Anthropic. Claude becomes available on Amazon Bedrock using dedicated Trainium and Inferentia infrastructure.
ChatGPT Plus on iOS/Android gets voice conversations (5 synthetic voices) and image input (GPT-4V). From text chat to a full conversational assistant.
OpenAI activates GPT-4's vision capabilities in ChatGPT (announced six months earlier) and adds voice. Upload an image, talk about it, ask for analysis. Multimodality enters the consumer product.
Slack integrates native AI into Pro+ plans: summarises channels and threads, answers questions about conversation history, supports Claude and OpenAI as LLM providers.
Adobe launches Firefly Enterprise in Creative Cloud Teams with legal copyright indemnification and enterprise brand guidelines control over every generated image.
ExLlamaV2 introduces the EXL2 format with per-layer mixed bit-rates (2-8 bit), delivering higher throughput than llama.cpp on NVIDIA GPUs and enabling 70B models to run on a single RTX 3090.
Researchers from Princeton, UIUC, and collaborators introduce Medusa: N additional decoding heads on the main model predict N tokens ahead simultaneously, for a reported 2.2x speedup without needing a second draft model.
Researchers demonstrate that fine-tuned LLMs can contain silent behavioral backdoors, activatable only when specific triggers invisible during normal model evaluation are present.
Adobe launches Firefly 1.0 GA, the first image generation model trained exclusively on licensed content, integrated into Photoshop as Generative Fill for commercially safe use.
Tencent AI Lab releases IP-Adapter, a lightweight adapter for Stable Diffusion that conditions generation on a reference image without retraining the base model.
An LLM running locally that can write and execute Python, JS, and Shell code autonomously, browse the web, and modify files on your computer.
Microsoft Research shows that 1.3B parameters trained on 'textbook quality' synthetic data produce multi-step reasoning comparable to models five times larger.
LM Studio launches its first public release: a graphical interface to browse, download, and use local LLMs with a built-in chat and OpenAI-compatible server.
Meta releases AudioCraft, an open source suite including MusicGen for generating structured music and AudioGen for ambient sounds, both controllable via text description.
OpenAI launches the enterprise ChatGPT plan: unlimited GPT-4, 32K context, advanced data analysis included, SOC 2, customer data never used for training. A response to enterprise IT concerns.
SuperAGI offers an open-source platform for autonomous agents with a web dashboard, tool marketplace, and the ability to run agents in the background without writing code. First solution to bring the 'monitor agent' experience to non-programmers. Concurrent with AutoGPT but more production-oriented.
Meta releases Code Llama (7B, 13B, 34B), a code-specialized fine-tune of Llama 2. Three variants per size: base, Python-specific, instruction-tuned. Llama 2 commercial license.
Shanghai AI Lab publishes AnimateDiff: a plug-in motion module that adds temporal consistency to any existing SD checkpoint, turning every image-only model into a video generator without retraining it.
DeepSeek releases coding models from 1B to 33B parameters trained on 2 trillion tokens with advanced FIM training, topping HumanEval among all open-weight models.
LAION and University of Washington release OpenFlamingo, an open-source reproduction of DeepMind's Flamingo: few-shot visual learning from image+text examples, available in 3B and 9B parameter variants. The first open model enabling multimodal research without API costs.
Google announces TPU v5e, a cost-optimized AI chip with 4x better performance per dollar compared to TPU v4 for inference, available through Google Kubernetes Engine for containerized workloads.
Sourcegraph launches Cody in beta, an AI code assistant that understands the entire codebase — dependencies, architecture, cross-file relationships — thanks to Sourcegraph's code index.
OWASP publishes the first official list of the 10 most critical vulnerabilities in LLM applications, from prompt injection to insecure output handling, now the industry reference standard.
DeepMind's RT-2 merges vision-language pretraining with robot control, transferring semantic reasoning from the web to a physical arm without task-specific training.
Tri Dao rewrites FlashAttention with 2x speedup over FA1: better parallelism across seq-len, head-dim support up to 256, query parallelism for MHA, MQA, and GQA. De facto training standard.
Microsoft Research trains Orca 13B on step-by-step GPT-4 explanations (explanation traces), outperforming ChatGPT on BigBench and AGIEval with 13 billion parameters.
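Stability ships SDXL 1.0 (3.5B-parameter base model, 6.6B-parameter base-plus-refiner pipeline), with native 1024×1024 output and good results from shorter prompts. Weights are on HuggingFace under the CreativeML Open RAIL++-M license, which permits commercial use.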
Meta releases Llama 2 (7B, 13B, 70B) under a license that allows commercial use up to 700M MAU. For the first time a serious LLM is genuinely deployable to production without depending on an API.
SeamlessM4T is the first multimodal system to handle speech-to-text, text-to-speech, and speech-to-speech across 100+ languages in a single model, powering Meta's real-time translation features.
Microsoft Research publishes AutoGen, a framework where you define agents with different roles and let them converse with each other to solve a task. First framework to formalize the 'agent-to-agent communication' pattern. Becomes the foundation of many enterprise multi-agent workflows.
The first LLM explicitly trained for criminal activity appears on the dark web: no safety filters, fine-tuned on malware data, sold as a monthly subscription.
Anthropic launches Claude 2 with a 100,000-token context window (~75,000 words) and opens claude.ai to the general public (initially US and UK). Long-context enters the mainstream.
IBM unveils watsonx.ai at Think 2023: a platform featuring Granite models trained on curated data, a fine-tuning studio, AI factsheets for governance, and full data lineage. Built for banking, healthcare, and government.
Zou et al. (CMU) demonstrate optimized suffixes that simultaneously jailbreak GPT-3.5/4, Claude, and Bard: the first systematic proof of attack transferability across different models.
MIT and Northeastern propose Reflexion: agents that self-reflect in natural language after each failure, accumulating insights in episodic memory without modifying weights.
MetaGPT assigns each LLM agent a specific company role (PM, Architect, Engineer, QA) and has them collaborate to produce working code from a single text requirement.
llama.cpp introduces K-quants (Q2_K through Q8_K): per-layer quantization assigning different bit-widths based on tensor importance. Q4_K_M matches Q5_1 quality at a smaller file size, becoming the de facto standard for all modern GGUF models.
Anton Osika publishes GPT-Engineer on GitHub: describe what you want in natural language, the agent asks clarifying questions, then writes all the files and runs them. 50k stars in one week. First viral implementation of the 'one-shot project generator' concept.
MIT Han Lab publishes AWQ: 4-bit quantization that preserves salient weights identified through activation analysis, achieving better accuracy-throughput than GPTQ for edge deployment.
Lakera Guard is a SaaS API that protects LLM applications from prompt injection, jailbreak, and PII leakage with sub-millisecond latency, designed for high-traffic production environments.
Voicebox uses flow matching with masked training to synthesize, edit, and transfer vocal styles across 6 languages, with no explicit cloning or fine-tuning.
HuggingFace releases IDEFICS, an open-weight replica of Flamingo in 9B and 80B versions, trained on LAION-5B and WikiMedia with few-shot visual in-context learning.
WizardLM uses Evol-Instruct — instructions automatically simplified and complicated by GPT-4 — achieving 97% of ChatGPT on WizardEval with a 70B model.
OpenAI adds 'function calling' to the API: the model returns structured JSON conforming to a schema, enabling reliable tool integrations without fragile prompt engineering.
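On the application side, function calling amounts to: parse the structured reply, check it against the declared schema, then call the matching function. The sketch below shows that loop without contacting any API. The `get_weather` tool, its schema, and the simplified reply shape (a `name` plus a JSON-encoded `arguments` string) are assumptions for illustration.

```python
import json
from typing import Any, Dict

# Map JSON Schema type names to Python types (subset, for the sketch).
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def get_weather(city: str, unit: str = "celsius") -> str:
    # Hypothetical stub; a real tool would query a weather service here.
    return f"weather lookup for {city} in {unit}"


# Tool registry: the schema is what would be advertised to the model.
TOOLS: Dict[str, Dict[str, Any]] = {
    "get_weather": {
        "handler": get_weather,
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "unit": {"type": "string"}},
            "required": ["city"],
        },
    }
}


def validate_arguments(schema: Dict[str, Any], args: Any) -> None:
    """Reject missing, unknown, or wrongly typed arguments."""
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in props:
            raise ValueError(f"unexpected argument: {name}")
        if not isinstance(value, _JSON_TYPES[props[name]["type"]]):
            raise ValueError(f"wrong type for argument: {name}")


def dispatch(model_reply: str) -> Any:
    """Parse a structured function-call reply, validate it, and run the tool."""
    call = json.loads(model_reply)
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise ValueError(f"unknown tool: {call['name']}")
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    validate_arguments(tool["parameters"], args)
    return tool["handler"](**args)


if __name__ == "__main__":
    reply = json.dumps({"name": "get_weather",
                        "arguments": json.dumps({"city": "Paris"})})
    print(dispatch(reply))
```

Validating before dispatch matters because the model's JSON is still generated text: the schema narrows what it produces but the application should not trust it blindly.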
Suno AI releases Bark on HuggingFace: an open source TTS model capable of generating paralinguistics — laughter, sighs, sound effects, music — directly from text prompts.
GitHub announces Copilot X with GPT-4-based chat integrated in VS Code, automatic PR description and test generation, a CLI assistant, and voice coding in preview.
Microsoft Research releases Phi-1, 1.3B parameters trained on high-quality synthetic data ('textbooks'), outperforming models 10x larger on HumanEval.
HuggingFace releases Text Generation Inference, an optimized Docker container for serving LLMs in production with continuous batching, tensor parallelism, and integrated Flash Attention 2.
UC Berkeley presents Gorilla, a retrieval-augmented fine-tuned LLaMA for accurate API calls: reduces API hallucination from 83% to 3%, outperforming GPT-4 on this task.
MIT and Columbia apply denoising diffusion models to robot imitation learning, learning multi-modal action distributions instead of deterministic policies. They achieve a 46.9% improvement on manipulation benchmarks.
Salesforce extends BLIP-2 with visual instruction tuning on 26 datasets, beating GPT-4V on visual reasoning benchmarks with an open architecture.
Princeton and DeepMind propose Tree of Thoughts: the LLM generates and evaluates multiple reasoning paths as a search tree, clearly outperforming Chain-of-Thought.
Stability AI launches SDXL 0.9 beta with dual-encoder architecture and separate refiner model for photographic-quality 1024x1024 images.
At Build 2023 Microsoft announces Windows Copilot, Copilot in Edge and 365, and adopts OpenAI's plugin standard. Strategy: 'AI co-pilot' as the primary UI.
The Technology Innovation Institute UAE releases Falcon 40B: trained on 1T tokens of RefinedWeb, it beats LLaMA 65B on benchmarks with a commercial license.
SoundStorm uses MaskGIT-style parallel decoding on SoundStream codec tokens to generate audio in parallel rather than token-by-token: 30s of dialogue in 0.5s, preserving speaker consistency.
NVIDIA creates Voyager, a lifelong-learning agent in Minecraft that uses GPT-4 to write skills in JavaScript and accumulate them in a persistent library, never forgetting.
First public demonstration of an enterprise LLM agent on real, sensitive operational data: military logistics routing via natural language. AIP sandboxes LLM outputs from raw data access. A turning point for AI in defense and government.
Stanford presents TidyBot, a robotic system that uses LLMs to personalize household tidying behavior from a few user examples. It achieves 91.2% task completion, demonstrating the feasibility of LLM-driven personalization in manipulation.
imartinez publishes privateGPT: full RAG on PDFs and TXT with a local LLM, zero cloud data. Your knowledge base stays on your disk.
Nomic AI launches GPT4All v2: a desktop installer that downloads and runs quantized models with no command line required, including LocalDocs for private document Q&A with no internet connection.
mudler releases LocalAI, an OpenAI-compatible REST server that runs GGML/GGUF models locally: migrate your apps from cloud to self-hosted by changing only the URL.
At Google I/O 2023, PaLM 2 replaces LaMDA in Bard. Four sizes (Gecko, Otter, Bison, Unicorn), strong multilingual support and improved reasoning. Spawns Med-PaLM 2 and Sec-PaLM.
ServiceNow embeds an LLM directly into its ITSM platform, summarising open tickets, suggesting resolutions, and automating escalations with no external plugins.
MosaicML launches MPT-7B under Apache 2.0 with a 65,000-token context window via ALiBi, the first open model explicitly designed for unrestricted commercial deployment.
BigCode and HuggingFace release StarCoder, a 15.5B-parameter model trained on 1 trillion tokens from The Stack across 86 languages, with an opt-out data governance system.
KAUST shows how to build a capable visual chatbot by connecting BLIP-2 and Vicuna with a single projection layer trained on 5,000 image-text pairs. The first demonstration that hours of single-GPU training are sufficient to create a working VLM.
LLaVA combines CLIP + LLaMA with 150k GPT-4-generated examples to create the first quality open-source visual assistant.
Stability AI releases StableLM 3B and 7B under CC BY-SA 4.0, trained on 1.5T tokens. Open response to closed models, but quality still trails LLaMA.
Microsoft Presidio reaches general availability: open source framework for detecting and anonymizing personal data in LLM-processed text, with NER and regex for 50+ entity types.
LMSYS fine-tunes LLaMA-13B on 70,000 ShareGPT conversations and produces an open-source chatbot that GPT-4, used as judge, rates at 90% of ChatGPT quality.
AWS announces Bedrock, a managed service exposing Claude (Anthropic), Jurassic-2 (AI21), Stable Diffusion, and its own Titan via one API. AWS's answer to Azure OpenAI.
Stanford creates 25 LLM-based agents simulating daily life in a virtual village, with episodic memory, reflection, and planning — the first credible artificial society.
Yohei Nakajima publishes BabyAGI, an autonomous task manager in ~200 Python lines using GPT-4 and Pinecone that creates and executes subtasks in an infinite loop, viral on Twitter within 24 hours.
A developer publishes AutoGPT on GitHub: given a text goal, the system calls GPT-4 in a loop to plan tasks, execute them, and self-criticize. In two weeks, becomes the most-starred repo in history.
Nomic AI releases GPT4All, a point-and-click installer to run LLMs offline on Windows, Mac, and Linux, lowering the technical barrier to almost zero.
The most-starred open-source web interface for running local LLMs: supports GPTQ, GGML, transformers backends with Gradio UI, extensions, character cards, and chat/instruct modes.
OpenAI ships plugins for ChatGPT: the model can browse the web, run Python in a sandbox, book flights (Expedia, Kayak), order groceries (Instacart). First big mainstream tool-use experiment.
Codeium launches its AI code assistant completely free for individual developers, supporting over 70 languages and integrating with VS Code, JetBrains, and Vim.
Microsoft Research uses ChatGPT as a central planner that decomposes complex tasks and delegates execution to specialized HuggingFace models for vision, audio, and NLP.
Meta releases Llama Guard, a fine-tuned LLaMA classifier that identifies dangerous inputs and outputs across 6 harm categories, designed as a plug-in safety layer for LLM applications.
Google opens Bard public preview in US and UK, based on a lightweight LaMDA. Reception is lukewarm: slow, cautious, less useful than ChatGPT.
Runway launches Gen-1: the first commercial model that applies a visual style from text or a reference image to an existing video, frame by frame. Precursor to the Gen-2/Gen-3 line.
Microsoft open-sources Semantic Kernel, a C#/Python/Java SDK for integrating LLMs into enterprise apps. Introduces 'skills' (reusable AI functions) and 'planners' (auto-chaining toward a goal). Becomes Microsoft's standard AI orchestration layer for Copilot builds.
Tesla releases the first video of Optimus Gen 1 walking and performing tasks autonomously in a real factory environment, with a stated target price of 20,000 dollars.
Microsoft announces Copilot across the M365 suite: AI for 300M+ enterprise users, powered by GPT-4 and Microsoft Graph for business context.
PyTorch 2.0 introduces torch.compile built on TorchDynamo and the Inductor backend, delivering up to 2x speedup on transformers without code changes, making PyTorch competitive with XLA/JAX for production workloads.
Anthropic launches Claude, an AI assistant trained with Constitutional AI. Same day as GPT-4. Two versions: Claude (full) and Claude Instant (faster and cheaper).
Google announces Duet AI for Workspace: assisted writing in Docs, email summaries in Gmail, slide generation in Slides, and formula help in Sheets.
OpenAI releases GPT-4, multimodal (text + image), with reasoning, coding, and reliability clearly beyond GPT-3.5. Passes bar, medical, and coding exams.
KAUST presents CAMEL, a role-playing framework where an 'AI user' LLM and an 'AI assistant' LLM autonomously collaborate on tasks without human intervention at each step.
Georgi Gerganov brings Meta's LLaMA to consumer CPUs via 4-bit C++ quantization: the first foundation model practically usable offline on a laptop.
Salesforce embeds generative AI directly into its CRM, suggesting sales emails, case replies, and Salesforce Flow code without leaving the platform.
Google presents PaLM-E, a 562B-parameter multimodal model that feeds images and robot state directly into the transformer, capable of long-horizon planning on real robots.
DeepMind introduces RoboCat, a robotic agent that learns from few demonstrations, self-trains by collecting new data, and improves iteratively. It can pick up a new task from as few as 100 demonstrations; the initial version succeeds on 36% of previously unseen tasks, a rate that more than doubles after self-improvement.
Agility Robotics announces partnership with Amazon for Digit v3, a bipedal warehouse robot — first real-scale industrial deployment of a humanoid.
OpenAI ships the ChatGPT API (gpt-3.5-turbo) at one tenth the price of text-davinci-003, plus Whisper API for speech-to-text. The wrapper era begins.
Meta releases LLaMA in four sizes (7B, 13B, 33B, 65B), available to researchers on request. One week later, the weights leak publicly.
Amazon launches CodeWhisperer GA with a unique feature: it flags when generated code resembles open source snippets, showing the license and source repo. Free tier for individual developers.
Zhang et al. introduce ControlNet, an adapter adding pose, depth, and edge control to Stable Diffusion without modifying the base model weights.
Meta AI presents Toolformer: an LLM that autonomously learns when and how to call external tools (calculator, Wikipedia, calendar) using self-supervised examples only.
The UC Berkeley team releases vLLM, a Python library for LLM inference using PagedAttention to manage KV cache like OS virtual memory, achieving 24x throughput over the HuggingFace baseline.
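The virtual-memory analogy in the vLLM entry can be made concrete with a block table: each sequence maps logical token positions to fixed-size physical blocks that are allocated on demand and returned to a free pool when the sequence finishes. The class below is a toy sketch of that bookkeeping only; the names are hypothetical and this is not vLLM's API.

```python
from typing import Dict, Hashable, List, Tuple


class PagedKVCache:
    """Toy block-table allocator in the spirit of PagedAttention."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free: List[int] = list(range(num_blocks))  # pool of physical block ids
        self.tables: Dict[Hashable, List[int]] = {}     # seq id -> physical blocks
        self.lengths: Dict[Hashable, int] = {}          # seq id -> tokens stored

    def append_token(self, seq_id: Hashable) -> Tuple[int, int]:
        """Reserve a slot for one more token; return (physical block, offset)."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % self.block_size == 0:  # no block yet, or the last block is full
            if not self.free:
                raise MemoryError("no free KV blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, seq_id: Hashable) -> None:
        """Return all of a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=4, block_size=2)
    for _ in range(3):
        print("seq a ->", cache.append_token("a"))
    print("seq b ->", cache.append_token("b"))
    cache.release("a")
    print("free blocks:", len(cache.free))
```

Because blocks are claimed only when a sequence actually grows into them, memory is not reserved up front for the maximum possible length, which is the source of the batching headroom the entry describes.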
Microsoft integrates conversational AI into Bing (later revealed to run on pre-release GPT-4) that answers with direct citations from web pages. The Google 'code red' moment.
Salesforce introduces BLIP-2: a lightweight Q-Former bridges frozen visual encoder and frozen LLM, achieving SOTA captioning with 8x fewer trainable parameters.
XTTS brings multilingual zero-shot voice cloning to open weights: a 6-second audio sample is enough to replicate a voice across 17 languages. The model is released under the Coqui Public Model License.
Google shows how an LLM directly generates executable robot code from natural-language instructions, without robotic fine-tuning, using hierarchical function composition.
ElevenLabs exits public beta with 1-minute voice cloning, 29 languages, and prosodically natural TTS, establishing itself as the reference for creators and audiobooks.
The US government publishes the first official framework for managing AI risks in organizations: four core functions — Govern, Map, Measure, Manage.
Chen et al. (DeepMind) publish speculative sampling, building on concurrent speculative decoding work from Google: a small model proposes tokens, the large model verifies them in parallel. The output distribution is unchanged, with a reported 2-3x speedup.
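The propose-then-verify loop can be sketched with two toy deterministic "models". This is the greedy special case, where a drafted token is accepted only if it equals the large model's own choice; the published methods use a rejection-sampling rule so that sampled outputs also keep the large model's distribution. The toy models and all names below are assumptions for illustration.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token context to the next token


def target_next(ctx: List[int]) -> int:
    """Toy stand-in for the large model: an arbitrary deterministic rule."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[int]) -> int:
    """Toy small model: agrees with the target except at every 4th position."""
    guess = target_next(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n: int, k: int = 4) -> Tuple[List[int], int]:
    """Return n tokens plus the number of target verification passes used."""
    out = list(prompt)
    produced = 0
    passes = 0
    while produced < n:
        # 1. The draft model proposes up to k tokens autoregressively.
        ctx = list(out)
        proposal = []
        for _ in range(min(k, n - produced)):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2. The target checks every proposed position. A real system scores
        #    all positions in one batched forward pass; here we count it as one.
        passes += 1
        ctx = list(out)
        for token in proposal:
            expected = target(ctx)
            ctx.append(expected)  # the accepted token, or the target's correction
            produced += 1
            if expected != token:
                break  # discard the rest of the draft after the first mismatch
        out = ctx
    return out[len(prompt):], passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    tokens, passes = speculative_decode(target_next, draft_next, prompt, n=12)
    print(tokens == greedy_decode(target_next, prompt, 12), passes)
```

Every emitted token is one the large model would have chosen itself, so the result matches plain decoding; the saving comes from needing fewer sequential passes of the large model whenever the draft guesses correctly.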
Microsoft makes OpenAI models (GPT-3.5-Turbo, Codex, DALL-E) available on Azure with enterprise SLA, VNet isolation, HIPAA and SOC2 compliance. A watershed moment for enterprise AI adoption.
Georgi Gerganov brings OpenAI's Whisper model to CPU via a minimal C++ implementation: real-time transcription with no GPU and no cloud.
VALL-E clones any voice with just 3 seconds of reference audio, no fine-tuning needed, using in-context learning on EnCodec tokens. First zero-shot TTS at naturalistic quality.