Realtime voice AI: sub-second latency and multilingual become the norm
Realtime voice APIs from OpenAI, Google and ElevenLabs converge on < 500ms latency, fluent multilingual, natural prosody. Phone as an agentic channel becomes practical.
27 entries
Realtime voice APIs from OpenAI, Google and ElevenLabs converge on < 500ms latency, fluent multilingual, natural prosody. Phone as an agentic channel becomes practical.
Eighteen months after launch (November 2024), Model Context Protocol consolidates: thousands of public servers, confirmed cross-vendor adoption, first stable official registry.
New quantization techniques (high-quality 2-bit / 3-bit extensions) let frontier-sized reasoning models run on workstations with 32-64GB unified RAM.
OpenAI shuts down the Sora app on April 26, 2026; the Sora 2 API will be turned off September 24. Operating costs estimated around $1M/day, compute shifting to ChatGPT/GPT-5.5 and core enterprise.
DeepSeek releases V4 Preview as open source: V4-Pro (1.6T total, 49B active) and V4-Flash (284B total, 13B active). Native 1M-token context, hybrid CSA+HCA attention cutting KV cache by 90%.
OpenAI releases GPT-5.5, GPT-5.5 Thinking, and GPT-5.5 Pro: designed as an "agent runtime" for persistent multi-step workflows. 23% more factually correct vs GPT-5.4. File Library, side-by-side shopping, improved image gen.
Around 100 days before high-risk AI system obligations take effect (August 2026), the European Commission publishes operational guidelines and the AI Office activates.
Google ships two research agents on the Gemini API: Deep Research (fast) and Deep Research Max (deep + slow, 93.3% on DeepSearchQA). MCP support for private data, native visualizations via Nano Banana 2.
With the April 2026 release of Claude for Word, Anthropic completes its native AI integration into Office 365. Cross-app shared context, pivots/charts in Excel, slide editing in PowerPoint, contracts in Word.
A robotics lab (Physical Intelligence or peer) publishes a new multi-embodiment foundation model for general manipulation, trained on cross-robot datasets.
Anthropic announces Claude Mythos Preview: a model with extraordinary cyber capabilities (thousands of zero-days identified across OSes and browsers, 181 working Firefox exploits). Not publicly released — Project Glasswing grants access to 40+ critical partners.
Anysphere ships Cursor 3 (codename Glass): a new Agents Window with parallel agents across local, worktrees, cloud, and remote SSH. Built for developers who orchestrate agents rather than write every line.
OpenAI reorganizes Operator (January 2025) and ChatGPT Agent (July 2025) into a unified platform, with refreshed SDK and new async multi-task execution modes.
Mistral releases Small 4, unifying Magistral (reasoning), Pixtral (multimodal vision), and Devstral (agentic coding) into a single open-weight model, simplifying the deployment stack.
At GTC 2026 NVIDIA confirms its annual cadence: details on Rubin (Blackwell's successor), new rack-scale configurations, updated software stack for training and inference.
GitHub upgrades the Copilot agent: per-task model picker, self-review before opening PRs, code/secret/dependency scanning in-workflow, custom agents in .github/agents/, and CLI handoff. Copilot CLI hits GA the same day.
Google releases Nano Banana 2 (aka Gemini 3.1 Flash Image): much better text rendering, consistency for up to 5 characters and 14 objects, default for image generation in Gemini app, Flow, Lens, and Search AI Mode.
Mistral AI announces a new flagship with extended reasoning, open weights for the research variant. Europe's answer to DeepSeek R2 and the US reasoning models.
Google releases Gemini 3.1 Pro: 77.1% on ARC-AGI-2 (more than double Gemini 3 Pro), 80.6% SWE-Bench Verified, 94.3% GPQA Diamond. Same price as 3 Pro: $2/M input.
Anthropic releases Sonnet 4.6: 79.6% on SWE-bench Verified, 72.5% on OSWorld-Verified (on par with Opus 4.6), better prompt-injection resistance. Pricing unchanged at $3/$15.
Anthropic updates Sonnet to 4.7: focused on agent reliability over long tasks, better tool use, tighter integration with Claude Code and the Claude Agent SDK.
Anthropic releases Opus 4.6: first Opus with 1M-token context in beta, agent teams in Claude Code, leadership on Terminal-Bench 2.0 and Humanity's Last Exam. Pricing unchanged at $5/$25.
Mistral releases Voxtral Transcribe 2: two open-source STT models (Batch + Realtime, 4B params) with latency configurable down to 200ms, Apache 2.0, 13 languages.
DeepSeek ships R2, successor to R1: more efficient step-by-step reasoning, open weights, contained training cost. Fresh pressure on closed reasoning models.
Google DeepMind announces Gemini 3 with Pro and Flash variants: improved reasoning, native long context, deeper integration into Workspace and Android.
Google releases Veo 3.1 and Veo 3.1 Lite: video generation with "ingredients" (multiple reference images for character/scene consistency), 1080p/4K output, vertical format for Shorts. Veo 3.1 Lite is the cost-effective variant.
Anthropic ships Cowork as a research preview: a desktop agent with sandboxed shell and local file access, aimed at people who don't live in the terminal the way Claude Code users do.