Google Gemini 1.0: natively multimodal in three sizes
In one sentence: Google announces Gemini in three sizes (Ultra, Pro, and Nano), which it presents as the first family of natively multimodal models (text, images, audio, video). Ultra scores 90.0% on MMLU against GPT-4's 86.4%. The launch demo video proves controversial.
Google ships Gemini, the family of models succeeding PaLM 2. The headline: Gemini is natively multimodal, meaning it was trained from the start on text, images, audio, and video as a single model. GPT-4 Vision, by contrast, is characterized as "bolting together" separate modules.
Three sizes:
- Ultra: the top model, claimed superior to GPT-4 on 30 of 32 benchmarks tested. Google reports it as the first model to exceed human-expert performance on MMLU (90.0%);
- Pro: the size that goes into Bard and Vertex AI, comparable to GPT-3.5;
- Nano: runs on-device, debuting on the Pixel 8 Pro.
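The launch is dented by a controversy: the "Hands on with Gemini" demo video was edited and sped up so that it looks like a real-time interaction. Google acknowledges that the responses were generated from text prompts and still images, not live video. Ultra isn't available at launch (it arrives in February 2024 in the product announced as Bard Advanced, which ships under the name Gemini Advanced).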
Companies
Google, DeepMind
Tools
Gemini Ultra, Gemini Pro, Gemini Nano, Bard