GPT-4V: ChatGPT learns to see (for real)

In one sentence: OpenAI activates GPT-4's vision capabilities in ChatGPT, six months after announcing them, and adds voice. Upload an image, talk about it, ask for analysis. Multimodality enters the consumer product.

In March 2023, OpenAI had said "GPT-4 can also look at images," but the feature stayed behind the curtain. Six months later, it is actually turned on in ChatGPT: you upload a photo and talk about it.

Examples that go viral: photo of an open fridge → "what can I cook?", screenshot of an error → "explain this bug", hand-drawn diagram → "write the corresponding code".

Together with voice (Whisper for input, proprietary TTS for output), ChatGPT stops being a text box and becomes a full multimodal interface.