AImpact
High Multimodal AI · 1 min read

GPT-4V: ChatGPT learns to see (for real)

In one sentence: OpenAI activates GPT-4's vision capabilities in ChatGPT, six months after announcing them, and adds voice. Upload an image, talk about it, ask for analysis. Multimodality enters the consumer product.

Verified Official source

In March 2023 OpenAI had said "GPT-4 can also look at images", but the feature stayed behind the curtain. Six months later it is actually turned on in ChatGPT: you upload a photo and talk about it.

Examples that go viral: a photo of an open fridge → "what can I cook?", a screenshot of an error → "explain this bug", a hand-drawn diagram → "write the corresponding code".

Together with voice (Whisper for input, proprietary TTS for output), ChatGPT stops being a text box and becomes a full multimodal interface.

Companies

OpenAI

Tools

GPT-4V, ChatGPT

Tags

OpenAI, GPT-4V, Vision, Multimodal, ChatGPT

Sources