GPT-4V: ChatGPT learns to see (for real)
In one sentence: OpenAI switches on in ChatGPT the GPT-4 vision capabilities it had announced six months earlier, and adds voice. Upload an image, talk about it, ask for analysis. Multimodality enters the consumer product.
In March, OpenAI had said "GPT-4 can also look at images", but the feature stayed behind the curtain. Six months later it is actually turned on in ChatGPT: you upload a photo and talk about it.
Examples that go viral: photo of an open fridge → "what can I cook?", screenshot of an error → "explain this bug", hand-drawn diagram → "write the corresponding code".
Together with voice (Whisper for input, proprietary TTS for output), ChatGPT stops being a text box and becomes a full multimodal interface.
Companies
OpenAI
Tools
GPT-4V, ChatGPT