GPT-4o: text, voice and images in a single model
In one sentence OpenAI unveils GPT-4o (omni), a single model that natively handles text, audio, and images with ~320 ms voice latency and GPT-4-class text quality — free for ChatGPT free users.
OpenAI introduces GPT-4o ("omni"). The difference with prior models is that text, voice, and images no longer go through separate models: a single model handles everything.
Practical result: you can talk to ChatGPT the way you talk to a person — real time, with interruptions and tone shifts. You can show a photo and discuss it. All of this is available on the free plan too.
For developers or sysadmins, the API costs half what GPT-4 Turbo did, and brings new capabilities (native voice, low latency) that enable use cases that were not viable before.
Companies
OpenAI
Tools
GPT-4o, ChatGPT
Tags
Sources