GPT-4o: text, voice and images in a single model

In one sentence OpenAI unveils GPT-4o (omni), a single model that natively handles text, audio, and images with ~320 ms voice latency and GPT-4-class text quality — free for ChatGPT free users.

Verified Official source

ShareLinkedIn X

OpenAI introduces GPT-4o ("omni"). The difference with prior models is that text, voice, and images no longer go through separate models: a single model handles everything.

Practical result: you can talk to ChatGPT the way you talk to a person — real time, with interruptions and tone shifts. You can show a photo and discuss it. All of this is available on the free plan too.

For developers or sysadmins, the API costs half what GPT-4 Turbo did, and brings new capabilities (native voice, low latency) that enable use cases that were not viable before.