ChatGPT Voice Mode Is Running a 2-Year-Old Model

May 28, 2026

Ask ChatGPT's voice mode what its knowledge cutoff date is. It'll tell you April 2024.

While the text interface has been on GPT-5 since early 2026, voice mode is quietly running a model that's two generations old — and most users have no idea.

Who Spotted It

Developer Simon Willison flagged it in April 2026, noting that ChatGPT's voice mode tells you its knowledge cutoff is April 2024 — it's a GPT-4o era model. The observation was inspired by an Andrej Karpathy post about the growing gap in AI capability depending on which access point you use.

As of February 2026, voice mode is still powered by GPT-4o or GPT-4o mini, depending on usage and plan. That's the same model OpenAI retired from the main ChatGPT interface in February 2026 for being outdated and sycophancy-prone.

What the Capability Gap Actually Looks Like

The text interface runs GPT-5.3 Instant by default, with GPT-5.4 Thinking available on paid tiers. Voice mode runs GPT-4o. That's not a minor version difference — that's a completely different generation of model.

OpenAI's voice mode will fumble the simplest questions, while at the same time OpenAI's highest-tier Codex model will go off for an hour to coherently restructure an entire codebase or find and exploit vulnerabilities in computer systems.

Same company. Same product. Wildly different capability depending on how you talk to it.

Why Voice Mode Is Stuck on an Old Model

The reason is a genuine technical constraint, not laziness. Real-time voice requires ultra-low latency inference. Current frontier models — GPT-5 and above — are too large and too slow to run voice-to-voice in real time cost-effectively at scale.

GPT-4o was specifically optimised for this. It processes audio natively, end-to-end, without converting speech to text first. The newer models haven't been optimised the same way yet.

Gemini is ahead here. For a direct comparison, Google's Gemini Live runs on Gemini 3.5 Flash, its latest model. That's a current-generation model, not a two-year-old holdover.

Why It Matters for Developers

If you're building on the OpenAI API and using voice or real-time audio, you're working with GPT-4o under the hood, not GPT-5. Anything that changed in the world after April 2024 is invisible to it unless you're injecting context via RAG or tool calls.
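To make the context-injection idea concrete, here is a minimal sketch in Python. It is not tied to any particular SDK: `Snippet`, `build_instructions`, and `ASSUMED_CUTOFF` are hypothetical names, the cutoff is simply the April 2024 date the voice model reports for itself, and the retrieval step that produces the snippets is assumed to happen elsewhere. The function keeps only material published after the cutoff and packs it, newest first, into a system prompt within a character budget.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: the cutoff voice mode reports for itself (April 2024).
ASSUMED_CUTOFF = date(2024, 4, 30)


@dataclass
class Snippet:
    """A hypothetical retrieved document chunk with a publication date."""
    title: str
    published: date
    text: str


def build_instructions(snippets, today, cutoff=ASSUMED_CUTOFF, max_chars=2000):
    """Build a system prompt that carries post-cutoff facts to the model."""
    # Only material newer than the cutoff needs injecting; newest first.
    fresh = sorted(
        (s for s in snippets if s.published > cutoff),
        key=lambda s: s.published,
        reverse=True,
    )
    lines = [
        f"Today's date is {today.isoformat()}.",
        f"Your training data may end around {cutoff.isoformat()}.",
        "Prefer the dated context below over your own recollection.",
    ]
    used = sum(len(line) for line in lines)
    for s in fresh:
        entry = f"[{s.published.isoformat()}] {s.title}: {s.text}"
        if used + len(entry) > max_chars:
            break  # stop once the prompt budget is exhausted
        lines.append(entry)
        used += len(entry)
    return "\n".join(lines)
```

The resulting string would go wherever your voice stack accepts system-level instructions. Telling the model the current date explicitly is a cheap addition, since a model with an old cutoff has no other way to know how much time has passed.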

By early 2026, GPT-4o's knowledge cutoff had already become a liability for developers — models were hallucinating libraries or API methods that had changed in late 2025. For voice applications, that problem is still live.

The bigger issue is transparency. Users paying $200 per month for ChatGPT Pro expect feature parity across modes. There's no prominent disclosure that voice mode runs a significantly weaker, older model. You have to ask the model itself to find out.

The Takeaway

Voice AI and text AI are not the same product right now, even inside the same app. If you're using ChatGPT voice for anything that requires current knowledge — recent docs, new frameworks, events from 2025 onward — you're talking to a model that doesn't know any of it.

For developers building voice-first products: know which model you're actually calling, check the knowledge cutoff, and plan your context injection accordingly. The capability gap between voice and text is real and it's not going away soon.
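One way to turn that checklist into a pre-flight check is sketched below. Everything in it is an assumption for illustration: the model names are placeholders, the cutoff dates are example values you would replace with figures from your provider's documentation, and the 12-month threshold is arbitrary.

```python
from datetime import date

# Hypothetical registry. Names and cutoff dates are placeholders, not
# verified values; fill them in from your provider's documentation.
MODEL_CUTOFFS = {
    "example-voice-model": date(2024, 4, 30),
    "example-text-model": date(2025, 8, 31),
}


def months_stale(model, today, registry=MODEL_CUTOFFS):
    """Calendar months between the model's cutoff and today (days ignored)."""
    cutoff = registry[model]
    return (today.year - cutoff.year) * 12 + (today.month - cutoff.month)


def needs_context_injection(model, today, max_months=12):
    """Flag models whose knowledge is older than the chosen threshold."""
    return months_stale(model, today) > max_months
```

Running a check like this at startup, or in CI, makes the cutoff a visible configuration decision rather than something discovered when a user asks about a recent event.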

Sources: Simon Willison, Wikipedia — GPT-4o, Build Fast With AI