OpenAI launches GPT-Realtime-2 and two new voice API models
GPT-Realtime-2 brings GPT-5-class reasoning to live voice. A separate translation model covers 70+ input languages. A streaming Whisper variant handles transcription. The pricing is aggressive enough to make the comparison unavoidable.
OpenAI released three new voice models in its API, broadening the range of surfaces where developers can plug GPT-class reasoning into live audio.
The three are GPT-Realtime-2, a successor to the company’s existing realtime voice model with what OpenAI describes as GPT-5-class reasoning; GPT-Realtime-Translate, a live translation model with more than 70 input and 13 output languages; and GPT-Realtime-Whisper, a streaming speech-to-text model built for low-latency transcription.
The release lands in the middle of a voice-AI build-out that the rest of the industry has spent the past year staffing for. Enterprises that have shipped voice agents have done so on a stack of stitched-together components: Whisper or Deepgram for transcription, ElevenLabs or Cartesia for text-to-speech, GPT-4 or Claude for...
Copyright of this story solely belongs to thenextweb.com.