
How Nvidia Made Its ASR Models 3x Faster Than the Competition

Open the Hugging Face Open ASR Leaderboard and sort by RTFx, the inverse real-time factor. Among models with competitive WER, the top of the table is dominated by one family: Nvidia’s Parakeet TDT checkpoints. They process more than 3x as many seconds of audio per second of wall-clock time as the nearest competitor. Their word error rate is competitive with the rest of the top ten.
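
For readers unfamiliar with the metric, RTFx is simply the seconds of audio processed divided by the seconds of wall-clock time spent processing it, so higher is faster. A minimal sketch of the arithmetic follows. The function name `rtfx` and the figures in the example are illustrative assumptions, not numbers taken from the leaderboard.

```python
def rtfx(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio processed per second of compute."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return audio_seconds / wall_clock_seconds


# Made-up figures for illustration only: one hour of audio,
# transcribed in 2 seconds by one model and 6 seconds by another.
fast = rtfx(3600.0, 2.0)   # 1800.0
slow = rtfx(3600.0, 6.0)   # 600.0
speedup = fast / slow      # 3.0, i.e. "3x" in the sense used above
```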

A gap that wide is rarely just kernel engineering. The mechanism here is architectural. Nvidia's models use a modification to the RNN-Transducer called the Token-and-Duration Transducer, or TDT (Xu et al., 2023).

It changes the decoder loop in a small but consequential way. Instead of stepping through encoder frames one at a time, the model jointly predicts a token and the number of frames that token covers, then jumps.
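
The jump can be sketched as a greedy decoding loop. This is a toy illustration of the idea described above, not Nvidia's implementation: `joint` is a hypothetical stand-in for the model's joint network, the lookup table `schedule` replaces real predictions, and the zero-duration guard is an assumption added so the loop always terminates.

```python
from typing import Callable, List, Tuple

BLANK = "_"  # blank symbol: nothing is emitted for this step

# Hypothetical joint network: (frame index, tokens so far) -> (token, duration in frames)
JointFn = Callable[[int, List[str]], Tuple[str, int]]


def tdt_greedy_decode(num_frames: int, joint: JointFn,
                      max_symbols_per_frame: int = 10) -> Tuple[List[str], int]:
    """Greedy decode that advances by the predicted duration instead of one frame."""
    t = 0            # current encoder frame
    hyp: List[str] = []
    steps = 0        # number of joint-network evaluations
    stalled = 0      # consecutive zero-duration steps on the current frame
    while t < num_frames:
        token, duration = joint(t, hyp)
        steps += 1
        if token != BLANK:
            hyp.append(token)
        if duration == 0:
            stalled += 1
            # Assumed guard: never sit on one frame forever.
            if token == BLANK or stalled >= max_symbols_per_frame:
                duration = 1
        if duration > 0:
            stalled = 0
        t += duration  # the jump: skip the frames this token covers
    return hyp, steps


# Toy predictions: two words separated by stretches the model skips over.
schedule = {0: ("hi", 2), 2: (BLANK, 4), 6: ("there", 2), 8: (BLANK, 4)}


def toy_joint(t: int, hyp: List[str]) -> Tuple[str, int]:
    return schedule.get(t, (BLANK, 1))


tokens, joint_calls = tdt_greedy_decode(12, toy_joint)
# tokens == ["hi", "there"], joint_calls == 4 for 12 encoder frames
```

In this toy case the loop evaluates the joint function 4 times for 12 frames, whereas a loop that steps through frames one at a time would need at least 12 evaluations.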

On long utterances with stretches of silence or steady-state audio, that turns out...

Copyright of this story solely belongs to hackernoon.com.
