Why the next leap in AI video is teaching avatars to see and listen

TL;DR

AI video is shifting from a fidelity race to an interactivity race. A new class of interactive avatar models can be graded on three levels: Level 1 (talk), Level 2 (talk and listen), and Level 3 (talk, listen, and see). The jump from Level 1 to Level 2, where an avatar learns to listen and react in real time, is the breakthrough that turns a talking face into a convincing conversational counterpart.

For the past few years, progress in generative video and AI avatars has been measured almost entirely in fidelity, with each new model delivering sharper detail, better physics, and smoother motion in longer clips. That race is far from over, but it is starting to miss a more interesting direction. Video, as an online media format, is evolving from a static, broadcast-like experience into a more interactive one.

Software is increasingly mediated...

Copyright of this story solely belongs to thenextweb.com.