TECH NEWS

Google’s Gemini Omni turns images, audio, and text into video — and that’s just the start

When Google launched Gemini three years ago, the goal was to build a multimodal large language model — a single neural network that was trained on text, image, audio, and video and could generate content in any of those formats.

Today, at its Google I/O developer conference, the company took a concrete step toward that goal with Gemini Omni, a new family of multimodal models that Google CEO Sundar Pichai says will be able to “create anything from any input.”

Omni will start with video. Users can now combine images, audio, video, and text as inputs. Rather than simply stitching those inputs together, Omni reasons across all of them to produce a consistent output. The result is high-quality video that reflects an understanding of physics, culture, history, and science.

Omni also lets users edit photos with plain text commands rather than complex editing software, similar to Google’s Nano Banana.

...

Copyright of this story solely belongs to techcrunch.com.
