Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Jun 03, 2026

Gemma 4 12B is designed to bring high-performance multimodal intelligence directly to your laptop, combining mobile-first efficiency with advanced reasoning.

Olivier Lacombe

Director of Product Management, Google DeepMind

Gus Martins

Product Manager, Google DeepMind

Today, we are introducing Gemma 4 12B, our latest model designed to bring agentic multimodal intelligence directly to laptops. Bridging the gap between our edge-friendly E4B and our more advanced 26B Mixture of Experts (MoE), Gemma 4 12B packages powerful capabilities inside a reduced memory footprint. It is also our first mid-sized model to feature native audio inputs.

Thanks to the developer community, Gemma 4 models have now crossed 150 million downloads. You’ve built everything from wearable robotic arms for physical assistance to enterprise-grade AI security. We're excited to see what you build with this latest addition.

Here’s an overview of what makes Gemma 4 12B unique:

  • ...

Copyright of this story belongs solely to blog.google.