Experimenting with TPUs, GKE Managed DRANET, and Multi-cluster Inference Gateway


What happens when your workload fails in one region but you still need access to the service? This is a common availability and uptime scenario. With recent enhancements to the Kubernetes ecosystem, such as Dynamic Resource Allocation (DRA) and Inference Gateway, I decided to experiment with these capabilities in Google Cloud, using an AI inference workload as a simple test.

This post explores the setup. You can also jump straight to the detailed configs in the codelab "Build multi-cluster GKE Inference Gateway, with TPUs, Cloud Storage FUSE and managed DRANET".

Building blocks

To build out this experiment, use the following products, features, and tools:

  • Google Kubernetes Engine (GKE) managed DRANET: A managed feature that lets you request and share resources among Pods. It supports GPUs and TPUs. In this test, TPUs were used in two different regions with networking assigned...

Copyright of this story solely belongs to google.com. To see the full text click HERE
