TECH NEWS

A Guide to AI Cold Starts on Cloud Run

I saw a developer asking on Reddit if there was any “sane way” to manage Cloud Run cold starts for AI across multiple regions. They were experiencing startup latencies of up to 20 seconds, a frustrating gap where the infrastructure is spinning up while the user waits for a response.

The discussion was full of developers who had almost given up on serverless GPUs, with some even migrating back to GKE just to escape the latency. I decided it was time to dive deep into the mechanics of AI cold starts and see if we could find that "sane way."

During my research into hosting models like Gemma 4 on Cloud Run, I had the privilege of co-presenting at Google Cloud Next '26 with Oded Shahar (Senior Engineering Manager for Cloud Run) and our guest speaker Ajay Nair (Global VP of Platform at Elastic).

In our session, "Build AI...

Copyright of this story solely belongs to google.com.
