How Semantic Routing and Caching Can Cut Enterprise LLM Spend by 50%

Your new AWS bill arrives: $47,000 instead of the $8,000 you expected. You didn't change the code. Traffic didn't increase. What you overlooked is that for 30 consecutive days every query, from the “Hello, how are you?” ones to the genuinely complex ones, went through GPT-4. Automatically. Invisibly.

That's not a bug. It's a missing router.

Check that invoice again. FAQ lookups. Form field extraction. Greeting messages. None of those required GPT-4. They needed a model at one-tenth the price that would have produced the same result. But no one set that up, so GPT-4 handled everything, which is like hiring a neurosurgeon to take your blood pressure.
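
One way to build the missing router is as a semantic classifier. Each incoming query is compared against example utterances for the cheap task types named above (greetings, FAQ lookups, form field extraction), and anything that doesn't match closely falls back to the premium model. This is a minimal Python sketch under stated assumptions. The model names, example utterances, and the 0.5 similarity threshold are all hypothetical. A bag-of-words vector also stands in for the embedding model a production semantic router would use.

```python
import math
import re
from collections import Counter

# Hypothetical tier names; substitute the models your provider actually offers.
CHEAP_MODEL = "small-model"
PREMIUM_MODEL = "large-model"

# Hypothetical example utterances for the simple task types. A real semantic
# router would embed these with an embedding model; bag-of-words vectors
# stand in here so the sketch runs on the standard library alone.
CHEAP_ROUTE_EXAMPLES = [
    "hello how are you",
    "hi there good morning",
    "what are your opening hours",
    "how do i reset my password",
    "extract the name and email from this form",
    "fill in the date field from this text",
]


def vectorize(text: str) -> Counter:
    """Lowercase the text, split it into word tokens, and count them."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


_CHEAP_VECTORS = [vectorize(example) for example in CHEAP_ROUTE_EXAMPLES]


def route(query: str, threshold: float = 0.5) -> str:
    """Send the query to the cheap tier only if it closely matches a known simple intent."""
    query_vector = vectorize(query)
    best = max(cosine(query_vector, v) for v in _CHEAP_VECTORS)
    return CHEAP_MODEL if best >= threshold else PREMIUM_MODEL


if __name__ == "__main__":
    print(route("Hello, how are you?"))  # small-model
    print(route("Draft a merger agreement covering cross-border tax liabilities"))  # large-model
```

Unmatched queries default to the premium tier. A routing miss therefore costs money rather than answer quality, and raising or lowering the threshold shifts that trade-off.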

Today the price gap between the cheapest and the most capable LLMs is 250x, so at enterprise scale routing is no longer optional: it's the difference between...
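
Some rough arithmetic shows what routing is worth. Suppose a share of requests goes to a model priced at some fraction of the premium price. Spend relative to the all-premium baseline is then share × price_ratio + (1 − share). The sketch below assumes every request uses the same number of tokens on either tier, which real traffic won't match exactly. The function names are illustrative.

```python
def blended_cost_ratio(share_routed: float, price_ratio: float) -> float:
    """Spend relative to sending every request to the premium model.

    share_routed: fraction of requests sent to the cheap model (0 to 1).
    price_ratio:  cheap model's price as a fraction of the premium price.
    Assumes every request uses the same number of tokens on either tier.
    """
    if not 0.0 <= share_routed <= 1.0:
        raise ValueError("share_routed must be between 0 and 1")
    # Routed requests pay the cheap price; the rest still pay full price.
    return share_routed * price_ratio + (1.0 - share_routed)


def share_needed(target_ratio: float, price_ratio: float) -> float:
    """Fraction of traffic to route so spend falls to target_ratio of the baseline."""
    if not 0.0 <= price_ratio < 1.0:
        raise ValueError("price_ratio must be in [0, 1)")
    # Solve share * price_ratio + (1 - share) = target_ratio for share.
    share = (1.0 - target_ratio) / (1.0 - price_ratio)
    if not 0.0 <= share <= 1.0:
        raise ValueError("target_ratio is unreachable at this price_ratio")
    return share


if __name__ == "__main__":
    print(f"{share_needed(0.5, 0.1):.3f}")        # 0.556
    print(f"{share_needed(0.5, 1 / 250):.3f}")    # 0.502
    print(f"{blended_cost_ratio(0.6, 0.1):.2f}")  # 0.46
```

Take a cheap model at one-tenth the price. Under these assumptions, routing about 56% of traffic to it halves spend. At a 250x gap (price_ratio = 1/250), just over half of traffic is enough.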

Copyright of this story solely belongs to hackernoon.com.

An analysis based on current valuations of OpenAI and Anthropic suggests ~$370B of philanthropic assets tied to the two AI companies are poised to become liquid

Sponsor Posts Niantic Spatial: World models need real-world data — Scaniverse is the gateway to spatial services — self-serve and built for AI and robotics. Large-area 3D reconstruction from 360° cameras and precise localization, anywhere machines operate. Protecting your Cloud Applications Data — Backing up Office 365, Google Workspace, Dropbox & Salesforce data

Q&A with Harvey CEO Winston Weinberg on launching the legal AI startup in 2022, how AI could shake up law firm business models, legal AI competition, and more

Two research papers describe how Google's Co-Scientist and nonprofit FutureHouse's AI tools can succeed at drug-retargeting tasks by forming hypotheses