Our First Mistake Was Treating LLMs Like APIs
One of the biggest mistakes we made in our first LLM system was treating the model like any other standard API.
Send a request. Get a response. Return it to the user.
It started out fine. The first version was simple to build, simple to demo, and good enough for early users. As traffic grew, however, the problems became harder to ignore. Costs rose faster than we had anticipated. Latency became inconsistent. Similar requests returned slightly different results. Debugging was difficult because we had almost no visibility into what was happening inside the request flow.
It's not that the LLM was bad. The issue was the architecture that surrounded it.
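To make the contrast concrete, here is a minimal Python sketch, under stated assumptions, of the send-receive-return pattern next to a thin wrapper that records latency, token counts, and an estimated cost for every call. `fake_llm`, `CallRecord`, `InstrumentedClient`, and the per-token prices are hypothetical illustrations, not our production code or any provider's real API; a real system would swap in an actual model client and current pricing.

```python
import time
from dataclasses import dataclass, field


# Hypothetical stand-in for a real model client. It returns the completion
# plus prompt/completion token counts (approximated here by word counts).
def fake_llm(prompt: str) -> tuple[str, int, int]:
    completion = f"echo: {prompt}"
    return completion, len(prompt.split()), len(completion.split())


# The "standard API" pattern: send a request, get a response, return it.
# Nothing about latency, token usage, or cost is recorded.
def naive_handler(prompt: str) -> str:
    text, _, _ = fake_llm(prompt)
    return text


@dataclass
class CallRecord:
    prompt: str
    latency_s: float
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float


@dataclass
class InstrumentedClient:
    # Placeholder prices per 1,000 tokens (assumption); real prices vary by provider and model.
    input_price_per_1k: float = 0.001
    output_price_per_1k: float = 0.002
    records: list[CallRecord] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        start = time.perf_counter()
        text, p_tok, c_tok = fake_llm(prompt)
        latency = time.perf_counter() - start
        # Estimated cost from token counts and the placeholder prices above.
        cost = (p_tok * self.input_price_per_1k + c_tok * self.output_price_per_1k) / 1000
        self.records.append(CallRecord(prompt, latency, p_tok, c_tok, cost))
        return text

    def total_cost(self) -> float:
        return sum(r.cost_usd for r in self.records)


if __name__ == "__main__":
    client = InstrumentedClient()
    client.complete("summarize this ticket")
    client.complete("summarize this other ticket")
    for r in client.records:
        print(f"{r.prompt!r}: {r.latency_s:.6f}s, {r.prompt_tokens}+{r.completion_tokens} tokens")
    print(f"estimated total cost: ${client.total_cost():.6f}")
```

Both functions return the same text; the difference is that the wrapper leaves a per-call trail, so cost growth and latency variance become something you can measure per request instead of noticing only on the monthly bill.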
It Was a Mistake to Think of LLMs as Simple Endpoints
Typical APIs are predictable. The same input gives the same type of output. You can measure the response time,...
Copyright of this story solely belongs to hackernoon.com.