Stop Calling Your LLM Once - Split the Call


The pattern almost everyone starts with

You have a prompt like "generate a 10-slide deck on X." Or "draft a 5-section report on Y." Or "write 8 unit tests for this file." The natural first move is to send one big prompt that asks for everything, parse the JSON that comes back, and render it.

This works in a demo. It often breaks in production. The break is rarely a single dramatic failure. It is a slow, steady set of three problems that show up as soon as real users hit the system.

one big LLM call:

    user clicks "generate"
            |
            v
    [........... 30 seconds of nothing ............]
            |
            v
    one giant JSON blob arrives
            |
            v
    parse it, render the whole UI at once
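In code, the shape above might look something like the sketch below. It is an illustration under assumptions, not the author's implementation: fake_llm is a hypothetical stand-in for a real model client, its short sleep stands in for the 30-second wait, and the {"slides": [...]} response shape is invented for the example.

```python
import json
import time
from typing import Callable


def fake_llm(prompt: str, delay_s: float = 0.05) -> str:
    """Hypothetical stand-in for a model call: blocks, then returns everything at once."""
    time.sleep(delay_s)  # stands in for the long silent wait in the diagram
    slides = [{"title": f"Slide {i}", "bullets": ["..."]} for i in range(1, 11)]
    return json.dumps({"slides": slides})


def generate_deck(topic: str, llm: Callable[[str], str] = fake_llm) -> list[dict]:
    """One big call: a single prompt asks for all ten slides."""
    prompt = (
        f"Generate a 10-slide deck on {topic}. "
        'Reply with JSON shaped like {"slides": [...]}.'
    )
    raw = llm(prompt)        # nothing can be shown until this returns
    deck = json.loads(raw)   # the whole blob is parsed in one step
    return deck["slides"]    # only now is there anything to render


def render(slides: list[dict]) -> list[str]:
    """Render every slide at once, after the full response has arrived."""
    return [f"{i}. {s['title']}" for i, s in enumerate(slides, start=1)]


if __name__ == "__main__":
    for line in render(generate_deck("X")):
        print(line)
```

Everything the user eventually sees depends on that single llm(prompt) line returning first.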

That is the shape. Three things go wrong with it.

Problem 1: Nothing happens for 30 seconds.

The first thing the user sees after clicking...

Copyright of this story solely belongs to hackernoon.com.
