Why Errors and Saturation Matter More Than You Think - Part 2
A practical guide
In the first part, I covered the first two signals that help you diagnose that something is wrong:
- Latency
- Traffic
Those two alone explain a surprising number of production incidents. But they don’t explain everything. Rising latency tells you a problem is developing. Traffic tells you what the system is dealing with.
I mentioned two more signals:
- Errors
- Saturation
These two tell you something more important: whether the system is approaching failure. This is where monitoring becomes truly operational. I will cover both signals in this post. Let us start with Errors.
Errors - The most misunderstood signal
Many teams think error monitoring is simple: count the failures and raise an alert when they increase. In practice, error metrics are rarely that straightforward.
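To make the "count failures, alert when they increase" idea concrete, here is a minimal sketch of that naive rule. The `Request` record, the function names, and the choice to treat HTTP status codes of 500 and above as failures are all assumptions for illustration, not something taken from a specific monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record: only the HTTP status code is kept.
    status: int


def count_failures(requests):
    """Count requests treated as failures (assumption: status >= 500)."""
    return sum(1 for r in requests if r.status >= 500)


def should_alert(previous_window, current_window):
    """Naive rule: alert whenever the failure count went up between windows."""
    return count_failures(current_window) > count_failures(previous_window)


# Example: one failure in the previous window, two in the current one.
previous = [Request(200), Request(500)]
current = [Request(500), Request(503), Request(200)]
alert = should_alert(previous, current)
```

Note that this rule gives every failure the same weight and ignores how many requests were served in each window.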
The first mistake teams make is treating all errors as equal. They are not. Some errors...
Copyright of this story solely belongs to hackernoon.com.