| Category | Metrics | Typical alert threshold |
|---|---|---|
| Reliability | P95 latency, timeout rate, retry rate | P95 latency +30% week-over-week |
| Quality | Groundedness, citation rate, user re-ask rate | Re-ask rate > 20% on top intents |
| Cost | Tokens/request, cost/successful task | Cost/task +25% without quality gain |
| Safety | Policy violation rate, escalation rate | Spike in violation rate within any intent cluster |
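
The numeric thresholds in the table can be expressed as a small weekly check. The following is a minimal sketch, not a definitive implementation: the `WeeklyMetrics` fields, the `check_alerts` function, and the single `quality_score` used to stand in for "quality gain" are all hypothetical names chosen for illustration. It also assumes "+30%" and "+25%" mean "at or above" that relative increase. The Safety row is omitted because the table gives no numeric threshold for it.

```python
from dataclasses import dataclass


@dataclass
class WeeklyMetrics:
    """One week of aggregated metrics (field names are illustrative assumptions)."""
    p95_latency_ms: float
    reask_rate: float      # fraction of top-intent requests followed by a re-ask
    cost_per_task: float   # cost per successful task, in any consistent currency
    quality_score: float   # hypothetical aggregate, e.g. mean groundedness


def relative_change(current: float, previous: float) -> float:
    """Fractional change from previous to current; 0.0 when previous is zero."""
    if previous == 0:
        return 0.0
    return (current - previous) / previous


def check_alerts(current: WeeklyMetrics, previous: WeeklyMetrics) -> list[str]:
    """Return the table categories whose threshold is breached this week."""
    alerts = []

    # Reliability: P95 latency up 30% week-over-week.
    if relative_change(current.p95_latency_ms, previous.p95_latency_ms) >= 0.30:
        alerts.append("reliability")

    # Quality: re-ask rate above 20% on top intents.
    if current.reask_rate > 0.20:
        alerts.append("quality")

    # Cost: cost per task up 25% with no accompanying quality gain.
    cost_up = relative_change(current.cost_per_task, previous.cost_per_task) >= 0.25
    quality_gain = current.quality_score > previous.quality_score
    if cost_up and not quality_gain:
        alerts.append("cost")

    return alerts
```

In practice each check would likely run per intent or per intent cluster rather than on global aggregates, so that a regression in one intent is not averaged away by the rest.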