This section is still under construction. I’m building a set of tool-agnostic, repeatable playbooks
for performance root-cause analysis, so you can go from “it’s slow” to a clear causal chain backed by evidence.
What will be here soon
- Playbooks by symptom (p99 spikes, error bursts, saturation, timeouts, retry storms)
- Playbooks by cause class (CPU throttling, pool exhaustion, DB hot paths, GC/memory churn)
- Trace-first RCA workflow with checklists and decision trees
- Copy/paste templates for one-page RCA summaries and incident notes
Start here (already available)
- Root Cause Analysis Checklist
- Traces, Metrics, Logs
- p95 vs p99: Tail Latency
- Kubernetes signals for RCA
Tip: to follow along, bookmark this page. New playbooks will be added on an ongoing basis.