About

Robin Ewers

Observability Studio is a practical place to learn observability, with a clear focus on
root-cause analysis for performance problems in modern systems.

What this site is for

Observability is often described with buzzwords, but in practice it’s simple:
it helps you answer why a system behaves the way it does, especially under load.
If you’ve ever stared at a p99 chart thinking “okay… but what exactly is causing this?”, you’re in the right place.

What you’ll learn here

  • The fundamentals: APM vs observability, traces vs metrics vs logs, tail latency (p95/p99), and the minimum telemetry needed for RCA.
  • Root-cause analysis: a repeatable workflow (scope → trace-first → metrics proof → classify → actions).
  • Common performance patterns: pool exhaustion, CPU throttling, downstream bottlenecks, retry storms, GC-related spikes, and how to prove each one.
  • Hands-on labs: practical exercises you can reproduce (often with OpenTelemetry-based demos) to build real debugging intuition.
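
To make the tail-latency item above concrete, here is a minimal sketch of why p95/p99 matter more than the average. The latency values are invented for illustration, and the nearest-rank method used here is only one of several percentile definitions; monitoring backends often interpolate or estimate from histograms instead.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 94 fast requests and 6 slow ones.
latencies_ms = [20] * 94 + [800, 850, 900, 950, 1000, 1200]

mean_ms = sum(latencies_ms) / len(latencies_ms)  # pulled up by the slow requests, but still looks mild
p50_ms = percentile(latencies_ms, 50)            # the typical request
p95_ms = percentile(latencies_ms, 95)            # where the slow tail starts to show
p99_ms = percentile(latencies_ms, 99)            # what the unluckiest users experience

print(f"mean={mean_ms:.1f} p50={p50_ms} p95={p95_ms} p99={p99_ms}")
```

In this made-up sample the median stays at 20 ms and the mean at roughly 76 ms, while p95 and p99 land at 800 ms and 1000 ms. The slow 6% of requests is only visible in the tail.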

The method (short version)

Most content on this site follows one simple idea:
don’t argue opinions; build evidence.

  1. Scope the problem (journey/endpoint, baseline vs regression, load profile, time window).
  2. Trace-first analysis (critical path, fan-out, waiting vs working, dominant spans).
  3. Prove or falsify hypotheses with metrics (correlation + breakdown by endpoint/version/pod/region).
  4. Classify the issue (downstream, saturation, pool/queue, contention, GC/memory churn, retry storms).
  5. Write it down as a short RCA summary (evidence + confidence + next actions).
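
Step 3 can be sketched in a few lines. The example below is a hedged illustration, not the site's tooling: the request records, the field names (`version`, `pod`, `latency_ms`), and the helper `p99_by` are all hypothetical. It shows how breaking p99 down by one dimension can suggest a hypothesis and a second breakdown can falsify it.

```python
import math
from collections import defaultdict

def p99(values):
    """Nearest-rank p99 of a non-empty list of latencies."""
    ordered = sorted(values)
    rank = max(1, math.ceil(99 * len(ordered) / 100))
    return ordered[rank - 1]

def p99_by(requests, dimension):
    """Group request records by one dimension and compute p99 latency per group."""
    groups = defaultdict(list)
    for request in requests:
        groups[request[dimension]].append(request["latency_ms"])
    return {key: p99(values) for key, values in groups.items()}

# Hypothetical request records (assumed shape, not a real telemetry schema).
requests = (
    [{"version": "v1", "pod": "a", "latency_ms": 30} for _ in range(50)]
    + [{"version": "v2", "pod": "a", "latency_ms": 30} for _ in range(45)]
    + [{"version": "v2", "pod": "b", "latency_ms": 900} for _ in range(5)]
)

# Hypothesis: "the v2 release is slow". The version breakdown seems to agree.
by_version = p99_by(requests, "version")

# A second breakdown of v2 traffic by pod tests that hypothesis.
v2_only = [r for r in requests if r["version"] == "v2"]
v2_by_pod = p99_by(v2_only, "pod")

print(by_version)
print(v2_by_pod)
```

With this invented data, the version breakdown blames v2, but the per-pod view shows v2 is fast on pod a and slow only on pod b. That shifts the classification in step 4 from "bad release" toward something local to one pod, such as saturation or throttling.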

Start learning

New to observability? Don’t start with tools. Start with questions.
The “Start Here” page gives you a simple path from fundamentals to practical RCA.
