The Cost of Observability — Cardinality

A metrics backend like Prometheus stores one independent time series for every distinct combination of metric name and label values, indexed by a hash of that label set. Storage, memory, and query cost scale with the number of series rather than the number of data points, so attaching an unbounded label such as user_id or request_id silently multiplies the series count into the billions and can take the whole backend down. This is the single most common way self-hosted observability blows its budget, and it fails not with a clean error but with a slow OOM-and-restart death spiral.

Why it matters

Observability is supposed to be the cheap insurance you buy against production surprises. But metrics have a peculiar cost curve: an extra label value is nearly free, while an extra high-cardinality label is catastrophic, because cardinality multiplies. Engineers reach for "just add user_id so we can slice by customer" without realizing they have converted a 40,000-series metric into a 40-billion-series time bomb. Getting the metrics-vs-logs-vs-traces boundary right is what keeps a monitoring bill at hundreds of dollars instead of hundreds of thousands.

Cardinality is a product, not a sum

The total number of series a metric can produce is the Cartesian product of the count of distinct values across all its labels. Take a request counter:

http_requests_total{method, status, endpoint}

Trace the arithmetic with realistic value counts:

| Label | Distinct values | Running product |
|---|---|---|
| method | 5 (GET, POST, PUT, DELETE, PATCH) | 5 |
| status | ~40 (all HTTP codes seen) | 200 |
| endpoint | ~200 route templates | 40,000 |

40,000 active series is comfortable: at a few KB of RAM per series, that is on the order of 100 MB. Now a well-meaning engineer adds user_id to "debug per-customer latency," with 1,000,000 users:

| Label added | Distinct values | Series |
|---|---|---|
| + user_id | 1,000,000 | 40,000,000,000 (40 billion) |
| + request_id | new every request | unbounded (grows forever) |

The jump is multiplicative, not additive: you didn't add a million series, you multiplied the existing 40,000 by a million. And request_id is worse than large — it is unbounded: every request mints a value that is never reused, so the series count climbs without limit until the process dies. This is called a cardinality explosion (or, when driven by user input, a cardinality bomb).
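The arithmetic above can be checked in a few lines. This is a minimal sketch using the value counts from the tables; the function name `worst_case_series` is made up for illustration, and the product is an upper bound, since not every label combination necessarily occurs in practice.

```python
from math import prod

# Distinct-value counts taken from the tables above.
base_labels = {"method": 5, "status": 40, "endpoint": 200}


def worst_case_series(label_cardinalities):
    """Upper bound on series: the product of distinct values per label."""
    return prod(label_cardinalities.values())


base = worst_case_series(base_labels)
with_user = worst_case_series({**base_labels, "user_id": 1_000_000})

print(f"base:       {base:,}")
print(f"+ user_id:  {with_user:,}")
# Compare with what an additive model would have predicted.
print(f"if it were a sum: {base + 1_000_000:,}")
```

The last line is the point of the exercise: an additive model predicts about a million series, while the real figure is the base count multiplied by a million.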

Where the cost actually lives inside the engine

To see why series count — not sample count — is the cost driver, follow one sample into a Prometheus-style TSDB. When http_requests_total{method="GET", status="200", user_id="U8842"} arrives, the engine hashes the full label set to a stable series ID. That ID owns three things for its entire lifetime:

  1. Inverted-index postings. Every label value gets a posting list mapping it to the series IDs that carry it (user_id="U8842" → [ids…]). This is what makes sum by (status) fast — and what balloons when there are a million distinct user_id values, each needing its own posting list entry.
  2. A head chunk in RAM. Each active series keeps an open, in-memory chunk that it appends samples to. As a working figure for Prometheus specifically, budget roughly 1–3 KB of RAM per active series (index plus chunk overhead). This is a Prometheus rule of thumb, not a universal TSDB constant: other backends (VictoriaMetrics, Mimir, Thanos) differ in storage layout or architecture and can land at different, often lower, per-series overhead. At that figure, 1,000,000 series ≈ 1–3 GB of head memory before you have stored a single interesting value, and 40 billion series would need roughly 40–120 TB, which is simply unallocatable on any of them.
  3. An on-disk compressed chunk stream plus index segments. Prometheus flushes the head block to disk on a configurable interval that defaults to ~2 hours (--storage.tsdb.min-block-duration); Mimir, Thanos, and VictoriaMetrics use their own compaction/flush cadences, so treat the number as "how Prometheus is configured out of the box," not a law of TSDBs.
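The first two items can be made concrete with a toy in-memory head. This is a sketch under stated assumptions, not how Prometheus is implemented: `ToyHead` is a hypothetical class, SHA-256 stands in for whatever hash a real engine uses, and a Python list stands in for a compressed chunk. It only shows that new samples reuse an existing series, while each new label value mints a series and a posting entry.

```python
import hashlib
from collections import defaultdict


class ToyHead:
    """Toy in-memory head: series IDs, postings, and one sample list per series."""

    def __init__(self):
        self.postings = defaultdict(set)  # (label, value) -> {series_id, ...}
        self.chunks = {}                  # series_id -> [(timestamp, value), ...]

    @staticmethod
    def series_id(name, labels):
        # Sort labels so the same label set always hashes to the same ID.
        canonical = name + "".join(f"\x00{k}\x00{v}" for k, v in sorted(labels.items()))
        return hashlib.sha256(canonical.encode()).hexdigest()[:16]

    def append(self, name, labels, ts, value):
        sid = self.series_id(name, labels)
        if sid not in self.chunks:
            # New series: pay for a chunk plus one posting entry per label.
            self.chunks[sid] = []
            self.postings[("__name__", name)].add(sid)
            for pair in labels.items():
                self.postings[pair].add(sid)
        self.chunks[sid].append((ts, value))
        return sid

    def select(self, label, value):
        """Series IDs carrying label=value, looked up through the postings."""
        return self.postings.get((label, value), set())


head = ToyHead()

# 1,000 samples on one label set: still a single series.
for i in range(1000):
    head.append("http_requests_total", {"method": "GET", "status": "200"}, ts=i, value=float(i))
series_after_samples = len(head.chunks)

# 1,000 distinct user_id values: 1,000 additional series and posting lists.
for u in range(1000):
    labels = {"method": "GET", "status": "200", "user_id": f"U{u}"}
    head.append("http_requests_total", labels, ts=0, value=1.0)
series_after_users = len(head.chunks)

print(series_after_samples, series_after_users, len(head.postings))
```

In this toy, the posting list for status="200" grows with every new series, and every user_id value needs a posting list of its own, which is the index growth described in item 1.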

The crucial asymmetry — and this part does generalize across time-series engines: appending the millionth sample to an existing series is nearly free (delta-of-delta + XOR compression, often <2 bytes/sample). Creating the millionth series costs a fresh chunk, index entries, and memory that is never reclaimed while the series stays active. High scrape frequency is cheap; high cardinality is ruinous.
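To see why an extra sample on an existing series is so cheap, here is a simplified sketch of the delta-of-delta idea applied to timestamps. It covers only the integer arithmetic: the variable-length bit packing and the XOR compression of values that real engines use are left out, and the function names and the 15-second scrape interval are assumptions for illustration.

```python
def delta_of_delta(timestamps):
    """Return the first timestamp, the first delta, then each change in delta."""
    if len(timestamps) < 2:
        return list(timestamps)
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return [timestamps[0], deltas[0], *dods]


def decode(encoded):
    """Invert delta_of_delta, showing the encoding is lossless."""
    if len(encoded) < 2:
        return list(encoded)
    out = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        out.append(out[-1] + delta)
    return out


# A 15-second scrape interval in milliseconds, with one scrape arriving 2 ms late.
scrapes = [1_700_000_000_000 + 15_000 * i for i in range(6)]
scrapes[4] += 2

encoded = delta_of_delta(scrapes)
print(encoded)
```

After the first two entries, a steady scrape interval produces zeros or very small integers, which is what lets a bit-packed encoding spend so little per sample. None of that helps a new series, which still needs its own chunk and index entries.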

What belongs in metrics vs logs vs traces

The fix is not "never record user_id" — it is recording each dimension in the signal whose cost model can absorb it. The three pillars have fundamentally different cardinality tolerances:

| Signal | Cost model | Cardinality tolerance | Put here | Keep out |
|---|---|---|---|---|
| Metrics | per active series (one open chunk each) | Low: must be bounded and small | numeric aggregates: rates, latencies (histograms), error counts, saturation; low-card labels (method, status, region, endpoint template) | user_id, request_id, email, full URL, SQL text |
| Logs | per event (bytes written and indexed) | High: every field can vary | rich per-event context: the exact user, params, error message, stack trace | data you need to graph continuously (that is a metric) |
| Traces | per sampled request | High, but sampled | request_id, span timings, per-hop causality across services | unsampled high-QPS firehose (cost + storage) |

Rule of thumb: if a dimension is unbounded or user-controlled, it is a log field or a trace attribute — never a metric label. Metrics answer "how many / how fast, sliced by a handful of fixed dimensions." Logs and traces answer "show me this specific event." The bridge between them is the exemplar: attach a trace_id to a single sample inside a latency-histogram bucket, so you can jump from the p99 spike on the graph straight to one representative trace — high-cardinality pointer, zero high-cardinality series.
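The rule of thumb and the exemplar idea can be sketched together in plain Python. Everything named here is hypothetical: `METRIC_LABELS`, `split_dimensions`, and `ExemplarHistogram` are illustrative, not a client-library API, and the histogram keeps simple per-bucket counts rather than the cumulative buckets a real Prometheus histogram exposes.

```python
import bisect
import json

# Assumed allowlist of bounded dimensions; anything else stays out of metrics.
METRIC_LABELS = {"method", "status", "region", "endpoint"}


def split_dimensions(attrs):
    """Bounded keys become metric labels; everything else becomes log fields."""
    labels = {k: v for k, v in attrs.items() if k in METRIC_LABELS}
    fields = {k: v for k, v in attrs.items() if k not in METRIC_LABELS}
    return labels, fields


class ExemplarHistogram:
    """Toy latency histogram that remembers one trace_id per bucket."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        self.counts = [0] * (len(self.upper_bounds) + 1)  # last slot is +Inf
        self.exemplars = [None] * len(self.counts)

    def observe(self, seconds, trace_id):
        # First bucket whose upper bound is >= the observed value.
        i = bisect.bisect_left(self.upper_bounds, seconds)
        self.counts[i] += 1
        self.exemplars[i] = trace_id  # latest trace wins; storage stays fixed-size


# Hypothetical request attributes for illustration.
request = {
    "method": "GET", "status": "200", "endpoint": "/orders/{id}",
    "user_id": "U8842", "request_id": "r-91f3", "trace_id": "t-4bf9",
}

labels, fields = split_dimensions(request)
hist = ExemplarHistogram([0.1, 0.5, 1.0])
hist.observe(0.73, trace_id=request["trace_id"])
log_line = json.dumps({"msg": "request served", **fields}, sort_keys=True)

print(labels)
print(log_line)
print(hist.counts, hist.exemplars)
```

The histogram's memory is fixed by the number of buckets no matter how many distinct trace IDs pass through it, while user_id and request_id travel only in the log line.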

Pitfalls a working engineer actually hits

Trade-offs & when to reach for a different tool

The inverted-index TSDB model (Prometheus, Thanos, Cortex/Mimir, VictoriaMetrics) is optimized for low-cardinality, high-frequency data and cheap aggregation queries — that is exactly why it punishes cardinality. The named alternatives and mitigations trade differently:

The senior instinct: default to bounded metrics + sampled traces + rich logs, relabel/limit at the edges as a safety net, and only pay for a high-cardinality store when the business genuinely needs per-entity slicing that alerting-grade metrics can't give.

Takeaways

Recall question

A counter api_calls_total{region, tier, endpoint} has 4 regions, 3 tiers, and 250 endpoints, scraped from 300 instances. A teammate proposes adding customer_id (80,000 customers) so dashboards can slice by customer. What happens to the series count, and what should you do instead?

Answer: base cardinality is 4 × 3 × 250 = 3,000 series per instance, and across 300 instances that is 3,000 × 300 = 900,000 series (already large). Adding customer_id multiplies that by 80,000, giving ~72 billion series and an immediate OOM. Instead, keep the metric bounded, emit per-customer detail as a log field or a trace attribute, use a recording rule for the aggregate you actually dashboard, and attach a trace_id exemplar to a latency histogram so you can still drill from a dashboard spike into a specific customer's request. If the label leaks in before instrumentation is fixed, drop it at ingest with metric_relabel_configs and set a series limit as a backstop.
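The numbers in the answer, and the two ingest-side safety nets it mentions, can be sketched as follows. `SeriesLimiter` is a hypothetical toy, not the Prometheus relabeling or limit implementation; in particular, it does not model how a real backend treats samples that collide once a label is dropped.

```python
per_instance = 4 * 3 * 250       # region x tier x endpoint
fleet = per_instance * 300       # one copy per scraped instance
with_customer = fleet * 80_000   # customer_id multiplies every series


class SeriesLimiter:
    """Toy ingest guard: existing series always pass, new ones stop at the limit."""

    def __init__(self, limit, drop_labels=()):
        self.limit = limit
        self.drop_labels = set(drop_labels)
        self.seen = set()
        self.rejected = 0

    def admit(self, labels):
        # Remove unwanted labels before the series identity is computed.
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in self.drop_labels))
        if kept in self.seen:
            return True
        if len(self.seen) >= self.limit:
            self.rejected += 1
            return False
        self.seen.add(kept)
        return True


def leaked_samples():
    """One bounded label set with customer_id leaked into it, 80,000 times."""
    for i in range(80_000):
        yield {"region": "eu", "tier": "pro", "endpoint": "/v1/x", "customer_id": f"C{i}"}


# Limit only: the leak is capped, but 77,000 would-be series are rejected.
leaky = SeriesLimiter(limit=3_000)
for sample in leaked_samples():
    leaky.admit(sample)

# Drop the label first: the same traffic collapses to a single series.
guarded = SeriesLimiter(limit=3_000, drop_labels={"customer_id"})
for sample in leaked_samples():
    guarded.admit(sample)

print(f"{fleet:,} -> {with_customer:,}")
print(len(leaky.seen), leaky.rejected, len(guarded.seen), guarded.rejected)
```

The limit alone keeps the process alive but loses data, which is why it is the backstop rather than the fix. Dropping the label, and ultimately fixing the instrumentation, is what restores a bounded metric.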


Sources: B. Brazil, Prometheus: Up & Running (label/cardinality guidance, the ~1–3 KB/series working figure — a Prometheus-specific default, not a universal TSDB constant); Prometheus documentation on naming, TSDB head blocks (including the configurable min-block-duration), inverted index, relabeling, series limits, and exemplars; C. Majors et al., Observability Engineering (Honeycomb) on wide events and high-cardinality querying; Google SRE Book & Workbook (metrics, SLOs, and the monitoring signal boundary); B. Gregg, Systems Performance (the USE method and metric selection). Re-authored/Deepened for this guide.
