🔎 Pattern Recognition — Databases

Read a problem, spot the signal, pick the technique — your triage map for Databases.

How to use this guide: read the problem statement, scan for the signals below, and jump to the technique each one points at. Signals are grouped by the decision they resolve — storage choice first, then schema design, then query construction.

Signal group 1 — "Do I even need a database?" (storage choice)

If you see persistence via reading/writing flat files (CSV, JSON, binary dump) and the app code itself opens/parses/interprets records → name the file-processing system model (Foundations). It's the pre-database world and tells you which pains to expect.
If you see single writer, small data, no concurrency, no ad-hoc queries (config file, append-only log, one-off export) → files are fine; a DBMS adds operational weight for no gain (Foundations).
If you see data must outlive the process / survive crashes (durability), many concurrent readers-writers needing a consistent view, querying-filtering-aggregation beyond a linear scan, or atomic multi-step transactions → reach for a DBMS (Database). Interview trigger: "store user accounts/orders/posts and look them up by X."
If you see structured data, fixed/predefined schema, entity relationships joined by primary/foreign keys, JOINs, strong ACID, strong consistency → reach for a relational (SQL) database — Postgres/MySQL/Oracle/SQL Server (Database, SQL Fundamentals, Relational Model).

Signal group 2 — "How do I structure the data?" (schema design)

If you see "design the schema / data model for X" (employee management, library, e-commerce) → start with an ER model, then mechanically map it to relational tables (ER Models). ER is conceptual design; SQL DDL is the logical/physical output.
If you see a requirements paragraph of nouns and verbs ("an Employee works for a Department and manages a Project") → nouns become entity types, verbs become relationship types (ER Models). This noun/verb scan is the canonical first move.
If you are sketching tables/collections before writing queries for a new app → this is data modeling, the upstream step before schema DDL (Data Modeling). Stakeholders describing "things" and "relationships" map straight to entities, attributes, relationships.
If you see "X determines Y" language ("each employee has one department," "a ZIP maps to one city," "order ID gives you the customer") → write a functional dependency X → Y (Functional Dependency). FDs are the required input to choosing keys and to every normal form (e.g. BCNF: for every non-trivial X → Y, X is a superkey).
If you see "must never have duplicate/invalid rows" or "this column must point to a real row in another table" → reach for integrity constraints: PRIMARY KEY (entity integrity) and FOREIGN KEY (referential integrity) (Relational Model).

Signal group 3 — "Is my schema redundant?" (normalization)

If you see the same fact repeated across many rows (city "Boston" on every order for a customer) → normalize to kill the redundancy and its update anomaly (Normalization).
If you see an UPDATE must touch many rows to stay correct, a DELETE silently destroys unrelated facts, or you can't INSERT a fact without a dummy row → these are the update / deletion / insertion anomalies → 2NF / 3NF / BCNF (Normalization).
If you see a column holding a list or repeating groups ("phones = 555-1, 555-2"; item1, item2, item3) → reach for 1NF: one atomic value per cell, one fact per row (Normalization).
If you see read-heavy analytics, or a NoSQL access-pattern-first design → deliberately denormalize instead (Data Modeling). Normalize for OLTP write-correctness; denormalize for read speed.

Signal group 4 — "How do I write the query?" (SQL construction)

If you see "for each X, find Y" / "per customer" / "average per category" / "count of X by Y" / one row per group → GROUP BY with an aggregate; the grouping key becomes your output granularity (SQL Fundamentals, SQL Practice Problems). E.g. COUNT(follower_id) per user_id.
If you see just the min/max value per group, no other column carried → plain GROUP BY with MIN()/MAX() — do not reach for a window function here (SQL Practice Problems). E.g. MAX(end_time) ... GROUP BY bike_number.
If you see a condition on an aggregate ("only groups with more than N," "users seen on day 1 AND day 7") → HAVING, never WHERE (SQL Fundamentals, SQL Practice Problems). Remember: WHERE filters rows before grouping; HAVING filters groups after aggregation.
If you see "combine data from two entities" → a JOIN (SQL Fundamentals). "Show each order with its customer name" → INNER JOIN; "show all customers including those with no orders" → LEFT OUTER JOIN.

Triage order — what to try, in sequence:

Decide storage. Is a database even warranted (durability / concurrency / querying / transactions), or do files suffice? If a DB, is it relational? (Foundations, Database)
Model conceptually. If asked to "design," do the noun/verb ER scan first, then map entities → tables. (Data Modeling, ER Models, Relational Model)
State the constraints. Write the functional dependencies and pin down PRIMARY/FOREIGN keys. (Functional Dependency, Relational Model)
Normalize (or knowingly denormalize). Drive out anomalies via 1NF→3NF/BCNF for OLTP; denormalize only for read-heavy/NoSQL cases. (Normalization, Data Modeling)
Write the query. Map the grouping key, choose GROUP BY + aggregate, place row-filters in WHERE and group-filters in HAVING, and add the right JOIN type. (SQL Fundamentals, SQL Practice Problems)