🔎 Pattern Recognition — Databases
Read a problem, spot the signal, pick the technique — your triage map for Databases.
How to use this guide: read the problem statement, scan for the signals below, and jump to the technique each one points at. Signals are grouped by the decision they resolve — storage choice first, then schema design, then query construction.
Signal group 1 — "Do I even need a database?" (storage choice)
- If you see persistence via reading/writing flat files (CSV, JSON, binary dump) and the app code itself opens/parses/interprets records → name the
file-processing systemmodel (Foundations). It's the pre-database world and tells you which pains to expect. - If you see single writer, small data, no concurrency, no ad-hoc queries (config file, append-only log, one-off export) → files are fine; a DBMS adds operational weight for no gain (Foundations).
- If you see data must outlive the process / survive crashes (durability), many concurrent readers-writers needing a consistent view, querying-filtering-aggregation beyond a linear scan, or atomic multi-step transactions → reach for a
DBMS(Database). Interview trigger: "store user accounts/orders/posts and look them up by X." - If you see structured data, fixed/predefined schema, entity relationships joined by primary/foreign keys, JOINs, strong ACID, strong consistency → reach for a
relational (SQL) database— Postgres/MySQL/Oracle/SQL Server (Database, SQL Fundamentals, Relational Model).
Signal group 2 — "How do I structure the data?" (schema design)
- If you see "design the schema / data model for X" (employee management, library, e-commerce) → start with an
ER model, then mechanically map it to relational tables (ER Models). ER is conceptual design; SQL DDL is the logical/physical output. - If you see a requirements paragraph of nouns and verbs ("an Employee works for a Department and manages a Project") → nouns become entity types, verbs become relationship types (ER Models). This noun/verb scan is the canonical first move.
- If you are sketching tables/collections before writing queries for a new app → this is
data modeling, the upstream step before schema DDL (Data Modeling). Stakeholders describing "things" and "relationships" map straight to entities, attributes, relationships. - If you see "X determines Y" language ("each employee has one department," "a ZIP maps to one city," "order ID gives you the customer") → write a
functional dependencyX → Y(Functional Dependency). FDs are the required input to choosing keys and to every normal form (e.g. BCNF: for every non-trivialX → Y,Xis a superkey). - If you see "must never have duplicate/invalid rows" or "this column must point to a real row in another table" → reach for
integrity constraints:PRIMARY KEY(entity integrity) andFOREIGN KEY(referential integrity) (Relational Model).
Signal group 3 — "Is my schema redundant?" (normalization)
- If you see the same fact repeated across many rows (city "Boston" on every order for a customer) →
normalizeto kill the redundancy and its update anomaly (Normalization). - If you see an UPDATE must touch many rows to stay correct, a DELETE silently destroys unrelated facts, or you can't INSERT a fact without a dummy row → these are the update / deletion / insertion anomalies →
2NF / 3NF / BCNF(Normalization). - If you see a column holding a list or repeating groups ("phones = 555-1, 555-2"; item1, item2, item3) → reach for
1NF: one atomic value per cell, one fact per row (Normalization). - If you see read-heavy analytics, or a NoSQL access-pattern-first design → deliberately
denormalizeinstead (Data Modeling). Normalize for OLTP write-correctness; denormalize for read speed.
Signal group 4 — "How do I write the query?" (SQL construction)
- If you see "for each X, find Y" / "per customer" / "average per category" / "count of X by Y" / one row per group →
GROUP BYwith an aggregate; the grouping key becomes your output granularity (SQL Fundamentals, SQL Practice Problems). E.g.COUNT(follower_id)peruser_id. - If you see just the min/max value per group, no other column carried → plain
GROUP BYwithMIN()/MAX()— do not reach for a window function here (SQL Practice Problems). E.g.MAX(end_time) ... GROUP BY bike_number. - If you see a condition on an aggregate ("only groups with more than N," "users seen on day 1 AND day 7") →
HAVING, neverWHERE(SQL Fundamentals, SQL Practice Problems). Remember:WHEREfilters rows before grouping;HAVINGfilters groups after aggregation. - If you see "combine data from two entities" → a
JOIN(SQL Fundamentals). "Show each order with its customer name" →INNER JOIN; "show all customers including those with no orders" →LEFT OUTER JOIN.
Triage order — what to try, in sequence:
- Decide storage. Is a database even warranted (durability / concurrency / querying / transactions), or do files suffice? If a DB, is it relational? (Foundations, Database)
- Model conceptually. If asked to "design," do the noun/verb ER scan first, then map entities → tables. (Data Modeling, ER Models, Relational Model)
- State the constraints. Write the functional dependencies and pin down PRIMARY/FOREIGN keys. (Functional Dependency, Relational Model)
- Normalize (or knowingly denormalize). Drive out anomalies via 1NF→3NF/BCNF for OLTP; denormalize only for read-heavy/NoSQL cases. (Normalization, Data Modeling)
- Write the query. Map the grouping key, choose GROUP BY + aggregate, place row-filters in WHERE and group-filters in HAVING, and add the right JOIN type. (SQL Fundamentals, SQL Practice Problems)