Data Modeling
Step 4 in the Databases path · 4 concepts · 0 problems
📘 Learn Data Modeling from zero
Data modeling is the discipline of deciding what data you will store and how the pieces relate, before you build the database. Think of it as the blueprint an architect draws before anyone pours concrete: you would never build a house by stacking bricks ad hoc, and you should not build a database by inventing columns on the fly.
First principles. Every model is built from three primitives: entities (the nouns/things you track), attributes (the properties of each thing), and relationships (how things connect, described by their cardinality: 1:1, 1:N, or M:N).
The three levels. You refine the model in stages: a conceptual model (business-facing, technology-agnostic — just entities and relationships), a logical model (adds attributes, primary keys, foreign keys, and normalization, still vendor-neutral), and a physical model (actual tables, data types, indexes, and partitioning for a specific engine like PostgreSQL).
Worked example — a library. Conceptual: entities Member, Book, and Loan; a Member borrows Books. Because one member borrows many books and one book is borrowed by many members over time, that is an M:N relationship — so you resolve it with a Loan associative (junction) entity that also carries its own attribute, due_date. Logical: Member(member_id PK, name), Book(book_id PK, title), Loan(loan_id PK, member_id FK, book_id FK, due_date). Physical: pick BIGINT for ids, DATE for due_date, and add an index on Loan.member_id for fast per-member lookups.
Key insight: a good data model is access-pattern-aware structure decided before code — get the entities, cardinalities, and level of normalization right, and queries become simple; get them wrong, and no amount of query tuning saves you.
✨ Added by the guide to build intuition — not from the source course.
🎯 Guided practice
- Easy — Identify cardinality and resolve it.
Problem: A blogging platform has
Authors andPosts. Each post is written by exactly one author; an author can write many posts. Model this.Step 1 — name entities:
AuthorandPost.Step 2 — determine cardinality: one author → many posts, each post → one author. That is 1:N.
Step 3 — apply the rule: for 1:N, place the foreign key on the "many" side. No junction table is needed.
Answer:
Author(author_id PK, name)andPost(post_id PK, title, author_id FK). Theauthor_idliving onPostis the whole pattern. - Medium — Normalize to 3NF and spot the M:N.
Problem: An orders sheet has columns:
order_id, customer_name, customer_email, product1, product2, product3, product_category. Redesign it properly.Step 1 — 1NF (atomic values, no repeating groups): the
product1/2/3columns are a repeating group. Split products into their own rows/entity. Now an order can contain many products and a product can appear in many orders → this is an M:N relationship.Step 2 — resolve M:N with a junction table: introduce
OrderItem(order_id FK, product_id FK)with a composite PK(order_id, product_id).Step 3 — 2NF then 3NF (remove partial, then transitive dependencies):
customer_emaildepends on the customer, not the order, so extractCustomer.product_categorydepends on the product (a transitive dependency via the product, not on the order key), so it belongs onProduct.Answer:
Customer(customer_id PK, name, email),Order(order_id PK, customer_id FK),Product(product_id PK, name, category),OrderItem(order_id FK, product_id FK, PK(order_id, product_id)).Core pattern learned: repeating columns → split into a new entity; M:N → junction table with a composite key; an attribute that depends on something other than the whole key → move it to where its key lives.
✨ Added by the guide — work these before the full problem set.