Knowledge Guide
HomeDatabases

Normalization

Step 8 in the Databases path · 14 concepts · 2 problems

0 / 16 complete

📘 Learn Normalization from zero

The problem. A database table is just a grid. The danger is storing the same fact in more than one place. The moment a fact lives in two cells, those two cells can disagree — and now your data lies.

Analogy. Think of a class roster written so the teacher's office number is copied next to every single student. If the teacher moves offices, you must hand-edit hundreds of rows. Miss one, and the database now claims the teacher is in two rooms at once. Normalization says: write each fact exactly once, in the table where it naturally belongs, and point to it by key.

Worked example. Start with one fat Orders table: (OrderID, CustomerID, CustomerCity, Product, Price) keyed on OrderID. Here OrderID → CustomerID → CustomerCity: city depends on the order only through the customer — a transitive dependency, which is a 3NF violation (the single-column key means there is no partial dependency, so it is already in 2NF). Symptoms: the city is duplicated on every order (redundancy); changing a city means editing many rows (update anomaly); deleting a customer's last order erases their city (deletion anomaly). The fix: split into Customers(CustomerID, City) and Orders(OrderID, CustomerID, Product, Price), with CustomerID as a foreign key. City is now stored once; an order merely references it.

The forms are a ladder. 1NF: atomic values, no repeating groups. 2NF: 1NF + no non-key attribute partially depends on part of a composite key. 3NF: 2NF + no transitive dependency (non-key → non-key). BCNF: for every nontrivial FD X→Y, X is a superkey (every determinant is a candidate key). Each rung removes one more way the data can contradict itself.

Key insight: normalization is the discipline of making every fact have one and only one home, so it can never be updated halfway and never contradict itself.

✨ Added by the guide to build intuition — not from the source course.

🎯 Guided practice

  1. Easy — bring a table to 1NF. Given Students(StudentID, Name, Courses) where a row is (1, 'Asha', 'Math, Physics').

    Step 1 — spot the violation: the Courses cell holds a list, so values aren't atomic — a 1NF violation (a repeating group). Step 2 — the wrong fix: splitting into Course1, Course2 is still a repeating group, not 1NF. Step 3 — the right fix: make one row per (student, course): StudentCourses(StudentID, Course)(1,'Math'), (1,'Physics'), keep Students(StudentID, Name) separate. Result: atomic cells, no lists; the pair (StudentID, Course) is the key of the new table.

  2. Medium — 2NF, 3NF, and the 3NF-but-not-BCNF trap. Given Enrollment(StudentID, CourseID, CourseName, Grade) with composite key (StudentID, CourseID).

    Step 1 — list functional dependencies: (StudentID, CourseID) → Grade (a grade needs both); CourseID → CourseName (the name depends on the course alone). Step 2 — test 2NF: CourseName depends on CourseID, which is only part of the composite key — a partial dependency, so this violates 2NF. Step 3 — decompose: pull the partially-dependent attribute into its own table: Courses(CourseID, CourseName) and Enrollment(StudentID, CourseID, Grade). Both are now in BCNF (every determinant is a candidate key). Step 4 — the FAANG follow-up (3NF ≠ BCNF): suppose instead each course has exactly one room and each room hosts exactly one course, with FDs CourseID → Room and Room → CourseID, key (StudentID, CourseID). The table can be in 3NF (every non-key attribute, e.g. via the room, ties back to a candidate key) yet violate BCNF, because Room → CourseID has a determinant (Room) that is not a superkey. This happens precisely with overlapping candidate keys. Pattern learned: partial dependency → 2NF problem; non-key determining non-key → 3NF problem; any non-superkey determinant → BCNF problem. The fix is always to give the determinant its own table and reference it by foreign key — but note BCNF decomposition can lose dependency preservation, which is why 3NF is sometimes the deliberate stopping point.

✨ Added by the guide — work these before the full problem set.

Lessons in this topic