Normalization
Step 8 in the Databases path · 14 concepts · 2 problems
📘 Learn Normalization from zero
The problem. A database table is just a grid. The danger is storing the same fact in more than one place. The moment a fact lives in two cells, those two cells can disagree — and now your data lies.
Analogy. Think of a class roster written so the teacher's office number is copied next to every single student. If the teacher moves offices, you must hand-edit hundreds of rows. Miss one, and the database now claims the teacher is in two rooms at once. Normalization says: write each fact exactly once, in the table where it naturally belongs, and point to it by key.
Worked example. Start with one fat Orders table: (OrderID, CustomerID, CustomerCity, Product, Price) keyed on OrderID. Here OrderID → CustomerID → CustomerCity: city depends on the order only through the customer — a transitive dependency, which is a 3NF violation (the single-column key means there is no partial dependency, so it is already in 2NF). Symptoms: the city is duplicated on every order (redundancy); changing a city means editing many rows (update anomaly); deleting a customer's last order erases their city (deletion anomaly). The fix: split into Customers(CustomerID, City) and Orders(OrderID, CustomerID, Product, Price), with CustomerID as a foreign key. City is now stored once; an order merely references it.
The forms are a ladder. 1NF: atomic values, no repeating groups. 2NF: 1NF + no non-key attribute partially depends on part of a composite key. 3NF: 2NF + no transitive dependency (non-key → non-key). BCNF: for every nontrivial FD X→Y, X is a superkey (every determinant is a candidate key). Each rung removes one more way the data can contradict itself.
Key insight: normalization is the discipline of making every fact have one and only one home, so it can never be updated halfway and never contradict itself.
✨ Added by the guide to build intuition — not from the source course.
🎯 Guided practice
- Easy — bring a table to 1NF. Given
Students(StudentID, Name, Courses)where a row is(1, 'Asha', 'Math, Physics').Step 1 — spot the violation: the
Coursescell holds a list, so values aren't atomic — a 1NF violation (a repeating group). Step 2 — the wrong fix: splitting intoCourse1, Course2is still a repeating group, not 1NF. Step 3 — the right fix: make one row per (student, course):StudentCourses(StudentID, Course)→(1,'Math'), (1,'Physics'), keepStudents(StudentID, Name)separate. Result: atomic cells, no lists; the pair(StudentID, Course)is the key of the new table. - Medium — 2NF, 3NF, and the 3NF-but-not-BCNF trap. Given
Enrollment(StudentID, CourseID, CourseName, Grade)with composite key(StudentID, CourseID).Step 1 — list functional dependencies:
(StudentID, CourseID) → Grade(a grade needs both);CourseID → CourseName(the name depends on the course alone). Step 2 — test 2NF:CourseNamedepends onCourseID, which is only part of the composite key — a partial dependency, so this violates 2NF. Step 3 — decompose: pull the partially-dependent attribute into its own table:Courses(CourseID, CourseName)andEnrollment(StudentID, CourseID, Grade). Both are now in BCNF (every determinant is a candidate key). Step 4 — the FAANG follow-up (3NF ≠ BCNF): suppose instead each course has exactly one room and each room hosts exactly one course, with FDsCourseID → RoomandRoom → CourseID, key(StudentID, CourseID). The table can be in 3NF (every non-key attribute, e.g. via the room, ties back to a candidate key) yet violate BCNF, becauseRoom → CourseIDhas a determinant (Room) that is not a superkey. This happens precisely with overlapping candidate keys. Pattern learned: partial dependency → 2NF problem; non-key determining non-key → 3NF problem; any non-superkey determinant → BCNF problem. The fix is always to give the determinant its own table and reference it by foreign key — but note BCNF decomposition can lose dependency preservation, which is why 3NF is sometimes the deliberate stopping point.
✨ Added by the guide — work these before the full problem set.
Lessons in this topic
- ○Introduction to Normalization
- ○First Normal Form (1NF)
- ○Second Normal Form (2NF)
- ○Third Normal Form (3NF)
- ○Boyce-Codd Normal Form (BCNF)
- ○Higher Normal Forms (4NF, 5NF)
- ○Exercise 1
- ○Exercise 2
- ○Designing a Hospital Management System
- ○Designing a Hotel Management System
- ○Designing a Bank Management System
- ○Designing an Instagram
- ○Designing an E-commerce Platform
- ○Designing an Online Food Delivery System
- ○Solution to Exercise 1
- ○Solution to Exercise 2