Knowledge Guide
HomeDatabases

Database

Step 2 in the Databases path · 4 concepts · 0 problems

0 / 4 complete

📘 Learn Database from zero

Imagine a giant filing cabinet run by a meticulous clerk. You don't dig through the drawers yourself — you hand the clerk a request ("give me all orders by Maya in May"), and the clerk fetches, files, and keeps everything tidy, even if ten people ask at once. The filing cabinet is the database (the stored data). The clerk is the DBMS (Database Management System) — the software that creates, reads, writes, secures, and protects the data, controlling all access to keep it consistent. You rarely talk to the cabinet directly; you talk to the clerk.

A database is an organized collection of data stored so it can be efficiently retrieved, updated, and kept safe across crashes and concurrent users. The DBMS (Postgres, MySQL, MongoDB) is the engine providing those guarantees. DBMSs split into two families: relational (RDBMS), which store data in tables and query with SQL, and non-relational (NoSQL), which store key-value, document, wide-column, or graph data with a flexible schema.

The dominant model is relational, introduced by Edgar F. Codd in 1970. Data lives in tables (like spreadsheets). Each row is one record, each column a field. A primary key uniquely identifies a row; a foreign key in one table points at another's primary key, linking them.

Worked example. A store has two tables. users(id, name) with row (1, "Maya"), and orders(id, user_id, total) with row (99, 1, 40). Here orders.user_id = 1 is a foreign key pointing to users.id = 1. To answer "what did Maya order?", the DBMS performs a JOIN, matching rows where users.id = orders.user_id, returning ("Maya", 40). You never duplicate Maya's name in the orders table — you link.

Key insight: a database isn't just "saved data" — it's data plus a DBMS that guarantees persistence, concurrent correctness, and efficient querying. The relational model wins when data is structured and related (strong consistency, ACID); non-relational models trade some of those guarantees (often for BASE/eventual consistency) in exchange for flexible schemas and horizontal scale.

✨ Added by the guide to build intuition — not from the source course.

🎯 Guided practice

  1. Easy — File vs. database. An app currently stores users in a plain text file and, on each login, reads the whole file line-by-line to find a match. At 10 million users logins are slow and two simultaneous signups sometimes corrupt the file. What changes when you move to a DBMS?

    Reasoning: Identify the failures. (1) Lookup cost: scanning the file is O(n) per login. A database with an index on the username column makes it O(log n) via a B-tree — the core win. (2) Concurrency: the corruption is a race condition; a DBMS serializes conflicting writes via locks/transactions, so two signups can't clobber each other. (3) Durability: a crash mid-write to a file can lose data; a DBMS uses write-ahead logging to recover. Pattern: when you hear "search-by-field," "many concurrent writers," or "must not lose data," that's the signal to reach for a DBMS — and to add an index on the queried column.

  2. Medium — SQL or NoSQL? You're designing two systems in an interview: (A) a bank ledger tracking account balances and transfers; (B) a social feed storing posts where each post may have different optional fields (image, poll, location) and you expect billions of posts with very high write volume. Pick a database type for each and justify.

    Reasoning: For (A), the non-negotiable requirement is correctness under concurrency: a transfer must atomically debit one account and credit another — both happen or neither (the A in ACID), and the balance must always be exact (strong consistency). Relationships (accounts, transactions) need JOINs. → Choose a relational (SQL) database. Sacrificing consistency for scale is unacceptable here.

    For (B), the drivers are flexible/evolving schema (posts have heterogeneous fields) and massive horizontal write scale. A fixed relational schema fights you, and a single vertically-scaled SQL node won't absorb billions of writes cheaply. → Choose a non-relational store — a document DB (e.g. MongoDB) for the per-post flexible shape, or a wide-column store (e.g. Cassandra) for write-heavy scale, accepting eventual (BASE) consistency (a follower seeing a post a second late is fine). Pattern (canonical SQL-vs-NoSQL frame): map requirements onto the decision axes — data model/schema, scalability (vertical vs horizontal), consistency/transactions (ACID vs BASE, CAP), and query complexity. Strong consistency + transactions + relationships + JOINs ⇒ SQL (Alex Xu's default). Schema flexibility + horizontal scale + tolerance for eventual consistency ⇒ NoSQL. Always justify by the workload, never by fashion.

✨ Added by the guide — work these before the full problem set.

Lessons in this topic