Knowledge Guide
HomeSystem DesignScalable Systems (Advanced Topics)

What Are Safe Patterns for Online Schema Changes vs

Safe patterns for online schema changes involve strategies (like the expand-and-contract approach) to update a database schema without downtime by making incremental, backward-compatible changes, whereas offline migrations rely on taking the system offline during a maintenance window to apply schema updates in one big batch.

Understanding Schema Changes and Migrations

A database schema change means altering the structure of a database. For example, adding or removing a column, changing a data type, or splitting a table.

A migration is the process of applying those schema changes (and often moving or transforming data to fit the new structure).

Broadly, there are two ways to perform migrations:

Understanding the difference is important because choosing the wrong approach can lead to downtime, lost data, or application errors.

Next, we'll explore safe patterns for each approach and how to execute schema changes reliably.

Offline Migrations (Big-Bang Approach)

Offline migration is the simplest way to change a schema: stop the application, apply all changes, then restart the application with the new schema.

Many small applications or early-stage projects use this approach because it’s straightforward.

For example, if you need to add an index or column, you might put the site into maintenance mode at midnight, run the database updates, and bring it back up when done.

This “stop-the-world” style deployment is often planned via maintenance windows (those banners announcing "The site will be down for maintenance from 1–2 AM" are a result of offline migrations).

Safe Patterns for Offline Migrations

Even though offline migrations are simpler, there are still best practices to ensure safety and quick recovery:

Example Scenario (offline)

Imagine a small e-commerce site that needs to change a column’s data type.

The team schedules maintenance at midnight.

They shut down the site, run a script to alter the column type and convert existing data, then start the site again. This big-bang deployment works because no users are on the system during the change.

However, if the migration fails halfway or the new code has a bug, the site must remain down until issues are fixed and illustrating why offline migrations are risky if not thoroughly planned.

Offline migrations are viable when downtime is acceptable (e.g., an internal app or a small user base during off-hours). They are relatively easy to implement and understand.

However, as systems grow, taking downtime becomes less acceptable and more disruptive. This is where online approaches become crucial.

Online Schema Changes (Zero-Downtime Migrations)

An online migration updates the schema without taking the application down. The application continues serving requests while the database changes happen in the background.

This approach is essential for applications that require high availability (think of global services that can’t go offline).

The challenge is that old application code and new database schema must coexist temporarily without errors.

In other words, the changes must be done in a backward-compatible way so that both the “old” and “new” versions of the application see what they expect in the data.

Performing schema changes live is trickier than offline because you have to account for mixed states.

For example, if you add a column or split a table, the running code might not yet know about the new structure.

The new version of the code expects the new schema, while the old version (still running on some servers) expects the old schema. To handle this, developers use safe patterns that phase the changes.

One common analogy is replacing the legs of a table: an offline migration is like taking the table out of service and swapping all its legs at once; an online migration is like replacing the legs one by one while the table is still holding up. It’s doable with care, but you must ensure the table (system) never collapses.

The Expand-and-Contract Pattern

The most widely used safe pattern for online schema changes is called expand/contract (also known as parallel change or double-buffering).

This multi-step strategy ensures the system remains operational throughout by gradually expanding support to a new schema and then contracting (removing) the old schema once it's no longer needed.

At a high level, the steps are:

  1. Expand the schema: Introduce the new schema changes alongside the old schema. For example, if you need to rename or split a column, add new columns or tables for the new format without dropping the old ones. Ensure these additions don’t conflict with existing usage (make new columns nullable or with default values so that old code isn’t disturbed).

  2. Dual-write (update application to use both schemas): Deploy application code that writes to both the old and new schema structures for any new data, while still reading from the old schema. This means every time a write (insert/update) happens, the data is duplicated into the new column/table as well as the old one. The app still uses the old schema for reads, so users see no change in behavior. This step is often controlled by feature flags or configuration toggles, so it can be rolled out gradually. (At this point, new data is populating both schemas, but old data is only in the old schema.)

  3. Backfill/migrate existing data: In the background, run a migration script or job to copy all the historical data from the old schema to the new schema format. This might involve transforming data (e.g. splitting a full name into first and last name fields, converting a boolean to an enum, etc.), so do it carefully and verify the results. After this step, the new schema has all the data it needs; both new and old records are present in the new structure.

  4. Verify and test: With writes going to both schemas and all data backfilled, you should test the new schema in read mode. For example, run the new code in staging or on a subset of traffic (using feature flags) to ensure it can read from the new schema and function correctly. This step gives confidence before the final switch.

  5. Switch reads to the new schema: Update the application (or flip a feature flag) so that it now reads from the new schema instead of the old. At this point, the application is primarily using the new schema for both reads and (still) writes. However, to be extra safe, the application might continue dual-writing to the old schema for a bit longer, just in case you need to roll back. In practice, this switchover can be done gradually (e.g. one service or one region at a time) to catch any issues early.

  6. Stop writing to old schema: Once you’re confident the new schema is working well for reads and writes, update the app to write only to the new schema (disable the dual-write). The old schema is now obsolete. It’s no longer being read or written.

  7. Contract the schema (remove old parts): Finally, safely drop the old columns or tables that are no longer used. This cleanup should be done only after you verify no code is referencing the old schema. At this point, the migration is complete: the system is fully on the new schema with zero downtime during the whole process.

This expand/contract model guarantees that at all times during the migration, the application remains compatible with the database, avoiding errors or downtime.

If something goes wrong halfway, you can roll back the application code to the previous version without losing data, because the old schema is still intact and was still getting writes.

In other words, each step is backward-compatible, allowing safe reversal if needed.

Online Migration
Online Migration

In the diagram above, during an online migration the application (client) is writing to both the original schema and the new schema in parallel (dual writes) while still reading from the original schema. This ensures all new data is captured in both places, preparing the system to switch over to the new schema once backfilling is complete.

Additional Best Practices for Online Changes

When doing zero-downtime migrations, developers often use techniques to manage the complexity:

Example Scenario (Online)

Suppose we need to split a user’s full name stored in one column name into two columns: first_name and last_name.

Doing this as an online migration, we would:

  1. Expand: Add two new columns first_name and last_name to the users table, but keep the old name column too (we don’t remove it yet). Both new columns are nullable (or have defaults) so existing writes that only use name won’t break.

  2. Deploy a new version of the app that on each user update, it writes to both name and the new columns. For now, reads still use the old name to display names, so users see no change.

  3. Backfill existing data: run a script to parse each name into first and last name and fill the new columns. For names where the split is ambiguous, you might use some rules or fallback (this requires careful handling, as the Prisma guide notes in the case of complex names).

  4. After backfilling, test the new reading logic: switch the app (perhaps for a test cohort of users) to read first_name + last_name instead of name. Verify that names appear correctly.

  5. Cut over all reads to the new columns once confident. The app now uses first_name/last_name for displaying names. Writes might still double-write to name just to be safe.

  6. Finally, update the app to stop writing the old name field (now all writes use only first_name and last_name), and then remove the name column from the database. The schema change is complete with zero downtime – at no point did we have to lock the table in a way that users noticed, nor did we break any running queries.

This approach took more steps than an offline “just change the column” migration, but it kept the system running without errors.

If any bug was discovered after step 3, we could have rolled back the app to use the old name column exclusively again (since it was still being maintained), avoiding a crisis.

The expand-and-contract pattern allowed a seamless schema evolution.

Choosing Online vs. Offline – Key Differences and Trade-offs

Both offline and online migration strategies aim to apply schema changes safely, but they differ in trade-offs:

Many teams adopt a hybrid approach: use offline migrations in the early stages of a project or for non-critical updates, but move to online, safe-by-design migrations as the cost of downtime grows.

The key is to always ensure backward compatibility and have a rollback plan regardless of approach.

🤖 Don't fully get this? Learn it with Claude

Stuck on What Are Safe Patterns for Online Schema Changes vs? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.

🎨 Explain it visually

Build the mental picture, not memorization.

I just read a lesson on **What Are Safe Patterns for Online Schema Changes vs** (System Design) and want to truly understand it. Explain What Are Safe Patterns for Online Schema Changes vs from first principles using ONE vivid real-world analogy and a visual mental model — draw it as ASCII art or a clear step-by-step diagram — with a concrete example using real numbers. Then ask me one question to check I got the mental picture, and wait for my reply. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
🤔 Walk me through it (interactive)

Socratic — adapts to where you're stuck.

Teach me **What Are Safe Patterns for Online Schema Changes vs** interactively. Ask me ONE guiding question at a time, wait for my answer, and adapt to my confusion — build the idea with me step by step instead of explaining it all at once. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
🧪 Quiz me & fix my gaps

Active recall exposes what you missed.

Quiz me on **What Are Safe Patterns for Online Schema Changes vs** with 5 questions, easy to tricky, ONE at a time. Tell me if each answer is right; at the end, explain clearly what I got wrong and why. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
🧠 Make it stick

Intuition + hook + flashcards for long-term memory.

Help me remember **What Are Safe Patterns for Online Schema Changes vs** for the long term: give the one-sentence intuition, a memorable hook/mnemonic, a tiny worked example, and 3 active-recall flashcards (Q -> A). If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

📝 My notes