
What Are Range Sharding, Directory‑Based Sharding, and Geo‑Sharding, and What Are the Common Use Cases?

Range sharding, directory-based sharding, and geo-sharding are database sharding strategies that horizontally partition data across multiple servers. Range sharding splits data by contiguous value ranges, directory-based sharding uses a lookup table to map data to specific shards, while geo-sharding partitions data by geographic region to keep data close to its users.

These three approaches are among the most common database sharding techniques (alongside hashed sharding).

In all cases, the goal is to improve scalability and performance by distributing the dataset into shards (independent database partitions) stored on different nodes.

Each method defines a different rule for how a shard key (a chosen key field in each record) determines the shard where that record lives.

Below, we explain each strategy, why it’s important, and typical use cases in simple terms.

What is Range Sharding? (Range-Based Partitioning)

Range sharding (or range-based sharding) partitions the database based on contiguous ranges of the shard key’s values.

In other words, each shard is responsible for a continuous range of values (such as a numeric ID range or alphabetic range).

Range-based Sharding

For example, if users are sharded by last name initial, one shard might contain all names starting with A–I, the next J–S, and another T–Z.

All data falling into a given range is stored together on the same shard, which makes range queries efficient: a query for a range of values can target a single shard, or a few adjacent shards, instead of every shard in the cluster.
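The last-name example can be sketched as a small router. This is only an illustration: the shard names and the `shard_for` and `shards_for_range` helpers are hypothetical, and it assumes keys are routed by their first letter alone.

```python
from bisect import bisect_right

# Hypothetical layout matching the example: A-I, J-S, T-Z.
# BOUNDARIES holds the first letter of every shard except the first one.
BOUNDARIES = ["J", "T"]
SHARDS = ["shard_a_i", "shard_j_s", "shard_t_z"]


def shard_for(last_name: str) -> str:
    """Return the shard that owns a single last name."""
    if not last_name:
        raise ValueError("last_name must not be empty")
    initial = last_name[0].upper()
    # bisect_right counts how many boundaries are <= initial,
    # which is exactly the index of the owning shard.
    return SHARDS[bisect_right(BOUNDARIES, initial)]


def shards_for_range(first_initial: str, last_initial: str) -> list[str]:
    """Return every shard a range query over initials must visit."""
    lo = bisect_right(BOUNDARIES, first_initial.upper())
    hi = bisect_right(BOUNDARIES, last_initial.upper())
    return SHARDS[lo:hi + 1]
```

With this layout, `shard_for("Jones")` returns `"shard_j_s"`, a query for initials K through M touches only `shard_j_s`, and a query for H through K touches the two adjacent shards `shard_a_i` and `shard_j_s`.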

However, a key challenge with range sharding is the risk of uneven data distribution or hotspots.

Real data is often skewed. Some ranges may contain far more records or get heavier traffic than others.

For instance, if many last names begin with A or B, the A–I shard could become overloaded while another shard remains underutilized. This can undermine the benefits of sharding by creating a new bottleneck on the “hot” shard.

To mitigate this, careful choice of ranges and periodic rebalancing (splitting or adjusting ranges) may be necessary.
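As a rough sketch of what splitting a hot range means for the routing metadata, the snippet below divides an overloaded A–I shard at the letter C. The `split_range` helper and shard names are hypothetical, and the sketch only updates the boundaries; moving the affected rows to the new shard is a separate step that is not shown.

```python
from bisect import bisect_right


def shard_for(last_name: str, boundaries: list[str], shards: list[str]) -> str:
    """Route a last name by its initial, given sorted range boundaries."""
    return shards[bisect_right(boundaries, last_name[0].upper())]


def split_range(boundaries: list[str], shards: list[str],
                split_at: str, new_shard: str) -> tuple[list[str], list[str]]:
    """Split the range containing split_at; new_shard takes the upper half.

    Returns new lists and leaves the inputs untouched, so the old routing
    can stay in use until the data has actually been migrated.
    """
    if split_at in boundaries:
        raise ValueError(f"{split_at!r} is already a boundary")
    idx = bisect_right(boundaries, split_at)
    new_boundaries = boundaries[:idx] + [split_at] + boundaries[idx:]
    new_shards = shards[:idx + 1] + [new_shard] + shards[idx + 1:]
    return new_boundaries, new_shards


# Shard names are labels only: after the split, "shard_a_i" holds just A-B.
boundaries, shards = split_range(
    ["J", "T"], ["shard_a_i", "shard_j_s", "shard_t_z"], "C", "shard_c_i"
)
```

After the split, names starting with A or B stay on the original shard, while C through I are routed to the new one.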

Despite this drawback, range sharding is simple to implement and understand, and it keeps related data (in the same value range) together, which is very intuitive for certain access patterns.


Common Use Cases for Range Sharding

Range-based sharding is best suited when data has a natural ordering and queries often involve contiguous ranges of values.

Some typical examples include:

In practice, range sharding works well when the chosen shard key correlates with usage patterns (like time-based access) and the data can be kept roughly balanced.

It’s a straightforward strategy that provides logical segmentation of data, but one must monitor for growing ranges that might need splitting as the dataset evolves.

What is Directory‑Based Sharding? (Lookup Table Sharding)

Directory-based sharding, also known as lookup-based sharding, uses an explicit lookup table (directory) to decide which shard each piece of data belongs to.

Instead of relying on an implicit rule like a range or hash, this approach maintains a mapping of key values to shard identifiers.

Essentially, a separate table (the “directory”) records entries like “Key X ⇒ Shard 1, Key Y ⇒ Shard 2, …,” and all read/write operations consult this table to find the right shard.

Directory-based Sharding

For example, if you wanted to shard a database by region, you might have a lookup table that maps each region code to a specific shard (e.g. “US-West ⇒ Shard A, US-East ⇒ Shard B, EU ⇒ Shard C,” etc.).

Then all records tagged with “US-West” would be stored on Shard A, and a query for a US-West customer would use the directory to route to Shard A.

The advantage of directory-based sharding is flexibility. You are not constrained to contiguous ranges or a fixed hash function – any logic or criteria can be used to assign data to shards, and you can easily add or reassign shards by updating the lookup table.
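A minimal sketch of the region directory described above, using an in-memory dictionary. The `ShardDirectory` class is hypothetical; a real directory would live in a replicated store, and reassigning a key would also require moving its data, which is not shown here.

```python
class ShardDirectory:
    """Explicit key-to-shard mapping consulted on every read and write."""

    def __init__(self, mapping: dict[str, str]):
        self._mapping = dict(mapping)  # copy so callers cannot mutate it

    def lookup(self, key: str) -> str:
        """Return the shard for a key; raises KeyError if it is unmapped."""
        return self._mapping[key]

    def assign(self, key: str, shard: str) -> None:
        """Add a key or move it to another shard by editing the table."""
        self._mapping[key] = shard


directory = ShardDirectory({
    "US-West": "Shard A",
    "US-East": "Shard B",
    "EU": "Shard C",
})
```

Here `directory.lookup("US-West")` returns `"Shard A"`, and calling `directory.assign("EU", "Shard D")` reroutes EU traffic without touching any range or hash rule.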

Each shard can represent a meaningful grouping (such as a particular category, region, or tenant) rather than an arbitrary slice of the data. This makes it ideal when the shard key has a limited number of possible values (low cardinality) or uneven distribution that doesn’t fit neat ranges.

For instance, if you have only a dozen department names or product categories, you can map each one to a shard directly, which wouldn’t make sense for range sharding.

The trade-offs, however, are additional complexity and a potential performance hit.

Every database operation must first consult the directory to find the correct shard, which means either an extra query or a caching layer in front of the directory. This indirection can add latency (though caching the lookup table in memory can help).
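One way to keep the directory lookup off the hot path is a small in-memory cache with a time-to-live. The sketch below is an assumption-laden illustration: `CachedDirectory`, its `fetch` callback (standing in for the real directory query), and the 30-second default are all hypothetical.

```python
import time
from typing import Callable


class CachedDirectory:
    """Caches directory lookups in memory for ttl_seconds."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical call to the real directory
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._cache: dict[str, tuple[str, float]] = {}

    def lookup(self, key: str) -> str:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]          # fresh hit: no directory round trip
        shard = self._fetch(key)     # miss or expired: ask the directory
        self._cache[key] = (shard, now + self._ttl)
        return shard

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, e.g. right after a key is reassigned."""
        self._cache.pop(key, None)
```

The trade-off is staleness: after a key is reassigned, a cached entry can point at the old shard until it expires or is invalidated, so the time-to-live should be short relative to how often the mapping changes.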

Moreover, the directory becomes a single point of failure – if the lookup table is corrupted or unavailable, the application cannot determine where data lives. This means the directory must be kept highly available and consistent.

Also, if one category or key grows far larger than others, you can still end up with an imbalanced shard (just as with ranges).

Despite these downsides, directory-based sharding is very powerful for custom sharding logic and is commonly used when you need fine-grained control over data placement.

Common Use Cases for Directory Sharding

Directory-based sharding is useful when data naturally clusters into a set of groups or you need dynamic control over shard assignments.

Typical use cases include:

In summary, directory-based sharding shines when you need a custom sharding logic or foresee frequent reconfiguration of how data is partitioned. It offers maximum flexibility at the cost of an extra metadata layer that must be managed carefully.

What is Geo‑Sharding? (Geographical Sharding)

Geo-sharding (geography-based sharding) partitions the database based on geographical location – typically by a region, country, or data center location of the user or data.

In this strategy, each shard corresponds to a certain geographic area’s data.

For example, a social network might keep European users in one shard (stored on EU servers) and American users in another shard (on US servers).

When a user in Los Angeles uses the app, their requests are routed to the US shard (or, if regions are split more finely, a West Coast shard), which is physically closer to them than the EU servers.

Essentially, the user’s region or address acts as the shard key, and data is stored in a shard/server located in that region.
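The routing step can be sketched as two small lookups: country to region, then region to the database endpoint hosted in that region. The country list, region names, and hostnames below are placeholders for illustration, not real infrastructure.

```python
# Hypothetical mapping of ISO country codes to shard regions.
COUNTRY_TO_REGION = {
    "US": "us",
    "CA": "us",
    "DE": "eu",
    "FR": "eu",
}

# Hypothetical database endpoints, one per region.
REGION_TO_SHARD = {
    "us": "db-us.example.internal",
    "eu": "db-eu.example.internal",
}


def shard_for_country(country_code: str) -> str:
    """Return the regional shard endpoint for a user's country."""
    region = COUNTRY_TO_REGION.get(country_code.upper())
    if region is None:
        # Failing loudly is safer than silently storing data in a default
        # region, which could place it in the wrong jurisdiction.
        raise LookupError(f"no region configured for {country_code!r}")
    return REGION_TO_SHARD[region]
```

In this sketch `shard_for_country("DE")` returns `"db-eu.example.internal"`, and an unmapped country raises an error rather than falling back to another region.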

The primary benefit of geo-sharding is reducing latency and improving performance for globally distributed users.

By storing data geographically close to where it’s used, read/write requests travel a shorter distance over the network, resulting in faster response times.

For instance, if an Ohio-based customer’s data is stored on a server in Ohio, their queries will generally be faster than if the data resided on a server across the world.

Geo-sharding also helps with data sovereignty and compliance – many organizations need to keep users’ data within certain jurisdictions (like keeping EU residents’ data on servers located in the EU).

By sharding on region, it’s easier to ensure data never leaves its intended region. This can simplify compliance with laws and regulations about data locality.

That said, geo-sharding can face some of the same challenges as range sharding in terms of imbalance.

Different regions rarely have equal numbers of users or traffic. If one geographic region has a majority of users, its shard could become a hotspot while others are lightly loaded.

For example, if 80% of your customers are in North America and 5% in Australia, a naive geo-shard per continent would put a huge load on the NA shard and very little on the Australia shard.

This uneven data distribution requires planning – sometimes a hybrid approach is used (e.g. sharding by region, and then by hash or range within the heavily-used regions to split them further).
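A hybrid scheme of this kind might look like the following sketch, which picks the region first and then hashes the user ID across that region's sub-shards. The region names and shard counts are made up, and the hash-modulo step is just one simple choice for the second level.

```python
import hashlib

# Hypothetical sub-shard counts: the busy region gets more shards.
REGION_SHARD_COUNTS = {"na": 4, "eu": 2, "au": 1}


def shard_for(region: str, user_id: str) -> str:
    """Pick a shard by region first, then by a stable hash of the user ID."""
    count = REGION_SHARD_COUNTS[region]  # KeyError for an unknown region
    # hashlib gives the same result in every process, unlike the built-in
    # hash(), which is randomized per run for strings.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % count
    return f"{region}-{bucket}"
```

Data still never leaves its region, but the heavily used region's load is spread over several shards. One caveat of plain modulo hashing is that changing a region's shard count remaps many keys, so the counts are best chosen with headroom.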

Additionally, geo-sharding alone may not solve all regulatory concerns; you must ensure each shard’s data stays in-region and is accessed accordingly.

But overall, when used appropriately, geo-sharding is a powerful strategy for global applications to serve users with low latency and localized data handling.

Common Use Cases for Geo-Sharding

Geo-sharding is naturally suited for applications with a worldwide user base or location-specific data needs.

Common scenarios include:

Overall, geo-sharding is a specialized form of sharding that leverages physical location for performance and compliance benefits.

It is often combined with other sharding methods (for example, first split by region, then use range or hash sharding within each region’s shard) to achieve both localization and balance.
