
What Are Range Sharding, Directory‑Based Sharding, and Geo‑Sharding, and What Are the Common Use Cases?

Range sharding, directory-based sharding, and geo-sharding are database sharding strategies that horizontally partition data across multiple servers. Range sharding splits data by contiguous value ranges, directory-based sharding uses a lookup table to map data to specific shards, while geo-sharding partitions data by geographic region to keep data close to its users.

These three approaches are among the most common database sharding techniques (alongside hash-based sharding).

In all cases, the goal is to improve scalability and performance by distributing the dataset into shards (independent database partitions) stored on different nodes.

Each method defines a different rule for how a shard key (a chosen key field in each record) determines the shard where that record lives.

Below, we explain each strategy, why it’s important, and typical use cases in simple terms.

What is Range Sharding? (Range-Based Partitioning)

Range sharding (or range-based sharding) partitions the database based on contiguous ranges of the shard key’s values.

In other words, each shard is responsible for a continuous range of values (such as a numeric ID range or alphabetic range).

For example, if users are sharded by last name initial, one shard might contain all names starting with A–I, the next J–S, and another T–Z.

All data falling into a given range is stored together on the same shard, which makes range queries efficient: a query for a range of values can target a single shard, or at most the few adjacent shards whose ranges it overlaps.
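As a minimal sketch of this routing rule, the last-name example can be expressed as a sorted list of lower bounds searched with binary search. The shard names and the `LOWER_BOUNDS` layout are illustrative assumptions, not part of any particular database.

```python
from bisect import bisect_right

# Hypothetical layout: each shard owns the initials from its lower bound
# up to (but not including) the next shard's lower bound.
LOWER_BOUNDS = ["A", "J", "T"]                # sorted lower bounds
SHARDS = ["shard-1", "shard-2", "shard-3"]    # A-I, J-S, T-Z


def shard_for_last_name(last_name: str) -> str:
    """Return the shard whose range contains the name's first letter."""
    initial = last_name.strip()[:1].upper()
    if not ("A" <= initial <= "Z"):
        raise ValueError(f"unsupported last name: {last_name!r}")
    # bisect_right counts how many lower bounds are <= initial.
    return SHARDS[bisect_right(LOWER_BOUNDS, initial) - 1]


def shards_for_initial_range(first: str, last: str) -> list[str]:
    """Return every shard touched by a range query over initials first..last."""
    lo = bisect_right(LOWER_BOUNDS, first.upper()) - 1
    hi = bisect_right(LOWER_BOUNDS, last.upper()) - 1
    return SHARDS[lo:hi + 1]
```

A query for initials C through F stays on one shard, while H through K spans two adjacent shards.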

However, a key challenge with range sharding is the risk of uneven data distribution or hotspots.

Real data is often skewed. Some ranges may contain far more records or get heavier traffic than others.

For instance, if many last names begin with A or B, the A–I shard could become overloaded while another shard remains underutilized. This can undermine the benefits of sharding by creating a new bottleneck on the “hot” shard.

To mitigate this, careful choice of ranges and periodic rebalancing (splitting or adjusting ranges) may be necessary.
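One way to picture rebalancing is as inserting a new boundary into the range map. The sketch below only updates the routing metadata and assumes a hypothetical `split_range` helper; copying the affected rows to the new shard is a separate step that is not shown.

```python
from bisect import bisect_right


def split_range(lower_bounds, shards, split_at, new_shard):
    """Split the range containing split_at; keys >= split_at go to new_shard.

    Returns new (lower_bounds, shards) lists and leaves the inputs untouched.
    """
    if split_at in lower_bounds:
        raise ValueError("split point is already a range boundary")
    idx = bisect_right(lower_bounds, split_at)
    if idx == 0:
        raise ValueError("split point lies below the lowest range")
    new_bounds = lower_bounds[:idx] + [split_at] + lower_bounds[idx:]
    new_shards = shards[:idx] + [new_shard] + shards[idx:]
    return new_bounds, new_shards


# Example: the A-I shard is hot, so split it at "C".
bounds, shards = split_range(
    ["A", "J", "T"], ["shard-1", "shard-2", "shard-3"], "C", "shard-4"
)
```

After the split, initials A and B remain on shard-1 and C through I are routed to shard-4.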

Despite this drawback, range sharding is simple to implement and understand, and it keeps related data (in the same value range) together, which is very intuitive for certain access patterns.


Common Use Cases for Range Sharding

Range-based sharding is best suited when data has a natural ordering and queries often involve contiguous ranges of values.

Some typical examples include:

Time-series and log data: Partitioning records by time ranges (e.g. by day, month, or quarter) so that each shard holds a chronological segment of data. For instance, a finance system might store Q1 transactions on one shard, Q2 on another, and so on.
Sequential IDs or numeric ranges: Sharding by an increasing ID range or numeric value. For example, an e-commerce platform could shard orders such that Order IDs 1–10,000 reside on Shard 1, 10,001–20,000 on Shard 2, etc., allowing each shard to handle a portion of the order ID space. Similarly, a social media service might split user accounts into shards by user ID ranges.
Ordered categories (e.g. dates or prices): Any scenario where grouping by value range makes sense. For instance, an online retailer could assign products to shards by price range or customers by their sign-up date. This way, queries for a particular range (e.g. “orders from last week” or “products within a given price band”) hit only the relevant shards, improving query speed.
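The time-based case above can be sketched as follows, assuming one shard per calendar quarter. The `tx-<year>-q<n>` naming scheme is a made-up convention for illustration only.

```python
from datetime import date


def quarter_of(d: date) -> int:
    """Return the calendar quarter (1-4) for a date."""
    return (d.month - 1) // 3 + 1


def shard_for_transaction(d: date) -> str:
    """Route a transaction to the shard holding its quarter."""
    return f"tx-{d.year}-q{quarter_of(d)}"


def shards_for_period(start: date, end: date) -> list[str]:
    """List the quarterly shards a query over [start, end] must visit."""
    if start > end:
        raise ValueError("start must not be after end")
    year, quarter = start.year, quarter_of(start)
    last = (end.year, quarter_of(end))
    result = []
    while (year, quarter) <= last:
        result.append(f"tx-{year}-q{quarter}")
        quarter += 1
        if quarter > 4:
            year, quarter = year + 1, 1
    return result
```

A report covering mid-November through early February would touch only two shards, the fourth quarter of one year and the first quarter of the next.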

In practice, range sharding works well when the chosen shard key correlates with usage patterns (like time-based access) and the data can be kept roughly balanced.

It’s a straightforward strategy that provides logical segmentation of data, but one must monitor for growing ranges that might need splitting as the dataset evolves.

What is Directory‑Based Sharding? (Lookup Table Sharding)

Directory-based sharding, also known as lookup-based sharding, uses an explicit lookup table (directory) to decide which shard each piece of data belongs to.

Instead of relying on an implicit rule like a range or hash, this approach maintains a mapping of key values to shard identifiers.

Essentially, a separate table (the “directory”) records entries like “Key X ⇒ Shard 1, Key Y ⇒ Shard 2, …,” and all read/write operations consult this table to find the right shard.

For example, if you wanted to shard a database by region, you might have a lookup table that maps each region code to a specific shard (e.g. “US-West ⇒ Shard A, US-East ⇒ Shard B, EU ⇒ Shard C,” etc.).

Then all records tagged with “US-West” would be stored on Shard A, and a query for a US-West customer would use the directory to route to Shard A.
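A minimal sketch of such a directory, using an in-memory dictionary as a stand-in for the lookup table. In a real deployment the table would live in a replicated store, and the `ShardDirectory` class here is hypothetical.

```python
class ShardDirectory:
    """Explicit key -> shard mapping consulted on every read and write."""

    def __init__(self, mapping: dict[str, str]):
        self._mapping = dict(mapping)

    def shard_for(self, key: str) -> str:
        """Look up the shard for a key; unknown keys are an error."""
        try:
            return self._mapping[key]
        except KeyError:
            raise LookupError(f"no shard assigned for {key!r}") from None

    def assign(self, key: str, shard: str) -> None:
        """Add a new key or point an existing key at a different shard."""
        self._mapping[key] = shard


directory = ShardDirectory(
    {"US-West": "Shard A", "US-East": "Shard B", "EU": "Shard C"}
)
```

Routing a US-West customer is then a single call, `directory.shard_for("US-West")`, and adding a region is one `assign` call rather than a change to a range or hash rule.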

The advantage of directory-based sharding is flexibility. You are not constrained to contiguous ranges or a fixed hash function – any logic or criteria can be used to assign data to shards, and you can easily add or reassign shards by updating the lookup table.

Each shard can represent a meaningful grouping (such as a particular category, region, or tenant) rather than an arbitrary slice of the data. This makes it ideal when the shard key has a limited number of possible values (low cardinality) or uneven distribution that doesn’t fit neat ranges.

For instance, if you have only a dozen department names or product categories, you can map each one to a shard directly, which wouldn’t make sense for range sharding.

The trade-offs, however, are additional complexity and a potential performance hit.

Every database operation needs to do an extra lookup to the directory to find the correct shard, introducing an extra query or caching layer. This indirection can add latency (though caching the lookup table in memory can help).
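To illustrate the caching idea, here is a small sketch of a time-limited cache in front of the directory. The `fetch` callable stands in for whatever query reads the real lookup table, and the 30-second default TTL is an arbitrary assumption.

```python
import time
from typing import Callable


class CachedDirectory:
    """Cache directory lookups in memory for a limited time."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch          # reads the authoritative lookup table
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[str, float]] = {}  # key -> (shard, expiry)

    def shard_for(self, key: str) -> str:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]          # fresh cache hit, no directory query
        shard = self._fetch(key)     # miss or expired: ask the directory
        self._cache[key] = (shard, now + self._ttl)
        return shard

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, e.g. right after a key is reassigned."""
        self._cache.pop(key, None)
```

The trade-off is staleness: after a key is moved, a cached entry can keep routing to the old shard until it expires or is invalidated.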

Moreover, the directory can become a single point of failure – if the lookup table is corrupted or unavailable, the application cannot determine where data lives. This means the directory must be replicated and kept highly available and consistent.

Also, if one category or key grows far larger than others, you can still end up with an imbalanced shard (just as with ranges).

Despite these downsides, directory-based sharding is very powerful for custom sharding logic and is commonly used when you need fine-grained control over data placement.

Common Use Cases for Directory Sharding

Directory-based sharding is useful when data naturally clusters into a set of groups or you need dynamic control over shard assignments.

Typical use cases include:

Fixed set of categories or tenants: When the shard key has a limited set of distinct values (for example, a finite list of regions, departments, product types, user roles, etc.), directory sharding is a good fit. Each category can be explicitly mapped to a shard. For instance, an application might store each department’s data on a different shard, or route users of different account tiers to different shards via a lookup. This approach avoids using ranges or hashes when they don’t apply (e.g. department names have no numeric order and are too few to hash evenly).
Multi-tenant SaaS applications: In multi-tenant systems (many client organizations sharing one system), a directory-based scheme can map each tenant ID (or group of tenants) to one or more shards. This allows the provider to distribute tenants based on size and load – e.g. big tenants each on their own shard, smaller tenants pooled together – simply by updating the lookup table. It provides granular control: as a tenant’s data grows, you can allocate it additional shards and update the mapping accordingly. Many SaaS architectures use a form of directory sharding to balance load per tenant while still isolating data.
Evolving or uneven data distributions: If data distribution is unpredictable or changes over time, directory sharding lets you adjust on the fly. You can move a particular key or set of records to a new shard by changing the directory entry, without re-sharding everything. This is useful in scenarios like gaming or IoT platforms where certain users or devices might suddenly generate disproportionate load – those can be reassigned to their own shard relatively easily.
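The tenant scenarios above can be sketched as a directory with a default pool for small tenants and explicit entries for large ones. The names `DEFAULT_POOL`, `shard_for_tenant`, and `move_tenant` are hypothetical, and the sketch assumes the tenant's data is copied to the target shard before the updated directory is put into service.

```python
DEFAULT_POOL = "pool-1"  # assumed shared shard for tenants without an entry


def shard_for_tenant(directory: dict[str, str], tenant_id: str) -> str:
    """Tenants with an explicit entry use it; everyone else shares the pool."""
    return directory.get(tenant_id, DEFAULT_POOL)


def move_tenant(
    directory: dict[str, str], tenant_id: str, new_shard: str
) -> tuple[dict[str, str], tuple[str, str]]:
    """Return an updated directory plus the (source, target) shards to migrate.

    The input directory is not modified, so the caller can copy the data
    first and only then swap in the updated mapping.
    """
    source = shard_for_tenant(directory, tenant_id)
    updated = {**directory, tenant_id: new_shard}
    return updated, (source, new_shard)


directory = {"acme": "dedicated-1"}
updated, migration = move_tenant(directory, "globex", "dedicated-2")
```

Only the moved tenant's entry changes, so other tenants keep their existing placement and no global re-sharding is required.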

In summary, directory-based sharding shines when you need a custom sharding logic or foresee frequent reconfiguration of how data is partitioned. It offers maximum flexibility at the cost of an extra metadata layer that must be managed carefully.

What is Geo‑Sharding? (Geographical Sharding)

Geo-sharding (geography-based sharding) partitions the database based on geographical location – typically by a region, country, or data center location of the user or data.

In this strategy, each shard corresponds to a certain geographic area’s data.

For example, a social network might keep European users in one shard (stored on EU servers) and American users in another shard (on US servers).

When a user in Los Angeles uses the app, their requests are routed to a US shard (for example, one hosted on the West Coast), which is physically located closer to them.

Essentially, the user’s region or address acts as the shard key, and data is stored in a shard/server located in that region.
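A minimal sketch of region-based routing, assuming the user's country code is known at request time. The country-to-region table and the endpoint hostnames are invented for illustration.

```python
# Hypothetical mapping from ISO country code to a regional shard.
COUNTRY_TO_REGION = {
    "DE": "eu", "FR": "eu", "ES": "eu",
    "US": "us",
    "JP": "apac", "AU": "apac",
}

# Hypothetical database endpoints, one per regional shard.
REGION_TO_ENDPOINT = {
    "eu": "db.eu.example.internal",
    "us": "db.us.example.internal",
    "apac": "db.apac.example.internal",
}


def region_for_country(country_code: str) -> str:
    """Return the region whose shard stores users from this country."""
    region = COUNTRY_TO_REGION.get(country_code.strip().upper())
    if region is None:
        # Fail loudly instead of guessing a default region.
        raise LookupError(f"no region configured for {country_code!r}")
    return region


def endpoint_for_user(country_code: str) -> str:
    """Return the database endpoint for a user's home region."""
    return REGION_TO_ENDPOINT[region_for_country(country_code)]
```

Raising an error for unmapped countries is a deliberate choice in this sketch: silently falling back to a default region could place data somewhere it should not be stored.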

The primary benefit of geo-sharding is reducing latency and improving performance for globally distributed users.

By storing data geographically close to where it’s used, read/write requests travel a shorter distance over the network, resulting in faster response times.

For instance, if an Ohio-based customer’s data is stored on a server in Ohio, their queries will generally be faster than if the data resided on a server across the world.

Geo-sharding also helps with data sovereignty and compliance – many organizations need to keep users’ data within certain jurisdictions (like keeping EU residents’ data on servers located in the EU).

By sharding on region, it’s easier to ensure data never leaves its intended region. This can simplify compliance with laws and regulations about data locality.

That said, geo-sharding can face some of the same challenges as range sharding in terms of imbalance.

Different regions rarely have equal numbers of users or traffic. If one geographic region has a majority of users, its shard could become a hotspot while others are lightly loaded.

For example, if 80% of your customers are in North America and 5% in Australia, a naive geo-shard per continent would put a huge load on the NA shard and very little on the Australia shard.

This uneven data distribution requires planning – sometimes a hybrid approach is used (e.g. sharding by region, and then by hash or range within the heavily-used regions to split them further).
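The hybrid approach can be sketched as a two-level key: the region picks a group of shards, and a stable hash of the user ID picks one shard within that group. The per-region shard counts below are assumptions chosen to mirror the uneven traffic in the example.

```python
import hashlib

# Hypothetical number of sub-shards per region; busy regions get more.
SUB_SHARDS = {"na": 4, "eu": 2, "au": 1}


def shard_for(region: str, user_id: str) -> str:
    """Pick a shard by region first, then by a stable hash of the user ID."""
    if region not in SUB_SHARDS:
        raise LookupError(f"unknown region: {region!r}")
    # sha256 is stable across processes, unlike Python's built-in hash().
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % SUB_SHARDS[region]
    return f"{region}-{bucket}"
```

Data still never leaves its region, but the North American load is spread over four shards instead of one. Note that changing a region's shard count remaps most of its users under plain modulo hashing, so a production design would likely need a migration plan or a scheme such as consistent hashing.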

Additionally, geo-sharding alone may not solve all regulatory concerns; you must also ensure that each shard’s replicas and backups stay in-region and that cross-region queries do not move data out of its intended jurisdiction.

But overall, when used appropriately, geo-sharding is a powerful strategy for global applications to serve users with low latency and localized data handling.

Common Use Cases for Geo-Sharding

Geo-sharding is naturally suited for applications with a worldwide user base or location-specific data needs.

Common scenarios include:

Globally distributed applications: Large web platforms (social networks, content platforms, SaaS services) often shard by region to serve users from the nearest data center. Each region’s shard handles local user accounts and data, drastically speeding up access. For example, a dating app might store and query European user profiles on EU-based servers, Asia-Pacific users on APAC servers, etc., to minimize network delay. This also localizes the impact of any outages (if one region’s shard goes down for maintenance, other regions aren’t affected).
Compliance and data residency: Industries like finance, healthcare, or government often face legal requirements to keep data within certain geographic boundaries. Geo-sharding can ensure that, say, Canadian customer data stays on Canadian soil, EU data stays in EU, and so on, by assigning each country or region its own shard stored in the appropriate location. This segmentation by country/region not only improves performance but also makes it easier to demonstrate compliance with data protection regulations.
Regional content and services: Sometimes data is naturally tied to location (for example, a ride-sharing service might shard data by city/region, since almost all rides and queries are local). By sharding along geographic lines, each shard can be optimized and scaled according to regional demand. It also allows regional autonomy – you could take a region’s shard offline for maintenance or backup without impacting users in other regions (e.g. taking the Europe shard offline during EU nighttime). Geo-sharding thus aligns the data architecture with the geographical patterns of the business.

Overall, geo-sharding is a specialized form of sharding that leverages physical location for performance and compliance benefits.

It is often combined with other sharding methods (for example, first split by region, then use range or hash sharding within each region’s shard) to achieve both localization and balance.
