Database Sharding and Partitioning

2024/10/21

What is a Database Server?

A DB server is just a database process (mysqld, mongod) running in an EC2 instance. It exposes a port through which outside requests can communicate — default 3306 for MySQL, configurable at installation.

The d in mysqld signifies it’s a daemon process — it keeps running in the background of a specific instance.

Vertical Scaling

Increasing the RAM and disk of the EC2 instance the DB server resides on. No change to the mysqld process itself.

Limitation — there is a physical limit to how large an instance can get. Vertical scaling can only go so far.

Horizontal Scaling

Read Replicas (scale reads)

Read replicas don’t help with high write throughput — for that, you need sharding.

Sharding and Partitioning (scale writes)

Add another DB server, but this time split the data across both:

In other words: the DB server is sharded, the data is partitioned.

Data Partitioning

d1 + d2 + d3 + d4 + d5 = total data

Paritioning!

Each partition maps to a shard. If a partition becomes hot and frequently accessed, it can be moved to a different shard.

Categories of Partitioning

Horizontal Partitioning — operates at the document or row level. Split rows across shards based on a key (e.g. user ID ranges).

Vertical Partitioning — operates at the column or table level. Split columns or entire tables across shards based on access patterns.

The right choice depends on access pattern, load, and use case.

Paritioning!

The Four Configurations

A — Single DB server in a single EC2 instance. The baseline.

B — Single DB server in a single EC2 instance, with logical partitioning (e.g. two separate databases). Same machine, different logical separation.

C — Single DB server running across two EC2 instances, both with the same data. This is the read replica use case.

D — Single DB server running across two EC2 instances, with data shared across both instances. This is true sharding.

Advantages of Sharding

Disadvantages of Sharding

References: