Scalability

Categories: Architecture
Sources: Designing Data-Intensive Applications

A system's ability to cope with increased load. It is not a one-dimensional label a system simply "has"; it is a question: if load grows in a specific way, what are our options for handling it? Answering it requires describing load with concrete parameters and performance with concrete numbers.

Why it Matters

"Make it scalable" is meaningless without naming which load grows (requests per second, data volume, fan-out) and which metric must hold (latency, throughput). Stating both turns a vague aspiration into a design question that has answers.

Signals

Scaling discussed as a binary property the system does or does not have.
No agreed load parameter or target metric.
Performance quoted as an average rather than a distribution.
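
The last signal can be made concrete with a small sketch. It assumes a hypothetical list of response times (the numbers are invented for illustration) and a nearest-rank percentile helper written here, not taken from any particular monitoring tool. It shows how a mean can hide what most requests and the slowest requests actually experience.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds: mostly fast, two slow outliers.
response_times_ms = [20, 22, 21, 23, 19, 24, 20, 22, 900, 1500]

mean_ms = sum(response_times_ms) / len(response_times_ms)
p50_ms = percentile(response_times_ms, 50)  # what a typical request sees
p90_ms = percentile(response_times_ms, 90)  # the slow tail
p99_ms = percentile(response_times_ms, 99)  # the worst requests

print(f"mean={mean_ms:.1f} ms  p50={p50_ms} ms  p90={p90_ms} ms  p99={p99_ms} ms")
```

In this made-up sample the mean lands far from both the typical request and the slow tail, which is why a target metric is usually better stated as a percentile than as an average.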

Benefits

Focuses effort on the dimension that will actually grow and makes capacity planning and architecture choices concrete.

Risks

Designing for scale that never arrives (premature scaling), or optimizing the wrong load dimension while the real one saturates.

Tensions

Scaling up (a bigger machine) is simpler but bounded by the largest machine available; scaling out (many machines) raises that ceiling but introduces distribution, partial failure, and coordination cost. The two main tools for scaling out, replication and partitioning, each carry their own tradeoffs.
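
As one illustration of partitioning, the sketch below assigns keys to partitions by hashing. It is a minimal example under stated assumptions, not a recommended design: the function name partition_for, the key format, and the partition count are all hypothetical. It uses hashlib rather than the built-in hash() because Python randomizes string hashes per process, which would make assignments unstable across restarts.

```python
import hashlib
from collections import Counter

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions) using a stable hash."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions

NUM_PARTITIONS = 4  # hypothetical cluster size

# Count how many of 1000 hypothetical user keys land on each partition.
counts = Counter(partition_for(f"user:{i}", NUM_PARTITIONS) for i in range(1000))
for partition in sorted(counts):
    print(f"partition {partition}: {counts[partition]} keys")
```

The tradeoff shows up immediately: taking the hash modulo the partition count is simple, but changing the count reassigns most keys, so adding a machine forces a large data move. Schemes that avoid this add their own complexity.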

Examples

Describing load as "reads per second and average fan-out," then choosing between replicas and caches; splitting data across partitions to handle a write volume that one machine cannot sustain.
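
The first example can be worked through as back-of-the-envelope arithmetic. Every number below is hypothetical (request rate, fan-out, per-replica capacity, cache hit rate), and the model is deliberately crude: it assumes each read triggers one backend lookup per fanned-out item and that load spreads evenly across replicas.

```python
import math

def backend_lookups_per_sec(reads_per_sec: int, avg_fan_out: int) -> int:
    """Backend lookups generated when each read fans out to avg_fan_out items."""
    return reads_per_sec * avg_fan_out

def replicas_needed(lookups_per_sec: int, per_replica_capacity: int) -> int:
    """Minimum replica count, assuming load is spread evenly across replicas."""
    return max(1, math.ceil(lookups_per_sec / per_replica_capacity))

# Hypothetical load parameters and capacities, for illustration only.
READS_PER_SEC = 2_000
AVG_FAN_OUT = 50
PER_REPLICA_CAPACITY = 30_000  # lookups per second one replica can serve
CACHE_HIT_PERCENT = 90         # share of lookups answered by a cache

uncached = backend_lookups_per_sec(READS_PER_SEC, AVG_FAN_OUT)
replicas_without_cache = replicas_needed(uncached, PER_REPLICA_CAPACITY)

# With a cache in front, only the misses reach the replicas.
cached = uncached * (100 - CACHE_HIT_PERCENT) // 100
replicas_with_cache = replicas_needed(cached, PER_REPLICA_CAPACITY)

print(f"no cache: {uncached} lookups/s -> {replicas_without_cache} replicas")
print(f"with cache: {cached} lookups/s -> {replicas_with_cache} replicas")
```

Naming the two load parameters is what makes the comparison possible: the same multiplication shows whether the pressure comes from the request rate or from the fan-out, and therefore whether more replicas or a cache is the more plausible remedy.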