# Horizontal vs Vertical Scaling
## TL;DR
Vertical scaling (scaling up) adds more power to existing machinesβmore CPU, RAM, or storage. Horizontal scaling (scaling out) adds more machines to distribute the load. Most modern distributed systems use horizontal scaling for better fault tolerance and cost efficiency, though vertical scaling is simpler and works well for certain workloads.
## The Scaling Problem
Traffic Growth Over Time:

```
Requests/sec
     │
100k ┤                                ╱
     │                              ╱
 50k ┤                            ╱
     │                          ╱
 10k ┤            ╱────────────╱
     │        ╱──╱
  1k ┤    ╱──╱
     ├─╱─╱
     └──────────────────────────────────►
                                    Time
```
Question: How do we handle 100x traffic growth?

## Vertical Scaling (Scale Up)
```
Before:                       After:
┌──────────────────┐          ┌──────────────────┐
│      Server      │          │      Server      │
│                  │          │                  │
│  CPU: 4 cores    │   ───►   │  CPU: 64 cores   │
│  RAM: 16 GB      │          │  RAM: 512 GB     │
│  SSD: 500 GB     │          │  SSD: 10 TB NVMe │
│                  │          │                  │
└──────────────────┘          └──────────────────┘
```

### When Vertical Scaling Works Well
```python
# Single-threaded workloads benefit from faster CPUs
class SingleThreadedProcessor:
    """
    Vertical scaling helps: Faster CPU = faster processing
    Examples:
    - Complex calculations
    - Sequential data processing
    - Legacy applications
    """
    def process(self, data):
        result = complex_computation(data)  # CPU-bound (defined elsewhere)
        return result

# Memory-intensive workloads
class InMemoryDatabase:
    """
    Vertical scaling helps: More RAM = more data in memory
    Examples:
    - Redis with large datasets
    - In-memory analytics
    - Caching layers
    """
    def __init__(self):
        self.data = {}  # All data in memory

    def get(self, key):
        return self.data.get(key)  # O(1) access
```

### Vertical Scaling Limits
```
AWS EC2 Instance Sizes (example):

Instance Type      vCPUs     Memory    Cost/hr
─────────────────────────────────────────────
t3.micro               2       1 GB      $0.01
t3.large               2       8 GB      $0.08
m5.xlarge              4      16 GB      $0.19
m5.4xlarge            16      64 GB      $0.77
m5.24xlarge           96     384 GB      $4.61
u-24tb1.metal        448  24,576 GB    $218.40  ← Maximum!

┌─────────────────────────────────┐
│ CEILING: Physical limits        │
│ - Max CPU cores per socket      │
│ - Max RAM per motherboard       │
│ - Max I/O bandwidth             │
└─────────────────────────────────┘
```

## Horizontal Scaling (Scale Out)
```
Before:                        After:
┌──────────────────┐           ┌─────────────────────────────┐
│      Server      │           │        Load Balancer        │
│                  │           └──────────────┬──────────────┘
│  Handles 1000    │                          │
│  requests/sec    │  ───►     ┌──────────────┼──────────────┐
│                  │           │              │              │
└──────────────────┘      ┌────┴───┐     ┌────┴───┐     ┌────┴───┐
                          │Server 1│     │Server 2│     │Server 3│
                          │ 1000/s │     │ 1000/s │     │ 1000/s │
                          └────────┘     └────────┘     └────────┘

                          Total: 3000 requests/sec
```

### Stateless Services Scale Horizontally
```python
from flask import Flask, request
import os

app = Flask(__name__)

# Stateless service - easy to scale horizontally
@app.route('/api/calculate', methods=['POST'])
def calculate():
    data = request.json
    # No local state - any instance can handle this
    result = perform_calculation(data)  # defined elsewhere
    return {"result": result, "server": os.getenv("HOSTNAME")}

# Each instance is identical and interchangeable
# ┌─────────┐  ┌─────────┐  ┌─────────┐
# │Instance1│  │Instance2│  │Instance3│
# │  Code   │  │  Code   │  │  Code   │
# │ (same)  │  │ (same)  │  │ (same)  │
# └─────────┘  └─────────┘  └─────────┘
```

### Stateful Services Are Harder
```python
# BAD: Stateful service - hard to scale
class SessionStore:
    def __init__(self):
        self.sessions = {}  # Local state!

    def set_session(self, session_id, data):
        self.sessions[session_id] = data

    def get_session(self, session_id):
        return self.sessions.get(session_id)

# Problem: User might hit different server each request
# Request 1 → Server1 (session created here)
# Request 2 → Server2 (session not found!)

# SOLUTION: Externalize state
import json
import redis

class ExternalSessionStore:
    def __init__(self):
        self.redis = redis.Redis(host='redis-cluster')

    def set_session(self, session_id, data):
        self.redis.setex(session_id, 3600, json.dumps(data))

    def get_session(self, session_id):
        data = self.redis.get(session_id)
        return json.loads(data) if data else None
```

Externalized State Architecture:
```
┌───────────────────────────────────────────────────┐
│                   Load Balancer                   │
└─────────────────────┬─────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
   ┌────┴────┐   ┌────┴────┐   ┌────┴────┐
   │Server 1 │   │Server 2 │   │Server 3 │
   │Stateless│   │Stateless│   │Stateless│
   └────┬────┘   └────┬────┘   └────┬────┘
        │             │             │
        └─────────────┼─────────────┘
                      │
        ┌─────────────┴─────────────┐
        │       Redis Cluster       │
        │   (Shared State Store)    │
        └───────────────────────────┘
```

## Database Scaling Strategies
### Vertical Scaling (Simpler)
```sql
-- Single powerful database server
-- Handles all reads and writes
CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    user_id INT,
    amount DECIMAL(10,2),
    created_at TIMESTAMP
);

-- Indexes for performance
CREATE INDEX idx_user_orders ON orders(user_id);
CREATE INDEX idx_created ON orders(created_at);

-- Upgrade server hardware when needed:
-- More RAM   → larger buffer pool
-- Faster SSD → faster I/O
-- More cores → more parallel queries
```

### Horizontal Scaling (Read Replicas)
```
Write Scaling: Still limited to primary

                   ┌─────────────────┐
  Writes ─────────►│     Primary     │
                   │   (Writable)    │
                   └────────┬────────┘
                            │
               Replication  │
                  Stream    │
                            │
     ┌──────────────────────┼──────────────────────┐
     │                      │                      │
     ▼                      ▼                      ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Replica 1   │    │  Replica 2   │    │  Replica 3   │
│ (Read-only)  │    │ (Read-only)  │    │ (Read-only)  │
└──────────────┘    └──────────────┘    └──────────────┘
     ▲                      ▲                      ▲
     │                      │                      │
     └──────────────────────┴──────────────────────┘
                            │
                          Reads
```

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import random

class DatabaseRouter:
    def __init__(self):
        self.primary = create_engine('postgresql://primary:5432/db')
        self.replicas = [
            create_engine('postgresql://replica1:5432/db'),
            create_engine('postgresql://replica2:5432/db'),
            create_engine('postgresql://replica3:5432/db'),
        ]

    def get_write_session(self):
        """All writes go to primary"""
        Session = sessionmaker(bind=self.primary)
        return Session()

    def get_read_session(self):
        """Reads distributed across replicas"""
        replica = random.choice(self.replicas)
        Session = sessionmaker(bind=replica)
        return Session()

# Usage (Order is an ORM model defined elsewhere)
router = DatabaseRouter()

# Write operation
with router.get_write_session() as session:
    order = Order(user_id=123, amount=99.99)
    session.add(order)
    session.commit()

# Read operation (can hit any replica)
with router.get_read_session() as session:
    orders = session.query(Order).filter_by(user_id=123).all()
```

### Horizontal Scaling (Sharding)
```
Full Horizontal Write Scaling:

                 ┌─────────────────┐
                 │  Shard Router   │
                 └────────┬────────┘
                          │
         user_id % 4 = ?  │
                          │
    ┌───────────┬─────────┴─────┬───────────┐
    │           │               │           │
    ▼           ▼               ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 0 │ │ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│users    │ │users    │ │users    │ │users    │
│0,4,8... │ │1,5,9... │ │2,6,10...│ │3,7,11...│
└─────────┘ └─────────┘ └─────────┘ └─────────┘
```

## Comparison Table
| Aspect | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Complexity | Simple | Complex |
| Downtime | Required for upgrade | Zero-downtime possible |
| Cost curve | Exponential | Linear |
| Failure impact | Single point of failure | Fault tolerant |
| Maximum capacity | Hardware limits | Theoretically unlimited |
| Data consistency | Easy (single node) | Complex (distributed) |
| Application changes | Minimal | May require redesign |
| Operational overhead | Low | High |
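The `user_id % 4` routing shown in the sharding diagram can be sketched in a few lines. This is a minimal illustration, not a production router: the in-memory `shards` dict here is a hypothetical stand-in for real per-shard database connections.

```python
class ShardRouter:
    """Route each user's rows to one of N shards by hashing the shard key."""

    def __init__(self, num_shards=4):
        self.num_shards = num_shards
        # Stand-ins for per-shard database connections
        self.shards = {i: {} for i in range(num_shards)}

    def shard_for(self, user_id):
        # Modulo routing: the same user always lands on the same shard
        return user_id % self.num_shards

    def insert(self, user_id, row):
        shard = self.shards[self.shard_for(user_id)]
        shard.setdefault(user_id, []).append(row)

    def query(self, user_id):
        # Single-shard query: cheap, because we know where the user lives
        return self.shards[self.shard_for(user_id)].get(user_id, [])


router = ShardRouter()
router.insert(5, {"order_id": 1, "amount": 99.99})
print(router.shard_for(5))  # user 5 -> shard 1 (5 % 4)
print(router.query(5))
```

Modulo routing is the simplest scheme; real systems often use consistent hashing or a directory service so that adding a shard does not remap every existing key.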
## Cost Analysis
```
Vertical Scaling Cost Curve:

Cost ($)
   │                                  ╱
   │                                ╱
   │                              ╱
   │                            ╱
   │                         ╱
   │                      ╱
   │                  ╱
   │             ╱
   │        ╱
   ├────╱
   └──────────────────────────────────►
                                Capacity

Cost grows exponentially!
2x performance ≠ 2x cost (often 3-4x cost)
```

```
Horizontal Scaling Cost Curve:

Cost ($)
   │                                ╱
   │                             ╱
   │                          ╱
   │                       ╱
   │                    ╱
   │                 ╱
   │              ╱
   │           ╱
   ├────────╱
   └──────────────────────────────────►
                                Capacity

Cost grows linearly (with some overhead)
```
2x servers ≈ 2x cost (+ coordination overhead)

```python
def calculate_scaling_cost():
    """Compare costs for 10x capacity increase"""
    # Vertical: Upgrade from m5.xlarge to m5.24xlarge
    vertical_before = 0.192 * 24 * 30        # $138/month
    vertical_after = 4.608 * 24 * 30         # $3,318/month
    vertical_ratio = vertical_after / vertical_before  # 24x cost!

    # Horizontal: Add more m5.xlarge instances
    horizontal_before = 0.192 * 24 * 30      # $138/month (1 instance)
    horizontal_after = 0.192 * 24 * 30 * 10  # $1,382/month (10 instances)
    horizontal_overhead = 0.10               # 10% for load balancer, coordination
    horizontal_total = horizontal_after * (1 + horizontal_overhead)
    horizontal_ratio = horizontal_total / horizontal_before  # 11x cost

    return {
        "vertical_cost_ratio": vertical_ratio,      # 24x
        "horizontal_cost_ratio": horizontal_ratio,  # 11x
    }
```

## Hybrid Approach
Most production systems use both strategies:
```
Optimal Architecture:

┌──────────────────────────────────────────────────────────────┐
│                           Internet                           │
└──────────────────────────┬───────────────────────────────────┘
                           │
                           ▼
         ┌─────────────────────────────────────┐
         │        Global Load Balancer         │
         │      (Horizontal - DNS-based)       │
         └──────────────────┬──────────────────┘
                            │
           ┌────────────────┴────────────────┐
           │                                 │
           ▼                                 ▼
 ┌──────────────────┐             ┌──────────────────┐
 │    Region: US    │             │    Region: EU    │
 │                  │             │                  │
 │  ┌────────────┐  │             │  ┌────────────┐  │
 │  │     LB     │  │             │  │     LB     │  │
 │  └─────┬──────┘  │             │  └─────┬──────┘  │
 │        │         │             │        │         │
 │   Web Servers    │             │   Web Servers    │
 │   (Horizontal)   │             │   (Horizontal)   │
 │                  │             │                  │
 │  ┌────────────┐  │             │  ┌────────────┐  │
 │  │   Cache    │  │             │  │   Cache    │  │
 │  │ (Vertical) │  │             │  │ (Vertical) │  │
 │  │ 256GB RAM  │  │             │  │ 256GB RAM  │  │
 │  └────────────┘  │             │  └────────────┘  │
 │                  │             │                  │
 │  ┌────────────┐  │             │  ┌────────────┐  │
 │  │  Database  │  │ Replication │  │  Database  │  │
 │  │  Primary   ├──┼─────────────┼─►│  Replica   │  │
 │  │ (Vertical) │  │             │  │            │  │
 │  └────────────┘  │             │  └────────────┘  │
 └──────────────────┘             └──────────────────┘
```

```python
import math

class HybridScalingStrategy:
    """
    Decision framework for when to scale vertically vs horizontally
    """
    def recommend_scaling(self, service_type: str, bottleneck: str) -> str:
        recommendations = {
            # Stateless services: Scale horizontally
            ("api", "cpu"): "horizontal",
            ("api", "memory"): "horizontal",
            ("worker", "cpu"): "horizontal",
            # Caches: Often vertical first, then horizontal
            ("cache", "memory"): "vertical_then_horizontal",
            ("cache", "cpu"): "horizontal",
            # Databases: Depends on workload
            ("database", "reads"): "horizontal_replicas",
            ("database", "writes"): "vertical_then_shard",
            ("database", "storage"): "horizontal_sharding",
            # Message queues: Horizontal by design
            ("queue", "throughput"): "horizontal_partitions",
        }
        return recommendations.get(
            (service_type, bottleneck),
            "evaluate_case_by_case"
        )

    def scale_decision(self, metrics: dict) -> dict:
        """Make scaling decision based on current metrics"""
        if metrics['cpu_usage'] > 80:
            if metrics['service_type'] == 'stateless':
                return {
                    "action": "scale_out",
                    "add_instances": self._calculate_instances_needed(
                        metrics['cpu_usage'],
                        target=60
                    )
                }
            else:
                return {
                    "action": "scale_up",
                    "new_instance_type": self._next_instance_type(
                        metrics['current_type']
                    )
                }
        if metrics['memory_usage'] > 85:
            return {
                "action": "scale_up",
                "reason": "Memory-bound workloads often benefit from vertical scaling"
            }
        return {"action": "no_scaling_needed"}

    def _calculate_instances_needed(self, current_usage: float, target: float) -> int:
        """Extra instances needed to bring per-instance usage down to target."""
        return max(0, math.ceil(current_usage / target) - 1)

    def _next_instance_type(self, current_type: str) -> str:
        """Step up one size in a simple instance-family ladder."""
        ladder = ["m5.xlarge", "m5.2xlarge", "m5.4xlarge", "m5.12xlarge", "m5.24xlarge"]
        i = ladder.index(current_type)
        return ladder[min(i + 1, len(ladder) - 1)]
```

## Kubernetes Scaling Example
```yaml
# Horizontal Pod Autoscaler (Horizontal Scaling)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
---
# Vertical Pod Autoscaler (Vertical Scaling)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cache-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cache-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: cache
        minAllowed:
          memory: "1Gi"
          cpu: "500m"
        maxAllowed:
          memory: "32Gi"
          cpu: "8"
        controlledResources: ["memory"]
```

## Decision Framework
```
Start
  │
  ▼
┌─────────────────┐
│ Is the service  │
│   stateless?    │
└────────┬────────┘
         │
   ┌─────┴───────────────┐
   │ Yes                 │ No
   ▼                     ▼
┌──────────────┐   ┌──────────────────┐
│    Scale     │   │  Can state be    │
│ Horizontally │   │  externalized?   │
└──────────────┘   └────────┬─────────┘
                            │
                      ┌─────┴───────────────┐
                      │ Yes                 │ No
                      ▼                     ▼
               ┌──────────────┐      ┌──────────────┐
               │ Externalize  │      │    Scale     │
               │ state, then  │      │  Vertically  │
               │  scale out   │      │    first     │
               └──────────────┘      └──────┬───────┘
                                            │
                                            ▼
                                     ┌──────────────┐
                                     │ Hit hardware │
                                     │   limits?    │
                                     └──────┬───────┘
                                            │
                                      ┌─────┴─────┐
                                      │ Yes       │ No
                                      ▼           ▼
                               ┌────────────┐ ┌──────────┐
                               │  Redesign  │ │ Continue │
                               │    for     │ │ vertical │
                               │ horizontal │ └──────────┘
                               └────────────┘
```

## Key Takeaways
- **Start simple with vertical scaling:** For new systems, vertical scaling is simpler and often sufficient initially.
- **Plan for horizontal scaling:** Design stateless services from the start so you can scale out easily later.
- **Externalize state:** Move session data, caches, and shared state to external stores (Redis, databases) to enable horizontal scaling.
- **Use both strategies:** Production systems typically scale databases and caches vertically and application servers horizontally.
- **Consider operational complexity:** Horizontal scaling adds load balancing, service discovery, and distributed coordination.
- **Monitor cost curves:** Vertical scaling costs grow steeply at the high end; switch to horizontal when the math stops working.
- **Fault tolerance:** Horizontal scaling provides natural redundancy; a single vertically scaled machine is a single point of failure.