The Chubby Lock Service for Loosely-Coupled Distributed Systems
Paper Overview
- Title: The Chubby Lock Service for Loosely-Coupled Distributed Systems
- Author: Mike Burrows (Google)
- Published: OSDI 2006
- Context: Google needed reliable distributed locking and configuration storage
TL;DR
Chubby is a distributed lock service that provides:
- Coarse-grained locking for distributed coordination
- Reliable storage for small configuration files
- Primary election support for leader selection
- Strong consistency through Paxos replication
Problem Statement
Why a Lock Service?
┌─────────────────────────────────────────────────────────────────┐
│ Coordination Challenges │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Use Cases at Google: │
│ │
│ 1. GFS Master Election │
│ ┌─────────────────────────────────────────────┐ │
│ │ Multiple candidates, only one master │ │
│ │ All clients need to find current master │ │
│ └─────────────────────────────────────────────┘ │
│ │
│ 2. Bigtable Tablet Server Registration │
│ ┌─────────────────────────────────────────────┐ │
│ │ Servers register themselves │ │
│ │ Master tracks live servers │ │
│ └─────────────────────────────────────────────┘ │
│ │
│ 3. Configuration Storage │
│ ┌─────────────────────────────────────────────┐ │
│ │ ACLs, metadata, small config files │ │
│ │ Needs strong consistency │ │
│ └─────────────────────────────────────────────┘ │
│ │
│ Why Lock Service Instead of Paxos Library? │
│ ┌─────────────────────────────────────────────────┐ │
│ │ - Easier to use than raw consensus protocol │ │
│ │ - Fewer servers needed (5 Chubby replicas │ │
│ │ vs. 5+ replicas per service) │ │
│ │ - Provides name service functionality │ │
│ │ - Familiar lock/file interface │ │
│ └─────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘

Architecture
System Overview
┌─────────────────────────────────────────────────────────────────┐
│ Chubby Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Client Applications │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ │ │
│ │ │ GFS │ │BigTable│ │MapReduc│ │ Borg │ │ ... │ │ │
│ │ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ │ │
│ └──────┼──────────┼──────────┼──────────┼──────────┼─────┘ │
│ │ │ │ │ │ │
│ └──────────┴──────────┴──────────┴──────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Chubby Cell │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Server │ │ Server │ │ Server │ │ │
│ │ │ (Master) │ │ (Replica)│ │ (Replica)│ │ │
│ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │
│ │ │ │ │ │ │
│ │ └─────────────┴─────────────┘ │ │
│ │ │ │ │
│ │ Paxos │ │
│ │ Consensus │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Server │ │ Server │ │ │
│ │ │ (Replica)│ │ (Replica)│ (5 replicas typically) │ │
│ │ └──────────┘ └──────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ Chubby Cell = 5 servers in different failure domains │
│ One master elected via Paxos, handles all reads/writes │
│ │
└─────────────────────────────────────────────────────────────────┘

Client Library
```python
class ChubbyClient:
"""Chubby client library."""
def __init__(self, cell_name: str):
self.cell = cell_name
self.master = None
self.session = None
self.handles = {}
self.cache = {}
def find_master(self):
"""
Locate the current master.
1. Contact any replica
2. Get master location from DNS
3. Cache for future use
"""
# DNS lookup for cell
replicas = dns_lookup(f"{self.cell}.chubby.google.com")
# Ask any replica for master
for replica in replicas:
try:
response = replica.get_master()
if response.is_master:
self.master = replica
else:
self.master = response.master_location
return
            except Exception:
continue
raise NoMasterError()
def open(self, path: str, mode: str) -> Handle:
"""
Open a Chubby file/directory.
Returns a handle for subsequent operations.
"""
if not self.session:
self._create_session()
request = OpenRequest(
path=path,
mode=mode,
events=self._default_events()
)
response = self.master.open(request)
handle = Handle(
id=response.handle_id,
path=path,
mode=mode
)
self.handles[path] = handle
return handle
def acquire_lock(self, handle: Handle,
lock_type: str = 'exclusive') -> bool:
"""
Acquire lock on file.
Lock types:
- exclusive: Only one holder
- shared: Multiple readers
"""
request = AcquireLockRequest(
handle_id=handle.id,
lock_type=lock_type
)
response = self.master.acquire_lock(request)
return response.acquired
def release_lock(self, handle: Handle):
"""Release lock on file."""
request = ReleaseLockRequest(handle_id=handle.id)
        self.master.release_lock(request)
```
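
To make the flow concrete, here is a hedged usage sketch of the client class above; the cell name and path are made up, and this is the toy API from this summary, not Chubby's real RPC surface.

```python
client = ChubbyClient(cell_name="foo")
client.find_master()  # DNS -> replicas -> master location

# Open (or create) a file, then try to take the writer lock on it.
handle = client.open("/ls/foo/myservice/leader", mode="WRITE|CREATE")
if client.acquire_lock(handle, lock_type="exclusive"):
    # Locks are coarse-grained: held for hours or days, not milliseconds.
    ...  # act as leader
    client.release_lock(handle)
```

File System Interface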
Namespace
┌─────────────────────────────────────────────────────────────────┐
│ Chubby Namespace │
├─────────────────────────────────────────────────────────────────┤
│ │
│ /ls/cell/... │
│ │
│ Example paths: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ /ls/foo/gfs/master - GFS master location │ │
│ │ /ls/foo/bigtable/root - Bigtable root tablet location │ │
│ │ /ls/foo/mapreduce/job1 - MapReduce job lock │ │
│ │ /ls/foo/acl/users - ACL configuration │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Node Types: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Permanent: Explicitly deleted │ │
│ │ Ephemeral: Deleted when session ends │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Node Metadata: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Instance number (version/generation) │ │
│ │ - Content generation number │ │
│ │ - Lock generation number │ │
│ │ - ACL generation number │ │
│ │ - ACL names │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
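
The metadata fields listed above map naturally onto a small record type; a minimal sketch (field names are assumptions for illustration, not Chubby's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class NodeMetadata:
    instance_number: int          # distinguishes reuses of the same name
    content_generation: int = 0   # bumped on every content write
    lock_generation: int = 0      # bumped when the lock goes from free to held
    acl_generation: int = 0       # bumped when the node's ACLs change
    acl_names: list = field(default_factory=list)
    ephemeral: bool = False       # node vanishes when its session ends
```

File Operations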
```python
class ChubbyServer:
"""Chubby server-side operations."""
    def __init__(self, paxos):
        self.namespace = {}  # path -> Node
        self.locks = {}      # path -> LockState
        self.sessions = {}   # session_id -> Session
        self.paxos = paxos   # replicated log; see ChubbyPaxos below
def open(self, request) -> OpenResponse:
"""
Open or create a node.
Modes:
- READ: Read-only access
- WRITE: Read-write access
- CREATE: Create if not exists
- EXCLUSIVE_CREATE: Fail if exists
"""
path = request.path
# Check if node exists
if path not in self.namespace:
if 'CREATE' not in request.mode:
raise NodeNotFoundError()
self._create_node(path, request)
elif 'EXCLUSIVE_CREATE' in request.mode:
raise NodeExistsError()
# Create handle
handle = self._create_handle(path, request)
# Register event subscriptions
for event_type in request.events:
self._subscribe_event(handle, event_type)
return OpenResponse(handle_id=handle.id)
def get_contents(self, handle_id: int) -> bytes:
"""Read file contents."""
handle = self._get_handle(handle_id)
node = self.namespace[handle.path]
return node.contents
def set_contents(self, handle_id: int, contents: bytes):
"""
Write file contents.
Must hold lock for write.
"""
handle = self._get_handle(handle_id)
# Check lock
if not self._has_write_lock(handle):
raise NoLockError()
node = self.namespace[handle.path]
node.contents = contents
node.content_generation += 1
# Notify subscribers
self._send_event(handle.path, 'CONTENT_MODIFIED')
def _create_node(self, path: str, request):
"""Create new node."""
node = Node(
path=path,
ephemeral=request.ephemeral,
instance_number=self._next_instance_number(),
contents=b'',
acls=request.acls or self._default_acls()
)
self.namespace[path] = node
# Log to Paxos
        self.paxos.propose(CreateNode(node))
```

Locking
Lock Semantics
┌─────────────────────────────────────────────────────────────────┐
│ Chubby Locking │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Coarse-Grained Locks: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Held for hours or days, not seconds │ │
│ │ Used for leader election, not fine-grained sync │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Lock Types: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ EXCLUSIVE: Only one holder (writer lock) │ │
│ │ SHARED: Multiple holders (reader lock) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Advisory Locks: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Locks don't prevent file access │ │
│ │ Clients must check lock before accessing │ │
│ │ Simpler, sufficient for Google's use cases │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Sequencer (Lock Token): │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Opaque byte-string describing lock state: │ │
│ │ - Lock name │ │
│ │ - Lock mode (shared/exclusive) │ │
│ │ - Lock generation number │ │
│ │ │ │
│ │ Used to verify lock holder hasn't changed │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
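
The LockManager below treats the sequencer as an opaque byte string; since the paper only says it carries the lock name, mode, and generation, here is one assumed encoding so the code that follows is self-contained:

```python
import json
from dataclasses import dataclass

@dataclass
class Sequencer:
    """Lock token: name + mode + generation (this encoding is an assumption)."""
    path: str
    mode: str
    generation: int

    def encode(self) -> bytes:
        return json.dumps(self.__dict__).encode()

    @staticmethod
    def decode(data: bytes) -> "Sequencer":
        return Sequencer(**json.loads(data))
```

Lock Implementation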
```python
class LockManager:
"""Chubby lock management."""
def __init__(self):
self.locks = {} # path -> Lock
def acquire(self, path: str, session_id: str,
lock_type: str) -> tuple:
"""
Acquire lock.
Returns (success, sequencer).
"""
if path not in self.locks:
self.locks[path] = Lock(path)
lock = self.locks[path]
if lock_type == 'exclusive':
if lock.exclusive_holder is not None:
return (False, None)
if lock.shared_holders:
return (False, None)
lock.exclusive_holder = session_id
lock.generation += 1
sequencer = self._make_sequencer(lock, 'exclusive')
return (True, sequencer)
elif lock_type == 'shared':
if lock.exclusive_holder is not None:
return (False, None)
lock.shared_holders.add(session_id)
sequencer = self._make_sequencer(lock, 'shared')
return (True, sequencer)
def release(self, path: str, session_id: str):
"""Release lock."""
if path not in self.locks:
return
lock = self.locks[path]
if lock.exclusive_holder == session_id:
lock.exclusive_holder = None
else:
lock.shared_holders.discard(session_id)
def _make_sequencer(self, lock, mode: str) -> bytes:
"""
Create sequencer (lock token).
Allows servers to verify lock is still valid.
"""
return Sequencer(
path=lock.path,
mode=mode,
generation=lock.generation
).encode()
def verify_sequencer(self, sequencer: bytes,
path: str) -> bool:
"""
Verify sequencer is still valid.
Called by resource servers to check lock.
"""
seq = Sequencer.decode(sequencer)
if seq.path != path:
return False
lock = self.locks.get(path)
if not lock:
return False
return seq.generation == lock.generation
class Lock:
"""Lock state."""
def __init__(self, path: str):
self.path = path
self.generation = 0
self.exclusive_holder = None
        self.shared_holders = set()
```
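
End to end, the sequencer pattern looks like this: the lock holder hands its token to downstream servers, which verify it before acting. A runnable sketch against the classes above (the session IDs are invented):

```python
lm = LockManager()

# Session s1 takes the exclusive lock and receives a sequencer.
ok, seq = lm.acquire("/ls/foo/db/primary", session_id="s1", lock_type="exclusive")
assert ok and lm.verify_sequencer(seq, "/ls/foo/db/primary")

# The lock changes hands; the old token now fails verification.
lm.release("/ls/foo/db/primary", session_id="s1")
ok2, _ = lm.acquire("/ls/foo/db/primary", session_id="s2", lock_type="exclusive")
assert ok2 and not lm.verify_sequencer(seq, "/ls/foo/db/primary")  # stale generation
```

Sessions and KeepAlives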
Session Management
┌─────────────────────────────────────────────────────────────────┐
│ Chubby Sessions │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Session Lifecycle: │
│ │
│ Client Master │
│ │ │ │
│ │───KeepAlive──────>│ │
│ │ │ Extends lease │
│ │<──────────────────│ │
│ │ lease_timeout │ │
│ │ │ │
│ │───KeepAlive──────>│ │
│ │ │ Extends lease │
│ │<──────────────────│ │
│ │ │ │
│ │ ... │ │
│ │ │ │
│ │ (network │ │
│ │ partition) │ │
│ │ │ │
│ │ │ Master waits local_lease │
│ │ │ before expiring session │
│ │ │ │
│ │ │ Session expired, │
│ │ │ ephemeral files deleted, │
│ │ │ locks released │
│ │
│ Lease Timing: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Default session lease: 12 seconds │ │
│ │ Client sends KeepAlive well before expiry │ │
│ │ Master extends by another lease period │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
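
The client half of this exchange is a single long-polling loop; a hedged sketch (the method and field names are assumptions consistent with the classes in this summary):

```python
import time

def keepalive_loop(client, on_event):
    """Send KeepAlives, extend the local lease, and drain events in order."""
    while True:
        sent_at = time.time()
        response = client.master.keep_alive(client.session.id)  # long poll
        if not response.valid:
            return  # session lost: caller enters jeopardy / grace period
        # Measure the lease from the send time, so network delay can only
        # make the client's view of the lease more conservative.
        client.lease_deadline = sent_at + response.lease_timeout
        for event in response.events:  # delivered in order
            on_event(event)
```

KeepAlive Protocol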
```python
import time
import uuid

class SessionManager:
"""Manage client sessions."""
    def __init__(self, lease_duration: float = 12.0,
                 lock_manager=None, namespace=None):
        self.sessions = {}
        self.lease_duration = lease_duration
        self.lock_manager = lock_manager  # needed on expiry to release locks
        self.namespace = namespace        # needed on expiry to drop ephemerals
def create_session(self, client_id: str) -> Session:
"""Create new session."""
session = Session(
id=uuid.uuid4(),
client_id=client_id,
lease_expiry=time.time() + self.lease_duration,
handles=[],
            ephemeral_nodes=[],
            pending_events=[]
)
self.sessions[session.id] = session
return session
def keep_alive(self, session_id: str) -> KeepAliveResponse:
"""
Process KeepAlive request.
1. Extend lease
2. Return pending events
3. Block if no events (long polling)
"""
session = self.sessions.get(session_id)
if not session:
return KeepAliveResponse(valid=False)
# Extend lease
session.lease_expiry = time.time() + self.lease_duration
# Get pending events
events = session.pending_events
session.pending_events = []
return KeepAliveResponse(
valid=True,
lease_timeout=self.lease_duration,
events=events
)
def expire_sessions(self):
"""
Background task to expire sessions.
Called periodically.
"""
now = time.time()
for session_id, session in list(self.sessions.items()):
if session.lease_expiry < now:
self._expire_session(session)
def _expire_session(self, session: Session):
"""
Handle session expiry.
1. Release all locks
2. Delete ephemeral nodes
3. Close all handles
"""
        # Release all locks held by this session
        self.lock_manager.release_for_session(session.id)
# Delete ephemeral nodes
for node_path in session.ephemeral_nodes:
self.namespace.delete(node_path)
# Remove session
del self.sessions[session.id]
class GracePeriod:
"""
Handle master failover without breaking sessions.
When master changes, sessions get grace period
to reconnect to new master.
"""
def __init__(self, duration: float = 45.0):
self.duration = duration
def client_side_handling(self, client):
"""
Client behavior during failover.
1. Jeopardy: Can't reach master, may be down
2. Keep trying until grace period expires
3. If reconnect succeeds, continue normally
4. If grace period expires, invalidate session
"""
client.in_jeopardy = True
grace_end = time.time() + self.duration
while time.time() < grace_end:
try:
client.find_master()
# Reconnect session
response = client.master.reconnect_session(
client.session.id
)
if response.valid:
client.in_jeopardy = False
return True
            except Exception:
time.sleep(1)
# Grace period expired
client.invalidate_session()
        return False
```

Events and Caching
Event System
┌─────────────────────────────────────────────────────────────────┐
│ Chubby Events │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Event Types: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ FILE_CONTENTS_MODIFIED - Content changed │ │
│ │ CHILD_NODE_ADDED - New child in directory │ │
│ │ CHILD_NODE_REMOVED - Child deleted │ │
│ │ CHILD_NODE_MODIFIED - Child metadata changed │ │
│ │ MASTER_FAILOVER - New master elected │ │
│ │ HANDLE_INVALID - Handle expired/revoked │ │
│ │ LOCK_ACQUIRED - Lock obtained │ │
│ │ CONFLICTING_LOCK - Lock contention │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Delivery: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Events delivered via KeepAlive response │ │
│ │ Guaranteed delivery (stored in session) │ │
│ │ Events delivered in order │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘

Client-Side Caching
```python
class ChubbyCache:
"""Client-side cache with invalidation."""
def __init__(self, client):
self.client = client
self.cache = {} # path -> CachedData
self.subscriptions = set()
def get(self, path: str) -> bytes:
"""
Get file contents with caching.
Cache is consistent via events.
"""
if path in self.cache:
cached = self.cache[path]
if cached.valid:
return cached.contents
# Cache miss or invalidated
contents = self.client.get_contents(path)
# Subscribe to changes
if path not in self.subscriptions:
self._subscribe(path)
self.cache[path] = CachedData(
contents=contents,
valid=True
)
return contents
def _subscribe(self, path: str):
"""Subscribe to content change events."""
self.subscriptions.add(path)
# Events subscribed when opening handle
def handle_event(self, event):
"""
Handle cache invalidation event.
Called when KeepAlive returns events.
"""
if event.type == 'FILE_CONTENTS_MODIFIED':
path = event.path
if path in self.cache:
self.cache[path].valid = False
elif event.type == 'HANDLE_INVALID':
# Handle revoked, clear cache
path = event.path
if path in self.cache:
del self.cache[path]
def invalidate_all(self):
"""
Invalidate entire cache.
Called on master failover.
"""
for cached in self.cache.values():
cached.valid = False
class CacheConsistency:
"""
Cache consistency via invalidation.
Key insight: Chubby guarantees that clients
see a consistent view (may be stale, but
never inconsistent).
"""
def write_with_invalidation(self, server, path: str,
contents: bytes):
"""
Server-side write with cache invalidation.
1. Block write until all caches invalidated
2. Then apply write
"""
# Get all sessions with cached handles
sessions = server.get_caching_sessions(path)
# Send invalidation to all
for session in sessions:
session.pending_events.append(
Event('FILE_CONTENTS_MODIFIED', path)
)
# Wait for acknowledgment (via next KeepAlive)
for session in sessions:
server.wait_for_keepalive(session)
# Now safe to write
        server.namespace[path].contents = contents
```
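
Tying events to the cache: an invalidation drained from a KeepAlive flips the cached entry to invalid, so the next read re-fetches. A self-contained demo with tiny stand-ins for the Event and CachedData helpers assumed above:

```python
from dataclasses import dataclass

@dataclass
class Event:
    type: str
    path: str

@dataclass
class CachedData:
    contents: bytes
    valid: bool

class FakeClient:
    """Stand-in that returns whatever the 'server' currently stores."""
    def __init__(self):
        self.files = {"/ls/foo/cfg": b"v1"}
    def get_contents(self, path):
        return self.files[path]

client = FakeClient()
cache = ChubbyCache(client)
assert cache.get("/ls/foo/cfg") == b"v1"   # miss: fetch and subscribe
assert cache.get("/ls/foo/cfg") == b"v1"   # hit: served locally

# A write happens elsewhere; the master delivers an invalidation event
# via KeepAlive (and, in real Chubby, blocks the write until it is acked).
client.files["/ls/foo/cfg"] = b"v2"
cache.handle_event(Event("FILE_CONTENTS_MODIFIED", "/ls/foo/cfg"))
assert cache.get("/ls/foo/cfg") == b"v2"   # entry invalid: re-fetch
```

Master Election Pattern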
Primary Election Using Chubby
```python
class PrimaryElection:
"""
Use Chubby for leader election.
Pattern used by GFS, Bigtable, etc.
"""
def __init__(self, chubby: ChubbyClient, service_name: str):
self.chubby = chubby
self.lock_path = f"/ls/cell/{service_name}/primary"
self.is_primary = False
def run_election(self) -> bool:
"""
Attempt to become primary.
1. Open lock file
2. Attempt to acquire lock
3. If successful, write our address
"""
# Open lock file
handle = self.chubby.open(
self.lock_path,
mode='WRITE|CREATE'
)
# Try to acquire exclusive lock
acquired = self.chubby.acquire_lock(
handle,
lock_type='exclusive'
)
if acquired:
self.is_primary = True
# Write our address so others can find us
self.chubby.set_contents(
handle,
self._my_address().encode()
)
return True
else:
return False
def find_primary(self) -> str:
"""
Find current primary.
Read lock file contents.
"""
handle = self.chubby.open(self.lock_path, mode='READ')
contents = self.chubby.get_contents(handle)
return contents.decode()
def watch_primary(self, callback):
"""
Watch for primary changes.
Subscribe to events on lock file.
"""
def on_event(event):
if event.type == 'FILE_CONTENTS_MODIFIED':
new_primary = self.find_primary()
callback(new_primary)
handle = self.chubby.open(
self.lock_path,
mode='READ',
events=['FILE_CONTENTS_MODIFIED']
)
self.chubby.register_callback(handle, on_event)
class ServiceRegistry:
"""
Use Chubby as service registry.
Pattern used by Bigtable tablet servers.
"""
def __init__(self, chubby: ChubbyClient, service_name: str):
self.chubby = chubby
self.service_dir = f"/ls/cell/{service_name}/servers"
def register(self, server_address: str):
"""
Register server in Chubby.
Create ephemeral file with server address.
"""
path = f"{self.service_dir}/{server_address}"
handle = self.chubby.open(
path,
mode='WRITE|CREATE|EPHEMERAL'
)
# Acquire lock to indicate "alive"
self.chubby.acquire_lock(handle, 'exclusive')
# Keep session alive to maintain registration
# Session death = server death
def get_servers(self) -> list:
"""Get list of live servers."""
handle = self.chubby.open(
self.service_dir,
mode='READ'
)
children = self.chubby.get_children(handle)
return children
def watch_servers(self, callback):
"""Watch for server changes."""
handle = self.chubby.open(
self.service_dir,
mode='READ',
events=['CHILD_NODE_ADDED', 'CHILD_NODE_REMOVED']
)
        self.chubby.register_callback(handle, callback)
```
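
A hedged sketch of how a service replica might wire this in (the callback structure here is an assumption, not a pattern prescribed by the paper):

```python
def run_replica(chubby, service_name, start_primary_work, on_new_primary):
    """Try to become primary; otherwise follow whoever is."""
    election = PrimaryElection(chubby, service_name)
    if election.run_election():
        start_primary_work()                      # we hold the exclusive lock
    else:
        on_new_primary(election.find_primary())   # learn the current primary
        election.watch_primary(on_new_primary)    # stay current via events
```

Paxos Replication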
Consensus Layer
┌─────────────────────────────────────────────────────────────────┐
│ Chubby Paxos Layer │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Multi-Paxos for Log Replication: │
│ │
│ Master Replica 1 Replica 2 │
│ │ │ │ │
│ │───Prepare(n)───────>│ │ │
│ │───Prepare(n)────────┼───────────────────>│ │
│ │ │ │ │
│ │<──Promise(n)────────│ │ │
│ │<──Promise(n)────────┼────────────────────│ │
│ │ │ │ │
│ │───Accept(n,v)──────>│ │ │
│ │───Accept(n,v)───────┼───────────────────>│ │
│ │ │ │ │
│ │<──Accepted(n)───────│ │ │
│ │<──Accepted(n)───────┼────────────────────│ │
│ │ │ │ │
│ │ Committed! │ │ │
│ │
│ Optimizations: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Stable master skips Prepare phase │ │
│ │ - Batching of operations │ │
│ │ - Pipelining of log entries │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘

Replication Implementation
```python
import time

class ChubbyPaxos:
"""Chubby's Paxos replication layer."""
def __init__(self, replicas: list, my_id: int):
self.replicas = replicas
self.my_id = my_id
self.log = []
self.commit_index = 0
self.is_master = False
self.master_lease = None
def propose(self, operation) -> bool:
"""
Propose operation through Paxos.
Only master can propose.
"""
if not self.is_master:
raise NotMasterError()
# Create log entry
entry = LogEntry(
index=len(self.log),
operation=operation
)
# Multi-Paxos: Master already has promise
# Skip Prepare phase
# Accept phase
accept_count = 1 # Self
for replica in self.replicas:
if replica.id == self.my_id:
continue
try:
response = replica.accept(entry)
if response.accepted:
accept_count += 1
            except Exception:
pass
# Check majority
if accept_count > len(self.replicas) // 2:
self.log.append(entry)
self.commit_index = entry.index
# Notify replicas of commit
for replica in self.replicas:
replica.commit(entry.index)
return True
return False
def master_election(self):
"""
Elect new master via Paxos.
Highest-numbered replica usually wins.
"""
# Prepare phase with our proposal number
proposal_num = self._next_proposal_number()
promises = 0
for replica in self.replicas:
try:
response = replica.prepare(proposal_num, self.my_id)
if response.promised:
promises += 1
            except Exception:
pass
if promises > len(self.replicas) // 2:
self.is_master = True
self.master_lease = time.time() + 10 # 10s lease
# Become master
self._initialize_master()
def _initialize_master(self):
"""
Initialize as new master.
1. Recover any uncommitted entries
2. Start serving requests
3. Maintain master lease
"""
# Catch up log from replicas
for replica in self.replicas:
entries = replica.get_log_suffix(self.commit_index)
for entry in entries:
if entry.index >= len(self.log):
self.log.append(entry)
# Re-commit any in-progress entries
for i in range(self.commit_index + 1, len(self.log)):
            self.propose(self.log[i].operation)
```

Scaling Considerations
Proxy Servers
┌─────────────────────────────────────────────────────────────────┐
│ Scaling Chubby │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Problem: Thousands of clients hitting 5 servers │
│ │
│ Solution: Chubby Proxies │
│ │
│ Clients Proxies Chubby Cell │
│ ┌─────┐ ┌───────┐ │
│ │ C │─────────>│ │ │
│ ├─────┤ │ │ ┌───────────────────┐ │
│ │ C │─────────>│ Proxy │─────────>│ │ │
│ ├─────┤ │ │ │ Chubby Master │ │
│ │ C │─────────>│ │ │ │ │
│ └─────┘ └───────┘ └───────────────────┘ │
│ │
│ ┌─────┐ ┌───────┐ │
│ │ C │─────────>│ │ │
│ ├─────┤ │ Proxy │─────────────────────────────────┘ │
│ │ C │─────────>│ │ │
│ └─────┘ └───────┘ │
│ │
│ Proxy Functionality: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Aggregate KeepAlives from many clients │ │
│ │ - Cache commonly accessed data │ │
│ │ - Handle read requests locally │ │
│ │ - Forward writes to master │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
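
The main effect of a proxy is KeepAlive fan-in: many client leases ride on one upstream session. A minimal sketch of that aggregation (all names here are assumptions; the paper describes proxies only at this level of detail):

```python
class ChubbyProxy:
    """Aggregates N client KeepAlives into one upstream session (sketch)."""
    def __init__(self, master):
        self.master = master
        self.upstream = master.create_session("proxy")
        self.client_events = {}            # client session id -> pending events

    def client_keepalive(self, session_id):
        # Answered from proxy state; the master never sees this request.
        events = self.client_events.get(session_id, [])
        self.client_events[session_id] = []
        return events

    def upstream_loop_once(self):
        # One KeepAlive to the master stands in for all local clients.
        response = self.master.keep_alive(self.upstream.id)
        for event in response.events:
            for pending in self.client_events.values():
                pending.append(event)      # fan out to local sessions
```

Partitioning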
```python
class ChubbyPartitioning:
"""
Partitioning strategies for large deployments.
(Mentioned in paper but not deeply explored)
"""
def partition_by_path_prefix(self, path: str) -> str:
"""
Partition namespace by prefix.
/ls/cell-a/* -> Chubby Cell A
/ls/cell-b/* -> Chubby Cell B
"""
prefix = path.split('/')[2] # cell name
return self.cell_mapping.get(prefix)
def multiple_cells(self):
"""
Deploy multiple independent Chubby cells.
Each cell:
- Serves different namespace prefix
- Independent Paxos group
- Can fail independently
"""
        pass
```

Key Results
Production Usage
┌─────────────────────────────────────────────────────────────────┐
│ Chubby at Google │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Scale (circa 2006): │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ - Multiple Chubby cells across data centers │ │
│ │ - Tens of thousands of clients per cell │ │
│ │ - Hundreds of thousands of files │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Request Rates: │
│ ┌─────────────────────────────────────────────────────────┐ │
│  │ - 90,000+ clients talking directly to a single master   │    │
│  │ - KeepAlives dominate traffic (~93% of all RPCs)        │    │
│  │ - Heavily read-dominated (caching helps)                │    │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Outage Impact: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ "A short Chubby outage can affect a large number │ │
│ │ of servers across Google" │ │
│ │ │ │
│ │ Chubby is critical infrastructure! │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘

Influence and Legacy
Impact on Industry
┌──────────────────────────────────────────────────────────────┐
│ Chubby's Legacy │
├──────────────────────────────────────────────────────────────┤
│ │
│ Inspired: │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Apache ZooKeeper - Open source, widely used │ │
│ │ etcd - Kubernetes configuration store │ │
│ │ Consul - HashiCorp service mesh │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Key Contributions: │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ - Lock service > consensus library │ │
│ │ - Coarse-grained locks pattern │ │
│ │ - Session/lease model │ │
│ │ - Event-based cache invalidation │ │
│ │ - Primary election pattern │ │
│ │ - Service discovery pattern │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────┘

Key Takeaways
- Lock service over library: Easier to use than raw Paxos
- Coarse-grained locks: Designed for leader election, not fine-grained sync
- Sessions with leases: Clean failure detection semantics
- Advisory locks: Simpler, sufficient for most use cases
- Cache with invalidation: Strong consistency with good performance
- Sequencers: Verify lock ownership after acquisition
- Critical infrastructure: Availability is paramount