Publish-Subscribe (Pub/Sub)
TL;DR
Pub/Sub decouples message producers from consumers through topics. Publishers send messages to a topic without knowing who the subscribers are; each subscriber receives a copy of every message on the topics it subscribes to. This enables event-driven architectures, real-time updates, and loose coupling. Key considerations: fan-out cost, ordering, filtering, and backpressure.
Core Concepts
Architecture
        ┌─────────────────┐
        │      Topic      │
        │  "user.events"  │
        └────────┬────────┘
                 │
     ┌───────────┼───────────┐
     │           │           │
     ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│  Sub A  │ │  Sub B  │ │  Sub C  │
│  Email  │ │Analytics│ │  Audit  │
│ Service │ │ Service │ │ Service │
└─────────┘ └─────────┘ └─────────┘
Publisher doesn't know about subscribers
Each subscriber gets a copy of every message
vs Point-to-Point
Point-to-Point (Queue):
Message ──► [Queue] ──► ONE consumer
Work distribution
Pub/Sub (Topic):
Message ──► [Topic] ──► ALL subscribers
Event broadcasting
Message Flow
1. Publisher sends event to topic
2. Broker stores message
3. Broker fans out to all subscribers
4. Each subscriber processes independently
5. Subscribers acknowledge individually
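A toy in-memory sketch of this flow; the `Topic` class and handler names are illustrative, not any real broker's API:
```python
from collections import defaultdict

class Topic:
    """Toy in-memory topic: fan out every message to every subscriber."""

    def __init__(self, name):
        self.name = name
        self.subscribers = {}                  # subscriber name -> callback
        self.pending_acks = defaultdict(set)   # message id -> subscribers yet to ack

    def subscribe(self, name, callback):
        self.subscribers[name] = callback

    def publish(self, msg_id, payload):
        # Steps 2-4: "store" the message, then fan out a copy to each subscriber
        self.pending_acks[msg_id] = set(self.subscribers)
        for name, callback in self.subscribers.items():
            callback(msg_id, payload)          # each processes independently

    def ack(self, msg_id, subscriber_name):
        # Step 5: each subscriber acknowledges individually
        self.pending_acks[msg_id].discard(subscriber_name)

topic = Topic("user.events")
topic.subscribe("email", lambda mid, msg: print("email:", msg))
topic.subscribe("analytics", lambda mid, msg: print("analytics:", msg))
topic.publish("msg-1", {"type": "signup", "user_id": 123})
topic.ack("msg-1", "email")
topic.ack("msg-1", "analytics")
```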
Subscription Models
Push vs Pull
Push (Broker pushes to subscriber):
Broker ──push──► Subscriber endpoint
Pros:
- Low latency
- Simple subscriber
Cons:
- Subscriber must handle load
- Need webhook endpoint
Example: Google Cloud Pub/Sub push subscriptions
Pull (Subscriber pulls from broker):
Subscriber ──pull──► Broker
Pros:
- Subscriber controls rate
- Works behind firewalls
Cons:
- Polling overhead
- Higher latency possible
Example: Kafka consumer groups
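A minimal pull-consumer sketch using the kafka-python client; the topic, group, and handler names are illustrative. Consumers sharing a group_id form a consumer group and split the topic's partitions between them.
```python
import json
from kafka import KafkaConsumer  # assumes the kafka-python package

# Pull model: the consumer polls the broker at its own pace.
consumer = KafkaConsumer(
    "user-events",                               # topic name (illustrative)
    bootstrap_servers="localhost:9092",
    group_id="email-service",                    # consumer group
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    enable_auto_commit=False,                    # only ack after processing
)

for message in consumer:
    handle(message.value)    # hypothetical processing function
    consumer.commit()        # acknowledge by committing the offset
```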
Durable vs Ephemeral
Durable Subscription:
Subscriber disconnects at T=0
Messages arrive at T=1, T=2, T=3
Subscriber reconnects at T=4
Gets all messages (T=1, T=2, T=3)
Broker stored them during disconnect
Ephemeral Subscription:
Only receives messages while connected
Missed messages during disconnect are lost
Use for: Real-time displays, live updates
Topic Design
Hierarchical Topics
events.user.created
events.user.updated
events.user.deleted
events.order.placed
events.order.shipped
Wildcards:
events.user.* → All user events
events.*.created → All creation events
events.# → Everything under events
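Wildcard syntax is broker-specific (the `*`/`#` style above follows AMQP topic exchanges). A rough sketch of how such a pattern can be matched against a topic name:
```python
def topic_matches(pattern: str, topic: str) -> bool:
    """AMQP-style match: '*' = exactly one segment, '#' = zero or more segments."""
    p_parts, t_parts = pattern.split("."), topic.split(".")

    def match(pi, ti):
        if pi == len(p_parts):
            return ti == len(t_parts)
        if p_parts[pi] == "#":
            # '#' can absorb any number of remaining segments
            return any(match(pi + 1, tj) for tj in range(ti, len(t_parts) + 1))
        if ti == len(t_parts):
            return False
        if p_parts[pi] in ("*", t_parts[ti]):
            return match(pi + 1, ti + 1)
        return False

    return match(0, 0)

assert topic_matches("events.user.*", "events.user.created")
assert topic_matches("events.#", "events.order.shipped")
assert not topic_matches("events.*.created", "events.order.shipped")
```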
Topic Naming Conventions
Pattern: <domain>.<entity>.<action>
Examples:
payment.transaction.completed
inventory.stock.low
user.profile.updated
Benefits:
- Clear ownership
- Easy filtering
- Logical grouping
Single vs Multiple Topics
Single topic (events):
All events in one place
Consumers filter by type
Simpler infrastructure
Multiple topics (events.user, events.order):
Natural partitioning
Subscribe to relevant topics only
Better access control
Recommendation: Start with fewer topics, split when needed
Fan-Out Patterns
Simple Fan-Out
One message → N copies
Publisher ──► [Topic] ──┬──► Sub 1 (email)
                        ├──► Sub 2 (push)
                        └──► Sub 3 (analytics)
Each gets same message
Process independently
Fan-Out with Filtering
Not all subscribers want all messages
Publisher ──► [Topic: user.events]
                  │
    Filter: type=signup   ──► Email Service
    Filter: type=purchase ──► Analytics
    Filter: country=US    ──► US Team
Implementation
```python
from google.cloud import pubsub_v1

# Google Cloud Pub/Sub: subscription with a server-side filter
subscriber = pubsub_v1.SubscriberClient()
subscriber.create_subscription(
    request={
        "name": "projects/my-project/subscriptions/email-signups",
        "topic": "projects/my-project/topics/user-events",
        "filter": 'attributes.type = "signup"',
    }
)

# Kafka has no broker-side filtering: the consumer reads every message
# and filters in code (values assumed to be JSON-deserialized)
for message in consumer:
    if message.value["type"] == "signup":
        process_signup(message)
```
Ordering Guarantees
No Ordering (Default)
Published: A, B, C
Subscriber 1 sees: B, A, C
Subscriber 2 sees: A, C, B
No guarantee between subscribers or even for one subscriber
Per-Publisher Ordering
Publisher 1: A1, B1, C1 → Subscriber sees A1, B1, C1
Publisher 2: A2, B2, C2 → Subscriber sees A2, B2, C2
But A1 and A2 may interleave arbitrarily
Partition-Based Ordering
Messages with same key → same partition → ordered
user_123 events: login, view, purchase
All go to partition 3
Subscriber sees in order
Different users may interleave
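With a Kafka-style producer, for example, using the user ID as the message key routes all of that user's events to one partition; client library and topic name here are illustrative.
```python
import json
from kafka import KafkaProducer  # assumes the kafka-python package

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Same key => same partition => events for user_123 stay in order.
for action in ("login", "view", "purchase"):
    producer.send("user-events", key="user_123", value={"action": action})

producer.flush()
```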
Total Ordering
All messages in strict global order
Very expensive (single bottleneck)
Rarely needed
Implementations
Apache Kafka
Topics → Partitions → Consumer Groups
Producer ──► Topic ──► Partition 0 ──► Consumer Group A
                   ──► Partition 1    (each partition to one consumer)
                   ──► Partition 2
Features:
- Log-based (replay possible)
- Consumer groups for scaling
- Ordered within partition
- High throughput
Google Cloud Pub/Sub
Topic → Subscriptions
Publisher ──► Topic ──► Subscription A ──► Subscriber 1
                    ──► Subscription B ──► Subscriber 2
Features:
- Managed service
- Push and pull
- Message filtering
- At-least-once delivery (exactly-once available for pull subscriptions)
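A minimal streaming-pull sketch with the google-cloud-pubsub client; project, subscription, and timeout values are placeholders.
```python
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "email-signups")

def callback(message):
    print("received:", message.data)
    message.ack()   # acknowledge so the broker stops redelivering

# The client library pulls messages in the background and invokes the callback.
future = subscriber.subscribe(subscription_path, callback=callback)
try:
    future.result(timeout=30)
except TimeoutError:
    future.cancel()
```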
Amazon SNS + SQS
SNS Topic → SQS Queues
Publisher ──► SNS Topic ──► SQS Queue A ──► Consumer A
                        ──► SQS Queue B ──► Consumer B
                        ──► Lambda (direct)
Features:
- SNS for fan-out
- SQS for durability and processing
- Multiple protocols (HTTP, email, SMS)
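A boto3 sketch of the fan-out; ARNs and queue URLs are placeholders, and in practice the queue also needs an access policy allowing the SNS topic to deliver to it.
```python
import json
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:user-events"                  # placeholder
QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:email-service"                # placeholder
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/email-service"  # placeholder

# Fan-out: one SQS queue per consumer, all subscribed to the same topic.
sns.subscribe(TopicArn=TOPIC_ARN, Protocol="sqs", Endpoint=QUEUE_ARN)

# Publish once; every subscribed queue receives its own copy.
sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps({"type": "signup", "user_id": 123}))

# Each consumer polls its own queue independently (long polling).
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for msg in resp.get("Messages", []):
    envelope = json.loads(msg["Body"])   # SNS envelope; event JSON is in envelope["Message"]
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```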
Redis Pub/Sub
Simple in-memory pub/sub
PUBLISH user-events '{"type":"login"}'
SUBSCRIBE user-events
Features:
- Very fast
- No persistence (ephemeral)
- No consumer groups
- Good for real-time
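A quick redis-py sketch; because nothing is persisted, the subscriber only sees messages published while it is listening.
```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# Subscriber: ephemeral, receives only while connected.
pubsub = r.pubsub()
pubsub.subscribe("user-events")

# Publisher (typically another process): every current subscriber gets a copy.
r.publish("user-events", json.dumps({"type": "login"}))

for message in pubsub.listen():
    if message["type"] == "message":     # skip subscribe confirmations
        print(json.loads(message["data"]))
```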
Handling Backpressure
The Problem
Publisher: 10,000 msg/sec
Subscriber A: Can handle 10,000 msg/sec ✓
Subscriber B: Can handle 1,000 msg/sec ✗
Subscriber B falls behind
Queue grows unbounded
Eventually: OOM or dropped messages
Solutions
Per-Subscriber Queues:
Topic ──► Queue A (for fast subscriber)
      ──► Queue B (for slow subscriber)
Each queue buffers independently
Slow subscriber doesn't affect fast one
Backpressure Signals:
Subscriber signals "slow down"
Broker reduces send rate
Or: Subscriber pulls at own pace
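One concrete form of pulling at your own pace is client-side flow control; with google-cloud-pubsub, for example, the streaming-pull client can cap how much is outstanding at once (names and limits here are illustrative).
```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "analytics-sub")

def callback(message):
    process(message)   # hypothetical slow handler
    message.ack()

# At most 100 unacked messages (or 10 MB) held by this client at a time;
# the broker retains the rest, so a slow subscriber is not overwhelmed.
flow_control = pubsub_v1.types.FlowControl(max_messages=100, max_bytes=10 * 1024 * 1024)
subscriber.subscribe(subscription_path, callback=callback, flow_control=flow_control)
```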
Dead Letter after Timeout:
Message unacked for > 1 hour
Move to dead letter queue
Alert and manual handling
Exactly-Once Challenges
Duplicate Delivery
Scenario:
1. Broker sends message to subscriber
2. Subscriber processes
3. Ack lost in network
4. Broker re-sends (thinks it failed)
5. Subscriber processes again
Result: Processed twice
Solutions
1. Idempotent subscriber
Track processed message IDs
Skip if already seen (see the sketch after this list)
2. Transactional processing
Process + ack in same transaction
(Not always possible)
3. Deduplication at broker
Broker tracks delivered message IDs
(Limited time window)
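A minimal sketch of option 1, an idempotent subscriber keyed on message IDs; an in-memory set is used for illustration, while a real service would track IDs in something durable (a database table, or Redis with a TTL).
```python
processed_ids = set()   # in-memory for illustration; use durable storage in practice

def handle(message_id: str, payload: dict) -> None:
    # A redelivered duplicate carries the same message_id, so it is skipped.
    if message_id in processed_ids:
        return
    process(payload)     # hypothetical business logic
    processed_ids.add(message_id)
```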
Event Schema Evolution
The Challenge
Version 1:
{user_id: 123, name: "Alice"}
Version 2 (add field):
{user_id: 123, name: "Alice", email: "..."}
Old subscribers must handle new fields
New subscribers must handle missing fields
Best Practices
1. Only add optional fields
2. Never remove or rename fields
3. Use schema registry
4. Version in message (or use schema ID)
{
  "schema_version": 2,
  "user_id": 123,
  "name": "Alice",
  "email": "alice@example.com"  // Optional
}
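A sketch of a subscriber that tolerates both versions by treating the added field as optional (field names follow the example above):
```python
def handle_user_event(event: dict) -> None:
    # Fields present in every version: never removed or renamed.
    user_id = event["user_id"]
    name = event["name"]

    # Added in schema_version 2; defaults to None when reading v1 events.
    email = event.get("email")

    version = event.get("schema_version", 1)
    print(f"user {user_id} ({name}), email={email}, schema v{version}")

handle_user_event({"user_id": 123, "name": "Alice"})                    # v1 event
handle_user_event({"schema_version": 2, "user_id": 123,
                   "name": "Alice", "email": "alice@example.com"})      # v2 event
```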
Schema Registry
Publisher:
1. Register schema with registry
2. Get schema ID
3. Include schema ID in message
Subscriber:
1. Get schema ID from message
2. Fetch schema from registry
3. Deserialize with correct schema
Use Cases
Event-Driven Architecture
User signs up
→ UserCreated event published
Subscribed services:
- Email service: Send welcome email
- Analytics: Track signup
- Billing: Create account
- Recommendations: Initialize profile
Services evolve independently
Add new subscriber without changing publisher
Real-Time Updates
Stock price changes
→ PriceUpdated event
Subscribers:
- Trading dashboards (WebSocket push)
- Alert service (check thresholds)
- Historical database (record)
Log Aggregation
All services ──► Log topic ──► Aggregator ──► Elasticsearch
                           ──► Metrics    ──► Prometheus
                           ──► Archive    ──► S3
Monitoring
Key Metrics
Publication rate:
Messages published per second
Fan-out latency:
Time from publish to subscriber receive
Subscriber lag:
Messages pending per subscription
Acknowledgment rate:
Acks per second (subscriber health)
Dead letter rate:
Failed messages per unit time
Health Checks
```python
def check_pubsub_health():
    # Check broker connectivity
    assert can_connect_to_broker()

    # Check subscription lag
    for sub in subscriptions:
        lag = get_subscription_lag(sub)
        if lag > threshold:
            alert(f"Subscription {sub} lagging: {lag}")

    # Check dead letter queue
    dlq_size = get_dlq_size()
    if dlq_size > 0:
        alert(f"Dead letter queue has {dlq_size} messages")
```
Key Takeaways
- Pub/Sub decouples producers from consumers - Neither knows the other
- Each subscriber gets every message - Fan-out pattern
- Durable subscriptions survive disconnection - Messages queued
- Ordering is expensive - Use partition keys when needed
- Backpressure is critical - Slow subscribers can cause problems
- Idempotency handles duplicates - At-least-once is common
- Schema evolution needs planning - Use registry, add-only changes
- Monitor subscriber lag - Early warning of processing issues