Event-Driven Architecture - Building Scalable, Loosely Coupled Production Systems
Event-driven architecture (EDA) enables building highly scalable, loosely coupled systems by having components communicate through events rather than direct calls. Companies like Netflix (processing billions of events daily), Uber (event sourcing for trip state), and LinkedIn (event streaming with Kafka) use EDA to achieve massive scale and system resilience.
This guide covers event fundamentals, event sourcing for state management, CQRS pattern for read/write separation, event streaming with Kafka, publish-subscribe messaging patterns, event choreography vs orchestration, handling eventual consistency, and production deployment strategies for building robust event-driven systems.
Event-Driven Architecture Fundamentals
What are Events?
Events are immutable facts that represent something that happened in the system. Unlike commands (requests to do something), events describe what already occurred and cannot be changed or rejected.
Event characteristics:
- Immutable - Once created, events never change
- Past tense - Named for what happened (OrderPlaced, PaymentCompleted)
- Timestamped - Include when the event occurred
- Self-contained - Carry all necessary data
Example event:
{
  "eventId": "evt_123",
  "eventType": "OrderPlaced",
  "timestamp": "2026-03-17T10:30:00Z",
  "aggregateId": "order_456",
  "data": {
    "userId": "user_789",
    "items": [
      { "productId": "prod_1", "quantity": 2, "price": 29.99 }
    ],
    "total": 59.98,
    "shippingAddress": { "street": "123 Main St", "city": "Seattle" }
  },
  "metadata": {
    "causationId": "cmd_abc",
    "correlationId": "trace_xyz"
  }
}
Benefits of Event-Driven Architecture
Loose Coupling:
Services don't need to know about each other. Publishers emit events without knowing who consumes them.
Scalability:
Event consumers can be scaled independently based on workload. Multiple instances process events in parallel.
Resilience:
If a consumer is down, events are persisted and processed when it recovers. No data loss.
Audit Trail:
Event log provides complete history of what happened in the system for debugging and compliance.
Temporal Decoupling:
Producers and consumers don't need to be online simultaneously. Events persist in queue/log.
Event Sourcing
Storing Events Instead of Current State
Traditional approach stores current state:
// Database: orders table
{
  id: "order_456",
  status: "SHIPPED", // Current state only
  total: 59.98,
  updatedAt: "2026-03-17T14:00:00Z"
}
Event sourcing stores all events:
// Event store: order events
[
  { type: "OrderPlaced", timestamp: "10:30:00", data: {...} },
  { type: "PaymentCompleted", timestamp: "10:31:00", data: {...} },
  { type: "OrderShipped", timestamp: "14:00:00", data: {...} }
]

// Current state is derived by replaying events
const order = events.reduce((state, event) => {
  switch (event.type) {
    case 'OrderPlaced':
      return { status: 'PENDING', ...event.data };
    case 'PaymentCompleted':
      return { ...state, status: 'PAID', paymentId: event.data.paymentId };
    case 'OrderShipped':
      return { ...state, status: 'SHIPPED', trackingNumber: event.data.trackingNumber };
    default:
      return state; // Ignore unknown event types instead of dropping state
  }
}, {});
Benefits:
- Complete audit trail - Never lose historical data
- Time travel - Rebuild state at any point in time
- Bug fixes - Replay events with fixed logic
- Analytics - Process historical events for insights
Challenges:
- Complexity - More complex than CRUD
- Event schema evolution - Must handle old event formats
- Performance - Replaying many events can be slow (solved with snapshots)
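The schema-evolution challenge is often handled with "upcasters": small functions that migrate old event versions to the current shape at read time, so replay logic only ever sees the latest schema. A minimal sketch follows; the version numbers and the v1 `amount` field are hypothetical, chosen only to illustrate the mechanism.

```javascript
// Each upcaster migrates one version step; keys are "eventType:version".
const upcasters = {
  // Hypothetical v1 OrderPlaced stored a flat "amount"; v2 uses items + total
  'OrderPlaced:1': (event) => ({
    ...event,
    schemaVersion: 2,
    data: { items: event.data.items || [], total: event.data.amount }
  })
};

function upcast(event) {
  let current = event;
  let key = `${current.eventType}:${current.schemaVersion || 1}`;
  // Apply upcasters in sequence until the event reaches the latest version
  while (upcasters[key]) {
    current = upcasters[key](current);
    key = `${current.eventType}:${current.schemaVersion}`;
  }
  return current;
}

const legacy = { eventType: 'OrderPlaced', schemaVersion: 1, data: { amount: 59.98 } };
const migrated = upcast(legacy);
// migrated.schemaVersion === 2, migrated.data.total === 59.98
```

Because events are immutable, the stored v1 records are never rewritten; only their in-memory representation is migrated during replay.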
Event Store Implementation
class EventStore {
  async appendEvent(streamId, event) {
    // Append event to stream (immutable)
    await db.events.insert({
      streamId,
      eventId: event.eventId,
      eventType: event.eventType,
      timestamp: event.timestamp,
      data: event.data,
      metadata: event.metadata,
      version: await this.getNextVersion(streamId)
    });
    // Publish event to subscribers
    await eventBus.publish(event.eventType, event);
  }

  async getEvents(streamId, fromVersion = 0) {
    return db.events.find({
      streamId,
      version: { $gte: fromVersion }
    }).sort({ version: 1 });
  }

  async getCurrentState(streamId) {
    // Check snapshot cache first
    let state = await this.getSnapshot(streamId);
    const fromVersion = state ? state.version + 1 : 0;
    // Replay only the events since the snapshot
    const events = await this.getEvents(streamId, fromVersion);
    return events.reduce((s, e) => this.applyEvent(s, e), state || {});
  }

  async saveSnapshot(streamId, state, version) {
    // Save a snapshot every N events for performance
    await db.snapshots.upsert({ streamId }, { state, version });
  }

  // getNextVersion, getSnapshot, and applyEvent are elided for brevity
}
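To make the append/replay cycle concrete, here is a self-contained in-memory variant. It is a sketch, not the class above: the database is replaced by a `Map`, and an `expectedVersion` parameter (an assumption, common in event stores) demonstrates optimistic concurrency control on append.

```javascript
// Minimal in-memory event store illustrating append, versioning, and replay.
class InMemoryEventStore {
  constructor() { this.streams = new Map(); }

  appendEvent(streamId, event, expectedVersion) {
    const stream = this.streams.get(streamId) || [];
    // Optimistic concurrency: reject if another writer appended first
    if (expectedVersion !== undefined && stream.length !== expectedVersion) {
      throw new Error(`Version conflict: expected ${expectedVersion}, got ${stream.length}`);
    }
    stream.push({ ...event, version: stream.length });
    this.streams.set(streamId, stream);
  }

  getCurrentState(streamId, applyEvent) {
    // Replay every event in order to rebuild current state
    return (this.streams.get(streamId) || []).reduce(applyEvent, {});
  }
}

const store = new InMemoryEventStore();
store.appendEvent('order-456', { type: 'OrderPlaced', data: { total: 59.98 } }, 0);
store.appendEvent('order-456', { type: 'OrderShipped', data: {} }, 1);

const state = store.getCurrentState('order-456', (s, e) => {
  switch (e.type) {
    case 'OrderPlaced': return { status: 'PENDING', ...e.data };
    case 'OrderShipped': return { ...s, status: 'SHIPPED' };
    default: return s;
  }
});
// state is { status: 'SHIPPED', total: 59.98 }
```

The version check is what prevents two concurrent commands from silently interleaving their events into one stream.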
CQRS Pattern
Command Query Responsibility Segregation
Separate read and write models for different optimization strategies:
Write Model (Commands):
- Optimized for consistency and validation
- Enforces business rules
- Writes to event store
Read Model (Queries):
- Optimized for fast queries
- Denormalized views
- Eventually consistent with write model
Implementation:
// Write side: Handle commands, emit events
class OrderCommandHandler {
  async placeOrder(command) {
    // Validate command
    if (!command.items.length) {
      throw new Error('Order must have items');
    }
    // Create event
    const event = {
      eventId: uuid(),
      eventType: 'OrderPlaced',
      timestamp: new Date(),
      aggregateId: command.orderId,
      data: {
        userId: command.userId,
        items: command.items,
        total: command.items.reduce((sum, item) => sum + item.price * item.quantity, 0)
      }
    };
    // Append to event store
    await eventStore.appendEvent(`order-${command.orderId}`, event);
    return { orderId: command.orderId };
  }
}
// Read side: Project events into query models
class OrderProjection {
  async onOrderPlaced(event) {
    // Update read model (denormalized for fast queries)
    await db.orderViews.insert({
      orderId: event.aggregateId,
      userId: event.data.userId,
      status: 'PENDING',
      total: event.data.total,
      itemCount: event.data.items.length,
      placedAt: event.timestamp
    });
    // Update user's order list view
    await db.userOrderLists.push(event.data.userId, {
      orderId: event.aggregateId,
      total: event.data.total,
      status: 'PENDING'
    });
  }

  async onOrderShipped(event) {
    // Update multiple read models
    await db.orderViews.update(event.aggregateId, {
      status: 'SHIPPED',
      shippedAt: event.timestamp
    });
    await db.userOrderLists.update(
      { userId: event.data.userId, orderId: event.aggregateId },
      { status: 'SHIPPED' }
    );
  }
}
// Query side: Fast reads from denormalized views
class OrderQueryService {
  async getOrderById(orderId) {
    // Read from optimized view (no joins needed)
    return db.orderViews.findOne({ orderId });
  }

  async getUserOrders(userId, limit = 10) {
    // Pre-computed user order list
    return db.userOrderLists.find({ userId }).limit(limit);
  }
}
Benefits:
- Performance - Reads don't impact writes and vice versa
- Scalability - Scale read and write sides independently
- Flexibility - Different data models for different needs
Event Streaming with Kafka
Kafka for High-Throughput Event Processing
Kafka provides a durable, ordered event log with high throughput:
const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'order-service',
  brokers: ['kafka1:9092', 'kafka2:9092', 'kafka3:9092']
});

// Producer: Publish events
const producer = kafka.producer();
await producer.connect();

async function publishEvent(event) {
  await producer.send({
    topic: 'orders',
    messages: [{
      key: event.aggregateId, // Ensures same order's events go to same partition
      value: JSON.stringify(event),
      headers: {
        'event-type': event.eventType,
        'correlation-id': event.metadata.correlationId
      }
    }]
  });
}

// Consumer: Subscribe to events
const consumer = kafka.consumer({ groupId: 'order-projections' });
await consumer.connect();
await consumer.subscribe({ topic: 'orders', fromBeginning: false });

await consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    const event = JSON.parse(message.value.toString());
    console.log(`Processing ${event.eventType} from partition ${partition}`);
    // Project event into read models
    await orderProjection.handle(event);
    // Offset is committed automatically after the handler resolves
  }
});
Kafka Benefits:
- Durability - Events persisted to disk, replicated across brokers
- Ordering - Guaranteed order within partition
- Scalability - Horizontal scaling with partitions
- Replay - Consumers can replay events from any point
- Multiple consumers - Many services consume same events
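The ordering guarantee above is why the producer keys messages by `aggregateId`: a message's key determines its partition, and order is preserved only within a partition. The toy partitioner below illustrates the idea; Kafka's real default partitioner uses a murmur2 hash, so this sketch is for intuition only, not a compatible implementation.

```javascript
// Toy key-based partitioner: same key always maps to the same partition,
// so all events for one aggregate stay in order relative to each other.
function pickPartition(key, partitionCount) {
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % partitionCount;
}

const partitions = 3;
const events = [
  { key: 'order_456', type: 'OrderPlaced' },
  { key: 'order_789', type: 'OrderPlaced' },
  { key: 'order_456', type: 'OrderShipped' }
];

// Both order_456 events land in the same partition, preserving their order;
// order_789 may land elsewhere and be processed concurrently.
const assignments = events.map(e => pickPartition(e.key, partitions));
```

Choosing the aggregate ID as the key therefore trades global ordering (which Kafka does not offer across partitions) for per-aggregate ordering plus parallelism across aggregates.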
Publish-Subscribe Pattern
Decoupled Event Distribution
Publishers emit events to topics without knowing subscribers:
// Event bus with Redis Pub/Sub
const Redis = require('ioredis');
const pub = new Redis();
const sub = new Redis();
class EventBus {
  async publish(eventType, event) {
    await pub.publish(eventType, JSON.stringify(event));
  }

  subscribe(eventType, handler) {
    sub.subscribe(eventType);
    sub.on('message', async (channel, message) => {
      if (channel === eventType) {
        const event = JSON.parse(message);
        await handler(event);
      }
    });
  }
}
// Publishers don't know about subscribers
class OrderService {
  async placeOrder(orderData) {
    const order = await db.orders.create(orderData);
    // Emit event
    await eventBus.publish('OrderPlaced', {
      orderId: order.id,
      userId: order.userId,
      total: order.total
    });
    return order;
  }
}

// Multiple independent subscribers
class InventoryService {
  constructor() {
    eventBus.subscribe('OrderPlaced', this.reserveInventory.bind(this));
  }

  async reserveInventory(event) {
    console.log('Reserving inventory for order:', event.orderId);
    // Reserve stock for order items
  }
}

class NotificationService {
  constructor() {
    eventBus.subscribe('OrderPlaced', this.sendConfirmation.bind(this));
  }

  async sendConfirmation(event) {
    console.log('Sending confirmation for order:', event.orderId);
    // Send email confirmation
  }
}

class AnalyticsService {
  constructor() {
    eventBus.subscribe('OrderPlaced', this.trackOrder.bind(this));
  }

  async trackOrder(event) {
    console.log('Tracking order for analytics:', event.orderId);
    // Update analytics dashboard
  }
}
Choreography vs Orchestration
Event Choreography (Decentralized)
Services react to events independently without central coordinator:
// The OrderPlaced event starts the saga; multiple services react independently

// Inventory service reserves stock
inventoryService.on('OrderPlaced', async (event) => {
  const reserved = await inventoryService.reserve(event.items);
  if (reserved) {
    await eventBus.publish('InventoryReserved', { orderId: event.orderId });
  } else {
    await eventBus.publish('InventoryReserveFailed', { orderId: event.orderId });
  }
});

// Payment service processes payment
paymentService.on('InventoryReserved', async (event) => {
  try {
    const payment = await paymentService.charge(event.orderId);
    await eventBus.publish('PaymentCompleted', { orderId: event.orderId, payment });
  } catch (error) {
    await eventBus.publish('PaymentFailed', { orderId: event.orderId });
  }
});

// Shipping service ships order
shippingService.on('PaymentCompleted', async (event) => {
  await shippingService.ship(event.orderId);
  await eventBus.publish('OrderShipped', { orderId: event.orderId });
});

// Handle failures (compensating transactions)
inventoryService.on('PaymentFailed', async (event) => {
  await inventoryService.releaseReservation(event.orderId);
  await eventBus.publish('InventoryReleased', { orderId: event.orderId });
});
Pros: Loose coupling, easy to add new participants
Cons: Hard to track workflow, implicit dependencies
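One common mitigation for choreography's traceability problem is to propagate a single `correlationId` through every event in a flow, while `causationId` points to the immediate parent. The helper below is a sketch using the event envelope fields shown earlier in this guide; the `deriveEvent` name and the random-suffix ID scheme are illustrative assumptions.

```javascript
// Derive a follow-up event that inherits tracing metadata from its parent.
function deriveEvent(parentEvent, eventType, data) {
  return {
    eventId: `evt_${Math.random().toString(36).slice(2, 10)}`, // illustrative ID scheme
    eventType,
    timestamp: new Date().toISOString(),
    data,
    metadata: {
      causationId: parentEvent.eventId,                  // the event that caused this one
      correlationId: parentEvent.metadata.correlationId  // constant across the whole flow
    }
  };
}

const placed = {
  eventId: 'evt_1', eventType: 'OrderPlaced',
  metadata: { causationId: 'cmd_abc', correlationId: 'trace_xyz' }
};
const reserved = deriveEvent(placed, 'InventoryReserved', { orderId: 'order_456' });
const paid = deriveEvent(reserved, 'PaymentCompleted', { orderId: 'order_456' });
// paid.metadata.correlationId === 'trace_xyz' (shared by the whole saga)
```

Filtering logs by `correlationId` then reconstructs the end-to-end workflow even though no component ever saw it whole.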
Event Orchestration (Centralized)
Central orchestrator manages workflow:
class OrderSagaOrchestrator {
  async executeOrderSaga(orderId) {
    const saga = {
      orderId,
      status: 'STARTED',
      steps: []
    };
    try {
      // Step 1: Reserve inventory
      saga.steps.push({ name: 'ReserveInventory', status: 'PENDING' });
      await this.reserveInventory(orderId);
      saga.steps[0].status = 'COMPLETED';

      // Step 2: Process payment
      saga.steps.push({ name: 'ProcessPayment', status: 'PENDING' });
      await this.processPayment(orderId);
      saga.steps[1].status = 'COMPLETED';

      // Step 3: Ship order
      saga.steps.push({ name: 'ShipOrder', status: 'PENDING' });
      await this.shipOrder(orderId);
      saga.steps[2].status = 'COMPLETED';

      saga.status = 'COMPLETED';
      await eventBus.publish('OrderCompleted', { orderId });
    } catch (error) {
      saga.status = 'FAILED';
      // Compensate completed steps in reverse order
      await this.compensate(saga);
    }
    return saga;
  }

  async compensate(saga) {
    const completedSteps = saga.steps
      .filter(s => s.status === 'COMPLETED')
      .reverse();
    for (const step of completedSteps) {
      switch (step.name) {
        case 'ShipOrder':
          await this.cancelShipment(saga.orderId);
          break;
        case 'ProcessPayment':
          await this.refundPayment(saga.orderId);
          break;
        case 'ReserveInventory':
          await this.releaseInventory(saga.orderId);
          break;
      }
    }
    await eventBus.publish('OrderFailed', { orderId: saga.orderId });
  }
}
Pros: Clear workflow visibility, easier debugging
Cons: Central point of failure, tighter coupling
Eventual Consistency
Handling Async State Propagation
Event-driven systems are eventually consistent - read models update asynchronously:
// Write happens immediately
app.post('/orders', async (req, res) => {
  const command = { orderId: uuid(), ...req.body };
  await orderCommandHandler.placeOrder(command);
  // Return immediately (write completed)
  res.status(202).json({
    orderId: command.orderId,
    message: 'Order placed successfully'
  });
});

// Read might not show order yet (eventual consistency)
app.get('/orders/:id', async (req, res) => {
  const order = await orderQueryService.getOrderById(req.params.id);
  if (!order) {
    // Order might be in event queue, not yet projected
    return res.status(404).json({ error: 'Order not found (processing)' });
  }
  res.json(order);
});

// Solution: Include processing status in the 202 response
res.status(202).json({
  orderId: command.orderId,
  status: 'PROCESSING',
  _links: {
    self: `/orders/${command.orderId}`,
    status: `/orders/${command.orderId}/status`
  }
});
// Client polls or uses WebSocket for updates
Strategies for handling eventual consistency:
- Accept it - Design UX for async operations
- Optimistic UI - Show expected state immediately
- Status polling - Client checks until ready
- WebSockets - Push notifications when ready
- Inbox pattern - Track processed events to ensure idempotency
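The inbox pattern from the list above can be sketched as a consumer that records processed event IDs and skips redeliveries. This is a simplified in-memory version: a production system would persist the inbox and update it in the same transaction as the read model, which this sketch deliberately omits.

```javascript
// Idempotent event consumer: side effects run at most once per eventId,
// even when the broker delivers the same event more than once.
class IdempotentConsumer {
  constructor(handler) {
    this.processed = new Set(); // "inbox" of already-seen eventIds
    this.handler = handler;
  }

  async handle(event) {
    if (this.processed.has(event.eventId)) {
      return false; // duplicate delivery: skip side effects
    }
    await this.handler(event);
    this.processed.add(event.eventId); // mark done only after success
    return true;
  }
}

let applied = 0;
const consumer = new IdempotentConsumer(async () => { applied += 1; });

(async () => {
  const event = { eventId: 'evt_123', eventType: 'OrderPlaced' };
  await consumer.handle(event); // processed: applied becomes 1
  await consumer.handle(event); // duplicate: ignored, applied stays 1
})();
```

Marking the event as processed only after the handler succeeds means a crash mid-handler causes a retry rather than a lost event, which is the at-least-once behavior most brokers provide.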
Real-World Examples
Netflix Event-Driven Architecture
Scale:
- Billions of events processed daily
- Event streaming with Apache Kafka
- Event sourcing for user viewing history
- CQRS for recommendations (write to master, read from replicas)
Patterns:
- Change Data Capture (CDC) from databases
- Event-driven microservices communication
- Real-time stream processing with Flink
Uber Trip State Management
Event Sourcing:
- Trip lifecycle tracked as events
- TripRequested → DriverMatched → TripStarted → TripCompleted
- Complete audit trail for disputes
- Replay events for analytics
Benefits:
- Handle 100M+ trips
- Complete trip history
- Support for complex workflows (cancellations, route changes)
Conclusion
Event-driven architecture enables building scalable, loosely coupled systems through asynchronous event communication. Key takeaways:
Core Concepts:
- Events are immutable facts about what happened
- Publishers and subscribers are decoupled
- Temporal decoupling for resilience
Patterns:
- Event sourcing stores all events (complete audit trail)
- CQRS separates read/write models (optimized for each)
- Choreography for loose coupling, orchestration for visibility
- Eventual consistency requires UX design
Technology:
- Kafka for high-throughput event streaming
- Event stores for event sourcing
- Message queues for pub/sub patterns
Trade-offs:
- Complexity vs scalability
- Eventual consistency vs strong consistency
- Debugging distributed workflows
Companies like Netflix and Uber demonstrate that EDA scales to billions of events when implemented correctly. Start simple with pub/sub, and adopt event sourcing and CQRS when the benefits outweigh the complexity.
Written by StaticBlock
StaticBlock is a technical writer and software engineer specializing in web development, performance optimization, and developer tooling.