Saga Pattern
A pattern for managing distributed transactions across multiple services by breaking them into a sequence of local transactions with compensating actions for rollback.
Description
The Saga pattern manages data consistency across microservices without distributed transactions (two-phase commit). Instead of a single ACID transaction spanning multiple databases, a saga breaks the operation into a sequence of local transactions, each executed by a single service. If any step fails, the saga executes compensating transactions in reverse order to undo the changes made by preceding steps. For example, an order saga might: reserve inventory -> charge payment -> confirm order. If payment fails, the compensating action releases the reserved inventory.
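The reverse-order compensation described above can be sketched in a few lines. This is a minimal, in-memory illustration rather than a production implementation: the `SagaStep` shape, the `runSaga` helper, and the step names are assumptions made for the example, and the steps run synchronously so the control flow stays easy to follow (real steps would be asynchronous service calls).

```typescript
// Hypothetical step shape: a local transaction plus the action that undoes it.
interface SagaStep {
  name: string;
  execute: () => void;    // local transaction; throws on failure
  compensate: () => void; // undoes a previously completed execute
}

// Runs steps in order. If one fails, compensates the already completed
// steps in reverse order and reports the saga as failed.
function runSaga(steps: SagaStep[]): { ok: boolean; log: string[] } {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.execute();
      completed.push(step);
      log.push(`done:${step.name}`);
    } catch {
      log.push(`failed:${step.name}`);
      for (const done of completed.reverse()) {
        done.compensate();
        log.push(`compensated:${done.name}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}

// Order saga from the text: reserve inventory -> charge payment -> confirm order.
const inventory = { reserved: 0 };
const result = runSaga([
  {
    name: "reserveInventory",
    execute: () => { inventory.reserved += 1; },
    compensate: () => { inventory.reserved -= 1; },
  },
  {
    name: "chargePayment",
    execute: () => { throw new Error("card declined"); }, // simulated failure
    compensate: () => {},
  },
  { name: "confirmOrder", execute: () => {}, compensate: () => {} },
]);
```

In this run the payment step fails, so only the inventory reservation is compensated and `confirmOrder` never executes. The sketch assumes compensations themselves succeed; it does not handle a failing compensation.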
Sagas come in two coordination flavors: choreography and orchestration. In choreography-based sagas, each service emits events after completing its local transaction, and downstream services react to those events. There is no central coordinator—the flow emerges from the event chain. In orchestration-based sagas, a central saga orchestrator (a state machine) explicitly commands each service to execute its step and handles responses, including triggering compensations on failure. Choreography is simpler for short sagas (2-3 steps) but becomes difficult to reason about as the number of steps grows. Orchestration adds a single point of coordination but makes the flow explicit and debuggable.
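To make the choreography style concrete, here is a small sketch in which three hypothetical services react to each other's events through an in-memory bus. The `EventBus` class, the event names, and the `paymentShouldFail` flag are assumptions for illustration; a real system would use a message broker with asynchronous delivery.

```typescript
type OrderEvent = { orderId: string };
type Handler = (payload: OrderEvent) => void;

// Minimal synchronous, in-memory event bus (a stand-in for a real broker).
class EventBus {
  private handlers = new Map<string, Handler[]>();
  readonly published: string[] = []; // records event names in publish order

  subscribe(event: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  publish(event: string, payload: OrderEvent): void {
    this.published.push(event);
    for (const handler of this.handlers.get(event) ?? []) {
      handler(payload);
    }
  }
}

const bus = new EventBus();
const paymentShouldFail: boolean = true; // simulate a declined payment

// Inventory service: reserves on OrderPlaced, releases on PaymentFailed.
bus.subscribe("OrderPlaced", (p) => bus.publish("InventoryReserved", p));
bus.subscribe("PaymentFailed", (p) => bus.publish("InventoryReleased", p));

// Payment service: reacts to the reservation.
bus.subscribe("InventoryReserved", (p) =>
  bus.publish(paymentShouldFail ? "PaymentFailed" : "PaymentCharged", p),
);

// Order service: confirms once payment succeeds.
bus.subscribe("PaymentCharged", (p) => bus.publish("OrderConfirmed", p));

bus.publish("OrderPlaced", { orderId: "order-1" });
```

No single component knows the whole flow here: the sequence can only be reconstructed by reading every subscription, which is the reasoning difficulty that grows with the number of steps.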
Implementing sagas well requires careful design of compensating actions (not all operations are easily reversible: you can refund a payment but cannot un-send an email), idempotent step execution (network retries may cause duplicate delivery), and handling of partial failures during compensation itself (a compensation step can also fail, so it typically needs its own retry or escalation path). Saga state must be persisted durably so that recovery is possible after a crash. Frameworks like Temporal, AWS Step Functions, and MassTransit provide saga infrastructure with built-in retry, timeout, and compensation support.
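One common way to make a step idempotent is to key each request by saga and step, and return the stored result when the same key arrives again. The sketch below assumes a hypothetical `PaymentService` and key format; the `Map` stands in for a durable store, which a real service would need in order to survive a crash.

```typescript
// Hypothetical payment service that deduplicates requests by idempotency key.
class PaymentService {
  // key -> chargeId; in-memory here, a durable store in a real service
  private processed = new Map<string, string>();
  totalChargedCents = 0;

  charge(idempotencyKey: string, amountCents: number): string {
    const existing = this.processed.get(idempotencyKey);
    if (existing !== undefined) {
      return existing; // duplicate delivery: return the earlier result, charge nothing
    }
    this.totalChargedCents += amountCents;
    const chargeId = `charge-${this.processed.size + 1}`;
    this.processed.set(idempotencyKey, chargeId);
    return chargeId;
  }
}

const payments = new PaymentService();
const sagaId = "saga-42"; // assumed saga identifier
const key = `${sagaId}:chargePayment`;

const firstAttempt = payments.charge(key, 2500);
const retriedAttempt = payments.charge(key, 2500); // simulated network retry
```

Both calls return the same charge ID and the customer is charged once, so the orchestrator can retry the step safely after a timeout.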
Prompt Snippet
Implement the order fulfillment saga as an orchestration-based saga using Temporal workflows with the TypeScript SDK. Define each saga step as an idempotent Temporal activity with configurable retry policies (initialInterval: 1s, backoffCoefficient: 2, maximumAttempts: 5) and per-activity timeouts. Implement compensating activities (cancelReservation, refundPayment, revertShipment) that are invoked, in reverse order of the completed steps, from the catch block of a try/catch wrapped around the main saga sequence. Persist saga state automatically via Temporal's durable execution, and use Temporal's visibility API to query in-progress sagas for operational monitoring. Test the saga using Temporal's test framework with time-skipping for timeout scenarios.
Related Terms
Event Sourcing
A persistence pattern where state changes are stored as an immutable, append-only sequence of domain events rather than overwriting the current state in place.
CQRS (Command Query Responsibility Segregation)
An architectural pattern that uses separate models for reading data (queries) and writing data (commands), allowing each to be optimized independently.
Event-Driven Architecture
An architectural style where the flow of the program is determined by events—state changes that are produced, detected, consumed, and reacted to by loosely coupled services.
State Machine Design
A modeling technique where system behavior is defined as a finite set of states with explicit transitions triggered by events, eliminating impossible states by design.