Circuit Breaker Pattern
A resilience pattern that prevents cascading failures by temporarily stopping requests to a failing downstream service after a threshold of errors is reached.
Description
The circuit breaker pattern protects a system from cascading failures when a downstream dependency (database, external API, microservice) becomes unresponsive or starts failing at a high rate. Named after the electrical circuit breaker, it has three states: Closed (normal operation, requests pass through), Open (failures exceeded the threshold, requests are immediately rejected without calling the downstream service), and Half-Open (after a timeout, a limited number of probe requests are allowed through to test whether the dependency has recovered).
When the circuit is Closed, the breaker monitors the error rate or failure count within a sliding window. If failures exceed the configured threshold (e.g., 50% error rate over 10 seconds, or 5 consecutive failures), the circuit Opens. In the Open state, all requests immediately return a fallback response or error without attempting the downstream call, preventing resource exhaustion (thread pool depletion, connection pool exhaustion, timeout accumulation). After a configurable timeout (e.g., 30 seconds), the circuit transitions to Half-Open and allows a small number of probe requests through. If the probes succeed, the circuit Closes and normal traffic resumes; if a probe fails, the circuit Opens again and the timeout restarts.
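The state transitions described above can be sketched as a small state machine. This is a minimal illustration, not a production implementation: it assumes the simpler "consecutive failures" threshold rather than a sliding-window error rate, it does not cap the number of concurrent probes in the Half-Open state, and every name in it (CircuitBreaker, failureThreshold, resetTimeoutMs, the injectable now clock) is hypothetical.

```typescript
// Illustrative circuit breaker using a consecutive-failure threshold.
// All identifiers are hypothetical; this is not the API of any library.
type State = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker<T> {
  private state: State = "CLOSED";
  private failures = 0;
  private openedAt = 0;
  private readonly action: () => Promise<T>;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    action: () => Promise<T>,
    failureThreshold: number = 5,
    resetTimeoutMs: number = 30000,
    now: () => number = () => Date.now(), // injectable clock, handy in tests
  ) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  getState(): State {
    return this.state;
  }

  async fire(): Promise<T> {
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        // Fail fast: the downstream call is never attempted.
        throw new Error("circuit is open");
      }
      // Reset timeout elapsed: let this call through as a probe.
      this.state = "HALF_OPEN";
    }
    const probing = this.state === "HALF_OPEN";
    try {
      const result = await this.action();
      // Any success closes the circuit and clears the failure count.
      this.failures = 0;
      this.state = "CLOSED";
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed probe, or too many consecutive failures, opens the circuit.
      if (probing || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

A caller would wrap each downstream call in `fire()` and treat the "circuit is open" error as the signal to serve a fallback. A real implementation would also bound how many probes run at once while Half-Open.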
Circuit breakers should be configured per-dependency and per-operation, not globally. The thresholds and timeouts need tuning based on the dependency's SLA and the acceptable impact of the fallback behavior. Libraries like opossum (Node.js), resilience4j (Java), and Polly (.NET) provide circuit breaker implementations. In service mesh architectures, Envoy and Istio implement circuit breaking at the infrastructure layer, applying it transparently without application code changes.
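One way to keep configuration per-dependency and per-operation is a lookup table of overrides on top of shared defaults. The sketch below is only an illustration: the dependency names ("payments", "search"), the operations, the numeric values, and the BreakerConfig shape are all assumptions, not settings recommended by any library.

```typescript
// Hypothetical per-dependency, per-operation breaker settings.
interface BreakerConfig {
  timeoutMs: number;
  errorThresholdPercentage: number;
  resetTimeoutMs: number;
}

// Shared defaults, used when no override exists.
const DEFAULTS: BreakerConfig = {
  timeoutMs: 3000,
  errorThresholdPercentage: 50,
  resetTimeoutMs: 30000,
};

// Overrides keyed by "dependency:operation" (names are made up for illustration).
const OVERRIDES: Record<string, Partial<BreakerConfig>> = {
  "payments:charge": { timeoutMs: 5000, errorThresholdPercentage: 25 },
  "search:query": { timeoutMs: 800, resetTimeoutMs: 10000 },
};

// Resolve the effective settings for one operation on one dependency.
function configFor(dependency: string, operation: string): BreakerConfig {
  return { ...DEFAULTS, ...OVERRIDES[`${dependency}:${operation}`] };
}
```

Each resolved configuration would then back its own breaker instance, so a slow search endpoint cannot trip the breaker that guards payments.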
Prompt Snippet
Implement circuit breakers using opossum for each external service dependency with configuration: { timeout: 3000, errorThresholdPercentage: 50, resetTimeout: 30000, rollingCountTimeout: 10000, volumeThreshold: 10 }. On open circuit, return a cached stale response with a Cache-Status: stale header when available, or 503 with Retry-After header when no fallback exists. Emit circuit state change events (open/close/half-open) to Datadog as custom metrics for alerting. Configure separate circuit breaker instances per downstream service with tuned thresholds based on each service's p99 latency and error budget.
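The fallback rule in the snippet (stale cache when available, otherwise 503 with Retry-After) can be sketched as a pure function. This is an illustration under assumptions: the FallbackResponse shape and the fallbackResponse name are hypothetical, the 200 status for the stale response is a choice the snippet does not specify, and wiring the function into opossum or an HTTP framework is left out.

```typescript
// Hypothetical response shape; adapt to the HTTP framework in use.
interface FallbackResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Build the response to return while the circuit is open.
// cachedBody: last known good payload, or undefined if nothing is cached.
// retryAfterSeconds: typically the breaker's reset timeout, in seconds.
function fallbackResponse(
  cachedBody: string | undefined,
  retryAfterSeconds: number,
): FallbackResponse {
  if (cachedBody !== undefined) {
    // Serve stale data and mark it as such (status 200 is an assumption).
    return {
      status: 200,
      headers: { "Cache-Status": "stale" },
      body: cachedBody,
    };
  }
  // Nothing cached: tell the client when a retry is worthwhile.
  return {
    status: 503,
    headers: { "Retry-After": String(retryAfterSeconds) },
    body: "Service temporarily unavailable",
  };
}
```

With the snippet's resetTimeout of 30000 ms, passing 30 as retryAfterSeconds keeps the Retry-After hint aligned with the moment the breaker starts probing again.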
Related Terms
API Rate Limiting
Restricting the number of API requests a client can make within a time window to protect backend resources and ensure fair usage.
API Gateway Pattern
A single entry point that sits in front of backend services to handle cross-cutting concerns like authentication, rate limiting, routing, and request transformation.
API Health Check Endpoints
Dedicated endpoints that report the operational status of an API and its dependencies for use by load balancers, orchestrators, and monitoring systems.
Webhook Retry Logic
An automatic retry mechanism with exponential backoff that re-attempts failed webhook deliveries to ensure eventual delivery.