Circuit Breaker Pattern

A resilience pattern that prevents cascading failures by temporarily stopping requests to a failing downstream service after a threshold of errors is reached.

Also known as: Circuit Breaker, Fault Tolerance Pattern

Description

The circuit breaker pattern protects a system from cascading failures when a downstream dependency (database, external API, microservice) becomes unresponsive or starts failing at a high rate. Like an electrical circuit breaker, it has three states: Closed (normal operation, requests pass through), Open (failures exceeded the threshold, requests are immediately rejected without calling the downstream service), and Half-Open (after a timeout, a limited number of probe requests are allowed through to test if the dependency has recovered).

When the circuit is Closed, the breaker monitors the error rate or failure count within a sliding window. If failures exceed the configured threshold (e.g., 50% error rate over 10 seconds, or 5 consecutive failures), the circuit Opens. In the Open state, all requests immediately return a fallback response or error without attempting the downstream call, preventing resource exhaustion (thread pool depletion, connection pool saturation, timeout accumulation). After a configurable timeout (e.g., 30 seconds), the circuit transitions to Half-Open and allows a small number of probe requests through.
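The state machine described above can be sketched in a few lines of Node.js. This is a minimal illustration, not the API of any particular library: the class name, the consecutive-failure counting, and the threshold values are all hypothetical simplifications (production implementations typically use a sliding error-rate window rather than a bare counter).

```javascript
// Minimal circuit breaker sketch: Closed -> Open on repeated failure,
// Open -> Half-Open after a reset timeout, Half-Open -> Closed on a
// successful probe (or back to Open if the probe fails).
class CircuitBreaker {
  constructor(fn, { failureThreshold = 5, resetTimeoutMs = 30000 } = {}) {
    this.fn = fn;                       // the downstream call being protected
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.state = 'CLOSED';
    this.consecutiveFailures = 0;
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === 'OPEN') {
      // After the reset timeout, let a probe request through (Half-Open).
      if (Date.now() - this.openedAt >= this.resetTimeoutMs) {
        this.state = 'HALF_OPEN';
      } else {
        // Fail fast: no downstream call, no resource consumption.
        throw new Error('circuit open: request rejected');
      }
    }
    try {
      const result = await this.fn(...args);
      // Success (including a Half-Open probe) closes the circuit.
      this.state = 'CLOSED';
      this.consecutiveFailures = 0;
      return result;
    } catch (err) {
      this.consecutiveFailures += 1;
      // A failed probe, or too many consecutive failures, opens the circuit.
      if (this.state === 'HALF_OPEN' ||
          this.consecutiveFailures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}
```

A breaker wrapping a flaky fetch would be constructed as `new CircuitBreaker(() => fetchUser(id), { failureThreshold: 5, resetTimeoutMs: 30000 })` and invoked via `breaker.call()` instead of calling the function directly.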

Circuit breakers should be configured per-dependency and per-operation, not globally. The thresholds and timeouts need tuning based on the dependency's SLA and the acceptable impact of the fallback behavior. Libraries like opossum (Node.js), resilience4j (Java), and Polly (.NET) provide circuit breaker implementations. In service mesh architectures, Envoy and Istio implement circuit breaking at the infrastructure layer, applying it transparently without application code changes.

Prompt Snippet

Implement circuit breakers using opossum for each external service dependency with configuration: { timeout: 3000, errorThresholdPercentage: 50, resetTimeout: 30000, rollingCountTimeout: 10000, volumeThreshold: 10 }. On open circuit, return a cached stale response with a Cache-Status: stale header when available, or 503 with Retry-After header when no fallback exists. Emit circuit state change events (open/close/half-open) to Datadog as custom metrics for alerting. Configure separate circuit breaker instances per downstream service with tuned thresholds based on each service's p99 latency and error budget.
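The fallback behavior in the snippet can be sketched as a small, framework-agnostic helper. This is an illustrative sketch of the policy only, not opossum code; `cache`, the response shape, and the 30-second retry hint are hypothetical example choices.

```javascript
// Open-circuit fallback policy from the prompt snippet: serve a stale cached
// response with a Cache-Status: stale header when one exists, otherwise
// return 503 with a Retry-After header.
function openCircuitFallback(cache, key, retryAfterSeconds = 30) {
  const cached = cache.get(key);
  if (cached !== undefined) {
    return {
      status: 200,
      headers: { 'Cache-Status': 'stale' },  // signals degraded freshness
      body: cached,
    };
  }
  return {
    status: 503,
    headers: { 'Retry-After': String(retryAfterSeconds) },
    body: 'Service temporarily unavailable',
  };
}
```

With opossum, a helper like this would typically be wired in via the breaker's `fallback()` method so it runs whenever the circuit rejects a request.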

Tags

resilience, fault-tolerance, circuit-breaker, reliability, microservices