Webhook Retry Logic
An automatic retry mechanism with exponential backoff that re-attempts failed webhook deliveries to ensure eventual delivery.
Description
Webhook retry logic ensures that transient failures -- network timeouts, temporary server errors, or brief downtime on the consumer's end -- don't result in lost events. When a webhook delivery fails (non-2xx response or connection timeout), the system schedules a retry with exponential backoff. A typical schedule might retry after 1 minute, 5 minutes, 30 minutes, 2 hours, and 8 hours, with a maximum of 5-8 retry attempts over a 24-hour period.
The retry policy needs to define what constitutes a failure: any non-2xx HTTP response code, connection timeouts (typically 5-10 seconds), and read timeouts (typically 30 seconds). A 410 Gone response should be treated as a permanent failure, causing the system to disable the endpoint rather than retrying. Between retries, the webhook delivery should be marked with its attempt count, next retry time, and the error details from the last attempt.
After exhausting all retry attempts, the system should mark the delivery as permanently failed and notify the consumer (via email or in-app notification) that manual intervention is needed. Endpoints with a high failure rate (e.g., 95% failures over 7 days) should be automatically disabled with a notification, preventing the system from wasting resources on dead endpoints. A manual replay API or dashboard button allows consumers to trigger redelivery of specific events after fixing their endpoint.
Prompt Snippet
Implement exponential backoff retries at intervals of 60s, 300s, 1800s, 7200s, and 28800s (5 attempts over ~10 hours) using a delayed job queue in BullMQ. Consider a delivery failed on any non-2xx response, connection timeout (5s), or read timeout (30s). Treat HTTP 410 as a permanent failure and auto-disable the endpoint. Store each attempt's status code, response body (truncated to 1KB), and latency for debugging. After final retry exhaustion, emit a webhook.endpoint.failed event and send an email notification to the endpoint owner. Provide a POST /webhooks/deliveries/{id}/retry endpoint for manual redelivery.Tags
Related Terms
Webhook Design
A push-based integration pattern where the server sends HTTP POST requests to consumer-registered URLs when events occur.
Webhook Signature Verification
Cryptographic signing of webhook payloads using HMAC-SHA256 so consumers can verify authenticity and integrity of incoming events.
Circuit Breaker Pattern
A resilience pattern that prevents cascading failures by temporarily stopping requests to a failing downstream service after a threshold of errors is reached.