Back to all terms
ClientAPIrequestresponse
APIintermediate

Webhook Retry Logic

An automatic retry mechanism with exponential backoff that re-attempts failed webhook deliveries to ensure eventual delivery.

Also known as: Webhook Retry Policy, Delivery Retry Strategy

Description

Webhook retry logic ensures that transient failures -- network timeouts, temporary server errors, or brief downtime on the consumer's end -- don't result in lost events. When a webhook delivery fails (non-2xx response or connection timeout), the system schedules a retry with exponential backoff. A typical schedule might retry after 1 minute, 5 minutes, 30 minutes, 2 hours, and 8 hours, with a maximum of 5-8 retry attempts over a 24-hour period.

The retry policy needs to define what constitutes a failure: any non-2xx HTTP response code, connection timeouts (typically 5-10 seconds), and read timeouts (typically 30 seconds). A 410 Gone response should be treated as a permanent failure, causing the system to disable the endpoint rather than retrying. Between retries, the webhook delivery should be marked with its attempt count, next retry time, and the error details from the last attempt.

After exhausting all retry attempts, the system should mark the delivery as permanently failed and notify the consumer (via email or in-app notification) that manual intervention is needed. Endpoints with a high failure rate (e.g., 95% failures over 7 days) should be automatically disabled with a notification, preventing the system from wasting resources on dead endpoints. A manual replay API or dashboard button allows consumers to trigger redelivery of specific events after fixing their endpoint.

Prompt Snippet

Implement exponential backoff retries at intervals of 60s, 300s, 1800s, 7200s, and 28800s (5 attempts over ~10 hours) using a delayed job queue in BullMQ. Consider a delivery failed on any non-2xx response, connection timeout (5s), or read timeout (30s). Treat HTTP 410 as a permanent failure and auto-disable the endpoint. Store each attempt's status code, response body (truncated to 1KB), and latency for debugging. After final retry exhaustion, emit a webhook.endpoint.failed event and send an email notification to the endpoint owner. Provide a POST /webhooks/deliveries/{id}/retry endpoint for manual redelivery.

Tags

webhooksretryexponential-backoffreliabilitydelivery