Infrabasic

Uptime Monitoring

Continuously check application availability from external locations and alert when endpoints become unreachable or slow.

Also known as: uptime checks, availability monitoring, synthetic monitoring, external monitoring, ping monitoring

Description

Uptime monitoring verifies application availability by periodically sending synthetic requests to endpoints from external locations around the world. Unlike internal health checks (which only confirm that the app process is running), external uptime monitors validate that the entire stack -- DNS, CDN, load balancers, application servers, and databases -- is functioning and reachable along the same path real users take. This outside-in perspective catches issues that internal monitoring cannot see, such as a broken DNS record or an expired certificate at the edge.

Uptime monitors typically check HTTP(S) endpoints at intervals ranging from 30 seconds to 5 minutes, verifying response status codes, response body content (keyword checks), response headers, SSL/TLS certificate validity and expiration, and response time thresholds. Checks run from multiple geographic regions to distinguish localized network issues from genuine outages. To reduce false positives, an endpoint is typically declared down only after consecutive failed checks from multiple locations.
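The checks described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the ProbeResult shape, the function names, and the default thresholds (2000 ms, 2 consecutive failures, 2 regions) are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """One synthetic request as seen from one region (hypothetical shape)."""
    region: str
    status_code: Optional[int]  # None when the request never completed
    body: str
    elapsed_ms: float


def probe_passes(result: ProbeResult, expected_status: int = 200,
                 keyword: str = "", max_elapsed_ms: float = 2000.0) -> bool:
    """Apply the status, keyword, and response-time checks to one probe."""
    if result.status_code != expected_status:
        return False
    if keyword and keyword not in result.body:
        return False
    return result.elapsed_ms <= max_elapsed_ms


def consecutive_failures(history: list[bool]) -> int:
    """Count failures at the tail of a pass/fail history (True = pass)."""
    count = 0
    for passed in reversed(history):
        if passed:
            break
        count += 1
    return count


def is_down(histories: dict[str, list[bool]], min_consecutive: int = 2,
            min_regions: int = 2) -> bool:
    """Declare an outage only when enough regions each show a failure streak."""
    failing = [region for region, history in histories.items()
               if consecutive_failures(history) >= min_consecutive]
    return len(failing) >= min_regions
```

With these defaults, a single region reporting two failures in a row is treated as a localized problem, while two regions each reporting two failures in a row are treated as an outage.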

Modern uptime monitoring platforms (UptimeRobot, Better Stack, Pingdom, Checkly) typically provide status pages for communicating availability to users, incident management workflows, SLA tracking and reporting, and multi-step synthetic monitoring that simulates user flows (login, checkout, API sequences). Integrations with alerting channels route each notification by severity -- Slack for warnings, PagerDuty for critical outages, email for weekly availability reports.
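Routing by severity usually amounts to a lookup table. The sketch below shows one way to express the Slack / PagerDuty / email split mentioned above; the Severity enum, the channel strings, and the Notification type are placeholders invented for this example and do not correspond to any platform's API.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    CRITICAL = "critical"
    REPORT = "report"


# Channel names are illustrative labels, not identifiers from a vendor API.
ROUTES = {
    Severity.WARNING: "slack",
    Severity.CRITICAL: "pagerduty",
    Severity.REPORT: "email",
}


@dataclass
class Notification:
    channel: str
    message: str


def route(severity: Severity, message: str) -> Notification:
    """Pick the delivery channel for a message based on its severity."""
    return Notification(channel=ROUTES[severity], message=message)
```

Keeping the mapping in one table makes it easy to review who gets paged for what, and to change a route without touching the code that raises alerts.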

Prompt Snippet

Configure uptime monitoring with checks from at least 5 geographic regions (US-East, US-West, EU-West, AP-Southeast, AP-Northeast) at 60-second intervals. Monitor the primary API endpoint (GET /healthz expecting 200), the marketing site, and critical API flows (POST /api/auth/login with test credentials). Set alert thresholds: notify Slack after 2 consecutive failures from 2+ regions, escalate to PagerDuty after 3 consecutive failures. Monitor SSL certificate expiration with a 14-day warning threshold. Publish a public status page at status.example.com with real-time component statuses and a 90-day incident history. Track monthly SLA percentage targeting 99.95% uptime.
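The numeric thresholds in the snippet translate directly into small functions, which can help when sanity-checking a configuration. The following is a rough sketch under stated assumptions: the function names are hypothetical, a month is taken as 30 days, and the PagerDuty escalation is assumed to carry the same "2+ regions" requirement as the Slack alert, which the snippet does not state explicitly.

```python
from datetime import datetime, timedelta
from typing import Optional


def alert_target(consecutive_failures: int, failing_regions: int) -> Optional[str]:
    """Map a failure streak to a channel using the snippet's thresholds."""
    if failing_regions < 2:
        return None  # assumption: single-region failures are treated as local noise
    if consecutive_failures >= 3:
        return "pagerduty"
    if consecutive_failures >= 2:
        return "slack"
    return None


def downtime_budget_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime allowed in a month at the given SLA percentage."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


def cert_needs_warning(not_after: datetime, now: datetime,
                       warn_days: int = 14) -> bool:
    """True when the certificate expires within the warning window."""
    return not_after - now <= timedelta(days=warn_days)
```

For a 30-day month, a 99.95% target leaves roughly 21.6 minutes of allowed downtime (43,200 minutes x 0.05%). At 60-second check intervals with a three-failure escalation, a few minutes of that budget can be spent before anyone is paged.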