Uptime Monitoring
Continuously check application availability from external locations and alert when endpoints become unreachable or slow.
Description
Uptime monitoring verifies application availability by periodically sending synthetic requests to endpoints from external locations around the world. Unlike internal health checks (which confirm the app is running), external uptime monitors validate that the entire stack -- DNS, CDN, load balancers, application servers, and databases -- is functioning and accessible to real users. This provides an outside-in perspective that catches issues invisible to internal monitoring.
Uptime monitors typically check HTTP(S) endpoints at intervals ranging from 30 seconds to 5 minutes, verifying response status codes, response body content (keyword checks), response headers, SSL/TLS certificate validity and expiration, and response time thresholds. Checks run from multiple geographic regions to distinguish localized network issues from genuine outages. To reduce false positives, an endpoint is usually marked as down only after consecutive failures from multiple locations.
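The pass/fail logic described above can be sketched roughly as follows. This is a minimal illustration, not how any particular platform implements it: the CheckResult shape, the function names, and the default thresholds (2000 ms, 2 consecutive failures, 2 regions) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One synthetic request as seen from one region (hypothetical shape)."""
    region: str
    status_code: int
    body: str
    response_ms: float


def check_passed(result, expected_status=200, keyword=None, max_ms=2000.0):
    """Return True if the response meets the status, keyword, and latency criteria."""
    if result.status_code != expected_status:
        return False
    if keyword is not None and keyword not in result.body:
        return False
    return result.response_ms <= max_ms


def failing_regions(history, min_consecutive=2, **criteria):
    """Return regions whose most recent `min_consecutive` checks all failed.

    `history` maps a region name to its list of CheckResult, oldest first.
    """
    failing = []
    for region, results in history.items():
        recent = results[-min_consecutive:]
        enough_data = len(recent) == min_consecutive
        if enough_data and not any(check_passed(r, **criteria) for r in recent):
            failing.append(region)
    return sorted(failing)


def is_outage(history, min_regions=2, min_consecutive=2, **criteria):
    """Declare an outage only when enough regions agree that the endpoint is down."""
    return len(failing_regions(history, min_consecutive, **criteria)) >= min_regions
```

Requiring agreement across regions means a single probe location with a bad network path does not trigger an alert on its own, at the cost of detecting a real outage one or two check intervals later.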
Modern uptime monitoring platforms (UptimeRobot, Better Stack, Pingdom, Checkly) provide status pages for communicating availability to users, incident management workflows, SLA tracking and reporting, and multi-step synthetic monitoring that simulates user flows (login, checkout, API sequences). Integration with alerting channels ensures the right people are notified via the right medium -- Slack for warnings, PagerDuty for critical outages, email for weekly availability reports.
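As a rough illustration of routing by severity, the mapping described above could be expressed as a small lookup table. The Severity names and channel identifiers here are hypothetical labels for the example; real platforms configure this through their own integration settings rather than code like this.

```python
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    CRITICAL = "critical"
    WEEKLY_REPORT = "weekly_report"


# Assumed channel identifiers; these are labels for the sketch, not real API names.
ROUTES = {
    Severity.WARNING: ["slack"],
    Severity.CRITICAL: ["pagerduty"],
    Severity.WEEKLY_REPORT: ["email"],
}


def channels_for(severity):
    """Return the channels that should receive a notification of this severity."""
    return list(ROUTES[severity])
```

Keeping the routing in one table makes it easy to review who gets paged for what, and to add a second channel for a severity without touching the alerting logic.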
Prompt Snippet
Configure uptime monitoring with checks from at least 5 geographic regions (US-East, US-West, EU-West, AP-Southeast, AP-Northeast) at 60-second intervals. Monitor the primary API endpoint (GET /healthz expecting 200), the marketing site, and critical API flows (POST /api/auth/login with test credentials). Set alert thresholds: notify Slack after 2 consecutive failures from 2+ regions, escalate to PagerDuty after 3 consecutive failures. Monitor SSL certificate expiration with a 14-day warning threshold. Publish a public status page at status.example.com with real-time component statuses and a 90-day incident history. Track monthly SLA percentage targeting 99.95% uptime.
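The thresholds in the snippet above can be made concrete with a short sketch. The function and constant names are hypothetical, and one reading is assumed where the snippet is ambiguous: the PagerDuty escalation is taken to require the same 2+ failing regions as the Slack notification.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the prompt snippet; names are illustrative only.
MIN_REGIONS = 2
SLACK_AFTER = 2          # consecutive failures before notifying Slack
PAGERDUTY_AFTER = 3      # consecutive failures before escalating to PagerDuty
CERT_WARNING = timedelta(days=14)
SLA_TARGET = 0.9995      # 99.95% monthly uptime


def escalation_level(consecutive_failures, failing_region_count):
    """Return "none", "slack", or "pagerduty" for the current failure streak.

    Assumption: both levels require at least MIN_REGIONS failing regions.
    """
    if failing_region_count < MIN_REGIONS:
        return "none"
    if consecutive_failures >= PAGERDUTY_AFTER:
        return "pagerduty"
    if consecutive_failures >= SLACK_AFTER:
        return "slack"
    return "none"


def cert_needs_warning(not_after, now=None):
    """Return True if the certificate expires within the warning window."""
    now = now or datetime.now(timezone.utc)
    return not_after - now <= CERT_WARNING


def downtime_budget_minutes(days_in_month, target=SLA_TARGET):
    """Return the minutes of downtime a month can absorb while meeting the target."""
    return days_in_month * 24 * 60 * (1 - target)
```

For a 30-day month, a 99.95% target leaves a downtime budget of about 21.6 minutes (43,200 minutes x 0.0005), which is roughly 21 failed checks at a 60-second interval.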
Related Terms
Health Check Endpoints
Expose HTTP endpoints that report application health status for use by load balancers, orchestrators, and monitoring systems.
Application Monitoring (APM)
Monitor application performance, trace requests across services, and identify bottlenecks using APM instrumentation.
Error Tracking (Sentry)
Capture, aggregate, and triage application errors in real-time with full stack traces and contextual data.
SSL/TLS Certificate Management
Provision, configure, and renew SSL/TLS certificates to encrypt traffic between clients and servers.