Load Balancing
Distribute incoming network traffic across multiple server instances to ensure reliability and optimal resource utilization.
Description
Load balancing distributes incoming network requests across multiple backend server instances to prevent any single server from becoming a bottleneck. It improves application availability (if one server fails, traffic routes to healthy ones), enables horizontal scaling (add more instances to handle more traffic), and supports zero-downtime deployments (drain connections from old instances while routing new traffic to updated ones).
Load balancers operate at different layers: L4 (transport layer) balancing routes TCP/UDP connections based on IP and port, offering high throughput with minimal overhead. L7 (application layer) balancing inspects HTTP headers, URLs, and cookies to make more intelligent routing decisions, enabling features like path-based routing, header-based routing, cookie-based session affinity, and request-based health checking. Most modern web applications use L7 load balancing.
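The L7 routing decisions described above can be sketched in a few lines. This is a hedged illustration, not a real proxy: the pool names, the `backend` affinity cookie, and the `/api/` path prefix are all hypothetical, chosen only to show the kind of request attributes (paths, cookies) that an L4 balancer cannot see.

```python
def route_request(path: str, cookies: dict[str, str]) -> str:
    """Pick a backend pool using application-layer (L7) request attributes.

    An L4 balancer sees only IPs and ports; this sketch routes on the URL
    path and a session-affinity cookie instead.
    """
    # Cookie-based session affinity: a (hypothetical) "backend" cookie
    # pins the client to the pool it was first routed to.
    if "backend" in cookies:
        return cookies["backend"]
    # Path-based routing: send API traffic to a dedicated pool.
    if path.startswith("/api/"):
        return "api-pool"
    # Default pool for everything else.
    return "web-pool"
```

For example, `route_request("/api/users", {})` selects the API pool, while a request carrying `{"backend": "web-pool"}` sticks to its pinned pool regardless of path.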
Common load balancing algorithms include round-robin (equal distribution), weighted round-robin (proportional to server capacity), least connections (route to the server handling the fewest active requests), IP hash (consistent routing based on client IP for session affinity), and power-of-two-choices (sample two servers at random and route to the less loaded one, approximating least-connections at much lower bookkeeping cost). Cloud providers offer managed load balancers (AWS ALB/NLB, GCP Cloud Load Balancing, Azure Load Balancer) that integrate with auto-scaling groups, health checks, and TLS certificate management.
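Three of the algorithms above can be sketched as follows. This is a minimal illustration that assumes per-server active-request counts are tracked elsewhere; real balancers also handle health state, weights, and concurrency.

```python
import itertools
import random

def round_robin(servers: list[str]):
    """Return an iterator that cycles through servers in order (equal distribution)."""
    return itertools.cycle(servers)

def least_connections(active: dict[str, int]) -> str:
    """Route to the server currently handling the fewest active requests."""
    return min(active, key=active.get)

def two_choices(active: dict[str, int], rng=random) -> str:
    """Power-of-two-choices: sample two servers at random, pick the less loaded.

    Avoids scanning every server (cheap like random) while staying close
    to least-connections quality.
    """
    a, b = rng.sample(list(active), 2)
    return a if active[a] <= active[b] else b
```

With two servers, `two_choices({"s1": 1, "s2": 9})` always returns `"s1"`, since both candidates are sampled and the less loaded one wins.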
Prompt Snippet
Deploy an AWS Application Load Balancer (ALB) with L7 routing. Configure target groups with health checks hitting /healthz every 15s, healthy threshold of 2, unhealthy threshold of 3, and a 5s timeout. Set the deregistration delay to 60s for graceful connection draining during deployments. Use least-outstanding-requests routing algorithm for even distribution across instances with varying response times. Enable access logging to S3, configure WAF integration for OWASP Top 10 protection, and set idle timeout to 120s for long-polling or WebSocket connections.
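The target-group portion of the snippet above can be sketched as boto3-style parameters for `elbv2.create_target_group` and `elbv2.modify_target_group_attributes`. This is a hedged fragment, not a full deployment: the target-group name and application port are assumptions, and the ALB itself, listener, S3 access logging, and WAF wiring are omitted.

```python
def target_group_params(vpc_id: str) -> dict:
    """Target-group parameters matching the health-check settings in the snippet."""
    return {
        "Name": "app-tg",           # hypothetical name
        "Protocol": "HTTP",
        "Port": 8080,               # assumed application port
        "VpcId": vpc_id,
        "HealthCheckPath": "/healthz",
        "HealthCheckIntervalSeconds": 15,
        "HealthyThresholdCount": 2,
        "UnhealthyThresholdCount": 3,
        "HealthCheckTimeoutSeconds": 5,
    }

# Attributes applied after creation via modify_target_group_attributes:
# connection draining window and the ALB routing algorithm.
TG_ATTRIBUTES = [
    {"Key": "deregistration_delay.timeout_seconds", "Value": "60"},
    {"Key": "load_balancing.algorithm.type", "Value": "least_outstanding_requests"},
]
```

Keeping these values in one place makes it easy to assert in CI that the deployed health-check cadence and draining window match what the deployment process assumes.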
Tags
Related Terms
Reverse Proxy (Nginx/Caddy)
Route incoming HTTP requests through a reverse proxy that handles TLS termination, routing, and request buffering.
Auto-Scaling
Automatically adjust the number of running application instances based on real-time demand metrics.
Health Check Endpoints
Expose HTTP endpoints that report application health status for use by load balancers, orchestrators, and monitoring systems.
Zero-Downtime Deployments
Deploy application updates without any period of unavailability by gradually replacing old instances with new ones.
Container Orchestration (Kubernetes basics)
Automate deployment, scaling, and management of containerized applications using Kubernetes.