Load Balancing

Distribute incoming network traffic across multiple server instances to ensure reliability and optimal resource utilization.

Also known as: load balancer, LB, traffic distribution, L4/L7 load balancing

Description

Load balancing distributes incoming network requests across multiple backend server instances to prevent any single server from becoming a bottleneck. It improves application availability (if one server fails, traffic routes to healthy ones), enables horizontal scaling (add more instances to handle more traffic), and supports zero-downtime deployments (drain connections from old instances while routing new traffic to updated ones).
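The failover behavior described above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical backend addresses and a health set that would in practice be maintained by periodic health checks:

```python
import itertools

class LoadBalancer:
    """Round-robin over healthy backends only; unhealthy ones are skipped."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(backends)          # updated by health checks
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def route(self):
        # Walk the rotation, skipping instances currently marked unhealthy,
        # so traffic only ever reaches live servers.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")

# Illustrative addresses, not from the source.
lb = LoadBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
lb.mark_down("10.0.0.2")
# Subsequent route() calls alternate between the two healthy instances.
```

If the failed instance recovers, `mark_up` returns it to the rotation with no client-visible disruption, which is the availability property described above.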

Load balancers operate at different layers: L4 (transport layer) balancing routes TCP/UDP connections based on IP and port, offering high throughput with minimal overhead. L7 (application layer) balancing inspects HTTP headers, URLs, and cookies to make more intelligent routing decisions, enabling features like path-based routing, header-based routing, cookie-based session affinity, and request-based health checking. Most modern web applications use L7 load balancing.
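The L7 routing decisions described above can be sketched as a function over request attributes. The pool names and rules here are hypothetical, chosen only to illustrate path-based and cookie-based routing; an L4 balancer, by contrast, would see only the connection 4-tuple (source IP/port, destination IP/port) and could not make these distinctions:

```python
def route_l7(request):
    """Pick a backend pool by inspecting the HTTP request (L7 routing)."""
    # Path-based routing: API traffic goes to a dedicated pool.
    if request["path"].startswith("/api/"):
        return "api-pool"
    # Cookie-based routing: a canary cookie pins the session to canary instances.
    cookie = request.get("headers", {}).get("cookie", "")
    if "canary=true" in cookie:
        return "canary-pool"
    # Everything else goes to the default web pool.
    return "web-pool"

route_l7({"path": "/api/users", "headers": {}})  # → "api-pool"
```

Real L7 balancers express the same idea as declarative listener rules rather than code, but the decision structure is the same: match on request attributes, forward to a target pool.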

Common load balancing algorithms include round-robin (equal distribution), weighted round-robin (proportional to server capacity), least connections (route to the server handling the fewest active requests), IP hash (consistent routing based on client IP for session affinity), and power-of-two-choices (pick two servers at random and route to the less loaded one, a cheap approximation of least connections). Cloud providers offer managed load balancers (AWS ALB/NLB, GCP Cloud Load Balancing, Azure Load Balancer) that integrate with auto-scaling groups, health checks, and TLS certificate management.
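Three of these algorithms fit in a few lines each. The server names and active-connection counts below are illustrative:

```python
import hashlib
import random

def least_connections(active):
    """Route to the server with the fewest active requests.

    `active` maps server name -> current active request count.
    """
    return min(active, key=active.get)

def ip_hash(client_ip, servers):
    """Consistently map a client IP to the same server (session affinity).

    A stable hash (not Python's randomized hash()) keeps the mapping
    consistent across processes and restarts, as long as the server
    list is unchanged.
    """
    digest = int(hashlib.sha256(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

def two_random_choices(active):
    """Power of two choices: sample two servers, route to the less loaded one."""
    a, b = random.sample(list(active), 2)
    return a if active[a] <= active[b] else b
```

Power-of-two-choices is popular in practice because it avoids both the coordination cost of tracking exact global load and the worst-case imbalance of purely random selection.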

Prompt Snippet

Deploy an AWS Application Load Balancer (ALB) with L7 routing. Configure target groups with health checks hitting /healthz every 15s, a healthy threshold of 2, an unhealthy threshold of 3, and a 5s timeout. Set the deregistration delay to 60s for graceful connection draining during deployments. Use the least-outstanding-requests routing algorithm for even distribution across instances with varying response times. Enable access logging to S3, configure WAF integration for OWASP Top 10 protection, and set the idle timeout to 120s for long-polling or WebSocket connections.
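The settings in this snippet map onto a boto3 sketch like the one below. This is a hedged illustration, not a runnable deployment: the target group name, VPC ID, load balancer ARN, and bucket name are placeholders, and the calls require real AWS resources and credentials. WAF association would be a separate `wafv2` API call not shown here.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Health-check settings from the snippet: /healthz every 15s,
# thresholds 2 (healthy) / 3 (unhealthy), 5s timeout.
tg = elbv2.create_target_group(
    Name="app-tg",                       # placeholder name
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",       # placeholder VPC
    HealthCheckPath="/healthz",
    HealthCheckIntervalSeconds=15,
    HealthyThresholdCount=2,
    UnhealthyThresholdCount=3,
    HealthCheckTimeoutSeconds=5,
)
tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

# Connection draining and the routing algorithm are target-group attributes.
elbv2.modify_target_group_attributes(
    TargetGroupArn=tg_arn,
    Attributes=[
        {"Key": "deregistration_delay.timeout_seconds", "Value": "60"},
        {"Key": "load_balancing.algorithm.type",
         "Value": "least_outstanding_requests"},
    ],
)

# Idle timeout and S3 access logging are load-balancer attributes
# (assumes the ALB and the log bucket already exist).
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:...",   # placeholder ARN
    Attributes=[
        {"Key": "idle_timeout.timeout_seconds", "Value": "120"},
        {"Key": "access_logs.s3.enabled", "Value": "true"},
        {"Key": "access_logs.s3.bucket", "Value": "my-alb-logs"},  # placeholder
    ],
)
```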

Tags

load-balancing, scaling, availability, networking, infrastructure