Load Balancing
Distribute incoming network traffic across multiple server instances to ensure reliability and optimal resource utilization.
Description
Load balancing distributes incoming network requests across multiple backend server instances to prevent any single server from becoming a bottleneck. It improves application availability (if one server fails, traffic routes to healthy ones), enables horizontal scaling (add more instances to handle more traffic), and supports zero-downtime deployments (drain connections from old instances while routing new traffic to updated ones).
Load balancers operate at different layers: L4 (transport layer) balancing routes TCP/UDP connections based on IP and port, offering high throughput with minimal overhead. L7 (application layer) balancing inspects HTTP headers, URLs, and cookies to make more intelligent routing decisions, enabling features like path-based routing, header-based routing, cookie-based session affinity, and request-based health checking. Most modern web applications use L7 load balancing.
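The L7 routing decisions described above can be sketched in a few lines. This is a hedged illustration, not a real proxy: the pool names, the `backend` affinity cookie, and the `/api/` path prefix are all hypothetical, chosen only to show the kind of request attributes (paths, cookies) that an L4 balancer cannot see.

```python
def route_request(path: str, cookies: dict[str, str]) -> str:
    """Pick a backend pool using application-layer (L7) request attributes.

    An L4 balancer sees only IPs and ports; this sketch routes on the URL
    path and a session-affinity cookie instead.
    """
    # Cookie-based session affinity: a (hypothetical) "backend" cookie
    # pins the client to the pool it was first routed to.
    if "backend" in cookies:
        return cookies["backend"]
    # Path-based routing: send API traffic to a dedicated pool.
    if path.startswith("/api/"):
        return "api-pool"
    # Default pool for everything else.
    return "web-pool"
```

For example, `route_request("/api/users", {})` selects the API pool, while a request carrying `{"backend": "web-pool"}` sticks to its pinned pool regardless of path.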
Common load balancing algorithms include round-robin (equal distribution), weighted round-robin (proportional to server capacity), least connections (route to the server handling the fewest active requests), IP hash (consistent routing based on client IP for session affinity), and power-of-two-choices (sample two servers at random and route to the less loaded one, approximating least-connections at much lower bookkeeping cost). Cloud providers offer managed load balancers (AWS ALB/NLB, GCP Cloud Load Balancing, Azure Load Balancer) that integrate with auto-scaling groups, health checks, and TLS certificate management.
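Three of the algorithms above can be sketched as follows. This is a minimal illustration that assumes per-server active-request counts are tracked elsewhere; real balancers also handle health state, weights, and concurrency.

```python
import itertools
import random

def round_robin(servers: list[str]):
    """Return an iterator that cycles through servers in order (equal distribution)."""
    return itertools.cycle(servers)

def least_connections(active: dict[str, int]) -> str:
    """Route to the server currently handling the fewest active requests."""
    return min(active, key=active.get)

def two_choices(active: dict[str, int], rng=random) -> str:
    """Power-of-two-choices: sample two servers at random, pick the less loaded.

    Avoids scanning every server (cheap like random) while staying close
    to least-connections quality.
    """
    a, b = rng.sample(list(active), 2)
    return a if active[a] <= active[b] else b
```

With two servers, `two_choices({"s1": 1, "s2": 9})` always returns `"s1"`, since both candidates are sampled and the less loaded one wins.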
Prompt Snippet
Deploy an AWS Application Load Balancer (ALB) with L7 routing. Configure target groups with health checks hitting /healthz every 15s, healthy threshold of 2, unhealthy threshold of 3, and a 5s timeout. Set the deregistration delay to 60s for graceful connection draining during deployments. Use least-outstanding-requests routing algorithm for even distribution across instances with varying response times. Enable access logging to S3, configure WAF integration for OWASP Top 10 protection, and set idle timeout to 120s for long-polling or WebSocket connections.
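The target-group portion of the snippet above can be sketched as boto3-style parameters for `elbv2.create_target_group` and `elbv2.modify_target_group_attributes`. This is a hedged fragment, not a full deployment: the target-group name and application port are assumptions, and the ALB itself, listener, S3 access logging, and WAF wiring are omitted.

```python
def target_group_params(vpc_id: str) -> dict:
    """Target-group parameters matching the health-check settings in the snippet."""
    return {
        "Name": "app-tg",           # hypothetical name
        "Protocol": "HTTP",
        "Port": 8080,               # assumed application port
        "VpcId": vpc_id,
        "HealthCheckPath": "/healthz",
        "HealthCheckIntervalSeconds": 15,
        "HealthyThresholdCount": 2,
        "UnhealthyThresholdCount": 3,
        "HealthCheckTimeoutSeconds": 5,
    }

# Attributes applied after creation via modify_target_group_attributes:
# connection draining window and the ALB routing algorithm.
TG_ATTRIBUTES = [
    {"Key": "deregistration_delay.timeout_seconds", "Value": "60"},
    {"Key": "load_balancing.algorithm.type", "Value": "least_outstanding_requests"},
]
```

Keeping these values in one place makes it easy to assert in CI that the deployed health-check cadence and draining window match what the deployment process assumes.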
Tags
Related Terms
Reverse Proxy (Nginx/Caddy)
Route incoming HTTP requests through a reverse proxy that handles TLS termination, routing, and request buffering.
Auto-Scaling
Automatically adjust the number of running application instances based on real-time demand metrics.
Health Check Endpoints
Expose HTTP endpoints that report application health status for use by load balancers, orchestrators, and monitoring systems.
Zero-Downtime Deployments
Deploy application updates without any period of unavailability by gradually replacing old instances with new ones.
Container Orchestration (Kubernetes basics)
Automate deployment, scaling, and management of containerized applications using Kubernetes.