Infra · advanced

Blue-Green Deployments

Maintain two identical production environments and switch traffic between them for instant deployment and rollback.

Also known as: blue-green, blue/green deployment, red-black deployment, A/B deployment

Description

Blue-green deployment is a release strategy that maintains two identical production environments, conventionally called 'blue' (currently live) and 'green' (the new version). The new version is deployed to the idle environment and tested there, ideally with production-like traffic, while the live environment keeps serving users. Traffic is then switched from blue to green at the load balancer or DNS level. The old environment remains intact as an instant rollback target. On the next release the roles swap, and the now-idle blue environment receives the next version.
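
A minimal sketch of the release flow, assuming a toy in-memory model rather than real infrastructure. `BlueGreenRouter`, `deploy`, `rollback`, and `serve` are hypothetical names. Each environment ("slot") holds a version string, and the switch is a single pointer flip from the live slot to the idle one. That is why rollback is just flipping it back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class BlueGreenRouter:
    """Hypothetical in-memory model: two slots, one of which serves live traffic."""
    # Version deployed in each slot; green starts empty in this toy model.
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": "v1", "green": None}
    )
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, smoke_test: Callable[[str], bool]) -> bool:
        """Install `version` on the idle slot, test that slot, then flip traffic."""
        target = self.idle
        self.slots[target] = version  # the live slot is never modified
        if not smoke_test(target):
            return False              # live slot keeps serving; nothing to undo
        self.live = target            # the cutover is a single assignment
        return True

    def rollback(self) -> None:
        """Point traffic back at the other slot, which still has the old version."""
        self.live = self.idle

    def serve(self) -> Optional[str]:
        """Version that a user request would hit right now."""
        return self.slots[self.live]


router = BlueGreenRouter()
router.deploy("v2", smoke_test=lambda slot: True)  # green (v2) becomes live
router.rollback()                                   # blue (v1) is live again
```

A failed smoke test never touches the live slot, so there is nothing to undo. After a successful switch, `rollback()` restores the previous version because the old slot was left intact.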

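The traffic switch can be implemented via DNS record updates, load balancer target group swaps, or reverse proxy configuration changes. DNS updates are slow because clients and resolvers may keep using cached records until the TTL expires. Load balancer target group swaps are fast and are the preferred option. AWS CodeDeploy supports blue-green natively, including for ECS services fronted by an ALB with two target groups. In Kubernetes, the pattern is commonly implemented by pointing a Service's label selector at one of two Deployments. The key advantage over rolling deployments is that, with a load balancer or proxy switch, the cutover is effectively atomic. All new requests reach the new version at once, rather than a mix of old and new instances, and rollback is a simple revert of the traffic switch.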
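
For the load balancer approach, here is a sketch of the switch on an AWS Application Load Balancer. It assumes the client passed in is `boto3.client("elbv2")` and that the listener and both target groups already exist. The function names and ARNs are hypothetical. `modify_listener` with a single `forward` default action sends subsequent requests to the chosen target group. The client is injected so the logic can be exercised with a stub.

```python
from typing import Any, Dict, List


def forward_action(target_group_arn: str) -> List[Dict[str, Any]]:
    """Default-action list that forwards all requests to one target group."""
    return [{"Type": "forward", "TargetGroupArn": target_group_arn}]


def switch_traffic(elbv2_client: Any, listener_arn: str, target_group_arn: str) -> None:
    """Point the listener's default action at `target_group_arn` (hypothetical helper).

    `elbv2_client` is assumed to be boto3.client("elbv2"). The change happens
    inside the load balancer, so it does not wait on clients' DNS caches.
    """
    elbv2_client.modify_listener(
        ListenerArn=listener_arn,
        DefaultActions=forward_action(target_group_arn),
    )
```

Rolling back is the same call with the blue target group's ARN. In Kubernetes, the analogous move is editing the Service's `spec.selector` so it matches the pods of the other Deployment, for example changing a hypothetical `slot: blue` label to `slot: green`.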

The primary cost of blue-green deployments is maintaining double the infrastructure during and between deployments. You can reduce this cost by scaling the idle environment down to minimal capacity between releases and scaling it up again before the next deployment. Database schema changes require special care, because both environments typically share the same database. Migrations must be backward-compatible and follow an expand-contract pattern, in which the schema supports both the old and new application versions at the same time. The contract step, which removes the old columns or tables, runs only after the old version is no longer a rollback target.
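
To make the expand-contract rule concrete, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for the shared production database. The scenario is hypothetical: renaming `users.name` to `users.full_name`. The expand step only adds a nullable column. The new version writes both columns and reads with a fallback, and the old version keeps working unchanged, so either environment can take traffic at any point.

```python
import sqlite3
from typing import List


def expand(conn: sqlite3.Connection) -> None:
    # Expand: additive, nullable column; the old version never references it.
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def old_app_insert(conn: sqlite3.Connection, name: str) -> None:
    # Blue (old version): only knows the `name` column.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def old_app_names(conn: sqlite3.Connection) -> List[str]:
    return [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]


def new_app_insert(conn: sqlite3.Connection, name: str) -> None:
    # Green (new version): dual-writes so the old version still sees the data.
    conn.execute("INSERT INTO users (name, full_name) VALUES (?, ?)", (name, name))


def new_app_names(conn: sqlite3.Connection) -> List[str]:
    # Falls back to `name` for rows written by the old version.
    query = "SELECT COALESCE(full_name, name) FROM users ORDER BY id"
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
old_app_insert(conn, "Ada")
expand(conn)
old_app_insert(conn, "Grace")  # blue still works after the expand step
new_app_insert(conn, "Linus")  # green works against the same schema
print(old_app_names(conn))     # ['Ada', 'Grace', 'Linus']
print(new_app_names(conn))     # ['Ada', 'Grace', 'Linus']
```

The contract step, which backfills `full_name` and drops `name`, would run in a later release.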

Prompt Snippet

Implement blue-green deployments using AWS ECS with two target groups registered to a single ALB. Deploy the new version to the inactive target group, then run automated smoke tests and synthetic transactions against it through a separate test listener on port 8443. Once validated, use CodeDeploy to shift all ALB production traffic from the active target group to the new one in a single step. Retain the previous environment for 1 hour as a rollback target. Trigger an automated rollback if the CloudWatch error-rate alarm (threshold: 1%) fires within 10 minutes of the switch. Scale down the old environment after this 1-hour bake period.
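
This sketch shows one way the prompt's parameters could map onto a CodeDeploy deployment group. It builds keyword arguments for the `create_deployment_group` call on boto3's `codedeploy` client. All names, ARNs, and the helper `blue_green_group_kwargs` are hypothetical, and the sketch assumes the ECS service already uses the `CODE_DEPLOY` deployment controller. `CodeDeployDefault.ECSAllAtOnce` shifts all production traffic in one step. `terminationWaitTimeInMinutes=60` keeps the original task set for the 1-hour rollback window, and alarm-based auto-rollback covers the error-rate condition. The "within 10 minutes of the switch" window has no direct field here. This sketch assumes that window is handled by how the CloudWatch alarm itself is defined.

```python
from typing import Any, Dict, List


def blue_green_group_kwargs(
    *,
    app_name: str,
    group_name: str,
    service_role_arn: str,
    cluster: str,
    service: str,
    blue_tg: str,
    green_tg: str,
    prod_listener_arn: str,
    test_listener_arn: str,
    alarm_names: List[str],
) -> Dict[str, Any]:
    """Hypothetical helper: kwargs for codedeploy.create_deployment_group."""
    return {
        "applicationName": app_name,
        "deploymentGroupName": group_name,
        "serviceRoleArn": service_role_arn,
        # Shift 100% of production traffic in a single step.
        "deploymentConfigName": "CodeDeployDefault.ECSAllAtOnce",
        "deploymentStyle": {
            "deploymentType": "BLUE_GREEN",
            "deploymentOption": "WITH_TRAFFIC_CONTROL",
        },
        "ecsServices": [{"clusterName": cluster, "serviceName": service}],
        "loadBalancerInfo": {
            "targetGroupPairInfoList": [{
                "targetGroups": [{"name": blue_tg}, {"name": green_tg}],
                "prodTrafficRoute": {"listenerArns": [prod_listener_arn]},
                # Test listener (port 8443 in the prompt) reaches the new task set first.
                "testTrafficRoute": {"listenerArns": [test_listener_arn]},
            }]
        },
        "blueGreenDeploymentConfiguration": {
            # Reroute production traffic as soon as the new task set is ready.
            "deploymentReadyOption": {
                "actionOnTimeout": "CONTINUE_DEPLOYMENT",
                "waitTimeInMinutes": 0,
            },
            # Keep the original task set for 60 minutes before terminating it.
            "terminateBlueInstancesOnDeploymentSuccess": {
                "action": "TERMINATE",
                "terminationWaitTimeInMinutes": 60,
            },
        },
        "alarmConfiguration": {
            "enabled": True,
            "alarms": [{"name": name} for name in alarm_names],
        },
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
    }
```

Usage would look like `boto3.client("codedeploy").create_deployment_group(**blue_green_group_kwargs(...))`. The smoke tests against the 8443 test listener would typically run as a Lambda hook, such as `AfterAllowTestTraffic`, in the deployment's AppSpec file. A failed hook fails the deployment before production traffic moves.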