AWS load balancers and instance health checks with Terraform
— October 7, 2018

An Auto Scaling Group (ASG) is an AWS feature that lets you manage the size of a cluster (group) of similar instances. You create an ASG with a minimum and a maximum number of instances launched from a particular image. In other words, it is a group of instances that scales automatically.

In its simplest form, an ASG relies on health checks to detect unhealthy instances, which it then replaces. In a more advanced setup, it can also be configured to scale the number of instances up or down depending on how heavily the instances are used.
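As a rough illustration of that more advanced setup, the sketch below attaches a target tracking policy to an ASG so that instances are added or removed to keep average CPU near a target. It assumes the ASG is the aws_autoscaling_group.my-alb-asg resource defined later in this post; the policy name and the 50% target are arbitrary values picked for the example.

```hcl
# Hypothetical scaling policy; the name and target value are illustrative only.
resource "aws_autoscaling_policy" "cpu-target-tracking" {
  name                   = "cpu-target-tracking"
  autoscaling_group_name = "${aws_autoscaling_group.my-alb-asg.name}"
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      # Average CPU utilization across the instances in the group
      predefined_metric_type = "ASGAverageCPUUtilization"
    }

    # Scale out above, and in below, roughly 50% average CPU
    target_value = 50.0
  }
}
```

The group still stays within its min_size and max_size, whatever the policy asks for.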

Health check types

From what I understand, there are three major types of health checks that AWS provides (not counting custom health checks).

AWS EC2 Status Checks

For the most basic ASG, the health checks are simply based on the EC2 instance's vitals, such as loss of system power, networking issues or memory exhaustion. You can read more in the AWS documentation, under "Status checks for your instances".
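A minimal sketch of such a basic ASG might look like the following. The resource name, the launch configuration aws_launch_configuration.my-app and the variable var.subnet_ids are placeholders invented for this example and are not defined anywhere in this post.

```hcl
# Hypothetical ASG that relies only on EC2 status checks.
resource "aws_autoscaling_group" "basic-asg" {
  name                 = "basic-asg"
  min_size             = 1
  max_size             = 3
  launch_configuration = "${aws_launch_configuration.my-app.name}" # placeholder
  vpc_zone_identifier  = ["${var.subnet_ids}"]                     # placeholder list of subnet IDs

  # "EC2" means an instance is replaced only when its status checks fail
  health_check_type = "EC2"

  # Seconds to wait after launch before the first health check counts
  health_check_grace_period = 300
}
```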

These are default checks and are readily available to use. However, there are a couple of issues that you may see with these checks:

They can only tell you about instance or system health, not application-level health. This means that you have to rely on metrics that may not reflect the true state of your application. For example, CPU consumption may be low and the network may be fine, but the application itself has crashed.
You cannot truly gauge the load on your instances, so scaling the number of instances up or down may not be feasible with these checks alone.

For this reason, AWS lets you put a load balancer in front of your instances. Amazon calls this service Elastic Load Balancing.

Elastic Load Balancing Health Checks - Classic Load Balancer

Classic Load Balancer (CLB) is meant mostly for the EC2-Classic network. New customers can no longer launch instances in EC2-Classic, but the limitations of the CLB are still worth writing about.

The idea is simple enough: you define the CLB, you define a health check, and the load balancer stops routing traffic to any unhealthy instance in the group. This is already an improvement over the EC2 Status Checks, because LB health checks let you define more granular checks and rely on a "200 OK" response from your application.
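For comparison with the ALB examples later on, here is a sketch of a CLB with its health check. The resource name and var.subnet_ids are placeholders made up for this example; the health check values simply mirror the ones used in the target group examples in this post.

```hcl
# Hypothetical Classic Load Balancer; names and subnets are placeholders.
resource "aws_elb" "my-app-clb" {
  name    = "my-app-clb"
  subnets = ["${var.subnet_ids}"]

  listener {
    lb_port           = 80
    lb_protocol       = "http"
    instance_port     = 80
    instance_protocol = "http"
  }

  # A CLB takes a single health_check block that applies to every instance
  health_check {
    target              = "HTTP:2001/api/1/resolve/default?path=/service/my-service"
    healthy_threshold   = 6
    unhealthy_threshold = 2
    timeout             = 2
    interval            = 5
  }
}
```

Note that the protocol, port and path are all packed into the single target string.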

The downside of CLBs is that you have only one health check per LB. If you want to have more than one health check, then you have to create new LBs and point them to the backend instances. This can grow cumbersome real fast but there are other ways to do more checks without maintaining multiple LBs. One of them is the Application Load Balancer.

Elastic Load Balancing Health Checks - Application Load Balancer

Application Load Balancer (ALB) is strictly a Layer 7 load balancer. It is better than the Classic Load Balancer in several ways:

Host-based routing via HTTP Host header along with Path-based routing.
HTTP/2 support.
Wider range of health check success codes (anything from 200 to 499, where a CLB accepts only 200).
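To illustrate the last point, an ALB target group can treat a range of status codes as healthy through the matcher argument. This is only a sketch: the target group name, the /health path and var.vpc_id are assumptions made for the example.

```hcl
# Hypothetical target group whose health check accepts redirects as healthy.
resource "aws_alb_target_group" "redirecting-service" {
  name     = "redirecting-service"
  port     = 80
  protocol = "HTTP"
  vpc_id   = "${var.vpc_id}" # placeholder VPC ID

  health_check {
    path = "/health" # placeholder path

    # A range of codes; a comma-separated list such as "200,202" also works
    matcher = "200-399"
  }
}
```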

The major point to remember is that you create target groups, which each have one health check. Then you can configure a listener for ALB and provide rules to the listener that tell it to route to a particular target group.

Overview of steps to create an ALB

The two major resources that you need to pay attention to are Listeners and Target Groups.

Target Groups and health checks

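Set up Target Groups and configure a health check for each group.

# An example of a target group
resource "aws_alb_target_group" "target-group-1" {
  name     = "target-group-1"
  port     = 80
  protocol = "HTTP"

  # Required for instance targets; var.vpc_id is a placeholder for your VPC ID
  vpc_id = "${var.vpc_id}"

  lifecycle {
    create_before_destroy = true
  }

  health_check {
    path                = "/api/1/resolve/default?path=/service/my-service"
    port                = 2001
    healthy_threshold   = 6     # consecutive successes before a target is healthy
    unhealthy_threshold = 2     # consecutive failures before a target is unhealthy
    timeout             = 2     # seconds to wait for a response
    interval            = 5     # seconds between checks
    matcher             = "200" # has to be HTTP 200 or the check fails
  }
}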

Listener and Listener rules

Set up a Listener with Listener Rules that forward requests to the appropriate targets in one or more target groups.

# An example of a Listener
resource "aws_alb_listener" "my-alb-listener" {
  load_balancer_arn = "${aws_alb.my-app-alb.arn}"
  port              = 80
  protocol          = "HTTP"

  # Catch-all: used when no listener rule matches the request
  default_action {
    target_group_arn = "${aws_alb_target_group.target-group-1.arn}"
    type             = "forward"
  }
}

# An example of a Listener rule
resource "aws_alb_listener_rule" "rule-1" {
  listener_arn = "${aws_alb_listener.my-alb-listener.arn}"
  priority     = 100

  action {
    target_group_arn = "${aws_alb_target_group.target-group-1.arn}"
    type             = "forward"
  }

  condition {
    field  = "path-pattern"
    values = ["/api/1/resolve/default"]
  }
}

Conclusion

As you can see above, each target group has its own health check, and the ALB listener rules decide which group receives a request based on conditions such as the path or the Host header. What makes it more convenient is that the listener always has a default action that acts as a catch-all. You can also combine multiple conditions in a single rule, like this:

resource "aws_alb_listener_rule" "multi-condition-rule" {
  listener_arn = "${aws_alb_listener.my-listener.arn}"
  priority     = 108

  action {
    target_group_arn = "${aws_alb_target_group.my-specific-target-group.arn}"
    type             = "forward"
  }

  # Both conditions must match for the rule to apply
  condition {
    field  = "path-pattern"
    values = ["/api/1/resolve/default"]
  }

  condition {
    field  = "host-header"
    values = ["example.org"]
  }
}

Finally, here is a semi-complete Terraform configuration to give you an idea of how the pieces fit together. I have also provided the code as a GitHub Gist: AWS Auto Scaling Group with Application Load Balancer using Terraform.

# Create a basic ALB
# (subnets are omitted here; an ALB needs subnets in at least two Availability Zones)
resource "aws_alb" "my-app-alb" {
  name = "my-app-alb"
}

# Create target groups with one health check per group
resource "aws_alb_target_group" "target-group-1" {
  name     = "target-group-1"
  port     = 80
  protocol = "HTTP"

  # Required for instance targets; var.vpc_id is a placeholder for your VPC ID
  vpc_id = "${var.vpc_id}"

  lifecycle {
    create_before_destroy = true
  }

  health_check {
    path                = "/api/1/resolve/default?path=/service/my-service"
    port                = 2001
    healthy_threshold   = 6
    unhealthy_threshold = 2
    timeout             = 2
    interval            = 5
    matcher             = "200"
  }
}

resource "aws_alb_target_group" "target-group-2" {
  name     = "target-group-2"
  port     = 80
  protocol = "HTTP"

  # Same placeholder VPC ID as above
  vpc_id = "${var.vpc_id}"

  lifecycle {
    create_before_destroy = true
  }

  health_check {
    path                = "/api/2/resolve/default?path=/service/my-service"
    port                = 2010
    healthy_threshold   = 6
    unhealthy_threshold = 2
    timeout             = 2
    interval            = 5
    matcher             = "200"
  }
}

# Create a Listener
resource "aws_alb_listener" "my-alb-listener" {
  load_balancer_arn = "${aws_alb.my-app-alb.arn}"
  port              = 80
  protocol          = "HTTP"

  default_action {
    target_group_arn = "${aws_alb_target_group.target-group-1.arn}"
    type             = "forward"
  }
}

# Create Listener Rules
resource "aws_alb_listener_rule" "rule-1" {
  listener_arn = "${aws_alb_listener.my-alb-listener.arn}"
  priority     = 100

  action {
    target_group_arn = "${aws_alb_target_group.target-group-1.arn}"
    type             = "forward"
  }

  condition {
    field  = "path-pattern"
    values = ["/api/1/resolve/default"]
  }
}

resource "aws_alb_listener_rule" "rule-2" {
  listener_arn = "${aws_alb_listener.my-alb-listener.arn}"
  priority     = 101

  action {
    target_group_arn = "${aws_alb_target_group.target-group-2.arn}"
    type             = "forward"
  }

  condition {
    field  = "path-pattern"
    values = ["/api/2/resolve/default"]
  }
}

# Create an ASG that ties all of this together
# (vpc_zone_identifier or availability_zones must also be set; omitted here)
resource "aws_autoscaling_group" "my-alb-asg" {
  name     = "my-alb-asg"
  min_size = 3
  max_size = 6

  # The launch configuration is defined separately
  launch_configuration = "${aws_launch_configuration.my-app-alb.name}"

  termination_policies = [
    "OldestInstance",
    "OldestLaunchConfiguration",
  ]

  # Use the target group health checks in addition to the EC2 status checks
  health_check_type = "ELB"

  depends_on = [
    "aws_alb.my-app-alb",
  ]

  # Instances are registered with both target groups
  target_group_arns = [
    "${aws_alb_target_group.target-group-1.arn}",
    "${aws_alb_target_group.target-group-2.arn}",
  ]

  lifecycle {
    create_before_destroy = true
  }
}
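
The ASG refers to aws_launch_configuration.my-app-alb, which the listing does not define. A minimal sketch of what it could look like is below; the ami_id variable and the t2.micro instance type are assumptions for the sake of the example, so substitute your own image and size.

```hcl
# Placeholder input; supply the AMI that runs your application.
variable "ami_id" {
  description = "AMI to launch in the Auto Scaling Group"
}

# Hypothetical launch configuration matching the name the ASG refers to.
resource "aws_launch_configuration" "my-app-alb" {
  # name_prefix lets Terraform create a replacement before destroying the old one
  name_prefix   = "my-app-alb-"
  image_id      = "${var.ami_id}"
  instance_type = "t2.micro"

  lifecycle {
    create_before_destroy = true
  }
}
```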

Further ToDo

Learn more about Network Load Balancer.
Comparison to GCP and Azure.
