Application Load Balancer Target Group Register/Deregister Infinite Loop

Setup

Security Groups

  • ALB (inbound rules)

    • HTTPS:443 from 0.0.0.0/0 & ::/0
    • HTTP:80 from 0.0.0.0/0 & ::/0
  • Cluster (inbound rules)

    • All traffic from ALB security group

Cluster

  • instance is t2.micro (only running 1 instance in subnets us-east-1<a,b,c> under default VPC with public IP enabled)
  • client → 0.375 vCPU/0.25 GB, 1 task, bridge network, 0:3000 (host:container)
  • server → 0.25 vCPU/0.25 GB, 2 tasks, bridge network, 0:5000 (host:container)

ALB

  • availability zones: us-east-1<a,b,c>, same default VPC
  • listeners:
    • HTTP:80 → redirect to HTTPS://#{host}:443/#{path}?#{query}
    • HTTPS:443 (/) → forward to client target group
    • HTTPS:443 (/api) → forward to server target group

Target Groups

  • client → HTTP:3000 with default health check of HTTP, /, Traffic Port, 5 healthy, 2 unhealthy, 5s timeout, 30s interval, 200 OK
  • server → HTTP:5000 with health check of HTTP, /api/health, Traffic Port, 5 healthy, 2 unhealthy, 5s timeout, 30s interval, 200 OK

Both Docker images (client and server) work properly locally, and the client service seems to work well in AWS ECS. However, the server service keeps cycling between registering and deregistering (draining) the container instances, seemingly without even becoming unhealthy.

Here is what I see in the service Deployments and events tab:

5/12/2022, 8:43:04 PM   service server registered 2 targets in target-group <...>
5/12/2022, 8:42:54 PM   service server has started 2 tasks: task <...> task <...>.  <...>
5/12/2022, 8:42:51 PM   service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:51 PM   service server has begun draining connections on 1 tasks.   <...>
5/12/2022, 8:42:51 PM   service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:17 PM   service server registered 2 targets in target-group <...>
5/12/2022, 8:42:07 PM   service server has started 2 tasks: task <...> task <...>.  <...>
5/12/2022, 8:42:04 PM   service server deregistered 1 targets in target-group <...>
5/12/2022, 8:42:04 PM   service server has begun draining connections on 1 tasks.   <...>
5/12/2022, 8:42:04 PM   service server deregistered 1 targets in target-group <...> 
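
As a rough sanity check on the events above, the sketch below compares how long a target stays registered with how long the server health check would need to mark it healthy. The timestamps and health check settings are copied from this post. The lower bound of (healthy threshold - 1) intervals is only an approximation that assumes the first check runs immediately after registration.

```python
from datetime import datetime

# Health check settings of the server target group (from the setup above).
INTERVAL_S = 30
HEALTHY_THRESHOLD = 5

# Timestamp format used in the ECS events tab, e.g. "5/12/2022, 8:42:17 PM".
FMT = "%m/%d/%Y, %I:%M:%S %p"


def seconds_between(start: str, end: str) -> float:
    """Return the number of seconds from start to end."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# One full cycle taken from the event list above.
registered = "5/12/2022, 8:42:17 PM"
deregistered = "5/12/2022, 8:42:51 PM"
registered_again = "5/12/2022, 8:43:04 PM"

lifetime = seconds_between(registered, deregistered)
cycle = seconds_between(registered, registered_again)

# Approximate lower bound: 5 consecutive passing checks spaced 30 s apart
# span at least 4 intervals (assumes the first check runs right away).
min_time_to_healthy = (HEALTHY_THRESHOLD - 1) * INTERVAL_S

print(f"target lifetime:     {lifetime:.0f} s")
print(f"cycle length:        {cycle:.0f} s")
print(f"min time to healthy: {min_time_to_healthy} s")
```

Under these assumptions each target is deregistered well before it could have passed enough checks to be reported healthy, which suggests the tasks are stopping for a reason other than the health check result.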

Any ideas?



Solution 1:[1]

After enabling AWS CloudWatch logs in my task definition's container specs, I was able to see that the issue was actually with an AWS RDS instance.
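
For reference, a minimal sketch of what the container's `logConfiguration` might look like with the `awslogs` log driver, built here as a Python dictionary and printed as JSON. The log group name `/ecs/server` and the stream prefix `server` are hypothetical placeholders, the region is taken from the setup in the question, and the log group is assumed to already exist.

```python
import json


def awslogs_configuration(group: str, region: str, stream_prefix: str) -> dict:
    """Build the logConfiguration block for one container definition."""
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": group,                  # CloudWatch log group
            "awslogs-region": region,                # region of the log group
            "awslogs-stream-prefix": stream_prefix,  # prefix for log streams
        },
    }


# Hypothetical names; only the region comes from the original setup.
log_configuration = awslogs_configuration("/ecs/server", "us-east-1", "server")

# This fragment would sit inside the server container definition.
print(json.dumps({"logConfiguration": log_configuration}, indent=2))
```

With logging enabled, the container's stdout and stderr should show up in the given log group, which is where a failing database connection would become visible.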

The RDS instance's SG was still accepting traffic from an old cluster SG (which no longer exists) rather than the current one, so the server tasks could not reach the database. That clears up why a health check wasn't being performed and why the registered instances were draining immediately.
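
One way the fix could be scripted is sketched below, assuming boto3's EC2 client and its `authorize_security_group_ingress` call. The security group IDs and the database port 5432 are hypothetical, since the post names neither the groups nor the database engine. The client is passed in as a parameter so the rule-building part can be inspected without AWS credentials.

```python
def build_db_ingress_rule(cluster_sg_id: str, db_port: int) -> list:
    """Build an IpPermissions list allowing TCP traffic from the cluster SG."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": db_port,
            "ToPort": db_port,
            # Reference the source security group instead of a CIDR range.
            "UserIdGroupPairs": [{"GroupId": cluster_sg_id}],
        }
    ]


def allow_cluster_to_reach_db(ec2_client, rds_sg_id: str,
                              cluster_sg_id: str, db_port: int):
    """Add the inbound rule to the RDS security group."""
    return ec2_client.authorize_security_group_ingress(
        GroupId=rds_sg_id,
        IpPermissions=build_db_ingress_rule(cluster_sg_id, db_port),
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   allow_cluster_to_reach_db(ec2, "sg-0aaaaaaaaaaaaaaaa",
#                             "sg-0bbbbbbbbbbbbbbbb", 5432)
```

The stale rule that points at the deleted cluster SG can then be removed from the RDS security group so that only the current cluster SG is allowed in.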

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 lbragile