'Troubleshoot AWS Fargate healthcheck for spring actuator
I have spring boot application with /health
endpoint accessible deployed in AWS ECS Fargate. Sometimes the container is stopped with Task failed container health checks
message. Sometimes happens once daily, sometimes once a week, maybe depends on the load. This is the healthcheck command specified in the Task Definition
:
CMD-SHELL,curl -f http://localhost/actuator/health || exit 1
My question is how to troubleshoot what AWS receive when health-check is failed.
Solution 1:[1]
In case anyone else lands here because of failing container health checks (not the same as ELB health checks), AWS provides some basic advice:
- Check that the command works from inside the container. In my case I had not installed curl in the container image, but when I tested it from outside the container it worked fine, which fooled me into thinking it was working.
- Check the task logs in CloudWatch
If the checks are only failing sometimes (especially under load), you can try increasing the timeout, but also check the task metrics (memory and CPU usage). Garbage collection can cause the task to pause, and if all the vCPUs are busy handling other requests, the health check may be delayed, so you may need to allocate more memory and/or vCPUs to the task.
Solution 2:[2]
Thank @John Velonis, I don't have enough reputation for commenting on your answer, so I post that in a different answer
For my case, the ecs container keeps getting UNKNOWN status from the ecs cluster. But I can access the healthcheck successfully. when I read this post, and check my base image which is node:14.19.1-alpine3.14, it doesn't have curl command.
so I have to install that in the Dockerfile
RUN apk --no-cache add curl
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | John Velonis |
Solution 2 | Minh Chau |