I set up an AWS ECS cluster to run a Docker image of the Valhalla service almost as-is.
Issue: the target group seems unable to check the cluster's health. It looks as if the health request reaches the cluster, but the container does not "forward" it to Valhalla.
Description:
I created a repository on AWS ECR, and pushed a docker image of gisops/valhalla with only the valhalla.json file changed.
Here is the valhalla configuration I used.
Note that I changed the default listening port from 8002 to 80.
I created an ECS Fargate cluster and a service that uses this task definition to launch a container that runs Valhalla.
The service receives traffic from an application load balancer via port 80.
The target group is checking /status path on port 80.
With everything set, the task is created, and the task logs show that Valhalla initializes and runs without errors.
However, the target group is not able to check the health status: the request seems to time out.
If the request were reaching Valhalla, the task logs would at least show it (Valhalla logs every incoming request by default), but they don't.
Fargate therefore kills the task (Task failed ELB health checks in (target-group {my-target-group-uri})), which suggests that the health request does reach the cluster service.
I don't think the issue is with the Valhalla configuration, because I can run the same Docker image locally and it works perfectly, using:
docker run -dt -p 3000:80 -v /local/path/to/valhalla-files:/custom_files/ --name valhalla gisops/valhalla:latest
And then checking localhost:3000/status
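To reproduce what the target group does, a small probe can be pointed at the same path. This is only a sketch: the function name probe, the 5-second timeout, and the accepted status code 200 are assumptions chosen for illustration, not values read from the target group above. It opens a direct connection, which is roughly how a load balancer reaches a target.

```python
import http.client
from urllib.parse import urlsplit


def probe(url, timeout_s=5.0, ok_codes=(200,)):
    """Return True if a GET to `url` answers in time with an accepted status.

    Hypothetical helper: timeout and accepted codes are illustrative defaults.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(
        parts.hostname, parts.port or 80, timeout=timeout_s
    )
    try:
        conn.request("GET", parts.path or "/")
        return conn.getresponse().status in ok_codes
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response
        return False
    finally:
        conn.close()
```

Calling probe("http://localhost:3000/status") against the local container should return True if the local check really works. Running the same function from a machine inside the VPC, against the task's private IP on port 80, would help tell a timeout on the network path apart from a problem inside the container.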
Does anyone have an idea of what the issue could be?
I've already spent a lot of time on this, and I'm out of ideas. Thanks for your help!
Related
The App Runner service is created successfully and works fine, but any attempt to change the configuration fails with an error. It seems that the health check does not work, although after creation everything works fine.
[AppRunner] Service status is set to RUNNING.
[AppRunner] Service update failed. For details, see service logs.
[AppRunner] Performing health check on path '/healthz' and port '8080'.
[AppRunner] Provisioning instances and deploying image.
[AppRunner] Service status is set to OPERATION_IN_PROGRESS.
[AppRunner] Service update started.
[AppRunner] Service status is set to RUNNING.
[AppRunner] Service creation completed successfully.
[AppRunner] Successfully routed incoming traffic to application.
[AppRunner] Health check is successful. Routing traffic to application.
[AppRunner] Performing health check on path '/healthz' and port '8080'.
[AppRunner] Provisioning instances and deploying image.
[AppRunner] Successfully pulled image from ECR.
[AppRunner] Service status is set to OPERATION_IN_PROGRESS.
[AppRunner] Service creation started.
This happens with any change. For example, here I just changed the healthcheck interval from the default 10 seconds to the maximum 20.
At the same time, I cannot find any logs explaining what went wrong. CloudWatch just repeats the message "Service update failed. For details, see service logs."
If it matters, I'm running the application inside a VPC, with NAT configured. There is also a private DB instance that is accessible only from the VPC. The /healthz health check verifies access to the internet and to the database, and there are no problems with either.
Any ideas what I'm doing wrong or where I can find useful logs would be helpful.
I have deployed an ECS cluster. The application previously ran on-premises in Docker containers. I pushed the images to ECR and deployed the services. Everything looks good except for one service, the mongo container: its task keeps stopping and restarting. Reason for stopping: Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:us-east.....
CloudWatch logs say: "2022-08-29T19:40:48.319+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"
Does anyone know what the issue may be?
I am operating an ECS cluster with one Java application container.
I am trying to use the aws-cli to update either the task definition or the ECS service.
I tried both, and the task definition update succeeded.
However, I run into a problem when updating the ECS service with the aws-cli:
***************************
APPLICATION FAILED TO START
***************************
Description:
The Tomcat connector configured to listen on port 10080 failed to start. The port may already be in use or the connector may be misconfigured.
Action:
Verify the connector's configuration, identify and stop any process that's listening on port 10080, or configure this application to listen on another port.
The application's container port is 10080, and I ran aws ecs update-service to roll out the new task.
I thought the flow was:
deregister and delete the old container -> new container comes up -> re-register at the ALB
But after updating the service, the failure log above appeared.
I can't tell whether port 10080 is still in use or misconfigured.
Can anyone help me?
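The flow assumed above may not match how a rolling update orders things. ECS bounds the number of tasks during a deployment with the service's minimumHealthyPercent and maximumPercent settings, so a new task can be started while the old one is still running. A rough sketch of that arithmetic follows; the function name and the example percentages are assumptions, since the service's deployment configuration is not shown in the question.

```python
def deployment_task_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (lower, upper) task counts allowed during a rolling update.

    Hypothetical helper: the lower bound is rounded up and the
    upper bound is rounded down, using integer arithmetic only.
    """
    lower = -((-desired_count * minimum_healthy_percent) // 100)  # ceiling
    upper = (desired_count * maximum_percent) // 100              # floor
    return lower, upper
```

With one desired task, deployment_task_bounds(1, 100, 200) gives (1, 2): the old task keeps running while the new one starts. If both end up competing for the same fixed host port 10080, that could produce the error above, though this is only a guess. deployment_task_bounds(1, 0, 100) gives (0, 1), which allows the old task to be stopped first and is closer to the flow described in the question.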
I have set up a Docker container running on port 5566 with a small Django application. The Docker image is uploaded into the ECR and later used by Fargate container(s).
I have set up an ECS cluster with a VPC.
After creating the Task Definition and Service, the Service starts up 2 tasks (as it is supposed to):
Here's the Service's Network Access (with a health check grace period of 300 s):
I also set up an Application Load Balancer (with DNS) with a target group for the service, but the health checks seem to be failing:
Here's the health check configuration:
Because the health checks are failing, the tasks are terminated and new ones are started roughly every 5 minutes.
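The 5-minute cycle is consistent with the 300 s grace period. As a rough model, ECS ignores a failing ELB health status until the grace period ends, and the load balancer needs about interval times unhealthy threshold seconds to mark a target unhealthy. The sketch below encodes that simplified model; the function names and the interval of 30 s and threshold of 2 in the usage line are assumptions, because the actual health check configuration is not reproduced here.

```python
def seconds_until_unhealthy(interval_s, unhealthy_threshold):
    """Approximate time for an always-failing target to be marked unhealthy."""
    return interval_s * unhealthy_threshold


def earliest_replacement_s(grace_period_s, interval_s, unhealthy_threshold):
    """Earliest time after task start at which a failing task may be replaced.

    Simplified model: whichever finishes later, the grace period or the
    run of consecutive failed checks, decides when ECS can act.
    """
    return max(grace_period_s,
               seconds_until_unhealthy(interval_s, unhealthy_threshold))
```

For example, earliest_replacement_s(300, 30, 2) evaluates to 300, so a task that never passes a check would be replaced about every 5 minutes. That suggests the checks fail from the start rather than the application being slow to boot.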
Here's the container's port mapping:
As one cannot access the Fargate container (via SSH for example) and the logs are empty, how should I troubleshoot the issue?
I have tried to follow every step in the Troubleshoot Your Application Load Balancer.
Feel free to ask for additional information.
Can you confirm that your application is listening on port 5566 inside the Docker container?
You can check the logs in CloudWatch. You'll find the link under cluster -> service -> tasks -> your task.
Can you post your ALB configuration and your target group port?
I deployed a Docker image using AWS Fargate. When I created a service from the task definition, the logs show that Tomcat has no errors and the app is up and running, but new instances are constantly being spun up because the health check is failing.
Health Checks (On target group tied to the service)
Protocol: HTTP
Path: /Sampler/data/ping
Port: traffic port
What is the right path for health check?
I also tried using the service name in the path, but it did not work,
for example: /servicename/data/ping
Can you please suggest what I am missing?
I deployed the same WAR file locally by running docker run -p 8080:8080 sampler:latest (the same image pushed from my machine to ECR), and when I hit the URL http://localhost:8080/Sampler/data/ping I get a 200 status code.
Dockerfile
FROM tomcat:9.0-jre8-alpine
COPY target/Sampler-*.war $CATALINA_HOME/webapps/Sampler.war
EXPOSE 80
The path for the health check depends on your application. Based on the information you have provided, I suspect the issue could be related to healthCheckGracePeriodSeconds:
healthCheckGracePeriodSeconds
The period of time, in seconds, that the Amazon ECS service scheduler ignores unhealthy
Elastic Load Balancing target health checks after a task has first started.
https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_Service.html
When ECS tasks take a long time to start, Elastic Load Balancing (ELB) health checks can mark the task as unhealthy, and the service scheduler then shuts the task down.
You can specify a health check grace period as an ECS service definition parameter. This instructs the service scheduler to ignore ELB health checks for a predefined time period after a task has been instantiated.
https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-ecs-adds-elb-health-check-grace-period/
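As a minimal sketch of setting that parameter programmatically, the helper below builds the keyword arguments for an ECS UpdateService call. The helper name, the cluster and service names, and the 120-second value are hypothetical; only the parameter names cluster, service, and healthCheckGracePeriodSeconds come from the ECS API.

```python
def grace_period_update(cluster, service, grace_seconds):
    """Build keyword arguments for an ECS UpdateService request.

    Hypothetical helper: it only assembles the request and does not call AWS.
    """
    if grace_seconds < 0:
        raise ValueError("grace_seconds must be non-negative")
    return {
        "cluster": cluster,
        "service": service,
        "healthCheckGracePeriodSeconds": grace_seconds,
    }


# Possible usage with boto3 (not executed here, needs AWS credentials):
#   import boto3
#   args = grace_period_update("my-cluster", "sampler-service", 120)
#   boto3.client("ecs").update_service(**args)
```

A grace period only helps when the application eventually passes the check. If the health check path or port is wrong, the task will still be replaced once the period ends.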