ECS Service Active but the Task is stopping and restarting

ECS Service Active but the Task is stopping and restarting - amazon-web-services

I have deployed an ECS cluster. The application was running on prem on Docker containers. I have pushed the images to ECR and deployed the services. Everything looks good however, there is one service which is mongo container, it's task is stopping and restarting continuously. Reason for stopping :Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:us-east.....
CloudWatch logs says: "2022-08-29T19:40:48.319+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
Please anyone knows what may be the issue?

Related

AWS ECS Fargate request not reaching task container

So I setup a aws ecs cluster to run a docker image of Valhalla service almost as-is.
Issue : target group seems to be not able to check for cluster health, like if the health request was reaching the cluster, but container is not "forwarding" the request to Valhalla.
Description :
I created a repository on AWS ECR, and pushed a docker image of gisops/valhalla with only the valhalla.json file changed.
Here is the valhalla configuration I used.
Note that I changed the default listening port from 8002 to 80.
I created a ECS Fargate cluster, and a service that uses this task definition to launch a container that runs Valhalla.
The service receives traffic from an application load balancer via port 80.
The target group is checking /status path on port 80.
All set, the task is then creating, and task logs shows that Valhalla is initializing perfectly and running.
However the target group is not able to check for health status : the request seems to timeout.
If the request was reaching valhalla, the task logs would have at least show it (because valhalla logs every incoming request by default), but it doesn't.
Therefore fargate kills the task (Task failed ELB health checks in (target-group {my-target-group-uri})) (showing that the health request was reaching the cluster service indeed)
I don't think the issue is with the valhalla configuration, because I can run the same docker image locally, and it works perfectly, using :
docker run -dt -p 3000:80 -v /local/path/to/valhalla-files:/custom_files/ --name valhalla gisops/valhalla:latest
And then checking localhost:3000/status
Anyone has an idea of what could be the issue ?
Already spent a lot of time on this, and I'm out of ideas. Thanks for your help !

Can't update AWS App Runner after creation

The app runner is successfully created and works fine, but any attempt to change the configuration gets an error. It seems that the healthcheck does not work, although after creation everything works fine.
[AppRunner] Service status is set to RUNNING.
[AppRunner] Service update failed. For details, see service logs.
[AppRunner] Performing health check on path '/healthz' and port '8080'.
[AppRunner] Provisioning instances and deploying image.
[AppRunner] Service status is set to OPERATION_IN_PROGRESS.
[AppRunner] Service update started.
[AppRunner] Service status is set to RUNNING.
[AppRunner] Service creation completed successfully.
[AppRunner] Successfully routed incoming traffic to application.
[AppRunner] Health check is successful. Routing traffic to application.
[AppRunner] Performing health check on path '/healthz' and port '8080'.
[AppRunner] Provisioning instances and deploying image.
[AppRunner] Successfully pulled image from ECR.
[AppRunner] Service status is set to OPERATION_IN_PROGRESS.
[AppRunner] Service creation started.
This happens with any change. For example, here I just changed the healthcheck interval from the default 10 seconds to the maximum 20.
At the same time, it is impossible to find any logs explaining what went wrong. Cloudwatch just duplicates the message Service update failed. For details, see service logs.
If it matters, I'm running the application inside a VPC, with NAT configured. There is also a private db instance with access only from the VPC. Healthcheck /healthz checks access to the internet and to the database, there are no problems with this.
Any ideas what I'm doing wrong or where I can find useful logs would be helpful.

why ECS suddenly restarts?

our example.com suddenly gives 503 service temporarily unavailable error
I check the monitor and get the following
It looks like docker container restarted for some reason, what should I check?

as you can see UnHealthyHostCount jump from 0 to 2, its the ECS services are restarted and due to some reason it still passing the health check or ALB did not yet receive success HTTP status code.
You can look into ECS Cluster
Select cluster -> ECS service -> Event
You will see the reason for health check failure or might be restarted by the user or some other service.

AWS ECS Fargate ALB Error (Request Timed Out)

I have set up a Docker container running on port 5566 with a small Django application. The Docker image is uploaded into the ECR and later used by Fargate container(s).
I have set up an ECS cluster with a VPC.
After creating the Task Definition and Service, the Service starts up 2 tasks (as it is supposed to):
Here's the Service's Network Access (with health check grace period on 300s):
I also set up an Application Load Balancer (with DNS) with a target group for the service, but the health checks seem to be failing:
Here's the health check configuration:
Because the health checks are failing the tasks are terminated and new ones are started after ~every 5 minutes.
Here's the container's port mapping:
As one cannot access the Fargate container (via SSH for example) and the logs are empty, how should I troubleshoot the issue?
I have tried to follow every step in the Troubleshoot Your Application Load Balancer.
Feel free to ask additional information.

can you confirm once, your application is working on port 5566 inside docker?
you can check logs in cloudwatch. you'll get the link in cluster -> service -> tasks -> your task.
Can you post your ALB configuration? your Target group port?

What should the path of healthcheck be in the target group created for Fargate Service

I deployed docker image using AWS Fargate. When I created a service out of the task definition, logs show that tomcat has no errors and app is up and running but new instances are getting constantly getting spun as health check is failing
Health Checks (On target group tied to the service)
Protocol: HTTP
Path: /Sampler/data/ping
Port: traffic/port
What is the right path for health check?
I tried giving servicename too, but it did not work
for example: /servicename/data/ping
Can you please suggest what I am missing?
I have deployed the same war file in local by running docker run -p 8080:8080 sampler:latest (same image pushed from local to ECR) and when I hit the URL http://localhost:8080/Sampler/data/ping, I get 200 status code
Dockerfile
FROM tomcat:9.0-jre8-alpine
COPY target/Sampler-*.war $CATALINA_HOME/webapps/Sampler.war
EXPOSE 80

The path for the health check depends on your application. Based on the information you have provided, I suspect the issue could be related to healthCheckGracePeriodSeconds
healthCheckGracePeriodSeconds
The period of time, in seconds, that the Amazon ECS service scheduler ignores unhealthy
Elastic Load Balancing target health checks after a task has first started.
https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_Service.html
When ECS tasks took a long time to start, Elastic Load Balancing (ELB) health checks could mark the task as unhealthy and the service scheduler would shut the task down.
You can specify a health check grace period in ECS service definition parameter. This instructs the service scheduler to ignore ELB health checks for a pre-defined time period after a task has been instantiated.
https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-ecs-adds-elb-health-check-grace-period/

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js