AWS Fargate 502 Bad Gateway after initial load

I've been following a Fargate/docker tutorial here: https://medium.com/containers-on-aws/building-a-socket-io-chat-app-and-deploying-it-using-aws-fargate-86fd7cbce13f
Here is my Dockerfile
FROM mhart/alpine-node:15 AS build
WORKDIR /srv
ADD package.json .
RUN yarn
ADD . .
FROM mhart/alpine-node:base-9
COPY --from=build /srv .
EXPOSE 3000
CMD ["node", "index.js"]
I have two Fargate stacks:
production was created from the AWS CloudFormation public-service template
chat was created from the AWS CloudFormation public-vpc template, with some parameter substitutions from the tutorial
The production stack exposes a valid ExternalUrl output parameter
When I open the URL, I can see a successful initial load of the index
But resources respond with a 502 (Bad Gateway)
And if I refresh the URL, the index throws the error as well
I'm new to AWS and Fargate. Are there server logs I should check? Could this be a problem with either of the templates (public-vpc.yml or public-service.yml) that I used for setup? Any help is appreciated — thank you.

Looks like your health checks are failing and so the instance is being stopped. You can validate that by navigating to ECS -> Clusters -> (Cluster) -> (Service) -> Tasks -> Stopped - this will show the list of containers that have recently been stopped and why.
I haven't dug through the CloudFormation, but I bet that the mapping of the Health Check to the container port is wrong.
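If you prefer the CLI, roughly the same information is available there; a sketch, with the cluster name and region as placeholders:
aws ecs list-tasks --cluster <cluster-name> --desired-status STOPPED --region <region>
aws ecs describe-tasks --cluster <cluster-name> --tasks <task-arn> --query 'tasks[].stoppedReason' --region <region>
The stoppedReason field usually states outright whether the task was killed by a failed ELB health check or by something else (an image pull error, a crashing container, and so on).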

Mystery solved! Thank you @Marcin and @cynicaljoy for your help. @cynicaljoy, I checked the ECS cluster status, but there was nothing out of the ordinary; the chat task was RUNNING.
#Marcin, I followed your lead and re-created the stacks and app. Now it works! My issue was neglecting to match up the correct regions in all of my AWS commands the first time around. Some were run with us-west-1 and some with us-west-2. Once I matched those up, the gateway problem went away.
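In case it helps anyone who lands here with the same symptom, a quick way to sanity-check which region each command is hitting (the chat stack name comes from the tutorial; the rest is just an example):
aws configure get region                                                  # default region used when --region is omitted
aws cloudformation describe-stacks --stack-name chat --region us-west-2   # errors if the stack actually lives in another region
Passing an explicit --region on every command avoids the mismatch entirely.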

Related

Failed to deploy application: During an aborted deployment, some instances may have deployed the new application version

I can't deploy a new version on Elastic Beanstalk.
Everything was working fine until I tried to deploy a new version, where I had lots of issues (it is not the first time I have deployed a new version to this environment; I have already deployed dozens). When I managed to fix all of them, I got these errors:
Failed to deploy application.
During an aborted deployment, some instances may have deployed the new application version. To ensure all instances are running the same version, re-deploy the appropriate application version.
Unsuccessful command execution on instance id(s) 'i-...'. Aborting the operation
I re-deployed the version, which did not work.
Here is the Elastic Beanstalk console:
(screenshot: Elastic Beanstalk console)
(screenshot: Elastic Beanstalk events)
The request logs button from Elastic Beanstalk returns nothing.
The system log from the EC2 instance shows the logs of the last working version.
I enabled CloudWatch logs from the Configuration navigation pane. It added 4 files to CloudWatch Logs:
/var/log/eb-activity.log -> empty so far
/var/log/httpd/access_log -> empty so far
/var/log/httpd/error_log -> empty so far
/environment-health.log -> "Command is executing on all instances (56 minutes or more elapsed).", "Incorrect application version found on all instances. Expected version "prod-v1.7.28-0" (deployment 128)."
It is an Amazon Linux, t2.medium instance with Apache as the web server.
What I have already tried:
Changed the name of the .zip each time so it differs from the zips already deployed
Terminated the instance (the load balancer automatically creates a new one)
Rebooted the instance
Rebuilt the Elastic Beanstalk environment
Deployed the simplest possible code
I tried to deploy just a zip with the code below, but I got the same errors.
<html>
<head>
<title>This is the title of the webpage!</title>
</head>
<body>
<p>This is an example paragraph. Anything in the <strong>body</strong> tag will appear on the page, just like this <strong>p</strong> tag and its contents.</p>
</body>
</html>
It always goes back to the last working version, and when I try to deploy the new version it does not work.
In some posts I saw people saying it might be because the instance is too small, but it was working perfectly before and the size has not changed since then.
If you have any questions or ideas, I will be very thankful.
Have a nice day!
Answer:
The issue was in the logs, like you said. I had to SSH into my EC2 instance to reach them. The error was in the file cfn-init-cmd.log.
One of the commands was waiting for an input, so it timed out with no error message.
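For anyone else chasing the same thing: on the Amazon Linux instances Elastic Beanstalk launches, those files live under /var/log, so the check is roughly (instance address is a placeholder):
ssh ec2-user@<instance-public-ip>
sudo tail -n 100 /var/log/cfn-init-cmd.log   # output of cfn-init / .ebextensions commands, where the hang showed up
sudo tail -n 100 /var/log/eb-activity.log    # the deployment activity log mentioned above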
You should check the Elastic Beanstalk logs for any hints as to what goes wrong with your deployment. The AWS console can be helpful for that.
There are also logs that can be pulled directly from the EC2 instance.
CloudWatch Logs is another thing to check.
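If the console's request-logs button keeps returning nothing, the same tail can be requested through the CLI; a rough sketch, with the environment name as a placeholder:
aws elasticbeanstalk request-environment-info --environment-name <env-name> --info-type tail
aws elasticbeanstalk retrieve-environment-info --environment-name <env-name> --info-type tail
The second call returns links to the tailed log files collected from each instance.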
You should also check the autoscaling group, and see if there are any health checks there. What kind of checks are these? What's the grace period?
Here's a list of reasons that an EC2 health check could fail.
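To see what the auto scaling group is actually configured with, something along these lines works (just a read-only query):
aws autoscaling describe-auto-scaling-groups --query 'AutoScalingGroups[].{Name:AutoScalingGroupName,HealthCheck:HealthCheckType,GracePeriod:HealthCheckGracePeriod}' --output table
If the health check type is ELB and the grace period is short, instances can get replaced before a slow deployment ever finishes.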
You could launch a larger EC2 instance for troubleshooting.
Instance status checks.
The following are examples of problems that can cause instance status checks to fail:
Failed system status checks
Incorrect networking or startup configuration
Exhausted memory
Corrupted file system
Incompatible kernel
Also rebuilding is really a drastic step as it destroys and rebuilds all your resources. Your ELB DNS for example will be gone, any associated EIP will be released. These things can't be reclaimed.
I also faced the same issue; I deleted the wrong application versions and increased the command timeout.
The default max deployment time (Command timeout) is 600 seconds (10 minutes).
Go to Your Environment → Configuration → Deployment preferences → Command timeout
Increase the Command timeout to something higher, like 1800, and then try to deploy the previously working application version. It will work.
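The same setting can also be pushed from the CLI if you prefer; a sketch, with the environment name as a placeholder:
aws elasticbeanstalk update-environment --environment-name <env-name> --option-settings Namespace=aws:elasticbeanstalk:command,OptionName=Timeout,Value=1800
That maps to the same Command timeout field shown under Deployment preferences.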

ECS - task fails to start but container looks fine. Doesn't show error in logs

I am completely new to AWS and trying this for a cloud assignment. I created an ECS cluster and tried adding both Fargate and EC2 tasks with a container. My task fails to start on creation itself and doesn't show any errors in the logs. My container is active and doesn't show any errors in its details. I am not sure how to debug this. Please, I need help.
STOPPED (Task failed to start)
The first step I usually do in this case is to download the docker image from ECR and try and run it on my local machine. If that works, then I SSH into the EC2 host and do the same thing. Usually you'll see something go wrong at that point, and you can start debugging from there.
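Roughly, that local check looks like the following (account id, region, repository, tag and port are all placeholders for your own setup; the login step assumes AWS CLI v2):
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com
docker pull <account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
docker run --rm -p 3000:3000 <account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>   # watch the startup output for immediate crashes
If the container exits or crashes on startup locally, the ECS failure is almost certainly the same problem.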

AWS Elastic-Beanstalk: ECS task stopped due to: Task failed to start. (nginx-proxy: php-app: )

I'm trying to move my Laravel website to Elastic Beanstalk using this GitHub script:
https://github.com/peec/laravel-aws
Everything should work out of the box, including an RDS DB, SQS for queues, ElastiCache, etc.
I did as requested but, for some reason, I get this error (even though I have nothing in my website folder; I'm just trying to make Elastic Beanstalk work with Docker):
Any idea how I can make it run, or how I should remove the errors above?
You can find the "Dockerrun.aws.json" file in the GitHub repository above; I didn't change anything except the AWS keys in .ebextensions -> options.config.
Thank you!
Eran.

AWS Fargate 503 Service Temporarily Unavailable

I'm trying to deploy a backend application to AWS Fargate using CloudFormation templates that I found. When I was using the docker image training/webapp I was able to successfully deploy it and access it with the externalUrl from the networking stack for the app.
When I try to deploy our backend image I can see the stacks are deploying correctly, but when I go to the externalUrl I get 503 Service Temporarily Unavailable and I'm unable to see it... Another thing I've noticed is that on Docker Hub I can see the image is continuously pulled all the time while the CloudFormation services are running...
The backend is some kind of Maven project (I don't know exactly what), but I know it works locally; however, getting the container with this backend image up and running takes about 8 minutes... I'm not sure if this affects Fargate? Any idea how to get it working?
It sounds like you need to find the actual error that you're experiencing, the 503 isn't enough information. Can you provide some other context?
I'm not familiar with Fargate, but I have been using ECS quite a bit this year, and I generally would find that by going to (on the dashboard) ECS -> cluster -> service -> events. The events tab gives more specific errors as to what is happening.
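The same events can be pulled from the CLI if that is easier; a sketch with placeholder names:
aws ecs describe-services --cluster <cluster-name> --services <service-name> --query 'services[0].events[:10].[createdAt,message]' --output table
The messages there typically spell out failed load balancer health checks, failed task placements, or tasks being started and stopped in a loop.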
My ECS deployment problems are generally summarized as:
1. The container is not exposing the same port as in the task definition; this could be the case if you're deploying from a stack written by someone else.
2. The task definition's memory/CPU restrictions don't grant enough space for the application and it has trouble placing (probably a problem with ECS more than Fargate, but you never know).
3. Your timeout in the task definition is not set to 8 minutes: see this question, it has a lot of this covered.
4. Your start command in the task definition does not work as expected with the container you're trying to deploy.
If it is pulling from Docker Hub continuously, my bet would be that it's 1, 3 or 4, and it's attempting to pull the image over and over again.
Try adding a Health check grace period of 60 by going to ECS -> cluster -> service -> update Network Access section.
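Given that the container takes around 8 minutes to come up, 60 seconds may still be too short; the equivalent CLI call (cluster and service names are placeholders) would be roughly:
aws ecs update-service --cluster <cluster-name> --service <service-name> --health-check-grace-period-seconds 480
That makes ECS ignore load balancer health check results for that long after a task starts, which gives a slow startup time to finish.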

'No hosts succeeded' error on AWS CodeDeploy service

I am trying to set up AWS CodeDeploy for my PHP web app. I have created a CodeDeploy app and a deployment group on the AWS console. I have created the necessary revision bundle with the appspec yaml file. The revision bundle is stored on Amazon S3.
When I click the 'Deploy this revision' button on the AWS console it gives me a 'no hosts succeeded' error. I went through the Technical FAQ and could not find any answers. How can I counter this error?
UPDATE: I now understand that this error has something to do with the Minimum Healthy Hosts count. But I am still not able to understand how AWS calculates the health of a host.
Basically what it's saying is "The CodeDeploy service on your EC2 instance is not running"...
As for why a deployment failed: host health is fairly simple. A host is healthy if that host succeeded in deploying the last deployment to it. A host is unhealthy if it failed. A host is unknown if it was skipped and had no previous deployment.
There are other aspects of host health that affect what order hosts are deployed to in the next deployment, but that's not going to affect your deployment failing with "No hosts succeeded".
A host can fail its individual deployment if any of its lifecycle events failed. A lifecycle event can fail due to a service-side timeout waiting for the agent to respond, or because the host agent reports an error executing the command. You can check the host agent log for more details on exactly why the host agent reported a failure.
If you are hitting the server side timeouts, you should check that the host agent is running and is able to poll for commands correctly. You might have accidentally restricted access in your VPC configuration or didn't grant appropriate permissions to the instance to poll for commands in the instance profile.
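Two quick checks along those lines, run directly on the instance (Amazon Linux paths and service name assumed):
sudo service codedeploy-agent status                                          # is the agent process running at all?
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/    # prints the attached instance profile role name, or an error if none is attached
If the second command shows no role, the instance has nothing to poll CodeDeploy with, which lines up with the 'No hosts succeeded' result.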
This error message means you are not running the CodeDeploy agent on the EC2 instances targeted by your deployment group.
1) Download the latest version of the CodeDeploy agent from S3 (choose your region)
PS> Read-S3Object -BucketName aws-codedeploy-eu-west-1 -Key latest/codedeploy-agent.msi -File c:\temp\codedeploy-agent.msi
2) Install the CodeDeploy agent
cmd> c:\temp\codedeploy-agent.msi /quiet /l c:\temp\host-agent-install-log.txt
3) Start the CodeDeploy agent
PS> Start-Service -Name codedeployagent
AWS CodeDeploy guide: http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-run-agent.html#how-to-run-agent-install-windows
I just ran into this issue myself. My solution was to run:
ntpdate-debian
If you are running CentOS it's something like
ntpdate pool.ntp.org
For me the time was off and was causing issues with the codedeploy agent.
Now, if this doesn't solve your problem: first make sure your problem really is that your CodeDeploy agent is not registering. I have had this issue before, and it was because one of my instances was in a failed state from a botched deployment, so be sure to double check (ELB status, tests, etc.).
Then you should enable logging for your CodeDeploy agent by setting log_aws_wire and verbose to true in /etc/codedeploy-agent/conf/codedeployagent.yml and then restarting the CodeDeploy agent. Tail the logs and you should see the reason for your problems.
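Concretely that is two flags in the agent's config file followed by a restart; a minimal sketch on Amazon Linux, assuming the default false values are still in place:
sudo sed -i 's/:log_aws_wire: false/:log_aws_wire: true/; s/:verbose: false/:verbose: true/' /etc/codedeploy-agent/conf/codedeployagent.yml
sudo service codedeploy-agent restart
sudo tail -f /var/log/aws/codedeploy-agent/codedeploy-agent.log   # watch polling and command execution in real time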