AWS Fargate 503 Service Temporarily Unavailable

I'm trying to deploy a backend application to AWS Fargate using CloudFormation templates that I found. When I was using the docker image training/webapp I was able to deploy it successfully and access it through the externalUrl from the networking stack for the app.
When I try to deploy our backend image I can see the stacks deploy correctly, but when I go to the externalUrl I get 503 Service Temporarily Unavailable and I'm unable to see the app... Another thing I've noticed is that on Docker Hub the image is being pulled continuously the whole time the CloudFormation services are running...
The backend is some kind of Maven project; I don't know exactly what, but I know it works locally. Getting the container with this backend image up and running takes about 8 minutes, though, and I'm not sure whether that affects Fargate. Any idea how to get it working?

It sounds like you need to find the actual error you're experiencing; the 503 by itself isn't enough information. Can you provide some more context?
I'm not familiar with Fargate but have been using ECS quite a bit this year, and I generally find the cause by going to (on the dashboard) ECS -> cluster -> service -> Events. The Events tab gives more specific errors about what is happening.
My ECS deployment problems generally come down to:
1. The container is not exposing the same port as the one in the task definition; this could be the case if you're deploying from a stack written by someone else.
2. The task definition's memory/CPU limits don't give the application enough resources and the task has trouble being placed (probably more of an ECS problem than a Fargate one, but you never know).
3. The timeout in the task definition is not set to allow for the 8-minute startup: see this question, it has a lot of this covered.
4. The start command in the task definition does not work as expected with the container you're trying to deploy.
If it is pulling from Docker Hub continuously, my bet would be 1, 3 or 4, and it's attempting to pull the image over and over again.
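If you'd rather pull those service events programmatically than click through the console, a small boto3 sketch (the cluster and service names are placeholders, not from the question):

import boto3

ecs = boto3.client("ecs")

# Fetch the same messages shown on the service's Events tab in the console.
resp = ecs.describe_services(cluster="my-cluster", services=["my-service"])
for event in resp["services"][0]["events"][:20]:
    print(event["createdAt"], event["message"])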

Try adding a health check grace period of 60 seconds by going to ECS -> cluster -> service -> Update and setting it in the network configuration step.
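The same setting can also be applied without the console, for example with boto3 (the cluster and service names below are placeholders):

import boto3

ecs = boto3.client("ecs")

# Give newly started tasks 60 seconds before load balancer health checks
# can mark them unhealthy and cause the scheduler to stop them.
ecs.update_service(
    cluster="my-cluster",
    service="my-service",
    healthCheckGracePeriodSeconds=60,
)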

Related

Configure cloud-based vscode ide on AWS

CONTEXT:
We have a platform where users can create their own projects - multiple projects per user. We need to provide them with a browser-based IDE to edit those projects.
We decided to go with code-server. For this we need to configure an auto-scalable cluster on AWS. When the user clicks "Edit Project" we will bring up a new container each time.
https://hub.docker.com/r/codercom/code-server
QUESTION:
How to pass parameters from the url query (my-site.com/edit?project=1234) into a startup script to pre-configure the workspace in a docker container when it starts?
Let's say the stack is AWS + ECS + Fargate. We could use kubernetes instead of ECS if it helps.
I don't have any experience in cluster configuration. Will appreciate any help or at least a direction where to dig further.
The above can be achieved in multiple ways in AWS ECS. The basic requirements for such a system are to launch and terminate containers on the fly while persisting the changes to the files. (I will focus on launching the containers.)
Using AWS SDKs:
The task can easily be achieved using the AWS SDKs together with a base task definition. The SDKs allow starting tasks with overrides on that base task definition.
E.g. if the task definition specifies 2 GB of memory, the SDK can override the memory with a parameterised value while launching a task from the task definition.
Refer to the boto3 (AWS SDK for Python) docs.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.run_task
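As a rough sketch of that (not from the original answer), launching one code-server task per "Edit Project" click might look like the following; the cluster, task definition, container name, subnet, security group and environment variable are all illustrative assumptions:

import boto3

ecs = boto3.client("ecs")

def launch_workspace(project_id: str) -> str:
    # Start a one-off Fargate task from a base task definition, overriding
    # only what differs per project. All identifiers below are placeholders.
    resp = ecs.run_task(
        cluster="ide-cluster",
        launchType="FARGATE",
        taskDefinition="code-server-base",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],
                "securityGroups": ["sg-0123456789abcdef0"],
                "assignPublicIp": "ENABLED",
            }
        },
        overrides={
            "containerOverrides": [{
                "name": "code-server",
                # Pass the ?project=1234 query value through to the container.
                "environment": [{"name": "PROJECT_ID", "value": project_id}],
            }],
            # Task-level cpu/memory overrides are strings.
            "cpu": "1024",
            "memory": "4096",
        },
    )
    return resp["tasks"][0]["taskArn"]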
Overall Solution
Now that we know how to run custom tasks with the Python SDK (on demand), the overall flow for your application is: your API calls an AWS Lambda function with parameters to spin up a task, keeps checking the task status, and routes traffic to it once the status is healthy (a sketch follows this list).
1. The API calls the AWS Lambda function with parameters.
2. The Lambda function uses the AWS SDK to create a new task with overrides from the base task definition (assuming the base task definition already exists).
3. Keep checking the status of the new task in the same function call and set a flag in your database so your front end can react to it.
4. Once the status is healthy, add a rule in the Application Load Balancer using the AWS SDK to route traffic to the task's IP without exposing the IP address to the end client. (An AWS Application Load Balancer can get expensive; I'd advise using Nginx or HAProxy on EC2 to manage the dynamic routing.)
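A minimal sketch of steps 3 and 4, assuming an ip-type target group is already attached to the load balancer and that code-server listens on its default port 8080:

import time

import boto3

ecs = boto3.client("ecs")
elbv2 = boto3.client("elbv2")

def wait_and_route(cluster: str, task_arn: str, target_group_arn: str) -> str:
    # Poll the task until it reaches RUNNING, then register its private IP
    # with an ip-type target group so the load balancer can reach it.
    while True:
        task = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"][0]
        status = task["lastStatus"]
        if status == "RUNNING":
            break
        if status == "STOPPED":
            raise RuntimeError(task.get("stoppedReason", "task stopped"))
        time.sleep(5)

    # For awsvpc tasks the private IP sits in the ENI attachment details.
    details = task["attachments"][0]["details"]
    ip = next(d["value"] for d in details if d["name"] == "privateIPv4Address")

    elbv2.register_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": ip, "Port": 8080}],  # code-server's default port
    )
    return ip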
Note:
Ensure your image is lightweight and the startup time is under 15 minutes, as Lambda cannot execute beyond that. If it takes longer, create a microservice for launching ad-hoc containers and host it on EC2.
Using Terraform:
If you're looking for infrastructure provisioning, Terraform is the way to go. It has a learning curve, so I'd recommend it as a secondary option.
Terraform is popular for parameterising through variables and it can easily be plugged in as a backend for an API. The flow of your application stays the same from step 1, but instead of AWS Lambda the API calls your ad-hoc container microservice, which in turn calls the Terraform script and passes the variables to it.
Refer to the Terraform docs for AWS:
https://registry.terraform.io/providers/hashicorp/aws/latest
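In that setup the microservice can simply shell out to Terraform and pass the per-project values as -var flags; a rough Python sketch, where the working directory and variable names are made-up placeholders:

import subprocess

def apply_workspace(project_id: str, workdir: str = "./infra") -> None:
    # Apply the Terraform configuration in `workdir`, passing per-project
    # values as variables. Both variable names below are illustrative.
    subprocess.run(
        [
            "terraform", "apply", "-auto-approve",
            f"-var=project_id={project_id}",
            "-var=container_memory=2048",
        ],
        cwd=workdir,
        check=True,
    )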

AWS Fargate 502 Bad Gateway after initial load

I've been following a Fargate/docker tutorial here: https://medium.com/containers-on-aws/building-a-socket-io-chat-app-and-deploying-it-using-aws-fargate-86fd7cbce13f
Here is my Dockerfile
FROM mhart/alpine-node:15 AS build
WORKDIR /srv
ADD package.json .
RUN yarn
ADD . .
FROM mhart/alpine-node:base-9
COPY --from=build /srv .
EXPOSE 3000
CMD ["node", "index.js"]
I have two Fargate stacks:
production was created from the AWS CloudFormation public-service template
chat was created from the AWS CloudFormation public-vpc template with some parameter substitutions from the tutorial
The production stack exposes a valid ExternalUrl output parameter.
When I open the URL, I can see a successful initial load of the index,
but resources respond with a 502 (Bad Gateway),
and if I refresh the URL, the index throws the error as well.
I'm new to AWS and Fargate. Are there server logs I should check? Could this be a problem with either of the templates (public-vpc.yml or public-service.yml) that I used for setup? Any help is appreciated — thank you.
It looks like your health checks are failing, so the task is being stopped. You can confirm that by navigating to ECS -> Clusters -> (Cluster) -> (Service) -> Tasks -> Stopped; this shows the list of containers that were recently stopped and why.
I haven't dug through the CloudFormation, but I'd bet the mapping of the health check to the container port is wrong.
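If you'd rather check that from code than through the console, a quick boto3 sketch (the cluster and service names here simply follow the question's chat stack):

import boto3

ecs = boto3.client("ecs")

# List recently stopped tasks for the service and print why each one stopped.
stopped = ecs.list_tasks(cluster="chat", serviceName="chat", desiredStatus="STOPPED")
if stopped["taskArns"]:
    tasks = ecs.describe_tasks(cluster="chat", tasks=stopped["taskArns"])["tasks"]
    for task in tasks:
        print(task["taskArn"], task.get("stoppedReason"))
        for container in task["containers"]:
            print("  ", container["name"], container.get("exitCode"), container.get("reason"))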
Mystery solved! Thank you @Marcin and @cynicaljoy for your help. @cynicaljoy, I checked the ECS cluster status, but there was nothing out of the ordinary; the chat task was RUNNING.
@Marcin, I followed your lead and re-created the stacks and the app, and now it works! My issue was neglecting to use the same region in all of my AWS commands the first time around: some were run with us-west-1 and some with us-west-2. Once I matched those up, the gateway problem went away.

AWS ECS Blue/Green deployment loses my code

I have a python3 project which runs in a docker container environment.
My Python project uses AWS access keys and a secret via a credentials file stored on the machine, which is added to the container using ADD.
I deployed my project to EC2. The server has one task running, which works fine; I can reach the webserver (Airflow) on port 8080.
When I make a new commit and push to the master branch on GitHub, a hook downloads the content and deploys it without a build stage.
The new code is on the EC2 server (I check it using ssh), but the container running in the task gets "stuck": the bind volumes disappear and stop working until I start a new task. Then the volumes are applied again from scratch and reference the new code. This step is fully manual.
To fix that I heard about AWS ECS Blue/Green deployment, so I implemented it. In this case CodePipeline adds a build stage, but here the problem starts: if in the build stage I try to push a docker image to ECR, which my task definition references, it fails. It fails because neither the server nor the repo (where I commit and push my new code) contains the credentials file.
I tried building the latest docker image on my localhost and skipping the build stage in CodePipeline, and that works, but then when I go to port 8080 on both working IPs I can get into the webserver, yet the code is not there. If I click anywhere it says the code was not found.
So, as a general review, I would like to understand what I am doing wrong and how to fix it, and also to ask why my EC2 instance in the AWS ECS Blue/Green cluster has 3 IPs.
The first one is the one I use to reach the server through port 22. If I run docker ps there, I see one or two containers running, depending on whether I am in the middle of a new deployment. If I search for my new code here, it's not there...
The other two IPs change after every deployment (I guess they are blue and green) and both work fine until CodePipeline destroys the green one (after a 5-minute wait), but the code is not there. I know this because when I click any of the links in the webserver it fails, saying the Airflow DAG hasn't been found.
So my problem is that I have a fully working AWS ECS Blue/Green deployment, but without my code, and my webserver has nothing to run.

Default message or image to appear when AWS servers are at stopped state

We have an AWS infrastructure as follows:
Client --> Internet Facing LoadBalancer --> Firewall --> Internal LoadBalancer--> ECS Containers
We have a Lambda function which stops the servers at night and starts them again in the morning to reduce billing.
While the instances are in the stopped state, anyone who accesses the docker containers will obviously get "503 Service Unavailable".
Now, the task I need to perform is:
Is there any elegant way to show some personalized text or an image (e.g. "servers are stopped, please visit again in the morning") rather than the ugly 503 Unavailable?
How can I approach this scenario?
What AWS services can I make use of?
Any Ideas or procedures are highly appreciated.
Thanks in Advance, Cheers :)
If you can add CloudFront in front of your application, then you have the option to specify custom error pages. With this approach it will kick in automatically from the moment you start serving 503 errors.
If you are unable to use CloudFront, then assuming you are using an Application Load Balancer somewhere in your stack, you could update one of your listeners to return a fixed response instead during this time. This would require you to automate the change every time the servers are stopped and started, which is a disadvantage.
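For the fixed-response option, the same Lambda that stops the servers could flip the listener's default action; a hedged boto3 sketch, where the listener ARN and message are placeholders and a second call restoring the original forward action would be needed in the morning:

import boto3

elbv2 = boto3.client("elbv2")

def enable_maintenance_page(listener_arn: str) -> None:
    # Replace the listener's default action with a fixed HTML response
    # while the backend containers are stopped.
    elbv2.modify_listener(
        ListenerArn=listener_arn,
        DefaultActions=[{
            "Type": "fixed-response",
            "FixedResponseConfig": {
                "StatusCode": "503",
                "ContentType": "text/html",
                "MessageBody": "<h1>Servers are stopped for the night. "
                               "Please visit again in the morning.</h1>",
            },
        }],
    )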

ECS - task fails to start but container looks fine. Doesn't show error in logs

I am completely new to AWS and trying this for a cloud assignment. I created an ECS cluster and tried adding both Fargate and EC2 tasks with a container. My task fails to start on creation itself and doesn't show any errors in the logs. My container is active and doesn't show any errors in its details. I am not sure how to debug this. Please, I need help.
STOPPED (Task failed to start)
The first step I usually take in this case is to download the docker image from ECR and try to run it on my local machine. If that works, then I SSH into the EC2 host and do the same thing. Usually you'll see something go wrong at that point, and you can start debugging from there.