AWS CodeDeploy detected that the replacement task set is unhealthy? - amazon-web-services

I have an ECS Fargate app up and running on AWS, and I was deploying a newer version of my code through a CodeDeploy blue/green deployment. I have been using this method for quite some time and have never had a problem before unless there was actually something wrong with the app itself.

As always, I initiated the deployment, waited until all the tasks were running, and checked that traffic had been rerouted to the newer task set. I tested the app on a couple of devices and made sure it was working correctly. However, after around 20 minutes or so, my service went down for a few minutes and I got this error on CodeDeploy: "CodeDeploy detected that the replacement task set is unhealthy."

I expected CodeDeploy to automatically roll back the deployment, but it was still the newer task set that was receiving traffic, and it was working fine! I did see a couple of stopped tasks, but I no longer have access to their logs, since stopped tasks seem to evaporate and become inaccessible after some time. I re-ran the deployment with the exact same task definition, and that worked fine too.

Does anyone have any idea what might cause a task set to be in an unhealthy state? Thanks so much!
Below is the image of the error (screenshot: deployment status).
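Not an answer as such, but since stopped tasks disappear from the console fairly quickly, it can help to capture their stop reasons right after a failed deployment. A minimal sketch, assuming boto3 with configured credentials; the cluster and service names are placeholders, not anything from the original question:

```python
# Minimal sketch: list recently stopped tasks for a service and print the
# reason ECS recorded for stopping each one, before they expire from view.
import boto3

ecs = boto3.client("ecs")

CLUSTER = "my-cluster"   # hypothetical cluster name
SERVICE = "my-service"   # hypothetical service name

stopped = ecs.list_tasks(
    cluster=CLUSTER,
    serviceName=SERVICE,
    desiredStatus="STOPPED",
)["taskArns"]

if stopped:
    for task in ecs.describe_tasks(cluster=CLUSTER, tasks=stopped)["tasks"]:
        print(task["taskArn"])
        print("  stoppedReason:", task.get("stoppedReason"))
        for container in task.get("containers", []):
            print(
                "  container:", container["name"],
                "exitCode:", container.get("exitCode"),
                "reason:", container.get("reason"),
            )
```

Running something like this as soon as the "replacement task set is unhealthy" error appears at least preserves whether the tasks were stopped for failed health checks, OOM kills, or something else.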

Related

aws: Networking issue

While running an ECS Fargate task, AWS automatically stopped my task with this error:
There was an error while describing network interfaces.
The networkInterface ID 'eni-0c21gdfgerg' does not exist
My task had been running for more than a day, but now it suddenly stopped.
I checked that ENI, and it no longer exists.
How can I troubleshoot it?
It seems ECS Fargate tasks go through different stages, and in the Deprovisioning stage everything networking-related is deleted, including the network interface.
I was also viewing this task in the Stopped stage, which is why I was getting the error.
I faced exactly the same issue: "networkInterface ID 'eni-xx' does not exist". My exit code was 1, i.e. an application error.
I have a Python project, and it turned out I had missed one dependency name in my requirements.txt.
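Since the real cause here was an application error rather than the ENI, it can help that the container logs in CloudWatch outlive both the stopped task and its network interface. A minimal sketch, assuming boto3 and that the task definition uses the awslogs log driver; the log group and stream prefix are placeholders:

```python
# Minimal sketch: read the last log events for a container whose task has
# already been deprovisioned. The CloudWatch log streams remain after the
# task's ENI is gone, so the actual application error is still readable.
import boto3

logs = boto3.client("logs")

LOG_GROUP = "/ecs/my-task"          # hypothetical awslogs-group
STREAM_PREFIX = "ecs/my-container"  # hypothetical awslogs-stream-prefix/container

streams = logs.describe_log_streams(
    logGroupName=LOG_GROUP,
    logStreamNamePrefix=STREAM_PREFIX,
)["logStreams"]

for stream in streams:
    print("==", stream["logStreamName"])
    events = logs.get_log_events(
        logGroupName=LOG_GROUP,
        logStreamName=stream["logStreamName"],
        limit=50,
        startFromHead=False,
    )["events"]
    for event in events:
        print(event["message"])
```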

Old ECS tasks still referenced in new deploy

I have an application deployed to Fargate on ECS using CDK. The first time I deployed, the stack worked fine: the task passed all the health checks and everything was good.
I then tried to use the GitLab CD integration, which failed.
I destroyed the stack (cdk destroy), and after cdk deploy the task isn't passing the health checks (even though I can see in the logs that the app is responding just fine to requests). Moreover, I noticed that the new cluster somehow contains old references to the failed tasks of the previous stack.
Here is how it looks: (screenshot of the cluster)
Any idea why this behaviour is happening, and/or how to solve it?
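When the app answers requests but the target group still reports it unhealthy, the usual suspects are the health check path/port and the grace period. A minimal sketch (CDK v2, Python), not the poster's actual stack, showing where both are configured; the image, port, and /health endpoint are assumptions:

```python
# Minimal CDK sketch: a load-balanced Fargate service with an explicit health
# check path and a grace period for slow-starting containers.
from aws_cdk import Duration, Stack
from aws_cdk import aws_ecs as ecs
from aws_cdk import aws_ecs_patterns as ecs_patterns
from constructs import Construct


class FargateStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        service = ecs_patterns.ApplicationLoadBalancedFargateService(
            self, "Service",
            memory_limit_mib=512,
            cpu=256,
            desired_count=2,
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=ecs.ContainerImage.from_registry("amazon/amazon-ecs-sample"),
                container_port=80,
            ),
            # Give containers time to start before failed health checks count.
            health_check_grace_period=Duration.seconds(120),
        )

        # Make sure the target group probes a path the app actually serves.
        service.target_group.configure_health_check(
            path="/health",                  # hypothetical health endpoint
            healthy_http_codes="200-399",
            interval=Duration.seconds(30),
            timeout=Duration.seconds(10),
        )
```

If the health check itself is fine, the stale task references may simply be stopped tasks from the previous deployment that ECS keeps visible for a while before they expire.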

AWS Fargate tasks stop logging shortly after starting up, although still healthy and running

We have a blue/green deployment in AWS using Fargate.
In this deployment we run two tasks. They start up healthy and we see the first logs, both in the ECS tasks themselves and in CloudWatch. After a few minutes, though, we don't get any new log messages in either one.
The tasks keep running, though, and everything looks normal.
I found this post that describes a very similar issue, but it does not have any answers yet: https://forums.aws.amazon.com/thread.jspa?messageID=987634&tstart=0
Has anybody experienced something similar, or can you advise where to check for possible causes?
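One place to start checking is the log configuration of the task definition the service is actually running, and whether the application buffers its stdout. A minimal sketch, assuming boto3; the cluster and service names are placeholders:

```python
# Minimal sketch: pull the log configuration out of the running service's
# task definition, the first thing to verify when log delivery silently stops.
import boto3

ecs = boto3.client("ecs")

CLUSTER = "my-cluster"   # hypothetical
SERVICE = "my-service"   # hypothetical

task_def_arn = ecs.describe_services(
    cluster=CLUSTER, services=[SERVICE]
)["services"][0]["taskDefinition"]

task_def = ecs.describe_task_definition(
    taskDefinition=task_def_arn
)["taskDefinition"]

for container in task_def["containerDefinitions"]:
    print(container["name"], container.get("logConfiguration"))
```

If the awslogs driver is configured correctly and the log group exists, the next suspect is output buffering in the application itself rather than anything on the AWS side.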

Why is CodeDeploy saying instances are too few or unhealthy?

I'm using CodeBuild, CodePipeline and CodeDeploy on AWS. I want CodeDeploy to deploy a built Java JAR to an EC2 instance that is part of an ASG. Pulling the code from GitHub and building go fine, but once CodePipeline gets to the deploy phase it pauses for about 5 minutes, then fails and gives this message:
The overall deployment failed because too many individual instances failed deployment, too few healthy instances are available for deployment, or some instances in your deployment group are experiencing problems.
I have followed these docs exactly, several times over, but I still get the error:
Integrating code deploy with auto scaling groups
Create/configure ec2 auto scaling group
Deploy the application
It's as if the deploy stage in CodePipeline just doesn't pick up the artifact: no events happen and there is no terminal output during the deploy phase. It just hangs and then fails 5 minutes later. When I click the link for the old experience, AWS routes you to an old version of CodeDeploy, and there I can see an error code:
Error code: HEALTH_CONSTRAINTS
But otherwise I don't see any other information. I've looked at this link too: explanation for health concerns ... but it is still no help, because I have tried those things and the problem persists. Any help would be greatly appreciated, as I have been at this for a couple of days now.
I added tags and it started working. In my case, despite tags being marked optional, they were necessary for the application to deploy.
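When the error is just HEALTH_CONSTRAINTS, the per-instance lifecycle events usually say more about what actually failed. A minimal sketch, assuming boto3; the deployment ID is a placeholder you would copy from the failed deployment:

```python
# Minimal sketch: dig the failed lifecycle events out of a deployment's
# targets, which is more specific than the generic HEALTH_CONSTRAINTS error.
import boto3

cd = boto3.client("codedeploy")

DEPLOYMENT_ID = "d-XXXXXXXXX"  # hypothetical, taken from the failed deployment

target_ids = cd.list_deployment_targets(deploymentId=DEPLOYMENT_ID)["targetIds"]
targets = cd.batch_get_deployment_targets(
    deploymentId=DEPLOYMENT_ID,
    targetIds=target_ids[:25],  # the batch call accepts a limited number of IDs
)["deploymentTargets"]

for target in targets:
    instance = target.get("instanceTarget", {})
    print(instance.get("targetId"), instance.get("status"))
    for event in instance.get("lifecycleEvents", []):
        if event.get("status") == "Failed":
            diag = event.get("diagnostics", {})
            print("  ", event.get("lifecycleEventName"),
                  diag.get("errorCode"), diag.get("message"))
```

If no targets show up at all, that points at the deployment group not matching any instances (tags or ASG membership) rather than at a failing appspec hook.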

CodeDeploy with AWS ASG

I have configured an AWS ASG using Ansible to provision new instances and then install the CodeDeploy agent via a "user_data" script, in a similar fashion to what is suggested in this question:
Can I use AWS code Deploy for pulling application code while autoscaling?
CodeDeploy works fine and I can install my application onto the ASG once it has been created. When new instances are launched in the ASG by one of my scaling rules (e.g. high CPU usage), the CodeDeploy agent is installed correctly. The problem is that CodeDeploy does not install the application on these new instances. I suspect it is trying to run before the user_data script has finished. Has anyone else encountered this problem, or does anyone know how to get CodeDeploy to automatically deploy the application to new instances spawned as part of the ASG?
Auto Scaling tells CodeDeploy to start the deployment before the user data has run. To get around this, CodeDeploy gives the instance up to an hour to start polling for commands for the first lifecycle event, instead of the usual 5 minutes.
Since you are having problems with automatic deployments but not manual ones, and assuming you didn't make any manual changes to your instances that you forgot about, there is most likely a dependency specific to your deployment that is not yet available at the time the instance launches.
Try listing out all the things your deployment needs in order to succeed, and make sure each of those is available before you install the host agent. If you can log onto the instance fast enough (before Auto Scaling terminates it), you can try to grab the host agent logs and your application's logs to find out where the deployment is failing.
If you think the host agent is failing to install entirely, make sure you have Ruby 2.0 installed. It should be there by default on Amazon Linux, but Ubuntu and RHEL need to have it installed as part of the user data before you can install the host agent. There is an installer log in /tmp that you can check for problems with the initial install (again, you have to be quick to grab the log before the instance terminates).
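Since the instances may terminate before you can log in, it can also help to inspect the automatically triggered deployments from the outside. A minimal sketch, assuming boto3; the application and deployment group names are placeholders:

```python
# Minimal sketch: confirm the ASG is registered with the deployment group and
# look at recent deployments (Auto Scaling-triggered ones have creator
# "autoscaling"), so failures can be read without racing the instance.
import boto3

cd = boto3.client("codedeploy")

APP = "my-app"              # hypothetical application name
GROUP = "my-deploy-group"   # hypothetical deployment group name

info = cd.get_deployment_group(
    applicationName=APP, deploymentGroupName=GROUP
)["deploymentGroupInfo"]
print("Auto Scaling groups:",
      [asg["name"] for asg in info.get("autoScalingGroups", [])])

recent = cd.list_deployments(
    applicationName=APP,
    deploymentGroupName=GROUP,
    includeOnlyStatuses=["Failed", "Stopped", "Succeeded"],
)["deployments"]

for deployment_id in recent[:5]:
    d = cd.get_deployment(deploymentId=deployment_id)["deploymentInfo"]
    print(deployment_id, d["status"], d.get("creator"), d.get("errorInformation"))
```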