AWS CodeDeploy hangs before ApplicationStop on a Windows Server 2016 - amazon-web-services

I have gotten a little bit stuck with this one. I am trying to use Code Deploy on a Windows Server EC2 instance with no luck, it keeps getting stuck before Application Stop and all phases are Pending until it fails then they are all Skipped.
What I've checked so far:
I have installed the Code-Deploy Agent on the server and made sure it was running
I have checked and double checked the in-bound and out-bound permissions on the EC2 instance (allowed all HTTP/HTTPS requests)
I have checked the IAM role on the Code Deploy application itself (I have given all the permissions i can think of)
I checked the appspec.yml (it only needs to transfer build files from the build phase to a folder on the EC2 itself
version: 0.0
os: windows
files:
- source: \path
destination: \path
hooks:
BeforeInstall:
AfterInstall:
ApplicationStart:
I have no idea why this would happen (I've deployed on Linux instances without this problem - the agent always started reading the appspec.yml)
Any help would be appreciated. Thanks!

By design, ApplicationStop is always executed from your last successful deployment's archive since that's when you started your application. This way CodeDeploy makes sure the scripts used for starting and stopping an application belong to the same revision [1]. We don't have complete data, but it could be that the ApplicationStop script from last deployment is causing the issue.
As per [1]:
If the cause of the failure is a script from the last successful
deployment that never runs successfully, create a deployment and
specify that the ApplicationStop, BeforeBlockTraffic, and
AfterBlockTraffic failures should be ignored. There are two ways to do
this:
Use the CodeDeploy console to create a deployment. On the Create
deployment page, under ApplicationStop lifecycle event failure, choose
Don't fail the deployment to an instance if this lifecycle event on
the instance fails.
Use the AWS CLI to call the create-deployment command and include the
--ignore-application-stop-failures option.
[1] https://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-deployments.html#troubleshooting-deployments-lifecycle-event-failures

If any future readers com across this thread then may I refer them to the following article which helped me with a "Net::OpenTimeout" error in CodeDeploy that manifested itself in a similar way.
https://aws.amazon.com/premiumsupport/knowledge-center/codedeploy-net-opentimeout-errors/

Related

Failed to deploy application: During an aborted deployment, some instances may have deployed the new application version

I can't deploy a new version on Elastic Beanstalk.
Everything was working fine until I tried to deploy a new version where I have lots of issues (It is not the first time I deploy a new version on this environment, I already have deployed dozens). When I manage to fixe all of them I got those errors:
Failed to deploy application.
During an aborted deployment, some instances may have deployed the new application version. To ensure all instances are running the same version, re-deploy the appropriate application version.
Unsuccessful command execution on instance id(s) 'i-...'. Aborting the operation
I redeploy the version which does not work.
Here is the Elastic Beanstalk console:
Elastic Beanstalk console
Elastic Beanstalk events
The request logs button from Elastic Beanstalk return nothing.
The system log from EC2 instance shows the last working version logs.
I enable the CloudWatch logs from Configuration navigation pane. It added 4 files to CloudWatch logs:
/var/log/eb-activity.log -> empty so far
/var/log/httpd/access_log -> empty so far
/var/log/httpd/error_log -> empty so far
/environment-health.log -> Command is executing on all instances (56 minutes or more elapsed).", "Incorrect application version found on all instances. Expected version \"prod-v1.7.28-0\" (deployment 128).
It is an Amazon Linux, t2.medium instance with Apache as web server
What I already try:
Change the name of .zip each time to be different of other zip already deploy
Terminate the instance and the loadBalancer automatically create a new one
Reboot the instance
Rebuild Elastic Beanstalk environment
Deploy a simplest code
I tried to deploy just a zip with the code below but I got same errors.
<html>
<head>
<title>This is the title of the webpage!</title>
</head>
<body>
<p>This is an example paragraph. Anything in the <strong>body</strong> tag will appear on the page, just like this <strong>p</strong> tag and its contents.</p>
</body>
</html>
It always go back to last working version and when I tried to deploy the new version it does not work.
On some post I see some people telling it is maybe because the instance is too small but before it was working perfectly and the size does not change since then.
If you have some questions or ideas I will be very thankful.
Have a nice day !
Answer:
The issue was in the logs like you said. I had to ssh into my EC2 instance to reached them. The error was in the file cfn-init-cmd.log.
One of the command was waiting for an input so it timed out with no error message.
You should check the logs of the EBS for any hints as to what goes wrong with your deployment. The AWS console
can be helpful for that.
There are also the logs that can be acquired from EC2:
CloudWatch logs is another thing to check.
You should also check the autoscaling group, and see if there are any health checks there. What kind of checks are these? What's the grace period?
Here's a list of reasons that an EC2 health check could fail.
You could launch a better ec2 instance for troubleshooting.
Instance status checks.
The following are examples of problems that can cause instance status checks to fail:
Failed system status checks
Incorrect networking or startup configuration
Exhausted memory
Corrupted file system
Incompatible kernel
Also rebuilding is really a drastic step as it destroys and rebuilds all your resources. Your ELB DNS for example will be gone, any associated EIP will be released. These things can't be reclaimed.
I also faced same issue and deleted the wrong application versions. And increased the command timeout.
Default max deployment time -Command timeout- is 600 (10 minutes)
Go to Your Environment → Configuration → Deployment preferences → Command timeout
Increase the Deployment preferences higher like 1800 and then try to deploy the previous working application version. It will work.

How to debug failed NetCore AWS Elastic Beanstalk deployment?

I have an DotNet Core AWS Elastic Beanstalk environment which has started failing to deploy. The environment waits up to 10 minutes for the healthcheck to pass, but consistent gets "403 - Forbidden: Access is denied.".
I've RDP'd to the environment and the folder C:\inetpub\AspNetCoreWebApps is empty. In working environments, this contains the code.
I've tried redeploying the entire environment and deploying a package from a week ago which was previously fine. Additionally, I've tried deploying using the AWS Toolkit for Visual Studio and by uploading a package rather than using CodePipeline. All of these actions result in the same behaviour.
I'm struggling to find any logs which indicate why the code isn't being deployed to the environment. Requesting the last 100 lines doesn't contain anything useful and I can't find any deployment logs on the filesystem. In the pulled logs there is no AWS.DeploymentCommands log file.
The environment is configured to deploy as rolling with batch +1. As such, it is a new EC2 which is being written to.
What is the next step in debugging the cause of the failure?
I was able to diagnose the problem - a public file referenced in the ebextension folder had been deleted. The log file I was looking for was in C:\cfn\log\cfn-init.txt.

Code Deploy fails without any error message

so I have been trying to setup code deploy for my application, but it keeps on failing. Initially, I didn't have an appspec.yml file in repository, so I got the error message that the appspec.yml file doesn't exist.
I have now included an appspec.yml file, but it still doesn't work and it doesn't give any error message. There are no events mentioned, like it used to before adding the appspec file.
I have less than a beginner's knowledge when it comes to creating a appspec.yml file, but I took hint from a youtube tutorial, and here is the file.
version: 0.0
os: linux
files:
- source: /
destination: /var/www/cms
If it helps, the ec2 instance is running an ubuntu server, /var/www/cms is that directory out of which the nginx is supposed to serve files.
The most likely problem you're facing is that the agent either isn't installed or the instance doesn't have sufficient permissions. When there are no events started on the instance for the deployment, it means that CodeDeploy couldn't talk to the host for some reasons.
Here's the steps I would take:
Confirm that you installed the CodeDeploy agent
Confirm that you've created the IAM service role
Confirm that you have the IAM Instance Profile and that it's associated with the instance
Check that you can reach the CodeDeploy commands endpoint in your region from the box. i.e. ping codedeploy.us-east-1.amazonaws.com Otherwise, your networking setup might be too restrictive.
Look at the logs on the host to see what's going on
I faced at sometime this thing and it was due to the following:
If we initially created and turned on the ec2 instance without setting the IAM service role, and after that we added the service role, it will not take effect until we restart the instance.
I had attached IAM role to EC2 instance but I did not restart my systemd service. And that was the cause of failure.
Also, without rebooting instance, you can just restart systemd service of codedeploy-agent.
In case it helps, I had the same problem and the reason was that codedeploy agent was not installed in the ec2 instance.
After installing it, everything worked like a charm.

AWS CodeDeploy executes before Auto Scaling userdata scripts finishes

I'm trying to setup an Auto Scaling Group in combination with CodeDeploy. Everything works fine except for the fact that when a new instance is created CodeDeploy starts before the user data script (defined in the Launch Configuration) finishes.
The default value of this user data script downloads and install the code deploy agent and i've extended it with installation of a couple of windows features, IIS rewrite module and msdeploy.
In my appspec.yml i'm using the hook AfterInstall to deploy my IIS website and this obviously fails when msdeploy is not installed (yet).
Am i going about this the wrong way or is there a way to make CodeDeploy wait for the user data script to finish?
Unfortunately, there's no was for CodeDeploy to know anything more than the instance has loaded it's OS. The good thing is that CodeDeploy give the host agent 1 hour to start polling for commands with automatic deployments. The easiest thing to do is install the host agent after all the required dependencies are installed. The automatic deployment will be created, but can't proceed until after the host agent is started.
This is explained in detail here - https://aws.amazon.com/blogs/devops/under-the-hood-aws-codedeploy-and-auto-scaling-integration/
Ordering execution of launch scripts – The CodeDeploy agent looks for and executes deployments as soon as it starts. There is no ordering between the deployment execution and launch scripts such as user data, cfn-init, etc. We recommend you install the host agent as part of (and maybe as the last step in) the launch scripts so that you can be sure the deployment won’t be executed until the instance has installed dependencies that are not part of your CodeDeploy deployment. If you prefer baking the agent into the base AMI, we recommend that you keep the agent service in a stopped state and use the launch scripts to start the agent service.

Codedeploy with AWS ASG

I have configured an aws asg using ansible to provision new instances and then install the codedeploy agent via "user_data" script in a similar fashion as suggested in this question:
Can I use AWS code Deploy for pulling application code while autoscaling?
CodeDeploy works fine and I can install my application onto the asg once it has been created. When new instances are triggered in the ASG via one of my rules (e.g. high cpu usage), the codedeploy agent is installed correctly. The problem is, CodeDeploy does not install the application on these new instances. I suspect it is trying to run before the user_data script has finished. Has anyone else encountered this problem? Or know how to get CodeDeploy to automatically deploy the application to new instances which are spawned as part of the ASG?
AutoScaling tells CodeDeploy to start the deployment before the user data is started. To get around this CodeDeploy gives the instance up to an hour to start polling for commands for the first lifecycle event instead of 5 minutes.
Since you are having problems with automatic deployments but not manual ones and assuming that you didn't make any manual changes to your instances you forgot about, there is most likely a dependency specific to your deployment that's not available yet at the time the instance launches.
Try listing out all the things that your deployment needs to succeed and make sure that each of those is available before you install the host agent. If you can log onto the instance fast enough (before AutoScaling terminates the instance), you can try and grab the host agent logs and your application's logs to find out where the deployment is failing.
If you think the host agent is failing to install entirely, make sure you have Ruby2.0 installed. It should be there by default on AmazonLinux, but Ubuntu and RHEL need to have it installed as part of the user data before you can install the host agent. There is an installer log in /tmp that you can check for problems in the initial install (again you have to be quick to grab the log before the instance terminates).