For some reason AWS Codedeploy rollback seems to always pick up the latest version and fails
Deployment 1 is success and a revision is created in S3 bucket.
Deployment 2 is failure and Codedeploy rollback kicks in which is Automatic
Deployment 3 also fails for the same reason as Deployment 2
Expected Codedeploy behaviour is for Deployment 3 it should pick up the Deployment 1 S3 build version.
I am not sure if there are any missing links in S3 bucket with Codedeploy. Any thoughts much appreciated.
Thank you
Not sure if this applies to your situation specifically, but "strange" rollback behavior of CodeDeploy is documented:
However, if the deployment that failed was configured to overwrite, instead of retain files, an unexpected result can occur during the rollback.
Thus it is possible that you are observing these "unexpected results" that can occur when you deploy and fail with an existing content.
You can read up more on that in:
Rollback behavior with existing content
After a bit of investigation with AWS CLI, I can see the the version being re written. Things are more clear when using the AWS CLI than what is shown in the console.
Thank you for taking the time to post back with a possible answer
Related
I have gotten a little bit stuck with this one. I am trying to use Code Deploy on a Windows Server EC2 instance with no luck, it keeps getting stuck before Application Stop and all phases are Pending until it fails then they are all Skipped.
What I've checked so far:
I have installed the Code-Deploy Agent on the server and made sure it was running
I have checked and double checked the in-bound and out-bound permissions on the EC2 instance (allowed all HTTP/HTTPS requests)
I have checked the IAM role on the Code Deploy application itself (I have given all the permissions i can think of)
I checked the appspec.yml (it only needs to transfer build files from the build phase to a folder on the EC2 itself
version: 0.0
os: windows
files:
- source: \path
destination: \path
hooks:
BeforeInstall:
AfterInstall:
ApplicationStart:
I have no idea why this would happen (I've deployed on Linux instances without this problem - the agent always started reading the appspec.yml)
Any help would be appreciated. Thanks!
By design, ApplicationStop is always executed from your last successful deployment's archive since that's when you started your application. This way CodeDeploy makes sure the scripts used for starting and stopping an application belong to the same revision [1]. We don't have complete data, but it could be that the ApplicationStop script from last deployment is causing the issue.
As per [1]:
If the cause of the failure is a script from the last successful
deployment that never runs successfully, create a deployment and
specify that the ApplicationStop, BeforeBlockTraffic, and
AfterBlockTraffic failures should be ignored. There are two ways to do
this:
Use the CodeDeploy console to create a deployment. On the Create
deployment page, under ApplicationStop lifecycle event failure, choose
Don't fail the deployment to an instance if this lifecycle event on
the instance fails.
Use the AWS CLI to call the create-deployment command and include the
--ignore-application-stop-failures option.
[1] https://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-deployments.html#troubleshooting-deployments-lifecycle-event-failures
If any future readers com across this thread then may I refer them to the following article which helped me with a "Net::OpenTimeout" error in CodeDeploy that manifested itself in a similar way.
https://aws.amazon.com/premiumsupport/knowledge-center/codedeploy-net-opentimeout-errors/
If I perform a deployment with AWS CodeDeploy that has an incorrect (failing) script for the ApplicationStop hook, it seems that I cannot do new deployments at all because the ApplicationStop hook from the previous (failed) deployment is run instead of the new one, and it just constantly fails.
Beyond deleting and recreating the application and/or deployment group, is there any way to tell code deploy to do a deployment as if it was a completely new deployment, and not run the ApplicationStop hook?
Relatedly, is it possible to ignore errors in ApplicationStop?
Yes it is possible to ignore errors in ApplicationStop. The CreateDeployment API call has a flag “ignoreApplicationStopFailures” [1] which can be used in this case.
If the cause of the failure is a script from the last successful deployment that will never run successfully, create a new deployment (you can copy a deployment as well) and specify that the ApplicationStop failures should be ignored. You can do this in two ways:
Use the AWS CodeDeploy console to create a deployment. On the Create deployment page, under ApplicationStop lifecycle event failure, choose Don't fail the deployment to an instance if this lifecycle event on the instance fails.
Use the AWS CLI to call the create-deployment command and include the --ignore-application-stop-failures option.
When you deploy the application revision again, the deployment will continue even if the ApplicationStop lifecycle script fails. If the new revision includes fixed scripts for the ApplicationStop lifecycle event, future deployments can succeed without applying this fix.
References:
[1]https://docs.aws.amazon.com/codedeploy/latest/APIReference/API_CreateDeployment.html#CodeDeploy-CreateDeployment-request-ignoreApplicationStopFailures
[2]https://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-deployments.html#troubleshooting-deployments-lifecycle-event-failures
I have a CodePipeline set up where changes to code builds and pushes an image to ECR. I am looking to automate updating ECS with the new image as it is built. I have configured the ECS Blue/Green action but when it runs it fails almost immediately with a message about an "Internal Error". There is no failed deployment created in CodeDeploy.
I have configured CodePipeline with two inputs:
the source code used to build the image
a zip in S3 containing the appspec.yaml and the taskdef.json
When either input changes I rebuild the container and push to ECR tagged 'latest'. The next step should be a Blue/Green deployment to ECS. I have configured CodeDeploy and the job works if triggered manually.
When it is triggered via CodePipeline it will fail and I receive a message "Action execution failed
InternalError. Error reference code: <some id>". I suspect that there may be some underlying issue with IAM but I can't find where to start looking at this stage. There is no failed deployment shown in CodeDeploy so I don't see a way to get more information about what has failed.
My policy for CodePipeline is copied from the one documented here: https://docs.aws.amazon.com/codepipeline/latest/userguide/how-to-custom-role.html#view-default-service-role-policy
I have read through the troubleshooting docs here: https://docs.aws.amazon.com/codepipeline/latest/userguide/troubleshooting.html
I believe my issue is similar to the one described here: https://forums.aws.amazon.com/thread.jspa?messageID=897822
After a bit more reading of similar posts here, on serverfault.com and the AWS forums I have been able to resolve this.
In my case the issue was that my taskdef.json was not valid. It took me several hours going through each step to realise that while it was valid JSON it only included the container definitions section. On fixing that it appears to now be working correctly.
In the end I have two, related, CodePipelines. One for deploying updated ECR images to ECS (described above) and the other which updates infrastructure and generates a zip containing taskdef.json and appspec.yaml. If that zip changes then my container pipeline runs; likewise if the container image source changes. It needs more testing but right now this appears to be working very smoothly.
I have a AWS CodeDeploy which deploy 3 Instances. No matter what deployment configure I set (oneAtTime, halfAtTime, allAtTime) or even use customized type (HOST_COUNT, min_health_host = 2 (cannot set 3 because that is not how codedeploy works), sometime I got codeDepoly succeeds even only 2 instances are successfully deployed.
I have talked to AWS support center. They said it is expected and I know why it is expected. Looks like their calculation works only if there are tons of instances to be deployed.
But in my case, it does not make sense that 2 out of 3 success means success. Is anybody unhappy about this behaviors and have any workaround?
The way CodeDeploy seems to have been designed is to try to have successful overall deployments, so, if you are looking to have your overall deployment to fail because one of the instance deployments failed, then maybe CodeDeploy is not what you are looking for. Additionally, this is the math behind the deployment configurations and the overall deployment failures for 3 instances:
AllAtOnce: The overall deployment will fail only if ALL 3 instance deployments failed. Meaning that having 1 successful instance deployment then the overall deployment will be successful.
HalfAtATime: The overall deployment will fail if 2 instance deployments failed. Having 2 instance deployments successful means the overall deployment will be successful.
OneAtATime: The overall deployment will fail if the first or the second deployment fails. If the third deployment fails, the overall deployment will be successful.
I'm using CircleCI with AWS CodeDeploy to automate the deployment of my app on EC2 from a Github repository.. The problem is that the build is always getting retried with this message given:
"Looks like we had a bug in our infrastructure, or that of our providers (generally GitHub or AWS). We should have automatically retried this build. We've been alerted of the issue and are almost certainly looking into it, please contact us if you're interested in the cause or if the problem persists."
The last step the build stops after is called: "Bootstrap AWS CodeDeploy"
I don't know whether the problem is from the circle.yml file or the appspec.yml file!
Any Help??
Are you able to see a deployment created on your AWS CodeDeploy console? If a deployment is created, then it means the build process succeeded, but the deployment failed. If deployment is failed, you can try to see the root cause for further investigation. Otherwise the failure should be something related to CircleCI.
Thanks,
Binbin