To reproduce:
1. Create a CloudFormation stack containing an RDS instance.
2. Attempt to delete the stack with
   aws cloudformation delete-stack --stack-name=[stack name]
   aws cloudformation wait stack-delete-complete --stack-name=[stack name]
3. Once the second command returns DELETE_FAILED, check the stack events list to find this message:
   One or more database instances are still members of this parameter group […], so the group cannot be deleted
4. Manually force deletion of the database instance with
   aws rds delete-db-instance --db-instance-identifier=[DB physical ID] --skip-final-snapshot
   aws rds wait db-instance-deleted --db-instance-identifier=[DB physical ID]
5. Repeat step 2.
6. Once the second command returns DELETE_FAILED, check the stack events list to find these messages:
   Secrets Manager can't find the specified secret.
   and
   The following resource(s) failed to delete: [DB logical ID].
Now what? The secret and DB are gone, but the stack can't be deleted.
Last time this happened I was told by AWS Support to simply wait until the stack "caught up" with the fact that the database instance was deleted, but that's not ideal as it takes more than 12 hours.
Sort of relevant: How do I delete an AWS CloudFormation stack that's stuck in DELETE_FAILED status?
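One approach that may apply to this stuck state is the --retain-resources option of delete-stack. It only works on stacks that are already in DELETE_FAILED, and it tells CloudFormation to drop the listed logical IDs from the stack without trying to delete the underlying resources. A minimal sketch, assuming the resource that fails to delete is the already-removed DB instance; force_delete_stack is a hypothetical helper name and the arguments are the placeholders used above:

```shell
# Sketch: delete a stack stuck in DELETE_FAILED while skipping resources
# that are already gone (for example the manually deleted DB instance).
# force_delete_stack is a hypothetical helper name, not an AWS command.
force_delete_stack() {
  stack_name="$1"
  shift
  # The remaining arguments are LOGICAL IDs that CloudFormation should
  # retain, i.e. not attempt to delete.
  aws cloudformation delete-stack \
    --stack-name "$stack_name" \
    --retain-resources "$@" &&
  aws cloudformation wait stack-delete-complete --stack-name "$stack_name"
}

# Usage (placeholders as in the steps above):
#   force_delete_stack "[stack name]" "[DB logical ID]"
```

Anything listed as retained is simply forgotten by the stack, so this is only safe for resources that have really been deleted or that you intend to clean up by hand.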
I am receiving the following errors in the EC2 CloudWatch Agent logs, /var/logs/awslogs.log:
I verified the EC2 has a role:
And the role has the correct policies:
I have set the correct region in /etc/awslogs/awscli.conf:
I noticed that running aws configure list in the EC2 gives this:
Is this incorrect? Should it list the profile (EC2_Cloudwatch_Profile) there?
I was using terraform and reprovisioning by doing:
terraform destroy && terraform apply
It looks like IAM, being a global service, is "eventually consistent" rather than "immediately consistent". When the instance profile was destroyed, terraform apply began too quickly: although the destroy had completed, the ARN of the previous instance profile was still there and was reused, while its ID changed to a new one.
Replacing the EC2 would bring it up to speed with the correct ID. However, my solution is to just wait longer between terraform destroy and apply.
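The "wait longer" workaround can be sketched as a small wrapper. The helper name reprovision and the 60-second default are assumptions for illustration, not values taken from the setup above; the right delay depends on how long IAM takes to settle in your account:

```shell
# Sketch: pause between destroy and apply so IAM has time to propagate
# the deletion of the instance profile.
# reprovision is a hypothetical helper; the 60-second default is an
# arbitrary assumption and may need tuning.
reprovision() {
  delay_seconds="${1:-60}"
  terraform destroy &&
  sleep "$delay_seconds" &&
  terraform apply
}

# Usage:
#   reprovision        # wait 60 seconds between destroy and apply
#   reprovision 120    # wait two minutes instead
```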
I have been successfully creating and destroying this AWS CloudFormation stack https://github.com/jacekdalkowski/aws-cf-eks-min/blob/main/eks-min.yaml recently, but today the deletion process failed and I have not been able to clean it up since.
According to logs, it failed when deleting (in order):
VpcGatewayAttachement
PublicSubnet2
InternetGateway
Vpc.
I tried to manually delete these, but it seems that yet another resource BastionHostSshNetworkInterface (of type AWS::EC2::NetworkInterface) has not actually been destroyed.
I can neither detach nor destroy the interface. When trying to detach the interface via the console (web page) on my master/admin account, I get an error:
Failed to detach the network interface. API error: "You do not have permission to access the specified resource."
When trying to detach via the CLI with:
aws ec2 detach-network-interface --attachment-id eni-attach-03e1... --force
I get a similar error:
An error occurred (AuthFailure) when calling the DetachNetworkInterface operation: You do not have permission to access the specified resource.
I tried granting myself various policies, e.g. AWSNetworkManagerFullAccess and AmazonEC2FullAccess.
How can I grant myself or my CLI user rights to a particular network interface, so that I can detach and/or destroy it?
The problem turned out to be that, before deleting the stack, I had been experimenting with Kubernetes and setting up a service. Setting up a service on EKS results in an Elastic Load Balancer being created. I did not delete my Kubernetes environment before deleting the stack, so the ELB was left orphaned but still attached to the VPC, which prevented the VPC from being deleted.
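To check for this kind of leftover before deleting a stack, something like the following sketch could list what is still attached to the VPC. The helper name list_vpc_leftovers is hypothetical, and the VPC ID is whatever your stack created:

```shell
# Sketch: list load balancers and network interfaces still attached to a VPC.
# list_vpc_leftovers is a hypothetical helper; pass the VPC ID as $1.
list_vpc_leftovers() {
  vpc_id="$1"
  # Classic load balancers in the VPC.
  aws elb describe-load-balancers \
    --query "LoadBalancerDescriptions[?VPCId=='${vpc_id}'].LoadBalancerName" \
    --output text
  # Application / Network load balancers in the VPC.
  aws elbv2 describe-load-balancers \
    --query "LoadBalancers[?VpcId=='${vpc_id}'].LoadBalancerName" \
    --output text
  # Any remaining network interfaces, with their descriptions, which
  # usually hint at the service that owns them.
  aws ec2 describe-network-interfaces \
    --filters "Name=vpc-id,Values=${vpc_id}" \
    --query "NetworkInterfaces[].[NetworkInterfaceId,Description]" \
    --output text
}

# Usage:
#   list_vpc_leftovers vpc-0123456789abcdef0
```

An interface owned by a load balancer generally cannot be detached directly; deleting the Kubernetes service (or the load balancer itself) releases it.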
When I deploy with CloudFormation using aws cloudformation deploy --region $region --stack-name ABC
I get the error:
An error occurred (ValidationError) when calling the CreateChangeSet
operation:
Stack:arn:aws:cloudformation:stack/service/7e1d8c70-d60f-11e9-9728-0a4501e4ce4c
is in ROLLBACK_COMPLETE state and can not be updated.
This happens when stack creation fails. By default the stack will remain in place with a status of ROLLBACK_COMPLETE. This means it's successfully rolled back (deleted) all the resources which the stack had created. The only thing remaining is the empty stack itself. You cannot update this stack; you must manually delete it, after which you can attempt to deploy it again.
If you set "Rollback on failure" to disabled in the console (or set --on-failure to DO_NOTHING in the CLI command, if using create-stack), stack creation failure will instead result in a status of CREATE_FAILED. Any resources created before the point of failure won't have been rolled back.
If instead you were deploying updates to an existing (successfully created) stack, and the updates failed but were successfully rolled back, it will go back into its previous valid state (with a status of UPDATE_ROLLBACK_COMPLETE), allowing you to reattempt updates.
As @SteffenOpel points out, you can now specify that a stack should be deleted on failure by setting the --on-failure option (for create-stack only, not deploy) to DELETE in the CLI. This option is not yet available in the console at the time of writing (13/11/20).
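As a rough illustration of the --on-failure option with create-stack, here is a minimal sketch. The helper name and the idea of passing a local template file are assumptions for the example:

```shell
# Sketch: create a stack that is deleted automatically if creation fails,
# so it never lingers in ROLLBACK_COMPLETE.
# create_stack_delete_on_failure is a hypothetical helper name.
create_stack_delete_on_failure() {
  stack_name="$1"
  template_file="$2"
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-body "file://${template_file}" \
    --on-failure DELETE
}

# Usage:
#   create_stack_delete_on_failure ABC template.yaml
```

The trade-off is that a failed stack and its events disappear from the default stack list, which makes the failure harder to analyze afterwards.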
Run the following AWS CLI command to delete your stack:
aws cloudformation delete-stack --stack-name <<stack-name>>
Deleting the stack may take less than a minute; once it is gone, try redeploying.
Two solutions:
1. Manually delete all the objects in the S3 bucket.
(If the error still occurs, i.e. Stack:arn:aws:cloudformation:eu-west-3:624140032431:stack/as*****cbucket/f57c54f0-618a-11ec-afd7-06fc90426f3e is in ROLLBACK_COMPLETE state and can not be updated., move on to the second solution.)
2. Create a new bucket to continue.
S3 bucket names must be globally unique. The same thing happened to me: I was getting the same error while using CloudFormation.
In my case the S3 bucket name was not unique because a bucket with that name already existed. I changed the name of the bucket and it worked.
I am using the sam deploy command to deploy my Lambda function to AWS. Sometimes I get this error: An error occurred (ValidationError) when calling the CreateChangeSet operation: Stack:arn:aws:cloudformation:ap-southeast-2:xxxx:stack/xxxx/xxxx is in ROLLBACK_COMPLETE state and can not be updated. I know this means the previous deployment failed. I can manually delete the stack in the AWS CloudFormation console and retry the command, but is there a way to force the command to delete any stack that is in a rollback state?
I know I can delete the failed stack via the AWS CLI or the console. But my deploy script runs on CI, and I'd like CI to use the deploy command to override the failed stack. So the scenario is:
1. CI fails to deploy the Lambda function
2. My team analyzes the issue and fixes it in the CloudFormation template file
3. The fix is pushed to GitHub to trigger CI
4. CI is triggered and uses the latest change to override the failed stack.
I don't want the team to manually delete the stack.
The ROLLBACK_COMPLETE status exists only after a failed stack creation. The only option is to delete the stack. This is to give you a chance to correctly analyze the reason behind the failure.
You can delete the stack from the command line with:
aws cloudformation delete-stack --stack-name <value>
From the documentation of ROLLBACK_COMPLETE:
Successful removal of one or more stacks after a failed stack creation or after an explicitly canceled stack creation. Any resources that were created during the create stack action are deleted.
This status exists only after a failed stack creation. It signifies that all operations from the partially created stack have been appropriately cleaned up. When in this state, only a delete operation can be performed.
Normally ROLLBACK_COMPLETE should not happen in production. I would suggest validating your stack in a development environment, or having one successful stack creation in your production environment, before continuously deploying your stack.
Still, you could have a custom script in your CI that checks the stack status (DescribeStacks) and if it's ROLLBACK_COMPLETE delete it (DeleteStack). This script would run before sam deploy.
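A minimal sketch of such a pre-deploy step, using the CLI equivalents of DescribeStacks and DeleteStack. The helper name delete_if_rollback_complete is hypothetical, and the sketch treats any describe-stacks failure as "stack does not exist", which is a simplifying assumption:

```shell
# Sketch of a pre-deploy step: delete the stack if (and only if) it is
# stuck in ROLLBACK_COMPLETE.
# delete_if_rollback_complete is a hypothetical helper name.
delete_if_rollback_complete() {
  stack_name="$1"
  # describe-stacks fails when the stack does not exist; this sketch
  # treats any failure as "nothing to clean up".
  status=$(aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query 'Stacks[0].StackStatus' \
    --output text 2>/dev/null) || status="NOT_FOUND"

  if [ "$status" = "ROLLBACK_COMPLETE" ]; then
    echo "Deleting $stack_name (status: $status)"
    aws cloudformation delete-stack --stack-name "$stack_name" &&
    aws cloudformation wait stack-delete-complete --stack-name "$stack_name"
  else
    echo "Leaving $stack_name alone (status: $status)"
  fi
}

# In CI, before deploying:
#   delete_if_rollback_complete my-stack && sam deploy
```

Checking for exactly ROLLBACK_COMPLETE keeps the script from ever deleting a stack that was created successfully at some point.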
I have a CF template with a simple secret inside, like this:
Credentials:
  Type: 'AWS::SecretsManager::Secret'
  Properties:
    Name: !Sub ${ProjectKey}.${StageName}.${ComponentId}.credentials
    Description: client credentials
    SecretString: !Sub
      '{"client_id":"${ClientId}","client_secret":"${ClientSecret}"}'
The stack is created successfully and the secret is correctly generated.
However when I delete the stack and recreate it again I get the following error message:
The operation failed because the secret pk.stage.compid.credentials
already exists. (Service: AWSSecretsManager; Status Code: 400; Error
Code: ResourceExistsException; Request ID: ###)
I guess this is because the secret is not really deleted but only marked for deletion for x days.
It is possible to delete a secret immediately via CLI, but how can this be done within the CF Template?
I need to delete and recreate the stacks because this is part of a continuous integration/delivery pipeline that is triggered automatically on source code commits.
Normally when you delete a stack the secret should be deleted also; and CFN does the aforementioned immediate delete. This should succeed even if the secret was scheduled for deletion outside of the CFN stack.
If, after your stack was deleted, the secret was re-created by another CloudFormation stack or by the same test running in another CI pipeline, you might see this error. Also, most AWS systems (Secrets Manager included) are eventually consistent, so you may see a delay between the stack being deleted and the actual secret deletion. If your tests run quickly enough, or the same secret name is reused in multiple tests, the previous delete may not have completed before the next create.
We have faced similar problems in our CI stacks, and we work around them by using a randomly generated name for each test. You could, for example, pass a random prefix to your stacks as a parameter and use it to construct the secret name, ensuring each test uses a unique name.
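A sketch of the random-prefix idea follows. NamePrefix is a hypothetical template parameter (the template would have to declare it and use it in the secret's Name, for example via !Sub), and both helper names are made up for this example:

```shell
# Sketch: generate a random per-run prefix and pass it to the stack.
# NamePrefix is a HYPOTHETICAL template parameter; the template must
# declare it and use it when building the secret's Name.
unique_prefix() {
  # Four random bytes rendered as eight hex characters.
  printf 'ci-%s' "$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')"
}

deploy_with_unique_secret() {
  stack_name="$1"
  template_file="$2"
  aws cloudformation deploy \
    --stack-name "$stack_name" \
    --template-file "$template_file" \
    --parameter-overrides "NamePrefix=$(unique_prefix)"
}

# Usage:
#   deploy_with_unique_secret my-test-stack template.yaml
```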
BTW - you can test if a secret was scheduled for deletion or is actually not there by running get-secret-value on the secret. If it is scheduled for deletion you will see the error "...You can’t perform this operation on the secret because it was deleted", whereas if the secret is actually deleted you will see "Secrets Manager can’t find the specified secret". If you schedule a secret for deletion and then delete it with --force-delete-without-recovery you may see a short multi-second lag between the two states.
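As an alternative to parsing the get-secret-value error text, the same distinction can be sketched with describe-secret, which reports a DeletedDate for secrets that are scheduled for deletion. This swaps in a different call from the one described above, and secret_state is a hypothetical helper name:

```shell
# Sketch: report whether a secret is active, scheduled for deletion, or gone.
# Uses describe-secret instead of get-secret-value (a substitution made
# for this example). secret_state is a hypothetical helper name.
secret_state() {
  secret_id="$1"
  if deleted_date=$(aws secretsmanager describe-secret \
      --secret-id "$secret_id" \
      --query 'DeletedDate' --output text 2>/dev/null); then
    # With --output text, a missing DeletedDate is printed as "None".
    if [ -z "$deleted_date" ] || [ "$deleted_date" = "None" ]; then
      echo "active"
    else
      echo "scheduled-for-deletion"
    fi
  else
    # Simplifying assumption: any failure is reported as "not-found".
    # A real script should inspect the error, since it could also be a
    # permissions or network problem.
    echo "not-found"
  fi
}

# Usage:
#   secret_state pk.stage.compid.credentials
```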
Another option is to delete the secret immediately through the CLI. This avoids the recovery window (7 to 30 days, 30 by default) during which the secret is only marked for deletion before it is actually gone. This command-line option does the trick:
aws secretsmanager delete-secret --secret-id your-secret --force-delete-without-recovery --region your-region
Replace your-secret and your-region with your own values.
See this reference page: https://aws.amazon.com/premiumsupport/knowledge-center/delete-secrets-manager-secret/