Terraform fails because tfstate (S3 backend) is lost

I am creating AWS infrastructure using Terraform with an S3 backend configuration. The issue is that someone deleted the S3 bucket storing the state, and now every time I run Terraform it fails, saying the resources already exist. The old tfstate is lost, and the new one has no information about the existing resources.
Note: I do not have write access to the AWS environment. I trigger Terraform via a Jenkins CD pipeline, so I cannot manually modify the infrastructure or run any Terraform command.
Is there a way to clean up the existing resources, or to force recreating them (if they already exist) from the .tf files? This is the only place I can make changes.

You really are in a mess. You need to restore the S3 bucket, or create a new one and point your backend configuration at it.
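Re-pointing the code is just a matter of editing the backend block and running terraform init again; a minimal sketch, with hypothetical bucket, key and region:

terraform {
  backend "s3" {
    # Hypothetical values: substitute your real bucket, key and region.
    bucket = "my-new-tfstate-bucket"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
}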
You then need to recreate the state you lost, or else delete every object you created via Terraform and start again. Most resources can be brought back under Terraform's control via the terraform import command.
This could be a significantly large task.
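Each import is a single command; a hedged sketch, with a hypothetical resource address and instance ID:

terraform import aws_instance.web i-0123456789abcdef0

You repeat that for every resource the old state tracked, which is why this can take so long.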
And you would need write access to the bucket. terraform refresh is only going to help if you still have the state file, which you don't.
If you haven't got permission to do that, then either give up or persist until you get sufficient privileges.
If you can't run Terraform locally, then you are also wasting your time.
Good luck.
However....
You don't want to be here again. How did you delete/lose the bucket?
You really need that to never happen again. As @ydaetskcoR said, put some MFA delete protection on the bucket (definitely do that), and adding versioning to it is a really good idea.
Also, if you haven't added DynamoDB locking to the bucket, do so; it's really worth it.
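A minimal sketch of the versioning and locking side, assuming a hypothetical aws_s3_bucket.state resource and table name:

# Versioning lets you recover the state object if it is deleted again.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table for the S3 backend; the hash key must be a string named LockID.
resource "aws_dynamodb_table" "lock" {
  name         = "my-tf-lock-table"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}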
This may also be a good time to think about tagging your infrastructure, so that you can identify which resources belonged to which code. That would also help, well, next time.

If you're working in a production account, follow the advice of others and do not mess with anything in the account manually!
If you are just starting out with terraform or terragrunt and you're trying to reset the terraform state:
1. Ensure that you are logged into the correct AWS account, that it is not a production account, and that it is not in use by anyone else.
2. Terraform keeps the state object in S3 and, if locking is enabled, a matching entry in DynamoDB under the same key. To reset the state, you will need to delete both (see the sketch below).
3. If anything else was previously created in the account, you will need to delete it manually. Depending on how much you created, this could take a very long time.
Once you have deleted the S3 state object, the DynamoDB entry named after it, and the infrastructure created by Terraform, you should be able to run terraform init and terraform plan without further issue.
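A hedged sketch of the deletion from step 2, with hypothetical bucket, key and table names (the S3 backend's DynamoDB digest item conventionally has a LockID of <bucket>/<key>-md5):

aws s3 rm s3://my-tf-state-bucket/prod/terraform.tfstate
aws dynamodb delete-item \
  --table-name my-tf-lock-table \
  --key '{"LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate-md5"}}'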

Related

Ignoring already configured resources in Terraform - AWS

There is Terraform code to configure an MWAA environment in AWS. When it runs a second time, there is no need to create the IAM role or policy again, so it fails with an error.
How can I ignore the creation of already-existing resources in TF?
I assume that you applied a Terraform plan which created the resource "MWAA", then you somehow lost the state (stored locally and lost? or not shared with a different client?), then you re-applied the plan, and Terraform tried to create "MWAA" again.
In that case, your main problem is that you lost the state, and you need to make sure that you do persist it, e.g., by storing it in a bucket.
However, if you really need to make Terraform aware about an already created resource, you need to put it in Terraform's state. One tool to do that is "terraform import", about which you can read more here: https://www.terraform.io/cli/import
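For the MWAA case that might look like the following sketch; the resource addresses, role name, policy ARN and account ID are all hypothetical:

# IAM roles import by name, IAM policies by ARN.
terraform import aws_iam_role.mwaa_execution my-mwaa-execution-role
terraform import aws_iam_policy.mwaa arn:aws:iam::123456789012:policy/my-mwaa-policy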
If you already have the state file and Terraform is still trying to recreate the resource, then it may be a tag change or a modified timestamp value change...
To avoid applying everything, you can restrict the apply to a specific resource:
terraform apply --target=resource_type.resource_name

CDK deployment and least privilege principle

We're (mostly happily ;)) using the AWS CDK to deploy our application stack to multiple environments (e.g. production, centralized dev, individual dev).
Now we want to increase the security by applying the least privilege principle to the deployment role. As the CDK code already has all the information about which services it will touch, is there a best practice as to how to generate the role definition?
Obviously it can't be a part of the stack as it is needed to deploy the stack.
Is there any mechanism built into the CDK for this (e.g. the construct CloudFrontDistribution is used, so the deployment role needs permission to create, update and delete CloudFront distributions, possibly even scoped down, once the distribution exists, to just that one distribution)?
Any best practices on how to achieve that?
No. Sadly there isn't currently (2022-Q3) a way to have the CDK code also produce an IAM policy that would grant you the access needed to deploy that template and nothing more.
However, everything is there to do it, and thanks to aspects it could probably be done relatively easily if you wanted to put in the leg work. I know many people in the community would love to have this.
You run into a chicken-and-egg problem here (we encounter a similar issue with Secrets Manager and initializing secrets). Pretty much the only solution I've found that works is a first-time setup script that uses an SDK or the CLI to run the necessary commands for that initial setup. Then you can reference the result from there on.
However, it also depends on what roles you're talking about. cdk deploy pretty much needs access to any resource you may be setting up, but you can limit it through users. Your root-admin setup script, kept in a secret lockbox, can set up a single power user that can then be used for initial cdk deploys. You can set up additional user groups that have the ability to deploy CDK, or have that initial setup create a role that cdk deploy can assume.
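A hedged sketch of such a first-time setup via the CLI; the role name, policy documents and account ID are hypothetical:

# One-time bootstrap, run by an admin: create a role for deployments.
aws iam create-role \
  --role-name cdk-deploy-role \
  --assume-role-policy-document file://trust-policy.json
aws iam put-role-policy \
  --role-name cdk-deploy-role \
  --policy-name cdk-deploy-policy \
  --policy-document file://deploy-policy.json

# Later deployments hand the role to CloudFormation.
npx cdk deploy --role-arn arn:aws:iam::123456789012:role/cdk-deploy-role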

How can I create an S3 bucket and object (like upload a shell script file) programmatically

I have to do this for almost 100 accounts, so I am planning to do it with some infrastructure-as-code tool. CloudFormation does not support creating objects. Can anyone help?
There are several strategies, depending on the client environment.
The aws-cli may be used for shell scripting, the aws-sdk for JavaScript environments, or Boto3 for Python environments.
Given the client environment, creating an S3 object is almost a one-liner, leaving bucket security and lifecycle concerns aside.
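For example, with the aws-cli (bucket name and file are hypothetical):

aws s3 mb s3://my-example-bucket
aws s3 cp ./bootstrap.sh s3://my-example-bucket/bootstrap.sh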
As Rich Andrew said, there are several different technologies. If you are trying to do infrastructure as code and attach policies and roles, I would suggest you look into Terraform or Serverless.
I frequently combine two of the techniques already mentioned above.
For infrastructure setup - Terraform. This tool is consistently ahead of the competition (Ansible, etc.) in terms of cloud modules. You can use it to create the bucket, bucket policies, users and their IAM policies for bucket access, upload initial files to the bucket, and much more.
It keeps a state file containing a record of those resources, so you can reuse the same workflow to destroy everything it created, if necessary, with very few modifications.
It is very easy to get started with, but not endlessly flexible, and you can be caught out if a scope change in the middle of a project suddenly requires a feature that isn't there.
To get started, check out the Terraform module registry: https://registry.terraform.io/.
It has quite a few S3 modules available to get you started even quicker.
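A minimal Terraform sketch of the bucket-plus-object part; the names and file path are hypothetical:

resource "aws_s3_bucket" "scripts" {
  bucket = "my-example-scripts-bucket"
}

# Uploads the script as an object; etag forces re-upload when the file changes.
resource "aws_s3_object" "bootstrap" {
  bucket = aws_s3_bucket.scripts.id
  key    = "bootstrap.sh"
  source = "${path.module}/bootstrap.sh"
  etag   = filemd5("${path.module}/bootstrap.sh")
}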
For interaction with AWS resources - Python and Boto3. In your case that would be the subsequent file uploads and deletions in the S3 bucket.
You can use Boto3 to set up infrastructure too, just like Terraform, but it will require more work on your side (like handling exceptions and errors).
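A hedged Boto3 sketch of those subsequent uploads and deletions; the bucket, key and file name are hypothetical:

import boto3

s3 = boto3.client("s3")

# Upload a local script as an object, then delete it again.
s3.upload_file("bootstrap.sh", "my-example-bucket", "bootstrap.sh")
s3.delete_object(Bucket="my-example-bucket", Key="bootstrap.sh")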

How to update Terraform state with manual changes done on resources

I had provisioned some resources on AWS, including an EC2 instance, but afterwards we attached some extra security groups to these instances. Terraform has now detected this and says it will roll it back as per the configuration file.
Let's say I had the below code, which attaches an SG to my EC2 instance:
vpc_security_group_ids = ["sg-xxxx"]
But now my problem is: how can I update the terraform.tfstate file so that it does not detach the manually attached security groups?
I can solve it as below:
1. Refresh the Terraform state file with terraform refresh, which will update the state file.
2. Then manually update my Terraform configuration file with the security group IDs that were attached by hand.
But that is only feasible for a small setup. What if we have a complex scenario? Do we have any other mechanism in Terraform which would detect the drift and update it?
Thanks!
There is no way Terraform will update your source code when it detects drift on AWS.
The process you mention is right:
1. Report the manual changes done in AWS back into the Terraform code (see the sketch after this list).
2. Do a terraform plan. It will refresh the state and show you whether there is still a difference.
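For step 1, that just means adding the manually attached group to the code; sg-yyyy below is a hypothetical stand-in for the group that was attached by hand:

vpc_security_group_ids = ["sg-xxxx", "sg-yyyy"]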
You can use terraform import with the resource's ID to bring the remote changes into your Terraform state file. Afterwards, use terraform plan to check whether the change is reflected in the code.
This can be achieved by updating the Terraform state file manually, but it is not best practice to edit this file by hand.
Also, if you are updating your AWS resources (created by Terraform) manually or outside the Terraform code, it defeats the whole purpose of Infrastructure as Code.
If you are looking to manage complex infrastructure on AWS using Terraform, it is very good to follow best practices, and one of them is that all changes should be done via code.
Hope this helps.
terraform import <resource>.<resource_name> [unique_id_from_aws]
You may need to temporarily comment out any provider/resource that relies on the output of the manually created resource.
After running the above, un-comment the dependencies and run terraform refresh.
The accepted answer is technically not correct.
As per my testing:
- terraform refresh will update the state file with the current live configuration.
- terraform plan only refreshes the live configuration internally and compares it to the code; it does not actually update the state file.
- terraform apply will update the state file to the current live configuration, even if it says there are no changes to apply (use case: a manual change was made, then the TF code was updated to reflect it, and now you want the state file updated too).
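In command form, the behaviour being compared looks like this (a sketch; run it in a workspace that has drifted):

terraform refresh   # rewrites the state file from the live infrastructure
terraform plan      # compares live vs. code, but leaves the state file alone
terraform apply     # persists the refreshed state even when nothing is applied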

What AWS Resources Does Terraform Know About

Recently, we had issues with tfstate being deleted on S3.
As a result, there are a number of EC2 instances still running (duplicates, if you will).
Is there a way to query Terraform and list which EC2 instances (and other resources) Terraform has under its control? I want to delete the duplicate AWS resources without messing up Terraform state.
Depending on whether you care about availability, you could just delete everything and let Terraform recreate it all.
Or you could use terraform state list and then iterate through it with terraform state show (e.g. terraform state list | xargs -n1 terraform state show) to show everything.
terraform import is for importing things that already exist back into your state, which doesn't sound like what you want, because it sounds like you've already recreated some things and so have duplicates. If you had caught the loss of the resources from your state file before Terraform recreated them (for example, by seeing an unexpected creation in the plan and noticing that the resource already existed in the AWS console), you could have used it to import the resources back into the state file, so that Terraform would then show an empty plan for those resources.
In the future, make sure you use state locking to prevent this from happening again!
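Locking is one extra argument in the S3 backend block, plus the DynamoDB table itself; a minimal sketch with a hypothetical table name:

terraform {
  backend "s3" {
    # ...existing bucket/key/region settings...
    # The table needs a string hash key named LockID.
    dynamodb_table = "my-tf-lock-table"
  }
}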