Terraform - Is the unique id in the .tfstate file guaranteed never to have been used before?

The following is a section of a .tfstate file for OpenStack (which exposes AWS-compatible APIs, so AWS is the provider):
"aws_instance.7.3"
    "primary": {
        "attributes": {
            "id": "6b646e50-..."
Say I delete an instance/resource manually in the console.
My Terraform configuration can be changed in such a way that terraform plan triggers either:
i) a re-create (+/-), or
ii) a destroy (-) followed by a create/add (+).
The question is: in either case, is there any possibility that the newly created node gets the same "id" attribute as the one previously recorded for it in the state file?
In other words, will the "id" attribute ("instance_id" in the case of GCP) always be unique throughout the infrastructure's lifecycle?
(That way I know for sure a new node was created/re-created, by comparing the old tfstate file with the new tfstate file on the "id" or "instance_id" attribute, and can ensure the .tfstate reflects what the plan said would happen.)
The reason I am checking whether the .tfstate reflects the plan (the exact number of creates/re-creates/destroys) is that, although the "apply" happens according to the "plan", sometimes the .tfstate DOES NOT reflect that.
The reason for this is that after "apply", Terraform seems to make a GET call to the provider to update the .tfstate file, and this GET call sometimes returns an inconsistent view of the infrastructure (i.e. it may not return a node's details even though the node was created and is part of the infrastructure!).
In that case I have to let our automated tool know that the .tfstate did not turn out according to the plan, so there is a possible corruption/inconsistency in the .tfstate file, to be fixed with a manual import.

Related

How do I conditionally create an S3 bucket?

Is there a way I can use a terraform data call for a bucket (perhaps created and stored in a different state file) and then in the event nothing is in data, create the resource by setting a count?
I've been doing some experiments and continually get the following:
Error: Failed getting S3 bucket (example_random_bucket_name): NotFound: Not Found
    status code: 404, request id: <ID here>, host id: <host ID here>
Sample code to test (this has been modified from the original code which generated this error):
variable "bucket_name" {
  default = "example_random_bucket_name"
}

data "aws_s3_bucket" "new" {
  bucket = var.bucket_name
}

resource "aws_s3_bucket" "s3_bucket" {
  count  = try(1, data.aws_s3_bucket.new.id == "" ? 1 : 0)
  bucket = var.bucket_name
}
I feel like rather than generating an error I should get an empty result, but that's not the case.
Terraform is a desired-state system, so you can only describe what result you want, not the steps/conditions to get there.
If Terraform did allow you to decide whether to declare a bucket based on whether there is already a bucket of that name, you would create a configuration that could never converge: on the first run, it would not exist and so your configuration would declare it. But on the second run, the bucket would then exist and therefore your configuration would not declare it anymore, and so Terraform would plan to destroy it. On the third run, it would propose to create it again, and so on.
Instead, you must decide as part of your system design which Terraform configuration (or other system) is responsible for managing each object:
If you decide that a particular Terraform configuration is responsible for managing this S3 bucket then you can declare it with an unconditional aws_s3_bucket resource.
If you decide that some other system ought to manage the bucket then you'll write your configuration to somehow learn about the bucket name from elsewhere, such as by an input variable or using the aws_s3_bucket data source.
Sadly you can't do this. Data sources must exist, otherwise they error out. There is no built-in way in Terraform to check whether a resource exists or not; there is nothing in between, in the sense of a resource that may or may not exist.
If you require such functionality, you have to program it yourself using an External Data Source. Or, maybe simpler, provide an input variable bucket_exist, so that you explicitly set it during apply.
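A minimal sketch of that input-variable approach, reusing the bucket_name variable from the question (the bucket_exist flag is the one suggested above, set explicitly at apply time):

```hcl
variable "bucket_name" {
  default = "example_random_bucket_name"
}

# Set explicitly at apply time, e.g. terraform apply -var="bucket_exist=true"
variable "bucket_exist" {
  type    = bool
  default = false
}

# Declare the bucket only when the operator says it does not already exist.
resource "aws_s3_bucket" "s3_bucket" {
  count  = var.bucket_exist ? 0 : 1
  bucket = var.bucket_name
}
```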
Data sources are designed to fail this way.
However, if you use a state file from an external configuration, it's possible to declare an output in the external state based on whether the S3 bucket is managed by that state, and use it in the s3_bucket resource as a condition.
For example, the output in the external state could be an empty string (not managed) or the value of whatever property is useful to you; a boolean is another choice. Delete the data source from this configuration and add a condition to the resource based on the output.
It's your call if any such workarounds complicate or simplify your configuration.
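For illustration, the external-state idea above might look like the following sketch, assuming the other configuration manages the bucket with a counted resource and publishes its name as an output (all names and backend settings here are hypothetical):

```hcl
# In the external configuration:
output "bucket_name" {
  value = length(aws_s3_bucket.shared) > 0 ? aws_s3_bucket.shared[0].bucket : ""
}

# In this configuration:
data "terraform_remote_state" "shared" {
  backend = "s3"
  config = {
    bucket = "my-tfstate-bucket"
    key    = "shared/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_s3_bucket" "s3_bucket" {
  # Create only when the external state reports no managed bucket.
  count  = data.terraform_remote_state.shared.outputs.bucket_name == "" ? 1 : 0
  bucket = var.bucket_name
}
```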

Why does Terraform plan recreate blocks when the configuration has not changed?

I have a simple site set up on AWS and have a terraform script working to deploy it (at least from my local machine).
When I have a successful deployment through terraform apply, quite often if I then run terraform plan again (immediately after the apply) I will see changes like this:
# aws_route53_record.alias_route53_record_portal will be updated in-place
~ resource "aws_route53_record" "alias_route53_record_portal" {
        fqdn    = "mysite.co.uk"
        id      = "Z12345678UR1K1IFUBA_mysite.co.uk_A"
        name    = "mysite.co.uk"
        records = []
        ttl     = 0
        type    = "A"
        zone_id = "Z12345678UR1K1IFUBA"

      - alias {
          - evaluate_target_health = false -> null
          - name                   = "d12345mkpmx9ii.cloudfront.net" -> null
          - zone_id                = "Z2FDTNDATAQYW2" -> null
        }
      + alias {
          + evaluate_target_health = true
          + name                   = "d12345mkpmx9ii.cloudfront.net"
          + zone_id                = "Z2FDTNDATAQYW2"
        }
    }
Why is terraform saying that some parts of resources need recreating when nothing has changed?
EDIT My actual tf resource...
resource "aws_route53_record" "alias_route53_record_portal" {
  zone_id = data.aws_route53_zone.sds_zone.zone_id
  name    = "mysite.co.uk"
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.s3_distribution.domain_name
    zone_id                = aws_cloudfront_distribution.s3_distribution.hosted_zone_id
    evaluate_target_health = true
  }
}
You have changed evaluate_target_health from false to true. Terraform will just update the fields that have changed. The reason it shows up like this is that AWS often doesn't provide separate APIs for each field. Since Terraform is showing that this resource will be updated in-place, it will touch the minimum necessary to make this change.
The "plan" operation in Terraform first synchronizes the Terraform state with remote objects (by making calls to the remote API), and then it compares the updated state with the configuration.
Terraform (or, more accurately, the relevant Terraform provider) then generates a planned update or replace for any case where the state and the configuration disagree.
If you see a planned update for a resource whose configuration you know you haven't changed, then by process of elimination that suggests that the remote system is what has changed.
Sometimes that can happen if some other process (or a human in the admin console) changes an object that Terraform believes itself to be responsible for. In that case, the typical resolution is to ensure that each object is only managed by one system and that no-one is routinely making changes to Terraform-managed objects outside of Terraform.
One way to diagnose this would be to consult the remote system and see whether its current settings agree with your Terraform configuration. If not, that would suggest that something other than Terraform has changed the value.
A less common reason this can arise is due to a bug in the provider itself. There are two variations of this class of bug:
When creating the object, the provider doesn't correctly translate the given configuration to a remote API call, and so it ends up creating an object that doesn't match the configuration. A subsequent Terraform plan will then notice that inconsistency and plan an update to fix it. If the provider's update operation has a similar bug then this will never converge, causing the provider to repeatedly plan the same update.
Conversely, the create/update may be implemented correctly but the "refresh" operation (updating the state to match the remote system) may inaccurately translate the remote object data back to Terraform state data, causing the state to not accurately reflect the remote system. In that case, the provider will probably then find that the configuration doesn't match the state anymore, even though the state was correct after the initial create.
Both of these bugs are typically nicknamed "permadiff" by provider developers, because the symptom is Terraform seeming to plan the same change indefinitely, no matter how many times you apply it. If you think you've encountered a "permadiff" bug then usually the path forward is to report a bug in the provider's development repository so that the maintainers can investigate.
One specific variation of "permadiff" is a situation where the remote system does some sort of normalization of your given values which the provider doesn't take into account. For example, some remote systems will accept strings containing uppercase letters but will convert them to lowercase for permanent storage. If a provider doesn't take that into account, it will probably incorrectly plan to change the value back to the one containing uppercase letters again in order to try to make the state match the configuration. This subclass of bug is a normalization permadiff, which provider teams will typically address by re-implementing the remote system's normalization logic in the provider itself.
If you find a normalization permadiff then you can often work around it until the bug is fixed by figuring out what normalization the remote system expects and then manually normalizing your configuration to match it, so that the provider will then see the configuration as matching the remote system.
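As an illustration of that manual normalization (a sketch; suppose the remote system lowercases stored record names): applying Terraform's built-in lower() function in the configuration keeps the declared value identical to what the refresh step will read back, so no spurious diff is planned. The variable names and record values here are hypothetical:

```hcl
variable "record_name" {
  default = "MySite.Example.COM"
}

variable "zone_id" {}

resource "aws_route53_record" "example" {
  # Normalize up front so the stored (lowercased) name matches the
  # configuration on the next plan.
  name    = lower(var.record_name)
  zone_id = var.zone_id
  type    = "A"
  ttl     = 300
  records = ["198.51.100.1"]
}
```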

Is it safe to apply Terraform plan when it says the database instance must be replaced?

I'm importing the existing resources (AWS RDS) but the terraform plan command showed a summary:
# aws_db_instance.my_main_db must be replaced
+/- resource "aws_db_instance" "my_main_db" {
      ~ address           = x
        allocated_storage = x
      + apply_immediately = x
      ~ arn               = x
      ~ username          = x
      + password          = x
        # (other arguments with a lot of +/- and ~)
    }
my_main_db is online with persistent data. My question is as in the title: is it safe for the existing database to run terraform apply? I don't want to lose all my customer data.
"Replace" in Terraform's terminology means to destroy the existing object and create a new one to replace it. The +/- symbol (as opposed to -/+) indicates that this particular resource will be replaced in the "create before destroy" mode, where there will briefly be two database instances existing during the operation. (This may or may not be possible in practice, depending on whether the instance name is changing as part of this operation.)
For aws_db_instance in particular, destroying an instance is equivalent to deleting the instance in the RDS console: unless you have a backup of the contents of the database, it will be lost. Even if you do have a backup, you'll need to restore it via the RDS console or API rather than with Terraform because Terraform doesn't know about the backup/restore mechanism and so its idea of "create" is to produce an entirely new, empty database.
To sum up: applying a plan like this directly is certainly not generally "safe", because Terraform is planning to destroy your database and all of the contents along with it.
If you need to make changes to your database that cannot be performed without creating an entirely new RDS instance, you'll usually need to make those changes outside of Terraform using RDS-specific tools so that you can implement some process for transferring data between the old and new instances, whether that be backup and then restore (which will require a temporary outage) or temporarily running both instances and setting up replication from old to new until you are ready to shut off the old one. The details of such a migration are outside of Terraform's scope, because they are specific to whatever database engine you are using.
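As a safety net while you decide, one common guard (not part of the answer above, just a sketch) is a prevent_destroy lifecycle rule, which makes Terraform refuse to apply any plan that would destroy or replace the instance:

```hcl
resource "aws_db_instance" "my_main_db" {
  # ... existing arguments ...

  lifecycle {
    # terraform apply will error out instead of destroying this instance.
    prevent_destroy = true
  }
}
```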
It's most likely not safe, but really only someone familiar with the application can make that decision. Look at the properties and what is going to change or be recreated. Unless you are comfortable with all of those properties changing, then it's not safe.

How to conditionally update a resource in Terraform

Seems it's common practice to make use of count on a resource to conditionally create it in Terraform using a ternary statement.
I'd like to conditionally update an AWS Route 53 entry based on a push_to_prod variable. Meaning I don't want to delete the resource if I'm not pushing to production, I only want to update it, or leave the CNAME value as it is.
Has anyone done something like this before in Terraform?
Currently, as it stands, interpolation syntax isn't supported in lifecycle blocks, which makes this harder because otherwise you could use "prevent_destroy". However, without more specifics, I am going to take my best guess at how to get you there.
I would use the allow_overwrite property on the Route53 record and set that based on your flag. That way, if you are pushing to prod, you can set it to false, which should trigger creating a new one. I haven't tested that.
Also note that if you don't make any changes to the Route53 resource, it shouldn't trigger any changes when Terraform is applied; only updating some part of the record will trigger a deployment.
You may want to combine this with some lifecycle events, but I don't have enough time to dig into that specific resource and how it happens.
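An untested sketch of the allow_overwrite idea above (variable names and record values are hypothetical):

```hcl
variable "push_to_prod" {
  type    = bool
  default = false
}

variable "zone_id" {}
variable "cname_target" {}

resource "aws_route53_record" "example" {
  # false when pushing to prod, per the suggestion above
  allow_overwrite = var.push_to_prod ? false : true
  zone_id         = var.zone_id
  name            = "app.example.com"
  type            = "CNAME"
  ttl             = 300
  records         = [var.cname_target]
}
```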
Two examples I can think of are:
type = "${var.push_to_prod == "true" ? "CNAME" : var.other_value}" - this will have a fixed other_value, there is no way to have terraform "ignore" the resource once it's being managed by terraform.
or
type = "${var.aws_route53_record_type}" and you can have dev.tfvars and prod.tfvars, with aws_route53_record_type defined as whatever you want for dev and CNAME for prod.
The thing is with what you're trying to do, "I only want to update it, or leave the CNAME value as it is.", that's not how terraform works. Terraform either manages the resource for you or it doesn't. If it's managing it, it'll update the resource based on the config you've defined in your .tf file. If it's not managing the resource it won't modify it. It sounds like what you're really after is the second solution where you pass in two different configs from your .tfvars file into your .tf file and based off the different configs, different resources are created. You can couple this with count to determine if a resource should be created or not.
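Sketched out, that second solution might look like this (all names and values are hypothetical):

```hcl
# variables.tf
variable "aws_route53_record_type" {}
variable "zone_id" {}
variable "record_value" {}

resource "aws_route53_record" "example" {
  zone_id = var.zone_id
  name    = "app.example.com"
  type    = var.aws_route53_record_type
  ttl     = 300
  records = [var.record_value]
}

# dev.tfvars:   aws_route53_record_type = "A"
# prod.tfvars:  aws_route53_record_type = "CNAME"
#
# Then select per environment: terraform apply -var-file=prod.tfvars
```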

How to import manual changes into Terraform remote state

I am new to Terraform. I have created a remote tfstate in S3, and now there are also some manual changes that have been made to my AWS infrastructure. I need to import those manual changes into the tfstate.
I used the import command for some resources, but for others, such as IAM policies, there is no such import command.
Also, some resources such as the DB have been changed (new parameters added), and I need to import those changes as well. When I try to import them it says:
Error importing: 1 error(s) occurred:

* Can't import aws_security_group.Q8SgProdAdminSshInt, would collide with an existing resource.
  Please remove or rename this resource before continuing.
Any help would be appreciated. Thanks.
Before directly answering this question I think some context would help:
Behind the scenes, Terraform maintains a state file that contains a mapping from the resources in your configuration to the objects in the underlying provider API. When you create a new object with Terraform, the id of the object that was created is automatically saved in the state so that future commands can locate the referenced object for read, update, and delete operations.
terraform import, then, is a different way to create an entry in the state file. Rather than creating a new object and recording its id, instead the user provides an id on the command line. Terraform reads the object with that id and adds the result to the state file, after which it is indistinguishable in the state from a resource that Terraform created itself.
So with all of that said, let's address your questions one-by-one.
Importing Resources That Don't Support terraform import
Since each resource requires a small amount of validation and data-fetching code to do an import, not all resources are supported for import at this time.
Given what we know about what terraform import does from the above, in theory it's possible to skip Terraform's validation of the provided id and instead manually add the resource to the state. This is an advanced operation and must be done with care to avoid corrupting the state.
First, retrieve the state into a local file that you'll use for your local work:
terraform state pull >manual-import.tfstate
This will create a file manual-import.tfstate that you can open in a text editor. It uses JSON syntax, so though its internal structure is not documented as a stable format we can carefully edit it as long as we remain consistent with the expected structure.
It's simplest to locate an existing resource that is in the same module as where you want to import and duplicate and edit it. Let's assume we have a resources object like this:
"resources": {
  "null_resource.foo": {
    "type": "null_resource",
    "depends_on": [],
    "primary": {
      "id": "5897853859325638329",
      "attributes": {
        "id": "5897853859325638329"
      },
      "meta": {},
      "tainted": false
    },
    "deposed": [],
    "provider": ""
  }
},
Each attribute within this resources object corresponds to a resource in your configuration. The attribute name is the type and name of the resource. In this case, the resource type is null_resource and the attribute name is foo. In your case you might see something like aws_instance.server here.
The id attributes are, for many resources (but not all!), the main thing that needs to be populated. So we can duplicate this structure for a hypothetical IAM policy:
"resources": {
  "null_resource.foo": {
    "type": "null_resource",
    "depends_on": [],
    "primary": {
      "id": "5897853859325638329",
      "attributes": {
        "id": "5897853859325638329"
      },
      "meta": {},
      "tainted": false
    },
    "deposed": [],
    "provider": ""
  },
  "aws_iam_policy.example": {
    "type": "aws_iam_policy",
    "depends_on": [],
    "primary": {
      "id": "?????",
      "attributes": {
        "id": "?????"
      },
      "meta": {},
      "tainted": false
    },
    "deposed": [],
    "provider": ""
  }
},
The challenge at this step is to figure out what sort of id this resource requires. The only sure-fire way to know this is to read the code, which tells me that this resource expects the id to be the full ARN of the policy.
With that knowledge, we replace the two ????? sequences in the above example with the ARN of the policy we want to import.
After making manual changes to the state it's necessary to update the serial number at the top-level of the file. Terraform expects that any new change will have a higher serial number, so we can increment this number.
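For illustration only (the field values here are made up), the serial sits at the top level of the file; if it was 6 before your edit, set it to 7:

```json
{
  "version": 3,
  "serial": 7,
  "lineage": "5f9478f8-...",
  "resources": {
  }
}
```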
After completing the updates, we must upload the updated state file back into Terraform:
terraform state push manual-import.tfstate
Finally we can ask Terraform to refresh the state to make sure it worked:
terraform refresh
Again, this is a pretty risky process since the state file is Terraform's record of its relationship with the underlying system and it can be hard to recover if the content of this file is lost. It's often easier to simply replace a resource than to go to all of this effort, unless it's already serving a critical role in your infrastructure and there is no graceful migration strategy available.
Imports Colliding With Existing Resources
The error message given in your question is talking about an import "colliding" with an existing resource:
Error importing: 1 error(s) occurred:
* Can't import aws_security_group.Q8SgProdAdminSshInt, would collide with an existing resource.
Please remove or rename this resource before continuing.
The meaning of this message is that when Terraform tried to write the new resource to the state file it found a resource entry already present for the name aws_security_group.Q8SgProdAdminSshInt. This suggests that either it was already imported or that a new security group was already created by Terraform itself.
You can inspect the attributes of the existing resource in state:
terraform state show aws_security_group.Q8SgProdAdminSshInt
Compare the data returned with the security group you were trying to import. If the ids match then there's nothing left to do, since the resource was already imported.
If the ids don't match then you need to figure out which of the two objects is the one you want to keep. If you'd like to keep the one that Terraform already has, you can manually delete the one you were trying to import.
If you'd like to keep the one you were trying to import instead, you can drop the unwanted one from the Terraform state to make way for the import to succeed:
terraform state rm aws_security_group.Q8SgProdAdminSshInt
Note that this just makes Terraform "forget" the resource; it will still exist in EC2 and will need to be deleted manually via the console, command line tools, or API. Be sure to note down its id before deleting it to ensure that you can find it in order to clean it up.
For such resources you have to use the import function or manually add a block to the Terraform state, as described above.
Or, if there is a change in config like the DB configs you mentioned: if the DB resource is managed by the Terraform remote state, terraform refresh will help you.