Preventing destruction of resources when refactoring Terraform to use indices

When I was just starting to use Terraform, I more or less naively declared resources individually, like this:
resource "aws_cloudwatch_log_group" "image1_log" {
name = "${var.image1}-log-group"
tags = module.tagging.tags
}
resource "aws_cloudwatch_log_group" "image2_log" {
name = "${var.image2}-log-group"
tags = module.tagging.tags
}
resource "aws_cloudwatch_log_stream" "image1_stream" {
name = "${var.image1}-log-stream"
log_group_name = aws_cloudwatch_log_group.image1_log.name
}
resource "aws_cloudwatch_log_stream" "image2_stream" {
name = "${var.image2}-log-stream"
log_group_name = aws_cloudwatch_log_group.image2_log.name
}
Then, 10-20 different log groups later, I realized this wasn't going to work well as infrastructure grew. I decided to define a variable list:
variable "image_names" {
type = list(string)
default = [
"image1",
"image2"
]
}
Then I replaced the resources using indices:
resource "aws_cloudwatch_log_group" "service-log-groups" {
name = "${element(var.image_names, count.index)}-log-group"
count = length(var.image_names)
tags = module.tagging.tags
}
resource "aws_cloudwatch_log_stream" "service-log-streams" {
name = "${element(var.image_names, count.index)}-log-stream"
log_group_name = aws_cloudwatch_log_group.service-log-groups[count.index].name
count = length(var.image_names)
}
The problem here is that when I run terraform apply, I get 4 resources to add and 4 resources to destroy. I tested this with an old log group and saw that all my logs were wiped (obviously, since the log group was destroyed).
The names and other attributes of the log groups/streams are identical; I'm simply refactoring the infrastructure code to be more maintainable. How can I keep my existing log groups without deleting them and still refactor my code to use lists?

You'll need to move the existing resources within the Terraform state.
Run terraform show to get the addresses under which the resources are stored; they will look something like [module.xyz.]aws_cloudwatch_log_group.image1_log ...
You can then move each one with terraform state mv [module.xyz.]aws_cloudwatch_log_group.image1_log '[module.xyz.]aws_cloudwatch_log_group.service-log-groups[0]'.
You can choose which index to assign to each resource by changing [0] accordingly.
Delete the old resource definition for each moved resource, as Terraform would otherwise try to create a new group/stream.
Try it with the first move and check with terraform plan that the resource was moved correctly; the sketch below shows the full sequence for this example.
Also check whether you need to pin a specific index in the image_names list just to be sure, but I think that won't be necessary.
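For this example, a sketch of the full sequence, assuming the resources live in the root module and the indices follow the order of image_names:
# Log groups (addresses assume no enclosing module)
$ terraform state mv aws_cloudwatch_log_group.image1_log 'aws_cloudwatch_log_group.service-log-groups[0]'
$ terraform state mv aws_cloudwatch_log_group.image2_log 'aws_cloudwatch_log_group.service-log-groups[1]'
# Log streams
$ terraform state mv aws_cloudwatch_log_stream.image1_stream 'aws_cloudwatch_log_stream.service-log-streams[0]'
$ terraform state mv aws_cloudwatch_log_stream.image2_stream 'aws_cloudwatch_log_stream.service-log-streams[1]'
After the moves, terraform plan should report no changes, since the generated names match the old ones.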

Related

How can I configure Terraform to update a GCP compute engine instance template without destroying and re-creating?

I have a service deployed on GCP compute engine. It consists of a compute engine instance template, instance group, instance group manager, and load balancer + associated forwarding rules etc.
We're forced into using compute engine rather than Cloud Run or some other serverless offering due to the need for docker-in-docker for the service in question.
The deployment is managed by terraform. I have a config that looks something like this:
data "google_compute_image" "debian_image" {
family = "debian-11"
project = "debian-cloud"
}
resource "google_compute_instance_template" "my_service_template" {
name = "my_service"
machine_type = "n1-standard-1"
disk {
source_image = data.google_compute_image.debian_image.self_link
auto_delete = true
boot = true
}
...
metadata_startup_script = data.local_file.startup_script.content
metadata = {
MY_ENV_VAR = var.whatever
}
}
resource "google_compute_region_instance_group_manager" "my_service_mig" {
version {
instance_template = google_compute_instance_template.my_service_template.id
name = "primary"
}
...
}
resource "google_compute_region_backend_service" "my_service_backend" {
...
backend {
group = google_compute_region_instance_group_manager.my_service_mig.instance_group
}
}
resource "google_compute_forwarding_rule" "my_service_frontend" {
depends_on = [
google_compute_region_instance_group_manager.my_service_mig,
]
name = "my_service_ilb"
backend_service = google_compute_region_backend_service.my_service_backend.id
...
}
I'm running into issues where Terraform is unable to perform any kind of update to this service without running into conflicts. It seems that instance templates are immutable in GCP, and doing anything like updating the startup script, adding an env var, or similar forces it to be deleted and re-created.
Terraform prints info like this in that situation:
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.connectors_compute_engine.google_compute_instance_template.airbyte_translation_instance1 must be replaced
-/+ resource "google_compute_instance_template" "my_service_template" {
      ~ id       = "projects/project/..." -> (known after apply)
      ~ metadata = { # forces replacement
          + "TEST" = "test"
            # (1 unchanged element hidden)
        }
The only solution I've found for getting out of this situation is to delete the entire service and all associated entities, from the load balancer down to the instance template, and re-create them.
Is there some way to avoid this situation so that I can change the instance template without having to manually update all the terraform config twice? At this point I'm even fine with it causing some downtime for the service in question rather than a full rolling update, since that's what's happening now anyway.
I ran into this issue as well.
However, according to:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance_template#using-with-instance-group-manager
Instance Templates cannot be updated after creation with the Google Cloud Platform API. In order to update an Instance Template, Terraform will destroy the existing resource and create a replacement. In order to effectively use an Instance Template resource with an Instance Group Manager resource, it's recommended to specify create_before_destroy in a lifecycle block. Either omit the Instance Template name attribute, or specify a partial name with name_prefix.
I would also test and plan with this lifecycle meta-argument:
  + lifecycle {
  +   prevent_destroy = true
  + }
}
Or more realistically in your specific case, something like:
resource "google_compute_instance_template" "my_service_template" {
  # name_prefix instead of name, per the docs quoted above
  name_prefix  = "my_service-"
  machine_type = "n1-standard-1"
  ...
  + lifecycle {
  +   create_before_destroy = true
  + }
}
So run terraform plan with either create_before_destroy or prevent_destroy = true on google_compute_instance_template before terraform apply to see the results.
Ultimately, you can remove google_compute_instance_template.my_service_template from the state file and import it back, as sketched below.
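A minimal sketch of that state surgery, assuming the resource address above and a hypothetical project/template name (instance templates import by an ID of the form projects/PROJECT/global/instanceTemplates/NAME):
# Drop the template from state, then re-adopt the real one
$ terraform state rm google_compute_instance_template.my_service_template
$ terraform import google_compute_instance_template.my_service_template projects/my-project/global/instanceTemplates/my_service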
Some suggested workarounds in this thread:
terraform lifecycle prevent destroy

Trigger random_id resource recreation on rds instance destroy and recreate

Folks, I'm trying to find a way, with the terraform random_id resource, to recreate and provide a new random value when the RDS instance is destroyed and recreated due to a change, say when the username on the RDS instance has changed.
I'm trying to attach this random value to the final_snapshot_identifier of the aws_db_instance resource, so that the snapshot has a unique ID every time it gets created upon the RDS instance being destroyed.
Current code:
resource "random_id" "snap_id" {
byte_length = 8
}
locals {
inst_id = "test-rds-inst"
inst_snap_id = "${local.inst_id}-snap-${format("%.4s", random_id.snap_id.dec)}"
}
resource "aws_db_instance" "rds" {
.....
identifier = local.inst_id
final_snapshot_identifier = local.inst_snap_id
skip_final_snapshot = false
username = "foo"
apply_immediately = true
.....
}
output "snap_id" {
value = aws_db_instance.rds.final_snapshot_identifier
}
Output after terraform apply:
snap_id = "test-rds-inst-snap-5553"
The use case I'm trying out:
#1:
Modify a value in the RDS instance to simulate a destroy & recreate:
Modify username to "foo-tmp"
terraform apply -auto-approve
Output:
snap_id = "test-rds-inst-snap-5553"
I was expecting the random_id to kick in and output a unique ID, but it didn't.
Observation:
rds instance in deleting state
snapshot "test-rds-inst-snap-5553" in creating state
rds instance recreated and in available state
snapshot "test-rds-inst-snap-5553" in available state
#2:
Modify a value again in the RDS instance to simulate a destroy & recreate:
Modify username to "foo-new"
terraform apply -auto-approve
I kind of expected the error below, because the snap ID didn't get a new value in the prior attempt, but tried anyway...
Observation:
**Error:** error deleting DB Instance (test-rds-inst): DBSnapshotAlreadyExists: Cannot create the snapshot because a snapshot with the identifier test-rds-inst-snap-5553 already exists.
I'm aware of the keepers {} map for the random_id resource, but I'm not sure what from the rds_instance I need to put in the map so that the random_id resource is recreated and ends up providing a new unique value for the snap_id suffix.
I also feel that using any attribute of the RDS instance in the random_id keepers might cause a circular dependency issue. I may be wrong, but I haven't tried it.
Any suggestions will be helpful. Thanks.
The easiest way to do this would be to use taint on the random_id resource, as per the documentation [1]:
To force a random result to be replaced, the taint command can be used to produce a new result on the next run.
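For example, a minimal sketch assuming the resource address from the config above:
# Mark the random_id as tainted so the next apply replaces it
$ terraform taint random_id.snap_id
$ terraform apply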
Alternatively, looking at the example from the documentation, you could do something like:
resource "random_id" "snap_id" {
byte_length = 8
keepers {
snapshot_id = var.snapshot_id
}
}
resource "aws_db_instance" "rds" {
.....
identifier = local.inst_id
final_snapshot_identifier = random_id.snap_id.keepers.snapshot_id
skip_final_snapshot = false
username = "foo"
apply_immediately = true
.....
}
This means that until the value of the variable snapshot_id changes, the random_id will generate the same result. I'm not sure if that would work with locals, but you could try replacing var.snapshot_id with local.inst_snap_id. If that works, you could then name the snapshot using built-in functions like formatdate [2] and timestamp [3] to create a snapshot ID tied to the time when you ran apply, something like:
locals {
  inst_id      = "test-rds-inst"
  snap_time    = formatdate("YYYYMMDD", timestamp())
  inst_snap_id = "${local.inst_id}-snap-${format("%.4s", random_id.snap_id.dec)}-${local.snap_time}"
}
[1] https://registry.terraform.io/providers/hashicorp/random/latest/docs#resource-keepers
[2] https://www.terraform.io/language/functions/formatdate
[3] https://www.terraform.io/language/functions/timestamp

terraform 12 count data.template_file not working

I'm upgrading to terraform 12 and have an ASG module that references a root repository. As part of this, it uses a data.template_file resource to attach user data to the ASG, which is then put into log files on the instances. The module looks as follows:
module "cef_fleet" {
source = "git::ssh://git#github.com/asg-repo.git?ref=terraform12"
user_data_rendered = data.template_file.init.rendered
instance_type = var.instance_type
ami = var.ami
etc ...
which as you can see calls the data resource;
data "template_file" "init" {
count = signum(var.cluster_size_max)
template = file("${path.module}/template-files/init.sh")
vars = {
account_name = var.account_name
aws_account_number = var.aws_account_number
This works fine in terraform 11, but when I changed to terraform 12 and try to apply, I get this error:
Because data.template_file.init has "count" set, its attributes must be accessed on specific instances.
For example, to correlate with indices of a referring resource, use:
  data.template_file.init[count.index]
If I change this in my module to user_data_rendered = data.template_file.init[count.index], I then get this error:
The "count" object can be used only in "resource" and "data" blocks, and only
when the "count" argument is set.
I don't know what to do here. If I leave the .rendered in, it doesn't seem to recognise the [count.index], as I get the first error again. Does anyone have advice on what I need to do?
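One way out, sketched here as an assumption rather than a confirmed fix: count.index is only valid inside the block that sets count, so in the module call you have to reference a concrete instance of the data source. Since count = signum(var.cluster_size_max) yields at most one instance, indexing [0] should work whenever cluster_size_max is positive:
module "cef_fleet" {
  source = "git::ssh://git@github.com/asg-repo.git?ref=terraform12"

  # Reference instance 0 of the counted data source explicitly;
  # assumes signum(var.cluster_size_max) evaluates to 1.
  user_data_rendered = data.template_file.init[0].rendered

  instance_type = var.instance_type
  ami           = var.ami
}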

terraform count dependent on data from target environment

I'm getting the following error when trying to initially plan or apply a resource that uses data values from the AWS environment in a count.
$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
------------------------------------------------------------------------
Error: Invalid count argument
on main.tf line 24, in resource "aws_efs_mount_target" "target":
24: count = length(data.aws_subnet_ids.subnets.ids)
The "count" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the count depends on.
$ terraform --version
Terraform v0.12.9
+ provider.aws v2.30.0
I tried using the target option, but it doesn't seem to work on data sources.
$ terraform apply -target aws_subnet_ids.subnets
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
The only solution I found that works is:
remove the resource
apply the project
add the resource back
apply again
Here is a terraform config I created for testing.
provider "aws" {
version = "~> 2.0"
}
locals {
project_id = "it_broke_like_3_collar_watch"
}
terraform {
required_version = ">= 0.12"
}
resource aws_default_vpc default {
}
data aws_subnet_ids subnets {
vpc_id = aws_default_vpc.default.id
}
resource aws_efs_file_system efs {
creation_token = local.project_id
encrypted = true
}
resource aws_efs_mount_target target {
depends_on = [ aws_efs_file_system.efs ]
count = length(data.aws_subnet_ids.subnets.ids)
file_system_id = aws_efs_file_system.efs.id
subnet_id = tolist(data.aws_subnet_ids.subnets.ids)[count.index]
}
Finally figured out the answer after researching the answer by Dude0001.
Short answer: use the aws_vpc data source with the default argument instead of the aws_default_vpc resource. Here is the working sample with comments on the changes.
locals {
  project_id = "it_broke_like_3_collar_watch"
}

terraform {
  required_version = ">= 0.12"
}

// Delete this --> resource aws_default_vpc default {}

// Add this
data aws_vpc default {
  default = true
}

data "aws_subnet_ids" "subnets" {
  // Update this from aws_default_vpc.default.id
  vpc_id = "${data.aws_vpc.default.id}"
}

resource aws_efs_file_system efs {
  creation_token = local.project_id
  encrypted      = true
}

resource aws_efs_mount_target target {
  depends_on     = [aws_efs_file_system.efs]
  count          = length(data.aws_subnet_ids.subnets.ids)
  file_system_id = aws_efs_file_system.efs.id
  subnet_id      = tolist(data.aws_subnet_ids.subnets.ids)[count.index]
}
What I couldn't figure out was why my workaround of removing aws_efs_mount_target on the first apply worked. It's because after the first apply the aws_default_vpc was loaded into the state file.
So an alternate solution without making change to the original tf file would be to use the target option on the first apply:
$ terraform apply --target aws_default_vpc.default
However, I don't like this, as it requires a special case on the first deployment, which is unusual for the terraform deployments I've worked with.
The aws_default_vpc isn't a resource TF can create or destroy. It is the default VPC that AWS automatically creates for your account in each region and that is protected from being destroyed. You can only (and need to) adopt it into management and your TF state. This will allow you to begin managing it and to inspect it when you run plan or apply. Otherwise, TF doesn't know what the resource is or what state it is in, and it cannot create a new one for you, as it's a special type of protected resource as described above.
With that said, get the default VPC ID from the region you are deploying to in your account, then import it into your TF state. Terraform should then be able to inspect it and count the number of subnets.
For example
terraform import aws_default_vpc.default vpc-xxxxxx
https://www.terraform.io/docs/providers/aws/r/default_vpc.html
Using the data element for this looks a little odd to me as well. Can you change your TF script to get the count directly through the aws_default_vpc resource?

Setting "count" based on the length of an attribute on another resource

I have a fairly simple Terraform configuration, which creates a Route53 zone and then creates NS records in Cloudflare to delegate the subdomain to that zone. At present, it assumes there are always exactly four authoritative DNS servers for every Route53 zone and creates four separate cloudflare_record resources, but I'd like to generalise that, partly because who knows whether AWS will start putting a fifth authoritative server out there in the future, but also as a "test case" for more complicated stuff in the future (like AWS AZs, which I know vary in count between regions).
What I've come up with so far is:
resource "cloudflare_record" "public-zone-ns" {
domain = "example.com"
name = "${terraform.env}"
type = "NS"
ttl = "120"
count = "${length(aws_route53_zone.public-zone.name_servers)}"
value = "${lookup(aws_route53_zone.public-zone.name_servers, count.index)}"
}
resource "aws_route53_zone" "public-zone" {
name = "${terraform.env}.example.com"
}
When I run terraform plan over this, though, I get this error:
Error running plan: 1 error(s) occurred:
* cloudflare_record.public-zone-ns: cloudflare_record.public-zone-ns: value of 'count' cannot be computed
I think what that means is that because the aws_route53_zone hasn't actually been created yet, terraform doesn't know what length(aws_route53_zone.public-zone.name_servers) is, and therefore the interpolation into cloudflare_record.public-zone-ns.count fails and I'm screwed.
However, it seems surprising to me that Terraform would be so inflexible; surely being able to create a variable number of resources like this would be meat-and-potatoes stuff. Hard-coding the length, or creating separate resources, just seems so... limiting.
So, what am I missing? How can I create a number of resources when I don't know in advance how many I need?
count not being computable at plan time is currently an open issue in terraform: https://github.com/hashicorp/terraform/issues/12570
You could move the name servers to a variable array and then take the length of that, all in one terraform script, as sketched below.
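A minimal sketch of that workaround, with placeholder server names (hard-coding them does give up the dynamic lookup from the zone):
# Placeholder values; fill in the NS hostnames Route53 actually assigns
variable "name_servers" {
  type    = "list"
  default = [
    "ns-1.example.com",
    "ns-2.example.com",
  ]
}

resource "cloudflare_record" "public-zone-ns" {
  domain = "example.com"
  name   = "${terraform.env}"
  type   = "NS"
  ttl    = "120"
  count  = "${length(var.name_servers)}"
  value  = "${element(var.name_servers, count.index)}"
}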
I see your point; this surprised me as well. Even after I added depends_on to the cloudflare_record resource, it didn't help.
What you can do to get past this issue is to split it into two stacks and make sure the Route 53 zone is created before the Cloudflare record.
Stack #1

resource "aws_route53_zone" "public-zone" {
  name = "${terraform.env}.example.com"
}

output "name_servers" {
  value = "${aws_route53_zone.public-zone.name_servers}"
}

Stack #2

data "terraform_remote_state" "route53" {
  backend = "s3"

  config {
    bucket = "terraform-state-prod"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "cloudflare_record" "public-zone-ns" {
  domain = "example.com"
  name   = "${terraform.env}"
  type   = "NS"
  ttl    = "120"
  count  = "${length(data.terraform_remote_state.route53.name_servers)}"
  value  = "${element(data.terraform_remote_state.route53.name_servers, count.index)}"
}