Terraform wants to replace a Google Compute Engine instance if its start/stop scheduler is modified - google-cloud-platform

First of all, I am surprised that I have found very few resources on Google that mention this issue with Terraform.
This is an essential feature for optimizing the cost of cloud instances though, so I'm probably missing something; thanks for your tips and ideas!
I want to create an instance and manage its start and stop daily, programmatically.
The resource "google_compute_resource_policy" seems to meet my use case. However, when I change the stop or start time, Terraform plans to destroy and recreate the instance... which I absolutely don't want!
The resource "google_compute_resource_policy" is attached to the instance via the argument resource_policies where it is specified: "Modifying this list will cause the instance to recreate."
I don't understand why Terraform handles this simple update so badly. It is true that a schedule cannot be updated in place, but it is perfectly possible to detach it manually from the instance, destroy it, recreate it with the new stop/start schedule, and then attach it to the instance again.
Is there a workaround without going through a null resource to run a gcloud script to do these steps?
I tried adding an "ignore_changes" lifecycle block on the "resource_policies" argument of my instance. Terraform no longer wants to destroy my instance, but it gives me the following error:
Error when reading or editing ResourcePolicy: googleapi: Error 400: The resource_policy resource 'projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule' is already being used by 'projects/my-project-id/zones/europe-west1-b/instances/my-instance', resourceInUseByAnotherResource
Here is my Terraform code:
resource "google_compute_resource_policy" "instance_schedule" {
name = "my-instance-schedule"
region = var.region
description = "Start and stop instance"
instance_schedule_policy {
vm_start_schedule {
schedule = var.vm_start_schedule
}
vm_stop_schedule {
schedule = var.vm_stop_schedule
}
time_zone = "Europe/Paris"
}
}
resource "google_compute_instance" "my-instance" {
// ******** This is my attempted workaround ********
lifecycle {
ignore_changes = [resource_policies]
}
name = "my-instance"
machine_type = var.machine_type
zone = "${var.region}-b"
allow_stopping_for_update = true
resource_policies = [
google_compute_resource_policy.instance_schedule.id
]
boot_disk {
device_name = local.ref_name
initialize_params {
image = var.boot_disk_image
type = var.disk_type
size = var.disk_size
}
}
network_interface {
network = data.google_compute_network.default.name
access_config {
nat_ip = google_compute_address.static.address
}
}
}
In case it is useful, here is what terraform apply returns:
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
  # google_compute_resource_policy.instance_schedule must be replaced
-/+ resource "google_compute_resource_policy" "instance_schedule" {
      ~ id        = "projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule" -> (known after apply)
        name      = "my-instance-schedule"
      ~ project   = "my-project-id" -> (known after apply)
      ~ region    = "https://www.googleapis.com/compute/v1/projects/my-project-id/regions/europe-west1" -> "europe-west1"
      ~ self_link = "https://www.googleapis.com/compute/v1/projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule" -> (known after apply)
        # (1 unchanged attribute hidden)

      ~ instance_schedule_policy {
            # (1 unchanged attribute hidden)

          ~ vm_start_schedule {
              ~ schedule = "0 9 * * *" -> "0 8 * * *" # forces replacement
            }

            # (1 unchanged block hidden)
        }
    }
Plan: 1 to add, 0 to change, 1 to destroy.
Do you want to perform these actions in workspace "prd"?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
google_compute_resource_policy.instance_schedule: Destroying... [id=projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule]
Error: Error when reading or editing ResourcePolicy: googleapi: Error 400: The resource_policy resource 'projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule' is already being used by 'projects/my-project-id/zones/europe-west1-b/instances/my-instance', resourceInUseByAnotherResource
NB: I am working with Terraform 0.14.7 and I am using google provider version 3.76.0

An instance in GCP can be powered off without destroying it with the google_compute_instance resource, using the desired_status argument. Keep in mind that if you are creating the instance for the first time, this argument needs to be set to "RUNNING". The resource can be used as follows:
resource "google_compute_instance" "default" {
name = "test"
machine_type = "f1-micro"
zone = "us-west1-a"
desired_status = "RUNNING"
}
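For completeness, a minimal sketch (based on the provider's documented values for desired_status) of what stopping the same instance would look like on a later apply, changing only that argument:
resource "google_compute_instance" "default" {
  name           = "test"
  machine_type   = "f1-micro"
  zone           = "us-west1-a"
  desired_status = "TERMINATED" # powers the VM off without destroying it
}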
You can also modify your "main.tf" file if you need to stop the VM first and then start it, by creating a dependency in Terraform with depends_on.
As you can see in the following example, the service account will be created first, but the key will not be assigned until that first resource is done.
resource "google_service_account" "service_account" {
account_id = "terraform-test"
display_name = "Service Account"
}
resource "google_service_account_key" "mykey" {
service_account_id = google_service_account.service_account.id
public_key_type = "TYPE_X509_PEM_FILE"
depends_on = [google_service_account.service_account]
}
If the first component already exists, Terraform only deploys the dependent resource.

I faced the same problem with a snapshot policy.
I controlled resource policy creation using a boolean flag input variable and count. The first time, I created the policy resource with the flag set to 'true'. When I want to change the schedule time, I set the flag to 'false' and apply the plan, which detaches and destroys the resource policy.
I then set the flag back to 'true' and apply the plan with the new time.
This worked for me for the snapshot policy; hope it solves yours too.
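A minimal sketch of that pattern for the schedule policy in the question, assuming a boolean variable named enable_schedule (the variable name is illustrative):
variable "enable_schedule" {
  type    = bool
  default = true
}

resource "google_compute_resource_policy" "instance_schedule" {
  # Set enable_schedule = false and apply to detach/destroy the policy,
  # then set it back to true with the new times and apply again.
  count  = var.enable_schedule ? 1 : 0
  name   = "my-instance-schedule"
  region = var.region

  instance_schedule_policy {
    vm_start_schedule {
      schedule = var.vm_start_schedule
    }
    vm_stop_schedule {
      schedule = var.vm_stop_schedule
    }
    time_zone = "Europe/Paris"
  }
}

# The instance then references the policy through the count/splat syntax:
# resource_policies = google_compute_resource_policy.instance_schedule[*].id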

I solved the "resourceInUseByAnotherResource" error by adding the following lifecycle to the google_compute_resource_policy resource:
lifecycle {
  create_before_destroy = true
}
Also, this requires the policy to have a unique name with each change; otherwise the new resource can't be created, because a resource with the same name already exists. So I appended a random ID to the end of the schedule name:
resource "random_pet" "schedule" {
keepers = {
start_schedule = "${var.vm_start_schedule}"
stop_schedule = "${var.vm_stop_schedule}"
}
}
...
resource "google_compute_resource_policy" "schedule" {
name = "schedule-${random_pet.schedule.id}"
...
lifecycle {
create_before_destroy = true
}
}

Related

How can I configure Terraform to update a GCP compute engine instance template without destroying and re-creating?

I have a service deployed on GCP compute engine. It consists of a compute engine instance template, instance group, instance group manager, and load balancer + associated forwarding rules etc.
We're forced into using compute engine rather than Cloud Run or some other serverless offering due to the need for docker-in-docker for the service in question.
The deployment is managed by terraform. I have a config that looks something like this:
data "google_compute_image" "debian_image" {
family = "debian-11"
project = "debian-cloud"
}
resource "google_compute_instance_template" "my_service_template" {
name = "my_service"
machine_type = "n1-standard-1"
disk {
source_image = data.google_compute_image.debian_image.self_link
auto_delete = true
boot = true
}
...
metadata_startup_script = data.local_file.startup_script.content
metadata = {
MY_ENV_VAR = var.whatever
}
}
resource "google_compute_region_instance_group_manager" "my_service_mig" {
version {
instance_template = google_compute_instance_template.my_service_template.id
name = "primary"
}
...
}
resource "google_compute_region_backend_service" "my_service_backend" {
...
backend {
group = google_compute_region_instance_group_manager.my_service_mig.instance_group
}
}
resource "google_compute_forwarding_rule" "my_service_frontend" {
depends_on = [
google_compute_region_instance_group_manager.my_service_mig,
]
name = "my_service_ilb"
backend_service = google_compute_region_backend_service.my_service_backend.id
...
}
I'm running into issues where Terraform is unable to perform any kind of update to this service without running into conflicts. It seems that instance templates are immutable in GCP, and doing anything like updating the startup script, adding an env var, or similar forces it to be deleted and re-created.
Terraform prints info like this in that situation:
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
~ update in-place
-/+ destroy and then create replacement
Terraform will perform the following actions:
  # module.connectors_compute_engine.google_compute_instance_template.airbyte_translation_instance1 must be replaced
-/+ resource "google_compute_instance_template" "my_service_template" {
      ~ id       = "projects/project/..." -> (known after apply)
      ~ metadata = { # forces replacement
          + "TEST" = "test"
            # (1 unchanged element hidden)
        }
The only solution I've found for getting out of this situation is to delete the entire service and all associated entities, from the load balancer down to the instance template, and re-create them.
Is there some way to avoid this situation so that I'm able to change the instance template without having to manually update all the Terraform config twice? At this point I'm even fine with it causing some downtime for the service in question rather than a full rolling update or similar, since that's what's happening now anyway.
I was triggered by this issue as well.
However, according to:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance_template#using-with-instance-group-manager
Instance Templates cannot be updated after creation with the Google Cloud Platform API. In order to update an Instance Template, Terraform will destroy the existing resource and create a replacement. In order to effectively use an Instance Template resource with an Instance Group Manager resource, it's recommended to specify create_before_destroy in a lifecycle block. Either omit the Instance Template name attribute, or specify a partial name with name_prefix.
I would also test a plan with this lifecycle meta-argument:
+ lifecycle {
+   prevent_destroy = true
+ }
}
Or, more realistically in your specific case, something like:
resource "google_compute_instance_template" "my_service_template" {
  # Per the docs above, either omit `name` or switch to `name_prefix`
  # so the replacement template can coexist with the old one.
  name_prefix  = "my-service-"
  machine_type = "n1-standard-1"
  ...
+ lifecycle {
+   create_before_destroy = true
+ }
}
So run terraform plan with either create_before_destroy or prevent_destroy = true on google_compute_instance_template before terraform apply, and compare the results.
Ultimately, you can remove google_compute_instance_template.my_service_template from the state file and import it back.
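If you go that state-surgery route, the commands would look roughly like this (the resource address matches the config above; the import ID path is an assumption, so check the template's actual project and name):
# Drop the template from state without touching it in GCP...
terraform state rm google_compute_instance_template.my_service_template

# ...then re-adopt the existing template into state.
terraform import google_compute_instance_template.my_service_template \
  projects/my-project/global/instanceTemplates/my-service-template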
Some suggested workarounds in this thread:
terraform lifecycle prevent destroy

How to recreate aws_rds_cluster in Terraform

I am trying to create an encrypted version of my currently existing unencrypted aws_rds_cluster. Updating my resource, I added:
kms_key_id        = "mykmskey"
storage_encrypted = true
This is what my resource looks like now:
resource "aws_rds_cluster" "my_rds_cluster" {
cluster_identifier = "${var.service_name}-rds-cluster"
database_name = var.db_name
master_username = var.db_username
master_password = random_password.db_password.result
engine = var.db_engine
engine_version = var.db_engine_version
kms_key_id = "mykmskey"
storage_encrypted = true
db_subnet_group_name = aws_db_subnet_group.fleet_service_db_subnet_group.name
vpc_security_group_ids = [aws_security_group.fleet_service_service_db_security_group.id]
skip_final_snapshot = true
backup_retention_period = var.environment != "prod" ? null : 7
# snapshot_identifier = "my-rds-instance-snapshot"
tags = { Name = "${var.service_name}-rds-cluster" }
}
The problem is that the original resource had deletion_protection = true defined, which I also removed. But even though I removed it, the original cluster cannot be deleted by any means in order for the new one to be created, neither through changes in Terraform nor manually in the AWS console; it just throws an error like:
error creating RDS cluster: DBClusterAlreadyExistsFault: DB Cluster already exists
Any ideas what to do in such cases?
To do that purely through Terraform, you would have to:
1. Remove deletion protection from the original Terraform resource.
2. Run terraform apply, which will remove deletion protection from the actual resource in AWS.
3. Make the modifications to the Terraform resource that will result in a delete or replacement of the current resource.
4. Run terraform apply again, during which Terraform will delete and/or replace the resource.
The key thing here is that you can't remove deletion protection at the same time you are actually deleting the resource, because Terraform isn't going to update an existing resource to modify an attribute before attempting to delete it.
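A sketch of that two-step flow against the cluster above (attribute names per the AWS provider; the two blocks show successive versions of the same resource across the two applies, not two resources):
# First apply: only lift deletion protection.
resource "aws_rds_cluster" "my_rds_cluster" {
  # ... existing arguments unchanged ...
  deletion_protection = false
}

# Second apply: add the settings that force replacement.
resource "aws_rds_cluster" "my_rds_cluster" {
  # ... existing arguments unchanged ...
  deletion_protection = false
  kms_key_id          = "mykmskey"
  storage_encrypted   = true
}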

What happens if running "terraform apply" twice? (Terraform)

What happens if running "terraform apply" twice?
Does it create all the resources twice?
I'm assuming that when you say "terraform deploy" here you mean running the terraform apply command.
The first time you run terraform apply against an entirely new configuration, Terraform will propose to create new objects corresponding with each of the resource instances you declared in the configuration. If you accept the plan and thus allow Terraform to really apply it, Terraform will create each of those objects and record information about them in the Terraform state.
If you then run terraform apply again, Terraform will compare your configuration with the state to see if there are any differences. This time, Terraform will propose changes only if the configuration doesn't match the existing objects that are recorded in the state. If you accept that plan then Terraform will take each of the actions it proposed, which can be a mixture of different action types: update, create, destroy.
This means that in order to use Terraform successfully you need to make sure to keep the state snapshots safe between Terraform runs. With no special configuration at all Terraform will by default save the state in a local file called terraform.tfstate, but when you are using Terraform in production you'll typically use remote state, which is a way to tell Terraform to store state snapshots in a remote data store separate from the computer where you are running Terraform. By storing the state in a location that all of your coworkers can access, you can collaborate together.
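For example, a minimal remote state configuration using the GCS backend might look like this (the bucket name and prefix are placeholders for a bucket you have already created):
terraform {
  backend "gcs" {
    bucket = "my-terraform-state-bucket" # pre-existing bucket that holds the state
    prefix = "prd"                       # path prefix, e.g. per environment
  }
}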
If you use Terraform Cloud, a complementary hosted service provided by HashiCorp, you can configure Terraform to store the state snapshots in Terraform Cloud itself. Terraform Cloud has various other capabilities too, such as running Terraform in a remote execution environment so that everyone who uses that environment can be sure to run Terraform with a consistent set of environment variables stored remotely.
If you run the terraform apply command for the first time, it will create the resources that were in the terraform plan.
If you run the terraform apply command a second time, it will check whether each resource already exists; if it does, Terraform will not create a duplicate resource.
Before running terraform apply the second time, you can run terraform plan to get the list of resources to change, create, or delete.
Apr, 2022 Update:
The first run of "terraform apply" creates (adds) resources.
The second or later run of "terraform apply" creates (adds), updates (changes), or deletes (destroys) existing resources if there are changes to them. Broadly, changing a mutable value of an existing resource updates that resource in place, while changing an immutable value deletes the resource and then creates it again rather than updating it.
*A mutable value is a value which can change after creating a resource.
*An immutable value is a value which cannot change after creating a resource.
For example, I create(add) the Cloud Storage bucket "kai_bucket" with the Terraform code below:
resource "google_storage_bucket" "bucket" {
name = "kai_bucket"
location = "ASIA-NORTHEAST1"
force_destroy = true
uniform_bucket_level_access = true
}
So, do the first run of the command below:
terraform apply -auto-approve
Then, one resource "kai_bucket" is created(added) as shown below:
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
  # google_storage_bucket.bucket will be created
  + resource "google_storage_bucket" "bucket" {
      + force_destroy               = true
      + id                          = (known after apply)
      + location                    = "ASIA-NORTHEAST1"
      + name                        = "kai_bucket"
      + project                     = (known after apply)
      + self_link                   = (known after apply)
      + storage_class               = "STANDARD"
      + uniform_bucket_level_access = true
      + url                         = (known after apply)
    }
Plan: 1 to add, 0 to change, 0 to destroy.
google_storage_bucket.bucket: Creating...
google_storage_bucket.bucket: Creation complete after 1s [id=kai_bucket]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
Now, I change the mutable value "uniform_bucket_level_access" from "true" to "false":
resource "google_storage_bucket" "bucket" {
name = "kai_bucket"
location = "ASIA-NORTHEAST1"
force_destroy = true
uniform_bucket_level_access = false # Here
}
Then, do the second run of the command below:
terraform apply -auto-approve
Then, "uniform_bucket_level_access" is updated(changed) from "true" to "false" as shown below:
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
  # google_storage_bucket.bucket will be updated in-place
  ~ resource "google_storage_bucket" "bucket" {
        id                          = "kai_bucket"
        name                        = "kai_bucket"
      ~ uniform_bucket_level_access = true -> false
        # (9 unchanged attributes hidden)
    }
Plan: 0 to add, 1 to change, 0 to destroy.
google_storage_bucket.bucket: Modifying... [id=kai_bucket]
google_storage_bucket.bucket: Modifications complete after 1s [id=kai_bucket]
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
Now, I change the immutable value "location" from "ASIA-NORTHEAST1" to "US-EAST1":
resource "google_storage_bucket" "bucket" {
name = "kai_bucket"
location = "US-EAST1" # Here
force_destroy = true
uniform_bucket_level_access = false
}
Then, do the third run of the command below:
terraform apply -auto-approve
Then, one resource "kai_bucket" with "ASIA-NORTHEAST1" is deleted(destroyed) then one resource "kai_bucket" with "US-EAST1" is created(added) as shown below:
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
  # google_storage_bucket.bucket must be replaced
-/+ resource "google_storage_bucket" "bucket" {
      - default_event_based_hold    = false -> null
      ~ id                          = "kai_bucket" -> (known after apply)
      - labels                      = {} -> null
      ~ location                    = "ASIA-NORTHEAST1" -> "US-EAST1" # forces replacement
        name                        = "kai_bucket"
      ~ project                     = "myproject-272234" -> (known after apply)
      - requester_pays              = false -> null
      ~ self_link                   = "https://www.googleapis.com/storage/v1/b/kai_bucket" -> (known after apply)
      ~ url                         = "gs://kai_bucket" -> (known after apply)
        # (3 unchanged attributes hidden)
    }
Plan: 1 to add, 0 to change, 1 to destroy.
google_storage_bucket.bucket: Destroying... [id=kai_bucket]
google_storage_bucket.bucket: Destruction complete after 1s
google_storage_bucket.bucket: Creating...
google_storage_bucket.bucket: Creation complete after 1s [id=kai_bucket]
Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

How to set auto-delete option for additional attached_disk in gcp instance using terraform?

I am trying to create a VM instance in GCP with a boot_disk and an additional attached_disk using Terraform. I could not find any parameter to auto-delete the additional attached_disk when the instance is deleted.
The auto-delete option is available in the GCP console.
Terraform code:
resource "google_compute_disk" "elastic-disk" {
count = var.no_of_elastic_intances
name = "elastic-disk-${count.index+1}-data"
type = "pd-standard"
size = "10"
}
resource "google_compute_instance" "elastic" {
count = var.no_of_elastic_intances
name = "${var.elastic_instance_name_prefix}-${count.index+1}"
machine_type = var.elastic_instance_machine_type
boot_disk {
auto_delete = true
mode = "READ_WRITE"
initialize_params {
image = var.elastic_instance_image_type
type = var.elastic_instance_disc_type
size = var.elasitc_instance_disc_size
}
}
attached_disk {
source = "${element(google_compute_disk.elastic-disk.*.self_link, count.index)}"
mode = "READ_WRITE"
}
network_interface {
network = var.elastic_instance_network
access_config {
}
}
}
The feature to set auto-delete for additional attached disks is not supported; HashiCorp and Google decided not to support it in the Terraform provider.
Reference this issue:
If Terraform were told to remove the instance, but not the disks, and auto-delete were enabled, then it would not specifically delete the disks, but they would still be deleted by GCP. This behaviour would not be shown in a plan run, and so could lead to unwanted outcomes, as well as the state still showing the disks existing.
My opinion is that Terraform should manage the entire lifecycle from creation to destruction. For disks that you want to attach to a new instance, create those disks as part of your Terraform HCL and destroy them as part of your HCL.
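As a sketch of that approach with the question's config (the resource addresses come from the code above; the count index is illustrative), destroying the instance and its data disk in the same run gives the same end state as auto-delete while keeping the plan and state accurate:
# Destroy one instance together with its attached data disk in a single run.
terraform destroy \
  -target='google_compute_instance.elastic[0]' \
  -target='google_compute_disk.elastic-disk[0]'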

terraform count dependent on data from target environment

I'm getting the following error when trying to initially plan or apply a resource that uses data values from the AWS environment in a count.
$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
------------------------------------------------------------------------
Error: Invalid count argument
on main.tf line 24, in resource "aws_efs_mount_target" "target":
24: count = length(data.aws_subnet_ids.subnets.ids)
The "count" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the count depends on.
$ terraform --version
Terraform v0.12.9
+ provider.aws v2.30.0
I tried using the target option, but it doesn't seem to work on a data source.
$ terraform apply -target aws_subnet_ids.subnets
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
The only solution I found that works is:
1. Remove the resource.
2. Apply the project.
3. Add the resource back.
4. Apply again.
Here is a terraform config I created for testing.
provider "aws" {
version = "~> 2.0"
}
locals {
project_id = "it_broke_like_3_collar_watch"
}
terraform {
required_version = ">= 0.12"
}
resource aws_default_vpc default {
}
data aws_subnet_ids subnets {
vpc_id = aws_default_vpc.default.id
}
resource aws_efs_file_system efs {
creation_token = local.project_id
encrypted = true
}
resource aws_efs_mount_target target {
depends_on = [ aws_efs_file_system.efs ]
count = length(data.aws_subnet_ids.subnets.ids)
file_system_id = aws_efs_file_system.efs.id
subnet_id = tolist(data.aws_subnet_ids.subnets.ids)[count.index]
}
I finally figured out the answer after researching the answer by Dude0001.
Short answer: use the aws_vpc data source with the default argument instead of the aws_default_vpc resource. Here is the working sample with comments on the changes.
locals {
  project_id = "it_broke_like_3_collar_watch"
}

terraform {
  required_version = ">= 0.12"
}

// Delete this --> resource aws_default_vpc default {}

// Add this
data aws_vpc default {
  default = true
}

data "aws_subnet_ids" "subnets" {
  // Update this from aws_default_vpc.default.id
  vpc_id = "${data.aws_vpc.default.id}"
}

resource aws_efs_file_system efs {
  creation_token = local.project_id
  encrypted      = true
}

resource aws_efs_mount_target target {
  depends_on     = [aws_efs_file_system.efs]
  count          = length(data.aws_subnet_ids.subnets.ids)
  file_system_id = aws_efs_file_system.efs.id
  subnet_id      = tolist(data.aws_subnet_ids.subnets.ids)[count.index]
}
What I couldn't figure out was why my workaround of removing aws_efs_mount_target on the first apply worked. It's because after the first apply the aws_default_vpc was loaded into the state file.
So an alternate solution without making change to the original tf file would be to use the target option on the first apply:
$ terraform apply --target aws_default_vpc.default
However, I don't like this as it requires a special case on first deployment which is pretty unique for the terraform deployments I've worked with.
The aws_default_vpc isn't a resource TF can create or destroy. It is the default VPC for your account in each region, which AWS creates automatically for you and which is protected from being destroyed. You can only (and need to) adopt it into management and into your TF state. This will allow you to begin managing it and to inspect it when you run plan or apply. Otherwise, TF doesn't know what the resource is or what state it is in, and it cannot create a new one for you, as it is a special type of protected resource as described above.
With that said, go get the default VPC id from the region you are deploying into in your account. Then import it into your TF state. Terraform should then be able to inspect it and count the number of subnets.
For example
terraform import aws_default_vpc.default vpc-xxxxxx
https://www.terraform.io/docs/providers/aws/r/default_vpc.html
Using the data element for this looks a little odd to me as well. Can you change your TF script to get the count directly through the aws_default_vpc resource?