How to specify either empty snapshot_identifier or a datasource value - amazon-web-services

I am trying to make a common module to allow an rds cluster to be built - however I want to be able to choose to have it build from either a snapshot or from scratch.
I used a count on the data source to choose whether to perform the lookup or not, which works. However, if count is set to 0 and the lookup doesn't run, the resource will fail because it doesn't know what data.aws_db_cluster_snapshot.latest_cluster_snapshot refers to. Is there a way around this that I can't quite think of myself?
Datasource:
data "aws_db_cluster_snapshot" "latest_cluster_snapshot" {
  count                 = "${var.enable_restore == "true" ? 1 : 0}"
  db_cluster_identifier = "${var.snapshot_to_restore_from}"
  most_recent           = true
}
Resource:
resource "aws_rds_cluster" "aurora_cluster" {
  ...
  snapshot_identifier = "${var.enable_restore == "false" ? "" : data.aws_db_cluster_snapshot.latest_cluster_snapshot.id}"
  ...
}
Versions:
Terraform v0.11.10
provider.aws v2.33.0
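On Terraform 0.11, a common workaround for this kind of "counted data source" problem (a sketch, not verified against this exact module) is to reference the data source with a splat and pad the resulting list with concat, so the expression stays valid even when count is 0:

```hcl
resource "aws_rds_cluster" "aurora_cluster" {
  # ...

  # The splat returns a list of 0 or 1 ids; concat() appends an empty-string
  # fallback so element() always has something to pick. When enable_restore
  # is "false", this evaluates to "" without ever dereferencing a
  # non-existent data source instance.
  snapshot_identifier = "${element(concat(data.aws_db_cluster_snapshot.latest_cluster_snapshot.*.id, list("")), 0)}"
}
```

This avoids the 0.11 pitfall where a direct `.id` reference fails whenever the counted data source has no instances, even inside the "false" branch of a conditional.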

Related

How can I configure Terraform to update a GCP compute engine instance template without destroying and re-creating?

I have a service deployed on GCP compute engine. It consists of a compute engine instance template, instance group, instance group manager, and load balancer + associated forwarding rules etc.
We're forced into using compute engine rather than Cloud Run or some other serverless offering due to the need for docker-in-docker for the service in question.
The deployment is managed by terraform. I have a config that looks something like this:
data "google_compute_image" "debian_image" {
  family  = "debian-11"
  project = "debian-cloud"
}
resource "google_compute_instance_template" "my_service_template" {
  name         = "my_service"
  machine_type = "n1-standard-1"

  disk {
    source_image = data.google_compute_image.debian_image.self_link
    auto_delete  = true
    boot         = true
  }
  ...
  metadata_startup_script = data.local_file.startup_script.content

  metadata = {
    MY_ENV_VAR = var.whatever
  }
}
resource "google_compute_region_instance_group_manager" "my_service_mig" {
  version {
    instance_template = google_compute_instance_template.my_service_template.id
    name              = "primary"
  }
  ...
}

resource "google_compute_region_backend_service" "my_service_backend" {
  ...
  backend {
    group = google_compute_region_instance_group_manager.my_service_mig.instance_group
  }
}

resource "google_compute_forwarding_rule" "my_service_frontend" {
  depends_on = [
    google_compute_region_instance_group_manager.my_service_mig,
  ]
  name            = "my_service_ilb"
  backend_service = google_compute_region_backend_service.my_service_backend.id
  ...
}
I'm running into issues where Terraform is unable to perform any kind of update to this service without running into conflicts. It seems that instance templates are immutable in GCP, and doing anything like updating the startup script, adding an env var, or similar forces it to be deleted and re-created.
Terraform prints info like this in that situation:
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
~ update in-place
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.connectors_compute_engine.google_compute_instance_template.airbyte_translation_instance1 must be replaced
-/+ resource "google_compute_instance_template" "my_service_template" {
~ id = "projects/project/..." -> (known after apply)
~ metadata = { # forces replacement
+ "TEST" = "test"
# (1 unchanged element hidden)
}
The only solution I've found for getting out of this situation is to delete the entire service and all associated resources, from the load balancer down to the instance template, and re-create them.
Is there some way to avoid this situation, so that I'm able to change the instance template without having to manually tear down and re-apply the Terraform config twice? At this point I'm even fine if it ends up creating some downtime for the service in question rather than a full rolling update, since that's what's happening now anyway.
I ran into this issue as well.
However, according to:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance_template#using-with-instance-group-manager
Instance Templates cannot be updated after creation with the Google
Cloud Platform API. In order to update an Instance Template, Terraform
will destroy the existing resource and create a replacement. In order
to effectively use an Instance Template resource with an Instance
Group Manager resource, it's recommended to specify
create_before_destroy in a lifecycle block. Either omit the Instance
Template name attribute, or specify a partial name with name_prefix.
I would also test and plan with this lifecycle meta-argument:
+ lifecycle {
+   prevent_destroy = true
+ }
Or, more realistically in your specific case, something like:
resource "google_compute_instance_template" "my_service_template" {
  # Omit name, or use name_prefix, so the replacement template
  # can be created before the old one is destroyed
  name_prefix  = "my_service-"
  machine_type = "n1-standard-1"
  ...
+ lifecycle {
+   create_before_destroy = true
+ }
}
So run terraform plan with either create_before_destroy or prevent_destroy = true on google_compute_instance_template before terraform apply to see the results.
Ultimately, you can also remove google_compute_instance_template.my_service_template from the state file and import it back.
Some suggested workarounds in this thread:
terraform lifecycle prevent destroy

Terraform wants to replace Google compute engine if its start/stop scheduler is modified

First of all, I am surprised that I have found very few resources on Google that mention this issue with Terraform.
This is an essential feature for optimizing the cost of cloud instances though, so I'm probably missing out on a few things, thanks for your tips and ideas!
I want to create an instance and manage its start and stop daily, programmatically.
The resource "google_compute_resource_policy" seems to meet my use case. However, when I change the stop or start time, Terraform plans to destroy and recreate the instance... which I absolutely don't want!
The resource "google_compute_resource_policy" is attached to the instance via the argument resource_policies where it is specified: "Modifying this list will cause the instance to recreate."
I don't understand why Terraform handles this simple update so badly. It is true that it is not possible to update a scheduler in place, but it is perfectly possible to detach it manually from the instance, destroy it, recreate it with the new stop/start schedule, and then attach it to the instance again.
Is there a workaround without going through a null resource to run a gcloud script to do these steps?
I tried adding an "ignore_changes" lifecycle on the "resource_policies" argument of my instance; Terraform no longer wants to destroy my instance, but it gives me the following error:
Error when reading or editing ResourcePolicy: googleapi: Error 400: The resource_policy resource 'projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule' is already being used by 'projects/my-project-id/zones/europe-west1-b/instances/my-instance', resourceInUseByAnotherResource"
Here is my Terraform code
resource "google_compute_resource_policy" "instance_schedule" {
  name        = "my-instance-schedule"
  region      = var.region
  description = "Start and stop instance"

  instance_schedule_policy {
    vm_start_schedule {
      schedule = var.vm_start_schedule
    }
    vm_stop_schedule {
      schedule = var.vm_stop_schedule
    }
    time_zone = "Europe/Paris"
  }
}
resource "google_compute_instance" "my-instance" {
  // ******** This is my attempted workaround ********
  lifecycle {
    ignore_changes = [resource_policies]
  }

  name                      = "my-instance"
  machine_type              = var.machine_type
  zone                      = "${var.region}-b"
  allow_stopping_for_update = true

  resource_policies = [
    google_compute_resource_policy.instance_schedule.id
  ]

  boot_disk {
    device_name = local.ref_name
    initialize_params {
      image = var.boot_disk_image
      type  = var.disk_type
      size  = var.disk_size
    }
  }

  network_interface {
    network = data.google_compute_network.default.name
    access_config {
      nat_ip = google_compute_address.static.address
    }
  }
}
If it can be useful, here is what the terraform apply returns
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
# google_compute_resource_policy.instance_schedule must be replaced
-/+ resource "google_compute_resource_policy" "instance_schedule" {
~ id = "projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule" -> (known after apply)
name = "my-instance-schedule"
~ project = "my-project-id" -> (known after apply)
~ region = "https://www.googleapis.com/compute/v1/projects/my-project-id/regions/europe-west1" -> "europe-west1"
~ self_link = "https://www.googleapis.com/compute/v1/projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule" -> (known after apply)
# (1 unchanged attribute hidden)
~ instance_schedule_policy {
# (1 unchanged attribute hidden)
~ vm_start_schedule {
~ schedule = "0 9 * * *" -> "0 8 * * *" # forces replacement
}
# (1 unchanged block hidden)
}
}
Plan: 1 to add, 0 to change, 1 to destroy.
Do you want to perform these actions in workspace "prd"?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
google_compute_resource_policy.instance_schedule: Destroying... [id=projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule]
Error: Error when reading or editing ResourcePolicy: googleapi: Error 400: The resource_policy resource 'projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule' is already being used by 'projects/my-project-id/zones/europe-west1-b/instances/my-instance', resourceInUseByAnotherResource
NB: I am working with Terraform 0.14.7 and I am using google provider version 3.76.0
An instance in GCP can be powered off without destroying it, using the google_compute_instance resource's desired_status argument. Keep in mind that when creating the instance for the first time, this argument needs to be set to "RUNNING". It can be used as follows:
resource "google_compute_instance" "default" {
  name           = "test"
  machine_type   = "f1-micro"
  zone           = "us-west1-a"
  desired_status = "RUNNING"
}
You can also modify your "main.tf" file if you need to stop the VM first, by creating a dependency in Terraform with depends_on.
As you can see in the following example, the service account is created first, and the key is not created until the service account exists.
resource "google_service_account" "service_account" {
  account_id   = "terraform-test"
  display_name = "Service Account"
}

resource "google_service_account_key" "mykey" {
  service_account_id = google_service_account.service_account.id
  public_key_type    = "TYPE_X509_PEM_FILE"
  depends_on         = [google_service_account.service_account]
}
If the first component already exists, terraform only deploys the dependent.
I faced the same problem with a snapshot policy.
I controlled resource-policy creation using a boolean flag input variable and count. Initially, I created the policy resource with the flag set to 'true'. When I want to change the schedule time, I set the flag to 'false' and apply the plan, which detaches and destroys the policy.
I then set the flag back to 'true' and apply the plan with the new time.
This worked for me for a snapshot policy; hope it solves yours too.
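A minimal sketch of that flag pattern (the variable name enable_schedule is illustrative, not from the original posts):

```hcl
variable "enable_schedule" {
  type    = bool
  default = true
}

resource "google_compute_resource_policy" "instance_schedule" {
  # count = 0 destroys the policy; count = 1 creates it
  count  = var.enable_schedule ? 1 : 0
  name   = "my-instance-schedule"
  region = var.region

  instance_schedule_policy {
    vm_start_schedule {
      schedule = var.vm_start_schedule
    }
    vm_stop_schedule {
      schedule = var.vm_stop_schedule
    }
    time_zone = "Europe/Paris"
  }
}

resource "google_compute_instance" "my-instance" {
  # ...
  # The splat yields an empty list when the flag is off, so applying with
  # enable_schedule = false detaches the policy before it is destroyed.
  resource_policies = google_compute_resource_policy.instance_schedule[*].id
}
```

The flow is then: apply once with enable_schedule = false to detach and destroy the old policy, then apply again with enable_schedule = true and the new schedule.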
I solved the "resourceInUseByAnotherResource" error by adding the following lifecycle to the google_compute_resource_policy resource:
lifecycle {
  create_before_destroy = true
}
Also, this requires having a unique name with each change; otherwise the new resource can't be created, because a resource with the same name already exists. So I appended a random ID to the end of the schedule name:
resource "random_pet" "schedule" {
  keepers = {
    start_schedule = "${var.vm_start_schedule}"
    stop_schedule  = "${var.vm_stop_schedule}"
  }
}
...
resource "google_compute_resource_policy" "schedule" {
  name = "schedule-${random_pet.schedule.id}"
  ...
  lifecycle {
    create_before_destroy = true
  }
}

How to sync lifecycle_rule of s3 bucket actual configuration with terraform script

The rule was set up manually in AWS console. I wanted to sync it in my terraform script.
I have the following defined in terraform script:
resource "aws_s3_bucket" "bucketname" {
  bucket              = "${local.bucket_name}"
  acl                 = "private"
  force_destroy       = "false"
  acceleration_status = "Enabled"

  lifecycle_rule {
    enabled = true

    transition {
      days          = 30
      storage_class = "INTELLIGENT_TIERING"
    }
  }

  lifecycle_rule {
    enabled = true

    expiration {
      days = 30
    }
  }
}
However this always gives me the following output when applying it:
lifecycle_rule.0.transition.1300905083.date: "" => ""
lifecycle_rule.0.transition.1300905083.days: "" => "30"
lifecycle_rule.0.transition.1300905083.storage_class: "" => "INTELLIGENT_TIERING"
lifecycle_rule.0.transition.3021102259.date: "" => ""
lifecycle_rule.0.transition.3021102259.days: "0" => "0"
lifecycle_rule.0.transition.3021102259.storage_class: "INTELLIGENT_TIERING" => ""
I'm not sure what the behavior is; is it trying to delete the existing rule and recreate it?
is it trying to delete the existing and recreate it?
Yes. If the rules have been created outside of TF, then as far as TF is concerned, they don't exist. Thus TF is going to replace the existing ones, as it is not aware of them. The TF docs state:
It [TF] does not generate configuration.
Since your bucket does not have lifecycle rules in TF, TF treats them as non-existent.
When you are managing your infrastructure using any IaC tool (TF, CloudFormation, ...), it's bad practice to modify resources "manually", outside of these tools. This leads to so-called resource drift, which in turn can lead to more issues in the future.
In your case, you either have to re-create the rules in TF, which means the manually created ones will be replaced, or import them. However, you can't import individual attributes of a resource. Thus you would have to import the bucket.
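The import would look something like this (the resource address comes from the question's config; the real bucket name has to be substituted):

```shell
# Bring the existing bucket, including its manually created lifecycle
# rules, under Terraform management:
terraform import aws_s3_bucket.bucketname actual-bucket-name
```

After the import, `terraform plan` shows any remaining drift between the real bucket and the config, which can then be reconciled in the HCL.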
It looks like I just made a silly mistake by putting a value for the days parameter. The correct config, matching the manual update, is:
resource "aws_s3_bucket" "bucketname" {
  bucket              = "${local.bucket_name}"
  acl                 = "private"
  force_destroy       = "false"
  acceleration_status = "Enabled"

  lifecycle_rule {
    enabled = true

    transition {
      storage_class = "INTELLIGENT_TIERING"
    }
  }

  lifecycle_rule {
    enabled = true

    expiration {
      days = 30
    }
  }
}

GCP Cloud SQL failed to delete instance because `deletion_protection` is set to true

I have a tf script for provisioning a Cloud SQL instance, along with a couple of dbs and an admin user. I have renamed the instance, so a new instance was created, but Terraform is encountering issues when it comes to deleting the old one.
Error: Error, failed to delete instance because deletion_protection is set to true. Set it to false to proceed with instance deletion
I have tried setting the deletion_protection to false but I keep getting the same error. Is there a way to check which resources need to have the deletion_protection set to false in order to be deleted?
I have only added it to the google_sql_database_instance resource.
My tf script:
// Provision the Cloud SQL Instance
resource "google_sql_database_instance" "instance-master" {
  name             = "instance-db-${random_id.random_suffix_id.hex}"
  region           = var.region
  database_version = "POSTGRES_12"
  project          = var.project_id

  settings {
    availability_type = "REGIONAL"
    tier              = "db-f1-micro"
    activation_policy = "ALWAYS"
    disk_type         = "PD_SSD"

    ip_configuration {
      ipv4_enabled    = var.is_public ? true : false
      private_network = var.network_self_link
      require_ssl     = true

      dynamic "authorized_networks" {
        for_each = toset(var.is_public ? [1] : [])
        content {
          name  = "Public Internet"
          value = "0.0.0.0/0"
        }
      }
    }

    backup_configuration {
      enabled = true
    }

    maintenance_window {
      day          = 2
      hour         = 4
      update_track = "stable"
    }

    dynamic "database_flags" {
      iterator = flag
      for_each = var.database_flags
      content {
        name  = flag.key
        value = flag.value
      }
    }

    user_labels = var.default_labels
  }

  deletion_protection = false

  depends_on = [google_service_networking_connection.cloudsql-peering-connection, google_project_service.enable-sqladmin-api]
}

// Provision the databases
resource "google_sql_database" "db" {
  name     = "orders-placement"
  instance = google_sql_database_instance.instance-master.name
  project  = var.project_id
}

// Provision a super user
resource "google_sql_user" "admin-user" {
  name     = "admin-user"
  instance = google_sql_database_instance.instance-master.name
  password = random_password.user-password.result
  project  = var.project_id
}

// Get latest CA certificate
locals {
  furthest_expiration_time = reverse(sort([for k, v in google_sql_database_instance.instance-master.server_ca_cert : v.expiration_time]))[0]
  latest_ca_cert           = [for v in google_sql_database_instance.instance-master.server_ca_cert : v.cert if v.expiration_time == local.furthest_expiration_time]
}

// Get SSL certificate
resource "google_sql_ssl_cert" "client_cert" {
  common_name = "instance-master-client"
  instance    = google_sql_database_instance.instance-master.name
}
It seems like your code is going to recreate this SQL instance, but your current tfstate file still contains deletion_protection = true for the existing instance. In this case, you first need to change that parameter's value to false, either manually in the tfstate file or by setting deletion_protection = false in the code and running terraform apply (beware: your code shouldn't trigger a recreation of the instance on that apply). After that, you can do anything with your SQL instance.
You will have to set deletion_protection=false, apply it and then proceed to delete.
As per the documentation
On newer versions of the provider, you must explicitly set deletion_protection=false (and run terraform apply to write the field to state) in order to destroy an instance. It is recommended to not set this field (or set it to true) until you're ready to destroy the instance and its databases.
Link
Editing Terraform state files directly / manually is not recommended
If you added deletion_protection to the google_sql_database_instance after the database instance was created, you need to run terraform apply before running terraform destroy so that deletion_protection is set to false on the database instance.
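The two-step flow looks something like this (the resource name is taken from the question's config; a sketch, not verbatim commands from the answers):

```shell
# 1. With deletion_protection = false already in the config, write it to state:
terraform apply
# 2. Only now can the instance actually be destroyed (or replaced):
terraform destroy -target=google_sql_database_instance.instance-master
```

The key point is that the flag lives in state, so the destroy-protecting value has to be flushed to state by an apply before any destroy can succeed.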

Terraform AWS RDS fails when creating DB instances with hardcoded RDS cluster ID

I'm trying to create an if/else statement for a Terraform AWS RDS cluster. I have something like this working-ish:
module "aws" {
  # false = "0" and true = "1"
  create_from_snapshot = "1"
  snapshot_identifier  = "db-final-snapshot"
}

# Normal deployment
resource "aws_rds_cluster" "cluster" {
  count              = "${1 - var.create_from_snapshot}"
  cluster_identifier = "${var.db_name}"
}

# Creation from a snapshot
resource "aws_rds_cluster" "cluster_from_snapshot" {
  count               = "${var.create_from_snapshot}"
  cluster_identifier  = "${var.db_name}"
  snapshot_identifier = "${var.snapshot_identifier}"
}
However, I'm having problems defining the cluster_identifier inside the cluster instances: the first approach doesn't work uniformly for both cases, while the second requires two terraform apply runs:
# Everything works fine, but then 'cluster' isn't defined
# if using the 'cluster_from_snapshot' resource
# Everything works fine, but then 'cluster' isn't defined
# if using the 'cluster_from_snapshot' resource
resource "aws_rds_cluster_instance" "cluster_instance" {
  cluster_identifier = "${aws_rds_cluster.cluster.id}"
}

# This throws an error on 'terraform apply'
resource "aws_rds_cluster_instance" "cluster_instance" {
  cluster_identifier = "${var.db_name}"
}
Later on the command line ...
$> terraform apply
.... logs...
module.aurora.aws_rds_cluster.cluster_from_snapshot:
Creation complete after 50s (ID: testing)
Error: Error applying plan:
aws_rds_cluster_instance.cluster_instance.0: error creating RDS DB
Instance: DBClusterNotFound, could not find DB Cluster: testing
aws_rds_cluster_instance.cluster_instance.1: error creating RDS DB
Instance: DBClusterNotFound, could not find DB Cluster: testing
$> terraform apply # instance 0 & 1 get created no problem
Advice on creating a resource identifier for the instances that would apply in both situations would be appreciated, but I'm really curious whether there's something special about ${aws_rds_cluster.cluster.id} (maybe it includes routing information?) that makes it different from a normal hardcoded string or the variable, or is there just a timing lag?
Thanks in advance!
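The likely culprit is dependency ordering rather than anything special in the id: a hardcoded "${var.db_name}" gives Terraform no dependency edge from the instance to the cluster, so it tries to create the instances before the cluster exists (hence DBClusterNotFound on the first apply). A Terraform 0.11 sketch that works for either branch uses splats and concat, so the reference, and with it the dependency, survives a count of 0:

```hcl
resource "aws_rds_cluster_instance" "cluster_instance" {
  # Exactly one of the two cluster resources exists at a time, so concat()
  # of the two splat lists always has a single element; element() picks it
  # and preserves the dependency edge that a hardcoded name would lose.
  cluster_identifier = "${element(concat(aws_rds_cluster.cluster.*.id, aws_rds_cluster.cluster_from_snapshot.*.id), 0)}"
}
```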