How to recreate aws_rds_cluster in Terraform - amazon-web-services

I am trying to create an encrypted version of my currently existing unencrypted aws_rds_cluster by updating my resource, I added:
kms_key_id = "mykmskey"
storage_encrypted = true
This is how my resource should look like:
resource "aws_rds_cluster" "my_rds_cluster" {
cluster_identifier = "${var.service_name}-rds-cluster"
database_name = var.db_name
master_username = var.db_username
master_password = random_password.db_password.result
engine = var.db_engine
engine_version = var.db_engine_version
kms_key_id = "mykmskey"
storage_encrypted = true
db_subnet_group_name = aws_db_subnet_group.fleet_service_db_subnet_group.name
vpc_security_group_ids = [aws_security_group.fleet_service_service_db_security_group.id]
skip_final_snapshot = true
backup_retention_period = var.environment != "prod" ? null : 7
# snapshot_identifier = "my-rds-instance-snapshot"
tags = { Name = "${var.service_name}-rds-cluster" }
}
The problem is that the original resource had delete_protection = true defined, which I also removed but, even though I removed it the original cluster cannot be deleted by any means in order for the new one to be created, neither through changes in Terraform, nor manually in AWS console, it just throws an error like:
error creating RDS cluster: DBClusterAlreadyExistsFault: DB Cluster already exists
Any ideas what to do in such cases?

To do that purely through Terraform, you would have to:
Remove deletion protection from the original Terraform resource
Run terraform apply, which will remove deletion protection from the actual resource in AWS
Make the modifications to the Terraform resource that will result in a delete or replace of the current resource
Run terraform apply again, during which time Terraform will now delete and/or replace the resource.
The key thing here being that you can't remove deleting protection at the same time you are actually deleting a resource, because Terraform isn't going to update an existing resource to modify an attribute before attempting to delete the resource.

Related

Trigger random_id resource recreation on rds instance destroy and recreate

Folks, am trying to find a way with terraform random_id resource to recreate and provide a new random value when the rds instance destroys and recreates due to a change that went in, say the username on rds has changed.
This random value am trying to attach to final_snapshot_identifier of the aws_db_instance resource so that the snapshot should have a unique value to its id everytime it gets created upon rds instance being destroyed.
Current code:
resource "random_id" "snap_id" {
byte_length = 8
}
locals {
inst_id = "test-rds-inst"
inst_snap_id = "${local.inst_id}-snap-${format("%.4s", random_id.snap_id.dec)}"
}
resource "aws_db_instance" "rds" {
.....
identifier = local.inst_id
final_snapshot_identifier = local.inst_snap_id
skip_final_snapshot = false
username = "foo"
apply_immediately = true
.....
}
output "snap_id" {
value = aws_db_instance.rds.final_snapshot_identifier
}
Output after terraform apply:
snap_id = "test-rds-inst-snap-5553"
Use case am trying out:
#1:
Modify value in rds instance to simulate a destroy & recreate:
Modify username to "foo-tmp"
terraform apply -auto-approve
Output:
snap_id = "test-rds-inst-snap-5553"
I was expecting the random_id to kick in and output a unique id, but it didn't.
Observation:
rds instance in deleting state
snapshot "test-rds-inst-snap-5553" in creating state
rds instance recreated and in available state
snapshot "test-rds-inst-snap-5553" in available state
#2:
Modify value again in rds instance to simulate a destroy & recreate:
Modify username to "foo-new"
terraform apply -auto-approve
Kind of expected below error, coz snap id didn't get a new value in prior attempt, but tired anyways..
Observation:
**Error:** error deleting DB Instance (test-rds-inst): DBSnapshotAlreadyExists: Cannot create the snapshot because a snapshot with the identifier test-rds-inst-snap-5553 already exists.
Am aware of the keepers{} map for random_id resource, but not sure on what from the rds_instance that I need to put in the map so that the random_id resource will be recreated and it ends up providing a new unique value to the snap_id suffix.
Also I feel using any attribute of rds instance in the random_id keepers, might cause a circular dependency issue. I may be wrong but haven't tried it though.
Any suggestions will be helpful. Thanks.
The easiest way to do this would be to use taint on the random_id resource, as per the documentation [1]:
To force a random result to be replaced, the taint command can be used to produce a new result on the next run.
Alternatively, looking at the example from the documentation, you could do something like:
resource "random_id" "snap_id" {
byte_length = 8
keepers {
snapshot_id = var.snapshot_id
}
}
resource "aws_db_instance" "rds" {
.....
identifier = local.inst_id
final_snapshot_identifier = random_id.snap_id.keepers.snapshot_id
skip_final_snapshot = false
username = "foo"
apply_immediately = true
.....
}
This means that until the value of the variable snapshot_id changes, the random_id will generate the same result. Not sure if that would work with locals, but you could try replacing var.snapshot_id with local.inst_snap_id. If that works, you could then name the snapshot using built-in functions like formatdate [2] and timestamp [3] to create a snapshot id which will be tied to the time when you were running apply, something like:
locals {
inst_id = "test-rds-inst"
snap_time = formatdate("YYYYMMDD", timestamp())
inst_snap_id = "${local.inst_id}-snap-${format("%.4s", random_id.snap_id.dec)}-${local.snap_time}"
}
[1] https://registry.terraform.io/providers/hashicorp/random/latest/docs#resource-keepers
[2] https://www.terraform.io/language/functions/formatdate
[3] https://www.terraform.io/language/functions/timestamp

Terraform wants to replace Google compute engine if its start/stop scheduler is modified

First of all, I am surprised that I have found very few resources on Google that mention this issue with Terraform.
This is an essential feature for optimizing the cost of cloud instances though, so I'm probably missing out on a few things, thanks for your tips and ideas!
I want to create an instance and manage its start and stop daily, programmatically.
The resource "google_compute_resource_policy" seems to meet my use case. However, when I change the stop or start time, Terraform plans to destroy and recreate the instance... which I absolutely don't want!
The resource "google_compute_resource_policy" is attached to the instance via the argument resource_policies where it is specified: "Modifying this list will cause the instance to recreate."
I don't understand why Terraform handles this simple update so badly. It is true that it is not possible to update a scheduler, whereas it is perfectly possible to detach it manually from the instance, then to destroy it before recreating it with the new stop/start schedule and the attach to the instance again.
Is there a workaround without going through a null resource to run a gcloud script to do these steps?
I tried to add an "ignore_changes" lifecycle on the "resource_policies" argument of my instance, Terraform no longer wants to destroy my instance, but it gives me the following error:
Error when reading or editing ResourcePolicy: googleapi: Error 400: The resource_policy resource 'projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule' is already being used by 'projects/my-project-id/zones/europe-west1-b/instances/my-instance', resourceInUseByAnotherResource"
Here is my Terraform code
resource "google_compute_resource_policy" "instance_schedule" {
name = "my-instance-schedule"
region = var.region
description = "Start and stop instance"
instance_schedule_policy {
vm_start_schedule {
schedule = var.vm_start_schedule
}
vm_stop_schedule {
schedule = var.vm_stop_schedule
}
time_zone = "Europe/Paris"
}
}
resource "google_compute_instance" "my-instance" {
// ******** This is my attempted workaround ********
lifecycle {
ignore_changes = [resource_policies]
}
name = "my-instance"
machine_type = var.machine_type
zone = "${var.region}-b"
allow_stopping_for_update = true
resource_policies = [
google_compute_resource_policy.instance_schedule.id
]
boot_disk {
device_name = local.ref_name
initialize_params {
image = var.boot_disk_image
type = var.disk_type
size = var.disk_size
}
}
network_interface {
network = data.google_compute_network.default.name
access_config {
nat_ip = google_compute_address.static.address
}
}
}
If it can be useful, here is what the terraform apply returns
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
# google_compute_resource_policy.instance_schedule must be replaced
-/+ resource "google_compute_resource_policy" "instance_schedule" {
~ id = "projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule" -> (known after apply)
name = "my-instance-schedule"
~ project = "my-project-id" -> (known after apply)
~ region = "https://www.googleapis.com/compute/v1/projects/my-project-id/regions/europe-west1" -> "europe-west1"
~ self_link = "https://www.googleapis.com/compute/v1/projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule" -> (known after apply)
# (1 unchanged attribute hidden)
~ instance_schedule_policy {
# (1 unchanged attribute hidden)
~ vm_start_schedule {
~ schedule = "0 9 * * *" -> "0 8 * * *" # forces replacement
}
# (1 unchanged block hidden)
}
}
Plan: 1 to add, 0 to change, 1 to destroy.
Do you want to perform these actions in workspace "prd"?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
google_compute_resource_policy.instance_schedule: Destroying... [id=projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule]
Error: Error when reading or editing ResourcePolicy: googleapi: Error 400: The resource_policy resource 'projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule' is already being used by 'projects/my-project-id/zones/europe-west1-b/instances/my-instance', resourceInUseByAnotherResource
NB: I am working with Terraform 0.14.7 and I am using google provider version 3.76.0
An instance inside GCP can be power off without destroy it with the module google_compute_instance using the argument desired_status, keep in mind that if you are creating the instance for the first time this argument needs to be on “RUNNING”. This module can be used as the following.
resource "google_compute_instance" "default" {
name = "test"
machine_type = "f1-micro"
zone = "us-west1-a"
desired_status = "RUNNING"
}
You can also modify your “main.tf” file if you need to stop the VM first and then started creating a dependency in terraform with depends_on.
As you can see in the following comment, the service account will be created but the key will be assigned until the first sentence is done.
resource "google_service_account" "service_account" {
account_id = "terraform-test"
display_name = "Service Account"
}
resource "google_service_account_key" "mykey" {
service_account_id = google_service_account.service_account.id
public_key_type = "TYPE_X509_PEM_FILE"
depends_on = [google_service_account.service_account]
}
If the first component already exists, terraform only deploys the dependent.
I faced same problem with snapshot policy.
I controlled resource policy creation using a flag input variable and using count. For the first time, I created policy resource using flag as 'true'. When I want to change schedule time, I change the flag as 'false' and apply the plan. This will detach the resource.
I then make flag as 'true' again and apply the plan with new time.
This worked for me for snapshot policy. Hope it could solve yours too.
I solved the "resourceInUseByAnotherResource" error by adding the following lifecycle to the google_compute_resource_policy resource:
lifecycle {
create_before_destroy = true
}
Also, this requires to have a unique name with each change, otherwise, the new resource can't be created, because the resource with the same name already exists. So I appended a random ID to the end of the schedule name:
resource "random_pet" "schedule" {
keepers = {
start_schedule = "${var.vm_start_schedule}"
stop_schedule = "${var.vm_stop_schedule}"
}
}
...
resource "google_compute_resource_policy" "schedule" {
name = "schedule-${random_pet.schedule.id}"
...
lifecycle {
create_before_destroy = true
}
}

GCP Cloud SQL failed to delete instance because `deletion_protection` is set to true

I have a tf script for provisioning a Cloud SQL instance, along with a couple of dbs and an admin user. I have renamed the instance, hence a new instance was created but terraform is encountering issues when it comes to deleting the old one.
Error: Error, failed to delete instance because deletion_protection is set to true. Set it to false to proceed with instance deletion
I have tried setting the deletion_protection to false but I keep getting the same error. Is there a way to check which resources need to have the deletion_protection set to false in order to be deleted?
I have only added it to the google_sql_database_instance resource.
My tf script:
// Provision the Cloud SQL Instance
resource "google_sql_database_instance" "instance-master" {
name = "instance-db-${random_id.random_suffix_id.hex}"
region = var.region
database_version = "POSTGRES_12"
project = var.project_id
settings {
availability_type = "REGIONAL"
tier = "db-f1-micro"
activation_policy = "ALWAYS"
disk_type = "PD_SSD"
ip_configuration {
ipv4_enabled = var.is_public ? true : false
private_network = var.network_self_link
require_ssl = true
dynamic "authorized_networks" {
for_each = toset(var.is_public ? [1] : [])
content {
name = "Public Internet"
value = "0.0.0.0/0"
}
}
}
backup_configuration {
enabled = true
}
maintenance_window {
day = 2
hour = 4
update_track = "stable"
}
dynamic "database_flags" {
iterator = flag
for_each = var.database_flags
content {
name = flag.key
value = flag.value
}
}
user_labels = var.default_labels
}
deletion_protection = false
depends_on = [google_service_networking_connection.cloudsql-peering-connection, google_project_service.enable-sqladmin-api]
}
// Provision the databases
resource "google_sql_database" "db" {
name = "orders-placement"
instance = google_sql_database_instance.instance-master.name
project = var.project_id
}
// Provision a super user
resource "google_sql_user" "admin-user" {
name = "admin-user"
instance = google_sql_database_instance.instance-master.name
password = random_password.user-password.result
project = var.project_id
}
// Get latest CA certificate
locals {
furthest_expiration_time = reverse(sort([for k, v in google_sql_database_instance.instance-master.server_ca_cert : v.expiration_time]))[0]
latest_ca_cert = [for v in google_sql_database_instance.instance-master.server_ca_cert : v.cert if v.expiration_time == local.furthest_expiration_time]
}
// Get SSL certificate
resource "google_sql_ssl_cert" "client_cert" {
common_name = "instance-master-client"
instance = google_sql_database_instance.instance-master.name
}
Seems like your code going to recreate this sql-instance. But your current tfstate file contains an instance-code with true value for deletion_protection parameter. In this case, you need first of all change value of this parameter to false manually in tfstate file or by adding deletion_protection = true in the code with running terraform apply command after that (beware: your code shouldn't do a recreation of the instance). And after this manipulations, you can do anything with your SQL instance
You will have to set deletion_protection=false, apply it and then proceed to delete.
As per the documentation
On newer versions of the provider, you must explicitly set deletion_protection=false (and run terraform apply to write the field to state) in order to destroy an instance. It is recommended to not set this field (or set it to true) until you're ready to destroy the instance and its databases.
Link
Editing Terraform state files directly / manually is not recommended
If you added deletion_protection to the google_sql_database_instance after the database instance was created, you need to run terraform apply before running terraform destroy so that deletion_protection is set to false on the database instance.

Creating RDS instances from not the recent snapshot using Terraform

In Terraform project I am creating an RDS instance from a not recent snapshot (fifth before the last), my script here:
data "aws_db_snapshot" "db_snapshot" {
db_instance_identifier = "production-db-intern"
db_snapshot_arn = "arn:aws:rds:eu-central-1:123114111478:snapshot:rds:production-db-intern-2019-05-09-16-10"
}
resource "aws_db_instance" "db_intern" {
skip_final_snapshot = true
identifier = "db-intern"
auto_minor_version_upgrade = false
instance_class = "db.m4.4xlarge"
deletion_protection = false
vpc_security_group_ids = ["${var.security_group_id}"]
db_subnet_group_name = "${var.subnet_group_name}"
timeouts {
create = "3h"
delete = "2h"
}
lifecycle {
prevent_destroy = false
}
snapshot_identifier = "${data.aws_db_snapshot.db_snapshot.id}"
}
I did a "terraform plan" and
I got the next error:
Error: data.aws_db_snapshot.db_snapshot: "db_snapshot_arn": this field cannot be set
db_snapshot_arn is not a valid field of the aws_db_snapshot data resource. Did you mean db_snapshot_identifier.
Also, you can't pass the ARN to this data resource, you can pass the snapshot ID instead, e.g. snap-1234567890abcdef0.
Besides that, the data resource only expects either the db_instance_identifier to be set or the db_snapshot_identifier. See the documentation on the snapshot CLI for more details on the specifics. Terraform leverages the CLI to retrieve these resources.

Creating RDS Instances from Snapshot Using Terraform

Working on a Terraform project in which I am creating an RDS cluster by grabbing and using the most recent production db snapshot:
# Get latest snapshot from production DB
data "aws_db_snapshot" "db_snapshot" {
most_recent = true
db_instance_identifier = "${var.db_instance_to_clone}"
}
#Create RDS instance from snapshot
resource "aws_db_instance" "primary" {
identifier = "${var.app_name}-primary"
snapshot_identifier = "${data.aws_db_snapshot.db_snapshot.id}"
instance_class = "${var.instance_class}"
vpc_security_group_ids = ["${var.security_group_id}"]
skip_final_snapshot = true
final_snapshot_identifier = "snapshot"
parameter_group_name = "${var.parameter_group_name}"
publicly_accessible = true
timeouts {
create = "2h"
}
}
The issue with this approach is that following runs of the terraform code (once another snapshot has been taken) want to re-create the primary RDS instance (and subsequently, the read replicas) with the latest snapshot of the DB. I was thinking something along the lines of a boolean count parameters that specifies first run, but setting count = 0 on the snapshot resource causes issues with the snapshot_id parameters of the db resource. Likewise setting a count = 0 on the db resource would indicate that it would destroy the db.
Use case for this is to be able to make changes to other aspects of the production infrastructure that this terraform plan manages without having to re-create the entire RDS cluster, which is a very time consuming resource to destroy/create.
Try placing an ignore_changes lifecycle block within your aws_db_instance definition:
lifecycle {
ignore_changes = [
snapshot_identifier,
]
}
This will cause Terraform to only look for changes to the database's snapshot_identifier upon initial creation.
If the database already exists, Terraform will ignore any changes to the existing database's snapshot_identifier field -- even if a new snapshot has been created since then.