Terraform: Create new GCE instance using same terraform code - google-cloud-platform

I have created a new gcp vm instance successfully using terraform modules. The contents in my module folder are as follows
#main.tf
# google_compute_instance.default:
resource "google_compute_instance" "default" {
machine_type = "${var.machinetype}"
name = "${var.name}"
project = "demo"
tags = []
zone = "${var.zone}"
boot_disk {
initialize_params {
image = "https://www.googleapis.com/compute/v1/projects/centos-cloud/global/images/centos-7-v20210701"
size = 20
type = "pd-balanced"
}
}
network_interface {
network = "default"
subnetwork = "default"
subnetwork_project = "gcp-infrastructure-319318"
}
service_account {
email = "971558418058-compute#developer.gserviceaccount.com"
scopes = [
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring.write",
"https://www.googleapis.com/auth/service.management.readonly",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/trace.append",
]
}
}
-----------
#variables.tf
variable "zone" {
default="us-east1-b"
}
variable "machinetype" {
default="f1-micro"
}
------------------
#terraform.tfvars
machinetype="g1-small"
zone="europe-west1-b"
My main code block is follows
$ cat providers.tf
provider "google" {
}
$ cat main.tf
module "gce" {
source = "../../../modules/services/gce"
name = "new-vm-tf"
}
With this code I am able create a new vm instance successfully
$ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
new-vm-tf us-east1-b f1-micro 10.142.0.3 RUNNING
Now I have a requirement to create a new VM instance of machine type e2-standard. how can I handle that scenario?
If I edit my existing main.tf as shown below, it will destroy the existing vm instance to create the new vm instance .
$ cat main.tf
module "gce" {
source = "../../../modules/services/gce"
name = "new-vm-tf1"
}
terraform plan confirms as below
~ name = "new-vm-tf" -> "new-vm-tf1" # forces replacement
Plan: 1 to add, 0 to change, 1 to destroy.
I need pointers to resuse the same code to create a new vm existing without any changes to existing one . Please suggest

I recommend you to deep dive into terraform mechanism and best practices. I have 2 keywords to start: tfstate and variables.
The tfstate is the deployment context. If you change the deployment and the context isn't consistent, Terraform delete what is not consistent and create the missing pieces.
The variables are useful to reuse piece of generic code by customizing the values in entry.

Related

GCP - create multiple VM using terraform

we need to create 2 VM in GCP using terraform with:
different startup scripts
we need the first and second VM will not be created at the same time.
we also need the IP of the first VM that was created and pass it in to the metadata script of the second VM.
How can we do it?
This is our working terraform file to create 1 VM:
provider "google" {
project = "project-name"
region = "europe-west3"
zone = "europe-west3-c"
}
resource "google_compute_instance" "vm_instance" {
name = "terra-instance-with-script-newvpc"
machine_type = "n1-standard-2"
zone = "europe-west3-c"
boot_disk {
initialize_params {
image ="centos-7"
}
}
metadata = {
startup-script = <<-EOF
hello world
EOF
}
network_interface {
subnetwork="test-subnet-frankfurt"
}
}
we tried to add another script and its failed
To ensure that the second GCE Instance is created after the first
one, you can use the depends_on argument (see the example code
below)
For the startup scripts, you can use a separate cloud-init script(s) and manage them using the Terraform variables, as
illustrated in the example code below.
For the IP address part, there are 2 possible use cases, depending on your needs:
Assign a Public Static IP: you can use the google_compute_address Terraform resource first to prepare an IP for the first instance, then call
this resource in the network_interface {access_config = { nat_ip = google_compute_address.static.first_instance_ip.address }} argument (see the
example below)
Then, in the init script of the second instance you can again call the Terraform resource google_compute_address.first_instance_ip.address and inject it into your script the way you want.
If you want to use an ephemeral (generated) IP instead, then you don’t need to use the google_compute_address at all, in your init script of the second instance, call the first instance Terraform resource with the nat ip attribute, e.g: google_compute_instance.firs_instance.network_interface.0.access_config.0.nat_ip and inject it in your script the way you want.
In the example code below, it assumes you will use a static IP address:
# Create the IP address
resource "google_compute_address" "static" {
name = "first_instance_ip_address"
}
# Source the init script and inject the IP Address variable
data "template_file" "init" {
template = "${file("./startup.sh")}" # call your startup script from it's path
vars = {
first_instance_ip = "${google_compute_address.static.address}"
# In case you use the generated IP:
# first_instance_ip = "${google_compute_compute_instance.first_instance.network_interface.0.access_config.0.nat_ip}"
}
}
# Create the 1st google compute instance
resource "google_compute_instance" "firs_instance" {
depends_on = [google_compute_address.static]
....
# Associated our public IP address to this instance
access_config = {
nat_ip = google_compute_address.static.address
}
}
...
# Create your second IP same as the first one
...
#Create the second google compute instance
resource "google_compute_instance" "second_instance" {
depends_on = [google_compute_instance.first_instance]
...
# Associated our public IP address to this instance
access_config = {
nat_ip = google_compute_address.second_static_ip.address # optional depends on your usecase
}
}
metadata_startup_script = "${data.template_file.startup_script.rendered}"
}
P.S: this is not a complete and production-ready code, it is a collection of tips to help you solve your issue.

Is it possible to update existing compute engine using terraform?

I'm a beginner in terraform and I'm planning to create a terraform module that updates the tags and service account of an existing compute engine in google cloud.
The problem is this existing VMs were provisioned manually and there is a need to update the tags and service account occasionally.
So far, I created a small module that use data to fetch the details and resource to create my changes but this doesn't fit the requirement as its creating a new resource.
data "google_compute_instance" "data_host" {
name = var.instance_name
project = var.project_name
zone = var.project_zone
}
resource "google_compute_instance" "host" {
name = data.google_compute_instance.data_host.self_link
zone = data.google_compute_instance.data_host.self_link
machine_type = data.google_compute_instance.data_host.self_link
boot_disk {
source = data.google_compute_instance.data_host.self_link
}
network_interface {
network = data.google_compute_instance.data_host.self_link
}
service_account {
email = var.service_account
scopes = ["cloud-platform"]
}
etc...
}
are there anyway to update my existing VMs without recreating the entire system?

Terraform wants to replace Google compute engine if its start/stop scheduler is modified

First of all, I am surprised that I have found very few resources on Google that mention this issue with Terraform.
This is an essential feature for optimizing the cost of cloud instances though, so I'm probably missing out on a few things, thanks for your tips and ideas!
I want to create an instance and manage its start and stop daily, programmatically.
The resource "google_compute_resource_policy" seems to meet my use case. However, when I change the stop or start time, Terraform plans to destroy and recreate the instance... which I absolutely don't want!
The resource "google_compute_resource_policy" is attached to the instance via the argument resource_policies where it is specified: "Modifying this list will cause the instance to recreate."
I don't understand why Terraform handles this simple update so badly. It is true that it is not possible to update a scheduler, whereas it is perfectly possible to detach it manually from the instance, then to destroy it before recreating it with the new stop/start schedule and the attach to the instance again.
Is there a workaround without going through a null resource to run a gcloud script to do these steps?
I tried to add an "ignore_changes" lifecycle on the "resource_policies" argument of my instance, Terraform no longer wants to destroy my instance, but it gives me the following error:
Error when reading or editing ResourcePolicy: googleapi: Error 400: The resource_policy resource 'projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule' is already being used by 'projects/my-project-id/zones/europe-west1-b/instances/my-instance', resourceInUseByAnotherResource"
Here is my Terraform code
resource "google_compute_resource_policy" "instance_schedule" {
name = "my-instance-schedule"
region = var.region
description = "Start and stop instance"
instance_schedule_policy {
vm_start_schedule {
schedule = var.vm_start_schedule
}
vm_stop_schedule {
schedule = var.vm_stop_schedule
}
time_zone = "Europe/Paris"
}
}
resource "google_compute_instance" "my-instance" {
// ******** This is my attempted workaround ********
lifecycle {
ignore_changes = [resource_policies]
}
name = "my-instance"
machine_type = var.machine_type
zone = "${var.region}-b"
allow_stopping_for_update = true
resource_policies = [
google_compute_resource_policy.instance_schedule.id
]
boot_disk {
device_name = local.ref_name
initialize_params {
image = var.boot_disk_image
type = var.disk_type
size = var.disk_size
}
}
network_interface {
network = data.google_compute_network.default.name
access_config {
nat_ip = google_compute_address.static.address
}
}
}
If it can be useful, here is what the terraform apply returns
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
# google_compute_resource_policy.instance_schedule must be replaced
-/+ resource "google_compute_resource_policy" "instance_schedule" {
~ id = "projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule" -> (known after apply)
name = "my-instance-schedule"
~ project = "my-project-id" -> (known after apply)
~ region = "https://www.googleapis.com/compute/v1/projects/my-project-id/regions/europe-west1" -> "europe-west1"
~ self_link = "https://www.googleapis.com/compute/v1/projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule" -> (known after apply)
# (1 unchanged attribute hidden)
~ instance_schedule_policy {
# (1 unchanged attribute hidden)
~ vm_start_schedule {
~ schedule = "0 9 * * *" -> "0 8 * * *" # forces replacement
}
# (1 unchanged block hidden)
}
}
Plan: 1 to add, 0 to change, 1 to destroy.
Do you want to perform these actions in workspace "prd"?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
google_compute_resource_policy.instance_schedule: Destroying... [id=projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule]
Error: Error when reading or editing ResourcePolicy: googleapi: Error 400: The resource_policy resource 'projects/my-project-id/regions/europe-west1/resourcePolicies/my-instance-schedule' is already being used by 'projects/my-project-id/zones/europe-west1-b/instances/my-instance', resourceInUseByAnotherResource
NB: I am working with Terraform 0.14.7 and I am using google provider version 3.76.0
An instance inside GCP can be power off without destroy it with the module google_compute_instance using the argument desired_status, keep in mind that if you are creating the instance for the first time this argument needs to be on “RUNNING”. This module can be used as the following.
resource "google_compute_instance" "default" {
name = "test"
machine_type = "f1-micro"
zone = "us-west1-a"
desired_status = "RUNNING"
}
You can also modify your “main.tf” file if you need to stop the VM first and then started creating a dependency in terraform with depends_on.
As you can see in the following comment, the service account will be created but the key will be assigned until the first sentence is done.
resource "google_service_account" "service_account" {
account_id = "terraform-test"
display_name = "Service Account"
}
resource "google_service_account_key" "mykey" {
service_account_id = google_service_account.service_account.id
public_key_type = "TYPE_X509_PEM_FILE"
depends_on = [google_service_account.service_account]
}
If the first component already exists, terraform only deploys the dependent.
I faced same problem with snapshot policy.
I controlled resource policy creation using a flag input variable and using count. For the first time, I created policy resource using flag as 'true'. When I want to change schedule time, I change the flag as 'false' and apply the plan. This will detach the resource.
I then make flag as 'true' again and apply the plan with new time.
This worked for me for snapshot policy. Hope it could solve yours too.
I solved the "resourceInUseByAnotherResource" error by adding the following lifecycle to the google_compute_resource_policy resource:
lifecycle {
create_before_destroy = true
}
Also, this requires to have a unique name with each change, otherwise, the new resource can't be created, because the resource with the same name already exists. So I appended a random ID to the end of the schedule name:
resource "random_pet" "schedule" {
keepers = {
start_schedule = "${var.vm_start_schedule}"
stop_schedule = "${var.vm_stop_schedule}"
}
}
...
resource "google_compute_resource_policy" "schedule" {
name = "schedule-${random_pet.schedule.id}"
...
lifecycle {
create_before_destroy = true
}
}

How to set auto-delete option for additional attached_disk in gcp instance uing terraform?

I am trying to create a vm instance in gcp with a boot_disk and additional attached_disk using terraform. I could not find any parameter to auto delete the additional attached_disk when instance is deleted.
auto-delete option is availble in gcp console.
Terraform code:
resource "google_compute_disk" "elastic-disk" {
count = var.no_of_elastic_intances
name = "elastic-disk-${count.index+1}-data"
type = "pd-standard"
size = "10"
}
resource "google_compute_instance" "elastic" {
count = var.no_of_elastic_intances
name = "${var.elastic_instance_name_prefix}-${count.index+1}"
machine_type = var.elastic_instance_machine_type
boot_disk {
auto_delete = true
mode = "READ_WRITE"
initialize_params {
image = var.elastic_instance_image_type
type = var.elastic_instance_disc_type
size = var.elasitc_instance_disc_size
}
}
attached_disk {
source = "${element(google_compute_disk.elastic-disk.*.self_link, count.index)}"
mode = "READ_WRITE"
}
network_interface {
network = var.elastic_instance_network
access_config {
}
}
}
The feature to set auto-delete for attached disks is not supported. HashiCorp/Google decided to not support this feature for Terraform.
Reference this issue:
If Terraform were told to remove the instance, but not the disks, and
auto-delete were enabled, then it would not specifically delete the
disks, but they would still be deleted by GCP. This behaviour would
not be shown in a plan run, and so could lead to unwanted outcomes, as
well as the state still showing the disks existing.
My opinion is that Terraform should manage the entire lifecycle from creation to destruction. For disks that you want to attach to a new instance, create those disks as part of your Terraform HCL and destroy them as part of your HCL.

How to fix "An Unknown Error Occurred" when creating multiple Google Cloud SQL instances with private IP simultaneously?

Our cloud backend setup contains 5 Cloud SQL for Postgres instances. We manage our infrastructure using Terraform. We are using connecting them from GKE using a public IP and the Cloud SQL container.
In order to simplify our setup we wish to get rid of the proxy containers by moving to a private IP. I tried following the Terraform guide. While a creating a single instance works fine, trying to create 5 instances simultaneously ends in 4 failed ones and one successful:
The error which appears in the Google Clod Console on the failed instances is "An Unknown Error occurred":
Following is the code which reproduces it. Pay attention to the count = 5 line:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
resource "google_sql_database_instance" "instance" {
provider = "google-beta"
count = 5
name = "private-instance-${count.index}"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
I tried several alternatives:
Waiting a minute after creating the google_service_networking_connection and then creating all the instances simultaneously, but I got the same error.
Creating an address range and a google_service_networking_connection per instance, but I got an error that google_service_networking_connection cannot be created simultaneously.
Creating an address range per instance and a single google_service_networking_connection which links to all of them, but I got the same error.
Found an ugly yet working solution. There is a bug in GCP which does not prevent simultaneous creation of instances although it cannot be completed. There is neither documentation about it nor a meaningful error message. It appears in the Terraform Google provider issue tracker as well.
One alternative is adding a dependence between the instances. This allows their creation to complete successfully. However, each instance takes several minutes to create. This accumulates to many spent minutes. If we add an artificial delay of 60 seconds between instance creation, we manage to avoid the failures. Notes:
The needed amount of seconds to delay depends on the instance tier. For example, for db-f1-micro, 30 seconds were enough. They were not enough for db-custom-1-3840.
I am not sure what is the exact number of needed seconds for db-custom-1-3840. 30 seconds were not enough, 60 were.
Following is a the code sample to resolve the issue. It shows 2 instances only since due to depends_on limitations I could not use the count feature and showing the full code for 5 instances would be very long. It works the same for 5 instances:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
locals {
db_instance_creation_delay_factor_seconds = 60
}
resource "null_resource" "delayer_1" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 0}"
}
}
resource "google_sql_database_instance" "instance_1" {
provider = "google-beta"
name = "private-instance-delayed-1"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_1"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
resource "null_resource" "delayer_2" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 1}"
}
}
resource "google_sql_database_instance" "instance_2" {
provider = "google-beta"
name = "private-instance-delayed-2"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_2"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
provider "null" {
version = "~> 1.0"
}
In case someone lands here with a slightly different case (creating google_sql_database_instance in a private network results in an "Unknown error"):
Launch one Cloud SQL instance manually (this will enable servicenetworking.googleapis.com and some other APIs for the project it seems)
Run your manifest
Terminate the instance created in step 1.
Works for me after that
¯_(ツ)_/¯
I land here with a slightly different case, same as #Grigorash Vasilij
(creating google_sql_database_instance in a private network results in an "Unknown error").
I was using the UI to deploy an SQL instance on a private VPC, for some reason that trows me an "Unknown error" as well. I finally solved using the gcloud command instead (why that works and no the UI? IDK, maybe the UI is not doing the same as the command)
gcloud --project=[PROJECT_ID] beta sql instances create [INSTANCE_ID]
--network=[VPC_NETWORK_NAME]
--no-assign-ip
follow this for more details