Is it possible to update existing compute engine using terraform? - google-cloud-platform

I'm a beginner in Terraform and I'm planning to create a Terraform module that updates the tags and service account of an existing Compute Engine instance in Google Cloud.
The problem is that these existing VMs were provisioned manually, and the tags and service account need to be updated occasionally.
So far, I have created a small module that uses a data source to fetch the details and a resource to apply my changes, but this doesn't fit the requirement because it creates a new resource.
data "google_compute_instance" "data_host" {
name = var.instance_name
project = var.project_name
zone = var.project_zone
}
resource "google_compute_instance" "host" {
name = data.google_compute_instance.data_host.self_link
zone = data.google_compute_instance.data_host.self_link
machine_type = data.google_compute_instance.data_host.self_link
boot_disk {
source = data.google_compute_instance.data_host.self_link
}
network_interface {
network = data.google_compute_instance.data_host.self_link
}
service_account {
email = var.service_account
scopes = ["cloud-platform"]
}
etc...
}
Is there any way to update my existing VMs without recreating the entire system?
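The thread records no answer to this question. One possible approach, offered only as a sketch: adopt the manually created VM into Terraform state with an import block (available in Terraform 1.5 and later) instead of declaring a fresh resource, then change tags and service_account in place. The values my-project, us-central1-a, existing-vm, e2-medium, the image, and the network are placeholders, not values from the question. The resource arguments would need to mirror the real instance, and the ignore_changes entry is an assumption meant to keep the boot disk out of the diff.

```hcl
variable "service_account" {
  description = "Email of the service account to attach (as in the question)"
  type        = string
}

# Adopt the existing VM into state rather than creating a new one.
# Placeholder id in the form projects/<project>/zones/<zone>/instances/<name>.
import {
  to = google_compute_instance.host
  id = "projects/my-project/zones/us-central1-a/instances/existing-vm"
}

resource "google_compute_instance" "host" {
  # Placeholder values; they must match the real instance.
  name         = "existing-vm"
  project      = "my-project"
  zone         = "us-central1-a"
  machine_type = "e2-medium"

  # Network tags can be changed on a running instance.
  tags = ["my-tag"]

  # Changing the service account requires the VM to be stopped;
  # this allows the provider to stop and restart it during apply.
  allow_stopping_for_update = true

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12" # placeholder image
    }
  }

  network_interface {
    network = "default" # placeholder network
  }

  service_account {
    email  = var.service_account
    scopes = ["cloud-platform"]
  }

  lifecycle {
    # Assumption: ignore boot disk differences so they do not force a replacement.
    ignore_changes = [boot_disk]
  }
}
```

Running terraform plan before applying should show the import plus in-place updates only; if it reports "forces replacement", the configuration does not yet match the existing VM.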

Related

Google Cloud VM Autoscaling in Terraform - Updating Images

I can create a fully-functioning autoscaling group in GCP Compute via Terraform, but it's not clear to me how to update the ASG to use a new image.
Here's an example of a working ASG:
resource "google_compute_region_autoscaler" "default" {
name = "example-autoscaler"
region = "us-west1"
project = "my-project"
target = google_compute_region_instance_group_manager.default.id
autoscaling_policy {
max_replicas = 10
min_replicas = 3
cooldown_period = 60
cpu_utilization {
target = 0.5
}
}
}
resource "google_compute_region_instance_group_manager" "default" {
name = "example-igm"
region = "us-west1"
version {
instance_template = google_compute_instance_template.default.id
name = "primary"
}
target_pools = [google_compute_target_pool.default.id]
base_instance_name = "example"
}
resource "google_compute_target_pool" "default" {
name = "example-pool"
}
resource "google_compute_instance_template" "default" {
name = "example-template"
machine_type = "e2-medium"
can_ip_forward = false
tags = ["my-tag"]
disk {
source_image = data.google_compute_image.default.id
}
network_interface {
subnetwork = "my-subnet"
}
}
data "google_compute_image" "default" {
name = "my-image"
}
My goal is to be able to create a new Image (out of band) and then update my infrastructure to utilize it. It doesn't appear possible to change a google_compute_instance_template while it's in use.
One option I can think of is to create two separate templates, and then adjust the google_compute_region_instance_group_manager to refer to a different google_compute_instance_template which references the new image.
Another possible option is to use the version block inside the instance group manager. You can use this similarly to above to essentially toggle between two versions. You start with one "version" at 100%, and the other at 0%. When you create a new image, you update the version that is at 0% to point to the new image and change its skew to 100% and the other to 0%. I'm not actually sure this works, because you'd still have to update the template of the version that's at 0% and it might actually be in use.
Regardless, both of those methods are incredibly bulky for a large-scale production environment where we have multiple autoscaling groups in multiple regions and update them frequently.
Ideally I'd be able to change a variable that represents the image, terraform apply and that's it. Is there any way to do what I'm describing?
You need to use the lifecycle block with create_before_destroy, and the name attribute of the google_compute_instance_template resource must either be omitted or replaced by the name_prefix attribute.
With this setup Terraform generates a unique name for your Instance Template and can then update the Instance Group manager without conflict before destroying the previous Instance Template.
References:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance_template#using-with-instance-group-manager
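Applied to the template from the question, the answer above might look like the following minimal sketch. The image_name variable is a hypothetical addition so that the image can be switched by changing one value; everything else is taken from the question's configuration.

```hcl
# Hypothetical variable: change this value, then run terraform apply.
variable "image_name" {
  description = "Name of the image the template should boot from"
  type        = string
  default     = "my-image"
}

data "google_compute_image" "default" {
  name = var.image_name
}

resource "google_compute_instance_template" "default" {
  # name_prefix instead of name: Terraform appends a unique suffix,
  # so the replacement template does not collide with the old one.
  name_prefix    = "example-template-"
  machine_type   = "e2-medium"
  can_ip_forward = false
  tags           = ["my-tag"]

  disk {
    source_image = data.google_compute_image.default.id
  }

  network_interface {
    subnetwork = "my-subnet"
  }

  lifecycle {
    # Create the new template first, repoint the instance group manager,
    # and only then destroy the old template.
    create_before_destroy = true
  }
}
```

The instance group manager keeps referencing google_compute_instance_template.default.id, so it picks up the new template on the same apply. Whether already running instances are then replaced depends on the manager's update_policy, which the question's configuration does not set.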

Terraform GCP instance from machine-image in different project

With the gcloud CLI, I can create a VM in "project-a" from a machine image in "project-b":
(from "project-a" cloud shell)
gcloud beta compute instances create new-vm-name --project=project-a --zone=us-east4-a --source-machine-image=projects/project-b/global/machineImages/image-name --service-account=12345678-compute@developer.gserviceaccount.com --network=net-name --subnet=subnet-name
However, in Terraform, this is failing. This is my TF code:
resource "google_compute_instance_from_machine_image" "tpl" {
provider = google-beta
name = "${var.project_short_name}-admin"
project = var.project_id
zone = "us-east4-a"
network_interface {
network = module.vpc.network_self_link
subnetwork = module.vpc.subnets_self_links[0]
}
source_machine_image = "image-from-project-b"
// Override fields from machine image
can_ip_forward = false
labels = {
image = "05-apr-2022"
}
}
This is my error message:
Invalid value for field 'resource.disks[0].source
Disk must be in the same project as the instance or instance template., invalid
Can anyone help me determine if TF can even do this (docs seem to say yes) or what I am doing wrong?

Terraform: Create new GCE instance using same terraform code

I have created a new gcp vm instance successfully using terraform modules. The contents in my module folder are as follows
#main.tf
# google_compute_instance.default:
resource "google_compute_instance" "default" {
machine_type = "${var.machinetype}"
name = "${var.name}"
project = "demo"
tags = []
zone = "${var.zone}"
boot_disk {
initialize_params {
image = "https://www.googleapis.com/compute/v1/projects/centos-cloud/global/images/centos-7-v20210701"
size = 20
type = "pd-balanced"
}
}
network_interface {
network = "default"
subnetwork = "default"
subnetwork_project = "gcp-infrastructure-319318"
}
service_account {
email = "971558418058-compute@developer.gserviceaccount.com"
scopes = [
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring.write",
"https://www.googleapis.com/auth/service.management.readonly",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/trace.append",
]
}
}
-----------
#variables.tf
variable "zone" {
default="us-east1-b"
}
variable "machinetype" {
default="f1-micro"
}
------------------
#terraform.tfvars
machinetype="g1-small"
zone="europe-west1-b"
My main code block is as follows:
$ cat providers.tf
provider "google" {
}
$ cat main.tf
module "gce" {
source = "../../../modules/services/gce"
name = "new-vm-tf"
}
With this code I am able to create a new VM instance successfully:
$ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
new-vm-tf us-east1-b f1-micro 10.142.0.3 RUNNING
Now I have a requirement to create a new VM instance of machine type e2-standard. How can I handle that scenario?
If I edit my existing main.tf as shown below, it will destroy the existing VM instance to create the new one.
$ cat main.tf
module "gce" {
source = "../../../modules/services/gce"
name = "new-vm-tf1"
}
terraform plan confirms as below
~ name = "new-vm-tf" -> "new-vm-tf1" # forces replacement
Plan: 1 to add, 0 to change, 1 to destroy.
I need pointers on how to reuse the same code to create a new VM without any changes to the existing one. Please suggest.
I recommend that you take a deep dive into Terraform's mechanisms and best practices. Two keywords to start with: tfstate and variables.
The tfstate is the deployment context. If you change the configuration so that it no longer matches that context, Terraform deletes what is no longer declared and creates the missing pieces.
Variables let you reuse a piece of generic code by customizing its input values.
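As a minimal sketch of what that answer implies: keep the original module block unchanged so the existing VM stays in the state, and add a second module block that reuses the same source with different input values. The module label gce_e2 and the machine type e2-standard-2 are illustrative choices, and the sketch assumes the module also declares a "name" variable, which the variables.tf shown in the question omits.

```hcl
# Existing instance: left untouched, so Terraform plans no change for it.
module "gce" {
  source = "../../../modules/services/gce"
  name   = "new-vm-tf"
}

# Additional instance: same module code, different inputs.
# "gce_e2" and "e2-standard-2" are hypothetical values for illustration.
module "gce_e2" {
  source      = "../../../modules/services/gce"
  name        = "new-vm-tf1"
  machinetype = "e2-standard-2"
}
```

Because each module block has its own address in the state, the plan for this configuration should list one resource to add and none to destroy.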

How to set auto-delete option for additional attached_disk in gcp instance using terraform?

I am trying to create a VM instance in GCP with a boot_disk and an additional attached_disk using Terraform. I could not find any parameter to auto-delete the additional attached_disk when the instance is deleted.
The auto-delete option is available in the GCP console.
Terraform code:
resource "google_compute_disk" "elastic-disk" {
count = var.no_of_elastic_intances
name = "elastic-disk-${count.index+1}-data"
type = "pd-standard"
size = "10"
}
resource "google_compute_instance" "elastic" {
count = var.no_of_elastic_intances
name = "${var.elastic_instance_name_prefix}-${count.index+1}"
machine_type = var.elastic_instance_machine_type
boot_disk {
auto_delete = true
mode = "READ_WRITE"
initialize_params {
image = var.elastic_instance_image_type
type = var.elastic_instance_disc_type
size = var.elasitc_instance_disc_size
}
}
attached_disk {
source = "${element(google_compute_disk.elastic-disk.*.self_link, count.index)}"
mode = "READ_WRITE"
}
network_interface {
network = var.elastic_instance_network
access_config {
}
}
}
The feature to set auto-delete for attached disks is not supported. HashiCorp/Google decided to not support this feature for Terraform.
Reference this issue:
If Terraform were told to remove the instance, but not the disks, and
auto-delete were enabled, then it would not specifically delete the
disks, but they would still be deleted by GCP. This behaviour would
not be shown in a plan run, and so could lead to unwanted outcomes, as
well as the state still showing the disks existing.
My opinion is that Terraform should manage the entire lifecycle from creation to destruction. For disks that you want to attach to a new instance, create those disks as part of your Terraform HCL and destroy them as part of your HCL.

How to fix "An Unknown Error Occurred" when creating multiple Google Cloud SQL instances with private IP simultaneously?

Our cloud backend setup contains 5 Cloud SQL for Postgres instances. We manage our infrastructure using Terraform. We connect to them from GKE using a public IP and the Cloud SQL proxy container.
To simplify our setup, we wish to get rid of the proxy containers by moving to a private IP. I tried following the Terraform guide. While creating a single instance works fine, trying to create 5 instances simultaneously ends with 4 failed instances and one successful one.
The error which appears in the Google Cloud Console on the failed instances is "An Unknown Error occurred".
The following code reproduces it. Pay attention to the count = 5 line:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
resource "google_sql_database_instance" "instance" {
provider = "google-beta"
count = 5
name = "private-instance-${count.index}"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
I tried several alternatives:
Waiting a minute after creating the google_service_networking_connection and then creating all the instances simultaneously, but I got the same error.
Creating an address range and a google_service_networking_connection per instance, but I got an error that google_service_networking_connection cannot be created simultaneously.
Creating an address range per instance and a single google_service_networking_connection which links to all of them, but I got the same error.
I found an ugly yet working solution. There is a bug in GCP: it does not prevent simultaneous creation of instances even though the creation cannot complete. There is neither documentation about it nor a meaningful error message. It appears in the Terraform Google provider issue tracker as well.
One alternative is adding a dependency between the instances. This allows their creation to complete successfully. However, each instance takes several minutes to create, so the total time adds up. If we instead add an artificial delay of 60 seconds between instance creations, we manage to avoid the failures. Notes:
The number of seconds of delay needed depends on the instance tier. For example, for db-f1-micro, 30 seconds were enough. They were not enough for db-custom-1-3840.
I am not sure what the exact number of seconds needed for db-custom-1-3840 is. 30 seconds were not enough, 60 were.
Following is a code sample that resolves the issue. It shows only 2 instances because, due to depends_on limitations, I could not use the count feature, and showing the full code for 5 instances would be very long. It works the same for 5 instances:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
locals {
db_instance_creation_delay_factor_seconds = 60
}
resource "null_resource" "delayer_1" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 0}"
}
}
resource "google_sql_database_instance" "instance_1" {
provider = "google-beta"
name = "private-instance-delayed-1"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_1"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
resource "null_resource" "delayer_2" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 1}"
}
}
resource "google_sql_database_instance" "instance_2" {
provider = "google-beta"
name = "private-instance-delayed-2"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_2"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
provider "null" {
version = "~> 1.0"
}
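For comparison, the other alternative mentioned in this answer (a dependency between the instances rather than a timed delay) could be sketched as follows. This is an illustration, not code from the answer: it uses Terraform 0.12+ reference syntax instead of the quoted 0.11 style above, the instance names are made up, and it assumes the network, address, and service networking resources from the sample above are defined alongside it.

```hcl
resource "google_sql_database_instance" "instance_1" {
  provider         = google-beta
  name             = "private-instance-chained-1" # hypothetical name
  database_version = "POSTGRES_9_6"

  depends_on = [google_service_networking_connection.private_vpc_connection]

  settings {
    tier              = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.private_network.self_link
    }
  }
}

resource "google_sql_database_instance" "instance_2" {
  provider         = google-beta
  name             = "private-instance-chained-2" # hypothetical name
  database_version = "POSTGRES_9_6"

  # Creation starts only after instance_1 has finished, so the two
  # instances are never created at the same time.
  depends_on = [google_sql_database_instance.instance_1]

  settings {
    tier              = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.private_network.self_link
    }
  }
}
```

As the answer notes, the cost of this variant is time: each instance waits for the previous one to finish completely, whereas the delay-based version only staggers the start of each creation.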
In case someone lands here with a slightly different case (creating google_sql_database_instance in a private network results in an "Unknown error"):
1. Launch one Cloud SQL instance manually (this will enable servicenetworking.googleapis.com and some other APIs for the project, it seems)
2. Run your manifest
3. Terminate the instance created in step 1.
Works for me after that
¯_(ツ)_/¯
I landed here with a slightly different case, the same as @Grigorash Vasilij
(creating google_sql_database_instance in a private network results in an "Unknown error").
I was using the UI to deploy a SQL instance on a private VPC, and for some reason that throws an "Unknown error" as well. I finally solved it by using the gcloud command instead (why does that work and not the UI? I don't know; maybe the UI is not doing the same thing as the command):
gcloud --project=[PROJECT_ID] beta sql instances create [INSTANCE_ID] \
  --network=[VPC_NETWORK_NAME] \
  --no-assign-ip
follow this for more details