GCP - create multiple VMs using Terraform - google-cloud-platform

We need to create 2 VMs in GCP using Terraform, with:
different startup scripts for each VM;
the second VM must not be created at the same time as the first;
the IP of the first VM must be passed into the metadata startup script of the second VM.
How can we do this?
This is our working Terraform file that creates 1 VM:
provider "google" {
  project = "project-name"
  region  = "europe-west3"
  zone    = "europe-west3-c"
}

resource "google_compute_instance" "vm_instance" {
  name         = "terra-instance-with-script-newvpc"
  machine_type = "n1-standard-2"
  zone         = "europe-west3-c"

  boot_disk {
    initialize_params {
      image = "centos-7"
    }
  }

  metadata = {
    startup-script = <<-EOF
      hello world
    EOF
  }

  network_interface {
    subnetwork = "test-subnet-frankfurt"
  }
}
We tried to add another script, but it failed.

To ensure that the second GCE instance is created after the first one, you can use the depends_on argument (see the example code below).
For the startup scripts, you can use separate cloud-init/startup scripts and manage them through Terraform variables, as illustrated in the example code below.
For the IP address part, there are 2 possible use cases, depending on your needs:
Assign a public static IP: use the google_compute_address Terraform resource first to reserve an IP for the first instance, then reference it inside network_interface { access_config { nat_ip = google_compute_address.static.address } } (see the example below). Then, in the init script of the second instance, reference google_compute_address.static.address again and inject it into your script the way you want.
If you want an ephemeral (auto-assigned) IP instead, you don't need google_compute_address at all: in the init script of the second instance, reference the first instance's attribute google_compute_instance.first_instance.network_interface.0.access_config.0.nat_ip and inject it into your script the way you want.
The example code below assumes you will use a static IP address:
# Reserve the static IP address
resource "google_compute_address" "static" {
  name = "first-instance-ip-address"
}

# Source the init script and inject the IP address variable
data "template_file" "init" {
  template = file("./startup.sh") # your startup script, loaded from its path
  vars = {
    first_instance_ip = google_compute_address.static.address
    # In case you use the generated (ephemeral) IP instead:
    # first_instance_ip = google_compute_instance.first_instance.network_interface.0.access_config.0.nat_ip
  }
}

# Create the 1st google compute instance
resource "google_compute_instance" "first_instance" {
  depends_on = [google_compute_address.static]
  # ...

  network_interface {
    # ...
    # Associate our public IP address with this instance
    access_config {
      nat_ip = google_compute_address.static.address
    }
  }
}

# Create your second IP the same way as the first one, if needed
# ...

# Create the second google compute instance
resource "google_compute_instance" "second_instance" {
  depends_on = [google_compute_instance.first_instance]
  # ...

  network_interface {
    # ...
    access_config {
      nat_ip = google_compute_address.second_static_ip.address # optional, depends on your use case
    }
  }

  metadata_startup_script = data.template_file.init.rendered
}
P.S.: this is not complete, production-ready code; it is a collection of tips to help you solve your issue.
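For completeness, a minimal startup.sh to pair with the template_file data source is sketched below. Note that the ${first_instance_ip} placeholder is filled in by Terraform's template rendering before the script ever reaches the VM; the log path is an arbitrary choice for illustration.

```shell
#!/bin/bash
# ${first_instance_ip} is substituted by Terraform's template rendering,
# so the VM receives a literal IP address here, not a shell variable.
echo "first instance IP: ${first_instance_ip}" >> /var/log/startup.log
```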

Related

How can I configure Terraform to update a GCP compute engine instance template without destroying and re-creating?

I have a service deployed on GCP compute engine. It consists of a compute engine instance template, instance group, instance group manager, and load balancer + associated forwarding rules etc.
We're forced into using compute engine rather than Cloud Run or some other serverless offering due to the need for docker-in-docker for the service in question.
The deployment is managed by terraform. I have a config that looks something like this:
data "google_compute_image" "debian_image" {
  family  = "debian-11"
  project = "debian-cloud"
}

resource "google_compute_instance_template" "my_service_template" {
  name         = "my-service" # GCE resource names must use hyphens, not underscores
  machine_type = "n1-standard-1"

  disk {
    source_image = data.google_compute_image.debian_image.self_link
    auto_delete  = true
    boot         = true
  }
  ...
  metadata_startup_script = data.local_file.startup_script.content
  metadata = {
    MY_ENV_VAR = var.whatever
  }
}

resource "google_compute_region_instance_group_manager" "my_service_mig" {
  version {
    instance_template = google_compute_instance_template.my_service_template.id
    name              = "primary"
  }
  ...
}

resource "google_compute_region_backend_service" "my_service_backend" {
  ...
  backend {
    group = google_compute_region_instance_group_manager.my_service_mig.instance_group
  }
}

resource "google_compute_forwarding_rule" "my_service_frontend" {
  depends_on = [
    google_compute_region_instance_group_manager.my_service_mig,
  ]
  name            = "my-service-ilb"
  backend_service = google_compute_region_backend_service.my_service_backend.id
  ...
}
I'm running into issues where Terraform is unable to perform any kind of update to this service without running into conflicts. It seems that instance templates are immutable in GCP, and doing anything like updating the startup script, adding an env var, or similar forces it to be deleted and re-created.
Terraform prints info like this in that situation:
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.connectors_compute_engine.google_compute_instance_template.airbyte_translation_instance1 must be replaced
-/+ resource "google_compute_instance_template" "my_service_template" {
      ~ id       = "projects/project/..." -> (known after apply)
      ~ metadata = { # forces replacement
          + "TEST" = "test"
            # (1 unchanged element hidden)
        }
The only solution I've found for getting out of this situation is to delete the entire service, along with all associated entities from the load balancer down to the instance template, and re-create everything.
Is there some way to avoid this, so that I can change the instance template without manually updating all the Terraform config in two passes? At this point I'm even fine with some downtime for the service rather than a full rolling update, since downtime is what's happening now anyway.
I was bitten by this issue as well.
However, according to:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance_template#using-with-instance-group-manager
Instance Templates cannot be updated after creation with the Google
Cloud Platform API. In order to update an Instance Template, Terraform
will destroy the existing resource and create a replacement. In order
to effectively use an Instance Template resource with an Instance
Group Manager resource, it's recommended to specify
create_before_destroy in a lifecycle block. Either omit the Instance
Template name attribute, or specify a partial name with name_prefix.
I would also test and plan with this lifecycle meta-argument as well:
+ lifecycle {
+   prevent_destroy = true
+ }
Or, more realistically in your specific case, something like:
resource "google_compute_instance_template" "my_service_template" {
  # Omit `name`, or use name_prefix so Terraform can generate unique names
  name_prefix = "my-service-"
  ...
+ lifecycle {
+   create_before_destroy = true
+ }
}
So run terraform plan with either create_before_destroy or prevent_destroy = true on google_compute_instance_template before terraform apply, and compare the results.
Ultimately, you can also remove google_compute_instance_template.my_service_template from the state file and import it back.
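As CLI steps, that state surgery would look roughly like the following. This is a sketch: the resource address matches the snippets above, but the projects/... import ID is a hypothetical placeholder you would replace with your real project and template name.

```shell
# Drop the template from state so Terraform no longer tracks it...
terraform state rm google_compute_instance_template.my_service_template

# ...then re-import the live template under the same address.
terraform import google_compute_instance_template.my_service_template \
    projects/my-project/global/instanceTemplates/my-service
```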
Some suggested workarounds in this thread:
terraform lifecycle prevent destroy

Terraform gcp get internal private ip of instance and write to file

Working from the following GitHub example, I am trying to add a static internal IP value in Terraform (so the instance will run with a specific internal IP), or at least get the internal IP of the instance so I can write it to a file inside the instance.
Any suggestions or thoughts on this one?
Thank you in advance.
You might like to check how Terraform works with an IP address resource.
A particular example (from the top of my head, assuming that the subnetwork (my_subnet) is created in the same Terraform batch):
resource "google_compute_address" "my_internal_ip_addr" {
  project      = "my_target_project_id"
  address_type = "INTERNAL"
  region       = "my_region"
  subnetwork   = data.google_compute_subnetwork.my_subnet.name
  name         = "my-ip-address-name" # GCE names must use hyphens, not underscores
  description  = "An internal IP address for my VM"
}
And for a compute engine instance (only the relevant elements):
resource "google_compute_instance" "my_vm" {
  project = "my_target_project_id"
  name    = "my-vm"
  # other items omitted

  network_interface {
    subnetwork         = data.google_compute_subnetwork.my_subnet.name
    subnetwork_project = "my_target_project_id"
    network_ip         = google_compute_address.my_internal_ip_addr.address
    access_config {
      # Include this section to give the VM an external IP address
    }
  }
  # other items omitted
}
However, I don't know whether the above helps with your ultimate requirement ("so I can write it to a file inside the instance"), or why you need that.
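If the ultimate goal is getting that IP into a file on the instance, one sketch (the file path and script body are my own choices, not from the question) is to interpolate the reserved address into the VM's startup script:

```hcl
resource "google_compute_instance" "my_vm" {
  # ... project, name, network_interface as above ...

  # Write the reserved internal IP to a file at first boot.
  metadata_startup_script = <<-EOT
    #!/bin/bash
    echo "${google_compute_address.my_internal_ip_addr.address}" > /etc/my-vm-ip
  EOT
}
```

The interpolation happens at plan/apply time, so the script arrives on the VM with the literal address already in place.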

Google Cloud VM Autoscaling in Terraform - Updating Images

I can create a fully-functioning autoscaling group in GCP Compute via Terraform, but it's not clear to me how to update the ASG to use a new image.
Here's an example of a working ASG:
resource "google_compute_region_autoscaler" "default" {
  name    = "example-autoscaler"
  region  = "us-west1"
  project = "my-project"
  target  = google_compute_region_instance_group_manager.default.id

  autoscaling_policy {
    max_replicas    = 10
    min_replicas    = 3
    cooldown_period = 60
    cpu_utilization {
      target = 0.5
    }
  }
}

resource "google_compute_region_instance_group_manager" "default" {
  name   = "example-igm"
  region = "us-west1"

  version {
    instance_template = google_compute_instance_template.default.id
    name              = "primary"
  }

  target_pools       = [google_compute_target_pool.default.id]
  base_instance_name = "example"
}

resource "google_compute_target_pool" "default" {
  name = "example-pool"
}

resource "google_compute_instance_template" "default" {
  name           = "example-template"
  machine_type   = "e2-medium"
  can_ip_forward = false
  tags           = ["my-tag"]

  disk {
    source_image = data.google_compute_image.default.id
  }

  network_interface {
    subnetwork = "my-subnet"
  }
}

data "google_compute_image" "default" {
  name = "my-image"
}
My goal is to be able to create a new Image (out of band) and then update my infrastructure to utilize it. It doesn't appear possible to change a google_compute_instance_template while it's in use.
One option I can think of is to create two separate templates, and then adjust the google_compute_region_instance_group_manager to refer to a different google_compute_instance_template which references the new image.
Another possible option is to use the version block inside the instance group manager. You can use this similarly to above to essentially toggle between two versions. You start with one "version" at 100%, and the other at 0%. When you create a new image, you update the version that is at 0% to point to the new image and change its skew to 100% and the other to 0%. I'm not actually sure this works, because you'd still have to update the template of the version that's at 0% and it might actually be in use.
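For reference, that toggle can be expressed with two version blocks and a target_size inside the instance group manager. This is a sketch: the blue/green template resources are assumptions, and as noted above it may not avoid the in-use problem when updating the 0% version's template.

```hcl
resource "google_compute_region_instance_group_manager" "default" {
  # ... name, region, base_instance_name as in the example above ...

  # "blue" serves every instance not claimed by another version's target_size.
  version {
    name              = "blue"
    instance_template = google_compute_instance_template.blue.id
  }

  # "green" starts at 0%; swap the percentages to cut over to the new image.
  version {
    name              = "green"
    instance_template = google_compute_instance_template.green.id
    target_size {
      percent = 0
    }
  }
}
```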
Regardless, both of those methods are incredibly bulky for a large-scale production environment where we have multiple autoscaling groups in multiple regions and update them frequently.
Ideally I'd be able to change a variable that represents the image, terraform apply and that's it. Is there any way to do what I'm describing?
You need to use the lifecycle block with create_before_destroy, and the name attribute of the google_compute_instance_template resource must be omitted or replaced by the name_prefix attribute.
With this setup, Terraform generates a unique name for your instance template and can update the instance group manager without conflict before destroying the previous instance template.
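Applied to the example template above, that would look roughly like this (a sketch, not a drop-in replacement):

```hcl
resource "google_compute_instance_template" "default" {
  name_prefix    = "example-template-" # Terraform appends a unique suffix
  machine_type   = "e2-medium"
  can_ip_forward = false
  tags           = ["my-tag"]

  disk {
    source_image = data.google_compute_image.default.id
  }

  network_interface {
    subnetwork = "my-subnet"
  }

  # Create the replacement template before destroying the old one,
  # so the instance group manager can be repointed without conflict.
  lifecycle {
    create_before_destroy = true
  }
}
```

With this in place, changing the image only produces a new uniquely named template; the group manager is updated in place to point at it.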
References:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance_template#using-with-instance-group-manager

Terraform: Create new GCE instance using same terraform code

I have created a new GCP VM instance successfully using Terraform modules. The contents of my module folder are as follows:
#main.tf
# google_compute_instance.default:
resource "google_compute_instance" "default" {
  machine_type = var.machinetype
  name         = var.name
  project      = "demo"
  tags         = []
  zone         = var.zone

  boot_disk {
    initialize_params {
      image = "https://www.googleapis.com/compute/v1/projects/centos-cloud/global/images/centos-7-v20210701"
      size  = 20
      type  = "pd-balanced"
    }
  }

  network_interface {
    network            = "default"
    subnetwork         = "default"
    subnetwork_project = "gcp-infrastructure-319318"
  }

  service_account {
    email = "971558418058-compute@developer.gserviceaccount.com"
    scopes = [
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring.write",
      "https://www.googleapis.com/auth/service.management.readonly",
      "https://www.googleapis.com/auth/servicecontrol",
      "https://www.googleapis.com/auth/trace.append",
    ]
  }
}
-----------
#variables.tf
variable "zone" {
  default = "us-east1-b"
}
variable "machinetype" {
  default = "f1-micro"
}
------------------
#terraform.tfvars
machinetype = "g1-small"
zone        = "europe-west1-b"
My main code block is as follows:
$ cat providers.tf
provider "google" {
}
$ cat main.tf
module "gce" {
  source = "../../../modules/services/gce"
  name   = "new-vm-tf"
}
With this code I am able to create a new VM instance successfully:
$ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
new-vm-tf us-east1-b f1-micro 10.142.0.3 RUNNING
Now I have a requirement to create a new VM instance of machine type e2-standard. How can I handle that scenario?
If I edit my existing main.tf as shown below, it will destroy the existing VM instance in order to create the new one.
$ cat main.tf
module "gce" {
  source = "../../../modules/services/gce"
  name   = "new-vm-tf1"
}
terraform plan confirms this:
  ~ name = "new-vm-tf" -> "new-vm-tf1" # forces replacement
Plan: 1 to add, 0 to change, 1 to destroy.
I need pointers on reusing the same code to create a new VM without any changes to the existing one. Please advise.
I recommend you dig deeper into Terraform's mechanisms and best practices. Two keywords to start with: tfstate and variables.
The tfstate is the deployment context. If you change the deployment so that the recorded context is no longer consistent, Terraform deletes whatever is inconsistent and creates the missing pieces.
Variables are useful for reusing a piece of generic code by customizing its input values.
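Concretely, one way to keep the first VM and add a second one is to instantiate the module twice rather than renaming the existing block. A sketch (the second module label, instance name, and machine type are my own choices):

```hcl
# Existing instance: leave this block untouched so its state stays consistent.
module "gce" {
  source = "../../../modules/services/gce"
  name   = "new-vm-tf"
}

# New instance: a second module block with its own name and machine type.
module "gce_standard" {
  source      = "../../../modules/services/gce"
  name        = "new-vm-tf-e2"
  machinetype = "e2-standard-2"
}
```

Because each module block has its own state address, terraform plan shows 1 to add and 0 to destroy.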

Google Cloud Run outbound static IP is 169.254.X.X instead of reserved one

I created a Google Cloud Run revision with a Serverless VPC access connector to a VPC Network. The VPC Network has access to the internet via Cloud NAT, to allow the Cloud Run instance to have a static outbound ip address, as described in this tutorial: https://cloud.google.com/run/docs/configuring/static-outbound-ip. I followed the tutorial, and all was well, I got a static ip adress on egress traffic from the Cloud Run instance.
I used Terraform to deploy all the resources; you can find the code below. The problem is this: after destroying the resources, I got the following error:
ERROR: (gcloud.compute.networks.delete) Could not fetch resource:
- The network resource 'projects/myproject/global/networks/webhook-network' is already being used by 'projects/myproject/global/networkInstances/v1823516883-618da3a7-bd4f-4524-...-...'
(the dots contain more numbers, but as this seems to be some kind of uuid I prefer not to share the rest).
So I can't delete the first network. When I change the network's name and re-apply, the apply succeeds, but the outbound static IP address of the egress traffic is 169.254.X.X, about which I could only find the following: "When you see a 169.254.X.X, you definitely have a problem", which smells like trouble.
Any Googlers that can help me out? I think the steps to reproduce the 'corrupted' VPC network are: create a Serverless VPC Access connector with a connection to the VPC, reference it from a Cloud Run revision, and then delete the VPC network and the connector before you delete the Cloud Run revision. Honestly I'm not sure, though; I don't have spare GCP projects lying around to test it on.
This Server Fault question did not help either: https://serverfault.com/questions/1016015/google-cloud-platform-find-resource-by-full-resource-name, and it's the only related one I can find.
Anyone have any ideas?
locals {
  region = "europe-west1"
}

resource "google_compute_network" "webhook_network" {
  name                    = "webhook-network-6"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "subnetwork" {
  depends_on = [
    google_compute_network.webhook_network
  ]
  name          = "webhook-subnet-6"
  network       = google_compute_network.webhook_network.self_link
  ip_cidr_range = "10.14.0.0/28"
  region        = local.region
}

resource "google_compute_router" "router" {
  depends_on = [
    google_compute_subnetwork.subnetwork,
    google_compute_network.webhook_network
  ]
  name    = "router6"
  region  = google_compute_subnetwork.subnetwork.region
  network = google_compute_network.webhook_network.name
}

// I created the static IP address manually
//resource "google_compute_address" "static_address" {
//  name   = "nat-static-ip-address"
//  region = local.region
//}

resource "google_compute_router_nat" "advanced-nat" {
  name                   = "natt"
  router                 = google_compute_router.router.name
  region                 = local.region
  nat_ip_allocate_option = "MANUAL_ONLY"
  nat_ips = [
    var.ip_name
  ]
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

data "google_project" "project" {
}

resource "google_vpc_access_connector" "access_connector" {
  depends_on = [
    google_compute_network.webhook_network,
    google_compute_subnetwork.subnetwork
  ]
  name          = "stat-ip-conn-6"
  project       = var.project_id
  region        = local.region
  ip_cidr_range = "10.4.0.0/28"
  network       = google_compute_network.webhook_network.name
}
It turns out my setup was correct; the way I was testing it was wrong. I was testing with the following Cloud Function:
def hello_world(request):
    request_json = request.get_json()
    ip = request.remote_addr  # the culprit
    remote_port = request.environ.get('REMOTE_PORT')
    url = request.url
    host_url = request.host_url
    return {"ip": ip, "url": url, "port": remote_port, "host_url": host_url}
which returns the 169.254.X.X address. But when I test against curlmyip.org, I do see the correct static IP address.
But, that still does not solve the issue of not being able to delete the VPC network.
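One avenue worth trying for that remaining deletion problem (hedged: untested, using the resource names from the config above) is deleting the Serverless VPC Access connector before the network, since the connector's managed backing instances are typically what keep the network "in use":

```shell
# Delete the connector first; its managed instances hold references to the network.
gcloud compute networks vpc-access connectors delete stat-ip-conn-6 \
    --region=europe-west1 --project=myproject

# The network delete should then stop reporting in-use resources.
gcloud compute networks delete webhook-network --project=myproject
```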