Terraform: Creating GCP Project using Shared VPC

I've been working through this for what feels like an eternity now. The host project already exists and has all the VPNs and networking set up. I am looking to create a new project through Terraform and allow it to use the host project's shared VPC.
Every time I run up against a problem and end up resolving it, I just run up against another one.
Right now I'm seeing:
google_compute_shared_vpc_service_project.project: googleapi: Error 404: The resource 'projects/intacct-staging-db3b7e7a' was not found, notFound
* google_compute_instance.dokku: 1 error(s) occurred:
As well as:
google_compute_instance.dokku: Error loading zone 'europe-west2-a': googleapi: Error 404: Failed to find project intacct-staging, notFound
I was originally convinced it was ordering, which is why I was playing around with depends_on configurations, to try and sort out the order. That hasn't seemed to resolve it.
Reading it simply, the new project doesn't exist as far as google_compute_shared_vpc_service_project is concerned, even though I've added the following to google_compute_shared_vpc_service_project:
depends_on = [
  "google_project.project",
  "google_compute_shared_vpc_host_project.host_project",
]
Perhaps, because the host project already exists, I should use a data source to refer to it instead of a resource?
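If so, a minimal sketch of that data-source approach (assuming var.vpc_parent holds the existing host project's ID) might look like this; google_compute_shared_vpc_service_project could then take data.google_project.host_project.project_id as its host_project:

data "google_project" "host_project" {
  # Only reads the existing host project; nothing about it is managed or changed here.
  project_id = "${var.vpc_parent}"
}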
My full TF File is here:
provider "google" {
region = "${var.gcp_region}"
credentials = "${file("./creds/serviceaccount.json")}"
}
resource "random_id" "id" {
byte_length = 4
prefix = "${var.project_name}-"
}
resource "google_project" "project" {
name = "${var.project_name}"
project_id = "${random_id.id.hex}"
billing_account = "${var.billing_account}"
org_id = "${var.org_id}"
}
resource "google_project_services" "project" {
project = "${google_project.project.project_id}"
services = [
"compute.googleapis.com"
]
depends_on = [ "google_project.project" ]
}
# resource "google_service_account" "service-account" {
# account_id = "intacct-staging-service"
# display_name = "Service Account for the intacct staging app"
# }
resource "google_compute_shared_vpc_host_project" "host_project" {
project = "${var.vpc_parent}"
}
resource "google_compute_shared_vpc_service_project" "project" {
host_project = "${google_compute_shared_vpc_host_project.host_project.project}"
service_project = "${google_project.project.project_id}"
depends_on = ["google_project.project",
"google_compute_shared_vpc_host_project.host_project",
]
}
resource "google_compute_address" "dokku" {
name = "fr-intacct-staging-ip"
address_type = "EXTERNAL"
project = "${google_project.project.project_id}"
depends_on = [ "google_project_services.project" ]
}
resource "google_compute_instance" "dokku" {
project = "${google_project.project.name}"
name = "dokku-host"
machine_type = "${var.comp_type}"
zone = "${var.gcp_zone}"
allow_stopping_for_update = "true"
tags = ["intacct"]
# Install Dokku
metadata_startup_script = <<SCRIPT
sed -i 's/PermitRootLogin no/PermitRootLogin yes/' /etc/ssh/sshd_config && service sshd restart
SCRIPT
boot_disk {
initialize_params {
image = "${var.compute_image}"
}
}
network_interface {
subnetwork = "${var.subnetwork}"
subnetwork_project = "${var.vpc_parent}"
access_config = {
nat_ip = "${google_compute_address.dokku.address}"
}
}
metadata {
sshKeys = "root:${file("./id_rsa.pub")}"
}
}
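One thing not shown in this file, offered only as a hedged sketch: service projects typically also need roles/compute.networkUser on the shared subnetwork (or host project) before their service accounts can attach instances to it. The member below is the service project's Google APIs service account; the exact principals you need may differ:

resource "google_compute_subnetwork_iam_member" "service_network_user" {
  # Grant the service project's Google APIs service account use of the shared subnetwork.
  project    = "${var.vpc_parent}"
  region     = "${var.gcp_region}"
  subnetwork = "${var.subnetwork}"
  role       = "roles/compute.networkUser"
  member     = "serviceAccount:${google_project.project.number}@cloudservices.gserviceaccount.com"
}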
EDIT:
As discussed below, I was able to resolve the latter "project not found" error by changing the reference to project_id instead of name, as name does not include the random hex suffix.
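In other words, the instance now references the generated ID rather than the display name:

resource "google_compute_instance" "dokku" {
  # project_id includes the random hex suffix; name does not.
  project = "${google_project.project.project_id}"
  # ...
}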
I'm now also seeing another error, referring to the static IP. The network interface is configured to use the subnetwork from the Host VPC...
network_interface {
  subnetwork         = "${var.subnetwork}"
  subnetwork_project = "${var.vpc_parent}"

  access_config = {
    nat_ip = "${google_compute_address.dokku.address}"
  }
}
The IP is setup here:
resource "google_compute_address" "dokku" {
name = "fr-intacct-staging-ip"
address_type = "EXTERNAL"
project = "${google_project.project.project_id}"
}
The IP should really be in the host project, which I've tried, but when I do I get an error saying that cross-project use is not allowed with this resource.
When I change to the above, it also errors saying that the new project is not yet capable of handling API calls, which I suppose would make sense as I only allowed Compute API calls per the google_project_services resource.
I'll try allowing network API calls and see if that works, but I'm thinking the external IP needs to be in the host project's shared VPC?

For anyone encountering the same problem, in my case the project not found error was solved just by enabling the Compute Engine API.
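If you would rather keep that in Terraform than enable it in the console, a minimal sketch using the per-service resource (assuming your provider version supports google_project_service) could be:

resource "google_project_service" "compute" {
  # Enables the Compute Engine API on the newly created service project.
  project = "${google_project.project.project_id}"
  service = "compute.googleapis.com"
}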

Related

Terraform loop through multiple providers (accounts) - invocation through module

I have a use case where I need help using for_each to loop through multiple providers (AWS accounts & regions). This is a module, and the TF will be using a hub and spoke model.
Below is the TF pseudo code I would like to achieve.
module.tf
---------
app_accounts = [
{ "account" : "53xxxx08", "app_vpc_id" : "vpc-0fxxxxxfec8", "role" : "xxxxxxx", "profile" : "child1"},
{ "account" : "53xxxx08", "app_vpc_id" : "vpc-0fxxxxxfec8", "role" : "xxxxxxx", "profile" : "child2"}
]
Below are the provider and resource files. Please ignore the variables and output files, as they're not relevant here.
provider.tf
------------
provider "aws" {
for_each = var.app_accounts
alias = "child"
profile = each.value.role
}
Here is the main resource block where I want to associate multiple child accounts with a single master account, so I want to iterate through the loop:
resource "aws_route53_vpc_association_authorization" "master" {
provider = aws.master
vpc_id = vpc_id
zone_id = zone_id
}
resource "aws_route53_zone_association" "child" {
provider = aws.child
vpc_id = vpc_id
zone_id = zone_id
}
Any idea on how to achieve this, please? Thanks in advance.
The typical way to achieve your goal in Terraform is to define a shared module representing the objects that should be present in a single account and then to call that module once for each account, passing a different provider configuration into each.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  alias = "master"

  # ...
}

provider "aws" {
  alias   = "example1"
  profile = "example1"
}

module "example1" {
  source = "./modules/account"

  account    = "53xxxx08"
  app_vpc_id = "vpc-0fxxxxxfec8"

  providers = {
    aws        = aws.example1
    aws.master = aws.master
  }
}

provider "aws" {
  alias   = "example2"
  profile = "example2"
}

module "example2" {
  source = "./modules/account"

  account    = "53xxxx08"
  app_vpc_id = "vpc-0fxxxxxfec8"

  providers = {
    aws        = aws.example2
    aws.master = aws.master
  }
}
The ./modules/account directory would then contain the resource blocks describing what should exist in each individual account. For example:
terraform {
  required_providers {
    aws = {
      source                = "hashicorp/aws"
      configuration_aliases = [aws, aws.master]
    }
  }
}

variable "account" {
  type = string
}

variable "app_vpc_id" {
  type = string
}

resource "aws_route53_zone" "example" {
  # (omitting the provider argument will associate
  # with the default provider configuration, which
  # is different for each instance of this module)
  # ...
}

resource "aws_route53_vpc_association_authorization" "master" {
  provider = aws.master

  vpc_id  = var.app_vpc_id
  zone_id = aws_route53_zone.example.id
}

resource "aws_route53_zone_association" "child" {
  provider = aws.master

  vpc_id  = var.app_vpc_id
  zone_id = aws_route53_zone.example.id
}
(I'm not sure if you actually intended var.app_vpc_id to be the VPC specified for those zone associations, but my goal here is only to show the general pattern, not to show a fully-working example.)
Using a shared module in this way allows you to avoid repeating the definitions for each account separately, and keeps each account-specific setting specified in only one place (either in a provider "aws" block or in a module block).
There is no way to make this more dynamic within the Terraform language itself, but if you expect to be adding and removing accounts regularly and want to make it more systematic, you could use code generation to mechanically produce the provider and module block for each account. That keeps them all consistent and lets you update them together if you ever need to change the interface of the shared module in a way that affects all of the calls.

Terraform referenced module has no attributes error

I am trying to use a Palo Alto Networks module to deploy a Panorama VM instance to GCP with Terraform. In the example module, I see they create a VPC together with a subnetwork; however, I have an existing VPC I am adding to, so I use a data source for the VPC and create the subnetwork with a module. Upon referencing this subnetwork in my VM instance module, it complains that it has no attributes:
Error: Incorrect attribute value type
on ../../../../modules/panorama/main.tf line 67, in resource "google_compute_instance" "panorama":
67: subnetwork = var.subnet
|----------------
| var.subnet is object with no attributes
Here is the subnet code:
data "google_compute_network" "panorama" {
project = var.project_id
name = "fed-il4-p-net-panorama"
}
module "panorama_subnet" {
source = "../../../../modules/subnetwork-module"
subnet_name = "panorama-${var.region_short_name[var.region]}"
subnet_ip = var.panorama_subnet
subnet_region = var.region
project_id = var.project_id
network = data.google_compute_network.panorama.self_link
}
Here is the panorama VM instance code:
module "panorama" {
source = "../../../../modules/panorama"
name = "${var.project_id}-panorama-${var.region_short_name[var.region]}"
project = var.project_id
region = var.region
zone = data.google_compute_zones.zones.names[0]
*# panorama_version = var.panorama_version
ssh_keys = (file(var.ssh_keys))
network = data.google_compute_network.panorama.self_link
subnet = module.panorama <====== I cannot do module.panorama.id or .name here
private_static_ip = var.private_static_ip
custom_image = var.custom_image_pano
#attach_public_ip = var.attach_public_ip
}
Can anyone tell me what I may be doing wrong. Any help would be appreciated. Thanks!
Edit:
Parent module for the VM instance:
resource "google_compute_instance" "panorama" {
name = var.name
zone = var.zone
machine_type = var.machine_type
min_cpu_platform = var.min_cpu_platform
labels = var.labels
tags = var.tags
project = var.project
can_ip_forward = false
allow_stopping_for_update = true
metadata = merge({
serial-port-enable = true
ssh-keys = var.ssh_keys
}, var.metadata)
network_interface {
/*
dynamic "access_config" {
for_each = var.attach_public_ip ? [""] : []
content {
nat_ip = google_compute_address.public[0].address
}
}
*/
network_ip = google_compute_address.private.address
network = var.network
subnetwork = var.subnet
}
I've come across this var.xxx is object with [n] attributes issue multiple times, and 9 times out of 10 it has to do with referencing a variable incorrectly. In your case, in the panorama VM module, you're assigning the value of subnet as:
subnet = module.panorama
Now, it's not possible to assign a whole module to an attribute within another module. From your problem statement, I see you're trying to get the subnetwork's name (or ID) assigned to subnet, so you need to reference a specific output of the subnetwork module rather than the module itself, along the lines of:
subnet = module.panorama_subnet.<output_name>
Also, regarding what values can be called, the resources defined in a module are encapsulated, so the calling module cannot access their attributes directly. However, the child module can declare output values to selectively export certain values to be accessed by the calling module.
For example, if the ./modules/panorama module declared an output value named subnet_name in its own configuration:
output "subnet_name" {
  value = var.subnet
}
or, without setting a subnet value:
output "subnet_name" {
  value = var.name
}
then the calling module can reference that result using the expression module.panorama.subnet_name. Hope this helps.
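Applied to the question's own setup, a hedged sketch (the output name and the resource address inside subnetwork-module are assumptions about that module's internals, not taken from the question):

# Inside ../../../../modules/subnetwork-module: expose the created subnetwork.
output "self_link" {
  value = google_compute_subnetwork.this.self_link
}

# In the calling configuration: pass that output to the panorama module.
module "panorama" {
  source = "../../../../modules/panorama"
  # ...other arguments as in the question...
  subnet = module.panorama_subnet.self_link
}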

Cannot create Cloud Composer

I am getting the error below while provisioning Composer via Terraform.
Error: Error waiting to create Environment: Error waiting to create Environment: Error waiting for Creating Environment: error while retrieving operation: Get "https://composer.googleapis.com/v1beta1/projects/aayush-terraform/locations/us-central1/operations/ee459492-abb0-4646-893e-09d112219d79?alt=json&prettyPrint=false": write tcp 10.227.112.165:63811->142.251.12.95:443: write: broken pipe. An initial environment was or is still being created, and clean up failed with error: Getting creation operation state failed while waiting for environment to finish creating, but environment seems to still be in 'CREATING' state. Wait for operation to finish and either manually delete environment or import "projects/aayush-terraform/locations/us-central1/environments/example-composer-env" into your state.
Below is the code snippet:
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~>3.0"
    }
  }
}

variable "gcp_region" {
  type        = string
  description = "Region to use for GCP provider"
  default     = "us-central1"
}

variable "gcp_project" {
  type        = string
  description = "Project to use for this config"
  default     = "aayush-terraform"
}

provider "google" {
  region  = var.gcp_region
  project = var.gcp_project
}

resource "google_service_account" "test" {
  account_id   = "composer-env-account"
  display_name = "Test Service Account for Composer Environment"
}

resource "google_project_iam_member" "composer-worker" {
  role   = "roles/composer.worker"
  member = "serviceAccount:${google_service_account.test.email}"
}

resource "google_compute_network" "test" {
  name                    = "composer-test-network"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "test" {
  name          = "composer-test-subnetwork"
  ip_cidr_range = "10.2.0.0/16"
  region        = "us-central1"
  network       = google_compute_network.test.id
}

resource "google_composer_environment" "test" {
  name   = "example-composer-env"
  region = "us-central1"

  config {
    node_count = 3

    node_config {
      zone            = "us-central1-a"
      machine_type    = "n1-standard-1"
      network         = google_compute_network.test.id
      subnetwork      = google_compute_subnetwork.test.id
      service_account = google_service_account.test.name
    }
  }
}
NOTE: The Composer environment does get created even though this error is thrown, and I am provisioning it via a service account which has been given Owner access.
I had the same problem and I solved it by giving the "composer.operations.get" permission to the service account which is provisioning the Composer.
This permission is part of the Composer Administrator role.
To avoid similar failures on future operations like updates or deletion through Terraform, I think it's better to grant the role rather than the single permission.
Or, if you want to follow least privilege, you can start with the role, then remove the permissions you think you won't need and test your Terraform code.
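For reference, a hedged sketch of granting that role to the provisioning service account with Terraform (the member email is a placeholder, not taken from the question):

resource "google_project_iam_member" "terraform_composer_admin" {
  project = var.gcp_project
  # roles/composer.admin is the Composer Administrator role, which includes
  # composer.operations.get.
  role   = "roles/composer.admin"
  member = "serviceAccount:terraform-provisioner@example-project.iam.gserviceaccount.com" # placeholder
}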

Terraform cycle with AWS and Kubernetes provider

My Terraform code describes some AWS infrastructure to build a Kubernetes cluster including some deployments into the cluster. When I try to destroy the infrastructure using terraform plan -destroy I get a cycle:
module.eks_control_plane.aws_eks_cluster.this[0] (destroy)
module.eks_control_plane.output.cluster
provider.kubernetes
module.aws_auth.kubernetes_config_map.this[0] (destroy)
data.aws_eks_cluster_auth.this[0] (destroy)
Destroying the infrastructure by hand using just terraform destroy works fine. Unfortunately, Terraform Cloud uses terraform plan -destroy to plan the destruction first, which causes this to fail. Here is the relevant code:
Excerpt from the eks_control_plane module:
resource "aws_eks_cluster" "this" {
count = var.enabled ? 1 : 0
name = var.cluster_name
role_arn = aws_iam_role.control_plane[0].arn
version = var.k8s_version
# https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html
enabled_cluster_log_types = var.control_plane_log_enabled ? var.control_plane_log_types : []
vpc_config {
security_group_ids = [aws_security_group.control_plane[0].id]
subnet_ids = [for subnet in var.control_plane_subnets : subnet.id]
}
tags = merge(var.tags,
{
}
)
depends_on = [
var.dependencies,
aws_security_group.node,
aws_iam_role_policy_attachment.control_plane_cluster_policy,
aws_iam_role_policy_attachment.control_plane_service_policy,
aws_iam_role_policy.eks_cluster_ingress_loadbalancer_creation,
]
}
output "cluster" {
value = length(aws_eks_cluster.this) > 0 ? aws_eks_cluster.this[0] : null
}
aws-auth Kubernetes config map from aws_auth module:
resource "kubernetes_config_map" "this" {
count = var.enabled ? 1 : 0
metadata {
name = "aws-auth"
namespace = "kube-system"
}
data = {
mapRoles = jsonencode(
concat(
[
{
rolearn = var.node_iam_role.arn
username = "system:node:{{EC2PrivateDNSName}}"
groups = [
"system:bootstrappers",
"system:nodes",
]
}
],
var.map_roles
)
)
}
depends_on = [
var.dependencies,
]
}
Kubernetes provider from root module:
data "aws_eks_cluster_auth" "this" {
count = module.eks_control_plane.cluster != null ? 1 : 0
name = module.eks_control_plane.cluster.name
}
provider "kubernetes" {
version = "~> 1.10"
load_config_file = false
host = module.eks_control_plane.cluster != null ? module.eks_control_plane.cluster.endpoint : null
cluster_ca_certificate = module.eks_control_plane.cluster != null ? base64decode(module.eks_control_plane.cluster.certificate_authority[0].data) : null
token = length(data.aws_eks_cluster_auth.this) > 0 ? data.aws_eks_cluster_auth.this[0].token : null
}
And this is how the modules are called:
module "eks_control_plane" {
source = "app.terraform.io/SDA-SE/eks-control-plane/aws"
version = "0.0.1"
enabled = local.k8s_enabled
cluster_name = var.name
control_plane_subnets = module.vpc.private_subnets
k8s_version = var.k8s_version
node_subnets = module.vpc.private_subnets
tags = var.tags
vpc = module.vpc.vpc
dependencies = concat(var.dependencies, [
# Ensure that VPC including all security group rules, network ACL rules,
# routing table entries, etc. is fully created
module.vpc,
])
}
# aws-auth config map module. Creating this config map will allow nodes and
# Other users to join the cluster.
# CNI and CSI plugins must be set up before creating this config map.
# Enable or disable this via `aws_auth_enabled` variable.
# TODO: Add Developer and other roles.
module "aws_auth" {
source = "app.terraform.io/SDA-SE/aws-auth/kubernetes"
version = "0.0.0"
enabled = local.aws_auth_enabled
node_iam_role = module.eks_control_plane.node_iam_role
map_roles = [
{
rolearn = "arn:aws:iam::${var.aws_account_id}:role/Administrator"
username = "admin"
groups = [
"system:masters",
]
},
{
rolearn = "arn:aws:iam::${var.aws_account_id}:role/Terraform"
username = "terraform"
groups = [
"system:masters",
]
}
]
}
Removing the aws_auth config map, which means not using the Kubernetes provider at all, breaks the cycle. The problem is obviously that Terraform tries to destroy the EKS cluster, which the Kubernetes provider's configuration still requires. Manually removing the resources step by step using multiple terraform apply runs works fine, too.
Is there a way that I can tell Terraform first to destroy all Kubernetes resources so that the Provider is not required anymore, then destroy the EKS cluster?
You can control the order of destruction using the depends_on meta-argument, as you already did in some of your Terraform code.
If you add the depends_on argument to all of the resources that need to be destroyed first and have them depend on the EKS cluster, Terraform will destroy those resources before the cluster.
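For example, a hedged sketch using the question's own modules: the aws_auth module already accepts a dependencies list, so feeding it the cluster output makes every Kubernetes object in that module depend on the cluster (whether this alone untangles the plan -destroy cycle depends on how the provider configuration reads the cluster outputs):

module "aws_auth" {
  source  = "app.terraform.io/SDA-SE/aws-auth/kubernetes"
  version = "0.0.0"
  enabled = local.aws_auth_enabled

  node_iam_role = module.eks_control_plane.node_iam_role

  # Order the config map (and anything else in the module) relative to the cluster.
  dependencies = [
    module.eks_control_plane.cluster,
  ]

  map_roles = [
    # ... as in the question ...
  ]
}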
You can also visualize your configuration and its dependencies with the terraform graph command to help you decide which dependencies need to be created.
https://www.terraform.io/docs/cli/commands/graph.html

How to fix "An Unknown Error Occurred" when creating multiple Google Cloud SQL instances with private IP simultaneously?

Our cloud backend setup contains 5 Cloud SQL for Postgres instances. We manage our infrastructure using Terraform. We currently connect to them from GKE using a public IP and the Cloud SQL Proxy container.
In order to simplify our setup we wish to get rid of the proxy containers by moving to a private IP. I tried following the Terraform guide. While creating a single instance works fine, trying to create 5 instances simultaneously ends with 4 failed and one successful.
The error which appears in the Google Cloud Console for the failed instances is "An Unknown Error occurred".
Following is the code which reproduces it. Pay attention to the count = 5 line:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
resource "google_sql_database_instance" "instance" {
provider = "google-beta"
count = 5
name = "private-instance-${count.index}"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
I tried several alternatives:
Waiting a minute after creating the google_service_networking_connection and then creating all the instances simultaneously, but I got the same error.
Creating an address range and a google_service_networking_connection per instance, but I got an error that google_service_networking_connection cannot be created simultaneously.
Creating an address range per instance and a single google_service_networking_connection which links to all of them, but I got the same error.
Found an ugly yet working solution. There is a bug in GCP which does not prevent simultaneous creation of instances although it cannot be completed. There is neither documentation about it nor a meaningful error message. It appears in the Terraform Google provider issue tracker as well.
One alternative is adding a dependency between the instances, which serializes their creation and allows it to complete successfully. However, each instance takes several minutes to create, so this adds up to a long total runtime. If we instead add an artificial delay of 60 seconds between instance creations, we manage to avoid the failures. Notes:
The needed amount of seconds to delay depends on the instance tier. For example, for db-f1-micro, 30 seconds were enough. They were not enough for db-custom-1-3840.
I am not sure what is the exact number of needed seconds for db-custom-1-3840. 30 seconds were not enough, 60 were.
Following is the code sample to resolve the issue. It shows only 2 instances, since due to depends_on limitations I could not use the count feature, and showing the full code for 5 instances would be very long. It works the same way for 5 instances:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
locals {
db_instance_creation_delay_factor_seconds = 60
}
resource "null_resource" "delayer_1" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 0}"
}
}
resource "google_sql_database_instance" "instance_1" {
provider = "google-beta"
name = "private-instance-delayed-1"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_1"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
resource "null_resource" "delayer_2" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 1}"
}
}
resource "google_sql_database_instance" "instance_2" {
provider = "google-beta"
name = "private-instance-delayed-2"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_2"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
provider "null" {
version = "~> 1.0"
}
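A hedged variation on the same idea, assuming Terraform 0.12+ and the hashicorp/time provider are acceptable: time_sleep expresses the delay without shelling out to sleep. Shown for the second instance only, mirroring null_resource.delayer_2 above; the instance's depends_on would then list time_sleep.delayer_2 instead of null_resource.delayer_2:

resource "time_sleep" "delayer_2" {
  depends_on = [google_service_networking_connection.private_vpc_connection]

  # Wait 60 seconds after the service networking connection exists before the
  # second instance starts creating.
  create_duration = "60s"
}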
In case someone lands here with a slightly different case (creating google_sql_database_instance in a private network results in an "Unknown error"):
1. Launch one Cloud SQL instance manually (this will enable servicenetworking.googleapis.com and some other APIs for the project, it seems).
2. Run your manifest.
3. Terminate the instance created in step 1.
Works for me after that. ¯\_(ツ)_/¯
I landed here with a slightly different case, same as #Grigorash Vasilij (creating google_sql_database_instance in a private network results in an "Unknown error").
I was using the UI to deploy an SQL instance on a private VPC, and for some reason that throws an "Unknown error" as well. I finally solved it using the gcloud command instead (why that works and not the UI? I don't know, maybe the UI is not doing the same thing as the command):
gcloud --project=[PROJECT_ID] beta sql instances create [INSTANCE_ID] \
  --network=[VPC_NETWORK_NAME] \
  --no-assign-ip
follow this for more details