I'm trying to provision GCP resources through Terraform, but it times out on some resources while also throwing errors saying that others already exist (I've checked in the GCP console and through the CLI, and the resources do not exist).
Error: Error waiting to create Image: Error waiting for Creating Image: timeout while waiting for state to become 'DONE' (last state: 'RUNNING', timeout: 15m0s)
│
│ with google_compute_image.student-image,
│ on main.tf line 29, in resource "google_compute_image" "student-image":
│ 29: resource "google_compute_image" "student-image" {
│
╵
╷
│ Error: Error creating Firewall: googleapi: Error 409: The resource 'projects/**-****-**********-******/global/firewalls/*****-*********-*******-*****-firewall' already exists, alreadyExists
│
│ with google_compute_firewall.default,
│ on main.tf line 46, in resource "google_compute_firewall" "default":
│ 46: resource "google_compute_firewall" "default" {
Some (perhaps salient) details:
I have previously provisioned these resources successfully using this same approach.
My billing account has since changed.
At another point, it was saying that the machine image already existed (if it does, I can't see it in either the console or the CLI).
I welcome any insights/suggestions.
EDIT
Including the HCL; variables are defined in variables.tf and terraform.tfvars:
provider "google" {
  region = var.region
}
resource "google_compute_image" "student-image" {
name = var.google_compute_image_name
project = var.project
raw_disk {
source = var.google_compute_image_source
}
timeouts {
create = "15m"
update = "15m"
delete = "15m"
}
}
resource "google_compute_firewall" "default" {
name = "cloud-computing-project-image-firewall"
network = "default"
project = var.project
allow {
protocol = "tcp"
# 22: SSH
# 80: HTTP
ports = [
"22",
"80",
]
}
source_ranges = ["0.0.0.0/0"]
}
source = "./vm"
name = "workspace-vm"
project = var.project
image = google_compute_image.student-image.self_link
machine_type = "n1-standard-1"
}
There is a vm subdirectory with main.tf:
resource "google_compute_instance" "student_instance" {
name = var.name
machine_type = var.machine_type
zone = var.zone
project = var.project
boot_disk {
initialize_params {
image = var.image
size = var.disk_size
}
}
network_interface {
network = "default"
access_config {
}
}
labels = {
project = "machine-learning-on-the-cloud"
}
}
...and variables.tf:
variable "name" {}
variable "project" {}
variable "zone" {
  default = "us-east1-b"
}
variable "image" {}
variable "machine_type" {}
variable "disk_size" {
  default = 20
}
It sounds like the resources were provisioned with Terraform, but perhaps someone deleted them manually, so your state file and what actually exists no longer match. terraform refresh might solve your problem.
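A minimal command sketch of that reconciliation, using the resource addresses from the question (for google_compute_firewall, the import ID can take the form {project}/{name}):

$ terraform refresh      # sync state with what actually exists
$ terraform state list   # see what Terraform still believes it manages
$ terraform import google_compute_firewall.default <project-id>/cloud-computing-project-image-firewall

The import adopts the firewall into state if it really does exist in GCP but Terraform lost track of it, which would explain the 409. Given that the billing account changed, it's also worth double-checking that the console and the CLI are looking at the same project Terraform targets via var.project.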
I used this module to create a security group inside a VPC. One of the outputs is the security_group_id, but I'm getting this error:
│ Error: Unsupported attribute
│
│ on ecs.tf line 39, in resource "aws_ecs_service" "hello_world":
│ 39: security_groups = [module.app_security_group.security_group_id]
│ ├────────────────
│ │ module.app_security_group is a object, known only after apply
│
│ This object does not have an attribute named "security_group_id".
I need the security group for an ECS service:
resource "aws_ecs_service" "hello_world" {
name = "hello-world-service"
cluster = aws_ecs_cluster.container_service_cluster.id
task_definition = aws_ecs_task_definition.hello_world.arn
desired_count = 1
launch_type = "FARGATE"
network_configuration {
security_groups = [module.app_security_group.security_group_id]
subnets = module.vpc.private_subnets
}
load_balancer {
target_group_arn = aws_lb_target_group.loadbalancer_target_group.id
container_name = "hello-world-app"
container_port = 3000
}
depends_on = [aws_lb_listener.loadbalancer_listener, module.app_security_group]
}
I understand that I can only know the security group ID after it is created. That's why I added the depends_on clause to the ECS stanza, but it keeps returning the same error.
Update
I specified count as 1 on the app_security_group module, and this is the error I'm getting now.
│ Error: Unsupported attribute
│
│ on ecs.tf line 39, in resource "aws_ecs_service" "hello_world":
│ 39: security_groups = module.app_security_group.security_group_id
│ ├────────────────
│ │ module.app_security_group is a list of object, known only after apply
│
│ Can't access attributes on a list of objects. Did you mean to access an attribute for a specific element of the list, or across all elements of the list?
Update II
This is the module declaration:
module "app_security_group" {
source = "terraform-aws-modules/security-group/aws//modules/web"
version = "3.17.0"
name = "${var.project}-web-sg"
description = "Security group for web-servers with HTTP ports open within VPC"
vpc_id = module.vpc.vpc_id
# ingress_cidr_blocks = module.vpc.public_subnets_cidr_blocks
ingress_cidr_blocks = ["0.0.0.0/0"]
}
I took a look at that module. The problem is that version 3.17.0 of the module simply does not have a security_group_id output; you are using a really old version.
The latest version on the registry is 4.7.0, and any version from 4.0.0 onward has the security_group_id output, so you need at least 4.0.0.
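A sketch of the upgraded module declaration, reusing the declaration from the question (the ~> 4.0 constraint is illustrative; any release at or above 4.0.0 exposes security_group_id):

module "app_security_group" {
  source  = "terraform-aws-modules/security-group/aws//modules/web"
  version = "~> 4.0"

  name        = "${var.project}-web-sg"
  description = "Security group for web-servers with HTTP ports open within VPC"
  vpc_id      = module.vpc.vpc_id

  ingress_cidr_blocks = ["0.0.0.0/0"]
}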
As you are using count on the module, please try the following:
network_configuration {
security_groups = [module.app_security_group[0].security_group_id]
subnets = module.vpc.private_subnets
}
So I have a GCP service account that is Kubernetes Admin and Kubernetes Cluster Admin in the GCP cloud console.
I am now trying to grant this Terraform service account the cluster-admin ClusterRole in GKE, so it can manage all namespaces, via the following Terraform configuration:
data "google_service_account" "terraform" {
project = var.project_id
account_id = var.terraform_sa_email
}
# Terraform needs to manage cluster
resource "google_project_iam_member" "terraform-gke-admin" {
project = var.project_id
role = "roles/container.admin"
member = "serviceAccount:${data.google_service_account.terraform.email}"
}
# Terraform needs to manage K8S RBAC
# https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control#iam-rolebinding-bootstrap
resource "kubernetes_cluster_role_binding" "terraform_clusteradmin" {
depends_on = [
google_project_iam_member.terraform-gke-admin,
]
metadata {
name = "cluster-admin-binding-terraform"
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "cluster-admin"
}
subject {
api_group = "rbac.authorization.k8s.io"
kind = "User"
name = data.google_service_account.terraform.email
}
# must create a binding on unique ID of SA too
subject {
api_group = "rbac.authorization.k8s.io"
kind = "User"
name = data.google_service_account.terraform.unique_id
}
}
However, this always returns the following error:
Error: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "client" cannot create resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
│
│ with module.kubernetes[0].kubernetes_cluster_role_binding.terraform_clusteradmin,
│ on kubernetes/terraform_role.tf line 15, in resource "kubernetes_cluster_role_binding" "terraform_clusteradmin":
│ 15: resource "kubernetes_cluster_role_binding" "terraform_clusteradmin" {
Any idea what's going wrong here?
Could this be related to using Google Groups for RBAC?
authenticator_groups_config {
security_group = "gke-security-groups#${var.acl_group_domain}"
}
data "google_client_config" "provider" {}
provider "kubernetes" {
cluster_ca_certificate = module.google.cluster_ca_certificate
host = module.google.cluster_endpoint
token = data.google_client_config.provider.access_token
}
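(A diagnostic sketch, not from the original post: the provider token above comes from google_client_config, i.e. whatever credentials Terraform itself runs with, so it's worth confirming that identity and its RBAC rights. The kubectl check assumes your kubeconfig points at the same cluster with the same identity.)

$ gcloud auth list
$ kubectl auth can-i create clusterrolebindings.rbac.authorization.k8s.io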
I am trying to spin up an AWS bastion host on EC2 using the Terraform module provided by Guimove. I am getting stuck on the bastion_host_key_pair field: I need to provide a key pair that can be used to launch the EC2 template, but the bucket (aws_s3_bucket.bucket) that has to contain the public key of that key pair is created inside the module, so the key isn't there when the module tries to launch the instance, and the launch fails. It feels like a chicken-and-egg scenario, so I am obviously doing something wrong. What am I doing wrong?
Error:
╷
│ Error: Error creating Auto Scaling Group: AccessDenied: You are not authorized to use launch template: lt-004b0af2895c684b3
│ status code: 403, request id: c6096e0d-dc83-4384-a036-f35b8ca292f8
│
│ with module.bastion.aws_autoscaling_group.bastion_auto_scaling_group,
│ on .terraform\modules\bastion\main.tf line 300, in resource "aws_autoscaling_group" "bastion_auto_scaling_group":
│ 300: resource "aws_autoscaling_group" "bastion_auto_scaling_group" {
│
╵
Terraform:
resource "tls_private_key" "bastion_host" {
algorithm = "RSA"
rsa_bits = 4096
}
resource "aws_key_pair" "bastion_host" {
key_name = "bastion_user"
public_key = tls_private_key.bastion_host.public_key_openssh
}
resource "aws_s3_bucket_object" "bucket_public_key" {
bucket = aws_s3_bucket.bucket.id
key = "public-keys/${aws_key_pair.bastion_host.key_name}.pub"
content = aws_key_pair.bastion_host.public_key
kms_key_id = aws_kms_key.key.arn
}
module "bastion" {
source = "Guimove/bastion/aws"
bucket_name = "${var.identifier}-ssh-bastion-bucket-${var.env}"
region = var.aws_region
vpc_id = var.vpc_id
is_lb_private = "false"
bastion_host_key_pair = aws_key_pair.bastion_host.key_name
create_dns_record = "false"
elb_subnets = var.public_subnet_ids
auto_scaling_group_subnets = var.public_subnet_ids
instance_type = "t2.micro"
tags = {
Name = "SSH Bastion Host - ${var.identifier}-${var.env}",
}
}
I had the same issue. The fix was to go into the AWS Marketplace, accept the EULA, and subscribe to the AMI I was trying to use.
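If you're unsure which AMI the module is launching, one way to find it is to read it off the launch template named in the error and then look up its name and owner (a sketch; it assumes the AWS CLI is configured for the same account and region):

$ aws ec2 describe-launch-template-versions --launch-template-id lt-004b0af2895c684b3 \
    --query 'LaunchTemplateVersions[0].LaunchTemplateData.ImageId'
$ aws ec2 describe-images --image-ids <ami-id-from-above> --query 'Images[0].[Name,OwnerId]'

That image name is what you search for, and subscribe to, in the AWS Marketplace.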
I have the Terraform file main.tf that I used to create AWS resources:
provider "aws" {
region = "us-east-2"
}
resource "aws_instance" "example" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
vpc_security_group_ids = [
aws_security_group.instance.id]
user_data = <<-EOF
#!/bin/bash
echo "Hello, World" > index.html
nohup busybox httpd -f -p "${var.server_port}" &
EOF
tags = {
Name = "terraform-example"
}
}
resource "aws_security_group" "instance" {
name = "terraform-example-instance"
ingress {
from_port = var.server_port
to_port = var.server_port
protocol = "tcp"
cidr_blocks = [
"0.0.0.0/0"]
}
}
resource "aws_security_group" "elb" {
name = "terraform-example-elb"
# Allow all outbound
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = [
"0.0.0.0/0"]
}
# Inbound HTTP from anywhere
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = [
"0.0.0.0/0"]
}
}
variable "server_port" {
description = "The port the server will use for HTTP requests"
type = number
default = 8080
}
variable "elb_port" {
description = "The port the server will use for HTTP requests"
type = number
default = 80
}
resource "aws_launch_configuration" "example" {
image_id = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
security_groups = [
aws_security_group.instance.id]
user_data = <<-EOF
#!/bin/bash
echo "Hello, World" > index.html
nohup busybox httpd -f -p "${var.server_port}" &
EOF
lifecycle {
create_before_destroy = true
}
}
resource "aws_elb" "example" {
name = "terraform-asg-example"
security_groups = [
aws_security_group.elb.id]
availability_zones = data.aws_availability_zones.all.names
health_check {
target = "HTTP:${var.server_port}/"
interval = 30
timeout = 3
healthy_threshold = 2
unhealthy_threshold = 2
}
# This adds a listener for incoming HTTP requests.
listener {
lb_port = var.elb_port
lb_protocol = "http"
instance_port = var.server_port
instance_protocol = "http"
}
}
resource "aws_autoscaling_group" "example" {
launch_configuration = aws_launch_configuration.example.id
availability_zones = data.aws_availability_zones.all.names
min_size = 2
max_size = 10
load_balancers = [
aws_elb.example.name]
health_check_type = "ELB"
tag {
key = "Name"
value = "terraform-asg-example"
propagate_at_launch = true
}
}
data "aws_availability_zones" "all" {}
output "public_ip" {
value = aws_instance.example.public_ip
description = "The public IP of the web server"
}
I successfully created the resources and then destroyed them afterward. Now I would like to create an AWS S3 remote backend for the project, so I appended the extra resources to the same file:
resource "aws_s3_bucket" "terraform_state" {
bucket = "terraform-up-and-running-state12345"
# Enable versioning so we can see the full revision history of our
# state files
versioning {
enabled = true
}
# Enable server-side encryption by default
server_side_encryption_configuration {
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
}
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-up-and-running-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
output "s3_bucket_arn" {
value = aws_s3_bucket.terraform_state.arn
description = "The ARN of the S3 bucket"
}
output "dynamodb_table_name" {
value = aws_dynamodb_table.terraform_locks.name
description = "The name of the DynamoDB table"
}
Then, I created a new file named backend.tf and add the code there:
terraform {
backend "s3" {
# Replace this with your bucket name!
bucket = "terraform-up-and-running-state12345"
key = "global/s3/terraform.tfstate"
region = "us-east-2"
# Replace this with your DynamoDB table name!
dynamodb_table = "terraform-up-and-running-locks"
encrypt = true
}
}
When I run terraform init, I get the error below:
Initializing the backend...
Backend configuration changed!
Terraform has detected that the configuration specified for the backend
has changed. Terraform will now check for existing state in the backends.
╷
│ Error: Error loading state:
│ BucketRegionError: incorrect region, the bucket is not in 'us-east-2' region at endpoint ''
│ status code: 301, request id: , host id:
│
│ Terraform failed to load the default state from the "s3" backend.
│ State migration cannot occur unless the state can be loaded. Backend
│ modification and state migration has been aborted. The state in both the
│ source and the destination remain unmodified. Please resolve the
│ above error and try again.
I created the S3 bucket from the terminal:
$ aws s3api create-bucket --bucket terraform-up-and-running-state12345 --region us-east-2 --create-bucket-configuration LocationConstraint=us-east-2
Then, I tried again and received the same error, even though the bucket is definitely there.
I can't run the destroy command either:
$ terraform destroy
Acquiring state lock. This may take a few moments...
╷
│ Error: Error acquiring the state lock
│
│ Error message: 2 errors occurred:
│ * ResourceNotFoundException: Requested resource not found
│ * ResourceNotFoundException: Requested resource not found
│
│
│
│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time. Please resolve the issue above and try
│ again. For most commands, you can disable locking with the "-lock=false"
│ flag, but this is not recommended.
Can someone explain to me why that is and how to solve it?
Remove the .terraform folder and try terraform init again.
OR
The error is because there is no S3 bucket created to sync with. Remove the JSON object for the s3 backend from .terraform/terraform.tfstate (the object describing the remote backend), then run terraform init again.
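A minimal sketch of the first option, run from the project root (this removes only local backend metadata, not the state stored in S3):

$ rm -rf .terraform
$ terraform init -reconfigure

-reconfigure tells Terraform to set the backend up fresh rather than attempt to migrate state from the previously configured backend. If destroy is still blocked on the missing DynamoDB lock table, the error output above already points at -lock=false as a last resort while you recreate the table.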
I am using Terraform to import the state of an existing GCP Compute Engine resource so that it can later be managed with Terraform.
I imported it using the command below:
terraform import google_compute_instance.default <project-d>/us-east1-b/server-001
After that, I executed terraform show to see the state of the existing resource and copy-pasted the output into my main.tf file.
Now when I run terraform plan, it shows the errors below:
Error: "label_fingerprint": this field cannot be set
on main.tf line 2, in resource "google_compute_instance" "default":
2: resource "google_compute_instance" "default" {
Error: "current_status": this field cannot be set
on main.tf line 2, in resource "google_compute_instance" "default":
2: resource "google_compute_instance" "default" {
Error: "network_interface.0.name": this field cannot be set
on main.tf line 2, in resource "google_compute_instance" "default":
2: resource "google_compute_instance" "default" {
Error: "instance_id": this field cannot be set
on main.tf line 2, in resource "google_compute_instance" "default":
2: resource "google_compute_instance" "default" {
Error: : invalid or unknown key: id
on main.tf line 2, in resource "google_compute_instance" "default":
2: resource "google_compute_instance" "default" {
The following are the relevant lines of code:
project = "<Project-ID>"
current_status = "TERMINATED"
name = "server-001"
hostname = "server-001.example.com"
id = "projects/<project-id>/zones/us-east1-b/instances/server-001"
instance_id = "7335818403011119952"
labels = {
"env" = "dev"
"server" = "app"
}
machine_type = "f1-micro"
zone = "us-east1-b"
boot_disk {
auto_delete = true
device_name = "server-001"
mode = "READ_WRITE"
source = "https://www.googleapis.com/compute/v1/projects/<projec-id>/zones/us-east1-b/disks/server-001"
initialize_params {
image = "https://www.googleapis.com/compute/v1/projects/centos-cloud/global/images/centos-7-v20200309"
labels = {}
size = 10
type = "pd-standard"
}
}
network_interface {
name = "nic0"
network = "https://www.googleapis.com/compute/v1/projects/<projec-id>/global/networks/adminproject-vpc"
network_ip = "10.3.0.2"
subnetwork = "https://www.googleapis.com/compute/v1/projects/<projec-id>/regions/us-east1/subnetworks/app-subnet"
subnetwork_project = "<project-id>"
  }
}
Terraform version is as follows:
$ terraform version
Terraform v0.12.24
+ provider.google v3.29.0
Removing these attributes fixes the issue, but can't we set these attributes while creating the manifest file for Terraform? Please guide.
These fields cannot be managed by Terraform. They may be referenced in the configuration of other resources, but they cannot be set on the google_compute_instance resource itself, as they are computed by GCP. You can look at the supported arguments for the google_compute_instance resource here.
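A sketch of what the cleaned-up configuration might look like, keeping only settable arguments from the question (placeholders left as posted; note that boot_disk.source and boot_disk.initialize_params conflict, so only one belongs in the config):

resource "google_compute_instance" "default" {
  project      = "<project-id>"
  name         = "server-001"
  hostname     = "server-001.example.com"
  machine_type = "f1-micro"
  zone         = "us-east1-b"

  labels = {
    "env"    = "dev"
    "server" = "app"
  }

  boot_disk {
    auto_delete = true
    device_name = "server-001"
    mode        = "READ_WRITE"
    # the disk already exists, so reference it rather than using initialize_params
    source      = "https://www.googleapis.com/compute/v1/projects/<project-id>/zones/us-east1-b/disks/server-001"
  }

  network_interface {
    # name is output-only and must be dropped; subnetwork is enough
    subnetwork         = "https://www.googleapis.com/compute/v1/projects/<project-id>/regions/us-east1/subnetworks/app-subnet"
    subnetwork_project = "<project-id>"
    network_ip         = "10.3.0.2"
  }
}

id, instance_id, current_status, label_fingerprint, and network_interface.0.name are computed attributes; Terraform records them in state, but they never appear in configuration.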