Cannot create Cloud Composer - google-cloud-platform

I am getting the error below while provisioning Composer via terraform.
Error: Error waiting to create Environment: Error waiting to create Environment: Error waiting for Creating Environment: error while retrieving operation: Get "https://composer.googleapis.com/v1beta1/projects/aayush-terraform/locations/us-central1/operations/ee459492-abb0-4646-893e-09d112219d79?alt=json&prettyPrint=false": write tcp 10.227.112.165:63811->142.251.12.95:443: write: broken pipe. An initial environment was or is still being created, and clean up failed with error: Getting creation operation state failed while waiting for environment to finish creating, but environment seems to still be in 'CREATING' state. Wait for operation to finish and either manually delete environment or import "projects/aayush-terraform/locations/us-central1/environments/example-composer-env" into your state.
Below is the code snippet:
terraform {
required_providers {
google = {
source = "hashicorp/google"
version = "~>3.0"
}
}
}
variable "gcp_region" {
type = string
description = "Region to use for GCP provider"
default = "us-central1"
}
variable "gcp_project" {
type = string
description = "Project to use for this config"
default = "aayush-terraform"
}
provider "google" {
region = var.gcp_region
project = var.gcp_project
}
resource "google_service_account" "test" {
account_id = "composer-env-account"
display_name = "Test Service Account for Composer Environment"
}
resource "google_project_iam_member" "composer-worker" {
role = "roles/composer.worker"
member = "serviceAccount:${google_service_account.test.email}"
}
resource "google_compute_network" "test" {
name = "composer-test-network"
auto_create_subnetworks = false
}
resource "google_compute_subnetwork" "test" {
name = "composer-test-subnetwork"
ip_cidr_range = "10.2.0.0/16"
region = "us-central1"
network = google_compute_network.test.id
}
resource "google_composer_environment" "test" {
name = "example-composer-env"
region = "us-central1"
config {
node_count = 3
node_config {
zone = "us-central1-a"
machine_type = "n1-standard-1"
network = google_compute_network.test.id
subnetwork = google_compute_subnetwork.test.id
service_account = google_service_account.test.name
}
}
}
NOTE: The Composer environment is still created even though this error is thrown. I am provisioning it with a service account that has been granted Owner access.

I had the same problem and solved it by granting the "composer.operations.get" permission to the service account that provisions the Composer environment.
This permission is part of the Composer Administrator role.
So that later operations through Terraform, such as updates or deletion, do not fail in the same way, I think it's better to grant the role rather than a single permission.
If you want to follow least privilege, you can start with the role, then remove the permissions you think you won't need and test your Terraform code.
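As a rough sketch of the role-based approach, the grant could look like the block below. The variable provisioner_service_account_email is hypothetical and stands for the identity that runs terraform apply (not the environment's worker account); var.gcp_project comes from the question. It assumes the grant is applied by an identity that is already allowed to change project IAM.

```hcl
# Hypothetical variable: the service account that runs `terraform apply`.
variable "provisioner_service_account_email" {
  type        = string
  description = "Email of the service account that provisions Composer"
}

# Grant the Composer Administrator role, which per the answer above
# includes the composer.operations.get permission.
resource "google_project_iam_member" "composer_admin" {
  project = var.gcp_project
  role    = "roles/composer.admin"
  member  = "serviceAccount:${var.provisioner_service_account_email}"
}
```

The role can later be swapped for a narrower custom role once you know which permissions your Terraform runs actually use.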

Related

Failed to pull image on EKS created by Terraform

I created an EKS cluster using Terraform, and I am now trying to install the AWS Load Balancer Controller using Helm.
It gives me this error:
"Failed to pull image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.5": rpc error: code = Unknown desc = Error response from daemon: Head "https://602401143452.dkr.ecr.us-west-2.amazonaws.com/v2/amazon/aws-load-balancer-controller/manifests/v2.4.5": no basic auth credentials"
Why are there no basic auth credentials?
How can I make it pull the image from another region? My EKS cluster is in Asia, not us-west-2.
Note:
EKS is in a private subnet. I am using AWS CodeBuild to run Terraform and deploy EKS; in the last step of CodeBuild I run a simple Helm release to deploy:
# Install AWS LoadBalancer Controller
resource "helm_release" "aws-lb-controller" {
name = "aws-load-balancer-controller"
chart = "eks/aws-load-balancer-controller"
repository = "https://aws.github.io/eks-charts"
values = []
set {
name = "region"
value = "ap-southeast-1"
}
set {
name = "image.repository"
value = "public.ecr.aws/eks/aws-load-balancer-controller"
}
set {
name = "image.tag"
value = "v2.4.5"
}
set {
name = "namespace"
value = "kube-system"
}
set {
name = "serviceAccount.create"
value = "false"
}
set {
name = "serviceAccount.name"
value = "aws-load-balancer-controller"
}
set {
name = "ingress-class"
value = "alb"
}
set {
name = "controller.admissionWebhooks.enabled"
value = "false"
}
}
Why did EKS fail to pull the image?

googleapi: Error 400: Precondition check failed., failedPrecondition while creating Cloud Composer Environment through Terraform

I'm trying to create a Cloud Composer environment through Terraform and am getting this error:
googleapi: Error 400: Precondition check failed., failedPrecondition while creating Cloud Composer Environment through Terraform
The service account of the VM from which I'm trying to create the Composer environment has Owner permissions in the GCP project.
I tried the same Composer configuration from the GCP console, and the environment was created without any issues.
I tried disabling the Cloud Composer API and enabling it again, but that did not help.
The very first time I ran terraform apply, it tried to create the Composer environment but ended with a version error, so I changed the Composer image version. Now I'm facing this issue. Can anyone help?
Error message from terminal
composer/main.tf
resource "google_composer_environment" "etl_env" {
provider = google-beta
name = var.env_name
region = var.region
config {
node_count = 3
node_config {
zone = var.zone
machine_type = var.node_machine_type
network = var.network
subnetwork = var.app_subnet_selflink
ip_allocation_policy {
use_ip_aliases = true
}
}
software_config {
image_version = var.software_image_version
python_version = 3
}
private_environment_config {
enable_private_endpoint = false
}
database_config {
machine_type = var.database_machine_type
}
web_server_config {
machine_type = var.web_machine_type
}
}
}
composer/variables.tf
variable "app_subnet_selflink" {
type = string
description = "App Subnet Selflink"
}
variable "region" {
type = string
description = "Region"
default = "us-east4"
}
variable "zone" {
type = string
description = "Availability Zone"
default = "us-east4-c"
}
variable "network" {
type = string
description = "Name of the network"
}
variable "env_name" {
type = string
default = "composer-etl"
description = "The name of the composer environment"
}
variable "node_machine_type" {
type = string
default = "n1-standard-1"
description = "The machine type of the worker nodes"
}
variable "software_image_version" {
type = string
default = "composer-1.15.2-airflow-1.10.14"
description = "The image version used in the software configurations of composer"
}
variable "database_machine_type" {
type = string
default = "db-n1-standard-2"
description = "The machine type of the database instance"
}
variable "web_machine_type" {
type = string
default = "composer-n1-webserver-2"
description = "The machine type of the web server instance"
}
Network and Subnetwork are referenced from another module and they are correct.
The issue is most likely the master_ipv4_cidr_block range. If it is left blank, the default value of '172.16.0.0/28' is used. Since you already created an environment manually, that range is already in use, so specify a different range.
See the GCP and Terraform documentation for more details.
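A minimal sketch of what that change might look like inside the config block of the question's google_composer_environment resource. The range shown is only an illustrative assumption; pick any /28 that does not overlap ranges already in use in your network.

```hcl
# Goes inside: resource "google_composer_environment" "etl_env" { config { ... } }
private_environment_config {
  enable_private_endpoint = false

  # Hypothetical value: any free /28 other than the default 172.16.0.0/28.
  master_ipv4_cidr_block = "172.16.1.0/28"
}
```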

Creating endpoint in cloud run with Terraform and Google Cloud Platform

I'm researching a way to use Terraform with the GCP provider to create a Cloud Run endpoint. As a start, I'm creating test data with a simple hello world. I have the Cloud Run service resource and the Cloud Endpoints resource configured, with Cloud Endpoints depending on Cloud Run via depends_on. However, I'm trying to pass the Cloud Run URL as the service name to Cloud Endpoints. The files are structured following best practice, with a module containing the Cloud Run and Cloud Endpoints resources. However, with the Terraform interpolation for passing the output of
service_name = "${google_cloud_run_service.default.status[0].url}"
Terraform throws an Error: Invalid character. I've also tried module.folder.output.url.
I have the openapi_config.yml hardcoded in the TF config within
I'm wondering if it's possible to get this to work. I've researched many posts, and some forums are outdated.
#Cloud Run
resource "google_cloud_run_service" "default" {
name = var.name
location = var.location
template {
spec {
containers {
image = "gcr.io/cloudrun/hello"
}
}
metadata {
annotations = {
"autoscaling.knative.dev/maxScale" = "1000"
"run.googleapis.com/cloudstorage" = "project_name:us-central1:${google_storage_bucket.storage-run.name}"
"run.googleapis.com/client-name" = "terraform"
}
}
}
traffic {
percent = 100
latest_revision = true
}
autogenerate_revision_name = true
}
output "url" {
value = "${google_cloud_run_service.default.status[0].url}"
}
data "google_iam_policy" "noauth" {
binding {
role = "roles/run.invoker"
members = [
"allUsers",
]
}
}
resource "google_cloud_run_service_iam_policy" "noauth" {
location = google_cloud_run_service.default.location
project = google_cloud_run_service.default.project
service = google_cloud_run_service.default.name
policy_data = data.google_iam_policy.noauth.policy_data
}
#CLOUD STORAGE
resource "google_storage_bucket" "storage-run" {
name = var.name
location = var.location
force_destroy = true
bucket_policy_only = true
}
data "template_file" "openapi_spec" {
template = file("${path.module}/openapi_spec.yml")
}
#CLOUD ENDPOINT SERVICE
resource "google_endpoints_service" "api-service" {
service_name = "api_name.endpoints.project_name.cloud.goog"
project = var.project
openapi_config = data.template_file.openapi_spec.rendered
}
ERROR: googleapi: Error 400: Service name 'CLOUD_RUN_ESP_NAME' provided in the config files doesn't match the service name 'api_name.endpoints.project_name.cloud.goog' provided in the request., badRequest
I later discovered that the service name must match the host, that is, the Cloud Run ESP service URL without https://, in order for the Cloud Endpoints service to provision. The Terraform docs (terraform_cloud_endpoints) state otherwise, giving the form "$apiname.endpoints.$projectid.cloud.goog", while the GCP docs state that the Cloud Run ESP service name must be the URL without https://, for example gateway-12345-uc.a.run.app
Getting Started with Endpoints for Cloud Run
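One way this could be wired up in Terraform is sketched below. It assumes google_cloud_run_service.default is the ESP service, that Terraform 0.12 or later is used (for templatefile), and that openapi_spec.yml contains a line such as host: ${esp_host}; the local name esp_host and that template variable are hypothetical.

```hcl
locals {
  # Cloud Run URL with the scheme stripped, e.g. gateway-12345-uc.a.run.app
  esp_host = replace(google_cloud_run_service.default.status[0].url, "https://", "")
}

resource "google_endpoints_service" "api-service" {
  # The service name must equal the host used in the OpenAPI config.
  service_name = local.esp_host
  project      = var.project

  # Render the same host into the spec so both values stay in sync.
  openapi_config = templatefile("${path.module}/openapi_spec.yml", {
    esp_host = local.esp_host
  })
}
```

Deriving both values from the same local avoids the mismatch reported in the 400 error above.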

Fails with Health check error in GCP composer using terraform

I was trying to create a Cloud Composer environment in GCP using Terraform v0.12.5, but I am unable to launch an instance.
I am getting the following error:
Error: Error waiting to create Environment: Error waiting for Creating Environment: Error code 3, message: Http error status code: 400
Http error message: BAD REQUEST
Additional errors:
{"ResourceType":"appengine.v1.version","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"message":"Legacy health checks are no longer supported for the App Engine Flexible environment. Please remove the 'health_check' section from your app.yaml and configure updated health checks. For instructions on migrating to split health checks see https://cloud.google.com/appengine/docs/flexible/java/migrating-to-split-health-checks","status":"INVALID_ARGUMENT","details":[],"statusMessage":"Bad Request","requestPath":"https://appengine.googleapis.com/v1/apps/qabc39fc336994cc4-tp/services/default/versions","httpMethod":"POST"}}
main.tf
resource "google_composer_environment" "sample-composer" {
provider= google-beta
project = "${var.project_id}"
name = "${var.google_composer_environment_name}"
region = "${var.region}"
config {
node_count = "${var.composer_node_count}"
node_config {
zone = "${var.zone}"
disk_size_gb = "${var.disk_size_gb}"
machine_type = "${var.composer_machine_type}"
network = google_compute_network.sample-network.self_link
subnetwork = google_compute_subnetwork.sample-subnetwork.self_link
}
software_config {
env_variables = {
AIRFLOW_CONN_SAMPLEMEDIA_FTP_CONNECTION = "ftp://${var.ftp_user}:${var.ftp_password}@${var.ftp_host}"
}
image_version = "${var.composer_airflow_version}"
python_version = "${var.composer_python_version}"
}
}
}
resource "google_compute_network" "sample-network" {
name = "composer-xxx-network"
project = "${var.project_id}"
auto_create_subnetworks = false
}
resource "google_compute_subnetwork" "sample-subnetwork" {
name = "composer-xxx-subnetwork"
project = "${var.project_id}"
ip_cidr_range = "10.2.0.0/16"
region = "${var.region}"
network = google_compute_network.sample-network.self_link
}
variables.tf
# Machine specific information for creating Instance in GCP
variable "project_id" {
description = "The name of GCP project"
default = "sample-test"
}
variable "google_composer_environment_name" {
description = "The name of the instance"
default = "sample-analytics-dev"
}
variable "region" {
description = "The name of GCP region"
default = "europe-west1"
}
variable "composer_node_count" {
description = "The number of node count"
default = "3"
}
variable "zone" {
description = "The zone in which instance to be launched"
default = "europe-west1-c"
}
variable "disk_size_gb" {
description = "The machine size in GB"
default = "100"
}
variable "composer_machine_type" {
description = "The type of machine to be launched in GCP"
default = "n1-standard-1"
}
# Environmental Variables
variable "ftp_user" {
description = "Environmental variables for FTP user"
default = "test"
}
variable "ftp_password" {
description = "Environmental variables for FTP password"
default = "4444erf"
}
variable "ftp_host" {
description = "Environmental variables for FTP host"
default = "sample.logs.llnw.net"
}
# Versions for Cloud Composer, Airflow and Python
variable "composer_airflow_version" {
description = "The composer and airflow versions to launch instance in GCP"
default = "composer-1.7.2-airflow-1.10.2"
}
variable "composer_python_version" {
description = "The version of python"
default = "3"
}
# Network information
variable "composer_network_name" {
description = "The name of the Composer network"
default = "composer-xxx-network"
}
variable "composer_subnetwork_name" {
description = "The name of the Composer subnetwork"
default = "composer-xxx-subnetwork"
}
Creating the Composer environment from the GCP console works without any issues. When creating it using Terraform, it fails with the health check error.
I've tested your use case with the Terraform binary in my GCP Cloud Shell, and no issue occurred; the Composer environment was created successfully:
$ terraform -v
Terraform v0.12.9
+ provider.google v3.1.0
+ provider.google-beta v3.1.0
A few concerns from my side:
The issue you've reported might be related to the use of legacy health checks, which are deprecated and have been replaced by split health checks:
As of September 15, 2019, if you're using the legacy health checks,
your application will continue to run and receive health checks but
you won't be able to deploy new versions of your application.
You haven't specified your Terraform GCP provider version, and I suspect the issue is hidden there: according to the changelog, split_health_checks have been enabled in google_app_engine_application.feature_settings since 3.0.0-beta.1 was released.
Feel free to add more details so we can help resolve the issue.
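If the provider version does turn out to be the cause, pinning both providers to the 3.x line is one thing to try. This is only a sketch in Terraform 0.12 syntax, using the 3.1 series because that is the version tested above; Terraform 0.13 and later expect the object form with source and version instead.

```hcl
terraform {
  required_providers {
    # Terraform 0.12 style version constraints (assumption: 0.12.x is in use).
    google      = "~> 3.1"
    google-beta = "~> 3.1"
  }
}
```

After changing the constraint, run terraform init -upgrade so the newer provider plugins are downloaded.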

How to fix "An Unknown Error Occurred" when creating multiple Google Cloud SQL instances with private IP simultaneously?

Our cloud backend setup contains 5 Cloud SQL for Postgres instances. We manage our infrastructure using Terraform. We connect to them from GKE using a public IP and the Cloud SQL proxy container.
To simplify our setup, we want to get rid of the proxy containers by moving to a private IP. I tried following the Terraform guide. While creating a single instance works fine, trying to create 5 instances simultaneously ends with 4 failed instances and one successful one.
The error that appears in the Google Cloud Console on the failed instances is "An Unknown Error occurred".
The following code reproduces it. Pay attention to the count = 5 line:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
resource "google_sql_database_instance" "instance" {
provider = "google-beta"
count = 5
name = "private-instance-${count.index}"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
I tried several alternatives:
Waiting a minute after creating the google_service_networking_connection and then creating all the instances simultaneously, but I got the same error.
Creating an address range and a google_service_networking_connection per instance, but I got an error that google_service_networking_connection cannot be created simultaneously.
Creating an address range per instance and a single google_service_networking_connection which links to all of them, but I got the same error.
I found an ugly yet working solution. There is a bug in GCP: it does not prevent simultaneous creation of instances even though the creation cannot complete. There is neither documentation about it nor a meaningful error message. It appears in the Terraform Google provider issue tracker as well.
One alternative is adding a dependency between the instances. This allows their creation to complete successfully. However, each instance takes several minutes to create, and creating them one after another adds up to a long wait. If we instead add an artificial delay of 60 seconds between instance creations, we manage to avoid the failures. Notes:
The number of seconds to delay depends on the instance tier. For example, for db-f1-micro, 30 seconds were enough. They were not enough for db-custom-1-3840.
I am not sure what the exact number of seconds needed for db-custom-1-3840 is. 30 seconds were not enough, 60 were.
The following code sample resolves the issue. It shows only 2 instances because, due to depends_on limitations, I could not use the count feature, and the full code for 5 instances would be very long. It works the same for 5 instances:
resource "google_compute_network" "private_network" {
provider = "google-beta"
name = "private-network"
}
resource "google_compute_global_address" "private_ip_address" {
provider = "google-beta"
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = "${google_compute_network.private_network.self_link}"
}
resource "google_service_networking_connection" "private_vpc_connection" {
provider = "google-beta"
network = "${google_compute_network.private_network.self_link}"
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.private_ip_address.name}"]
}
locals {
db_instance_creation_delay_factor_seconds = 60
}
resource "null_resource" "delayer_1" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 0}"
}
}
resource "google_sql_database_instance" "instance_1" {
provider = "google-beta"
name = "private-instance-delayed-1"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_1"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
resource "null_resource" "delayer_2" {
depends_on = ["google_service_networking_connection.private_vpc_connection"]
provisioner "local-exec" {
command = "echo Gradual DB instance creation && sleep ${local.db_instance_creation_delay_factor_seconds * 1}"
}
}
resource "google_sql_database_instance" "instance_2" {
provider = "google-beta"
name = "private-instance-delayed-2"
database_version = "POSTGRES_9_6"
depends_on = [
"google_service_networking_connection.private_vpc_connection",
"null_resource.delayer_2"
]
settings {
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
ip_configuration {
ipv4_enabled = "false"
private_network = "${google_compute_network.private_network.self_link}"
}
}
}
provider "google-beta" {
version = "~> 2.5"
credentials = "credentials.json"
project = "PROJECT_ID"
region = "us-central1"
zone = "us-central1-a"
}
provider "null" {
version = "~> 1.0"
}
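For comparison, the first alternative mentioned above (a direct dependency between instances, with no artificial delay) might be sketched as follows. It assumes Terraform 0.12 or later syntax and the same network resources as in the code above; the instance names are hypothetical. Each instance waits for the previous one to finish, so total creation time grows with the number of instances.

```hcl
resource "google_sql_database_instance" "chained_1" {
  provider         = google-beta
  name             = "private-instance-chained-1" # hypothetical name
  database_version = "POSTGRES_9_6"

  depends_on = [google_service_networking_connection.private_vpc_connection]

  settings {
    tier              = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.private_network.self_link
    }
  }
}

resource "google_sql_database_instance" "chained_2" {
  provider         = google-beta
  name             = "private-instance-chained-2" # hypothetical name
  database_version = "POSTGRES_9_6"

  # Creation starts only after chained_1 has been fully created.
  depends_on = [google_sql_database_instance.chained_1]

  settings {
    tier              = "db-custom-1-3840"
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.private_network.self_link
    }
  }
}
```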
In case someone lands here with a slightly different case (creating google_sql_database_instance in a private network results in an "Unknown error"):
1. Launch one Cloud SQL instance manually (it seems this enables servicenetworking.googleapis.com and some other APIs for the project)
2. Run your manifest
3. Terminate the instance created in step 1.
This worked for me.
¯_(ツ)_/¯
I landed here with a slightly different case, same as @Grigorash Vasilij
(creating google_sql_database_instance in a private network results in an "Unknown error").
I was using the UI to deploy a SQL instance on a private VPC, and for some reason that throws an "Unknown error" as well. I finally solved it by using the gcloud command instead (why does that work and not the UI? I don't know, maybe the UI is not doing the same thing as the command):
gcloud --project=[PROJECT_ID] beta sql instances create [INSTANCE_ID] \
  --network=[VPC_NETWORK_NAME] \
  --no-assign-ip
follow this for more details