Atlantis plan erroring with querying Cloud Storage failed message - google-cloud-platform

I have a GCP VM to which a GCP Service Account has been attached.
This SA has the appropriate permissions to perform some terraform / terragrunt related actions, such as querying the backend configuration GCS bucket etc.
So, when I log in to the VM (to which I have already transferred my Terraform configuration files), I can, for example, run:
$ terragrunt plan
Initializing the backend...
Successfully configured the backend "gcs"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- terraform.io/builtin/terraform is built in to Terraform
- Finding hashicorp/random versions matching "3.1.0"...
- Finding hashicorp/template versions matching "2.2.0"...
- Finding hashicorp/local versions matching "2.1.0"...
.
.
.
(...and the plan goes on)
I have now set up Atlantis to run as a systemd service (under a user with the same name).
The problem is that when I create a PR, the plan (as posted in a PR comment) fails as follows:
Initializing the backend...
Successfully configured the backend "gcs"! Terraform will automatically
use this backend unless the backend configuration changes.
Failed to get existing workspaces: querying Cloud Storage failed: storage: bucket doesn't exist
Does anyone know (or suspect) whether this problem may be related to the Terraform service account not being usable by the systemd service running Atlantis? (The bucket is definitely there, since I am able to plan manually.)
Update: I have validated that a systemd service does inherit the GCP SA by creating a systemd service that just runs this script:
#!/bin/bash
gcloud auth list
and this does output the SA of the VM.
So I changed my original question since this apparently is not the issue.
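A further check in the same vein (just a sketch; BUCKET_NAME below is a placeholder, not a value from the original setup) is to have the same test service try to read the backend bucket directly, which exercises the storage API rather than only listing credentials:

#!/bin/bash
# Show which account the service actually runs as
gcloud auth list
# Try to read the Terraform state bucket directly (BUCKET_NAME is a placeholder)
gsutil ls gs://BUCKET_NAME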

Posting my comment as an answer for visibility to other community members.
You may have been getting the error because of an issue with the Terraform backend configuration. To update it, please run the following command and see if it solves your issue:
terraform init -reconfigure
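If running terraform init -reconfigure manually fixes it but Atlantis keeps failing, one way to apply the same flag in Atlantis runs is a custom workflow. This is only a sketch, assuming the server configuration allows repo-level atlantis.yaml overrides and a plain Terraform project at the repo root (a terragrunt setup would need the equivalent in its own custom workflow):

version: 3
projects:
  - dir: .
    workflow: reconfigure
workflows:
  reconfigure:
    plan:
      steps:
        - init:
            extra_args: ["-reconfigure"]
        - plan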

Related

Deploy new container revision to Cloud Run without changing Terraform

I am setting up a CI/CD environment for a GCP project that involves Cloud Run. While setting up everything via Terraform is pretty much straightforward, I cannot figure out how to update the environment when the code changes.
The documentation says:
Make a change to the configuration file.
But that couples the application deployment to the Terraform configuration, which should be responsible only for infrastructure deployment.
Ideally, I use terraform to provision the infrastructure, and another CI step to build and deploy the container.
Is there a best-practice here?
I ended up separating Cloud Run service creation (which is still done in Terraform) and code deployment into two different workflows.
The key component was to make Terraform ignore the actually deployed image, so that when the code deployment workflow runs, Terraform won't complain that the Cloud Run image is different from the one it manages. I achieved this by setting ignore_changes = [template[0].spec[0].containers[0].image] on the google_cloud_run_service resource.
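A minimal sketch of what that looks like on the resource (the service name, region and placeholder image are made up; the lifecycle block is the relevant part):

resource "google_cloud_run_service" "service" {
  name     = "my-service"   # placeholder name
  location = "us-central1"  # placeholder region

  template {
    spec {
      containers {
        # Initial image only; real deployments happen in the separate workflow
        image = "gcr.io/my-project/my-service:initial"
      }
    }
  }

  lifecycle {
    # Ignore image changes so out-of-band deployments don't show up as drift
    ignore_changes = [template[0].spec[0].containers[0].image]
  }
}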

How can we use the cloud build privatePool in google cloud deploy

We cannot make the Cloud Deploy job run the RENDER or DEPLOY in another project's private pool; we can only make it use the private pool in the project that hosts Cloud Deploy.
Following the documentation of the cloud-deploy setup here: https://cloud.google.com/deploy/docs/execution-environment#changing_from_the_default_pool_to_a_private_pool and here: https://cloud.google.com/build/docs/private-pools/set-up-private-pool-environment#setup-private-connection I have created a clouddeploy.yaml with the following parameters:
apiVersion: deploy.cloud.google.com/v1beta1
kind: Target
metadata:
  name: k8-target
description: apply development
requireApproval: false
gke:
  cluster: projects/development-k8-cluster/locations/europe-west1/clusters/development-k8
executionConfigs:
  - privatePool:
      workerPool: projects/vpchost-project-development/locations/europe-west1/workerPools/cloudddeploy-pool
    usages:
      - RENDER
      - DEPLOY
In summary: there's a Cloud Build project, a k8s project and a Cloud Deploy project. However, no matter what I do, I cannot make the Cloud Deploy job run the RENDER or DEPLOY in another project's private pool. It does run, but in the Cloud Deploy project itself. There are no logs or errors until the deployment phase, where either Cloud Build starts up inside the Cloud Deploy project (and not in the private pool project), or there is an eventual timeout and the pipeline remains stuck, as there is no cancel function.
I have given the Cloud Deploy service account, the Cloud Build service account, a custom service account (not shown in the YAML above) and the default compute service account: owner privileges, Cloud Deploy runner privileges, Cloud Build owner and worker pool user privileges.
The request from Cloud Deploy appears empty except for a run ID that is created when a job is submitted with:
gcloud beta deploy releases create
After the release is picked up, the job will not do anything until there is a timeout.
Can anyone see what I've done wrong, or has anyone managed to make this work?
EDIT: Following a comment from one of the contributors: I expected the 'privatePool' field to be filled and a job running in either the Cloud Deploy host project or the Cloud Build project, but there is no activity in either.
You can use another project's private pool, but the Cloud Build instance (and thus where builds show up, along with the logs for them) will always be in the project where the Cloud Deploy pipeline lives.
Note that in order to make this work, you will need to grant permission for that pool to be used across projects (see the note under: https://cloud.google.com/deploy/docs/execution-environment#changing_from_the_default_pool_to_a_private_pool).
For logs to show up, you will need to ensure that the service account that is running the build has logging permission in the same project where the delivery pipeline exists.
If the service account being used for the build is not in the same project as Cloud Deploy, you will also need to grant Cloud Deploy act-as permission to use that service account.
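As a rough sketch of that act-as grant with the gcloud CLI (the service account emails below are placeholders, and the exact Cloud Deploy service agent address should be checked against the docs), the idea is to give the Cloud Deploy agent roles/iam.serviceAccountUser on the build service account:

gcloud iam service-accounts add-iam-policy-binding build-sa@other-project.iam.gserviceaccount.com \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-clouddeploy.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountUser"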

gcloud builds submit fails while docker push + gcloud run deploy work just fine?

EDIT: The so-called duplicate question was way off, since 1. I could push other images, and 2. I could not push the build image. Finally, 3. the solution was totally different and ONLY related to pushing build images via Cloud Build. i.e. I beg to differ: this question IS different.
Running into some more Google Cloud security issues. We currently deploy to Cloud Run like so:
docker build . --tag gcr.io/myproject/authservice
docker push gcr.io/myproject/authservice
gcloud run deploy staging-admin --region us-west1 --image gcr.io/myproject/authservice --platform managed
I did the quickstart for Google Cloud Build but I am getting permission errors. The quickstart is here:
https://cloud.google.com/cloud-build/docs/quickstart-build
The command I ran was
gcloud builds submit --tag gcr.io/myproject/quickstart-image
This is all the same project, but submitting builds gets this same error over and over and over (I am not sure why it doesn't just exit on the first error):
The push refers to repository [gcr.io/myproject/quickstart-image]
e3831abe9997: Preparing
60664c29ef5a: Preparing
denied: Token exchange failed for project 'myproject'. Caller does not have permission 'storage.buckets.get'. To configure permissions, follow instructions at: https://cloud.google.com/container-registry/docs/access-control
Any ideas how to fix so I can use google cloud build?
Complementing the previous answer: as mentioned in this document, the "Storage Admin" role is necessary to perform actions in Container Registry.
Do you have the "roles/storage.admin" role? If not, add it and try.
The Cloud Build service account has the format [PROJECT_NUMBER]@cloudbuild.gserviceaccount.com. Please add the role "roles/storage.admin" to it by following these steps:
1. Open the Cloud IAM page.
2. Select your Cloud project.
3. In the permissions table, locate the row with the email address ending with @cloudbuild.gserviceaccount.com. This is your Cloud Build service account.
4. Click on the pencil icon.
5. Select the role you wish to grant to the Cloud Build service account.
6. Click Save.
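If you prefer the CLI over the console, roughly the equivalent grant (PROJECT_ID and PROJECT_NUMBER are placeholders) would be:

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com" \
  --role="roles/storage.admin"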
BE WARNED: I read the duplicate question post, but in my case
1. I can push other images
2. only the build one is failing, AND the solution I found is different from any of the other question's answers.
This was a VERY weird issue. The storage permission MUST be a red herring, because these permissions fixed the issue.
I found some documentation (on a Google GitHub repo somewhere that I can't seem to find again) about adding these permissions, AND a document on the TWO @cloudbuild.gserviceaccount.com accounts, AND you must add the permissions to the correct one!!!! One is owned by Google and you should not touch it.
In my case, the permission / token exchange failed error was caused by having the storage bucket used by Google Container Registry inside a VPC Service Perimeter.
This can be checked / confirmed via the VPC Service Controls logs - accessible easily from the troubleshooting page.
There is a (very clunky) way to get Cloud Build working to push images to a registry inside a VPC perimeter. It involves running a build worker pool and applying appropriate config + permissions to the perimeter etc.
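For reference, creating such a private worker pool looks roughly like this (pool name, region and network are placeholders; the VPC Service Controls perimeter configuration still has to be done separately):

gcloud builds worker-pools create my-private-pool \
  --region=us-central1 \
  --peered-network=projects/my-project/global/networks/my-vpc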

Kubernetes Engine unable to pull image from non-private / GCR repository

I was happily deploying to Kubernetes Engine for a while, but while working on an integrated cloud container builder pipeline, I started getting into trouble.
I don't know what changed. I cannot deploy to Kubernetes anymore, even in ways I did before without Cloud Builder.
The pod rollout process gives an error indicating that it is unable to pull from the registry, which seems weird because the images exist (I can pull them using the CLI) and I granted all possibly related permissions to my user and the Cloud Builder service account.
I get the error ImagePullBackOff and see this in the pod events:
Failed to pull image
"gcr.io/my-project/backend:f4711979-eaab-4de1-afd8-d2e37eaeb988":
rpc error: code = Unknown desc = unauthorized: authentication required
What's going on? Who needs authorization, and for what?
In my case, my cluster didn't have the Storage read permission, which is necessary for GKE to pull an image from GCR.
My cluster didn't have proper permissions because I created the cluster through terraform and didn't include the node_config.oauth_scopes block. When creating a cluster through the console, the Storage read permission is added by default.
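A minimal sketch of the missing block (cluster name, location and node count are placeholders; the devstorage scope is the part GKE needs in order to pull from GCR):

resource "google_container_cluster" "cluster" {
  name               = "my-cluster"   # placeholder
  location           = "us-central1"  # placeholder
  initial_node_count = 1

  node_config {
    # Read access to Cloud Storage lets the nodes pull images from GCR
    oauth_scopes = [
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}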
The credentials in my project somehow got messed up. I solved the problem by re-initializing a few APIs including Kubernetes Engine, Deployment Manager and Container Builder.
The first time I tried this I didn't succeed, because to disable something you first have to disable all the APIs that depend on it. If you do this via the GCloud web UI, you'll likely see a list of services that are not all available for disabling in the UI.
I learned that using the gcloud CLI you can list all APIs of your project and disable everything properly.
Things worked after that.
The reason I knew things were messed up is that I had a copy of the same setup as a production environment, and these problems did not exist there. The development environment had gone through a lot of iterations and messing around with credentials, so somewhere things got corrupted.
These are some examples of useful commands:
gcloud projects get-iam-policy $PROJECT_ID
gcloud services disable container.googleapis.com --verbosity=debug
gcloud services enable container.googleapis.com
More info here, including how to restore service account credentials.

Spinnaker clouddriver doesn't start

After deploying Spinnaker on EC2, clouddriver doesn't start. I tried the same on a local machine and the result is the same. I am trying to run Spinnaker 1.6.1 on Ubuntu 16.04.
I am using S3 as storage and AWS as the cloud provider.
After deployment the Spinnaker UI is accessible, but when creating a new application the window hangs and an error message appears in the browser's console regarding localhost:8084/credentials and port 7002.
I tried to send a curl request to localhost:7002 from the server, but the connection is refused. Port 7002 isn't being listened on, but all the other services' ports are. Clouddriver starts and then enters a failed state (after about 30 seconds).
For deployment I've followed this guide on the official website.
Also, I can't find logs for the services under the /var/log/spinnaker/<any service>/ path; there are logs only under /var/log/spinnaker/halyard/.
All policies/roles/users have been created properly in AWS as described in the official setup guide. Double-checked. Still facing the issue.
Am I missing anything?
Here is the error from the browser console when trying to create a new application:
GET http://localhost:8084/credentials?expand=true 500 () angular.js:14525 Possibly unhandled rejection: {"data":{"error":"Internal Server Error","exception":"com.google.common.util.concurrent.UncheckedExecutionException","message":"retrofit.RetrofitError: Failed to connect to localhost/127.0.0.1:7002","status":500,"timestamp":1523484058259},"status":500,"config":{"method":"GET","transformRequest":[null],"transformResponse":[null],"jsonpCallbackParam":"callback","url":"http://localhost:8084/credentials","cache":true,"params":{"expand":true},"timeout":65000,"headers":{"X-RateLimit-App":"deck","Accept":"application/json, text/plain, */*"},"withCredentials":true},"statusText":""} undefined
I have done some tests since. Here are the results.
1. Deployed Spinnaker without S3 storage and without any cloud provider - clouddriver works.
2. Added S3 as persistent storage - clouddriver still works. Opened the UI, created a dummy project and saw that files had been created in the S3 bucket under the front50 folder. Everything fine.
3. Added the AWS configuration - created a user in AWS, and ran this command with the appropriate changes:
hal config provider aws edit --access-key-id ${ACCESS_KEY_ID} \
  --secret-access-key
and ran this command with the appropriate changes:
hal config provider aws account add $AWS_ACCOUNT_NAME \
  --account-id ${ACCOUNT_ID} \
  --assume-role role/spinnakerManaged
After checking the AWS config with hal config provider aws, the value of defaultAssumeRole=0.
After hal deploy apply, clouddriver again doesn't start and I cannot create an application from the UI; the window loads infinitely.
This is the option for a dev Spinnaker; the Local Debian install type never worked for me, as it contains extra dependencies.
Please use a Kubernetes cluster for the installation, or use Minnaker for a quick PoC of OSS Spinnaker. It runs on a K3s cluster.