How can we use a Cloud Build privatePool in Google Cloud Deploy?

We cannot make the Cloud Deploy job run RENDER or DEPLOY in another project's private pool; we can only make it use a private pool in the project that hosts Cloud Deploy.
Following the Cloud Deploy documentation here: https://cloud.google.com/deploy/docs/execution-environment#changing_from_the_default_pool_to_a_private_pool and the private pool setup here: https://cloud.google.com/build/docs/private-pools/set-up-private-pool-environment#setup-private-connection I have created a clouddeploy.yaml with the following parameters:
apiVersion: deploy.cloud.google.com/v1beta1
kind: Target
metadata:
  name: k8-target
description: apply development
requireApproval: false
gke:
  cluster: projects/development-k8-cluster/locations/europe-west1/clusters/development-k8
executionConfigs:
- privatePool:
    workerPool: projects/vpchost-project-development/locations/europe-west1/workerPools/cloudddeploy-pool
  usages:
  - RENDER
  - DEPLOY
In summary: there's a Cloud Build project, a k8s project and a Cloud Deploy project. However, no matter what I do, I cannot make the Cloud Deploy job run RENDER or DEPLOY in another project's private pool. It does run, but in the Cloud Deploy project itself. There are no logs or errors until the deployment phase, where either Cloud Build starts up inside the Cloud Deploy project rather than in the private pool project, or there is an eventual timeout and the pipeline remains stuck because there is no cancel function.
I have given the clouddeploy service account, the cloudbuild service account, a custom service account (not shown in the yaml above) and the default compute service account: owner privileges, cloud deploy runner privileges, cloud build owner and worker pool user privileges.
The request from Cloud Deploy appears empty except for a run ID that is created when a job is submitted with:
gcloud beta deploy releases create
After the release is picked up, the job does nothing until there is a timeout.
Can anyone see what I've done wrong, or has anyone managed to make this work?
EDIT: Following a comment from one of the contributors: I expected the 'privatePool' field to be filled and a job to be running in either the Cloud Deploy host project or the Cloud Build project, but there is no activity in either.

You can use another project's private pool, but the Cloud Build instance (and thus where builds show up, along with their logs) will always be in the project where the Cloud Deploy pipeline lives.
Note that in order to make this work, you will need to grant permission for that pool to be used across projects (see the note under: https://cloud.google.com/deploy/docs/execution-environment#changing_from_the_default_pool_to_a_private_pool).
For logs to show up, you will need to ensure that the service account that is running the build has logging permission in the same project where the delivery pipeline exists.
If the service account being used for the build is not in the same project as Cloud Deploy, you will also need to grant Cloud Deploy act-as permission to use that service account.
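The grants described above could be sketched with gcloud roughly as follows. This is a minimal sketch rather than a verified recipe. The pool project is taken from the question; the Cloud Deploy project ID, the execution service account and the Cloud Deploy service agent email are placeholders. Mapping "pool permission", "logging permission" and "act-as permission" to roles/cloudbuild.workerPoolUser, roles/logging.logWriter and roles/iam.serviceAccountUser is my assumption, and which principal needs each role should be checked against the note linked above. The commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# All IDs and emails below are placeholders; substitute your own.
POOL_PROJECT="vpchost-project-development"   # project that owns the private pool (from the question)
DEPLOY_PROJECT="my-clouddeploy-project"      # hypothetical: project hosting the delivery pipeline
EXEC_SA="deploy-runner@vpchost-project-development.iam.gserviceaccount.com"     # hypothetical service account running the build
DEPLOY_AGENT="service-123456789012@gcp-sa-clouddeploy.iam.gserviceaccount.com"  # assumed Cloud Deploy service agent

# Print (do not run) a project-level IAM binding command.
project_binding() {
  printf 'gcloud projects add-iam-policy-binding %s --member=serviceAccount:%s --role=%s\n' "$1" "$2" "$3"
}

# Print (do not run) a binding on a service account itself (act-as).
sa_binding() {
  printf 'gcloud iam service-accounts add-iam-policy-binding %s --member=serviceAccount:%s --role=%s\n' "$1" "$2" "$3"
}

# 1. Allow the pool to be used from the Cloud Deploy project.
project_binding "$POOL_PROJECT" "$EXEC_SA" roles/cloudbuild.workerPoolUser
# 2. Let the build's service account write logs where the pipeline lives.
project_binding "$DEPLOY_PROJECT" "$EXEC_SA" roles/logging.logWriter
# 3. Only needed when the service account lives outside the Cloud Deploy project:
#    let Cloud Deploy act as it.
sa_binding "$EXEC_SA" "$DEPLOY_AGENT" roles/iam.serviceAccountUser
```

Once the printed commands look right for your setup, they can be pasted into a shell that is authenticated with sufficient IAM rights in both projects.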

Related

Atlantis plan erroring with querying Cloud Storage failed message

I have a GCP VM to which a GCP Service Account has been attached.
This SA has the appropriate permissions to perform some terraform / terragrunt related actions, such as querying the backend configuration GCS bucket etc.
So, when I log in to the VM (to which I have already transferred my terraform configuration files), I can for example do
$ terragrunt plan
Initializing the backend...
Successfully configured the backend "gcs"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- terraform.io/builtin/terraform is built in to Terraform
- Finding hashicorp/random versions matching "3.1.0"...
- Finding hashicorp/template versions matching "2.2.0"...
- Finding hashicorp/local versions matching "2.1.0"...
.
.
.
(...and the plan goes on)
I have now set up atlantis to run as a systemd service (under a user of the same name).
The problem is that when I create a PR, the plan (as posted as a PR comment) fails as follows:
Initializing the backend...
Successfully configured the backend "gcs"! Terraform will automatically
use this backend unless the backend configuration changes.
Failed to get existing workspaces: querying Cloud Storage failed: storage: bucket doesn't exist
Does anyone know (or suspect) whether this problem may be related to the terraform service account not being usable by the systemd service running atlantis? (The bucket is there, since I am able to plan manually.)
update: I have validated that a systemd service does inherit the GCP SA by creating a systemd service that just runs this script
#!/bin/bash
gcloud auth list
and this does output the SA of the VM.
So I changed my original question since this apparently is not the issue.

Posting my comment as an answer for visibility to other community members.
You may be getting this error because of an issue with the Terraform backend configuration. To reinitialize it, please run the following command and see if it solves your issue.
terraform init -reconfigure

Running Cloud Build trigger via GCP Console returns 'build.service_account' field cannot be set for triggered builds

I am currently using Cloud Build for my Dataflow Flex template to kick off jobs.
Here's my current command:
gcloud beta builds submit --config run.yaml --substitutions _REGION=$REGION \
--substitutions _FMPKEY=$FMPKEY --no-source
Currently this is running fine from Cloud Shell.
But now I want the build to be kicked off based on a trigger..
So I created a Cloud Build that will trigger running this file based on dropping a message to a topic:
https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/pipeline/run.yaml
However, after publishing a message to the selected topic, all my builds fail with the following error:
Your build failed to run: generic::invalid_argument:generic::invalid_argument:
'build.service_account' field cannot be set for triggered builds
I cannot see any logs or details, so it's not clear to me what is going on..
I am guessing it has something to do with the last line in my run.yaml?
options:
  logging: CLOUD_LOGGING_ONLY
# Use the Compute Engine default service account to launch the job.
serviceAccount: projects/$PROJECT_ID/serviceAccounts/$PROJECT_NUMBER-compute@developer.gserviceaccount.com
However I see no option for selecting the service account in cloud build. Do I need to set some permissions in IAM?

You are correct with your guess and this is working as intended.
Cloud Build has a default service account to execute builds on your behalf. While GCP allows you to configure user-specified service accounts for additional control, that doesn't apply when you're using build triggers. Build triggers only use the default service account to execute builds.
This is documented in GCP docs:
Build triggers use Cloud Build service account to execute builds. This could provide elevated build-time permissions to users who use triggers to start a build. Keep the following security implications in mind when using build triggers ...
Also as part of limitation:
User-specified service accounts only work with manual builds; they don't work with build triggers.
Therefore, you must pass a config yaml without serviceAccount if you plan on using build triggers.
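One way to keep a single source file while following this advice is to generate a trigger-friendly copy with the top-level serviceAccount line removed. This is a minimal sketch, assuming the key sits at column 0 on a single line as in the question's run.yaml and that the limitation described above applies to your setup. The file names and the inline sample config are illustrative only.

```shell
#!/bin/sh
# Write a small sample config to a temp dir (a stand-in for the real run.yaml).
workdir="$(mktemp -d)"
cat > "$workdir/run.yaml" <<'EOF'
steps:
- name: gcr.io/cloud-builders/gcloud
  args: ['--version']
options:
  logging: CLOUD_LOGGING_ONLY
serviceAccount: projects/$PROJECT_ID/serviceAccounts/example@developer.gserviceaccount.com
EOF

# Drop only the top-level "serviceAccount:" line; indented keys are left untouched.
strip_service_account() {
  grep -v '^serviceAccount:' "$1"
}

# Produce the copy intended for the trigger and show it.
strip_service_account "$workdir/run.yaml" > "$workdir/run-trigger.yaml"
cat "$workdir/run-trigger.yaml"
```

The manual `gcloud builds submit` flow can keep using the original file, while the trigger points at the stripped copy.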

Cloud Run error: Internal system error. Missing necessary permission

I cannot seem to deploy/run any Google Cloud Run services.
I have attempted this from multiple accounts (with billing on all accounts and projects), created fresh projects in each account, added every permission I could find to try to get around this.
I've built my own container based on the Hello World example from here: https://cloud.google.com/run/docs/quickstarts/build-and-deploy
Trying to deploy:
helloworld-csharp>gcloud run deploy --image gcr.io/[Project ID]/helloworld --platform managed
Service name (helloworld):
Deploying container to Cloud Run service [helloworld] in project [Project ID] region [us-west1]
Deploying...
Creating Revision... Cloud Run error: Internal system error. Missing necessary permission for service-[ID]@serverless-robot-prod.iam.gserviceaccount.com on resource [Project ID]. Please visit https://cloud.google.com/run/docs/troubleshooting for in-depth troubleshooting documentation....failed
Deployment failed
ERROR: (gcloud.run.deploy) Cloud Run error: Internal system error. Missing necessary permission for service-[ID]@serverless-robot-prod.iam.gserviceaccount.com on resource [Project ID]. Please visit https://cloud.google.com/run/docs/troubleshooting for in-depth troubleshooting documentation.
Trying to deploy the simple 'hello' example here from the web console leaves me with the same error:
Cloud Run error: Internal system error. Missing necessary permission for service-[ID]@serverless-robot-prod.iam.gserviceaccount.com on resource [Project ID]. Please visit https(...)cloud.google.com/run/docs/troubleshooting for in-depth troubleshooting documentation.
I have the following users in the project, as they were auto-setup and configured when I enabled the API:
[ID]-compute@developer.gserviceaccount.com Compute Engine default service account
[ID]@cloudbuild.gserviceaccount.com Cloud Build Service Account
[ID]@cloudservices.gserviceaccount.com Google APIs Service Agent
service-[ID]@compute-system.iam.gserviceaccount.com Compute Engine Service Agent
service-[ID]@gcp-sa-cloudbuild.iam.gserviceaccount.com Cloud Build Service Account
service-[ID]@serverless-robot-prod.iam.gserviceaccount.com Google Cloud Run Service Agent

Yes, it seems it was indeed a Google issue. I didn't change anything; I just went back to the console and I can now start all my test containers without any issue.

gcloud builds submit fails while docker push + gcloud run deploy work just fine?

EDIT: The so-called duplicate question was way off, since (1) I could push another image and (2) I could not push a build image. Finally, (3) the solution was totally different and ONLY related to pushing build images via Cloud Build. I.e., I beg to differ: this question WAS different.
Running into some more google cloud security stuff. We currently deploy to cloud run like so
docker build . --tag gcr.io/myproject/authservice
docker push gcr.io/myproject/authservice
gcloud run deploy staging-admin --region us-west1 --image gcr.io/myproject/authservice --platform managed
I followed the quickstart for Cloud Build but I am getting permission errors:
https://cloud.google.com/cloud-build/docs/quickstart-build
The command I ran was
gcloud builds submit --tag gcr.io/myproject/quickstart-image
This is all in the same project, but submitting builds gets this same error over and over (I am not sure why it doesn't just exit on the first error).
The push refers to repository [gcr.io/myproject/quickstart-image]
e3831abe9997: Preparing
60664c29ef5a: Preparing
denied: Token exchange failed for project 'myproject'. Caller does not have permission 'storage.buckets.get'. To configure permissions, follow instructions at: https://cloud.google.com/container-registry/docs/access-control
Any ideas how to fix so I can use google cloud build?

Complementing the previous answer: as mentioned in this document, the "Storage Admin" role is necessary to perform actions in Container Registry.
Do you have the "roles/storage.admin" role? If not, add it and try again.
The Cloud Build service account has the format [project_number]@cloudbuild.gserviceaccount.com. Please add the role "roles/storage.admin" by following these steps:
Open the Cloud IAM page.
Select your Cloud project.
In the permissions table, locate the row with the email address ending with @cloudbuild.gserviceaccount.com. This is your Cloud Build service account.
Click on the pencil icon.
Select the role you wish to grant to the Cloud Build service account.
Click Save.
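The same grant can also be made from the command line. This is a rough sketch, assuming the Cloud Build service account naming described above. The project ID myproject is taken from the question and the project number 123456789012 is a placeholder. The command is printed rather than executed so it can be checked first.

```shell
#!/bin/sh
PROJECT_ID="myproject"          # from the question
PROJECT_NUMBER="123456789012"   # placeholder; look yours up with:
#   gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)'

# Build the Cloud Build service account email from a project number.
cloudbuild_sa() {
  printf '%s@cloudbuild.gserviceaccount.com\n' "$1"
}

# Compose the IAM binding command and print it for review.
GRANT_CMD="gcloud projects add-iam-policy-binding $PROJECT_ID --member=serviceAccount:$(cloudbuild_sa "$PROJECT_NUMBER") --role=roles/storage.admin"
echo "$GRANT_CMD"
```

Storage Admin is a broad role; if it fixes the push, it may be worth narrowing the grant afterwards.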

BE WARNED: I read the duplicate question post but in my case
I can push items
only the build one is failing AND the solution I found is different than any of the other question answers
This was a VERY weird issue. The storage permission MUST be a red herring because these permissions fixed the issue
I found some documentation (which I can no longer find) in a Google GitHub repo about adding these permissions, AND a document on the TWO @cloudbuild.gserviceaccount.com accounts. You must add the permissions to the correct one! The other is owned by Google and you should not touch it.

In my case, the permission / token exchange failed error was caused by having the storage bucket used by Google Container Registry inside a VPC Service Perimeter.
This can be checked / confirmed via the VPC Service Controls logs - accessible easily from the troubleshooting page.
There is a (very clunky) way to get Cloud Build working to push images to a registry inside a VPC perimeter. It involves running a build worker pool and applying appropriate config + permissions to the perimeter etc.

Kubernetes Engine unable to pull image from non-private / GCR repository

I was happily deploying to Kubernetes Engine for a while, but while working on an integrated cloud container builder pipeline, I started getting into trouble.
I don't know what changed. I can not deploy to kubernetes anymore, even in ways I did before without cloud builder.
The pod rollout process gives an error indicating that it is unable to pull from the registry, which seems weird because the images exist (I can pull them using the CLI) and I granted all possibly related permissions to my user and the cloud builder service account.
I get the error ImagePullBackOff and see this in the pod events:
Failed to pull image
"gcr.io/my-project/backend:f4711979-eaab-4de1-afd8-d2e37eaeb988":
rpc error: code = Unknown desc = unauthorized: authentication required
What's going on? Who needs authorization, and for what?

In my case, my cluster didn't have the Storage read permission, which is necessary for GKE to pull an image from GCR.
My cluster didn't have proper permissions because I created the cluster through terraform and didn't include the node_config.oauth_scopes block. When creating a cluster through the console, the Storage read permission is added by default.
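To check whether a node pool's OAuth scopes allow pulling from GCR, the scope list can be inspected. This is a minimal sketch: the helper below only does string matching on a scope list you pass in, and treating devstorage.read_only, devstorage.read_write, devstorage.full_control or cloud-platform as sufficient is my assumption about which scopes grant storage read access. The scope list could come from `gcloud container clusters describe CLUSTER --format='value(nodeConfig.oauthScopes)'` (plus the cluster's zone or region flag), where CLUSTER is a placeholder.

```shell
#!/bin/sh
# Return 0 if the given scope list contains a scope that (by assumption)
# permits reading images from the Cloud Storage bucket behind GCR.
has_storage_read_scope() {
  case "$1" in
    *auth/devstorage.read_only*|*auth/devstorage.read_write*|*auth/devstorage.full_control*|*auth/cloud-platform*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example scope lists (illustrative values, not taken from a real cluster).
GOOD="https://www.googleapis.com/auth/devstorage.read_only;https://www.googleapis.com/auth/logging.write"
BAD="https://www.googleapis.com/auth/logging.write;https://www.googleapis.com/auth/monitoring"

for scopes in "$GOOD" "$BAD"; do
  if has_storage_read_scope "$scopes"; then
    echo "storage read scope present"
  else
    echo "storage read scope missing"
  fi
done
```

If the scope turns out to be missing, adding it in the Terraform node_config.oauth_scopes block mentioned above is the fix this answer describes.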

The credentials in my project somehow got messed up. I solved the problem by re-initializing a few APIs including Kubernetes Engine, Deployment Manager and Container Builder.
First time I tried this I didn't succeed, because to disable something you have to disable first all the APIs that depend on it. If you do this via the GCloud web UI then you'll likely see a list of services that are not all available for disabling in the UI.
I learned that using the gcloud CLI you can list all APIs of your project and disable everything properly.
Things worked after that.
The reason I knew things were messed up, is because I had a copy of the same things as a production environment, and there these problems did not exist. The development environment had a lot of iterations and messing around with credentials, so somewhere things got corrupted.
These are some examples of useful commands:
gcloud projects get-iam-policy $PROJECT_ID
gcloud services disable container.googleapis.com --verbosity=debug
gcloud services enable container.googleapis.com
More info here, including how to restore service account credentials.
