Permissions Issue with Google Cloud Data Fusion - google-cloud-platform

I'm following the instructions in the Cloud Data Fusion sample tutorial and everything seems to work fine until I try to run the pipeline right at the end. Cloud Data Fusion Service API permissions are set for the Google-managed service account as per the instructions. The pipeline preview function works without any issues.
However, when I deploy and run the pipeline it fails after a couple of minutes. Shortly after the status changes from provisioning to running, the pipeline stops with the following permissions error:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "xxxxxxxxxxx-compute#developer.gserviceaccount.com does not have storage.buckets.create access to project X.",
"reason" : "forbidden"
} ],
"message" : "xxxxxxxxxxx-compute#developer.gserviceaccount.com does not have storage.buckets.create access to project X."
}
xxxxxxxxxxx-compute@developer.gserviceaccount.com is the default Compute Engine service account for my project.
"Project X" is not one of mine, though, and I've no idea why the pipeline startup code is trying to create a bucket there. It does successfully create temporary buckets (one called df-xxx and one called dataproc-xxx) in my own project before it fails.
I've tried this with two separate accounts and get the same error in both places. I had tried adding Storage Admin roles to the various service accounts, to no avail, but that was before I realized it was attempting to access a different project entirely.

I believe I was able to reproduce this. What's happening is that the BigQuery Source plugin first creates a temporary working GCS bucket to export the data to, and I suspect it is attempting to create it in the dataset's project by default instead of in your own project, as it should.
As a workaround, create a GCS bucket in your account, and then in the BigQuery Source configuration of your pipeline set the "Temporary Bucket Name" option to "gs://<your-bucket-name>".
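A trivial sketch of the value that field expects (the bucket name here is a hypothetical placeholder; create the bucket in your own project first):

```python
def temporary_bucket_setting(bucket_name):
    # The "Temporary Bucket Name" field takes the full GCS path,
    # e.g. "gs://my-df-temp-bucket", not a bare bucket name.
    if not bucket_name or "/" in bucket_name:
        raise ValueError("pass a bare bucket name, without gs:// or slashes")
    return f"gs://{bucket_name}"

print(temporary_bucket_setting("my-df-temp-bucket"))  # gs://my-df-temp-bucket
```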

You are missing the permissions setup steps that come after creating an instance. The instructions for granting your service account the right permissions are on this page: https://cloud.google.com/data-fusion/docs/how-to/create-instance

Related

Terraform GCS backend writing .tflock failed. 403 access denied

I am trying to use Terraform with a Google Cloud Storage backend, but I'm facing some issues when executing this in my CI pipeline.
I have set GOOGLE_APPLICATION_CREDENTIALS to my service account's JSON key file, but whenever I try to run terraform init, I get the following errors:
Error loading state: 2 errors occurred:
* writing "gs://[my bucket name]/state/default.tflock" failed: googleapi: Error 403: Access denied., forbidden
* storage: object doesn't exist
I have tried all documented methods of authentication, but still no luck.
It turns out only the second error was actually relevant, and there were no authentication issues after all.
My remote backend only contained my custom workspace state files and no default state.
Since terraform init needs to run before you can switch to a workspace, it was looking for a default.tflock/default.tfstate file that did not exist.
From my local workstation I initialized the default workspace, which created the file that Terraform was looking for.
I wasted a good few hours trying to debug a service account authentication issue that did not exist. I hope this answer can save someone else from that rabbit hole...
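For the curious, the lookup that failed can be sketched like this: the GCS backend stores each workspace's state under <prefix>/<workspace>.tfstate, and init probes the default workspace first (key layout inferred from the error message above, so treat it as an assumption):

```python
def state_object_key(prefix, workspace=None):
    # Terraform's GCS backend writes gs://<bucket>/<prefix>/<workspace>.tfstate;
    # with no workspace selected yet, `terraform init` looks for "default".
    name = workspace or "default"
    return f"{prefix}/{name}.tfstate"

print(state_object_key("state"))        # state/default.tfstate
print(state_object_key("state", "ci"))  # state/ci.tfstate
```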

Unsure how to configure credentials for AWS Amplify cli user - ready to ditch Amplify

I have a React Amplify app. All I want to do is work on it, push the changes to Amplify, etc. These are all standard and basic commands like amplify push.
The problem is that shortly after starting to work on my app (a month or two in), I was no longer allowed to push, pull, or work on the app from the command line. There is no explanation, and the only error is this:
An error occurred during the push operation: /
Access Denied
✅ Report saved: /var/folders/8j/db7_b0d90tq8hgpfcxrdlr400000gq/T/storygraf/report-1658279884644.zip
✔ Done
The logs created for the error show this:
error.json
{
"message": "Access Denied",
"code": "AccessDenied",
"region": null,
"time": "2022-07-20T01:20:01.876Z",
"requestId": "DRFVQWYWJAHWZ8JR",
"extendedRequestId": "hFfxnwUjbtG/yBPYG+GW3B+XfzgNiI7KBqZ1vLLwDqs/D9Qo+YfIc9dVOxqpMo8NKDtHlw3Uglk=",
"statusCode": 403,
"retryable": false,
"retryDelay": 60.622127086356855
}
I have two users in my .aws/credentials file. One is the default (which is my work account). The other is called "personal". I have tried to push with
amplify push
amplify push --profile default
amplify push --profile personal
It always results in the same.
I followed the procedure located here under the title "Create environment variables to assume the IAM role and verify access" and entered a new AWS_ACCESS_KEY_ID and a new AWS_SECRET_ACCESS_KEY. When I then run the command ...
aws sts get-caller-identity
It returns the correct Arn. However, there is an AWS_SESSION_TOKEN variable that the docs say needs to be set, and I have no idea what that is.
Running amplify push under this new profile still results in an error.
I have also tried
AWS_PROFILE=personal aws sts get-caller-identity
Again, this results in the correct settings, but the amplify push still fails for the same reasons.
At this point, I'm ready to drop it and move to something else. I've been debugging this for literally months now, and it would be far easier to set up a standard React app on S3 and stand up my resources manually without dealing with this.
Any help is appreciated.
This is the same issue for me. There seems to be no way to reconfigure the CLI once its authentication method is set to profile. I'm trying to change it back to Amplify Studio and have not been able to crack the code on updating it. Documentation in this area is awful.
In the amplify folder there is a .config directory. There are three files:
local-aws-info.json
local-env-info.json
project-config.json
project-config.json is required, but the local-* files maintain state for your local configuration. Delete these and you can re-init the project and reauthenticate the Amplify CLI for the environment.
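A sketch of that reset, assuming the amplify/.config layout described above (the directory path is whatever your project uses):

```python
import os

def reset_amplify_local_config(config_dir):
    # Remove only the machine-local state files; project-config.json is
    # required and must be left in place. Afterwards, re-run `amplify init`.
    removed = []
    for name in ("local-aws-info.json", "local-env-info.json"):
        path = os.path.join(config_dir, name)
        if os.path.exists(path):
            os.remove(path)
            removed.append(name)
    return removed
```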

Missing required GCS remote state configuration location

After a Google Cloud quota update, I can't run my terragrunt/terraform code due to a strange error. The same code worked before with another project on the same account. After I tried to recreate the project (to get a new, clean project), a "Billing Quota" popup appeared, and I asked support to change the quota.
I got the following message from support:
Dear Developer,
We have approved your request for additional quota. Your new quota should take effect within one hour of receiving this message.
And now (1 day later) terragrunt is not working due to the error:
Missing required GCS remote state configuration location
What I actually have:
a service account for pipelines with Project Editor and Service Networking Admin;
a bucket without public access (europe-west3);
the following terragrunt config:
remote_state {
  backend = "gcs"
  config = {
    project = get_env("TF_VAR_project")
    bucket  = "bucket name"
    prefix  = "${path_relative_to_include()}"
  }
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
}
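As an aside, with that generate block terragrunt should emit a backend.tf per module roughly like this, mirroring the config keys above (values shown are placeholders resolved at run time; a sketch, not verified against your terragrunt/terraform versions):

```hcl
terraform {
  backend "gcs" {
    project = "your-project-id"
    bucket  = "bucket name"
    prefix  = "path/relative/to/include"
  }
}
```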
I'm also running the following pipeline:
- terragrunt run-all init
- terragrunt run-all validate
- terragrunt run-all plan
- terragrunt run-all apply --terragrunt-non-interactive -auto-approve
and it's failing on init with that error.
The project and credentials are correct (the credentials are stored in the GOOGLE_CREDENTIALS env variable as JSON without newlines or whitespace).
I also tried to specify "location" in "config", but got an error that the bucket was not found in the project.
Does anybody know how to fix this, or where the problem could be?
It worked before the quota change.

GCP Vertex AI Training Custom Job : User does not have bigquery.jobs.create permission

I'm struggling to execute a query with the BigQuery Python client from inside a Vertex AI custom training job on Google Cloud Platform.
I have built a Docker image which contains this Python code, then pushed it to Container Registry (eu.gcr.io).
I am using this command to deploy:
gcloud beta ai custom-jobs create --region=europe-west1 --display-name="$job_name" \
--config=config_custom_container.yaml \
--worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri="$docker_img_path" \
--args="${model_type},${env},${now}"
I have even tried the --service-account option to specify a service account with the BigQuery Admin role; it did not work.
According to this link
https://cloud.google.com/vertex-ai/docs/general/access-control?hl=th#granting_service_agents_access_to_other_resources
the Google-managed service account for the AI Platform Custom Code Service Agent (Vertex AI) already has the right to access BigQuery, so I do not understand why my job fails with this error:
google.api_core.exceptions.Forbidden: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/*******/jobs?prettyPrint=false:
Access Denied: Project *******:
User does not have bigquery.jobs.create permission in project *******.
I have replaced the id with *******
Edit:
I have tried several configurations; my last config YAML file only contains this:
baseOutputDirectory:
  outputUriPrefix:
Using the serviceAccount field does not seem to change the actual configuration, unlike the --service-account option.
Edit 14-06-2021: quick fix
As @Ricco.D said:
try explicitly defining the project_id in your BigQuery code if you have not done this yet.
bigquery.Client(project=[your-project])
This fixed my problem. I still do not know the cause.
To fix the issue, you need to explicitly specify the project ID in the BigQuery code.
Example:
bigquery.Client(project=[your-project], credentials=credentials)
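If you prefer not to hard-code it, a small helper along these lines resolves the project explicitly instead of letting the client fall back to whatever the runtime credentials imply (GOOGLE_CLOUD_PROJECT is used here as an assumed convention, not something the job sets for you):

```python
import os

def job_project(explicit=None):
    # The query job is created in (and authorized against) this project,
    # so fail loudly rather than inheriting a surprising default.
    project = explicit or os.environ.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise RuntimeError("set GOOGLE_CLOUD_PROJECT or pass the project explicitly")
    return project

# Usage sketch:
# client = bigquery.Client(project=job_project(), credentials=credentials)
```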

Google Cloud - creating sink

I'm trying to export logs into BigQuery using a sink, from Cloud Shell.
I did the following steps:
bq mk dataset
gcloud beta logging sinks create my-bq-sink \
  bigquery.googleapis.com/projects/my-project/datasets/my_dataset \
  --log-filter='resource.type="gce_instance"'
I created a service account for the sink and bound it to bigquery.dataEditor and logging.logWriter.
The problem is that unless I go to the console -> edit sink -> update sink, I get an error that my access to the dataset was denied. How can I solve that from Cloud Shell?
As in many products, creating a service is separate from the IAM authorization. For the logging sink, the "strange" decision from Google has been to have the logging service generate a service account and send you the name of this service account in the command result:
Created [https://logging.googleapis.com/v2/projects/My_PROJECT/sinks/test].
Please remember to grant `serviceAccount:p78401601954-957849@gcp-sa-logging.iam.gserviceaccount.com` the BigQuery Data Editor role on the dataset.
More information about sinks can be found at https://cloud.google.com/logging/docs/export/configure_export
Not very usable if you want to script something. So, add the parameter --format=json to the sink creation command, and the result is the following:
{
"createTime": "2020-05-21T19:27:36.599050569Z",
"destination": "bigquery.googleapis.com/projects/My_PROJECT/datasets/asset_eu",
"filter": "resource.type=cloud_function",
"name": "test",
"updateTime": "2020-05-21T19:27:36.599050569Z",
"writerIdentity": "serviceAccount:p78401601954-465055#gcp-sa-logging.iam.gserviceaccount.com"
}
Now you can get the writerIdentity and grant the required role to it. However, I repeat, this choice is strange for Google (and not consistent with other products), and I won't be surprised if this behavior changes in the future.
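To script the grant, you can parse that JSON output and feed writerIdentity into an IAM binding. A sketch (note the project-level roles/bigquery.dataEditor grant built here is broader than the dataset-level grant the CLI suggests; tighten it if you can):

```python
import json

def writer_identity(sink_create_json):
    # sink_create_json: the string printed by
    #   gcloud logging sinks create ... --format=json
    return json.loads(sink_create_json)["writerIdentity"]

def grant_command(project_id, member):
    # Build the gcloud command granting BigQuery Data Editor to the sink's
    # writer identity at the project level.
    return (f"gcloud projects add-iam-policy-binding {project_id} "
            f"--member='{member}' --role='roles/bigquery.dataEditor'")
```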