“Create new version” ignores custom service account - google-cloud-platform

I'm trying to deploy a new version of a model to AI Platform; it's a custom prediction routine. I've managed to deploy just fine when all the resources are in the same GCP project, but when I point the GCS files to a bucket in a different project, the deployment fails. So I'm trying to pass which service account to use when creating the version, but it keeps getting ignored.
This is the message I get:
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://ml.googleapis.com/v1/projects/[gcp-project-1]/models/[model_name]/versions?alt=json returned "Field: version.deployment_uri Error: The provided GCS prefix [gs://[bucket-gcp-project-2]/] cannot be read by service account service-*****@cloud-ml.google.com.iam.gserviceaccount.com.". Details: "[{'@type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'version.deployment_uri', 'description': 'The provided GCS prefix [gs://[bucket-gcp-project-2]/] cannot be read by service account service-******@cloud-ml.google.com.iam.gserviceaccount.com.'}]}]
My request looks like this:
POST https://ml.googleapis.com/v1/projects/[gcp-project-1]/models/[model_name]/versions?alt=json
{
  "name": "v1",
  "deploymentUri": "gs://[bucket-gcp-project-2]",
  "pythonVersion": "3.5",
  "runtimeVersion": "1.13",
  "package_uris": "gs://[bucket-gcp-project-2]/model.tar.gz",
  "predictionClass": "predictor.Predictor",
  "serviceAccount": "my-service-account@[gcp-project-1].iam.gserviceaccount.com"
}
The service account has access in both projects

Specifying a service account is documented as a beta feature. Try using the gcloud SDK, e.g.:
gcloud components install beta
gcloud beta ai-platform versions create v1 \
  --service-account my-service-account@[gcp-project-1].iam.gserviceaccount.com ...
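If you prefer to stay with the Python API client the question is already using, the same beta field can be set in the request body. A minimal sketch, assuming google-api-python-client with Application Default Credentials; the project, model, bucket, and service-account names are placeholders:
# A sketch using google-api-python-client (the client from the question).
# Assumes Application Default Credentials; all names below are placeholders.
from googleapiclient import discovery

ml = discovery.build("ml", "v1")

body = {
    "name": "v1",
    "deploymentUri": "gs://bucket-gcp-project-2",
    "runtimeVersion": "1.13",
    "pythonVersion": "3.5",
    "packageUris": ["gs://bucket-gcp-project-2/model.tar.gz"],
    "predictionClass": "predictor.Predictor",
    # Beta field: the service account used for resource access control,
    # instead of the default Cloud ML service agent.
    "serviceAccount": "my-service-account@gcp-project-1.iam.gserviceaccount.com",
}

request = ml.projects().models().versions().create(
    parent="projects/gcp-project-1/models/model_name",
    body=body,
)
print(request.execute())
Note that packageUris is a repeated field, so it is passed as a list here.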

Accessing Cloud storage bucket in a java SDK Apache Beam pipeline yields 401 Invalid credentials

I'm trying to read a CSV from a Cloud Storage bucket and store it in a PCollection. To authenticate with the bucket, I'm using a service account with roles/storage.admin and a JSON key. This is my PipelineOptions object:
DataflowPipelineOptions dfOptions = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
dfOptions.setProject("project_name");
dfOptions.setStagingLocation("bucket_name");
dfOptions.setGcpCredential(GoogleCredentials.fromStream(
        new FileInputStream(PATH_TO_JSON_KEY)));
dfOptions.setTempLocation("gs://bucket_name/folder_name");
dfOptions.setServiceAccount("service_account_name");
Pipeline myPipe = Pipeline.create(dfOptions);
PCollection<ReadableFile> readFile = myPipe
        .apply(FileIO.match().filepattern("gs://bucket_name/file_name.csv"))
        .apply(FileIO.readMatches());
However, running the above-mentioned pipeline results in the error:
Caused by: java.io.IOException: Error trying to get gs://bucket_name/object_name.csv: {"code":401,"errors":[{"domain":"global","location":"Authorization","locationType":"header","message":"Invalid Credentials","reason":"authError"}],"message":"Invalid Credentials"}
If I use the DataflowRunner instead, by adding the following to my PipelineOptions:
dfOptions.setRunner(DataflowRunner.class);
I get the same exact error for my staging bucket.
401 Unauthorized
GET https://storage.googleapis.com/storage/v1/b/bucket_name
{
"code" : 401,
...same as above...
}
I'm using the same credentials to access the same bucket with the GCS Java client library, and it works absolutely fine.
StorageOptions options = StorageOptions.newBuilder()
        .setProjectId(PROJECT_ID)
        .setCredentials(GoogleCredentials.fromStream(
                new FileInputStream(PATH_TO_JSON_KEY)))
        .build();
Storage storage = options.getService();
Blob blob = storage.get(BUCKET_NAME, OBJECT_NAME);
ReadChannel r = blob.reader();
I also downloaded the same file from the same bucket with the same service account and key using gsutil, with no problems. The problem only occurs when using Apache Beam.
Versions of the various dependencies I'm using:
Apache Beam 2.24
google-cloud-storage 2.11.3
google-cloud-dataflow-java-sdk-all 2.5.0
google-api-client 1.35.1
It is worth noting that Dataflow's support for Apache Beam 2.24.0 was deprecated on September 18, 2021. A first step would be to update to a recent version of the SDK. In particular, Beam has adopted the GCP Libraries BOM, which coordinates the versions of the GCP client libraries and auth libraries.
Instead of setting the service account in the code of the Dataflow job, you can pass it as a program argument when launching the job, --serviceAccount.
Example:
mvn compile exec:java \
  -Dexec.mainClass=yourApp \
  -Dexec.args=" \
  --project=your_project \
  --serviceAccount=your_account_id@project.gserviceaccount.com"
You can check the documentation:
https://cloud.google.com/dataflow/docs/concepts/security-and-permissions
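As an aside, the Beam Python SDK exposes the equivalent pipeline option as --service_account_email. A rough sketch for comparison only; the runner, project, bucket, region, and service-account values below are placeholders, not taken from the question:
# A comparison sketch with the Beam Python SDK; the Java option
# --serviceAccount corresponds to --service_account_email here.
# Project, bucket, region, and service-account names are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="your_project",
    temp_location="gs://bucket_name/temp",
    region="us-central1",
)
# The Dataflow workers run as this identity, so it is the account that
# needs read access to the bucket.
options.view_as(GoogleCloudOptions).service_account_email = (
    "your_account_id@your_project.iam.gserviceaccount.com"
)

with beam.Pipeline(options=options) as p:
    p | "ReadCsv" >> beam.io.ReadFromText("gs://bucket_name/file_name.csv")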

GCP Vertex AI Training Custom Job: User does not have bigquery.jobs.create permission

I'm struggling to execute a query with the BigQuery Python client from inside a Vertex AI custom training job on Google Cloud Platform.
I have built a Docker image which contains this Python code, then pushed it to Container Registry (eu.gcr.io).
I am using this command to deploy:
gcloud beta ai custom-jobs create --region=europe-west1 --display-name="$job_name" \
--config=config_custom_container.yaml \
--worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri="$docker_img_path" \
--args="${model_type},${env},${now}"
I have even tried to use the --service-account option to specify a service account with the BigQuery Admin role; it did not work.
According to this link
https://cloud.google.com/vertex-ai/docs/general/access-control?hl=th#granting_service_agents_access_to_other_resources
the Google-managed service accounts for the AI Platform Custom Code Service Agent (Vertex AI) already have the right to access BigQuery, so I do not understand why my job fails with this error:
google.api_core.exceptions.Forbidden: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/*******/jobs?prettyPrint=false:
Access Denied: Project *******:
User does not have bigquery.jobs.create permission in project *******.
I have replaced the id with *******
Edit:
I have tried several configurations; my last config YAML file only contains this:
baseOutputDirectory:
outputUriPrefix:
Using the serviceAccount field does not seem to change the actual configuration, unlike the --service-account option.
Edit 14-06-2021: Quick fix
As @Ricco.D said:
try explicitly defining the project_id in your bigquery code if you have not done this yet.
bigquery.Client(project=[your-project])
This has fixed my problem. I still do not know the cause.
To fix the issue, you need to explicitly specify the project ID in the BigQuery code.
Example:
bigquery.Client(project=[your-project], credentials=credentials)
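For context, a minimal sketch of what that looks like inside the training container; the project ID and query are placeholders, and the credentials argument is only needed if you are not relying on the job's service account:
# A sketch of the quick fix inside the custom training container.
# The project ID and query below are placeholders.
from google.cloud import bigquery

# Passing the project explicitly keeps the client from guessing a project
# where the job's service account lacks bigquery.jobs.create.
client = bigquery.Client(project="your-project-id")

rows = client.query("SELECT 1 AS ok").result()
for row in rows:
    print(row.ok)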

Google Cloud - creating sink

I'm trying to export logs into BigQuery using a sink from Cloud Shell.
I did the following steps:
bq mk dataset
gcloud beta logging sinks create my-bq-sink \
  bigquery.googleapis.com/projects/my-project/datasets/my_dataset \
  --log-filter='resource.type="gce_instance"'
I created a service account for the sink and bound it to bigquery.dataEditor and logging.logWriter.
The problem is that unless I go to the console -> edit sink -> update sink, I get an error saying that my access to the dataset was denied. How can I solve that from Cloud Shell?
Like in many products, creating a service is separate from the IAM authorization. For the logging sink, the "strange" decision from Google has been to have the Logging service generate a service account and send you its name in the command result:
Created [https://logging.googleapis.com/v2/projects/My_PROJECT/sinks/test].
Please remember to grant `serviceAccount:p78401601954-957849@gcp-sa-logging.iam.gserviceaccount.com` the BigQuery Data Editor role on the dataset.
More information about sinks can be found at https://cloud.google.com/logging/docs/export/configure_export
Not very usable if you want to script something. So, add the parameter --format=json to the sink creation command, and the result is the following:
{
  "createTime": "2020-05-21T19:27:36.599050569Z",
  "destination": "bigquery.googleapis.com/projects/My_PROJECT/datasets/asset_eu",
  "filter": "resource.type=cloud_function",
  "name": "test",
  "updateTime": "2020-05-21T19:27:36.599050569Z",
  "writerIdentity": "serviceAccount:p78401601954-465055@gcp-sa-logging.iam.gserviceaccount.com"
}
Now you can get the writerIdentity and grant the role that you need on it. However, I repeat, this choice is strange for Google (and not consistent with other products), and I won't be surprised if this behavior changes in the future.
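If you would rather script this in Python than parse gcloud output, a minimal sketch with the google-cloud-logging client library; the project, sink, filter, and dataset names are placeholders:
# A sketch using the google-cloud-logging client library; project, sink,
# filter, and dataset names below are placeholders.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="my-project")

sink = client.sink(
    "my-bq-sink",
    filter_='resource.type="gce_instance"',
    destination="bigquery.googleapis.com/projects/my-project/datasets/my_dataset",
)
sink.create(unique_writer_identity=True)

# This is the identity that must be granted BigQuery Data Editor on the
# destination dataset before the sink can export anything.
print(sink.writer_identity)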

I'm getting an error creating an AWS AppSync Authenticated DataSource

I'm working through the Build On Serverless|S2 E4 video and I've gotten to the point of creating an authenticated HTTP datasource using the AWS CLI. I'm getting this error:
Parameter validation failed:
Unknown parameter in httpConfig: "authorizationConfig", must be one of: endpoint
I think I'm using the same information provided in the video, repository and gist, updated for my own AWS account. It seems like it's some kind of formatting or missing-information error, but I'm just not seeing the problem.
When I remove the "authorizationConfig" property from the state-machine-datasource.json the command works.
I've reviewed the code against the information in the video as well as the documentation and examples here and here provided by AWS.
This is the command I'm running.
aws appsync create-data-source --api-id {my app sync app id} --name ProcessBookingStateMachine \
  --type HTTP --http-config file://src/backend/booking/state-machine-datasource.json \
  --service-role-arn arn:aws:iam::{my account}:role/AppSyncProcessBookingState --profile default
This is my state-machine-datasource.json:
{
  "endpoint": "https://states.us-east-2.amazonaws.com",
  "authorizationConfig": {
    "authorizationType": "AWS_IAM",
    "awsIamConfig": {
      "signingRegion": "us-east-2",
      "signingServiceName": "states"
    }
  }
}
Thanks,
I needed to update my AWS CLI to the latest version. The authenticated HTTP datasource is something fairly new, I guess.

Permissions Issue with Google Cloud Data Fusion

I'm following the instructions in the Cloud Data Fusion sample tutorial and everything seems to work fine until I try to run the pipeline right at the end. The Cloud Data Fusion Service API permissions are set for the Google-managed service account as per the instructions. The pipeline preview function works without any issues.
However, when I deploy and run the pipeline it fails after a couple of minutes. Shortly after the status changes from provisioning to running the pipeline stops with the following permissions error:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X.",
    "reason" : "forbidden"
  } ],
  "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X."
}
xxxxxxxxxxx-compute@developer.gserviceaccount.com is the default Compute Engine service account for my project.
"Project X" is not one of mine, though; I've no idea why the pipeline startup code is trying to create a bucket there. It does successfully create temporary buckets (one called df-xxx and one called dataproc-xxx) in my project before it fails.
I've tried this with two separate accounts and get the same error in both places. I had tried adding storage admin roles to the various service accounts, to no avail, but that was before I realized it was attempting to access a different project entirely.
I believe I was able to reproduce this. What's happening is that the BigQuery Source plugin first creates a temporary working GCS bucket to export the data to, and I suspect it is attempting to create it in the Dataset Project ID by default, instead of your own project as it should.
As a workaround, create a GCS bucket in your account, and then in the BigQuery Source configuration of your pipeline, set the "Temporary Bucket Name" configuration to "gs://<your-bucket-name>".
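If you want to script that workaround bucket rather than create it in the console, a small sketch with the google-cloud-storage Python client; the project ID, bucket name, and location are placeholders:
# A sketch of creating the workaround bucket with the google-cloud-storage
# client; project ID, bucket name, and location below are placeholders.
from google.cloud import storage

client = storage.Client(project="your-project-id")
bucket = client.create_bucket("your-df-temp-bucket", location="US")
# Use gs://<bucket-name> as the "Temporary Bucket Name" in the BigQuery Source.
print(f"Created gs://{bucket.name}")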
You are missing the permission-setup steps after you create an instance. The instructions for giving your service account the right permissions are on this page: https://cloud.google.com/data-fusion/docs/how-to/create-instance