Google Cloud - creating sink - google-cloud-platform

I'm trying to export logs into BigQuery using a sink created from Cloud Shell.
I did the following steps:
bq mk dataset
gcloud beta logging sinks create my-bq-sink \
bigquery.googleapis.com/projects/my-project/datasets/\
my_dataset --log-filter='resource.type="gce_instance"'
I created a service account for the sink and bound it to the bigquery.dataEditor and logging.logWriter roles.
The problem is that unless I go to the console -> Edit sink -> Update sink, I get an error saying that access to the dataset was denied. How can I solve this from Cloud Shell?

As with many products, creating the resource is separate from granting the IAM authorization. For logging sinks, Google made the somewhat strange decision to have the logging service generate a service account and to return its name in the command result:
Created [https://logging.googleapis.com/v2/projects/My_PROJECT/sinks/test].
Please remember to grant `serviceAccount:p78401601954-957849@gcp-sa-logging.iam.gserviceaccount.com` the BigQuery Data Editor role on the dataset.
More information about sinks can be found at https://cloud.google.com/logging/docs/export/configure_export
This is not very usable if you want to script something. Instead, add the --format=json parameter to the sink creation command; the result then looks like this:
{
"createTime": "2020-05-21T19:27:36.599050569Z",
"destination": "bigquery.googleapis.com/projects/My_PROJECT/datasets/asset_eu",
"filter": "resource.type=cloud_function",
"name": "test",
"updateTime": "2020-05-21T19:27:36.599050569Z",
"writerIdentity": "serviceAccount:p78401601954-465055@gcp-sa-logging.iam.gserviceaccount.com"
}
Now you can read the writerIdentity and grant it the role that it needs. However, I repeat, this choice is a strange one for Google (and not consistent with other products), and I won't be surprised if this behavior changes in the future.
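As a rough illustration of the scripting step, here is a minimal Python sketch that pulls the writerIdentity out of the JSON printed by the sink creation command and turns it into a dataset access entry. The function names are hypothetical. The entry shape (role plus userByEmail) is assumed to match the access list in the dataset's JSON representation, with WRITER being the dataset-level counterpart of BigQuery Data Editor; please check both against your own dataset before relying on them.

```python
import json

SA_PREFIX = "serviceAccount:"


def writer_identity(sink_json: str) -> str:
    """Return the writerIdentity member string from the sink-creation JSON output."""
    return json.loads(sink_json)["writerIdentity"]


def dataset_access_entry(member: str) -> dict:
    """Build a dataset access entry for the sink's service account.

    Assumption: the dataset access list accepts {"role", "userByEmail"}
    entries and WRITER corresponds to BigQuery Data Editor.
    """
    # Strip the "serviceAccount:" prefix; the access entry wants the bare email.
    email = member[len(SA_PREFIX):] if member.startswith(SA_PREFIX) else member
    return {"role": "WRITER", "userByEmail": email}
```

One possible use is to feed the command's JSON output to writer_identity, append the resulting entry to the "access" list of the dataset definition (for example the one shown by bq show with JSON output), and push the edited definition back with bq update --source.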

Related

Running Dataflow Flex template poll time out

I have two service accounts with exactly the same roles under the same project. One can run the Flex template without any issue, but the other fails and returns:
Timeout in polling result file: <LOGGING_BUCKET>. Service account: <SERVICE_ACCOUNT> Image URL: <IMAGE_URL> Troubleshooting guide at https://cloud.google.com/dataflow/docs/guides/common-errors#timeout-polling
The SA that fails doesn't write logs to the GCS bucket, which makes this really difficult to debug. The job graph doesn't get created and the job seems to get stuck at the queued stage. The roles of both SAs are:
BigQuery Admin
Bigtable User
Dataflow Developer
Editor
Storage Object Viewer
Sorry if this is obvious, but:
Have you checked the Google documentation linked from the error? (https://cloud.google.com/dataflow/docs/guides/common-errors#timeout-polling)
Do both SAs really have the same roles?
Let's say that SA1 can run Flex1, and SA2 can't run Flex2. Have you tried assigning SA1 to Flex2?
What differences could there be between the two SAs?
If you create SA3 with the same roles as SA2 and assign it to Flex2, does it work?
Good luck
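To check the "same roles" assumption mechanically, one option is to dump the project IAM policy as JSON (for example with gcloud projects get-iam-policy and --format=json) and diff the roles bound to each service account. The sketch below assumes the standard policy shape with a "bindings" list of role/members pairs; the function names are made up for illustration. It only sees project-level bindings, so differences on individual buckets or on the service accounts themselves would not show up here.

```python
def roles_for(policy: dict, member: str) -> set:
    """Collect every role that the policy binds to the given member."""
    return {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }


def role_diff(policy: dict, member_a: str, member_b: str) -> tuple:
    """Return (roles only member_a has, roles only member_b has), sorted."""
    a = roles_for(policy, member_a)
    b = roles_for(policy, member_b)
    return sorted(a - b), sorted(b - a)
```

If both lists come back empty, the difference probably lies outside the project-level policy.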

GCP Vertex AI Training Custom Job : User does not have bigquery.jobs.create permission

I'm struggling to execute a query with the BigQuery Python client from inside a Vertex AI custom training job on Google Cloud Platform.
I have built a Docker image that contains this Python code and pushed it to Container Registry (eu.gcr.io).
I am using this command to deploy:
gcloud beta ai custom-jobs create --region=europe-west1 --display-name="$job_name" \
--config=config_custom_container.yaml \
--worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri="$docker_img_path" \
--args="${model_type},${env},${now}"
I have even tried using the --service-account option to specify a service account with the BigQuery Admin role, but it did not work.
According to this link
https://cloud.google.com/vertex-ai/docs/general/access-control?hl=th#granting_service_agents_access_to_other_resources
the Google-managed service account for the AI Platform Custom Code Service Agent (Vertex AI) already has the right to access BigQuery, so I do not understand why my job fails with this error:
google.api_core.exceptions.Forbidden: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/*******/jobs?prettyPrint=false:
Access Denied: Project *******:
User does not have bigquery.jobs.create permission in project *******.
I have replaced the project ID with *******.
Edit:
I have tried several configurations; my last config YAML file contains only this:
baseOutputDirectory:
outputUriPrefix:
Setting the serviceAccount field in the YAML does not seem to change the actual configuration, unlike the --service-account option.
Edit 14-06-2021: Quick fix
As @Ricco.D said:
Try explicitly defining the project_id in your BigQuery code if you have not done this yet.
bigquery.Client(project="your-project")
This fixed my problem. I still do not know the cause.
To fix the issue, explicitly specify the project ID in the BigQuery code.
Example:
bigquery.Client(project="your-project", credentials=credentials)
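A minimal sketch of how the explicit project could be wired into a training container, so the client does not depend on whatever project the runtime infers. The helper names and the BQ_PROJECT_ID environment variable are hypothetical (nothing in the thread defines them); the only library call used is bigquery.Client(project=...), as in the fix above.

```python
import os


def resolve_project_id(explicit=None, env=None):
    """Pick the project ID from an explicit argument or a job-supplied env var.

    BQ_PROJECT_ID is a hypothetical variable name that the job itself would
    have to set; it is not something the platform is known to provide.
    """
    env = os.environ if env is None else env
    project = explicit or env.get("BQ_PROJECT_ID")
    if not project:
        raise ValueError("No project ID given; pass one explicitly or set BQ_PROJECT_ID")
    return project


def make_client(explicit=None):
    """Create a BigQuery client bound to an explicitly chosen project."""
    # Imported lazily so the helper above stays usable without the library.
    from google.cloud import bigquery  # requires google-cloud-bigquery

    return bigquery.Client(project=resolve_project_id(explicit))
```

Failing fast when no project is available makes a misconfigured job stop with a clear message instead of a 403 against an unexpected project.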

BigQuery Storage Read API: the user does not have 'bigquery.readsessions.create'

I'm trying to use the BigQuery Storage Read API. As far as I can tell, the local script is using an account that has the Owner, BigQuery User, and BigQuery Read Session User roles on the entire project. However, running the code from the local machine yields this error:
google.api_core.exceptions.PermissionDenied: 403 request failed: the user does not have 'bigquery.readsessions.create' permission for 'projects/xyz'
According to the GCP documentation the API is enabled by default. So the only reason I can think of is my script is using the wrong account.
How would you go about debugging this issue? Is there a way to know for sure which user/account is running the Python code at run time, something like print(user.user_name)?
There is a gcloud command that prints the project's IAM policy, which shows the roles bound to each user and service account:
$ gcloud projects get-iam-policy [PROJECT_ID]
You can also check the user_email field of your job to find out which user it is using to execute your query.
Example:
{
# ...
"user_email": "myemail@company.com",
"configuration": {
# ...
"jobType": "QUERY"
},
"jobReference": {
"projectId": "my-project",
# ...
}
}
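For the "which account is my script using" part, one narrow check is sketched below: if the script authenticates through a key file referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable, the file itself says what kind of credential it is. The function name is hypothetical. This only covers the key-file case; credentials coming from gcloud user login or a metadata server would need a different check, and the fields read here are simply whatever the file happens to contain.

```python
import json
import os


def describe_credentials_file(path=None):
    """Summarize the credentials file the client libraries would pick up.

    Returns None when no path is given and GOOGLE_APPLICATION_CREDENTIALS
    is not set, which means this check cannot tell which account is used.
    """
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return None
    with open(path) as fh:
        info = json.load(fh)
    # Service account key files carry client_email; user credential files
    # usually do not, so that field may come back as None.
    return {
        "type": info.get("type"),
        "client_email": info.get("client_email"),
        "project_id": info.get("project_id"),
    }
```

Printing the result at the start of the script, before creating any client, shows whether the account matches the one that was granted the roles.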

Permissions Issue with Google Cloud Data Fusion

I'm following the instructions in the Cloud Data Fusion sample tutorial and everything seems to work fine, until I try to run the pipeline right at the end. Cloud Data Fusion Service API permissions are set for the Google managed Service account as per the instructions. The pipeline preview function works without any issues.
However, when I deploy and run the pipeline it fails after a couple of minutes. Shortly after the status changes from provisioning to running the pipeline stops with the following permissions error:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X.",
"reason" : "forbidden"
} ],
"message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X."
}
xxxxxxxxxxx-compute@developer.gserviceaccount.com is the default Compute Engine service account for my project.
"Project X" is not one of mine, though, and I have no idea why the pipeline startup code is trying to create a bucket there. It does successfully create temporary buckets (one called df-xxx and one called dataproc-xxx) in my project before it fails.
I've tried this with two separate accounts and get the same error in both places. I had tried adding storage admin roles to the various service accounts to no avail, but that was before I realized it was attempting to access a different project entirely.
I believe I was able to reproduce this. What's happening is that the BigQuery Source plugin first creates a temporary working GCS bucket to export the data to, and I suspect it is attempting to create it in the Dataset Project ID by default, instead of your own project as it should.
As a workaround, create a GCS bucket in your account, and then in the BigQuery Source configuration of your pipeline, set the "Temporary Bucket Name" configuration to "gs://<your-bucket-name>"
You are missing the permission setup steps that follow instance creation. The instructions for giving your service account the right permissions are on this page: https://cloud.google.com/data-fusion/docs/how-to/create-instance

Get the BigQuery Table creator and Google Storage Bucket Creator Details

I am trying to identify the users who created tables in BigQuery.
Is there a command-line tool or API that would provide this information? I know that audit logs do provide it, but I was looking for a command that I could wrap in a shell script and run against all the tables at once. I need the same for Google Storage buckets as well. I did try gsutil iam get gs://my-bkt and looked for the "role": "roles/storage.admin" binding, but I do not find the admin role on all buckets. Any help?
This is a use case for audit logs. BigQuery tables don't report metadata about the original resource creator, so scanning via tables.list or inspecting the ACLs doesn't really expose who created the resource, only who currently has access.
What's the use case? You could certainly export the audit logs back into BigQuery and query for table creation events going forward, but that's not exactly the same.
You can find this out using audit logs. You can access them either in the console (Logs Explorer) or with the gcloud tool from the CLI.
The log filter that you're interested in is this one:
resource.type = ("bigquery_project" OR "bigquery_dataset")
logName="projects/YOUR_PROJECT/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.methodName = "google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName = "projects/YOUR_PROJECT/datasets/curb_tracking/tables/YOUR_TABLE"
If you want to run it from the command line, you'd do something like this:
gcloud logging read \
'
resource.type = ("bigquery_project" OR "bigquery_dataset")
logName="projects/YOUR_PROJECT/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.methodName = "google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName = "projects/YOUR_PROJECT/datasets/curb_tracking/tables/YOUR_TABLE"
' \
--limit 10
You can then post-process the output to find out who created the table. Look for the protoPayload.authenticationInfo.principalEmail field.
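The post-processing could look roughly like the Python sketch below. It assumes the entries were fetched with JSON output (for example by adding --format=json to the gcloud command), so the input is a JSON list of log entries; the function name is hypothetical and missing fields simply come back as None.

```python
import json


def table_creators(entries_json: str) -> list:
    """Extract (resource name, principal email, timestamp) from audit log entries."""
    results = []
    for entry in json.loads(entries_json):
        payload = entry.get("protoPayload", {})
        # The creator is recorded under authenticationInfo.principalEmail.
        email = payload.get("authenticationInfo", {}).get("principalEmail")
        results.append((payload.get("resourceName"), email, entry.get("timestamp")))
    return results
```

Dropping the protoPayload.resourceName line from the filter should return creation events for all tables in one call, which may suit the "all tables at once" requirement better than looping per table.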