BigQuery Storage Read API: the user does not have 'bigquery.readsessions.create' permission - google-cloud-platform

I'm trying to use the BigQuery Storage Read API. As far as I can tell, the local script is using an account that has the Owner, BigQuery User, and BigQuery Read Session User roles on the entire project. However, running the code from the local machine yields this error:
google.api_core.exceptions.PermissionDenied: 403 request failed: the user does not have 'bigquery.readsessions.create' permission for 'projects/xyz'
According to the GCP documentation, the API is enabled by default, so the only reason I can think of is that my script is using the wrong account.
How would you go about debugging this issue? Is there a way to know for sure which user/account is running Python code at run time, something like print(user.user_name)?

There is a gcloud command to get the project's IAM policy, which shows which roles each member has been granted:
$ gcloud projects get-iam-policy [PROJECT_ID]
You can also check the user_email field of your job to find out which account was used to execute your query.
Example:
{
  # ...
  "user_email": "myemail@company.com",
  "configuration": {
    # ...
    "jobType": "QUERY"
  },
  "jobReference": {
    "projectId": "my-project",
    # ...
  }
}
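To answer the runtime part of the question: below is a minimal sketch (using the google-auth library that the BigQuery client relies on) for printing which identity the Application Default Credentials resolve to; the attribute lookup is hedged because only service-account credentials expose an email.
import google.auth

# Load the Application Default Credentials that google-cloud clients pick up.
credentials, project_id = google.auth.default()
print("Project:", project_id)
# Service-account credentials expose service_account_email; end-user (ADC)
# credentials may not, hence the getattr fallback.
print("Identity:", getattr(credentials, "service_account_email", "end-user credentials"))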

Related

'403 Permission denied while getting Drive credentials' when using Deployment Manager to create an 'external table' in BigQuery

Steps to reproduce:
Create a sheet in Google Sheets
Enable the Deployment Manager & Google Drive APIs in Google Cloud Platform
Add the Deployment Manager service account with view permissions on the sheet
Create a dataset with Deployment Manager
Create a table with Deployment Manager, referencing the external sheet in sourceUris
Partial Python template:
def GenerateConfig(context):
    name: str = context.env['name']
    dataset: str = context.properties['dataset']
    tables: list = context.properties['tables']
    location: str = context.properties.get('location', 'EU')

    resources = [{
        'name': name,
        'type': 'gcp-types/bigquery-v2:datasets',
        'properties': {
            'datasetReference': {
                'datasetId': dataset,
            },
            'location': location
        },
    }]

    for t in tables:
        resources.append({
            'name': '{}-tbl'.format(t["name"]),
            'type': 'gcp-types/bigquery-v2:tables',
            'properties': {
                'datasetId': dataset,
                'tableReference': {
                    'tableId': t["name"]
                },
                'externalDataConfiguration': {
                    'sourceUris': ['https://docs.google.com/spreadsheets/d/123123123123123-123123123123/edit?usp=sharing'],
                    'sourceFormat': 'GOOGLE_SHEETS',
                    'autodetect': True,
                    'googleSheetsOptions': {
                        'skipLeadingRows': '1',
                    }
                }
            },
        })

    return {'resources': resources}
I've found a few leads such as this, but they all reference using 'scopes' to add https://www.googleapis.com/auth/drive.
I'm not sure how to add scopes to a Deployment Manager request, or really how scopes work.
Any help would be appreciated.
Yes, using scopes solves the problem. However, even after adding the scopes, I was still facing the same error. Sharing the Google Sheets document with the GCP service account helped me get rid of it.
To summarize - use scopes and share the document with the GCP service account that you will use for querying the table.
Also, this document is helpful for querying external tables
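For reference, here is a minimal sketch of querying a Sheets-backed external table from Python with both the BigQuery and Drive scopes; the key path, dataset and table names are placeholders, and the sheet must be shared with the service account's email.
from google.cloud import bigquery
from google.oauth2 import service_account

# Both scopes are needed: BigQuery for the query itself, Drive for the
# Sheets-backed external table.
scopes = [
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/drive",
]
credentials = service_account.Credentials.from_service_account_file(
    "key.json", scopes=scopes
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

for row in client.query("SELECT * FROM `my_dataset.my_sheet_table` LIMIT 10").result():
    print(dict(row))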
I was having the same issue when running Airflow DAGs on Cloud Composer, which is the managed Airflow service on Google Cloud Platform.
Essentially you need to:
Share the file with the email of the service account (give Viewer or Editor permissions based on what the DAG is supposed to execute)
Enable Google Drive OAuth Scopes
Depending on the Cloud Composer version you are using, the second step should be executed in a slightly different way:
For Cloud Composer 1
You should add the Google Drive OAuth Scope through the User Interface:
"https://www.googleapis.com/auth/drive"
Alternatively, if you are using Infrastructure as a Code (e.g. Terraform), you can specify oauth_scopes as shown below:
config {
  ...
  node_config {
    ...
    oauth_scopes = [
      "https://www.googleapis.com/auth/drive",
    ]
  }
}
For Cloud Composer 2
Since Cloud Composer v2 uses GKE Autopilot, it does not support OAuth scopes at the environment level. You can, however, specify the scope on the connection that your Airflow operator uses to initiate the connection.
If you are using the default GCP connection (i.e. google_cloud_default which is automatically created upon deployment of the Cloud Composer instance), then all you need to do is specify Google Drive ("https://www.googleapis.com/auth/drive") in the scopes of the connection (through Airflow Connections UI).
Alternatively, you can create a new connection, once again specify the Google Drive scope in its Scopes field, and then pass the name of this connection in the gcp_conn_id argument of your operator, as in the sketch below.
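A minimal sketch of that last option, assuming a hypothetical connection id google_cloud_drive whose Scopes field includes https://www.googleapis.com/auth/drive (project, dataset and table names are placeholders):
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# The "google_cloud_drive" connection is assumed to carry the Drive scope.
query_sheet_backed_table = BigQueryInsertJobOperator(
    task_id="query_sheet_backed_table",
    gcp_conn_id="google_cloud_drive",
    configuration={
        "query": {
            "query": "SELECT * FROM `my_project.my_dataset.my_sheet_table`",
            "useLegacySql": False,
        }
    },
)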

Google Cloud - creating sink

I'm trying to export logs into BigQuery using a sink from Cloud Shell.
I did the following steps:
bq mk dataset
gcloud beta logging sinks create my-bq-sink \
  bigquery.googleapis.com/projects/my-project/datasets/my_dataset \
  --log-filter='resource.type="gce_instance"'
I created a service account for the sink and bound it to the bigquery.dataEditor and logging.logWriter roles.
The problem is that unless I go to the console -> Edit sink -> Update sink, I get an error saying that my access to the dataset was denied. How can I solve that from Cloud Shell?
Like in many products, creating a resource is separate from the IAM authorization on it. For the logging sink, the "strange" decision from Google has been to have the Logging service generate a service account and to send you the name of this service account in the command result:
Created [https://logging.googleapis.com/v2/projects/My_PROJECT/sinks/test].
Please remember to grant `serviceAccount:p78401601954-957849@gcp-sa-logging.iam.gserviceaccount.com` the BigQuery Data Editor role on the dataset.
More information about sinks can be found at https://cloud.google.com/logging/docs/export/configure_export
Not very usable if you want to script something. So, add the parameter --format=json to the sink creation command and the result is the following:
{
  "createTime": "2020-05-21T19:27:36.599050569Z",
  "destination": "bigquery.googleapis.com/projects/My_PROJECT/datasets/asset_eu",
  "filter": "resource.type=cloud_function",
  "name": "test",
  "updateTime": "2020-05-21T19:27:36.599050569Z",
  "writerIdentity": "serviceAccount:p78401601954-465055@gcp-sa-logging.iam.gserviceaccount.com"
}
Now you can get the writerIdentity and grant the role that you need to it. However, I repeat, this choice is strange for Google (and not consistent with other products), and I wouldn't be surprised if this behavior changes in the future.
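If you prefer to script the whole flow from Python instead of parsing the gcloud output, here is a minimal sketch (project, dataset, sink name and filter are placeholders) using the Logging and BigQuery client libraries: create the sink, read its writer identity, and grant that identity write access on the dataset.
from google.cloud import bigquery
from google.cloud import logging as gcp_logging

PROJECT = "my-project"
DATASET = "my_dataset"

# Create the sink; unique_writer_identity asks Logging to mint a dedicated
# service account, exposed afterwards as sink.writer_identity.
logging_client = gcp_logging.Client(project=PROJECT)
sink = logging_client.sink(
    "my-bq-sink",
    filter_='resource.type="gce_instance"',
    destination="bigquery.googleapis.com/projects/{}/datasets/{}".format(PROJECT, DATASET),
)
sink.create(unique_writer_identity=True)
print("Writer identity:", sink.writer_identity)

# Grant the writer identity WRITER access on the dataset (the dataset-level
# equivalent of BigQuery Data Editor).
bq_client = bigquery.Client(project=PROJECT)
dataset = bq_client.get_dataset(DATASET)
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="WRITER",
        entity_type="userByEmail",
        entity_id=sink.writer_identity.replace("serviceAccount:", ""),
    )
)
dataset.access_entries = entries
bq_client.update_dataset(dataset, ["access_entries"])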

How to show and change user in Scheduled Queries

Some of the scheduled queries in Google Cloud Platform suddenly don't run anymore, with the message "Access Denied: ... User does not have bigquery.tables.get permission for table..."
First, is it possible to see under which user the scheduled query is running?
Second, is it possible to change the user?
Thanks, Silvan
I always use service accounts for command line execution...
If you can use the bq CLI, look at --service_account and --service_account_credential_file.
If you still want to use the scheduled query, there is some documentation on the service account at https://cloud.google.com/bigquery/docs/scheduling-queries (per above)
This can also be done (for a normal non-service account user) via the console as per the instructions at: https://cloud.google.com/bigquery/docs/scheduling-queries#update_scheduled_query_credentials
"To refresh the existing credentials on a scheduled query:
Find and view the status of a scheduled query.
Click the MORE button and select Update credentials."
Although this thread is 2 years old, it is still relevant. So I will guide you on how to troubleshoot this issue below:
Cause:
This issue happens when the user that is running the query does not have the required permissions. This could have been caused by a removal or update of the permissions of the scheduled query's user.
Step 1 - Checking which user is running the query:
Head to GCP - BigQuery - Scheduled Queries
Once on the Scheduled Queries screen, click on the display name of the query that needs to be checked and head to Configuration. There you will find the user that currently runs the query.
Step 2 - Understanding the permissions that are needed for running the query:
As specified on Google Cloud's website, you need 3 permissions:
bigquery.transfers.update and, on the dataset, bigquery.datasets.get and bigquery.datasets.update
Step 3 - Check running user's permissions:
From the GCP menu head to IAM & Admin - IAM
There you will find the permissions assigned to different users. Verify the permissions possessed by the user running the query.
Now we can solve this issue in 2 different ways:
Step 4 - Edit current user's roles or update the scheduler's credentials with an email that has the required permissions:
Option 1: Edit current user's roles: On the IAM screen you can click on "Edit principal" next to a user to add, remove or update roles (remember to add a role that complies with the permissions required mentioned in Step 2).
Option 2: Update credentials (as @coderintherye suggested in another answer): Head to GCP - BigQuery - Scheduled Queries, select the query you want to troubleshoot, head to MORE (in the top-right corner of the screen) - Update credentials - and finally choose an email. WARNING: That email will now be the user that runs the query, so make sure that it has the permissions needed as mentioned in Step 2.
To change a scheduled query from a user to a service account, you need to:
Make sure that the service account is from the same project as the one where you are running your scheduled query.
Both you as a user and the service account should have the appropriate permissions:
https://cloud.google.com/bigquery/docs/scheduling-queries#required_permissions
You can run a command from the CLI or Python code to make the change from a user to a service account:
CLI:
bq update \
--transfer_config \
--update_credentials \
--service_account_name=abcdef-test-sa@abcdef-test.iam.gserviceaccount.com \
projects/862514312345/locations/us/transferConfigs/5dd12f12-0000-122f-bc38-089e0820fe38
Python:
from google.cloud import bigquery_datatransfer
from google.protobuf import field_mask_pb2

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

service_account_name = "email address of your service account"
transfer_config_name = "projects/SOME_NUMBER/locations/EUROPE_OR_US/transferConfigs/A_LONG_ALPHANUMERIC_ID"

transfer_config = bigquery_datatransfer.TransferConfig(name=transfer_config_name)

transfer_config = transfer_client.update_transfer_config(
    {
        "transfer_config": transfer_config,
        "update_mask": field_mask_pb2.FieldMask(paths=["service_account_name"]),
        "service_account_name": service_account_name,
    }
)

print("Updated config: '{}'".format(transfer_config.name))
See also here for code examples:
https://cloud.google.com/bigquery/docs/scheduling-queries#update_scheduled_query_credentials
bq update --transfer_config --update_credentials --service_account_name=<service_accounnt> <resource_name>
service_account_name = the service account ID that you wish to use as a credential.
resource_name = the resource name of the scheduled query, which you can see in the Configuration section of the scheduled query's detail page, or list programmatically as shown in the sketch below.
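If you do not want to look the resource name up in the console, here is a minimal sketch for listing scheduled queries programmatically (the project and location are placeholders):
from google.cloud import bigquery_datatransfer

# Scheduled queries are transfer configs; their .name is the resource_name.
client = bigquery_datatransfer.DataTransferServiceClient()
parent = "projects/my-project/locations/us"
for config in client.list_transfer_configs(parent=parent):
    print(config.display_name, "->", config.name)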

Get the BigQuery Table creator and Google Storage Bucket Creator Details

I am trying to identify the users who created tables in BigQuery.
Is there any command line or API that would provide this information? I know that audit logs do provide this information, but I was looking for a command line tool that could do the job, so that I could wrap it in a shell script and run it against all the tables at one time. The same goes for Google Cloud Storage buckets. I did try
gsutil iam get gs://my-bkt and looked for the "role": "roles/storage.admin" binding, but I do not find the admin role on all buckets. Any help?
This is a use case for audit logs. BigQuery tables don't report metadata about the original resource creator, so scanning via tables.list or inspecting the ACLs doesn't really expose who created the resource, only who currently has access.
What's the use case? You could certainly export the audit logs back into BigQuery and query for table creation events going forward, but that's not exactly the same.
You can find this out using audit logs. You can access them either via the Console / Logs Explorer or using the gcloud tool from the CLI.
The log filter that you're interested in is this one:
resource.type = ("bigquery_project" OR "bigquery_dataset")
logName="projects/YOUR_PROJECT/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.methodName = "google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName = "projects/YOUR_PROJECT/datasets/curb_tracking/tables/YOUR_TABLE"
If you want to run it from the command line, you'd do something like this:
gcloud logging read \
'
resource.type = ("bigquery_project" OR "bigquery_dataset")
logName="projects/YOUR_PROJECT/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.methodName = "google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName = "projects/YOUR_PROJECT/datasets/curb_tracking/tables/YOUR_TABLE"
'\
--limit 10
You can then post-process the output to find out who created the table. Look for the principalEmail field.
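The same lookup can also be done from Python with the Logging client library; here is a minimal sketch, reusing the placeholders from the filter above:
from google.cloud import logging as gcp_logging

client = gcp_logging.Client(project="YOUR_PROJECT")
log_filter = (
    'resource.type=("bigquery_project" OR "bigquery_dataset") '
    'logName="projects/YOUR_PROJECT/logs/cloudaudit.googleapis.com%2Factivity" '
    'protoPayload.methodName="google.cloud.bigquery.v2.TableService.InsertTable" '
    'protoPayload.resourceName="projects/YOUR_PROJECT/datasets/curb_tracking/tables/YOUR_TABLE"'
)
# Each matching entry is an audit-log record; principalEmail identifies the creator.
for entry in client.list_entries(filter_=log_filter, max_results=10):
    payload = entry.to_api_repr().get("protoPayload", {})
    print(payload.get("authenticationInfo", {}).get("principalEmail"))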

Access google cloud spanner database list using service account

We have created a Cloud Spanner instance and databases in the Google Cloud console.
The following is the code snippet we are executing:
def getDatabaseList(self,):
    try:
        parent = "projects/" + self._PROJECT_NAME + "/instances/" + self._INSTANCE_NAME
        response = self.service.projects().instances().databases().list(parent=parent).execute()
    except Exception, e:
        logging.info("Exception while getDatabaseList %s", e)
        return False
    return response
In the above code snippet, self.service is the object returned by the googleapiclient library's build() call.
We are getting the below exception while executing the above code snippet using a service account ID:
Exception while getDatabaseList <HttpError 403 when requesting https://spanner.googleapis.com/v1/projects/<projectName>/instances/<instanceName>/databases?alt=json&key=<APIKEY>
returned "Resource projects/<projectName>/instances/<instanceName> is missing IAM permission: spanner.databases.list.">
Reference document cloud spanner IAM
The following link shows an example to list Databases in an instance using Python Spanner Client Library
https://github.com/googleapis/python-spanner/blob/main/samples/samples/snippets.py#L144
Regarding the IAM permission issue, it seems you have not set GOOGLE_APPLICATION_CREDENTIALS. @ACimander's answer is correct.
You can also use gcloud to authenticate with a service account:
gcloud auth activate-service-account SERVICE_ACCOUNT@DOMAIN.COM --key-file=/path/key.json --project=PROJECT_ID
More information on this can be found in https://cloud.google.com/sdk/gcloud/reference/auth/activate-service-account
A little late, but hopefully this helps: Did you set the path to your service account's JSON file correctly? I wasted half a day playing with the permissions until I figured out that I had simply missed an env key.
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account/key.json
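As a further reference, here is a minimal sketch of listing databases with the Cloud Spanner client library and an explicit service account key (the key path and instance name are placeholders), which avoids the raw discovery-based call used in the question:
from google.cloud import spanner

# Build the client from an explicit key file instead of relying on the
# GOOGLE_APPLICATION_CREDENTIALS environment variable.
client = spanner.Client.from_service_account_json("/path/to/your/service_account/key.json")
instance = client.instance("my-instance")

for database in instance.list_databases():
    print(database.name)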