How to show and change user in Scheduled Queries - google-cloud-platform

Some of the scheduled queries in Google Cloud Platform suddenly don't run anymore, with the message "Access Denied: ... User does not have bigquery.tables.get permission for table..."
First, is it possible to see under which user the scheduled query is running?
Second, is it possible to change the user?
Thanks, Silvan

I always use service accounts for command line execution...
If you can use the bq CLI, look at the --service_account and --service_account_credential_file flags.
If you still want to use scheduled queries, there is documentation on using a service account at https://cloud.google.com/bigquery/docs/scheduling-queries (per above)
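If the CLI flags are not an option, here is a minimal sketch of the same idea in Python: authenticate a BigQuery client explicitly with a service account key file so every job runs as that service account (the key path and query are placeholders):

from google.cloud import bigquery
from google.oauth2 import service_account

# Load service account credentials from a key file (placeholder path).
credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service-account-key.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

# Every job issued by this client runs as the service account.
client = bigquery.Client(credentials=credentials, project=credentials.project_id)
for row in client.query("SELECT 1 AS x").result():
    print(row.x)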

This can also be done (for a normal non-service account user) via the console as per the instructions at: https://cloud.google.com/bigquery/docs/scheduling-queries#update_scheduled_query_credentials
"To refresh the existing credentials on a scheduled query:
Find and view the status of a scheduled query.
Click the MORE button and select Update credentials."

Although this thread is 2 years old, it is still relevant, so I will guide you through troubleshooting this issue below:
Cause:
This issue happens when the user that was running the query does not meet the required permissions. This could have been caused by a permissions removal or update of the scheduled query's user.
Step 1 - Checking which user is running the query:
Head to GCP - BigQuery - Scheduled Queries
Once on the scheduled queries screen, click the display name of the query that needs to be checked and head to Configuration. There you will find the user that currently runs the query.
Step 2 - Understanding the permissions that are needed for running the query:
As specified in Google Cloud's documentation, you need 3 permissions:
bigquery.transfers.update, plus, on the dataset: bigquery.datasets.get and bigquery.datasets.update
Step 3 - Check running user's permissions:
From the GCP menu head to IAM & Admin - IAM
There you will find the permissions assigned to different users. Verify the permissions possessed by the user running the query.
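If you prefer to check this programmatically, here is a minimal sketch using the Resource Manager Python client (assuming google-cloud-resource-manager v3 is installed; the project ID and principal are placeholders): it fetches the project's IAM policy and lists the roles bound to one principal.

from google.cloud import resourcemanager_v3

client = resourcemanager_v3.ProjectsClient()

# Fetch the project-level IAM policy and print every role bound to the principal.
policy = client.get_iam_policy(resource="projects/your-project-id")
principal = "user:someone@example.com"  # hypothetical principal
for binding in policy.bindings:
    if principal in binding.members:
        print(binding.role)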
Now we can solve this issue in 2 different ways:
Step 4 - Edit current user's roles or update the scheduler's credentials with an email that has the required permissions:
Option 1: Edit the current user's roles: on the IAM screen you can click "Edit principal" next to a user to add, remove, or update roles (remember to add a role that grants the permissions required in Step 2).
Option 2: Update the credentials (as @coderintherye suggested in another answer): head to GCP - BigQuery - Scheduled Queries, select the query you want to troubleshoot, head to MORE (in the top-right corner of the screen) - Update credentials - and finally choose an email. WARNING: that email will now be the user that runs the query, so make sure it has the permissions mentioned in Step 2.

To change a scheduled query from a user to a service account, you need to:
make sure that the service account is in the same project as the scheduled query.
You as a user, and the service account, should both have the appropriate permissions:
https://cloud.google.com/bigquery/docs/scheduling-queries#required_permissions
You can run a command from the CLI or python code to make the change from user to service account:
CLI:
bq update \
--transfer_config \
--update_credentials \
--service_account_name=abcdef-test-sa@abcdef-test.iam.gserviceaccount.com \
projects/862514312345/locations/us/transferConfigs/5dd12f12-0000-122f-bc38-089e0820fe38
Python:
from google.cloud import bigquery_datatransfer
from google.protobuf import field_mask_pb2

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

service_account_name = "email address of your service account"
transfer_config_name = "projects/SOME_NUMBER/locations/EUROPE_OR_US/transferConfigs/A_LONG_ALPHANUMERIC_ID"

transfer_config = bigquery_datatransfer.TransferConfig(name=transfer_config_name)

transfer_config = transfer_client.update_transfer_config(
    {
        "transfer_config": transfer_config,
        "update_mask": field_mask_pb2.FieldMask(paths=["service_account_name"]),
        "service_account_name": service_account_name,
    }
)

print("Updated config: '{}'".format(transfer_config.name))
See also here for code examples:
https://cloud.google.com/bigquery/docs/scheduling-queries#update_scheduled_query_credentials

bq update --transfer_config --update_credentials --service_account_name=<service_account> <resource_name>
service_account = the service account ID that you wish to use as the credential.
resource_name = the resource name of the scheduled query, which you can see in the configuration section of the scheduled query's detail page.

Related

GCP logging: Find all resources (recently) used by a specific user

This is part of my journey to get a clear overview of which users/service accounts are in my GCP project and when they last logged in.
End goal: to be able to clean up users/service accounts when they haven't been active on GCP for a long time.
First question:
How can I find in the logs when a specific user used resources, so I can determine when this person last logged in?
You need the audit logs, and to see them you can run the following query in Cloud Logging:
protoPayload.@type="type.googleapis.com/google.cloud.audit.AuditLog"
protoPayload.authenticationInfo.principalEmail="your_user_name_email_or_your_service_account_email"
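If you want to run that lookup from Python instead of the Logs Explorer, here is a minimal sketch using the google-cloud-logging client (the principal email is a placeholder):

from google.cloud import logging

client = logging.Client()
log_filter = (
    'protoPayload.@type="type.googleapis.com/google.cloud.audit.AuditLog" '
    'AND protoPayload.authenticationInfo.principalEmail='
    '"your_user_or_service_account@example.com"'
)
# Newest entries first, so the first result is the principal's last activity.
for entry in client.list_entries(
    filter_=log_filter, order_by=logging.DESCENDING, max_results=10
):
    print(entry.timestamp, entry.log_name)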
You can also check the Activity logs and filter on a user:
https://console.cloud.google.com/home/activity
Related questions + answers:
Pull "last access" information on projects from Google Cloud Platform (GCP)
IAM users and last login date in google cloud
How to list, find, or search iam policies across services (APIs), resource types, and projects in google cloud platform (GCP)?
There is now also the newly added Log Analytics.
This allows you to use SQL to query your logs.
Your logging buckets _Default and _Required need to be upgraded to be able to use Log Analytics:
https://cloud.google.com/logging/docs/buckets#upgrade-bucket
After that you use for example the console to use SQL on your logs:
https://console.cloud.google.com/logs/analytics
Unfortunately, at the moment you can only query the logs that were created after you've switched on Log Analytics.
Example query in the Log Analytics:
SELECT
  timestamp,
  proto_payload.audit_log.authentication_info.principal_email,
  auth_info.resource,
  auth_info.permission,
  auth_info.granted
FROM
  `logs__Default_US._AllLogs`
LEFT JOIN
  UNNEST(proto_payload.audit_log.authorization_info) auth_info
WHERE
  timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND proto_payload.type = "type.googleapis.com/google.cloud.audit.AuditLog"
  AND proto_payload.audit_log.authentication_info.principal_email IN ("name_of_your_user")
ORDER BY
  timestamp

Schedule query failure in GCP with 'The caller does not have permission' error

So I created a python script similar to the BQ tutorial on scheduled queries [1]. The service account has been set using os.environ. When executing with BigQuery Admin and other similar permissions (Data User, Data Transfer Agent, Data Viewer, etc.), the scheduled query creation fails with
status = StatusCode.PERMISSION_DENIED
details = "The caller does not have permission"
The lowest permission level it accepts is 'Project Owner'. As this is a service account, I was hoping a lower permission level could be applied, e.g. BigQuery Admin, since all I need the service account for is to remotely create scheduled queries. Even the how-to guide says it should work. Can anyone provide some input on whether there is any other combination of permissions that will allow this to work, please?
[1]: https://cloud.google.com/bigquery/docs/scheduling-queries#set_up_scheduled_queries
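For reference, creating a scheduled query with an attached service account looks roughly like the following sketch, based on the tutorial linked above (the project, dataset, and service account names are placeholders). Note that the caller typically also needs permission to act as the attached service account (iam.serviceAccounts.actAs), which may explain the PERMISSION_DENIED.

from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="your_dataset",  # placeholder
    display_name="Your Scheduled Query Name",
    data_source_id="scheduled_query",
    params={
        "query": "SELECT 1 AS x",
        "destination_table_name_template": "your_table_{run_date}",
        "write_disposition": "WRITE_TRUNCATE",
    },
    schedule="every 24 hours",
)

transfer_config = transfer_client.create_transfer_config(
    bigquery_datatransfer.CreateTransferConfigRequest(
        parent=transfer_client.common_project_path("your-project-id"),
        transfer_config=transfer_config,
        service_account_name="your-sa@your-project-id.iam.gserviceaccount.com",
    )
)
print("Created scheduled query '{}'".format(transfer_config.name))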

Get the BigQuery Table creator and Google Storage Bucket Creator Details

I am trying to identify the users who created tables in BigQuery.
Is there any command line or API that would provide this information? I know that audit logs do provide this information, but I was looking for a command line that I could wrap in a shell script and run against all the tables at one time. The same goes for Google Storage buckets. I did try
gsutil iam get gs://my-bkt and looked for the "role": "roles/storage.admin" entry, but I do not find the admin role on all buckets. Any help?
This is a use case for audit logs. BigQuery tables don't report metadata about the original resource creator, so scanning via tables.list or inspecting the ACLs doesn't really expose who created the resource, only who currently has access.
What's the use case? You could certainly export the audit logs back into BigQuery and query for table creation events going forward, but that's not exactly the same.
You can find it out using audit logs. You can access them via the Console / Logs Explorer or using the gcloud tool from the CLI.
The log filter that you're interested in is this one:
resource.type = ("bigquery_project" OR "bigquery_dataset")
logName="projects/YOUR_PROJECT/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.methodName = "google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName = "projects/YOUR_PROJECT/datasets/curb_tracking/tables/YOUR_TABLE"
If you want to run it from the command line, you'd do something like this:
gcloud logging read \
'
resource.type = ("bigquery_project" OR "bigquery_dataset")
logName="projects/YOUR_PROJECT/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.methodName = "google.cloud.bigquery.v2.TableService.InsertTable"
protoPayload.resourceName = "projects/YOUR_PROJECT/datasets/curb_tracking/tables/YOUR_TABLE"
'\
--limit 10
You can then post-process the output to find out who created the table. Look for the principalEmail field.
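The same lookup can also be done in Python with the google-cloud-logging client, which makes the post-processing easier. A sketch using the filter above (names are still placeholders; for audit entries, entry.payload should be the AuditLog proto rendered as a dict):

from google.cloud import logging

client = logging.Client()
log_filter = (
    'resource.type=("bigquery_project" OR "bigquery_dataset") '
    'AND logName="projects/YOUR_PROJECT/logs/cloudaudit.googleapis.com%2Factivity" '
    'AND protoPayload.methodName="google.cloud.bigquery.v2.TableService.InsertTable" '
    'AND protoPayload.resourceName="projects/YOUR_PROJECT/datasets/curb_tracking/tables/YOUR_TABLE"'
)
for entry in client.list_entries(filter_=log_filter, max_results=10):
    # principalEmail identifies who created the table.
    auth_info = entry.payload.get("authenticationInfo", {})
    print(entry.timestamp, auth_info.get("principalEmail"))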

Workflow failed. Causes: There was a problem refreshing your credentials

I have a problem with dataflow, I need to execute a job and I get the following error:
Workflow failed. Causes: There was a problem refreshing your credentials. Please check:
1. Dataflow API is enabled for your project.
2. There is a robot service account for your project:
service-[project number]@dataflow-service-producer-prod.iam.gserviceaccount.com should have access to your project. If this account does not appear in the permissions tab for your project, contact Dataflow support.
I have created the service account and granted it permissions within the project, but I still cannot identify the cause of this error.
I have disabled the Dataflow API and re-enabled it, and still nothing. Is it possible to regenerate this service account in order to execute a job?
Regards
Add the service account email to the service_account_email option of your PipelineOptions.
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions,
    PipelineOptions,
    StandardOptions,
)

def get_pipeline_options(self):
    options = PipelineOptions()
    gcp_options = options.view_as(GoogleCloudOptions)
    gcp_options.job_name = "sampleflow"
    gcp_options.project = "etldemo-000000"
    gcp_options.staging_location = "gs://<bucket name>/stage"
    gcp_options.temp_location = "gs://<bucket name>/tmp"
    gcp_options.service_account_email = "etldemo@etldemo-000000.iam.gserviceaccount.com"
    options.view_as(StandardOptions).runner = "DataflowRunner"
    return options
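For completeness, a hypothetical usage of the method above (assuming it lives on a class and apache-beam is installed): pass the options when constructing the pipeline, so the job runs under the configured service account.

import apache_beam as beam

# Inside the same class as get_pipeline_options():
def run(self):
    with beam.Pipeline(options=self.get_pipeline_options()) as pipeline:
        pipeline | beam.Create(["hello"]) | beam.Map(print)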

How can I grant individual permissions in Google Cloud Platform for BigQuery users using python

I need to set up very fine-grained access control for user accounts in GCP using a python script.
I know that via the UI/gcloud util I can give it the role roles/bigquery.user, but that role has a lot of other permissions I don't want this service account to have.
How can I grant individual permissions via python scripts?
Go to your BigQuery console, click the arrow at the right of a dataset and then click Share dataset.
Then add the email of the user.
You can choose one of 3 available roles: Viewer/Owner/Editor.
Do this for every dataset and every user.
Update to do it via Python script
You can do it with a Python script following this small tutorial.
The code will be something like:
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset(client.dataset('dataset1'))

entry = bigquery.AccessEntry(
    role='READER',
    entity_type='userByEmail',
    entity_id='user1@example.com')

assert entry not in dataset.access_entries

entries = list(dataset.access_entries)
entries.append(entry)
dataset.access_entries = entries

dataset = client.update_dataset(dataset, ['access_entries'])  # API request
# assert entry in dataset.access_entries