Granting BigQuery permissions for a specific running project - google-cloud-platform

We're using BigQuery in a big team.
When I grant a user access to a dataset, that user can query the dataset using a different project (the project chosen at the top of the console).
In that case, I can't see the user's query history on this dataset, since the queries are run under another project.
Let's say I grant access to an external consultant. That consultant can query all my data using another project, so I can't detect which queries that consultant ran.
So in short, is it possible to see all the queries from all the projects accessing a specific table / dataset in BQ?
Or another solution, is it possible to limit the projects that can access a dataset?

Related

Power BI Embedded Approach for 100s of SQL Targets

I'm trying to find the best approach to delivering a BI solution to 400+ customers, each of which has its own database.
I've got Power BI Embedded working using service principal licensing, and I have the Power BI service connected to my data through the on-premises data gateway.
I've built my first report pointing to one of the customer databases, which works nicely.
What I want to do next, when embedding the report, is to tell Power BI, for this session, to get the data from a different database.
I'm struggling to find somewhere where this is explained, or to understand if this is even possible.
I'm trying to avoid creating 400+ workspaces or 400+ datasets.
If someone could point me in the right direction, it would be appreciated.
You can configure the report to use parameters and these parameters can be used to configure the source for your dataset:
https://www.phdata.io/blog/how-to-parameterize-data-sources-power-bi/
These parameters can be set by the app hosting the embedded report:
https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/update-parameters-in-group
Because the app is setting the parameter, each user will only see their own data. Since this will be a live connection, you would need to think about how the underlying server can support the workload.
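As a rough sketch of what that could look like from the hosting app (assuming the dataset already defines a parameter, here hypothetically named DatabaseName, and that you already have an Azure AD access token for the service principal; workspace and dataset IDs are placeholders), the Update Parameters In Group endpoint can be called like this:
# Sketch only: IDs, token and parameter name are placeholders.
GROUP_ID="<workspace-guid>"
DATASET_ID="<dataset-guid>"
ACCESS_TOKEN="<aad-access-token>"
curl -X POST \
  "https://api.powerbi.com/v1.0/myorg/groups/${GROUP_ID}/datasets/${DATASET_ID}/Default.UpdateParameters" \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{ "updateDetails": [ { "name": "DatabaseName", "newValue": "Customer042" } ] }'
Depending on the dataset's storage mode, a refresh may be needed before the new parameter value takes effect.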
An alternative solution would be to consolidate the customer databases into a single database (just the relevant tables) and use row-level security to restrict access for each customer. The advantage of this design is that you take the burden off the underlying SQL instance and push it into a Power BI dataset that is built to handle huge datasets with sub-second response times.
More on that here: https://learn.microsoft.com/en-us/power-bi/enterprise/service-admin-rls

Get all BigQuery Query jobs across organisation that reference a specific table

Problem Statement
We're a large organisation (7000+ people) with many BigQuery projects. My team owns a heavily used set of approximately 250 tables. We are aware of some data quality issues, but need to prioritise which tables we focus our efforts on.
In order to prioritise our effort, we plan to calculate two metrics for each table:
Monthly total count of query jobs referencing that table
Total number of distinct destination tables referencing that table
However, we are stuck on one aspect: how do you access all the query jobs across the entire org that reference a specific table?
What we've tried
We've tried using the following query to find all query jobs referencing a table:
select count(*)
from `project-a`.`region-qualifier`.INFORMATION_SCHEMA.JOBS,
  unnest(referenced_tables) as referenced_table
where job_type = 'QUERY'
and referenced_table.project_id = 'project-a'
and referenced_table.dataset_id = 'dataset-b'
and referenced_table.table_id = 'table-c'
Unfortunately, this is only showing query jobs that are kicked off with project-a as the billing project (afaik).
Summary
Imagine we have 50+ GCP projects that could be executing queries referencing a table we own; what we want is to see ALL of those query jobs across all of those projects.
Currently it's not possible to access all the query jobs across the entire organization that reference a specific table.
As you have mentioned, you can list the query jobs within a project using a query like:
select * from `PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
where job_type = 'QUERY'
PROJECT_ID is the ID of your Cloud project. If not specified, the default project is used.
You can also use the query without the project ID:
select * from `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
where job_type = 'QUERY'
For more information you can refer to this document.
If you would like a feature to list query jobs across the entire organization to be implemented, you can open a new feature request on the issue tracker describing your requirement.
It turns out that you can get this information through Google Cloud Logging.
The following command extracted the logs of all queries across the org referencing tables within <DATASET_ID>.
gcloud logging read 'timestamp >= "2022-09-01T00:00:00Z" AND resource.type=bigquery_dataset AND resource.labels.dataset_id=<DATASET_ID> AND severity=INFO'
Importantly, this command needs to be run from the project in which <DATASET_ID> exists, and you need the roles/logging.admin role.
Worth noting that I was not able to test INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION, which should do the trick.
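As an untested sketch of that JOBS_BY_ORGANIZATION route (assuming your account has an organization-level role that allows listing jobs, e.g. BigQuery Resource Viewer on the organization, and keeping the same placeholder project, dataset and table names as above):
# Sketch only: count last month's query jobs across the org that reference one table.
bq query --use_legacy_sql=false '
  select count(*) as job_count
  from `region-qualifier`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION,
    unnest(referenced_tables) as referenced_table
  where job_type = "QUERY"
    and creation_time >= timestamp_sub(current_timestamp(), interval 30 day)
    and referenced_table.project_id = "project-a"
    and referenced_table.dataset_id = "dataset-b"
    and referenced_table.table_id = "table-c"'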

Any way to control users gaining access to an authorized dataset?

We have a bunch of BigQuery datasets, and for some reason we need to grant authorized dataset access to a dataset that isn't owned by us or included in our project. The main concern is that I need control over who is then given access to view our datasets through that authorized dataset. Is there a method or best practice for this type of problem?
Basically, we did it this way: we gave the other project's dataset "authorized dataset" access because they need to build their own views and then open those views to their own customers. Their dataset can now view our tables and run queries against our datasets, but the problem is that we have no control over who they give access to the dataset they're using against ours, and we need to figure out a way to control this.

Data Set edit/refresh fails after being migrated to another user

I have the following issue in AWS QuickSight: A user created a dataset through Athena. Everything worked fine. The user shared the dataset with another user granting him OWNER rights. Then the first user was deleted. Now the second user can't edit the dataset anymore. He can share it but the person it is shared to can't edit it either. The error message:
Hopefully this can be solved by the QuickSight account admin using the QuickSight UI to add dataset editing permissions for this user, as shown here.
Or it may well be that the new owner does not have the required IAM permissions, such as the quicksight:UpdateDataSet permission; see the docs.
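If the admin prefers scripting the first suggestion over clicking through the UI, a hedged sketch with the AWS CLI could look like this (account ID, region, dataset ID and user ARN are all placeholders):
# Sketch only: grant the surviving user owner-level permissions on the dataset.
aws quicksight update-data-set-permissions \
  --aws-account-id 111122223333 \
  --data-set-id my-dataset-id \
  --grant-permissions '[{
    "Principal": "arn:aws:quicksight:eu-west-1:111122223333:user/default/second.user",
    "Actions": [
      "quicksight:DescribeDataSet",
      "quicksight:DescribeDataSetPermissions",
      "quicksight:PassDataSet",
      "quicksight:DescribeIngestion",
      "quicksight:ListIngestions",
      "quicksight:UpdateDataSet",
      "quicksight:DeleteDataSet",
      "quicksight:CreateIngestion",
      "quicksight:CancelIngestion",
      "quicksight:UpdateDataSetPermissions"
    ]
  }]'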
What does it say when you click the "Show details" link in the screenshot above?
This is quite a mess, to be honest. Data sources in QuickSight are tied to the user who created them and inherit their access roles from that user. This is not accessible through the API, though I think it is mentioned in the documentation somewhere, so it can't be changed.
So when we deleted the users who originally created the data sources, the data sources stopped working, along with the datasets based on them.
Our solution was to create "standard" data sources with a technical user (not such a big deal for us, because we exclusively use Athena) and then recreate all the datasets and switch them over to the new standard data sources (which was a big deal, because analysts had to switch datasets in their analyses / dashboards).
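For reference, a rough sketch of creating such a "standard" Athena data source via the CLI and granting it to a technical user (account ID, region, IDs, workgroup and user ARN are all placeholders):
# Sketch only: a shared Athena data source that no longer depends on an individual analyst's account.
aws quicksight create-data-source \
  --aws-account-id 111122223333 \
  --data-source-id standard-athena \
  --name "Standard Athena" \
  --type ATHENA \
  --data-source-parameters '{"AthenaParameters": {"WorkGroup": "primary"}}' \
  --permissions '[{
    "Principal": "arn:aws:quicksight:eu-west-1:111122223333:user/default/technical.user",
    "Actions": [
      "quicksight:DescribeDataSource",
      "quicksight:DescribeDataSourcePermissions",
      "quicksight:PassDataSource",
      "quicksight:UpdateDataSource",
      "quicksight:DeleteDataSource",
      "quicksight:UpdateDataSourcePermissions"
    ]
  }]'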
To me this shows that QuickSight is not quite complete as an analytics platform for large companies. The API is not quite there.

Google Data Studio Billing Report Demo for GCP multiple projects

Basically I am trying to set up the Google Cloud Billing Report Demo for multiple projects.
An example is mentioned in this link.
It describes three steps to configure the data sources for Data Studio:
Create the Billing Export Data Source
Create the Spending Trends Data Source
Create the BigQuery Audit Data Source
Now, the 1st point is quite clear.
For the 2nd point, the query example provided in the demo is based on a single project. In my case I want the spending data source to cover multiple projects.
Would doing a UNION of the query for each project work in this case?
For the 3rd point, I need the BigQuery audit logs from all my projects. I thought setting the single external dataset sink shown below for BigQuery in all my projects should do the trick:
bigquery.googleapis.com/projects/myorg-project/datasets/myorg_cloud_costs
But I see that the tables are being created in my dataset with a suffix _ (1), as shown below:
cloudaudit_googleapis_com_activity_ (1)
cloudaudit_googleapis_com_data_access_ (1)
and these tables don't contain any data, despite my running BigQuery queries in all projects multiple times. In fact, previewing them shows the error below:
Unable to find table: myorg-project:cloud_costs.cloudaudit_googleapis_com_activity_20190113
I think the auto-generated name with the suffix _ (1) is causing some issue, and because of that the data is also not getting populated.
I believe there should be a very simple solution for this, but I'm just not able to see it.
Can somebody please explain how to solve the 2nd and 3rd requirements for multiple projects in the GCP Data Studio billing report demo?
For the 2nd point, the query example provided in the demo is based on a single project. In my case I want the spending data source to cover multiple projects. Would doing a UNION of the query for each project work in this case?
That project is the project you specify for the billing audit logs in BigQuery. The logs are attached to the billing account, which can contain multiple projects underneath it. All projects in the billing account will be captured in the logs - more specifically, in the column project.id.
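As an illustrative sketch (the project, dataset and table names below are placeholders; your actual billing export table name includes your billing account ID), grouping the export by project shows that every project under the billing account is already present, so no UNION is needed:
# Sketch only: spend per project from the standard billing export table.
bq query --use_legacy_sql=false '
  select project.id as project_id, sum(cost) as total_cost
  from `my-billing-project.my_billing_dataset.gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX`
  group by project_id
  order by total_cost desc'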
For the 3rd point, I need the BigQuery audit logs from all my projects. I thought setting the single external dataset sink shown below for BigQuery in all my projects should do the trick.
You use the includeChildren property. See here. If you don't have an organisation or use folders, then you will need to create a sink per project and point it at the dataset in BigQuery where you want all the logs to go. You can script this up using the gcloud tool. It's easy.
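A rough sketch of the organisation-level aggregated sink with gcloud (the sink name and organisation ID are placeholders, the destination reuses the dataset from the question, and the log filter is an assumption you may want to tighten):
# Sketch only: one aggregated sink that routes BigQuery audit logs from every child project into a single dataset.
gcloud logging sinks create my-bq-audit-sink \
  bigquery.googleapis.com/projects/myorg-project/datasets/myorg_cloud_costs \
  --organization=123456789012 \
  --include-children \
  --log-filter='resource.type="bigquery_resource"'
The command prints the sink's writer identity (a service account); that account still needs BigQuery Data Editor on the destination dataset before logs will start flowing.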
I think the auto-generated name with the suffix _ (1) is causing some issue and because of that the data is also not getting populated.
The suffix is normal. Also, it can take a few hours for your logs/sinks to start flowing.