I am currently exploring GCP BigQuery IAM & Access Control.
The list of users can be managed through groups. Here's an example:
Group A - Has access to BigQuery (BigQuery User Role, Viewer Access to specific datasets)
Group B - Has bucket-owner access to Cloud Storage
Users who are in both groups can export data from BigQuery to Cloud Storage. So, what is the best practice to deny exports of certain BigQuery tables, or of data larger than a certain number of rows, so that transactional data exports are restricted?
As you can see here, access controls can currently be applied to BigQuery datasets, but not to specific tables or views. Therefore, from a BigQuery perspective, your restrictions cannot be enforced. Instead, you should implement these rules in your own application, along the lines of the sketch below.
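For illustration, here is a minimal Python sketch of such an application-level check, assuming hypothetical project, table, bucket, and threshold values; it refuses to submit the extract job for blocked tables or for tables above a row limit.

```python
from google.cloud import bigquery

# Hypothetical names; replace with your own project, table, bucket, and limit.
PROJECT = "my-project"
TABLE_ID = "my-project.sales.transactions"
DESTINATION_URI = "gs://my-export-bucket/transactions-*.csv"
MAX_ROWS = 1_000_000
BLOCKED_TABLES = {"my-project.sales.transactions_raw"}

client = bigquery.Client(project=PROJECT)
table = client.get_table(TABLE_ID)

# Application-level checks performed before any extract job is submitted.
if TABLE_ID in BLOCKED_TABLES:
    raise PermissionError(f"Exports of {TABLE_ID} are not allowed.")
if table.num_rows > MAX_ROWS:
    raise PermissionError(
        f"{TABLE_ID} has {table.num_rows} rows, above the {MAX_ROWS} limit."
    )

extract_job = client.extract_table(table, DESTINATION_URI)
extract_job.result()  # wait for the export to complete
```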
However, since restricting exports of specific BigQuery tables, or of data above a certain number of rows, is not currently supported, I have filed a feature request on your behalf as a Cloud Support representative. You can follow this link to check the request and track its progress. You can click the star icon in the top-left corner to receive notifications whenever progress is made.
Related
Is it possible in Google Cloud Platform to create a user with read-only access to a BigQuery table and, moreover, a pre-set budget? I have a new colleague who has never worked with BigQuery before, and I want to avoid a high bill at the end of the month.
Thanks!
You can set quotas that limit excessively high costs, although not for a specific user, only for all users. You can find the details here.
If you want to enforce these quotas for only this user, create a project dedicated to them, grant them BigQuery Job User (to allow them to create query jobs) and BigQuery Data Viewer on the table/dataset that you want (to allow them to read the data they query), and set the quotas you want on that specific project. That way, only the user who queries BigQuery through this project is limited by the quota.
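As a rough Python sketch of the data-access half of this setup (the project, dataset, and email below are hypothetical, and the project-level BigQuery Job User grant and the quota itself still have to be configured separately, e.g. in the console):

```python
from google.cloud import bigquery

# Hypothetical identifiers; adjust to your own project, dataset, and user.
client = bigquery.Client(project="colleague-sandbox-project")
dataset = client.get_dataset("shared_dataset")

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",                 # dataset-level equivalent of Data Viewer
        entity_type="userByEmail",
        entity_id="new-colleague@example.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```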
Problem: I have a project in BigQuery where all my data is stored. Within this project I created multiple datasets containing different views. Now I want to use different service accounts to query the different datasets containing different views via Grafana (if that matters). These service accounts should only be able to query the views (and therefore a specific dataset) meant for them.
What I tried: I granted BigQuery User, Viewer, or Editor permissions (I tried all of them) at the dataset level (and also BigQuery Metadata Viewer at the project level). When I query a view, I receive the error:
User does not have bigquery.jobs.create permission in project xy.
Questions: It is not clear to me whether granting the bigquery.jobs.create permission at the project level will allow the user to query all datasets instead of only the one I want them to access.
Is there any way to allow the user to create jobs only on a single dataset?
Update October 2021
I've just seen that this question went unanswered back then but still gets a lot of views. The possibilities have changed a bit since I asked, so here is how I'm handling it now:
I give the respective service account the role roles/bigquery.jobUser at the project level. This allows it to create jobs in general; however, since I don't grant any other permissions yet, it cannot query data.
Then I grant the role roles/bigquery.dataViewer at the dataset level. That makes it possible for the service account to query only the dataset I granted the permission on.
It is also possible to grant roles/bigquery.dataViewer at the table level, which restricts access to only that specific table.
If you want the service account not only to query (view) the data but also, for example, to insert or change it, replace roles/bigquery.dataViewer with a role that has the necessary permissions (or assign that role in addition).
How to grant the permissions:
On dataset level
On table or view level
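As an illustration of the table- or view-level variant, here is a hedged Python sketch using the BigQuery client's table IAM methods; the project, view, and service-account names are hypothetical, and the dataset-level grant works like the access-entries sketch earlier on this page.

```python
from google.cloud import bigquery

# Hypothetical project, view, and service-account names.
client = bigquery.Client(project="my-project")
view_id = "my-project.reporting.team_a_view"

policy = client.get_iam_policy(view_id)
policy.bindings.append(
    {
        "role": "roles/bigquery.dataViewer",
        "members": {"serviceAccount:grafana-team-a@my-project.iam.gserviceaccount.com"},
    }
)
client.set_iam_policy(view_id, policy)

# roles/bigquery.jobUser still needs to be granted at the project level
# (e.g. in the console) so the service account can create query jobs.
```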
We had the same problem; we solved it by creating a custom role and assigning that custom role to the particular dataset.
You can grant bigquery.user role to a specific dataset as indicated in this guide. The bigquery.user role contains the bigquery.jobs.create permission as well as other basic permissions related to querying datasets. You can check the full list of permissions for this role in this list.
As suggested above, you can also create custom roles containing only the exact permissions you want by following this piece of documentation.
How do I fetch the last-access details for a Cloud Storage bucket? As of now, I see that we can only find the last-modified date for buckets and objects. Is there any way to fetch last-access details for buckets and objects? Do we need to enable logging for each object to get this, or are there other options available?
There are several types of logs you can enable to get this information.
Cloud Audit Logs is the recommended method for generating logs that track API operations performed in Cloud Storage:
Cloud Audit Logs tracks access on a continuous basis.
Cloud Audit Logs produces logs that are easier to work with.
Cloud Audit Logs can monitor many of your Google Cloud services, not just Cloud Storage.
Audit logs are written in near real-time and are available like any other logs in GCP. You can view a summary of the audit logs for your project in the Activity Stream in the Google Cloud Console. A more detailed version of the logs can be found in the Logs Viewer.
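For example, once Data Access audit logs are enabled for Cloud Storage, a rough Python sketch like the following (with hypothetical project and bucket names) could pull the most recent object-read entries from Cloud Logging:

```python
from google.cloud import logging  # google-cloud-logging package

# Hypothetical project/bucket; Data Access audit logs for Cloud Storage must
# be enabled, otherwise object reads (storage.objects.get) are not recorded.
client = logging.Client(project="my-project")

log_filter = (
    'resource.type="gcs_bucket" AND '
    'resource.labels.bucket_name="my-bucket" AND '
    'protoPayload.methodName="storage.objects.get"'
)

# Most recent reads first; the audit payload carries the accessed object's name.
for entry in client.list_entries(
    filter_=log_filter, order_by=logging.DESCENDING, max_results=10
):
    print(entry.timestamp, entry.payload.get("resourceName"))
```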
In some cases, you may want to use Access Logs instead. You most likely want to use access logs if:
You want to track access to public objects, such as assets in a bucket that you've configured to be a static website.
You want to track access to objects when the access is exclusively granted because of the Access Control Lists (ACLs) set on the objects.
You want to track changes made by the Object Lifecycle Management feature.
You intend to use authenticated browser downloads to access objects in the bucket.
You want your logs to include latency information, or the request and response size of individual HTTP requests.
Unlike audit logs, access logs aren't sent to Stackdriver Logging in real time; they are offered as CSV files, generated hourly when there is activity to report in the monitored bucket, which you can download and view.
The access logs can provide an overwhelming amount of information. You'll find here a table to help you identify all the information provided in these logs.
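If you go the access-log route, enabling them can be scripted. Here is a hedged Python sketch with hypothetical bucket names; note that the log bucket also needs to allow Cloud Storage's cloud-storage-analytics@google.com group to write into it, which is not shown here.

```python
from google.cloud import storage

# Hypothetical bucket names; the log bucket must already exist and must allow
# the cloud-storage-analytics@google.com group to write objects into it.
client = storage.Client(project="my-project")
bucket = client.get_bucket("my-monitored-bucket")

bucket.enable_logging("my-log-bucket", object_prefix="access-logs")
bucket.patch()  # persist the logging configuration on the bucket
```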
Cloud Storage buckets are meant to serve high volumes of read requests through a variety of means. As such, reads don't also write any additional data - that would not scale well. If you want to record when an object gets read, you would need to have the client code reading the object to also write the current time to some persistent storage. Or, you could force all reads through some API endpoint that performs the update manually. In either case, you are writing code and using additional resources to store this data.
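As an illustration of the "record it yourself" approach, here is a minimal sketch, assuming a hypothetical Firestore collection for the timestamps and hypothetical project and bucket names:

```python
from datetime import datetime, timezone
from google.cloud import firestore, storage

# Hypothetical project, bucket, and Firestore collection names.
storage_client = storage.Client(project="my-project")
db = firestore.Client(project="my-project")

def read_and_record(bucket_name: str, blob_name: str) -> bytes:
    """Read an object and record the access time in Firestore."""
    data = storage_client.bucket(bucket_name).blob(blob_name).download_as_bytes()
    db.collection("object_last_access").document(blob_name.replace("/", "_")).set(
        {
            "bucket": bucket_name,
            "object": blob_name,
            "last_access": datetime.now(timezone.utc),
        }
    )
    return data
```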
I'm trying to figure out if I can create multiple service accounts and for each service account create a different Policy (or even a generic policy).
In this policy I want to set the default retention for a dataset/table.
Only I (admin) can change the retention after table creation.
This is very important to control costs.
Did anyone manage to do this?
In Google Cloud Platform (GCP) it is possible to create different service accounts with distinct roles. These roles give access to specific resources across different services. In addition to the predefined BigQuery roles, GCP also lets you assign custom roles to service accounts.
To control costs, the Project Admin or BigQuery Admin can set a default expiration for a dataset and grant other service accounts restricted roles such as BigQuery Job User or BigQuery Data Viewer. This way, all the tables created in the dataset will have a default expiration date (set by the administrator) that the other service accounts cannot modify.
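As a rough sketch of the admin-side step (project and dataset names are hypothetical), the default table expiration can be set on the dataset like this:

```python
from google.cloud import bigquery

# Run as the administrator; project and dataset names are hypothetical.
client = bigquery.Client(project="my-project")
dataset = client.get_dataset("transactional_data")

# New tables created in this dataset expire after 30 days (value in ms).
dataset.default_table_expiration_ms = 30 * 24 * 60 * 60 * 1000
client.update_dataset(dataset, ["default_table_expiration_ms"])
```

Service accounts that only hold BigQuery Job User or BigQuery Data Viewer lack the bigquery.datasets.update permission, so they cannot change this setting.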
I have a Power BI workbook that I have created in Desktop. It sources from a SQL Server database. I can access this database with account x. My Azure tenant admin has created a data source for this database in our gateway (within the Power BI service), and I have access to this gateway. The admin supplied account y in connecting to this data source. How does this work when I go to refresh the dataset that this workbook creates when I publish it to the service? That is, when I schedule a refresh on the dataset, will it dial into the SQL Server database using account y provided in the data source definition (virtually ignoring / dropping account x's credentials)?
Yep. That's exactly how it works. The automated refresh will use account 'Y.'
Data sources that have been deployed to some hosted location will almost always disregard the credentials used to create the dataset and instead use credentials that are specifically supplied for the refresh. These 'service' accounts will typically have different rules about password resets, have the lowest appropriate levels of access, and be under the purview of system administrators rather than report authors. It's a very standard practice. It protects against misuse, error, and loss of accounts, and segregates actual user activity from automated behaviors in the logs.
However, it is a little odd to me that your admin 'created the data source' -- is that correct? Or did the admin just wire up the gateway to the data source that was deployed when you published?
If you want to use a data source that is already published, you need to connect to that data source from Power BI Desktop. Otherwise you'll be pushing out something new that has nothing to do with the resources your admin created.