There is a project in BigQuery called project1, and it has a dataset called config_prd. I am trying to create a table in the dataset if it does not exist, and then update the table each time I trigger the pipeline. Creating and updating the table are Airflow tasks.
At the moment the DAG fails because
Access Denied: Table project1:config_prd.table1: User does not have bigquery.tables.get
permission for table project1:config_prd.table1.
My Question:
So the Airflow service account needs that permission to check whether the table exists. How can I give the Airflow account the Data Viewer permission on the config_prd dataset?
My suggested solution:
Go to the GCP console > APIs and Services > Credentials. Under the service account section I can see an email address:
airflow@project1.iam.gserviceaccount.com
I have to copy this email address, go to the GCP console > IAM and Admin > IAM > Add member, enter the email address, and set the role to Viewer.
Please let me know if this is correct.
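(For reference, the same grant can also be made programmatically at the dataset level. Below is a minimal sketch using the google-cloud-bigquery Python client, assuming the project, dataset, and service account email above; the dataset-level READER role corresponds to read access like BigQuery Data Viewer.)

from google.cloud import bigquery

client = bigquery.Client(project="project1")
dataset = client.get_dataset("project1.config_prd")

# Append a dataset-level READER entry for the Airflow service account.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="userByEmail",
        entity_id="airflow@project1.iam.gserviceaccount.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])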
My other question: we have several projects on GCP. Should I do this for every single project?
Another question: how was Airflow previously able to update other tables in the dataset?
Related
Hi, I have a Google email account that I use to get into a GCP project. I am trying to read a BigQuery table via a Notebook, but when I try to read this table I see this error:
Access Denied: Table project-name:data_warehouse_us.partnerize_data_clicks: User does not have permission to query table project-name:data_warehouse_us.partnerize_data_clicks. [accessDenied]
I go into the IAM settings and I see the email account I use to access it. It has these 3 roles: "BigQuery Admin", "BigQuery Data Owner", "BigQuery Job User". Do I need to add another role to get read/write/delete access to the tables, or is there another place I need to go to fix this error?
Thanks
Posting as an answer, confirmed by @JuanLozano. Notebooks use a service account to authorize requests. While you may have granted those permissions to your email account, you still have to grant them to the service account that your Notebook uses.
Check the service account assigned to the Notebook, and then add the necessary permissions to it.
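If you're not sure which identity the Notebook is using, you can ask the auth library directly. A small sketch with google-auth (on managed notebooks the resolved credentials usually expose the service account email, though it may read "default" until the credentials are refreshed):

import google.auth

# Resolve the credentials the client libraries will pick up by default.
credentials, project = google.auth.default()

# On notebook/Compute Engine credentials this attribute holds the
# service account email; grant the BigQuery roles to this account.
print(getattr(credentials, "service_account_email", "user credentials"))
print(project)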
In the code below, I am getting the error attached. Also, there is a service_account_path I have to enter; where do I find this path, and how can I download this file?
import os
from google.cloud import pubsub_v1

# Replace with the path to your service account JSON key file
path_service_account = 'service1'
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path_service_account

# Replace with your input file path (a raw string, so backslashes are
# not treated as escape sequences; '\t' would otherwise become a tab)
input_file = r'C:\tumbling window\store_sales.csv'

# create publisher
publisher = pubsub_v1.PublisherClient()
Client libraries make use of service account credentials to authenticate to GCP services and APIs, such as Pub/Sub.
To do this, an environment variable named GOOGLE_APPLICATION_CREDENTIALS needs to be set, and its value must be the path to that service account's JSON key file.
First, you need to make sure that you've created a service account with enough permissions to do the required Pub/Sub operations:
In the Cloud Console, go to the Service accounts page.
Select your project.
Click Create service account.
Enter a service account name to display in the Cloud Console.
Choose one or more IAM roles to grant to the service account on the project. This is an important step: if you're only planning to use this account to consume the Pub/Sub service, you could grant it the Pub/Sub Admin role. If your code also makes use of another service (such as BigQuery, for example), you need to grant the required roles as well. An easy option would be to grant it the Project Editor role, which grants access to all GCP services, but it is always recommended security practice to grant only the minimal necessary permissions. (A programmatic equivalent of this grant is sketched after this list.)
Once you've granted the required role(s), click Done to finish creating the service account.
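(Granting a role can also be scripted. A rough sketch with the google-api-python-client Resource Manager API; the project ID, service account email, and example role below are placeholders:)

from googleapiclient import discovery

project_id = "your-project-id"  # placeholder
sa_email = "your-sa@your-project-id.iam.gserviceaccount.com"  # placeholder

crm = discovery.build("cloudresourcemanager", "v1")
policy = crm.projects().getIamPolicy(resource=project_id, body={}).execute()

# Append a binding for the new service account, then write it back.
policy["bindings"].append(
    {"role": "roles/pubsub.admin", "members": [f"serviceAccount:{sa_email}"]}
)
crm.projects().setIamPolicy(
    resource=project_id, body={"policy": policy}
).execute()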
Once you've created the service account, then you need to generate a JSON key file:
In the Cloud Console, go to the Service accounts page.
Select your project.
Find the row of the service account that you want to create a key for. In that row, click the More (3-dot) button, and then click Create key.
Select JSON as the key type and click Create.
Clicking Create downloads a service account key file.
That is the file whose path needs to be referenced in path_service_account. Let's say that your JSON key file was downloaded to C:\Downloads\YOUR_JSON_KEY_FILENAME.json; then your code would be something like:
path_service_account = r'C:\Downloads\YOUR_JSON_KEY_FILENAME.json'
This should solve the authentication errors that you're getting.
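As a quick smoke test, assuming a topic already exists, a minimal publish call confirms that the key file and roles work (the project and topic names below are placeholders):

import os
from google.cloud import pubsub_v1

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r'C:\Downloads\YOUR_JSON_KEY_FILENAME.json'

publisher = pubsub_v1.PublisherClient()
# Placeholder project and topic; replace with your own.
topic_path = publisher.topic_path("your-project-id", "your-topic")

# result() returns the message ID if the account is authorized to publish.
future = publisher.publish(topic_path, b"test message")
print(future.result())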
I'm trying to figure out if I can create multiple service accounts and for each service account create a different Policy (or even a generic policy).
In this policy I want to set the default retention for a dataset/table.
Only I (the admin) should be able to change the retention after table creation.
This is very important to control costs.
Did anyone manage to do this?
In Google Cloud Platform (GCP) it is possible to create different service accounts with distinct roles. These roles give access to specific resources across different services. In addition to the already existing roles in BigQuery, GCP allows you to assign customized roles to service accounts.
To control costs, the Project Admin or BigQuery Admin can set a default expiration for a dataset and grant other service accounts restricted roles such as BigQuery Job User or BigQuery Data Viewer. This way, all the tables created in the dataset will have a default expiration date (set by the administrator) that the other service accounts cannot modify.
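A rough sketch of how the administrator could set that default with the Python client (the dataset ID and 30-day value are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("your-project.your_dataset")  # placeholder ID

# Tables created in this dataset after the update inherit this
# expiration; here, 30 days expressed in milliseconds.
dataset.default_table_expiration_ms = 30 * 24 * 60 * 60 * 1000
client.update_dataset(dataset, ["default_table_expiration_ms"])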
I have an IAM user with the role BigQuery Data Editor.
In my dataset I used Share dataset and added the user with Can Edit privileges.
However, when I run my script that accesses BigQuery, I get a 403 error.
When I add the BigQuery User role to my IAM user, the script works.
The script only runs a SELECT query on a table in this dataset.
I don't understand why I must grant BigQuery User for this to work.
According to the documentation https://cloud.google.com/bigquery/docs/access-control
Rationale: The dataEditor role extends bigquery.dataViewer by issuing
create, update, delete privileges for the tables within the dataset
roles/bigquery.dataViewer has bigquery.tables.getData, which allows getting table data.
What am I doing wrong here?
Having access to the data and being able to retrieve it with a query are different things, and that's where the confusion comes from.
Per the documentation, roles/bigquery.dataEditor has the following permissions:
Read the dataset's metadata and to list tables in the dataset.
Create, update, get, and delete the dataset's tables.
This means that the user with this role has access and manipulation rights to the dataset's information and the tables in it. For example, a user with this role can see all the table information by navigating to it through the GCP console (schema, details, and preview tabs), but when trying to run a query there, the following message will appear:
Access Denied: Project <PROJECT-ID>: The user <USER> does not have bigquery.jobs.create permission in project <PROJECT-ID>.
Now let's check the roles/bigquery.user permissions:
Permissions to run jobs, including queries, within the project.
The key element here is that the BigQuery User role can run jobs and the BigQuery Data Editor role can't. BigQuery jobs are the objects that manage BigQuery tasks, including running queries.
With this information, it's clearer in the roles comparison matrix that for what you are trying to accomplish you'll need both the BigQuery Data Editor role (get table data/metadata) and the BigQuery User role (create jobs/queries).
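To make the distinction concrete, here is a minimal sketch of the operation that fails without the User role; running the query requires bigquery.jobs.create on the project the client is bound to (the project, dataset, and table names are placeholders):

from google.cloud import bigquery

# The query job is created in this client's project, which is why the
# BigQuery User role is needed on top of Data Editor's table access.
client = bigquery.Client(project="your-project-id")  # placeholder

query = "SELECT * FROM `your-project-id.your_dataset.your_table` LIMIT 10"
for row in client.query(query).result():
    print(dict(row))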
While using the BigQuery Java client, I need to join Table A in project A, dataset A with Table B in project B, dataset B.
I am able to run the query in the BigQuery console and get cross-project access to the tables by specifying the complete table ID, i.e. project.dataset.table.
Is it possible to add both projects A and B to the same service account, so that the client can be initialized with a single Google service account configuration and query the tables from both projects?
Thanks.
Yes, it is possible to add the same Service Account to different projects.
Once you have created your service account in one project, copy its email. Navigate to the Cloud IAM page with your second project selected, and add the service account as a member with the necessary BigQuery role(s).
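Once the account has roles in both projects, a single client authenticated with that account's key can join across them. A minimal Python sketch of the pattern (the same idea applies with the Java client; the key path, projects, datasets, and tables are placeholders):

from google.cloud import bigquery
from google.oauth2 import service_account

# One key file; the service account holds BigQuery roles on both projects.
creds = service_account.Credentials.from_service_account_file("key.json")  # placeholder
client = bigquery.Client(project="project-a", credentials=creds)

query = """
    SELECT a.id, b.value
    FROM `project-a.dataset_a.table_a` AS a
    JOIN `project-b.dataset_b.table_b` AS b
    ON a.id = b.id
"""
for row in client.query(query).result():
    print(dict(row))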