Permission bigquery.tables.updateData denied when querying INFORMATION_SCHEMA.COLUMNS - google-cloud-platform

I'm querying BigQuery (via Databricks) with a service account that has the following roles:
BigQuery Data Viewer
BigQuery Job User
BigQuery Metadata Viewer
BigQuery Read Session User
The query is:
SELECT distinct(column_name) FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS` where data_type = "TIMESTAMP" and is_partitioning_column = "YES"
I'm actually querying via Azure Databricks:
spark.read.format("bigquery")
  .option("materializationDataset", dataset)
  .option("parentProject", projectId)
  .option("query", query)
  .load()
  .collect()
But I'm getting:
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "Access Denied: Table project:dataset._sbc_f67ac00fbd5f453b90....: Permission bigquery.tables.updateData denied on table project:dataset._sbc_f67ac00fbd5f453b90.... (or it may not exist).",
"reason" : "accessDenied"
} ],
After adding BigQuery Data Editor the query works.
Why do I need write permissions to view this metadata? Is there a lower-privilege permission I could grant instead?
In the docs I see that only data viewer is required, so I'm not sure what I'm doing wrong.

BigQuery saves all query results to a temporary table if a specific destination table is not specified.
According to the documentation, the following permissions are required:
bigquery.tables.create permissions to create a new table
bigquery.tables.updateData to write data to a new table, overwrite a table, or append data to a table
bigquery.jobs.create to run a query job
Since the service account already has the BigQuery Job User role, it is able to run the query; it needs the BigQuery Data Editor role for the bigquery.tables.create and bigquery.tables.updateData permissions.
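If a project-wide BigQuery Data Editor grant is broader than you want, a lower-scope alternative (my own suggestion, not taken from the docs cited above) is to grant the service account write access only on the materialization dataset the connector writes to. A minimal sketch with the google-cloud-bigquery Python client, where the project, dataset, and service-account email are placeholders:

from google.cloud import bigquery

# Assumption: "project.dataset" is the materializationDataset used by the connector,
# and the email below is the service account used from Databricks.
client = bigquery.Client(project="project")
dataset = client.get_dataset("project.dataset")

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="WRITER",  # dataset-level equivalent of BigQuery Data Editor
        entity_type="userByEmail",
        entity_id="my-sa@project.iam.gserviceaccount.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])  # allows table create/updateData in this dataset only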

Related

Cloud Function to import data into CloudSQL from cloud storage bucket but getting already schema exist error

I'm trying to import data into a Cloud SQL instance from a Cloud Storage bucket using a Cloud Function.
How can I delete the schemas before importing the data, using a single Cloud Function?
I am using Node.js in the Cloud Function.
error:
error: exit status 3 stdout(capped at 100k bytes): SET SET SET SET SET set_config ------------ (1 row) SET SET SET SET stderr: ERROR: schema "< >" already exists
https://cloud.google.com/sql/docs/mysql/admin-api/rest/v1beta4/instances/import
In the code below, where do I need to add the logic to delete all existing schemas apart from the public schema?
Entry point: importDatabase
index.js
const {google} = require('googleapis');
const {auth} = require("google-auth-library");
const sqlAdmin = google.sqladmin('v1beta4');

exports.importDatabase = (_req, res) => {
  async function doIt() {
    const authRes = await auth.getApplicationDefault();
    const authClient = authRes.credential;
    const request = {
      project: 'my-project',   // TODO: Update placeholder value.
      instance: 'my-instance', // TODO: Update placeholder value.
      resource: {
        importContext: {
          kind: "sql#importContext",
          fileType: "SQL", // CSV
          uri: <bucket path>,
          database: <database-name>
          // Options for importing data as SQL statements.
          // sqlimportOptions: {
          // /**
        },
      },
      auth: authClient,
    };
    sqlAdmin.instances.import(request, function (err, result) {
      if (err) {
        console.log(err);
      } else {
        console.log(result);
      }
      res.status(200).send("Command completed", err, result);
    });
  }
  doIt();
};
package.json
{
  "name": "import-database",
  "version": "0.0.1",
  "dependencies": {
    "googleapis": "^39.2.0",
    "google-auth-library": "3.1.2"
  }
}
The error appears to occur because a previous aborted import managed to create the "schema_name" schema, and this subsequent import was run without first re-initializing the DB. Check the helpful documentation on Cloud SQL imports.
One way to prevent this issue is to change the create statements in the SQL file from:
CREATE SCHEMA schema_name;
to
CREATE SCHEMA IF NOT EXISTS schema_name;
As far as removing the already-created schema is concerned, by default only users or service accounts with the Cloud SQL Admin (roles/cloudsql.admin) or Owner (roles/owner) role have permission to delete a Cloud SQL instance; please check the helpful documentation on cloudsql.instances.delete to understand the next steps. You can also define an IAM custom role for the user or service account that includes the cloudsql.instances.delete permission, as this permission is supported in IAM custom roles.
As a best practice for import/export operations, we recommend that you adopt the principle of least privilege, which in this case means creating a custom role with that specific permission and assigning it to your service account. Alternatively, the service account could be given the "Cloud SQL Admin" role or the "Cloud Composer API Service Agent" role, which include this permission and would therefore allow you to execute this command.
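If you go the custom-role route, here is a minimal sketch (an illustration under my own assumptions, not taken from the Cloud SQL docs) using the IAM API through the Python googleapiclient; the role ID and title are placeholders:

from googleapiclient import discovery
import google.auth

# Assumption: application default credentials with permission to administer IAM roles.
credentials, project_id = google.auth.default()
iam = discovery.build("iam", "v1", credentials=credentials)

role = iam.projects().roles().create(
    parent=f"projects/{project_id}",
    body={
        "roleId": "cloudsqlImportCleanup",  # placeholder role ID
        "role": {
            "title": "Cloud SQL import cleanup",
            "includedPermissions": ["cloudsql.instances.delete"],
            "stage": "GA",
        },
    },
).execute()
print(role["name"])  # e.g. projects/<project>/roles/cloudsqlImportCleanup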
NOTE: It is recommended to double-check any delete actions before performing them, as they may lead to loss of useful data.

Code fails to update a table on BQ using DML, but succeeds for insertion and deletion with RPC

I wrote some code that uses a service account to write to BigQuery on Google Cloud.
The strange thing is that only the "update" operation using DML fails (the other insertion and deletion RPC calls succeed).
def create_table(self, table_id, schema):
    table_full_name = self.get_table_full_name(table_id)
    table = self.get_table(table_full_name)
    if table is not None:
        return
        # self.client.delete_table(table_full_name, not_found_ok=True)  # Make an API request.
        # print("Deleted table '{}'.".format(table_full_name))
    table = bigquery.Table(table_full_name, schema=schema)
    table = self.client.create_table(table)  # Make an API request.
    print("Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id))

# Works!
def upload_rows_to_bq(self, table_id, rows_to_insert):
    table_full_name = self.get_table_full_name(table_id)
    for ads_chunk in split(rows_to_insert, _BQ_CHUNK_SIZE):
        errors = self.client.insert_rows_json(
            table_full_name, ads_chunk,
            row_ids=[None] * len(rows_to_insert))  # Make an API request.
        if not errors:
            print("New rows have been added.")
        else:
            print("Encountered errors while inserting rows: {}".format(errors))

# Permissions failure
def update_bq_ads_status_removed(self, table_id, update_ads):
    affected_rows = 0
    table_full_name = self.get_table_full_name(table_id)
    for update_ads_chunk in split(update_ads, _BQ_CHUNK_SIZE):
        ad_ids = [item["ad_id"] for item in update_ads_chunk]
        affected_rows += self.update_bq_ads_status(f"""
            UPDATE {table_full_name}
            SET status = 'Removed'
            WHERE ad_id IN {tuple(ad_ids)}
        """)
    return affected_rows
I get this error for update only:
User does not have bigquery.jobs.create permission in project ABC.
I will elaborate on my comment.
In GCP you have 3 types of IAM roles.
Basic Roles include the Owner, Editor, and Viewer roles.
Predefined Roles provide granular access for a specific service and are managed by Google Cloud. Predefined roles are meant to support common use cases and access control patterns.
Custom Roles provide granular access according to a user-specified list of permissions.
What's the difference between predefined and custom roles? If you change (add/remove) a permission of a predefined role, it becomes a custom role.
Predefined roles for BigQuery, with their permission lists, can be found here.
The mentioned error:
User does not have bigquery.jobs.create permission in project ABC.
means that the IAM role doesn't have the specific BigQuery permission bigquery.jobs.create.
The bigquery.jobs.create permission can be found in two predefined roles:
BigQuery Job User - (roles/bigquery.jobUser)
BigQuery User - (roles/bigquery.user)
Or it can be added to a different predefined role; however, that would turn it into a custom role.
In addition, the Testing permissions guide explains how to test IAM permissions.
Please give the service account the BigQuery User (roles/bigquery.user) or BigQuery Job User (roles/bigquery.jobUser) role and try to run the code again.
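If you want to verify which of these permissions the service account actually holds before re-running the code, a small sketch (my own addition, using the Cloud Resource Manager testIamPermissions method referred to by the testing guide) could look like this:

import google.auth
from googleapiclient import discovery

# Assumption: the application default credentials belong to the service account being checked.
credentials, project_id = google.auth.default()
crm = discovery.build("cloudresourcemanager", "v1", credentials=credentials)

resp = crm.projects().testIamPermissions(
    resource=project_id,
    body={"permissions": ["bigquery.jobs.create", "bigquery.tables.updateData"]},
).execute()
print(resp.get("permissions", []))  # only the permissions the caller actually has are returned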

How to query a table that I don't own and don't have bigquery.jobs.create permissions on it

A BigQuery table that I don't own was shared with me, and I don't have the bigquery.jobs.create permission on the dataset that contains the table.
I successfully listed all the tables in the dataset, but when I tried to query the table using this code:
tables.map(async (table) => {
  const url = `https://bigquery.googleapis.com/bigquery/v2/projects/${process.env.PROJECT_ID}/queries`;
  const query = `SELECT * FROM \`${table.id}\` LIMIT 10`;
  const data = {
    query,
    maxResults: 10,
  };
  const reqRes = await oAuth2Client.request({
    method: "POST",
    url,
    data,
  });
  console.log(reqRes.data);
});
I got the following error:
Error: Access Denied: Project project_id: <project_id>
gaia_id: <gaia_id>
: User does not have bigquery.jobs.create permission in project <project_id>.
I can't ask for those permissions, so what should I do in this situation?
IMPORTANT:
I have tried to run the same query in the GCP console and it ran successfully, but it seems like it created a temporary clone of the table and then queried that clone rather than the original one.
There are two projects here: your project, and the project that contains the table.
You currently create the job in the ${process.env.PROJECT_ID} project that you use in the URL; try specifying your own project there instead, one where you can create jobs.
You'll also need to modify the query so that it includes the table's project, allowing BigQuery to find it: make sure ${table.id} includes the project (the table's, not yours), the dataset, and the table.
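As a rough illustration of both points, here is a sketch using the google-cloud-bigquery Python client instead of the raw REST call above, assuming MY_PROJECT is a project where you can create jobs and the shared table lives in OTHER_PROJECT (both names are placeholders):

from google.cloud import bigquery

# The query job (and billing) goes to your own project...
client = bigquery.Client(project="MY_PROJECT")

# ...while the query references the table by its full path in the owner's project.
query = "SELECT * FROM `OTHER_PROJECT.shared_dataset.shared_table` LIMIT 10"
for row in client.query(query).result():
    print(dict(row))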

Dataflow ReadFromBigQuery transformation - How to configure location for Temporary table

When we perform the ReadFromBigQuery transformation, Dataflow creates a temporary dataset in which it stores the data before reading it.
My user is only allowed to create datasets in the Europe region (not in the US), and it seems that Dataflow uses the US region by default. How can I ask Dataflow to create the temporary dataset in Europe?
Important: the tables that I am reading are in Europe, and I specified "Region = Europe" in the pipeline options.
Please find below the read transformation:
read = (
    p
    | 'ReadForVal' >> beam.io.ReadFromBigQuery(
        query='SELECT id_ligne FROM project.dataset.table',
        use_standard_sql=True))
The error:
content <{ "error": { "code": 403, "message": "US violates constraint constraints/gcp.resourceLocations on the resource projects/irn-71631-lab-80/datasets/temp_dataset_243e02d1cda342d4962195b28cf33bba", "errors": [ { "message": "US violates constraint constraints/gcp.resourceLocations on the resource projects/irn-71631-lab-80/datasets/temp_dataset_243e02d1cda342d4962195b28cf33bba", "domain": "global", "reason": "policyViolation" } ], "status": "PERMISSION_DENIED" } } >
I have been struggling with this for a few days now...
Thank you a lot for your help!
Unfortunately there doesn't seem to be a way to manually set the location of the temporary dataset. From the code, it appears to get the location from the table in the query:
def _setup_temporary_dataset(self, bq):
    location = bq.get_query_location(
        self._get_project(), self.query.get(), self.use_legacy_sql)
    bq.create_temporary_dataset(self._get_project(), location)
And the documentation for get_query_location states "This method returns the location of the first available referenced table for user in the query".
The simplest workaround at the moment is to only read tables in Europe, if possible by copying any tables over from the US before reading them from Dataflow. Adding the ability to configure the temporary dataset would probably be welcome in Beam, so I encourage you to report this as a feature request on the Beam Jira.
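If you want to confirm which location BigQuery resolves for the query (and hence where the Beam source will create its temporary dataset), one way to check (a sketch of my own, not part of the Beam code above) is a dry run with the google-cloud-bigquery client:

from google.cloud import bigquery

# Assumption: same project and query as in the pipeline above.
client = bigquery.Client(project="irn-71631-lab-80")
job = client.query(
    "SELECT id_ligne FROM project.dataset.table",
    job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False),
)
print(job.location)  # e.g. "EU" or "US": the location the temporary dataset will use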

How to fetch DBPROPERTIES, S3Location and comment set while creating database in AWS Athena?

As described in the AWS Athena documentation:
https://docs.aws.amazon.com/athena/latest/ug/create-database.html
we can specify DBPROPERTIES, an S3 location, and a comment while creating an Athena database:
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT 'database_comment']
[LOCATION 'S3_loc']
[WITH DBPROPERTIES ('property_name' = 'property_value') [, ...]]
For example:
CREATE DATABASE IF NOT EXISTS clickstreams
COMMENT 'Site Foo clickstream data aggregates'
LOCATION 's3://myS3location/clickstreams/'
WITH DBPROPERTIES ('creator'='Jane D.', 'Dept.'='Marketing analytics');
But once the properties are set, how can I fetch them back using a query?
Let's say I want to fetch the creator name from the above example.
You can get these using the Glue Data Catalog GetDatabase API call.
Databases and tables in Athena are stored in the Glue Data Catalog. When you run DDL statements in Athena, it translates them into Glue API calls. Not all operations you can do in Glue are available in Athena, for historical reasons.
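For reference, the same GetDatabase call can be sketched in Python with boto3; the database name, region, and property key below are just the values from the example in the question, so treat them as placeholders:

import boto3

# Assumption: credentials and region are configured for the account that owns the Glue Data Catalog.
glue = boto3.client("glue", region_name="us-east-1")
db = glue.get_database(Name="clickstreams")

# DBPROPERTIES set in the CREATE DATABASE statement come back under "Parameters".
print(db["Database"].get("Parameters", {}).get("creator"))
print(db["Database"].get("Description"))  # the COMMENT
print(db["Database"].get("LocationUri"))  # the S3 LOCATION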
I was able to fetch the AWS Athena database properties in JSON format using the following Glue Data Catalog code:
package com.amazonaws.samples;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.glue.AWSGlue;
import com.amazonaws.services.glue.AWSGlueClient;
import com.amazonaws.services.glue.model.GetDatabaseRequest;
import com.amazonaws.services.glue.model.GetDatabaseResult;

public class Glue {
    public static void main(String[] args) {
        BasicAWSCredentials awsCreds = new BasicAWSCredentials("*api*", "*key*");
        AWSGlue glue = AWSGlueClient.builder().withRegion("*bucket_region*")
                .withCredentials(new AWSStaticCredentialsProvider(awsCreds)).build();
        GetDatabaseRequest req = new GetDatabaseRequest();
        req.setName("*database_name*");
        GetDatabaseResult result = glue.getDatabase(req);
        System.out.println(result);
    }
}
Also, the following managed policies are required for the user:
AWSGlueServiceRole
AmazonS3FullAccess