I have a problem with Dataflow: I need to execute a job and I get the following error:
Workflow failed. Causes: There was a problem refreshing your credentials. Please check:
1. Dataflow API is enabled for your project.
2. There is a robot service account for your project:
service-[project number]@dataflow-service-producer-prod.iam.gserviceaccount.com should have access to your project. If this account does not appear in the permissions tab for your project, contact Dataflow support.
I have created the service account and granted it permissions within the project, but I still cannot identify the cause of this error.
I have disabled and re-enabled the Dataflow API, but still nothing. Alternatively, is it possible to regenerate this service account so I can execute a job?
Regards
Add the service account email to the service_account_email option of your PipelineOptions:
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions,
    PipelineOptions,
    StandardOptions,
)

def get_pipeline_options(self):
    options = PipelineOptions()
    gcp_options = options.view_as(GoogleCloudOptions)
    gcp_options.job_name = "sampleflow"
    gcp_options.project = "etldemo-000000"
    gcp_options.staging_location = "gs://<bucket name>/stage"
    gcp_options.temp_location = "gs://<bucket name>/tmp"
    gcp_options.service_account_email = "etldemo@etldemo-000000.iam.gserviceaccount.com"
    options.view_as(StandardOptions).runner = 'DataflowRunner'
    return options
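For completeness, here is a minimal run sketch (assuming get_pipeline_options is defined on the class that drives the pipeline; the Create/Map steps are placeholders):

import apache_beam as beam

# Build the pipeline with the Dataflow options assembled above.
pipeline = beam.Pipeline(options=self.get_pipeline_options())
pipeline | beam.Create(["hello"]) | beam.Map(print)  # placeholder transforms
pipeline.run().wait_until_finish()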
I'm trying to create a DLP template using Terraform in one project, but it asks me to activate DLP in another one.
Here is the code I submitted:
resource "google_data_loss_prevention_inspect_template" "mytemplate" {
parent = "projects/${local.project_id}/locations/europe-west1"
description = "Custom Template"
display_name = "Custom Template"
inspect_config {
custom_info_types {
....
I'm authenticated with a JSON key file (using the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the key).
I got this error:
Error: Error creating InspectTemplate: googleapi: Error 403: Cloud Data Loss Prevention (DLP) API has not been used in project XXXXX before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/dlp.googleapis.com/overview?project=XXXXX then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.
where project XXXXX is the one associated with my service account, not the one defined in the resource.
I tried forcing the project in the google provider too, but got the same result.
The service account's owning project does need to have the DLP API enabled: by default, API calls are attributed to the credentials' project rather than the project named in the resource.
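If enabling the DLP API in the service account's project is not an option, one possible workaround (a sketch, assuming a reasonably recent google provider) is to attribute the calls to the target project instead:

provider "google" {
  # Attribute and bill API calls to the target project instead of the
  # service account's own project.
  user_project_override = true
  billing_project       = local.project_id
}

Note that the service account then needs the serviceusage.services.use permission on that project.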
So I created a Python script similar to the [BQ tutorial on scheduled queries][1]. The service account is set using os.environ. When executing with BigQuery Admin and other similar roles (Data User, Data Transfer Agent, Data Viewer, etc.), the scheduled query creation fails with
status = StatusCode.PERMISSION_DENIED
details = "The caller does not have permission"
The lowest permission level it accepts is 'Project Owner'. As this is a service account, I was hoping a lower permission level could be applied, e.g. BigQuery Admin, since all I need the service account for is to remotely create scheduled queries. Even the how-to guide says it should work. Can anyone suggest another combination of permissions that will allow this to work?
[1]: https://cloud.google.com/bigquery/docs/scheduling-queries#set_up_scheduled_queries
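For context, the creation step from the tutorial that my script follows boils down to roughly this (project, dataset, table, and query are placeholders):

from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()
parent = transfer_client.common_project_path("my-project")  # placeholder project

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="my_dataset",  # placeholder dataset
    display_name="sample_scheduled_query",
    data_source_id="scheduled_query",
    params={
        "query": "SELECT CURRENT_TIMESTAMP() AS ts",
        "destination_table_name_template": "stats",
        "write_disposition": "WRITE_TRUNCATE",
    },
    schedule="every 24 hours",
)

# This is the call that fails with PERMISSION_DENIED.
transfer_config = transfer_client.create_transfer_config(
    parent=parent, transfer_config=transfer_config
)
print("Created scheduled query:", transfer_config.name)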
I have a script that pulls data from a third-party API and writes it to a BigQuery table. I am overwriting the BQ table on every execution to include the most recent data, but I'm also looking to store historic data in Cloud Storage.
I have seen multiple topics on this error, but none seem to cover the issue I have. I have granted the service account permission to create objects, and when running the function locally I have successfully uploaded multiple objects to the bucket. However, when I try to deploy this as a Cloud Function (using the "test function" functionality), I receive a 403 Forbidden error. The Cloud Function uses the same service account to access other GCP services (including BQ). The error I receive is 'Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, 308.
Here are the relevant parts of my code.
import os

import pandas_gbq
from google.cloud import logging, secretmanager, storage
from google.oauth2 import service_account

def insert_stats(request):
    environment = os.environ.get('env', 'none')
    project_id = "proj_id"
    # If in the dev environment, obtain credentials locally; else retrieve
    # them from the Cloud Function environment.
    if environment == 'dev':
        credentials = service_account.Credentials.from_service_account_file(
            "path"
        )
        pandas_gbq.context.credentials = credentials
        secret_client = secretmanager.SecretManagerServiceClient(credentials=credentials)
        storage_client = storage.Client(project=project_id, credentials=credentials)
        # Needed so the logger below also works in dev.
        logging_client = logging.Client(project=project_id, credentials=credentials)
    elif environment == "prod":
        secret_client = secretmanager.SecretManagerServiceClient()
        logging_client = logging.Client()
        storage_client = storage.Client(project=project_id)
    logger = logging_client.logger("ingest_log")
I'm setting env=dev as an environment variable locally, and env=prod for the cloud function
The upload itself, which is the part that fails, looks like this:
json_file = json.dumps(list, indent=4)
bucket = storage_client.bucket("bucket")
blob = bucket.blob("list_" + str(rpt_strt.date()) + "_" + str(rpt_strt.hour) + "-00.json")
blob.upload_from_string(json_file)
Any help would be appreciated.
Some of the scheduled queries in Google Cloud Platform suddenly don't run anymore, with the message "Access Denied: ... User does not have bigquery.tables.get permission for table..."
First, is it possible to see under which user the scheduled query is running?
Second, is it possible to change the user?
Thanks, Silvan
I always use service accounts for command-line execution...
If you can use the bq CLI, look at --service_account and --service_account_credential_file.
If you still want to use the scheduled query, there is some documentation on the service account at https://cloud.google.com/bigquery/docs/scheduling-queries (per above).
This can also be done (for a normal non-service account user) via the console as per the instructions at: https://cloud.google.com/bigquery/docs/scheduling-queries#update_scheduled_query_credentials
"To refresh the existing credentials on a scheduled query:
Find and view the status of a scheduled query.
Click the MORE button and select Update credentials."
Although this thread is 2 years old, it is still relevant, so I will guide you through troubleshooting this issue below:
Cause:
This issue happens when the user running the query does not have the required permissions. This could have been caused by a removal or update of the permissions of the scheduled query's user.
Step 1 - Checking which user is running the query:
Head to GCP - BigQuery - Scheduled Queries
Once on the scheduled queries screen, click on the display name of the query that needs to be checked and head to Configuration. There you will find the user that currently runs the query.
Step 2 - Understanding the permissions that are needed for running the query:
As specified on Google Cloud's website, you need 3 permissions:
bigquery.transfers.update, and, on the dataset, bigquery.datasets.get and bigquery.datasets.update
Step 3 - Check running user's permissions:
From the GCP menu head to IAM & Admin - IAM
There you will find the roles assigned to different users. Verify the permissions of the user running the query.
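If you prefer the command line, a user's roles can also be listed with gcloud (the project ID and email below are placeholders):

gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:user:someone@example.com" \
    --format="table(bindings.role)"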
Now we can solve this issue in 2 different ways:
Step 4 - Edit the current user's roles or update the scheduler's credentials with an email that has the required permissions:
Option 1: Edit the current user's roles: on the IAM screen, you can click "Edit principal" next to a user to add, remove, or update roles (remember to add a role that complies with the permissions required in Step 2).
Option 2: Update credentials (as @coderintherye suggested in another answer): head to GCP - BigQuery - Scheduled Queries, select the query you want to troubleshoot, head to MORE (on the top-right corner of the screen) - Update credentials - finally, choose a mail. WARNING: that mail will now be the user that runs the query, so make sure it has the permissions needed as mentioned in Step 2.
To change a scheduled query from a user to a service account, you need to:
Make sure that the service account is in the same project as the scheduled query.
Ensure that both you as a user and the service account have the appropriate permissions:
https://cloud.google.com/bigquery/docs/scheduling-queries#required_permissions
Then you can run a command from the CLI, or Python code, to make the change from user to service account:
CLI:
bq update \
--transfer_config \
--update_credentials \
--service_account_name=abcdef-test-sa@abcdef-test.iam.gserviceaccount.com \
projects/862514312345/locations/us/transferConfigs/5dd12f12-0000-122f-bc38-089e0820fe38
Python:
from google.cloud import bigquery_datatransfer
from google.protobuf import field_mask_pb2

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

service_account_name = "email address of your service account"
transfer_config_name = "projects/SOME_NUMBER/locations/EUROPE_OR_US/transferConfigs/A_LONG_ALPHANUMERIC_ID"

transfer_config = bigquery_datatransfer.TransferConfig(name=transfer_config_name)

# The field mask ensures only service_account_name is updated.
transfer_config = transfer_client.update_transfer_config(
    {
        "transfer_config": transfer_config,
        "update_mask": field_mask_pb2.FieldMask(paths=["service_account_name"]),
        "service_account_name": service_account_name,
    }
)
print("Updated config: '{}'".format(transfer_config.name))
See also here for code examples:
https://cloud.google.com/bigquery/docs/scheduling-queries#update_scheduled_query_credentials
bq update --transfer_config --update_credentials --service_account_name=<service_account> <resource_name>
service_account = the ID of the service account that you wish to use as a credential.
resource_name = the resource name of the scheduled query, shown in the configuration section of the scheduled query's detail page.
We have created a Cloud Spanner instance and databases in the Google Cloud console.
The following is the code snippet we are executing:
import logging

def getDatabaseList(self):
    try:
        parent = "projects/" + self._PROJECT_NAME + "/instances/" + self._INSTANCE_NAME
        response = self.service.projects().instances().databases().list(parent=parent).execute()
    except Exception as e:
        logging.info("Exception while getDatabaseList %s", e)
        return False
    return response
In the above code snippet, self.service is a googleapiclient discovery build() object.
We get the exception below when executing the snippet with a service account ID:
Exception while getDatabaseList <HttpError 403 when requesting https://spanner.googleapis.com/v1/projects/<projectName>/instances/<instanceName>/databases?alt=json&key=<APIKEY>
returned "Resource projects/<projectName>/instances/<instanceName> is missing IAM permission: spanner.databases.list.">
Reference document: Cloud Spanner IAM
The following link shows an example of listing the databases in an instance using the Python Spanner client library:
https://github.com/googleapis/python-spanner/blob/main/samples/samples/snippets.py#L144
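In essence, the sample does the following (a minimal sketch; the project and instance IDs are placeholders):

from google.cloud import spanner

# Credentials are picked up from GOOGLE_APPLICATION_CREDENTIALS.
client = spanner.Client(project="my-project")
instance = client.instance("my-instance")

for database in instance.list_databases():
    print(database.name)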
Regarding the IAM permission issue, it seems you have not set GOOGLE_APPLICATION_CREDENTIALS. @ACimander's answer is correct.
You can also authenticate using a service account with gcloud:
gcloud auth activate-service-account SERVICE_ACCOUNT@DOMAIN.COM --key-file=/path/key.json --project=PROJECT_ID
More information on this can be found at https://cloud.google.com/sdk/gcloud/reference/auth/activate-service-account
A little late, but hopefully this helps: did you set the path to your service account's JSON file correctly? I wasted half a day playing with permissions until I figured out that I had simply missed an env variable.
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account/key.json
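Or, equivalently, set it from inside the script before any client object is created (the path is a placeholder):

import os

# Must be set before any Google client library object is constructed.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your/service_account/key.json"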