I have created a simple Dataprep workflow (source: a CSV file in GCS, a simple transformation (uppercase conversion), and target: load into BigQuery).
When I run this workflow job in the Dataprep UI, I get the following error:
Unable to rename output files from
gs://test//temp/dax-tmp-2021-04-02_09_21_11-16351772716646701863-S07-0-d007cba17fe923f9/tmp-d007cba17fe92b44#DAX.ism to gs://test//temp/tmp-d007cba17fe92b44#.ism.,
I have the following IAM roles:
Dataprep User,
Storage Admin,
Storage Object Admin,
Storage Object Creator,
Viewer
Since I already have admin access in GCS, I am not sure why I am getting the 'Unable to rename output files' error. I am able to modify the files in GCS using the gsutil command.
Kindly advise whether this is an access issue in Dataprep and how to solve this problem.
DataPrep Logs:
DataFlow Log:
I am referring to this article on creating a Cloud Dataprep pipeline.
When following the step of importing the data while creating the flow, I am not able to read the data and it says access denied, as per the above screenshot.
Reference Link : https://www.trifacta.com/blog/data-quality-monitoring-for-cloud-dataprep-pipelines/
I tried importing the JSON file and I am expecting the flow to read the table.
I'm struggling to execute a query with the BigQuery Python client from inside a Vertex AI custom training job on Google Cloud Platform.
I have built a Docker image which contains this Python code, then pushed it to Container Registry (eu.gcr.io).
I am using this command to deploy it:
gcloud beta ai custom-jobs create --region=europe-west1 --display-name="$job_name" \
--config=config_custom_container.yaml \
--worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri="$docker_img_path" \
--args="${model_type},${env},${now}"
I have even tried the --service-account option to specify a service account with the BigQuery Admin role; it did not work.
According to this link
https://cloud.google.com/vertex-ai/docs/general/access-control?hl=th#granting_service_agents_access_to_other_resources
the Google-managed service account for the AI Platform Custom Code Service Agent (Vertex AI) already has the right to access BigQuery, so I do not understand why my job fails with this error:
google.api_core.exceptions.Forbidden: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/*******/jobs?prettyPrint=false:
Access Denied: Project *******:
User does not have bigquery.jobs.create permission in project *******.
I have replaced the id with *******
Edit:
I have tried several configurations; my latest config YAML file only contains this:
baseOutputDirectory:
  outputUriPrefix:
Using the serviceAccount field does not seem to change the actual configuration, unlike the --service-account option.
Edit 14-06-2021: Quick fix
As #Ricco.D said:
try explicitly defining the project_id in your bigquery code if you have not done this yet.
bigquery.Client(project=[your-project])
This fixed my problem. I still do not know the cause.
To fix the issue, you need to explicitly specify the project ID in the BigQuery code.
Example:
bigquery.Client(project=[your-project], credentials=credentials)
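For context, a minimal sketch of what that can look like inside the training container (the project ID and query below are placeholders, and it falls back to the job's default credentials rather than an explicit credentials object):

from google.cloud import bigquery

# Passing the project explicitly ensures query jobs are created in your own
# project instead of whatever project the client infers from the environment.
client = bigquery.Client(project="my-gcp-project")  # placeholder project ID

query = "SELECT 1 AS ok"  # placeholder query
for row in client.query(query).result():
    print(row.ok)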
I'm trying to load around 1000 files from Google Cloud Storage into BigQuery using the BigQuery transfer service, but it appears I have an error in one of my files:
Job bqts_601e696e-0000-2ef0-812d-f403043921ec (table streams) failed with error INVALID_ARGUMENT: Error while reading data, error message: CSV table references column position 19, but line starting at position:206 contains only 19 columns.; JobID: 931777629779:bqts_601e696e-0000-2ef0-812d-f403043921ec
How can I find which file is causing this error?
I feel like this is in the docs somewhere, but I can't seem to find it.
Thanks!
You can use bq show --format=prettyjson -j job_id_here, which will show a verbose error about the failed job. You can see more info about the usage of the command in the BigQuery managing jobs docs.
I tried this with a failed job of mine in which I'm loading CSV files from a Google Cloud Storage bucket in my project.
Command used:
bq show --format=prettyjson -j bqts_xxxx-xxxx-xxxx-xxxx
Here is a snippet of the output. Output is in JSON format:
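Separately, if you prefer to look the failed job up programmatically rather than through bq show, a minimal sketch along these lines (project ID, job ID and location are placeholders, and it assumes the google-cloud-bigquery client library) prints each error recorded on the job, which usually names the offending file:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID
# Placeholder job ID and location; use the bqts_... ID from the failure message.
job = client.get_job("bqts_xxxx-xxxx-xxxx-xxxx", location="US")

# A failed load job carries a list of error records; the messages typically
# include the gs:// URI of the file that could not be parsed.
for error in job.errors or []:
    print(error.get("reason"), "-", error.get("message"))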
New to GCP. Trying to load a saved model file into an AI Platform notebook. Tried several approaches without success.
Most obvious approach seemed to be to set the value of a variable to the path copied from storage:
model_path = "gs://<my-bucket>/models/3B/export/1600635833/saved_model.pb"
Results: OSError: SavedModel file does not exist at: (the above path)
I know I can connect to the bucket and retrieve files, because I downloaded a CSV file from it and printed out the contents.
The OSError suggests you are trying to access the GCS bucket through a regular file system call, which does not understand gs:// paths (for example, Python's open() function).
To access files in GCS, I recommend you use the client libraries: https://cloud.google.com/storage/docs/reference/libraries
Another option for testing is to connect over SSH and use the gsutil command.
Note: I assume <my-bucket> was edited to replace your real GCS bucket name.
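To make the client-library suggestion concrete, here is a minimal sketch (bucket name, export prefix and local directory are placeholders) that copies the whole SavedModel export directory to local disk, where a regular file path will work:

import os
from google.cloud import storage

def download_saved_model(bucket_name, prefix, local_dir="/tmp/saved_model"):
    # Copy every object under the SavedModel export prefix (saved_model.pb,
    # variables/, assets/) to local disk so it can be loaded from a normal path.
    client = storage.Client()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):  # skip folder placeholder objects
            continue
        local_path = os.path.join(local_dir, os.path.relpath(blob.name, prefix))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        blob.download_to_filename(local_path)
    return local_dir

# Placeholders: replace with your bucket and export path, then point your
# model-loading code at the returned local directory instead of the gs:// path.
model_dir = download_saved_model("<my-bucket>", "models/3B/export/1600635833")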
According to the GCP documentation here, you are able to access Cloud Storage. That page will guide you through using Cloud Storage with AI Platform Training.
According to the Google Cloud Platform SQL docs, I should be able to both export to and import from sharded files in a GCS bucket by putting a * in the filename.
If I import a single file, it works fine:
gcloud sql import csv sql-instance-name gs://gcsbucketname/data/led/led_finance_view_000000000000.csv --project=project-name --database=finance --table=Import_Test -q
Importing data into Cloud SQL instance...done.
Imported data from [gs://gcsbucketname/data/led/led_finance_view_000000000000.csv] into [https://www.googleapis.com/sql/v1beta4/projects/project-name/instances/sql-instance-name].
But if I import a sharded file, it throws a permissions error:
gcloud sql import csv sql-instance-name gs://gcsbucketname/data/led/led_finance_view_*.csv --project=project-name --database=finance --table=Import_Test -q
ERROR: (gcloud.sql.import.csv) HTTPError 403: The service account does not have the required permissions for the bucket.
I can confirm that these commands are running under the same user, using the same bucket & SQL instance.
I think the '403 service account permissions' error is probably a bug in GCP and therefore a red herring - but why won't it let me import the sharded file?
Currently, Cloud SQL does not support importing several CSV files at once using wildcards. I have opened a public feature request for this; you can star it to give it more visibility.
Meanwhile, as a workaround, you can either have a script run the import command once for each file (see the sketch below), or join the CSV files before importing.
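For the scripted option, a rough sketch along these lines (bucket, prefix, instance, project, database and table are placeholders taken from the commands above; it simply shells out to the same gcloud command that already works for a single file):

import subprocess
from google.cloud import storage

# Placeholders: adjust to your bucket, prefix, instance, project, database and table.
BUCKET = "gcsbucketname"
PREFIX = "data/led/led_finance_view_"
INSTANCE = "sql-instance-name"

client = storage.Client()
for blob in client.list_blobs(BUCKET, prefix=PREFIX):
    if not blob.name.endswith(".csv"):
        continue
    uri = f"gs://{BUCKET}/{blob.name}"
    # One import per shard, mirroring the single-file command that succeeds.
    subprocess.run(
        ["gcloud", "sql", "import", "csv", INSTANCE, uri,
         "--project=project-name", "--database=finance",
         "--table=Import_Test", "-q"],
        check=True,
    )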
You have to get the service account of your current database instance. To get it, run something like this in your terminal:
Command: gcloud sql instances describe [db instance name]
After that, go to the GCS project, open your bucket, and add a member under its permissions: add the service account you got from the first command (serviceAccountEmailAddress in the output) with the Storage Admin role.
And that's all.
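If you prefer to grant that role from code instead of the console, here is a minimal sketch (the bucket name and service account email are placeholders; it assumes the google-cloud-storage client library and uses the Storage Admin role suggested above):

from google.cloud import storage

BUCKET = "gcsbucketname"  # placeholder bucket
SQL_SA = "p1234-abcd@gcp-sa-cloud-sql.iam.gserviceaccount.com"  # from the describe output

client = storage.Client()
bucket = client.bucket(BUCKET)

# Append a bucket-level IAM binding for the Cloud SQL instance's service account.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.admin",
    "members": {f"serviceAccount:{SQL_SA}"},
})
bucket.set_iam_policy(policy)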