Can I run a Dataflow job between projects? - google-cloud-platform

I want to export data from Cloud Spanner in project A to GCS in project B as AVRO.
If my service account in project B is given spanner.read access in project A, can I run a Dataflow job from project B with the template Cloud_Spanner_to_GCS_Avro and write to GCS in project B?
I've tried both in the console and with the following command:
gcloud dataflow jobs run my_job_name \
--gcs-location='gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro' \
--region=my_region \
--parameters='instanceId=name_of_instance,databaseId=databaseid,outputDir=my_bucket_url' \
--service-account-email=my_serviceaccount_email
I'm not sure how to specify the projectId of the Spanner instance.
With this command run from project B, it looks in project B's Spanner and cannot find the instance and database.
I've tried to set instanceId=projects/id_of_project_A/instances/name_of_instance, but it's not a valid input.

Yes, you can; you have to grant the correct authorization to the Dataflow service account.
I recommend using a user-managed service account. The default one is the Compute Engine default service account, which has the Editor role on the host project: far too many permissions...
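As a minimal sketch of the IAM side, assuming a hypothetical user-managed service account my-dataflow-sa@project-b.iam.gserviceaccount.com that runs the job in project B (the instance, bucket and project names are the placeholders from the question):
# Let the job's service account read the Spanner database in project A
gcloud projects add-iam-policy-binding id_of_project_A \
  --member='serviceAccount:my-dataflow-sa@project-b.iam.gserviceaccount.com' \
  --role='roles/spanner.databaseReader'
# The same account also needs to write the Avro files to the bucket in project B
gsutil iam ch \
  serviceAccount:my-dataflow-sa@project-b.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://my_bucket_url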

So the answer seems to be that it's possible with some templates, or if you write a custom one, but not with the template I want to use (batch export from Spanner to GCS as Avro files).
It might be added to the template in a future update.

Related

Running Dataflow Flex template poll time out

I have two service accounts with exactly the same roles under the same project; one can run the Flex template without any issue, but the other fails and returns:
Timeout in polling result file: <LOGGING_BUCKET>. Service account: <SERVICE_ACCOUNT> Image URL: <IMAGE_URL> Troubleshooting guide at https://cloud.google.com/dataflow/docs/guides/common-errors#timeout-polling
The SA that fails doesn't write the logs to the GCS bucket, which makes it really difficult to debug. The graph doesn't get created and the job seems to get stuck at the queuing stage. The roles of both SAs are:
BigQuery Admin
Bigtable User
Dataflow Developer
Editor
Storage Object Viewer
Sorry if it is obvious, but:
Have you checked the Google doc linked from the error? (https://cloud.google.com/dataflow/docs/guides/common-errors#timeout-polling)
Do both SAs really have the same roles? (A command to compare their bindings is sketched after these questions.)
Let's say that SA1 can run Flex1 and SA2 can't run Flex2. Have you tried assigning SA1 to Flex2?
What could possibly differ between the two SAs?
If you create an SA3 with the same roles as SA2 and assign it to Flex2, does it work?
Good luck
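A hedged way to check the "same roles" assumption from the command line, with the project ID and both service account emails as placeholders:
# List every role bound to each service account in the project, then diff them
gcloud projects get-iam-policy <PROJECT_ID> \
  --flatten='bindings[].members' \
  --filter='bindings.members:serviceAccount:<SA1_EMAIL>' \
  --format='value(bindings.role)' | sort > sa1-roles.txt
gcloud projects get-iam-policy <PROJECT_ID> \
  --flatten='bindings[].members' \
  --filter='bindings.members:serviceAccount:<SA2_EMAIL>' \
  --format='value(bindings.role)' | sort > sa2-roles.txt
diff sa1-roles.txt sa2-roles.txt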

GCP Vertex AI Training Custom Job: User does not have bigquery.jobs.create permission

I'm struggling to execute a query with the BigQuery Python client from inside a Vertex AI custom training job on Google Cloud Platform.
I have built a Docker image which contains this Python code, then pushed it to Container Registry (eu.gcr.io).
I am using this command to deploy it:
gcloud beta ai custom-jobs create --region=europe-west1 --display-name="$job_name" \
--config=config_custom_container.yaml \
--worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri="$docker_img_path" \
--args="${model_type},${env},${now}"
I have even tried to use the --service-account option to specify a service account with the BigQuery Admin role; it did not work.
According to this link
https://cloud.google.com/vertex-ai/docs/general/access-control?hl=th#granting_service_agents_access_to_other_resources
the Google-managed service account for the AI Platform Custom Code Service Agent (Vertex AI) already has the right to access BigQuery, so I do not understand why my job fails with this error:
google.api_core.exceptions.Forbidden: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/*******/jobs?prettyPrint=false:
Access Denied: Project *******:
User does not have bigquery.jobs.create permission in project *******.
I have replaced the project ID with *******.
Edit:
I have tried several configurations; my last config YAML file only contains this:
baseOutputDirectory:
  outputUriPrefix:
Using the serviceAccount field does not seem to change the actual configuration, unlike the --service-account option.
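(For reference, a hedged sketch of a config that sets both fields, with placeholder values; per the CustomJobSpec reference the serviceAccount field sits at the top level, although as noted above it did not seem to take effect here:)
cat > config_custom_container.yaml <<'EOF'
# placeholder values; only the field names matter here
baseOutputDirectory:
  outputUriPrefix: gs://<MY_BUCKET>/custom-job-output
serviceAccount: <SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com
EOF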
Edit 14-06-2021: Quick fix
As #Ricco.D said:
try explicitly defining the project_id in your bigquery code if you have not done this yet.
bigquery.Client(project=[your-project])
This fixed my problem. I still do not know what the cause was.
To fix the issue, explicitly specify the project ID in the BigQuery client code.
Example:
bigquery.Client(project=[your-project], credentials=credentials)
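If the permission error ever comes back even with the project set explicitly, another thing worth verifying (a sketch of an assumption, not the confirmed cause here) is that the account the job runs as can create BigQuery jobs in that project, and that it is the one actually passed to the custom job:
# roles/bigquery.jobUser contains bigquery.jobs.create
gcloud projects add-iam-policy-binding <PROJECT_ID> \
  --member='serviceAccount:<SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com' \
  --role='roles/bigquery.jobUser'
gcloud beta ai custom-jobs create --region=europe-west1 --display-name="$job_name" \
  --config=config_custom_container.yaml \
  --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri="$docker_img_path" \
  --args="${model_type},${env},${now}" \
  --service-account='<SA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com'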

In GCP, how to list all the resources running under a project?

I need to list all the instances, containers, functions, notebooks, buckets, Dataproc clusters and Composer environments running under a project, across all regions/locations.
Is it possible to list resources from all regions/locations? Either gcloud or a Python script works for me.
My ultimate goal after listing is to tag each resource according to its name.
Thanks
You can use the Cloud Asset Inventory feature and query your project like this:
gcloud asset search-all-resources --scope=projects/<PROJECT_ID> --page-size=500 --format=json
More detail is available in the documentation about the query format.
Not all resources are supported; you can find the full list here (for example, Cloud Run isn't supported yet, but it's coming soon!).
If you want to do this through the console, you can go to the IAM & Admin menu, then select Asset Inventory.
There you can see the list of assets.
Click the Resource tab if you want to download all the details in CSV format.
A search across all assets returns an abundance of irrelevant data. It is better to restrict the query to the asset types that are relevant to you, such as:
compute.googleapis.com/Instance
storage.googleapis.com/Bucket
dataproc.googleapis.com/Cluster
container.googleapis.com/Cluster
cloudfunctions.googleapis.com/CloudFunction
dataflow.googleapis.com/Job //Notebook
gcloud asset search-all-resources \
  --asset-types='compute.googleapis.com/Instance,storage.googleapis.com/Bucket' \
  --query='labels.name:*' \
  --format='table(name, assetType, labels)'
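Since the end goal is to tag resources after their names, the filtered search can feed a labeling loop. A rough sketch for Compute Engine instances only (the project ID and the label key "name" are assumptions; other resource types each need their own labeling command):
for name in $(gcloud asset search-all-resources \
    --scope=projects/<PROJECT_ID> \
    --asset-types='compute.googleapis.com/Instance' \
    --format='value(displayName)'); do
  # add-labels needs the zone, so look it up per instance
  zone=$(gcloud compute instances list --filter="name=${name}" --format='value(zone.basename())')
  gcloud compute instances add-labels "${name}" --zone="${zone}" --labels="name=${name}"
done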

Connect BigQuery as a source to Data Fusion in another GCP project

I am trying to connect BigQuery in project A as a source to Data Fusion in project B, and it is asking me to enter a service key file. I have tried uploading the service key file to Cloud Storage in project B and providing the link, but it asks for a local file path.
Can someone help me with this?
Thanks in advance.
Can you try this: grant BigQuery permissions in project A to the two Data Fusion service accounts from project B:
service-project_number@gcp-sa-datafusion.iam.gserviceaccount.com
project_number-compute@developer.gserviceaccount.com
Steps:
1. Navigate to the customer project that contains the CDF (Cloud Data Fusion) instance and copy the project number (this is found on the Home page in the Project Info card).
2. Navigate to the project that contains the resources you would like to interact with.
3. In the sidebar, click on ‘IAM & Admin’.
4. Click on ‘Add’ at the top of the page.
5. Provide the first service account name from the list above; be sure to replace project_number with the actual number you obtained in step 1.
6. Grant the Admin role for the resource you would like to interact with, e.g. BigQuery Admin for reading/writing to BigQuery. For BigQuery, you will also need to grant the BigQuery Data Owner role.
7. Repeat steps 5 and 6 for the second service account in the list above.
8. In your pipeline, ensure you define the correct project ID for the sources/sinks. Using ‘auto-detect’ will default to the customer project that contains the CDF instance.
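If you prefer the command line over the console, a hedged equivalent of steps 4 to 7 (the project IDs and the project number are placeholders; the roles match the ones named above for BigQuery):
for sa in "service-<PROJECT_B_NUMBER>@gcp-sa-datafusion.iam.gserviceaccount.com" \
          "<PROJECT_B_NUMBER>-compute@developer.gserviceaccount.com"; do
  for role in roles/bigquery.admin roles/bigquery.dataOwner; do
    # bind each role in project A, the project that owns the BigQuery data
    gcloud projects add-iam-policy-binding <PROJECT_A_ID> \
      --member="serviceAccount:${sa}" --role="${role}"
  done
done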
Can you try downloading the service key JSON file to your local computer, putting it in some folder, and providing the full path to that service key file in the BigQuery properties?

How to export and import google cloud monitoring dashboards between projects using script or API?

I have exported the dashboards using gcloud alpha monitoring dashboards list --format=json, but gcloud monitoring dashboards create using that file is not working. Basically, I want to export the dashboards from one project and import them into another project.
The output of the list subcommand probably (didn't test this) contains too many dashboards for the create command.
Also, you should remove two fields (name and etag). There is no need to export as JSON; YAML also works and is easier to edit anyway.
I did the following:
gcloud monitoring dashboards list and find the dashboard I was looking for
Note its name property and get the ID from the last part of the name property (a large decimal number or GUID)
Run gcloud monitoring dashboards describe $DASHBOARD_ID > dashboard-$DASHBOARD_ID.yaml to export the dashboard
Edit the file to remove the etag and name fields (the name is usually located at the end of the file)
gcloud monitoring dashboards create --config-from-file dashboard-$DASHBOARD_ID.yaml
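Putting it together, a rough copy script, assuming both projects are reachable with your gcloud credentials (the project IDs and dashboard ID are placeholders, and the grep assumes name and etag sit at the top level of the YAML):
DASHBOARD_ID=<SOURCE_DASHBOARD_ID>
# Export from the source project as YAML
gcloud monitoring dashboards describe "${DASHBOARD_ID}" \
  --project=<SOURCE_PROJECT_ID> --format=yaml > dashboard.yaml
# Remove the name and etag fields as described above
grep -vE '^(name|etag):' dashboard.yaml > dashboard-clean.yaml
# Import into the target project
gcloud monitoring dashboards create \
  --config-from-file=dashboard-clean.yaml --project=<TARGET_PROJECT_ID>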