Connect BigQuery as a source to Data Fusion in another GCP project - google-cloud-platform

I am trying to connect BigQuery in ProjectA to Data Fusion in ProjectB, and it's asking me to enter a service key file. I uploaded the service key file to Cloud Storage in ProjectB and provided the link, but it asks for a local file path.
Can someone help me with this?
Thanks in advance.

Try granting BigQuery permissions in project A to the Data Fusion service accounts of project B:
service-<project_number>@gcp-sa-datafusion.iam.gserviceaccount.com
<project_number>-compute@developer.gserviceaccount.com
Steps:
1. Navigate to the customer project that contains the CDF instance and copy the project number (found on the Home page in the Project Info card).
2. Navigate to the project that contains the resources you would like to interact with.
3. In the sidebar, click on ‘IAM & Admin’.
4. Click on ‘Add’ at the top of the page.
5. Provide the first service account name from the list above, replacing the project number placeholder with the actual number you obtained in step 1.
6. Grant the Admin role for the resource you would like to interact with, e.g. BigQuery Admin for reading/writing to BigQuery. For BigQuery, you will also need to grant the BigQuery Data Owner role.
7. Repeat steps 5 & 6 for the second service account in the list above.
8. In your pipeline, ensure you define the correct Project Id for the sources/sinks. Using ‘auto-detect’ will default to the customer project that contains the CDF instance.
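As a rough sketch of steps 5 to 7 done from the command line rather than the console, the snippet below builds the two service account names from a project number and prints the matching `gcloud projects add-iam-policy-binding` commands. The function names and the example project values are hypothetical, and the role IDs `roles/bigquery.admin` and `roles/bigquery.dataOwner` are assumed to correspond to the "BigQuery Admin" and "BigQuery Data Owner" roles named above.

```python
from typing import List, Sequence


def datafusion_service_accounts(project_number: str) -> List[str]:
    """Return the two service accounts of the project hosting the CDF instance."""
    return [
        f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com",
        f"{project_number}-compute@developer.gserviceaccount.com",
    ]


def grant_commands(
    resource_project_id: str,
    project_number: str,
    roles: Sequence[str] = ("roles/bigquery.admin", "roles/bigquery.dataOwner"),
) -> List[str]:
    """Build one gcloud command per (service account, role) pair.

    resource_project_id is the project that owns the BigQuery data;
    project_number belongs to the project that hosts Data Fusion.
    """
    commands = []
    for account in datafusion_service_accounts(project_number):
        for role in roles:
            commands.append(
                "gcloud projects add-iam-policy-binding "
                f"{resource_project_id} "
                f"--member=serviceAccount:{account} "
                f"--role={role}"
            )
    return commands


if __name__ == "__main__":
    # Hypothetical values: replace with your own project ID and number.
    for command in grant_commands("project-a-id", "123456789012"):
        print(command)
```

Printing the commands instead of running them keeps the sketch free of side effects; review them before executing anything against a real project.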

Can you try downloading the service key JSON file to your local computer? Put the file in a folder and provide the full path to it in the BigQuery properties.

Related

GCP How to copy files automatically from Project A to Project B every monday?

GCP is new to me, but I want to know if it's possible to copy a specific file (e.g. myFiles.csv) from a bucket in project A to a bucket in project B every Monday at 6:00 am.
I need this because myFiles.csv is overwritten every Monday and I need to share it with project B.
You can use Storage Transfer Service:
https://cloud.google.com/storage-transfer/docs/create-transfers#google-cloud-console
With this service, you can select the source and destination buckets and set the scheduling options to match your case (a weekly run, every Monday).
Source bucket (project A):
In this example, I selected a folder team_league in a bucket called mazlum_dev.
In the prefix field, I added the name of the file I want to transfer, input_team_slogans.json.
Use your own file name for your job.
Destination bucket (project B):
Select the output folder of your destination bucket.
Scheduling options:
If needed, you can also use gsutil from the gcloud SDK:
gsutil cp gs://your_bucket_project_a/your_file gs://your_bucket_project_b/output/
But then you have to find a way to run this command on a schedule (with cron, for example). That's why I recommend the first solution: everything is native and integrated for your need.
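For the gsutil alternative, the scheduling could look roughly like the sketch below, which builds a crontab line for "every Monday at 06:00". The function name and bucket names are hypothetical. It assumes a machine with cron and an authenticated gsutil, and cron evaluates the time in that machine's local time zone.

```python
import shlex


def weekly_copy_cron_line(
    source_uri: str,
    destination_uri: str,
    weekday: int = 1,  # cron day of week: 1 = Monday
    hour: int = 6,
    minute: int = 0,
) -> str:
    """Return a crontab line that runs `gsutil cp` once a week."""
    if not 0 <= weekday <= 7:
        raise ValueError("weekday must be between 0 and 7")
    if not 0 <= hour <= 23 or not 0 <= minute <= 59:
        raise ValueError("hour or minute out of range")
    # Quote the URIs so unusual characters do not break the shell command.
    command = "gsutil cp {} {}".format(
        shlex.quote(source_uri), shlex.quote(destination_uri)
    )
    # crontab fields: minute hour day-of-month month day-of-week command
    return f"{minute} {hour} * * {weekday} {command}"


if __name__ == "__main__":
    print(
        weekly_copy_cron_line(
            "gs://your_bucket_project_a/myFiles.csv",
            "gs://your_bucket_project_b/output/",
        )
    )
```

The resulting line can be added with `crontab -e`. Unlike Storage Transfer Service, this approach needs a machine that stays up to run the job.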
Follow the steps below:
In the web console, go to Storage > Transfer to create a new transfer, then select the source bucket you want to copy from (e.g. in project A). When you reach the destination part of the transfer form, you can type or paste the target bucket (e.g. in project B) directly into the text input, even if that bucket belongs to another project. A green icon appears once the target has been verified as an existing bucket. You can then continue through the form to finalize your setup.
Once you start the transfer from the form, you can follow its progress by hitting the refresh button at the top of the console.
This works because bucket identifiers are globally unique (this is key to the solution).
Refer to this SO link for more information.

Azure Data Factory HDFS dataset preview error

I'm trying to connect to the HDFS from the ADF. I created a folder and sample file (orc format) and put it in the newly created folder.
Then in ADF I successfully created a linked service for HDFS using my Windows credentials (the same user that created the sample file):
But when trying to browse the data through the dataset:
I'm getting an error: The response content from the data store is not expected, and cannot be parsed.
Am I doing something wrong, or is it some kind of permissions issue?
Please advise
This appears to be a generic issue. You need to point to a file with an appropriate extension rather than to the folder itself. Also make sure you are using a supported data store activity.
You can follow the official MS doc on using an HDFS server with Azure Data Factory.

GCP: Is it possible to have an access to a resource if don't have project access?

This is my first experience with Google Cloud Platform and I'm confused.
I've been given access to a resource:
xxx@gmail.com has granted you the following roles for resource resource_name(projects/project_name/datasets/ClientsExport/tables/resource_name) BigQuery Data Editor
But if I open BigQuery Data Editor, I don't see project_name and resource_name. Search by resource_name also returns no result.
This is the only access I have in the project (I didn't receive any other access or emails).
Could you please help me with this? Do I need some additional access before resource_name becomes available? Or is there another way to find the resource?
Thank you in advance!
According to the message, you have access to the BigQuery data inside a table. You can query it from your own project; you are authorised to read it (and also to write, because you are an editor).
However, this table isn't in your project. It's in another project, which is why you don't see it directly in the BigQuery console. In addition, you don't have the right to read the metadata (roles/bigquery.metadataViewer) on the dataset of the other project. As a result, you may not be able to view the table schema in the console either, but the bq CLI lets you view it.
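To make this concrete, here is a small sketch that turns the resource path from the notification into a fully qualified table name, a preview query you could run from your own project, and a `bq show --schema` command. The function names are hypothetical and the parsing assumes the path has exactly the `projects/.../datasets/.../tables/...` shape quoted in the question.

```python
def table_reference(resource_path: str) -> str:
    """Convert 'projects/P/datasets/D/tables/T' into 'P.D.T'."""
    parts = resource_path.strip("/").split("/")
    if len(parts) != 6 or parts[0::2] != ["projects", "datasets", "tables"]:
        raise ValueError(f"unexpected resource path: {resource_path!r}")
    project, dataset, table = parts[1::2]
    return f"{project}.{dataset}.{table}"


def preview_query(resource_path: str, limit: int = 10) -> str:
    """Build a standard SQL query that reads a few rows from the shared table."""
    return f"SELECT * FROM `{table_reference(resource_path)}` LIMIT {int(limit)}"


def bq_schema_command(resource_path: str) -> str:
    """Build a bq CLI command that prints the table schema."""
    project, dataset, table = table_reference(resource_path).split(".")
    return f"bq show --schema --format=prettyjson {project}:{dataset}.{table}"


if __name__ == "__main__":
    path = "projects/project_name/datasets/ClientsExport/tables/resource_name"
    print(preview_query(path))
    print(bq_schema_command(path))
```

The query is meant to be pasted into the query editor of a project where you are allowed to run jobs; the table itself stays in the other project.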
I had some discussions with the Google BigQuery team about this (because I hit the same issue in my company), and updates should arrive by the end of the year (or early in 2022) to fix this "view" issue in the console.
It looks like you have IAM permission to access a specific resource in BigQuery but cannot access it from the GUI.
Some reasons you may not see access on your GUI:
You have permission to interact with BigQuery but don't have access to any of the data.
You aren't a member of the organization which provided the resources and they have higher level permissions (on the org level) which prevents sharing of resources outside of the org.
Your access is restricted to the command line/app level. (If your account is a service account then this is likely the case.)

Can I run a Dataflow job between projects?

I want to export data from Cloud Spanner in project A to GCS in project B as AVRO.
If my service-account in project B is given spanner.read access in project A, can I run a dataflow-job from project B with template: Cloud_Spanner_to_GCS_Avro and write to GCS in project B?
I've tried both in console and with following command:
gcloud dataflow jobs run my_job_name \
--gcs-location='gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro' \
--region=my_region \
--parameters='instanceId=name_of_instance,databaseId=databaseid,outputDir=my_bucket_url' \
--service-account-email=my_serviceaccount_email
I'm not sure how to specify projectId of the Spanner instance.
With this command run from project B, it looks in project B's Spanner and cannot find the instance and database.
I've tried to set instanceId=projects/id_of_project_A/instances/name_of_instance, but it's not a valid input.
Yes, you can. You have to grant the correct authorization to the Dataflow service account.
I recommend using a "user-managed service account". The default one is the Compute Engine default service account, which has the Editor role on the host project: far too many permissions.
So the answer seems to be that it's possible for some templates, or if you write a custom one, but not for the template I want to use (batch export from Spanner to GCS Avro files).
It might be added in a future update to the template.

How to specify the GCP Credential Location in application.properties file (for using the Pub/Sub in GCP)?

This seems like it should be straightforward: pass the service account key file (generated from the GCP console) by specifying its location in the application.properties file. However, I tried all of the following options:
1. spring.cloud.gcp.credentials.location=file:/home/my_user_id/mp6key.json
2. spring.cloud.gcp.credentials.location=file:src/main/resources/mp6key.json
3. spring.cloud.gcp.credentials.location=file:./main/resources/mp6key.json
4. spring.cloud.gcp.credentials.location=file:/src/main/resources/mp6key.json
It all ended up with the same error:
java.io.FileNotFoundException: /home/my_user_id/mp6key.json (No such file or directory)
Could anyone advise where I should put the key file and then how should I specify the path to the file properly?
The same programs run successfully in Eclipse, publishing and subscribing to messages through Pub/Sub on GCP (using the Project Id and service account key generated in GCP), but I'm now stuck with the above issue after deploying them to run on GCP.
As mentioned in the official documentation, the credentials file can be obtained from a number of different locations, such as the file system, the classpath, a URL, etc.
For example, if the service account key file is stored in the classpath as src/main/resources/key.json, pass the following property:
spring.cloud.gcp.credentials.location=classpath:key.json
If the key file is stored somewhere else in your local file system, use the file prefix in the property value:
spring.cloud.gcp.credentials.location=file:<path to key file>
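The difference between the two prefixes can be illustrated with a simplified model, written here in Python rather than Java. It is not Spring's actual resolver: the function name, the `classpath_roots` parameter, and the lookup rules are assumptions made for illustration. The idea it models is that a `classpath:` location is searched under the classpath roots, while a relative `file:` location depends on the directory the process was started from, which typically differs between an IDE run and a deployed application.

```python
import os
from typing import Sequence


def resolve_location(
    location: str, working_dir: str, classpath_roots: Sequence[str]
) -> str:
    """Return the file a 'classpath:' or 'file:' location would point to.

    Simplified model: only these two prefixes and plain paths are handled.
    """
    if location.startswith("classpath:"):
        name = location[len("classpath:"):].lstrip("/")
        for root in classpath_roots:
            candidate = os.path.join(root, name)
            if os.path.isfile(candidate):
                return candidate
        raise FileNotFoundError(f"{name} not found on the classpath")
    if location.startswith("file:"):
        path = location[len("file:"):]
        # A relative path is interpreted against the working directory.
        if not os.path.isabs(path):
            path = os.path.join(working_dir, path)
        path = os.path.normpath(path)
        if os.path.isfile(path):
            return path
        raise FileNotFoundError(f"{path} (No such file or directory)")
    raise ValueError(f"unsupported location prefix: {location!r}")
```

Under this model, `file:src/main/resources/key.json` only resolves when the process starts in the project root, whereas `classpath:key.json` keeps working once the resources are packaged with the application.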
My line looks like this:
spring.cloud.gcp.credentials.location=file:src/main/resources/[my_json_file]
And this works.
The following also works if I put it in the root of the project directory:
spring.cloud.gcp.credentials.location=file:./[my_json_file]
Have you tried following this quickstart? Please follow it carefully and describe any error you get along the way.
In any case, before running your Java program, try running the following in the console (modify it with the exact path where you store your key):
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/mp6key.json"
How are you authenticating your credentials in your Java program?
My answer is easy: if you run your code on GCP, you don't have to use a service account key file. Problem eliminated, problem solved!
More seriously, have a look at service identity. I don't know what your current service is (Compute? Function? Cloud Run?). In any case, you can attach a service account to any GCP component. Then, in your code, simply use the default credentials; the component identity is loaded automatically. No key to manage, no key to store securely, no key to rotate!
If you provide more detail on your target platform, I can give you some guidance on achieving this.
Keep in mind that service account key files are designed for automated apps (with no user account involved) hosted outside GCP (on-prem, another cloud provider, CI/CD, Apigee, ...).
UPDATE
When you use your personal account, you can also use the default credentials.
1. Install the gcloud SDK on your computer
2. Run the command gcloud auth application-default login
3. Follow the instructions
4. Enjoy!
If it doesn't work, take the <path> displayed after the login command and set it as the value of the environment variable named GOOGLE_APPLICATION_CREDENTIALS.
If you definitely want to use a service account key file (which is a security risk, since the key has to be stored and rotated, but...), you can use it locally:
Either set the JSON key file path in the GOOGLE_APPLICATION_CREDENTIALS environment variable
Or run this command: gcloud auth activate-service-account --key-file=<path to your json key file>
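Before digging into the application itself, it can help to check what the GOOGLE_APPLICATION_CREDENTIALS variable actually points to. The sketch below is a hypothetical helper built only on the standard library (it does not call any Google client library) that reports whether the variable is set, whether the file exists, and which credential `type` the JSON declares.

```python
import json
import os
from typing import Mapping, Optional


def check_credentials_env(environ: Optional[Mapping[str, str]] = None) -> str:
    """Describe the state of GOOGLE_APPLICATION_CREDENTIALS in plain words."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"no file at {path}"
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except json.JSONDecodeError:
        return f"{path} is not valid JSON"
    if not isinstance(data, dict):
        return f"{path} does not contain a JSON object"
    # Key files and application-default login files both carry a 'type' field.
    return f"{path} found, credential type: {data.get('type', 'unknown')}"


if __name__ == "__main__":
    print(check_credentials_env())
```

A "no file" result on the deployed machine would be consistent with the FileNotFoundException in the question: the path may be valid on the development machine but not where the code runs.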
Provided your file is in the resources folder, try:
file://mp6key.json
Using file:// instead of file:/ works for me, at least.