How do I specify a key file when using Dataflow? - python-2.7

Within my Dataflow pipeline, I have a function that creates a Cloud Storage client. Instead of my VMs automatically using the default credentials, I would like to specify a key file.
I believe the way to do that is
client = storage.Client.from_service_account_json([path to local file])
However, I'm not sure where to put my json file so that my pipeline function has access to it. Where should I upload my json file?

Dataflow uses a controller service account to create and manage resources when executing a pipeline.
If you want to create and use resources with fine-grained access and control, you can use a service account from your job's project as the user-managed controller service account.
Use the --serviceAccount option and specify your service account when you run your pipeline job:
--serviceAccount=my-service-account-name@my-project.iam.gserviceaccount.com
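Since the question is tagged python-2.7: in the Apache Beam Python SDK the equivalent pipeline option is service_account_email. Below is a minimal, hedged sketch of launching a job with it; the project, region, bucket, and account names are placeholders.

# Hedged sketch: pass the controller service account via pipeline options (Python SDK).
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                      # placeholder project ID
    region="us-central1",                      # placeholder region
    temp_location="gs://my-bucket/temp",       # placeholder staging bucket
    service_account_email="my-service-account-name@my-project.iam.gserviceaccount.com",
)

with beam.Pipeline(options=options) as p:
    _ = p | beam.Create(["hello"]) | beam.Map(lambda line: line.upper())

The workers then run as that service account, so as long as it has the right Cloud Storage roles you don't need to ship a key file with the job at all.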

Related

Is there a way to run a GCP Cloud Function locally while authenticated as a service account?

I'm fairly new to GCP Cloud Functions.
I'm developing a cloud function within a GCP project which needs to access some other resources from the project (such as GCS, for instance). When I set up a cloud function, it gets a service account associated with it, so I'm able to give this service account the required permissions in IAM, and it works just fine in production.
I'm handling the required integrations by using the GCP SDKs and identifying the resources relative to the GCP project. For instance, if I need to access a GCS bucket within that project, it looks something like this:
const bucket = await storage.bucket("bucket-name");
The problem with this is that I'm not able to access these resources if I'm running the cloud function locally for development, so I have to deploy it every time to test it, which is a process that takes some time and makes development fairly unproductive.
So, is there any way I can run this cloud function locally while keeping access to the necessary project resources, so that I can test it while developing? I figured that running this function as its service account could work, but I don't know how to do it, and I'm also open to different approaches.
Yes, there is!
The only thing you need to do is set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of a service account JSON key file; the googleapis client libraries then handle the rest automatically, most of the time.
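For example, a minimal sketch (the question uses the Node.js client, but the environment variable is honoured by the Google client libraries in every language; the key path and bucket name are placeholders):

# Hedged sketch: point Application Default Credentials at a downloaded key
# for the function's runtime service account before creating any clients.
import os
from google.cloud import storage

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"

client = storage.Client()               # picks up the key automatically
bucket = client.bucket("bucket-name")   # same access as the deployed function

In practice you would usually export the variable in your shell before starting the local test process, rather than setting it in code.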

How to securely store and manage Google Cloud Storage key (json file)

I am writing an application where I have to upload media files to GCS. So, I created a storage bucket and also created a service account which is being used by the application to put and get images from the bucket. To access this service account from the application I had to generate a private key as a JSON file.
I have tested my code and it is working fine. Now, I want to push this code to my GitHub repository, but I don't want this service account key to be in GitHub.
How do I keep this service account key secret while still allowing all my colleagues to use it?
I am going to put my application on a GCP container instance and I want it to work there as well.
As I understand it, if your application runs inside GCP and uses a custom service account, you might not need any private keys (as JSON files) at all.
The custom service account used by your application should be given the relevant IAM roles/permissions on the corresponding GCS bucket, and that may be all you need to do.
You can assign those IAM roles/permissions manually (through the Cloud Console UI), with CLI commands, or as part of your deployment CI/CD pipeline.
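For illustration, a minimal sketch (assuming the google-cloud-storage Python library; the bucket and object names are placeholders) of what the application code can look like once the attached service account has the right roles, with no key file checked into the repository:

# Hedged sketch: on GCP, the client library uses the instance's attached
# service account automatically, so no JSON key is needed.
from google.cloud import storage

client = storage.Client()                                      # no key file
blob = client.bucket("media-bucket").blob("images/photo.jpg")  # placeholders
blob.upload_from_filename("photo.jpg")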

Monitoring performance metrics across more than one GCP environment

My requirement is to monitor performance metrics of GCP environments. We have to monitor more than one GCP environment (service account). The problem is: how can I set all the service accounts' JSON files in the environment variable "GOOGLE_APPLICATION_CREDENTIALS"?
I am creating the MetricServiceClient like below after setting the JSON file in the environment variable:
MetricServiceClient client = MetricServiceClient.create()
Is there another way to create a MetricServiceClient using credentials?
I suggest you use a Stackdriver Workspace and add all the GCP projects you want to monitor to that workspace. Here is the detailed guide: https://cloud.google.com/monitoring/workspaces/guide.
By using a single Workspace for all GCP projects, you will have all the metrics/logging data in a single place, and you can then use one set of credentials to access the monitoring data of every project.
If a single Workspace is not a feasible option, you can create one GCP service account, grant it the Stackdriver-related permissions in each of the projects, and then use that service account to interact with Stackdriver metrics, as in the sketch below.
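A hedged sketch with the Python client library (the question shows the Java client, which offers an analogous settings-based mechanism for passing credentials; the key path is a placeholder):

# Hedged sketch: build a MetricServiceClient from an explicit key file
# instead of relying on GOOGLE_APPLICATION_CREDENTIALS.
from google.oauth2 import service_account
from google.cloud import monitoring_v3

credentials = service_account.Credentials.from_service_account_file(
    "/path/to/monitoring-key.json"
)
client = monitoring_v3.MetricServiceClient(credentials=credentials)

This way each environment can get its own client built from its own key file, without touching the environment variable.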
Note: Always try to use the principle of least privilege.
Hope this helps.

Is it possible to use multiple service keys in the same command

We wanted to copy a file from one project's storage to another.
I have credentials for project A and project B in separate service accounts.
The only way we knew how to copy files was to grant the service account credentials permissions on the bucket's access control list.
Is there some other way to run commands across accounts using multiple service keys?
You can use Cloud Storage Transfer Service to accomplish this.
The docs should guide you to set up the permissions for buckets in both projects and do the transfers programmatically or in the console.
You need to get the service account email associated with the Storage Transfer Service by entering your project ID in the Try this API page. You then need to give this service account email the required roles to access the data at the source; Storage Object Viewer should be sufficient.
At the data destination, you need to get the service account email for the second project ID and give it the Storage Legacy Bucket Writer role.
You can then do the transfer using the snippets in the docs.
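For reference, a hedged sketch of the programmatic route using the storagetransfer REST API via the Python discovery client; the project ID, bucket names, and date are placeholders, and the calling credentials need the roles described above:

# Hedged sketch: create a one-off transfer job from bucket-a (project A)
# to bucket-b (project B) with the Storage Transfer Service API.
from googleapiclient import discovery

storagetransfer = discovery.build("storagetransfer", "v1")

transfer_job = {
    "description": "Copy bucket-a to bucket-b",
    "status": "ENABLED",
    "projectId": "project-b-id",                      # project that owns the job
    "transferSpec": {
        "gcsDataSource": {"bucketName": "bucket-a"},
        "gcsDataSink": {"bucketName": "bucket-b"},
    },
    "schedule": {                                     # same start/end date = run once
        "scheduleStartDate": {"year": 2020, "month": 1, "day": 1},
        "scheduleEndDate": {"year": 2020, "month": 1, "day": 1},
    },
}

print(storagetransfer.transferJobs().create(body=transfer_job).execute())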

Sending credentials to Google Dataflow jobs

What is the right way to pass credentials to Dataflow jobs?
Some of my Dataflow jobs need credentials to make REST calls and fetch/post processed data.
I am currently using environment variables to pass the credentials to the JVM, reading them into a Serializable object and passing them on to the DoFn implementation's constructor. I am not sure this is the right approach, as a class which is Serializable should not contain sensitive information.
Another way I thought of is to store the credentials in GCS and retrieve them using the service account key file, but I was wondering why my job should have to take on this extra task of reading credentials from GCS.
Google Cloud Dataflow does not have native support for passing or storing secured secrets. However, you can use Cloud KMS and/or GCS, as you propose, to read a secret at runtime using your Dataflow service account credentials.
If you read the credential at runtime from a DoFn, you can use the DoFn.Setup lifecycle API to read the value once and cache it for the lifetime of the DoFn.
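As an illustration, a hedged sketch with the Beam Python SDK (the question is about the JVM SDK, but the Python SDK exposes the same setup lifecycle; the bucket and object names are placeholders):

# Hedged sketch: fetch a secret once per DoFn instance and cache it.
import apache_beam as beam
from google.cloud import storage

class CallExternalApi(beam.DoFn):
    def setup(self):
        # Runs once per DoFn instance, using the worker's service account.
        client = storage.Client()
        blob = client.bucket("my-secrets-bucket").blob("api-credentials.json")
        self._credentials = blob.download_as_string()

    def process(self, element):
        # Use self._credentials to authenticate the REST call here.
        yield element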
You can learn about various options for secret management in Google Cloud here: Secret management with Cloud KMS.