Google cloud function is not able to access data from GCS bucket - google-cloud-platform

I have a Cloud Function that triggers a Dataflow job. For this it needs to fetch a Dataflow template stored in a GCS bucket.
Using the default service account (linked to the Cloud Function) with the Editor role, I am able to fetch this file.
But when I use a custom service account with the roles below, I get a 403 status.
Cloud Build Service Account
Cloud Build Service Agent
Cloud Functions Service Agent
Container Registry Service Agent
Dataflow Developer
Storage Object Admin
The error I am getting is:
2020-10-21 11:14:20.820 WARN 1 --- [p2094777811-167] .a.b.s.e.g.u.RetryHttpRequestInitializer : Request failed with code 403, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes, HTTP framework says request can be retried, (caller responsible for retrying): https://dataflow.googleapis.com/v1b3/projects/<project id>/locations/australia-southeast1/templates:launch?gcsPath=gs://<path>/templates/i-template.
Did I miss any roles? Please help.

The error message means that you do not have the required permissions to execute the operation:
RetryHttpRequestInitializer : Request failed with code 403, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes, HTTP framework says request can be retried, (caller responsible for retrying): https://dataflow.googleapis.com/v1b3/projects//locations/australia-southeast1/templates:launch?gcsPath=gs:///templates/i-template.
You mentioned that with the Editor role you were able to execute the operation without issues. This is because an Editor can accomplish many tasks on the vast majority of resources: the role includes all viewer permissions, plus permissions for actions that modify state, such as changing existing resources.
You can refer to this documentation for more information about Basic role definitions.
Now, you can narrow the permission scope down to a minimal set of permissions, which gives you more control over each resource. For this, I would recommend adding the Cloud Functions Developer and Dataflow Admin roles.
As a Cloud Functions Developer, you will have full access to functions, operations, and locations. The Dataflow Admin role encompasses all the permissions necessary for creating and managing Dataflow jobs, and it also includes some Cloud Storage permissions, such as storage.buckets.get and create, get, and list on objects.
Lastly, please make sure that you have the necessary permissions for the trigger sources, i.e. Cloud Storage, and using Storage Admin should be enough.
Please note that you can always double check your roles along with its permissions by looking at the predefined roles tables for each Google Cloud resource, in case you need to narrow it down further.
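As a rough sketch of what those grants could look like with gcloud (the project ID and service account name below are placeholders, not values from the question):

# Grant the recommended roles to the custom service account
# (my-project and my-custom-sa are placeholder names).
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:my-custom-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/cloudfunctions.developer"

gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:my-custom-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/dataflow.admin"

gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:my-custom-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/storage.admin"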

If you want to give Storage read access to this service account (assuming that you are not using fine-grained permissions), you are missing at least the Storage Object Viewer role for this account.
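For instance, a minimal sketch with gsutil (the bucket and service account names are placeholders), assuming uniform bucket-level access rather than fine-grained ACLs:

# Give the custom service account read access to objects in the template bucket
# (my-template-bucket and my-custom-sa are placeholders).
gsutil iam ch \
    serviceAccount:my-custom-sa@my-project.iam.gserviceaccount.com:roles/storage.objectViewer \
    gs://my-template-bucket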

Related

Google cloud function deployment vs runtime permissions

I am paranoid about security and this makes no sense to me. I must be missing something here. I can get it working, no problem, but I want to know why. What is the philosophy behind it, and how am I protected?
I wrote a Google Cloud Function that receives a POST request and publishes an event to a Google Pub/Sub topic. I've set up my Pub/Sub topic resource and an IAM binding so that only my function's service account can publish to that topic - that is all good.
However, it does not let me deploy my function with that service account (using gcloud functions deploy --service-account=...). It says it does not have secretAccessor, deploymentManager.editor, cloudfunctions.developer, etc.
My confusion is: why should it need development/deployment-related permissions? I am deploying the function and I have those permissions, so it should use my permissions to deploy. But when the function is actually running, I don't want it to have those development/deployment permissions in case there is some vulnerability that can be exploited. I want it to run as the service account I specify, restricted to only the permissions related to receiving the request and publishing to my topic. Otherwise it would break the principle of least privilege.
When you create a service such as Functions, Run, or Compute Engine, you, as the deployer, need two types of permissions:
Permission to create the service
Permission to assign an existing identity (aka service account) to the service.
The service typically needs an identity (service account) with appropriate permissions. The permissions are the ones required for the service to access other services and resources. This service runs independently of the identity that created the service.
Two identities and two sets of permissions to manage. That means your goal of least privilege can be achieved.
My confusion is...why should it need development/deployment related permissions?
I do not know, because your question does not have the details required to answer. The error you posted does not make sense in the context described. I am not aware of any case where, for example, deploying a Function requires Deployment Manager Editor for the Function's identity. The function itself might need that IAM role, but the deployment command does not, nor does the deployment command even know which permissions the function requires, except for those derived from the command-line flags.
If you need more help on this, edit your question to clearly describe both identities and IAM roles, the deployment, which resources are accessed, and how you are deploying. Include the actual commands and error messages.
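A minimal sketch of that two-identity setup (the function, topic, account, and runtime names below are placeholders): the runtime service account gets only the publisher role on the topic, the deployer gets permission to act as that account, and the function is deployed with it:

# 1. Runtime identity: allowed only to publish to the one topic.
gcloud pubsub topics add-iam-policy-binding my-topic \
    --member="serviceAccount:fn-runtime-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/pubsub.publisher"

# 2. The deployer needs permission to "act as" the runtime identity.
gcloud iam service-accounts add-iam-policy-binding \
    fn-runtime-sa@my-project.iam.gserviceaccount.com \
    --member="user:deployer@example.com" \
    --role="roles/iam.serviceAccountUser"

# 3. Deploy the function so it runs as the restricted identity.
gcloud functions deploy my-function \
    --runtime=nodejs18 \
    --trigger-http \
    --service-account=fn-runtime-sa@my-project.iam.gserviceaccount.com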

How to give Google Cloud Eventarc correct permission so it can trigger a cloud function?

I have successfully deployed a 2nd generation Cloud Function with a storage trigger, per the Google tutorial.
The Cloud Function works when I run a test command in the shell, but if I try it for real by uploading a file to my bucket, the Cloud Function is not invoked.
I can see that the event triggers the Pub/Sub topic, and in Eventarc I can see signs of the problem.
So, my layman's analysis of why the Cloud Function invocation fails is that I lack some permission for Eventarc to receive the message from Pub/Sub (?). I have read the Eventarc troubleshooting and Eventarc access control documentation and tried to add the Eventarc Admin role to the Eventarc service account, but to no avail. (I've also added it to every other service account I can find, made the compute service account project owner, etc., but no luck.) What am I missing?
(Note, I had an earlier question about this but with broader scope but I opted for a new, more specific question)
You used the Compute Engine default Service Account.
You need to give the required permissions to this Service Account.
According to the documentation:
Make sure the runtime service account key you are using for your Application Default Credentials has either the cloudfunctions.serviceAgent role or the storage.buckets.{get, update} and the resourcemanager.projects.get permissions. For more information on setting these permissions, see Granting, changing, and revoking access to resources.
Please check on the IAM page whether the default Service Account has the following permissions:
cloudfunctions.serviceAgent
storage.buckets.{get, update}
resourcemanager.projects.get
Also, don't hesitate to check Cloud Logging to see the exact error and the missing permissions.
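As a sketch of how the role mentioned in the quoted documentation could be granted to the Compute Engine default service account (the project ID and project number below are placeholders):

# Grant the Cloud Functions service agent role to the Compute Engine
# default service account (my-project and 123456789012 are placeholders).
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:123456789012-compute@developer.gserviceaccount.com" \
    --role="roles/cloudfunctions.serviceAgent"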

How can I give my GKE deployed application access to Google Pub/Sub?

I deployed a Kotlin backend application that uses Google Cloud Pub/Sub. I recently deployed that application with Cloud Run and it ran fine, having full access to Pub/Sub.
Now, for various reasons, I have to deploy the application with GKE. However, the access to Pub/Sub no longer seems to work.
I checked which service account my GKE cluster is using and found out it was the default one. Therefore I granted the Pub/Sub Editor role to that service account.
I thought that with this, everything should work.
But still I see this error message in my logs:
com.google.api.gax.rpc.PermissionDeniedException: io.grpc.StatusRuntimeException: PERMISSION_DENIED: User not authorized to perform this action.
Any ideas what I have missed out?
That could be one of two things:
Either your pod uses Workload Identity and therefore doesn't use the default service account (which has the Editor role, something to avoid, by the way), and the service account that you do use doesn't have the Pub/Sub permissions.
Or, because you use the Compute Engine default service account (with the Editor role, something to avoid, by the way; I repeat myself, but it's really bad!), the node pool access scopes are set to their defaults (if you haven't overridden that parameter) and you can't access the Pub/Sub API because of those credential scopes.
The best solution is to recreate your node pool with a custom service account. That way you can enforce least privilege at the node pool level, and you avoid the legacy Compute Engine scope definitions and limitations. If you use Workload Identity, you can go a level further in terms of security and enforce least privilege at the pod level.
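A minimal sketch of the node-pool approach (the project, cluster, zone, pool, and service account names are placeholders; a narrower role than Pub/Sub Editor may be enough for your app):

# Create a dedicated service account and grant it only the Pub/Sub role the app needs.
gcloud iam service-accounts create pubsub-app-sa \
    --display-name="Pub/Sub app service account"
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:pubsub-app-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/pubsub.editor"

# Recreate the node pool with that service account instead of the default one.
gcloud container node-pools create app-pool \
    --cluster=my-cluster \
    --zone=us-central1-a \
    --service-account=pubsub-app-sa@my-project.iam.gserviceaccount.com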

service account execution batch dataflow job

I need to execute a Dataflow job using a service account. I'm following the very simple and basic WordCount example offered within the platform itself.
What is weird is the error I'm getting:
According to this, GCP requires the service account to have the Dataflow Worker role in order to execute my job. The weird part is that the error keeps showing up even though I have already set the required permissions:
Can someone explain this strange behavior? Thanks so much.
To run a Dataflow job, a project must enable billing and the following Google Cloud Platform APIs:
Google Cloud Dataflow API
Compute Engine API (Google Compute Engine)
Google Cloud Logging API
Google Cloud Storage
Google Cloud Storage JSON API
BigQuery API
Google Cloud Pub/Sub
Google Cloud Datastore API
You should also have enough quota in the project for any one of the APIs you are using in the Dataflow job.
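For reference, a sketch of enabling those APIs in one command (the project ID is a placeholder):

# Enable the APIs required to run Dataflow jobs (my-project is a placeholder).
gcloud services enable \
    dataflow.googleapis.com compute.googleapis.com logging.googleapis.com \
    storage.googleapis.com storage-api.googleapis.com bigquery.googleapis.com \
    pubsub.googleapis.com datastore.googleapis.com \
    --project=my-project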
I would suggest you create a fresh service account whose name has not been used before and then grant roles/dataflow.worker to it. Remember that Cloud IAM propagation usually takes fewer than 60 seconds but can take up to 7 minutes, so please allow a few minutes between an IAM change and the Dataflow job creation.
Another possible workaround is to remove the Dataflow Worker role and add it again. When a service account is deleted and recreated under the same name, the binding can keep pointing to the old account's ID, and it is not refreshed until the role is explicitly removed and re-granted.
I encourage you to visit the Dataflow IAM roles documentation, which lists the role descriptions and permissions.
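A sketch of the suggestion above (the service account name and project ID are placeholders):

# Create a brand-new service account and grant it the Dataflow Worker role.
gcloud iam service-accounts create fresh-dataflow-sa \
    --display-name="Fresh Dataflow worker service account"
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:fresh-dataflow-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/dataflow.worker"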

Minimal access requirement for Dataproc initialization scripts

I have a bucket with initialization actions that has the following ACL:
deployment_service_user: Owner
dataproc_service_user: Reader
Objects in the bucket have the same ACL. While all users involved in launching that cluster should have access (gcloud runs as deployment_service_user, and workers should run as dataproc_service_user), I'm getting the following access error:
stderr: ERROR: (gcloud.beta.dataproc.clusters.create) INVALID_ARGUMENT:
Multiple validation errors:
- Access denied for Google Cloud Storage object: 'gs://init-action-bucket/my-init-action.sh'
When I add the following rule to the ACL, it works fine:
project viewers: Reader
Is it possible to specify a more specific permission instead of allowing project viewers to read the initialization actions?
Thanks for asking! This is something that is not very clear in the docs.
The answer depends on whether you're using the default or a custom service account for the Dataproc VMs.
If you specified a custom service account (via --service-account in gcloud), then you should give reader access to that account. Even with a custom service account, you still have to give reader access to the default service account (due to a known issue).
On the other hand, if you're not explicitly specifying a service account, then you're using the Compute Engine default service account. It usually looks like this: <your-project-number>-compute@developer.gserviceaccount.com. Give reader access to this account.
The user creating the cluster is not required to have ACLs on the init actions.
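For example, a sketch of granting the object ACL directly (the object path is taken from the error above; the project number is a placeholder, and you would use your custom service account's email instead if you specified one):

# Give the Dataproc VM service account reader access to the init action object.
gsutil acl ch \
    -u 123456789012-compute@developer.gserviceaccount.com:R \
    gs://init-action-bucket/my-init-action.sh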