Secure way to upload files to GCP Cloud Storage - google-cloud-platform

We're building machines that include a component which uploads the images captured by the camera to Google Cloud Storage. For this purpose, what I've done is:
Create a service account for each machine.
Create a custom role with the permissions:
storage.objects.create
storage.buckets.get
storage.objects.get
Apply this role to that service account.
Download the JSON credentials key file and use it with a Python script (in which I specify the bucket name) to upload images to Cloud Storage.
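A minimal sketch of that upload script, assuming the google-cloud-storage Python client; the key path and bucket name below are placeholders:

    # Minimal upload sketch: authenticate with the shipped JSON key and upload one image.
    from google.cloud import storage

    KEY_PATH = "/opt/machine/sa-key.json"   # placeholder path to the shipped key file
    BUCKET_NAME = "my-machine-uploads"      # placeholder bucket name

    def upload_image(local_path, destination_name):
        client = storage.Client.from_service_account_json(KEY_PATH)
        blob = client.bucket(BUCKET_NAME).blob(destination_name)
        blob.upload_from_filename(local_path)  # upload the captured image

    upload_image("capture.jpg", "machine-01/capture.jpg")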
Is this way of doing things efficient and secure, given that we only ship 2-3 machines each month?
Also, I will have to ship the JSON key file with each machine. If the above method is valid, is this fine, or is there any way to hide this key file?

Your case isn't so simple!
Firstly, if you want to put a service account in each machine, you will hit a limit one day (you are limited to 100 service accounts per project). And using the same service account, or the same key, for every machine is too dangerous.
Secondly, your use case sounds like an IoT use case, where you have lots of devices on the edge communicating with the cloud. But Pub/Sub messages are limited to 10 MB max, and the IoT Core solution doesn't fit your case.
The last 2 solutions are based on the same principle:
Make an endpoint public (Cloud Run, Cloud Functions, App Engine or whatever you want)
Call this endpoint from your machine, with its own token (i.e. a string, encrypted or not)
Check the token; if it is valid, you can do one of the following (here are the 2 alternatives)
Create an access token (a short-lived token) on a service account with the minimal permissions for the machine's usage, and send it back to the machine. The machine will use it to call the Google Cloud APIs, such as the Cloud Storage API. The advantage of this solution is that you will be able to use the access token to reach other GCP APIs in the future, if your use case and machine updates require them.
Create a signed URL and send it back to the machine. Then the machine uploads the file to this URL. The advantage is the strict limitation to Cloud Storage, with no other GCP service reachable (see the sketch below).
The main issue with these 2 solutions is that they require a public endpoint, and you are exposed to attacks on it. You can protect it behind a load balancer and mitigate the attacks with Cloud Armor. Also think about limiting the scalability of your public endpoint, to prevent any useless expenses in case of an attack.
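A rough sketch of the signed URL variant, assuming an HTTP endpoint (e.g. a Cloud Function) that holds a service account key it can sign URLs with (or uses the IAM signBlob API instead); the bucket name, key path and per-machine token table are hypothetical:

    # Endpoint side: validate the machine's token, then return a short-lived signed PUT URL.
    import datetime
    from google.cloud import storage

    BUCKET_NAME = "machine-uploads"                  # placeholder
    KNOWN_TOKENS = {"machine-01": "s3cr3t-token"}    # placeholder per-machine tokens

    client = storage.Client.from_service_account_json("/secrets/uploader-key.json")

    def get_upload_url(request):
        machine_id = request.headers.get("X-Machine-Id")
        token = request.headers.get("X-Machine-Token")
        if KNOWN_TOKENS.get(machine_id) != token:
            return ("Forbidden", 403)   # reject unknown or mismatched tokens
        blob = client.bucket(BUCKET_NAME).blob(f"{machine_id}/{request.args['name']}")
        url = blob.generate_signed_url(
            version="v4",
            expiration=datetime.timedelta(minutes=15),  # URL stops working after 15 minutes
            method="PUT",
            content_type="image/jpeg",
        )
        return {"upload_url": url}

The machine then simply does requests.put(url, data=open("capture.jpg", "rb"), headers={"Content-Type": "image/jpeg"}) against the returned URL; it never holds a Google credential of its own.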

Related

Using Google Cloud KMS on behalf of user

I have a CLI tool that interacts with Google KMS. In order for it to work, I fetch the user credentials as a JSON file which is stored on disk. Now a new requirement came along. I need to make a web app out of this CLI tool. The web app will be protected via Google Cloud IAP. Question is, how do I run the CLI tool on behalf of the authenticated user?
You don't. Better to use a service account and assign it the required role. That service account could still have domain-wide delegation of rights (able to impersonate just about any known user).
Running CLI tools from a web application could/should probably also be avoided. It might be better to convert the CLI tool into a Cloud Function and then call it via an HTTP trigger from within the web application (so that access to the service account is limited as far as possible).
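A minimal sketch of that idea, assuming the google-cloud-kms client library and an HTTP-triggered Cloud Function whose runtime service account has the KMS encrypter role; the key resource name is a placeholder:

    # The function runs as its own service account, so no user key file sits on disk.
    import base64
    from google.cloud import kms

    KEY_NAME = "projects/my-project/locations/global/keyRings/my-ring/cryptoKeys/my-key"  # placeholder

    def encrypt(request):
        client = kms.KeyManagementServiceClient()
        plaintext = request.get_data()  # raw bytes from the HTTP request body
        response = client.encrypt(request={"name": KEY_NAME, "plaintext": plaintext})
        return base64.b64encode(response.ciphertext).decode()

The web app (behind IAP) would call this function over HTTPS instead of shelling out to the CLI tool.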
This might also be something to reconsider, security-wise:
I fetch the user credentials as a JSON file which is stored on disk.
Even if it might have been required for the CLI tool, with a service account it wouldn't be.

AWS: Python SDK, do I need to configure an access key and secret access key?

I am trying to write an application in Python.
Through this application I want to create AWS Cognito users and provide services like user Sign-in, Forgot password, etc.
As I understand, boto3 is the standard library for accessing AWS APIs from Python.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html
This library requires storing AWS credentials (access key and secret access key) on the host machine.
Can this be avoided?
I want to distribute this Python application to my users.
I am checking if I can avoid configuring AWS credentials on every user's host.
Is there any alternative option to boto3 library?
If you absolutely need to access internal AWS APIs, you need to log in to AWS. Access keys are one way; it's also possible to use the aws-adfs command line tool to log in through Active Directory, but that requires your AWS/AD administrators to do some additional setup on their side.
I would suggest looking into writing a client-server / web application that is hosted within AWS and only exposes the relevant functionality to authenticated users.
If costs are an issue for a hosted application, look into Lambdas, as there you pay only for CPU/memory time. For a settings-management app it will probably not even exceed the free tier.
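For example, a rough sketch of such a hosted back end: a Lambda (behind API Gateway) that performs the Cognito call with its execution role, so no access keys ever reach the users. The user-pool client id is a placeholder, and the sketch assumes an app client without a client secret (otherwise a SecretHash would be required):

    import json
    import boto3

    cognito = boto3.client("cognito-idp")        # uses the Lambda execution role
    USER_POOL_CLIENT_ID = "example-client-id"    # placeholder

    def lambda_handler(event, context):
        body = json.loads(event["body"])
        # Register the user in the Cognito user pool on behalf of the caller.
        cognito.sign_up(
            ClientId=USER_POOL_CLIENT_ID,
            Username=body["username"],
            Password=body["password"],
            UserAttributes=[{"Name": "email", "Value": body["email"]}],
        )
        return {"statusCode": 200, "body": json.dumps({"status": "ok"})}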

AWS - how to separate resource of each user for an AWS Service

I am opening an AWS Service (say: AWS Rekognition) for my app's users.
The problem is: when one user (ex: user1) creates a resource (such as a collection), other users (ex: user2, user3) also see the resource that was created by user1.
I have tried to use an Identity Pool and acquired a token/identity from my backend server for my users, but things are no better (my users still see each other's resources).
What should I do to let user1 receive user1's resources only?
I have been struggling with this problem for days, but can't seem to figure it out.
Regards
There are two approaches to this architecture:
Option 1: Client/Server
In this architecture, client apps (eg on a mobile device or a web-based app) make calls to an API that is hosted by your back-end application. The back-end app then verifies the request and makes calls to AWS on behalf of the user.
The user's app never receives AWS credentials. This is very secure because the back-end app can authenticate all requests and apply business logic.
Option 2: Providing AWS credentials
In this architecture, the client apps receive temporary AWS credentials that enables them to directly call AWS services (which matches the architecture you describe).
The benefit is that the app can directly access AWS services such as Amazon S3. The downside is that you need to very tightly limit the permissions they are given, to ensure they only access the desired resources.
Some services make this easy by allowing Conditions on IAM Permissions that can limit the resources that can be accessed, such as by tag or other identifier.
However, based upon Actions, Resources, and Condition Keys for Amazon Rekognition - AWS Identity and Access Management, there is no such capability for Amazon Rekognition:
Rekognition has no service-specific context keys that can be used in the Condition element of policy statements.
I think you could limit the calls by providing a Resource string in the IAM Policy, which can limit their ability to make certain calls (eg DeleteFaces) so that they are only done against a specific collection (see the sketch at the end of this answer).
However, please note that list calls such as ListCollections are either permitted fully or not at all. It is not possible to limit the list of collections returned. (This is the same as most AWS Services, such as listing EC2 instances.)
Thus, when using this method of providing credentials, you should be very careful about the permissions granted to the app.
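As a sketch of that credential hand-out, assuming your back end vends temporary credentials with STS and an inline session policy scoped to the user's own collection (role ARN, account id and region are placeholders):

    import json
    import boto3

    sts = boto3.client("sts")

    def credentials_for_user(user_id):
        collection_arn = f"arn:aws:rekognition:us-east-1:123456789012:collection/{user_id}"
        session_policy = {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": ["rekognition:IndexFaces", "rekognition:SearchFacesByImage"],
                "Resource": collection_arn,
            }],
        }
        resp = sts.assume_role(
            RoleArn="arn:aws:iam::123456789012:role/AppRekognitionRole",
            RoleSessionName=f"user-{user_id}",
            Policy=json.dumps(session_policy),  # effective permissions = role policy intersected with this policy
            DurationSeconds=900,
        )
        return resp["Credentials"]  # AccessKeyId / SecretAccessKey / SessionToken for the app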

Using OAuth2 tokens for interactive usage of GCP services instead of service account (keys)

In order to limit the number of service accounts to manage, as well as the handling of their keys, I'm exploring other ways of accessing GCP resources from a developer laptop or desktop, so I can run ad-hoc scripts or interactive programs (e.g. Jupyter notebooks) that access GCP services.
Running gcloud auth application-default login generates, after authenticating via a web browser, a refresh token that can be used to obtain and renew access tokens for interacting with GCP services.
The workflow I'm following is this:
Run gcloud auth application-default login. This generates a JSON file on my disk that contains the refresh token.
Export the JSON file location as the GOOGLE_APPLICATION_CREDENTIALS env variable:
GOOGLE_APPLICATION_CREDENTIALS=/Users/my.username/.config/gcloud/application_default_credentials.json
Use that file to authenticate via Google auth library and interact with different GCP services.
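As an illustration of step 3, a small sketch assuming the google-auth and google-cloud-storage libraries (the bucket name is a placeholder):

    import google.auth
    from google.cloud import storage

    # Picks up GOOGLE_APPLICATION_CREDENTIALS (or the gcloud ADC file) automatically.
    credentials, project = google.auth.default()
    client = storage.Client(credentials=credentials, project=project)

    for blob in client.list_blobs("my-test-bucket"):
        print(blob.name)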
This is convenient, as it reduces the need to circulate, secure and, if needed, share service account key files among team members. However, I have noticed that the refresh token provided does not expire and remains valid.
Unless I'm missing something here, this makes the application_default_credentials.json file as sensitive as a service account key. If it gets lost or compromised, it can be used to get access tokens without the need to re-authenticate, which is fairly insecure, IMO.
We're aware that GCP security best practices recommend using service accounts (and their keys) for service-to-service workloads. The scenario I'm describing is ad-hoc development/testing of code from a developer's or engineer's laptop. We think that forcing users to interactively authenticate via the web to get new tokens every few hours would be more secure and convenient than using long-lived service account keys stored on the hard drive.
I have read through [1] but I could not find a definitive answer.
Does anyone know if there is an expiration for these refresh tokens?
Is there a way of controlling and limiting their lifetimes (ideally to hours or minutes)?
What is the best/common practice for this scenario? Using a single service account (and key) per individual user?
[1] https://developers.google.com/identity/protocols/OAuth2#expiration
Note: User Credentials have Refresh Tokens too.
Does anyone know if there is an expiration for these refresh tokens?
Google OAuth Refresh Tokens do not expire. They can be revoked.
Is there a way of controlling and limiting their lifetimes (ideally to hours or minutes)?
You could periodically revoke the refresh token, which will invalidate the associated access and ID tokens. This means that you are handling the refresh tokens yourself, which adds another security issue to manage.
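If you do go that route, a hedged sketch of revoking a refresh token via Google's OAuth 2.0 revocation endpoint (the token string would be read from the ADC file):

    import requests

    def revoke_refresh_token(refresh_token):
        resp = requests.post(
            "https://oauth2.googleapis.com/revoke",
            params={"token": refresh_token},
            headers={"content-type": "application/x-www-form-urlencoded"},
        )
        return resp.status_code == 200  # 200 means the token (and tokens derived from it) is revoked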
What is the best/common practice for this scenario? Using a single service account (and key) per individual user?
If you use User Credentials (the method where you log in to Google), you will receive SDK warnings, and if you make a lot of API calls, you will become blocked. Google does not want you to use User Credentials in place of Service Account credentials. The verification process for User Credentials requires more effort on Google's backend systems. User Credentials are assumed to be created in an insecure environment (web browsers), whereas Service Account credentials are assumed to be in a secure environment.
Best practices are to issue service account JSON key files to an individual application with only the required permissions for that application to operate. For example, if you create a tool that only needs to read Cloud Storage objects, create a service account with only read permissions. Periodically the service account keys should be rotated and new keys downloaded and old keys deleted. Each application should have its own service account JSON key file. I wrote an article on how to securely store JSON key files on Cloud Storage. This helps with rotating keys as your application just downloads the latest key when needed. (link). My article discusses Google Cloud Run, but the same principles apply.

Using Google Cloud Platform Storage to store user images

I was trying to understand Google Cloud Platform Storage but couldn't really comprehend the language used in the documentation. I wanted to ask if you can use the storage and the APIs to store photos users take within your application, and also get the images back when provided with a URL? And even if you can, would it be a safe and reasonable method to do so?
Yes you can pretty much use a storage bucket to store any kind of data.
In terms of transferring images from an application to storage buckets, the application must be authorised to write to the bucket.
One option is to use a service account key within the application. A service account is a special account that can be used by an application to authorise to various Google APIs, including the storage API.
There is some more information about service accounts here and information here about using service account keys. These keys can be used within your application, and allow the application to inherit the permission/scopes assigned to that service account.
In terms of retrieving images using a URL, one possible option would be to use signed URLs which would allow you to give users read or write access to an object (in your case images) in a bucket for a given amount of time.
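For instance, a brief sketch of generating a time-limited read URL with the google-cloud-storage client (the bucket, object and key file names are placeholders):

    import datetime
    from google.cloud import storage

    client = storage.Client.from_service_account_json("sa-key.json")
    blob = client.bucket("user-images").blob("photos/user123/pic.jpg")

    url = blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(minutes=30),  # the link stops working after 30 minutes
        method="GET",
    )
    print(url)  # hand this URL to the user; no Google credentials needed on their side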
Access to bucket objects can also be controlled with ACLs (Access Control Lists). If you're happy for your images to be available publicly (i.e. accessible to everybody), it's possible to set an ACL with 'Reader' access for AllUsers.
More information on this can be found here.
Should you decide to make the images available publicly, the URL format to retrieve the object/image from the bucket would be:
https://storage.googleapis.com/[BUCKET_NAME]/[OBJECT_NAME]
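If public access is acceptable, a sketch of granting that 'Reader' access on a single object with the client library (names are placeholders; this requires fine-grained ACLs, i.e. uniform bucket-level access disabled):

    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket("user-images").blob("photos/user123/pic.jpg")
    blob.make_public()       # grants READER to allUsers on this object's ACL
    print(blob.public_url)   # e.g. https://storage.googleapis.com/user-images/photos/user123/pic.jpg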
EDIT:
In relation to using an interface to upload the files before they land in the bucket, one option would be to have an instance with an external IP address (or multiple instances behind a load balancer) where the images are initially uploaded. You could mount Cloud Storage on this instance using Cloud Storage FUSE, so that uploaded files are easily transferred to the bucket. In terms of databases, you have the option of manually installing your database on a Compute Engine instance, or using a fully managed database service such as Cloud SQL.