I was trying to understand Google Cloud Platform storage but couldn't really comprehend the language used in the documentation. I wanted to ask whether you can use the storage and the APIs to store photos users take within your application, and also get the images back if provided with a URL? And even if you can, would it be a safe and reasonable method to do so?
Yes, you can use a storage bucket to store pretty much any kind of data.
In terms of transferring images from an application to storage buckets, the application must be authorised to write to the bucket.
One option is to use a service account key within the application. A service account is a special account that can be used by an application to authenticate to various Google APIs, including the Cloud Storage API.
There is some more information about service accounts here, and information here about using service account keys. These keys can be used within your application, and allow the application to inherit the permissions/scopes assigned to that service account.
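As a rough illustration (not an official snippet), uploading a photo with the google-cloud-storage Python client and a downloaded key file could look like this; the key path, bucket name and object name below are placeholders:

```python
from google.cloud import storage

# Build a client from the downloaded service account key file (placeholder path).
client = storage.Client.from_service_account_json("service-account-key.json")

bucket = client.bucket("my-app-user-photos")            # placeholder bucket name
blob = bucket.blob("users/123/photo-2024-06-01.jpg")    # object name (path) inside the bucket

# Upload the local image; setting the content type helps browsers render it later.
blob.upload_from_filename("photo.jpg", content_type="image/jpeg")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```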
In terms of retrieving images using a URL, one possible option would be to use signed URLs, which allow you to give users read or write access to an object (in your case, an image) in a bucket for a given amount of time.
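For example, here is a minimal sketch of generating a V4 signed URL with the same Python client (the bucket and object names are again placeholders, and the client must be built from a key file so it can sign):

```python
from datetime import timedelta
from google.cloud import storage

client = storage.Client.from_service_account_json("service-account-key.json")
blob = client.bucket("my-app-user-photos").blob("users/123/photo-2024-06-01.jpg")

# V4 signed URL granting read access to this single object for 15 minutes.
url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=15),
    method="GET",
)
print(url)  # hand this URL to the client; it stops working once it expires
```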
Access to bucket objects can also be controlled with ACLs (Access Control Lists). If you're happy for your images to be available publicly (i.e. accessible to everybody), it's possible to set an ACL with 'Reader' access for allUsers.
More information on this can be found here.
Should you decide to make the images available publicly, the URL format to retrieve the object/image from the bucket would be:
https://storage.googleapis.com/[BUCKET_NAME]/[OBJECT_NAME]
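If you do go the public route, a small sketch of setting that ACL from Python (this only works on buckets with fine-grained, not uniform, access control; names are placeholders):

```python
from google.cloud import storage

client = storage.Client.from_service_account_json("service-account-key.json")
blob = client.bucket("my-app-user-photos").blob("public/logo.png")

# Grant 'Reader' access to allUsers on this single object.
blob.make_public()

# e.g. https://storage.googleapis.com/my-app-user-photos/public/logo.png
print(blob.public_url)
```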
EDIT:
In relation to using an interface to upload the files before the files land in the bucket, one option would be to have an instance with an external IP address (or multiple instances behind a Load Balancer) where the images are initially uploaded. You could mount Cloud Storage on this instance using Cloud Storage FUSE, so that uploaded files are easily transferred to the bucket. In terms of databases, you have the option of manually installing your database on a Compute Engine instance, or using a fully managed database service such as Cloud SQL.
Related
We're making some machines in which there's a part that uploads the images captured by the camera to Google Cloud Storage. For this purpose, what I've done is:
Create a service account for each machine.
Create a custom role with the permissions:
storage.objects.create
storage.buckets.get
storage.objects.get
Apply this role to that service account.
Download the JSON credentials key file and use this file with a Python script (in which I specify the bucket name) to upload images to GCP Storage.
Is this way of doing things efficient and secure given that we only ship 2-3 machines each month?
Also, I will have to ship the JSON key file with each machine. If the above method is valid, is this fine, or is there a method to hide this key file?
Your case isn't so simple!
Firstly, if you want to put a service account in each machine, you will hit a limit one day (you are limited to 100 service accounts per project). And using the same service account, or the same key, for every machine is too dangerous.
Secondly, your use case sounds like an IoT use case, where you have a lot of devices at the edge communicating with the cloud. But Pub/Sub messages are limited to 10 MB max, and the IoT Core solution doesn't fit your case.
The remaining 2 solutions are based on the same principle:
Make an endpoint public (Cloud Run, Cloud Functions, App Engine or whatever you want)
Call this endpoint from your machine, with its own token (i.e. a string, encrypted or not)
Check the token; if it's OK, you can (here are the 2 alternatives)
Create an access token (short-lived token) on a service account with the minimal permissions for the machine's usage, and send it back to the machine. The machine will use it to call the Google Cloud APIs, such as the Cloud Storage API. The advantage of this solution is that you will be able to use the access token to reach other GCP APIs in the future, if your use case and your machine updates require them (a sketch of this alternative appears at the end of this answer).
Create a signed URL and send it back to the machine. Then the machine has to upload the file to this URL. The advantage is the strict limitation to Cloud Storage; no other GCP service can be reached.
The main issue with these 2 solutions is that they require a public endpoint, and you are exposed to attacks on it. You can protect it behind a load balancer and mitigate attacks with Cloud Armor. Also think about limiting the scalability of your public endpoint, to prevent any useless expense in case of attack.
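To make the first alternative more concrete, here is a very rough sketch of such an endpoint (imagine it deployed on Cloud Run), written with Flask; the machine-token check, service account name and scopes are placeholders, and the endpoint's own identity needs the Service Account Token Creator role on the impersonated account:

```python
import google.auth
from google.auth import impersonated_credentials
from google.auth.transport.requests import Request
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Placeholder token store; in reality this would live in a database or Secret Manager.
KNOWN_MACHINE_TOKENS = {"machine-42": "some-long-random-secret"}

@app.route("/token", methods=["POST"])
def issue_token():
    body = request.get_json(silent=True) or {}
    if KNOWN_MACHINE_TOKENS.get(body.get("machine_id")) != body.get("secret"):
        abort(403)

    # Impersonate a low-privilege service account and mint a short-lived token.
    source_creds, _ = google.auth.default()
    creds = impersonated_credentials.Credentials(
        source_credentials=source_creds,
        target_principal="machine-uploader@my-project.iam.gserviceaccount.com",
        target_scopes=["https://www.googleapis.com/auth/devstorage.read_write"],
        lifetime=900,  # 15 minutes
    )
    creds.refresh(Request())

    # The machine then uses this bearer token against the Cloud Storage JSON API.
    return jsonify({"access_token": creds.token, "expires": creds.expiry.isoformat()})
```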
Goal:
For example, users could create courses which have resources such as images, videos etc.
I want to restrict access to them using signed cookies. i.e. resources on /courses/1 will only be accessible to logged-in users who have a valid signed cookie.
Background
I'll be creating a bucket of media files per course based on https://cloud.google.com/storage/docs/access-control#recommended_bucket_architecture.
Where I am stuck
How to add backend buckets to the load balancer dynamically, since I can only add them in the console
How to use the same signing key for all buckets for easy maintenance https://cloud.google.com/cdn/docs/using-signed-cookies#creatingkeys. It seems like I need to manually create a key for each bucket.
So is there a standard way to do these or am I thinking about this whole architecture wrong since this won't scale without automation?
You will be limited to 50 path rules, as mentioned in the quotas, and therefore to 50 courses. I hope you expect more than that!
So, this pattern isn't suitable for your use case. You need to use the same bucket for all courses and control access with a backend app, and then generate signed URLs for the resources requested by the users.
I'm new to S3 and I'm wondering how real-world web applications typically interact with it, in particular how user access permissions are handled.
Say, for instance, that I have designed a basic project management web application which, amongst other features, permits users to upload project files into a shared space which other project members can access.
So User file upload/read access would be determined by project membership but also by project roles.
Using S3, would one simply create a Bucket for the entire application with a single S3 user with all permissions and leave the handling of the user permissions to the application? Or am I missing something? I haven't been able to find many examples of real-world S3 usage online, in particular where access permissions are concerned.
The typical architecture is to keep the Amazon S3 buckets totally private.
When your application determines that a user is permitted to upload or download a file, it can generate a Presigned URL. This is a time-limited URL that allows an object to be uploaded or downloaded.
When uploading, it is also possible to Create a POST Policy to enforce some restrictions on the upload, such as its length, type and where it is being stored. If the upload meets the requirements, the file will be accepted.
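A hedged boto3 sketch of both ideas (the bucket and key names are made up):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-app-documents"  # placeholder bucket name

# Time-limited download link for a private object (10 minutes).
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": BUCKET, "Key": "projects/42/report.pdf"},
    ExpiresIn=600,
)

# Presigned POST for a browser upload, restricted to ~5 MB and a fixed key prefix.
post = s3.generate_presigned_post(
    Bucket=BUCKET,
    Key="projects/42/uploads/${filename}",
    Conditions=[
        ["content-length-range", 0, 5 * 1024 * 1024],
        ["starts-with", "$key", "projects/42/uploads/"],
    ],
    ExpiresIn=600,
)
# post["url"] and post["fields"] are what the HTML form submits to S3.
```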
You should maintain a database that identifies all objects that have been uploaded and maps each to its 'owner', permission groups, shares, etc. All of this is application-specific. Later, when a user requests a particular object for download, your app can generate a pre-signed URL that lets the user download the object even though it is a private object.
Always have your application determine permissions for accessing an object. Do not define application users as IAM Users.
If there is a straight-forward permission model (eg all of one user's files are in one path/folder within an S3 bucket), you can generate temporary credentials using the AWS Security Token Service that grants List and Get permissions on the given path. This can be useful for mobile applications that could then directly call the Amazon S3 API to retrieve objects. However, it is not suitable for a web-based application.
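For completeness, a rough sketch of issuing such scoped temporary credentials with STS (the bucket, prefix and inline policy are illustrative, and get_federation_token must be called with long-term IAM user credentials):

```python
import json
import boto3

sts = boto3.client("sts")

# Inline policy limiting the temporary credentials to one user's folder (placeholders).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-app-documents",
            "arn:aws:s3:::my-app-documents/users/123/*",
        ],
    }],
}

resp = sts.get_federation_token(
    Name="app-user-123",
    Policy=json.dumps(policy),
    DurationSeconds=3600,
)
creds = resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration
```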
I have an application where users are part of a 'group' of users. Each group can 'upload' documents to the application. Behind the scenes I am using S3 to store these documents.
I've spent a ton of time reading the AWS documentation but still don't understand the simplest/correct way to do the following:
User 1 in group A can upload documents to application
User 2 in group A can see and access all group A documents in application
User 3 in group B can upload documents to application
User 3 in group B cannot see any documents that belong to group A (and vice-versa)
Should I be using the API to create a new bucket for each 'group'?
Or can all of this be done in a single bucket with subdirectories for each group & then set access limitations?
Should I be setting up an IAM group policy and applying it to each web app user?
I'm not sure of the best architecture for this scenario so would really appreciate a point in the right direction.
AWS credentials should be assigned to your application and to your IT staff who need to maintain the application.
Users of your application should not be given AWS credentials.
Users should interact directly with your application and your application will make calls to the AWS API from the back-end. This way, your application has full control of what data they can see and what operations they can perform.
Think of it like a database -- you never want to give users direct access to a database. Instead, they should always interact via an application, which will store and update information in a database.
There are some common exceptions to the above:
If you want users to access/download a file stored in S3, your application can generate a pre-signed URL, which is a time-limited URL that permits access to an Amazon S3 object. Your application is responsible for generating the URL when it wants to grant access, and the URL can be included in an HTML page (eg show a private picture on a web page).
If you want to allow users to upload files directly to S3, you could again use a pre-signed URL or you could grant public Write access to an Amazon S3 bucket. Think of it like a modern FTP server.
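To illustrate the upload case, a minimal sketch (bucket, key and content type are placeholders): the backend signs a PUT URL, and the client then uploads straight to S3 with it:

```python
import boto3
import requests

# Backend: sign a PUT URL for the one object this user is allowed to upload.
s3 = boto3.client("s3")
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={
        "Bucket": "my-app-documents",
        "Key": "groups/a/design.pdf",
        "ContentType": "application/pdf",
    },
    ExpiresIn=300,  # 5 minutes
)

# Client: upload the file body directly to S3 using that URL.
with open("design.pdf", "rb") as f:
    resp = requests.put(upload_url, data=f, headers={"Content-Type": "application/pdf"})
resp.raise_for_status()
```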
Bottom line: Your application is in charge! Also, consider using pre-signed URLs to provide direct access to objects when the application permits it.
I work on a SaaS application where Creators can create Groups and invite others to their Group to share files, chat and so on. Only people within a specific group should have access to that group's files.
People from other groups must not have access to files that are not their group's.
And of course all file permissions should be set to 'Private', i.e. they should not be searchable/visible/accessible by anonymous Internet users, since the information in those files is for personal use only.
I am new to Amazon S3 and don't know how to achieve this... Should I create only one main bucket, or create a new Amazon S3 bucket for each group?
It is not recommended to use AWS Identity and Access Management (IAM) for storing application users. Application users should be maintained in a separate database (or LDAP, Active Directory, etc.).
Therefore, creating "one bucket per group" is not feasible, since it is not possible to map your application's users to permissions within Amazon S3.
The better method would be to manage permissions within your application. When a user requests access to a file, the application can determine whether they should be permitted access. If they are permitted, then the application can generate a Pre-Signed URL.
A Pre-Signed URL permits access to private objects stored on Amazon S3. It is a means of keeping objects secure, yet granting temporary access to a specific object.
When listing available files, your application would generate links that include the pre-signed URL. Then, when a user clicks the link, they can access the file. Then, after a certain time has expired (eg 10 minutes), the link will no longer function. So, if a user shares a link with somebody else, it will probably have timed-out.
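As a sketch of that listing step (the bucket name and prefix are invented for the example):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-saas-files"      # placeholder bucket
prefix = "groups/group-a/"    # one key prefix ("folder") per group

# List the group's files and attach a short-lived download link to each entry.
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix)
files = []
for obj in listing.get("Contents", []):
    files.append({
        "name": obj["Key"][len(prefix):],
        "size": obj["Size"],
        "url": s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": BUCKET, "Key": obj["Key"]},
            ExpiresIn=600,  # 10 minutes
        ),
    })
# Only render `files` after checking (in your own database) that the user is in group-a.
```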
See: Creating a pre-signed URL in Ruby