Is there a way to restrict the GCP Storage bucket to a specific domain, Android/iOS app etc. so that only those entities be allowed to use this particular bucket's resources?
If the user aren't always authenticated, there isn't strong security, only small thing to increase the difficulty to pass through...
I recommend you to serve your assets behind a HTTPS load balancer with the bucket as backend (like a static website).
The main reason is the capacity to use Cloud Armor and to customize the policy to catch and check one of the request attribute. I think you can achieve something with the request header, either with a custom header that you set in your application, or to reuse the Application specific headers (I'm not a mobile developer, but I know that Android has. I'm sure Ios also).
It's not very strong, but it let you the capacity to test easily and to reduce the capacity of anyone to get the content.
Related
I have a Cloud Function which i want to secure by allowing only access from my domain to all users. I am exploring this for days.
Google seems to limit many options and instead you are forced to buy and use more products, for example for this you need a Network Balancer, which is a great product but a monster to smaller businesses, and not everyone needs it (or wants to pay for it).
So, how do you secure a Function on the Console, without IAM (no signin needed), to only allow a certain domain calls before you expand to a Balancer ?
I do see that Google has something called Organization policies for project which supposed to restrict a domain, but the docs are not clear and outdated (indicate UI that doesn't exist)
I know that Firebase has the Anonymous User, which allow a Function to check a Google ID of an anonymous user, but everything online is a Firebase thing, and no explanation anywhere how to do this using normal Function with Python.
EDIT
I do use Firebase Hosting, but my Function is Python and it's handled from the GCP, not a Firebase Function.
Solved, you can use API Gateway, with API key, restrict the key to your domain only, and upload a config with your Function url, so you access it with a API url+key, and nobody else can just run it.
See here Cloud API Gateway doesn't allow with CORS
I wish i could connect it to a domain as well, but we can't, google seems to want everyone to use the expensive Balancer, or Firebase (charged in this case on a Function use for every website visit)
I'm creating a platform whereby users upload and download data. The amount of data uploaded isn't trivial---this could be on the order of GB.
Users should be able to download a subset of this data via hyperlinks.
If I'm not mistaken, my AWS account will be charged for the egress of downloaded these files. If that's true, I'm concerned about two related scenarios:
Users who abuse this, and constantly click on the download hyperlinks (more than reasonable)
More concerning, robots which would click the download links every few seconds.
I had planned to make the downloads accessible to anyone who visits the website as a public resource. Naturally, if users logged in to the platform, I could easily restrict the amount of data downloaded over a period of time.
For public websites, how could I stop users from downloading too much? Could I use IP addresses maybe?
Any insight appreciated.
IP address can be easily changed. Thus, its a poor control, but probably better than nothing.
For robots, use capcha. This is an effective way of preventing automated scraping of your links.
In addition, you could considered providing access to your links through API gateway. The gateway has throttling limits which you can set (e.g. 10 invocations per minute). This way you can ensure that you will not go over some pre-defined.
On top of this you could use S3 pre-signed URLs. They have expiration time so you could adjust this time to be valid for short time. This also prevents users from sharing links as they would expire after a set time. In this scenario, he users would obtained the S3 pre-signed urls through a lambda function, which would be invoked from API gateway.
You basically need to decide whether your files are accessible to everyone in the world (like a normal website), or whether they should only be accessible to logged-in users.
As an example, let's say that you were running a photo-sharing website. Users want their photos to be private, but they want to be able to access their own photos and share selected photos with other specific users. In this case, all content should be kept as private by default. The flow would then be:
Users login to the application
When a user wants a link to one of their files, or if the application wants to use an <img> tag within an HTML page (eg to show photo thumbnails), the application can generate an Amazon S3 pre-signed URLs, which is a time-limited URL that grants temporary access to a private object
The user can follow that link, or the browser can use the link within the HTML page
When Amazon S3 receives the pre-signed URL, it verifies that it is correctly created and the expiry time has not been exceeded. If so, it provides access to the file.
When a user shares a photo with another user, your application can track this in a database. If a user requests to see a photo for which they have been granted access, the application can generate a pre-signed URL.
It basically means that your application is in control of which users can access which objects stored in Amazon S3.
Alternatively, if you choose to make all content in Amazon S3 publicly accessible, there is no capability to limit the downloads of the files.
Goal:
For example, users could create courses which has resources such as images, videos etc.
I want to restrict access to them using signed cookies. i.e. resources on /courses/1 will only be accessible to logged-in users who have a valid signed cookie.
Background
I'll be creating a bucket of media files per course based on https://cloud.google.com/storage/docs/access-control#recommended_bucket_architecture.
Where I am stuck
How to add backend buckets to the load balancer dynamically since I could only add them in the console
How to use the same signing key for all buckets for easy maintenance https://cloud.google.com/cdn/docs/using-signed-cookies#creatingkeys. It seems like I need to manually create a key for each bucket.
So is there a standard way to do these or am I thinking about this whole architecture wrong since this won't scale without automation?
You will be limited to 50 path rules as mentioned in the Quotas, limited to 50 courses. I hope you expect more than this!!
So, this pattern isn't suitable for your use case. You need to use the same bucket and to control access with a backend app. And then to generated SignedUrl for the resources requested by the users
I was trying to understand the Google Cloud Platform storage but couldn't really comprehend the language used in the documentation. I wanted to ask if you could use the storage and the APIs to store photos users take within your application and also get the images back if provided with a URL? and even if you can, would it be a safe and reasonable method to do so?
Yes you can pretty much use a storage bucket to store any kind of data.
In terms of transferring images from an application to storage buckets, the application must be authorised to write to the bucket.
One option is to use a service account key within the application. A service account is a special account that can be used by an application to authorise to various Google APIs, including the storage API.
There is some more information about service accounts here and information here about using service account keys. These keys can be used within your application, and allow the application to inherit the permission/scopes assigned to that service account.
In terms of retrieving images using a URL, one possible option would be to use signed URLs which would allow you to give users read or write access to an object (in your case images) in a bucket for a given amount of time.
Access to bucket objects can also be controlled with ACL (Access Control Lists). If you're happy for you images to be available publicly (i.e. accessible to everybody), it's possible to set an ACL with 'Reader' access for AllUsers.
More information on this can be found here.
Should you decide to make the images available publically, the URL format to retrive the object/image from the bucket would be:
https://storage.googleapis.com/[BUCKET_NAME]/[OBJECT_NAME]
EDIT:
In relation to using an interface to upload the files before the files land in the bucket, one option would be to have a instance with an external IP address (or multiple instances behind a Load Balancer) where the images are initially uploaded. You could mount Cloud Storage to this instance using FUSE, so that uploaded files are easily transferred to the bucket. In terms of databases you have the option of manually installing your database on a Compute Engine instance, or using a fully managed database service such as Cloud SQL.
Let's say that I want to create a simplistic version of Dropbox' website, where you can sign up and perform operations on files such as upload, download, delete, rename, etc. - pretty much like in this question. I want to use Amazon S3 for the storage of the files. This is all quite easy with the AWS SDK, except for one thing: security.
Obviously user A should not be allowed to access user B's files. I can kind of add "security through obscurity" by handling permissions in my application, but it is not good enough to have public files and rely on that, because then anyone with the right URL could access files that they should not be able to. Therefore I have searched and looked through the AWS documentation for a solution, but I have been unable to find a suitable one. The problem is that everything I could find relates to permissions based on AWS accounts, and it is not appropriate for me to create many thousand IAM users. I considered IAM users, bucket policies, S3 ACLs, pre-signed URLs, etc.
I could indeed solve this by authorizing everything in my application and setting permissions on my bucket so that only my application can access the objects, and then having users download files through my application. However, this would put increased load on my application, where I really want people to download the files directly through Amazon S3 to make use of its scalability.
Is there a way that I can do this? To clarify, I want to give a given user in my application access to only a subset of the objects in Amazon S3, without creating thousands of IAM users, which is not so scalable.
Have the users download the files with the help of your application, but not through your application.
Provide each link as a link the points to an endpoint of your application. When each request comes in, evaluate whether the user is authorized to download the file. Evaluate this with the user's session data.
If not, return an error response.
If so, pre-sign a download URL for the object, with a very short expiration time (e.g. 5 seconds) and redirect the user's browser with 302 Found and set the signed URL in the Location: response header. As long as the download is started before the signed URL expires, it won't be interrupted if the URL expires while the download is already in progress.
If the connection to your app, and the scheme of the signed URL are both HTTPS, this provides a substantial level of security against any unauthorized download, at very low resource cost.