Let's say that I want to create a simplistic version of Dropbox's website, where you can sign up and perform operations on files such as upload, download, delete, rename, etc. - pretty much like in this question. I want to use Amazon S3 for the storage of the files. This is all quite easy with the AWS SDK, except for one thing: security.
Obviously user A should not be allowed to access user B's files. I can kind of add "security through obscurity" by handling permissions in my application, but it is not good enough to have public files and rely on that, because then anyone with the right URL could access files that they should not be able to. Therefore I have searched and looked through the AWS documentation for a solution, but I have been unable to find a suitable one. The problem is that everything I could find relates to permissions based on AWS accounts, and it is not appropriate for me to create many thousands of IAM users. I considered IAM users, bucket policies, S3 ACLs, pre-signed URLs, etc.
I could indeed solve this by authorizing everything in my application and setting permissions on my bucket so that only my application can access the objects, and then having users download files through my application. However, this would put increased load on my application, where I really want people to download the files directly through Amazon S3 to make use of its scalability.
Is there a way that I can do this? To clarify, I want to give a given user in my application access to only a subset of the objects in Amazon S3, without creating thousands of IAM users, which is not so scalable.
Have the users download the files with the help of your application, but not through your application.
Provide each link as a link that points to an endpoint of your application. When each request comes in, use the user's session data to evaluate whether the user is authorized to download the file.
If not, return an error response.
If so, pre-sign a download URL for the object, with a very short expiration time (e.g. 5 seconds) and redirect the user's browser with 302 Found and set the signed URL in the Location: response header. As long as the download is started before the signed URL expires, it won't be interrupted if the URL expires while the download is already in progress.
If the connection to your app, and the scheme of the signed URL are both HTTPS, this provides a substantial level of security against any unauthorized download, at very low resource cost.
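The flow above could be sketched roughly as follows. This is only an illustration under assumptions: the PERMISSIONS table, the handle_download function and the make_boto3_presigner helper are hypothetical names, and a real application would read permissions from its session store and database. The only AWS call used is boto3's generate_presigned_url.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical permission table: user id -> object keys that user may read.
# A real application would look this up from session data and a database.
PERMISSIONS: Dict[str, Set[str]] = {
    "alice": {"alice/report.pdf"},
    "bob": {"bob/photo.jpg"},
}

# A presigner takes (object key, lifetime in seconds) and returns a URL.
Presigner = Callable[[str, int], str]


def handle_download(user_id: str, key: str, presign: Presigner,
                    expires_in: int = 5) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, response headers) for a download request."""
    if key not in PERMISSIONS.get(user_id, set()):
        # Not authorized: return an error response, no URL is ever signed.
        return 403, {}
    # Authorized: redirect the browser to a short-lived pre-signed URL.
    return 302, {"Location": presign(key, expires_in)}


def make_boto3_presigner(bucket: str) -> Presigner:
    """Build a presigner backed by boto3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    s3 = boto3.client("s3")

    def presign(key: str, expires_in: int) -> str:
        return s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,
        )

    return presign
```

In a web framework, the returned status and Location header would be copied into the HTTP response, so the file bytes never pass through the application.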
Is it possible in S3 to allow dynamic groups of users access to resources in a bucket? For example, I know you can use Cognito to restrict access of users' content to the respective users. However, I don't know how to apply some dynamic rule which would require DB access. Some example scenarios I can think of:
Instagram-like functionality, users can connect with friends and upload photos. Only friends can view a user's photos.
Project-level resources. Multiple users can be added to a project, and only members of the project may view its resources. Projects can be created and managed by users and so are not pre-defined.
Users have private file storage, but can share files with other users.
Now the obvious 1st layer of protection would be the front-end simply not giving the links to these resources to unauthorized users. But suppose in the second scenario, the S3 link to SECRET_COMPANY_DATA.zip gets leaked. I would hope that when someone tries to access that link, it only succeeds if they're in the associated project and have sufficient privileges.
I think, to some degree, this can be handled by adding custom claims to the Cognito token, e.g. you could probably add a project_id claim and do a similar path-based Allow on it. But if a user can be part of multiple projects, this seems to go out the window.
It seems to me like this should be a common enough requirement that there is a simple solution. Any advice?
The best approach would be:
Keep your bucket private, with no Bucket Policy
Users authenticate to your app
When a user requests access to a file stored in Amazon S3, the app should check if they are permitted to access the file. This could check who 'owns' the file, their list of friends, their projects, etc. You would program all this logic in your own app.
If the user is authorised to access the file, then your app should generate an Amazon S3 pre-signed URL, which is a time-limited URL that provides temporary access to a private object. This URL can be inserted into HTML, such as in <a HREF="..."> or <img src="...">.
When the user clicks the link, Amazon S3 will verify the signature and will confirm that the link has not yet expired. If everything is okay, it will return the file to the user's browser.
This approach means that your app can control all the authentication and authorization, while S3 will be responsible for serving the content to the user.
If another person gets access to the pre-signed URL, then they can also download the content. Therefore, keep the expiry time to a minimum (a few minutes). After this period, the URL will no longer work.
Your app can generate the pre-signed URL in a few lines of code. It does not require a call to AWS to create the URL.
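The permission check that the app performs before signing anything might look like the sketch below. The FileRecord type, its fields, and the friends and project_members lookups are all hypothetical stand-ins for whatever your own database holds; only when can_access returns True would the app go on to generate a pre-signed URL.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FileRecord:
    """Hypothetical database row describing one object stored in S3."""
    owner: str
    project: Optional[str] = None                 # project the file belongs to
    shared_with: Set[str] = field(default_factory=set)
    visible_to_friends: bool = False


def can_access(user: str, record: FileRecord,
               friends: Dict[str, Set[str]],
               project_members: Dict[str, Set[str]]) -> bool:
    """Decide whether `user` may read the file described by `record`."""
    if user == record.owner:
        return True
    if user in record.shared_with:                # explicit per-file share
        return True
    if record.project is not None and user in project_members.get(record.project, set()):
        return True                               # member of the file's project
    if record.visible_to_friends and user in friends.get(record.owner, set()):
        return True                               # friend of the owner
    return False
```

Because the rules live in ordinary application code, a user who belongs to several projects is not a special case: each request is checked against the current membership data.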
I'm creating a platform whereby users upload and download data. The amount of data uploaded isn't trivial---this could be on the order of GB.
Users should be able to download a subset of this data via hyperlinks.
If I'm not mistaken, my AWS account will be charged for the egress when these files are downloaded. If that's true, I'm concerned about two related scenarios:
Users who abuse this, and constantly click on the download hyperlinks (more than reasonable)
More concerning, robots which would click the download links every few seconds.
I had planned to make the downloads accessible to anyone who visits the website as a public resource. Naturally, if users logged in to the platform, I could easily restrict the amount of data downloaded over a period of time.
For public websites, how could I stop users from downloading too much? Could I use IP addresses maybe?
Any insight appreciated.
IP addresses can be easily changed. Thus, it's a poor control, but probably better than nothing.
For robots, use CAPTCHA. This is an effective way of preventing automated scraping of your links.
In addition, you could consider providing access to your links through API Gateway. The gateway has throttling limits which you can set (e.g. 10 invocations per minute). This way you can ensure that you will not go over some pre-defined limit.
On top of this you could use S3 pre-signed URLs. They have an expiration time, so you could make them valid for only a short time. This also prevents users from sharing links, as they would expire after a set time. In this scenario, the users would obtain the S3 pre-signed URLs through a Lambda function, which would be invoked from API Gateway.
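To illustrate what a limit such as "10 invocations per minute" means, here is a small application-level sketch that counts requests per client in a sliding time window. It is not how API Gateway is configured or implemented (there you set the limits in the service itself); the SlidingWindowLimiter class and its parameters are hypothetical, and the client id could be an IP address, with the weakness noted above.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        # Timestamps of recent accepted requests, per client id.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False          # over the limit: do not hand out a URL
        hits.append(now)
        return True
```

A download endpoint would call allow() first and only generate a pre-signed URL when it returns True.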
You basically need to decide whether your files are accessible to everyone in the world (like a normal website), or whether they should only be accessible to logged-in users.
As an example, let's say that you were running a photo-sharing website. Users want their photos to be private, but they want to be able to access their own photos and share selected photos with other specific users. In this case, all content should be kept as private by default. The flow would then be:
Users login to the application
When a user wants a link to one of their files, or if the application wants to use an <img> tag within an HTML page (eg to show photo thumbnails), the application can generate an Amazon S3 pre-signed URL, which is a time-limited URL that grants temporary access to a private object
The user can follow that link, or the browser can use the link within the HTML page
When Amazon S3 receives the pre-signed URL, it verifies that it is correctly created and the expiry time has not been exceeded. If so, it provides access to the file.
When a user shares a photo with another user, your application can track this in a database. If a user requests to see a photo for which they have been granted access, the application can generate a pre-signed URL.
It basically means that your application is in control of which users can access which objects stored in Amazon S3.
Alternatively, if you choose to make all content in Amazon S3 publicly accessible, there is no capability to limit the downloads of the files.
I'm not sure if this is the appropriate use case, so please tell me what to look for if I'm incorrect in my assumption of how to do this.
What I'm trying to do:
I have an S3 bucket with different 'packs' that users can download. Upon their purchase, they are given a user role in WordPress. I have an S3 browser set up via PHP that makes requests to the bucket for info.
Based on their 'role', it will only show files that match prefix (whole pack users see all, single product people only see single product prefix).
In that way, the server will be sending the files on behalf of the user, and changing IAM roles based on the user's permission level. Do I have to have it set that way? Can I just analyze the WP role and specify an endpoint or query that notes the prefixes allowed?
Pack users see /
Individual users see /--prefix/
If that makes sense
Thanks in advance! I've never used AWS, so this is all new to me. :)
This sounds too complex. It's possible to do with AWS STS but it would be extremely fragile.
I presume you're hiding the actual S3 bucket from end users and are streaming through your PHP application? If so, it makes more sense to do any role-based filtering in the PHP application as you have far more logic available to you there - IAM is granular, but restricting access to resources in S3 is going to be funky and there's always a chance you'll get something wrong and expose the incorrect downloads.
Rather do this inside your app:
establish the role you've granted
issue the S3 ls command filtered by the role - i.e. if the role permits only --prefix, issue the ls command so that it only lists files matching --prefix
don't expose files in the bucket globally - only your app should have access to the S3 bucket - that way people also can't share links once they've downloaded a pack.
this has the added benefit of not encoding your S3 bucket structure in IAM, and keeps your decision logic isolated to code.
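The steps above can be sketched as follows, in Python rather than PHP purely for illustration. The ROLE_PREFIXES mapping, the role names and the "product-a/" prefix are hypothetical; list_objects_v2 with its Prefix and ContinuationToken parameters is the boto3 listing call, and s3_client is expected to be a boto3 S3 client (or any object with the same method).

```python
from typing import Any, Dict, List

# Hypothetical mapping from application role to the key prefix it may see.
ROLE_PREFIXES: Dict[str, str] = {
    "pack_user": "",                  # whole-pack buyers may list everything
    "single_product": "product-a/",   # single-product buyers see one prefix
}


def list_keys_for_role(s3_client: Any, bucket: str, role: str) -> List[str]:
    """List only the object keys that the given role is allowed to see."""
    if role not in ROLE_PREFIXES:
        return []                     # unknown role: show nothing
    kwargs: Dict[str, Any] = {"Bucket": bucket}
    if ROLE_PREFIXES[role]:
        kwargs["Prefix"] = ROLE_PREFIXES[role]
    keys: List[str] = []
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        # More results remain: continue from where this page stopped.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

The same prefix check should be repeated when a file is actually requested, so that a user cannot fetch a key that was never listed for them.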
There are basically three ways you can grant access to private content in Amazon S3.
Option 1: IAM credentials
You can add a policy to an IAM User, so that they can access private content. However, such credentials should only be used by staff in your own organization. They should not be used to grant access to application users.
Option 2: Temporary credentials via STS
Your application can generate temporary credentials via the AWS Security Token Service. These credentials can be given specific permissions and are valid for a limited time period. This is ideal for granting mobile apps access to Amazon S3 because they can communicate directly with S3 without having to go via the back-end app. The credentials would only be granted access to resources they are permitted to use.
These types of credentials can also be used by web applications, where the web apps make calls directly to AWS services (eg from Node/JavaScript in the browser). However, this doesn't seem suitable for your WordPress situation.
Option 3: Pre-Signed URLs
Imagine a photo-sharing application where users can access their private photos, and users can also share photos with other users. When a user requests access to a particular photo (or when the back-end app is creating an HTML page that uses a photo), the app can generate a pre-signed URL that grants temporary access to an Amazon S3 object.
Each pre-signed URL gives access only to a single S3 object and only for a selected time period (eg 5 minutes). This means that all the permission logic for whether a user is entitled to access a file can be performed in the back-end application. When the back-end application provides a pre-signed URL to the user's browser, the user can access the content directly from Amazon S3 without going via the back-end.
See: Amazon S3 pre-signed URLs
Your situation sounds suitable for Option #3. Once you have determined that a user is permitted to access a particular file in S3, your application can generate the pre-signed URL and include it as a link (or even in <img src=...> tags). The user can then download the file. There is no need to use IAM Roles in this process.
I'm new to S3 and I'm wondering how real-world web applications typically interact with it, in particular how user access permissions are handled.
Say, for instance, that I have designed a basic project management web application which, amongst other features, permits users to upload project files into a shared space which other project members can access.
So User file upload/read access would be determined by project membership but also by project roles.
Using S3, would one simply create a Bucket for the entire application with a single S3 user with all permissions and leave the handling of the user permissions to the application ? Or am I missing something ? I haven't been able to find many examples of real-world S3 usage online, in particular where access permissions are concerned.
The typical architecture is to keep the Amazon S3 buckets totally private.
When your application determines that a user is permitted to upload or download a file, it can generate a Presigned URL. This is a time-limited URL that allows an object to be uploaded or downloaded.
When uploading, it is also possible to Create a POST Policy to enforce some restrictions on the upload, such as its length, type and where it is being stored. If the upload meets the requirements, the file will be accepted.
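As a rough sketch of such restrictions, the conditions of a POST policy can be built as plain data and then handed to boto3's generate_presigned_post. The helper names build_post_conditions and create_upload_form, and the example prefix and size limit, are assumptions for illustration; the condition forms (starts-with, content-length-range, exact field match) are the ones S3 POST policies define.

```python
from typing import Any, Dict, List, Tuple


def build_post_conditions(key_prefix: str, content_type: str,
                          max_bytes: int) -> Tuple[Dict[str, str], List[Any]]:
    """Return (form fields, policy conditions) restricting an upload."""
    fields = {"Content-Type": content_type}
    conditions: List[Any] = [
        ["starts-with", "$key", key_prefix],      # where it may be stored
        {"Content-Type": content_type},           # its declared type
        ["content-length-range", 1, max_bytes],   # its size in bytes
    ]
    return fields, conditions


def create_upload_form(bucket: str, key: str, key_prefix: str,
                       content_type: str, max_bytes: int) -> Dict[str, Any]:
    """Ask boto3 for the URL and form fields (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helper above runs without it

    fields, conditions = build_post_conditions(key_prefix, content_type, max_bytes)
    return boto3.client("s3").generate_presigned_post(
        Bucket=bucket,
        Key=key,
        Fields=fields,
        Conditions=conditions,
        ExpiresIn=300,
    )
```

The returned dictionary contains a "url" and the "fields" that the browser form must submit along with the file; S3 rejects the upload if any condition is not met.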
You should maintain a database that identifies all objects that have been uploaded and maps them to the 'owner', permission groups, shares, etc. All of this is application-specific. Later, when a user requests a particular object for download, your app can generate a pre-signed URL that lets the user download the object even though it is a private object.
Always have your application determine permissions for accessing an object. Do not define application users as IAM Users.
If there is a straight-forward permission model (eg all of one user's files are in one path/folder within an S3 bucket), you can generate temporary credentials using the AWS Security Token Service that grants List and Get permissions on the given path. This can be useful for mobile applications that could then directly call the Amazon S3 API to retrieve objects. However, it is not suitable for a web-based application.
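For the one-path-per-user case, the temporary credentials could be scoped with a policy like the one built below. This is a sketch under assumptions: build_prefix_policy and issue_temporary_credentials are hypothetical helper names, and the example uses the STS GetFederationToken call (get_federation_token in boto3), which accepts an inline policy; other STS calls such as AssumeRole accept a Policy parameter in a similar way.

```python
import json
from typing import Any, Dict


def build_prefix_policy(bucket: str, prefix: str) -> Dict[str, Any]:
    """Policy granting List and Get only under `prefix` in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Listing is limited to keys under the user's own path.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


def issue_temporary_credentials(bucket: str, user_id: str) -> Dict[str, Any]:
    """Request scoped credentials from STS (needs boto3 and credentials)."""
    import boto3  # imported lazily so the policy builder runs without it

    response = boto3.client("sts").get_federation_token(
        Name=user_id,  # assumed to be a short identifier acceptable to STS
        Policy=json.dumps(build_prefix_policy(bucket, f"{user_id}/")),
        DurationSeconds=3600,
    )
    return response["Credentials"]
```

The credentials returned can be handed to the mobile app, which then calls S3 directly but can only list and read objects under that user's path.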
Can I allow a 3rd party file upload to an S3 bucket without using IAM? I would like to avoid the hassle of sending them credentials for an AWS account, but still take advantage of the S3 UI. I have only found solutions for one or the other.
The pre-signed url option sounded great but appears to only work with their SDKs and I'm not about to tell my client to install python on their computer to upload a file.
The browser based upload requires me to make my own front end html form and run it on a server just to upload (lol).
Can I not simply create a pre-signed url which navigates the user to the S3 console and allows them to upload before expiration time? Of course, making the bucket public is not an option either. Why is this so complicated!
Management Console
The Amazon S3 management console will only display S3 buckets that are associated with the AWS account of the user. Also, it is not possible to limit the buckets displayed (it will display all buckets in the account, even if the user cannot access them).
Thus, you certainly don't want to give them access to your AWS management console.
Pre-Signed URL
Your user does not require the AWS SDK to use a pre-signed URL. Rather, you must run your own system that generates the pre-signed URL and makes it available to the user (eg through a web page or API call).
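To show that no AWS SDK is needed on the user's side, here is a sketch of an upload that uses only the Python standard library; any HTTP client (a browser, curl, etc.) can do the same. The function names are hypothetical, and the URL is assumed to have been pre-signed for a PUT of the object.

```python
import urllib.request


def build_upload_request(presigned_url: str, data: bytes,
                         content_type: str = "application/octet-stream"
                         ) -> urllib.request.Request:
    """Prepare a plain HTTP PUT to a pre-signed URL; no AWS SDK involved."""
    request = urllib.request.Request(presigned_url, data=data, method="PUT")
    # Assumption: if a content type was fixed when the URL was signed, this
    # header has to match it; a signature error usually means it does not.
    request.add_header("Content-Type", content_type)
    return request


def upload(presigned_url: str, data: bytes) -> int:
    """Send the upload and return the HTTP status code from S3."""
    with urllib.request.urlopen(build_upload_request(presigned_url, data)) as resp:
        return resp.status
```

The signature carried in the URL's query string is the only credential the request presents, which is why the URL itself must be generated by a system you control.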
Web page
You can host a static upload page on Amazon S3, but it will not be able to authenticate the user. Since you only wish to provide access to specific people, you'll need some code running on the back-end to authenticate them.
Generate...
You ask: "Can I not simply create a pre-signed url which navigates the user to the S3 console and allows them to upload before expiration time?"
Yes and no. Yes, you can generate a pre-signed URL. However, it cannot be used with the S3 console (see above).
Why is this so complicated?
Because security is important.
So, what to do?
A few options:
Make a bucket publicly writable, but not publicly readable. Tell your customer how to upload. The downside is that anyone could upload to the bucket (if they know about it), so it is only security by obscurity. But, it might be a simple solution for you.
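Generate a long-lived pre-signed URL. Provide this to them, and they can upload (eg via a static HTML page that you give them). Note that a URL signed with Signature Version 4 is valid for at most 7 days; only the older Signature Version 2 allowed URLs that work for months or years, and a URL never outlives the credentials that signed it.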
Generate some IAM User credentials for them, then have them use a utility like the AWS Command-Line Interface (CLI) or Cloudberry. Give them just enough credentials for upload access. This assumes you only have a few customers that need access.
Bottom line: Security is important. Yet you wish to "avoid the hassle of sending them credentials", and you also do not wish to run a system to perform the authentication checks. You can't have security without doing some work, and the cost of poor security will be much more than the cost of implementing good security.
You could deploy a Lambda function that generates a pre-signed URL, then use that URL to upload the file. Here is an example:
https://aws.amazon.com/blogs/compute/uploading-to-amazon-s3-directly-from-a-web-or-mobile-application/
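A minimal sketch of such a Lambda function is shown below. It is not taken from the linked post: the make_handler factory, the "uploads/" key prefix, the response field names and the 300 second lifetime are all assumptions, and the response shape assumes the function sits behind an API Gateway proxy integration.

```python
import json
import uuid
from typing import Any, Callable, Dict


def make_handler(presign_put: Callable[[str, int], str]
                 ) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda handler that returns a pre-signed upload URL."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        key = f"uploads/{uuid.uuid4()}"     # random key chosen by the server
        url = presign_put(key, 300)         # URL valid for five minutes
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"uploadURL": url, "key": key}),
        }

    return handler


def boto3_presign_put(bucket: str) -> Callable[[str, int], str]:
    """Presigner backed by boto3 (available in the Lambda Python runtime)."""
    import boto3  # imported lazily so make_handler can be tried without it

    s3 = boto3.client("s3")
    return lambda key, ttl: s3.generate_presigned_url(
        "put_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=ttl
    )
```

In a deployment, the module would expose something like handler = make_handler(boto3_presign_put("my-upload-bucket")), where the bucket name is a placeholder, and the client would then PUT the file to the returned uploadURL.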