We are using Amazon S3 to allow users to download large zip files a limited number of times. We are searching for a better method of counting downloads than just counting button clicks.
Is there any way we can give our users a signed URL to temporarily download the file (as we are doing now) and check that token with Amazon to make sure the file was successfully downloaded?
Please let me know what you think
You could use Amazon S3 Server Access Logging:
In order to track requests for access to your bucket, you can enable access logging. Each access log record provides details about a single access request, such as the requester, bucket name, request time, request action, response status, and error code, if any.
There is no automatic ability to limit the number of downloads via an Amazon S3 pre-signed URL.
A pre-signed URL limits access based upon time, but cannot limit based upon quantity.
The closest option would be to provide a very small time window for the pre-signed URL, with the assumption that only one download would happen within that time window.
Related
I'm creating a platform whereby users upload and download data. The amount of data uploaded isn't trivial; it could be on the order of gigabytes.
Users should be able to download a subset of this data via hyperlinks.
If I'm not mistaken, my AWS account will be charged for the egress of these downloaded files. If that's true, I'm concerned about two related scenarios:
Users who abuse this and constantly click the download hyperlinks (more often than is reasonable)
More concerning, robots which would click the download links every few seconds.
I had planned to make the downloads accessible to anyone who visits the website as a public resource. Naturally, if users logged in to the platform, I could easily restrict the amount of data downloaded over a period of time.
For public websites, how could I stop users from downloading too much? Could I use IP addresses maybe?
Any insight appreciated.
An IP address can be easily changed. Thus, it's a poor control, but probably better than nothing.
For robots, use a CAPTCHA. This is an effective way of preventing automated scraping of your links.
In addition, you could consider providing access to your links through API Gateway. The gateway has throttling limits which you can set (e.g. 10 invocations per minute). This way you can ensure that you will not go over some pre-defined limit.
On top of this you could use S3 pre-signed URLs. They have an expiration time, so you could set a short validity window. This also prevents users from sharing links, as they would expire after a set time. In this scenario, the users would obtain the S3 pre-signed URLs through a Lambda function, which would be invoked from API Gateway.
You basically need to decide whether your files are accessible to everyone in the world (like a normal website), or whether they should only be accessible to logged-in users.
As an example, let's say that you were running a photo-sharing website. Users want their photos to be private, but they want to be able to access their own photos and share selected photos with other specific users. In this case, all content should be kept as private by default. The flow would then be:
Users login to the application
When a user wants a link to one of their files, or if the application wants to use an <img> tag within an HTML page (eg to show photo thumbnails), the application can generate an Amazon S3 pre-signed URL, which is a time-limited URL that grants temporary access to a private object
The user can follow that link, or the browser can use the link within the HTML page
When Amazon S3 receives the pre-signed URL, it verifies that it is correctly created and the expiry time has not been exceeded. If so, it provides access to the file.
When a user shares a photo with another user, your application can track this in a database. If a user requests to see a photo for which they have been granted access, the application can generate a pre-signed URL.
It basically means that your application is in control of which users can access which objects stored in Amazon S3.
Alternatively, if you choose to make all content in Amazon S3 publicly accessible, there is no capability to limit the downloads of the files.
I'm using S3 to store a bunch of confidential files for clients. The bucket cannot have public access and only authenticated users can access these files.
This is my current idea
I'm using Cognito to authenticate the user and allow them to access API Gateway. When they make a request to the path /files, it directs the request to a Lambda, which generates a signed URL for every file that the user has access to. Then API Gateway returns the list of all these signed URLs and the browser displays them.
Gathering a signed url for every file seems very inefficient. Is there any other way to get confidential files from S3 in one large batch?
A safer approach would be for your application to generate signed URLs, valid for a short period, and have your bucket accept only requests originating from CloudFront using an Origin Access Identity.
See the documentation for this at https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
You say "Gathering a signed url for every file seems very inefficient", but the process of creating the Signed URL itself is very easy — just a few lines of code.
However, if there are many files, it would put a lot of work on your users to download each file individually.
Therefore, another approach could be:
Identify all the files they wish to download
Create a Zip of the files and store it in Amazon S3
Provide a Signed URL to the Zip file
Delete the Zip file later (since it is not required anymore), possibly by creating a lifecycle rule on a folder within the bucket
Please note that AWS Lambda functions have a /tmp disk storage limit of 512 MB by default (configurable up to 10 GB), which might not be enough to create the Zip file.
Can I allow a 3rd party file upload to an S3 bucket without using IAM? I would like to avoid the hassle of sending them credentials for an AWS account, but still take advantage of the S3 UI. I have only found solutions for one or the other.
The pre-signed URL option sounded great, but it appears to only work with their SDKs, and I'm not about to tell my client to install Python on their computer to upload a file.
The browser-based upload requires me to make my own front-end HTML form and run it on a server just to upload (lol).
Can I not simply create a pre-signed url which navigates the user to the S3 console and allows them to upload before expiration time? Of course, making the bucket public is not an option either. Why is this so complicated!
Management Console
The Amazon S3 management console will only display S3 buckets that are associated with the AWS account of the user. Also, it is not possible to limit the buckets displayed (it will display all buckets in the account, even if the user cannot access them).
Thus, you certainly don't want to give them access to your AWS management console.
Pre-Signed URL
Your user does not require the AWS SDK to use a pre-signed URL. Rather, you must run your own system that generates the pre-signed URL and makes it available to the user (eg through a web page or API call).
Web page
You can host a static upload page on Amazon S3, but it will not be able to authenticate the user. Since you only wish to provide access to specific people, you'll need some code running on the back-end to authenticate them.
Generate...
You ask: "Can I not simply create a pre-signed url which navigates the user to the S3 console and allows them to upload before expiration time?"
Yes and no. Yes, you can generate a pre-signed URL. However, it cannot be used with the S3 console (see above).
Why is this so complicated?
Because security is important.
So, what to do?
A few options:
Make a bucket publicly writable, but not publicly readable. Tell your customer how to upload. The downside is that anyone could upload to the bucket (if they know about it), so it is only security by obscurity. But, it might be a simple solution for you.
Generate a long-lived pre-signed URL. With Signature Version 4, a pre-signed URL can be valid for up to seven days. Provide this to them, and they can upload (eg via a static HTML page that you give them).
Generate some IAM User credentials for them, then have them use a utility like the AWS Command-Line Interface (CLI) or Cloudberry. Give them just enough credentials for upload access. This assumes you only have a few customers that need access.
Bottom line: Security is important. Yet, you wish to "avoid the hassle of sending them credentials", nor do you wish to run a system to perform the authentication checks. You can't have security without doing some work, and the cost of poor security will be much more than the cost of implementing good security.
You could deploy a Lambda function to generate a pre-signed URL, then use that URL to upload the file. Here is an example:
https://aws.amazon.com/blogs/compute/uploading-to-amazon-s3-directly-from-a-web-or-mobile-application/
Let's say that I want to create a simplistic version of Dropbox' website, where you can sign up and perform operations on files such as upload, download, delete, rename, etc. - pretty much like in this question. I want to use Amazon S3 for the storage of the files. This is all quite easy with the AWS SDK, except for one thing: security.
Obviously user A should not be allowed to access user B's files. I can kind of add "security through obscurity" by handling permissions in my application, but it is not good enough to have public files and rely on that, because then anyone with the right URL could access files that they should not be able to. Therefore I have searched and looked through the AWS documentation for a solution, but I have been unable to find a suitable one. The problem is that everything I could find relates to permissions based on AWS accounts, and it is not appropriate for me to create many thousand IAM users. I considered IAM users, bucket policies, S3 ACLs, pre-signed URLs, etc.
I could indeed solve this by authorizing everything in my application and setting permissions on my bucket so that only my application can access the objects, and then having users download files through my application. However, this would put increased load on my application, where I really want people to download the files directly through Amazon S3 to make use of its scalability.
Is there a way that I can do this? To clarify, I want to give a given user in my application access to only a subset of the objects in Amazon S3, without creating thousands of IAM users, which is not so scalable.
Have the users download the files with the help of your application, but not through your application.
Provide each link as a link that points to an endpoint of your application. When each request comes in, use the user's session data to evaluate whether the user is authorized to download the file.
If not, return an error response.
If so, pre-sign a download URL for the object, with a very short expiration time (e.g. 5 seconds) and redirect the user's browser with 302 Found and set the signed URL in the Location: response header. As long as the download is started before the signed URL expires, it won't be interrupted if the URL expires while the download is already in progress.
If the connection to your app, and the scheme of the signed URL are both HTTPS, this provides a substantial level of security against any unauthorized download, at very low resource cost.
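The redirect flow above can be sketched framework-neutrally. The session shape is hypothetical, and `presign` stands in for whatever generates your S3 pre-signed URL:

```python
def download_response(session, file_key, presign):
    """Return (status, headers, body) for a download request.

    `session` is your framework's per-user session data; `presign` is a
    callable that produces a short-lived S3 pre-signed URL.
    """
    if file_key not in session.get("allowed_files", set()):
        return 403, {}, "Forbidden"
    # 302 Found with Location set to the signed URL: the browser follows
    # it straight to S3, and the URL only needs to outlive the *start*
    # of the download.
    return 302, {"Location": presign(file_key, expires_in=5)}, ""
```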
Based on: http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html
Is there a way to limit a browser based upload to Amazon S3 such that it is rejected if it does not originate from my secure URL (i.e. https://www.someurl.com)?
Thanks!
I want to absolutely guarantee the post is coming from my website
That is impossible.
The web is stateless and a POST coming "from" a specific domain is just not a valid concept, because the Referer: header is trivial to spoof, and a malicious user most likely knows this. Running through an EC2 server will gain you nothing, because it will tell you nothing new and meaningful.
The post policy document not only expires, it also can constrain the object key to a prefix or an exact match. How is a malicious user going to defeat this? They can't.
in your client form you have encrypted/hashed versions of your credentials.
No, you do not.
What you have is a signature that attests to your authorization for S3 to honor the form post. It can't feasibly be reverse-engineered such that the policy can be modified, and that's the point. The form has to match the policy, which can't be edited and still remain valid.
You generate this signature using information known only to you and AWS; specifically, the secret that accompanies your access key.
When S3 receives the request, it computes what the signature should have been. If it's a match, then the privileges of the specific user owning that key are checked to see whether the request is authorized.
By constraining the object key in the policy, you prevent the user from uploading (or overwriting) any object other than the specific one authorized by the policy. Or the specific object key prefix, in which case you restrict the user from harming anything outside that prefix.
If you are handing over a policy that allows any object key to be overwritten in the entire bucket, then you're solving the wrong problem by trying to constrain posts as coming "from" your website.
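To make the mechanism concrete, here is the legacy (Signature Version 2) form of S3 POST policy signing in pure Python. The secret key is a made-up placeholder, and new integrations should use Signature Version 4, which follows the same idea with a different key-derivation and hash:

```python
import base64
import hashlib
import hmac
import json

# Placeholder secret; in reality this is the secret access key paired
# with your access key ID, known only to you and AWS.
SECRET = b"wJalrXUtnFEMI-EXAMPLE-SECRET-KEY"

policy = {
    "expiration": "2030-01-01T00:00:00Z",
    "conditions": [
        {"bucket": "my-upload-bucket"},
        ["starts-with", "$key", "uploads/"],
    ],
}

# The form carries the base64 policy plus this HMAC. S3 recomputes the
# HMAC from its own copy of the secret; editing the policy breaks the
# match, which is why a signature cannot be reused with a modified policy.
encoded_policy = base64.b64encode(json.dumps(policy).encode())
signature = base64.b64encode(
    hmac.new(SECRET, encoded_policy, hashlib.sha1).digest()
)
```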
I think you've misunderstood how the S3 service authenticates.
Your server would have a credentials file holding your access key ID and secret key, and your server signs the request as the file is uploaded to your S3 bucket.
Amazon's S3 servers then check that the uploaded file has been signed by your access id and key.
This credentials file should never be publicly exposed anywhere and there's no way to get the keys off the wire.
In the case of browser based uploads your form should contain a signature that is passed to Amazon's S3 servers and authenticated against. This signature is generated from a combination of the upload policy, your access id and key but it is hashed so you shouldn't be able to get back to the secret key.
As you mentioned, this could mean that someone would be able to upload to your bucket from outside the confines of your app by simply reusing the signature in the X-Amz-Signature header.
This is what the policy's expiration header is for as it allows you to set a reasonably short expiration period on the form to prevent misuse.
So when a user goes to your upload page your server should generate a policy with a short expiration date (for example, 5 minutes after generation time). It should then create a signature from this policy and your Amazon credentials. From here you can now create a form that will post any data to your S3 bucket with the relevant policy and signature.
If a malicious user was to then attempt to copy the policy and signature and use that directly elsewhere then it would still expire 5 minutes after they originally landed on your upload page.
You can also use the policy to restrict other things such as the name of the file or mime types.
More detailed information is available in the AWS docs about browser based uploads to S3 and how S3 authenticates requests.
To further restrict where requests can come from you should look into enabling Cross-Origin Resource Sharing (CORS) permissions on your S3 bucket.
This allows you to specify which domain(s) each type of request may originate from.
Instead of trying to barricade the door, remove the door.
A better solution IMHO would be to prevent any direct uploads to S3 at all.
Meaning: delete your S3 upload policy that allows strangers to upload.
Make them upload to one of your servers.
Validate the upload however you like.
If it is acceptable then your server could move the file to s3.