Our service is using S3 presigned URLs to allow our clients to download data.
Since S3 presigned downloads do not pass through our software stack at the time of download, we cannot be sure whether the links were actually used by our clients.
We would like to understand how many bytes are actually being downloaded by our clients. (We give the clients the links, but we have no visibility into the actual traffic.)
We are looking for a simple way to monitor the usage (CloudWatch metrics, or any other AWS service) to get a single number: the total number of bytes downloaded by all our clients combined for a specific date.
Check out the S3 usage report. It should allow you to get the usage of the bucket. If your bucket is only used by your clients, this should be enough.
However, if the same bucket holds other objects, you may want more granularity. In that case, you can enable server access logging, which records a detailed entry for each request, including the number of bytes sent.
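For the CloudWatch route: if you enable S3 request metrics on the bucket, CloudWatch publishes a BytesDownloaded metric that you can sum per day. A minimal sketch with boto3 (the bucket name and metrics filter ID are placeholders, and request metrics must first be enabled on the bucket):

```python
import boto3
from datetime import datetime, timezone

cloudwatch = boto3.client("cloudwatch")

# Assumes S3 request metrics are enabled on the bucket with a metrics
# configuration whose ID is "EntireBucket" (names below are placeholders).
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="BytesDownloaded",
    Dimensions=[
        {"Name": "BucketName", "Value": "my-client-bucket"},
        {"Name": "FilterId", "Value": "EntireBucket"},
    ],
    StartTime=datetime(2024, 1, 15, tzinfo=timezone.utc),
    EndTime=datetime(2024, 1, 16, tzinfo=timezone.utc),
    Period=86400,          # one data point covering the whole day
    Statistics=["Sum"],
)

total_bytes = sum(point["Sum"] for point in response["Datapoints"])
print(f"Bytes downloaded on 2024-01-15: {total_bytes:.0f}")
```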
Related
I'm creating a platform whereby users upload and download data. The amount of data uploaded isn't trivial; it could be on the order of gigabytes.
Users should be able to download a subset of this data via hyperlinks.
If I'm not mistaken, my AWS account will be charged for egress when these files are downloaded. If that's true, I'm concerned about two related scenarios:
Users who abuse this and constantly click on the download hyperlinks (more often than is reasonable)
More concerning, robots that click the download links every few seconds.
I had planned to make the downloads accessible to anyone who visits the website as a public resource. Naturally, if users logged in to the platform, I could easily restrict the amount of data downloaded over a period of time.
For public websites, how could I stop users from downloading too much? Could I use IP addresses maybe?
Any insight appreciated.
An IP address can be easily changed, so it's a poor control, but it's probably better than nothing.
For robots, use a CAPTCHA. This is an effective way of preventing automated scraping of your links.
In addition, you could consider providing access to your links through API Gateway. The gateway has throttling limits which you can set (e.g. 10 invocations per minute). This way you can ensure that you will not go over some pre-defined limit.
On top of this you could use S3 pre-signed URLs. They have an expiration time, so you could make them valid for only a short period. This also prevents users from sharing links, as they would expire after the set time. In this scenario, the users would obtain the S3 pre-signed URLs through a Lambda function, which would be invoked from API Gateway.
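A minimal sketch of such a Lambda function, assuming a proxy-style API Gateway integration and a placeholder bucket name (the throttling itself would be configured on the API Gateway side):

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-download-bucket"  # placeholder bucket name

def handler(event, context):
    # Object key taken from the query string; in a real setup you would
    # validate it against what the caller is allowed to see.
    key = (event.get("queryStringParameters") or {}).get("key")
    if not key:
        return {"statusCode": 400, "body": json.dumps({"error": "missing key"})}

    # Short-lived URL: expires 60 seconds after it is issued.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=60,
    )
    return {"statusCode": 200, "body": json.dumps({"url": url})}
```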
You basically need to decide whether your files are accessible to everyone in the world (like a normal website), or whether they should only be accessible to logged-in users.
As an example, let's say that you were running a photo-sharing website. Users want their photos to be private, but they want to be able to access their own photos and share selected photos with other specific users. In this case, all content should be kept as private by default. The flow would then be:
Users log in to the application
When a user wants a link to one of their files, or if the application wants to use an <img> tag within an HTML page (e.g. to show photo thumbnails), the application can generate an Amazon S3 pre-signed URL, which is a time-limited URL that grants temporary access to a private object
The user can follow that link, or the browser can use the link within the HTML page
When Amazon S3 receives the pre-signed URL, it verifies that the URL was correctly created and that the expiry time has not been exceeded. If so, it provides access to the file.
When a user shares a photo with another user, your application can track this in a database. If a user requests to see a photo for which they have been granted access, the application can generate a pre-signed URL.
It basically means that your application is in control of which users can access which objects stored in Amazon S3.
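As a rough sketch of that flow, with an in-memory dict standing in for the database that tracks sharing (the bucket name and keys are placeholders):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "photo-sharing-bucket"  # placeholder

# Stand-in for a database table mapping users to the photos shared with them.
SHARED_WITH = {
    "alice": {"photos/alice/cat.jpg", "photos/bob/sunset.jpg"},
}

def photo_url(user_id: str, key: str, expires: int = 300) -> str:
    # Only generate a link if the application has recorded access for this user.
    if key not in SHARED_WITH.get(user_id, set()):
        raise PermissionError(f"{user_id} may not access {key}")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=expires,
    )
```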
Alternatively, if you choose to make all content in Amazon S3 publicly accessible, there is no capability to limit the downloads of the files.
I want to put some files on S3 bucket for a customer to download.
However, I want to restrict the number of downloads for the IAM user.
Is there any way to achieve this without deploying any additional service?
I have come across metrics to track how many times the customer has downloaded a file, but I haven't found a way to restrict it to a specific number.
I could not find a way to do this directly; after checking, signed URLs do not provide a way to control the number of GET operations.
What you can do is create a CloudWatch alarm on the metric you came across that triggers a Lambda function to add a Deny policy to the IAM user on the specified files when the threshold is reached.
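A sketch of what that Lambda might do when the alarm fires (the user, policy, bucket, and object names are all placeholders):

```python
import json
import boto3

iam = boto3.client("iam")

def handler(event, context):
    # Invoked once the CloudWatch alarm fires (e.g. via SNS) because the
    # download threshold has been reached.
    deny_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-bucket/customer-file.zip",
        }],
    }
    # Attach an inline policy that blocks further downloads of the object.
    iam.put_user_policy(
        UserName="customer-download-user",
        PolicyName="deny-further-downloads",
        PolicyDocument=json.dumps(deny_policy),
    )
```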
It is not possible to limit the number of downloads of objects from Amazon S3. You would need to write your own application that authenticates users and then serves the content via the application.
An alternative is to provide a pre-signed URL that is valid for only a short period of time, with the expectation that this would not provide enough time for multiple downloads.
I need to send an email along with a large attachment.
I tried using an AWS Lambda function along with SES, and my files are stored in S3 with sizes varying between 1 MB and 1 GB.
It really isn't advisable to send large attachments in emails like this. It would be much more practical to include a link to this file so that it can be downloaded by the user you're sending the email to. S3 allows you to configure permissions settings so that you can ensure this user can download the file. Consider taking that approach.
I would consider using pre-signed URLs to S3 objects that are granted a limited time until expiry (see Pre-signed URLs), or perhaps an IAM route that grants bucket access to specific roles.
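A minimal sketch of the pre-signed-URL-in-an-email approach, assuming the sender address is already verified in SES (addresses, bucket, and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")
ses = boto3.client("ses")

# Generate a link that expires in 24 hours instead of attaching the file.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-attachments-bucket", "Key": "reports/big-report.zip"},
    ExpiresIn=24 * 3600,
)

ses.send_email(
    Source="sender@example.com",        # must be a verified SES identity
    Destination={"ToAddresses": ["recipient@example.com"]},
    Message={
        "Subject": {"Data": "Your report is ready"},
        "Body": {"Text": {"Data": f"Download your report here (link valid 24h):\n{url}"}},
    },
)
```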
We are using Amazon S3 to allow users to download large zip files a limited number of times. We are searching for a better method of counting downloads than just counting button clicks.
Is there any way we can give our users a signed URL to temporarily download the file (like we are doing now) and check that token with Amazon to make sure the file was successfully downloaded?
Please let me know what you think
You could use Amazon S3 Server Access Logging:
In order to track requests for access to your bucket, you can enable access logging. Each access log record provides details about a single access request, such as the requester, bucket name, request time, request action, response status, and error code, if any.
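Once the logs are delivered, a small script can total the bytes-sent field for object downloads. A rough sketch, assuming the log files have been downloaded locally and following the documented field order of the S3 server access log format:

```python
import re
import sys

# Loose match for the S3 server access log format, capturing the fields
# we need: operation (group 7), HTTP status (group 10), bytes sent (group 12).
LOG_LINE = re.compile(
    r'^(\S+) (\S+) \[([^\]]+)\] (\S+) (\S+) (\S+) (\S+) (\S+) "([^"]*)" '
    r'(\S+) (\S+) (\S+)'
)

total = 0
for path in sys.argv[1:]:
    with open(path) as f:
        for line in f:
            m = LOG_LINE.match(line)
            if not m:
                continue
            operation, status, bytes_sent = m.group(7), m.group(10), m.group(12)
            # Count only successful object downloads; "-" means no bytes recorded.
            if operation == "REST.GET.OBJECT" and status == "200" and bytes_sent != "-":
                total += int(bytes_sent)

print(f"Total bytes downloaded: {total}")
```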
There is no automatic ability to limit the number of downloads via an Amazon S3 pre-signed URL.
A pre-signed URL limits access based upon time, but cannot limit based upon quantity.
The closest option would be to provide a very small time window for the pre-signed URL, with the assumption that only one download would happen within that time window.
Let's say that I want to create a simplistic version of Dropbox's website, where you can sign up and perform operations on files such as upload, download, delete, rename, etc. - pretty much like in this question. I want to use Amazon S3 for the storage of the files. This is all quite easy with the AWS SDK, except for one thing: security.
Obviously user A should not be allowed to access user B's files. I can kind of add "security through obscurity" by handling permissions in my application, but it is not good enough to have public files and rely on that, because then anyone with the right URL could access files that they should not be able to. Therefore I have searched and looked through the AWS documentation for a solution, but I have been unable to find a suitable one. The problem is that everything I could find relates to permissions based on AWS accounts, and it is not appropriate for me to create many thousand IAM users. I considered IAM users, bucket policies, S3 ACLs, pre-signed URLs, etc.
I could indeed solve this by authorizing everything in my application and setting permissions on my bucket so that only my application can access the objects, and then having users download files through my application. However, this would put increased load on my application, where I really want people to download the files directly through Amazon S3 to make use of its scalability.
Is there a way that I can do this? To clarify, I want to give a given user in my application access to only a subset of the objects in Amazon S3, without creating thousands of IAM users, which is not so scalable.
Have the users download the files with the help of your application, but not through your application.
Provide each link as a link that points to an endpoint of your application. When a request comes in, evaluate whether the user is authorized to download the file, using the user's session data.
If not, return an error response.
If so, pre-sign a download URL for the object with a very short expiration time (e.g. 5 seconds), redirect the user's browser with 302 Found, and set the signed URL in the Location: response header. As long as the download is started before the signed URL expires, it won't be interrupted if the URL expires while the download is already in progress.
If the connection to your app and the scheme of the signed URL are both HTTPS, this provides a substantial level of security against unauthorized downloads, at very low resource cost.
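A compact sketch of that pattern using Flask (the framework, route, and authorization check are illustrative assumptions, not part of the original setup):

```python
import boto3
from flask import Flask, abort, redirect, session

app = Flask(__name__)
app.secret_key = "replace-me"  # needed for session support
s3 = boto3.client("s3")
BUCKET = "user-files-bucket"   # placeholder

def user_may_download(user_id, key):
    # Stand-in for the real authorization check against your database.
    return key.startswith(f"files/{user_id}/")

@app.route("/download/<path:key>")
def download(key):
    user_id = session.get("user_id")
    if not user_id or not user_may_download(user_id, key):
        abort(403)  # not authorized for this object
    # Very short expiry: just enough time for the browser to follow the redirect.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=5,
    )
    return redirect(url, code=302)
```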