I'm using S3 to store a bunch of confidential files for clients. The bucket must not allow public access, and only authenticated users may access these files.
This is my current idea
I'm using Cognito to authenticate the user and allow them to access API Gateway. When they make a request to the path /files, it directs the request to a Lambda, which generates a signed URL for every file that the user has access to. API Gateway then returns the list of all these signed URLs and the browser displays them.
Gathering a signed URL for every file seems very inefficient. Is there any other way to get confidential files from S3 in one large batch?
A safer approach would be for your application to generate signed URLs, valid for a single request or time period, and to have your bucket accept only requests originating from CloudFront using an Origin Access Identity.
See the documentation for this at https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
You say "Gathering a signed URL for every file seems very inefficient", but creating the signed URL itself is very easy: just a few lines of code.
However, if there are many files, it would put a lot of work on your users to download each file individually.
Therefore, another approach could be:
Identify all the files they wish to download
Create a Zip of the files and store it in Amazon S3
Provide a Signed URL to the Zip file
Delete the Zip file later (since it is not required anymore), possibly by creating a lifecycle rule on a folder within the bucket
Please note that AWS Lambda functions have a /tmp disk storage limit of 512 MB by default, which might not be enough to create the Zip file.
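The zip-and-share steps above could be sketched roughly as follows. The function name, the bucket layout, and the `zips/` prefix are illustrative assumptions, and the S3 client is injected so the sketch works with any boto3-style client:

```python
import os
import zipfile

def bundle_objects(s3, bucket, keys, zip_path="/tmp/bundle.zip", zip_key="zips/bundle.zip"):
    """Download each requested object, bundle them into a Zip on local disk,
    and upload the Zip back to S3 under `zip_key`.

    `s3` is a boto3-style S3 client. Remember the Lambda /tmp size limit
    when sizing the bundle.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for key in keys:
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            zf.writestr(os.path.basename(key), body)
    with open(zip_path, "rb") as f:
        s3.put_object(Bucket=bucket, Key=zip_key, Body=f.read())
    # A lifecycle rule on the zips/ prefix can expire the bundle later;
    # the pre-signed URL you hand to the user points at `zip_key`.
    return zip_key
```

The single pre-signed URL for the Zip is then generated the same way as for any other object.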
Related
This question follows the same line of thought as Is it possible to give token access to link to amazon s3 storage?.
Basically, we are building an app where groups of users can save pictures, that should be visible only to their own group.
We are thinking of using either a folder per user group, or it could even be an independent S3 bucket per user group.
The rules are very simple:
Any member of Group A should be able to add a picture to the Group A folder (or bucket)
Any member of Group A should be able to read all pictures of the Group A folder (or bucket)
No one outside Group A should have access to any of the pictures
However, the solution used by the post mentioned above (temporary pre-signed URLs) is not usable, as we need the client to be able to both write files to their bucket and read files from their bucket, without having any access to any other bucket. The file-write part is the difficulty here and the reason why we cannot use pre-signed URLs.
Additionally, the solutions from various AWS security posts that we read (for example https://aws.amazon.com/blogs/security/writing-iam-policies-grant-access-to-user-specific-folders-in-an-amazon-s3-bucket/) do not apply, because they show how to control access for IAM groups or for other AWS accounts. In our case, a group of users does not have an IAM account...
The only solutions that we see so far are either insecure or wasteful:
Open buckets to everybody and rely on obfuscating the folder / bucket names (lots of security issues, including the ability to brute force and read / overwrite anybody's files)
Have a back-end that acts as a facade between the app and S3, validating the accesses. S3 has no public access, and the bucket is only opened to an IAM role that the back-end has. However, this is a big waste of bandwidth, since all the data would transit through the EC2 instance(s) of that back-end
Any better solution?
Is this kind of customized access doable with S3?
The correct way to achieve your goal is to use Amazon S3 pre-signed URLs, which are time-limited URLs that provide temporary access to a private object.
You can also upload objects this way; see Upload objects using presigned URLs in the Amazon Simple Storage Service documentation.
The flow is basically:
Users authenticate to your back-end app
When a user wants to access a private object, the back-end verifies that they are permitted to access the object (using your own business logic, such as the Groups you mention). If they are allowed to access the object, the back-end generates a pre-signed URL.
The pre-signed URL is returned to the user's browser, for example by putting it in an <img src="..."> tag.
When the user's browser requests the object, S3 verifies the signature in the pre-signed URL. If it is valid and the time period has not expired, S3 provides the requested object. (Otherwise, it returns Access Denied.)
A similar process is used when users upload objects:
Users authenticate to your back-end app
They request the opportunity to upload a file
Your back-end app generates an S3 Pre-signed URL that is included in the HTML page for upload
Your back-end should track the object in a database so it knows who performed the upload and who is permitted to access the object (eg particular users or groups)
Your back-end app is fully responsible for deciding whether particular users can upload/download objects. It then hands off the actual upload/download process to S3 via the pre-signed URLs. This reduces load on your server because all uploads/downloads go directly to/from S3.
My idea was (is) to create an S3 bucket for allowing users to upload binary objects. The next step would be to confirm the upload and the API will then initiate processing of the file.
To make it more secure the client would first request an upload location. The API then allocates and pre-creates a one-time use directory on S3 for this upload, and sets access policy on that directory to allow a file to be dumped in there (but ideally not be read or even overwritten).
After confirmation by the client the API initiates processing and clean-up.
The problem I'm facing is authentication and authorisation. Simplest would be to allow public write with difficult-to-guess bucket directories, eg
s3://bucket/year/month/day/UUID/UUID/filename
Where the date is added in to allow clean-up of orphaned files later (and should volume grow to require it, one can add hours/minutes).
The first UUID is not meaningful other than providing a unique upload location. The second identifies the user.
The entire path is created by the API. The API then allows the user access to write into that final directory. (The user should not be allowed to create this directory).
The question I'm stuck with is that, from googling, it seems that publicly writable S3 buckets are considered bad practice, even horribly so.
What alternative do I have?
a) provide the client with some kind of access token?
b) create an IAM account for every uploader (I do not want to be tied to Amazon this way)
c) Any other options?
P.S. And is it possible to control the actual file name that the client can use to create a file from the policy?
From what I understand, your goals are to:
Securely allow users to upload specific files to an S3 bucket
Limit access by preventing users from reading or writing other files
Ideally, upload the files directly to S3 without going through your server
You can do this by generating presigned PUT URLs server-side and returning those URLs to the client. The client can use those URLs to upload directly to S3. The client is limited to only the filename you specify when signing the URL, and to the PUT method only. You keep your AWS access keys secure on the server and never send them to the client.
If you are using the PutObject API, you only need to sign one URL per file. If you are using the multi-part upload API, it's a bit more complicated and you'll need to start and finish the upload server-side and send presigned UploadPart URLs to the client.
To make files in one S3 bucket accessible from a site hosted in another, it seems we either need to make the bucket public or enable a CORS configuration.
I have an HTML page in one public bucket which is hosted as a static website. In another bucket I have MP3 files; this bucket is not public. From the first bucket, the HTML invokes a script.js file that tries to access the MP3 files in the second bucket using the resource URL. This is not directly possible and gives a 403 error. Hence, I wrote a CORS configuration for bucket-2 with the ARN of the first bucket in the AllowedOrigin element. Still, the script was unable to access the MP3 files. I also tried using the static website URL instead of the ARN. Again I got a 403 error. Is it possible to enable script.js to access the MP3 files in bucket-2 without making bucket-2 public?
You have to understand that your JavaScript runs in the user's browser window, hence it is the browser trying to access the MP3 file in your second bucket, not the first bucket.
Knowing that, there is no easy solution to your problem besides opening access to the second bucket and using CORS as you tried (but CORS alone will not give access to a private bucket).
Proposal 1 : manually generated signatures
If you just want to give access to a couple of files in the second bucket (and not all files), I would recommend including in your JavaScript a fully signed URL to the object in the second bucket. Signed URLs allow access to individual objects in a non-public bucket, as per the S3 documentation. However, generating signatures is not trivial and requires a bit of code.
I wrote this command line utility to help you to generate a signature for a given object in a private bucket.
https://github.com/sebsto/s3sign
The AWS command line has also a presign option nowadays
https://docs.aws.amazon.com/cli/latest/reference/s3/presign.html
Also, signatures are time-bounded, and the maximum age is 7 days. So if you choose this approach, you will need to re-generate your links every week. This is not very scalable, but it is easy to automate.
Proposal 2 : dynamic signature generation on the web server
If you decide to move away from client-side JavaScript and use server-side generated pages instead (using Python, Ruby, PHP, etc., and a server), you can dynamically generate signatures from your server. The downside of this approach is that you will need a server.
Proposal 3 : dynamic signature generation, serverless
If you're familiar with AWS Lambda and API Gateway, you can create a serverless service that will dynamically return a signed URL to your MP3 file. Your static HTML page (or client side Javascript) will call the API Gateway URL, the API Gateway will call Lambda and Lambda, based on the path or query string, will return the appropriate signed URL for your MP3.
Proposals 2 and 3 have AWS costs associated with them (either to run an EC2 server, or for the API Gateway and Lambda execution time), so be sure to check AWS pricing before choosing an option. (Hint: Proposal 3 will be more cost-effective.)
The real question is WHY you want to create this. Why can't you have all your content in the same bucket, using fine-grained S3 access policies when required?
We are using Amazon S3 to allow users to download large zip files a limited number of times. We are searching for a better method of counting downloads than just counting button clicks.
Is there any way we can give our user a signed URL to temporarily download the file (like we are doing now) and check that token with Amazon to make sure the file was successfully downloaded?
Please let me know what you think
You could use Amazon S3 Server Access Logging:
In order to track requests for access to your bucket, you can enable access logging. Each access log record provides details about a single access request, such as the requester, bucket name, request time, request action, response status, and error code, if any.
There is no automatic ability to limit the number of downloads via an Amazon S3 pre-signed URL.
A pre-signed URL limits access based upon time, but cannot limit based upon quantity.
The closest option would be to provide a very small time window for the pre-signed URL, with the assumption that only one download would happen within that time window.
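Once access logging is enabled, a crude download counter could parse the delivered log objects. The parsing below is deliberately simplified (real access log lines contain quoted and bracketed fields), so treat it as an illustration rather than a full log parser:

```python
from collections import Counter

def count_downloads(log_lines):
    """Count GET-object requests per key from S3 server access log lines.

    Each log record lists the operation (REST.GET.OBJECT) followed by the
    object key; we count occurrences of that pair.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if "REST.GET.OBJECT" in parts:
            key = parts[parts.index("REST.GET.OBJECT") + 1]  # key follows the operation
            counts[key] += 1
    return counts
```

Note that log delivery is best-effort and can lag by hours, so this counts downloads after the fact rather than enforcing a limit.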
Let's say that I want to create a simplistic version of Dropbox' website, where you can sign up and perform operations on files such as upload, download, delete, rename, etc. - pretty much like in this question. I want to use Amazon S3 for the storage of the files. This is all quite easy with the AWS SDK, except for one thing: security.
Obviously user A should not be allowed to access user B's files. I can kind of add "security through obscurity" by handling permissions in my application, but it is not good enough to have public files and rely on that, because then anyone with the right URL could access files that they should not be able to. Therefore I have searched and looked through the AWS documentation for a solution, but I have been unable to find a suitable one. The problem is that everything I could find relates to permissions based on AWS accounts, and it is not appropriate for me to create many thousand IAM users. I considered IAM users, bucket policies, S3 ACLs, pre-signed URLs, etc.
I could indeed solve this by authorizing everything in my application and setting permissions on my bucket so that only my application can access the objects, and then having users download files through my application. However, this would put increased load on my application, where I really want people to download the files directly through Amazon S3 to make use of its scalability.
Is there a way that I can do this? To clarify, I want to give a given user in my application access to only a subset of the objects in Amazon S3, without creating thousands of IAM users, which is not so scalable.
Have the users download the files with the help of your application, but not through your application.
Provide each link as a link that points to an endpoint of your application. When each request comes in, evaluate whether the user is authorized to download the file, using the user's session data.
If not, return an error response.
If so, pre-sign a download URL for the object with a very short expiration time (e.g. 5 seconds), return 302 Found, and set the signed URL in the Location: response header. As long as the download starts before the signed URL expires, it won't be interrupted if the URL expires while the download is already in progress.
If the connection to your app and the scheme of the signed URL are both HTTPS, this provides a substantial level of security against unauthorized downloads, at very low resource cost.
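The redirect flow above can be sketched as a framework-agnostic function; `is_allowed` and `presign` are placeholder callables standing in for your session-based authorization check and your SDK's pre-signer (e.g. boto3's `generate_presigned_url`):

```python
def download_response(user, key, is_allowed, presign, expires=5):
    """Return (status, headers) for the redirect-to-S3 download endpoint.

    `is_allowed(user, key)` is your business logic; `presign(key, expires)`
    produces a short-lived signed URL. Both are injected placeholders.
    """
    if not is_allowed(user, key):
        return "403 Forbidden", {}
    # Short-lived signed URL; the 302 sends the browser straight to S3.
    return "302 Found", {"Location": presign(key, expires)}
```

Plugging this into any web framework is then just a matter of mapping the return value onto its response object.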