I have a requirement whereby a client will need to send files that will need to be saved to an S3 bucket. I will also need to parse the JSON and identify each file based on certain key-value pairs in order to save it into a specific folder/sub-folder.
Essentially I will need to expose the S3 bucket as an endpoint. I have read that it is possible to do so in VPC (https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-s3.html).
However, this seems to work only with IPv4, and most importantly I am struggling to see how I could add the "filtering logic" to parse the files and save them into the correct folders. Ultimately, my questions are:
Can I instead use API Gateway + a Lambda function to meet my requirements?
Is there any potential alternative approach?
Thanks
You can expose an S3 bucket for your client; it's possible to set it up in such a way that anyone would be able to upload files (of course, you can also set up an appropriate level of authentication).
Then, once an object is placed inside the bucket, S3 can be set up to trigger an AWS Lambda function, which will take that object, parse it and place it into the correct folder.
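A minimal sketch of what that Lambda handler could look like, assuming the routing value lives in a JSON field I'm calling document_type (a placeholder for whatever key-value pair you filter on), and that the S3 trigger is scoped to an incoming/ prefix so the copy does not re-trigger the function:

    import json
    import urllib.parse

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        """Route JSON files uploaded under incoming/ into a folder per document type."""
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

            # Read and parse the uploaded JSON document
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            document = json.loads(body)

            # "document_type" is a placeholder for the key-value pair you route on
            folder = document.get("document_type", "unclassified")
            filename = key.split("/")[-1]

            # Copy into the target folder, then remove the original upload
            s3.copy_object(
                Bucket=bucket,
                Key=f"{folder}/{filename}",
                CopySource={"Bucket": bucket, "Key": key},
            )
            s3.delete_object(Bucket=bucket, Key=key)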
I'm not sure if this is feasible. I'm looking for a way to limit the files that can be saved into an S3 bucket to a certain pattern (number.*, e.g. 2893.jpg or 18928.png). Can this be done through an IAM policy, or is there another way?
Thanks
There is no native way to do this. Permissions can be assigned to upload to a particular Prefix (folder), but there is no method for specifying permitted characters in a Key (filename).
You would likely want to implement a frontend that verifies filenames and allows upload via a pre-signed URL.
See: Uploading Objects Using Presigned URLs - Amazon S3
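If it helps, here is a minimal sketch of that approach with boto3: validate the filename server-side, then hand the client a presigned PUT URL for exactly that key. The bucket name, pattern and expiry below are placeholders:

    import re

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-upload-bucket"                    # placeholder bucket name
    FILENAME_PATTERN = re.compile(r"^\d+\.\w+$")   # e.g. 2893.jpg, 18928.png

    def presign_upload(filename: str, expires: int = 300) -> str:
        """Return a presigned PUT URL, but only for filenames matching the pattern."""
        if not FILENAME_PATTERN.match(filename):
            raise ValueError(f"filename {filename!r} does not match the allowed pattern")
        return s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": BUCKET, "Key": filename},
            ExpiresIn=expires,
        )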
I'm using S3 to store a bunch of confidential files for clients. The bucket cannot have public access and only authenticated users can access these files.
This is my current idea:
I'm using Cognito to authenticate the user and allow them to access API Gateway. When they make a request to the path /files, it directs the request to a Lambda, which generates a signed URL for every file that the user has access to. Then API Gateway returns the list of all these signed URLs and the browser displays them.
Gathering a signed URL for every file seems very inefficient. Is there any other way to get confidential files from S3 in one large batch?
A safer approach would be for your application to generate signed URLs, valid for a single request or period, and have your bucket accept only requests originating from CloudFront using an Origin Access Identity.
See the documentation for this at https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
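If you go down that road, a rough sketch of generating a CloudFront signed URL with botocore's CloudFrontSigner might look like the following; the key pair ID, private key path and distribution domain are all placeholders, and the private key must belong to a signing key registered with your distribution:

    from datetime import datetime, timedelta, timezone

    from botocore.signers import CloudFrontSigner
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    KEY_PAIR_ID = "K2JCJMDEHXQW5F"                   # placeholder CloudFront key ID
    PRIVATE_KEY_PATH = "cloudfront_private_key.pem"  # placeholder path

    def rsa_signer(message: bytes) -> bytes:
        """Sign the CloudFront policy with the private key of the key pair."""
        with open(PRIVATE_KEY_PATH, "rb") as f:
            private_key = serialization.load_pem_private_key(f.read(), password=None)
        return private_key.sign(message, padding.PKCS1v15(), hashes.SHA1())

    signer = CloudFrontSigner(KEY_PAIR_ID, rsa_signer)

    # URL of the object behind the CloudFront distribution (placeholder domain/key)
    url = signer.generate_presigned_url(
        "https://d111111abcdef8.cloudfront.net/clients/report.pdf",
        date_less_than=datetime.now(timezone.utc) + timedelta(minutes=15),
    )
    print(url)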
You say "Gathering a signed url for every file seems very inefficient", but the process of creating the Signed URL itself is very easy — just a few lines of code.
However, if there are many files, it would put a lot of work on your users to download each file individually.
Therefore, another approach could be:
Identify all the files they wish to download
Create a Zip of the files and store it in Amazon S3
Provide a Signed URL to the Zip file
Delete the Zip file later (since it is not required anymore), possibly by creating a lifecycle rule on a folder within the bucket
Please note that AWS Lambda functions only have 512 MB of temporary disk storage in /tmp by default, which might not be enough to create the Zip file.
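A sketch of that zip-and-share step, assuming it runs somewhere with enough temporary storage (a container, an EC2 instance, or a Lambda with increased /tmp storage); the bucket name and temporary prefix are placeholders:

    import os
    import tempfile
    import zipfile

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-confidential-bucket"   # placeholder bucket name

    def zip_and_presign(keys, expires=3600):
        """Download the requested objects, zip them, upload the archive to a
        temporary prefix, and return a presigned URL to the archive."""
        with tempfile.TemporaryDirectory() as tmp:
            archive_path = os.path.join(tmp, "bundle.zip")
            with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
                for key in keys:
                    local = os.path.join(tmp, os.path.basename(key))
                    s3.download_file(BUCKET, key, local)
                    archive.write(local, arcname=os.path.basename(key))

            # "tmp-zips/" is a placeholder prefix; attach a lifecycle rule to it
            # so the archives are deleted automatically after a day or two.
            zip_key = "tmp-zips/bundle.zip"
            s3.upload_file(archive_path, BUCKET, zip_key)

        return s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": BUCKET, "Key": zip_key},
            ExpiresIn=expires,
        )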
My idea was (is) to create an S3 bucket for allowing users to upload binary objects. The next step would be to confirm the upload and the API will then initiate processing of the file.
To make it more secure, the client would first request an upload location. The API then allocates and pre-creates a one-time-use directory on S3 for this upload, and sets an access policy on that directory to allow a file to be dumped in there (but ideally not to be read or even overwritten).
After confirmation by the client the API initiates processing and clean-up.
The problem I'm facing is authentication and authorisation. The simplest option would be to allow public writes with difficult-to-guess bucket directories, e.g.
s3://bucket/year/month/day/UUID/UUID/filename
Where the date is added in to allow clean-up later for orphaned files (and, should volume grow to require it, one can add hours/minutes).
The first UUID is not meaningful other than providing a unique upload location. The second identifies the user.
The entire path is created by the API. The API then allows the user access to write into that final directory. (The user should not be allowed to create this directory).
The question I'm stuck with is that, from googling, it seems that publicly writable S3 buckets are considered bad practice, even horribly so.
What alternative do I have?
a) provide the client with some kind of access token?
b) create an IAM account for every uploader (I do not want to be tied to Amazon this way)
c) Any other options?
P.S. Is it also possible to control, from the policy, the actual file name that the client can use to create a file?
From what I understand, your goals are to:
Securely allow users to upload specific files to an S3 bucket
Limit access by preventing users from reading or writing other files
Ideally, upload the files directly to S3 without going through your server
You can do this by generating presigned PUT URLs server-side and returning those URLs to the client. The client can use those URLs to upload directly to S3. The client is limited to only the filename you specify when signing the URL, and it will be limited to PUT only. You keep your AWS access keys secure on the server and never send them to the client.
If you are using the PutObject API, you only need to sign one URL per file. If you are using the multi-part upload API, it's a bit more complicated and you'll need to start and finish the upload server-side and send presigned UploadPart URLs to the client.
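A minimal sketch of the single-file (PutObject) case, with placeholder bucket and key names; the signing happens server-side and the client only ever sees the URL:

    import boto3
    import requests  # only used here to demonstrate the client-side upload

    s3 = boto3.client("s3")
    BUCKET = "client-uploads"            # placeholder bucket name

    # --- server side: sign a URL for one specific key, PUT only ---
    def create_upload_url(user_id: str, filename: str, expires: int = 600) -> str:
        key = f"uploads/{user_id}/{filename}"   # the client cannot change this key
        return s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": BUCKET, "Key": key},
            ExpiresIn=expires,
        )

    # --- client side: upload directly to S3 with the URL it received ---
    url = create_upload_url("user-123", "report.bin")
    with open("report.bin", "rb") as f:
        response = requests.put(url, data=f)
    response.raise_for_status()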
To make files in one S3 bucket accessible from a website hosted in another bucket, we either need to make the bucket public or enable a CORS configuration.
I have an HTML page in one public bucket which is hosted as a static website. In another bucket, I have MP3 files. This bucket is not public. From the first bucket, the HTML invokes a script.js file that tries to access the MP3 files in the second bucket using the resource URL. This is not directly possible and gives a 403 error. Hence, I wrote a CORS configuration for bucket-2 with the ARN of the first bucket in the AllowedOrigin element. Still, the script was unable to access the MP3 files. I also tried using the static website URL instead of the ARN. Again I got a 403 error. Is it possible to enable the script.js to access the MP3 files in bucket-2 without making bucket-2 public?
You have to understand that your JavaScript runs in the user's browser window; hence it is the browser trying to access the MP3 file in your second bucket, not the first bucket.
Knowing that, there is no easy solution to your problem, besides opening access to the second bucket and using CORS as you tried (but CORS alone will not give access to the private bucket).
Proposal 1: manually generated signatures
If you just want to give access to a couple of files in the second bucket (and not all files), I would recommend including in your JavaScript a fully signed URL to the object in the second bucket. Signed URLs allow access to individual objects in a non-public bucket, as per the S3 documentation. However, generating signatures is not trivial and requires a bit of code.
I wrote this command line utility to help you to generate a signature for a given object in a private bucket.
https://github.com/sebsto/s3sign
The AWS command line has also a presign option nowadays
https://docs.aws.amazon.com/cli/latest/reference/s3/presign.html
Also, signatures are time-bounded and the maximum age is 7 days. So if you choose this approach, you will need to re-generate your links every week. This is not very scalable but is easy to automate.
Proposal 2: dynamic signature generation on the web server
If you decide to move away from client-side JavaScript and use server-side generated pages instead (using Python, Ruby, PHP, etc., and a server), you can dynamically generate signatures from your server. The downside of this approach is that you will need a server.
Proposal 3: dynamic signature generation, serverless
If you're familiar with AWS Lambda and API Gateway, you can create a serverless service that will dynamically return a signed URL to your MP3 file. Your static HTML page (or client-side JavaScript) will call the API Gateway URL, API Gateway will call Lambda, and Lambda, based on the path or query string, will return the appropriate signed URL for your MP3.
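A sketch of what that Lambda function could look like behind an API Gateway proxy integration; the bucket name, query-string parameter and CORS header are placeholders to adapt:

    import json

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-private-mp3-bucket"     # placeholder bucket name

    def handler(event, context):
        """API Gateway (proxy integration) -> Lambda: return a short-lived
        signed URL for the MP3 named in the query string."""
        key = (event.get("queryStringParameters") or {}).get("key")
        if not key:
            return {"statusCode": 400, "body": json.dumps({"error": "missing key"})}

        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": BUCKET, "Key": key},
            ExpiresIn=300,   # 5 minutes is plenty for the browser to start playback
        )
        return {
            "statusCode": 200,
            "headers": {"Access-Control-Allow-Origin": "*"},  # tighten to your site
            "body": json.dumps({"url": url}),
        }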
Proposals 2 and 3 have AWS costs associated with them (either to run an EC2 server, or for the API Gateway and Lambda execution time), so be sure to check AWS pricing before choosing an option. (Hint: Proposal 3 will be more cost-effective.)
The real question is WHY you want to create this. Why can't you have all your public content in the same bucket, using fine-grained S3 access policies when required?
I have a file with a pre-signed URL.
I would like to upload that file directly to my S3 bucket without downloading it first (I know how to do it with the intermediate step, but I want to avoid it).
Any suggestion?
Thanks in advance
There is not a method supported by S3 that will accomplish what you are trying to do.
S3 does not support a request type that says, essentially, "go to this url and whatever you fetch from there, save it into my bucket under the following key."
The only option here is to fetch what you want, and then upload it. If the objects are large, and you don't want to dedicate the necessary disk space, you could fetch it in parts from the origin and upload it in parts using multipart upload... or if you are trying to save bandwidth somewhere, even the very small t1.micro instance located in the same region as the S3 bucket will likely give you very acceptable performance for doing the fetch and upload operation.
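For what it's worth, the fetch-and-upload can be done as a stream so nothing is written to disk; a sketch using the requests library and boto3's upload_fileobj (which switches to multipart upload for large bodies), with placeholder names:

    import boto3
    import requests

    s3 = boto3.client("s3")

    def transfer(presigned_source_url: str, bucket: str, key: str) -> None:
        """Stream the object from the presigned URL straight into S3.

        upload_fileobj reads the response in chunks, so the whole file
        never needs to sit on local disk.
        """
        with requests.get(presigned_source_url, stream=True) as response:
            response.raise_for_status()
            response.raw.decode_content = True   # transparently handle gzip, if any
            s3.upload_fileobj(response.raw, bucket, key)

    # Example usage (placeholder values)
    transfer("https://example.com/some-presigned-url", "my-target-bucket", "incoming/file.bin")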
The single exception to this is where you are copying an object from S3, to S3, and the object is under 5 GB in size. In this case, you send a PUT request to the target bucket, accompanied by:
x-amz-copy-source: /source_bucket/source_object_key
That's not quite a "URL" and I assume you do not mean copying from bucket to bucket where you own both buckets, or you would have asked this more directly... but this is the only thing S3 has that resembles the behavior you are looking for at all.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html
You can't use the signed URL here... the credentials you use to send the PUT request have to have permission to both fetch and store.
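For completeness, the bucket-to-bucket exception mentioned above looks like this in boto3 (placeholder bucket and key names); CopySource is what becomes the x-amz-copy-source header under the hood:

    import boto3

    s3 = boto3.client("s3")

    # Server-side copy: nothing is downloaded through your machine, but it only
    # works bucket-to-bucket with credentials that can read the source and
    # write the target, and objects over 5 GB need a multipart copy instead.
    s3.copy_object(
        Bucket="target-bucket",
        Key="copied/source_object_key",
        CopySource={"Bucket": "source_bucket", "Key": "source_object_key"},
    )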