Using API Gateway to download already gzipped files from S3

I have an API frontend to a few things, one of which is an S3 bucket containing lots of files.
When I set up a resource that integrates with my S3 bucket, it works perfectly fine for standard text data but fails for files that are already gzipped.
How do I tell API Gateway to just pass through the gzipped file as a binary stream?
I need to use API Gateway for authentication, so I can't get around it by serving my files straight from the S3 bucket.
I also need gzip encoding turned on for nearly every other endpoint, so turning it off would break endpoints that already work.

The "other" working content - are there only text files or as well binary content files?
You have to set some parameter when handling binary content with the default (not http-only) API Gaeway. See https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-payload-encodings.html (this is only my assumption about the problem as you still failed to provide any repeatable and validable example, how the API Gateways is providing the content? As Lambda? Code? )
As well there's 10MB limit for API Gateway payload. If you want to return longer content (actually - I'd use it for all content), the API Gateway can return a pre-signed URL for S3 (or web distribution) so the client could download the content of any length from the S3 bucket (directly or through the CloudFront).
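As a rough sketch of that pre-signed URL idea (the bucket and key names here are placeholders, not from the question), a Lambda behind API Gateway could generate the URL with boto3 like this:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket/key. The client fetches the returned URL directly from S3,
# which sidesteps both the binary-handling issue and the 10 MB payload limit.
url = s3.generate_presigned_url(
    "get_object",
    Params={
        "Bucket": "my-bucket",
        "Key": "reports/archive.csv.gz",
        # Optional: have S3 send Content-Encoding: gzip so browsers decompress it.
        "ResponseContentEncoding": "gzip",
    },
    ExpiresIn=300,  # 5 minutes
)
```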

Related

uploading images to S3 using SDK skipping cloudfront

Setup:
We are running an e-commerce website consisting of CloudFront --> ALB --> EC2. We serve the images from S3 via a CloudFront behaviour.
Issue:
Our admin URL is like example.com/admin. We upload product images via the admin panel as a zip file that goes through CloudFront. Each zip file is around 100 MB-150 MB and contains around 100 images. While uploading the zip file we get a 502 gateway error from CloudFront, because the upload takes more than 30 seconds, which is the default timeout value for CloudFront.
Expected solution:
Is there a way we can skip CloudFront for uploading images only?
Is there any alternative way to increase the timeout value for CloudFront?
Note: Any recommended solutions are highly appreciated.
CloudFront is a CDN service that speeds up your site by caching your static files at edge locations, so it won't help you on the uploading side.
In my opinion, for the image-upload feature you should use the AWS SDK to talk directly to S3.
If you want to upload files directly to S3 from the client, I can highly recommend using S3 presigned URLs.
You create an endpoint in your API that generates the presigned URL for a certain object (myUpload.zip), pass it back to the client, and use that URL to do the upload. It's safe, and you won't have to expose any credentials for uploading. Make sure to set the expiration time to something reasonable (e.g. one hour).
More on presigned URLs here: https://aws.amazon.com/blogs/developer/generate-presigned-url-modular-aws-sdk-javascript/
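A minimal server-side sketch of that flow, here in Python with boto3 (the bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket/key. The API returns this URL to the admin panel, which
# then PUTs the zip straight to S3, bypassing CloudFront/ALB/EC2 entirely.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "product-images-bucket", "Key": "uploads/myUpload.zip"},
    ExpiresIn=3600,  # one hour, as suggested above
)
# Client side: send the file bytes with an HTTP PUT to upload_url.
```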

How to download hundreds of confidential files from S3?

I'm using S3 to store a bunch of confidential files for clients. The bucket cannot have public access, and only authenticated users can access these files.
This is my current idea:
I'm using Cognito to authenticate the user and allow them to access API Gateway. When they make a request to the path /files, it directs the request to a Lambda, which generates a signed URL for every file the user has access to. API Gateway then returns the list of all these signed URLs and the browser displays them.
Gathering a signed URL for every file seems very inefficient. Is there any other way to get confidential files from S3 in one large batch?
A safer approach would be for your application to generate signed URLs, valid for a single request or period, and have your bucket accept only requests originating from CloudFront using an Origin Access Identity.
See the documentation for this at https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
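For illustration, a CloudFront signed URL can be generated with botocore's CloudFrontSigner. This is only a sketch, and it assumes you have already registered a public key with CloudFront (trusted key group or key pair); the key ID, key file path, and distribution domain below are placeholders:

```python
from datetime import datetime, timedelta

from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding


def rsa_signer(message):
    # Private key matching the public key registered with CloudFront (placeholder path).
    with open("cloudfront_private_key.pem", "rb") as f:
        key = serialization.load_pem_private_key(f.read(), password=None)
    # CloudFront signed URLs use an RSA SHA-1 signature.
    return key.sign(message, padding.PKCS1v15(), hashes.SHA1())


signer = CloudFrontSigner("KEYPAIRIDPLACEHOLDER", rsa_signer)
signed_url = signer.generate_presigned_url(
    "https://d111111abcdef8.cloudfront.net/files/report.pdf",  # placeholder URL
    date_less_than=datetime.utcnow() + timedelta(minutes=15),
)
```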
You say "Gathering a signed url for every file seems very inefficient", but the process of creating the Signed URL itself is very easy — just a few lines of code.
However, if there are many files, it would put a lot of work on your users to download each file individually.
Therefore, another approach could be:
Identify all the files they wish to download
Create a Zip of the files and store it in Amazon S3
Provide a Signed URL to the Zip file
Delete the Zip file later (since it is not required anymore), possibly by creating a lifecycle rule on a folder within the bucket
Please note that AWS Lambda functions have 512 MB of /tmp disk storage by default (larger ephemeral storage can now be configured), which might not be enough to create the Zip file. A rough sketch of the zip-and-sign steps above follows.
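A minimal version, assuming the zip is small enough to build in memory (the bucket name and prefix are placeholders, and Lambda memory limits still apply):

```python
import io
import uuid
import zipfile

import boto3

s3 = boto3.client("s3")
BUCKET = "my-confidential-bucket"  # placeholder


def create_zip_and_sign(keys):
    """Zip the requested objects in memory and return a short-lived signed URL."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for key in keys:
            obj = s3.get_object(Bucket=BUCKET, Key=key)
            zf.writestr(key, obj["Body"].read())
    buf.seek(0)

    # The 'zips/' prefix can carry a lifecycle rule that expires objects after a day.
    zip_key = f"zips/{uuid.uuid4()}.zip"
    s3.upload_fileobj(buf, BUCKET, zip_key)

    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": zip_key},
        ExpiresIn=3600,  # one hour
    )
```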

Using S3 for uploads but not allowing public access

My idea was (is) to create an S3 bucket for allowing users to upload binary objects. The next step would be to confirm the upload and the API will then initiate processing of the file.
To make it more secure the client would first request an upload location. The API then allocates and pre-creates a one-time use directory on S3 for this upload, and sets access policy on that directory to allow a file to be dumped in there (but ideally not be read or even overwritten).
After confirmation by the client the API initiates processing and clean-up.
The problem I'm facing is authentication and authorisation. Simplest would be to allow public write with difficult-to-guess bucket directories, eg
s3://bucket/year/month/day/UUID/UUID/filename
Where the date is added in to allow clean-up of orphaned files later (and, should volume grow to require it, one can add hours/minutes).
The first UUID is not meaningful other than providing a unique upload location. The second identifies the user.
The entire path is created by the API. The API then allows the user access to write into that final directory. (The user should not be allowed to create this directory).
The question I'm stuck on is that, from googling, it seems that publicly writable S3 buckets are considered bad practice, even horribly so.
What alternative do I have?
a) provide the client with some kind of access token?
b) create an IAM account for every uploader (I do not want to be tied to Amazon this way)
c) Any other options?
P.S. And is it possible to control, via the policy, the actual file name that the client can use to create a file?
From what I understand, your goals are to:
Securely allow users to upload specific files to an S3 bucket
Limit access by preventing users from reading or writing other files
Ideally, upload the files directly to S3 without going through your server
You can do this by generating presigned PUT URLs server-side and returning those URLs to the client. The client can use those URLs to upload directly to S3. The client is limited to only the filename you specify when signing the URL, and to the PUT method only. You keep your AWS access keys secure on the server and never send them to the client.
If you are using the PutObject API, you only need to sign one URL per file. If you are using the multi-part upload API, it's a bit more complicated and you'll need to start and finish the upload server-side and send presigned UploadPart URLs to the client.
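To illustrate the multipart variant, here is a rough sketch in Python with boto3; bucket, key, part count, and ETag values are placeholders for whatever your API and client exchange:

```python
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "upload-bucket", "2024/05/01/uuid1/uuid2/upload.bin"  # placeholders

# 1. Server starts the multipart upload and keeps the UploadId.
upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)["UploadId"]

# 2. Server hands the client one presigned URL per part.
part_urls = [
    s3.generate_presigned_url(
        "upload_part",
        Params={"Bucket": BUCKET, "Key": KEY, "UploadId": upload_id, "PartNumber": n},
        ExpiresIn=3600,
    )
    for n in range(1, 4)  # e.g. three parts
]

# 3. Client PUTs each part to its URL and reports back the returned ETag headers.
client_etags = ["etag-part-1", "etag-part-2", "etag-part-3"]  # placeholders
s3.complete_multipart_upload(
    Bucket=BUCKET,
    Key=KEY,
    UploadId=upload_id,
    MultipartUpload={
        "Parts": [{"ETag": e, "PartNumber": n} for n, e in enumerate(client_etags, 1)]
    },
)
```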

How to access files from one S3 bucket into another bucket without making them public?

To make files in one S3 bucket accessible from another bucket's static website, we apparently either need to make the bucket public or enable a CORS configuration.
I have an HTML page in one public bucket which is hosted as a static website. In another bucket, I have MP3 files. This bucket is not public. From the first bucket, the HTML invokes a script.js file that tries to access the MP3 files in the second bucket using the resource URL. This is not directly possible and gives a 403 error. Hence, I wrote a CORS configuration for bucket-2 with the ARN of the first bucket as the allowed origin. Still, the script was unable to access the MP3 files. I also tried using the static website URL instead of the ARN. Again I got a 403 error. Is it possible to enable script.js to access the MP3 files in bucket-2 without making bucket-2 public?
You have to understand that your JavaScript runs in the customer's browser window, so it is the browser trying to access the MP3 file in your second bucket, not the first bucket.
Knowing that, there is no easy solution to your problem besides opening access to the second bucket and using CORS as you tried (but CORS alone will not give access to the private bucket).
Proposal 1: manually generated signatures
If you just want to give access to a couple of files in the second bucket (and not all files), I would recommend including in your JavaScript a fully signed URL to the object in the second bucket. Signed URLs allow access to individual objects in a non-public bucket, as per the S3 documentation. However, generating signatures is not trivial and requires a bit of code.
I wrote this command line utility to help you to generate a signature for a given object in a private bucket.
https://github.com/sebsto/s3sign
The AWS command line has also a presign option nowadays
https://docs.aws.amazon.com/cli/latest/reference/s3/presign.html
Also, signatures are time-bounded and the maximum age is 7 days. So if you choose this approach, you will need to re-generate your links every week. This is not very scalable, but it is easy to automate.
Proposal 2: dynamic signature generation on the web server
If you decide to move away from client-side JavaScript and use server-side generated pages instead (using Python, Ruby, PHP, etc., and a server), you can dynamically generate signatures from your server. The downside of this approach is that you will need a server.
Proposal 3: dynamic signature generation, serverless
If you're familiar with AWS Lambda and API Gateway, you can create a serverless service that will dynamically return a signed URL to your MP3 file. Your static HTML page (or client side Javascript) will call the API Gateway URL, the API Gateway will call Lambda and Lambda, based on the path or query string, will return the appropriate signed URL for your MP3.
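As a sketch only (the bucket name, path layout, and CORS header are assumptions, not from the question), the Lambda behind API Gateway could look roughly like this with a proxy integration:

```python
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "bucket-2-mp3-files"  # placeholder name for the private bucket


def handler(event, context):
    """API Gateway proxy handler: GET /tracks/{key} -> JSON with a signed S3 URL."""
    key = event["pathParameters"]["key"]  # e.g. "song.mp3"
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=900,  # 15 minutes
    )
    return {
        "statusCode": 200,
        "headers": {"Access-Control-Allow-Origin": "*"},  # so the static site's JS can call it
        "body": json.dumps({"url": url}),
    }
```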
Proposals 2 and 3 have AWS costs associated with them (either to run an EC2 server, or for the API Gateway and Lambda execution time), so be sure to check AWS pricing before choosing an option. (Hint: Proposal 3 will be more cost-effective.)
The real question is WHY you want to build it this way. Why can't you have all your public content in the same bucket, using fine-grained S3 access policies when required?

Best choice of uploading files into S3 bucket

I have to upload video files into an S3 bucket from my React web application. I am currently developing a simple React application, and from this application I am trying to upload video files into an S3 bucket. I have considered two approaches for implementing the uploading part:
1) Amazon EC2 instance: From the front-end I hit the API, and the server runs on an Amazon EC2 instance, so I can upload the files into the S3 bucket from the EC2 instance.
2) Amazon API Gateway + Lambda: I send the local files directly into an S3 bucket through API Gateway + a Lambda function by calling the HTTPS URL with the data.
But I am not happy with either method, because both are quite costly. I have to upload files of more than 200 MB into an S3 bucket, and I don't know how I can optimize this uploading process. The video-uploading part is essential for my application, so I have to be careful with it and keep it performant and cost-effective.
If someone knows a solution, please share it; it will be very helpful for me to continue.
Thanks in advance.
You can upload files directly from your React app to S3 using the AWS JavaScript SDK and Cognito identity pools. For the optimization part, you can use the AWS multipart upload capability to upload a file in multiple parts. I'm providing links to read about it further:
AWS javascript upload image example
cognito identity pools
multipart upload to S3
Also take a look at the managed upload utility in the AWS JavaScript SDK:
aws managed upload javascript
In order to bypass EC2, you can use a pre-authenticated (presigned) POST request to upload your content directly from the browser to the S3 bucket.
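A rough sketch of generating such a presigned POST server-side with boto3 (bucket, key, and size limit are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket/key. The browser POSTs the file as multipart/form-data to
# post["url"], including post["fields"] as hidden form fields; the policy below
# caps the upload size.
post = s3.generate_presigned_post(
    Bucket="video-uploads-bucket",
    Key="uploads/video.mp4",
    Conditions=[["content-length-range", 0, 300 * 1024 * 1024]],  # up to ~300 MB
    ExpiresIn=3600,
)
```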