Can't upload a 5 MB - 10 MB image file to Lambda through API Gateway

I am trying to upload a 5 MB image to AWS Lambda through API Gateway.
I need to pass the file content as binary or a buffer without any conversion, but API Gateway converts the input to base64
by default, and the converted base64 text is 7 MB. Because the data size grows after base64 conversion, Lambda rejects the payload.
How can I prevent this automatic base64 conversion in API Gateway?
On the AWS forums, most answers suggested uploading the file to an S3 bucket and using that in Lambda. But in my case I need to pass it directly to Lambda without the help of S3. I have been at this for some weeks now... any help or insight is appreciated.

As documented, the maximum payload size for synchronous invocation (as from API Gateway) is 6 MB.
That means that, if you have larger payloads, you will need to break them up into multiple requests and combine those requests for processing, which in turn means you need some form of storage to hold the pieces and a way to link them together.
If you need to upload a larger payload in a single request, and can't use an alternative such as uploading to S3 first, then Lambda isn't for you.
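If uploading to S3 first does become acceptable, a common workaround is to have a small Lambda hand out a presigned URL so the client PUTs the image straight to S3 and only a key travels through API Gateway. A minimal sketch with boto3 (the bucket name, key layout, and query parameter are placeholders, and it assumes a Lambda proxy integration):

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-upload-bucket"  # placeholder bucket name

def lambda_handler(event, context):
    """Return a presigned PUT URL so the client uploads the image
    directly to S3 instead of pushing bytes through API Gateway."""
    # Assumes a proxy integration passing ?filename=... on the request.
    key = f"uploads/{event['queryStringParameters']['filename']}"
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=300,  # URL valid for 5 minutes
    )
    return {"statusCode": 200, "body": json.dumps({"uploadUrl": url, "key": key})}
```

The client then does a plain HTTP PUT of the raw bytes to the returned URL, so no base64 conversion and no 6 MB invocation limit is involved.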

Related

AWS size limit on lambda output to be written to s3

I have to create a Lambda that processes some payload and creates an output that is greater than the 6 MB limit for the response payload.
A way to solve this problem, mentioned in various SO answers, is to put the file directly on S3.
But what these answers fail to mention is the upper limit on the output that the Lambda can save to S3. Is that because there isn't any limit?
I just want to confirm this before moving forward.
There are always limits. So yes, there is also a limit on object size in an S3 bucket. But before you hit that limit, you are going to hit other limits.
Here is the limit for uploading files using the API:
Using the multipart upload API, you can upload a single large object, up to 5 TB in size.
(Source)
But you are probably not going to be able to achieve this with a Lambda, since Lambdas have a maximum running time of 900 seconds. So even if you could upload a file at 1 GB/s, you would only be able to upload 900 GB before the Lambda stops.
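As an illustration of the put-it-on-S3 approach, the Lambda can write its oversized result to S3 and return only the object key; boto3's upload_fileobj switches to multipart upload under the hood for large bodies, so the practical ceilings are the 5 TB object limit and the function timeout. A rough sketch (the bucket name and the generate_report helper are made up):

```python
import io
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-output-bucket"  # placeholder bucket name

def lambda_handler(event, context):
    # Hypothetical helper producing the large (>6 MB) result as bytes.
    big_output = generate_report(event)

    key = f"results/{context.aws_request_id}.bin"
    # upload_fileobj handles multipart upload automatically for large bodies,
    # so the only practical limits are the 5 TB object size and the
    # Lambda timeout (up to 900 seconds).
    s3.upload_fileobj(io.BytesIO(big_output), BUCKET, key)

    # Return a pointer to the result instead of the payload itself.
    return {"statusCode": 200, "body": json.dumps({"resultKey": key})}
```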

Saving mp3 tag data to AWS DynamoDB via API Gateway

I tried to save MP3 tag metadata to DynamoDB via API Gateway with a Lambda proxy, but that fails on certain files with:
PayloadTooLargeError: request entity too large
The main culprit was the (often present) picture param, which includes a buffer array whose size/length varies depending on the album art.
What I wound up doing was converting the buffer array into a data URL, storing that in S3, and referencing it in DynamoDB. This works, but it results in a lot more API calls and more complexity than just storing the buffer array (converted to base64) in DynamoDB directly.
Has anyone successfully and consistently stored MP3 tag data, including cover art, in DynamoDB via API Gateway, and, if so, how? Or is using S3 the only way to fly with this?
Your question isn't really specific to MP3; it applies to any large data you want to pass through API Gateway.
API Gateway has a limit of 10MB for the payload size and there is no way of circumventing this limitation.
Even if you were able to pass the images through API Gateway, you wouldn't be able to store them in DynamoDB, as each item there has a size limit of 400 KB.
Unless you're open to scaling the images down to <400KB before sending the request, I'm afraid your current solution with S3 to store the images is the best you can do.
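As a rough sketch of that S3-plus-reference approach (the table name, bucket name, and attribute names are made up), the cover art goes to S3 and DynamoDB only stores a pointer along with the small text tags:

```python
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Tracks")        # hypothetical table name
BUCKET = "my-cover-art-bucket"          # hypothetical bucket name

def save_track(track_id, tags, cover_art_bytes):
    """Store the (potentially large) cover art in S3 and keep only a
    reference plus the small text tags in DynamoDB."""
    cover_key = f"covers/{track_id}.jpg"
    s3.put_object(Bucket=BUCKET, Key=cover_key, Body=cover_art_bytes,
                  ContentType="image/jpeg")

    table.put_item(Item={
        "trackId": track_id,
        "title": tags.get("title"),
        "artist": tags.get("artist"),
        "album": tags.get("album"),
        "coverArtKey": cover_key,  # pointer instead of a 400 KB+ blob
    })
```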

Extract .gz files in S3 automatically

I'm trying to find a solution to extract ALB log files in .gz format automatically when they're delivered from the ALB to S3.
My bucket structure is like this:
/log-bucket
..alb-1/AWSLogs/account-number/elasticloadbalancing/ap-northeast-1/2018/log.gz
..alb-2/AWSLogs/account-number/elasticloadbalancing/ap-northeast-1/2018/log.gz
..alb-3/AWSLogs/account-number/elasticloadbalancing/ap-northeast-1/2018/log.gz
Basically, every 5 minutes, each ALB automatically pushes logs to the corresponding S3 prefix. I'd like to extract new .gz files right at that time, in the same bucket.
Is there any way to handle this?
I noticed that we can use a Lambda function, but I'm not sure where to start. Sample code would be greatly appreciated!
Your best choice would probably be to have an AWS Lambda function subscribed to S3 events. Whenever a new object gets created, this Lambda function would be triggered. The Lambda function could then read the file from S3, extract it, write the extracted data back to S3 and delete the original one.
How that works is described in Using AWS Lambda with Amazon S3.
That said, you might also want to reconsider if you really need to store uncompressed logs in S3. Compressed files are not only cheaper, as they don't take as much storage space as uncompressed ones, but they are usually also faster to process, as the bottleneck in most cases is network bandwidth for transferring the data and not available CPU-resources for decompression. Most tools also support working directly with compressed files. Take Amazon Athena (Compression Formats) or Amazon EMR (How to Process Compressed Files) for example.
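If you do want the extracted files, here is a minimal sketch of such a handler in Python, assuming the function is subscribed to s3:ObjectCreated events filtered on the .gz suffix (the output naming is just an example):

```python
import gzip
import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    """Triggered by s3:ObjectCreated events for *.gz keys:
    download, decompress, and write the plain log back to the bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if not key.endswith(".gz"):
            continue  # guard against re-processing already extracted objects

        # Read and decompress the log file.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        extracted = gzip.decompress(body)

        # Write the extracted log next to the original, minus the .gz suffix.
        target_key = key[: -len(".gz")]
        s3.put_object(Bucket=bucket, Key=target_key, Body=extracted)

        # Optionally remove the original .gz once it has been extracted.
        s3.delete_object(Bucket=bucket, Key=key)
```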

passing additional values to s3 event notification for lambda consumption

I have to write code in react-native that allows a user to upload videos to Amazon S3 to be transcoded for consumption by various devices. For the processing after the upload occurs, I am reviewing two approaches:
1) I can use Lambda with FFmpeg to handle the transcoding immediately after the upload occurs (my fear here is the amount of time required to transcode the videos and the effect on pricing if it takes a considerable amount of time).
2) I can have S3 send an SNS message to a REST API after the object-created event occurs, and have the REST API generate a RabbitMQ message that will be processed by a worker that performs the transcoding using FFmpeg.
Option 1 seems preferable from a completion-time perspective. How concerned should I be about using option 1, considering how long video transcoding might take, as opposed to option 2?
Also, regardless of the approach, I need a way to pass additional parameters to Lambda or along with the SNS message that would allow me to associate the uploaded video with the user's account. Is there a way to pass additional text-based values to S3 that get passed along to Lambda or SNS when the upload completes? As a caveat, I plan to upload the video directly to S3 using the REST layer (found this here: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html#RESTObjectPUT-responses-examples).
AWS provides a video transcoding service for exactly this type of thing. If you don't want to do that for some reason then you need to make sure you can complete your transcoding tasks in AWS Lambda in under 5 minutes. Not sure where the second option of using RabbitMQ and workers is coming from. Why RabbitMQ instead of SQS? Would workers be processes on EC2 servers instead of Lambda functions?
Regarding your other question, you need to pass those extra parameters as metadata fields on the S3 object. In the document you linked, look at how x-amz-meta- works. Then when you later retrieve the object from S3 to transcode it you can retrieve the metadata fields at the same time.
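As a rough illustration (sketched in Python with boto3; the bucket, key layout, and metadata field name are made up), the uploader attaches the account id as object metadata and the Lambda reads it back when the S3 event fires:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-video-bucket"  # placeholder bucket name

def upload_video(path, user_id):
    """Attach the uploader's account id as x-amz-meta-user-id metadata."""
    with open(path, "rb") as f:
        s3.put_object(
            Bucket=BUCKET,
            Key=f"uploads/{user_id}/{path.rsplit('/', 1)[-1]}",
            Body=f,
            Metadata={"user-id": user_id},  # sent as the x-amz-meta-user-id header
        )

def lambda_handler(event, context):
    """Triggered by the S3 object-created event; recover the user id."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    head = s3.head_object(Bucket=bucket, Key=key)
    user_id = head["Metadata"].get("user-id")  # metadata keys are lower-cased by S3
    # ... kick off transcoding for this user's video ...
```

Encoding the user id in the key prefix, as the upload sketch does, also works and is visible in the event record itself without an extra HEAD request.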

AWS S3 TransferUtility.UploadDirectoryRequest and HTTP PUT Limit

Here is a little background. I have designed a WEB API which provides methods for cloud operations (upload, download, ...). This API internally calls AWS API methods to fulfill these cloud operation requests.
WebDev solution --> WEB API --> AWS API
I am trying to upload a directory to the AWS S3 cloud. This directory has large individual files, more than 5 GB each. Amazon S3 has a limit of 5 GB for a single PUT operation, but it provides a multipart upload mechanism with which files up to 5 TB can be uploaded.
The AWS documentation says the TransferUtility.UploadDirectory method will use the multipart upload mechanism to upload large files. So in my WEB API, a [PUT] UploadDirectoryMethod method calls TransferUtility.UploadDirectory to upload the directory.
I am receiving the error "Amazon.S3.AmazonS3Exception: Your proposed upload exceeds the maximum allowed size".
Shouldn't TransferUtility.UploadDirectory take care of breaking the larger objects (larger than 5 GB) into parts and then uploading them with multiple PUT operations?
How does multipart upload work internally for objects more than 5 GB in size? Does it create multiple PUT requests internally?
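To illustrate what multipart upload does under the hood (sketched here with Python's boto3 rather than the .NET TransferUtility; the bucket, key, and part size are arbitrary), the upload is initiated once, each part goes out as its own request, and the upload is completed with the collected part ETags:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"            # placeholder bucket name
KEY = "large-file.bin"          # placeholder object key
PART_SIZE = 100 * 1024 * 1024   # 100 MB per part (each part except the last must be >= 5 MB)

def multipart_upload(path):
    """Low-level multipart upload: one initiate call, one request per part,
    one complete call. High-level helpers (TransferUtility, boto3's
    upload_file) do essentially this for large objects."""
    upload = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
    parts = []
    with open(path, "rb") as f:
        part_number = 1
        while True:
            chunk = f.read(PART_SIZE)
            if not chunk:
                break
            resp = s3.upload_part(
                Bucket=BUCKET, Key=KEY, PartNumber=part_number,
                UploadId=upload["UploadId"], Body=chunk,
            )
            parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
            part_number += 1
    s3.complete_multipart_upload(
        Bucket=BUCKET, Key=KEY, UploadId=upload["UploadId"],
        MultipartUpload={"Parts": parts},
    )
```

Each part (except the last) must be between 5 MB and 5 GB, and an upload can have at most 10,000 parts, so the parts themselves never hit the single-PUT size limit.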