AWS Lambda function, API gateway and ffmpeg timeout issue - amazon-web-services

I have created a Lambda function that extracts the audio stream from a video file using ffmpeg. I have also configured API Gateway as a trigger, and I pass the file to the Lambda function in the request body.
The Lambda function works perfectly well with small files, but bigger files need a bit more time and I then run into the API Gateway timeout, which as far as I understand is capped at 29 seconds.
So when I trigger audio extraction from a bigger file, I hit this timeout and my API request returns no result, even though the transcoding keeps running in the background and the audio is extracted. I was wondering what the best approach is for cases where the execution of the Lambda function takes longer.
I was thinking of starting the transcoding in the background and simply returning a JSON message saying that the transcoding might take a couple of minutes, depending on the input file duration. But if I try to push ffmpeg to the background, I get an error that the destination file doesn't exist.
os.system(f"{ffmpeg} -loglevel panic -nostdin -i {in_video} -vn -c:a aac -ar 48000 -b:a 192K {out_audio} 2> /dev/null &")
This is the ffmpeg command extracting the audio and transcoding it to AAC.
If I remove the 2> /dev/null & part of the command, it runs just fine, but if I keep it, I get an error:
"errorMessage": "[Errno 2] No such file or directory: 'output_audio.aac'"
"errorType": "FileNotFoundError"
So I was wondering what the preferred way is to run processes in the background.

There are several options that can be considered.
But first, since you already have the whole flow working with Lambda behind API Gateway, you can use a Lambda function URL.
Function URLs are a good way to trigger a Lambda function via HTTPS. They support multiple authorization mechanisms, such as IAM.
The interesting point is the timeout. When using a function URL, the maximum timeout you can have is 15 minutes, which is definitely better than the 29 seconds you get with API Gateway.
Function URLs are free of charge and can be enabled on an existing Lambda function.
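As a rough sketch, a function URL can be added to an existing function with boto3; the function name below is just a placeholder:

import boto3

lambda_client = boto3.client("lambda")

# AuthType can be NONE (public) or AWS_IAM; the function name is a placeholder.
response = lambda_client.create_function_url_config(
    FunctionName="extract-audio",
    AuthType="AWS_IAM",
)

# The returned FunctionUrl is the HTTPS endpoint to call instead of API Gateway.
print(response["FunctionUrl"])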
Increasing the timeout might just push the problem back until you have a very big file to convert. In the long run it may be worth exploring other solutions, such as uploading the file to S3 and using AWS Batch, or spinning up an EC2 instance to process the file. That would require more architecture design and implementation, though.

For longer processing, it is recommended to use asynchronous invocations, where the Lambda function is triggered, runs to completion, and does not block the caller. One option would be to upload the file to S3, configure the Lambda function to react to the S3 event, download the file from S3, process it, and upload the result to another S3 bucket when processing completes.
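A minimal sketch of such an S3-triggered handler, assuming ffmpeg is available in the deployment package or a layer, and with the output bucket name as a placeholder:

import os
import subprocess
import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = "my-audio-output-bucket"  # placeholder

def handler(event, context):
    # Triggered by the S3 "object created" event for the uploaded video.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    in_video = f"/tmp/{os.path.basename(key)}"
    out_audio = "/tmp/output_audio.aac"
    s3.download_file(bucket, key, in_video)

    # Run ffmpeg in the foreground and wait for it to finish; backgrounding it
    # with "&" lets the handler return before the output file exists.
    subprocess.run(
        ["ffmpeg", "-loglevel", "panic", "-nostdin", "-i", in_video,
         "-vn", "-c:a", "aac", "-ar", "48000", "-b:a", "192K", out_audio],
        check=True,
    )

    s3.upload_file(out_audio, OUTPUT_BUCKET, key.rsplit(".", 1)[0] + ".aac")

Since nothing is waiting on an HTTP response here, only the Lambda timeout (up to 15 minutes) applies.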

Related

LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413

I have a node/express + serverless backend API which I deploy as a Lambda function.
When I call an API, the request goes through API Gateway to Lambda; Lambda connects to S3, reads a large bin file, parses it, and generates a JSON object as output.
The response JSON object size is around 8.55 MB (I verified using Postman, running the node/express code locally). The size can vary with the bin file size.
When I make an API request, it fails with the following message in CloudWatch:
LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413
I can't/don't want to change this pipeline: HTTP API Gateway + Lambda + S3.
What should I do to resolve the issue?
AWS Lambda functions have hard limits on the sizes of the request and response payloads. These limits cannot be increased.
The limits are:
6 MB for synchronous invocations
256 KB for asynchronous invocations
You can find additional information in the official documentation here:
https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html
You have a few possible solutions:
use EC2 or ECS/Fargate
use the Lambda to parse and transform the bin file into the desired JSON, save this JSON directly to a public S3 bucket, and return the public URL/URI/file name of the created JSON to the client in the Lambda response.
For the last solution, if you don't want to make the JSON file visible to the whole world, you might consider using AWS Amplify in your client and/or AWS Cognito so that only an authorized user has access to the file they have just created.
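A minimal sketch of that second option; the bucket name and parsing step are placeholders, and a presigned URL is used here as one way to avoid making the object fully public:

import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-results-bucket"  # placeholder

def handler(event, context):
    result = {"items": []}  # placeholder for the parsed bin file contents
    key = f"results/{context.aws_request_id}.json"

    s3.put_object(
        Bucket=BUCKET, Key=key,
        Body=json.dumps(result), ContentType="application/json",
    )

    # Return only a short-lived download link, which stays well under the 6 MB limit.
    url = s3.generate_presigned_url(
        "get_object", Params={"Bucket": BUCKET, "Key": key}, ExpiresIn=3600
    )
    return {"statusCode": 200, "body": json.dumps({"resultUrl": url})}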
As noted in other questions, API Gateway/Lambda has limits on response sizes. From the discussion I read that latency is an additional concern.
With these two requirements Lambda is mostly out of the question, as functions need some time to start up (which can be reduced with provisioned concurrency) and only have normal network connections (whereas EC2/EKS can have enhanced networking).
With these requirements it would be better (from an AWS point of view) to move away from Lambda.
Looking further we could also question the application itself:
Large JSON objects need to be generated on demand. Why can't these be pre-generated asynchronously and then downloaded directly from S3? That would give you the best latency and speed, and can be coupled with CloudFront.
Why does the JSON need to be so large? Large JSONs also need to be parsed on the client side, requiring more CPU. Maybe it can be split and/or compressed?
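For the compression idea, here is a rough sketch of returning a gzip-compressed body through a Lambda proxy integration; whether API Gateway passes it through unchanged depends on how binary/content handling is configured for the API:

import base64
import gzip
import json

def handler(event, context):
    payload = {"items": list(range(100000))}  # placeholder for the large JSON

    compressed = gzip.compress(json.dumps(payload).encode("utf-8"))

    # The client decompresses transparently thanks to Content-Encoding, and the
    # payload limit now applies to the compressed, base64-encoded body rather
    # than to the raw JSON.
    return {
        "statusCode": 200,
        "isBase64Encoded": True,
        "headers": {
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
        },
        "body": base64.b64encode(compressed).decode("ascii"),
    }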

AWS Lambda Payload Response

I'm working on an application; our back end is written in .NET Web API Core and our front end in React. I created an endpoint which returns a JSON list, and the size of that list is almost 83 MB. When I deploy my back end to AWS Lambda and call the endpoint, it gives me an error (Error converting the response object of type Amazon.Lambda.APIGatewayEvents.APIGatewayProxyResponse from the Lambda function to JSON: Unable to expand the length of this stream beyond its capacity.: JsonSerializerException). I have already checked that the Lambda payload response limit is 6 MB (storing the data in S3 from the Lambda endpoint and then fetching it from the front end will not work for me), so is there any way I can get that much data through Lambda?
Not with a single call, sorry, you cannot. As described in this link, the payload limit for a synchronous call is, as you say, 6 MB. Asynchronous calls have an even lower limit.
I'd suggest you modify your UI/API to narrow your results first, or trap this error and alert the user (or calling service) that the payload is too large and hence should be aborted, narrowed, or split into multiple calls.
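As an illustration of splitting into multiple calls, a paginated handler could look roughly like this; the "page" query parameter and the page size are made up for this sketch:

import json

ITEMS = [{"id": i} for i in range(100000)]  # placeholder for the parsed results
PAGE_SIZE = 1000

def handler(event, context):
    params = event.get("queryStringParameters") or {}
    page = int(params.get("page", 0))

    start = page * PAGE_SIZE
    chunk = ITEMS[start:start + PAGE_SIZE]

    # Each response stays far below the 6 MB synchronous payload limit;
    # the client keeps requesting pages until nextPage is null.
    next_page = page + 1 if start + PAGE_SIZE < len(ITEMS) else None
    return {
        "statusCode": 200,
        "body": json.dumps({"items": chunk, "nextPage": next_page}),
    }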
In a single call we cannot retrieve more than 6.2 MB of data from AWS Lambda or through an AWS API. I'm also facing the same issue; either we have to filter the data or make multiple calls.

AWS Lambda audio feature extraction (not enough storage - layers)

We have IoT sensors that upload WAV files into an S3 bucket.
We want to be able to extract sound features from each file that gets uploaded (object-created event) with AWS Lambda.
For that we need:
the Python librosa or pyAudioAnalysis package + numpy and scipy (~240 MB unzipped)
ffmpeg (~70 MB unzipped)
As you can see, there is no way to fit them all in the same Lambda package (250 MB uncompressed max). And I'm getting an error when ffmpeg is not included in the layers while reading the WAV file:
[ERROR] FileNotFoundError: [Errno 2] No such file or directory: 'ffprobe': 'ffprobe'
which is related to ffmpeg.
We are looking for implementation recommendations. We thought about:
Putting the ffmpeg binary in S3 and fetching it on every single invocation, without having to put it in the layers (if that is even possible).
Chaining two Lambdas: the first processes the input file through ffmpeg and puts the output file in another bucket; the second is then invoked and extracts features from the processed data (using SNS or another chaining mechanism), if that is even possible.
Moving to EC2, where we will have a problem with concurrent invocations occurring when two files are uploaded at the same time.
There has to be an easier way; I'll be glad to hear other opinions before diving into implementation.
Thank you all!
The scenario appears to be:
Files come in at random times
The files need to be processed, but not in real-time
The required libraries are too big for an AWS Lambda function
Suggested architecture:
Configure an Amazon S3 Event to send a message to an Amazon SQS queue when a file arrives
Configure an Amazon CloudWatch Event to trigger an AWS Lambda function at regular intervals (eg 1 hour)
The Lambda function checks whether there are messages in the queue
If there are messages, it launches an Amazon EC2 instance with a User Data script that installs and starts the processing system
The processing system will:
Grab a message from the queue
Process the message (without the limitations of Lambda)
Delete the message
If there are no messages left in the queue, it will terminate the EC2 instance
This can be very cost-effective because Amazon EC2 Linux instances are charged per second. You can run several workers in parallel to process the messages (but be careful when writing the termination code, to ensure that all workers have finished processing messages). Or, if things are not time-critical, just choose the smallest usable instance type and single-thread it, since larger instances cost proportionally more (so they are no better from a cost-efficiency standpoint).
Make sure you put monitoring in place to ensure that messages are being processed. Implement a Dead Letter Queue in Amazon SQS to catch messages that are failing to process and put a CloudWatch Alarm on the DLQ to notify you if things seem to be going wrong.
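A minimal sketch of the worker loop described above, as it might run on the EC2 instance; the queue URL and the processing step are placeholders:

import urllib.request
import boto3

sqs = boto3.client("sqs")
ec2 = boto3.client("ec2")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/audio-jobs"  # placeholder

def process(body):
    # Placeholder for the real ffmpeg / librosa feature extraction.
    print("processing", body)

def main():
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue drained, stop working
        for msg in messages:
            process(msg["Body"])
            # Delete only after successful processing so failures are retried
            # and eventually land in the dead letter queue.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

    # Self-terminate to stop the per-second billing; the instance id comes from
    # the metadata service (IMDSv2 would additionally need a session token).
    instance_id = urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ).read().decode()
    ec2.terminate_instances(InstanceIds=[instance_id])

if __name__ == "__main__":
    main()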

Workaround for AWS API Gateway timeout with Lambda - asynchronous processing

I have a serverless backend running on Lambda. The runtime usually varies between 40-250 s, which is over the API Gateway maximum allowed runtime (29 s). As such, I think my only option is to resort to asynchronous processing. I get the idea behind it, but help online seems sparse and I'd like to know if there are any best practices out there. Or what would be the simplest way for me to get around this timeout problem, using asynchronous processing or something else?
It really depends on your use case, but an asynchronous approach is probably best suited for this scenario, given that it's usually not a good idea for the calling side of your API to wait 250 seconds for a reply (which is probably why API Gateway has the 29 s limit).
Asynchronous simply means that you reply back from Lambda saying that you have received the request and are going to work on it, but the result will only be available later.
Then you will change the logic on the client side, too, to check back after some time or perform checks in a loop until the requested resource is ready.
Depending on what work needs to be done, you could create an S3 bucket on the fly and reply to the client with an S3 presigned URL. Your worker then uploads its results to the S3 bucket, and the client polls that URL until the results are present.
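A rough sketch of that flow, with the bucket and worker function names as placeholders:

import json
import uuid
import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")
BUCKET = "my-results-bucket"    # placeholder
WORKER = "my-worker-function"   # placeholder

def handler(event, context):
    job_id = str(uuid.uuid4())
    result_key = f"results/{job_id}.json"

    # Fire-and-forget: the worker Lambda does the 40-250 s job asynchronously.
    lambda_client.invoke(
        FunctionName=WORKER,
        InvocationType="Event",
        Payload=json.dumps({"resultKey": result_key, "request": event.get("body")}),
    )

    # Hand the client a presigned URL it can poll; S3 returns an error for the
    # key until the worker has uploaded the result there.
    url = s3.generate_presigned_url(
        "get_object", Params={"Bucket": BUCKET, "Key": result_key}, ExpiresIn=3600
    )
    return {"statusCode": 202, "body": json.dumps({"jobId": job_id, "resultUrl": url})}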

AWS Lambda function triggering multiple times for a single event

I am using an AWS Lambda function to convert an uploaded WAV file in a bucket to MP3 format and later move the file to another bucket. It is working correctly, but there's a problem with triggering: when I upload small WAV files, the Lambda function is called once, but when I upload a large WAV file, the function is triggered multiple times.
I have googled this issue and found that Lambda is stateless, so it may be called multiple times (I'm not sure whether these triggers are for multiple uploads or for the same upload).
https://aws.amazon.com/lambda/faqs/
Is there any method to call this function once for a single upload?
Short version:
Try increasing the timeout setting in your Lambda function configuration.
Long version:
I guess you are running into the Lambda function being timed out here.
S3 events are asynchronous in nature, and a Lambda function listening to S3 events is retried at least 3 times before the event is rejected. You mentioned your Lambda function is executed only once (with no error) for smaller uploads, upon which you do the conversion and re-upload. There is a possibility that the time required for conversion and re-upload in your code is greater than the timeout setting of your Lambda function.
Therefore, you might want to try increasing the timeout setting in your Lambda function configuration.
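If you prefer doing it programmatically rather than in the console, a sketch with boto3 (the function name and value are just examples):

import boto3

lambda_client = boto3.client("lambda")

# Give the conversion enough headroom; the hard maximum is 900 seconds.
lambda_client.update_function_configuration(
    FunctionName="wav-to-mp3-converter",  # placeholder name
    Timeout=300,
)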
By the way, one way to confirm that your Lambda function is invoked multiple times is to look in the CloudWatch logs for occurrences of the event id (67fe6073-e19c-11e5-1111-6bqw43hkbea3) -
START RequestId: 67jh48x4-abcd-11e5-1111-6bqw43hkbea3 Version: $LATEST
This event id represents a specific event for which Lambda was invoked and should be the same for all Lambda executions that are responsible for the same S3 event.
Also, you can look for the execution time (Duration) in the following log line, which marks the end of one Lambda execution -
REPORT RequestId: 67jh48x4-abcd-11e5-1111-6bqw43hkbea3 Duration: 244.10 ms Billed Duration: 300 ms Memory Size: 128 MB Max Memory Used: 20 MB
If not a solution, it will at least give you some room to debug in the right direction. Let me know how it goes.
A Lambda function executing several times for a single event is due to the retry behavior of Lambda, as specified in the AWS documentation:
Your code might raise an exception, time out, or run out of memory. The runtime executing your code might encounter an error and stop. You might run out of concurrency and be throttled.
There could be some error in Lambda which makes the client or service invoking the Lambda function retry.
Use CloudWatch Logs to find the error; resolving it could resolve the problem.
I faced the same problem too; in my case it was caused by an application error, and resolving it helped.
AWS Lambda recently added a setting to change the default retry behavior: set the retry attempts to 0 (default 2) under the asynchronous invocation settings.
For some in-depth understanding on this issue, you should look into message delivery guarantees. Then you can implement a solution using the idempotent consumers pattern.
The context object contains information about which request ID you are currently handling. This ID won't change even if the same event fires multiple times. You could save this ID every time an event triggers and then check whether the ID has already been processed before processing a message.
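A rough sketch of that idempotency check, using a hypothetical DynamoDB table (partition key "request_id") to remember which invocations have already been handled:

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
TABLE = "processed-events"  # hypothetical table name

def handler(event, context):
    try:
        # The conditional put fails if this request id was already recorded,
        # so a retried delivery of the same event becomes a no-op.
        dynamodb.put_item(
            TableName=TABLE,
            Item={"request_id": {"S": context.aws_request_id}},
            ConditionExpression="attribute_not_exists(request_id)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return {"status": "duplicate, skipped"}
        raise

    # ... do the actual WAV-to-MP3 conversion and move the file here ...
    return {"status": "processed"}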
In the Lambda configuration, look for "Asynchronous invocation"; there is an option "Retry attempts", which is the maximum number of times to retry when the function returns an error.
Here you can also configure a dead-letter queue.
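The same setting can be applied with boto3; the function name is a placeholder:

import boto3

lambda_client = boto3.client("lambda")

# Turn off retries for asynchronous invocations (the default is 2).
lambda_client.put_function_event_invoke_config(
    FunctionName="wav-to-mp3-converter",  # placeholder
    MaximumRetryAttempts=0,
)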
Multiple retries can also happen due to a read timeout. I fixed it with '--cli-read-timeout 0'.
E.g. if you are invoking Lambda with the AWS CLI or a Jenkins execute-shell step:
aws lambda invoke --cli-read-timeout 0 --invocation-type RequestResponse --function-name ${functionName} --region ${region} --log-type Tail --payload '{}' out
I was also facing this issue earlier; try setting the retry count to 0 under 'Asynchronous invocation'.