I have a aws lambda function which is invoked by API Gateway. The Lambda function calls external API endpoints and it sometime receives network time out while calling external API.
What is the best way to implement retry mechanism in aws lambda to handle network time out or other server side errors? Also is it good to use retry inside lambda function, which cost as per execution time?
Any recommendation is highly appreciated.
Regards
What is the best way to implement retry mechanism in aws lambda to handle network time out or other server side errors?
You can throw time out error further and don't handle it in your Lambda function, in this case your Lambda will be invoked again. Please note that it depends on configuration of your Lambda (i.e. number of retries set).
You can find more theory and practical examples here.
Also is it good to use retry inside lambda function, which cost as per execution time?
Lambda Retries are free for you (you pay only for Lambda execution and not for retry logic). Implementing your own retry approach inside Lambda is not free for you, because you pay for its execution.
You may increase the lambda execution time limiting.
You may track the execution time of the lambda, and when it close to limit, re-call your lambda with the same payload. I mean that lambda re-triggers itself until the external API would successfully response.
You can introduce SQS(Simple Queue System) to your system, where the messages for external API will be stored. And the some of consumer will send messages to external API. It may be an another lambda or some another running service which is responsible just for calling external API.
Related
I used to think aws lambda was best suited to handle background tasks which did not require immediate results. However more and more I have seen aws lambda being used to handle real-time requests as well, for example fetch users from a db in a http get.
API Gateway -> AWS Lambda -> Results
Is this a standard approach or is this the improper use of lambda ?
Use of API Gateway to provide a front end for the Lambda function invocation is the standard way of executing Lambda function code on the fly. If you are concerned about the cold starts on the function; and want to minize the latency, you can consider Provisioned Concurrency to keep 'n' active containers at a small cost.
Here is the situation I suffer.
Lambda has 1000 concurrency limit. (There is no reserved number)
100-200 Clients access to Lambda at a time.
Lambda still doesn't reach throttling in the figures(100-200).
However, Lambda returns a lot of 502 errors.
And I assume
The first time, any Lambda isn't up.
When Lambda receives a lot of requests, it starts scaling.
However, because of Lambda's cold start time, It takes time to execute enough concurrency to handle all requests and as the result, it returns the error(even if it is not reached to maximum concurrency execution number[1000])
Is my assumption correct? If so, is it inevitable situation?
I have read on warming Lambda by sending ping requests at regular interval to the Lambda.
However, It looks not to solve the above issue because the ping is sent to only one Lambda making only the Lambda is resued, causing the same issue when a lot of requests is received at a time.
------Edit------
About #M Mo's asking
How are the lambdas being invoked?
By API-getaway it is invoked.
If through api gateway, are you using proxy integration?
Yes, proxy integration.
Do the lambdas call any other resources?
Yes, The Lambda calls S3 resource to get objects.
What are the average response times of the lambdas?
it takes about 1 sec.
I have created a lambda function which is triggered through cloudwatch event cron.
While testing I found that lambda retry is not working in case of timeout.
I want to understand what is the expected behaviour.Should retry happen in case of timeout?
P.S I have gone through the document on the aws site but still can't figure out
https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
Found the aws documentation on this,
"Error handling for a given event source depends on how Lambda is invoked. Amazon CloudWatch Events is configured to invoke a Lambda function asynchronously."
"Asynchronous invocation – Asynchronous events are queued before being used to invoke the Lambda function. If AWS Lambda is unable to fully process the event, it will automatically retry the invocation twice, with delays between retries."
So the retry should happen in this case. Not sure what was wrong with my lambda function , I just deleted and created again and retry worked this time.
Judging from the docs you linked to it seems that the lambda function is called again if it has timed out and the timeout is because it is waiting for another resource (i.e. is blocked by network):
The function times out while trying to reach an endpoint.
As a cron event is not stream based (if it is synchronous or asynchronous seems not be be clear from the docs) it will be retried.
CloudWatch Event invokes a Lambda function asynchronously.
For asynchronous invocation, Lambda manages the function's asynchronous event queue and attempts to retry two more times on errors including timeout.
https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html
So with the default configuration, your function should retry with timeout errors. If it doesn't, there might be some other reasons as follows:
The function doesn't have enough concurrency to run and events are throttled. Check function's reserved concurrency setting. It should be at least 1.
When above happens, events might also be deleted from the queue without being sent to the function. Check function's asynchronous invocation setting, make sure it has enough age to keep the events in the queue and retry attempts is not zero.
I've been playing with Lambda recently and am working on creating an API using API Gateway and Lambda. I have a lambda function in place that returns a JSON and an API Gateway endpoint that invokes the function. Everything works well with this simple setup.
I tried loadtesting the API gateway endpoint with the loadtest npm module. While Lambda processes the concurrent requests (albeit with an increase in mean latency over the course of execution), when I send it 40 requests per second or so, it starts throwing errors, only partially completing the requests.
I read in the documentation that by default, Lambda invocation is of type RequestResponse (which is what the API does right now) which is synchronous in nature, and it looks like it is non-blocking. For asynchronous invocation, the invocation type is Event. But lambda discards the return type for async invocations and the API returns nothing.
Is there something I am missing either with the sync, async or concurrency definitions in regards to AWS? Is there a better way to approach this problem? Any insight is helpful. Thank you!
You will have to use Synchronous execution if you want to get a return response from API Gateway. It doesn't make sense to use Async execution in this scenario. I think what you are missing is that while each Lambda execution is blocking, single threaded, there will be multiple instances of your function running in multiple Lambda server environments.
The default number of concurrent Lambda executions is fairly low, for safety reasons. This is to prevent you from accidentally writing a run-away Lambda process that would cost lots of money while you are still learning about Lambda. You need to request an increase in the Lambda concurrent execution limit on your account.
My current AWS Lambda function invokes another AWS Lambda function but I want to make sure that the invoke succeeded. After looking at concurrent execution limits for AWS Lambda I am trying to figure out what would happen if the concurrent limit is hit and I tried to invoke the Lambda from another Lambda.
For now, I am solving this problem by putting messages in an SNS but I rather prefer invoking Lambda directly avoiding the indirection.
The best way to handle the concurrent limit is to use a Kinesis stream rather than SNS.
The number of shards will limit the number of lambda invoked. And if it pertinent for you, you can take several messages at once, which you can't do with SNS, and that can lead to hit the concurrent limit.
Can you elaborate a little? Not sure I Understand what you are trying to achieve.
Lambda limits can be viewed under AWS console / EC2 page, top left corner has menu item called Limits, there you should see the limit.
When you hit the limit, lambda will stop being Invoked, and if my memory serves me right you will see an error in the logs saying something about limit being hit.
From the AWS Lambda FAQs:
Q: What happens if my account exceeds the default throttle limit on concurrent executions?
On exceeding the throttle limit, AWS Lambda functions being invoked
synchronously will return a throttling error (429 error code). Lambda
functions being invoked asynchronously can absorb reasonable bursts of
traffic for approximately 15-30 minutes, after which incoming events
will be rejected as throttled. In case the Lambda function is being
invoked in response to Amazon S3 events, events rejected by AWS Lambda
may be retained and retried by S3 for 24 hours. Events from Amazon
Kinesis streams and Amazon DynamoDB streams are retried until the
Lambda function succeeds or the data expires. Amazon Kinesis and
Amazon DynamoDB Streams retain data for 24 hours.
Inside the AWS Console you can always create a Service Limit Increase for AWS Lambda concurrent executions at no additional cost. This answer explains this in more detail.
I believe you're handling it correctly currently. I was just reading an article that was explaining how you shouldn't invoke lambda from another lambda because:
"If you do, the first will run until the second is finished executing, and you’re double billing yourself. Instead, use SNS or SQS to send a message to the other Lambda."
http://web.archive.org/web/20160713113906/http://www.appliedsoftwaredesign.com/archives/aws-lambda-pro-tips/