Is there a way to pass along values between retries in Lambda? - amazon-web-services

For instance Try one failed, Can we pass few parameters to to event object of the next retry by something like below?
event.somevariable = somevalue
If we want do something like that what could be our options?

I'm not a fan of Lambda retries. They are run exactly the same as the initial call and if it failed the first time, it will fail on both of the subsequent retries. What changes?
I'm going to assume that you want to pass along a variable to track which retry is being executed and potentially make changes so that the subsequent retries do succeed - this does make sense. However, unfortunately, you need to look outside of lambda to make this happen.
DynamoDB is one method which is commonly used, to track the event ID and number of executions however I personally find that to be a faff.
I'd rather use Amazon SNS to ping a HTTP endpoint on failure, then re-execute my lambda function with different parameters. Just be mindful (in all cases) of idempotency. You should be able to re-execute a lambda multiple times without it causing issues or overwriting what was intended to happen.

There's no way to do that directly in AWS.
You could use the request ID as a primary key in a DynamoDB table where you store that value, and always look for those values in DynamoDB at the start of a request.

Related

Rate Exceeded on AWS Lambda Using API Gateway and serverless framework

When I try to invoke a method that has a HTTP event it results in 500 Internal server error.
On CloudWatch logs it shows Recoverable error occurred (Rate Exceeded.)
When I try invoke a function without lambda it executes with response.
Here is my serverless config:
You have set your Lambda's reservedConcurrency to 0. This will prevent your Lambda from ever being invoked. Setting it to 0 is usually useful when your functions are getting invoked but you're not sure why and you want to stop it right away.
If you want to have it invoked, change reservedConcurrency to a positive integer (by default, it can be a positive integer <= 1000, but you can increase this limit by contacting AWS) or simply remove the reservedConcurrency attribute from your .yml file as it will use the default values.
Why would one ever use reservedConcurrency anyways? Well, let's say your Lambda functions are triggered by requests from API Gateway. Let's say you get 400 (peak hours) requests/second and, upon every request, two other Lambda functions are triggered, one to generate a thumbnail for a given image and one to insert some metadata in DynamoDB. You'd have, in theory, 1200 Lambda functions running at the same time (given all of your Lambda functions finish their execution in less than a second). This would lead to throttling as the default concurrent execution for Lambda functions is 1000. But is the thumbnail generation as important as the requests coming from API Gateway? Very likely not as it's naturally an eventually consistent task, so you could set reservedConcurrency on the thumbnail Lambda to only 200, so you wouldn't use up your concurrency, meaning other functions would be able to spin up to do something more useful at a given point in time (in our example, receiving HTTP requests is more important than generating thumbnails). The other 800 left concurrency could then be split between the function triggered from API Gateway and the one that inserts data into DynamoDB, thus preventing throttling for the important stuff and keeping the not-so-important-stuff eventually consistent.

How to clean up state post lambda time out

We use APIG and Lambda to process long running jobs. These jobs have an id which needs to unique. In order to capture duplicate job submissions /createJob Lambda checks an job exists (if not adds an entry into db) and requests to schedule that job.
We had an issue where an entry was made into db but before the request could be sent the lambda (Which executes /createJob) terminated. We believe it was due some network latency.
Though its an rare event, wanted to check that are available mechanisms for rollback (i.e. delete that entry from db in case the lambda fails to execute)
Even though you have not shared the lambda code, I think it is better to commit to the DB after the major steps in lambda are completed rather than before everything else. This way, if there is a failure for some reason, there will be no entry made to the DB and you don't need to rollback anything.

Make Lambda function execute now, and/or in an hour

I'm trying to implement an AWS Lambda function that should send an HTTP request. If that request fails (response is anything but status 200) I should wait another hour before retrying (longer that the Lambda stays hot). What the best way to implement this?
What comes to mind is to persist my HTTP request in some way and being able to trigger the Lambda function again in a specified amount of time in case of a persisted HTTP request. But I'm not completely sure which AWS service that would provide that functionality for me. Is SQS an option that can help here?
Or, can I dynamically schedule Lambda execution for this? Note that the request to be retried should be identical to the first one.
Any other suggestions? What's the best practice for this?
(Lambda function is my option. No EC2 or such things are possible)
You can't directly trigger Lambda functions from SQS (at the time of writing, anyhow).
You could potentially handle the non-200 errors by writing the request data (with appropriate timestamp) to a DynamoDB table that's configured for TTL. You can use DynamoDB Streams to detect when DynamoDB deletes a record and that can trigger a Lambda function from the stream.
This is obviously a roundabout way to achieve what you want but it should be simple to test.
As jarmod mentioned, you cannot trigger Lambda functions directly by SQS. But a workaround (one I've used personally) would be to do the following:
If the request fails, push an item to an SQS Delay Queue (docs)
This SQS message will only become visible on the queue after a certain delay (you mentioned an hour).
Then have a second scheduled lambda function which is triggered by a cron value of a smaller timeframe (I used a minute).
This second function would then scan the SQS queue and if an item is on the queue, call your first Lambda function (either by SNS or with the AWS SDK) to retry it.
PS: Note that you can put data in an SQS item, since you mentioned you needed the lambda functions to be identical you can store your first function's input in here to be reused after an hour.
I suggest that you take a closer look at the AWS Step Functions for this. Basically, Step Functions is a state machine that allows you to execute a Lambda function, i.e. a task in each step.
More information can be found if you log in to your AWS Console and choose the "Step Functions" from the "Services" menu. By pressing the Get Started button, several example implementations of different Step Functions are presented. First, I would take a closer look at the "Choice state" example (to determine wether or not the HTTP request was successful). If not, then proceed with the "Wait state" example.

Api Gateway Api Key immediate use upon creation giving forbidden

Application creates an api key on a per user basis, meaning the process is as follows:
Lambda function creates api key and adds to a usage plan
Api key value is returned from lambda function
Api key is then immediately used to call an Api Gateway end point
Forbidden message is returned
If I delay execution between api key creation and the http request to the api gateway end point (by around 5 seconds), then it works as intended, but less than that I get an error.
I suspect that the api key takes a few seconds to propagate to the endpoint but I can't find an AWS API method that correctly lets me know when it has done so. Has anyone come across this problem before and how did you solve it?
The best solution I have at the moment is to retry the api call on a sliding timeout until an unreasonable amount of time has passed.
How long should I wait after applying an AWS IAM policy before it is valid? is not the same question but seems likely to be similar in its underlying explanation -- it's not so much a case of the API key taking time to exist but rather taking time to propagate and become visible at every possible place where it might need to exist before being valid for any subsequent request.
If those assumptions are correct, there is no mechanism for authoritatively determining whether the key is ready for use or not, because for some period of time after the key creation request succeeds, it's in a situation arguably reminiscent of Schrödinger's cat -- the key both exists and doesn't exist -- you don't know until you try it, and (unlike the cat) even a successful test does not necessarily prove that it is fully ready for use, because of the possibility (however unlikely) of a result such as fail fail fail fail pass fail pass pass pass. Such is the characteristic behavior of many large-scale, distributed systems.
From comments:
If an API call returns the api key value then I would expect it to be able to be used instantly, or at least return only when the key has been propagated fully to the end points.
That makes sense on the surface, but it becomes problematic in implementation. What if one of the endpoints is failed, offline for maintenance, or in the middle of recovering from an outage and lagging... what then? Fail the request? Delay the response waiting for something statistically unlikely to impact you?
The resource cost of observing replication tends to outweigh the benefits in many cases and can destabilize the control plane of a system if a replication issue causes a sufficient backlog, and is often not implemented except in cases where it has a high value, viz. the GetChange action in Route 53 which allows you to verify the propagation of a change through the system -- and note that even in this case, the change request itself succeeds without waiting -- if you need to verify the sync state, you have to ask separately.
A lot of AWS services take time to create. Usually there is a way to detect if the job has been completed. In this case it looks like you get a forbidden response until the key is created.
I think you will have to handle this in your client.

AWS Lambda faster process way

Currently, I'm implementing a solution based on S3, Lambda and DynamoDB.
My use case is, when a new object is uploaded on S3, a first Lambda function is called, downloads the new file, splits it in around 100(or more) parts and for each of them, adds additional information. Next step, each part will be processed by second Lambda function and in some case an insert will be performed in DynamoDB.
My question is only about the best way to call the "second lambda". I mean, the faster way. I want to execute 100 Lambda function(if I'd 100 parts to process) at the same time.
I know there are different possibilities:
1) My first Lambda function can push each part as an item in a Kinesis stream and my second Lambda function will react, retrieve an item and processed it. In this case I don't know if AWS will launch a new Lambda function each time there is a remaining item in the stream. Maybe there is some limitation...
2) My first Lambda function can push each part in an SNS topic and then my second Lambda will react to each new message. In this case I've some doubts about the latency(time between the action to send a message through the SNS topic and the time to my second Lambda function to be executed).
3) My first Lambda function can launch directly the second one by performing an API call and by passing the information. In this case I have no idea if I can launch 100 Lambdas function at the same time. I think I'll be stuck by a rate limitation against the AWS API(I said, I think!)
Somebody have a feedback and maybe advises regarding my use case? One more time, the most important for me it's to have the faster process way.
Thanks
Lambda limits are in place to provide some sane defaults but many workloads quickly exceed them. You can request an increase so this will not be a bottleneck for your use case. This document describes the process:
http://docs.aws.amazon.com/lambda/latest/dg/limits.html
I'm not sure how much latency your use case can tolerate but I often use SNS to fan out and the latency is usually sub-second to the next invocation (unless it's Java/coldstart).
If latency is extremely sensitive then you'd probably want to invoke Lambdas directly using Invoke with the InvocationType set to "Event". This would minimize blocking while you Invoke 100 times. You could also thread these Invoke calls within your main Lambda function to further increase parallelism if you want to hyper-optimize.
Cold containers will occasionally cause latency in your invocations. If milliseconds count this can become tricky. People who are trying to hyper-optimize Lambda processing times will sometimes schedule executions of their Lambda function with a "heartbeat" event that returns immediately (so processing time is cheap). These containers will remain "warm" for a small period of time which allows them to pick up your events without incurring "cold startup" time. Java containers are much slower to spin up cold than Node containers (I assume Python is probably equally fast as Node though I haven't tested).