I have a Lambda function that sends an HTTP request to an API (let's say 'A'). After getting the response from 'A', it should immediately return the result to the caller, i.e. callback(null, success), within 10 seconds. It should then save the data fetched from API 'A' to my external API (let's say 'B').
I tried the code below, but Lambda waits until the event loop is empty (it is waiting for the response from the second HTTP call).
I don't want to set context.callbackWaitsForEmptyEventLoop to false, since that freezes the event loop and the pending work only executes on the next invocation.
request.get({url: endpointUrlA},
    function (errorA, responseA, bodyA) {
        callback(null, "success");
        request.post({url: endpointUrlB,
            body: bodyA,
            json: true}, function (errorB, responseB, bodyB) {
            // Don't want to wait for this response
        });
        // Also tried calling callback(null, "success"); here instead
    });
Does anybody have any thoughts on how I can implement this? Thanks!
PS: I read the previous similar questions, but they didn't clear this up for me.
This seems like a good candidate for breaking this Lambda up into two Lambdas with some supporting code.
The first Lambda handles the request to 'A' and places a message on SQS. It then returns the success status to the caller.
A separate process monitors the SQS queue and invokes a second Lambda when a message becomes available.
This has several benefits.
Firstly, you no longer have a long-running Lambda waiting for a response from a second system that may be down.
Secondly, you're doing the work asynchronously in the background.
Take a look at this blog post for an overview of how this could work in practice.
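A minimal Python sketch of the two-Lambda split described above. The handler names and the injected `fetch_a`, `enqueue` and `post_to_b` callables are hypothetical stand-ins: in a real deployment `enqueue` would presumably wrap an SQS send call, and the second handler would be attached to the queue as an SQS trigger.

```python
import json


def fetch_handler(event, context, fetch_a, enqueue):
    """First Lambda: call API 'A', hand the result to the queue, return at once."""
    body_a = fetch_a()
    # Hypothetical: enqueue stands in for sending the payload to SQS.
    enqueue(json.dumps(body_a))
    return "success"


def save_handler(event, context, post_to_b):
    """Second Lambda: receives SQS records and saves each payload to API 'B'."""
    for record in event["Records"]:
        # SQS-triggered events carry the message text under "body".
        post_to_b(json.loads(record["body"]))
```

The caller only ever waits for `fetch_handler`; a slow or unavailable 'B' affects `save_handler` alone.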
I have a DynamoDB stream which triggers a Lambda handler that looks like this:
let failedRequestId: string
await asyncForEachSerial(event.Records, async (record) => {
    try {
        await handle(record.dynamodb.OldImage, record.dynamodb.NewImage, record, context)
        return true
    } catch (e) {
        failedRequestId = record.dynamodb.SequenceNumber
    }
    return false // break
})
return {
    batchItemFailures: [{ itemIdentifier: failedRequestId }]
}
I have my lambda set up with a DestinationConfig.onFailure pointing to a DLQ I configured in SQS. The idea behind the handler is to process a batch of events and interrupt at the first failure. Then it reports the most recent failure in 'batchItemFailures' which tells the stream to continue at that record next try. (I pulled the idea from this article)
My current issue is that if handle() genuinely fails on one of those records, my handler reports that record as the checkpoint for the next invocation. However, the DLQ condition never triggers, and I end up processing that record over and over again. I should also note that I am trying to avoid processing records more than once, since handle() is not idempotent.
How can I elegantly handle errors while maintaining batching, but without triggering my handle() function more than once for well-behaved stream records?
I'm not sure if you have found the answer you were looking for. I'll respond in case someone else comes across this issue.
There are 2 other parameters you'd want to use to avoid that issue. Quoting documentation (https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html):
Retry attempts – The maximum number of times that Lambda retries when the function returns an error. This doesn't apply to service errors or throttles where the batch didn't reach the function.
Maximum age of record – The maximum age of a record that Lambda sends to your function.
Basically, you'll have to specify how many times failures should be retried and how far back in the stream Lambda should look.
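As a rough sketch, the two settings quoted above correspond to the `MaximumRetryAttempts` and `MaximumRecordAgeInSeconds` parameters of Lambda's UpdateEventSourceMapping API. The helper below only builds the argument dictionary; the function name, the default values, and the UUID and ARN used in the usage note are illustrative assumptions.

```python
def build_mapping_update(mapping_uuid, dlq_arn, max_retries=2, max_age_seconds=3600):
    """Build keyword arguments for an UpdateEventSourceMapping call."""
    return {
        "UUID": mapping_uuid,
        # Stop retrying a failing batch after this many attempts.
        "MaximumRetryAttempts": max_retries,
        # Discard records older than this many seconds.
        "MaximumRecordAgeInSeconds": max_age_seconds,
        # Where discarded batches are reported (the DLQ from the question).
        "DestinationConfig": {"OnFailure": {"Destination": dlq_arn}},
    }
```

The result could then be passed along the lines of `boto3.client("lambda").update_event_source_mapping(**build_mapping_update("<mapping-uuid>", "<dlq-arn>"))`, where both arguments are placeholders.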
This is more of a concern than a question, but still, has anyone experienced this before? Does anyone know how to prevent it?
I have a Lambda function (L1) which calls a second Lambda function (L2), both written in Node.js (runtime: Node.js 8.10, and aws-sdk should be v2.488.0, but I'm just pasting that from the documentation). The short story is that L1 is supposed to call L2, and when it does, L2 is executed twice! I discovered this by writing logs to CloudWatch: I could see one L1 log and two L2 logs.
Here's a simplified version of L1 and L2.
L1:
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();
module.exports = {
    handler: async (event, context, callback) => {
        const payload = { rnd: Math.random() };
        const lambdaParams = {
            FunctionName: 'L2',
            Qualifier: `dev`,
            Payload: JSON.stringify(payload),
        };
        console.log(`L1 calling: ${JSON.stringify(payload)}`);
        return await lambda.invoke(lambdaParams).promise();
    },
};
L2:
module.exports = {
    handler: async (event, context, callback) => {
        console.log(`L2 called: ${JSON.stringify(event)}`);
    },
};
In CloudWatch I can see one L1 calling {"rnd": 0.012072353149807702} and two L2 called: {"rnd": 0.012072353149807702}!
BTW, this does not happen all the time. This is part of a step function process which was going to call L1 10k times. My code is written in a way that if L2 is executed twice (per one call), it will fail the whole process (because L2 inserts a record to DB only if it does not exist and fails if it does). So far, I managed to log this behaviour three times. All of them processing the same 10k items, facing the issue at a different iteration each time.
Does anyone have the same experience? Or even better, knows how to make sure one call leads to exactly one execution?
Your lambda function must be idempotent, because it can be called twice in different situations.
https://aws.amazon.com/premiumsupport/knowledge-center/lambda-function-idempotent/
https://cloudonaut.io/your-lambda-function-might-execute-twice-deal-with-it/
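As a rough illustration of what idempotent means here, the Python sketch below skips an event whose key has already been handled. The `request_id` field and the in-memory `seen_keys` set are assumptions made for the example; separate Lambda instances do not share memory, so a real function would need a durable store (for instance a conditional database write) in place of the set.

```python
def make_idempotent_handler(process, seen_keys):
    """Wrap process() so that a repeated event is acknowledged but not re-run."""
    def handler(event, context=None):
        key = event["request_id"]  # hypothetical idempotency key
        if key in seen_keys:
            return "duplicate ignored"
        process(event)
        # Record the key only after success so a failed attempt can be retried.
        seen_keys.add(key)
        return "processed"
    return handler
```

With this shape, a second delivery of the same event becomes a no-op instead of a failed insert.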
With 10K Lambda invocations, it is likely that some are failing and being retried.
From the documentation:
Asynchronous Invocation – Lambda retries function errors twice. If the
function doesn't have enough capacity to handle all incoming requests,
events may wait in the queue for hours or days to be sent to the
function. You can configure a dead-letter queue on the function to
capture events that were not successfully processed. For more
information, see Asynchronous Invocation.
If this is what is happening and you set up the dead-letter queue, you'll be able to isolate the failing event.
You can also use CloudWatch Logs Insights to quickly search the Lambda's logs for error messages. Once you select the log group, this query should help you get started. Just change the time window.
fields @timestamp, @message
| filter @message like /(?i)(Exception|error|fail|5\d\d)/
| sort @timestamp desc
| limit 20
One case that may cause this is that your L2 Lambda doesn't return anything, which can lead the L1 Lambda (the caller) to think there was an error in L2, triggering the retry mechanism. Try returning something from L2, even a simple "OK".
In my case, the call to my second Lambda was wrapped in a try block that caught an exception and emitted a traceback, and this caused the Lambda call to be retried. Normally, without the traceback, this does not happen; when I commented out that module, the retries stopped.
The try block also contained a step that could legitimately fail, since it queried for a resource with boto3. If the resource existed there was no problem, but when it did not, the error surfaced as a general failure of the function rather than being captured as an exception.
I am wondering something, and I really can't find information about it. Maybe it is not the way to go but, I would just like to know.
It is about Lambda working in batches. I know I can set up Lambda to consume batch messages. In my Lambda function I iterate each message, and if one fails, Lambda exits. And the cycle starts again.
I am wondering about a slightly different approach.
Let's assume I have three messages: A, B and C. I also take them in batches. Now if the message B fails (e.g. API call failed), I return message B to SQS and keep processing the message C.
Is it possible? If it is, is it a good approach? Because I see that I need to implement some extra complexity in Lambda and what not.
Thanks
There's an excellent article here. The relevant parts for you are...
Using a batchSize of 1, so that messages succeed or fail on their own.
Making sure your processing is idempotent, so reprocessing a message isn't harmful, outside of the extra processing cost.
Handle errors within your function code, perhaps by catching them and sending the message to a dead letter queue for further processing.
Calling the DeleteMessage API manually within your function after successfully processing a message.
The last bullet point is how I've managed to deal with the same problem. Instead of returning errors immediately, store them or note that an error has occurred, but then continue to handle the rest of the messages in the batch. At the end of processing, return or raise an error so that the SQS -> lambda trigger knows not to delete the failed messages. All successful messages will have already been deleted by your lambda handler.
import logging

import boto3

logger = logging.getLogger(__name__)
sqs = boto3.client('sqs')
QUEUE_URL = '<queue_url>'  # placeholder: URL of the source queue

def handler(event, context):
    failed = False
    for msg in event['Records']:
        try:
            # Do something with the message.
            handle_message(msg)
        except Exception:
            # Ok it failed, but allow the loop to finish.
            logger.exception('Failed to handle message')
            failed = True
        else:
            # The message was handled successfully. We can delete it now.
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=msg['receiptHandle'],
            )
    # It doesn't matter what the error is. You just want to raise here
    # to ensure the trigger doesn't delete any of the failed messages.
    if failed:
        raise RuntimeError('Failed to process one or more messages')

def handle_message(msg):
    ...
For Node.js, check out https://www.npmjs.com/package/@middy/sqs-partial-batch-failure.
const middy = require('@middy/core')
const sqsBatch = require('@middy/sqs-partial-batch-failure')
const originalHandler = (event, context, cb) => {
    const recordPromises = event.Records.map(async (record, index) => { /* Custom message processing logic */ })
    return Promise.allSettled(recordPromises)
}
const handler = middy(originalHandler)
    .use(sqsBatch())
Check out https://medium.com/@brettandrews/handling-sqs-partial-batch-failures-in-aws-lambda-d9d6940a17aa for more details.
As of Nov 2019, AWS has introduced the concept of Bisect On Function Error, along with Maximum retries. If your function is idempotent this can be used.
In this approach you should throw an error from the function even if only one item in the batch fails. AWS will split the batch in two and retry each half. One half should now pass successfully; for the other half, the process continues until the bad record is isolated.
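The splitting behaviour can be simulated locally. The Python sketch below is only a model of the idea, not AWS's implementation: the real bisecting happens inside the Lambda service, and `process_with_bisect` and `handle` are hypothetical names.

```python
def process_with_bisect(batch, handle):
    """Return the records that still fail once the batch is split down to size 1."""
    try:
        for record in batch:
            handle(record)
        return []
    except Exception:
        if len(batch) == 1:
            # The bad record has been isolated.
            return list(batch)
        mid = len(batch) // 2
        # Retry each half separately; good records before the failure run again.
        return (process_with_bisect(batch[:mid], handle)
                + process_with_bisect(batch[mid:], handle))
```

Note that records handled before the failing one are handled again on the retry, which is why the function needs to be idempotent for this approach.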
Like all architecture decisions, it depends on your goal and what you are willing to trade for more complexity. Using SQS will allow you to process messages out of order so that retries don't block other messages. Whether or not that is worth the complexity depends on why you are worried about messages getting blocked.
I suggest reading about Lambda retry behavior and Dead Letter Queues.
If you want to retry only the failed messages out of a batch of messages it is totally doable, but does add slight complexity.
A possible approach is to iterate through the list of your events (e.g. [eventA, eventB, eventC]) and, for each one, append it to a list of failed events if processing fails. Then, at the end, check whether the list of failed events has anything in it, and if it does, manually send those messages back to SQS (using SQS sendMessageBatch).
However, you should note that this puts the events to the end of the queue, since you are manually inserting them back.
Anything can be a "good approach" if it solves a problem you are having without much complexity, and in this case, the issue of having to re-execute successful events is definitely a problem that you can solve in this manner.
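The re-queue step described above might be sketched in Python as follows. The helper name is hypothetical; it only prepares the `Entries` lists, in groups of at most 10 because SendMessageBatch accepts no more than 10 messages per request, and it assumes each failed event is an SQS record with its payload under `body`.

```python
def build_requeue_batches(failed_events, batch_limit=10):
    """Group failed SQS records into Entries lists for SendMessageBatch."""
    entries = [
        # Id only needs to be unique within a single request.
        {"Id": str(index), "MessageBody": event["body"]}
        for index, event in enumerate(failed_events)
    ]
    return [entries[i:i + batch_limit] for i in range(0, len(entries), batch_limit)]
```

Each returned list could then be sent with something like `sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)`, where `queue_url` is a placeholder for the source queue.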
SQS/Lambda supports reporting batch failures. How it works is within each batch iteration, you catch all exceptions, and if that iteration fails add that messageId to an SQSBatchResponse. At the end when all SQS messages have been processed, you return the batch response.
Here is the relevant docs section: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
To use this feature, your function must gracefully handle errors. Have your function logic catch all exceptions and report the messages that result in failure in batchItemFailures in your function response. If your function throws an exception, the entire batch is considered a complete failure.
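A minimal Python sketch of such a handler, assuming a hypothetical `process` function that raises on failure. The response shape (`batchItemFailures` with `itemIdentifier` set to the record's `messageId`) follows the docs section linked above.

```python
def make_handler(process):
    """Build an SQS handler that reports only the failed messages."""
    def handler(event, context=None):
        failures = []
        for record in event["Records"]:
            try:
                process(record)
            except Exception:
                # Report this message only; the others count as succeeded.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}
    return handler
```

For a batch of A, B and C where only B raises, the handler returns `{"batchItemFailures": [{"itemIdentifier": "B"}]}`. The response is only honoured when the event source mapping has `ReportBatchItemFailures` enabled.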
To add to the answer by David:
I implemented this, but a batch of A, B and C, with B failing, would still mark all three as complete. It turns out you need to explicitly configure the Lambda event source mapping to expect a batch failure response. This is done by adding the key FunctionResponseTypes with a list containing ReportBatchItemFailures as its value. Here are the relevant docs: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
My SAM template looks like this after adding it:
Type: SQS
Properties:
  Queue: my-queue-arn
  BatchSize: 10
  Enabled: true
  FunctionResponseTypes:
    - ReportBatchItemFailures
I have this scenario where I have a WebApi and an endpoint that when triggered does a lot of work (around 2-5min). It is a POST endpoint with side effects and I would like to limit the execution so that if 2 requests are sent to this endpoint (should not happen, but better safe than sorry), one of them will have to wait in order to avoid race conditions.
I first tried to use a simple static lock inside the controller like this:
lock (_lockObj)
{
    var results = await _service.LongRunningWithSideEffects();
    return Ok(results);
}
This is, of course, not possible, because await is not allowed inside a lock statement.
Another solution I considered was to use a SemaphoreSlim implementation like this:
await semaphore.WaitAsync();
try
{
    var results = await _service.LongRunningWithSideEffects();
    return Ok(results);
}
finally
{
    semaphore.Release();
}
However, according to MSDN:
The SemaphoreSlim class represents a lightweight, fast semaphore that can be used for waiting within a single process when wait times are expected to be very short.
Since in this scenario the wait times may even reach 5 minutes, what should I use for concurrency control?
EDIT (in response to plog17):
I do understand that passing this task onto a service might be the optimal way, however, I do not necessarily want to queue something in the background that still runs after the request is done.
The request involves other requests and integrations that take some time, but I would still like the user to wait for this request to finish and get a response regardless.
This request is expected to be only fired once a day at a specific time by a cron job. However, there is also an option to fire it manually by a developer (mostly in case something goes wrong with the job) and I would like to ensure the API doesn't run into concurrency issues if the developer e.g. double-sends the request accidentally etc.
If only one request of that sort can be processed at a given time, why not implement a queue?
With such a design, there is no need to lock or wait while processing the long-running request.
Flow could be:
Client POSTs to /RessourcesToProcess and should receive 202-Accepted quickly
HttpController simply queues the task to process (and returns the 202-Accepted)
Another service (Windows service?) dequeues the next task to process
Process the task
Update the resource status
During this process, the client should easily be able to get the status of requests previously made:
If task not found: 404-NotFound. Resource not found for id 123
If task processing: 200-OK. 123 is processing.
If task done: 200-OK. Process response.
Your controller could look like:
// Assumes ASP.NET Core (ControllerBase); messageQueue and tasksRepository are injected.
public class TaskController : ControllerBase
{
    //constructor and private members

    [HttpPost, Route("")]
    public IActionResult QueueTask(RequestBody body)
    {
        messageQueue.Add(body);
        return Accepted(); // 202: queued, not yet processed
    }

    [HttpGet, Route("{taskId}")]
    public IActionResult GetTask(string taskId)
    {
        YourThing thing = tasksRepository.Get(taskId);
        if (thing == null)
        {
            return NotFound("thing does not exist");
        }
        if (thing.IsProcessing)
        {
            return Ok("thing is processing");
        }
        if (thing.ResponseContent == null)
        {
            // Queued but not picked up by the worker yet
            return Ok("thing is not processing yet");
        }
        //here we assume thing had been processed
        return Ok(thing.ResponseContent);
    }
}
This design suggests that you do not handle long-running processes inside your WebApi. Indeed, doing so may not be the best design choice. If you still want to do so, you may want to read:
Long running task in WebAPI
https://blogs.msdn.microsoft.com/webdev/2014/06/04/queuebackgroundworkitem-to-reliably-schedule-and-run-background-processes-in-asp-net/
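The queue-and-poll flow described in this answer can be modelled in a few lines of Python. This is an in-process sketch only: the `TaskService` class and its method names are hypothetical, and a real setup would use a durable queue and a separate worker service as outlined above.

```python
import queue
import uuid


class TaskService:
    def __init__(self):
        self._pending = queue.Queue()
        self._tasks = {}

    def submit(self, body):
        """POST: register the task, queue it, answer 202 immediately."""
        task_id = str(uuid.uuid4())
        self._tasks[task_id] = {"status": "queued", "result": None}
        self._pending.put((task_id, body))
        return 202, task_id

    def work_one(self, process):
        """Worker: take the next task and run it; one worker means no overlap."""
        task_id, body = self._pending.get()
        self._tasks[task_id]["status"] = "processing"
        self._tasks[task_id]["result"] = process(body)
        self._tasks[task_id]["status"] = "done"

    def status(self, task_id):
        """GET: 404 for unknown ids, otherwise the current state or the result."""
        task = self._tasks.get(task_id)
        if task is None:
            return 404, "task not found"
        if task["status"] != "done":
            return 200, f"{task_id} is {task['status']}"
        return 200, task["result"]
```

Because a single worker drains the queue, two accidental POSTs simply run one after the other rather than concurrently.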
I have a piece of code which makes multiple nested calls to FB.api to retrieve certain information. Eventually, it creates an object called "myfriends" and stores my desired information in that object.
What I want to do is to use that object, after it is filled in with data (i.e. after all asynch calls are done), to run something else. In other words, I need a way for my code to know that those calls are complete. How can I do that?
Use 'myfriends' inside the callback, after the async request has completed.
Example:
FB.api('/me', function(response) {
    alert('Your name is ' + response.name);
    // Use 'myfriends' object here
});
I ended up using callback functions. The other problem I had was that my inner API call was in a loop; I ended up using an asynchronous looping function. This combination solved my problem.
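The same "run something only after every call has finished" idea can be sketched in Python with `asyncio.gather`, which is a different mechanism from the FB.api callbacks used here. `fetch_friend` is a hypothetical stand-in for one API call, and the names passed in are made up.

```python
import asyncio


async def fetch_friend(name):
    # Stand-in for one asynchronous API call.
    await asyncio.sleep(0)
    return {"name": name}


async def load_friends(names):
    # gather() resumes only once every call in the loop has completed.
    results = await asyncio.gather(*(fetch_friend(n) for n in names))
    return {friend["name"]: friend for friend in results}
```

Calling `asyncio.run(load_friends(["ann", "bob"]))` returns the filled-in dictionary, so any code placed after that call runs once all requests are done.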