DynamoDB sends multiple update event and my lambda is called multiple times

DynamoDB sends multiple update event and my lambda is called multiple times - amazon-web-services

I've got a DynamoDB in AWS and it has a trigger on it with an AWS Lambda connected.
I need to send an email when a status reaches a particular value and it's done checking the EventName=="MODIFY" and the newImage["Status"] value.
What currently happens is that the event is fired 3 times and so 3 emails are sent...
I thought to set a value on DynamoDB telling that I've already sent an email, but doing so another trigger is fired and I don't know if the time between events is enough to perform this update... anyone has got this issue before? how did you handle it?
Thanks

Related

AWS Lex fulfillment triggers Lambda function twice

I have a Lex bot whose fulfillment is set to my Lambda function (LF1), so everytime the intent is fulfilled, LF1 will be triggered and the slots parameters will be sent to LF1, which sends the data to SQS and then processed and send text msg via SNS. It works, but every time I finished the conversation with my bot, my phone receives two messages at the same time. After a careful look at CloudWatch, I found the LF1 was triggered twice every time the intent is fulfilled. They have the same message, but different request-id and different message-id. I really couldn't figure out where goes wrong. Please help!
lambda function log
detail of the first trigger
detail of the second trigger

AWS Lambda triggered twice for a sigle SQS Message

I have a system where a Lambda is triggered with event source as an SQS Queue.Each message gets our own internal unique id to differentiate between two requests .
Now lambda deletes the message from the queue automatically after sqs invocation and keeps the message in inflight while processing it so duplicate processing of a unique message should never occur ideally.
But when I checked my logs a message with the same unique id was processed within 100 milliseconds of the time frame of each other.
So This seems like two lambdas were triggered for one message and something failed at the end of aws it was either visibility timeout or something else.I have read online that few others have gone through the same situation.
Can anyone who has gone through the same situation explain how did they solve it or people with current scalable systems who don't have this kind of issue can help me out with the reasons why I could be having it ?
Note:- One single message was successfully executed Twice this wasn't the case of retry on failure.

I faced a similar issue, where a lambda (let's call it lambda-1) is triggered through a queue, and lambda-1 further invokes lambda-2 'synchronously' (https://docs.aws.amazon.com/lambda/latest/dg/invocation-sync.html) and the message basically goes to inflight and return back after visibility timeout expiry and triggers lambda-1 again. This goes on in a loop.
As per the link above:
"For functions with a long timeout, your client might be disconnected
during synchronous invocation while it waits for a response. Configure
your HTTP client, SDK, firewall, proxy, or operating system to allow
for long connections with timeout or keep-alive settings."
Making async calls in lambda-1 can resolve this issue. In the case above, invoking lambda-2 with InvocationType='Event' returns back, which in-turn deletes the item from queue.

How to debounce events on AWS grouped by a key?

Our frontend application sends user actions to a lambda function behind an API gateway, which then stores these actions in dynamodb.
We then use dynamodb streams to trigger a separate lambda function that'll parse these actions in dynamodb and decide if the user's actions should result in any notifications being sent (we call these notification events).
For example, if a user places a comment in our app, we'll store a "CREATED_COMMENT" action in dynamodb, which will then trigger a new lambda through a dynamodb stream. The new lambda may then create an "email notification event", which we may send to an email provider like customer.io
However, our users have informed us that they receive emails too frequently, and thus we'd like to start sending email digests aggregating multiple actions over time into a single email rather than sending an email for each action.
Our idea was to use something like AWS EventBridge, Kinesis, Step Functions, or even DynamoDB streams to resend the dynamodb stream actions to, but then configure the new stream's events to be grouped by email address and for these events to be debounced by e.g. 10 minutes. If the user then performs a new action, that user's stream will continue gathering actions for another 10 minutes, until there's been no new actions from that user for 10 minutes. Once that happens, the stream will "release" all gathered actions and invoke a lambda function. Our lambda function will then generate the email notification event and send it to e.g. customer.io.
However, we've been unable to find such grouping and debounced flushing configuration in any of the aforementioned AWS stream services. For such a common thing as digesting (or rolling up), shouldn't there be a serverless approach to doing this without having to write our own queueing service?

The answer to me seems like using a tool such as SQS. SQS will allow you to accumulate messages into a queue and every x minutes you can then read the queue using a Lambda function to do so on a schedule event. You do not need to have a Lambda triggered by SQS, and can still read the queue "manually" from within Lambda instead.

Gareth McCumskey is on the right track.
Use a normal sqs queue for strictly for debouncing.
Set a batch window, i.e 5 seconds. Use a really large batch size when you read from the queue.
In code, use a hashMap to group your message with the same messageId together. Now use your deduped messageIDs to do your work.

I wrote a blog post on something just like this. The short version of it is that it uses a scheduled Lambda function to identify the records that need to be processed.
The problem with using the delay in SQS is that you can only receive 10 messages at a time, so in order to get all the messages you'd have to call SQS repeatedly to clear the queue. At that point, you can aggregate the messages. This doesn't scale very well, as all the messages have to be read in order for it to work. By using DynamoDB you can actually have just one record that represents the collection of records, and query the single record, which then can result in a message in a queue for that specific group of messages. Consider the following data:
user | comment | time
user 1 | comment 1 | 11:43am
user 1 | comment 2 | 11:50am
user 2 | comment 1 | 11:51am
You can add another record that is a signal for the need to send a message for each user (in this example 15 minutes after the first message).
user | scheduled
user 1 | 11:58
user 2 | 12:06
When you insert the second set of records you are inserting the time when you want to send the batch. You only do the insert if there isn't a record already, so you don't end up constantly increasing the time. Your scheduled process reads that record to know what users it needs to send messages to and collects all the data for that user. The process of sending the messages to each user can be done in parallel (you could send a message the SQS for each user or use a Map state in a step function, for example).

What is the recommended way to fanout in SQS lambda environment?

I would like to send a push notification to users in my database in a lambda environment via SQS / messaging queue architecture, in order to do that
I would first need to query all users in my database with push notifications enabled.
loop over all of them them
send a SQS event/message for each user.
let my sqs triggered lambda handle/send the push notification
Is there a better way to implement this to avoid querying a big number of users and/or looping over all the results to send a SQS message for each?

I would take a slightly different approach here, but similar.
Query the database for the users
Loop over the users
Send one messages to SQS for a batch of records to send, and use the SendMessageBatch operation of SQS to send them. So batches of batches. Each batch of messages would have several "users" to send to, not just one. This will should increase your performance because a batch will require fewer lambda invocations.
Lambda handles SQS messages (probably more than one), and each SQS message results in sending many push notifications. In the case of Firebase I believe there is a way to send batches, which is even better. Even without that you can send several messages at once using a Promise.all type logic.
With this structure you can send a very large number of messages really quickly, and probably a lot cheaper. Imagine you need to send to 1M users. If you send batches of 100, in batches of 25 to SQS, then you have 2,500 messages per call to SQS. That would mean 400 calls to SQS, far better than even the 40K you'd have to make if you sent single messages in batches of 25.
On the receiving side, even if you throttled the SQS integration to 1 message per invocation you'd have 10,000 lambda invocations. If you assume even 1s per invocation, and 1000 concurrent invocations, it would take 10 seconds (likely less). If you send one message per user you'd have to make 1M lambda invocations. If you assume each invocation takes 100ms then you can send 10/second, so with 1000 concurrent executions it would take 100 seconds. In reality the numbers are probably even better than that for the batch version, especially if you don't limit it to 1 message at a time.
Edit
Based on the comments the question seemed to be a bit more about the first part of the process. With that in mind I'd suggest the following options.
If you find yourself needing to address the same large groups repeatedly most messaging services (Firebase and SNS for sure) support some sort of topic subscription model. Given that these are push notifications you can subscribe a device to the topic in code. What this ultimately leads to is one messages sent from your code to the messaging service. The service handles the rest. This is probably the preferred solution for anything that has mass recipients, especially if you can know the recipients up front. This even works for dynamic topics. For example, consider a situation where a person comments on a post. Any new comment on that post should send a message to everyone who has commented on that post. You can create a topic on the fly when the post is created, and add recipients to the topic as they comment. If a user wishes to stop receiving messages you can remove the user from the topic.
If you don't know the recipients up front the above solution is a solid solution. However, if you are concerned with Lambda timeouts on the first two steps I'd modify slightly. I would take advantage of AWS Step Functions and page the data in the lambda. Lambda will tell you, via the context object supplied in the invocation, how much time is remaining. You can check that periodically to determine if you should exit the lambda and pass to the step function the current paging information. The step function can pass that paging information back into the lambda, which should be coded to accept the paging information as part of the request, and continue from that point if supplied.

I would suggest an additional piece in your application architecture,
I personally prefer to avoid using the Primary database for heavy querying,
assuming you have a large user base.
I will suggest maintaining your user list in a Search Engine like ElasticSearch or CloudSearch, or a simple table with just the user list in AWS DynamoDb or create a Read Replica of your DB.
To no confuse you, use a Search Engine(first choice) or an AWS DynamoDb
This will avoid creating pressure on your database when you query the read specialty datastore and won't affect other modules in operation
And it's way fast to query this way
Step 2: loop over all of them them
Step 3: batch send messages to SQS using its SendMessageBatch method like Jason is suggesting
Step 4: Based on your SQS setting, you may process multiple messages on your Lambda function

Does SQS really send multiple S3 PUT object records per message?

I've set up an S3 bucket to emit an event on PUT object to SQS, and I'm handling the SQS queue in an EB worker tier.
The schema for the message that SQS sends is here: http://docs.aws.amazon.com/AmazonS3/latest/dev/notification-content-structure.html
Records is an array, implying that there can be multiple records sent in one POST to my worker's endpoint. Does this actually happen? Or will my worker only ever receive one record per message?
The worker can only return one response, either 200 (message handled successfully) or non-200 (message not handled successfully, which puts it back into the queue), regardless of how many records in the message it receives.
So if my worker receives multiple records in a message, and it handles some successfully (say by doing something with side effects such as inserting into a database) but fails on one or more, how should I handle that? If I return 200, then the ones that failed will not be retried. But if I return non-200, then the ones that were handled successfully will be retried unnecessarily, and possibly re-inserted. So I'd have to make my worker smart enough to retry only the failed ones -- which is logic I'd prefer not having to write.
This would be much easier if only one record was ever sent per message. So if that's the case in practice, despite records being an array, I'd really like to know!

To be clear, it's not the records that "SQS sends." It's the records that S3 sends to SQS (or to SNS, or to Lambda).
Currently, all S3 event notifications have a single event per notification message. We might include multiple records as we add new event types in the future. This is also a message format that is shared across other AWS services, and other services can include multiple records.
— https://forums.aws.amazon.com/thread.jspa?messageID=592264&#592264
So, for the moment, it appears there's only one record per message.
But... you are making a mistake if you assume your application need not be prepared to handle repeated or duplicate messages. In any massive and distributed system like SQS it is extremely difficult to absolutely guarantee that this can never happen, however unlikely:
Q: How many times will I receive each message?
Amazon SQS is engineered to provide “at least once” delivery of all messages in its queues. Although most of the time each message will be delivered to your application exactly once, you should design your system so that processing a message more than once does not create any errors or inconsistencies.
— http://aws.amazon.com/sqs/faqs/
Incidentally, in my platform, more than one entry in the records array is considered an error, causing the message to be abandoned and sent to the dead letter queue for review.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js