I'm working on an event-based email notification service. An email is sent on every single event at the moment. The problem with this approach is sometimes there are too many events for a short period of time, thus the user gets too many emails at once. Instead of this, I'd like to throttle and group these emails by user.
I'm looking at AWS SQS to pipe these events into it and somehow consume them with a lambda that picks and groups the ones ready (ready=been there for at least 3 minutes) to be sent. Is there a built-in solution? Can I tag events with user ID and on the consumer side I pick the latest one, check if it's been there for 3 minutes, if so I pick all the remaining ones...? Just and idea...
Related
We have several consumer processes which poll from a standard SQS and process the message. Each message is associated with a user. For each user, we can process 100 messages per minute. Beyond that, the API which we are using for processing would start giving 500 errors.
Now since the Queue contains messages for other users, we can't cherry-pick those users since they have their quota under the limit.
One solution to this is using FIFO and implementing message groups. But FIFO has a peculiar limitation.
You can have a maximum of 20,000 in-flight messages
This would have been completely fine, but the issue is that when a message is in flight from a message group, SQS adds the count of all the messages in that group to the in-flight count.
This article explains more in detail:
https://tomgregory.com/3-surprising-facts-about-aws-sqs-fifo-queues/#:~:text=A%20FIFO%20queue%20has%20a%20maximum%20inflight%20message%20limit%20of%2020%2C000.
In this article read "20,000 message buffer" header. That might explain what's happening.
https://aws.amazon.com/premiumsupport/knowledge-center/sqs-message-backlog/
The second solution which I could think of is to make the producer of the microservice smart. But in our case, the producer is a completely different microservice. And the owners of that microservice hardly listen.
We definitely want our consumers to scale to provide minimum wait time to each user but can't because of the above reasons.
I genuinely feel SQS was not the correct choice for this design, but can't convince my superiors of the same.
Is there a way we can overcome this situation or did we hit a dead end?
This would have been completely fine, but the issue is that when a
message is in flight from a message group, SQS adds the count of all
the messages in that group to the in-flight count
I do not think this is the case for a FIFO. I am using a FIFO where I process one message at a time per consumer(There are 3 of them). There are SQS messages from the same message group, but the inflight message count for me is always 3, i.e each of the 3 consumers processing one of them. When either of them processes the message, and the processing time for each SQS message is variable here, it picks up the next one in the queue. The inflight messages count remains 3 all the time.
Our frontend application sends user actions to a lambda function behind an API gateway, which then stores these actions in dynamodb.
We then use dynamodb streams to trigger a separate lambda function that'll parse these actions in dynamodb and decide if the user's actions should result in any notifications being sent (we call these notification events).
For example, if a user places a comment in our app, we'll store a "CREATED_COMMENT" action in dynamodb, which will then trigger a new lambda through a dynamodb stream. The new lambda may then create an "email notification event", which we may send to an email provider like customer.io
However, our users have informed us that they receive emails too frequently, and thus we'd like to start sending email digests aggregating multiple actions over time into a single email rather than sending an email for each action.
Our idea was to use something like AWS EventBridge, Kinesis, Step Functions, or even DynamoDB streams to resend the dynamodb stream actions to, but then configure the new stream's events to be grouped by email address and for these events to be debounced by e.g. 10 minutes. If the user then performs a new action, that user's stream will continue gathering actions for another 10 minutes, until there's been no new actions from that user for 10 minutes. Once that happens, the stream will "release" all gathered actions and invoke a lambda function. Our lambda function will then generate the email notification event and send it to e.g. customer.io.
However, we've been unable to find such grouping and debounced flushing configuration in any of the aforementioned AWS stream services. For such a common thing as digesting (or rolling up), shouldn't there be a serverless approach to doing this without having to write our own queueing service?
The answer to me seems like using a tool such as SQS. SQS will allow you to accumulate messages into a queue and every x minutes you can then read the queue using a Lambda function to do so on a schedule event. You do not need to have a Lambda triggered by SQS, and can still read the queue "manually" from within Lambda instead.
Gareth McCumskey is on the right track.
Use a normal sqs queue for strictly for debouncing.
Set a batch window, i.e 5 seconds. Use a really large batch size when you read from the queue.
In code, use a hashMap to group your message with the same messageId together. Now use your deduped messageIDs to do your work.
I wrote a blog post on something just like this. The short version of it is that it uses a scheduled Lambda function to identify the records that need to be processed.
The problem with using the delay in SQS is that you can only receive 10 messages at a time, so in order to get all the messages you'd have to call SQS repeatedly to clear the queue. At that point, you can aggregate the messages. This doesn't scale very well, as all the messages have to be read in order for it to work. By using DynamoDB you can actually have just one record that represents the collection of records, and query the single record, which then can result in a message in a queue for that specific group of messages. Consider the following data:
user | comment | time
user 1 | comment 1 | 11:43am
user 1 | comment 2 | 11:50am
user 2 | comment 1 | 11:51am
You can add another record that is a signal for the need to send a message for each user (in this example 15 minutes after the first message).
user | scheduled
user 1 | 11:58
user 2 | 12:06
When you insert the second set of records you are inserting the time when you want to send the batch. You only do the insert if there isn't a record already, so you don't end up constantly increasing the time. Your scheduled process reads that record to know what users it needs to send messages to and collects all the data for that user. The process of sending the messages to each user can be done in parallel (you could send a message the SQS for each user or use a Map state in a step function, for example).
I would like to send a push notification to users in my database in a lambda environment via SQS / messaging queue architecture, in order to do that
I would first need to query all users in my database with push notifications enabled.
loop over all of them them
send a SQS event/message for each user.
let my sqs triggered lambda handle/send the push notification
Is there a better way to implement this to avoid querying a big number of users and/or looping over all the results to send a SQS message for each?
I would take a slightly different approach here, but similar.
Query the database for the users
Loop over the users
Send one messages to SQS for a batch of records to send, and use the SendMessageBatch operation of SQS to send them. So batches of batches. Each batch of messages would have several "users" to send to, not just one. This will should increase your performance because a batch will require fewer lambda invocations.
Lambda handles SQS messages (probably more than one), and each SQS message results in sending many push notifications. In the case of Firebase I believe there is a way to send batches, which is even better. Even without that you can send several messages at once using a Promise.all type logic.
With this structure you can send a very large number of messages really quickly, and probably a lot cheaper. Imagine you need to send to 1M users. If you send batches of 100, in batches of 25 to SQS, then you have 2,500 messages per call to SQS. That would mean 400 calls to SQS, far better than even the 40K you'd have to make if you sent single messages in batches of 25.
On the receiving side, even if you throttled the SQS integration to 1 message per invocation you'd have 10,000 lambda invocations. If you assume even 1s per invocation, and 1000 concurrent invocations, it would take 10 seconds (likely less). If you send one message per user you'd have to make 1M lambda invocations. If you assume each invocation takes 100ms then you can send 10/second, so with 1000 concurrent executions it would take 100 seconds. In reality the numbers are probably even better than that for the batch version, especially if you don't limit it to 1 message at a time.
Edit
Based on the comments the question seemed to be a bit more about the first part of the process. With that in mind I'd suggest the following options.
If you find yourself needing to address the same large groups repeatedly most messaging services (Firebase and SNS for sure) support some sort of topic subscription model. Given that these are push notifications you can subscribe a device to the topic in code. What this ultimately leads to is one messages sent from your code to the messaging service. The service handles the rest. This is probably the preferred solution for anything that has mass recipients, especially if you can know the recipients up front. This even works for dynamic topics. For example, consider a situation where a person comments on a post. Any new comment on that post should send a message to everyone who has commented on that post. You can create a topic on the fly when the post is created, and add recipients to the topic as they comment. If a user wishes to stop receiving messages you can remove the user from the topic.
If you don't know the recipients up front the above solution is a solid solution. However, if you are concerned with Lambda timeouts on the first two steps I'd modify slightly. I would take advantage of AWS Step Functions and page the data in the lambda. Lambda will tell you, via the context object supplied in the invocation, how much time is remaining. You can check that periodically to determine if you should exit the lambda and pass to the step function the current paging information. The step function can pass that paging information back into the lambda, which should be coded to accept the paging information as part of the request, and continue from that point if supplied.
I would suggest an additional piece in your application architecture,
I personally prefer to avoid using the Primary database for heavy querying,
assuming you have a large user base.
I will suggest maintaining your user list in a Search Engine like ElasticSearch or CloudSearch, or a simple table with just the user list in AWS DynamoDb or create a Read Replica of your DB.
To no confuse you, use a Search Engine(first choice) or an AWS DynamoDb
This will avoid creating pressure on your database when you query the read specialty datastore and won't affect other modules in operation
And it's way fast to query this way
Step 2: loop over all of them them
Step 3: batch send messages to SQS using its SendMessageBatch method like Jason is suggesting
Step 4: Based on your SQS setting, you may process multiple messages on your Lambda function
I'm sending messages that failed in my lambda to a dead letter queue using aws sdk. I want to wait for few hours before sending the message back to the main queue for reprocessing. I have a lambda attached to my dead letter queue. I can use delay for sending messages to the dead letter queue. But the maximum delay is 15 minutes. But I want to wait for more time. Has anyone done this before?
Amazon SQS is not intended to be used in this manner. Its primary purpose is to store messages and then provide them back when requested.
Some other options:
Store the message in a database and have the application search for relevant messages based on a timestamp field, or
Do some tricky stuff with delays on AWS Step Functions (which has a delay feature)
As shown in this answer you can extend the delay, by giving each message a timestamp and processing only those that are in queue for a while now.
message = SQS.poll_messages
if message.perform_message_at > Time.now
SQS.push_to_queue({perform_message_at : "Thursday November
2022"},delay:15 mins)
else
process_message(message)
end
I have large number of messages in AWS SQS Queue. These messages will be pushed to it constantly by other source. There are no proper dynamic on how often those messages will be pushed to queue. Currently, I keep polling SQS every second and checking if there are any messages available in there. Is there any better way of handling this, like receiving notification from SQS or SNS that some messages are available so that I only request SQS when I needed instead of constant polling?
The way to do what you want is to use long polling - rather than constantly poll every second, you open a request that stays open until it either times out or a message comes into the queue. Take a look at the documentation for ReceiveMessageRequest
ReceiveMessageRequest req = new ReceiveMessageRequest()
.withWaitTimeSeconds(Integer.valueOf(20)); // set long poll timeout to 20 sec
// set other properties on the request as well
ReceiveMessageResult result = amazonSQS.receiveMessage(req);
A common usage pattern for this is to have a background thread running the long poll and pushing the results into an internal queue (such as LinkedBlockingQueue or an ExecutorService) for a worker thread to read from.
PS. Don't forget to call deleteMessage once you're done processing the result so you don't end up receiving it again.
You can also use the worker functionality in AWS Elastic Beanstalk. It allows you to build a worker to process each message, and when you use Elastic Beanstalk to deploy it to an EC2 instance, you can define it as subscribed to a specific queue. Then each message will be POST to the worker, without your need to call receive-message on it from the queue.
It makes your system wiring much easier, as you can also have auto scaling rules that will allow you to spawn multiple workers to handle more messages in time of peak load, and scale down back to a single worker, when the load is low. It will also delete the message automatically, if you respond with OK from your worker.
See more information about it here: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html
You could also have a look at Shoryuken and the property delay:
delay: 25 # The delay in seconds to pause a queue when it's empty
But being honest we use delay: 0 here, the cost of SQS is inexpensive:
First 1 million Amazon SQS Requests per month are free
$0.50 per 1 million Amazon SQS Requests per month thereafter ($0.00000050 per SQS Request)
A single request can have from 1 to 10 messages, up to a maximum total payload of 256KB.
Each 64KB ‘chunk’ of payload is billed as 1 request. For example, a single API call with a 256KB payload will be billed as four requests.
You will probably spend less than 10 dollars monthly polling messages every second 24x7 in a single host.
One of the advantages of Shoryuken is that it fetches in batch, so it saves some money compared with a fetch per message solutions.