Selecting message queue approach for multiple consumers in AWS

Please help me select an MQ app/system/approach for the following use case:
Check for incoming messages for a specific user -> read the message if available -> delete it from the queue, ideally staying within AWS.
Context:
It's a social networking app where users receive messages, i.e. I need to identify incoming messages by recipient ID.
The app long-polls for new messages every 30 seconds.
Message size is <1 KB.
As per current estimates, I'll need 100M+ message checks per month in total (far fewer actual messages, though; these are just checks).
Users acknowledge messages by choosing OK or Ignore, but I'm not sure whether ACK support is required from the MQ system for that.
I'm in AWS. Initially I thought of SQS, but the more I read the less it looks like a good match: I cannot set a message recipient ID in a way that lets me filter by recipient, etc. However, maybe I'm wrong.
One option I also thought about is to just use a DynamoDB "messages" table, with partition key userId and sort key messageId, so I could easily query by user; however, I'm concerned about costs.
If possible, I would much prefer to stay within AWS, or at least use SaaS like SQS; being a 1-person startup, I really want to avoid the headaches of supporting a self-hosted system.
Thank you!
D

You are right on both counts:
SQS won't work, because of the limitation you pointed out.
DynamoDB would work, but would cost a lot.
I can suggest the following:
Create a Redis cluster, possibly on Amazon ElastiCache.
In it, make one List per user.
Whenever a new message comes in, append it to the concerned user's list.
To deliver the message, just read from the User's list. Also, flush the queue if needed.
What I am suggesting is very similar to how Twitter manages each User's news-feed and home-feed.
It should also be cheap.
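For illustration, here is a minimal sketch of the list-per-user idea using redis-py against an ElastiCache Redis endpoint; the endpoint, the inbox:<userId> key scheme, and the helper names are all hypothetical, not anything prescribed above:

```python
import redis

# Hypothetical ElastiCache Redis endpoint; one List per user acts as that user's inbox.
r = redis.Redis(host="my-cluster.abc123.use1.cache.amazonaws.com", port=6379)

def push_message(user_id: str, payload: str) -> None:
    # Producer side: append the new message to the recipient's list.
    r.rpush(f"inbox:{user_id}", payload)

def drain_inbox(user_id: str) -> list:
    # Consumer side (the 30-second poll): read everything queued, then flush the queue.
    key = f"inbox:{user_id}"
    pipe = r.pipeline()
    pipe.lrange(key, 0, -1)
    pipe.delete(key)
    messages, _ = pipe.execute()
    return messages
```

If the ElastiCache cluster runs in cluster mode you'd use a cluster-aware client instead, but the list commands stay the same.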

Related

Is it possible to selectively read from AWS SQS?

I have a use case: I want to read from SQS at all times, except when another event is happening.
For instance, I have football news going into SQS as messages. I want to retrieve them at all times, except while live matches are happening.
Is there any way to read unless another event is taking place?
I searched the docs and Stack Overflow, but I don't see a solution.
COMMENT: I have a small and weak service, and because of technical limitations I cannot scale it up (memory/CPU, etc.), but I still want the 2 "conflicting" flows to live in the same service. They are both supposed to communicate with the same API, and I don't want them to send conflicting requests.
Is there a way to do it, or will I have to write a custom communicator with SQS?
You can't select which messages you want to read from SQS and which you'd rather not - there is no filtering in SQS.
If you have messages that need to be processed at all times and others that need to be processed only sometimes or in batches, you should put them in separate queues and read from them separately.
You don't say anything about the infrastructure that reads from the queue, but if it's a process on EC2, you could just stop it while live matches are happening and restart it later. SQS is built for asynchronous messaging and will store the messages for up to 14 days (depending on your configuration) until a consumer is available to read them.
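To make the separate-queues suggestion concrete, here is a minimal polling sketch with boto3; the queue URLs, the live_match_in_progress flag, and the handle function are hypothetical placeholders for whatever your service already has:

```python
import boto3

sqs = boto3.client("sqs")
ALWAYS_QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/always-on"    # hypothetical
NEWS_QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/football-news"  # hypothetical

def poll_once(live_match_in_progress: bool) -> None:
    # Always drain the critical queue; only touch the news queue when no match is on.
    queue_urls = [ALWAYS_QUEUE_URL]
    if not live_match_in_progress:
        queue_urls.append(NEWS_QUEUE_URL)
    for url in queue_urls:
        resp = sqs.receive_message(QueueUrl=url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            handle(msg)  # hypothetical processing function
            sqs.delete_message(QueueUrl=url, ReceiptHandle=msg["ReceiptHandle"])
```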

What is the recommended way to fanout in SQS lambda environment?

I would like to send a push notification to users in my database in a Lambda environment via an SQS / messaging-queue architecture. In order to do that:
I would first need to query all users in my database with push notifications enabled.
Loop over all of them.
Send an SQS event/message for each user.
Let my SQS-triggered Lambda handle/send the push notification.
Is there a better way to implement this to avoid querying a big number of users and/or looping over all the results to send a SQS message for each?
I would take a slightly different approach here, but similar.
Query the database for the users
Loop over the users
Send one message to SQS for a batch of records to send, and use the SendMessageBatch operation of SQS to send them. So batches of batches. Each SQS message would contain several "users" to send to, not just one. This should increase your performance because a batch will require fewer Lambda invocations.
Lambda handles SQS messages (probably more than one), and each SQS message results in sending many push notifications. In the case of Firebase I believe there is a way to send batches, which is even better. Even without that you can send several messages at once using a Promise.all type logic.
With this structure you can send a very large number of notifications really quickly, and probably a lot more cheaply. Imagine you need to send to 1M users. If you pack 100 users into each SQS message and send 10 messages per SendMessageBatch call (the batch limit), you cover 1,000 users per call to SQS. That would mean 1,000 calls to SQS, far better than even the 100,000 you'd have to make if you sent single-user messages in batches of 10.
On the receiving side, even if you throttled the SQS integration to 1 message per invocation you'd have 10,000 lambda invocations. If you assume even 1s per invocation, and 1000 concurrent invocations, it would take 10 seconds (likely less). If you send one message per user you'd have to make 1M lambda invocations. If you assume each invocation takes 100ms then you can send 10/second, so with 1000 concurrent executions it would take 100 seconds. In reality the numbers are probably even better than that for the batch version, especially if you don't limit it to 1 message at a time.
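A rough sketch of the "batches of batches" producer side with boto3 might look like the following; the queue URL, chunk sizes, and message body shape are assumptions for illustration, not part of the answer:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/push-fanout"  # hypothetical

def chunks(seq, size):
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def enqueue_users(user_ids):
    # Pack 100 users into each SQS message, then send up to 10 messages per API call.
    groups = list(chunks(user_ids, 100))
    for call_index, group_batch in enumerate(chunks(groups, 10)):
        entries = [
            {"Id": f"{call_index}-{i}", "MessageBody": json.dumps({"userIds": group})}
            for i, group in enumerate(group_batch)
        ]
        sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=entries)
```

Note that the whole batch shares SQS's 256 KB payload limit, so 100 user IDs per message leaves plenty of headroom.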
Edit
Based on the comments the question seemed to be a bit more about the first part of the process. With that in mind I'd suggest the following options.
If you find yourself needing to address the same large groups repeatedly, most messaging services (Firebase and SNS for sure) support some sort of topic subscription model. Given that these are push notifications, you can subscribe a device to the topic in code. What this ultimately leads to is one message sent from your code to the messaging service. The service handles the rest. This is probably the preferred solution for anything that has mass recipients, especially if you can know the recipients up front. This even works for dynamic topics. For example, consider a situation where a person comments on a post. Any new comment on that post should send a message to everyone who has commented on that post. You can create a topic on the fly when the post is created, and add recipients to the topic as they comment. If a user wishes to stop receiving messages you can remove the user from the topic.
If you don't know the recipients up front the above solution is a solid solution. However, if you are concerned with Lambda timeouts on the first two steps I'd modify slightly. I would take advantage of AWS Step Functions and page the data in the lambda. Lambda will tell you, via the context object supplied in the invocation, how much time is remaining. You can check that periodically to determine if you should exit the lambda and pass to the step function the current paging information. The step function can pass that paging information back into the lambda, which should be coded to accept the paging information as part of the request, and continue from that point if supplied.
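As a sketch of the paging idea, a Python Lambda handler could check context.get_remaining_time_in_millis() and hand a cursor back to the Step Functions state machine; fetch_users_page and enqueue_batches are hypothetical helpers, and the 30-second safety margin is an arbitrary choice:

```python
def handler(event, context):
    """Pages through users, enqueues SQS batches, and bails out before the Lambda times out."""
    cursor = event.get("cursor")  # None on the first invocation
    while True:
        page, cursor = fetch_users_page(cursor)  # hypothetical: returns (users, next_cursor)
        enqueue_batches(page)                    # hypothetical: the SendMessageBatch logic above
        if cursor is None:
            return {"done": True}
        if context.get_remaining_time_in_millis() < 30_000:
            # Not enough time left; let the state machine re-invoke us with the cursor.
            return {"done": False, "cursor": cursor}
```

The state machine would then use a Choice state on "done" to either finish or invoke the same function again with the returned cursor.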
I would suggest an additional piece in your application architecture. I personally prefer to avoid using the primary database for heavy querying, assuming you have a large user base.
I suggest maintaining your user list in a search engine like Elasticsearch or CloudSearch, in a simple table holding just the user list in AWS DynamoDB, or in a read replica of your database.
To keep it simple: use a search engine (first choice) or DynamoDB.
Querying this read-specialized datastore avoids putting pressure on your primary database, so it won't affect other modules in operation, and it's much faster to query this way.
Step 2: loop over all of them
Step 3: batch send messages to SQS using its SendMessageBatch method like Jason is suggesting
Step 3: batch send messages to SQS using its SendMessageBatch method like Jason is suggesting
Step 4: Depending on your SQS trigger settings, your Lambda function may process multiple messages per invocation

How to stream events with GCP platform?

I am looking into building a simple solution where producer services push events to a message queue and then have a streaming service make those available through gRPC streaming API.
Cloud Pub/Sub seems well suited for the job; however, scaling the streaming service means that each copy of that service would need to create its own subscription and delete it before scaling down, and that seems unnecessarily complicated and not what the platform was intended for.
On the other hand Kafka seems to work well for something like this but I'd like to avoid having to manage the underlying platform itself and instead leverage the cloud infrastructure.
I should also mention that the reason for having a streaming API is to allow streaming towards a frontend (which may not have access to the underlying infrastructure).
Is there a better way to go about doing something like this with the GCP platform without going the route of deploying and managing my own infrastructure?
If you essentially want ephemeral subscriptions, then there are a few things you can set on the Subscription object when you create a subscription:
Set the expiration_policy to a smaller duration. When a subscriber is not receiving messages for that time period, the subscription will be deleted. The tradeoff is that if your subscriber is down due to a transient issue that lasts longer than this period, then the subscription will be deleted. By default, the expiration is 31 days. You can set this as low as 1 day. For pull subscribers, the subscribers simply need to stop issuing requests to Cloud Pub/Sub for the timer on their expiration to start. For push subscriptions, the timer starts based on when no messages are successfully delivered to the endpoint. Therefore, if no messages are published or if the endpoint is returning an error for all pushed messages, the timer is in effect.
Reduce the value of message_retention_duration. This is the time period for which messages are kept in the event a subscriber is not receiving messages and acking them. By default, this is 7 days. You can set it as low as 10 minutes. The tradeoff is that if your subscriber disconnects or gets behind in processing messages by more than this duration, messages older than that will be deleted and the subscriber will not see them.
Subscribers that cleanly shut down could probably just call DeleteSubscription themselves so that the subscription goes away immediately, but for ones that shut down unexpectedly, setting these two properties will minimize the time for which the subscription continues to exist and the number of messages (that will never get delivered) that will be retained.
Keep in mind that Cloud Pub/Sub quotas limit one to 10,000 subscriptions per topic and per project. Therefore, if a lot of subscriptions are created and either active or not cleaned up (manually, or automatically after expiration_policy's ttl has passed), then new subscriptions may not be able to be created.
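If it helps, here is a minimal sketch (assuming the google-cloud-pubsub v2 Python client) of creating a subscription with the two settings discussed above; the project, topic, and subscription names are placeholders:

```python
from google.cloud import pubsub_v1
from google.protobuf import duration_pb2

project_id = "my-project"  # hypothetical
topic_path = f"projects/{project_id}/topics/events"
subscription_path = f"projects/{project_id}/subscriptions/stream-worker-1"

subscriber = pubsub_v1.SubscriberClient()
subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": topic_path,
        # Delete the subscription after 1 day without subscriber activity.
        "expiration_policy": {"ttl": duration_pb2.Duration(seconds=86400)},
        # Keep unacked messages for at most 10 minutes.
        "message_retention_duration": duration_pb2.Duration(seconds=600),
    }
)
```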
I think your original idea was better than ephemeral subscriptions, to be honest. I mean, it works, but it feels totally unnatural. It depends on what your requirements are. For example, do clients only need to receive messages while they're connected, or do they all need to get all messages eventually?
Only While Connected
Your original idea was better imo. What I probably would have done is to create a gRPC stream service that clients could connect to. The implementation is essentially an observer pattern. The consumer will receive a message and then iterate through the subscribers to do a "Send" to all of them. From there, any time a client connects to the service, it just registers itself with that observer collection and unregisters when it disconnects. Horizontal scaling is passive since clients are sticky to whatever instance they've connected to.
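A bare-bones sketch of that observer idea in Python, using asyncio queues to stand in for per-client gRPC streams; all names here are made up for illustration:

```python
import asyncio

class Broadcaster:
    """Tracks connected clients and fans each consumed message out to all of them."""

    def __init__(self) -> None:
        self._clients: set[asyncio.Queue] = set()

    def register(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._clients.add(q)
        return q

    def unregister(self, q: asyncio.Queue) -> None:
        self._clients.discard(q)

    def publish(self, message: bytes) -> None:
        # Called from the Pub/Sub consumer callback for every message received.
        for q in self._clients:
            q.put_nowait(message)

# Inside the gRPC streaming handler (sketch):
#   q = broadcaster.register()
#   try:
#       while True:
#           yield await q.get()   # push each message down this client's stream
#   finally:
#       broadcaster.unregister(q)
```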
Everyone always gets the message, even if eventually
The concept is similar to the above, but the client doesn't implicitly un-register from the observer on disconnect. Instead, it registers and un-registers explicitly (through a method/command designed to do so). Modify the 'on disconnected' logic to tell the observer list that the client has gone offline. Then the consumer's broadcast logic is slightly different. Now it iterates through the list and says "if online, then send, else queue", and sends the message to an ephemeral queue (that belongs to the client). Then your 'on connect' logic will send all messages that are in the queue to the client before informing the consumer that it's back online. Basically an inbox. Setting up ephemeral, self-deleting queues is really easy in most products like RabbitMQ. I think you'll have to do a bit of managing whether or not it's ok to delete a queue, though. For example, never delete the queue unless the client explicitly unsubscribes or has been inactive for long enough. Fail to do that, and the whole inbox idea falls apart.
The selected answer above is most similar to what I'm describing here, in that the subscription is the queue. If I did this, then I'd probably implement it as an internal bus instead of an observer (since it would be unnecessary) - you create a consumer on demand for a connecting client that literally just forwards the message. The message consumer subscribes and unsubscribes based on whether or not the client is connected. As Kamal noted, you'll run into problems if your scale exceeds the maximum number of subscriptions allowed by Pub/Sub. If you find yourself in that position, then you can unshackle that constraint by implementing the pattern above. It's basically the same pattern, but you shift the responsibility over to your infra, where the only constraint is your own resources.
gRPC makes this mechanism pretty easy. Alternatively, for web, if you're on a Microsoft stack, then SignalR makes this pretty easy too. Clients connect to the hub, and you can publish to all connected clients. The consumer pattern here remains mostly the same, but you don't have to implement the observer pattern by hand.

Delayed SES Stats Updation

I am noticing that AWS SES stats are not updated in real time. After sending email, it takes time for the sent count to increase on the SES Dashboard. Sometimes it takes a few minutes and sometimes it takes longer.
Has anyone also experienced this? Any thoughts?
On the assumption that the console is simply making a call to a standard API action (rather than using some kind of console-only backend service that is not documented or user-accessible -- such things are not unheard of, but are pretty rare in AWS, so it's a reasonably safe assumption), it looks like this is not really designed to be real-time. The stats are reported in 15-minute windows.
From the SES API reference:
GetSendStatistics
Returns the user's sending statistics. The result is a list of data points, representing the last two weeks of sending activity.
Each data point in the list contains statistics for a 15-minute interval.
— http://docs.aws.amazon.com/ses/latest/APIReference/API_GetSendStatistics.html
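For reference, pulling those 15-minute data points yourself is a one-liner with boto3 (assuming the classic SES v1 client; field names are from the API response):

```python
import boto3

ses = boto3.client("ses")

# Each data point covers a 15-minute interval over the last two weeks of sending activity.
stats = ses.get_send_statistics()
for point in sorted(stats["SendDataPoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["DeliveryAttempts"], point["Bounces"], point["Complaints"])
```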
The AWS SES dashboard stats are purely a rough hint of performance, not something to rely on. If you want real-time notifications of sent emails, you will need to set up SNS notifications. Keep in mind that spam-complaint notifications can take up to a couple of days, as they are based on information the ISPs provide to Amazon. And complaints from within Gmail's ecosystem will NEVER get to you.
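As an illustration of wiring up those SNS notifications with boto3 (the identity and topic ARN are placeholders, and this sketch assumes the SNS topic already exists):

```python
import boto3

ses = boto3.client("ses")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:ses-events"  # hypothetical topic

# Route delivery, bounce, and complaint events for a verified identity to SNS.
for notification_type in ("Delivery", "Bounce", "Complaint"):
    ses.set_identity_notification_topic(
        Identity="example.com",  # hypothetical verified domain or address
        NotificationType=notification_type,
        SnsTopic=TOPIC_ARN,
    )
```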

I need help clarifying a high level use-case of Amazon SQS

So I need a second pair of eyes to correct or confirm my understanding of Amazon SQS. From my understanding, you can add an unlimited number of messages to one queue. A message can be 256 KB in size, and if it needs to be larger than that, you can use Amazon S3 to store up to 2 GB. Reading around online, it appears there are many use cases for this queuing service. For example, one use case is for SQS to act as a database buffer.
But here's what I'm looking to do: I'm looking to make a real-time messaging system. My current functionality acts more like a message board, so the implementation just inserts into the database, then reads the data and packages it into JSON to be inserted into SQLite on the mobile phone. That works great, but I'm getting a lot of requests from people to make it real-time.
So what I'm wondering is: can I utilize Amazon SQS to write and read messages for a chat application? In my theoretical use case of SQS there would be a message queue to write to, and the mobile app would pull from that queue every second to check for messages. But here's where I'm confused. Since you cannot "query" a particular message from the queue, would it make sense to have a queue per user plus a generic queue for the app server to read from? Or am I just talking crazy, and should I spend cognitive resources thinking about implementing an open connection on an EC2 instance?
Any help would be great,
Thanks!
Have you thought about using Amazon SNS to push the chat messages to your mobile devices? Each user publishes to a topic and the readers subscribe to that topic. You just have to be ok with missing messages if the app isn't running.
If you only have a few (say, fewer than 100) users, you could consider having one SQS queue per user. Beyond that, the solution won't be operationally feasible.
If you were to have one generic queue, SQS won't help, because it doesn't allow querying for a given field across all available messages.
I can think of the following options for your use case:
1. Set up one Redis cluster, possibly on Amazon ElastiCache. Have one message List per user.
2. One Messages table in MySQL, possibly on AWS RDS. This will provide an easy way to query messages for a given user.
You can also use DynamoDB in option 2.
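For the DynamoDB variant (the same "messages" table idea as in the first question above), a minimal boto3 sketch could look like this; the table name, key names, and helper functions are hypothetical:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical "Messages" table: partition key userId, sort key messageId.
table = boto3.resource("dynamodb").Table("Messages")

def fetch_messages(user_id: str):
    # All pending messages for one user, the cheapest possible query shape.
    resp = table.query(KeyConditionExpression=Key("userId").eq(user_id))
    return resp["Items"]

def delete_message(user_id: str, message_id: str) -> None:
    # Remove a message once the user has acknowledged it.
    table.delete_item(Key={"userId": user_id, "messageId": message_id})
```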