Is there a way to get RabbitMQ to deliver messages in batches, instead of one at a time? - clojure

I am using RabbitMQ with Clojure and Langohr, and I want to process messages off the queue in batches rather than one at a time. I could batch the messages myself after they're pulled off the queue, of course, but I'm curious if there's an API call or setting I'm missing that would get RMQ to deliver, say, 500 messages at a time to a consumer. Is this possible?

Related

AWS SQS - when will the duplicated message arrive?

I understand that standard SQS uses "at least once" delivery while FIFO messages are delivered exactly once. I'm trying to weigh standard queues vs FIFO for my application, and one factor is how long it takes for the duplicated message to arrive.
I intend to consume messages from SQS then post the data I received to an idempotent third-party API. I understand that with standard SQS, there's always a risk of me overwriting more recent data with the old duplicated data.
For example:
Message A arrives, I post it onwards.
Message A duplicate arrives, I post it onwards.
Message B arrives, I post it onwards.
All fine ✓
On the other hand:
Message A arrives, I post it onwards.
Message B arrives, I post it onwards.
Message A duplicate arrives - I post it and overwrite the latest data, which was B! ✖
I want to measure this risk, i.e. I want to know how long the duplicate message should take to arrive. Will the duplicate message take roughly the same amount of time to arrive, as the original message?
Maybe it's useful to understand how message duplication occurs. As far as I know, this isn't covered in the official docs; what follows is my mental model of how it works, which is an educated guess.
Whenever you send a message to SQS (SendMessage API), this message arrives at the SQS webservice endpoint, which is one of probably thousands of servers. This endpoint receives your message, duplicates it one or more times and stores these duplicates on more than one SQS server. After it has received confirmation from at least two SQS servers, it acknowledges to the client that the message has been received.
When you call the ReceiveMessage API, only a subset of the SQS servers that handle your queue are queried for messages. When a message is returned, these servers tell their peers that the message is now in flight, and the visibility timeout starts. This doesn't happen instantaneously, as it's a distributed system. While this ReceiveMessage call takes place, another consumer might also make a ReceiveMessage call and happen to query one of the servers that holds a replica of the message before it's marked as in flight. That server hands out the message, and now you have two consumers working on it.
This is just one scenario, which is the result of this being a distributed system.
There are a couple of edge cases that can happen as the result of network issues, e.g. when the SQS response to the initial SendMessage gets lost and the client thinks the message didn't arrive and sends it again - poof, you got another duplicate.
The point being: things fail in weird and complex ways. That makes measuring the risk of a delayed message difficult. If your use case can't handle duplicate and out of order messages, you should go for FIFO, but that will inherently limit your throughput. Alternatives are based on distributed locking mechanisms and keeping track of which messages you have already processed, which are complex tools to solve a complex problem.
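To make "keeping track of which messages you have already processed" concrete, here is a minimal in-memory sketch. It assumes each message carries a unique ID and that the producer attaches a monotonically increasing version per entity; neither assumption comes from the question, and the names `StaleWriteGuard` and `should_forward` are hypothetical. With several consumers, this state would have to live in a shared store rather than in process memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class StaleWriteGuard:
    """Remembers handled message IDs and the newest version forwarded per entity."""

    seen_ids: Set[str] = field(default_factory=set)
    latest_version: Dict[str, int] = field(default_factory=dict)

    def should_forward(self, message_id: str, entity_key: str, version: int) -> bool:
        # An exact duplicate of a message we already handled: skip it.
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)

        # Not newer than what was already forwarded for this entity: skip it,
        # so a late arrival cannot overwrite more recent data.
        newest = self.latest_version.get(entity_key)
        if newest is not None and version <= newest:
            return False

        self.latest_version[entity_key] = version
        return True
```

In the failing scenario from the question (A, then B, then the duplicate of A), the third call returns False, so B is not overwritten.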

Is it possible to pull all messages from a RabbitMQ queue at once?

I would like to pull messages from a RabbitMQ queue, wrap them in an object and dispatch it for some kind of processing. Of course I could do that iteratively until the queue is empty, but I was wondering if there is any other way (a flag of some kind) or a neater way.
RabbitMQ does not support batches of messages, so you do indeed need to consume each message individually.
Maybe an alternative would be to batch the messages yourself by publishing one large message with all the required content.
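If the batching has to happen on the consumer side instead, a small buffer is usually enough. The sketch below is client-agnostic: it groups any iterable of messages into lists of a fixed size, and the function name `batched` is only illustrative. It deliberately leaves out two things a real consumer would probably need: a timeout that flushes a partial batch, and acknowledging messages only after their batch has been processed. With manual acknowledgements, the channel's prefetch count (AMQP `basic.qos`) would also need to be at least the batch size, or the buffer would never fill.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(messages: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group individually delivered messages into lists of up to batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batch: List[T] = []
    for message in messages:
        batch.append(message)
        if len(batch) == batch_size:
            yield batch
            batch = []

    # Emit whatever is left once the source is exhausted.
    if batch:
        yield batch
```

For example, `list(batched(range(5), 2))` yields two full batches followed by one partial batch holding the last item.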

Does the Spring SqsListener wait until the last message is processed (or completed) from the current poll before the next poll of messages happens?

I have an SQS listener with a max message count of 10. When my consumer receives a batch of 10 messages, they all get processed, but sometimes (depending on the message) the processing will take 5-6 hours, and some will take as little as 5 minutes. I have 3 consumers (3 different JVMs) polling from the queue with a maxMessageCount of 10. Here is my issue:
If one of those 10 messages takes 5 hours to process it seems as though the listener is waiting to do the next poll of 10 messages until all of the previous messages are 100% complete. Is there a way to allow it to poll a new batch of messages even though another is still being processed?
I'm guessing that I am missing something small here. I am using the Spring Cloud library and the SqsListener annotation. Has anybody run across this before?
Also, I don't think this should matter, but the queue is AWS SQS and the JVMs are running on an ECS cluster.
If you run the task on the poller thread, the next poll won't happen until the current one completes.
You can use an ExecutorChannel or QueueChannel to hand the work off to another thread (or threads) but you risk message loss if you do that.
Your situation is rather unusual; 5 hours is a long time to process a message.
You should perhaps consider redesigning your application to persist these "long running" requests to a database or similar, instead of processing them directly from the message. Or, perhaps put them in a different queue so that they don't impact the shorter tasks.
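The hand-off idea is not specific to Spring. As a language-neutral illustration, here is a rough Python sketch in which the polling loop submits each message to a thread pool and immediately goes back to polling. `poll` and `handle` are hypothetical callables standing in for the queue client and the message processor. The message-loss risk mentioned above still applies: if a message is deleted from the queue before its handler finishes and the process dies, that work is gone, so deletion would have to happen only after the handler completes.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


def poll_and_dispatch(
    poll: Callable[[], Iterable[Any]],
    handle: Callable[[Any], Any],
    executor: ThreadPoolExecutor,
    max_polls: int,
) -> List[Future]:
    """Poll max_polls times, handing every message to the executor.

    The polling thread never waits for a handler, so one slow message
    does not hold up the next poll.
    """
    futures: List[Future] = []
    for _ in range(max_polls):
        for message in poll():
            # submit() returns at once; the handler runs on a pool thread.
            futures.append(executor.submit(handle, message))
    return futures
```

Note that the pool's internal work queue is unbounded, so a consumer built this way can pull far more messages than it is able to process; capping the number of in-flight messages is left out of the sketch.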

How to handle large Emailing queue and delivery with AWS SES?

We are developing an app that needs to handle large email queues. We plan to store emails in an SQS queue and use SES to send them, but we are a bit confused about how to actually handle and process the queue. Should I use a cron job to regularly read the SQS queue and send emails? What would be the best way to trigger the script that will be sending email from our app?
Using SQS with SES is a great way to handle this. If something goes wrong while emailing, the request will still be on the queue and will be processed next time around.
I just use a cron job that starts my queue processing/email sending job once an hour. The job runs for an hour as a simple loop:
while I've been running < 1 hour:
    if there's a message in the queue:
        process the message
        delete the message from the queue
I set the WaitTimeSeconds parameter to the maximum (20 seconds) so that each check waits a while for a message to arrive if the queue is empty. That way the job isn't hitting AWS every few milliseconds. Otherwise, I could put a sleep statement of some kind in the loop.
The reason I run for just an hour is that the job might encounter some error that kills it, or have a memory leak, or some other unanticipated problem. This way any queued email requests will still get handled the next time the job is started.
If you want, you can start the job every fifteen minutes so you'll always have four worker processes handling queue requests. If one of them dies for some reason, you'll still be processing with the other three.
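The loop described above might look roughly like this in Python. It is a sketch, not the answerer's actual job: `receive`, `process` and `delete` are hypothetical callables. With boto3 they would typically wrap the SQS client's `receive_message` (called with `WaitTimeSeconds=20`) and `delete_message`, plus the SES send. The clock is injectable only so the loop can be exercised without waiting an hour.

```python
import time
from typing import Any, Callable, Optional


def run_worker(
    receive: Callable[[], Optional[Any]],
    process: Callable[[Any], None],
    delete: Callable[[Any], None],
    max_runtime_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Process queue messages until max_runtime_s has elapsed.

    Returns the number of messages handled.
    """
    deadline = clock() + max_runtime_s
    handled = 0
    while clock() < deadline:
        # receive() is expected to long-poll and return None if nothing arrived.
        message = receive()
        if message is None:
            continue
        process(message)
        # Delete only after processing succeeded, so that a crash leaves
        # the message on the queue for the next run.
        delete(message)
        handled += 1
    return handled
```

If `process` raises, the exception ends the job before `delete` runs, which matches the behavior the answer relies on: the request stays queued and is picked up by the next scheduled run.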

sampled machines when using queues

I am new to Amazon Web Services and am currently trying to get my head around how Simple Queue Service (SQS) works.
In the link ReceiveMessage the following is mentioned:
Short poll is the default behavior where a weighted random set of
machines is sampled on a ReceiveMessage call. This means only the
messages on the sampled machines are returned. If the number of
messages in the queue is small (less than 1000), it is likely you will
get fewer messages than you requested per ReceiveMessage call. If the
number of messages in the queue is extremely small, you might not
receive any messages in a particular ReceiveMessage response; in which
case you should repeat the request.
What I understand is that there is one queue, and many machines/instances can read messages from it. What is not clear to me is what "weighted random set of machines" means. Is there more than one queue, spread over a number of machines? Clearly I am lacking some knowledge of how SQS works.
I believe this means that, because SQS stores your queue's messages redundantly across many servers, not all of those machines (Amazon's servers that hold your queue) will have exactly the same queue content at every instant, since they are not always in sync with each other.
You don't know or control which of Amazon's servers will serve your messages; SQS uses an algorithm to decide which messages are sent to you when you request some. That is why you don't always get messages when you ask for them, and why the same message will occasionally be served more than once. Whatever your processing entails, make sure it can deal with the possibility that it is processing something that has already been processed by another of your worker machines.
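A toy simulation can make the sampling behavior easier to picture. This is not how SQS is implemented; it only models the quoted description, under the assumption that each server holds some subset of the queue's messages and that a short poll inspects a random subset of servers. For simplicity it samples uniformly rather than with weights, and the names are hypothetical.

```python
import random
from typing import List


def short_poll(
    servers: List[List[str]],
    sample_size: int,
    max_messages: int,
    rng: random.Random,
) -> List[str]:
    """Return up to max_messages found on a random sample of servers."""
    sampled = rng.sample(servers, sample_size)
    found: List[str] = []
    for server in sampled:
        for message in server:
            # The same message may be replicated on several servers.
            if message not in found:
                found.append(message)
    return found[:max_messages]
```

With a single message stored on one of four servers and `sample_size=1`, some polls come back empty even though the queue is not, which is the behavior the documentation warns about. Sampling all four servers always finds the message.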