Does the Spring SqsListener wait until the last message is processed (or completed) from the current poll before the next poll of messages happens?

I have an SQS listener with a max message count of 10. When my consumer receives a batch of 10 messages, they all get processed, but depending on the message, processing can take as long as 5-6 hours or as little as 5 minutes. I have 3 consumers (3 different JVMs) polling the queue, each with a maxMessageCount of 10. Here is my issue:
If one of those 10 messages takes 5 hours to process, the listener seems to wait until all of the previous messages are 100% complete before it polls the next batch of 10. Is there a way to let it poll a new batch even though another message is still being processed?
I'm guessing I'm missing something small here. I am using the Spring Cloud library and the SqsListener annotation. Has anybody run into this before?
Also, I don't think this should matter, but the queue is AWS SQS and the JVMs are running on an ECS cluster.

If you run the task on the poller thread, the next poll won't happen until the current one completes.
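You can use an ExecutorChannel or QueueChannel to hand the work off to another thread (or threads), but you risk message loss if you do that: the message can be acknowledged and removed from the queue as soon as the hand-off returns, before the work itself has finished.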
Your situation is rather unusual; 5 hours is a long time to process a message.
You should perhaps consider redesigning your application to persist these "long running" requests to a database or similar, instead of processing them directly from the message. Or, perhaps put them in a different queue so that they don't impact the shorter tasks.
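The hand-off suggested in the answer above can be illustrated outside of Spring. The following is a minimal Python sketch, not Spring Integration's ExecutorChannel: an in-memory `queue.Queue` stands in for SQS, and the hypothetical `poll_and_dispatch` and `run_demo` functions submit each message to a thread pool so that one slow message no longer holds up the next poll. It does nothing about the message-loss risk; in this model a message is gone from the inbox the moment it is dispatched.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def poll_and_dispatch(inbox: "queue.Queue", executor: ThreadPoolExecutor,
                      handler: Callable, batch_size: int = 10) -> int:
    """Poller loop: pull up to `batch_size` messages per poll and submit each one.

    executor.submit() returns immediately, so the poller goes straight back for
    the next batch instead of waiting for the slowest message in this one.
    Returns the number of messages dispatched.
    """
    dispatched = 0
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(inbox.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return dispatched  # demo only: a real poller would keep polling
        for msg in batch:
            executor.submit(handler, msg)
            dispatched += 1


def run_demo(durations: List[float], workers: int = 4,
             check_after: float = 0.2) -> Tuple[List[float], List[float]]:
    """Dispatch simulated messages (each value = seconds of work) and snapshot progress."""
    inbox: "queue.Queue" = queue.Queue()
    for d in durations:
        inbox.put(d)
    done: List[float] = []
    lock = threading.Lock()

    def handle(duration: float) -> None:
        time.sleep(duration)  # simulate processing time
        with lock:
            done.append(duration)

    with ThreadPoolExecutor(max_workers=workers) as executor:
        poll_and_dispatch(inbox, executor, handle, batch_size=3)
        time.sleep(check_after)
        with lock:
            early = list(done)  # what has finished while the slow task still runs
    # leaving the `with` block waits for every submitted task to finish
    return early, done
```

Calling `run_demo([0.5, 0.01, 0.01, 0.01, 0.01])` shows the short messages finishing while the long one is still running, which is exactly the behavior the question asks for; the trade-off is that completion is now decoupled from the queue's acknowledgement.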

Related

How to control the number of messages processing simultaneously in a single worker/consumer machine in amazon SQS?

Below is my SQS configuration and a description of my use case.
I am pushing ~1000 messages per minute to AWS SQS, and the worker machine needs ~30 seconds and a significant amount of RAM to process a single message.
The worker machine polls SQS every 10 seconds, and each poll may return up to 10 messages.
Because a daemon on the worker polls SQS every 10 seconds regardless of how many messages are still in progress, a large number of messages end up being processed in parallel, which causes memory errors.
The SQS queue type is Standard.
My question: how can I control the number of messages the consumer processes in parallel, so that, for example, at most 3 messages are being processed at any given time?
Also, even if I increase the number of consumers, some consumers may still end up processing a lot of messages, right? That's because each worker polls SQS every 5 seconds.
I am new to AWS SQS, so kindly correct me if I am wrong.
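One common way to enforce a limit like "at most 3 at a time" is to poll only when a processing slot is free and to ask for no more messages than there are free slots. Below is a minimal Python sketch of that idea under stated assumptions: an in-memory queue replaces SQS, and `receive`, `run_bounded`, and `MAX_IN_FLIGHT` are illustrative names. With boto3, the analogous knob would be passing the number of free slots as `MaxNumberOfMessages` to `receive_message`.

```python
import queue
import threading
import time
from typing import Dict, Iterable, List

MAX_IN_FLIGHT = 3  # "at any given time only 3 messages should be processed"


def receive(inbox: "queue.Queue", max_messages: int) -> List:
    """In-memory stand-in for a receive call returning up to `max_messages` messages."""
    batch = []
    while len(batch) < max_messages:
        try:
            batch.append(inbox.get_nowait())
        except queue.Empty:
            break
    return batch


def run_bounded(messages: Iterable, work_seconds: float = 0.02) -> Dict[str, int]:
    inbox: "queue.Queue" = queue.Queue()
    for m in messages:
        inbox.put(m)
    slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)
    lock = threading.Lock()
    stats = {"in_flight": 0, "peak": 0, "done": 0}
    threads: List[threading.Thread] = []

    def process(msg) -> None:
        try:
            with lock:
                stats["in_flight"] += 1
                stats["peak"] = max(stats["peak"], stats["in_flight"])
            time.sleep(work_seconds)  # the memory-heavy work would happen here
        finally:
            with lock:
                stats["in_flight"] -= 1
                stats["done"] += 1
            slots.release()  # free the slot so the poller may fetch another message

    while True:
        slots.acquire()  # block until at least one slot is free
        free = 1
        while free < MAX_IN_FLIGHT and slots.acquire(blocking=False):
            free += 1  # reserve any other free slots without waiting
        batch = receive(inbox, max_messages=free)  # never ask for more than we can run
        for _ in range(free - len(batch)):
            slots.release()  # hand back reservations we could not fill
        if not batch:
            break  # demo only: a real poller would keep long-polling
        for msg in batch:
            t = threading.Thread(target=process, args=(msg,))
            t.start()
            threads.append(t)
    for t in threads:
        t.join()
    return {"peak": stats["peak"], "done": stats["done"]}
```

Because each worker asks only for what it can run right now, adding more consumers spreads messages across them instead of letting one machine grab a large backlog.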

On Demand Scheduler

I have a daemon that constantly polls an AWS SQS queue for messages. Once it receives a message, I need to keep increasing the visibility timeout until the message is processed.
I would like to set up an "on demand scheduler" which increases the visibility timeout of the message every X minutes or so and then stops the scheduler once the message is processed.
I have tried using the Spring Scheduler (https://spring.io/guides/gs/scheduling-tasks/) but that doesn't meet my needs since it's not on demand and runs no matter what.
This is done on a distributed system with a large fleet.
A message can take up to 10 hours to completely process.
We cannot set the default visibility timeout for the queue to be a high number (due to other reasons).
Is there a good library out there that I can leverage for this? Thanks for the help!
The maximum visibility timeout for an SQS message is 12 hours, and you are nearing that limit. Perhaps you should consider removing the message from the queue while it is being processed; if an error occurs or the need arises, you can re-queue the message.
You can set a trigger for Spring Scheduler allowing you to manually set the next execution time. Refer to this answer. This gives you more control over when the scheduled task runs.
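An "on demand" timer of the kind the question asks for can also be built without a scheduler library: start a background thread when a message is received and stop it when processing finishes. The Python sketch below is not Spring's scheduling API; `VisibilityHeartbeat` and `process_with_heartbeat` are hypothetical names, and the `extend` callback is an assumption that a real worker would wire to its SQS client.

```python
import threading
import time
from typing import Callable


class VisibilityHeartbeat:
    """Calls `extend()` every `interval_seconds` until stopped.

    The timer exists only while one message is in progress. `extend` is
    caller-supplied; in a boto3-based worker it might call
    sqs.change_message_visibility(QueueUrl=..., ReceiptHandle=..., VisibilityTimeout=...).
    """

    def __init__(self, extend: Callable[[], None], interval_seconds: float) -> None:
        self._extend = extend
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # wait() returns True as soon as stop() sets the event, ending the loop;
        # otherwise it times out after one interval and we extend again.
        while not self._stop.wait(self._interval):
            self._extend()

    def start(self) -> "VisibilityHeartbeat":
        self._thread.start()
        return self

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def __enter__(self) -> "VisibilityHeartbeat":
        return self.start()

    def __exit__(self, *exc_info) -> None:
        self.stop()  # runs even if processing raised, so the timer never leaks


def process_with_heartbeat(work: Callable[[], None], extend: Callable[[], None],
                           interval_seconds: float) -> None:
    """Run `work` while periodically extending the message's visibility."""
    with VisibilityHeartbeat(extend, interval_seconds):
        work()


if __name__ == "__main__":
    extensions = []
    # 0.35 s of simulated work with a 0.1 s interval: a few extensions, then it stops.
    process_with_heartbeat(lambda: time.sleep(0.35), lambda: extensions.append("extended"), 0.1)
    print(len(extensions))
```

The interval should be comfortably shorter than the visibility timeout each extension sets, so the message never becomes visible between two calls.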
Given the scenario, pulling a message (thus having the visibility timeout timer start) and then trying to acquire a lock was not the most feasible way to go about doing this (especially since messages can take so long to process).
Since the messages could potentially take a very long time to process (and therefore to delete), it's not feasible to keep increasing the timeout for messages you've already pulled. So we went a different way.
We first acquire a lock, then pull the message, and then increase its visibility timeout to 11 hours.
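The ordering in that last answer (lock, then pull, then a single visibility extension) can be sketched as follows. The answer does not say what the lock guards or how it is implemented, so `try_lock` and `unlock` are hypothetical callables (for example, a wrapper around some distributed lock), as are `receive`, `set_visibility`, and `process`. The constants just check the arithmetic: 11 hours is 39,600 seconds, below the 43,200-second (12-hour) SQS maximum.

```python
from typing import Callable, Optional, TypeVar

M = TypeVar("M")

MAX_VISIBILITY_SECONDS = 12 * 60 * 60     # SQS upper limit: 43,200 s
LOCKED_VISIBILITY_SECONDS = 11 * 60 * 60  # 11 hours = 39,600 s
assert LOCKED_VISIBILITY_SECONDS < MAX_VISIBILITY_SECONDS


def process_with_lock(
    try_lock: Callable[[], bool],
    unlock: Callable[[], None],
    receive: Callable[[], Optional[M]],
    set_visibility: Callable[[M, int], None],
    process: Callable[[M], None],
) -> Optional[M]:
    """Acquire the (hypothetical) lock first, then pull, then extend visibility once."""
    if not try_lock():
        return None  # nothing was pulled, so no visibility timer has started
    try:
        msg = receive()
        if msg is None:
            return None
        set_visibility(msg, LOCKED_VISIBILITY_SECONDS)  # one extension, no repeated heartbeats
        process(msg)
        return msg
    finally:
        unlock()
```

Taking the lock before receiving means the visibility clock only starts once the worker is committed to the message, which is the point the answer makes.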

Create workers dynamically (ActiveMQ)

I want to create a web application where a client calls a REST web service. The service returns an OK status to the client (with a link to the result) and creates a new message on an ActiveMQ queue. On the listener side of ActiveMQ, there should be workers that process the messages.
I'm stuck on my design because I don't really know how to determine the number of workers I need. The workers only have to call web service interfaces, so they don't need much computing power; most of the time a worker just waits for results from the called web service. But one worker cannot handle all the requests, so if the number of requests in the queue exceeds some limit (I don't know the limit yet), another worker should handle the queue.
What is the best practice for this? Should I create one worker per request and destroy it when the work is done? How can I dynamically create workers based on the queue size? Is it better to run these workers all the time or to create them when the queue requires it?
I think a topic/subscriber architecture is not reasonable, because only one worker should handle each request. Let's assume an average of 100 requests per minute and 500 requests per minute under high load.
My goal is to get results fast, so that no client has to wait for its answer just because resources are not being used properly.
Thank you
Why don't you figure out the maximum number of workers you could realistically support, then start that many and leave them running forever? I'd use a prefetch of either 0 or 1, to avoid piling up a bunch of messages in one worker's prefetch buffer while the others sit idle. Prefetch=0 pulls the next message only when the current one is finished. Prefetch=1 keeps a single message "on deck", ready to be processed without fetching it from the network, but it means a consumer might be free to consume a message and unable to, because that message is sitting in another consumer's prefetch buffer waiting for that consumer to be ready for it. I'd use prefetch=0 as long as the time to download your messages from the broker isn't unreasonable, since it spreads the workload as evenly as possible.
Then, whenever there are messages to be processed, either a worker is available to process the next message (so there's no delay), or all the workers are busy (so of course you'll have to wait, because you're at capacity, but as soon as a worker becomes available it will take the next message from the queue).
Also, you're right that you want queues (where a message will be consumed by only a single worker) not topics (where a message will be consumed by each worker).
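The "fixed pool, one message at a time" setup can be modeled in-process. In this minimal Python sketch (an illustration, not ActiveMQ client code), a fixed number of long-lived threads each take a single message from a shared queue only when they are idle, which is the in-process analogue of prefetch=0: no message waits in one worker's private buffer while another worker is free. `run_fixed_pool` and the simulated `work_seconds` delay are assumptions.

```python
import queue
import threading
import time
from collections import Counter
from typing import Iterable


def run_fixed_pool(messages: Iterable, num_workers: int, work_seconds: float = 0.01) -> Counter:
    """Start `num_workers` long-lived workers; return how many messages each handled."""
    q: "queue.Queue" = queue.Queue()
    for m in messages:
        q.put(m)
    handled_by: Counter = Counter()
    lock = threading.Lock()

    def handle(msg) -> None:
        time.sleep(work_seconds)  # e.g. waiting on the downstream web service for msg

    def worker(name: str) -> None:
        while True:
            try:
                msg = q.get_nowait()  # take exactly one message, only when idle
            except queue.Empty:
                return  # demo only: a real worker would block and wait for more
            handle(msg)
            with lock:
                handled_by[name] += 1

    threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled_by
```

Since the workers mostly wait on I/O, the pool size can be well above the number of CPU cores; the right number is whatever the downstream web service can sustain.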

When to use delay queue feature of Amazon SQS?

I understand the concept of Amazon SQS delay queues, but I wonder why they are useful.
What are SQS delay queues used for?
Thanks
One use case I can think of is distributed applications with eventual-consistency semantics. The system consuming the message may depend on something, such as a correlation identifier, becoming available, and so may need to wait a guaranteed amount of time before it can see the correlated data. In that case, it makes sense to delay the message for a certain duration.
Like you, I was confused about the use case for delay queues until I stumbled across one in my own work. My application needs an internal queue in which each item waits at least one minute between checks for completion.
So instead of managing a "last-checked-time" on every object, I just put the object's ID into an SQS queue message with a delay of 60 seconds, and my main loop becomes a simple long-poll against the queue.
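The re-check loop described above can be modeled without AWS. The Python sketch below uses a tiny in-memory `SimulatedDelayQueue` (an assumption, with a simulated clock instead of real waiting) so the control flow is visible: receive an ID, check it, and send it back with a 60-second delay if the work is not finished yet. With boto3, the re-send would be a `send_message` call with `DelaySeconds=60`.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple


class SimulatedDelayQueue:
    """In-memory model of per-message delivery delay (not SQS itself).

    A message sent with delay_seconds becomes receivable only once the
    simulated clock reaches send time + delay_seconds.
    """

    def __init__(self) -> None:
        self.now = 0.0
        self._heap: List[Tuple[float, int, str]] = []
        self._seq = 0  # tie-breaker keeps FIFO order for equal timestamps

    def send(self, body: str, delay_seconds: float = 0) -> None:
        heapq.heappush(self._heap, (self.now + delay_seconds, self._seq, body))
        self._seq += 1

    def receive(self) -> Optional[str]:
        """Return the next visible message, advancing the clock like a long poll would wait."""
        if not self._heap:
            return None
        visible_at, _, body = heapq.heappop(self._heap)
        self.now = max(self.now, visible_at)
        return body


def check_until_done(q: SimulatedDelayQueue, is_done: Callable[[str, float], bool],
                     recheck_delay: float = 60) -> Dict[str, float]:
    """Main loop: receive an ID, check it, and re-send it with a delay if unfinished."""
    finished: Dict[str, float] = {}
    while (item := q.receive()) is not None:
        if is_done(item, q.now):
            finished[item] = q.now
        else:
            q.send(item, delay_seconds=recheck_delay)  # no per-object "last checked" field needed
    return finished
```

For example, with two jobs that complete at t=100 s and t=30 s, the loop reports them done at the first check after completion: 120 s and 60 s respectively.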
A few off the top of my head:
Emails - Let's say you have a service that sends reminder emails triggered by queue messages. In that case, you'd want to delay the message until the reminder should go out.
Race conditions - Delivery delays can be used to overcome race conditions in distributed systems. For example, a service could insert a row into a table and send a message about its availability to other services. If those services can't use the new entry just yet, you have to delay publishing the SQS message.
Handling retries - Sometimes, when a message fails, you want to retry with exponential backoff. This requires re-enqueuing the message with progressively longer delays.
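For the retry case, the delay for each attempt has to be computed before re-enqueuing. A minimal Python sketch, assuming a 5-second base delay (an illustrative choice): the delay doubles on every attempt and is capped at 900 seconds, because SQS does not accept a per-message delay longer than 15 minutes. The hypothetical `retry_delay_seconds` result would be passed as `DelaySeconds` when re-sending the message.

```python
SQS_MAX_DELAY_SECONDS = 900  # SQS caps per-message delay at 15 minutes


def retry_delay_seconds(attempt: int, base: int = 5) -> int:
    """Delay before retry number `attempt` (1-based): base * 2**(attempt - 1), capped at 900 s."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(base * 2 ** (attempt - 1), SQS_MAX_DELAY_SECONDS)


if __name__ == "__main__":
    print([retry_delay_seconds(n) for n in range(1, 10)])
    # [5, 10, 20, 40, 80, 160, 320, 640, 900]
```

The attempt count itself has to travel with the message (for example, in the body or a message attribute) so the consumer knows which delay to use next.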
I've built a suite of APIs to make queue message scheduling easy. You can call our APIs to schedule queue messages, and to cancel, edit, and check the status of such messages. Think of it like a scheduler microservice.
www.schedulerapi.com
If you are looking for a solution, let me know. I've built these schedulers before at work for delivering emails at high scale, so I have experience with similar use cases.
One use-case can be:
Think of a time-critical operation, like a scheduled equity trade order.
Suppose one of your systems fetches all the orders scheduled in the next 60 minutes and puts them in a queue (to be consumed by another subsystem).
If you send these orders directly, they will be visible in the queue immediately and will be processed in queue order.
Most likely, then, they will not execute at the exact time (hour:minute:second) the customer wanted, and this will affect the outcome.
To solve this, the first subsystem adds a delay in seconds (the difference between the execution time and the current time), so the message becomes visible only after that delay, i.e., at the exact time the user wanted.
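The delay computation in that answer is simple arithmetic, but one constraint matters: SQS accepts a per-message delay of at most 900 seconds (15 minutes), so an order due 60 minutes from now cannot be delayed all the way in a single send. The Python sketch below (function names are hypothetical) rounds the delay up so the message never becomes visible early and clamps it to that range; `reaches_target` flags orders for which the consumer would have to re-send the message with the remaining delay.

```python
import math
from datetime import datetime, timedelta, timezone

SQS_MAX_DELAY_SECONDS = 900  # per-message DelaySeconds cannot exceed 15 minutes


def delay_for(execute_at: datetime, now: datetime) -> int:
    """Seconds to delay so the order becomes visible at `execute_at`.

    Rounded up so the message is never visible early, and clamped to 0..900.
    """
    seconds = math.ceil((execute_at - now).total_seconds())
    return max(0, min(seconds, SQS_MAX_DELAY_SECONDS))


def reaches_target(execute_at: datetime, now: datetime) -> bool:
    """False when the 900 s cap means the consumer must re-send with the remaining delay."""
    return (execute_at - now).total_seconds() <= SQS_MAX_DELAY_SECONDS


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    print(delay_for(now + timedelta(minutes=5), now))   # 300
    print(delay_for(now + timedelta(minutes=45), now))  # 900 (capped)
```

Even with an exact delay, the order runs only when a consumer actually receives and processes the message, so the consumer may still want to compare the current time with the order's execution time before acting.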

How to handle large Emailing queue and delivery with AWS SES?

We are developing an app that needs to handle large email queues. We plan to store emails in an SQS queue and use SES to send them, but we are a bit confused about how to actually process the queue. Should I use a cron job to regularly read the SQS queue and send emails? What would be the best way to trigger the script that sends the emails from our app?
Using SQS with SES is a great way to handle this. If something goes wrong while emailing, the request will still be on the queue and will be processed next time around.
I just use a cron job that starts my queue processing/email sending job once an hour. The job runs for an hour as a simple loop:
while I've been running < 1 hour:
    if there's a message in the queue:
        process the message
        delete the message from the queue
I set the WaitTimeSeconds parameter to its maximum (20 seconds), so each check for a new message waits up to 20 seconds for one to arrive if necessary, and the job isn't hitting AWS every few milliseconds. Otherwise, I could put a sleep statement of some kind in the loop.
The reason I run for just an hour is that the job might encounter some error that kills it, or have a memory leak, or some other unanticipated problem. This way any queued email requests will still get handled the next time the job is started.
If you want, you can start the job every fifteen minutes so you'll always have four worker processes handling queue requests. If one of them dies for some reason, you'll still be processing with the other three.
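The hour-long loop from the pseudocode above can be made runnable. In this Python sketch, `receive`, `send_email`, and `delete` are hypothetical callables you would wire to your SQS and SES clients (for example, boto3's `receive_message` with `WaitTimeSeconds=20` and `MaxNumberOfMessages=1`, and `delete_message` with the message's `ReceiptHandle`); the clock is injectable so the loop can be exercised without waiting an hour.

```python
import time
from typing import Callable, Optional, TypeVar

M = TypeVar("M")

RUN_SECONDS = 60 * 60   # run for one hour; cron starts a fresh job afterwards
WAIT_TIME_SECONDS = 20  # SQS long-poll maximum


def run_email_job(
    receive: Callable[[int], Optional[M]],
    send_email: Callable[[M], None],
    delete: Callable[[M], None],
    clock: Callable[[], float] = time.monotonic,
    run_seconds: float = RUN_SECONDS,
) -> int:
    """Process queued email requests until `run_seconds` have elapsed; return how many were sent."""
    start = clock()
    sent = 0
    while clock() - start < run_seconds:
        msg = receive(WAIT_TIME_SECONDS)  # long poll: waits up to 20 s when the queue is empty
        if msg is None:
            continue
        send_email(msg)
        # Delete only after a successful send. If send_email raises, the job stops,
        # the message is never deleted, and it becomes visible again for the next run.
        delete(msg)
        sent += 1
    return sent
```

Deleting after sending gives at-least-once delivery: a crash between the send and the delete can produce a duplicate email, which is usually preferable to losing one.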