I want to create a web application were a client calls a REST Webservice. This returns OK-Status for the client (with a link to the result) and creates a new message on an activeMQ Queue. On the listeners side of the activeMQ there should be worker who process the messages.
Iam stucking here with my concept, because i dont really know how to determine the number of workers i need. The workers only have to call web service interfaces, so no high computation power is needed for the worker itself. The most time the worker has to wait for returning results from the called webservice. But one worker can not handle all requests, so if a limit of requests in the queue is exceeded (i dont know the limit yet), another worker should treat the queue.
What is the best practise for doing this job? Should i create one worker per Request and destroying them if the work is done? How to dynamically create workers based on the queue size? Is it better to run these workers all the time or creating them when the queue requiere that?
I think a Topic/Suscriber architecture is not reasonable, because only one worker should care about one request. Lets imagine of 100 Requests per Minute average and 500 requests on high workload.
My intention is to get results fast, so no client have to wait for it answer just because not properly used ressources ...
Thank you
Why don't you figure out the max number of workers you'd realistically be able to support, and then make that number and leave them running forever? I'd use a prefetch of either 0 or 1, to avoid piling up a bunch of messages in one worker's prefetch buffer while the others sit idle. (Prefetch=0 will pull the next message when the current one is finished, whereas prefetch=1 will have a single message sitting "on deck" available to be processed without needing to get it from the network but it means that a consumer might be available to consume a message but can't because it's sitting in another consumer's prefetch buffer waiting for that consumer to be read for it). I'd use prefetch=0 as long as the time to download your messages from the broker isn't unreasonable, since it will spread the workload as evenly as possible.
Then whenever there are messages to be processed, either a worker available to process the next message (so no delay) or all the workers are processing messages (so of course you're going to have to wait because you're at capacity, but as soon as there's a worker available it will take the next message from the queue).
Also, you're right that you want queues (where a message will be consumed by only a single worker) not topics (where a message will be consumed by each worker).
Related
We're designing C# scheduled task (runs every few hours) that will run on AWS ECS instances that will grab batched transaction data for thousands of customers from an endpoint, modify the data then send it on to another web service. We will be maintaining the state of the last successful batch in a separate database (using some like created date of the transactions). We need the system to be scalable so as more customers are added we add additional ECS containers to process the data.
There are the options we're considering:
Each container only processes a specific subset of the data. As more customers are added more contains are added. We would need to maintain a logical separation of what contains are processing what customers data.
All the containers process all of the customers. We use some kind of locking flags on the database to let other processes know that the customers data is being processed.
Some other approach.
I think that option 2 is probably the best, but it adds a lot of complexity regarding the locking and unlocking of customers. Are there specific design patterns I could be pointed towards if that if the correct solution?
In both scenarios an important thing to consider is retries in case processing for a specific customer fails. One potential way to distribute jobs across a vast number of container with retries would be to use AWS SQS.
A single container would run periodically every few hours and be the job generator. It would create one SQS queued item for each customer that needs to be processed. In response to items appearing in the queue a number of "worker" containers would be spun up by ECS to consume items from the queue. This can be made to autoscale relative to the number of items in the queue to quickly spin up many containers that can work in parallel.
Each container would use its own high performance concurrent poller similar to this (https://www.npmjs.com/package/squiss) to start grabbing items from the queue and processing them. If a worker failed or crashed due to a bug then SQS will automatically redeliver and dropped queued items that worker had been working on to a different worker after they time out.
This approach would give you a great deal of flexibility, and would let you horizontally scale out the number of workers, while letting any of the workers process any jobs from the queue that it grabs. It would also ensure that every queued item gets processed at least once, and that none get dropped forever in case something crashes or goes wrong.
I've a standard AWS SQS queue and have multiple EC2 instances(~2K) actively polling that queue in an interval of 2 seconds.
I'm using the AWS Java SDK to poll the queue and using the ReceiveMessageRequest with a single message in response for each request.
My expectation is that the number of in flight messages that shown in the SQS console is the number of messages received by the consumers and not yet deleted from queue(i.e it is the number of active messages under process in an instant). But The problem is that the Number of in flight messages is very much less than the number of consumers I've at an instant. As I mentioned I've ~2K consumers but I only see In-flight messages count in aprox. 300-600 range.
Is my assumption is wrong that the in-flight messages is equal to the number of messages currently under process. Also is there any limitation in the SQS/ EC2 or the SQS Java SDK that limits the number of messages that can be processed in an instant?
This might point to a larger than expected amount of time that your hosts are NOT actively processing messages.
From your example of 2000 consumers polling at an interval of 2s, but only topping out at 600 in flight messages - some very rough math (600/2000=0.3) would indicate your hosts are only spending 30% of their time actually processing. In the simplest case, this would happen if a poll/process/delete of a message takes only 600ms, leaving average of 1400ms of idle time between deleting one message and receiving the next.
A good pattern for doing high volume message processing is to think of message processing in terms of thread pools - one for fetching messages, one for processing, and one for deleting (with a local in-memory queue to transition messages between each pool). Each pool has a very specific purpose, and can be more easily tuned to do that purpose really well:
Have enough fetchers (using the batch ReceiveMessage API) to keep your processors unblocked
Limit the size of the in-memory queue between fetchers and processors so that a single host doesn't put too many messages in flight (blocking other hosts from handling them)
Add as many processor threads as your host can handle
Keep metrics on how long processing takes, and provide ability to abort processing if it exceeds a certain time threshold (related to visibility timeout)
Use enough deleters to keep up with processing (also using the batch DeleteMessage API)
By recording metrics on each stage and the in-memory queues between each stage, you can easily pinpoint where your bottlenecks are and fine-tune the system further.
Other things to consider:
Use long polling - set WaitTimeSeconds property in the ReceiveMessage API to minimize empty responses
When you see low throughput, make sure your queue is saturated - if there are very few items in the queue and a lot of processors, many of those processors are going to sit idle waiting for messages.
Don't poll on an interval - poll as soon as you're done processing the previous messages.
Use batching to request/delete multiple messages at once, reducing time spent on round-trip calls to SQS
Generally speaking, as the number of consumers goes up, the number of messages in flight will go up as well - and each consumer can request unto 10 messages per read request - but in reality if each consumer alwaysrequests 10, they will get anywhere from 0-10 messages, especially when the number of messages is low and the number of consumers is high.
So your thinking is more or less correct, but you can't accurately predict precisely how many messages are in flight at any given time based on the number of consumers currently running, but there is a non-precise correlation between the two.
I have a SQS Listener with a max message count of 10. When my consumer receives a batch of 10 message they all get processed but sometimes (depending on the message) the process will take 5-6 hours and some with take as little as 5 minutes. I have 3 consumers (3 different JVM's) polling from the queue with a maxMessageCount of 10. Here is my issue:
If one of those 10 messages takes 5 hours to process it seems as though the listener is waiting to do the next poll of 10 messages until all of the previous messages are 100% complete. Is there a way to allow it to poll a new batch of messages even though another is still being processed?
I'm guessing that I am missing something little here. How I am using Spring Cloud library and the SqsListener annotation. Has anybody ran across this before?
Also I dont think this should matter but the queue is AWS SQS and there JVM's are running on an ECS cluster.
If you run the task on the poller thread, the next poll won't happen until the current one completes.
You can use an ExecutorChannel or QueueChannel to hand the work off to another thread (or threads) but you risk message loss if you do that.
Your situation is rather unusual; 5 hours is a long time to process a message.
You should perhaps consider redesigning your application to persist these "long running" requests to a database or similar, instead of processing them directly from the message. Or, perhaps put them in a different queue so that they don't impact the shorter tasks.
I understand the concept of delay queue of Amazon SQS, but I wonder why it is useful.
What's the usage of SQS delay queue?
Thanks
One use case which i can think of is usage in distributed applications which have eventual consistency semantics. The system consuming the message may have an dependency like a co-relation identifier to be available and hence may need to wait for certain guaranteed duration of time before seeing the co-relation data. In this case, it makes sense for the message to be delayed for certain duration of time.
Like you I was confused as to a use-case for delay queues, until I stumbled across one in my own work. My application needs to have an internal queue with each item waiting at least one minute between each check for completion.
So instead of having to manage a "last-checked-time" on every object, I just shove the object's ID into an SQS queue messagewith a delay time of 60 seconds, and my main loop then becomes a simple long-poll against the queue.
A few off the top of my head:
Emails - Let's say you have a service that sends reminder emails triggered from queue messages. You'd have to delay enqueueing the message in that case.
Race conditions - Delivery delays can be used to overcome race conditions in distributed systems. For example, a service could insert a row into a table, and sends a message about its availability to other services. They can't use the new entry just yet, so you have to delay publishing the SQS message.
Handling retries - Sometimes if a message fails you want to retry with exponential backoffs. This requires re-enqueuing the message with longer delays.
I've built a suite of API's to make queue message scheduling easy. You can call our API's to schedule queue messages, cancel, edit, and check on the status of such messages. Think of it like a scheduler microservice.
www.schedulerapi.com
If you are looking for a solution, let me know. I've built these schedulers before at work for delivering emails at high scale, so I have experience with similar use cases.
One use-case can be:
Think of a time critical expression like a scheduled equity trade order.
If one of your system is fetching all the order scheduled in next 60 minutes and putting them in queue (which will be fetched by another sub system).
If you send these order directly, then they will be visible immediately to process in queue and will be processed depending upon their order.
But most likely, they will not execute in exact time (Hour:Minute:Seconds) in which Customer wanted and this will impact the outcome.
So to solve this, what first sub system will do, it will add delay seconds (difference between current and execution time) so message will only be visible after that much delay or at exact time when user wanted.
I have heard that there is a limit for a server for the requests number it can process.
So if the requests from the client are large than the number people will queue the requests.
So I have two problems:
1 When
How to decide if it is necessary to queue the requests? How to measure the largest number?
2 How
If the queue is unavoidable, so where should be the queue done?
For a J2EE application using spring web mvc as the framework, I want to know if the queue should be put in the Controller or the Model or the DAO?
3 Is there a idea which can avoid the queue but keeping providing the service?
First you have to establish your limit at the server actually is. Its likely that its a limit on the frequency of messages, ie. maybe your limited to sending 10 requests a second. If thats the case then your would need to keep a count of how many messages you've sent out in a second, then before you send out a request check to see if you will breach this limit, if this is true then you must make the thread wait until the second is up. If not your free to send the request. This thread would be reading from a queue of outbound messages.
If the server limit is determined in an other way, i.e. dynamically based on its current load, which sounds like it might be in your case, there must be a continuous feed of request limits which you must process to determine the current limit. Once you have this limit you can process the requests in the same way as mentioned in the first paragraph.
As for where to put the queue and the associated logic, i'd put it in the controller.
I don't think there is a way to avoid the queue, you are forced to throttle your requests an therefore you must queue your outbound requests internally so that they are not lost, and will be processed at some point in the future.