I'm trying to design an autoscaling policy that scales the number of consumers listening to a queue out and in. My first instinct was to base the policy on EnqueueTime: scale out when it is high and scale in when it is low.
However, the way EnqueueTime appears in CloudWatch does not seem to match my expectations. The documentation defines EnqueueTime as:
The end-to-end latency from when a message arrives at a broker until it is delivered to a consumer.
Note:
EnqueueTime does not measure the end-to-end latency from when a message is sent by a producer until it reaches the broker, nor the latency from when a message is received by a broker until it is acknowledged by the broker. Rather, EnqueueTime is the number of milliseconds from the moment a message is received by the broker until it is successfully delivered to a consumer.
I had expected EnqueueTime to represent how long a message "waits" in the queue until it is consumed. But from the screenshot, it is not clear to me how the supposed wait time can be 1.9 s when there is nothing in the queue and no messages are being produced (EnqueueCount = 0). I also don't understand why EnqueueTime stays unchanged long after the spike in traffic (the green spike); I expected it to drop close to 0 ms once the spike was over. This matters for scaling: if the metric stays high, the policy might erroneously scale out even though there is no traffic.
I'm also new to ActiveMQ and not entirely familiar with how it operates. I would greatly appreciate it if somebody could explain what's going on here and how to properly interpret EnqueueTime.
EnqueueTime does represent how long messages "wait" in the queue until they are consumed, but it is important to note that it is an average. That makes it unlikely to fit your use case, because the relative weight of each message's individual EnqueueTime changes over time. It won't give you a reliably clear picture of the queue's load relative to consumption.
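To see how an average can sit at 1.9 s with no traffic, it helps to model it. The sketch below assumes (this is not confirmed by the source) that the broker reports a cumulative mean over all messages delivered since its statistics were last reset. Under that assumption the value only moves when a message is delivered, so it freezes when traffic stops. The class and the numbers are hypothetical.

```python
class CumulativeAverage:
    """Running mean over every message delivered since the last reset.

    Hypothetical model of a broker-side average: it only changes when a
    message is delivered, so it stays flat while the queue is idle.
    """

    def __init__(self) -> None:
        self.total_ms = 0.0
        self.count = 0

    def record(self, enqueue_ms: float) -> None:
        # Each delivered message adds its own wait time to the totals.
        self.total_ms += enqueue_ms
        self.count += 1

    @property
    def value(self) -> float:
        # No deliveries yet means there is nothing to average.
        return self.total_ms / self.count if self.count else 0.0


avg = CumulativeAverage()
for wait in [50, 60, 40]:        # quiet period: messages barely wait
    avg.record(wait)
for wait in [5000, 5000, 5000]:  # traffic spike: messages wait ~5 s
    avg.record(wait)
after_spike = avg.value
# Traffic stops: nothing is recorded, so the average cannot decay.
after_idle = avg.value
print(after_spike, after_idle)   # 2525.0 2525.0
```

Once traffic resumes, fast deliveries only dilute the old slow ones gradually, which is why such a metric lags badly as a scaling signal.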
Related
We have a messaging platform built on Akka 2.5 using Akka Cluster and Distributed Pub-Sub. The cluster currently has 25 servers.
The scenario is as follows.
Actor1, created on Server1, subscribes to topic Chat1.
Actor2, created on Server2, publishes a message to Chat1 about 100 ms after the subscription.
Sometimes the first message is not received by Actor1, but subsequent messages always are.
We concluded that this happens because a subscription takes some time to propagate to all nodes of the cluster. These are the actions we took to solve it:
Decreased gossip-interval from 1 s (the default) to 50 ms.
Added a further 400 ms delay, giving the cluster about 500 ms in total to register the subscription. This reduced the probability of the issue, but it still happens fairly often (roughly 1 in 6 times).
So, a few questions:
Is it expected for Pub-Sub to take more than 400 ms in a cluster of just 25 nodes, all on a private network in the same data centre?
Are there additional Akka settings that can reduce the time taken for subscription propagation?
What are our options for monitoring the average time Pub-Sub takes to propagate a subscription within the cluster? This would help us estimate the right delay to introduce (if one is needed at all).
If this delay is expected, are there any workarounds that others have used in the past to overcome this issue?
I have 3 SQS queues:
HighPQueue1
MediumPQueue2
LowPQueue3
Messages are inserted into the queues by an API Gateway REST API call. If a message is high priority, it goes to HighPQueue1; if it is medium, it goes to MediumPQueue2; if it is low, it goes to LowPQueue3.
The messages from these 3 queues have to be read in priority order. How can I do that using AWS?
I have thought about creating a Lambda and then checking if message is available first in HighPQueue1, then in MediumPQueue2 and then in LowPQueue3. Would that be the right approach?
I have to trigger an AWS Step Functions execution for each SQS message, depending on its priority. I want to limit my Step Functions to 10 concurrent executions at any given point in time.
You won't be able to use Lambda's built-in SQS integration for this, but you could still use Lambda if you start a new invocation every so often. I think the pattern you are suggesting (check high, then medium, then low) is correct. Here are some things to keep in mind.
Make sure when you are checking the medium and low queues that you only request one message at a time if it's important that the high queue messages are processed quickly.
After you process any message, start over. In other words, don't make the mistake of processing a high-priority item and then checking the medium queue; always start over from the high queue.
Lambda may not be your best option if you are polling queues, since you'll effectively have Lambda compute running all the time. That may still be okay if this is the only workload running and you are staying within, or close to, the free tier.
Consider handling multiple requests at the same time. Is there something in your downstream infrastructure that limits you to processing one message at a time? If not, I would skip this model entirely and go with one queue backed by Lambda, running processes in parallel when multiple messages come in.
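The "check high, then medium, then low, and start over after every message" rule, combined with the cap of 10 concurrent executions, can be sketched as below. The deques are in-memory stand-ins for the three SQS queues, `next_message` and `dispatch` are hypothetical helpers, and `running` is assumed to be the number of Step Functions executions already in progress; how you obtain that count is left open.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

# (queue name, messages) pairs, highest priority first. In real code each
# popleft() would be an SQS ReceiveMessage call with MaxNumberOfMessages=1.
Queues = List[Tuple[str, Deque[str]]]


def next_message(queues: Queues) -> Optional[Tuple[str, str]]:
    """Take one message from the highest-priority queue that has any."""
    for name, messages in queues:        # always scan High -> Medium -> Low
        if messages:
            return name, messages.popleft()
    return None


def dispatch(queues: Queues, running: int, limit: int = 10) -> List[Tuple[str, str]]:
    """Start work until `limit` executions are running or every queue is empty."""
    started = []
    while running < limit:
        item = next_message(queues)      # start over from the top every time
        if item is None:
            break
        started.append(item)             # real code: start a Step Functions execution
        running += 1
    return started


queues: Queues = [
    ("HighPQueue1", deque(["h1", "h2"])),
    ("MediumPQueue2", deque(["m1", "m2", "m3"])),
    ("LowPQueue3", deque(["l1", "l2"])),
]
first = dispatch(queues, running=7)      # 3 free slots
queues[0][1].append("h3")                # a high-priority message arrives
second = dispatch(queues, running=8)     # 2 free slots
print(first)   # [('HighPQueue1', 'h1'), ('HighPQueue1', 'h2'), ('MediumPQueue2', 'm1')]
print(second)  # [('HighPQueue1', 'h3'), ('MediumPQueue2', 'm2')]
```

Because `next_message` rescans from the top on every call, the late-arriving `h3` jumps ahead of the remaining medium and low messages.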
I have a standard AWS SQS queue and about 2K EC2 instances actively polling it at an interval of 2 seconds.
I'm using the AWS Java SDK to poll the queue, with a ReceiveMessageRequest that asks for a single message per request.
My expectation is that the number of in-flight messages shown in the SQS console is the number of messages received by consumers and not yet deleted from the queue (i.e., the number of messages actively being processed at any instant). The problem is that the number of in-flight messages is much lower than the number of consumers I have at any instant: I have ~2K consumers but only see an in-flight count of roughly 300-600.
Is my assumption wrong that the in-flight count equals the number of messages currently being processed? Also, is there any limitation in SQS, EC2, or the SQS Java SDK that limits the number of messages that can be processed at once?
This might point to a larger than expected amount of time that your hosts are NOT actively processing messages.
From your example of 2,000 consumers polling at an interval of 2 s but only topping out at 600 in-flight messages, some very rough math (600/2000 = 0.3) indicates your hosts spend only 30% of their time actually processing. In the simplest case, this would happen if a poll/process/delete cycle takes only 600 ms, leaving an average of 1,400 ms of idle time between deleting one message and receiving the next.
A good pattern for high-volume message processing is to think in terms of thread pools: one for fetching messages, one for processing, and one for deleting, with a local in-memory queue to hand messages from one pool to the next. Each pool has a very specific purpose and can more easily be tuned to do that purpose well:
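The rough math above can be written as a one-line estimate. This is a simplified model, assuming each consumer holds exactly one message from receive until delete and is otherwise idle; it ignores network latency and empty receives.

```python
def expected_in_flight(consumers: int, busy_ms: float, cycle_ms: float) -> float:
    """Rough average number of in-flight messages across all consumers.

    A consumer holds a message only while it is busy (receive -> process ->
    delete), so on average a fraction busy_ms / cycle_ms of consumers hold one.
    """
    return consumers * busy_ms / cycle_ms


print(expected_in_flight(2000, 600, 2000))  # poll every 2 s: 600.0
print(expected_in_flight(2000, 600, 600))   # poll again immediately: 2000.0
```

The second call shows why polling as soon as the previous message is done, rather than on a fixed interval, pushes the in-flight count toward the number of consumers.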
Have enough fetchers (using the batch ReceiveMessage API) to keep your processors unblocked
Limit the size of the in-memory queue between fetchers and processors so that a single host doesn't put too many messages in flight (blocking other hosts from handling them)
Add as many processor threads as your host can handle
Keep metrics on how long processing takes, and provide ability to abort processing if it exceeds a certain time threshold (related to visibility timeout)
Use enough deleters to keep up with processing (also using the batch DeleteMessage API)
By recording metrics on each stage and the in-memory queues between each stage, you can easily pinpoint where your bottlenecks are and fine-tune the system further.
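A minimal, self-contained sketch of the fetch/process/delete pipeline using Python threads and `queue.Queue`. SQS itself is replaced by hypothetical stand-ins (`fetch_batch`, `delete_batch`), and the pool sizes are arbitrary. The bounded `to_process` queue plays the role of the size-limited in-memory buffer: with real SQS, a message sitting in it is already in flight, so keeping it small stops one host from hoarding messages.

```python
import queue
import threading
from typing import List

# Hypothetical stand-in for the SQS queue; a real fetcher would call
# ReceiveMessage (up to 10 messages, long polling) instead of fetch_batch().
SOURCE = [f"msg-{i}" for i in range(25)]
source_lock = threading.Lock()
deleted: List[str] = []
deleted_lock = threading.Lock()
STOP = object()  # sentinel that tells a worker to exit


def fetch_batch(max_messages: int = 10) -> List[str]:
    with source_lock:
        batch = SOURCE[:max_messages]
        del SOURCE[:max_messages]
    return batch


def delete_batch(batch: List[str]) -> None:
    # Stand-in for DeleteMessageBatch (at most 10 entries per call).
    with deleted_lock:
        deleted.extend(batch)


def fetcher(to_process: queue.Queue) -> None:
    while True:
        batch = fetch_batch()
        if not batch:
            return
        for msg in batch:
            to_process.put(msg)  # blocks while the bounded buffer is full


def processor(to_process: queue.Queue, to_delete: queue.Queue) -> None:
    while True:
        msg = to_process.get()
        if msg is STOP:
            return
        # ... real work goes here; time it to feed your metrics ...
        to_delete.put(msg)


def deleter(to_delete: queue.Queue) -> None:
    batch: List[str] = []
    while True:
        msg = to_delete.get()
        if msg is STOP:
            break
        batch.append(msg)
        if len(batch) == 10:
            delete_batch(batch)
            batch = []
    if batch:
        delete_batch(batch)  # flush the remainder on shutdown


def run(fetchers: int = 2, processors: int = 4, deleters: int = 1, buffer: int = 5) -> None:
    to_process: queue.Queue = queue.Queue(maxsize=buffer)  # caps messages held locally
    to_delete: queue.Queue = queue.Queue()
    fetch_threads = [threading.Thread(target=fetcher, args=(to_process,)) for _ in range(fetchers)]
    work_threads = [threading.Thread(target=processor, args=(to_process, to_delete)) for _ in range(processors)]
    delete_threads = [threading.Thread(target=deleter, args=(to_delete,)) for _ in range(deleters)]
    for t in fetch_threads + work_threads + delete_threads:
        t.start()
    for t in fetch_threads:
        t.join()  # the source is drained
    for _ in work_threads:
        to_process.put(STOP)  # queued after every message, so nothing is skipped
    for t in work_threads:
        t.join()
    for _ in delete_threads:
        to_delete.put(STOP)
    for t in delete_threads:
        t.join()


run()
print(len(deleted))  # 25
```

To add per-stage metrics, you could time each stage and periodically sample `to_process.qsize()` and `to_delete.qsize()`: a buffer that is always full points at slow processors, and one that is always empty points at slow fetchers.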
Other things to consider:
Use long polling - set the WaitTimeSeconds parameter on ReceiveMessage to minimize empty responses
When you see low throughput, make sure your queue is saturated - if there are very few items in the queue and a lot of processors, many of those processors are going to sit idle waiting for messages.
Don't poll on an interval - poll as soon as you're done processing the previous messages.
Use batching to request/delete multiple messages at once, reducing time spent on round-trip calls to SQS
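The long-polling, batching, and poll-immediately suggestions combine into a short loop body. This sketch assumes a boto3 SQS client (`receive_message` and `delete_message_batch` are real boto3 methods); `consume_once`, `FakeSQS`, and the queue URL are hypothetical, with the fake there only so the example runs without AWS. Failed batch deletes and handler exceptions are not dealt with here.

```python
from typing import Any, Callable, Dict, List


def consume_once(sqs: Any, queue_url: str, handle: Callable[[str], None]) -> int:
    """Long-poll for up to 10 messages, process them, then batch-delete them.

    `sqs` is assumed to be a boto3 SQS client, i.e. boto3.client("sqs").
    Returns how many messages were processed.
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # batching: up to 10 messages per call
        WaitTimeSeconds=20,      # long polling: wait up to 20 s for a message
    )
    messages: List[Dict[str, Any]] = resp.get("Messages", [])
    entries = []
    for i, msg in enumerate(messages):
        handle(msg["Body"])
        entries.append({"Id": str(i), "ReceiptHandle": msg["ReceiptHandle"]})
    if entries:
        # One round trip deletes the whole batch. The response's "Failed"
        # list is ignored here; production code should check it.
        sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)
    return len(messages)


class FakeSQS:
    """Hypothetical in-memory stand-in so the sketch runs without AWS."""

    def __init__(self, bodies: List[str]) -> None:
        self.bodies = list(bodies)
        self.deleted: List[str] = []

    def receive_message(self, QueueUrl: str, MaxNumberOfMessages: int, WaitTimeSeconds: int) -> Dict[str, Any]:
        batch, self.bodies = self.bodies[:MaxNumberOfMessages], self.bodies[MaxNumberOfMessages:]
        return {"Messages": [{"Body": b, "ReceiptHandle": f"rh-{b}"} for b in batch]} if batch else {}

    def delete_message_batch(self, QueueUrl: str, Entries: List[Dict[str, str]]) -> None:
        self.deleted.extend(e["ReceiptHandle"] for e in Entries)


fake = FakeSQS([f"job-{i}" for i in range(13)])
seen: List[str] = []
# Poll again as soon as the previous batch is done, rather than on a timer.
counts = [consume_once(fake, "queue-url", seen.append) for _ in range(3)]
print(counts)  # [10, 3, 0]
```

In a real worker you would call `consume_once` in a `while True` loop; long polling keeps that loop from spinning on empty responses.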
Generally speaking, as the number of consumers goes up, the number of messages in flight goes up as well, and each consumer can request up to 10 messages per receive call. In reality, though, a consumer that always requests 10 will get anywhere from 0 to 10 messages, especially when the number of messages is low and the number of consumers is high.
So your thinking is more or less correct, but you can't accurately predict how many messages are in flight at any given time from the number of consumers currently running; there is only an imprecise correlation between the two.
I understand the concept of delay queue of Amazon SQS, but I wonder why it is useful.
What's the usage of SQS delay queue?
One use case I can think of is in distributed applications with eventual-consistency semantics. The system consuming the message may depend on something, such as the data behind a correlation identifier, becoming available, and so may need to wait a guaranteed minimum amount of time before it can see the correlated data. In that case, it makes sense to delay the message for that duration.
Like you, I was confused about the use case for delay queues until I stumbled across one in my own work. My application needs an internal queue in which each item waits at least one minute between checks for completion.
So instead of having to manage a "last-checked-time" on every object, I just put the object's ID into an SQS queue message with a delay of 60 seconds, and my main loop becomes a simple long poll against the queue.
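The recheck loop described above can be simulated locally. `FakeDelayQueue` is a hypothetical in-memory stand-in for an SQS queue with per-message delays (with real SQS you would send with `DelaySeconds=60`), a simulated clock replaces the long poll, and the job names and finish times are made up.

```python
import heapq
from typing import Dict, List, Tuple


class FakeDelayQueue:
    """In-memory stand-in for an SQS queue that honours per-message delays.

    A message sent with delay_seconds=d becomes receivable once the clock
    reaches send time + d; with real SQS this is SendMessage(DelaySeconds=d).
    """

    def __init__(self) -> None:
        self.now = 0  # simulated clock, in seconds
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = 0  # tie-breaker keeps send order for equal times

    def send(self, body: str, delay_seconds: int) -> None:
        heapq.heappush(self._heap, (self.now + delay_seconds, self._seq, body))
        self._seq += 1

    def receive(self) -> List[str]:
        ready = []
        while self._heap and self._heap[0][0] <= self.now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


def run_checks(finish_at: Dict[str, int], q: FakeDelayQueue, until: int) -> List[Tuple[int, str]]:
    """Check each job at most once a minute until it reports completion."""
    checks = []
    for job_id in finish_at:
        q.send(job_id, delay_seconds=60)
    while q.now <= until:
        for job_id in q.receive():       # the "long poll" in the main loop
            checks.append((q.now, job_id))
            if q.now < finish_at[job_id]:
                q.send(job_id, delay_seconds=60)  # not done: look again later
        q.now += 1                       # advance the simulated clock
    return checks


print(run_checks({"job-a": 30, "job-b": 150}, FakeDelayQueue(), until=200))
# [(60, 'job-a'), (60, 'job-b'), (120, 'job-b'), (180, 'job-b')]
```

No per-object timestamp is stored anywhere: the queue itself remembers when each item is next due.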
A few off the top of my head:
Emails - Let's say you have a service that sends reminder emails triggered by queue messages. Without a delivery delay, you'd have to hold back enqueueing each message until its reminder is due.
Race conditions - Delivery delays can be used to overcome race conditions in distributed systems. For example, a service could insert a row into a table and send a message announcing it to other services. If those services can't use the new entry just yet, you delay the SQS message until they can.
Handling retries - Sometimes when a message fails you want to retry with exponential backoff, which requires re-enqueuing the message with progressively longer delays.
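For the retry case, the delay for each attempt can be computed and passed as DelaySeconds when the message is re-sent. A minimal sketch, assuming a hypothetical 5-second base that doubles per attempt; since SQS accepts DelaySeconds only up to 900 seconds (15 minutes), the value is capped there. Adding random jitter is a common refinement not shown here.

```python
MAX_DELAY_SECONDS = 900  # SQS rejects DelaySeconds above 15 minutes


def retry_delay(attempt: int, base_seconds: int = 5) -> int:
    """Delay for retry number `attempt` (starting at 0): base * 2**attempt, capped."""
    return min(base_seconds * 2 ** attempt, MAX_DELAY_SECONDS)


print([retry_delay(a) for a in range(10)])
# [5, 10, 20, 40, 80, 160, 320, 640, 900, 900]
```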
I've built a suite of APIs to make queue message scheduling easy. You can call our APIs to schedule, cancel, and edit queue messages, and to check on their status. Think of it as a scheduler microservice.
www.schedulerapi.com
If you are looking for a solution, let me know. I've built these schedulers before at work for delivering emails at high scale, so I have experience with similar use cases.
One use-case can be:
Think of a time-critical action such as a scheduled equity trade order.
Suppose one of your systems fetches all orders scheduled in the next 60 minutes and puts them in a queue, which another subsystem consumes.
If you send these orders directly, they become visible in the queue immediately and are processed in queue order.
But most likely they will not execute at the exact time (hours:minutes:seconds) the customer wanted, and this will affect the outcome.
To solve this, the first subsystem adds a delay equal to the difference between the execution time and the current time, so each message only becomes visible after that delay, i.e. at the time the user wanted. (SQS caps this delay at 15 minutes, so orders further out must be delayed again when they surface.)
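Computing the delay for a scheduled order is simple date arithmetic, sketched below with a hypothetical `order_delay` helper and example dates. Because DelaySeconds tops out at 900 seconds, an order more than 15 minutes away gets the maximum delay plus a flag saying it must be delayed again when it surfaces. Delays are whole seconds and delivery still depends on when consumers poll, so execution lands near, not exactly at, the scheduled second.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

MAX_DELAY_SECONDS = 900  # SQS rejects DelaySeconds above 15 minutes


def order_delay(execute_at: datetime, now: datetime) -> Tuple[int, bool]:
    """Return (DelaySeconds to use, whether the message must be delayed again).

    Orders more than 15 minutes out can't be delayed in one step, so they
    surface early and need to be re-sent with a fresh delay.
    """
    remaining = max(0, int((execute_at - now).total_seconds()))  # whole seconds
    if remaining > MAX_DELAY_SECONDS:
        return MAX_DELAY_SECONDS, True
    return remaining, False


now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(order_delay(now + timedelta(minutes=10), now))  # (600, False)
print(order_delay(now + timedelta(minutes=45), now))  # (900, True)
print(order_delay(now - timedelta(seconds=5), now))   # (0, False)
```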
I am new to Amazon Web Services and am currently trying to get my head around how Simple Queue Service (SQS) works.
The ReceiveMessage documentation says the following:
Short poll is the default behavior where a weighted random set of machines is sampled on a ReceiveMessage call. This means only the messages on the sampled machines are returned. If the number of messages in the queue is small (less than 1000), it is likely you will get fewer messages than you requested per ReceiveMessage call. If the number of messages in the queue is extremely small, you might not receive any messages in a particular ReceiveMessage response; in which case you should repeat the request.
As I understand it, there is one queue and many machines/instances can read its messages. What is not clear to me is what "weighted random set of machines" means. Is there more than one queue spread across a number of machines? Clearly I am lacking some knowledge of how SQS works.
I believe this means that because SQS is geographically distributed, not all of the machines (Amazon's servers that hold your queue) have exactly the same queue content at every instant, since they aren't always in sync with each other.
You don't know or control which of Amazon's servers your messages are served from; SQS uses an algorithm to decide which messages are returned when you request some. That is why you don't always get messages when you ask for them, and why the same message is occasionally delivered more than once. Make sure your processing can cope with a message that has already been processed by another of your worker machines.
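The difference between short and long polling can be illustrated with a toy model. This is not how SQS is implemented internally: it assumes messages are spread across a few hypothetical servers and that a short poll samples half of them uniformly (the real sampling is weighted, with weights the docs don't specify), while a long poll, per the SQS documentation, queries all servers. It shows why a short poll on a nearly empty queue can return fewer messages than exist, or none at all.

```python
import random
from typing import Dict, List


def short_poll(servers: Dict[str, List[str]], max_messages: int, rng: random.Random) -> List[str]:
    """Look at a random half of the servers only (uniform, not weighted)."""
    names = sorted(servers)
    sampled = rng.sample(names, k=max(1, len(names) // 2))
    found = [msg for name in sampled for msg in servers[name]]
    return found[:max_messages]


def long_poll(servers: Dict[str, List[str]], max_messages: int) -> List[str]:
    """Look at every server, as SQS does when WaitTimeSeconds > 0."""
    found = [msg for name in sorted(servers) for msg in servers[name]]
    return found[:max_messages]


# Hypothetical layout: a small queue whose 2 messages sit on 2 of 4 servers.
servers = {"s1": ["a"], "s2": [], "s3": ["b"], "s4": []}
rng = random.Random(7)
print([short_poll(servers, 10, rng) for _ in range(5)])  # each call sees only the sampled servers
print(long_poll(servers, 10))                            # ['a', 'b']
```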