I know that the maximum number of consumer groups we can have in an Event Hub is 20, and the maximum number of partitions is 32. And with EventProcessorHost, there is only one active reader per consumer group per partition. So I wanted to know the maximum number of consumers that can possibly read simultaneously from an Event Hub.
It is recommended to have a maximum of one consumer (belonging to one consumer group) processing events from one partition at a time. However, the Event Hub service supports a maximum of 5 consumers per consumer group concurrently receiving events from one partition. But obviously, since they are subscribed to the same partition and belong to the same consumer group, they would be reading the same data unless each consumer maintains and reads from a different offset.
You can refer to this article from Azure docs to confirm this.
Also, this blog presents a nice code snippet to test out this support for up to 5 concurrent consumers per partition.
So for your figures, I think, theoretically, that would make 20 (consumer groups) × 5 (consumers per group) × 32 (partitions) = 3200 active consumers running concurrently.
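If you want to see the up-to-5-readers-per-partition behaviour yourself, here is a minimal sketch using the newer azure-messaging-eventhubs Java client rather than EventProcessorHost (the connection string, hub name, partition id, and offsets are placeholders). Two receivers in the same consumer group attach to the same partition, each starting from its own position:

```java
import com.azure.messaging.eventhubs.EventHubClientBuilder;
import com.azure.messaging.eventhubs.EventHubConsumerClient;
import com.azure.messaging.eventhubs.models.EventPosition;
import com.azure.messaging.eventhubs.models.PartitionEvent;

public class SamePartitionReaders {
    public static void main(String[] args) {
        String connectionString = "<event-hub-namespace-connection-string>"; // placeholder
        String eventHubName = "<event-hub-name>";                            // placeholder

        // Two consumers in the SAME consumer group, both attached to partition "0".
        // The service allows up to 5 of these concurrently; each tracks its own position.
        EventHubConsumerClient consumerA = new EventHubClientBuilder()
                .connectionString(connectionString, eventHubName)
                .consumerGroup("$Default")
                .buildConsumerClient();
        EventHubConsumerClient consumerB = new EventHubClientBuilder()
                .connectionString(connectionString, eventHubName)
                .consumerGroup("$Default")
                .buildConsumerClient();

        // consumerA starts from the beginning of the partition...
        for (PartitionEvent e : consumerA.receiveFromPartition("0", 10, EventPosition.earliest())) {
            System.out.println("A read offset " + e.getData().getOffset());
        }
        // ...while consumerB starts from an arbitrary later offset, so the two read different data.
        for (PartitionEvent e : consumerB.receiveFromPartition("0", 10, EventPosition.fromOffset(1000L))) {
            System.out.println("B read offset " + e.getData().getOffset());
        }

        consumerA.close();
        consumerB.close();
    }
}
```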
Within my project, I have a group of remote nodes with data on them that needs to be downloaded. I have set up a FIFO SQS queue and am able to push and pull messages to/from it just fine to download the data.
Because these nodes are remote, they could have limited bandwidth so I use a MessageGroupId to enforce no more than 2 messages in flight per individual node to ensure we get no more than 2 concurrent connections at any given time.
Unfortunately, it seems that the only option available when calling receiveMessage() is MaxNumberOfMessages, which ranges from 1-10 but is also equal to the number of messages allowed per MessageGroupId in the response. So this means my receiveMessage() calls have to request 2 or fewer messages in order to prevent more than 2 concurrent connections to my remote nodes at once.
So my question here is, am I wrong? Someone please tell me I'm wrong and show me an option where I can set MaxNumberOfMessages = 10 and something like MessageGroupIdMax to 2 or something. I would prefer to pull 10 messages at a time and know I am only getting 2 per MessageGroupId so I don't have to call the queue so often.
Thanks in advance!
So we hired a consultant to basically confirm what I assumed. You cannot separate the number of messages received from the max number per MessageGroupId; they are one and the same. To handle hundreds of message groups at one apiece, you just need a ton of workers processing the queue.
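Following that conclusion, here is a rough sketch of the brute-force approach using the AWS SDK for Java v1 (the queue URL is a placeholder): each worker caps its receive at 2 messages, and you scale throughput by running many such workers.

```java
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class FifoWorker {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/downloads.fifo"; // placeholder

        while (true) {
            // Cap each receive at 2 so this worker never holds more than 2 in-flight
            // messages for whatever message group (node) SQS hands it.
            ReceiveMessageRequest request = new ReceiveMessageRequest(queueUrl)
                    .withMaxNumberOfMessages(2)
                    .withWaitTimeSeconds(20); // long polling to avoid empty responses

            for (Message message : sqs.receiveMessage(request).getMessages()) {
                // download from the node identified in the message body, then delete
                sqs.deleteMessage(queueUrl, message.getReceiptHandle());
            }
        }
    }
}
```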
What are shards and partition keys in a Kinesis data stream? I read the AWS documents but I don't get it. Can someone explain them in simple terms?
From Amazon Kinesis Data Streams Terminology and Concepts - Amazon Kinesis Data Streams:
A shard is a uniquely identified sequence of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity. Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second and up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). The data capacity of your stream is a function of the number of shards that you specify for the stream. The total capacity of the stream is the sum of the capacities of its shards.
So, a shard has two purposes:
A certain amount of capacity/throughput
An ordered list of messages
If your application must process all messages in order, then you can only use one shard. Think of it as a line at a bank — if there is one line, then everybody gets served in order.
However, if messages only need to be ordered for a certain subset of messages, they can be sent to separate shards. For example, multiple lines in a bank, where each line gets served in order. Or, think of buses sending GPS coordinates. Each bus sends messages to only a single shard. A shard might contain messages from multiple buses, but each bus only sends to one shard. This way, when the messages from that shard are processed, all messages from a particular bus are processed in order.
This is controlled by using a Partition Key, which identifies the source. The partition key is hashed and assigned to a shard. Thus, all messages with the same partition key will go to the same shard.
At the back end, there is typically one worker per shard processing the messages, in order, from that shard.
If your system does not care about preserving message order, then use a random partition key. This means each message can be sent to any shard.
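To make the bus example concrete, here is a small hypothetical sketch with the AWS SDK for Java v1 (the stream name and payloads are placeholders): the bus id is used as the partition key when ordering matters, and a random key is used when it doesn't.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.PutRecordResult;

public class BusLocationProducer {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        // Partition key = bus id: every record for "bus-42" hashes to the same shard,
        // so its GPS coordinates are processed in order.
        PutRecordRequest ordered = new PutRecordRequest()
                .withStreamName("bus-locations")               // placeholder stream name
                .withPartitionKey("bus-42")
                .withData(ByteBuffer.wrap("lat=52.1,lon=4.3".getBytes(StandardCharsets.UTF_8)));
        PutRecordResult result = kinesis.putRecord(ordered);
        System.out.println("wrote to shard " + result.getShardId());

        // If ordering does not matter, a random partition key spreads records across shards.
        PutRecordRequest unordered = new PutRecordRequest()
                .withStreamName("bus-locations")
                .withPartitionKey(UUID.randomUUID().toString())
                .withData(ByteBuffer.wrap("lat=52.2,lon=4.4".getBytes(StandardCharsets.UTF_8)));
        kinesis.putRecord(unordered);
    }
}
```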
I have a standard AWS SQS queue and multiple EC2 instances (~2K) actively polling that queue at an interval of 2 seconds.
I'm using the AWS Java SDK to poll the queue and using the ReceiveMessageRequest with a single message in response for each request.
My expectation is that the number of in-flight messages shown in the SQS console is the number of messages received by the consumers and not yet deleted from the queue (i.e. the number of messages actively being processed at a given instant). But the problem is that the number of in-flight messages is much less than the number of consumers I have at an instant. As I mentioned, I have ~2K consumers but I only see an in-flight message count in approximately the 300-600 range.
Is my assumption wrong that the number of in-flight messages equals the number of messages currently being processed? Also, is there any limitation in SQS/EC2 or the SQS Java SDK that limits the number of messages that can be processed at an instant?
This might point to a larger than expected amount of time that your hosts are NOT actively processing messages.
From your example of 2000 consumers polling at an interval of 2s, but only topping out at 600 in-flight messages - some very rough math (600/2000 = 0.3) would indicate your hosts are only spending 30% of their time actually processing. In the simplest case, this would happen if a poll/process/delete of a message takes only 600ms, leaving an average of 1400ms of idle time between deleting one message and receiving the next.
A good pattern for high-volume message processing is to think of it in terms of thread pools - one for fetching messages, one for processing, and one for deleting (with a local in-memory queue to transition messages between each pool); see the sketch after this list. Each pool has a very specific purpose, and can be more easily tuned to do that purpose really well:
Have enough fetchers (using the batch ReceiveMessage API) to keep your processors unblocked
Limit the size of the in-memory queue between fetchers and processors so that a single host doesn't put too many messages in flight (blocking other hosts from handling them)
Add as many processor threads as your host can handle
Keep metrics on how long processing takes, and provide ability to abort processing if it exceeds a certain time threshold (related to visibility timeout)
Use enough deleters to keep up with processing (also using the batch DeleteMessage API)
By recording metrics on each stage and the in-memory queues between each stage, you can easily pinpoint where your bottlenecks are and fine-tune the system further.
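A rough sketch of that pipeline with the AWS SDK for Java v1 (queue URL, pool sizes, and buffer bounds are placeholder values; metrics, error handling, and visibility-timeout management are omitted):

```java
import java.util.concurrent.*;

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class PipelinedConsumer {
    private static final String QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work"; // placeholder

    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

        // Bounded hand-off queues between stages; the bound keeps one host from
        // hoarding too many in-flight messages.
        BlockingQueue<Message> toProcess = new ArrayBlockingQueue<>(50);
        BlockingQueue<Message> toDelete = new ArrayBlockingQueue<>(50);

        ExecutorService fetchers = Executors.newFixedThreadPool(2);
        ExecutorService processors = Executors.newFixedThreadPool(16);
        ExecutorService deleters = Executors.newFixedThreadPool(2);

        // Stage 1: fetch in batches of 10 with long polling.
        fetchers.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                ReceiveMessageRequest request = new ReceiveMessageRequest(QUEUE_URL)
                        .withMaxNumberOfMessages(10)
                        .withWaitTimeSeconds(20);
                for (Message m : sqs.receiveMessage(request).getMessages()) {
                    toProcess.put(m); // blocks when the local buffer is full
                }
            }
            return null;
        });

        // Stage 2: process; many threads pull from the shared buffer.
        for (int i = 0; i < 16; i++) {
            processors.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    Message m = toProcess.take();
                    handle(m);           // your business logic; keep it under the visibility timeout
                    toDelete.put(m);
                }
                return null;
            });
        }

        // Stage 3: delete (one-by-one for brevity; the batch DeleteMessage API is better).
        deleters.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                Message m = toDelete.take();
                sqs.deleteMessage(QUEUE_URL, m.getReceiptHandle());
            }
            return null;
        });
    }

    private static void handle(Message m) {
        // placeholder for real processing
        System.out.println("processed " + m.getMessageId());
    }
}
```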
Other things to consider:
Use long polling - set WaitTimeSeconds property in the ReceiveMessage API to minimize empty responses
When you see low throughput, make sure your queue is saturated - if there are very few items in the queue and a lot of processors, many of those processors are going to sit idle waiting for messages.
Don't poll on an interval - poll as soon as you're done processing the previous messages.
Use batching to request/delete multiple messages at once, reducing time spent on round-trip calls to SQS (see the sketch below)
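To illustrate the last two points, a small sketch (again AWS SDK for Java v1, placeholder queue URL) combining long polling with batched receive and delete:

```java
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.DeleteMessageBatchRequestEntry;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class BatchedPolling {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/work"; // placeholder

        // One long-polled call fetches up to 10 messages instead of 10 short-polled calls.
        List<Message> messages = sqs.receiveMessage(new ReceiveMessageRequest(queueUrl)
                .withMaxNumberOfMessages(10)
                .withWaitTimeSeconds(20)).getMessages();

        // ... process the messages ...

        // One batch call deletes all of them, instead of up to 10 DeleteMessage round trips.
        List<DeleteMessageBatchRequestEntry> entries = new ArrayList<>();
        for (int i = 0; i < messages.size(); i++) {
            entries.add(new DeleteMessageBatchRequestEntry(
                    Integer.toString(i), messages.get(i).getReceiptHandle()));
        }
        if (!entries.isEmpty()) {
            sqs.deleteMessageBatch(queueUrl, entries);
        }
    }
}
```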
Generally speaking, as the number of consumers goes up, the number of messages in flight will go up as well - and each consumer can request up to 10 messages per read request - but in reality, if each consumer always requests 10, it will get anywhere from 0-10 messages, especially when the number of messages is low and the number of consumers is high.
So your thinking is more or less correct, but you can't precisely predict how many messages will be in flight at any given time based on the number of consumers currently running; there is only a loose correlation between the two.
We currently have an application that receives a large amount of sensor data. Each sensor has its own unique sensor id (eg '5834f7718273f92cc326f620') and emits its status at different intervals. The processing order of the messages that come in is not important; for example, a newer message from one sensor can be processed before an older message from another sensor. What does matter, though, is that each message for a given sensor must be processed sequentially, in the order that they arrived in the stream.
I have taken a look at the Kinesis Client Library and understand that KCL pushes messages to a single processor per shard. Does this mean that if a stream has only one shard it will have only one processor, and couldn't this create a bottleneck? Or does KCL have more than one processor and somehow, perhaps by using the partition key, ensure that messages with the same partition key are never processed concurrently?
Note: We have taken a look at SQS FIFO, but ruled it out as the 300 messages per second limit would soon become an issue.
Yes, each shard can only have one processor at a given moment (per application).
But, you can use the sensor id as the partition key for your kinesis put record request. (see here)
This will make sure that all of this sensor's events get into the same shard and are handled by the same processor.
If you do that, you'll be able to scale your processors and shards and still have each sensor's events processed by a single processor.
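For the consuming side, here is a rough sketch of a KCL 1.x record processor (using the v2 IRecordProcessor interface; the class name and logging are illustrative). KCL creates one such processor per shard, and because the partition key is the sensor id, each sensor's records arrive at exactly one of these processors, in order:

```java
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessor;
import com.amazonaws.services.kinesis.clientlibrary.types.InitializationInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ProcessRecordsInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ShutdownInput;
import com.amazonaws.services.kinesis.model.Record;

public class SensorRecordProcessor implements IRecordProcessor {
    private String shardId;

    @Override
    public void initialize(InitializationInput initializationInput) {
        // KCL creates one processor instance per shard (per application).
        shardId = initializationInput.getShardId();
    }

    @Override
    public void processRecords(ProcessRecordsInput processRecordsInput) {
        // Records arrive in order for this shard. Because the partition key is the
        // sensor id, all records for one sensor land here and are handled sequentially.
        for (Record record : processRecordsInput.getRecords()) {
            String sensorId = record.getPartitionKey();
            String payload = StandardCharsets.UTF_8.decode(record.getData()).toString();
            System.out.println(shardId + " " + sensorId + " " + payload);
        }
        try {
            processRecordsInput.getCheckpointer().checkpoint();
        } catch (Exception e) {
            // retry / log in a real application
        }
    }

    @Override
    public void shutdown(ShutdownInput shutdownInput) {
        // checkpoint on a TERMINATE shutdown in a real application
    }
}
```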
Is there any attempt to keep adjacent shards together when spreading them out over multiple workers? In the documentation example it started with 1 worker/instance and 4 shards. Then auto-scaling occurred and a 2nd worker/instance was started up. The KCL auto-magically moved 2 shards over to worker 2. Is there any attempt at keeping adjacent shards together with a worker when autoscaling? What about when splitting shards?
Thanks
Random.
If by "Worker" you mean "Kinesis Consumer Application", then the consumer application with the most shards loses 1 shard to another application that has fewer shards.
"Lease" is the correct term here; it describes a consumer application & shard association. And there is no adjacency check for taking leases - it is purely random.
See source code, chooseLeaseToSteal method: https://github.com/awslabs/amazon-kinesis-client/blob/c6e393c13ec348f77b8b08082ba56823776ee48a/src/main/java/com/amazonaws/services/kinesis/leases/impl/LeaseTaker.java#L414
Is there any attempt to keep adjacent shards together when spreading them out over multiple workers?
I doubt that's the case. My understanding is that order is maintained only within the boundary of a single key and the boundary of a single key falls within a single shard.
Imagine I have 2 keys, key-a and key-b, and the following events occurred:
["event-1-key-a", "event-2-key-b", "event-3-key-a"]
Now we have 2 events for key-a: ["event-1-key-a", "event-3-key-a"]
and 1 event for key-b: ["event-2-key-b"]
Note that sharding happens exactly like the above -- the 2 events for key-a will always end up in the same shard. With that being the guarantee, maintaining the order among shards is not necessary.
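The assignment is deterministic because Kinesis hashes the partition key with MD5 and maps the 128-bit result onto the hash-key range owned by one of the shards. A toy sketch in Java (the 2-shard split point is made up purely for illustration):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PartitionKeyHashing {
    // Kinesis reads the MD5 of the partition key as a 128-bit integer and places
    // the record in whichever shard owns that point of the hash-key range.
    static BigInteger hashKey(String partitionKey) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Toy 2-shard stream: shard 0 owns [0, 2^127), shard 1 owns [2^127, 2^128).
        BigInteger splitPoint = BigInteger.valueOf(2).pow(127);

        for (String key : new String[] {"key-a", "key-a", "key-b"}) {
            int shard = hashKey(key).compareTo(splitPoint) < 0 ? 0 : 1;
            System.out.println(key + " -> shard " + shard);
        }
        // The hash is deterministic, so both "key-a" events map to the same shard.
    }
}
```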