Azure Event Hub - Outgoing messages/bytes don't increase after increasing throughput units and IEventProcessor instances - azure-eventhub

I have an EventHub instance with 200 partitions (Dedicated cluster). Originally, I have a consumer group with 70 instances of IEventProcessor + 1 throughput unit.
It appears that I can only get 30M outgoing messages per hour, while incoming is double that amount. So I increased to 20 throughput units and 100 processor instances, but the outgoing messages don't increase beyond 30M. I don't see any throttle messages.
Are there other EventHub limits that I should adjust here?
EDIT 1:
After setting prefetch and batch size to 1000, I still only see a moderate increase: Imgur

A couple of things I can recommend checking and doing:
Increase batch and prefetch size (a sketch follows this list).
Check client-side resources such as CPU and available memory; high utilization of either can become a bottleneck.
If hosts are in a different region than the Event Hub, network latency can slow down receives. Co-locate hosts with the Event Hub if so.
Consider creating a support ticket so the product group (PG) can do a proper investigation.
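For the first item, here is a minimal sketch of what raising those knobs looks like with the Python azure-eventhub v5 SDK; the question uses the older .NET IEventProcessor host, which (if I recall correctly) exposes equivalent MaxBatchSize and PrefetchCount settings via EventProcessorOptions. The connection string and hub name below are placeholders:

    from azure.eventhub import EventHubConsumerClient

    # Placeholders -- substitute your own namespace connection string and hub name.
    client = EventHubConsumerClient.from_connection_string(
        conn_str="<event-hubs-namespace-connection-string>",
        consumer_group="$Default",
        eventhub_name="<event-hub-name>",
    )

    def on_event_batch(partition_context, events):
        # Process the whole batch here; checkpoint per batch rather than per
        # event if a checkpoint store is configured on the client.
        print(f"partition {partition_context.partition_id}: {len(events)} events")

    # receive_batch blocks until the client is closed.
    with client:
        client.receive_batch(
            on_event_batch,
            max_batch_size=1000,  # hand up to 1000 events to each callback
            prefetch=1000,        # keep up to 1000 events buffered per partition
        )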

Related

AWS Application Load Balancer (ALB): How many http2 persistent-connections can it keep alive at the same time?

Essentially what the subject says.
I'm new to this sport and need some high-level pieces of information to figure out the behaviour of ALB towards http2 persistent connections.
I know that ALB supports http2 persistent connections:
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/application-load-balancers.html#connection-idle-timeout
I can't find anything in the docs explaining how the size(s) of the http2-connection-pools (maintained by ALB) are configured (if at all). Any links on this specific aspect?
Does the ALB, by default, maintain a fixed-size http2-connection-pool between itself and the browsers (clients) or are these connection-pools dynamically sized? If they are fixed-size how big are they by default? If they are dynamic what rules govern their expansion/contraction and what's the max amount of persistent http2-connections that they can hold? 30k? 40k?
Let's assume we have 20k http2-clients that run single-page-applications (SPAs) with sessions lasting up to 30mins. These clients need to enjoy ultra-low latency for their semi-frequent http2-requests through AWS ALB (say 1 request per 4secs which translates to about 5k requests/second landing on the ALB):
Does it make sense to configure the ALB to have a hefty http2-connection-pool so as to ensure that all of these 20k http2-connections from our clients will indeed be kept alive throughout the lifetime of the client-session?
Reasoning: In this way no http2-connection will be closed and reopened (this guarantees lower jitter because re-establishing an http2-connection involves some extra latency - at least that's my intuition, and I'd be happy to stand corrected if I'm missing something).
I asked this question in the amazon forums:
https://repost.aws/questions/QULRcA_-73QxuAOyGYWhExng/aws-application-load-balancer-and-http-2-persistent-connections-keep-alive
And I got this answer, which covers every aspect of the question in good detail:
<< So, when it comes to the concurrent connection limits of an Application Load Balancer, there is no upper limit on the amount of traffic it can serve; it can scale automatically to meet the vast majority of traffic workloads.
An ALB will scale up aggressively as traffic increases, and scale down conservatively as traffic decreases. As it scales up, new higher capacity nodes will be added and registered with DNS, and previous nodes will be removed. This effectively gives an ALB a dynamic connection pool to work with.
When working with the client behavior you have described, the main attribute you'll want to look at when configuring your ALB will be the Connection Idle Timeout setting. By default, this is set to 60 seconds, but can be set to a value of up to 4000 seconds. In your situation, you can set a value that will meet your need to maintain long-term connections of up to 30 minutes without the connection being terminated, in conjunction with utilizing HTTP keep-alive options within your application.
As you might expect, an ALB will start with an initial capacity that may not immediately meet your workload. But as stated above, the ALB will scale up aggressively, and scale down conservatively, scaling up in minutes, and down in hours, based on the traffic received. I highly recommend checking out our best practices for ELB evaluation page to learn more about scaling and how you can test your application to better understand how an ALB will behave based on your traffic load. I will highlight from this page that depending on how quickly traffic increases, the ALB may return an HTTP 503 error if it has not yet fully scaled to meet traffic demand, but will ultimately scale to the necessary capacity. When load testing, we recommend that traffic be increased at no more than 50 percent over a five minute interval.
When it comes to pricing, ALBs are charged for each hour that the ALB is running, and the number of Load Balancer Capacity Units (LCU) used per hour. LCUs are measured based on a set of dimensions on which traffic is processed; new connections, active connections, processed bytes, and rule evaluations, and you are charged based only on the dimension with the highest usage in a particular hour.
As an example using the ELB Pricing Calculator, assuming the ~20,000 connections are ramped up by 10 connections per second, with an average connection duration of 30 minutes (1800 seconds) and sending 1 request every 4 seconds for a total of 1GB of processed data per hour, you could expect a rough cost output of:
1 GB per hour / 1 GB processed bytes per hour per LCU (EC2 instances and IP addresses as targets) = 1 processed-bytes LCU
10 new connections per second / 25 new connections per second per LCU = 0.40 new-connections LCUs
10 new connections per second x 1,800 seconds = 18,000 active connections
18,000 active connections / 3,000 active connections per LCU = 6 active-connections LCUs
1 rule per request - 10 free rules = -9 paid rules per request; Max(-9, 0) = 0 rule-evaluation LCUs
Max(1 processed-bytes LCU, 0.40 new-connections LCUs, 6 active-connections LCUs, 0 rule-evaluation LCUs) = 6 maximum LCUs
1 load balancer x 6 LCUs x 0.008 USD per LCU-hour x 730 hours per month = 35.04 USD
Application Load Balancer LCU usage charges (monthly): 35.04 USD >>
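For reference, the quoted arithmetic can be reproduced in a few lines. All inputs are the example's assumptions (connection ramp, session length, data volume, LCU price); the per-LCU dimension limits should be checked against the current ELB pricing page for your region:

    # Reproduces the LCU estimate quoted above.
    new_conn_per_sec = 10            # connections ramped up per second
    avg_conn_duration_s = 1800       # 30-minute sessions
    processed_gb_per_hour = 1.0
    lcu_price_per_hour = 0.008       # USD, region-dependent
    hours_per_month = 730

    lcu_new = new_conn_per_sec / 25                               # 25 new connections/s per LCU
    lcu_active = (new_conn_per_sec * avg_conn_duration_s) / 3000  # 3,000 active connections per LCU
    lcu_bytes = processed_gb_per_hour / 1.0                       # 1 GB/hour per LCU (EC2/IP targets)
    lcu_rules = 0.0                                               # 1 rule per request is within the 10 free rules

    lcus = max(lcu_new, lcu_active, lcu_bytes, lcu_rules)         # billed on the highest dimension only
    monthly = 1 * lcus * lcu_price_per_hour * hours_per_month     # one load balancer
    print(f"{lcus:.2f} LCUs -> {monthly:.2f} USD per month")      # 6.00 LCUs -> 35.04 USD per month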

What should minRCU and minWCU be set to for DynamoDB when spikes occur only a few times a day?

We have a service built in AWS which only gets traffic for a few minutes each day and otherwise no traffic at all. During the burst we get, say, 200 TPS; for the rest of the day traffic is almost zero. This DynamoDB table has auto scaling enabled.
What I want to know is how we should set minRCU and minWCU for it. Should they be determined by the maximum traffic we expect or by the minimum traffic we receive? If I use the minimum traffic, say 10, and set target utilization to 50%, I see that some events get throttled since autoscaling takes time to increase capacity units. But setting the minimum capacity units according to the maximum traffic we receive increases the cost of DynamoDB, in which case we incur cost even when we are not using the table at all. So, are there any best practices for this case?
For your situation, you might be better off going with on-demand mode.
DynamoDB on-demand offers pay-per-request pricing for read and write requests so that you pay only for what you use.
This frees you from managing RCUs, WCUs, and autoscaling; there would be no need for proactive scaling.
Be sure to review the considerations before making that change.
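If you decide to go that route, the switch itself is a single call; here is a sketch with boto3 (the table name is a placeholder). Keep in mind that DynamoDB only allows switching between provisioned and on-demand billing once every 24 hours:

    import boto3

    # Placeholder table name.
    dynamodb = boto3.client("dynamodb")
    dynamodb.update_table(
        TableName="my-spiky-table",
        BillingMode="PAY_PER_REQUEST",  # on-demand: no RCU/WCU or autoscaling to manage
    )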
If you do not have consistent traffic, it's better to set the minimum close to what the burst is; since it takes about 5 minutes before scaling up kicks in, you might find your burst credits depleted before it scales.

How do DynamoDB consumed capacity units in on-demand mode compare to provisioned capacity?

Right now I am using on-demand mode for my DynamoDB tables, as I didn't know how much data to expect. But now that the application has run a while, I can see the metrics for ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits for my tables in CloudWatch.
In on-demand mode I pay per request, whereas in provisioned capacity mode I have to pay for the provisioned capacity. If I simply take the metrics for (max) consumed capacity units and compare the prices of those in provisioned capacity mode to my current costs, I believe provisioned capacity mode would be a lot cheaper for me.
My question is, can I simply take the metrics and take the max (plus some buffer) of the consumed capacity units and configure them as provisioned capacity, or is that an error in reasoning on my part?
There are two other things you need to consider:
How 'bursty' is your throughput?
Are you using SDKs to connect to your database?
Setting your provision to the maximum throughput you ever see will ensure you don't get throttled requests, however you will probably be setting the provision too high. DynamoDB can actually consume more provision than you have set using Burst Capacity. This will accommodate short bursts of high throughput over the space of 5 minutes. If you see sustained peaks, for example your database is busy in the day but not at night, you might consider setting your tables to Autoscale. In this case you can set the provisioned throughput lower, and DynamoDB will automatically scale up provision as required. Note that autoscaling is good for workloads that vary over the course of hours (e.g. for handling daily peak hours). It's not good for reacting to events that occur in less than about 30 minutes.
If you are using official SDKs, they will handle throttle responses, and retry any failed requests. This gives Dynamodb some time to scale without your application failing requests.
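As a rough way to answer the original question, you can pull the consumed-capacity metrics and convert them to a per-second rate before picking a provisioned value. A sketch with boto3 and CloudWatch follows (the table name is a placeholder; note that CloudWatch reports the Consumed*CapacityUnits metrics as a Sum over each period, so divide by the period length to get a rate):

    import boto3
    from datetime import datetime, timedelta, timezone

    cloudwatch = boto3.client("cloudwatch")
    period = 300  # seconds; 3 days at 5-minute resolution stays under the
                  # 1,440-datapoint limit of get_metric_statistics

    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/DynamoDB",
        MetricName="ConsumedWriteCapacityUnits",
        Dimensions=[{"Name": "TableName", "Value": "my-table"}],
        StartTime=datetime.now(timezone.utc) - timedelta(days=3),
        EndTime=datetime.now(timezone.utc),
        Period=period,
        Statistics=["Sum"],
    )

    peak = max((p["Sum"] / period for p in resp["Datapoints"]), default=0.0)
    print(f"Peak 5-minute average consumption: {peak:.1f} WCU/s")
    # Provision with headroom, e.g. peak / 0.7 if you target 70% utilization,
    # and repeat for ConsumedReadCapacityUnits to size the RCUs. The 5-minute
    # averaging smooths second-level spikes, so add buffer for those.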

How to compute initial auto-scaling limits for a DynamoDB table

Our table has bursty writes, expected once a week. We have auto-scaling enabled, with provisioned capacity of 5 WCUs and 70% target utilization. This suffices for our off-peak (non-bursty) traffic. However, during the bursty writes the WCUs reach around 1.5-2k, which leads to a lot of throttled writes and ultimately failures to write as well.
1) Is auto-scaling suitable for such a use case?
2) If yes, what should our initial provisioned capacity be?
This answer will tell you why auto-scaling is not working for you:
https://stackoverflow.com/a/53005089/4985580
This answer will tell you how you can configure your SDK to retry operations over a much longer period (and therefore stop your operation failures during peak requests).
What should be done when the provisioned throughput is exceeded?
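A minimal sketch of that kind of retry configuration with boto3/botocore, assuming a Python client (the table name, key, and item are hypothetical placeholders); "adaptive" mode adds client-side rate limiting on top of retries with backoff:

    import boto3
    from botocore.config import Config

    # Give the SDK a longer retry budget so throttled writes during the weekly
    # burst are retried with backoff instead of surfacing as failures.
    retry_config = Config(retries={"max_attempts": 10, "mode": "adaptive"})

    table = boto3.resource("dynamodb", config=retry_config).Table("my-bursty-table")
    table.put_item(Item={"pk": "example-id", "payload": "..."})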
Ultimately you should probably move your tables to on-demand.
For tables using on-demand mode, DynamoDB instantly accommodates customers' workloads as they ramp up or down to any previously observed traffic level. If the level of traffic hits a new peak, DynamoDB adapts rapidly to accommodate the workload.
No, auto-scaling is not suitable for your needs. It takes a few minutes to scale up, and it does so by increasing your current capacity by a fixed percentage each time. There's also a limited number of times capacity can be scaled down per day, so you can't get from 5 to 2,000 in a matter of minutes; you may not even get there in a matter of hours.
I'd suggest trying on-demand mode, or manually setting capacity to 2,000 some time before you actually need it (it doesn't really scale instantly); see the sketch after this answer.
I strongly advise reading the ENTIRE DynamoDB documentation with regard to best practices for primary keys, GSIs, and data architecture. Depending on the size of your table (larger than 10 GB), the 2,000 units may get spread across partitions and you could potentially still have throttled requests.
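Here is a sketch of the "manually raise capacity before the burst" option mentioned above, assuming it would be triggered from something like a scheduled job. The table name and numbers are placeholders, and if auto scaling is attached to the table you would raise the scalable target's minimum instead of calling update_table directly:

    import boto3

    # Placeholder table name and capacity values. Run this well before the
    # known weekly burst and dial it back down afterwards; note that capacity
    # decreases are limited per day.
    dynamodb = boto3.client("dynamodb")
    dynamodb.update_table(
        TableName="my-bursty-table",
        ProvisionedThroughput={
            "ReadCapacityUnits": 5,
            "WriteCapacityUnits": 2000,  # sized to the expected peak write rate
        },
    )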

When does DynamoDB throttle requests?

In the answer to "How is Amazon DynamoDB throughput calculated and limited?" it is suggested that DynamoDB throttles requests whenever you exceed the provisioned throughput on a per-second basis. However, this contradicts my experience.
I have a table where I post multiple rows, with the number of rows often way exceeding the provisioned write capacity. This happens in short bursts. At one point the 5-minute average was even above the provisioned capacity. OTOH, the 15-minute average is below capacity. I didn't get any throttled requests in that period.
The 5-minute average peaks at 8.053 with a provisioned capacity of 6:
The 15-minute average peaks well below the provisioned capacity:
So when does DynamoDB throttle requests? What kind of average does it take into account? How high above the provisioned capacity can a burst be before it gets throttled?
DynamoDB is designed to ensure that your provisioned capacity is available on a per-second basis. If you provision a table for ten 1kB reads per second then DynamoDB will give you enough capacity to handle that throughput rate. In addition, DynamoDB will sometimes allow you to achieve limited bursting above your provisioned throughput for a short period of time. This is intended to absorb natural variations in customer workloads. This bursting is not guaranteed and it is not always available (and the nature of the available bursting may change over time). As is currently described in the best practices documentation, in order to get the best performance you should have an evenly distributed workload that does not exceed your provisioned capacity and distributes the load evenly over the key space. However, if the reality of production behavior for your application deviates from an evenly distributed workload then DynamoDB may absorb some of the bursts.
As for how much to provision your table, it depends a lot on your workload. You could start with provisioning to something like 80% of your peaks and then adjust your table capacity depending on how many throttles you receive (which you can see in your CloudWatch graphs) and your application’s tolerance for latency induced by retries. Keep in mind that DynamoDB does not allow unlimited bursts above your provisioned capacity. You may be able to absorb short bursts but you cannot sustain a throughput rate above your provisioned capacity level for an extended period of time. The general guidance we can give is to provision for something close to your peaks and then dial down while watching for throttles.
This answer was posted in AWS forums
Disclaimer: I work for Amazon, DynamoDB team.
There's a hint in the DynamoDB documentation that explains how bursting works:
When you are not fully utilizing a partition's throughput, DynamoDB retains a portion of your unused capacity for later bursts of throughput usage. DynamoDB currently retains up to five minutes (300 seconds) of unused read and write capacity.
But it also says that you cannot rely on this behavior:
However, do not design your application so that it depends on burst capacity being available at all times: DynamoDB can and does use burst capacity for background maintenance and other tasks without prior notice.
At least that would explain why it was possible to have a 5-minute average above the provisioned capacity. With the explanation above, it would even be possible for 15-minute averages (or longer timespans) to be above the provisioned capacity, if you have a spike at the very beginning of the interval and less usage within the 300 seconds before the start of the interval.
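A toy token-bucket model of that 300-second description can make this concrete; it is only an illustration of the documented behaviour, not DynamoDB's actual implementation. With the question's provisioned capacity of 6, four idle minutes accrue enough credit to absorb a one-minute spike of 30 writes per second without any throttling in the model:

    # Toy model: approximates "unused capacity is retained for up to 300
    # seconds" as a credit bucket. Numbers are illustrative only.
    PROVISIONED = 6      # WCU per second, as in the question
    BURST_WINDOW = 300   # seconds of unused capacity retained

    credits = 0.0

    def write(requested):
        """Serve up to `requested` units this second from provisioned + accrued credits."""
        global credits
        credits = min(credits + PROVISIONED, PROVISIONED * BURST_WINDOW)
        served = min(requested, credits)
        credits -= served
        return served

    # Four idle minutes accrue 240 * 6 = 1,440 credits...
    for _ in range(240):
        write(0)

    # ...which, plus the 6/s accrued during the spike, exactly covers a
    # one-minute spike of 30 writes per second (30 * 60 = 1,800 units).
    spike = [write(30) for _ in range(60)]
    print(all(s == 30 for s in spike))  # True: no throttling in this model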
DynamoDB provides some flexibility in your per-partition throughput provisioning by providing burst capacity. Whenever you're not fully using a partition's throughput, DynamoDB reserves a portion of that unused capacity for later bursts of throughput to handle usage spikes.
DynamoDB currently retains up to 5 minutes (300 seconds) of unused read and write capacity. During an occasional burst of read or write activity, these extra capacity units can be consumed quickly—even faster than the per-second provisioned throughput capacity that you've defined for your table.
DynamoDB can also consume burst capacity for background maintenance and other tasks without prior notice.