AWS Application Load Balancer (ALB): How many http2 persistent-connections can it keep alive at the same time? - amazon-web-services

Essentially what the subject says.
I'm new to this sport and need some high-level pieces of information to figure out the behaviour of ALB towards http2 persistent connections.
I know that ALB supports http2 persistent connections:
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/application-load-balancers.html#connection-idle-timeout
I can't find anything in the docs explaining how the size(s) of the http2-connection-pools (maintained by ALB) are configured (if at all). Any links on this specific aspect?
Does the ALB, by default, maintain a fixed-size http2-connection-pool between itself and the browsers (clients), or are these connection-pools dynamically sized? If they are fixed-size, how big are they by default? If they are dynamic, what rules govern their expansion/contraction, and what's the maximum number of persistent http2-connections that they can hold? 30k? 40k?
Let's assume we have 20k http2-clients that run single-page-applications (SPAs) with sessions lasting up to 30 minutes. These clients need to enjoy ultra-low latency for their semi-frequent http2-requests through the AWS ALB (say 1 request per 4 seconds, which translates to about 5k requests/second landing on the ALB):
Does it make sense to configure the ALB to have a hefty http2-connection-pool so as to ensure that all of these 20k http2-connections from our clients will indeed be kept alive throughout the lifetime of the client-session?
Reasoning: this way no http2-connection will be closed and reopened, which guarantees lower jitter because re-establishing an http2-connection involves some extra latency (at least that's my intuition about this, and I'd be happy to stand corrected if I'm missing something).
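For context, this is roughly what one such long-lived client session looks like from the client side; a minimal sketch using the httpx library (not part of the original question; the hostname and endpoint are placeholders):

```python
import time
import httpx  # pip install "httpx[http2]"

# One long-lived HTTP/2 connection, reused for every request of the session.
# Closing and reopening it would add TCP + TLS handshake latency each time.
with httpx.Client(http2=True, base_url="https://my-alb.example.com") as client:
    for _ in range(450):                # ~30 min session at 1 request per 4 s
        resp = client.get("/api/ping")  # placeholder endpoint
        print(resp.http_version, resp.status_code)
        time.sleep(4)
```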

I asked this question in the amazon forums:
https://repost.aws/questions/QULRcA_-73QxuAOyGYWhExng/aws-application-load-balancer-and-http-2-persistent-connections-keep-alive
And I got this answer, which covers every aspect of the question in good detail:
<< So, when it comes to the concurrent connection limits of an Application Load Balancer, there is no upper limit on the amount of traffic it can serve; it can scale automatically to meet the vast majority of traffic workloads.
An ALB will scale up aggressively as traffic increases, and scale down conservatively as traffic decreases. As it scales up, new higher capacity nodes will be added and registered with DNS, and previous nodes will be removed. This effectively gives an ALB a dynamic connection pool to work with.
When working with the client behavior you have described, the main attribute you'll want to look at when configuring your ALB will be the Connection Idle Timeout setting. By default, this is set to 60 seconds, but can be set to a value of up to 4000 seconds. In your situation, you can set a value that will meet your need to maintain long-term connections of up to 30 minutes without the connection being terminated, in conjunction with utilizing HTTP keep-alive options within your application.
As you might expect, an ALB will start with an initial capacity that may not immediately meet your workload. But as stated above, the ALB will scale up aggressively, and scale down conservatively, scaling up in minutes, and down in hours, based on the traffic received. I highly recommend checking out our best practices for ELB evaluation page to learn more about scaling and how you can test your application to better understand how an ALB will behave based on your traffic load. I will highlight from this page that depending on how quickly traffic increases, the ALB may return an HTTP 503 error if it has not yet fully scaled to meet traffic demand, but will ultimately scale to the necessary capacity. When load testing, we recommend that traffic be increased at no more than 50 percent over a five minute interval.
When it comes to pricing, ALBs are charged for each hour that the ALB is running and for the number of Load Balancer Capacity Units (LCU) used per hour. LCUs are measured on a set of dimensions on which traffic is processed: new connections, active connections, processed bytes, and rule evaluations. You are charged based only on the dimension with the highest usage in a particular hour.
As an example using the ELB Pricing Calculator, assuming the ~20,000 connections are ramped up by 10 connections per second, with an average connection duration of 30 minutes (1800 seconds) and sending 1 request every 4 seconds for a total of 1GB of processed data per hour, you could expect a rough cost output of:
1 GB per hour / 1 GB processed bytes per hour per LCU (for EC2 instances and IP addresses as targets) = 1 processed-bytes LCU
10 new connections per second / 25 new connections per second per LCU = 0.40 new-connections LCUs
10 new connections per second x 1,800 seconds = 18,000 active connections
18,000 active connections / 3,000 connections per LCU = 6 active-connections LCUs
1 rule per request - 10 free rules = -9 paid rules per request; Max(-9, 0) = 0.00 paid rules per request
Max(1 processed-bytes LCU, 0.40 new-connections LCUs, 6 active-connections LCUs, 0 rule-evaluation LCUs) = 6 maximum LCUs
1 load balancer x 6 LCUs x 0.008 USD per LCU-hour x 730 hours per month = 35.04 USD
Application Load Balancer LCU usage charges (monthly): 35.04 USD
>>
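Following up on the idle-timeout advice in that answer, this is a minimal sketch of raising the attribute with boto3 (the load balancer ARN is a placeholder; 1800 seconds matches the 30-minute sessions in the question):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Raise the ALB idle timeout from the 60 s default to 30 minutes so that
# connections idling between requests are not torn down by the ALB.
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:...:loadbalancer/app/my-alb/...",  # placeholder ARN
    Attributes=[{"Key": "idle_timeout.timeout_seconds", "Value": "1800"}],
)
```

Note that since the clients in the scenario send a request roughly every 4 seconds, the connections are never idle long enough to hit even the 60-second default; raising the value mostly adds headroom for quieter clients.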

Related

Google Cloud Run concurrency limits + autoscaling clarifications

Google Cloud Run allows a specified request concurrency limit per container. The subtext of the input field states "When this concurrency number is reached, a new container instance is started" Two clarification questions:
Is there any way to set Cloud Run to anticipate the concurrency limit being reached, and spawn a new container a little before that happens to ensure that requests over the concurrency limit of Container 1 are seamlessly handled by Container 2 without the cold start time affecting the requests?
Imagine we have Maximum Instances set to 10, Concurrency set to 10, and there are currently 100 requests being processed (i.e. we've maxed out our capacity and cannot autoscale any more). What happens to the 101st request? Will it be queued up for some period of time, or will a 5XX be returned immediately?
Is there any way to set Cloud Run to anticipate the concurrency limit being reached, and spawn a new container a little before that happens to ensure that requests over the concurrency limit of Container 1 are seamlessly handled by Container 2 without the cold start time affecting the requests?
No. Cloud Run does not try to predict future traffic patterns.
Imagine we have Maximum Instances set to 10, Concurrency set to 10, and there are currently 100 requests being processed (i.e. we've maxed out our capacity and cannot autoscale any more). What happens to the 101st request? Will it be queued up for some period of time, or will a 5XX be returned immediately?
HTTP Error 429 Too Many Requests will be returned.
[EDIT - Google Cloud documentation on request queuing]
Under normal circumstances, your revision scales out by creating new instances to handle incoming traffic load. But when you set a maximum instances limit, in some scenarios there will be insufficient instances to meet that traffic load. In that case, incoming requests queue for up to 60 seconds. During this 60 second window, if an instance finishes processing requests, it becomes available to process queued requests. If no instances become available during the 60 second window, the request fails with a 429 error code on Cloud Run (fully managed).
About maximum container instances
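Since a 429 can still surface once all instances are busy and the 60-second queue is exhausted, callers usually want to retry with backoff. A minimal client-side sketch (not part of the original answer; the service URL is a placeholder):

```python
import time
import requests  # pip install requests

def call_with_backoff(url: str, max_attempts: int = 5) -> requests.Response:
    """Retry on 429 (and transient 5xx) with exponential backoff."""
    for attempt in range(max_attempts):
        resp = requests.get(url, timeout=30)
        if resp.status_code not in (429, 500, 503):
            return resp
        time.sleep(2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return resp

resp = call_with_backoff("https://my-service-abc123-uc.a.run.app/work")  # placeholder URL
print(resp.status_code)
```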

What should minRCU and minWCU be set to for DynamoDB when there are spikes only a few times a day?

We have a service built in AWS which only gets traffic for a few minutes in the entire day, and then there is no traffic at all. During the burst we get traffic at, say, 200 TPS; otherwise, traffic is almost zero for the rest of the day. This DynamoDB table has auto scaling enabled.
The thing I wanted to know is how we should set minRCU and minWCU for it. Should it be determined by the highest traffic we expect or by the minimum traffic we receive? If I use the minimum traffic, say 10, and set utilization to 50%, then I see that some events get throttled, since autoscaling takes time to increase capacity units. But setting the min capacity units according to the highest traffic that we receive increases the cost of DynamoDB, in which case we are incurring cost even when we are not using DynamoDB at all. So, are there any best practices for this case?
For your situation, you might be better going with on-demand mode.
DynamoDB on-demand offers pay-per-request pricing for read and write requests so that you pay only for what you use.
This frees you from managing RCUs, WCUs, and autoscaling; there would be no need for proactive scaling.
Be sure to review the considerations before making that change.
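A minimal sketch of that switch with boto3, assuming a hypothetical table name (note that a table's billing mode can only be changed once per 24 hours):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Switch an existing table from provisioned capacity to on-demand billing.
dynamodb.update_table(
    TableName="my-bursty-table",    # hypothetical table name
    BillingMode="PAY_PER_REQUEST",  # on-demand mode
)
```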
If you do not have consistent traffic, then it's better to set the minimum close to what the burst is; since it takes about 5 minutes before scaling up, you might find your burst credits depleted before it scales.

Azure Event Hub - Outgoing message/bytes doesn't increase after increasing throughput units and IEventProcessor instances

I have an EventHub instance with 200 partitions (Dedicated cluster). Originally, I have a consumer group with 70 instances of IEventProcessor + 1 throughput unit.
It appears that I can only have 30M outgoing messages per hour, while there is double that amount for incoming. So I increased to 20 throughput units and 100 processor instances, but the outgoing messages don't increase beyond 30M. I don't see any throttling messages.
Are there other EventHub limits that I should adjust here?
EDIT 1:
After setting prefetch and batch size to 1000 I still only see a moderate increase: Imgur
A couple of things I can recommend checking and doing:
Increase batch and prefetch size (see the sketch after this list).
Check client-side resources like CPU and available memory. See if there is any high resource utilization which may become a bottleneck.
If the hosts are in a different region than the Event Hub, network latency can slow down the receives. Co-locate the hosts with the Event Hub if so.
Consider creating a support ticket so the product group can do a proper investigation.
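For the first point, this is roughly what larger batch and prefetch settings look like with the Python SDK (azure-eventhub v5). It is only a sketch: the original setup uses the .NET IEventProcessor, the connection string and hub name are placeholders, and the exact parameter names should be verified against your SDK version:

```python
from azure.eventhub import EventHubConsumerClient  # pip install azure-eventhub

def on_event_batch(partition_context, events):
    # Process up to max_batch_size events per callback.
    print(f"partition {partition_context.partition_id}: {len(events)} events")

client = EventHubConsumerClient.from_connection_string(
    "Endpoint=sb://...",      # placeholder connection string
    consumer_group="$Default",
    eventhub_name="my-hub",   # placeholder hub name
)
with client:
    client.receive_batch(
        on_event_batch=on_event_batch,
        max_batch_size=1000,        # pull larger batches per callback
        prefetch=1000,              # keep more events buffered client-side
        starting_position="-1",     # read from the beginning of the stream
    )
```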

Issues while testing on AWS Auto Scaling, ELB, Cloud Watch

I created a web app in JSP. One of my web app URLs returns a unique ID.
Here is the URL.
www.biomobilestrokelab.com/GateKeeper/newUserId.jsp
It works fine in a web browser. Now I want to test scalability of this web app.
I use Apache JMeter for this purpose.
But when I hit it with 1000 requests per second:
Sometimes all requests return a response successfully
Sometimes I receive an HTTP 504 Gateway Timeout code
Sometimes I receive an HTTP 503 Service Unavailable (back-end server is at capacity)
I am using AWS Auto Scaling with
Minimum instances = 2
Maximum instance = 12
Health check grace period = 300 sec
Default Cool Down = 60 sec
For the ELB, the following options are configured.
Time out = 60 sec
Interval = 200 sec
Unhealthy threshold = 2
Healthy threshold = 10
And I apply the following CloudWatch metrics for auto scaling.
CPU Utilization: add 1 instance when it is greater than 10% and remove 1 instance when it is less than 3%, using the Average statistic over a period of 1 minute.
Request Count: add 1 instance when the Sum is greater than 1000 and remove 1 instance when the Sum is less than or equal to 1000.
Kindly guide me on how I can resolve this issue so that I can successfully handle 1000 or more requests per second.
I suggest you either reduce the number of requests per second for a fixed number of instances (to, let's say, two) or increase the number of instances until there are no errors and all thousand requests per second are handled successfully. Then, based on this, you can set the minimum in your auto scaling group and adjust the CloudWatch alarm.
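As a sketch of the second half of that advice, here is one way to raise the group minimum and simplify the alarm handling with boto3; the group name is hypothetical, and it swaps the step alarms described in the question for a single target-tracking policy:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Keep enough instances warm to absorb the load found during testing.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="gatekeeper-asg",  # hypothetical group name
    MinSize=4,
    MaxSize=12,
)

# Replace the "+1 instance above 10% CPU / -1 below 3%" alarms with a single
# target-tracking policy that holds average CPU around a chosen target.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="gatekeeper-asg",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```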

RDS connection time out in rush hours (with ~50,000 HTTP requests)

We're using RDS on a db.t2.large instance, and an auto-scaling group of EC2s is writing data to the database during the day. In rush hours we're having about 50,000 HTTP requests, each of which reads/writes MySQL data.
This varies each day, but for today's example, during an hour:
We're seeing "Connect Error (2002) Connection timed out" from our PHP instances, about 187 times a minute.
RDS CPU won't raise above 50%
DB Connections won't go above 30 (max is set to 5000).
Free storage is ~300 GB (disk size is large to provide high IOPS)
Write IOPS hit 1500 during bursts but drop to 900 after rush hours because the burst limit has expired.
Read IOPS hit 300 every 10 minutes and around 150 in between.
Disk Write Throughput averages between 20 and 25 MB/sec
Disk Read Throughput is between 0.75 and 1.5 MB/sec
CPU Credit Balance is around 500, so we don't have a need for the CPU burst.
And when it comes to the network, I see a potential limit we're hitting:
Network Receive Throughput reaches 1.41 MB/second and stays around 1.5 MB/second during an hour.
During this time Network Transmit Throughput is 5 to 5.2 MB/second, with drops to 4 MB/second every 10 minutes, which coincides with our cronjobs that are processing data (mainly reading).
I've tried placing the EC2's in different or the same AZ's, but this has no effect
During this time I can connect fine from my local workstation via SSH Tunnel (EC2 -> RDS). And from the EC2 to the RDS as well.
The PHP scripts are set to time-out after 5 sec of trying to connect to ensure a fast response. I've increased this limit to 15 sec now for some scripts.
But which limit are we hitting on RDS? Before we start migrating or changing instance types, we'd like to know the source of this problem. I've also just enabled Enhanced Monitoring to get more details on this issue.
If more info needed, I'll gladly elaborate where needed.
Thanks!
Update 25/01/2016
On the recommendation of datasage we increased the RDS disk size to 500 GB, which gives us 1500 IOPS with 3600 burst. It uses around 1200 IOPS (so it is not even bursting now), and the time-outs still occur.
Connection time-outs are set to 5 sec and 15 sec as mentioned before; this shows no difference.
Update 26/01/2016
RDS Screenshot from our peak hours:
Update 28/01/2016
I've changed the setting sync_binlog to 0, because initially I thought we were hitting the EBS throughput limits (GP-SSD, 160 Mbit/s). This gives us a significant drop in disk throughput, and the IOPS are lower as well, but we still see the connection time-outs occur.
When we plot the times that the errors occur, we see that each minute, at around :40 seconds, the time-outs start happening for about 25 seconds, then there are no errors for about 35 seconds, and it starts again. This is during the peak hour of our incoming traffic.
Apparently it was the Network Performance keeping us back. When we upgraded our RDS instance to an m4.xlarge (with High Network Performance) the issues were resolved.
This was a last resort for us, but it solved our problem in the end.
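For anyone diagnosing a similar case, the network ceiling can be checked before resizing by pulling the RDS network metrics from CloudWatch; a minimal sketch with boto3 (the instance identifier is a placeholder):

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

for metric in ("NetworkReceiveThroughput", "NetworkTransmitThroughput"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-rds-instance"}],  # placeholder
        StartTime=now - timedelta(hours=3),
        EndTime=now,
        Period=300,                       # 5-minute datapoints
        Statistics=["Average", "Maximum"],
    )
    peak = max((dp["Maximum"] for dp in stats["Datapoints"]), default=0)
    print(f"{metric}: peak {peak / 1024 / 1024:.1f} MB/s over the last 3 hours")
```

A flat plateau at the same value throughout the peak hour (as with the ~1.5 MB/s receive figure above) is a strong hint that the instance's network allocation, rather than CPU or IOPS, is the limiting factor.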