We have a service that connects to a database. Memory and CPU consumption are quite low overall.
However, we find we hit a limit on concurrent requests with, say, one container. Scaling to two or more containers allows more throughput.
What metric, if any, can we latch onto to autoscale based on concurrent requests? I think they're handles, but I'm out of my lane a bit there.
I would look into using your load balancer's RequestCount or ActiveConnectionCount metrics as your auto-scaling trigger.
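For example, if the service runs as an ECS service behind an Application Load Balancer, a target-tracking policy on requests per target could look roughly like the sketch below. The cluster, service, and target-group names are placeholders, not anything from the original question.

```python
import boto3

appscaling = boto3.client("application-autoscaling")

# Hypothetical names -- substitute your own cluster, service, and target group.
resource_id = "service/my-cluster/my-service"

# Allow the service to scale between 1 and 10 tasks.
appscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=1,
    MaxCapacity=10,
)

# Track ALB requests per target: add tasks when the average number of
# requests routed to each task exceeds the target value.
appscaling.put_scaling_policy(
    PolicyName="requests-per-target",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # requests per target; tune to your workload
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            "ResourceLabel": "app/my-alb/1234567890abcdef/targetgroup/my-tg/abcdef1234567890",
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 120,
    },
)
```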
I have a question regarding load balancers that relates to one of our systems. The system sits behind an ALB with a fleet of EC2 instances that handle requests; depending on the type of request, these are forwarded downstream to other components and eventually end up in DynamoDB. This system is essentially the entry point into our platform, and if we know an event is coming we can scale up our instances ahead of time to deal with the spike. The problem arises when we have an unexpected spike in traffic, normally within 60 seconds, which taxes the instances; our scaling is unable to react in time, and what we find is that new instances come into play well after the incident is over.
Currently we scale based on a CPU threshold. My question is: is there any other metric or method we could use to scale faster when a large spike in traffic arrives? The easiest solution I can see is to throw more instances into play permanently, but that isn't the most cost-effective option.
Thanks in advance for any guidance you can provide.
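One approach sometimes used for fast scale-out is a step-scaling policy driven by a short-period request-count alarm, so capacity is added within a minute or two of the spike rather than after it. A rough sketch with boto3 follows; all names and thresholds are hypothetical.

```python
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Step-scaling policy: add capacity in bigger jumps as the breach grows.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="entry-point-asg",  # hypothetical
    PolicyName="scale-out-on-request-spike",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    EstimatedInstanceWarmup=120,
    StepAdjustments=[
        {"MetricIntervalLowerBound": 0.0, "MetricIntervalUpperBound": 1000.0, "ScalingAdjustment": 2},
        {"MetricIntervalLowerBound": 1000.0, "ScalingAdjustment": 4},
    ],
)

# Alarm on ALB requests per target with a single 60-second evaluation period,
# so a sudden spike triggers the policy quickly.
cloudwatch.put_metric_alarm(
    AlarmName="request-spike",
    Namespace="AWS/ApplicationELB",
    MetricName="RequestCountPerTarget",
    Dimensions=[{"Name": "TargetGroup", "Value": "targetgroup/my-tg/abcdef1234567890"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=1000,  # hypothetical requests per target per minute
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```

Even with a fast trigger, new EC2 instances still take a few minutes to launch and warm up, so this narrows the gap but does not eliminate it for sub-minute spikes.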
We have a Lambda that will fire requests to another system, and I am thinking about using Fargate for that system. What I want to know is whether Fargate will spin up with every request sent to it (like a Lambda), or whether it will spin up once and stay alive to handle subsequent requests from the Lambda.
Each Lambda invocation will only fire one request to the Fargate system.
There will be many requests, but the system will be dormant at night. How does Fargate handle spinning up and down between requests?
I found a more detailed answer in the article Concurrency Compared: AWS Lambda, AWS App Runner, and AWS Fargate:
AWS Fargate is similar to AWS App Runner in that each container can serve many concurrent requests.
This means that if your load balancer receives a large spike of traffic then the requests will be distributed across all the available containers.
However, it is up to you to use the metrics to define your own scaling rules. You can create scaling rules based on metrics that ECS captures, such as application CPU or memory consumption. Or you can create scaling rules based on metrics from the load balancer, such as concurrent requests or request latency. You can even create custom scaling metrics powered by your application itself. This gives you maximum control over the scaling and concurrency of your application.
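As a rough illustration of the ECS-metric option mentioned above (all names are placeholders), a target-tracking policy on the service's average CPU utilisation can be attached to a Fargate service like this:

```python
import boto3

appscaling = boto3.client("application-autoscaling")
resource_id = "service/my-cluster/my-fargate-service"  # hypothetical

# An ECS/Fargate service keeps at least MinCapacity tasks running at all
# times (tasks are not started per request) and scales up to MaxCapacity.
appscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=1,
    MaxCapacity=8,
)

# Keep the service's average CPU around 60%; ECS adds or removes tasks as needed.
appscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
        },
    },
)
```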
I have a t2.micro instance on AWS Elastic Beanstalk with auto scaling set up. The scaling policy uses the NetworkOut parameter, and currently I have it set at 6 MB. However, this results in a lot of instances being created and terminated (as NetworkOut goes above 6 MB). My question is: what is an appropriate NetworkOut auto-scaling threshold for a micro instance? I understand that a micro instance should support network bandwidth of about 70 Mbit/s, so perhaps the NetworkOut threshold can safely be set to about 20 Mbit/s?
EC2 instance types' exact network performance?
Determining a scale-out trigger for an Auto Scaling group is always difficult.
It needs to be something that identifies that the instance is "busy", to know when to add/remove instances. This varies greatly depending upon the application.
The specific issue with T2 instances is that they have CPU credits. If these credits are exhausted, then there is an artificial maximum level of CPU available. Thus, T2 instances should never have a scaling policy based on CPU.
In your case, you are using networking as the scaling trigger. This is good if network usage is an indication of the instance being "busy", resulting in a bottleneck. If, on the other hand, networking is not the bottleneck then this is not a good scaling trigger.
Traditionally, busy computers are either limited in CPU, Network or Disk access. You will need to study a "busy" instance to discover which of these dimensions is the best indicator that the instance is "busy" such that it cannot handle any additional load.
Alternatively, you might want the application to generate its own metrics, such as the number of messages being simultaneously processed. These can be pushed to Amazon CloudWatch as a custom metric, which can then be used for scaling in/out.
You can even get fancy and use information from a database to trigger scaling events: AWS Autoscaling Based On Database Query Custom Metrics - powerupcloud
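A minimal sketch of the custom-metric approach (the namespace, metric name, and the source of the count are all hypothetical): the application publishes its own gauge to CloudWatch, and an alarm or target-tracking policy can then scale on it.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_in_flight(count: int) -> None:
    """Publish the number of messages currently being processed."""
    cloudwatch.put_metric_data(
        Namespace="MyApp",  # hypothetical namespace
        MetricData=[{
            "MetricName": "MessagesInFlight",  # hypothetical metric name
            "Dimensions": [{"Name": "Service", "Value": "worker"}],
            "Value": float(count),
            "Unit": "Count",
        }],
    )

# e.g. call this every 30-60 seconds from the application, with `count`
# taken from the app's own bookkeeping or a database query.
publish_in_flight(42)
```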
Our web application has 5 pages (Signin, Dashboard, Map, Devices, Notification).
We have done a load test for this application, and the load test script does the following:
Signin and go to Dashboard page
Click Map
Click Devices
Click Notification
We have a basic free plan in AWS.
While performing the load test, up to about 100 users we didn't get any errors (please see the image below). NetworkIn and CPUUtilization seemed normal, but NetworkOut showed 846K.
But when we reached around 114 users, we started getting errors on the Map page (highlighted in red). During that time, it seems only NetworkOut was high. Please see the image below.
We want to know what an optimal value for NetworkOut is, and if this number is high, is there any way to reduce it?
Please let me know if you need more information. Thanks in advance for your help.
You are using a t2.micro instance.
This instance type has CPU limitations that make it good for bursty workloads, but sustained loads will consume all the available CPU credits. Thus, it might perform poorly under sustained load over long periods.
The instance also has limited network bandwidth that might impact the throughput of the server. While all Amazon EC2 instances have limited allocations of bandwidth, the t2.micro and t2.nano have particularly low bandwidth allocations. You can see this when copying data to/from the instance and it might be impacting your workloads during testing.
The t2 family, especially at the low-end, is not a good choice for production workloads. It is great for workloads that are sometimes high, but not consistently high. It is also particularly low-cost, but please realise that there are trade-offs for such a low cost.
See:
Amazon EC2 T2 Instances – Amazon Web Services (AWS)
CPU Credits and Baseline Performance for Burstable Performance Instances - Amazon Elastic Compute Cloud
Unlimited Mode for Burstable Performance Instances - Amazon Elastic Compute Cloud
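If you want to confirm whether CPU credits are actually running out during your tests, you can pull the CPUCreditBalance metric for the instance. A minimal sketch (the instance ID and time window are placeholders):

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # hypothetical
    StartTime=datetime.now(timezone.utc) - timedelta(hours=3),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Average"],
)

# A balance trending towards zero during the load test means the instance
# has dropped to its baseline CPU performance.
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```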
That said, the network throughput shown on the graphs is a result of your application. While the t2 might be limiting throughput, it is not responsible for the spike on the graph. For that, you will need to investigate the resources being used by the application(s) themselves.
NetworkOut simply refers to the volume of outgoing traffic from the instance. To reduce NetworkOut, you need to reduce the data the instance sends out, so you may need to see which of Click Map, Click Devices, and Click Notification is sending traffic out of the instance. It may not be related only to the number of users, but to a combination of the number of users and the application module.
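One way to narrow this down (a sketch, with a placeholder instance ID) is to pull NetworkOut at one-minute resolution and line the timestamps up with when each step of the load-test script ran. Note that one-minute datapoints require detailed monitoring to be enabled; otherwise use a 300-second period.

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="NetworkOut",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # hypothetical
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=60,  # requires detailed monitoring; use 300 with basic monitoring
    Statistics=["Sum"],
    Unit="Bytes",
)

# Compare these per-minute byte counts with the times at which the
# Map, Devices, and Notification steps of the test were running.
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```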
I've now learnt that when I start a new EC2 instance it has a certain number of CPU credits, because of which its performance is high when it starts processing but gradually reduces over time as the credits run out. Past that point, the instance runs at what appears to be the baseline CPU utilisation rate. To put numbers on it: when I started the EC2 instance (t2.nano), CloudWatch reported around 80% CPU utilisation, gradually decreasing to 5%.
Now I'm happy to use one of the better instance types, pending the instance limit request. But whilst that is in progress, I'd like to know whether the issue of performance reducing over time will still hold even with a better instance type.
Would I require a dedicated host setup if I wish to ensure I get consistent CPU utilisation? The only problem I can see here is that I'm running an SQS worker queue, and Elastic Beanstalk allows us to easily set up a worker environment which reads messages from the queue. From what I've read, and from looking at the configuration options available in Elastic Beanstalk, I don't think I'll be able to launch instances into a dedicated host directly. Most of my reading has led me to believe that I'll have to learn how to use a VPC. Would that be correct?
So I guess my questions are: would simply increasing the instance type to a more powerful instance guarantee consistent CPU utilisation, or is a dedicated host required? If so, is it possible to set one up with Elastic Beanstalk, or would it have to be set up manually? And if it is set up manually, can it be configured to work with an SQS queue automatically?
If you want consistent CPU performance, you should avoid the burstable performance instances (the T2 family). All other instance families (M5, C5, etc.) have consistent CPU performance over time. You can use any instance family with Elastic Beanstalk; there is no need for a dedicated host.