AWS site down issue because CPU utilization reaches 100%

I am using an Amazon EC2 instance with instance type m3.medium and an Amazon RDS database instance.
During working hours the website goes down because CPU utilization reaches 100%; at night (outside working hours) CPU utilization is around 60%.
Please suggest the right solution for this downtime issue; I am not sure why I am experiencing this problem.
I once set up a cron job that ran every minute, but I removed it because it slowed the site down; the site still goes down, though.
When I run the "top" command, it shows the CPU usage captured in the images below, in which the httpd processes consume most of the CPU, so any suggestions for settings to reduce httpd CPU usage would be appreciated.
Without any users on the website (first two images):
http://screencast.com/t/1jV98WqhCLvV
http://screencast.com/t/PbXF5EYI
After 5 users access the website simultaneously:
http://screencast.com/t/QZgZsiNgdCUl

If your CPU utilization is reaching 100%, you have two options:
Increase your EC2 instance type to a larger size.
Use Auto Scaling to launch one more EC2 instance of the same instance type.
It looks like you need some scheduled actions, since you do not need that much capacity during non-working hours.
The best possible option is to use AWS AutoScaling with Scheduled actions.
http://docs.aws.amazon.com/autoscaling/latest/userguide/schedule_time.html
AWS AutoScaling can launch new EC2 instances based on your CPU Utilization (or other metrics like Network Load, Disk read/write etc). This way you can always keep your site alive.
Using Auto Scaling scheduled actions, you can scale in (stop your auto-scaled instances) during non-working hours and scale out during working hours according to CPU utilization (or other metrics).
You can even stop your servers entirely if you do not need them at certain times.
If you are not familiar with AWS Auto Scaling, you can follow the documentation, which is precise and easy to follow.
http://docs.aws.amazon.com/autoscaling/latest/userguide/GettingStartedTutorial.html
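As a concrete illustration, here is a minimal boto3 sketch of a scheduled action plus a CPU-based target tracking policy. The Auto Scaling group name, schedule, and sizes are placeholders, not values from the question, so adjust them to your own setup:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
ASG_NAME = "my-web-asg"  # hypothetical Auto Scaling group name

# Scheduled action: add capacity at the start of working hours (cron is in UTC).
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName=ASG_NAME,
    ScheduledActionName="scale-out-working-hours",
    Recurrence="0 9 * * 1-5",
    MinSize=2, MaxSize=4, DesiredCapacity=2,
)

# Scheduled action: drop back to a single instance after working hours.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName=ASG_NAME,
    ScheduledActionName="scale-in-night",
    Recurrence="0 19 * * 1-5",
    MinSize=1, MaxSize=1, DesiredCapacity=1,
)

# Target tracking policy: keep average CPU around 60% by adding/removing instances.
autoscaling.put_scaling_policy(
    AutoScalingGroupName=ASG_NAME,
    PolicyName="keep-cpu-around-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,
    },
)
```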

If CPU utilization reaches 100% because of the number of visitors your site has, you should consider changing the instance type, using Auto Scaling, or putting AWS CloudFront in front of the site to cache as many HTTP requests as possible (static and dynamic content).
If visitors are not the problem and there are other scheduled tasks on the EC2 instance, I strongly recommend decoupling that workload via AWS SQS and an AWS Elastic Beanstalk worker environment.
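For illustration, a minimal boto3 sketch of the enqueueing side of such a decoupling might look like the following; the queue name and message body are made up for the example. In an Elastic Beanstalk worker environment, the SQS daemon then POSTs each message to your worker application's HTTP endpoint:

```python
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

# Hypothetical queue name; an Elastic Beanstalk worker environment can be
# configured to consume from this queue instead of running the task on the web tier.
queue_url = sqs.create_queue(QueueName="background-jobs")["QueueUrl"]

# The web tier enqueues the work instead of doing it inline (keeps httpd CPU low).
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"task": "generate_report", "report_id": 42}),
)
```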

Cloudwatch Period time

CPU metrics cannot be selected with a period below 1 minute in the CloudWatch service. How can I lower this period so that Auto Scaling is triggered faster? I just need the Auto Scaling instances to be launched within a short time. (By the way, the datapoints-to-alarm value is 1 out of 1.)
The minimum granularity for the metrics that EC2 provides is 1 minute.
Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html
I would also say that if you need to scale that quickly, wouldn't the instance startup time be an issue anyway?
You are correct -- basic monitoring of an Amazon EC2 instance provides metrics over 5-minute periods. If you activate EC2 Detailed Monitoring, metrics are provided over 1-minute periods. Extra charges apply for Detailed Monitoring.
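If you do want 1-minute metrics, a minimal boto3 sketch for turning Detailed Monitoring on might look like this (the instance ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Enable Detailed Monitoring (1-minute metrics; extra charges apply).
ec2.monitor_instances(InstanceIds=["i-0123456789abcdef0"])

# To return to basic (5-minute) monitoring later:
# ec2.unmonitor_instances(InstanceIds=["i-0123456789abcdef0"])
```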
When launching a new instance via Amazon EC2 Auto-Scaling, it can take a few minutes for the new instance to launch and for the User Data script (if any) to run. Linux instances are quite fast, but Windows instances take a while on their first boot due to sysprep operations.
You mention that you want to react to a metric in less than one minute. I would suggest that this would not be an ideal way to trigger Auto-scaling. Sometimes a computer can be busy for a while, then can drop down again. Reacting too quickly to a high CPU load would cause the Auto-Scaling group to flap between adding instances and terminating instances. It is better to provision enough capacity for a reasonable amount of extra load and then gradually add more capacity as it is required over time.
If you have a need to react so quickly, then perhaps you should investigate using AWS Lambda to perform small amounts of work in a highly-parallel fashion rather than relying on Amazon EC2 instances.
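If that path is of interest, here is a rough sketch of fanning work out to Lambda asynchronously with boto3; the function name and payload are placeholders, and the function itself would need to exist already:

```python
import json
import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")

# Fire-and-forget invocations: each item is processed in parallel by Lambda,
# with no need to wait for an EC2 instance to scale out first.
for item_id in range(10):
    lambda_client.invoke(
        FunctionName="process-item",            # hypothetical function name
        InvocationType="Event",                 # asynchronous invocation
        Payload=json.dumps({"item_id": item_id}).encode("utf-8"),
    )
```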

Windows application performance deteriorates in EC2 instance created by custom AMI

I have created a custom Windows AMI with some custom Windows applications. I use this AMI to generate EC2 instances.
I have run into a strange issue:
All the applications run smoothly in the EC2 instance created from the custom AMI.
However, after 24 hours, when I created an EC2 instance using the same custom image, the performance of the applications deteriorated.
Even opening an application on the EC2 instance is much slower compared to the EC2 instance which was created 24 hours earlier.
Any suggestions would be really helpful.
This might be caused by the use of a T2 instance. These are burstable instances.
From CPU Credits and Baseline Performance for Burstable Performance Instances - Amazon Elastic Compute Cloud:
Traditional Amazon EC2 instance types provide fixed performance, while burstable performance instances provide a baseline level of CPU performance with the ability to burst above that baseline level. The baseline performance and ability to burst are governed by CPU credits. A CPU credit provides the performance of a full CPU core for one minute.
So, if your Amazon EC2 instance is consuming a lot of CPU, then it might run out of the CPU credit balance, and therefore be limited in the amount of CPU it can use.
You can monitor the CPU credit balance in Amazon CloudWatch. You can also see the historical CPU usage in CloudWatch, or do it within the Windows instance itself using the Task Manager.
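As an illustration, a small boto3 sketch that reads the CPUCreditBalance metric from CloudWatch might look like this (the instance ID and time window are placeholders):

```python
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Average CPU credit balance over the last 6 hours, in 5-minute buckets.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(hours=6),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)

# A balance that trends toward zero means the instance is about to be throttled
# to its baseline performance.
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1))
```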
I found the issue. Apparently, for any Windows app we launch, Microsoft automatically tries to connect to the internet to check for updates every 24 hours. In my case, internet access was turned off, so the updates were not getting downloaded; the connection sat in a wait state for the default 15 seconds, which is why the application was slow to launch.

How to speed up deployments on AWS Fargate?

After migrating from EC2 cluster instances to AWS Fargate, I realized that deployments take a lot longer. Before, they would take 1-2 minutes; now some deployments take up to 5 minutes. This post claims that their deployments on Fargate even take up to 10 minutes.
Does anybody know of a way to speed them up? I can't find any documentation on this topic.
Through further googling I found this Reddit thread. An AWS employee wrote:
With regard to time to provision and start a container, it is definitely longer when using Fargate. We may reduce the length of the provisioning state in the future, but Fargate is doing much more under the hood than ECS on your own self-managed hosts. When you self-manage hosts they are already up and running, and may even already have your Docker image downloaded and cached locally, so ECS is able to launch the container very quickly. That's not the case with Fargate.
So shrinking the image should help a little. But in general I guess I'll have to live with it and hope for optimizations on AWS' side.
Here's the breakdown of tasks and possible improvements that I've found while researching options to improve my deployment times with ECS Fargate:
Fargate Deployment Overview
Here's a breakdown of what's going on behind the scenes that attribute to the deployment duration:
Provision the Fargate worker instance
Provision/attach the ENI
Download the Docker image
Here you have opportunities for improvement:
reduce the size of your Docker image
Networking throughput is based on the CPU allocated to the Fargate task: if you allocate more CPU, you get more network throughput and the image will download faster
Application Startup time
Becomes a factor if your application requires a health check grace period, again affected by CPU allocation
If your task is associated with a load balancer the deployment will also need to pass health checks, and you'll need to account for:
Load balancer deregistration delay
Pass health checks: (Health Check Interval * Threshold)
How to deploy Fargate Task updates faster
Over allocate the CPU
Reduce the deregistration delay
Set the health check threshold to 2 and the interval to 5 seconds (see the sketch after this list)
don't forget to account for a health check grace period if your app needs it
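Here is a rough boto3 sketch of the load balancer and service tweaks listed above. The target group ARN, cluster, service name, and exact timings are placeholders; tune them to how tolerant your app is of aggressive health checking:

```python
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")
ecs = boto3.client("ecs", region_name="us-east-1")

TG_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc123"

# Tighten health checks: 2 consecutive passes, 5-second interval.
# The timeout must stay below the interval.
elbv2.modify_target_group(
    TargetGroupArn=TG_ARN,
    HealthCheckIntervalSeconds=5,
    HealthCheckTimeoutSeconds=4,
    HealthyThresholdCount=2,
    UnhealthyThresholdCount=2,
)

# Reduce the deregistration delay so old tasks drain faster.
elbv2.modify_target_group_attributes(
    TargetGroupArn=TG_ARN,
    Attributes=[{"Key": "deregistration_delay.timeout_seconds", "Value": "30"}],
)

# Keep (or set) a health check grace period if your app needs warm-up time.
ecs.update_service(
    cluster="my-cluster",
    service="my-service",
    healthCheckGracePeriodSeconds=30,
)
```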
My Results
During my testing, an application deployment that typically takes about 8 minutes with 1024 CPU units (1 vCPU) completed in under 4 minutes with 4096 CPU units (4 vCPU).
Disclaimer
Your tasks likely require considerably less CPU, and you don't want to be paying for over-allocated CPU all the time. So run your deployment with over-allocated resources, and then run another deployment right after with the original CPU allocation.
Probably not a solution you want to use for every deployment, but could be a solution for hotfix deployments.
Additional Reading
Highly recommend reading Scaling containers on AWS in 2022
Two reasons they're slower, in my experience:
The awsvpc network mode attaches an ENI to the task. When the same thing has to be done for a Lambda function running in a VPC, it is known to dramatically increase the initial spin-up time.
Docker image size also affects startup time, since the image will usually need to be downloaded to whatever hidden host for a task to launch. I've done some benchmarking with a small 200MB container and a 2.5GB container. The former did start up quicker.
You can't do much about awsvpc, since Fargate requires it. Shrinking down that image would be your next biggest impact.

How to get consistent CPU utilization on AWS

I've now learnt that when I start a new EC2 instance it has a certain number of CPU credits, due to which its performance is high when it starts processing but gradually drops over time as the credits run out. Past that point, the instance runs at what appears to be the baseline CPU utilisation rate. To put numbers on it: when I started the EC2 instance (t2.nano), CloudWatch reported around 80% CPU utilisation, gradually decreasing to 5%.
Now I'm happy to use one of the better instance types pending the instance limit request. But whilst that is in progress, I'd like to know whether the issue of reducing performance over time will still hold even with the better instance type?
Would I require a dedicated host setup if I wish to ensure I get consistent CPU utilisation? The only problem I can see here is that I'm running an SQS worker queue, and Elastic Beanstalk allows us to easily set up a worker environment which reads messages from the queue. From what I've read and from looking at the configuration options available in Elastic Beanstalk, I don't think I'll be able to launch instances into a dedicated host directly. Most of my reading has led me to believe that I'll have to learn how to use a VPC. Would that be correct?
So I guess my questions are: would simply increasing the instance type to a more powerful one guarantee consistent CPU utilisation, or is a dedicated host required? If so, is it possible to set one up with Elastic Beanstalk, or would it have to be set up manually? And if it is set up manually, can it be configured to work with an SQS queue automatically?
If you want consistent CPU performance, you should avoid the burstable performance instances (the T2 family). All other families of instances (M5, C5, etc) will have consistent CPU performance over time. You can use any instance family with Elastic Beanstalk. No need for a dedicated host.
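If the environment is managed by Elastic Beanstalk, one way to move to a fixed-performance family is to update the environment's instance type option. A minimal boto3 sketch, assuming the classic aws:autoscaling:launchconfiguration namespace and with the environment name and instance type as placeholders:

```python
import boto3

eb = boto3.client("elasticbeanstalk", region_name="us-east-1")

# Switch the worker environment to a fixed-performance instance family
# (no CPU credits, so no throttling once credits run out).
eb.update_environment(
    EnvironmentName="my-sqs-worker-env",
    OptionSettings=[
        {
            "Namespace": "aws:autoscaling:launchconfiguration",
            "OptionName": "InstanceType",
            "Value": "m5.large",
        }
    ],
)
```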

Computing power of AWS Elastic Beanstalk instances

I have a CPU-intensive application that I'm considering hosting on 1+ AWS Elastic Beanstalk instances. If at all possible, I'd like to throttle it so that I don't dip over the "free" utilization of the instances.
So I need to figure out what kind of hardware/virtualized hardware the Beanstalk instances are running on, and compare that to the maximum CPU utilization of the free versions.
So for instance, if each Beanstalk instance is running on, say, 2 GHz CPUs, and my app performs a specific "supercalc" operation that takes 50 million CPU operations, but the free version of the app only allows me to utilize 100 billion operations per day, then I am limited to 100 billion / 50 million = 2,000 "supercalcs" per day on a free instance. And since a 2 GHz CPU can do 2 billion / 50 million = 40 supercalcs per second, my app instance could only run for about 2,000 / 40 = 50 seconds before I've already "maxed out" the free CPU utilization on the Beanstalk instance.
This is probably not a great example, but hopefully illustrates what I'm trying to achieve. I need to figure out how much I need to throttle my app, or how long my app could run before I max out the Beanstalk CPU utilization, and it really comes down to how beefy the AWS Beanstalk machines are. Thanks in advance!
Amazon EC2 instances aren't billed on a "CPU utilisation" basis (I think Google App Engine is?). EC2 instance billing is based on the amount of time the machine is "on", regardless of what it is doing. See the Amazon EC2 Pricing page for what it costs to run the different instance sizes in different regions.
There is a special case, which is the "Micro" instance: it provides the ability to have short bursts of higher CPU usage than the "Small" instance at a lower cost, but if you overuse it you get throttled back for a period (which you don't with a Small). This isn't the same as having an operation limit though, and the price remains the same whether you're throttled or not.
Also note that with Elastic Beanstalk you're also paying for the Elastic Load Balancer, any storage and bandwidth, and any database service you are using.
Given all that, AWS does have a Free Tier; however, this only applies for the first 12 months of a new account. The Free Tier will cover the cost of a micro EC2 instance, the Elastic Load Balancer, an RDS database and other ancillary services; see the link for more info.