EC2 CPU Spikes in CloudWatch, but not within the VM - amazon-web-services

During the past couple of weeks, a couple of my server instances keep triggering CloudWatch alarms that I've set for CPU usage.
I can see periodic 1min spikes in CloudWatch only within the EC2 CPUUtilization metric. I cannot see them on the instance itself using top, atop, CloudWatch Agent, etc.
I cannot find any correlation or corroborating measurement inside the VM to these events.
I've searched and read documentation and come up empty handed.
Any thoughts?
At the moment, I'm confident that the deployed app code is behaving. Am I wrong to think that CloudWatch is showing artifacts of something I don't have visibility into?... on only two instances? If it happened across the autoscaling group, I'd probably write this off as some kind of background noise.

Related

Cloudwatch Period time

CPU metrics cannot be selected below 1 minute in Cloudwatch service. For example, how can I lower this period time to trigger the Autoscale scale faster? I just need to trigger the AutoScale instances in short time. (By the way, datapoints value 1 to 1)
the minimum granularity for the metrics that EC2 provides is 1 minute.
Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html
Would also say that if you need to scale that quickly, wouldn't the startup time be an issue anyway?
You are correct -- basic monitoring of an Amazon EC2 instance provides metrics over 5-minute periods. If you activate EC2 Detailed Monitoring, metrics are provided over 1-minute periods. Extra charges apply for Detailed Monitoring.
When launching a new instance via Amazon EC2 Auto-Scaling, it can take a few minutes for the new instance to launch and for the User Data script (if any) to run. Linux instances are quite fast, but Windows instances take a while on their first boot due to sysprep operations.
You mention that you want to react to a metric in less than one minute. I would suggest that this would not be an ideal way to trigger Auto-scaling. Sometimes a computer can be busy for a while, then can drop down again. Reacting too quickly to a high CPU load would cause the Auto-Scaling group to flap between adding instances and terminating instances. It is better to provision enough capacity for a reasonable amount of extra load and then gradually add more capacity as it is required over time.
If you have a need to react so quickly, then perhaps you should investigate using AWS Lambda to perform small amounts of work in a highly-parallel fashion rather than relying on Amazon EC2 instances.

How to track unused resources in AWS?

I have been using AWS for a while now. I always have the difficulty tracking AWS resources and how they are interconnected. Obviously, I am using Terraform but still, there is always ad-hoc operations that cut down my visibility.
Since I have been charged multiple times for resources/services that are present but not used by me.
Unused services include resources that are not pointing to other services but present in the AWS environment.
Tools suggestions are also welcome.
Also, posted on DevOps. Posting here since there are fewer people there.
I have used Janitor Monkey, Cloud Custodian and we do have a bunch of AWS Config + Lambda for cleaning up.
Janitor Monkey determines whether a resource should be a cleanup
candidate by applying a set of rules on it. If any of the rules
determines that the resource is a cleanup candidate, Janitor Monkey
marks the resource and schedules a time to clean it up.
I think that a viable answer here is the same as the popular answer for when to auto-scale - use CloudWatch alarms.
Whenever you have a service that you need to auto-scale up, you do something like monitor for high CPU. If the CPU usage trips some threshold, the alarm can be configured to scale up your fleet. Correspondingly, if CPU usage goes below some threshold, the alarm can be configured to scale down the fleet. Similar alarms can be configured other alerts like memory, disk usage, etc.
So, instead of configuring CloudWatch alarms to scale up or scale down your fleet, you can just configure a CloudWatch alarm to email you when a host becomes idle (e.g. it's CPU usage is too low).
Similar to Janitor Monkey, I've created a tool to track different types of unused resources (ELB, EBS, AMI, Security groups, etc) : https://github.com/romibuzi/majordome

Passenger - Using "Requests in queue" as an AWS metric for autoscaling

I am surprised to find little information regarding EC2 autoscaling with Phusion Passenger.
I actually discovered not so long ago a metric "Requests in queue" being exposed upon running passenger-status
I am wondering whether this stat would make a nice metric to help with autoscaling.
Right now most AWS EC2 Autoscaling guides mention using CPU and Memory to write autoscaling rules but I find this insufficient. When I think about the problem autoscaling should solve, that is being able to scale up to the demand, I'd rather base those rules on the number of pending/completed requests to report a node health or a cluster congestion, and Passenger "Requests in queue" (and also for each process, the "Last Used" and "Processed" count) seems to useful.
I am wondering it it would be possible to report this "Requests in queue" stat (and eventually others) periodically as an AWS metric. I was thinking the following rule would be ideal for autoscaling : If the average number of "requests in queue" on the autoscaled instances is to exceed a threshold value, this would trigger spawning a new machine from the autoscaling group.
Is this possible ?
Has anyone ever tried to implement autoscaling rules based on number of requests in queue this way ?
This is totally possible (and a good approach).
Step 1. Create custom CloudWatch metric for "Requests in queue".
You will have to write your own agent that runs passenger-status, extracts the value and sends it to CloudWatch. You can use any AWS SDK or just AWS CLI: http://docs.aws.amazon.com/cli/latest/reference/cloudwatch/put-metric-data.html
Step 2. Create alarms for scale up and scale down based on your custom metric.
Step 3. Modify scaling policy for your Auto Scaling Group to use your custom alarms to scale up/down.

AWS site down issue because cpu utilization reach 100%

I am using an Amazon EC2 instance with instance type m3.medium and an Amazon RDS database instance.
In my working hours the website goes down because CPU utilization reaches 100%, and at night (not working hours) the CPU utilization is 60%.
So please give me right solution for this site down issue. I am not sure why I am experiencing this problem.
Once I had set a cron job for every minutes, but I was removed it because of slow down issue, but still I have site down issue.
When i try to use "top" command, i had shows below images for cpu usage, in which httpd command consume more cpu usage, so any suggestion for settings to reduce cpu usage with httpd command
Without website use by any user below two images:
http://screencast.com/t/1jV98WqhCLvV
http://screencast.com/t/PbXF5EYI
After website access simultaneously 5 users
http://screencast.com/t/QZgZsiNgdCUl
If you are CPU Utilization is reaching 100% you have two options.
Increase your EC2 Instance Type to large.
Use AutoScaling to launch one more EC2 Instance of same Instance Type.
Looks like you need some scheduled actions as you donot need 100% CPU Utilization during non-working hours.
The best possible option is to use AWS AutoScaling with Scheduled actions.
http://docs.aws.amazon.com/autoscaling/latest/userguide/schedule_time.html
AWS AutoScaling can launch new EC2 instances based on your CPU Utilization (or other metrics like Network Load, Disk read/write etc). This way you can always keep your site alive.
Using the AutoScaling scheduled actions you can specify metrics such that you stop your autoscaled instances during non-working hours and autoscale instances during working hours according to CPU Utilization(or other metrics).
You can even stop your severs if you donot need them at some point of time.
If you are not familiar with AWS AutoScaling you can follow the Documentation which is very precise and easy.
http://docs.aws.amazon.com/autoscaling/latest/userguide/GettingStartedTutorial.html
If the cpu utilization reach 100% bacause of the number of visitors your site have, you must consider to change the instance type, Auto Scaling or AWS CloudFront in order to cache as many http requests as posible (static and dynamic content).
If visitors are not the problem and there are other scheduled tasks on the EC2 isntance, I strongly recomend to decouple these workload via AWS SQS & AWS Elasticbeanstalk - Worker type

AWS CloudWatch Alarms to multiple EC2 instances

I'm wanting to apply a CloudWatch alarm to stop instances which aren't being used in our pre-production environment. We often have instances being spun up, used and then left turned on which is really starting to cost us a fair amount of money.
CloudWatch alarms have a handy feature whereby we can stop based on some metrics - this is awesome and what I'd like to use to constantly keep an eye on the servers with but let it tidy up the instances for me.
The problem with this is that it appears that the CloudWatch alarms need to be created individually against each instance. Is there a way in which I can create one alarm which would share values across all current and future instances which will be started?
ETA - Alternatively, tell me that these options are better than CloudWatch and I'll be happy at that.
AWS EC2 stop all through PowerShell/CMD tools
Add a startup script that creates the CloudWatch alarm to the base image you use to generate your VMs.
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/CLIReference.html
I don't believe this is possible - CloudWatch seems designed to be 'very manual' or 'very automated'. i.e. You can't setup one alarm which would go off if any one instance is idle, you have to setup individual alarms for each instance.
A couple of possible solutions, which are probably not what you want to hear:
Script your instance creation, and add a call to cloudwatch to create an alarm for each instance.
Run a service continually, which looks for instances and checks to ensure that there is an alarm for the instance, create alarms for the new instances, and remove alarms for instances which have been terminated.
I think what you are actually looking for would be auto-scaling:
https://aws.amazon.com/documentation/autoscaling/