Amazon EC2 Instance Monitoring? - amazon-web-services

I need a fairly short, simple script to monitor my EC2 instances for memory and CPU (for now).
Running Get-EC2Instance with -Region lists all of the instances. Where can I go from here?

CloudWatch is the monitoring tool for AWS instances. It supports custom metrics, but by default it only measures what the hypervisor can see for your instance.
CPU utilization is supported by default. This is often a more accurate view of your true CPU utilization, since the value comes from the hypervisor.
Memory utilization, however, is not. It depends largely on your OS and is not visible to the hypervisor. You can, however, set up a script that reports this metric to CloudWatch. Some scripts to help you do this are here: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts-perl.html
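As a rough illustration of what such a reporting script does, here is a minimal Python sketch. It assumes a Linux instance with /proc/meminfo, the boto3 library, and credentials that allow cloudwatch:PutMetricData. The namespace "Custom/System" and the metric name "MemoryUtilization" are made-up names for this example, not anything AWS defines.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_utilization_percent(meminfo):
    """Percentage of memory in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def build_metric_datum(instance_id, percent):
    """Build one entry for the MetricData list of put_metric_data."""
    return {
        "MetricName": "MemoryUtilization",  # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Unit": "Percent",
        "Value": percent,
    }


def publish_memory_metric(instance_id, namespace="Custom/System"):
    """Read local memory usage and push it to CloudWatch (needs boto3 and credentials)."""
    import boto3  # imported here so the helpers above work without it

    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    datum = build_metric_datum(instance_id, memory_utilization_percent(meminfo))
    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=[datum])
```

Running publish_memory_metric from cron every minute or so would give you a memory graph next to the built-in CPU one.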

There are a few possibilities for monitoring EC2 instances.
Nagios - http://www.nagios.com/solutions/aws-monitoring
StackDriver - http://www.stackdriver.com/
CopperEgg - http://copperegg.com/aws/
But my favorite is Datadog - http://www.datadoghq.com/ - (not just because I work there, but it's important to disclose that I do work for Datadog). 5 hosts or fewer is free, and I bet you can be up and running in less than 5 minutes.

It depends on what your requirements are for the availability of the monitoring solution itself, as well as how you want to be alerted about host/service notifications.
Nagios, Icinga, etc. let you customise an extremely large number of parameters that can be passed to your EC2 hosts, specifying exactly what you want to monitor or check up on. You can run any of the default (or custom) scripts, which feed data back to a central system, and then handle those notifications however you want (e.g. send an email or SMS, or execute an arbitrary script). The downside of this approach is that you need to self-manage the backend for all of the aggregated monitoring data.
With the CloudWatch approach, your instances push metric data into AWS, and you then define custom policies around thresholds. For example, 90% CPU usage for more than 5 minutes on an instance or ASG might push a message out to your email via SNS (Simple Notification Service). This method reduces the number of backend components to manage and maintain, but lacks the extreme customisation abilities of self-hosted monitoring platforms.
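To make the "90% CPU for 5 minutes, then notify via SNS" example concrete, here is a minimal sketch using boto3's put_metric_alarm. The alarm name, instance ID, and topic ARN are placeholders I made up. A 300-second period with one evaluation period is just one way to express "5 minutes", and it works with basic (5-minute) monitoring.

```python
def cpu_alarm_params(instance_id, sns_topic_arn, threshold=90.0,
                     period_seconds=300, evaluation_periods=1):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that emails you
    }


def create_cpu_alarm(instance_id, sns_topic_arn):
    """Create the alarm (needs boto3 and credentials with cloudwatch:PutMetricAlarm)."""
    import boto3  # imported lazily; only needed when actually calling AWS

    params = cpu_alarm_params(instance_id, sns_topic_arn)
    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameter building separate from the API call makes it easy to review the alarm definition before creating it.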

Related

Cloudwatch Period time

In CloudWatch, I cannot select a period shorter than 1 minute for CPU metrics. How can I lower this period so that Auto Scaling is triggered faster? I just need to launch the Auto Scaling instances quickly. (By the way, the datapoints value is 1 out of 1.)
The minimum granularity for the metrics that EC2 provides is 1 minute.
Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html
Would also say that if you need to scale that quickly, wouldn't the startup time be an issue anyway?
You are correct -- basic monitoring of an Amazon EC2 instance provides metrics over 5-minute periods. If you activate EC2 Detailed Monitoring, metrics are provided over 1-minute periods. Extra charges apply for Detailed Monitoring.
When launching a new instance via Amazon EC2 Auto-Scaling, it can take a few minutes for the new instance to launch and for the User Data script (if any) to run. Linux instances are quite fast, but Windows instances take a while on their first boot due to sysprep operations.
You mention that you want to react to a metric in less than one minute. I would suggest that this would not be an ideal way to trigger Auto-scaling. Sometimes a computer can be busy for a while, then can drop down again. Reacting too quickly to a high CPU load would cause the Auto-Scaling group to flap between adding instances and terminating instances. It is better to provision enough capacity for a reasonable amount of extra load and then gradually add more capacity as it is required over time.
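A toy simulation of the flapping problem, purely for illustration: the thresholds and the CPU series are invented, and real Auto Scaling policies (cooldowns, datapoints-to-alarm settings) are more involved than this. It only counts how many scale-out or scale-in decisions would fire if a decision needs a given number of consecutive breaching datapoints.

```python
def count_scaling_actions(cpu_series, high=80.0, low=30.0, consecutive=1):
    """Count scaling decisions for a series of CPU readings.

    A decision fires only after `consecutive` readings in a row are above
    `high` (scale out) or below `low` (scale in).
    """
    actions = 0
    high_run = 0
    low_run = 0
    for cpu in cpu_series:
        high_run = high_run + 1 if cpu > high else 0
        low_run = low_run + 1 if cpu < low else 0
        if high_run == consecutive:
            actions += 1
            high_run = 0
        elif low_run == consecutive:
            actions += 1
            low_run = 0
    return actions


# A spiky workload: busy one minute, quiet the next.
spiky = [90, 20, 90, 20, 90, 20]
reacting_instantly = count_scaling_actions(spiky, consecutive=1)
waiting_three_points = count_scaling_actions(spiky, consecutive=3)
```

With this series, reacting to every datapoint adds or removes capacity on each reading, while waiting for three consecutive datapoints never acts at all.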
If you have a need to react so quickly, then perhaps you should investigate using AWS Lambda to perform small amounts of work in a highly-parallel fashion rather than relying on Amazon EC2 instances.

AWS Network out

Our web application has 5 pages (Signin, Dashboard, Map, Devices, Notification)
We have done the load test for this application, and load test script does the following:
Signin and go to Dashboard page
Click Map
Click Devices
Click Notification
We have a basic free plan in AWS.
During the load test, we didn't get any errors up to about 100 users. NetworkIn and CPUUtilization looked normal, but NetworkOut showed 846K.
At around 114 users, we started getting errors on the Map page. At that point, only NetworkOut appeared to be high.
We want to know what the optimal value for NetworkOut is. If this number is high, is there any way to reduce it?
Please let me know if you need more information. Thanks in advance for your help.
You are using a t2.micro instance.
This instance type has CPU limitations that make it good for bursty workloads, but sustained loads will consume all of the available CPU credits. As a result, it might perform poorly under sustained load over long periods.
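A back-of-the-envelope sketch of how quickly a sustained load drains the credits. One CPU credit is one vCPU at 100% for one minute. The t2.micro figures used here (6 credits earned per hour, a maximum balance of 144, 1 vCPU) are my assumptions from the AWS documentation at the time of writing, so check the current numbers for your instance type.

```python
def hours_until_credits_exhausted(balance, earn_per_hour, utilization_percent, vcpus=1):
    """Hours of sustained load before the CPU credit balance reaches zero.

    Returns None when the load is at or below the baseline, i.e. credits are
    earned at least as fast as they are spent.
    """
    spend_per_hour = 60 * vcpus * utilization_percent / 100
    net_drain = spend_per_hour - earn_per_hour
    if net_drain <= 0:
        return None
    return balance / net_drain


# Assumed t2.micro figures: full balance of 144 credits, earning 6 per hour.
full_load = hours_until_credits_exhausted(144, 6, utilization_percent=100)
half_load = hours_until_credits_exhausted(144, 6, utilization_percent=50)
at_baseline = hours_until_credits_exhausted(144, 6, utilization_percent=10)
```

Under these assumptions a full balance lasts under three hours at 100% CPU and six hours at 50%, which is why a long load test on this instance can degrade partway through.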
The instance also has limited network bandwidth that might impact the throughput of the server. While all Amazon EC2 instances have limited allocations of bandwidth, the t2.micro and t2.nano have particularly low bandwidth allocations. You can see this when copying data to/from the instance and it might be impacting your workloads during testing.
The t2 family, especially at the low-end, is not a good choice for production workloads. It is great for workloads that are sometimes high, but not consistently high. It is also particularly low-cost, but please realise that there are trade-offs for such a low cost.
See:
Amazon EC2 T2 Instances – Amazon Web Services (AWS)
CPU Credits and Baseline Performance for Burstable Performance Instances - Amazon Elastic Compute Cloud
Unlimited Mode for Burstable Performance Instances - Amazon Elastic Compute Cloud
That said, the network throughput showing on the graphs is a result of your application. While the t2 might be limiting the throughput, it is not responsible for the spike on the graph. For that, you will need to investigate the resources being used by the application(s) themselves.
NetworkOut simply refers to the volume of outgoing traffic from the instance. To reduce NetworkOut, you need to reduce the amount of data this instance sends out. So you may need to check which of the Map, Devices, and Notification clicks causes the instance to send the most traffic. It is not necessarily related only to the number of users; it may be a combination of the number of users and the application module.

How to track unused resources in AWS?

I have been using AWS for a while now. I always have difficulty tracking AWS resources and how they are interconnected. I do use Terraform, but there are always ad-hoc operations that reduce my visibility.
As a result, I have been charged multiple times for resources/services that exist but that I don't use.
By unused, I mean resources that are present in the AWS environment but are not referenced by any other service.
Tools suggestions are also welcome.
Also, posted on DevOps. Posting here since there are fewer people there.
I have used Janitor Monkey, Cloud Custodian and we do have a bunch of AWS Config + Lambda for cleaning up.
Janitor Monkey determines whether a resource should be a cleanup
candidate by applying a set of rules on it. If any of the rules
determines that the resource is a cleanup candidate, Janitor Monkey
marks the resource and schedules a time to clean it up.
I think that a viable answer here is the same as the popular answer for when to auto-scale - use CloudWatch alarms.
Whenever you have a service that you need to auto-scale up, you do something like monitor for high CPU. If the CPU usage trips some threshold, the alarm can be configured to scale up your fleet. Correspondingly, if CPU usage goes below some threshold, the alarm can be configured to scale down the fleet. Similar alarms can be configured for other metrics such as memory, disk usage, etc.
So, instead of configuring CloudWatch alarms to scale your fleet up or down, you can just configure a CloudWatch alarm to email you when a host becomes idle (e.g. its CPU usage is too low).
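If you would rather run a periodic check than set up an alarm, the same idea can be sketched as a script. This assumes boto3 and credentials that allow cloudwatch:GetMetricStatistics. The 2% threshold, the 24-hour window, and the minimum of 12 datapoints are arbitrary values I picked for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_idle(datapoints, max_avg_cpu=2.0, min_points=12):
    """True if there is enough data and every hourly average is below the threshold."""
    if len(datapoints) < min_points:
        return False  # not enough history to judge
    return all(point["Average"] < max_avg_cpu for point in datapoints)


def fetch_cpu_datapoints(instance_id, hours=24):
    """Fetch hourly average CPUUtilization for one instance (needs boto3)."""
    import boto3  # imported lazily so is_idle can be used without it

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=3600,
        Statistics=["Average"],
    )
    return response["Datapoints"]
```

Calling is_idle(fetch_cpu_datapoints(instance_id)) for each instance gives you a list of candidates to review by hand before deleting anything.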
Similar to Janitor Monkey, I've created a tool to track different types of unused resources (ELB, EBS, AMI, Security groups, etc) : https://github.com/romibuzi/majordome

How to ensure AWS Elastic Beanstalk is free

I am wanting to deploy a Django webapp with a PostgreSQL database to AWS Elastic Beanstalk using this tutorial, but I am so confused about pricing. It says it uses services in the AWS Free Tier, but those seem to be limited to a certain number of hours a month, so how do I make sure I don't go above that threshold? And how do I make sure I'm only using free services? They even require a card on file, so it seems really hard to make sure I don't get charged.
You can use the following configuration to make sure you can use AWS Elastic Beanstalk free for one year.
Use only Micro instances for the WebServer and the RDS instance.
Limit the scaling of the WebServer to a maximum of 1, or use a Standalone deployment without autoscaling.
When selecting storage, use less than 30GB for EBS and don't enable Provision Throughput.
Apart from these, there are usage-based costs for network, EBS IOPS, etc. These include a free quota, and the cost is not considerable for light use cases.
The AWS Free Tier allows AWS accounts to use a certain amount of services for no charge. Any usage beyond the free tier limits will result in a charge on your credit card.
The Free Tier is intended to provide a trial of AWS services. It is not intended for production use, nor is there any guaranteed way to stay within the free limits. It is up to you to monitor your usage.
There is no such thing as a totally free AWS account.
I have found the "Cost Management Preferences" -> "Receive Free Tier Usage Alerts" setting in the Billing preferences menu. Hopefully this will be enough for small personal projects with low usage. I would guess it is not enough for large projects, since it is only a notification.
In short, you can absolutely make sure that your app stays free, just not from within the AWS interface. You'll have to use your own usage monitoring to ensure you stay within the free limits as others state.
As Ashan said, this is a pretty silly approach since fees are nominal and the alternative is a loss of service, however, AWS does offer APIs to help you do this through CloudWatch.
CloudWatch exposes pretty much all of the billable metrics on a service-by-service basis, for example here are the metrics for EC2, and here are the metrics for S3. After starting your services through beanstalk, just look up all the services you're using via the billing page of the AWS console, look up the CloudWatch APIs for each, then check them.
At least for EC2, there are even customizable alarms and actions, including shutting down the instance. See the Monitoring tab at the bottom of the EC2 console. Not sure, but you might have to manually throw status updates to their status system for some of the other metrics. If so, it's not that difficult. You'd set up an access key for some IAM identity so you can check CloudWatch stuff from command line. Then, you'd write a watchdog script to run on that instance using AWSCLI to regularly ping CloudWatch and call your shutdown code or modify your status if you're over some percentage of your quota.
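The decision part of such a watchdog might look like the sketch below. The 750-hour monthly quota and the 80%/95% thresholds are assumptions for illustration only; check your own account's free tier terms. How you obtain hours_used (CloudWatch, billing data, your own bookkeeping) is left out here.

```python
def watchdog_decision(hours_used, quota_hours=750.0, warn_at=0.80, stop_at=0.95):
    """Return "ok", "warn", or "stop" depending on how much of the quota is used."""
    used_fraction = hours_used / quota_hours
    if used_fraction >= stop_at:
        return "stop"
    if used_fraction >= warn_at:
        return "warn"
    return "ok"


def act_on_decision(decision, instance_id):
    """Stop the instance when the quota is nearly used up (needs boto3)."""
    if decision == "stop":
        import boto3  # imported lazily; only needed when actually calling AWS

        boto3.client("ec2").stop_instances(InstanceIds=[instance_id])
    elif decision == "warn":
        print(f"Warning: {instance_id} is close to the assumed free quota")
```

You would run this on a schedule, for example from cron, and feed it the usage figure you trust most.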

Jmeter load test with 30K users with aws

My scenario is described below; please suggest a solution.
I need to run 17 HTTP REST APIs for 30K users.
I will create 6 AWS instances (slaves) to run the 30K users (6 instances * 5,000 users).
Each AWS instance (slave) needs to handle 5K users.
I will create 1 AWS instance (master) to control the 6 AWS slaves.
1) For the master AWS instance, what instance type and storage do I need?
2) For the slave AWS instances, what instance type and storage do I need?
3) The main objective is for a single AWS instance to handle 5,000 (5K) users. What instance type and storage do I need for this, at the lowest cost?
Full ELB DNS Name:
The answer is: I don't know. You need to find out for yourself how many users you can simulate on a given AWS instance, since it depends on the nature of your test: what it is doing, response size, number of postprocessors/assertions, etc.
So I would recommend the following approach:
First of all, make sure you are following the recommendations from 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure.
Start with a single AWS server, e.g. t2.large, and a single virtual user. Gradually increase the load while monitoring the AWS instance's health (CPU, RAM, disk, etc.) using either Amazon CloudWatch or the JMeter PerfMon Plugin. Once one of the monitored resources starts running out (e.g. CPU usage exceeds 90%), stop your test and note the number of virtual users at that stage (you can use e.g. the Active Threads Over Time listener for this).
Depending on the outcome, either switch to another instance type (e.g. Compute Optimized if CPU is the bottleneck, or Memory Optimized if RAM is) or go for a higher-spec instance of the same tier (e.g. t2.xlarge).
Once you know the number of users you can simulate on a single host, you should be able to extrapolate it to other hosts.
The JMeter master host doesn't need to be as powerful as the slave machines; just make sure it has enough memory to handle the incoming results.
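The extrapolation step is simple arithmetic, sketched here. The per-host capacities in the examples are hypothetical; the real number is whatever you measured in the ramp-up test. The optional headroom is my own addition, so that hosts are not planned to run right at the limit you observed.

```python
import math


def hosts_needed(target_users, users_per_host, headroom_percent=0):
    """Number of load generators needed for the target user count.

    headroom_percent reserves part of each host's measured capacity.
    """
    usable_per_host = users_per_host * (100 - headroom_percent)
    return math.ceil(target_users * 100 / usable_per_host)


# If a host really can drive 5,000 users, 30K users need 6 hosts.
exact_fit = hosts_needed(30000, 5000)
# Keeping 20% headroom on the same host raises that to 8.
with_headroom = hosts_needed(30000, 5000, headroom_percent=20)
# If the measurement shows only 3,500 users per host, 9 hosts are needed.
smaller_hosts = hosts_needed(30000, 3500)
```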