How to get information on which resource actually breached cloudwatch alarm - amazon-web-services

I have set up a CloudWatch alarm on the RDS "By Database Engine" metrics for postgres, using the "FreeableMemory" metric, that fires when free memory drops below 1 GB. When the alarm is raised, is it possible to find out which specific RDS instance actually breached it?
I could use the "Per Database" metrics, but then I have to set up one alarm for every DB instance I have. Is it possible to set an alarm on a broader category like "By Database Engine" and, when the alarm is breached, just look at the event payload to get the list of resources that actually breached it?

I don't think you can include that information in a Database Engine level alarm. However, once the alarm is raised, you can easily see which DB has crossed your threshold: go to CloudWatch Metrics, select the FreeableMemory metric, and then select all databases from the list. In the graph you can see which instances have dropped below the threshold.
As you mentioned, you could also create an alarm for each DB instance. That may be a tedious task if you do it manually, but you can easily automate it using a script which iterates through the list of your DB instances.
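A minimal sketch of such a script, assuming boto3 and one alarm per PostgreSQL instance. The alarm naming scheme, the 5-minute period and the single evaluation period are illustrative choices, not something from the question. The clients are passed in as arguments so the logic can be exercised without AWS access.

```python
ONE_GIB = 1024 ** 3  # FreeableMemory is reported in bytes


def build_alarm_params(db_instance_id, threshold_bytes=ONE_GIB, alarm_actions=()):
    """Return keyword arguments for CloudWatch put_metric_alarm."""
    return {
        # Hypothetical naming scheme; pick whatever fits your conventions.
        "AlarmName": f"rds-{db_instance_id}-low-freeable-memory",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeableMemory",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 300,            # assumed: 5-minute periods
        "EvaluationPeriods": 1,   # assumed: alarm on the first breaching period
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": list(alarm_actions),
    }


def create_alarms(rds, cloudwatch, engine="postgres", alarm_actions=()):
    """Create one alarm per DB instance running the given engine."""
    created = []
    paginator = rds.get_paginator("describe_db_instances")
    for page in paginator.paginate():
        for db in page["DBInstances"]:
            if db["Engine"] != engine:
                continue
            params = build_alarm_params(
                db["DBInstanceIdentifier"], alarm_actions=alarm_actions
            )
            cloudwatch.put_metric_alarm(**params)
            created.append(params["AlarmName"])
    return created


def main():
    # boto3 is imported here so the helpers above stay importable without it.
    import boto3

    print(create_alarms(boto3.client("rds"), boto3.client("cloudwatch")))
```

Calling main() with AWS credentials configured would create the alarms. Because each alarm carries the DBInstanceIdentifier dimension, the alarm name and its notification identify the instance that breached.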

Related

Update cloudwatch alarm threshold based on the storage size of RDS

We have a setup where RDS is provisioned through Terraform, and a CloudWatch alarm on the RDS free storage space metric is set, also through Terraform, to fire below 10% of total RDS storage. Total RDS storage is a static value. Whenever there is a storage shortage we manually increase the RDS storage through the AWS console. Is there a way to update the CloudWatch alarm threshold automatically, so that it is set to 10% of the new total storage after increasing RDS storage?
There is an alternative, although it is considered an anti-pattern: it is generally a bad idea to have two different processes controlling the configuration of the same resource. However, if you insist on controlling the size of the RDS instance outside the Terraform plan, you can automate the CloudWatch alarm settings.
The root problem is that the FreeStorageSpace RDS metric only understands bytes, not percentages. Terraform does that conversion for you, so you don't see it. That is why the alarm does not change when you change the allocated storage in the RDS console. You could develop a simple Lambda function that uses one of the AWS SDKs (like boto3 for Python) to periodically query the allocated storage of the RDS instance and update the alarm on FreeStorageSpace accordingly. It could be made more sophisticated by having SNS notify an SQS queue about changes in the RDS configuration, which could then trigger the Lambda to re-evaluate the threshold; that would make the update fast and efficient. You would need to attach an IAM policy that allows the Lambda to read data on the RDS instance in question and to update the alarm in question. Alternatively, you could run it manually right after the AWS console update, but in that case you could probably just as easily update the alarm yourself.
I still wouldn't recommend this path, because the resource is already controlled by Terraform.
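A rough sketch of what such a Lambda could look like, assuming boto3 and a single-metric alarm. The environment variable names DB_INSTANCE_ID and ALARM_NAME are hypothetical. It relies on AllocatedStorage being reported in GiB and FreeStorageSpace in bytes, and it rewrites the existing alarm with put_metric_alarm, copying over only the fields that call accepts.

```python
GIB = 1024 ** 3

# Fields returned by describe_alarms that put_metric_alarm also accepts.
COPIED_KEYS = (
    "AlarmName", "AlarmDescription", "ActionsEnabled", "OKActions",
    "AlarmActions", "InsufficientDataActions", "MetricName", "Namespace",
    "Statistic", "Dimensions", "Period", "EvaluationPeriods",
    "DatapointsToAlarm", "ComparisonOperator", "TreatMissingData",
)


def threshold_bytes(allocated_gib, percent=10):
    """Convert a percentage of allocated storage (GiB) into bytes."""
    return allocated_gib * GIB * percent / 100


def sync_alarm_threshold(rds, cloudwatch, db_instance_id, alarm_name, percent=10):
    """Set the alarm threshold to `percent` of the instance's current storage.

    Returns the new threshold, or None if the alarm was already up to date.
    """
    db = rds.describe_db_instances(DBInstanceIdentifier=db_instance_id)["DBInstances"][0]
    alarm = cloudwatch.describe_alarms(AlarmNames=[alarm_name])["MetricAlarms"][0]
    new_threshold = threshold_bytes(db["AllocatedStorage"], percent)
    if alarm["Threshold"] == new_threshold:
        return None
    params = {key: alarm[key] for key in COPIED_KEYS if key in alarm}
    params["Threshold"] = new_threshold
    cloudwatch.put_metric_alarm(**params)  # overwrites the alarm of the same name
    return new_threshold


def lambda_handler(event, context):
    # Imports kept local so the helpers above can be tested without AWS access.
    import os
    import boto3

    return sync_alarm_threshold(
        boto3.client("rds"),
        boto3.client("cloudwatch"),
        os.environ["DB_INSTANCE_ID"],  # hypothetical variable name
        os.environ["ALARM_NAME"],      # hypothetical variable name
    )
```

Note that the next Terraform apply would reset the threshold to whatever the configuration computes, which is exactly the two-controllers problem described above.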
This should be updated automatically by Terraform if you use Terraform code to change the allocated storage. CloudWatch itself doesn't understand the relationship between the alarm and what you are watching; it is just programmed with a threshold value by Terraform when you apply the configuration. Since Terraform performs both actions on your behalf (allocating the RDS instance and setting up the CloudWatch alarm), it knows what needs to be set. Using the AWS console, however, essentially sidesteps Terraform: the size increases, but Terraform is unaware of the change and so does not update the CloudWatch alarm. Using Terraform to apply the size update should fix the alarm, depending on how you set up your alarm threshold.

Programmatically Stop AWS EC2 in case of inactivity

Can we stop an AWS windows server EC2 instance of a development environment when there is no activity in it, say after 2 hours of inactivity? I am having trouble identifying whether any user is connected to the server virtually.
I can easily start/stop the EC2 at a fixed time, programmatically, but in order to cut the cost of my server, I am trying to stop the EC2 when it is not being used.
My intent (or use case) is: if no user has used the EC2 instance for a specified amount of time, it stops automatically. Developers can restart it as and when needed.
The easiest solution is probably to set up an alarm with CloudWatch.
Have a read of the documentation, which describes your use case almost exactly:
"You can create an alarm that stops an Amazon EC2 instance when a certain threshold has been met."
A condition could be the average CPU utilisation, e.g. CPU utilisation is below a certain point (which most probably correlates with no logged in users / no developer actually utilising the machine).
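As a sketch of that idea with boto3: the alarm below stops the instance when average CPU stays at or below a threshold. The 10% threshold, the two one-hour periods (to approximate the 2 hours in the question) and the alarm name are assumptions for illustration. The stop action uses the arn:aws:automate:<region>:ec2:stop form.

```python
def stop_alarm_params(instance_id, region, threshold_percent=10.0,
                      period_seconds=3600, evaluation_periods=2):
    """Build put_metric_alarm arguments for a 'stop when idle' alarm."""
    return {
        "AlarmName": f"stop-{instance_id}-when-idle",  # hypothetical name
        "AlarmDescription": "Stop the instance after sustained low CPU",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold_percent,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        # Built-in EC2 alarm action: no Lambda is needed for a plain stop.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
    }


def create_stop_alarm(cloudwatch, instance_id, region, **overrides):
    """Create (or overwrite) the alarm using the given CloudWatch client."""
    params = stop_alarm_params(instance_id, region, **overrides)
    cloudwatch.put_metric_alarm(**params)
    return params["AlarmName"]
```

With credentials configured, create_stop_alarm(boto3.client("cloudwatch", region_name=region), instance_id, region) would register the alarm. Whether low CPU really means "nobody is using the machine" is a judgment call that depends on the workload.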
This is not a simple task.
The Amazon EC2 service provides a virtual computer that has RAM, CPU and Disk. It can view the amount of activity on the CPU, Network traffic and disk access but it cannot see into the Operating System.
So, the problem becomes how to detect 'inactivity'. This really comes down to the operating system and making some hard decisions. For example, your home computer screen turns off after a defined time of no mouse/keyboard input but the operating system is still doing activity in the background. If the system is running an application such as a web server, and there are no web requests, it is hard to know whether this is 'inactive' because there are no requests, or 'active' because the web server is running.
Bottom line: There is no out-of-the-box feature to do this. You would need to find your own definition of 'inactivity' and then trigger a shutdown in the Operating System.
If you wish to do it via schedule, this might help: Auto-Stop EC2 instances when they finish a task - DEV Community
UPDATE: Lambdas aren't needed anymore; see tpschmidt's answer.
Create a Lambda that turns off the EC2 instance, triggered by a CloudWatch alarm when, for example, the average CPU goes under 20% for an hour. This is fine when you're coding, as you will be using more than 20%, and when you take a break for over an hour, that's when you want it turned off.
Be sure to enable auto save in your IDEs.
Example Python Lambda:
import boto3

region = 'eu-west-3'
instances = ['i-05be5c0c4039881ed']
ec2 = boto3.client('ec2', region_name=region)

def lambda_handler(event, context):
    # TODO: getInstanceIDFromCloudWatch = event["instanceid"]
    ec2.stop_instances(InstanceIds=instances)
    print('stopped your instances: ' + str(instances))
Ref: https://www.howtoforge.com/aws-lambda-function-to-start-and-stop-ec2-instance/
In the AWS Console:
1. Go to EC2, select the EC2 instance and copy its Instance ID.
2. Go to CloudWatch and select Metrics.
3. Under AWS Namespaces, click EC2.
4. Paste the Instance ID to find it.
5. Select EC2 > Per-Instance Metrics.
6. Choose the first metric, CPUUtilization.
7. Select the second tab, called Graphed metrics.
8. Click the bell icon under Actions.
9. Set a threshold. This is the hard part; leave the default of Statistic: Average over 1 hour.
10. Set the condition to Lower/Equal and the value to 20% (the instance is turned off once its average CPU over the hour is 20% or lower).
11. Next, create the alarm; set up a notification if you like, or remove it.
Once the alarm is created:
12. In CloudWatch, select Events > Rules.
13. Add a rule.
14. Select EC2 as the Service Name and All Events.
15. Click Target and select your Lambda.
When the alarm goes off, the Lambda will turn off the instance.
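The TODO in the Lambda above (reading the instance ID from the event instead of hard-coding it) could be filled in roughly as follows. This is a sketch under a specific assumption: the Lambda is invoked by a rule that matches "CloudWatch Alarm State Change" events, and the event carries the alarm's metric dimensions under detail.configuration.metrics, which is the shape as I understand it from the AWS documentation. Verify against a real event before relying on it. The optional ec2 parameter is there only to make the handler testable.

```python
def instance_id_from_alarm_event(event):
    """Return the InstanceId dimension of the alarm's metric, or None."""
    metrics = event.get("detail", {}).get("configuration", {}).get("metrics", [])
    for entry in metrics:
        dimensions = entry.get("metricStat", {}).get("metric", {}).get("dimensions", {})
        if "InstanceId" in dimensions:
            return dimensions["InstanceId"]
    return None


def lambda_handler(event, context, ec2=None):
    """Stop the instance the alarm refers to, but only when it enters ALARM."""
    state = event.get("detail", {}).get("state", {}).get("value")
    instance_id = instance_id_from_alarm_event(event)
    if state != "ALARM" or instance_id is None:
        return None
    if ec2 is None:
        # Created lazily so the function can be tested with a stub client.
        import boto3
        ec2 = boto3.client("ec2")
    ec2.stop_instances(InstanceIds=[instance_id])
    print("stopped instance: " + instance_id)
    return instance_id
```

Checking the state value matters because the same kind of event is also emitted when the alarm returns to OK.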
You can set up an AWS CloudWatch alarm that monitors activity. Parameters such as ComparisonOperator, Period, and Threshold can be adjusted according to how you want to monitor your EC2 instance.
Then, you can set up an SQS queue and set a Python Lambda function as its target. Within the Lambda function, you can use boto3 to turn off the EC2 instance. You can read more details here: https://medium.com/geekculture/automatically-turn-off-ec2-instances-upon-inactivity-31fedd363cad
Terraform setup:
https://medium.com/geekculture/terraform-setup-for-automatically-turning-off-ec2-instances-upon-inactivity-d7f414390800
You are looking to add a stop action to your EC2 instance; this can be easily achieved using CloudWatch alarms.
You can do this from the console using the following steps:
1. Open the Amazon EC2 console.
2. In the navigation pane, choose Instances.
3. Select the instance and choose Actions, Monitor and troubleshoot, Manage CloudWatch alarms. Alternatively, you can choose the plus sign (+) in the Alarm status column.
4. On the Manage CloudWatch alarms page, do the following:
   a. Choose Create an alarm.
   b. To receive an email when the alarm is triggered, for Alarm notification, choose an existing Amazon SNS topic. You first need to create an Amazon SNS topic using the Amazon SNS console. For more information, see Using Amazon SNS for application-to-person (A2P) messaging in the Amazon Simple Notification Service Developer Guide.
   c. Toggle on Alarm action, and choose Stop.
   d. For Group samples by and Type of data to sample, choose a statistic and a metric. In this example, choose Average and CPU utilization.
   e. For Alarm When and Percent, specify the metric threshold. In this example, specify <= and 10 percent.
   f. For Consecutive period and Period, specify the evaluation period for the alarm. In this example, specify 1 consecutive period of 5 Minutes.
   g. Amazon CloudWatch automatically creates an alarm name for you. To change the name, for Alarm name, enter a new name. Alarm names must contain only ASCII characters.
   h. Choose Create.
Note: You can adjust the alarm configuration based on your own requirements before creating the alarm, or you can edit it later. This includes the metric, threshold, duration, action, and notification settings. However, after you create an alarm, you cannot edit its name later.
See the documentation for terminating the instance in the same way.
You are looking to add a stop action to your EC2 instance; this can be easily achieved using CloudWatch alarms.
Here is how to do that using Terraform:
resource "aws_cloudwatch_metric_alarm" "ec2_cpu" {
  alarm_name                = "StopTheInstanceAfterInactivity"
  metric_name               = "CPUUtilization"
  comparison_operator       = "LessThanOrEqualToThreshold"
  statistic                 = "Average"
  threshold                 = var.threshold
  evaluation_periods        = var.evaluation_periods # The number of periods over which data is compared to the specified threshold
  period                    = var.period             # Length of each period (seconds)
  namespace                 = "AWS/EC2"
  alarm_description         = "This metric monitors ec2 cpu utilization and stops the instance if it is inactive"
  actions_enabled           = "true"
  alarm_actions             = ["arn:aws:automate:${var.region}:ec2:stop"]
  ok_actions                = [] # do nothing
  insufficient_data_actions = [] # do nothing
  dimensions                = { InstanceId = aws_instance.ec2_instance.id }
}

AWS: Alarming on Metric Math alternative

I am currently in the process of migrating some services to AWS and have hit a bit of a road block. I would like to be able to monitor the error percentage of a Lambda and create an Alarm if a certain threshold is breached. Currently the percentage error rate can be calculated with Metric Math, however alarms cannot be generated from this.
I was wondering if anyone knows a way that I could push the metrics required to calculate the percentage, Errors and Invocations, to a Lambda and have the Lambda perform the calculation and create the SNS alarm?
Thanks!
CloudWatch just released alarms on Metric Math expressions.
https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-cloudwatch-launches-ability-to-add-alarms-on-metric-math-expressions/
So basically you just need to:
Go to CloudWatch
Go to Alarms
Create Alarm
Add your metrics
Add a MetricMath expression
Optionally, add other properties for the alarm
Add the actions that you want to be executed
More information in their documentation
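The same steps can be sketched with boto3's put_metric_alarm, which accepts a Metrics list mixing raw metrics and a math expression. The alarm name, the 5% threshold, the 5-minute period and the choice to treat missing data as not breaching are illustrative assumptions.

```python
def error_rate_alarm_params(function_name, threshold_percent=5.0, sns_topic_arn=None):
    """Build put_metric_alarm arguments for a Lambda error-percentage alarm."""

    def lambda_metric(metric_id, metric_name):
        # A raw AWS/Lambda metric used only as input to the expression.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{function_name}-error-rate",  # hypothetical name
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 1,
        "Threshold": threshold_percent,
        "TreatMissingData": "notBreaching",  # assumed: no invocations is not an error
        "AlarmActions": [sns_topic_arn] if sns_topic_arn else [],
        "Metrics": [
            lambda_metric("errors", "Errors"),
            lambda_metric("invocations", "Invocations"),
            {
                "Id": "error_rate",
                "Expression": "100 * errors / invocations",
                "Label": "Error rate (%)",
                "ReturnData": True,  # the alarm evaluates this expression
            },
        ],
    }


def create_error_rate_alarm(cloudwatch, function_name, **kwargs):
    """Create (or overwrite) the alarm using the given CloudWatch client."""
    params = error_rate_alarm_params(function_name, **kwargs)
    cloudwatch.put_metric_alarm(**params)
    return params["AlarmName"]
```

Exactly one entry in Metrics has ReturnData set to True; that is the value compared against the threshold, so no separate Lambda is needed to do the division.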

AWS cloudwatch alarm for RDS

Is there a way to create a CloudWatch alarm for my RDS instances based on % free disk? (I know I can turn on Enhanced Monitoring and that metric is there, but I can't use those metrics in CloudWatch alarms.)
If not, is there a good workaround?
RDS doesn't report the percentage of disk space free, but it does report the amount of free space available. See the list of CloudWatch metrics available for RDS instances in the documentation.
You would need to create alarms on the FreeStorageSpace metric reported by each of your instances.
Another option uses Enhanced Monitoring and log metrics. Basically, you can turn on Enhanced Monitoring for RDS and then parse the JSON logs to get the usedPercentage value for the storage filesystem. This can be turned into a log metric that can be associated with an alarm.

Stopping EC2 instance when custom cloudwatch metric passes limit

I'm trying to find a way to make an Amazon EC2 instance stop automatically when a certain custom metric on CloudWatch passes a limit. So far if I've understood correctly based on these articles:
Discussion Forum: Custom Metric EC2 Action
CloudWatch Documentation: Create Alarms to Stop, Terminate, Reboot, or Recover an Instance
This will only work if the metric is defined as follows:
Tied to certain instance
With type of System/Linux
However, in my case I have a custom metric that is not instance-related but "global", and if a certain limit is passed I need to stop all instances, no matter which instance the limit-breaching log came from.
Does anybody know if there is a way to make this work? What I need is some way to make CloudWatch work like this:
If an arbitrary custom metric value passes a certain limit -> stop defined instances that are not tied to the metric itself.
The main problem is that the EC2 option is greyed out because the metric is not tied to a particular EC2 instance, and I'm not sure if there's any way to do this without actually making the metric itself instance-related.
Have the custom CloudWatch metric post alerts to an SNS topic.
Have the SNS topic trigger a Lambda function that shuts down your EC2 instances via a call to the AWS API.
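A minimal sketch of that Lambda, assuming the alarm publishes to an SNS topic that invokes the function, and that the instances to stop are listed in a comma-separated INSTANCE_IDS environment variable (a hypothetical name). It relies on the alarm notification being a JSON document with a NewStateValue field. The extra keyword arguments exist only so the handler can be tested with a stub client.

```python
import json


def alarm_is_firing(sns_event):
    """Return True if any SNS record carries an alarm in the ALARM state."""
    for record in sns_event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") == "ALARM":
            return True
    return False


def lambda_handler(event, context, ec2=None, instance_ids=None):
    """Stop a fixed set of instances when the custom-metric alarm fires."""
    if instance_ids is None:
        import os
        raw = os.environ.get("INSTANCE_IDS", "")  # hypothetical variable name
        instance_ids = [item.strip() for item in raw.split(",") if item.strip()]
    if not instance_ids or not alarm_is_firing(event):
        return []
    if ec2 is None:
        # Created lazily so the function can be tested without AWS access.
        import boto3
        ec2 = boto3.client("ec2")
    ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids
```

Because the instance list lives in the function's configuration rather than in the metric's dimensions, the metric can stay "global". The Lambda's role would need permission to call ec2:StopInstances on those instances.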