Getting single time series from AWS CloudWatch metric maths SEARCH function

I'm attempting to create a CloudWatch alarm that fires if any instance in a group goes over x% of memory used, and have built the following metric maths query to do so:
SEARCH('{CWAgent,InstanceId} MetricName="mem_used_percent"', 'Maximum', 300)
This graphs fine; however, the CloudWatch console complains "The expression for an alarm must create exactly one time series." I believe that requirement is met: the query above should (and does) return a single time series, not a multi-dimensional result.
How can I get this data to return in the format required by CloudWatch to create an alarm? My alternative is to generate a new alarm per instance created; however, managing the creation and destruction of alarms that way seems more complex.
CloudWatch agent config on the instance for collecting the metric:
"metrics":{
"append_dimensions": {
"InstanceId": "${aws:InstanceId}"
},
"metrics_collected":{
"mem": {
"measurement": [
"used_percent"
]
},
"disk": {
"measurement": [ "used_percent" ],
"metrics_collection_interval": 60,
"resources": [ "/" ]
}
}

Unfortunately it's not possible to create an alarm based on a search expression, so I don't think there's (currently) a way to do what you're after.
Per https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Create-alarm-on-metric-math-expression.html:
You can't create an alarm based on the SEARCH expression. This is because search expressions return multiple time series, and an alarm based on a math expression can watch only one time series.
This appears to be the case even when you only get one result from a SEARCH expression.
I tried to combine this down into one time series using AVG, but this then appeared to lose the context of the metric and instead gave the error 'The expression for an alarm must include at least one metric'.
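For reference, the combination would look something like this (the exact expression is an assumption reconstructed from the description above):
AVG(SEARCH('{CWAgent,InstanceId} MetricName="mem_used_percent"', 'Maximum', 300))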
I'm currently handling a similar case with a pair of Lambda functions tied to CloudTrail events for RunInstances and TerminateInstances, that parse the event data for the instance ID and (among other things) create and delete individual CloudWatch alarms.
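For illustration, the creation side can look roughly like this (a sketch: it assumes the CloudTrail RunInstances event arrives via an EventBridge rule, and the alarm threshold and SNS topic ARN are placeholders; the TerminateInstances handler calls delete_alarms symmetrically):
import boto3

cloudwatch = boto3.client('cloudwatch')

def handler(event, context):
    """Create one memory alarm per instance launched in a RunInstances event."""
    items = event['detail']['responseElements']['instancesSet']['items']
    for item in items:
        instance_id = item['instanceId']
        cloudwatch.put_metric_alarm(
            AlarmName=f'mem-used-{instance_id}',
            Namespace='CWAgent',
            MetricName='mem_used_percent',
            Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
            Statistic='Maximum',
            Period=300,
            EvaluationPeriods=1,
            Threshold=90.0,  # placeholder for the x% threshold
            ComparisonOperator='GreaterThanThreshold',
            AlarmActions=['arn:aws:sns:eu-west-1:123456789012:placeholder-topic'],
        )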

This example displays one line for each instance in the Region, showing the CPUUtilization metric from the AWS/EC2 namespace.
SEARCH(' {AWS/EC2,InstanceId} MetricName="CPUUtilization" ', 'Average', 300)
Changing InstanceId to InstanceType changes the graph to show one line for each instance type used in the Region. Data from all instances of each type is aggregated into one line for that instance type.
SEARCH(' {AWS/EC2,InstanceType} MetricName="CPUUtilization" ', 'Average', 300)
Removing the dimension name but keeping the namespace in the schema, as in the following example, results in a single line showing the aggregation of CPUUtilization metrics for all instances in the Region.
SEARCH(' {AWS/EC2} MetricName="CPUUtilization" ', 'Average', 300)
Refer to the CloudWatch documentation on search expression syntax for a detailed explanation of search queries. For selecting metrics, the CloudWatch documentation also provides a step-by-step walkthrough.

Related

google cloud platform -- creating alert policy -- how to specify message variable in alerting documentation markdown?

So I've created a logging alert policy on Google Cloud that monitors the project's logs and sends an alert if it finds a log that matches a certain query. This is all good and fine, but whenever it does send an email alert, it's barebones: I am unable to include anything useful, such as the actual message; the user must instead click "View incident" and go to the specified timeframe of when the alert happened.
Is there no way to include the message? As far as I can tell from the GCP "Using Markdown and variables in documentation templates" doc, there isn't.
I'm only really able to use ${resource.label.x} which isn't really all that useful because it already includes most of that stuff by default in the alert.
Could I have something like ${jsonPayload.message}? It didn't work when I tried it.
Probably (!) not.
To be clear, the alerting policies track metrics (not logs) and you've created a log-based metric that you're using as the basis for an alert.
There's information loss between the underlying log (which contains e.g. jsonPayload) and the metric that's produced from it (which probably does not). You can create log-based metric labels using expressions that include the underlying log entry fields.
However, per the example in Google's docs, you'd want to consider a limited (enum) type for these values (e.g. HTTP status although that may be too broad too) rather than a potentially infinite jsonPayload.
It is possible. Suppose you need to pass the jsonPayload.message field from your GCP log to the documentation section of your policy. You need to use the label_extractors feature to extract your log message.
I will share a policy-creation JSON template in which you can pass jsonPayload.message to the documentation section of your policy.
policy_json = {
    "display_name": "<policy_name>",
    "documentation": {
        "content": "I have extracted the log message: ${log.extracted_label.msg}",
        "mime_type": "text/markdown"
    },
    "user_labels": {},
    "conditions": [
        {
            "display_name": "<condition_name>",
            "condition_matched_log": {
                "filter": "<filter_condition>",
                "label_extractors": {
                    "msg": "EXTRACT(jsonPayload.message)"
                }
            }
        }
    ],
    "alert_strategy": {
        "notification_rate_limit": {
            "period": "300s"
        },
        "auto_close": "604800s"
    },
    "combiner": "OR",
    "enabled": True,
    "notification_channels": [
        "<notification_channel>"
    ]
}
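One way to create such a policy programmatically is with the google-cloud-monitoring Python client. A minimal sketch, assuming that library, with the project ID, display names, and log filter as placeholders (note the duration fields are built as Duration objects rather than the "300s" strings used in the raw JSON, and notification channels are omitted):
from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

client = monitoring_v3.AlertPolicyServiceClient()

policy = monitoring_v3.AlertPolicy(
    display_name="my-log-alert",  # placeholder policy name
    documentation=monitoring_v3.AlertPolicy.Documentation(
        content="I have extracted the log message: ${log.extracted_label.msg}",
        mime_type="text/markdown",
    ),
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    enabled=True,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="log-match",  # placeholder condition name
            condition_matched_log=monitoring_v3.AlertPolicy.Condition.LogMatch(
                filter='severity>=ERROR',  # placeholder log filter
                label_extractors={"msg": "EXTRACT(jsonPayload.message)"},
            ),
        )
    ],
    alert_strategy=monitoring_v3.AlertPolicy.AlertStrategy(
        notification_rate_limit=monitoring_v3.AlertPolicy.AlertStrategy.NotificationRateLimit(
            period=duration_pb2.Duration(seconds=300)
        ),
        auto_close=duration_pb2.Duration(seconds=604800),
    ),
)

created = client.create_alert_policy(
    name="projects/my-project-id",  # placeholder project
    alert_policy=policy,
)
print(created.name)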

How to monitor all ec2 by CPU usage via CloudWatch

I am trying to set up monitoring for a large number of EC2 instances, and their number is constantly changing. I would like the owner of each instance to receive a notification when its CPU usage is low for a long time.
I can create a function that gets a list of all EC2 instances, then gets their CPU utilization, then sends messages to the owners. This option does not suit me, since I want to monitor the state over a period of time, not just grab the CPU utilization values at the second the function runs. And in general, this method looks bad.
I can set up an alarm in CloudWatch, but only for one specific instance. This option is not suitable, since there are a lot of EC2 instances and their number varies.
I can create a dashboard with EC2 names and their CPU utilization. This dashboard would be updated dynamically. But I haven't figured out how to send notifications from it.
How can I solve my problem without third-party solutions?
Please see this AWS document https://aws.amazon.com/blogs/mt/use-tags-to-create-and-maintain-amazon-cloudwatch-alarms-for-amazon-ec2-instances-part-1/
You will find some existing Lambda functions there which automatically create a CloudWatch alarm after an EC2 instance is created.
It looks a little bit tricky, but it's worth a look if you really want to make this automatic. But yes, a single CloudWatch alarm can't monitor multiple EC2 instances.
--
Another thing: the same sample Lambda function is available as an existing template, which will create the Lambda function directly so you can test it.
I have solved my problem. And it seems to me that this is one of the simplest options.
Using the get_metric_data method from the AWS SDK for Python (Boto3), I wrote the following:
import boto3
from statistics import mean
from datetime import timedelta, datetime

cloudwatch_client = boto3.client('cloudwatch')

# Fetch the average CPU utilization of one instance over the last 24 hours,
# in one-hour periods
response = cloudwatch_client.get_metric_data(
    MetricDataQueries=[
        {
            'Id': 'myrequest',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'AWS/EC2',
                    'MetricName': 'CPUUtilization',
                    'Dimensions': [
                        {
                            'Name': 'InstanceId',
                            'Value': 'i-123abc456def'
                        }
                    ]
                },
                'Period': 3600,
                'Stat': 'Average',
                'Unit': 'Percent'
            }
        },
    ],
    StartTime=datetime.now() - timedelta(days=1),
    EndTime=datetime.now()
)

# Average the returned hourly datapoints
for metric_data_result in response['MetricDataResults']:
    list_avg = mean(metric_data_result['Values'])
    print(list_avg)
As output, I get the average CPU usage, as a percentage, over the specified time window.
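To cover a fleet whose membership changes, one possible extension (a sketch: it assumes the standard boto3 EC2 API, and the 5% threshold and one-day window are placeholders) is to enumerate the running instances first and batch one query per instance into a single get_metric_data call:
import boto3
from datetime import datetime, timedelta

ec2_client = boto3.client('ec2')
cloudwatch_client = boto3.client('cloudwatch')

# Enumerate running instances; their number can change at any time
reservations = ec2_client.describe_instances(
    Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
)['Reservations']
instance_ids = [i['InstanceId'] for r in reservations for i in r['Instances']]

# One metric query per instance (query Ids must start with a lowercase letter)
queries = [
    {
        'Id': f'q{n}',
        'Label': instance_id,
        'MetricStat': {
            'Metric': {
                'Namespace': 'AWS/EC2',
                'MetricName': 'CPUUtilization',
                'Dimensions': [{'Name': 'InstanceId', 'Value': instance_id}],
            },
            'Period': 3600,
            'Stat': 'Average',
        },
    }
    for n, instance_id in enumerate(instance_ids)
]

response = cloudwatch_client.get_metric_data(
    MetricDataQueries=queries,
    StartTime=datetime.now() - timedelta(days=1),
    EndTime=datetime.now(),
)

# Flag instances whose average CPU stayed below the assumed threshold
for result in response['MetricDataResults']:
    values = result['Values']
    if values and sum(values) / len(values) < 5.0:
        print(f"Low CPU on {result['Label']}")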
I'm still learning, but I'll try to answer your questions if there are any. Thank you all!

Report matching time stamps from automatically triggered lambda functions

I am using Amazon CloudWatch to trigger 4 different Lambda functions every twelve hours. The Lambda functions pull some data from an API and save it to my database. I want to make sure that the timestamp matches for the data from all my Lambda functions. Initially I used the PostgreSQL default timestamp; however, this records time to the millisecond, which introduces small discrepancies in time.
It seems like the CloudWatch rule which invokes my Lambda functions might be able to pass along an identical timestamp, but I haven't been able to figure out how to do this, or even verify whether it is possible.
I really don't need the time stamp to go to the minute. Mostly I am concerned with the date and whether it was the AM or PM batch so knowing time to the nearest hour is good enough.
If any AWS experts could lend me some advice it would be appreciated.
The scheduled CloudWatch (CW) Event rule passes the following event object to the lambda function, e.g.:
{
    "version": "0",
    "id": "a75ba59d-81d6-8363-8e68-593f7de30b09",
    "detail-type": "Scheduled Event",
    "source": "aws.events",
    "account": "32323232",
    "time": "2021-02-21T06:29:27Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:events:us-east-1:32323232:rule/test"
    ],
    "detail": {}
}
As you can see, time is measured to the second. Also, CW does not guarantee exact execution of its events; they can be off by up to a minute:
Your scheduled rule is triggered within that minute, but not on the precise 0th second
So your four functions will see slightly different times. Thus, you have to manage that in your code, for example by rounding to the nearest hour.
The alternative is to use your Lambda environment's built-in tools for getting a timestamp, instead of using the time from the event. This can be easier, as you can get a timestamp with a precision of one hour directly, rather than parsing the time from the event and post-processing it to get the desired precision.
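A minimal sketch of both options in Python (the handler body is illustrative only):
from datetime import datetime, timezone

def lambda_handler(event, context):
    # Option 1: parse the scheduled event's "time" field and truncate to the hour
    event_time = datetime.strptime(event['time'], '%Y-%m-%dT%H:%M:%SZ')
    stamp_from_event = event_time.replace(minute=0, second=0)

    # Option 2: truncate the invocation's wall-clock time instead; since the
    # rule fires within a minute of its schedule, all four functions truncate
    # to the same hour (unless the schedule sits right on an hour boundary)
    stamp_from_clock = datetime.now(timezone.utc).replace(
        minute=0, second=0, microsecond=0
    )

    print(stamp_from_event.isoformat(), stamp_from_clock.isoformat())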

AWS cloud watch metrics with ASG name changes

On AWS cloud watch we have one dashboard per environment.
Each dashboard has N plots.
Some plots, use the Auto Scaling Group Name (ASG) to find the data to plot.
Example of such a plot (from the widget's Edit view, Source tab):
{
    "metrics": [
        [ "production", "mem_used_percent", "AutoScalingGroupName", "awseb-e-rv8y2igice-stack-AWSEBAutoScalingGroup-3T5YOK67T3FD" ]
    ],
    ... other params removed for brevity ...
    "title": "Used Memory (%)",
}
Every time we deploy, the ASG name changes (we deploy using CodeDeploy with Elastic Beanstalk (EBS) configuration files from source).
I need to manually find the new name and update the N plots one by one.
The strange thing is that this happens for production and staging environments, but not for integration.
All 3 should be copies of one another, with different settings from the EBS configuration files, so I don't know what is going on.
In any case, what (I think) I need is one of:
option 1: prevent the ASG name change upon deploy
option 2: dynamically update the plots with the new name
option 3: plot the same data without using the ASG name (but alternatives I find are EC2 instance ID that changes and ImageId and InstanceType that are common to more than one EC2, so won't work either)
My online-search-foo has turned up empty.
More Info:
I'm publishing these metrics with the CloudWatch agent, by adjusting the conf file, as per the docs here:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Agent-on-EC2-Instance.html
Have a look at CloudWatch Search Expression Syntax. It allows you to use tokens for searching, e.g.:
SEARCH(' {AWS/CWAgent, AutoScalingGroupName} MetricName="mem_used_percent" rv8y2igice', 'Average', 300)
which would replace the entry for metrics like so:
"metrics": [
[ { "expression": "SEARCH(' {AWS/CWAgent, AutoScalingGroupName} MetricName=\"mem_used_percent\" rv8y2igice', 'Average', 300)", "label": "Expression1", "id": "e1" } ]
]
Simply search for the desired result in the console; results that match the search appear. To graph all of the metrics that match your search, choose Graph search, then find the exact search expression that you want under Details on the Graphed metrics tab.
SEARCH('{CWAgent,AutoScalingGroupName,ImageId,InstanceId,InstanceType} mem_used_percent', 'Average', 300)

CloudWatch does not aggregate across dimensions for your custom metrics

Reading the docs I saw this statement:
"CloudWatch does not aggregate across dimensions for your custom metrics"
That seems like a HUGE limitation, right? It would make custom metrics all but useless in my estimation, so I want to confirm I'm understanding this.
For example, say I had a custom metric shipped from multiple servers. I want to see it per server, but I also want to see all servers together. Would I have no way of aggregating that across all the servers? Or would I be forced to create two custom metrics, one per-server and one for all servers, and double-post from each server to both?
The docs are correct, CloudWatch won't aggregate across dimensions for your custom metrics (it will do so for some metrics published by other services, like EC2).
This feature may seem useful and clear for your use-case but it's not clear how such aggregation would behave in a general case. CloudWatch allows for up to 10 dimensions so aggregating for all combinations of those may result in a lot of useless metrics, for all of which you would be billed. People may use dimensions to split their metrics between Test and Prod stacks for example, which are completely separate and aggregating those would not make sense.
CloudWatch treats a metric name plus a full set of dimensions as a unique metric identifier. In your case, this means that you need to publish your observations separately for each metric you want them to contribute to.
Let's say you have a metric named Latency, and you're putting a hostname in a dimension called Server. If you have three servers this will create three metrics:
Latency, Server=server1
Latency, Server=server2
Latency, Server=server3
So the approach you mentioned in your question will work. If you also want a metric showing the data across all servers, each server would need to publish to a separate metric, which is best done by using a new common value for the Server dimension, something like AllServers. This will result in you having 4 metrics, like this (a publishing sketch follows the list):
Latency, Server=server1 <- only server1 data
Latency, Server=server2 <- only server2 data
Latency, Server=server3 <- only server3 data
Latency, Server=AllServers <- data from all 3 servers
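Publishing each observation to both the per-server metric and the AllServers metric might look like this with boto3 (a sketch; the MyApp namespace and server name are placeholders):
import boto3

cloudwatch = boto3.client('cloudwatch')

def publish_latency(server_name, latency_ms):
    """Publish one observation twice: per server, and to the AllServers rollup."""
    cloudwatch.put_metric_data(
        Namespace='MyApp',  # placeholder namespace
        MetricData=[
            {
                'MetricName': 'Latency',
                'Dimensions': [{'Name': 'Server', 'Value': server_name}],
                'Value': latency_ms,
                'Unit': 'Milliseconds',
            },
            {
                'MetricName': 'Latency',
                'Dimensions': [{'Name': 'Server', 'Value': 'AllServers'}],
                'Value': latency_ms,
                'Unit': 'Milliseconds',
            },
        ],
    )

publish_latency('server1', 123.0)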
Update 2019-12-17
Using metric math SEARCH function: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html
This will give you per server latency and latency across all servers, without publishing a separate AllServers metric and if a new server shows up, it will be automatically picked up by the expression:
Graph source:
{
    "metrics": [
        [ { "expression": "SEARCH('{SomeNamespace,Server} MetricName=\"Latency\"', 'Average', 60)", "id": "e1", "region": "eu-west-1" } ],
        [ { "expression": "AVG(e1)", "id": "e2", "region": "eu-west-1", "label": "All servers", "yAxis": "right" } ]
    ],
    "view": "timeSeries",
    "stacked": false,
    "region": "eu-west-1"
}
The result is a graph with one line per server plus an "All servers" line for the aggregate.
Downsides of this approach:
Expressions are limited to 100 metrics.
Overall aggregation is limited to available metric math functions, which means percentiles are not available as of 2019-12-17.
Using Contributor Insights (open preview as of 2019-12-17): https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContributorInsights.html
If you publish your logs to CloudWatch Logs in JSON or Common Log Format (CLF), you can create rules that keep track of top contributors. For example, a rule that keeps track of servers with latencies over 400 ms would look something like this:
{
    "Schema": {
        "Name": "CloudWatchLogRule",
        "Version": 1
    },
    "AggregateOn": "Count",
    "Contribution": {
        "Filters": [
            {
                "Match": "$.Latency",
                "GreaterThan": 400
            }
        ],
        "Keys": [
            "$.Server"
        ],
        "ValueOf": "$.Latency"
    },
    "LogFormat": "JSON",
    "LogGroupNames": [
        "/aws/lambda/emf-test"
    ]
}
The result is a list of the servers with the most data points over 400 ms.
Bringing it all together with CloudWatch Embedded Format: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format.html
If you publish your data in CloudWatch Embedded Format you can (see the sketch after this list):
Easily configure dimensions, so you can have per server metrics and overall metric if you want.
Use CloudWatch Logs Insights to query and visualise your logs.
Use Contributor Insights to get top contributors.
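For example, a single EMF record covering the Latency/Server setup above might be emitted from Python like this (a sketch; the MyApp namespace is a placeholder, and in Lambda simply printing the record to stdout ships it through CloudWatch Logs):
import json
import time

def emf_record(server_name, latency_ms):
    """Build one CloudWatch Embedded Metric Format record as a JSON string."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": "MyApp",              # placeholder namespace
                "Dimensions": [["Server"]],        # per-server metrics
                "Metrics": [{"Name": "Latency", "Unit": "Milliseconds"}],
            }],
        },
        "Server": server_name,
        "Latency": latency_ms,
    })

print(emf_record("server1", 123.0))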