Linking to AWS CloudWatch Logs from Alarm SNS

I have CloudWatch alarms sending SNS messages with error information, and I'm using those along with a Slack webhook to send alarm messages to our Slack channel. I'd like to include a link to the relevant logs, but right now the only thing I see that might be useful is the alarm ARN. Can I use it somehow, or is there a way to search the AWS error logs for that ARN and link to them?
Here's the JSON from the SNS message:
{
  "AlarmName": "EmailErrorsFF58B22B-HFUJGANB6BDD",
  "AlarmDescription": "Some Description",
  "AWSAccountId": "<REMOVED>",
  "AlarmConfigurationUpdatedTimestamp": "2022-03-24T12:20:22.195+0000",
  "NewStateValue": "ALARM",
  "NewStateReason": "Threshold Crossed: 1 datapoint [1.0 (25/03/22 15:39:00)] was greater than the threshold (0.0).",
  "StateChangeTime": "2022-03-25T15:44:45.495+0000",
  "Region": "US East (N. Virginia)",
  "AlarmArn": "arn:aws:cloudwatch:<REMOVED>",
  "OldStateValue": "OK",
  "OKActions": [],
  "AlarmActions": [
    "arn:aws:sns:<REMOVED>"
  ],
  "InsufficientDataActions": [],
  "Trigger": {
    "MetricName": "Errors",
    "Namespace": "AWS/Lambda",
    "StatisticType": "Statistic",
    "Statistic": "SUM",
    "Unit": null,
    "Dimensions": [
      {
        "value": "Email-production",
        "name": "FunctionName"
      }
    ],
    "Period": 300,
    "EvaluationPeriods": 1,
    "ComparisonOperator": "GreaterThanThreshold",
    "Threshold": 0,
    "TreatMissingData": "",
    "EvaluateLowSampleCountPercentile": ""
  }
}
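The payload has no log link, but for a Lambda metric (Namespace AWS/Lambda) the FunctionName dimension is usually enough to build one, assuming the function writes to its default log group /aws/lambda/<FunctionName>. The region code can be read from AlarmArn, since the Region field holds a display name such as "US East (N. Virginia)". Below is a minimal Python sketch; the console URL format, including the double-encoded "$252F" slashes, is an assumption based on links the CloudWatch console currently produces, not a documented API, so compare it with a link copied from your own console before relying on it.

```python
from typing import Optional
from urllib.parse import quote


def region_from_arn(arn: str) -> str:
    # ARN layout: arn:partition:service:region:account-id:resource
    return arn.split(":")[3]


def console_encode(value: str) -> str:
    # Assumption: the console expects the value URL-encoded twice,
    # with "%" then replaced by "$" (so "/" becomes "$252F").
    return quote(quote(value, safe=""), safe="").replace("%", "$")


def lambda_log_group_url(alarm: dict) -> Optional[str]:
    """Build a console link to the log group of the Lambda function an alarm watches.

    Returns None if the alarm is not on an AWS/Lambda metric with a FunctionName dimension.
    """
    trigger = alarm.get("Trigger", {})
    if trigger.get("Namespace") != "AWS/Lambda":
        return None
    # SNS alarm payloads use lowercase "name"/"value" keys in Dimensions.
    function_name = next(
        (d["value"] for d in trigger.get("Dimensions", []) if d["name"] == "FunctionName"),
        None,
    )
    if function_name is None:
        return None
    region = region_from_arn(alarm["AlarmArn"])
    # Assumes the default Lambda log group name: /aws/lambda/<function name>.
    log_group = console_encode(f"/aws/lambda/{function_name}")
    return (
        f"https://{region}.console.aws.amazon.com/cloudwatch/home"
        f"?region={region}#logsV2:log-groups/log-group/{log_group}"
    )
```

The resulting URL can be dropped into the text of the Slack webhook message alongside the alarm name and NewStateReason.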

Related

How to get resource tags from a CloudWatch alarm's trigger in Lambda

In a Lambda function, I'm trying to get the tags of the resource that breached a CloudWatch alarm.
Say I have two CloudWatch alarms, one for CPU utilization and another for Lambda errors, and both publish to the same SNS topic.
A Lambda function is triggered from that SNS topic. This function needs to know which resource triggered the CloudWatch alarm and then, I assume, call list_tags_for_resource() on that resource's ARN.
However, the payload from CloudWatch doesn't include the ARN of the resource. Example:
{
  "AlarmName": "LessThanThreshold-CPUUtilization",
  "AlarmDescription": "Created from EC2 Console",
  "AWSAccountId": "xx",
  "AlarmConfigurationUpdatedTimestamp": "2022-03-01T03:29:21.832+0000",
  "NewStateValue": "ALARM",
  "NewStateReason": "Threshold Crossed: 1 out of the last 1 datapoints [0.161290322580642 (01/03/22 03:34:00)] was less than the threshold (0.99) (minimum 1 datapoint for OK -> ALARM transition).",
  "StateChangeTime": "2022-03-01T03:35:27.613+0000",
  "Region": "US East (N. Virginia)",
  "AlarmArn": "arn:aws:cloudwatch:us-east-1:xx:alarm:LessThanThreshold-CPUUtilization",
  "OldStateValue": "OK",
  "Trigger": {
    "MetricName": "CPUUtilization",
    "Namespace": "AWS/EC2",
    "StatisticType": "Statistic",
    "Statistic": "AVERAGE",
    "Unit": null,
    "Dimensions": [
      {
        "value": "i-xx",
        "name": "InstanceId"
      }
    ],
    "Period": 60,
    "EvaluationPeriods": 1,
    "DatapointsToAlarm": 1,
    "ComparisonOperator": "LessThanThreshold",
    "Threshold": 0.99,
    "TreatMissingData": "",
    "EvaluateLowSampleCountPercentile": ""
  }
}
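The payload lacks the resource ARN, but AlarmArn carries the partition, region and account, and the Trigger dimensions identify the resource, so the ARN can usually be rebuilt per namespace. A sketch assuming only the two cases from the question (EC2 InstanceId and Lambda FunctionName); ARN_TEMPLATES is a hypothetical mapping you would extend. Tags are then read with the Resource Groups Tagging API's GetResources, which accepts a list of ARNs; the client is passed in (for example boto3.client("resourcegroupstaggingapi")) so the logic can be tried without AWS access. That API only lists resources that are or were tagged, so an untagged resource yields an empty dict.

```python
from typing import Dict, Optional

# Hypothetical mapping from (namespace, dimension name) to an ARN template.
ARN_TEMPLATES = {
    ("AWS/EC2", "InstanceId"): "arn:{partition}:ec2:{region}:{account}:instance/{value}",
    ("AWS/Lambda", "FunctionName"): "arn:{partition}:lambda:{region}:{account}:function:{value}",
}


def resource_arn_from_alarm(alarm: dict) -> Optional[str]:
    """Rebuild the ARN of the resource behind an alarm from its metric dimensions."""
    # AlarmArn: arn:<partition>:cloudwatch:<region>:<account>:alarm:<name>
    _, partition, _, region, account = alarm["AlarmArn"].split(":")[:5]
    namespace = alarm["Trigger"]["Namespace"]
    for dim in alarm["Trigger"].get("Dimensions", []):
        template = ARN_TEMPLATES.get((namespace, dim["name"]))
        if template:
            return template.format(
                partition=partition, region=region, account=account, value=dim["value"]
            )
    return None


def tags_for_arn(tagging_client, arn: str) -> Dict[str, str]:
    """Read a resource's tags with the Resource Groups Tagging API."""
    response = tagging_client.get_resources(ResourceARNList=[arn])
    tags: Dict[str, str] = {}
    for mapping in response.get("ResourceTagMappingList", []):
        for tag in mapping.get("Tags", []):
            tags[tag["Key"]] = tag["Value"]
    return tags
```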

How do I use Lambda to get EC2 information via CloudWatch?

The following flow invokes a Lambda function:
Monitor log files on EC2 with CloudWatch Logs.
Detect monitored strings with a metric filter.
Invoke Lambda from an alarm.
I would like to know how to get the following information within Lambda:
Path of the log file being monitored
Instance name
Instance ID
Alarm name
I am writing in Python and trying to get it using boto3.
You can achieve this in two ways.
Create an EventBridge (CloudWatch Events) rule with the event type "CloudWatch Alarm State Change". Whenever your alarm changes state (for example, into ALARM), an event is emitted; set the target for this event type to a Lambda function or an SNS topic, whichever suits your needs.
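For this first option, the rule and its Lambda target can also be created from code. A minimal boto3 sketch, assuming the caller passes in boto3.client("events") and boto3.client("lambda"); the target ID, the statement ID, and the restriction to the ALARM state are illustrative choices, not requirements.

```python
import json


def alarm_state_change_pattern(alarm_names):
    """Event pattern matching transitions into ALARM for the given alarms."""
    return {
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {"alarmName": list(alarm_names), "state": {"value": ["ALARM"]}},
    }


def route_alarm_to_lambda(events_client, lambda_client, rule_name, alarm_names, function_arn):
    """Create an EventBridge rule for alarm state changes that invokes a Lambda function."""
    rule_arn = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(alarm_state_change_pattern(alarm_names)),
        State="ENABLED",
    )["RuleArn"]
    events_client.put_targets(
        Rule=rule_name, Targets=[{"Id": "alarm-handler", "Arn": function_arn}]
    )
    # EventBridge also needs permission to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```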
Sample event from this rule:
{
  "version": "0",
  "id": "c4c1c1c9-6542-e61b-6ef0-8c4d36933a92",
  "detail-type": "CloudWatch Alarm State Change",
  "source": "aws.cloudwatch",
  "account": "123456789012",
  "time": "2019-10-02T17:04:40Z",
  "region": "us-east-1",
  "resources": ["arn:aws:cloudwatch:us-east-1:123456789012:alarm:ServerCpuTooHigh"],
  "detail": {
    "alarmName": "ServerCpuTooHigh",
    "configuration": {
      "description": "Goes into alarm when server CPU utilization is too high!",
      "metrics": [{
        "id": "30b6c6b2-a864-43a2-4877-c09a1afc3b87",
        "metricStat": {
          "metric": {
            "dimensions": {
              "InstanceId": "i-12345678901234567"
            },
            "name": "CPUUtilization",
            "namespace": "AWS/EC2"
          },
          "period": 300,
          "stat": "Average"
        },
        "returnData": true
      }]
    },
    "previousState": {
      "reason": "Threshold Crossed: 1 out of the last 1 datapoints [0.0666851903306472 (01/10/19 13:46:00)] was not greater than the threshold (50.0) (minimum 1 datapoint for ALARM -> OK transition).",
      "reasonData": "{\"version\":\"1.0\",\"queryDate\":\"2019-10-01T13:56:40.985+0000\",\"startDate\":\"2019-10-01T13:46:00.000+0000\",\"statistic\":\"Average\",\"period\":300,\"recentDatapoints\":[0.0666851903306472],\"threshold\":50.0}",
      "timestamp": "2019-10-01T13:56:40.987+0000",
      "value": "OK"
    },
    "state": {
      "reason": "Threshold Crossed: 1 out of the last 1 datapoints [99.50160229693434 (02/10/19 16:59:00)] was greater than the threshold (50.0) (minimum 1 datapoint for OK -> ALARM transition).",
      "reasonData": "{\"version\":\"1.0\",\"queryDate\":\"2019-10-02T17:04:40.985+0000\",\"startDate\":\"2019-10-02T16:59:00.000+0000\",\"statistic\":\"Average\",\"period\":300,\"recentDatapoints\":[99.50160229693434],\"threshold\":50.0}",
      "timestamp": "2019-10-02T17:04:40.989+0000",
      "value": "ALARM"
    }
  }
}
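Inside the target Lambda function, the alarm name and the metric's InstanceId dimension can be read straight from this event, and the instance name looked up with EC2 DescribeInstances. A sketch under these assumptions: the alarm watches a single metric (not a metric math expression), the caller passes in boto3.client("ec2") and boto3.client("logs"), and pagination is ignored. A metric created by a metric filter may have no InstanceId dimension at all, in which case instance_id is None. The log file path is not in the event; for a metric-filter metric, DescribeMetricFilters can at least return the log group, and the mapping from file to log group typically lives in the CloudWatch agent configuration on the instance.

```python
from typing import List, Optional


def summarize_alarm_event(event: dict) -> dict:
    """Extract alarm and metric details from an EventBridge alarm state-change event."""
    detail = event["detail"]
    # Assumes a single-metric alarm; metric math alarms may have no metricStat here.
    first = detail["configuration"]["metrics"][0]
    metric = first.get("metricStat", {}).get("metric", {})
    return {
        "alarm_name": detail["alarmName"],
        "state": detail["state"]["value"],
        "namespace": metric.get("namespace"),
        "metric_name": metric.get("name"),
        # Only present when the metric itself has an InstanceId dimension.
        "instance_id": metric.get("dimensions", {}).get("InstanceId"),
    }


def instance_name(ec2_client, instance_id: str) -> Optional[str]:
    """Return the instance's Name tag via EC2 DescribeInstances, or None if untagged."""
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            for tag in instance.get("Tags", []):
                if tag["Key"] == "Name":
                    return tag["Value"]
    return None


def log_groups_for_metric(logs_client, namespace: str, metric_name: str) -> List[str]:
    """Find the log groups whose metric filters publish this metric (first page only)."""
    response = logs_client.describe_metric_filters(
        metricName=metric_name, metricNamespace=namespace
    )
    return [f["logGroupName"] for f in response["metricFilters"]]
```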
Alternatively, add an SNS topic as an alarm action in your CloudWatch alarm; the topic then receives the event information. If you want to process it further, subscribe a Lambda function to the SNS topic.
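For the SNS route, the Lambda function receives the alarm as a JSON string inside each SNS record, with the same fields as the alarm notification shown in the first question (AlarmName, NewStateValue, NewStateReason, Trigger). A minimal sketch of unwrapping it:

```python
import json


def alarms_from_sns_event(event: dict) -> list:
    """Return the CloudWatch alarm payloads carried by an SNS-triggered Lambda event."""
    # Each SNS record carries the alarm JSON as a string in Sns.Message.
    return [json.loads(record["Sns"]["Message"]) for record in event.get("Records", [])]


def lambda_handler(event, context):
    lines = [
        f"{alarm['AlarmName']} is {alarm['NewStateValue']}: {alarm['NewStateReason']}"
        for alarm in alarms_from_sns_event(event)
    ]
    for line in lines:
        print(line)
    return lines
```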

CloudWatch alarm doesn't leave the ALARM state and doesn't retrigger

I created a custom metric with the unit Count. The requirement is to check every 24 hours whether the sum of the metric is >= 1. If so, a message should be sent to an SNS topic, which triggers a Lambda function that posts a message to a Slack channel.
Metric behaviour: currently the custom metric is always higher than one. I create a datapoint every 10 seconds.
Alarm behaviour: the alarm instantly switches into the ALARM state and sends a message to the SNS topic. But it never leaves the ALARM state, and it doesn't send a new message to the SNS topic 24 hours later.
How should I configure my alarm if I want to achieve my requirement?
Thanks in advance,
Patrick
Here is the output of aws cloudwatch describe-alarms:
{
  "MetricAlarms": [
    {
      "AlarmName": "iot-data-platform-stg-InvalidMessagesAlarm-1OS91W5YCQ8E9",
      "AlarmArn": "arn:aws:cloudwatch:eu-west-1:xxxxxx:alarm:iot-data-platform-stg-InvalidMessagesAlarm-1OS91W5YCQ8E9",
      "AlarmDescription": "Invalid Messages received",
      "AlarmConfigurationUpdatedTimestamp": "2020-04-03T18:11:15.076Z",
      "ActionsEnabled": true,
      "OKActions": [],
      "AlarmActions": [
        "arn:aws:sns:eu-west-1:xxxxx:iot-data-platform-stg-InvalidMessagesTopic-FJQ0WUJY9TZC"
      ],
      "InsufficientDataActions": [],
      "StateValue": "ALARM",
      "StateReason": "Threshold Crossed: 1 out of the last 1 datapoints [3.0 (30/03/20 11:49:00)] was greater than or equal to the threshold (1.0) (minimum 1 datapoint for OK -> ALARM transition).",
      "StateReasonData": "{\"version\":\"1.0\",\"queryDate\":\"2020-03-31T11:49:03.417+0000\",\"startDate\":\"2020-03-30T11:49:00.000+0000\",\"statistic\":\"Sum\",\"period\":86400,\"recentDatapoints\":[3.0],\"threshold\":1.0}",
      "StateUpdatedTimestamp": "2020-03-31T11:49:03.421Z",
      "MetricName": "InvalidMessages",
      "Namespace": "Message validation",
      "Statistic": "Sum",
      "Dimensions": [
        {
          "Name": "stream",
          "Value": "raw events"
        },
        {
          "Name": "stage",
          "Value": "stg"
        }
      ],
      "Period": 86400,
      "EvaluationPeriods": 1,
      "DatapointsToAlarm": 1,
      "Threshold": 1.0,
      "ComparisonOperator": "GreaterThanOrEqualToThreshold",
      "TreatMissingData": "notBreaching"
    }
  ]
}
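As far as I know, CloudWatch runs alarm actions only when an alarm changes state (Auto Scaling actions are the exception), so an alarm that stays in ALARM will not publish again 24 hours later. One possible workaround, sketched here under that assumption, is to run a Lambda function once a day from an EventBridge schedule (for example rate(1 day)) that sums the metric itself with GetMetricStatistics and notifies only when the sum is >= 1. The namespace, metric name and dimensions are copied from the describe-alarms output above; the client is passed in (boto3.client("cloudwatch")) and the SNS/Slack step is left out.

```python
from datetime import datetime, timedelta, timezone

# Copied from the describe-alarms output above.
NAMESPACE = "Message validation"
METRIC_NAME = "InvalidMessages"
DIMENSIONS = [
    {"Name": "stream", "Value": "raw events"},
    {"Name": "stage", "Value": "stg"},
]


def invalid_messages_last_24h(cloudwatch_client, now=None) -> float:
    """Sum the metric over the last 24 hours as one 86400-second period."""
    end = now or datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace=NAMESPACE,
        MetricName=METRIC_NAME,
        Dimensions=DIMENSIONS,
        StartTime=end - timedelta(days=1),
        EndTime=end,
        Period=86400,
        Statistics=["Sum"],
    )
    # No datapoints at all means nothing was recorded: the sum is 0.
    return sum(point["Sum"] for point in response["Datapoints"])


def should_notify(cloudwatch_client, now=None) -> bool:
    # Same condition as the alarm: Sum >= 1 over the last 24 hours.
    return invalid_messages_last_24h(cloudwatch_client, now) >= 1
```

Because the check runs on every schedule tick rather than on a state transition, it notifies every day the condition holds.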

AWS CloudFormation metric alarm: prevent it from staying in ALARM when there is no data

I'm trying to create an alarm for a metric in CloudFormation.
I have my MetricFilter and my alarm defined like this:
{
  "logMetric": {
    "Type": "AWS::Logs::MetricFilter",
    "Properties": {
      "FilterPattern": "[ERROR, WARNING, FATAL, Exception]",
      "LogGroupName": "/logapp",
      "MetricTransformations": [ {
        "MetricValue": "1",
        "MetricNamespace": "ErrorLogs/app",
        "MetricName": "AppLogMetric"
      } ]
    }
  },
  "AppLogAlert": {
    "Type": "AWS::CloudWatch::Alarm",
    "Properties": {
      "ActionsEnabled": "true",
      "AlarmActions": ["arn"],
      "AlarmDescription": "trigger alert when an error is received into the app",
      "AlarmName": "app-ErrorLog-alert",
      "ComparisonOperator": "GreaterThanOrEqualToThreshold",
      "EvaluationPeriods": 1,
      "MetricName": "AppLogMetric",
      "Namespace": "ErrorLogs/app",
      "Period": 60,
      "Statistic": "Maximum",
      "Threshold": 1
    }
  }
}
This creates both the AWS::Logs::MetricFilter and the AWS::CloudWatch::Alarm, and it looks great: the alarm goes into ALARM status when there is at least one detection in the last minute.
The problem is that when no log lines match those filters, the alarm does not go back to OK status, because the graph does not show "0" alerts; it's just blank space. Is there any way to make this happen?
Thanks.
I think the solution is to set TreatMissingData to notBreaching on the alarm.
I think your use case is similar to this one for KMS: Creating an Amazon CloudWatch Alarm to Detect Usage of a Customer Master Key that is Pending Deletion
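In the template, that means adding "TreatMissingData": "notBreaching" to the AppLogAlert properties. For comparison, here is a sketch of the same alarm created with boto3's put_metric_alarm; names and values are copied from the template above, the client is passed in (boto3.client("cloudwatch")), and the alarm action ARN is a parameter because the question's "arn" is a placeholder.

```python
def create_app_log_alarm(cloudwatch_client, alarm_action_arn: str) -> dict:
    """Create the AppLogAlert alarm so that periods with no data count as not breaching."""
    params = dict(
        AlarmName="app-ErrorLog-alert",
        AlarmDescription="trigger alert when an error is received into the app",
        ActionsEnabled=True,
        AlarmActions=[alarm_action_arn],
        Namespace="ErrorLogs/app",
        MetricName="AppLogMetric",
        Statistic="Maximum",
        Period=60,
        EvaluationPeriods=1,
        Threshold=1,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        # A minute with no matching log lines publishes no datapoint; treating that
        # gap as "not breaching" lets the alarm return to OK.
        TreatMissingData="notBreaching",
    )
    cloudwatch_client.put_metric_alarm(**params)
    return params
```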

Publish AWS SNS message to PagerDuty

I have integrated PagerDuty with AWS CloudWatch, and I am trying to manually publish a message to an SNS topic that has PagerDuty and email subscriptions. But I am not getting incidents in PagerDuty. However, CloudWatch alarms do trigger incidents in PagerDuty through this same topic.
I followed some documentation on the PagerDuty message payload but was unable to make it work. My SNS message JSON is as follows:
{
  "default": "test message",
  "email": "test email message",
  "https": {
    "service_key": "XXXX",
    "event_type": "trigger",
    "description": "Example alert on host1.example.com"
  }
}
It's not triggering an incident in PagerDuty, and I'm not sure what I'm missing in the request body. I do receive the email messages properly from this same message body. Could someone point out the mistake?
Thanks in advance.
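For reference, a per-protocol message like this has to be published with MessageStructure set to json, and as far as I know the value for each protocol key must itself be a string, so the nested https object would need to be serialized first. A minimal boto3 sketch, with the topic ARN and service_key as placeholders and the client passed in (boto3.client("sns")); this only addresses the SNS side, and whether PagerDuty accepts the body still depends on the integration type, as the answers below discuss.

```python
import json


def build_sns_message(service_key: str) -> str:
    """Build a MessageStructure=json body with a different payload per protocol."""
    https_payload = {
        "service_key": service_key,
        "event_type": "trigger",
        "description": "Example alert on host1.example.com",
    }
    return json.dumps({
        "default": "test message",
        "email": "test email message",
        # Each protocol's value is a string, so the object is serialized here.
        "https": json.dumps(https_payload),
    })


def publish_test_alert(sns_client, topic_arn: str, service_key: str) -> str:
    response = sns_client.publish(
        TopicArn=topic_arn,
        Message=build_sns_message(service_key),
        MessageStructure="json",
    )
    return response["MessageId"]
```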
To do so, choose the Custom Event Transformer option for the PagerDuty integration. In the integration, you can write your own JavaScript code as follows:
// Trigger a generic PagerDuty event whose details are the whole inbound request.
var normalized_event = {
  event_type: PD.Trigger,
  description: "SNS Event",
  details: PD.inputRequest
};
PD.emitGenericEvents([normalized_event]);
To parse the received payload from SNS, you can use:
var rawBody = PD.inputRequest.rawBody;
var obj = JSON.parse(unescape(rawBody));
Then use obj to handle the event according to the structure of your SNS message.
I believe PagerDuty's native AWS CloudWatch integration is opinionated: it expects the payload that CloudWatch alarms publish, so a custom SNS message won't trigger an incident.
But PagerDuty has an inbound integration type that lets you write a JavaScript (ES5) script to parse any custom message sent to this integration, which can then trigger an incident based on the logic of your script.
Docs on the Custom Event Transformer: https://v2.developer.pagerduty.com/docs/creating-an-integration-inline
I'm late to answer this, but I'm still adding it because, as @filipebarretto suggested, you need a Custom Event Transformer for this type of integration.
Setup: AWS CloudWatch (RDS metric) -> AWS SNS -> PagerDuty (CET)
I have successfully integrated AWS SNS with PagerDuty via a Custom Event Transformer:
// Parse the raw request body received from SNS as JSON.
var body = JSON.parse(PD.inputRequest.rawBody);
var message = body.NewStateReason;
var normalized_event = {
  event_type: PD.Trigger,
  description: body.AlarmName, // incident title
  details: message             // why the alarm fired
};
PD.emitGenericEvents([normalized_event]);
The code above creates an incident with AlarmName as the description and NewStateReason as the details.
I tested it with the sample event below as the SNS message, and it works fine.
{
  "version": "0",
  "id": "bba1bcef-5268-9967-8628-9a6d09e042e9",
  "detail-type": "CloudWatch Alarm State Change",
  "source": "aws.cloudwatch",
  "account": "[Account ID]",
  "time": "2020-11-17T06:25:42Z",
  "region": "[region Id]",
  "resources": [
    "arn:aws:cloudwatch:[region Id]:[Account ID]:alarm:CPUUtilize"
  ],
  "detail": {
    "alarmName": "CPUUtilize",
    "state": {
      "value": "ALARM",
      "reason": "Threshold Crossed: 1 out of the last 1 datapoints [4.314689265544354 (17/11/20 06:20:00)] was less than the threshold (70.0) (minimum 1 datapoint for OK -> ALARM transition).",
      "reasonData": {
        "version": "1.0",
        "queryDate": "2020-11-17T06:25:42.491+0000",
        "startDate": "2020-11-17T06:20:00.000+0000",
        "statistic": "Average",
        "period": 300,
        "recentDatapoints": [
          4.314689
        ],
        "threshold": 70
      },
      "timestamp": "2020-11-17T06:25:42.493+0000"
    },
    "previousState": {
      "value": "OK",
      "reason": "Threshold Crossed: 1 out of the last 1 datapoints [4.484088172640544 (17/11/20 05:44:00)] was not greater than or equal to the threshold (70.0) (minimum 1 datapoint for ALARM -> OK transition).",
      "reasonData": {
        "version": "1.0",
        "queryDate": "2020-11-17T05:49:53.688+0000",
        "startDate": "2020-11-17T05:44:00.000+0000",
        "statistic": "Average",
        "period": 300,
        "recentDatapoints": [
          4.484088
        ],
        "threshold": 70
      },
      "timestamp": "2020-11-17T05:49:53.691+0000"
    },
    "configuration": {
      "description": "Alarm Notification in my local timezone",
      "metrics": [
        {
          "id": "16baea70-421b-0a6e-f6f1-bc913d2bf647",
          "metricStat": {
            "metric": {
              "namespace": "AWS/EC2",
              "name": "CPUUtilization",
              "dimensions": {
                "InstanceId": "i-0e448XXXXXXXXXXXX"
              }
            },
            "period": 300,
            "stat": "Average"
          },
          "returnData": true
        }
      ]
    }
  }
}
Taken from https://aws.amazon.com/blogs/mt/customize-amazon-cloudwatch-alarm-notifications-to-your-local-time-zone-part-1/
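Note that this sample is an EventBridge "CloudWatch Alarm State Change" event, where the alarm name and reason sit under detail.alarmName and detail.state.reason, whereas a classic alarm notification delivered by SNS has top-level AlarmName and NewStateReason fields. The hypothetical Python helper below mirrors the transformer's field mapping for both shapes, which can be handy for checking sample payloads offline before porting the logic to the ES5 transformer; it is not a PagerDuty API. It also unwraps the standard SNS HTTP(S) notification envelope (Type "Notification" with the published text in Message), in case the body arrives wrapped.

```python
import json


def to_pagerduty_fields(raw_body: str) -> dict:
    """Map an alarm payload to the description/details used by the transformer above."""
    body = json.loads(raw_body)
    # SNS HTTP(S) deliveries wrap the published text in an envelope.
    if body.get("Type") == "Notification" and isinstance(body.get("Message"), str):
        body = json.loads(body["Message"])
    if "detail" in body:
        # EventBridge "CloudWatch Alarm State Change" event.
        detail = body["detail"]
        return {"description": detail["alarmName"], "details": detail["state"]["reason"]}
    # Classic CloudWatch alarm notification.
    return {"description": body["AlarmName"], "details": body["NewStateReason"]}
```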
I am even later to the game here, but ...
How are you 'manually' sending the events? Did you check that the access policy on the SNS topic allows publishing notifications from whichever service you are using to publish the events?
I had a similar issue when publishing notifications/events from AWS Backup, and I had to add something like this to the access policy:
{
  "Sid": "My-statement-id",
  "Effect": "Allow",
  "Principal": {
    "Service": "backup.amazonaws.com"
  },
  "Action": "SNS:Publish",
  "Resource": "arn:aws:sns:region:account-id:myTopic"
}