Why is my AWS CloudWatch alarm not being triggered? - amazon-web-services

I'm trying to set up AWS to send notifications to a Slack channel when a CloudWatch alarm goes off. I'm following this guide:
https://medium.com/analytics-vidhya/generate-slack-notifications-for-aws-cloudwatch-alarms-e46b68540133
I think I did everything properly, but I'm not getting my Slack notifications. I'm not sure where in the process it's failing, but I suspect the alarm is not being triggered.
Here are the details:
CloudWatch logs shows my error is being logged:
Here is my filter metric:
Here is how I define the pattern on which I want to filter:
Here is the state of the alarm:
The alarm seems to be OK. I gave it 5 minutes after logging the error. Does this mean the alarm is not being triggered?
Thanks
UPDATE
Here are some updated screenshots to address Marcin's point about the time discrepancy (note that the CloudWatch logs are in local time and the alarm graph is in UTC, a 6 hour difference):
I'm not exactly sure how to interpret the graph. It says OK in the top right corner but the horizontal red line at 1 seems to indicate that it's in an alarm state.

It seems to me that there must be something wrong between your pattern filter and the alarm trigger since you clearly have the message in your log stream and the alarm condition has been met.
I'm more used to seeing filter patterns in JSON, but going over the documentation at https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html there's a line that says:
Metric filter terms that include characters other than alphanumeric or underscore must be placed inside double quotes ("").
Your message pattern contains a dash, which is neither alphanumeric nor an underscore, so that may be the part that is being interpreted differently than you expect.

I believe the problem is that CloudWatch filter patterns need to be quoted if they contain characters other than alphanumerics and underscores.
Since your pattern has dashes in it, you will need to put it in double quotes. Without quotes, CloudWatch may interpret the dashes as minus signs, which are used to exclude terms.
"LOGIN-SIGNUP-ERROR"
Also, as it was already discussed in the comments, you should change the statistic to SUM instead of AVERAGE assuming you want to be alerted each time this error occurs.
References:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html
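The quoting rule from the documentation can be sketched as a small helper. This is only an illustration: `quote_filter_term` is a hypothetical name, not part of any AWS SDK, and the sketch deliberately refuses terms that already contain a double quote rather than guessing at escaping rules.

```python
import re

# Terms made only of letters, digits, and underscores can be used bare.
_BARE_TERM = re.compile(r"[A-Za-z0-9_]+")


def quote_filter_term(term: str) -> str:
    """Wrap a metric filter term in double quotes when the rule requires it."""
    if '"' in term:
        # Assumption: embedded double quotes are out of scope for this sketch.
        raise ValueError("terms containing double quotes are not handled here")
    if _BARE_TERM.fullmatch(term):
        return term
    return f'"{term}"'


print(quote_filter_term("LOGIN-SIGNUP-ERROR"))  # "LOGIN-SIGNUP-ERROR"
print(quote_filter_term("ERROR"))               # ERROR
```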

Related

Error in metric filter pattern in cloud watch

I am trying to create a custom CloudWatch metric from a log group.
I want a metric filter pattern for the status of the email. I just need to monitor the response in the email log line (success/failure).
My cloudwatch logs look like below
Email status : [EmailStatusResponse{farmId=3846, emailIds='xxx', response='success'}
So I just need to monitor two cases:
response='success'
response='failure'
Please find the below snippet for my configuration
Can anyone please help me with the error in the filter pattern?
Wrap this in double quotes.
Metric filter terms that include characters other than alphanumeric or underscore must be placed inside double quotes ("").
For you it would be "response='success'"
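As a rough sketch, the two quoted patterns could be turned into parameters for the `put_metric_filter` call on boto3's CloudWatch Logs client. The log group name, filter names, metric names, and namespace below are placeholders invented for illustration; only the parameter keys come from the API.

```python
def email_status_filter(log_group: str, status: str) -> dict:
    """Build put_metric_filter parameters for one response status."""
    # The whole term is wrapped in double quotes because it contains
    # '=' and single quotes, which are not alphanumeric or underscore.
    pattern = f"\"response='{status}'\""
    return {
        "logGroupName": log_group,
        "filterName": f"email-{status}",               # hypothetical name
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricName": f"Email{status.capitalize()}Count",  # hypothetical
                "metricNamespace": "Custom/Email",                 # hypothetical
                "metricValue": "1",
            }
        ],
    }


for status in ("success", "failure"):
    params = email_status_filter("my-email-log-group", status)
    print(params["filterPattern"])
    # With boto3 installed and credentials configured, one could then call:
    # boto3.client("logs").put_metric_filter(**params)
```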

Is it possible to set up CloudWatch Alarm for 3 or 4 mins period?

I need to receive a notification each time a certain message does not appear in logs for 3-4 minutes. It is a clear sign that the system is not working properly.
But it is only possible to choose 1 min or 5 mins. Is there any workaround?
"does not appear in logs for 3-4 minutes. It is a clear sign that the system is not working properly."
-- I know what you mean, CloudWatch Alarm on a metric which is not continuously pushed might behave a bit differently.
You should consider using the alarm's "M out of N" option, for example 3 out of 4 datapoints.
https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-cloudwatch-alarms-now-alerts-you-when-any-m-out-of-n-metric-datapoints-in-an-interval-are-above-your-threshold/
Also, if the metric you are referring to was created using a metric filter on a CloudWatch log group, you should edit the metric filter to include a default value. That way, whenever logs are pushed but none of them match the filter expression, the metric still receives a default value (say 0), which gives it more continuous datapoints.
If you set up a CloudWatch alarm using the AWS CLI, you can specify the period in seconds. Only the web interface limits the period to a fixed set of values.
https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/describe-alarms.html
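A minimal sketch of how the "3 out of 4" suggestion might look as parameters for boto3's `put_metric_alarm`, assuming a one-minute period and a metric that counts the expected message. The alarm name, namespace, and metric name are made up for the example, and treating missing data as breaching is one possible choice, not the only one.

```python
def missing_message_alarm(period_seconds: int = 60, m: int = 3, n: int = 4) -> dict:
    """Build put_metric_alarm parameters for an M-out-of-N alarm."""
    return {
        "AlarmName": "expected-message-missing",   # hypothetical name
        "Namespace": "Custom/App",                 # hypothetical namespace
        "MetricName": "ExpectedMessageCount",      # hypothetical metric
        "Statistic": "Sum",
        "Period": period_seconds,                  # length of one datapoint, in seconds
        "EvaluationPeriods": n,                    # N: datapoints looked at
        "DatapointsToAlarm": m,                    # M: breaching datapoints needed
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold", # fewer than 1 message per period
        "TreatMissingData": "breaching",           # no datapoint counts as "no message"
    }


params = missing_message_alarm()
print(params["DatapointsToAlarm"], "out of", params["EvaluationPeriods"])
# With boto3 installed and credentials configured, one could then call:
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

With these values the alarm fires when 3 of the last 4 one-minute datapoints are below the threshold, which approximates the 3-4 minute window asked about.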

Filtering for email addresses in AWS Cloudwatch Logs?

I am looking to setup some CloudFormation stuff that is able to find any email addresses in CloudWatch logs and let us know that one slipped through the cracks. I thought this would be a simple process of using a RegEx pattern that catches all the possible variations and email address can have, and using that as a filter. Having discovered that CloudWatch filtering does not support RegEx I've become a bit stumped as to how to write a filter that can be relied upon to catch any email address.
Has anyone done something similar to this, or know where a good place to start would be?
Amazon has launched a service called CloudWatch Logs Insights that lets you filter log messages with queries.
You need to select the CloudWatch log group and the time period to search.
Example:
fields @message
| sort @timestamp desc
| filter @message like /.*47768.*/
If you're exporting the logs somewhere (like Sumologic, Datadog, etc.), that's a better place to do the alerting.
If not, and you're exporting them into S3, then a triggered Lambda function that runs the check might do the trick. It could be expensive long term, though.
The solution that we landed upon was to pass strings through a RegEx pattern that recognises email addresses before they are logged into AWS, replacing any matches with [REDACTED]. This is simple enough to do in a Lambda.
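A minimal sketch of that redaction step, assuming a deliberately simple email pattern. The pattern below is an illustration, not the one the answer's author used, and it will not catch every address form that the email RFCs allow.

```python
import re

# Simplified email pattern: local part, '@', domain, dot, top-level domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(message: str) -> str:
    """Replace anything that looks like an email address before logging."""
    return EMAIL_RE.sub("[REDACTED]", message)


print(redact_emails("user jane.doe@example.com signed up"))
# user [REDACTED] signed up
```

Calling `redact_emails` on every string right before it is handed to the logger keeps the check in one place.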

Filter AWS Cloudwatch Lambda's Log

I have a Lambda function and its logs in Cloudwatch (Log group and Log Stream). Is it possible to filter (in Cloudwatch Management Console) all logs that contain "error"? For example logs containing "Process exited before completing request".
In Log Groups there is a button "Search Events". You must click on it first.
Then it "changes" to "Filter Streams":
Now you should just type your filter and select the beginning date-time.
So this is kind of a side issue, but it was relevant for us. (I posted this to another answer on StackOverflow but thought it would be relevant to this conversation too)
We've noticed that tailing and searching logs gets really slow after a log group has a lot of Log Streams in it, like when an AWS Lambda Function has had a lot of invocations. This is because "tail" type utilities and searching need to connect to each log stream to run. Log Events get expired and deleted due to the policy you set on the Log Group itself, but the Log Streams never get cleaned up. I made a few little utility scripts to help with that:
https://github.com/four43/aws-cloudwatch-log-clean
Hopefully that saves you some agony waiting for those logs to be searched.
You can also use CloudWatch Insights (https://aws.amazon.com/about-aws/whats-new/2018/11/announcing-amazon-cloudwatch-logs-insights-fast-interactive-log-analytics/) which is an AWS extension to CloudWatch logs that gives a pretty powerful query and analytics tool. However it can be slow. Some of my queries take up to a minute. Okay, if you really need that data.
You could also use a tool I created called SenseLogs. It downloads CloudWatch data to your browser, where you can run queries like the one you ask about. You can either do a full-text search for "error" or, if your log data is structured (JSON), use a JavaScript-like expression language to filter by field, e.g.:
error == 'critical'
Posting an update as CloudWatch has changed since 2016:
In the Log Groups there is a Search all button for a full-text search
Then just type your search:
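The same kind of search can also be scripted. The sketch below builds parameters for the `filter_log_events` call on boto3's CloudWatch Logs client; the log group name and start date are placeholders, and the phrase is quoted because it contains spaces.

```python
from datetime import datetime, timezone


def error_search_params(log_group: str, phrase: str, start: datetime) -> dict:
    """Build filter_log_events parameters for an exact-phrase search."""
    return {
        "logGroupName": log_group,
        "filterPattern": f'"{phrase}"',                  # quoted: match the whole phrase
        "startTime": int(start.timestamp() * 1000),      # epoch milliseconds
    }


params = error_search_params(
    "/aws/lambda/my-function",                           # hypothetical log group
    "Process exited before completing request",
    datetime(2024, 1, 1, tzinfo=timezone.utc),           # arbitrary example date
)
print(params["filterPattern"])
# With boto3 installed and credentials configured, one could then call:
# boto3.client("logs").filter_log_events(**params)
```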

Stopping AWS Cloudwatch INSUFFICIENT_DATA to OK transition emails

With Cloudwatch alarms I want to know about ALARM -> OK transition, but INSUFFICIENT_DATA -> OK transition just gets really annoying.
Is there a way to stop the latter notification? I could do via an email filter but would rather stop it at the source if possible.
Sadly, the answer appears to be:
Currently, this isn't possible though it is an interesting request.
from this forum answer on the 21st of January 2013:
https://forums.aws.amazon.com/thread.jspa?messageID=417727
In 2018 AWS launched metric math. Since then, the problem can be solved with the FILL() function, which replaces empty (INSUFFICIENT_DATA) points with a constant value.
CW metric example screenshot
m1 is a raw CloudWatch metric which contains holes in the time series.
e1 is a math metric and has zeros instead of empty points.
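As a sketch, the m1/e1 pair could be expressed as the `Metrics` argument of boto3's `put_metric_alarm`. The namespace, metric name, and label are placeholders, and the `Sum` statistic and one-minute period are assumptions, since the screenshot's actual settings are not available.

```python
def filled_metric_queries(namespace: str, metric_name: str, period_seconds: int = 60) -> list:
    """Build the Metrics list for an alarm on FILL(m1, 0)."""
    return [
        {
            "Id": "m1",                      # raw metric with holes in the series
            "MetricStat": {
                "Metric": {"Namespace": namespace, "MetricName": metric_name},
                "Period": period_seconds,
                "Stat": "Sum",               # assumed statistic
            },
            "ReturnData": False,             # not the series the alarm watches
        },
        {
            "Id": "e1",                      # math metric: zeros instead of gaps
            "Expression": "FILL(m1, 0)",
            "Label": "metric with gaps filled",
            "ReturnData": True,              # the alarm evaluates this series
        },
    ]


queries = filled_metric_queries("Custom/App", "ErrorCount")  # hypothetical names
print(queries[1]["Expression"])  # FILL(m1, 0)
```

Because e1 always has a value, the alarm has no missing datapoints to push it into INSUFFICIENT_DATA.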
Using Metric Math:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html
A detailed explanation of how to create a metric math alarm:
https://aws.amazon.com/blogs/mt/create-a-metric-math-alarm-using-amazon-cloudwatch/
A similar question:
AWS Cloudwatch Math Expressions: removing Insufficient Data: is there a "coalesce" function like SQL?