Stopping AWS CloudWatch INSUFFICIENT_DATA to OK transition emails

With CloudWatch alarms I want to know about the ALARM -> OK transition, but the INSUFFICIENT_DATA -> OK transition just gets really annoying.
Is there a way to stop the latter notification? I could do this via an email filter, but would rather stop it at the source if possible.

Sadly, the answer appears to be:
Currently, this isn't possible though it is an interesting request.
from this forum answer on the 21st of January 2013:
https://forums.aws.amazon.com/thread.jspa?messageID=417727

In 2018, AWS launched metric math. Since then, the problem can be solved with the FILL() function, which replaces empty (INSUFFICIENT_DATA) points with a constant value.
CW metric example screenshot
m1 is a raw CloudWatch metric which contains holes in the time series.
e1 is a math metric and has zeros instead of empty points.
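As a sketch, an alarm on the filled expression can also be created from the CLI. This requires live AWS credentials, and the namespace, metric name, and threshold below are placeholders, not values from the question:

```shell
# Alarm on e1 = FILL(m1, 0): gaps in m1 are treated as 0 instead of missing.
# Namespace, metric name, and threshold are hypothetical placeholders.
aws cloudwatch put-metric-alarm \
  --alarm-name heartbeat-missing \
  --comparison-operator LessThanThreshold \
  --threshold 1 \
  --evaluation-periods 1 \
  --metrics '[
    {"Id": "m1",
     "MetricStat": {"Metric": {"Namespace": "MyApp", "MetricName": "Heartbeat"},
                    "Period": 300, "Stat": "Sum"},
     "ReturnData": false},
    {"Id": "e1", "Expression": "FILL(m1, 0)", "ReturnData": true}
  ]'
```

With ReturnData set to true only on e1, the alarm evaluates the filled series rather than the raw metric, so it never sees missing datapoints.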
Using Metric Math:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html
Detailed explanation of how to create metric math alarms:
https://aws.amazon.com/blogs/mt/create-a-metric-math-alarm-using-amazon-cloudwatch/
A similar question:
AWS Cloudwatch Math Expressions: removing Insufficient Data: is there a "coalesce" function like SQL?

Related

AWS Auto Scaling group: is it possible to schedule a capacity increase every first Sunday of each month?

I'm using Auto Scaling groups, which work well with some custom rules I've set, but I also noticed that I need to raise the minimum number of servers every first Sunday of each month (we have an increase in requests at that time).
I saw the Scheduled actions in the "Automatic scaling" tab, but it does not appear to be possible to schedule anything longer than a weekly recurrence, let alone something like "every first Sunday of each month".
Is this possible in another way? Maybe via some CloudWatch settings with custom "cron" tasks? I'm not sure here.
You don't need to use an external service or a Lambda to do it. You can do it in the advanced tab of the Auto Scaling options in AWS.
For your particular case, your cron expression would be 0 14 ? * 1#1 *. See the image below; it will be triggered every first Sunday of the month.
Edit: this only works as an EventBridge cron expression.
Cron expression
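If you drive the schedule through an EventBridge rule instead, the same day-of-week syntax works there too; a minimal sketch (the rule name and the 14:00 UTC time are assumptions, not from the question):

```shell
# EventBridge cron fields: minute hour day-of-month month day-of-week year.
# "1#1" means the first Sunday of the month (1 = Sunday in EventBridge cron).
aws events put-rule \
  --name scale-up-first-sunday \
  --schedule-expression "cron(0 14 ? * 1#1 *)"
```

The rule then needs a target (for example, a Lambda that calls the Auto Scaling API) to actually change the group's minimum size.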
You can create a separate timer (e.g. a Lambda invoked every morning/hour/whatever) and scale up the scaling group from there if it's the first Sunday of the month.
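The "is it the first Sunday" check inside such a Lambda is a one-liner; a minimal sketch (the function name is mine):

```python
from datetime import date

def is_first_sunday(d: date) -> bool:
    # The first Sunday of any month falls on day 1-7,
    # and weekday() == 6 means Sunday.
    return d.weekday() == 6 and d.day <= 7

# Example: 2024-03-03 was the first Sunday of March 2024.
```

The Lambda can simply return early on every other day, so the timer itself can fire daily.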

Why is my AWS CloudWatch alarm not being triggered?

I'm trying to set up AWS to send notifications to a Slack channel when a CloudWatch alarm goes off. I'm following along with this guide:
https://medium.com/analytics-vidhya/generate-slack-notifications-for-aws-cloudwatch-alarms-e46b68540133
I think I did everything properly, but I'm not getting my Slack notifications. I'm not sure where in the process it's failing, but I suspect the alarm is not being triggered.
Here are the details:
CloudWatch logs shows my error is being logged:
Here is my filter metric:
Here is how I define the pattern on which I want to filter:
Here is the state of the alarm:
The alarm seems to be OK. I gave it 5 minutes after logging the error. Does this mean the alarm is not being triggered?
Thanks
UPDATE
Here are some updated screenshots to address Marcin's point about the time discrepancy (note that the CloudWatch logs are in local time and the alarm graph is in UTC, a 6-hour difference):
I'm not exactly sure how to interpret the graph. It says OK in the top right corner but the horizontal red line at 1 seems to indicate that it's in an alarm state.
It seems to me that there must be something wrong between your pattern filter and the alarm trigger since you clearly have the message in your log stream and the alarm condition has been met.
I'm more used to seeing filter patterns in JSON, but going over the documentation at https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html there's a line that says:
Metric filter terms that include characters other than alphanumeric or underscore must be placed inside double quotes ("").
And I'm thinking that your message pattern has a dash and it's neither alphanumeric nor an underscore on it so maybe that's the part that's being interpreted differently than expected.
I believe the problem is that CloudWatch filter patterns need to be quoted if they contain characters other than alphanumerics and underscores.
Since your pattern has dashes in it, you will need to put your filter pattern in double quotes. Without quotes, CloudWatch may interpret dashes as minus signs used to exclude terms.
"LOGIN-SIGNUP-ERROR"
Also, as it was already discussed in the comments, you should change the statistic to SUM instead of AVERAGE assuming you want to be alerted each time this error occurs.
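A hedged sketch of how the filter could be (re)created from the CLI with the quoted pattern; the log group, filter name, and metric names below are placeholders:

```shell
# Note the single quotes around the double-quoted pattern: the shell passes
# "LOGIN-SIGNUP-ERROR" (double quotes included) to CloudWatch Logs.
aws logs put-metric-filter \
  --log-group-name /my/app/log-group \
  --filter-name login-signup-errors \
  --filter-pattern '"LOGIN-SIGNUP-ERROR"' \
  --metric-transformations \
      metricName=LoginSignupErrors,metricNamespace=MyApp,metricValue=1
```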
References:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html

How can I get AWS lambda usage for the last hour?

I would like to know if there is a way to get all of my Lambda invocations for the last hour (better still, in 5-minute buckets).
It would also be nice to get the cost usage (but from what I've read, it only updates once a day).
From looking at the documentation it seems like I can use GetMetricData (Cloudwatch), is there a better one for my use case?
You can get this information by region within CloudWatch metrics.
The AWS/Lambda namespace contains a metric named Invocations, which can be viewed for the entire region or per Lambda function.
If you look at the Sum statistic over whichever period you want (you can get down to 1-minute values for this metric), you will be able to get these values in near real time.
You can get these values from within the console or by using the get-metric-data command within the CLI or SDK.
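For example, a sketch of pulling the last hour of invocations in 5-minute buckets with get-metric-data (the query Id is arbitrary, and the date arithmetic uses GNU date syntax):

```shell
# Sum of AWS/Lambda Invocations across the region, in 300-second buckets,
# for the last hour. Requires live AWS credentials.
aws cloudwatch get-metric-data \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time   "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --metric-data-queries '[
    {"Id": "invocations",
     "MetricStat": {"Metric": {"Namespace": "AWS/Lambda",
                               "MetricName": "Invocations"},
                    "Period": 300, "Stat": "Sum"}}
  ]'
```

Add a FunctionName dimension to the Metric object to scope the query to a single Lambda.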
There are many tools to get metrics on your lambda, so it really depends on your needs.
What do you mean by "is there a better one for my use case"?
If you prefer, you can check it through the console: go to CloudWatch -> Metrics and navigate to your Lambda. You can aggregate the data differently (for example: average per 5 minutes, or total per day, etc.).
Here's a great doc: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html#monitoring-metrics-invocation
Moreover, here's an answer I gave that surveys different approaches to monitoring Lambda resources: Best Way to Monitor Customer Usage of AWS Lambda
Disclosure: I work for Lumigo, a company that does exactly that.

Is it possible to set up CloudWatch Alarm for 3 or 4 mins period?

I need to receive a notification each time a certain message does not appear in logs for 3-4 minutes. It is a clear sign that the system is not working properly.
But it is only possible to choose 1 min or 5 mins. Is there any workaround?
"does not appear in logs for 3-4 minutes. It is a clear sign that the system is not working properly."
I know what you mean; a CloudWatch alarm on a metric that is not pushed continuously can behave a bit differently.
You should consider using the alarm's M-out-of-N option, with 3 out of 4 in your case.
https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-cloudwatch-alarms-now-alerts-you-when-any-m-out-of-n-metric-datapoints-in-an-interval-are-above-your-threshold/
Also, if the metric you are referring to was created using a metric filter on a CloudWatch log group, you should edit the filter to include a default value. That way, whenever incoming log events do not match the filter expression, a default value (of say, 0) is still pushed, making the metric more continuous.
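Both suggestions can be sketched from the CLI; the alarm name, namespace, metric, threshold, and log group below are placeholders:

```shell
# 3-out-of-4 datapoints: the alarm fires only if 3 of the last 4 periods breach.
aws cloudwatch put-metric-alarm \
  --alarm-name heartbeat-missing \
  --namespace MyApp --metric-name Heartbeat \
  --statistic Sum --period 60 \
  --comparison-operator LessThanThreshold --threshold 1 \
  --evaluation-periods 4 --datapoints-to-alarm 3

# Metric filter with defaultValue=0, so the metric stays continuous
# even when no log event matches the pattern.
aws logs put-metric-filter \
  --log-group-name /my/app/log-group \
  --filter-name heartbeat-filter \
  --filter-pattern '"HEARTBEAT"' \
  --metric-transformations \
      metricName=Heartbeat,metricNamespace=MyApp,metricValue=1,defaultValue=0
```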
If you create a CloudWatch alarm using the AWS CLI, it is possible to specify the period in seconds; only the web interface limits the period to a fixed set of values.
https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/describe-alarms.html
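For instance, a 3-minute period (180 seconds, a valid multiple of 60) can be set when creating the alarm with put-metric-alarm; everything else in this sketch is a placeholder:

```shell
# Alarm if the expected message was not logged in the last 3 minutes.
# Namespace, metric name, and alarm name are hypothetical.
aws cloudwatch put-metric-alarm \
  --alarm-name message-missing \
  --namespace MyApp --metric-name ExpectedMessage \
  --statistic Sum \
  --period 180 \
  --comparison-operator LessThanThreshold --threshold 1 \
  --evaluation-periods 1
```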

Why does playing with the AWS DynamoDB "Hello world" produce read/write alarms?

I've started to play with DynamoDB and I've created a "dynamo-test" table with a hash PK on userid and a couple more columns (age, name). Read and write capacity is set to 5. I use Lambda and API Gateway with Node.js. Then I manually performed several API calls through API Gateway using a payload similar to this:
{
  "userId": "222",
  "name": "Test",
  "age": 34
}
I've tried to insert the same item a couple of times (which didn't produce an error but silently succeeded). Also, I used the DynamoDB console and browsed the inserted items several times (currently there are only 2). I haven't tracked exactly how many times I did those actions, but it was all done manually. Then, after an hour, I noticed 2 alarms in CloudWatch:
INSUFFICIENT_DATA
dynamo-test-ReadCapacityUnitsLimit-BasicAlarm
ConsumedReadCapacityUnits >= 240 for 12 minutes
No notifications
And a similar alarm with "...WriteCapacityLimit...". The write capacity alarm became OK after 2 minutes, but then went back again after 10 minutes. Anyway, I'm still reading and learning how to plan and monitor these capacities, but this hello-world example scared me a bit; have I exceeded my table's capacity? :) Please point me in the right direction if I'm missing some fundamental part!
It's just an "INSUFFICIENT_DATA" message. It means that your table hasn't had any reads or writes in a while, so there is insufficient data available for the CloudWatch metric. This happens with the CloudWatch alarms for any DynamoDB table that isn't used very often. Nothing to worry about.
EDIT: You can now change a setting in CloudWatch alarms to ignore missing data, which will leave the alarm in its previous state instead of changing it to the "INSUFFICIENT_DATA" state.
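That setting is the alarm's TreatMissingData property; from the CLI it would look like this sketch (the threshold and evaluation settings are illustrative, based loosely on the alarm described in the question):

```shell
# --treat-missing-data accepts: missing | ignore | breaching | notBreaching.
# "ignore" keeps the alarm in its current state when datapoints are missing.
aws cloudwatch put-metric-alarm \
  --alarm-name dynamo-test-ReadCapacityUnitsLimit-BasicAlarm \
  --namespace AWS/DynamoDB --metric-name ConsumedReadCapacityUnits \
  --dimensions Name=TableName,Value=dynamo-test \
  --statistic Sum --period 300 \
  --comparison-operator GreaterThanOrEqualToThreshold --threshold 240 \
  --evaluation-periods 1 \
  --treat-missing-data ignore
```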