I want to set up an alarm that raises an alert in case there are any items in the DynamoDB table. The alarm has been set up in the following manner -
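For reference, a rough boto3 equivalent of that configuration (a sketch only; the alarm name, table name, and the Operation dimension are assumptions standing in for my actual setup):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Sketch of the alarm as configured: Maximum of ReturnedItemCount per
# 1-minute period, alarming when 2 of the last 5 datapoints breach.
cloudwatch.put_metric_alarm(
    AlarmName="items-returned-alarm",        # placeholder name
    Namespace="AWS/DynamoDB",
    MetricName="ReturnedItemCount",
    Dimensions=[
        {"Name": "TableName", "Value": "my-table"},   # placeholder table
        {"Name": "Operation", "Value": "Scan"},       # assumed operation
    ],
    Statistic="Maximum",
    Period=60,                     # one datapoint per minute
    EvaluationPeriods=5,           # look at the last 5 datapoints
    DatapointsToAlarm=2,           # alarm if 2 of them breach
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
)
```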
My understanding is that -
Period defines how regularly the data point is recorded; in this case, one datapoint per minute.
The Maximum statistic means that we select the maximum value of ReturnedItemCount within each minute.
"2 out of 5" would mean that if 2 of the last 5 datapoints (5 minutes) breach the threshold, the state of the alarm changes.
However, I am not seeing the intended results. I can only see a single datapoint in the chart (instead of a datapoint every minute?), and the state is OK even when that datapoint is above the threshold.
Could someone help out with this?
I figured out that this was the intended behaviour. As per the documentation for ReturnedItemCount -
This metric only records a datapoint if a Query/Scan operation was performed on the table during the given time interval. Unlike some of the other metrics available in CloudWatch, it isn't a periodic check.
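In other words, minutes with no Query/Scan produce missing data, and the alarm state reflects that. If you want to control how those empty minutes are evaluated, the alarm's TreatMissingData setting does that; a sketch, reusing the assumed alarm from above:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Same sketch alarm as above, now telling CloudWatch how to evaluate
# minutes with no datapoint at all. "notBreaching" treats missing data as
# within the threshold (here, no Query/Scan means no items were returned).
cloudwatch.put_metric_alarm(
    AlarmName="items-returned-alarm",        # placeholder name
    Namespace="AWS/DynamoDB",
    MetricName="ReturnedItemCount",
    Dimensions=[
        {"Name": "TableName", "Value": "my-table"},   # placeholder table
        {"Name": "Operation", "Value": "Scan"},
    ],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=5,
    DatapointsToAlarm=2,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```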
Related
I am using an on-demand DynamoDB table, and I have read the doc https://aws.amazon.com/premiumsupport/knowledge-center/on-demand-table-throttling-dynamodb/. It says: "You might experience throttling if you exceed double your previous traffic peak within 30 minutes." This means DynamoDB adjusts the RCU/WCU based on the last 30 minutes.
Let's say my table is throttled: do I have to wait up to 30 minutes until the table adjusts its RCU/WCU? Or does the table update the RCU immediately, or within a few minutes?
The reason I am asking is that I'd like to add a retry in my application code to repeat the DB action whenever there is a throttle. How should I choose the sleep interval between retries?
Capacity is always managed with an On Demand table to support double any previous peak throughput, but if you grow faster than that, the table will add physical capacity (physical partitions).
When DynamoDB adds partitions it can take between 5 minutes and 30 minutes for that capacity to be available for use.
It has nothing to do with RCUs/WCUs because On Demand tables don't have capacity units.
Note: You may stay throttled if you've designed a hot partition key in either the base table or a GSI.
During the throttle period, requests are still being handled (and handled at a good rate). It's just like seeing a line at the grocery store checkout: you get in line. Don't design the code to come back in 30 minutes hoping there's no line after more checkers have been added. The grocery store will be "adding checkers" when it notices the load is high, but it also keeps processing the existing work.
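Concretely, that means short exponential backoff with jitter rather than long waits. A minimal sketch, assuming a boto3 table resource and treating ProvisionedThroughputExceededException as the throttle signal (the table name and helper are placeholders; boto3's built-in retry configuration can also handle this for you):

```python
import random
import time

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("my-table")  # placeholder table name

def put_with_backoff(item, max_attempts=5):
    """Retry a throttled write with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return table.put_item(Item=item)
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in ("ProvisionedThroughputExceededException",
                            "ThrottlingException"):
                raise  # not a throttle; don't retry
            # Sleep 0..(50 ms * 2^attempt), capped at two seconds.
            time.sleep(min(2.0, random.uniform(0, 0.05 * (2 ** attempt))))
    raise RuntimeError("still throttled after %d attempts" % max_attempts)
```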
I have been having some difficulty figuring out the ideal threshold for a few of our CloudWatch alarms. I am looking at metrics for error rate, fault rate, and failure rate. I am vaguely looking at an evaluation period of around 15 minutes. My metrics are currently recorded at a one-minute level. I have the following ideas:
Look at the average of the minute-level data over a few days, and set the threshold slightly higher than that.
Try different thresholds (t1, t2, ...) and, for a given day, see how many times the datapoints cross each one in 15-minute bins.
I am not sure if this is the right way of going about it; do share if there is a better approach to the problem.
PS 1: I know that thresholds should be based on Service Level Agreements (SLAs), but let's say we do not have an SLA yet.
PS 2: Also, can I export data from CloudWatch to Excel for some easier manipulation? I am currently looking at running a few queries in Logs Insights to calculate error rates.
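For context on PS 2, I can already pull raw datapoints with boto3 and write them to a CSV that Excel opens (a sketch; the namespace and metric name stand in for my real error-rate metric), so I mainly want to know if there's a better route:

```python
import csv
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholder metric; swap in your own namespace/name/dimensions.
resp = cloudwatch.get_metric_statistics(
    Namespace="MyService",
    MetricName="ErrorRate",
    StartTime=datetime.utcnow() - timedelta(days=3),
    EndTime=datetime.utcnow(),
    Period=60,                 # minute-level, matching how it's recorded
    Statistics=["Average", "Maximum"],
)

with open("error_rate.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "average", "maximum"])
    for dp in sorted(resp["Datapoints"], key=lambda d: d["Timestamp"]):
        writer.writerow([dp["Timestamp"].isoformat(),
                         dp["Average"], dp["Maximum"]])
```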
In your case, maybe you could also try Amazon CloudWatch Anomaly Detection instead of static thresholds:
You can create an alarm based on CloudWatch anomaly detection, which mines past metric data and creates a model of expected values. The expected values take into account the typical hourly, daily, and weekly patterns in the metric.
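For what that looks like in practice, here is a boto3 sketch of an anomaly detection alarm (the metric and alarm names are placeholders; the 2 in ANOMALY_DETECTION_BAND is the band width in standard deviations):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm against the model's expected band rather than a static threshold.
cloudwatch.put_metric_alarm(
    AlarmName="error-rate-anomaly",          # placeholder name
    ComparisonOperator="GreaterThanUpperThreshold",
    EvaluationPeriods=15,                    # ~15 min of 1-min datapoints
    ThresholdMetricId="band",
    Metrics=[
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {"Namespace": "MyService",
                           "MetricName": "ErrorRate"},  # placeholder metric
                "Period": 60,
                "Stat": "Average",
            },
            "ReturnData": True,
        },
        {
            "Id": "band",
            "Expression": "ANOMALY_DETECTION_BAND(m1, 2)",
            "Label": "expected range",
            "ReturnData": True,
        },
    ],
)
```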
I need help in making sense of how many data points (SampleCount) I get in 5-minute intervals in basic monitoring.
I have basic monitoring for an EC2 instance, which means a new data point is gathered every 5 minutes.
With the GetMetricData API, I can get data points for the metric.
I have queried for the SampleCount of data points every 5 minutes in a 10-minute period (a sketch of the query is below the sample data).
Data shows (10-min period):
0 min - 5 sample count
5 min - 5 sample count
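The query looked roughly like this (a boto3 sketch; the metric and instance ID are placeholders for what I actually used):

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# Ask for SampleCount in 5-minute buckets over the last 10 minutes.
resp = cloudwatch.get_metric_data(
    MetricDataQueries=[{
        "Id": "samples",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/EC2",
                "MetricName": "CPUUtilization",   # placeholder metric
                "Dimensions": [{"Name": "InstanceId",
                                "Value": "i-0123456789abcdef0"}],
            },
            "Period": 300,
            "Stat": "SampleCount",
        },
        "ReturnData": True,
    }],
    StartTime=datetime.utcnow() - timedelta(minutes=10),
    EndTime=datetime.utcnow(),
)
print(resp["MetricDataResults"][0]["Values"])  # e.g. [5.0, 5.0]
```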
I am confused now as to what this actually means. Since basic monitoring gathers data every 5 minutes, I would've expected 1 data point per 5-minute interval. So my expectation:
0 min - 1 sample count
5 min - 1 sample count
Thank you for your help!
Default monitoring metrics are collected every 5 minutes, but custom monitoring metrics are collected every minute. See the FAQ.
A custom metric can be one of the following:
Standard resolution, with data having one-minute granularity
High resolution, with data at a granularity of one second
By default, metrics are stored at 1-minute resolution in CloudWatch. You can define a metric as high-resolution by setting the StorageResolution parameter to 1 in the PutMetricData API request. If you do not set the optional StorageResolution parameter, then CloudWatch will default to storing the metrics at 1-minute resolution.
When you publish a high-resolution metric, CloudWatch stores it with a resolution of 1 second, and you can read and retrieve it with a period of 1 second, 5 seconds, 10 seconds, 30 seconds, or any multiple of 60 seconds.
Custom metrics follow the same retention schedule listed above.
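In boto3 terms, the StorageResolution parameter described above looks like this (a sketch; the namespace, metric name, and value are placeholders):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish one high-resolution custom datapoint. Omitting StorageResolution
# (or setting it to 60) stores the metric at the default 1-minute resolution.
cloudwatch.put_metric_data(
    Namespace="MyApp",                   # placeholder namespace
    MetricData=[{
        "MetricName": "RequestLatency",  # placeholder metric name
        "Value": 42.0,
        "Unit": "Milliseconds",
        "StorageResolution": 1,          # 1 = high resolution (1-second)
    }],
)
```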
One difference between detailed monitoring and basic monitoring is the frequency at which the data is published to CloudWatch: every 5 minutes in the case of basic monitoring, and every 1 minute in the case of detailed monitoring.
The data collected is the same, and the 5-minute datapoint published with basic monitoring is an aggregation of the 1-minute datapoints. That's why the sample count is 5: it's an aggregation of five 1-minute samples.
Below is an example of a metric before and after detailed monitoring was enabled.
Before enabling - no difference between graphing the metric at 1 min or 5 min resolution.
After enabling - graphing at 1 min resolution gives you more detail.
What resolution (basic monitoring with 5 min period, detailed with 1 min, or high-resolution with 1 sec) do Metric Filters use? And how can I change it or at least see it?
Metric filters only publish data at 1-minute resolution.
As the data ages, it is rolled up into 5-minute resolution (for data between 15 days and 63 days old) and then into 1-hour resolution (for the remainder of the 15-month retention).
This follows the normal metric retention policy as described in the question "What is the retention period of all metrics?" in the CloudWatch FAQ.
AFAIK, sub-minute resolution is not currently supported for metric filters.
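For completeness, this is roughly how a metric filter is defined with boto3; note there is no resolution parameter to set, which matches the 1-minute-only behaviour above (the log group, pattern, and metric names are placeholders):

```python
import boto3

logs = boto3.client("logs")

# A metric filter counts matching log events; the resulting metric is
# published at 1-minute resolution, with no knob to change that.
logs.put_metric_filter(
    logGroupName="/my/app/log-group",        # placeholder log group
    filterName="error-count",
    filterPattern="ERROR",                   # placeholder pattern
    metricTransformations=[{
        "metricName": "ErrorCount",
        "metricNamespace": "MyApp",
        "metricValue": "1",
        "defaultValue": 0,
    }],
)
```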
I'm trying to set up a custom dashboard in CloudWatch, and don't understand how "Period" is affecting how my data is being displayed.
Couldn't find any coverage of this variable in the AWS documentation, so any guidance would be appreciated!
Period is the width of the time range covered by each datapoint on a graph, and it's used to define the granularity at which you want to view your data.
For example, if you're graphing the total number of visits to your site during a day, you could set the period to 1 hour, which would plot 24 datapoints and show how many visitors you had in each hour of that day. If you set the period to 1 minute, the graph will display 1440 datapoints and show how many visitors you had in each minute of that day.
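As a concrete illustration, the period lives in each widget's definition; a boto3 sketch (the dashboard name, metric, and region are placeholder assumptions):

```python
import json

import boto3

cloudwatch = boto3.client("cloudwatch")

# One widget graphing hourly totals; change "period" to 60 for minute-level.
body = {
    "widgets": [{
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "metrics": [["MyApp", "PageVisits"]],   # placeholder metric
            "stat": "Sum",
            "period": 3600,                         # 1 datapoint per hour
            "region": "us-east-1",                  # placeholder region
            "title": "Visits per hour",
        },
    }],
}

cloudwatch.put_dashboard(
    DashboardName="site-traffic",                   # placeholder name
    DashboardBody=json.dumps(body),
)
```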
See the CloudWatch docs for more details:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#CloudWatchPeriods
Here is a similar question that might be useful:
API Gateway Cloudwatch advanced logging