CloudWatch Custom Metrics units for "minutes" - amazon-web-services

I've been scouring different sources (the Boto3 docs and AWS docs, among others), and most list only a limited number of time units: Seconds, Milliseconds, and Microseconds. Say I want to measure a metric in Minutes. How would I go about publishing a custom metric that does this?

Seconds, Microseconds, and Milliseconds are the only supported time units: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_MetricDatum.html
If you want to graph your data in Minutes on a CloudWatch dashboard, you could publish the data in Seconds and then use metric math to convert it to Minutes: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html
Give the metric the id m1; your expression is then m1/60.
You can also use metric math with GetMetricData API, in case you need raw values instead of a graph: https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/get-metric-data.html
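As a concrete sketch with Boto3 (the MyApp namespace and JobDuration metric name are made up for illustration), publishing a duration in Seconds and reading it back in Minutes could look like this:

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Publish the raw datapoint in Seconds (Minutes is not a supported unit).
cloudwatch.put_metric_data(
    Namespace="MyApp",                  # hypothetical namespace
    MetricData=[{
        "MetricName": "JobDuration",    # hypothetical metric name
        "Value": 150.0,                 # 150 seconds = 2.5 minutes
        "Unit": "Seconds",
    }],
)

# Read it back in Minutes using the metric math expression m1/60.
response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {"Namespace": "MyApp", "MetricName": "JobDuration"},
                "Period": 300,
                "Stat": "Average",
            },
            "ReturnData": False,        # only return the converted series
        },
        {
            "Id": "minutes",
            "Expression": "m1 / 60",    # Seconds -> Minutes
            "Label": "JobDuration (Minutes)",
        },
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
)
print(response["MetricDataResults"])
```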

Related

AWS CloudWatch SQL query not showing results

I'm using CloudWatch to monitor the cpu_usage_system metric from the CloudWatch agent (CWAgent).
I'm plotting data that is more than 24 hours old.
When I use the regular CloudWatch browsing tab to view the data I see data points; when I do the same with CloudWatch SQL, I do not.
Answer from AWS support:
CloudWatch Metrics Insights currently supports the latest three hours of data only. When you graph with a period larger than one minute, for example five minutes or one hour, there could be cases where the oldest data point differs from the expected value. This is because the Metrics Insights queries return only the most recent 3 hours of data, so the oldest datapoint, being older than 3 hours, accounts only for observations that have been measured within the last three hours boundary.
In simple words: currently, you can query only the most recent 3 hours of data (nothing beyond that). The documentation link with more information is below.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch-metrics-insights-limits.html
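For illustration, here is roughly what such a query looks like when issued through the GetMetricData API with Boto3; the SCHEMA("CWAgent", host) table and the host dimension are assumptions based on a default agent setup. However wide the requested time range, only the latest 3 hours are scanned:

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# A Metrics Insights (CloudWatch SQL) query issued through GetMetricData.
# It only scans the most recent 3 hours of data, regardless of the
# StartTime/EndTime requested.
response = cloudwatch.get_metric_data(
    MetricDataQueries=[{
        "Id": "q1",
        "Expression": 'SELECT AVG(cpu_usage_system) FROM SCHEMA("CWAgent", host) GROUP BY host',
        "Period": 300,
    }],
    StartTime=datetime.utcnow() - timedelta(hours=24),  # older data won't be returned
    EndTime=datetime.utcnow(),
)
print(response["MetricDataResults"])
```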

Best practices to configure thresholds for alarms

I have been having some difficulty understanding how to choose the ideal threshold for a few of our CloudWatch alarms. I am looking at metrics for error rates, fault rates, and failure rates. I am vaguely looking at an evaluation period of around 15 minutes. My metrics are currently recorded at a one-minute level. I have the following ideas:
To look at the average of the minute-level data over a few days and set the threshold slightly higher than that.
To try different thresholds (t1, t2, ...) and, for a given day, see how many times the datapoints cross each one in 15-minute bins.
I'm not sure if this is the right approach, so do share if there is a better way of going about the problem.
PS 1: I know that thresholds should be based on Service Level Agreements (SLAs), but let's say we do not have an SLA yet.
PS 2: Also, can I import data from CloudWatch into Excel for easier manipulation? I'm currently running a few queries in Logs Insights to calculate error rates.
In your case, maybe you could also try Amazon CloudWatch Anomaly Detection instead of static thresholds:
You can create an alarm based on CloudWatch anomaly detection, which mines past metric data and creates a model of expected values. The expected values take into account the typical hourly, daily, and weekly patterns in the metric.
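A minimal Boto3 sketch of such an alarm, assuming a hypothetical per-minute ErrorRate metric in a MyApp namespace; the band width of 2 standard deviations is just a starting point to tune:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="ErrorRateAnomaly",   # hypothetical alarm name
    ComparisonOperator="GreaterThanUpperThreshold",
    EvaluationPeriods=15,           # fifteen 1-minute periods ~ the 15-minute window
    Metrics=[
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {
                    "Namespace": "MyApp",       # hypothetical namespace
                    "MetricName": "ErrorRate",  # hypothetical metric name
                },
                "Period": 60,
                "Stat": "Average",
            },
            "ReturnData": True,
        },
        {
            "Id": "ad1",
            # expected-value band, 2 standard deviations wide
            "Expression": "ANOMALY_DETECTION_BAND(m1, 2)",
            "Label": "ErrorRate (expected)",
            "ReturnData": True,
        },
    ],
    ThresholdMetricId="ad1",        # alarm against the band, not a static threshold
)
```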

One or more points were written more frequently than the maximum sampling period configured for the metric

Background
I have a website deployed on multiple machines. I want to create a Google Cloud custom metric that captures its throughput: how many calls were served.
The idea was to collect information about served requests and, once per minute, write it to a custom metric. So on each machine this code runs at most once per minute, but the process runs on every machine in my cluster.
Running the code locally is working perfectly.
The problem
I'm getting this error:
Grpc.Core.RpcException: Status(StatusCode=InvalidArgument, Detail="One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric. {Metric: custom.googleapis.com/web/2xx, Timestamps: {Youngest Existing: '2019/09/28-23:58:59.000', New: '2019/09/28-23:59:02.000'}}: timeSeries[0]; One or more points were written more frequently than the maximum sampling period configured for the metric. {Metric: custom.googleapis.com/web/4xx, Timestamps: {Youngest Existing: '2019/09/28-23:58:59.000', New: '2019/09/28-23:59:02.000'}}: timeSeries[1]")
Then I read in the custom metric limits that:
Rate at which data can be written to a single time series = one point per minute
I was expecting Google Cloud custom metrics to handle the concurrency issues for me.
Given these limits, the only option for implementing real-time monitoring seems to be another application that collects the information from all machines and writes it to the custom metric. That sounds like too much work for such a common use case.
What am I missing?
Now that you add the machine name as a label on the metric, you get one time series per machine (see the sketch below).
To SUM these metrics, go to Stackdriver > Metrics Explorer, group your metrics by project-id or by a label, for example, and then SUM the metrics.
https://cloud.google.com/monitoring/charts/metrics-selector#alignment
You can save the chart in a custom dashboard.
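A minimal sketch of the per-machine write using the google-cloud-monitoring Python client. The metric type comes from the error message above; the machine label (and a metric descriptor that declares it), the project id, and the use of the global resource type are assumptions for illustration:

```python
import socket
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project-id"  # hypothetical project id

# One time series per machine: the "machine" label makes each VM's writes
# land in a distinct series, so the one-point-per-minute limit applies
# per machine rather than to a single shared series.
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/web/2xx"
series.metric.labels["machine"] = socket.gethostname()
series.resource.type = "global"
series.resource.labels["project_id"] = "my-project-id"

now = time.time()
seconds = int(now)
nanos = int((now - seconds) * 10**9)
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": seconds, "nanos": nanos}}
)
point = monitoring_v3.Point(
    {"interval": interval, "value": {"int64_value": 42}}  # requests served this minute
)
series.points = [point]

client.create_time_series(name=project_name, time_series=[series])
```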

How many data points (SampleCount) does basic monitoring give per period?

I need help in making sense of how many data points (SampleCount) I get in 5-minute intervals in basic monitoring.
I have basic monitoring for an EC2 instance, which means a new data point is gathered every 5 minutes.
With the GetMetricData API (MetricDataQueries), I can get data points for the metric.
I queried the SampleCount of the data points every 5 minutes over a 10-minute period.
Data shows (10-min period):
0 min - 5 sample count
5 min - 5 sample count
I am confused now as to what this actually means. Since basic monitoring gathers data every 5 minutes, I would've expected 1 data point per 5-minute interval. So my expectation:
0 min - 1 sample count
5 min - 1 sample count
Thank you for your help!
Default monitoring metrics are collected every 5 minutes, but custom metrics can be collected every minute or finer. See the FAQ.
A custom metric can be one of the following:
Standard resolution, with data having one-minute granularity
High resolution, with data at a granularity of one second
By default, metrics are stored at 1-minute resolution in CloudWatch. You can define a metric as high-resolution by setting the StorageResolution parameter to 1 in the PutMetricData API request. If you do not set the optional StorageResolution parameter, then CloudWatch will default to storing the metrics at 1-minute resolution.
When you publish a high-resolution metric, CloudWatch stores it with a resolution of 1 second, and you can read and retrieve it with a period of 1 second, 5 seconds, 10 seconds, 30 seconds, or any multiple of 60 seconds.
Custom metrics follow the same retention schedule as standard metrics.
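For example, a minimal Boto3 sketch of publishing a high-resolution datapoint via StorageResolution (the namespace and metric name are hypothetical):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="MyApp",                   # hypothetical namespace
    MetricData=[{
        "MetricName": "RequestLatency",  # hypothetical metric name
        "Value": 0.42,
        "Unit": "Seconds",
        # 1 = high resolution (1-second granularity);
        # omit, or set to 60, for standard 1-minute resolution.
        "StorageResolution": 1,
    }],
)
```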
One difference between detailed monitoring and basic monitoring is the frequency at which the data is published to CloudWatch: every 5 minutes for basic monitoring and every 1 minute for detailed monitoring.
The data collected is the same, and the 5-minute datapoint published with basic monitoring is an aggregation of the 1-minute datapoints. That's why the sample count is 5: each datapoint aggregates five 1-minute samples.
For example, before detailed monitoring is enabled there is no difference between graphing a metric at 1-minute or 5-minute resolution; after enabling it, graphing at 1-minute resolution gives you more detail.
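To see this for yourself, here is a small Boto3 sketch (the instance id is a placeholder) querying SampleCount in 5-minute bins:

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# With basic monitoring, each 5-minute bin aggregates five 1-minute
# samples, so SampleCount comes back as 5.0 per datapoint.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(minutes=10),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["SampleCount"],
)
for dp in sorted(stats["Datapoints"], key=lambda d: d["Timestamp"]):
    print(dp["Timestamp"], dp["SampleCount"])
```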

CloudWatch Custom Dashboard - Period Setting

I'm trying to set up a custom dashboard in CloudWatch and don't understand how "Period" affects how my data is displayed.
Couldn't find any coverage of this variable in the AWS documentation, so any guidance would be appreciated!
Period is the width of the time range of each datapoint on a graph and it's used to define the granularity at which you want to view your data.
For example, if you're graphing the total number of visits to your site during a day, you could set the period to 1 hour, which would plot 24 datapoints showing how many visitors you had in each hour of that day. If you set the period to 1 minute, the graph will display 1440 datapoints showing how many visitors you had in each minute of that day.
See the CloudWatch docs for more details:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#CloudWatchPeriods
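To make this concrete, a small Boto3 sketch (the namespace and metric name are stand-ins) fetching the same day of data at two different periods:

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")


def visits_per_bucket(period_seconds):
    """Fetch one day of data, bucketed into datapoints of the given period."""
    return cloudwatch.get_metric_statistics(
        Namespace="MyApp",            # hypothetical namespace
        MetricName="SiteVisits",      # hypothetical metric name
        StartTime=datetime.utcnow() - timedelta(days=1),
        EndTime=datetime.utcnow(),
        Period=period_seconds,
        Statistics=["Sum"],
    )["Datapoints"]


print(len(visits_per_bucket(3600)))  # up to 24 datapoints (one per hour)
print(len(visits_per_bucket(60)))    # up to 1440 datapoints (one per minute)
```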
Here is a similar question that might be useful:
API Gateway Cloudwatch advanced logging