One or more points were written more frequently than the maximum sampling period configured for the metric - google-cloud-platform

Background
I have a website deployed in multiple machines. I want to create a Google Custom Metric that specifies the throughput of it - how many calls were served.
The idea is to collect information about served requests on each machine and, once per minute, write it to a custom metric. So on any single machine this code runs at most once per minute, but it runs on every machine in my cluster.
Running the code locally is working perfectly.
The problem
I'm getting this error: Grpc.Core.RpcException:
Status(StatusCode=InvalidArgument, Detail="One or more TimeSeries
could not be written: One or more points were written more frequently
than the maximum sampling period configured for the metric. {Metric:
custom.googleapis.com/web/2xx, Timestamps: {Youngest Existing:
'2019/09/28-23:58:59.000', New: '2019/09/28-23:59:02.000'}}:
timeSeries[0]; One or more points were written more frequently than
the maximum sampling period configured for the metric. {Metric:
custom.googleapis.com/web/4xx, Timestamps: {Youngest Existing:
'2019/09/28-23:58:59.000', New: '2019/09/28-23:59:02.000'}}:
timeSeries[1]")
Then, I was reading in the custom metric limitations that:
Rate at which data can be written to a single time series = one point per minute
I assumed that Google Cloud custom metrics would handle the concurrency issues for me.
Given this limitation, the only way I can see to implement real-time monitoring is to run another application that collects the information from all machines and writes it to the custom metric. That sounds like too much work for such a common use case.
What am I missing?

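Add the machine name to the metric as a label. Each machine then writes to its own time series, so the one-point-per-minute limit applies per machine, and you get per-machine metrics.
To sum these metrics, go to Stackdriver > Metrics Explorer, group the metric by project ID or by a label, for example, and choose SUM as the aggregation.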
https://cloud.google.com/monitoring/charts/metrics-selector#alignment
You can save the chart in a custom dashboard.
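As a rough illustration of the labelling idea, here is a minimal Python sketch that builds the JSON entries for a `projects.timeSeries.create` request body, one per machine. The label name `machine`, the machine names, the project ID and the `global` resource type are assumptions chosen for the example, not something taken from the question.

```python
from datetime import datetime, timezone


def build_time_series(metric_type, machine, project_id, value, end_time):
    """Build one entry for the `timeSeries` list of a timeSeries.create body."""
    return {
        "metric": {
            "type": metric_type,
            # Hypothetical label: a distinct value per machine means each
            # machine writes to its own time series.
            "labels": {"machine": machine},
        },
        # Assumption: the generic "global" monitored resource is used.
        "resource": {"type": "global", "labels": {"project_id": project_id}},
        "points": [
            {
                "interval": {"endTime": end_time.strftime("%Y-%m-%dT%H:%M:%SZ")},
                # int64 values are encoded as strings in the JSON API.
                "value": {"int64Value": str(value)},
            }
        ],
    }


now = datetime(2019, 9, 28, 23, 59, 2, tzinfo=timezone.utc)
series_a = build_time_series("custom.googleapis.com/web/2xx", "web-1", "my-project", 42, now)
series_b = build_time_series("custom.googleapis.com/web/2xx", "web-2", "my-project", 17, now)

# Same metric type, different label values: two separate time series.
distinct = series_a["metric"] != series_b["metric"]
```

Because the two entries differ in their label values, points written a few seconds apart by different machines no longer land in the same time series.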

Related

Best practices to configure thresholds for alarms

I have been having some difficulty working out the ideal thresholds for a few of our CloudWatch alarms. I am looking at metrics for error rate, fault rate and failure rate, with an evaluation period of around 15 minutes in mind. My metrics are currently recorded at one-minute granularity. I have the following ideas:
To look at the avg of minute level data over a few days, and set it slightly higher than that.
To try different thresholds (t1,t2 ..) and for a given day, see how many times the datapoints are crossing it in 15 min bins.
Not sure if this is the right way of going about it, do share if there is a better way of going about the problem.
PS 1: I know that thresholds should be based on Service Level Agreements(SLA), but let's say we do not have an SLA yet.
PS 2: Also, can I import data from CloudWatch into Excel for easier manipulation? I am currently looking at running a few queries in Logs Insights to calculate error rates.
In your case, maybe you could also try Amazon CloudWatch Anomaly Detection instead of static thresholds:
You can create an alarm based on CloudWatch anomaly detection, which mines past metric data and creates a model of expected values. The expected values take into account the typical hourly, daily, and weekly patterns in the metric.
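For the static-threshold route, the second idea in the question (trying several thresholds and counting breaches per 15-minute bin) can be sketched in a few lines of Python. This is only an illustration: it assumes one datapoint per minute, treats a bin as breaching when its mean exceeds the threshold, and uses made-up sample values.

```python
def breaches_per_threshold(values, thresholds, bin_size=15):
    """For each threshold, count the bins whose mean value exceeds it.

    `values` holds one datapoint per minute; `bin_size` is in minutes.
    """
    bins = [values[i:i + bin_size] for i in range(0, len(values), bin_size)]
    means = [sum(b) / len(b) for b in bins]
    return {t: sum(1 for m in means if m > t) for t in thresholds}


# Hypothetical error rates in percent, one value per minute for 45 minutes.
sample = [1.0] * 15 + [4.0] * 15 + [2.0] * 15
result = breaches_per_threshold(sample, thresholds=[1.5, 3.0, 5.0])
# result == {1.5: 2, 3.0: 1, 5.0: 0}
```

A threshold that fires in most bins on a normal day is too low. One that never fires over several days may be too high.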

AWS Cloudwatch dashboard custom time range

I want each graph on my CloudWatch dashboard to show a different time range, e.g. RAM over 2 weeks, disk over 4 weeks, etc. Every time I refresh, they revert to the automatic 3 hours. How can I make each graph keep its own time range?
I don't think this is possible within the same dashboard and you would need to create multiple dashboards to accommodate your requirement.
In CloudFormation, the timeframe is set in the DashboardBody property using start, which affects the whole dashboard and not the individual widgets.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/CloudWatch-Dashboard-Body-Structure.html
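To illustrate the multiple-dashboards workaround, here is a minimal Python sketch that builds one dashboard body per time range, with `start` set at the top level of each body. The widget contents are assumptions for the example: the CloudWatch agent metric names, the instance ID, the region and the relative `start` values `-P14D` and `-P28D`. Check them against the dashboard body reference linked above before use.

```python
import json


def metric_widget(title, metric, x=0, y=0):
    """Build a single metric widget (layout and stat values are illustrative)."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": 12,
        "height": 6,
        "properties": {
            "title": title,
            "metrics": [metric],
            "period": 300,
            "stat": "Average",
            "region": "us-east-1",  # assumption
        },
    }


def dashboard_body(start, widgets):
    """`start` sits at the top level, so it applies to every widget in the body."""
    return {"start": start, "widgets": widgets}


instance = "i-0123456789abcdef0"  # hypothetical instance ID
ram = dashboard_body(
    "-P14D",
    [metric_widget("RAM", ["CWAgent", "mem_used_percent", "InstanceId", instance])],
)
disk = dashboard_body(
    "-P28D",
    [metric_widget("Disk", ["CWAgent", "disk_used_percent", "InstanceId", instance])],
)

# Each string could be used as the DashboardBody of a separate dashboard.
ram_json = json.dumps(ram)
disk_json = json.dumps(disk)
```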

GCP BQ metric for query count not reflecting correct no

We recently had an outage caused by a 403 rateLimitExceeded error, and we are trying to set up an alert for it using a GCP metric. However, neither bigquery.googleapis.com/query/count nor bigquery.googleapis.com/job/num_in_flight shows the correct number of running queries. We believe we crossed the threshold of 100 concurrent queries several times over the past few days, but Metrics Explorer shows a maximum of only 5, and only on a few occasions. Do these metrics need any other configuration to show the right number, or should we use some other way to create an alert that fires when we cross 80% of the concurrent query limit?

How can I see request count (not rate) for a Google Cloud Run application?

I deployed a Google Cloud Run service running in a docker container. Out of the box, it looks like I get insight into some metrics on the Metrics tab of the service page such as Request count, Request latencies and more. Although it sounds like request count would answer my question, what I am really looking for is insight into adoption so that I can answer "How many visits to my application were there in the past week" or something like that. Is there a way to get insight like that out of the box?
Currently, the Request count metric reports responses/second, so I can see blips that look like "0.05/s", which can give me some insight but it's hard to aggregate.
I've tried using the Monitoring > Metrics explorer as well, but I'm not seeing any data for the metrics I select. I'm considering hooking into Google Analytics from within my application if that seems like the suggested solution. Thank you!
I've realized it's quite difficult to have Metrics Explorer give you a straight answer on "how many requests I received this month". However, it's possible:
Go to Metrics Explorer as you said, choose resource type "Cloud Run Revision" (cloud_run_revision) and you'll see "Request Count" (run.googleapis.com/request_count) metric:
Description: Number of requests reaching the revision. Excludes requests that are not reaching your container instances (e.g. unauthorized requests or when maximum number of instances is reached).
Resource type: cloud_run_revision
Unit: number Kind: Delta Value type: Int64
Then, choose Aggregator: None, and click Show Advanced Options. In the form, choose Aligner: sum (instead of the default "Rate"). You should now be able to see the total request count per minute:
Now if you change "Alignment Period" to "10 minutes", you'll see one data point for every 10m (sadly, there seems to be a bug that says X req/s, but that's more like X reqs/10m in this case):
If you collect enough data, you can change "Alignment Period" to "Custom" and set 30 days, then update your timeframe on the top to 1 year and see monthly request count.
This does not show sums of all Alignment Periods (I think that part is up to you to do manually, maybe possible via the API), but it lets you see requests per month. For example, here's a service I've been running for some months and I set alignment period to 7 days, viewing past 6 weeks, so I get 6 data points on weekly request count. Hope this helps.
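For the API route mentioned above, the same settings (sum aligner, custom alignment period) map onto query parameters of the Monitoring API's `projects.timeSeries.list` method. The following Python sketch only builds the request URL. The project ID and time range are placeholders, the added cross-series reducer (to combine revisions) is my own choice, and authentication and the HTTP call itself are left out.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def weekly_request_count_url(project_id, start_time, end_time,
                             period_seconds=7 * 24 * 3600):
    """Build a timeSeries.list URL that sums request_count per alignment period."""
    params = {
        "filter": 'metric.type="run.googleapis.com/request_count" '
                  'AND resource.type="cloud_run_revision"',
        "interval.startTime": start_time,
        "interval.endTime": end_time,
        # Equivalent of "Alignment Period: 7 days" and "Aligner: sum".
        "aggregation.alignmentPeriod": f"{period_seconds}s",
        "aggregation.perSeriesAligner": "ALIGN_SUM",
        # Assumption: also add up all revisions into a single series.
        "aggregation.crossSeriesReducer": "REDUCE_SUM",
    }
    base = f"https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries"
    return f"{base}?{urlencode(params)}"


url = weekly_request_count_url(
    "my-project", "2024-01-01T00:00:00Z", "2024-02-12T00:00:00Z"
)
# Decode the query string again to inspect what would be sent.
query = parse_qs(urlparse(url).query)
```

Each point in the response would then hold the request count for one week. Adding the points together gives the total for the whole interval.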

Count number of GCP log entries during a specified time

Is it possible to count number of occurrences of a specific log message over a specific period of time from GCP Stackdriver logging? To answer the question "How many times did this event occur during this time period." Basically I would like the integral of the curve in the chart below.
It doesn't have to be a moving window, this time it's more of a one-time-task. A count-aggregator or similar on the advanced log query would also work if that would be available.
The query looks like this:
(resource.type="container"
logName="projects/xyz-142842/logs/drs"
"Publish Message for updated entity"
) AND (timestamp>="2018-04-25T06:20:53Z" timestamp<="2018-04-26T06:20:53Z")
My log based metric for the graph above looks like this:
My Dashboard is setup like this:
I ended up building stacked bars.
With correct zoom level I can sum up the number of occurrences easy enough. It would have been a nice feature to get the count directly from a graph (the integral), but this works for now.
There are multiple ways to do so, the two that I saw actually working and that can apply to your situation are the following:
Making use of Logs-based Metrics. They can, for example, record the number of log entries containing particular error messages, or they can extract latency information reported in log entries.
Stackdriver Logging logs-based metrics can be one of two metric types: counter or distribution. [...] Counter metrics count the number of log entries matching an advanced logs filter. [...] Distribution metrics accumulate numeric data from log entries matching a filter.
I would advise you to go through the documentation to check that this feature completely covers your use case.
You can export your logs to BigQuery. Once they are there, you can use the usual tools such as SELECT and GROUP BY, along with everything else BigQuery offers.
Here you can find a very minimal step-by-step guide on how to export the logs, and one on Analyzing Audit Logs Using BigQuery, but I am sure you can find many more resources online.
The two products and approaches are really different. I would say that BigQuery is more flexible, but also more complex to configure and use properly. If you find a better third way, please update your question with that information.
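Whichever route is chosen, the underlying computation is a filtered count over a time window. As a minimal sketch of that computation, here is a Python example that counts entries already available locally as a list of dictionaries. The `timestamp` and `textPayload` field names follow the usual log entry JSON shape, but the sample entries are invented, and timestamps are assumed to have whole-second precision.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an RFC 3339 UTC timestamp such as 2018-04-25T06:20:53Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def count_matching(entries, needle, start, end):
    """Count entries whose text contains `needle` within [start, end]."""
    lo, hi = parse_ts(start), parse_ts(end)
    return sum(
        1
        for e in entries
        if needle in e.get("textPayload", "")
        and lo <= parse_ts(e["timestamp"]) <= hi
    )


# Hypothetical sample entries.
entries = [
    {"timestamp": "2018-04-25T07:00:00Z", "textPayload": "Publish Message for updated entity 1"},
    {"timestamp": "2018-04-25T09:30:00Z", "textPayload": "Something else"},
    {"timestamp": "2018-04-26T05:00:00Z", "textPayload": "Publish Message for updated entity 2"},
    {"timestamp": "2018-04-27T00:00:00Z", "textPayload": "Publish Message for updated entity 3"},
]

count = count_matching(
    entries,
    "Publish Message for updated entity",
    "2018-04-25T06:20:53Z",
    "2018-04-26T06:20:53Z",
)
# count == 2
```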
First, create a metric:
Go to Logs Explorer.
Type your query.
Go to Actions >> Create Metric.
Then, in the Monitoring dashboard:
Create a chart.
Select the resource and metric.
Go to "Advanced" and provide the following details:
Preprocessing step: Rate
Alignment function: count
Alignment period: 1
Alignment unit: minutes
Group by: log
Group by function: count
This gives you a bar chart showing the count of the desired events.
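The metric-creation step can also be done through the Logging API (`projects.metrics.create`) instead of the console. Here is a minimal Python sketch of the request body for a counter metric. The metric name `publish_message_count` is hypothetical, the filter is the one from the question, and the API call itself is omitted.

```python
import json


def counter_metric_body(name, log_filter, description=""):
    """Request body for a counter-type logs-based metric.

    With no value extractor configured, the metric counts matching entries.
    """
    return {"name": name, "description": description, "filter": log_filter}


body = counter_metric_body(
    "publish_message_count",  # hypothetical metric name
    'resource.type="container" '
    'logName="projects/xyz-142842/logs/drs" '
    '"Publish Message for updated entity"',
    description="Counts 'Publish Message for updated entity' log entries",
)

# A user-defined logs-based metric shows up in Monitoring under this type.
metric_type = f"logging.googleapis.com/user/{body['name']}"
payload = json.dumps(body)
```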
There is one more option.
You can read your custom metric using the Stackdriver Monitoring API ( https://cloud.google.com/monitoring/api/v3/ ) and process it in a script with whatever aggregation you need.
If you are working with Python, you may want to look at the Google Cloud Python library: https://github.com/GoogleCloudPlatform/google-cloud-python/tree/master/monitoring
It would be a very simple script, and you could stream the results of the calculation into a BigQuery table and use that in your dashboard.
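As a sketch of the aggregation part of such a script, the function below adds up all points in a response shaped like the JSON returned by `projects.timeSeries.list`. The sample response is invented and trimmed to the fields the function reads, and it assumes an integer-valued (INT64) counter metric.

```python
def total_count(response):
    """Sum every point of every time series in a timeSeries.list response."""
    total = 0
    for series in response.get("timeSeries", []):
        for point in series.get("points", []):
            # int64 values arrive as strings in the JSON representation.
            total += int(point["value"]["int64Value"])
    return total


# Hypothetical, trimmed response for a logs-based counter metric.
sample_response = {
    "timeSeries": [
        {"points": [{"value": {"int64Value": "3"}}, {"value": {"int64Value": "5"}}]},
        {"points": [{"value": {"int64Value": "2"}}]},
    ]
}

total = total_count(sample_response)
# total == 10
```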
With PacketAI, you can send logs in arbitrary formats, including logs from GCP. The logs dashboard then automatically parses them and groups them into patterns, as shown in this video: https://streamable.com/n50kr8
Counts and trends of the different log patterns are also displayed.
Disclaimer: I work for PacketAI