need understanding on guest-metrics in google cloud - google-cloud-platform

I am collecting metrics from Monitoring in Google Cloud through the REST API. In the API documentation at https://cloud.google.com/monitoring/api/metrics_gcp I see a lot of metrics beginning with guest, like
guest/cpu/usage_time
guest/disk/bytes_used
guest/disk/io_time
I also see the same kind of metrics beginning with instance, like
instance/cpu/usage_time
instance/disk/max_read_bytes_count
I have searched the documentation, but I don't have a clear idea of the difference between guest and instance metrics. Which metrics are preferred? Can anyone give a suggestion? Thanks

The guest/... metrics are used to monitor the system health of Container-Optimized OS (COS) instances.
The instance/... metrics, on the other hand, target regular GCE VM instances rather than the COS instance type.
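The asker already uses the REST API, so one practical way to see the difference is to request the same measurement under both prefixes and compare which VMs return data. The minimal sketch below only builds the timeSeries.list request URLs. It assumes that the full metric types carry the compute.googleapis.com/ prefix shown on the metrics list page. "my-project" and the timestamps are placeholders, and sending the request (with an OAuth token) is left out.

```python
from urllib.parse import urlencode

# Cloud Monitoring v3 REST endpoint for the timeSeries.list method.
LIST_URL = "https://monitoring.googleapis.com/v3/projects/{project}/timeSeries"

# The same measurement exposed under the two prefixes from the question.
GUEST_CPU = "compute.googleapis.com/guest/cpu/usage_time"
INSTANCE_CPU = "compute.googleapis.com/instance/cpu/usage_time"


def time_series_url(project: str, metric_type: str, start: str, end: str) -> str:
    """Build a timeSeries.list URL for one metric type over [start, end].

    start and end are RFC 3339 timestamps such as "2024-01-01T00:00:00Z".
    """
    params = {
        "filter": f'metric.type = "{metric_type}"',
        "interval.startTime": start,
        "interval.endTime": end,
    }
    return LIST_URL.format(project=project) + "?" + urlencode(params)


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    for metric in (GUEST_CPU, INSTANCE_CPU):
        print(time_series_url("my-project", metric,
                              "2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z"))
```

Comparing which instances appear in each response is a quick way to check which prefix actually covers your VMs.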

Related

GCP Monitoring can't get metrics from asia-southeast1-b

I have several GCE instances located in two zones: asia-southeast1-b and us-east4-c. All instances already have the Stackdriver agent installed. In Metrics Explorer, I can't find asia-southeast1-b in the CPU load metric.
But CPU usage is OK.
What's wrong with this?
Can you execute this command inside the VMs deployed in asia-southeast1-b:
grep collectd /var/log/{syslog,messages} | tail
This will show if there is any error with the agent.
To my understanding, this metric (CPU load) is collected by the Stackdriver agent and then sent to Monitoring.
Let’s see if we can understand what is happening:
Is there a problem with Stackdriver Agent gathering that metric?
Or is there a problem in Monitoring API while ingesting it?
Let me ask you some questions:
Are you using different operating systems on the instances in asia-southeast1-b compared to the ones running in us-east4-c?
Which version of the Stackdriver agent are you running?
This link explains how to determine which version you have installed.[1]
Did you make any changes in the configuration of the Stackdriver agent? The file is located in /etc/stackdriver/collectd.conf
Best regards,
[1] https://cloud.google.com/monitoring/agent/install-agent#agent-version
I fixed this error by granting the Monitoring Metric Writer role to the service account.
https://stackoverflow.com/a/45068262/380774
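The fix above (granting Monitoring Metric Writer) is consistent with the symptom. CPU load comes from the agent, which has to write its points to the Monitoring API itself. CPU usage, by contrast, is likely reported by Compute Engine without the agent. A minimal sketch of filters for checking both metrics per zone follows. It assumes the agent's load metric is agent.googleapis.com/cpu/load_1m and the hypervisor metric is compute.googleapis.com/instance/cpu/utilization. The strings can be used in Metrics Explorer or in a timeSeries.list request.

```python
# ASSUMED metric types; confirm them in the metrics list for your project.
# Written by the Stackdriver (collectd-based) agent from inside the VM.
AGENT_LOAD = "agent.googleapis.com/cpu/load_1m"
# Reported by Compute Engine itself; no agent or write permission needed.
HYPERVISOR_CPU = "compute.googleapis.com/instance/cpu/utilization"

ZONES = ("asia-southeast1-b", "us-east4-c")


def zone_filter(metric_type: str, zone: str) -> str:
    """Monitoring filter for one metric on GCE instances in a single zone."""
    return (
        f'metric.type = "{metric_type}" '
        f'AND resource.type = "gce_instance" '
        f'AND resource.labels.zone = "{zone}"'
    )


if __name__ == "__main__":
    # If only the agent metric is missing for a zone, suspect the agent
    # (or its service account's permission to write metrics).
    for zone in ZONES:
        for metric in (AGENT_LOAD, HYPERVISOR_CPU):
            print(zone_filter(metric, zone))
```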

Unable to understand GCP bill for Stackdriver Monitoring usage

We have deployed kube-state-metrics (following the steps in section 4.4.1, "Install monitoring components", of this article) on one of our Kubernetes clusters on GCP. This created 3 new deployments on our cluster: node-exporter, prometheus-k8s and kube-state-metrics. After that, we were able to see all the metrics in Metrics Explorer with the prefix "external/prometheus/".
To check external metrics pricing, we referred to this link and calculated the expected price accordingly, but the bill we received was a shocking figure. GCP charged a large amount even though we haven't added a single metric to a dashboard or set up monitoring for anything. Judging from the ingested volume (around 1.38 GB/day), these monitoring tools seem to run some background job (reading certain metrics at specific times, or similar) that consumed this volume and produced this bill.
We would like to understand how these kube-state-metrics monitoring components work. Do they automatically collect metrics data and increase the ingested volume and the bill in this way, or is there a misconfiguration in the setup?
Any guidance on this would be really appreciated!
Thank you.
By default, kube-state-metrics exposes several metrics for events across your cluster.
If your cluster has a number of frequently updating resources, a lot of data may be ingested into these metrics, which incurs high costs.
You need to configure which metrics you'd like to expose, and consult the documentation for your Kubernetes environment, to avoid unexpectedly high costs.
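Ingestion is billed for every point written, whether or not anything is charted. Prometheus scrapes its targets on a fixed interval, and each scrape produces one point per exposed series, so volume grows with series count and scrape frequency. The back-of-the-envelope sketch below assumes that each point counts as 8 bytes toward billable volume; verify this against the current Cloud Monitoring pricing page. The series count and interval are illustrative and are not taken from the question.

```python
SECONDS_PER_DAY = 86_400
MIB = 2**20

# ASSUMPTION: billable size of one scalar point; check current pricing docs.
BYTES_PER_POINT = 8


def points_per_day(active_series: int, scrape_interval_s: float) -> float:
    """Points ingested per day when every series is scraped each interval."""
    return active_series * SECONDS_PER_DAY / scrape_interval_s


def mib_per_month(active_series: int, scrape_interval_s: float,
                  bytes_per_point: int = BYTES_PER_POINT, days: int = 30) -> float:
    """Approximate billable ingestion per month, in MiB."""
    total_bytes = points_per_day(active_series, scrape_interval_s) * bytes_per_point * days
    return total_bytes / MIB


def series_for_daily_bytes(daily_bytes: float, scrape_interval_s: float,
                           bytes_per_point: int = BYTES_PER_POINT) -> float:
    """Inverse estimate: how many series produce the given daily volume."""
    return daily_bytes / bytes_per_point / (SECONDS_PER_DAY / scrape_interval_s)


if __name__ == "__main__":
    print(points_per_day(1_000, 30))           # 2,880,000 points/day
    print(mib_per_month(1_000, 30))            # ~659 MiB/month
    print(series_for_daily_bytes(1.38e9, 30))  # ~59,900 series
```

Under these assumptions, the 1.38 GB/day from the question corresponds to roughly 60,000 series scraped every 30 s. Cutting the number of exposed series or lengthening the scrape interval lowers the volume proportionally.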

get alert when instance is not active in GCE

For Compute Engine (GCE), I use Stackdriver Monitoring for monitoring and alerting.
Most general metrics, such as CPU, disk I/O and memory, are available, and I can set alerts on those metrics or on whether a process is alive (by process name).
However, I cannot find any metric related to the status of the GCE instance itself.
My use case is simple: I'd like to know whether the instance is down or not.
Any suggestion appreciated.
Thanks.
I think the instance status is not a monitoring metric; only instance/uptime is available
(and I have no clue what it returns once the instance is terminated; possibly worth a try).
However, you can check servers with Uptime Checks and have them report an incident,
and you can get the instance status with gcloud compute instances describe instance01.
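The status that gcloud compute instances describe prints comes from the status field of the Compute Engine instance resource. Its values include RUNNING, STOPPING and TERMINATED, so a small poller could alert on anything other than RUNNING. The sketch below covers only that decision. It assumes the JSON body has already been fetched from the instances.get REST URL with valid credentials, and the project, zone and instance names are placeholders.

```python
import json

# Compute Engine v1 REST URL for instances.get.
INSTANCE_URL = ("https://compute.googleapis.com/compute/v1/"
                "projects/{project}/zones/{zone}/instances/{name}")


def instance_url(project: str, zone: str, name: str) -> str:
    """URL to GET (with an OAuth bearer token) for one instance resource."""
    return INSTANCE_URL.format(project=project, zone=zone, name=name)


def is_down(instance_body: str) -> bool:
    """Treat every status other than RUNNING (e.g. TERMINATED) as down."""
    status = json.loads(instance_body).get("status", "UNKNOWN")
    return status != "RUNNING"


if __name__ == "__main__":
    print(instance_url("my-project", "us-central1-a", "instance01"))
    print(is_down('{"name": "instance01", "status": "TERMINATED"}'))  # True
```

Treating transitional states such as STOPPING as "down" is a deliberate choice here. It may fire during planned restarts, so you may want to require several consecutive failures before alerting.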

How to extract an instance uptime based on incidents?

On Stackdriver, creating an Uptime Check gives you access to the Uptime Dashboard, which shows the uptime percentage of your service.
My problem is that uptime checks are restricted to HTTP/TCP checks. I have other services running, and those services report their health in different ways (for example, by a specific process running). I already have incident policies set up for these services, so if a service is not running I get notified.
Now I want to be able to look back and see how long the service was down during the last hour. Is there a way to do that?
There's no way to programmatically retrieve alerts at the moment, unfortunately. Many resource types expose uptime as a metric, though (e.g., instance/uptime on GCE instances) - could you pull those and do the math on them? Without knowing what resource types you're using, it's hard to give specific suggestions.
Aaron Sher, Stackdriver engineer
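"Doing the math" on instance/uptime could look like the minimal sketch below. It makes three assumptions, all of which should be confirmed against the metric's description:

* Each point is a per-interval delta, meaning the seconds the VM was running during that sampling interval (typically 60 s).
* The points for the last hour have already been fetched.
* A missing point counts as downtime.

```python
from typing import Iterable, Tuple

# (interval end time in epoch seconds, seconds up during that interval)
Point = Tuple[float, float]


def downtime_seconds(points: Iterable[Point], window_start: float,
                     window_end: float) -> float:
    """Estimate downtime within (window_start, window_end].

    Sums the uptime deltas whose interval ends inside the window and treats
    whatever is left of the window (including gaps with no points) as down.
    """
    window = window_end - window_start
    up = sum(value for end, value in points if window_start < end <= window_end)
    return window - min(up, window)


def uptime_percent(points: Iterable[Point], window_start: float,
                   window_end: float) -> float:
    """Share of the window the resource was up, as a percentage."""
    window = window_end - window_start
    return 100.0 * (1 - downtime_seconds(points, window_start, window_end) / window)


if __name__ == "__main__":
    # One hour of 60 s samples: 55 minutes up, 5 minutes down.
    samples = [(60.0 * (i + 1), 60.0 if i < 55 else 0.0) for i in range(60)]
    print(downtime_seconds(samples, 0.0, 3600.0))  # 300.0
    print(uptime_percent(samples, 0.0, 3600.0))
```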

See load of separate instances on Google Cloud Spanner

Is there a way to see CPU utilisation per configured instance?
Currently I see only one CPU utilisation figure, regardless of whether 1 or 3 instances of Google Cloud Spanner are configured.
It would help us see whether our data structure balances load evenly.
Thanks,
Christian
Today you can see the CPU usage of each instance by navigating to that instance's page in the console. Cloud Spanner doesn't separate the usage by node though, so you do only get a single figure for, say, a 3-node instance. Generally, Spanner will take care of balancing data across nodes to achieve even distribution.
Instance- and database-level utilization is also available through Google Stackdriver. Look for "spanner_instance" under the Metrics Explorer.
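To get the same data programmatically, a timeSeries.list request with the filter in SPANNER_CPU_FILTER returns one or more series per instance. Each series is tagged with the spanner_instance resource's instance_id label. The sketch below averages each series from an already-fetched JSON response. It assumes spanner.googleapis.com/instance/cpu/utilization is the relevant metric, and the sample values are invented. As the answer notes, this yields per-instance figures, not per-node ones.

```python
import json
from statistics import mean

# Filter for Cloud Spanner CPU utilization (values are fractions, 0.0 to 1.0).
SPANNER_CPU_FILTER = (
    'metric.type = "spanner.googleapis.com/instance/cpu/utilization" '
    'AND resource.type = "spanner_instance"'
)


def mean_cpu_by_series(response_body: str) -> dict:
    """Map (instance_id, metric labels) -> mean CPU utilization of that series.

    Metric labels are kept in the key because a metric may split one
    instance into several series.
    """
    result = {}
    for series in json.loads(response_body).get("timeSeries", []):
        instance = series["resource"]["labels"].get("instance_id", "?")
        labels = tuple(sorted(series.get("metric", {}).get("labels", {}).items()))
        values = [p["value"]["doubleValue"] for p in series.get("points", [])]
        if values:
            result[(instance, labels)] = mean(values)
    return result


if __name__ == "__main__":
    sample = json.dumps({"timeSeries": [
        {"resource": {"type": "spanner_instance", "labels": {"instance_id": "prod"}},
         "points": [{"value": {"doubleValue": 0.4}}, {"value": {"doubleValue": 0.6}}]},
        {"resource": {"type": "spanner_instance", "labels": {"instance_id": "staging"}},
         "points": [{"value": {"doubleValue": 0.1}}]},
    ]})
    print(mean_cpu_by_series(sample))
```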