Google Cloud Monitoring - Dashboard with CPU, memory and disk usage with GCE filter - google-cloud-platform

I'm trying to set up a Google Cloud Monitoring dashboard for my GCE instances, but I'm experiencing some difficulties when trying to filter.
I have several GCE instances; some are not running and are kept as backups, but they are still displayed in Cloud Monitoring.
I would like to monitor three metrics (for now): CPU, memory, and disk usage.
CPU wasn't a problem, as I could just filter by GCE instance name.
But when I try to do the same for memory and disk usage, I don't have the option to filter as I did for CPU. I tried several different approaches, such as filtering by "metadata labels:name", "label", "zone", etc.; all result in "no data available for selected timeframe" (without the filter, data is displayed). I feel like I'm missing something trivial.
What am I doing wrong? How can I filter by instance name? Do I need to activate some logger on Google Cloud? Thank you very much in advance!

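Memory and disk usage are generally measured from inside the VM, so they need an agent running on it. Use the Cloud Monitoring agent to gather system and application metrics (disk, CPU, network and process) from VM instances and send them to Monitoring.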
Install the Monitoring Agent
Use the Cloud Logging agent to gather logging metrics from VM instances and send them to Cloud Monitoring.
Install the Logging Agent
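Once the agent is reporting, filtering by instance name can be expressed as a Monitoring filter string. The snippet below is only a minimal sketch: it assumes the agent writes `agent.googleapis.com/memory/percent_used` and `agent.googleapis.com/disk/percent_used`, and that the VM name is exposed as `metadata.system_labels.name`. The helper name `build_instance_filter` and the instance name `my-backup-vm` are hypothetical.

```python
# Metric types assumed to be written by the Monitoring agent.
AGENT_METRICS = {
    "memory": "agent.googleapis.com/memory/percent_used",
    "disk": "agent.googleapis.com/disk/percent_used",
}


def build_instance_filter(metric_key: str, instance_name: str) -> str:
    """Build a Monitoring filter selecting one agent metric for one VM name.

    Assumption: the VM name is available as metadata.system_labels.name.
    """
    metric_type = AGENT_METRICS[metric_key]
    return (
        f'metric.type = "{metric_type}" '
        'AND resource.type = "gce_instance" '
        f'AND metadata.system_labels.name = "{instance_name}"'
    )


# Hypothetical instance name, for illustration only.
print(build_instance_filter("memory", "my-backup-vm"))
```

The resulting string can be pasted into the filter field of a chart or passed to the API; if no data appears, the agent may not be installed or may not be able to write metrics.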

Related

Some charts stopped working in instance group monitoring after installing Ops Agent in Google Cloud

I have installed the new Ops Agent in a fleet of servers that are part of a managed instance group.
Since I did the update, some graphs stopped working in the "Monitoring" tab of the instance group.
Here's the screenshot:
As you can see at 11:35am the CPU Utilization and disk IO graphs went to zero.
If I check each server "Observability" tab, all graphs are working and showing information:
Here's the CPU
And here's the disk
Is there something else I need to configure in the instance group to get the previous charts working? Is this a bug in the Google Cloud Interface?

GCP Monitoring can't get metrics from asia-southeast1-b

I have several GCE instances located in two zones: asia-southeast1-b and us-east4-c. All instances already have the Stackdriver agent installed. In Metrics Explorer, I can't find asia-southeast1-b in the CPU load metric:
But CPU usage is OK:
What's wrong with this?
Can you execute this command inside the VMs deployed in asia-southeast1-b:
grep collectd /var/log/{syslog,messages} | tail
This will show if there is any error with the agent.
To my understanding, this metric (CPU load) is collected by the Stackdriver agent and then sent to Monitoring.
Let’s see if we can understand what is happening:
Is there a problem with Stackdriver Agent gathering that metric?
Or is there a problem in Monitoring API while ingesting it?
Let me ask you some questions:
Are you using different operating systems on the instances in asia-southeast1-b compared to the ones running in us-east4-c?
Which version of the Stackdriver agent are you running?
This link shows how to determine which version you have installed. [1]
Did you make any changes to the configuration of the Stackdriver agent? The file is located at /etc/stackdriver/collectd.conf
Best regards,
[1] https://cloud.google.com/monitoring/agent/install-agent#agent-version
I fixed this error by granting the Monitoring Metric Writer role to the service account.
https://stackoverflow.com/a/45068262/380774
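As a rough illustration of that fix, the sketch below assembles the `gcloud projects add-iam-policy-binding` command that grants `roles/monitoring.metricWriter` to a service account. The project ID and service account email are hypothetical placeholders, and the helper `metric_writer_binding_command` is introduced only for this example; it builds the command without running it.

```python
import shlex
from typing import List


def metric_writer_binding_command(project_id: str, service_account_email: str) -> List[str]:
    """Return the gcloud argument list that grants Monitoring Metric Writer."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        "--member", f"serviceAccount:{service_account_email}",
        "--role", "roles/monitoring.metricWriter",
    ]


# Hypothetical project and service account, for illustration only.
cmd = metric_writer_binding_command(
    "my-project", "my-vm-sa@my-project.iam.gserviceaccount.com"
)
# Print the command so it can be reviewed before running it in a shell.
print(shlex.join(cmd))
```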

Is it possible to measure HTTP response latencies without changing my server code?

I have a small number of HTTP servers on GCP VMs, with a mixture of different server languages and Linux-based OSes.
Questions
A. Is it possible to use the Stackdriver monitoring service to set alerts at specific percentiles for HTTP response latencies?
B. Can I do this without editing the code of each server process?
C. Will installing the agent into the VM report HTTP latencies?
For example, if the 95th percentile goes over 100ms for a certain time period I want to know.
I know I can do this for CPU utilisation and other hypervisor provided stats using:
https://console.cloud.google.com/monitoring/alerting
Thanks.
Request latencies are extracted by Cloud Load Balancing. As long as you are using a cloud load balancer, you don't need to install the Monitoring agent to create alerts based on 95th-percentile metrics.
The Monitoring agent captures latencies for some preconfigured systems such as Riak, Cassandra, and some others. Here's a full list of systems and metrics the Monitoring agent supports by default: https://cloud.google.com/monitoring/api/metrics_agent
But if you want anything custom, i.e. you want to measure request latencies from the VM, you would need to capture response times yourself and configure the Logging agent to create a custom metric that you can use to create alerts. As long as you capture them as distribution metrics, you should be able to visualise different percentiles (e.g. 25th, 50th, 75th, 80th, 90th, 95th, and 99th) and create alerts based on them.
See: https://cloud.google.com/logging/docs/logs-based-metrics/distribution-metrics
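To make the idea of reading a percentile from a distribution concrete, here is a minimal sketch. It assumes latencies have already been counted into histogram buckets, and it reports the upper bound of the bucket that contains the requested percentile. The bucket bounds and counts are made-up sample data, and Monitoring's own percentile estimate may interpolate differently.

```python
from typing import Sequence


def percentile_upper_bound(
    bounds_ms: Sequence[float], counts: Sequence[int], pct: float
) -> float:
    """Return the upper bound of the bucket containing the pct-th percentile.

    bounds_ms[i] is the upper bound of bucket i; counts[i] is its sample count.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples recorded")
    cumulative = 0
    for bound, count in zip(bounds_ms, counts):
        cumulative += count
        # The bucket where the cumulative share first reaches pct percent.
        if cumulative * 100 >= pct * total:
            return bound
    return bounds_ms[-1]


# Made-up latency histogram: upper bounds in milliseconds and sample counts.
bounds = [10, 25, 50, 100, 250, 500]
counts = [40, 30, 15, 9, 4, 2]

p95 = percentile_upper_bound(bounds, counts, 95)
# The alert condition from the question: 95th percentile above 100 ms.
print(p95, p95 > 100)
```

Coarser buckets make the estimate coarser, so bucket boundaries around the alert threshold (100 ms in the question) matter most.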
A. Is it possible to use the Stackdriver monitoring service to set alerts at specific percentiles for HTTP response latencies?
If you simply want to consider network traffic, yes, it is possible. If you are using a load balancer, it's also possible to set alerts on that.
What you want to do should be pretty straightforward from the interface; you can also find more information in the documentation.
If you want to use more advanced metrics on top of Tomcat, Apache, etc., you should check the list of metrics provided by the Stackdriver Monitoring agent.
B. Can I do this without editing the code of each server process?
Yes, there is no need to update any program. Stackdriver Monitoring works transparently and can fetch basic metrics from GCP VMs without the Monitoring agent, including network traffic and CPU utilization.
C. Will installing the agent into the VM report HTTP latencies?
No, the agent shouldn't cause any HTTP latency.

How to distribute surplus user traffic from a Google Compute Engine VM to Google App Engine? Running Django with Apache

I am running Django on a Google VM instance using Apache and mod_wsgi. However, I am unsure how many concurrent requests my app will receive from users, and I would like to know whether I can automatically transfer the surplus load from the VM to App Engine to prevent the server from crashing.
I am unable to find any solution except running a Kubernetes cluster or Docker containers to manage the load effectively, but I need to be free of this hassle and send the excess load to GAE.
If you want to analyze the traffic, latency and load of your resources and applications, I would recommend starting with Stackdriver Trace.
As per documentation, Stackdriver Trace is a distributed tracing system that collects latency data from your applications and displays it in the Google Cloud Platform Console. You can track how requests propagate through your application and receive detailed near real-time performance insights. Stackdriver Trace automatically analyzes all of your application's traces to generate in-depth latency reports to surface performance degradations, and can capture traces from all of your VMs, containers, or Google App Engine projects.
Once you have determined the user traffic, or have a better idea of it, you can try using "Instance Groups".
GCE offers two kinds of VM instance groups:
Managed instance groups (MIGs) allow you to operate applications on multiple identical VMs. You can make your workloads scalable and highly available by taking advantage of automated MIG services, including: autoscaling, autohealing, regional (multi-zone) deployment, and auto-updating.
Unmanaged instance groups allow you to load balance across a fleet of VMs that you manage yourself.

Determine bandwidth in GCP virtual machine

I created a machine in Google Cloud Platform a month ago.
How can I find out the network bandwidth and number of hits for a VM in Google Compute Engine?
If you click on an individual VM, there's a Monitoring tab that will show you network data for up to a 30-day window.
If you want to view multiple VMs at once, you have two choices:
use Stackdriver via a monitoring application like Grafana or New Relic
export compute engine daily usage statistics (Compute Engine -> Settings) and parse them yourself or with another application (like Cloud Health)
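Whichever route you take, the network metrics are usually reported as byte counts per sampling interval, so they need converting to get a bandwidth figure. The sketch below assumes per-minute byte counts such as those from `compute.googleapis.com/instance/network/sent_bytes_count`; the sample values and the helper name `bandwidth_mbit_per_s` are made up for illustration.

```python
def bandwidth_mbit_per_s(byte_count: int, interval_seconds: float) -> float:
    """Convert bytes transferred during an interval to megabits per second."""
    return byte_count * 8 / interval_seconds / 1_000_000


# Hypothetical bytes sent in three consecutive 60-second intervals.
samples = [75_000_000, 150_000_000, 30_000_000]
rates = [bandwidth_mbit_per_s(b, 60) for b in samples]

print("per-interval Mbit/s:", rates)
print("peak Mbit/s:", max(rates))
```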