GKE Alerts using Alerting Policy - google-cloud-platform

We have a GKE cluster, in which we have enabled Cloud Monitoring.
Trying to Achieve:
Add alerts on GKE Node Up/Down
Alerts on CPU and Memory Utilisation
Alerts on Disk Utilisation
Issue:
We tried to set up alerts using pod/volume/utilization for disk and node/cpu/allocatable_utilization for CPU.
However, these metrics are not expressed as percentages.
We checked the GKE dashboard in Cloud Monitoring, and it shows all the necessary metrics visualised as percentages. Is there a way to set up alerts based on this dashboard?
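One possible approach, sketched below, is to alert on the same utilisation metrics with a fractional threshold. This assumes the metrics report a 0 to 1 fraction (which is how I understand the Cloud Monitoring metric list), so 0.8 would correspond to the 80% shown on the dashboard. The helper names, the 80% threshold, and the 5-minute duration are illustrative choices, not values from the question. The script only builds an alerting policy body as JSON; it does not call any Google API.

```python
import json


def utilization_condition(name, metric_type, resource_type, percent, minutes=5):
    """Build one threshold condition for a metric assumed to be a 0-1 fraction."""
    return {
        "displayName": name,
        "conditionThreshold": {
            "filter": f'metric.type="{metric_type}" AND resource.type="{resource_type}"',
            "comparison": "COMPARISON_GT",
            # Convert the percentage people think in to the fraction the metric uses.
            "thresholdValue": percent / 100.0,
            # The condition must hold for this long before the alert fires.
            "duration": f"{minutes * 60}s",
            "aggregations": [
                {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
            ],
        },
    }


def build_policy(display_name, conditions):
    """Wrap conditions in a policy that fires when any one of them is met."""
    return {
        "displayName": display_name,
        "combiner": "OR",
        "enabled": True,
        "conditions": conditions,
    }


policy = build_policy(
    "GKE utilisation above 80%",
    [
        utilization_condition(
            "Node CPU", "kubernetes.io/node/cpu/allocatable_utilization", "k8s_node", 80
        ),
        utilization_condition(
            "Node memory", "kubernetes.io/node/memory/allocatable_utilization", "k8s_node", 80
        ),
        utilization_condition(
            "Pod volume", "kubernetes.io/pod/volume/utilization", "k8s_pod", 80
        ),
    ],
)

if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
```

The printed JSON could be saved to a file and, as far as I know, passed to gcloud alpha monitoring policies create --policy-from-file, or recreated by hand in the console. Notification channels are left out of the sketch.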

Related

In GCP, I am able to view the GKE Monitoring dashboard. How do I create alerts for CPU and memory utilization for Kubernetes containers?

I have enabled the default GCP monitoring in my Google Kubernetes Engine cluster, so a GKE dashboard containing system metrics has been created. Now I need to enable alerts for Kubernetes container CPU and memory utilization based on the GKE dashboard. I tried to create my own alert, but it didn't match the metrics defined in the GKE dashboard.
Guide1 and Guide2 cover monitoring for Kubernetes Engine. They explain alerting and how to monitor your system. If you are already familiar with these, here is a list of the metrics for the new Kubernetes Engine monitoring compared with the previous metrics. The complete list of metrics, which is always useful, can be found here.
The Monitoring dashboard displays CPU and memory utilization over a time range:
CPU utilization: The CPU utilization of containers that can be attributed to a resource within the selected time span. For the metric used, check "For CPU Utilization" here.
Memory utilization: The memory utilization of containers that can be attributed to a resource within the selected time span. For the metric used, check "For Memory Utilization" here.
The command "kubectl top node" displays resource (CPU/Memory/Storage) usage at that moment, not over a time span.

Not getting node metrics for mysqld_exporter and postgres_exporter for RDS instances

I have set up a system for RDS monitoring using mysqld_exporter and postgres_exporter, and exposed their metrics to a Prometheus server, but I am not getting the important metrics I want to monitor, such as CPU utilization, available memory, IOPS rate, latency, etc. I am getting about 1800 metrics, but none of them appear to be these. I think I need node_exporter to get CPU utilization for RDS, but I don't know how to configure node_exporter for an RDS instance.
PS: I don't want to use AWS Cloudwatch metrics

AWS Resource Usage Data - CPU, Memory and Disk

I am trying to build an analytics dashboard using the metrics/KPIs below for all EC2 instances.
Total CPU vs CPUUtilized
Total RAM vs RAMUtilized
Total EBS Volume vs EBSUtilized.
For example, if I have launched an EC2 instance with 4 CPUs, 16 GiB RAM and a 50 GB SSD, I would like to see the above KPIs as a time series trend. I have no clue where to get this data from EC2. I tried the EC2 instance metrics through CloudWatch using the boto3 client, but did not get the above metrics. I would like to know:
Where can I find the data for the above metrics?
I need the above metrics data in S3 on a daily basis.
Similarly, is there a way to get similar metrics for AWS RDS and AWS EKS clusters?
Thanks!
The Amazon EC2 service collects information about the virtual machine (instance) and sends it to Amazon CloudWatch as metrics.
See: List the available CloudWatch metrics for your instances - Amazon Elastic Compute Cloud
Note that it only collects metrics that can be observed from the virtual machine itself -- CPU Utilization, network traffic and Amazon EBS traffic. The EC2 service cannot see what is happening 'inside' the instance, since it is the Operating System that controls memory and manages the contents of the disks.
If you wish to collect metrics from the Operating System, then you would need to install the CloudWatch agent. See: Collect metrics and logs from Amazon EC2 instances and on-premises servers with the CloudWatch agent - Amazon CloudWatch. This agent runs in the instance and sends metrics out to CloudWatch.
You can write code that calls the CloudWatch Metrics APIs to retrieve metrics. Note that the metrics returned are calculated over a time period (e.g. average CPU Utilization over a 5-minute period). It is not possible to retrieve the actual raw datapoints.
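As a rough illustration of calling the CloudWatch Metrics API, the sketch below builds a request for the EC2 CPUUtilization metric, fetches it with boto3's get_metric_statistics, and turns the result into CSV text that a daily job could upload to S3. The function names, the 24-hour window, and the choice of the Average and Maximum statistics are assumptions made for the example. boto3 is imported only inside the fetch function, so the rest runs without AWS access.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def build_cpu_query(instance_id, end_time, hours=24, period_seconds=300):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        # Each returned datapoint summarises this many seconds.
        "Period": period_seconds,
        "Statistics": ["Average", "Maximum"],
    }


def fetch_datapoints(params):
    """Call CloudWatch; needs boto3 installed and AWS credentials configured."""
    import boto3  # third-party, imported lazily on purpose

    response = boto3.client("cloudwatch").get_metric_statistics(**params)
    # Datapoints are not guaranteed to arrive in time order.
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])


def datapoints_to_csv(datapoints):
    """Serialise datapoints to CSV text, one row per period."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "average", "maximum"])
    for point in datapoints:
        writer.writerow(
            [point["Timestamp"].isoformat(), point["Average"], point["Maximum"]]
        )
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical instance ID; replace with a real one before running.
    query = build_cpu_query("i-0123456789abcdef0", datetime.now(timezone.utc))
    print(datapoints_to_csv(fetch_datapoints(query)))
```

The CSV text could then be written to a bucket with the S3 client's put_object call from a scheduled job. Memory and disk usage would only appear once the CloudWatch agent is publishing them, by default under the CWAgent namespace with metric names such as mem_used_percent and disk_used_percent.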
See also:
Monitoring Amazon RDS metrics with Amazon CloudWatch - Amazon Relational Database Service
Amazon EKS and Kubernetes Container Insights metrics - Amazon CloudWatch

AWS Elasticache - Redis Autoscaling

A Redis instance has been created in ElastiCache, and it will be used to store and retrieve data as usual.
Is there a maximum memory for this Redis instance, and how can it be checked?
What I need is this: for example, if the data size in Redis goes above 100 MB, it should be scaled automatically, without me having to scale it manually, create a new instance, and so on.
And when the data size shrinks (for example, from 300 MB to 50 MB due to less traffic), the instances should be scaled down so that no extra cost is incurred.
How can this be configured in AWS ElastiCache?
Unfortunately, there is no auto-scaling policy attached to ElastiCache out of the box. Amazon ElastiCache provides console, CLI, and API support for scaling your Redis (cluster mode disabled) replication group up.
One option you can try is to set a CloudWatch alarm based on node memory and then trigger a Lambda function that scales up or down based on the metric.
Create a CloudWatch alarm
Select the ElastiCache metrics
Select the node-level metrics
Select the free memory metric (FreeableMemory)
Send the alarm notification to an SNS topic
Subscribe a Lambda function to the topic
Scale up or down based on the metric
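The steps above could be sketched as a Lambda handler along the following lines. This is only an outline under several assumptions: the alarm names, the replication group ID, and the ladder of node types are hypothetical, and it assumes two separate alarms (one for low free memory, one for plenty of free memory) publishing to the same SNS topic. The ElastiCache calls used are describe_replication_groups and modify_replication_group from boto3.

```python
import json

# Hypothetical ladder of node types, smallest to largest.
NODE_TYPES = ["cache.t3.micro", "cache.t3.small", "cache.t3.medium"]
# Hypothetical alarm names; they must match the alarms you create.
SCALE_UP_ALARM = "redis-low-freeable-memory"
SCALE_DOWN_ALARM = "redis-high-freeable-memory"
# Hypothetical replication group ID.
REPLICATION_GROUP_ID = "my-redis-group"


def parse_alarm(event):
    """Extract (alarm name, new state) from an SNS-delivered CloudWatch alarm."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["AlarmName"], message["NewStateValue"]


def next_node_type(current, step):
    """Move one step along the ladder, staying within its bounds."""
    if current not in NODE_TYPES:
        return current  # unknown type: leave it alone
    index = NODE_TYPES.index(current) + step
    index = max(0, min(index, len(NODE_TYPES) - 1))
    return NODE_TYPES[index]


def handler(event, context):
    """Lambda entry point: resize the replication group when an alarm fires."""
    import boto3  # available in the Lambda Python runtime

    alarm_name, state = parse_alarm(event)
    if state != "ALARM":
        return "no action"
    step = {SCALE_UP_ALARM: 1, SCALE_DOWN_ALARM: -1}.get(alarm_name, 0)

    client = boto3.client("elasticache")
    group = client.describe_replication_groups(
        ReplicationGroupId=REPLICATION_GROUP_ID
    )["ReplicationGroups"][0]
    current = group["CacheNodeType"]
    target = next_node_type(current, step)
    if target == current:
        return "no action"

    client.modify_replication_group(
        ReplicationGroupId=REPLICATION_GROUP_ID,
        CacheNodeType=target,
        ApplyImmediately=True,
    )
    return f"scaling {current} -> {target}"
```

Keeping the decision logic in small pure functions makes it possible to test the scaling rules locally without touching AWS. Before scaling down, it would be wise to check that the current data set fits in the smaller node type.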
ElastiCache now supports auto scaling:
https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/AutoScaling.html

AWS realtime visualization

Are there any tools out there that would allow me to visualize my AWS instances in real time? For example, if I'm using auto-scaling, I would like to be able to see the load on each instance and the scaling in real time.
The Amazon EC2 management console has a Monitoring tab that provides graphs for CPU, Network and Disk metrics.
You can select multiple instances to view them all on one chart. Click the chart to make it bigger.
Similar statistics are available in the Amazon CloudWatch management console.
See documentation: Graph Metrics for Your Instances