How to Monitor EKS Node Group Status in CloudWatch

I'm currently trying to monitor EKS node group status. Sometimes my node groups show as Degraded, and I want a CloudWatch alert whenever a node group is in the Degraded state. I checked CloudWatch Metrics and there are no standard metrics for this, and I can't find a corresponding event in CloudTrail either.
Is it possible to create such an alarm using AWS CloudTrail events, EventBridge, or CloudWatch?
Any help finding a solution would be appreciated.

For CloudWatch, please take a look at this:
https://docs.aws.amazon.com/de_de/AmazonCloudWatch/latest/monitoring/deploy-container-insights-EKS.html

I think you can combine Lambda, CloudWatch, and EventBridge here to implement a simple health-check status for one or more node groups.
For the health-check Lambda function:
We create a Lambda function with Python 3 (3.9, for example).
We describe the node group using Boto3.
We put a custom metric to CloudWatch: if the status is ACTIVE, we put 1, otherwise 0.
Once the function is ready, we schedule it to run every minute (the interval is up to you):
We create an EventBridge (EB) rule that triggers every minute.
The EB rule's target is the Lambda function.
Once we have enough data points in CloudWatch Metrics, we can create a CloudWatch alarm that notifies us by email or other channels.
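The health-check function described above could look roughly like this minimal sketch. The environment variable names (CLUSTER_NAME, NODEGROUP_NAMES), the Custom/EKS namespace, and the NodegroupHealthy metric name are hypothetical choices, not anything AWS defines. The calls used are Boto3's EKS describe_nodegroup and CloudWatch put_metric_data; the function's execution role would need eks:DescribeNodegroup and cloudwatch:PutMetricData.

```python
import os

# Hypothetical configuration: set these environment variables on the Lambda function.
CLUSTER_NAME = os.environ.get("CLUSTER_NAME", "my-cluster")
NODEGROUP_NAMES = [n.strip() for n in os.environ.get("NODEGROUP_NAMES", "ng-1").split(",") if n.strip()]
NAMESPACE = "Custom/EKS"  # hypothetical custom metric namespace


def status_to_value(status):
    """Return 1.0 for a healthy (ACTIVE) node group and 0.0 for any other status."""
    return 1.0 if status == "ACTIVE" else 0.0


def build_metric_datum(cluster, nodegroup, status):
    """Build one MetricData entry for CloudWatch PutMetricData."""
    return {
        "MetricName": "NodegroupHealthy",  # hypothetical metric name
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "NodegroupName", "Value": nodegroup},
        ],
        "Value": status_to_value(status),
        "Unit": "None",
    }


def lambda_handler(event, context, eks=None, cloudwatch=None):
    """Entry point for the scheduled EventBridge invocation."""
    if eks is None or cloudwatch is None:
        # boto3 ships with the Lambda Python runtime; it is imported here so the
        # helpers above can be used (and tested) without it.
        import boto3
        eks = eks or boto3.client("eks")
        cloudwatch = cloudwatch or boto3.client("cloudwatch")

    metric_data = []
    for nodegroup in NODEGROUP_NAMES:
        resp = eks.describe_nodegroup(clusterName=CLUSTER_NAME, nodegroupName=nodegroup)
        # status is one of CREATING, ACTIVE, UPDATING, DELETING,
        # CREATE_FAILED, DELETE_FAILED or DEGRADED
        metric_data.append(build_metric_datum(CLUSTER_NAME, nodegroup, resp["nodegroup"]["status"]))

    cloudwatch.put_metric_data(Namespace=NAMESPACE, MetricData=metric_data)
    return {"published": len(metric_data)}
```

Publishing 0 for every non-ACTIVE status (not only DEGRADED) means the alarm also fires for stuck CREATE_FAILED or DELETE_FAILED node groups; narrow the check if that is not wanted.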
References:
https://stackify.com/custom-metrics-aws-lambda/
https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-run-lambda-schedule.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/eks.html#EKS.Client.describe_nodegroup
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudwatch.html

Related

Prometheus forward alerts to Cloudwatch

I am running a Kubernetes cluster in AWS EKS. All alarms are managed in AWS CloudWatch. While that could change in the future, it is a requirement I have to deal with today.
I also have alerts in Prometheus, and I want to "export" them to CloudWatch. What would be the best solution? I see only two possibilities so far:
I create a Lambda function in AWS that queries Prometheus for the ALERTS{} metric and exports the result to CloudWatch. I then create an additional CloudWatch alarm that monitors the state of the Prometheus alert.
I create a webhook in Alertmanager that calls an API Gateway endpoint in AWS, which turns the CloudWatch alarm on or off.
Any other suggestions?
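The first option could look roughly like the following sketch. It assumes the Lambda can reach Prometheus over the network (for example by running in the cluster's VPC); the Prometheus URL, namespace, metric name, and alert names are hypothetical. It uses Prometheus' HTTP API instant-query endpoint (/api/v1/query) and the built-in ALERTS series, whose alertname and alertstate labels identify each alert.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://prometheus.monitoring.svc:9090"  # hypothetical address
NAMESPACE = "Custom/Prometheus"                           # hypothetical namespace
WATCHED_ALERTS = ["HighErrorRate", "KubePodCrashLooping"]  # hypothetical alert names


def fetch_firing_alerts(base_url=PROMETHEUS_URL):
    """Run an instant query for currently firing alerts."""
    qs = urllib.parse.urlencode({"query": 'ALERTS{alertstate="firing"}'})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{qs}", timeout=10) as resp:
        return json.load(resp)


def firing_alert_names(payload):
    """Extract the set of alertname labels from a /api/v1/query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return {series["metric"].get("alertname", "") for series in payload["data"]["result"]}


def to_metric_data(firing, watched=WATCHED_ALERTS):
    """One datum per watched alert: 1 if firing, 0 otherwise."""
    # Publishing 0 for non-firing alerts lets the CloudWatch alarm return to OK.
    return [
        {
            "MetricName": "AlertFiring",
            "Dimensions": [{"Name": "AlertName", "Value": name}],
            "Value": 1.0 if name in firing else 0.0,
            "Unit": "None",
        }
        for name in watched
    ]


def lambda_handler(event, context):
    import boto3  # bundled with the Lambda Python runtime

    firing = firing_alert_names(fetch_firing_alerts())
    boto3.client("cloudwatch").put_metric_data(Namespace=NAMESPACE, MetricData=to_metric_data(firing))
    return sorted(firing)
```

A CloudWatch alarm per AlertName dimension (Maximum >= 1) then mirrors each Prometheus alert; the polling interval of the scheduling rule bounds how quickly it follows Prometheus.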

Is there a simple way to monitor when a backup fails twice in CloudFormation?

I am trying to send an SNS notification when a backup in the backup vault fails twice consecutively. Is there a CloudWatch alarm or any other way to do this in CloudFormation?
You can use CloudWatch metrics for this and then set up alarms based on the thresholds you need.
You can find the list of metrics that AWS Backup emits to CloudWatch in this document: https://docs.aws.amazon.com/aws-backup/latest/devguide/cloudwatch.html
For instance, you can set up an alarm on the NumberOfBackupJobsFailed metric.
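A minimal sketch of such an alarm as a CloudFormation template, generated from Python so it can be printed as JSON and deployed. Assumptions: the metric lives in the AWS/Backup namespace, and filtering by vault uses a BackupVaultName dimension; check which dimension combinations the metric is actually published with on the page linked above. The threshold approximates "fails twice" as two or more failed jobs within one day, not strictly consecutive jobs. The resource names and email address are hypothetical.

```python
import json


def backup_failure_template(email, vault_name=None):
    """CloudFormation template: SNS email alert on >= 2 failed backup jobs per day."""
    alarm_props = {
        "AlarmDescription": "Two or more failed backup jobs within one day",
        "Namespace": "AWS/Backup",
        "MetricName": "NumberOfBackupJobsFailed",
        "Statistic": "Sum",
        "Period": 86400,  # one day
        "EvaluationPeriods": 1,
        "Threshold": 2,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No data usually means no failed jobs were reported in that period.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [{"Ref": "BackupAlertsTopic"}],  # Ref on a topic yields its ARN
    }
    if vault_name:
        # Assumption: per-vault data is published under a BackupVaultName dimension.
        alarm_props["Dimensions"] = [{"Name": "BackupVaultName", "Value": vault_name}]

    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "BackupAlertsTopic": {
                "Type": "AWS::SNS::Topic",
                "Properties": {"Subscription": [{"Protocol": "email", "Endpoint": email}]},
            },
            "BackupJobsFailedAlarm": {
                "Type": "AWS::CloudWatch::Alarm",
                "Properties": alarm_props,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(backup_failure_template("ops@example.com"), indent=2))
```

The printed JSON can be saved and deployed like any other template; the email subscription must be confirmed before notifications are delivered.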

Difference between AWS CloudWatch and AWS CloudWatch Events

I was studying Amazon Web Services fundamentals when I came across these two concepts:
Amazon CloudWatch
Amazon CloudWatch Events
Even after going through the official AWS documentation, I couldn't find the difference between the two, even though Amazon says they are different. Excerpts:
CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing you with a unified view of AWS resources, applications, and services that run on AWS and on-premises servers. You can use CloudWatch to detect anomalous behavior in your environments, set alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to keep your applications running smoothly.
Documentation of AWS CloudWatch
Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in Amazon Web Services (AWS) resources. Using simple rules that you can quickly set up, you can match events and route them to one or more target functions or streams. CloudWatch Events becomes aware of operational changes as they occur. CloudWatch Events responds to these operational changes and takes corrective action as necessary, by sending messages to respond to the environment, activating functions, making changes, and capturing state information.
Documentation of AWS CloudWatch Events
CloudWatch
CloudWatch is a monitoring service for your AWS resources. You can send your log files to it, and many AWS services publish their metrics (and some their logs) to CloudWatch (CW) by default. You can monitor the performance of resources too; for example, you can monitor the CPU utilisation of your EC2 instances. You can set alarms on resource thresholds and get an SNS alert when they are crossed. For example, you can create an alarm for your DynamoDB table if consumed write capacity exceeds a threshold. You can set an alarm on your billing too. So basically, CW is used as a monitoring solution.
CloudWatch Events
CW Events is also part of CloudWatch (it has since been extended and renamed Amazon EventBridge). CloudWatch Events is helpful when you want to schedule something: say you want to run your Lambda function every other day, you can create a rule for that. You can also trigger your Lambda function with an event pattern that matches changes in your AWS resources. Many services are supported as targets, not just Lambda. Event buses can also be used to send your events to other accounts. For example, if you have a CI/CD account where you bake a new AMI every month, you can use an event bus to notify all the other accounts, which can then trigger important tasks when they receive the event.
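The two kinds of rules described above (a schedule and an event pattern) can be sketched with Boto3's EventBridge client, which uses the same API that CloudWatch Events did. The rule names and the choice of EC2 state-change events as the example pattern are assumptions for illustration.

```python
import json


def schedule_rule(name):
    """Rule that fires every other day."""
    return {"Name": name, "ScheduleExpression": "rate(2 days)", "State": "ENABLED"}


def ec2_state_change_rule(name):
    """Rule that matches EC2 instances entering the stopped or terminated state."""
    pattern = {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped", "terminated"]},
    }
    return {"Name": name, "EventPattern": json.dumps(pattern), "State": "ENABLED"}


def create_rule_with_lambda_target(rule, function_arn):
    import boto3  # imported lazily so the rule builders work without boto3

    events = boto3.client("events")
    rule_arn = events.put_rule(**rule)["RuleArn"]
    events.put_targets(Rule=rule["Name"], Targets=[{"Id": "lambda-target", "Arn": function_arn}])
    # Let the rule invoke the function.
    boto3.client("lambda").add_permission(
        FunctionName=function_arn,
        StatementId=f"allow-{rule['Name']}",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

For example, create_rule_with_lambda_target(schedule_rule("every-other-day"), function_arn) covers the scheduling case, and passing ec2_state_change_rule(...) instead covers the event-pattern case.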

Count AWS IoT successful connections

I want to count the number of successful IoT connections and messages at a given time, and trigger a notification if that number rises above some threshold.
I could not find any CloudWatch metrics or events for this.
Please advise.
You have a couple of options. You can create a topic rule that listens on the lifecycle event topics. The rule could push a custom metric into AWS CloudWatch or invoke a Lambda function with more complex logic.
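The first option can be sketched as an AWS IoT topic rule with a cloudwatchMetric action: every message on the connected lifecycle topic ($aws/events/presence/connected/clientId) publishes a value of 1, so the metric's Sum per period is the number of successful connections. The rule name, namespace, metric name, and role ARN are hypothetical; the role must allow cloudwatch:PutMetricData.

```python
def connected_count_rule_payload(role_arn):
    """topicRulePayload that counts successful connections as a CloudWatch metric."""
    return {
        # Lifecycle event published when a client connects; '+' matches any clientId.
        "sql": "SELECT * FROM '$aws/events/presence/connected/+'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "cloudwatchMetric": {
                    "roleArn": role_arn,
                    "metricNamespace": "Custom/IoT",            # hypothetical
                    "metricName": "SuccessfulConnections",      # hypothetical
                    "metricValue": "1",                         # one per connect event
                    "metricUnit": "Count",
                }
            }
        ],
    }


def create_rule(role_arn, rule_name="count_successful_connections"):
    import boto3  # imported lazily so the payload builder works without boto3

    boto3.client("iot").create_topic_rule(
        ruleName=rule_name, topicRulePayload=connected_count_rule_payload(role_arn)
    )
```

A CloudWatch alarm on the Sum of this metric over, say, five minutes then gives the threshold notification; a second rule on your own message topics could count messages the same way.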
Another option would be to turn on AWS IoT logging and set up a Lambda subscription on the IoT log group. The Lambda function could then push a custom metric into AWS CloudWatch.
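A sketch of the log-subscription Lambda. The decoding step follows the CloudWatch Logs subscription format (base64-encoded, gzip-compressed JSON with a logEvents list). The filter assumes each AWS IoT log message is a JSON object with eventType and status fields (for example "Connect" and "Success"); verify that against your own log group. The namespace and metric name are hypothetical.

```python
import base64
import gzip
import json


def decode_log_batch(event):
    """Decode the payload CloudWatch Logs delivers to a subscribed Lambda."""
    data = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(data))


def count_successful_connects(batch):
    """Count log entries that look like successful connections (assumed format)."""
    count = 0
    for log_event in batch.get("logEvents", []):
        try:
            entry = json.loads(log_event["message"])
        except ValueError:
            continue  # skip non-JSON lines
        # Assumption: IoT log entries carry eventType/status fields.
        if isinstance(entry, dict) and entry.get("eventType") == "Connect" and entry.get("status") == "Success":
            count += 1
    return count


def lambda_handler(event, context):
    import boto3  # bundled with the Lambda Python runtime

    n = count_successful_connects(decode_log_batch(event))
    boto3.client("cloudwatch").put_metric_data(
        Namespace="Custom/IoT",  # hypothetical
        MetricData=[{"MetricName": "SuccessfulConnections", "Value": n, "Unit": "Count"}],
    )
    return n
```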

Use CloudWatch to determine if a Linux service is running

Suppose I have an EC2 instance with the service /etc/init/my_service.conf, with these contents:
script
exec my_exec
end script
How can I monitor that EC2 instance so that I can act on it if my_service stops running?
You can publish a custom metric to CloudWatch in the form of a "heartbeat".
Have a small script running via cron on your server that checks the process list to see whether my_service is running and, if it is, makes a put-metric-data call to CloudWatch.
The metric could be as simple as pushing the number "1" to your custom metric in CloudWatch.
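A minimal sketch of that cron script, assuming the process to look for is named my_exec (from the upstart stanza), that pgrep is installed, and that the instance has credentials and a region configured for Boto3. The Custom/Services namespace, the my_service_heartbeat metric name, and the Host dimension are hypothetical.

```python
#!/usr/bin/env python3
import socket
import subprocess

PROCESS_NAME = "my_exec"               # process started by /etc/init/my_service.conf
NAMESPACE = "Custom/Services"          # hypothetical
METRIC_NAME = "my_service_heartbeat"   # hypothetical


def is_running(process_name):
    """True if pgrep finds a process with exactly this name (exit status 0)."""
    result = subprocess.run(
        ["pgrep", "-x", process_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def heartbeat_datum(host):
    """The heartbeat: always the value 1, tagged with the host name."""
    return {
        "MetricName": METRIC_NAME,
        "Dimensions": [{"Name": "Host", "Value": host}],
        "Value": 1.0,
        "Unit": "Count",
    }


def main():
    import boto3  # imported lazily so heartbeat_datum() works without boto3

    # Publish only when the service is up; silence is what the alarm detects.
    if is_running(PROCESS_NAME):
        boto3.client("cloudwatch").put_metric_data(
            Namespace=NAMESPACE, MetricData=[heartbeat_datum(socket.gethostname())]
        )


if __name__ == "__main__":
    main()
```

A crontab entry such as `*/5 * * * * /usr/bin/python3 /usr/local/bin/my_service_heartbeat.py` (hypothetical path) runs it every 5 minutes. Note that pgrep -x matches against the kernel's process name, which Linux truncates to 15 characters.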
Set up a CloudWatch alarm that triggers if the average for the metric falls below 1.
Make the period of the alarm greater than or equal to the interval at which cron runs; e.g. if cron runs every 5 minutes, make the alarm go into the ALARM state if it sees an average below 1 for two 5-minute periods.
Make sure you also handle the situation in which the metric is not published at all (e.g. cron fails to run or the whole machine dies); you want an alert when the metric is missing. (See: AWS CloudWatch Heartbeat Alarm.)
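The alarm described above can be sketched with Boto3's put_metric_alarm, assuming the heartbeat is published as my_service_heartbeat in a Custom/Services namespace with a Host dimension (hypothetical names). Because the script only ever publishes 1, the average never actually drops below 1; TreatMissingData="breaching" is what turns a stopped service or a dead machine into an ALARM.

```python
NAMESPACE = "Custom/Services"         # hypothetical, must match the publisher
METRIC_NAME = "my_service_heartbeat"  # hypothetical, must match the publisher


def heartbeat_alarm_params(host, topic_arn, cron_minutes=5):
    """Alarm when no heartbeat (or a value below 1) is seen for two cron periods."""
    return {
        "AlarmName": f"my_service-down-{host}",
        "Namespace": NAMESPACE,
        "MetricName": METRIC_NAME,
        "Dimensions": [{"Name": "Host", "Value": host}],
        "Statistic": "Average",
        "Period": cron_minutes * 60,  # alarm period >= cron interval
        "EvaluationPeriods": 2,       # two consecutive periods
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        # Missing heartbeats count as breaching, so silence triggers the alarm.
        "TreatMissingData": "breaching",
        "AlarmActions": [topic_arn],  # e.g. an SNS topic
    }


def create_alarm(host, topic_arn):
    import boto3  # imported lazily so the parameter builder works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**heartbeat_alarm_params(host, topic_arn))
```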
Be aware that a custom metric adds about 50c per metric per month to your AWS bill at the time of writing (not a big deal for one metric, but the equation changes drastically if you push hundreds or thousands of metrics; i.e. it's good to know it isn't free, as one might expect).
See here for how to publish a custom metric: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html
I am not sure CloudWatch is the right route for checking whether the service is running; it would be easier with a Nagios-style solution.
Nevertheless, you may try the CloudWatch custom metrics approach. You add additional lines of code that publish, say, the integer 1 to a CloudWatch custom metric every 5 minutes. You can then configure CloudWatch alarms to send an SNS or email notification when a statistic such as SampleCount or Sum deviates from the expected value.
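script
# Upstart runs this block with sh -e, so "|| true" keeps a failed publish from blocking startup.
# The namespace and metric name are placeholders.
aws cloudwatch put-metric-data --namespace MyServices --metric-name my_service_started --value 1 || true
# exec replaces this shell with my_exec, so nothing placed after it ever runs;
# the periodic (every 5 minutes) heartbeat has to come from my_exec itself or from cron.
exec my_exec
end script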
More Info
Publish Custom Metrics - http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html