I have an EC2 instance in AWS running CentOS 6, and the only thing on it is supervisor, which manages a single PHP script. In some cases this script fails and I can see something like this:
$ sudo /usr/local/bin/supervisorctl status
my-worker EXITED Aug 19 10:19 AM
I would like to receive an alert email about this, because my script hasn't worked since Aug 19.
I tried to find something related to health checks, but health checks are only available for load balancers. I also looked in CloudWatch but couldn't find a relevant metric.
Any idea how I can receive an email when my worker goes down?
There isn't an out-of-the-box metric for something like that, as CloudWatch by default only has access to hypervisor-level metrics rather than OS-level metrics such as RAM usage or process-related statistics.
To augment the data in CloudWatch you could write a small script that checks whether the process is running and then calls PutMetricData to upload that metric to CloudWatch.
Something like this should work:
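#!/bin/bash
# Usage: pass the process name to check as the first argument
process_name=$1
# Format the timestamp in UTC, since the trailing Z marks it as UTC
DATE=$(date -u +%Y-%m-%dT%H:%M:%S.000Z)
# Count the PIDs that pidof reports (0 means the process is not running)
processes_running=$(pidof "${process_name}" | wc -w)
aws cloudwatch put-metric-data --metric-name "${process_name}_running" --namespace "MyService" --value "${processes_running}" --timestamp "$DATE"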
Then just call that from cron every minute (or however often you want to update CloudWatch; standard-resolution metrics have one-minute granularity, so more frequent calls will be aggregated).
Then you just need to create an alarm on that metric that performs some action, such as using SNS to send an email to all subscribed addresses, or potentially something like rebooting the instance.
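As a rough sketch of the alarm step, assuming the script above publishes a metric named <process>_running in the MyService namespace and that an SNS topic with an email subscription already exists. The alarm name, the example metric name and the topic ARN are placeholders, not values from the original answer:

```shell
#!/bin/sh
# Sketch only: create a CloudWatch alarm that notifies an SNS topic when the
# process-count metric drops below 1. All names here are placeholders.
AWS_CLI="${AWS_CLI:-aws}"   # set AWS_CLI=echo to print the command instead of running it

create_process_alarm() {
    # $1 = metric name (e.g. php_running), $2 = SNS topic ARN
    "$AWS_CLI" cloudwatch put-metric-alarm \
        --alarm-name "${1}-not-running" \
        --namespace "MyService" \
        --metric-name "$1" \
        --statistic Minimum \
        --period 60 \
        --evaluation-periods 2 \
        --threshold 1 \
        --comparison-operator LessThanThreshold \
        --alarm-actions "$2"
}

# Example call (hypothetical topic ARN):
# create_process_alarm php_running arn:aws:sns:us-east-1:123456789012:worker-alerts
```

Using the Minimum statistic over two one-minute periods is one possible choice; it fires only after the count has been 0 for two consecutive minutes, which avoids alerting on a single restart.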
Related
I've been trying to create an AWS CloudWatch Custom Dashboard to indicate the number of Times a Task running in a ECS Service has STOPPED (Fargate Launch Type) due to containers crashing (could be due to a container related issue or application level issue).
I tried to retrieve the expected metric by writing a CloudWatch Logs Insights Query to query the /aws/ecs/containerinsights/{my-cluster-name}/performance Log Group. The query I used for this is written as below,
fields TaskId, KnownStatus, LaunchType, @timestamp
| filter ServiceName = "my-first-service" and KnownStatus = "STOPPED"
| limit 20
However, this doesn't output the STOPPED tasks, although it does output RUNNING and PENDING tasks when KnownStatus is filtered on RUNNING or PENDING.
So my question is: doesn't ECS log STOPPED tasks to CloudWatch?
If it does, I would appreciate it very much if anyone could provide an example that indicates:
the number of times a task had STOPPED
the number of times each container within a task had STOPPED or crashed
I also read in the documentation that Amazon EventBridge can be used to detect the STOPPED task events triggered by ECS, which in turn can write its logs back to CloudWatch. Hence, please feel free to advise on what would be the best practice to achieve my requirement.
Thank you.
I have some ECS tasks running in AWS Fargate which in very rare cases may "die" internally, but will still show as "RUNNING" and not fail and trigger the task to restart.
What I would like to do, if possible is check for the absence of logs, e.g. if logs haven't been written in 30 minutes, trigger a lambda to kill the ECS task which will cause it to start back up.
The health check functionality isn't sufficient.
If this isn't possible, are there any other approaches I could consider?
You can use a metric with anomaly detection, but processing logs into a metric may cost money, and the alarm may cost money too. I would rather run a Lambda every 30 minutes that checks whether logs are present and kills the ECS task as needed. You can run a Lambda on an interval with EventBridge (formerly CloudWatch Events).
Logs are probably sent from your ECS tasks to a CloudWatch Logs log group. If the log group has a static name, you can use the SDK to describe the streams inside the group. This API call will tell you the timestamp of the last event in a stream.
Inside the Lambda Node.js runtime, aws-sdk v2 is already present (newer Node.js runtimes bundle SDK v3 instead), so you can require it without installing it. Here is the doc for v2:
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CloudWatchLogs.html#describeLogStreams-property
Set orderBy: "LastEventTime" together with descending: true so that the most recently active stream comes first, and to save network time lower the limit from the default of 50 to limit: 1. The result will contain lastEventTimestamp.
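The same check can be sketched with the AWS CLI instead of the Node.js SDK. This is a minimal sketch, not the answer's own code: the log group name is hypothetical, and lastEventTimestamp is in epoch milliseconds. The CloudWatch Logs documentation notes that this field is updated on an eventual-consistency basis and can lag behind ingestion, so a tight threshold may produce false positives.

```shell
#!/bin/sh
# Sketch only: decide whether a log group has gone quiet.
AWS_CLI="${AWS_CLI:-aws}"

# Print the lastEventTimestamp (epoch milliseconds) of the most recently
# active stream in log group $1.
last_event_ms() {
    "$AWS_CLI" logs describe-log-streams \
        --log-group-name "$1" \
        --order-by LastEventTime \
        --descending \
        --limit 1 \
        --query 'logStreams[0].lastEventTimestamp' \
        --output text
}

# Succeed (exit 0) when $1 (last event, ms) is more than $3 minutes older
# than $2 (current time, ms).
is_stale() {
    [ $(( $2 - $1 )) -gt $(( $3 * 60 * 1000 )) ]
}

# Example (hypothetical log group name):
# now_ms=$(( $(date +%s) * 1000 ))
# if is_stale "$(last_event_ms /ecs/my-task)" "$now_ms" 30; then
#     echo "no log events for 30 minutes"
# fi
```

Keeping the comparison in its own function makes the threshold logic easy to check without calling AWS at all.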
anomaly detection:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Anomaly_Detection.html
alarms:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
Check the pricing for these. There is a free tier, so maybe it won't cost you anything, yet it's easy to build up real spend with CloudWatch. https://aws.amazon.com/cloudwatch/pricing/
To run lambda on interval:
I need to send my team a weekly email with a complete snapshot of the performance metrics dashboard, including CPU, memory, I/O and network graphs for the production server (AWS EC2) and the RDS database over the last week.
I prefer to use an AWS CloudWatch custom dashboard. However, I am not able to send automatic emails of custom dashboards on a weekly basis.
Should I use AWS CloudWatch or some other monitoring tool to achieve this?
I have created AWS CloudWatch alarms, but they only trigger an email when a certain threshold is reached. That does not serve my purpose, as I need the complete dashboard emailed to my team, including CPU, memory and network for the web server, RDS, etc., in the same email.
I created a custom dashboard in CloudWatch which displays graphs for EC2 and RDS (CPU, memory, network, etc.).
Is there a way to send custom dashboards by email?
Expected result: set up an email notification every week that sends the complete performance metrics dashboard to my team members.
I don't think there is a built-in way in AWS to take a snapshot of the metrics and send it.
This may help you...
If you need a snapshot of the AWS CloudWatch console UI, create an automation script and schedule it.
Way 1:
Write an automation script that gets the statistics using the awscli command below; select the metrics by changing "--metric-name", "--start-time" and "--end-time".
Include sending mail in the script using this link
Configure a cron job to run the script on a weekly basis.
aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-1234567890abcdef0 --statistics Maximum \
--start-time 2016-10-18T23:18:00 --end-time 2016-10-19T23:18:00 --period 360
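For a weekly report the start and end times have to be computed at run time rather than hard-coded. A minimal sketch, assuming GNU date (for the -d option) and an hourly period so that one week yields 168 data points. The function names are made up for illustration:

```shell
#!/bin/sh
# Sketch only: fetch one week of hourly CPU maxima for one instance.
# Requires GNU date for the -d option.
AWS_CLI="${AWS_CLI:-aws}"

# Print the ISO 8601 UTC timestamp 7 days before epoch seconds $1.
week_before() {
    date -u -d "@$(( $1 - 7 * 24 * 3600 ))" +%Y-%m-%dT%H:%M:%SZ
}

weekly_cpu_stats() {
    # $1 = instance id, $2 = end of the window in epoch seconds
    "$AWS_CLI" cloudwatch get-metric-statistics --namespace AWS/EC2 \
        --metric-name CPUUtilization \
        --dimensions Name=InstanceId,Value="$1" --statistics Maximum \
        --start-time "$(week_before "$2")" \
        --end-time "$(date -u -d "@$2" +%Y-%m-%dT%H:%M:%SZ)" \
        --period 3600
}

# Example, using the instance id from the command above:
# weekly_cpu_stats i-1234567890abcdef0 "$(date +%s)"
```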
Way 2:
Create a UI automation script that takes a snapshot using the Selenium driver.
Use this link for console login and this link for taking a screenshot.
Configure the UI automation script in Jenkins JOB1 and enable "Poll SCM" so that it runs daily.
Create Jenkins JOB2, enable "Poll SCM" on a weekly basis, and add the automation script as a downstream project.
What is the best way to check EC2 instance uptime and possibly send alerts if the uptime of an instance is more than N hours? How can this be organized with default AWS tools such as CloudWatch and Lambda?
Here's another option which can be done just in CloudWatch.
Create an alarm for your EC2 instance with something like CPUUtilization - you will always get a value for this when the instance is running.
Set the alarm to >= 0; this will ensure that whenever the instance is running, it matches.
Set the period and consecutive periods to match the required alert uptime, for example for 24 hours you could set the period to 1 hour and the consecutive periods to 24.
Set an action to send a notification when the alarm is in ALARM state.
Now, when the instance has been on for less than the set time, the alarm will be in the INSUFFICIENT_DATA state. Once it has been on for the full uptime, it will go to the ALARM state and the notification will be sent.
One option is to use AWS CLI and get the launch time. From that calculate the uptime and send it to Cloudwatch:
aws ec2 describe-instances --instance-ids i-00123458ca3fa2c4f --query 'Reservations[*].Instances[*].LaunchTime' --output text
Output
2016-05-20T19:23:47.000Z
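The uptime calculation from that launch time might look like the sketch below. It assumes GNU date, which can parse the ISO 8601 timestamp directly. The function names are illustrative only:

```shell
#!/bin/sh
# Sketch only: turn an EC2 LaunchTime into whole hours of uptime.
# Requires GNU date for the -d option.
AWS_CLI="${AWS_CLI:-aws}"

# Print the LaunchTime of instance $1.
launch_time() {
    "$AWS_CLI" ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].LaunchTime' --output text
}

# $1 = LaunchTime as printed above, e.g. 2016-05-20T19:23:47.000Z
# $2 = current time in epoch seconds
uptime_hours() {
    launch_epoch=$(date -u -d "$1" +%s)
    echo $(( ($2 - launch_epoch) / 3600 ))
}

# Example, using the instance id from the command above:
# uptime_hours "$(launch_time i-00123458ca3fa2c4f)" "$(date +%s)"
```

The resulting number can then be published with put-metric-data, or compared against N directly in the script.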
Another option is to periodically run a cronjob script that:
calls uptime -p command
converts the output to hours
sends the result to CloudWatch with the unit Count
After adding the cronjob:
add a CloudWatch alarm that sends an alert when this value exceeds a threshold or when the alarm is in the INSUFFICIENT_DATA state
INSUFFICIENT_DATA means the machine is not up
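A minimal sketch of such a cron script follows. Note that it reads /proc/uptime (Linux only) instead of parsing the output of uptime -p, because the seconds value is easier to convert reliably. The metric and namespace names are placeholders:

```shell
#!/bin/sh
# Sketch only: publish the host's uptime in whole hours as a custom metric.
AWS_CLI="${AWS_CLI:-aws}"

# Convert a /proc/uptime line ("<uptime seconds> <idle seconds>") to whole hours.
proc_uptime_to_hours() {
    seconds=${1%%.*}          # keep the integer part of the first field
    echo $(( seconds / 3600 ))
}

publish_uptime() {
    # Metric and namespace names are placeholders.
    "$AWS_CLI" cloudwatch put-metric-data \
        --namespace "MyService" \
        --metric-name uptime_hours \
        --unit Count \
        --value "$(proc_uptime_to_hours "$(cat /proc/uptime)")"
}

# Call publish_uptime at the end of the script and run the script from cron,
# for example once an hour.
```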
I would recommend looking into an "AWS-native" way of doing this.
If you basically want to send OS-level metrics (e.g. free memory, uptime, disk usage) to CloudWatch, this can be achieved by following this guide, which installs the CloudWatch Logs Agent on your EC2 instances:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/QuickStartEC2Instance.html
The great thing about this is that the metrics then show up in CloudWatch Logs (see attached picture, which shows the CloudWatch Logs interface in the AWS Console).
Suppose I have an EC2 instance with a service defined in /etc/init/my_service.conf with the contents
script
exec my_exec
end script
How can I monitor that EC2 instance so that I can act if my_service stops running?
You can publish a custom metric to CloudWatch in the form of a "heart beat".
Have a small script running via cron on your server that checks the process list to see whether my_service is running and, if it is, makes a put-metric-data call to CloudWatch.
The metric could be as simple as pushing the number "1" to your custom metric in CloudWatch.
Set up a CloudWatch alarm that triggers if the average for the metric falls below 1.
Make the period of the alarm >= the interval at which cron runs. E.g. if cron runs every 5 minutes, make the alarm fire if it sees the average below 1 for two 5-minute periods.
Make sure you also handle the situation in which the metric is not published (e.g. cron fails to run or the whole machine dies). You would want to set up an alert for the case where the metric is missing (see here: AWS Cloudwatch Heartbeat Alarm).
Be aware that the custom metric will add an additional cost of 50c to your AWS bill (not a big deal for one metric, but the equation changes drastically if you want to push hundreds or thousands of metrics; it's good to know it's not free, as one might expect).
See here for how to publish a custom metric: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html
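The heartbeat steps above could be sketched as follows. This is an illustration under assumptions, not a definitive setup: the namespace, metric name, alarm name and SNS topic ARN are placeholders, and pgrep is assumed to be available. The --treat-missing-data breaching option is what makes a missing heartbeat count against the alarm:

```shell
#!/bin/sh
# Sketch only: heartbeat metric plus an alarm that also fires when the
# heartbeat stops arriving. All names are placeholders.
AWS_CLI="${AWS_CLI:-aws}"

# Publish the heartbeat only while the process is alive (run this from cron).
send_heartbeat() {
    # $1 = exact process name as matched by pgrep -x
    if pgrep -x "$1" >/dev/null; then
        "$AWS_CLI" cloudwatch put-metric-data \
            --namespace "MyService" --metric-name "${1}_heartbeat" --value 1
    fi
}

create_heartbeat_alarm() {
    # $1 = process name, $2 = SNS topic ARN
    "$AWS_CLI" cloudwatch put-metric-alarm \
        --alarm-name "${1}-heartbeat-missing" \
        --namespace "MyService" --metric-name "${1}_heartbeat" \
        --statistic Average --period 300 --evaluation-periods 2 \
        --threshold 1 --comparison-operator LessThanThreshold \
        --treat-missing-data breaching \
        --alarm-actions "$2"
}

# Example (hypothetical topic ARN):
# create_heartbeat_alarm my_service arn:aws:sns:us-east-1:123456789012:service-alerts
```

With a 5-minute cron interval, the two 300-second evaluation periods mean the alarm fires after roughly ten minutes without a heartbeat.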
I am not sure CloudWatch is the right route for checking whether the service is running; it would be easier with a Nagios-style solution.
Nevertheless, you may try the CloudWatch custom metrics approach. You add additional lines of code which publish, say, the integer 1 to a CloudWatch custom metric every 5 minutes. You can then configure CloudWatch alarms to send an SNS or mail notification for conditions such as the Sample Count or Sum deviating from your anticipated value.
script
exec my_exec
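# pseudocode: publish the CloudWatch custom metric value (this has to happen inside my_exec or before the exec line, since exec replaces the shell)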
end script
More Info
Publish Custom Metrics - http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html