Is it possible to use CloudWatch or other AWS services to hit a URI, e.g. www.mysite.com/status, and send me error alerts when that doesn't return a 200 result? I want service-level monitoring for a small site (and don't want to do any work).
Ideally, I'd like to hit the /status endpoint on a particular EC2 host, with the HTTP hostname parameter set.
Thanks in advance.
edit: I recall something similar is available in auto-scaling groups, where hosts are automatically taken down if they don't meet health checks. I'm looking for something similar, but I just want email, not hosts taken down. (Since I'm working on small sites on a shared host.)
You can't do it directly from CloudWatch, but you could set up a monitor on a separate server, construct the test, and then send a custom metric to CloudWatch using the CLI tools. Custom metrics (and the CloudWatch CLI) are covered here:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html
From a separate server you could then run a simple script which tries to load your health page, and sends 0 for healthy, 1 for unhealthy, or whatever works for you, to CloudWatch.
Doing this with CloudWatch and SNS is not straightforward. You could do it with Route 53 and DNS failover, but for what you need, have a look at Pingdom. They have a free plan somewhere if you search for it.
Related
How do you route AWS Web Application Firewall (WAF) logs to an S3 bucket? Is this something I can quickly do through the AWS Console? Or, would I have to use a lambda function (invoked by a CloudWatch timer event) to query the WAF logs every n minutes?
UPDATE:
I'm interested in the ACL logs (Source IP, URI, Matches rule, Request Headers, Action, Time, etc).
UPDATE (05/15/2017)
AWS doesn't provide an easy way to view/parse these logs. You can get a "random sample" via the get-sampled-requests command. Which isn't acceptable...
Gets detailed information about a specified number of requests--a
sample--that AWS WAF randomly selects from among the first 5,000
requests that your AWS resource received during a time range that you
choose. You can specify a sample size of up to 500 requests, and you
can specify any time range in the previous three hours.
http://docs.aws.amazon.com/cli/latest/reference/waf/get-sampled-requests.html
Also, I'm not the only one experiencing this issue either:
https://forums.aws.amazon.com/thread.jspa?threadID=220202
I was looking for this functionality today and stumbled across the referenced thread. It was, coincidentally, updated today:
Hello,
Thanks for your input. I have submitted a feature request on your
behalf to export WAF events to S3 for long term analysis.
Best Regards, albertpataws
The lack of this feature strikes me as being almost as odd as the fact that I can't change timezones for graphs.
It looks like CloudWatch gives customers 10 custom metrics under the free plan, then each additional one costs $0.50. Does anyone know how to enforce PutMetric accept only a set of custom metrics?
I'm interested in limiting the custom metrics coming from mobile clients or possibly adding a layer of protection against abuse.
Is the only solution to implement my own service which does the validation against a whitelist?
One option you could look at is placing AWS Gateway in front of Cloudwatch and making the calls through the api.
This example shows you how to do this for S3, but there's not reason why you couldn't do something similar for Cloudwatch.
This shows you how to do it for dynamo: https://aws.amazon.com/blogs/compute/using-amazon-api-gateway-as-a-proxy-for-dynamodb/
I ended up running a simple tomcat service which validates metrics against a whitelist (stored in s3) and publishes them to CloudWatch.
We are hosting our services in AWS beanstalk managed instances. That is forcing us to move away from files based logging to use database based logging.
Is DynamoDB a good choice for replacing file based logging. If so, what should be the primary key. I thought of using timestamp but multiple messages may be logged by the same service within the same timeStamp so that might not be reliable.
Any advice would be appreciated.
Don't use DynamoDB to store logs. You'll be paying for throughput and space needlessly.
Amazon CloudWatch has built-in logging capabilities.
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/WhatIsCloudWatchLogs.html
Another alternative is a dedicated logging service such as Loggly which is cloud-based and can receive logs in many common formats, plus they have an API to send custom logs. In the web-based console, you can search and filter through the logs.
As an alternative, why don't you use cloudwatch? I ended up writing a whole app to consolidate logs across ec2 instances in a beanstalk app, then last year AWS opened up cloudwatch as a service, so I junked my stuff. You tell cloudwatch where your logs are on the instance, give it a log group and stream name, and all your logs are consolidated in one spot, in cloudwatch. You can also run alarms off them using the standard AWS setup. It's pretty slick, and easy - don't have to write a front end to do lookups, it's already there.
Don't know what you're using for logging - we are a node.js shop, used winston for logging, and there is a nice NPM module that works with Winston to log automatically, called winston-cloudwatch.
We have hosted some apps on Amazon EC2 and are using an Elastic Load Balancer (ELB) to manage several instances of one app. Also, we have set up ELB alarms to get notified about Unhealthy Hosts, i.e. when an instance has gone down.
So far, I could not figure out where to check which instance exactly has gone down when the alarm goes off, except for the ELB status page in the AWS console. However, if the instance comes back to In Service state again, this won't help me either.
The e-mail notification sent out by the ELB does not contain this information; and I couldn't find it anywhere in the alarms history in the console either.
Is there a way to tell which instance an ELB alarm has been triggered for, even if the instance has come back into OK state in the meantime?
Cheers, Alex
Sadly Amazon does not provide a health check log, so its impossible to find out which instance failed the health check afterwards, assuming that the server is no longer unhealthy. You can only use Per-Az metrics to know in which AZ is the instance.
But, you could know which instance is down if you query AWS api during the issue. So, I have thought of a possible workaround:
Set up a new SNS topic, and add an HTTP action to a custom URL that triggers a job that enumerates the instances and send you that info by mail.
Then setup a CloudWatch alarm for UnHealthyHostCount > 0 and setup the action to the SNS topic.
The difficult part is that your URL should handle the SNS subscription & confirmation described here.
The command to know which instance is currently OutOfService is:
elb-describe-instance-health *LoadBalancerName* --region *YourRegion*
You could probably use the AWS SDK gem or other AWS library that can get status. Use it to create a cron task that regularly gets the status of each instance and records it somewhere. Either that will give you what you need or the disappearance of the status for one instance will tell you which one went bad.
We are using the following Lambda function to make up for the lack of Health Check logging:
'use strict';
var AWS = require('aws-sdk');
var elb = new AWS.ELB();
exports.handler = (event, context, callback) => {
var params = {
LoadBalancerName: "<elb_name_here>"
};
elb.describeInstanceHealth(params, function(err, data) {
if (err) console.log(err, err.stack); // an error occurred
else console.log(data); // successful response
});
};
It does not produce the prettiest logs in CloudWatch, but the data is there. It allows us to see if there is a particular instance which tends to drop more often, etc. It is set up much like Gerardo Grignoli's answer above. I added a CloudWatch alarm to send an SNS message to the Lambda function when the alarm was triggered. It doesn't do anything with the message itself - the message is merely the triggering mechanism for the Lambda function to run and log the instance status.
No. The ELB metrics in CloudWatch do not provide you with that level of details and IMHO from the design perspective they should not. If a host is unhealthy the monitoring on the specific host should report the details for that not the ELB. If a node goes out of service in ELB, it should not be a problem for ELB. Although, in load balancer it makes sense to figure out an alarming state where 3 out of 6 of your machines go into Not In Service state. Take a look at CloudWatch metrics
Go to load balancer and find load balancer associated with you ELB. Then look at instances that OutofService
I've got an app running on AWS. How do I set up Amazon CloudWatch to notify me when the EC2 instance fails or is no longer responsive?
I went through the CloudWatch screens, and it appears that you can monitor certain statistics, like CPU or disk utilization, but I didn't see a way to monitor an event like "the instance got an http request and took more than X seconds to respond."
Amazon's Route 53 Health Check is the right tool for the job.
Route 53 can monitor the health and performance of your application as well as your web servers and other resources.
You can set up HTTP resource checks in Route 53 that will trigger an e-mail notification if the server is down or responding with an error.
http://eladnava.com/monitoring-http-health-email-alerts-aws/
To monitor an event in CloudWatch you create an Alarm, which monitors a metric against a given threshold.
When creating an alarm you can add an "action" for sending a notification. AWS handles notifications through SNS (Simple Notification Service). You can subscribe to a notification topic and then you'll receive an email for you alarm.
For EC2 metrics like CPU or disk utilization this is the guide from the AWS docs: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/US_AlarmAtThresholdEC2.html
As answered already, use an ELB to monitor HTTP.
This is the list of available metrics for ELB:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/US_MonitoringLoadBalancerWithCW.html#available_metrics
To answer your specific question, for monitoring X seconds for the http response, you would set up an alarm to monitor the ELB "Latency".
CloudWatch monitoring is just like you have discovered. You will be able to infer that one of your instances is frozen by taking a look at the metrics, but CloudWatch won't e.g. send you an email when your app is down or too slow, for example.
If you are looking for some sort of notification when your app or instance is down, I suggest you to use a monitoring service. Pingdom is a good option. You can also set up a new instance on AWS and install a monitoring tool, like Nagios, which would be my preferred option.
Good practices that are always worth, in the long road: using load balancing (Amazon ELB), more than one instance running your app, Autoscaling (when an instance is down, Amazon will automatically start a new one and maintain your SLA), and custom monitoring.
My team has used a custom monitoring script for a long time, and we always knew of failures as soon as they occurred. Basically, if we had two nodes running our app, node 1 sent HTTP requests to node 2 and node 2 to 1. If any request took more than expected, or returned an unexpected HTTP status or response body, the script sent an email to the system admins. Nowadays, we rely on more robust approaches, like Nagios, which can even monitor operating system stuff (threads, etc), application servers (connection pools health, etc) and so on. It's worth every cent invested in setting it up.
CloudWatch recently added "status check" metrics that will answer one of your questions on whether an instance is down or not. It will not do a request to your Web server but rather a system check. As previous answer suggest, use ELB for HTTP health checks.
You could always have another instance for tools/testing, that instance would try the http request based on a schedule and measure the response time, then you could publish that response time with CloudWatch and set an alarm when it goes over a certain threshold.
You could even do that from the instance itself.
As Kurst Ursan mentioned above, using "Status Check" metrics is the way to go. In some cases you won't be able to browse that metrics (i.e if you;re using AWS OpsWorks), so you're going to have to report that custom metric on your own. However, you can set up an alarm built on a metric that always matches (in an OK sate) and have the alarm trigger when the state changes to "INSUFFICIENT DATA" state, this technically means CloudWatch can't tell whether the state is OK or ALARM because it can't reach your instance, AKA your instance is offline.
There are a bunch of ways to get instance health info. Here are a couple.
Watch for instance status checks and EC2 events (planned downtime) in the EC2 API. You can poll those and send to Cloudwatch to create an alarm.
Create a simple daemon on the server which writes to DynamoDB every second (has better granularity than Cloudwatch). Have a second process query the heartbeats and alert when missing.
Put all instances in a load balancer with a dummy port open that that gives a TCP response. Setup TCP health checks on the ELB, and alert on unhealthy instances.
Unless you use a product like Blue Matador (automatically notifies you of production issues), it's actually quite heinous to set something like this up - let alone maintain it. That said, if you're going down the road, and want some help getting started using Cloudwatch (terminology, alerts, logs, etc), start with this blog: How to Monitor Amazon EC2 with CloudWatch
You can use CloudWatch Event Rule to Monitor whenever any EC2 instance goes down. You can create an Event rule from CloudWatch console as following :
In the CLoudWatch Console choose Events -> rule
For Event Pattern, In service Name Choose EC2
For Event Type, Choose EC2 Instance State-change Notification
For Specific States, Choose Stopped
In targets Choose any previously created SNS topic for sending a notification!
Source : Create a Rule - https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatch-Events-Input-Transformer-Tutorial.html#input-transformer-create-rule
This is not exactly a CloudWatch alarm, however this serves the purpose of monitoring/notification.