Amazon EC2 ELB alarm - which instance is unhealthy? - amazon-web-services

We have hosted some apps on Amazon EC2 and are using an Elastic Load Balancer (ELB) to manage several instances of one app. Also, we have set up ELB alarms to get notified about Unhealthy Hosts, i.e. when an instance has gone down.
So far, I could not figure out where to check which instance exactly has gone down when the alarm goes off, except for the ELB status page in the AWS console. However, if the instance comes back to In Service state again, this won't help me either.
The e-mail notification sent out by the ELB does not contain this information; and I couldn't find it anywhere in the alarms history in the console either.
Is there a way to tell which instance an ELB alarm has been triggered for, even if the instance has come back into OK state in the meantime?
Cheers, Alex

Sadly, Amazon does not provide a health check log, so it's impossible to find out afterwards which instance failed the health check once the server is healthy again. The most you can do is use the per-AZ metrics to find out which Availability Zone the instance is in.
However, you can find out which instance is down if you query the AWS API while the issue is happening. So I have thought of a possible workaround:
Set up a new SNS topic, and add an HTTP subscription to a custom URL that triggers a job that enumerates the instances and sends you that info by mail.
Then set up a CloudWatch alarm for UnHealthyHostCount > 0 and set its action to the SNS topic.
The difficult part is that your URL has to handle the SNS subscription and confirmation described here.
The command to find out which instances are currently OutOfService is:
elb-describe-instance-health *LoadBalancerName* --region *YourRegion*
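If you would rather do the same query from a script, here is a minimal Python sketch. It assumes boto3 is installed and that you are on a Classic ELB; the function names are my own. The filter is kept separate from the API call so it can be tried on a saved response.

```python
def out_of_service(instance_states):
    """Return (instance_id, description) pairs for instances that are not InService."""
    return [
        (state["InstanceId"], state.get("Description", ""))
        for state in instance_states
        if state.get("State") != "InService"
    ]


def fetch_instance_states(load_balancer_name, region):
    """Ask the Classic ELB API for the current health of every registered instance."""
    # boto3 is imported lazily so the filter above works without it installed.
    import boto3

    elb = boto3.client("elb", region_name=region)
    response = elb.describe_instance_health(LoadBalancerName=load_balancer_name)
    return response["InstanceStates"]
```

Something like out_of_service(fetch_instance_states("my-elb", "eu-west-1")) would then give you the list to put in the mail; both argument values here are placeholders.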

You could probably use the AWS SDK gem or another AWS library that can get the status. Use it to create a cron task that regularly gets the status of each instance and records it somewhere. Either that will give you what you need, or the disappearance of the status for one instance will tell you which one went bad.
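As a rough illustration of the "record it and look for what changed" idea, this Python sketch compares two snapshots taken by consecutive cron runs. The snapshot format ({instance_id: state}) and the function name are assumptions of mine; how you fetch and store the snapshots is up to you.

```python
def changed_instances(previous, current):
    """Compare two {instance_id: state} snapshots.

    Returns {instance_id: (old_state, new_state)}. A state of None means the
    instance was missing from that snapshot.
    """
    changes = {}
    for instance_id in sorted(set(previous) | set(current)):
        old = previous.get(instance_id)
        new = current.get(instance_id)
        if old != new:
            changes[instance_id] = (old, new)
    return changes
```

An instance that flips to OutOfService, or that disappears from the listing entirely, shows up in the result either way.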

We are using the following Lambda function to make up for the lack of Health Check logging:
'use strict';
const AWS = require('aws-sdk');
const elb = new AWS.ELB();

// Triggered by the alarm's SNS message; logs the health of every instance behind the ELB.
exports.handler = (event, context, callback) => {
    const params = {
        LoadBalancerName: "<elb_name_here>"
    };
    elb.describeInstanceHealth(params, (err, data) => {
        if (err) {
            console.log(err, err.stack); // an error occurred
            return callback(err);
        }
        console.log(data); // successful response
        callback(null, data);
    });
};
It does not produce the prettiest logs in CloudWatch, but the data is there. It allows us to see if there is a particular instance which tends to drop more often, etc. It is set up much like Gerardo Grignoli's answer above. I added a CloudWatch alarm to send an SNS message to the Lambda function when the alarm was triggered. It doesn't do anything with the message itself - the message is merely the triggering mechanism for the Lambda function to run and log the instance status.

No. The ELB metrics in CloudWatch do not provide that level of detail, and IMHO, from a design perspective, they should not. If a host is unhealthy, the monitoring on that specific host should report the details, not the ELB. If a node goes out of service, that should not be a problem for the ELB. It does make sense, though, to alarm at the load balancer level on a state such as 3 out of 6 of your machines going out of service. Take a look at the CloudWatch metrics.

In the EC2 console, go to Load Balancers and find the load balancer in question. Then look at the instances that are OutOfService.

Related

Health check endpoint for AWS

I'm new to serverless and AWS so I'm unsure how to have a health check endpoint /healthcheck for my actual processing Lambda or if it's even needed at all. I want to be able to check the health even without access to AWS account and without calling the actual Lambda. I'm using just a simple workflow of API Gateway > Lambda > DynamoDB. As far as I understand, it is possible for the service to be down in all 3 stages.
I know of Route 53 but I don't think it fits what I want because it calls the endpoint repeatedly and I think access to AWS account is needed as well.
It is possible to have a /healthcheck Lambda that just returns that the endpoint is up; if the service is down, nothing would be returned. But this does not seem like the correct approach, since the endpoint can never report "down".
Maybe AWS health API to report public health events would work but it seems like it works in the reverse manner - report when there's an issue instead of having an endpoint to check myself. Is that the recommended method to check health for serverless?
You keep mentioning Lambda as an entire service, so if that is what you mean, then AWS operates a regional health page by service: https://status.aws.amazon.com/
You can also use the Health API https://docs.aws.amazon.com/health/latest/ug/monitoring-logging-health-events.html to return a status of 'healthy' unless it finds an entry for Lambda (or whichever service) that indicates it is unhealthy.
If you are looking instead to deploy a Lambda function that says 'I am alive and can access the specific resources I need', then perhaps you should develop a simple function to deploy at /healthcheck that has the same permissions as the real function and performs some small actions, such as writing and checking a dummy value in DynamoDB, to make sure it can access, read, modify, and delete it, or whatever else it is supposed to do there. It could also return some simple stats on the DynamoDB table that are recorded in CloudWatch, to show you the health of the table more simply than by searching in the console
(https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/metrics-dimensions.html)
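To make the dummy-value idea concrete, here is a hedged Python sketch of such a /healthcheck function. It assumes a boto3 DynamoDB Table resource whose partition key is a string attribute called "id"; that key name, the probe id, and the response shape are my assumptions, not anything from your setup.

```python
import json
import time


def check_table(table, probe_id="healthcheck-probe"):
    """Write, read back, and delete a dummy item; return True if all three worked."""
    key = {"id": probe_id}  # assumed key schema
    stamp = str(time.time())
    try:
        table.put_item(Item={**key, "checked_at": stamp})
        # Strongly consistent read, so the item just written is visible.
        item = table.get_item(Key=key, ConsistentRead=True).get("Item")
        table.delete_item(Key=key)
    except Exception:
        return False
    return item is not None and item.get("checked_at") == stamp


def make_handler(table):
    """Build a Lambda handler that reports 200 or 503 based on the table check."""
    def handler(event, context):
        healthy = check_table(table)
        return {
            "statusCode": 200 if healthy else 503,
            "body": json.dumps({"dynamodb": "ok" if healthy else "unreachable"}),
        }
    return handler
```

In the deployed function you would create the table object once, for example with boto3.resource("dynamodb").Table(...) and your own table name, and export make_handler(table) as the handler. Note that this only tells you the Lambda and DynamoDB legs work; if API Gateway itself is down, the caller simply gets no answer.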

AWS SNS not sending Subscription Confirmation

I have set up AWS SNS with a topic, say 'A'. I'm subscribing to this SNS topic over HTTP (I tried both manually in the AWS console and from Java code). All I get is 'pending confirmation' in both cases, and SNS never sends the initial 'SubscriptionConfirmation' to the provided URL.
Note that my endpoint is ready to receive HTTP POST notifications. When I manually do a POST from my side, I see my servlet processing the JSON I send. For some reason I receive nothing from AWS SNS.
Note that the HTTP endpoint I used for the subscription is public facing, so SNS should have no issue reaching it.
Any input is appreciated.
Here is my subscribe function.
public String subscribe(String arn, String url) {
    if (arn == null || arn.isEmpty()) {
        arn = topicArn; // fall back to the default topic
    }
    SubscribeRequest subRequest = new SubscribeRequest(arn, "http", url);
    SubscribeResult result = snsClient.subscribe(subRequest);
    if (result != null) {
        LOGGER.info("SubscribeResult - " + result.toString());
    }
    // get request id for SubscribeRequest from SNS metadata
    LOGGER.info("SubscribeRequest - " + snsClient.getCachedResponseMetadata(subRequest));
    return String.valueOf(result); // avoids an NPE if result is null
}
You are always going to get "pending confirmation" as the subscriptionArn in the response. Confirmation happens asynchronously, as a separate process. To make this even more confusing, if you list the current subscriptions they show a slightly different subscriptionArn of "PendingConfirmation", so you cannot even match it later.
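For reference, this is roughly what the endpoint has to do once the confirmation POST does arrive. It is a Python sketch rather than servlet code, the function names are mine, and it skips signature verification, which a real endpoint should do before trusting the message.

```python
import json
import urllib.request


def classify_sns_message(raw_body):
    """Return (message_type, payload) for the raw body of an SNS HTTP POST.

    For a SubscriptionConfirmation the payload is the SubscribeURL; for a
    Notification it is the message text.
    """
    message = json.loads(raw_body)
    kind = message.get("Type")
    if kind == "SubscriptionConfirmation":
        return kind, message["SubscribeURL"]
    if kind == "Notification":
        return kind, message.get("Message", "")
    return kind, None


def confirm_subscription(subscribe_url):
    """Visiting the SubscribeURL is what moves the subscription out of pending."""
    with urllib.request.urlopen(subscribe_url, timeout=10) as response:
        return response.status == 200
```

SNS sends the body as JSON even though the Content-Type header is text/plain, so parse the raw body yourself instead of relying on the framework to do it.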
As far as being able to connect, I would try hitting an endpoint outside of AWS first. By default most AWS elements are very locked down and cannot even connect to each other, so there is likely a security setting somewhere that needs to be changed to let SNS connect to your EC2 instance. That would explain why you can connect to the instance from outside AWS but SNS cannot.
Also check that the SNS topic and the EC2 instance you are using are in the same region. It is a common cause of connection issues.
If you are using a host name to connect, I would try using the IP address directly to see if it gets through.
To troubleshoot, you should turn on the "Delivery status" reports in topic actions - https://docs.aws.amazon.com/sns/latest/dg/sns-msg-status.html. Then you will see why the confirmation message failed to be sent from AWS side.
On the EC2 side, at the network level, you must make sure that the port you are listening on is reachable from outside. That means both opening the port in the firewall (security group settings) and making sure the IP is reachable (i.e., that the VPC the machine is located in is publicly visible).
I faced the same issue, the region was the problem.
Make sure the SNS, CloudWatch and EC2 are in the same region.
For me disabling encryption on the topic allowed the emails to finally be delivered, albeit to the spam folder.

Get email notifications when an EC2 instance is terminated

I need to receive notifications whenever my instance is terminated. I know it can be done with CloudTrail, and then using SNS and SQS to get an email when a termination event is received.
Is there a simpler way to do that?
Any solution is appreciated, but I would prefer one using boto.
While it is not possible to receive a notification directly from Amazon EC2 when an instance is terminated, there are a couple of ways this could be accomplished:
Auto Scaling can send a notification when an instance managed by Auto Scaling is terminated. See: Configure Your Auto Scaling Group to Send Notifications
AWS Config can also be configured to send a Simple Notification Service (SNS) notification when resources change. This would send many notifications, so you would need to inspect and filter the notifications to find the one(s) indicating an instance termination. See the SNS reference in: Set Up AWS Config Using the Console and Example Amazon SNS Notification and Email from AWS Config.
Amazon Simple Notification Service (SNS) can also push a message to Amazon Simple Queue Service (SQS), which can easily be polled with the boto Python SDK.
Receiving notifications via CloudTrail and CloudWatch Logs is somewhat messier, so I'd recommend the AWS Config method.
AWS has since introduced "Rules" under "Events" in CloudWatch. In your case, you can select EC2 as the event source and SNS or SQS as targets.
https://aws.amazon.com/blogs/aws/new-cloudwatch-events-track-and-respond-to-changes-to-your-aws-resources/
According to the AWS doc Spot Instance Interruptions, it is possible to poll the instance metadata to get an approximate termination time. You can build any custom monitoring solution around that.
> curl http://169.254.169.254/latest/meta-data/spot/instance-action
{"action": "stop", "time": "2017-09-18T08:22:00Z"}
If the instance is not scheduled for termination, an HTTP 404 will be returned.
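A polling loop around that endpoint could start from something like this Python sketch. It treats any non-200 response as "nothing scheduled" and assumes the instance still allows unauthenticated (IMDSv1) metadata requests; with IMDSv2 enforced you would need to fetch a session token first. The function names are my own.

```python
import json
import urllib.error
import urllib.request

INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def parse_instance_action(status, body):
    """Return (action, time) if an interruption is scheduled, otherwise None."""
    if status != 200:
        return None
    notice = json.loads(body)
    return notice["action"], notice["time"]


def poll_once(url=INSTANCE_ACTION_URL, timeout=2):
    """Query the metadata endpoint once; only meaningful from inside the instance."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_instance_action(response.status, response.read())
    except urllib.error.HTTPError as error:
        # An error status means no instance-action document is present.
        return parse_instance_action(error.code, b"")
```

Calling poll_once() every few seconds from a small daemon and sending a mail the first time it returns something other than None would be one way to build on it.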

Can I use AWS CloudWatch to hit a status URI?

Is it possible to use CloudWatch or other AWS services to hit a URI, e.g. www.mysite.com/status, and send me error alerts when that doesn't return a 200 result? I want service-level monitoring for a small site (and don't want to do any work).
Ideally, I'd like to hit the /status endpoint on a particular EC2 host, with the HTTP hostname parameter set.
Thanks in advance.
edit: I recall something similar is available in auto-scaling groups, where hosts are automatically taken down if they don't meet health checks. I'm looking for something similar, but I just want email, not hosts taken down. (Since I'm working on small sites on a shared host.)
You can't do it directly from CloudWatch, but you could set up a monitor on a separate server, construct the test, and then send a custom metric to CloudWatch using the CLI tools. Custom metrics (and the CloudWatch CLI) are covered here:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html
From a separate server you could then run a simple script which tries to load your health page, and sends 0 for healthy, 1 for unhealthy, or whatever works for you, to CloudWatch.
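Here is a small Python sketch of such a script, using the 0 = healthy, 1 = unhealthy convention from above. The metric name and namespace are made up for illustration, and publishing assumes boto3 is installed with credentials that allow cloudwatch:PutMetricData.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5):
    """Return the HTTP status code, or None if the request failed outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (urllib.error.URLError, OSError):
        return None


def metric_datum(status, name="StatusPageUnhealthy"):
    """Build one CloudWatch datum: 0 when the page returned 200, otherwise 1."""
    return {"MetricName": name, "Value": 0 if status == 200 else 1, "Unit": "Count"}


def publish(url, namespace="Custom/SiteHealth"):
    """Probe the URL and push the result to CloudWatch as a custom metric."""
    import boto3  # imported lazily; only needed when actually publishing

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=[metric_datum(probe(url))]
    )
```

Run publish(...) from cron once a minute, then put a CloudWatch alarm on the metric being greater than 0 and point it at an SNS topic with your email subscribed.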
Doing this with CloudWatch and SNS is not straightforward. You could do it with Route 53 and DNS failover, but for what you need, have a look at Pingdom. They have a free plan somewhere if you search for it.

How do I set up CloudWatch to detect when an EC2 instance goes down?

I've got an app running on AWS. How do I set up Amazon CloudWatch to notify me when the EC2 instance fails or is no longer responsive?
I went through the CloudWatch screens, and it appears that you can monitor certain statistics, like CPU or disk utilization, but I didn't see a way to monitor an event like "the instance got an http request and took more than X seconds to respond."
Amazon's Route 53 Health Check is the right tool for the job.
Route 53 can monitor the health and performance of your application as well as your web servers and other resources.
You can set up HTTP resource checks in Route 53 that will trigger an e-mail notification if the server is down or responding with an error.
http://eladnava.com/monitoring-http-health-email-alerts-aws/
To monitor an event in CloudWatch you create an Alarm, which monitors a metric against a given threshold.
When creating an alarm you can add an "action" for sending a notification. AWS handles notifications through SNS (Simple Notification Service). You can subscribe to a notification topic and then you'll receive an email for your alarm.
For EC2 metrics like CPU or disk utilization this is the guide from the AWS docs: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/US_AlarmAtThresholdEC2.html
As answered already, use an ELB to monitor HTTP.
This is the list of available metrics for ELB:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/US_MonitoringLoadBalancerWithCW.html#available_metrics
To answer your specific question, for monitoring X seconds for the http response, you would set up an alarm to monitor the ELB "Latency".
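If you prefer to create that alarm from code, the parameters look roughly like this in Python. The alarm name, the 60-second period, and the three evaluation periods are arbitrary choices of mine; the Latency metric of a Classic ELB is reported in seconds.

```python
def latency_alarm(load_balancer_name, topic_arn, threshold_seconds):
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"{load_balancer_name}-high-latency",
        "Namespace": "AWS/ELB",
        "MetricName": "Latency",
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "Statistic": "Average",
        "Period": 60,  # seconds per datapoint
        "EvaluationPeriods": 3,  # consecutive breaching datapoints before ALARM
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic that sends the notification
    }


def create_alarm(load_balancer_name, topic_arn, threshold_seconds):
    """Create or update the alarm; requires boto3 and AWS credentials."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(
        **latency_alarm(load_balancer_name, topic_arn, threshold_seconds)
    )
```

The same shape works for UnHealthyHostCount: swap the metric name and use a threshold of 0.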
CloudWatch monitoring is just like you have discovered. You will be able to infer that one of your instances is frozen by looking at the metrics, but CloudWatch won't, for example, send you an email when your app is down or too slow.
If you are looking for some sort of notification when your app or instance is down, I suggest you use a monitoring service. Pingdom is a good option. You can also set up a new instance on AWS and install a monitoring tool, like Nagios, which would be my preferred option.
Good practices that always pay off in the long run: using load balancing (Amazon ELB), more than one instance running your app, Auto Scaling (when an instance is down, Amazon will automatically start a new one and maintain your SLA), and custom monitoring.
My team used a custom monitoring script for a long time, and we always knew of failures as soon as they occurred. Basically, if we had two nodes running our app, node 1 sent HTTP requests to node 2 and node 2 to node 1. If any request took longer than expected, or returned an unexpected HTTP status or response body, the script sent an email to the system admins. Nowadays, we rely on more robust approaches, like Nagios, which can even monitor operating system stuff (threads, etc.), application servers (connection pool health, etc.) and so on. It's worth every cent invested in setting it up.
CloudWatch recently added "status check" metrics that will answer one of your questions, namely whether an instance is down or not. It will not make a request to your web server but rather perform a system check. As a previous answer suggests, use an ELB for HTTP health checks.
You could always have another instance for tools/testing, that instance would try the http request based on a schedule and measure the response time, then you could publish that response time with CloudWatch and set an alarm when it goes over a certain threshold.
You could even do that from the instance itself.
As Kurst Ursan mentioned above, using "Status Check" metrics is the way to go. In some cases you won't be able to browse those metrics (e.g. if you're using AWS OpsWorks), so you're going to have to report a custom metric on your own. However, you can set up an alarm built on a metric that always matches (in an OK state) and have the alarm trigger when the state changes to "INSUFFICIENT DATA". Technically, this means CloudWatch can't tell whether the state is OK or ALARM because it can't reach your instance, i.e. your instance is offline.
There are a bunch of ways to get instance health info. Here are a couple.
Watch for instance status checks and EC2 events (planned downtime) in the EC2 API. You can poll those and send them to CloudWatch to create an alarm.
Create a simple daemon on the server which writes to DynamoDB every second (this has better granularity than CloudWatch). Have a second process query the heartbeats and alert when one is missing.
Put all instances in a load balancer with a dummy port open that gives a TCP response. Set up TCP health checks on the ELB, and alert on unhealthy instances.
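For the heartbeat option, the checking side boils down to a comparison like the one in this Python sketch. It assumes the second process has already read the latest heartbeat timestamp per host into a dict; the five-second tolerance is an arbitrary value of mine.

```python
def missing_heartbeats(last_seen, now, max_age_seconds=5):
    """Return the hosts whose most recent heartbeat is older than max_age_seconds.

    last_seen maps host name to the Unix timestamp of its latest heartbeat.
    """
    return sorted(
        host
        for host, seen_at in last_seen.items()
        if now - seen_at > max_age_seconds
    )
```

Passing time.time() as now on each run and alerting on a non-empty result is enough for a first version.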
Unless you use a product like Blue Matador (which automatically notifies you of production issues), it's actually quite painful to set something like this up, let alone maintain it. That said, if you're going down this road and want some help getting started with CloudWatch (terminology, alerts, logs, etc.), start with this blog: How to Monitor Amazon EC2 with CloudWatch
You can use a CloudWatch Events rule to monitor whenever an EC2 instance goes down. You can create the rule from the CloudWatch console as follows:
In the CloudWatch console, choose Events -> Rules
For Event Pattern, under Service Name, choose EC2
For Event Type, choose EC2 Instance State-change Notification
For Specific States, choose Stopped
In Targets, choose a previously created SNS topic for sending the notification
Source : Create a Rule - https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatch-Events-Input-Transformer-Tutorial.html#input-transformer-create-rule
This is not exactly a CloudWatch alarm, however this serves the purpose of monitoring/notification.
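The console steps above correspond to an event pattern like the one built in this Python sketch. I have added "terminated" next to "stopped" as an assumption about what you want to catch; the rule name and target id are placeholders, and the API calls need boto3 plus credentials.

```python
import json


def state_change_pattern(states=("stopped", "terminated")):
    """Event pattern matching EC2 instances entering any of the given states."""
    return json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": list(states)},
    })


def create_rule(rule_name, topic_arn):
    """Create the rule and attach an existing SNS topic as its target."""
    import boto3  # imported lazily; only needed for the real API calls

    events = boto3.client("events")
    events.put_rule(Name=rule_name, EventPattern=state_change_pattern(), State="ENABLED")
    events.put_targets(Rule=rule_name, Targets=[{"Id": "notify-sns", "Arn": topic_arn}])
```

Keep in mind that the SNS topic's access policy has to allow events.amazonaws.com to publish to it, otherwise the rule fires but nothing is delivered.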