I am using a CloudWatch alarm to stop an EC2 instance. In my case I push log information from my EC2 instance to CloudWatch via a log group, and I filter that information with a metric filter whose pattern detects error messages caused by failed authentication. Unlike the standard approach of stopping an EC2 instance based on CPU utilization, I am using a custom metric (figure). I then configure the alarm action to stop the EC2 instance (figure).
But my alarm stays in the "Insufficient data" state all the time. Can anyone help me solve this so that my EC2 instance is stopped once the alarm fires (i.e. once the logs match the filter pattern)? Thanks a lot!
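Roughly, the configuration from the figures corresponds to something like the following in CLI form (a sketch only; the log group, filter pattern, metric names, region, and instance are placeholders, not my real values):
# Metric filter that turns matching log events into a custom metric
$ aws logs put-metric-filter \
    --log-group-name my-log-group \
    --filter-name failed-auth-filter \
    --filter-pattern '"authentication failure"' \
    --metric-transformations metricName=FailedAuthCount,metricNamespace=Custom/Auth,metricValue=1
# Alarm on that custom metric, with the EC2 stop action attached
$ aws cloudwatch put-metric-alarm \
    --alarm-name stop-on-failed-auth \
    --namespace Custom/Auth \
    --metric-name FailedAuthCount \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions arn:aws:automate:us-east-1:ec2:stop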
On the Personal Health Dashboard in the AWS Console, I got this notification yesterday:
EC2 persistent instance retirement scheduled
It says that one of my EC2 instances is scheduled to retire on 13 March 2019. The status was 'upcoming', while the start and end times were both set to 14-Mar-2019.
The content of the notification starts with:
Hello,
EC2 has detected degradation of the underlying hardware hosting your Amazon EC2 instance (instance-ID: i-xxxxxxxxxx) associated with your AWS account (AWS Account ID: xxxxxxxxxx) in the xxxx region. Due to this degradation your instance could already be unreachable. We will stop your instance after 2019-03-13 00:00 UTC.
....
I got yet another notification today for the same instance, with the same subject line, but the status has changed to 'ongoing' and the start time is 27-Feb-2019 while the end time is 14-Mar-2019.
I was planning to do a stop/start of the instance next week, but does the second notification mean I should do it ASAP?
Yes, it is better to do the stop/start ASAP. Your notification even says:
Due to this degradation your instance could already be unreachable
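For reference, a minimal AWS CLI sketch (the instance ID is a placeholder). Note that it has to be a full stop followed by a start, not a reboot, because only a stop/start moves the instance to different underlying hardware:
# Stop the instance, wait until it is fully stopped, then start it again
$ aws ec2 stop-instances --instance-ids i-xxxxxxxxxx
$ aws ec2 wait instance-stopped --instance-ids i-xxxxxxxxxx
$ aws ec2 start-instances --instance-ids i-xxxxxxxxxx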
I have a Google Compute Engine VM Instance that runs Ubuntu 16.04.
I stopped the instance more than an hour ago but it still hasn't stopped.
It says "Unknown error." on notification panel.
I can't perform any action on the VM right now because the process is still ongoing. It doesn't let me stop, start, reset, delete, or access it.
How can I force-stop the instance? I need to access the disk attached to the VM, but I can't even create a snapshot before the VM stops.
Edit:
The instance stopped after being stuck in the "stopping" state for 1.5 hours, but I still couldn't find a way to force-stop an instance.
You can take a look at the article Stopping or Deleting an Instance, or you can use the following gcloud command-line invocation to stop the instance temporarily so you can come back to it at a later time:
$ gcloud compute instances stop INSTANCE_NAMES [INSTANCE_NAMES …]
If the gcloud command line is unable to stop the instance, you can look at the activity log or Stackdriver logs, which will help you debug further. At the same time, I would recommend running the gcloud command with the '--verbosity=debug' flag. The activity logs, the Stackdriver logs, or the gcloud command with '--verbosity=debug' will give you better output that shows at what point it is failing.
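For example (the instance name and zone are placeholders):
$ gcloud compute instances stop my-instance --zone=us-central1-a --verbosity=debug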
My question is about settings when monitoring AWS metrics with Stackdriver.
I tried the steps below, but the alert (policy) is not working.
How do I get an alert (policy) to fire with group settings?
I don't want single-instance monitoring; I want group settings.
I completed the Stackdriver monitoring setup for my AWS account using role settings. Next, I set up a group-based alert (policy) on the metrics below:
load average > 5
disk usage > 80%
The targets are several EC2 instances, configured as a group.
I completed these settings and then ran a stress test.
I looked at the metrics, and the graph exceeded the threshold.
But no alert (policy) was sent, and no incident was opened.
Details are below.
Alert(Policy) Creation
Go to [Alerting / Policies / TARGET POLICY].
Click [Add Condition], then select [Metric Threshold].
RESOURCE TYPE is Instance(EC2)
APPLIES TO is Group
Select the group. This group includes the EC2 instances.
CONDITION TRIGGERS IF: Any Member Violates
IF METRIC is [CPU Load Average (past 1m)]
CONDITION is above
THRESHOLD is 5 load
FOR is 1 minute
Enter a name and click [Save Policy].
Stress test
SSH to the target instances.
Run the stress test (a sketch is shown after this list).
Confirm that the load average reached 5.
But no alert (policy) was sent.
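For reference, the load was generated with something like the following (a sketch; the stress tool may need to be installed separately, e.g. from EPEL on Amazon Linux):
# Drive the load average above the threshold of 5 for ten minutes
$ stress --cpu 8 --timeout 600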
Checking Stackdriver
I confirmed on the alert settings page that the load average reached 5.
But no incident was opened.
Other settings I tried
For GCP instances, alerts work correctly with both group and single-instance settings.
For AWS instances, alerts work in a single-instance configuration, but not with group settings.
Version info
stackdriver
stackdriver-agent version: stackdriver-agent.x86_64 5.5.2-366.amzn1
aws
OS: Amazon Linux
VERSION: 2016.03
ID_LIKE: rhel fedora
If you need more detail, please ask in the comments.
If the agent wasn't configured correctly and is sending metrics to the wrong project, that could lead to the behavior described, where monitoring works for single instances but not for a group of instances. It might work for GCP because monitoring GCE instances requires zero setup. When the metrics land in the wrong project, any alerts that use group filters will not fire.
https://cloud.google.com/monitoring/agent/troubleshooting#verify-project
"If you are using an Amazon EC2 VM instance, or if you are using private-key credentials on your Google Compute Engine instance, then the credentials could be invalid or they could be from the wrong project. For AWS accounts, the project used by the agent must be the AWS connector project, typically named "AWS Link..."."
The instructions at https://cloud.google.com/monitoring/agent/troubleshooting#verify-running help verify that the agent is sending metrics correctly.
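As a quick check (a sketch, assuming the agent was installed with private-key credentials in the default location), verify that the project in the agent's credentials file is the AWS connector project, then restart the agent:
# The project_id here should be the AWS connector project (typically named "AWS Link...")
$ sudo grep project_id /etc/google/auth/application_default_credentials.json
$ sudo service stackdriver-agent restart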
I'm trying to set up an Elastic Beanstalk application with Amazon Web Services, but I'm receiving a load of errors with the message "None of the instances are sending data". I've tried deleting the Elastic Beanstalk application and the EC2 instance several times with the sample application and trying again, but I get the same error.
I also tried uploading a flask application with AWS Elastic Beanstalk command line tools but then I received the error below:
Environment health has transitioned from Pending to Severe. 100.0 % of the requests to the ELB are failing with HTTP 5xx. Insufficient request rate (0.5 requests/min) to determine application health (7 minutes ago). ELB health is failing or not available for all instances. None of the instances are sending data
Why do I get this error and how do I fix it? Thanks.
You are using Enhanced Health Monitoring.
With enhanced health monitoring, an agent installed on your EC2 instance monitors vital system- and application-level health metrics and sends them directly to Elastic Beanstalk.
When you see an error message like "None of the instances are sending data", it means either that the agent on the instance has crashed or that it is unable to post data to Elastic Beanstalk due to a networking error or some other problem.
For debugging this, I would recommend downloading "Full logs" from the AWS console. You can follow the instructions for getting logs in the section "Downloading Bundle Logs from Elastic Beanstalk Console" here.
If you are unable to download logs using the console for any reason you can also ssh to the instance and look at the logs in /var/log.
You will find logs for the health agent in /var/log/healthd/daemon.log.
Additional logs useful for this situation are /var/log/cfn-init.log, /var/log/eb-cfn-init.log and /var/log/eb-activity.log. Can you look at the logs and give more details of the errors you see?
This should hopefully give you more details regarding the error "None of the instances are sending data".
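For example, after SSHing to the instance, something like the following shows the most recent agent and bootstrap activity (default log locations on the Amazon Linux based platforms):
$ tail -n 100 /var/log/healthd/daemon.log
$ tail -n 100 /var/log/cfn-init.log /var/log/eb-cfn-init.log /var/log/eb-activity.log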
Regarding other health "causes" you are seeing:
Environment health has transitioned from Pending to Severe - This is because your environment health status is initially Pending. If the instances do not go healthy within the grace period, the health status transitions to Severe. In your case, since none of the instances is healthy or sending data, the health transitioned to Severe.
100.0 % of the requests to the ELB are failing with HTTP 5xx. Insufficient request rate (0.5 requests/min) to determine application health (7 minutes ago).
Elastic Beanstalk monitors other resources in addition to your EC2 instances when using enhanced health monitoring; for example, it monitors CloudWatch metrics for your ELB. This cause means that all requests sent to your environment CNAME/load balancer are failing with HTTP 5xx errors, and at the same time the request rate is very low (only 0.5 requests per minute). "7 minutes ago" means the information about the ELB metrics is slightly old: Elastic Beanstalk polls CloudWatch metrics every few minutes, so that data can be slightly stale, as opposed to the health data received directly from the EC2 instances, which is near real time. In your case, since the instances are not sending data, the only available source of health information is the ELB metrics, which are delayed by about 7 minutes.
ELB health is failing or not available for all instances
Elastic Beanstalk is looking at the health of your ELB, i.e. it is checking how many instances are in service behind the ELB. In your case, either all instances behind the ELB are out of service, or the health is not available for some other reason. You should double-check that your service role is correctly configured; you can read how to configure the service role correctly here or in the documentation. It is also possible that your application failed to start.
In your case I would suggest focusing on the first error "None of the instances are sending data". For this you need to look at the logs as outlined above. Let me know what you see in the logs. The agent is started fairly early in the bootstrap process on the instance. So if you see an error like "None of the instances are sending data", it is very likely that bootstrap failed or the agent failed to start for some reason. The logs should tell you more.
Also make sure you are using an instance profile with your environment. The instance profile allows the health agent running on your EC2 instance to authenticate with Elastic Beanstalk. If an instance profile is not associated with your environment, the agent will not be able to send data to Elastic Beanstalk. Read more about instance profiles with Elastic Beanstalk here.
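A quick way to check whether an instance profile is attached (the instance ID is a placeholder):
$ aws ec2 describe-instances --instance-ids i-xxxxxxxxxx \
    --query 'Reservations[].Instances[].IamInstanceProfile'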
Update
One common reason for the health cause "None of the instances are sending data" can be that your instance is in a VPC and your VPC does not allow NTP access. Typical indicator of this problem is the following message in /var/log/messages: ntpdate: Synchronizing with time server: [FAILED]. When this happens the clock on your EC2 instance can get out of sync and the data is considered invalid. You should also see a health cause on the instances on the health page on the AWS web console that tells you that instance clock is out-of-sync. The fix is to make sure that your VPC allows access to NTP.
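A quick check on the instance itself (a sketch; this assumes the classic Amazon Linux tooling mentioned above):
# Look for the failed time-sync message quoted above
$ grep ntpdate /var/log/messages
# Once the VPC allows NTP access, force a manual sync
$ sudo ntpdate pool.ntp.org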
There can be many reasons why the health agent is not able to send any data, so this may not be the answer to your problem, but it was to mine and hopefully can help somebody else:
I got the same error and looking into /var/log/healthd/daemon.log the following was repeatedly reported:
sending message(s) failed: (Aws::Healthd::Errors::GroupNotFoundException) Group 97c30ca2-5eb5-40af-8f9a-eb3074622172 does not exist
This was caused by me making and using an AMI image from an EC2 instance inside an Elastic Beanstalk environment. That is, I created a temporary environment with one instance and the same configuration as my production environment, went into the EC2 console and created an image of the instance, terminated the temporary environment, and then created yet another environment using the new custom AMI.
Of course (in hindsight) this meant some settings of the temporary environment were still being used, in this case specifically /etc/healthd/config.yaml, resulting in the health agent trying to send messages to a health group that no longer existed.
To fix this and make sure there was no other stale configuration around, I instead started a new EC2 instance by hand from the default AMI used in the production environment (find it under the 'Instances' configuration page of your environment), provision that, then create a new image from that and use that image in my new EB environment.
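If you suspect the same problem, the stale group ID is visible directly in the config file mentioned above:
# The group here should match the current environment's health group, not the old one
$ cat /etc/healthd/config.yaml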
Check whether your instance type's RAM is enough for the app + OS + Amazon tooling. We suffered from this for a long time, until we discovered that t2.micro is barely enough for our use cases. The problem went away right after switching to t2.small (2 GB).
I solved this by adding another security group (the default one for my Elastic Beanstalk).
It appears my problem was that I didn't associate a public IP address with my instance; after I assigned one, it worked just fine.
I was running an app in an Elastic Beanstalk environment with Docker as the platform. I got the same error that none of the instances are sending data, and I was unable to fetch logs as well.
Rebuilding the environment worked for me.
I just set the health check Path under Load Balancing to a URL that responds with status code 200; this was only for a study environment.
For my real app, I use Actuator.
If you see something like this where you don't get any enhanced metrics, check that you haven't accidentally removed the conf.d/elasticbeanstalk/healthd.conf include from your nginx config. This conf adds a machine-readable log format that is responsible for reporting that data in EB (see Enhanced health log format - AWS).
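A quick way to check for the include on the instance (a sketch):
$ grep -R healthd /etc/nginx/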
My instance profile's IAM role was lacking the elasticbeanstalk:PutInstanceStatistics permission.
I found this by looking at /var/log/healthd/daemon.log as suggested in one of the other answers.
I had to SSH into the machine directly to discover this, as the Get Logs function itself was failing due to missing S3 Write permissions.
If you're running a Worker Tier EB environment, you need to add this policy:
arn:aws:iam::aws:policy/AWSElasticBeanstalkWorkerTier
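For example, the managed policy can be attached to the instance profile's role with the AWS CLI (the role name below is the default Beanstalk EC2 role; substitute your own):
$ aws iam attach-role-policy \
    --role-name aws-elasticbeanstalk-ec2-role \
    --policy-arn arn:aws:iam::aws:policy/AWSElasticBeanstalkWorkerTier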
For anyone arriving here in 2022…
After launching a new environment that was identical to a current healthy environment and seeing no data, I raised an AWS Support ticket. I was informed:
Here, I would like to inform you that recently Elastic Beanstalk introduced new feature called EnhancedHealthAuthEnabled to increase security of your environment and help prevent health data spoofing on your behalf and this option will be enabled by default when you create new environment.
If you use managed policies for your instance profile, this feature is available for your new environment without any further configuration as Elastic Beanstalk instance profile managed policies contain permissions for the elasticbeanstalk:PutInstanceStatistics action. However, If you use a custom instance profile instead of a managed policy, your environment might display a No Data health status. This happens because custom instance profile doesn't PutInstanceStatistics permission by default and instances aren't authorised for the action that communicates enhanced health data to the service. Hence, your environment health shows Unknown/No data status.
The policy that I needed to attach to my existing EC2 role (as advised by AWS Support) looked like:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ElasticBeanstalkHealthAccess",
            "Action": [
                "elasticbeanstalk:PutInstanceStatistics"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:elasticbeanstalk:*:*:application/*",
                "arn:aws:elasticbeanstalk:*:*:environment/*"
            ]
        }
    ]
}
Adding this policy to my EC2 role solved the issue for me.
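If you are doing this from the CLI rather than the console, one way to attach the statement above as an inline policy is (role name and file name are placeholders):
$ aws iam put-role-policy \
    --role-name my-custom-eb-instance-role \
    --policy-name ElasticBeanstalkHealthAccess \
    --policy-document file://health-access.json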
In my case, the issue was resolved when I increased the RAM by changing the instance type (t2.micro to c5.xlarge).