AWS: None of the Instances are sending data - amazon-web-services

I'm trying to set up an Elastic Beanstalk application on Amazon Web Services, but I'm receiving a load of errors with the message "None of the instances are sending data". I've tried deleting the Elastic Beanstalk application and the EC2 instance several times with the sample application and trying again, but I get the same error.
I also tried uploading a Flask application with the AWS Elastic Beanstalk command line tools, but then I received the error below:
Environment health has transitioned from Pending to Severe. 100.0 % of the requests to the ELB are failing with HTTP 5xx. Insufficient request rate (0.5 requests/min) to determine application health (7 minutes ago). ELB health is failing or not available for all instances. None of the instances are sending data
Why do I get this error and how do I fix it? Thanks.

You are using Enhanced Health Monitoring.
With enhanced health monitoring, an agent installed on your EC2 instance monitors vital system- and application-level health metrics and sends them directly to Elastic Beanstalk.
When you see an error message like "None of the instances are sending data", it means either the agent on the instance has crashed or it is unable to post data to Elastic Beanstalk due to a networking problem or some other error.
For debugging this, I would recommend downloading "Full logs" from the AWS console. You can follow the instructions for getting logs in the section "Downloading Bundle Logs from Elastic Beanstalk Console" here.
If you are unable to download logs using the console for any reason you can also ssh to the instance and look at the logs in /var/log.
You will find logs for the health agent in /var/log/healthd/daemon.log.
Additional logs useful for this situation are /var/log/cfn-init.log, /var/log/eb-cfn-init.log and /var/log/eb-activity.log. Can you look at the logs and give more details of the errors you see?
This should hopefully give you more details regarding the error "None of the instances are sending data".
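If you have the EB CLI configured for the environment, a minimal sketch for pulling those logs over SSH (the environment name my-env is a placeholder):
eb ssh my-env
# once on the instance:
tail -n 100 /var/log/healthd/daemon.log
tail -n 100 /var/log/eb-activity.log /var/log/cfn-init.log /var/log/eb-cfn-init.log
If SSH is not an option, eb logs --all requests the full log bundle instead.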
Regarding other health "causes" you are seeing:
Environment health has transitioned from Pending to Severe - This is because your environment health status is initially Pending. If the instances do not go healthy within the grace period, the health status transitions to Severe. In your case, since none of the instances is healthy or sending data, the health transitioned to Severe.
100.0 % of the requests to the ELB are failing with HTTP 5xx. Insufficient request rate (0.5 requests/min) to determine application health (7 minutes ago).
Elastic Beanstalk monitors other resources in addition to your EC2 instances when using enhanced health monitoring. For example, it monitors CloudWatch metrics for your ELB. This cause means that all requests sent to your environment CNAME/load balancer are failing with HTTP 5xx errors, while the request rate is very low (only 0.5 requests per minute). "7 minutes ago" means the information about ELB metrics is slightly old: Elastic Beanstalk polls CloudWatch metrics only every few minutes, so that data can be slightly stale, as opposed to the health data received directly from the EC2 instances, which is near real time. In your case, since the instances are not sending data, the only available source of health information is the ELB metrics, which are delayed by about 7 minutes.
ELB health is failing or not available for all instances
Elastic Beanstalk is looking at the health of your ELB, i.e. it is checking how many instances are in service behind the ELB. In your case either all instances behind the ELB are out of service or the health is not available for some other reason. You should double-check that your service role is correctly configured. You can read how to configure the service role correctly here or in the documentation. It is also possible that your application failed to start.
In your case I would suggest focusing on the first error "None of the instances are sending data". For this you need to look at the logs as outlined above. Let me know what you see in the logs. The agent is started fairly early in the bootstrap process on the instance. So if you see an error like "None of the instances are sending data", it is very likely that bootstrap failed or the agent failed to start for some reason. The logs should tell you more.
Also make sure you are using an instance profile with your environment. The instance profile allows the health agent running on your EC2 instance to authenticate with Elastic Beanstalk. If an instance profile is not associated with your environment, the agent will not be able to send data to Elastic Beanstalk. Read more about instance profiles with Elastic Beanstalk here.
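A quick way to confirm that an instance profile is attached, sketched with the AWS CLI (the instance id is a placeholder):
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].IamInstanceProfile'
# an empty result means no instance profile is associated with the instance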
Update
One common reason for the health cause "None of the instances are sending data" can be that your instance is in a VPC and your VPC does not allow NTP access. A typical indicator of this problem is the following message in /var/log/messages: ntpdate: Synchronizing with time server: [FAILED]. When this happens the clock on your EC2 instance can get out of sync and the data is considered invalid. You should also see a health cause on the instances on the health page of the AWS web console telling you that the instance clock is out of sync. The fix is to make sure that your VPC allows access to NTP.
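To check for this on the instance itself, a small sketch (assuming the older Amazon Linux platform where ntpdate is available):
grep ntpdate /var/log/messages        # look for "Synchronizing with time server: [FAILED]"
sudo ntpdate -u 0.amazon.pool.ntp.org # manual re-sync once the VPC allows NTP traffic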

There can be many reasons why the health agent is not able to send any data, so this may not be the answer to your problem, but it was to mine and hopefully can help somebody else:
I got the same error and looking into /var/log/healthd/daemon.log the following was repeatedly reported:
sending message(s) failed: (Aws::Healthd::Errors::GroupNotFoundException) Group 97c30ca2-5eb5-40af-8f9a-eb3074622172 does not exist
This was caused by me making and using an AMI image from an EC2 instance inside an Elastic Beanstalk environment. That is, I created a temporary environment with one instance using the same configuration as my production environment, went into the EC2 console and created an image of the instance, terminated the temporary environment, and then created yet another environment using the new custom AMI.
Of course (in hindsight) this meant some settings of the temporary environment were still being used, in this case specifically /etc/healthd/config.yaml, resulting in the health agent trying to send messages to a health group that no longer existed.
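A quick way to confirm this on a running instance is to compare the group id in the agent config with the one in the error (a sketch; paths as on the Amazon Linux Elastic Beanstalk platform):
cat /etc/healthd/config.yaml                      # group id baked into the AMI
grep GroupNotFound /var/log/healthd/daemon.log    # errors referencing the stale group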
To fix this and make sure there was no other stale configuration around, I instead started a new EC2 instance by hand from the default AMI used in the production environment (you can find it under the 'Instances' configuration page of your environment), provisioned it, created a new image from it, and used that image in my new EB environment.

Check whether your instance type's RAM is enough for the app + OS + Amazon tooling. We suffered from this for a long time before we discovered that t2.micro was barely enough for our use case. The problem went away right after switching to t2.small (2 GB).
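If you want to change the instance type without rebuilding the environment by hand, a sketch using the AWS CLI (my-env is a placeholder environment name; the namespace is the standard launch configuration namespace):
aws elasticbeanstalk update-environment --environment-name my-env \
  --option-settings Namespace=aws:autoscaling:launchconfiguration,OptionName=InstanceType,Value=t2.small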

I solved this by adding another security group (the default one for my Elastic Beanstalk).

It appears my problem was that I didn't associate a public IP address with my instance... after I set it, it worked just fine.

I was running an app in an Elastic Beanstalk environment with Docker as the platform. I got the same error that none of the instances were sending data, and I was unable to fetch logs as well.
Rebuilding the environment worked for me.

I just set the health check path under load balancing to a URL that responds with status code 200; this was only for a study environment.
For my real app, I use Actuator.
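For reference, the health check path can also be set from the CLI rather than the console. A sketch for a load-balanced environment (my-env and /health are placeholders; on older classic-ELB environments the equivalent option is "Application Healthcheck URL" in the aws:elasticbeanstalk:application namespace):
aws elasticbeanstalk update-environment --environment-name my-env \
  --option-settings Namespace=aws:elasticbeanstalk:environment:process:default,OptionName=HealthCheckPath,Value=/health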

If you see something like this where you don't get any enhanced metrics, check that you haven't accidentally removed the conf.d/elasticbeanstalk/healthd.conf include from your nginx config. This conf adds a machine-readable log format that is responsible for reporting that data in EB (see Enhanced health log format - AWS).
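A quick way to check this on the instance (paths as on the standard Elastic Beanstalk nginx platform; adjust if your proxy config lives elsewhere):
grep -R "healthd" /etc/nginx/                       # should show the include and the healthd log_format
ls /etc/nginx/conf.d/elasticbeanstalk/healthd.conf  # the include file itself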

My instance profile's IAM Role was lacking elasticbeanstalk:PutInstanceStatistics permission.
I found this by looking at /var/log/healthd/daemon.log as suggested in one of the other answers.
I had to SSH into the machine directly to discover this, as the Get Logs function itself was failing due to missing S3 Write permissions.

If you're running a Worker Tier EB environment, you need to add this policy:
arn:aws:iam::aws:policy/AWSElasticBeanstalkWorkerTier
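You can attach it from the CLI as well; a sketch assuming the default instance profile role name aws-elasticbeanstalk-ec2-role (substitute your own role name):
aws iam attach-role-policy \
  --role-name aws-elasticbeanstalk-ec2-role \
  --policy-arn arn:aws:iam::aws:policy/AWSElasticBeanstalkWorkerTier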

For anyone arriving here in 2022…
After launching a new environment that was identical to a current healthy environment and seeing no data, I raised an AWS Support ticket. I was informed:
Here, I would like to inform you that recently Elastic Beanstalk introduced a new feature called EnhancedHealthAuthEnabled to increase the security of your environment and help prevent health data spoofing on your behalf, and this option is enabled by default when you create a new environment.
If you use managed policies for your instance profile, this feature is available for your new environment without any further configuration, as the Elastic Beanstalk instance profile managed policies contain permissions for the elasticbeanstalk:PutInstanceStatistics action. However, if you use a custom instance profile instead of a managed policy, your environment might display a No Data health status. This happens because a custom instance profile doesn't have the PutInstanceStatistics permission by default, so instances aren't authorised for the action that communicates enhanced health data to the service. Hence, your environment health shows an Unknown/No Data status.
The policy that I needed to attach to my existing EC2 role (as advised by AWS Support) looked like:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ElasticBeanstalkHealthAccess",
      "Action": [
        "elasticbeanstalk:PutInstanceStatistics"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:elasticbeanstalk:*:*:application/*",
        "arn:aws:elasticbeanstalk:*:*:environment/*"
      ]
    }
  ]
}
Adding this policy to my EC2 role solved the issue for me.
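For anyone doing this from the CLI, a sketch for attaching the above as an inline policy (the role name and file name are placeholders; save the JSON above as health-access.json first):
aws iam put-role-policy \
  --role-name my-custom-eb-ec2-role \
  --policy-name ElasticBeanstalkHealthAccess \
  --policy-document file://health-access.json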

In my case, increasing the RAM by changing the instance type (t2.micro to c5.xlarge) resolved it.

Related

Kibana health status is RED

I am using AWS ELK (Amazon managed Elasticsearch) and my Kibana health status is red; trying to browse to the Kibana URL I get "Kibana server is not ready yet".
I have tried to fix the problem but without luck. I think it all started when I changed my ELK settings from 1 availability zone with 1 instance to 2 availability zones; another possibility is that I streamed a large amount of data in the last day.
As part of my attempts to fix the problem I returned to 1 availability zone with 1 instance, but that didn't fix the problem.
I have also enabled the error logs and see that I receive the following in CloudWatch:
"publishing cluster state with version [68816] failed for the
following nodes"
"failed to connect to node"
Any help solving this problem will help.
More info (about my current setup):
Domain status: Active
Elasticsearch version: 6.7
Availability zones: 1
Instance type: r5.large.elasticsearch
Number of instances: 1
Storage type: EBS
EBS volume type: General Purpose (SSD)
EBS volume size: 1000 GB
Encryption at rest: Disabled
Node-to-node encryption: Disabled
Amazon Cognito for authentication: Disabled
Service software release: R20190724-P1
In the Cluster health tab of the domain I can see:
Cluster status: green
MasterReachableFromNode: green
AutomatedSnapshotFailure: green
KibanaHealthyNodes: red
About 60% of my ElasticsearchRequests are InvalidHostHeaderRequests (but I guess that is unrelated). Other metrics:
CPUUtilization: about 8%
JVMMemoryPressure: about 20%
SysMemoryUtilization: 98%
KibanaHealthyNodes is red, so possibly your Kibana is down. Have you updated to AWS Elasticsearch v6.7 recently? It looks like Kibana needs to be restarted on the Elasticsearch cluster, which the AWS support team can help you with. Or, in case you don't have a support plan, if you post on the AWS forum someone from AWS may take a look and assist you.
InvalidHostHeaderRequests will not cause the issue with Kibana. AWS ES throws this error when your application sends requests to the IPs of the nodes. Please check and use the domain endpoint in the requests, otherwise this error will keep coming up.
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains.html
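In other words, requests should go to the domain endpoint rather than node IPs; a sketch with a hypothetical endpoint name:
curl -s "https://search-mydomain-abc123.us-east-1.es.amazonaws.com/_cluster/health?pretty"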

Amazon EC2 instance passed 1/2 checks

Newbie to Amazon Web Services here. I launched an instance from a public AMI and found that I could not SSH into the instance; I received the error "Connection timed out." I checked the security groups to verify that port 22 was open to 0.0.0.0/0. Additionally, I checked the route tables to verify that 0.0.0.0/0 is associated with the gateway attached to the VPC.
I find that only 1/2 status checks have passed - the instance status check failed. I have tried stopping and starting the instance as well as terminated and launching a new instance, both to no avail. The error that I see in the system log is:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1).
From this previous question, it appears that this could be a virtualization issue, but I'm not sure whether that was due to something I did on my end when launching the instance or something done by the creators of the AMI: Ec2 1/2 checks passed
Any help would be appreciated!
Can you share any more details about how you deployed the instance? Did you use the AWS Management Console, or one of the command line tools or SDKs to deploy it? Which public AMI did you use? Was it one of the ones provided by Amazon?
Depending on your needs, I would make sure that you use one of the AMIs provided by Amazon, such as Ubuntu, Amazon Linux, CentOS, etc. Here are the links to the docs on AMIs, but you can learn quite a bit by just searching for images. Since you mentioned virtualization types, I'd suggest reading up briefly on the HVM vs. paravirtual virtualization types on AWS. Each of the instance types/families uses a certain virtualization type, which is indicated in the chart on this page.
Instance Status Checks
This documentation page covers the instance status checks, which you'll probably want to familiarize yourself with. It's entirely possible that shutting the instance down (not restarting, but a full stop) and then starting it back up might resolve the instance status check.
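If you prefer the CLI, a sketch of a full stop/start cycle (the instance id is a placeholder):
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
# a stop/start cycle (unlike a reboot) lets the instance come up on different underlying hardware
aws ec2 start-instances --instance-ids i-0123456789abcdef0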
Spot Instances - cost savings!
By the way, I'll just mention this since you indicated that you're new to AWS... if you're just playing around right now, you can save a lot of cost by deploying EC2 Spot Instances instead of paying the normal on-demand rates. Depending on current rates, you can save more than 50%, and per-second billing still applies. Although there's the possibility that your EC2 instance could get "interrupted" based on market demand, you can configure your Spot Instance to "Hibernate" or "Stop" instead of terminating and relaunching. That way, your instance state is saved for when it relaunches.
Hope this helps!
1) Use well-known images or contact the image developer. Perhaps it requires more than one drive or tricky partitioning.
2) Make sure you selected the proper HVM/PV image for the instance type.
3) (After the checks pass) make sure the instance has a public IP.

I want to send metric alert (in group setting) of AWS instance with stackdriver monitoring

My question is about alert settings when monitoring AWS metrics with Stackdriver.
I tried the steps below, but the alert (policy) is not working.
How do I get an alert (policy) to fire with group settings?
I don't want single-instance monitoring; I want group settings.
I completed the Stackdriver monitoring setup for the AWS account using role settings. Next, I set up a group-based alert (policy) with the metrics below:
load average > 5
disk usage > 80%
The targets are some EC2 instances, configured as a group.
I completed these settings and then ran a stress test.
I looked at the metrics, and the graph exceeded the threshold.
But no alert (policy) was sent, and no incident was opened.
The details are below.
Alert (Policy) Creation
Go to [Alerting / Policies / TARGET POLICY].
[Add Condition], then select [Metric Threshold].
RESOURCE TYPE is Instance (EC2).
APPLIES TO is Group.
Select the group. This group includes the EC2 instances.
CONDITION TRIGGERS IF: Any Member Violates.
IF METRIC is [CPU Load Average (past 1m)].
CONDITION is above.
THRESHOLD is 5 load.
FOR is 1 minute.
Enter a name and push [Save Policy].
Stress Test
SSH to the target instances.
Execute the stress test.
Confirm the load average rose above 5.
But no alert (policy) was sent.
Confirm in Stackdriver
Confirm on the alert settings page that the load average reached 5.
But no incident was opened.
Other settings I tried
For GCP instances, alerts work correctly, with both group settings and single settings.
Alerts work for AWS instances in a single-instance configuration, but not with group settings.
Version info
stackdriver
stackdriver-agent version: stackdriver-agent.x86_64 5.5.2-366.amzn1
aws
OS: Amazon Linux
VERSION: 2016.03
ID_LIKE: rhel fedora
If you need more detail, please ask in the comments.
If the agent wasn't configured correctly and is sending metrics to the wrong project, this could lead to the behavior described: monitoring works for single instances but not for a group of instances. (It would still work for GCP because monitoring GCE instances requires zero setup.) Metrics landing in the wrong project cause any alerts that use group filters to not work.
https://cloud.google.com/monitoring/agent/troubleshooting#verify-project
"If you are using an Amazon EC2 VM instance, or if you are using private-key credentials on your Google Compute Engine instance, then the credentials could be invalid or they could be from the wrong project. For AWS accounts, the project used by the agent must be the AWS connector project, typically named "AWS Link..."."
These instructions at https://cloud.google.com/monitoring/agent/troubleshooting#verify-running help verify that agent is sending metrics correctly.
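As a starting point, a hedged sketch for checking the agent on the AWS instance (the credentials path is the commonly documented location for private-key credentials; adjust if yours differs):
sudo service stackdriver-agent status
# the project in the credentials should be the AWS connector project ("AWS Link ..."):
sudo grep project_id /etc/google/auth/application_default_credentials.json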

Amazon EC2 autoscaling instances always show status as "pending"

I created a launch configuration using the Amazon ECS-optimized AMI. All of the instances are connected to a VPC but also have a public non-EIP address.
When I create an autoscaling group, I can look in the Instances page and see the instances pass all health checks.
Furthermore, the ELB I created picks the new instances up and begins to serve traffic to them.
However, the autoscaler always shows my instances as "pending" and eventually destroys them.
What is going on?
(Screenshots omitted: Instances, ELB, and Auto Scaling showing the instances as pending.)
Thanks for any help!
EDIT
Here's the output from the launch log, with a very unhelpful message:
Check your ASG Activity History tab in the Auto Scaling Group module, looking in particular for the transitions from Scale Out to Pending to Terminated.
For each of those, click the 'more' arrow, which will display the following fields:
Description: Launching a new EC2 instance: i-0aaaaa06b45ce05
Cause: At 2016-06-16T17:54:25Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 2 to 4
The activity history and the related lifecycle events' description and cause will help you narrow down the problem quickly.
The cause for the Terminated/Cancelled event will be of particular interest. Here is an example of a Terminated event:
Description: Terminating EC2 instance: i-0aaaaaad47162b8f84
Cause: At 2016-05-20T08:12:42Z an instance was taken out of service in response to a EC2 instance status checks failure.
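The same activity history is available from the CLI if that is easier to search (the group name is a placeholder):
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name my-asg --max-records 20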
EDIT:
Based on the log history provided, the instance is failing to launch because of an "Only EC2-Classic instances may be linked." error. There is a configuration problem in the Launch Configuration.
Check your Launch Configuration, and make sure that Link to VPC option is unchecked in Advanced Details.

Why does Elastic Beanstalk always time out when it issues a command to its instance?

I have a PHP application deployed to Amazon Elastic Beanstalk. But I've noticed a problem: every time I push my code changes via git aws.push to Elastic Beanstalk, the deployed application doesn't pick up the changes. I checked the events log in my application's Beanstalk environment and noticed that every time Beanstalk issues:
Deploying new version to instance(s)
it's always followed by:
The following instances have not responded in the allowed command timeout time (they might still finish eventually on their own):
[i-d5xxxxx]
The same thing happens when I try to request snapshot logs. The Beanstalk issues:
requestEnvironmentInfo is starting
then after a few minutes it's again followed by:
The following instances have not responded in the allowed command timeout time (they might still finish eventually on their own): [i-d5xxxxx].
I had this problem a few times. It seems to affect only particular instances. So it can be solved by terminating the EC2 instance (done via the EC2 page on the Management Console). Thereafter, Elastic Beanstalk will detect that there are 0 healthy instances and automatically launch a new one.
If this is a production environment, you have only 1 instance, and you want minimal downtime:
1) Configure minimum instances to 2; Beanstalk will launch another instance for you.
2) Terminate the problematic instance via the EC2 tab; Beanstalk will launch another instance because the minimum is 2.
3) Configure minimum instances back to 1; Beanstalk will remove one of your two instances.
By default Elastic Beanstalk "throws a timeout exception" after 8 minutes (480 seconds, defined in settings) if your commands did not complete in time.
You can set a higher timeout, up to 30 minutes (1800 seconds):
{
  "Namespace": "aws:elasticbeanstalk:command",
  "OptionName": "Timeout",
  "Value": "1800"
}
Read here: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/command-options.html
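That option setting can also be applied with the AWS CLI; a sketch with a placeholder environment name:
aws elasticbeanstalk update-environment --environment-name my-env \
  --option-settings Namespace=aws:elasticbeanstalk:command,OptionName=Timeout,Value=1800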
Had the same issue here (single t1.micro instance).
Solved the problem by rebooting the EC2 instance via the EC2 page on the Management Console (and not from the EB page).
Beanstalk deployment (and other features like Get Logs) works by sending SQS commands to instances. An SQS client is deployed to the instances and checks the queue about every 20 seconds (see /var/log/cfn-hup.log):
2018-05-30 10:42:38,605 [DEBUG] Receiving messages for queue https://sqs.us-east-2.amazonaws.com/124386531466/93b60687a33e19...
If the SQS client crashes or has network problems on t1/t2 instances, it will not be able to receive commands from Beanstalk, and the deployment will time out. Rebooting the instance restarts the SQS client, and it can receive commands again.
An easier way to fix the SQS client is to restart the cfn-hup service:
sudo service cfn-hup restart
In the case of deployment, an alternative to shutting down the EC2 instances and waiting for Elastic Beanstalk to react, or messing about with minimum and maximum instances, is to simply perform a Rebuild environment on the target environment.
If a previous deployment failed due to timeout then the new version will still be registered against the environment, but due to the timeout it will not appear to be operational (in my experience the instance appears to still be running the old version).
Rebuilding the environment seems to reset things with the new version being used.
Obviously there's the downside with that of a period of downtime.
I think this is the correct way to deal with this.
I think the correct way to deal with this is to figure out the cause of the timeout by doing what this answer suggests.
chongzixin's answer is what needs to be done if you need this fixed ASAP before investigating the reason for a timeout.
However, if you do need to increase timeout, see the following:
Add configuration files to your source code in a folder named .ebextensions and deploy it in your application source bundle.
Example:
option_settings:
  "aws:elasticbeanstalk:command":
    Timeout: 2400
*"value" represents the length of time before timeout in seconds.
Reference: https://serverfault.com/a/747800/496353
"Restart App Server(s)" from the "Actions" menu in Elastic Beanstalk management dashboard followed by eb deploy fixes it for me.
After two days of checking random issues, I restarted both EC2 instances one after another to make sure there was no downtime. The site worked fine, but after a while the website started throwing 504 errors.
When I checked the HTTP server, nginx was down and an "out of HDD space" error was being thrown. I increased the disk size, Elastic Beanstalk created new instances, and the issue was fixed.
For me, the problem was my VPC security group rules. According to the docs, you need to allow outbound traffic on port 123 for NTP to work. I had the port closed, so the clock was drifting, and the EC2 instances were becoming unresponsive to commands from the Elastic Beanstalk environment, taking forever to deploy (only to time out), failing to get logs, etc.
Thank you @Logan Pickup for the hint in your comment.
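For completeness, a sketch of the egress rule that allows NTP (UDP port 123) out of the instance security group; the group id is a placeholder:
aws ec2 authorize-security-group-egress \
  --group-id sg-0123456789abcdef0 \
  --protocol udp --port 123 --cidr 0.0.0.0/0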