I've seen some weird errors happening on my worker environments only (this never happens on the web environments).
I can see this in the "Health" page:
Following services are not running: aws-sqsd.
clock out of sync...
Instance ELB health is not available
Some screenshots:
This last error I realized that when I create a new worker single environment, the instance is not attached to the Load Balancer and I have to do it manually. Can anyone tell me why this is happening?
Has anyone experienced something similar?
Related
I am running managed instance group in google cloud. These are behind a loadbalancer and it is working fine. The problem is when the managed instance group scales down, the loadbalancer will not notice this until after the instance has been killed so some requests will be sent to an instance that is dead causing the application to not work properly for a while.
On this page https://cloud.google.com/compute/docs/autoscaler/understanding-autoscaler-decisions I have read that shutdown scripts can be used. I tried to add one that tells the instance it will be shut down so it starts sending unhealthy when the load balancer does a health check, the script then waits for a while to make sure to give the load balancer time to check it. It does however not seem to work. The script seems to be called but to late so it just shuts down.
Anyone know how to write a shutdown script for this scenario?
Seems like this was not the problem after all. After inspecting the logs it turned out that the health checks timed out causing the load balancer to ot find any nodes with a healthy state.
I am setting up an infrastructure to deploy my application on AWS. I am using ECS service because I am trying to deploy a Docker-based application. So far I have created a task definition with two containers one for the apache and another one for PHP. Then I launched an ECS cluster with an EC2 instance and a task running. They all seem to be up and running. Now, I am trying to figure out how I can access the apache of my EC2 instance with the Cluster on the browser.
This is how I created the apache container.
And then I created the php container as follow.
Then I launched an EC2 based ECS cluster with one instance in it. Then I run one task within the cluster. Then I tried to open the public IP address of my instance. It just keeps loading loading and loading. What is wrong with my configuration? How can I access it on the browser?
It seems to me there's a couple of possible scenarios here you could check:
If do you reach the service and are stuck on an endless reloading loop, which might point to something in your code that could be causing it to do that,
If you're having a long wait time till the browser actually gives a timeout, which might be caused by not having the right port open on the Security Group associated with your task definition.
I'm running through this tutorial to create a deployment pipeline with my custom .net-based docker image.
But when I start a deployment, it's stuck on install phase, so I have to stop it manually:
After that I get a couple of running tasks with different task definitions (note :1 and :4, 'cause I've tried to run deployment 4 times by now):
They also change their state RUNNING->PROVISIONING->PENDING all the time. And the list of stopped tasks grows:
Q:
So, how to hunt down the issue with CodeDeploy? Why It's running forever?
UPDATE:
It is connected to health checks.
UPDATE:
I'm getting this:
(service dataapi-dev-service, taskSet ecs-svc/9223370487815385540) (port 80) is unhealthy in target-group dataapi-dev-tg1 due to (reason Health checks failed with these codes: [404]).
Don't quite understand, why is it failing for newly created container, 'cause the original one passes health-check.
While the ECS task is running, ELB (Elastic Load Balancer) will constantly do healthchecking the container as you config in the target group to check if the container is still responding.
From your debug message, the container (api) responded the healthcheck path with 404.
I suggest you config the healhcheck path in target group dataapi-dev-tg1.
For those who are still hitting this issue: in my case the ECS cluster had no outbound connectivity.
Possible solutions to this problem:
make security groups you use with your VPC allow outbound traffic
make sure that the route table you use with VPC has subnet associations with subnets you use with your load balancer (examine route tables)
I have able to figure it out because I enabled CloudWatch during ECS cluster creation and got CannotPullContainerError. For more information on solving this problem look into Cannot Pull Container Image Error.
Make sure your Internet Gateway is attached to your Subnets through the Route Table (Routes), if your Load Balancer is internet facing.
The error is due to health check which detected an unhealthy target.
Make sure to check your configuration in Target group settings.
I am hosting a Django site on Elastic Beanstalk. I haven't yet linked it to a custom domain and used to access it through the Beanstalk environment domain name like this: http://mysite-dev.eu-central-1.elasticbeanstalk.com/
Today I did some stress tests on the site which led it to spin up several new EC2 instances. Shortly afterwards I deployed a new version to the beanstalk environment via my local command line while 3 instances were still running in parallel. The update failed due to timeout. Once the environment had terminated all but one instance I tried the deployment again. This time it worked. But since then I cannot access the site through the EB environment domain name anymore. I alway get a "took too long to respond" error.
I can access it through my ec2 instance's IP address as well as through my load balancer's DNS. The beanstalk environment is healthy and the logs are not showing any errors. The beanstalk environment's domain is also part of my allowed hosts setting in Django. So my first assumption was that there is something wrong in the security group settings.
Since the load balancer is getting through it seems that the issue is with the Beanstalk environment's domain. As I understand the beanstalk domain name points to the load balancer which then redirects to the instances? So could it be that the environment update in combination with new instances spinning up has somehow corrupted the connection? If yes, how do I fix this and if no what else could be the cause?
Being a developer and newbie to cloud hosting my understanding is fairly limited in this respect. My issue seems to be similar to this one Elastic Beanstalk URL root not working - EC2 Elastic IP and Elastic IP Public DNS working
, but hasn't helped me further
Many Thanks!
Update: After one day everything is back to normal. The environment URL works as previously as if the dependencies had recovered overnight.
Obviously a server can experience downtime, but since the site worked fine when accessing the ec2 instance ip and the load balancer dns directly, I am still a bit puzzled about what's going on here.
If anyone has an explanantion for this behaviour, I'd love to hear it.
Otherwise, for those experiencing similar issues after a botched update: Before tearing out your hair in desperation, try just leaving the patient alone overnight and let the AWS ecosystem work its magic.
We have a ASP.NET MVC web application that is deployed to elastic beanstalk. There is an issue that is causing an instance to be identified as unhealthy and terminated. I suspect there's some information in the windows event logs that is going to help me diagnose the issue but once the instance is terminated, the logs are gone.
Any ideas how we could preserve the instance or the logs so that we can have a look at the windows event log?