OK, I am lost as to where to even start troubleshooting this. I am trying to spin up a stack that has a basic app running in ECS. I will show the CloudFormation below. But I keep getting:
service sos-ecs-SosEcsService-1RVB1U5QXTY9S was unable to place a task
because no container instance met all of its requirements. Reason: No
Container Instances were found in your cluster. For more information,
see the Troubleshooting section.
I get 2 EC2 instances up and running, but neither appears under ECS Instances in the cluster.
Here are a few of my theories:
Is my user_data correct? Do I need to substitute the values?
What about the health check?
My app is a Sinatra app that uses port 4567. Am I missing something with that?
Also, I basically started with http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/quickref-ecs.html and just streamlined it. Here is my current JSON: https://gist.github.com/kidbrax/388e2c2ae4d622b3ac4806526ec0e502
On a side note, how could I simplify this to take out all the Auto Scaling? I just want to get it working in some form or fashion.
In order for the ECS instance to join the cluster, the following conditions must be met:
The agent must be configured correctly to connect to the correct
cluster via the /etc/ecs/ecs.config file.
The ECS instance must be assigned the correct IAM role to allow the ECS agent to access the ECS endpoints.
The ECS instance must have a connection to the Internet to contact the control plane, either via igw or NAT.
The ECS agent on the ECS Instance should be running.
UserData that should be used to configure the /etc/ecs/ecs.config file:
#!/bin/bash
echo ECS_CLUSTER=ClusterName >> /etc/ecs/ecs.config
You can check the reason for a container instance not registering with the cluster in /var/log/ecs/ecs-agent.log*.
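If the log is not conclusive, the agent's local introspection endpoint also reports which cluster the instance actually registered with. A quick sketch of both checks, run on the instance itself (the port 51678 introspection API ships with the ECS agent):
# Look for registration attempts and errors in the agent log
grep -i "regist" /var/log/ecs/ecs-agent.log*
# Ask the running agent which cluster it joined and what its container instance ARN is
curl -s http://localhost:51678/v1/metadata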
After reading "Why can't my ECS service register available EC2 instances with my ELB?" I realized the issue was my UserData. The values were not being substituted correctly, so the instances were joining the default cluster.
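For anyone hitting the same thing, you can confirm whether the substitution actually happened by comparing the user data the instance received with the resulting config file. A minimal sketch, run on the instance over SSH (assumes IMDSv1 is enabled, which it is on these older ECS-optimized AMIs):
# Show the user data exactly as the instance received it (unsubstituted values show up literally here)
curl -s http://169.254.169.254/latest/user-data
# Show the cluster the agent was told to join
cat /etc/ecs/ecs.config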
Unable to place a task because no container instance met all of its requirements. Reason: No Container Instances were found in your cluster.
This usually means that your instances booted, but they are not healthy enough to register with the cluster.
Navigate to the Load Balancing target group of your cluster, then check the following:
Health status of the instances in the Targets tab.
Attributes in the Description tab (the values could be off).
Health check parameters.
If your instances were terminated, check the system logs of the terminated instances and look for any errors in your UserData script (check in Launch Configurations); see the sketch after this list.
If the instances are running, SSH into one and verify the following:
The cluster is correctly configured in /etc/ecs/ecs.config.
The ECS agent is up and running (docker ps). If it is not, start it manually with: start ecs.
Check the ECS logs for any errors with: tail -f /var/log/ecs/*.
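A consolidated sketch of the checks above; the instance ID and cluster name are placeholders, and the service commands differ between the Amazon Linux 1 (upstart) and Amazon Linux 2 (systemd) ECS-optimized AMIs:
# Terminated instance: pull its console output and look for UserData/boot errors
aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text

# Running instance (via SSH):
cat /etc/ecs/ecs.config              # should contain ECS_CLUSTER=<your-cluster-name>
docker ps --filter name=ecs-agent    # the agent container should be running
sudo start ecs                       # Amazon Linux 1; on Amazon Linux 2: sudo systemctl start ecs
tail -f /var/log/ecs/*               # watch for registration errors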
Related:
terraform-ecs. Registered container instance is showing 0
How do I find the cause of an EC2 autoscaling group "health check" failure? (no load balancer involved)
My main issue is trying to work out why my health checks are failing on ECS.
My setup
I have successfully set up an ECS cluster using an EC2 auto-scaling group. All the EC2 are in private subnets with NAT gateways.
I have a load balancer connected up to the target group, which is linked to ECS.
When I try and get an HTTP response from the load balancer from my local machine, it times out. So I am obviously not getting responses back from the containers.
I have been able to ssh into the EC2 instances and confirmed the following:
ECS is deploying containers onto the EC2 instances, then after some time killing them and then firing them up again
I can curl the healthcheck endpoint from the EC2 instance (localhost) and it runs successfully
I can reach the internet from the EC2 instance, eg curl google.com returns an html response
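For concreteness, these checks look roughly like this; the health-check path and host port are placeholders, since they depend on the app and the task definition's port mapping:
# From my local machine - times out:
curl -m 10 -v http://<load-balancer-dns-name>/healthcheck
# From the EC2 instance itself - succeeds:
curl -v http://localhost:<mapped-host-port>/healthcheck
# Outbound internet access from the instance also works:
curl -sI https://www.google.com | head -n 1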
My question: there seem to be two different types of health check going on, and I can't figure out which is which.
ELB health-checks
The ELB seems, as far as I can tell, to use the health-checks defined in the target group.
The target group is defined as a list of EC2 instances. So does that mean the ELB is sending requests to the instances to see if they are running?
This would of course fail because we cannot guarantee that ECS will have deployed a container to each instance.
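(For reference, one way to see exactly which targets the ELB is checking - and whether they are whole instances or instance/port pairs registered by ECS - is describe-target-health; the target group ARN below is a placeholder.)
# Lists each registered target (instance + port) with its health state and, if unhealthy, the reason
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account-id:targetgroup/my-targets/0123456789abcdef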
ECS health-checks
ECS however is responsible for deploying containers into these instances, in what could turn out to be a many-to-many relationship.
So surely ECS would be querying the actual running containers to find out if they are healthy and then killing them if required.
My confusion / question
I don't really understand what role the ELB has in managing the EC2 instances in this context.
It doesn't seem like the EC2 instances are being stopped and started. However from reading the docs it seems to indicate that the ASG / ELB will manage the EC2 instances and restart them if they fail the healthcheck.
Does ECS somehow override this default behaviour and take responsibility for running the healthchecks instead of the ELB?
And if not, won't the health check just fail on any EC2 instance that happens not to have a container running on it?
I am creating an AWS ECS cluster (Networking + Linux).
I follow all the steps, set up the subnets, use the existing VPC, and the EC2 instance is created.
However, when I go into my cluster > ECS Instances, I don't see any EC2 instances there. The instance doesn't seem to register.
My EC2 instance has a public IP so that should not be an issue. What could be the problem?
You haven't specified it in the question, but normally you should also modify your UserData so that the instance registers with the non-default cluster:
#!/bin/bash
echo ECS_CLUSTER=<your-cluster-name> >> /etc/ecs/ecs.config
Also, the Amazon ECS-optimized AMI should be used, which has the ECS agent pre-installed.
Edit: You also need to make sure that the instances have access to the ECS service, for example by having a public IP and internet access. Without that, the ECS agent won't be able to communicate with the ECS service.
In the console, UserData can be specified under the Advanced Details section when launching the instance.
You can also use Launch Templates or Launch Configurations to specify the UserData and reduce the amount of work needed when launching new instances.
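If you prefer the CLI over the console, the same UserData (plus the instance profile and ECS-optimized AMI mentioned above) can be passed at launch time. A sketch; the AMI, subnet, security group and file name are placeholders:
# ecs-userdata.sh is the #!/bin/bash ... ECS_CLUSTER=... script shown above
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t2.micro \
  --iam-instance-profile Name=ecsInstanceRole \
  --user-data file://ecs-userdata.sh \
  --subnet-id subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0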
We have started using ECS and we are not quite sure if the behaviour we are experiencing is the correct one, and if it is, how to work around it.
We have set up a Beanstalk Docker multicontainer environment, which uses ECS in the background to manage everything; that has been working just fine. Yesterday, we created a standalone cluster in ECS, "ecs-int", a task definition "ecs-int-task", and a service "ecs-int-service" associated with a load balancer "ecs-int-lb", and we added one instance to the cluster.
When the service first ran, it worked fine and we were able to reach the Docker service through the load balancer. While we were playing with the instance security group associated with the "ecs-int" cluster, we mistakenly removed the rule for the port the container was running on, and the health check started failing on the LB, which drained the instance out of it. When that happened, to our surprise, the service "ecs-int-service" and the task "ecs-int-task" automatically moved to the Beanstalk cluster and started running there, creating an issue for our Beanstalk app.
While setting up the service, we set the placement strategy to "AZ Balanced Spread".
Should the service move between clusters? Shouldn't the service be attached only to the cluster it was originally created in? If this is the normal behaviour, how can we set a rule so that the service sticks to the same cluster even if the instances fail the health check for some reason?
Thanks
I re-created all the infrastructure and the problem went away. As I suspected, services created for one cluster should not move to a different cluster when instance(s) fail.
I have started 2 ECS-optimized instances on EC2, but how can I register them as ECS container instances?
I cannot figure out a way of doing that.
When you start an ECS-optimized image, it starts the ECS agent on the instance by default. The ECS agent registers the instance with the default ECS cluster.
For your instance to be available in the cluster, you will have to create the default cluster.
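A quick sketch of that with the AWS CLI, plus a way to confirm the instances showed up:
# Create the default cluster that the agent registers with when no ECS_CLUSTER is set
aws ecs create-cluster --cluster-name default
# Verify the container instances have registered
aws ecs list-container-instances --cluster default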
If you have a custom ECS cluster, you can set the cluster name using the UserData section.
The ECS agent expects the cluster name inside the ecs.config file at /etc/ecs/ecs.config.
You can set it at instance boot using a UserData script:
#!/bin/bash
echo ECS_CLUSTER={cluster_name} >> /etc/ecs/ecs.config
Please refer to the following ECS documentation for more information:
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html
When you create an EC2 instance, you must specify the IAM role linked to your ECS container instance (if using the SDK/..., you must specify the "Instance Profile ARN" of this role in the parameters). If you used the interactive ECS cluster creation the first time you used ECS on the AWS website, you should already have an ecsInstanceRole linked to the default cluster.
Then, after being launched, your EC2 instance will automatically register as an ECS container instance in this cluster.
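If the instance was launched without the role, it can also be attached after the fact instead of relaunching. A sketch; the instance ID is a placeholder and the profile name assumes the console-created ecsInstanceRole:
aws ec2 associate-iam-instance-profile \
  --instance-id i-0123456789abcdef0 \
  --iam-instance-profile Name=ecsInstanceRole
# You may then need to restart the ECS agent so it picks up the new credentials
# (Amazon Linux 1: sudo restart ecs / Amazon Linux 2: sudo systemctl restart ecs)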
Other than the user-data script echoing the non-default cluster's name, remember that the container instances need external network access to communicate with the Amazon ECS service. So, if your container instances do not have public IP addresses, they must use a network address translation (NAT) gateway to provide this access.
Source: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html
One more thing you can do to register instances in the cluster is to:
Create a service and assign it a task;
When creating the service, choose a load balancer and the number of tasks that should be launched;
Afterwards, create a target group for the load balancer (if one doesn't exist already);
You have 2 options now - either create desired instances manually or edit a launch template of your cluster (based on the template, the instances will be created automatically);
If you create instances via the launch template - they will be linked to the target group automatically (because you selected the respective load balancer when creating the service);
Otherwise, add them manually - any instance that passes health checks and is in your service's target group will be automatically added to the cluster, unless the cluster already has the maximum number of instances.
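A sketch of the service-with-load-balancer step via the CLI, in case it helps map the steps above to API fields; every name, ARN and port here is a placeholder:
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-service \
  --task-definition my-task:1 \
  --desired-count 2 \
  --load-balancers targetGroupArn=arn:aws:elasticloadbalancing:region:account-id:targetgroup/my-targets/0123456789abcdef,containerName=web,containerPort=80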
I've got an EC2 launch configuration that launches the ECS-optimized AMI. I've got an auto scaling group that ensures that I've got at least two available instances at all times. Finally, I've got a load balancer.
I'm trying to create an ECS service that distributes my tasks across the instances in the load balancer.
After reading the documentation for ECS load balancing, it's my understanding that my ASG should not automatically register my EC2 instances with the ELB, because ECS takes care of that. So, my ASG does not specify an ELB. Likewise, my ELB does not have any registered EC2 instances.
When I create my ECS service, I choose the ELB and also select the ecsServiceRole. After creating the service, I never see any instances available in the ECS Instances tab. The service also fails to start any tasks, with a very generic error of ...
service was unable to place a task because the resources could not be found.
I've been at this for about two days now and can't seem to figure out what configuration settings are not properly configured. Does anybody have any ideas as to what might be causing this to not work?
Update # 06/25/2015:
I think this may have something to do with the ECS_CLUSTER user data setting.
In my EC2 auto scaling launch configuration, if I leave the user data input completely empty, the instances are created with an ECS_CLUSTER value of "default". When this happens, I see an automatically-created cluster, named "default". In this default cluster, I see the instances and can register tasks with the ELB like expected. My ELB health check (HTTP) passes once the tasks are registered with the ELB and all is good in the world.
But, if I change that ECS_CLUSTER setting to something custom I never see a cluster created with that name. If I manually create a cluster with that name, the instances never become visible within the cluster. I can't ever register tasks with the ELB in this scenario.
Any ideas?
I had similar symptoms but ended up finding the answer in the log files:
/var/log/ecs/ecs-agent.2016-04-06-03:
2016-04-06T03:05:26Z [ERROR] Error registering: AccessDeniedException: User: arn:aws:sts::<removed>:assumed-role/<removed>/<removed> is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-west-2:<removed>:cluster/MyCluster-PROD
status code: 400, request id: <removed>
In my case, the resource existed but was not accessible. It sounds like OP is pointing at a resource that doesn't exist or isn't visible. Are your clusters and instances in the same region? The logs should confirm the details.
In response to other posts:
You do NOT need public IP addresses.
You do need the ecsServiceRole or an equivalent IAM role assigned to the EC2 instance in order to talk to the ECS service. You must also specify the ECS cluster, which can be done via user data during instance launch or in the launch configuration definition, like so:
#!/bin/bash
echo ECS_CLUSTER=GenericSericeECSClusterPROD >> /etc/ecs/ecs.config
If you fail to do this on newly launched instances, you can do it after the instance has launched and then restart the ECS agent service.
In the end, it ended up being that my EC2 instances were not being assigned public IP addresses. It appears ECS needs to be able to directly communicate with each EC2 instance, which would require each instance to have a public IP. I was not assigning my container instances public IP addresses because I thought I'd have them all behind a public load balancer, and each container instance would be private.
Another problem that might arise is not assigning a role with the proper policy to the Launch Configuration. My role didn't have the AmazonEC2ContainerServiceforEC2Role policy (or the permissions that it contains) as specified here.
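If that turns out to be the problem, attaching the managed policy to the instance role is enough. A sketch; the role name is a placeholder, and the policy ARN is the AWS-managed one mentioned above:
aws iam attach-role-policy \
  --role-name my-ecs-instance-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role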
You definitely do not need public IP addresses for each of your private instances. The correct (and safest) way to do this is to set up a NAT Gateway and add a route to it in the route table that is attached to your private subnet.
This is documented in detail in the VPC documentation, specifically Scenario 2: VPC with Public and Private Subnets (NAT).
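For reference, a minimal CLI sketch of that NAT setup; the subnet, route table and allocation IDs are placeholders, and the NAT gateway itself must live in a public subnet:
# Allocate an Elastic IP and create the NAT gateway in a public subnet
aws ec2 allocate-address --domain vpc
aws ec2 create-nat-gateway --subnet-id subnet-0aaaa1111bbbb2222c --allocation-id eipalloc-0123456789abcdef0
# Route the private subnet's outbound traffic through the NAT gateway
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-0123456789abcdef0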
Another thing to check: the ECS agent creates a file in /var/lib/ecs/data that stores the cluster name.
If the agent first started up with the cluster name 'default', you'll need to delete this file and then restart the agent.
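A sketch of that fix, assuming the Amazon Linux 1 (upstart) ECS-optimized AMI; on Amazon Linux 2 use systemctl instead, and my-cluster is a placeholder for your cluster name:
sudo stop ecs                                    # Amazon Linux 2: sudo systemctl stop ecs
echo "ECS_CLUSTER=my-cluster" | sudo tee /etc/ecs/ecs.config
sudo rm -rf /var/lib/ecs/data/*                  # remove the cached state that still points at 'default'
sudo start ecs                                   # Amazon Linux 2: sudo systemctl start ecs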
There were several layers of problems in our case. I will list them out, as it might give you some idea of the issues to pursue.
My goal was to have 1 ECS container instance on 1 host. But ECS forces you to have 2 subnets in your VPC, each with 1 Docker host instance. I was trying to have just 1 Docker host in 1 availability zone and could not get it to work.
The other issue was that only one of the subnets had an internet-facing gateway attached to it, so one of them was not publicly accessible.
The end result was that DNS was serving 2 IPs for my ELB, and one of the IPs would work while the other did not. So I was seeing random 404s when accessing the NLB using the public DNS.