AWS VPC for EKS node group - amazon-web-services

I am trying to create a node group in my EKS cluster, but I am getting "NodeCreationFailure: Instances failed to join the kubernetes cluster".
After reading a lot of documentation I think the problem is in the VPC configuration. I've tried multiple solutions like enabling DNS hostnames and adding endpoints to the subnets, but I am still getting the same error.
Can anyone guide me to solve this issue?

First, make sure that the VPC and Subnet configurations are correct and that the EC2 instances that you are trying to create the Node Group with have the correct security group settings.
Next, ensure that the EC2 instances have the correct IAM role attached and that it has the necessary permissions for the EKS cluster.
Finally, ensure that the IAM user or role that you are using to create the Node Group has the correct permission for the EKS cluster.
If all of the above are configured correctly, you may need to check the cluster events to troubleshoot further. You can do this by running "kubectl get events --sort-by=.metadata.creationTimestamp" against the EKS cluster.
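As a starting point, the node group's recorded health issues usually name the exact failure. A minimal sketch of the checks above, assuming a cluster named my-cluster and a node group named my-nodes (both placeholders, substitute your own names):

```shell
# Show the health issues EKS recorded for the failing node group
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodes \
  --query 'nodegroup.health.issues'

# List recent cluster events in chronological order
kubectl get events --sort-by=.metadata.creationTimestamp
```

The health issues array typically spells out whether the failure is IAM-, subnet-, or security-group-related.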

Related

Is EKS Nodegroup really necessary

I have a few questions on EKS node groups.
I don't understand the concept of a node group and why it is required. Can't we create an EC2 instance and run kubeadm join to join that EC2 node to the EKS cluster? What advantage does a node group hold?
Do node groups (managed or self-managed) have to exist in the same VPC as the EKS cluster? Is it not possible to create a node group in another VPC? If so, how?
Managed node groups are a way to let AWS manage part of the lifecycle of the Kubernetes nodes. You are certainly still allowed to configure self-managed nodes if you need or want to. To be fair, you can also spin up a few EC2 instances and configure your own K8s control plane. It boils down to how much you want managed vs. how much you want to do yourself. The other extreme on this spectrum would be to use Fargate, which is a fully managed experience (there are no nodes to scale or configure, no AMIs, etc.).
The EKS cluster (control plane) lives in a separate AWS-managed account/VPC. See here. When you deploy a cluster, EKS will ask you which subnets (and which VPC) you want the cluster to manifest itself in (through ENIs that get plugged into your VPC/subnets). That VPC is where your self-managed workers, your managed node groups, and your Fargate profiles need to be plugged into. You cannot use another VPC to add capacity to the cluster.
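You can confirm which VPC and subnets the control plane is wired into, and therefore where node groups must live. A sketch, assuming a cluster named my-cluster:

```shell
# Print the VPC, subnets, and security groups the EKS
# control plane ENIs are attached to
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.resourcesVpcConfig'
```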

NodeCreationFailure -> Unhealthy nodes in the kubernetes cluster

I have created an Amazon Elastic Kubernetes Service cluster in the US East (Ohio) us-east-2 region. After the cluster setup I created a Fargate profile, which completed successfully. Now I am trying to add a node group, but it ends with the error "NodeCreationFailure: Unhealthy nodes in the kubernetes cluster". What's the reason?
Your nodes are unable to register with your Amazon EKS cluster.
A quick and dirty solution consists in adding the AmazonEKS_CNI_Policy to the worker node group role.
If that solves the problem, be aware that the recommended approach is instead:
https://docs.aws.amazon.com/eks/latest/userguide/cni-iam-role.html
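For the quick-and-dirty check, the managed policy can be attached from the CLI. A sketch, assuming the node group's instance role is named eksNodeRole (a placeholder, substitute your own role name):

```shell
# Attach the VPC CNI managed policy to the worker node role
aws iam attach-role-policy \
  --role-name eksNodeRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
```

The linked documentation instead binds this policy to a dedicated IAM role for the aws-node service account, which avoids granting the permission to everything running on the nodes.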

Why are ECS services trying to attach their ENIs to the EC2 instance, leading to the error "RESOURCE:ENI"?

I have 3 AWS accounts with almost identical CloudFormation. In 2 of them I am able to run up to 8 ECS services per EC2 instance. Each service has its own ENI; this ENI is not attached to anything, including the EC2 instance. Everything works.
In 1 of my AWS accounts, each ECS service is trying to attach its ENI to the EC2 instance, so I now see the error "unable to place a task because no container instance met all of its requirements. ... RESOURCE:ENI" and I'm unable to deploy more than 2 services per instance. This is because each EC2 instance has a limit on the number of ENIs you can attach.
VPC trunking is not on in the working accounts, so my question is: why are the ECS services now attaching their ENIs to the EC2 instance? Is there an option somewhere that says "don't attach your ENI to anything"? Or is it perhaps normal to attach the ENIs, and my working accounts should actually be attaching them but aren't?
The answer is that VPC trunking was actually on in the other accounts. Just because you can't see the vpcTrunking option checked in the ECS account settings doesn't mean that another user/role hasn't set vpcTrunking to on.
Alternatively, vpcTrunking may appear to be on when you check the account settings in ECS, but that only displays the setting for your user, not for the role the ECS EC2 instances are using.
I needed to set account-wide VPC trunking and, more importantly, properly read the documentation.
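The effective setting can be inspected and changed account-wide from the CLI (the ECS account setting is named awsvpcTrunking). A sketch; note that put-account-setting-default changes the default for every user and role in the account:

```shell
# Show the trunking setting that is actually in effect for the caller
aws ecs list-account-settings \
  --name awsvpcTrunking \
  --effective-settings

# Enable ENI trunking as the account-wide default
aws ecs put-account-setting-default \
  --name awsvpcTrunking \
  --value enabled
```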

AWS Elastic Beanstalk unable to access AWS MSK

I have an AWS MSK cluster running inside a VPC with 3 subnets.
When I created my Elastic Beanstalk (Java) environment, it asked for a VPC and I configured the same VPC where my MSK cluster is running. I also selected all three listed subnets in my Elastic Beanstalk network configuration. I did not assign a public IP, as I don't require access from the internet to the Elastic Beanstalk instances.
I also assigned AWS MSK full-access permissions to the IAM instance profile that I selected for my Elastic Beanstalk environment under the security configuration. For completeness: I selected AWSServiceRoleForElasticBeanstalk as the service role.
On a side note, when I configured my Lambda to access the MSK cluster, it explicitly asked me for the VPC as well as the security groups. But I don't see any such configuration option for security groups in the case of Elastic Beanstalk. Am I overlooking something here? My Lambda is able to successfully access the MSK cluster.
I don't understand why my Elastic Beanstalk instance is unable to access my AWS MSK cluster. Am I missing something?
With the help of AWS Support, I was able to resolve this issue.
First, you can configure security groups under the 'Instances' configuration card.
But it was a bit confusing for me, because the VPC and subnets are under the 'Networking' configuration card, which is stacked well after the 'Instances' configuration card, and the security groups listed under 'Instances' depend directly on the VPC and subnets selected under 'Networking'. If you change your selection in 'Networking', you should update/review your security group selection under 'Instances' as well.
So, in my case, I first selected my target VPC and related subnets under 'Networking', and only then was I able to see my target security groups under 'Instances'.
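The same security group selection can also be applied without the console, via the aws:autoscaling:launchconfiguration option namespace. A sketch, assuming an environment named my-env and a security group sg-0123456789abcdef0 that allows traffic to the MSK brokers (both placeholders):

```shell
# Attach an additional security group to the environment's instances
aws elasticbeanstalk update-environment \
  --environment-name my-env \
  --option-settings \
    Namespace=aws:autoscaling:launchconfiguration,OptionName=SecurityGroups,Value=sg-0123456789abcdef0
```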

AWS ECS Error when running task: No Container Instances were found in your cluster

I'm trying to deploy a Docker container image to AWS using ECS, but the EC2 instance is not being created. I have scoured the internet looking for an explanation as to why I'm receiving the following error:
"A client error (InvalidParameterException) occurred when calling the RunTask operation: No Container Instances were found in your cluster."
Here are my steps:
1. Pushed a Docker image (FROM ubuntu) to my Amazon ECS repo.
2. Registered an ECS Task Definition:
aws ecs register-task-definition --cli-input-json file://path/to/my-task.json
3. Ran the task:
aws ecs run-task --task-definition my-task
Yet, it fails.
Here is my task:
{
  "family": "my-task",
  "containerDefinitions": [
    {
      "environment": [],
      "name": "my-container",
      "image": "my-namespace/my-image",
      "cpu": 10,
      "memory": 500,
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 80
        }
      ],
      "entryPoint": [
        "java",
        "-jar",
        "my-jar.jar"
      ],
      "essential": true
    }
  ]
}
I have also tried using the management console to configure a cluster and services, yet I get the same error.
How do I configure the cluster to have EC2 instances, and what kind of container instances do I need to use? I thought this whole process was supposed to create the EC2 instances to begin with!
I figured this out after a few more hours of investigating. Amazon, if you are listening, you should state this somewhere in your management console when creating a cluster or adding instances to the cluster:
"Before you can add ECS instances to a cluster you must first go to the EC2 Management Console and create ecs-optimized instances with an IAM role that has the AmazonEC2ContainerServiceforEC2Role policy attached"
Here is the rigmarole:
1. Go to your EC2 Dashboard, and click the Launch Instance button.
2. Under Community AMIs, Search for ecs-optimized, and select the one that best fits your project needs. Any will work. Click next.
3. When you get to Configure Instance Details, click on the create new IAM role link and create a new role called ecsInstanceRole.
4. Attach the AmazonEC2ContainerServiceforEC2Role policy to that role.
5. Then, finish configuring your ECS instance. NOTE: If you are creating a web server, you will want to create a security group that allows access to port 80.
After a few minutes, when the instance is initialized and running, you can refresh the ECS Instances tab of the cluster you are trying to add instances to.
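The steps above can be sketched with the CLI. Every ID below is a placeholder you must substitute: the AMI ID (use the ECS-optimized AMI for your region), key pair, security group, and cluster name; it also assumes the ecsInstanceRole instance profile from step 3 already exists:

```shell
# Launch one ECS-optimized instance that registers itself with
# the cluster "my-cluster" via the ECS agent config in user data
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t3.micro \
  --key-name my-key \
  --security-group-ids sg-0123456789abcdef0 \
  --iam-instance-profile Name=ecsInstanceRole \
  --user-data '#!/bin/bash
echo ECS_CLUSTER=my-cluster >> /etc/ecs/ecs.config'
```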
I ran into this issue when using Fargate. I fixed it when I explicitly defined launchType="FARGATE" when calling run_task.
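For reference, a Fargate run-task call needs the launch type plus an awsvpc network configuration. A sketch with placeholder cluster, subnet, and security group values:

```shell
# Run a task on Fargate; all IDs below are placeholders
aws ecs run-task \
  --cluster my-cluster \
  --launch-type FARGATE \
  --task-definition my-task \
  --network-configuration \
    'awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0],assignPublicIp=ENABLED}'
```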
Currently, the Amazon AWS web interface can automatically create instances with the correct AMI and the correct name so they'll register with the correct cluster.
Even though all instances were created by Amazon with the correct settings, my instances wouldn't register. On the Amazon AWS forums I found a clue: it turns out that container instances need internet access, and if your private VPC does not have an internet gateway, they won't be able to connect to the cluster.
The fix
In the VPC dashboard you should create a new Internet Gateway and connect it to the VPC used by the cluster.
Once attached you must update (or create) the route table for the VPC and add as last line
0.0.0.0/0 igw-24b16740
Where igw-24b16740 is the ID of your freshly created internet gateway.
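The fix above can be sketched with the CLI; the VPC ID, route table ID, and gateway ID are placeholders (use the gateway ID returned by the first command):

```shell
# Create an internet gateway and attach it to the cluster's VPC
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway \
  --internet-gateway-id igw-0123456789abcdef0 \
  --vpc-id vpc-0123456789abcdef0

# Route all non-local traffic through the new gateway
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id igw-0123456789abcdef0
```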
Other suggested checks
Selecting the suggested AMI which was specified for the given region solved my problem.
To find out the AMI - check Launching an Amazon ECS Container Instance.
By default, all EC2 instances are added to the default cluster, so the name of the cluster also matters.
See point 10 at Launching an Amazon ECS Container Instance.
More information available in this thread.
Just in case someone else is blocked by this problem as I was...
I've tried everything here and it didn't work for me.
Besides what was said here regarding the EC2 instance role, as commented here, in my case it only worked once I also configured the EC2 instance with some basic information, using a User Data script like this:
#!/bin/bash
cat <<'EOF' >> /etc/ecs/ecs.config
ECS_CLUSTER=quarkus-ec2
EOF
Putting the name of the ECS cluster into this ECS config file resolved my problem. Without this config, the ECS agent log on the EC2 instance showed an error that it was not possible to connect to ECS; with it, the EC2 instance became visible to the ECS cluster.
After doing this, I could see the EC2 instance available in my ECS cluster.
The AWS documentation says that this part is optional, but in my case it didn't work without this "optional" configuration.
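Whether the agent actually registered can be checked on the instance itself. On the ECS-optimized AMIs the agent exposes a local introspection endpoint on port 51678 and logs under /var/log/ecs (paths may differ on other images):

```shell
# Ask the local ECS agent which cluster it registered with
# (an empty/missing Cluster field means it never joined)
curl -s http://localhost:51678/v1/metadata

# Inspect the agent log for connection errors
sudo tail -n 50 /var/log/ecs/ecs-agent.log
```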
When this happens, you need to look at the following:
Your EC2 instances should have a role with the AmazonEC2ContainerServiceforEC2Role managed policy attached to it.
Your EC2 instances should be running an ecs-optimized AMI image (you can check this in the EC2 dashboard).
If your instances sit in the VPC's private subnets without public IPs, they must have either an interface VPC endpoint or a NAT gateway to reach ECS.
Most of the time, this issue appears because of the misconfigured VPC. According to the Documentation:
QUOTE: If you do not have an interface VPC endpoint configured and your container instances do not have public IP addresses, then they must use network address translation (NAT) to provide this access.
To create a VPC endpoint: follow the documentation here.
To create a NAT gateway: follow the documentation here.
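As an example of the endpoint route, interface endpoints for ECS live under the com.amazonaws.<region>.ecs* service names. A sketch for us-east-1 with placeholder VPC, subnet, and security group IDs (the agent also needs the ecs-agent and ecs-telemetry endpoints, created the same way):

```shell
# Create an interface endpoint so private instances can reach ECS
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.ecs \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0
```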
These are the reasons why you don't see the EC2 instances listed in the ECS dashboard.
If you have come across this issue after creating the cluster:
Go to the ECS instance in the EC2 instances list and check the IAM role that you have assigned to the instance. You can identify the instances easily, as the instance name starts with "ECS Instance".
After that, click on the IAM role and it will direct you to the IAM console. Attach the AmazonEC2ContainerServiceforEC2Role policy to the role's permission policy list and save the role.
Your instances will be available in the cluster shortly after you save it.
The real issue is lack of permission. As long as you create and assign an IAM role with the AmazonEC2ContainerServiceforEC2Role policy, the problem goes away.
I realize this is an older thread, but I stumbled on it after seeing the error the OP mentioned while following this tutorial.
Changing to an ecs-optimized AMI image did not help. My VPC already had a 0.0.0.0/0 route configured. My instances were added to the correct cluster, and they had the proper permissions.
Thanks to #sanath_p's link to this thread, I found a solution and took these steps:
Copied my Autoscaling Group's configuration
Set IP address type under the Advanced settings to "Assign a public IP address to every instance"
Updated my Autoscaling Group to use this new configuration.
Refreshed my instances under the Instance refresh tab.
Another possible cause that I ran into was updating my ECS cluster AMI to an "Amazon Linux 2" AMI instead of an "Amazon Linux AMI", which caused my EC2 user_data launch script to not work.
For an instance image other than the ECS-optimized one, do the steps below:
1. Install the ECS agent (ECS Agent download link).
2. Add the following to /etc/ecs/ecs.config:
ECS_CLUSTER=REPLACE_YOUR_CLUSTER_NAME
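On Amazon Linux 2, for example, the agent ships as an extras package, so the steps can be sketched as follows (my-cluster is a placeholder; other distributions need the agent installed from the download link above):

```shell
# Install and start the ECS agent on a plain Amazon Linux 2 instance
sudo amazon-linux-extras install -y ecs
echo "ECS_CLUSTER=my-cluster" | sudo tee -a /etc/ecs/ecs.config
sudo systemctl enable --now ecs
```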
The container instances will also need to communicate with ECR to pull images.
For this, the security group attached to the instances needs an outbound rule allowing traffic to 0.0.0.0/0.
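If the default allow-all egress rule was removed from the security group, it can be restored with a sketch like this (sg-0123456789abcdef0 is a placeholder):

```shell
# Allow all outbound traffic so the instances can reach ECR/ECS
aws ec2 authorize-security-group-egress \
  --group-id sg-0123456789abcdef0 \
  --protocol -1 \
  --cidr 0.0.0.0/0
```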