So, I have an interesting issue...
Private subnets for an EKS cluster have run out of IPs.. there aren't enough addresses in the subnets to allow the cluster to fully 'keep warm' all of the IPs it can use, so the subnets read '0 Free' in the AWS Console.. but clearly some are free in the cluster because the issue is intermittent (but becoming more problematic).
The current node groups are unmanaged, the cluster can't be rebuilt.
I created new subnets, and created a managed node group... then when I check the nodes, they can't resolve DNS. They can ping IPs but not resolve DNS..
The obvious answer is a missing rule in the SG.. but; the unmanaged nodes have a custom SG which a bunch of rules for cross VPC comms etc. I know we can't add a custom SG to a managed nodegroup... so I thought, perhaps I'd replicate the custom SG onto the SG the managed nodegroup is using (the EKS managed SG which is used for backplane and manage node groups).
This did not resolve the problem.
My questions are mainly... should this work? If the SG isn't stopping DNS from working, then what is? (I've spun up a busybox pod on a node in the node group and checked nslookup.. the correct DNS server is there, but no resolution).
I read that AWS are working on a solution to adding a custom SG to a managed nodegroup, but it's not yet possible.
Related
I "inherited" an unmanaged EKS cluster with two nodegroups created through eksctl with Kubernetes version 1.15. I updated the cluster to 1.17 and managed to create a new nodegroup with eksctl and nodes successfully join the cluster (i had to update aws-cni to 1.6.x from 1.5.x to do so). However the the Classic Load Balancer of the cluster marks my two new nodes as OutOfService.
I noticed the Load Balancer Security Group was missing from my node Security Groups thus i added it to my two new nodes but nothing changed, still the nodes were unreachable from outside the EKS cluster. I could get my nodes change their state to InService by applying the Security Group of my two former nodes but manually inserting the very same inbound/outbound rules seems to sort no effect on traffic. Only the former nodegroup security group seems to work in this case. I reached a dead end and asking here because i can't find any additional information on AWS documentation. Anyone knows what's wrong?
I have a few questions on EKS node groups.
I dont understand the concept of Node group and why is it required. Can't we create an EC2 and run kubeadm join for joining EC2 node to EKS Cluster. What advantage does node group hold.
Does node groups (be it managed or self-managed) have to exist in same VPC of EKS cluster. Is it not possible to create node group in another VPC. If so how?
managed node groups is a way to let AWS manage part of the lifecycle of the Kubernetes nodes. For sure you are allowed to continue to configure self managed nodes if you need/want to. To be fair you can also spin up a few EC2 instances and configure your own K8s control plane. It boils down to how much you wanted managed Vs how much you want to do yourself. The other extreme on this spectrum would be to use Fargate which is a fully managed experience (where there are no nodes to scale, configure, no AMIs etc).
the EKS cluster (control plane) lives in a separate AWS managed account/VPC. See here. When you deploy a cluster EKS will ask you which subnets (and which VPC) you want the EKS cluster to manifest itself (through ENIs that get plugged into your VPC/subnets). That VPC is where your self managed workers, your managed node groups and your Fargate profiles need to be plugged into. You can't use another VPC to add capacity to the cluster.
Background: I have a kubernetes cluster set up in one AWS account that needs to access data in an RDS MySQL instance in a different account and I can't seem to get the settings correct to allow traffic to flow.
What I've tried so far:
Setup a peering connection between the two VPCs. They are in the same region, us-east-1.
Created Route table entries in each account to point traffic on the corresponding subnet to the peering connection.
Created a security group in the RDS VPC to allow traffic from the kubernetes subnets to access MySql.
Made sure DNS Resolution is enabled on both VPC's.
Kubernetes VPC details (Requester)
This contains 3 EC2's (looks like each has its own subnet) that house my kubernetes cluster. I used EKS to set this up.
The route table rules I set up have the 3 subnets associated, and point the RDS VPC CIDR block at the peering connection.
RDS VPC details (Accepter)
This VPC contains the mysql RDS instance, as well as some other resources. The RDS instance has quite a few VPC security groups assigned to it for access from our office IP's etc. It has Public Accessibility set to true.
I repeated the route table setup (in reverse) and pointed back to the K8s VPC subnet / peering connection.
Testing
To test the connection, I've tried 2 different ways. The application that needs to access mysql is written in node, so I just wrote a test connector and example query and it times out.
I also tried netcat from a terminal in the pod running in the kubernetes cluster.
nc -v {{myclustername}}.us-east-1.rds.amazonaws.com 3306
Which also times out. It seems to be trying to hit the correct mysql instance IP though so I'm not sure if that means my routing rules are working right from the k8s vpc side.
DNS fwd/rev mismatch: ec2-XXX.compute-1.amazonaws.com != ip-{{IP OF MY MYSQL}}.ec2.internal
I'm not sure what steps to take next. Any direction would be greatly appreciated.
Side Note: I've read thru this Kubernetes container connection to RDS instance in separate VPC
I think I understand what's going on there. My CIDR blocks do not conflict with the default K8s ips (10.0...) so my problem seems to be different.
I know this was asked a long time ago, but I just ran into this problem as well.
It turns out I was editing the wrong AWS routing table! When I ran kops to create my cluster, it created a new VPC with its own routing table but also another routing table! I needed to add the peer connection route to the cluster's routing table instead of the VPC's Main routing table.
I am faced with a chicken and egg problem. I currently have a server in EC2 classic, as well as an RDS instance -- in EC2 classic as well. The EC2 instances also interact with Cassandra cluster, which also resides in EC2 classic.
However, I need to move RDS into the VPC. Now, in an ideal world, I'd have all of my stuff in VPC at this point. However, that presents a major migration challenge and I'd like to minimize impact on users and keep steps to minimum -- this is mainly because of the Cassandra cluster.
It turns out that I cannot create security group rules between VPC and Non-VPC security groups.
So, how can I have RDS in VPC that my EC2 instances can access w/o having to open up my RDS to the entire world ?
Any help is greatly appreciated.
UPDATE: So, one idea I had is to assign elastic IPs to my EC2 instances and add IPs explicitly to the security group for RDS within VPC. Would that work ? (trying it now using https://github.com/skymill/aws-ec2-assign-elastic-ip)
Yes, unfortunately that's the only way to do it. You cannot use DNS in security groups, so you're stuck with IP address.
So, I ended up solving it exactly like I described -- assign elastic IPs to my EC2 instances and add IPs explicitly to the security group for RDS within VPC. It ended up working great.
When a new security group is added, or the existing one is modified, the affects are not visible. For instance, I have a security group called “mdi-sg-redshift” with two rules:
As you can see, these rules allow inbounds from anyone across the globe. When applied to the cluster, they should allow inbounds at those ports. Does NOT work! I have rebooted the cluster to no affect.
Here is the snapshot of my Redshift Cluster:
Here is the snapshot of the port scanner.
The cluster was rebooted several times to no effect.
Also noted that the cluster belongs to the same region as the VPC and the security group. The cluster belongs to the VPC that has the security group applied.
I have seen similar issues on EC2 side, but reboots usually fixed it. Not this time.
Anyone with insights? Thanks!
This sounds mostly a VPC rules issue.
Things I will check:
Do you get the same issue if you create your cluster outside of VPC?
Check Cluster Subnet group. It says default in your screen shot. Which subnet groups is dded to this default subnet group? Make sure your cluster is running in the subnet which is added to default subnet group.
Check VPC security group policy for the Red-shift cluster
Did this set-up ever worked in the past ? OR is it the 1st time you are working on this cluster? If it worked in the past, then what setting with respect to VPC/cluster subnet group/ VPC security groups has changed?
Where are you accessing Redshift from?
If you ar trying to access Redshift from outside VPC then please check the Route Table for an entry of Internet Gateway (to verify if the Redshift cluster is publicly available over internet)
If you are trying to access Redshift from within VPC then there might be some other issue that might be stopping access