I have recently set up an EKS cluster in AWS for my company's new project. Before I get into my issue, here is some info about my setup. There are two nodes (at the moment) in the node group, with auto-scaling. At first, I created the nodes in the private subnet because I thought it was more secure. But my supervisor told me that we will need the capability to SSH to the nodes directly, so I recreated the nodes in the public subnet so that we can SSH to them using a public key.
We have two CMS instances sitting in AWS (for example aws.example.com) and DigitalOcean (for example do.example.com) that contain some data. When the containers in the EKS cluster start, some of them need to access those instances using the URL aws.example.com or do.example.com. If the containers in EKS fail to access the instances, the containers will still run but the apps in them won't. So I need to whitelist the public IPs of all my EKS nodes on the two CMS instances in order for the apps to work. And it works.
I am using an ingress in my EKS cluster. When I created the ingress object, AWS created an Application Load Balancer for me. All inbound traffic is handled by the ALB, which is great. But here comes the issue: when more containers are created, auto-scaling spins up new nodes in my EKS cluster (with a different public IP every time), and then I have to go to the two CMS instances to whitelist the new public IP addresses.
So, is there any way to configure things so that all the nodes use a single fixed IP address for outbound traffic? Or maybe configure them to use the ALB created by the ingress for outbound traffic as well? Or do I need to create a server to do that? I am very lost right now.
Here is what I have tried:
When the cluster was created, it seems it created a private subnet as well, even though I specified that the nodes should be created in the public subnet. There is a NAT gateway (ngw-xxxxxx) created for the private subnet, and it comes with an Elastic IP (for example 1.2.3.4). The route table of the public subnet is as below:
192.168.0.0/16 local
0.0.0.0/0 igw-xxxxxx
So I thought that by changing igw-xxxxxx to ngw-xxxxxx, all outbound traffic would go through ngw-xxxxxx and reach the outside world using IP address 1.2.3.4, so I would only need to whitelist 1.2.3.4 on my two CMS instances. But right after I applied the change, all the containers were terminated and everything stopped working.
Exactly as @Marcin mentioned in the comment above.
You should move your node group to the private subnet.
Private subnet
As the docs say:
Use a public subnet for resources that must be connected to the internet, and a private subnet for resources that won't be connected to the internet
The idea of a private subnet is to forbid direct access from the internet to the resources inside it.
You can read a really good part of the AWS documentation here: https://docs.aws.amazon.com/vpc/latest/userguide/how-it-works.html
For a private subnet you need to route outgoing traffic through your NAT gateway in the route table (read here: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html).
Public subnet
If you need your cluster in the public subnet for some reason (even though it is bad practice), you can do the following trick:
You can route traffic from the public subnet via the NAT gateway only to a specific server (your CMS).
Your public subnet route table may look like:
Destination      Target
10.0.0.0/16      local
W.X.Y.Z/32       nat-gateway-id
0.0.0.0/0        igw-id
Where W.X.Y.Z/32 is your CMS IP address.
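If you manage the VPC as code, the same /32 route can be written declaratively. Here is a minimal CloudFormation-style sketch, assuming a placeholder PublicRouteTable resource and the NAT gateway that was already created for you:
Resources:
  CmsEgressRoute:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PublicRouteTable   # placeholder: the public subnet's route table
      DestinationCidrBlock: W.X.Y.Z/32      # your CMS instance's public IP
      NatGatewayId: ngw-xxxxxx              # the existing NAT gateway with its Elastic IP
The 0.0.0.0/0 -> igw-id route stays in place, so the nodes keep their normal internet access and only traffic to the CMS leaves through the NAT gateway's Elastic IP.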
Some hints
Moreover, it is good practice to allocate a pool of EIPs and attach them to the NAT gateway explicitly, to be sure the address will not change in the future.
When you want to modify the infrastructure and build a more complicated NAT (e.g. you want to filter traffic at layer 7), you can create high-availability NAT instances and attach the EIPs to the NAT instances instead of the NAT gateways.
In either situation you will avoid emailing third-party API providers to whitelist your new IPs.
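As a rough illustration of the first hint, here is a minimal CloudFormation-style sketch (resource names are placeholders) that allocates an EIP explicitly and pins the NAT gateway to it:
Resources:
  NatEip:
    Type: AWS::EC2::EIP
    Properties:
      Domain: vpc                               # allocate the address in the VPC scope
  NatGateway:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatEip.AllocationId # pin the NAT gateway to this EIP
      SubnetId: !Ref PublicSubnet               # placeholder: a public subnet
Because the EIP is its own resource, rebuilding the NAT gateway later can reuse the same address, so the IP you whitelisted with third parties stays valid.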
Related
Using Terraform to set up a VPC with two EC2s in private subnets. The setup needs SSH access to the EC2s to install package updates from the Internet and install the application software. To do this there is an IGW and a NAT-GW in a public subnet. Both EC2s can access the Internet at this point, as both private subnets route to the NAT-GW. Terraform and SSH access to the private subnets are done via Client VPN.
One of the EC2s is going to host a web service, so a Classic mode Load Balancer is added and configured to target the web server EC2. I'm using Classic mode because I can't find a way to make Terraform build Application mode LBs. The Load Balancer requires the instance to be in a subnet that routes to the IGW, so its subnet is changed from routing to the NAT-GW to routing to the IGW. At this point, the Load Balancer comes online with the EC2 responding, and the public Internet can access the web service using the DNS endpoint supplied for the LB.
But now the web server EC2 can no longer access the Internet itself. I can't curl google.com or get package updates.
I would like to find a way to let the EC2 access the Internet from behind the LB and not use CloudFront at this time.
I would like to keep the EC2 in a private subnet because a public subnet causes the EC2 to have a public IP address, and I don't want that.
I'm looking for a way to make the LB work without switching subnets, as that would make the EC2 web service unavailable when doing updates.
I don't want any iptables or firewalld tricks. I would really like an AWS solution that is distro agnostic.
A few points/clarifications about the problems you're facing:
Instances on a public subnet do not need a NAT Gateway. They can initiate outbound requests to the internet via the IGW. The NAT Gateway is for allowing outbound IPv4 connections from instances in private subnets.
The load balancer itself needs to be on a public subnet. The instances that the LB will route to do not. They can be in the same subnet or different subnets, public or private, as long as traffic is allowed through security groups.
You can create instances without a public IP on a public subnet. However, they won't be able to receive traffic from or send traffic to the internet.
Terraform supports ALBs. The resource is aws_lb with load_balancer_type set to "application" (this is the default option).
That said, the public-private configuration you want is entirely possible.
1. Your ALB and NAT Gateway need to be on the public subnet, and EC2 instances on the private subnet.
2. The private subnet's route table needs to have a route to the NGW, to facilitate outbound connections.
3. EC2 instances' security group needs to allow traffic from the ALB's security group.
It sounds like you got steps 1 and 2 working, so the connection from ALB to EC2 is what you have to work on. See the documentation page here as well - https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Scenario2.html
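The asker is working in Terraform, but as a provider-neutral illustration of steps 2 and 3, a CloudFormation-style sketch might look like this (all resource names are placeholders):
Resources:
  PrivateDefaultRoute:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable         # placeholder: the private subnet's route table
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway                # placeholder: the NAT gateway in the public subnet
  InstanceIngressFromAlb:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref InstanceSecurityGroup          # placeholder: the EC2 instances' security group
      SourceSecurityGroupId: !Ref AlbSecurityGroup # placeholder: the ALB's security group
      IpProtocol: tcp
      FromPort: 80
      ToPort: 80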
I want some of my GKE deployments to use a public static IP for egress traffic to the internet.
Here is what I already know is possible:
Use GCP's NAT gateway and NAT all public traffic from a cluster/VPC
Create a GCE instance with IP forwarding and create a routing rule to route specific traffic through the GCE instance, to selectively NAT traffic
I'd like to avoid either and just assign a reserved global IP to a GKE deployment/pod (like I can assign a reserved IP to an ingress). Is this at all possible?
I want outbound traffic from some pods (deployments) to use the same static public IP, but for most deployments I don't want to NAT their traffic at all.
I also can't use the underlying nodes' public IPs because I autoscale and the nodes' IPs could change - you can't use reserved IPs for nodes as far as I know.
EDIT: Azure seems to support what I'm looking for with azure-egress https://learn.microsoft.com/en-us/azure/aks/egress. So I can see at least one provider has an official solution for this. I am wondering if GKE has something similar.
You should go with the 2nd option: create a GCE instance that will serve as a NAT instance.
Then, you can assign different network tags for different node pools in your cluster, so only one of your node pools will route its public traffic to the NAT instance you created.
You can then use node taints and tolerations to make sure that only the deployments you want to route through that NAT instance are allocated to the nodes in your special node pool.
For example, configure this taint: traffic=nat:NoExecute and add the following toleration to your deployment:
tolerations:
- effect: NoExecute
  key: traffic
  value: "nat"
I am new to AWS, Kubernetes, EKS, and AppMesh, but had done DevOps in previous roles.
I am taking over a K8s cluster that uses EKS, and I found that we set up a NAT gateway that sends egress traffic outbound from a single IP (we need that for whitelisting, as a third-party external service requires it). Pods hosted in a private subnet work fine.
But I found that Pods hosted on the public subnet just skip the NAT gateway; they use the node's public IPv4 address for outbound calls, which doesn't work for us because it doesn't go through the single NAT gateway IP.
So I have a few questions:
How do we migrate Pods from public-subnet hosts to private-subnet hosts?
Should we use nodeSelector or node affinity? Does labeling the nodes work?
I am not sure why we have Nodes in a public subnet, but we followed this guide: https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html
If we do choose to be on fully private subnets, can we make an exception so that some Pods can expose HTTP endpoints for ingress traffic while still being on private subnets?
What do you recommend for handling a Pod/Container that needs to use the NAT gateway for egress traffic while also exposing HTTP endpoints for ingress traffic?
Note that currently our EKS cluster is set to all public by default; should we move to public-and-private mode?
Thanks in advance for all the answers!
How do we migrate Pods from public-subnet hosts to private-subnet hosts? Should we use nodeSelector or node affinity? Does labeling the nodes work?
Yes. Use node affinity, which is a more expressive form of nodeSelector. You can do a rolling change by updating whatever resource you are using to manage your pods (i.e. Deployment, StatefulSet, DaemonSet, etc.). If you configure it correctly, the next time your pods start they will be on the private-subnet hosts.
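For example, here is a minimal sketch of the pod template in a Deployment, assuming the private-subnet nodes carry a hypothetical subnet-type: private label (use whatever label your node group actually has):
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: subnet-type        # hypothetical node label
                operator: In
                values:
                - private
Once you apply the change, the rolling update replaces the pods and the scheduler places the new ones only on the labeled (private-subnet) nodes.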
I am not sure why we have Nodes in a public subnet, but we followed this guide:
https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html
The guide says public subnet so it makes sense that there is one.
If we do choose to be on fully private subnets, can we make an exception so that some Pods can expose HTTP endpoints for ingress traffic while still being on private subnets?
Yes! You can create an externally facing load balancer (ALB, NLB, or Classic ELB). These can also be managed by Kubernetes if you use a Service of type LoadBalancer. You'll need the appropriate annotations in your Service definitions to get what you want.
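As a minimal sketch, assuming the standard in-tree AWS cloud-provider annotations (the app name and ports are placeholders):
apiVersion: v1
kind: Service
metadata:
  name: my-app                 # placeholder name
  annotations:
    # request a Network Load Balancer instead of a Classic ELB; internet-facing is the default scheme
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: my-app                # placeholder pod label
  ports:
  - port: 80                   # port exposed by the load balancer
    targetPort: 8080           # placeholder container port
For EKS, the public subnets typically also need the kubernetes.io/role/elb tag so the controller knows where to place an internet-facing load balancer.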
What do you recommend for handling a Pod/Container that needs to use the NAT gateway for egress traffic while also exposing HTTP endpoints for ingress traffic?
Use an externally facing load balancer that forwards traffic to your pods in the private subnets via the Kubernetes Service type LoadBalancer, and use AWS NAT gateways for outgoing internet traffic.
Disclaimer: This is just a recommendation, there are other combinations and alternatives.
I have a few Elastic Beanstalk applications in the same VPC (the setup could also be reduced to one application), and I'd like them to be accessible both via one IP address (for both inbound and outbound traffic) and via their own URLs. I've seen that this can be done via NAT, but I haven't found documentation on whether this covers all traffic (in both directions) and whether it can be done alongside the original endpoints. Another question is whether there is a better way to do this.
NAT is used to provide internet access for instances in private subnets. In this case all instances in the subnet will have the same external IP. But you won't be able to access your private instances using that IP; it's only for outbound traffic.
In your case I'd go with an ELB. Following best practices, keep the instances running your applications in private subnets and:
Have an internet-facing ELB in public subnets (you'll need at least 2 public subnets in different AZs).
Create a Target Group and add your instances with running apps to it.
Assign the Target Group to the listener on your ELB.
Configure the security groups on ELB and app instances to allow the traffic on the port the applications are serving (usually it's 8080).
As a result you'll have your instances accessible via the ELB URL. If you want a pretty URL, you can configure it in Route 53 and point it to the ELB URL.
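If you prefer a template, here is a minimal CloudFormation-style sketch of the target group and listener steps (names, ports, and references are placeholders):
Resources:
  AppTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref Vpc                         # placeholder: your VPC
      Protocol: HTTP
      Port: 8080                              # the port your applications serve on
      Targets:
      - Id: !Ref AppInstance                  # placeholder: an app instance in a private subnet
  HttpListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref AppLoadBalancer   # placeholder: the internet-facing load balancer
      Protocol: HTTP
      Port: 80
      DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref AppTargetGroup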
It's not possible using the AWS-provided NAT gateway alone, but it can be achieved by hosting a box that runs both a load balancer and NAT on the same instance with an EIP. Map your domain to that IP for incoming traffic; for outgoing traffic, set that instance as the target of the 0.0.0.0/0 route in the route table of the private app subnet. However, this is not the recommended approach, since the front-facing instance becomes a single point of failure (SPOF).
The recommended way is to use an ELB as the front-facing entry point and a NAT cluster for outgoing traffic, for high availability.
I'm putting the instances behind the AWS load balancer. I have configured the route table and attached the IGW to it, created the load balancer, and added the instance to it. Everything works well; the endpoint URL of the AWS load balancer is able to load the HTTP pages.
Now I have removed the IGW from the route table and tested it again. The AWS load balancer endpoint URL is not able to load the page, but the instance status in the AWS load balancer shows as InService.
Why is the IGW required when the load balancer is configured over a private subnet? It technically means it's a public subnet, which is blocking me from creating a NAT instance.
A subnet without a default route pointing to the igw-xxxxxxxx Internet Gateway object is, by definition, a private subnet. If you remove the igw from a public subnet, you now have a private subnet.
Placing an Internet-facing load balancer (ELB) in such a private subnet is incorrect.
It sounds as though you are making a commonly-made -- but incorrect -- assumption that the ELB should be configured in the same subnets as the instances behind it. This is also incorrect.
Provision the ELB in public subnets, without regard to the subnets the instances behind it were placed in.
In summary:
Internet-facing ELB requires a public subnet for placement.
NAT instance requires a public subnet for placement.
The instances that use these services (NAT and ELB) belong in different -- private -- subnets from the ELB and NAT instances.
ELB and NAT can be placed together in the same subnets, or separately, as long as the subnets are public (have the IGW as their default route) and are in the same availability zones.
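As a small illustration of the placement rule, an internet-facing load balancer is declared against public subnets only, regardless of where the backend instances live. A CloudFormation-style sketch with placeholder names:
Resources:
  PublicLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Scheme: internet-facing
      Subnets:
      - !Ref PublicSubnetA      # placeholder: public subnet in one availability zone
      - !Ref PublicSubnetB      # placeholder: public subnet in another availability zone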
I believe you cannot do anything without an IGW attached to the route table that the subnet is associated with.
Another way to do this is to spin up a NAT instance (available in the AWS Marketplace) in the public subnet and add it to the private route table that your original instance is on (0.0.0.0/0 -> instance-id); all the traffic will then be routed through the NAT instance.
Here we mainly have to look at one thing: whether the subnet provided for the Elastic Load Balancer is public or private.
Every VPC should have one IGW to connect to the public internet, and all internet traffic has to go through that IGW. If the VPC is attached to an IGW, the IGW provides internet access to the instances in that VPC; if a particular subnet's route table is changed, internet traffic can reach only the instances whose subnets still route to the IGW. Here the ELB shows the instance as InService because both are in the same VPC and can communicate with each other, which is how the status check works. The IGW also plays the main role when you are using NAT.
We always provide the IP range 0.0.0.0/0 for the IGW route in the route table; it represents all public (internet) traffic.
The following link will explain more : http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario2.html
This small explanation might be helpful for someone.
Let me cover your two questions
the AWS load balancer endpoint URL is not able to load the page, but the instance status in the AWS load balancer shows as InService
This is the default behaviour of the load balancer: internally, the load balancer and your instance are in the same VPC, so they are able to communicate, which is why the load balancer shows the InService status.
Second question: Why is the IGW required when the load balancer is configured over a private subnet? It technically means it's a public subnet, which is blocking me from creating a NAT instance.
You need an IGW if you want to access any resources (especially EC2 instances or load balancers) from the internet. However, if you put your load balancer in a private subnet, the IGW is not associated with the subnet containing the load balancer, and hence the load balancer is not accessible from outside your VPC. That is the reason you were not able to load your page.
A NAT instance is usually used when you want your private-subnet instances to be able to initiate requests over the internet; it has nothing to do with a normal load balancing setup, unless you also want your instances to install updates from the internet.
You are trying to access the webpage publicly while removing the IGW route entry from the load balancer's subnet.
A subnet without an IGW becomes private, hence you can't access it from the internet.
First, a subnet with a route table that routes traffic via the internet gateway (IGW) is a public subnet. An IGW is required because subnets created in an AWS VPC use internal IPs, and since internal IPs are not routable over the internet, traffic to and from EC2 instances with internal IPs needs a way to complete these requests. This is where an IGW comes into play. The IGW allows your EC2 instance to make outbound requests to the internet and allows other users/clients to make inbound requests to your EC2 instance.
A public subnet is a group of IPs (a subnet) in your VPC that allows internet traffic to and from your EC2 instances. A subnet without a route to an internet gateway is a private subnet. As you already guessed, no internet traffic is allowed in or out.
That said, instances in a private subnet still need to initiate outbound requests to the internet to download software or perform updates. In this case you have to create a NAT gateway or NAT instance and route the private subnet's outbound traffic through it. A NAT gateway or NAT instance only allows outbound traffic to the internet, not the other way round. In some cases you might want your production EC2 instances to be in a private subnet and the ELB in a public subnet for security reasons.
The ELB usually belongs to a public subnet so it is reachable from the internet, as in your case as well.
To answer some of your questions: when you deleted the IGW from the route table, your ELB's subnet automatically became a private subnet, and as a result your web page stopped loading.
Also, you could still see the EC2 instance behind the ELB as InService even after you deleted the IGW, because the ELB and the EC2 instance can communicate via internal IPs as they are in the same network (VPC).
The ELB needs a route to the internet in order to send you the response over the internet. As simple as that.
Configure your ELB in a public subnet, regardless of where your instances are.
Basically there are two types of load balancers.
1) Internal
2) External
Internal load balancers are launched in a private subnet and are accessible only internally, by instances in the same VPC as the internal ELB.
External load balancers are accessible over the internet and should be launched in a subnet that has an internet gateway attached and a route table configured properly to route the requests.
If you attach an internet gateway to a subnet, it becomes a public subnet. Also, if you create a load balancer that needs to be accessible from the internet, it should be an external load balancer, and AWS will not allow it to launch in a private subnet. The instances are showing InService because they are communicating internally using private IP addresses.