I am new to AWS, Kubernetes, EKS, and AppMesh, but have done DevOps in previous roles.
I am taking over a Kubernetes cluster on EKS and found that we set up a NAT gateway so that egress traffic leaves through a single IP (we need that for whitelisting, since a third-party external service requires it). Pods hosted in a private subnet work fine.
But I found that Pods hosted on a public subnet skip the NAT gateway: they use the node's public IPv4 address for outbound calls, which doesn't work for us because the traffic does not go out through the single NAT gateway IP.
So I have a few questions:
How do we migrate Pods from public-subnet hosts to private-subnet hosts?
Should we use nodeSelector or node affinity? Does labeling the Nodes work?
I am not sure why we have Nodes in a public subnet, but we followed this guide: https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html
If we do choose to be on fully private subnets, can we make an exception so that some Pods can still expose HTTP endpoints for ingress traffic while remaining on private subnets?
What do you recommend for the case where a Pod/Container needs to use the NAT gateway for egress traffic, but also needs to expose HTTP endpoints for ingress traffic?
Note that currently our EKS cluster is set to all public by default; should we move to public-and-private mode?
Thanks in advance for all the answers!
How do we migrate Pods from public-subnet hosts to private-subnet hosts? Should we use nodeSelector or node affinity? Does labeling the Nodes work?
Yes. Use node affinity, which works much like a nodeSelector. You can do a rolling change by updating whichever resource manages your Pods (e.g. a Deployment, StatefulSet, or DaemonSet). If you configure it correctly, the next time your Pods start they will be scheduled onto the private-subnet hosts.
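For example, here is a minimal sketch assuming you label the private-subnet nodes yourself; the subnet=private label and all names here are placeholders, not labels EKS applies for you:

```yaml
# First, label the nodes that live in private subnets, e.g.:
#   kubectl label nodes <node-name> subnet=private
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      nodeSelector:             # simplest form; node affinity can express the same constraint
        subnet: private
      containers:
        - name: my-app
          image: my-app:latest  # placeholder image
```

Because this edits the Pod template, applying it triggers a rolling update, so the Pods move onto the private-subnet nodes without downtime.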
I am not sure why we have Nodes in a public subnet, but we followed this guide:
https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html
The guide you followed creates a public subnet, so it makes sense that there is one.
If we do choose to be on fully private subnets, can we make an exception so that some Pods can still expose HTTP endpoints for ingress traffic while remaining on private subnets?
Yes! You can create an internet-facing load balancer (ALB, NLB, or Classic ELB). These can also be managed by Kubernetes if you use a Service of type LoadBalancer; you'll need the appropriate annotations in your Service definition to get the behavior you want.
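For example, a minimal sketch of an internet-facing NLB managed through a Service; the annotation is the standard AWS cloud-provider one, and the names and ports are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    # ask the AWS cloud provider for an NLB instead of a Classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer            # internet-facing by default
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```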
What do you recommend for the case where a Pod/Container needs to use the NAT gateway for egress traffic, but also needs to expose HTTP endpoints for ingress traffic?
Use an internet-facing load balancer (via the Kubernetes Service type LoadBalancer) that forwards ingress traffic to the Pods in your private subnets, and use an AWS NAT gateway for outgoing internet traffic.
Disclaimer: this is just a recommendation; there are other combinations and alternatives.
Related
Using Terraform to set up a VPC with two EC2s in private subnets. The setup needs to SSH to the EC2s to install package updates from the internet and install the application software. To do this there is an IGW and a NAT-GW in a public subnet. Both EC2s can access the internet at this point, as both private subnets route to the NAT-GW. Terraform and SSH access to the private subnets go via Client VPN.
One of the EC2s is going to host a web service so a Classic mode Load Balancer is added and configured to target the web server EC2. Using Classic mode because I can't find a way to make Terraform build Application mode LBs. The Load Balancer requires the instance to be using a subnet that routes to the IGW, so it is changed from routing to the NAT-GW, to the IGW. At this point, the Load Balancer comes online with the EC2 responding and public Internet can access the web service using the DNS supplied End Point for the LB.
But now the web server EC2 can no longer access the Internet itself. I can't curl google.com or get package updates.
I would like to find a way to let the EC2 access the Internet from behind the LB and not use CloudFront at this time.
I would like to keep the EC2 in a private subnet because a public subnet causes the EC2 to have a public IP address, and I don't want that.
Looking for a way to make the LB work without switching subnets, as that would make the EC2 web service unavailable when doing updates.
Not wanting any iptables or firewalld tricks; I would really like an AWS solution that is distro agnostic.
A few points/clarifications about the problems you're facing:
Instances in a public subnet do not need a NAT gateway; they can initiate outbound requests to the internet via the IGW. The NGW is for allowing outbound IPv4 connections from instances in private subnets.
The load balancer itself needs to be on a public subnet. The instances that the LB will route to do not. They can be in the same subnet or different subnets, public or private, as long as traffic is allowed through security groups.
You can create instances without a public IP in a public subnet; however, they won't be able to receive traffic from or send traffic to the internet.
Terraform supports ALBs. The resource is aws_lb with load_balancer_type set to "application" (this is the default option).
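For example, a minimal Terraform sketch; the resource names, subnets, and security group are placeholders:

```hcl
resource "aws_lb" "web" {
  name               = "web-alb"
  load_balancer_type = "application"  # the default, shown here for clarity
  internal           = false          # internet-facing
  # an ALB needs at least two subnets in different availability zones
  subnets            = [aws_subnet.public_a.id, aws_subnet.public_b.id]
  security_groups    = [aws_security_group.alb.id]
}
```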
That said, the public-private configuration you want is entirely possible.
1. Your ALB and NAT gateway need to be in the public subnet, and the EC2 instances in the private subnet.
2. The private subnet's route table needs to have a route to the NGW, to facilitate outbound connections.
3. The EC2 instances' security group needs to allow traffic from the ALB's security group.
It sounds like you got steps 1 and 2 working, so the connection from the ALB to the EC2 instances (step 3) is what you have to work on. See the documentation page here as well: https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Scenario2.html
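A hedged Terraform sketch of steps 2 and 3, with all resource names as placeholders:

```hcl
# Step 2: default route from the private subnet to the NAT gateway
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

# Step 3: allow the ALB's security group to reach the web instances
resource "aws_security_group_rule" "alb_to_web" {
  type                     = "ingress"
  from_port                = 80
  to_port                  = 80
  protocol                 = "tcp"
  security_group_id        = aws_security_group.web.id
  source_security_group_id = aws_security_group.alb.id
}
```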
We have 10 instances to which we deployed our app using AWS ECS and an ELB.
For security reasons, the API we call only allows requests from specific whitelisted IP addresses.
So we are planning to pass the requests through a proxy.
How do we route an API request through a proxy?
We are using nginx.
Any specific way to route API requests through a proxy would be helpful.
You won't need NGINX as a proxy for this use case; I'd suggest looking into AWS NAT gateways instead. A NAT gateway is a highly available, AWS-managed service that makes it easy to connect to the internet from instances within a private subnet in an Amazon Virtual Private Cloud (Amazon VPC). It's the perfect place to provide a single static IP for all of your subnet's outbound traffic.
Giving the NAT gateway a static IP (an Elastic IP) for your cluster's outbound traffic allows the different tasks running inside your ECS cluster's private subnets to look like a single requesting entity from an outsider's point of view (in your case, the third-party API is the outsider). To achieve this, you will have to do the following (a Terraform sketch follows the list):
Create two route tables (one for private subnets, one for public subnets).
Attach an internet gateway to the VPC (it serves the public subnets).
Allocate an Elastic IP address.
Create a NAT gateway and attach the Elastic IP to it (this is the IP you whitelist with the third-party API).
Ensure that all your tasks run inside private subnets of the VPC.
Add a route in the private subnets' route table that sends outbound 0.0.0.0/0 traffic to the NAT gateway.
Add a route in the public subnets' route table that sends outbound 0.0.0.0/0 traffic to the internet gateway.
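If you manage this with Terraform, the core pieces of the steps above look roughly like the sketch below; it is just one way to express them, and every name is a placeholder:

```hcl
# The Elastic IP you whitelist with the third-party API
resource "aws_eip" "nat" {
  domain = "vpc"  # older AWS providers use `vpc = true` instead
}

resource "aws_nat_gateway" "main" {
  subnet_id     = aws_subnet.public.id
  allocation_id = aws_eip.nat.id
}

# Private subnets: send all outbound traffic through the NAT gateway
resource "aws_route" "private_out" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.main.id
}

# Public subnets: send outbound traffic straight to the internet gateway
resource "aws_route" "public_out" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.main.id
}
```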
You should consider using a NAT gateway instead. I am assuming you already have all your containers in a VPC, so you can create a new NAT gateway within this VPC itself.
You can refer to articles attached below to do this:
https://docs.aws.amazon.com/appstream2/latest/developerguide/add-nat-gateway-existing-vpc.html
https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html
Note: NAT gateways have a price associated with them (an hourly charge plus per-GB data processing).
If needed, your Lambda functions can also use the Elastic IP provided by the NAT gateway, by attaching them to the VPC's private subnets.
I have an EKS cluster with worker nodes in a private subnet. The worker nodes can access the internet via the NAT gateway. I have a Route53 hosted zone record routing traffic (an alias) to a load balancer.
When I try to access the URL (the Route53 record) from a pod within the EKS cluster, it times out. I tried allowing the worker nodes' security group in the inbound rules of the load balancer's security group, but it does not work. The only thing that works is allowing the public IP of the NAT gateway in the inbound rules of the load balancer's security group.
I am sure this setup is very common. My question is: is allowing the NAT gateway's public IP in the inbound rules of the LB SG the correct way, or is there a better, cleaner way to allow the access?
Based on what you have described, it seems like you have an internet-facing load balancer and are trying to access it from a pod. In this case the traffic needs to go out to the internet (through the NAT gateway) and come back in to the load balancer, which is why it only works when you add the NAT gateway's public IP to the load balancer's SG.
Now, in terms of the solution, it depends on what you are trying to do here:
If you only need to consume the service inside the cluster, use the DNS name Kubernetes creates for that Service (e.g. my-service.my-namespace.svc.cluster.local); the traffic then stays inside the cluster. You can read more in the Kubernetes documentation on DNS for Services and Pods.
If you need to make the service available to other clusters in the same VPC, you can use a private load balancer and allow the worker nodes' security group in the load balancer's SG (see the sketch after this list).
If the service needs to be exposed to the internet, then your solution works, but you have to open the public load balancer's SG to all public IPs that access the service.
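For option 2, a minimal sketch of a private load balancer; the internal annotation is the standard AWS cloud-provider one, and the names and ports are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    # provision an internal load balancer instead of an internet-facing one
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: my-service
  ports:
    - port: 80
      targetPort: 8080
```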
I have recently set up an EKS cluster in AWS for my company's new project. Before I get into my issue, here is some info about my setup. There are two nodes (at the moment) in the node group, with auto-scaling. At first I created the nodes in a private subnet, as I think that is more secure, but my supervisor told me we need the capability to SSH to the nodes directly, so I recreated the nodes in a public subnet so that we can SSH to them using a public key.
We have two CMS instances sitting in AWS (for example aws.example.com) and DigitalOcean (for example do.example.com) that contain some data. When the containers in the EKS cluster start, some of them need to reach those instances via the URLs aws.example.com or do.example.com. If a container fails to access the instances, the container still runs but the app in it doesn't. So I need to whitelist the public IPs of all my EKS nodes on the two CMS instances in order for the app to work, and it works.
I am using an Ingress in my EKS cluster. When I created the Ingress object, AWS created an Application Load Balancer for me, and all the inbound traffic is handled by the ALB, which is great. But here comes the issue: when more containers are created, auto-scaling spins up new nodes in my EKS cluster (with a different public IP each time), and then I have to go to the two CMS instances to whitelist the new public IP address.
So, is there any way to configure all the nodes to use a single fixed IP address for outbound traffic? Or maybe configure them to use the ALB created by the Ingress for outbound traffic as well? Or do I need to create a server for that? I am very lost right now.
Here is what I have tried:
When the cluster was created, it seems a private subnet was created as well, even though I specified that the nodes be created in the public subnet. There is a NAT gateway (ngw-xxxxxx) for the private subnet, and it comes with an Elastic IP (for example 1.2.3.4). The route table of the public subnet is as below:
192.168.0.0/16 local
0.0.0.0/0 igw-xxxxxx
So I thought that by changing igw-xxxxxx to ngw-xxxxxx, all outbound traffic would go through ngw-xxxxxx and reach the outside world from IP address 1.2.3.4, meaning I would only need to whitelist 1.2.3.4 on my two CMS instances. But right after I applied the change, all containers were terminated and everything stopped working.
Exactly, as #Marcin mentioned in the comment above.
You should move your node-group to the private subnet.
Private subnet
As the docs say:
Use a public subnet for resources that must be connected to the internet, and a private subnet for resources that won't be connected to the internet
The idea of a private subnet is to prevent direct access from the internet to the resources inside it.
You can read a really good part of the AWS documentation here: https://docs.aws.amazon.com/vpc/latest/userguide/how-it-works.html
For a private subnet you need to set up outgoing traffic through your NAT gateway in the route table (read here: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html).
Public subnet
If you need your cluster in the public subnet for some reason (although it is bad practice), you can do the following trick:
route traffic from the public subnet through a NAT gateway only for a specific server (your CMS).
Your public subnet route table may look like:
Destination      Target
10.0.0.0/16      local
W.X.Y.Z/32       nat-gateway-id
0.0.0.0/0        igw-id
Where W.X.Y.Z/32 is your CMS IP address.
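In Terraform, for instance, that /32 route is a single aws_route resource (the IDs are placeholders):

```hcl
resource "aws_route" "cms_via_nat" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "W.X.Y.Z/32"  # your CMS IP address
  nat_gateway_id         = aws_nat_gateway.main.id
}
```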
Some hints
Moreover, it is good practice to allocate a pool of Elastic IPs and attach them to the NAT gateway, to be sure the address does not change in the future.
When you want to modify the infrastructure and build something more complicated than NAT (e.g. you want to filter traffic at layer 7), you can create high-availability NAT instances and attach the IPs to the NAT instances instead of the NAT gateways.
In that situation you will avoid emailing third-party APIs to whitelist your new IPs.
I want some of my GKE deployments to use a public static IP for egress traffic to the internet.
Here is what I already know is possible:
Use GCP's NAT gateway and NAT all public traffic from a cluster/VPC.
Create a GCE instance with IP forwarding and a routing rule that routes specific traffic through the GCE instance, to selectively NAT traffic.
I'd like to avoid either and just assign a reserved global IP to a GKE deployment/pod (like I can assign a reserved IP to an ingress). Is this at all possible?
I want outbound traffic from some pods (deployments) to use the same static public IP, but for most deployments I don't want to NAT their traffic at all.
I also can't use the underlying nodes' public IPs because I autoscale and a node's IP could change; you can't use reserved IPs for nodes as far as I know.
EDIT: Azure seems to support what I'm looking for with azure-egress https://learn.microsoft.com/en-us/azure/aks/egress. So I can see at least one provider has an official solution for this. I am wondering if GKE has something similar.
You should go with the second option: create a GCE instance that will serve as a NAT instance.
Then you can assign different network tags to different node pools in your cluster, so that only one of your node pools routes its public traffic through the NAT instance you created.
You can then use node taints and tolerations to make sure that only the deployments you want routed through that NAT instance are scheduled onto the nodes in your special node pool.
For example, configure this taint: traffic=nat:NoExecute and add the following toleration to your deployment:
tolerations:
  - effect: NoExecute
    key: traffic
    value: "nat"
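Note that a toleration only permits Pods to run on the tainted nodes; it does not force them there. To pin the deployment to the NAT node pool, you would typically pair the toleration with a nodeSelector, roughly like this (the pool name nat-pool is a placeholder; the label key is the one GKE puts on every node):

```yaml
# Taint the nodes in the NAT node pool first, e.g.:
#   kubectl taint nodes <node-name> traffic=nat:NoExecute
spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: nat-pool  # placeholder pool name
  tolerations:
    - effect: NoExecute
      key: traffic
      value: "nat"
```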