Simplest way to setup egress gateway with EKS without Istio - amazon-web-services

How can I setup an egress gateway so that all traffic outbound from an EKS cluster has a single external IP? I need to do this to comply to corporate policies relating to incoming IPs when accessing external services outside the cluster. We are currently adding node IPs (for nodes with public IP) and NAT gateway IP (for private nodes) manually to the allowed range but it is getting more and more painful due to dynamically spinned up nodes by Karpenter.
Currently, we are using Kong to control ingress traffic, so Istio is not a choice.

Related

How to route all outgoing traffic of a Pod in GKE

We have a GKE Autopilot Cluster and an external Address/Cloud NAT set up. For certain Pods we want to ensure that all their outgoing traffic (layer 4) is routed through that external address.
The only possibilities I can think of is to make the whole Cluster private (and thus enforce use of the Cloud NAT) or to use a Service Mesh solution which could perhaps intercept all pakets via ebpf?
Are there other solutions to enforcing a routing to one external Address?
With the time being, there is no way to do this for the GKE Autopilot Cluster.
But by the end of October, there will likely be an upgrade to the Egress NAT policy that will enable users to setup SNAT based on pod labels, namespaces, and even the destination IP address.

AWS LoadBalancer access from EKS worker nodes in provate subnets

I have an EKS cluster with worker nodes in private subnet. The worker nodes can access internet via the nat gateway. I have a Route53 hosted zone record routing traffic (alias) to a load balancer.
When I try to access the url (route53 record) from a pod within the EKS cluster, it times out. I tried allowing the worker nodes security group in the inbound rules of the load balancer security group but it does not work. Only thing that works is if I allow the public IP of the nat gateway in the inbound rules of the load balancer security group.
I am sure this setup is very common. My question is, is the solution of allowing the nat gateway public ip in the inbound rules of the LB SG the correct way or is there a better cleaner way to allow the access?
based on what you have described here, it seems like you have a internet facing load balancer and trying to access it from the pod. In this case, the traffic needs to go out to internet(through nat gateway) and come back to the load balancer, that is why it only works when you add the public IP of nat gateway to load balancer's SG.
Now, in terms of the solution, it depends on what you are trying to do here:
if you only need to consume the service inside the cluster, you can use DNS name created for that service inside the cluster. in this case the traffic will stay inside the cluster. you can read more here
if you need to make the service available to other clusters but same VPC, you can use a private load balancer and add the security group of worker nodes to the load balancer SG.
if the service needs to be exposed to internet, then your solution works but you have to open the SG of the public load balancer to all public IPs accessing the service.

Is it possible to assign a reserved public IP to a GKE deployment for egress traffic?

I want some of my GKE deployments to use a public static IP for egress traffic to the internet.
Here is what I already know is possible:
Use gcp's nat gateway and NAT ALL public traffic from a cluster/vpc
Create a GCE instance with IP forwarding and create a routing rule to route specific traffic through the GCE instance- to selectively NAT traffic
I'd like to avoid either and just assign a reserved global IP to a GKE deployment/pod (like I can assign a reserved IP to an ingress). Is this at all possible?
I want outbound traffic from some pods (deployments) to use the same static public IP, but for most deployments I don't want to NAT their traffic at all.
I also can't use the underlying node's public IPs because I autoscale and the node's IP could change - you can't use reserved IPs for nodes as far as I know.
EDIT: Azure seems to support what I'm looking for with azure-egress https://learn.microsoft.com/en-us/azure/aks/egress. So I can see at least one provider has an official solution for this. I am wondering if GKE has something similar.
You should go with the 2nd option - Create a GCE instance that will serve as a NAT instance.
Then, you can assign different network tags for different node pools in your cluster, so only one of your node pools will route its public traffic to the NAT instance you created.
You then can use Node Taints and Tolerations, to make sure only the deployments you want to route to that NAT instance will be allocated to the nodes in your special node pool.
For example, configure this taint: traffic=nat:NoExecute and add the following toleration to your deployment:
tolerations:
- effect: NoExecute
key: traffic
value: "nat"

Migrate Pods from different EC2 hosts

I am new to AWS, Kubernetes, EKS, and AppMesh, but had done DevOps in previous roles.
I am taking over some K8 cluster which used EKS and found that we set up NAT gateway which helps redirect egress traffic outbound as a single IP (we need that for whitelisting as 3rd-party external service require it). Pods hosted in a private subnet works fine.
But I found that Pods which hosted on public subnet just skip the NAT gateway, it uses the Public DNS (IPv4) IP address for outbound calls, which don't work for us as it does not use the single NAT gateway.
So I have a few questions:
How do we migrate Pods from Public subnet Hosts to Private subnets Hosts?
Should we use nodeSelector, Node affinity? Do labelings on the Nodes work?
I am not sure why we have Nodes in a public subnet, but we followed this guide: https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html
If we do choose to be on fully private subnets, can we make an exception for such mapping that we allow some Pods to have HTTP endpoints to be exposed for ingress traffic, while still on private subnets?
What do you recommend us to handle when a Pod/Container needs to use NAT gateway for egress traffic, but then exposing HTTP endpoints for ingress traffic?
Note that currently, our EKS is by default set to all public, should we move to Public and private mode?
Thanks for all the answers in advance!.
How do we migrate Pods from Public subnet Hosts to Private subnets Hosts? Should we use nodeSelector, Node affinity? Do labelings on the Nodes work?
Yes. Use Node Affinity which same as using a nodeSelector. You can do a rolling change by updating whatever resource you are using to manage your pods (i.e Deployment, Statefulset, DaemonSet, etc). If you configured it correctly the next time your pods start, they will be in the private subnet hosts.
I am not sure why we have Nodes in a public subnet, but we followed this guide:
https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html
The guide says public subnet so it makes sense that there is one.
If we do choose to be on fully private subnets, can we make an exception for such mapping that we allow some Pods to have HTTP endpoints to be exposed for ingress traffic, while still on private subnets?
Yes! you can create an externally facing load balancer (ALB, NLB, or ELB). These can also be managed by Kubernetes if you use the Service type LoadBalancer. You'll need the appropriate annotations in your Service definitions to get what you want.
What do you recommend us to handle when a Pod/Container needs to use NAT gateway for egress traffic, but then exposing HTTP endpoints for ingress traffic?
Use an externally facing load balancer that forwards traffic to your private VPC with the Kubernetes Service type LoadBalancer and use AWS NAT Gateways for outgoing internet traffic.
Disclaimer: This is just a recommendation, there are other combinations and alternatives.

I have limited IP space in my AWS VPC. How do I setup Kubernetes in AWS where worker nodes and control plane are in different subnets

I have limited IP's in my public facing VPC which basically means I cannot run the K8S worker nodes in that VPC since I would not have sufficient IP's to support all the pods. My requirement is to run the control plane in my public facing VPC and the worker nodes in a different VPC with a private IP range (192.168.X.X).
We use traefik for ingress and have deployed traefik as a DaemonSet. These pods are exposed using a Kubernetes service of type NLB. And we created a VPC endpoint on top of this NLB which allows us to access this traefik endpoint through our public facing VPC.
However, based on docs it looks like NLB support is still in alpha stage. I am curious what are my other options given the above constraints.
Usually, in Kubernetes cluster Pods are running in separate overlay subnet that should not overlap with existing IP subnets in VPC.
This functionality is provided by Kubernetes cluster networking solutions like Calico, Flannel, Weave, etc.
So, you only need to have enough IP address space to support all cluster nodes.
The main benefit of using NLB is to expose client IP address to pods, so if there are no such requirements, regular ELB would be good for most cases.
You can add secondary CIDR to your vpc and use one of the two options mentioned here to have pods use the secondary vpc CIDR.
https://aws.amazon.com/blogs/containers/optimize-ip-addresses-usage-by-pods-in-your-amazon-eks-cluster/