We have a GKE Autopilot Cluster and an external Address/Cloud NAT set up. For certain Pods we want to ensure that all their outgoing traffic (layer 4) is routed through that external address.
The only possibilities I can think of are to make the whole Cluster private (and thus enforce use of the Cloud NAT) or to use a Service Mesh solution, which could perhaps intercept all packets via eBPF.
Are there other ways to enforce routing through one external Address?
For the time being, there is no way to do this on a GKE Autopilot Cluster.
But by the end of October, there will likely be an upgrade to the Egress NAT policy that will enable users to set up SNAT based on pod labels, namespaces, and even the destination IP address.
Related
I have deployed my EKS cluster in private subnets, and these subnets have internet access through a NAT gateway. I want to find out how much data is transferred from each pod to the NAT gateway.
You don't, really.
There is no metric available at that level of granularity. You can see the total bytes transferred in/out for each NAT gateway, but that won't tell you what share of the total each pod (or any other service in the private subnet, for that matter) accounts for.
By default, the containers of all pods in an EKS cluster share the network interface(s) of the cluster's host(s). This is more cost effective and conserves IP addresses in your VPC, but it means you can't track individual container traffic with flow logs. In theory (I don't recommend this), you could configure your cluster to assign a VPC network interface to each container, track traffic to your NAT gateway(s) with VPC flow logs, then filter and aggregate the data and relate it back to the originating pods to determine how much traffic each pod sent to the NAT gateway. In practice, this is difficult and expensive.
See "How can I find the top talkers or contributors to traffic through the NAT gateway in my VPC?" for more detail.
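As a rough illustration of the filter/aggregate step described above, here is a minimal Python sketch. It assumes you have already exported flow-log records and reduced them to (source IP, destination IP, bytes) tuples, that traffic headed for the NAT gateway shows the gateway's private IP as the destination, and that you have a pod-IP-to-pod-name mapping from somewhere else. The function name, the sample records, and the IP addresses are all made up for illustration; real flow-log records have more fields.

```python
from collections import defaultdict

def bytes_per_pod(records, nat_private_ip, pod_by_ip):
    """Sum bytes sent to the NAT gateway, grouped by originating pod.

    records        -- iterable of (src_ip, dst_ip, byte_count) tuples (assumed shape)
    nat_private_ip -- private IP of the NAT gateway (assumed to be the dst of NAT-bound traffic)
    pod_by_ip      -- dict mapping pod IP -> pod name (hypothetical lookup table)
    """
    totals = defaultdict(int)
    for src_ip, dst_ip, byte_count in records:
        if dst_ip != nat_private_ip:
            continue  # ignore traffic that is not going to the NAT gateway
        # Fall back to the raw IP when the source is not a known pod.
        pod_name = pod_by_ip.get(src_ip, "unknown:" + src_ip)
        totals[pod_name] += byte_count
    return dict(totals)

# Made-up sample data, for illustration only.
SAMPLE_RECORDS = [
    ("10.0.1.15", "10.0.0.5", 1200),
    ("10.0.1.16", "10.0.0.5", 300),
    ("10.0.1.15", "10.0.0.5", 800),
    ("10.0.1.15", "10.0.2.9", 5000),  # pod-to-pod traffic, not NAT-bound
    ("10.0.1.99", "10.0.0.5", 50),    # source that is not a known pod
]
SAMPLE_PODS = {"10.0.1.15": "api-7d9f", "10.0.1.16": "worker-x2k4"}

if __name__ == "__main__":
    for pod, total in sorted(bytes_per_pod(SAMPLE_RECORDS, "10.0.0.5", SAMPLE_PODS).items()):
        print(pod, total)
```

This only works if each pod has its own VPC-visible IP, which is exactly the per-container interface setup that the answer calls difficult and expensive.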
Another option may be to use a proxy container for requests bound for the NAT gateway and have the proxy collect the metrics per pod. You'd have to configure the pods to use the proxy, share pod information to the proxy, and configure the proxy to track/provide the metrics. I don't know of any off-the-shelf tools that do this.
How can I set up an egress gateway so that all outbound traffic from an EKS cluster has a single external IP? I need this to comply with corporate policies on incoming IPs when accessing services outside the cluster. We are currently adding node IPs (for nodes with a public IP) and the NAT gateway IP (for private nodes) to the allowed range manually, but this is getting more and more painful because Karpenter spins up nodes dynamically.
Currently, we are using Kong to control ingress traffic, so Istio is not a choice.
What is the standard way to block an external IP from accessing my GCP cluster? Happy for the answer to include another Google service.
Because your cluster is deployed on Compute Engine instances, you can simply set a firewall rule to drop connections from a specific IP.
If you use an HTTP load balancer, you can add a Cloud Armor policy to exclude some IPs.
In both cases, keep in mind that IP filtering isn't very effective. A VPN or proxy is easy to get for free on the internet and changes the source IP of the requester.
As per GCP documentation on Cloud NAT,
Regular (non-private) GKE clusters assign each node an external IP address, so such clusters cannot use Cloud NAT to send packets from the node's primary interface. Pods can still use Cloud NAT if they send packets with source IP addresses set to the pod IP
Question: How do I configure pods to set source IP to pod IP while sending packets to some external service?
Cloud NAT is used to permit GCE instances or GKE clusters that only have internal IP addresses to access public resources on the internet. If you want to use Cloud NAT, you will need to follow the guidelines from the public docs or you can build your own NAT gateway using a GCE Instance which does not require you to use a private cluster.
Muhammad's answer is mostly accurate, and it is the supported method for GCP. There is one addition, though, to address the quoted text.
GKE uses IP masquerading (SNAT) when routing traffic between nodes or outside the cluster. As long as a pod sends traffic to a destination within the masquerade range, SNAT occurs and the traffic leaves with the node's external (or internal) IP address. You'll want to disable SNAT by extending the non-masquerade range to include all IPs (0.0.0.0/0). You can do this using the ip-masq-agent, which you can install if it is not already present.
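To make the non-masquerade change concrete, here is a minimal Python sketch that generates a ConfigMap manifest for the ip-masq-agent. The ConfigMap name (ip-masq-agent), the kube-system namespace, the data key (config), and the field names (nonMasqueradeCIDRs, masqLinkLocal, resyncInterval) are given as I remember them from the GKE ip-masq-agent documentation, so treat them as assumptions and check them against the docs for your cluster version. The helper name and the 60s resync value are only illustrative.

```python
import json

def build_ip_masq_configmap(non_masq_cidrs, resync_interval="60s"):
    """Return a ConfigMap manifest (as a dict) for the ip-masq-agent.

    Names and keys below are assumptions based on the GKE docs; verify before use.
    """
    agent_config = {
        # Destinations in these ranges are NOT masqueraded (pod IP is kept).
        "nonMasqueradeCIDRs": list(non_masq_cidrs),
        "masqLinkLocal": False,
        "resyncInterval": resync_interval,
    }
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "ip-masq-agent", "namespace": "kube-system"},
        # The agent config is stored as a string; JSON is also valid YAML.
        "data": {"config": json.dumps(agent_config, indent=2)},
    }

if __name__ == "__main__":
    # 0.0.0.0/0 disables SNAT for every destination, as suggested above.
    print(json.dumps(build_ip_masq_configmap(["0.0.0.0/0"]), indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.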
I'd like to set up a NAT gateway, using Cloud NAT, so that VMs/Pods in a public GKE cluster use static IP addresses.
The issue I'm facing is that the NAT gateway seems to only be used if VMs have no other options, i.e:
GCP forwards traffic using Cloud NAT only when there are no other matching routes or paths for the traffic.
But in the case of a public GKE cluster, VMs have ephemeral external IPs, so they don't use the gateway.
According to the doc:
If you configure an external IP on a VM's interface [...] NAT will not be performed on such packets. However, alias IP ranges assigned to the interface can still use NAT because they cannot use the external IP to reach the Internet.
And
With this configuration, you can connect directly to a GKE VM via SSH, and yet have the GKE pods/containers use Cloud NAT to reach the Internet.
That's what I want, but I fail to see what precisely to set up here.
What is implied by "alias IP ranges assigned to the interface can still use NAT", and how do I set this up?
"Unfortunately, this is not currently the case. While Cloud NAT is still in Beta, certain settings are not fully in place and thus the pods are still using SNAT even with IP aliasing. Because of the SNAT to the node's IP, the pods will not use Cloud NAT."
Indeed, as Patrick W says above, it's not currently working as documented. I tried as well, and spoke with folks on the GCP Slack group in the Kubernetes Engine channel. They also confirmed in testing that it only works with a GKE private cluster. We haven't started playing with private clusters yet. I can't find solid documentation on this simple question: If I create a private cluster, can I still have public K8S services (aka load balancers) in that cluster? All of the docs about private GKE clusters indicate you don't want any external traffic coming in, but we're running production Internet-facing services on our GKE clusters.
I filed a ticket with GCP support about the Cloud NAT issue, and here's what they said:
"I have been reviewing your configuration and the reason that Cloud NAT is not working is because your cluster is not private.
To use Cloud NAT with GKE you have to create a private cluster. In the non-private cluster the public IP addresses of the cluster are used for communication between the master and the nodes. That’s why GKE is not taking into consideration the Cloud NAT configuration you have.
Creating a private cluster will allow you to combine Cloud NAT and GKE.
I understand this is not very clear from our documentation and I have reported this to be clarified and explained exactly how it is supposed to work."
I responded asking them to please make it work as documented, rather than changing their documentation. I'm waiting for an update from them...
Using Google's Cloud NAT with public GKE clusters works!
First, a Cloud NAT gateway and router need to be set up using a reserved external IP.
Once that's done, the ip-masq-agent configuration needs to be changed so that pod IPs are not masqueraded for the external IPs that are the target of requests from inside the cluster. This is done in the nonMasqueradeCIDRs list in the ConfigMap for the ip-masq-agent.
The way this works is that IP masquerading is skipped for every outgoing request to an IP in the nonMasqueradeCIDRs list. The request therefore does not appear to originate from the node IP but from the pod IP. This internal IP is then automatically NATed by the Cloud NAT gateway/router. The result is that the request appears to originate from the (stable) IP of the Cloud NAT router.
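The decision described above can be modelled in a few lines of Python. This is only a sketch of the logic, not the agent's actual implementation: the function name and all IP addresses are made up, and the final translation by Cloud NAT is not modelled at all.

```python
import ipaddress

def source_ip_leaving_node(destination, non_masq_cidrs, pod_ip, node_ip):
    """Return the source IP a packet carries when it leaves the node.

    If the destination falls inside any non-masquerade CIDR, the pod IP is
    kept (and Cloud NAT can then translate it). Otherwise the packet is
    masqueraded and carries the node IP.
    """
    dest = ipaddress.ip_address(destination)
    for cidr in non_masq_cidrs:
        if dest in ipaddress.ip_network(cidr):
            return pod_ip  # no masquerading for this destination
    return node_ip  # masqueraded to the node's address

if __name__ == "__main__":
    # Illustrative values only (documentation and private ranges).
    cidrs = ["203.0.113.0/24"]
    print(source_ip_leaving_node("203.0.113.10", cidrs, "10.8.0.5", "10.128.0.2"))
    print(source_ip_leaving_node("198.51.100.7", cidrs, "10.8.0.5", "10.128.0.2"))
```

With the list set to 0.0.0.0/0, every IPv4 destination matches, so the pod IP is always kept.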
Sources:
https://rajathithanrajasekar.medium.com/google-cloud-public-gke-clusters-egress-traffic-via-cloud-nat-for-ip-whitelisting-7fdc5656284a
https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent
The idea here is that if you use a VPC-native (IP alias) cluster, your pods will not use SNAT when routing out of the cluster. With no SNAT, the pods will not use the node's external IP and thus should use Cloud NAT.
Unfortunately, this is not currently the case. While Cloud NAT is still in Beta, certain settings are not fully in place and thus the pods are still using SNAT even with IP aliasing. Because of the SNAT to the node's IP, the pods will not use Cloud NAT.
That being said, why not use a private cluster? It's more secure and will work with Cloud NAT. You can't SSH directly into a node, but A) you can create a bastion VM instance in your project that can SSH using the internal IP flag and B) you generally do not need to SSH into the node on most occasions.