NOTE: I tried to include screenshots but stackoverflow does not allow me to add images with preview so I included them as links.
I deployed a web app on AWS using kOps.
I have two nodes and set up a Network Load Balancer.
The target group of the NLB has two nodes (each node is an instance made from the same template).
Load balancer actually seems to be working after checking ingress-nginx-controller logs.
The requests are being distributed over pods correctly. And I can access the service via ingress external address.
But when I go to AWS Console / Target Group, one of the two nodes is marked as and I am concerned with that.
Nodes are running correctly.
I tried to execute sh into nginx-controller and tried curl to both nodes with their internal IP address.
For the healthy node, I get nginx response and for the unhealthy node, it times out.
I do not know how nginx was installed on one of the nodes and not on the other one.
Could anybody let me know the possible reasons?
I had exactly the same problem before and this should be documented somewhere on AWS or Kubernetes. The answer is copied from AWS Premium Support
Short description
The NGINX Ingress Controller sets the spec.externalTrafficPolicy option to Local to preserve the client IP. Also, requests aren't routed to unhealthy worker nodes. The following troubleshooting implies that you don't need to maintain the cluster IP address or preserve the client IP address.
Resolution
If you check the ingress controller service you will see the External Traffic Policy field set to Local.
$ kubectl -n ingress-nginx describe svc ingress-nginx-controller
Output:
Name: ingress-nginx-controller
Namespace: ingress-nginx
...
External Traffic Policy: Local
...
This Local setting drops packets that are sent to Kubernetes nodes that aren't running instances of the NGINX Ingress Controller. Assign NGINX pods (from the Kubernetes website) to the nodes that you want to schedule the NGINX Ingress Controller on.
Update the pec.externalTrafficPolicy option to Cluster
$ kubectl -n ingress-nginx patch service ingress-nginx-controller -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
Output:
service/ingress-nginx-controller patched
By default, NodePort services perform source address translation (from the Kubernetes website). For NGINX, this means that the source IP of an HTTP request is always the IP address of the Kubernetes node that received the request. If you set a NodePort to the value of the externalTrafficPolicy field in the ingress-nginx service specification to Cluster, then you can't maintain the source IP address.
Related
E.g. an istio service
istio-ingressgateway LoadBalancer 10.103.19.83 10.160.32.41 15021:30943/TCP,80:32609/TCP,443:30341/TCP,3306:30682/TCP,15443:30302/TCP
Which resulted in a TCP internal load balancer. The front end is ports 15021, 80, 443, 3306, and 15443.
The backend is basically the instance group of the cluster.
How does the load balancer know 443 at the front end will forward to 30341 at backend? As far as I know, TCP load balancer is doing port forwarding? How/Where does the magic happening
The LoadBalancer Service type is an extension of the NodePort type, which is an extension of the ClusterIP type. A nodePort just opens up a port in the range 30000-32767 on each worker node and uses a label selector to identify which Pods to send the traffic to.
This means that internal clients call the Service by using the internal IP address of a node along with the TCP port specified by nodePort. The request is forwarded to one of the member Pods on the TCP port specified by the targetPort field.
Here’s an example
When a Service is created in kubernetes, a corresponding Endpoints object is created along with it. It also applies to LoadBalancer service type.
If you create a simple nginx deployment e.g. by running:
kubectl apply -f https://k8s.io/examples/application/deployment.yaml
and then expose it as a LoadBalancer service:
kubectl apply -f https://k8s.io/examples/application/deployment.yaml
apart from the service itself, you will also see the lb-nginx Endpoints object. You can inspect its details:
kubectl get ep lb-nginx -o yaml
As you can see it keeps track of all exposed pods (being part of a Deployment in this case) so that corresponding iptables rules, which are responsible for forwarding the traffic to a particular pod, can be up-to-date all the time, even if number of them or their ip chages.
You can e.g. scale your deployment to 5 replicas:
kubectl scale deployment nginx-deployment --replicas=5
and inspect the Endpoints object again:
kubectl get ep lb-nginx -o yaml
and you will see that right after your 5 pods are up and running it immediately gets updated as well.
As you can see in subsets section of the yaml:
subsets:
- addresses:
- ip: 10.12.0.3
nodeName: gke-gke-default-pool-75259266-oauz
targetRef:
kind: Pod
name: nginx-deployment-66b6c48dd5-dw9mt
namespace: default
resourceVersion: "22394113"
uid: 8d7e1d3e-64e2-4891-b567-61ee48f61ed1
apart from the ip address of the Pod it maintains information about the node on which it is running.
Let's go back for a moment to the Service:
kubectl get svc lb-nginx -o yaml
As you can see LoadBalancer service apart from its external IP address has its ClusterIP as every other Service (well, almost every as headless services don't have ClusterIP):
spec:
clusterIP: 10.16.6.236
clusterIPs:
- 10.16.6.236
externalTrafficPolicy: Cluster
ports:
- nodePort: 31935
port: 80
protocol: TCP
targetPort: 80
So as you can imagine this external IP is somehow mapped to the cluster ip so it route the traffic further to the respective endpoints in the cluster. How exactly this mapping is done doesn't really matter as it is done by the cloud provider and such implementation details are not part of publicly shared knowledge. The only thing you need to know is that when your cloud provider provisions an external load balancer to satisfy your request defined in a Service of LoadBalancer type, apart from creating an external load balancer it takes care of the mapping between this external IP and some standard port assigned to it and a kubernetes service which has all the information needed to route the traffic further to the respective pods. In case you wonder how exactly this is done on GCP side i.e. mapping/binding between the external (or internal) loadbalancer and kubernetes LoadBalancer service, I'm affraid such implementation details are not publicly revealed.
I installed Kubernetes ingress controller on GKE following the official documentation as following.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.46.0/deploy/static/provider/cloud/deploy.yaml
The ingress controller runs fine.
ingress-nginx-admission-create-dvkgp 0/1 Completed 0 5h29m
ingress-nginx-admission-patch-58l4z 0/1 Completed 1 5h29m
ingress-nginx-controller-65d7564f46-2rtjs 1/1 Running 0 5h29m
It creates a TCP load balancer, health checkup and firewall rules automatically. My kubernetes cluster has 3 nodes. Interestingly, the health checkup fails for 2 instances. It passes for the instance where the ingress controller is running. I debug it but didn't find any clue. Could someone help me on this.
If you were to look into the deploy.yaml you applied you would see:
apiVersion: v1
kind: Service
metadata:
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
type: LoadBalancer
externalTrafficPolicy: Local
Notice the externalTrafficPolicy: Local. It is being used to Preserve the client source ip.
It's even better explained here: Source IP for Services with Type=LoadBalancer
From k8s docs:
However, if you're running on Google Kubernetes Engine/GCE, setting the same service.spec.externalTrafficPolicy field to Local forces nodes without Service endpoints to remove themselves from the list of nodes eligible for loadbalanced traffic by deliberately failing health checks.
These health checkups are designed to fail. It works that way so that client IPs can be preserved.
Notice that the one node that is listed as healthy is the one where ingress-nginx-controller pod runs. Delete this pod and wait for it to reschedule on a different node - now this other node should be healthy. Now run 3 pod replicas, one on every node and all nodes will be healthy.
One of the possible reason is Firewall rules. Google has specified IP range and port details of Google Health Check probers. You have to configure ingress allow rule to establish health check probing connection to your backend.
For additional debugging details check this Google Cloud Platform blog: Debugging Health Checks in Load Balancing on Google Compute Engine
I've deployed a Kubernetes cluster on AWS using kops and I'm able to expose my pods using a service with --type=LoadBalancer:
kubectl run sample-nginx --image=nginx --replicas=2 --port=80
kubectl expose deployment sample-nginx --port=80 --type=LoadBalancer
However, I cannot get it to work by specifying service.spec.externalIPs with the public IP of my master node.
I've allowed ingress traffic the specified port and used https://kubernetes.io/docs/concepts/services-networking/service/#external-ips as documentation.
Can anyone clarify how to expose a service on AWS without using the cloud provider's native load balancer?
If you want to avoid using Loadbalancer then you case use NodePort type of service.
NodePort exposes service on each Node’s IP at a static port (the NodePort).
ClusterIP service that NodePort service routes is created along. You will be able to reach the NodePort service, from outside by requesting:
<NodeIP>:<NodePort>
That means that if you access any node with that port you will be able to reach your service. It worth to remember that NodePorts are high-numbered ports (30 000 - 32767)
Coming back specifically to AWS here is theirs official document how to expose a services along with NodePort explained.
Do note very important inforamation there about enabling the ports:
Note: Before you access NodeIP:NodePort from an outside cluster, you must enable the security group of the nodes to allow
incoming traffic through your service port.
Let me know if this helps.
We have set up OpenShift Origin on AWS using this handy guide. Our eventual
hope is to have some pods running REST or similar services that we can access
for development purposes. Thus, we don't need DNS or anything like that at this
point, just a public IP with open ports that points to one of our running pods.
Our first proof of concept is trying to get a jenkins (or even just httpd!) pod
that's running inside OpenShift to be exposed via an allocated Elastic IP.
I'm not a network engineer by any stretch, but I was able to successuflly get
an Elastic IP connected to one of my OpenShift "worker" instances, which I
tested by sshing to the public IP allocated to the Elastic IP. At this point
we're struggling to figure out how to make a pod visible that allocated Elastic IP,
owever. We've tried a kubernetes LoadBalancer service, a kubernetes Ingress,
and configuring an AWS Network Load Balancer, all without being able to
successfully connect to 18.2XX.YYY.ZZZ:8080 (my public IP).
The most promising success was using oc port-forward seemed to get at least part way
through, but frustratingly hangs without returning:
$ oc port-forward --loglevel=7 jenkins-2-c1hq2 8080 -n my-project
I0222 19:20:47.708145 73184 loader.go:354] Config loaded from file /home/username/.kube/config
I0222 19:20:47.708979 73184 round_trippers.go:383] GET https://ec2-18-2AA-BBB-CCC.us-east-2.compute.amazonaws.com:8443/api/v1/namespaces/my-project/pods/jenkins-2-c1hq2
....
I0222 19:20:47.758306 73184 round_trippers.go:390] Request Headers:
I0222 19:20:47.758311 73184 round_trippers.go:393] X-Stream-Protocol-Version: portforward.k8s.io
I0222 19:20:47.758316 73184 round_trippers.go:393] User-Agent: oc/v1.6.1+5115d708d7 (linux/amd64) kubernetes/fff65cf
I0222 19:20:47.758321 73184 round_trippers.go:393] Authorization: Bearer Pqg7xP_sawaeqB2ub17MyuWyFnwdFZC5Ny1f122iKh8
I0222 19:20:47.800941 73184 round_trippers.go:408] Response Status: 101 Switching Protocols in 42 milliseconds
I0222 19:20:47.800963 73184 round_trippers.go:408] Response Status: 101 Switching Protocols in 42 milliseconds
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
( oc port-forward hangs at this point and never returns)
We've found a lot of information about how to get this working under GKE, but
nothing that's really helpful for getting this working for OpenShift Origin on
AWS. Any ideas?
Update:
So we realized that sysdig.com's blog post on deploying OpenShift Origin on AWS was missing some key AWS setup information, so based on OpenShift Origin's Configuring AWS page, we set the following env variables and re-ran the ansible playbook:
$ export AWS_ACCESS_KEY_ID='AKIASTUFF'
$ export AWS_SECRET_ACCESS_KEY='STUFF'
$ export ec2_vpc_subnet='my_vpc_subnet'
$ ansible-playbook -c paramiko -i hosts openshift-ansible/playbooks/byo/config.yml --key-file ~/.ssh/my-aws-stack
I think this gets us closer, but creating a load-balancer service now gives us an always-pending IP:
$ oc get services
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
jenkins-lb 172.30.XX.YYY <pending> 8080:31338/TCP 12h
The section on AWS Applying Configuration Changes seems to imply I need to use AWS Instance IDs rather than hostnames to identify my nodes, but I tried this and OpenShift Origin fails to start if I use that method. Still at a loss.
It may not satisfy the "Elastic IP" part but how about using AWS cloud provider ELB to expose the IP/port to the pod via a service to the pod with LoadBalancer option?
Make sure to configure the AWS cloud provider for the cluster (References)
Create a svc to the pod(s) with type LoadBalancer.
For instance to expose a Dashboard via AWS ELB.
kind: Service
apiVersion: v1
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard
namespace: kube-system
spec:
type: LoadBalancer <-----
ports:
- port: 443
targetPort: 8443
selector:
k8s-app: kubernetes-dashboard
Then the svc will be exposed as an ELB and the pod can be accessed via the ELB public DNS name a53e5811bf08011e7bae306bb783bb15-953748093.us-west-1.elb.amazonaws.com.
$ kubectl (oc) get svc kubernetes-dashboard -n kube-system -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes-dashboard LoadBalancer 10.100.96.203 a53e5811bf08011e7bae306bb783bb15-953748093.us-west-1.elb.amazonaws.com 443:31636/TCP 16m k8s-app=kubernetes-dashboard
References
K8S AWS Cloud Provider Notes
Reference Architecture OpenShift Container Platform on Amazon Web Services
DEPLOYING OPENSHIFT CONTAINER PLATFORM 3.5 ON AMAZON WEB SERVICES
Configuring for AWS
Check this guide out: https://github.com/dwmkerr/terraform-aws-openshift
It's got some significant advantages vs. the one you referring to in your post. Additionally, it has a clear terraform spec that you can modify and reset to using an Elastic IP (haven't tried myself but should work).
Another way to "lock" your access to the installation is to re-code the assignment of the Public URL to the master instance in the terraform script, e.g., to a domain that you own (the default script sets it to an external IP-based value with "xip.io" added - works great for testing), then set up a basic ALB that forwards https 443 and 8443 to the master instance that the install creates (you can do it manually after the install is completed, also need a second dummy Subnet; dummy-up the healthcheck as well) and link the ALB to your domain via Route53. You can even use free Route53 wildcard certs with this approach.
I am very new to kubernetes and have just got a stock kubernetes v.1.3.5 cluster up on AWS using kube-up. So far, I have been playing around with kubernetes in understanding it's mechanics (nodes, pods, svc and stuff). Based on my initial (or maybe crude) understanding , I had few questions:
1) How does routing to cluster IP work here (i.e in kube-aws) ? I see that the services have IPs in the range 10.0.0.0/16. I did a deployment with rc=3 of stock nginx and then attached a service to it with Node Port exposed. All works great! I can connect to the service from my dev machine. This nginx service has a cluster IP of 10.0.33.71:1321. Now, if I ssh into one of the minions(or nodes or VMS) and do a "telnet 10.0.33.71 1321", it connects as expected. But I am clueless how this works, I couldn't find any routes related to 10.0.0.0/16 in the VPC setup by kubernetes. What exactly happens under the hood here that results in a successful connection for app like telnet? However, If I ssh into the master node and do "telnet 10.0.33.71 1321", it does not connect. Why does it fail to connect from master?
2) There is a cbr0 interface inside each node. Each minion node has cbr0 configured as 10.244.x.0/24 and master has cbr0 as 10.246.0.0/24.
I can ping to any of the 10.244.x.x pods from any of the nodes(including master). But I am not able to ping 10.246.0.1 (cbr0 inside master node) from any of the minion nodes. What could be happening here?
Here's the routes set up by kubernetes in aws. VPC.
Destination Target
172.20.0.0/16 local
0.0.0.0/0 igw-<hex value>
10.244.0.0/24 eni-<hex value> / i-<hex value>
10.244.1.0/24 eni-<hex value> / i-<hex value>
10.244.2.0/24 eni-<hex value> / i-<hex value>
10.244.3.0/24 eni-<hex value> / i-<hex value>
10.244.4.0/24 eni-<hex value> / i-<hex value>
10.246.0.0/24 eni-<hex value> / i-<hex value>
Mark Betz (SRE at Olark) presents Kubernetes networking in three articles:
pods
services:
ingress
For a pod, you are looking at:
You find:
etho0: a "physical network interface"
docker0/cbr0: a bridge for connecting two ethernet segments no matter their protocol.
veth0, 1, 2: Virtual Network Interface, one per container.
docker0 is the default Gateway of veth0. It is named cbr0 for "custom bridge".
Kubernetes starts containers by sharing the same veth0, which means each container must expose different ports.
pause: a special container started in "pause", to detect SIGTERM sent to a pod, and forward it to the containers.
node: an host
cluster: a group of nodes
router/gateway
The last element is where things start to be more complex:
Kubernetes assigns an overall address space for the bridges on each node, and then assigns the bridges addresses within that space, based on the node the bridge is built on.
Secondly, it adds routing rules to the gateway at 10.100.0.1 telling it how packets destined for each bridge should be routed, i.e. which node’s eth0 the bridge can be reached through.
Such a combination of virtual network interfaces, bridges, and routing rules is usually called an overlay network.
When a pod contacts another pod, it goes through a service.
Why?
Pod networking in a cluster is neat stuff, but by itself it is insufficient to enable the creation of durable systems. That’s because pods in Kubernetes are ephemeral.
You can use a pod IP address as an endpoint but there is no guarantee that the address won’t change the next time the pod is recreated, which might happen for any number of reasons.
That means: you need a reverse-proxy/dynamic load-balancer. And it better be resilient.
A service is a type of kubernetes resource that causes a proxy to be configured to forward requests to a set of pods.
The set of pods that will receive traffic is determined by the selector, which matches labels assigned to the pods when they were created
That service uses its own network. By default, its type is "ClusterIP"; it has its own IP.
Here is the communication path between two pods:
It uses a kube-proxy.
This proxy uses itself a netfilter.
netfilter is a rules-based packet processing engine.
It runs in kernel space and gets a look at every packet at various points in its life cycle.
It matches packets against rules and when it finds a rule that matches it takes the specified action.
Among the many actions it can take is redirecting the packet to another destination.
In this mode, kube-proxy:
opens a port (10400 in the example above) on the local host interface to listen for requests to the test-service,
inserts netfilter rules to reroute packets destined for the service IP to its own port, and
forwards those requests to a pod on port 8080.
That is how a request to 10.3.241.152:80 magically becomes a request to 10.0.2.2:8080.
Given the capabilities of netfilter all that’s required to make this all work for any service is for kube-proxy to open a port and insert the correct netfilter rules for that service, which it does in response to notifications from the master api server of changes in the cluster.
But:
There’s one more little twist to the tale.
I mentioned above that user space proxying is expensive due to marshaling packets.
In kubernetes 1.2, kube-proxy gained the ability to run in iptables mode.
In this mode, kube-proxy mostly ceases to be a proxy for inter-cluster connections, and instead delegates to netfilter the work of detecting packets bound for service IPs and redirecting them to pods, all of which happens in kernel space.
In this mode kube-proxy’s job is more or less limited to keeping netfilter rules in sync.
The network schema becomes:
However, this is not a good fit for external (public facing) communication, which needs an external fixed IP.
You have dedicated services for that: nodePort and LoadBalancer:
A service of type NodePort is a ClusterIP service with an additional capability: it is reachable at the IP address of the node as well as at the assigned cluster IP on the services network.
The way this is accomplished is pretty straightforward:
When kubernetes creates a NodePort service, kube-proxy allocates a port in the range 30000–32767 and opens this port on the eth0 interface of every node (thus the name “NodePort”).
Connections to this port are forwarded to the service’s cluster IP.
You get:
A Loadalancer is more advancer, and allows to expose services using stand ports.
See the mapping here:
$ kubectl get svc service-test
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
openvpn 10.3.241.52 35.184.97.156 80:32213/TCP 5m
However:
Services of type LoadBalancer have some limitations.
You cannot configure the lb to terminate https traffic.
You can’t do virtual hosts or path-based routing, so you can’t use a single load balancer to proxy to multiple services in any practically useful way.
These limitations led to the addition in version 1.2 of a separate kubernetes resource for configuring load balancers, called an Ingress.
The Ingress API supports TLS termination, virtual hosts, and path-based routing. It can easily set up a load balancer to handle multiple backend services.
The implementation follows a basic kubernetes pattern: a resource type and a controller to manage that type.
The resource in this case is an Ingress, which comprises a request for networking resources
For instance:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: test-ingress
annotations:
kubernetes.io/ingress.class: "gce"
spec:
tls:
- secretName: my-ssl-secret
rules:
- host: testhost.com
http:
paths:
- path: /*
backend:
serviceName: service-test
servicePort: 80
The ingress controller is responsible for satisfying this request by driving resources in the environment to the necessary state.
When using an Ingress you create your services as type NodePort and let the ingress controller figure out how to get traffic to the nodes.
There are ingress controller implementations for GCE load balancers, AWS elastic load balancers, and for popular proxies such as NGiNX and HAproxy.