Dynamically create public IPs or subdomains for EKS pods - amazon-web-services

Complex AWS EKS / ENI / Route53 issue has us stumped. Need an expert.
Context:
We are working on dynamic game servers for a social platform (https://myxr.social) that transport game and video data using WebRTC / UDP SCTP/SRTP via https://MediaSoup.org
Each game server will have about 50 clients
Each client requires 2-4 UDP ports
Our working devops strategy
https://github.com/xr3ngine/xr3ngine/tree/dev/packages/ops
We are provisioning these game servers using Kubernetes and https://agones.dev
Mediasoup requires each server connection to a client be assigned individual ports. Each client will need two ports, one for sending data and one for receiving data; with a target maximum of about 50 users per server, this requires 100 ports per server be publicly accessible.
We need some way to route this UDP traffic to the corresponding gameserver. Ingresses appear to primarily handle HTTP(S) traffic, and configuring our NGINX ingress controller to handle UDP traffic assumes that we know our gameserver Services ahead of time, which we do not since the gameservers are spun up and down as they are needed.
Questions:
We see two possible ways to solve this problem.
Path 1
Assign each game server in the node group public IPs and then allocate ports for each client. Either IP v4 or v6. This would require SSL termination for IP ports in AWS. Can we use ENI and EKS to dynamically create and provision IP ports for each gameserver w/ SSL? Essentially expose these pods to the internet via a public subnet with them each having their own IP address or subdomain. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html We have been referencing this documentation trying to figure out if this is possible.
Path 2
Create a subdomain (eg gameserver01.gs.xrengine.io, etc) dynamically for each gameserver w/ dynamic port allocation for each client (eg client 1 [30000-30004], etc). This seems to be limited by the ports accessible in the EKS fleet.
Are either of these approaches possible? Is one better? Can you give us some detail about how we should go about implementation?

The native way to receive UDP traffic on Amazon EKS is to use a Kubernetes Service of type LoadBalancer with an extra annotation so that an AWS Network Load Balancer (NLB) is provisioned.
Example
apiVersion: v1
kind: Service
metadata:
  name: my-game-app-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  selector:
    app: my-game-app
  ports:
  - name: outgoing-port # choose your name
    protocol: UDP
    port: 9000 # choose your port
  - name: incoming-port # choose your name
    protocol: UDP
    port: 9001 # choose your port
  type: LoadBalancer
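Once the Service is created, the NLB's DNS name shows up as the Service's external address. A quick way to look it up, using the Service name from the example above:

kubectl get service my-game-app-service
# the EXTERNAL-IP column shows the NLB's DNS hostname to hand out to game clients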

Related

RedHat Openshift on AWS

I have installed OpenShift on AWS using the installer-provisioned infrastructure tool. My container application is a VoIP server and it needs 4 public IP addresses, so that external VoIP devices can connect to these 4 public IP addresses using SIP/RTP protocol messages. How can I do that? I tried setting up my own VPC and then installing OpenShift, but OpenShift always installs the compute nodes on a private subnet. If I don't pass a private subnet in the install script, OpenShift won't start the installation process.
Can the Multus CNI give me 4 public IP addresses for my container?
Thanks,
Prince
Set up your cluster using the standard IPI (installer-provisioned infrastructure) installation with the openshift-installer.
Then, create a Service of type LoadBalancer for each of the IPs that you want: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#create-a-service-from-a-manifest
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: example
  ports:
  - port: 8765
    targetPort: 9376
  type: LoadBalancer
In your case you'll have to create 4 of these Services with different ports, so you'll get 4 different IPs assigned to the Load Balancer (and the Service). I do not think you need Multus or similar to achieve this.
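For example, a sketch of what the second of those four Services could look like (the name and ports here are placeholders, not values from your setup); it differs from the first only in its name and port:

apiVersion: v1
kind: Service
metadata:
  name: example-service-2   # one Service of type LoadBalancer per public IP you need
spec:
  selector:
    app: example
  ports:
  - port: 8766              # a different port for this instance
    targetPort: 9376
  type: LoadBalancer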

How does GCP internal load balancer + GKE service work? (It works, but I do not know why)

E.g. an Istio service:
istio-ingressgateway LoadBalancer 10.103.19.83 10.160.32.41 15021:30943/TCP,80:32609/TCP,443:30341/TCP,3306:30682/TCP,15443:30302/TCP
This resulted in a TCP internal load balancer. The front end is ports 15021, 80, 443, 3306, and 15443.
The backend is basically the instance group of the cluster.
How does the load balancer know that port 443 at the front end will forward to 30341 at the backend? As far as I know, a TCP load balancer just does port forwarding, so how/where does the magic happen?
The LoadBalancer Service type is an extension of the NodePort type, which is an extension of the ClusterIP type. A nodePort just opens up a port in the range 30000-32767 on each worker node and uses a label selector to identify which Pods to send the traffic to.
This means that internal clients call the Service by using the internal IP address of a node along with the TCP port specified by nodePort. The request is forwarded to one of the member Pods on the TCP port specified by the targetPort field.
Here’s an example
When a Service is created in kubernetes, a corresponding Endpoints object is created along with it. It also applies to LoadBalancer service type.
If you create a simple nginx deployment e.g. by running:
kubectl apply -f https://k8s.io/examples/application/deployment.yaml
and then expose it as a LoadBalancer service:
kubectl expose deployment nginx-deployment --type=LoadBalancer --name=lb-nginx --port=80
apart from the service itself, you will also see the lb-nginx Endpoints object. You can inspect its details:
kubectl get ep lb-nginx -o yaml
As you can see, it keeps track of all exposed pods (being part of a Deployment in this case) so that the corresponding iptables rules, which are responsible for forwarding the traffic to a particular pod, can be up to date all the time, even if their number or their IPs change.
You can e.g. scale your deployment to 5 replicas:
kubectl scale deployment nginx-deployment --replicas=5
and inspect the Endpoints object again:
kubectl get ep lb-nginx -o yaml
and you will see that right after your 5 pods are up and running it immediately gets updated as well.
As you can see in the subsets section of the YAML:
subsets:
- addresses:
  - ip: 10.12.0.3
    nodeName: gke-gke-default-pool-75259266-oauz
    targetRef:
      kind: Pod
      name: nginx-deployment-66b6c48dd5-dw9mt
      namespace: default
      resourceVersion: "22394113"
      uid: 8d7e1d3e-64e2-4891-b567-61ee48f61ed1
Apart from the IP address of the Pod, it maintains information about the node on which it is running.
Let's go back for a moment to the Service:
kubectl get svc lb-nginx -o yaml
As you can see, a LoadBalancer Service, apart from its external IP address, has a ClusterIP like every other Service (well, almost every other, as headless Services don't have a ClusterIP):
spec:
  clusterIP: 10.16.6.236
  clusterIPs:
  - 10.16.6.236
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 31935
    port: 80
    protocol: TCP
    targetPort: 80
So as you can imagine, this external IP is somehow mapped to the cluster IP so that the traffic is routed further to the respective endpoints in the cluster. How exactly this mapping is done doesn't really matter, as it is handled by the cloud provider and such implementation details are not part of publicly shared knowledge.

The only thing you need to know is that when your cloud provider provisions an external load balancer to satisfy your request defined in a Service of type LoadBalancer, apart from creating the external load balancer it takes care of the mapping between this external IP (plus some standard port assigned to it) and the Kubernetes Service, which has all the information needed to route the traffic further to the respective pods. In case you wonder how exactly this is done on the GCP side, i.e. the mapping/binding between the external (or internal) load balancer and the Kubernetes LoadBalancer Service, I'm afraid such implementation details are not publicly revealed.
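That said, if you want to at least see the cloud-side object GCP creates for a LoadBalancer Service, you can list the forwarding rules in the project (a read-only check; this assumes the gcloud CLI is configured for the project that hosts the cluster):

gcloud compute forwarding-rules list
# each LoadBalancer Service should show up here as a forwarding rule carrying its external (or internal) IP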

Exposing a K8s TCP Service Endpoint to the Public Internet Without a Load Balancer

So I'm working on a project that involves managing many postgres instances inside of a k8s cluster. Each instance is managed using a Stateful Set with a Service for network communication. I need to expose each Service to the public internet via DNS on port 5432.
The most natural approach here is to use the k8s Load Balancer resource and something like external dns to dynamically map a DNS name to a load balancer endpoint. This is great for many types of services, but for databases there is one massive limitation: the idle connection timeout. AWS ELBs have a maximum idle timeout limit of 4000 seconds. There are many long running analytical queries/transactions that easily exceed that amount of time, not to mention potentially long-running operations like pg_restore.
So I need some kind of solution that allows me to work around the limitations of Load Balancers. Node IPs are out of the question since I will need port 5432 exposed for every single postgres instance in the cluster. Ingress also seems less than ideal since it's a layer 7 proxy that only supports HTTP/HTTPS. I've seen workarounds with nginx-ingress involving some configmap chicanery, but I'm a little worried about committing to hacks like that for a large project. ExternalName is intriguing but even if I can find better documentation on it I think it may end up having similar limitations as NodeIP.
Any suggestions would be greatly appreciated.
The Kubernetes ingress controller implementation Contour from Heptio can proxy TCP streams when they are encapsulated in TLS. This is required so that the SNI field of the TLS handshake can be used to direct the connection to the correct backend service.
Contour can handle standard Ingresses, but additionally introduces a new ingress API, IngressRoute, which is implemented via a CRD. The TLS connection can be terminated at your backend service. An IngressRoute might look like this:
apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: postgres
  namespace: postgres-one
spec:
  virtualhost:
    fqdn: postgres-one.example.com
    tls:
      passthrough: true
  tcpproxy:
    services:
    - name: postgres
      port: 5432
  routes:
  - match: /
    services:
    - name: dummy
      port: 80
HAProxy supports TCP load balancing. You can look at HAProxy as a proxy and load balancer for a PostgreSQL database; it can support both TLS and non-TLS connections.

GKE: Pubsub messages between pods with push subscribers

I am using GKE deployment with multiple pods and I need to send and receive messages between pods. I want to use pubsub push subscribers.
I found that for push I need to configure HTTPS access for the subscriber pods.
In order to receive push messages, you need a publicly accessible HTTPS server to handle POST requests. The server must present a valid SSL certificate signed by a certificate authority and routable by DNS. You also need to validate that you own the domain (or have equivalent access to the endpoint).
Is this really required or is there some workaround. Does it mean I should expose each subscriber pod with Ingress, even for internal communication?
If you only need pods to be exposed on a certain port (for pod to pod communication) then you would just need to expose each pod via a service that targets that port (in your case port 443).
For example, by using the following YAML you can create a service which targets a port on a pod(s):
apiVersion: v1
kind: Service
metadata:
  name: my-pod
  labels:
    run: my-pod
spec:
  ports:
  - port: 443
    targetPort: 443
    protocol: TCP
  selector:
    run: my-pod
The above would create a Service which targets TCP port 443 on any Pod with the run: my-pod label. In the file, targetPort is the port the container (within the pod) accepts traffic on, and port is the abstracted Service port, which is the port other pods use to access the Service.
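As a usage sketch, another pod in the cluster could then reach that Service through its cluster-internal DNS name (this assumes the Service lives in the default namespace and that the pod serves TLS with a self-signed certificate, hence -k):

# run from inside any other pod in the cluster
curl -k https://my-pod.default.svc.cluster.local:443/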
EDIT:
However, if you need the pods to be able to communicate with the Pub/Sub API, then the ability to communicate externally is required, so yes, an Ingress would be recommended.
In response to your question in the comment "I wonder why Google needs to access Kubernetes with public HTTPS instead of some internal request": the reason is that it isn't an internal request. The Pub/Sub API sits outside of your project/network, so data travels across other networks. For it to be secure, it needs to be encrypted; this is the reason HTTPS is used.

how is cluster IP in kubernetes-aws configured?

I am very new to Kubernetes and have just got a stock Kubernetes v1.3.5 cluster up on AWS using kube-up. So far, I have been playing around with Kubernetes to understand its mechanics (nodes, pods, svc and stuff). Based on my initial (or maybe crude) understanding, I had a few questions:
1) How does routing to the cluster IP work here (i.e. in kube-aws)? I see that the services have IPs in the range 10.0.0.0/16. I did a deployment with rc=3 of stock nginx and then attached a service to it with a NodePort exposed. All works great! I can connect to the service from my dev machine. This nginx service has a cluster IP of 10.0.33.71 on port 1321. Now, if I ssh into one of the minions (or nodes, or VMs) and do a "telnet 10.0.33.71 1321", it connects as expected. But I am clueless how this works; I couldn't find any routes related to 10.0.0.0/16 in the VPC set up by Kubernetes. What exactly happens under the hood here that results in a successful connection for an app like telnet? However, if I ssh into the master node and do "telnet 10.0.33.71 1321", it does not connect. Why does it fail to connect from the master?
2) There is a cbr0 interface inside each node. Each minion node has cbr0 configured as 10.244.x.0/24 and master has cbr0 as 10.246.0.0/24.
I can ping to any of the 10.244.x.x pods from any of the nodes(including master). But I am not able to ping 10.246.0.1 (cbr0 inside master node) from any of the minion nodes. What could be happening here?
Here are the routes set up by Kubernetes in the AWS VPC:
Destination Target
172.20.0.0/16 local
0.0.0.0/0 igw-<hex value>
10.244.0.0/24 eni-<hex value> / i-<hex value>
10.244.1.0/24 eni-<hex value> / i-<hex value>
10.244.2.0/24 eni-<hex value> / i-<hex value>
10.244.3.0/24 eni-<hex value> / i-<hex value>
10.244.4.0/24 eni-<hex value> / i-<hex value>
10.246.0.0/24 eni-<hex value> / i-<hex value>
Mark Betz (SRE at Olark) presents Kubernetes networking in three articles:
pods
services
ingress
For a pod, looking at the network diagram from the first article, you find:
eth0: a "physical network interface"
docker0/cbr0: a bridge for connecting two ethernet segments no matter their protocol.
veth0, 1, 2: Virtual Network Interface, one per container.
docker0 is the default Gateway of veth0. It is named cbr0 for "custom bridge".
Kubernetes starts the containers of a pod sharing the same veth0, which means each container must expose different ports (see the Pod sketch just after this list).
pause: a special container started in "pause", to detect SIGTERM sent to a pod, and forward it to the containers.
node: a host
cluster: a group of nodes
router/gateway
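Before moving on to the router/gateway, here is a minimal sketch of the "same veth0, different ports" point above. The names and image are placeholders (not from the article); since both containers share the pod's network namespace, they cannot both bind the same port:

apiVersion: v1
kind: Pod
metadata:
  name: shared-netns-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: hashicorp/http-echo   # placeholder image that listens on a configurable port
    args: ["-listen=:8080", "-text=app"]
  - name: sidecar
    image: hashicorp/http-echo
    args: ["-listen=:8081", "-text=sidecar"]   # must pick a port different from the first container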
The last element is where things start to be more complex:
Kubernetes assigns an overall address space for the bridges on each node, and then assigns the bridges addresses within that space, based on the node the bridge is built on.
Secondly, it adds routing rules to the gateway at 10.100.0.1 telling it how packets destined for each bridge should be routed, i.e. which node’s eth0 the bridge can be reached through.
Such a combination of virtual network interfaces, bridges, and routing rules is usually called an overlay network.
When a pod contacts another pod, it goes through a service.
Why?
Pod networking in a cluster is neat stuff, but by itself it is insufficient to enable the creation of durable systems. That’s because pods in Kubernetes are ephemeral.
You can use a pod IP address as an endpoint but there is no guarantee that the address won’t change the next time the pod is recreated, which might happen for any number of reasons.
That means: you need a reverse-proxy/dynamic load-balancer. And it better be resilient.
A service is a type of kubernetes resource that causes a proxy to be configured to forward requests to a set of pods.
The set of pods that will receive traffic is determined by the selector, which matches labels assigned to the pods when they were created
That service uses its own network. By default, its type is "ClusterIP"; it has its own IP.
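As a small sketch of that selector/label pairing (the names are illustrative, loosely following the article's service-test example), the Service below forwards to any Pod carrying the app: service-test label:

apiVersion: v1
kind: Service
metadata:
  name: service-test        # hypothetical name
spec:
  selector:
    app: service-test       # must match the labels on the target pods
  ports:
  - port: 80                # port exposed on the service's cluster IP
    targetPort: 8080        # port the pod's container actually listens on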
Here is the communication path between two pods:
It uses a kube-proxy.
This proxy itself uses netfilter.
netfilter is a rules-based packet processing engine.
It runs in kernel space and gets a look at every packet at various points in its life cycle.
It matches packets against rules and when it finds a rule that matches it takes the specified action.
Among the many actions it can take is redirecting the packet to another destination.
In this mode, kube-proxy:
opens a port (10400 in the article's example) on the local host interface to listen for requests to the test-service,
inserts netfilter rules to reroute packets destined for the service IP to its own port, and
forwards those requests to a pod on port 8080.
That is how a request to 10.3.241.152:80 magically becomes a request to 10.0.2.2:8080.
Given the capabilities of netfilter all that’s required to make this all work for any service is for kube-proxy to open a port and insert the correct netfilter rules for that service, which it does in response to notifications from the master api server of changes in the cluster.
But:
There’s one more little twist to the tale.
I mentioned above that user space proxying is expensive due to marshaling packets.
In kubernetes 1.2, kube-proxy gained the ability to run in iptables mode.
In this mode, kube-proxy mostly ceases to be a proxy for inter-cluster connections, and instead delegates to netfilter the work of detecting packets bound for service IPs and redirecting them to pods, all of which happens in kernel space.
In this mode kube-proxy’s job is more or less limited to keeping netfilter rules in sync.
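If you want to see those rules yourself (a read-only check; this assumes kube-proxy is running in iptables mode and that you have shell access to a node), you can dump the NAT chain that kube-proxy keeps in sync:

# service-dispatch rules maintained by kube-proxy
sudo iptables -t nat -L KUBE-SERVICES -n | head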
The network schema (shown in the second article) becomes simpler, with the redirection happening entirely in kernel space.
However, this is not a good fit for external (public facing) communication, which needs an external fixed IP.
You have dedicated services for that: nodePort and LoadBalancer:
A service of type NodePort is a ClusterIP service with an additional capability: it is reachable at the IP address of the node as well as at the assigned cluster IP on the services network.
The way this is accomplished is pretty straightforward:
When kubernetes creates a NodePort service, kube-proxy allocates a port in the range 30000–32767 and opens this port on the eth0 interface of every node (thus the name “NodePort”).
Connections to this port are forwarded to the service’s cluster IP.
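For illustration, a minimal NodePort Service (hypothetical names; nodePort is optional and, if omitted, is auto-assigned from the 30000-32767 range):

apiVersion: v1
kind: Service
metadata:
  name: service-test        # hypothetical name
spec:
  type: NodePort
  selector:
    app: service-test
  ports:
  - port: 80                # port on the cluster IP
    targetPort: 8080        # port on the pod
    nodePort: 32213         # port opened on every node's eth0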
A LoadBalancer is more advanced, and allows you to expose services using standard ports.
See the mapping here:
$ kubectl get svc service-test
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
openvpn 10.3.241.52 35.184.97.156 80:32213/TCP 5m
However:
Services of type LoadBalancer have some limitations.
You cannot configure the lb to terminate https traffic.
You can’t do virtual hosts or path-based routing, so you can’t use a single load balancer to proxy to multiple services in any practically useful way.
These limitations led to the addition in version 1.2 of a separate kubernetes resource for configuring load balancers, called an Ingress.
The Ingress API supports TLS termination, virtual hosts, and path-based routing. It can easily set up a load balancer to handle multiple backend services.
The implementation follows a basic kubernetes pattern: a resource type and a controller to manage that type.
The resource in this case is an Ingress, which comprises a request for networking resources.
For instance:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-ingress
  annotations:
    kubernetes.io/ingress.class: "gce"
spec:
  tls:
  - secretName: my-ssl-secret
  rules:
  - host: testhost.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: service-test
          servicePort: 80
The ingress controller is responsible for satisfying this request by driving resources in the environment to the necessary state.
When using an Ingress you create your services as type NodePort and let the ingress controller figure out how to get traffic to the nodes.
There are ingress controller implementations for GCE load balancers, AWS elastic load balancers, and for popular proxies such as NGINX and HAProxy.