AWS EKS load balancer service stuck at external-ip[pending]

I am new to AWS.
I am trying to deploy my application to AWS EKS. Everything is created successfully except for my Caddy server service, which is stuck in the pending state while waiting for an external IP.
When I describe the service, this is the output:
Name: caddy
Namespace: default
Labels: app=caddy
Annotations: service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
service.beta.kubernetes.io/aws-load-balancer-type: external
Selector: app=caddy
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.100.4.149
IPs: 10.100.4.149
Port: http 80/TCP
TargetPort: 80/TCP
NodePort: http 31064/TCP
Endpoints: 192.168.26.17:80
Port: https 443/TCP
TargetPort: 443/TCP
NodePort: https 30707/TCP
Endpoints: 192.168.26.17:443
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 16m service-controller Ensuring load balancer
Warning FailedBuildModel 15m service Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Incorrect token audience
status code: 400, request id: dd76289e-ca16-48e5-8985-3a4fc1b64f43
Warning FailedBuildModel 7m49s service Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Incorrect token audience
status code: 400, request id: 62ed516f-c505-4bc8-979f-74edc449217e

I discovered that the problem came from the serviceAccount I had created: there was a typo in the OIDC provider URI.
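A typo like this can be caught by comparing the cluster's OIDC issuer URL with the OIDC provider ARN referenced by the IAM role. The following is a minimal Python sketch, assuming both strings have already been copied by hand (for example from the cluster details and from the role's trust policy). The function name and the sample account ID and provider ID are hypothetical placeholders, not values from the cluster above.

```python
def oidc_provider_matches(issuer_url: str, provider_arn: str) -> bool:
    """Return True if the IAM OIDC provider ARN refers to the given issuer URL."""
    # The issuer URL has the form https://oidc.eks.<region>.amazonaws.com/id/<ID>.
    # The provider ARN embeds the same host and path, without the scheme.
    issuer = issuer_url.strip().rstrip("/")
    if issuer.startswith("https://"):
        issuer = issuer[len("https://"):]

    marker = ":oidc-provider/"
    if marker not in provider_arn:
        return False
    provider = provider_arn.split(marker, 1)[1].strip().rstrip("/")
    return issuer == provider


# Placeholder values for illustration only.
issuer = "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
good_arn = "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
typo_arn = "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLF"

print(oidc_provider_matches(issuer, good_arn))  # True
print(oidc_provider_matches(issuer, typo_arn))  # False
```

A plain string comparison is enough here because the two values must agree character for character.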

Related

accessing kubernetes service from local host

I created a single node cluster. There is a nodeport service
kubectl get all --namespace default
service/backend-org-1-substra-backend-server NodePort 10.43.81.5 <none> 8000:30068/TCP 4d23h
The node ip is
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3d-k3s-default-server-0 Ready control-plane,master 5d v1.24.4+k3s1 172.18.0.2 <none> K3s dev 5.15.0-1028-aws containerd://1.6.6-k3s1
From the same host, but not inside the cluster, I can ping the 172.18.0.2 ip. Since the backend-org-1-substra-backend-server is a nodeport, shouldn't I be able to access it by
curl 172.18.0.2:30068? I get
curl: (7) Failed to connect to 172.18.0.2 port 30068 after 0 ms: Connection refused
additional information:
$ kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
$ kubectl get nodes -o yaml
...
addresses:
- address: 172.24.0.2
type: InternalIP
- address: k3d-k3s-default-server-0
type: Hostname
allocatable:
$ kubectl describe svc backend-org-1-substra-backend-server
Name: backend-org-1-substra-backend-server
Namespace: org-1
Labels: app.kubernetes.io/instance=backend-org-1
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=substra-backend-server
app.kubernetes.io/part-of=substra-backend
app.kubernetes.io/version=0.34.1
helm.sh/chart=substra-backend-22.3.1
skaffold.dev/run-id=394a8d19-bbc8-4a3b-b04e-08e0fff40681
Annotations: meta.helm.sh/release-name: backend-org-1
meta.helm.sh/release-namespace: org-1
Selector: app.kubernetes.io/instance=backend-org-1,app.kubernetes.io/name=substra-backend-server
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.68.217
IPs: 10.43.68.217
Port: http 8000/TCP
TargetPort: http/TCP
NodePort: http 31960/TCP
Endpoints: <none>
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
Here, I noticed that Endpoints shows <none>, which worries me.
I followed the doc at https://docs.substra.org/en/stable/contributing/getting-started.html
It's a lot to ask someone to replicate the whole thing.
My point is that, AFAIK, a NodePort service allows callers outside the cluster to reach pods inside the cluster. But neither the cluster IP nor the node IP lets me curl that service.
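An Endpoints value of <none> usually means that no ready pod currently matches the Service's selector. The equality-based matching rule can be sketched in Python as follows. The selector is the one shown in the describe output above. The pod labels are hypothetical and only illustrate a match and a mismatch.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """Equality-based selector: every selector key must be present with the same value."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Selector copied from the Service description above.
selector = {
    "app.kubernetes.io/instance": "backend-org-1",
    "app.kubernetes.io/name": "substra-backend-server",
}

# Hypothetical pod labels, for illustration only.
matching_pod = {**selector, "pod-template-hash": "abc123"}
other_pod = {
    "app.kubernetes.io/instance": "backend-org-1",
    "app.kubernetes.io/name": "substra-backend-worker",
}

print(selector_matches(selector, matching_pod))  # True
print(selector_matches(selector, other_pod))     # False
```

If no pod passes this check, or the matching pods are not ready, the Service has nothing to forward to and connections to the NodePort are refused.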

I found that it was due to a faulty installation. Now wget to the load balancer IP and port gets a connection.

Ingress controller's load balancer address - Could not resolve host

I'm using the Nginx ingress controller in a Kubernetes cluster, and its host address (the load balancer DNS name) returns "Could not resolve host: ..".
In AWS, I don't even have a load balancer with this DNS name.
However, when I run kubectl logs on the ingress controller, it is still receiving traffic normally.
How is it possible for the ingress controller to have a host that can't be found and still receive traffic?
Nginx Ingress Controller service
-> kubectl -n ingress-nginx describe svc ingress-nginx-controller
...
LoadBalancer Ingress: xxx.elb.ap-northeast-2.amazonaws.com #deprecated
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 31610/TCP
Endpoints: 10.0.51.53:80
Port: https 443/TCP
TargetPort: http/TCP
NodePort: https 32544/TCP
Endpoints: 10.0.51.53:80
Session Affinity: None
External Traffic Policy: Cluster
Events:
Warning SyncLoadBalancerFailed 32m (x6919 over 24d) service-controller (combined from similar events): Error syncing load balancer: failed to ensure load balancer: error creating load balancer listener: "TargetGroupAssociationLimit: The following target groups cannot be associated with more than one load balancer: arn:aws:elasticloadbalancing:ap-northeast-2:4xxx:targetgroup/k8s-ingressn-ingressn-5f8ebc7e16/25aa0ef278298505\n\tstatus code: 400, request id: d1767330-f7d1-4c4f-bcf1-4f1e4af8ab9f"
Normal EnsuringLoadBalancer 2m40s (x6934 over 24d) service-controller Ensuring load balancer
-> curl xxx.elb.ap-northeast-2.amazonaws.com
curl: (6) Could not resolve host: xxx.elb.ap-northeast-2.amazonaws.com
Ingress using deprecated DNS
-> kubectl describe ingress slack-api-ingress
Name: slack-api-ingress
Namespace: default
Address: xxx.elb.ap-northeast-2.amazonaws.com #deprecated
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
...
I have another NLB in AWS that was initially provisioned by the ingress controller, and I can still access my services through this load balancer's DNS name.
I assume the ingress controller tried to update and create a new load balancer, failed, and still directs traffic to the previous one. If this is the case, I want the ingress controller to use the previous load balancer's DNS name.
I'm new to Kubernetes and probably don't fully understand what's going on, so any advice is appreciated.

Why does GCE Load Balancer behave differently through the domain name and the IP address?

A backend service happens to be returning Status 404 on the health check path of the Load Balancer. When I browse to the Load Balancer's domain name, I get "Error: Server Error/ The server encountered a temporary error", and the logs show
"type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"
statusDetails: "failed_to_pick_backend", which makes sense.
When I browse to the Load Balancer's Static IP, my browser shows the 404 Error Message which the underlying Kubernetes Pod returned. In other words, the Load Balancer passed the request on despite the failed health check.
Why are there two different behaviors?
[Edit]
Here is the yaml for the Ingress that created the Load Balancer:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress1
spec:
  rules:
  - host: example.com
    http:
      paths:
      - backend:
          serviceName: myservice
          servicePort: 80

I did a "deep dive" into this and managed to reproduce the situation on my GKE cluster, so I can now say that a few things are combined here.
A backend service happens to be returning Status 404 on the health check path of the Load Balancer.
There could be two possibilities (it is not clear from the description you have provided).
1. Something like:
"Error: Server Error
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds."
This one you get from the load balancer when the health check fails for the pod. The official documentation on the GKE Ingress object says that
a Service exposed through an Ingress must respond to health checks from the load balancer.
Any container that is the final destination of load-balanced traffic must do one of the following to indicate that it is healthy:
Serve a response with an HTTP 200 status to GET requests on the / path.
Configure an HTTP readiness probe. Serve a response with an HTTP 200 status to GET requests on the path specified by the readiness probe. The Service exposed through an Ingress must point to the same container port on which the readiness probe is enabled.
The health check handling needs to be fixed. You can check the load balancer details in the GCP console under Network Services - Load Balancing.
2. "404 Not Found -- nginx/1.17.6"
This one is clear. That is the response returned by the endpoint that myservice sends the request to. It looks like something is misconfigured there. My guess is that the pod simply can't serve that request properly. It could be an nginx web server issue, for example. Please check the configuration to find out why the pod can't serve the request.
While playing with the setup I found an image that lets you check whether a request has reached the pod and inspect the request headers,
so it is possible to create a pod like:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    run: fake-web
  name: fake-default-knp
  # namespace: kube-system
spec:
  containers:
  - image: mendhak/http-https-echo
    imagePullPolicy: IfNotPresent
    name: fake-web
    ports:
    - containerPort: 8080
      protocol: TCP
to be able to see all the headers that were in incoming requests (kubectl logs -f fake-default-knp ).
When I browse to the Load Balancer's Static IP, my browser shows the 404 Error Message which the underlying Kubernetes Pod returned.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress1
spec:
  rules:
  - host: example.com
    http:
      paths:
      - backend:
          serviceName: myservice
          servicePort: 80
Upon creation of such an Ingress object, there will be at least 2 backends in GKE cluster.
- the backend you have specified upon Ingress creation ( myservice one)
- the default one (created upon cluster creation).
kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP
l7-default-backend-xyz 1/1 Running 0 20d 10.52.0.7
Please note that myservice serves only requests that have the Host header set to example.com. All other requests are sent to the "default backend". That is why you receive the "default backend - 404" error message when browsing to the load balancer's IP address.
Technically there is a default-http-backend service that has l7-default-backend-xyz as an EndPoint.
kubectl get svc -n kube-system -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default-http-backend NodePort 10.0.6.134 <none> 80:31806/TCP 20d k8s-app=glbc
kubectl get ep -n kube-system
NAME ENDPOINTS AGE
default-http-backend 10.52.0.7:8080 20d
Again, that's the "object" that returns the "default backend - 404" error for the requests with "Host" header not equal to the one you specified in Ingress.
Hope that it sheds a light on the issue :)
EDIT:
"myservice serves only requests that have Host header set to example.com." So you are saying that requests go to the LB only when there is a host header?
Not exactly. The LB receives all requests and routes them according to the "Host" header value. Requests with the example.com Host header are served by the myservice backend.
To put it simply, the logic is as follows:
the request arrives;
the system checks the Host header (to determine the user's backend);
the request is served if there is a suitable user backend (according to the Ingress config) and that backend is healthy; if the backend is unhealthy, "Error: Server Error The server encountered a temporary error and could not complete your request. Please try again in 30 seconds." is returned;
if the request's Host header doesn't match any host in the Ingress spec, the request is sent to the l7-default-backend-xyz backend (not the one mentioned in the Ingress config). That backend replies with the "default backend - 404" error.
Hope that makes it clear.
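The routing logic described above can be sketched as a small Python function. This is a simplified model, not the load balancer's actual implementation: it ignores URL paths and assumes the default backend is always healthy. The function name `route` and the two constants are hypothetical names chosen for illustration.

```python
from typing import Dict, Set

DEFAULT_BACKEND = "l7-default-backend"   # replies with "default backend - 404"
SERVER_ERROR = "Error: Server Error"     # returned when the chosen backend is unhealthy


def route(host: str, rules: Dict[str, str], healthy: Set[str]) -> str:
    """Return the backend that would serve the request, or the load balancer's error."""
    backend = rules.get(host)
    if backend is None:
        # No Ingress rule matches this Host header, so the default backend answers.
        return DEFAULT_BACKEND
    if backend not in healthy:
        # A rule matched, but the backend failed its health check.
        return SERVER_ERROR
    return backend


rules = {"example.com": "myservice"}

print(route("example.com", rules, {"myservice"}))  # myservice
print(route("example.com", rules, set()))          # Error: Server Error
print(route("203.0.113.10", rules, set()))         # l7-default-backend
```

The last call models browsing to the static IP: the Host header is the IP address, no rule matches, and the request never reaches the unhealthy myservice backend.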

How to connect to rabbitmq service using load balancer hostname

The kubectl describe service the-load-balancer command returns:
Name: the-load-balancer
Namespace: default
Labels: app=the-app
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"the-app"},"name":"the-load-balancer","namespac...
Selector: app=the-app
Type: LoadBalancer
IP: 10.100.129.251
LoadBalancer Ingress: 1234567-1234567890.us-west-2.elb.amazonaws.com
Port: the-load-balancer 15672/TCP
TargetPort: 15672/TCP
NodePort: the-load-balancer 30080/TCP
Endpoints: 172.31.77.44:15672
Session Affinity: None
External Traffic Policy: Cluster
The RabbitMQ server, which runs in another container behind the load balancer, is reachable from another container via the load balancer's Endpoints address 172.31.77.44:15672.
But connecting fails when using the the-load-balancer hostname or its local IP address 10.100.129.251.
What needs to be done to make the RabbitMQ service reachable via the the-load-balancer hostname?
Edited later:
Running a simple Python test from another container:
import socket
print(socket.gethostbyname('the-load-balancer'))
returns a load balancer local IP 10.100.129.251.
Connecting to RabbitMQ using '172.31.18.32' works well:
import pika
credentials = pika.PlainCredentials('guest', 'guest')
parameters = pika.ConnectionParameters(host='172.31.18.32', port=5672, credentials=credentials)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()
print('...channel: %s' % channel)
But after replacing host='172.31.18.32' with host='the-load-balancer' or host='10.100.129.251', the client fails to connect.

When serving RabbitMQ from behind the Load Balancer you will need to open the ports 5672 and 15672. When configured properly the kubectl describe service the-load-balancer command should return both ports mapped to a local IP address:
Name: the-load-balancer
Namespace: default
Labels: app=the-app
Selector: app=the-app
Type: LoadBalancer
IP: 10.100.129.251
LoadBalancer Ingress: 123456789-987654321.us-west-2.elb.amazonaws.com
Port: the-load-balancer-port-15672 15672/TCP
TargetPort: 15672/TCP
NodePort: the-load-balancer-port-15672 30080/TCP
Endpoints: 172.31.18.32:15672
Port: the-load-balancer-port-5672 5672/TCP
TargetPort: 5672/TCP
NodePort: the-load-balancer-port-5672 30081/TCP
Endpoints: 172.31.18.32:5672
Below is the the-load-balancer.yaml file used to create RabbitMQ service:
apiVersion: v1
kind: Service
metadata:
  name: the-load-balancer
  labels:
    app: the-app
spec:
  type: LoadBalancer
  ports:
  - port: 15672
    nodePort: 30080
    protocol: TCP
    name: the-load-balancer-port-15672
  - port: 5672
    nodePort: 30081
    protocol: TCP
    name: the-load-balancer-port-5672
  selector:
    app: the-app
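To confirm that both ports are reachable through the Service, a plain TCP connection test is usually enough. The sketch below uses only the Python standard library. The helper name `check_ports` is hypothetical, and the commented example assumes the script runs in a pod inside the cluster, where the Service name resolves.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[int, bool]:
    """Try a TCP connection to each port and report whether it succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host name and opens a TCP connection.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and DNS resolution failures.
            results[port] = False
    return results


# Example, run from a pod in the cluster:
# print(check_ports("the-load-balancer", [5672, 15672]))
```

A port that reports False while the pod itself answers on its endpoint address points to a port missing from the Service definition.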

I've noticed that in your code you are using port 5672 to talk to the endpoint directly, while the service definition uses 15672, which is the port for the web console?

Make sure that the load balancer service and RabbitMQ are in the same namespace as your application.
If not, you have to use the full DNS record service-x.namespace-b.svc.cluster.local, according to the DNS for Services and Pods documentation.
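As a small illustration, the full DNS name can be assembled from the Service name and namespace. This sketch assumes the cluster uses the default cluster domain cluster.local. The helper name `service_fqdn` is hypothetical.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the cluster-internal DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(service_fqdn("the-load-balancer", "default"))
# the-load-balancer.default.svc.cluster.local
```

The resulting name can be passed as the host to the client in place of the short Service name when the client runs in a different namespace.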

AWS EKS: Service(LoadBalancer) running but not responding to requests

I am using AWS EKS.
I have launched my Django app with gunicorn in a Kubernetes cluster.
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: api
  labels:
    app: api
    type: web
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: api
        type: web
    spec:
      containers:
      - name: vogofleet
        image: xx.xx.com/api:image2
        imagePullPolicy: Always
        env:
        - name: DATABASE_HOST
          value: "test-db-2.xx.xx.xx.xx.com"
        - name: DATABASE_PASSWORD
          value: "xxxyyyxxx"
        - name: DATABASE_USER
          value: "admin"
        - name: DATABASE_PORT
          value: "5432"
        - name: DATABASE_NAME
          value: "test"
        ports:
        - containerPort: 9000
I applied these changes and can see my pod running in kubectl get pods.
Now I am trying to expose it via a Service object. Here is my Service object:
# service
---
apiVersion: v1
kind: Service
metadata:
  name: api
  labels:
    app: api
spec:
  ports:
  - port: 9000
    protocol: TCP
    targetPort: 9000
  selector:
    app: api
    type: web
  type: LoadBalancer
The service is also up and running. It has given me an external IP to access the service, which is the address of the load balancer, and I can see that a new load balancer was launched in the AWS console. But I am not able to access it from a browser: it says the address didn't return any data. The ELB shows the health check on the instances as OutOfService.
There are other pods also running in the cluster. When I run printenv in those pods, here is the result,
root#consumer-9444cf7cd-4dr5z:/consumer# printenv | grep API
API_PORT_9000_TCP_ADDR=172.20.140.213
API_SERVICE_HOST=172.20.140.213
API_PORT_9000_TCP_PORT=9000
API_PORT=tcp://172.20.140.213:9000
API_PORT_9000_TCP=tcp://172.20.140.213:9000
API_PORT_9000_TCP_PROTO=tcp
API_SERVICE_PORT=9000
And I tried to check connection to my api pod,
root#consumer-9444cf7cd-4dr5z:/consumer# telnet $API_PORT_9000_TCP_ADDR $API_PORT_9000_TCP_PORT
Trying 172.20.140.213...
telnet: Unable to connect to remote host: Connection refused
But, when I do port-forward to my localhost, I can access it on my localhost,
$ kubectl port-forward api-6d94dcb65d-br6px 9000
and check the connection,
$ nc -vz localhost 9000
found 0 associations
found 1 connections:
1: flags=82<CONNECTED,PREFERRED>
outif lo0
src ::1 port 53299
dst ::1 port 9000
rank info not available
TCP aux info available
Connection to localhost port 9000 [tcp/cslistener] succeeded!
Why am I not able to access it from other containers or from the public internet? The security groups are correct.
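The service variables shown in the printenv output above follow the Kubernetes naming pattern <NAME>_SERVICE_HOST and <NAME>_SERVICE_PORT, where the Service name is upper-cased and dashes become underscores. They hold the Service's cluster IP, not the pod's IP. A minimal Python sketch for reading them is shown below. The helper name `service_addr_from_env` is hypothetical, and the sample dictionary copies two values from the output above.

```python
import os
from typing import Mapping, Optional, Tuple


def service_addr_from_env(
    service: str, env: Mapping[str, str] = os.environ
) -> Optional[Tuple[str, int]]:
    """Look up <NAME>_SERVICE_HOST and <NAME>_SERVICE_PORT for a Service."""
    prefix = service.upper().replace("-", "_")
    host = env.get(f"{prefix}_SERVICE_HOST")
    port = env.get(f"{prefix}_SERVICE_PORT")
    if host is None or port is None:
        # The variables are only injected for Services that existed when the pod started.
        return None
    return host, int(port)


# Values copied from the printenv output above.
sample_env = {"API_SERVICE_HOST": "172.20.140.213", "API_SERVICE_PORT": "9000"}
print(service_addr_from_env("api", sample_env))  # ('172.20.140.213', 9000)
```

A refused connection to this address therefore concerns the path through the Service, while the port-forward test talks to the pod directly.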

I have the same problem. Here's the output of the kubectl describe service command.
kubectl describe services nginx-elb
Name: nginx-elb
Namespace: default
Labels: deploy=slido
Annotations: service.beta.kubernetes.io/aws-load-balancer-internal: true
Selector: deploy=slido
Type: LoadBalancer
IP: 10.100.29.66
LoadBalancer Ingress: internal-a2d259057e6f94965bfc1f08cf86d4ce-884461987.us-west-2.elb.amazonaws.com
Port: http 80/TCP
TargetPort: 3000/TCP
NodePort: http 32582/TCP
Endpoints: 192.168.60.119:3000
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 119s service-controller Ensuring load balancer
Normal EnsuredLoadBalancer 117s service-controller Ensured load balancer
