A small question regarding Redis deployed in AWS (not AWS ElastiCache) and an issue connecting to it.
Here is the setup of the Redis deployment in AWS (pasting only the Kubernetes StatefulSet and Service):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      initContainers:
      - name: config
        image: redis:7.0.5-alpine
        command: [ "sh", "-c" ]
        args:
          - |
            cp /tmp/redis/redis.conf /etc/redis/redis.conf
            echo "finding master..."
            MASTER_FDQN=`hostname -f | sed -e 's/redis-[0-9]\./redis-0./'`
            if [ "$(redis-cli -h sentinel -p 5000 ping)" != "PONG" ]; then
              echo "master not found, defaulting to redis-0"
              if [ "$(hostname)" = "redis-0" ]; then
                echo "this is redis-0, not updating config..."
              else
                echo "updating redis.conf..."
                echo "slaveof $MASTER_FDQN 6379" >> /etc/redis/redis.conf
              fi
            else
              echo "sentinel found, finding master"
              MASTER="$(redis-cli -h sentinel -p 5000 sentinel get-master-addr-by-name mymaster | grep -E '(^redis-\d{1,})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})')"
              echo "master found : $MASTER, updating redis.conf"
              echo "slaveof $MASTER 6379" >> /etc/redis/redis.conf
            fi
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: config
          mountPath: /tmp/redis/
      containers:
      - name: redis
        image: redis:7.0.5-alpine
        command: ["redis-server"]
        args: ["/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
          name: redis
        volumeMounts:
        - name: data
          mountPath: /data
        - name: redis-config
          mountPath: /etc/redis/
      volumes:
      - name: redis-config
        emptyDir: {}
      - name: config
        configMap:
          name: redis-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: nfs-1
      resources:
        requests:
          storage: 50Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  ports:
  - port: 6379
    targetPort: 6379
    name: redis
  selector:
    app: redis
  type: LoadBalancer
The pods are healthy; I can exec into them and perform operations fine. Here is the output of kubectl get all:
NAME READY STATUS RESTARTS AGE
pod/redis-0 1/1 Running 0 22h
pod/redis-1 1/1 Running 0 22h
pod/redis-2 1/1 Running 0 22h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/redis LoadBalancer 192.168.45.55 10.51.5.2 6379:30315/TCP 26h
NAME READY AGE
statefulset.apps/redis 3/3 22h
Here is the describe of the service:
Name: redis
Namespace: Namespace
Labels: <none>
Annotations: <none>
Selector: app=redis
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 192.168.22.33
IPs: 192.168.22.33
LoadBalancer Ingress: 10.51.5.2
Port: redis 6379/TCP
TargetPort: 6379/TCP
NodePort: redis 30315/TCP
Endpoints: 192.xxx:6379,192.xxx:6379,192.xxx:6379
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IPAllocated 68s metallb-controller Assigned IP ["10.51.5.2"]
Normal nodeAssigned 58s (x5 over 66s) metallb-speaker announcing from node "someaddress.com" with protocol "bgp"
Normal nodeAssigned 58s (x5 over 66s) metallb-speaker announcing from node "someaddress.com" with protocol "bgp"
I then try to connect to it, i.e. to insert some data, with a very straightforward Spring Boot application. The application has no business logic; it just tries to insert data.
Here are the relevant parts:
@Configuration
public class RedisConfiguration {

    @Bean
    public ReactiveRedisConnectionFactory reactiveRedisConnectionFactory() {
        return new LettuceConnectionFactory("10.51.5.2", 30315);
    }
}

@Repository
public class RedisRepository {

    private final ReactiveRedisOperations<String, String> reactiveRedisOperations;

    public RedisRepository(ReactiveRedisOperations<String, String> reactiveRedisOperations) {
        this.reactiveRedisOperations = reactiveRedisOperations;
    }

    public Mono<RedisPojo> save(RedisPojo redisPojo) {
        return reactiveRedisOperations.opsForValue()
                .set(redisPojo.getInput(), redisPojo.getOutput())
                .map(__ -> redisPojo);
    }
}
Each time I try to write data, I get this exception:
2022-12-02T20:20:08.015+08:00 ERROR 1184 --- [ctor-http-nio-3] a.w.r.e.AbstractErrorWebExceptionHandler : [8f16a752-1] 500 Server Error for HTTP POST "/save"
org.springframework.data.redis.RedisConnectionFailureException: Unable to connect to Redis
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.translateException(LettuceConnectionFactory.java:1602) ~[spring-data-redis-3.0.0.jar:3.0.0]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ Handler com.redis.controller.RedisController#test(RedisRequest) [DispatcherHandler]
*__checkpoint ⇢ HTTP POST "/save" [ExceptionHandlingWebHandler]
Original Stack Trace:
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.translateException(LettuceConnectionFactory.java:1602) ~[spring-data-redis-3.0.0.jar:3.0.0]
Caused by: io.lettuce.core.RedisConnectionException: Unable to connect to 10.51.5.2/<unresolved>:30315
at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:78) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:56) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.AbstractRedisClient.getConnection(AbstractRedisClient.java:350) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.RedisClient.connect(RedisClient.java:216) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /10.51.5.2:30315
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.85.Final.jar:4.1.85.Final]
This is particularly puzzling, because I am quite sure the Spring Boot code works. When I change the target of return new LettuceConnectionFactory("10.51.5.2", 30315); to
a regular Redis on my laptop ("localhost", 6379),
a dockerized Redis on my laptop, or
a dockerized Redis on prem,
everything works fine.
Therefore, I am quite puzzled about what I did wrong in the setup of this Redis in AWS.
What should I do in order to connect to it properly? May I get some help please?
Thank you
By default, Redis binds to the loopback addresses 127.0.0.1 and ::1 and does not accept connections on non-local interfaces. Chances are high that this is your main issue, and you may want to review your redis.conf file to bind Redis to the interface you need, or to the generic * -::*, as explained in the comments of the config file itself.
With that being said, Redis also does not accept connections on non-local interfaces if the default user has no password, a security layer named protected mode. You should therefore either give your default user a password or disable protected mode in your redis.conf file.
Not sure if this applies to your case but, as a side note, I would suggest always avoiding exposing Redis to the Internet.
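For reference, a minimal sketch of the relevant redis.conf directives (the exact values are assumptions; adjust them to your network and security requirements):
# listen on all interfaces instead of loopback only
bind * -::*
# either keep protected mode on and give the default user a password (preferred)...
requirepass your-strong-password
# ...or disable protected mode entirely (only do this on a trusted network)
protected-mode no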
You are mixing two things.
To make this service reachable from pods in other namespaces you do not need an external load balancer; you can simply use the DNS name redis.namespace-name:6379 and it will just work. Such a DNS name exists for every service you create (but it only works inside Kubernetes).
Kubernetes will make sure your traffic is routed to the proper pods (assuming there is more than one).
If you want to expose Redis from outside of Kubernetes, then you need to make sure there is connectivity from the outside, and you need a network load balancer that forwards traffic to your Kubernetes service (in your case the NodePort, so you need an NLB with the EKS worker nodes' port 30315 as targets).
If your worker nodes have public IPs and their security groups allow connecting to them directly, you could try connecting to a worker node's IP directly just to test things out (without an LB).
And regardless of your setup, you can always create a proxy via kubectl:
kubectl port-forward -n redisNS svc/redis 6379:6379
and connect from the Spring Boot app to localhost:6379.
How do you want to connect from the app to Redis in the final setup?
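For illustration, a minimal sketch of the connection factory when going through the port-forward, or through the in-cluster service DNS name (the hostnames here are assumptions; adjust the namespace to yours):
@Bean
public ReactiveRedisConnectionFactory reactiveRedisConnectionFactory() {
    // when kubectl port-forward is running locally
    return new LettuceConnectionFactory("localhost", 6379);
    // or, when the app itself runs inside the cluster:
    // return new LettuceConnectionFactory("redis.redisNS.svc.cluster.local", 6379);
}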
Related
I'm trying to use docker-compose and kubernetes as two different solutions to setup a Django API served by Gunicorn (as the web server) and Nginx (as the reverse proxy). Here are the key files:
default.tmpl (nginx) - this is converted to default.conf when the environment variable is filled in:
upstream api {
  server ${UPSTREAM_SERVER};
}

server {
  listen 80;

  location / {
    proxy_pass http://api;
  }

  location /staticfiles {
    alias /app/static/;
  }
}
docker-compose.yaml:
version: '3'

services:
  api-gunicorn:
    build: ./api
    command: gunicorn --bind=0.0.0.0:8000 api.wsgi:application
    volumes:
      - ./api:/app

  api-proxy:
    build: ./api-proxy
    command: /bin/bash -c "envsubst < /etc/nginx/conf.d/default.tmpl > /etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"
    environment:
      - UPSTREAM_SERVER=api-gunicorn:8000
    ports:
      - 80:80
    volumes:
      - ./api/static:/app/static
    depends_on:
      - api-gunicorn
api-deployment.yaml (kubernetes):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: release-name-myapp-api-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: myapp-api-proxy
  template:
    metadata:
      labels:
        app.kubernetes.io/name: myapp-api-proxy
    spec:
      containers:
        - name: myapp-api-gunicorn
          image: "helm-django_api-gunicorn:latest"
          imagePullPolicy: Never
          command:
            - "/bin/bash"
          args:
            - "-c"
            - "gunicorn --bind=0.0.0.0:8000 api.wsgi:application"
        - name: myapp-api-proxy
          image: "helm-django_api-proxy:latest"
          imagePullPolicy: Never
          command:
            - "/bin/bash"
          args:
            - "-c"
            - "envsubst < /etc/nginx/conf.d/default.tmpl > /etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"
          env:
            - name: UPSTREAM_SERVER
              value: 127.0.0.1:8000
          volumeMounts:
            - mountPath: /app/static
              name: api-static-assets-on-host-mount
      volumes:
        - name: api-static-assets-on-host-mount
          hostPath:
            path: /Users/jonathan.metz/repos/personal/code-demos/kubernetes-demo/helm-django/api/static
My question involves the UPSTREAM_SERVER environment variable.
For docker-compose.yaml, the following values have worked for me:
Setting it to the name of the gunicorn service and the port it's running on (in this case api-gunicorn:8000). This is the best way to do it (and how I've done it in the docker-compose file above) because I don't need to expose the 8000 port to the host machine.
Setting it to MY_IP_ADDRESS:8000 as described in this SO post. This method requires me to expose the 8000 port, which is not ideal.
For api-deployment.yaml, only the following value has worked for me:
Setting it to localhost:8000. Inside of a pod, all containers can communicate using localhost.
Are there any other values for UPSTREAM_SERVER that work here, especially in the kubernetes file? I feel like I should be able to point to the container's name and that should work.
You could create a Service to target the container myapp-api-gunicorn, but this will expose it outside of the pod:
apiVersion: v1
kind: Service
metadata:
  name: api-gunicorn-service
spec:
  selector:
    app.kubernetes.io/name: myapp-api-proxy
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000
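With such a Service in place, UPSTREAM_SERVER could presumably point at the Service name instead of localhost (the name api-gunicorn-service is an assumption carried over from the sketch above):
env:
  - name: UPSTREAM_SERVER
    value: api-gunicorn-service:8000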
You might also use hostname and subdomain fields inside a pod to take advantage of FQDN.
Currently when a pod is created, its hostname is the Pod’s metadata.name value.
The Pod spec has an optional hostname field, which can be used to specify the Pod’s hostname. When specified, it takes precedence over the Pod’s name to be the hostname of the pod. For example, given a Pod with hostname set to “my-host”, the Pod will have its hostname set to “my-host”.
The Pod spec also has an optional subdomain field which can be used to specify its subdomain. For example, a Pod with hostname set to “foo”, and subdomain set to “bar”, in namespace “my-namespace”, will have the fully qualified domain name (FQDN) “foo.bar.my-namespace.svc.cluster-domain.example”.
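A minimal sketch of what that could look like (names are assumptions; note that a headless Service named after the subdomain must exist in the same namespace for the FQDN to resolve):
apiVersion: v1
kind: Pod
metadata:
  name: myapp-api
spec:
  hostname: foo
  subdomain: bar    # requires a headless Service named "bar" in this namespace
  containers:
    - name: myapp-api-gunicorn
      image: helm-django_api-gunicorn:latest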
Also here is a nice article from Mirantis which talks about exposing multiple containers in a pod
Folks,
What problem still persists:
I have now got past the pod being stuck in CrashLoopBackOff by fixing the Dockerfile run command as suggested by Emil Gi; however, the external IP is not forwarding to my library app server pod.
Status
Fixed the port to 8080 in the Dockerfile and ensured it is consistent across the files.
Made sure the Dockerfile has proper commands so that it doesn't terminate immediately after startup; this was what was causing the CrashLoopBackOff.
The problem is still that the load balancer external IP I click on gives this error: "This site can't be reached. 34.93.141.11 refused to connect."
Original Question:
How do I resolve this CrashLoopBackOff? I looked at many docs and tried debugging, but I am unsure what is causing it. The app runs perfectly in local mode and even deploys smoothly to App Engine standard, but not on GKE. Any pointers to debug this further are most appreciated.
Problem: The cloudsql-proxy container is running, but the library-app container has a CrashLoopBackOff error. The pod was assigned to a node, starts pulling the images, starts them, and then goes into this BackOff state.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
library-7699b84747-9skst 1/2 CrashLoopBackOff 28 121m
$ kubectl logs library-7699b84747-9skst
Error from server (BadRequest): a container name must be specified for pod library-7699b84747-9skst, choose one of: [library-app cloudsql-proxy]
$ kubectl describe pods library-7699b84747-9skst
Name: library-7699b84747-9skst
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: gke-library-default-pool-35b5943a-ps5v/10.160.0.13
Start Time: Fri, 06 Dec 2019 09:34:11 +0530
Labels: app=library
pod-template-hash=7699b84747
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container library-app; cpu request for container cloudsql-proxy
Status: Running
IP: 10.16.0.10
Controlled By: ReplicaSet/library-7699b84747
Containers:
library-app:
Container ID: docker://e7d8aac3dff318de34f750c3f1856cd754aa96a7203772de748b3e397441a609
Image: gcr.io/library-259506/library
Image ID: docker-pullable://gcr.io/library-259506/library#sha256:07f54e055621ab6ddcbb49666984501cf98c95133bcf7405ca076322fb0e4108
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 06 Dec 2019 09:35:07 +0530
Finished: Fri, 06 Dec 2019 09:35:07 +0530
Ready: False
Restart Count: 2
Requests:
cpu: 100m
Environment:
DATABASE_USER: <set to the key 'username' in secret 'cloudsql'> Optional: false
DATABASE_PASSWORD: <set to the key 'password' in secret 'cloudsql'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
cloudsql-proxy:
Container ID: docker://352284231e7f02011dd1ab6999bf9a283b334590435278442e9a04d4d0684405
Image: gcr.io/cloudsql-docker/gce-proxy:1.16
Image ID: docker-pullable://gcr.io/cloudsql-docker/gce-proxy#sha256:7d302c849bebee8a3fc90a2705c02409c44c91c813991d6e8072f092769645cf
Port: <none>
Host Port: <none>
Command:
/cloud_sql_proxy
--dir=/cloudsql
-instances=library-259506:asia-south1:library=tcp:3306
-credential_file=/secrets/cloudsql/credentials.json
State: Running
Started: Fri, 06 Dec 2019 09:34:51 +0530
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/cloudsql from cloudsql (rw)
/etc/ssl/certs from ssl-certs (rw)
/secrets/cloudsql from cloudsql-oauth-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
cloudsql-oauth-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloudsql-oauth-credentials
Optional: false
ssl-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs
HostPathType:
cloudsql:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-kj497:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-kj497
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 86s default-scheduler Successfully assigned default/library-7699b84747-9skst to gke-library-default-pool-35b5943a-ps5v
Normal Pulling 50s kubelet, gke-library-default-pool-35b5943a-ps5v pulling image "gcr.io/cloudsql-docker/gce-proxy:1.16"
Normal Pulled 47s kubelet, gke-library-default-pool-35b5943a-ps5v Successfully pulled image "gcr.io/cloudsql-docker/gce-proxy:1.16"
Normal Created 46s kubelet, gke-library-default-pool-35b5943a-ps5v Created container
Normal Started 46s kubelet, gke-library-default-pool-35b5943a-ps5v Started container
Normal Pulling 2s (x4 over 85s) kubelet, gke-library-default-pool-35b5943a-ps5v pulling image "gcr.io/library-259506/library"
Normal Created 1s (x4 over 50s) kubelet, gke-library-default-pool-35b5943a-ps5v Created container
Normal Started 1s (x4 over 50s) kubelet, gke-library-default-pool-35b5943a-ps5v Started container
Normal Pulled 1s (x4 over 52s) kubelet, gke-library-default-pool-35b5943a-ps5v Successfully pulled image "gcr.io/library-259506/library"
Warning BackOff 1s (x5 over 43s) kubelet, gke-library-default-pool-35b5943a-ps5v Back-off restarting failed container
Here is the library.yaml file I have to go with it.
# [START kubernetes_deployment]
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: library
  labels:
    app: library
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: library
    spec:
      containers:
      - name: library-app
        # Replace with your project ID or use `make template`
        image: gcr.io/library-259506/library
        # This setting makes nodes pull the docker image every time before
        # starting the pod. This is useful when debugging, but should be turned
        # off in production.
        imagePullPolicy: Always
        env:
        # [START cloudsql_secrets]
        - name: DATABASE_USER
          valueFrom:
            secretKeyRef:
              name: cloudsql
              key: username
        - name: DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: cloudsql
              key: password
        # [END cloudsql_secrets]
        ports:
        - containerPort: 8080
      # [START proxy_container]
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=library-259506:asia-south1:library=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
        - name: cloudsql-oauth-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
        - name: ssl-certs
          mountPath: /etc/ssl/certs
        - name: cloudsql
          mountPath: /cloudsql
      # [END proxy_container]
      # [START volumes]
      volumes:
      - name: cloudsql-oauth-credentials
        secret:
          secretName: cloudsql-oauth-credentials
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs
      - name: cloudsql
        emptyDir:
      # [END volumes]
# [END kubernetes_deployment]
---
# [START service]
# The library-svc service provides a load-balancing proxy over the polls app
# pods. By specifying the type as a 'LoadBalancer', Container Engine will
# create an external HTTP load balancer.
# The service directs traffic to the deployment by matching the service's selector to the deployment's label
#
# For more information about external HTTP load balancing see:
# https://cloud.google.com/container-engine/docs/load-balancer
apiVersion: v1
kind: Service
metadata:
  name: library-svc
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: library
# [END service]
More error status
Container 'library-app' keeps crashing.
CrashLoopBackOff
Reason
Container 'library-app' keeps crashing.
Check Pod's logs to see more details. Learn more
Source
library-7699b84747-9skst
Conditions
Initialized: True Ready: False ContainersReady: False PodScheduled: True
- lastProbeTime: null
lastTransitionTime: "2019-12-06T06:03:43Z"
message: 'containers with unready status: [library-app]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
Key Events
Back-off restarting failed container    BackOff    Dec 6, 2019, 9:34:54 AM    Dec 6, 2019, 12:24:26 PM    779
pulling image "gcr.io/library-259506/library"    Pulling    Dec 6, 2019, 9:34:12 AM    Dec 6, 2019, 11:59:26 AM    34
The Dockerfile is as follows (this fixed the CrashLoop btw):
FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
COPY requirements.txt /code/
RUN pip install -r requirements.txt
COPY . /code/
# Server
EXPOSE 8080
STOPSIGNAL SIGINT
ENTRYPOINT ["python", "manage.py"]
CMD ["runserver", "0.0.0.0:8080"]
I think a bunch of things all came together:
I found the DB password had a special character that needed to be put within quotes, and I made sure the port numbers were accurate across the Dockerfile and library.yaml. This ensured the secrets actually worked; I had detected a password mismatch issue in the logs.
IMPORTANT: the command-line fix from Emil G about ensuring my Dockerfile doesn't exit quickly; make sure the CMD actually works and runs your server.
IMPORTANT: Finally I found a fix for the external IP not connecting to my server; see this thread where I explain what went wrong: basically I needed a security context where I had to fix runAsUser so the container does not run as root (RunAsUser issue & Clicking external IP of load balancer -> Bad Request (400) on deploying Django app on GKE (Kubernetes) and db connection failing). See the sketch after this list.
I also documented all the steps to deploy, steps 1-15.
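A minimal sketch of the kind of securityContext that addressed the runAs problem, assuming the application image can run as an unprivileged user (the UID is an assumption):
      containers:
      - name: library-app
        image: gcr.io/library-259506/library
        ports:
        - containerPort: 8080
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000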
I'm trying to curl a service deployed in the k8s cluster from the master node:
curl: (7) Failed to connect to localhost port 31796: Connection
refused
For the Kubernetes cluster, when I check my iptables on the master I get the following:
Chain KUBE-SERVICES (1 references)
target prot opt source destination
REJECT tcp -- anywhere 10.100.94.202 /*
default/some-service: has no endpoints */ tcp dpt:9015 reject-with
icmp-port-unreachable
REJECT tcp -- anywhere 10.103.64.79 /*
default/some-service: has no endpoints */ tcp dpt:9000 reject-with
icmp-port-unreachable
REJECT tcp -- anywhere 10.107.111.252 /*
default/some-service: has no endpoints */ tcp dpt:9015 reject-with
icmp-port-unreachable
if I flush my iptables with
iptables -F
and then curl
curl -v localhost:31796
I get the following
* Rebuilt URL to: localhost:31796/
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 31796 (#0)
> GET / HTTP/1.1
> Host: localhost:31796
> User-Agent: curl/7.58.0
> Accept: */*
but soon after it results in
* Rebuilt URL to: localhost:31796/
* Trying 127.0.0.1...
* TCP_NODELAY set
* connect to 127.0.0.1 port 31796 failed: Connection refused
* Failed to connect to localhost port 31796: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 31796: Connection
refused
I'm using the NodePort concept in my service.
Details
kubectl get node
NAME STATUS ROLES AGE VERSION
ip-Master-IP Ready master 26h v1.12.7
ip-Node1-ip Ready <none> 26h v1.12.7
ip-Node2-ip Ready <none> 23h v1.12.7
Kubectl get pods
NAME READY STATUS RESTARTS AGE
config-service-7dc8fc4ff-5kk88 1/1 Running 0 5h49m
kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
AGE SELECTOR
cadmin-server NodePort 10.109.55.255 <none>
9015:31796/TCP 22h app=config-service
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP
26h <none>
Kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
endpoint.yml
apiVersion: v1
kind: Endpoints
metadata:
  name: xyz
subsets:
  - addresses:
      - ip: node1_ip
      - ip: node2_ip
    ports:
      - port: 31796
      - name: xyz
service.yml
apiVersion: v1
kind: Service
metadata:
  name: xyz
  namespace: default
  annotations:
    alb.ingress.kubernetes.io/healthcheck-path: /xyz
  labels:
    app: xyz
spec:
  type: NodePort
  ports:
    - nodePort: 31796
      port: 8001
      targetPort: 8001
      protocol: TCP
  selector:
    app: xyz
deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: xyz
  name: xyz
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: xyz
  template:
    metadata:
      labels:
        app: xyz
    spec:
      containers:
        - name: xyz
          image: abc
          ports:
            - containerPort: 8001
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 200m
          volumeMounts:
            - mountPath: /app/
              name: config-volume
      restartPolicy: Always
      imagePullSecrets:
        - name: awslogin
      volumes:
        - configMap:
            name: xyz
          name: config-volume
You can run the following command to check endpoints.
kubectl get endpoints
If the endpoint is not showing up for the service, please check the YAML files that you used for creating the load balancer and the deployment. Make sure the labels match.
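For example, a quick way to confirm that the Service selector actually matches running pods (names here follow the manifests above):
# pods that the Service would select
kubectl get pods -l app=xyz -o wide
# the Service's selector and the resulting endpoints
kubectl describe svc xyz | grep -i selector
kubectl get endpoints xyz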
As many have pointed out in the comments, the "no endpoints" firewall rule is inserted by kube-proxy and indicates a broken Service or application definition or setup.
# iptables-save
# Generated by iptables-save v1.4.21 on Wed Feb 24 10:10:23 2021
*filter
# [...]
-A KUBE-EXTERNAL-SERVICES -p tcp -m comment --comment "default/web-service:http has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 30081 -j REJECT --reject-with icmp-port-unreachable
# [...]
As you have noticed as well, kube-proxy constantly monitors the firewall rules and inserts or deletes rules dynamically according to the Kubernetes Pod and Service definitions.
# kubectl get service --namespace=default
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 198d
web-service NodePort 10.111.188.199 <none> 8201:30081/TCP 194d
# kubectl get pods --namespace=default
No resources found in default namespace.
In this example case a Service is defined but the Pod associated with the Service does not exist.
Still the kube-proxy process listens on the port 30081:
# netstat -lpn | grep -i kube
[...]
tcp 0 0 0.0.0.0:30081 0.0.0.0:* LISTEN 21542/kube-proxy
[...]
So kube-proxy inserts a firewall rule to reject traffic for the broken service.
It will also delete this rule as soon as you delete the Service definition:
# kubectl delete service web-service --namespace=default
service "web-service" deleted
# iptables-save | grep -i "no endpoints" | wc -l
0
As a side note:
This rule is also inserted for Kubernetes definitions that the cluster doesn't accept.
As an example, your service can have the name "log-service" but can't have the name "web-log".
In the latter case no warning was given, but this blocking rule was inserted.
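Conversely, a sketch of how the rule disappears once the Service actually has endpoints (the pod labels are assumptions; they must match whatever the Service selector expects):
# create a pod whose labels match the Service selector
kubectl run web --image=nginx --labels="app=web-service" --namespace=default
# once the pod is Ready the Service has endpoints...
kubectl get endpoints web-service --namespace=default
# ...and the "no endpoints" REJECT rule is removed
iptables-save | grep -i "no endpoints" | wc -l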
So I have an EKS cluster, and have set up the AWS Alb Ingress Controller:
https://github.com/kubernetes-sigs/aws-alb-ingress-controller
I'm trying to set up Grafana here, and the Ingress is created but it doesn't seem to resolve at all.
I have the follow Ingress:
$ kubectl describe ingress grafana
Name: grafana
Namespace: orbix-mvp
Address: 4ae1e4ba-orbixmvp-grafana-fd7d-993303634.eu-central-1.elb.amazonaws.com
Default backend: default-http-backend:80 (<none>)
Rules:
Host Path Backends
---- ---- --------
grafana-orbix.orbixpay.com
/ grafana:80 (<none>)
Annotations:
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-2016-08
alb.ingress.kubernetes.io/subnets: subnet-08431d96168e36c30,subnet-0e2a7e2766852bf8a
alb.ingress.kubernetes.io/success-codes: 302
kubernetes.io/ingress.class: alb
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CREATE 45m alb-ingress-controller LoadBalancer 4ae1e4ba-orbixmvp-grafana-fd7d created, ARN: arn:aws:elasticloadbalancing:eu-central-1:109153834985:loadbalancer/app/4ae1e4ba-orbixmvp-grafana-fd7d/4b98cb7027b71697
Normal CREATE 45m alb-ingress-controller rule 1 created with conditions [{ Field: "host-header", Values: ["grafana-orbix.orbixpay.com"] },{ Field: "path-pattern", Values: ["/"] }]
The backend for it is the following service:
$ kubectl describe service grafana
Name: grafana
Namespace: orbix-mvp
Labels: app=grafana
chart=grafana-1.25.1
heritage=Tiller
release=grafana
Annotations: <none>
Selector: app=grafana,release=grafana
Type: NodePort
IP: 172.20.11.232
Port: service 80/TCP
TargetPort: 3000/TCP
NodePort: service 30772/TCP
Endpoints: 10.0.0.180:3000
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
It does have a proper endpoint:
$ kubectl get endpoints | grep grafana
grafana 10.0.0.180:3000 46m
The pod itself is properly tagged and has the correct IP that's the endpoint above:
$ kubectl describe pod grafana-bdc977fd4-ptzhg
Name: grafana-bdc977fd4-ptzhg
Namespace: orbix-mvp
Priority: 0
PriorityClassName: <none>
Node: ip-10-0-0-230.eu-central-1.compute.internal/10.0.0.230
Start Time: Mon, 11 Feb 2019 13:24:43 +0200
Labels: app=grafana
pod-template-hash=687533980
release=grafana
Annotations: <none>
Status: Running
IP: 10.0.0.180
My AWS account has the LoadBalancer listed as Active, the subnets are on the same VPC as the cluster, security groups are being generated by the Ingress Controller.
Everything seems to be set up properly, however when I access the LoadBalancer address, it just times out.
$ kubectl get ingresses
NAME HOSTS ADDRESS PORTS AGE
grafana grafana-orbix.orbixpay.com 4ae1e4ba-orbixmvp-grafana-fd7d-993303634.eu-central-1.elb.amazonaws.com 80 49m
I actually figured it out: the Ingress configuration was allowing traffic for the domain only. That excludes traffic to the load balancer address (which I assumed was allowed by default).
Basically, traffic needs to be allowed for * in order for the load balancer URL to work too. Also, if the app redirects to /login like in my case, all paths need to be allowed as well, since that redirect doesn't work if the only path specified is /. A sketch of the adjusted rules follows.
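A minimal sketch of what the adjusted rules could look like for the ALB ingress controller (the host-less rule and the wildcard path are assumptions based on the fix described above; the annotations stay as in the original Ingress):
spec:
  rules:
    - http:             # no "host" field, so requests to the ALB DNS name are matched too
        paths:
          - path: /*    # wildcard path so redirects such as /login are also routed
            backend:
              serviceName: grafana
              servicePort: 80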
I followed the installation guide to set up the cluster: https://s3.amazonaws.com/quickstart-reference/redhat/openshift/latest/doc/red-hat-openshift-on-the-aws-cloud.pdf
I'm able to get the public DNS name for a service in Kubernetes but not in OpenShift. It is a very basic thing, and I don't know why it is not working. I'm attaching the manifest files that are used to create the app and server. It is not working in OpenShift.
prometheus-configmap.yml
prometheus-rbac.yml
prometheus-deployment.yml
In K8s
kubectl apply -f prometheus-configmap.yml
kubectl apply -f prometheus-rbac.yml
kubectl apply -f prometheus-deployment.yml
veeru#ultron:~/prometheus-k8s-monitoring$ kubectl describe svc prometheus-test
Name: prometheus-test
Namespace: default
Labels: name=prometheus-test
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/scrape":"true"},"labels":{"name":"prometheus-test"},"name":"prometheus-te...
prometheus.io/scrape=true
Selector: app=prometheus-test
Type: LoadBalancer
IP: 100.xx.xx.xx
LoadBalancer Ingress: xxxxx-1679955855.us-east-2.elb.amazonaws.com
Port: prometheus-test 9090/TCP
TargetPort: 9090/TCP
NodePort: prometheus-test 31558/TCP
Endpoints: 100.xx.xx.xx:9090
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 9m service-controller Ensuring load balancer
Normal EnsuredLoadBalancer 9m service-controller Ensured load balancer
Above you can see that I got the LoadBalancer Ingress with a public DNS name.
In Openshift
kubectl apply -f prometheus-configmap.yml
kubectl apply -f prometheus-rbac.yml
kubectl apply -f prometheus-deployment.yml
root#ultron:/home/veeru/prometheus-k8s-monitoring# oc describe svc prometheus-test
Name: prometheus-test
Namespace: spinnaker
Labels: name=prometheus-test
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/scrape":"true"},"labels":{"name":"prometheus-test"},"name":"prometheus-te...
prometheus.io/scrape=true
Selector: app=prometheus-test
Type: LoadBalancer
IP: 172.30.134.153
Port: prometheus-test 9090/TCP
NodePort: prometheus-test 31667/TCP
Endpoints: <none>
Session Affinity: None
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
10m 36s 8 service-controller Normal CreatingLoadBalancer Creating load balancer
10m 36s 8 service-controller Warning CreatingLoadBalancerFailed Error creating load balancer (will retry): Failed to create load balancer for service spinnaker/prometheus-test: could not find any suitable subnets for creating the ELB
You can see the status: failed to create load balancer for the service.
If I specify an annotation like service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0,
then I'm able to get the "internal" DNS name for the service:
root#ultron:/home/veeru/prometheus-k8s-monitoring# oc describe svc test4-dev
Name: test4-dev
Namespace: default
Labels: <none>
Annotations: service.beta.kubernetes.io/aws-load-balancer-internal=0.0.0.0/0
Selector: load-balancer-test4-dev=true
Type: LoadBalancer
IP: 172.30.177.217
LoadBalancer Ingress: internal-xxxxx-298335522.us-east-2.elb.amazonaws.com
Port: http 8080/TCP
TargetPort: 8080/TCP
NodePort: http 31595/TCP
Endpoints: 10.131.0.75:8080
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CreatingLoadBalancer 1m (x208 over 16h) service-controller Creating load balancer
Is OpenShift not using an AWS ELB to create a public DNS name?
OK, instead of relying on an AWS load balancer to provide a public DNS name, I configured a subdomain in /etc/openshift/master/master-config.yaml.
Create an A record (wildcard DNS): *.cluster.example.com -> your master IP.
Specify in /etc/openshift/master/master-config.yaml:
routingConfig:
  subdomain: cluster.example.com
serviceAccountConfig
Restart the daemons:
systemctl restart atomic-openshift-master-api atomic-openshift-master-controllers
After this you should be able to create an OpenShift Route, for example as sketched below.
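A minimal sketch of exposing the Prometheus service as a Route once the subdomain is configured (the hostname is an assumption derived from the wildcard record above):
oc expose svc prometheus-test --hostname=prometheus.cluster.example.com
oc get route prometheus-test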
Resources:
https://docs.openshift.com/container-platform/3.7/install_config/router/default_haproxy_router.html#customizing-the-default-routing-subdomain
https://docs.openshift.com/container-platform/3.7/install_config/install/prerequisites.html#wildcard-dns-prereq
https://access.redhat.com/solutions/2081043