akka.management.cluster.bootstrap.internal.HttpContactPointBootstrap - Probing failed. How can it be solved? - akka

How can I solve this problem? I am trying to run an Akka cluster on minikube, but the cluster fails to form.
17:46:49.093 [appka-akka.actor.default-dispatcher-12] WARN akka.management.cluster.bootstrap.internal.HttpContactPointBootstrap - Probing [http://172-17-0-3.default.pod.cluster.local:8558/bootstrap/seed-nodes] failed due to: Tcp command [Connect(172-17-0-3.default.pod.cluster.local:8558,None,List(),Some(10 seconds),true)] failed because of java.net.ConnectException: Connection refused
My config is:
akka {
  actor {
    provider = cluster
  }
  cluster {
    shutdown-after-unsuccessful-join-seed-nodes = 60s
  }
  coordinated-shutdown.exit-jvm = on
  management {
    cluster.bootstrap {
      contact-point-discovery {
        discovery-method = kubernetes-api
      }
    }
  }
}
My YAML:
kind: Deployment
metadata:
  labels:
    app: appka
  name: appka
spec:
  replicas: 2
  selector:
    matchLabels:
      app: appka
  template:
    metadata:
      labels:
        app: appka
    spec:
      containers:
      - name: appka
        image: akkacluster:latest
        imagePullPolicy: Never
        readinessProbe:
          httpGet:
            path: /ready
            port: management
          periodSeconds: 10
          failureThreshold: 10
          initialDelaySeconds: 20
        livenessProbe:
          httpGet:
            path: /alive
            port: management
          periodSeconds: 10
          failureThreshold: 10
          initialDelaySeconds: 20
        ports:
        - name: management
          containerPort: 8558
          protocol: TCP
        - name: http
          containerPort: 8080
          protocol: TCP
        - name: remoting
          containerPort: 25520
          protocol: TCP
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods
subjects:
- kind: User
  name: system:serviceaccount:default:default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
Unfortunately, my cluster is not forming:
kubectl logs pod/appka-7c4b7df7f7-5v7cc
17:46:32.026 [appka-akka.actor.default-dispatcher-3] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
SLF4J: A number (1) of logging calls during the initialization phase have been intercepted and are
SLF4J: now being replayed. These are subject to the filtering rules of the underlying logging system.
SLF4J: See also http://www.slf4j.org/codes.html#replay
17:46:33.644 [appka-akka.actor.default-dispatcher-3] INFO akka.remote.artery.tcp.ArteryTcpTransport - Remoting started with transport [Artery tcp]; listening on address [akka://appka@172.17.0.4:25520] with UID [-8421566647681174079]
17:46:33.811 [appka-akka.actor.default-dispatcher-3] INFO akka.cluster.Cluster - Cluster Node [akka://appka@172.17.0.4:25520] - Starting up, Akka version [2.6.14] ...
17:46:34.491 [appka-akka.actor.default-dispatcher-3] INFO akka.cluster.Cluster - Cluster Node [akka://appka@172.17.0.4:25520] - Registered cluster JMX MBean [akka:type=Cluster]
17:46:34.512 [appka-akka.actor.default-dispatcher-3] INFO akka.cluster.Cluster - Cluster Node [akka://appka@172.17.0.4:25520] - Started up successfully
17:46:34.883 [appka-akka.actor.default-dispatcher-3] INFO akka.cluster.Cluster - Cluster Node [akka://appka@172.17.0.4:25520] - No downing-provider-class configured, manual cluster downing required, see https://doc.akka.io/docs/akka/current/typed/cluster.html#downing
17:46:34.884 [appka-akka.actor.default-dispatcher-3] INFO akka.cluster.Cluster - Cluster Node [akka://appka@172.17.0.4:25520] - No seed nodes found in configuration, relying on Cluster Bootstrap for joining
17:46:39.084 [appka-akka.actor.default-dispatcher-11] INFO akka.management.internal.HealthChecksImpl - Loading readiness checks [(cluster-membership,akka.management.cluster.scaladsl.ClusterMembershipCheck), (sharding,akka.cluster.sharding.ClusterShardingHealthCheck)]
17:46:39.090 [appka-akka.actor.default-dispatcher-11] INFO akka.management.internal.HealthChecksImpl - Loading liveness checks []
17:46:39.104 [appka-akka.actor.default-dispatcher-3] INFO ClusterListenerActor$ - started actor akka://appka/user - (class akka.actor.typed.internal.adapter.ActorRefAdapter)
17:46:39.888 [appka-akka.actor.default-dispatcher-3] INFO akka.management.scaladsl.AkkaManagement - Binding Akka Management (HTTP) endpoint to: 172.17.0.4:8558
17:46:40.525 [appka-akka.actor.default-dispatcher-3] INFO akka.management.scaladsl.AkkaManagement - Including HTTP management routes for ClusterHttpManagementRouteProvider
17:46:40.806 [appka-akka.actor.default-dispatcher-3] INFO akka.management.scaladsl.AkkaManagement - Including HTTP management routes for ClusterBootstrap
17:46:40.821 [appka-akka.actor.default-dispatcher-3] INFO akka.management.cluster.bootstrap.ClusterBootstrap - Using self contact point address: http://172.17.0.4:8558
17:46:40.914 [appka-akka.actor.default-dispatcher-3] INFO akka.management.scaladsl.AkkaManagement - Including HTTP management routes for HealthCheckRoutes
17:46:44.198 [appka-akka.actor.default-dispatcher-3] INFO akka.management.cluster.bootstrap.ClusterBootstrap - Initiating bootstrap procedure using kubernetes-api method...
17:46:44.200 [appka-akka.actor.default-dispatcher-3] INFO akka.management.cluster.bootstrap.ClusterBootstrap - Bootstrap using `akka.discovery` method: kubernetes-api
17:46:44.226 [appka-akka.actor.default-dispatcher-3] INFO akka.management.scaladsl.AkkaManagement - Bound Akka Management (HTTP) endpoint to: 172.17.0.4:8558
17:46:44.487 [appka-akka.actor.default-dispatcher-6] INFO akka.management.cluster.bootstrap.internal.BootstrapCoordinator - Locating service members. Using discovery [akka.discovery.kubernetes.KubernetesApiServiceDiscovery], join decider [akka.management.cluster.bootstrap.LowestAddressJoinDecider], scheme [http]
17:46:44.490 [appka-akka.actor.default-dispatcher-6] INFO akka.management.cluster.bootstrap.internal.BootstrapCoordinator - Looking up [Lookup(appka,None,Some(tcp))]
17:46:44.493 [appka-akka.actor.default-dispatcher-6] INFO akka.discovery.kubernetes.KubernetesApiServiceDiscovery - Querying for pods with label selector: [app=appka]. Namespace: [default]. Port: [None]
17:46:45.626 [appka-akka.actor.default-dispatcher-12] INFO akka.management.cluster.bootstrap.internal.BootstrapCoordinator - Looking up [Lookup(appka,None,Some(tcp))]
17:46:45.627 [appka-akka.actor.default-dispatcher-12] INFO akka.discovery.kubernetes.KubernetesApiServiceDiscovery - Querying for pods with label selector: [app=appka]. Namespace: [default]. Port: [None]
17:46:48.428 [appka-akka.actor.default-dispatcher-13] INFO akka.management.cluster.bootstrap.internal.BootstrapCoordinator - Located service members based on: [Lookup(appka,None,Some(tcp))]: [ResolvedTarget(172-17-0-4.default.pod.cluster.local,None,Some(/172.17.0.4)), ResolvedTarget(172-17-0-3.default.pod.cluster.local,None,Some(/172.17.0.3))], filtered to [172-17-0-4.default.pod.cluster.local:0, 172-17-0-3.default.pod.cluster.local:0]
17:46:48.485 [appka-akka.actor.default-dispatcher-22] INFO akka.management.cluster.bootstrap.internal.BootstrapCoordinator - Located service members based on: [Lookup(appka,None,Some(tcp))]: [ResolvedTarget(172-17-0-4.default.pod.cluster.local,None,Some(/172.17.0.4)), ResolvedTarget(172-17-0-3.default.pod.cluster.local,None,Some(/172.17.0.3))], filtered to [172-17-0-4.default.pod.cluster.local:0, 172-17-0-3.default.pod.cluster.local:0]
17:46:48.586 [appka-akka.actor.default-dispatcher-12] INFO akka.management.cluster.bootstrap.LowestAddressJoinDecider - Discovered [2] contact points, confirmed [0], which is less than the required [2], retrying
17:46:49.092 [appka-akka.actor.default-dispatcher-12] WARN akka.management.cluster.bootstrap.internal.HttpContactPointBootstrap - Probing [http://172-17-0-4.default.pod.cluster.local:8558/bootstrap/seed-nodes] failed due to: Tcp command [Connect(172-17-0-4.default.pod.cluster.local:8558,None,List(),Some(10 seconds),true)] failed because of java.net.ConnectException: Connection refused
17:46:49.093 [appka-akka.actor.default-dispatcher-12] WARN akka.management.cluster.bootstrap.internal.HttpContactPointBootstrap - Probing [http://172-17-0-3.default.pod.cluster.local:8558/bootstrap/seed-nodes] failed due to: Tcp command [Connect(172-17-0-3.default.pod.cluster.local:8558,None,List(),Some(10 seconds),true)] failed because of java.net.ConnectException: Connection refused
17:46:49.603 [appka-akka.actor.default-dispatcher-22] INFO akka.management.cluster.bootstrap.LowestAddressJoinDecider - Discovered [2] contact points, confirmed [0], which is less than the required [2], retrying
17:46:49.682 [appka-akka.actor.default-dispatcher-21] INFO akka.management.cluster.bootstrap.internal.BootstrapCoordinator - Looking up [Lookup(appka,None,Some(tcp))]
17:46:49.683 [appka-akka.actor.default-dispatcher-21] INFO akka.discovery.kubernetes.KubernetesApiServiceDiscovery - Querying for pods with label selector: [app=appka]. Namespace: [default]. Port: [None]
17:46:49.726 [appka-akka.actor.default-dispatcher-12] INFO akka.management.cluster.bootstrap.internal.BootstrapCoordinator - Located service members based on: [Lookup(appka,None,Some(tcp))]: [ResolvedTarget(172-17-0-4.default.pod.cluster.local,None,Some(/172.17.0.4)), ResolvedTarget(172-17-0-3.default.pod.cluster.local,None,Some(/172.17.0.3))], filtered to [172-17-0-4.default.pod.cluster.local:0, 172-17-0-3.default.pod.cluster.local:0]
17:46:50.349 [appka-akka.actor.default-dispatcher-21] WARN akka.management.cluster.bootstrap.internal.HttpContactPointBootstrap - Probing [http://172-17-0-3.default.pod.cluster.local:8558/bootstrap/seed-nodes] failed due to: Tcp command [Connect(172-17-0-3.default.pod.cluster.local:8558,None,List(),Some(10 seconds),true)] failed because of java.net.ConnectException: Connection refused
17:46:50.504 [appka-akka.actor.default-dispatcher-11] WARN akka.management.cluster.bootstrap.internal.HttpContactPointBootstrap - Probing [http://172-17-0-4.default.pod.cluster.local:8558/bootstrap/seed-nodes] failed due to: Tcp command [Connect(172-17-0-4.default.pod.cluster.local:8558,None,List(),Some(10 seconds),true)] failed because of java.net.ConnectException: Connection refused

You are missing the akka.remote settings block. Something like:
akka {
  actor {
    # provider=remote is possible, but prefer cluster
    provider = cluster
  }
  remote {
    artery {
      transport = tcp # See Selecting a transport below
      canonical.hostname = "127.0.0.1"
      canonical.port = 25520
    }
  }
}
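For checking the management endpoint by hand, it may help to see how the host names in the failing probes are formed. Judging from the ResolvedTarget entries in the question's log, each contact point is a per-pod DNS name derived from the pod IP, combined with the management port 8558 from the Deployment. Below is a minimal Python sketch of that mapping. The function names are hypothetical helpers, not part of Akka, and the cluster.local suffix is simply what the log shows.

```python
def pod_dns_name(pod_ip: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the per-pod DNS name seen in the log, e.g. 172-17-0-3.default.pod.cluster.local."""
    # Dots in the IP become dashes; the rest is <namespace>.pod.<cluster domain>.
    return f"{pod_ip.replace('.', '-')}.{namespace}.pod.{cluster_domain}"


def bootstrap_probe_url(pod_ip: str, management_port: int = 8558,
                        namespace: str = "default") -> str:
    """Build the URL that appears in the 'Probing [...] failed' warnings."""
    host = pod_dns_name(pod_ip, namespace)
    return f"http://{host}:{management_port}/bootstrap/seed-nodes"


# The two pod IPs taken from the question's log.
for ip in ("172.17.0.3", "172.17.0.4"):
    print(bootstrap_probe_url(ip))
```

Requesting one of these URLs from inside another pod (for example with curl) shows whether port 8558 accepts connections at all, independently of the bootstrap process.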

Related

Redis deployed in AWS - Connection time out from localhost SpringBoot app

Small question regarding Redis deployed in AWS (not AWS Elastic Cache) and an issue connecting to it.
Here is the setup of the Redis deployed in AWS: (pasting only the Kubernetes StatefulSet and Service)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      initContainers:
      - name: config
        image: redis:7.0.5-alpine
        command: [ "sh", "-c" ]
        args:
          - |
            cp /tmp/redis/redis.conf /etc/redis/redis.conf
            echo "finding master..."
            MASTER_FDQN=`hostname -f | sed -e 's/redis-[0-9]\./redis-0./'`
            if [ "$(redis-cli -h sentinel -p 5000 ping)" != "PONG" ]; then
              echo "master not found, defaulting to redis-0"
              if [ "$(hostname)" = "redis-0" ]; then
                echo "this is redis-0, not updating config..."
              else
                echo "updating redis.conf..."
                echo "slaveof $MASTER_FDQN 6379" >> /etc/redis/redis.conf
              fi
            else
              echo "sentinel found, finding master"
              MASTER="$(redis-cli -h sentinel -p 5000 sentinel get-master-addr-by-name mymaster | grep -E '(^redis-\d{1,})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})')"
              echo "master found : $MASTER, updating redis.conf"
              echo "slaveof $MASTER 6379" >> /etc/redis/redis.conf
            fi
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: config
          mountPath: /tmp/redis/
      containers:
      - name: redis
        image: redis:7.0.5-alpine
        command: ["redis-server"]
        args: ["/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
          name: redis
        volumeMounts:
        - name: data
          mountPath: /data
        - name: redis-config
          mountPath: /etc/redis/
      volumes:
      - name: redis-config
        emptyDir: {}
      - name: config
        configMap:
          name: redis-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: nfs-1
      resources:
        requests:
          storage: 50Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  ports:
  - port: 6379
    targetPort: 6379
    name: redis
  selector:
    app: redis
  type: LoadBalancer
The pods are healthy, I can exec into it and perform operations fine. Here is the get all:
NAME          READY   STATUS    RESTARTS   AGE
pod/redis-0   1/1     Running   0          22h
pod/redis-1   1/1     Running   0          22h
pod/redis-2   1/1     Running   0          22h
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/redis   LoadBalancer   192.168.45.55   10.51.5.2     6379:30315/TCP   26h
NAME                     READY   AGE
statefulset.apps/redis   3/3     22h
Here is the describe of the service:
Name: redis
Namespace: Namespace
Labels: <none>
Annotations: <none>
Selector: app=redis
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 192.168.22.33
IPs: 192.168.22.33
LoadBalancer Ingress: 10.51.5.2
Port: redis 6379/TCP
TargetPort: 6379/TCP
NodePort: redis 30315/TCP
Endpoints: 192.xxx:6379,192.xxx:6379,192.xxx:6379
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IPAllocated 68s metallb-controller Assigned IP ["10.51.5.2"]
Normal nodeAssigned 58s (x5 over 66s) metallb-speaker announcing from node "someaddress.com" with protocol "bgp"
Normal nodeAssigned 58s (x5 over 66s) metallb-speaker announcing from node "someaddress.com" with protocol "bgp"
I then try to connect to it, i.e. inserting some data with a very straightforward Spring Boot application. The application has no business logic, just trying to insert data.
Here are the relevant parts:
@Configuration
public class RedisConfiguration {
    @Bean
    public ReactiveRedisConnectionFactory reactiveRedisConnectionFactory() {
        return new LettuceConnectionFactory("10.51.5.2", 30315);
    }
}
@Repository
public class RedisRepository {
    private final ReactiveRedisOperations<String, String> reactiveRedisOperations;
    public RedisRepository(ReactiveRedisOperations<String, String> reactiveRedisOperations) {
        this.reactiveRedisOperations = reactiveRedisOperations;
    }
    public Mono<RedisPojo> save(RedisPojo redisPojo) {
        return reactiveRedisOperations.opsForValue().set(redisPojo.getInput(), redisPojo.getOutput()).map(__ -> redisPojo);
    }
}
Each time I am trying to write the data, I am getting this exception:
2022-12-02T20:20:08.015+08:00 ERROR 1184 --- [ctor-http-nio-3] a.w.r.e.AbstractErrorWebExceptionHandler : [8f16a752-1] 500 Server Error for HTTP POST "/save"
org.springframework.data.redis.RedisConnectionFailureException: Unable to connect to Redis
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.translateException(LettuceConnectionFactory.java:1602) ~[spring-data-redis-3.0.0.jar:3.0.0]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ Handler com.redis.controller.RedisController#test(RedisRequest) [DispatcherHandler]
*__checkpoint ⇢ HTTP POST "/save" [ExceptionHandlingWebHandler]
Original Stack Trace:
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.translateException(LettuceConnectionFactory.java:1602) ~[spring-data-redis-3.0.0.jar:3.0.0]
Caused by: io.lettuce.core.RedisConnectionException: Unable to connect to 10.51.5.2/<unresolved>:30315
at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:78) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:56) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.AbstractRedisClient.getConnection(AbstractRedisClient.java:350) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.RedisClient.connect(RedisClient.java:216) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /10.51.5.2:30315
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.85.Final.jar:4.1.85.Final]
This is particularly puzzling, because I am quite sure the code of the Spring Boot app is working. When I change the address in return new LettuceConnectionFactory("10.51.5.2", 30315); to any of the following, everything works fine:
a regular Redis on my laptop ("localhost", 6379),
a dockerized Redis on my laptop,
a dockerized Redis on prem.
Therefore, I am quite puzzled about what I did wrong with the setup of this Redis in AWS.
What should I do in order to connect to it properly?
May I get some help please?
Thank you
By default, Redis binds itself to the IP addresses 127.0.0.1 and ::1 and does not accept connections on non-local interfaces. Chances are high that this is your main issue, and you may want to review your redis.conf file to bind Redis to the interface you need or to the generic * -::*, as explained in the comments of the config file itself (which I have linked above).
With that being said, Redis also does not accept connections on non-local interfaces if the default user has no password, a security layer named protected mode. Thus you should either give your default user a password or disable protected mode in your redis.conf file.
Not sure if this applies to your case but, as a side note, I would suggest never exposing Redis to the Internet.
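Once a TCP connection to Redis can be established at all, the first reply usually tells which of the cases above applies. Here is a minimal Python sketch, assuming the usual Redis reply prefixes (+PONG for a successful PING, -DENIED when protected mode rejects the client, -NOAUTH when a password is required). The helper names classify_reply and ping are hypothetical, and the raw-socket approach is only a stand-in for redis-cli.

```python
import socket


def classify_reply(reply: bytes) -> str:
    """Map the first line of a Redis reply to a short diagnosis."""
    first_line = reply.split(b"\r\n", 1)[0]
    if first_line == b"+PONG":
        return "ok"
    if first_line.startswith(b"-DENIED"):
        return "protected-mode"      # protected mode is rejecting non-local clients
    if first_line.startswith(b"-NOAUTH"):
        return "auth-required"       # a password is set and must be sent first
    return "unexpected: " + first_line.decode("ascii", "replace")


def ping(host: str, port: int = 6379, timeout: float = 3.0) -> str:
    """Send an inline PING over a plain TCP socket and classify the answer."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"PING\r\n")
        return classify_reply(conn.recv(4096))
```

Note that ping raises an OSError if the connection itself cannot be opened, which is a network problem rather than a Redis configuration problem.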
You are mixing two things.
To reach this service from pods in other namespaces you do not need an external load balancer; you can just use the DNS name redis.namespace-name:6379 and it will work. Such a DNS name exists for every Service you create (but it only resolves inside Kubernetes).
Kubernetes will make sure that your traffic is routed to the proper pods (assuming there is more than one).
If you want to expose Redis outside of Kubernetes, you need to make sure there is connectivity from the outside, and then you need a network load balancer that forwards traffic to your Kubernetes Service (in your case a node port, so you need an NLB with the EKS worker nodes on port 30315 as targets).
If your worker nodes have public IPs and their security groups allow connecting to them directly, you could try connecting to a worker node's IP directly just to test things out (without the LB).
And regardless of your setup, you can always create a proxy via kubectl:
kubectl port-forward -n redisNS svc/redis 6379:6379
and connect from the Spring Boot app to localhost:6379.
How do you want to connect from the app to Redis in the final setup?
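To narrow down where the connection breaks, a plain TCP probe from the client machine can separate "timeout" (packets are dropped or there is no route, as in the question's stack trace) from "refused" (the host is reachable but nothing listens on that port). The following is a minimal Python sketch with hypothetical helper names. In the Service from the question, 6379 is the service port and 30315 is the node port; a LoadBalancer address normally listens on the service port, while the node port is opened on the worker nodes, so both combinations are listed as candidates. Which one works depends on the network setup, which the question does not show.

```python
import socket
from typing import Optional


def classify(exc: Optional[BaseException]) -> str:
    """Turn the outcome of a connect attempt into a short label."""
    if exc is None:
        return "open"
    if isinstance(exc, ConnectionRefusedError):
        return "refused"    # host reachable, no listener on that port
    if isinstance(exc, socket.timeout):
        return "timeout"    # no answer at all: routing, firewall, security group
    return f"error: {exc}"


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return classify(None)
    except OSError as exc:
        return classify(exc)


# Candidates taken from the question: the external IP with the service port
# and with the node port. Add a worker node address if one is reachable.
CANDIDATES = [("10.51.5.2", 6379), ("10.51.5.2", 30315)]


def report() -> None:
    """Print one line per candidate endpoint."""
    for host, port in CANDIDATES:
        print(f"{host}:{port} -> {probe(host, port)}")
```

Calling report() from the machine that runs the Spring Boot app prints one result per candidate. If every candidate times out, the kubectl port-forward approach above remains a way to test the application itself.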

Istio: Health check / sidecar fails when I enable the JWT RequestAuthentication

OBSOLETE:
I am keeping this post for reference, but there is a better diagnosis (not solved yet, but with a workaround) in
Istio: RequestAuthentication jwksUri does not resolve internal services names
UPDATE:
In the Istio log we see the following error. uaa is a Kubernetes pod serving OAuth authentication/authorization. Normal services access it by the name uaa. I do not know why istiod cannot find the uaa host name. Do I have to use a specific name? (Remember, standard services find the uaa host perfectly.)
2021-03-03T18:39:36.750311Z error model Failed to fetch public key from "http://uaa:8090/uaa/token_keys": Get "http://uaa:8090/uaa/token_keys": dial tcp: lookup uaa on 10.96.0.10:53: no such host
2021-03-03T18:39:36.750364Z error Failed to fetch jwt public key from "http://uaa:8090/uaa/token_keys": Get "http://uaa:8090/uaa/token_keys": dial tcp: lookup uaa on 10.96.0.10:53: no such host
2021-03-03T18:39:36.753394Z info ads LDS: PUSH for node:product-composite-5cbf8498c7-jd4n5.chp18 resources:29 size:134.3kB
2021-03-03T18:39:36.754623Z info ads RDS: PUSH for node:product-composite-5cbf8498c7-jd4n5.chp18 resources:14 size:14.2kB
2021-03-03T18:39:36.790916Z warn ads ADS:LDS: ACK ERROR sidecar~10.1.1.56~product-composite-5cbf8498c7-jd4n5.chp18~chp18.svc.cluster.local-10 Internal:Error adding/updating listener(s) virtualInbound: Provider 'origins-0' in jwt_authn config has invalid local jwks: Jwks RSA [n] or [e] field is missing or has a parse error
2021-03-03T18:39:55.618106Z info ads ADS: "10.1.1.55:41162" sidecar~10.1.1.55~review-65b6886c89-bcv5f.chp18~chp18.svc.cluster.local-6 terminated rpc error: code = Canceled desc = context canceled
Original question
I have a service that works fine after injecting the Istio sidecar into a standard Kubernetes pod.
I'm trying to add JWT authentication, and for this I'm following the official guide Authorization with JWT.
My problem is:
If I create the JWT resources (RequestAuthentication and AuthorizationPolicy) AFTER injecting the Istio dependencies, everything seems to work fine.
But if I create the JWT resources (RequestAuthentication and AuthorizationPolicy) and then inject Istio, the pod doesn't start. After checking the logs, it seems that the sidecar is not able to work (maybe while checking the health?).
My code:
1- JWT Resources
apiVersion: "security.istio.io/v1beta1"
kind: "RequestAuthentication"
metadata:
  name: "ra-product-composite"
spec:
  selector:
    matchLabels:
      app: "product-composite"
  jwtRules:
  - issuer: "http://uaa:8090/uaa/oauth/token"
    jwksUri: "http://uaa:8090/uaa/token_keys"
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: "ap-product-composite"
spec:
  selector:
    matchLabels:
      app: "product-composite"
  action: ALLOW
# rules:
# - from:
#   - source:
#       requestPrincipals: ["http://uaa:8090/uaa/oauth/token/faf5e647-74ab-42cc-acdb-13cc9c573d5d"]
# b99ccf71-50ed-4714-a7fc-e85ebae4a8bb
2- I use destination rules as follows
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: dr-product-composite
spec:
  host: product-composite
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
3- My service deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-composite
spec:
  replicas: 1
  selector:
    matchLabels:
      app: product-composite
  template:
    metadata:
      labels:
        app: product-composite
        version: latest
    spec:
      containers:
      - name: comp
        image: bthinking/product-composite-service
        imagePullPolicy: Never
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: "docker"
        - name: SPRING_CONFIG_LOCATION
          value: file:/config-repo/application.yml,file:/config-repo/product-composite.yml
        envFrom:
        - secretRef:
            name: rabbitmq-client-secrets
        ports:
        - containerPort: 80
        resources:
          limits:
            memory: 350Mi
        livenessProbe:
          httpGet:
            scheme: HTTP
            path: /actuator/info
            port: 4004
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 2
          failureThreshold: 20
          successThreshold: 1
        readinessProbe:
          httpGet:
            scheme: HTTP
            path: /actuator/health
            port: 4004
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 2
          failureThreshold: 3
          successThreshold: 1
        volumeMounts:
        - name: config-repo-volume
          mountPath: /config-repo
      volumes:
      - name: config-repo-volume
        configMap:
          name: config-repo-product-composite
---
apiVersion: v1
kind: Service
metadata:
  name: product-composite
spec:
  selector:
    app: "product-composite"
  ports:
    - port: 80
      name: http
      targetPort: 80
    - port: 4004
      name: http-mgm
      targetPort: 4004
4- Error log in the pod (combined service and sidecar)
2021-03-02 19:34:41.315 DEBUG 1 --- [undedElastic-12] o.s.s.w.s.a.AuthorizationWebFilter : Authorization successful
2021-03-02 19:34:41.315 DEBUG 1 --- [undedElastic-12] .b.a.e.w.r.WebFluxEndpointHandlerMapping : [0e009bf1-133] Mapped to org.springframework.boot.actuate.endpoint.web.reactive.AbstractWebFluxEndpointHandlerMapping$ReadOperationHandler@e13aa23
2021-03-02 19:34:41.316 DEBUG 1 --- [undedElastic-12] ebSessionServerSecurityContextRepository : No SecurityContext found in WebSession: 'org.springframework.web.server.session.InMemoryWebSessionStore$InMemoryWebSession@48e89a58'
2021-03-02 19:34:41.319 DEBUG 1 --- [undedElastic-15] .s.w.r.r.m.a.ResponseEntityResultHandler : [0e009bf1-133] Using 'application/vnd.spring-boot.actuator.v3+json' given [*/*] and supported [application/vnd.spring-boot.actuator.v3+json, application/vnd.spring-boot.actuator.v2+json, application/json]
2021-03-02 19:34:41.320 DEBUG 1 --- [undedElastic-15] .s.w.r.r.m.a.ResponseEntityResultHandler : [0e009bf1-133] 0..1 [java.util.Collections$UnmodifiableMap<?, ?>]
2021-03-02 19:34:41.321 DEBUG 1 --- [undedElastic-15] o.s.http.codec.json.Jackson2JsonEncoder : [0e009bf1-133] Encoding [{}]
2021-03-02 19:34:41.326 DEBUG 1 --- [or-http-epoll-3] r.n.http.server.HttpServerOperations : [id: 0x0e009bf1, L:/127.0.0.1:4004 - R:/127.0.0.1:57138] Detected non persistent http connection, preparing to close
2021-03-02 19:34:41.327 DEBUG 1 --- [or-http-epoll-3] o.s.w.s.adapter.HttpWebHandlerAdapter : [0e009bf1-133] Completed 200 OK
2021-03-02 19:34:41.327 DEBUG 1 --- [or-http-epoll-3] r.n.http.server.HttpServerOperations : [id: 0x0e009bf1, L:/127.0.0.1:4004 - R:/127.0.0.1:57138] Last HTTP response frame
2021-03-02 19:34:41.328 DEBUG 1 --- [or-http-epoll-3] r.n.http.server.HttpServerOperations : [id: 0x0e009bf1, L:/127.0.0.1:4004 - R:/127.0.0.1:57138] Last HTTP packet was sent, terminating the channel
2021-03-02T19:34:41.871551Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
5- Istio injection
kubectl get deployment product-composite -o yaml | istioctl kube-inject -f - | kubectl apply -f -
NOTICE: I have checked a lot of posts on SO, and it seems that health checking creates a lot of problems with sidecars and other configurations. I have followed the guide Health Checking of Istio Services with no success. Specifically, I tried disabling the probe rewrite with sidecar.istio.io/rewriteAppHTTPProbers: "false", but the result is worse (in that case, neither the sidecar nor the service starts).
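Regarding the "no such host" error in the UPDATE section: the log shows istiod itself trying to fetch the keys from http://uaa:8090, and istiod usually runs in its own namespace, where a bare service name such as uaa is not covered by the DNS search path. One likely reading is that the jwksUri needs the fully qualified service name. The Python sketch below shows the rewrite under two assumptions that the post does not confirm: that the uaa Service lives in the chp18 namespace (taken from the sidecar node IDs in the log) and that the cluster domain is cluster.local. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def qualify_service_url(url: str, namespace: str,
                        cluster_domain: str = "cluster.local") -> str:
    """Rewrite http://<svc>:<port>/... to use <svc>.<namespace>.svc.<cluster_domain>."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Leave the URL alone if there is no host or it already contains a dot.
    if not host or "." in host:
        return url
    netloc = f"{host}.{namespace}.svc.{cluster_domain}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# jwksUri from the RequestAuthentication above; "chp18" is an assumed namespace.
print(qualify_service_url("http://uaa:8090/uaa/token_keys", "chp18"))
```

If the assumption about the namespace holds, the printed value is what would go into jwksUri. The issuer field is compared against the token's iss claim rather than resolved through DNS, so it would stay as it is.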

Unable to create/read document to HDFS deployed with AWS EBS in EKS cluster

I have an EKS cluster with an EBS storage class/volume.
I am able to deploy the HDFS namenode and datanode images (bde2020/hadoop-xxx) using StatefulSets successfully.
When I try to put a file into HDFS from my machine using hdfs://:, the command reports success, but nothing gets written to the datanodes.
In the namenode log, I see the error below.
Could it have something to do with the EBS volume? I cannot even upload/download files from the namenode GUI. Could it be because the datanode host name hdfs-data-X.hdfs-data.pulse.svc.cluster.local is not resolvable from my local machine?
Please help.
2020-05-12 17:38:51,360 INFO hdfs.StateChange: BLOCK* allocate blk_1073741825_1001, replicas=10.8.29.112:9866, 10.8.29.176:9866, 10.8.29.188:9866 for /vault/a.json
2020-05-12 17:39:13,036 WARN blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology
2020-05-12 17:39:13,036 WARN protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=3, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
2020-05-12 17:39:13,036 WARN blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable: unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
2020-05-12 17:39:13,036 INFO hdfs.StateChange: BLOCK* allocate blk_1073741826_1002, replicas=10.8.29.176:9866, 10.8.29.188:9866 for /vault/a.json
2020-05-12 17:39:34,607 INFO namenode.FSEditLog: Number of transactions: 11 Total time for transactions(ms): 23 Number of transactions batched in Syncs: 3 Number of syncs: 8 SyncTimes(ms): 23
2020-05-12 17:39:35,146 WARN blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 2 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology
2020-05-12 17:39:35,146 WARN protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 2 but only 0 storage types can be selected (replication=3, selected=[], unavailable=[DISK], removed=[DISK, DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
2020-05-12 17:39:35,146 WARN blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 2 to reach 3 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable: unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
2020-05-12 17:39:35,147 INFO hdfs.StateChange: BLOCK* allocate blk_1073741827_1003, replicas=10.8.29.188:9866 for /vault/a.json
2020-05-12 17:39:57,319 WARN blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 3 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology
2020-05-12 17:39:57,319 WARN protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 3 but only 0 storage types can be selected (replication=3, selected=[], unavailable=[DISK], removed=[DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
2020-05-12 17:39:57,319 WARN blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 3 to reach 3 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable: unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
2020-05-12 17:39:57,320 INFO ipc.Server: IPC Server handler 5 on default port 8020, call Call#12 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.254.40.95:59328
java.io.IOException: File /vault/a.json could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2219)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2789)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:892)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
My namenode web page shows below:
Node | Http Address | Last contact | Last Block Report | Capacity | Blocks | Block pool used | Version
hdfs-data-0.hdfs-data.pulse.svc.cluster.local:9866 | http://hdfs-data-0.hdfs-data.pulse.svc.cluster.local:9864 | 1s | 0m | 975.9 MB | 0 | 24 KB (0%) | 3.2.1
hdfs-data-1.hdfs-data.pulse.svc.cluster.local:9866 | http://hdfs-data-1.hdfs-data.pulse.svc.cluster.local:9864 | 2s | 0m | 975.9 MB | 0 | 24 KB (0%) | 3.2.1
hdfs-data-2.hdfs-data.pulse.svc.cluster.local:9866 | http://hdfs-data-2.hdfs-data.pulse.svc.cluster.local:9864 | 1s | 0m | 975.9 MB | 0 | 24 KB (0%) | 3.2.1
My deployment:
NameNode:
#clusterIP service of namenode
apiVersion: v1
kind: Service
metadata:
  name: hdfs-name
  namespace: pulse
  labels:
    component: hdfs-name
spec:
  ports:
    - port: 8020
      protocol: TCP
      name: nn-rpc
    - port: 9870
      protocol: TCP
      name: nn-web
  selector:
    component: hdfs-name
  type: ClusterIP
---
#namenode stateful deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hdfs-name
  namespace: pulse
  labels:
    component: hdfs-name
spec:
  serviceName: hdfs-name
  replicas: 1
  selector:
    matchLabels:
      component: hdfs-name
  template:
    metadata:
      labels:
        component: hdfs-name
    spec:
      initContainers:
        - name: delete-lost-found
          image: busybox
          command: ["sh", "-c", "rm -rf /hadoop/dfs/name/lost+found"]
          volumeMounts:
            - name: hdfs-name-pv-claim
              mountPath: /hadoop/dfs/name
      containers:
        - name: hdfs-name
          image: bde2020/hadoop-namenode
          env:
            - name: CLUSTER_NAME
              value: hdfs-k8s
            - name: HDFS_CONF_dfs_permissions_enabled
              value: "false"
          ports:
            - containerPort: 8020
              name: nn-rpc
            - containerPort: 9870
              name: nn-web
          volumeMounts:
            - name: hdfs-name-pv-claim
              mountPath: /hadoop/dfs/name
              #subPath: data #subPath required as on root level, lost+found folder is created which does not cause to run namenode --format
  volumeClaimTemplates:
    - metadata:
        name: hdfs-name-pv-claim
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: ebs
        resources:
          requests:
            storage: 1Gi
Datanode:
#headless service of datanode
apiVersion: v1
kind: Service
metadata:
  name: hdfs-data
  namespace: pulse
  labels:
    component: hdfs-data
spec:
  ports:
    - port: 80
      protocol: TCP
  selector:
    component: hdfs-data
  clusterIP: None
  type: ClusterIP
---
#datanode stateful deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hdfs-data
  namespace: pulse
  labels:
    component: hdfs-data
spec:
  serviceName: hdfs-data
  replicas: 3
  selector:
    matchLabels:
      component: hdfs-data
  template:
    metadata:
      labels:
        component: hdfs-data
    spec:
      containers:
        - name: hdfs-data
          image: bde2020/hadoop-datanode
          env:
            - name: CORE_CONF_fs_defaultFS
              value: hdfs://hdfs-name:8020
          volumeMounts:
            - name: hdfs-data-pv-claim
              mountPath: /hadoop/dfs/data
  volumeClaimTemplates:
    - metadata:
        name: hdfs-data-pv-claim
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: ebs
        resources:
          requests:
            storage: 1Gi
The issue seems to be that the datanodes are not reachable over their RPC port from my client machine.
The datanodes' HTTP port was reachable from my client machine, so I tried webhdfs:// (instead of hdfs://) after mapping each datanode pod name to its IP in the hosts file, and that worked.
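To confirm which datanode ports a client machine can actually reach, a small TCP probe can help. The following is a minimal sketch, not part of the original answer: the function names are hypothetical, and the ports 9866 (data transfer) and 9864 (HTTP) are simply the ones shown on the namenode web page above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_ports(hosts, ports, timeout=3.0):
    """Probe every host/port pair and return {(host, port): reachable}."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Calling check_ports with the three hdfs-data-N.hdfs-data.pulse.svc.cluster.local names and the ports (9866, 9864) from the client machine would show whether only the HTTP port is reachable, which is the situation described above.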

Prometheus Alertmanager doesn't send alerts on k8s

I'm using Prometheus Operator 0.34.0 and Alertmanager 0.20, and alerting doesn't work: I can see that the alert fires (in the Prometheus UI, on the Alerts tab), but no alert email arrives. Looking at the logs I see the following. Any idea? Please see the warn-level lines; they may be the reason, but I'm not sure how to fix them.
This is the Prometheus Operator Helm chart I use:
https://github.com/helm/charts/tree/master/stable/prometheus-operator
level=info ts=2019-12-23T15:42:28.039Z caller=main.go:231 msg="Starting Alertmanager" version="(version=0.20.0, branch=HEAD, revision=f74be0400a6243d10bb53812d6fa408ad71ff32d)"
level=info ts=2019-12-23T15:42:28.039Z caller=main.go:232 build_context="(go=go1.13.5, user=root#00c3106655f8, date=20191211-14:13:14)"
level=warn ts=2019-12-23T15:42:28.109Z caller=cluster.go:228 component=cluster msg="failed to join cluster" err="1 error occurred:\n\t* Failed to resolve alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc:9094: lookup alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc on 100.64.0.10:53: no such host\n\n"
level=info ts=2019-12-23T15:42:28.109Z caller=cluster.go:230 component=cluster msg="will retry joining cluster every 10s"
level=warn ts=2019-12-23T15:42:28.109Z caller=main.go:322 msg="unable to join gossip mesh" err="1 error occurred:\n\t* Failed to resolve alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc:9094: lookup alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc on 100.64.0.10:53: no such host\n\n"
level=info ts=2019-12-23T15:42:28.109Z caller=cluster.go:623 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2019-12-23T15:42:28.131Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2019-12-23T15:42:28.132Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2019-12-23T15:42:28.134Z caller=main.go:416 component=configuration msg="skipping creation of receiver not referenced by any route" receiver=AlertMail
level=info ts=2019-12-23T15:42:28.134Z caller=main.go:416 component=configuration msg="skipping creation of receiver not referenced by any route" receiver=AlertMail2
level=info ts=2019-12-23T15:42:28.135Z caller=main.go:497 msg=Listening address=:9093
level=info ts=2019-12-23T15:42:30.110Z caller=cluster.go:648 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.00011151s
level=info ts=2019-12-23T15:42:38.110Z caller=cluster.go:640 component=cluster msg="gossip settled; proceeding" elapsed=10.000659096s
This is my config YAML:
global:
imagePullSecrets: []
prometheus-operator:
defaultRules:
grafana:
enabled: true
prometheusOperator:
tolerations:
- key: "WorkGroup"
operator: "Equal"
value: "operator"
effect: "NoSchedule"
- key: "WorkGroup"
operator: "Equal"
value: "operator"
effect: "NoExecute"
tlsProxy:
image:
repository: squareup/ghostunnel
tag: v1.4.1
pullPolicy: IfNotPresent
resources:
limits:
cpu: 8000m
memory: 2000Mi
requests:
cpu: 2000m
memory: 2000Mi
admissionWebhooks:
patch:
priorityClassName: "operator-critical"
image:
repository: jettech/kube-webhook-certgen
tag: v1.0.0
pullPolicy: IfNotPresent
serviceAccount:
name: prometheus-operator
image:
repository: quay.io/coreos/prometheus-operator
tag: v0.34.0
pullPolicy: IfNotPresent
prometheus:
prometheusSpec:
replicas: 1
serviceMonitorSelector:
role: observeable
tolerations:
- key: "WorkGroup"
operator: "Equal"
value: "operator"
effect: "NoSchedule"
- key: "WorkGroup"
operator: "Equal"
value: "operator"
effect: "NoExecute"
ruleSelector:
matchLabels:
role: alert-rules
prometheus: prometheus
image:
repository: quay.io/prometheus/prometheus
tag: v2.13.1
alertmanager:
alertmanagerSpec:
image:
repository: quay.io/prometheus/alertmanager
tag: v0.20.0
resources:
limits:
cpu: 500m
memory: 1000Mi
requests:
cpu: 500m
memory: 1000Mi
serviceAccount:
name: prometheus
config:
global:
resolve_timeout: 1m
smtp_smarthost: 'smtp.gmail.com:587'
smtp_from: 'alertmanager#vsx.com'
smtp_auth_username: 'ds.monitoring.grafana#gmail.com'
smtp_auth_password: 'mypass'
smtp_require_tls: false
route:
group_by: ['alertname', 'cluster']
group_wait: 45s
group_interval: 5m
repeat_interval: 1h
receiver: default-receiver
routes:
- receiver: str
match_re:
cluster: "canary|canary2"
receivers:
- name: default-receiver
- name: str
email_configs:
- to: 'rayndoll007#gmail.com'
from: alertmanager#vsx.com
smarthost: smtp.gmail.com:587
auth_identity: ds.monitoring.grafana#gmail.com
auth_username: ds.monitoring.grafana#gmail.com
auth_password: mypass
- name: 'AlertMail'
email_configs:
- to: 'rayndoll007#gmail.com'
https://codebeautify.org/yaml-validator/cb6a2781
The error says that name resolution failed. The pod alertmanager-monitoring-prometheus-oper-alertmanager-0 is up and running, yet Alertmanager tries to look up alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc, and I'm not sure why.
Here is the output of kubectl get svc -n mon
Update:
These are the warn-level logs:
level=warn ts=2019-12-24T12:10:21.293Z caller=cluster.go:438 component=cluster msg=refresh result=failure addr=alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc:9094
level=warn ts=2019-12-24T12:10:21.323Z caller=cluster.go:438 component=cluster msg=refresh result=failure addr=alertmanager-monitoring-prometheus-oper-alertmanager-1.alertmanager-operated.monitoring.svc:9094
level=warn ts=2019-12-24T12:10:21.326Z caller=cluster.go:438 component=cluster msg=refresh result=failure addr=alertmanager-monitoring-prometheus-oper-alertmanager-2.alertmanager-operated.monitoring.svc:9094
This is the output of kubectl get svc -n mon:
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 6m4s
monitoring-grafana ClusterIP 100.11.215.226 <none> 80/TCP 6m13s
monitoring-kube-state-metrics ClusterIP 100.22.248.232 <none> 8080/TCP 6m13s
monitoring-prometheus-node-exporter ClusterIP 100.33.130.77 <none> 9100/TCP 6m13s
monitoring-prometheus-oper-alertmanager ClusterIP 100.33.228.217 <none> 9093/TCP 6m13s
monitoring-prometheus-oper-operator ClusterIP 100.21.229.204 <none> 8080/TCP,443/TCP 6m13s
monitoring-prometheus-oper-prometheus ClusterIP 100.22.93.151 <none> 9090/TCP 6m13s
prometheus-operated ClusterIP None <none> 9090/TCP 5m54s
Debugging steps that help in this kind of scenario:
Enable Alertmanager debug logs: add argument --log.level=debug
Verify Alertmanager cluster is formed properly (Check /status endpoint and verify all peers are listed)
Verify that Prometheus is sending alerts to all Alertmanager peers (Check /status endpoint and verify all Alertmanager peers are listed)
End to End testing: Generate a test alert, alert should be seen in Prometheus UI, then alert should be seen in Alertmanager UI, finally alert notification should be seen.
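The end-to-end test in the last step can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the base URL (for example http://localhost:9093 behind a port-forward) is an assumption about how Alertmanager is reached. The payload targets Alertmanager's v2 API endpoint /api/v2/alerts. Note that in the configuration shown in the question, default-receiver has no email_configs, so only alerts whose cluster label matches canary|canary2 are routed to a receiver that sends mail; the test alert therefore needs a matching cluster label.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(labels, summary, duration_minutes=5, now=None):
    """Build the JSON body for a POST to Alertmanager's /api/v2/alerts."""
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(minutes=duration_minutes)
    return [
        {
            "labels": dict(labels),
            "annotations": {"summary": summary},
            # RFC 3339 timestamps, as produced by isoformat() on aware datetimes.
            "startsAt": start.isoformat(),
            "endsAt": end.isoformat(),
        }
    ]


def send_test_alert(base_url, payload, timeout=10):
    """POST the payload to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, build_test_alert({"alertname": "TestAlert", "cluster": "canary"}, "manual test") produces an alert that the route in the question would send to the str receiver. If the alert shows up in the Alertmanager UI but no mail arrives, the problem is in the receiver or SMTP settings rather than in Prometheus.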

Chain KUBE-SERVICES - Rejects Service has no endpoints

I'm trying to curl a service deployed in a k8s cluster from the master node:
curl: (7) Failed to connect to localhost port 31796: Connection refused
When I check iptables on the master of the Kubernetes cluster, I get the following:
Chain KUBE-SERVICES (1 references)
target prot opt source destination
REJECT tcp -- anywhere 10.100.94.202 /* default/some-service: has no endpoints */ tcp dpt:9015 reject-with icmp-port-unreachable
REJECT tcp -- anywhere 10.103.64.79 /* default/some-service: has no endpoints */ tcp dpt:9000 reject-with icmp-port-unreachable
REJECT tcp -- anywhere 10.107.111.252 /* default/some-service: has no endpoints */ tcp dpt:9015 reject-with icmp-port-unreachable
if I flush my iptables with
iptables -F
and then curl
curl -v localhost:31796
I get the following
* Rebuilt URL to: localhost:31796/
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 31796 (#0)
> GET / HTTP/1.1
> Host: localhost:31796
> User-Agent: curl/7.58.0
> Accept: */*
but soon after it results in
* Rebuilt URL to: localhost:31796/
* Trying 127.0.0.1...
* TCP_NODELAY set
* connect to 127.0.0.1 port 31796 failed: Connection refused
* Failed to connect to localhost port 31796: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 31796: Connection refused
I'm using the nodePort concept in my service
Details
kubectl get node
NAME STATUS ROLES AGE VERSION
ip-Master-IP Ready master 26h v1.12.7
ip-Node1-ip Ready <none> 26h v1.12.7
ip-Node2-ip Ready <none> 23h v1.12.7
kubectl get pods
NAME READY STATUS RESTARTS AGE
config-service-7dc8fc4ff-5kk88 1/1 Running 0 5h49m
kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
cadmin-server NodePort 10.109.55.255 <none> 9015:31796/TCP 22h app=config-service
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 26h <none>
kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
endpoint.yml
apiVersion: v1
kind: Endpoints
metadata:
  name: xyz
subsets:
  - addresses:
      - ip: node1_ip
      - ip: node2_ip
    ports:
      - port: 31796
      - name: xyz
service.yml
apiVersion: v1
kind: Service
metadata:
  name: xyz
  namespace: default
  annotations:
    alb.ingress.kubernetes.io/healthcheck-path: /xyz
  labels:
    app: xyz
spec:
  type: NodePort
  ports:
    - nodePort: 31796
      port: 8001
      targetPort: 8001
      protocol: TCP
  selector:
    app: xyz
deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: xyz
  name: xyz
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: xyz
  template:
    metadata:
      labels:
        app: xyz
    spec:
      containers:
        - name: xyz
          image: abc
          ports:
            - containerPort: 8001
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 200m
          volumeMounts:
            - mountPath: /app/
              name: config-volume
      restartPolicy: Always
      imagePullSecrets:
        - name: awslogin
      volumes:
        - configMap:
            name: xyz
          name: config-volume
You can run the following command to check the endpoints:
kubectl get endpoints
If no endpoint shows up for the service, check the YAML files that you used to create the load balancer and the deployment, and make sure the service's selector matches the pod labels.
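The label check itself is simple enough to sketch. The snippet below is only an illustration of the equality-based matching a Service selector performs; the function name is hypothetical, and the label values are taken from the (anonymized) output and manifests in the question, so they may not reflect the real cluster.

```python
def selector_matches(selector, pod_labels):
    """Return True if every key/value pair of the selector is present in pod_labels.

    A Service without a selector gets no automatically managed endpoints,
    so an empty selector is treated as "no match" here.
    """
    if not selector:
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Selector from "kubectl get svc -o wide" versus pod labels from deployment.yml.
service_selector = {"app": "config-service"}
pod_labels = {"app": "xyz"}
print(selector_matches(service_selector, pod_labels))
```

When the selector and the pod labels disagree like this, the Service has no endpoints. Comparing the SELECTOR column of kubectl get svc -o wide with the output of kubectl get pods --show-labels performs the same check against the live cluster.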
As many have pointed out in their comments, the "no endpoints" firewall rule is inserted by kube-proxy and indicates a broken service or application definition or setup.
# iptables-save
# Generated by iptables-save v1.4.21 on Wed Feb 24 10:10:23 2021
*filter
# [...]
-A KUBE-EXTERNAL-SERVICES -p tcp -m comment --comment "default/web-service:http has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 30081 -j REJECT --reject-with icmp-port-unreachable
# [...]
As you have noticed as well, kube-proxy constantly monitors the firewall rules and inserts or deletes rules dynamically according to the Kubernetes Pod and Service definitions.
# kubectl get service --namespace=default
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 198d
web-service NodePort 10.111.188.199 <none> 8201:30081/TCP 194d
# kubectl get pods --namespace=default
No resources found in default namespace.
In this example case a Service is defined but the Pod associated with the Service does not exist.
Still the kube-proxy process listens on the port 30081:
# netstat -lpn | grep -i kube
[...]
tcp 0 0 0.0.0.0:30081 0.0.0.0:* LISTEN 21542/kube-proxy
[...]
So kube-proxy inserts a firewall rule to reject the traffic for the broken service.
kube-proxy also deletes this rule as soon as you delete the Service definition:
# kubectl delete service web-service --namespace=default
service "web-service" deleted
# iptables-save | grep -i "no endpoints" | wc -l
0
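The grep above only counts the rules. To see which services are affected, the iptables-save output can be parsed. This is a minimal sketch with hypothetical names; it assumes the rule comments have the form shown in the iptables-save excerpt above ("<namespace>/<service>:<port name> has no endpoints").

```python
import re

# Matches the comment inserted for services without endpoints.
NO_ENDPOINTS = re.compile(r'--comment "(?P<service>\S+) has no endpoints"')
DPORT = re.compile(r"--dport (?P<port>\d+)")


def services_without_endpoints(iptables_save_output):
    """Return a list of (service, destination port) for every "no endpoints" rule."""
    found = []
    for line in iptables_save_output.splitlines():
        match = NO_ENDPOINTS.search(line)
        if not match:
            continue
        port = DPORT.search(line)
        found.append((match.group("service"), int(port.group("port")) if port else None))
    return found
```

Feeding it the saved output of iptables-save on a node lists each affected service together with the port being rejected, which can then be compared with kubectl get endpoints.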
As a side note:
In my experience this rule is also inserted for Kubernetes definitions that kube-proxy doesn't like.
As an example, my service could have the name "log-service" but not the name "web-log".
In the latter case no warning was given, but this blocking rule was inserted.