Jaeger is not showing any trace results - kubectl

I am able to run Kiali fine, but Jaeger is not showing any results. I'm using VirtualBox for this exercise, and to view the dashboards in my local browser I'm using port forwarding.
I think this is a communication issue between the pods.
Below is what I'm using.
Virtualbox
Minimal install of CentOS_8.4.2105
istio-1.11.4
Docker version 20.10.9, build c2ea9bc
minikube version: v1.23.2
[centos@centos8 bin]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:38:50Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:32:41Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Here is my Kiali and below is my Jaeger (screenshots omitted).
[centos@centos8 warmup-exercise]$ kubectl get pods -n istio-system
NAME READY STATUS RESTARTS AGE
grafana-7bdcf77687-w5hvt 1/1 Running 0 3h58m
istio-egressgateway-5547fcc8fc-qsd2l 1/1 Running 0 3h58m
istio-ingressgateway-8f568d595-j6wzd 1/1 Running 0 3h58m
istiod-6659979bdf-9chbn 1/1 Running 0 3h58m
jaeger-5c7c5c8d87-p5678 1/1 Running 0 3h58m
kiali-7fd9f6f484-vlxms 1/1 Running 0 3h58m
prometheus-f5f544b59-br5n4 2/2 Running 0 3h58m
[centos@centos8 warmup-exercise]$ kubectl --namespace istio-system describe pod/jaeger-5c7c5c8d87-p5678
Name: jaeger-5c7c5c8d87-p5678
Namespace: istio-system
Priority: 0
Node: minikube/192.168.49.2
Start Time: Thu, 21 Oct 2021 12:42:47 -0400
Labels: app=jaeger
pod-template-hash=5c7c5c8d87
Annotations: prometheus.io/port: 14269
prometheus.io/scrape: true
sidecar.istio.io/inject: false
Status: Running
IP: 172.17.0.6
IPs:
IP: 172.17.0.6
Controlled By: ReplicaSet/jaeger-5c7c5c8d87
Containers:
jaeger:
Container ID: docker://3e155e7909f9f9976184b0b8f72880307d6bb7f8810d98c25d2dd8f18df342bb
Image: docker.io/jaegertracing/all-in-one:1.20
Image ID: docker-pullable://jaegertracing/all-in-one@sha256:54c2ea315dab7215c51c1b06b111c666f594e90317584f84eabbc59aa5856b13
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 21 Oct 2021 12:49:26 -0400
Ready: True
Restart Count: 0
Requests:
cpu: 10m
Liveness: http-get http://:14269/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:14269/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
BADGER_EPHEMERAL: false
SPAN_STORAGE_TYPE: badger
BADGER_DIRECTORY_VALUE: /badger/data
BADGER_DIRECTORY_KEY: /badger/key
COLLECTOR_ZIPKIN_HTTP_PORT: 9411
MEMORY_MAX_TRACES: 50000
QUERY_BASE_PATH: /jaeger
Mounts:
/badger from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tj4pj (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-tj4pj:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 51m (x3 over 102m) kubelet Readiness probe failed: Get "http://172.17.0.6:14269/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 51m (x5 over 160m) kubelet Liveness probe failed: Get "http://172.17.0.6:14269/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning NodeNotReady 51m node-controller Node is not ready
[centos@centos8 warmup-exercise]$ kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
api-gateway-5cd5c547c6-lrt6k 2/2 Running 0 3h30m 172.17.0.13 minikube <none> <none>
photo-service-7c79458679-trblk 2/2 Running 0 3h30m 172.17.0.11 minikube <none> <none>
position-simulator-6c7b7949f8-k2z7t 2/2 Running 0 3h30m 172.17.0.14 minikube <none> <none>
position-tracker-cbbc8b7f6-dl4gz 2/2 Running 0 3h30m 172.17.0.12 minikube <none> <none>
staff-service-6597879677-7zh2c 2/2 Running 0 3h30m 172.17.0.15 minikube <none> <none>
vehicle-telemetry-c8fcb46c6-n9764 2/2 Running 0 3h30m 172.17.0.10 minikube <none> <none>
webapp-85fd946885-zdjck 2/2 Running 0 3h30m 172.17.0.16 minikube <none> <none>
I'm still learning DevOps, so let me know if I missed something.

The default sampling rate for Jaeger traces in Istio is 1%: https://istio.io/latest/docs/tasks/observability/distributed-tracing/jaeger/
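At a 1% sampling rate, a handful of requests will rarely produce a visible trace, so either generate a burst of traffic or raise the sampling rate. A minimal sketch, assuming the Istio 1.11 install option named in the linked docs and a placeholder URL for your own app:
$ for i in $(seq 1 200); do curl -s -o /dev/null http://<ingress-ip>/; done    # send enough requests that some get sampled
$ istioctl install --set values.pilot.traceSampling=100                        # or raise the default sampling rate (verify this option for your install method)
$ istioctl dashboard jaeger                                                    # opens Jaeger locally without a manual kubectl port-forward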

Related

Issue pulling secret for NGINX Ingress Controller with NLB in EKS

I am trying to provision the NGINX ingress controller with an NLB in EKS and am getting CrashLoopBackOff for ingress-nginx-admission-create and ingress-nginx-admission-patch.
It's worth mentioning that this is a private EKS cluster without internet access and I'm pulling the docker images successfully from ECR. Also, I am using a secondary VPC CIDR to allocate pod IPs.
I am following this documentation:
https://kubernetes.github.io/ingress-nginx/deploy/#aws
https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.1.1/deploy/static/provider/aws/deploy.yaml
The following resources are created:
namespace/ingress-nginx created
serviceaccount/ingress-nginx created
configmap/ingress-nginx-controller created
clusterrole.rbac.authorization.k8s.io/ingress-nginx created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created
role.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
service/ingress-nginx-controller-admission created
service/ingress-nginx-controller created
deployment.apps/ingress-nginx-controller created
ingressclass.networking.k8s.io/nginx created
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created
serviceaccount/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
role.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created
NAMESPACE NAME READY STATUS RESTARTS AGE
ingress-nginx ingress-nginx-admission-create-lws8l 0/1 CrashLoopBackOff 4 4m18s
ingress-nginx ingress-nginx-admission-patch-g5s8w 0/1 CrashLoopBackOff 4 4m18s
ingress-nginx ingress-nginx-controller-79c469cd9f-wqmhn 0/1 ContainerCreating 0 4m18s
kube-system aws-node-4g2h2 1/1 Running 0 29h
kube-system aws-node-r65xb 1/1 Running 0 29h
kube-system aws-node-spfzj 1/1 Running 0 29h
kube-system coredns-65ccb76b7c-97xcv 1/1 Running 0 31h
kube-system coredns-65ccb76b7c-bck8f 1/1 Running 0 31h
kube-system kube-proxy-jnfjd 1/1 Running 0 29h
kube-system kube-proxy-smc88 1/1 Running 0 29h
kube-system kube-proxy-v6hjp 1/1 Running 0 29h
One of the issues seems to be getting the secret and I don't understand why.
{"err":"Get \"https://172.20.0.1:443/api/v1/namespaces/ingress-nginx/secrets/ingress-nginx-admission\": dial tcp 172.20.0.1:443: i/o timeout","level":"fatal","msg":"error getting secret","source":"k8s/k8s.go:232","time":"2022-03-11T23:56:16Z"}
The other thing is that the NLB is not being created.
sh-4.2$ kubectl get svc -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller LoadBalancer 172.20.76.158 <pending> 80:31054/TCP,443:32308/TCP 7m43s
ingress-nginx-controller-admission ClusterIP 172.20.240.214 <none> 443/TCP 7m43s
sh-4.2$ kubectl get -A ValidatingWebhookConfiguration
NAME WEBHOOKS AGE
ingress-nginx-admission 1 12m
vpc-resource-validating-webhook 1 31h
Here is the ingress-nginx-admission-create-xw4rz pod.
Name: ingress-nginx-admission-create-xw4rz
Namespace: ingress-nginx
Priority: 0
Node: ip-10-51-80-103.eu-west-2.compute.internal/10.51.80.103
Start Time: Sat, 12 Mar 2022 14:38:38 +0000
Labels: app.kubernetes.io/component=admission-webhook
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/version=1.1.1
controller-uid=e6d21979-55bf-4b44-aa41-6462b9923806
helm.sh/chart=ingress-nginx-4.0.15
job-name=ingress-nginx-admission-create
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.0.25.33
IPs:
IP: 10.0.25.33
Controlled By: Job/ingress-nginx-admission-create
Containers:
create:
Container ID: docker://9a311111111111111111111111111111111111111111111111111
Image: 11111111111111.dkr.ecr.eu-west-2.amazonaws.com/certgen:latest
Image ID: docker-pullable://11111111111111.dkr.ecr.eu-west-2.amazonaws.com/certgen@sha256:7831111111111111111111111111111
Port: <none>
Host Port: <none>
Args:
create
--host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.$(POD_NAMESPACE).svc
--namespace=$(POD_NAMESPACE)
--secret-name=ingress-nginx-admission
State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 12 Mar 2022 14:42:05 +0000
Finished: Sat, 12 Mar 2022 14:42:35 +0000
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 12 Mar 2022 14:40:52 +0000
Finished: Sat, 12 Mar 2022 14:41:22 +0000
Ready: False
Restart Count: 4
Environment:
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rrlnd (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-rrlnd:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m59s default-scheduler Successfully assigned ingress-nginx/ingress-nginx-admission-create-xw4rz to ip-10-51-80-103.eu-west-2.compute.internal
Normal Pulled 32s (x5 over 3m58s) kubelet Container image "111111111111111.dkr.ecr.eu-west-2.amazonaws.com/certgen:latest" already present on machine
Normal Created 32s (x5 over 3m58s) kubelet Created container create
Normal Started 32s (x5 over 3m58s) kubelet Started container create
Warning BackOff 2s (x7 over 2m56s) kubelet Back-off restarting failed container
And here is ingress-nginx-controller-79c469cd9f-ft76q. I can see a failed mount, but my understanding is that "ingress-nginx-controller" is only created after the two pods above have run without errors (see the checks after the describe output below).
sh-4.2$ kubectl describe pod ingress-nginx-controller-79c469cd9f-ft76q -n ingress-nginx
Name: ingress-nginx-controller-79c469cd9f-ft76q
Namespace: ingress-nginx
Priority: 0
Node: ip-10-51-80-6.eu-west-2.compute.internal/10.51.80.6
Start Time: Sat, 12 Mar 2022 14:38:38 +0000
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/name=ingress-nginx
pod-template-hash=79c469cd9f
Annotations: kubernetes.io/psp: eks.privileged
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/ingress-nginx-controller-79c469cd9f
Containers:
controller:
Container ID:
Image: 111111111111111.dkr.ecr.eu-west-2.amazonaws.com/nginx-controller:latest
Image ID:
Ports: 80/TCP, 443/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
--election-id=ingress-controller-leader
--controller-class=k8s.io/ingress-nginx
--configmap=$(POD_NAMESPACE)/ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: ingress-nginx-controller-79c469cd9f-ft76q (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j8v4c (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: ingress-nginx-admission
Optional: false
kube-api-access-j8v4c:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m46s default-scheduler Successfully assigned ingress-nginx/ingress-nginx-controller-79c469cd9f-ft76q to ip-10-51-80-6.eu-west-2.compute.internal
Warning FailedMount 6m43s kubelet Unable to attach or mount volumes: unmounted volumes=[webhook-cert], unattached volumes=[kube-api-access-j8v4c webhook-cert]: timed out waiting for the condition
Warning FailedMount 2m14s (x2 over 4m29s) kubelet Unable to attach or mount volumes: unmounted volumes=[webhook-cert], unattached volumes=[webhook-cert kube-api-access-j8v4c]: timed out waiting for the condition
Warning FailedMount 32s (x12 over 8m46s) kubelet MountVolume.SetUp failed for volume "webhook-cert" : secret "ingress-nginx-admission" not found
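Since the controller's webhook-cert volume mounts the secret that the admission-create job is supposed to generate, the dependency chain mentioned above can be confirmed with a few read-only commands (resource names taken from the manifests above):
$ kubectl -n ingress-nginx get jobs
$ kubectl -n ingress-nginx logs job/ingress-nginx-admission-create
$ kubectl -n ingress-nginx get secret ingress-nginx-admission
# Until the create job completes successfully, the ingress-nginx-admission secret will not exist
# and the controller pod will stay stuck on the "secret not found" mount error shown above.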

Prometheus Alertmanager doesn't send alerts - k8s

I'm using prometheus-operator 0.3.4 and Alertmanager 0.20, and it doesn't work: I can see that the alert is fired (in the Prometheus UI, on the Alerts tab), but I never get any alert email. Looking at the logs I see the following; any idea? Please see the warnings below; maybe that is the reason, but I'm not sure how to fix it.
This is the prometheus-operator Helm chart that I use:
https://github.com/helm/charts/tree/master/stable/prometheus-operator
level=info ts=2019-12-23T15:42:28.039Z caller=main.go:231 msg="Starting Alertmanager" version="(version=0.20.0, branch=HEAD, revision=f74be0400a6243d10bb53812d6fa408ad71ff32d)"
level=info ts=2019-12-23T15:42:28.039Z caller=main.go:232 build_context="(go=go1.13.5, user=root@00c3106655f8, date=20191211-14:13:14)"
level=warn ts=2019-12-23T15:42:28.109Z caller=cluster.go:228 component=cluster msg="failed to join cluster" err="1 error occurred:\n\t* Failed to resolve alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc:9094: lookup alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc on 100.64.0.10:53: no such host\n\n"
level=info ts=2019-12-23T15:42:28.109Z caller=cluster.go:230 component=cluster msg="will retry joining cluster every 10s"
level=warn ts=2019-12-23T15:42:28.109Z caller=main.go:322 msg="unable to join gossip mesh" err="1 error occurred:\n\t* Failed to resolve alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc:9094: lookup alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc on 100.64.0.10:53: no such host\n\n"
level=info ts=2019-12-23T15:42:28.109Z caller=cluster.go:623 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2019-12-23T15:42:28.131Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2019-12-23T15:42:28.132Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=info ts=2019-12-23T15:42:28.134Z caller=main.go:416 component=configuration msg="skipping creation of receiver not referenced by any route" receiver=AlertMail
level=info ts=2019-12-23T15:42:28.134Z caller=main.go:416 component=configuration msg="skipping creation of receiver not referenced by any route" receiver=AlertMail2
level=info ts=2019-12-23T15:42:28.135Z caller=main.go:497 msg=Listening address=:9093
level=info ts=2019-12-23T15:42:30.110Z caller=cluster.go:648 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.00011151s
level=info ts=2019-12-23T15:42:38.110Z caller=cluster.go:640 component=cluster msg="gossip settled; proceeding" elapsed=10.000659096s
This is my config YAML:
global:
imagePullSecrets: []
prometheus-operator:
defaultRules:
grafana:
enabled: true
prometheusOperator:
tolerations:
- key: "WorkGroup"
operator: "Equal"
value: "operator"
effect: "NoSchedule"
- key: "WorkGroup"
operator: "Equal"
value: "operator"
effect: "NoExecute"
tlsProxy:
image:
repository: squareup/ghostunnel
tag: v1.4.1
pullPolicy: IfNotPresent
resources:
limits:
cpu: 8000m
memory: 2000Mi
requests:
cpu: 2000m
memory: 2000Mi
admissionWebhooks:
patch:
priorityClassName: "operator-critical"
image:
repository: jettech/kube-webhook-certgen
tag: v1.0.0
pullPolicy: IfNotPresent
serviceAccount:
name: prometheus-operator
image:
repository: quay.io/coreos/prometheus-operator
tag: v0.34.0
pullPolicy: IfNotPresent
prometheus:
prometheusSpec:
replicas: 1
serviceMonitorSelector:
role: observeable
tolerations:
- key: "WorkGroup"
operator: "Equal"
value: "operator"
effect: "NoSchedule"
- key: "WorkGroup"
operator: "Equal"
value: "operator"
effect: "NoExecute"
ruleSelector:
matchLabels:
role: alert-rules
prometheus: prometheus
image:
repository: quay.io/prometheus/prometheus
tag: v2.13.1
alertmanager:
alertmanagerSpec:
image:
repository: quay.io/prometheus/alertmanager
tag: v0.20.0
resources:
limits:
cpu: 500m
memory: 1000Mi
requests:
cpu: 500m
memory: 1000Mi
serviceAccount:
name: prometheus
config:
global:
resolve_timeout: 1m
smtp_smarthost: 'smtp.gmail.com:587'
smtp_from: 'alertmanager@vsx.com'
smtp_auth_username: 'ds.monitoring.grafana@gmail.com'
smtp_auth_password: 'mypass'
smtp_require_tls: false
route:
group_by: ['alertname', 'cluster']
group_wait: 45s
group_interval: 5m
repeat_interval: 1h
receiver: default-receiver
routes:
- receiver: str
match_re:
cluster: "canary|canary2"
receivers:
- name: default-receiver
- name: str
email_configs:
- to: 'rayndoll007@gmail.com'
from: alertmanager@vsx.com
smarthost: smtp.gmail.com:587
auth_identity: ds.monitoring.grafana@gmail.com
auth_username: ds.monitoring.grafana@gmail.com
auth_password: mypass
- name: 'AlertMail'
email_configs:
- to: 'rayndoll007@gmail.com'
https://codebeautify.org/yaml-validator/cb6a2781
The error says it failed to resolve the name. The pod alertmanager-monitoring-prometheus-oper-alertmanager-0 is up and running, yet Alertmanager tries to resolve alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc and fails; I'm not sure why (see the DNS check sketch after the service listing below).
Here is the output of kubectl get svc -n mon
Update:
These are the warn logs:
level=warn ts=2019-12-24T12:10:21.293Z caller=cluster.go:438 component=cluster msg=refresh result=failure addr=alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc:9094
level=warn ts=2019-12-24T12:10:21.323Z caller=cluster.go:438 component=cluster msg=refresh result=failure addr=alertmanager-monitoring-prometheus-oper-alertmanager-1.alertmanager-operated.monitoring.svc:9094
level=warn ts=2019-12-24T12:10:21.326Z caller=cluster.go:438 component=cluster msg=refresh result=failure addr=alertmanager-monitoring-prometheus-oper-alertmanager-2.alertmanager-operated.monitoring.svc:9094
This is the kubectl get svc -n mon output:
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 6m4s
monitoring-grafana ClusterIP 100.11.215.226 <none> 80/TCP 6m13s
monitoring-kube-state-metrics ClusterIP 100.22.248.232 <none> 8080/TCP 6m13s
monitoring-prometheus-node-exporter ClusterIP 100.33.130.77 <none> 9100/TCP 6m13s
monitoring-prometheus-oper-alertmanager ClusterIP 100.33.228.217 <none> 9093/TCP 6m13s
monitoring-prometheus-oper-operator ClusterIP 100.21.229.204 <none> 8080/TCP,443/TCP 6m13s
monitoring-prometheus-oper-prometheus ClusterIP 100.22.93.151 <none> 9090/TCP 6m13s
prometheus-operated ClusterIP None <none> 9090/TCP 5m54s
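Since the peers fail to resolve under alertmanager-operated.monitoring.svc while the services above live in the mon namespace, it can help to check what actually resolves from inside the cluster. A minimal sketch (busybox:1.28 is used because its nslookup is well-behaved for this; adjust names to your setup):
$ kubectl -n mon run dns-check --rm -it --restart=Never --image=busybox:1.28 -- \
    nslookup alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.mon.svc
# Compare with the name from the warning, which uses the "monitoring" namespace:
$ kubectl -n mon run dns-check2 --rm -it --restart=Never --image=busybox:1.28 -- \
    nslookup alertmanager-monitoring-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc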
Proper debug steps to help with these kinds of scenarios:
Enable Alertmanager debug logs: add the argument --log.level=debug
Verify the Alertmanager cluster is formed properly (check the /status endpoint and verify all peers are listed)
Verify that Prometheus is sending alerts to all Alertmanager peers (check the /status endpoint and verify all Alertmanager peers are listed)
End-to-end testing: generate a test alert; the alert should be seen in the Prometheus UI, then in the Alertmanager UI, and finally the alert notification should arrive (a sketch of such a test follows below).
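For the end-to-end test in the last step, a synthetic alert can be pushed straight to Alertmanager's v2 API. A sketch, with the namespace and headless service name taken from the outputs above:
$ kubectl -n mon port-forward svc/alertmanager-operated 9093:9093 &
$ curl -XPOST http://localhost:9093/api/v2/alerts \
    -H 'Content-Type: application/json' \
    -d '[{"labels":{"alertname":"TestAlert","severity":"warning","cluster":"canary"}}]'
# The alert should show up in the Alertmanager UI; if it does but no email arrives,
# the problem is in the receiver/SMTP configuration rather than in Prometheus.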

Eclipse Hono deployment on AWS k8s

I am trying to deploy Eclipse Hono version 1.0 on an AWS k8s cluster. Below are the services after deployment (the external IPs look strange to me).
root@ip-172-31-35-125:~# kubectl get service -n hono
service/hono-adapter-amqp-vertx LoadBalancer 100.67.94.188 a25d01250eb3611e9b45f0e90af72a72-2016210601.eu-west-3.elb.amazonaws.com 5672:32672/TCP,5671:32671/TCP 4h46m
service/hono-adapter-amqp-vertx-headless ClusterIP None <none> <none> 4h46m
service/hono-adapter-http-vertx LoadBalancer 100.70.64.248 a25dc920deb3611e9b45f0e90af72a72-730471375.eu-west-3.elb.amazonaws.com 8080:30080/TCP,8443:30443/TCP 4h46m
service/hono-adapter-http-vertx-headless ClusterIP None <none> <none> 4h46m
service/hono-adapter-mqtt-vertx LoadBalancer 100.68.3.95 a25ed2e15eb3611e9b45f0e90af72a72-1002962271.eu-west-3.elb.amazonaws.com 1883:31883/TCP,8883:30883/TCP 4h46m
service/hono-adapter-mqtt-vertx-headless ClusterIP None <none> <none> 4h46m
service/hono-artemis ClusterIP 100.70.153.152 <none> 5671/TCP 4h46m
service/hono-dispatch-router ClusterIP 100.64.140.172 <none> 5673/TCP 4h46m
service/hono-dispatch-router-ext LoadBalancer 100.70.95.31 a25b7cb1ceb3611e9b45f0e90af72a72-370392986.eu-west-3.elb.amazonaws.com 15671:30671/TCP,15672:30672/TCP 4h46m
service/hono-grafana ClusterIP 100.67.252.68 <none> 3000/TCP 4h46m
service/hono-prometheus-server ClusterIP 100.65.95.65 <none> 9090/TCP 4h46m
service/hono-service-auth ClusterIP 100.66.3.21 <none> 5671/TCP 4h46m
service/hono-service-auth-headless ClusterIP None <none> <none> 4h46m
service/hono-service-device-connection-headless ClusterIP None <none> <none> 4h46m
service/hono-service-device-registry ClusterIP 100.67.196.156 <none> 5671/TCP 4h46m
service/hono-service-device-registry-ext LoadBalancer 100.65.10.48 a2604531feb3611e9b45f0e90af72a72-643429943.eu-west-3.elb.amazonaws.com 28080:31080/TCP,28443:31443/TCP 4h46m
service/hono-service-device-registry-headless ClusterIP None <none> <none> 4h46m
List of pods:
root@ip-172-31-35-125:~# kubectl get pods -n hono
pod/hono-adapter-amqp-vertx-6888d8fffc-555s9 0/1 Running 0 4h46m
pod/hono-adapter-http-vertx-54b7848749-fnjk5 0/1 Running 0 4h46m
pod/hono-adapter-mqtt-vertx-76b546bf76-9fw9w 0/1 Running 0 4h46m
pod/hono-artemis-5fdb775c46-pzdfz 1/1 Running 0 4h46m
pod/hono-dispatch-router-78ccc6579c-qk8pw 1/1 Running 0 4h46m
pod/hono-grafana-8d5d6476f-wnw96 2/2 Running 0 4h46m
pod/hono-grafana-test 0/1 Error 0 4h46m
pod/hono-prometheus-server-5676c8f974-bvmw8 0/2 ContainerCreating 0 4h46m
pod/hono-service-auth-c669986d-78c4q 1/1 Running 0 4h46m
Only some of the pods are running. Describing a failed pod (e.g., the AMQP adapter) gives the following:
root@ip-172-31-35-125:~# kubectl -n hono describe pod hono-adapter-amqp-vertx-6888d8fffc-555s9
Name: hono-adapter-amqp-vertx-6888d8fffc-555s9
Namespace: hono
Priority: 0
Node: ip-172-20-60-72.eu-west-3.compute.internal/172.20.60.72
Start Time: Thu, 10 Oct 2019 08:15:40 +0000
Labels: app.kubernetes.io/component=adapter-amqp-vertx
app.kubernetes.io/instance=hono
app.kubernetes.io/managed-by=Tiller
app.kubernetes.io/name=eclipse-hono
app.kubernetes.io/version=1.0-M7
helm.sh/chart=eclipse-hono-1.0-M7
pod-template-hash=6888d8fffc
Annotations: <none>
Status: Running
IP: 100.96.1.7
IPs: <none>
Controlled By: ReplicaSet/hono-adapter-amqp-vertx-6888d8fffc
Containers:
eclipse-hono-adapter-amqp-vertx:
Container ID: docker://33641eef4c25d50bc54947d396bc4c6e457a86369a51acaa0befcf2ee0307508
Image: index.docker.io/eclipse/hono-adapter-amqp-vertx:1.0-M7
Image ID: docker-pullable://eclipse/hono-adapter-amqp-vertx@sha256:982d2ac2824eb95b915fc53756a6b05fd6939473af7bf39cbf1351127adf6b04
Ports: 8088/TCP, 5671/TCP, 5672/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
State: Running
Started: Thu, 10 Oct 2019 08:16:06 +0000
Ready: False
Restart Count: 0
Limits:
memory: 256Mi
Requests:
memory: 256Mi
Liveness: http-get https://:8088/liveness delay=180s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:8088/readiness delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
SPRING_CONFIG_LOCATION: file:///etc/hono/
SPRING_PROFILES_ACTIVE: dev
LOGGING_CONFIG: classpath:logback-spring.xml
_JAVA_OPTIONS: -XX:MinRAMPercentage=80 -XX:MaxRAMPercentage=80
KUBERNETES_NAMESPACE: hono (v1:metadata.namespace)
JAEGER_SERVICE_NAME: hono-adapter-amqp-vertx
Mounts:
/etc/hono from conf (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-phj96 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
conf:
Type: Secret (a volume populated by a Secret)
SecretName: hono-adapter-amqp-vertx-conf
Optional: false
default-token-phj96:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-phj96
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 4m34s (x1677 over 4h43m) kubelet, ip-172-20-60-72.eu-west-3.compute.internal Readiness probe failed: HTTP probe failed with statuscode: 503
Logs from a failed pod (pod/hono-adapter-amqp-vertx-6888d8fffc-555s9):
root@ip-172-31-35-125:~# kubectl -n hono logs hono-adapter-amqp-vertx-6888d8fffc-njmvr
13:43:45.184 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - can't connect to AMQP 1.0 container [amqps://hono-service-device-registry:5671]: Connection refused: hono-service-device-registry.hono.svc.cluster.local/100.67.196.156:5671
13:43:45.184 [vert.x-eventloop-thread-0] DEBUG o.e.h.client.impl.HonoConnectionImpl - connection attempt failed
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: hono-service-device-registry.hono.svc.cluster.local/100.67.196.156:5671
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:336)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:685)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
13:43:45.184 [vert.x-eventloop-thread-0] DEBUG o.e.h.client.impl.HonoConnectionImpl - starting attempt [#5005] to connect to server [hono-service-device-registry:5671]
13:43:45.184 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - OpenSSL [available: false, supports KeyManagerFactory: false, supports Hostname validation: false]
13:43:45.184 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - using JDK's default SSL engine
13:43:45.184 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.2]
13:43:45.184 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - connecting to AMQP 1.0 container [amqps://hono-service-device-registry:5671]
13:43:45.207 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - can't connect to AMQP 1.0 container [amqps://hono-service-device-registry:5671]: connection attempt timed out after 5000ms
13:43:45.207 [vert.x-eventloop-thread-0] DEBUG o.e.h.client.impl.HonoConnectionImpl - connection attempt failed
org.eclipse.hono.connection.ConnectTimeoutException: connection attempt timed out after 5000ms
at org.eclipse.hono.connection.impl.ConnectionFactoryImpl.lambda$connect$0(ConnectionFactoryImpl.java:140)
at io.vertx.core.impl.VertxImpl$InternalTimerHandler.handle(VertxImpl.java:911)
at io.vertx.core.impl.VertxImpl$InternalTimerHandler.handle(VertxImpl.java:875)
at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:320)
at io.vertx.core.impl.EventLoopContext.execute(EventLoopContext.java:43)
at io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:188)
at io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:180)
at io.vertx.core.impl.VertxImpl$InternalTimerHandler.run(VertxImpl.java:901)
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
13:43:45.207 [vert.x-eventloop-thread-0] DEBUG o.e.h.client.impl.HonoConnectionImpl - starting attempt [#5170] to connect to server [hono-service-device-registry:5671]
13:43:45.207 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - OpenSSL [available: false, supports KeyManagerFactory: false, supports Hostname validation: false]
13:43:45.207 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - using JDK's default SSL engine
13:43:45.207 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.2]
13:43:45.207 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - connecting to AMQP 1.0 container [amqps://hono-service-device-registry:5671]
13:43:45.208 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - ignoring failed connection attempt to AMQP 1.0 container [amqps://hono-service-device-registry:5671]: attempt already timed out
io.netty.channel.ConnectTimeoutException: connection timed out: hono-service-device-registry.hono.svc.cluster.local/100.67.196.156:5671
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263)
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
Some of the services have no endpoints:
root@ip-172-31-35-125:~# kubectl get ep -n hono
NAME                                                 ENDPOINTS
endpoints/hono-adapter-amqp-vertx
endpoints/hono-adapter-amqp-vertx-headless
endpoints/hono-adapter-http-vertx
endpoints/hono-adapter-http-vertx-headless
endpoints/hono-adapter-mqtt-vertx
endpoints/hono-adapter-mqtt-vertx-headless
endpoints/hono-artemis                               100.96.1.6:5671
endpoints/hono-dispatch-router                       100.96.2.5:5673
endpoints/hono-dispatch-router-ext                   100.96.2.5:5671,100.96.2.5:5672
endpoints/hono-grafana                               100.96.1.5:3000
endpoints/hono-prometheus-server                     none
endpoints/hono-service-auth                          100.96.2.7:5671
endpoints/hono-service-auth-headless                 100.96.2.7
endpoints/hono-service-device-connection-headless    none
endpoints/hono-service-device-registry               none
endpoints/hono-service-device-registry-ext           none
endpoints/hono-service-device-registry-headless      none
There is no endpoint for the AMQP adapter service either:
root@ip-172-31-35-125:~# kubectl -n hono describe service hono-adapter-amqp-vertx
Name: hono-adapter-amqp-vertx
Namespace: hono
Labels: app.kubernetes.io/component=adapter-amqp-vertx
app.kubernetes.io/instance=hono
app.kubernetes.io/managed-by=Tiller
app.kubernetes.io/name=eclipse-hono
app.kubernetes.io/version=1.0-M7
helm.sh/chart=eclipse-hono-1.0-M7
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"adapter-amqp-vertx","app.kuberne...
Selector: app.kubernetes.io/component=adapter-amqp-vertx,app.kubernetes.io/instance=hono,app.kubernetes.io/name=eclipse-hono
Type: LoadBalancer
IP: 100.67.94.188
LoadBalancer Ingress: a25d01250eb3611e9b45f0e90af72a72-2016210601.eu-west-3.elb.amazonaws.com
Port: amqp 5672/TCP
TargetPort: amqp/TCP
NodePort: amqp 32672/TCP
Endpoints:
Port: amqps 5671/TCP
TargetPort: amqps/TCP
NodePort: amqps 32671/TCP
Endpoints:
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
I am not sure whether the problem is with my AWS k8s/VPC connectivity or with the Hono deployment. Any idea how to solve this issue?
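A Service only gets endpoints when ready pods match its selector, so one way to narrow this down is to compare the device-registry Service's selector with the pods that actually exist. A sketch; the label value is an assumption based on the naming convention visible in the AMQP service above:
$ kubectl -n hono get svc hono-service-device-registry -o jsonpath='{.spec.selector}'
$ kubectl -n hono get pods -l app.kubernetes.io/component=service-device-registry
# The pod listing earlier shows no device-registry pod at all, which would explain why the
# adapters keep getting "Connection refused" and never become ready.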

How to connect to GKE postgresql svc in GCP?

I'm trying to connect to the PostgreSQL service (pod) in my Kubernetes deployment, but GCP does not give me a port (so I cannot use something like $ psql -h localhost -U postgresadmin1 --password -p 31070 postgresdb to connect to PostgreSQL and see my database).
I'm using a LoadBalancer in my service:
@cloudshell:~ (academic-veld-230622)$ psql -h 35.239.52.68 -U jhipsterpress --password -p 30728 jhipsterpress-postgresql
Password for user jhipsterpress:
psql: could not connect to server: Connection timed out
Is the server running on host "35.239.52.68" and accepting
TCP/IP connections on port 30728?
apiVersion: v1
kind: Service
metadata:
name: jhipsterpress
namespace: default
labels:
app: jhipsterpress
spec:
selector:
app: jhipsterpress
type: LoadBalancer
ports:
- name: http
port: 8080
NAME READY STATUS RESTARTS AGE
pod/jhipsterpress-84886f5cdf-mpwgb 1/1 Running 0 31m
pod/jhipsterpress-postgresql-5956df9557-fg8cn 1/1 Running 0 31m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jhipsterpress LoadBalancer 10.11.243.22 35.184.135.134 8080:32670/TCP 31m
service/jhipsterpress-postgresql LoadBalancer 10.11.255.64 35.239.52.68 5432:30728/TCP 31m
service/kubernetes ClusterIP 10.11.240.1 <none> 443/TCP 35m
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/jhipsterpress 1 1 1 1 31m
deployment.apps/jhipsterpress-postgresql 1 1 1 1 31m
NAME DESIRED CURRENT READY AGE
replicaset.apps/jhipsterpress-84886f5cdf 1 1 1 31m
replicaset.apps/jhipsterpress-postgresql-5956df9557 1 1 1 31m
@cloudshell:~ (academic-veld-230622)$ kubectl describe pod jhipsterpress-postgresql
Name: jhipsterpress-postgresql-5956df9557-fg8cn
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: gke-standard-cluster-1-default-pool-bf9f446d-9hsq/10.128.0.58
Start Time: Sat, 06 Apr 2019 13:39:08 +0200
Labels: app=jhipsterpress-postgresql
pod-template-hash=1512895113
Annotations: kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container postgres
Status: Running
IP: 10.8.0.14
Controlled By: ReplicaSet/jhipsterpress-postgresql-5956df9557
Containers:
postgres:
Container ID: docker://55475d369c63da4d9bdc208e9d43c457f74845846fb4914c88c286ff96d0e45a
Image: postgres:10.4
Image ID: docker-pullable://postgres@sha256:9625c2fb34986a49cbf2f5aa225d8eb07346f89f7312f7c0ea19d82c3829fdaa
Port: 5432/TCP
Host Port: 0/TCP
State: Running
Started: Sat, 06 Apr 2019 13:39:29 +0200
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment:
POSTGRES_USER: jhipsterpress
POSTGRES_PASSWORD: <set to the key 'postgres-password' in secret 'jhipsterpress-postgresql'> Optional: false
Mounts:
/var/lib/pgsql/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mlmm5 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: spingular-bucket
ReadOnly: false
default-token-mlmm5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-mlmm5
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 33m (x3 over 33m) default-scheduler persistentvolumeclaim "spingular-bucket" not found
Warning FailedScheduling 33m (x3 over 33m) default-scheduler pod has unbound immediate PersistentVolumeClaims
Normal Scheduled 33m default-scheduler Successfully assigned default/jhipsterpress-postgresql-5956df9557-fg8cn to gke-standard-cluster-1-default-pool-bf9f446d-9hsq
Normal SuccessfulAttachVolume 33m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-95ba1737-5860-11e9-ae59-42010a8000a8"
Normal Pulling 33m kubelet, gke-standard-cluster-1-default-pool-bf9f446d-9hsq pulling image "postgres:10.4"
Normal Pulled 32m kubelet, gke-standard-cluster-1-default-pool-bf9f446d-9hsq Successfully pulled image "postgres:10.4"
Normal Created 32m kubelet, gke-standard-cluster-1-default-pool-bf9f446d-9hsq Created container
Normal Started 32m kubelet, gke-standard-cluster-1-default-pool-bf9f446d-9hsq Started container
With the open firewall: posgresql-jhipster
Ingress
Apply to all
IP ranges: 0.0.0.0/0
tcp:30728
Allow
999
default
Thanks for your help. Any documentation is really appreciated.
Your service is currently of type ClusterIP. This does not expose the service or the pods outside the cluster, so you can't connect to the pod from Cloud Shell like this: Cloud Shell is not on your VPC and the pods are not exposed.
Update your service using kubectl edit svc jhipsterpress-postgresql
Change the spec.type field to 'LoadBalancer'
You will then have an external IP that you can connect to.
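Once the service has an external IP, the connection goes to the service port (5432), not the NodePort. A sketch, with the service name, user and external IP taken from the outputs above (alternatively, keep the database internal and tunnel to it instead):
$ psql -h 35.239.52.68 -U jhipsterpress --password -p 5432 jhipsterpress-postgresql
$ kubectl port-forward svc/jhipsterpress-postgresql 5432:5432
$ psql -h localhost -U jhipsterpress --password -p 5432 jhipsterpress-postgresql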

EKS 1.11 + Istio 1.0.6 + Cilium 1.4.1, Post https://istio-sidecar-injector.istio-system.svc:443/inject?timeout=30s: Address is not allowed

Here are the steps to reproduce the error:
1). Install an AWS EKS cluster (1.11)
2). Install Cilium v1.4.1 following this guide
$ kubectl -n kube-system set env ds aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true
$ kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.4/examples/kubernetes/1.11/cilium.yaml
3). Install istio 1.0.6
$ kubectl apply -f install/kubernetes/helm/helm-service-account.yaml
$ helm init --service-account tiller
$ helm install install/kubernetes/helm/istio --name istio --namespace istio-system
4). Try sample nginx
$ kubectl create ns nginx
$ kubectl label namespace nginx istio-injection=enabled
$ kubectl create deployment --image nginx nginx -n nginx
$ kubectl expose deployment nginx --port=80 --type=LoadBalancer -n nginx
Then I run into the problem:
$ kubectl get deploy -n nginx
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
nginx 1 0 0 0 27m
$ kubectl get deploy -n nginx -oyaml
apiVersion: v1
items:
- apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
traffic.sidecar.istio.io/includeOutboundIPRanges: 172.20.0.0/16
creationTimestamp: "2019-03-08T13:13:58Z"
generation: 3
labels:
app: nginx
name: nginx
namespace: nginx
resourceVersion: "36034"
selfLink: /apis/extensions/v1beta1/namespaces/nginx/deployments/nginx
uid: 0888b279-41a4-11e9-8f26-1274e185a192
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: nginx
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: nginx
spec:
containers:
- image: nginx
imagePullPolicy: Always
name: nginx
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
conditions:
- lastTransitionTime: "2019-03-08T13:13:58Z"
lastUpdateTime: "2019-03-08T13:13:58Z"
message: Deployment does not have minimum availability.
reason: MinimumReplicasUnavailable
status: "False"
type: Available
- lastTransitionTime: "2019-03-08T13:13:58Z"
lastUpdateTime: "2019-03-08T13:13:58Z"
message: 'Internal error occurred: failed calling admission webhook "sidecar-injector.istio.io":
Post https://istio-sidecar-injector.istio-system.svc:443/inject?timeout=30s:
Address is not allowed'
reason: FailedCreate
status: "True"
type: ReplicaFailure
- lastTransitionTime: "2019-03-08T13:23:59Z"
lastUpdateTime: "2019-03-08T13:23:59Z"
message: ReplicaSet "nginx-78f5d695bd" has timed out progressing.
reason: ProgressDeadlineExceeded
status: "False"
type: Progressing
observedGeneration: 3
unavailableReplicas: 1
kind: List
metadata:
resourceVersion: ""
selfLink: ""
Investigation A:
I updated the includeOutboundIPRanges annotation as follows; it did not help.
$ kubectl edit deploy -n nginx
annotations:
traffic.sidecar.istio.io/includeOutboundIPRanges: 172.20.0.0/20
Investigation B:
I removed Cilium, re-installed Istio, then re-installed nginx. The nginx injection then works and the nginx pod runs fine.
Investigation C:
As a comparison, I swapped install steps 2) and 3). The nginx injection was then fine and the nginx welcome page could be seen, but the "Address is not allowed" error came back after manually terminating the EC2 worker instances (the ASG then auto-created all the worker instances again).
FYI, Cilium and Istio status:
$ kubectl -n kube-system exec -ti cilium-4wzgd cilium-health status
Probe time: 2019-03-08T16:35:57Z
Nodes:
ip-10-250-206-54.ec2.internal (localhost):
Host connectivity to 10.250.206.54:
ICMP to stack: OK, RTT=440.788µs
HTTP to agent: OK, RTT=665.779µs
ip-10-250-198-72.ec2.internal:
Host connectivity to 10.250.198.72:
ICMP to stack: OK, RTT=799.994µs
HTTP to agent: OK, RTT=1.594971ms
ip-10-250-199-154.ec2.internal:
Host connectivity to 10.250.199.154:
ICMP to stack: OK, RTT=770.777µs
HTTP to agent: OK, RTT=1.692356ms
ip-10-250-205-177.ec2.internal:
Host connectivity to 10.250.205.177:
ICMP to stack: OK, RTT=460.927µs
HTTP to agent: OK, RTT=1.383852ms
ip-10-250-213-68.ec2.internal:
Host connectivity to 10.250.213.68:
ICMP to stack: OK, RTT=766.769µs
HTTP to agent: OK, RTT=1.401989ms
ip-10-250-214-179.ec2.internal:
Host connectivity to 10.250.214.179:
ICMP to stack: OK, RTT=781.72µs
HTTP to agent: OK, RTT=2.614356ms
$ kubectl -n kube-system exec -ti cilium-4wzgd -- cilium status
KVStore: Ok etcd: 1/1 connected: https://cilium-etcd-client.kube-system.svc:2379 - 3.3.11 (Leader)
ContainerRuntime: Ok docker daemon: OK
Kubernetes: Ok 1.11+ (v1.11.5-eks-6bad6d) [linux/amd64]
Kubernetes APIs: ["CustomResourceDefinition", "cilium/v2::CiliumNetworkPolicy", "core/v1::Endpoint", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
Cilium: Ok OK
NodeMonitor: Disabled
Cilium health daemon: Ok
IPv4 address pool: 6/65535 allocated from 10.54.0.0/16
Controller Status: 34/34 healthy
Proxy Status: OK, ip 10.54.0.1, port-range 10000-20000
Cluster health: 6/6 reachable (2019-03-08T16:36:57Z)
$ kubectl get namespace -L istio-injection
NAME STATUS AGE ISTIO-INJECTION
default Active 4h
istio-system Active 4m
kube-public Active 4h
kube-system Active 4h
nginx Active 4h enabled
$ for pod in $(kubectl -n istio-system get pod -listio=sidecar-injector -o jsonpath='{.items[*].metadata.name}'); do kubectl -n istio-system logs ${pod}; done
2019-03-08T16:35:02.948778Z info version root@464fc845-2bf8-11e9-b805-0a580a2c0506-docker.io/istio-1.0.6-98598f88f6ee9c1e6b3f03b652d8e0e3cd114fa2-dirty-Modified
2019-03-08T16:35:02.950343Z info New configuration: sha256sum cf9491065c492014f0cb69c8140a415f0f435a81d2135efbfbab070cf6f16554
2019-03-08T16:35:02.950377Z info Policy: enabled
2019-03-08T16:35:02.950398Z info Template: |
initContainers:
- name: istio-init
image: "docker.io/istio/proxy_init:1.0.6"
args:
- "-p"
- [[ .MeshConfig.ProxyListenPort ]]
- "-u"
- 1337
- "-m"
- [[ annotation .ObjectMeta `sidecar.istio.io/interceptionMode` .ProxyConfig.InterceptionMode ]]
- "-i"
- "[[ annotation .ObjectMeta `traffic.sidecar.istio.io/includeOutboundIPRanges` "172.20.0.0/16" ]]"
- "-x"
- "[[ annotation .ObjectMeta `traffic.sidecar.istio.io/excludeOutboundIPRanges` "" ]]"
- "-b"
- "[[ annotation .ObjectMeta `traffic.sidecar.istio.io/includeInboundPorts` (includeInboundPorts .Spec.Containers) ]]"
- "-d"
- "[[ excludeInboundPort (annotation .ObjectMeta `status.sidecar.istio.io/port` 0 ) (annotation .ObjectMeta `traffic.sidecar.istio.io/excludeInboundPorts` "" ) ]]"
imagePullPolicy: IfNotPresent
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true
restartPolicy: Always
containers:
- name: istio-proxy
image: [[ annotation .ObjectMeta `sidecar.istio.io/proxyImage` "docker.io/istio/proxyv2:1.0.6" ]]
ports:
- containerPort: 15090
protocol: TCP
name: http-envoy-prom
args:
- proxy
- sidecar
- --configPath
- [[ .ProxyConfig.ConfigPath ]]
- --binaryPath
- [[ .ProxyConfig.BinaryPath ]]
- --serviceCluster
[[ if ne "" (index .ObjectMeta.Labels "app") -]]
- [[ index .ObjectMeta.Labels "app" ]]
[[ else -]]
- "istio-proxy"
[[ end -]]
- --drainDuration
- [[ formatDuration .ProxyConfig.DrainDuration ]]
- --parentShutdownDuration
- [[ formatDuration .ProxyConfig.ParentShutdownDuration ]]
- --discoveryAddress
- [[ annotation .ObjectMeta `sidecar.istio.io/discoveryAddress` .ProxyConfig.DiscoveryAddress ]]
- --discoveryRefreshDelay
- [[ formatDuration .ProxyConfig.DiscoveryRefreshDelay ]]
- --zipkinAddress
- [[ .ProxyConfig.ZipkinAddress ]]
- --connectTimeout
- [[ formatDuration .ProxyConfig.ConnectTimeout ]]
- --proxyAdminPort
- [[ .ProxyConfig.ProxyAdminPort ]]
[[ if gt .ProxyConfig.Concurrency 0 -]]
- --concurrency
- [[ .ProxyConfig.Concurrency ]]
[[ end -]]
- --controlPlaneAuthPolicy
- [[ annotation .ObjectMeta `sidecar.istio.io/controlPlaneAuthPolicy` .ProxyConfig.ControlPlaneAuthPolicy ]]
[[- if (ne (annotation .ObjectMeta `status.sidecar.istio.io/port` 0 ) "0") ]]
- --statusPort
- [[ annotation .ObjectMeta `status.sidecar.istio.io/port` 0 ]]
- --applicationPorts
- "[[ annotation .ObjectMeta `readiness.status.sidecar.istio.io/applicationPorts` (applicationPorts .Spec.Containers) ]]"
[[- end ]]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: INSTANCE_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: ISTIO_META_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: ISTIO_META_INTERCEPTION_MODE
value: [[ or (index .ObjectMeta.Annotations "sidecar.istio.io/interceptionMode") .ProxyConfig.InterceptionMode.String ]]
[[ if .ObjectMeta.Annotations ]]
- name: ISTIO_METAJSON_ANNOTATIONS
value: |
[[ toJson .ObjectMeta.Annotations ]]
[[ end ]]
[[ if .ObjectMeta.Labels ]]
- name: ISTIO_METAJSON_LABELS
value: |
[[ toJson .ObjectMeta.Labels ]]
[[ end ]]
imagePullPolicy: IfNotPresent
[[ if (ne (annotation .ObjectMeta `status.sidecar.istio.io/port` 0 ) "0") ]]
readinessProbe:
httpGet:
path: /healthz/ready
port: [[ annotation .ObjectMeta `status.sidecar.istio.io/port` 0 ]]
initialDelaySeconds: [[ annotation .ObjectMeta `readiness.status.sidecar.istio.io/initialDelaySeconds` 1 ]]
periodSeconds: [[ annotation .ObjectMeta `readiness.status.sidecar.istio.io/periodSeconds` 2 ]]
failureThreshold: [[ annotation .ObjectMeta `readiness.status.sidecar.istio.io/failureThreshold` 30 ]]
[[ end -]]securityContext:
readOnlyRootFilesystem: true
[[ if eq (annotation .ObjectMeta `sidecar.istio.io/interceptionMode` .ProxyConfig.InterceptionMode) "TPROXY" -]]
capabilities:
add:
- NET_ADMIN
runAsGroup: 1337
[[ else -]]
runAsUser: 1337
[[ end -]]
restartPolicy: Always
resources:
[[ if (isset .ObjectMeta.Annotations `sidecar.istio.io/proxyCPU`) -]]
requests:
cpu: "[[ index .ObjectMeta.Annotations `sidecar.istio.io/proxyCPU` ]]"
memory: "[[ index .ObjectMeta.Annotations `sidecar.istio.io/proxyMemory` ]]"
[[ else -]]
requests:
cpu: 10m
[[ end -]]
volumeMounts:
- mountPath: /etc/istio/proxy
name: istio-envoy
- mountPath: /etc/certs/
name: istio-certs
readOnly: true
volumes:
- emptyDir:
medium: Memory
name: istio-envoy
- name: istio-certs
secret:
optional: true
[[ if eq .Spec.ServiceAccountName "" -]]
secretName: istio.default
[[ else -]]
secretName: [[ printf "istio.%s" .Spec.ServiceAccountName ]]
[[ end -]]
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 172.20.0.1 <none> 443/TCP 5h
$ kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-*1.ec2.internal Ready <none> 5h v1.11.5
ip-*2.ec2.internal Ready <none> 5h v1.11.5
ip-*3.ec2.internal Ready <none> 5h v1.11.5
ip-*4.ec2.internal Ready <none> 5h v1.11.5
ip-*5.ec2.internal Ready <none> 5h v1.11.5
ip-*6.ec2.internal Ready <none> 5h v1.11.5
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
istio-system istio-citadel-796c94878b-jt5tb 1/1 Running 0 13m
istio-system istio-egressgateway-864444d6ff-vwptk 1/1 Running 0 13m
istio-system istio-galley-6c68c5dbcf-fmtvp 1/1 Running 0 13m
istio-system istio-ingressgateway-694576c7bb-kmk8k 1/1 Running 0 13m
istio-system istio-pilot-79f5f46dd5-kbr45 2/2 Running 0 13m
istio-system istio-policy-5bd5578b94-qzzhd 2/2 Running 0 13m
istio-system istio-sidecar-injector-6d8f88c98f-slr6x 1/1 Running 0 13m
istio-system istio-telemetry-5598f86cd8-z7kr5 2/2 Running 0 13m
istio-system prometheus-76db5fddd5-hw9pb 1/1 Running 0 13m
kube-system aws-node-5wv4g 1/1 Running 0 4h
kube-system aws-node-gsf7l 1/1 Running 0 4h
kube-system aws-node-ksddt 1/1 Running 0 4h
kube-system aws-node-lszrr 1/1 Running 0 4h
kube-system aws-node-r4gcg 1/1 Running 0 4h
kube-system aws-node-wtcvj 1/1 Running 0 4h
kube-system cilium-4wzgd 1/1 Running 0 4h
kube-system cilium-56sq5 1/1 Running 0 4h
kube-system cilium-etcd-4vndb7tl6w 1/1 Running 0 4h
kube-system cilium-etcd-operator-6d9975f5df-zcb5r 1/1 Running 0 4h
kube-system cilium-etcd-r9h4txhgld 1/1 Running 0 4h
kube-system cilium-etcd-t2fldlwxzh 1/1 Running 0 4h
kube-system cilium-fkx8d 1/1 Running 0 4h
kube-system cilium-glc8l 1/1 Running 0 4h
kube-system cilium-gvm5f 1/1 Running 0 4h
kube-system cilium-jscn8 1/1 Running 0 4h
kube-system cilium-operator-7df75f5cc8-tnv54 1/1 Running 0 4h
kube-system coredns-7bcbfc4774-fr59z 1/1 Running 0 5h
kube-system coredns-7bcbfc4774-xxwbg 1/1 Running 0 5h
kube-system etcd-operator-7b9768bc99-8fxf2 1/1 Running 0 4h
kube-system kube-proxy-bprmp 1/1 Running 0 5h
kube-system kube-proxy-ccb2q 1/1 Running 0 5h
kube-system kube-proxy-dv2mn 1/1 Running 0 5h
kube-system kube-proxy-qds2r 1/1 Running 0 5h
kube-system kube-proxy-rf466 1/1 Running 0 5h
kube-system kube-proxy-rz2ck 1/1 Running 0 5h
kube-system tiller-deploy-57c574bfb8-cd6rn 1/1 Running 0 4h
I ran into the same issue with Calico as the CNI on EKS; this is surely related.
After installing Istio I get this error:
Internal error occurred: failed calling admission webhook \"mixer.validation.istio.io\": Post https://istio-galley.istio-system.svc:443/admitmixer?timeout=30s: Address is not allowed
My theory is:
This is because the Calico CNI is present only on my worker nodes (the pod CIDR is 192.168.../16), while the control plane still runs the AWS CNI, since I have no control over that with EKS.
That means the webhook (called from the control plane) isn't allowed to communicate with my istio-galley.istio-system.svc service, whose IP is outside of the VPC.
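If that theory holds, the underlying issue is that the EKS-managed control plane cannot route to pod IPs handed out by the secondary CNI. One workaround that is often suggested for this situation (a sketch only; the field path is standard pod spec, but host networking can clash with other port users on the node) is to run the webhook pod on the host network so it gets a VPC-routable IP:
$ kubectl -n istio-system patch deployment istio-sidecar-injector --type=json \
    -p='[{"op":"add","path":"/spec/template/spec/hostNetwork","value":true}]'
# The same idea applies to istio-galley for the mixer.validation.istio.io webhook; another
# option is to keep the webhook-serving components on nodes that still use the AWS VPC CNI.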