I launched an EC2 instance with a security group that has an inbound SSH rule on port 22 from 0.0.0.0/0. From my local machine I can connect to this instance using its public DNS name, like this:
ssh -i KeyPair.pem ec2-user@ec2-54-144-XX-XXX.compute-1.amazonaws.com
However, I cannot reach the same instance using its public IP address:
ssh -v -i KeyPair.pem ec2-user@54.144.XX.XXX
OpenSSH_8.0p1 Ubuntu-6ubuntu0.1, OpenSSL 1.1.1c 28 May 2019
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Connecting to 54.144.XX.XXX [54.144.XX.XXX] port 22.
debug1: connect to address 54.144.XX.XXX port 22: Network is unreachable
ssh: connect to host 54.144.XX.XXX port 22: Network is unreachable
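Since "Network is unreachable" is raised by the local kernel before any packet leaves the machine, a minimal local check (a sketch, reusing the addresses from above) is to compare what the DNS name resolves to with the IP being dialled, and to ask the kernel which route it would pick:
dig +short ec2-54-144-XX-XXX.compute-1.amazonaws.com   # does this print 54.144.XX.XXX?
ip route get 54.144.XX.XXX                             # which local route/interface is chosen?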
For reference, this is my /etc/ssh/ssh_config file:
Host *
# ForwardAgent no
# ForwardX11 no
# ForwardX11Trusted yes
PasswordAuthentication yes
# HostbasedAuthentication no
# GSSAPIAuthentication no
# GSSAPIDelegateCredentials no
# GSSAPIKeyExchange no
# GSSAPITrustDNS no
# BatchMode no
# CheckHostIP yes
# AddressFamily any
# ConnectTimeout 0
# StrictHostKeyChecking ask
# IdentityFile ~/.ssh/id_rsa
# IdentityFile ~/.ssh/id_dsa
# IdentityFile ~/.ssh/id_ecdsa
# IdentityFile ~/.ssh/id_ed25519
# Port 22
# Ciphers aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc
# MACs hmac-md5,hmac-sha1,umac-64#openssh.com
# EscapeChar ~
# Tunnel no
# TunnelDevice any:any
# PermitLocalCommand no
# VisualHostKey no
# ProxyCommand ssh -q -W %h:%p gateway.example.com
# RekeyLimit 1G 1h
SendEnv LANG LC_*
HashKnownHosts yes
GSSAPIAuthentication yes
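As an aside, connection options like these usually live in a per-host entry in ~/.ssh/config rather than in the system-wide file; a sketch using the host and key from above (the alias my-ec2 is hypothetical):
Host my-ec2
    HostName ec2-54-144-XX-XXX.compute-1.amazonaws.com
    User ec2-user
    IdentityFile ~/KeyPair.pem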
I have spun up an ElastiCache Redis cluster on AWS with cluster mode enabled. It has 3 master shards and 1 replica for each (3 replicas in total). I have turned on in-transit encryption, so I installed stunnel on my EC2 instance; my config file looks like the following (3001 is my cluster port):
[redis-cli]
client = yes
accept = 127.0.0.1:3001
connect = master1_url:3001
[redis-cli1]
client = yes
accept = 127.0.0.1:3002
connect = replica1_url.com:3001
[redis-cli2]
client = yes
accept = 127.0.0.1:3003
connect = master2_url:3001
[redis-cli3]
client = yes
accept = 127.0.0.1:3004
connect = replica2_url.com:3001
[redis-cli4]
client = yes
accept = 127.0.0.1:3005
connect = master3_url:3001
[redis-cli5]
client = yes
accept = 127.0.0.1:3006
connect = replica3_url.com:3001
----------------------------------------------
sudo netstat -tulnp | grep -i stunnel
tcp 0 0 127.0.0.1:3001 0.0.0.0:* LISTEN 32272/stunnel
tcp 0 0 127.0.0.1:3002 0.0.0.0:* LISTEN 32272/stunnel
tcp 0 0 127.0.0.1:3003 0.0.0.0:* LISTEN 32272/stunnel
tcp 0 0 127.0.0.1:3004 0.0.0.0:* LISTEN 32272/stunnel
tcp 0 0 127.0.0.1:3005 0.0.0.0:* LISTEN 32272/stunnel
tcp 0 0 127.0.0.1:3006 0.0.0.0:* LISTEN 32272/stunnel
When I connect using localhost (src/redis-cli -c -h localhost -p 3001), the connection is successful. But when I run "get key", it gets stuck after the following:
localhost:3001> get key
-> Redirected to slot [12539] located at master3_url:3001
If I change the cluster to a single shard and a single replica, everything works fine. What setting am I missing when using a multi-shard cluster? My EC2 instance is open to accept connections on all ports, and the Redis cluster is open to accept connections from the EC2 instance on all ports.
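For what it's worth, the endpoints that the redirects point at can be listed over the working local connection; a minimal check (a sketch, assuming the same stunnel setup as above):
src/redis-cli -c -h 127.0.0.1 -p 3001 cluster slots   # shows the cluster's own endpoints, not the local stunnel ports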
This is my first question on stackoverflow :)
My goal is to model a hybrid/heterogeneous Kubernetes cluster, where I have the following setup:
Master node runs on AWS (cloud) - ip-172-31-28-6
Slave node runs on my laptop - osboxes
Slave node runs on a Raspberry Pi - edge-1
Running a Kubernetes cluster with three VMs locally on my laptop is no problem and works fine with Weave Net. However, there seem to be communication problems when modelling my Kubernetes cluster as depicted above.
As Kubernetes is designed for nodes that are all located in the same network, I set up an OpenVPN server on AWS and connected both my laptop and the Raspberry Pi to it. I was hoping this would be enough to run Kubernetes in a heterogeneous setup where the slave nodes are in a different network. Of course, this was an incorrect assumption.
If I run the Kubernetes dashboard on a slave node and try to access it, I get a timeout. If I run it on the Master node, everything works as expected.
I set up the cluster on AWS with kubeadm init --apiserver-advertise-address= and used kubeadm join to connect the nodes.
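For reference, a minimal sketch of that flow, assuming the tun0 VPN address 10.8.0.1 (visible in the ifconfig output further down) is what every node can reach, with hypothetical token/hash placeholders:
# on the AWS master
kubeadm init --apiserver-advertise-address=10.8.0.1
# on each slave node
kubeadm join 10.8.0.1:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>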
$ kubectl get pods --all-namespaces -o wide:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system etcd-ip-172-31-28-6 1/1 Running 0 5m 172.31.28.6 ip-172-31-28-6
kube-system kube-apiserver-ip-172-31-28-6 1/1 Running 0 5m 172.31.28.6 ip-172-31-28-6
kube-system kube-controller-manager-ip-172-31-28-6 1/1 Running 0 5m 172.31.28.6 ip-172-31-28-6
kube-system kube-dns-6f4fd4bdf-w6ctf 0/3 ContainerCreating 0 15h <none> osboxes
kube-system kube-proxy-2pl2f 1/1 Running 0 15h 172.31.28.6 ip-172-31-28-6
kube-system kube-proxy-7b89c 0/1 CrashLoopBackOff 15 15h 192.168.2.106 edge-1
kube-system kube-proxy-qg69g 1/1 Running 1 15h 10.0.2.15 osboxes
kube-system kube-scheduler-ip-172-31-28-6 1/1 Running 0 5m 172.31.28.6 ip-172-31-28-6
kube-system weave-net-pqxfp 1/2 CrashLoopBackOff 189 15h 172.31.28.6 ip-172-31-28-6
kube-system weave-net-thhzr 1/2 CrashLoopBackOff 12 36m 192.168.2.106 edge-1
kube-system weave-net-v69hj 2/2 Running 7 15h 10.0.2.15 osboxes
$ kubectl -n kube-system logs --v=7 kube-dns-6f4fd4bdf-w6ctf -c kubedns:
...
I0321 09:04:25.620580 23936 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/kube-dns-6f4fd4bdf-w6ctf/log?container=kubedns
I0321 09:04:25.620605 23936 round_trippers.go:421] Request Headers:
I0321 09:04:25.620611 23936 round_trippers.go:424] Accept: application/json, */*
I0321 09:04:25.620616 23936 round_trippers.go:424] User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:04:25.713821 23936 round_trippers.go:439] Response Status: 400 Bad Request in 93 milliseconds
I0321 09:04:25.714106 23936 helpers.go:201] server response object: [{
"metadata": {},
"status": "Failure",
"message": "container \"kubedns\" in pod \"kube-dns-6f4fd4bdf-w6ctf\" is waiting to start: ContainerCreating",
"reason": "BadRequest",
"code": 400
}]
F0321 09:04:25.714134 23936 helpers.go:119] Error from server (BadRequest): container "kubedns" in pod "kube-dns-6f4fd4bdf-w6ctf" is waiting to start: ContainerCreating
$ kubectl -n kube-system logs --v=7 kube-proxy-7b89c:
...
I0321 09:06:51.803852 24289 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/kube-proxy-7b89c/log
I0321 09:06:51.803879 24289 round_trippers.go:421] Request Headers:
I0321 09:06:51.803891 24289 round_trippers.go:424] User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:06:51.803900 24289 round_trippers.go:424] Accept: application/json, */*
I0321 09:08:59.110869 24289 round_trippers.go:439] Response Status: 500 Internal Server Error in 127306 milliseconds
I0321 09:08:59.111129 24289 helpers.go:201] server response object: [{
"metadata": {},
"status": "Failure",
"message": "Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out",
"code": 500
}]
F0321 09:08:59.111156 24289 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out
$ kubectl -n kube-system logs --v=7 weave-net-pqxfp -c weave:
...
I0321 09:12:08.047206 24847 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/weave-net-pqxfp/log?container=weave
I0321 09:12:08.047233 24847 round_trippers.go:421] Request Headers:
I0321 09:12:08.047335 24847 round_trippers.go:424] Accept: application/json, */*
I0321 09:12:08.047347 24847 round_trippers.go:424] User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:12:08.062494 24847 round_trippers.go:439] Response Status: 200 OK in 15 milliseconds
DEBU: 2018/03/21 09:11:26.847013 [kube-peers] Checking peer "fa:10:a4:97:7e:7b" against list &{[{6e:fd:f4:ef:1e:f5 osboxes}]}
Peer not in list; removing persisted data
INFO: 2018/03/21 09:11:26.880946 Command line options: map[expect-npc:true ipalloc-init:consensus=3 db-prefix:/weavedb/weave-net http-addr:127.0.0.1:6784 ipalloc-range:10.32.0.0/12 nickname:ip-172-31-28-6 host-root:/host name:fa:10:a4:97:7e:7b no-dns:true status-addr:0.0.0.0:6782 datapath:datapath docker-api: port:6783 conn-limit:30]
INFO: 2018/03/21 09:11:26.880995 weave 2.2.1
FATA: 2018/03/21 09:11:26.881117 Inconsistent bridge state detected. Please do 'weave reset' and try again
$ kubectl -n kube-system logs --v=7 weave-net-thhzr -c weave:
...
I0321 09:15:13.787905 25113 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/weave-net-thhzr/log?container=weave
I0321 09:15:13.787932 25113 round_trippers.go:421] Request Headers:
I0321 09:15:13.787938 25113 round_trippers.go:424] Accept: application/json, */*
I0321 09:15:13.787946 25113 round_trippers.go:424] User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:17:21.126863 25113 round_trippers.go:439] Response Status: 500 Internal Server Error in 127338 milliseconds
I0321 09:17:21.127140 25113 helpers.go:201] server response object: [{
"metadata": {},
"status": "Failure",
"message": "Get https://192.168.2.106:10250/containerLogs/kube-system/weave-net-thhzr/weave: dial tcp 192.168.2.106:10250: getsockopt: connection timed out",
"code": 500
}]
F0321 09:17:21.127167 25113 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/weave-net-thhzr/weave: dial tcp 192.168.2.106:10250: getsockopt: connection timed out
$ ifconfig (Kubernetes master on AWS):
datapath Link encap:Ethernet HWaddr ae:90:9a:b2:7e:d9
inet6 addr: fe80::ac90:9aff:feb2:7ed9/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1376 Metric:1
RX packets:29 errors:0 dropped:0 overruns:0 frame:0
TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:1904 (1.9 KB) TX bytes:1188 (1.1 KB)
docker0 Link encap:Ethernet HWaddr 02:42:50:39:1f:c7
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
eth0 Link encap:Ethernet HWaddr 06:a3:d0:8e:19:72
inet addr:172.31.28.6 Bcast:172.31.31.255 Mask:255.255.240.0
inet6 addr: fe80::4a3:d0ff:fe8e:1972/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9001 Metric:1
RX packets:10323322 errors:0 dropped:0 overruns:0 frame:0
TX packets:9418208 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3652314289 (3.6 GB) TX bytes:3117288442 (3.1 GB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:11388236 errors:0 dropped:0 overruns:0 frame:0
TX packets:11388236 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:2687297929 (2.6 GB) TX bytes:2687297929 (2.6 GB)
tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:10.8.0.1 P-t-P:10.8.0.2 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:97222 errors:0 dropped:0 overruns:0 frame:0
TX packets:164607 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:13381022 (13.3 MB) TX bytes:209129403 (209.1 MB)
vethwe-bridge Link encap:Ethernet HWaddr 12:59:54:73:0f:91
inet6 addr: fe80::1059:54ff:fe73:f91/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1376 Metric:1
RX packets:18 errors:0 dropped:0 overruns:0 frame:0
TX packets:36 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1476 (1.4 KB) TX bytes:2940 (2.9 KB)
vethwe-datapath Link encap:Ethernet HWaddr 8e:75:1c:92:93:0d
inet6 addr: fe80::8c75:1cff:fe92:930d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1376 Metric:1
RX packets:36 errors:0 dropped:0 overruns:0 frame:0
TX packets:18 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2940 (2.9 KB) TX bytes:1476 (1.4 KB)
vxlan-6784 Link encap:Ethernet HWaddr a6:02:da:5e:d5:2a
inet6 addr: fe80::a402:daff:fe5e:d52a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:65485 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:8 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
$ sudo systemctl status kubelet.service (on AWS):
Mar 21 09:34:59 ip-172-31-28-6 kubelet[19676]: W0321 09:34:59.202058 19676 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 21 09:34:59 ip-172-31-28-6 kubelet[19676]: E0321 09:34:59.202452 19676 kubelet.go:2109] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.535541 19676 kuberuntime_manager.go:514] Container {Name:weave Image:weaveworks/weave-kube:2.2.1 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:HOSTNAME Value: ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:spec.nodeName,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath: MountPropagation:<nil>} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath: MountPropagation:<nil>} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath: MountPropagation:<nil>} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath: MountPropagation:<nil>} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:weave-net-token-vn8rh ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.536504 19676 kuberuntime_manager.go:758] checking backoff for container "weave" in pod "weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.536636 19676 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=weave pod=weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: E0321 09:35:01.536664 19676 pod_workers.go:186] Error syncing pod c6450070-2c61-11e8-a50d-06a3d08e1972 ("weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=weave pod=weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"
$ sudo systemctl status kubelet.service (on Laptop):
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.662670 715 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.663412 715 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.663869 715 kuberuntime_manager.go:647] createPodSandbox for pod "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.664295 715 pod_workers.go:186] Error syncing pod 11886465-2c61-11e8-a50d-06a3d08e1972 ("kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "CreatePodSandbox" for "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)\" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Mar 21 05:47:20 osboxes kubelet[715]: W0321 05:47:20.536161 715 pod_container_deletor.go:77] Container "bbf490835face43b70c24dbcb67c3f75872e7831b5e2605dc8bb71210910e273" not found in pod's containers
$ sudo systemctl status kubelet.service (on Raspberry Pi):
Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.188199 339 kuberuntime_manager.go:514] Container {Name:kube-proxy Image:gcr.io/google_containers/kube-proxy-amd64:v1.9.5 Command:[/usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:kube-proxy ReadOnly:false MountPath:/var/lib/kube-proxy SubPath: MountPropagation:<nil>} {Name:xtables-lock ReadOnly:false MountPath:/run/xtables.lock SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:true MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:kube-proxy-token-px7dt ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.189023 339 kuberuntime_manager.go:758] checking backoff for container "kube-proxy" in pod "kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.190174 339 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)
Mar 21 09:29:01 edge-1 kubelet[339]: E0321 09:29:01.190518 339 pod_workers.go:186] Error syncing pod 5bebafa1-2c61-11e8-a50d-06a3d08e1972 ("kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:29:02 edge-1 kubelet[339]: W0321 09:29:02.278342 339 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 21 09:29:02 edge-1 kubelet[339]: E0321 09:29:02.282534 339 kubelet.go:2120] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
You definitely have a problem with networking between the Kubernetes master and the nodes.
But first of all, that kind of hybrid installation is not the best idea. You need stable networking between the master(s) and the nodes, or it will cause many problems, and that is hard to achieve over an Internet connection.
If you want a hybrid installation, you can use Federation between a Kubernetes cluster in AWS and one on your local hardware.
Regarding your actual problem: I see that Weave Net is failing both on the master and on the edge-1 node. It is not clear from the logs what kind of problem it is; try running the Weave container with the WEAVE_DEBUG=1 environment variable. Without working networking, other pods such as kube-dns will not function properly.
Also, how did you set up OpenVPN? You need routing between the subnet on AWS and the clients, plus client-to-client routing, so all the addresses you used when setting up the cluster have to be routable between all nodes. Check once more which addresses the Kubernetes components and Weave are bound to, and whether those addresses are routable.
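For example, a minimal sketch of the server-side OpenVPN directives involved (assuming a 10.8.0.0/24 VPN subnet, as the tun0 address above suggests, and the 172.31.16.0/20 VPC subnet from the eth0 output):
# /etc/openvpn/server.conf (fragment)
client-to-client                          # let VPN clients talk to each other
push "route 172.31.16.0 255.255.240.0"    # make the AWS VPC subnet reachable for clients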
This message explains one of the crashes:
FATA: 2018/03/21 09:11:26.881117 Inconsistent bridge state detected. Please do 'weave reset' and try again
Since it's slightly complicated to run the weave command on a Kubernetes node, just reboot the node and the bridge should be recreated from scratch.
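If you do want to run it by hand instead, a sketch (assuming the standalone weave script; git.io/weave was the standard download location for this version of Weave Net):
sudo curl -L git.io/weave -o /usr/local/bin/weave
sudo chmod a+x /usr/local/bin/weave
sudo weave reset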
This message says it couldn't contact the node to get logs:
F0321 09:08:59.111156 24289 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out
Consider whether those hosts can reach each other on their regular network.
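A quick way to test exactly the path that timed out, using the address and port from the error above:
nc -zv -w 5 192.168.2.106 10250   # only succeeds if the VPN/route actually carries this traffic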
I am using Hazelcast v3.6 on two Amazon AWS virtual machines (not using the AWS-specific settings for Hazelcast). The connection is supposed to work via the TCP/IP connection settings (not multicast). I have opened ports 5701-5801 for connections on the virtual machines.
I have tried iperf on the two virtual machines, and I can see that the client on one VM connects to the server on the other VM (and vice versa when I swap the client/server setup for iperf).
When I launch two Hazelcast servers on different VMs, the connection is not established. The log statements and the hazelcast.xml config are given below (I am not using the programmatic settings for Hazelcast). I have changed the IP addresses below:
20160401-16:41:02.812 [cached2] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Connecting to /22.23.24.25:5701, timeout: 0, bind-any: true
20160401-16:41:02.812 [cached3] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Connecting to /22.23.24.25:5703, timeout: 0, bind-any: true
20160401-16:41:02.813 [cached1] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Connecting to /22.23.24.25:5702, timeout: 0, bind-any: true
20160401-16:41:02.816 [cached1] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Could not connect to: /22.23.24.25:5702. Reason: SocketException[Connection refused to address /22.23.24.25:5702]
20160401-16:41:02.816 [cached1] TcpIpJoiner INFO - [45.46.47.48]:5701 [dev] [3.6] Address[22.23.24.25]:5702 is added to the blacklist.
20160401-16:41:02.817 [cached3] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Could not connect to: /22.23.24.25:5703. Reason: SocketException[Connection refused to address /22.23.24.25:5703]
20160401-16:41:02.817 [cached3] TcpIpJoiner INFO - [45.46.47.48]:5701 [dev] [3.6] Address[22.23.24.25]:5703 is added to the blacklist.
20160401-16:41:02.834 [cached2] TcpIpConnectionManager INFO - [45.46.47.48]:5701 [dev] [3.6] Established socket connection between /45.46.47.48:51965 and /22.23.24.25:5701
20160401-16:41:02.849 [hz._hzInstance_1_dev.IO.thread-in-0] TcpIpConnection INFO - [45.46.47.48]:5701 [dev] [3.6] Connection [Address[22.23.24.25]:5701] lost. Reason: java.io.EOFException[Remote socket closed!]
20160401-16:41:02.851 [hz._hzInstance_1_dev.IO.thread-in-0] NonBlockingSocketReader WARN - [45.46.47.48]:5701 [dev] [3.6] hz._hzInstance_1_dev.IO.thread-in-0 Closing socket to endpoint Address[54.89.161.228]:5701, Cause:java.io.EOFException: Remote socket closed!
20160401-16:41:03.692 [cached2] InitConnectionTask INFO - [45.46.47.48]:5701 [dev] [3.6] Connecting to /22.23.24.25:5701, timeout: 0, bind-any: true
20160401-16:41:03.693 [cached2] TcpIpConnectionManager INFO - [45.46.47.48]:5701 [dev] [3.6] Established socket connection between /45.46.47.48:60733 and /22.23.24.25:5701
20160401-16:41:03.696 [hz._hzInstance_1_dev.IO.thread-in-1] TcpIpConnection INFO - [45.46.47.48]:5701 [dev] [3.6] Connection [Address[22.23.24.25]:5701] lost. Reason: java.io.EOFException[Remote socket closed!]
Part of the Hazelcast config:
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.6.xsd"
xmlns="http://www.hazelcast.com/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<group>
<name>abc</name>
<password>defg</password>
</group>
<network>
<port auto-increment="true" port-count="100">5701</port>
<outbound-ports>
<ports>0-5900</ports>
</outbound-ports>
<join>
<multicast enabled="false">
<!--<multicast-group>224.2.2.3</multicast-group>
<multicast-port>54327</multicast-port>-->
</multicast>
<tcp-ip enabled="true">
<member>22.23.24.25</member>
</tcp-ip>
</join>
<interfaces enabled="true">
<interface>45.46.47.48</interface>
</interfaces>
<ssl enabled="false" />
<socket-interceptor enabled="false" />
<symmetric-encryption enabled="false">
<algorithm>PBEWithMD5AndDES</algorithm>
<!-- salt value to use when generating the secret key -->
<salt>thesalt</salt>
<!-- pass phrase to use when generating the secret key -->
<password>thepass</password>
<!-- iteration count to use when generating the secret key -->
<iteration-count>19</iteration-count>
</symmetric-encryption>
</network>
<partition-group enabled="false"/>
iperf server and client log statements:
Server listening on TCP port 5701
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 22.23.24.25, TCP port 5701
TCP window size: 1.33 MByte (default)
------------------------------------------------------------
[ 5] local 172.31.17.104 port 57398 connected with 22.23.24.25 port 5701
[ 4] local 172.31.17.104 port 5701 connected with 22.23.24.25 port 55589
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 662 MBytes 555 Mbits/sec
[ 4] 0.0-10.0 sec 797 MBytes 666 Mbits/sec
Server listening on TCP port 5701
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local xxx.xx.xxx.xx port 5701 connected with 22.23.24.25 port 57398
------------------------------------------------------------
Client connecting to 22.23.24.25, TCP port 5701
TCP window size: 1.62 MByte (default)
------------------------------------------------------------
[ 6] local 172.31.17.23 port 55589 connected with 22.23.24.25 port 5701
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 797 MBytes 669 Mbits/sec
[ 4] 0.0-10.0 sec 662 MBytes 553 Mbits/sec
Note: I forgot to mention that I can connect from a Hazelcast client to a server, i.e. when I use a Hazelcast client to connect to a single Hazelcast server node, I am able to connect just fine.
An outbound port range which includes 0 is interpreted by Hazelcast as "use ephemeral ports", so the <outbound-ports> element actually has no effect in your configuration. There is an associated test in the Hazelcast sources: https://github.com/hazelcast/hazelcast/blob/75251c4f01d131a9624fc3d0c4190de5cdf7d93a/hazelcast/src/test/java/com/hazelcast/nio/NodeIOServiceTest.java#L60
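If you do want to restrict outbound ports, any range that excludes 0 is honored; a sketch (the concrete range here is arbitrary, chosen only for illustration):
<outbound-ports>
    <!-- a range without 0, so it is not treated as "use ephemeral ports" -->
    <ports>33000-35000</ports>
</outbound-ports>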
I am trying to set up 3 Kafka brokers on Ubuntu EC2 machines, but I am getting a ConnectException while starting ZooKeeper. All the ports in the security group of my EC2 instances are already open.
Below is the stack trace:
[2016-03-03 07:37:12,040] ERROR Exception while listening (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.BindException: Cannot assign requested address
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376)
at java.net.ServerSocket.bind(ServerSocket.java:376)
at java.net.ServerSocket.bind(ServerSocket.java:330)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:507)
...
...
...
[2016-03-03 07:23:46,093] WARN Cannot open channel to 2 at election address /52.36.XXX.181:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
Below are my configuration files.
zookeeper.properties:
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0
server.1=52.37.XX.70:2888:3888
server.2=52.36.XXX.181:2888:3888
server.3=52.37.XX.42:2888:3888
initLimit=5
syncLimit=2
server.properties:
broker.id=1
port=9092
host.name=52.37.XX.70
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs
num.partitions=3
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.cleaner.enable=false
zookeeper.connect=52.37.XX.70:2181,52.36.XXX.181:2181,52.37.XX.42:2181
zookeeper.connection.timeout.ms=6000
I have added the servers' public IPs to /etc/hosts on the instances. My modified /etc/hosts is:
127.0.0.1 localhost localhost.localdomain ip-10-131-X-217
127.0.0.1 localhost localhost.localdomain 52.37.XX.70
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
The unique myid entries in /tmp/zookeeper/myid are also entered correctly.
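For reference, a sketch of how those entries line up with the server.N ids above (one file per host):
echo 1 > /tmp/zookeeper/myid   # on 52.37.XX.70   (server.1)
echo 2 > /tmp/zookeeper/myid   # on 52.36.XXX.181 (server.2)
echo 3 > /tmp/zookeeper/myid   # on 52.37.XX.42   (server.3)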
I have followed all the steps mentioned in: How to create a MultiNode - MultiBroker Cluster for Kafka on AWS
The issue was that I was using the public IP of the server. Using the public DNS of the EC2 instances instead fixed the issue.
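That works because, from inside the VPC, an instance's public DNS name resolves to its private address, which is the only address the process can actually bind to; the public IP is never present on the instance's interface. A quick check (hypothetical hostname, illustrative output):
$ dig +short ec2-52-37-XX-70.us-west-2.compute.amazonaws.com
172.31.X.X   # the private address, when resolved from inside EC2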