istio-ingress can't start up - kubectl

When I start minikube and apply istio.yaml, the ingress can't start up:
eumji@eumji:~$ kubectl get pods -n istio-system
NAME READY STATUS RESTARTS AGE
istio-ca-76dddbd695-bdwm9 1/1 Running 5 2d
istio-ingress-85fb769c4d-qtbcx 0/1 CrashLoopBackOff 67 2d
istio-mixer-587fd4bbdb-ldvhb 3/3 Running 15 2d
istio-pilot-7db8db896c-9znqj 2/2 Running 10 2d
When I try to see the logs I get the following output:
eumji@eumji:~$ kubectl logs -f istio-ingress-85fb769c4d-qtbcx -n istio-system
ERROR: logging before flag.Parse: I1214 05:04:26.193386 1 main.go:68] Version root#24c944bda24b-0.3.0-24ec6a3ac3a1d592d1873d2d8198278a849b8301
ERROR: logging before flag.Parse: I1214 05:04:26.193463 1 main.go:109] Proxy role: proxy.Node{Type:"ingress", IPAddress:"", ID:"istio-ingress-85fb769c4d-qtbcx.istio-system", Domain:"istio-system.svc.cluster.local"}
ERROR: logging before flag.Parse: I1214 05:04:26.193480 1 resolve.go:35] Attempting to lookup address: istio-mixer
ERROR: logging before flag.Parse: I1214 05:04:41.195879 1 resolve.go:42] Finished lookup of address: istio-mixer
Error: lookup failed for udp address: i/o timeout
Usage:
agent proxy [flags]
--serviceregistry string Select the platform for service registry, options are {Kubernetes, Consul, Eureka} (default "Kubernetes")
--statsdUdpAddress string IP Address and Port of a statsd UDP listener (e.g. 10.75.241.127:9125)
--zipkinAddress string Address of the Zipkin service (e.g. zipkin:9411)
Global Flags:
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
-v, --v Level log level for V logs (default 0)
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
ERROR: logging before flag.Parse: E1214 05:04:41.198640 1 main.go:267] lookup failed for udp address: i/o timeout
What could be the reason?

There is not enough information in your post to figure out what may be wrong; in particular, it seems that somehow your ingress isn't able to resolve istio-mixer, which is unexpected.
Can you file a detailed issue at
https://github.com/istio/issues/issues/new
and we can take it from there?
Thanks

Are you using something like minikube? The quick-start docs give this hint: "Note: If your cluster is running in an environment that does not support an external load balancer (e.g., minikube), the EXTERNAL-IP of istio-ingress says <pending>. You must access the application using the service NodePort, or use port-forwarding instead."
https://istio.io/docs/setup/kubernetes/quick-start.html
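For example, a rough sketch of reaching the ingress through its NodePort on minikube; the service name istio-ingress and the first-port index are assumptions based on the default install, so check kubectl get svc -n istio-system first:
export INGRESS_PORT=$(kubectl -n istio-system get svc istio-ingress -o jsonpath='{.spec.ports[0].nodePort}')   # NodePort assigned to the ingress service (port index assumed)
export INGRESS_HOST=$(minikube ip)                                                                             # IP of the minikube node
curl -v http://$INGRESS_HOST:$INGRESS_PORT/                                                                    # reach the application through the ingress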

Related

Error running Canary Deployment in Spinnaker

I am trying to enable canary deployment for AWS EKS, but my Kayenta pod is not starting. When I describe the pod I see this error. Can anyone help?
Warning Unhealthy 12m (x2 over 12m) kubelet Readiness probe failed: wget: can't connect to remote host (127.0.0.1): Connection refused
Warning Unhealthy 2m56s (x59 over 12m) kubelet Readiness probe failed: wget: server returned error: HTTP/1.1 503
This is the status of pod:
NAME READY STATUS RESTARTS AGE
spin-clouddriver-d796bdc59-tpznw 1/1 Running 0 3h40m
spin-deck-77cc75b57d-w7rfp 1/1 Running 0 3h40m
spin-echo-db954bb9-phfd5 1/1 Running 0 3h40m
spin-front50-7c5684cf9-t7vl8 1/1 Running 0 3h40m
spin-gate-78d6779854-7xqz4 1/1 Running 0 3h40m
spin-kayenta-6d7b5fdfc6-p5tcp 0/1 Running 0 21m
spin-kayenta-869c46bfcf-8t5fh 0/1 Running 0 28m
spin-orca-7ddd66758d-mpnkg 1/1 Running 0 3h40m
spin-redis-5975cfcdc8-rnm9g 1/1 Running 0 45h
spin-rosco-b7dbb577-z4szz 1/1 Running 0 3h40m
I will try to address your issue from the Kubernetes perspective.
The errors you were experiencing:
Warning Unhealthy 12m (x2 over 12m) kubelet Readiness probe failed: wget: can't connect to remote host (127.0.0.1): Connection refused
Warning Unhealthy 2m56s (x59 over 12m) kubelet Readiness probe failed: wget: server returned error: HTTP/1.1 503
indicate that there was a problem with your ReadinessProbe configuration. Removing the ReadinessProbe from the deployment "fixed" the error but can cause more issues in the future. To avoid that, I recommend adding it back with a proper configuration:
Probes have a number of fields that you can use to more precisely
control the behavior of liveness and readiness checks:
initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to
0 seconds. Minimum value is 0.
periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1
for liveness. Minimum value is 1.
failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of liveness
probe means restarting the container. In case of readiness probe the
Pod will be marked Unready. Defaults to 3. Minimum value is 1.
You'll need to adjust the probe's configuration based on your app's behavior (usually by trial and error). The two resources I would recommend that will help you with that are:
Configure Liveness, Readiness and Startup Probes
Kubernetes best practices: Setting up health checks with readiness and liveness probes
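As an illustration only, here is a minimal sketch of what a re-added readinessProbe could look like for the Kayenta container; the port (8090) and path (/health) are assumptions based on Kayenta's usual defaults, so verify them against your own deployment:
readinessProbe:
  httpGet:
    path: /health          # assumed health endpoint; check your Kayenta configuration
    port: 8090             # assumed Kayenta listening port
  initialDelaySeconds: 60  # give Kayenta time to start before the first probe
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6      # tolerate a slow startup instead of marking the pod Unready immediately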

CockroachDB on AWS EKS cluster - [n?] no stores bootstrapped

I am attempting to deploy CockroachDB v2.1.6 to a new AWS EKS cluster. Everything is deployed successfully; the StatefulSet, Services, PVs and PVCs are created. The AWS EBS volumes are created successfully too.
The issue is that the pods never reach a READY state.
pod/cockroachdb-0 0/1 Running 0 14m
pod/cockroachdb-1 0/1 Running 0 14m
pod/cockroachdb-2 0/1 Running 0 14m
If I 'describe' the pods I get the following:
Normal Pulled 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Container image "cockroachdb/cockroach:v2.1.6" already present on machine
Normal Created 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Created container cockroachdb
Normal Started 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Started container cockroachdb
Warning Unhealthy 1s (x8 over 36s) kubelet, ip-10-5-109-70.eu-central-1.compute.internal Readiness probe failed: HTTP probe failed with statuscode: 503
If I examine the logs of a pod I see this:
I200409 11:45:18.073666 14 server/server.go:1403 [n?] no stores bootstrapped and --join flag specified, awaiting init command.
W200409 11:45:18.076826 87 vendor/google.golang.org/grpc/clientconn.go:1293 grpc: addrConn.createTransport failed to connect to {cockroachdb-0.cockroachdb:26257 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb on 172.20.0.10:53: no such host". Reconnecting...
W200409 11:45:18.076942 21 gossip/client.go:123 [n?] failed to start gossip client to cockroachdb-0.cockroachdb:26257: initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb on 172.20.0.10:53: no such host"
I came across this comment from the CockroachDB forum (https://forum.cockroachlabs.com/t/http-probe-failed-with-statuscode-503/2043/6)
Both the cockroach_out.log and cockroach_output1.log files you sent me (corresponding to mycockroach-cockroachdb-0 and mycockroach-cockroachdb-2) print out no stores bootstrapped during startup and prefix all their log lines with n?, indicating that they haven’t been allocated a node ID. I’d say that they may have never been properly initialized as part of the cluster.
I have deleted everything, including the PVs, PVCs and AWS EBS volumes, with kubectl delete and reapplied, but the issue remains.
Any thoughts would be very much appreciated. Thank you
I was not aware that you had to initialize the CockroachDB cluster after creating it. I did the following to resolve my issue:
kubectl exec -it cockroachdb-0 -n <namespace> -- /bin/sh
/cockroach/cockroach init
See here for more details - https://www.cockroachlabs.com/docs/v19.2/cockroach-init.html
After this the pods started running correctly.
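For reference, a hedged one-liner version of the same fix; the namespace placeholder and the --insecure flag are assumptions that depend on how your cluster was deployed, so adjust them to your manifests:
kubectl exec -it cockroachdb-0 -n <namespace> -- /cockroach/cockroach init --insecure   # one-time cluster init
kubectl get pods -n <namespace> -w                                                      # pods should turn Ready once node IDs are allocated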

Error when running Cassandra on Google Cloud on external IP - Failed to bind port 9042 on 34.89.109.98

I am trying to make Cassandra run on Google Cloud using the external IP of the VM, but I am getting the error Failed to bind port 9042 on 34.89.109.98. As far as I can see, I have followed the rules for setting up firewall rules, but I am still not able to resolve the issue. I have attached screenshots of my configuration for reference.
(Screenshots attached: 1) the firewall rule, 2) the list of all rules, 3) the VM configuration.)
More Information
I followed the steps in https://linuxize.com/post/how-to-install-apache-cassandra-on-debian-9/ to install Cassandra. This automatically started Cassandra. Then I killed Cassandra, changed the IP address to the external IP in the cassandra.yaml file and started it again. It didn't work. Then I started experimenting with the VPN settings.
Part of the message dump after I issue the command to start Cassandra, /usr/sbin/cassandra -f:
INFO [main] 2019-12-18 16:09:40,755 StorageService.java:1521 - JOINING: Finish joining ring
INFO [main] 2019-12-18 16:09:40,826 StorageService.java:2442 - Node localhost/127.0.0.1 state jump to NORMAL
INFO [main] 2019-12-18 16:09:41,027 NativeTransportService.java:68 - Netty using native Epoll event loop
INFO [main] 2019-12-18 16:09:41,071 Server.java:158 - Using Netty Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, netty-codec=netty-codec-4.0.44.Final.452812a, netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, netty-codec-http=netty-codec-http-4.0.44.Final.452812a, netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, netty-common=netty-common-4.0.44.Final.452812a, netty-handler=netty-handler-4.0.44.Final.452812a, netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, netty-transport=netty-transport-4.0.44.Final.452812a, netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
INFO [main] 2019-12-18 16:09:41,071 Server.java:159 - Starting listening for CQL clients on /35.197.238.136:9042 (unencrypted)...
Exception (java.lang.IllegalStateException) encountered during startup: Failed to bind port 9042 on 35.197.238.136.
java.lang.IllegalStateException: Failed to bind port 9042 on 35.197.238.136.
at org.apache.cassandra.transport.Server.start(Server.java:163)
at java.util.Collections$SingletonSet.forEach(Collections.java:4769)
at org.apache.cassandra.service.NativeTransportService.start(NativeTransportService.java:124)
at org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:696)
at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:546)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:635)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:742)
ERROR [main] 2019-12-18 16:09:41,100 CassandraDaemon.java:759 - Exception encountered during startup
java.lang.IllegalStateException: Failed to bind port 9042 on 35.197.238.136.
at org.apache.cassandra.transport.Server.start(Server.java:163) ~[apache-cassandra-3.11.5.jar:3.11.5]
at java.util.Collections$SingletonSet.forEach(Collections.java:4769) ~[na:1.8.0_232]
at org.apache.cassandra.service.NativeTransportService.start(NativeTransportService.java:124) ~[apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:696) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:546) [apache-cassandra-3.11.5.jar:3.11.5]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:635) [apache-cassandra-3.11.5.jar:3.11.5]
Within the cassandra.yaml file you can set the IP address on which your Cassandra server listens. The default is 127.0.0.1 (localhost) and is not suitable for external connections.
The address values you can use are the addresses that the Compute Engine instance has associated with it. These can be discovered using:
ip addr
It is important to realize that a Compute Engine instance may appear to have a public IP address in the GCP Console, but that address is not a network interface on the instance itself. In the example in your original question, the Compute Engine IP address would be 10.154.0.4. This is the address you want to set in your configuration file.
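As a rough sketch (the internal address 10.154.0.4 is taken from your screenshots, and the exact directives depend on your Cassandra version), the relevant cassandra.yaml settings would look something like:
listen_address: 10.154.0.4           # the VM's internal (NIC) address
rpc_address: 0.0.0.0                 # accept CQL clients on all local interfaces
broadcast_rpc_address: <external-ip> # advertise the external IP to clients; required when rpc_address is 0.0.0.0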
See also this document which describes setting up Cassandra on GCP:
Spinning up a Cassandra Cluster on Google Cloud (for free) with just a browser

Intermittent DNS issues while pulling docker image from ECR repository

Has anyone faced this issue with docker pull? We recently upgraded Docker to 18.03.1-ce, and since then we have been seeing the issue. We are not exactly sure if this is related to Docker, but we want to know if anyone has faced this problem.
We have done some troubleshooting using tcpdump; the DNS queries being made were under the permissible limit of 1024 packets, which is a limit on EC2. We also tried working around the issue by modifying the /etc/resolv.conf file to use higher retry/timeout values, but that didn't seem to help.
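(For context, a hedged illustration of the kind of /etc/resolv.conf tweak we tried; the values here are examples, not the exact ones we used:
options timeout:5 attempts:3   # wait up to 5s per query, retry up to 3 times
search ec2.internal            # search domain typically present on EC2 hosts
nameserver 10.5.0.2            # the VPC resolver shown in the error output below
)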
We did a packet capture line by line and found something: some of the responses were negative. If you use Wireshark, you can use 'udp.stream eq 12' as a filter to view one of the negative answers, where we can see the resolver sending the answer "No such name". All the requests that get a negative response use the following name in the request:
354XXXXX.dkr.ecr.us-east-1.amazonaws.com.ec2.internal
Would any of you happen to know why ec2.internal is being added to the end of the DNS name? If I run a dig against this name it fails, so it appears that a wrong name is being sent to the server, which responds with 'no such host'. Is Docker sending a wrong DNS name for resolution?
We see this issue happening intermittently. Looking forward to your help. Thanks in advance.
Expected behaviour
5.0.25_61: Pulling from rrg
Digest: sha256:50bbce4af6749e9a976f0533c3b50a0badb54855b73d8a3743473f1487fd223e
Status: Downloaded newer image for XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/rrg:5.0.25_61
Actual behaviour
docker-compose up -d rrg-node-1
Creating rrg-node-1
ERROR: for rrg-node-1 Cannot create container for service rrg-node-1: Error response from daemon: Get https:/XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/v2/: dial tcp: lookup XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com on 10.5.0.2:53: no such host
Steps to reproduce the issue
docker pull XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/rrg:5.0.25_61
Output of docker version:
Docker version 18.03.1-ce, build 3dfb8343b139d6342acfd9975d7f1068b5b1c3d3
Output of docker info:
[ec2-user@ip-10-5-3-45 ~]$ docker info
Containers: 37
Running: 36
Paused: 0
Stopped: 1
Images: 60
Server Version: swarm/1.2.5
Role: replica
Primary: 10.5.4.172:3375
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 12
Plugins:
Volume:
Network:
Log:
Swarm:
NodeID:
Is Manager: false
Node Address:
Kernel Version: 4.14.51-60.38.amzn1.x86_64
Operating System: linux
Architecture: amd64
CPUs: 22
Total Memory: 80.85GiB
Name: mgr1
Docker Root Dir:
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Live Restore Enabled: false
WARNING: No kernel memory limit support

Kernel error when trying to start an NFS server in a container

I was trying to run through the NFS example in the Kubernetes codebase on Container Engine, but I couldn't get the shares to mount. Turns out every time the nfs-server pod is launched, the kernel is throwing an error:
Apr 27 00:11:06 k8s-cluster-6-node-1 kernel: [60165.482242] ------------[ cut here ]------------
Apr 27 00:11:06 k8s-cluster-6-node-1 kernel: [60165.483060] WARNING: CPU: 0 PID: 7160 at /build/linux-50mAO0/linux-3.16.7-ckt4/fs/nfsd/nfs4recover.c:1195 nfsd4_umh_cltrack_init+0x4a/0x60 nfsd
Full output here: http://pastebin.com/qLzCFpAa
Any thoughts on how to solve this?
The NFS example doesn't work because GKE (by default) doesn't support running privileged containers, such as the nfs-server. I just tested this with a v0.16.0 cluster and kubectl v0.15.0 (the current gcloud default) and got a nice error message when I tried to start the nfs-server pod:
$ kubectl create -f nfs-server-pod.yaml
Error: Pod "nfs-server" is invalid: spec.containers[0].privileged: forbidden 'true'
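For reference, a hedged sketch of the part of the pod spec that triggers this check; the image is left as a placeholder since it depends on which copy of the NFS example you are using, and note that current API versions put the flag under securityContext while the error above references the older top-level privileged field:
apiVersion: v1
kind: Pod
metadata:
  name: nfs-server
spec:
  containers:
  - name: nfs-server
    image: <nfs-server-image>   # placeholder; substitute the image from the NFS example
    securityContext:
      privileged: true          # the privileged flag that GKE rejects by default
    ports:
    - containerPort: 2049       # NFS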