kubectl apply results in ImagePullBackOff

I'm getting the following error on kubectl apply -f https://k8s.io/examples/pods/simple-pod.yaml
Warning Failed 4s (x2 over 17s) kubelet Failed to pull image "nginx:1.14.2": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:1.14.2": failed to resolve reference "docker.io/library/nginx:1.14.2": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/1.14.2": x509: certificate signed by unknown authority
I've added all the root CAs to /etc/ssl/certs/ca-certificates.crt on Ubuntu, yet the above kubectl command is still failing. Is there any specific location kubectl looks in for the root CA certs?
Any help is appreciated!
EDIT:
Okay, I got confused: it is actually my worker nodes that can't pull the image, not my local environment, so I need to find a way to install the root CA on every worker node. But I'm still wondering why the AKS install didn't fail in the first place ...
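For what it's worth, the per-node steps would look roughly like the sketch below, since it is the node's container runtime (not kubectl) that pulls images. This assumes Ubuntu worker nodes running containerd; my-root-ca.crt is just a placeholder for the actual CA file.
# Copy the root CA into the system trust store and rebuild it (Ubuntu).
sudo cp my-root-ca.crt /usr/local/share/ca-certificates/my-root-ca.crt
sudo update-ca-certificates
# Restart the container runtime so image pulls use the updated trust store.
sudo systemctl restart containerd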

Related

CockroachDB on AWS EKS cluster - [n?] no stores bootstrapped

I am attempting to deploy CockroachDB:v2.1.6 to a new AWS EKS cluster. Everything deploys successfully: the StatefulSet, services, PVs and PVCs are created. The AWS EBS volumes are created successfully too.
The issue is the pods never get to a READY state.
pod/cockroachdb-0 0/1 Running 0 14m
pod/cockroachdb-1 0/1 Running 0 14m
pod/cockroachdb-2 0/1 Running 0 14m
If I 'describe' the pods I get the following:
Normal Pulled 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Container image "cockroachdb/cockroach:v2.1.6" already present on machine
Normal Created 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Created container cockroachdb
Normal Started 46s kubelet, ip-10-5-109-70.eu-central-1.compute.internal Started container cockroachdb
Warning Unhealthy 1s (x8 over 36s) kubelet, ip-10-5-109-70.eu-central-1.compute.internal Readiness probe failed: HTTP probe failed with statuscode: 503
If I examine the logs of a pod I see this:
I200409 11:45:18.073666 14 server/server.go:1403 [n?] no stores bootstrapped and --join flag specified, awaiting init command.
W200409 11:45:18.076826 87 vendor/google.golang.org/grpc/clientconn.go:1293 grpc: addrConn.createTransport failed to connect to {cockroachdb-0.cockroachdb:26257 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb on 172.20.0.10:53: no such host". Reconnecting...
W200409 11:45:18.076942 21 gossip/client.go:123 [n?] failed to start gossip client to cockroachdb-0.cockroachdb:26257: initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb on 172.20.0.10:53: no such host"
I came across this comment from the CockroachDB forum (https://forum.cockroachlabs.com/t/http-probe-failed-with-statuscode-503/2043/6)
Both the cockroach_out.log and cockroach_output1.log files you sent me (corresponding to mycockroach-cockroachdb-0 and mycockroach-cockroachdb-2) print out no stores bootstrapped during startup and prefix all their log lines with n?, indicating that they haven’t been allocated a node ID. I’d say that they may have never been properly initialized as part of the cluster.
I have deleted everything, including the PVs, PVCs and AWS EBS volumes, with kubectl delete and reapplied, but hit the same issue.
Any thoughts would be very much appreciated. Thank you
I was not aware that you had to initialize the CockroachDB cluster after creating it. I did the following to resolve my issue:
kubectl exec -it cockroachdb-0 -- /bin/sh
/cockroach/cockroach init
See here for more details - https://www.cockroachlabs.com/docs/v19.2/cockroach-init.html
After this the pods started running correctly.
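For anyone landing here, the same init can be done in one line without opening a shell; a sketch, assuming an insecure (non-TLS) CockroachDB deployment in the default namespace:
# Run the one-time cluster init against any one of the pods.
kubectl exec -it cockroachdb-0 -- /cockroach/cockroach init --insecure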

Eris blockchain - Monax getting error while deploying smart contracts

I am new to blockchain platforms and Eris. I am trying to get a private blockchain up and running on my Mac from the guide here:
https://monax.io/docs/tutorials/getting-started/
Everything went fine until the deployment of a smart contract. While running the "eris pkgs do" command, I get the error below.
Performing action. This can sometimes take a wee while
Error connecting to node (tcp://chain:46657) to get chain id: Post http://chain:46657: dial tcp 198.105.244.228:46657: getsockopt: connection refused
Could not perform pkg action service: Could not perform pkg action: Container interactive-73d789a8-4693-4a4c-bcf2-ae2005a12d23 exited with status 1
Update:
I am now able to get past this error. I followed the Monax tutorial for Docker Machine.
This took me to the compiler error (error scenario 6 in the getting started section).
Now I'm getting the error below. The IP address is taken from the active machine by running "docker-machine ls".
GinguVjs-MacBook-Pro:idi ginguvj$ eris pkgs do --chain simplechain --address $addr --compiler 192.168.99.101:2376
Performing action. This can sometimes take a wee while
Executing Job defaultAddr
Executing Job setStorageBase
Executing Job deployStorageK
failed to send HTTP request Post 192.168.99.101:2376: unsupported protocol scheme ""
Error compiling contracts: Compilers error:
Post 192.168.99.101:2376: unsupported protocol scheme ""
Could not perform pkg action service: Could not perform pkg action: Container interactive-671e81dc-4a1b-4e1e-b1ad-b51d955297b1 exited with status 1
GinguVjs-MacBook-Pro:idi ginguvj$
The compilation issue was resolved by adding "-z" to the end of the command:
"eris pkgs do --chain simplechain --address $addr -z"

Kubeadm: why does my node not show up even though kubelet says it joined?

I am setting up a Kubernetes deployment using auto-scaling groups and Terraform. The kube master node is behind an ELB to get some reliability in case of something going wrong. The ELB has the health check set to tcp 6443, and tcp listeners for 8080, 6443, and 9898. All of the instances and the load balancer belong to a security group that allows all traffic between members of the group, plus public traffic from the NAT Gateway address. I created my AMI using the following script (from the getting started guide)...
# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
# apt-get update
# # Install docker if you don't have it already.
# apt-get install -y docker.io
# apt-get install -y kubelet kubeadm kubectl kubernetes-cni
I use the following user data scripts...
kube master
#!/bin/bash
rm -rf /etc/kubernetes/*
rm -rf /var/lib/kubelet/*
kubeadm init \
--external-etcd-endpoints=http://${etcd_elb}:2379 \
--token=${token} \
--use-kubernetes-version=${k8s_version} \
--api-external-dns-names=kmaster.${master_elb_dns} \
--cloud-provider=aws
until kubectl cluster-info
do
sleep 1
done
kubectl apply -f https://git.io/weave-kube
kube node
#!/bin/bash
rm -rf /etc/kubernetes/*
rm -rf /var/lib/kubelet/*
until kubeadm join --token=${token} kmaster.${master_elb_dns}
do
sleep 1
done
Everything seems to work properly. The master comes up and responds to kubectl commands, with pods for discovery, dns, weave, controller-manager, api-server, and scheduler. kubeadm has the following output on the node...
Running pre-flight checks
<util/tokens> validating provided token
<node/discovery> created cluster info discovery client, requesting info from "http://kmaster.jenkins.learnvest.net:9898/cluster-info/v1/?token-id=eb31c0"
<node/discovery> failed to request cluster info, will try again: [Get http://kmaster.jenkins.learnvest.net:9898/cluster-info/v1/?token-id=eb31c0: EOF]
<node/discovery> cluster info object received, verifying signature using given token
<node/discovery> cluster info signature and contents are valid, will use API endpoints [https://10.253.129.106:6443]
<node/bootstrap> trying to connect to endpoint https://10.253.129.106:6443
<node/bootstrap> detected server version v1.4.4
<node/bootstrap> successfully established connection with endpoint https://10.253.129.106:6443
<node/csr> created API client to obtain unique certificate for this node, generating keys and certificate signing request
<node/csr> received signed certificate from the API server:
Issuer: CN=kubernetes | Subject: CN=system:node:ip-10-253-130-44 | CA: false
Not before: 2016-10-27 18:46:00 +0000 UTC Not After: 2017-10-27 18:46:00 +0000 UTC
<node/csr> generating kubelet configuration
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
Node join complete:
* Certificate signing request sent to master and response
received.
* Kubelet informed of new secure connection details.
Run 'kubectl get nodes' on the master to see this machine join.
Unfortunately, running kubectl get nodes on the master only returns itself as a node. The only interesting thing I see in /var/log/syslog is
Oct 27 21:19:28 ip-10-252-39-25 kubelet[19972]: E1027 21:19:28.198736 19972 eviction_manager.go:162] eviction manager: unexpected err: failed GetNode: node 'ip-10-253-130-44' not found
Oct 27 21:19:31 ip-10-252-39-25 kubelet[19972]: E1027 21:19:31.778521 19972 kubelet_node_status.go:301] Error updating node status, will retry: error getting node "ip-10-253-130-44": nodes "ip-10-253-130-44" not found
I am really not sure where to look...
The hostnames of the two machines (the master and the node) need to be different. You can check them by running cat /etc/hostname. If they do happen to be the same, edit that file to make them different and then do a sudo reboot to apply the change. Otherwise kubeadm will not be able to differentiate between the two machines and they will show up as a single node in kubectl get nodes.
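As a sketch of what that looks like in practice (hostnamectl edits the same thing as changing /etc/hostname by hand; the new name below is only an example):
# Check the hostname on both machines; they must differ.
cat /etc/hostname
# If the node clashes with the master, rename it and reboot so kubelet picks it up.
sudo hostnamectl set-hostname kube-node-1
sudo reboot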
Yes, I faced the same problem.
I resolved it by:
running killall kubelet,
running the kubeadm join command again,
and starting the kubelet service.
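Roughly, the sequence is the one below; the token and master address are the same placeholders used in the question's user data script:
# Stop the kubelet that is stuck, redo the join, then bring kubelet back up.
sudo killall kubelet
sudo kubeadm join --token=${token} kmaster.${master_elb_dns}
sudo systemctl start kubelet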

Docker Private Registry: ping attempt failed

I'm trying to set up my private Docker Registry and I'm following the official documentation.
I have installed Docker and I'm able to run my registry on my server. But I want my registry to be more widely available.
My Docker server with the private registry runs on an AWS instance.
I have created my own certificate and key using keytool, and I run the registry like this:
docker run -d -p 5000:5000 --restart=always --name registry \
-v `pwd`/certs:/certs \
-e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
-e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
registry:2
I'm able to ping this instance by:
ping ec2-xx-xx-xx-xx.xx-west/east-1.compute.amazonaws.com
But pushing is not possible:
The push refers to a repository [ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/ubuntu] (len: 1)
unable to ping registry endpoint https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v0/
v2 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v2/: dial tcp 10.x.x.x:5000: i/o timeout
v1 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.amazonaws.com:5000/v1/_ping: dial tcp 10.0.x.x:5000: i/o timeout
EDIT1:
After changing my AWS security group to allow TCP on port 5000, the error changed:
unable to ping registry endpoint https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v0/
v2 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v2/: dial tcp 10.0.x.x:5000: connection refused
v1 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v1/_ping: dial tcp 10.0.x.x:5000: connection refused
How do I make my registry accessible to other AWS instances?
My Docker logs show the following; the registry can't find my certificate.
level=fatal msg="open /certs/domain.crt: no such file or directory"
Do I have to put this certificate in my container itself? (And generate it with keytool myself, or use an existing one?)
EDIT2:
I've generated my own certificates using this documentation.
After generating the certificates I restarted my Docker daemon. I did not copy domain.crt to ca.crt because the path didn't exist. Maybe I have to create it myself?
new error:
unable to ping registry endpoint https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v0/
v2 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v2/: dial tcp 10.0.x.x:5000: no route to host
v1 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v1/_ping: dial tcp 10.0.x.x:5000: no route to host
But I still get the following in my docker logs:
level=fatal msg="open /certs/domain.crt: no such file or directory"
After trying to perform a push, a new /certs folder is created inside my existing certs folder.
EDIT3:
After finding the right directory for my certificate (/home/centos/certs/certs/), I get the following error:
level=fatal msg="open /certs/domain.crt: permission denied"
Even after performing chmod -R 777 and chown -R root:root.
You will need to place the certificate at the following path on the machine whose Docker daemon performs the push:
/etc/docker/certs.d/<your-domain-name>:5000/ca.crt
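Concretely, something along these lines should do it (a sketch; the ec2-xx hostname is the placeholder from the question, and domain.crt is the certificate generated earlier):
# Create the per-registry trust directory and install the registry's certificate as ca.crt.
sudo mkdir -p /etc/docker/certs.d/ec2-xx-xx-xx-xx.compute.amazonaws.com:5000
sudo cp certs/domain.crt /etc/docker/certs.d/ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/ca.crt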

Confd error: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

While debugging I realised that confd doesn't pick up the keys and my journal looks like this:
Sep 18 18:31:50 ip-10-171-54-76.ec2.internal docker[24891]: [nginx] waiting for confd to refresh nginx.conf
Sep 18 18:31:56 ip-10-171-54-76.ec2.internal docker[24891]: 2014-09-18T18:31:56Z 9122c7a54edc confd[9572]: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
I used nsenter to log in to the running container and run some experiments for debugging purposes. I ran this command:
confd -onetime -node 172.17.42.1:4001 -config-file /etc/confd/conf.d/nginx.toml
and received the same error as above:
confd[12894]: ERROR 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
I am totally clueless at this point. I am using EC2 with the stable version of CoreOS and I am sure that etcd is running on the host. Also, I can ping the host from inside the container successfully.
Any ideas on what's wrong?
Assistance will be much appreciated.
This error indicates that your etcd cluster isn't operating correctly, so confd has nothing to watch. It has probably lost quorum. The logs (journalctl -u etcd) should indicate what happened.
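A quick way to check both things; journalctl is the suggestion from the answer above, and the curl target reuses the etcd address and port from the question (etcd v2 keys API):
# Look at etcd's own logs on the CoreOS host.
journalctl -u etcd
# From inside the container, verify the etcd API responds at the address confd is using.
curl -s http://172.17.42.1:4001/v2/keys/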