Why does Kubernetes apiserver present a bad certificate to the etcd server?

Running Kubernetes on CoreOS on an AWS EC2 instance, I am unable to run the apiserver successfully via a hyperkube Docker container. The problem is that the etcd server refuses its connections due to a bad certificate.
What happens is this:
$ docker run -v /etc/ssl/etcd:/etc/ssl/etcd:ro gcr.io/google_containers/hyperkube:v1.1.2 /hyperkube apiserver --bind-address=0.0.0.0 --insecure-bind-address=127.0.0.1 --etcd-servers=https://172.31.29.111:2379 --allow-privileged=true --service-cluster-ip-range=10.3.0.0/24 --secure-port=443 --advertise-address=172.31.29.111 --admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota --tls-cert-file=/etc/ssl/etcd/master1-master-client.pem --tls-private-key-file=/etc/ssl/etcd/master1-master-client-key.pem --client-ca-file=/etc/ssl/etcd/ca.pem --kubelet-certificate-authority=/etc/ssl/etcd/ca.pem --kubelet-client-certificate=/etc/ssl/etcd/master1-master-client.pem --kubelet-client-key=/etc/ssl/etcd/master1-master-client-key.pem --kubelet-https=true
I0227 17:07:34.117098 1 plugins.go:71] No cloud provider specified.
I0227 17:07:34.549806 1 master.go:368] Node port range unspecified. Defaulting to 30000-32767.
[restful] 2016/02/27 17:07:34 log.go:30: [restful/swagger] listing is available at https://172.31.29.111:443/swaggerapi/
[restful] 2016/02/27 17:07:34 log.go:30: [restful/swagger] https://172.31.29.111:443/swaggerui/ is mapped to folder /swagger-ui/
E0227 17:07:34.659701 1 cacher.go:149] unexpected ListAndWatch error: pkg/storage/cacher.go:115: Failed to list *api.Pod: 501: All the given peers are not reachable (failed to propose on members [https://172.31.29.111:2379] twice [last error: Get https://172.31.29.111:2379/v2/keys/registry/pods?quorum=false&recursive=true&sorted=true: remote error: bad certificate]) [0]
The certificate should be good though. If I execute an interactive shell within that Docker image, I can get the etcd URL via curl without any issues. So, what is going wrong in this case and how do I fix it?
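For anyone wanting to reproduce that manual check, a minimal sketch (assuming the same certificate paths are mounted into the container, and using the etcd v2 keys path that the apiserver itself queries) would be:
$ docker run -it -v /etc/ssl/etcd:/etc/ssl/etcd:ro gcr.io/google_containers/hyperkube:v1.1.2 /bin/sh
# inside the container, present the same client certificate pair the apiserver is given
$ curl --cacert /etc/ssl/etcd/ca.pem \
       --cert /etc/ssl/etcd/master1-master-client.pem \
       --key /etc/ssl/etcd/master1-master-client-key.pem \
       https://172.31.29.111:2379/v2/keys/registry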

I found I could solve this by using --etcd-config instead of --etcd-servers:
docker run -p 443:443 -v /etc/kubernetes:/etc/kubernetes:ro -v /etc/ssl/etcd:/etc/ssl/etcd:ro gcr.io/google_containers/hyperkube:v1.1.2 /hyperkube apiserver --bind-address=0.0.0.0 --insecure-bind-address=127.0.0.1 --etcd-config=/etc/kubernetes/etcd.client.conf --allow-privileged=true --service-cluster-ip-range=10.3.0.0/24 --secure-port=443 --advertise-address=172.31.29.111 --admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota --kubelet-certificate-authority=/etc/ssl/etcd/ca.pem --kubelet-client-certificate=/etc/ssl/etcd/master1-master-client.pem --kubelet-client-key=/etc/ssl/etcd/master1-master-client-key.pem --client-ca-file=/etc/ssl/etcd/ca.pem --tls-cert-file=/etc/ssl/etcd/master1-master-client.pem --tls-private-key-file=/etc/ssl/etcd/master1-master-client-key.pem
etcd.client.conf:
{
  "cluster": {
    "machines": [ "https://172.31.29.111:2379" ]
  },
  "config": {
    "certFile": "/etc/ssl/etcd/master1-master-client.pem",
    "keyFile": "/etc/ssl/etcd/master1-master-client-key.pem"
  }
}

Related

Subprocess Error when Launching ec2 cluster instance

I am getting a subprocess error when launching an EC2 cluster instance.
The terminal hangs on
Waiting for cluster to enter 'ssh-ready' state
when running
./spark-ec2 --key-pair=ru_spark --identity-file=ru_spark.pem --region=us-east-1 --zone=us-east-1a launch mycluster
Console:
Warning: Permanently added 'ec2-52-87-225-32.compute-1.amazonaws.com,52.87.225.32' (RSA) to the list of known hosts.
Connection to ec2-52-87-225-32.compute-1.amazonaws.com closed.
Warning: Permanently added 'ec2-52-87-225-32.compute-1.amazonaws.com,52.87.225.32' (RSA) to the list of known hosts.
Transferring cluster's SSH key to slaves...
ec2-34-207-153-79.compute-1.amazonaws.com
Warning: Permanently added 'ec2-34-207-153-79.compute-1.amazonaws.com,34.207.153.79' (RSA) to the list of known hosts.
Cloning spark-ec2 scripts from https://github.com/amplab/spark-ec2/tree/branch-1.6 on master...
Warning: Permanently added 'ec2-52-87-225-32.compute-1.amazonaws.com,52.87.225.32' (RSA) to the list of known hosts.
Cloning into 'spark-ec2'...
error: Peer reports incompatible or unsupported protocol version. while accessing https://github.com/amplab/spark-ec2/info/refs?service=git-upload-pack
fatal: HTTP request failed
Connection to ec2-52-87-225-32.compute-1.amazonaws.com closed.
Error executing remote command, retrying after 30 seconds: Command '['ssh', '-o', 'StrictHostKeyChecking=no', '-o', 'UserKnownHostsFile=/dev/null', '-i', 'ru_spark.pem', '-t', '-t', u'root@ec2-52-87-225-32.compute-1.amazonaws.com', 'rm -rf spark-ec2 && git clone https://github.com/amplab/spark-ec2 -b branch-1.6 spark-ec2']' returned non-zero exit status 128
I updated curl/SSL and changed the file permissions on ru_spark.pem to 400 and then 600, but neither has helped solve the issue.
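One plausible cause (not confirmed here) is that the old spark-ec2 AMI ships git/curl/NSS versions that only speak TLS 1.0/1.1, which github.com no longer accepts, so the clone on the master fails with exactly this "incompatible or unsupported protocol version" error. A rough sketch of the usual workaround, updating those packages on the master over SSH before retrying (hostname taken from the log above; assumes the AMI uses yum):
ssh -i ru_spark.pem root@ec2-52-87-225-32.compute-1.amazonaws.com \
  "yum update -y nss curl libcurl && \
   git clone https://github.com/amplab/spark-ec2 -b branch-1.6 spark-ec2"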

Unable to access Keycloak via browser after configuring SSL/TLS load balancer

I currently have an AWS server set up with Docker to run the Keycloak container. For SSL/TLS, an AWS load balancer is configured to forward HTTPS/443 traffic to the container on port 8080, terminating TLS at the load balancer.
When creating the container with the following command, I am able to browse to the server's IP address and log into the Keycloak service.
docker run --name keycloak -v keybase-storage -p 8080:8080 -e KEYCLOAK_USER=admin -e KEYCLOAK_PASSWORD=TempAdminPassword jboss/keycloak
However, if I try to log into the server by browsing to the URL, I am redirected to http://default-host:8080/auth/admin/ and the browser shows a connection error page.
While looking for a solution, I found how to pass Java options to the container when it is first run, and using the resources from this page I used the following command to start the container (URL replaced for privacy concerns):
docker run --name keycloak -v keybase-storage -p 8080:8080 -e KEYCLOAK_USER=admin -e KEYCLOAK_PASSWORD=TempAdminPassword -e JAVA_OPTS_APPEND="-Dkeycloak.frontendUrl=https://sso.IntendedURL.com" jboss/keycloak
However, this yields the same result when trying to browse to the page.
The main clue I have to go on right now is this line near the end of the output of the docker run command shown above:
19:23:00,039 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 67) WFLYUT0021: Registered web context: '/auth' for server 'default-server'
What I believe I need to do now is either change the configuration of the Docker container after it has been created (I have been unable to edit files using docker exec, so this is less likely) or pass a Java option into the run command when the container is first started.
Please let me know if you have any questions or if I can provide any other information.
Thank you.
Environment information:
Operating system: Amazon Linux 2
Docker version: 19.03.13-ce, build 4484c46
Keycloak version: 12.0.1 (WildFly Core 13.0.3.Final)
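For what it's worth, when the jboss/keycloak image sits behind a TLS-terminating load balancer, the usual approach is to enable proxy address forwarding and set the frontend URL through the image's environment variables rather than a raw system property. A minimal sketch, assuming the load balancer forwards the standard X-Forwarded-* headers (an AWS ALB does) and reusing the placeholder hostname from the question:
docker run --name keycloak -v keybase-storage -p 8080:8080 \
  -e KEYCLOAK_USER=admin -e KEYCLOAK_PASSWORD=TempAdminPassword \
  -e PROXY_ADDRESS_FORWARDING=true \
  -e KEYCLOAK_FRONTEND_URL=https://sso.IntendedURL.com/auth \
  jboss/keycloak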

Why is Google Compute Engine not running my container?

I can do this successfully:
Bundle my app into a docker image
Build this image into a container using Google Cloud Build upon push to master
(This container is stored in the registry at, for example, gcr.io/my-project/my-container)
Deploy this container to the web using Google Cloud Run
Visit the Cloud Run url and see my website
I am now trying more sophisticated builds and I think the next step is to use Google Compute Engine.
To start, I am simply trying to deploy a single instance of the same app that I deployed to Cloud Run:
Navigate to Compute Engine > VM Instances
Enter basics like instance name
Enter my container location under "Container Image": gcr.io/my-project/my-container
(As an aside, I find it suspect that the interface does not offer a selector for your existing Container Registry items here.)
Select "Allow HTTP Traffic" and "Allow HTTPS Traffic"
Click "Create"
GCE takes a minute to create it, and then it shows the green checkmark and the instance name, and "External IP: 35.238.xxx.xxx". I visit that URL in my browser and get... "35.238.xxx.xxx refused to connect."
To inspect, I go back to the GCE page and select "SSH > Open in browser window" next to my instance, which opens a type of cloud terminal to the machine.
In this terminal window, I type ps and see that no relevant processes are running. The container's Dockerfile ends with CMD yarn start:prod, so I guess that's not happening here.
Further, I run ls here and there and navigate around, and see that there is no /app directory from my Dockerfile's WORKDIR /app command. It seems that not only did my app not boot, but the container was not even copied to the VM instance.
What am I doing wrong?
For anyone having this issue: I faced the same problem and couldn't figure it out.
Reading Serhii's answer gave me the clue. I believe that as of today (Jan 2021) the GCP Console UI is a bit unhelpful: if you type in a container name when creating your VM but WITHOUT specifying a tag on the end, it doesn't complain or assume a default such as 'latest'; it just fails silently. Hence the VM with no Docker container running.
At least this now works for me; hopefully it helps others.
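For illustration, spelling out the tag explicitly looks something like this (reusing the placeholder image name from the question; the zone and instance name are just examples):
gcloud compute instances create-with-container my-instance \
  --zone=us-central1-a \
  --tags=http-server,https-server \
  --container-image=gcr.io/my-project/my-container:latest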
Check whether your VM has an external IP address.
If it doesn't, the VM might not have network access to the public repository, or even to the Google Container Registry (gcr.io), and the Docker container silently fails to start.
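A quick way to check that is to ask for the instance's NAT IP (the instance name and zone here are placeholders):
$ gcloud compute instances describe my-instance --zone=us-central1-a \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'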
I've decided to follow Deploying a container on a new VM instance again.
Please find my steps and commands below:
create a new VM that runs the Docker image gcr.io/cloud-marketplace/google/nginx1:latest with network tag http-server:
$ gcloud compute instances create-with-container instance-3 --tags=http-server,https-server --container-image=gcr.io/cloud-marketplace/google/nginx1:latest
Created [https://www.googleapis.com/compute/v1/projects/test-prj/zones/europe-west3-a/instances/instance-3].
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
instance-3 europe-west3-a n1-standard-1 10.156.0.30 35.XXX.111.XXX RUNNING
create a new firewall rule:
$ gcloud compute firewall-rules create default-allow-http --direction=INGRESS --priority=1000 --network=default --action=ALLOW --rules=tcp:80 --source-ranges=0.0.0.0/0 --target-tags=http-server
Creating firewall...⠹
Created [https://www.googleapis.com/compute/v1/projects/test-prj/global/firewalls/default-allow-http].
Creating firewall...done.
NAME NETWORK DIRECTION PRIORITY ALLOW DENY DISABLED
default-allow-http default INGRESS 1000 tcp:80 False
check current firewall rules:
$ nmap -Pn 35.XXX.111.XXX
Starting Nmap 7.70 ( https://nmap.org ) at 2020-04-02 12:04 CEST
PORT STATE SERVICE
...
80/tcp open http
check if NGINX is running in the container:
$ curl -I http://35.XXX.111.XXX
HTTP/1.1 200 OK
Server: nginx/1.16.1
...
$ curl http://35.XXX.111.XXX
...
<h1>Welcome to nginx!</h1>
...
also via web browser at http://35.XXX.111.XXX
check status of the container:
$ gcloud compute ssh instance-3
...
instance-3 ~ $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
...
a657c8871239 gcr.io/cloud-marketplace/google/nginx1:latest "/usr/local/bin/dock…" 14 minutes ago Up 14 minutes klt-instance-3-uwtu
attach to the container and run curl http://35.XXX.111.XXX in a separate terminal:
instance-3 ~ $ docker attach a657c8871239
YY.YY.43.203 - - [02/Apr/2020:10:18:06 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.64.0" "-"
YY.YY.43.203 - - [02/Apr/2020:10:18:07 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.64.0" "-"
I found no errors while following the documentation.
To solve your issue:
Compare your steps and commands to mine.
Run the test Docker image on your project by following the documentation.
Try to replicate the steps from the documentation with your custom image.
If you still have an issue, update your question with all your steps, commands and outputs.
I also had this problem: the instance was running, but it could not pull my container.
Error: Failed to start container: Error response from daemon:
{"message":"unauthorized: You don't have the needed permissions to
perform this operation, and you may have invalid credentials. To
authenticate your request, follow the steps in:
https://cloud.google.com/container-registry/docs/advanced-authentication"}
I had to add an extra scope to the YAML file: https://www.googleapis.com/auth/source.full_control
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/local-xxxxxxxxxxxxxx/apptraining', '.']
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/local-xxxxxxxxxxxxxx/apptraining']
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['compute', 'instances', 'create-with-container', 'instanceapptraining', '--machine-type=n1-standard-1', '--scopes=https://www.googleapis.com/auth/devstorage.full_control,https://www.googleapis.com/auth/trace.append,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/datastore,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/trace.append,https://www.googleapis.com/auth/source.full_control,https://www.googleapis.com/auth/source.read_only,https://www.googleapis.com/auth/compute.readonly', '--zone=us-central1-a', '--preemptible', '--container-image=gcr.io/local-xxxxxxxxxxxxxx/apptraining:latest']

Docker Private Registry: ping attempt failed

I'm trying to set up my private Docker Registry and I'm following the official documentation.
I have installed Docker and I'm able to run my registry on my server. But I want my registry to be more widely available.
My Docker server with the private registry is installed on an AWS instance.
I have created my own certificate and key by using keytool:
docker run -d -p 5000:5000 --restart=always --name registry \
-v `pwd`/certs:/certs \
-e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
-e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
registry:2
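(As an aside, the registry expects a PEM certificate and key rather than a Java keystore from keytool; the kind of openssl command the official documentation uses looks roughly like this, with the CN below being the placeholder hostname from this question:)
mkdir -p certs
openssl req -newkey rsa:4096 -nodes -sha256 \
  -keyout certs/domain.key -x509 -days 365 \
  -out certs/domain.crt \
  -subj "/CN=ec2-xx-xx-xx-xx.compute.amazonaws.com"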
I'm able to ping this instance by:
ping ec2-xx-xx-xx-xx.xx-west/east-1.compute.amazonaws.com
But pushing is not possible:
The push refers to a repository [ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/ubuntu] (len: 1)
unable to ping registry endpoint https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v0/
v2 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v2/: dial tcp 10.x.x.x:5000: i/o timeout
v1 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.amazonaws.com:5000/v1/_ping: dial tcp 10.0.x.x:5000: i/o timeout
EDIT1:
After changing my AWS security group to allow TCP traffic on port 5000, the error changed:
unable to ping registry endpoint https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v0/
v2 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v2/: dial tcp 10.0.x.x:5000: connection refused
v1 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v1/_ping: dial tcp 10.0.x.x:5000: connection refused
How do I make my registry accessible to other AWS instances?
My Docker logs show the following; the registry can't find my certificate.
level=fatal msg="open /certs/domain.crt: no such file or directory"
Do I have to put this certificate in the container itself (and generate it myself with keytool, or use an existing one)?
EDIT2:
I've generated my own certificates using this documentation.
After generating the certificates I restarted my Docker daemon. I did not copy domain.crt to ca.crt because the path didn't exist. Maybe I have to create it myself?
new error:
unable to ping registry endpoint https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v0/
v2 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v2/: dial tcp 10.0.x.x:5000: no route to host
v1 ping attempt failed with error: Get https://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/v1/_ping: dial tcp 10.0.x.x:5000: no route to host
But I still get the following in my docker logs:
level=fatal msg="open /certs/domain.crt: no such file or directory"
After trying to perform a push, a new /certs folder is created inside my existing certs folder.
EDIT3:
After finding the right directory for my certificate (/home/centos/certs/certs/), I get the following error:
level=fatal msg="open /certs/domain.crt: permission denied"
even after performing a chmod -R 777 and chown -R root:root.
You will need to place the certificate at this path on each Docker host that talks to the registry:
/etc/docker/certs.d/<your-domain-name>:5000/ca.crt
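On each such host, that amounts to something like the following (substitute the registry's real hostname; restarting the daemon afterwards is a safe habit, though newer Docker versions pick the certificate up without it):
sudo mkdir -p /etc/docker/certs.d/ec2-xx-xx-xx-xx.compute.amazonaws.com:5000
sudo cp certs/domain.crt /etc/docker/certs.d/ec2-xx-xx-xx-xx.compute.amazonaws.com:5000/ca.crt
sudo systemctl restart docker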

Why am I getting a permission denied error on docker/aws eb?

I cannot figure out why this is happening: I can build and run the Docker image locally without problems.
Recent Events:
2015-05-25 12:57:07 UTC+1000 ERROR Update environment operation is complete, but with errors. For more information, see troubleshooting documentation.
2015-05-25 12:57:07 UTC+1000 INFO New application version was deployed to running EC2 instances.
2015-05-25 12:57:04 UTC+1000 INFO Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
2015-05-25 12:57:04 UTC+1000 ERROR [Instance: i-4775ec9b] Command failed on instance. Return code: 1 Output: (TRUNCATED)... run Docker container: vel="fatal" msg="Error response from daemon: Cannot start container 02c057b331bf3a3d912bf064f1dca3e00c95746b5748c3c4a28a5c6b452ff335: [8] System error: exec: \"bin/app\": permission denied" . Check snapshot logs for details. Hook /opt/elasticbeanstalk/hooks/appdeploy/pre/04run.sh failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
2015-05-25 12:57:03 UTC+1000 ERROR Failed to run Docker container: vel="fatal" msg="Error response from daemon: Cannot start container 02c057b331bf3a3d912bf064f1dca3e00c95746b5748c3c4a28a5c6b452ff335: [8] System error: exec: \"bin/app\": permission denied" . Check snapshot logs for details.
Dockerfile:
FROM java:8u45-jre
MAINTAINER Terence Munro <terry@zenkey.com.au>
ADD ["opt", "/opt"]
WORKDIR /opt/docker
RUN ["chown", "-R", "daemon:daemon", "."]
USER daemon
ENTRYPOINT ["bin/app"]
EXPOSE 9000
Dockerrun.aws.json:
{
  "AWSEBDockerrunVersion": "1",
  "Ports": [
    {
      "ContainerPort": "9000"
    }
  ],
  "Volumes": []
}
Additional logs as attachment at: https://forums.aws.amazon.com/thread.jspa?threadID=181270
Any help is extremely appreciated.
@nick-humrich's suggestion of trying eb local run worked, so using eb deploy ended up working.
I had previously been uploading through the web interface.
Initially eb deploy was giving me an ERROR: TypeError :: data must be a byte string, but I found this issue, which was resolved by uninstalling pyopenssl.
So I don't know why the web interface was giving me permission denied; perhaps something to do with the zip file?
But anyway, I'm able to deploy now, thank you.
I had a similar problem running Docker on Elastic Beanstalk. When I pointed CMD in the Dockerfile to a shell script (/path/to/my_script.sh), the EB deployment would fail with
/path/to/my_script.sh: Permission denied.
Apparently, even though I had run RUN chmod +x /path/to/my_script.sh during the Docker build, by the time the image was run, the permissions had been changed. Eventually, to make it work I settled on:
CMD ["/bin/bash","-c","chmod +x /path/to/my_script.sh && /path/to/my_script.sh"]