AWS EKS Kubernetes and DockerHub - amazon-web-services

I have a cluster and a node created in AWS EKS. I applied a deployment to that cluster as follows:
kubectl apply -f deployment.yaml
Here deployment.yaml contains the container specification, including the DockerHub repo and image.
However, I made a mistake in deployment.yaml and need to re-apply the corrected file.
My questions are:
1 - How do I re-apply deployment.yaml to the AWS EKS cluster using kubectl?
Just running the above command again (kubectl apply -f deployment.yaml) does not seem to work.
2 - After I re-apply deployment.yaml, will the node pick up the DockerHub image on its own, or do I still need to do something else (supposing all the other details are OK)?
Some outputs below:
>> kubectl get pods
my-app-786dc95d8f-b6w4h 0/1 ImagePullBackOff 0 9h
my-app-786dc95d8f-w8hkg 0/1 ImagePullBackOff 0 9h
kubectl describe pod my-app-786dc95d8f-b6w4h
Name: my-app-786dc95d8f-b6w4h
Namespace: default
Priority: 0
Node: ip-192-168-24-13.ec2.internal/192.168.24.13
Start Time: Fri, 10 Jul 2020 12:54:38 -0400
Labels: app=my-app
pod-template-hash=786dc95d8f
Annotations: kubernetes.io/psp: eks.privileged
Status: Pending
IP: 192.168.7.235
IPs:
IP: 192.168.7.235
Controlled By: ReplicaSet/my-app-786dc95d8f
Containers:
simple-node:
Container ID:
Image: BAD_REPO/simple-node
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mwwvl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-mwwvl:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-mwwvl
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 17m (x2570 over 9h) kubelet, ip-192-168-24-13.ec2.internal Back-off pulling image "BAD_REPO/simple-node"
Warning Failed 2m48s (x2634 over 9h) kubelet, ip-192-168-24-13.ec2.internal Error: ImagePullBackOff
BR

If you only need to change the image:
kubectl set image deployment.v1.apps/{your_deployment_name} {container_name}={image_name}:{tag}
But you can always do:
kubectl delete -f deployment.yaml
kubectl create -f deployment.yaml
Since your pods are in ImagePullBackOff, the deployment isn't working anyway, so you can simply recreate it. You usually don't delete and recreate on prod, which is why I use the image change all the time; you just have to change the tag for every new image.
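A minimal sketch of both routes, assuming the Deployment is named my-app and the container simple-node (both inferred from the pod names and describe output in the question); the function names and the KUBECTL override are hypothetical helpers, not part of kubectl. Once the pod template changes, the Deployment should roll out new pods and the kubelet on the node pulls the corrected image by itself.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to print the
# commands instead of running them against a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Re-apply a corrected manifest and wait for the rollout to finish.
reapply_deployment() {
  manifest="$1"     # e.g. deployment.yaml
  deployment="$2"   # e.g. my-app (assumed name)
  "$KUBECTL" apply -f "$manifest" &&
    "$KUBECTL" rollout status "deployment/$deployment" --timeout=120s
}

# Alternative: change only the image of one container in place.
set_deployment_image() {
  deployment="$1"   # e.g. my-app (assumed name)
  container="$2"    # e.g. simple-node
  image="$3"        # e.g. <your_repo>/simple-node:<tag>
  "$KUBECTL" set image "deployment/$deployment" "$container=$image" &&
    "$KUBECTL" rollout status "deployment/$deployment" --timeout=120s
}
```

Typical use would be reapply_deployment deployment.yaml my-app after fixing the image name in the file. If kubectl apply reports the deployment as unchanged, the file on disk probably still has the old image.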

ImagePullBackOff means that Kubernetes is not able to pull the image.
Specifically, pods running under the "default" service account are not able to pull it.
To fix this issue, check two things:
Check that there is no typo in the image name and tag, and that the image is publicly available.
If the Docker registry is private, create a secret of type docker-registry, and then patch the "default" service account so it uses that secret as an image pull secret.
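The second check could look roughly like this. It is a sketch under assumptions: the secret name regcred, the DOCKER_USERNAME and DOCKER_PASSWORD variables, and the helper function are placeholders, and the registry URL is the usual Docker Hub endpoint.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Create a docker-registry secret and attach it to the "default"
# service account of the current namespace.
add_pull_secret() {
  secret_name="$1"   # e.g. regcred (placeholder name)
  "$KUBECTL" create secret docker-registry "$secret_name" \
    --docker-server="https://index.docker.io/v1/" \
    --docker-username="$DOCKER_USERNAME" \
    --docker-password="$DOCKER_PASSWORD" &&
  "$KUBECTL" patch serviceaccount default \
    -p "{\"imagePullSecrets\": [{\"name\": \"$secret_name\"}]}"
}
```

Pods that already exist keep their old spec, so they would need to be recreated (for example by deleting them) before the new pull secret takes effect.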

Related

Golang REST API Deployment on AWS EKS Fails with CrashLoopBackOff

I'm trying to deploy a simple REST API written in Golang to AWS EKS.
I created an EKS cluster on AWS using Terraform and applied the AWS load balancer controller Helm chart to it.
All resources in the cluster look like:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/aws-load-balancer-controller-5947f7c854-fgwk2 1/1 Running 0 75m
kube-system pod/aws-load-balancer-controller-5947f7c854-gkttb 1/1 Running 0 75m
kube-system pod/aws-node-dfc7r 1/1 Running 0 120m
kube-system pod/aws-node-hpn4z 1/1 Running 0 120m
kube-system pod/aws-node-s6mng 1/1 Running 0 120m
kube-system pod/coredns-66cb55d4f4-5l7vm 1/1 Running 0 127m
kube-system pod/coredns-66cb55d4f4-frk6p 1/1 Running 0 127m
kube-system pod/kube-proxy-6ndf5 1/1 Running 0 120m
kube-system pod/kube-proxy-s95qk 1/1 Running 0 120m
kube-system pod/kube-proxy-vdrdd 1/1 Running 0 120m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 127m
kube-system service/aws-load-balancer-webhook-service ClusterIP 10.100.202.90 <none> 443/TCP 75m
kube-system service/kube-dns ClusterIP 10.100.0.10 <none> 53/UDP,53/TCP 127m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/aws-node 3 3 3 3 3 <none> 127m
kube-system daemonset.apps/kube-proxy 3 3 3 3 3 <none> 127m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/aws-load-balancer-controller 2/2 2 2 75m
kube-system deployment.apps/coredns 2/2 2 2 127m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/aws-load-balancer-controller-5947f7c854 2 2 2 75m
kube-system replicaset.apps/coredns-66cb55d4f4 2 2 2 127m
I can run the application locally with Go and with Docker. But releasing this on AWS EKS always throws CrashLoopBackOff.
Running kubectl describe pod PODNAME shows:
Name: go-api-55d74b9546-dkk9g
Namespace: default
Priority: 0
Node: ip-172-16-1-191.ec2.internal/172.16.1.191
Start Time: Tue, 15 Mar 2022 07:04:08 -0700
Labels: app=go-api
pod-template-hash=55d74b9546
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 172.16.1.195
IPs:
IP: 172.16.1.195
Controlled By: ReplicaSet/go-api-55d74b9546
Containers:
go-api:
Container ID: docker://a4bc07b60c85fd308157d967d2d0d688d8eeccfe4c829102eb929ca82fb25595
Image: saurabhmish/golang-hello:latest
Image ID: docker-pullable://saurabhmish/golang-hello@sha256:f79a495ad17710b569136f611ae3c8191173400e2cbb9cfe416e75e2af6f7874
Port: 3000/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 15 Mar 2022 07:09:50 -0700
Finished: Tue, 15 Mar 2022 07:09:50 -0700
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jt4gp (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-jt4gp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m31s default-scheduler Successfully assigned default/go-api-55d74b9546-dkk9g to ip-172-16-1-191.ec2.internal
Normal Pulled 7m17s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 12.77458991s
Normal Pulled 7m16s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 110.127771ms
Normal Pulled 7m3s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 109.617419ms
Normal Created 6m37s (x4 over 7m17s) kubelet Created container go-api
Normal Started 6m37s (x4 over 7m17s) kubelet Started container go-api
Normal Pulled 6m37s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 218.952336ms
Normal Pulling 5m56s (x5 over 7m30s) kubelet Pulling image "saurabhmish/golang-hello:latest"
Normal Pulled 5m56s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 108.105083ms
Warning BackOff 2m28s (x24 over 7m15s) kubelet Back-off restarting failed container
Running kubectl logs PODNAME and kubectl logs PODNAME -c go-api shows standard_init_linux.go:228: exec user process caused: exec format error
Manifests:
go-deploy.yaml ( This is the Docker Hub Image with documentation )
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-api
  labels:
    app: go-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: go-api
  strategy: {}
  template:
    metadata:
      labels:
        app: go-api
    spec:
      containers:
        - name: go-api
          image: saurabhmish/golang-hello:latest
          ports:
            - containerPort: 3000
          resources: {}
go-service.yaml
---
kind: Service
apiVersion: v1
metadata:
  name: go-api
spec:
  selector:
    app: go-api
  type: NodePort
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
How can I fix this error ?
Posting this as Community wiki for better visibility.
Feel free to expand it.
Thanks to @David Maze, who pointed to the solution. The article 'Build Intel64-compatible Docker images from Mac M1 (ARM)' by Beppe Catanese describes the underlying problem well.
You are developing and building on the ARM architecture (Mac M1), but you deploy the Docker image to an x86-64 based Kubernetes cluster.
Solution:
Option A: use buildx
Buildx is a Docker plugin that, among other features, lets you build images for various target platforms.
$ docker buildx build --platform linux/amd64 -t myapp .
Option B: set DOCKER_DEFAULT_PLATFORM
The DOCKER_DEFAULT_PLATFORM environment variable sets the default platform for the commands that take the --platform flag.
export DOCKER_DEFAULT_PLATFORM=linux/amd64
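To spot this kind of mismatch before pushing, a small check along these lines may help. It is only a sketch: the two function names are hypothetical, and it covers just the two architectures discussed here.

```shell
#!/bin/sh
# Map a machine name (as printed by `uname -m`) to a Docker platform string.
machine_to_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}

# Report whether a default build on this machine matches the target platform.
check_build_platform() {
  target="$1"   # e.g. linux/amd64
  host="$(machine_to_platform "$(uname -m)")" || host="unknown"
  if [ "$host" = "$target" ]; then
    echo "match: host builds $host by default"
  else
    echo "mismatch: host is $host, target is $target; build with --platform $target"
  fi
}

# The node architecture can be read from the standard node label:
#   kubectl get nodes -L kubernetes.io/arch
```

On an M1 Mac, check_build_platform linux/amd64 should report a mismatch, which is the situation that leads to "exec format error" on x86-64 nodes.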
CrashLoopBackOff means that a pod starts, crashes, starts again, and then crashes again.
The error may come from the application itself, for example because it cannot connect to a database or Redis.
You may find something useful here:
My kubernetes pods keep crashing with "CrashLoopBackOff" but I can't find any log

AWS EKS Fargate - Unable to mount EFS volume with statefulset

I want to run a StatefulSet in AWS EKS Fargate and attach an EFS volume to it, but I am getting errors when mounting the volume in the pod.
These are the errors I am getting from kubectl describe pod:
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal LoggingEnabled 114s fargate-scheduler Successfully enabled logging for pod
Normal Scheduled 75s fargate-scheduler Successfully assigned default/app1 to fargate-10.0.2.123
Warning FailedMount 43s (x7 over 75s) kubelet MountVolume.SetUp failed for volume "efs-pv" : rpc error: code = Internal desc = Could not mount "fs-xxxxxxxxxxxxxxxxx:/" at "/var/lib/kubelet/pods/b799a6d6-fe9e-4f80-ac2d-8ccf8834d7c4/volumes/kubernetes.io~csi/efs-pv/mount": mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs -o tls fs-xxxxxxxxxxxxxxxxx:/ /var/lib/kubelet/pods/b799a6d6-fe9e-4f80-ac2d-8ccf8834d7c4/volumes/kubernetes.io~csi/efs-pv/mount
Output: Failed to resolve "fs-xxxxxxxxxxxxxxxxx.efs.us-east-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID.
See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail.
Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first.
Warning: config file does not have fall_back_to_mount_target_ip_address_enabled item in section mount.. You should be able to find a new config file in the same folder as current config file /etc/amazon/efs/efs-utils.conf. Consider update the new config file to latest config file. Use the default value [fall_back_to_mount_target_ip_address_enabled = True].
If anyone has set up an EFS volume with an EKS Fargate cluster, please have a look. I have been stuck on this for a long time.
What I have set up:
Created an EFS volume
CSIDriver Object
apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: efs.csi.aws.com
spec:
  attachRequired: false
Storage Class
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
PersistentVolume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: <EFS filesystem ID>
PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
Pod Configuration
apiVersion: v1
kind: Pod
metadata:
  name: app1
spec:
  containers:
    - name: app1
      image: busybox
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo $(date -u) >> /data/out1.txt; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: efs-claim
I ran into the same problem literally a day after you and have been working on the error nonstop since then! Did you check that your VPC has DNS hostnames enabled? That is what fixed it for me.
Just an FYI: if you are using Fargate and you want to change this, I had to go as far as deleting the entire cluster after changing the DNS hostnames flag in order for the change to propagate. I'm unsure if you're familiar with the DHCP options of a normal EC2 instance, but it usually takes something like renewing the IP configuration to force the flag to propagate; since Fargate is a managed system, I was unable to find a way to do so from the node itself. I have created another post here attempting to answer that question.
Another quick FYI: if your pod execution role doesn't have access to EFS, you will need to add a policy that allows access (I just used the AmazonElasticFileSystemFullAccess managed policy for the time being in order to get things working). Once again, you will have to relaunch your whole cluster for this role change to propagate if you haven't already done so!
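The DNS check and fix described above could be scripted with the AWS CLI roughly as follows. This is a sketch: the function names and the AWS_CLI override are hypothetical, and the VPC ID is whatever VPC your cluster runs in.

```shell
#!/bin/sh
# AWS_CLI can be overridden (for example AWS_CLI=echo) for a dry run.
AWS_CLI="${AWS_CLI:-aws}"

# Show the two DNS attributes of a VPC that EFS name resolution relies on.
check_vpc_dns() {
  vpc_id="$1"
  "$AWS_CLI" ec2 describe-vpc-attribute --vpc-id "$vpc_id" --attribute enableDnsSupport
  "$AWS_CLI" ec2 describe-vpc-attribute --vpc-id "$vpc_id" --attribute enableDnsHostnames
}

# Turn on DNS hostnames for the VPC.
enable_vpc_dns_hostnames() {
  vpc_id="$1"
  "$AWS_CLI" ec2 modify-vpc-attribute --vpc-id "$vpc_id" \
    --enable-dns-hostnames '{"Value":true}'
}
```

Per the answer above, Fargate pods that already exist may not pick the change up until they (or the cluster) are recreated.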

Cannot create deployment of nginx tasks definitions on AWS Fargate using virtual-kubelet

I am unable to deploy nginx containers using kubectl to AWS Fargate using virtual-kubelet. I am following this guide: https://aws.amazon.com/blogs/opensource/aws-fargate-virtual-kubelet/.
I am having an issue with Step 6: Create Kubernetes objects.
I would like to know why the nginx containers are PENDING and why the AWS Fargate task definitions have not been created.
The following are some of the commands I used. I can give more detail upon request.
# ./virtual-kubelet --provider aws --provider-config fargate.toml
...
2019/05/16 06:50:24 Received NodeDaemonEndpoints request.
ERRO[0000] TLS certificates not provided, not setting up pod http server certPath= keyPath= node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Initialized node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Created node node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Node leases not supported, falling back to only node status updates node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Pod cache in-sync node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
2019/05/16 06:50:25 Received GetPods request.
2019/05/16 06:50:25 Responding to GetPods: [].
INFO[0000] starting workers node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] started workers node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
# kubectl describe node virtual-kubelet
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedToCreateRoute 98s (x951 over 160m) route_controller (combined from similar events): Could not create route e1e32758-77a6-11e9-a68e-0a95bb07bfa2 100.96.4.0/24 for node virtual-kubelet after 47.871544ms: instance not found
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-20-47-10.eu-west-2.compute.internal Ready master 30h v1.14.1
ip-172-20-47-242.eu-west-2.compute.internal Ready node 30h v1.14.1
ip-172-20-59-102.eu-west-2.compute.internal Ready node 30h v1.14.1
virtual-kubelet Ready agent 33m v1.13.1-vk-v0.9.0-40-g5b3190ac-dev
kubectl create -f nginx-deployment.yaml
# kubectl get deployments -o wide
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-c6695csfc-5f7bh 0/1 Pending 0 21m <none> <none> <none> <none>
nginx-deployment-c6695csfc-bwfb8 0/1 Pending 0 21m <none> <none> <none> <none>
nginx-deployment-c6695csfc-mcfvw 0/1 Pending 0 21m <none> <none> <none> <none>
# kubectl describe pod nginx-deployment-c6695csfc-5f7bh
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m11s (x191 over 22m) default-scheduler 0/4 nodes are available: 1 Insufficient cpu, 1 node(s) had taints that the pod didn't tolerate, 3 node(s) didn't match node selector.
Update:
I then ran the command to add the nodeSelector to my nodes using the following command for each node:
kubectl label nodes ip-172-20-47-15.eu-west-2.compute.internal type=virtual-kubelet
type=virtual-kubelet is the nodeSelector specified in the manifest file, nginx-deployment.yaml.
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-c6695csfc-5f7bh 1/1 Running 0 4m59s 100.96.2.7 ip-172-20-47-242.eu-west-2.compute.internal <none> <none>
nginx-deployment-c6695csfc-bwfb8 1/1 Running 0 4m59s 100.96.1.6 ip-172-20-59-102.eu-west-2.compute.internal <none> <none>
nginx-deployment-c6695csfc-mcfvw 1/1 Running 0 4m59s 100.96.2.8 ip-172-20-47-242.eu-west-2.compute.internal <none>
Now when I go to the AWS Fargate Dashboard the associated task definitions are not created as shown in the tutorial.
This issue is resolved. I was able to create the AWS Fargate task definitions by adding the ALB security group to the fargate.toml file and by adding tolerations to the nginx-deployment.yaml file as shown below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.7.9
      tolerations:
        - key: virtual-kubelet.io/provider
          operator: Equal
          value: azure
          effect: NoSchedule
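A toleration only helps if its key, value and effect match the taint that is actually set on the node, so it may be worth reading the taints back before copying the values above. A sketch, with a hypothetical helper name and the node name taken from the question:

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Print the taints currently set on a node.
show_node_taints() {
  node="$1"   # e.g. virtual-kubelet
  "$KUBECTL" get node "$node" -o jsonpath='{.spec.taints}'
}
```

Running show_node_taints virtual-kubelet should print the key, value and effect that the tolerations in the manifest need to mirror.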

Trouble mounting an EBS to a Pod in a Kubernetes cluster

The cluster that I use is bootstrapped using kubeadm and it's deployed on AWS.
sudo kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:51:33Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
I am trying to configure a pod to mount a persistent volume (I am leaving PV and PVC objects aside for the moment). This is the manifest I used:
apiVersion: v1
kind: Pod
metadata:
  name: mongodb-aws
spec:
  volumes:
    - name: mongodb-data
      awsElasticBlockStore:
        volumeID: vol-xxxxxx
        fsType: ext4
  containers:
    - image: mongo
      name: mongodb
      volumeMounts:
        - name: mongodb-data
          mountPath: /data/db
      ports:
        - containerPort: 27017
          protocol: TCP
At first I got this error from the pod's logs:
"mount: special device /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/vol-xxxx does not exist"
After some research, I discovered that I have to set a cloud provider, and this is what I have been trying to do for the past 10 hours. I tested many suggestions but none worked. I tried tagging all the resources used by the cluster as mentioned in https://github.com/kubernetes/kubernetes/issues/53538#issuecomment-345942305, and I also tried the official approach for running in-tree cloud providers with kubeadm (https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/):
kubeadm_config.yml file:
apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: "aws"
    cloud-config: "/etc/kubernetes/cloud.conf"
---
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1alpha3
kubernetesVersion: v1.12.0
apiServerExtraArgs:
  cloud-provider: "aws"
  cloud-config: "/etc/kubernetes/cloud.conf"
apiServerExtraVolumes:
  - name: cloud
    hostPath: "/etc/kubernetes/cloud.conf"
    mountPath: "/etc/kubernetes/cloud.conf"
controllerManagerExtraArgs:
  cloud-provider: "aws"
  cloud-config: "/etc/kubernetes/cloud.conf"
controllerManagerExtraVolumes:
  - name: cloud
    hostPath: "/etc/kubernetes/cloud.conf"
    mountPath: "/etc/kubernetes/cloud.conf"
In /etc/kubernetes/cloud.conf I put :
[Global]
KubernetesClusterTag=kubernetes
KubernetesClusterID=kubernetes
After running kubeadm init --config kubeadm_config.yml I had these errors:
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
couldn't initialize a Kubernetes cluster
The Control Plane is not created
When I removed:
apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: "aws"
    cloud-config: "/etc/kubernetes/cloud.conf"
from kubeadm_config.yml and ran kubeadm init --config kubeadm_config.yml, the Kubernetes master initialized successfully, but when I executed kubectl get pods --all-namespaces, I got:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-ip-172-31-31-160 1/1 Running 0 11m
kube-system kube-apiserver-ip-172-31-31-160 1/1 Running 0 11m
kube-system kube-controller-manager-ip-172-31-31-160 0/1 CrashLoopBackOff 6 11m
kube-system kube-scheduler-ip-172-31-31-160 1/1 Running 0 10m
The controller manager didn't run. However, the --cloud-provider=aws command-line flag is present for the apiserver (in /etc/kubernetes/manifests/kube-apiserver.yaml) and also for the controller manager (in /etc/kubernetes/manifests/kube-controller-manager.yaml).
When I ran sudo kubectl logs kube-controller-manager-ip-172-31-13-85 -n kube-system I got:
Flag --address has been deprecated, see --bind-address instead.
I1126 11:27:35.006433 1 serving.go:293] Generated self-signed cert (/var/run/kubernetes/kube-controller-manager.crt, /var/run/kubernetes/kube-controller-manager.key)
I1126 11:27:35.811493 1 controllermanager.go:143] Version: v1.12.0
I1126 11:27:35.812091 1 secure_serving.go:116] Serving securely on [::]:10257
I1126 11:27:35.812605 1 deprecated_insecure_serving.go:50] Serving insecurely on 127.0.0.1:10252
I1126 11:27:35.812760 1 leaderelection.go:187] attempting to acquire leader lease kube-system/kube-controller-manager...
I1126 11:27:53.260484 1 leaderelection.go:196] successfully acquired lease kube-system/kube-controller-manager
I1126 11:27:53.261474 1 event.go:221] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"b0da1291-f16d-11e8-baeb-02a38a37cfd6", APIVersion:"v1", ResourceVersion:"449", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-172-31-13-85_4603714e-f16e-11e8-8d9d-02a38a37cfd6 became leader
I1126 11:27:53.290493 1 aws.go:1042] Building AWS cloudprovider
I1126 11:27:53.290642 1 aws.go:1004] Zone not specified in configuration file; querying AWS metadata service
F1126 11:27:53.296760 1 controllermanager.go:192] error building controller context: cloud provider could not be initialized: could not init cloud provider "aws": error finding instance i-0b063e2a3c9797398: "error listing AWS instances: \"NoCredentialProviders: no valid providers in chain. Deprecated.\\n\\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors\""
I didn’t try to downgrade kubeadm (to be able to use manifests with only kind: MasterConfiguration)
If you need more information, please feel free to ask.

Can't access Prometheus from public IP on aws

I used kops to install a Kubernetes cluster on AWS.
Then I installed Prometheus with Helm:
$ helm install stable/prometheus \
--set server.persistentVolume.enabled=false \
--set alertmanager.persistentVolume.enabled=false
Then I followed this note to set up port forwarding:
Get the Prometheus server URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 9090
My EC2 instance's public IP on AWS is 12.29.43.14 (not the real address). When I tried to access it from a browser:
http://12.29.43.14:9090
I can't access the page. Why?
Another issue: after installing the Prometheus chart, the alertmanager pod didn't run:
ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4 1/2 CrashLoopBackOff 1 9s
ungaged-woodpecker-prometheus-kube-state-metrics-5fd97698cktsj5 1/1 Running 0 9s
ungaged-woodpecker-prometheus-node-exporter-45jtn 1/1 Running 0 9s
ungaged-woodpecker-prometheus-node-exporter-ztj9w 1/1 Running 0 9s
ungaged-woodpecker-prometheus-pushgateway-57b67c7575-c868b 0/1 Running 0 9s
ungaged-woodpecker-prometheus-server-7f858db57-w5h2j 1/2 Running 0 9s
Check pod details:
$ kubectl describe po ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4
Name: ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4
Namespace: default
Node: ip-100.200.0.1.ap-northeast-1.compute.internal/100.200.0.1
Start Time: Fri, 26 Jan 2018 02:45:10 +0000
Labels: app=prometheus
component=alertmanager
pod-template-hash=2959465499
release=ungaged-woodpecker
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff","uid":"ec...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container prometheus-alertmanager; cpu request for container prometheus-alertmanager-configmap-reload
Status: Running
IP: 100.96.6.91
Created By: ReplicaSet/ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff
Controlled By: ReplicaSet/ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff
Containers:
prometheus-alertmanager:
Container ID: docker://e9fe9d7bd4f78354f2c072d426fa935d955e0d6748c4ab67ebdb84b51b32d720
Image: prom/alertmanager:v0.9.1
Image ID: docker-pullable://prom/alertmanager@sha256:ed926b227327eecfa61a9703702c9b16fc7fe95b69e22baa656d93cfbe098320
Port: 9093/TCP
Args:
--config.file=/etc/config/alertmanager.yml
--storage.path=/data
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 26 Jan 2018 02:45:26 +0000
Finished: Fri, 26 Jan 2018 02:45:26 +0000
Ready: False
Restart Count: 2
Requests:
cpu: 100m
Readiness: http-get http://:9093/%23/status delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
prometheus-alertmanager-configmap-reload:
Container ID: docker://9320a0f157aeee7c3947027667aa6a2e00728d7156520c19daec7f59c1bf6534
Image: jimmidyson/configmap-reload:v0.1
Image ID: docker-pullable://jimmidyson/configmap-reload@sha256:2d40c2eaa6f435b2511d0cfc5f6c0a681eeb2eaa455a5d5ac25f88ce5139986e
Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://localhost:9093/-/reload
State: Running
Started: Fri, 26 Jan 2018 02:45:11 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: ungaged-woodpecker-prometheus-alertmanager
Optional: false
storage-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-wppzm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-wppzm
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 34s default-scheduler Successfully assigned ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4 to ip-100.200.0.1.ap-northeast-1.compute.internal
Normal SuccessfulMountVolume 34s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "storage-volume"
Normal SuccessfulMountVolume 34s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "config-volume"
Normal SuccessfulMountVolume 34s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "default-token-wppzm"
Normal Pulled 33s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Container image "jimmidyson/configmap-reload:v0.1" already present on machine
Normal Created 33s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Created container
Normal Started 33s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Started container
Normal Pulled 18s (x3 over 34s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Container image "prom/alertmanager:v0.9.1" already present on machine
Normal Created 18s (x3 over 34s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Created container
Normal Started 18s (x3 over 33s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Started container
Warning BackOff 2s (x4 over 32s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Back-off restarting failed container
Warning FailedSync 2s (x4 over 32s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Error syncing pod
I'm not sure why it reports FailedSync.
When you run kubectl port-forward with that command, it makes the port available on the localhost of the machine where kubectl runs. So run the command and then open http://localhost:9090 on that same machine.
You won't be able to hit the Prometheus ports directly from the public IP, outside the cluster. In the longer run you may want to expose Prometheus at a nice domain name via ingress (which the chart supports); that's how I'd do it. To use the chart's ingress support you will need to install an ingress controller in your cluster (the nginx ingress controller, for example), and then enable ingress by setting --set server.ingress.enabled=true and --set server.ingress.hosts[0]=prometheus.yourdomain.com. Ingress is a fairly large topic in itself, so I'll just refer you to the official docs for that one:
https://kubernetes.io/docs/concepts/services-networking/ingress/
And here's the nginx ingress controller:
https://github.com/kubernetes/ingress-nginx
As for the pod that is showing FailedSync, take a look at the logs using kubectl logs ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4 -c prometheus-alertmanager (the pod has two containers, so the container must be named) to see if there's any additional information there.
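Because the alertmanager container keeps restarting, the output of its last crashed run is usually the most useful. A sketch of how that could be fetched; the helper name is hypothetical, and the pod and container names come from the describe output in the question:

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Print the logs of the previous (crashed) instance of one container in a pod.
crashed_container_logs() {
  pod="$1"         # e.g. ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4
  container="$2"   # e.g. prometheus-alertmanager
  "$KUBECTL" logs "$pod" -c "$container" --previous
}
```

An exit code of 1 right at startup, as shown in the describe output, often points at a configuration or argument error that the container prints just before exiting.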