Installed k8s cluster with kops on AWS.
Deployed prometheus with Helm in the k8s cluster:
$ helm install stable/prometheus
It has alertmanager configuration with some manifest files:
https://github.com/kubernetes/charts/tree/master/stable/prometheus/templates
After the installation finished, check pods:
$ kubectl get po
soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx 1/2 CrashLoopBackOff 5 5m
soft-flee-monitoring-kube-state-metrics-ff9b86484-lwdvm 1/1 Running 0 5m
soft-flee-monitoring-node-exporter-ckd2r 1/1 Running 0 5m
soft-flee-monitoring-node-exporter-rwclt 0/1 Pending 0 1s
soft-flee-monitoring-pushgateway-99986f-4thpx 1/1 Running 0 5m
soft-flee-monitoring-server-558b4895c8-f56hg 0/2 Pending 0 5m
See failure reason:
$ kubectl describe po soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx
Name: soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx
Namespace: default
Node: ip-100.200.0.1.ap-northeast-1.compute.internal/100.200.0.1
Start Time: Thu, 25 Jan 2018 09:39:34 +0000
Labels: app=monitoring
component=alertmanager
pod-template-hash=1912934358
release=soft-flee
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"soft-flee-monitoring-alertmanager-5f56f7879d","uid":"a4e136ae-01...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container monitoring-alertmanager; cpu request for container monitoring-alertmanager-configmap-reload
Status: Running
IP: 100.96.6.83
Created By: ReplicaSet/soft-flee-monitoring-alertmanager-5f56f7879d
Controlled By: ReplicaSet/soft-flee-monitoring-alertmanager-5f56f7879d
Containers:
monitoring-alertmanager:
Container ID: docker://700dc92be231da0a5059e4645ba03a5cac762e8e41d3dc04b9be17a10ebfdcbb
Image: prom/alertmanager:v0.9.1
Image ID: docker-pullable://prom/alertmanager#sha256:ed926b227327eecfa61a9703702c9b16fc7fe95b69e22baa656d93cfbe098320
Port: 9093/TCP
Args:
--config.file=/etc/config/alertmanager.yml
--storage.path=/data
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 25 Jan 2018 09:40:19 +0000 Finished: Thu, 25 Jan 2018 09:40:19 +0000
Ready: False
Restart Count: 2
Requests:
cpu: 100m
Readiness: http-get http://:9093/%23/status delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
monitoring-alertmanager-configmap-reload:
Container ID: docker://0231fbc4dbe21d423d6bed858d70387cdfac60c2adb2d87a6a7087bf260ace74
Image: jimmidyson/configmap-reload:v0.1
Image ID: docker-pullable://jimmidyson/configmap-reload#sha256:2d40c2eaa6f435b2511d0cfc5f6c0a681eeb2eaa455a5d5ac25f88ce5139986e
Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://localhost:9093/-/reload
State: Running
Started: Thu, 25 Jan 2018 09:40:03 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: soft-flee-monitoring-alertmanager
Optional: false
storage-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: soft-flee-monitoring-alertmanager
ReadOnly: false
default-token-wppzm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-wppzm
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 1m (x3 over 1m) default-scheduler PersistentVolumeClaim is not bound: "soft-flee-monitoring-alertmanager" (repeated 5 times)
Normal Scheduled 1m default-scheduler Successfully assigned soft-flee-monitoring-alertmanager-5f56f7879d-sg5lx to ip-100.200.0.1.ap-northeast-1.compute.internal
Normal SuccessfulMountVolume 1m kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "default-token-wppzm"
Warning FailedMount 1m attachdetach AttachVolume.Attach failed for volume "pvc-a4c420a5-01b3-11e8-a981-06b56e90ab12" : Error attaching EBS volume "vol-0c8c9d3794bdbec90" to instance "i-0cf5ecba708a2ffe7": "IncorrectState: vol-0c8c9d3794bdbec90 is not 'available'.\n\tstatus code: 400, request id: ccda67b9-076f-4b95-93b8-86c4ca5f4229"
Normal SuccessfulMountVolume 1m kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "config-volume"
Normal SuccessfulMountVolume 56s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "pvc-a4c420a5-01b3-11e8-a981-06b56e90ab12"
Normal Pulling 55s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal pulling image "prom/alertmanager:v0.9.1"
Normal Pulled 50s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Successfully pulled image "prom/alertmanager:v0.9.1"
Normal Pulling 50s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal pulling image "jimmidyson/configmap-reload:v0.1"
Normal Created 44s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Created container
Normal Started 44s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Started container
Normal Pulled 44s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Successfully pulled image "jimmidyson/configmap-reload:v0.1"
Normal Created 28s (x3 over 50s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Created container
Normal Started 28s (x3 over 50s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Started container
Normal Pulled 28s (x2 over 44s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Container image "prom/alertmanager:v0.9.1" already present on machine
Warning BackOff 12s (x4 over 43s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Back-off restarting failed container
Warning FailedSync 12s (x4 over 43s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Error syncing pod
It got FailedMount error:
AttachVolume.Attach failed for volume "pvc-a4c420a5-01b3-11e8-a981-06b56e90ab12" : Error attaching EBS volume "vol-0c8c9d3794bdbec90" to instance
But when I check volume vol-0c8c9d3794bdbec90, it's running. Why caused this error?
If you have setup your cluster with KOPS apparently the Persistent Volumes are created automatically for you. You will get that error above but after a few minutes it will go away.
I 2 created PV that matched the size of the volume claims in the values file so that when helm was going to claim them they would get used. But actually 2 new PV were created and were claimed.
This is how i created my volumes:
aws ec2 create-volume --availability-zone=us-east-1c --size=2
--volume-type=gp2 --no-encrypted --tag-specifications "ResourceType=volume,Tags=[{Key=myproject,Value=prometheus-server}]"
Related
I'm trying to deploy a simple REST API written in Golang to AWS EKS.
I created an EKS cluster on AWS using Terraform and applied the AWS load balancer controller Helm chart to it.
All resources in the cluster look like:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/aws-load-balancer-controller-5947f7c854-fgwk2 1/1 Running 0 75m
kube-system pod/aws-load-balancer-controller-5947f7c854-gkttb 1/1 Running 0 75m
kube-system pod/aws-node-dfc7r 1/1 Running 0 120m
kube-system pod/aws-node-hpn4z 1/1 Running 0 120m
kube-system pod/aws-node-s6mng 1/1 Running 0 120m
kube-system pod/coredns-66cb55d4f4-5l7vm 1/1 Running 0 127m
kube-system pod/coredns-66cb55d4f4-frk6p 1/1 Running 0 127m
kube-system pod/kube-proxy-6ndf5 1/1 Running 0 120m
kube-system pod/kube-proxy-s95qk 1/1 Running 0 120m
kube-system pod/kube-proxy-vdrdd 1/1 Running 0 120m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 127m
kube-system service/aws-load-balancer-webhook-service ClusterIP 10.100.202.90 <none> 443/TCP 75m
kube-system service/kube-dns ClusterIP 10.100.0.10 <none> 53/UDP,53/TCP 127m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/aws-node 3 3 3 3 3 <none> 127m
kube-system daemonset.apps/kube-proxy 3 3 3 3 3 <none> 127m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/aws-load-balancer-controller 2/2 2 2 75m
kube-system deployment.apps/coredns 2/2 2 2 127m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/aws-load-balancer-controller-5947f7c854 2 2 2 75m
kube-system replicaset.apps/coredns-66cb55d4f4 2 2 2 127m
I can run the application locally with Go and with Docker. But releasing this on AWS EKS always throws CrashLoopBackOff.
Running kubectl describe pod PODNAME shows:
Name: go-api-55d74b9546-dkk9g
Namespace: default
Priority: 0
Node: ip-172-16-1-191.ec2.internal/172.16.1.191
Start Time: Tue, 15 Mar 2022 07:04:08 -0700
Labels: app=go-api
pod-template-hash=55d74b9546
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 172.16.1.195
IPs:
IP: 172.16.1.195
Controlled By: ReplicaSet/go-api-55d74b9546
Containers:
go-api:
Container ID: docker://a4bc07b60c85fd308157d967d2d0d688d8eeccfe4c829102eb929ca82fb25595
Image: saurabhmish/golang-hello:latest
Image ID: docker-pullable://saurabhmish/golang-hello#sha256:f79a495ad17710b569136f611ae3c8191173400e2cbb9cfe416e75e2af6f7874
Port: 3000/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 15 Mar 2022 07:09:50 -0700
Finished: Tue, 15 Mar 2022 07:09:50 -0700
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jt4gp (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-jt4gp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m31s default-scheduler Successfully assigned default/go-api-55d74b9546-dkk9g to ip-172-16-1-191.ec2.internal
Normal Pulled 7m17s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 12.77458991s
Normal Pulled 7m16s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 110.127771ms
Normal Pulled 7m3s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 109.617419ms
Normal Created 6m37s (x4 over 7m17s) kubelet Created container go-api
Normal Started 6m37s (x4 over 7m17s) kubelet Started container go-api
Normal Pulled 6m37s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 218.952336ms
Normal Pulling 5m56s (x5 over 7m30s) kubelet Pulling image "saurabhmish/golang-hello:latest"
Normal Pulled 5m56s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 108.105083ms
Warning BackOff 2m28s (x24 over 7m15s) kubelet Back-off restarting failed container
Running kubectl logs PODNAME and kubectl logs PODNAME -c go-api shows standard_init_linux.go:228: exec user process caused: exec format error
Manifests:
go-deploy.yaml ( This is the Docker Hub Image with documentation )
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: go-api
labels:
app: go-api
spec:
replicas: 2
selector:
matchLabels:
app: go-api
strategy: {}
template:
metadata:
labels:
app: go-api
spec:
containers:
- name: go-api
image: saurabhmish/golang-hello:latest
ports:
- containerPort: 3000
resources: {}
go-service.yaml
---
kind: Service
apiVersion: v1
metadata:
name: go-api
spec:
selector:
app: go-api
type: NodePort
ports:
- protocol: TCP
port: 80
targetPort: 3000
How can I fix this error ?
Posting this as Community wiki for better visibility.
Feel free to expand it.
Thanks to #David Maze, who pointed to the solution. There is an article 'Build Intel64-compatible Docker images from Mac M1 (ARM)' (by Beppe Catanese) here.
This article describes the underlying problem well.
You are developing/building on the ARM architecture (Mac M1), but you deploy the docker image to a x86-64 architecture based Kubernetes cluster.
Solution:
Option A: use buildx
Buildx is a Docker plugin that allows, amongst other features, to build images for various target platforms.
$ docker buildx build --platform linux/amd64 -t myapp .
Option B: set DOCKER_DEFAULT_PLATFORM
The DOCKER_DEFAULT_PLATFORM environment variable permits to set the default platform for the commands that take the --platform flag.
export DOCKER_DEFAULT_PLATFORM=linux/amd64
A CrashloopBackOff means that you have a pod starting, crashing, starting again, and then crashing again.
Maybe the error come from the application itself that it can not connect to database, redis,...
You may find something useful here:
My kubernetes pods keep crashing with "CrashLoopBackOff" but I can't find any log
I have a cluster and node creates in AWS EKS. I applied the deployment to that cluster as under
kubectl apply -f deployment.yaml
Where deployment.yaml contains the containers' specification along with DockerHub repo and image
However, I did a mistake in deployment.yaml and I need to re-apply it to the configuration
My question is:
1 - How do I reapply a deployment.yaml to the AWS EKS cluster using kubectl?
Just running the above command is not working (kubectl apply -f deployment.yaml)
2- After I re-apply the deployment.yaml , will the node will go an pick up the DockerHub image or do I still need to do something else( supposing all the other details are ok)
Some outputs below:
>> kubectl get pods
my-app-786dc95d8f-b6w4h 0/1 ImagePullBackOff 0 9h
my-app-786dc95d8f-w8hkg 0/1 ImagePullBackOff 0 9h
kubectl describe pod my-app-786dc95d8f-b6w4h
Name: my-app-786dc95d8f-b6w4h
Namespace: default
Priority: 0
Node: ip-192-168-24-13.ec2.internal/192.168.24.13
Start Time: Fri, 10 Jul 2020 12:54:38 -0400
Labels: app=my-app
pod-template-hash=786dc95d8f
Annotations: kubernetes.io/psp: eks.privileged
Status: Pending
IP: 192.168.7.235
IPs:
IP: 192.168.7.235
Controlled By: ReplicaSet/my-app-786dc95d8f
Containers:
simple-node:
Container ID:
Image: BAD_REPO/simple-node
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mwwvl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-mwwvl:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-mwwvl
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 17m (x2570 over 9h) kubelet, ip-192-168-24-13.ec2.internal Back-off pulling image "BAD_REPO/simple-node"
Warning Failed 2m48s (x2634 over 9h) kubelet, ip-192-168-24-13.ec2.internal Error: ImagePullBackOff
BR
if you need to change image:
kubectl set image deployment.v1.apps/{your_deployment_name} image_name:tag
but you always can do
kubectl delete -f deployment.yaml
kubectl create -f deployment.yaml
since your image is in ImagePullBackOff - it doesn't work anyway and you can just recreate deployment. Usually you don't do drop/create on prod. that is why i am using image change all the time. just have to change tag on every new image.
ImagePullBackOff means that kubernetes is not able to pull the image.
Specially, the service account "default" is not able to pull the image.
To fix this issue, you need two checks:
Check that you don't have typo in the image name and tag. And that image is available publically.
If the Docker registry is private, make sure to create secret with dockerlogin type, and then patch the service account "default" by this secret.
I am using Jenkins-X for a relatively large project, which consists of approximately 30 modules, 15 of which are services (and therefore, contain Dockerfiles, and a respective Helm chart for deployment).
During some of these relatively large builds, I am intermittently (~every other build) seeing a build pod become evicted, using kubectl describe pod <podname> I can investigate and I've noticed that the pod is evicted due to the following:
the node was low on resource imagefs
Full data:
Name: maven-96wmn
Namespace: jx
Node: ip-192-168-66-176.eu-west-1.compute.internal/
Start Time: Tue, 06 Nov 2018 10:22:54 +0000
Labels: jenkins=slave
jenkins/jenkins-maven=true
Annotations: <none>
Status: Failed
Reason: Evicted
Message: The node was low on resource: imagefs.
IP:
Containers:
maven:
Image: jenkinsxio/builder-maven:0.0.516
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
Args:
cat
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 400m
memory: 512Mi
Environment:
JENKINS_SECRET: 131c407141521c0842f62a69004df926be6cb531f9318edf0885aeb96b0662b4
JENKINS_TUNNEL: jenkins-agent:50000
DOCKER_CONFIG: /home/jenkins/.docker/
GIT_AUTHOR_EMAIL: jenkins-x#googlegroups.com
GIT_COMMITTER_EMAIL: jenkins-x#googlegroups.com
GIT_COMMITTER_NAME: jenkins-x-bot
_JAVA_OPTIONS: -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Dsun.zip.disableMemoryMapping=true -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -Xms10m -Xmx192m
GIT_AUTHOR_NAME: jenkins-x-bot
JENKINS_NAME: maven-96wmn
XDG_CONFIG_HOME: /home/jenkins
JENKINS_URL: http://jenkins:8080
HOME: /home/jenkins
Mounts:
/home/jenkins from workspace-volume (rw)
/home/jenkins/.docker from volume-2 (rw)
/home/jenkins/.gnupg from volume-3 (rw)
/root/.m2 from volume-1 (rw)
/var/run/docker.sock from volume-0 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from jenkins-token-smvvp (ro)
jnlp:
Image: jenkinsci/jnlp-slave:3.14-1
Port: <none>
Host Port: <none>
Args:
131c407141521c0842f62a69004df926be6cb531f9318edf0885aeb96b0662b4
maven-96wmn
Requests:
cpu: 100m
memory: 128Mi
Environment:
JENKINS_SECRET: 131c407141521c0842f62a69004df926be6cb531f9318edf0885aeb96b0662b4
JENKINS_TUNNEL: jenkins-agent:50000
DOCKER_CONFIG: /home/jenkins/.docker/
GIT_AUTHOR_EMAIL: jenkins-x#googlegroups.com
GIT_COMMITTER_EMAIL: jenkins-x#googlegroups.com
GIT_COMMITTER_NAME: jenkins-x-bot
_JAVA_OPTIONS: -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Dsun.zip.disableMemoryMapping=true -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -Xms10m -Xmx192m
GIT_AUTHOR_NAME: jenkins-x-bot
JENKINS_NAME: maven-96wmn
XDG_CONFIG_HOME: /home/jenkins
JENKINS_URL: http://jenkins:8080
HOME: /home/jenkins
Mounts:
/home/jenkins from workspace-volume (rw)
/home/jenkins/.docker from volume-2 (rw)
/home/jenkins/.gnupg from volume-3 (rw)
/root/.m2 from volume-1 (rw)
/var/run/docker.sock from volume-0 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from jenkins-token-smvvp (ro)
Volumes:
volume-0:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType:
volume-2:
Type: Secret (a volume populated by a Secret)
SecretName: jenkins-docker-cfg
Optional: false
volume-1:
Type: Secret (a volume populated by a Secret)
SecretName: jenkins-maven-settings
Optional: false
workspace-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
volume-3:
Type: Secret (a volume populated by a Secret)
SecretName: jenkins-release-gpg
Optional: false
jenkins-token-smvvp:
Type: Secret (a volume populated by a Secret)
SecretName: jenkins-token-smvvp
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Created 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal Created container
Normal SuccessfulMountVolume 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal MountVolume.SetUp succeeded for volume "workspace-volume"
Normal SuccessfulMountVolume 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal MountVolume.SetUp succeeded for volume "volume-0"
Normal SuccessfulMountVolume 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal MountVolume.SetUp succeeded for volume "volume-1"
Normal SuccessfulMountVolume 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal MountVolume.SetUp succeeded for volume "volume-2"
Normal SuccessfulMountVolume 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal MountVolume.SetUp succeeded for volume "volume-3"
Normal SuccessfulMountVolume 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal MountVolume.SetUp succeeded for volume "jenkins-token-smvvp"
Normal Pulled 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal Container image "jenkinsxio/builder-maven:0.0.516" already present on machine
Normal Scheduled 7m default-scheduler Successfully assigned maven-96wmn to ip-192-168-66-176.eu-west-1.compute.internal
Normal Started 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal Started container
Normal Pulled 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal Container image "jenkinsci/jnlp-slave:3.14-1" already present on machine
Normal Created 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal Created container
Normal Started 7m kubelet, ip-192-168-66-176.eu-west-1.compute.internal Started container
Warning Evicted 5m kubelet, ip-192-168-66-176.eu-west-1.compute.internal The node was low on resource: imagefs.
Normal Killing 5m kubelet, ip-192-168-66-176.eu-west-1.compute.internal Killing container with id docker://jnlp:Need to kill Pod
Normal Killing 5m kubelet, ip-192-168-66-176.eu-west-1.compute.internal Killing container with id docker://maven:Need to kill Pod
How can I remedy this issue? I generally do not fully understand what imagefs is, how I configure / increase it, or avoid saturating it.
ps. sorry this post is written so passively, I had to use an active tone to make the wording wordy enough for SO to allow me to not just post a code snippet.
Resolved; due to underlying size of storage being only 20gb, changed to 50gb in EBS and rebooted the nodes (which had increased nodefs) which removed this problem (as imagefs no longer was saturated).
Use kops install k8s cluster on AWS.
Use Helm installed Prometheus:
$ helm install stable/prometheus \
--set server.persistentVolume.enabled=false \
--set alertmanager.persistentVolume.enabled=false
Then followed this note to do port-forward:
Get the Prometheus server URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 9090
My EC2 instance public IP on AWS is 12.29.43.14(not true). When I tried to access it from browser:
http://12.29.43.14:9090
Can't access the page. Why?
Another issue, after installed prometheus chart, the alertmanager pod didn't run:
ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4 1/2 CrashLoopBackOff 1 9s
ungaged-woodpecker-prometheus-kube-state-metrics-5fd97698cktsj5 1/1 Running 0 9s
ungaged-woodpecker-prometheus-node-exporter-45jtn 1/1 Running 0 9s
ungaged-woodpecker-prometheus-node-exporter-ztj9w 1/1 Running 0 9s
ungaged-woodpecker-prometheus-pushgateway-57b67c7575-c868b 0/1 Running 0 9s
ungaged-woodpecker-prometheus-server-7f858db57-w5h2j 1/2 Running 0 9s
Check pod details:
$ kubectl describe po ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4
Name: ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4
Namespace: default
Node: ip-100.200.0.1.ap-northeast-1.compute.internal/100.200.0.1
Start Time: Fri, 26 Jan 2018 02:45:10 +0000
Labels: app=prometheus
component=alertmanager
pod-template-hash=2959465499
release=ungaged-woodpecker
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff","uid":"ec...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container prometheus-alertmanager; cpu request for container prometheus-alertmanager-configmap-reload
Status: Running
IP: 100.96.6.91
Created By: ReplicaSet/ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff
Controlled By: ReplicaSet/ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff
Containers:
prometheus-alertmanager:
Container ID: docker://e9fe9d7bd4f78354f2c072d426fa935d955e0d6748c4ab67ebdb84b51b32d720
Image: prom/alertmanager:v0.9.1
Image ID: docker-pullable://prom/alertmanager#sha256:ed926b227327eecfa61a9703702c9b16fc7fe95b69e22baa656d93cfbe098320
Port: 9093/TCP
Args:
--config.file=/etc/config/alertmanager.yml
--storage.path=/data
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 26 Jan 2018 02:45:26 +0000
Finished: Fri, 26 Jan 2018 02:45:26 +0000
Ready: False
Restart Count: 2
Requests:
cpu: 100m
Readiness: http-get http://:9093/%23/status delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
prometheus-alertmanager-configmap-reload:
Container ID: docker://9320a0f157aeee7c3947027667aa6a2e00728d7156520c19daec7f59c1bf6534
Image: jimmidyson/configmap-reload:v0.1
Image ID: docker-pullable://jimmidyson/configmap-reload#sha256:2d40c2eaa6f435b2511d0cfc5f6c0a681eeb2eaa455a5d5ac25f88ce5139986e
Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://localhost:9093/-/reload
State: Running
Started: Fri, 26 Jan 2018 02:45:11 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: ungaged-woodpecker-prometheus-alertmanager
Optional: false
storage-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-wppzm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-wppzm
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 34s default-scheduler Successfully assigned ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4 to ip-100.200.0.1.ap-northeast-1.compute.internal
Normal SuccessfulMountVolume 34s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "storage-volume"
Normal SuccessfulMountVolume 34s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "config-volume"
Normal SuccessfulMountVolume 34s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "default-token-wppzm"
Normal Pulled 33s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Container image "jimmidyson/configmap-reload:v0.1" already present on machine
Normal Created 33s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Created container
Normal Started 33s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Started container
Normal Pulled 18s (x3 over 34s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Container image "prom/alertmanager:v0.9.1" already present on machine
Normal Created 18s (x3 over 34s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Created container
Normal Started 18s (x3 over 33s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Started container
Warning BackOff 2s (x4 over 32s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Back-off restarting failed container
Warning FailedSync 2s (x4 over 32s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Error syncing pod
Not sure why it FailedSync.
When you do a kubectl port-forward with that command it makes the port available on your localhost. So run the command and then hit http://localhost:9090.
You won't be able to directly hit the prometheus ports from the public IP, outside the cluster. In the longer run you may want expose prometheus at a nice domain name via ingress (which the chart supports), that's how I'd do it. To use the chart's support for ingress you will need to install an ingress controller in your cluster (like the nginx ingress controller for example), and then enable ingress by setting --set service.ingress.enabled=true and --set server.ingress.hosts[0]=prometheus.yourdomain.com. Ingress is a fairly large topic in itself, so I'll just refer you to the official docs for that one:
https://kubernetes.io/docs/concepts/services-networking/ingress/
And here's the nginx ingress controller:
https://github.com/kubernetes/ingress-nginx
As far as the pod that is showing FailedSync, take a look at the logs using kubectl logs ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4 to see if there's any additional information there.
When I use AWS EC2 deploy k8s cluster, deploy a test container on it:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: app-deployment-test
spec:
replicas: 2
template:
metadata:
labels:
app: app-test
tier: frontend
spec:
containers:
- name: app
image: ubuntu
ports:
- containerPort: 80
imagePullPolicy: Always
Got
Back-off restarting failed container
Error syncing pod
error on pods. When describe the special pod: kubectl describe pod app-deployment-test-ccddf7bcc-dqltq, got these messages at tail of the output:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 41s default-scheduler Successfully assigned app-deployment-test-ccddf7bcc-dqltq to app-instance
Normal SuccessfulMountVolume 41s kubelet, app-instance MountVolume.SetUp succeeded for volume "default-token-zrf98"
Normal Pulling 14s (x3 over 40s) kubelet, app-instance pulling image "ubuntu"
Normal Pulled 12s (x3 over 32s) kubelet, app-instance Successfully pulled image "ubuntu"
Normal Created 12s (x3 over 31s) kubelet, app-instance Created container
Normal Started 11s (x3 over 31s) kubelet, app-instance Started container
Warning BackOff 11s (x3 over 27s) kubelet, app-instance Back-off restarting failed container
Warning FailedSync 11s (x3 over 27s) kubelet, app-instance Error syncing pod
What will be the reason it failed?
The default command to ubuntu container is bash. This bash command will run one time and will stop the container, this is the expected behavior.
If you want to keep the container running add the command and argument below.
command: ["/bin/sh"]
args: ["-c", "while true; do echo hello; sleep 10;done"]
I recommend you to run the nginx container that will keep running until you kill the container.