I am setting up Vora 2.1 on a Kubernetes cluster created with kops on AWS, using this installer command:
./install.sh --accept-license --deployment-type=cloud --enable-rbac=no --namespace=vora --docker-registry=<localrepository>:5000 --vora-admin-username=voraadmin --vora-admin-password=<secret> --cert-domain=<customerdomain> --interactive-security-configuration=no --vsystem-storage-class=aws-efs --vsystem-load-nfs-modules
Below is my error:
Wait until pod vora-deployment-operator-cc84bff65-hgtt4 is running...
Wait until containers in the pod vora-deployment-operator-cc84bff65-hgtt4 are ready...
Wait until voracluster CRD is created...
No resources found.
Deploying vora-cluster with: helm install --namespace vora -f values.yaml -f /install/SAPVora-2.1.60-DistributedRuntime/stateful-replica-conf.yaml --set docker.registry=172.20.41.35:5000 --set rbac.enabled=false --set imagePullSecret= --set docker.imagePullSecret= --set version.package=2.1.60 --set docker.image=vora/dqp --set docker.imageTag=2.1.32.25-vora-2.1 --set components.globalParameters.security.docker.image=vora/init-security --set components.globalParameters.security.docker.imageTag=0.0.9 --set components.globalParameters.security.enable=true --set components.globalParameters.security.context=consumer --set components.globalParameters.security.contextRoot=/etc/vora-security --set version.component=2.1.32.25-vora-2.1 --set name=vora --set dontUseExternalStorage=false --set useHostPath=false --set components.disk.useHostPath=false --set components.dlog.useHostPath=false .
NAME: quaffing-cow
LAST DEPLOYED: Thu Mar 29 09:53:24 2018
NAMESPACE: vora
STATUS: DEPLOYED
RESOURCES:
==> v1/VoraCluster
NAME KIND
vora VoraCluster.v1.sap.com
Hang tight while we grab the latest from your chart repositories...
...Unable to get an update from the "local" chart repository (http://127.0.0.1:8879/charts):
Get http://127.0.0.1:8879/charts/index.yaml: dial tcp 127.0.0.1:8879: getsockopt: connection refused
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 1 charts
Downloading consul from repo https://kubernetes-charts.storage.googleapis.com/
Deleting outdated charts
vora-vsystem is already installed, skipping...
Deploying vora-thriftserver with: helm install --namespace vora -f values.yaml -f /install/SAPVora-2.1.60-DistributedRuntime/stateful-replica-conf.yaml --set docker.registry=172.20.41.35:5000 --set rbac.enabled=false --set imagePullSecret= --set docker.imagePullSecret= --set version.package=2.1.60 --set thriftserver.docker.image=vora/thriftserver --set thriftserver.docker.imageTag=2.1.14.25-vora-2.1 --set auth.enable=true --set secop.ctxRoot=/etc/vora-security --set secop.ctxName=consumer --set secop.docker.image=vora/init-security --set secop.docker.imageTag=0.0.9 --set version.component=2.1.14.25-vora-2.1 .
NAME: knotted-macaw
LAST DEPLOYED: Thu Mar 29 09:53:29 2018
NAMESPACE: vora
STATUS: DEPLOYED
RESOURCES:
==> v1/Service
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
vora-thriftserver 100.69.133.27 <none> 10001/TCP 1s
==> v1beta1/Deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
vora-thriftserver 1 1 1 0 1s
Authentication is enabled!
Running validation...
Wait until vora cluster is ready...
Wait until vora cluster is ready...
...........
Wait until vora cluster is ready...
Wait until vora cluster is ready...
Timeout while waiting for vora cluster! See below for more details:
Name: vora
Namespace: vora
Labels: <none>
Annotations: <none>
API Version: sap.com/v1
Kind: VoraCluster
Metadata:
Cluster Name:
Creation Timestamp: 2018-03-29T09:53:24Z
Generation: 0
Resource Version: 497995
Self Link: /apis/sap.com/v1/namespaces/vora/voraclusters/vora
UID: 055fc3ab-3337-11e8-8c30-0aa4c3a975fc
Spec:
Components:
Catalog:
Replicas: 1
Trace Level: info
Disk:
Db Space Size: 10000
Initial Delay Seconds: 180
Large Memory Limit: 3000
Main Cache Memory Limit: 3000
Network Drivers List: none
Pv:
Volume Claim Annotations: <nil>
Replicas: 1
Storage Size: 50Gi
Temporary Cache Memory Limit: 3000
Termination Grace Period Seconds: 300
Trace Level: info
Dlog:
Buffer Size: 4g
Initial Delay Seconds: 15
Pv:
Volume Claim Annotations: <nil>
Replication Factor: 2
Standby Factor: 1
Storage Size: 50Gi
Termination Grace Period Seconds: 60
Trace Level: info
Doc Store:
Replicas: 1
Trace Level: info
Global Parameters:
Health Check:
Deregister Timeout: 2m
Initial Delay Seconds: 15
Period Seconds: 5
Termination Grace Period Seconds: 60
Security:
Context: consumer
Context Root: /etc/vora-security
Image: 172.20.41.35:5000/vora/init-security:0.0.9
Trace Level: info
Graph:
Replicas: 1
Trace Level: info
Landscape:
Bootstrapping: True
Replicas: 1
Replication Factor: 1
Trace Level: info
Relational:
Replicas: 1
Trace Level: info
Time Series:
Replicas: 1
Trace Level: info
Tx Broker:
Replicas: 1
Trace Level: info
Tx Coordinator:
Node Port: 0
Replicas: 1
Service Type: NodePort
Trace Level: info
Tx Lock Manager:
Replicas: 1
Trace Level: info
Docker:
Image: 172.20.41.35:5000/vora/dqp:2.1.32.25-vora-2.1
Image Pull Secret:
Version:
Component: 2.1.32.25-vora-2.1
Package: 2.1.60
Status:
Message: Less available workers than Distributed Log requirements
State: Failed
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Update Vora Cluster 10m vora-deployment-operator Processing failed: less available workers than Distributeed Log requirements
New Vora Cluster 10m vora-deployment-operator Started processing
Timeout waiting for vora cluster! Please check the status of the cluster from above logs and kubernetes dashboard...
Here are some checks I ran afterwards:
kubectl get pods --namespace=vora -w
NAME READY STATUS RESTARTS AGE
vora-consul-0 1/1 Running 0 40m
vora-consul-1 1/1 Running 0 39m
vora-consul-2 1/1 Running 0 39m
vora-deployment-operator-cc84bff65-hgtt4 1/1 Running 0 38m
vora-elasticsearch-logging-v1-6cd4d466dc-gml9d 1/1 Running 0 38m
vora-elasticsearch-logging-v1-6cd4d466dc-k882r 1/1 Running 0 38m
vora-elasticsearch-retention-policy-5876dc64d4-6rb2l 1/1 Running 0 38m
vora-fluentd-kubernetes-v1.21-95xt2 1/1 Running 0 38m
vora-fluentd-kubernetes-v1.21-f856k 1/1 Running 0 38m
vora-grafana-7b5454487b-xgbjt 1/1 Running 0 38m
vora-grafana-set-datasource-nwkt4 0/1 Completed 1 38m
vora-kibana-logging-c9565b88f-wm87j 1/1 Running 0 38m
vora-kibana-logging-set-settings-h2vs2 0/1 Completed 1 38m
vora-prometheus-kube-state-metrics-57bb8bdb76-xlx4l 1/1 Running 0 38m
vora-prometheus-node-exporter-m7znt 1/1 Running 0 38m
vora-prometheus-node-exporter-mp5ls 1/1 Running 0 38m
vora-prometheus-pushgateway-85dcf9f96f-j74j2 1/1 Running 0 38m
vora-prometheus-pushgateway-cleaner-7ddf5657f-nwzrc 1/1 Running 0 38m
vora-prometheus-server-797df6d8fb-5s7zd 2/2 Running 0 38m
vora-security-operator-77f7fb9f5-zfs2z 1/1 Running 0 40m
vora-thriftserver-845646d95-5cz45 2/2 Running 0 38m
^C
admin@ip-172-20-41-35:/install/SAPVora-2.1.60-DistributedRuntime$ helm test kindred-clam
Error: release: "kindred-clam" not found
admin@ip-172-20-41-35:/install/SAPVora-2.1.60-DistributedRuntime$ kubectl exec vora-consul-0 consul members --namespace=vora | grep server
vora-consul-0 100.96.1.9:8301 alive server 0.9.0 2 dc1
vora-consul-1 100.96.0.18:8301 alive server 0.9.0 2 dc1
vora-consul-2 100.96.1.10:8301 alive server 0.9.0 2 dc1
It seems the installer did not create the cluster at all:
kubectl get vc CRD -n vora
Error from server (NotFound): voraclusters.sap.com "CRD" not found
Is there a way to create the cluster manually? Or is that not even my issue, and the problem is something else?
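The issue above is the error "Processing failed: less available workers than Distributeed Log requirements". The VoraCluster resource itself was created (it appears in the describe output above); kubectl get vc CRD -n vora only fails because it looks for a VoraCluster literally named "CRD".
With Vora 2.1, by default you need 1 master and 3 workers. The minimum size is 1 master and 2 workers. To use only 2 workers with Vora 2.1, you need to change the replicationFactor for DLOG in deployment/helm/vora-cluster/values.yaml: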
Original (3 workers needed; one for each DLOG):
dlog:
  replicationFactor: 2
  standbyFactor: 1
Minimum (2 workers; need to change replicationFactor):
dlog:
  replicationFactor: 1
  standbyFactor: 1
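The two configurations above suggest a rule of thumb: the DLOG needs replicationFactor plus standbyFactor worker nodes (2 + 1 = 3 and 1 + 1 = 2). This is only inferred from the two examples in this answer, not taken from SAP documentation, and the function names below are made up for illustration. The node count assumes kops-style role labels, where masters carry the node-role.kubernetes.io/master label. A minimal sketch:

```shell
# Rule of thumb inferred from the two examples above (an assumption, not
# an official SAP formula): workers needed = replicationFactor + standbyFactor.
required_dlog_workers() {
  echo $(( $1 + $2 ))
}

# Count the nodes that do not carry the master role label.
# Assumes masters are labelled node-role.kubernetes.io/master.
count_worker_nodes() {
  kubectl get nodes --no-headers -l '!node-role.kubernetes.io/master' | wc -l
}
```

For example, required_dlog_workers 2 1 prints 3 and required_dlog_workers 1 1 prints 2; comparing that with the output of count_worker_nodes shows whether the current values.yaml can be satisfied.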
How many nodes do you have? The recommended size is 1 master and 2 worker nodes. Typically, no Vora pods are scheduled on the master because it is a non-schedulable node, so all pods are scheduled on the worker nodes, and the dlog service requires at least 2 nodes. If you have only 2 nodes including the master, make the master schedulable. I hope this solves your problem.
Thanks, Frank!
By decreasing the replication factor, I was able to finish the install.
Now I can continue with the setup on the Hadoop cluster.
I'm deploying an EKS cluster and configuring managed node groups so that we have master and worker nodes.
I'm following this doc:
https://docs.aws.amazon.com/eks/latest/userguide/cni-iam-role.html
When I run this command:
kubectl get pods -n kube-system -l k8s-app=aws-node
I don't see any pod with that label, and I don't know why.
Is it due to a missing configuration, or did I miss something while deploying the EKS cluster?
Please suggest.
UPDATE 1
kubectl describe daemonset aws-node -n kube-system
output
Name: aws-node
Selector: k8s-app=aws-node
Node-Selector: <none>
Labels: app.kubernetes.io/instance=aws-vpc-cni
  app.kubernetes.io/name=aws-node
  app.kubernetes.io/version=v1.11.4
  k8s-app=aws-node
Annotations: deprecated.daemonset.template.generation: 2
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels: app.kubernetes.io/instance=aws-vpc-cni
    app.kubernetes.io/name=aws-node
    k8s-app=aws-node
  Service Account: aws-node
The kubectl get nodes command says "No resources found".
No pods will run if you don't have any worker nodes. The easiest way to add a worker node is in the AWS console: go to Amazon Elastic Kubernetes Service, click on your cluster, open the "Compute" tab, select the node group, click "Edit", and change "Desired size" to 1 or more.
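The same change can probably be made from the command line with the AWS CLI. The sketch below wraps aws eks update-nodegroup-config; the function names and the example cluster and node group names are hypothetical, and it assumes the AWS CLI is installed and configured for the account that owns the cluster.

```shell
# Build the shorthand value for --scaling-config from min, max and desired sizes.
scaling_config() {
  echo "minSize=$1,maxSize=$2,desiredSize=$3"
}

# Scale a managed node group.
# Arguments: cluster name, node group name, min size, max size, desired size.
scale_nodegroup() {
  aws eks update-nodegroup-config \
    --cluster-name "$1" \
    --nodegroup-name "$2" \
    --scaling-config "$(scaling_config "$3" "$4" "$5")"
}
```

For example, scale_nodegroup my-cluster my-nodegroup 1 3 2 requests two nodes. Once they have joined, kubectl get nodes should list them and the aws-node DaemonSet should schedule one pod per node.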
I'm trying to deploy a simple REST API written in Golang to AWS EKS.
I created an EKS cluster on AWS using Terraform and applied the AWS load balancer controller Helm chart to it.
All resources in the cluster look like:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/aws-load-balancer-controller-5947f7c854-fgwk2 1/1 Running 0 75m
kube-system pod/aws-load-balancer-controller-5947f7c854-gkttb 1/1 Running 0 75m
kube-system pod/aws-node-dfc7r 1/1 Running 0 120m
kube-system pod/aws-node-hpn4z 1/1 Running 0 120m
kube-system pod/aws-node-s6mng 1/1 Running 0 120m
kube-system pod/coredns-66cb55d4f4-5l7vm 1/1 Running 0 127m
kube-system pod/coredns-66cb55d4f4-frk6p 1/1 Running 0 127m
kube-system pod/kube-proxy-6ndf5 1/1 Running 0 120m
kube-system pod/kube-proxy-s95qk 1/1 Running 0 120m
kube-system pod/kube-proxy-vdrdd 1/1 Running 0 120m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 127m
kube-system service/aws-load-balancer-webhook-service ClusterIP 10.100.202.90 <none> 443/TCP 75m
kube-system service/kube-dns ClusterIP 10.100.0.10 <none> 53/UDP,53/TCP 127m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/aws-node 3 3 3 3 3 <none> 127m
kube-system daemonset.apps/kube-proxy 3 3 3 3 3 <none> 127m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/aws-load-balancer-controller 2/2 2 2 75m
kube-system deployment.apps/coredns 2/2 2 2 127m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/aws-load-balancer-controller-5947f7c854 2 2 2 75m
kube-system replicaset.apps/coredns-66cb55d4f4 2 2 2 127m
I can run the application locally with Go and with Docker. But releasing this on AWS EKS always throws CrashLoopBackOff.
Running kubectl describe pod PODNAME shows:
Name: go-api-55d74b9546-dkk9g
Namespace: default
Priority: 0
Node: ip-172-16-1-191.ec2.internal/172.16.1.191
Start Time: Tue, 15 Mar 2022 07:04:08 -0700
Labels: app=go-api
pod-template-hash=55d74b9546
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 172.16.1.195
IPs:
IP: 172.16.1.195
Controlled By: ReplicaSet/go-api-55d74b9546
Containers:
go-api:
Container ID: docker://a4bc07b60c85fd308157d967d2d0d688d8eeccfe4c829102eb929ca82fb25595
Image: saurabhmish/golang-hello:latest
Image ID: docker-pullable://saurabhmish/golang-hello@sha256:f79a495ad17710b569136f611ae3c8191173400e2cbb9cfe416e75e2af6f7874
Port: 3000/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 15 Mar 2022 07:09:50 -0700
Finished: Tue, 15 Mar 2022 07:09:50 -0700
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jt4gp (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-jt4gp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m31s default-scheduler Successfully assigned default/go-api-55d74b9546-dkk9g to ip-172-16-1-191.ec2.internal
Normal Pulled 7m17s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 12.77458991s
Normal Pulled 7m16s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 110.127771ms
Normal Pulled 7m3s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 109.617419ms
Normal Created 6m37s (x4 over 7m17s) kubelet Created container go-api
Normal Started 6m37s (x4 over 7m17s) kubelet Started container go-api
Normal Pulled 6m37s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 218.952336ms
Normal Pulling 5m56s (x5 over 7m30s) kubelet Pulling image "saurabhmish/golang-hello:latest"
Normal Pulled 5m56s kubelet Successfully pulled image "saurabhmish/golang-hello:latest" in 108.105083ms
Warning BackOff 2m28s (x24 over 7m15s) kubelet Back-off restarting failed container
Running kubectl logs PODNAME and kubectl logs PODNAME -c go-api shows:
standard_init_linux.go:228: exec user process caused: exec format error
Manifests:
go-deploy.yaml (the image is on Docker Hub, with documentation)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-api
  labels:
    app: go-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: go-api
  strategy: {}
  template:
    metadata:
      labels:
        app: go-api
    spec:
      containers:
      - name: go-api
        image: saurabhmish/golang-hello:latest
        ports:
        - containerPort: 3000
        resources: {}
go-service.yaml
---
kind: Service
apiVersion: v1
metadata:
  name: go-api
spec:
  selector:
    app: go-api
  type: NodePort
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
How can I fix this error?
Posting this as Community wiki for better visibility.
Feel free to expand it.
Thanks to @David Maze, who pointed to the solution. The article 'Build Intel64-compatible Docker images from Mac M1 (ARM)' (by Beppe Catanese) describes the underlying problem well.
You are developing and building on the ARM architecture (Mac M1), but you deploy the Docker image to an x86-64 based Kubernetes cluster.
Solution:
Option A: use buildx
Buildx is a Docker plugin that, among other features, lets you build images for various target platforms.
$ docker buildx build --platform linux/amd64 -t myapp .
Option B: set DOCKER_DEFAULT_PLATFORM
The DOCKER_DEFAULT_PLATFORM environment variable sets the default platform for commands that take the --platform flag.
export DOCKER_DEFAULT_PLATFORM=linux/amd64
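To confirm the mismatch before and after rebuilding, it can help to compare the platform recorded in the image with the architecture reported by the cluster nodes. A minimal sketch, assuming docker and kubectl are both available on the build machine; the function names are made up for illustration:

```shell
# Print the OS/architecture a local image was built for, e.g. linux/arm64.
image_platform() {
  docker image inspect --format '{{.Os}}/{{.Architecture}}' "$1"
}

# Print each node name with the CPU architecture its kubelet reports, e.g. amd64.
node_architectures() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'
}
```

If image_platform saurabhmish/golang-hello:latest reports an ARM platform while node_architectures reports amd64, the "exec format error" is expected, and the image needs to be rebuilt with one of the two options above and pushed again.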
A CrashLoopBackOff means that a pod is starting, crashing, starting again, and then crashing again.
The error may come from the application itself, for example because it cannot connect to a database, Redis, and so on.
You may find something useful here:
My kubernetes pods keep crashing with "CrashLoopBackOff" but I can't find any log
I am trying to run Argo CD on an EC2 instance running CentOS 7, following the official documentation and the EKS workshop from AWS, but the Argo CD pods stay in Pending state. All pods in the kube-system namespace are running fine.
Below is the output of kubectl get pods --all-namespaces:
NAMESPACE NAME READY STATUS RESTARTS AGE
argocd argocd-application-controller-5785f6b79-nvg7n 0/1 Pending 0 29s
argocd argocd-dex-server-7f5d7d6645-gprpd 0/1 Pending 0 19h
argocd argocd-redis-cccbb8f7-vb44n 0/1 Pending 0 19h
argocd argocd-repo-server-67ddb49495-pnw5k 0/1 Pending 0 19h
argocd argocd-server-6bcbf7997d-jqqrw 0/1 Pending 0 19h
kube-system calico-kube-controllers-56b44cd6d5-tzgdm 1/1 Running 0 19h
kube-system calico-node-4z9tx 1/1 Running 0 19h
kube-system coredns-f9fd979d6-8d6hm 1/1 Running 0 19h
kube-system coredns-f9fd979d6-p9dq6 1/1 Running 0 19h
kube-system etcd-ip-10-1-3-94.us-east-2.compute.internal 1/1 Running 0 19h
kube-system kube-apiserver-ip-10-1-3-94.us-east-2.compute.internal 1/1 Running 0 19h
kube-system kube-controller-manager-ip-10-1-3-94.us-east-2.compute.internal 1/1 Running 0 19h
kube-system kube-proxy-tkp7k 1/1 Running 0 19h
kube-system kube-scheduler-ip-10-1-3-94.us-east-2.compute.internal 1/1 Running 0 19h
The same configuration works fine on my local Mac. I've made sure that the Docker and Kubernetes services are up and running. I tried deleting the pods and reconfiguring Argo CD, but the result was the same every time.
Being new to Argo CD, I am unable to figure out the reason. Please let me know where I am going wrong. Thanks!
I figured out what the problem was by running:
kubectl describe pods <name> -n argocd
It gave output ending with FailedScheduling:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m (x5 over 7m2s) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Then, by referring to this GitHub issue, I figured out that I needed to run:
kubectl taint nodes --all node-role.kubernetes.io/master-
After this command, pods started to work and transitioned from Pending state to Running with kubectl describe pods showing output as:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m (x5 over 7m2s) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Normal Scheduled 106s default-scheduler Successfully assigned argocd/argocd-server-7d44dfbcc4-qfj6m to ip-XX-XX-XX-XX.<region>.compute.internal
Normal Pulling 105s kubelet Pulling image "argoproj/argocd:v1.7.6"
Normal Pulled 81s kubelet Successfully pulled image "argoproj/argocd:v1.7.6" in 23.779457251s
Normal Created 72s kubelet Created container argocd-server
Normal Started 72s kubelet Started container argocd-server
From this error and its resolution I've learned to always check kubectl describe pods when troubleshooting.
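When many pods are Pending at once, filtering the namespace events by reason can be quicker than describing each pod. A minimal sketch, assuming the same argocd namespace as above; the function name is made up for illustration:

```shell
# List only the scheduling failures in a namespace, sorted by last occurrence.
scheduling_failures() {
  kubectl get events --namespace "$1" \
    --field-selector reason=FailedScheduling \
    --sort-by=.lastTimestamp
}
```

Running scheduling_failures argocd should show the same "had taint ... that the pod didn't tolerate" message for every affected pod. Note that removing the master taint makes the control-plane node schedulable, which is usually acceptable on a single-node test cluster; on larger clusters, adding worker nodes is the more common fix.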
I am unable to deploy nginx containers to AWS Fargate with kubectl, using virtual-kubelet. I am following this guide: https://aws.amazon.com/blogs/opensource/aws-fargate-virtual-kubelet/.
I am having an issue with Step 6: Create Kubernetes objects.
I would like to know why the nginx containers are PENDING and why the AWS Fargate task definitions have not been created.
The following are some of the commands I used. I can give more detail upon request.
# ./virtual-kubelet --provider aws --provider-config fargate.toml
...
2019/05/16 06:50:24 Received NodeDaemonEndpoints request.
ERRO[0000] TLS certificates not provided, not setting up pod http server certPath= keyPath= node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Initialized node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Created node node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Node leases not supported, falling back to only node status updates node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Pod cache in-sync node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
2019/05/16 06:50:25 Received GetPods request.
2019/05/16 06:50:25 Responding to GetPods: [].
INFO[0000] starting workers node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] started workers node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
# kubectl describe node virtual-kubelet
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedToCreateRoute 98s (x951 over 160m) route_controller (combined from similar events): Could not create route e1e32758-77a6-11e9-a68e-0a95bb07bfa2 100.96.4.0/24 for node virtual-kubelet after 47.871544ms: instance not found
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-20-47-10.eu-west-2.compute.internal Ready master 30h v1.14.1
ip-172-20-47-242.eu-west-2.compute.internal Ready node 30h v1.14.1
ip-172-20-59-102.eu-west-2.compute.internal Ready node 30h v1.14.1
virtual-kubelet Ready agent 33m v1.13.1-vk-v0.9.0-40-g5b3190ac-dev
kubectl create -f nginx-deployment.yaml
# kubectl get deployments -o wide
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-c6695csfc-5f7bh 0/1 Pending 0 21m <none> <none> <none> <none>
nginx-deployment-c6695csfc-bwfb8 0/1 Pending 0 21m <none> <none> <none> <none>
nginx-deployment-c6695csfc-mcfvw 0/1 Pending 0 21m <none> <none> <none> <none>
# kubectl describe pod nginx-deployment-c6695csfc-5f7bh
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m11s (x191 over 22m) default-scheduler 0/4 nodes are available: 1 Insufficient cpu, 1 node(s) had taints that the pod didn't tolerate, 3 node(s) didn't match node selector.
Update:
I then added the label used by the nodeSelector to my nodes, running the following command for each node:
kubectl label nodes ip-172-20-47-15.eu-west-2.compute.internal type=virtual-kubelet
type=virtual-kubelet is the nodeSelector specified in the manifest file, nginx-deployment.yaml.
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-c6695csfc-5f7bh 1/1 Running 0 4m59s 100.96.2.7 ip-172-20-47-242.eu-west-2.compute.internal <none> <none>
nginx-deployment-c6695csfc-bwfb8 1/1 Running 0 4m59s 100.96.1.6 ip-172-20-59-102.eu-west-2.compute.internal <none> <none>
nginx-deployment-c6695csfc-mcfvw 1/1 Running 0 4m59s 100.96.2.8 ip-172-20-47-242.eu-west-2.compute.internal <none>
Now when I go to the AWS Fargate Dashboard the associated task definitions are not created as shown in the tutorial.
This issue is resolved. I was able to create the AWS Fargate task definitions by adding the ALB security group to the fargate.toml file and by adding tolerations to the nginx-deployment.yaml file, as shown below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Equal
        value: azure
        effect: NoSchedule
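A toleration with operator Equal only matches a taint whose key, value and effect are all identical, so it is worth comparing the manifest with what the virtual-kubelet node actually reports. A minimal sketch, assuming the node is named virtual-kubelet as in the kubectl get nodes output above; the function name is made up for illustration:

```shell
# Print the taints currently set on one node as JSON.
node_taints() {
  kubectl get node "$1" -o jsonpath='{.spec.taints}'
}
```

Run node_taints virtual-kubelet and check the reported key, value and effect against the tolerations block. If the value differs from the one in the manifest, the pods will still not be scheduled on that node until the toleration is adjusted to match.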
I used kops to install a Kubernetes cluster on AWS.
Then I installed Prometheus with Helm:
$ helm install stable/prometheus \
--set server.persistentVolume.enabled=false \
--set alertmanager.persistentVolume.enabled=false
Then I followed this note to set up port forwarding:
Get the Prometheus server URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 9090
My EC2 instance's public IP on AWS is 12.29.43.14 (not the real address). When I try to access it from a browser:
http://12.29.43.14:9090
I can't access the page. Why?
Another issue: after installing the Prometheus chart, the alertmanager pod doesn't run:
ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4 1/2 CrashLoopBackOff 1 9s
ungaged-woodpecker-prometheus-kube-state-metrics-5fd97698cktsj5 1/1 Running 0 9s
ungaged-woodpecker-prometheus-node-exporter-45jtn 1/1 Running 0 9s
ungaged-woodpecker-prometheus-node-exporter-ztj9w 1/1 Running 0 9s
ungaged-woodpecker-prometheus-pushgateway-57b67c7575-c868b 0/1 Running 0 9s
ungaged-woodpecker-prometheus-server-7f858db57-w5h2j 1/2 Running 0 9s
Check pod details:
$ kubectl describe po ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4
Name: ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4
Namespace: default
Node: ip-100.200.0.1.ap-northeast-1.compute.internal/100.200.0.1
Start Time: Fri, 26 Jan 2018 02:45:10 +0000
Labels: app=prometheus
component=alertmanager
pod-template-hash=2959465499
release=ungaged-woodpecker
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff","uid":"ec...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container prometheus-alertmanager; cpu request for container prometheus-alertmanager-configmap-reload
Status: Running
IP: 100.96.6.91
Created By: ReplicaSet/ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff
Controlled By: ReplicaSet/ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff
Containers:
prometheus-alertmanager:
Container ID: docker://e9fe9d7bd4f78354f2c072d426fa935d955e0d6748c4ab67ebdb84b51b32d720
Image: prom/alertmanager:v0.9.1
Image ID: docker-pullable://prom/alertmanager@sha256:ed926b227327eecfa61a9703702c9b16fc7fe95b69e22baa656d93cfbe098320
Port: 9093/TCP
Args:
--config.file=/etc/config/alertmanager.yml
--storage.path=/data
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 26 Jan 2018 02:45:26 +0000
Finished: Fri, 26 Jan 2018 02:45:26 +0000
Ready: False
Restart Count: 2
Requests:
cpu: 100m
Readiness: http-get http://:9093/%23/status delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
prometheus-alertmanager-configmap-reload:
Container ID: docker://9320a0f157aeee7c3947027667aa6a2e00728d7156520c19daec7f59c1bf6534
Image: jimmidyson/configmap-reload:v0.1
Image ID: docker-pullable://jimmidyson/configmap-reload@sha256:2d40c2eaa6f435b2511d0cfc5f6c0a681eeb2eaa455a5d5ac25f88ce5139986e
Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://localhost:9093/-/reload
State: Running
Started: Fri, 26 Jan 2018 02:45:11 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wppzm (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: ungaged-woodpecker-prometheus-alertmanager
Optional: false
storage-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-wppzm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-wppzm
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 34s default-scheduler Successfully assigned ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4 to ip-100.200.0.1.ap-northeast-1.compute.internal
Normal SuccessfulMountVolume 34s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "storage-volume"
Normal SuccessfulMountVolume 34s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "config-volume"
Normal SuccessfulMountVolume 34s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal MountVolume.SetUp succeeded for volume "default-token-wppzm"
Normal Pulled 33s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Container image "jimmidyson/configmap-reload:v0.1" already present on machine
Normal Created 33s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Created container
Normal Started 33s kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Started container
Normal Pulled 18s (x3 over 34s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Container image "prom/alertmanager:v0.9.1" already present on machine
Normal Created 18s (x3 over 34s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Created container
Normal Started 18s (x3 over 33s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Started container
Warning BackOff 2s (x4 over 32s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Back-off restarting failed container
Warning FailedSync 2s (x4 over 32s) kubelet, ip-100.200.0.1.ap-northeast-1.compute.internal Error syncing pod
I'm not sure why it reports FailedSync.
When you run kubectl port-forward with that command, it makes the port available on localhost of the machine where kubectl is running. So run the command and then open http://localhost:9090 on that same machine.
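If kubectl runs on the EC2 instance but the browser is on your workstation, there are two common ways to bridge the gap. The sketch below is only an illustration: the function names are made up, option 1 assumes you can SSH to the instance, and option 2 assumes a kubectl version that supports the --address flag of port-forward.

```shell
# Option 1: run on your workstation. Forwards local port 9090 over SSH to
# port 9090 on the EC2 host, where kubectl port-forward is already listening.
# Argument: SSH destination, e.g. user@host.
tunnel_to_host() {
  ssh -N -L 9090:localhost:9090 "$1"
}

# Option 2: run on the EC2 host. Listens on all interfaces instead of only
# on localhost. Argument: the Prometheus server pod name.
forward_on_all_interfaces() {
  kubectl --namespace default port-forward --address 0.0.0.0 "$1" 9090
}
```

With option 1, http://localhost:9090 then works in the workstation's browser. With option 2, the instance's security group must also allow inbound traffic on port 9090, and Prometheus becomes reachable by anyone who can reach that port, so it is better suited to short tests.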
You won't be able to reach the Prometheus ports directly from the public IP, outside the cluster. In the longer run you may want to expose Prometheus at a nice domain name via ingress (which the chart supports); that's how I'd do it. To use the chart's ingress support you will need to install an ingress controller in your cluster (the nginx ingress controller, for example), and then enable ingress by setting --set server.ingress.enabled=true and --set server.ingress.hosts[0]=prometheus.yourdomain.com. Ingress is a fairly large topic in itself, so I'll just refer you to the official docs for that one:
https://kubernetes.io/docs/concepts/services-networking/ingress/
And here's the nginx ingress controller:
https://github.com/kubernetes/ingress-nginx
As for the pod that is showing FailedSync, take a look at the logs using kubectl logs ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4 to see if there's any additional information there.
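Because this pod has two containers and the failing one restarts within a second, it may be necessary to name the container and ask for the output of the previous, crashed instance. A minimal sketch; the function name is made up for illustration, and the container name comes from the describe output in the question:

```shell
# Print the logs of the last terminated instance of one container in a pod.
# Arguments: pod name, container name.
crashed_container_logs() {
  kubectl logs "$1" --container "$2" --previous
}
```

For example: crashed_container_logs ungaged-woodpecker-prometheus-alertmanager-6f9f8b98ff-qhhw4 prometheus-alertmanager. The exit code 1 shown in the describe output suggests the alertmanager process itself is failing at startup, so its last log lines are the most likely place to find the reason.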