Non-persistent Prometheus metrics on Kubernetes - flask

I'm collecting Prometheus metrics from a uwsgi application hosted on Kubernetes, but the metrics are not retained after the pods are deleted. The Prometheus server is hosted on the same Kubernetes cluster, and I have assigned persistent storage to it.
How do I retain the metrics from the pods even after they are deleted?
The Prometheus deployment yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus
  namespace: default
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus/"
        - "--storage.tsdb.retention=2200h"
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: prometheus-config-volume
          mountPath: /etc/prometheus/
        - name: prometheus-storage-volume
          mountPath: /prometheus/
      volumes:
      - name: prometheus-config-volume
        configMap:
          defaultMode: 420
          name: prometheus-server-conf
      - name: prometheus-storage-volume
        persistentVolumeClaim:
          claimName: azurefile
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: prometheus
spec:
  type: LoadBalancer
  loadBalancerIP: ...
  ports:
  - port: 80
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
Application deployment yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api-app
  template:
    metadata:
      labels:
        app: api-app
    spec:
      containers:
      - name: nginx
        image: nginx
        lifecycle:
          preStop:
            exec:
              command: ["/usr/sbin/nginx","-s","quit"]
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 50m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 50Mi
        volumeMounts:
        - name: app-api
          mountPath: /var/run/app
        - name: nginx-conf
          mountPath: /etc/nginx/conf.d
      - name: api-app
        image: azurecr.io/app_api_se:opencv
        workingDir: /app
        command: ["/usr/local/bin/uwsgi"]
        args:
        - "--die-on-term"
        - "--manage-script-name"
        - "--mount=/=api:app_dispatch"
        - "--socket=/var/run/app/uwsgi.sock"
        - "--chmod-socket=777"
        - "--pyargv=se"
        - "--metrics-dir=/storage"
        - "--metrics-dir-restore"
        resources:
          requests:
            cpu: 150m
            memory: 1Gi
        volumeMounts:
        - name: app-api
          mountPath: /var/run/app
        - name: storage
          mountPath: /storage
      volumes:
      - name: app-api
        emptyDir: {}
      - name: storage
        persistentVolumeClaim:
          claimName: app-storage
      - name: nginx-conf
        configMap:
          name: app
      tolerations:
      - key: "sku"
        operator: "Equal"
        value: "test"
        effect: "NoSchedule"
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: api-app
  name: api-app
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: api-app

Your issue is the type of controller used to deploy Prometheus. A Deployment is the wrong choice in this case: it is meant for stateless applications, which don't need to keep any persistent identity (such as persisted data) when Pods are rescheduled.
You should switch to a StatefulSet* if you need the data (the metrics scraped by Prometheus) to persist across Pod (re)scheduling.
*This is how Prometheus is deployed by default with prometheus-operator.
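A minimal sketch of what that switch could look like, reusing the names from the question's manifest. The storage size, the use of prometheus as serviceName, and the decision to replace the azurefile claim with a volumeClaimTemplates entry are assumptions made for illustration, not values from the original setup; a storageClassName may also be needed depending on the cluster.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: default
spec:
  serviceName: prometheus   # assumption: name of the Service governing the pods
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus/"
        - "--storage.tsdb.retention=2200h"
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: prometheus-config-volume
          mountPath: /etc/prometheus/
        - name: prometheus-storage-volume
          mountPath: /prometheus/
      volumes:
      - name: prometheus-config-volume
        configMap:
          defaultMode: 420
          name: prometheus-server-conf
  volumeClaimTemplates:
  - metadata:
      name: prometheus-storage-volume   # matches the volumeMounts name above
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi   # assumption: size chosen for illustration
```

With this layout the controller creates one claim per replica (here prometheus-storage-volume-prometheus-0), and a rescheduled Pod is reattached to the same claim.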

With this volume configuration, the volume is removed when the pod is released. You are basically looking for a PersistentVolume; see the Kubernetes documentation and its example.
Also check PersistentVolumeClaim.

Related

Containers in pod won't talk to each other in Kubernetes

I have three containers in a pod: nginx, redis, and a custom Django app. It seems like none of them can talk to each other under Kubernetes. With Docker Compose they can, but I can't use Docker Compose in production.
The django container gets this error:
[2022-06-20 21:45:49,420: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error 111 connecting to redis:6379. Connection refused..
Trying again in 32.00 seconds... (16/100)
and the nginx container starts but never shows any traffic. Trying to connect to localhost:8000 gets no reply.
Any idea what's wrong with my yml file?
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  creationTimestamp: null
  name: djangonetwork
spec:
  ingress:
  - from:
    - podSelector:
        matchLabels:
          io.kompose.network/djangonetwork: "true"
  podSelector:
    matchLabels:
      io.kompose.network/djangonetwork: "true"
---
apiVersion: v1
data:
  DB_HOST: db
  DB_NAME: django_db
  DB_PASSWORD: password
  DB_PORT: "5432"
  DB_USER: user
kind: ConfigMap
metadata:
  creationTimestamp: null
  labels:
    io.kompose.service: web
  name: envs--django
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    io.kompose.service: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: web
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        io.kompose.network/djangonetwork: "true"
        io.kompose.service: web
    spec:
      containers:
      - image: nginx:alpine
        name: nginxcontainer
        ports:
        - containerPort: 8000
      - image: redis:alpine
        name: rediscontainer
        ports:
        - containerPort: 6379
        resources: {}
      - env:
        - name: DB_HOST
          valueFrom:
            configMapKeyRef:
              key: DB_HOST
              name: envs--django
        - name: DB_NAME
          valueFrom:
            configMapKeyRef:
              key: DB_NAME
              name: envs--django
        - name: DB_PASSWORD
          valueFrom:
            configMapKeyRef:
              key: DB_PASSWORD
              name: envs--django
        - name: DB_PORT
          valueFrom:
            configMapKeyRef:
              key: DB_PORT
              name: envs--django
        - name: DB_USER
          valueFrom:
            configMapKeyRef:
              key: DB_USER
              name: envs--django
        image: localhost:5000/integration/web:latest
        name: djangocontainer
        ports:
        - containerPort: 8000
        resources: {}
      restartPolicy: Always
status: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    io.kompose.service: web
  name: web
spec:
  ports:
  - name: "8000"
    port: 8000
    targetPort: 8000
  selector:
    io.kompose.service: web
You've put all three containers into a single Pod. That's usually not the preferred approach: it means you can't restart one of the containers without restarting all of them (any update to your application code requires discarding your Redis cache) and you can't individually scale the component parts (if you need five replicas of your application, do you also need five reverse proxies and can you usefully use five Redises?).
Instead, a preferred approach is to split these into three separate Deployments (or possibly use a StatefulSet for Redis with persistence). Each has a corresponding Service, and then those Service names can be used as DNS names.
A very minimal example for Redis could look like:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector: # required in apps/v1; must match the template labels
    matchLabels:
      service: web
      component: redis
  template:
    metadata:
      labels:
        service: web
        component: redis
    spec:
      containers:
      - name: redis
        image: redis
        ports:
        - name: redis
          containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis # <-- this name will be a DNS name
spec:
  selector: # matches the template: { metadata: { labels: } }
    service: web
    component: redis
  ports:
  - name: redis
    port: 6379
    targetPort: redis # matches a container ports: [{ name: }]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  ...
        env:
        - name: REDIS_HOST
          value: redis # matches the Service
If all three parts are in the same Pod, then the Service can't really distinguish which part it's talking to. Containers in the same Pod share a network namespace, so they need to talk to each other as localhost; the containers: [{ name: }] values have no practical effect as hostnames.
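If the containers do stay in one Pod for now, the application has to be pointed at localhost rather than at the Compose service name. A hedged fragment for the Django container, assuming the application reads its Redis host from a REDIS_HOST environment variable as in the example above (the variable name is an assumption; the original image may be configured differently):

```yaml
# Fragment of the Pod template; only the Django container is shown.
containers:
- name: djangocontainer
  image: localhost:5000/integration/web:latest
  env:
  - name: REDIS_HOST      # assumption: the app builds its Redis URL from this
    value: localhost      # containers in one Pod share a network namespace
```

The same reasoning applies to any reverse-proxy upstream configured inside the nginx container.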

PV & PVC with EKS Cluster

After applying the YAML file below with kubectl apply -f pvc.yaml, I can find the mount path /var/local/pvctest inside the container that was created. But the host path /var/local/pvctest on the worker node is not created.
I'm new to PV & PVC with EKS and any help to fix this issue is much appreciated!
kind: Deployment
apiVersion: apps/v1
metadata:
  name: pvctest
  labels:
    alias: pvctest
spec:
  selector:
    matchLabels:
      alias: pvctest
  replicas: 1
  template:
    metadata:
      labels:
        alias: pvctest
    spec:
      containers:
      - name: pvctest
        image: neo4j
        ports:
        - containerPort: 7474
        - containerPort: 7687
        volumeMounts:
        - name: testpv
          mountPath: /var/local/pvctest
      volumes:
      - name: testpv
        persistentVolumeClaim:
          claimName: pvctest-claim
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pvtest
  labels:
    type: local
spec:
  persistentVolumeReclaimPolicy: Retain
  capacity:
    storage: 3Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /var/local/pvctest
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvctest-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
A PersistentVolume with hostPath requires the directory on the host to be pre-created. If you want the directory to be created automatically for you:
...
      containers:
      - name: pvctest
        image: neo4j
        ...
        volumeMounts:
        - name: testpv
          mountPath: /var/local/pvctest
      volumes:
      - name: testpv
        hostPath:
          path: /data
          type: DirectoryOrCreate
PV/PVC is actually optional for hostPath.
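If the PV/PVC pair should be kept, a sketch along these lines may also work: the hostPath block of a PersistentVolume accepts the same type field, and a shared storageClassName keeps the claim from being satisfied by a dynamically provisioned volume instead of this one. The class name manual is an arbitrary label chosen for illustration, not something from the original manifests.

```yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pvtest
  labels:
    type: local
spec:
  storageClassName: manual   # assumption: any name, as long as PV and PVC agree
  persistentVolumeReclaimPolicy: Retain
  capacity:
    storage: 3Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /var/local/pvctest
    type: DirectoryOrCreate  # create the directory on the node if it is missing
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvctest-claim
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
```

The directory is created on whichever worker node the Pod is scheduled to, so with more than one node the data does not follow the Pod.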

Kubernetes deployment resource limit

Here is my deployment & service file for Django. The 3 pods generated from deployment.yaml work, but the resource requests and limits are being ignored.
I have seen a lot of tutorials about applying resource specifications on Pods but not on Deployment files, is there a way around it?
Here is my yaml file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: djangoapi
    type: web
  name: djangoapi
  namespace: "default"
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: djangoapi
        type: web
    spec:
      containers:
      - name: djangoapi
        image: wbivan/app:v0.8.1a
        imagePullPolicy: Always
        args:
        - gunicorn
        - api.wsgi
        - --bind
        - 0.0.0.0:8000
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        envFrom:
        - configMapRef:
            name: djangoapi-config
        ports:
        - containerPort: 8000
        resources: {}
      imagePullSecrets:
      - name: regcred
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: djangoapi-svc
  namespace: "default"
  labels:
    app: djangoapi
spec:
  ports:
  - port: 8000
    protocol: TCP
    targetPort: 8000
  selector:
    app: djangoapi
    type: web
  type: NodePort
There is one extra resources attribute under your container definition, after ports:
resources: {}
This overrides the original resources definition.
Remove it and apply the file again.
A simple way to avoid such issues is to use a YAML validator.
yamllint is a good tool for validating YAML.
When you run it, it lists the problems it finds in the file.
Example:
# yamllint file.yml
38:9 error duplication of key "resources" in mapping (key-duplicates)
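For reference, the container section from the question with the duplicate key removed would look roughly like this (a fragment of the Pod template only; every value is taken from the question):

```yaml
# Fragment of the Deployment's Pod template with the duplicate key removed.
containers:
- name: djangoapi
  image: wbivan/app:v0.8.1a
  imagePullPolicy: Always
  args:
  - gunicorn
  - api.wsgi
  - --bind
  - 0.0.0.0:8000
  resources:              # the only resources key on this container
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"
  envFrom:
  - configMapRef:
      name: djangoapi-config
  ports:
  - containerPort: 8000
```

After applying it, kubectl describe pod should show the requests and limits on the running Pods.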

Google Cloud, Kubernetes and Volumes

I'm new to GCE and K8s and I'm trying to figure out my first deployment, but I get an error with my volumes:
Failed to attach volume "pv0001" on node "xxxxx" with: GCE persistent disk not found: diskName="pd-disk-1" zone="europe-west1-b"
Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "xxx". list of unattached/unmounted volumes=[registrator-claim0]
This is my storage yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  gcePersistentDisk:
    fsType: ext4
    pdName: pd-disk-1
This is my Claim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: null
  name: registrator-claim0
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
status: {}
This is my Deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  name: consul
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        service: consul
    spec:
      restartPolicy: Always
      containers:
      - name: consul
        image: eu.gcr.io/xxxx/consul
        ports:
        - containerPort: 8300
          protocol: TCP
        - containerPort: 8400
          protocol: TCP
        - containerPort: 8500
          protocol: TCP
        - containerPort: 53
          protocol: UDP
        env:
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
        - -server
        - -bootstrap
        - -advertise=$(MY_POD_IP)
      - name: registrator
        args:
        - -internal
        - -ip=192.168.99.101
        - consul://localhost:8500
        image: eu.gcr.io/xxxx/registrator
        volumeMounts:
        - mountPath: /tmp/docker.sock
          name: registrator-claim0
      volumes:
      - name: registrator-claim0
        persistentVolumeClaim:
          claimName: registrator-claim0
status: {}
What am I doing wrong? Figuring out K8s and GCE isn't that easy. These errors are not exactly helping. Hope someone can help me.
You have to create the actual storage before you define the PV. This can be done with something like:
# make sure you're in the right zone
$ gcloud config set compute/zone europe-west1-b
# create the disk
$ gcloud compute disks create --size 10GB pd-disk-1
Once that's available, you can create the PV and the PVC.

Why can Kubernetes not route a service on public ELB on AWS?

I've been trying to follow the example (guestbook) to reproduce another application which has to be available on a public interface.
This is my Kubernetes configuration (YAML):
apiVersion: v1
kind: Service
metadata:
  name: my-app-server
  labels:
    app: my-app-server
    tier: backend
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 3000
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-app-server
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: my-app-server
        tier: backend
    spec:
      containers:
      - name: ppm-server
        image: docker/container:tag
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GET_HOSTS_FROM
          value: dns
        ports:
        - containerPort: 3000
      imagePullSecrets:
      - name: myregistrykey
Not sure why this is not working.
The guestbook all-in-one example seems to work just fine though.
I tried using the exact same configuration file while just changing the variables in the configuration.