Using python manage.py migrate --check in kubernetes readinessProbe never succeeds - django

I have a Django deployment on a Kubernetes cluster, and in the readinessProbe I am running python manage.py migrate --check. I can see that the return value of this command is 0, but the pod never becomes ready.
Snippet of my deployment:
containers:
- name: myapp
  ...
  imagePullPolicy: Always
  readinessProbe:
    exec:
      command: ["python", "manage.py", "migrate", "--check"]
    initialDelaySeconds: 15
    periodSeconds: 5
When I describe the pod which is not yet ready:
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  66s                default-scheduler  Successfully assigned ...
  Normal   Pulled     66s                kubelet            Successfully pulled image ...
  Normal   Created    66s                kubelet            Created container ...
  Normal   Started    66s                kubelet            Started container ...
  Warning  Unhealthy  5s (x10 over 50s)  kubelet            Readiness probe failed:
I can see that migrate --check returns 0 by exec'ing into the container, which is still in the not-ready state, and running:
python manage.py migrate --check
echo $?
0
Is there something wrong with my exec command passed as the readinessProbe?
The version of kubernetes server that I am using is 1.21.7.
The base image for my deployment is python:3.7-slim.

The solution for the issue is to increase the timeoutSeconds parameter, which by default is set to 1 second:
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
Running manage.py migrate --check has to start Django and query the database for unapplied migrations, which typically takes longer than 1 second, so the probe was timing out even though the command itself exits with 0. After increasing the timeoutSeconds parameter, the application is able to pass the readiness probe.
Example snippet of the deployment with timeoutSeconds parameter set to 5:
containers:
- name: myapp
  ...
  imagePullPolicy: Always
  readinessProbe:
    exec:
      command: ["python", "manage.py", "migrate", "--check"]
    initialDelaySeconds: 15
    periodSeconds: 5
    timeoutSeconds: 5

Related

How to resolve EKS Fargate nodes disk pressure

I am running an EKS cluster with a Fargate profile. I checked the node status using kubectl describe node and it is showing disk pressure:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:17 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Tue, 12 Jul 2022 03:10:33 +0000 Wed, 06 Jul 2022 19:46:54 +0000 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:17 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:27 +0000 KubeletReady kubelet is posting ready status
There is also a failed garbage collection event.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FreeDiskSpaceFailed 11m (x844 over 2d22h) kubelet failed to garbage collect required amount of images. Wanted to free 6314505830 bytes, but freed 0 bytes
Warning EvictionThresholdMet 65s (x45728 over 5d7h) kubelet Attempting to reclaim ephemeral-storage
I think the cause of the disk filling up quickly is the application logs. The application writes them to stdout, which, as per the AWS documentation, the container agent in turn writes to log files, and I am using Fargate's built-in Fluent Bit to push the application logs to an OpenSearch cluster.
But it looks like the EKS cluster is not deleting the old log files created by the container agent.
I was looking to SSH into the Fargate nodes to debug the issue further, but as per AWS support, SSH into Fargate nodes is not possible.
What can be done to remove the disk pressure from the Fargate nodes?
As suggested in the answers, I am running logrotate as a sidecar. But according to the logrotate container's logs, it is not able to find the directory:
rotating pattern: /var/log/containers/*.log
52428800 bytes (5 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/containers/*.log
log /var/log/containers/*.log does not exist -- skipping
reading config file /etc/logrotate.conf
Reading state from file: /var/lib/logrotate.status
Allocating hash table for state file, size 64 entries
Creating new state
The YAML file is:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-apis
  namespace: kube-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: my-apis
        image: 111111xxxxx.dkr.ecr.us-west-2.amazonaws.com/my-apis:1.0.3
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: "1000m"
            memory: "1200Mi"
          requests:
            cpu: "1000m"
            memory: "1200Mi"
        readinessProbe:
          httpGet:
            path: "/ping"
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 2
        livenessProbe:
          httpGet:
            path: "/ping"
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 5
      - name: logrotate
        image: realz/logrotate
        volumeMounts:
        - mountPath: /var/log/containers
          name: my-app-logs
        env:
        - name: CRON_EXPR
          value: "*/5 * * * *"
        - name: LOGROTATE_LOGFILES
          value: "/var/log/containers/*.log"
        - name: LOGROTATE_FILESIZE
          value: "50M"
        - name: LOGROTATE_FILENUM
          value: "5"
      volumes:
      - name: my-app-logs
        emptyDir: {}
What can be done to remove the disk pressure from the Fargate nodes?
There is no known configuration that would have Fargate automatically clean a specific log location. You can run logrotate as a sidecar; there are plenty of images to choose from.
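If you do go the sidecar route, note that in the manifest above only the logrotate container mounts the my-app-logs emptyDir, so it is looking at an empty directory; the application container never writes into that volume. A minimal sketch of the relevant part, assuming the application writes its log files to /app/logs (a hypothetical path, not taken from the question), so that both containers see the same files:

containers:
- name: my-apis
  image: 111111xxxxx.dkr.ecr.us-west-2.amazonaws.com/my-apis:1.0.3
  volumeMounts:
  - name: my-app-logs
    mountPath: /app/logs           # where the app writes its log files (assumed path)
- name: logrotate
  image: realz/logrotate
  volumeMounts:
  - name: my-app-logs
    mountPath: /var/log/containers # same emptyDir, so logrotate sees the same files
  env:
  - name: LOGROTATE_LOGFILES
    value: "/var/log/containers/*.log"
volumes:
- name: my-app-logs
  emptyDir: {}

With the volume mounted on both sides, the /var/log/containers/*.log pattern in the logrotate sidecar actually matches files and the "does not exist -- skipping" message goes away.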
Found the cause of the disk filling quickly. It was the logging library logback writing logs both to files and to the console, and the log rotation policy in logback was retaining a large number of log files for long periods. Removing the appender in the logback config that writes to files fixed the issue.
I also found out that the STDOUT logs written to files by the container agent are rotated, with a file size of 10 MB and a maximum of 5 files, so they cannot cause the disk pressure.

Using --net=host in Tekton sidecars

I am creating a Tekton project which spawns Docker images which in turn run a few kubectl commands. I have accomplished this by using the docker:dind image as a Tekton sidecar and setting:
securityContext:
  privileged: true
env:
However, one of the tasks is failing, since it needs the equivalent of --net=host in the docker run command.
I have tried setting a podTemplate with hostNetwork: true, but then the task with the sidecar fails to start the Docker daemon.
Any idea how I could implement --net=host in the task YAML file? It would be really helpful.
Snippet of my task with the sidecar:
sidecars:
- image: mypvtreg:exv1
  name: mgmtserver
  args:
  - --storage-driver=vfs
  - --userland-proxy=false
  # - --net=host
  securityContext:
    privileged: true
  env:
  # Write generated certs to the path shared with the client.
  - name: DOCKER_TLS_CERTDIR
    value: /certs
  volumeMounts:
  - mountPath: /certs
As commented by @SYN, using docker:dind as a sidecar, your builder container, executing in your Task steps, should connect to 127.0.0.1. That's how you would talk to your dind sidecar.
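A minimal sketch of how a step can reach that sidecar, assuming the standard docker:dind TLS setup (port 2376 and client certificates under /certs/client follow the docker:dind image conventions rather than anything shown in the question):

steps:
- name: docker-client
  image: docker:latest              # any image that ships the docker CLI (assumption)
  env:
  - name: DOCKER_HOST               # talk to the dind sidecar over the pod's loopback
    value: tcp://127.0.0.1:2376
  - name: DOCKER_CERT_PATH          # client certs the sidecar generates via DOCKER_TLS_CERTDIR
    value: /certs/client
  - name: DOCKER_TLS_VERIFY
    value: "1"
  volumeMounts:
  - name: dind-certs                # assumed name of the volume backing /certs in the sidecar
    mountPath: /certs
  script: |
    docker version

Since the sidecar and the steps share the pod's network namespace, 127.0.0.1 already reaches the daemon. Note that inside the dind sidecar, docker run --net=host attaches containers to the daemon's own network namespace, i.e. the pod's network, not the node's.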

Gunicorn issues on gcloud. Memory faults and restarts thread

I am deploying a Django application to gcloud using gunicorn, without nginx.
Running the container locally works fine: the application boots and does a memory-consuming job on startup in its own thread (building a cache). Approx. 900 MB of memory is used after the job is finished.
Gunicorn is started with:
CMD gunicorn -b 0.0.0.0:8080 app.wsgi:application -k eventlet --workers=1 --threads=4 --timeout 1200 --log-file /gunicorn.log --log-level debug --capture-output --worker-tmp-dir /dev/shm
Now I want to deploy this to gcloud. Creating a running container with the following manifest:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: app
  namespace: default
spec:
  selector:
    matchLabels:
      run: app
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - image: gcr.io/app-numbers/app:latest
        imagePullPolicy: Always
        resources:
          limits:
            memory: "2Gi"
          requests:
            memory: "2Gi"
        name: app
        ports:
        - containerPort: 8080
          protocol: TCP
Giving the container 2 GB of memory.
Looking at the logs, gunicorn is booting workers:
[2019-09-01 11:37:48 +0200] [17] [INFO] Booting worker with pid: 17
Using free -m in the container shows the memory slowly being consumed and dmesg shows:
[497886.626932] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[497886.636597] [1452813] 0 1452813 256 1 4 2 0 -998 pause
[497886.646332] [1452977] 0 1452977 597 175 5 3 0 447 sh
[497886.656064] [1452989] 0 1452989 10195 7426 23 4 0 447 gunicorn
[497886.666376] [1453133] 0 1453133 597 360 5 3 0 447 sh
[497886.676959] [1458304] 0 1458304 543235 520309 1034 6 0 447 gunicorn
[497886.686727] Memory cgroup out of memory: Kill process 1458304 (gunicorn) score 1441 or sacrifice child
[497886.697411] Killed process 1458304 (gunicorn) total-vm:2172940kB, anon-rss:2075432kB, file-rss:5804kB, shmem-rss:0kB
[497886.858875] oom_reaper: reaped process 1458304 (gunicorn), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
What could be causing a memory leak on gcloud but not locally?

istio 0.2.7 helloworld app init proxy_init stuck at podinitializing

Installing Istio for the first time on Kubernetes 1.7.9, installed with automatic sidecar injection. When trying the sample applications, although the sidecar and the application containers are started and in the Running state, proxy_init is stuck at PodInitializing and the overall Pod state is Init:0/1.
[root@node-8 helloworld]# kubectl describe pods helloworld-v1-3194034472-12rgj
Name: helloworld-v1-3194034472-12rgj
Namespace: default
Node: node-8/136.225.226.159
Start Time: Wed, 01 Nov 2017 19:13:11 +0100
Labels: app=helloworld
pod-template-hash=3194034472
version=v1
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"helloworld-v1-3194034472","uid":"5212bc02-bf30-11e7-b818-0050560...
sidecar.istio.io/status=injected-version-0.2.7
Status: Running
IP: 192.168.144.130
Created By: ReplicaSet/helloworld-v1-3194034472
Controlled By: ReplicaSet/helloworld-v1-3194034472
Init Containers:
istio-init:
Container ID:
Image: docker.io/istio/proxy_init:0.2.7
Image ID:
Port: <none>
Args:
-p
15001
-u
1337
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-76kq4 (ro)
Containers:
helloworld:
Container ID: docker://aa89ecc46d273b76d71a0f67d5169519926cc0e01d9d1f2ab960e2b88a46013b
Image: istio/examples-helloworld-v1
Image ID: docker-pullable://docker.io/istio/examples-helloworld-v1@sha256:c671702b11cbcda103720c2bd3e81a4211012bfef085b7326bb7fbfd8cea4a94
Port: 5000/TCP
State: Running
Started: Wed, 01 Nov 2017 19:13:14 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-76kq4 (ro)
istio-proxy:
Container ID: docker://9bb16159d42229512892feae13614c4c373f3436957b6263c772f62282d75e02
Image: docker.io/istio/proxy:0.2.7
Image ID: docker-pullable://docker.io/istio/proxy@sha256:910546c29a32e11f58bab92e68513a5c8f636621c0e20197833270961fda3713
Port: <none>
Args:
proxy
sidecar
-v
2
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
helloworld
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system:8080
--discoveryRefreshDelay
1s
--zipkinAddress
zipkin.istio-system:9411
--connectTimeout
10s
--statsdUdpAddress
istio-mixer.istio-system:9125
--proxyAdminPort
15000
State: Running
Started: Wed, 01 Nov 2017 19:13:15 +0100
Ready: True
Restart Count: 0
Environment:
POD_NAME: helloworld-v1-3194034472-12rgj (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
Mounts:
/etc/certs/ from istio-certs (ro)
/etc/istio/proxy from istio-envoy (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-76kq4 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
istio-certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio.default
Optional: true
default-token-76kq4:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-76kq4
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
4m 4m 1 default-scheduler Normal Scheduled Successfully assigned helloworld-v1-3194034472-12rgj to node-8
4m 4m 1 kubelet, node-8 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "istio-envoy"
4m 4m 1 kubelet, node-8 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-76kq4"
4m 4m 1 kubelet, node-8 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "istio-certs"
4m 4m 1 kubelet, node-8 spec.initContainers{istio-init} Normal Pulled Container image "docker.io/istio/proxy_init:0.2.7" already present on machine
4m 4m 1 kubelet, node-8 spec.initContainers{istio-init} Normal Created Created container
4m 4m 1 kubelet, node-8 spec.initContainers{istio-init} Normal Started Started container
4m 4m 1 kubelet, node-8 spec.containers{helloworld} Normal Pulled Container image "istio/examples-helloworld-v1" already present on machine
4m 4m 1 kubelet, node-8 spec.containers{helloworld} Normal Created Created container
4m 4m 1 kubelet, node-8 spec.containers{helloworld} Normal Started Started container
4m 4m 1 kubelet, node-8 spec.containers{istio-proxy} Normal Pulled Container image "docker.io/istio/proxy:0.2.7" already present on machine
4m 4m 1 kubelet, node-8 spec.containers{istio-proxy} Normal Created Created container
4m 4m 1 kubelet, node-8 spec.containers{istio-proxy} Normal Started Started container
[root@node-8 helloworld]# kubectl get pods
NAME READY STATUS RESTARTS AGE
helloworld-v1-3194034472-12rgj 0/2 Init:0/1 0 12m
helloworld-v2-717720256-rc06f 0/2 Init:0/1 0 12m
sleep-140275861-vjqf7 0/2 Init:0/1 0 1h
[root@node-8 helloworld]#
The Initializers feature is enabled:
[root@node-8 istio-0.2.7]# kubectl api-versions | grep admi
admissionregistration.k8s.io/v1alpha1
[root@node-8 istio-0.2.7]#
From the istio-proxy logs:
[2017-11-02 19:40:19.323][14][warning][main] external/envoy/source/server/server.cc:164] initializing epoch 0 (hot restart version=8.2490552)
[2017-11-02 19:40:19.330][14][warning][main] external/envoy/source/server/server.cc:332] starting main dispatch loop
[2017-11-02 19:40:19.392][14][warning][main] external/envoy/source/server/server.cc:316] all clusters initialized. initializing init manager
[2017-11-02 19:40:19.427][14][warning][config] external/envoy/source/server/listener_manager_impl.cc:451] all dependencies initialized. starting workers
[2017-11-02 19:41:19.429][14][warning][main] external/envoy/source/server/drain_manager_impl.cc:62] shutting down parent after drain
but proxy_init is stuck in the Waiting state.
Istio sidecars can be automatically injected into a Pod before deployment using an alpha feature in Kubernetes called Initializers. Please ensure your cluster has the Initializers alpha feature enabled; for example, on GKE this requires deploying an alpha cluster. In the IBM Bluemix container service, the alpha feature should already be enabled in 1.7.x k8s clusters.
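For a self-managed cluster, the feature is enabled on the kube-apiserver. A sketch of the relevant flags, based on the Kubernetes 1.7-era documentation rather than anything shown in the question (in a kubeadm setup they would go into the kube-apiserver static pod manifest):

spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # ... existing flags ...
    # enable the Initializers admission controller (exact list of other plugins is illustrative)
    - --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota
    # serve the alpha initializer configuration API
    - --runtime-config=admissionregistration.k8s.io/v1alpha1

The kubectl api-versions output above already shows admissionregistration.k8s.io/v1alpha1 being served, so the API side appears to be enabled on this cluster.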
After further research, I figured out that there is a known issue, fixed in 1.8, where the init container can get stuck in the PodInitializing state: https://github.com/kubernetes/kubernetes/pull/51644. It works fine in 1.8.

How to set the frequency of a liveness/readiness probe in Kubernetes

Is probe frequency customizable in liveness/readiness probe?
Also, how many times readiness probe fails before it removes the pod from service load-balancer? Is it customizable?
The probe frequency is controlled by the sync-frequency command line flag on the Kubelet, which defaults to syncing pod statuses once every 10 seconds.
I'm not aware of any way to customize the number of failed probes needed before a pod is considered not-ready to serve traffic.
If either of these features is important to you, feel free to open an issue explaining what your use case is or send us a PR! :)
You can easily customise the probe's failure threshold and frequency; all the parameters are described in the Kubernetes probe documentation.
For example:
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /health
    port: 9081
    scheme: HTTP
  initialDelaySeconds: 180
  timeoutSeconds: 10
  periodSeconds: 10
  successThreshold: 1
That probe will run for the first time after 3 minutes; it will then run every 10 seconds, and the pod will be restarted after 3 consecutive failures.
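The same fields work on a readinessProbe, which covers the second part of the question: after failureThreshold consecutive failures the pod is marked NotReady and removed from the Service endpoints (it is not restarted), and after successThreshold consecutive successes it is added back. A sketch, with the path and port as placeholders:

readinessProbe:
  httpGet:
    path: /health        # placeholder path
    port: 9081           # placeholder port
  periodSeconds: 10      # probe every 10 seconds
  failureThreshold: 3    # 3 consecutive failures -> pod removed from Service endpoints
  successThreshold: 1    # 1 success -> pod marked Ready again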
To customize the liveness/readiness probe frequency and other parameters, we need to add a livenessProbe/readinessProbe element inside the containers element of the YAML associated with that pod. A simple example of the YAML file is given below:
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
spec:
  containers:
  - name: liveness-ex
    image: ubuntu
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
The initialDelaySeconds parameter ensures that the liveness probe is first checked 5 seconds after the container starts, and periodSeconds ensures that it is checked every 5 seconds after that. For more parameters, see: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/