Is the probe frequency customizable in liveness/readiness probes?
Also, how many times does the readiness probe have to fail before the pod is removed from the service load balancer? Is that customizable?
The probe frequency is controlled by the sync-frequency command line flag on the Kubelet, which defaults to syncing pod statuses once every 10 seconds.
I'm not aware of any way to customize the number of failed probes needed before a pod is considered not-ready to serve traffic.
If either of these features is important to you, feel free to open an issue explaining what your use case is or send us a PR! :)
You can easily customise the probe failure threshold and frequency; all of these parameters are defined in the probe specification.
For example:
livenessProbe:
failureThreshold: 3
httpGet:
path: /health
port: 9081
scheme: HTTP
initialDelaySeconds: 180
timeoutSeconds: 10
periodSeconds: 10
successThreshold: 1
That probe will run for the first time 3 minutes after the container starts, it will then run every 10 seconds, and the pod will be restarted after 3 consecutive failures.
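For the readiness part of the question, the same fields apply to a readinessProbe; the difference is that after failureThreshold consecutive failures the pod is marked not-ready and removed from the Service endpoints rather than restarted. A minimal sketch, reusing the same endpoint purely for illustration:
readinessProbe:
  httpGet:
    path: /health
    port: 9081
    scheme: HTTP
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 3
  successThreshold: 1
With these values the pod stops receiving traffic after 3 consecutive failed checks and is added back once a check succeeds again.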
To customize the liveness/readiness probe frequency and other parameters, we need to add a livenessProbe/readinessProbe element inside the containers element of the YAML associated with that pod. A simple example of the YAML file is given below:
apiVersion: v1
kind: Pod
metadata:
name: liveness-exec
spec:
containers:
- name: liveness-ex
image: ubuntu
args:
- /bin/sh
- -c
- touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy;sleep 600
livenessProbe:
exec:
command:
- cat
- /tmp/healthy
initialDelaySeconds: 5
periodSeconds: 5
The initialDelaySeconds parameter ensures that the liveness probe is first checked 5 seconds after the container starts, and periodSeconds ensures that it is checked every 5 seconds thereafter. For more parameters, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
I am running an EKS cluster with a Fargate profile. I checked the node status using kubectl describe node and it is showing disk pressure:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:17 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Tue, 12 Jul 2022 03:10:33 +0000 Wed, 06 Jul 2022 19:46:54 +0000 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:17 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:27 +0000 KubeletReady kubelet is posting ready status
There is also a failed garbage collection event.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FreeDiskSpaceFailed 11m (x844 over 2d22h) kubelet failed to garbage collect required amount of images. Wanted to free 6314505830 bytes, but freed 0 bytes
Warning EvictionThresholdMet 65s (x45728 over 5d7h) kubelet Attempting to reclaim ephemeral-storage
I think the cause of the disk filling quickly is the application logs. The application writes them to stdout, and per the AWS documentation they are in turn written to log files by the container agent; I am using Fargate's built-in Fluent Bit to push the application logs to an OpenSearch cluster.
But it looks like the EKS cluster is not deleting the old log files created by the container agent.
I was looking to SSH into the Fargate nodes to debug the issue further, but per AWS support, SSH into Fargate nodes is not possible.
What can be done to remove the disk pressure from the Fargate nodes?
As suggested in the answers, I am running logrotate in a sidecar. But per the logs of the logrotate container, it is not able to find the directory:
rotating pattern: /var/log/containers/*.log
52428800 bytes (5 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/containers/*.log
log /var/log/containers/*.log does not exist -- skipping
reading config file /etc/logrotate.conf
Reading state from file: /var/lib/logrotate.status
Allocating hash table for state file, size 64 entries
Creating new state
The YAML file is:
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-apis
namespace: kube-system
spec:
replicas: 3
selector:
matchLabels:
app: api
template:
metadata:
labels:
app: api
spec:
containers:
- name: my-apis
image: 111111xxxxx.dkr.ecr.us-west-2.amazonaws.com/my-apis:1.0.3
ports:
- containerPort: 8080
resources:
limits:
cpu: "1000m"
memory: "1200Mi"
requests:
cpu: "1000m"
memory: "1200Mi"
readinessProbe:
httpGet:
path: "/ping"
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 2
livenessProbe:
httpGet:
path: "/ping"
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 5
- name: logrotate
image: realz/logrotate
volumeMounts:
- mountPath: /var/log/containers
name: my-app-logs
env:
- name: CRON_EXPR
value: "*/5 * * * *"
- name: LOGROTATE_LOGFILES
value: "/var/log/containers/*.log"
- name: LOGROTATE_FILESIZE
value: "50M"
- name: LOGROTATE_FILENUM
value: "5"
volumes:
- name: my-app-logs
emptyDir: {}
What can be done to remove the disk pressure from the Fargate nodes?
There is no known configuration that would have Fargate automatically clean a specific log location. You can run logrotate as a sidecar; there are plenty of images to choose from.
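A rough sketch of that pattern, reusing the realz/logrotate image and environment variables already shown in the question. The key assumption is that the application writes its log files to a directory (here /var/log/app, purely illustrative) that both containers mount through a shared emptyDir, since a sidecar can only rotate files on volumes it actually mounts:
containers:
  - name: my-app
    image: my-app:latest            # illustrative application image
    volumeMounts:
      - name: app-logs
        mountPath: /var/log/app     # directory the application writes its log files to
  - name: logrotate
    image: realz/logrotate
    env:
      - name: CRON_EXPR
        value: "*/5 * * * *"
      - name: LOGROTATE_LOGFILES
        value: "/var/log/app/*.log"
      - name: LOGROTATE_FILESIZE
        value: "50M"
      - name: LOGROTATE_FILENUM
        value: "5"
    volumeMounts:
      - name: app-logs
        mountPath: /var/log/app
volumes:
  - name: app-logs
    emptyDir: {}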
Found the cause of the disk filling quickly. It was the logging library, logback, writing logs to both files and the console, and the log rotation policy in logback retaining a large number of log files for long periods. Removing the appender in the logback config that writes to files fixed the issue.
I also found out that the STDOUT logs written to files by the container agent are rotated, with a file size of 10 MB and a maximum of 5 files, so they cannot cause disk pressure.
I am creating a Tekton project which will spawn Docker images which in turn will run a few kubectl commands. I have accomplished this by using a docker:dind image as a Tekton sidecar and setting
securityContext:
privileged: true
env:
However, one of the tasks is failing, since it needs the equivalent of --net=host in the docker run command.
I have tried to set a podTemplate with hostNetwork: true, but then the task with the sidecar fails to start the Docker daemon.
Any idea how I could implement --net=host in the Task YAML file? It would be really helpful.
Snippet of my task with the sidecar:
sidecars:
- image: mypvtreg:exv1
name: mgmtserver
args:
- --storage-driver=vfs
- --userland-proxy=false
# - --net=host
securityContext:
privileged: true
env:
# Write generated certs to the path shared with the client.
- name: DOCKER_TLS_CERTDIR
value: /certs
volumeMounts:
- mountPath: /certs
As commented by @SYN: using docker:dind as a sidecar, your builder container, executing in your Task steps, should connect to 127.0.0.1. That's how you would talk to your dind sidecar.
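A hedged sketch of the client side of that, as a step in the same Task. The step name, image, and the dind-certs volume name are illustrative; the port and /certs/client path follow the docker:dind image's usual TLS conventions, matching the DOCKER_TLS_CERTDIR: /certs setting on the sidecar:
steps:
  - name: docker-client
    image: docker:latest
    env:
      # Point the docker CLI at the dind sidecar over the pod's loopback address.
      - name: DOCKER_HOST
        value: tcp://127.0.0.1:2376
      - name: DOCKER_TLS_VERIFY
        value: "1"
      # Client certificates generated by the sidecar under DOCKER_TLS_CERTDIR.
      - name: DOCKER_CERT_PATH
        value: /certs/client
    volumeMounts:
      - mountPath: /certs
        name: dind-certs
    script: |
      docker info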
I have a Django deployment on a Kubernetes cluster, and in the readinessProbe I am running python manage.py migrate --check. I can see that the return value of this command is 0, but the pod never becomes ready.
Snippet of my deployment:
containers:
- name: myapp
...
imagePullPolicy: Always
readinessProbe:
exec:
command: ["python", "manage.py", "migrate", "--check"]
initialDelaySeconds: 15
periodSeconds: 5
When I describe the pod which is not yet ready:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 66s default-scheduler Successfully assigned ...
Normal Pulled 66s kubelet Successfully pulled image ...
Normal Created 66s kubelet Created container ...
Normal Started 66s kubelet Started container ...
Warning Unhealthy 5s (x10 over 50s) kubelet Readiness probe failed:
I can see that migrate --check returns 0 by exec'ing into the container, which is still in a not-ready state, and running:
python manage.py migrate
echo $?
0
Is there something wrong in my exec command passed as readinessProbe?
The version of kubernetes server that I am using is 1.21.7.
The base image for my deployment is python:3.7-slim.
The solution for the issue is to increase the timeoutSeconds parameter, which is set to 1 second by default:
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
After increasing the timeoutSeconds parameter, the application is able to pass the readiness probe.
Example snippet of the deployment with timeoutSeconds parameter set to 5:
containers:
- name: myapp
...
imagePullPolicy: Always
readinessProbe:
exec:
command: ["python", "manage.py", "migrate", "--check"]
initialDelaySeconds: 15
periodSeconds: 5
timeoutSeconds: 5
Info:
I created a Flask app, and the last command in my Dockerfile is CMD gunicorn -b 0.0.0.0:5000 --access-logfile - "app:create_app()"
I build, tag, and upload the image to ECR.
I used this Docker image to create an ECS Fargate service with the following configs (just posting the ones needed for the question):
ECSTaskDefinition:
Type: AWS::ECS::TaskDefinition
Properties:
Cpu: "256"
Memory: "1024"
RequiresCompatibilities:
- FARGATE
ContainerDefinitions:
- Name: contained_above
.
.
.
ECSService:
Type: AWS::ECS::Service
DependsOn: ListenerRule
Properties:
Cluster: !Sub "${EnvName}-ECScluster"
DesiredCount: 1
LaunchType: FARGATE
DeploymentConfiguration:
MaximumPercent: 200
MinimumHealthyPercent: 50
NetworkConfiguration:
AwsvpcConfiguration:
AssignPublicIp: ENABLED
Subnets:
- Fn::ImportValue: !Sub "${EnvName}-PUBLIC-SUBNET-1"
- Fn::ImportValue: !Sub "${EnvName}-PUBLIC-SUBNET-2"
SecurityGroups:
- Fn::ImportValue: !Sub "${EnvName}-CONTAINER-SECURITY-GROUP"
ServiceName: !Sub "${EnvName}-ECS-SERVICE"
TaskDefinition: !Ref ECSTaskDefinition
LoadBalancers:
- ContainerName: contained_above
ContainerPort: 5000
TargetGroupArn: !Ref TargetGroup
(App is working normally)
Question
Now my question is: what should the number of workers in the gunicorn command (the last command in my Dockerfile) be?
The gunicorn design docs state: "Generally we recommend (2 x $num_cores) + 1 as the number of workers to start off with."
So what is the number of cores on Fargate? Does it actually make sense to combine gunicorn with Fargate like the above process? Is there 'compatibility' between load balancers and gunicorn workers? What is the connection between the DesiredCount of the ECS Service and the gunicorn -w workers value? Am I missing or misunderstanding something?
Possible solution(?)
One way that I could call it is the following:
CMD gunicorn -b 0.0.0.0:5000 -w $(( 2 * `cat /proc/cpuinfo | grep 'core id' | wc -l` + 1 )) --access-logfile - "app:create_app()"
But I am not sure if that would be a good solution.
Any insights? Thanks
EDIT: I'm using a configuration file for gunicorn to use when starting:
gunicorn.conf.py
import multiprocessing
bind = "0.0.0.0:8080"
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornH11Worker"
keepalive = 0
You can tell gunicorn which config file to use with the --config flag.
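For example, reusing the app factory from the question's Dockerfile (this assumes the config file is copied into the image and that its bind address matches the port the load balancer targets):
CMD gunicorn --config gunicorn.conf.py --access-logfile - "app:create_app()"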
Sadly I can't find the source anymore, but I've read that 4-12 workers should be enough to handle hundreds if not thousands of simultaneous requests - depending on your application structure, worker class and payload size.
Do take this with a grain of salt though, since I can't find the source anymore, but it was in an accepted SO answer from a well-reputed person if I remember correctly.
The official gunicorn docs state something in the 2-4 x $(NUM_CORES) range.
Another option would be as gunicorn docs state at another point:
Generally we recommend (2 x $num_cores) + 1 as the number of workers
to start off with. While not overly scientific, the formula is based
on the assumption that for a given core, one worker will be reading or
writing from the socket while the other worker is processing a
request.
Obviously, your particular hardware and application are going to
affect the optimal number of workers. Our recommendation is to start
with the above guess and tune using TTIN and TTOU signals while the
application is under load.
So far I've been running well holding to the 4-12 worker recommendation. My company runs several APIs, which connect to other APIs out there, which results in mostly 1-2 second request times, with the longest taking up to a whole minute (a lot of external API calls here).
Another colleague I talked to mentioned they are using 1 worker per 5 simultaneous requests they expect, with similar APIs to ours. Works fine for them as well.
I have a flask app that includes some NLP packages and takes a while to initially build some vectors before it starts the server. I've noticed this in the past with Google App Engine and I was able to set a max timeout in the app.yaml file to fix this.
The problem is that when I start my cluster on Kubernetes with this app, I notice that the workers keep timing out in the logs, which makes sense because I'm sure the default amount of time is not enough. However, I can't figure out how to configure GKE to allow the workers enough time to do everything they need to do before they start serving.
How do I increase the time the workers can take before they timeout?
I deleted the old instances so I can't get the logs right now, but I can start it up if someone wants to see the logs.
It's something like this:
I 2020-06-26T01:16:04.603060653Z Computing vectors for all products
E 2020-06-26T01:16:05.660331982Z
95it [00:05, 17.84it/s][2020-06-26 01:16:05 +0000] [220] [INFO] Booting worker with pid: 220
E 2020-06-26T01:16:31.198002748Z [nltk_data] Downloading package stopwords to /root/nltk_data...
E 2020-06-26T01:16:31.198056691Z [nltk_data] Package stopwords is already up-to-date!
100it 2020-06-26T01:16:35.696015992Z [CRITICAL] WORKER TIMEOUT (pid:220)
E 2020-06-26T01:16:35.696015992Z [2020-06-26 01:16:35 +0000] [220] [INFO] Worker exiting (pid: 220)
I also see this:
The node was low on resource: memory. Container thoughtful-sha256-1 was using 1035416Ki, which exceeds its request of 0.
Obviously I don't exactly know what I'm doing. Why does it say I'm requesting 0 memory and can I set a timeout amount for the Kubernetes nodes?
Thanks for the help!
One thing you can do is add some sort of delay in a startup script for your GCP instances. You could try a simple:
#!/bin/bash
sleep <time-in-seconds>
Another thing you can try is adding some sort of delay before your containers start on your Kubernetes nodes. For example, a delay in an initContainer:
apiVersion: v1
kind: Pod
metadata:
name: myapp-pod
labels:
app: myapp
spec:
containers:
- name: myapp-container
image: myapa:latest
initContainers:
- name: init-myservice
image: busybox:1.28
command: ['sh', '-c', "echo Waiting a bit && sleep 3600"]
Furthermore, you can try a startupProbe combined with the probe parameter initialDelaySeconds on your actual application container; that way it waits for some time before checking whether the application has started:
startupProbe:
exec:
command:
- touch
- /tmp/started
initialDelaySeconds: 3600
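An alternative to a fixed one-hour delay is to give the startup probe a periodSeconds and failureThreshold whose product covers the worst-case startup time, so the container is polled repeatedly and the liveness probe only takes over once startup has succeeded. The endpoint and port below are illustrative assumptions, not taken from the original app:
startupProbe:
  httpGet:
    path: /healthz
    port: 5000
  periodSeconds: 10
  failureThreshold: 60    # allow up to 60 x 10s = 10 minutes for startup
livenessProbe:
  httpGet:
    path: /healthz
    port: 5000
  periodSeconds: 10
If the startup probe never succeeds within that window, the container is killed and restarted according to the pod's restartPolicy.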