Restarting SuperVisor with code changes - django

I am running a django project with Gunicorn and Nginx with Supervisor. Everything worked fine but when i made some changes to the code it is not recognized by the supervisor and still it reads the old codes. Can you please help me. I tried to restart supervisorctl, it didnt work

If you're talking about python code changes, just use supervisorctl.
supervisorctl restart gunicorn (or whatever you called this)
If you're talking about supervisor configuration changes, use supervisorctl reread before starting your supervisor startup script via supervisorctl start foo

"You can gracefully reload your application in Gunicorn by sending HUP signal: $ kill -HUP masterpid", http://docs.gunicorn.org/en/stable/faq.html
For example, pkill -HUP gunicorn
"Sending HUP signal to the Master Gunicorn process -- Reload the configuration, start the new worker processes with a new configuration and gracefully shutdown older workers.", http://docs.gunicorn.org/en/stable/signals.html

Related

Symfony Messenger not shutting down gracefully when using APP_ENV=prod

We are using Symfony Messenger in combination with supervisor running in a Docker container on AWS ECS. We noticed the worker is not shut down gracefully. After debugging it appears it does work as expected when using APP_ENV=dev, but not when APP_ENV=prod.
I made a simple sleepMessage, which sleeps for 1 second and then prints a message for 60 seconds. This is when running with APP_ENV=dev
As you can see it's clearly waiting for the program to stop running.
Now with APP_ENV=prod:
It stops immediately without waiting.
In the Dockerfile we have configured the following to start supervisor. It's based on php:8.1-apache, so that's why STOPSIGNAL has been configured
RUN apt-get update && apt-get install -y --no-install-recommends \
# for supervisor
python \
supervisor
The start-worker.sh script contains this
#!/usr/bin/env bash
cp config/worker/messenger-worker.conf ../../../etc/supervisor/supervisord.conf
exec /usr/bin/supervisord
We do this because certain env variables are only available when starting up.
For debugging purposes the config has been hardcoded to test.
Below is the messenger-worker.conf
[unix_http_server]
file=/tmp/supervisor.sock
[supervisord]
nodaemon=true ; start in foreground if true; default false
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[program:messenger-consume]
stderr_logfile_maxbytes=0
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
command=bin/console messenger:consume async -vv --env=prod --time-limit=129600
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
numprocs=1
environment=
MESSENGER_TRANSPORT_DSN="https://sqs.eu-central-1.amazonaws.com/{id}/dev-
symfony-messenger-queue"
So in short, when using --env=prod in the config above it doesn't wait for the worker to stop, while with --env=dev it does. Does anybody know how to solve this?
I don't know why there would be a difference between dev & prod environment but it seems you have no grace period set (at least for Supervisor). As I added in the docs:
the workers will be able to handle the SIGTERM signal if you have the PCNTL PHP extension
you need to add stopwaitsecs to your Supervisor program configuration
As you use Docker too, you can also set the graceful period at the service level which defaults to 10s:
services:
my_app:
stop_grace_period: 20s
# ...
With this configuration, running docker-compose down (just an example):
Docker sends a SIGTERM signal to the service entrypoint (Supervisor) and waits 20s for it to exit
Supervisor sends a SIGTERM signal to its programs (messenger:consume commands) and waits 20s for them to exit
the messenger:consume processes will "catch" the signal, finish handling the current message and stop
every program stopped, Supervisor can stop, then the Docker Compose stack
Turns out it was related to the wait_time option related to SQS transports. It probably caused a request that was started just before the container exited and was sent back when the container did not exist anymore. So, wait_time to 0 fixed that problem.
Then there was this which could lead to the same issue

Restarting django-q qcluster gracefully

How can we restart qcluster gracefully for server changes? E.g. We use gunicorn to run django server which lets you gracefully restart workers without down time. How can you restart qcluster workers without disturbing any ongoing worker processing? Thanks.
I tried to restart Django-Q gracefully using the kill -SIGHUB since it is a convention for restarting services gracefully but it didn't work. I noticed Django-Q restart gracefully when receiving CTRL+C command and from there I found the solution.
# -2 is SIGINT. It acts like CTRL+C.
pkill -e -2 --full 'python manage.py qcluster'; (python manage.py qcluster &)

how to use uwsgi restart django

I have a wsgi.ini file in my project, and I use uwsgi wsgi.ini to run my project.But when I change the django code,I want to restart the project instead kill uwsgi then reload it. The uwsgi official document provide the following methods:
# using kill to send the signal
kill -HUP `cat /tmp/project-master.pid`
# or the convenience option --reload
uwsgi --reload /tmp/project-master.pid
# or if uwsgi was started with touch-reload=/tmp/somefile
touch /tmp/somefile
But I don't have a project-master.pid file in /tmp catalog in my system(centOS).
my question:
how to use uwsgi restart django instead of kill it then start it?
if use uwsgi official document provided method,how to create a .pid file and what content should in this file?
I find the anwser. project-master.pid is set in wsgi.ini file, you should set pidfile=/tmp/project-master.pid first. Then use uwsgi to start server: uwsgi wsgi.ini.After you start it, you can see a project-master.pid file in /tmp catalog. When you want to reload uwsgi server, you can use such command to restart server: uwsgi --reload /tmp/project-master.pid.
I found simplier answer in my opinion, you can just kill your uwsgi process and then spawn it again:
killall uwsgi
And then just run your uwsgi command again.
You don't need to use uWSGI server for your local development needs. Apache/uWSGI are meant for production, and having them restarted implicitly at every code change is not often desirable. In fact, production server not restarting even after the code is changed often acts as a safety net, so that you don't end up restarting the server without finalising the deployment.
Just use inbuild server django provides with itself.
python manage.py runserver 8000

Gunicorn sync workers spawning processes

We're using Django + Gunicorn + Nginx in our server. The problem is that after a while we see lot's of gunicorn worker processes that have became orphan, and a lot other ones that have became zombie. Also we can see that some of Gunicorn worker processes spawn some other Gunicorn workers. Our best guess is that these workers become orphans after their parent workers have died.
Why Gunicorn workers spawn child workers? Why do they die?! And how can we prevent this?
I should also mention that we've set Gunicorn log level to debug and still we don't see any thing significant, other than periodical log of workers number, which reports count of workers we wanted from it.
UPDATE
This is the line we used to run gunicorn:
gunicorn --env DJANGO_SETTINGS_MODULE=proj.settings proj.wsgi --name proj --workers 10 --user proj --group proj --bind 127.0.0.1:7003 --log-level=debug --pid gunicorn.pid --timeout 600 --access-logfile /home/proj/access.log --error-logfile /home/proj/error.log
In my case I deploy in Ubuntu servers (LTS releases, now almost are 14.04 LTS servers) and I never did have problems with gunicorn daemons, I create a gunicorn.conf.py and launch gunicorn with this config from upstart with an script like this in /etc/init/djangoapp.conf
description "djangoapp website"
start on startup
stop on shutdown
respawn
respawn limit 10 5
script
cd /home/web/djangoapp
exec /home/web/djangoapp/bin/gunicorn -c gunicorn.conf.py -u web -g web djangoapp.wsgi
end script
I configure gunicorn with a .py file config and i setup some options (details below) and deploy my app (with virtualenv) in /home/web/djangoapp and no problems with zombie and orphans gunicorn processes.
i verified your options, timeout can be a problem but another one is that you don't setup max-requests in your config, by default is 0, so, no automatic worker restart in your daemon and can generate memory leaks (http://gunicorn-docs.readthedocs.org/en/latest/settings.html#max-requests)
We will use a .sh file to start the gunicorn process. Later you will use a supervisord configuration file. what is supervisord? some external know how information link about how to install supervisord with Django,Nginx,Gunicorn Here
gunicorn_start.sh remember to give chmod +x to the file.
#!/bin/sh
NAME="myDjango"
DJANGODIR="/var/www/html/myDjango"
NUM_WORKERS=3
echo "Starting myDjango -- Django Application"
cd $DJANGODIR
exec gunicorn -w $NUM_WORKERS $NAME.wsgi:application --bind 127.0.0.1:8001
mydjango_django.conf : Remember to install supervisord on your OS. and
Copy this on the configuration folder.
[program:myDjango]
command=/var/www/html/myDjango/gunicorn_start.sh
user=root
autorestart=true
redirect_sderr=true
Later on use the command:
Reload the daemon’s configuration files, without add/remove (no restarts)
supervisordctl reread
Restart all processes Note: restart does not reread config files. For that, see reread and update.
supervisordctl start all
Get all process status info.
supervisordctl status
This sounds like a timeout issue.
You have multiple timeouts going on and they all need to be in a descending order. It seems they may not be.
For example:
Nginx has a default timeout of 60 seconds
Gunicorn has a default timeout of 30 seconds
Django has a default timeout of 300 seconds
Postgres default timeout is complicated but let's pose 60 seconds for this example.
In this example, when 30 seconds has passed and Django is still waiting for Postgres to respond. Gunicorn tells Django to stop, which in turn should tell Postgres to stop. Gunicorn will wait a certain amount of time for this to happen before it kills django, leaving the postgres process as an orphan query. The user will re-initiate their query and this time the query will take longer because the old one is still running.
I see that you have set your Gunicorn tiemeout to 300 seconds.
This would probably mean that Nginx tells Gunicorn to stop after 60 seconds, Gunicorn may wait for Django who waits for Postgres or any other underlying processes, and when Nginx gets tired of waiting, it kills Gunicorn, leaving Django hanging.
This is still just a theory, but it is a very common problem and hopefully leads you and any others experiencing similar problems, to the right place.

running celery as daemon using supervisor is not working

I have a django app in which it has a celery functionality, so i can able to run the celery sucessfully like below
celery -A tasks worker --loglevel=info
but as a known fact that we need to run it as a daemon and so i have written the below celery.conf file inside /etc/supervisor/conf.d/ folder
; ==================================
; celery worker supervisor example
; ==================================
[program:celery]
; Set full path to celery program if using virtualenv
command=/root/Envs/proj/bin/celery -A app.tasks worker --loglevel=info
user=root
environment=C_FORCE_ROOT="yes"
environment=HOME="/root",USER="root"
directory=/root/apps/proj/structure
numprocs=1
stdout_logfile=/var/log/celery/worker.log
stderr_logfile=/var/log/celery/worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
but when i tried to update the supervisor like supervisorctl reread and supervisorctl update i was getting the message from supervisorctl status
celery FATAL Exited too quickly (process log may have details)
So i went to worker.log file and seen the error message as below
Running a worker with superuser privileges when the
worker accepts messages serialized with pickle is a very bad idea!
If you really want to continue then you have to set the C_FORCE_ROOT
environment variable (but please think about this before you do).
User information: uid=0 euid=0 gid=0 egid=0
So why it was complaining about C_FORCE_ROOT even though we had set it as environment variable inside supervisor conf file ? what am i doing wrong in the above conf file ?
I had the same problem,so I added
environment=C_FORCE_ROOT="yes"
in my program config,but It didn't work
so I used
environment=C_FORCE_ROOT="true"
it's working
You'll need to run celery with a non superuser account, Please remove following lines from your config:
user=root
environment=C_FORCE_ROOT="yes"
environment=HOME="/root",USER="root"
And the add these lines to your config, I assume that you use django as a non superuser and developers as the user group:
user=django
group=developers
Note that subprocesses will inherit the environment variables of the
shell used to start supervisord except for the ones overridden here
and within the program’s environment option. See supervisord documents.
So Please note that when you change environment variables via supervisor config files, Changes won't apply by running supervisorctl reread and supervisorctl reload . You should run supervisor from the very start by following command:
supervisord -c /path/to/config/file.conf
From this other thread on stackoverflow. I managed to add the following settings and it worked for me.
app.conf.update(
CELERY_ACCEPT_CONTENT = ['json'],
CELERY_TASK_SERIALIZER = 'json',
CELERY_RESULT_SERIALIZER = 'json',
)