Running celery task on Ubuntu with supervisor - django

I have a test Django site using a mod_wsgi daemon process, and have set up a simple Celery task to email a contact form, and have installed supervisor.
I know that the Django code is correct.
The problem I'm having is that when I submit the form, I am only getting one message - the first one. Subsequent completions of the contact form do not send any message at all.
On my server, I have another test site with a configured supervisor task running which uses the Django server (ie it's not using mod_wsgi). Both of my tasks are running fine if I do
sudo supervisorctl status
Here is my conf file for the task I've described above which is saved at
/etc/supervisor/conf.d
the user in this instance is called myuser
[program:test_project]
command=/home/myuser/virtualenvs/test_project_env/bin/celery -A test_project worker --loglevel=INFO --concurrency=10 -n worker2#%%h
directory=/home/myuser/djangoprojects/test_project
user=myuser
numprocs=1
stdout_logfile=/var/log/celery/test_project.out.log
stderr_logfile=/var/log/celery/test_project.err.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
stopasgroup=true
; Set Celery priority higher than default (999)
; so, if rabbitmq is supervised, it will start first.
priority=1000
My other test site has this set as the command - note worker1#%%h
command=/home/myuser/virtualenvs/another_test_project_env/bin/celery -A another_test_project worker --loglevel=INFO --concurrency=10 -n worker1#%%h
I'm obviously doing something wrong in that my form is only submitted. If I look at the out.log file referred to above, I only see the first task, nothing is visible for the other form submissions.
Many thanks in advance.
UPDATE
I submitted the first form at 8.32 am (GMT) which was received, and then as described above, another one shortly thereafter for which a task was not created. Just after finishing the question, I submitted the form again at 9.15, and for this a task was created and the message received! I then submitted the form again, but no task was created again. Hope this helps!

use ps auxf|grep celery to see how many worker you started,if there is any other worker you start before and you don't kill it ,the worker you create before will consume the task,result in you every two or three(or more) times there is only one task is received.
and you need to stop celery by:
sudo supervisorctl -c /etc/supervisord/supervisord.conf stop all
everytime, and set this in supervisord.conf:
stopasgroup=true ; send stop signal to the UNIX process group (default false)
Otherwise it will causes memory leaks and regular task loss.
If you has multi django site,here is a demo support by RabbitMQ:
you need add rabbitmq vhost and set user to vhost:
sudo rabbitmqctl add_vhost {vhost_name}
sudo rabbitmqctl set_permissions -p {vhost_name} {username} ".*" ".*" ".*"
and different site use different vhost(but can use same user).
add this to your django settings.py:
BROKER_URL = 'amqp://username:password#localhost:5672/vhost_name'
some info here:
Using celeryd as a daemon with multiple django apps?
Running multiple Django Celery websites on same server
Run Multiple Django Apps With Celery On One Server With Rabbitmq VHosts
Run Multiple Django Apps With Celery On One Server With Rabbitmq VHosts

Related

Symfony Messenger not shutting down gracefully when using APP_ENV=prod

We are using Symfony Messenger in combination with supervisor running in a Docker container on AWS ECS. We noticed the worker is not shut down gracefully. After debugging it appears it does work as expected when using APP_ENV=dev, but not when APP_ENV=prod.
I made a simple sleepMessage, which sleeps for 1 second and then prints a message for 60 seconds. This is when running with APP_ENV=dev
As you can see it's clearly waiting for the program to stop running.
Now with APP_ENV=prod:
It stops immediately without waiting.
In the Dockerfile we have configured the following to start supervisor. It's based on php:8.1-apache, so that's why STOPSIGNAL has been configured
RUN apt-get update && apt-get install -y --no-install-recommends \
# for supervisor
python \
supervisor
The start-worker.sh script contains this
#!/usr/bin/env bash
cp config/worker/messenger-worker.conf ../../../etc/supervisor/supervisord.conf
exec /usr/bin/supervisord
We do this because certain env variables are only available when starting up.
For debugging purposes the config has been hardcoded to test.
Below is the messenger-worker.conf
[unix_http_server]
file=/tmp/supervisor.sock
[supervisord]
nodaemon=true ; start in foreground if true; default false
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[program:messenger-consume]
stderr_logfile_maxbytes=0
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
command=bin/console messenger:consume async -vv --env=prod --time-limit=129600
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
numprocs=1
environment=
MESSENGER_TRANSPORT_DSN="https://sqs.eu-central-1.amazonaws.com/{id}/dev-
symfony-messenger-queue"
So in short, when using --env=prod in the config above it doesn't wait for the worker to stop, while with --env=dev it does. Does anybody know how to solve this?
I don't know why there would be a difference between dev & prod environment but it seems you have no grace period set (at least for Supervisor). As I added in the docs:
the workers will be able to handle the SIGTERM signal if you have the PCNTL PHP extension
you need to add stopwaitsecs to your Supervisor program configuration
As you use Docker too, you can also set the graceful period at the service level which defaults to 10s:
services:
my_app:
stop_grace_period: 20s
# ...
With this configuration, running docker-compose down (just an example):
Docker sends a SIGTERM signal to the service entrypoint (Supervisor) and waits 20s for it to exit
Supervisor sends a SIGTERM signal to its programs (messenger:consume commands) and waits 20s for them to exit
the messenger:consume processes will "catch" the signal, finish handling the current message and stop
every program stopped, Supervisor can stop, then the Docker Compose stack
Turns out it was related to the wait_time option related to SQS transports. It probably caused a request that was started just before the container exited and was sent back when the container did not exist anymore. So, wait_time to 0 fixed that problem.
Then there was this which could lead to the same issue

Gunicorn sync workers spawning processes

We're using Django + Gunicorn + Nginx in our server. The problem is that after a while we see lot's of gunicorn worker processes that have became orphan, and a lot other ones that have became zombie. Also we can see that some of Gunicorn worker processes spawn some other Gunicorn workers. Our best guess is that these workers become orphans after their parent workers have died.
Why Gunicorn workers spawn child workers? Why do they die?! And how can we prevent this?
I should also mention that we've set Gunicorn log level to debug and still we don't see any thing significant, other than periodical log of workers number, which reports count of workers we wanted from it.
UPDATE
This is the line we used to run gunicorn:
gunicorn --env DJANGO_SETTINGS_MODULE=proj.settings proj.wsgi --name proj --workers 10 --user proj --group proj --bind 127.0.0.1:7003 --log-level=debug --pid gunicorn.pid --timeout 600 --access-logfile /home/proj/access.log --error-logfile /home/proj/error.log
In my case I deploy in Ubuntu servers (LTS releases, now almost are 14.04 LTS servers) and I never did have problems with gunicorn daemons, I create a gunicorn.conf.py and launch gunicorn with this config from upstart with an script like this in /etc/init/djangoapp.conf
description "djangoapp website"
start on startup
stop on shutdown
respawn
respawn limit 10 5
script
cd /home/web/djangoapp
exec /home/web/djangoapp/bin/gunicorn -c gunicorn.conf.py -u web -g web djangoapp.wsgi
end script
I configure gunicorn with a .py file config and i setup some options (details below) and deploy my app (with virtualenv) in /home/web/djangoapp and no problems with zombie and orphans gunicorn processes.
i verified your options, timeout can be a problem but another one is that you don't setup max-requests in your config, by default is 0, so, no automatic worker restart in your daemon and can generate memory leaks (http://gunicorn-docs.readthedocs.org/en/latest/settings.html#max-requests)
We will use a .sh file to start the gunicorn process. Later you will use a supervisord configuration file. what is supervisord? some external know how information link about how to install supervisord with Django,Nginx,Gunicorn Here
gunicorn_start.sh remember to give chmod +x to the file.
#!/bin/sh
NAME="myDjango"
DJANGODIR="/var/www/html/myDjango"
NUM_WORKERS=3
echo "Starting myDjango -- Django Application"
cd $DJANGODIR
exec gunicorn -w $NUM_WORKERS $NAME.wsgi:application --bind 127.0.0.1:8001
mydjango_django.conf : Remember to install supervisord on your OS. and
Copy this on the configuration folder.
[program:myDjango]
command=/var/www/html/myDjango/gunicorn_start.sh
user=root
autorestart=true
redirect_sderr=true
Later on use the command:
Reload the daemon’s configuration files, without add/remove (no restarts)
supervisordctl reread
Restart all processes Note: restart does not reread config files. For that, see reread and update.
supervisordctl start all
Get all process status info.
supervisordctl status
This sounds like a timeout issue.
You have multiple timeouts going on and they all need to be in a descending order. It seems they may not be.
For example:
Nginx has a default timeout of 60 seconds
Gunicorn has a default timeout of 30 seconds
Django has a default timeout of 300 seconds
Postgres default timeout is complicated but let's pose 60 seconds for this example.
In this example, when 30 seconds has passed and Django is still waiting for Postgres to respond. Gunicorn tells Django to stop, which in turn should tell Postgres to stop. Gunicorn will wait a certain amount of time for this to happen before it kills django, leaving the postgres process as an orphan query. The user will re-initiate their query and this time the query will take longer because the old one is still running.
I see that you have set your Gunicorn tiemeout to 300 seconds.
This would probably mean that Nginx tells Gunicorn to stop after 60 seconds, Gunicorn may wait for Django who waits for Postgres or any other underlying processes, and when Nginx gets tired of waiting, it kills Gunicorn, leaving Django hanging.
This is still just a theory, but it is a very common problem and hopefully leads you and any others experiencing similar problems, to the right place.

Supervisord / Celery output logs

I am using Supervisor to manage celery. The supervisor config file for celery contains the following 2 entries:
stdout_logfile = /var/log/supervisor/celery.log
stderr_logfile = /var/log/supervisor/celery_err.log
What is confusing me is that although Celery is working properly and all tasks are being successfully completed, they are all written to celery_err.log. I thought that would only be for errors. The celery.log file only shows the normal celery startup info. Is the behaviour of writing successful task completion to the error log correct?
Note - tasks are definitely completing successfully (emails being sent, db entries made etc).
I met the same phenomenon just like you.This is because of celery's loging mechanism. Just see the setup_task_loggers method of celery logger source.
If logfile is not specified, then sys.stderr is used.
So, clear enough? Celery uses sys.stderr when logfile is not specified.
Solution:
You can use supervisord redirect_stderr = true flag to combine two log files into one. I'm using this one.
Config the celery logfile option.
Is the behaviour of writing successful task completion to the error log correct?
No, its not. I am having same setup and logging is working fine.
celery.log has task info
[2015-07-23 11:40:07,066: INFO/MainProcess] Received task: foo[b5a6e0e8-1027-4005-b2f6-1ea032c73d34]
[2015-07-23 11:40:07,494: INFO/MainProcess] Task foo[b5a6e0e8-1027-4005-b2f6-1ea032c73d34] succeeded in 0.424549156s: 1
celery_err.log has some warnings/errors. Try restarting supervisor process.
One issue you could be having is that python buffers output by default
In your supervisor file, you can disable this by setting the PYTHONUNBUFFERED environment variable, see below my Django example supervisor file
[program:celery-myapp]
environment=DJANGO_SETTINGS_MODULE="myapp.settings.production",PYTHONUNBUFFERED=1
command=/home/me/.virtualenvs/myapp/bin/celery -A myapp worker -B -l DEBUG
directory=/home/me/www/saleor
user=me
stdout_logfile=/home/me/www/myapp/log/supervisor-celery.log
stderr_logfile=/home/me/www/myapp/log/supervisor-celery-err.log
autostart=true
autorestart=true
startsecs=10

running celery as daemon using supervisor is not working

I have a django app in which it has a celery functionality, so i can able to run the celery sucessfully like below
celery -A tasks worker --loglevel=info
but as a known fact that we need to run it as a daemon and so i have written the below celery.conf file inside /etc/supervisor/conf.d/ folder
; ==================================
; celery worker supervisor example
; ==================================
[program:celery]
; Set full path to celery program if using virtualenv
command=/root/Envs/proj/bin/celery -A app.tasks worker --loglevel=info
user=root
environment=C_FORCE_ROOT="yes"
environment=HOME="/root",USER="root"
directory=/root/apps/proj/structure
numprocs=1
stdout_logfile=/var/log/celery/worker.log
stderr_logfile=/var/log/celery/worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
but when i tried to update the supervisor like supervisorctl reread and supervisorctl update i was getting the message from supervisorctl status
celery FATAL Exited too quickly (process log may have details)
So i went to worker.log file and seen the error message as below
Running a worker with superuser privileges when the
worker accepts messages serialized with pickle is a very bad idea!
If you really want to continue then you have to set the C_FORCE_ROOT
environment variable (but please think about this before you do).
User information: uid=0 euid=0 gid=0 egid=0
So why it was complaining about C_FORCE_ROOT even though we had set it as environment variable inside supervisor conf file ? what am i doing wrong in the above conf file ?
I had the same problem,so I added
environment=C_FORCE_ROOT="yes"
in my program config,but It didn't work
so I used
environment=C_FORCE_ROOT="true"
it's working
You'll need to run celery with a non superuser account, Please remove following lines from your config:
user=root
environment=C_FORCE_ROOT="yes"
environment=HOME="/root",USER="root"
And the add these lines to your config, I assume that you use django as a non superuser and developers as the user group:
user=django
group=developers
Note that subprocesses will inherit the environment variables of the
shell used to start supervisord except for the ones overridden here
and within the program’s environment option. See supervisord documents.
So Please note that when you change environment variables via supervisor config files, Changes won't apply by running supervisorctl reread and supervisorctl reload . You should run supervisor from the very start by following command:
supervisord -c /path/to/config/file.conf
From this other thread on stackoverflow. I managed to add the following settings and it worked for me.
app.conf.update(
CELERY_ACCEPT_CONTENT = ['json'],
CELERY_TASK_SERIALIZER = 'json',
CELERY_RESULT_SERIALIZER = 'json',
)

Running multiple Django Celery websites on same server

I'm running multiple Django/apache/wsgi websites on the same server using apache2 virtual servers. And I would like to use celery, but if I start celeryd for multiple websites, all the websites will use the configuration (logs, DB, etc) of the last celeryd instance I started.
Is there a way to use multiple Celeryd (one for each website) or one Celeryd for all of them? Seems like it should be doable, but I can't find out how.
This problem was a big headache, i didn't noticed #Crazyshezy 's comment when i first came here. I just accomplished this by changing Broker URL in settings.py for each web app.
app1.settings.py
BROKER_URL = 'redis://localhost:6379/0'
app2.settings.py
BROKER_URL = 'redis://localhost:6379/1'
Yes there is a way.
We use supervisor to start celery daemons for every project we need it.
The supervisor config file looks something like this:
[program:PROJECTNAME]
command=python manage.py celeryd --loglevel=INFO --beat
environment=PATH=/home/www-data/projects/PROJECTNAME/env/bin:/usr/bin:/bin
directory=/home/www-data/projects/PROJECTNAME/
user=www-data
numprocs=1
umask=022
stdout_logfile=/home/www-data/logs/%(program_name)s.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=10
stderr_logfile=/home/www-data/logs/%(program_name)s.error.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=10
autorestart=true
autostart=True
startsecs=10
stopwaitsecs = 60
priority=998
There is also an other advantage if you use this setup: The celery daemons run entirely in userspace.
Remember to use different broker backends for your projects. It won't work if you use the same rabbitmq virtualhost or if you use the same redis database for every project.