How can we restart qcluster gracefully for server changes? For example, we use gunicorn to run the Django server, which lets you gracefully restart workers without downtime. How can you restart qcluster workers without disturbing any ongoing worker processing? Thanks.
I tried to restart Django-Q gracefully using kill -SIGHUP, since that is the conventional way to restart services gracefully, but it didn't work. I then noticed that Django-Q restarts gracefully when it receives CTRL+C, and from there I found the solution.
# -2 is SIGINT. It acts like CTRL+C.
pkill -e -2 --full 'python manage.py qcluster'; (python manage.py qcluster &)
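(For context: -2 sends SIGINT, --full matches against the full command line, and -e echoes the name and PID of each process that is signalled.) If your deploy tooling happens to be written in Python, a rough equivalent of the same idea, just a sketch that assumes pgrep is available and that you run it from the project root, would be:

import os
import signal
import subprocess

# Find running qcluster processes by their full command line
# (hypothetical helper, not part of Django-Q itself).
result = subprocess.run(
    ["pgrep", "--full", "python manage.py qcluster"],
    capture_output=True, text=True,
)

# SIGINT acts like CTRL+C, so Django-Q finishes in-flight tasks before exiting.
for pid in result.stdout.split():
    os.kill(int(pid), signal.SIGINT)

# Start a fresh cluster in the background.
subprocess.Popen(["python", "manage.py", "qcluster"])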
I've been trying to follow this thorough explanation of how to deploy a Django app with a Celery worker to AWS Elastic Beanstalk:
How to run a celery worker with Django app scalable by AWS Elastic Beanstalk?
I had some problems installing pycurl, but solved them with the comment in:
Pip Requirements.txt --global-option causing installation errors with other packages. "option not recognized"
Then I got:
[2019-01-26T06:43:04.865Z] INFO [12249] - [Application update app-190126_134200#28/AppDeployStage0/EbExtensionPostBuild/Infra-EmbeddedPostBuild/postbuild_1_raiseflags/Command 05_celery_tasks_run] : Activity execution failed, because: /usr/bin/env: bash
: No such file or directory
(ElasticBeanstalk::ExternalInvocationError)
But I also solved that one: it turns out I had to convert the "celery_configuration.txt" file to UNIX line endings (I'm on Windows, and Notepad++ had automatically saved it with Windows line endings).
With all these modifications I can successfully deploy the project. But the problem is that the periodic tasks are not running.
I get:
2019-01-26 09:12:57,337 INFO exited: celeryd-beat (exit status 1; not expected)
2019-01-26 09:12:58,583 INFO spawned: 'celeryd-worker' with pid 25691
2019-01-26 09:12:59,453 INFO spawned: 'celeryd-beat' with pid 25695
2019-01-26 09:12:59,666 INFO exited: celeryd-worker (exit status 1; not expected)
2019-01-26 09:13:00,790 INFO spawned: 'celeryd-worker' with pid 25705
2019-01-26 09:13:00,791 INFO exited: celeryd-beat (exit status 1; not expected)
2019-01-26 09:13:01,915 INFO exited: celeryd-worker (exit status 1; not expected)
2019-01-26 09:13:03,919 INFO spawned: 'celeryd-worker' with pid 25728
2019-01-26 09:13:03,920 INFO spawned: 'celeryd-beat' with pid 25729
2019-01-26 09:13:05,985 INFO exited: celeryd-worker (exit status 1; not expected)
2019-01-26 09:13:06,091 INFO exited: celeryd-beat (exit status 1; not expected)
2019-01-26 09:13:07,092 INFO gave up: celeryd-beat entered FATAL state, too many start retries too quickly
2019-01-26 09:13:09,096 INFO spawned: 'celeryd-worker' with pid 25737
2019-01-26 09:13:10,084 INFO exited: celeryd-worker (exit status 1; not expected)
2019-01-26 09:13:11,085 INFO gave up: celeryd-worker entered FATAL state, too many start retries too quickly
I also have this part of the logs:
[2019-01-26T09:13:00.583Z] INFO [25247] - [Application update app-190126_161213#43/AppDeployStage1/AppDeployPostHook/run_supervised_celeryd.sh] : Completed activity. Result:
[program:celeryd-worker]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A raiseflags --loglevel=INFO
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-worker.log
stderr_logfile=/var/log/celery-worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=PYTHONPATH="/opt/python/current/app/:",PATH="/opt/python/run/venv/bin/:%%(ENV_PATH)s",RDS_PORT="5432",RDS_DB_NAME="ebdb",RDS_USERNAME="foobar",PYCURL_SSL_LIBRARY="nss",DJANGO_SETTINGS_MODULE="raiseflags.settings",RDS_PASSWORD="foobar",RDS_HOSTNAME="something.something.eu-west-1.rds.amazonaws.com"
[program:celeryd-beat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A raiseflags --loglevel=INFO --workdir=/tmp -S django --pidfile /tmp/celerybeat.pid
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-beat.log
stderr_logfile=/var/log/celery-beat.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=PYTHONPATH="/opt/python/current/app/:",PATH="/opt/python/run/venv/bin/:%%(ENV_PATH)s",RDS_PORT="5432",RDS_DB_NAME="ebdb",RDS_USERNAME="puigdemontAWS",PYCURL_SSL_LIBRARY="nss",DJANGO_SETTINGS_MODULE="raiseflags.settings",RDS_PASSWORD="holahola",RDS_HOSTNAME="aa1m59206y4fljn.cdreg3t50bbl.eu-west-1.rds.amazonaws.com"
No config updates to processes
celeryd-beat: ERROR (not running)
celeryd-beat: ERROR (abnormal termination)
celeryd-worker: ERROR (not running)
celeryd-worker: ERROR (abnormal termination)
[2019-01-26T09:13:00.583Z] INFO [25247] - [Application update app-190126_161213#43/AppDeployStage1/AppDeployPostHook] : Completed activity. Result:
Successfully execute hooks in directory /opt/elasticbeanstalk/hooks/appdeploy/post.
[2019-01-26T09:13:00.583Z] INFO [25247] - [Application update app-190126_161213#43/AppDeployStage1] : Completed activity. Result:
Application version switch - Command CMD-AppDeploy stage 1 completed
[2019-01-26T09:13:00.583Z] INFO [25247] - [Application update app-190126_161213#43/AddonsAfter] : Starting activity...
[2019-01-26T09:13:00.583Z] INFO [25247] - [Application update app-190126_161213#43/AddonsAfter/ConfigLogRotation] : Starting activity...
[2019-01-26T09:13:00.583Z] INFO [25247] - [Application update app-190126_161213#43/AddonsAfter/ConfigLogRotation/10-config.sh] : Starting activity...
[2019-01-26T09:13:00.756Z] INFO [25247] - [Application update app-190126_161213#43/AddonsAfter/ConfigLogRotation/10-config.sh] : Completed activity. Result:
Disabled forced hourly log rotation.
[2019-01-26T09:13:00.756Z] INFO [25247] - [Application update app-190126_161213#43/AddonsAfter/ConfigLogRotation] : Completed activity. Result:
Successfully execute hooks in directory /opt/elasticbeanstalk/addons/logpublish/hooks/config.
I don't know if it has anything to do with the error, but notice the line above that contains PATH="/opt/python/run/venv/bin/:%%(ENV_PATH)s". Shouldn't ENV_PATH be something else?
environment=PYTHONPATH="/opt/python/current/app/:",PATH="/opt/python/run/venv/bin/:%%(ENV_PATH)s",RDS_PORT="5432",RDS_DB_NAME="ebdb",RDS_USERNAME="foobar",PYCURL_SSL_LIBRARY="nss",DJANGO_SETTINGS_MODULE="raiseflags.settings",RDS_PASSWORD="foobar",RDS_HOSTNAME="something.something.eu-west-1.rds.amazonaws.com"
It's my first time deploying an app with Celery, and I'm really lost, to be honest. I fought a lot to solve the first two errors (I'm a real amateur), and now that I get this one I don't even know where to start.
Also, I'm not sure if I'm using "celery_configuration.txt" the right way. The only thing I edited was the two places where it says "django_app", which I changed to "raiseflags" (the name of my Django project). Is this correct?
Does anyone know how to solve it? I can paste my files if needed, but they are just like the ones provided in the first link. I'm using Windows.
Thank you very much!
OK, the problem had nothing to do with the PATH line I was referring to. I just had to add 'django_celery_beat' and 'django_celery_results' to INSTALLED_APPS in my settings.py.
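For anyone else hitting this, here is a minimal sketch of the relevant part of settings.py (only the additions are shown; keep the rest of your apps as they are). Both packages ship database migrations, so run python manage.py migrate afterwards:

# settings.py (sketch: only the Celery-related additions)
INSTALLED_APPS = [
    # ... your existing Django apps ...
    'django_celery_beat',     # database-backed periodic task schedule (the -S django scheduler)
    'django_celery_results',  # stores task results in the database
]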
The connection error I mentioned later while talking to Fran was because I needed to set BROKER_URL instead of CELERY_BROKER_URL, also in the settings.py file. I guess this had to do with me not passing namespace='CELERY' to app.config_from_object() in the celery.py file (in the linked question they do, but I didn't, because I was using a different version of Celery).
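To make the naming issue concrete, here is a minimal celery.py sketch (assuming the project is called raiseflags, as above). The namespace argument is what decides whether Celery looks for the prefixed CELERY_BROKER_URL or the bare BROKER_URL in settings.py:

# celery.py (sketch)
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'raiseflags.settings')

app = Celery('raiseflags')

# With namespace='CELERY', settings must be prefixed: CELERY_BROKER_URL, CELERY_RESULT_BACKEND, ...
# Without the namespace argument, Celery reads the unprefixed names: BROKER_URL, etc.
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()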
Thanks to Fran for everything, especially for pointing out that I should review the Celery error logs; I didn't know how to do that. If any other amateur is also struggling, know that you have to "eb ssh" into your instance and then run "tail -n 40 /var/log/celery-worker.log" and "tail -n 40 /var/log/celery-beat.log" (where "40" is the number of lines you want to read). I know this sounds obvious to a lot of people, but, stupid me, I had no clue.
(By the way, I'm still struggling with a problem where the Celery worker can't find the pycurl module, but that has nothing to do with this question.)
Referring to the line you pointed out, where this appears:
environment=PYTHONPATH="/opt/python/current/app/:",PATH="/opt/python/run/venv/bin/:%%(ENV_PATH)s",RDS_PORT="5432",RDS_DB_NAME="ebdb",RDS_USERNAME="foobar",PYCURL_SSL_LIBRARY="nss",DJANGO_SETTINGS_MODULE="raiseflags.settings",RDS_PASSWORD="foobar",RDS_HOSTNAME="something.something.eu-west-1.rds.amazonaws.com"
Did you copy this line from somewhere? I don't see it in the link you posted.
In the linked answer it was environment=$celeryenv, where $celeryenv was defined as:
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g' | sed 's/%/%%/g'`
celeryenv=${celeryenv%?}
I used the lein repl command to test some operations on my project's database, but I was not able to connect to it.
Then I found that the issue was that the database was not getting loaded. The only solution I was able to find was:
lein run
This resulted in the following messages:
2018-04-24 12:23:07,397 [main] INFO guestbook.core - #'guestbook.db.core/*db* started
2018-04-24 12:23:07,398 [main] INFO guestbook.core - #'guestbook.handler/init-app started
2018-04-24 12:23:07,398 [main] INFO guestbook.core - #'guestbook.handler/app started
2018-04-24 12:23:07,398 [main] INFO guestbook.core - #'guestbook.core/http-server started
2018-04-24 12:23:07,398 [main] INFO guestbook.core - #'guestbook.core/repl-server started
Then I ran the following command:
lein repl :connect 7000
This connected to the database and started the REPL. The next commands worked fine:
user=> (use 'guestbook.db.core)
nil
user=> (get-messages)
nil
Please let me know if there is another way to do this.
On IBM DSX I find that if I leave a long-running Python notebook running overnight, the kernel dies at around the same time each night (around midnight UTC).
The Jupyter log shows:
[I 2017-07-29 23:37:14.929 NotebookApp] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel e827e71b-6492-4dc4-9201-b6ce29c2100c restarted
[D 2017-07-29 23:37:14.950 NotebookApp] Starting kernel: [u'/usr/local/src/bluemix_jupyter_bundle.v54/provision/pyspark_kernel_wrapper.sh', u'/gpfs/fs01/user/sc1c-81b7dbb381fb6a-c4b9ad2fa578/notebook/jupyter-rt/kernel-e827e71b-6492-4dc4-9201-b6ce29c2100c.json', u'spark20master']
[D 2017-07-29 23:37:14.954 NotebookApp] Connecting to: tcp://127.0.0.1:42931
[D 2017-07-29 23:37:17.957 NotebookApp] KernelRestarter: restart apparently succeeded
Neither the kernel log nor the Jupyter log shows anything else before this point.
Is there some policy being enforced here that kills kernels, or maybe some scheduled downtime each day? Does anybody know why the "KernelRestarter" is kicking in?
The KernelRestarter is not killing anything; it notices that the kernel is gone and starts a new one automatically. DSX has inactivity timeouts, but those would shut down your service altogether rather than kill a kernel, and inactivity timeouts are not tied to wall-clock time. This seems to be a bug in DSX.
I've been wondering about this and searching for solutions, but I haven't found any.
I'm running Celery in a container built with docker-compose. My container is configured like this:
celery:
build: .
container_name: cl01
env_file: ./config/variables.env
entrypoint:
- /celery-entrypoint.sh
volumes:
- ./django:/django
depends_on:
- web
- db
- redis
stop_grace_period: 1m
And my entrypoint script looks like this:
#!/bin/sh
# Wait for django
sleep 10
su -m dockeruser -c "celery -A myapp worker -l INFO"
Now, if I run docker-compose stop, I would like a warm (graceful) shutdown, giving Celery the configured 1 minute (stop_grace_period) to finish tasks that have already started. However, docker-compose stop seems to kill Celery straight away. Celery should also log that it has been asked to shut down gracefully, but I don't see anything except an abrupt stop in my task logs.
What am I doing wrong or what do I need to change to make Celery shut down gracefully?
Edit:
The suggested answer below about providing the --timeout parameter to docker-compose stop does not solve my issue.
You need to start the Celery process with exec. That way the Celery process replaces the shell and becomes the container's main process (the same PID as the Docker command), so Docker can send it the SIGTERM signal and Celery can shut down gracefully.
# should be the last command in script
exec celery -A myapp worker -l INFO
From the docs:
Usage: stop [options] [SERVICE...]
Options:
-t, --timeout TIMEOUT Specify a shutdown timeout in seconds (default: 10).
Try it with the timeout set to at least 60 seconds, for example docker-compose stop -t 60.
Here is my experience implementing graceful shutdown for Celery workers spawned by supervisord inside a Docker container.
Supervisord part
supervisord.conf
...
[supervisord]
...
; run supervisord in the foreground
nodaemon=true

[include]
; path to the celery config file
files=celery.conf
Set nodaemon=true so that supervisord stays in the foreground; we will start it as a background process from the entrypoint script later.
celery.conf
[group:celery_workers]
programs=one, two
[program:one]
...
command=celery -A backend --config=celery.py worker -n worker_one --pidfile=/var/log/celery/worker_one.pid --pool=gevent --concurrency=10 --loglevel=INFO
killasgroup=true
stopasgroup=true
stopsignal=TERM
stopwaitsecs=600
[program:two]
...
# similar to the previous one
The configuration file above is responsible for starting a group of workers, each running in a separate process. I'd like to dwell on the stopwaitsecs value. Let's see what the documentation tells us about it:
This parameter sets the number of seconds to wait for the OS to return
a SIGCHLD to supervisord after the program has been sent a
stopsignal. If this number of seconds elapses before supervisord
receives a SIGCHLD from the process, supervisord will attempt to kill
it with a final SIGKILL.
If stopwaitsecs is greater than the stop_grace_period specified for your service in the docker-compose file, then you'll get a SIGKILL from Docker before supervisord has finished stopping the workers. Make sure stopwaitsecs < stop_grace_period; otherwise any still-running tasks get interrupted by Docker. (In this example, stopwaitsecs=600, i.e. 10 minutes, sits comfortably below the stop_grace_period of 15m30s set further down.)
Entrypoint script part
entrypoint.sh
#!/bin/bash
# safety switch: exit the script if there is an error.
set -e
on_close(){
echo "Signal caught..."
echo "Supervisor is stopping processes gracefully..."
# clean up the pid files (paths match the --pidfile options in celery.conf)
rm -f /var/log/celery/worker_one.pid
rm -f /var/log/celery/worker_two.pid
supervisorctl stop celery_workers:
echo "All processes have been stopped. Exiting..."
exit 1
}
start_supervisord(){
supervisord -c /etc/supervisor/supervisord.conf
}
# start trapping signals (docker sends `SIGTERM` for shutdown; SIGKILL cannot be trapped)
trap on_close SIGINT SIGTERM
start_supervisord & # start supervisord in the background
SUPERVISORD_PID=$! # PID of the last background process started
wait $SUPERVISORD_PID
EXIT_STATUS=$? # the exit status of the last command executed
The script above consists of:
registering a cleanup function on_close
starting supervisord's process group in a background
registering the last background process's PID and waiting for it to finish
Docker part
docker-compose.yml
...
services:
celery:
...
stop_grace_period: 15m30s
entrypoint: [/entrypoints/entrypoint.sh]
The only setting worth mentioning here is the form of the entrypoint declaration. In our case it is better to use the exec form: it starts the executable script in a process with PID 1 and doesn't create any subprocesses the way the shell form does. The SIGTERM from docker stop <container> then gets propagated to the executable, which traps it and performs all the cleanup and shutdown logic.
Try using this:
docker-compose down