I am trying to start a celery worker on EB but get an error which doesn't explain much.
Command in config file in .ebextensions dir:
03_celery_worker:
command: "celery worker --app=config --loglevel=info -E --workdir=/opt/python/current/app/my_project/"
The listed command works fine on my local machine (just change workdir parameter).
Errors from the EB:
Activity execution failed, because: /opt/python/run/venv/local/lib/python3.6/site-packages/celery/platforms.py:796: RuntimeWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!
and
Starting new HTTPS connection (1): eu-west-1.queue.amazonaws.com
(ElasticBeanstalk::ExternalInvocationError)
I have updated celery worker command with parameter --uid=2 and privileges error disappeared but command execution is still failed due to
ExternalInvocationError
Any suggestions what I do wrong?
ExternalInvocationError
As I understand it means that listed command cannot be run from EB container commands. It is needed to create a script on the server and run celery from the script. This post describes how to do it.
Update:
It is needed to create a config file in .ebextensions directory. I called it celery.config. Link to the post above provides a script which works almost fine. It is needed to make some minor additions to work 100% correct. I had issues with schedule periodic tasks (celery beat). Below are steps on how to fix is:
Install (add to requirements) django-celery beat pip install django-celery-beat, add it to installed apps and use --scheduler parameter when starting celery beat. Instructions are here.
In the script you specify user which run the script. For celery worker it is celery user which was added earlier in the script (if doesn't exist). When I tried to start celery beat I got error PermissionDenied. It means that celery user doesn't have all necessary rights. using ssh I logged in to EB, looked a list of all users (cat /etc/passwd) and decided to use daemon user.
Listed steps resolved celery beat errors. Updated config file with the script is below (celery.config):
files:
"/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
mode: "000755"
owner: root
group: root
content: |
#!/usr/bin/env bash
# Create required directories
sudo mkdir -p /var/log/celery/
sudo mkdir -p /var/run/celery/
# Create group called 'celery'
sudo groupadd -f celery
# add the user 'celery' if it doesn't exist and add it to the group with same name
id -u celery &>/dev/null || sudo useradd -g celery celery
# add permissions to the celery user for r+w to the folders just created
sudo chown -R celery:celery /var/log/celery/
sudo chown -R celery:celery /var/run/celery/
# Get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
celeryenv=${celeryenv%?}
# Create CELERY configuration script
celeryconf="[program:celeryd]
directory=/opt/python/current/app
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A config.celery:app --loglevel=INFO --logfile=\"/var/log/celery/%%n%%I.log\" --pidfile=\"/var/run/celery/%%n.pid\"
user=celery
numprocs=1
stdout_logfile=/var/log/celery-worker.log
stderr_logfile=/var/log/celery-worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 60
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv"
# Create CELERY BEAT configuraiton script
celerybeatconf="[program:celerybeat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A config.celery:app --loglevel=INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler --logfile=\"/var/log/celery/celery-beat.log\" --pidfile=\"/var/run/celery/celery-beat.pid\"
directory=/opt/python/current/app
user=daemon
numprocs=1
stdout_logfile=/var/log/celerybeat.log
stderr_logfile=/var/log/celerybeat.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 60
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=999
environment=$celeryenv"
# Create the celery supervisord conf script
echo "$celeryconf" | tee /opt/python/etc/celery.conf
echo "$celerybeatconf" | tee /opt/python/etc/celerybeat.conf
# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "celery.conf" /opt/python/etc/supervisord.conf
then
echo "[include]" | tee -a /opt/python/etc/supervisord.conf
echo "files: uwsgi.conf celery.conf celerybeat.conf" | tee -a /opt/python/etc/supervisord.conf
fi
# Enable supervisor to listen for HTTP/XML-RPC requests.
# supervisorctl will use XML-RPC to communicate with supervisord over port 9001.
# Source: https://askubuntu.com/questions/911994/supervisorctl-3-3-1-http-localhost9001-refused-connection
if ! grep -Fxq "[inet_http_server]" /opt/python/etc/supervisord.conf
then
echo "[inet_http_server]" | tee -a /opt/python/etc/supervisord.conf
echo "port = 127.0.0.1:9001" | tee -a /opt/python/etc/supervisord.conf
fi
# Reread the supervisord config
supervisorctl -c /opt/python/etc/supervisord.conf reread
# Update supervisord in cache without restarting all services
supervisorctl -c /opt/python/etc/supervisord.conf update
# Start/Restart celeryd through supervisord
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat
commands:
01_killotherbeats:
command: "ps auxww | grep 'celery beat' | awk '{print $2}' | sudo xargs kill -9 || true"
ignoreErrors: true
02_restartbeat:
command: "supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat"
leader_only: true
One thing to focus attention on: in my project celery.py file is in the config directory, that is why I write -A config.celery:app when start celery worker and celery beat
Related
I'm trying to get celery beat running on django within an elastic beanstalk environment. I've been following the deployment advice here:
How to run a celery worker with Django app scalable by AWS Elastic Beanstalk?
After deployment the supervisord process fails to work. This is what's show in the logs:
-------------------------------------
/opt/python/log/supervisord.log
-------------------------------------
2019-10-28 19:58:35,321 CRIT Supervisor running as root (no user in config file)
2019-10-28 19:58:35,333 INFO RPC interface 'supervisor' initialized
2019-10-28 19:58:35,333 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2019-10-28 19:58:35,334 INFO supervisord started with pid 3034
2019-10-28 19:58:36,338 INFO spawned: 'httpd' with pid 3118
2019-10-28 19:58:37,805 INFO success: httpd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-10-28 19:59:37,279 INFO spawned: 'celeryd-beat' with pid 3543
2019-10-28 19:59:37,289 INFO spawned: 'celeryd-worker' with pid 3544
2019-10-28 19:59:38,023 INFO stopped: celeryd-beat (terminated by SIGTERM)
2019-10-28 19:59:38,325 INFO spawned: 'celeryd-beat' with pid 3552
2019-10-28 19:59:38,383 INFO exited: celeryd-worker (exit status 1; not expected)
2019-10-28 19:59:38,932 INFO exited: celeryd-beat (exit status 1; not expected)
I don't understand what these logs are telling me and haven't been able to shed any light on it through my own research.
This is the shell script used to create the process:
#!/usr/bin/env bash
# Get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g' | sed 's/%/%%/g'`
celeryenv=${celeryenv%?}
# Create celery configuraiton script
celeryconf="[program:celeryd-worker]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A app_name --loglevel=DEBUG
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-worker.log
stderr_logfile=/var/log/celery-worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv
[program:celeryd-beat]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery beat -A app_name --loglevel=DEBUG --workdir=/tmp -S django --pidfile /tmp/celerybeat.pid
directory=/opt/python/current/app
user=nobody
numprocs=1
stdout_logfile=/var/log/celery-beat.log
stderr_logfile=/var/log/celery-beat.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv"
# Create the celery supervisord conf script
echo "$celeryconf" | tee /opt/python/etc/celery.conf
# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
then
echo "[include]" | tee -a /opt/python/etc/supervisord.conf
echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
fi
# Reread the supervisord config
supervisorctl -c /opt/python/etc/supervisord.conf reread
# Update supervisord in cache without restarting all services
supervisorctl -c /opt/python/etc/supervisord.conf update
# Start/Restart celeryd through supervisord
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker
Can anyone spot what I've done wrong? Happy to post more code if necessary.
These are the commands that solved this in the end:
sudo yum install -y libcurl-devel python-devel
PYTHON_INSTALL_LAYOUT=''
source /opt/python/run/venv/bin/activate && sudo /opt/python/run/venv/bin/python3.6 -m pip install pycurl --global-option="--with-openssl"
I am trying to get my celery worker running on elastic-beanstalk. It works fine locally but when I deploy to EB I get the error "Activity execution failed, because: /usr/bin/env: bash": No such file or directory". I am pretty novice at celery and EB so I haven't been able to find a solution so far.
I am working on a windows machine and I have seen other people face this issue with windows and fix it by converting "celery_configuration.txt" file to UNIX EOL, however I am using celery-worker.sh instead but I converted it to a UNIX EOL which still didn't work.
.ebextensions/celery.config
packages:
yum:
libcurl-devel: []
container_commands:
01_mkdir_for_log_and_pid:
command: "mkdir -p /var/log/celery/ /var/run/celery/"
02_celery_configure:
command: "cp .ebextensions/celery-worker.sh /opt/elasticbeanstalk/hooks/appdeploy/post/ && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/celery-worker.sh"
cwd: "/opt/python/ondeck/app"
03_celery_run:
command: "/opt/elasticbeanstalk/hooks/appdeploy/post/celery-worker.sh"
.ebextensions/celery-worker.sh
#!/usr/bin/env bash
# Get django environment variables
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
celeryenv=${celeryenv%?}
# Create celery configuraiton script
celeryconf="[program:celeryd-worker]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A backend -P solo --loglevel=INFO -n worker.%%h
directory=/opt/python/current/app/enq_web
user=nobody
numprocs=1
stdout_logfile=/var/log/celery/worker.log
stderr_logfile=/var/log/celery/worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$celeryenv
"
# Create the celery supervisord conf script
echo "$celeryconf" | tee /opt/python/etc/celery.conf
# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
then
echo "[include]" | tee -a /opt/python/etc/supervisord.conf
echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
fi
# Reread the supervisord config
/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf reread
# Update supervisord in cache without restarting all services
/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf update
# Start/Restart celeryd through supervisord
/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker
I'm sure its something simple I missed but I have spent hours looking everywhere for a solution and at this point i'm out of ideas.
The issue is with your shebang (the #!/usr/bin/env bash) line in your celery-worker.sh file try using either /bin/bash or /bin/sh. Where bash or sh live will depend a lot on which AMI you are using with beanstalk.
This tutorial covers channels 1.x's deployment. However, this does not work with channels 2.x. The failing part is the daemon script, which is as follows:
files:"/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_daemon.sh":
mode: "000755"
owner: root
group: root
content: |
#!/usr/bin/env bash
# Get django environment variables
djangoenv=`cat /opt/python/current/env
| tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g'
| sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
djangoenv=${djangoenv%?}
# Create daemon configuraiton script
daemonconf="[program:daphne]
; Set full path to channels program if using virtualenv
command=/opt/python/run/venv/bin/daphne -b 0.0.0.0 -p 5000 <your_project>.asgi:channel_layer
directory=/opt/python/current/app
user=ec2-user
numprocs=1
stdout_logfile=/var/log/stdout_daphne.log
stderr_logfile=/var/log/stderr_daphne.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$djangoenv
[program:worker]
; Set full path to program if using virtualenv
command=/opt/python/run/venv/bin/python manage.py runworker
directory=/opt/python/current/app
user=ec2-user
numprocs=1
stdout_logfile=/var/log/stdout_worker.log
stderr_logfile=/var/log/stderr_worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
environment=$djangoenv"
# Create the supervisord conf script
echo "$daemonconf" | sudo tee /opt/python/etc/daemon.conf
# Add configuration script to supervisord conf (if not there already)
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
then
echo "[include]" | sudo tee -a /opt/python/etc/supervisord.conf
echo "files: daemon.conf" | sudo tee -a /opt/python/etc/supervisord.conf
fi
# Reread the supervisord config
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf reread
# Update supervisord in cache without restarting all services
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf update
# Start/Restart processes through supervisord
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart daphne
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart worker
After deployment, AWS has 2 errors in logs: daphne: No such process and worker: No such process.
How should this script be changed, so that it runs on channels 2.x as well?
Thanks
I had the same error and the reason for mine was the supervisor process that runs these additional scripts wasn't picking up the Daphne process because of this line of code:
if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
This checks if the supervisord.conf file has [include] present and adds the daemon process ONLY if there is no [include].
In my case I had a
[include]
celery.conf
in my supervisord file that prevented this Daphne script from adding the daemon.conf.
There are a few things you can do:
If you have another script creating a .conf file, combine these into one using the same include logic
Rewrite the include logic to check for daemon.conf specifically
Manually add daemon.conf to the supervisord.conf by SSH into your EC2 instance
I'm running celery with django and works great in development. But now I want to make it live
on my production server and I am running into some issues.
My setup is as follows:
Ubuntu
Nginx
Vitualenv
Upstart
Gunicorn
Django
I'm not sure how to now start celery with django when starting it with upstart and where does it log to?
Im starting django here:
~$ cd /var/www/webapps/minamobime_app
~$ source ../bin/activate
exec /var/www/webapps/bin/gunicorn_django -w $NUM_WORKERS \
--user=$USER --group=$GROUP --bind=$IP:$PORT --log-level=debug \
--log-file=$LOGFILE 2>>$LOGFILE
how do I start celery?
exec python manage.py celeryd -E -l info -c 2
Consider configuring celery as a daemon. For logging speciy:
CELERYD_LOG_FILE="/var/log/celery/%n.log"
where %s will be replaced by the node name
You can install supervisor using apt-get and then you can add the following to a file named celeryd.conf (or any name you wish) to etc/supervisor/conf.d folder (create the conf.d folder if it is not present)
; ==================================
; celery worker supervisor example
; ==================================
[program:celery]
; Set full path to celery program if using virtualenv
command=/home/<path to env>/env/bin/celery -A <appname> worker -l info
;enter the directory in which you want to run the app
directory=/home/<path to the app>
user=nobody
numprocs=1
stdout_logfile=/home/<path to the log file>/worker.log
stderr_logfile=/home/<path to the log file>/worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 1000
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
Also add the following lines to etc/supervisor/supervisord.conf
[include]
files = /etc/supervisor/conf.d/*.conf
Now start the supervisor by typing supervisord in terminal and celery will start automatically according to the settings you made above.
You can run:
python manage.py celery worker
this will work if you have djcelery in your INSTALLED_APPS
I really enjoy using upstart. I currently have upstart jobs to run different gunicorn instances in a number of virtualenvs. However, the 2-3 examples I found for Celery upstart scripts on the interwebs don't work for me.
So, with the following variables, how would I write an Upstart job to run django-celery in a virtualenv.
Path to Django Project:
/srv/projects/django_project
Path to this project's virtualenv:
/srv/environments/django_project
Path to celery settings is the Django project settings file (django-celery):
/srv/projects/django_project/settings.py
Path to the log file for this Celery instance:
/srv/logs/celery.log
For this virtual env, the user:
iamtheuser
and the group:
www-data
I want to run the Celery Daemon with celerybeat, so, the command I want to pass to the django-admin.py (or manage.py) is:
python manage.py celeryd -B
It'll be even better if the script starts after the gunicorn job starts, and stops when the gunicorn job stops. Let's say the file for that is:
/etc/init/gunicorn.conf
You may need to add some more configuration, but this is an upstart script I wrote for starting django-celery as a particular user in a virtualenv:
start on started mysql
stop on stopping mysql
exec su -s /bin/sh -c 'exec "$0" "$#"' user -- /home/user/project/venv/bin/python /home/user/project/django_project/manage.py celeryd
respawn
It works great for me.
I know that it looks ugly, but it appears to be the current 'proper' technique for running upstart jobs as unprivileged users, based on this superuser answer.
I thought that I would have had to do more to get it to work inside of the virtualenv, but calling the python binary inside the virtualenv is all it takes.
Here is my working config using the newer systemd running on Ubuntu 16.04 LTS. Celery is in a virtualenv. App is a Python/Flask.
Systemd file: /etc/systemd/system/celery.service
You'll want to change the user and paths.
[Unit]
Description=Celery Service
After=network.target
[Service]
Type=forking
User=nick
Group=nick
EnvironmentFile=-/home/nick/myapp/server_configs/celery_env.conf
WorkingDirectory=/home/nick/myapp
ExecStart=/bin/sh -c '${CELERY_BIN} multi start ${CELERYD_NODES} \
-A ${CELERY_APP} --pidfile=${CELERYD_PID_FILE} \
--logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} ${CELERYD_OPTS}'
ExecStop=/bin/sh -c '${CELERY_BIN} multi stopwait ${CELERYD_NODES} \
--pidfile=${CELERYD_PID_FILE}'
ExecReload=/bin/sh -c '${CELERY_BIN} multi restart ${CELERYD_NODES} \
-A ${CELERY_APP} --pidfile=${CELERYD_PID_FILE} \
--logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} ${CELERYD_OPTS}'
[Install]
WantedBy=multi-user.target
Environment file (referenced above):/home/nick/myapp/server_configs/celery_env.conf
# Name of nodes to start
# here we have a single node
CELERYD_NODES="w1"
# or we could have three nodes:
#CELERYD_NODES="w1 w2 w3"
# Absolute or relative path to the 'celery' command:
CELERY_BIN="/home/nick/myapp/venv/bin/celery"
# App instance to use
CELERY_APP="myapp.tasks"
# How to call manage.py
CELERYD_MULTI="multi"
# Extra command-line arguments to the worker
CELERYD_OPTS="--time-limit=300 --concurrency=8"
# - %n will be replaced with the first part of the nodename.
# - %I will be replaced with the current child process index
# and is important when using the prefork pool to avoid race conditions.
CELERYD_PID_FILE="/var/run/celery/%n.pid"
CELERYD_LOG_FILE="/var/log/celery/%n%I.log"
CELERYD_LOG_LEVEL="INFO"
To automatically create the log and run folder with the correct permissions for your user, create a file in /usr/lib/tmpfiles.d. I was having trouble with the /var/run/celery folder being deleted on rebooting and then celery could not start properly.
My /usr/lib/tmpfiles.d/celery.conf file:
d /var/log/celery 2775 nick nick -
d /var/run/celery 2775 nick nick -
To enable: sudo systemctl enable celery.service
Now you'll need to reboot your system for the /var/log/celery and /var/run/celery folders to be created. You can check to see if celery started after rebooting by checking the logs in /var/log/celery.
To restart celery: sudo systemctl restart celery.service
Debugging: tail -f /var/log/syslog and try restarting celery to see what the error is. It could be related to the backend or other things.
Hope this helps!