Attempting to restart Celery processes via Supervisor results in error - amazon-web-services

I am running supervisor/celery on an Amazon AWS server. Attempting to deploy a new application version eventually fails because the celery processes are not started. I have checked the supervisord.conf file to ensure that the programs are included, which they are. At the end of the supervisord.conf file I have the following include:
[include]
files=celeryd.conf
files=flower.conf
I try to restart celery with
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-default celeryd-slowtasks
celeryd-default and celeryd-slowtasks being the names of the programs listed in celeryd.conf. I get the following error:
celeryd-default: ERROR (no such process)
celeryd-slowtasks: ERROR (no such process)
celeryd-default: ERROR (no such process)
celeryd-slowtasks: ERROR (no such process)
If I run
sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart all
I get
flower: stopped
httpd: stopped
httpd: started
flower: started
without any mention of celery. Any idea how to start figuring this issue out?

Check /opt/python/etc/supervisord.conf; you are probably including a folder that you don't expect to be included.
Also ensure that the instance of supervisord that is running is actually using the config file you expect.
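One likely culprit, assuming supervisord's standard INI parsing (a sketch, not verified against your setup): repeating the files= key inside [include] makes the later value override the earlier one, so only flower.conf would actually be loaded. The key takes a single space-separated list instead:

```ini
[include]
; "files" accepts a space-separated list of files or globs;
; repeating the key keeps only the last value, dropping celeryd.conf
files = celeryd.conf flower.conf
```

After changing the config, a supervisorctl reread followed by update (or a restart of supervisord itself) is needed for the new programs to appear.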

Related

Celery & Celery Beat daemons not running tasks

I've set up the celeryd and celerybeat daemons by following this guide. When the daemons start, everything is marked as OK; however, Celery simply doesn't run any of the tasks defined in my Django application.
This is my /etc/default/celeryd file:
CELERY_NODES="w1"
CELERY_BIN="/home/millez/myproject/venv/bin/celery"
CELERY_CHDIR="/home/millez/myproject"
CELERY_OPTS="-l info --pool=solo"
CELERYD_LOG_FILE="/var/log/celery/%n%I.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
CELERY_CREATE_DIRS=1
and this is my /etc/default/celerybeat:
CELERY_BIN="/home/millez/myproject/venv/bin/celery"
CELERYBEAT_CHDIR="/home/millez/myproject"
CELERYBEAT_OPTS="--schedule=/var/run/celery/celerybeat-schedule"
If I manually restart the daemons (sudo /etc/init.d/celeryd restart and sudo /etc/init.d/celerybeat restart) and then check their statuses, this is the only output I get:
celeryd (node celery) (pid 2493) is up...
Running the actual celery commands manually works fine, e.g. celery -A myproject worker -l info, so it seems to be an issue with the way I've set up the daemons. However, I'm not too Linux savvy, so if anyone happens to spot some easy oversight I've made, let me know; this is driving me insane.

how to reconnect to a docker logs --follow where the log file was deleted

I have a docker container running in a small AWS instance with limited disk space. The logs were getting bigger, so I used the commands below to delete the ever-growing log files:
sudo -s -H
find /var -name "*json.log" | grep docker | xargs -r rm
journalctl --vacuum-size=50M
Now I want to see the behaviour of one of the running docker containers, but it claims the log file has disappeared (deleted by the rm command above):
ubuntu@x-y-z:~$ docker logs --follow name_of_running_docker_1
error from daemon in stream: Error grabbing logs: open /var/lib/docker/containers/d9562d25787aaf3af2a2bb7fd4bf00994f2fa1a4904979972adf817ea8fa57c3/d9562d25787aaf3af2a2bb7fd4bf00994f2fa1a4904979972adf817ea8fa57c3-json.log: no such file or directory
I would like to be able to see again what's going on in the running container, so I tried:
sudo touch /var/lib/docker/containers/d9562d25787aaf3af2a2bb7fd4bf00994f2fa1a4904979972adf817ea8fa57c3/d9562d25787aaf3af2a2bb7fd4bf00994f2fa1a4904979972adf817ea8fa57c3-json.log
Then I ran docker logs --follow again, but while interacting with the software that should produce logs, I can see that nothing is happening.
Is there any way to rescue the printing into the log file again without killing (rebooting) the containers?
Yes, but it's more of a trick than a real solution. You should never interact with /var/lib/docker data directly. As per Docker docs:
part of the host filesystem [which] is managed by Docker (/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem.
For this trick to work, you need to configure your Docker daemon to keep containers alive during daemon downtime before first running your container, for example by setting your /etc/docker/daemon.json with:
{
"live-restore": true
}
This requires a daemon restart, such as sudo systemctl restart docker.
Then create a container and delete its .log file:
$ docker run --name myhttpd -d httpd:alpine
$ sudo rm $(docker inspect myhttpd -f '{{ .LogPath }}')
# Docker is not happy
$ docker logs myhttpd
error from daemon in stream: Error grabbing logs: open /var/lib/docker/containers/xxx-json.log: no such file or directory
Restart the daemon (with live restore enabled); this causes Docker to re-take management of the container and create the log file again. However, any logs generated before the log file was deleted are lost.
$ sudo systemctl restart docker
$ docker logs myhttpd # works! and log file is created back
Note: this is not a documented or official Docker feature, simply a behavior I observed in my own experiments with Docker 19.03. It may not work with other Docker versions.
With live restore enabled, the container process keeps running even while the Docker daemon is stopped. On daemon restart, it probably re-attaches to the still-alive process's stdout and stderr and redirects the output to the log file (hence recreating it).
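As an aside, one way to avoid this situation next time is to truncate log files in place instead of deleting them: the inode survives, so the process holding the file open keeps writing to the very same file. A minimal demonstration on a throwaway file (not a real container log):

```shell
# simulate truncating a log in place: the file and its inode survive,
# so a process that has it open keeps writing to the same file
echo "old log data" > /tmp/demo-json.log
truncate -s 0 /tmp/demo-json.log   # empties the file without deleting it
```

Applied to the original problem, `sudo truncate -s 0 /var/lib/docker/containers/*/*-json.log` would have freed the space without breaking docker logs --follow.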

Permission denied error with django-celery-beat despite --schedulers flag

I am running Django, Celery, and RabbitMQ in a Docker container.
It's all configured well and running, however when I am trying to install django-celery-beat I have a problem initialising the service.
Specifically, this command:
celery -A project beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler
Results in this error:
celery.platforms.LockFailed: [Errno 13] Permission denied: '/usr/src/app/celerybeat.pid'
From looking at causes/solutions, the permission denied error appears to occur when the default scheduler (celery.beat.PersistentScheduler) attempts to keep track of the last run times in a local shelve database file and doesn't have write access.
However, I am using django-celery-beat and applying the --scheduler flag to use the django_celery_beat.schedulers service, which should store the schedule in the Django database and therefore not require write access.
What else could be causing this problem? / How can I debug this further?
celerybeat (celery.bin.beat) creates a pid file where it stores the process id:
--pidfile
File used to store the process pid. Defaults to celerybeat.pid.
The program won’t start if this file already exists and the pid is
still alive.
You can pass an empty --pidfile= in your command, but beware: Celery will then have no way to detect whether more than one celerybeat process is active.
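Alternatively, a sketch of keeping the pid check but pointing the file at a location the container user can write to (the /tmp path here is illustrative, not a recommendation for your layout):

```shell
# write the pid file somewhere writable instead of the read-only app dir
celery -A project beat -l info \
    --scheduler django_celery_beat.schedulers:DatabaseScheduler \
    --pidfile=/tmp/celerybeat.pid
```

This preserves the duplicate-process protection while avoiding the permission error on /usr/src/app.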

Puma - Rails on linux // Restart when process dies

Using puma on a Rails app; it sometimes dies without any particular reason, and it often stays down (does not restart after being stopped) after a deploy.
What would be a good way to monitor whether the process died, and restart it right away?
Since this is called from within a Rails app, it would be useful to have a way to define this for any app.
I have not found any usable way to do it (I looked into systemd and other Linux daemons, with no success).
Thanks for any feedback.
You can use pumactl to start/stop the puma server. If you know where the puma.pid file is placed (on Mac it's usually "#{Dir.pwd}/tmp/pids/puma.pid"), you could do:
bundle exec pumactl -P path/puma.pid stop
To set the pid file path or other options (like daemonizing), you can create a puma config. You can find an example here. Then start and stop the server just with the config file:
bundle exec pumactl -F config/puma.rb start
You can also restart and check status in this way:
bundle exec pumactl -F config/puma.rb restart
bundle exec pumactl -F config/puma.rb status
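Since the question mentions trying systemd: a unit with Restart=always is the usual way to get automatic restarts when the process dies. A minimal sketch, where the paths, user, and app name are assumptions to adapt to your deployment:

```ini
[Unit]
Description=Puma server for myapp
After=network.target

[Service]
Type=simple
User=deploy
WorkingDirectory=/srv/myapp
ExecStart=/usr/local/bin/bundle exec puma -C config/puma.rb
# restart whenever the process exits, cleanly or not
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable --now puma.service; systemd then supervises the process, which also covers the "dies after deploy" case as long as the deploy ends with systemctl restart puma rather than killing the process directly.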

Issues with celery daemon

We're having issues with our celery daemon being very flaky. We use a fabric deployment script to restart the daemon whenever we push changes, but for some reason this is causing massive issues.
Whenever the deployment script is run, the celery processes are left in some pseudo-dead state. They will (unfortunately) still consume tasks from RabbitMQ, but they won't actually do anything. Confusingly, a brief inspection suggests everything is "fine" in this state: celeryctl status shows one node online and ps aux | grep celery shows 2 running processes.
However, attempting to run /etc/init.d/celeryd stop manually results in the following error:
start-stop-daemon: warning: failed to kill 30360: No such process
While in this state attempting to run celeryd start appears to work correctly, but in fact does nothing. The only way to fix the issue is to manually kill the running celery processes and then start them again.
Any ideas what's going on here? We don't have complete confirmation, but we think the problem also develops on its own after a few days with no deployment (and no activity; this is currently a test server).
I can't say that I know what's ailing your setup, but I've always used supervisord to run celery -- maybe the issue has to do with upstart? Regardless, I've never experienced this with celery running on top of supervisord.
For good measure, here's a sample supervisor config for celery:
[program:celeryd]
directory=/path/to/project/
command=/path/to/project/venv/bin/python manage.py celeryd -l INFO
user=nobody
autostart=true
autorestart=true
startsecs=10
numprocs=1
stdout_logfile=/var/log/sites/foo/celeryd_stdout.log
stderr_logfile=/var/log/sites/foo/celeryd_stderr.log
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
Restarting celeryd in my fab script is then as simple as issuing sudo supervisorctl restart celeryd.
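For reference, a tiny helper (hypothetical, not taken from the original fab script) showing how such a restart step might be wrapped in Python; the command list mirrors the supervisorctl invocations used above, and dry_run exists only so the construction can be checked without a live supervisord:

```python
import subprocess

def supervisorctl_restart(program, conf=None, dry_run=False):
    """Build (and optionally run) a supervisorctl restart command.

    conf: optional config path passed via -c, as in the question above.
    dry_run: return the command list instead of executing it.
    """
    cmd = ["sudo", "supervisorctl"]
    if conf:
        cmd += ["-c", conf]
    cmd += ["restart", program]
    if not dry_run:
        # raises CalledProcessError if supervisorctl exits non-zero
        subprocess.run(cmd, check=True)
    return cmd

print(supervisorctl_restart("celeryd", dry_run=True))
# -> ['sudo', 'supervisorctl', 'restart', 'celeryd']
```

In a Fabric deploy task the same effect is a one-liner, sudo("supervisorctl restart celeryd"); the helper just makes the command explicit.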