Celery beat process allocating a large amount of memory at startup (Django)

I operate a Django 1.9 website on Heroku, with Celery 3.1.23. RabbitMQ is used as a broker.
After restarting the beat worker, memory usage is always around 497 MB. This results in frequent Error R14 (Memory quota exceeded), as it quickly reaches the 512 MB limit.
How can I analyze what is in memory at startup? That is, how can I get a breakdown of what is in memory right after a restart?
Here is the memory consumption reported by the Heroku log-runtime-metrics beta feature:
heroku/beat.1:
source=beat.1 dyno=heroku.52346831.1ea92181-ab6d-461c-90fa-61fa8fef2c18
sample#memory_total=497.66MB
sample#memory_rss=443.91MB
sample#memory_cache=20.43MB
sample#memory_swap=33.33MB
sample#memory_pgpgin=282965pages
sample#memory_pgpgout=164606pages
sample#memory_quota=512.00MB
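
For the "what exactly is in memory" part, one rough approach (a sketch, not Heroku-specific; it assumes you can run a one-off script or management command in the same environment that beat loads) is to log the process's peak RSS together with a count of live Python objects right after startup:

import gc
import resource
from collections import Counter

def log_memory_snapshot():
    # ru_maxrss is reported in kilobytes on Linux (which is what Heroku dynos run)
    rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("peak RSS: %.1f MB" % (rss_kb / 1024.0))
    # count live objects by type to spot anything unexpectedly numerous
    counts = Counter(type(o).__name__ for o in gc.get_objects())
    for name, count in counts.most_common(10):
        print("%8d  %s" % (count, name))

if __name__ == "__main__":
    log_memory_snapshot()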

I had the same problem. Searching around, I followed "How many CPU cores has a heroku dyno?" and "Celery immediately exceeds memory on Heroku".
So I typed:
heroku run grep -c processor /proc/cpuinfo -a <app_name>
It returned 8. So I added --concurrency=4 to my Procfile line:
worker: celery -A <app> worker -l info -O fair --without-gossip --without-mingle --without-heartbeat --concurrency=4
And memory usage appeared to be cut almost in half.
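
For context, Celery's default prefork pool size follows the CPU count it sees, which is why a dyno reporting 8 cores ends up with 8 child processes unless you cap it. A minimal sketch of that check in Python (the settings name is the Celery 3.1-era one used elsewhere in this thread):

import multiprocessing

# Celery sizes its default prefork pool from the visible CPU count.
print("CPUs visible to the dyno:", multiprocessing.cpu_count())

# Equivalent to passing --concurrency=4 on the command line,
# if you prefer to pin it in Django settings instead:
CELERYD_CONCURRENCY = 4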

Related

Gunicorn --preload option is causing workers to hang?

We have a Flask app that uses a lot of memory for ML models, and I'm trying to reduce the memory footprint by using Gunicorn's preload option. When I add the --preload flag and deploy that (with -w 4, to a Docker container running on GKE), it handles just a few requests and then hangs until it times out, at which point Gunicorn starts another worker to replace it and the same thing happens. It's not clear yet how many requests each worker will process before hanging (possibly just 1... possibly a few).
The timeout is over 10 minutes, so it seems to be hanging indefinitely.
This does not happen at all if I remove the --preload flag.
What is it about the --preload flag that could be causing the workers to hang indefinitely?
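
For what it's worth, the fork semantics behind --preload are easy to reproduce outside Gunicorn: the master imports the app once, then forks workers that inherit copies of whatever state the import created. A minimal, illustrative sketch (plain os.fork, Unix-only, not Gunicorn itself) of one classic failure mode, where a lock held at fork time can never be released in the child:

import os
import threading
import time

lock = threading.Lock()

def hold_lock():
    with lock:
        time.sleep(60)  # still holding the lock when we fork below

threading.Thread(target=hold_lock, daemon=True).start()
time.sleep(0.1)  # give the thread time to acquire the lock

pid = os.fork()
if pid == 0:
    # Child: the lock was copied in its "held" state, but the thread that
    # held it does not exist here, so nothing will ever release it.
    print("child acquired lock:", lock.acquire(timeout=2))  # False
    os._exit(0)
else:
    os.waitpid(pid, 0)

Heavy ML libraries that start threads, hold locks, or open connections at import time can run into the same pattern once the preloaded master forks.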

Celery is repeating my tasks three times

I call some tasks in Celery once, but Celery executes each of them three times.
Is this expected behavior of Celery, or is it a misconfiguration?
I'm using Django 1.5.11, Celery 3.1.23 and Redis 3.0.6.
You may have some stray workers executing the tasks, or a Celery Flower instance may be trying to "help" by recovering unacked messages.
Make sure that only one instance of Celery is running with ps -Af | grep celerybeat, and check whether you have a Flower instance running by accessing http://localhost:5555 (it usually runs on that port).
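
As a hedged complement to the ps check, Celery's own inspect API will list the worker nodes the broker actually sees, which makes stray consumers easy to spot (the project and settings names below are illustrative):

import os

# assumes the usual Django-style Celery setup used elsewhere in this thread
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')

from celery import Celery

app = Celery('myproject')
app.config_from_object('django.conf:settings')

inspector = app.control.inspect()
print(inspector.ping())           # one entry per live worker node
print(inspector.active_queues())  # which queues each worker consumes from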

Correct Way for Celery Process Architecture and Daemonizing

I have a Python/Django project running on uWSGI/nginx. For asynchronous tasks we are using RabbitMQ/celeryd, with supervisord managing all the daemons.
Versions:
python: 2.7
django: 1.9.7
celery: 3.1.23
django-celery: 3.1.17
Celery has 10 queues of type Direct (say queue1, queue2, ...).
Each queue is handled by a separate celeryd process, which is managed via supervisord. Each supervisord program entry looks like the following:
[program:app_queue_worker]
command=/var/www/myproj/venv/bin/celery worker -A myproj -c 2 --queue=queue1 --loglevel=INFO
directory=/var/www/myproj/
user=ubuntu
numprocs=1
autostart=true
autorestart=true
startsecs=10
exitcodes=1
stopwaitsecs = 600
killasgroup=true
priority=1000
Hence supervisord is running 10 main processes and 20 worker processes.
Another thing I have noticed is that uWSGI also spawns some celery workers (I don't understand how or why, YET) with concurrency=2. So if I have 4 uWSGI processes running, I will have an additional 10 celery workers running.
Each of these workers takes 200-300 MB of memory. I feel something is wrong here, but I am not able to put my finger on it; Celery shouldn't be running such memory-heavy processes, should it?
Note: DEBUG=False, so there is no memory leakage due to the debug setting.
Can someone please comment on whether this architecture is correct or wrong?
Would it be better to run 2-3 celery main processes that listen to all queues at once, and increase their concurrency?
Update: celery.py config
import os

from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'MyProject.settings')

from django.conf import settings  # noqa
from chatterbox import celery_settings

app = Celery('MyProject')

# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings')
app.conf.update(
    CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend',
    CELERYD_CONCURRENCY=1,
)
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
There is no simple answer to this.
To me, the fact that uWSGI spawns celery workers seems wrong.
Creating only worker processes that consume all queues might lead to a situation where long-running tasks make some queues overflow, whereas separate workers that consume specific queues with short-running tasks could handle the situation better. Everything depends on your use case.
The 300 MB of resident memory per worker is quite a lot. If the tasks are I/O bound, go multi-threaded or use gevent. However, if the tasks are CPU bound, you have no option other than to scale process-wise.
If you start a celery worker with a concurrency of n, it will spawn n + 1 processes by default. Since you are spawning 10 workers with a concurrency of 2, Celery will start 30 processes.
Each worker consumes roughly 60 MB (~30 MB for the main process and 2 × ~15 MB for the subprocesses) of memory when not consuming from queues; it might vary depending on what your worker is doing. If you start 10 workers, they will consume ~600 MB of memory.
I am not sure how you determined that uWSGI also spawns celery workers. Only supervisord should spawn those processes.
You can run just one celery worker that listens to all queues with a concurrency of 20 (a sketch of that setup follows below). This will reduce your memory usage at the cost of flexibility: with this setup you can't start/stop consuming from selected queues, and there is no guarantee that all queues will be consumed equally.
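
A minimal sketch of that single-worker variant, written in the same celery.py style as the question (the setting names are the Celery 3.1-era ones; the queue list is illustrative):

from celery import Celery
from kombu import Queue

app = Celery('myproj')
app.conf.update(
    CELERYD_CONCURRENCY=20,
    # declare all ten direct queues so one worker consumes them all by default
    CELERY_QUEUES=[Queue('queue%d' % i) for i in range(1, 11)],
)

With this in place, supervisord only needs a single program entry running celery worker -A myproj -l INFO instead of ten.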

Delayed Job on Heroku does not work

My app runs fine on my local machine (which has 16 GB of RAM) using the 'heroku local' command to start both the dyno and the workers from the Procfile. The background jobs queued in Delayed Job are processed one by one and the table is then emptied. When I run on Heroku, it fails to execute the background processing at all and gets stuck with the following out-of-memory messages in my log file:
2016-04-03T23:48:06.382070+00:00 app[web.1]: Using rack adapter
2016-04-03T23:48:06.382149+00:00 app[web.1]: Thin web server (v1.6.4 codename Gob Bluth)
2016-04-03T23:48:06.382154+00:00 app[web.1]: Maximum connections set to 1024
2016-04-03T23:48:06.382155+00:00 app[web.1]: Listening on 0.0.0.0:7557, CTRL+C to stop
2016-04-03T23:48:06.711418+00:00 heroku[web.1]: State changed from starting to up
2016-04-03T23:48:37.519962+00:00 heroku[worker.1]: Process running mem=541M(105.8%)
2016-04-03T23:48:37.519962+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2016-04-03T23:48:59.317063+00:00 heroku[worker.1]: Process running mem=708M(138.3%)
2016-04-03T23:48:59.317063+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2016-04-03T23:49:21.449475+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2016-04-03T23:49:21.449325+00:00 heroku[worker.1]: Process running mem=829M(161.9%)
2016-04-03T23:49:24.273557+00:00 app[worker.1]: rake aborted!
2016-04-03T23:49:24.273587+00:00 app[worker.1]: Can't modify frozen hash
2016-04-03T23:49:24.274764+00:00 app[worker.1]: /app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.6/lib/active_record/attribute_set/builder.rb:45:in `[]='
2016-04-03T23:49:24.274771+00:00 app[worker.1]: /app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.6/lib/active_record/attribute_set.rb:39:in `write_from_user'
I know that R14 is an out-of-memory error, so I have two questions:
Is there any way that Delayed Job can be tuned to take less memory? There will be some disk swapping involved, but at least it will run.
Why do I keep getting the rake aborted! Can't modify frozen hash error (lines 4 and 5 from the bottom of the log shown above)? I do not get it in my local environment. What does it mean? Is it memory related?
Thanks in advance for your time. I am running Rails 4.2.6 and delayed_job 4.1.1 as shown below:
→ gem list | grep delayed
delayed_job (4.1.1)
delayed_job_active_record (4.1.0)
delayed_job_web (1.2.10)
Bharat
I found the problem. I am posting my solution here for those who may be running into similar problems.
I upgraded the Heroku worker to a standard-2X dyno, meaning I gave it 1 GB of memory, to remove the memory quota problem. That made the R14 errors go away, but I still continued to get the
rake aborted!
Can't modify frozen hash
error, and the program would then crash. So the problem was clearly here. After much research, I found that the previous programmer had used the 'workless' gem to reduce Heroku charges. The workless gem puts Heroku workers to sleep when they are not being used, so that no charges are incurred while nothing is running.
What I did not post in my original question is that I had upgraded the app from Rails 3.2.9 to Rails 4.2.6. My research also showed that the workless gem had not been updated in the last three years, and there was no mention of Rails 4 on its site. So chances were that it might not work well with Rails 4.2.6 and Heroku.
I saw some lines in my stack trace that were related to the workless gem. That was my clue to see what would happen if I removed this gem from production. So I removed it and redeployed.
The frozen hash error went away, and my delayed_job worker ran successfully to completion on Heroku.
The lesson for me was to read the log carefully and check all the dependencies. :)
Hope this helps.

Issues with celery daemon

We're having issues with our celery daemon being very flaky. We use a Fabric deployment script to restart the daemon whenever we push changes, but for some reason this is causing massive issues.
Whenever the deployment script is run, the celery processes are left in some pseudo-dead state. They will (unfortunately) still consume tasks from RabbitMQ, but they won't actually do anything. Confusingly, a brief inspection would indicate that everything is "fine" in this state: celeryctl status shows one node online, and ps aux | grep celery shows 2 running processes.
However, attempting to run /etc/init.d/celeryd stop manually results in the following error:
start-stop-daemon: warning: failed to kill 30360: No such process
While in this state, attempting to run celeryd start appears to work correctly but in fact does nothing. The only way to fix the issue is to manually kill the running celery processes and then start them again.
Any ideas what's going on here? We don't have complete confirmation, but we think the problem also develops on its own after a few days with no deployment (and no activity, as this is currently a test server).
I can't say that I know what's ailing your setup, but I've always used supervisord to run celery -- maybe the issue has to do with upstart? Regardless, I've never experienced this with celery running on top of supervisord.
For good measure, here's a sample supervisor config for celery:
[program:celeryd]
directory=/path/to/project/
command=/path/to/project/venv/bin/python manage.py celeryd -l INFO
user=nobody
autostart=true
autorestart=true
startsecs=10
numprocs=1
stdout_logfile=/var/log/sites/foo/celeryd_stdout.log
stderr_logfile=/var/log/sites/foo/celeryd_stderr.log
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
Restarting celeryd in my fab script is then as simple as issuing a sudo supervisorctl restart celeryd.
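
For completeness, the Fabric side of that is tiny; a sketch using the Fabric 1.x API (the program name celeryd matches the supervisor config above):

from fabric.api import sudo, task

@task
def restart_celery():
    # supervisord stops the program, waits up to stopwaitsecs, then starts it again
    sudo('supervisorctl restart celeryd')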