My Django site running on Heroku is using CloudAMQP to handle its scheduled Celery tasks. CloudAMQP is registering many more messages than I have tasks, and I don't understand why.
For example, in the past couple of hours I've run around 150 scheduled tasks (two that run once a minute, another that runs once every five minutes), but the CloudAMQP console's Messages count increased by around 1,300.
My relevant Django settings:
BROKER_URL = os.environ.get("CLOUDAMQP_URL", "")
BROKER_POOL_LIMIT = 1
BROKER_HEARTBEAT = None
BROKER_CONNECTION_TIMEOUT = 30
CELERY_ACCEPT_CONTENT = ['json',]
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_RESULT_EXPIRES = 7 * 86400
CELERY_SEND_EVENTS = False
CELERY_EVENT_QUEUE_EXPIRES = 60
CELERY_RESULT_BACKEND = None
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
My Procfile:
web: gunicorn myproject.wsgi --log-file -
main_worker: python manage.py celery worker --beat --without-gossip --without-mingle --without-heartbeat --loglevel=info
Looking at the Heroku logs I only see the number of scheduled tasks running that I'd expect.
The RabbitMQ overview graphs tend to look the same most of the time (screenshot omitted).
I don't understand RabbitMQ well enough to know whether other panels shed light on the problem, but I don't think they show anything obvious that would account for all these messages.
I'd like to at least understand what the extra messages are, and then whether there's a way I can eliminate some or all of them.
I had the same issue a few days ago.
For those who run into it, CloudAMQP's documentation recommends adding these arguments when you launch Celery:
--without-gossip --without-mingle --without-heartbeat
"This will decrease the message rates substantially. Without these
flags Celery will send hundreds of messages per second with different
diagnostic and redundant heartbeat messages."
And indeed, this has fixed the issue so far! You finally get only your own messages sent.
Related
I have a test Django site using a mod_wsgi daemon process; I have set up a simple Celery task to email a contact form, and I have installed supervisor.
I know that the Django code is correct.
The problem I'm having is that when I submit the form, I am only getting one message - the first one. Subsequent completions of the contact form do not send any message at all.
On my server, I have another test site with a configured supervisor task running which uses the Django server (ie it's not using mod_wsgi). Both of my tasks are running fine if I do
sudo supervisorctl status
Here is my conf file for the task I've described above which is saved at
/etc/supervisor/conf.d
the user in this instance is called myuser
[program:test_project]
command=/home/myuser/virtualenvs/test_project_env/bin/celery -A test_project worker --loglevel=INFO --concurrency=10 -n worker2@%%h
directory=/home/myuser/djangoprojects/test_project
user=myuser
numprocs=1
stdout_logfile=/var/log/celery/test_project.out.log
stderr_logfile=/var/log/celery/test_project.err.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
stopasgroup=true
; Set Celery priority higher than default (999)
; so, if rabbitmq is supervised, it will start first.
priority=1000
My other test site has this set as the command - note worker1@%%h
command=/home/myuser/virtualenvs/another_test_project_env/bin/celery -A another_test_project worker --loglevel=INFO --concurrency=10 -n worker1@%%h
I'm obviously doing something wrong, given that only my first form submission creates a task. If I look at the out.log file referred to above, I only see the first task; nothing is visible for the other form submissions.
Many thanks in advance.
UPDATE
I submitted the first form at 8.32 am (GMT) which was received, and then as described above, another one shortly thereafter for which a task was not created. Just after finishing the question, I submitted the form again at 9.15, and for this a task was created and the message received! I then submitted the form again, but no task was created again. Hope this helps!
Use ps auxf | grep celery to see how many workers you have started. If there is another worker that you started earlier and never killed, that old worker will consume the tasks, so only one task is received for every two or three (or more) submissions.
You also need to stop Celery every time with:
sudo supervisorctl -c /etc/supervisord/supervisord.conf stop all
and set this in supervisord.conf:
stopasgroup=true ; send stop signal to the UNIX process group (default false)
Otherwise it can cause memory leaks and regular task loss.
If you have multiple Django sites, here is a setup supported by RabbitMQ.
You need to add a RabbitMQ vhost and grant your user permissions on it:
sudo rabbitmqctl add_vhost {vhost_name}
sudo rabbitmqctl set_permissions -p {vhost_name} {username} ".*" ".*" ".*"
Each site then uses a different vhost (but they can share the same user).
Add this to each site's Django settings.py:
BROKER_URL = 'amqp://username:password@localhost:5672/vhost_name'
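As a sanity check on the URL format, here is a small hypothetical helper (not part of Celery or Django) that builds a per-site broker URL from its vhost:

```python
def broker_url(user, password, vhost, host="localhost", port=5672):
    """Build an AMQP broker URL pointing at a specific RabbitMQ vhost."""
    return f"amqp://{user}:{password}@{host}:{port}/{vhost}"

# Same RabbitMQ server and user for every site; only the vhost differs:
print(broker_url("myuser", "secret", "site1_vhost"))
# amqp://myuser:secret@localhost:5672/site1_vhost
```

The credentials come before the `@`, and the vhost is the final path segment.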
some info here:
Using celeryd as a daemon with multiple django apps?
Running multiple Django Celery websites on same server
Run Multiple Django Apps With Celery On One Server With Rabbitmq VHosts
I have a periodic task that I run on Heroku using a worker entry in my Procfile:
Procfile
web: gunicorn voltbe2.wsgi --log-file - --log-level debug
worker: celery -A voltbe2 worker --beat --events --loglevel INFO
tasks.py
class PullXXXActivityTask(PeriodicTask):
    """
    A periodic task that fetches data every 1 min.
    """
    run_every = timedelta(minutes=1)

    def run(self, **kwargs):
        abc = MyModel.objects.all()
        for rk in abc:
            rk.pull()
        logger = self.get_logger(**kwargs)
        logger.info("Running periodic task for XXX.")
        return True
For this PeriodicTask, I need the --beat flag (I checked by turning it off, and the task does not repeat without it). So, in some way, --beat does the work of a clock (https://devcenter.heroku.com/articles/scheduled-jobs-custom-clock-processes).
My concern is: if I scale the worker to two dynos with heroku ps:scale worker=2, I see from the logs that there are two beats running, on worker.1 and worker.2:
Aug 25 09:38:11 emstaging app/worker.2: [2014-08-25 16:38:11,580: INFO/Beat] Scheduler: Sending due task apps.notification.tasks.SendPushNotificationTask (apps.notification.tasks.SendPushNotificationTask)
Aug 25 09:38:20 emstaging app/worker.1: [2014-08-25 16:38:20,239: INFO/Beat] Scheduler: Sending due task apps.notification.tasks.SendPushNotificationTask (apps.notification.tasks.SendPushNotificationTask)
The log displayed is for a different periodic task, but the key point is that both worker dynos are getting signals from their own clocks to run the same task. In fact there should be one clock that ticks, decides every XX seconds what is due, and hands that task to the least loaded worker.n dyno.
More on why a single clock is essential is here : https://devcenter.heroku.com/articles/scheduled-jobs-custom-clock-processes#custom-clock-processes
Is this a problem, and if so, how can I avoid it?
You should have a separate worker for the beat process.
web: gunicorn voltbe2.wsgi --log-file - --log-level debug
worker: celery -A voltbe2 worker --events --loglevel info
beat: celery -A voltbe2 beat
Now you can scale the worker task without affecting the beat one.
Alternatively, if you won't always need the extra process, you can continue to use -B in the worker task but also define a second task, say extra_worker, which is normally scaled to 0 dynos but which you can scale up as necessary. The important thing is to always keep the task that runs the beat at exactly 1 process.
I use Celery with RabbitMQ in my Django app (on Elastic Beanstalk) to manage background tasks and I daemonized it using Supervisor.
The problem now is that one of the periodic tasks I defined is failing (after a week in which it worked properly). The error I get is:
[01/Apr/2014 23:04:03] [ERROR] [celery.worker.job:272] Task clean-dead-sessions[1bfb5a0a-7914-4623-8b5b-35fc68443d2e] raised unexpected: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Traceback (most recent call last):
File "/opt/python/run/venv/lib/python2.7/site-packages/billiard/pool.py", line 1168, in mark_as_worker_lost
human_status(exitcode)),
WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
All the processes managed by supervisor are up and running properly (supervisorctl status says RUNNING).
I tried reading several logs on my EC2 instance, but none of them helped me find the cause of the SIGKILL. What should I do? How can I investigate?
These are my celery settings:
CELERY_TIMEZONE = 'UTC'
CELERY_TASK_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
BROKER_URL = os.environ['RABBITMQ_URL']
CELERY_IGNORE_RESULT = True
CELERY_DISABLE_RATE_LIMITS = False
CELERYD_HIJACK_ROOT_LOGGER = False
And this is my supervisord.conf:
[program:celery_worker]
environment=$env_variables
directory=/opt/python/current/app
command=/opt/python/run/venv/bin/celery worker -A com.cygora -l info --pidfile=/opt/python/run/celery_worker.pid
startsecs=10
stopwaitsecs=60
stopasgroup=true
killasgroup=true
autostart=true
autorestart=true
stdout_logfile=/opt/python/log/celery_worker.stdout.log
stdout_logfile_maxbytes=5MB
stdout_logfile_backups=10
stderr_logfile=/opt/python/log/celery_worker.stderr.log
stderr_logfile_maxbytes=5MB
stderr_logfile_backups=10
numprocs=1
[program:celery_beat]
environment=$env_variables
directory=/opt/python/current/app
command=/opt/python/run/venv/bin/celery beat -A com.cygora -l info --pidfile=/opt/python/run/celery_beat.pid --schedule=/opt/python/run/celery_beat_schedule
startsecs=10
stopwaitsecs=300
stopasgroup=true
killasgroup=true
autostart=false
autorestart=true
stdout_logfile=/opt/python/log/celery_beat.stdout.log
stdout_logfile_maxbytes=5MB
stdout_logfile_backups=10
stderr_logfile=/opt/python/log/celery_beat.stderr.log
stderr_logfile_maxbytes=5MB
stderr_logfile_backups=10
numprocs=1
Edit 1
After restarting celery beat the problem remains.
Edit 2
Changed killasgroup=true to killasgroup=false and the problem remains.
The SIGKILL your worker received was initiated by another process. Your supervisord config looks fine, and killasgroup would only affect a supervisor-initiated kill (e.g. via supervisorctl or a plugin); without that setting the signal would have been sent to the dispatcher anyway, not the child.
Most likely you have a memory leak and the OS's oomkiller is assassinating your process for bad behavior.
grep oom /var/log/messages. If you see messages, that's your problem.
If you don't find anything, try running the periodic process manually in a shell:
MyPeriodicTask().run()
And see what happens. I'd monitor system and process metrics from top in another terminal, if you don't have good instrumentation like Cacti or Ganglia for this host.
One sees this kind of error when an asynchronous task (run through Celery), or the script you are using, stores a lot of data in memory and leaks it.
In my case, I was getting data from another system and saving it on a variable, so I could export all data (into Django model / Excel file) after finishing the process.
Here is the catch: my script was gathering 10 million records, and it was leaking memory while gathering the data. This resulted in the exception being raised.
To overcome the issue, I divided the 10 million records into 20 parts (half a million in each). Every time the gathered data reached 500,000 items, I stored it in my preferred destination (a local file / Django model), and I repeated this for every batch of 500k items.
There is no need to use that exact number of partitions; the idea is to solve a complex problem by splitting it into multiple subproblems and solving them one by one. :D
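The batching idea above can be sketched as a small, hypothetical helper that yields fixed-size chunks, so at most one batch is held in memory at a time (fetch_records and save_batch are placeholder names):

```python
def chunked(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

# Hypothetical usage: persist each batch of 500,000 records as it
# completes, instead of accumulating all 10 million in memory.
# for batch in chunked(fetch_records(), 500_000):
#     save_batch(batch)
```

Because `chunked` is a generator, memory use stays bounded by the batch size no matter how many records the source produces.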
I'm running multiple Django/apache/wsgi websites on the same server using apache2 virtual servers. And I would like to use celery, but if I start celeryd for multiple websites, all the websites will use the configuration (logs, DB, etc) of the last celeryd instance I started.
Is there a way to use multiple Celeryd (one for each website) or one Celeryd for all of them? Seems like it should be doable, but I can't find out how.
This problem was a big headache; I didn't notice @Crazyshezy's comment when I first came here. I accomplished this by changing the broker URL in settings.py for each web app.
app1.settings.py
BROKER_URL = 'redis://localhost:6379/0'
app2.settings.py
BROKER_URL = 'redis://localhost:6379/1'
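What separates the two apps is the numeric database index at the end of each Redis URL. A small hypothetical helper (not part of Celery) to extract it:

```python
from urllib.parse import urlparse

def redis_db_index(broker_url):
    """Return the database index from a redis:// broker URL (default 0)."""
    return int(urlparse(broker_url).path.lstrip("/") or 0)

# Each app's tasks land in a separate Redis database:
print(redis_db_index("redis://localhost:6379/0"))  # 0
print(redis_db_index("redis://localhost:6379/1"))  # 1
```

As long as the indexes differ, the two workers never see each other's tasks even though they share one Redis server.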
Yes there is a way.
We use supervisor to start celery daemons for every project we need it.
The supervisor config file looks something like this:
[program:PROJECTNAME]
command=python manage.py celeryd --loglevel=INFO --beat
environment=PATH=/home/www-data/projects/PROJECTNAME/env/bin:/usr/bin:/bin
directory=/home/www-data/projects/PROJECTNAME/
user=www-data
numprocs=1
umask=022
stdout_logfile=/home/www-data/logs/%(program_name)s.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=10
stderr_logfile=/home/www-data/logs/%(program_name)s.error.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=10
autorestart=true
autostart=True
startsecs=10
stopwaitsecs = 60
priority=998
There is also another advantage to this setup: the Celery daemons run entirely in userspace.
Remember to use different broker backends for your projects. It won't work if you use the same RabbitMQ virtual host, or the same Redis database, for every project.
I cannot figure out why celeryd is not pulling new tasks added to the queue. It only retrieves tasks that are queued when it starts, and fails to monitor the queue thereafter. I am running the Django development server with django-celery, using the Django ORM as the message broker.
My Django settings file has this configuration for celery:
INSTALLED_APPS += ("djcelery", )
INSTALLED_APPS += ("djkombu", )
import djcelery
djcelery.setup_loader()
BROKER_TRANSPORT = "djkombu.transport.DatabaseTransport"
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
BROKER_VHOST = "/"
Furthermore if it matters, I am using a remote MySQL server as my database backend.
Edit: Resolved!
After spending several hours on this, I found an answer in the Celery FAQ:
MySQL is throwing deadlock errors, what can I do?
I wasn't seeing the deadlock errors in the log, but modifying my /etc/my.cnf to include:
[mysqld]
transaction-isolation = READ-COMMITTED
resolved my issue.
Hope this helps someone else!