Celery 4 + Django + Redis: missing Django settings section in documentation?

I am trying to set up Celery 4 in my Django project, with Redis as the broker. But I cannot find Django-specific broker settings in the Celery 4 documentation. Also, the settings documentation for version 4 no longer mentions CELERY_BROKER_URL, although I am sure the version 3 documentation did mention these settings.
I searched on the web and found these settings:
CELERY_BROKER_URL = 'redis://localhost:6379'
CELERY_RESULT_BACKEND = 'redis://localhost:6379'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_SERIALIZER = 'json'
But I am not sure whether these are for version 3 or version 4. I am utterly confused.

OK! Found the paragraph, buried inside the "First steps with Django" documentation:
The uppercase name-space means that all Celery configuration options must be specified in uppercase instead of lowercase, and start with CELERY_, so for example the task_always_eager setting becomes CELERY_TASK_ALWAYS_EAGER, and the broker_url setting becomes CELERY_BROKER_URL. This also applies to the workers settings, for instance, the worker_concurrency setting becomes CELERY_WORKER_CONCURRENCY.
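If I read that right, the settings I found are indeed the Celery 4 style, provided the Celery app is created with the CELERY_ namespace. A minimal celery.py along the lines of that guide (the project name proj is a placeholder):
# proj/celery.py -- minimal sketch from "First steps with Django";
# "proj" is a placeholder for the real project package name.
import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj')
# namespace='CELERY' is what makes the uppercase CELERY_* settings apply
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()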

Related

Proper replacement for CELERY_RESULT_BACKEND when upgrading to Celery 4.x for django 1.11

In trying to replace django-celery and upgrade Celery to 4.x in an inherited project, I'm having a hard time understanding the real changes to make.
Celery is already set up, as the project uses 3.x; however, in removing djcelery from the app, I came across this:
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
Reading the docs, I'm more confused about whether to use result_backend, celery.backends.database, or something else:
CELERY_RESULT_BACKEND = 'celery.backends.database'
CELERYBEAT_SCHEDULER = 'beat_scheduler'
or:
CELERY_RESULT_BACKEND: result_backend
CELERYBEAT_SCHEDULER: beat_scheduler
I'm new to Celery, still getting familiar with the details.
Celery 4 changed its settings as follows: http://docs.celeryproject.org/en/latest/userguide/configuration.html#new-lowercase-settings
The major difference between previous versions, apart from the lower
case names, are the renaming of some prefixes, like celerybeat_ to
beat_, celeryd_ to worker_, and most of the top level celery_ settings
have been moved into a new task_ prefix.
Celery will still be able to read old configuration files, so there’s
no rush in moving to the new settings format.
The expectation is that you use result_backend instead of CELERY_RESULT_BACKEND. The full mapping of old uppercase settings to new ones is documented here: http://docs.celeryproject.org/en/latest/userguide/configuration.html#new-lowercase-settings
In other words, result_backend is the new name of the key, NOT the new recommended value. It is the replacement for the left-hand side of your assignment. These are equivalent:
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
result_backend = 'djcelery.backends.database:DatabaseBackend'
Likewise these are equivalent:
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
beat_scheduler = 'djcelery.schedulers.DatabaseScheduler'
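Put differently, the lowercase names are what you would use when configuring the Celery app object directly rather than through uppercase module-level settings; a hedged sketch with the same values carried over from the equivalences above:
# sketch: the same settings applied straight on the app object, using
# the new lowercase keys (values carried over from the example above).
from celery import Celery

app = Celery('proj')  # 'proj' is a placeholder
app.conf.update(
    result_backend='djcelery.backends.database:DatabaseBackend',
    beat_scheduler='djcelery.schedulers.DatabaseScheduler',
)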

Why is python-pdfkit hanging when printing a page with OpenLayers 3 content under uWSGI and NGINX?

I'm using Django served by uWSGI and NGINX.
Ubuntu 14.04.1 LTS 64-bit
Python 3.4
Django 1.7.4
uWSGI 1.9.17.1-debian (64bit)
NGINX 1.4.6
python-pdfkit 0.5.0
wkhtmltopdf 0.12.2.1
OpenLayers v3.0.0
When I try running pdfkit.from_url(...) to print a map to PDF, the request times out.
More specifically, it hangs in Python's subprocess.py, in communicate / self._communicate:
with _PopenSelector() as selector:
    if self.stdin and input:
        selector.register(self.stdin, selectors.EVENT_WRITE)
    if self.stdout:
        selector.register(self.stdout, selectors.EVENT_READ)
    if self.stderr:
        selector.register(self.stderr, selectors.EVENT_READ)

    while selector.get_map():
        ...
selector.get_map() always returns a valid result, so the loop never exits.
If I run this in the Django development server (instead of uWSGI+NGINX) everything runs fine.
in my view:
wkhtmltopdfBinLocationString = '/usr/local/bin/wkhtmltopdf'
wkhtmltopdfBinLocationBytes = wkhtmltopdfBinLocationString.encode('utf-8')
# this fixes some leftover python2 assumptions about strings
config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdfBinLocationBytes)
pdfkit.from_url(reportPdfUrl, reportPdfFile, configuration=config, options={
    'javascript-delay': 1500,
})
In several places I have seen answers along the lines of "set the close-on-exec flag on the socket" solving similar issues.
Is this something I can set from my from_url options (wkhtmltopdf does not accept it by that name), or can I configure uWSGI to assume close-on-exec? I have not been able to make either of these work, but maybe I just need help changing my uWSGI configuration file:
[uwsgi]
workers = 1
chdir = [...]
plugins = python34
wsgi-file = [...]/wsgi.py
pythonpath = [...]
I tried something like
close-on-exec = true
but that didn't seem to do anything.
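One workaround I can sketch (unverified) is bypassing pdfkit and spawning wkhtmltopdf directly with close_fds=True, so the child cannot inherit uWSGI's listening sockets; the binary path and javascript-delay mirror my view code above:
import subprocess

# sketch: run wkhtmltopdf directly so close_fds can be forced on
cmd = [
    '/usr/local/bin/wkhtmltopdf',
    '--javascript-delay', '1500',
    reportPdfUrl,   # same variables as in the view above
    reportPdfFile,
]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        close_fds=True)
stdout, stderr = proc.communicate()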
NOTE: the wsgi.py file is simple:
"""
WSGI config for dst project.
It exposes the WSGI callable as a module-level variable named ``application``.
For more information on this file, see
https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/
"""
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "[my_project].settings")
from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()
Any thoughts?

Correct time settings in Django for Celery

I'm wondering how to correctly use time settings in Django and Celery.
Here is what I have:
TIME_ZONE = 'Europe/Oslo'
CELERY_TIMEZONE = 'Europe/Oslo'
CELERY_ENABLE_UTC = True
USE_TZ = True
TZINFO = 'UTC'
But the timestamp on my Celery task is ahead by two hours. How can I fix it?
Using:
Django - 1.6b2
celery - 3.0.23
django-celery - 3.0.23
You can use the TZ environment variable. Django will automatically apply it by calling time.tzset(): http://docs.python.org/2/library/time.html#time.tzset
If your Celery runs from Django, it will work there too.
Alternatively, you could put something like:
os.environ['TZ'] = 'your timezone'
at the beginning of manage.py or wsgi.py in your local installation.
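For example, a minimal sketch at the top of manage.py (the zone mirrors the TIME_ZONE from the question; time.tzset is Unix-only):
# manage.py -- sketch: force the process timezone before Django loads
import os
import time

os.environ['TZ'] = 'Europe/Oslo'
time.tzset()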
I think you might be hitting a bug in django-celery that I am also running into. There were timezone-related changes in the last few releases of django-celery, and this bug first showed up for me when I updated from 3.0.19 to 3.0.23.
I asked about this on the #celery IRC chat and was told that the Django-admin-based Celery task view is not that great, and that I should be using something like Flower (https://github.com/mher/flower) to monitor my tasks.
I installed and ran Flower and it did not suffer from the same timestamp issues that the django-celery admin based view does.

Multiple Django projects inadvertently receiving each other's celery tasks

I have five different Django projects all running on one box with one installation of RabbitMQ. I use celery for various tasks. Each project appears to be receiving tasks meant for other projects.
Each codebase has its own virtual environment, where something like the following is run:
./manage.py celeryd --concurrency=2 --queues=high_priority
The parameters in each settings.py look like the following:
CELERY_SEND_EVENTS = True
CELERY_TASK_RESULT_EXPIRES = 10
CELERY_RESULT_BACKEND = 'amqp'
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
CELERY_TIMEZONE = 'UTC'
BROKER_URL = 'amqp://guest@127.0.0.1:5672//'
BROKER_VHOST = 'specific_app_name'
I'm seeing tracebacks that make me think apps are receiving each other's messages when they shouldn't be:
Traceback (most recent call last):
  File "/home/.../.virtualenvs/.../local/lib/python2.7/site-packages/kombu/messaging.py", line 556, in _receive_callback
    decoded = None if on_m else message.decode()
  File "/home/.../.virtualenvs/.../local/lib/python2.7/site-packages/kombu/transport/base.py", line 147, in decode
    self.content_encoding, accept=self.accept)
  File "/home/.../.virtualenvs/.../local/lib/python2.7/site-packages/kombu/serialization.py", line 187, in decode
    return decode(data)
  File "/home/.../.virtualenvs/.../local/lib/python2.7/site-packages/kombu/serialization.py", line 74, in pickle_loads
    return load(BytesIO(s))
ImportError: No module named emails.models
The emails.models module in this case appears in one project but not the others. Yet the others are showing this traceback.
I haven't looked at multiple node names or anything like that. Would something like that fix this problem?
Your AMQP settings in celeryconfig.py are wrong. You are using:
BROKER_URL = 'amqp://guest@127.0.0.1:5672//'
BROKER_VHOST = 'specific_app_name'
The BROKER_VHOST parameter is ignored because BROKER_URL is present (it is also deprecated). If you want to use virtual hosts (which, by the way, is the preferred way to solve the problem you presented), you should create a virtual host for each app and use the following in each app's settings:
BROKER_URL = 'amqp://guest@127.0.0.1:5672//specific_app_name'
edited: fixed missing /
You should specify different queue settings for each of the projects. For example:
CELERY_QUEUES = {
    "celery": {
        "exchange": "project1_celery",
        "binding_key": "project1_celery",
    },
}
CELERY_DEFAULT_QUEUE = "celery"
For the second project you specify exchange and binding_key as project2_celery and so on.
The code I posted is for Celery<3.0. If you are using a newer version, it would probably look like the following (I haven't used the new versions myself yet, so I'm not sure):
from kombu import Exchange, Queue

CELERY_DEFAULT_QUEUE = 'celery'
CELERY_QUEUES = (
    Queue('celery', Exchange('project1_celery'), routing_key='project1_celery'),
)
You can read more in the celery docs: http://docs.celeryproject.org/en/latest/userguide/routing.html
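With the queues split per project, each project's worker invocation would then pin to its own queue name (mirroring the command from the question; the queue name is hypothetical):
./manage.py celeryd --concurrency=2 --queues=project1_celery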
With multiple Django projects sharing the same box, you need to explicitly "namespace" tasks per project. The error msg returned a namespace of "emails.models"; that's not unique to any one project.
For example, if one project is named "project1", and another "project2", just add name= parameters to the @task decorators:
# project1
# emails.py
@task(name='project1.emails.my_email_function', queue='high_priority')
def my_email_function(user_id):
    return [x for x in user_id if spam()]

# project2
# tasks.py
@task(name='project2.tasks.my_task_function', queue='high_priority')
def my_task_function(user_id):
    return [x for x in user_id if blahblah()]

RabbitMQ/Celery/Django Memory Leak?

I recently took over another part of the project that my company is working on and have discovered what seems to be a memory leak in our RabbitMQ/Celery setup.
Our system has 2Gb of memory, with roughly 1.8Gb free at any given time. We have multiple tasks that crunch large amounts of data and add them to our database.
When these tasks run, they consume a rather large amount of memory, quickly dropping our available memory to anywhere between 16Mb and 300Mb. The problem is that after these tasks finish, the memory does not come back.
We're using:
RabbitMQ v2.7.1
AMQP 0-9-1 / 0-9 / 0-8 (got this line from the
RabbitMQ startup_log)
Celery 2.4.6
Django 1.3.1
amqplib 1.0.2
django-celery 2.4.2
kombu 2.1.0
Python 2.6.6
erlang 5.8
Our server is running Debian 6.0.4.
I am new to this setup, so if there is any other information you need that could help me determine where this problem is coming from, please let me know.
All tasks have return values, all tasks have ignore_result=True, CELERY_IGNORE_RESULT is set to True.
Thank you very much for your time.
My current config file is:
CELERY_TASK_RESULT_EXPIRES = 30
CELERY_MAX_CACHED_RESULTS = 1
CELERY_RESULT_BACKEND = False
CELERY_IGNORE_RESULT = True
BROKER_HOST = 'localhost'
BROKER_PORT = 5672
BROKER_USER = c.celery.u
BROKER_PASSWORD = c.celery.p
BROKER_VHOST = c.celery.vhost
I am almost certain you are running this setup with DEBUG=True, which leads to a memory leak.
Check this post: Disable Django Debugging for Celery.
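The usual workaround from that post boils down to forcing DEBUG off in worker processes, since with DEBUG=True Django keeps every SQL query in memory (django.db.connection.queries), which looks exactly like a leak in long-running workers. A minimal sketch; the command-line detection is an assumption, adapt it to how you launch celeryd:
# settings.py -- sketch: disable debug mode for celery workers
import sys

if 'celeryd' in sys.argv:
    DEBUG = False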
I'll post my configuration in case it helps.
settings.py
djcelery.setup_loader()
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_VHOST = "rabbit"
BROKER_USER = "YYYYYY"
BROKER_PASSWORD = "XXXXXXX"
CELERY_IGNORE_RESULT = True
CELERY_DISABLE_RATE_LIMITS = True
CELERY_ACKS_LATE = True
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
CELERY_ROUTES = ('FILE_WITH_ROUTES',)
You might be hitting this issue in librabbitmq. Please check whether or not Celery is using librabbitmq >= 1.0.1.
A simple fix to try is: pip install 'librabbitmq>=1.0.1' (quoted so the shell doesn't treat >= as a redirection).
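To confirm what is actually installed, a quick hedged check from Python (assumes librabbitmq exposes __version__, which releases of that era do):
# sketch: check the librabbitmq version Celery will pick up
import librabbitmq
print(librabbitmq.__version__)  # want >= 1.0.1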