RabbitMQ/Celery/Django Memory Leak? - django

I recently took over another part of the project that my company is working on and have discovered what seems to be a memory leak in our RabbitMQ/Celery setup.
Our system has 2 GB of memory, with roughly 1.8 GB free at any given time. We have multiple tasks that crunch large amounts of data and add them to our database.
When these tasks run, they consume a rather large amount of memory, quickly dropping our available memory to anywhere between 16 MB and 300 MB. The problem is, after these tasks finish, the memory does not come back.
We're using:
RabbitMQ v2.7.1
AMQP 0-9-1 / 0-9 / 0-8 (taken from the RabbitMQ startup_log)
Celery 2.4.6
Django 1.3.1
amqplib 1.0.2
django-celery 2.4.2
kombu 2.1.0
Python 2.6.6
erlang 5.8
Our server is running Debian 6.0.4.
I am new to this setup, so if there is any other information you need that could help me determine where this problem is coming from, please let me know.
All tasks have return values, all tasks have ignore_result=True, CELERY_IGNORE_RESULT is set to True.
Thank you very much for your time.
My current config file is:
CELERY_TASK_RESULT_EXPIRES = 30
CELERY_MAX_CACHED_RESULTS = 1
CELERY_RESULT_BACKEND = False
CELERY_IGNORE_RESULT = True
BROKER_HOST = 'localhost'
BROKER_PORT = 5672
BROKER_USER = c.celery.u
BROKER_PASSWORD = c.celery.p
BROKER_VHOST = c.celery.vhost

I am almost certain you are running this setup with DEBUG=True, which leads to a memory leak.
Check this post: Disable Django Debugging for Celery.
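For context on why DEBUG=True leaks: Django records every SQL query in connection.queries when debugging is on, and a long-running Celery worker never clears that list. A Django-free sketch of that pattern (FakeConnection is a stand-in for illustration, not Django's actual class):

```python
# With DEBUG=True, Django's debug cursor appends an entry to
# connection.queries for every SQL statement, and a long-running worker
# never clears the list. FakeConnection illustrates the bookkeeping only.

class FakeConnection:
    def __init__(self, debug):
        self.debug = debug
        self.queries = []  # grows forever while debug is on

    def execute(self, sql):
        # real query execution would happen here; only the logging is shown
        if self.debug:
            self.queries.append({"sql": sql, "time": "0.001"})

conn = FakeConnection(debug=True)
for i in range(10000):  # e.g. one query per processed record
    conn.execute("SELECT * FROM data WHERE id = %s" % i)
print(len(conn.queries))  # 10000 entries retained -> memory never comes back
```

With debug off (the equivalent of DEBUG=False), the list stays empty and memory is released as tasks complete.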
I'll post my configuration in case it helps.
settings.py
djcelery.setup_loader()
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_VHOST = "rabbit"
BROKER_USER = "YYYYYY"
BROKER_PASSWORD = "XXXXXXX"
CELERY_IGNORE_RESULT = True
CELERY_DISABLE_RATE_LIMITS = True
CELERY_ACKS_LATE = True
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
CELERY_ROUTES = ('FILE_WITH_ROUTES',)

You might be hitting this issue in librabbitmq. Please check whether or not Celery is using librabbitmq>=1.0.1.
A simple fix to try is: pip install "librabbitmq>=1.0.1" (quote the requirement so the shell does not treat >= as a redirect).

Related

Logging in Django does not work when using uWSGI

I have a problem logging to a file using Python's built-in logging module.
Here is an example of how logs are generated:
logging.info('a log message')
Logging works fine when running the app directly through Python. However when running the app through uWSGI, logging does not work.
Here is my uWSGI configuration:
[uwsgi]
module = myapp.app:application
master = true
processes = 5
uid = nginx
socket = /run/uwsgi/myapp.sock
chown-socket = nginx:nginx
chmod-socket = 660
vacuum = true
die-on-term = true
logto = /var/log/myapp/myapp.log
logfile-chown = nginx:nginx
logfile-chmod = 640
EDIT:
The path /var/log/myapp/myapp.log is receiving nginx access logs. There is another path configured in settings.py; that second path is where the application logs are meant to go, but there are none when running under uWSGI.
Thanks in advance
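One thing worth checking is whether the application logger actually has its own file handler attached, rather than relying on uWSGI capturing stdout. A minimal sketch (the logger name and path are placeholders, not taken from the question's settings):

```python
import logging

# Attach a dedicated FileHandler to the application's logger so its
# records go to their own file regardless of how the process is started
# (directly via Python or under uWSGI). "myapp" and the path are placeholders.
logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)

handler = logging.FileHandler("/tmp/myapp-app.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("a log message")  # lands in /tmp/myapp-app.log
```

If the handler is configured this way (or via an equivalent LOGGING dict), the second log path should receive records under uWSGI too, provided the uid the workers run as can write to it.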

Celery 4 + Django + Redis, missing django settings section in documentation?

I am trying to set up Celery 4 in my Django project with Redis as the broker, but I cannot find the Django-specific broker settings in the Celery 4 documentation. Also, the settings documentation for version 4 no longer mentions CELERY_BROKER_URL, though I am sure the version 3 documentation does.
I searched on the web and found these settings:
CELERY_BROKER_URL = 'redis://localhost:6379'
CELERY_RESULT_BACKEND = 'redis://localhost:6379'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_SERIALIZER = 'json'
But I am not sure if it's for version 3 or version 4. I am utterly confused.
OK! I found the paragraph buried inside the "First steps with Django" documentation:
The uppercase name-space means that all Celery configuration options must be specified in uppercase instead of lowercase, and start with CELERY_, so for example the task_always_eager setting becomes CELERY_TASK_ALWAYS_EAGER, and the broker_url setting becomes CELERY_BROKER_URL. This also applies to the workers settings, for instance, the worker_concurrency setting becomes CELERY_WORKER_CONCURRENCY.
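In other words, with the namespace="CELERY" convention that page describes, the mapping from lowercase Celery option to Django setting name is purely mechanical. A tiny sketch of the rule the quoted paragraph states:

```python
def to_django_setting(celery_option):
    """Map a lowercase Celery 4 option name to its CELERY_-prefixed
    Django settings name, per the namespace convention quoted above."""
    return "CELERY_" + celery_option.upper()

print(to_django_setting("task_always_eager"))   # CELERY_TASK_ALWAYS_EAGER
print(to_django_setting("broker_url"))          # CELERY_BROKER_URL
print(to_django_setting("worker_concurrency"))  # CELERY_WORKER_CONCURRENCY
```

So the CELERY_BROKER_URL setting found on the web is simply broker_url read through that namespace.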

Proper replacement for CELERY_RESULT_BACKEND when upgrading to Celery 4.x for django 1.11

In trying to replace django-celery and upgrade Celery from 3.x to 4.x in an inherited project, I am having a hard time understanding the actual changes to make.
Celery is already set up, since the project uses 3.x, but in removing djcelery from the app I came across this:
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
Reading the docs, I am confused about whether it should be celery.backends.database, result_backend, or something else, i.e.:
CELERY_RESULT_BACKEND = 'celery.backends.database'
CELERYBEAT_SCHEDULER = 'beat_scheduler'
or:
CELERY_RESULT_BACKEND: result_backend
CELERYBEAT_SCHEDULER: beat_scheduler
I'm new to Celery, still getting familiar with the details.
Celery 4 changed its settings as follows: http://docs.celeryproject.org/en/latest/userguide/configuration.html#new-lowercase-settings
The major difference between previous versions, apart from the lower
case names, are the renaming of some prefixes, like celerybeat_ to
beat_, celeryd_ to worker_, and most of the top level celery_ settings
have been moved into a new task_ prefix.
Celery will still be able to read old configuration files, so there’s
no rush in moving to the new settings format.
The expectation is that you use result_backend instead of CELERY_RESULT_BACKEND. Full mapping of old upper case settings to new ones are documented here: http://docs.celeryproject.org/en/latest/userguide/configuration.html#new-lowercase-settings
In other words, result_backend is the new name of the key, NOT the new recommended value. It is the replacement for the left-hand side of your assignment. These are equivalent:
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
result_backend = 'djcelery.backends.database:DatabaseBackend'
Likewise these are equivalent:
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
beat_scheduler = 'djcelery.schedulers.DatabaseScheduler'
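For reference, here are a few of the renames side by side, old uppercase name mapped to new lowercase name (an illustrative subset; the full table is in the configuration docs linked above):

```python
# Old uppercase setting -> new lowercase setting (Celery 4 naming).
# Note the prefix renames: CELERYBEAT_ -> beat_, CELERYD_ -> worker_.
OLD_TO_NEW = {
    "CELERY_RESULT_BACKEND": "result_backend",
    "CELERYBEAT_SCHEDULER": "beat_scheduler",
    "CELERYD_CONCURRENCY": "worker_concurrency",
    "CELERY_TASK_SERIALIZER": "task_serializer",
}

print(OLD_TO_NEW["CELERY_RESULT_BACKEND"])  # result_backend
```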

Celery starts the scheduler more often than specified in the settings

Can anyone tell me what might be wrong with my Celery worker? When I run it, it executes the task more often than once a second, even though the schedule specifies an interval of several minutes.
 
Starting beat: "celery -A market_capitalizations beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler"
Starting the worker: "celery -A market_capitalizations worker -l info -S django"
 
Maybe I'm not starting the service correctly?
Settings:
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'exchange_rates',
    'django_celery_beat',
    'django_celery_results',
]

TIME_ZONE = 'Europe/Saratov'
USE_I18N = True
USE_L10N = True
USE_TZ = True

CELERY_BROKER_URL = 'redis://localhost:6379'
CELERY_RESULT_BACKEND = 'redis://localhost:6379'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = TIME_ZONE
CELERY_ENABLE_UTC = False
CELERYBEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'
[screenshot: running services]
When the task is started, a request is not sent.
[screenshot: admin panel]
Please tell me how to make Celery pick up the task schedule from the admin page and run the task with it.
I also tried defining the schedule in code, but the task still runs more often than once a second.
 
    
from celery.schedules import crontab

app.conf.beat_schedule = {
    'add-every-5-seconds': {
        'task': 'save_exchange_rates_task',
        'schedule': 600.0,
        # 'args': (16, 16)
    },
}
 
I ran into a similar issue when using django-celery-beat, and when I turned off USE_TZ (USE_TZ = False), the issue went away.
The downside is that with USE_TZ = False my app is no longer timezone-aware.
If you have found a solution, can you share it? Thanks.
My dev environment:
Python 3.7 + Django 2.0 + Celery 4.2 + Django-celery-beat 1.4
By the way, I am now defining the schedule in settings and it is working well, but I am still looking for a way to use django-celery-beat so the tasks can be managed in the database.
CELERY_BEAT_SCHEDULE = {
    'audit-db-every-10-minutes': {
        'task': 'myapp.tasks.db_audit',
        'schedule': 600.0,  # 10 minutes
        'args': (),
    },
}

Correct timesettings in Django for Celery

I'm wondering how to correctly use the time settings in Django and Celery.
Here is what I have:
TIME_ZONE = 'Europe/Oslo'
CELERY_TIMEZONE = 'Europe/Oslo'
CELERY_ENABLE_UTC = True
USE_TZ = True
TZINFO = 'UTC'
But the timestamp on my Celery task is ahead by two hours. How can I fix it?
Using:
Django - 1.6b2
celery - 3.0.23
django-celery - 3.0.23
You can use the TZ environment variable; Django automatically applies it by calling time.tzset(): http://docs.python.org/2/library/time.html#time.tzset
If Celery is started from Django, it will take effect there too.
Also you could use something like:
os.environ['TZ'] = 'your timezone'
at the beginning of ( manage.py or wsgi.py ) in your local installation.
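A quick way to see the effect of that approach (note that time.tzset() is POSIX-only, so this works on Linux servers but not on Windows):

```python
import os
import time

# Setting TZ before any time conversion changes what the C library, and
# therefore the process's local timestamps, considers local time.
os.environ["TZ"] = "Europe/Oslo"   # the zone used in the question
time.tzset()                       # re-read TZ for subsequent conversions
print(time.strftime("%Z"))         # CET or CEST depending on the date
```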
I think you might be hitting a bug in django-celery that I am also running into. There were timezone related changes in the last few releases of django-celery and this bug first showed up for me when I updated from 3.0.19 to 3.0.23.
I asked about this on the #celery IRC chat and was told that the django admin based celery task view is not that great and I should be using something like Flower (https://github.com/mher/flower) to monitor my tasks.
I installed and ran Flower and it did not suffer from the same timestamp issues that the django-celery admin based view does.