Python Django Celery is taking too much memory - django

I am running a Celery server with five or six tasks that run periodically. Celery takes up too much memory after five or six days of continuous execution.
The Celery documentation is very confusing. I am using the following settings.
# celeryconfig.py
import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'xxx.settings'
# default RabbitMQ broker
BROKER_URL = "amqp://guest:guest@localhost:5672//"
from celery.schedules import crontab
# default RabbitMQ backend
CELERY_RESULT_BACKEND = None
# 4 concurrent processes are running.
CELERYD_CONCURRENCY = 4
# specify location of log files
CELERYD_LOG_FILE="/var/log/celery/celery.log"
CELERY_ALWAYS_EAGER = True
CELERY_IMPORTS = (
    'xxx.celerydir.cron_tasks.deprov_cron_script',
)
CELERYBEAT_SCHEDULE = {
    'deprov_cron_script': {
        'task': 'xxx.celerydir.cron_tasks.deprov_cron_script.check_deprovision_vms',
        'schedule': crontab(minute=0, hour=17),
        'args': ''
    }
}
I am running the Celery service using the nohup command (this runs it in the background).
nohup celery beat -A xxx.celerydir &

After going through the documentation, I found that DEBUG was True in my Django settings.
Simply changing the value of DEBUG in settings fixed the memory growth.
REF: https://github.com/celery/celery/issues/2927
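For context, the reason this works: with DEBUG = True, Django appends every executed SQL query to an in-memory list (django.db.connection.queries), so a long-running worker that hits the database grows without bound. The fix is a one-line change in the Django settings module:
# xxx/settings.py
# With DEBUG enabled, Django records every SQL query in memory for
# debugging purposes, which leaks in long-running Celery processes.
DEBUG = False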

Related

How to route tasks to different queues with Celery and Django

I am using the following stack:
Python 3.6
Celery v4.2.1 (Broker: RabbitMQ v3.6.0)
Django v2.0.4.
According to Celery's documentation, running scheduled tasks on different queues should be as easy as defining the corresponding queues for the tasks in CELERY_ROUTES; nonetheless, all tasks seem to be executed on Celery's default queue.
This is the configuration on my_app/settings.py:
CELERY_BROKER_URL = "amqp://guest:guest@localhost:5672//"
CELERY_ROUTES = {
    'app1.tasks.*': {'queue': 'queue1'},
    'app2.tasks.*': {'queue': 'queue2'},
}
CELERY_BEAT_SCHEDULE = {
    'app1_test': {
        'task': 'app1.tasks.app1_test',
        'schedule': 15,
    },
    'app2_test': {
        'task': 'app2.tasks.app2_test',
        'schedule': 15,
    },
}
The tasks are just simple scripts for testing routing:
File app1/tasks.py:
from my_app.celery import app
import time

@app.task()
def app1_test():
    print('I am app1_test task!')
    time.sleep(10)
File app2/tasks.py:
from my_app.celery import app
import time

@app.task()
def app2_test():
    print('I am app2_test task!')
    time.sleep(10)
When I run Celery with all the required queues:
celery -A my_app worker -B -l info -Q celery,queue1,queue2
RabbitMQ will show that only the default queue "celery" is running the tasks:
sudo rabbitmqctl list_queues
# Tasks executed by each queue:
# - celery 2
# - queue1 0
# - queue2 0
Does somebody know how to fix this unexpected behavior?
Regards,
I have got it working; there are a few things to note here:
According to Celery 4.2.0's documentation, CELERY_ROUTES should be the variable that defines queue routing, but it only worked for me using CELERY_TASK_ROUTES instead. The task routing seems to be independent from Celery Beat, so this alone only works for tasks scheduled manually:
app1_test.delay()
app2_test.delay()
or
app1_test.apply_async()
app2_test.apply_async()
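As a sketch, you can also name the queue explicitly at call time; apply_async accepts a queue argument that overrides the routing table for that one call:
# Send a single task run to an explicit queue, bypassing CELERY_TASK_ROUTES:
app1_test.apply_async(queue='queue1')
app2_test.apply_async(queue='queue2')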
To make it work with Celery Beat, we just need to define the queues explicitly in the CELERY_BEAT_SCHEDULE variable. The final setup of the file my_app/settings.py would be as follows:
CELERY_BROKER_URL = "amqp://guest:guest@localhost:5672//"
CELERY_TASK_ROUTES = {
    'app1.tasks.*': {'queue': 'queue1'},
    'app2.tasks.*': {'queue': 'queue2'},
}
CELERY_BEAT_SCHEDULE = {
    'app1_test': {
        'task': 'app1.tasks.app1_test',
        'schedule': 15,
        'options': {'queue': 'queue1'}
    },
    'app2_test': {
        'task': 'app2.tasks.app2_test',
        'schedule': 15,
        'options': {'queue': 'queue2'}
    },
}
And to run Celery listening on those two queues:
celery -A my_app worker -B -l INFO -Q queue1,queue2
Where
-A: name of the project or app.
-B: Initiates the task scheduler Celery beat.
-l: Defines the logging level.
-Q: Defines the queues handled by this worker.
I hope this saves some time for other developers.
Adding the queue parameter to the task decorator may help you:
@app.task(queue='queue1')
def app1_test():
    print('I am app1_test task!')
    time.sleep(10)
I tried the same command you used to run the worker and found that you just have to remove "celery" from the -Q parameter; that will work too.
So the old command is
celery -A my_app worker -B -l info -Q celery,queue1,queue2
And the new command is
celery -A my_app worker -B -l info -Q queue1,queue2

Celery starts the scheduler more often than specified in the settings

Can you tell me what the problem with my Celery worker could be? When I run it, it starts executing the task more often than once a second, even though the schedule specifies an interval of several minutes.

Running beat: celery market_capitalizations beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler
Launching the worker: celery -A market_capitalizations worker -l info -S django
 
Maybe I'm not starting the service correctly?
Settings:
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'exchange_rates',
    'django_celery_beat',
    'django_celery_results',
]

TIME_ZONE = 'Europe/Saratov'
USE_I18N = True
USE_L10N = True
USE_TZ = True

CELERY_BROKER_URL = 'redis://localhost:6379'
CELERY_RESULT_BACKEND = 'redis://localhost:6379'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = TIME_ZONE
CELERY_ENABLE_UTC = False
CELERYBEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'
(screenshot: running services)
When the task is started, a request is not sent.
(screenshot: admin panel)
Please tell me how to make Celery pick up the task schedule from the admin page and run the task with it.
I tried setting the schedule through code, but the task still runs more often than once a second.
 
    
from celery.schedules import crontab

app.conf.beat_schedule = {
    'add-every-5-seconds': {
        'task': 'save_exchange_rates_task',
        'schedule': 600.0,
        # 'args': (16, 16)
    },
}
 
I ran into a similar issue when using django-celery-beat, but when I turned off USE_TZ (USE_TZ = False), the issue was gone.
However, setting USE_TZ to False makes my app no longer timezone-aware, which I would like to avoid.
If you have any solution, can you share it? Thanks.
My dev environment:
Python 3.7 + Django 2.0 + Celery 4.2 + Django-celery-beat 1.4
By the way, I am now configuring the schedule in settings and it is working well.
I am still looking for a way to use django-celery-beat so the tasks can be managed in the database:
CELERY_BEAT_SCHEDULE = {
    'audit-db-every-10-minutes': {
        'task': 'myapp.tasks.db_audit',
        'schedule': 600.0,  # 10 minutes
        'args': ()
    },
}
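For reference, the timezone workaround mentioned above is a one-line settings change (a sketch; note it disables Django's timezone awareness globally):
# settings.py
# Workaround: with USE_TZ disabled, django-celery-beat stopped
# re-sending the task more often than scheduled.
USE_TZ = False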

Django - setup path to celerybeat scheduler in supervisor

In my Django settings.py file I have the following code for the celerybeat scheduler:
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    'call-every-30-seconds': {
        'task': 'myapp.tasks.update_value',
        'schedule': timedelta(minutes=30),
    },
}
How would I set the path to my CELERYBEAT_SCHEDULE in my supervisord.conf file, which looks like this
[program:celerybeat]
command=celery beat -A RPF1 --schedule path/to/celerybeat/schedule --loglevel=INFO
Any information will be appreciated. Thank you.
Drop off the --schedule argument. It's unnecessary. Celery will pick up the CELERYBEAT_SCHEDULE from the Django environment and use that.
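The trimmed stanza would then look like this (project name RPF1 taken from the question). Note that --schedule only sets the path of beat's local state file (the celerybeat-schedule database); the schedule definition itself always comes from the settings:
[program:celerybeat]
command=celery beat -A RPF1 --loglevel=INFO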

Tasks not executing (Django + Heroku + Celery + RabbitMQ)

I'm using RabbitMQ for the first time and I must be misunderstanding some simple configuration settings. Note that I am encountering this issue while running the app locally right now; I have not yet attempted to launch to production via Heroku.
For this app, every 20 seconds I want to look for some unsent messages in the database, and send them via Twilio. Apologies in advance if I've left some relevant code out of my examples below. I've followed all of the Celery setup/config instructions. Here is my current setup:
BROKER_URL = 'amqp://VflhnMEP:8wGLOrNBP.........Bhshs' # Truncated URL string
from datetime import timedelta
CELERYBEAT_SCHEDULE = {
    'send_queued_messages_every_20_seconds': {
        'task': 'comm.tasks.send_queued_messages',
        'schedule': timedelta(seconds=20),
        # 'schedule': crontab(seconds='*/20')
    },
}
CELERY_TIMEZONE = 'UTC'
I am pretty sure that the tasks are queuing up in RabbitMQ; here is the dashboard I can see with all of the accumulated messages:
(screenshot: RabbitMQ dashboard)
The function 'send_queued_messages' should be called every 20 seconds.
comm/tasks.py
import datetime
from celery.decorators import periodic_task
from comm.utils import get_user_mobile_number
from comm.api import get_twilio_connection, send_message
from dispatch.models import Message
@periodic_task
def send_queued_messages(run_every=datetime.timedelta(seconds=20)):
    unsent_messages = Message.objects.filter(sent_success=False)
    connection = get_twilio_connection()
    for message in unsent_messages:
        mobile_number = get_user_mobile_number(message=message)
        try:
            send_message(
                connection=connection,
                mobile_number=mobile_number,
                message=message.raw_text
            )
            message.sent_success = True
            message.save()
        except BaseException as e:
            raise e
        pass
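As an aside, the periodic_task decorator normally takes run_every as a decorator argument; written as above it is only an unused default parameter of the function, so the schedule comes entirely from CELERYBEAT_SCHEDULE. A sketch of the decorator form (imports as in the question):
import datetime
from celery.decorators import periodic_task

# run_every belongs on the decorator, not in the function signature.
@periodic_task(run_every=datetime.timedelta(seconds=20))
def send_queued_messages():
    ...  # body as in the question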
I'm pretty sure that I have something misconfigured with RabbitMQ or in my Heroku project settings, but I'm not sure how to continue troubleshooting. When I run 'celery -A myproject beat' everything appears to be running smoothly.
(venv)josephs-mbp:myproject josephfusaro$ celery -A myproject beat
celery beat v3.1.18 (Cipater) is starting.
__ - ... __ - _
Configuration ->
. broker -> amqp://VflhnMEP:**@happ...Bhshs
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@%INFO
. maxinterval -> now (0s)
[2015-05-27 03:01:53,810: INFO/MainProcess] beat: Starting...
[2015-05-27 03:02:13,941: INFO/MainProcess] Scheduler: Sending due task send_queued_messages_every_20_seconds (comm.tasks.send_queued_messages)
[2015-05-27 03:02:34,036: INFO/MainProcess] Scheduler: Sending due task send_queued_messages_every_20_seconds (comm.tasks.send_queued_messages)
So why aren't the tasks executing as they do without Celery being involved*?
My Procfile:
web: gunicorn myproject.wsgi --log-file -
worker: celery -A myproject beat
*I have confirmed that my code executes as expected without Celery being involved!
Special thanks to @MauroRocco for pushing me in the right direction on this. The pieces that I was missing were best explained in this tutorial: https://www.rabbitmq.com/tutorials/tutorial-one-python.html
Note: I needed to modify some of the code in the tutorial to use URLParameters, passing in the resource URL defined in my settings file.
The only line that needed to change in send.py and receive.py is:
connection = pika.BlockingConnection(pika.URLParameters(BROKER_URL))
and of course we need to import the BROKER_URL variable from settings.py
from settings import BROKER_URL
settings.py
BROKER_URL = 'amqp://VflhnMEP:8wGLOrNBP...4.bigwig.lshift.net:10791/sdklsfssd'
send.py
import pika
from settings import BROKER_URL

connection = pika.BlockingConnection(pika.URLParameters(BROKER_URL))
channel = connection.channel()

channel.queue_declare(queue='hello')

channel.basic_publish(exchange='',
                      routing_key='hello',
                      body='Hello World!')
print " [x] Sent 'Hello World!'"
connection.close()
receive.py
import pika
from settings import BROKER_URL

connection = pika.BlockingConnection(pika.URLParameters(BROKER_URL))
channel = connection.channel()

channel.queue_declare(queue='hello')

print ' [*] Waiting for messages. To exit press CTRL+C'

def callback(ch, method, properties, body):
    print " [x] Received %r" % (body,)

channel.basic_consume(callback,
                      queue='hello',
                      no_ack=True)

channel.start_consuming()
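To verify the connection end to end, run the consumer in one terminal and the producer in another; the 'Hello World!' body should appear on the consumer side:
python receive.py
python send.py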

Running periodic tasks with django and celery

I'm trying to create a simple background periodic task using the Django-Celery-RabbitMQ combination. I installed Django 1.3.1, then downloaded and set up djcelery. Here is what my settings.py file looks like:
BROKER_HOST = "127.0.0.1"
BROKER_PORT = 5672
BROKER_VHOST = "/"
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
....
import djcelery
djcelery.setup_loader()
...
INSTALLED_APPS = (
    'djcelery',
)
And I put a 'tasks.py' file in my application folder with the following contents:
from celery.task import PeriodicTask
from celery.registry import tasks
from datetime import timedelta
from datetime import datetime
class MyTask(PeriodicTask):
    run_every = timedelta(minutes=1)

    def run(self, **kwargs):
        self.get_logger().info("Time now: " + str(datetime.now()))
        print("Time now: " + str(datetime.now()))

tasks.register(MyTask)
And then I start up my django server (local development instance):
python manage.py runserver
Then I start up the celerybeat process:
python manage.py celerybeat --logfile=<path_to_log_file> -l DEBUG
I can see entries like this in the log:
[2012-04-29 07:50:54,671: DEBUG/MainProcess] tasks.MyTask sent. id->72a5963c-6e15-4fc5-a078-dd26da663323
And I can also see the corresponding entries being created in the database, but I can't find where it is logging the text I specified in the run function of my MyTask class.
I tried fiddling with the logging settings and tried using the Django logger instead of the Celery logger, but to no avail. I'm not even sure my task is getting executed. If I print any debug information in the task, where does it go?
Also, this is the first time I'm working with any kind of message queuing system. It looks like the task will get executed as part of the celerybeat process, outside the Django web framework. Will I still be able to access all the Django models I created?
Thanks,
Venkat.
Celerybeat is the component that pushes tasks when they are due, but it does not execute them. Your task instances are stored on the RabbitMQ server. You need to run the celeryd daemon to actually execute your tasks:
python manage.py celeryd --logfile=<path_to_log_file> -l DEBUG
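If you prefer a single process while developing, older django-celery versions also let you embed the beat scheduler inside the worker via the -B flag (a sketch; not recommended for production):
python manage.py celeryd -B --logfile=<path_to_log_file> -l DEBUG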
Also, if you are using RabbitMQ, I recommend installing the RabbitMQ management plugin:
rabbitmq-plugins list
rabbitmq-plugins enable rabbitmq_management
service rabbitmq-server restart
It will be available at http://<host>:55672/ (login: guest, password: guest). There you can check how many tasks are in your RabbitMQ instance.
You should check the RabbitMQ logs, since celery sends the tasks to RabbitMQ and it should execute them. So all the prints of the tasks should be in RabbitMQ logs.