Running periodic tasks with Django and Celery

I'm trying to create a simple background periodic task using the Django + Celery + RabbitMQ combination. I installed Django 1.3.1 and downloaded and set up djcelery. Here is what my settings.py file looks like:
BROKER_HOST = "127.0.0.1"
BROKER_PORT = 5672
BROKER_VHOST = "/"
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
....
import djcelery
djcelery.setup_loader()
...
INSTALLED_APPS = (
    'djcelery',
)
And I put a 'tasks.py' file in my application folder with the following contents:
from celery.task import PeriodicTask
from celery.registry import tasks
from datetime import timedelta
from datetime import datetime

class MyTask(PeriodicTask):
    run_every = timedelta(minutes=1)

    def run(self, **kwargs):
        self.get_logger().info("Time now: " + str(datetime.now()))
        print("Time now: " + str(datetime.now()))

tasks.register(MyTask)
And then I start up my django server (local development instance):
python manage.py runserver
Then I start up the celerybeat process:
python manage.py celerybeat --logfile=<path_to_log_file> -l DEBUG
I can see entries like this in the log:
[2012-04-29 07:50:54,671: DEBUG/MainProcess] tasks.MyTask sent. id->72a5963c-6e15-4fc5-a078-dd26da663323
I can also see the corresponding entries being created in the database, but I can't find where the text I log inside the run method of MyTask actually ends up.
I tried fiddling with the logging settings and tried using the Django logger instead of the Celery logger, but to no avail. I'm not even sure my task is getting executed. If I print any debug information in the task, where does it go?
Also, this is the first time I'm working with any kind of message queuing system. It looks like the task will get executed as part of the celerybeat process, outside the Django web framework. Will I still be able to access all the Django models I created?
Thanks,
Venkat.

celerybeat is only the scheduler: it publishes tasks when they are due, but it does not execute them. Your task messages are stored on the RabbitMQ server. You need to run the celeryd worker daemon to actually execute your tasks:
python manage.py celeryd --logfile=<path_to_log_file> -l DEBUG
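For local development it can also be convenient (this is a suggestion, not something the question requires) to run a single process with the beat scheduler embedded in the worker via the -B flag, instead of starting celerybeat and celeryd separately:
python manage.py celeryd -B --logfile=<path_to_log_file> -l DEBUG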
Also, if you are using RabbitMQ, I recommend installing the RabbitMQ management plugin:
rabbitmq-plugins list
rabbitmq-plugins enable rabbitmq_management
service rabbitmq-server restart
The web UI will then be available at http://<your-server>:55672/ (login: guest, password: guest). There you can check how many tasks are sitting in your RabbitMQ instance.

You should check the RabbitMQ logs, since celery sends the tasks to RabbitMQ and it should execute them. So all the prints of the tasks should be in RabbitMQ logs.

Related

Update database fields hourly with Python/Django

Suppose I have 1000 user_ids in a table, and every hour I want to fetch information from the Google API and update three fields in that table. What would the impact be, and how can it be done efficiently?
I've seen this variant:
m = Module.objects.get(user_id=1).update(field_one=100, field_two=200, field_three=300)
And this one:
m = Module.objects.get(user_id=1)
m.field_one = 100
m.field_two = 200
m.field_three = 300
m.save()
Also, how can it be set up so that it runs every hour and grabs that information? I've never done anything like this.
Use Redis and Celery to set up an asynchronous task queue that runs every hour. Look here https://realpython.com/blog/python/asynchronous-tasks-with-django-and-celery/ for more info on how to set up an async task queue system for Django.
Here is the code for tasks.py
from celery.task import periodic_task
from celery.schedules import crontab

@periodic_task(run_every=crontab(minute=0, hour='*/1'))
def get_data_from_google_api():
    data_from_google = ping_google_api()  # ping google api to get data
    return Module.objects.filter(user_id=1).update(
        field_one=data_from_google['field_one'],
        field_two=data_from_google['field_two'],
        field_three=data_from_google['field_three'])
Look here for more info :
https://www.caktusgroup.com/blog/2014/06/23/scheduling-tasks-celery/
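Since the question is about 1000 user_ids rather than a single one, here is a minimal sketch of how the same hourly task could loop over every row. It assumes a per-user variant of the ping_google_api helper (that signature is an assumption, not something shown above) and the Module model from the question:
from celery.task import periodic_task
from celery.schedules import crontab

@periodic_task(run_every=crontab(minute=0, hour='*/1'))
def refresh_all_modules():
    # iterate over ids only, so full model instances are never loaded
    for user_id in Module.objects.values_list('user_id', flat=True):
        data = ping_google_api(user_id)  # assumed per-user helper
        # filter().update() issues one UPDATE per user, with no save() round trip
        Module.objects.filter(user_id=user_id).update(
            field_one=data['field_one'],
            field_two=data['field_two'],
            field_three=data['field_three'])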
How to run a Django celery task every 6am and 6pm daily?
For this purpose you need to run background jobs as periodic tasks.
There are several popular task-queue libs for Django.
For example, if you decide to use Celery, you can write a simple periodic task:
from celery.schedules import crontab
from celery.task import periodic_task

@periodic_task(
    name='UPDATE_USER',
    run_every=crontab(
        minute='1',
        hour='1,4,7,10,13,16,19,22'))
def update_user():
    # get some value from the api
    Module.objects.filter(user_id=1).update(
        field_one=value, field_two=value, field_three=value)
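To match the schedule in the question title (06:00 and 18:00 every day), the crontab in the decorator would simply be adjusted like this (my tweak of the example above; the body is elided):
@periodic_task(
    name='UPDATE_USER',
    run_every=crontab(minute=0, hour='6,18'))  # 06:00 and 18:00 every day
def update_user():
    # same body as in the example above
    pass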
All the Django-related settings are described in the Celery docs.

django-celery as a daemon: not working

I have a website project written with Django, Celery and RabbitMQ, and a '.delay' task (the task creates a new folder) is called when a button is clicked.
Everything works fine with celery (the .delay task is called, and a new folder is created) when I run celery with manage.py like:
python manage.py celeryd
However, when I run celery as a daemon, the task is not executed (no folder is created), even though there is no error.
I was kind of following the tutorial: http://www.arruda.blog.br/programacao/django-celery-in-daemon/
My settings are:
/etc/default/celeryd:
# Name of nodes to start, here we have a single node
CELERYD_NODES="w1"
# Where to chdir at start.
CELERYD_CHDIR="/var/www/myproject"
# How to call "manage.py celeryd_multi"
CELERYD_MULTI="$CELERYD_CHDIR/manage.py celeryd_multi"
# How to call "manage.py celeryctl"
CELERYCTL="$CELERYD_CHDIR/manage.py celeryctl"
# Extra arguments to celeryd
CELERYD_OPTS=""
# Name of the celery config module.
CELERY_CONFIG_MODULE="myproject.settings"
# %n will be replaced with the nodename.
CELERYD_LOG_FILE="/var/log/celery/w1.log"
CELERYD_PID_FILE="/var/run/celery/w1.pid"
# Workers should run as an unprivileged user.
#CELERYD_USER="root"
#CELERYD_GROUP="root"
# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="myproject.settings"
The corresponding folders have been created too.
For the /etc/init.d/celeryd file, I used this version:
https://raw.github.com/ask/celery/1da3aa43d1e6de525beeda398d0acb8841d5b4d2/contrib/generic-init.d/celeryd
For /var/www/myproject/myproject/settings.py, I have:
import djcelery
djcelery.setup_loader()
BROKER_HOST = "127.0.0.1"
BROKER_PORT = 5672
BROKER_VHOST = "/"
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
INSTALLED_APPS = (
'djcelery',
...
)
There is no error when I start celery using:
/etc/init.d/celeryd start
and no results either. Does anyone know how to fix the problem?
Celery's docs have a daemon troubleshooting section that might be helpful. Celery has a flag that lets you run your init script without actually daemonizing, and that should show what's going wrong:
C_FAKEFORK=1 sh -x /etc/init.d/celeryd start
Newer versions of that init script have a dryrun command that's an easier-to-remember way to run the start command without daemonizing.
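It is also worth looking at the worker log and pid locations once the init script has run; a startup failure or a task traceback should show up in the log file (paths taken from the /etc/default/celeryd in the question):
# locations from CELERYD_LOG_FILE and CELERYD_PID_FILE above
ls -l /var/log/celery/ /var/run/celery/
tail -n 50 /var/log/celery/w1.log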

celery parallel tasking error 'no result backend configured'

Running django-celery 3.1.16, Celery 3.1.17, Django 1.4.16. Trying to run some parallel tasks using 3 workers and collect the results using the following:
from celery import group
positions = []
jobs = group(celery_calculate_something.s(data.id) for data in a_very_big_list)
results = jobs.apply_async()
positions.extend(results.get())
The task celery_calculate_something returns an object to place in the positions list:
@app.task(ignore_result=False)
def celery_calculate_something(id):
    <do stuff>
No matter what I try, I always get the same result when calling get() on results:
No result backend configured. Please see the documentation for more information.
However, the results backend IS configured - I have many other tasks with ignore_result=False merrily adding to the tasks meta table in django_celery. It is something to do with using the results returned from group(). I should note it is not set explicitly in settings - it seems that django-celery has set it automatically for you.
I have the worker collecting events using:
manage.py celery worker -l info -E
and celerycam running with
python manage.py celerycam
Inspecting the results object returned (an instance of GroupResult) I can see that the backend attr is an instance of DisabledBackend. Is this the problem? What have I mis-understood?
You did not configure the result backend, so basically you need tables to store the results. Since you have django-celery, add it to INSTALLED_APPS in your settings.py file and then run the migrations (python manage.py migrate). After that, open your celery.py file and set your backend to djcelery.backends.database:DatabaseBackend. Here's an example:
app = Celery('almanet',
             broker='amqp://guest@localhost//',
             backend='djcelery.backends.database:DatabaseBackend',
             include=['alm_crm.tasks'])  # references your tasks; don't forget to use the full absolute path
After that you can import the result module (from celery import result). Now you can save the group result and restore it later by its id:
from celery import group, result

positions = []
jobs = group(celery_calculate_something.s(data.id) for data in a_very_big_list)
results = jobs.apply_async()
results.save()
some_task_result = result.GroupResult.restore(results.id)
print some_task_result.ready()
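If, like the question, you rely on django-celery's loader rather than a standalone celery.py module, an equivalent sketch (based on django-celery's database result backend; adjust to your project) is to point the result backend at it in settings.py:
# settings.py
import djcelery
djcelery.setup_loader()

# store task and group results in django-celery's database tables
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'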

How to call task properly?

I configured django-celery in my application. This is my task:
from celery.decorators import task
import simplejson as json
import requests
import logging

logger = logging.getLogger(__name__)

@task
def call_api(sid):
    try:
        results = requests.put(
            'http://localhost:8000/api/v1/sids/' + str(sid) + "/",
            data={'active': '1'}
        )
        json_response = json.loads(results.text)
    except Exception, e:
        print e
    logger.info('Finished call_api')
When I add in my view:
call_api.apply_async(
    (instance.service.id,),
    eta=instance.date
)
celeryd shows me:
Got task from broker: my_app.tasks.call_api[755d50fd-0f0f-4861-9a18-7f4e4563290a]
Task my_app.tasks.call_api[755d50fd-0f0f-4861-9a18-7f4e4563290a] succeeded in 0.00513911247253s: None
so it should be good, but nothing happens... There is no call to, for example:
http://localhost:8000/api/v1/sids/1/
What am I doing wrong?
Are you running celery as a separate process?
For example in Ubuntu run using the command
sudo python manage.py celeryd
Until you run celery (or django-celery) as a separate process, the jobs will just be stored in the database (or queue, or whatever persistence mechanism you have configured, generally in settings.py) without being executed.
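One more thing that may help explain the "succeeded ... None" line in the question (a suggestion on my part, not part of the answer above): the bare print e in the except block only goes to the worker's console, so a failing requests.put can look like a successful task. Logging the exception and re-raising makes the failure visible in the celeryd log:
from celery.decorators import task
import logging
import requests

logger = logging.getLogger(__name__)

@task
def call_api(sid):
    try:
        requests.put(
            'http://localhost:8000/api/v1/sids/' + str(sid) + "/",
            data={'active': '1'}
        )
    except Exception, e:
        logger.exception(e)  # full traceback goes to the worker log
        raise                # task is marked as failed instead of 'succeeded: None'
    logger.info('Finished call_api')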

celery - Tasks that need to run in priority

On my website users can update their profile manually whenever they want, or it is updated automatically once a day.
This task is distributed with Celery now.
But I have a "problem":
Every day, for the automatic update, a job puts ALL users (roughly 6k) on the queue:
from celery import group
from tasks import *
import datetime
from lastActivityDate.models import UserActivity
today = datetime.datetime.today()
one_day = datetime.timedelta(days=5)
today -= one_day
print datetime.datetime.today()
user_list = UserActivity.objects.filter(last_activity_date__gte=today)
g = group(update_user_profile.s(i.user.auth.username) for i in user_list)
print datetime.datetime.today()
print g(user_list.count()).get()
If someone tries to do a manual update, their task enters the same queue and takes forever to be executed.
Is there a way to make this manual task run with priority?
Or to create a dedicated queue for each kind of update: manual and automatic?
Celery does not support task priority. (v3.0)
http://docs.celeryproject.org/en/master/faq.html#does-celery-support-task-priorities
You may solve this problem by routing tasks.
http://docs.celeryproject.org/en/latest/userguide/routing.html
Prepare a default and a priority_high queue.
from kombu import Queue

CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
    Queue('default'),
    Queue('priority_high'),
)
Run two daemons:
user@x:/$ celery worker -Q priority_high
user@y:/$ celery worker -Q default,priority_high
And route the task:
your_task.apply_async(args=['...'], queue='priority_high')
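Applied to the code in the question (reusing its update_user_profile task; the explicit queue choices are my illustration of the routing above, and username stands for the value looped over in the daily script):
# manual update triggered by a user -> 'priority_high', which both workers consume
update_user_profile.apply_async(args=[username], queue='priority_high')

# daily automatic update -> 'default', which only the second worker consumes
update_user_profile.apply_async(args=[username], queue='default')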
If you use the RabbitMQ transport, then configure your queues the following way:
settings.py
from kombu import Queue
...
CELERY_TASK_QUEUES = (
    Queue('default', routing_key='task_default.#', max_priority=10),
    ...)
Then run your tasks:
my_low_prio_task.apply_async(args=(...), priority=1)
my_high_prio_task.apply_async(args=(...), priority=10)
Presently this code works for kombu==4.6.11, celery==4.4.6.
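One caveat worth adding (my note, not from the answer): broker-side priorities only reorder messages that are still waiting in RabbitMQ, so a worker that prefetches a large batch will run the low-priority tasks it has already buffered first anyway. Assuming the usual config_from_object(..., namespace='CELERY') setup implied by the CELERY_-prefixed settings above, keeping the prefetch small helps:
# settings.py
CELERY_WORKER_PREFETCH_MULTIPLIER = 1  # pull one message at a time from the broker
CELERY_TASK_ACKS_LATE = True           # acknowledge only after the task has finished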