CELERY_ROUTES seems to be ignored - python-2.7

I'm using django and djcelery. Since some tasks are a bit heavy I want them in a separate queue.
According to the section on automatic routing in the user guide, you can achieve this through the settings variable CELERY_ROUTES. However, this doesn't work; all tasks end up on the default queue. When I explicitly set the queue variable in my task class it DOES work.
In my settings file I have:
CELERY_ROUTES = {"analysis.tasks.analyze_item": {"queue": "analysis_queue"}}
I start the celery worker with
python manage.py celeryd -Q analysis_queue
My task.py:

class Analyze(Task):
    queue = 'analysis_queue'  # <--- without this it doesn't work..
    ....

analyze_item = registry.tasks[Analyze.name]
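One thing worth checking here (a hedged diagnostic sketch, not a confirmed fix): CELERY_ROUTES matches on the task's registered name, so if the class-based task registers itself under anything other than "analysis.tasks.analyze_item", the route silently never applies. The registered names are easy to print:

# hedged sketch: confirm the registered name equals the CELERY_ROUTES key
from celery import registry
from analysis.tasks import Analyze

print Analyze.name            # should be exactly "analysis.tasks.analyze_item"
print sorted(registry.tasks)  # every task name the worker knows about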

Related

Initializing Different Celery Workers with Different Values

I am using celery to run long-running tasks on Hadoop. Each task executes a Pig script on Hadoop which runs for about 30 minutes to 2 hours.
My current Hadoop setup has 4 queues: a, b, c, and default. All tasks are currently being executed by a single worker which submits the job to a single queue.
I want to add 3 more workers which submit jobs to other queues, one worker per queue.
The problem is that the queue is currently hard-coded, and I wish to make it configurable per worker.
I searched a lot but I am unable to find a way to pass each celery worker a different queue value and access it in my task.
I start my celery worker like so.
celery -A app.celery worker
I wish to pass some additional arguments on the command line itself and access them in my task, but celery complains that it doesn't understand my custom argument.
I plan to run all the workers on the same host by setting the --concurrency=3 parameter. Is there any solution to this problem?
Thanks!
EDIT
The current scenario is like this: every time I try to execute the task print_something by calling tasks.print_something.delay(), it only prints C.
@celery.task()
def print_something():
    print "C"
I need to have the workers print a variable letter based on what value I pass to them while starting them.
@celery.task()
def print_something():
    print "<Variable Value Per Worker Here>"
Hope this helps someone.
Several problems needed to be solved here.
The first step involved adding support in celery for the custom parameter. If this is not done, celery will complain that it doesn't understand the parameter.
Since I am running celery with Flask, I initialize celery like so.
def configure_celery():
    app.config.update(
        CELERY_BROKER_URL='amqp://:@localhost:5672',
        RESULT_BACKEND='db+mysql://root:@localhost:3306/<database_name>'
    )
    celery = Celery(app.import_name, backend=app.config['RESULT_BACKEND'],
                    broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)
    TaskBase = celery.Task

    class ContextTask(TaskBase):
        abstract = True

        def __call__(self, *args, **kwargs):
            with app.app_context():
                return TaskBase.__call__(self, *args, **kwargs)

    celery.Task = ContextTask
    return celery
I call this function to initialize celery and store it in a variable called celery.
celery = configure_celery()
To add the custom parameter you need to do the following.
def add_hadoop_queue_argument_to_worker(parser):
    parser.add_argument(
        '--hadoop-queue', help='Hadoop queue to be used by the worker'
    )
The celery used below is the one we obtained from the steps above.
celery.user_options['worker'].add(add_hadoop_queue_argument_to_worker)
The next step would be to make this argument accessible in the worker. To do that follow these steps.
class HadoopCustomWorkerStep(bootsteps.StartStopStep):

    def __init__(self, worker, **kwargs):
        worker.app.hadoop_queue = kwargs['hadoop_queue']
Inform celery to use this class for creating the workers.
celery.steps['worker'].add(HadoopCustomWorkerStep)
The tasks should now be able to access the variable.
@celery.task(bind=True)
def print_hadoop_queue_from_config(self):
    print self.app.hadoop_queue
Verify it by running the workers on the command line:
celery -A app.celery worker --concurrency=1 --hadoop-queue=A -n aworker@%h
celery -A app.celery worker --concurrency=1 --hadoop-queue=B -n bworker@%h
celery -A app.celery worker --concurrency=1 --hadoop-queue=C -n cworker@%h
celery -A app.celery worker --concurrency=1 --hadoop-queue=default -n defaultworker@%h
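Then, from a shell, you can queue the task and watch the worker logs; whichever worker picks it up should print its own hadoop_queue value (a hedged sketch; the module path is an assumption):

# hedged sketch: trigger the task once and check the consuming worker's log
from app.tasks import print_hadoop_queue_from_config  # assumed module path

print_hadoop_queue_from_config.delay()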
What I usually do is, after starting the workers (while the tasks are not yet running), add commands with parameters in another script (say manage.py) to start specific tasks or tasks with different arguments.
In manage.py:

import click
from tasks import some_task

@click.command()
@click.argument('params', nargs=-1)
def run_task(params):
    some_task.apply_async(args=params)
And this will start the tasks as needed.
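A hedged usage example (assuming run_task is registered as a command on the click CLI exposed by manage.py, and <params_for_the_task> stands in for whatever arguments your task expects):

python manage.py run_task <params_for_the_task>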

Running celery task when celery beat starts

How do I schedule a task to run when I start celery beat, and then again every hour after that?
Currently I have this schedule in settings.py:
CELERYBEAT_SCHEDULE = {
    'update_database': {
        'task': 'myapp.tasks.update_database',
        'schedule': timedelta(seconds=60),
    },
}
I saw a post from a year ago here on Stack Overflow asking the same question:
How to run celery schedule instantly?
However, this does not work for me, because my celery worker gets 3-4 requests for the same task when I run the django server.
I'm starting my worker and beat like this:
celery -A dashboard_web worker -B --loglevel=INFO --concurrency=10
Crontab schedule
You could try to use a crontab schedule instead, which will run every hour and start 1 minute after the scheduler is initialized. Warning: you might want to set it a couple of minutes later in case startup takes longer; otherwise you might have to wait the full hour.
from celery.schedules import crontab
from datetime import datetime

CELERYBEAT_SCHEDULE = {
    'update_database': {
        'task': 'myapp.tasks.update_database',
        'schedule': crontab(minute=(datetime.now().minute + 1) % 60),
    },
}
Reference: http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#crontab-schedules
Ready method of MyAppConfig
In order to ensure that your task is run right away, you could use the same method as before to create the periodic task without adding 1 to the minute. Then, you call your task in the ready method of MyAppConfig which is called whenever your app is ready.
# myapp/apps.py
from django.apps import AppConfig


class MyAppConfig(AppConfig):
    name = "myapp"

    def ready(self):
        from .tasks import update_database
        update_database.delay()
Please note that you could also create the periodic task directly in the ready method if you were to use django_celery_beat.
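A minimal sketch of that django_celery_beat variant, assuming the package is installed and migrated; the task name and schedule here are illustrative, and in a real project you would guard against ready() running before the beat tables exist:

# myapp/apps.py -- hedged sketch using django_celery_beat instead of
# CELERYBEAT_SCHEDULE
from django.apps import AppConfig


class MyAppConfig(AppConfig):
    name = "myapp"

    def ready(self):
        from django_celery_beat.models import IntervalSchedule, PeriodicTask
        schedule, _ = IntervalSchedule.objects.get_or_create(
            every=1, period=IntervalSchedule.HOURS)
        PeriodicTask.objects.get_or_create(
            name='update_database_hourly',  # illustrative name
            defaults={'task': 'myapp.tasks.update_database',
                      'interval': schedule})
        from .tasks import update_database
        update_database.delay()  # run once right away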
Edit: Didn't see that the second method was already covered in the link you mentioned. I'll leave it here in case it is useful for someone else arriving here.
Try setting the configuration parameter CELERY_ALWAYS_EAGER = True, which makes tasks run synchronously in the calling process instead of being sent to the queue.
Something like this:
app.conf.CELERY_ALWAYS_EAGER = True

celery parallel tasking error 'no result backend configured'

Running django-celery 3.1.16, Celery 3.1.17, Django 1.4.16. Trying to run some parallel tasks using 3 workers and collect the results using the following:
from celery import group
positions = []
jobs = group(celery_calculate_something.s(data.id) for data in a_very_big_list)
results = jobs.apply_async()
positions.extend(results.get())
The task celery_calculate_something returns an object to place in the results list:
@app.task(ignore_result=False)
def celery_calculate_something(id):
    <do stuff>
No matter what I try, I always get the same result when calling get() on results:
No result backend configured. Please see the documentation for more information.
However, the results backend IS configured - I have many other tasks with ignore_result=False merrily adding to the tasks meta table in django_celery. It is something to do with using the results returned from group(). I should note it is not set explicitly in settings - it seems that django-celery has set it automatically for you.
I have the worker collecting events using:
manage.py celery worker -l info -E
and celerycam running with
python manage.py celerycam
Inspecting the results object returned (an instance of GroupResult), I can see that the backend attr is an instance of DisabledBackend. Is this the problem? What have I misunderstood?
You did not configure the results backend, so basically you need tables to store the results. Since you have django-celery, add it to INSTALLED_APPS in your settings.py file and then perform the migration (python manage.py migrate). After that, open your celery.py file and set your backend to djcelery.backends.database:DatabaseBackend. Here's an example:
app = Celery('almanet',
             broker='amqp://guest@localhost//',
             backend='djcelery.backends.database:DatabaseBackend',
             include=['alm_crm.tasks']  # references your tasks; don't forget the full dotted path
             )
After that you can import the result module (from celery import result). Now you can save the group result and restore it later by its id:
from celery import group

positions = []
jobs = group(celery_calculate_something.s(data.id) for data in a_very_big_list)
results = jobs.apply_async()
results.save()
some_task_result = result.GroupResult.restore(results.id)
print some_task_result.ready()
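Equivalently (a hedged alternative), since django-celery reads the Django settings, the backend can be selected in settings.py instead of in the Celery() constructor:

# settings.py -- assumes djcelery is in INSTALLED_APPS and its tables exist
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'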

celery - Tasks that need to run in priority

On my website, users can UPDATE their profile manually whenever they want, or automatically once a day.
This task is distributed with celery now.
But I have a "problem":
Every day, for the automatic update, a job puts ALL users (~6k users) on the queue:
from celery import group
from tasks import *
import datetime
from lastActivityDate.models import UserActivity
today = datetime.datetime.today()
one_day = datetime.timedelta(days=5)
today -= one_day
print datetime.datetime.today()
user_list = UserActivity.objects.filter(last_activity_date__gte=today)
g = group(update_user_profile.s(i.user.auth.username) for i in user_list)
print datetime.datetime.today()
print g(user_list.count()).get()
If someone tries to do the manual update, it goes to the back of the queue and takes forever to be executed.
Is there a way to make this manual task run with priority?
Or to make a dedicated queue for each case: manual and automatic?
Celery does not support task priority. (v3.0)
http://docs.celeryproject.org/en/master/faq.html#does-celery-support-task-priorities
You may solve this problem by routing tasks.
http://docs.celeryproject.org/en/latest/userguide/routing.html
Prepare a default and a priority_high queue.
from kombu import Queue

CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
    Queue('default'),
    Queue('priority_high'),
)
Run two daemons.
user@x:/$ celery worker -Q priority_high
user@y:/$ celery worker -Q default,priority_high
And route the task:
your_task.apply_async(args=['...'], queue='priority_high')
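If you would rather not pass queue= at every call site, routing by task name also works (a hedged sketch; the task paths are assumptions, adjust them to your project):

# settings.py -- route the manual-update task to the priority_high queue
CELERY_ROUTES = {
    'tasks.update_user_profile_manual': {'queue': 'priority_high'},
    'tasks.update_user_profile': {'queue': 'default'},
}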
If you use RabbitMQ transport then configure your queues the following way:
settings.py
from kombu import Queue
...
CELERY_TASK_QUEUES = (
    Queue('default', routing_key='task_default.#', max_priority=10),
    ...)
Then run your tasks:
my_low_prio_task.apply_async(args=(...), priority=1)
my_high_prio_task.apply_async(args=(...), priority=10)
Presently this code works for kombu==4.6.11, celery==4.4.6.
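One caveat worth keeping in mind (hedged): the broker is what honours max_priority, so messages a worker has already prefetched are processed in arrival order regardless of priority. Keeping the prefetch buffer small helps priorities take effect, assuming the same CELERY_-prefixed settings style as above:

# settings.py -- hedged sketch: let the broker, not the worker buffer, decide order
CELERY_WORKER_PREFETCH_MULTIPLIER = 1  # maps to worker_prefetch_multiplier
CELERY_TASK_ACKS_LATE = True           # maps to task_acks_late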

Running periodic tasks with django and celery

I'm trying to create a simple background periodic task using the Django-Celery-RabbitMQ combination. I installed Django 1.3.1, and I downloaded and set up djcelery. Here is what my settings.py file looks like:
BROKER_HOST = "127.0.0.1"
BROKER_PORT = 5672
BROKER_VHOST = "/"
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
....
import djcelery
djcelery.setup_loader()
...
INSTALLED_APPS = (
    'djcelery',
)
And I put a 'tasks.py' file in my application folder with the following contents:
from celery.task import PeriodicTask
from celery.registry import tasks
from datetime import timedelta
from datetime import datetime


class MyTask(PeriodicTask):
    run_every = timedelta(minutes=1)

    def run(self, **kwargs):
        self.get_logger().info("Time now: " + str(datetime.now()))
        print("Time now: " + str(datetime.now()))

tasks.register(MyTask)
And then I start up my django server (local development instance):
python manage.py runserver
Then I start up the celerybeat process:
python manage.py celerybeat --logfile=<path_to_log_file> -l DEBUG
I can see entries like this in the log:
[2012-04-29 07:50:54,671: DEBUG/MainProcess] tasks.MyTask sent. id->72a5963c-6e15-4fc5-a078-dd26da663323
I can also see the corresponding entries getting created in the database, but I can't find where it is logging the text I specified in the run function of the MyTask class.
I tried fiddling with the logging settings and tried using the django logger instead of the celery logger, but to no avail. I'm not even sure my task is getting executed. If I print any debug information in the task, where does it go?
Also, this is the first time I'm working with any kind of message queuing system. It looks like the task will get executed as part of the celerybeat process, outside the django web framework. Will I still be able to access all the django models I created?
Thanks,
Venkat.
Celerybeat is only the scheduler: it pushes tasks when they are due, but does not execute them. Your task instances are stored in the RabbitMQ server. You need to run the celeryd daemon to actually execute your tasks.
python manage.py celeryd --logfile=<path_to_log_file> -l DEBUG
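For local development you can also embed the beat scheduler inside the worker process with the -B flag (a hedged convenience; not recommended for production, where beat should run as its own process):

python manage.py celeryd -B --logfile=<path_to_log_file> -l DEBUG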
Also, if you are using RabbitMQ, I recommend installing the RabbitMQ management plugin:
rabbitmq-plugins list
rabbitmq-plugins enable rabbitmq_management
service rabbitmq-server restart
It will then be available at http://<host>:55672/ (login: guest, pass: guest). There you can check how many tasks are sitting in your RabbitMQ instance.
You should check the RabbitMQ logs, since celery sends the tasks to RabbitMQ and it should execute them. So all the prints of the tasks should be in RabbitMQ logs.