In my Django project with Celery, I have a task function that needs to accept all incoming tasks but run them one at a time, like a singleton.
I can do it like this:
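import os
import time
from celery import shared_task
from django.conf import settings
@shared_task(bind=True)
def make_some_task(self, event_id):
    lock_name = os.path.join(settings.BASE_DIR, 'create_lock')
    # Re-check on every pass; a value cached before the loop would never change.
    while os.path.exists(lock_name):
        time.sleep(10)
    # Create the same path that is checked and removed.
    with open(lock_name, 'w') as file:
        file.write('locked')
    ..... do some stuff .....
    os.remove(lock_name)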
But I don't think this is the correct way to do it inside Celery; there must be a better way to implement this.
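One weakness of the lock-file approach above is the gap between checking for the file and creating it: two workers can both see "no lock" and proceed. A minimal sketch of closing that gap, assuming all workers share one filesystem, is to create the file atomically with `O_CREAT | O_EXCL`. The helper name `file_lock` and its parameters are hypothetical; the Celery documentation also describes a cache-based lock for setups where workers do not share a disk.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path, poll_seconds=0.1, timeout=30.0):
    """Hold an exclusive lock file for the duration of the with-block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL makes "check and create" a single atomic step.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire %s" % lock_path)
            time.sleep(poll_seconds)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("locked")
        yield lock_path
    finally:
        # Always release the lock, even if the protected work raised.
        os.remove(lock_path)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "create_lock")
    with file_lock(path):
        print("lock held:", os.path.exists(path))
    print("lock released:", not os.path.exists(path))
```

Inside the task, the body would then run under `with file_lock(lock_name):`. Note that sleeping in a task still occupies a worker slot while it waits.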
I have a task (let's call it MainTask) that is scheduled using the apply_async method. This task has some validations that can trigger another task (SecondaryTask) to be scheduled with an eta.
Every time MainTask tries to schedule SecondaryTask using apply_async, SecondaryTask runs immediately, ignoring the eta parameter.
How can I schedule a different task from a "main task" so that it is executed later, using eta?
Here is an example of the code:
views.py
def function():
    main_task.apply_async(eta=some_day)
tasks.py
@app.task(bind=True, name="main_task", autoretry_for=(Exception,), default_retry_delay=10, max_retries=3, queue='mail')
def main_task(self):
    ...
    if something:
        ...
        another_task.apply_async(eta=tomorrow)
@app.task(bind=True, name="another_task", autoretry_for=(Exception,), default_retry_delay=10, max_retries=3, queue='mail')
def another_task(self):
    do_something()
I'm using Celery 4.4.6 btw
What Celery version are you using?
I'm on 4.4.7 and using countdown, which is much the same thing (maybe there's a bug with eta?). From the docs:
countdown is a shortcut to set ETA by seconds into the future.
Assuming that you are doing something similar to this:
tomorrow = datetime.utcnow() + timedelta(days=1)
you can always compute the seconds and pass them as countdown:
seconds = timedelta(days=1).total_seconds()
another_task.apply_async(countdown=seconds)
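If the eta is an arbitrary datetime rather than a fixed offset, the same idea can be sketched as a small helper that converts it to a countdown. The function name `countdown_until` is hypothetical. It uses timezone-aware datetimes, because one possible cause worth ruling out is a naive eta being interpreted in a different timezone than intended, which would make it look like it is already in the past.

```python
from datetime import datetime, timedelta, timezone


def countdown_until(eta, now=None):
    """Return the non-negative number of seconds from now until eta."""
    now = now or datetime.now(timezone.utc)
    # Clamp at zero so an eta in the past simply means "run now".
    return max((eta - now).total_seconds(), 0.0)


if __name__ == "__main__":
    now = datetime(2020, 9, 1, 12, 0, tzinfo=timezone.utc)
    tomorrow = now + timedelta(days=1)
    print(countdown_until(tomorrow, now=now))  # 86400.0
```

The result would then be passed as `another_task.apply_async(countdown=countdown_until(tomorrow))`.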
EDIT:
Can you try calling apply_async on a signature to see if that makes a difference?
another_task.si().apply_async(eta=tomorrow)
I am using celery to run long running tasks on Hadoop. Each task executes a Pig script on Hadoop which runs for about 30 mins - 2 hours.
My current Hadoop setup has 4 queues a,b,c, and default. All tasks are currently being executed by a single worker which submits the job to a single queue.
I want to add 3 more workers which submit jobs to other queues, one worker per queue.
The problem is the queue is currently hard-coded and I wish to make this variable per worker.
I searched a lot but I am unable to find a way to pass each celery worker a different queue value and access it in my task.
I start my celery worker like so.
celery -A app.celery worker
I wish to pass some additional arguments in the command-line itself and access it in my task but celery complains that it doesn't understand my custom argument.
I plan to run all the workers on the same host by setting the --concurrency=3 parameter. Is there any solution to this problem?
Thanks!
EDIT
The current scenario is like this. Every time I execute the task print_something by calling tasks.print_something.delay(), it only prints queue C.
@celery.task()
def print_something():
    print("C")
I need the workers to print a different letter based on the value I pass to them when starting them.
@celery.task()
def print_something():
    print("<Variable Value Per Worker Here>")
Hope this helps someone.
Several problems had to be solved to make this work.
The first step was adding support in Celery for the custom parameter. If this is not done, Celery will complain that it doesn't understand the parameter.
Since I am running celery with Flask, I initialize celery like so.
def configure_celery():
    app.config.update(
        CELERY_BROKER_URL='amqp://:@localhost:5672',
        RESULT_BACKEND='db+mysql://root:@localhost:3306/<database_name>'
    )
    celery = Celery(app.import_name, backend=app.config['RESULT_BACKEND'],
                    broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)
    TaskBase = celery.Task
    class ContextTask(TaskBase):
        abstract = True
        def __call__(self, *args, **kwargs):
            with app.app_context():
                return TaskBase.__call__(self, *args, **kwargs)
    celery.Task = ContextTask
    return celery
I call this function to initialize celery and store it in a variable called celery.
celery = configure_celery()
To add the custom parameter you need to do the following.
def add_hadoop_queue_argument_to_worker(parser):
    parser.add_argument(
        '--hadoop-queue', help='Hadoop queue to be used by the worker'
    )
The celery used below is the one we obtained from above steps.
celery.user_options['worker'].add(add_hadoop_queue_argument_to_worker)
The next step is to make this argument accessible in the worker. To do that, add a custom bootstep:
from celery import bootsteps
class HadoopCustomWorkerStep(bootsteps.StartStopStep):
    def __init__(self, worker, **kwargs):
        # Store the parsed command-line value on the Celery app.
        worker.app.hadoop_queue = kwargs['hadoop_queue']
Inform celery to use this class for creating the workers.
celery.steps['worker'].add(HadoopCustomWorkerStep)
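The key `hadoop_queue` read in the bootstep follows from how the option is declared: the callback calls `add_argument` on a parser, and with an argparse-style parser the flag `--hadoop-queue` is stored under the name `hadoop_queue`. The sketch below shows this with a stand-in `argparse.ArgumentParser`; that Celery hands the callback exactly this kind of parser is an assumption based on the `add_argument` call above.

```python
import argparse


def add_hadoop_queue_argument_to_worker(parser):
    parser.add_argument(
        '--hadoop-queue', help='Hadoop queue to be used by the worker'
    )


if __name__ == "__main__":
    # Stand-in parser for illustration; Celery builds and passes its own.
    parser = argparse.ArgumentParser(prog="worker")
    add_hadoop_queue_argument_to_worker(parser)
    options = vars(parser.parse_args(['--hadoop-queue=A']))
    print(options)  # {'hadoop_queue': 'A'}
```

When the flag is omitted the value is `None`, so a bootstep may want to fall back to a default queue in that case.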
The tasks should now be able to access the value.
@celery.task(bind=True)
def print_hadoop_queue_from_config(self):
    print(self.app.hadoop_queue)
Verify it by running the workers on the command line.
celery -A app.celery worker --concurrency=1 --hadoop-queue=A -n aworker@%h
celery -A app.celery worker --concurrency=1 --hadoop-queue=B -n bworker@%h
celery -A app.celery worker --concurrency=1 --hadoop-queue=C -n cworker@%h
celery -A app.celery worker --concurrency=1 --hadoop-queue=default -n defaultworker@%h
What I usually do is this: after starting the workers (no tasks are executed yet), I use another script (say manage.py) with commands that take parameters to start specific tasks, or the same task with different arguments.
In manage.py:
import click
from tasks import some_task
@click.command()
@click.argument('params', nargs=-1)
def run_task(params):
    # params arrives as a tuple of the command-line arguments.
    some_task.apply_async(args=params)
And this will start the tasks as needed.
My Celery queue has hundreds of tasks with countdowns that will make them trigger over the next few hours. Is there a way to have these tasks run immediately such that the queue is effectively flushed?
I'm currently planning an upgrade to our server and I want to make sure that there are no background tasks running while the upgrade completes. If I have to wait for these countdowns, that's OK, but I'd rather force the tasks to run instead.
Another option could be to pause processing of the queue until the upgrade is complete, but flushing seems like a better option.
EDIT: I've figured out how to find a list of tasks that are scheduled:
from celery.task.control import inspect
i = inspect()
tasks = i.scheduled()
Now I just need to sort out how to force their execution.
OK, I'm fairly certain I've sorted out roughly how to do this. I'm making this answer a wiki and putting down my notes, in case anybody wants to tune up the general process here.
The general idea is this:
Stop adding new items to the queue.
Determine any tasks that are queued.
Revoke all those tasks using result.revoke().
Re-start those tasks using some saved state.
Note that this doesn't support adding an eta to the items once you re-queue them, as that's probably implementation-specific.
So, to figure out what tasks are queued, you do:
from celery.task.control import inspect
i = inspect()
scheduled_tasks = i.scheduled()
Which returns a dict, like so:
{u'w1.courtlistener.com': [{u'eta': 1414435210.198864,
u'priority': 6,
u'request': {u'acknowledged': False,
u'args': u'(2745724,)',
u'delivery_info': {u'exchange': u'celery',
u'priority': None,
u'routing_key': u'celery'},
u'hostname': u'w1.courtlistener.com',
u'id': u'99bc8650-3be1-4d24-81d6-a882d77a8b25',
u'kwargs': u'{}',
u'name': u'citations.tasks.update_document_by_id',
u'time_start': None,
u'worker_pid': None}}]}
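To make the shape of that reply easier to work with, here is a small sketch that flattens it into one tuple per scheduled task. The helper name `iter_scheduled` is hypothetical, and the fallback to an empty dict is a defensive assumption for the case where no worker replies.

```python
def iter_scheduled(scheduled):
    """Yield (worker, task_id, task_name, eta) for every scheduled entry."""
    # Defensive: treat a missing reply as "nothing scheduled".
    for worker, entries in (scheduled or {}).items():
        for entry in entries:
            request = entry['request']
            yield worker, request['id'], request['name'], entry['eta']


if __name__ == "__main__":
    sample = {
        'w1.courtlistener.com': [{
            'eta': 1414435210.198864,
            'priority': 6,
            'request': {
                'id': '99bc8650-3be1-4d24-81d6-a882d77a8b25',
                'name': 'citations.tasks.update_document_by_id',
                'args': '(2745724,)',
                'kwargs': '{}',
            },
        }],
    }
    for row in iter_scheduled(sample):
        print(row)
```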
The next step is to revoke all those tasks, with something like:
from celery.task.control import revoke
with open('revoked_tasks.csv', 'w') as f:
    for worker, tasks in scheduled_tasks.items():
        print("Now processing worker: %s" % worker)
        for task in tasks:
            request = task['request']
            print("Now revoking task: %s. %s with args: %s and kwargs: %s" %
                  (request['id'], request['name'], request['args'], request['kwargs']))
            # Save enough state to re-queue the task later.
            f.write('%s|%s|%s|%s|%s\n' % (worker, request['name'], request['id'], request['args'], request['kwargs']))
            revoke(request['id'], terminate=True)
Then, finally, re-run the tasks as you would normally, loading them from your CSV file:
import ast
with open('revoked_tasks.csv', 'r') as f:
    for line in f:
        worker, command, task_id, args, kwargs = line.rstrip('\n').split("|")
        # Import the task here, something like...
        package, module = command.rsplit('.', 1)
        mod = __import__(package, globals(), locals(), [module])
        # Run the task; args and kwargs were saved as their string repr.
        getattr(mod, module).delay(*ast.literal_eval(args), **ast.literal_eval(kwargs))
On my website, users can update their profile manually whenever they want, or it is updated automatically once a day.
This task is now distributed with Celery.
But I have a problem:
Every day, for the automatic update, a job puts ALL users (about 6k) on the queue:
from celery import group
from tasks import *
import datetime
from lastActivityDate.models import UserActivity
today = datetime.datetime.today()
one_day = datetime.timedelta(days=5)
today -= one_day
print(datetime.datetime.today())
user_list = UserActivity.objects.filter(last_activity_date__gte=today)
g = group(update_user_profile.s(i.user.auth.username) for i in user_list)
print(datetime.datetime.today())
print(g(user_list.count()).get())
If someone tries to do a manual update, their task goes to the end of the queue and takes forever to run.
Is there a way to give the manual task priority?
Or to use a separate dedicated queue for each: manual and automatic?
Celery does not support task priorities (as of v3.0).
http://docs.celeryproject.org/en/master/faq.html#does-celery-support-task-priorities
You may solve this problem by routing tasks.
http://docs.celeryproject.org/en/latest/userguide/routing.html
Define a default queue and a priority_high queue.
from kombu import Queue
CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
    Queue('default'),
    Queue('priority_high'),
)
Run two worker daemons.
user@x:/$ celery worker -Q priority_high
user@y:/$ celery worker -Q default,priority_high
Then route the task.
your_task.apply_async(args=['...'], queue='priority_high')
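Applied to the question's case, where the same task serves both the manual and the daily automatic update, the queue can be chosen at call time from how the update was triggered. This is only a sketch: the mapping `QUEUE_BY_TRIGGER` and the helper `apply_async_options` are hypothetical names, and the queue names are the ones defined above.

```python
# Hypothetical mapping from how an update was triggered to its queue.
QUEUE_BY_TRIGGER = {
    'manual': 'priority_high',
    'automatic': 'default',
}


def apply_async_options(username, trigger):
    """Build the keyword arguments for apply_async for one profile update."""
    return {'args': [username], 'queue': QUEUE_BY_TRIGGER[trigger]}


if __name__ == "__main__":
    print(apply_async_options('alice', 'manual'))
    print(apply_async_options('bob', 'automatic'))
```

A view handling a manual update would then call something like `update_user_profile.apply_async(**apply_async_options(username, 'manual'))`, while the daily job uses `'automatic'`.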
If you use the RabbitMQ transport, configure your queues as follows:
settings.py
from kombu import Queue
...
CELERY_TASK_QUEUES = (
    Queue('default', routing_key='task_default.#', max_priority=10),
    ...)
Then run your tasks:
my_low_prio_task.apply_async(args=(...), priority=1)
my_high_prio_task.apply_async(args=(...), priority=10)
At the time of writing, this code works with kombu==4.6.11 and celery==4.4.6.
Recently, I have been going through the Celery and kombu documentation, as I need them integrated into one of my projects. I have a basic understanding of how this should work, but the documentation examples using different brokers have me confused.
Here is the scenario:
Within my application I have two views, ViewA and ViewB. Both of them do some expensive processing, so I wanted them to use Celery tasks for processing. So this is what I did.
views.py
def ViewA(request):
    tasks.do_task_a.apply_async(args=[a, b])
def ViewB(request):
    tasks.do_task_b.apply_async(args=[a, b])
tasks.py
@task()
def do_task_a(a, b):
    # Do something expensive
    pass
@task()
def do_task_b(a, b):
    # Do something expensive here too
    pass
So far, everything works fine. The problem is that do_task_a creates a txt file on the system, which I need to use in do_task_b. Right now, in do_task_b I check whether the file exists and call the task's retry method if it does not.
I would rather take a different approach here (this is where messaging comes in). I want do_task_a to send a message to do_task_b once the file has been created, instead of looping on retry until the file exists.
I read through the Celery and kombu documentation and updated my settings as follows.
BROKER_URL = "django://"
CELERY_RESULT_BACKEND = "database"
CELERY_RESULT_DBURI = "sqlite:///celery"
TASK_RETRY_DELAY = 30 #Define Time in Seconds
DATABASE_ROUTERS = ['portal.db_routers.CeleryRouter']
CELERY_QUEUES = (
    Queue('filecreation', exchange=exchanges.genex, routing_key='file.create'),
)
CELERY_ROUTES = ('celeryconf.routers.CeleryTaskRouter',)
And I am stuck here; I don't know where to go next.
What should I do to make do_task_a broadcast a message to do_task_b once the file is created? And what should I do to make do_task_b receive (consume) the message and continue processing?
Any ideas and suggestions are welcome.
This is a good example for using Celery's callback/linking function.
Celery supports linking tasks together so that one task follows another.
You can read more about it here
apply_async() has two optional arguments:
+link: execute the linked function on success
+link_error: execute the linked function on error
from celery.result import AsyncResult
@task
def add(a, b):
    return a + b
@task
def total(numbers):
    return sum(numbers)
@task
def error_handler(uuid):
    result = AsyncResult(uuid)
    exc = result.get(propagate=False)
    print('Task %r raised exception: %r\n%r' % (uuid, exc, result.traceback))
Now in your calling function, do something like this:
def main():
    # For error handling
    add.apply_async((2, 2), link_error=error_handler.subtask())
    # For linking two tasks
    add.apply_async((2, 2), link=add.subtask((8, )))
    # output 12
    # What you can do in your case is something like this.
    if user_requires:
        add.apply_async((2, 2), link=add.subtask((8, )))
    else:
        add.apply_async((2, 2))
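To see why the linked call above produces 12, it may help to model what happens to the arguments: the parent's return value is placed in front of the arguments already stored in the linked subtask. The class below is a plain-Python toy model written for illustration only; `Signature` here is not Celery's class, and it runs everything synchronously in one process.

```python
class Signature:
    """Toy stand-in for a subtask: a function plus partial arguments."""

    def __init__(self, func, args=()):
        self.func = func
        self.args = tuple(args)

    def apply(self, args=(), link=None):
        result = self.func(*(tuple(args) + self.args))
        if link is not None:
            # The parent's return value is prepended to the callback's args.
            return link.apply((result,))
        return result


def add(a, b):
    return a + b


if __name__ == "__main__":
    # add(2, 2) gives 4, then the linked add receives (4, 8).
    print(Signature(add).apply((2, 2), link=Signature(add, (8,))))  # 12
```

For the file-creation case in the question, this suggests linking do_task_b to do_task_a so that it only starts after do_task_a has succeeded. If do_task_b should keep its own (a, b) arguments and ignore the parent's return value, an immutable signature built with `.si()` would presumably be the fit, for example `link=do_task_b.si(a, b)`.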
Hope this is helpful