How to run all celery tasks without workers, I mean call directly?
I can call task with TaskName.run(), but I want to write this in configurations, so how to make it?
Just set the CELERY_ALWAYS_EAGER settings to true, this will force celery not to queue the tasks and run them synchronously in the current process.
If you want to be able to do it per specific task, you can run them with apply() or run() as you mentioned, instead of running them with apply_async() or delay().
So tl;dr:
CELERY_ALWAYS_EAGER = True
# The following two would do and act the same, processing synchronously
my_task.run()
my_task.delay()
But
CELERY_ALWAYS_EAGER = False
# These two won't be the same anymore.
my_task.run() # Runs synchronously
my_task.delay() # Passed to the queue and runs Asynchronously, in another process
If I understand you right, you want to call the task synchronously.
Just call the method as normal:
TaskName()
You only need to use delay when you want to send it to the worker.
In complement to SpiXel 's answer, from this answer, CELERY_ALWAYS_EAGER has been renamed to CELERY_TASK_ALWAYS_EAGER in versions 4.0+. Worked for me with Django 1.11+Celery 4.1.0. So...
CELERY_TASK_ALWAYS_EAGER = False #assync
CELERY_TASK_ALWAYS_EAGER = True #serial
Related
like this I added in my celery.py
#app.task(bind=True)
def execute_analysis(id_=1):
task1 = group(news_event_task.si(i) for i in range(10))
task2 = group(parallel_task.si(i) for i in range(10))
return chain(task1, task2)()
Problem: You are calling too many functions(tasks) in same process sequentially so if any task (scrapping news data) gets blocked all other will keep waiting and might go in block state.
Solution : A better design would be to run news_event_task in delay and and with each news_event_task if you want to call parallel_task then both can be done in same process. So now all tasks will run in parallel ( Use celery eventlet to achieve this).
Another approach could be send these tasks in queue (rather than keeping its sequence in memory) and then process each news_event_task one by one.
My django application currently takes in a file, and reads in the file line by line, for each line there's a celery task that delegates processing said line.
Here's kinda what it look slike
File -> For each line in file -> celery_task.delay(line)
Now then, I also have other celery tasks that can be triggered by the user for example:
User input line -> celery_task.delay(line)
This of course isn't strictly the same task, the user can in essence invoke any celery task depending on what they do (signals also invoke some tasks as well)
Now the problem that I'm facing is, when a user uploads a relatively large file, my redis queue gets boggled up with processing the file, when the user does anything, their task will be delegated and executed only after the file's celery_task.delay() tasks are done executing. My question is, is it possible to reserve a set amount of workers or delay a celery task with a "higher" priority and overwrite the queue?
Here's in general what the code looks like:
#app.task(name='process_line')
def process_line(line):
some_stuff_with_line(line)
do_heavy_logic_stuff_with_line(line)
more_stuff_here(line)
obj = Data.objects.make_data_from_line(line)
serialize_object.delay(obj.id)
return obj.id
#app.task(name='serialize_object')
def serialize_object(important_id):
obj = Data.objects.get(id=important_id)
obj.pre_serialized_json = DataSerializer(obj).data
obj.save()
#app.task(name='process_file')
def process_file(file_id):
ingested_file = IngestedFile.objects.get(id=file_id)
for line in ingested_file.get_all_lines():
process_line.delay(line)
Yes you can create multiple queues, and then you can decide to route your tasks to those queues running on multiple workers or single worker. By default all tasks go to default queue named as celery. Check Celery documentation on Routing Tasks to get more information and some examples.
I'm writing a healthcheck endpoint for my web service.
The end point calls a series of functions which return True if the component is working correctly:
The system is considered to be working if all the components are working:
def is_health():
healthy = all(r for r in (database(), cache(), worker(), storage()))
return healthy
When things aren't working, the functions may take a long time to return. For example if the database is bogged down with slow queries, database() could take more than 30 seconds to return.
The healthcheck endpoint runs in the context of a Django view, running inside a uWSGI container. If the request / response cycle takes longer than 30 seconds, the request is harakiri-ed!
This is a huge bummer, because I lose all contextual information that I could have logged about which component took a long time.
What I'd really like, is for the component functions to run within a timeout or a deadline:
with timeout(seconds=30):
database_result = database()
cache_result = cache()
worker_result = worker()
storage_result = storage()
In my imagination, as the deadline / harakiri timeout approaches, I can abort the remaining health checks and just report the work I've completely.
What's the right way to handle this sort of thing?
I've looked at threading.Thread and Queue.Queue - the idea being that I create a work and result queue, and then use a thread to consume the work queue while placing the results in result queue. Then I could use the thread's Thread.join function to stop processing the rest of the components.
The one challenge there is that I'm not sure how to hard exit the thread - I wouldn't want it hanging around forever if it didn't complete it's run.
Here is the code I've got so far. Am I on the right track?
import Queue
import threading
import time
class WorkThread(threading.Thread):
def __init__(self, work_queue, result_queue):
super(WorkThread, self).__init__()
self.work_queue = work_queue
self.result_queue = result_queue
self._timeout = threading.Event()
def timeout(self):
self._timeout.set()
def timed_out(self):
return self._timeout.is_set()
def run(self):
while not self.timed_out():
try:
work_fn, work_arg = self.work_queue.get()
retval = work_fn(work_arg)
self.result_queue.put(retval)
except (Queue.Empty):
break
def work(retval, timeout=1):
time.sleep(timeout)
return retval
def main():
# Two work items that will take at least two seconds to complete.
work_queue = Queue.Queue()
work_queue.put_nowait([work, 1])
work_queue.put_nowait([work, 2])
result_queue = Queue.Queue()
# Run the `WorkThread`. It should complete one item from the work queue
# before it times out.
t = WorkThread(work_queue=work_queue, result_queue=result_queue)
t.start()
t.join(timeout=1.1)
t.timeout()
results = []
while True:
try:
result = result_queue.get_nowait()
results.append(result)
except (Queue.Empty):
break
print results
if __name__ == "__main__":
main()
Update
It seems like in Python you've got a few options for timeouts of this nature:
Use SIGALARMS which work great if you have full control of the signals used by the process but probably are a mistake when you're running in a container like uWSGI.
Threads, which give you limited timeout control. Depending on your container environment (like uWSGI) you might need to set options to enable them.
Subprocesses, which give you full timeout control, but you need to be conscious of how they might change how your service consumes resources.
Use existing network timeouts. For example, if part of your healthcheck is to use Celery workers, you could rely on AsyncResult's timeout parameter to bound execution.
Do nothing! Log at regular intervals. Analyze later.
I'm exploring the benefits of these different options more.
Update #2
I put together a GitHub repo with quite a bit more information on the topic:
https://github.com/johnboxall/pytimeout
I'll type it up into a answer one day but the TLDR is here:
https://github.com/johnboxall/pytimeout#recommendations
I have a long-running process that must run every five minutes, but more than one instance of the processes should never run at the same time. The process should not normally run past five min, but I want to be sure that a second instance does not start up if it runs over.
Per a previous recommendation, I'm using Django Celery to schedule this long-running task.
I don't think a periodic task will work, because if I have a five minute period, I don't want a second task to execute if another instance of the task is running already.
My current experiment is as follows: at 8:55, an instance of the task starts to run. When the task is finishing up, it will trigger another instance of itself to run at the next five min mark. So if the first task finished at 8:57, the second task would run at 9:00. If the first task happens to run long and finish at 9:01, it would schedule the next instance to run at 9:05.
I've been struggling with a variety of cryptic errors when doing anything more than the simple example below and I haven't found any other examples of people scheduling tasks from a previous instance of itself. I'm wondering if there is maybe a better approach to doing what I am trying to do. I know there's a way to name one's tasks; perhaps there's a way to search for running or scheduled instances with the same name? Does anyone have any advice to offer regarding running a task every five min, but ensuring that only one task runs at a time?
Thank you,
Joe
In mymodule/tasks.py:
import datetime
from celery.decorators import task
#task
def test(run_periodically, frequency):
run_long_process()
now = datetime.datetime.now()
# Run this task every x minutes, where x is an integer specified by frequency
eta = (
now - datetime.timedelta(
minutes = now.minute % frequency , seconds = now.second,
microseconds = now.microsecond ) ) + datetime.timedelta(minutes=frequency)
task = test.apply_async(args=[run_periodically, frequency,], eta=eta)
From a ./manage.py shell:
from mymodule import tasks
result = tasks.test.apply_async(args=[True, 5])
You can use periodic tasks paired with a special lock which ensures the tasks are executed one at a time. Here is a sample implementation from Celery documentation:
http://ask.github.com/celery/cookbook/tasks.html#ensuring-a-task-is-only-executed-one-at-a-time
Your described method with scheduling task from the previous execution can stop the execution of tasks if there will be failure in one of them.
I personally solve this issue by caching a flag by a key like task.name + args
def perevent_run_duplicate(func):
"""
this decorator set a flag to cache for a task with specifig args
and wait to completion, if during this task received another call
with same cache key will ignore to avoid of conflicts.
and then after task finished will delete the key from cache
- cache keys with a day of timeout
"""
#wraps(func)
def outer(self, *args, **kwargs):
if cache.get(f"running_task_{self.name}_{args}", False):
return
else:
cache.set(f"running_task_{self.name}_{args}", True, 24 * 60 * 60)
try:
func(self, *args, **kwargs)
finally:
cache.delete(f"running_task_{self.name}_{args}")
return outer
this decorator will manage task calls to prevent duplicate calls for a task by same args.
We use Celery Once and it has resolved similar issues for us. Github link - https://github.com/cameronmaske/celery-once
It has very intuitive interface and easier to incorporate that the one recommended in celery documentation.
I am trying to write a test that involves running a django-tasks task. The problem is I can't seem to get the tasks to go beyond the "scheduled" status.
I have set
DJANGOTASK_DEMON_THREAD = True
in my settings, for simplicity.
ptask = djangotasks.task_for_function(f)
djangotasks.run_task(ptask)
while ptask.status!='successful':
ptask = djangotasks.task_for_function(f)
print ptask.status
time.sleep(5)
This is what I'm attempting, which works well outside of tests.
edit: fixed typo
I think you didn't assign a task worker. In your django directory :
> python manage.py taskd run
your scheduled tasks would be executed by this "taskd".