How to purge all celery subtasks from parent task? - python-2.7

In my sharedtask I call several subtasks(...).apply_async(). Thus, both the parent task and the subtasks have their own task_id.
When I cancel the entire operation, I call the revoke of all active tasks and it works correctly. But as soon as cores are released, the queue moves on, executing the next subtasks.
How do I programmatically clear the queue, thereby preventing the next subtasks from being executed?

You could try the following. Start each subtask and keep its id (apply_async() returns an AsyncResult whose id attribute is the task id):
task_id = subtasks(...).apply_async().id
and then revoke it with:
app.control.revoke(task_id, terminate=True)
I'm not sure whether saving the task id this way works with subtasks, but it works well with normal tasks, so it is worth trying.
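A minimal sketch of that idea, assuming the id of every subtask is collected when it is launched and the whole list is later passed to revoke(). Revoking an id that is still waiting in the queue should make a worker skip that task when it receives it, which is what keeps the next subtasks from starting. The helper names launch_subtasks and cancel_subtasks are hypothetical, and the _Stub classes only stand in for Celery signatures and app.control so the example runs without a broker:

```python
import itertools


def launch_subtasks(signatures):
    """Start every subtask and return the list of their task ids."""
    # apply_async() returns a result object; its .id is the task id.
    return [signature.apply_async().id for signature in signatures]


def cancel_subtasks(control, task_ids):
    """Revoke running and still-queued subtasks in a single call."""
    # revoke() accepts one id or a list of ids.
    control.revoke(list(task_ids), terminate=True)


# --- Stand-ins (assumptions) so the sketch runs without Celery ---------
_counter = itertools.count(1)


class _StubResult(object):
    def __init__(self):
        self.id = "task-%d" % next(_counter)


class _StubSignature(object):
    def apply_async(self):
        return _StubResult()


class _StubControl(object):
    def __init__(self):
        self.revoked = []

    def revoke(self, task_ids, terminate=False):
        # Record what would have been sent to the workers.
        self.revoked.extend(task_ids)


ids = launch_subtasks([_StubSignature() for _ in range(3)])
control = _StubControl()
cancel_subtasks(control, ids)
```

With a real application the same helpers would be called with the actual signatures and with app.control in place of the stub.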

Related

Reusing a database record created by means of Celery task

There is a task which creates a database record (R) when it runs for the first time. When the task is started a second time, it should read the database record, perform some calculations and call an external API. The first and second starts happen in a loop.
With a single start of the task there are no problems, but with loops (at each iteration a new task is created and starts at a certain time) there is a problem: in the task queue (we monitor it with Flower) a task crashes on every second iteration.
If we add time.sleep(1) at the end of the loop, the tasks sometimes work properly and sometimes do not. How can we avoid this problem? We are afraid that tasks for different combinations of two users that start at the same time will also crash.
Is there some problem with running tasks in Celery simultaneously, or something else we should consider? The tasks are for scheduled payments, so they have to be rock solid.

(Django) RQ scheduler - Jobs disappearing from queue

Since my project has so many moving parts, it is probably best to explain the symptom.
I have 1 scheduler running on 1 queue. I add scheduled jobs (to be executed within seconds of the scheduling).
I keep scheduling jobs with NO rq worker doing anything (in fact, the process is completely off). In other words, the queue should just be piling up.
But all of a sudden the queue gets chopped off (randomly) and the first 70-80% of jobs just disappear.
Does this have anything to do with:
- the "max length" of the queue? (but I don't recall seeing any limits)
- does the scheduler automatically "discard" jobs where the start time is BEFORE the current time?
I ran my own experiment. RQ scheduler does indeed remove jobs whose start date < now.

CompletableFuture.runAsync at fixed rate

The CompletableFuture is very powerful when it comes to joining futures. Among other advantages (execute something when the task finishes, execute something on an exception, etc) it has the option to run tasks in the background using runAsync.
What it lacks though is the possibility to have a task run periodically, similar to ScheduledExecutorService.scheduleAtFixedRate.
Does anyone know how to have a task running periodically using a CompletableFuture? I tried using an endless loop in the task itself, however one loses the option to cancel a task using the future's cancel method.

Standard way to wait for all tasks to finish before exiting

I was wondering - is there a straightforward way to wait for all tasks to finish running before exiting without keeping track of all the ObjectIDs (and get()ing them)? Use case is when I launch #remotes for saving output, for example, where there is no return result needed. It's just extra stuff to keep track of if I have to store those futures.
Currently there is no standard way to block until all tasks have finished.
There are some workarounds that can be used.
Keep track of all of the object IDs in a list object_ids and then call ray.get(object_ids) or ray.wait(object_ids, num_returns=len(object_ids)).
Loop as long as some resources are being used.
import time
import ray
# Poll until every resource in the cluster is available again.
while (ray.global_state.cluster_resources() !=
        ray.global_state.available_resources()):
    time.sleep(1)
The above code will loop until it detects that no tasks are currently being executed. However, this is not a foolproof approach: there could be a moment when no tasks are running but the scheduler is about to start one.
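The first workaround can be wrapped in a small helper so the bookkeeping stays in one place. This is only a sketch: RemoteLauncher is a hypothetical name, not part of Ray, and _FakeRemote and _fake_get are stand-ins for a @ray.remote function and ray.get so the example runs without a cluster:

```python
class RemoteLauncher(object):
    """Remembers the object IDs of every task it launches."""

    def __init__(self):
        self.object_ids = []

    def launch(self, remote_function, *args):
        # With Ray, remote_function.remote(...) returns an object ID.
        object_id = remote_function.remote(*args)
        self.object_ids.append(object_id)
        return object_id

    def wait_all(self, get):
        # Pass ray.get here; it blocks until every listed task is done.
        pending, self.object_ids = self.object_ids, []
        return get(pending)


# --- Stand-ins (assumptions) so the sketch runs without Ray ------------
class _FakeRemote(object):
    def __init__(self, func):
        self.func = func

    def remote(self, *args):
        # Runs eagerly and uses the return value as the "object ID".
        return self.func(*args)


def _fake_get(object_ids):
    return list(object_ids)


save_output = _FakeRemote(lambda name: "saved " + name)
launcher = RemoteLauncher()
launcher.launch(save_output, "a.txt")
launcher.launch(save_output, "b.txt")
results = launcher.wait_all(_fake_get)
```

With Ray itself, the last line would become launcher.wait_all(ray.get), called just before the script exits.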

Django-celery - How to execute tasks serially?

I'm using Celery 3.1. I need to execute the next task only when the last one has finished. How can I ensure that two tasks are never running at the same time? I've read the documentation, but it is not clear to me.
I've the following scheme:
Task Main
- Subtask 1
- Subtask 2
I need "Task Main", once called, to run to the end (Subtask 2) without any new "Task Main" starting.
How can I ensure this?
One strategy is through the use of locks. The Celery Task Cookbook has an example at http://docs.celeryproject.org/en/latest/tutorials/task-cookbook.html.
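The lock strategy can be sketched as follows. The Cookbook example uses a lock shared through the cache so that it works across workers; the sketch below swaps in a threading.Lock, which only protects tasks inside a single process, purely to show the acquire / skip / release pattern. task_main and single_instance are hypothetical names:

```python
import threading
from contextlib import contextmanager

# Stand-in for a lock shared between workers (assumption: one process).
_main_task_lock = threading.Lock()


@contextmanager
def single_instance(lock):
    """Try to take the lock without blocking; yield whether it worked."""
    acquired = lock.acquire(False)
    try:
        yield acquired
    finally:
        # Only release a lock that this call actually acquired.
        if acquired:
            lock.release()


def task_main(log):
    """Run both subtasks unless another Task Main holds the lock."""
    with single_instance(_main_task_lock) as acquired:
        if not acquired:
            log.append("skipped")
            return False
        log.append("subtask 1")
        log.append("subtask 2")
        return True


log = []
first = task_main(log)        # lock is free, so the task runs
_main_task_lock.acquire()     # simulate another Task Main in progress
second = task_main(log)       # lock is held, so this call is skipped
_main_task_lock.release()
```

Instead of skipping, a real task could retry later; which of the two is appropriate depends on whether a dropped run is acceptable.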
If I understand correctly, you want MainTask instances to execute one at a time, and you want to call subtasks from within MainTask. This is impossible without separate queues and at least 2 separate workers, because if everything is stored in the same queue, Celery treats all of the tasks alike.
So the solution is:
map MainTask to main_queue
Start a separate worker for this queue, like:
celeryd --concurrency=1 --queues=main_queue
map subtasks to sub_queue
Start a separate worker for this queue:
celeryd --queues=sub_queue
Should work!
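The queue mapping in the steps above could be written in the Celery settings roughly like this. CELERY_ROUTES is the routing setting in Celery 3.1; the task names proj.tasks.main_task, proj.tasks.subtask_1 and proj.tasks.subtask_2 are hypothetical and would be replaced by the real task names:

```python
# settings.py (or celeryconfig.py): send each task to its own queue.
CELERY_ROUTES = {
    # Consumed by the worker started with --concurrency=1.
    "proj.tasks.main_task": {"queue": "main_queue"},
    # Consumed by the second worker.
    "proj.tasks.subtask_1": {"queue": "sub_queue"},
    "proj.tasks.subtask_2": {"queue": "sub_queue"},
}
```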
But I think this is a complicated architecture; maybe you can make it much simpler by redesigning your process.
You may also find this useful (it would work for you, but MainTask instances could still run in parallel):
You should try to use chains; here is an example in Celery's docs: http://docs.celeryproject.org/en/latest/userguide/tasks.html#avoid-launching-synchronous-subtasks.