How to enqueue a periodic task if it gets terminated in celery? - django

Let's say there is a periodic task scheduled to run every hour. A worker receives the tasks and starts processing. While the task is being processed, the celeryd process (controlled via supervisord) gets restarted (supervisorctl restart all). Even though the task had never finished execution, it won't get re-executed.
How can I re-queue the periodic task right away and prevent multiple instances of the task from running at the same time?

There may be a nicer way to do it, but you could just use the periodic task to create a regular task in the queue (e.g., my_actual_task.delay(…)), which will not be removed from the queue until it is completed (assuming you are using acks_late).
If you're not using acks_late, you can put the bulk of the task in a try, and in the corresponding finally put a my_actual_task.retry().
Either way, you should generally avoid killing workers without giving them a chance to finish up what they're doing.
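To make the acks_late point concrete, here is a toy in-memory model of early versus late acknowledgement. It is not Celery's implementation: ToyQueue, WorkerKilled, run_once and simulate are hypothetical names invented for this sketch, and requeue_unacked only approximates what a broker does when a consumer disappears. In Celery itself the behaviour is selected with the acks_late task option (or the task_acks_late setting).

```python
from collections import deque


class WorkerKilled(Exception):
    """Stands in for the worker process being restarted mid-task."""


class ToyQueue:
    """In-memory stand-in for a broker queue with explicit acknowledgement."""

    def __init__(self):
        self._ready = deque()
        self._unacked = {}
        self._next_tag = 0

    def publish(self, message):
        self._ready.append(message)

    def reserve(self):
        """Hand a message to a worker; it stays tracked until acked."""
        message = self._ready.popleft()
        self._next_tag += 1
        self._unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        del self._unacked[tag]

    def requeue_unacked(self):
        """Roughly what a broker does when the consumer goes away."""
        for tag in sorted(self._unacked):
            self._ready.append(self._unacked.pop(tag))

    def __len__(self):
        return len(self._ready)


def run_once(queue, handler, acks_late):
    tag, message = queue.reserve()
    if not acks_late:
        queue.ack(tag)      # early ack: the broker forgets the message now
    handler(message)        # may raise WorkerKilled
    if acks_late:
        queue.ack(tag)      # late ack: only after the handler finished


def simulate(acks_late):
    """Kill the worker mid-task and report how many messages remain queued."""
    queue = ToyQueue()
    queue.publish("hourly-report")

    def dies(message):
        raise WorkerKilled()

    try:
        run_once(queue, dies, acks_late)
    except WorkerKilled:
        queue.requeue_unacked()
    return len(queue)


# simulate(acks_late=True)  -> 1  (the message is back on the queue)
# simulate(acks_late=False) -> 0  (the message is lost)
```

With late acknowledgement the interrupted task is redelivered, so the task body should be safe to run more than once.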

Related

Deleting a Google Cloud Task does not stop task running

I have a task queue which users can push tasks onto, only one task can run at a time enforced by the concurrency setting for the queue. In some cases (e.g. long running task) they may wish to cancel a running task in order to free up the queue to process the next task.
To achieve this I have been running the task queue as a Flask application. Should a user wish to cancel a task, I call the delete_task method of the Python client library for the given queue and task.
However, I am seeing that the underlying task continues to be processed even after the task has been deleted. Have been trying to find documentation of how Cloud Tasks handles a task being deleted, but haven't found anything concrete.
I was hoping that I'd be able to listen for a signal of some sort in order to gracefully shut down the process if a deletion is received, or that the underlying process would be killed if the parent task is deleted.
Has anyone worked with the Cloud Tasks API before? Is it correct to assume that a deleted task will clean up any processes that are running?
I don't see how a worker would be able to find out that the task it is working on has been deleted.
In the eyes of the worker, a task is an incoming HTTP request. I don't know how the Queue could tell that specific process to stop. I'm fairly certain that "deleting" a task just removes it from the Queue.
You'd have to build a custom 'cancel' function that would be able to reach out to this worker.
Or this worker would have to periodically check with the Queue to see if its task still exists.
https://cloud.google.com/tasks/docs/reference/rest/v2/projects.locations.queues.tasks/get
https://googleapis.dev/python/cloudtasks/latest/gapic/v2/api.html#google.cloud.tasks_v2.CloudTasksClient.get_task
I'm not actually sure what the Queue will return if you call 'get task' on a deleted task, since I don't see a 'status' property for a task. Maybe it will return an error like 'task does not exist'.
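The "periodically check whether the task still exists" idea could look roughly like the sketch below. run_cancellable and task_still_exists are hypothetical names; in a real handler task_still_exists might wrap a call to get_task and treat a not-found error as "deleted", but since the behaviour for deleted tasks is unconfirmed above, that part is left as an assumption and injected as a plain callable.

```python
from typing import Callable, Iterable


def run_cancellable(
    chunks: Iterable[Callable[[], None]],
    task_still_exists: Callable[[], bool],
    check_every: int = 1,
) -> bool:
    """Run work in small chunks, stopping early once the task is gone.

    Returns True if every chunk ran, False if the run was cancelled.
    """
    for index, chunk in enumerate(chunks):
        # Only ask the queue every `check_every` chunks to limit API calls.
        if index % check_every == 0 and not task_still_exists():
            return False
        chunk()
    return True
```

This only works if the long-running job can be split into steps short enough that checking between them gives an acceptable cancellation delay.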

AWS SWF Simple Workflow - Best Way to Keep Activity Worker Scripts Running?

The maximum amount of time the pollForActivityTask method stays open polling for requests is 60 seconds. I am currently scheduling a cron job every minute to call my activity worker file so that my activity worker machine is constantly polling for jobs.
Is this the correct way to have continuous queue coverage?
The way the Java Flow SDK does it is that you create an ActivityWorker and give it a tasklist, domain, activity implementations, and a few other settings. You set both setPollThreadCount and setTaskExecutorSize. The polling threads long poll and then hand work over to the executor threads to avoid blocking further polling. You call start on the ActivityWorker to boot it up, and when you want to shut down the workers you can call one of the shutdown methods (usually best to call shutdownAndAwaitTermination).
Essentially your workers are long lived and need to deal with a few factors:
New versions of Activities
Various tasklists
Scaling independently on tasklist, activity implementations, workflow workers, host sizes, etc.
Handle error cases and deal with polling
Handle shutdowns (in case of deployments and new versions)
I ended up using a solution where another script file is called by a cron job every minute. This file checks whether an activity worker is already running in the background (if so, I assume a workflow execution is already being processed on the current server).
If no activity worker is there, then the previous long poll has completed and we launch the activity worker script again. If there is an activity worker already present, then the previous poll found a workflow execution and started processing so we refrain from launching another activity worker.
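One way to implement the "is a worker already running?" check from cron is a lock file rather than inspecting the process list. This is a different technique from the one described above, shown as a minimal Unix-only sketch; try_acquire_worker_lock, launch_if_idle and start_worker are hypothetical names.

```python
import fcntl


def try_acquire_worker_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if already held."""
    handle = open(lock_path, "a+")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def launch_if_idle(lock_path, start_worker):
    """Run start_worker() unless another holder already has the lock."""
    lock = try_acquire_worker_lock(lock_path)
    if lock is None:
        return False          # a previous poll is still processing
    try:
        start_worker()        # long poll, then process whatever arrives
    finally:
        lock.close()          # closing the file releases the lock
    return True
```

The operating system drops the lock if the worker crashes, so a stale lock cannot block the next cron run the way a leftover PID file can.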

Workflow of celery

I am a beginner with django, I have celery installed.
I am confused about the working of the celery, if the queued works are handled synchronously or asynchronously. Can other works be queued when the queued work is already being processed?
Celery is a task queuing system backed by a message queuing system. It allows you to invoke tasks asynchronously, so your process is not blocked while the task runs; if you need the result, you can wait for the task to finish using AsyncResult.get.
Other tasks can be queued while a task is being processed, and if Celery is running more than one process/thread (which is the default case), tasks will be executed in parallel with each other.
It is your responsibility to make sure that related tasks are executed in the correct order, e.g. if the output of a task A is an input to the other task B then you should make sure that you get the result from task A before you start the task B.
Read "Avoid launching synchronous subtasks" in the Celery documentation.
I think you're possibly a bit confused about what Celery does.
Celery isn't really responsible for queueing at all. That is taken care of by the queue itself: RabbitMQ, Redis, or whatever. The only way Celery gets involved at this end is as a library that you call inside your app to serialize the task into something suitable for putting onto the queue. Since that is done by your web app, it is exactly as synchronous or asynchronous as your app itself: usually, in production, you'd have multiple processes running your site, each of which could put things onto the queue simultaneously, but each queueing action is done in-process.
The main point of Celery is the separate worker processes. This is where the asynchronous bit comes from: the workers run completely separately from your web app, and pick tasks off the queue as necessary. They are not at all involved in the process of putting tasks onto the queue in the first place.
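The split between "the app puts a message on the queue" and "separate workers pick it up" can be sketched with the standard library alone. This is only an analogy, not Celery: threads stand in for worker processes, queue.Queue stands in for the broker, and start_workers and enqueue are hypothetical names.

```python
import queue
import threading


def start_workers(task_queue, results, worker_count=2):
    """Start worker threads that pull (func, args) pairs off the queue."""

    def worker():
        while True:
            item = task_queue.get()
            if item is None:              # sentinel: shut this worker down
                task_queue.task_done()
                break
            func, args = item
            try:
                results.append(func(*args))
            finally:
                task_queue.task_done()

    threads = [
        threading.Thread(target=worker, daemon=True) for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    return threads


def enqueue(task_queue, func, *args):
    """Called by the 'web app': returns as soon as the message is queued."""
    task_queue.put((func, args))
```

The caller of enqueue never waits for the work itself; more tasks can be queued while earlier ones are still running, and completion order is not guaranteed.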

What's common practice for enabling a locking mechanism for multiple SQS consumers in Django so I can be idempotent

SQS expects your application to be idempotent and I've got multiple consumers/producers where (even if SQS had a deliver-once mechanism) I will have race conditions creating duplicates and race conditions consuming because my consumers run via cron jobs.
My current plan is to use the Django 1.4 select_for_update which should block other consumers on the same row, doing something like:
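from datetime import datetime
from django.db import transaction
# Row lock is held until the transaction ends (atomic() needs Django 1.6+).
with transaction.atomic():
    reminder = EmailReminder.objects.select_for_update().get(id=some_id)
    if not reminder.finished:
        reminder.send()
        reminder.finished = datetime.now()
        reminder.save()
# Delete job.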
Are there better ways of dealing with this?
Hook up django-celery to SQS and have it designate a periodic job using celerybeat. Then have celeryd worker(s) running on the same queue anywhere you want. Only one will pick up a job at a time and execute it. No need to introduce DB locking on any level.
As long as your worker is guaranteed to finish its current task before celerybeat fires a new one you will never have a need for a lock. Now if you think there is a chance they may overlap you can introduce states for your notifications where:
Any reminder starts in "unsent" state.
Your celerybeat sends a request to process unsent emails to the queue.
Some worker picks it up and grabs all of them.
Immediately the worker transitions all of them to "sending" state.
Proceeds to send them one at a time (or in bulk).
If sending fails for any, revert their state back to unsent.
For all that succeeded transition to sent.
This way if celerybeat fires another job while your original job is not done with the initial batch, you won't have duplicate emails sent. As an added bonus you can scale the solution and distribute the load.
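The unsent / sending / sent transitions above can be sketched with sqlite3 from the standard library. The reminders table with its state and batch columns, and the claim_unsent and process_batch helpers, are assumptions made for illustration; a real Django app would express the same updates through its models.

```python
import sqlite3


def claim_unsent(conn, batch_id):
    """Atomically move every 'unsent' reminder to 'sending' for this batch."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE reminders SET state = 'sending', batch = ? "
            "WHERE state = 'unsent'",
            (batch_id,),
        )
    rows = conn.execute(
        "SELECT id, email FROM reminders WHERE state = 'sending' AND batch = ?",
        (batch_id,),
    )
    return rows.fetchall()


def process_batch(conn, batch_id, send):
    """Send each claimed reminder; mark it 'sent' or revert it to 'unsent'."""
    for reminder_id, email in claim_unsent(conn, batch_id):
        try:
            send(email)
            new_state = "sent"
        except Exception:
            new_state = "unsent"   # failed sends become eligible again
        with conn:
            conn.execute(
                "UPDATE reminders SET state = ?, batch = NULL WHERE id = ?",
                (new_state, reminder_id),
            )
```

Because the claim is a single UPDATE, a second job that fires while the first is still sending finds no 'unsent' rows and has nothing to duplicate.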

Heroku Scheduler - why enqueue long-running jobs

The Heroku Scheduler documentation says:
Scheduled jobs are meant to execute short running tasks or enqueue longer running tasks into a background job queue. Anything that takes longer than a couple of minutes to complete should use a worker process to run
If the Scheduler starts a new dyno for these jobs and the cost is the same for a dyno vs. a worker, what is the advantage to adding a task to the queue and having a worker process run it?
It is an architectural best practice to only schedule, and not execute, interval tasks on the scheduler task (or your own custom clock process). The motivation for this is explained in the scheduled jobs article but, to summarize, you want your scheduler process/task to be as light-weight as possible since there should only be one of them. When you start overloading scheduling with execution you often run into schedule conflicts and erratic behavior.
Imagine that one interval job hangs, or takes much longer than expected. If your intervals are tight enough this will start causing a backlog, and future intervals could be pushed back or skipped altogether.
Also, it is just wise to keep component responsibilities as separated as possible - not having a single component be responsible for orthogonal tasks. This is a common design practice which is reflected in the scheduled job use-case by keeping scheduling and execution independent.
Best practices aside, if you're in development or bootstrap mode and understand the consequences stated above, you can certainly choose to ignore such advice and run everything within the scheduler task. Just watch out for hard-to-debug job conflicts or apparent duplication.
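The skipped-interval scenario described above can be illustrated with a small simulation. It is a simplified model, not how Heroku Scheduler is implemented: ticks_fired is a hypothetical helper, times are in minutes, and the numbers in the comments are made up for the example.

```python
def ticks_fired(interval, job_duration, horizon, run_inline):
    """Return the scheduled times at which the scheduler actually fired.

    With run_inline=True the scheduler itself executes the job, is busy for
    job_duration after each tick, and skips ticks that fall in that window.
    With run_inline=False it only enqueues, so it is never busy.
    """
    fired = []
    busy_until = 0
    for tick in range(0, horizon, interval):
        if run_inline and tick < busy_until:
            continue                      # scheduler was still executing
        fired.append(tick)
        if run_inline:
            busy_until = tick + job_duration
    return fired


# A 10-minute interval with a 25-minute job over one hour:
# ticks_fired(10, 25, 60, run_inline=True)  -> [0, 30]
# ticks_fired(10, 25, 60, run_inline=False) -> [0, 10, 20, 30, 40, 50]
```

When the scheduler only enqueues, every interval fires on time and the slow job becomes the worker's problem rather than the schedule's.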
Well, I think this is just a recommendation. If you have a task which is run by the Scheduler and you run this task manually (in the Heroku administration), you'll get an error caused by a timeout (because each task has a 30s limit). But in fact the task will not be interrupted; it will finish correctly.
If you have one dyno, Heroku uses that dyno for your application. If you run a scheduled job, that dyno will be taken by the Scheduler, so if you have a long-running task your page will be "idle" (not working correctly until the scheduled job has finished).