How do confirm_publish and acks_late compare in Celery?

I've noticed the following in the docs:
Ideally task functions should be idempotent: meaning the function won’t cause unintended effects even if called multiple times with the same arguments. Since the worker cannot detect if your tasks are idempotent, the default behavior is to acknowledge the message in advance, just before it’s executed, so that a task invocation that already started is never executed again.
If your task is idempotent you can set the acks_late option to have the worker acknowledge the message after the task returns instead. See also the FAQ entry Should I use retry or acks_late?.
If set to True messages for this task will be acknowledged after the task has been executed, not just before (the default behavior).
Note: This means the task may be executed multiple times should the worker crash in the middle of execution. Make sure your tasks are idempotent.
Then there is the BROKER_TRANSPORT_OPTIONS = {'confirm_publish': True} option found here. I could not find official documentation for that.
I want to be certain that tasks which are submitted to celery (1) arrive at celery and (2) eventually get executed.
Here is how I think it works:
Celery stores the information about which tasks should be executed in a broker (typically RabbitMQ or Redis).
The application (e.g. Django) submits a task to Celery which immediately stores it in the broker. confirm_publish confirms that it was added (right?). If confirm_publish is set but the confirmation is missing, it retries (right?).
Celery takes messages from the broker. Now Celery behaves as a consumer for the broker. The consumer acknowledges (confirms) that it received a message, and the broker stores this information. If the consumer didn't send an acknowledgement, the broker will retry sending it.
Is that correct?
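For reference, this is roughly how I plan to configure both options (a minimal sketch; the app name and broker URL are placeholders, and the lowercase setting names assume Celery 4+ style configuration):
# celery_app.py -- sketch only
from celery import Celery

app = Celery('myapp', broker='amqp://guest@localhost//')

# Ask the broker to confirm that each published message was received.
app.conf.broker_transport_options = {'confirm_publish': True}

# Acknowledge messages only after the task has finished executing.
# Only safe for idempotent tasks: a crash mid-execution means redelivery.
app.conf.task_acks_late = True

# Or per task:
@app.task(acks_late=True)
def my_idempotent_task(x):
    return x * 2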

Related

Deleting a Google Cloud Task does not stop task running

I have a task queue which users can push tasks onto; only one task can run at a time, enforced by the concurrency setting for the queue. In some cases (e.g. a long-running task) they may wish to cancel a running task in order to free up the queue to process the next task.
To achieve this I have been running the task queue as a Flask application; should a user wish to cancel a task, I call the delete_task method of the Python client library for a given queue & task.
However, I am seeing that the underlying task continues to be processed even after the task has been deleted. I have been trying to find documentation of how Cloud Tasks handles a task being deleted, but haven't found anything concrete.
I'm hoping that I'd be able to listen for a signal of some sort in order to gracefully shut down the process if a deletion is received, or that the underlying process would be killed if the parent task is deleted.
Has anyone worked with the Cloud Tasks API before? Is it correct to assume that a deleted task will cleanup any processes that are running?
I don't see how a worker would be able to find out that the task it is working on has been deleted.
In the eyes of the worker, a task is an incoming HTTP request. I don't know how the queue could tell that specific process to stop. I'm fairly certain that "deleting" a task just removes it from the queue.
You'd have to build a custom 'cancel' function that would be able to reach out to this worker.
Or this worker would have to periodically check with the Queue to see if its task still exists.
https://cloud.google.com/tasks/docs/reference/rest/v2/projects.locations.queues.tasks/get
https://googleapis.dev/python/cloudtasks/latest/gapic/v2/api.html#google.cloud.tasks_v2.CloudTasksClient.get_task
I'm not actually sure what the Queue will return if you try to call 'get task' on a deleted task, since I don't see a 'status' property for tasks. Maybe it will return an error like 'task does not exist'.
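If you go the route of having the worker check back with the queue, a rough sketch could look like this (the task name components and the assumption that a deleted task raises NotFound are things you'd want to verify):
# worker-side check, sketch only
from google.api_core.exceptions import NotFound
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()

def task_still_exists(project, location, queue, task_id):
    # Full resource name of the task this worker is currently processing.
    name = client.task_path(project, location, queue, task_id)
    try:
        client.get_task(name=name)
        return True
    except NotFound:
        # The task was deleted (or never existed); shut down gracefully.
        return False
The worker could call this every so often during a long-running job and bail out when it returns False.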

How to detect stale workers (or auto-restart)

We recently experienced a nasty situation with the celery framework. There were a lot of messages in the queue, however those messages weren't processed. We restarted celery and the messages started being processed again. However we do not want a situation like this happening again and are looking for a permanent solution.
It appears that celery's workers have gone stale. The documentation of celery notes the following on stale workers:
This shows that there’s 2891 messages waiting to be processed in the task queue, and there are two consumers processing them.
One reason that the queue is never emptied could be that you have a stale worker process taking the messages hostage. This could happen if the worker wasn’t properly shut down.
When a message is received by a worker the broker waits for it to be acknowledged before marking the message as processed. The broker will not re-send that message to another consumer until the consumer is shut down properly.
If you hit this problem you have to kill all workers manually and restart them
See documentation
However this relies on manual checking for stale workers, leaving lots of room for error and costing manual labor. What would be a good solution to keep celery working?
You could use supervisor or supervisor-like tools to deploy the workers; refer to Running the worker as daemon.
Moreover, you could monitor the queue status with rabbitmq-management to check whether the queue becomes too large (assuming that you are using RabbitMQ); Celery's monitoring tools also provide some mechanisms for this.
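For the monitoring side, a small health check along these lines can alert you (or trigger a restart via your process manager) when no worker responds. This is a rough sketch, assuming your Celery app is importable as proj.celery:app; note that a worker wedged inside a task may still answer remote-control pings, so watching the queue length in the RabbitMQ management UI remains a useful complement.
# healthcheck.py -- rough sketch; the import path is a placeholder
import sys
from proj.celery import app

def workers_responding(timeout=5):
    # Broadcast a ping to all running workers; an empty reply list means
    # nobody answered within the timeout.
    replies = app.control.ping(timeout=timeout)
    return bool(replies)

if __name__ == '__main__':
    # Exit non-zero so cron or your monitoring system can restart the workers.
    sys.exit(0 if workers_responding() else 1)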

Django-celery project, how to handle results from result-backend?

1) I am currently working on a web application that exposes a REST API and uses Django and Celery to handle requests and solve them. For a request to get solved, a set of Celery tasks has to be submitted to an AMQP queue, so that they get executed on workers (situated on other machines). Each task is very CPU intensive and takes very long (hours) to finish.
I have configured Celery to also use amqp as the result backend, and I am using RabbitMQ as Celery's broker.
Each task returns a result that needs to be stored afterwards in a DB, but not by the workers directly. Only the "central node" - the machine running django-celery and publishing tasks in the RabbitMQ queue - has access to this storage DB, so the results from the workers have to return somehow on this machine.
The question is how can I process the results of the tasks execution afterwards? So after a worker finishes, the result from it gets stored in the configured results-backend (amqp), but now I don't know what would be the best way to get the results from there and process them.
All I could find in the documentation is that you can either check on the result's status from time to time with:
result.state
which means that basically I need a dedicated piece of code that runs this command periodically, and therefore keeps a whole thread/process busy with only this, or to block everything with:
result.get()
until a task finishes, which is not what I wish.
The only solution I can think of is to have an extra thread on the "central node" that periodically runs a function which checks the async_results returned by each task at submission, and takes action if a task has finished.
Does anyone have any other suggestion?
Also, since the backend-results' processing takes place on the "central node", what I aim is to minimize the impact of this operation on this machine.
What would be the best way to do that?
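For reference, the extra thread I describe above would run something roughly like this (a sketch only; save_result_to_db and handle_failure are placeholders, and as far as I understand the amqp result backend delivers each result as a message that can typically be retrieved only once, so a persistent backend may suit this polling pattern better):
# sketch of the periodic "collector" I have in mind
from celery.result import AsyncResult

def collect_results(pending_task_ids):
    for task_id in list(pending_task_ids):
        result = AsyncResult(task_id, app=app)  # `app` is my Celery app instance
        if result.successful():
            save_result_to_db(task_id, result.result)
            pending_task_ids.remove(task_id)
        elif result.failed():
            handle_failure(task_id, result.result)
            pending_task_ids.remove(task_id)
    # call this periodically from a background thread (or a scheduled task)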
2) How do people usually solve the problem of dealing with the results returned from the workers and put in the backend-results? (assuming that a backend-results has been configured)
I'm not sure if I fully understand your question, but take into account that each task has a task id. If tasks are being sent by users you can store the ids and then check for the results using JSON as follows:
# urls.py
from django.conf.urls import patterns, url  # old-style URL conf, as used by django-celery
from djcelery.views import is_task_successful

urlpatterns += patterns('',
    url(r'(?P<task_id>[\w\d\-\.]+)/done/?$', is_task_successful,
        name='celery-is_task_successful'),
)
Another related concept is that of signals: each finished task emits a signal, and a successfully finished task will emit a task_success signal. More can be found in the docs on real-time processing.
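A minimal sketch of the signal approach (note that task signals such as task_success fire inside the worker process, not on the machine that published the task, so this only fits if the workers can reach whatever store you use; record_result is a placeholder):
from celery.signals import task_success

@task_success.connect
def on_task_success(sender=None, result=None, **kwargs):
    # sender is the task object, result is the task's return value;
    # this handler runs in the worker process.
    record_result(sender.name, result)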

What's common practice for enabling a locking mechanism for multiple SQS consumers in Django so I can be idempotent

SQS expects your application to be idempotent and I've got multiple consumers/producers where (even if SQS had a deliver-once mechanism) I will have race conditions creating duplicates and race conditions consuming because my consumers run via cron jobs.
My current plan is to use the Django 1.4 select_for_update which should block other consumers on the same row, doing something like:
reminders = EmailReminder.objects.select_for_update().filter(id=some_id)
if not reminders[0].finished:
    reminders[0].send()
    reminders.update(finished=datetime.now())
    # Delete job.
Are there better ways of dealing with this?
Hook up django-celery to SQS and have it designate a periodic job using celerybeat. Then have celeryd worker(s) running on the same queue anywhere you want. Only one will pick up a job at a time and execute it. No need to introduce DB locking on any level.
As long as your worker is guaranteed to finish its current task before celerybeat fires a new one you will never have a need for a lock. Now if you think there is a chance they may overlap you can introduce states for your notifications where:
Any reminder starts in "unsent" state.
Your celerybeat sends a request to process unsent emails to the queue.
Some worker picks it up and grabs all of them.
Immediately the worker transitions all of them to "sending" state.
Proceeds to send them one at a time (or in bulk).
If sending fails for any, revert their state back to unsent.
For all that succeeded, transition them to sent.
This way if celerybeat fires another job while your original job is not done with the initial batch, you won't have duplicate emails sent. As an added bonus you can scale the solution and distribute the load.
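A rough sketch of that flow as a beat-driven task (model and field names are illustrative, not taken from your code):
# tasks.py -- illustrative sketch of the unsent/sending/sent flow
from celery import shared_task
from myapp.models import EmailReminder  # placeholder import path

@shared_task
def send_pending_reminders():
    # Grab the current batch and immediately mark it as in progress so an
    # overlapping beat run won't pick up the same rows again.
    batch = list(EmailReminder.objects.filter(state='unsent'))
    EmailReminder.objects.filter(pk__in=[r.pk for r in batch]).update(state='sending')

    for reminder in batch:
        try:
            reminder.send()
        except Exception:
            # Sending failed: put it back so the next run retries it.
            EmailReminder.objects.filter(pk=reminder.pk).update(state='unsent')
        else:
            EmailReminder.objects.filter(pk=reminder.pk).update(state='sent')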

Understanding how basic celery message queueing works

I've implemented a small test which uses Celery for message queueing and I just want to make sure I understand how it works on a basic level (django-celery, using Redis as a broker).
My understanding is that when I place a call to start an asynchronous task, the task information is placed in Redis and then a celeryd instance connected to the broker consumes and executes the task. Is this essentially what is happening?
If I set up a periodic task that's supposed to execute once every hour, does that task get executed on all task consumers? If so, is there a way to limit it so that only one consumer will ever execute a periodic task?
The workers will consume as many messages as the broker contains. If you have 8 workers, but only 1 message, 1 of the 8 workers will consume the message, executing the task.
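On the periodic-task question: the beat scheduler publishes one message per schedule hit, and only one worker will consume that message, so the task won't run on every consumer as long as you run a single beat process. A minimal sketch (module, app and setting names are placeholders and assume a recent Celery version):
# tasks.py -- minimal sketch
from celery import Celery
from celery.schedules import crontab

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def hourly_job():
    ...

app.conf.beat_schedule = {
    'run-hourly-job': {
        'task': 'tasks.hourly_job',       # registered name follows the module path
        'schedule': crontab(minute=0),    # once an hour, on the hour
    },
}

# Run exactly one scheduler:        celery -A tasks beat
# Run as many workers as you like:  celery -A tasks worker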