Deleting a Google Cloud Task does not stop the task from running - google-cloud-platform

I have a task queue which users can push tasks onto; only one task can run at a time, enforced by the concurrency setting for the queue. In some cases (e.g. a long-running task) they may wish to cancel a running task in order to free up the queue to process the next task.
To achieve this I have been running the task queue as a Flask application; should a user wish to cancel a task, I call the delete_task method of the Python client library for the given queue & task.
However, I am seeing that the underlying task continues to be processed even after it has been deleted. I have been trying to find documentation on how Cloud Tasks handles a deleted task, but haven't found anything concrete.
I was hoping I'd be able to listen for a signal of some sort in order to gracefully shut down the process when a deletion is received, or that the underlying process would be killed if the parent task is deleted.
Has anyone worked with the Cloud Tasks API before? Is it correct to assume that deleting a task will clean up any processes that are running?

I don't see how a worker would be able to find out that the task it is working on has been deleted.
In the eyes of the worker, a task is an incoming HTTP request. I don't know how the queue could tell that specific process to stop. I'm fairly certain that "deleting" a task just removes it from the queue.
You'd have to build a custom 'cancel' function that would be able to reach out to this worker.
Or this worker would have to periodically check with the Queue to see if its task still exists (a sketch of that check follows the links below).
https://cloud.google.com/tasks/docs/reference/rest/v2/projects.locations.queues.tasks/get
https://googleapis.dev/python/cloudtasks/latest/gapic/v2/api.html#google.cloud.tasks_v2.CloudTasksClient.get_task
I'm not actually sure what the Queue will return if you try to call 'get task' on a deleted task, since I don't see a 'status' property for a task. Maybe it will return an error like 'task does not exist'.
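For what it's worth, here is a minimal sketch of that worker-side check using the Python client library, assuming the worker knows its own project, location, queue and task ID (all names below are placeholders). In recent versions of google-cloud-tasks, calling get_task on a task that no longer exists raises NotFound, which the worker could treat as a cancellation signal:

from google.api_core.exceptions import NotFound
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()

def task_still_exists(project, location, queue, task_id):
    # Returns False once the task has been deleted from the queue.
    name = client.task_path(project, location, queue, task_id)
    try:
        client.get_task(name=name)
        return True
    except NotFound:
        return False

# Inside the long-running handler, poll periodically and bail out early:
# if not task_still_exists("my-project", "us-central1", "my-queue", task_id):
#     clean_up_and_exit()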

Related

Does SQS kill long running jobs that have Thread.sleep?

I have some Java code that calls Thread.sleep(100_000) inside a job running in SQS. In production, during the sleep, the job is often killed and re-submitted as failed. On dev I can never recreate that. Does SQS kill long-running jobs in production?
SQS doesn't kill jobs, and I am not sure what you mean by having code 'running in SQS'. What SQS does do is assume that your job (which is running someplace other than SQS) has failed if you don't mark it completed within the visibility timeout (Default Visibility Timeout) you set when you set up the queue.
Your job asks SQS for an item to work on (a message to process); your job is supposed to do that work and then tell SQS that the job is now done (DeleteMessage). If you don't tell it that the job is done, SQS assumes it has failed and makes the message visible again in the queue for another worker to process.
If you need more time to complete the tasks, you can raise the default visibility timeout, or extend it for an individual message while you are still working on it (ChangeMessageVisibility).
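To make the receive/delete cycle concrete, here is a minimal sketch in Python with boto3 (the question is about Java, but the mechanics are identical; the queue URL, timeout values and do_the_work function are placeholders):

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=1,
    VisibilityTimeout=300,   # seconds the message stays hidden while we work on it
    WaitTimeSeconds=20,      # long polling
)

for msg in resp.get("Messages", []):
    do_the_work(msg["Body"])  # placeholder for the actual processing

    # Only an explicit delete marks the job as done; otherwise the message becomes
    # visible again after the visibility timeout and is re-delivered.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])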

How to enqueue a periodic task if it gets terminated in celery?

Let's say there is a periodic task scheduled to run every hour. A worker receives the task and starts processing. While the task is being processed, the celeryd process (controlled via supervisord) gets restarted (supervisorctl restart all). Even though the task never finished executing, it won't get re-executed.
How can I re-queue the periodic task right away while preventing multiple versions of the task from running at the same time?
There may be a nicer way to do it, but you could just use the periodic task to create a regular task in the queue (e.g., my_actual_task.delay(…)) which will not be removed from the queue until it is completed (assuming you are using acks_late).
If you're not using acks_late, you can put the bulk of the task in a try, and in the corresponding finally put a my_actual_task.retry().
Either way, you should generally avoid killing workers without giving them a chance to finish up what they're doing.
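A rough sketch of the first suggestion, assuming Celery with an AMQP broker and newer-style configuration (the module, task and schedule names are illustrative): the periodic entry only enqueues the real work, and the real task uses acks_late so its message is only acknowledged once it completes.

from celery import Celery

app = Celery("tasks", broker="amqp://localhost")

@app.task(acks_late=True)   # the message is acked only after the task finishes
def my_actual_task():
    ...                     # the long-running work goes here

@app.task
def hourly_trigger():
    # The periodic task itself stays tiny; it just puts the real work on the queue.
    my_actual_task.delay()

app.conf.beat_schedule = {
    "enqueue-every-hour": {"task": "tasks.hourly_trigger", "schedule": 3600.0},
}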

AWS SWF Simple Workflow - Best Way to Keep Activity Worker Scripts Running?

The maximum amount of time the pollForActivityTask method stays open polling for requests is 60 seconds. I am currently scheduling a cron job every minute to call my activity worker file so that my activity worker machine is constantly polling for jobs.
Is this the correct way to have continuous queue coverage?
The way that the Java Flow SDK does it is that you create an ActivityWorker and give it a tasklist, domain, activity implementations, and a few other settings. You set both setPollThreadCount and setTaskExecutorSize. The polling threads long poll and then hand over work to the executor threads to avoid blocking further polling. You call start on the ActivityWorker to boot it up, and when you want to shut down the workers you can call one of the shutdown methods (usually best to call shutdownAndAwaitTermination).
Essentially your workers are long-lived and need to deal with a few factors:
New versions of Activities
Various tasklists
Scaling independently on tasklist, activity implementations, workflow workers, host sizes, etc.
Handle error cases and deal with polling
Handle shutdowns (in case of deployments and new versions)
I ended up using a solution where I have another script file that is called by a cron job every minute. This file checks whether an activity worker is already running in the background (if so, I assume a workflow execution is already being processed on the current server).
If no activity worker is there, then the previous long poll has completed and we launch the activity worker script again. If there is an activity worker already present, then the previous poll found a workflow execution and started processing so we refrain from launching another activity worker.
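A minimal sketch of that cron-invoked guard script in Python, assuming the activity worker runs as a separate process whose command line can be matched with pgrep (the worker command and pattern are placeholders):

import subprocess

WORKER_CMD = ["python", "activity_worker.py"]   # placeholder for however the worker is launched
WORKER_PATTERN = "activity_worker.py"           # string to look for in the process list

def worker_is_running():
    # pgrep -f exits with 0 if any process command line matches the pattern.
    return subprocess.run(["pgrep", "-f", WORKER_PATTERN],
                          stdout=subprocess.DEVNULL).returncode == 0

if __name__ == "__main__":
    if not worker_is_running():
        # The previous long poll finished without finding work; start a fresh worker.
        subprocess.Popen(WORKER_CMD)
    # Otherwise a previous poll found a workflow execution and is still processing it,
    # so we refrain from launching another activity worker.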

Django-celery project, how to handle results from result-backend?

1) I am currently working on a web application that exposes a REST API and uses Django and Celery to handle requests and solve them. For a request to be solved, a set of Celery tasks has to be submitted to an AMQP queue, so that they get executed on workers (situated on other machines). Each task is very CPU-intensive and takes a very long time (hours) to finish.
I have configured Celery to also use AMQP as the result backend, and I am using RabbitMQ as Celery's broker.
Each task returns a result that needs to be stored afterwards in a DB, but not by the workers directly. Only the "central node" - the machine running django-celery and publishing tasks to the RabbitMQ queue - has access to this storage DB, so the results from the workers have to somehow make their way back to this machine.
The question is: how can I process the results of the task executions afterwards? So after a worker finishes, its result gets stored in the configured result backend (AMQP), but now I don't know what the best way to get the results from there and process them would be.
All I could find in the documentation is that you can either check on a result's status from time to time with:
result.state
which basically means that I need a dedicated piece of code running this check periodically, keeping a whole thread/process busy with just that, or block everything with:
result.get()
until a task finishes, which is not what I want.
The only solution I can think of is to have, on the "central node", an extra thread that periodically runs a function checking on the async results returned by each task at submission, and taking action once a task has a finished status.
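Roughly something like the following sketch (store_result_in_db is a placeholder for whatever writes to the storage DB; async_results are the AsyncResult objects kept from each task submission):

import time

def collect_results(async_results, poll_interval=30):
    pending = list(async_results)
    while pending:
        still_pending = []
        for res in pending:
            if res.ready():                              # SUCCESS or FAILURE
                store_result_in_db(res.id, res.result)   # placeholder storage call
            else:
                still_pending.append(res)
        pending = still_pending
        if pending:
            time.sleep(poll_interval)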
Does anyone have any other suggestion?
Also, since the result-backend processing takes place on the "central node", my aim is to minimize the impact of this operation on that machine.
What would be the best way to do that?
2) How do people usually deal with the results returned by the workers and stored in the result backend? (assuming that a result backend has been configured)
I'm not sure if I fully understand your question, but take into account that each task has a task id. If tasks are being sent by users, you can store the ids and then check on the results as JSON, as follows:
# urls.py
from djcelery.views import is_task_successful

urlpatterns += patterns('',
    url(r'(?P<task_id>[\w\d\-\.]+)/done/?$', is_task_successful,
        name='celery-is_task_successful'),
)
Another related concept is that of signals: each finished task emits a signal. A successfully finished task will emit a task_success signal. More can be found in the Celery docs on real-time processing and monitoring.
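If you go the signals route, a minimal sketch would look like the following. Note that task_success handlers run in the worker process that executed the task, not on the "central node", so this only helps if the workers themselves can reach the storage (record_result is a placeholder):

from celery.signals import task_success

@task_success.connect
def on_task_success(sender=None, result=None, **kwargs):
    # Runs in the worker right after a task finishes successfully;
    # sender is the task object, result is its return value.
    record_result(sender.name, sender.request.id, result)   # placeholder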

What's common practice for enabling a locking mechanism for multiple SQS consumers in Django so I can be idempotent

SQS expects your application to be idempotent, and I've got multiple consumers/producers where (even if SQS had a deliver-once mechanism) I would have race conditions when creating duplicates and race conditions when consuming, because my consumers run via cron jobs.
My current plan is to use Django 1.4's select_for_update, which should block other consumers on the same row, doing something like:
reminders = EmailReminder.objects.select_for_update().filter(id=some_id)
if not reminders[0].finished:
    reminders[0].send()
    reminders.update(finished=datetime.now())
    # Delete job.
Are there better ways of dealing with this?
Hook up django-celery to SQS and have it schedule a periodic job using celerybeat. Then have celeryd worker(s) running on the same queue anywhere you want. Only one of them will pick up a given job at a time and execute it. No need to introduce DB locking at any level.
As long as your worker is guaranteed to finish its current task before celerybeat fires a new one, you will never need a lock. If you think there is a chance they may overlap, you can introduce states for your notifications, where:
Any reminder starts in "unsent" state.
Your celerybeat sends a request to process unsent emails to the queue.
Some worker picks it up and grabs all of them.
Immediately the worker transitions all of them to "sending" state.
Proceeds to send them one at a time (or in bulk).
If sending fails for any, revert their state back to unsent.
For all that succeeded, transition to sent.
This way, if celerybeat fires another job while your original job is not done with the initial batch, you won't send duplicate emails. As an added bonus, you can scale the solution and distribute the load.
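A rough sketch of that state-machine approach, assuming a hypothetical EmailReminder model with a state field and the send() method from the question (all names are illustrative):

from celery import shared_task
# EmailReminder: the Django model from the question (import omitted)

UNSENT, SENDING, SENT = "unsent", "sending", "sent"

@shared_task
def send_unsent_reminders():
    # Claim the batch: flip everything unsent to "sending" in a single UPDATE so a
    # second celerybeat-triggered run won't pick up the same rows.
    ids = list(EmailReminder.objects.filter(state=UNSENT).values_list("id", flat=True))
    EmailReminder.objects.filter(id__in=ids, state=UNSENT).update(state=SENDING)

    for reminder in EmailReminder.objects.filter(id__in=ids, state=SENDING):
        try:
            reminder.send()                 # hypothetical send method from the question
        except Exception:
            reminder.state = UNSENT         # revert so a later run retries it
        else:
            reminder.state = SENT
        reminder.save()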