I have a function "train_model" which is call via a "train" (Flask) API. Once this API is triggered training of a model is started. On completion it saves a model. But I want to introduce a "cancel" API. Which will stop the training, and should return a valid response for "train" API.
You should probably consider using multiprocessing and run that model in a separate process and respond with a unique request_id, which is stored in the cache/db later whenever you want to cancel, so when model training is complete before exiting the process remove it from cache/db and your API should take request_id and stop that process or if it is not cache/db respond 404 accordingly since this way work is too much of re-inventing wheel you could simply consider using celery
sample untested code with multiprocessing pool
jobs = {}
#app.route("/api/train", methods=['POST'])
def train_it():
with Pool() as process_pool:
job = process_pool.apply_async(train_model_func,args=(your_inputs))
job['12455ABC'] = job
return "OK I got you! here is your request id 12455ABC"
Related
I have a request that lasts more than 3 minutes, I want the request to be sent and immediately give the answer 200 and after the end of the work - give the result
The workflow you've described is called asynchronous task execution.
The main idea is to remove time or resource consuming parts of work from the code that handles HTTP requests and deligate it to some kind of worker. The worker might be a diffrent thread or process or even a separate service that runs on a different server.
This makes your application more responsive, as the users gets the HTTP response much quicker. Also, with this approach you can display such UI-friendly things as progress bars and status marks for the task, create retrial policies if task failes etc.
Example workflow:
user makes HTTP request initiating the task
the server creates the task, adds it to the queue and returns the HTTP response with task_id immediately
the front-end code starts ajax polling to get the results of the task passing task_id
the server handles polling HTTP requests and gets status information for this task_id. It returns the info (whether results or "still waiting") with the HTTP response
the front-end displays spinner if server returns "still waiting" or the results if they are ready
The most popular way to do this in Django is using the celery disctributed task queue.
Suppose a request comes, you will have to verify it. Then send response and use a mechanism to complete the request in the background. You will have to be clear that the request can be completed. You can use pipelining, where you put every task into pipeline, Django-Celery is an option but don't use it unless required. Find easy way to resolve the issue
I've to perform two tasks in an API request but I want to run the second task asynchronously in the background so the API doesn't have to wait for the second task and return the response after the completion of the first task, so how can I achieve it?
#api_view(['POST'])
def create_project(request):
data = first_task()
second_task(data) # want to run this function at background
return Response("Created") # want to return this response after completion of first_task()
You need to use other methods to run async function, you can use any of the following:
django-background-tasks: Simple and doesn't require a worker
python-rq: Great for simple async tasks
celery: A more complete solution
I am trying to learn how to scheduled a task in Django using schedule package. Here is the code I have added to my view. I should mention that I only have one view so I need to run scheduler in my index view.. I know there is a problem in code logic and it only render scheduler and would trap in the loop.. Can you tell me how can I use it?
def job():
print "this is scheduled job", str(datetime.now())
def index(request):
schedule.every(10).second.do(job())
while True:
schedule.run_pending()
time.sleep(1)
objs= objsdb.objects.all()
template = loader.get_template('objtest/index.html')
context= { 'objs': objs}
return HttpResponse(template.render(context, request))
You picked the wrong approach. If you want to schedule something that should run periodically you should not do this within a web request. The request never ends, because of the wile loop - and browsers and webservers very much dislike this behavior.
Instead you might want to write a management command that runs on its own and is responsible to call your tasks.
Additionally you might want to read Django - Set Up A Scheduled Job? - they also tell something about other approaches like AMPQ and cron. But those would replace your choice of the schedule module.
Does Django have something similar to ASP.NET MVC's Asynchronous Controller?
I have some requests that will be handled by celery workers, but won't take a long time (a few seconds). I want the clients to get the response after the worker is done. I can have my view function wait for the task to complete, but I'm worried it will put too much burden on the webserver.
Clarification:
Here's the flow I can have today
def my_view(request):
async = my_task.delay(params)
result = async.get()
return my_response(result)
async.get() can take a few seconds - not too long so that the client can't wait for the HTTP response to get back.
This code might put unnecessary strain on the server. What ASP.NET MVC's AsynchronousController provides, is the ability to break this function in two, something similar to this:
def my_view(request):
async = my_task.delay(params)
return DelayedResponse(async, lambda result=>my_response(result))
This releases the webserver to handle other requests until the async operation is done. Once it done, it will execute the lambda expression on the result, giving back the response.
Instead of waiting for request to complete, you can return status "In progress" an then send one more request to check if status has changed. Since you're doing pure lookups, the response will be very fast and won't put much burden on your web server.
You can outsource this specific view/feature to Tornado web server which is designed for async callback. The rest of the site may continue to run on django.
Most likely the solution should be not technical, but in UI/UX area. If something takes long, it's ok to notify user about it if notification is clear.
Yes, you can do something only when the task completes. You would want to look into something called chain(). You can bind celery tasks in chain:
chain = first_function.s(set) | second_Function.s(do)
chain()
These two functions first_function and second_function will both are celery functions. The second_function is executed only when first_function finishes its execution.
I have managed to get periodic tasks working in django-celery by subclassing PeriodicTask. I tried to create a test task and set it running doing something useless. It works.
Now I can't stop it. I've read the documentation and I cannot find out how to remove the task from the execution queue. I have tried using celeryctl and using the shell, but registry.tasks() is empty, so I can't see how to remove it.
I have seen suggestions that I should "revoke" it, but for this I appear to need a task id, and I can't see how I would find the task id.
Thanks.
A task is a message, and a "periodic task" sends task messages at periodic intervals. Each of the tasks sent will have an unique id assigned to it.
revoke will only cancel a single task message. To get the id for a task you have to keep
track of the id sent, but you can also specify a custom id when you send a task.
I'm not sure if you want to cancel a single task message, or if you want to stop the periodic task from sending more messages, so I'll list answers for both.
There is no built-in way to keep the id of a task sent with periodic tasks,
but you could set the id for each task to the name of the periodic task, that way
the id will refer to any task sent with the periodic task (usually the last one).
You can specify a custom id this way,
either with the #periodic_task decorator:
#periodic_task(options={"task_id": "my_periodic_task"})
def my_periodic_task():
pass
or with the CELERYBEAT_SCHEDULE setting:
CELERYBEAT_SCHEDULE = {name: {"task": task_name,
"options": {"task_id": name}}}
If you want to remove a periodic task you simply remove the #periodic_task from the codebase, or remove the entry from CELERYBEAT_SCHEDULE.
If you are using the Django database scheduler you have to remove the periodic task
from the Django Admin interface.
PS1: revoke doesn't stop a task that has already been started. It only cancels
tasks that haven't been started yet. You can terminate a running task using
revoke(task_id, terminate=True). By default this will send the TERM signal to
the process, if you want to send another signal (e.g. KILL) use
revoke(task_id, terminate=True, signal="KILL").
PS2: revoke is a remote control command so it is only supported by the RabbitMQ
and Redis broker transports.
If you want your task to support cancellation you should do so by storing a cancelled
flag in a database and have the task check that flag when it starts:
from celery.task import Task
class RevokeableTask(Task):
"""Task that can be revoked.
Example usage:
#task(base=RevokeableTask)
def mytask():
pass
"""
def __call__(self, *args, **kwargs):
if revoke_flag_set_in_db_for(self.request.id):
return
super(RevokeableTask, self).__call__(*args, **kwargs)
Just in case this may help someone ... We had the same problem at work, and despites some efforts to find some kind of management command to remove the periodic task, we could not. So here are some pointers.
You should probably first double-check which scheduler class you're using.
The default scheduler is celery.beat.PersistentScheduler, which is simply keeping track of the last run times in a local database file (a shelve).
In our case, we were using the djcelery.schedulers.DatabaseScheduler class.
django-celery also ships with a scheduler that stores the schedule in the Django database
Although the documentation does mention a way to remove the periodic tasks:
Using django-celery‘s scheduler you can add, modify and remove periodic tasks from the Django Admin.
We wanted to perform the removal programmatically, or via a (celery/management) command in a shell.
Since we could not find a command line, we used the django/python shell:
$ python manage.py shell
>>> from djcelery.models import PeriodicTask
>>> pt = PeriodicTask.objects.get(name='the_task_name')
>>> pt.delete()
I hope this helps!