How to handle a file processing request in Django?

I am building a Django REST framework based server, and in one of the requests I receive an audio file from the front-end, on which I need to run an ML-based algorithm (I have a script for this) and respond to the user with the result. The problem is that this request might take 5-10 seconds to execute. I am trying to understand the following:
Will Celery help me reduce the workload on the server, given that in any case I need to wait for the result of the ML algorithm before responding to the user?
Should I create a separate server to handle this type of request? Would that be a better approach?
Also, is my flow of doing things correct? First, upload the file to some cloud storage platform and serialize the instance to get the file's URL. Second, run the script using Celery and wait for the result. Third, respond to the user with the result.
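Roughly, I imagine something like the sketch below (the AudioFile model, its serializer, and the run_ml_algo task are placeholder names I made up, not working code):

```python
# views.py / tasks.py -- a rough sketch of the flow, not working code;
# AudioFile, AudioFileSerializer, and run_ml_algo's body are placeholders
from celery import shared_task
from rest_framework import serializers, status
from rest_framework.response import Response
from rest_framework.views import APIView

from .models import AudioFile  # placeholder model with a FileField


class AudioFileSerializer(serializers.ModelSerializer):
    class Meta:
        model = AudioFile
        fields = ["id", "file"]


@shared_task
def run_ml_algo(file_url):
    # fetch the file from storage and run the ML script here
    return {"result": "..."}


class AudioUploadView(APIView):
    def post(self, request):
        serializer = AudioFileSerializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        instance = serializer.save()  # step 1: file saved to configured storage
        task = run_ml_algo.delay(instance.file.url)  # step 2: queue the script
        # step 3: respond; the client can poll for the result using this ID
        return Response({"task_id": task.id}, status=status.HTTP_202_ACCEPTED)
```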
Thanks for helping.

Related

Django with heavy computation and long runtime - offline computation and sending results

I have a Django app where the user sends a request, and the server does an SQL lookup, followed by computation on the results, and finally shows the results to the user.
The SQL lookup and the computation afterwards can take a long time, maybe 30+ minutes. I have seen some webpages ask for an email address in such cases and then send you a URL later. But I'm not sure how this can be done in Django, or whether there are other options for this situation. Any pointers will be very helpful.
(I'm sorry, but as I said it's a rather general question; I don't know how I can provide minimal runnable code for this.)
One way to accomplish this would be to use something like Celery, which is a distributed task queue. The processing task goes into the queue, and when it completes, it calls a function that sends an email alerting the user that the result is ready.
Documentation: https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html
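As a rough illustration (my own sketch, not from the linked docs), a task along these lines could run the computation and email a link when it finishes; compute_report and the result URL scheme are hypothetical:

```python
# tasks.py -- a minimal sketch; compute_report and the result URL are hypothetical
from celery import shared_task
from django.core.mail import send_mail


@shared_task
def run_report(report_id, user_email):
    result_url = compute_report(report_id)  # hypothetical long SQL + computation
    send_mail(
        subject="Your report is ready",
        message=f"Download your results here: {result_url}",
        from_email="noreply@example.com",
        recipient_list=[user_email],
    )
```

The view would queue it with run_report.delay(report_id, request.user.email) and return immediately.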

Returning the result of a Celery task to the client in a Django template

So I'm trying to accomplish the following. The user browses the webpage while a task runs in the background. When the task completes, it should return args, where one of the args is the flag True, in order to trigger JavaScript that shows a modal form.
I tested it before without async tasks and it works, but now with Celery it just stores results in the database. I did some research on tornado-celery and related tools, but some components like tornado-redis are not maintained anymore, so in my opinion it would not be wise to use them.
So what are my options? Thanks.
If I understand you correctly, then you want to communicate something from the server side back to the client. You generally have three options for that:
1) Make a long-pending request to the server - kinda bad. Skipping over the details: it will bog down your web server if it isn't configured to handle that, it will make your site score low on performance tests, and if the request fails, everything fails.
2) Poll the server with numerous requests at a time interval (0.2 s, something like that) - better. It will increase the traffic, but the requests will be tiny and will not interfere much with the site's performance. If you set a long interval so as not to load the server with pointless requests, then users will see the data with a bit of a delay. On the upside, this will not fail (if written correctly) even if the connection is interrupted.
3) WebSockets, where the server can just hit the client with any message whenever needed - nice, but takes some time to get used to. If you want to try it, you can use django-channels, which is a nice library for Django WebSockets.
If I did not understand you correctly, and the problem at hand is instead figuring out how to get data back from a Celery task to Django, then you can store the Celery task IDs and use them to first check whether the task is completed, and then query the data from Celery.
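As an illustration (my own sketch, assuming a Celery result backend is configured), the status check could be a small view that the page polls:

```python
# views.py -- a minimal polling endpoint; assumes a Celery result backend
from celery.result import AsyncResult
from django.http import JsonResponse


def task_status(request, task_id):
    result = AsyncResult(task_id)
    if result.ready():
        # hand back the stored result so the JavaScript can show the modal
        return JsonResponse({"done": True, "result": result.get()})
    return JsonResponse({"done": False})
```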

How to update progress bar while making a Django Rest api request?

My Django REST app accepts a request to scrape multiple pages for prices and compare them (which takes ~5 seconds), then returns a list of the prices from each page as a JSON object.
I want to update the user on the current operation; for example, if I scrape 3 pages I want to update the interface like this:
Searching 1/3
Searching 2/3
Searching 3/3
How can I do this?
I am using Angular 2 for my front end, but this shouldn't make a big difference as it's a backend issue.
This isn't the only way, but this is how I do this in Django.
Things you'll need
Asynchronous worker processes
This allows you to do work outside the context of the request-response cycle. The two most common options are django-rq and Celery. I'd recommend django-rq for its simplicity, especially if all you're implementing is a progress indicator.
Caching layer (optional)
While you could use the database for persistence here, a temporary key-value cache makes more sense, since the progress information is ephemeral. The Memcached backend is built into Django; however, I'd recommend switching to Redis, as it's more fully featured and very fast, and since it sits behind Django's caching abstraction, it does not add complexity. (It's also a requirement for the django-rq worker processes above.)
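For reference, here is a minimal sketch of the settings involved, assuming the django-redis package and a local Redis instance (both are my assumptions, not part of the answer above):

```python
# settings.py -- a minimal sketch assuming django-redis and a local Redis server
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
    }
}

# django-rq reads its queue connections from RQ_QUEUES
RQ_QUEUES = {
    "default": {
        "HOST": "127.0.0.1",
        "PORT": 6379,
        "DB": 0,
    }
}
```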
Implementation
Overview
Basically, we're going to send a request to the server to start the async worker, then poll a separate progress-indicator endpoint that gives the current status of the worker's progress until it's finished (or has failed).
Server side
Refactor the function you'd like to track the progress of into an async task function (using the @job decorator in the case of django-rq).
The initial POST endpoint should first generate a random unique ID to identify the request (possibly with uuid). Then, pass the POST data along with this unique ID to the async function (in django-rq this would look something like function_name.delay(payload, unique_id)). Since this is an async call, the interpreter does not wait for the task to finish and moves on immediately. Return an HttpResponse with a JSON payload that includes the unique ID.
Back in the async function, we need to track the progress using the cache. At the very top of the function, add a cache.set(unique_id, 0) to record that there is zero progress so far. Using your own math, move this value closer to 1 as the operation approaches 100% completion. If for some reason the operation fails, you can set it to -1.
Create a new endpoint to be polled by the browser to check the progress. It looks for a unique_id query parameter and uses it to look up the progress with cache.get(unique_id). Return a JSON object with the progress amount.
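Putting the server-side steps together, a sketch with django-rq might look like this (scrape_one_page, the payload shape, and the endpoint names are hypothetical):

```python
# tasks.py / views.py -- a minimal sketch of the steps above using django-rq;
# scrape_one_page and the request payload shape are hypothetical
import uuid

from django.core.cache import cache
from django.http import JsonResponse
from django_rq import job


@job
def scrape_prices(pages, unique_id):
    cache.set(unique_id, 0)  # zero progress so far
    results = []
    for i, page in enumerate(pages, start=1):
        results.append(scrape_one_page(page))  # hypothetical scraping helper
        cache.set(unique_id, i / len(pages))   # approaches 1 as pages finish
    cache.set(f"{unique_id}:result", results)


def start_search(request):
    unique_id = uuid.uuid4().hex
    scrape_prices.delay(request.POST.getlist("pages"), unique_id)  # async call
    return JsonResponse({"unique_id": unique_id})


def progress(request):
    unique_id = request.GET["unique_id"]
    return JsonResponse({"progress": cache.get(unique_id, 0)})
```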
Client side
After sending the POST request for the action and receiving the response, that response should include the unique_id. Immediately start polling the progress endpoint at a regular interval, passing the unique_id as a query parameter. The interval could be something like 1 second using setInterval(), with logic to prevent sending a new request while one is still pending.
When the progress received equals 1 (or -1 for failures), you know the process is finished and you can stop polling.
That's it! It's a bit of work just to get progress indicators, but once you've done it once, it's much easier to reuse the pattern in other projects.
Another way to do this, which I have not explored, is via WebSockets (e.g. Django Channels). That way, polling is not required, and the server simply sends messages to the client directly.

Django concurrency with celery

I am using the Django framework and have run into some performance problems.
There is a very heavy function (which costs about 2 seconds) in my views.py. Let's call it heavy().
The client uses AJAX to send a request, which is routed to heavy(), and waits for a JSON response.
The bad thing is that, I think, heavy() is not concurrent. If there are two requests routed to heavy() at the same time, one must wait for the other. In other words, heavy() is serial: it cannot take another request before returning from the current one. I have tested and confirmed this observation on my local machine.
I am trying to make the functions in views.py concurrent and asynchronous. Ideally, when two requests come to heavy(), it should hand the job to some remote worker with a callback and return. Then heavy() can process another request. When the task is done, the callback sends the results back to the client.
However, there is a problem: if heavy() wants to process another request, it must return; but if it returns something, the Django framework will send a (fake) response to the client, and the client will not wait for another response. Moreover, the fake response doesn't contain the correct data. I have searched through Stack Overflow and found few useful tips. I wonder if anyone has tried this and knows a good way to solve the problem.
Thanks,
First, make sure that the lack of concurrency is actually caused by your heavy task. If you're running Django with only one worker, you will be able to process only one request at a time, no matter what it is. Consider running more workers to get some concurrency, because this also affects short requests.
For returning some information when the task is done, you can do it in at least two ways:
sending AJAX requests periodically to fetch the status of your task
using SSE or a WebSocket to subscribe to the actual result
Both of them will require you to write some more JavaScript code to handle it. The first is really easy to achieve; for the second you can use uWSGI capabilities, as described here. It can be handled asynchronously that way, independently of your Django workers (Django will just create the connection and start the task in Celery; checking the status and sending it to the client will be handled by gevent).
To follow up on GwynBliedD's answer:
Celery is commonly used to process tasks, and it has very simple Django integration. @GwynBliedD's first suggestion is very commonly implemented using Celery and a Celery result backend.
https://www.reddit.com/r/django/comments/1wx587/how_do_i_return_the_result_of_a_celery_task_to/
A common workflow using Celery is:
client hits heavy()
heavy() queues the heavy task asynchronously
heavy() returns the future task ID to the client (the view returns very quickly because little work was actually performed)
client starts polling a status endpoint using the task ID
when the task completes, the status endpoint returns the result to the client
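As a rough sketch of that workflow (my own illustration; heavy_task's body stands in for the real computation):

```python
# views.py -- a minimal sketch; assumes Celery is configured with a result backend
from celery import shared_task
from django.http import JsonResponse


@shared_task
def heavy_task(payload):
    # the ~2 second computation goes here
    return {"answer": payload}


def heavy(request):
    task = heavy_task.delay(request.GET.dict())  # queue; view returns at once
    return JsonResponse({"task_id": task.id})    # client polls with this ID
```

The status endpoint the client polls can use the same AsyncResult pattern shown in an earlier answer.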

Sustain an HTTP connection while Django processes a big request (20 mins+)

I've got a Django site that produces a CSV download. The content of the CSV is dictated by user-defined parameters. It's possible that users will set parameters that require significant thinking time on the server. I need a way of sustaining the HTTP connection so the browser doesn't throw an error. I've heard that it's possible to send intermittent HTTP headers to do this. Can anyone point me in the right direction to set this up on a Django site?
(Unfortunately I'm stuck with the possibility of slow reports - improving my SQL won't mitigate this.)
Don't do it online. Trigger an offline task, use a bit of JavaScript to repeatedly call a view that checks whether the task has finished, and redirect to the finished file when it's ready.
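The checking view can be as simple as testing whether the generated file exists yet; this is my own sketch, and the reports/ directory layout under MEDIA_ROOT is hypothetical:

```python
# views.py -- a minimal sketch of the check view; the reports/ path under
# MEDIA_ROOT and the report_id naming scheme are hypothetical
import os

from django.conf import settings
from django.http import JsonResponse


def csv_ready(request, report_id):
    path = os.path.join(settings.MEDIA_ROOT, "reports", f"{report_id}.csv")
    if os.path.exists(path):
        url = f"{settings.MEDIA_URL}reports/{report_id}.csv"
        return JsonResponse({"ready": True, "url": url})
    return JsonResponse({"ready": False})
```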
Instead of blocking the user and their browser for 20 minutes (which is not a good idea), do the time-consuming task in the background. When the task finishes and generates the result, simply notify the user so that they can download the ready result.