I have a Django form that collects user responses. I also have a TensorFlow sentence classification model. What is the best/standard way to put these two together?
Details:
The TensorFlow model was trained on the Movie Review data from Rotten Tomatoes.
Every time a new row is created in my response model, I want the TensorFlow code to classify it (+ or -).
Basically, I have a Django project directory and two .py files for classification. Before going ahead myself, I wanted to know the standard way to integrate machine learning algorithms into a web app.
It'd be awesome if you could suggest a tutorial or a repo.
Thank you!
Asynchronous processing
If you don't need the classification result from the ML code returned immediately to the user (e.g. as the response to the same POST request that submitted it), then you can always queue the classification job to be run in the background, or even on a different server with more CPU/memory resources (e.g. with django-background-tasks or Celery).
A queued task could, for example, populate the field UserResponse.class_name (positive, negative) on the database rows that have that field blank (not yet classified), as in the sketch below.
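A minimal sketch of such a queued task, assuming Celery and a hypothetical classify() helper that wraps the TensorFlow model (model and field names are illustrative, not the asker's actual code):

```python
# tasks.py -- hedged sketch of a background classification task.
from celery import shared_task

from .models import UserResponse          # assumed model with text/class_name fields
from .classifier import classify          # hypothetical wrapper around the TF model


@shared_task
def classify_pending_responses():
    # Fill in the label for every row that has not been classified yet.
    for response in UserResponse.objects.filter(class_name=""):
        response.class_name = classify(response.text)   # "positive" or "negative"
        response.save(update_fields=["class_name"])
```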
Real time notification
If the ML code is slow and you want to return the result to the user as soon as it is available, you can use the asynchronous approach described above and pair it with a real-time notification to the browser (e.g. socket.io); this can be triggered from the queued task.
This becomes necessary if the ML execution time is so long that it might time out the HTTP request in the synchronous approach described below.
Synchronous processing, if ML code is not CPU intensive (fast enough)
If you need the classification result returned immediately, and the ML classification is fast enough*, you can do so within the HTTP request-response cycle (the POST request returns after the ML code is done, synchronously).
*Fast enough here means it wouldn't time-out the HTTP request/response, and the user wouldn't lose patience.
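For the synchronous case, a rough sketch (reusing the same hypothetical classify() helper; view and field names are made up):

```python
# views.py -- synchronous variant: the POST request blocks until the classifier returns.
from django.http import JsonResponse

from .models import UserResponse          # assumed model
from .classifier import classify          # hypothetical wrapper around the TF model


def submit_response(request):
    text = request.POST["text"]
    label = classify(text)                # runs inside the request/response cycle
    UserResponse.objects.create(text=text, class_name=label)
    return JsonResponse({"class_name": label})
```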
Well, I had to develop the same solution myself. In my case, I used Theano. If you are using TensorFlow or Theano, you are able to save the model you have built. So first, train the model with your training dataset, then save the model using the library you have chosen. You need to deploy into your Django web application only the part of your code that handles the prediction. So, using a simple POST, you would give the user the predicted class of their sentence quickly enough. Also, if needed, you can run a job periodically to retrain your model with the new input patterns and save it once more.
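To illustrate the "deploy only the prediction code" idea, here is a rough sketch assuming a Keras-style saved TensorFlow model and a hypothetical vectorize() helper that reproduces the training-time preprocessing (a pickled Theano or scikit-learn model would follow the same pattern):

```python
# predict.py -- load the saved model once, then serve predictions from it.
from tensorflow import keras

from .preprocessing import vectorize      # hypothetical; must match training-time preprocessing

_model = keras.models.load_model("sentiment_model.h5")   # assumed path to the saved model


def predict_class(sentence):
    features = vectorize(sentence)
    score = _model.predict(features)[0][0]   # assumes a single sigmoid output
    return "positive" if score >= 0.5 else "negative"
```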
I would suggest not using Django, since it will add execution time to the solution.
Instead, you could use Node to serve a React frontend that interacts with a TensorFlow REST API running as a standalone server.
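For reference, a call to such a standalone model server (e.g. TensorFlow Serving's REST API) might look like the following; the model name, port, and instance format are assumptions and must match how the model was exported. The same request can be made from the React frontend with fetch:

```python
# Hedged sketch of querying a standalone TensorFlow Serving REST endpoint.
import requests

resp = requests.post(
    "http://localhost:8501/v1/models/sentiment:predict",     # assumed model name and port
    json={"instances": [{"sentence": "A thoroughly enjoyable film."}]},
)
print(resp.json()["predictions"])
```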
As the answer above suggests, it would be better to use WebSockets; you could use a React WebSocket module so your components refresh whenever their state changes.
Hope this helps.
Related
Currently I am developing a Django + React website that will (I hope) serve a decent number of users. The project demo is mostly complete, and I am starting to think about the scale required to put this thing into production.
The website essentially does three things:
Grab data from external APIs (e.g. Twitter) for 50,000 unique keywords (the keywords don't change); this process happens every 30 minutes
Run computation on all of the data, and save that computation to the database. Assume that the algorithm is as optimized as possible
When a user visits the website, it should serve a pretty graph/chart of all of the computational data per keyword
The issue is that this is far too intense a task to be done by the same application that serves the website; users would be waiting decades to see their data. My current plan is to have a separate API that services the website with the data, which the website can then store in its database. This separate API would process the data without fear of affecting users, and it should be able to finish each round of computation in under 30 minutes, in time for the next round of data.
Can anyone help me understand how I can better equip my project to handle the scale? I'd love some ideas.
As a 4th-year CS student, I figured it's time to put a real project out into the world, and I am very excited about it and the progress I've made so far. My main worry is that the end users will be negatively affected if I don't figure out some kind of pipeline to make this process happen.
To reiterate my idea:
Django + React - This is the forward facing website
External API - Grabs the data off the internet and processes it, and waits for a GET request from the website
Is there a better way to do this? Or on the other hand am I severely overestimating how computationally heavy this is.
Edit: Including current research
Handling computationally intensive tasks in a Django webapp
Separation of business logic and data access in django
What you want is to have the computation task executed by a different process in the background.
The most straightforward and popular solution is to use Celery, see here.
The Celery worker(s), which perform the background tasks, can run on the same machine as the web application, or (when scale becomes an issue) you can change the configuration so that they run on an entirely different machine.
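A minimal sketch of what that looks like, with an illustrative broker URL and task name (not a drop-in config):

```python
# celery.py -- minimal Celery setup; any worker that can reach the broker can run tasks,
# so moving workers to a bigger machine is just a matter of starting them there:
#   celery -A myproject worker
from celery import Celery

app = Celery("myproject", broker="redis://broker-host:6379/0")   # assumed Redis broker


@app.task
def run_keyword_computation(keyword):
    # ...the expensive per-keyword computation goes here...
    pass
```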
When using ML Engine for online prediction, we send a request and get the prediction results. That's great, but the request is usually different from the model input; for example:
A categorical variable can be in the request, but the model expects an integer mapped to that category
Also, for a given feature we may need to create multiple features, like splitting text into two or more features
And we might need to exclude some of the features in the request, like a constant feature that's useless to the model
How do you handle this process? My solution is to receive the request with an App Engine app, send it to Pub/Sub, process it in Dataflow, save it to GCS, and trigger a Cloud Function that sends the processed request to the ML Engine endpoint and gets the predicted result. This may be over-engineering, and I want to avoid that. If you have any advice regarding XGBoost models, it will be appreciated.
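To make the three transformations above concrete, here is a rough sketch of the request-to-model-input mapping in plain Python (all field names and the category map are made up for illustration):

```python
# Hedged sketch: map an incoming request onto the features the model expects.
CATEGORY_TO_INT = {"red": 0, "green": 1, "blue": 2}      # hypothetical category vocabulary


def to_model_input(request_json):
    features = dict(request_json)
    features.pop("constant_feature", None)                     # drop a feature the model ignores
    features["color"] = CATEGORY_TO_INT[features["color"]]     # categorical -> integer
    first, _, rest = features.pop("text", "").partition(" ")   # one field -> two features
    features["first_word"], features["remaining_text"] = first, rest
    return features
```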
We are testing out a feature that allows a user to provide some Python code to be run server-side. This will allow you to do the types of transformations you are trying to do, either as a scikit-learn pipeline or as a Python function. If you'd like to test it out, please contact cloudml-feedback#google.com.
I have a Django application that manages the model logic and data handling through the admin interface.
In the same project I also have a Python file (scriptcl.py) that uses the model data to perform heavy calculations that take some time to process, for example 5 seconds.
I have migrated the project to the cloud, and now I need an API to call this file (scriptcl.py), passing parameters, run the computation according to the parameters and the data in the DB (maintained in the admin), and then respond.
All the Django REST Framework (DRF) examples I've seen so far only cover authentication and data handling (create, read, update, delete).
Could anyone suggest an idea to approach this?
In my opinion, the correct approach would be to use Celery to perform these calculations asynchronously.
Write a class that inherits from DRF's APIView, handle authentication there, write whatever logic you want or call whichever function you need, get the final result, and send back the JsonResponse. But, as you mentioned, if the API takes too long to respond, you might have to think of something else, like returning a request_id and hitting the server with that request_id every 5 seconds to get the data, along the lines of the sketch below.
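A rough sketch of that request_id pattern, assuming the heavy code in scriptcl.py has been wrapped in a hypothetical Celery task called run_scriptcl (class and field names are illustrative):

```python
# views.py -- queue the calculation, hand back an id, let the client poll for the result.
from celery.result import AsyncResult
from rest_framework.response import Response
from rest_framework.views import APIView

from .tasks import run_scriptcl            # hypothetical Celery task wrapping scriptcl.py


class ComputationView(APIView):
    def post(self, request):
        task = run_scriptcl.delay(**request.data)    # queue the heavy calculation
        return Response({"request_id": task.id})

    def get(self, request, request_id):              # request_id supplied by the URL pattern
        result = AsyncResult(request_id)
        if not result.ready():
            return Response({"status": "pending"})
        return Response({"status": "done", "result": result.get()})
```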
Just to give some feedback on this: the approach I took was to build another API using Flask and plain Python scripts.
I also used SQLAlchemy to access the database and retrieve the necessary data.
I am building an app where I need to fetch some data from an API and update all the models with that data every few minutes.
What would be a clean way to accomplish something like this?
Well, it's quite an open question.
You'll need to create a task that runs every few minutes; you can do this with Celery. Celery has a task scheduler (http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html) which will launch a given function at a configured time, similar to a crontab.
The task would then fetch the data; requests (http://docs.python-requests.org/en/master/) is a very good library for making HTTP requests.
And last but not least, you would need to serialize the fetched data and save it to your model. Django REST Framework's serialization capabilities are a great starting point, but if the data structure is simple enough you can just use Python's json library (json.loads(data)) and write a function that translates the API fields into the model's fields.
By the way, I'm supposing a REST API.
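Putting those pieces together, a rough sketch of such a periodic task (the model, field names, and URL are placeholders):

```python
# tasks.py -- fetch the API every few minutes and upsert the results into a model.
import requests
from celery import shared_task

from .models import Item                   # hypothetical model being kept in sync


@shared_task
def refresh_items():
    data = requests.get("https://api.example.com/items").json()   # placeholder URL
    for entry in data:
        # Translate the API's fields into the model's fields; adjust names to your schema.
        Item.objects.update_or_create(
            external_id=entry["id"],
            defaults={"name": entry["name"], "value": entry["value"]},
        )


# Register it with the beat scheduler in the Celery app configuration, e.g.:
# app.conf.beat_schedule = {
#     "refresh-items": {"task": "myapp.tasks.refresh_items", "schedule": 300.0},  # every 5 min
# }
```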
You can use a task management tool that supports running periodic tasks at the intervals you specify, like Periodic Tasks in Celery.
Also, if you run your code on a Unix-like system, you can stick with core Django functionality: just write your logic as a Django management command and set up a cron job to run it at your preferred interval.
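A bare-bones sketch of that management-command variant (the command name and the cron line are only examples):

```python
# myapp/management/commands/refresh_data.py
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Fetch the external API and update the models."

    def handle(self, *args, **options):
        # ...the same fetch/update logic as above goes here...
        self.stdout.write("Refresh complete.")


# Example crontab entry running it every 5 minutes (paths are illustrative):
# */5 * * * * /path/to/venv/bin/python /path/to/project/manage.py refresh_data
```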
One of my view functions is a very long processing job and clearly needs to be handled differently.
Instead of making the user wait a long time, it would be best if I could launch the processing job (which would email the results) and, without waiting for completion, notify the user that their request is being processed and let them browse on.
I know I can use os.fork, but I was wondering if there is a 'right way' in terms of Django. Perhaps I can return the HTTP response and then go on with this job somehow?
There are a couple of solutions to this problem, and the best one depends a bit on how heavy your workload will be.
If you have a light workload you can use the approach used by django-mailer, which is to define a "jobs" model, save new jobs into the database, then have cron run a stand-alone script every so often to process the jobs stored in the database (deleting them when done); a sketch of this pattern follows below. You can use something like django-chronograph to make managing the job scheduling easier.
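A rough sketch of that jobs-table pattern (model and function names are made up; the processing script assumes Django has been configured, e.g. via django.setup()):

```python
# models.py -- a minimal "jobs" table.
from django.db import models


class Job(models.Model):
    payload = models.TextField()                       # input for the long-running work
    created = models.DateTimeField(auto_now_add=True)


# process_jobs.py -- standalone script run by cron every so often.
def process_pending_jobs():
    for job in Job.objects.all():
        do_long_running_work(job.payload)    # hypothetical function doing the real work
        job.delete()                         # remove the job once it has been handled
```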
If you need help understanding how to write a script to process the job see James Bennett's article Standalone Django Scripts for help.
If you have a very high workload, meaning you'll need more than a single server to process the jobs, then you want to use a real distributed task queue. There is a lot of competition here, so I can't really detail all the options, but a good one to use with Django apps is Celery.
Why not simply start a thread to do the processing and then go on to send the response?
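A rough sketch of that thread idea (the job function is only a stub):

```python
# views.py -- start the work in a background thread and return to the user immediately.
import threading

from django.http import HttpResponse


def long_running_job(data):
    ...   # heavy processing goes here; email the results when done


def start_processing(request):
    threading.Thread(target=long_running_job, args=(request.POST.get("data"),)).start()
    return HttpResponse("Your request is being processed; the results will be emailed to you.")
```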
Before you select a solution, you need to determine how the process will be run. That is, is it the same process for every single user, with the same data, so it can be scheduled regularly? Or does each user request something and the results are slightly different?
As an example, if the data will be the same for every single user and can be run on a schedule you could use cron.
See: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
or
http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
However, if the requests will be ad hoc and you need something scalable that can handle a high load and is asynchronous, then what you are actually looking for is a message queuing system. Your view will add a request to the queue, which will then get acted upon.
There are a few options to implement this in Django:
Django Queue Service is purely Django & Python and simple, though the last commit was in April and it seems the project has been abandoned.
http://code.google.com/p/django-queue-service/
The second option, if you need something that scales, is distributed, and makes use of open-source message queuing servers: Celery is what you need.
http://ask.github.com/celery/introduction.html
http://github.com/ask/celery/tree