Using Twisted for asynchronous file uploads from Django app

We have a Django app that needs to post messages and upload files from the web server to another server via an XML API. We need to do X asynchronous file uploads and then make another XML API request when they have finished uploading. I'd also like the files to stream from disk without having to load them completely into memory first. Finally, I need to send the files as application/octet-stream in a POST body (rather than a more typical form data MIME type) and I wasn't able to find a way to do this with urllib2 or httplib.
I ended up integrating Twisted into the app. It seemed perfect for this task, and sure enough I was able to write a beautifully clean implementation with deferreds for each upload. I use my own IBodyProducer to read the data from the file in chunks and send it to the server in a POST request body. Unfortunately I then found out that the Twisted reactor cannot be restarted, so I can't just run it and then stop it whenever I want to upload files. Since Twisted is apparently used more for full-blown servers, I'm now wondering whether this was the right choice.
I'm not sure if I should:
a) Configure the WSGI container (currently I'm testing with manage.py) to start a Twisted thread on startup and use blockingCallFromThread to trigger my file uploads.
b) Use Twisted as the WSGI container for the Django app. I'm assuming we'll want to deploy later on Apache and I'm not sure what the implications are if we take this route.
c) Simply drop Twisted and use some other approach for the file uploads. That's kind of a shame, since the Twisted approach with deferreds is elegant and works.
Which of these should we choose, or is there some other alternative?

Why would you want to deploy later on Apache? Twisted is rad. I would do (b) until someone presents specific, compelling reasons not to. Then I would do (a). Fortunately, your application code looks the same either way. blockingCallFromThread works fine whether Twisted is your WSGI container or not - either way, you're just dealing with running code in a separate thread from the one the reactor is running in.
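For reference, here's a minimal sketch of option (a), assuming the reactor is already running in a background thread. Agent, FileBodyProducer, and blockingCallFromThread are real Twisted APIs; the URLs and the upload_all helper are illustrative.

    from twisted.internet import reactor, defer
    from twisted.internet.threads import blockingCallFromThread
    from twisted.web.client import Agent, FileBodyProducer
    from twisted.web.http_headers import Headers

    def upload_file(path, url):
        # FileBodyProducer streams the file from disk in chunks, so the
        # whole file never has to sit in memory at once.
        agent = Agent(reactor)
        body = FileBodyProducer(open(path, 'rb'))
        headers = Headers({b'Content-Type': [b'application/octet-stream']})
        return agent.request(b'POST', url, headers, body)

    def upload_all(paths, url):
        # One deferred per upload; gatherResults fires once all of them
        # have finished, which is where the follow-up XML API call goes.
        return defer.gatherResults([upload_file(p, url) for p in paths])

    # From a Django view running in a non-reactor thread, block until done:
    #   blockingCallFromThread(reactor, upload_all, paths, b'http://example.com/api')

gatherResults gives you the "make another XML API request when all uploads have finished" behaviour in one line.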

Related

Does Django Channels work as intended as a WSGI app?

I am trying to implement Django Channels because I need to have users receive notifications when another user does something, and I am completely confused by this part:
http://channels.readthedocs.io/en/stable/deploying.html
Deploying applications using channels requires a few more steps than a
normal Django WSGI application, but you have a couple of options as to
how to deploy it and how much of your traffic you wish to route
through the channel layers.
Firstly, remember that it’s an entirely optional part of Django. If
you leave a project with the default settings (no CHANNEL_LAYERS),
it’ll just run and work like a normal WSGI app.
The problem is that I have quite limited rights on the shared hosting that I am using and therefore, I can't use the runworker command.
The quote above says that this part is "optional" and that without it, it'll work like a normal WSGI app. But can I use Django Channels with a normal WSGI app? If not, then doesn't that mean that it's not optional at all?
So my question is: if I skip this part, will the channels still work, and will I be able to use the things shown on this page (routing, sending messages, etc.): http://channels.readthedocs.io/en/stable/getting-started.html ?
From reading the docs, what I get is: first you need a backend (e.g. Redis) to run a channel layer, and then you run "runworker". Since that's not an option for you, have a look at this http://channels.readthedocs.io/en/stable/backends.html
"""The in-memory layer is only useful when running the protocol server and the worker server in a single process; the most common case of this is runserver, where a server thread, this channel layer, and worker thread all co-exist inside the same python process."""
So by avoiding a third-party backend you can use the in-memory ASGI layer and just run 'runserver', and the channel layer is set up. Just look for the in-memory subtopic in the link.
And if you keep CHANNEL_LAYERS empty, Django will work as a normal WSGI app, but what we need is an ASGI app, and ASGI is required for Channels.
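For concreteness, here's roughly what the in-memory option boils down to in settings.py under Channels 1.x (the docs version linked above); the routing module path is a hypothetical example:

    # The in-memory layer only works when the server and worker run in one
    # process, which is exactly what `manage.py runserver` does.
    CHANNEL_LAYERS = {
        "default": {
            "BACKEND": "asgiref.inmemory.ChannelLayer",
            "ROUTING": "myproject.routing.channel_routing",  # hypothetical module path
        },
    }

With that in place, runserver hosts the protocol server, the channel layer, and the worker inside a single process, so no separate runworker command is needed.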

How to communicate between C++ server app and django web app

I have a framework doing a specific task in C++ and a Django-based web app. The idea is to launch this framework, receive some data from it, send it data or requests, and check its status periodically.
I'm looking for the best way to communicate. Both apps run on the same server. I was wondering if a JSON server in C++ is a good idea: Django would send a request to this server, and the server would parse it and delegate a worker thread to complete the task. Almost all of the data that needs to be sent is string-like. Other data will be stored in the database, so there is no problem with that.
Is JSON a good idea? Or maybe you know a better mechanism for local communication between C++ and Django?
If the C++ application is guaranteed to always be on the same machine as the Django web application, include the C++ code by converting it into a shared library and wrapping Python around it, as in Calling C/C++ from python?
JSON and other serializations make sense if you are going to do remote calls and the code needs to communicate across machines.
JSON seems like a fair enough choice for data serialization - it's good at handling strings, and there are existing libraries for encoding/decoding JSON in both Python and C++.
However, I think your bigger problem is likely to be the transport protocol that you use for transferring JSON between your client and server. Here are some options:
You could build an HTTP server into your C++ application (which I think might be what you mean by "JSON server" in your question). That would work fine, though it might be a bit of a pain to implement unless you get hold of a library to handle the hard work for you.
Another option might be to use the 0MQ library to send JSON (or other) messages between your client and server. I think this would probably be a lot easier than implementing a full HTTP server, and 0MQ has some interprocess communication transports that would likely be a lot faster than sending things over the network (there's a sketch of this after the list).
A third option would be just to run your C++ as a standalone application and pass the data to it via stdin or command-line parameters. This is probably the simplest way to do things, though it may not be the most flexible. If you were to go this way, you might be better off just building a Python/C++ binding as suggested by ablm.
Alternatively you could attempt to build some sort of job queue based on redis or some other database system. The idea is that your django application puts some JSON describing the job into the job queue, and the C++ application periodically polls the queue, using a separate redis entry to pass the results back to the client. This has the advantage that you could reasonably easily have several "workers" reading from the job queue with minimal effort.
There are almost certainly other ways to go about it, but those are the ones that immediately spring to mind.
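To make the 0MQ option concrete, here's a minimal sketch of the Django side, assuming the C++ framework exposes a matching REP socket over 0MQ's ipc:// transport; the socket path and the message shape are illustrative. The C++ side would use the equivalent libzmq calls.

    import json
    import zmq

    context = zmq.Context.instance()
    socket = context.socket(zmq.REQ)
    socket.connect("ipc:///tmp/cpp-framework.sock")  # 0MQ's local IPC transport

    def send_task(task):
        # Serialize the request as JSON and block for the framework's reply;
        # REQ/REP gives you a simple request-response pattern for free.
        socket.send_string(json.dumps(task))
        return json.loads(socket.recv_string())

    # e.g. from a Django view:
    #   status = send_task({"action": "status", "job_id": "42"})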

Worker threads/Queues to process datasets after an upload?

I'm writing a web application with Django where users can upload files with statistical data.
The data needs to be processed before it can be properly used (each dataset can take up to a few minutes before processing is finished). My idea was to offload the data processing into a separate Python thread.
However, since I'm using uwsgi, I've read about a feature called "Spoolers". The documentation on that is rather short, but I think it might be what I'm looking for. Unfortunately the -Q option for uwsgi requires a directory, which confuses me.
Anyway, what are the best practices to implement something like worker threads which don't block uwsgi's web workers so I can reliably process data in the background while still having access to Django's database/models? Should I use threads instead?
All of the offloading subsystems need some kind of 'queue' to store the 'things to do'.
The uWSGI Spooler uses a printer-like approach where each file in the spool directory is a task; when the task is done, the file is removed. That is why the -Q option wants a directory: the directory is the queue itself. Other systems rely on heavier, more advanced servers like RabbitMQ and so on.
Finally, do not use the low-level API of the spooler directly; rely on the decorators instead:
http://projects.unbit.it/uwsgi/wiki/Decorators
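A minimal sketch of what that looks like, assuming the app runs under uWSGI with a spooler directory configured (the Dataset model and its method are hypothetical):

    from uwsgidecorators import spool

    @spool
    def process_dataset(arguments):
        # Runs in a spooler process rather than a web worker, so the request
        # that enqueued it returns immediately; Django's ORM works here as usual.
        from myapp.models import Dataset             # hypothetical model
        ds = Dataset.objects.get(pk=int(arguments['dataset_id']))
        ds.run_processing()                          # hypothetical long-running step

    # Enqueue from a view; spooler arguments must be strings:
    #   process_dataset.spool(dataset_id=str(dataset.pk))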

Django + CoffeeScript: Real time video application with io sockets

I have been trying to solve this for 2 weeks and I have not been able to reach a solution.
Here is what I am trying to do:
I need a web application in which users can upload a video; the video is going to be transformed using OpenCV's Python API. Since I have a Python API for OpenCV, I decided to create the web app using Django. Everything is fine up to that point.
The problem is that the video transformation is a very long process, so I was trying to implement some real-time capabilities in order to show the user the video as it is transformed; in other words, I transform a frame and show it to the user immediately. I am trying to do this with CoffeeScript and io sockets following some examples; however, I haven't been successful.
My question is; what would be the right approach to add real time capabilities to a Django application ?
I'd recommend using a non-django service to handle the websockets. Setting up websockets properly is tricky on both the client and server side. Look at pusher.com for a free/cheap solution that will just work and save you a whole lot of hassle.
The initial request to start rendering should kick off the long-lived process, and return with an ID which is used to listen to the websocket for updates.
Once you have your websockets set up, you can send messages to the client about each finished frame. Personally I wouldn't try to push the whole frame down the websocket, but rather just send a message saying the frame is done with a URL to get the frame. Then normal HTTP with its caching and browser niceties moves the big data.
You're definitely not choosing the easy path. The easy path is to have your long-lived render task update the render state in the database, and have the client poll that. Extra server load, but a lot simpler.
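As a rough sketch of that easy path (the RenderJob model and its fields are purely illustrative), the render task updates a row as each frame finishes, and the client polls a view like this:

    from django.http import JsonResponse
    from myapp.models import RenderJob  # hypothetical model

    def render_status(request, job_id):
        job = RenderJob.objects.get(pk=job_id)
        return JsonResponse({
            "frames_done": job.frames_done,
            "frames_total": job.frames_total,
            # the client fetches the actual frame over plain HTTP, which
            # keeps big data out of the status channel
            "latest_frame_url": job.latest_frame_url,
        })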
Django itself really is focused on doing one kind of web interface, which is following the HTTP Request/Response pattern. To maintain a persistent connection with clients, which socket.io really makes dead simple, you need to diverge a bit from a normal Django installation.
This article discusses the issue of doing real-time with Django, with the help of Orbited and Twisted. It's rather old, and it relies on Comet, which is not the preferred way of doing real-time these days.
You might benefit a lot by going for Socket.io on the client, and something like Tornado (wiki) + a Tornado client for Socket.io on the server. But if you really want to stick with Django for the web development (which Tornado also provides), you would need to make the two work together internally, each handling its particular use case.
Finally, this other article discusses how to make Django work with gevent (a coroutine-based networking library for Python) and Socket.io, which might well be your best option otherwise.
Don't hesitate to post questions/comments as they pop up!

Automatic upload of 10KB file to web service?

I am writing an application, similar to SETI@home, that allows users to run processing on their home machines and then upload the result to the central server.
However, the final result is maybe a 10K binary file. (The processing to achieve this output takes several hours.)
What is the simplest reliable automatic method to upload this file to the central server? What do I need to do server-side to prevent blocking? Perhaps having the client send mail is simple and reliable? NB the client program is currently written in Python, if that matters.
Email is not a good solution; you will run into potential ISP blocking and other anti-spam mechanisms.
The easiest way is over HTTP via a simple web service. Have a listener at your server that accepts the uploaded files as part of an HTTP POST and then dumps them wherever they need to be post-processed.
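For example, a minimal sketch of the client side using the requests library (the endpoint URL is illustrative); for a 10K file a plain POST body is all you need:

    import requests

    def upload_result(path, url="https://example.com/api/results"):
        # Send the (small) result file as a raw application/octet-stream body.
        with open(path, "rb") as f:
            resp = requests.post(
                url, data=f,
                headers={"Content-Type": "application/octet-stream"})
        resp.raise_for_status()                  # fail loudly on server errors
        return resp.text                         # the server's acknowledgement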