I am a Django beginner and here is the project:
I am building a data visualization website. I am loading a continuous data stream, and I want the client to be able to choose which data processing to apply (in Python, e.g. AI processing with TensorFlow). Once the client has chosen the processing, I want to launch it in a thread, send the results to the client every X seconds over a WebSocket, and be able to process WebSocket messages coming from the client at the same time.
I have already done this with Flask, but I haven't managed it with Django.
I have followed many tutorials, in particular this one, which seems similar to what I want to do: https://www.neerajbyte.com/post/how-to-implement-websocket-in-django-using-channels-and-stream-websocket-data
The main issue is that I don't know how, or even where, to create my thread. I can't do it in an AsyncWebsocketConsumer class, because it doesn't let me launch a thread that can send WebSocket messages (sending requires calling an async function, and I don't know how to call an async function from inside a thread).
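To illustrate, here is roughly the pattern I am stuck on (the names are mine, not from the tutorial):

```python
# A sketch of what I am trying to do; process_stream stands in for the
# client-chosen processing, and compute_next_result is a placeholder.
# The commented-out line is the blocking point: self.send() is a coroutine
# and I don't know how to await it from inside a plain thread.
import threading

from channels.generic.websocket import AsyncWebsocketConsumer

class StreamConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        await self.accept()
        # Start the processing loop chosen by the client.
        self.worker = threading.Thread(target=self.process_stream, daemon=True)
        self.worker.start()

    def process_stream(self):
        while True:
            result = compute_next_result()        # placeholder
            # await self.send(text_data=result)   # <- can't do this in a thread
```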
One solution proposed in the tutorial above for sending data regularly is to create a management command. That doesn't suit me either, because a function run that way can't be chosen by the client through an AsyncWebsocketConsumer.
So here I am, I'm really struggling to find the solution.
I have heard some things about Celery and Gunicorn from here and there. What do you think about these? And how can these solve my issue?
I am open to every piece of advice, every tutorial, every idea which could allow me to move forward in my project.
It seems that with Django Channels there is no persistent state between websocket events. Even within the same websocket connection, you cannot preserve anything between calls to receive() on a class-based consumer. If it can't be serialized into the channel_session, it can't be stored.
I assumed that the class-based consumer would persist for the duration of the websocket connection.
What I'm trying to build is a simple terminal emulator, where a shell session is created when the websocket connects. Received data would be passed as input to the shell, and the shell's output would be sent out over the websocket.
I cannot find a way to persist anything between calls to receive(). It feels like they took all the bad things about HTTP and brought them over to websockets. With each call to connect(), receive(), and disconnect(), the whole Consumer class is re-instantiated.
So am I missing something obvious? Can I make another thread and have it read from a Group?
Edit: The answers to this can be found in the comments below. You can hack around it, and Channels 3.0 will not instantiate the consumers on every receive call.
The new version of Channels does not have this limitation. Consumers stay in memory for the duration of the websocket connection.
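For example (a minimal sketch assuming Channels 2+/3, where one consumer instance lives for the whole connection; ShellSession is a hypothetical async wrapper around a pty or subprocess, not a real library class):

```python
# State kept as attributes on the consumer survives across receive() calls,
# because the same instance handles the whole connection.
from channels.generic.websocket import AsyncWebsocketConsumer

class TerminalConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        # Attributes set here are still there in later receive() calls.
        self.shell = await ShellSession.create()   # hypothetical helper
        await self.accept()

    async def receive(self, text_data=None, bytes_data=None):
        output = await self.shell.run(text_data)   # hypothetical helper
        await self.send(text_data=output)

    async def disconnect(self, close_code):
        await self.shell.close()                   # hypothetical helper
```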
I have a Channels app that is using databinding. When changes are made in the Django admin, they are pushed to the web as expected. I have a loop set up on a socket connection to do some long polling on a GPIO unit and update the db, but these changes are not being pushed to the web. The Channels documentation says:
Signals are used to power outbound binding, so if you change the values of a model outside of Django (or use the .update() method on a QuerySet), the signals are not triggered and the change will not be sent out. You can trigger changes yourself, but you’ll need to source the events from the right place for your system.
How do I go about triggering these changes, as it happens with admin?
Thanks, and please let me know if this is too vague.
The relevant low-level code is in lines 121-187 of channels/binding/base.py (at least in version 1.1.6). That's where the signals are received and processed. It involves a few different things, such as keeping track of which groups to send the messages to, so it's a little involved, but you can probably tease out how to do it by looking at that code.
The steps involved are basically (a rough sketch follows the list):
1. Find the right groups for the client.
2. Format your message the same way the databinding code would (see this section of the docs).
3. Send the message to all the relevant groups you found in step 1.
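For illustration, here is a rough sketch of those steps using the Channels 1.x Group API. The payload fields below are guesses, so copy the real shape from how channels/binding/base.py formats outbound messages, and use the group names your binding class actually subscribes clients to:

```python
# A rough sketch, assuming Channels 1.x; the payload fields are illustrative
# assumptions -- mirror the real format from channels/binding/base.py.
import json

from channels import Group

def push_binding_update(instance):
    payload = {
        "model": "myapp.mymodel",            # assumed app.model label
        "action": "update",
        "pk": instance.pk,
        "data": {"value": instance.value},   # assumed field
    }
    # Step 1's groups: a single hard-coded group name here for illustration.
    Group("binding.mymodel").send({"text": json.dumps(payload)})
```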
Alternatively, you might consider using a REST API, so that the socket code submits a POST to the API (which creates a database record via the ORM in the normal way) rather than creating database records directly. Your signals will then fire automatically. djangorestframework (server-side) and requests (client-side, if you're using Python for the long-polling code) are your friends if you want to go that way, for sure. If you're using another language for the long-polling client, there are many equivalent packages for REST API client work.
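For example, the long-polling loop could report readings like this (a minimal sketch; the endpoint URL and field names are hypothetical):

```python
# Because the DRF view saves through the ORM, the usual post_save signals
# fire and outbound databinding works unchanged.
import requests

def report_gpio_reading(value):
    response = requests.post(
        "http://localhost:8000/api/readings/",  # assumed DRF endpoint
        json={"value": value},
    )
    response.raise_for_status()
```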
Good luck!
I'm working on a long request in a Django app (nginx reverse proxy, MySQL db, Celery-RabbitMQ-Redis set) and have some doubts about which solution I should apply:
Functioning: One feature of the app lets users migrate thousands of objects from one system to another. Each migration is logged in a db, and users can download the migration history in CSV format: which objects were migrated, with which status (success, errors, ...).
To get the history, a GET request is sent to a Django view, which returns the download response after serialization and rendering to CSV.
Problem: for a large set of objects (e.g. 160,000), the serialization and rendering take so long that the request times out.
Some solutions I was thinking about/found thanks to previous searches are:
1. Increasing the timeout: easy, but everywhere I read that this is a global nginx setting that would affect every request on the server.
2. Using an asynchronous task handled by Celery: the idea is to make an initial request to the server, which launches the serializing and rendering task with Celery and returns a special HttpResponse to the client. The client then regularly asks the server whether the job is done, and the server delivers the history once processing ends. I like this one, but I'm not sure how to implement it technically.
3. Creating and temporarily storing the CSV file on the server, and giving the user a way to access and download it. I'm not a big fan of that one.
So my question is: has anyone already faced a similar problem? Do you have advice on the technical implementation of solution #2, or a better solution to propose?
Thanks!
Clearly you should use Celery + RabbitMQ/Redis. If you look at the docs, it's not that hard to set up.
The first question is whether to use RabbitMQ or Redis. There are many SO questions about this with good information about pros/cons.
The implementation in Django is really simple. You can just wrap Django functions with Celery tasks (with the @task decorator) and they'll become async, so this is the easy part.
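For example, here is a minimal sketch of the enqueue-then-poll flow from the question; the task, helper, and view names are mine, not from any real project:

```python
# tasks.py -- the slow serialization/rendering runs off the request cycle.
from celery import shared_task

@shared_task
def build_history_csv(migration_id):
    rows = serialize_history(migration_id)  # hypothetical helper (the slow part)
    return render_csv(rows)                 # hypothetical helper, returns a path

# views.py -- the initial request enqueues the task and returns immediately;
# the client then polls the second view with the task id.
from celery.result import AsyncResult
from django.http import JsonResponse

def start_export(request, migration_id):
    result = build_history_csv.delay(migration_id)
    return JsonResponse({"task_id": result.id})

def poll_export(request, task_id):
    result = AsyncResult(task_id)
    if result.ready():
        return JsonResponse({"done": True, "csv_path": result.get()})
    return JsonResponse({"done": False})
```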
The problem I see in your project is that the server handling HTTP traffic is the same server running the long process. That can hurt performance and user experience even with Celery running in the background. Of course, it depends on how much traffic you expect on that machine and how many migrations can run at the same time.
One of the things you configure in Celery is the number of workers (concurrent processing units) available, so the number of cores in your machine matters.
If you need to handle HTTP calls quickly, I would suggest delegating the migration process to another machine. Celery/Redis can be configured that way. Let's say you've got two servers: one handles only normal Django calls (no Celery) and triggers Celery tasks on the other server (the one that actually runs the migration process). Both servers can connect to the same database.
But this is just an infrastructure optimization and you may not need it.
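If you do go down that route, the routing setup could look roughly like this (a hypothetical sketch; the project, queue, and task names are assumptions):

```python
# celery.py -- route the heavy task to a dedicated queue so only the
# migration machine's worker picks it up.
from celery import Celery

app = Celery("proj", broker="redis://broker-host:6379/0")
app.conf.task_routes = {
    "tasks.build_history_csv": {"queue": "migrations"},
}
# On the dedicated machine, run a worker that consumes only that queue
# (tune --concurrency to the number of cores):
#   celery -A proj worker -Q migrations --concurrency=4
# The web server enqueues as usual; the broker hands the task to the other box.
```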
I hope this answers your question. If you have specific Celery issues it would be better to create another question.
When developing my web application with Django, I faced a problem: some functions work correctly when I call them locally, but once I call them over an HTTP request they are not executed.
I asked around and I was told to execute them asynchronously, outside the request-response cycle, using Celery and a message queue server. That worked well, but I still don't understand why I have to execute some tasks asynchronously even when I don't have a race condition and there's only one client calling the web service.
This is a big blind spot for me, because I made it work without really understanding how.
Can anyone explain it to me?
Thanks.
The two main benefits I know of for queue-based systems are:
First, a response can be given to the client without waiting for the work to be done. This makes pages load faster and clients spend less time waiting.
Second, a queue gives you a central location for scheduled jobs that multiple workers can draw from. If a certain component of your application can't keep up with the amount of work (or fails for some reason), other instances of that component can pick up the work, and there is a single place where everything that needs to be done can be found.
I have been trying to solve this for 2 weeks and I have not been able to reach a solution.
Here is what I am trying to do:
I need a web application in which users can upload a video; the video is then transformed using OpenCV's Python API. Since OpenCV has a Python API, I decided to build the webapp with Django. Everything is fine up to that point.
The problem is that the video transformation is a very long process, so I was trying to add some real-time capability to show the user the video as it is transformed; in other words, I transform a frame and show it to the user immediately. I have been trying to do this with CoffeeScript and socket.io, following some examples, but I haven't been successful.
My question is: what would be the right approach to add real-time capabilities to a Django application?
I'd recommend using a non-django service to handle the websockets. Setting up websockets properly is tricky on both the client and server side. Look at pusher.com for a free/cheap solution that will just work and save you a whole lot of hassle.
The initial request to start rendering should kick off the long-lived process, and return with an ID which is used to listen to the websocket for updates.
Once you have your websockets set up, you can send messages to the client about each finished frame. Personally I wouldn't try to push the whole frame down the websocket, but rather just send a message saying the frame is done with a URL to get the frame. Then normal HTTP with its caching and browser niceties moves the big data.
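For example, assuming the official pusher Python package, the per-frame notification could look roughly like this (the channel name, event name, and frame URL scheme are made up for illustration):

```python
# Push a small "frame done" notification; the browser then fetches the frame
# itself over plain HTTP, so caching and the usual browser niceties apply.
import pusher

pusher_client = pusher.Pusher(
    app_id="YOUR_APP_ID",
    key="YOUR_KEY",
    secret="YOUR_SECRET",
    cluster="YOUR_CLUSTER",
)

def notify_frame_done(render_id, frame_number):
    pusher_client.trigger(
        f"render-{render_id}",   # assumed channel naming scheme
        "frame-done",            # assumed event name
        {"frame": frame_number,
         "url": f"/renders/{render_id}/frames/{frame_number}.jpg"},
    )
```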
You're definitely not choosing the easy path. The easy path is to have your long-lived render task update the render state in the database, and have the client poll that. Extra server load, but a lot simpler.
Django itself is really focused on one kind of web interface, the HTTP request/response pattern. To maintain a persistent connection with clients, which socket.io makes dead simple, you need to diverge a bit from a normal Django installation.
This article discusses the issue of doing real-time with Django, with the help of Orbited and Twisted. It's rather old, and it relies on Comet, which is not the preferred way of doing real-time these days.
You might benefit a lot from Socket.io on the client and something like Tornado (wiki) plus a Tornado server-side implementation of Socket.io. But if you really want to stick with Django for the web development (which Tornado also provides), you would need to make the two work together internally, each handling its particular use case.
Finally, this other article discusses how to make Django work with gevent (a coroutine-based networking library for Python) and Socket.io, which might well be your best option otherwise.
Don't hesitate to post questions/comments as they pop up!