I have Django server that servers HTTP requests and updates Postgres DB. There is also Tornado TCP server that should push new data from Database.
The problem is: best way to notify Tornado about DB changes. That's it.
What options did I think of:
Celery. I've read a lot of docs and it seems that this is just a queue for tasks, not for communications. Moreover, those communications should be established between different processes.
Redis. It seems like a good solution that also has brukva asynchronous client for Tornado. I'll need to introduce additional dependency on a project though.
Socket. It seems the most lightweight solution, besides the Tornado TCP server serves lots of client sockets anyways, so adding one more shouldn't be a big deal. I don't want to go this low though.
DB notifications. This was a solution I came up at first, but disliked later. Postgres has native notifications support by mechanisms of NOTIFY and LISTEN that are also asynchronous and can be easily added to Tornado IOLoop (io_loop.add_handler with connection file descriptor). This is low level too, but more important is that this solution violates Data Independency Principle.
Now I need to mention a subtle detail that can make a big difference with what I've described:
99% of changes in DB occur not in Django process, but in separate scripts (and hence separate processes) that are launched by scheduler and use Django ORM.
Now this description more or less reflects the actual state of project.
So is Celery really not suited for solving this kind of problem? Is Redis the best solution? Are there any other that I don't know about? Any thoughts and suggestions are appreciated.
After evaluating all these options, I decided to go with Redis+Tornado approach. It showed itself quite robust and stable (it was 5 years ago, just for the reference). I laid out the whole project with a working example here: https://www.databrawl.com/2017/04/27/real-time-python-via-tcp/
Related
I'm working on a long request to a django app (nginx reverse proxy, mysql db, celery-rabbitMQ-redis set) and have some doubts about the solution i should apply :
Functionning : One functionality of the app allows users to migrate thousands of objects from one system to another. Each migration is logged into a db, and the users are provided the possibility to get in a csv format the history of the migration : which objects have been migrated, which status (success, errors, ...)
To get the history, a get request is sent to a django view, which returns, after serialization and rendering into csv, the download response.
Problem : the serialisation and rendering processes, for a large set of objects (e.g. 160 000) are quite long and the request times out.
Some solutions I was thinking about/found thanks to pervious search are :
Increasing the amount of time before timeout : easy, but I saw everywhere that this is a global nginx setting and would affect every requests on the server.
Using an asynchronous task handled by celery : the concept would be to make an initial request to the server, which would launch the serializing and rendering task with celery, and give a special httpresponse to the client. Then the client would regularly ask the server if the job is done, and the server would deliver the history at the end of processing. I like this one but I'm not sure about how to technically implement that.
Creating and temporarily storing the csv file on the server, and give the user a way to access it & to download it. I'm not a big fan of that one.
So my question is : has anyone already faced a similar question ? Do you have advises for the technical implementation of the solution (#2), or a better solution to propose me ?
Thqnks !
Clearly you should use Celery + RabbitMQ/REDIS. If you look at the docs it´s not that hard to setup.
The first question is whether to use RabbitMQ or Redis. There are many SO questions about this with good information about pros/cons.
The implementation in django is really simple. You can just wrap django functions with celery tasks (with #task attribute) and it´ll become async, so this is the easy part.
The problem I see in your project is that the server who is handling http traffic is the same server running the long process. That can affect performance and user experience even if celery is running on the background. Of course that depends on how much traffic you are expecting on that machine and how many migrations can run at the same time.
One of the things you setup on Celery is the number of workers (concurrent processing units) available. So the number of cores in your machine will matter.
If you need to handle http calls quickly I would suggest to delegate the migration process to another machine. Celery/REDIS can be configured that way. Let´s say you´ve got 2 servers. One would handle only normal django calls (no celery) and trigger celery tasks on the other server (the one who actually runs the migration process). Both servers can connect to the same database.
But this is just an infrastructure optimization and you may not need it.
I hope this answers your question. If you have specific Celery issues it would be better to create another question.
I'm trying to do a Django application with an asynchronous part: Websockets. Just as a little challenge, I want to mount everything in the same process. Tried Socket.IO but couldn't manage to actually use sockets, instead of longpolling (which killed my browser several times, until I gave up).
What I then tried was a not-so-maintained library based on gevent-websocket. However, had many errors and was not easy to debug.
Now I am trying a Tornado approach but AFAIK (please correct me if I'm wrong) integrating async with a regular django app wrapped by WSGIContainer (websockets would go through Tornado, regular connections through Django) will be a true server killer if a resource is heavy or, somehow, the Django ORM goes slow into heavy operations.
I was thinking on moving to Twisted/Cyclone. Before I move from one architecture with such issue to ANOTHER architecture with such issue, i'd like to ask:
Does Tornado (and/or Twisted) have an architecture of scheduling tasks in the same way Gevent does? (this means: when certain greenlets "block", they schedule themselves to other threads, at least until the operation finishes). I'm asking this because (please correct me if I'm wrong) a regular django view will not be suitable for stuff like #inlineCallbacks, and will cause the whole server to be blocked (incl. the websockets).
I'm new to async programming in python, so there's a huge change I have misinformation about more than one concept. Please help me clarifying this before I switch.
Neither Tornado nor Twisted have anything like gevent's magic to run (some) blocking code with the performance characteristics of asynchronous code. Idiomatic use of either Tornado or Twisted will be visible throughout your app in the form of callbacks and/or Futures/Deferreds.
In general, since you'll need to run multiple python processes anyway due to the GIL, it's usually best to dedicate some processes to websockets with Tornado/Twisted and other processes to Django with the WSGI container of your choice (and then put nginx or haproxy in front so it looks like a single service to the outside world).
If you still want to combine django and an asynchronous service in the same process, the next best solution is to use threads. If you want the two to share one listening port, the listener must be a websocket-aware HTTP server that can spawn other threads for WSGI requests. Tornado does not yet have a solution for this, although one is planned for version 4.1 (https://github.com/tornadoweb/tornado/pull/1075). I believe Twisted's WSGI container does support running the WSGI workers in threads, but I don't have any experience with it myself. If you need them in the same process but do not need to share the same port, then you can simply run the IOLoop or Reactor in one thread and the WSGI container of your choice in another (with its associated worker threads).
I am trying to make a realtime messaging application. There will be 2 distinct server(node.js and django) and when a user sends message to another user message will be stored in database than node.js will send a message to receiver like "You have new Message!". For that i am planing to call url which node.js serve. So node.js and django will interact each other. And what is best way send message to specifig client ? (I keep clients with their id's in a assosicative array.)
what do you think about that? is it efficent or do you suggest better way to do this ?
Now that I understand more about what you're trying to do, here my answer, just keep in mind that this only reflects my opinion, and I bet that many others would argue about it.
It all matter on how much traffic you expect to have in your application. If it's not a high traffic application, then efficiency in run-time is insignificant when compared to that of the development, and so choose the technology you feel most comfortable with.
If though you do aim for high traffic application, then I believe that this setup is not a good one.
First of all while http based communication between servers might seem comfortable, you are dealing with the overhead of http over tcp (since http is based on tcp). And so regular tcp sockets scale better, but on the other hand if you write the sockets server in python than you can run it from the same process as the django and then just use it as an object from django (you're entering the realm of threads here). But that's problematic if you have a few web instances, again depends on how much traffic you expect.
As for your choice for implementing the messaging server, I've never tested node.js but I believe that in benchmark tests it won't compare for something written in erlang or Java NIO. For example: JAVA AIO (NIO.2) VS NODEJS
I'm working on a WebApp using backbone.js and socket.io on the client side, and Django on the server side. I'd like to do "push" from the server when data changes (just like in a chat app). I came across two implementation of socket.io in Python with a Django integration that looked promising:
django-socketio which is based on gevent and gevent-socketio
Tornado-based integrations that use torandio2 such as tornadio-with-django and django-tornadio
Both gevent and tornado have very good performance, so I'm not interested in other async connection frameworks. The only other requirement is the use of SSL for the connection - no plain text transmission.
So between these two approached, which would be the easiest to implement? Is there a good subscription-based framework for tornadio2 similar to how django-socketio does it for gevent?
Another option I came across is django-serverpush, which is also based on TornadIO2. It better integrates with Django than the other TornadIO2 apps, but the implementation still needs some improvement before it becomes production-ready.
At the time of writing this answer, django-socketio still hasn't been fully upgraded to work with the latest socket.io.
Tornado/TornadIO2 on the other hand are well maintained, and with a few custom extensions I was able to get them up and running very nicely. After I launch my product, I'm hoping to spend some time to open-source my modifications. Until then, I'd be happy to answer any questions on how to get this running.
Honestly I'd say they're pretty similar. This is more of an opinion. For performance I think gevent has more performance based on what I've read, but you should do your own tests to find out which has highest performance.
I have read all the questions and answers I can find regarding Django and HTTP Push. Yet, none offer a clear, concise, beginning-to-end solution about how to accomplish a basic "hello world" of so-called "comet" functionality.
First question (1): To what extent is the problem that HTTP simply isn't (at least so far) made for this? Are all the potential solutions essentially hacks?
2) What's the best currently available solution?
Orbited?
Some other Twisted-based solution?
Tornado?
node.JS?
XMPP w/ BOSH?
Some other solution?
3) How does nginx push module play into this discussion?
4) Which of these solutions require replacement of the typical mod_wsgi / nginx (or apache) deployment model? Why do they require this? Is this a favorable transition in any case?
5) How significant are the advantages of using a solution that is already in Python?
Alex Gaynor's presentation from PyCon 2010, which I just watched on blip.tv, is amazing and informative, but not terrifically specific on the current state of HTTP Push in Django. One thing that he said that gave me some confidence was this: Orbited does a good job of abstracting and simulating the concept of network sockets. Thus, when WebSockets actually land, we'll be in a good place for a transition.
6) How does HTML5 Websockets differ from current solutions? Is Gaynor's assessment of the ease of transition from Orbited accurate?
I'd take a look at evserver (http://code.google.com/p/evserver/) if all you need is comet.
It "supports [the] little known Asynchronous WSGI extension" and is build around libevent. Works like a charm and supports django. The actual handler code is a bit ugly, but it scales well as it really is async io.
I have used evserver and I'm currently moving to cyclone (tornado on twisted) because I need a little more than evserver offsers. I need true bidirectional io (think socket.io (http://socket.io/)) and while evserver could support it I thought it was easier to reimplement tornado's socket.io in cyclone (I opted for cyclone instead of tornado as cyclone is build on twisted, thus allowing for more transports that aren't implemented in twisted (i.c. zeromq)) Socket.io supports websockets, comet style polling, and, much more interseting, flash based websockets. I think that in most practical situations websockets + flash based websockets are enough to support 99% (according to adobe flash penetration is about 99% (http://www.adobe.com/products/player_census/flashplayer/version_penetration.html)) of a websites visitors (only people not using flash need to fallback to one of socket.io its (less perfomant and resource hogging) backup transports)
Be aware though websockets are not an http transport thus putting them behind http based proxies (e.g haproxy in http mode) breaks the connection. Better serve them on an alternate ip or port so you can proxy in tcp mode (e.g haproxy in tcp mode).
To answer your questions:
(1) If you don't need a bidirectional transport longpolling based solutions are good enough (all they do is keep a connection open). Things do get iffy when you need your connection to be statefull or you need to be able to both send and receive data. In the latter case socket.io helps. However websockets are made for this scenario and with the support of flash its available to most of a websites vistors (via socket.io or standalone, however socket.io has the added benefit of backup transports for those people not wanting to install flash)
(2) if all you need is push, evserver is your best bet. It uses the the same javascripts on the client side as orbited. Else look at socket.io (this also needs a supporting server, the only python one available is tornado.)
(3) It's just one other server implementation. If i read it correctly it's push only. pushing data to a client is done by making http equest from your app to the nginx server. (nginx then takes care they reach the client). If you're inteersted in this, look at mongrel2 (http://mongrel2.org/home) it not only has handlers for longpolling but also for websockets.(instead of making http request to mongrel, this time you use zeromq handlers to get data to your mongrel server) (Do take note of the developer's lack of enthusiasm for websockets and flash based websockets. Especially taking into account that the websocket protocol tends to evolve you might, at some point, need to recode mongrel2's websocket support yourself keep having support for websockets)
(4) All solutions except evserver replace wsgi with something else. Though most servers also have some wsgi support ontop of this "something else". No matter what solution you choose be careful that one cpu intensive or otherwise io blocking request doesn't block the server. (you either need multiple instances or threads).
(5) Not very significant. All solutions depend on some custom handlers to push (and, if applicable, receive) data to the client. All solutions i mentioned allow these handlers to be written in python. If you want to use a completely different framework (node.js) then you have to weigh the ease of node.js (it's assumed to be easy, but it's also rather experimental, and i found very few libraries to be actually stable) against the convenience of using your existing code base and the available libraries (e.g. if your app needs a blog ther are plenty django blogs you could plug in, but none for node.js) Also don't stare yourself blind on performance stats. unless you plan to push dumb predefined data (what all benchmarks do) to the client you'll find that the actual processing of data adds much more overhead than even the worst async io implementation. (But you still want to use an async io based server if you plan to have many simultaneous clients, threading simply isn't meant to keep thousands of connections alive)
(6) websockets offer bidirectional communication, long polling/comet only pushes data but does not accept writes. (Socket.io simulates this bidirectional support by using two http requests, one to longpoll, one to send data. It tracks their interdependance by a (session) id that's part of both requests query string). flash based websockets are similar to real websockets (the difference is that their implementation is in the swf, not your browser). Also the websockets protocol does not follow the http protocol; longpolling/comet stuff does (technically the websocket client sends an upgrade request to websocket server, the upgraded protocol isn't http anymore)
There is support for WebSockets with django-websocket, but unfortunately there are major issues with it for getting it working; here's a quote from that page:
Disclaimer (what you should know when using django-websocket)
BIG FAT DISCLAIMER - right at the moment its technically NOT possible in any way to use a websocket with WSGI. This is a known issue but cannot be worked around in a clean way due to some design decision that were made while the WSGI stadard was written. At this time things like Websockets etc. didn't exist and were not predictable.
...
But not only WSGI is the limiting factor. Django itself was designed around a simple request to response scenario without Websockets in mind. This also means that providing a standard conform websocket implemention is not possible right now for django. However it works somehow in a not-so pretty way. So be aware that tcp sockets might get tortured while using django-websocket.
So at the moment, WSGI: no go; Django: hardly any go, even with django-websockets; see also a comment in the author's original announcement:
I can't say this looks like a good idea. You're doing long-lived connections in a way that is going to require threading. django-websocket requires threading setup, and won't work if you've got processes (because you'd just have too many processes) but threads won't scale for a lot of connections at the same time, either, so its just a false safety. You need an asynchronous platform for long-lived things, and I do this by doing my app in Django and my comet and websocket in Node.js
Personally if trying to use WebSockets (which I expect to be next year), I would try the combination of Twisted and Cyclone first. They're designed to cope with WebSockets, and scale well. If you write your code properly to remove unnecessary dependencies on Django, you should be able to use much of your code in a Twisted-based system. This is a very distinct advantage over using Node.js or Comet or any system in another language. You could also make a simple push
Finally, you could also just decide it's too hard and use an external service to provide the push support. That then becomes a matter of sending a simple JSON request to their servers instead of worrying about how to make the connection and how concurrency will work and things like that. Of course, you'll need to pay for it (though currently it may be free while in Beta), but you don't need to worry about implementation details; you won't have the full power of WebSockets that way though - just push support.
I can't believe it's been over six years since I asked this question.
Async with Django (and the associated network traffic, eg websockets) has been an itch for many of us in the community. I have taken these past few years, to among other things, scratch this itch.
hendrix
hendrix is a WSGI/ASGI conatiner that runs on Twisted. It has been a project mainly driven by 5 enthusiasts, with help and funding from some visionary organizations. It is in production today at dozens, but not hundreds, of companies.
I'll leave it to you to read the documentation to see why it's the best solution to this problem, but a few quick highlights:
it's based on Twisted, requires no knowledge or use of Twisted internals, but leaves them all available
It "just works" in the sense that you don't need any special server or process configuration to do async and socket traffic from within your Django (or Pyramid, or Flask) app
It is very likely to be forward-compatible with ASGI, the Django Channels standard, and is in some meaningful ways the first ASGI container
It ships with simple APIs that maintain the flow of your view logic and are easy to unit test.
Please see this talk that I gave at Django-NYC (at the Buzzfeed offices) for more information about why I think this is the best answer to this question.
Re question #2, I recently was given a tour of the internals of a Django app that uses Comet heavily, and Orbited was the solution they chose.