Django / Comet (Push): Least of all evils?

I have read all the questions and answers I can find regarding Django and HTTP Push. Yet, none offer a clear, concise, beginning-to-end solution about how to accomplish a basic "hello world" of so-called "comet" functionality.
First question (1): To what extent is the problem that HTTP simply isn't (at least so far) made for this? Are all the potential solutions essentially hacks?
2) What's the best currently available solution?
Orbited?
Some other Twisted-based solution?
Tornado?
node.JS?
XMPP w/ BOSH?
Some other solution?
3) How does the nginx push module play into this discussion?
4) Which of these solutions require replacement of the typical mod_wsgi / nginx (or apache) deployment model? Why do they require this? Is this a favorable transition in any case?
5) How significant are the advantages of using a solution that is already in Python?
Alex Gaynor's presentation from PyCon 2010, which I just watched on blip.tv, is amazing and informative, but not terrifically specific on the current state of HTTP Push in Django. One thing that he said that gave me some confidence was this: Orbited does a good job of abstracting and simulating the concept of network sockets. Thus, when WebSockets actually land, we'll be in a good place for a transition.
6) How do HTML5 WebSockets differ from current solutions? Is Gaynor's assessment of the ease of transition from Orbited accurate?

I'd take a look at evserver (http://code.google.com/p/evserver/) if all you need is comet.
It "supports [the] little known Asynchronous WSGI extension" and is built around libevent. It works like a charm and supports Django. The actual handler code is a bit ugly, but it scales well because it really is async IO.
I have used evserver and I'm currently moving to cyclone (tornado on twisted) because I need a little more than evserver offers. I need true bidirectional IO (think socket.io (http://socket.io/)), and while evserver could support it, I thought it was easier to reimplement tornado's socket.io in cyclone. (I opted for cyclone instead of tornado because cyclone is built on twisted, thus allowing for more transports that aren't implemented in twisted, in this case zeromq.) Socket.io supports websockets, comet-style polling and, much more interesting, flash-based websockets. I think that in most practical situations websockets plus flash-based websockets are enough to support 99% of a website's visitors (according to adobe, flash penetration is about 99% (http://www.adobe.com/products/player_census/flashplayer/version_penetration.html)); only people not using flash need to fall back to one of socket.io's (less performant and resource-hogging) backup transports.
Be aware, though, that websockets are not an HTTP transport, so putting them behind HTTP-based proxies (e.g. haproxy in http mode) breaks the connection. It is better to serve them on an alternate IP or port so you can proxy in TCP mode (e.g. haproxy in tcp mode).
To answer your questions:
(1) If you don't need a bidirectional transport, long-polling based solutions are good enough (all they do is keep a connection open). Things do get iffy when you need your connection to be stateful or you need to be able to both send and receive data. In the latter case socket.io helps. However, websockets are made for this scenario, and with the support of flash it's available to most of a website's visitors (via socket.io or standalone; socket.io has the added benefit of backup transports for those people not wanting to install flash).
(2) If all you need is push, evserver is your best bet. It uses the same JavaScript on the client side as orbited. Otherwise look at socket.io (this also needs a supporting server; the only python one available is tornado).
(3) It's just one other server implementation. If I read it correctly, it's push only. Pushing data to a client is done by making an HTTP request from your app to the nginx server (nginx then takes care that it reaches the client). If you're interested in this, look at mongrel2 (http://mongrel2.org/home): it not only has handlers for long polling but also for websockets (instead of making HTTP requests to mongrel, this time you use zeromq handlers to get data to your mongrel server). Do take note of the developer's lack of enthusiasm for websockets and flash-based websockets. Especially taking into account that the websocket protocol tends to evolve, you might at some point need to recode mongrel2's websocket support yourself to keep websockets working.
(4) All solutions except evserver replace wsgi with something else, though most servers also have some wsgi support on top of this "something else". No matter what solution you choose, be careful that one CPU-intensive or otherwise blocking request doesn't block the server (you either need multiple instances or threads).
(5) Not very significant. All solutions depend on some custom handlers to push (and, if applicable, receive) data to the client. All solutions I mentioned allow these handlers to be written in python. If you want to use a completely different framework (node.js), then you have to weigh the ease of node.js (it's assumed to be easy, but it's also rather experimental, and I found very few libraries to be actually stable) against the convenience of using your existing code base and the available libraries (e.g. if your app needs a blog, there are plenty of django blogs you could plug in, but none for node.js). Also, don't fixate on performance stats. Unless you plan to push dumb predefined data (which is what all benchmarks do) to the client, you'll find that the actual processing of data adds much more overhead than even the worst async IO implementation. (But you still want to use an async IO based server if you plan to have many simultaneous clients; threading simply isn't meant to keep thousands of connections alive.)
(6) Websockets offer bidirectional communication; long polling/comet only pushes data but does not accept writes. (Socket.io simulates this bidirectional support by using two HTTP requests, one to long-poll and one to send data. It tracks their interdependence by a (session) id that's part of both requests' query strings.) Flash-based websockets are similar to real websockets (the difference is that their implementation is in the swf, not your browser). Also, the websocket protocol does not follow the HTTP protocol; long polling/comet does (technically the websocket client sends an upgrade request to the websocket server; the upgraded protocol isn't HTTP anymore).
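To make the "upgrade request" in (6) concrete, here is a minimal sketch of the server side of that handshake. It assumes the handshake as later standardized in RFC 6455; the protocol drafts that were current when this answer was written used a different handshake, so treat this as an illustration of the idea rather than what any server mentioned above actually did. The function names are made up for the example.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    raw = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    return base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")


def upgrade_response(request_headers: dict) -> str:
    """Build the 101 response; after it is sent, the connection no longer speaks HTTP."""
    if request_headers.get("Upgrade", "").lower() != "websocket":
        raise ValueError("not a WebSocket upgrade request")
    accept = websocket_accept(request_headers["Sec-WebSocket-Key"])
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept}\r\n"
        "\r\n"
    )
```

The request itself is ordinary HTTP, which is why it can share a port with a web server, but everything after the 101 response is framed WebSocket traffic, and that is what an HTTP-mode proxy trips over.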

There is support for WebSockets with django-websocket, but unfortunately there are major issues with getting it working; here's a quote from that page:
Disclaimer (what you should know when using django-websocket)
BIG FAT DISCLAIMER - right at the moment its technically NOT possible in any way to use a websocket with WSGI. This is a known issue but cannot be worked around in a clean way due to some design decision that were made while the WSGI stadard was written. At this time things like Websockets etc. didn't exist and were not predictable.
...
But not only WSGI is the limiting factor. Django itself was designed around a simple request to response scenario without Websockets in mind. This also means that providing a standard conform websocket implemention is not possible right now for django. However it works somehow in a not-so pretty way. So be aware that tcp sockets might get tortured while using django-websocket.
So at the moment, WSGI: no go; Django: hardly any go, even with django-websockets; see also a comment in the author's original announcement:
I can't say this looks like a good idea. You're doing long-lived connections in a way that is going to require threading. django-websocket requires threading setup, and won't work if you've got processes (because you'd just have too many processes) but threads won't scale for a lot of connections at the same time, either, so its just a false safety. You need an asynchronous platform for long-lived things, and I do this by doing my app in Django and my comet and websocket in Node.js
Personally if trying to use WebSockets (which I expect to be next year), I would try the combination of Twisted and Cyclone first. They're designed to cope with WebSockets, and scale well. If you write your code properly to remove unnecessary dependencies on Django, you should be able to use much of your code in a Twisted-based system. This is a very distinct advantage over using Node.js or Comet or any system in another language. You could also make a simple push
Finally, you could also just decide it's too hard and use an external service to provide the push support. That then becomes a matter of sending a simple JSON request to their servers instead of worrying about how to make the connection and how concurrency will work and things like that. Of course, you'll need to pay for it (though currently it may be free while in Beta), but you don't need to worry about implementation details; you won't have the full power of WebSockets that way though - just push support.

I can't believe it's been over six years since I asked this question.
Async with Django (and the associated network traffic, e.g. websockets) has been an itch for many of us in the community. I have taken these past few years to, among other things, scratch this itch.
hendrix
hendrix is a WSGI/ASGI container that runs on Twisted. It has been a project mainly driven by 5 enthusiasts, with help and funding from some visionary organizations. It is in production today at dozens, but not hundreds, of companies.
I'll leave it to you to read the documentation to see why it's the best solution to this problem, but a few quick highlights:
It's based on Twisted, requires no knowledge or use of Twisted internals, but leaves them all available
It "just works" in the sense that you don't need any special server or process configuration to do async and socket traffic from within your Django (or Pyramid, or Flask) app
It is very likely to be forward-compatible with ASGI, the Django Channels standard, and is in some meaningful ways the first ASGI container
It ships with simple APIs that maintain the flow of your view logic and are easy to unit test.
Please see this talk that I gave at Django-NYC (at the Buzzfeed offices) for more information about why I think this is the best answer to this question.

Re question #2, I recently was given a tour of the internals of a Django app that uses Comet heavily, and Orbited was the solution they chose.

Related

Why is *SGI + Nginx/HTTP considered the best practice for deploying web applications?

My friend recently asked me the following question: given that Django already has runserver, why wasn't it extended to be a production-ready customer-facing HTTP server? What people do instead is set up a uWSGI server that speaks WSGI and exposes something that Nginx forwards traffic to by reverse proxying...
Based on what I know, many other languages use this pattern: there is a "simple" HTTP server meant for development, as well as an interface for *GI (ASGI/WSGI/FCGI/CGI) that a web server is supposed to reverse proxy to. What is the main reason those development servers don't grow production-ready and instead assume the presence of another web server?
Here are some of my theories, but I'm not sure if I'm missing something more significant:
History: dynamic websites date back to perl/PHP, both worked as a "dumb" CGI backend that was basically a filter that processed HTTP request (stdin) to a response (stdout). This architecture worked for some time and became a common pattern,
Performance: web applications are often written in languages that don't JIT and having a web server written in such a language would introduce extra overhead while milliseconds matter. Also, this lets us speed up static file serving,
Security: Django's runserver is clearly described as potentially insecure, according to this quote:
DO NOT USE THIS SERVER IN A PRODUCTION SETTING. It has not gone through security audits or performance tests. (And that’s how it’s gonna stay.
The last point seems to suggest that writing a production-ready HTTP server is too complex to fit within Django's goals, what kind of edge cases would need to be supported to get there?
Are any of these points actually valid, or am I missing the elephant in the room here?
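For context, the application side of the split described above is deliberately small. The sketch below is a minimal WSGI application plus a helper that calls it the way a server would, with no sockets involved; `call_without_a_server` is a hypothetical helper written for this illustration, not part of Django or uWSGI.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A complete WSGI application: one callable, no HTTP parsing of its own."""
    body = f"Hello from {environ['PATH_INFO']}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_without_a_server(path):
    """Invoke the application the way a WSGI server would, without any sockets."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys a real server would provide
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    chunks = application(environ, start_response)
    return captured["status"], b"".join(chunks)
```

Everything that makes an HTTP server hard to get right (parsing untrusted input, timeouts, slow clients, TLS, static files) sits on the other side of that call, which is part of why frameworks can leave it to a dedicated server.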

Because they don't want to get into the web server business, and I think that's a wise decision.
Creating, developing and most importantly maintaining a web server is not a trivial thing. They couldn't simply write it once and then it's done (in fact, that's pretty much what they did and it's runserver).
Rather than re-invent the wheel, they've chosen to leave it to those who do it best. They're not likely to match the stability and functionality of a proper web server by doing it as a side-project to support running Django applications. They're better spending their time making Django better.
It's also consistent with the UNIX philosophy, but that's not necessary to get into here.

Django node.js socket.io

I am trying to make a realtime messaging application. There will be 2 distinct servers (node.js and django). When a user sends a message to another user, the message will be stored in the database, then node.js will send a message to the receiver like "You have new Message!". For that I am planning to call a URL which node.js serves, so node.js and django will interact with each other. And what is the best way to send a message to a specific client? (I keep clients with their ids in an associative array.)
What do you think about that? Is it efficient, or do you suggest a better way to do this?

Now that I understand more about what you're trying to do, here's my answer. Just keep in mind that this only reflects my opinion, and I bet that many others would argue about it.
It all depends on how much traffic you expect to have in your application. If it's not a high-traffic application, then run-time efficiency is insignificant compared to development efficiency, so choose the technology you feel most comfortable with.
If, though, you do aim for a high-traffic application, then I believe that this setup is not a good one.
First of all, while HTTP-based communication between servers might seem comfortable, you are dealing with the overhead of HTTP on top of TCP (since HTTP is based on TCP), so regular TCP sockets scale better. On the other hand, if you write the socket server in python, then you can run it in the same process as django and just use it as an object from django (you're entering the realm of threads here). But that's problematic if you have a few web instances; again, it depends on how much traffic you expect.
As for your choice for implementing the messaging server, I've never tested node.js, but I believe that in benchmark tests it won't compare to something written in erlang or Java NIO. For example: JAVA AIO (NIO.2) VS NODEJS
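On the "send to a specific client" part of the question, the associative array keyed by user id can be sketched roughly as below. It is written in Python to match the Django side of the thread, although the question keeps this state in node.js. `ClientRegistry` and the `send` callables are hypothetical names, and a real server would store socket or connection objects instead.

```python
class ClientRegistry:
    """Maps a user id to the open connections that belong to that user."""

    def __init__(self):
        self._connections = {}  # user id -> set of send callables

    def register(self, user_id, send):
        self._connections.setdefault(user_id, set()).add(send)

    def unregister(self, user_id, send):
        connections = self._connections.get(user_id)
        if connections is None:
            return
        connections.discard(send)
        if not connections:
            del self._connections[user_id]  # do not keep empty entries around

    def notify(self, user_id, message):
        """Deliver message to every open connection of user_id; return the count."""
        delivered = 0
        for send in list(self._connections.get(user_id, ())):
            send(message)
            delivered += 1
        return delivered
```

Keeping a set per user rather than a single connection covers the case where one user has several tabs or devices open. Note that this state lives in one process, which is the same limitation the answer raises about running several web instances.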

Socket Server vs. Standard Servers

I'm working on a project of which a large part is server side software. I started programming in C++ using the sockets library. But, one of my partners suggested that we use a standard server like IIS, Apache or nginx.
Which one is better in the long run? When I program it in C++, I have direct access to the raw requests, whereas with standard servers I need to use a scripting language to handle the requests. In any case, which one is the better option and why?
Also, when it comes to security for things like DDOS attacks etc., do the standard servers already have protection? If I would want to implement it in my socket server, what is the best way?

"Server side software" could mean lots of different things, for example this could be a trivial app which "echoes" everything back on a specific port, to a telnet/ftp server to a webserver running lots of "services".
So where in this gamut of possibilities does your particular application lie? Without further information, it's difficult to make any suggestions, but let's see..
1. Web services, i.e. your "server side" requirement is to handle individual requests and respond having done some set of business logic. Typically communication is via SOAP/XML, and this is ideal if you have web-based clients (though nothing prevents you from accessing these services via standalone clients). Typically you host these on web servers as you mentioned, and often they are easiest written in Java (I've yet to come across one that needed to be written in C++!)
2. Simple web site - slightly different to the above; it responds to HTTP GET/POST requests and serves up static or dynamic content (I'm guessing this is not what you're after!)
3. Standalone server which responds to something specific. Here you'd have to implement your own "messaging"/protocols etc., and the server will carry out a specific function on an incoming request and potentially send responses back. The key thing here is that the server does something specific and is not a generic container (at which point 1 makes more sense!)
So where does your application lie? If 1 or 2, use Java or some scripting language (such as Perl/ASP/JSP etc.). If 3, you can certainly use C++, and if you do, use a suitable abstraction such as boost::asio and Google Protocol Buffers to save yourself a lot of headache...
With regard to security, of course bugs and security holes are found all the time; however, the good thing with some of these open-source projects is that the community will tackle and fix them. Let's just say you'll be safer using them than your own custom hand-rolled implementation. The likelihood that you'll be able to address all the issues that they have encountered in the years they've been around is very small (no disrespect to your abilities!)
EDIT: now that there's a little more info, here is one possible approach (this is what I've done in the past, and I've used Java most of the way..)
1. The client-facing server should be something reliable, especially if it's exposed to the internet. Here I would use a proven product; something like Apache or IIS is good (depending on which technologies you have available). IMHO, I would go for jBoss AS - a really powerful and easily customisable piece of kit that integrates really nicely with lots of different things (all Java of course!) You could then have a simple bit of Java which delegates to your actual server processes that do the work..
2. For the server processes you can use C++ if that's what you are comfortable with.
There is one key bit which I left out, and this is how 1 & 2 talk to each other. This is where you should look at an open source messaging product (even higher level than asio or protocol buffers), and here I would look at something like Zero MQ or Red Hat Messaging (both are MQ messaging protocols). The great advantage of this type of "messaging bus" is that there is no tight coupling between your servers. With your own hand-rolled implementation you'll be doing lots of boilerplate to get the interaction to work just right; with something like MQ you'll have multiplatform communication without having to get into the details... You will save yourself a lot of time and bother if you elect to use something like that.. (btw. there are other messaging products out there, and some are easier to use - such as Tibco RV or EMS etc.; however, they are commercial products and licenses will cost a lot of money!)
With a messaging solution your servers become trivial, as they simply handle incoming messages and send messages back out again, and you can focus on the business logic...
my two pennies... :)

If you opt for the 1st solution in Nim's list (web services), I would suggest you have a look at WSO2's web services framework for C++, Axis CPP and the Axis2/C web services framework (if you are not restricted to C++). Web services might be the best solution for your requirement, as you can quickly build them and use them either as processing or proxy modules on the server side of your system.

online game best-practice

I'm developing a django-based MMO, and I'm wondering what would be the best way for server-client communication. The solutions I found are:
periodical AJAX calls
keeping a connection alive and sending data through it
Later edit:
This would consist of "you have a message", "user x attacked you", "your transport to x has arrived" and stuff like this. They could grow in number (something like 1/second), but for a typical user they shouldn't reach 1/minute.

Not sure if it's applicable to what you're looking for, but there's a pretty good live example of lightweight server-client communication using node.js for a simple chat service:
http://chat.nodejs.org/

You might want to take a look at crossbar
Crossbar.io is an open-source server software that allows developers
to create distributed systems, composed of application components
which are loosely coupled, communicate in (soft) real-time and can be
implemented in different languages

There's also a third technique involving "hanging" queries:
Client requests an updated page (or whatever)
Server doesn't answer right away
Sometime before the request times out, there's a state update in the server, and the server finally answers the client, which can then update.
If there really is nothing new to tell the client within the update period, then the server responds before the timeout with a "no news" message, and the client starts up another "hanging" request.
Advantages:
Client doesn't have to do Ajax. You could even make regular HTML pages "interactive" like this.
Probably not quite as much senseless polling traffic.
Disadvantages:
Server needs to keep more active connections open, and service them at least once per timeout period.
Depending on how well the server code supports multi-threading (does PHP provide any help there?), it may be more difficult to code than AJAX response handling.
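The "hanging" request technique above can be sketched with a condition variable: each request waits until the state changes or its timeout expires, and then answers either with the update or with "no news". This is a minimal thread-based sketch, assuming one thread per hanging request; `UpdateChannel` and its method names are hypothetical, and an async server would replace the blocking wait with its own event mechanism.

```python
import threading


class UpdateChannel:
    """Holds the latest state version and lets requests wait for a change."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._payload = None

    def publish(self, payload):
        """Record a state update and wake every hanging request."""
        with self._cond:
            self._version += 1
            self._payload = payload
            self._cond.notify_all()

    def wait_for_update(self, last_seen_version, timeout):
        """Block until something newer than last_seen_version exists, or time out."""
        with self._cond:
            changed = self._cond.wait_for(
                lambda: self._version > last_seen_version, timeout=timeout
            )
            if not changed:
                # Timed out: answer "no news" so the client can hang a new request.
                return {"version": last_seen_version, "news": None}
            return {"version": self._version, "news": self._payload}
```

The client sends back the last version it saw with every request, so an update that arrives between two hanging requests is still delivered on the next one. The sketch only remembers the latest payload; if several updates can arrive between requests, a list of pending messages per client would be needed instead.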

Best messaging medium for real-time SOA applications?

I'm working on a real-time application implemented in a SOA style (read: loosely coupled components connected via some messaging protocol - JMS, MQ or HTTP).
The architect who designed this system opted to use JMS to connect the components. This system is real time, so there is no need to queue up messages should one component fail (the transaction will simply time out). Further, there is no need for guaranteed delivery or rollback.
In this instance, is there any benefit to using JMS over something like an HTTP web service (speed, resource footprint, etc)?
One thing that I'm thinking is that since the JMS approach requires us to set a thread pool size (the number of components listening to a JMS topic/queue), wouldn't an HTTP service be a better fit, since this additional configuration is not needed (a new thread is created for each HTTP request, making the application scalable to an "unlimited" number of requests until the server runs out of resources)?
Am I missing something?

I don't disagree with the points made by S.Lott at all, but here are a couple of points to consider regarding HTTP web services:
Your clients only need to know how to communicate via HTTP - a protocol well supported by just about every modern language in one form or another. JMS, though popular, is more specialist than HTTP, and so restricts the languages your interconnected systems can use. Perhaps not an issue for your system at the moment, but will you need to plug in other systems later that might struggle to support JMS connectivity?
Standards like WSDL and SOAP, which you could leverage for your services, are well supported by many languages, and there are plenty of tools around that will generate code to implement both ends of the pipeline (client and server) for you from a WSDL file, reducing the amount of dev you'll have to do. These standards also make it relatively simple to define and publish the specification of the data you'll be passing between your systems, something you'll presumably have to do by hand using a queueing technology like JMS.
On the downside, as pointed out by S.Lott, JMS gives you functionality that you throw away using the (stateless) HTTP protocol: guaranteed ordering & reliability; monitoring; scalability; etc. Are you sure you don't need these, and won't need these going forward?
Great question, btw.

I think it's really dependent on the situation. Where I work, we support Remoting, JMS, MQ, HTTP, and sFTP. We are implementing a middleware appliance that speaks Remoting, JMS, MQ, and HTTP, and a software middleware component that speaks JMS, MQ, and HTTP.
As sgreeve alluded to above, standards help us become flexible, but proprietary formats allow more functionality.
In a nutshell, I'd say use HTTP for stateless calls (which could end up meeting almost all of your needs), and whatever proprietary formats you need for stateful calls. If you work in a big enterprise, a hardware appliance is usually a great fit as middleware: Lightning fast compression, encryption, transformation, and translation, with very low total cost of ownership.

I don't know enough about your requirements, but you may be overlooking Manageability, Flexibility and Performance.
JMS allows you to monitor and manage the queue. These are features HTTP lacks, and you'd have to build rather than buy from a vendor.
Also, there are queues and topics in JMS, allowing multiple subscribers to a single publisher. That is not possible in plain HTTP.
While you may not need those things in release 1.0, you might want them in the future.
Also, JMS may be able to use other transport mechanisms like named sockets, which reduces the overheads if there isn't all that socket negotiation going on with (almost) every request.

If you go down the HTTP route and you want to support more than one machine or some kind of reliability - you are going to need a load balancer capable of discovering the available web servers and loading requests across them - then failing over to another web server if a particular box/process dies. Clients making HTTP requests are also going to have to deal with servers failing and retrying operations in some loop.
This is one of the main features of a message queue - reliable load balancing with failover and loose coupling among the producers and consumers without them having to include retry logic - so your client or server code doesn't have to worry about this kinda thing. This is totally separate to whether or not you want message persistence or want to use ACID transactions to produce/consume messages (which can be very handy BTW).
If you focus just on the server side using Java - whether Servlets or MessageListener/MDBs they are kinda similar either way really. The difference is the load balancer.
So maybe the question should really be - is a JMS broker easier to set up & work with than setting up your DNS/NAT/IP/HTTP load balancer infrastructure?
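The load balancing described above falls out of several consumers pulling from one shared queue: whichever consumer is free takes the next message, and the producer never picks a target. The sketch below uses an in-process `queue.Queue` as a stand-in for a broker. It is not JMS, `run_workers` is a hypothetical name, and `worker_count` plays the role of the listener pool size mentioned in the question.

```python
import queue
import threading


def run_workers(jobs, worker_count):
    """Process jobs with a fixed pool of consumers pulling from one shared queue."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def consumer(name):
        while True:
            job = work.get()
            if job is None:  # sentinel: no more work for this consumer
                work.task_done()
                return
            with results_lock:
                results.append((name, job * 2))  # stand-in for real business logic
            work.task_done()

    threads = [
        threading.Thread(target=consumer, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for job in jobs:  # the producer only enqueues; it never chooses a consumer
        work.put(job)
    for _ in threads:
        work.put(None)
    work.join()
    for thread in threads:
        thread.join()
    return results
```

With plain HTTP the equivalent choice of "who handles this request" has to be made by a load balancer in front of the servers, and the retry on failure has to be made by the client.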

I suppose it depends on what you mean by real-time... Neither JMS nor HTTP in my opinion support "real-time" applications well, meaning they cannot offer predictable/deterministic performance nor properly prioritize flows in the presence of contention.
Part of it is that these technologies are built on top of TCP, which serializes all traffic into a single FIFO, meaning that different traffic flows cannot be easily prioritized. Moreover, TCP timers are not easily controlled, resulting in unpredictable blocking and timeouts... For this reason many streaming applications use UDP instead of TCP as an underlying protocol.
Another problem with JMS is that typical implementations use a broker that centralizes message dispatch. This is not the best architecture to get deterministic performance.
If you are looking for a middleware that can offer you the kind of reliability guarantees and publish-subscribe semantics you get with JMS, but was developed to fit the real-time application domain I recommend you take a look at the OMG Data-Distribution Service (DDS). See dds.omg.org and this article I wrote arguing why DDS is the best middleware to implement a real-time SOA. http://soa.sys-con.com/node/467488
