How to process requests and cookies using C++ wtih NGINX? - c++

We are planning on migrating from IIS to Nginx to gain performance. Our web layer is very lightweight - for each request we are reading/setting cookies and perform some very quick data cleanup and passing it down to very fast storage (Aerospike). Most of the requests take under 100ms, but we are experiencing inefficiencies due to IIS binding thread to each request. We are processing A LOT of concurrent requests.
Whats the best way to accomplish the same thing in Nginx? I know it would probably make sense to C++ to do most of my processing. Where do I take care of cookies, can I do it with C++? How do I forward a request from Nginx down to a compiled C++ binary effectively.
Thanks for your help!

You will need to write a module in nginx. The basic module should be written in C. But I guess you will be able write the main workhorse functions in c++. Unfortunately the api for module development etc is not documented well but Evan Millers Guide is your best guide.

Related

Writing a REST Service in C++ with Nginx

I'm a bit underwhelmed by the Nginx module documentation. I've a lot of C++ code, and a REST Service already running using Boost Beast, and I'd like to compare performance between Beast and NGINX using the C++ module interface against a Benchmark I'll write accordingly to my needs.
I've seen this tutorial here: https://www.evanmiller.org/nginx-modules-guide.html
But I've thus far not seen a concise, short example to just get started.
Is there a hidden documentation? Alternatively, do you have an example showing how to use Nginx as a REST service in C(++)?
Short answer: Do not embed any application code into nginx.
Long answer:
You can make new nginx module to help nginx to do its work better, for example:
add some new method of authentication
or some new transport to back-end, like shared memory.
Nginx was designed to serve static content, proxy requests and do some filtering like modifying headers.
Main objective of nginx - do these things as fast as possible and spend as less resources as possible.
It allows your application server to scale dynamically without affecting currently connected users.
Nginx is good web server but was never designed to become application server.
It does not makes much sense to embed application logic into nginx just because it is built with C language.
If you need to have the best of both worlds (proxy, static files and rest server) then just use them both (nginx and Beast) with each having its own responsibility.
Nginx will take care of balancing, encryption and any other non-application specific function and app server will do its work.
Nginx's architecture is based on non-blocking network/file calls and all connections are served in a single thread and Nginx do it well because it just shuffles data back and forth.
If the code of your application can generate content fast and without blocking calls to external services then you could embed your app into nginx with consequences of loosing scalability. And if some part of your app requires CPU bound work or blocking calls then you need to move such things off main networking loop and it complicates things "a bit".
By embedding your logic into nginx you could probably save some microseconds and file handles on communications.
For multi-user websocket app like chat or stock feed (i.e. app with long-term open connections) it could liberate extra resources but for the REST app with fast responses it would not make any gain.
Your REST app most likely uses SSL encryption. This encryption adds much more microseconds(milliseconds) to your response time compared to what you could gain by such implementation.
My advice: Leave nginx to do its things and do not interfere with it

Why is *SGI + Nginx/HTTP considered the best practice for deploying web applications?

My friend recently asked me the following question: given that Django already has runserver, why didn't wasn't it extended to be a production-ready customer-facing HTTP server? What people do instead is set up an uwsgi server that speaks WSGI and exposes something that Nginx forwards traffic to by reverse proxying...
Based on what I know, many other languages use this pattern: there is a "simple" HTTP server meant for development, as well as an interface for *GI (ASGI/WSGI/FCGI/CGI) that web server is supposed to reverse proxy to. What is the main reason those web servers don't grow production-ready and instead assume presence of another web server?
Here are some of my theories, but I'm not sure if I'm missing something more significant:
History: dynamic websites date back to perl/PHP, both worked as a "dumb" CGI backend that was basically a filter that processed HTTP request (stdin) to a response (stdout). This architecture worked for some time and became a common pattern,
Performance: web applications are often written in languages that don't JIT and having a web server written in such a language would introduce extra overhead while milliseconds matter. Also, this lets us speed up static file serving,
Security: Django's runserver is clearly described as potentially insecure, according to this quote:
DO NOT USE THIS SERVER IN A PRODUCTION SETTING. It has not gone through security audits or performance tests. (And that’s how it’s gonna stay.
The last point seems to suggest that writing a production-ready HTTP server is too complex to fit within Django's goals, what kind of edge cases would need to be supported to get there?
Is any of the points actually valid, or am I missing the elephant in the room here?
Because they don't want to get into the web server business, and I think that's a wise decision.
Creating, developing and most importantly maintaining a web server is not a trivial thing. They couldn't simply write it once and then it's done (in fact, that's pretty much what they did and it's runserver).
Rather than re-invent the wheel, they've chosen to leave it to those who do it best. They're not likely to match the stability and functionality of a proper web server by doing it as a side-project to support running Django applications. They're better spending their time making Django better.
It's also consistent with the UNIX philosophy, but that's not necessary to get into here.

fcgi vs mod_fastcgi on apache server

I have an apache server in which I am setting up fcgi. I was contemplating if I've to setup the tailor made mod_fastcgi or the plain old cgi-fcgi.
mod-fastcgi doesn't seem to support the "multiplexing" features of fcgi, and the web service I am building is a very high traffic service with several thousand calls per minute and I want them to be processed as quick as possible.
Any suggestions or advice??
Indeed, mod_fastcgi does not support multiplexing. I suppose this is because the Apache web server handles concurrent processing itself. You've probably dealt with it's various Multi-Processing-Models (MPMs) already...
Apache is highly optimized around the several (request) phases provided. The various modules can hook in where-ever you like, which makes the Apache an excellent server to directly integrate high performance and/or really complex applications (e.g. with custom modules in c, mod_perl and so on) as modules themselves.
But both, mod_fastcgi and cgi-fcgi, are IMHO only used to provide response and/or filter handler. Thus; many of the great features (configuration, mapping, post-request logging & cleanup...) provided with Apache are just not used in such a setup.
Thus; if your application is built on top of FGCI, I'd rather not recommend using Apache. Especially for high performance applications under high load; One may prefer a more lightweight but fast HTTP daemon. There are plenty of alternatives like nginx or lighttpd.
Usually one would use them as proxies/balancer to the FCGI processes, cache, SSL handler and logging provider. Of course, Apache is also capable of these tasks, but it's somehow like using a helicopter to direct the traffic at the intersection...
Cheers!

Django / Comet (Push): Least of all evils?

I have read all the questions and answers I can find regarding Django and HTTP Push. Yet, none offer a clear, concise, beginning-to-end solution about how to accomplish a basic "hello world" of so-called "comet" functionality.
First question (1): To what extent is the problem that HTTP simply isn't (at least so far) made for this? Are all the potential solutions essentially hacks?
2) What's the best currently available solution?
Orbited?
Some other Twisted-based solution?
Tornado?
node.JS?
XMPP w/ BOSH?
Some other solution?
3) How does nginx push module play into this discussion?
4) Which of these solutions require replacement of the typical mod_wsgi / nginx (or apache) deployment model? Why do they require this? Is this a favorable transition in any case?
5) How significant are the advantages of using a solution that is already in Python?
Alex Gaynor's presentation from PyCon 2010, which I just watched on blip.tv, is amazing and informative, but not terrifically specific on the current state of HTTP Push in Django. One thing that he said that gave me some confidence was this: Orbited does a good job of abstracting and simulating the concept of network sockets. Thus, when WebSockets actually land, we'll be in a good place for a transition.
6) How does HTML5 Websockets differ from current solutions? Is Gaynor's assessment of the ease of transition from Orbited accurate?
I'd take a look at evserver (http://code.google.com/p/evserver/) if all you need is comet.
It "supports [the] little known Asynchronous WSGI extension" and is build around libevent. Works like a charm and supports django. The actual handler code is a bit ugly, but it scales well as it really is async io.
I have used evserver and I'm currently moving to cyclone (tornado on twisted) because I need a little more than evserver offsers. I need true bidirectional io (think socket.io (http://socket.io/)) and while evserver could support it I thought it was easier to reimplement tornado's socket.io in cyclone (I opted for cyclone instead of tornado as cyclone is build on twisted, thus allowing for more transports that aren't implemented in twisted (i.c. zeromq)) Socket.io supports websockets, comet style polling, and, much more interseting, flash based websockets. I think that in most practical situations websockets + flash based websockets are enough to support 99% (according to adobe flash penetration is about 99% (http://www.adobe.com/products/player_census/flashplayer/version_penetration.html)) of a websites visitors (only people not using flash need to fallback to one of socket.io its (less perfomant and resource hogging) backup transports)
Be aware though websockets are not an http transport thus putting them behind http based proxies (e.g haproxy in http mode) breaks the connection. Better serve them on an alternate ip or port so you can proxy in tcp mode (e.g haproxy in tcp mode).
To answer your questions:
(1) If you don't need a bidirectional transport longpolling based solutions are good enough (all they do is keep a connection open). Things do get iffy when you need your connection to be statefull or you need to be able to both send and receive data. In the latter case socket.io helps. However websockets are made for this scenario and with the support of flash its available to most of a websites vistors (via socket.io or standalone, however socket.io has the added benefit of backup transports for those people not wanting to install flash)
(2) if all you need is push, evserver is your best bet. It uses the the same javascripts on the client side as orbited. Else look at socket.io (this also needs a supporting server, the only python one available is tornado.)
(3) It's just one other server implementation. If i read it correctly it's push only. pushing data to a client is done by making http equest from your app to the nginx server. (nginx then takes care they reach the client). If you're inteersted in this, look at mongrel2 (http://mongrel2.org/home) it not only has handlers for longpolling but also for websockets.(instead of making http request to mongrel, this time you use zeromq handlers to get data to your mongrel server) (Do take note of the developer's lack of enthusiasm for websockets and flash based websockets. Especially taking into account that the websocket protocol tends to evolve you might, at some point, need to recode mongrel2's websocket support yourself keep having support for websockets)
(4) All solutions except evserver replace wsgi with something else. Though most servers also have some wsgi support ontop of this "something else". No matter what solution you choose be careful that one cpu intensive or otherwise io blocking request doesn't block the server. (you either need multiple instances or threads).
(5) Not very significant. All solutions depend on some custom handlers to push (and, if applicable, receive) data to the client. All solutions i mentioned allow these handlers to be written in python. If you want to use a completely different framework (node.js) then you have to weigh the ease of node.js (it's assumed to be easy, but it's also rather experimental, and i found very few libraries to be actually stable) against the convenience of using your existing code base and the available libraries (e.g. if your app needs a blog ther are plenty django blogs you could plug in, but none for node.js) Also don't stare yourself blind on performance stats. unless you plan to push dumb predefined data (what all benchmarks do) to the client you'll find that the actual processing of data adds much more overhead than even the worst async io implementation. (But you still want to use an async io based server if you plan to have many simultaneous clients, threading simply isn't meant to keep thousands of connections alive)
(6) websockets offer bidirectional communication, long polling/comet only pushes data but does not accept writes. (Socket.io simulates this bidirectional support by using two http requests, one to longpoll, one to send data. It tracks their interdependance by a (session) id that's part of both requests query string). flash based websockets are similar to real websockets (the difference is that their implementation is in the swf, not your browser). Also the websockets protocol does not follow the http protocol; longpolling/comet stuff does (technically the websocket client sends an upgrade request to websocket server, the upgraded protocol isn't http anymore)
There is support for WebSockets with django-websocket, but unfortunately there are major issues with it for getting it working; here's a quote from that page:
Disclaimer (what you should know when using django-websocket)
BIG FAT DISCLAIMER - right at the moment its technically NOT possible in any way to use a websocket with WSGI. This is a known issue but cannot be worked around in a clean way due to some design decision that were made while the WSGI stadard was written. At this time things like Websockets etc. didn't exist and were not predictable.
...
But not only WSGI is the limiting factor. Django itself was designed around a simple request to response scenario without Websockets in mind. This also means that providing a standard conform websocket implemention is not possible right now for django. However it works somehow in a not-so pretty way. So be aware that tcp sockets might get tortured while using django-websocket.
So at the moment, WSGI: no go; Django: hardly any go, even with django-websockets; see also a comment in the author's original announcement:
I can't say this looks like a good idea. You're doing long-lived connections in a way that is going to require threading. django-websocket requires threading setup, and won't work if you've got processes (because you'd just have too many processes) but threads won't scale for a lot of connections at the same time, either, so its just a false safety. You need an asynchronous platform for long-lived things, and I do this by doing my app in Django and my comet and websocket in Node.js
Personally if trying to use WebSockets (which I expect to be next year), I would try the combination of Twisted and Cyclone first. They're designed to cope with WebSockets, and scale well. If you write your code properly to remove unnecessary dependencies on Django, you should be able to use much of your code in a Twisted-based system. This is a very distinct advantage over using Node.js or Comet or any system in another language. You could also make a simple push
Finally, you could also just decide it's too hard and use an external service to provide the push support. That then becomes a matter of sending a simple JSON request to their servers instead of worrying about how to make the connection and how concurrency will work and things like that. Of course, you'll need to pay for it (though currently it may be free while in Beta), but you don't need to worry about implementation details; you won't have the full power of WebSockets that way though - just push support.
I can't believe it's been over six years since I asked this question.
Async with Django (and the associated network traffic, eg websockets) has been an itch for many of us in the community. I have taken these past few years, to among other things, scratch this itch.
hendrix
hendrix is a WSGI/ASGI conatiner that runs on Twisted. It has been a project mainly driven by 5 enthusiasts, with help and funding from some visionary organizations. It is in production today at dozens, but not hundreds, of companies.
I'll leave it to you to read the documentation to see why it's the best solution to this problem, but a few quick highlights:
it's based on Twisted, requires no knowledge or use of Twisted internals, but leaves them all available
It "just works" in the sense that you don't need any special server or process configuration to do async and socket traffic from within your Django (or Pyramid, or Flask) app
It is very likely to be forward-compatible with ASGI, the Django Channels standard, and is in some meaningful ways the first ASGI container
It ships with simple APIs that maintain the flow of your view logic and are easy to unit test.
Please see this talk that I gave at Django-NYC (at the Buzzfeed offices) for more information about why I think this is the best answer to this question.
Re question #2, I recently was given a tour of the internals of a Django app that uses Comet heavily, and Orbited was the solution they chose.

What web server interface to choose?

I'm in the process of planning a web service, which will be written in C++. The goal is to be able to select more or less any web server to drive the service. For this to become true, I obviously have to choose a standardized interface between web servers and applications.
Well known methods that I've heard of are:
CGI
FastCGI
WSGI
Now, as I have absolutely no experience on using those interfaces, I don't really know what to choose. I do have some requirements though.
needs to be reasonably fast (from what I've heard of, this pretty much rules out CGI)
should be easily usable in a pure C/C++ environment (e.g. there should be libraries available)
must provide support for HTTP 1.1 (dunno if that matters)
Thanks for any suggestions :)
WSGI is for Python apps; if your language is C++ this isn't an option.
FCGI is a good way to go. An FCGI can be invoked as a standard CGI, convenient for debugging and testing, then run as an FCGI in production.
Performance of CGI vs. FCGI depends a lot on what you're trying to do and the amount of traffic you expect. Tasks that have a lot of startup overhead benefit most from FCGI; the FCGI controller can be configured to spawn additional processes to handle heavy loads.
Practically any web server will run CGI with minimal configuration; you'll likely need an additional module to run FCGI but that depends on the web server.
http://en.wikipedia.org/wiki/FastCGI
there is nothing "slow" about CGI it just isn't scalable. FCGI is more scalable but you can't easily develop in that environment because the process is long lived and makes debugging a nightmare. HTTP/1.1 isn't an issue at this level of abstraction. If you are worried about speed and at this point without any profiling or testing you shouldn't be, but these interfaces are not about speed they are about compatibility. Speed will depend on the container you are running your code from.
there shouldnt be much problems with CGI/fastCGI. if you implement fastcgi, your program can still run as normal CGI. and most web servers support cgi/fastcgi.