I'm in the process of planning a web service, which will be written in C++. The goal is to be able to select more or less any web server to drive the service. For this to become true, I obviously have to choose a standardized interface between web servers and applications.
Well known methods that I've heard of are:
CGI
FastCGI
WSGI
Now, as I have absolutely no experience on using those interfaces, I don't really know what to choose. I do have some requirements though.
needs to be reasonably fast (from what I've heard, this pretty much rules out CGI)
should be easily usable in a pure C/C++ environment (e.g. there should be libraries available)
must provide support for HTTP/1.1 (I don't know if that matters at this level)
Thanks for any suggestions :)
WSGI is for Python apps; if your language is C++ this isn't an option.
FCGI is a good way to go. An FCGI program can be invoked as a standard CGI, which is convenient for debugging and testing, and then run as an FCGI in production.
Performance of CGI vs. FCGI depends a lot on what you're trying to do and the amount of traffic you expect. Tasks that have a lot of startup overhead benefit most from FCGI; the FCGI controller can be configured to spawn additional processes to handle heavy loads.
Practically any web server will run CGI with minimal configuration; you'll likely need an additional module to run FCGI but that depends on the web server.
http://en.wikipedia.org/wiki/FastCGI
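To make the "runs as CGI or as FCGI" point concrete, here is a minimal sketch using libfcgi's fcgi_stdio wrapper (this assumes the FastCGI development kit is installed and the program is linked with -lfcgi): started by an FCGI process manager it loops over requests; launched as a plain CGI it serves exactly one.

// hello_fcgi.cpp -- minimal FastCGI responder sketch (libfcgi, fcgi_stdio API)
#include <fcgi_stdio.h>   // wraps stdio so printf writes to the FCGI stream
#include <cstdlib>        // getenv

int main() {
    int count = 0;
    // FCGI_Accept() blocks until the next request; under plain CGI it
    // succeeds once and then returns -1, so the loop runs a single time.
    while (FCGI_Accept() >= 0) {
        const char *path = getenv("PATH_INFO");   // per-request environment
        printf("Content-Type: text/plain\r\n\r\n");
        printf("Request #%d, PATH_INFO=%s\n", ++count, path ? path : "(none)");
    }
    return EXIT_SUCCESS;
}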
There is nothing "slow" about CGI; it just isn't scalable, since each request spawns a new process. FCGI is more scalable, but you can't develop in that environment as easily because the process is long-lived, which makes debugging a nightmare. HTTP/1.1 isn't an issue at this level of abstraction. If you are worried about speed (and at this point, without any profiling or testing, you shouldn't be), keep in mind that these interfaces are about compatibility, not speed. Speed will depend on the container you run your code in.
There shouldn't be many problems with CGI/FastCGI. If you implement FastCGI, your program can still run as a normal CGI, and most web servers support both CGI and FastCGI.
Related
My friend recently asked me the following question: given that Django already has runserver, why wasn't it extended into a production-ready, customer-facing HTTP server? What people do instead is set up a uwsgi server that speaks WSGI and exposes something that Nginx forwards traffic to by reverse proxying...
Based on what I know, many other languages use this pattern: there is a "simple" HTTP server meant for development, as well as a *GI interface (ASGI/WSGI/FCGI/CGI) that a web server is supposed to reverse-proxy to. What is the main reason those built-in servers don't grow production-ready and instead assume the presence of another web server?
Here are some of my theories, but I'm not sure if I'm missing something more significant:
History: dynamic websites date back to Perl/PHP, both of which worked as "dumb" CGI backends: essentially filters that turned an HTTP request (stdin) into a response (stdout). This architecture worked for some time and became a common pattern,
Performance: web applications are often written in languages that don't JIT, and a web server written in such a language would add extra overhead when milliseconds matter. Keeping a separate web server also lets us speed up static file serving,
Security: Django's runserver is clearly described as potentially insecure, according to this quote:
DO NOT USE THIS SERVER IN A PRODUCTION SETTING. It has not gone through security audits or performance tests. (And that's how it's gonna stay.)
The last point seems to suggest that writing a production-ready HTTP server is too complex to fit within Django's goals. What kind of edge cases would need to be supported to get there?
Are any of these points actually valid, or am I missing the elephant in the room here?
Because they don't want to get into the web server business, and I think that's a wise decision.
Creating, developing and, most importantly, maintaining a web server is not a trivial thing. They couldn't simply write it once and be done with it (in fact, that's pretty much what they did, and the result is runserver).
Rather than re-invent the wheel, they've chosen to leave it to those who do it best. They're not likely to match the stability and functionality of a proper web server by doing it as a side-project to support running Django applications. They're better spending their time making Django better.
It's also consistent with the UNIX philosophy, but that's not necessary to get into here.
We are planning on migrating from IIS to Nginx to gain performance. Our web layer is very lightweight: for each request we read/set cookies, perform some very quick data cleanup and pass it down to very fast storage (Aerospike). Most requests take under 100ms, but we are seeing inefficiencies because IIS binds a thread to each request, and we are processing A LOT of concurrent requests.
What's the best way to accomplish the same thing in Nginx? I know it would probably make sense to use C++ for most of my processing. Where do I take care of cookies, and can I do that in C++? How do I efficiently forward a request from Nginx down to a compiled C++ binary?
Thanks for your help!
You will need to write a module for nginx. The basic module should be written in C, but you should be able to write the main workhorse functions in C++. Unfortunately, the API for module development and so on is not well documented, but Evan Miller's Guide is your best guide.
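To illustrate the split the answer suggests (module glue in C, workhorse code in C++), a bridge file along these lines could expose the C++ code behind a C-linkage function that the nginx handler calls; every name here is hypothetical and only for illustration:

// app_bridge.cpp -- hypothetical sketch: the nginx module itself stays C,
// and calls into ordinary C++ code through this C-linkage wrapper.
#include <string>
#include <cstring>

namespace app {
// the actual workhorse: parse cookies, talk to storage, build the reply
std::string process(const std::string &cookie_header) {
    return "user=" + (cookie_header.empty() ? std::string("anonymous") : cookie_header);
}
}

extern "C" {
// C-callable entry point the nginx content handler would invoke.
// Writes the reply into out; returns bytes written, or -1 if it doesn't fit.
int app_process(const char *cookie_header, char *out, int out_len) {
    std::string result = app::process(cookie_header ? cookie_header : "");
    if (static_cast<int>(result.size()) + 1 > out_len) return -1;
    std::memcpy(out, result.c_str(), result.size() + 1);
    return static_cast<int>(result.size());
}
}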
I have an Apache server on which I am setting up FCGI. I was contemplating whether to set up the tailor-made mod_fastcgi or the plain old cgi-fcgi.
mod_fastcgi doesn't seem to support the "multiplexing" feature of FCGI, and the web service I am building is a very high-traffic service with several thousand calls per minute, so I want them processed as quickly as possible.
Any suggestions or advice??
Indeed, mod_fastcgi does not support multiplexing. I suppose this is because the Apache web server handles concurrent processing itself; you've probably dealt with its various Multi-Processing Modules (MPMs) already...
Apache is highly optimized around the several (request) phases it provides. Modules can hook in wherever you like, which makes Apache an excellent server for directly integrating high-performance and/or really complex applications as modules themselves (e.g. custom modules in C, mod_perl and so on).
But both mod_fastcgi and cgi-fcgi are, IMHO, only used to provide response and/or filter handlers. Thus, many of the great features Apache provides (configuration, mapping, post-request logging & cleanup...) simply aren't used in such a setup.
So if your application is built on top of FCGI, I'd rather not recommend Apache, especially for high-performance applications under heavy load. One may prefer a more lightweight but fast HTTP daemon; there are plenty of alternatives, such as nginx or lighttpd.
Usually one would use them as a proxy/balancer in front of the FCGI processes, and as cache, SSL handler and logging provider. Of course, Apache is also capable of these tasks, but it's somewhat like using a helicopter to direct traffic at an intersection...
Cheers!
What C++ software stack do developers use to create custom fast, responsive and not very resource hungry web services?
I'd recommend you take a look at CppCMS:
http://cppcms.com
It exactly fits the situation you had described:
performance-oriented (preferably web service) software stack
for C++ web development.
It should have a low memory footprint
work on UNIX (FreeBSD) and Linux systems
perform well under high server load and be able to handle many requests with great efficiency
[as I plan to use it in a virtual environment] where resources will be to some extent limited.
So far I have only come across Staff WSF, Boost, Poco libraries. The latter two could be used to implement a custom web server...
The problem is that the web server is only about 2% of web development; there is so much more stuff to handle:
web templates
sessions
cache
forms
security-security-security - which is far from being trivial
And much more; that is why you need web frameworks.
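For a sense of what that looks like in practice, here is a sketch of a minimal CppCMS application, close to the project's hello-world tutorial (the exact headers and configuration handling may differ between CppCMS versions):

// hello.cpp -- minimal CppCMS application sketch
#include <cppcms/application.h>
#include <cppcms/applications_pool.h>
#include <cppcms/service.h>
#include <cppcms/http_response.h>
#include <iostream>

class hello : public cppcms::application {
public:
    hello(cppcms::service &srv) : cppcms::application(srv) {}
    void main(std::string /*url*/) override {
        response().out() << "<html><body><h1>Hello, CppCMS</h1></body></html>";
    }
};

int main(int argc, char **argv) {
    try {
        // the service reads its configuration (FastCGI/SCGI/embedded HTTP,
        // thread pool size, ...) from a JSON config passed on the command line
        cppcms::service srv(argc, argv);
        srv.applications_pool().mount(cppcms::applications_factory<hello>());
        srv.run();
    } catch (std::exception const &e) {
        std::cerr << e.what() << std::endl;
        return 1;
    }
    return 0;
}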
You could write an apache module, and put all your processing code in there.
Or there's CppCMS or Treefrog; or, for writing web services (not web sites), use gSOAP or Apache Axis.
But ultimately, there's no "easy-to-use framework", because C++ developers like to build apps from smaller components. There's no Ruby-style framework, but there are all manner of libraries for handling XML or whatever, and Apache offers the HTTP protocol bits in its module spec, so you can build up your app quite happily using whatever pieces make sense to you. Whether there's a market for bundling this up into something easier to use is another matter.
Personally, the best web app system I wrote (for a company) used a very thin web layer in the web server (IIS and ASP in our case, but this applies to any web server; use PHP, for example) that did nothing except act as a gateway passing request data through to a C++ service. The C++ service could then be written as a completely normal C++ command-line server with well-defined entry points, using as thin an RPC mechanism as possible (shared memory, but you may want to check out ZeroMQ). This not only increased security but let us scale easily by moving the services to app servers and running the web servers on different hardware. It was also really easy to test.
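As a rough illustration of that split, assuming ZeroMQ is chosen as the thin RPC layer (the endpoint, message format and names below are made up), the standalone C++ service could look something like this, with the web layer doing nothing but forwarding request data to it:

// service.cpp -- hypothetical sketch of the standalone C++ service;
// the web server stays a thin gateway and talks to this over ZeroMQ.
#include <zmq.h>
#include <cstring>
#include <string>

int main() {
    void *ctx  = zmq_ctx_new();
    void *sock = zmq_socket(ctx, ZMQ_REP);     // simple request/reply socket
    zmq_bind(sock, "tcp://127.0.0.1:5555");    // the web layer connects here

    char buf[4096];
    while (true) {
        int n = zmq_recv(sock, buf, sizeof(buf) - 1, 0);
        if (n < 0) break;                      // interrupted or context closed
        int len = n < static_cast<int>(sizeof(buf)) - 1 ? n : static_cast<int>(sizeof(buf)) - 1;
        buf[len] = '\0';

        // real request processing would happen here
        std::string reply = std::string("handled: ") + buf;
        zmq_send(sock, reply.data(), reply.size(), 0);
    }
    zmq_close(sock);
    zmq_ctx_destroy(ctx);
    return 0;
}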
I'm planning to develop a program for our university research that has to send lots of POST requests to different URLs. It must work as quickly as possible (we need to process about 100kk URLs). What language should I use? (Currently I write C++, Delphi and a bit of Perl.)
Also, I've heard that it's possible to write a multithreaded app in Perl using prefork that can process about 20-30k requests per minute. Is that true?
// Sorry for my bad English, but this seems to be the only place where I can get the right answer
Andrew
The 20-30k per minute is completely arbitrary. If you run this on an 8-core machine with a beefy network connection you could probably surpass that.
However, I don't think your choice of programming language or library is going to matter much here. Instead, you're going to run into the limit on concurrent TCP connections allowed by the machine, and the bandwidth of the link itself.
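That said, if the asker stays with C++, one way to drive many concurrent POSTs from a single thread is libcurl's multi interface; the sketch below is only an illustration (the URL list and payload are placeholders, and it assumes libcurl is available):

// blast_posts.cpp -- sketch: many concurrent POST requests via libcurl's
// multi interface, avoiding one thread per request. Minimal error handling.
#include <curl/curl.h>
#include <string>
#include <vector>

// we only care about sending, so discard the response bodies
static size_t discard(char *, size_t size, size_t nmemb, void *) {
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURLM *multi = curl_multi_init();

    std::vector<std::string> urls = {
        "http://example.com/a", "http://example.com/b"   // placeholder URLs
    };
    const std::string payload = "key=value";
    std::vector<CURL *> handles;

    for (const auto &url : urls) {
        CURL *easy = curl_easy_init();
        curl_easy_setopt(easy, CURLOPT_URL, url.c_str());
        curl_easy_setopt(easy, CURLOPT_POSTFIELDS, payload.c_str());
        curl_easy_setopt(easy, CURLOPT_WRITEFUNCTION, discard);
        curl_multi_add_handle(multi, easy);
        handles.push_back(easy);
    }

    int still_running = 0;
    do {
        curl_multi_perform(multi, &still_running);           // drive all transfers
        curl_multi_wait(multi, nullptr, 0, 1000, nullptr);   // wait for activity
    } while (still_running > 0);

    for (CURL *easy : handles) {
        curl_multi_remove_handle(multi, easy);
        curl_easy_cleanup(easy);
    }
    curl_multi_cleanup(multi);
    curl_global_cleanup();
    return 0;
}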
Webserver Stress Tool claims to be capable of simulating the HTTP requests generated by up to 10,000 simultaneous users, and it has an entry on Torry's site, so presumably it's written in Delphi or C++ Builder.
My suggestion:
You can write your own custom stress tool (an HTTP(S) client) in Delphi (it happens to be my favorite language, so I advocate it) using a lightweight HTTP(S) library such as the RTC SDK, together with OmniThreadLibrary for multithreading.
See this page for a clue/hint.
Edit:
Excerpt from Demos\Readme_Demos.txt in RealThinClient_SDK331.zip
App Client, Server and ISAPI demos can be used to stress-test RTC
component using Remote Functions with strong encryption by opening
hundreds of connections from each client and flooding the
Server/ISAPI with requests.
App Client Demo is ideal for stress-testing RTC remote functions using
multiple connections in multi-threaded mode, visualy showing activity
and stage for each connection in a live graph. Client can choose
between "Proxy" and standard connection components, to see the
difference in bandwidth usage and distribution.
I have heard Erlang is pretty good for such applications, as it is very efficient at spawning many processes quickly. But I think Python would be fine too; just use the subprocess module (e.g. subprocess.Popen) to spawn multiple processes.
After all, you are limited in how many you can run at the same time by how many processors your machine has. The choice of language may not matter that much, depending on what you are doing with the data downloaded from these URLs, since that may be more processing-intensive than the cost of spawning.