How to send lots of POST requests QUICKLY - c++

I'm planning to develop a program for our university research that has to send lots of post requests to different urls. It must work as quick as possible (we should process about 100kk urls). What language shoud i use (currently i'm writing in c++, delphi and perl a bit)?
Also, I've heard that it's possible to write an multithreaded app in perl using prefork that can process about 20-30k per minute. Is it true?
// Sorry for my bad english, but it seems to be the only place where i can get the right answer
Andrew

The 20-30k per minute is completely arbitrary. If you run this on an 8-core machine with a beefy network connection you could probably surpass that.
However, I don't think your choice of programming language / library is going to matter much here. Instead, you're going to run into the number of concurrent TCP connections allowed by the machine, and also the bandwidth of the link itself.

Webserver Stress Tool claims capable of simulating the HTTP requests generated by up to 10.000 simultaneous users and has an entry in Torry's site: Presumably it's written in Delphi or C++ Builder.
My suggestion:
You can write your custom stress tool (HTTP(S) Client) with Delphi (It happens to be my favorite language so I advocate it) using light HTTP(S) library such as RTC SDK and OmniThreadLibrary for multithreading.
See this page for a clue/hint.
Edit:
Excerpt from Demos\Readme_Demos.txt in RealThinClient_SDK331.zip
App Client, Server and ISAPI demos can be used to stress-test RTC
component using Remote Functions with strong encryption by opening
hundreds of connections from each client and flooding the
Server/ISAPI with requests.
App Client Demo is ideal for stress-testing RTC remote functions using
multiple connections in multi-threaded mode, visualy showing activity
and stage for each connection in a live graph. Client can choose
between "Proxy" and standard connection components, to see the
difference in bandwidth usage and distribution.

I have heard Erlang is pretty good for such applications as it is very efficient to spawn many processes in Erlang quickly. But I think using Python would be fine too, just use the popen module to spawn multiple processes.
After all you are limited by how many you can run at the same time depending on how many processors your machine has. The choice of language may not matter as much depending on what you are doing with the data downloaded from these URLs as that may be more processing intensive than the cost of spawning.

Related

Django node.js socket.io

I am trying to make a realtime messaging application. There will be 2 distinct server(node.js and django) and when a user sends message to another user message will be stored in database than node.js will send a message to receiver like "You have new Message!". For that i am planing to call url which node.js serve. So node.js and django will interact each other. And what is best way send message to specifig client ? (I keep clients with their id's in a assosicative array.)
what do you think about that? is it efficent or do you suggest better way to do this ?
Now that I understand more about what you're trying to do, here my answer, just keep in mind that this only reflects my opinion, and I bet that many others would argue about it.
It all matter on how much traffic you expect to have in your application. If it's not a high traffic application, then efficiency in run-time is insignificant when compared to that of the development, and so choose the technology you feel most comfortable with.
If though you do aim for high traffic application, then I believe that this setup is not a good one.
First of all while http based communication between servers might seem comfortable, you are dealing with the overhead of http over tcp (since http is based on tcp). And so regular tcp sockets scale better, but on the other hand if you write the sockets server in python than you can run it from the same process as the django and then just use it as an object from django (you're entering the realm of threads here). But that's problematic if you have a few web instances, again depends on how much traffic you expect.
As for your choice for implementing the messaging server, I've never tested node.js but I believe that in benchmark tests it won't compare for something written in erlang or Java NIO. For example: JAVA AIO (NIO.2) VS NODEJS

Django / Comet (Push): Least of all evils?

I have read all the questions and answers I can find regarding Django and HTTP Push. Yet, none offer a clear, concise, beginning-to-end solution about how to accomplish a basic "hello world" of so-called "comet" functionality.
First question (1): To what extent is the problem that HTTP simply isn't (at least so far) made for this? Are all the potential solutions essentially hacks?
2) What's the best currently available solution?
Orbited?
Some other Twisted-based solution?
Tornado?
node.JS?
XMPP w/ BOSH?
Some other solution?
3) How does nginx push module play into this discussion?
4) Which of these solutions require replacement of the typical mod_wsgi / nginx (or apache) deployment model? Why do they require this? Is this a favorable transition in any case?
5) How significant are the advantages of using a solution that is already in Python?
Alex Gaynor's presentation from PyCon 2010, which I just watched on blip.tv, is amazing and informative, but not terrifically specific on the current state of HTTP Push in Django. One thing that he said that gave me some confidence was this: Orbited does a good job of abstracting and simulating the concept of network sockets. Thus, when WebSockets actually land, we'll be in a good place for a transition.
6) How does HTML5 Websockets differ from current solutions? Is Gaynor's assessment of the ease of transition from Orbited accurate?
I'd take a look at evserver (http://code.google.com/p/evserver/) if all you need is comet.
It "supports [the] little known Asynchronous WSGI extension" and is build around libevent. Works like a charm and supports django. The actual handler code is a bit ugly, but it scales well as it really is async io.
I have used evserver and I'm currently moving to cyclone (tornado on twisted) because I need a little more than evserver offsers. I need true bidirectional io (think socket.io (http://socket.io/)) and while evserver could support it I thought it was easier to reimplement tornado's socket.io in cyclone (I opted for cyclone instead of tornado as cyclone is build on twisted, thus allowing for more transports that aren't implemented in twisted (i.c. zeromq)) Socket.io supports websockets, comet style polling, and, much more interseting, flash based websockets. I think that in most practical situations websockets + flash based websockets are enough to support 99% (according to adobe flash penetration is about 99% (http://www.adobe.com/products/player_census/flashplayer/version_penetration.html)) of a websites visitors (only people not using flash need to fallback to one of socket.io its (less perfomant and resource hogging) backup transports)
Be aware though websockets are not an http transport thus putting them behind http based proxies (e.g haproxy in http mode) breaks the connection. Better serve them on an alternate ip or port so you can proxy in tcp mode (e.g haproxy in tcp mode).
To answer your questions:
(1) If you don't need a bidirectional transport longpolling based solutions are good enough (all they do is keep a connection open). Things do get iffy when you need your connection to be statefull or you need to be able to both send and receive data. In the latter case socket.io helps. However websockets are made for this scenario and with the support of flash its available to most of a websites vistors (via socket.io or standalone, however socket.io has the added benefit of backup transports for those people not wanting to install flash)
(2) if all you need is push, evserver is your best bet. It uses the the same javascripts on the client side as orbited. Else look at socket.io (this also needs a supporting server, the only python one available is tornado.)
(3) It's just one other server implementation. If i read it correctly it's push only. pushing data to a client is done by making http equest from your app to the nginx server. (nginx then takes care they reach the client). If you're inteersted in this, look at mongrel2 (http://mongrel2.org/home) it not only has handlers for longpolling but also for websockets.(instead of making http request to mongrel, this time you use zeromq handlers to get data to your mongrel server) (Do take note of the developer's lack of enthusiasm for websockets and flash based websockets. Especially taking into account that the websocket protocol tends to evolve you might, at some point, need to recode mongrel2's websocket support yourself keep having support for websockets)
(4) All solutions except evserver replace wsgi with something else. Though most servers also have some wsgi support ontop of this "something else". No matter what solution you choose be careful that one cpu intensive or otherwise io blocking request doesn't block the server. (you either need multiple instances or threads).
(5) Not very significant. All solutions depend on some custom handlers to push (and, if applicable, receive) data to the client. All solutions i mentioned allow these handlers to be written in python. If you want to use a completely different framework (node.js) then you have to weigh the ease of node.js (it's assumed to be easy, but it's also rather experimental, and i found very few libraries to be actually stable) against the convenience of using your existing code base and the available libraries (e.g. if your app needs a blog ther are plenty django blogs you could plug in, but none for node.js) Also don't stare yourself blind on performance stats. unless you plan to push dumb predefined data (what all benchmarks do) to the client you'll find that the actual processing of data adds much more overhead than even the worst async io implementation. (But you still want to use an async io based server if you plan to have many simultaneous clients, threading simply isn't meant to keep thousands of connections alive)
(6) websockets offer bidirectional communication, long polling/comet only pushes data but does not accept writes. (Socket.io simulates this bidirectional support by using two http requests, one to longpoll, one to send data. It tracks their interdependance by a (session) id that's part of both requests query string). flash based websockets are similar to real websockets (the difference is that their implementation is in the swf, not your browser). Also the websockets protocol does not follow the http protocol; longpolling/comet stuff does (technically the websocket client sends an upgrade request to websocket server, the upgraded protocol isn't http anymore)
There is support for WebSockets with django-websocket, but unfortunately there are major issues with it for getting it working; here's a quote from that page:
Disclaimer (what you should know when using django-websocket)
BIG FAT DISCLAIMER - right at the moment its technically NOT possible in any way to use a websocket with WSGI. This is a known issue but cannot be worked around in a clean way due to some design decision that were made while the WSGI stadard was written. At this time things like Websockets etc. didn't exist and were not predictable.
...
But not only WSGI is the limiting factor. Django itself was designed around a simple request to response scenario without Websockets in mind. This also means that providing a standard conform websocket implemention is not possible right now for django. However it works somehow in a not-so pretty way. So be aware that tcp sockets might get tortured while using django-websocket.
So at the moment, WSGI: no go; Django: hardly any go, even with django-websockets; see also a comment in the author's original announcement:
I can't say this looks like a good idea. You're doing long-lived connections in a way that is going to require threading. django-websocket requires threading setup, and won't work if you've got processes (because you'd just have too many processes) but threads won't scale for a lot of connections at the same time, either, so its just a false safety. You need an asynchronous platform for long-lived things, and I do this by doing my app in Django and my comet and websocket in Node.js
Personally if trying to use WebSockets (which I expect to be next year), I would try the combination of Twisted and Cyclone first. They're designed to cope with WebSockets, and scale well. If you write your code properly to remove unnecessary dependencies on Django, you should be able to use much of your code in a Twisted-based system. This is a very distinct advantage over using Node.js or Comet or any system in another language. You could also make a simple push
Finally, you could also just decide it's too hard and use an external service to provide the push support. That then becomes a matter of sending a simple JSON request to their servers instead of worrying about how to make the connection and how concurrency will work and things like that. Of course, you'll need to pay for it (though currently it may be free while in Beta), but you don't need to worry about implementation details; you won't have the full power of WebSockets that way though - just push support.
I can't believe it's been over six years since I asked this question.
Async with Django (and the associated network traffic, eg websockets) has been an itch for many of us in the community. I have taken these past few years, to among other things, scratch this itch.
hendrix
hendrix is a WSGI/ASGI conatiner that runs on Twisted. It has been a project mainly driven by 5 enthusiasts, with help and funding from some visionary organizations. It is in production today at dozens, but not hundreds, of companies.
I'll leave it to you to read the documentation to see why it's the best solution to this problem, but a few quick highlights:
it's based on Twisted, requires no knowledge or use of Twisted internals, but leaves them all available
It "just works" in the sense that you don't need any special server or process configuration to do async and socket traffic from within your Django (or Pyramid, or Flask) app
It is very likely to be forward-compatible with ASGI, the Django Channels standard, and is in some meaningful ways the first ASGI container
It ships with simple APIs that maintain the flow of your view logic and are easy to unit test.
Please see this talk that I gave at Django-NYC (at the Buzzfeed offices) for more information about why I think this is the best answer to this question.
Re question #2, I recently was given a tour of the internals of a Django app that uses Comet heavily, and Orbited was the solution they chose.

Socket Server vs. Standard Servers

I'm working on a project of which a large part is server side software. I started programming in C++ using the sockets library. But, one of my partners suggested that we use a standard server like IIS, Apache or nginx.
Which one is better to do, in the long run? When I program it in C++, I have direct access to the raw requests where as in the case of using standard servers I need to use a scripting language to handle the requests. In any case, which one is the better option and why?
Also, when it comes to security for things like DDOS attacks etc., do the standard servers already have protection? If I would want to implement it in my socket server, what is the best way?
"Server side software" could mean lots of different things, for example this could be a trivial app which "echoes" everything back on a specific port, to a telnet/ftp server to a webserver running lots of "services".
So where in this gamut of possibilities does your particular application lie? Without further information, it's difficult to make any suggestions, but let's see..
Web Services, i.e. your "server side" requirement is to handle individual requests and respond having done some set of business logic. Typically communication is via SOAP/XML, and this is ideal if you bave web based clients (though nothing prevents your from accessing these services via standalone clients). Typially you host these on web servers as you mentioned, and often they are easiest written in Java (I've yet to come across one that needed to be written in C++!)
Simple web site - slightly different to the above, respods to HTML get/post requests and serves up static or dymanic content (I'm guessing this is not what you're after!)
Standalone server which responds to something specific, here you'd have to implement your own "messaging"/protocols etc. and the server will carry out a specific function on incoming request and potentially send responses back. Key thing here is that the server does something specific, and is not a generic container (at which point 1 makes more sense!)
So where does your application lie? If 1/2 use Java or some scripting language (such as Perl/ASP/JSP etc.) If 3, you can certainly use C++, and if you do, use a suitable abstraction, such as boost::asio and Google Protocol buffers, save yourself a lot of headache...
With regards to security, ofcourse bugs and security holes are found all the time, however the good thing with some of these OS projects is that the community will tackle and fix them. Let's just say, you'll be safer using them than your own custom handrolled imlpementation, the likelyhood that you'll be able to address all the issues that they would have encountered in the years they've been around is very small (no disrespect to your abilities!)
EDIT: now that there's a little more info, here is one possible approach (this is what I've done in the past, and I've jused Java most of the way..)
The client facing server should be something reliable, esp. if it's over the internet, here I would use a proven product, something like Apache is good or IIS (depends on which technologies you have available). IMHO, I would go for jBoss AS - really powerful and easily customisable piece of kit, and integrates really nicely with lots of different things (all Java ofcourse!) You could then have a simple bit of Java which can then delegate to your actual Server processes that do the work..
For the Server procesess you can use C++ if that's what you are comfortable with
There is one key bit which I left out, and this is how 1 & 2 talk to each other. This is where you should look at an open source messaging product (even more higher level than asio or protocol buffers), and here I would look at something like Zero MQ, or Red Hat Messaging (both are MQ messaging protocols), the great advantage of this type of "messaging bus" is that there is no tight coupling between your servers, with your own handrolled implementation, you'll be doing lots of boilerplate to get the interaction to work just right, with something like MQ, you'll have multiplatform communication without having to get into the details... You wil save yourself a lot of time and bother if you elect to use something like that.. (btw. there are other messaging products out there, and some are easier to use - such as Tibco RV or EMS etc, however they are commercial products and licenses will cost a lot of money!)
With a messaging solution your servers become trivial as they simply handle incoming messagins and send messages back out again, and you can focus on the business logic...
my two pennies... :)
If you opt for 1st solution in Nim's list (web services) I would suggest you to have a look at WSO's web services framework for C++ , Axis CPP and Axis2/C web services framework (if you are not restricted to C++). Web Services might be the best solution for your requirement as you can quickly build them and use either as processing or proxy modules on the server side of your system.

Integrating C++ code with any web technology on Linux

i am writing an program in c++ and i need an web interface to control the program and which will be efficient and best programming language ...
Your application will just have to listen to messages from the network that your web application would send to it.
Any web application (whatever the language) implementation could use sockets so don't worry about the details, just make sure your application manage messages that you made a protocol for.
Now, if you want to keep it all C++, you could use CPPCMS for your web application.
If it were Windows, I could advice you to register some COM component for your program. At least from ASP.NET it is easily accessible.
You could try some in-memory exchange techniques like reading/writing over a localhost socket connection. It however requires you to design some exchange protocol first.
Or data exchange via a database. You program writes/reads data from the database, the web front-end reads/writes data to the database.
You could use a framework like Thrift to communicate between a PHP/Python/Ruby/whatever webapp and a C++ daemon, or you could even go the extra mile (probably harder than just using something like Thrift) and write language bindings for the scripting language of your choice.
Either of the two options gives you the ability to write web-facing code in a language more suitable for the task while keeping the "heavy lifting" in C++.
Did you take a look at Wt? It's a widget-centric C++ framework for web applications, has a solid MVC system, an ORM, ...
The Win32 API method.
MSDN - Getting Started with Winsock:
http://msdn.microsoft.com/en-us/library/ms738545%28v=VS.85%29.aspx
(Since you didn't specify an OS, we're assuming Windows)
This is not as simple as it seems!
There is a mis-match between your C++ program (which presumibly is long running otherwise why would it need controlling) and a typical web program which starts up when it receives the http request and dies once the reply is sent.
You could possibly use one of the Java based web servers where it is possible to have a long running task.
Alternatively you could use a database or other storage as the communication medium:-
You program periodically writes it current status to a well know table, when a user invokes the control application it reads the current status and gives an appropriate set of options to the user which can then be stored in the DB, and actioned by your program the next time it polls for a request.
This works better if you have a queuing mechanism avaiable, as it can then be event driven rather than polled.
Go PHP :) Look at this Program execution Functions

How are Massively Multiplayer Online RPGs built?

How are Massively Multiplayer Online RPG games built?
What server infrastructure are they built on? especially with so many clients connected and communicating in real time.
Do they manage with scripts that execute on page requests? or installed services that run in the background and manage communication with connected clients?
Do they use other protocols? because HTTP does not allow servers to push data to clients.
How do the "engines" work, to centrally process hundreds of conflicting gameplay events?
Thanks for your time.
Many roads lead to Rome, and many architectures lead to MMORPG's.
Here are some general thoughts to your bullet points:
The server infrastructure needs to support the ability to scale out... add additional servers as load increases. This is well-suited to Cloud Computing by the way. I'm currently running a large financial services app that needs to scale up and down depending on time of day and time of year. We use Amazon AWS to almost instantly add and remove virtual servers.
MMORPG's that I'm familiar with probably don't use web services for communication (since they are stateless) but rather a custom server-side program (e.g. a service that listens for TCP and/or UDP messages).
They probably use a custom TCP and/or UDP based protocol (look into socket communication)
Most games are segmented into "worlds", limiting the number of players that are in the same virtual universe to the number of game events that one server (probably with lots of CPU's and lots of memory) can reasonably process. The exact event processing mechanism depends on the requirements of the game designer, but generally I expect that incoming events go into a priority queue (prioritized by time received and/or time sent and probably other criteria along the lines of "how bad is it if we ignore this event?").
This is a very large subject overall. I would suggest you check over on Amazon.com for books covering this topic.
What server infrastructure are they built on? especially with so many clients connected and communicating in real time.
I'd guess the servers will be running on Linux, BSD or Solaris almost 99% of the time.
Do they manage with scripts that execute on page requests? or installed services that run in the background and manage communication with connected clients?
The server your client talks to will be a server running a daemons or service that sits idle listening for connections. For instances (dungeons), usually a new process is launched for each group, which would mean there is a dispatcher service somewhere mananging this (analogous to a threadpool)
Do they use other protocols? because HTTP does not allow servers to push data to clients.
UDP is the protocol used. It's fast as it makes no guarantees the packet will be received. You don't care if a bit of latency causes the client to lose their world position.
How do the "engines" work, to centrally process hundreds of conflicting gameplay events?
Most MMOs have zones which limit this to a certain amount of people. For those that do have 100s of people in one area, there is usually high latency. The server is having to deal with 100s of spells being sent its way, which it must calculate damage amounts for each one. For the big five MMOs I imagine there are teams of 10-20 very intelligent, mathematically gifted developers working on this daily and there isn't a MMO out there that has got it right yet, most break after 100 players.
--
Have a look for Wowemu (there's no official site and I don't want to link to a dodgy site). This is based on ApireCore which is an MMO simulator, or basically a reverse engineer of the WoW protocol. This is what the private WoW servers run off. From what I recall Wowemu is
mySQL
Python
However ApireCore is C++.
The backend for Wowemu is amazingly simple (I tried it in 2005 however) and probably a complete over simplification of the database schema. It does gives you a good idea of what's involved.
Because MMOs by and large require the resources of a business to develop and deploy, at which point they are valuable company IP, there isn't a ton of publicly available information about implementations.
One thing that is fairly certain is that since MMOs by and large use a custom client and 3D renderer they don't use HTTP because they aren't web browsers. Online games are going to have their own protocols built on top of TCP/IP or UDP.
The game simulations themselves will be built using the same techniques as any networked 3D game, so you can look towards resources for that problem domain to learn more.
For the big daddy, World of Warcraft, we can guess that their database is Oracle because Blizzard's job listings frequently cite Oracle experience as a requirement/plus. They use Lua for user interface scripting. C++ and OpenGL (for Mac) and Direct3D (for PC) can be assumed as the implementation languages for the game clients because that's what games are made with.
One company that is cool about discussing their implementation is CCP, creators of Eve online. They have published a number of presentations and articles about Eve's infrastructure, and it is a particularly interesting case because they use Stackless Python for a lot of Eve's implementation.
http://www.disinterest.org/resource/PyCon2006-StacklessInEve.wmv
http://us.pycon.org/2009/conference/schedule/event/91/
There was also a recent Game Developer Magazine article on Eve's architecture:
https://store.cmpgame.com/product/3359/Game-Developer-June%7B47%7DJuly-2009-Issue---Digital-Edition
The Software Engineering radio podcast had an episode with Jim Purbrick about Second Life which discusses servers, worlds, scaling and other MMORPG internals.
Traditionally MMOs have been based on C++ server applications running on Linux communicating with a database for back end storage and fat client applications using OpenGL or DirectX.
In many cases the client and server embed a scripting engine which allows behaviours to be defined in a higher level language. EVE is notable in that it is mostly implemented in Python and runs on top of Stackless rather than being mostly C++ with some high level scripts.
Generally the server sits in a loop reading requests from connected clients, processing them to enforce game mechanics and then sending out updates to the clients. UDP can be used to minimize latency and the retransmission of stale data, but as RPGs generally don't employ twitch gameplay TCP/IP is normally a better choice. Comet or BOSH can be used to allow bi-directional communications over HTTP for web based MMOs and web sockets will soon be a good option there.
If I were building a new MMO today I'd probably use XMPP, BOSH and build the client in JavaScript as that would allow it to work without a fat client download and interoperate with XMPP based IM and voice systems (like gchat). Once WebGL is widely supported this would even allow browser based 3D virtual worlds.
Because the environments are too large to simulate in a single process, they are normally split up geographically between processes each of which simulates a small area of the world. Often there is an optimal population for a world, so multiple copies (shards) are run which different sets of people use.
There's a good presentation about the Second Life architecture by Ian Wilkes who was the Director of Operations here: http://www.infoq.com/presentations/Second-Life-Ian-Wilkes
Most of my talks on Second Life technology are linked to from my blog at: http://jimpurbrick.com
Take a look at Erlang. It's a concurrent programming language and runtime system, and was designed to support distributed, fault-tolerant, soft-real-time, non-stop applications.