I'm trying to find the best solution for nonblocking IO via stdin/stdout with the following characteristics:
As long as there is enough data, read in n-sized chunks.
If there's not enough data, read in a partial chunk.
If there is no data available, block until there is some (even though it may be smaller than n).
The goal is to allow efficient transfer for large datasets while processing 'control' codes immediately (instead of having them linger in some partially-filled buffer somewhere).
I know I can achieve this by using threads and an istream::get() loop, or by writing a bunch of platform-specific code (since you can't select() on file handles in Windows). (There is also istream::readsome(), which seems promising, but the only results I could find on Google were people saying it doesn't actually work well.)
Since I haven't done much coding w/ these APIs, perhaps there is a better way.
Maybe boost::asio could be of use to you?
I used threads and platform-specific code. See my answer to another question. I was able to put the OS-specific stuff in inputAvailable() (Linux uses select, Windows just returns true). On Windows I could then use WaitForSingleObject() with a timeout to try to let the thread complete, followed by TerminateThread() to kill it. Very ugly, but the team didn't want to use this bit of Boost.
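A minimal sketch of what that inputAvailable() split might look like (the function name comes from the answer above; the bodies are my guess at the approach described):

```cpp
#ifdef _WIN32
// Windows: claim input is always available and let the reader
// thread block in the actual read, as described above.
bool inputAvailable() { return true; }
#else
#include <sys/select.h>
#include <unistd.h>
// Linux: ask select() with a zero timeout whether stdin is readable.
bool inputAvailable() {
    fd_set rd;
    FD_ZERO(&rd);
    FD_SET(STDIN_FILENO, &rd);
    timeval tv = {0, 0};  // poll, don't block
    return select(STDIN_FILENO + 1, &rd, nullptr, nullptr, &tv) > 0;
}
#endif
```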
I did something similar to jwhitlock ... I ended up with a StdinDataIO class that wraps around the appropriate OS-specific implementation(*) so that the rest of my program can select() on the file descriptor StdinDataIO provides, remaining blissfully ignorant of Windows' limitations regarding stdin. Have a look here and here if you like, the code is all open-source/BSD-licensed.
(*) the implementation is a simple pass-through for Linux/MacOSX, and in Windows it's a rather complex process of setting up a child thread to read from stdin and send the data it receives over a socket back to the main thread... not very elegant, but it works.
All I can find are tutorials on how to use overlapped I/O, but I can't find why it is called that.
Is it because, for example, I can start a read on a socket and then start another read before the first one has necessarily returned its bytes?
The classic meaning dates back to the 1960s (or earlier), when overlapped I/O meant that multiple I/O transfers (normally each to a different device) could occur at the same time (like concurrent reads from tape and writes to disk). An alternative classic name for this was concurrent I/O. It could be accomplished via interrupts and/or hardware similar to DMA (in those days, some DMA hardware implementations were more like a set of small processors).
Example article for IBM mainframe:
Overlapped I/O - IBM
I think the idea was (20 years ago) that you could start an I/O, perform some computation or other work, and later wait for the result. This is rarely done today. I think the idea comes from a time when select and poll were considered state of the art.
A better name would be asynchronous IO. That's what every other platform seems to call it. In fact the MSDN documentation mixes the two terms.
An "overlapped" operation in Microsoft Windows is simply what every other OS calls an asynchronous operation.
To stick with your example: you start a read on a socket and, instead of waiting for it to complete, do something completely different (maybe a read on a different(!) socket), then later ask whether the first operation has finished.
You can also associate an event handle with the operation, or supply a callback function that is invoked on completion.
In that case, the first call "overlapped" the rest of your operations.
Also have a look at Wikipedia.
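A minimal Win32 sketch of the pattern being described (start a read, go do other work, then collect the result; the file name and buffer size are placeholders):

```cpp
#include <windows.h>
#include <stdio.h>

int main() {
    HANDLE file = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    char buffer[4096];
    OVERLAPPED ov = {0};
    ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);  // signalled on completion

    // Start the read; it may finish immediately or return ERROR_IO_PENDING.
    if (!ReadFile(file, buffer, sizeof buffer, NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING)
        return 1;

    // ... do completely different work here; this is what the read "overlaps" ...

    // Now ask whether (and how) the first operation finished.
    DWORD bytesRead = 0;
    if (GetOverlappedResult(file, &ov, &bytesRead, TRUE))  // TRUE = wait for it
        printf("read %lu bytes\n", (unsigned long)bytesRead);

    CloseHandle(ov.hEvent);
    CloseHandle(file);
    return 0;
}
```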
My guess as to why Microsoft called it "overlapped" is that it is not like starting a thread; it is more like starting an async task, at a time when there was no standard name for such a thing. It is closer to std::async than to std::thread.
I'm working on an MMO game server project and I have a problem: select()'s descriptor limit. I want to handle more than 1024 socket I/O operations with a single thread.

I tried a multi-threaded handling system first. On a 4-core processor it creates three select()-handling threads (the main thread being the fourth), but that just moves the limit to 3072 (1024 * 3), which isn't a solution either.

Then I tried non-blocking sockets, calling two different select()s back to back in a single thread and handling each as it returns. But there's another problem: if I run them as while(true) { select(); select(); } with non-blocking selects, I'll peg the CPU like an empty while(true) loop, and if I give the first select() a timeout, I can't handle the second one in real time. I can't come up with an algorithm for this. Can anybody help me?
NOTE: I don't want to use poll/epoll/WSAPoll etc. (poll cannot handle microsecond timeouts and isn't as fast as select!) or third-party libraries like libevent (I want to make my own!)
FINAL SOLUTION (I think): I don't need microsecond timing for an I/O operation; there's no point in it. poll is a good way to handle more than 1024 socket I/Os. I'll do more research into how MMO systems work. And lastly, I'll run some tests and try things myself before asking a question :) Thanks!
EDIT: I'm new to this Q&A platform. If you downvote, can you tell me what's wrong with my question? :)
Using select is fundamentally wrong for this many (thousands of) connections. While select is usually faster when you have only a very small number of sockets (maybe tens), it scales horribly to several thousand and more. Everywhere I know of, select slows down linearly with the number of connections (it's even worse than that, but I won't go into the details).
Even poll doesn't do much better than select at scaling to thousands of connections. It doesn't have select's (low) limit on the number of file descriptors you can poll, but it still scales linearly with the number of connections.
What you really should use are platform-specific facilities like epoll and kqueue. They scale far better (usually O(1)), but obviously they aren't portable.
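For illustration, a minimal epoll loop (Linux-only; listen_fd is assumed to be an already-bound, listening socket):

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <vector>

// One epoll instance watches the listening socket plus every accepted
// client; epoll_wait returns only the descriptors that are ready,
// regardless of how many thousands are registered.
void event_loop(int listen_fd) {
    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    std::vector<epoll_event> ready(1024);  // batch size, not a connection cap
    for (;;) {
        int n = epoll_wait(ep, ready.data(), (int)ready.size(), -1);
        for (int i = 0; i < n; ++i) {
            int fd = ready[i].data.fd;
            if (fd == listen_fd) {  // new connection: register it too
                epoll_event cev{};
                cev.events = EPOLLIN;
                cev.data.fd = accept(listen_fd, nullptr, nullptr);
                epoll_ctl(ep, EPOLL_CTL_ADD, cev.data.fd, &cev);
            } else {
                // recv() from fd and handle the data
            }
        }
    }
}
```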
I seriously suggest that you consider something like libev, which is a portable, well-tested, thin wrapper around these platform-specific facilities.
This matters because the platform-specific methods (select, poll, epoll, kqueue, I/O completion ports, event ports, etc.) all differ from each other, none of them is available on more than one or two platforms, and their limits and behavioral details vary. They can even change from one version of an OS to the next (e.g. epoll on Linux 2.6.9, IIRC).
Even if you are not concerned with portability or future-proofing your code, such a library can provide you with more functionality and a nicer interface.
Two more libraries you can try are libevent (a little larger and slower, but more features) and libuv (if you need Windows portability.)
Given the requirements you have set, your problem has no solution.
The normal way to overcome select()'s limit of FD_SETSIZE (1024) file descriptors is to use poll() (or the even better alternatives epoll and kqueue), but you've rejected that option.
Otherwise, you could always overcome the problem by calling select() multiple times in parallel in different threads with different sets of file descriptors... but you've rejected that option too.
I don't believe there can really be any other solution!
Perhaps you should explain why both the poll() et al option and the thread option are not suitable. Your requirements seem like artificial limitations without justification.
I'm quite new to network programming and I'm writing a program that should accept many TCP connections and receive data from them. To parallelize things, the agent should read data from each socket in a new thread. I decided to use boost::asio instead of raw *nix sockets to make things simpler, though this seems to have been the wrong decision...

I wonder whether calling boost::asio::read or boost::asio::read_some blocks only the calling thread or the whole process. Yes, I should write a small test and see the results myself, but I have no access to my Linux box right now. I'm just thinking about the code I should write tomorrow at university.
So if it blocks the process, what's correct way of implementing a server/client architecture that accepts many clients at same time?
Notes:
I'm having difficulty with the design decisions. Any suggestion is welcome.
The read and read_some calls are both blocking, and will only block the current thread on Linux and Win32 (and probably most other platforms; I just don't have direct experience with them).

You might want to look into using async_read instead if you have a large number of incoming connections, as you may actually do better performance-wise using fewer threads than connections. Boost provides examples of using a thread pool to handle client connections; see the sketch below.
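For example, a bare-bones Asio session along the lines of Boost's own examples (the class and member names here are made up):

```cpp
#include <boost/asio.hpp>
#include <iostream>
#include <memory>

using boost::asio::ip::tcp;

// Each connection owns a socket and keeps itself alive via
// shared_from_this() while an async read is outstanding, so one small
// pool of threads running io_context::run() can serve many clients.
class Session : public std::enable_shared_from_this<Session> {
public:
    explicit Session(tcp::socket socket) : socket_(std::move(socket)) {}
    void start() { do_read(); }

private:
    void do_read() {
        auto self = shared_from_this();
        socket_.async_read_some(boost::asio::buffer(buf_),
            [this, self](boost::system::error_code ec, std::size_t n) {
                if (!ec) {
                    std::cout.write(buf_, n);  // handle the data
                    do_read();                 // queue the next read
                }
                // on error, `self` goes out of scope and the session dies
            });
    }

    tcp::socket socket_;
    char buf_[1024];
};
```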
I have written a chat server in C on Linux. I have tested it and it performs fine. The only thing holding it back is that I am using the select system call to handle socket descriptors. Since select is limited to 1024 descriptors, my chat server can handle at most 1024 users concurrently.

I know the other option is poll, but I'm not sure about it or how its performance compares to select.

Please suggest the most effective way to resolve this situation.
poll() can be used as an almost drop-in replacement for select(), and will allow you to exceed 1024 file descriptors (you can make the array passed to poll() as large as you want).
It will have similar performance characteristics to select(), since both require the kernel and userspace application to scan the entire array - but if select() is working OK for you, then poll() should too. (There is actually a slight performance improvement in poll() - the .events field, specifying the events you are interested in for each file descriptor, is not changed by poll(), so you don't have to rebuild the array before every call like you do with the file descriptor sets passed to select()).
If you later find yourself having performance problems caused by scanning the poll file descriptor array, you can consider switching to the epoll interface, which is more complicated but also scales better with very large numbers of file descriptors.
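A sketch of the poll() loop shape being described (error handling mostly elided):

```cpp
#include <poll.h>
#include <vector>

// `fds` can be any size, unlike a fixed fd_set; .events is set once
// per descriptor and poll() reports readiness in .revents, so there is
// no per-call rebuild of the set.
void serve(std::vector<pollfd>& fds) {
    for (;;) {
        int n = poll(fds.data(), (nfds_t)fds.size(), -1);  // block until ready
        if (n < 0)
            break;  // error (EINTR handling elided)
        for (pollfd& p : fds) {
            if (p.revents & POLLIN) {
                // read from p.fd here
            }
        }
    }
}
```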
Your question is known as the C10K problem (how to deal with more than ten thousand simultaneous connections). You'll find a lot of resources on the web, e.g. this one.

You should also consider select an obsolete system call. Even with only dozens of file descriptors, you should at least prefer poll.
Note that Qt and Gtk provide event-loop machinery, often built on poll (and QtCore or Glib can be used outside of graphical interfaces). There are also libev and libevent. I suggest using one of them.
Linux has no hard 1024 limit on select(), but:

- select() performance is very poor
- FreeBSD does have that limit :)

You can use poll(), but its performance suffers as the number of active connections increases.

Using epoll() is preferable on Linux; however, I would suggest using libevent. libevent is a fast, clean, and portable way to implement heavily loaded servers, and on Linux it uses epoll under the hood.
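As a small taste of the libevent (2.x) API, a minimal read watcher; socket setup is omitted and `fd` is a placeholder for a connected, non-blocking socket:

```cpp
#include <event2/event.h>

// Invoked by the event loop whenever `fd` becomes readable.
static void on_readable(evutil_socket_t fd, short /*what*/, void* /*arg*/) {
    char buf[1024];
    // recv(fd, buf, sizeof buf, 0) and process the data here
    (void)fd; (void)buf;
}

int main() {
    event_base* base = event_base_new();
    evutil_socket_t fd = 0;  // placeholder: your connected socket
    event* ev = event_new(base, fd, EV_READ | EV_PERSIST, on_readable, nullptr);
    event_add(ev, nullptr);         // no timeout
    event_base_dispatch(base);      // run the loop; epoll-backed on Linux
    event_free(ev);
    event_base_free(base);
}
```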
I have created two classes: one for reading input (through an istream object) and parsing it, and one for processing the output of the parser.
There is one instance of each of those.
I have the parser running in a loop calling istream::get() and then creating commands for the second object based upon the input. These commands are then put on a queue which the second object processes in a separate thread.
Now it is quite obvious that I eventually need to be able to send a "Quit" command. Here the problem arises though: The "Quit" command needs to end the parsing loop as well but I can't find a way to signal the parser that it should quit because it is caught within istream::get().
I would need a way to wake it from that method, but I cannot find any...
I have thought of writing some sort of "termination sequence" to the istream object (which in this case is cin) by creating an ostream object from istream::rdbuf(). But that doesn't work: the badbit is set after the attempt to write to the buffer.
In another question on Stack Overflow I saw Boost's asio library mentioned, but I'd rather not depend on third-party libraries.
Is there a way to wake the thread from istream::get() - i.e. is there a way to write to the istream buffer (maybe assuming it actually is cin) from within the program?
Another approach would be to kill the thread which I could find acceptable as well since there is no cleanup needed in that specific place. But how can this be done? (I'm relying on a POSIX thread implementation)
You will have to depend on something other than the standard iostream classes, because they don't provide select()-style behaviour.
Also, killing the thread is impossible with POSIX (and utterly broken in Windows). You can issue a cancellation request via pthread_cancel(), but in your case, it may be stuck in an un-cancellable system call. Of particular interest to you, read() may or may not be cancellable, depending on the environment. At least one environment says that a cancellation point may occur in read(), though admittedly it is a Windows POSIX layer. Also, Mac OS X, as recently as Leopard 10.5.1, had a broken read() implementation with respect to cancellability.
Once past this hurdle, you also have to consider the uneasy relationship between C++ destructors and pthread_cancel. Not all environments guarantee that destructors will be called, so you have to be extremely cautious when using pthread_cancel in C++ code.
In short, for interruptible I/O, use low-level I/O and select(): one fd for I/O, a second fd (created by pipe()) for signalling. Or, if you're brave, use AIO, but you're probably better off using a high level interface such as Boost.Asio.
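A sketch of that two-descriptor arrangement (the self-pipe trick): select() watches both stdin and the read end of a pipe, and writing one byte to the pipe from any other thread wakes the loop so it can quit cleanly.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <algorithm>

int wakeup[2];  // created once with pipe(wakeup); [0] is watched, [1] is written to

// Returns true if a quit was requested via the pipe, false on EOF/error.
bool read_loop() {
    for (;;) {
        fd_set rd;
        FD_ZERO(&rd);
        FD_SET(STDIN_FILENO, &rd);
        FD_SET(wakeup[0], &rd);
        int maxfd = std::max(STDIN_FILENO, wakeup[0]);
        if (select(maxfd + 1, &rd, nullptr, nullptr, nullptr) < 0)
            return false;
        if (FD_ISSET(wakeup[0], &rd))
            return true;  // quit requested: write(wakeup[1], "q", 1) elsewhere
        if (FD_ISSET(STDIN_FILENO, &rd)) {
            char buf[256];
            ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
            if (n <= 0)
                return false;  // EOF or read error
            // feed buf[0..n) to the parser
        }
    }
}
```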
Any chance you're implementing this in .NET? If so, take a look at the Reactive Framework.
It provides a very elegant way of handling streams and especially cancelling them on the fly.
On top of this, you get a very extensible library of LINQ extensions for all sorts of things, like buffering, memoization, Zip, etc.

We use it a lot for transforming, parsing, and modelling streamed data.

Jeff from the Reactive team has a couple of nice blog posts about streaming and Rx here: