Performance difference between select() vs WSAEventSelect() vs WSAWaitForMultipleEvents() - c++

I have an application that is using a cross platform library developed internally.
For various reasons I must stick with using this library under windows at least.
That library contains a socket class that I have to use and it's calling select.
I have the ability to modify the library a little.
Would there be a performance improvement if going to WSAWaitForMultipleEvents or WSAEventSelect?
Bear in mind that the client library is based on a blocking I/O.
i.e. it calls select to check read first before issuing recvfrom, and the same for writing.
From what I can see there is quite a lot of set up just for select and wondered if I might improve the polling speed by going to the windows native versions as my UDP server which is linux based is occasionally overwealming my receiver which stalls it. i.e. the receiver while not doing a lot has trouble keeping up. Increasing the receive buffers has helped enormously but now I'm looking at select.

Related

Why libcurl needs `CURLOPT_NOSIGNAL` option and what are side effects when it is enabled?

I spent a lot of time to investigate why multithreaded libcurl application crashes on Linux. I saw in forums that I have to use CURLOPT_NOSIGNAL to bypass this problem. Ok, no problems, but are there any information what side effects can it create? If CURLOPT_NOSIGNAL = 0 is buggy, why libcurl needs this option at all nowadays when even mobile devices have multicore processors and that is why a lot of applications use multiple threads to use this hardware multitasking support?
By default the DNS resolution uses signals to implement the timeout logic but this is not thread-safe: the signal could be executed on another thread than the original thread that started it.
When libcurl is not built with async DNS support (which means threaded resolver or c-ares) you must set the CURLOPT_NOSIGNAL option to 1 in your multi-threaded application.
You can find more details related to this topic here:
http://curl.haxx.se/libcurl/c/libcurl-tutorial.html#Multi-threading
http://www.redhat.com/archives/libvir-list/2012-September/msg01960.html
http://curl.haxx.se/mail/lib-2013-03/0086.html
Why libcurl needs this option at all nowadays?
Mainly for legacy reasons, but also because not all applications use libcurl in a multi-threaded context.
This is still something that is actively discussed. See this recent discussion:
libcurl has no threads on its own that it needs to protect and it doesn't know
what thread library/concept you use so it cannot on its own set the callbacks.
This has been the subject for discussion before and there are indeed valid
reasons to reconsider what we can and should do, but this is how things have
been since forever and still are. [...] but I'm always open for further discussions!

VNC viewer implementation

Our team is implementing a VNC viewer (=VNC client) on Windows. The protocol (called RFB) is stateful, meaning that the viewer has to read 1 byte, see what it is, then read either 3 or 10 bytes more, parse them, and so on.
We've decided to use asynchronous sockets and a single (UI) thread. Consequently, there are 2 ways to go:
1) state machine -- if we get a block on socket reading, just remember the current state and quit. Later on, a socket notification will arrive and the interrupted logic will resume from the proper stage;
2) inner message loop -- once we determine that reading from the socket would block, we enter an inner message loop and spin there until all the necessary data is finally received.
UI is not thus frozen in case of a block.
As experience showed, the second approach is bad, as any message can come while we're in the inner message loop. I cannot tell the full story here, but it simply is not reliable enough. Crashes and kludges.
The first option seems to be quite acceptable, but it is not easy to program in such a style. One has to remember the state of an algorithm and values of all the local variables required for further processing.
This is quite possible to use multiple threads, but we just thought that the problems in this case would be even much harder: synchronization of frame-buffer access, multi-threading issues, etc. Moreover, even in this variant it seems necessary to use asynchronous sockets as well.
So, what way is in your opinion the best ?
The problem is quite a general one. This is the problem of organizing asynchronous communication through stateful protocols.
Edit 1: We use C++ and MFC as UI framework.
I've done a few parallel computing projects and it seems that MPI (Message Passing Interface) might be helpful to your VNC project. You're probably not so interested in the parallel computing power provided by MPI, but you may want to use the simplified socket-like interface for asynchronous communication over a network.
http://www.open-mpi.org/
You can find other implementations of MPI and tons of use examples from google.
Don't bother with CSocket, you'll move to CAsyncSocket in the end because of the extra control you get (interrupting, shutting down etc.). I'd also recommend using a separate thread to manage the communication, it adds complexity but keeping the UI responsive should be a top priority.
I think you will find that your design will be simplified greatly by using a separate thread to handle a blocking socket.
The main reason for this is you don't need to spin and wait. The UI remains responsive while the network thread(s) block when it has nothing to do and comes back when it has stuff to do. You are effectively offloading a large portion of your overhead to the OS.
Remember, RFB does not require a whole lot of state info to work. Because client to server messages are short; there is nothing requiring you to receive a frame buffer before you send your next pointer input.
My point being is messages in RFB can be intermixed; the server will work on your schedule.
Now, Windows provides easy to use synchronization API's that while not always the most efficient, are more than enough for your purposes and will ease getting a proof of concept up and going.
Take a look at Windows Synchronization and specifically Critical Sections
Just my 2cents, I've implemented both a vnc server and client on windows, these were my impressions.

Regarding handling more than 1024 socket descriptors

I have written a chat server using C on Linux. I have tested the same and it works fine with respect to performance. The only thing which lags is that I am using select system call for handling of sockets descriptors. Since select has the limit of 1024 so at max my chat server can handle only 1024 users concurrently.
I know that the other option which I can use is poll, but not so sure about it and its performance as compared to select.
Please suggest me the most effective way by which I can resolve this situation.
poll() can be used as an almost drop-in replacement for select(), and will allow you to exceed 1024 file descriptors (you can make make the array passed to poll() as large as you want).
It will have similar performance characteristics to select(), since both require the kernel and userspace application to scan the entire array - but if select() is working OK for you, then poll() should too. (There is actually a slight performance improvement in poll() - the .events field, specifying the events you are interested in for each file descriptor, is not changed by poll(), so you don't have to rebuild the array before every call like you do with the file descriptor sets passed to select()).
If you later find yourself having performance problems caused by scanning the poll file descriptor array, you can consider switching to the epoll interface, which is more complicated but also scales better with very large numbers of file descriptors.
Your question is known as the C10K problem (how to deal with more than 10 thousands simultaneous connections). You'll find lot of resources on the web, e.g. this one.
And you should consider select as an obsolete system call. Even with only dozens of file descriptors, you should at least prefer poll
Notice that Qt and Gtk provide you with an event loop machinery, often using poll (and QtCore or Glib can be used outside of graphical interfaces). There is also libev and libevent. I suggest using one of them.
Linux has no 1024 limit on select(). But:
select() performance is very poor
FreeBSD does :)
Your can use poll(). But its performance suffers when number of active connections increases.
Using epoll() is preferable on Linux however I would suggest to use libevent
libevent is fast, clean and portable way to implement heavy loaded servers and for linux it has epoll under the hood.

Most suitable asynchronous socket model for an instant messenger client?

I'm working on an instant messenger client in C++ (Win32) and I'm experimenting with different asynchronous socket models. So far I've been using WSAAsyncSelect for receiving notifications via my main window. However, I've been experiencing some unexpected results with Winsock spawning additionally 5-6 threads (in addition to the initial thread created when calling WSAAsyncSelect) for one single socket.
I have plans to revamp the client to support additional protocols via DLL:s, and I'm afraid that my current solution won't be suitable based on my experiences with WSAAsyncSelect in addition to me being negative towards mixing network with UI code (in the message loop).
I'm looking for advice on what a suitable asynchronous socket model could be for a multi-protocol IM client which needs to be able to handle roughly 10-20+ connections (depending on amount of protocols and protocol design etc.), while not using an excessive amount of threads -- I am very interested in performance and keeping the resource usage down.
I've been looking on IO Completion Ports, but from what I've gathered, it seems overkill. I'd very much appreciate some input on what a suitable socket solution could be!
Thanks in advance! :-)
There are four basic ways to handle multiple concurrent sockets.
Multiplexing, that is using select() to poll the sockets.
AsyncSelect which is basically what you're doing with WSAAsyncSelect.
Worker Threads, creating a single thread for each connection.
IO Completion Ports, or IOCP. dp mentions them above, but basically they are an OS specific way to handle asynchronous I/O, which has very good performance, but it is a little more confusing.
Which you choose often depends on where you plan to go. If you plan to port the application to other platforms, you may want to choose #1 or #3, since select is not terribly different from other models used on other OS's, and most other OS's also have the concept of threads (though they may operate differently). IOCP is typically windows specific (although Linux now has some async I/O functions as well).
If your app is Windows only, then you basically want to choose the best model for what you're doing. This would likely be either #3 or #4. #4 is the most efficient, as it calls back into your application (similar, but with better peformance and fewer issues to WSAsyncSelect).
The big thing you have to deal with when using threads (either IOCP or WorkerThreads) is marshaling the data back to a thread that can update the UI, since you can't call UI functions on worker threads. Ultimately, this will involve some messaging back and forth in most cases.
If you were developing this in Managed code, i'd tell you to look at Jeffrey Richter's AysncEnumerator, but you've chose C++ which has it's pros and cons. Lots of people have written various network libraries for C++, maybe you should spend some time researching some of them.
consider to use the ASIO library you can find in boost (www.boost.org).
Just use synchronous models. Modern operating systems handle multiple threads quite well. Async IO is really needed in rare situations, mostly on servers.
In some ways IO Completion Ports (IOCP) are overkill but to be honest I find the model for asynchronous sockets easier to use than the alternatives (select, non-blocking sockets, Overlapped IO, etc.).
The IOCP API could be clearer but once you get past it it's actually easier to use I think. Back when, the biggest obstacle was platform support (it needed an NT based OS -- i.e., Windows 9x did not support IOCP). With that restriction long gone, I'd consider it.
If you do decide to use IOCP (which, IMHO, is the best option if you're writing for Windows) then I've got some free code available which takes away a lot of the work that you need to do.
Latest version of the code and links to the original articles are available from here.
And my views on how my framework compares to Boost::ASIO can be found here: http://www.lenholgate.com/blog/2008/09/how-does-the-socket-server-framework-compare-to-boostasio.html.

alternatives to winsock2 with example server source in c++

i'm using this example implementation found at http://tangentsoft.net/wskfaq/examples/basics/select-server.html
This is doing most of what I need, handles connections without blocking and does all work in its thread (not creating a new thread for each connection as some examples do), but i'm worried since i've been told winsock will only support max 64 client connectios :S
Is this 64 connections true?
What other choices do I have? It would be cool to have a c++ example for a similar implementation.
Thanks
Alternative library:
You should consider using boost asio. It is a cross platform networking library which simplifies many of the tasks you may have to do.
You can find the example source code you seek here.
About the 64 limit:
There is no hard 64 connection limit that you will experience with a good design. Basically if you use some kind of threading model you will not experience this limitation.
Here's some information on the limit you heard about:
4.9 - What are the "64 sockets" limitations?
There are two 64-socket limitations:
The Win32 event mechanism (e.g.
WaitForMultipleObjects()) can only
wait on 64 event objects at a time.
Winsock 2 provides the
WSAEventSelect() function which lets
you use Win32's event mechanism to
wait for events on sockets. Because it
uses Win32's event mechanism, you can
only wait for events on 64 sockets at
a time. If you want to wait on more
than 64 Winsock event objects at a
time, you need to use multiple
threads, each waiting on no more than
64 of the sockets.
The select() function is also limited
in certain situations to waiting on 64
sockets at a time. The FD_SETSIZE
constant defined in winsock.h
determines the size of the fd_set
structures you pass to select(). It's
defined by default to 64. You can
define this constant to a higher value
before you #include winsock.h, and
this will override the default value.
Unfortunately, at least one
non-Microsoft Winsock stack and some
Layered Service Providers assume the
default of 64; they will ignore
sockets beyond the 64th in larger
fd_sets.
You can write a test program to try
this on the systems you plan on
supporting, to see if they are not
limited. If they are, you can get
around this with threads, just as you
would with event objects.
Source
#Brian:
if ((gConnections.size() + 1) > 64) {
// For the background on this check, see
// www.tangentsoft.net/wskfaq/advanced.html#64sockets
// The +1 is to account for the listener socket.
cout << "WARNING: More than 63 client "
"connections accepted. This will not "
"work reliably on some Winsock "
"stacks!" << endl;
}
To the OP:
Why would you not want to use winsock2?
You could try to look at building your own server using IOCP, although making this cross-platform is a little tricky. You could look at Boost::asio like Brian suggested.
Before you decide that you need 'alternatives to winsock2" please read this: Network Programming for Microsoft Windows.
In summary, you DON'T need an 'alternative to Winsock2' you need to understand how to use the programming models supplied to full effect on the platform that you're targeting. Then, if you really need cross platform sockets code that uses async I/O then look at ASIO, but, if you don't really need cross platform code then consider something that actually focuses on the problems that you might have on the platform that you do need to focus on - i.e. something windows specific. Go back to the book mentioned above and take a look at the various options you have.
The most performant and scalable option is to use IO Completion Ports. I have some free code available from here that makes it pretty easy to write a server that scales and performs well on a windows (NT) based platform; the linked page also links to some articles that I've written about this. A comparison of my framework to ASIO can be found here: http://www.lenholgate.com/blog/2008/09/how-does-the-socket-server-framework-compare-to-boostasio.html.