VNC viewer implementation - c++

Our team is implementing a VNC viewer (=VNC client) on Windows. The protocol (called RFB) is stateful, meaning that the viewer has to read 1 byte, see what it is, then read either 3 or 10 bytes more, parse them, and so on.
We've decided to use asynchronous sockets and a single (UI) thread. Consequently, there are 2 ways to go:
1) state machine -- if we get a block on socket reading, just remember the current state and quit. Later on, a socket notification will arrive and the interrupted logic will resume from the proper stage;
2) inner message loop -- once we determine that reading from the socket would block, we enter an inner message loop and spin there until all the necessary data is finally received.
UI is not thus frozen in case of a block.
As experience showed, the second approach is bad, as any message can come while we're in the inner message loop. I cannot tell the full story here, but it simply is not reliable enough. Crashes and kludges.
The first option seems to be quite acceptable, but it is not easy to program in such a style. One has to remember the state of an algorithm and values of all the local variables required for further processing.
This is quite possible to use multiple threads, but we just thought that the problems in this case would be even much harder: synchronization of frame-buffer access, multi-threading issues, etc. Moreover, even in this variant it seems necessary to use asynchronous sockets as well.
So, what way is in your opinion the best ?
The problem is quite a general one. This is the problem of organizing asynchronous communication through stateful protocols.
Edit 1: We use C++ and MFC as UI framework.

I've done a few parallel computing projects and it seems that MPI (Message Passing Interface) might be helpful to your VNC project. You're probably not so interested in the parallel computing power provided by MPI, but you may want to use the simplified socket-like interface for asynchronous communication over a network.
http://www.open-mpi.org/
You can find other implementations of MPI and tons of use examples from google.

Don't bother with CSocket, you'll move to CAsyncSocket in the end because of the extra control you get (interrupting, shutting down etc.). I'd also recommend using a separate thread to manage the communication, it adds complexity but keeping the UI responsive should be a top priority.

I think you will find that your design will be simplified greatly by using a separate thread to handle a blocking socket.
The main reason for this is you don't need to spin and wait. The UI remains responsive while the network thread(s) block when it has nothing to do and comes back when it has stuff to do. You are effectively offloading a large portion of your overhead to the OS.
Remember, RFB does not require a whole lot of state info to work. Because client to server messages are short; there is nothing requiring you to receive a frame buffer before you send your next pointer input.
My point being is messages in RFB can be intermixed; the server will work on your schedule.
Now, Windows provides easy to use synchronization API's that while not always the most efficient, are more than enough for your purposes and will ease getting a proof of concept up and going.
Take a look at Windows Synchronization and specifically Critical Sections
Just my 2cents, I've implemented both a vnc server and client on windows, these were my impressions.

Related

Run function periodically in Cap'n Proto RPC server

I have a Cap'n Proto RPC server that runs some OpenGL commands in a window. I am not interested in the window's events at all, but in order to avoid getting killed on Windows I need to poll events once a second or so. How can I do this in a simple fashion?
I have read that you can make your own EventPort, but I couldn't figure out how to actually use EventPorts. It might also be overkill when I'm not actually interested in the events. I would like prioritize RPC events over polling the window if possible.
Using something else than EZ-rpc is not a downside, as I want to move to shared memory communication later on.
So, there's this critical flaw in Windows event handling: The best way to handle network I/O, especially with many connections, is via I/O Completion Ports (IOCP). However, unfortunately, Windows provides no way for a thread to wait on IOCP events and GUI events in the same thread. This seems to be a serious design flaw in the Win32 API, yet it's been this way for decades. Weirder still, the internal NT kernel APIs do in fact support an alternative (specifically, they allow I/O completion events to be delivered via APC) but Microsoft hasn't made these APIs public, so applications that use them could break in a future version of Windows.
As a result, there are essentially two ways to design a program that simultaneously does network I/O and implements a GUI:
Use a MsgWaitForMultipleObjectsEx-based event loop instead of IOCP. You will be limited to no more than 64 connections, and the event loop will be relatively inefficient.
Have separate threads for network and GUI.
For your use case, it sounds like #1 would probably be fine, but there's another problem: The KJ event loop library (used by Cap'n Proto) doesn't implement this case yet. It only implements IOCP-based networking. There's a class Win32WaitObjectThreadPool defined in kj/async-win32.h meant to handle the GUI event loop approach... but at present it is not implemented. (PRs are welcome if you'd like to contribute!)
If you truly don't care about handling GUI events in a timely fashion, then perhaps a hack would work: You could use kj::Timer to create a loop that waits for a second, then checks the Win32 GUI event queue, then waits again, and so on. This is really ugly but would probably be easy to implement. I'm not sure if kj::Timer is exposed via EZ-rpc, so you may have to go to lower-level building blocks like kj::setupAsyncIo() instead.

TCP Two-Way Communication using Qt

I am trying to setup a TCP communication framework between two computers. I would like each computer to send data to the other. So computer A would perform a calculation, and send it to computer B. Computer B would then read this data, perform a calculation using it, and send a result back to computer A. Computer A would wait until it receives something from computer B before proceeding with performing another calculation, and sending it to computer B.
This seems conceptually straightforward, but I haven't been able to locate an example that details two-way (bidirectional) communication via TCP. I've only found one-way server-client communication, where a server sends data to a client. These are some examples that I have looked at closely so far:
Server-Client communication
Synchronized server-client communication
I'm basically looking to have two "servers" communicate with each other. The synchronized approach above is, I believe, important for what I'm trying to do. But I'm struggling to setup a two-way communication framework via a single socket.
I would appreciate it greatly if someone could point me to examples that describe how to setup bidirectional communication with TCP, or give me some pointers on how to set this up, from the examples I have linked above. I am very new to TCP and network communication frameworks and there might be a lot that I could be misunderstanding, so it would be great if I could get some clear pointers on how to proceed.
This answer does not go into specifics, but it should give you a general idea, since that's what you really seem to be asking for. I've never used Qt before, I do all my networking code with BSD-style sockets directly or with my own wrappers.
Stuff to think about:
Protocol. Hand-rolled or existing?
Existing protocols can be heavyweight, depending on what your payload looks like. Examples include HTTP and Google ProtoBuf; there are many more.
Handrolled might mean more work, but more controlled. There are two general approaches: length-based and sentinel-based.
Length-based means embedding the length into the first bytes. Requires caring about endianness. Requires thinking about what if a message is longer than can be embedded in the length byte. If you do this, I strongly recommend that you define your packet formats in some data file, and then generate the low-level packet encoding logic.
Sentinel-based means ending the message when some character (or sequence) is seen. Common sentinels are '\0', '\n', and "\r\n". If the rest of your protocol is also text-based, this means it is much easier to debug.
For both designs, you have to think about what happens if the other side tries to send more data than you are willing (or able) to store in memory. In either case, limiting the payload size to a 16-bit unsigned integer is probably a good idea; you can stream replies with multiple packets. Note that serious protocols (based on UDP + crypto) typically have a protocol-layer size limit of 512-1500 bytes, though application-layer may be larger of course.
For both designs, EOF on the socket without having a sentinel means you must drop the message and log an error.
Main loop. Qt probably has one you can use, but I don't know about it.
It's possible to develop simple operations using solely blocking operations, but I don't recommend it. Always assume the other end of a network connection is a dangerous psychopath who knows where you live.
There are two fundamental operations in a main loop:
Socket events: a socket reports being ready for read, or ready to write. There are also other sorts of events that you probably won't use, since most useful information can be found separately in the read/write handlers: exceptional/priority, (write)hangup, read-hangup, error.
Timer events: when a certain time delta has passed, interrupt the wait-for-socket-events syscall and dispatch to the timer heap. If you don't have any, either pass the syscalls notion of "infinity". But if you have long sleeps, you might want some arbitrary, relatively number like "10 seconds" or "10 minutes" depending on your application, because long timer intervals can do all sorts of weird things with clock changes, hibernation, and such. It's possible to avoid those if you're careful enough and use the right APIs, but most people don't.
Choice of multiplex syscall:
The p versions below include atomic signal mask changing. I don't recommend using them; instead if you need signals either add signalfd to the set or else emulate it using signal handlers and a (nonblocking, be careful!) pipe.
select/pselect is the classic, available everywhere. Cannot have more than FD_SETSIZE file descriptors, which may be very small (but can be #defined on the command-line if you're careful enough. Inefficient with sparse sets. Timeout is microseconds for select and nanonseconds for pselect, but chances are you can't actually get that. Only use this if you have no other choice.
poll/ppoll solves the problems of sparse sets, and more significantly the problem of listening to more than FD_SETSIZE file descriptors. It does use more memory, but it is simpler to use. poll is POSIX, ppoll is GNU-specific. For both, the API provides nanosecond granularity for the timeout, but you probably can't get that. I recommend this if you need BSD compatibility and don't need massive scalability, or if you only have one socket and don't want to deal with epoll's headaches.
epoll solves the problem of having to respecify the file descriptor and event list every time. by keeping the list of file descriptors. Among other things, this means that when, the low-level kernel event occurs, the epoll can immediately be made aware, regardless of whether the user program is already in a syscall or not. Supports edge-triggered mode, but don't use it unless you're sure you understand it. Its API only provides millisecond granularity for the timeout, but that's probably all you can rely on anyway. If you are able to only target Linux, I strongly suggest you use this, except possibly if you can guarantee only a single socket at once, in which case poll is simpler.
kqueue is found on BSD-derived systems, including Mac OS X. It is supposed to solve the same problems as epoll, but instead of keeping things simple by using file descriptors, it has all sorts of strange structures and does not follow the "do only one thing" principle. I have never used it. Use this if you need massive scalability on BSD.
IOCP. This only exists on Windows and some obscure Unixen. I have never used it and it has significantly different semantics. Use this, but be aware that much of this post is not applicable because Windows is weird. But why would you use Windows for any sort of serious system?
io_uring. A new API in Linux 5.1. Significantly reducing the number of syscalls and memory copies. Worth it if you have a lot of sockets, but since it's so new, you must provide a fallback path.
Handler implementation:
When the multiplex syscall signifies an event, look up the handler for that file number (some class with virtual functions) and call the relevant events (note there may be more than one).
Make sure all your sockets have O_NONBLOCK set and also disable Nagle's algorithm (since you're doing buffering yourself), except possibly connect's before the connection is made, since that requires confusing logic, especially if you want to play nice with multiple DNS results.
For TCP sockets, all you need is accept in the listening socket's handler, and read/write family in the accept/connected handler. For other sorts of sockets, you need the send/recv family. See the "see also" in their man pages for more info - chances are one of them will be useful to you sometimes, do this before you hard-code too much into your API design.
You need to think hard about buffering. Buffering reads means you need to be able to check the header of a packet to see if there are enough bytes to do anything with it, or if you have to store the bytes until next time. Also remember that you might receive more than one packet at once (I suggest you rethink your design so that you don't mandate blocking until you get the reply before sending the next packet). Buffering writes is harder than you think, since you don't want to be woken when there is a "can write" even on a socket for which you have no data to write. The application should never write itself, only queue a write. Though TCP_CORK might imply a different design, I haven't used it.
Do not provide a network-level public API of iterating over all sockets. If needed, implement this at a higher level; remember that you may have all sorts of internal file descriptors with special purposes.
All of the above applies to both the server and the client. As others have said, there is no real difference once the connection is set up.
Edit 2019:
The documentation of D-Bus and 0MQ are worth reading, whether you use them or not. In particular, it's worth thinking about 3 kinds of conversations:
request/reply: a "client" makes a request and the "server" does one of 3 things: 1. replies meaningfully, 2. replies that it doesn't understand the request, 3. fails to reply (either due to a disconnect, or due to a buggy/hostile server). Don't let un-acknowledged requests DoS the "client"! This can be difficult, but this is a very common workflow.
publish/subscribe: a "client" tells the "server" that it is interested in certain events. Every time the event happens, the "server" publishes a message to all registered "clients". Variations: , subscription expires after one use. This workflow has simpler failure modes than request/reply, but consider: 1. the server publishes an event that the client didn't ask for (either because it didn't know, or because it doesn't want it yet, or because it was supposed to be a oneshot, or because the client sent an unsubscribe but the server didn't process it yet), 2. this might be a magnification attack (though that is also possible for request/reply, consider requiring requests to be padded), 3. the client might have disconnected, so the server must take care to unsubscribe them, 4. (especially if using UDP) the client might not have received an earlier notification. Note that it might be perfectly legal for a single client to subscribe multiple times; if there isn't naturally discriminating data you may need to keep a cookie to unsubscribe.
distribute/collect: a "master" distributes work to multiple "slaves", then collects the results, aka map/reduce any many other reinvented terms for the same thing. This is similar to a combination of the above (a client subscribes to work-available events, then the server makes a unique request to each clients instead of a normal notification). Note the following additional cases: 1. some slaves are very slow, while others are idle because they've already completed their tasks and the master might have to store the incomplete combined output, 2. some slaves might return a wrong answer, 3. there might not be any slaves, 4.
D-Bus in particular makes a lot of decisions that seem quite strange at first, but do have justifications (which may or may not be relevant, depending on the use case). Normally, it is only used locally.
0MQ is lower-level and most of its "downsides" are solved by building on top of it. Beware of the MxN problem; you might want to artificially create a broker node just for messages that are prone to it.
#include <QAbstractSocket>
#include <QtNetwork>
#include <QTcpServer>
#include <QTcpSocket>
QTcpSocket* m_pTcpSocket;
Connect to host: set up connections with tcp socket and implement your slots. If data bytes are available readyread() signal will be emmited.
void connectToHost(QString hostname, int port){
if(!m_pTcpSocket)
{
m_pTcpSocket = new QTcpSocket(this);
m_pTcpSocket->setSocketOption(QAbstractSocket::KeepAliveOption,1);
}
connect(m_pTcpSocket,SIGNAL(readyRead()),SLOT(readSocketData()),Qt::UniqueConnection);
connect(m_pTcpSocket,SIGNAL(error(QAbstractSocket::SocketError)),SIGNAL(connectionError(QAbstractSocket::SocketError)),Qt::UniqueConnection);
connect(m_pTcpSocket,SIGNAL(stateChanged(QAbstractSocket::SocketState)),SIGNAL(tcpSocketState(QAbstractSocket::SocketState)),Qt::UniqueConnection);
connect(m_pTcpSocket,SIGNAL(disconnected()),SLOT(onConnectionTerminated()),Qt::UniqueConnection);
connect(m_pTcpSocket,SIGNAL(connected()),SLOT(onConnectionEstablished()),Qt::UniqueConnection);
if(!(QAbstractSocket::ConnectedState == m_pTcpSocket->state())){
m_pTcpSocket->connectToHost(hostname,port, QIODevice::ReadWrite);
}
}
Write:
void sendMessage(QString msgToSend){
QByteArray l_vDataToBeSent;
QDataStream l_vStream(&l_vDataToBeSent, QIODevice::WriteOnly);
l_vStream.setByteOrder(QDataStream::LittleEndian);
l_vStream << msgToSend.length();
l_vDataToBeSent.append(msgToSend);
m_pTcpSocket->write(l_vDataToBeSent, l_vDataToBeSent.length());
}
Read:
void readSocketData(){
while(m_pTcpSocket->bytesAvailable()){
QByteArray receivedData = m_pTcpSocket->readAll();
}
}
TCP is inherently bidirectional. Get one way working (client connects to server). After that both ends can use send and recv in exactly the same way.
Have a look at QWebSocket, this is based on HTTP and it also allows for HTTPS

Designing a multi-client tcp server to process data

I am attempting to rewrite my current project to include more features and stability, and need some help designing it. Here is the jist of it (for linux):
TCP_SERVER receives connection (auth packet)
TCP_SERVER starts a new (thread/fork) to handle the new client
TCP_SERVER will be receiving many packets from client > which will be added to a circular buffer
A separate thread will be created for that client to process those packets and build a list of objects
Another thread should be created to send parts of the list of objects to another client
The reason to separate all the processing into threads is because server will be getting many packets and the processing wont be able to keep up (which needs to be quick, as its time sensitive) (im not sure if tcp will drop packets if the internal buffer gets too large?), and another thread to send to another client to keep the processing fast as possible.
So for each new connection, 3 threads should be created. 1 to receive packets, 1 to process them, and 1 to send the processed data to another client (which is technically the same person/ip just on a different device)
And i need help designing this, as how to structure this, what to use (forks/threads), what libraries to use.
Trying to do this yourself is going to cause you a world of pain. Focus on your actual application, and leverage an existing socket handling framework. For example, you said:
for each new connection, 3 threads should be created
That statement says the following:
1. You haven't done this before, at scale, and haven't realized the impact all these threads will have.
2. You've never benchmarked thread creation or synchronous operations.
3. The number of things that can go wrong with this approach is pretty overwhelming.
Give some serious thought to using an existing library that does most of this for you. Getting the scaffolding right around this can literally take years, and you're better off focusing on your code rather than all the random plumbing.
The Boost C++ libraries seem to have a nice Async C++ socket handling infrastructure. Combine this with some of the existing C++ thread pools and you could likely have a highly performant solution up fairly quickly.
I would also question your use of C++ for this. Java and C# both do highly scalable socket servers pretty well, and some of the higher level language tooling (Spring, Guarva, etc) can be very, very valuable. If you ever want to secure this, via TLS or another mechanism, you'll also probably find this much easier in Java or C# than in C++.
Some of the major things you'll care about:
1. True Async I/O will be a huge perf and scalability win. Try really hard to do this. The boost asio library looks pretty nice.
2. Focus on your features and stability, rather than building a new socket handling platform.
3. Threads are expensive, avoid creating them. Thread pools are your friend.
You plan to create one-or-more threads for every connection your server handles. Threads are not free, they come with a memory and CPU overhead, and when you have many active threads you also begin to have resource contention.
What usage pattern do you anticipate? Do you expect that when you have 8 connections, all 8 network threads will be consuming 100% of a cpu core pushing/pulling packets? Or do you expect them to have a relatively low turn-around?
As you add more threads, you will begin to have to spend more time competing for resources in things like mutexes etc.
A better pattern is to have one or more thread for network io - most os'es have mechanisms for saying "tell me when one or more of these network connections has io" which is an efficiency saving over having lots of individual threads all doing the same thing for just one connection.
Then for actual processing, spin up a pool of worker threads to do actual work, allowing you to minimize the competition for resources. You can monitor work load to determine if you need to spin up more to meet delivery requirements.
You might also want to look into something to implement the network IO infrastructure for you; I've had really good performance results with libevent but then I've only had to deal with very high performance/reliability networking systems.

Multiuser chat server c++

I am building a Chat Server (which allows private messages between users) in c++ ... just as a challenge for me, and I've hit a dead point... where I don't know what may be better.
By the way: I am barely new to C++; that's why I want a challenge... so if there are other optimal ways, multithreading, etc... let me know please.
Option A
I have a c++ application running, that has an array of sockets, reads all the input (looping through all the sockets) in every loop (1second loop I guess) and stores it to DB (a log is required), and after that, loops again over all the sockets sending what's needed in every socket.
Pros: One single process, contained. Easy to develop.
Cons: I see it hardly scalable, and a single focus of failure ... I mean, what about performance with 20k sockets?
Option B
I have a c++ application listening to connections.
When a connection is received, it forks a subprocess that handles that socket... reading and saving to a DB all the input of the user. And checking all the required output from DB on every loop to write to the socket.
Pros: If the daemon is small enough, having a process per socket is likely more scalable. And at the same time if a process fails, all the others are kept online.
Cons: Harder to develop. May be it consumes too much resources to maintain a process for each connection.
What option do you think is the best? Any other idea or suggestion is welcome :)
As mentioned in the comments, there is an additional alternative which is to use select() or poll() (or, if you don't mind making your application platform-specific, something like epoll()). Personally I would suggest poll() because I find it more convenient, but I think only select() is available on at least some versions of Windows - I don't know whether running on Windows is important to you.
The basic approach here is that you first add all your sockets (including a listen socket, if you're listening for connections) to a structure and then call select() or poll() as appropriate. This call will block your application until at least one of the socket has some data to read, and then you get woken up and you go through the socket(s) that are ready for reading, process the data and then jump back into blocking again. You generally do this in a loop, something like:
while (running) {
int rc = poll(...);
// Handle active file descriptors here.
}
This is a great way to write an application which is primarily IO-bound - i.e. it spends much more time handling network (or disk) traffic than it does actually processing the data with the CPU.
As also mentioned in the comments, another approach is to fork a thread per connection. This is quite effective, and you can use simple blocking IO in each thread to read and write to that connection. Personally I would advise against this approach for several reasons, most of which are largely personal preference.
Firstly, it's fiddly to handle connections where you need to write large amounts of data at a time. A socket can't guarantee to write all pending data at once (i.e. the amount that it sent may not be the full amount you requested). In this case you have to buffer up the pending data locally and wait until there's room in the socket to send it. This means at any given time, you might be waiting for two conditions - either the socket is ready to send, or the socket is ready to read. You could, of course, avoid reading from the socket until all the pending data is sent, but this introduces latency into handling the data. Or, you could use select() or poll() on just that connection - but if so, why bother using threads at all, just handle all the connections that way. You could also use two threads per connection, one for reading and one for writing, which is probably the best approach if you're not confident whether you can always send all messages in a single call, although this doubles the number of threads you need which could make your code more complicated and slightly increase resource usage.
Secondly, if you plan to handle many connections, or a high connection turnover, threads are somewhat more of a load on the system than using select() or friends. This isn't a particularly big deal in most cases, but it's a factor for larger applications. This probably isn't a practical issue unless you were writing something like a webserver that was handling hundreds of requests a second, but I thought it was relevant to mention for reference. If you're writing something of this scale you'd likely end up using a hybrid approach anyway, where you multiplexed some combination of processes, threads and non-blocking IO on top of each other.
Thirdly, some programmers find threads complicated to deal with. You need to be very careful to make all your shared data structures thread-safe, either with exclusive locking (mutexes) or using someone else's library code which does this for you. There are a lot of examples and libraries out there to help you with this, but I'm just pointing out that care is needed - whether multithreaded coding suits you is a matter of taste. It's relatively easy to forget to lock something and have your code work fine in testing because the threads don't happen to contend that data structure, and then find hard-to-diagnose issues when this happens under higher load in the real world. With care and discipline, it's not too hard to write robust multithreaded code and I have no objection to it (though opinions vary), but you should be aware of the care required. To some extent this applies to writing any software, of course, it's just a matter of degree.
Those issues aside, threads are quite a reasonable approach for many applications and some people seem to find them easier to deal with than non-blocking IO with select().
As to your approaches, A will work but is wasteful of CPU because you have to wake up every second regardless of whether there's actual useful work to do. Also, you introduce up to a second's delay in handling messages, which could be irritating for a chat server. In general I would suggest that something like select() is a much better approach than this.
Option B could work although when you want to send messages between connections you're going to have to use something like pipes to communicate between processes and that's a bit of a pain. You'll end up having to wait on both your incoming pipe (for data to send) as well as the socket (for data to receive) and thus you end up effectively with the same problem, having to wait on two filehandles with something like select() or threads. Really, as others have said, threads are the right way to process each connection separately. Separate processes are also a little more expensive of resources than threads (although on platforms such as Linux the copy-on-write approach to fork() means it's not actually too bad).
For small applications with only, say, tens of connections there's not an awful lot technically to choose between threads and processes, it largely depends on which style appeals to you more. I would personally use non-blocking IO (some people call this asynchronous IO, but that's not how I would use the term) and I've written quite a lot of code that does that as well as lots of multithreaded code, but it's still only my personal opinion really.
Finally, if you want to write portable non-blocking IO loops I strongly suggest investigating libev (or possbily libevent but personally I find the former easier to use and more performant). These libraries use different primitives such as select() and poll() on different platforms so your code can remain the same, and they also tend to offer slightly more convenient interfaces.
If you have any more questions on any of that, feel free to ask.

Most suitable asynchronous socket model for an instant messenger client?

I'm working on an instant messenger client in C++ (Win32) and I'm experimenting with different asynchronous socket models. So far I've been using WSAAsyncSelect for receiving notifications via my main window. However, I've been experiencing some unexpected results with Winsock spawning additionally 5-6 threads (in addition to the initial thread created when calling WSAAsyncSelect) for one single socket.
I have plans to revamp the client to support additional protocols via DLL:s, and I'm afraid that my current solution won't be suitable based on my experiences with WSAAsyncSelect in addition to me being negative towards mixing network with UI code (in the message loop).
I'm looking for advice on what a suitable asynchronous socket model could be for a multi-protocol IM client which needs to be able to handle roughly 10-20+ connections (depending on amount of protocols and protocol design etc.), while not using an excessive amount of threads -- I am very interested in performance and keeping the resource usage down.
I've been looking on IO Completion Ports, but from what I've gathered, it seems overkill. I'd very much appreciate some input on what a suitable socket solution could be!
Thanks in advance! :-)
There are four basic ways to handle multiple concurrent sockets.
Multiplexing, that is using select() to poll the sockets.
AsyncSelect which is basically what you're doing with WSAAsyncSelect.
Worker Threads, creating a single thread for each connection.
IO Completion Ports, or IOCP. dp mentions them above, but basically they are an OS specific way to handle asynchronous I/O, which has very good performance, but it is a little more confusing.
Which you choose often depends on where you plan to go. If you plan to port the application to other platforms, you may want to choose #1 or #3, since select is not terribly different from other models used on other OS's, and most other OS's also have the concept of threads (though they may operate differently). IOCP is typically windows specific (although Linux now has some async I/O functions as well).
If your app is Windows only, then you basically want to choose the best model for what you're doing. This would likely be either #3 or #4. #4 is the most efficient, as it calls back into your application (similar, but with better peformance and fewer issues to WSAsyncSelect).
The big thing you have to deal with when using threads (either IOCP or WorkerThreads) is marshaling the data back to a thread that can update the UI, since you can't call UI functions on worker threads. Ultimately, this will involve some messaging back and forth in most cases.
If you were developing this in Managed code, i'd tell you to look at Jeffrey Richter's AysncEnumerator, but you've chose C++ which has it's pros and cons. Lots of people have written various network libraries for C++, maybe you should spend some time researching some of them.
consider to use the ASIO library you can find in boost (www.boost.org).
Just use synchronous models. Modern operating systems handle multiple threads quite well. Async IO is really needed in rare situations, mostly on servers.
In some ways IO Completion Ports (IOCP) are overkill but to be honest I find the model for asynchronous sockets easier to use than the alternatives (select, non-blocking sockets, Overlapped IO, etc.).
The IOCP API could be clearer but once you get past it it's actually easier to use I think. Back when, the biggest obstacle was platform support (it needed an NT based OS -- i.e., Windows 9x did not support IOCP). With that restriction long gone, I'd consider it.
If you do decide to use IOCP (which, IMHO, is the best option if you're writing for Windows) then I've got some free code available which takes away a lot of the work that you need to do.
Latest version of the code and links to the original articles are available from here.
And my views on how my framework compares to Boost::ASIO can be found here: http://www.lenholgate.com/blog/2008/09/how-does-the-socket-server-framework-compare-to-boostasio.html.