First off, I hope my question makes sense and is even possible! From what I've read about TCP sockets and Boost::ASIO, I think it should be.
What I'm trying to do is to set up two machines and have a working bi-directional read/write link over TCP between them. Either party should be able to send some data to be used by the other party.
The first confusing part about TCP(/IP?) is that it requires this client/server model. However, reading shows that either side is capable of writing or reading, so I'm not yet completely discouraged. I don't mind establishing an arbitrary party as the client and the other as the server. In my application, that can be negotiated ahead of time and is not of concern to me.
Unfortunately, all of the examples I come across seem to focus on a client connecting to a server, and the server immediately sending some bit of data back. But I want the client to be able to write to the server also.
I envision some kind of loop wherein I call io_service.poll(). If the polling shows that the other party is waiting to send some data, it will call read() and accept that data. If there's nothing waiting in the queue, and it has data to send, then it will call write(). With both sides doing this, they should be able to both read and write to each other.
My concern is how to avoid situations in which both enter into some synchronous write() operation at the same time. They both have data to send, and then sit there waiting to send it on both sides. Does that problem just imply that I should only do asynchronous write() and read()? In that case, will things blow up if both sides of a connection try to write asynchronously at the same time?
I'm hoping somebody can ideally:
1) Provide a very high-level structure or best practice approach which could accomplish this task from both client and server perspectives
or, somewhat less ideally,
2) Say that what I'm trying to do is impossible and perhaps suggest a workaround of some kind.
What you want to do is absolutely possible. Web traffic is a good example of a situation where the "client" sends something long before the server does. I think you're getting tripped up by the words "client" and "server".
What those words really describe is the method of connection establishment. In the case of "client", it's "active" establishment; in the case of "server" it's "passive". Thus, you may find it less confusing to use the terms "active" and "passive", or at least think about them that way.
With respect to finding example code that you can use as a basis for your work, I'd strongly encourage you to take a look at W. Richard Stevens' "Unix Network Programming" book. Any edition will suffice, though the 2nd Edition will be more up to date. It will be only C, but that's okay, because the socket API is C only. boost::asio is nice, but it sounds like you might benefit from seeing some of the nuts and bolts under the hood.
My concern is how to avoid situations in which both enter into some synchronous write() operation at the same time. They both have data to send, and then sit there waiting to send it on both sides. Does that problem just imply that I should only do asynchronous write() and read()? In that case, will things blow up if both sides of a connection try to write asynchronously at the same time?
It sounds like you are somewhat confused about how protocols are used. TCP only provides a reliable stream of bytes, nothing more. On top of that applications speak a protocol so they know when and how much data to read and write. Both the client and the server writing data concurrently can lead to a deadlock if neither side is reading the data. One way to solve that behavior is to use a deadline_timer to cancel the asynchronous write operation if it has not completed in a certain amount of time.
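A rough sketch of that deadline_timer idea (Boost.Asio with C++11 lambdas; io, sock, and write_with_deadline are illustrative names, not part of the answer, and error handling is minimal):
#include <boost/asio.hpp>
#include <string>
// Hypothetical helper: write `data` to an already-connected socket, but give up
// after `secs` seconds by cancelling the pending operation.
void write_with_deadline(boost::asio::io_service& io,
                         boost::asio::ip::tcp::socket& sock,
                         const std::string& data, int secs)
{
    boost::asio::deadline_timer timer(io);
    timer.expires_from_now(boost::posix_time::seconds(secs));
    timer.async_wait([&](const boost::system::error_code& ec){
        if (!ec) sock.cancel();           // deadline hit: abort the pending write
    });
    boost::asio::async_write(sock, boost::asio::buffer(data),
        [&](const boost::system::error_code& ec, std::size_t /*n*/){
            timer.cancel();               // write completed (or failed) in time
            if (ec == boost::asio::error::operation_aborted) {
                // the peer stopped reading and our buffers filled up
            }
        });
    io.run();                             // or io.poll() inside your own loop
}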
You should be using asynchronous methods when writing a server. Synchronous methods are appropriate for some trivial client applications.
TCP is full-duplex, meaning you can send and receive data in the order you want. To prevent a deadlock in your own protocol (the high-level behaviour of your program), when you have the opportunity to both send and receive, you should receive as a priority. With epoll in level-triggered mode that looks like: epoll for send and receive, if you can receive do so, otherwise if you can send and have something to send do so. I don't know how boost::asio or threads fit here; you do need some measure of control on how sends and receives are interleaved.
The word you're looking for is "non-blocking", which is entirely different from POSIX asynchronous I/O (which involves signals).
The idea is that you use something like fcntl(fd,F_SETFL,O_NONBLOCK). write() will return the number of bytes successfully written (if positive) and both read() and write() return -1 and set errno = EAGAIN if "no progress can be made" (no data to read or write window full).
You then use something like select/epoll/kqueue which blocks until a socket is readable/writable (depending on the flags set).
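A minimal sketch of that pattern with fcntl and poll (function names are illustrative, error handling is mostly omitted, and fd is assumed to be an already-connected TCP socket):
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>
// Put a connected socket into non-blocking mode.
void make_nonblocking(int fd) {
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
}
// Wait until the socket is readable, or writable when we have something to send.
void io_step(int fd, bool have_data_to_send) {
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    if (have_data_to_send)
        p.events |= POLLOUT;
    if (poll(&p, 1, 1000) <= 0)
        return;                                // timeout or error
    if (p.revents & POLLIN) {
        char buf[4096];
        ssize_t n = read(fd, buf, sizeof buf); // -1 with errno == EAGAIN is fine
        // n == 0 means the peer closed the connection
    }
    if (p.revents & POLLOUT) {
        // write() may consume only part of your buffer; keep track of progress
    }
}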
Why would someone want to use write_some when it may not transmit all data to the peer?
From the Boost write_some documentation:
The write_some operation may not transmit all of the data to the peer. Consider using the write function if you need to ensure that all data is written before the blocking operation completes.
What is the relevance of the write_some method in Boost when it also has a write method? I went through the Boost write_some documentation, but could not figure it out.
At one extreme, write waits until all the data has been confirmed as written to the remote system. It gives the greatest certainty of successful completion at the expense of being the slowest.
At the opposite extreme, you could just queue the data for writing and return immediately. This is fast, but gives no assurance at all that the data will actually be written. If a router was down, a DNS giving incorrect addresses, etc., you could be trying to send to some machine that isn't available and (possibly) hasn't been for a long time.
write_some is kind of a halfway point between these two extremes. It doesn't return until at least some data has been written, so it assures you that the remote host you were trying to write to does currently exist (for some, possibly rather loose, definition of "currently"). It doesn't assure you that all the data will be written but may complete faster, and still gives a little bit of a "warm fuzzy" feeling that the write is likely to complete.
As to when you'd likely want to use it: the obvious scenario would be something like a large transfer over a local connection on a home computer. The likely problem here isn't with the hardware, but with the computer (or router) being mis-configured. As soon as one byte has gone through, you're fairly assured that the connection is configured correctly, and the transfer will probably complete. Since the transfer is large, you may be saving a lot of time in return for a minimal loss of assurance about successful completion.
As to when you'd want to avoid it: pretty much reverse the circumstances above. You're sending a small amount of data over (for example) an unreliable Internet connection. Since you're only sending a little data, you don't save much time by returning before all the data's sent. The connection is unreliable enough that the odds of a packet being transmitted are effectively independent of the odds for other packets--that is, sending one packet through tells you little about the likelihood of being able to send the next.
There is no reason really. But these functions are at different levels.
basic_stream_socket::write_some is an operation on a socket that pretty much wraps the OS's send operation (most send implementations do not guarantee transmission of the complete message). Normally you wrap this call in a loop until all of the data is sent.
asio::write is a high-level wrapper that will loop until all of the data is sent. It accepts a socket as an argument.
One possible reason to use write_some could be when porting existing code that is based on sockets and that already does the looping.
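A sketch of what that loop looks like, and of what asio::write effectively does for you (assuming an already-connected boost::asio::ip::tcp::socket; send_all is an illustrative name):
#include <boost/asio.hpp>
#include <string>
// Keep calling write_some until the whole message has been handed to the OS.
void send_all(boost::asio::ip::tcp::socket& sock, const std::string& msg)
{
    std::size_t sent = 0;
    while (sent < msg.size()) {
        // write_some may transmit only part of the remaining data
        sent += sock.write_some(boost::asio::buffer(msg.data() + sent,
                                                    msg.size() - sent));
    }
    // Equivalent one-liner: boost::asio::write(sock, boost::asio::buffer(msg));
}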
I am trying to set up a TCP communication framework between two computers. I would like each computer to send data to the other. So computer A would perform a calculation, and send it to computer B. Computer B would then read this data, perform a calculation using it, and send a result back to computer A. Computer A would wait until it receives something from computer B before proceeding with performing another calculation, and sending it to computer B.
This seems conceptually straightforward, but I haven't been able to locate an example that details two-way (bidirectional) communication via TCP. I've only found one-way server-client communication, where a server sends data to a client. These are some examples that I have looked at closely so far:
Server-Client communication
Synchronized server-client communication
I'm basically looking to have two "servers" communicate with each other. The synchronized approach above is, I believe, important for what I'm trying to do. But I'm struggling to set up a two-way communication framework via a single socket.
I would appreciate it greatly if someone could point me to examples that describe how to set up bidirectional communication with TCP, or give me some pointers on how to set this up from the examples I have linked above. I am very new to TCP and network communication frameworks and there might be a lot that I could be misunderstanding, so it would be great if I could get some clear pointers on how to proceed.
This answer does not go into specifics, but it should give you a general idea, since that's what you really seem to be asking for. I've never used Qt before, I do all my networking code with BSD-style sockets directly or with my own wrappers.
Stuff to think about:
Protocol. Hand-rolled or existing?
Existing protocols can be heavyweight, depending on what your payload looks like. Examples include HTTP and Google ProtoBuf; there are many more.
Handrolled might mean more work, but more controlled. There are two general approaches: length-based and sentinel-based.
Length-based means embedding the length into the first bytes. Requires caring about endianness. Requires thinking about what happens if a message is longer than the length field can represent. If you do this, I strongly recommend that you define your packet formats in some data file, and then generate the low-level packet encoding logic (a minimal framing sketch follows this list).
Sentinel-based means ending the message when some character (or sequence) is seen. Common sentinels are '\0', '\n', and "\r\n". If the rest of your protocol is also text-based, this means it is much easier to debug.
For both designs, you have to think about what happens if the other side tries to send more data than you are willing (or able) to store in memory. In either case, limiting the payload size to a 16-bit unsigned integer is probably a good idea; you can stream replies with multiple packets. Note that serious protocols (based on UDP + crypto) typically have a protocol-layer size limit of 512-1500 bytes, though application-layer may be larger of course.
For both designs, EOF on the socket without having a sentinel means you must drop the message and log an error.
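To make the length-based option concrete, here is a minimal framing sketch using a 2-byte big-endian length field (the helper names and the 16-bit limit are illustrative, per the note above):
#include <cstdint>
#include <stdexcept>
#include <string>
// Frame layout: 2-byte big-endian length, then the payload.
std::string encode_frame(const std::string& payload)
{
    if (payload.size() > 0xFFFF)
        throw std::length_error("payload too large for 16-bit length field");
    std::string frame;
    frame.push_back(static_cast<char>((payload.size() >> 8) & 0xFF));
    frame.push_back(static_cast<char>(payload.size() & 0xFF));
    frame += payload;
    return frame;
}
// Returns true and fills `payload` if `buf` holds at least one complete frame,
// consuming it from `buf`; otherwise leaves `buf` alone and returns false.
bool decode_frame(std::string& buf, std::string& payload)
{
    if (buf.size() < 2) return false;
    std::size_t len = (static_cast<unsigned char>(buf[0]) << 8)
                    |  static_cast<unsigned char>(buf[1]);
    if (buf.size() < 2 + len) return false;
    payload = buf.substr(2, len);
    buf.erase(0, 2 + len);
    return true;
}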
Main loop. Qt probably has one you can use, but I don't know about it.
It's possible to get simple things done using solely blocking operations, but I don't recommend it. Always assume the other end of a network connection is a dangerous psychopath who knows where you live.
There are two fundamental operations in a main loop:
Socket events: a socket reports being ready for read, or ready to write. There are also other sorts of events that you probably won't use, since most useful information can be found separately in the read/write handlers: exceptional/priority, (write)hangup, read-hangup, error.
Timer events: when a certain time delta has passed, interrupt the wait-for-socket-events syscall and dispatch to the timer heap. If you don't have any timers, pass the syscall's notion of "infinity". If you have long sleeps, you might want to cap the wait at some arbitrary, relatively short interval like "10 seconds" or "10 minutes" depending on your application, because long timer intervals can do all sorts of weird things with clock changes, hibernation, and such. It's possible to avoid those if you're careful enough and use the right APIs, but most people don't.
Choice of multiplex syscall:
The p versions below include atomic signal mask changing. I don't recommend using them; instead if you need signals either add signalfd to the set or else emulate it using signal handlers and a (nonblocking, be careful!) pipe.
select/pselect is the classic, available everywhere. Cannot have more than FD_SETSIZE file descriptors, which may be very small (but can be #defined on the command-line if you're careful enough). Inefficient with sparse sets. Timeout is microseconds for select and nanoseconds for pselect, but chances are you can't actually get that. Only use this if you have no other choice.
poll/ppoll solves the problems of sparse sets, and more significantly the problem of listening to more than FD_SETSIZE file descriptors. It does use more memory, but it is simpler to use. poll is POSIX, ppoll is GNU-specific. For both, the API provides nanosecond granularity for the timeout, but you probably can't get that. I recommend this if you need BSD compatibility and don't need massive scalability, or if you only have one socket and don't want to deal with epoll's headaches.
epoll solves the problem of having to respecify the file descriptor and event list every time, by keeping the list of file descriptors in the kernel. Among other things, this means that when the low-level kernel event occurs, epoll can immediately be made aware, regardless of whether the user program is already in a syscall or not. Supports edge-triggered mode, but don't use it unless you're sure you understand it. Its API only provides millisecond granularity for the timeout, but that's probably all you can rely on anyway. If you are able to only target Linux, I strongly suggest you use this, except possibly if you can guarantee only a single socket at once, in which case poll is simpler (a minimal loop skeleton follows this list).
kqueue is found on BSD-derived systems, including Mac OS X. It is supposed to solve the same problems as epoll, but instead of keeping things simple by using file descriptors, it has all sorts of strange structures and does not follow the "do only one thing" principle. I have never used it. Use this if you need massive scalability on BSD.
IOCP. This only exists on Windows and some obscure Unixen. I have never used it and it has significantly different semantics. Use this, but be aware that much of this post is not applicable because Windows is weird. But why would you use Windows for any sort of serious system?
io_uring. A new API in Linux 5.1. It significantly reduces the number of syscalls and memory copies. Worth it if you have a lot of sockets, but since it's so new, you must provide a fallback path.
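A bare-bones level-triggered epoll skeleton, to show the shape of such a main loop (Linux only; listen_fd is assumed to be a non-blocking listening socket, and real code needs per-connection state and error handling):
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
void event_loop(int listen_fd)
{
    int ep = epoll_create1(0);
    struct epoll_event ev = {};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);
    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(ep, events, 64, 10000);  // 10 s cap, per the timer note above
        for (int i = 0; i < n; ++i) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                int client = accept(listen_fd, nullptr, nullptr);
                ev.events = EPOLLIN;                // ask for EPOLLOUT only while
                ev.data.fd = client;                // you have queued data to send
                epoll_ctl(ep, EPOLL_CTL_ADD, client, &ev);
            } else {
                // look up the handler object for fd and dispatch: read on
                // EPOLLIN, flush the per-connection write queue on EPOLLOUT
            }
        }
    }
}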
Handler implementation:
When the multiplex syscall signifies an event, look up the handler for that file number (some class with virtual functions) and call the relevant events (note there may be more than one).
Make sure all your sockets have O_NONBLOCK set and also disable Nagle's algorithm (since you're doing buffering yourself), except possibly for sockets that are still connect()ing, since non-blocking connect requires confusing logic, especially if you want to play nice with multiple DNS results.
For TCP sockets, all you need is accept in the listening socket's handler, and read/write family in the accept/connected handler. For other sorts of sockets, you need the send/recv family. See the "see also" in their man pages for more info - chances are one of them will be useful to you sometimes, do this before you hard-code too much into your API design.
You need to think hard about buffering. Buffering reads means you need to be able to check the header of a packet to see if there are enough bytes to do anything with it, or if you have to store the bytes until next time. Also remember that you might receive more than one packet at once (I suggest you rethink your design so that you don't mandate blocking until you get the reply before sending the next packet). Buffering writes is harder than you think, since you don't want to be woken when there is a "can write" event on a socket for which you have no data to write. The application should never write itself, only queue a write (a small sketch of such a queue appears below). Though TCP_CORK might imply a different design, I haven't used it.
Do not provide a network-level public API of iterating over all sockets. If needed, implement this at a higher level; remember that you may have all sorts of internal file descriptors with special purposes.
All of the above applies to both the server and the client. As others have said, there is no real difference once the connection is set up.
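As a sketch of the "queue a write, never write directly" point above (illustrative names, epoll-flavoured, no error handling): data goes into a per-connection buffer, and the fd is registered for EPOLLOUT only while that buffer is non-empty.
#include <string>
#include <sys/epoll.h>
#include <sys/socket.h>
struct Connection {
    int fd;
    std::string outbuf;     // bytes queued but not yet accepted by the kernel
};
// Called by the application: never write directly, just queue.
void queue_write(int ep, Connection& c, const std::string& data)
{
    bool was_empty = c.outbuf.empty();
    c.outbuf += data;
    if (was_empty) {
        struct epoll_event ev = {};
        ev.events = EPOLLIN | EPOLLOUT;      // start watching for writability
        ev.data.ptr = &c;
        epoll_ctl(ep, EPOLL_CTL_MOD, c.fd, &ev);
    }
}
// Called from the event loop when the socket reports "can write".
void on_writable(int ep, Connection& c)
{
    ssize_t n = send(c.fd, c.outbuf.data(), c.outbuf.size(), 0);
    if (n > 0) c.outbuf.erase(0, n);
    if (c.outbuf.empty()) {
        struct epoll_event ev = {};
        ev.events = EPOLLIN;                 // stop asking for EPOLLOUT
        ev.data.ptr = &c;
        epoll_ctl(ep, EPOLL_CTL_MOD, c.fd, &ev);
    }
}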
Edit 2019:
The documentation of D-Bus and 0MQ are worth reading, whether you use them or not. In particular, it's worth thinking about 3 kinds of conversations:
request/reply: a "client" makes a request and the "server" does one of 3 things: 1. replies meaningfully, 2. replies that it doesn't understand the request, 3. fails to reply (either due to a disconnect, or due to a buggy/hostile server). Don't let un-acknowledged requests DoS the "client"! This can be difficult, but this is a very common workflow.
publish/subscribe: a "client" tells the "server" that it is interested in certain events. Every time the event happens, the "server" publishes a message to all registered "clients". Variations: subscription expires after one use. This workflow has simpler failure modes than request/reply, but consider: 1. the server publishes an event that the client didn't ask for (either because it didn't know, or because it doesn't want it yet, or because it was supposed to be a oneshot, or because the client sent an unsubscribe but the server didn't process it yet), 2. this might be a magnification attack (though that is also possible for request/reply, consider requiring requests to be padded), 3. the client might have disconnected, so the server must take care to unsubscribe them, 4. (especially if using UDP) the client might not have received an earlier notification. Note that it might be perfectly legal for a single client to subscribe multiple times; if there isn't naturally discriminating data you may need to keep a cookie to unsubscribe.
distribute/collect: a "master" distributes work to multiple "slaves", then collects the results, aka map/reduce and many other reinvented terms for the same thing. This is similar to a combination of the above (a client subscribes to work-available events, then the server makes a unique request to each client instead of a normal notification). Note the following additional cases: 1. some slaves are very slow, while others are idle because they've already completed their tasks and the master might have to store the incomplete combined output, 2. some slaves might return a wrong answer, 3. there might not be any slaves, 4.
D-Bus in particular makes a lot of decisions that seem quite strange at first, but do have justifications (which may or may not be relevant, depending on the use case). Normally, it is only used locally.
0MQ is lower-level and most of its "downsides" are solved by building on top of it. Beware of the MxN problem; you might want to artificially create a broker node just for messages that are prone to it.
#include <QAbstractSocket>
#include <QtNetwork>
#include <QTcpServer>
#include <QTcpSocket>
QTcpSocket* m_pTcpSocket = nullptr;   // member of the QObject-derived class used below
Connect to host: set up the connections on the TCP socket and implement your slots. If data bytes are available, the readyRead() signal will be emitted.
void connectToHost(QString hostname, int port){
    if(!m_pTcpSocket)
    {
        m_pTcpSocket = new QTcpSocket(this);
        m_pTcpSocket->setSocketOption(QAbstractSocket::KeepAliveOption, 1);
    }
    // Forward socket activity to our slots/signals; Qt::UniqueConnection avoids duplicates on reconnect.
    connect(m_pTcpSocket, SIGNAL(readyRead()), SLOT(readSocketData()), Qt::UniqueConnection);
    connect(m_pTcpSocket, SIGNAL(error(QAbstractSocket::SocketError)), SIGNAL(connectionError(QAbstractSocket::SocketError)), Qt::UniqueConnection);
    connect(m_pTcpSocket, SIGNAL(stateChanged(QAbstractSocket::SocketState)), SIGNAL(tcpSocketState(QAbstractSocket::SocketState)), Qt::UniqueConnection);
    connect(m_pTcpSocket, SIGNAL(disconnected()), SLOT(onConnectionTerminated()), Qt::UniqueConnection);
    connect(m_pTcpSocket, SIGNAL(connected()), SLOT(onConnectionEstablished()), Qt::UniqueConnection);
    if(QAbstractSocket::ConnectedState != m_pTcpSocket->state()){
        m_pTcpSocket->connectToHost(hostname, port, QIODevice::ReadWrite);
    }
}
Write:
void sendMessage(QString msgToSend){
    // Prefix the payload with its byte length so the receiver knows where the message ends.
    QByteArray payload = msgToSend.toUtf8();
    QByteArray l_vDataToBeSent;
    QDataStream l_vStream(&l_vDataToBeSent, QIODevice::WriteOnly);
    l_vStream.setByteOrder(QDataStream::LittleEndian);
    l_vStream << (qint32)payload.length();   // 4-byte little-endian length header
    l_vDataToBeSent.append(payload);
    m_pTcpSocket->write(l_vDataToBeSent, l_vDataToBeSent.length());
}
Read:
void readSocketData(){
    while(m_pTcpSocket->bytesAvailable()){
        QByteArray receivedData = m_pTcpSocket->readAll();
        // A single readyRead() may carry a partial message or several messages:
        // parse the 4-byte little-endian length header written by sendMessage()
        // and buffer any leftover bytes until the rest arrives.
    }
}
TCP is inherently bidirectional. Get one way working (client connects to server). After that both ends can use send and recv in exactly the same way.
Have a look at QWebSocket; it is based on HTTP and it also allows for HTTPS.
I've pretty much always used send() with sockets and now I'm moving onto the WSA functions. With send(), I have a sendall() helper that ensures all data is delivered even if it doesn't all go out in one try and a partial send occurs on the first call.
So, instead of learning the hard way or over-complicating code when I don't have to, decided to ask you:
Can a blocking WSASend() send partial data or does it send everything before it returns or fails? Or should I check the bytes sent vs. expected to send and keep at it until everything is delivered?
ANSWER: Overlapped WSASend() does not send partial data, and if it ever does, it means the connection has terminated. I've never encountered that case yet.
From the WSASend docs:
If the socket is non-blocking and stream-oriented, and there is not sufficient space in the transport's buffer, WSASend will return with only part of the application's buffers having been consumed. Given the same buffer situation and a blocking socket, WSASend will block until all of the application buffer contents have been consumed.
I haven't tried this behavior though. BTW, why do you rewrite your code to use WSA functions? Switching from the standard BSD socket API just to use the socket with basically the same blocking behavior doesn't really seem to be a good idea to me. Just keep the old blocking code with send and the "retry code"; this way it's portable and bulletproof. It is not the saving of 1-2 comparisons that makes your IO code performant.
Switch to the specialized WSA functions only if you are trying to exploit some Windows-specific strengths, or if you want to use non-blocking sockets with WSAWaitForMultipleObjects, which is a bit better than the standard select, but even in that case you can simply go with send and recv as I did.
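For reference, the "retry code" in question is just a loop around send(); a minimal Winsock-flavoured sketch (sendall is an illustrative name, error handling is left to the caller):
#include <winsock2.h>
// Keep calling send() until everything went out or an error occurred.
int sendall(SOCKET s, const char* buf, int len)
{
    int total = 0;
    while (total < len) {
        int n = send(s, buf + total, len - total, 0);
        if (n == SOCKET_ERROR) return SOCKET_ERROR;   // check WSAGetLastError()
        total += n;
    }
    return total;
}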
In my opinion using epoll/kqueue/iocp (or a library that abstracts these away) with sockets is the way to go. There are some very basic tasks that can be done with blocking sockets, but if you cross the line and need non-blocking sockets then switching straight to epoll/kqueue/iocp is the way to go, instead of programming against the painful select- or WSAWaitForMultipleObjects-based APIs. epoll/kqueue/iocp are not only better but also easier to program than the select-based alternatives. Really. They are more modern APIs that were invented based on more experience. (Although they are not cross-platform, even select has portability issues...)
The previously mentioned APIs for Linux/BSD/Windows are based on the same concept, but in my opinion the simplest and easiest to learn is the epoll API of Linux. It is way better than a select call and 100x easier to program once you get the idea. IOCP on Windows may seem a bit more complicated at first.
If you haven't yet used these APIs then definitely give epoll a go if you are familiar with Linux, and then on Windows implement the same with IOCP, which is based on a similar concept with a bit more complicated overlapped IO programming. With IOCP you will have a reason to use WSASend, because you can not start overlapped IO on a socket with send but you can do that with WSASend (or WriteFile).
EDIT: If you are going for max performance with IOCP then here are some additional hints:
Drop blocking operations. This is very important. A serious networking engine can not afford blocking IO. It simply doesn't scale on any of the platforms. Do overlapped operations for both send and receive, overlapped IO is the big gun of windows.
Set up a thread pool that processes the completed IO operations. Set up test clients that bomb your server with real-world-usage-like messages and parallel connection counts, and under stress tweak the buffer sizes and thread counts for your actual target hardware.
Set the SO_RCVBUF and SO_SNDBUF sizes of your sockets to zero and play around with the size of the buffers that you are using to send and receive data. Setting the rcv/send buf of the socket handle to zero allows the TCP stack to receive/send data directly to/from your buffers, avoiding an additional copy between your userspace buffers and the socket buffers. The optimal size for these buffers is also subject to tweaking. I usually use buffer sizes of at least a few tens of KB, but sometimes in case of large volume transfers 1-2M buffer sizes are better, depending on the number of parallel busy connections. Again, tweak the values while stressing the server with test clients that do activity similar to real world clients. When you are ready with the first working version of your network engine, build on top of it a test client that can simulate many (maybe thousands of) parallel clients, depending on the real world usage of your server.
You will need "per connection software send buffers" inside your network engine and you may (or may not) want to control the max size of the send buffers. In case of reaching the max send buffer size you may want to block or discard messages/data depending on what you want to do, encapsulate this special buffer and provide two nice interfaces to it: one for the threads that are putting data into this buffer and another interface that is used by the IOCP sender code. This buffer is usually a very critical part of the whole thing and I usually had a lot of bugs around this part of the code so make sure to design its interface nicely to minimize the number of bugs. Depending on how your application constructs and puts messages into the queue you can play around a lot with the internal implementation (size of storage chunks, nagle-like optimizations, ...).
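A minimal sketch of such a per-connection send buffer with the two interfaces described (class and member names are illustrative; the overlapped I/O plumbing and the block/discard policy are omitted):
#include <deque>
#include <mutex>
#include <string>
class SendBuffer {
public:
    // Called by worker threads that produce messages; returns false when full.
    bool push(const std::string& msg, std::size_t max_bytes) {
        std::lock_guard<std::mutex> lock(m_);
        if (bytes_ + msg.size() > max_bytes) return false;  // caller blocks or drops
        q_.push_back(msg);
        bytes_ += msg.size();
        return true;
    }
    // Called by the IOCP sender when the previous overlapped send completed.
    bool pop(std::string& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        bytes_ -= out.size();
        return true;
    }
private:
    std::mutex m_;
    std::deque<std::string> q_;
    std::size_t bytes_ = 0;
};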
I'm writing a client-server application and one of the requirements is the Server, upon receiving an update from one of the clients, be able to Push out new data to all the other clients. This is a C++ (Qt) application meant to run on Linux (both client and server), but I'm more looking for high-level conceptual ideas of how this should work (though low-level thoughts are good, too).
Server:
It needs to (among its other duties) keep a socket open listening for incoming packets from potentially n different clients, presumably on a background thread (I haven't written much in terms of socket code other than some rinky-dink examples in school). Upon getting this data from a client, it processes it and then spits it out to all its clients, right?
Of course, I'm not sure how it actually does this. I'm guessing this means it has to keep persistent connections with every single client (at least the active clients), but I don't understand even conceptually how to maintain this connection (or the list of these connections).
So, how should I approach this?
In general when you have multiple clients, there are a few ways to handle this.
First of all, in TCP, when a client connects to you they're placed into a queue until they can be serviced. This is a given; you don't need to do anything except call the accept system call to receive a new client. Once the client is received, you'll be given a socket which you use to read and write. Who reads / writes first is entirely dependent on your protocol, but both sides need to know the protocol (which is up to you to define).
Once you've got the socket, you can do a few things. In a simple case, you just read some data, process it, write back to the socket, close the socket, and serve the next client. Unfortunately this means you can only serve one client at a time, thus no "push" updates are possible. Another strategy is to keep a list of all the open sockets. Any "updates" simply iterate over the list and write to each socket. This may present a problem though because it only allows push updates (if a client sent a request, who would be watching for it?)
The more advanced approach is to assign one thread to each socket. In this scenario, each time a socket is created, you spin up a new thread whose whole purpose is to serve exactly one client. This cuts down on latency and utilizes multiple cores (if available), but is far more difficult to program. Also if you have 10,000 clients connecting, that's 10,000 threads which gets to be too much. Pushing an update to a single client (in this scenario) is very simple (a thread just writes to its respective socket). Pushing to all of them at once is a little more tricky (requires either a thread event or a producer / consumer queue, neither of which are very fun to implement)
There are, of course, a million other ways to handle this (one process per client, a thread pool, a load-balancing proxy, you name it). Suffice it to say there's no way to cover all of these in one answer. I hope this answers your basic questions, let me know if you need me to clarify anything. It's a very large subject. However if I might make a suggestion, handling multiple clients is a wheel that has been re-invented a million times. There are very good libraries out there that are far more efficient and programmer-friendly than raw socket IO. I suggest libevent, which turns network requests into an event-driven paradigm (much more like GUI programming, which might be nice for you), and is incredibly efficient.
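A minimal sketch of the "keep a list of open sockets" approach from above (single-threaded, blocking, plain BSD sockets; names are illustrative):
#include <string>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>
std::vector<int> g_clients;   // every currently connected client socket
// Called when the listening socket reports a pending connection.
void on_new_client(int listen_fd) {
    int fd = accept(listen_fd, nullptr, nullptr);
    if (fd >= 0) g_clients.push_back(fd);
}
// A "push" update simply iterates over the list and writes to each socket.
void push_update(const std::string& msg) {
    for (std::size_t i = 0; i < g_clients.size(); ++i) {
        // a full implementation would loop on partial writes and drop dead clients
        send(g_clients[i], msg.data(), msg.size(), 0);
    }
}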
From what I understand, I think you need to keep an infinite loop going (at least until the program terminates) that answers connection requests from your clients. It would be best to add them to an array of some sort. Use an event to see when a new client is added to that array, and wait for one of them to give data. Then you do what you have to do with that data and spit it back.
I am coding a socket server for 1000 clients maximum. The server is for my game; I'm using non-blocking sockets and about 10 threads that receive data simultaneously from different sockets (the first thread receives from 0-100, the second from 101-200, and so on).
But if thread 1 wants to send data to all 1000 clients and thread 2 also wants to send data to all 1000 clients at the same time, is that safe? Are there any chances of the data getting mixed up on the other (client) side?
If yes, I guess the only problem that can happen is that sometimes the client would receive 2 or 10 packets as 1 packet. Is that correct? If yes, is there any solution to that :(
The usual pattern of dealing with many sockets is to have a dedicated thread polling for I/O events with select(2), poll(2), or better kqueue(2) or epoll(7) (depending on the platform), acting as a socket event dispatcher. The sockets are usually handled in non-blocking mode. Then one might have a pool of threads reacting to the events and either doing reads and writes directly or via lower level buffers/queues.
All sorts of techniques are applicable here - from queues to event subscription whiteboards. It gets tricky with multiplexing accepts/reads/writes/EOFs on the I/O level and with event arbitration on the application level. Several libraries like libevent and boost::asio help structure the lower level (the ACE library is also in this space, but I'd hate recommending it to anybody). You would have to come up with application-level protocols and state machines yourself (again boost::statechart might be of help).
Some good links to get better understanding of what you are up against (this is probably the millionth time they are mentioned here on SO):
The C10K problem
High-Performance Server Architecture
Apologies for not offering a concrete solution, but this is a very wide design question and most decisions depend heavily on the context (lots of fun though). Hope this helps a bit.
Since you are sending data using different sockets, there should not be any problem. Rather, when these different threads access the same data, you have to ensure data integrity.
Are you using UDP or TCP sockets?
If UDP, each write should be encapsulated in a separate packet and should be carried to the other side intact. The order may be swapped (as it may for any UDP packet) but they should be whole.
If TCP, there's no concept of packets on the transport layer and any 10 writes on one side may be bundled up on the other side in one read. TCP writes may also only accept part of your buffer so even if the send() function is atomic, your write isn't necessarily. In this case you'd need to synchronize it.
send() is not atomic in most implementations, so sending to 1000 different sockets from multiple threads could lead to mixed-up messages arriving on the client side, and all kinds of weirdness. (I know nothing, see Nicolai's and Robert's comments below; the rest of my comment still stands though, in terms of being a solution to your problem.)
What I would do is use threads for sending like you use them for receiving. One thread to manage sending to one (or more) sockets that ensures that you don't write to one socket from multiple threads at the same time.
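The simplest version of that is a mutex per socket, so whole messages are written without interleaving from other threads (a dedicated sender thread with a queue scales better). A sketch with illustrative names:
#include <mutex>
#include <string>
#include <sys/socket.h>
struct ClientSlot {
    int fd;
    std::mutex write_mutex;   // only one thread may write to this socket at a time
};
void send_message(ClientSlot& c, const std::string& msg) {
    std::lock_guard<std::mutex> lock(c.write_mutex);
    std::size_t sent = 0;
    while (sent < msg.size()) {
        // send() may accept only part of the buffer, so loop until done
        ssize_t n = send(c.fd, msg.data() + sent, msg.size() - sent, 0);
        if (n <= 0) break;                    // error handling omitted
        sent += static_cast<std::size_t>(n);
    }
}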
Also look here for some additional discussion and more interesting links.
If you're on Windows, the Winsock Programmer's FAQ is an invaluable resource; for your issue see here.