C++ Sockets Send() Thread-Safety

I am writing a socket server for a maximum of 1000 clients for my game. I'm using non-blocking sockets and about 10 threads that receive data simultaneously from different sockets (the first thread handles sockets 0-100, the second 101-200, and so on).
But if thread 1 wants to send data to all 1000 clients and thread 2 also wants to send data to all 1000 clients at the same time, is that safe? Is there any chance of the data getting mixed up on the other (client) side?
If so, I guess the only problem that can happen is that the client sometimes receives 2 or 10 packets as one packet. Is that correct? And if so, is there any solution to that?

The usual pattern for dealing with many sockets is to have a dedicated thread polling for I/O events with select(2), poll(2), or better kqueue(2) or epoll(7) (depending on the platform), acting as a socket event dispatcher. The sockets are usually handled in non-blocking mode. Then one might have a pool of threads reacting to the events, doing the reads and writes either directly or via lower-level buffers/queues.
All sorts of techniques are applicable here, from queues to event-subscription whiteboards. It gets tricky with multiplexing accepts/reads/writes/EOFs on the I/O level and with event arbitration on the application level. Several libraries like libevent and boost::asio help structure the lower level (the ACE library is also in this space, but I'd hate to recommend it to anybody). You would have to come up with application-level protocols and state machines yourself (again, boost::statechart might be of help).
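For illustration, here is a minimal sketch of that dispatcher idea on Linux with epoll; the handle_* hooks are hypothetical placeholders for wherever the reads and writes actually happen (directly or via worker threads), not part of any library:

```cpp
#include <sys/epoll.h>
#include <cerrno>
#include <vector>

// Hypothetical per-socket hooks; in a real server these would hand work
// off to a pool of worker threads or queues.
void handle_readable(int fd);
void handle_writable(int fd);
void handle_closed(int fd);

// Single dispatcher thread: wait for readiness events on an epoll set
// that the non-blocking sockets have already been registered with.
void run_dispatcher(int epoll_fd)
{
    std::vector<epoll_event> events(64);
    for (;;) {
        int n = epoll_wait(epoll_fd, events.data(),
                           static_cast<int>(events.size()), -1);
        if (n < 0) {
            if (errno == EINTR) continue;   // interrupted by a signal
            break;                          // real error: stop dispatching
        }
        for (int i = 0; i < n; ++i) {
            int fd = events[i].data.fd;
            if (events[i].events & (EPOLLERR | EPOLLHUP))
                handle_closed(fd);
            else if (events[i].events & EPOLLIN)
                handle_readable(fd);        // e.g. recv() until EAGAIN
            else if (events[i].events & EPOLLOUT)
                handle_writable(fd);        // e.g. flush queued outgoing data
        }
    }
}
```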
Some good links to get a better understanding of what you are up against (this is probably the millionth time they have been mentioned here on SO):
The C10K problem
High-Performance Server Architecture
Apologies for not offering a concrete solution, but this is a very broad design question and most decisions depend heavily on the context (lots of fun, though). Hope this helps a bit.

As long as the different threads are sending data over different sockets, there should not be any problem. It's when these different threads access the same data that you have to ensure data integrity.

Are you using UDP or TCP sockets?
If UDP, each write is encapsulated in a separate datagram and should be carried to the other side intact. The order may be swapped (as it may be for any UDP packet), but each one should arrive whole.
If TCP, there's no concept of message boundaries at the transport layer, and any 10 writes on one side may be bundled up into one read on the other side. TCP writes may also accept only part of your buffer, so even if the send() call itself is atomic, your write isn't necessarily. In this case you'd need to synchronize the writes yourself.
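As a sketch of that synchronization on a POSIX TCP socket (assuming a blocking socket; with non-blocking sockets the EWOULDBLOCK case needs queueing rather than a plain failure), a per-socket mutex plus a retry loop makes one logical message send atomic with respect to other sender threads:

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <cerrno>
#include <cstddef>
#include <mutex>

// Hand a whole logical message to the kernel in one go, retrying on
// partial sends. The per-socket mutex keeps two threads from interleaving
// bytes of different messages on the same TCP stream.
bool send_all(int fd, const char* data, std::size_t len, std::mutex& socket_mutex)
{
    std::lock_guard<std::mutex> lock(socket_mutex);
    std::size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR) continue;   // interrupted, retry
            return false;                   // EWOULDBLOCK or real error
        }
        sent += static_cast<std::size_t>(n);
    }
    return true;
}
```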

send() is not atomic in most implementations, so sending to 1000 different sockets from multiple threads could lead to mixed-up messages arriving on the client side, and all kinds of weirdness. (I may well be wrong here; see Nicolai's and Robert's comments below. The rest of my answer still stands, though, in terms of being a solution to your problem.)
What I would do is use threads for sending the same way you use them for receiving: one thread managing sending to one (or more) sockets, which ensures that you never write to a single socket from multiple threads at the same time.
Also look here for some additional discussion and more interesting links.
If you're on Windows, the Winsock Programmer's FAQ is an invaluable resource; for your issue, see here.

Related

Accept a socket in one thread and write data in a different thread [duplicate]

I am implementing a simple server that accepts a single connection and then uses that socket to simultaneously read and write messages from the read and write threads.
What is the safe and easy way to simultaneously read and write from the same socket descriptor in C/C++ on Linux?
I don't need to worry about multiple threads reading and writing from the same socket, as there will be a single dedicated read thread and a single dedicated write thread using the socket.
In the above scenario, is any kind of locking required?
Does the above scenario require a non-blocking socket?
Is there any open-source library that would help in the above scenario?
In the above scenario, is any kind of locking required?
None.
Does the above scenario require a non-blocking socket?
The bit you're probably worried about - the read/recv and write/send threads on an established connection - do not need to be non-blocking if you're happy for those threads to sit there waiting to complete. That's normally one of the reasons you'd use threads rather than select, epoll, async operations, or io_uring - keeps the code simpler too.
If the thread accepting new clients is happy to block in the call to accept(), then you're all good there too.
Still, there's one subtle issue with TCP servers you might want to keep in the back of your mind, in case your program grows to handle multiple clients and has some periodic housekeeping to do. It's natural and tempting to use a select or epoll call with a timeout to check for readability on the listening socket (which indicates a client connection attempt) and then accept the connection. There's a race condition there: the client connection attempt may have been dropped between select() and accept(), in which case accept() will block if the listening socket isn't non-blocking, and that can prevent a timely return to the select() loop and halt the periodic on-timeout processing until another client connects.
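A minimal sketch of that mitigation on POSIX: put the listening socket in non-blocking mode so accept() reports EWOULDBLOCK instead of hanging if the connection attempt has already vanished.

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <cerrno>

// Make the listening socket non-blocking.
bool make_nonblocking(int listen_fd)
{
    int flags = fcntl(listen_fd, F_GETFL, 0);
    return flags != -1 && fcntl(listen_fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Called when select()/epoll reports the listening socket readable.
// Returns the new connection's fd, or -1 if the attempt disappeared
// (or another error occurred) and we should just return to the loop.
int try_accept(int listen_fd)
{
    int client = accept(listen_fd, nullptr, nullptr);
    if (client < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -1;   // client gave up between select() and accept()
    return client;
}
```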
Is there any open-source library that would help in the above scenario?
There are hundreds of libraries for writing basic servers (and asking for 3rd party lib recommendations is off-topic on SO so I won't get into it), but ultimately what you've asked for is easily achieved atop an OS-provided BSD sockets API or the Windows bastardisation ("winsock").
Sockets are BI-DIRECTIONAL. If you've ever actually dissected an Ethernet or Serial cable or seen the low-level hardware wiring diagram for them, you can actually SEE distinct copper wires for the "TX" (transmit) and "RX" (receive) lines. The software for sending the signals, from the device controller up to most OS APIs for a 'socket', reflects this and it is the key difference between a socket and an ordinary pipe on most systems (e.g. Linux).
To really get the most out of sockets, you need:
1) Async IO support that uses IO Completion Ports, epoll(), or some similar async callback or event system to 'wake up' whenever data comes in on the socket. This then must call your lowest-level 'ReadData' API to read the message off the socket connection.
2) A 2nd API that supports the low-level writes, a 'WriteData' (transmit) that pushes bytes onto the socket and does not depend on anything the 'ReadData' logic needs. Remember, your send and receive are independent even at the hardware level, so don't introduce locking or other synchronization at this level.
3) A pool of Socket IO threads, which blindly do any processing of data that is read from or will be written to a socket.
4) PROTOCOL CALLBACK: a callback object the socket threads hold smart pointers to. It handles whatever PROTOCOL layer (such as parsing your data blob into a real HTTP request) sits on top of the basic socket connection. Remember, a socket is just a data pipe between computers, and data sent over it will often arrive as a series of fragments, i.e. the packets; in protocols like UDP the packets aren't even guaranteed to arrive in order. The low-level 'ReadData' and 'WriteData' will call back from their threads into here, because this is where content-aware data processing actually begins.
5) Any callbacks the protocol handler itself needs. For HTTP, you package the raw request buffers into nice objects that you hand off to a real servlet, which should return a nice response object that can be serialized into an HTTP spec-compliant response.
Notice the basic pattern: You have to make the whole system fundamentally async (an 'onion of callbacks') if you wish to take full advantage of bi-directional, async IO over sockets. The only way to read and write simultaneously to the socket is with threads, so you could still synchronize between a 'writer' and 'reader' thread, but I'd only do it if the protocol or other considerations forced my hand. The good news is that you can get great performance with sockets using highly async processing, the bad is that building such a system in a robust way is a serious effort.
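As a very rough sketch of that layering (every name here is hypothetical, not any real library's API): the socket I/O threads only shuttle bytes through a protocol callback, and all content-aware parsing lives behind it.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical protocol-layer callback: socket I/O threads push raw
// fragments in; the protocol object decides when a full message exists
// and what bytes should go out next.
struct IProtocolCallback {
    virtual ~IProtocolCallback() = default;
    virtual void onBytesReceived(const char* data, std::size_t len) = 0;
    virtual std::vector<char> nextBytesToSend() = 0;
};

// A socket I/O worker holds a smart pointer to the protocol object and
// knows nothing about HTTP, game packets, or any other content.
struct SocketWorker {
    std::shared_ptr<IProtocolCallback> protocol;

    void onReadable(const char* data, std::size_t len) {
        protocol->onBytesReceived(data, len);   // hand the fragment up a layer
    }
    std::vector<char> gatherOutgoing() {
        return protocol->nextBytesToSend();     // bytes for the 'WriteData' path
    }
};
```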
You don't have to worry about it. One thread reading and one thread writing will work as you expect. Sockets are full duplex, so you can read while you write and vice-versa. You'd have to worry if you had multiple writers, but this is not the case.

Methodologies for handling multiple clients in C++ Winsock

I'm developing a peer-to-peer message parsing application, so one peer may need to handle many clients. There is also the possibility of sending and receiving large data (~20 MB as one message), and there can be situations where many peers send large data to the same peer. I've heard there are many solutions for handling this kind of situation:
Use a thread per peer
Use a loop to go through the peers and receive if there is data
Use the select function
etc.
What is the most suitable methodology, or the most common and accepted way, to handle this kind of situation? Any advice or hints are welcome.
Update: Is there a good peer-to-peer distributed computing library or framework for C++ on the Windows platform?
Don't use a thread per peer; past the number of processors, additional threads are likely only to hurt performance. You'd also be expected to tweak dwStackSize so that 1000 idle peers don't cost you 1000 MB of RAM.
You can use a thread pool (X threads handling Y sockets) to get a performance boost (or, ideally, I/O Completion Ports), but this tends to work incredibly well for certain kinds of applications and not at all for others. Unless you're certain that yours is suited to it, I wouldn't take the risk.
It's entirely permissible to use a single thread and poll/send from a large quantity of sockets. I don't know precisely at what point "large" starts to carry a concerning overhead, but I'd (conservatively) ballpark it somewhere between 2k and 5k sockets (on below-average hardware).
The workaround for WSAEWOULDBLOCK is to keep a std::queue<BYTE> of bytes (not a queue of "packet objects") for each socket in your application (you populate this queue with the data you want to send), and to have a single background thread whose sole purpose is to drain the queues into the respective socket's send(), X bytes at a time. You can use a blocking socket for this now (since it's a background worker), but if you do use a non-blocking socket and get WSAEWOULDBLOCK, you can just keep trying to drain the queue; here it won't obstruct the flow of your application.
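A minimal sketch of that per-socket byte queue and one background drain pass (Winsock flavour; the mutex, the 4 KB chunk size, and the names are illustrative choices, not requirements):

```cpp
#include <winsock2.h>
#include <cstddef>
#include <mutex>
#include <queue>
#include <vector>

struct Outbox {
    SOCKET           sock;
    std::queue<BYTE> bytes;   // filled by the application threads
    std::mutex       lock;
};

// One pass of the background worker over a single socket's queue:
// pop up to 4 KB, then keep calling send() until that chunk is gone.
// On WSAEWOULDBLOCK (non-blocking socket) we simply retry; only this
// worker blocks or spins, never the main application threads.
void drain_once(Outbox& box)
{
    std::lock_guard<std::mutex> guard(box.lock);
    while (!box.bytes.empty()) {
        std::vector<char> chunk;
        while (!box.bytes.empty() && chunk.size() < 4096) {
            chunk.push_back(static_cast<char>(box.bytes.front()));
            box.bytes.pop();
        }
        std::size_t off = 0;
        while (off < chunk.size()) {
            int sent = send(box.sock, chunk.data() + off,
                            static_cast<int>(chunk.size() - off), 0);
            if (sent == SOCKET_ERROR) {
                if (WSAGetLastError() == WSAEWOULDBLOCK)
                    continue;            // socket buffer full: try again
                return;                  // connection error: drop what's left
            }
            off += static_cast<std::size_t>(sent);
        }
    }
}
```

In a production version you would not hold the application-facing lock across send(); this sketch keeps it simple to show the queue-and-drain shape.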
You could use libtorrent.org, which is built on top of Boost (Boost.Asio). It focuses on efficiency and scalability.
I don't have much experience developing sockets in C++, but in C# I had a really good experience accepting connections asynchronously and passing each one to its own thread from a thread pool.

Can I use Boost::Asio and not worry about network programming problems?

I have to write a server for my current project, but I have little or no experience in this area. My question is: can I just use Asio in my project, and will it simply handle any problems a normal server has to face (partial reads, multithreading problems, ...)?
(My server will have to handle hundreds of clients at the same time)
ASIO takes care of the low-level socket programming and polling code. You still have to provide all the functionality to process raw network data. Ultimately, you get an unpredictable number of bytes from the network any time a read callback is called, and it is up to you to take those bytes and reconstruct your application message from them.
But indeed, as far as receiving an unspecified number of bytes is concerned, you won't have to worry about how that is implemented.
Multithreading is "easy" in the sense that you can run the ASIO processor multiple times concurrently, but it is your responsibility to provide a read callback that can deal with being run multiple times at once.
Asio is intentionally not multithreaded. It handles concurrency by multiplexing via the operating system's select(), kqueue, epoll, or other mechanism.
As for partial receives, there is no automatic way to get TCP to respect message boundaries. Asio can't do anything about that, so you'll need some technique at the application level to indicate completion. HTTP traditionally handles this by closing the socket when it's finished, though it's also possible to pre-send the size of the message.
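As an example of the "pre-send the size" approach, here is a minimal framing sketch over a plain blocking POSIX socket; the 4-byte big-endian length header is just one common convention, and the same reassembly idea applies inside an Asio read callback:

```cpp
#include <arpa/inet.h>   // ntohl
#include <sys/socket.h>
#include <sys/types.h>
#include <cstddef>
#include <cstdint>
#include <string>

// Read exactly n bytes, looping over partial recv() results.
static bool read_exact(int fd, char* buf, std::size_t n)
{
    std::size_t got = 0;
    while (got < n) {
        ssize_t r = recv(fd, buf + got, n - got, 0);
        if (r <= 0) return false;          // error or peer closed the connection
        got += static_cast<std::size_t>(r);
    }
    return true;
}

// Receive one length-prefixed message: 4-byte big-endian size, then payload.
// The sender writes htonl(payload.size()) followed by the payload bytes.
bool read_message(int fd, std::string& out)
{
    std::uint32_t net_len = 0;
    if (!read_exact(fd, reinterpret_cast<char*>(&net_len), sizeof net_len))
        return false;
    std::uint32_t len = ntohl(net_len);
    out.resize(len);
    return len == 0 || read_exact(fd, &out[0], len);
}
```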

Program structure for bi-directional TCP communication using Boost::Asio

First off, I hope my question makes sense and is even possible! From what I've read about TCP sockets and Boost::ASIO, I think it should be.
What I'm trying to do is to set up two machines and have a working bi-directional read/write link over TCP between them. Either party should be able to send some data to be used by the other party.
The first confusing part about TCP(/IP?) is that it requires this client/server model. However, reading shows that either side is capable of writing or reading, so I'm not yet completely discouraged. I don't mind establishing an arbitrary party as the client and the other as the server. In my application, that can be negotiated ahead of time and is not of concern to me.
Unfortunately, all of the examples I come across seem to focus on a client connecting to a server, and the server immediately sending some bit of data back. But I want the client to be able to write to the server also.
I envision some kind of loop wherein I call io_service.poll(). If the polling shows that the other party is waiting to send some data, it will call read() and accept that data. If there's nothing waiting in the queue, and it has data to send, then it will call write(). With both sides doing this, they should be able to both read and write to each other.
My concern is how to avoid situations in which both enter into some synchronous write() operation at the same time. They both have data to send, and then sit there waiting to send it on both sides. Does that problem just imply that I should only do asynchronous write() and read()? In that case, will things blow up if both sides of a connection try to write asynchronously at the same time?
I'm hoping somebody can ideally:
1) Provide a very high-level structure or best practice approach which could accomplish this task from both client and server perspectives
or, somewhat less ideally,
2) Say that what I'm trying to do is impossible and perhaps suggest a workaround of some kind.
What you want to do is absolutely possible. Web traffic is a good example of a situation where the "client" sends something long before the server does. I think you're getting tripped up by the words "client" and "server".
What those words really describe is the method of connection establishment. In the case of "client", it's "active" establishment; in the case of "server" it's "passive". Thus, you may find it less confusing to use the terms "active" and "passive", or at least think about them that way.
With respect to finding example code that you can use as a basis for your work, I'd strongly encourage you to take a look at W. Richard Stevens' "Unix Network Programming" book. Any edition will suffice, though the 2nd Edition will be more up to date. It will be only C, but that's okay, because the socket API is C only. boost::asio is nice, but it sounds like you might benefit from seeing some of the nuts and bolts under the hood.
My concern is how to avoid situations in which both enter into some synchronous write() operation at the same time. They both have data to send, and then sit there waiting to send it on both sides. Does that problem just imply that I should only do asynchronous write() and read()? In that case, will things blow up if both sides of a connection try to write asynchronously at the same time?
It sounds like you are somewhat confused about how protocols are used. TCP only provides a reliable stream of bytes, nothing more. On top of that applications speak a protocol so they know when and how much data to read and write. Both the client and the server writing data concurrently can lead to a deadlock if neither side is reading the data. One way to solve that behavior is to use a deadline_timer to cancel the asynchronous write operation if it has not completed in a certain amount of time.
You should be using asynchronous methods when writing a server. Synchronous methods are appropriate for some trivial client applications.
TCP is full-duplex, meaning you can send and receive data in whatever order you want. To prevent a deadlock in your own protocol (the high-level behaviour of your program), when you have the opportunity to both send and receive, you should give priority to receiving. With epoll in level-triggered mode, that looks like: poll for both send and receive readiness; if you can receive, do so; otherwise, if you can send and have something to send, do so. I don't know how boost::asio or threads fit in here; you do need some measure of control over how sends and receives are interleaved.
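A sketch of that read-first rule with a level-triggered epoll set and a single connection; have_data_to_send/do_receive/do_send are hypothetical application hooks, not library calls:

```cpp
#include <sys/epoll.h>
#include <cerrno>

bool have_data_to_send();    // hypothetical application hooks
void do_receive(int fd);
void do_send(int fd);

void io_loop(int epoll_fd, int conn_fd)
{
    for (;;) {
        // Only ask for writability when something is actually queued,
        // otherwise EPOLLOUT would fire on every iteration.
        epoll_event want{};
        want.events  = EPOLLIN | (have_data_to_send() ? EPOLLOUT : 0u);
        want.data.fd = conn_fd;
        epoll_ctl(epoll_fd, EPOLL_CTL_MOD, conn_fd, &want);

        epoll_event got{};
        int n = epoll_wait(epoll_fd, &got, 1, -1);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) break;

        if (got.events & EPOLLIN)
            do_receive(conn_fd);        // receiving always takes priority
        else if (got.events & EPOLLOUT)
            do_send(conn_fd);           // send only when there's nothing to read
    }
}
```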
The word you're looking for is "non-blocking", which is entirely different from POSIX asynchronous I/O (which involves signals).
The idea is that you use something like fcntl(fd, F_SETFL, O_NONBLOCK). write() will return the number of bytes successfully written (if positive), and both read() and write() return -1 and set errno = EAGAIN if no progress can be made (no data to read, or the write window is full).
You then use something like select/epoll/kqueue which blocks until a socket is readable/writable (depending on the flags set).
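A minimal sketch of that write-side handling (assuming the descriptor has already been switched to non-blocking mode with the fcntl() call above):

```cpp
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Attempt a non-blocking write. Returns the number of bytes the kernel
// accepted, 0 if the send buffer is full right now (EAGAIN/EWOULDBLOCK),
// or -1 on a real error. The caller keeps the unsent tail and retries
// once select()/epoll/kqueue reports the descriptor writable again.
ssize_t try_write(int fd, const char* data, std::size_t len)
{
    ssize_t n = write(fd, data, len);
    if (n >= 0) return n;
    if (errno == EAGAIN || errno == EWOULDBLOCK) return 0;
    return -1;
}
```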