Can select() be used with blocking sockets?

I want to use select() to monitor if a socket has data to be read, but I do not want to use non-blocking sockets. So can select() be used with blocking sockets?
I am using Windows.

Yes, this is the entire point of select.
It watches for activity on sockets that would block if you tried to read from them without knowing that data were there. Most importantly, it can watch for activity on multiple sockets, which you couldn't do without select on a blocking socket unless you had each socket handled in a separate thread. Also importantly, it tells you when a socket is ready for reading and/or for writing; simply invoking either read or write can't do that.
The behaviour of select is even documented in these terms:
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
Of course, you can also use it with non-blocking sockets, because otherwise, in order to "wait" for activity, you'd have to come up with a read-sleep-read-sleep-… loop, which is suboptimal: it burns CPU on pointless polling and can delay a response by up to a full sleep interval.
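For the Windows case in the question, here is a minimal sketch of the pattern. The helper name is illustrative; it assumes WSAStartup has already run, 'sock' is a connected blocking SOCKET, and error handling is abbreviated:

    #include <winsock2.h>

    // Wait up to five seconds for the blocking socket to become readable,
    // then read from it.  Returns false on timeout or error.
    bool wait_then_recv(SOCKET sock, char* buf, int len, int& got)
    {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(sock, &readable);

        timeval timeout = { 5, 0 };   // seconds, microseconds

        // The first argument to select() is ignored on Windows.
        if (select(0, &readable, NULL, NULL, &timeout) <= 0)
            return false;             // timed out, or select() failed

        // select() reported the socket readable, so this recv() on the
        // still-blocking socket returns promptly instead of hanging.
        got = recv(sock, buf, len, 0);
        return got > 0;
    }

The socket never needs to be switched to non-blocking mode; select() does the waiting.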

How do I recover when a synchronous call to socket send() gets blocked due to the loss of the other end of the connection?

When my socket connection is terminated normally, it works fine. But there are cases where the normal termination does not occur and the remote side of the connection simply disappears. When this happens, the sending task gets stuck in send() because the other side has stopped ack'ing the data. My application has a ping request/response going on, and so, in another thread, it recognizes that the connection is dead. The question is: what should this other thread do to bring the connection to a safe termination? Should it call close()? I see SIGPIPE thrown around when this happens, and I just want to make sure I am closing the connection in a safe way. I just don't want it to crash... I don't care about the leftover data. I am using a C++ library that uses synchronous sockets, so moving to async is not an easy option for me.
I avoid this problem by setting SIGPIPE to be ignored and setting all my sockets to non-blocking I/O mode. Once a socket is in non-blocking mode, it will never block inside of send() or recv() -- rather, in any situation where it would normally block, it will instead immediately return -1 and set errno to EWOULDBLOCK. Therefore I can never "lose control" of the thread due to bad network conditions.
Of course if you never block, how do you keep your event loop from spinning and using up 100% of a core all the time? The answer is that you can block waiting for I/O inside of a separate call that is designed to do just that, e.g. select() or poll() or similar. These functions are designed to block until any one of a number of sockets becomes ready-to-read (or optionally ready-for-write) or until a pre-specified amount of time elapses, whichever comes first. So by using these, you can have your thread wake up when it needs to wake up and also sleep when there's nothing to do.
Anyway, once you have that (and you've made sure that your code handles short reads, short writes, and -1/EWOULDBLOCK gracefully, as those happen more often in non-blocking mode), you are free to implement your dead-network-detector in any of several ways. You could implement it within your network I/O thread, by keeping track of how long it has been since any data was last sent or received, and by using the timeout argument to select() to cause the blocking function to wake up at the appropriate times based on that. Or you could still use a second thread, but now the second thread has a safe way to wake up the first thread: by calling pipe() or socketpair() you can create a pair of connected file descriptors, and your network I/O thread can select()/poll() on the receiving file descriptor while the other thread holds the sending file descriptor. Then when the other thread wants to wake up the I/O thread, it can send a byte on its file descriptor, or just close() it; either one will cause the network I/O thread to return from select() or poll() and find out that something has happened on its receiving-file-descriptor, which gives it the opportunity to react by exiting (or taking whatever action is appropriate).
I use this technique in almost all of my network programming, and I find it works very well to achieve network behavior that is both reliable and CPU-efficient.
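A minimal sketch of that pipe()-based wakeup, assuming a POSIX system (names are illustrative, error handling abbreviated):

    #include <unistd.h>
    #include <sys/select.h>

    int wake_pipe[2];   // created once at startup with pipe(wake_pipe)

    // Network I/O thread: watch the socket and the pipe's read end together.
    void io_loop(int net_fd)
    {
        for (;;) {
            fd_set readable;
            FD_ZERO(&readable);
            FD_SET(net_fd, &readable);
            FD_SET(wake_pipe[0], &readable);
            int maxfd = net_fd > wake_pipe[0] ? net_fd : wake_pipe[0];

            if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
                continue;   // e.g. EINTR; retry

            if (FD_ISSET(wake_pipe[0], &readable)) {
                char c;
                read(wake_pipe[0], &c, 1);   // drain the wakeup byte
                return;                      // ping thread says the peer is dead
            }
            if (FD_ISSET(net_fd, &readable)) {
                // ...normal non-blocking recv()/send() handling here...
            }
        }
    }

    // Ping thread, once it decides the connection is dead:
    //     write(wake_pipe[1], "x", 1);   // wakes io_loop() out of select()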
I had a lot of SIGPIPEs in my application. Those are not really important: they just tell you that a pipe (here, a socket) is no longer available.
So, in my main function, I do:
signal(SIGPIPE, SIG_IGN);
Another option is to use the MSG_NOSIGNAL flag for send, e.g. send(..., MSG_NOSIGNAL);. In that case SIGPIPE is not sent; the call returns -1 and sets errno == EPIPE.
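A small sketch tying the two together on Linux ('send_all' is a hypothetical helper, error handling abbreviated):

    #include <cerrno>
    #include <sys/socket.h>

    // Send the whole buffer; with MSG_NOSIGNAL a dead peer produces
    // -1/EPIPE instead of a process-killing SIGPIPE.
    bool send_all(int fd, const char* data, size_t len)
    {
        while (len > 0) {
            ssize_t n = send(fd, data, len, MSG_NOSIGNAL);
            if (n < 0) {
                if (errno == EINTR)
                    continue;        // interrupted by a signal; retry
                return false;        // EPIPE etc.: the caller should close(fd)
            }
            data += n;
            len  -= static_cast<size_t>(n);
        }
        return true;
    }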

Using send() on a blocking socket from multiple threads

I have read that you are not supposed to use send() on a blocking socket from multiple threads, but I do not know why! And if I want to use send() from multiple threads, is there anything I can do to allow it?
I am using Windows.
The fundamental reason is that synchronous I/O functions use the handle object (sockets are implemented as handles) to keep track of whether the I/O is complete or not.
The result is that if you try to send() to the same socket from multiple threads simultaneously, send() is liable to (a) hang or (b) exit before the I/O is actually complete, with catastrophic results.
You can use a critical section to prevent the sends from overlapping, or have a designated thread that reads data to send from a queue.
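As a sketch of the critical-section option (shown here with a portable std::mutex; on Windows a CRITICAL_SECTION or SRWLOCK would serve the same role; names are illustrative):

    #include <mutex>
    #include <winsock2.h>

    std::mutex send_mutex;   // one mutex guarding send() on this one socket

    int locked_send(SOCKET s, const char* data, int len)
    {
        std::lock_guard<std::mutex> lock(send_mutex);
        // Only one thread at a time reaches send(), so each synchronous
        // send completes before the next one starts on this handle.
        return send(s, data, len, 0);
    }

Note that a single send() may still transmit fewer bytes than requested; if whole messages must stay contiguous on the wire, loop until everything is written while still holding the lock.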
Note that this only applies if the sends are to the same socket. Sending to different sockets simultaneously is fine.

simultaneously read and write on the same socket in C or C++

I am implementing a simple server, that accepts a single connection and then uses that socket to simultaneously read and write messages from the read and write threads.
What is the safe and easy way to simultaneously read and write from the same socket descriptor in C/C++ on Linux?
I don't need to worry about multiple threads reading from and writing to the same socket, as there will be a single dedicated read thread and a single dedicated write thread.
In the above scenario, is any kind of locking required?
Does the above scenario require non blocking socket?
Is there any opensource library, that would help in the above scenario?
In the above scenario, is any kind of locking required?
None.
Does the above scenario require non blocking socket?
The bit you're probably worried about - the read/recv and write/send threads on an established connection - do not need to be non-blocking if you're happy for those threads to sit there waiting to complete. That's normally one of the reasons you'd use threads rather than select, epoll, async operations, or io_uring - keeps the code simpler too.
If the thread accepting new clients is happy to block in the call to accept(), then you're all good there too.
Still, there's one subtle issue with TCP servers you might want to keep in the back of your mind, if your program grows to handle multiple clients and has some periodic housekeeping to do. It's natural and tempting to use a select or epoll call with a timeout to check for readability on the listening socket - which indicates a client connection attempt - then accept the connection. There's a race condition there: the client connection attempt may have dropped between select() and accept(), in which case accept() will block if the listening socket's not non-blocking, and that can prevent a timely return to the select() loop and halt the periodic on-timeout processing until another client connects.
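A sketch of the usual fix, assuming POSIX: make only the listening socket non-blocking, so a vanished connection attempt yields an error code instead of a stuck accept().

    #include <fcntl.h>
    #include <cerrno>
    #include <sys/socket.h>

    void make_listener_nonblocking(int listen_fd)
    {
        int flags = fcntl(listen_fd, F_GETFL, 0);
        fcntl(listen_fd, F_SETFL, flags | O_NONBLOCK);
    }

    // Later, once select() reports listen_fd readable:
    //     int client = accept(listen_fd, NULL, NULL);
    //     if (client < 0 && (errno == EWOULDBLOCK || errno == EAGAIN)) {
    //         // the connection attempt dropped between select() and
    //         // accept(); just go back around the select() loop
    //     }

On Linux the accepted socket does not inherit O_NONBLOCK, so the per-connection read and write threads still get the blocking behaviour they want.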
Is there any opensource library, that would help in the above scenario?
There are hundreds of libraries for writing basic servers (and asking for 3rd party lib recommendations is off-topic on SO so I won't get into it), but ultimately what you've asked for is easily achieved atop an OS-provided BSD sockets API or the Windows bastardisation ("winsock").
Sockets are BI-DIRECTIONAL. If you've ever actually dissected an Ethernet or Serial cable or seen the low-level hardware wiring diagram for them, you can actually SEE distinct copper wires for the "TX" (transmit) and "RX" (receive) lines. The software for sending the signals, from the device controller up to most OS APIs for a 'socket', reflects this and it is the key difference between a socket and an ordinary pipe on most systems (e.g. Linux).
To really get the most out of sockets, you need:
1) Async IO support that uses IO Completion Ports, epoll(), or some similar async callback or event system to 'wake up' whenever data comes in on the socket. This then must call your lowest-level 'ReadData' API to read the message off the socket connection.
2) A 2nd API that supports the low-level writes, a 'WriteData' (transmit) that pushes bytes onto the socket and does not depend on anything the 'ReadData' logic needs. Remember, your send and receive are independent even at the hardware level, so don't introduce locking or other synchronization at this level.
3) A pool of Socket IO threads, which blindly do any processing of data that is read from or will be written to a socket.
4) PROTOCOL CALLBACK: A callback object the socket threads have smart pointers to. It handles any PROTOCOL layer - such as parsing your data blob into a real HTTP request - that sits on top of the basic socket connection. Remember, a socket is just a data pipe between computers, and data sent over it will often arrive as a series of fragments - the packets. In protocols like UDP the packets aren't even guaranteed to arrive in order. The low-level 'ReadData' and 'WriteData' will call back from their threads into here, because this is where content-aware data processing actually begins.
5) Any callbacks the protocol handler itself needs. For HTTP, you package the raw request buffers into nice objects that you hand off to a real servlet, which should return a nice response object that can be serialized into an HTTP spec-compliant response.
Notice the basic pattern: You have to make the whole system fundamentally async (an 'onion of callbacks') if you wish to take full advantage of bi-directional, async IO over sockets. The only way to read and write simultaneously to the socket is with threads, so you could still synchronize between a 'writer' and 'reader' thread, but I'd only do it if the protocol or other considerations forced my hand. The good news is that you can get great performance with sockets using highly async processing, the bad is that building such a system in a robust way is a serious effort.
You don't have to worry about it. One thread reading and one thread writing will work as you expect. Sockets are full duplex, so you can read while you write and vice-versa. You'd have to worry if you had multiple writers, but this is not the case.
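A minimal sketch of that one-reader/one-writer arrangement (POSIX, blocking sockets; names are illustrative and error handling is abbreviated):

    #include <thread>
    #include <unistd.h>
    #include <sys/socket.h>

    void reader(int fd)
    {
        char buf[4096];
        ssize_t n;
        while ((n = recv(fd, buf, sizeof buf, 0)) > 0) {
            // ...hand these n bytes to whatever consumes incoming messages...
        }
        // n == 0 means the peer closed; n < 0 means an error
    }

    void writer(int fd)
    {
        for (;;) {
            // ...block here until the application queues an outgoing message...
            const char msg[] = "hello\n";
            if (send(fd, msg, sizeof msg - 1, 0) < 0)
                break;   // peer gone
        }
    }

    void serve(int connected_fd)
    {
        std::thread r(reader, connected_fd);   // the two threads drive opposite
        std::thread w(writer, connected_fd);   // directions of the same socket,
        r.join();                              // so no locking is needed
        w.join();
    }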

Waiting on a condition (pthread_cond_wait) and a socket change (select) simultaneously

I'm writing a POSIX-compatible multi-threaded server in C/C++ that must be able to accept, read from, and write to a large number of connections asynchronously. The server has several worker threads which perform tasks and occasionally (and unpredictably) queue data to be written to the sockets. Data is also occasionally (and unpredictably) written to the sockets by the clients, so the server must also read asynchronously. One obvious way of doing this is to give each connection a thread which reads and writes from/to its socket; this is ugly, though, since each connection may persist for a long time and the server would thus have to hold hundreds or thousands of threads just to keep track of connections.
A better approach would be to have a single thread that handled all communications using the select()/pselect() functions. I.e., a single thread waits on any socket to be readable, then spawns a job to process the input that will be handled by a pool of other threads whenever input is available. Whenever the other worker threads produce output for a connection, it gets queued, and the communication thread waits for that socket to be writable before writing it.
The problem with this is that the communication thread may be waiting in the select() or pselect() function when output is queued by the worker threads of the server. It's possible that, if no input arrives for several seconds or minutes, a queued chunk of output will just wait for the communication thread to be done select()ing. This shouldn't happen, however--data should be written as soon as possible.
Right now I see a couple of solutions to this that are thread-safe. One is to have the communication thread busy-wait on input and update the list of sockets it waits on for writing every tenth of a second or so. This isn't optimal since it involves busy-waiting, but it will work. Another option is to use pselect() and send the USR1 signal (or something equivalent) whenever new output has been queued, allowing the communication thread to update the list of sockets it is waiting on for writable status immediately. I prefer the latter here, but still dislike using a signal for something that should be a condition (pthread_cond_t). Yet another option would be to include, in the list of file descriptors on which select() is waiting, a dummy file that we write a single byte to whenever a socket needs to be added to the writable fd_set for select(); this would wake up the communication thread because that particular dummy file would then be readable, thus allowing it to immediately update its writable fd_set.
I feel, intuitively, that the second approach (with the signal) is the 'most correct' way to program the server, but I'm curious whether anyone knows which of the above is generally the most efficient, whether any of the above will cause race conditions that I'm not aware of, or whether there is a more general solution to this problem. What I really want is a pthread_cond_wait_and_select() function that allows the comm thread to wait on both a change in sockets and a signal from a condition.
Thanks in advance.
This is a fairly common problem.
One often used solution is to have pipes as a communication mechanism from worker threads back to the I/O thread. Having completed its task a worker thread writes the pointer to the result into the pipe. The I/O thread waits on the read end of the pipe along with other sockets and file descriptors and once the pipe is ready for read it wakes up, retrieves the pointer to the result and proceeds with pushing the result into the client connection in non-blocking mode.
Note that, since pipe reads and writes of less than or equal to PIPE_BUF bytes are atomic, the pointers get written and read in one shot. One can even have multiple worker threads writing pointers into the same pipe because of the atomicity guarantee.
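As a sketch of that pointer-through-a-pipe handoff (POSIX; 'Result' and the function names are illustrative):

    #include <unistd.h>

    struct Result { /* ...whatever a worker produces... */ };

    int result_pipe[2];   // created once at startup with pipe(result_pipe)

    // Worker thread: hand a finished result to the I/O thread.  Writes of
    // sizeof(Result*) bytes are well under PIPE_BUF, hence atomic, so many
    // workers can share the same write end without locking.
    void post_result(Result* r)
    {
        write(result_pipe[1], &r, sizeof r);
    }

    // I/O thread: result_pipe[0] sits in the select()/poll() read set; once
    // it's readable, recover the pointer in one read.
    Result* take_result()
    {
        Result* r = nullptr;
        read(result_pipe[0], &r, sizeof r);
        return r;
    }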
Unfortunately, the best way to do this is different for each platform. The canonical, portable way to do it is to have your I/O thread block in poll. If you need to get the I/O thread to leave poll, you send a single byte on a pipe that the thread is polling. That will cause the thread to exit from poll immediately.
On Linux, epoll is the best way. On BSD-derived operating systems (including OSX, I think), kqueue. On Solaris, it used to be /dev/poll and there's something else now whose name I forget.
You may just want to consider using a library like libevent or Boost.Asio. They give you the best I/O model on each platform they support.
Your second approach is the cleaner way to go. It's totally normal to have things like select or epoll include custom events in your list. This is what we do on my current project to handle such events. We also use timers (on Linux timerfd_create) for periodic events.
On Linux, eventfd lets you create exactly such arbitrary user events for this purpose -- so I'd say it is quite accepted practice. If you're limited to POSIX-only functions, a pipe or socketpair can serve the same role.
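A sketch of the eventfd variant (Linux-only; error handling abbreviated, and it assumes fewer than 63 sockets for the fixed-size array):

    #include <sys/eventfd.h>
    #include <poll.h>
    #include <unistd.h>
    #include <cstdint>

    int wakeup_fd = eventfd(0, 0);   // created once at startup

    // Worker thread: output was just queued; wake the I/O thread.
    void signal_io_thread()
    {
        uint64_t one = 1;
        write(wakeup_fd, &one, sizeof one);
    }

    // I/O thread: wait on the sockets and the eventfd together.
    void wait_for_work(pollfd* socks, int nsocks)
    {
        pollfd fds[64];
        for (int i = 0; i < nsocks; ++i)
            fds[i] = socks[i];
        fds[nsocks].fd = wakeup_fd;
        fds[nsocks].events = POLLIN;
        fds[nsocks].revents = 0;

        poll(fds, nsocks + 1, -1);   // block until a socket or the eventfd fires

        if (fds[nsocks].revents & POLLIN) {
            uint64_t count;
            read(wakeup_fd, &count, sizeof count);   // resets the counter
            // ...rebuild the writable fd set from the output queues...
        }
    }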
Busy-polling is not a good option. First, you'll be scanning memory that other threads are using, causing CPU cache and memory contention. Second, you'll constantly be returning to your select call, which creates a huge number of system calls and context switches that will hurt overall system performance.