Breaking out of a socket select() - C++

I have a loop which basically calls this every few seconds (after the timeout):
while (true) {
    if (finished)
        return;

    switch (select(FD_SETSIZE, &readfds, 0, 0, &tv)) {
    case SOCKET_ERROR:
        // report bad stuff etc.
        return;
    default:
        break;
    }
    // do stuff with the incoming connection
}
So basically, every few seconds (the interval specified by tv), it reactivates the listening.
This runs on thread B (not the main thread). There are times when I want to end this acceptor loop immediately from thread A (the main thread), but it seems I have to wait until the timeout interval expires.
Is there a way to interrupt the select call from another thread so that thread B can quit instantly?

The easiest way is probably to use pipe(2) to create a pipe and add the read end to readfds. When the other thread wants to interrupt the select(), it just writes a byte to the write end; the select()ing thread then consumes that byte afterward.
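A minimal sketch of that approach, built around the question's loop; wake_pipe, listen_sock and the two helper functions are placeholder names of mine, and error handling is omitted:

#include <sys/select.h>
#include <unistd.h>

int wake_pipe[2];                          // created once with pipe(wake_pipe) before thread B starts

// Thread B: one pass of the accept loop
bool wait_once(int listen_sock, timeval tv) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listen_sock, &readfds);
    FD_SET(wake_pipe[0], &readfds);        // watch the pipe's read end as well
    int maxfd = (listen_sock > wake_pipe[0] ? listen_sock : wake_pipe[0]) + 1;
    if (select(maxfd, &readfds, 0, 0, &tv) > 0) {
        if (FD_ISSET(wake_pipe[0], &readfds)) {
            char c;
            read(wake_pipe[0], &c, 1);     // consume the wake-up byte
            return false;                  // thread A asked us to stop
        }
        // otherwise: handle the incoming connection on listen_sock as before
    }
    return true;
}

// Thread A: wake thread B up immediately
void stop_acceptor() {
    char c = 1;
    write(wake_pipe[1], &c, 1);
}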

Yes: create a connected pair of sockets. The thread that wants to interrupt (thread A in your case) writes a byte to one side of the pair, and the thread running select (thread B) adds the other side to its read set. As soon as A writes, B's select returns; do not forget to read that byte from the socket afterward.
This is the most standard and common way to interrupt a select.
Notes:
Under Unix, use socketpair to create the pair of sockets; under Windows it is a little more involved, but searching for "Windows socketpair" will turn up sample code.
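On POSIX the pair itself is a single call; a short sketch (sv is my own name for the pair, error handling omitted):

#include <sys/socket.h>

int sv[2];
socketpair(AF_UNIX, SOCK_STREAM, 0, sv);   // sv[0] and sv[1] are connected to each other
// The thread running select() adds sv[0] to its read set with FD_SET(sv[0], &readfds);
// the interrupting thread calls send(sv[1], "x", 1, 0), and after select() returns
// the selecting thread does recv(sv[0], &c, 1, 0) to consume the byte.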

Can't you just make the timeout sufficiently short (say 10 ms or so)?
These "just create a dummy connection"-type solutions seem like a hack to me. I personally think that if an application is well designed, concurrent tasks never have to be interrupted forcefully; the worker just checks often enough (this is also a reason why boost.threads has no terminate function).
Edit: Made this answer community wiki. It is bad, but it might help others understand why it is bad, which is explained in the comments.

You can call shutdown(sock, SHUT_RDWR) on the socket from the main thread to make the waiting select call return. That lets the other thread exit before the timeout, so you don't need to wait until the timeout expires.
cheers. :)

Related

How to stop select() immediately on closing the worker thread? [duplicate]


How to wake up a select call from another thread without using a timeout

I am looking for a way to wake up a select call in C++. Because of application requirements I can't set a timeout, since multiple threads use the select system call.
Please see the scenario below:
I want to wake up a select call that is waiting in another thread. I tried writing data to the socket from the main thread, but that did not wake it up.
I want to close the thread and the socket if there is no data for that thread.
select does wake up if the socket connection is closed from another process, but this does not work from another thread.
Does anyone have an idea how to do this?
On a recent Linux you can use eventfd; in general, on everything, a pipe. Usage: register one end of the pipe in the selector for readability, along with the actual socket(s); to wake up the selector, just write one byte to the other end of the pipe. Alternatively (if your libc has it) you can use pselect with a sigmask to catch the ALRM signal, and raise that signal whenever you need to wake the selector up. Be very careful with the signals approach in a multithreaded application (read: I would not use it), since if not done right the signal may be delivered to a random thread.
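A rough eventfd sketch of the first suggestion (Linux-specific; sock is a placeholder for whatever descriptor you are already selecting on, and error handling is omitted):

#include <sys/eventfd.h>
#include <sys/select.h>
#include <unistd.h>
#include <stdint.h>

int efd = eventfd(0, 0);                  // created once, shared between the threads

// selecting thread:
void wait_for_data(int sock) {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(sock, &rfds);
    FD_SET(efd, &rfds);
    int maxfd = (sock > efd ? sock : efd) + 1;
    if (select(maxfd, &rfds, 0, 0, NULL) > 0 && FD_ISSET(efd, &rfds)) {
        uint64_t val;
        read(efd, &val, sizeof(val));     // drain the counter
        // woken up deliberately: clean up and return
    }
}

// waking thread:
void wake_up() {
    uint64_t one = 1;
    write(efd, &one, sizeof(one));        // makes the select() above return
}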
Thanks all for the valuable suggestions. I was able to resolve the issue with a shutdown() call on the socket FD, using the answer referenced in this link; it delivers a wake-up to the select that is waiting for activity. We should close the socket only after the select call returns, otherwise select will not get the wake-up.

How does a thread pool allow me to handle many client connections?

I want to handle 300 to 400 client connections, but I do not want to create a thread for each client connection (or is there anything wrong with creating 400 threads?).
So I have read that I should use a thread pool to fix this problem, but I am unable to understand how a thread pool actually fixes it. In my understanding of a thread pool, there is a limited number of threads that take tasks. But once a thread takes a recv() task it will immediately block if there is nothing to read! So shouldn't the solution be a mechanism that lets me know there is something to read before actually attempting to read it? How exactly does a thread pool fix my problem of handling many client connections?
Edit: Changed read() to recv().
As user743414 already pointed out, too many threads are not a good idea. But the main problem, IMHO, lies in your blocking read. You should only call recv when there is something to read. The usual way is to use select to find out which socket has something to read and dispatch that socket to a worker thread from the thread pool.
On Windows you should use WSA sockets.
You use select in a single thread. Then you use the result of select (which tells you which sockets need action) to dispatch the connection to a worker thread.
You wrote that you are using Microsoft. Take this sample:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms742219(v=vs.85).aspx
Search for this code:
//-----------------------------------------
// If data has been received, echo the received data
// from DataBuf back to the client
iResult = WSASend(AcceptSocket, &DataBuf, 1, &RecvBytes, Flags, &AcceptOverlapped, NULL);
if (iResult != 0) {
    wprintf(L"WSASend failed with error = %d\n", WSAGetLastError());
}
You would replace this part with your thread pool, roughly like this (pseudocode):
mythreadpool *thread = takeOrCreateThreadFromThreadPool();
thread->callWith(&DataBuf, &RecvBytes);
You will find many different, good thread-pool implementations that use approaches like this.
Creating 300-400 threads should work, but it isn't the best solution. "Context switch" is the keyword to search for here; context switches are expensive.
Another problem with many threads is that each thread gets 1 MB of stack memory by default, and that memory is limited. You can easily try this and check how many threads you can create.
With a thread pool you have one thread that receives requests and then hands them to the pool to work on.
So you wouldn't have threads blocking while waiting for reads; your thread pool only works when there is something to read.
Another, better option on Windows is I/O completion ports. Similar techniques are also available on Linux.
The thread pool helps because you probably will not have all 400 connections constantly sending and receiving data, so your app needs only a handful of threads to manage them all.
A single thread can monitor all the connections (using select, for instance); as soon as select unblocks, it loops through all the sockets that need attention and passes them to the thread pool. If select reported that a socket has received data, then the read on it will not block (and you can still set the timeout of the read to 0).
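A stripped-down sketch of that dispatcher pattern; client_sockets, ThreadPool and pool.submit() are hypothetical stand-ins for your connection list and whatever thread-pool API you end up using:

#include <sys/select.h>
#include <sys/socket.h>
#include <vector>

void dispatch_once(const std::vector<int> &client_sockets, ThreadPool &pool) {  // ThreadPool is hypothetical
    fd_set rfds;
    FD_ZERO(&rfds);
    int maxfd = 0;
    for (int fd : client_sockets) {               // register every connection
        FD_SET(fd, &rfds);
        if (fd > maxfd) maxfd = fd;
    }
    if (select(maxfd + 1, &rfds, 0, 0, NULL) > 0) {
        for (int fd : client_sockets) {
            if (FD_ISSET(fd, &rfds)) {            // this socket is readable right now
                pool.submit([fd] {                // hand it to a worker; recv() won't block
                    char buf[4096];
                    int n = recv(fd, buf, sizeof(buf), 0);
                    if (n > 0) { /* process buf[0..n) */ }
                });
            }
        }
    }
}

Note that select is limited to FD_SETSIZE descriptors; for 300-400 connections that is usually fine, but the I/O completion port / epoll suggestions above scale further.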

What is an overlapped I/O alternative to WaitNamedPipe?

The WaitNamedPipe function allows a pipe client application to synchronously wait for an available connection on a named pipe server. You then call CreateFile to open the pipe as a client. Pseudocode:
// loop works around race condition with WaitNamedPipe and CreateFile
HANDLE hPipe;
while (true) {
    if (WaitNamedPipe says connection is ready) {
        hPipe = CreateFile(...);
        if (hPipe ok or last error is NOT pipe busy) {
            break; // hPipe is valid or last error is set
        }
    } else {
        break; // WaitNamedPipe failed
    }
}
The problem is that these are all blocking, synchronous calls. What is a good way to do this asynchronously? I can't seem to find an API that uses overlapped I/O to do this, for example. For pipe servers, the ConnectNamedPipe function provides an lpOverlapped parameter allowing a server to asynchronously wait for a client. The pipe server can then call WaitForMultipleObjects and wait for the I/O operation to complete, or for any other event to be signaled (for example, an event signaling the thread to cancel pending I/O and terminate).
The only way I can think of is to call WaitNamedPipe in a loop with a short, finite timeout and check other signals if it times out. Alternatively, in a loop call CreateFile, check other signals, and then call Sleep with a short delay (or WaitNamedPipe). For example:
HANDLE hPipe;
while (true) {
    hPipe = CreateFile(...);
    if (hPipe not valid and pipe is busy) {
        // sleep 100 milliseconds; alternatively, call WaitNamedPipe with timeout
        Sleep(100);
        // TODO: check other signals here to see if we should abort I/O
    } else {
        break;
    }
}
But this method stinks to high heaven in my opinion. If a pipe isn't available for a while, the thread keeps running, burning CPU, using power, requiring memory pages to stay in RAM, etc. In my mind, a thread that relies on Sleep or short timeouts does not perform well and is a sign of sloppy multi-threaded programming.
But what's the alternative in this case?
WaitNamedPipe is completely useless, and will just use all the CPU if you specify a timeout and there's no server waiting for the pipe.
Just call CreateFile over and over with a Sleep like you're doing, and move it to other threads as you see appropriate. There is no API alternative.
The only "benefit" WaitNamedPipe provides is if you want to know whether you can connect to a named pipe without actually opening a connection. It's junk.
If you really want to be thorough, your only options are:
1. Ensure that whatever program is opening the named pipe always calls CreateNamedPipe again immediately after its previous pipe instance is connected to.
2. Have your program actually check whether that program is running.
3. If your intent is really not to accept additional connections, still call CreateNamedPipe, and when someone connects, tell them to go away until they've waited a given amount of time, then close the pipe.
Why can't the server just create more pipes? The performance hit in the scenario you describe isn't a problem if it is rare.
I.e. if there are usually enough pipes to go round, what does it matter if you use CreateFile/Sleep instead of WaitForMultipleObjects? The performance hit will not matter.
I also have to question the need for overlapped I/O in a client. How many servers is it communicating with at a time? If the answer is less than, say, 10, you could reasonably create a thread per connection.
Basically I am saying I think the reason there is no overlapped WaitNamedPipe is that there is no reasonable use case which requires it.
You can open the pipe file system at \\.\pipe\ and then use DeviceIoControl to send FSCTL_PIPE_WAIT.

I want to wait on both a file descriptor and a mutex, what's the recommended way to do this?

I would like to spawn off threads to perform certain tasks, and use a thread-safe queue to communicate with them. I would also like to be doing IO to a variety of file descriptors while I'm waiting.
What's the recommended way to accomplish this? Do I have to create an inter-thread pipe and write to it when the queue goes from no elements to some elements? Isn't there a better way?
And if I have to create the inter-thread pipe, why don't more libraries that implement shared queues allow you to create the shared queue and inter-thread pipe as a single entity?
Does the fact I want to do this at all imply a fundamental design flaw?
I'm asking this about both C++ and Python. And I'm mildly interested in a cross-platform solution, but primarily interested in Linux.
For a more concrete example...
I have some code which will be searching for stuff in a filesystem tree. I have several communications channels open to the outside world through sockets. Requests that may (or may not) result in a need to search for stuff in the filesystem tree will be arriving.
I'm going to isolate the code that searches for stuff in the filesystem tree in one or more threads. I would like to take requests that result in a need to search the tree and put them in a thread-safe queue of things to be done by the searcher threads. The results will be put into a queue of completed searches.
I would like to be able to service all the non-search requests quickly while the searches are going on. I would like to be able to act on the search results in a timely fashion.
Servicing the incoming requests would generally imply some kind of event-driven architecture that uses epoll. The queue of disk-search requests and the return queue of results would imply a thread-safe queue that uses mutexes or semaphores to implement the thread safety.
The standard way to wait on an empty queue is to use a condition variable. But that won't work if I need to service other requests while I'm waiting. Either I end up polling the results queue all the time (delaying the results by half the poll interval, on average), or I block on the queue and stop servicing requests.
Whenever one uses an event-driven architecture, one needs a single mechanism to report event completion. On Linux, if one is using files, one is required to use something from the select or poll family, which means one is stuck with using a pipe to initiate all non-file-related events.
Edit: Linux has eventfd and timerfd. These can be added to your epoll list and used to break out of epoll_wait when triggered from another thread or by a timer event, respectively.
There is another option, and that is signals. One can use fcntl to modify the file descriptor so that a signal is emitted when the file descriptor becomes active. The signal handler may then push a file-ready message onto any type of queue of your choosing. This may be a simple semaphore or a mutex/condvar-driven queue. Since one is no longer using select/poll, one no longer needs a pipe to queue non-file-based messages.
Health warning: I have not tried this, and although I cannot see why it would not work, I don't really know the performance implications of the signal approach.
Edit: Manipulating a mutex in a signal handler is probably a very bad idea.
I've solved this exact problem using what you mention, pipe() and libevent (which wraps epoll). The worker thread writes a byte to its pipe FD when its output queue goes from empty to non-empty. That wakes up the main IO thread, which can then grab the worker thread's output. This works great and is actually very simple to code.
You have the Linux tag, so I am going to throw this out: POSIX message queues do all this, which should fulfill your "built-in" request if not your less-desired cross-platform wish.
The thread-safe synchronization is built in. You can have your worker threads block on a read of the queue. Alternatively, MQs can use mq_notify() to spawn a new thread (or signal an existing one) when a new item is put in the queue. And since it looks like you are going to be using select(), the MQ's identifier (mqd_t) can be used as a file descriptor with select.
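A sketch of the Linux-specific variant where the mqd_t goes straight into select(); the queue name and sizes are made up, error handling is omitted, and you link with -lrt:

#include <mqueue.h>
#include <fcntl.h>
#include <sys/select.h>

void wait_for_work() {
    struct mq_attr attr = {};
    attr.mq_maxmsg = 10;
    attr.mq_msgsize = 256;
    mqd_t mq = mq_open("/search_requests", O_CREAT | O_RDONLY, 0600, &attr);

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(mq, &rfds);                    // on Linux an mqd_t is a plain file descriptor
    // ... FD_SET() your sockets here as well ...
    if (select(mq + 1, &rfds, 0, 0, NULL) > 0 && FD_ISSET(mq, &rfds)) {
        char msg[256];
        ssize_t n = mq_receive(mq, msg, sizeof(msg), NULL);
        if (n > 0) { /* a work item arrived from another thread */ }
    }
}

Bear in mind that selecting on an mqd_t is a Linux behavior, not something POSIX guarantees.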
It seems nobody has mentioned this option yet:
Don't run select/poll/etc. in your "main thread". Start a dedicated secondary thread which does the I/O and pushes notifications into your thread-safe queue (the same queue which your other threads use to communicate with the main thread) when I/O operations complete.
Then your main thread just needs to wait on the notification queue.
Duck's and twk's are actually better answers than doron's (the one selected by the OP), in my opinion. doron suggests writing to a message queue from within the context of a signal handler, and states that the message queue can be "any type of queue." I would strongly caution you against this, since many C library/system calls cannot safely be called from within a signal handler (see async-signal-safe).
In particular, if you choose a queue protected by a mutex, you should not access it from a signal handler. Consider this scenario: your consumer thread locks the queue to read it. Immediately after, the kernel delivers the signal to notify you that a file descriptor now has data on it. Your signal handler runs in the consumer thread (necessarily) and tries to put something on your queue. To do this, it first has to take the lock. But the thread already holds the lock, so you are now deadlocked.
select/poll is, in my experience, the only viable solution for an event-driven program on UNIX/Linux. I wish there were a better way inside a multithreaded program, but you need some mechanism to "wake up" your consumer thread. I have yet to find a method that does not involve a system call (since the consumer thread sits on a wait queue inside the kernel during any blocking call such as select).
EDIT: I forgot to mention one Linux-specific way to handle signals when using select/poll: signalfd(2). You get a file descriptor you can select/poll on, and your handling code runs normally instead of in a signal handler's context.
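For illustration, a signalfd sketch along those lines; SIGUSR1 is an arbitrary choice of mine and error handling is omitted:

#include <sys/signalfd.h>
#include <sys/select.h>
#include <signal.h>
#include <unistd.h>

int make_signal_fd() {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);                  // the signal used to wake the loop
    pthread_sigmask(SIG_BLOCK, &mask, NULL);    // block normal delivery so it only reaches the fd
    return signalfd(-1, &mask, 0);
}

void wait_once(int sfd) {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(sfd, &rfds);
    // ... add your sockets here as well ...
    if (select(sfd + 1, &rfds, 0, 0, NULL) > 0 && FD_ISSET(sfd, &rfds)) {
        struct signalfd_siginfo si;
        read(sfd, &si, sizeof(si));             // handled in normal code, not a signal handler
    }
}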
This is a very commonly seen problem, especially when you are developing a network server-side program. Most Linux server-side programs' main loop will look like this:
epoll_add(serv_sock);
while (1) {
    ret = epoll_wait();
    foreach (ret as fd) {
        req = fd.read();
        resp = proc(req);
        fd.send(resp);
    }
}
It is a single-threaded (the main thread), epoll-based server framework. The problem is that it is single-threaded, not multi-threaded. It requires that proc() never block or run for a significant time (say 10 ms in common cases).
If proc() will ever run for a long time, WE NEED MULTIPLE THREADS, and we execute proc() in a separate thread (a worker thread).
We can submit a task to the worker thread without blocking the main thread, using a mutex-based message queue; it is fast enough.
epoll_add(serv_sock);
while (1) {
    ret = epoll_wait();
    foreach (ret as fd) {
        req = fd.read();
        queue.add_job(req); // fast, non-blocking
    }
}
Then we need a way to obtain the task result from a worker thread. How? What if we just check the message queue directly, before or after epoll_wait()?
epoll_add(serv_sock);
while (1) {
    ret = epoll_wait(); // may block for 10 ms
    resp = queue.check_result(); // fast, non-blocking
    foreach (ret as fd) {
        req = fd.read();
        queue.add_job(req); // fast, non-blocking
    }
}
However, the checking action only executes after epoll_wait() ends, and epoll_wait() usually blocks for the full 10 ms (in the common case) if none of the file descriptors it waits on is active.
For a server, 10 ms is quite a long time! Can we signal epoll_wait() to end immediately when a task result is generated?
Yes! I will describe how it is done in one of my open source projects:
Create a pipe shared by all worker threads, and have epoll wait on that pipe as well. Once a task result is generated, the worker thread writes one byte into the pipe, and epoll_wait() ends almost immediately - a Linux pipe has 5 us to 20 us latency.
In my project SSDB (a Redis-protocol-compatible on-disk NoSQL database), I created a SelectableQueue for passing messages between the main thread and worker threads. As its name suggests, SelectableQueue has a file descriptor which can be waited on by epoll.
SelectableQueue: https://github.com/ideawu/ssdb/blob/master/src/util/thread.h#L94
Usage in main thread:
epoll_add(serv_sock);
epoll_add(queue->fd());
while (1) {
    ret = epoll_wait();
    foreach (ret as fd) {
        if (fd is queue) {
            sock, resp = queue->pop_result();
            sock.send(resp);
        }
        if (fd is client_socket) {
            req = fd.read();
            queue->add_task(fd, req);
        }
    }
}
Usage in worker thread:
fd, req = queue->pop_task();
resp = proc(req);
queue->add_result(fd, resp);
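The core idea behind such a SelectableQueue can be sketched in a few lines; this is my own simplification, not the SSDB code itself, and error handling is omitted:

#include <unistd.h>
#include <mutex>
#include <deque>

template <class T>
class SelectableQueueSketch {
    int fds_[2];                        // fds_[0] is the end epoll/select waits on
    std::mutex mu_;
    std::deque<T> items_;
public:
    SelectableQueueSketch() { (void)pipe(fds_); }
    int fd() const { return fds_[0]; }  // register this with epoll_add()

    void push(const T &item) {          // called by worker threads
        { std::lock_guard<std::mutex> lk(mu_); items_.push_back(item); }
        char c = 1;
        (void)write(fds_[1], &c, 1);    // wakes up the main thread's epoll_wait()
    }

    T pop() {                           // called by the main thread after epoll reports fd()
        char c;
        (void)read(fds_[0], &c, 1);     // consume one wake-up byte per item
        std::lock_guard<std::mutex> lk(mu_);
        T item = items_.front();
        items_.pop_front();
        return item;
    }
};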
C++11 has std::mutex and std::condition_variable. The two can be used to have one thread signal another when a certain condition is met. It sounds to me like you will need to build your solution out of these primitives. If your environment does not yet support these C++11 library features, you can find very similar ones in Boost. Sorry, I can't say much about Python.
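For reference, the basic signalling pattern with those primitives looks roughly like this (a sketch with made-up names; note that by itself it wakes a waiting thread, but not a thread blocked in select/epoll):

#include <mutex>
#include <condition_variable>
#include <deque>

std::mutex m;
std::condition_variable cv;
std::deque<int> work;                  // stand-in for whatever your work items are

// producer thread
void post(int item) {
    { std::lock_guard<std::mutex> lk(m); work.push_back(item); }
    cv.notify_one();                   // wake one waiting consumer
}

// consumer thread
int wait_for_item() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return !work.empty(); });   // sleeps until post() signals
    int item = work.front();
    work.pop_front();
    return item;
}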
One way to accomplish what you're looking to do is by implementing the Observer Pattern.
You would register your main thread as an observer with all your spawned threads, and have them notify it when they are done doing what they were supposed to (or update it during their run with the info you need).
Basically, you want to change your approach to an event-driven model.