I am using the Berkeley sockets select() function in the following way.
/*Windows and linux typedefs/aliases/includes are made here with wsa
junk already taken care of.*/
/**Check if a socket can receive data without waiting.
\param socket The os level socket to check.
\param to The timeout value. A nullptr value will block forever, and zero
for each member of the value will cause it to return immediately.
\return True if recv can be called on the socket without blocking.*/
bool CanReceive(OSSocket& socket, const timeval* to)
{
    fd_set set;
    FD_ZERO(&set);
    FD_SET(socket, &set);
    /*select() may modify the timeval it is given, so work on a local
    copy instead of heap-allocating one.*/
    timeval toCopy;
    timeval* toPtr = nullptr;
    if (to)
    {
        toCopy = *to;
        toPtr = &toCopy;
    }
    /*On POSIX the first argument must be the highest fd plus one
    (it is ignored on Windows).*/
    int result = select((int)socket + 1, &set, nullptr, nullptr, toPtr);
    if (result == -1)
        throw Err(); //will auto set from errno.
    return result != 0;
}
I have written a class that watches a container of sockets (wrapped up in another class) and adds an ID to a separate container that stores info on which sockets are ready to be accessed. The map is an unordered_map.
while (m_running)
{
    for (auto& e : m_idMap)
    {
        auto id = e.first;
        auto socket = e.second;
        timeval timeout = ZeroTime; /*0 sec, 0 micro*/
        if (CanReceive(socket, &timeout) &&
            std::count(m_readyList.begin(), m_readyList.end(), id) == 0)
        {
            /*only add ids that are not on the list already.*/
            m_readyList.push_back(id);
        }
    }
}
As I'm sure many have noticed, this code runs insanely fast and gobbles up CPU like there is no tomorrow (40% CPU usage with only one socket in the map). My first solution was a smart waiting function that keeps the iterations per second to a set value, which seemed fine to some people. My question is this: how can I be notified when sockets are ready without using this method? Even if it requires a bunch of macro junk to keep it portable, that's fine. I can only think there might be some way to have the operating system watch the sockets for me and deliver some sort of notification or event when one is ready. Just to be clear, I have chosen not to use .NET.
The loop runs in its own thread and sends notifications to other parts of the software when sockets are ready. The entire thing is multithreaded, and every part of it (except this part) uses an event-based notification system that eliminates the busy-waiting problem. I understand that things become OS-dependent and limited in this area.
Edit: The sockets run in BLOCKING mode (but select is given a zero timeout and therefore will not block), and they are operated on in a dedicated thread.
Edit: The system performs well with the smart sleeping functions in place, but not as well as it could with some notification system from the OS.
First, you must set the socket non-blocking if you don't want the socket operations to block. select() does not guarantee that a subsequent operation will not block; it's just a status-reporting function that tells you about the past and the present.
Second, the best way to do this varies from platform to platform. If you don't want to write lots of platform specific code, you really should use a library like Boost ASIO or libevent.
Third, you can call select on all the sockets at the same time with a timeout. The call returns immediately if any of the sockets are (or were) readable and, if not, waits up to the timeout. When select returns, it reports whether it timed out and, if not, which sockets were readable.
This will still perform very poorly because of the large number of wait lists the process has to be put on just to be immediately removed from all of them as soon as a single socket is readable. But it's the best you can do with reasonable portability.
How can I be notified when sockets are ready without using this method?
That's what select() is for. The idea is that your call to select() should block until at least one of the sockets you passed in to it (via FD_SET()) is ready-for-read. After select() returns, you can find out which socket(s) are now ready-for-read (by calling FD_ISSET()) and call recv() on those sockets to get some data from them and handle it. After that you loop again, go back to sleep inside select() again, and repeat ad infinitum. In this way you handle all of your tasks as quickly as possible, while using the minimum amount of CPU cycles.
The entire thing is multi threaded and every part of it (except this
part) uses an event based notification system that eliminates the busy
waiting problem.
Note that if your thread is blocked inside of select() and you want it to wake up and do something right away (i.e. without relying on a timeout, which would be slow and inefficient), then you'll need some way to cause select() in that thread to return immediately. In my experience the most reliable way to do that is to create a pipe() or socketpair() and have the thread include one end of the file-descriptor pair in its ready-for-read fd_set. Then when another thread wants to wake that thread up, it can do so simply by sending a byte on the other end of the pair. That will cause select() to return; the thread can then read the single byte (and throw it away), and then do whatever it is supposed to do after waking up.
Related
Let's say I start a thread to receive on a port. The socket call will block on recvfrom.
Then, somehow in another thread, I close the socket.
On Windows, this will unblock recvfrom and my thread execution will terminate.
On Linux, this does not unblock recvfrom, and as a result, my thread is sitting doing nothing forever, and the thread execution does not terminate.
Can anyone help me with what's happening on Linux? When the socket is closed, I want recvfrom to unblock
I keep reading about using select(), but I don't know how to use it for my specific case.
Call shutdown(sock, SHUT_RDWR) on the socket, then wait for the thread to exit (i.e. pthread_join).
You would think that close() would unblock the recvfrom(), but it doesn't on Linux.
Here's a sketch of a simple way to use select() to deal with this problem:
// Note: untested code, may contain typos or bugs
static volatile bool _threadGoAway = false;

void MyThread(void *)
{
    int fd = (your socket fd);
    while(1)
    {
        struct timeval timeout = {1, 0};  // make select() return once per second
        fd_set readSet;
        FD_ZERO(&readSet);
        FD_SET(fd, &readSet);
        if (select(fd + 1, &readSet, NULL, NULL, &timeout) >= 0)
        {
            if (_threadGoAway)
            {
                printf("MyThread: main thread wants me to scram, bye bye!\n");
                return;
            }
            else if (FD_ISSET(fd, &readSet))
            {
                char buf[1024];
                int numBytes = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
                [...handle the received bytes here...]
            }
        }
        else perror("select");
    }
}
// To be called by the main thread at shutdown time
void MakeTheReadThreadGoAway()
{
    _threadGoAway = true;
    (void) pthread_join(_thread, NULL);  // may block for up to one second
}
A more elegant method would be to avoid using the timeout feature of select, and instead create a socket pair (using socketpair()) and have the main thread send a byte on its end of the socket pair when it wants the I/O thread to go away, and have the I/O thread exit when it receives a byte on its socket at the other end of the socketpair. I'll leave that as an exercise for the reader though. :)
It's often a good idea to set the socket to non-blocking mode as well, to avoid the (small but non-zero) chance that the recvfrom() call might block even after select() indicated the socket is ready-to-read, as described here. But blocking mode might be "good enough" for your purpose.
Not an answer, but the Linux close man page contains the interesting quote:
It is probably unwise to close file descriptors while they may be in
use by system calls in other threads in the same process. Since a file
descriptor may be reused, there are some obscure race conditions that
may cause unintended side effects.
You are asking for the impossible. There is simply no way for the thread that calls close to know that the other thread is blocked in recvfrom. If you try to write code that guarantees this, you will find that it is impossible.
No matter what you do, it will always be possible for the call to close to race with the call to recvfrom. The call to close changes what the socket descriptor refers to, so it can change the semantic meaning of the call to recvfrom.
There is no way for the thread that enters recvfrom to somehow signal to the thread that calls close that it is blocked (as opposed to being about to block or just entering the system call). So there is literally no possible way to ensure the behavior of close and recvfrom are predictable.
Consider the following:
A thread is about to call recvfrom, but it gets pre-empted by other things the system needs to do.
Later, the thread calls close.
A thread started by the system's I/O library calls socket and gets the same descriptor as the one you closed.
Finally, the thread calls recvfrom, and now it's receiving from the socket the library opened.
Oops.
Don't ever do anything even remotely like this. A resource must not be released while another thread is, or might be, using it. Period.
I have two separate processes, a client and server process. They are linked using shared memory.
A client begins a request by first writing the input value to a certain part of the shared memory and then flipping a bit indicating that the input is valid and that the value has not already been computed.
The server waits for a kill signal, or new data to come in. Right now the relevant server code looks like so:
while(!((*metadata) & SERVER_KILL)){
    // while no kill signal
    bool valid_client = ((*metadata) & CLIENT_REQUEST_VALID) == CLIENT_REQUEST_VALID;
    bool not_already_finished = ((*metadata) & SERVER_RESPONSE_VALID) != SERVER_RESPONSE_VALID;
    if(valid_client && not_already_finished){
        *int2 = sqrt(*int1);
        *metadata = *metadata | SERVER_RESPONSE_VALID;
        // place square root of input in memory, set
        // metadata to indicate value has been found
    }
}
The problem with this is that the while loop takes up too many resources.
Most solutions to this problem assume a multithreaded application, in which case you can use condition variables and mutexes to control the progression of the server process. Since these are single-threaded processes, that solution is not applicable. Is there a lightweight way to wait for these memory locations to change without completely occupying a hardware thread?
You can poll or block... You can also wait for an interrupt, but that would probably also entail some polling or blocking.
Is message-passing on the table? That would allow you to block. Maybe a socket?
You can also send a signal from one process to another; you would write a signal handler in the receiving process.
Note that when a signal handler runs, it preempts the process's thread of execution. In other words, the main thread is paused while the handler runs. So your signal handler shouldn't grab a lock if there's a chance that the lock is already held, as that would create a deadlock. You can avoid this by using a re-entrant lock or one of the special lock types that blocks signals before grabbing the lock. Things that grab locks: mutex.lock (obviously), I/O, allocating memory, condition.signal (much less obvious).
I am writing a wrapper around the C++ API of a programme, which needs to connect to a network. I want my own Connect() function to wait for 2 seconds or less, and continue if no connection is established. What I was thinking of is simply using Sleep(...) and checking again, but this doesn't work:
class MyWrapperClass
{
    IClient* client;

    bool Connect()
    {
        client->Connect();
        int i = 0;
        while (i++ < 20 && !client->IsConnected())
            Sleep(100); /* Sleep for 0.1 s to give client time to connect (DOESN'T HAPPEN) */
        return client->IsConnected();
    }
};
I am assuming that this fails (i.e. no connection is established) because the thread as a whole stops, including the IClient::Connect() method. I have no access to this method, so I cannot verify whether this starts any other threads or anything.
Is there a better way to have a function wait for a short while without blocking anything?
Edit:
To complicate matters consider the following: the programme has to be compiled with /clr as the API demands this (so std::thread cannot be used) AND IClient cannot be an unmanaged class (i.e. IClient^ client = gcnew IClient() is not legal), as the class contains unmanaged stuff. Neither is in my power to alter, as it is demanded by the API.
As others pointed out, you can't wait without blocking; blocking is the entire point of waiting.
I would look carefully at IClient and read any documentation to ensure there is no function that lets you do this asynchronously.
If you have no luck, then you are left with a loop with a sleep in your code. If you can use C++11, then Chris gave a good suggestion. Otherwise you are left with whatever your OS gives you. On a POSIX system (unix) you could try usleep() or nanosleep() to get shorter sleeps than sleep(); see http://linux.die.net/man/3/usleep.
If you want to connect without switching to another thread, you can use system-specific options. For example, calling setsockopt() for the SO_RCVTIMEO option before connect will help you on Linux. You can try to find a way to pass a preconfigured socket to the library in question.
Let's say that I have two libraries (A and B), and each has one function that listen on sockets. These functions use select() and they return some event immediately if the data has arrived, otherwise they wait for some time (timeout) and then return NULL:
A_event_t* A_wait_for_event(int timeout);
B_event_t* B_wait_for_event(int timeout);
Now, I use them in my program:
int main (int argc, char *argv[]) {
// Init A
// Init B
// .. do some other initialization
A_event_t *evA;
B_event_t *evB;
for(;;) {
evA = A_wait_for_event(50);
evB = B_wait_for_event(50);
// do some work based on events
}
}
Each library has its own sockets (e.g. udp socket) and it is not accessible from outside.
PROBLEM: This is not very efficient. If, for example, there are a lot of events waiting to be delivered by B_wait_for_event, they always have to wait until A_wait_for_event times out, which effectively limits the throughput of library B and my program.
Normally, one could use threads to separate the processing, BUT what if processing some event requires calling a function of the other library, and vice versa? Example:
if (evA != 0 && evA == A_EVENT_1) {
    B_do_something();
}
if (evB != 0 && evB == B_EVENT_C) {
    A_do_something();
}
So, even if I could create two threads and separate the functionality of the libraries, those threads would have to exchange events between them (probably through a pipe). This would still limit performance, because one thread would be blocked in the X_wait_for_event() function and would not be able to receive data immediately from the other thread.
How to solve this?
This solution may not be available depending on the libraries you're using, but the best solution is not to call functions in individual libraries that wait for events. Each library should support hooking into an external event loop. Then your application uses a single loop which contains a poll() or select() call that waits on all of the events that all of the libraries you use want to wait for.
glib's event loop is good for this because many libraries already know how to hook into it. But if you don't use something as elaborate as glib, the normal approach is this:
Loop forever:
    Start with an infinite timeout and an empty set of file descriptors
    For each library you use:
        Call a setup function in the library which is allowed to add file
        descriptors to your set and/or shorten (but not lengthen) the timeout.
    Run poll()
    For each library you use:
        Call a dispatch function in the library that responds to any events
        that might have occurred when the poll() returned.
Yes, it's still possible for an earlier library to starve a later library, but it works in practice.
If the libraries you use don't support this kind of setup & dispatch interface, add it as a feature and contribute the code upstream!
(I'm moving this to an answer since it's getting too long for a comment)
If you are in a situation where you're not allowed to call A_do_something in one thread while another thread is executing A_wait_for_event (and similarly for B), then I'm pretty sure you can't do anything efficient, and have to settle between various evils.
The most obvious improvement is to immediately take action upon getting an event, rather than trying to read from both; i.e. order your loop:
Wait for an A event
Maybe do something in B
Wait for a B event
Maybe do something in A
Other mitigations you could do are
Try to predict whether an A event or a B event is more likely to come next, and wait on that first. (e.g. if they come in streaks, then after getting and handling an A event, you should go back to waiting for another A event)
Fiddle with the timeout values to strike a balance between spin loops and too much blocking. (maybe even adjust dynamically)
EDIT: You might check the APIs for your library; they might already offer a way to deal with the problem. For example, they might allow you to register callbacks for events, and get notifications of events through the callback, rather than polling wait_for_event.
Another thing is if you can create new file descriptors for the library to listen on. e.g. If you create a new pipe and hand one end to library A, then if thread #1 is waiting for an A event, thread #2 can write to the pipe to make an event happen, thus forcing #1 out of wait_for_event. With the ability to kick threads out of the wait_for_event functions at will, all sorts of new options become available.
A possible solution is to use two threads that call wait_for_event, plus a boost::condition_variable in the "main" thread which "does something". A similar, though not identical, solution is here
I have a list of HANDLE's, controlled by a lot of different IO devices. What would be the (performance) difference between:
A call to WaitForMultipleObjects on all these handles
async_read on boost::windows::basic_handle's around all these handles
Is WaitForMultipleObjects O(n) time complex with n the amount of handles?
You can somehow call async_read on a windows::basic_handle right? Or is that assumption wrong?
If I call run on the same IO device in multiple threads, will the handling-calls be balanced between those threads? That would be a major benefit of using asio.
Since it sounds like the main benefit you would derive from asio is the fact that it is built on top of I/O completion ports (iocp for short), let's start with comparing iocp with WaitForMultipleObjects(). These two approaches are essentially the same as select vs. epoll on Linux.
The main drawback of WaitForMultipleObjects that was solved by iocp is the inability to scale with many file descriptors. It is O(n), since for each event you receive you pass in the full array again, and internally WaitForMultipleObjects must scan the array to know which handles to trigger on.
However, this is rarely a problem because of the second drawback. WaitForMultipleObjects() has a limit on the max number of handles it can wait on (MAXIMUM_WAIT_OBJECTS). This limit is 64 objects (see winnt.h). There are ways around this limit by creating Event objects and tying multiple sockets to each event, and then wait on 64 events.
The third drawback is that there's actually a subtle "bug" in WaitForMultipleObjects(). It returns the index of the handle which triggered an event. This means it can only communicate a single event back to the user. This is different from select, which will return all file descriptors that triggered an event. WaitForMultipleObjects scans the handles passed in to it and return the first handle that has its event raised.
This means that if you are waiting on 10 very active sockets, all of which have an event on them most of the time, there will be a very heavy bias toward servicing the first socket in the list passed in to WaitForMultipleObjects. This can be circumvented by, every time the function returns and the event has been serviced, running it again with a timeout of 0, but this time passing in only the part of the array one past the handle that triggered. Repeat until all handles have been visited, then go back to the original call with all handles and an actual timeout.
iocp solves all of these problems, and also introduces an interface for a more generic event notification, which is quite nice.
With iocp (and hence asio):
you don't repeat which handles you're interested in, you tell windows once, and it remembers it. This means it scales a lot better with many handles.
You don't have a limit of the number of handles you can wait on
You get every event, i.e. there's no bias towards any specific handle
I'm not sure about your assumption of using async_read on a custom handle. You might have to test that. If your handle refers to a socket, I would imagine it would work.
As for the threading question; yes. If you run() the io_service in multiple threads, events are dispatched to a free thread, and will scale with more threads. This is a feature of iocp, which even has a thread pool API.
In short: I believe asio or iocp would provide better performance than simply using WaitForMultipleObjects, but whether or not that performance will benefit you mostly depends on how many handles you have and how active they are.
Both WaitForSingleObject and WaitForMultipleObjects are widely used functions. WaitForSingleObject waits on a single thread-synchronization object; it returns when the object becomes signaled or the timeout interval elapses. If the interval is INFINITE, it waits indefinitely.
DWORD WaitForSingleObject(
    HANDLE hHandle,
    DWORD  dwMilliseconds
);
WaitForMultipleObjects is used to wait for multiple objects to become signaled. For a semaphore, the object becomes non-signaled when its counter drops to zero. An auto-reset event and a mutex become non-signaled when a wait releases the object; a manual-reset event's state is not changed by the wait functions.
DWORD WaitForMultipleObjects(
    DWORD        nCount,
    const HANDLE *lpHandles,
    BOOL         bWaitAll,
    DWORD        dwMilliseconds
);
If dwMilliseconds is zero, the function does not enter a wait state if the object is not signaled; it always returns immediately. If dwMilliseconds is INFINITE, the function will return only when the object is signaled.