IPC connection with thread on Windows using Mutex - c++

I have a question about Windows IPC. I implemented IPC with a mutex on Windows, but there is a problem when the connection is made from another thread: when that thread terminates, the connection is closed.
The connection thread (A) makes the connection to the server
The main thread (B) uses the connection handle (a global variable) returned by A
A terminates
B can no longer use the handle - because the connection is closed
It is natural that a mutex is released when the process terminates. In the case of a thread, however, I need a way to keep holding the mutex and maintain the connection even after the thread terminates, as long as the process is alive.
A semaphore could be the alternative on Linux; on Windows, however, a semaphore cannot be used because it cannot detect an abnormal disconnection.
Does anyone have any ideas?

There is no way to prevent the ownership of a mutex from being released when the thread that owns it exits.
There are a number of other ways you might be able to fix the problem, depending on the circumstances.
1) Can you change any of the code on the client? For example, if the client executable is using a DLL that you have provided to establish and maintain the connection, you could change the DLL so that it uses a more appropriate object (such as a named pipe) rather than a mutex, or you could get the DLL to start its own thread to own the mutex.
2) Is there more than one client? Presumably, since you are using a mutex, you are only expecting one client to connect at a time. If you can safely assume that only one client will be connected at a time, then when the server detects that the mutex has been abandoned, it could close its own handle to the mutex. When the client process exits, the mutex will automatically be deleted, so the server could periodically check to see whether it still exists or not.
3) How is the client communicating with the server? The server is presumably doing something useful for the client, so there must be another communications channel as well as the mutex. For example, if the client is opening a named pipe to the server, you could use that connection instead of the mutex to detect when the client process exits. Or, if the communications channel allows you to determine the process ID of the client, you could open a handle to the process and use that to detect when the client process exits.
4) If no other solution will work, and you are forced to rewrite the client as well as the server, consider using a more appropriate form of IPC, such as a named pipe.
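To make option 4 concrete, here is a minimal server-side sketch using a named pipe. The pipe name, buffer sizes and message handling are placeholders, not anything from the question; the point is that ReadFile fails with ERROR_BROKEN_PIPE as soon as the client closes its end or the client process dies, so the server notices the disconnection without needing a mutex at all.

    // Sketch only: pipe name and sizes are made up for illustration.
    #include <windows.h>
    #include <cstdio>

    int main() {
        HANDLE pipe = CreateNamedPipeA(
            "\\\\.\\pipe\\myserver",          // hypothetical pipe name
            PIPE_ACCESS_DUPLEX,
            PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
            1, 4096, 4096, 0, nullptr);
        if (pipe == INVALID_HANDLE_VALUE) return 1;

        if (ConnectNamedPipe(pipe, nullptr) || GetLastError() == ERROR_PIPE_CONNECTED) {
            char buf[4096];
            DWORD got = 0;
            // ReadFile fails with ERROR_BROKEN_PIPE when the client disconnects
            // or its process exits, so no separate liveness object is needed.
            while (ReadFile(pipe, buf, sizeof buf, &got, nullptr)) {
                // handle the request in buf[0..got) here
            }
            std::printf("client gone, error %lu\n", GetLastError());
        }
        DisconnectNamedPipe(pipe);
        CloseHandle(pipe);
        return 0;
    }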
Additional
5) It is common practice to use a process handle to wait for (or test for) process termination. Most often, these handles are the ones generated for the parent when a process is created, but there is no reason not to use a handle generated by OpenProcess (see the sketch after point 7 below). As far as precedent goes, I assure you there is at least as much precedent for using a handle generated by OpenProcess to monitor a client process as there is for using a mutex; it is entirely possible that you are the first person to ever try to use a Windows mutex to detect that a process has exited. :-)
6) Presumably the SQLDisconnect() function is calling ReleaseMutex in order to disconnect from the server. Since it is doing so from a thread that doesn't own the mutex, that won't do anything except return an error code, so there's no reasonable way for your server to detect that happening. Does the function also call CloseHandle on the mutex? If so, you could use the approach in (2) to detect when this happens. This would work both for calls to SQLDisconnect() and when the process exits. It shouldn't matter that there are multiple clients, since they are using different mutexes.
6a) I say "no reasonable way" because you could conceivably use hooking to change the behaviour of ReleaseMutex. This would not be a good option.
7) You should examine carefully what the SQLDisconnect() function does apart from calling ReleaseMutex and/or CloseHandle. It is entirely possible that you can detect the disconnection by some means other than the mutex.
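As a rough illustration of points 3 and 5, assuming the client's process ID has been obtained over the existing communications channel (how you get it depends on your protocol), the server can monitor the client like this:

    #include <windows.h>

    // clientPid is assumed to have been sent by the client over the existing
    // communications channel; it is a placeholder here.
    void WatchClient(DWORD clientPid) {
        HANDLE proc = OpenProcess(SYNCHRONIZE, FALSE, clientPid);
        if (!proc) return;                    // client already gone, or access denied

        // Blocks until the client process terminates for any reason, including
        // a crash; run this on a worker thread or use a timeout as appropriate.
        WaitForSingleObject(proc, INFINITE);

        // ...tear down the per-client server state here...
        CloseHandle(proc);
    }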

Related

Boost::interprocess: scoped_lock seems impossible to be acquired after crash [duplicate]

My scenario: one server and some clients (though not many). The server can only respond to one client at a time, so they must be queued up. I'm using a mutex (boost::interprocess::interprocess_mutex) to do this, wrapped in a boost::interprocess::scoped_lock.
The thing is, if one client dies unexpectedly (i.e. no destructor runs) while holding the mutex, the other clients are in trouble, because they are waiting on that mutex. I've considered using a timed wait, so if a client waits for, say, 20 seconds and doesn't get the mutex, it goes ahead and talks to the server anyway.
Problems with this approach: 1) it does this every time. If it's in a loop, talking constantly to the server, it needs to wait for the timeout every single time. 2) If there are three clients, and one of them dies while holding the mutex, the other two will just wait 20 seconds and talk to the server at the same time - exactly what I was trying to avoid.
So, how can I say to a client, "hey there, it seems this mutex has been abandoned, take ownership of it"?
Unfortunately, this isn't supported by the boost::interprocess API as-is. There are a few ways you could implement it however:
If you are on a POSIX platform with support for pthread_mutexattr_setrobust_np, edit boost/interprocess/sync/posix/thread_helpers.hpp and boost/interprocess/sync/posix/interprocess_mutex.hpp to use robust mutexes, and to handle somehow the EOWNERDEAD return from pthread_mutex_lock.
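For reference, this is roughly the behaviour such an edit would need to expose, shown here with the plain POSIX API (using the modern names without the _np suffix); the shared-memory name is a placeholder:

    #include <pthread.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cerrno>

    int main() {
        // Put the mutex in shared memory so several processes can reach it.
        int fd = shm_open("/demo_lock", O_CREAT | O_RDWR, 0600);  // placeholder name
        ftruncate(fd, sizeof(pthread_mutex_t));
        auto* mtx = static_cast<pthread_mutex_t*>(
            mmap(nullptr, sizeof(pthread_mutex_t),
                 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

        // Only one process should perform the initialisation.
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        pthread_mutex_init(mtx, &attr);

        int rc = pthread_mutex_lock(mtx);
        if (rc == EOWNERDEAD) {
            // The previous owner died while holding the lock: repair the
            // protected data, then mark the mutex as usable again.
            pthread_mutex_consistent(mtx);
        }
        // ...critical section...
        pthread_mutex_unlock(mtx);
        return 0;
    }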
If you are on some other platform, you could edit boost/interprocess/sync/emulation/interprocess_mutex.hpp to use a generation counter, with the locked flag in the lower bit. Then you can create a reclaim protocol that will set a flag in the lock word to indicate a pending reclaim, then do a compare-and-swap after a timeout to check that the same generation is still in the lock word, and if so replace it with a locked next-generation value.
If you're on Windows, another good option would be to use native mutex objects; they'll likely be more efficient than busy-waiting anyway.
You may also want to reconsider the use of a shared-memory protocol - why not use a network protocol instead?

`io_context.stop()` vs `socket.close()`

To close a Tcp client, which one should be used, io_context.stop() or socket.close()? What aspects should be considered when making such a choice?
As far as I know, io_context is thread-safe whereas socket is not.
So, I can invoke io_context.stop() in any thread which may be different from the one that has called io_context.run().
But for socket.close(), I need to call io_context.post([=](){ socket.close(); }) if the socket object is used in a different thread (e.g. the said thread calls asio::async_read(socket, ...)).
To close a Tcp client, which one should be used, io_context.stop() or socket.close()?
Obviously socket.cancel() and/or socket.shutdown() :)
Stopping the entire execution context might seem equivalent in the case of only a single IO object (your socket). But as soon as you have multiple sockets open or use timers and signal_sets, it becomes obvious why that is shooting a fly with a cannon.
Also note that io_context::stop has the side effect of abandoning any outstanding work (at the very least, run() cannot resume without a reset() first), which makes it even more of a blunt weapon.
Instead, use socket::cancel() to cancel any IO operation on it. They will complete with error::operation_aborted so you can detect the situation. This is enough if you control all the async initiations on the object. If you want to prevent "other" parties from starting new IO operations successfully you can shutdown the socket instead. You can shutdown the writing side, reading side or both of a socket.
The reason why shutdown is often superior to close() can be quite subtle. On the one hand, shutting down one side makes it so that you can still handle/notify the other side for a graceful shutdown. On the other hand, it prevents a pretty common race condition when the native socket handle is (also) being stored somewhere: closing the socket makes the native handle eligible for re-use, and a client that is unaware of the change could at a later time continue to use that handle, unaware that it now belongs to someone else. I have seen bugs in production code where, under high load, RPC calls would suddenly be written to the database server due to this kind of thing.
In short, it is best to tie the socket handle to the lifetime of the socket instance, and prefer to use cancel() or shutdown().
I need to call io_context.post([=](){ socket.close(); }) if the socket object is used in a different thread (e.g. the said thread calls asio::async_read(socket, ...)).
Yes, thread-safety is your responsibility. And no, post(io_context, ...) is not even enough when multiple threads are running the execution context. In that case you need more synchronization, like post(strand_, ...). See Why do I need strand per connection when using boost::asio?
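As a rough sketch of the above (the Connection class, socket_ and strand_ are assumed names, not from the question), closing down a connection from another thread could look like this:

    #include <boost/asio.hpp>
    #include <memory>

    // Sketch only: assumes every async operation on socket_ is initiated
    // through strand_, so posting through the same strand serialises access.
    struct Connection : std::enable_shared_from_this<Connection> {
        boost::asio::strand<boost::asio::io_context::executor_type> strand_;
        boost::asio::ip::tcp::socket socket_;

        explicit Connection(boost::asio::io_context& io)
            : strand_(boost::asio::make_strand(io)), socket_(strand_) {}

        void request_close() {
            boost::asio::post(strand_, [self = shared_from_this()] {
                boost::system::error_code ec;
                // Outstanding async_read/async_write calls complete with
                // error::operation_aborted rather than lingering.
                self->socket_.cancel(ec);
                // Optionally also stop further traffic while keeping the handle alive.
                self->socket_.shutdown(boost::asio::ip::tcp::socket::shutdown_both, ec);
            });
        }
    };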

How do I recover when a synchronous call to socket send() gets blocked due to the loss of the other end of the connection?

When my socket connection is terminated normally, then it works fine. But there are cases where the normal termination does not occur and the remote side of the connection simply disappears. When this happens, the sending task gets stuck in send() because the other side has stopped ack'ing the data. My application has a ping request/response going on and so, in another thread, it recognizes that the connection is dead. The question is: what should this other thread do in order to bring the connection to a safe termination? Should it call close()? I see SIGPIPE thrown around when this happens and I just want to make sure I am closing the connection in a safe way. I just don't want it to crash... I don't care about the leftover data. I am using a C++ library that is using synchronous sockets, so moving to async is not an easy option for me.
I avoid this problem by setting SIGPIPE to be ignored, and setting all my sockets to non-blocking I/O mode. Once a socket is in non-blocking mode, it will never block inside of send() or recv() -- rather, in any situation where it would normally block, it will instead immediately return -1 and set errno to EWOULDBLOCK. Therefore I can never "lose control" of the thread due to bad network conditions.
Of course if you never block, how do you keep your event loop from spinning and using up 100% of a core all the time? The answer is that you can block waiting for I/O inside of a separate call that is designed to do just that, e.g. select() or poll() or similar. These functions are designed to block until any one of a number of sockets becomes ready-to-read (or optionally ready-for-write) or until a pre-specified amount of time elapses, whichever comes first. So by using these, you can have your thread wake up when it needs to wake up and also sleep when there's nothing to do.
Anyway, once you have that (and you've made sure that your code handles short reads, short writes, and -1/EWOULDBLOCK gracefully, as those happen more often in non-blocking mode), you are free to implement your dead-network-detector in any of several ways. You could implement it within your network I/O thread, by keeping track of how long it has been since any data was last sent or received, and by using the timeout argument to select() to cause the blocking function to wake up at the appropriate times based on that. Or you could still use a second thread, but now the second thread has a safe way to wake up the first thread: by calling pipe() or socketpair() you can create a pair of connected file descriptors, and your network I/O thread can select()/poll() on the receiving file descriptor while the other thread holds the sending file descriptor. Then when the other thread wants to wake up the I/O thread, it can send a byte on its file descriptor, or just close() it; either one will cause the network I/O thread to return from select() or poll() and find out that something has happened on its receiving-file-descriptor, which gives it the opportunity to react by exiting (or taking whatever action is appropriate).
I use this technique in almost all of my network programming, and I find it works very well to achieve network behavior that is both reliable and CPU-efficient.
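A bare-bones sketch of the wake-up pipe described above: the I/O thread polls both the socket and the read end of the pipe, and the ping/monitor thread simply writes one byte (or closes its end) to make the I/O thread return from poll():

    #include <poll.h>
    #include <unistd.h>

    // wake_fds is created once with pipe(wake_fds); the I/O thread watches
    // wake_fds[0], and any other thread writes to wake_fds[1] to wake it up.
    int wake_fds[2];

    void io_loop(int sock_fd) {
        for (;;) {
            pollfd fds[2] = {
                { sock_fd,     POLLIN, 0 },   // add POLLOUT only when data is queued
                { wake_fds[0], POLLIN, 0 },
            };
            if (poll(fds, 2, -1) < 0) break;
            if (fds[1].revents) break;        // the monitor thread asked us to stop
            // ...do non-blocking send()/recv() on sock_fd as usual...
        }
        close(sock_fd);                       // safe: only this thread touches the socket
    }

    // To wake the I/O thread from elsewhere: write(wake_fds[1], "x", 1);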
I had a lot of SIGPIPEs in my application. Those are not really important: they just tell you that a pipe (here a socket) is no longer available.
I then do, in my main function:
signal(SIGPIPE, SIG_IGN);
Another option is to use MSG_NOSIGNAL flag for send, e.g. send(..., MSG_NOSIGNAL);. In that case SIGPIPE is not sent, the call returns -1 and errno == EPIPE.
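A short sketch of that variant (note that MSG_NOSIGNAL is not available on every platform; macOS uses the SO_NOSIGPIPE socket option instead):

    #include <sys/socket.h>
    #include <cerrno>
    #include <cstddef>

    // Returns false once the peer has gone away; no SIGPIPE is raised.
    bool send_all(int fd, const char* data, size_t len) {
        while (len > 0) {
            ssize_t n = send(fd, data, len, MSG_NOSIGNAL);
            if (n < 0) {
                if (errno == EINTR) continue;
                return false;                 // EPIPE, ECONNRESET, ...
            }
            data += n;
            len  -= static_cast<size_t>(n);
        }
        return true;
    }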

SDL_net 2.0 multithreading

Is it safe to call SDL_net functions on another thread (other than the main thread)? And are there any rules about it? I could not find any information about it when I searched for it.
Yes, it is safe. In fact, some operations should be done in a separate thread.
I looked into the TCP part of SDL_net. In particular, any call to
SDLNet_ResolveHost, if it has to resolve the DNS query over a remote host
SDLNet_TCP_Open that connects to a remote host and doesn't just establish a listening socket
SDLNet_TCP_Recv, if and only if there aren't any pending bytes on the TCP stream
SDLNet_TCP_Send
must be done on a separate thread if you want to avoid blocking the render thread, missed timings, and windows that stop responding.
However, you should avoid having two or more threads meddle with the same socket at the same time. Make sure the threads communicate with each other properly to avoid bugs caused by concurrency. Use mutexes, locks, etc. to ensure that.
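For illustration, a rough sketch of pushing the blocking calls onto a worker thread (the host name, port, and the hand-off to the main thread are placeholders):

    #include <SDL_net.h>
    #include <thread>

    // Sketch only: connect and receive on a worker thread so the main/render
    // thread never blocks on DNS, connect, or recv.
    void network_thread() {
        IPaddress ip;
        if (SDLNet_ResolveHost(&ip, "example.org", 9999) == -1) return;  // may block on DNS
        TCPsocket sock = SDLNet_TCP_Open(&ip);                           // may block while connecting
        if (!sock) return;

        char buf[512];
        for (;;) {
            int n = SDLNet_TCP_Recv(sock, buf, sizeof buf);              // blocks until data or error
            if (n <= 0) break;
            // push buf[0..n) onto a thread-safe queue for the main thread here
        }
        SDLNet_TCP_Close(sock);
    }

    // in main, after SDLNet_Init(): std::thread net(network_thread); ... net.join();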

Waiting on a condition (pthread_cond_wait) and a socket change (select) simultaneously

I'm writing a POSIX-compatible multi-threaded server in C/C++ that must be able to accept, read from, and write to a large number of connections asynchronously. The server has several worker threads which perform tasks and occasionally (and unpredictably) queue data to be written to the sockets. Data is also occasionally (and unpredictably) written to the sockets by the clients, so the server must also read asynchronously. One obvious way of doing this is to give each connection a thread which reads and writes from/to its socket; this is ugly, though, since each connection may persist for a long time and the server would thus have to hold hundreds or thousands of threads just to keep track of connections.
A better approach would be to have a single thread that handled all communications using the select()/pselect() functions. I.e., a single thread waits on any socket to be readable, then spawns a job to process the input that will be handled by a pool of other threads whenever input is available. Whenever the other worker threads produce output for a connection, it gets queued, and the communication thread waits for that socket to be writable before writing it.
The problem with this is that the communication thread may be waiting in the select() or pselect() function when output is queued by the worker threads of the server. It's possible that, if no input arrives for several seconds or minutes, a queued chunk of output will just wait for the communication thread to be done select()ing. This shouldn't happen, however--data should be written as soon as possible.
Right now I see a couple of solutions to this that are thread-safe. One is to have the communication thread busy-wait on input and update the list of sockets it waits on for writing every tenth of a second or so. This isn't optimal since it involves busy-waiting, but it will work. Another option is to use pselect() and send the USR1 signal (or something equivalent) whenever new output has been queued, allowing the communication thread to update the list of sockets it is waiting on for writable status immediately. I prefer the latter here, but still dislike using a signal for something that should be a condition (pthread_cond_t). Yet another option would be to include, in the list of file descriptors on which select() is waiting, a dummy file that we write a single byte to whenever a socket needs to be added to the writable fd_set for select(); this would wake up the communications server because that particular dummy file would then be readable, thus allowing the communications thread to immediately update its writable fd_set.
Intuitively, I feel that the second approach (with the signal) is the 'most correct' way to program the server, but I'm curious whether anyone knows which of the above is the most efficient, generally speaking, whether any of them will cause race conditions that I'm not aware of, or whether anyone knows of a more general solution to this problem. What I really want is a pthread_cond_wait_and_select() function that allows the comm thread to wait on both a change in sockets and a signal from a condition.
Thanks in advance.
This is a fairly common problem.
One often used solution is to have pipes as a communication mechanism from worker threads back to the I/O thread. Having completed its task a worker thread writes the pointer to the result into the pipe. The I/O thread waits on the read end of the pipe along with other sockets and file descriptors and once the pipe is ready for read it wakes up, retrieves the pointer to the result and proceeds with pushing the result into the client connection in non-blocking mode.
Note that, since pipe writes of PIPE_BUF bytes or fewer are atomic, the pointers get written and read in one shot. You can even have multiple worker threads writing pointers into the same pipe because of this atomicity guarantee.
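A rough sketch of that hand-off (the Result type and the surrounding loop are placeholders); each write is sizeof(void*) bytes, far below PIPE_BUF, so writes from different workers cannot interleave:

    #include <sys/select.h>
    #include <unistd.h>
    #include <algorithm>

    struct Result;                 // whatever the workers produce (placeholder)
    int result_pipe[2];            // created once with pipe(result_pipe)

    // Worker thread: publish a finished result to the I/O thread.
    void publish(Result* r) {
        write(result_pipe[1], &r, sizeof r);      // atomic: sizeof r <= PIPE_BUF
    }

    // I/O thread: include the read end of the pipe in the select() set.
    void io_iteration(fd_set& readfds, int maxfd) {
        FD_SET(result_pipe[0], &readfds);
        select(std::max(maxfd, result_pipe[0]) + 1, &readfds, nullptr, nullptr, nullptr);
        if (FD_ISSET(result_pipe[0], &readfds)) {
            Result* r = nullptr;
            read(result_pipe[0], &r, sizeof r);   // retrieve the pointer in one shot
            // queue r's data for a non-blocking write to the right client here
        }
        // ...then service whichever client sockets are ready...
    }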
Unfortunately, the best way to do this is different for each platform. The canonical, portable way to do it is to have your I/O thread block in poll. If you need to get the I/O thread to leave poll, you send a single byte on a pipe that the thread is polling. That will cause the thread to exit from poll immediately.
On Linux, epoll is the best way. On BSD-derived operating systems (including OSX, I think), kqueue. On Solaris, it used to be /dev/poll and there's something else now whose name I forget.
You may just want to consider using a library like libevent or Boost.Asio. They give you the best I/O model on each platform they support.
Your second approach is the cleaner way to go. It's totally normal to have things like select or epoll include custom events in your list. This is what we do on my current project to handle such events. We also use timers (on Linux timerfd_create) for periodic events.
On Linux, eventfd lets you create such arbitrary user events for this purpose, so I'd say it is quite accepted practice. If you are limited to POSIX-only functions, a pipe or socketpair will do the same job; I've seen those used as well.
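A small eventfd sketch for the Linux case (the socket handling around it is omitted): the worker bumps the counter, the poller sees POLLIN on the eventfd and refreshes its interest set.

    #include <sys/eventfd.h>
    #include <poll.h>
    #include <unistd.h>
    #include <cstdint>

    int efd = eventfd(0, 0);              // create once at startup

    // Worker thread: signal the I/O thread that new output has been queued.
    void notify() {
        uint64_t one = 1;
        write(efd, &one, sizeof one);     // bumps the counter and wakes the poller
    }

    // I/O thread: wait on the client socket plus the eventfd.
    void wait_once(int client_fd) {
        pollfd fds[2] = {
            { client_fd, POLLIN, 0 },
            { efd,       POLLIN, 0 },
        };
        poll(fds, 2, -1);
        if (fds[1].revents & POLLIN) {
            uint64_t count;
            read(efd, &count, sizeof count);   // drain the counter
            // rebuild the writable interest set from the output queues here
        }
        // ...service client_fd if it is readable...
    }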
Busy-polling is not a good option. First, you'll be scanning memory that is being used by other threads, causing CPU memory contention. Secondly, you'll always have to return to your select call, which creates a huge number of system calls and context switches and hurts overall system performance.