My scenario: one server and some clients (though not many). The server can only respond to one client at a time, so they must be queued up. I'm using a mutex (boost::interprocess::interprocess_mutex) to do this, wrapped in a boost::interprocess::scoped_lock.
The thing is, if one client dies unexpectedly (i.e. no destructor runs) while holding the mutex, the other clients are in trouble, because they are waiting on that mutex. I've considered using a timed wait, so if a client waits for, say, 20 seconds and doesn't get the mutex, it goes ahead and talks to the server anyway.
Problems with this approach: 1) it does this every time. If it's in a loop, talking constantly to the server, it needs to wait for the timeout every single time. 2) If there are three clients, and one of them dies while holding the mutex, the other two will each just wait 20 seconds and then talk to the server at the same time - exactly what I was trying to avoid.
So, how can I say to a client, "hey there, it seems this mutex has been abandoned, take ownership of it"?
Unfortunately, this isn't supported by the boost::interprocess API as-is. There are a few ways you could implement it however:
If you are on a POSIX platform with support for pthread_mutexattr_setrobust_np, edit boost/interprocess/sync/posix/thread_helpers.hpp and boost/interprocess/sync/posix/interprocess_mutex.hpp to use robust mutexes, and to somehow handle the EOWNERDEAD return from pthread_mutex_lock.
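For reference, a rough sketch of the underlying robust-mutex pattern in raw pthreads (not the actual Boost patch; pthread_mutexattr_setrobust is the POSIX.1-2008 spelling of the older _np call):

#include <cerrno>
#include <pthread.h>

// Sketch: a process-shared robust mutex (e.g. placed in shm) that can be
// recovered if its owner dies while holding it.
void init_robust(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}

int lock_robust(pthread_mutex_t* m) {
    int rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        // The previous owner died holding the mutex; the data it guarded
        // may be inconsistent. Repair it, then mark the mutex usable again.
        pthread_mutex_consistent(m);
        rc = 0;
    }
    return rc;
}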
If you are on some other platform, you could edit boost/interprocess/sync/emulation/interprocess_mutex.hpp to use a generation counter, with the locked flag in the lower bit. Then you can create a reclaim protocol that will set a flag in the lock word to indicate a pending reclaim, then do a compare-and-swap after a timeout to check that the same generation is still in the lock word, and if so replace it with a locked next-generation value.
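A rough user-space sketch of that lock word (all names invented; a real version would live in shared memory, back off instead of spinning, and be wired into Boost's emulation layer):

#include <atomic>
#include <cstdint>

// Lock word layout: bit 0 = locked, bit 1 = reclaim pending,
// bits 2+ = generation counter.
struct GenLock {
    std::atomic<std::uint32_t> word{0};
    static constexpr std::uint32_t LOCKED  = 1u;
    static constexpr std::uint32_t RECLAIM = 2u;
    static constexpr std::uint32_t GEN_INC = 4u;

    bool try_lock() {
        std::uint32_t w = word.load(std::memory_order_relaxed);
        if (w & LOCKED) return false;
        return word.compare_exchange_strong(w, (w & ~RECLAIM) | LOCKED,
                                            std::memory_order_acquire);
    }

    void unlock() {
        // Clear the flag bits and bump the generation in one store.
        std::uint32_t w = word.load(std::memory_order_relaxed);
        word.store((w & ~(LOCKED | RECLAIM)) + GEN_INC,
                   std::memory_order_release);
    }

    // Step 1 of the reclaim: mark the word and remember what we saw.
    std::uint32_t mark_reclaim() {
        return word.fetch_or(RECLAIM, std::memory_order_relaxed) | RECLAIM;
    }

    // Step 2, after the timeout: if the word is unchanged, the owner made no
    // progress, so steal the lock at a fresh generation.
    bool try_reclaim(std::uint32_t observed) {
        std::uint32_t next = ((observed & ~RECLAIM) + GEN_INC) | LOCKED;
        return word.compare_exchange_strong(observed, next,
                                            std::memory_order_acquire);
    }
};

A waiting client would call mark_reclaim(), sleep out its timeout, then call try_reclaim(); if the compare-and-swap succeeds, the mutex was abandoned and the caller now owns it at the next generation.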
If you're on Windows, another good option would be to use native mutex objects; they'll likely be more efficient than busy-waiting anyway.
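Native Windows mutexes report abandonment directly: if the owner dies, the next wait returns WAIT_ABANDONED and the waiter becomes the owner. A minimal sketch (the mutex name is invented):

#include <windows.h>

bool lock_named_mutex(HANDLE* out) {
    HANDLE h = CreateMutexA(nullptr, FALSE, "Global\\MyServerQueueMutex");
    if (!h) return false;
    DWORD rc = WaitForSingleObject(h, INFINITE);
    if (rc == WAIT_ABANDONED) {
        // The previous owner died while holding the mutex; we own it now,
        // but any state it guarded may need repair.
    }
    *out = h; // later: ReleaseMutex(h); CloseHandle(h);
    return rc == WAIT_OBJECT_0 || rc == WAIT_ABANDONED;
}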
You may also want to reconsider the use of a shared-memory protocol - why not use a network protocol instead?
I'm working with a boost::statechart::state_machine and I experienced a crash in the machine. Upon investigating the core dump, I realized that it happened because multiple threads processed an event at around the same time: one of them called terminate, and the other crashed because it tried to use a terminated object.
I therefore need to know what my options are for making my state machine thread-safe. Boost's statechart documentation explicitly says that statechart::state_machine is not thread-safe, and indicates that thread safety can be accomplished with asynchronous_state_machine. But asynchronous_state_machine looks like it solves more problems than just thread safety, and converting from state_machine to asynchronous_state_machine looks non-trivial. Can I achieve a thread-safe implementation by simply locking around my calls to process_event?
As an alternative to mutexes, semaphores, or locks, you might consider a monitor.
The state machine can possibly be just as you have it now.
There are several kinds I know of; I have (though not so recently) used a Hoare monitor for a state machine of my own design (not Boost).
From Wikipedia: "In concurrent programming, a monitor is a synchronization construct that allows threads to have both mutual exclusion and the ability to wait (block) for a certain condition to become true."
My implementation of a Hoare monitor transformed any event (input to my state machine) into an IPC message to the monitor thread. Only the monitor thread modifies the state machine. The machine (and all its states) is private data of the class containing the monitor thread and its methods.
Some updates must be synchronous, that is, the requesting thread suspends until it receives an IPC response. Some updates can be asynchronous, so the requesting thread need not wait. While processing one thread's request, the monitor ignores the other threads' requests; they simply queue until the monitor can get to them.
Since only 1 thread is allowed to directly modify the (private data attribute) state machine, no other mutex schemes are needed.
That effort was for a telecommunications device, and the events were mostly from human action, therefore not time-critical.
The state machine can possibly be just as you have it now. You only need to implement the monitor thread, decide on an IPC (or maybe inter-thread communication) mechanism, and ensure that only that one thread has access to the state machine.
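A minimal sketch of such a monitor (all names invented; here a condition-variable-guarded queue stands in for the IPC channel, and only the monitor thread ever touches the state machine):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

struct Event { int id; };                 // placeholder for your event type

class Monitor {
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Event> queue_;
    // StateMachine sm_;                  // private: monitor thread only
    std::thread worker_{[this] { run(); }};

    void run() {
        for (;;) {                        // shutdown/join omitted for brevity
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return !queue_.empty(); });
            Event e = queue_.front();
            queue_.pop();
            lk.unlock();
            // sm_.process_event(e);      // single-threaded: no further locking
        }
    }

public:
    // Asynchronous update: the caller enqueues and returns immediately.
    // A synchronous update would also carry a reply channel to block on.
    void post(Event e) {
        { std::lock_guard<std::mutex> lk(m_); queue_.push(e); }
        cv_.notify_one();
    }
};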
I have 2 processes called Writer and Reader running on the same machine. Writer is a single thread and writes data to shared memory. Reader has 8 threads that intend to read data from the shared memory concurrently. I need a locking mechanism that meets the following criteria:
1) At a time, either Writer or Reader is allowed to access the shared memory.
2) If Reader has permission to read data from the shared memory, all its own threads can read data.
3) Writer has to wait until Reader "completely" releases the lock (because it has multiple threads).
I have read a lot about sharable mutexes, which seem to be the solution. Here I describe my system in more detail:
1) System should run on both Windows & Linux.
2) I divide the shared memory into two regions: locks & data. The data region is further divided into 100 blocks. I intend to create 100 "lock objects" (sharable mutexes) and lay them in the locks region. These lock objects are used for synchronization of the 100 data blocks, one lock object per data block.
3) Writer and Readers first determine which block they would like to access, then try to acquire the appropriate lock. Once the lock is acquired, they operate on the data block.
My concern now is:
Is there any "built-in" way to lay the lock objects in shared memory on Windows and Linux (CentOS) so that I can lock/unlock them without using the Boost library?
[Edited Feb 25, 2016, 09:30 GMT]
I can suggest a few things. It really depends on the requirements.
If it seems like the Boost upgradable mutex fits the bill, then by all means, use it. From a five-minute read it seems you can use them in shm. I have no experience with it, as I don't use Boost. Boost is available on Windows and Linux, so I don't see why not to use it. You can always grab the specific code you like and bring it into your project without dragging the entire behemoth along.
Anyway, isn't it fairly easy to test and see whether it's good enough?
I don't understand the requirement for placing locks in shm. If it's not a real requirement, and you want to use OS-native objects, you can use a different mechanism per OS: say, a named mutex on Windows (not in shm), and a pthread_rwlock, in shm, on Linux.
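For the Linux half, placing a process-shared pthread_rwlock in shm looks roughly like this (a sketch: error handling is omitted, /my_locks is an invented name, and only one process should do the init):

#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

pthread_rwlock_t* create_shared_rwlock() {
    int fd = shm_open("/my_locks", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(pthread_rwlock_t));
    void* p = mmap(nullptr, sizeof(pthread_rwlock_t),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);

    auto* rw = static_cast<pthread_rwlock_t*>(p);
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_rwlock_init(rw, &attr);
    pthread_rwlockattr_destroy(&attr);
    return rw;
}
// Readers: pthread_rwlock_rdlock(rw); writer: pthread_rwlock_wrlock(rw);
// both release with pthread_rwlock_unlock(rw).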
I know what I would prefer to use: a seqlock.
I work in the low-latency domain, so I'm picking what gets me the lowest possible latency. I measure it in CPU cycles.
Since you mention that you want a lock per object, rather than one big lock, I assume performance is important.
There are important questions here, though:
1) Since it's in shm, I assume the data is POD (flat)? If not, you can switch to a read/write spinlock.
2) Are you OK with spinning (busy-waiting), or do you want to sleep-wait? Seqlocks and spinlocks are not OS mechanisms, so there's nobody to put your waiting threads to sleep. If you do want to sleep-wait, read point 3.
3) If you care to know whether the other side (reader/writer) died, you have to implement that in some other way. Again, because a seqlock is no OS beast. If you want to be notified of the other side's death as part of the synchronization mechanism, you'll have to settle for named mutexes on Windows, and robust mutexes, in shm, on Linux.
Spinlocks and seqlocks provide the maximum throughput and minimum latency. With kernel-supported synchronization, a big part of the latency is spent switching between user and kernel space. In most applications this is not a problem, as synchronization only happens a small fraction of the time, and the extra latency of a few microseconds is negligible. Even in games, 100 fps leaves you with 10 ms per frame; that is an eternity in terms of mutex lock/unlock.
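A minimal single-writer seqlock sketch over POD data (illustrative only: the unsynchronized copy is formally a data race in C++, which production implementations work around with word-sized atomic copies):

#include <atomic>

// One of these per shm block. The sequence number is odd while a write is
// in progress; readers retry until they see the same even value before and
// after copying.
template <typename T>
struct Seqlock {
    std::atomic<unsigned> seq{0};
    T data{};

    void write(const T& v) {                         // single writer only
        unsigned s = seq.load(std::memory_order_relaxed);
        seq.store(s + 1, std::memory_order_relaxed); // mark busy (odd)
        std::atomic_thread_fence(std::memory_order_release);
        data = v;
        seq.store(s + 2, std::memory_order_release); // publish (even)
    }

    T read() const {
        T v;
        unsigned s1, s2;
        do {
            s1 = seq.load(std::memory_order_acquire);
            v = data;                                // may be a torn copy
            std::atomic_thread_fence(std::memory_order_acquire);
            s2 = seq.load(std::memory_order_relaxed);
        } while ((s1 & 1) || s1 != s2);              // torn: retry
        return v;
    }
};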
There are alternatives to spinlock that are usually not much more expensive.
In Windows, a Critical Section is actually a spinlock with a back-off mechanism that uses an Event object. This was re-implemented using shm and a named Event, and called a Metered Section.
In Linux, the pthread mutex is futex-based. A futex is like an Event on Windows. An uncontended non-robust mutex is acquired entirely in user space, just like a spinlock.
These guys still don't provide you with notification when the other side dies.
Addition [Feb 26, 2016, 10:00 GMT]
How to add your own owner death detection:
The Windows named mutex and pthread robust mutex have this capability built-in. It's easy enough to add it yourself when using other lock types and could be essential when using user-space-based locks.
First, I have to say that in many scenarios it's more appropriate to simply restart everything instead of detecting the owner's death. It is definitely simpler, since you otherwise also have to release the lock from a process that is not the original owner.
Anyway, the native way to detect process death is easy on Windows - processes are waitable objects, so you can just wait on them. You can wait for zero time for an immediate check.
On Linux, only the parent is supposed to know about its child's death, so it's less trivial. The parent can catch SIGCHLD, or use waitpid().
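Sketches of both native checks (hProcess and childPid are whatever handle/pid you already hold):

#ifdef _WIN32
#include <windows.h>
// Zero-timeout wait: a terminated process's handle is signaled.
bool process_dead(HANDLE hProcess) {
    return WaitForSingleObject(hProcess, 0) == WAIT_OBJECT_0;
}
#else
#include <sys/wait.h>
// Non-blocking check; only works from the parent of childPid.
bool process_dead(pid_t childPid) {
    int status;
    return waitpid(childPid, &status, WNOHANG) == childPid;
}
#endif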
My favorite way to detect process death is different. I connect a non-blocking TCP socket between the 2 processes and trust the OS to kill it on process death.
When you try to read data from the socket (on either side), you'd read 0 bytes if the peer has died. If it's still alive, you'd get EWOULDBLOCK.
Obviously, this also works between boxes, so kinda convenient to have it uniformly done once and for all.
Your worker loop will have to change to interleave the peer-death check with its usual work.
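A sketch of that check on the POSIX side (MSG_PEEK so no real payload is consumed; on Windows you'd test for WSAEWOULDBLOCK instead):

#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>

// sock is a connected, non-blocking TCP socket to the peer process.
bool peer_alive(int sock) {
    char buf;
    ssize_t n = recv(sock, &buf, 1, MSG_PEEK);
    if (n == 0) return false;                   // orderly FIN: peer is gone
    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        return true;                            // no data, connection alive
    return n > 0;                               // data pending: alive
}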
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/interprocess_condition.hpp>
//Mutex to protect access to the queue
boost::interprocess::interprocess_mutex mutex;
//Condition to wait when the queue is empty
boost::interprocess::interprocess_condition cond_empty;
//Condition to wait when the queue is full
boost::interprocess::interprocess_condition cond_full;
I'm running a fully operational IOCP TCP socket application. Today I was thinking about the Critical Section design, and now I have one endless question in my head: a global Critical Section, or one per client? I came to this because, as I see it, there is no point in using multiple worker threads if every thread depends on a single lock, right? I mean... right now I don't see any performance issue with 100 simultaneous clients, but what if there were 10000?
My shared resource is a per-client preallocated struct, so each client has its own IO context, socket, and other data. There is no inter-client resource sharing, so I think that's another point in favor of the per-client CS. I use one accept thread and 8 (processors * 2) worker threads. This application is basically designed for small (< 1KB) packets, but sometimes for file streaming.
The "correct" answer probably depends on your design, the number of concurrent clients and the performance that you require from the hardware that you have available.
In general, I find it best to go with the simplest thing that works and then profile to locate hot spots.
However... You say that you have no inter-client shared resources so I assume the only synchronisation that you need to do is around 'per-connection' state.
Since it's per connection the obvious (to me) design would be for the per-connection state to contain its own critical section. What do you perceive to be the downside of this approach?
The problem with a single shared lock is that you introduce contention between connections (and threads) that have no reason to block each other. This will adversely affect performance and will likely become a hot-spot as connection numbers rise.
Once you have a per-connection lock, you might want to look at avoiding its use as much as possible by having the IOCP threads simply lock it to place completions in a per-connection queue for processing. This has the advantage of allowing a single IOCP thread to work on each connection and preventing a single connection from having additional IOCP threads blocking on it. It also works well with 'skip completion port on success' processing.
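A sketch of that per-connection queue (all types and fields invented, not from any framework):

#include <windows.h>
#include <deque>

// Each connection owns its own lock and completion queue, so IOCP threads
// only hold the lock long enough to enqueue, and exactly one thread at a
// time drains a given connection.
struct Connection {
    CRITICAL_SECTION lock;
    std::deque<OVERLAPPED*> completions;
    bool draining = false;      // true while some thread is processing us

    Connection()  { InitializeCriticalSection(&lock); }
    ~Connection() { DeleteCriticalSection(&lock); }

    // Called from an IOCP worker; returns true if the caller should become
    // the draining thread (it then re-locks to pop items, and clears
    // 'draining' under the lock once the queue is empty).
    bool enqueue(OVERLAPPED* ov) {
        EnterCriticalSection(&lock);
        completions.push_back(ov);
        bool shouldDrain = !draining;
        if (shouldDrain) draining = true;
        LeaveCriticalSection(&lock);
        return shouldDrain;
    }
};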
This is probably impossible, but I'm going to ask anyway. I have a multi-threaded program (server) that receives a request on a thread dedicated to IP communications and then passes it on to worker threads to do the work; I then have to send a reply back to the client with the answers, as soon as the work is actually finished, with as little delay as possible. Currently I am using a consumer/producer pattern and placing replies on a queue for the IP thread to take off and send back to my client. This, however, gives me no guarantee about WHEN this is going to happen, as the IP thread might not get scheduled any time soon; I cannot know. This makes my client, which is blocking on this call, think that the request has failed, which is obviously not the point.
Since I am unable to make changes in the client, I need to solve this sending issue on my side. The problem I'm facing is that I do not wish to start sharing my IP object (currently used by only 1 thread) with the worker threads, as then things get overly complicated. I wondered if there is some way I can use thread-synchronization mechanisms to ensure that the moment my worker thread is finished, the IP thread will immediately send the reply back to the client?
Will manual/auto-reset events do this for me, or are these not guaranteed to wake up the thread immediately?
If you need it sent immediately, your best bet is to bite the bullet and start sharing the connection object. Lock it before accessing it, of course, and be sure to think about what you'll do if the send buffer is already full (the connection thread will need to deal with sending the portion of the message that didn't fit the first time, or the worker thread will be blocked until the client accepts some of the data you've sent). This may not be too difficult if your clients only have one request running at a time; if that's the case you can simply pass ownership of the client object to the worker thread when it begins processing, and pass it back when you're done.
Another option is using real-time threads. The details will vary between operating systems, but in most cases, if your thread has a high enough priority, it will be scheduled immediately when it becomes ready to run, and will preempt all other threads with lower priority until done. On Linux this can be done with the SCHED_RR scheduling class, for example. However, this can negatively impact performance in many cases, and can even hang the system if your thread gets into an infinite loop. Using these scheduling classes also usually requires administrative rights.
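On Linux the promotion itself is a one-liner around pthread_setschedparam (a sketch; it needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO to succeed):

#include <pthread.h>
#include <sched.h>

// Move the calling thread into the SCHED_RR real-time class.
bool make_realtime(int priority /* 1..99 */) {
    sched_param sp{};
    sp.sched_priority = priority;
    return pthread_setschedparam(pthread_self(), SCHED_RR, &sp) == 0;
}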
That said, if scheduling takes long enough that the client times out, you might have some other problems with load. You should also really put a number on how fast the response needs to be - there's no end of things you can do if you want to speed up the response, but there'll come a point where it doesn't matter anymore (do you need response in the tens of ms? single-digit ms? hundreds of microseconds? single-digit microseconds?).
There is no synchronization mechanism that will wake a thread immediately. When a synchronization mechanism for which a thread is waiting is signaled, the thread is placed in a ready queue for its priority class. It can be starved there for several seconds before it's scheduled (Windows does have mechanisms that deal with starvation over 3-4 second intervals).
I think that for out-of-band, critical communications you can have a higher priority thread to which you can enqueue the reply message and wake it up (with a condition variable, MRE or any other synchronization mechanism). If that thread has higher priority than the rest of your application's threads, waking it up will immediately effect a context switch.
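A sketch of that out-of-band responder (send_reply is an invented stand-in for your IP-layer send; the thread itself would be created at elevated priority):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

std::mutex m;
std::condition_variable cv;
std::queue<std::string> replies;

void responder_thread() {                  // runs at higher priority
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !replies.empty(); });
        std::string msg = std::move(replies.front());
        replies.pop();
        lk.unlock();
        // send_reply(msg);                // hypothetical: put it on the wire
    }
}

void post_reply(std::string msg) {         // called by worker threads
    { std::lock_guard<std::mutex> lk(m); replies.push(std::move(msg)); }
    cv.notify_one();                       // wakes the responder promptly
}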
So, the situation is this. I've got a C++ library that is doing some interprocess communication, with a wait() function that blocks and waits for an incoming message. The difficulty is that I need a timed wait, which will return with a status value if no message is received in a specified amount of time.
The most elegant solution is probably to rewrite the library to add a timed wait to its API, but for the sake of this question I'll assume it's not feasible. (In actuality, it looks difficult, so I want to know what the other option is.)
Here's how I'd do this with a busy wait loop, in pseudocode:
bool message = false;
while (!message && current_time() - start_time < timeout)
{
    if (Listener.new_message()) message = true;
}
I don't want a busy wait that eats processor cycles, though. And I also don't want to just add a sleep() call in the loop to avoid processor load, as that means a slower response. I want something that does this with proper blocking and wakeups. If the better solution involves threading (which seems likely), note that we're already using boost::thread, so I'd prefer to use that.
I'm posting this question because this seems like the sort of situation that would have a clear "best practices" right answer, since it's a pretty common pattern. What's the right way to do it?
Edit to add: A large part of my concern here is that this is in a spot in the program that's both performance-critical and critical to avoid race conditions or memory leaks. Thus, while "use two threads and a timer" is helpful advice, I'm still left trying to figure out how to actually implement that in a safe and correct way, and I can easily see myself making newbie mistakes in the code that I don't even know I've made. Thus, some actual example code would be really appreciated!
Also, I have a concern about the multiple-threads solution: If I use the "put the blocking call in a second thread and do a timed-wait on that thread" method, what happens to that second thread if the blocked call never returns? I know that the timed-wait in the first thread will return and I'll see that no answer has happened and go on with things, but have I then "leaked" a thread that will sit around in a blocked state forever? Is there any way to avoid that? (Is there any way to avoid that and avoid leaking the second thread's memory?) A complete solution to what I need would need to avoid having leaks if the blocking call doesn't return.
You could use sigaction(2) and alarm(2), which are both POSIX. You set a callback action for the timeout using sigaction, then you set a timer using alarm, then make your blocking call. The blocking call will be interrupted if it does not complete within your chosen timeout (in seconds; if you need finer granularity you can use setitimer(2)).
Note that signals in C are somewhat hairy, and there are fairly onerous restrictions on what you can do in your signal handler.
This page is useful and fairly concise:
http://www.gnu.org/s/libc/manual/html_node/Setting-an-Alarm.html
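A sketch of the whole pattern (lib_wait is an invented stand-in for the library's blocking call):

#include <signal.h>
#include <unistd.h>

// The empty handler exists only so SIGALRM interrupts the blocking call,
// which then fails with errno == EINTR.
static void on_alarm(int) {}

void install_alarm_handler() {
    struct sigaction sa{};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;               // deliberately no SA_RESTART
    sigaction(SIGALRM, &sa, nullptr);
}

// Usage:
//     install_alarm_handler();
//     alarm(20);                  // timeout in seconds
//     int rc = lib_wait();        // returns -1 with errno == EINTR on timeout
//     alarm(0);                   // cancel the pending alarm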
What you want is something like select(2), depending on the OS you are targeting.
It sounds like you need a 'monitor', capable of signaling the availability of a resource to threads, typically via a shared mutex. In Boost.Thread, a condition_variable could do the job.
You might want to look at timed locks: your blocking method can acquire the lock before starting to wait and release it as soon as the data is available. You can then try to acquire the lock (with a timeout) in your timed-wait method.
Encapsulate the blocking call in a separate thread. Have an intermediate message buffer in that thread that is guarded by a condition variable (as said before). Make your main thread timed-wait on that condition variable. Receive the intermediately stored message if the condition is met.
So basically put a new layer capable of timed-wait between the API and your application. Adapter pattern.
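A sketch of that adapter (lib_wait and the message type are invented stand-ins for the library's blocking call; boost::thread and boost::condition_variable work the same way as the std counterparts used here):

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <string>
#include <thread>

// Stand-in for the library's blocking wait(); replace with the real call.
static std::string lib_wait() {
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return "message";
}

class TimedWaitAdapter {
    std::mutex m_;
    std::condition_variable cv_;
    std::optional<std::string> msg_;

public:
    TimedWaitAdapter() {
        std::thread([this] {
            for (;;) {                       // blocks forever if lib_wait does
                std::string m = lib_wait();
                { std::lock_guard<std::mutex> lk(m_); msg_ = std::move(m); }
                cv_.notify_one();
            }
        }).detach();
    }

    // Returns the message, or std::nullopt on timeout.
    std::optional<std::string> wait_for_message(std::chrono::milliseconds t) {
        std::unique_lock<std::mutex> lk(m_);
        if (!cv_.wait_for(lk, t, [this] { return msg_.has_value(); }))
            return std::nullopt;
        auto out = std::move(msg_);
        msg_.reset();
        return out;
    }
};

Note that the detached thread is exactly the "leaked" thread the question worries about: if the blocking call never returns, it stays blocked forever, and there is no clean way to reclaim it without cooperation from the library.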
Regarding
what happens to that second thread if the blocked call never returns?
I believe there is nothing you can do to recover cleanly without cooperation from the called function (or library). 'Cleanly' means cleaning up all resources owned by that thread, including memory, other threads, locks, files, locks on files, sockets, GPU resources... Un-cleanly, you can indeed kill the runaway thread.