Alternatives to POSIX semaphores for 64-bit/32-bit IPC? - c++

I need to implement some sort of blocking wait for a project requiring synchronization between 64-bit and 32-bit processes. Busy waiting on a shared memory variable introduces performance/scheduling issues and POSIX semaphores do not appear to support IPC between 32-bit and 64-bit processes. Are there other low-overhead alternatives for interprocess synchronization on Linux?

Linux has futexes, a kernel primitive that provides a way for one process to go to sleep and another process to wake it up. They have extremely good fast paths (avoiding kernel calls in those cases), which matters a lot if you use them as a mutex but not so much if you use them as a semaphore.
You would only need their two most primitive operations. One, FUTEX_WAIT, puts a process to sleep if, and only if, a particular entry in shared memory has a particular value. The other, FUTEX_WAKE, wakes a process that has gone to sleep with FUTEX_WAIT.
Your "wait" code would atomically check a shared variable to see that it needed to sleep and then call FUTEX_WAIT to go to sleep if, and only if, the shared variable has not changed. Your "wake" code would change the value of the atomic shared variable and then call FUTEX_WAKE to wake any thread that was sleeping.
The 32-bit/64-bit issue would not matter at all: the futex word itself is always 32 bits, so if you use a 64-bit shared variable but only put meaningful data in the first 32 bits, it works the same whether addressed as a 64-bit variable or a 32-bit variable.
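A minimal sketch of that protocol, assuming Linux, a 32-bit futex word mapped into both processes, and no error handling; futex(2) has no glibc wrapper, so it is invoked through syscall():

#include <atomic>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Thin wrapper; the futex word must be a 32-bit integer shared by both processes.
static long futex(std::atomic<uint32_t>* addr, int op, uint32_t val) {
    return syscall(SYS_futex, addr, op, val, nullptr, nullptr, 0);
}

// "Wait" side: sleep only while the shared word still holds `expected`.
void wait_side(std::atomic<uint32_t>* word, uint32_t expected) {
    while (word->load(std::memory_order_acquire) == expected) {
        // FUTEX_WAIT re-checks the value inside the kernel, so a wake that
        // races with this call is not lost.
        futex(word, FUTEX_WAIT, expected);
    }
}

// "Wake" side: change the value first, then wake a sleeper.
void wake_side(std::atomic<uint32_t>* word) {
    word->fetch_add(1, std::memory_order_release);
    futex(word, FUTEX_WAKE, 1);   // or INT_MAX to wake every waiter
}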

For inter-process synchronization using blocking waits, simple solutions include either a named pipe (fd) or a System V Semaphore.
Named pipes have a file path associated with them, so that the two processes can open the file independently (one for read, the other for write). For pure synchronization, just putc() to signal, and getc() to wait, one character at a time (value doesn't matter). This creates a unidirectional ("half duplex") channel; for bidirectional signal/waits you'd create two files. You can even queue up multiple signals by performing many putc() calls in a row, kind of like a semaphore which never saturates.
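As an illustration (the FIFO path below is made up, and the FIFO is assumed to have been created beforehand with mkfifo()):

#include <cstdio>

// Signal side: one putc() per signal; the byte's value is irrelevant.
void signal_peer() {
    static FILE* out = std::fopen("/tmp/sync_fifo", "w");  // opening a FIFO for write blocks until a reader opens it
    std::putc('x', out);
    std::fflush(out);   // push the byte through to the reader now
}

// Wait side: getc() blocks until a signal byte arrives, consuming exactly one.
void wait_for_peer() {
    static FILE* in = std::fopen("/tmp/sync_fifo", "r");
    std::getc(in);
}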
System V Semaphores also have a file path associated with them. These behave like a Dijkstra semaphore.
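A rough sketch of the System V route (the path used for ftok() is made up; the file must already exist, and on Linux a freshly created semaphore starts at 0, which suits pure signalling):

#include <sys/ipc.h>
#include <sys/sem.h>

int open_sysv_sem() {
    key_t key = ftok("/tmp/mysem", 1);           // both processes derive the same key from the path
    return semget(key, 1, IPC_CREAT | 0666);     // a set containing one semaphore
}

void V(int id) {                                 // signal: increment the count
    sembuf op{0, +1, 0};
    semop(id, &op, 1);
}

void P(int id) {                                 // wait: blocks while the count is zero
    sembuf op{0, -1, 0};
    semop(id, &op, 1);
}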
For additional options, check out
https://en.wikipedia.org/wiki/Inter-process_communication

Related

Upgradable mutex that lives in shared memory, on both Windows and Linux

I have 2 processes called Writer and Reader running on the same machine. Writer is single-threaded and writes data to a shared memory. Reader has 8 threads that intend to read data from the shared memory concurrently. I need a locking mechanism that meets the following criteria:
1) At a time, either Writer or Reader is allowed to access the shared memory.
2) If Reader has permission to read data from the shared memory, all its own threads can read data.
3) Writer has to wait until Reader "completely" releases the lock (because it has multiple threads).
I have read a lot about sharable mutexes, which seem to be the solution. Here I describe my system in more detail:
1) The system should run on both Windows & Linux.
2) I divide the shared memory into two regions: locks & data. The data region is further divided into 100 blocks. I intend to create 100 "lock objects" (sharable mutexes) and lay them out in the locks region. These lock objects are used for synchronization of the 100 data blocks, 1 lock object per data block.
3) Writer and Readers first determine which block they would like to access, then try to acquire the corresponding lock. Once the lock is acquired, they operate on that data block.
My concern now is:
Is there any "built-in" way to lay the lock objects out in shared memory on Windows and Linux (CentOS) so that I can lock/unlock with those objects without using the Boost library?
[Edited Feb 25, 2016, 09:30 GMT]
I can suggest a few things. It really depends on the requirements.
If it seems like the Boost upgradeable mutex fits the bill, then by all means use it. From a 5-minute read it seems you can use them in shm. I have no experience with it as I don't use Boost. Boost is available on Windows and Linux, so I don't see why not to use it. You can always grab the specific code you like and bring it into your project without dragging the entire behemoth along.
Anyway, isn't it fairly easy to test and see if it's good enough?
I don't understand the requirement for placing locks in shm. If it's not a hard requirement, and you want to use OS-native objects, you can use a different mechanism per OS: say, a named mutex on Windows (not in shm), and a pthread_rwlock, in shm, on Linux.
I know what I would prefer to use: a seqlock.
I work in the low-latency domain, so I'm picking what gets me the lowest possible latency. I measure it in cpu cycles.
From you mentioning that you want a lock per object, rather than one big lock, I assume performance is important.
There are important questions here, though:
Since it's in shm, I assume it's POD (flat data)? If not, you can switch to a read/write spinlock.
Are you OK with spinning (busy wait) or do you want to sleep-wait? Seqlocks and spinlocks are not OS mechanisms, so there's nobody to put your waiting threads to sleep. If you do want to sleep-wait, read #4.
If you care to know whether the other side (reader/writer) died, you have to implement that in some other way. Again, because a seqlock is no OS beast. If you want to be notified of the other side's death as part of the synchronization mechanism, you'll have to settle for named mutexes on Windows, and robust mutexes, in shm, on Linux.
Spinlocks and seqlocks provide the maximum throughput and minimum latency. With kernel-supported synchronization, a big part of the latency is spent switching between user and kernel space. In most applications this is not a problem, as synchronization only happens a small fraction of the time, and the extra latency of a few microseconds is negligible. Even in games, 100 fps leaves you with 10ms per frame; that is an eternity in terms of mutex lock/unlock.
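For illustration only, a seqlock along those lines might look like the sketch below. Caveat: under the strict C++ memory model the plain payload accesses are technically data races; real implementations use relaxed atomics or per-platform guarantees, and the fences follow the commonly published reader/writer pattern.

#include <atomic>
#include <cstring>

struct SeqLocked {                       // lives in shared memory, single writer assumed
    std::atomic<unsigned> seq{0};        // even = stable, odd = write in progress
    char payload[64];
};

void write_payload(SeqLocked& s, const char* src) {
    s.seq.fetch_add(1, std::memory_order_relaxed);           // now odd
    std::atomic_thread_fence(std::memory_order_release);     // keep payload writes after the bump
    std::memcpy(s.payload, src, sizeof s.payload);
    s.seq.fetch_add(1, std::memory_order_release);           // even again
}

void read_payload(const SeqLocked& s, char* dst) {
    unsigned before, after;
    do {
        before = s.seq.load(std::memory_order_acquire);
        std::memcpy(dst, s.payload, sizeof s.payload);
        std::atomic_thread_fence(std::memory_order_acquire);  // keep payload reads before the re-check
        after = s.seq.load(std::memory_order_relaxed);
    } while ((before & 1) || before != after);                 // retry if a writer was active
}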
There are alternatives to spinlock that are usually not much more expensive.
In Windows, Critical Section is actually a spinlock with a back-off mechanism that uses an Event object. This was re-implemented using shm and named Event and called Metered Section.
In Linux, the pthread mutex is futex based. A futex is like Event on Windows. A non-robust mutex with no contention is just a spinlock.
These guys still don't provide you with notification when the other side dies.
Addition [Feb 26, 2016, 10:00 GMT]
How to add your own owner death detection:
The Windows named mutex and pthread robust mutex have this capability built-in. It's easy enough to add it yourself when using other lock types and could be essential when using user-space-based locks.
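For reference, the built-in Linux flavor might look roughly like this sketch: a robust, process-shared pthread mutex placed in memory both processes map; error handling omitted.

#include <cerrno>
#include <pthread.h>

void init_robust_mutex(pthread_mutex_t* m) {        // run once, by the creating process
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}

bool lock_detecting_owner_death(pthread_mutex_t* m) {
    if (pthread_mutex_lock(m) == EOWNERDEAD) {
        // The previous owner died holding the lock; repair the protected data here,
        // then mark the mutex consistent so it can keep being used.
        pthread_mutex_consistent(m);
        return false;                                // caller learns the other side died
    }
    return true;
}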
First, I have to say that in many scenarios it's more appropriate to simply restart everything instead of detecting the owner's death. Restarting is definitely simpler, since with death detection you also have to release the lock from a process that is not the original owner.
Anyway, the native way to detect process death is easy on Windows - processes are waitable objects, so you can just wait on them. You can wait with a zero timeout for an immediate check.
On Linux, only the parent is supposed to know about its child's death, so it's less trivial. The parent can get SIGCHLD, or use waitpid().
My favorite way to detect process death is different. I connect a non-blocking TCP socket between the 2 processes and trust the OS to kill it on process death.
When you try to read data from the socket (on any of the sides) you'd read 0 bytes if the peer has died. If it's still alive, you'd get EWOULDBLOCK.
Obviously, this also works between boxes, so kinda convenient to have it uniformly done once and for all.
Your worker loop will have to change to interleave the peer-death check and its usual work.
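A hedged, POSIX-flavored sketch of that check, assuming `sock` is an already-connected TCP socket that has been put into non-blocking mode:

#include <cerrno>
#include <sys/socket.h>

// Returns true while the peer is alive, false once it has died.
bool peer_alive(int sock) {
    char byte;
    ssize_t n = recv(sock, &byte, 1, MSG_PEEK);          // peek so real data stays queued
    if (n > 0)
        return true;                                      // data pending, peer clearly alive
    if (n == 0)
        return false;                                     // orderly EOF: the peer is gone
    return errno == EWOULDBLOCK || errno == EAGAIN;       // nothing to read yet, still alive
}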
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/interprocess_condition.hpp>
//Mutex to protect access to the queue
boost::interprocess::interprocess_mutex mutex;
//Condition to wait when the queue is empty
boost::interprocess::interprocess_condition cond_empty;
//Condition to wait when the queue is full
boost::interprocess::interprocess_condition cond_full;

How do I protect a character string in shared memory between two processes?

I have a piece of shared memory, containing a char string and an integer, shared between two processes.
Process A writes to it and Process B reads it (and not vice versa).
What is the most efficient and effective way to make sure that Process A doesn't happen to update (write to) it at the same time Process B is reading it? (Should I just use flags in the shared memory, use semaphores, critical sections...?)
If you could point me in the right direction, I would appreciate it.
Thanks.
Windows, C++
You cannot use a Critical Section because these can only be used for synchronization between threads within the same process. For inter process synchronization you need to use a Mutex or a Semaphore. The difference between these two is that the former allows only a single thread to own a resource, while the latter can allow up to a maximum number (specified during creation) to own the resource simultaneously.
In your case a Mutex seems appropriate.
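A minimal sketch of that approach (the mutex name is made up; both processes must use the same string, and error handling is omitted):

#include <windows.h>

HANDLE open_shared_mutex() {
    // Creates the mutex if it doesn't exist yet, otherwise opens the existing one.
    return CreateMutexA(nullptr, FALSE, "Local\\SharedStringMutex");
}

void update_shared_string(HANDLE m /*, pointer into the shared memory, new data ... */) {
    WaitForSingleObject(m, INFINITE);   // acquire
    // ... write the string and the integer ...
    ReleaseMutex(m);                    // release
}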
Since you have two processes you need a cross-process synchronisation object. I think this means that you need to use a mutex.
A mutex object facilitates protection against data races and allows thread-safe synchronization of data between threads. A thread obtains ownership of a mutex object by calling one of the lock functions and relinquishes ownership by calling the corresponding unlock function.
If you are using Boost.Thread, you can use its mutex and locking; for more, see the link below:
http://www.boost.org/doc/libs/1_47_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_types
Since you're talking about two processes, system-wide mutexes will work, and Windows has those. However, they aren't necessarily the most efficient way.
If you can put more things in shared memory, then passing data via atomic operations on flags in that memory should be the most efficient thing to do. For instance, you might use the Interlocked functions to implement Dekker's Algorithm (you'll probably want to use something like YieldProcessor() to avoid busy waiting).
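A sketch of Dekker's algorithm using std::atomic rather than the raw Interlocked functions, assuming the control words sit in the shared memory block and that std::atomic<bool>/<int> are lock-free there (they are on mainstream Windows targets):

#include <atomic>
#include <thread>

struct DekkerLock {                        // one instance placed in the shared block
    std::atomic<bool> wants[2];
    std::atomic<int>  turn;

    DekkerLock() : turn(0) { wants[0] = false; wants[1] = false; }

    void lock(int self) {                  // self is 0 or 1, one id per process
        int other = 1 - self;
        wants[self].store(true);
        while (wants[other].load()) {
            if (turn.load() != self) {
                wants[self].store(false);              // back off and let the other side go
                while (turn.load() != self)
                    std::this_thread::yield();         // plays the role of YieldProcessor()
                wants[self].store(true);
            }
        }
    }

    void unlock(int self) {
        turn.store(1 - self);
        wants[self].store(false);
    }
};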

Mutexes in multithread Linux application

Could you help me understand how to use mutexes in a multithreaded Linux application, where:
during data writing, the variable must be locked against both writes and reads;
during data reading, it only needs to be locked against writes.
So it should be possible to read simultaneously, but writing is exclusive: while a write is in progress, all other operations should wait for it to finish.
You're asking about something that is a bit higher level than mutexes. A mutex is a simple, low-level device. When a thread holds a mutex, the CPU is either executing code in that thread or executing some other process entirely; in other words, the mutex has locked out all other threads that belong to the same (heavyweight) process.
You are asking about a read-write lock. Read-write locks use mutexes underneath the hood. The POSIX functions that deal with read-write locks start with pthread_rwlock_. Since you are on a Linux machine, just type man pthread and look for the section marked "READ/WRITE LOCK ROUTINES".
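A minimal sketch of the read-write lock those man pages describe (many readers at once, one exclusive writer):

#include <pthread.h>

pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
int shared_value = 0;

int read_value() {
    pthread_rwlock_rdlock(&rwlock);      // shared: other readers may hold it at the same time
    int v = shared_value;
    pthread_rwlock_unlock(&rwlock);
    return v;
}

void write_value(int v) {
    pthread_rwlock_wrlock(&rwlock);      // exclusive: waits until all readers have left
    shared_value = v;
    pthread_rwlock_unlock(&rwlock);
}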
You need a reader/writer lock to allow multiple readers/single writer.
Boost.Thread has one of these (boost::shared_mutex), if you have no other preferred threading library. This uses PThreads primitives under the covers, and will probably save you time in wrapping the raw APIs yourself.
I would not recommend implementing this yourself - it's easy to get something that appears to work, but under load it either crashes, kills performance, or (worst of all) silently modifies your data in a way it should not, so you get bad results.
A simple boost::mutex can also be used here, as noted by @Als, but it won't allow multiple concurrent reads. That is simpler to implement and may be sufficient for your needs, depending on your read/write access profile.
You will need to use mutexes if you have global or static objects which are accessed (read and written) from different threads.

How to write your own condition variable using atomic primitives

I need to write my own implementation of a condition variable much like pthread_cond_t.
I know I'll need to use the compiler provided primitives like __sync_val_compare_and_swap etc.
Does anyone know how I'd go about this, please?
Thanks.
Correct implementation of condition variables is HARD. Use one of the many libraries out there instead (e.g. boost, pthreads-win32, my just::thread library)
You need to:
Keep a list of waiting threads (this might be a "virtual" list rather than an actual data structure)
Ensure that when a thread waits you atomically unlock the mutex owned by the waiting thread and add it to the list before that thread goes into a blocking OS call
Ensure that when the condition variable is notified then one of the threads waiting at that time is woken, and not one that waits later
Ensure that when the condition variable is broadcast then all of the threads waiting at that time are woken, and not any threads that wait later.
plus other issues that I can't think of just now.
The details vary with OS, as you are dependent on the OS blocking/waking primitives.
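To make the shape of the problem concrete, here is a deliberately minimal, Linux-only sketch built on a futex word. It is illustrative rather than a production condition variable; the remaining pitfalls (fairness, thundering herd on broadcast, and so on) are exactly what make real implementations hard - Ulrich Drepper's paper "Futexes Are Tricky" walks through them.

#include <atomic>
#include <climits>
#include <cstdint>
#include <linux/futex.h>
#include <mutex>
#include <sys/syscall.h>
#include <unistd.h>

class TinyCondVar {                 // illustrative only, Linux-specific
    std::atomic<uint32_t> seq{0};   // bumped on every notify

public:
    void wait(std::mutex& m) {      // caller holds m and re-checks its predicate in a loop
        uint32_t observed = seq.load(std::memory_order_relaxed);
        m.unlock();
        // Sleep only if nobody has notified since `observed` was sampled; the kernel
        // re-checks the word, so a notify racing with this call is not lost.
        syscall(SYS_futex, &seq, FUTEX_WAIT, observed, nullptr, nullptr, 0);
        m.lock();
    }
    void notify_one() {
        seq.fetch_add(1, std::memory_order_relaxed);                   // change the value first...
        syscall(SYS_futex, &seq, FUTEX_WAKE, 1, nullptr, nullptr, 0);  // ...then wake one waiter
    }
    void notify_all() {
        seq.fetch_add(1, std::memory_order_relaxed);
        syscall(SYS_futex, &seq, FUTEX_WAKE, INT_MAX, nullptr, nullptr, 0);
    }
};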
I need to write my own implementation of a condition variable much like pthread_cond_t.
Condition variables cannot be implemented using only atomic primitives like compare-and-swap.
The purpose in life of cond vars is to provide a flexible mechanism for an application to reach the process/thread scheduler: put a thread to sleep and wake it up.
Atomic ops are implemented by the CPU, while the process/thread scheduler is OS territory. Without some supporting system call (or emulation using existing synchronization primitives), implementing cond vars is impossible.
Edit1. The only sensible example I know of and can point you to is the implementation of the historical Linux pthread library, which can be found here - e.g. the version from 1997. The implementation (found in the condvar.c file) is rather easy to read and also highlights the requirements for an implementation of cond vars. Spinlocks (using a test-and-set op) are used for synchronization, and POSIX signals are used to put threads to sleep and to wake them up.
It depends on your requirements. If you have no further requirements, and if your process may consume 100% of the available CPU time, then you have the rare chance to experiment and try out different mutexes and condition variables - just try it out and learn about the details. Great thing.
But in reality you are usually bound to an operating system, and so you are captive to the OS's threading primitives, because they represent the only kind of control over - yeah - process/threading/CPU resource usage! So in that case you will not even have the chance to implement your OWN condition variables - unless they are based on the primitives that the OS provides you!
So... double check your environment, what do you control? What don't you control? And what makes sense?

Thread communication theory

What is the common theory behind thread communication? I have some primitive idea about how it should work but something doesn't settle well with me. Is there a way of doing it with interrupts?
Really, it's just the same as any concurrency problem: you've got multiple threads of control, and it's indeterminate which statements on which threads get executed when. That means there are a large number of POTENTIAL execution paths through the program, and your program must be correct under all of them.
In general, the place where trouble can occur is when state is shared among the threads (aka "lightweight processes" in the old days). That happens when there are shared memory areas that more than one thread reads and writes.
To ensure correctness, what you need to do is ensure that these data areas get updated in a way that can't cause errors. To do this, you need to identify "critical sections" of the program, where sequential operation must be guaranteed. Those can be as little as a single instruction or line of code; if the language and architecture ensure that these are atomic, that is, can't be interrupted, then you're golden.
Otherwise, you identify that section and put some kind of guard onto it. The classic way is to use a semaphore, which is an atomic mechanism that only allows one thread of control past at a time. These were invented by Edsger Dijkstra, and so have names that come from the Dutch: P and V. When you come to a P, only one thread can proceed; all other threads are queued and waiting until the executing thread comes to the associated V operation.
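The P/V operations map directly onto, for example, POSIX semaphore calls; a minimal sketch:

#include <semaphore.h>

sem_t gate;

void init() { sem_init(&gate, 0, 1); }   // shared between threads of one process, count starts at 1
void P()    { sem_wait(&gate); }         // block if the count is zero, otherwise decrement
void V()    { sem_post(&gate); }         // increment and wake one waiter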
Because these primitives are a little primitive, and because the Dutch names aren't very intuitive, there have been some other, larger-scale approaches developed.
Per Brinch Hansen invented the monitor, which is basically just a data structure whose operations are guaranteed to be atomic; they can be implemented with semaphores. Monitors are pretty much what Java synchronized statements are based on; they give an object or code block that particular behavior (only one thread can be "in" them at a time) with simpler syntax.
There are other models possible. Haskell and Erlang solve the problem by being functional languages that never allow a variable to be modified once it's created; this means they naturally don't need to worry about synchronization. Some newer languages, like Clojure, instead have a structure called "transactional memory", which basically means that when there is an assignment, you're guaranteed the assignment is atomic and reversible.
So that's it in a nutshell. To really learn about it, the best places to look are operating systems texts, like, e.g., Andy Tanenbaum's text.
The two most common mechanisms for thread communication are shared state and message passing.
The most common way for threads to communicate is via some shared data structure, typically a queue. Some threads put information into the queue while others take it out. The queue must be protected by operating system facilities such as mutexes and semaphores. Interrupts have nothing to do with it.
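A minimal sketch of such a protected queue, using the standard library: the mutex guards the shared structure and the condition variable lets consumers sleep instead of busy-waiting.

#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class SharedQueue {
    std::queue<T> q;
    std::mutex m;
    std::condition_variable cv;
public:
    void put(T item) {
        { std::lock_guard<std::mutex> lk(m); q.push(std::move(item)); }
        cv.notify_one();                            // wake one waiting consumer
    }
    T take() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return !q.empty(); }); // sleeps until a producer signals
        T item = std::move(q.front());
        q.pop();
        return item;
    }
};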
If you're really interested in a theory of thread communications, you may want to look into formalisms like the pi Calculus.
To communicate between threads, you'll need to use whatever mechanism is supplied by your operating system and/or runtime. Interrupts would be unusually low level, although they might be used implicitly if your threads communicate using sockets or named pipes.
A common pattern would be to implement shared state using a shared memory block, relying on an OS-supplied synchronization primitive such as a mutex to spare you from busy-waiting when you read from the block. Remember that if you have threads at all, then you must have some kind of scheduler already (whether it's native to the OS or emulated in your language runtime). So this scheduler can provide synchronization objects and a "sleep" function without necessarily having to rely on hardware support.
Sockets, pipes, and shared memory work between processes too. Sometimes a runtime will give you a lighter-weight way of doing synchronization for threads within the same process. Shared memory is cheaper within a single process. And sometimes your runtime will also give you an atomic message-passing mechanism.