I am sharing some data across multiple processes using shared memory; I use inter-process mutexes to achieve synchronization.
My question is the following: is it possible to use lock-free data structures and/or atomic operations to achieve faster synchronization between two processes, without using mutexes?
If not, do you know what the main reason for this is?
As far as I know, they are used only to synchronize threads of the same process. Are these concepts portable to processes as well? If they aren't, do you know of any faster method to share/synchronize data across processes?
Are these concepts portable to processes as well?
Yes, atomic operations work for both threads and processes, provided the memory they operate on is shared.
An atomic operation is a specific instruction of the processor itself; it knows nothing about threads or processes. It is simply an all-or-nothing (indivisible) sequence of actions (read, compare, store) implemented at the hardware level.
So you can set up shared memory between processes and put an atomic_t into it.
lock-free
Yes, as long as the lock-free structure is implemented using only atomic operations (which it should be).
data structures
You should check that the shared memory is mapped to the same address in both processes when it is used to store pointers (as in linked data structures).
If the memory is mapped at a different address, the pointers will be broken in the other process. In that case you need to use relative addresses (offsets) and do a simple address translation.
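A minimal sketch of the offset idea (the OffsetPtr type and helper names are illustrative, not from any particular library):

    #include <cstddef>

    // Stored inside the shared segment instead of a raw pointer. The offset is
    // relative to the start of the segment, so it stays valid even if each
    // process maps the segment at a different address.
    struct OffsetPtr {
        std::ptrdiff_t offset;
    };

    // Convert between a real pointer and an offset, given the local base
    // address returned by mmap()/shmat() in the current process.
    inline OffsetPtr to_offset(void* base, void* p) {
        return OffsetPtr{ static_cast<char*>(p) - static_cast<char*>(base) };
    }

    inline void* from_offset(void* base, OffsetPtr op) {
        return static_cast<char*>(base) + op.offset;
    }

Boost.Interprocess provides offset_ptr, which does this translation for you.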
inter processes mutexes
I should also say that glibc > 2.4 (NPTL) uses a futex combined with atomic operations for the uncontended case of process-shared mutexes (= inter-process mutexes). So you are already using atomic operations in shared memory.
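For reference, this is roughly how a process-shared POSIX mutex is placed in a shared memory segment (a sketch only; error handling is omitted and the segment name is made up):

    #include <pthread.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main() {
        // Both processes must use the same (hypothetical) segment name.
        int fd = shm_open("/demo_mutex_seg", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(pthread_mutex_t));
        void* p = mmap(nullptr, sizeof(pthread_mutex_t),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        auto* m = static_cast<pthread_mutex_t*>(p);

        // Only one process should perform this initialization.
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);

        pthread_mutex_lock(m);
        // ... touch the shared data ...
        pthread_mutex_unlock(m);
    }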
On x86 with NPTL, most of the synchronization primitives have as their fast path just a single interlocked operation with a full memory barrier. Since x86 platforms don't really have anything lighter than that, they are already about the best you can do. Unless the existing atomic operations do exactly what you need to do, there will be no performance boost to pay back the costs of using the semantically lighter primitive.
Related
Let's say I am on CentOS 7 x86_64 + GCC 7.
I would like to create a ringbuffer in shared memory.
I have two processes, Producer and Consumer, and both share a named shared memory segment, which is created/accessed through shm_open() + mmap().
If Producer writes something like:
    struct Data {
        uint64_t length;
        char data[100];
    };
to the shared memory at a random time, and the Consumer is constantly polling the shared memory to read, will I have some sort of synchronization issue where the member length is already visible but the member data is still in the process of being written? If yes, what's the most efficient technique to avoid the issue?
I see this post:
Shared-memory IPC synchronization (lock-free)
But I would like to get a deeper, more low level of understanding what's required to synchronize between two processes efficiently.
Thanks in advance!
To avoid this, you would want to make the structure std::atomic and access it with acquire-release memory ordering. On most modern processors, the instructions this inserts act as memory fences, which guarantee that all of the writer's stores become visible before its release store does, and that the reader performs its subsequent loads only after its acquire load has observed that store.
There are, in addition, locking primitives in POSIX, but the <atomic> header is newer and what you probably want.
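One way to apply this to the struct above is to publish the payload behind an atomic flag with release/acquire ordering. This is only a sketch, assuming the Slot object is constructed in the mapped region by one process before the other starts polling, and that std::atomic<uint64_t> is lock-free on your platform:

    #include <atomic>
    #include <cstdint>
    #include <cstring>

    struct Slot {
        std::atomic<uint64_t> ready{0};  // 0 = empty, 1 = payload valid
        uint64_t length;
        char data[100];
    };

    // Producer: write the payload first, then publish it with a release store.
    void produce(Slot* s, const char* src, uint64_t n) {  // assumes n <= sizeof(s->data)
        s->length = n;
        std::memcpy(s->data, src, n);
        s->ready.store(1, std::memory_order_release);
    }

    // Consumer: an acquire load that observes ready == 1 guarantees that the
    // length and data written before the release store are visible.
    bool consume(Slot* s, char* dst, uint64_t* n) {
        if (s->ready.load(std::memory_order_acquire) != 1)
            return false;
        *n = s->length;
        std::memcpy(dst, s->data, *n);
        s->ready.store(0, std::memory_order_release);  // hand the slot back
        return true;
    }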
What the Standard Says
From [atomics.lockfree], emphasis added:
Operations that are lock-free should also be address-free. That is, atomic operations on the same memory location via two different addresses will communicate atomically. The implementation should not depend on any per-process state. This restriction enables communication by memory that is mapped into a process more than once and by memory that is shared between two processes.
For lockable atomics, the standard says in [thread.rec.lockable.general], emphasis added:
An execution agent is an entity such as a thread that may perform work in parallel with other execution agents. [...] Implementations or users may introduce other kinds of agents such as processes [....]
You will sometimes see the claim that the standard supposedly makes no mention of using the <atomic> primitives with memory shared between processes, only threads. This is incorrect.
However, passing pointers to the other process through shared memory will not work, as the shared memory may be mapped into different parts of the address space, and of course a pointer to any object not in shared memory is right out. Indices and offsets of objects within shared memory will work. (Or, if you really need pointers, Boost provides IPC-safe wrappers.)
Yes, you will ultimately run into data races: not only can length be read before data has been written, but parts of those members can also be torn, written out of sync with your process reading them.
Although lock-free is the new trend, I'd suggest going for a simpler tool for your first IPC synchronization job: the semaphore. On Linux, the following man pages will be useful:
sem_init
sem_wait
sem_post
The idea is to have each process signal to the other that it is currently reading or writing the shared memory segment. With a semaphore, you can build inter-process mutexes:
Producer:
    while true:
        (opt) create resource
        lock semaphore (sem_wait)
        copy resource to shm
        unlock semaphore (sem_post)

Consumer:
    while true:
        lock semaphore (sem_wait)
        copy resource to local memory, or crunch resource
        unlock semaphore (sem_post)
If, for instance, Producer is writing into shm while Consumer calls sem_wait, Consumer will block until Producer calls sem_post; but you have no guarantee that Producer won't go around the loop again and write twice in a row before Consumer is woken up. You have to build a mechanism to ensure that Producer and Consumer do their work alternately.
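A sketch of such an alternation mechanism, using two process-shared semaphores placed in the shared segment itself (the layout and names are illustrative; error handling is omitted):

    #include <semaphore.h>
    #include <cstddef>
    #include <cstring>

    struct Shm {
        sem_t slot_empty;   // posted when Producer may write
        sem_t slot_full;    // posted when Consumer may read
        char  payload[128];
    };

    // Run once, in the process that creates the segment.
    void init(Shm* s) {
        sem_init(&s->slot_empty, /*pshared=*/1, /*value=*/1);
        sem_init(&s->slot_full,  /*pshared=*/1, /*value=*/0);
    }

    void producer_step(Shm* s, const char* msg) {
        sem_wait(&s->slot_empty);                 // wait until the slot is free
        std::strncpy(s->payload, msg, sizeof(s->payload) - 1);
        s->payload[sizeof(s->payload) - 1] = '\0';
        sem_post(&s->slot_full);                  // hand the slot to Consumer
    }

    void consumer_step(Shm* s, char* out, std::size_t n) {
        sem_wait(&s->slot_full);                  // wait until data is available
        std::strncpy(out, s->payload, n - 1);
        out[n - 1] = '\0';
        sem_post(&s->slot_empty);                 // hand the slot back
    }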
I have a general question about parallel programming in C and C++ and would appreciate it if you could answer it. As far as I know, we can declare a variable at least one level higher (in the parent thread) to share it among child threads. So I was wondering if there is any other way to share a variable among threads with the same parent thread? Is this API-dependent or not?
For POSIX threads, read a pthread tutorial.
For C++11, read the documentation of its thread library.
All threads of the same process share the same address space in virtual memory. As commented by Marco A., also consider thread_local variables.
Notice that you share data or memory (not variables, which exist only in the source code).
In practice, you had better protect the shared data with a mutex (for synchronization) to avoid data races.
In the simple case, the mutex and the shared data are in some global variables.
You could also use atomic operations.
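A minimal illustration of both points, with global shared data guarded by a mutex plus an atomic counter (names are illustrative):

    #include <atomic>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    std::mutex g_mutex;              // protects g_shared
    std::vector<int> g_shared;       // shared data (what is shared is memory)
    std::atomic<long> g_counter{0};  // no mutex needed for this one

    void worker(int id) {
        {
            std::lock_guard<std::mutex> lock(g_mutex);
            g_shared.push_back(id);  // serialized access, no data race
        }
        g_counter.fetch_add(1, std::memory_order_relaxed);
    }

    int main() {
        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i) threads.emplace_back(worker, i);
        for (auto& t : threads) t.join();
        std::cout << g_shared.size() << " items, counter = " << g_counter << '\n';
    }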
BTW, you could also develop a parallel application using some message-passing paradigm, e.g. MPI (or simply some RPC or other messages, e.g. JSON over sockets). For regular numerical applications you might consider using the GPGPU, e.g. with OpenCL. And of course you might mix all the approaches (using OpenCL, with several threads, and having your parallel software run in several such processes communicating with MPI).
Debugging heavily parallel software can become a nightmare. Performance may depend upon the hardware system and may require tricky tuning; scalability and synchronization may become a growing concern. Map-reduce is often a useful model.
In C++ and C any memory location (identified by a variable) can be shared among threads. The memory space is the same across all threads. There is no parent/child thread relationship with memory.
The challenge is to control or synchronize access to the memory location among the threads.
That is implementation dependent.
Any global variable is sharable among threads, since threads are lightweight processes sharing the same address space. For synchronization, you need to ensure mutual exclusion while updating/accessing those global variables, through semaphores or wait/notify blocks.
Does std::atomic play well in shared memory, or is it undefined? It seems like an easy way to add lock-free basic types to shared memory; however, I could believe that it's not possible to guarantee atomic behaviour in the context of shared memory.
Why not? You just need to allocate and construct it inside the shared memory region properly.
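For example, something along these lines (a sketch; the segment name is made up and error handling is omitted):

    #include <atomic>
    #include <new>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main() {
        int fd = shm_open("/demo_atomic_seg", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(std::atomic<int>));
        void* p = mmap(nullptr, sizeof(std::atomic<int>),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        // One process constructs the atomic in place; the others attach to the
        // same region. For cross-process use the type should be lock-free.
        auto* counter = new (p) std::atomic<int>(0);
        counter->fetch_add(1, std::memory_order_relaxed);

        munmap(p, sizeof(std::atomic<int>));
        close(fd);
    }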
It depends.
If the architecture you are using supports atomic operations on 64-bit types, I would expect it to work. If std::atomic is simulating atomic operations with mutexes then you will have a problem:
Shared memory is usually used to communicate between processes - and the mutex being used may only work between threads in a single process (for example the Windows CriticalSection API).
Alternatively, the shared memory is quite likely to be mapped at different addresses in different processes, and the mutex may have internal pointers, which means it won't work.
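You can check for the mutex-based fallback before relying on the atomic across processes; is_always_lock_free requires C++17, while is_lock_free works on earlier standards:

    #include <atomic>
    #include <cstdint>

    // Compile-time guarantee: the atomic never falls back to an internal lock,
    // so it is reasonable to place it in memory shared between processes.
    static_assert(std::atomic<std::uint64_t>::is_always_lock_free,
                  "std::atomic<uint64_t> would use a lock on this platform");

    // Run-time check, e.g. for older standards.
    bool usable_across_processes(const std::atomic<std::uint64_t>& a) {
        return a.is_lock_free();
    }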
Currently, I have two processes that communicate using the message_queue and shared_memory from Boost. Everything works as intended.
Now I need to make one of these processes multi-threaded (thanks to Boost again), and I was wondering whether I need to use a protection mechanism between the threads (such as a mutex), or whether the boost::interprocess library already provides one.
I did not find any info on that in the Boost documentation. By the way, I'm using Boost 1.40.
Thanks in advance.
The shared resources from boost::interprocess (shared memory, etc.) require that you provide the necessary synchronization. The reason for this is that you may not require synchronization at all, and it is usually a somewhat expensive operation performance-wise.
Say for example that you had a process that wrote to shared memory the current statistics of something in 32-bit integer format, and a few processes that read those values. Since the values are integers (and therefore on your platform the reads and writes are atomic) and you have a single process writing them and a few process reading them, no synchronization is needed for this design.
However, in some cases you will require synchronization, for example if the above design had multiple writers, or if instead of integers you were using string data. There are various synchronization mechanisms inside Boost (as well as non-Boost ones, but you're already using Boost), described here:
[Boost info for stable version: 1.48]
http://www.boost.org/doc/libs/1_48_0/doc/html/interprocess/synchronization_mechanisms.html
[Boost info for the version you're using: 1.40]
http://www.boost.org/doc/libs/1_40_0/doc/html/interprocess/synchronization_mechanisms.html
With shared memory it is common practice to place the synchronization mechanism at the base of the shared memory segment, where it can be anonymous (meaning the OS kernel does not provide access to it by name). This way all the processes know how to lock the shared memory segment, and you can associate locks with their segments (if you have multiple segments, for example).
Remember that a mutex requires the same thread of execution (inside a process) to unlock it that locked it. If you require locking and unlocking a synchronization object from different threads of execution, you need a semaphore.
Please be sure that if you choose to use a mutex that it is an interprocess mutex (http://www.boost.org/doc/libs/1_48_0/doc/html/boost/interprocess/interprocess_mutex.html) as opposed to the mutex in the boost thread library which is for a single process with multiple threads.
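A sketch of that layout, with an interprocess_mutex living at the base of the segment next to the data it guards (the segment and struct names are illustrative):

    #include <boost/interprocess/shared_memory_object.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    #include <boost/interprocess/sync/interprocess_mutex.hpp>
    #include <boost/interprocess/sync/scoped_lock.hpp>
    #include <new>

    namespace bip = boost::interprocess;

    // Placed at the base of the shared memory segment.
    struct SharedBlock {
        bip::interprocess_mutex mutex;
        int value = 0;
    };

    int main() {
        bip::shared_memory_object shm(bip::open_or_create, "demo_block", bip::read_write);
        shm.truncate(sizeof(SharedBlock));
        bip::mapped_region region(shm, bip::read_write);

        // In a real program only the creating process should construct this.
        auto* block = new (region.get_address()) SharedBlock;

        {
            bip::scoped_lock<bip::interprocess_mutex> lock(block->mutex);
            ++block->value;  // protected update, visible to all attached processes
        }
    }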
You have to make sure you lock the shared resources.
You can find examples in boost documentation. For example:
http://www.boost.org/doc/libs/1_40_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.mutexes.mutexes_scoped_lock
I have a piece of shared memory, containing a char string and an integer, that is shared between two processes.
Process A writes to it and Process B reads it (and not vice versa)
What is the most efficient and effective way to make sure that Process A doesn't happen to update (write to) it at the same time Process B is reading it? (Should I just use flags in the shared memory, use semaphores, critical sections...?)
If you could point me in the right direction, I would appreciate it.
Thanks.
Windows, C++
You cannot use a Critical Section because these can only be used for synchronization between threads within the same process. For inter process synchronization you need to use a Mutex or a Semaphore. The difference between these two is that the former allows only a single thread to own a resource, while the latter can allow up to a maximum number (specified during creation) to own the resource simultaneously.
In your case a Mutex seems appropriate.
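A sketch of the named-mutex approach on Windows (the mutex name is made up; both processes must open the same name):

    #include <windows.h>

    int main() {
        // A named kernel mutex is visible system-wide, unlike a CRITICAL_SECTION.
        HANDLE hMutex = CreateMutexW(nullptr, FALSE, L"DemoShmMutex");
        if (!hMutex) return 1;

        if (WaitForSingleObject(hMutex, INFINITE) == WAIT_OBJECT_0) {
            // ... read or write the shared memory here ...
            ReleaseMutex(hMutex);
        }

        CloseHandle(hMutex);
        return 0;
    }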
Since you have two processes you need a cross-process synchronisation object. I think this means that you need to use a mutex.
A mutex object facilitates protection against data races and allows thread-safe synchronization of data between threads. A thread obtains ownership of a mutex object by calling one of the lock functions and relinquishes ownership by calling the corresponding unlock function.
If you are using Boost.Thread, you can use its mutexes and locking; for more, see the link below:
http://www.boost.org/doc/libs/1_47_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_types
Since you're talking about two processes, system-wide mutexes will work, and Windows has those. However, they aren't necessarily the most efficient way.
If you can put more things in shared memory, then passing data via atomic operations on flags in that memory should be the most efficient thing to do. For instance, you might use the Interlocked functions to implement Dekker's Algorithm (you'll probably want to use something like YieldProcessor() to avoid busy waiting).
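For a single writer and single reader, a simpler flag handoff built on the Interlocked functions is often enough. A sketch follows (the flag protocol here is illustrative and is not Dekker's Algorithm):

    #include <windows.h>

    // Lives in the shared memory segment; 0 = slot free, 1 = data ready.
    struct Shared {
        volatile LONG ready;
        int payload;
    };

    void write_value(Shared* s, int v) {
        while (InterlockedCompareExchange(&s->ready, 0, 0) != 0)
            YieldProcessor();               // wait until the reader has consumed
        s->payload = v;
        InterlockedExchange(&s->ready, 1);  // publish; full barrier on x86
    }

    int read_value(Shared* s) {
        while (InterlockedCompareExchange(&s->ready, 0, 0) != 1)
            YieldProcessor();               // wait for data
        int v = s->payload;
        InterlockedExchange(&s->ready, 0);  // mark the slot free again
        return v;
    }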