Visibility of changes to shared memory from shm_open() + mmap() - c++

Let's say I am on CentOS 7 x86_64 + GCC 7.
I would like to create a ringbuffer in shared memory.
I have two processes, Producer and Consumer, that share a named shared-memory region created and accessed through shm_open() + mmap().
If Producer writes something like:
struct Data {
    uint64_t length;
    char data[100];
};
to the shared memory at an arbitrary time, while the Consumer constantly polls the shared memory to read it, could I hit a synchronization issue where the member length is already visible but the member data is still in the process of being written? If so, what is the most efficient technique to avoid the issue?
I see this post:
Shared-memory IPC synchronization (lock-free)
But I would like a deeper, lower-level understanding of what is required to synchronize two processes efficiently.
Thanks in advance!

To avoid this, you would want to make the structure std::atomic and access it with acquire-release memory ordering. On most modern processors, the instructions this inserts act as memory fences: the release store guarantees that all of the writer's earlier writes become visible before the publication itself, and the acquire load guarantees that the reader's subsequent reads happen only after it has observed that publication.
There are, in addition, locking primitives in POSIX, but the <atomic> header is newer and what you probably want.
What the Standard Says
From [atomics.lockfree], emphasis added:
Operations that are lock-free should also be address-free. That is, atomic operations on the same memory location via two different addresses will communicate atomically. The implementation should not depend on any per-process state. This restriction enables communication by memory that is mapped into a process more than once and by memory that is shared between two processes.
For lockable atomics, the standard says in [thread.rec.lockable.general], emphasis added:
An execution agent is an entity such as a thread that may perform work in parallel with other execution agents. [...] Implementations or users may introduce other kinds of agents such as processes [....]
You will sometimes see the claim that the standard supposedly makes no mention of using the <atomic> primitives with memory shared between processes, only threads. This is incorrect.
However, passing pointers to the other process through shared memory will not work, as the shared memory may be mapped at different addresses in each process, and of course a pointer to any object not in shared memory is right out. Indices and offsets of objects within shared memory will work. (Or, if you really need pointers, Boost provides IPC-safe wrappers.)
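For concreteness, here is a minimal sketch of this acquire-release publish pattern applied to the asker's ringbuffer. Rather than making Data itself atomic (at 108 bytes, std::atomic<Data> would not be lock-free), it publishes each slot through lock-free atomic head/tail indices. The segment name "/ring_demo", the capacity of 16, and the omission of error handling and of formal placement-new construction are all simplifications:

    #include <atomic>
    #include <cstdint>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct Data {
        uint64_t length;
        char data[100];
    };

    struct Ring {
        std::atomic<uint64_t> head;  // advanced by Producer only
        std::atomic<uint64_t> tail;  // advanced by Consumer only
        Data slots[16];              // capacity: a power of two
    };

    Ring* map_ring() {
        // Both processes run this; ftruncate() zero-fills new segments,
        // which doubles as initialization of head/tail in this sketch.
        int fd = shm_open("/ring_demo", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(Ring));
        void* p = mmap(nullptr, sizeof(Ring), PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);
        return static_cast<Ring*>(p);
    }

    bool try_push(Ring* r, const Data& d) {       // Producer side
        uint64_t head = r->head.load(std::memory_order_relaxed);
        if (head - r->tail.load(std::memory_order_acquire) == 16)
            return false;                         // ring is full
        r->slots[head % 16] = d;                  // fill the slot first...
        r->head.store(head + 1, std::memory_order_release); // ...then publish
        return true;
    }

    bool try_pop(Ring* r, Data& out) {            // Consumer side
        uint64_t tail = r->tail.load(std::memory_order_relaxed);
        if (tail == r->head.load(std::memory_order_acquire))
            return false;                         // ring is empty
        out = r->slots[tail % 16];                // read the slot first...
        r->tail.store(tail + 1, std::memory_order_release); // ...then free it
        return true;
    }

The release store on head cannot become visible before the writes that filled the slot, and the acquire load on head guarantees the Consumer sees those writes once it sees the new index, which is exactly the length-before-data hazard from the question.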

Yes, you will ultimately run into data races: not only can length be read before data has been fully written, but the individual members themselves can be seen partially written by the reading process.
Although lock-free is the new trend, I'd suggest going for a simpler tool for your first IPC synchronization job: the semaphore. On Linux, the following man pages will be useful:
sem_init
sem_wait
sem_post
The idea is to have each process signal the other that it is currently reading or writing the shared memory segment. With a semaphore, you can build inter-process mutexes (in pseudocode here; a concrete C++ sketch follows below):
Producer:
    while true:
        (opt) create resource
        lock semaphore (sem_wait)
        copy resource to shm
        unlock semaphore (sem_post)
Consumer:
    while true:
        lock semaphore (sem_wait)
        copy resource to local memory, or crunch resource
        unlock semaphore (sem_post)
If, for instance, Producer is writing into shm while Consumer calls sem_wait, Consumer will block until Producer calls sem_post; but you have no guarantee that Producer won't go around the loop again and write twice in a row before Consumer wakes up. You would have to build a mechanism ensuring that Producer and Consumer work in alternation.
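To make the pseudocode concrete, here is a minimal C++ sketch of a process-shared POSIX semaphore used as the lock. The segment name "/demo_shm" is illustrative, exactly one process must perform the sem_init, and error checking is omitted (link with -pthread, plus -lrt on older glibc):

    #include <cstring>
    #include <fcntl.h>
    #include <semaphore.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct Shared {
        sem_t lock;         // process-shared semaphore used as a mutex
        char resource[128]; // the data both processes touch
    };

    int main() {
        int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(Shared));
        auto* shm = static_cast<Shared*>(mmap(nullptr, sizeof(Shared),
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

        // pshared = 1 makes the semaphore usable across processes; only
        // the creating process should call this.
        sem_init(&shm->lock, /*pshared=*/1, /*value=*/1);

        sem_wait(&shm->lock);                 // lock semaphore
        std::strcpy(shm->resource, "hello");  // copy resource to shm
        sem_post(&shm->lock);                 // unlock semaphore
        return 0;
    }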

Related

Do these Boost::Interprocess components require synchronisation?

I am building a multi-producer/single-consumer application that uses IPC, implemented with Boost.Interprocess.
Each producer sends a message by allocating a block inside shared memory (managed_shared_memory::allocate), and marshalling an object into that block. Then, it sends a small object via a message_queue which holds the location (offset) of the block and the size.
The consumer receives this indicator from the queue and unmarshalls the object. The consumer is responsible for deallocating the block of memory.
Based on this implementation, I don't believe the objects or blocks as they exist in memory need synchronisation, because as soon as the consumer knows about them, the producer will no longer touch them. Therefore, I believe it is only the internals of message_queue and managed_shared_memory which need synchronisation.
My question is: Bearing in mind that each process is single-threaded, do allocate/deallocate, and send/receive calls need synchronisation?
The Boost examples provided by the documentation don't make use of synchronisation for the message queue, but I think this is only to simplify the sample source.
I have seen this question, but it is asking about thread-safety, and not about these specific components of Boost.Interprocess.
You do not need to use any kind of locking to protect these operations. They are already protected by a recursive mutex in shared memory; otherwise multiple processes would not be able to operate on the same shared memory block at the same time.
Regarding managed_shared_memory:
One of the features of named/unique allocations/searches/destructions is that they are atomic. Named allocations use the recursive synchronization scheme defined by the internal mutex_family typedef defined of the memory allocation algorithm template parameter (MemoryAlgorithm). That is, the mutex type used to synchronize named/unique allocations is defined by the MemoryAlgorithm::mutex_family::recursive_mutex_type type. For shared memory, and memory mapped file based managed segments this recursive mutex is defined as boost::interprocess::interprocess_recursive_mutex.
This also extends to raw allocation; you can verify this by looking at boost/interprocess/mem_algo/detail/simple_seq_fit.hpp.
For message_queue, boost::interprocess considers this to be a synchronization mechanism in the same way a mutex is; it will take care of all the necessary guarantees, locking its internal data structures and issuing memory barriers as required.
Furthermore, this is all equally applicable to multithreaded programming. Even if you were calling send or allocate from multiple threads in the same program, everything would be fine. The locking boost::interprocess provides protects you from other threads in the same way it protects you from other processes.
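As an illustrative sketch of that guarantee, both sides can call send/receive with no locking of their own. The queue name and the (offset, size) message layout below are assumptions, not the asker's actual code:

    #include <boost/interprocess/ipc/message_queue.hpp>
    #include <cstddef>

    int main() {
        using namespace boost::interprocess;
        // Both processes open the same queue; its internals are protected
        // by interprocess synchronization, so no external lock is needed.
        message_queue mq(open_or_create, "demo_queue",
                         /*max_num_msg=*/100,
                         /*max_msg_size=*/sizeof(std::size_t) * 2);

        // Producer: send the (offset, size) indicator of a marshalled block.
        std::size_t indicator[2] = {4096, 128};  // hypothetical values
        mq.send(indicator, sizeof(indicator), /*priority=*/0);

        // Consumer: receive the indicator.
        std::size_t incoming[2];
        message_queue::size_type recvd_size;
        unsigned int priority;
        mq.receive(incoming, sizeof(incoming), recvd_size, priority);
        return 0;
    }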

Is boost::interprocess threadsafe?

Currently, I have two processes that communicate using message_queue and shared_memory from Boost. Everything works as intended.
Now I need to make one of these processes multi-threaded (thanks to Boost again), and I was wondering whether I need a protection mechanism between the threads (such as a mutex), or whether the boost::interprocess library already provides one.
I did not find any info on that in the Boost documentation. By the way, I'm using Boost 1.40.
Thanks in advance.
The shared resources from boost::interprocess (shared memory, etc.) require that you provide the necessary synchronization. The reason for this is that you may not require synchronization at all, and it is usually a somewhat expensive operation performance-wise.
Say, for example, that you had one process writing the current statistics of something to shared memory in 32-bit integer format, and a few processes reading those values. Since the values are integers (and therefore reads and writes of them are atomic on your platform), and you have a single writer and a few readers, no synchronization is needed for this design.
However, some designs will require synchronization, for example if the above example had multiple writers, or if you were using string data instead of integers. There are various synchronization mechanisms inside Boost (as well as non-Boost ones, but since you're already using Boost), described here:
[Boost info for stable version: 1.48]
http://www.boost.org/doc/libs/1_48_0/doc/html/interprocess/synchronization_mechanisms.html
[Boost info for the version you're using: 1.40]
http://www.boost.org/doc/libs/1_40_0/doc/html/interprocess/synchronization_mechanisms.html
With shared memory it is common practice to place the synchronization mechanism at the base of the shared memory segment, where it can be anonymous (meaning the OS kernel does not provide access to it by name). This way all the processes know how to lock the shared memory segment, and you can associate locks with their segments (if you have multiple, for example).
Remember that a mutex must be unlocked by the same thread of execution (inside a process) that locked it. If you need to lock and unlock a synchronization object from different threads of execution, you need a semaphore.
Please be sure, if you choose to use a mutex, that it is an interprocess mutex (http://www.boost.org/doc/libs/1_48_0/doc/html/boost/interprocess/interprocess_mutex.html) as opposed to the mutex in the Boost thread library, which is for a single process with multiple threads.
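A minimal sketch of that practice, assuming Boost.Interprocess and an illustrative segment name; note that coordinating which process placement-constructs the block is itself a bootstrap problem, glossed over here:

    #include <boost/interprocess/shared_memory_object.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    #include <boost/interprocess/sync/interprocess_mutex.hpp>
    #include <boost/interprocess/sync/scoped_lock.hpp>
    #include <new>

    struct SharedBlock {
        boost::interprocess::interprocess_mutex mutex; // anonymous, at the base
        int stats[4];                                  // the protected data
    };

    int main() {
        using namespace boost::interprocess;
        shared_memory_object shm(open_or_create, "demo_segment", read_write);
        shm.truncate(sizeof(SharedBlock));
        mapped_region region(shm, read_write);

        // The creating process placement-constructs the block; others would
        // just cast region.get_address() (coordination omitted for brevity).
        SharedBlock* block = new (region.get_address()) SharedBlock();

        {
            scoped_lock<interprocess_mutex> lock(block->mutex);
            block->stats[0] += 1; // update under the segment's own lock
        }
        return 0;
    }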
You have to make sure you lock the shared resources.
You can find examples in the Boost documentation. For example:
http://www.boost.org/doc/libs/1_40_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.mutexes.mutexes_scoped_lock

lock freedom/atomic operations across 2 processes instead of threads

I am sharing some data across multiple processes by using shared memory; I use inter-process mutexes to achieve synchronization.
My question is the following: is it possible to use lock-free data structures AND/OR atomic operations to achieve faster synchronization without using mutexes between 2 processes?
If not, do you know what the main reason for this is?
They are used only to synchronize threads of the same process, as far as I know. Are these concepts portable to processes as well? If they aren't, do you know of any faster method to share/synchronize data across processes?
Are these concepts portable to processes as well?
Yes, atomic operations are universal for both threads and processes, provided the memory being operated on atomically is shared.
An atomic operation is a specific processor instruction, and the hardware knows nothing about threads or processes; it is just an all-or-nothing (indivisible) complex of actions (read; compare; store) with a low-level hardware implementation.
So you can set up shared memory between the processes and put an atomic variable into it.
lock-free
Yes, provided the lock-free structure is implemented only with atomics (it should be).
data structures
You should check that the shared memory is mapped to the same address in both processes when it is used to store pointers (in data structures).
If the memory is mapped at different addresses, pointers will be broken in the other process. In that case you need to use relative addresses and do a simple address translation, as sketched below.
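For illustration, a small sketch of such offset-based translation (the Node layout and helper names are made up; Boost.Interprocess automates this with offset_ptr):

    #include <cstdint>

    // Link via offsets from the segment base instead of raw pointers, so
    // links stay valid even if each process maps the segment at a
    // different address. Offset 0 plays the role of a null pointer.
    struct Node {
        std::uint64_t next_offset;
        int value;
    };

    Node* resolve(void* base, std::uint64_t offset) {
        return offset
            ? reinterpret_cast<Node*>(static_cast<char*>(base) + offset)
            : nullptr;
    }

    std::uint64_t to_offset(void* base, Node* node) {
        return node
            ? static_cast<std::uint64_t>(reinterpret_cast<char*>(node)
                                         - static_cast<char*>(base))
            : 0;
    }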
inter-process mutexes
And I should say that glibc > 2.4 (NPTL) uses a futex combined with atomic operations for the uncontended-lock fast path of process-shared mutexes (i.e. inter-process mutexes). So you are already using atomic operations in shared memory.
On x86 with NPTL, most of the synchronization primitives have as their fast path just a single interlocked operation with a full memory barrier. Since x86 platforms don't really have anything lighter than that, they are already about the best you can do. Unless the existing atomic operations do exactly what you need, there will be no performance boost to pay back the cost of using the semantically lighter primitive.
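For example, when the shared state is simple enough that one atomic read-modify-write does the whole job, the mutex can be dropped entirely. A sketch, assuming the counter sits in memory mapped into both processes:

    #include <atomic>
    #include <cstdint>

    struct SharedCounter {
        // Must be lock-free, hence address-free, to work across processes.
        std::atomic<std::uint64_t> value;
    };

    void increment(SharedCounter* c) {
        // On x86 this is a single "lock xadd": the same kind of interlocked
        // operation a futex-based mutex uses for its fast path, but with no
        // separate unlock step.
        c->value.fetch_add(1, std::memory_order_relaxed);
    }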

How do I protect a character string in shared memory between two processes?

I have a piece of shared memory, shared between two processes, that contains a char string and an integer.
Process A writes to it and Process B reads it (and not vice versa).
What is the most efficient and effective way to make sure that Process A doesn't happen to update (write to) it at the same time Process B is reading it? (Should I just use flags in the shared memory, semaphores, a critical section...?)
If you could point me in the right direction, I would appreciate it.
Thanks.
Windows, C++
You cannot use a Critical Section because these can only be used for synchronization between threads within the same process. For inter process synchronization you need to use a Mutex or a Semaphore. The difference between these two is that the former allows only a single thread to own a resource, while the latter can allow up to a maximum number (specified during creation) to own the resource simultaneously.
In your case a Mutex seems appropriate.
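A hedged sketch of the named-mutex approach; the name "Global\\DemoShmMutex" is illustrative and error checking is omitted:

    #include <windows.h>

    int main() {
        // Both processes create/open the same named mutex.
        HANDLE mutex = CreateMutexW(nullptr, FALSE, L"Global\\DemoShmMutex");

        WaitForSingleObject(mutex, INFINITE);   // lock
        // ... Process A writes, or Process B reads, the shared string
        //     and integer here ...
        ReleaseMutex(mutex);                    // unlock

        CloseHandle(mutex);
        return 0;
    }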
Since you have two processes you need a cross-process synchronisation object. I think this means that you need to use a mutex.
A mutex object facilitates protection against data races and allows thread-safe synchronization of data between threads. A thread obtains ownership of a mutex object by calling one of the lock functions and relinquishes ownership by calling the corresponding unlock function.
If you are using Boost.Thread, you can use its mutex and locking; for more, see the link below:
http://www.boost.org/doc/libs/1_47_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_types
Since you're talking about two processes, system-wide mutexes will work, and Windows has those. However, they aren't necessarily the most efficient way.
If you can put more things in shared memory, then passing data via atomic operations on flags in that memory should be the most efficient thing to do. For instance, you might use the Interlocked functions to implement Dekker's Algorithm (you'll probably want to use something like YieldProcessor() to avoid busy waiting).
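As a sketch of the flags-in-shared-memory idea, here is a simple single-producer/single-consumer handoff built from the same Interlocked primitives (simpler than full Dekker's algorithm; the layout is illustrative):

    #include <windows.h>

    struct Shared {
        volatile LONG ready;  // 0 = slot empty, 1 = data valid
        char text[128];
        int number;
    };

    void produce(Shared* s) {
        while (s->ready != 0)          // wait for the consumer to drain it
            YieldProcessor();
        lstrcpynA(s->text, "hello", 128);
        s->number = 42;
        // Full barrier: text/number are globally visible before ready is 1.
        InterlockedExchange(&s->ready, 1);
    }

    bool try_consume(Shared* s, char* out, int* n) {
        if (s->ready != 1)             // nothing published yet
            return false;
        lstrcpynA(out, s->text, 128);
        *n = s->number;
        // Full barrier: the reads above complete before the slot is freed.
        InterlockedExchange(&s->ready, 0);
        return true;
    }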

On MacOSX, in C++, how to do interprocess communication over shared memory without spin lock?

I have two processes, Producer and Consumer, which share a commonly mmap()ed region of memory, Memory.
Now, Producer writes stuff to Memory and Consumer reads stuff from Memory.
I would prefer Consumer not to spin-wait when Memory is empty.
I would prefer Producer not to spin-wait when Memory is full.
How do I achieve this?
How about using mutexes? Since a mutex will sleep until the resource is available, you won't experience the spin-wait problem.
This is the classic producer-consumer (bounded-buffer) problem. If your platform supports it, you could use condition variables shared across multiple processes. With such shared condition variables, your Producer could signal your Consumer to read Memory when data is available, and vice versa when Memory is empty. Remember to check for spurious wakeups.
You'd need to check whether the Mac OS X pthread implementation supports condition variables shared across processes; see my answer to your mutex-related question to find out how. The answer applies to condition variables as well.
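If that platform check passes, the pattern would look roughly like this sketch (PTHREAD_PROCESS_SHARED attributes on both the mutex and the condition variables; the struct layout is illustrative and error checking is omitted):

    #include <pthread.h>

    struct Shared {
        pthread_mutex_t mutex;
        pthread_cond_t  not_empty;
        pthread_cond_t  not_full;
        int count;  // items currently in Memory
    };

    // Run once by whichever process creates the shared segment.
    void init(Shared* s) {
        pthread_mutexattr_t ma;
        pthread_mutexattr_init(&ma);
        pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&s->mutex, &ma);

        pthread_condattr_t ca;
        pthread_condattr_init(&ca);
        pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
        pthread_cond_init(&s->not_empty, &ca);
        pthread_cond_init(&s->not_full, &ca);
        s->count = 0;
    }

    void consumer_wait(Shared* s) {
        pthread_mutex_lock(&s->mutex);
        while (s->count == 0)  // re-check: guards against spurious wakeups
            pthread_cond_wait(&s->not_empty, &s->mutex);
        // ... read from Memory and decrement count ...
        pthread_cond_signal(&s->not_full);  // unblock a waiting Producer
        pthread_mutex_unlock(&s->mutex);
    }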