Do these Boost::Interprocess components require synchronisation?

I am building a multi-producer/single-consumer application using IPC, implemented using Boost.Interprocess.
Each producer sends a message by allocating a block inside shared memory (managed_shared_memory::allocate), and marshalling an object into that block. Then, it sends a small object via a message_queue which holds the location (offset) of the block and the size.
The consumer receives this indicator from the queue and unmarshalls the object. The consumer is responsible for deallocating the block of memory.
Based on this implementation, I don't believe the objects or blocks as they exist in memory need synchronisation, because as soon as the consumer knows about them, the producer will no longer touch them. Therefore, I believe it is only the internals of message_queue and managed_shared_memory which need synchronisation.
My question is: Bearing in mind that each process is single-threaded, do allocate/deallocate, and send/receive calls need synchronisation?
The Boost examples provided by the documentation don't make use of synchronisation for the message queue, but I think this is only to simplify the sample source.
I have seen this question, but it is asking about thread-safety, and not about these specific components of Boost.Interprocess.

You do not need to use any kind of locking to protect these operations. They are already protected by a recursive mutex in shared memory; otherwise, multiple processes would not be able to operate on the same shared memory block at the same time.
Regarding managed_shared_memory:
One of the features of named/unique allocations/searches/destructions
is that they are atomic. Named allocations use the recursive
synchronization scheme defined by the internal mutex_family typedef
defined in the memory allocation algorithm template parameter
(MemoryAlgorithm). That is, the mutex type used to synchronize
named/unique allocations is defined by the
MemoryAlgorithm::mutex_family::recursive_mutex_type type. For shared
memory, and memory mapped file based managed segments this recursive
mutex is defined as boost::interprocess::interprocess_recursive_mutex.
This also extends to raw allocation; you can verify this by looking at boost/interprocess/mem_algo/detail/simple_seq_fit.hpp.
As for message_queue, Boost.Interprocess considers it a synchronization mechanism in the same way a mutex is: it will take care of all the necessary guarantees, locking its internal data structures and issuing memory barriers as required.
Furthermore, this all applies equally to multithreaded programming. Even if you were calling send or allocate from multiple threads in the same program, everything would be fine; the locking boost::interprocess provides protects you from other threads in the same way it protects you from other processes.
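To make that concrete, here is a minimal sketch of the scheme from the question, relying only on the internal locking described above. The names (BlockRef, produce, consume) and the marshalling details are illustrative assumptions, not from the original post; the key point is that the queue carries an offset-based handle, which stays valid in every process, rather than a raw pointer.

    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <boost/interprocess/ipc/message_queue.hpp>
    #include <cstring>

    namespace bip = boost::interprocess;

    // Small indicator object sent through the queue: an offset-based
    // handle into the segment plus the size of the block.
    struct BlockRef {
        bip::managed_shared_memory::handle_t handle;
        std::size_t size;
    };

    // Producer: allocate, marshal, publish. allocate() and send() are
    // internally synchronised, so no extra lock is taken here.
    void produce(bip::managed_shared_memory& segment, bip::message_queue& mq,
                 const void* src, std::size_t size) {
        void* block = segment.allocate(size);
        std::memcpy(block, src, size);
        BlockRef ref{segment.get_handle_from_address(block), size};
        mq.send(&ref, sizeof ref, 0);
    }

    // Consumer: receive the indicator, unmarshal, then free the block.
    // Once the producer has sent the handle it never touches the block
    // again, so the payload itself needs no further synchronisation.
    void consume(bip::managed_shared_memory& segment, bip::message_queue& mq) {
        BlockRef ref;
        bip::message_queue::size_type received = 0;
        unsigned int priority = 0;
        mq.receive(&ref, sizeof ref, received, priority);
        void* block = segment.get_address_from_handle(ref.handle);
        // ... unmarshal ref.size bytes from block ...
        segment.deallocate(block);
    }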

Related

Visibility of change to shared memory from shm_open() + mmap()

Let's say I am on CentOS 7 x86_64 + GCC 7.
I would like to create a ringbuffer in shared memory.
I have two processes, Producer and Consumer, both sharing a named shared memory region created/accessed through shm_open() + mmap().
If Producer writes something like:
    struct Data {
        uint64_t length;
        char data[100];
    };
to the shared memory at a random time, while the Consumer constantly polls the shared memory to read, will I have a synchronization issue where the member length is visible but the member data is still in the process of being written? If yes, what's the most efficient technique to avoid the issue?
I see this post:
Shared-memory IPC synchronization (lock-free)
But I would like a deeper, more low-level understanding of what's required to synchronize between two processes efficiently.
Thanks in advance!
To avoid this, you would want to make the published state std::atomic and access it with acquire-release memory ordering. On most modern processors, the instructions this inserts act as memory fences, which guarantee that all of the writer's stores to the payload become visible before its release store to the flag, and that the reader does not read the payload until after its acquire load has observed that flag.
There are, in addition, locking primitives in POSIX, but the <atomic> header is newer and what you probably want.
What the Standard Says
From [atomics.lockfree], emphasis added:
Operations that are lock-free should also be address-free. That is, atomic operations on the same memory location via two different addresses will communicate atomically. The implementation should not depend on any per-process state. This restriction enables communication by memory that is mapped into a process more than once and by memory that is shared between two processes.
For lockable atomics, the standard says in [thread.rec.lockable.general], emphasis added:
An execution agent is an entity such as a thread that may perform work in parallel with other execution agents. [...] Implementations or users may introduce other kinds of agents such as processes [....]
You will sometimes see the claim that the standard supposedly makes no mention of using the <atomic> primitives with memory shared between processes, only threads. This is incorrect.
However, passing pointers to the other process through shared memory will not work, as the shared memory may be mapped at different parts of the address space in each process, and of course a pointer to any object not in shared memory is right out. Indices and offsets of objects within shared memory will work. (Or, if you really need pointers, Boost provides IPC-safe wrappers such as offset_ptr.)
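As a minimal sketch of the above, assuming C++17, a single producer and a single consumer, and illustrative names (Shared, ready): the struct lives at the start of the region both processes obtained from shm_open() + mmap(), and an acquire-release flag publishes length and data together.

    #include <atomic>
    #include <cstdint>
    #include <cstring>

    struct Shared {
        std::atomic<std::uint64_t> ready;   // 0 = empty, 1 = published
        std::uint64_t length;
        char data[100];
    };
    // Lock-free atomics are address-free, so they work across processes.
    static_assert(std::atomic<std::uint64_t>::is_always_lock_free,
                  "must be lock-free to be address-free between processes");

    // The creating process should placement-new (or zero) the struct once
    // before either loop starts.
    void produce(Shared* s, const char* src, std::uint64_t len) {
        std::memcpy(s->data, src, len);
        s->length = len;
        // Release: every store above becomes visible before ready does.
        s->ready.store(1, std::memory_order_release);
    }

    bool try_consume(Shared* s, char* dst, std::uint64_t* len) {
        // Acquire: if we observe ready == 1, we also observe length and data.
        if (s->ready.load(std::memory_order_acquire) != 1)
            return false;
        *len = s->length;
        std::memcpy(dst, s->data, *len);
        return true;
    }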
Yes, you will ultimately run into data races: not only can length be read before data has been written, but parts of those members can also be observed half-written, out of sync with the reading process.
Although lock-free is the new trend, I'd suggest going for a simpler tool for your first IPC sync job: the semaphore. On Linux, the following man pages will be useful:
sem_init
sem_wait
sem_post
The idea is to have each process signal the other that it is currently reading or writing the shared memory segment. With a semaphore, you can build inter-process mutexes:
Producer:

    while true:
        (opt) create resource
        lock semaphore (sem_wait)
        copy resource to shm
        unlock semaphore (sem_post)

Consumer:

    while true:
        lock semaphore (sem_wait)
        copy resource to local memory, or crunch resource
        unlock semaphore (sem_post)
If, for instance, Producer is writing into shm while Consumer calls sem_wait, Consumer will block until Producer calls sem_post. But you have no guarantee that Producer won't go for another loop, writing twice in a row before Consumer is woken up, so you have to build a mechanism to ensure that Producer and Consumer work alternately; one such scheme is sketched below.
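A classic way to enforce that alternation is a pair of semaphores placed in the shared segment and created with pshared = 1 so that they operate across processes. The layout and names (Shm, empty, full) below are illustrative, not from the original post:

    #include <semaphore.h>
    #include <cstring>

    struct Shm {
        sem_t empty;    // producer may write while this is > 0
        sem_t full;     // consumer may read while this is > 0
        char  data[100];
    };

    void init_once(Shm* s) {                    // run once, before both loops
        sem_init(&s->empty, /*pshared=*/1, 1);  // the single slot starts empty
        sem_init(&s->full,  /*pshared=*/1, 0);  // nothing to read yet
    }

    void produce(Shm* s, const char* msg) {
        sem_wait(&s->empty);                    // blocks while the slot is full
        std::strncpy(s->data, msg, sizeof s->data);
        sem_post(&s->full);                     // hand the slot to the consumer
    }

    void consume(Shm* s, char* out, std::size_t n) {
        sem_wait(&s->full);                     // blocks until something is written
        std::strncpy(out, s->data, n);
        sem_post(&s->empty);                    // give the slot back
    }

With the counts initialised to 1 and 0, the two processes are forced to take turns: a second produce() blocks until the first message has been consumed.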

x64 linux, c++ threaded memory allocation : must i use mutex lock?

Problem: if I use a mutex lock in the threads, allocation slows down significantly, but I get proper allocation and therefore a proper data structure.
If I don't use a mutex lock, the allocation gets done much faster in the threads, but I get a corrupted data structure.
This is closely related to my previous post, which had fully working code too (with improper usage of the mutex lock):
c++ linked list missing nodes after allocation in multiple threads, on x64 linux; why?
I've tried three different allocators, and they all seem to slow down if I use a mutex lock; if I don't, the data structure gets corrupted. Any suggestions?
If multiple threads use a common data structure, e.g., some sort of memory pool, and at least one thread modifies it, you need synchronization of some form. Whether that synchronization is based on atomics, mutexes, or other primitives is a separate question.
The memory allocation mechanisms provided by the standard library (operator new(), malloc(), and the other members of their respective families) are thread-safe, and you don't need to add any synchronization around them. If you create your own allocation resource shared between multiple threads, however, you will have to synchronize access to it, even if it becomes slower as a result.
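As an illustration, here is a minimal sketch of such a self-made shared pool (the class, its API, and the fixed 64-byte block size are hypothetical). Every operation that touches the shared free list must hold the mutex, which is exactly where the slowdown comes from, and removing the lock corrupts the list:

    #include <mutex>
    #include <vector>

    class Pool {
        std::mutex mtx;
        std::vector<void*> free_list;   // shared state: must be protected
    public:
        void* acquire() {
            std::lock_guard<std::mutex> lock(mtx);
            if (free_list.empty())
                return ::operator new(64);   // thread-safe on its own
            void* p = free_list.back();
            free_list.pop_back();
            return p;
        }
        void release(void* p) {
            std::lock_guard<std::mutex> lock(mtx);
            free_list.push_back(p);
        }
    };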

Is boost::interprocess threadsafe?

Currently, I have 2 processes that communicate using message_queue and shared memory from Boost. Everything works as intended.
Now I need to make one of these processes multi-threaded (thanks to Boost again), and I was wondering if I need to use a protection mechanism between the threads (such as a mutex), or if the boost::interprocess library already provides one.
I did not find any info on that in the Boost documentation. By the way, I'm using Boost 1.40.
Thanks in advance.
The shared resources from boost::interprocess (shared memory, etc.) require that you provide the necessary synchronization. The reason for this is that you may not require synchronization at all, and it is usually a somewhat expensive operation performance-wise.
Say, for example, that you had a process that wrote the current statistics of something to shared memory in 32-bit integer format, and a few processes that read those values. Since the values are integers (and therefore reads and writes are atomic on your platform), and you have a single process writing them and a few processes reading them, no synchronization is needed for this design.
However, some designs will require synchronization, for example if the above example had multiple writers, or if you were using string data instead of integers. There are various synchronization mechanisms inside Boost (as well as non-Boost ones, but since you're already using Boost), described here:
[Boost info for stable version: 1.48]
http://www.boost.org/doc/libs/1_48_0/doc/html/interprocess/synchronization_mechanisms.html
[Boost info for version your using: 1.40]
http://www.boost.org/doc/libs/1_40_0/doc/html/interprocess/synchronization_mechanisms.html
With shared memory it is a common practice to place the synchronization mechanism at the base of the shared memory segment, where it can be anonymous (meaning the OS kernel does not provide access to it by name). This way all the processes know how to lock the shared memory segment, and you can associate locks with their segments (if you had multiple, for example).
Remember that a mutex requires the same thread of execution (inside a process) to unlock it that locked it. If you require locking and unlocking a synchronization object from different threads of execution, you need a semaphore.
Please be sure, if you choose to use a mutex, that it is an interprocess mutex (http://www.boost.org/doc/libs/1_48_0/doc/html/boost/interprocess/interprocess_mutex.html) as opposed to the mutex in the Boost thread library, which is for a single process with multiple threads.
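Here is a minimal sketch of that "anonymous mutex at the base of the segment" layout, using a raw shared_memory_object; the names (SharedBlock, demo_shm) are illustrative, and a real program would also have to decide which process constructs the block:

    #include <boost/interprocess/shared_memory_object.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    #include <boost/interprocess/sync/interprocess_mutex.hpp>
    #include <boost/interprocess/sync/scoped_lock.hpp>
    #include <new>

    namespace bip = boost::interprocess;

    struct SharedBlock {
        bip::interprocess_mutex mutex;  // anonymous: found by layout, not by name
        int payload[256];
    };

    int main() {
        bip::shared_memory_object shm(bip::open_or_create, "demo_shm",
                                      bip::read_write);
        shm.truncate(sizeof(SharedBlock));
        bip::mapped_region region(shm, bip::read_write);

        // The creating process should placement-new the block exactly once;
        // the others just use the mapped address.
        SharedBlock* block = new (region.get_address()) SharedBlock;

        {   // Every process locks the shared mutex before touching the payload.
            bip::scoped_lock<bip::interprocess_mutex> lock(block->mutex);
            block->payload[0] = 42;
        }
    }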
You have to make sure you lock the shared resources.
You can find examples in the Boost documentation. For example:
http://www.boost.org/doc/libs/1_40_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.mutexes.mutexes_scoped_lock

what's the advantage of message queue over shared data in thread communication?

I read an article about multithreaded program design, http://drdobbs.com/architecture-and-design/215900465, which recommends "replacing shared data with asynchronous messages. As much as possible, prefer to keep each thread's data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data".
What confuses me is that I don't see the difference between using shared data and using message queues. I am now working on a non-GUI project on Windows, so let's use Windows message queues and take the traditional producer-consumer problem as an example.
Using shared data, there would be a shared container and a lock guarding the container between the producer thread and the consumer thread. When the producer outputs a product, it first waits for the lock, then writes something to the container, then releases the lock.
Using a message queue, the producer can simply call PostThreadMessage without blocking, and this is the async message's advantage. But I think there must be some lock guarding the message queue between the two threads, otherwise the data would definitely get corrupted; the PostThreadMessage call just hides the details. I don't know whether my guess is right, but if it is, the advantage seems to disappear, since both methods do the same thing and the only difference is that the system hides the details when using message queues.
PS: maybe the message queue uses a non-blocking container, but I could use a concurrent container in the former way too. I want to know how the message queue is implemented, and whether there is any performance difference between the two ways.
updated:
I still don't get the concept of an async message if the message queue operations still block somewhere else. Correct me if my guess is wrong: when we use shared containers and locks, we block in our own thread; but when using message queues, my thread returns immediately, and the blocking work is left to some system thread.
Message passing is useful for exchanging smaller amounts of data, because no conflicts need be avoided. It's much easier to implement than is shared memory for intercomputer communication. Also, as you've already noticed, message passing has the advantage that application developers don't need to worry about the details of protections like shared memory.
Shared memory allows maximum speed and convenience of communication, as it can be done at memory speeds within a computer. Shared memory is usually faster than message passing, as message passing is typically implemented using system calls and thus requires the more time-consuming work of kernel intervention. In contrast, in shared-memory systems, system calls are required only to establish the shared-memory region; once established, all accesses are treated as normal memory accesses without extra assistance from the kernel.
Edit: One case where you might want to implement your own queue is when there are lots of messages to be produced and consumed, e.g., a logging system. With the implementation behind PostThreadMessage, the queue capacity is fixed, and messages will most likely get lost if that capacity is exceeded.
Imagine you have 1 thread producing data and 4 threads processing that data (presumably to make use of a multicore machine). If you have one big global pool of data, you are likely to have to lock it whenever any of the threads needs access, potentially blocking the 3 other threads. As you add more processing threads, you increase the chance of a lock having to wait, and increase how many things might have to wait. Eventually, adding more threads achieves nothing because all you do is spend more time blocking.
If instead you have one thread sending messages into message queues, one for each consumer thread, then the consumers can't block each other. You still have to lock each queue between the producer and its consumer thread, but as you have a separate queue (and therefore a separate lock) for each thread, no thread can block all the others while it waits for data.
If you suddenly get a 32 core machine you can add 20 more processing threads (and queues) and expect that performance will scale fairly linearly unlike the first case where the new threads will just run into each other all the time.
I have used a shared memory model where the pointers to the shared memory are managed in a message queue with careful locking. In a sense, this is a hybrid between a message queue and shared memory. It is very useful when large quantities of data must be passed between threads while retaining the safety of the message queue.
The entire queue can be packaged in a single C++ class with appropriate locking and the like. The key is that the queue owns the shared storage and takes care of the locking. Producers acquire a lock for input to the queue and receive a pointer to the next available storage chunk (usually an object of some sort), populate it, and release it. The consumer blocks until the next shared object has been released by the producer; it can then acquire a lock on the storage, process the data, and release it back to the pool. A suitably designed queue can perform multiple-producer/multiple-consumer operations with great efficiency. Think java.util.concurrent.BlockingQueue semantics, but for pointers to storage.
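As a rough sketch of the core of such a class (the name PtrQueue and the exact API are made up for illustration, and the storage-pool side is omitted), the blocking behaviour falls out of a mutex plus a condition variable:

    #include <condition_variable>
    #include <mutex>
    #include <queue>

    template <typename T>
    class PtrQueue {
        std::mutex mtx;
        std::condition_variable cv;
        std::queue<T*> items;           // pointers into shared storage
    public:
        void push(T* p) {
            {
                std::lock_guard<std::mutex> lock(mtx);
                items.push(p);
            }
            cv.notify_one();            // wake one blocked consumer
        }
        T* pop() {                      // blocks until a pointer is available
            std::unique_lock<std::mutex> lock(mtx);
            cv.wait(lock, [this] { return !items.empty(); });
            T* p = items.front();
            items.pop();
            return p;
        }
    };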
Of course there is "shared data" when you pass messages. After all, the message itself is some sort of data. However, the important distinction is when you pass a message, the consumer will receive a copy.
the PostThreadMessage call just hide the details
Yes, it does, but being a WINAPI call, you can be reasonably sure that it does it right.
I still don't get the concept of async message if the message queue operations are still blocked somewhere else.
The advantage is more safety. You have a locking mechanism that is systematically enforced when you are passing a message. You don't even need to think about it, you can't forget to lock. Given that multi-thread bugs are some of the nastiest ones (think of race conditions), this is very important. Message passing is a higher level of abstraction built on locks.
The disadvantage is that passing large amounts of data would probably be slow. In that case, you need to use shared memory.
For passing state (i.e. worker thread reporting progress to the GUI) the messages are the way to go.
It's quite simple (I'm amazed others wrote such lengthy responses!):
Using a message queue system instead of 'raw' shared data means that you have to get the synchronization (locking/unlocking of resources) right only once, in a central place.
With a message-based system, you can think in higher terms of "messages" without having to worry about synchronization issues anymore. For what it's worth, it's perfectly possible that a message queue is implemented using shared data internally.
I think this is the key piece of info there: "As much as possible, prefer to keep each thread’s data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data". I.e. use producer-consumer :)
You can do your own message passing or use something provided by the OS. That's an implementation detail (it needs to be done right, of course). The key is to avoid shared data, as in having the same region of memory modified by multiple threads. This can cause hard-to-find bugs, and even if the code is perfect, it will eat performance because of all the locking.
I had exactly the same question. After reading the answers, I feel that in the most typical use case, queue = async, shared memory (locks) = sync. Indeed, you can make an async version of shared memory, but that's more code, similar to reinventing the message-passing wheel.
Less code = fewer bugs and more time to focus on other stuff.
The pros and cons are already mentioned by previous answers so I will not repeat.

Thread Safe Data and Thread Safe Containers

Hi guys, I want to know: what is the difference between thread-safe data and thread-safe containers?
Thread safe data:
Generally refers to data which is protected using mutexes, semaphores or other similar constructs.
Data is considered thread safe if measures have been put in place to ensure that:
It can be modified from multiple threads in a controlled manner, ensuring the resultant data structure doesn't become corrupt or lead to race conditions in the code.
It can be read in a reliable fashion without the data becoming corrupt during the read process. This is especially important with STL-style containers, which use iterators.
Mutexes generally work by blocking access to other threads while one thread is modifying shared data. This is also known as a critical section, and RAII is a common design pattern used in conjunction with critical sections.
Depending on the CPU type, some primitive data types (e.g. int) and operations (increment) might not need mutex protection (e.g. if they resolve down to an atomic instruction in machine language). However:
It is bad practice to make any assumptions about CPU architecture.
You should always code defensively to ensure code will remain thread-safe regardless of the target platform (see the sketch below).
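For instance, a minimal sketch of that defensive style (the counter names are illustrative): instead of assuming a plain int increment compiles to a single atomic instruction on the target CPU, state the requirement with std::atomic:

    #include <atomic>

    int plain_counter = 0;              // ++ from two threads is a data race,
                                        // whatever the target CPU does
    std::atomic<int> safe_counter{0};   // well-defined from any thread,
                                        // on any architecture

    void worker() {
        // plain_counter++;             // undefined behaviour under contention
        safe_counter.fetch_add(1, std::memory_order_relaxed);
    }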
Thread safe containers:
Are containers which have measures in place to ensure that any changes made to them occur in a thread-safe manner.
For example, a thread safe container may allow items to be inserted or removed using a specific set of public methods which ensure that any code which uses it is thread-safe.
In other words, the container class provides the mutex protection as a service to the caller, and the user doesn't have to roll their own.
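A minimal sketch of that idea, assuming C++17 (for std::optional); the class name and method set are illustrative. The container takes the lock internally and hands back copies rather than iterators, so callers get thread safety as a service:

    #include <mutex>
    #include <optional>
    #include <vector>

    template <typename T>
    class ThreadSafeVector {
        mutable std::mutex mtx;
        std::vector<T> items;
    public:
        void push_back(T value) {
            std::lock_guard<std::mutex> lock(mtx);   // RAII critical section
            items.push_back(std::move(value));
        }
        std::optional<T> pop_back() {                // returns a copy, never
            std::lock_guard<std::mutex> lock(mtx);   // an iterator
            if (items.empty())
                return std::nullopt;
            T value = std::move(items.back());
            items.pop_back();
            return value;
        }
    };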