x64 linux, c++ threaded memory allocation : must i use mutex lock? - c++

problem: if I use a mutex lock in each thread, allocation slows down significantly, but I get proper allocation and therefore a proper data structure.
If I don't use a mutex lock, the allocation gets done much faster across threads, but the data structure ends up corrupted.
This is closely related to my previous post, which also had fully working code (with improper usage of the mutex lock):
c++ linked list missing nodes after allocation in multiple threads, on x64 linux; why?
I've tried three different allocators, and they all slow down if I use a mutex lock; if I don't, the data structure gets corrupted. Any suggestions?

If multiple threads use a common data structure, e.g., some sort of memory pool, and at least one thread modifies that data structure, you need synchronization of some form. Whether the synchronization is based on atomics, mutexes, or other primitives is a separate question.
The memory allocation mechanisms provided by the standard library (operator new(), malloc(), and the other members of their respective families) are thread-safe, and you don't need any additional synchronization to use them. If, however, you allocate from a resource you created yourself that is shared between multiple threads, you will have to synchronize, even if it becomes slower as a result.
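To make the distinction concrete, here is a minimal sketch of a hypothetical custom pool (the class name and layout are illustrative, not from the question's code): because the free list is shared state that every thread mutates, each alloc/free takes the pool's own mutex.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical fixed-size block pool. The free list is shared, mutable
// state, so every alloc()/free() serializes on the pool's mutex --
// slower than lock-free, but the bookkeeping stays consistent.
class PooledAllocator {
public:
    PooledAllocator(std::size_t block_size, std::size_t block_count)
        : storage_(block_size * block_count) {
        for (std::size_t i = 0; i < block_count; ++i)
            free_list_.push_back(storage_.data() + i * block_size);
    }

    void* alloc() {
        std::lock_guard<std::mutex> guard(mtx_);  // serialize bookkeeping
        if (free_list_.empty()) return nullptr;
        void* p = free_list_.back();
        free_list_.pop_back();
        return p;
    }

    void free(void* p) {
        std::lock_guard<std::mutex> guard(mtx_);
        free_list_.push_back(static_cast<char*>(p));
    }

private:
    std::mutex mtx_;
    std::vector<char> storage_;
    std::vector<char*> free_list_;
};
```

Without the lock, two threads can pop the same block from the free list, which is exactly the kind of corruption the question describes.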

Related

C/C++ arrays with threads - do I need to use mutexes or locks?

I am new to using threads and have read a lot about how data is shared and protected, but I still haven't got a good grasp of using mutexes and locks to protect data.
Below is a description of the problem I will be working on. The important thing to note is that it will be time-critical, so I need to reduce overheads as much as possible.
I have two fixed-size double arrays.
The first array will provide data for subsequent calculations.
Threads will read values from it, but it will never be modified. An element may be read at some time by any of the threads.
The second array will be used to store the results of the calculations performed by the threads. An element of this array will only ever be updated by one thread, and probably only once, when the result value is written to it.
My questions then:
Do I really need to use a mutex in a thread each time I access the data from the read-only array? If so, could you explain why?
Do I need to use a mutex in a thread when it writes to the result array even though this will be the only thread that ever writes to this element?
Should I use atomic data types, and will there be any significant time overhead if I do?
Many answers to this type of question seem to be - no, you don't need the mutex if your variables are aligned. Would my array elements in this example be aligned, or is there some way to ensure they are?
The code will be implemented on 64bit Linux. I am planning on using Boost libraries for multithreading.
I have been mulling this over and looking all over the web for days, and once posted, the answer and clear explanations came back in literally seconds. There is an "accepted answer," but all the answers and comments were equally helpful.
Do I really need to use a mutex in a thread each time I access the data from the read-only array? If so could you explain why?
No. Because the data is never modified, there cannot be a synchronization problem.
Do I need to use a mutex in a thread when it writes to the result array even though this will be the only thread that ever writes to this element?
Depends.
If any other thread is going to read the element, you need synchronization.
If any thread may modify the size of the container, you need synchronization (not an issue here, since your arrays are fixed-size).
In any case, take care that different threads don't write into adjacent memory locations a lot; that can destroy performance. See "false sharing". Since you probably don't have many cores (and therefore not many threads), and you say each write is done only once, this is probably not going to be a significant problem, though.
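A common way to sidestep false sharing is to pad each thread's result slot out to its own cache line. A minimal sketch (64 bytes is assumed as the cache-line size, which is typical on x86-64 but not guaranteed):

```cpp
#include <cstddef>

// Each thread writes only its own slot. Padding every slot to a full
// cache line keeps two threads from ping-ponging the same line between
// cores. 64 is a common x86-64 cache-line size (an assumption here).
struct alignas(64) PaddedResult {
    double value;   // the payload; the rest of the 64 bytes is padding
};

constexpr std::size_t kThreads = 4;
PaddedResult results[kThreads];  // one cache line per thread

// alignas(64) forces sizeof to round up to a multiple of the alignment.
static_assert(sizeof(PaddedResult) == 64,
              "each slot occupies a full cache line");
```

The trade-off is memory: each double now occupies 64 bytes instead of 8, which only matters if the array is large.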
Should I use atomic data types, and will there be any significant time overhead if I do?
If you use locks (mutexes), atomic variables are not necessary (and they do have overhead). If you need no synchronization, atomic variables are not necessary. If you need synchronization, then atomic variables can be used to avoid locks in some cases. Exactly when you can use atomics instead of locks is more complicated, and beyond the scope of this question, I think.
Given the description of your situation in the comments, it seems that no synchronization is required at all and therefore no atomics nor locks.
...Would my array elements in this example be aligned, or is there some way to ensure they are?
As pointed out by Arvid, you can request a specific alignment using the alignas keyword, which was introduced in C++11. Pre-C++11, you may resort to compiler-specific extensions: https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gcc/Variable-Attributes.html
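For illustration, a short sketch of the alignas syntax (the type and variable names are made up for the example):

```cpp
#include <cstddef>

// alignas (C++11) requests a minimum alignment for an object or a type.
alignas(16) double buffer[4];   // the array starts on a 16-byte boundary

struct alignas(32) Vec4 {       // every Vec4 object is 32-byte aligned
    double x, y, z, w;
};

// alignof lets you verify the requested alignment at compile time.
static_assert(alignof(Vec4) == 32, "alignas raised the type's alignment");
```

Note that alignas can only raise alignment, never lower it below the type's natural alignment.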
Under the two conditions given, there's no need for mutexes. Remember every use of a mutex (or any synchronization construct) is a performance overhead. So you want to avoid them as much as possible (without compromising correct code, of course).
No. Mutexes are not needed since threads are only reading the array.
No. Since each thread only writes to a distinct memory location, no race condition is possible.
No. There's no need for atomic access to objects here. In fact, using atomic objects could affect the performance negatively as it prevents the optimization possibilities such as re-ordering operations.
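The lock-free pattern described in these answers (threads read a shared input, each writes only its own output slot, and the results are read only after joining) can be sketched like this; the function name is invented for the example:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Each worker writes only results[i] for its own i: no two threads touch
// the same element, and nobody reads results until join(), so no mutexes
// or atomics are needed. join() also provides the necessary ordering.
std::vector<double> compute_squares(const std::vector<double>& input) {
    std::vector<double> results(input.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < input.size(); ++i)
        workers.emplace_back([&results, &input, i] {
            results[i] = input[i] * input[i];  // distinct slot per thread
        });
    for (auto& t : workers) t.join();  // writes happen-before this return
    return results;
}
```

In a real time-critical program you would give each thread a range of elements rather than one thread per element, but the synchronization argument is the same.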
The only time you need to use locks is when data on a shared resource is modified. E.g., if some threads are used to write data and some to read data (in both cases from the same resource), then you only need a lock while the writing is done. This is to prevent what's known as a "race".
There is plenty of good information about races online, for when you write programs that manipulate data on a shared resource.
You are on the right track.
1) For the first (read-only) array, you do not need a mutex lock. Since the threads are only reading, not altering, the data, there is no way a thread can corrupt the data for another thread.
2) I'm a little confused by this question. If you know that thread 1 will only write to array slot 1 and thread 2 will only write to array slot 2, then you do not need a mutex lock. However, I'm not sure how you're achieving this property. If my above statement is not correct for your situation, you would definitely need a mutex lock.
3) Given the definition of atomic:
Atomic types are types that encapsulate a value whose access is guaranteed to not cause data races and can be used to synchronize memory accesses among different threads.
Key note: grabbing and releasing a lock relies on atomic read-modify-write instructions (such as compare-and-swap), so the operation cannot be interrupted halfway through. If grabbing a lock required two separate, non-atomic instructions, the lock would not be thread-safe: if thread 1 attempted to grab the lock and was switched out between the two instructions, thread 2 could grab the lock as well.
Use of atomic data types would decrease your overhead but not significantly.
4) I'm not sure how you can assure your variables are aligned, since threads can switch at any moment in your program (your OS determines when a thread switches).
Hope this helps

Multithreading library that can share pointer data safely?

Basically what I want to achieve is to share a dynamically allocated array of state flags among different threads to control the interactions between threads.
Are there any library that can achieve this flawlessly in Windows OS?
I tried OpenMP, and it gives me all kinds of weird bugs and a lot of headaches; even with omp flush, sometimes the data is still not up to date, and volatile pointers didn't help either when the frequency of accesses is high, so the program becomes very unstable and inconsistent.
Are there any libraries that can better handle a shared, frequently updated and accessed (dynamic) data array? Can TBB handle this situation?
Threads of the same process share the same heap, so memory allocated on this heap can be shared between those threads.
All the program needs to ensure is that such "shared" memory is protected against concurrent access.
The latter can be achieved by using locks, like mutexes.
The common solution is to use mutexes. The basic idea is to wrap any access to a shared variable in a critical section, i.e., a mutex lock:
WaitForSingleObject(mutexHandle, INFINITE);
// shared data access & modification
ReleaseMutex(mutexHandle);
See the Windows API documentation for CreateMutex and WaitForSingleObject, and the related synchronization tutorial.
If you have access to C++11, try using std::atomic<T> types, which let you share primitive types with atomic access semantics.
std::atomic
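As a sketch of the std::atomic suggestion applied to the asker's "dynamically allocated array of state flags" (the function name is invented for the example): each flag is an atomic, so stores and loads can't tear, and the default sequentially-consistent ordering gives the visibility guarantees that volatile does not.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// A dynamically sized array of atomic state flags shared by threads.
// Atomic stores/loads cannot be torn, and (unlike volatile) they come
// with the memory-ordering guarantees needed for inter-thread signaling.
std::vector<int> run_flag_demo(std::size_t n) {
    std::vector<std::atomic<int>> flags(n);  // value-initialized to zero
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n; ++i)
        workers.emplace_back([&flags, i] {
            flags[i].store(1);               // each thread sets its flag
        });
    for (auto& t : workers) t.join();
    std::vector<int> snapshot;
    for (auto& f : flags) snapshot.push_back(f.load());
    return snapshot;
}
```

Note that std::atomic is not copyable, so the vector is sized once up front rather than grown with push_back.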

Do these Boost::Interprocess components require synchronisation?

I am building a multi-producer/single-consumer application using IPC, implemented using Boost.Interprocess.
Each producer sends a message by allocating a block inside shared memory (managed_shared_memory::allocate), and marshalling an object into that block. Then, it sends a small object via a message_queue which holds the location (offset) of the block and the size.
The consumer receives this indicator from the queue and unmarshalls the object. The consumer is responsible for deallocating the block of memory.
Based on this implementation, I don't believe the objects or blocks as they exist in memory need synchronisation, because as soon as the consumer knows about them, the producer will no longer touch them. Therefore, I believe it is only the internals of message_queue and managed_shared_memory which need synchronisation.
My question is: Bearing in mind that each process is single-threaded, do allocate/deallocate, and send/receive calls need synchronisation?
The Boost examples provided by the documentation don't make use of synchronisation for the message queue, but I think this is only to simplify the sample source.
I have seen this question, but it is asking about thread-safety, and not about these specific components of Boost.Interprocess.
You do not need to use any kind of locking to protect these operations. They are already protected by a recursive mutex in shared memory; otherwise multiple processes would not be able to operate on the same shared memory block at the same time.
Regarding managed_shared_memory:
One of the features of named/unique allocations/searches/destructions is that they are atomic. Named allocations use the recursive synchronization scheme defined by the internal mutex_family typedef of the memory allocation algorithm template parameter (MemoryAlgorithm). That is, the mutex type used to synchronize named/unique allocations is defined by the MemoryAlgorithm::mutex_family::recursive_mutex_type type. For shared memory and memory-mapped-file-based managed segments, this recursive mutex is defined as boost::interprocess::interprocess_recursive_mutex.
This also extends to raw allocation, you can verify this by looking at boost/interprocess/mem_algo/detail/simple_seq_fit.hpp.
For message queue, boost::interprocess considers this to be a synchronization mechanism in the same way a mutex is; it will take care of all the necessary guarantees, locking its internal data structures and issuing memory barriers as required.
Furthermore, this is all equally applicable to multithreaded programming. Even if you were calling send or allocate from multiple threads in the same program, everything would be fine. The locking boost::interprocess provides would protect you from other threads in the same way it protects you from other processes.

Is boost::interprocess threadsafe?

Currently, I have 2 processes that communicate using the message_queue and shared_memory from Boost. Everything works as intended.
Now I need to make one of these processes multithreaded (thanks to Boost again), and I was wondering if I need to use a protection mechanism between the threads (such as a mutex), or if the boost::interprocess library already provides one?
I did not find any info on that in the Boost documentation. By the way, I'm using Boost 1.40.
Thanks in advance.
The shared resources from boost::interprocess (shared memory, etc.) require that you provide the necessary synchronization. The reason for this is that you may not require synchronization, and it is usually a somewhat expensive operation performance-wise.
Say for example that you had a process that wrote to shared memory the current statistics of something in 32-bit integer format, and a few processes that read those values. Since the values are integers (and therefore on your platform the reads and writes are atomic) and you have a single process writing them and a few process reading them, no synchronization is needed for this design.
However, in some cases you will require synchronization, e.g., if the above example had multiple writers, or if you were using string data instead of integers. There are various synchronization mechanisms inside of Boost (as well as non-Boost ones, but since you're already using Boost), described here:
[Boost info for stable version: 1.48]
http://www.boost.org/doc/libs/1_48_0/doc/html/interprocess/synchronization_mechanisms.html
[Boost info for version your using: 1.40]
http://www.boost.org/doc/libs/1_40_0/doc/html/interprocess/synchronization_mechanisms.html
With shared memory it is a common practice to place the synchronization mechanism at the base of the shared memory segment, where it can be anonymous (meaning the OS kernel does not provide access to it by name). This way all the processes know how to lock the shared memory segment, and you can associate locks with their segments (if you have multiple segments, for example).
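The "lock at the base of the segment" layout can be sketched without Boost using a process-shared POSIX mutex in an mmap'd region (Boost's interprocess_mutex plays the same role; the struct and function names here are invented for the example, and error handling is omitted):

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// The lock lives at the base of the shared segment, followed by the data,
// so every process that maps the segment knows where to find it.
struct SharedSegment {
    pthread_mutex_t lock;  // must sit at a known offset in every process
    int counter;
};

int run_shared_counter_demo() {
    auto* seg = static_cast<SharedSegment*>(
        mmap(nullptr, sizeof(SharedSegment), PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0));

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&seg->lock, &attr);
    seg->counter = 0;

    if (fork() == 0) {                 // child: increment under the lock
        pthread_mutex_lock(&seg->lock);
        ++seg->counter;
        pthread_mutex_unlock(&seg->lock);
        _exit(0);
    }
    wait(nullptr);                     // parent: wait for the child

    pthread_mutex_lock(&seg->lock);
    int result = seg->counter;
    pthread_mutex_unlock(&seg->lock);
    munmap(seg, sizeof(SharedSegment));
    return result;
}
```

The PTHREAD_PROCESS_SHARED attribute is the crucial detail: without it, the mutex is only valid within the process that created it.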
Remember that a mutex requires the same thread of execution (inside a process) to unlock it that locked it. If you require locking and unlocking a synchronization object from different threads of execution, you need a semaphore.
Please be sure that if you choose to use a mutex that it is an interprocess mutex (http://www.boost.org/doc/libs/1_48_0/doc/html/boost/interprocess/interprocess_mutex.html) as opposed to the mutex in the boost thread library which is for a single process with multiple threads.
You have to make sure you lock the shared resources.
You can find examples in boost documentation. For example:
http://www.boost.org/doc/libs/1_40_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.mutexes.mutexes_scoped_lock

How do I protect a character string in shared memory between two processes?

I have a piece of shared memory that contains a char string and an integer between two processes.
Process A writes to it and Process B reads it (and not vice versa)
What is the most efficient and effective way to make sure that Process A doesn't happen to update (write to it) that same time Process B is reading it? (Should I just use flags in the shared memory, use semaphores, critical section....)
If you could point me in the right direction, I would appreciate it.
Thanks.
Windows, C++
You cannot use a Critical Section because these can only be used for synchronization between threads within the same process. For inter process synchronization you need to use a Mutex or a Semaphore. The difference between these two is that the former allows only a single thread to own a resource, while the latter can allow up to a maximum number (specified during creation) to own the resource simultaneously.
In your case a Mutex seems appropriate.
Since you have two processes you need a cross-process synchronisation object. I think this means that you need to use a mutex.
A mutex object facilitates protection against data races and allows thread-safe synchronization of data between threads. A thread obtains ownership of a mutex object by calling one of the lock functions and relinquishes ownership by calling the corresponding unlock function.
If you are using Boost.Thread, you can use its mutex and locking; for more, see the link below:
http://www.boost.org/doc/libs/1_47_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_types
Since you're talking about two processes, system-wide mutexes will work, and Windows has those. However, they aren't necessarily the most efficient way.
If you can put more things in shared memory, then passing data via atomic operations on flags in that memory should be the most efficient thing to do. For instance, you might use the Interlocked functions to implement Dekker's Algorithm (you'll probably want to use something like YieldProcessor() to avoid busy waiting).
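The flag-passing idea can be sketched with C++11 atomics, which are the portable analogue of the Interlocked functions (the sketch uses threads in one process, since the memory-ordering argument is the same; the names are invented for the example):

```cpp
#include <atomic>
#include <cstring>
#include <string>
#include <thread>

// Single writer / single reader handshake on a flag. The release store
// publishes the string; once the reader's acquire load sees ready == 1,
// the complete string is guaranteed to be visible (no torn reads).
char shared_text[64];
std::atomic<int> ready{0};

void writer() {
    std::strcpy(shared_text, "hello from process A");
    ready.store(1, std::memory_order_release);  // publish after the write
}

std::string reader() {
    while (ready.load(std::memory_order_acquire) == 0)
        std::this_thread::yield();              // avoid hard busy-waiting
    return shared_text;                         // write happened-before this
}
```

With two real processes, shared_text and ready would live in the shared memory segment, and the yield would be the place for something like YieldProcessor() or a short sleep.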