Ways to share a variable among threads - c++

I have a general question about parallel programming in C and C++ and would appreciate it if you could answer it. As far as I know, we can declare a variable at least one level higher (in the parent thread) to share it among child threads. So I was wondering whether there is any other way to share a variable among threads with the same parent thread. Is this API dependent or not?

For Posix threads, read some pthread tutorial.
For C++11, read the documentation of its thread library
All threads of the same process share the same address space in virtual memory. As commented by Marco A. consider also thread_local variables.
Notice that you share data or memory (not variables, which exist only in the source code)
In practice, you had better protect the shared data with a mutex (for synchronization) to avoid data races.
In the simple case, the mutex and the shared data are in some global variables.
You could also use atomic operations.
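As a minimal sketch of the two options just mentioned, the following C++11 code keeps one total behind a global mutex and a second total in a std::atomic. The names g_mutex, g_guarded_total, g_atomic_total and run_workers are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// The "simple case": the shared data and its mutex are global variables.
std::mutex g_mutex;
long g_guarded_total = 0;              // only touched while g_mutex is held
std::atomic<long> g_atomic_total{0};   // atomic, so no mutex is needed

// Hypothetical helper: start `threads` workers that each add 1 to both
// totals `iterations` times, then wait for all of them to finish.
void run_workers(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([iterations] {
            for (int i = 0; i < iterations; ++i) {
                {
                    std::lock_guard<std::mutex> lock(g_mutex);  // released at end of scope
                    ++g_guarded_total;
                }
                g_atomic_total.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& worker : pool) worker.join();
}
```

Calling run_workers(4, 10000) should leave both totals at 40000. Without the lock (or the atomic), the increments would race and the result would be unpredictable.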
BTW, you could also develop a parallel application using some message passing paradigm, e.g. using MPI (or simply using some RPC or other messages, e.g. JSON on sockets). You might consider for regular numerical applications to use the GPGPU e.g. using OpenCL. And of course you might mix all the approaches (using OpenCL, with several threads, and having your parallel software running in several such processes communicating with MPI).
Debugging heavily parallel software can become a nightmare. Performance may depend upon the hardware system and may require tricky tuning. Scalability and synchronization may become a growing concern. Map-reduce is often a useful model.

In C++ and C any memory location (identified by a variable) can be shared among threads. The memory space is the same across all threads. There is no parent/child thread relationship with memory.
The challenge is to control or synchronize access to the memory location among the threads.
That is implementation dependent.

Any global variable is sharable among threads, since threads are light weight processes sharing the same address space. For synchronization, you need to ensure mutual exclusion while updating/accessing those global variables through semaphores or wait notify blocks.

Related

POSIX Shared Memory Sync Across Processes C++/C++11

Problem (in short):
I'm using POSIX shared memory and so far have only used POSIX semaphores, and I need to control multiple readers and multiple writers. I need help with what variables/methods I can use to control access within the limitations described below.
I've found an approach that I want to implement, but I'm unsure what methodology I can use to implement it with POSIX shared memory.
What I've Found
https://stackoverflow.com/a/28140784
This link has the algorithm I'd like to use, but I'm unsure how to implement it with shared memory. Do I store the class in shared memory somehow? This is where I need help, please.
The reason I'm unsure is that a lot of my research points towards keeping shared memory to primitives only to avoid addressing problems, and STL objects can't be used.
NOTE:
For all my multi-threading I'm using C++11 features. The shared memory will be used by completely separate program executables using C++11 std::thread, and any thread of any process/executable may want access. I have avoided Linux pthreads for all of my multi-threading and will continue to do so (unless it's just a control variable, not actual pthreads).
Solution Parameters aimed for
Must be shareable between 2+ processes which will be running multiple C++11 std::thread that may wish access. I.e. Multiple Writers (exclusive one at a time) while allowing multiple simultaneous readers when no writer wants access.
Not using Boost libraries. Ideally native C++11 or built-in Linux libraries, something that will work without the need to install additional libraries.
Not using pThread actual threads but could use some object from there that will work with C++11 std::thread.
Ideally can handle a process crash while in operation. E.g. Using POSIX semaphore if a process crashes while it has the semaphore, everyone is screwed. I have seen people using file locks?
Thanks in advance
keeping shared memory to primitives only to avoid addressing problems
You can use pointers in and to shared memory objects across programs, so long as the memory is mmapped to the same address in each of them. This is actually a straightforward proposition, especially on 64-bit systems. See this open source C library I wrote for implementation details: rszshm - resizable pointer-safe shared memory.
Using POSIX semaphore if a process crashes while it has the semaphore, everyone is screwed.
If you want to use OS-mediated semaphores, the SysV semaphores have SEM_UNDO, which recovers in this case. OTOH pthread offers robust mutexes that can be embedded and shared in shared memory. These can be used to build more sophisticated mechanisms.
The SysV scheme of providing multiple semaphores in a semaphore set, where a group of actions must all succeed or the call blocks, permits building sophisticated mechanisms too. A read/write lock can be made with a set of three semaphores.
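The robust-mutex option can be sketched as follows. This assumes Linux with glibc, where pthread_mutexattr_setrobust and pthread_mutex_consistent are available. SharedBlock, create_shared_block, lock_block and unlock_block are hypothetical names, and the anonymous mapping stands in for a region that separate executables would obtain with shm_open plus mmap.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <sys/mman.h>

// Hypothetical layout: the mutex lives in the shared region next to its data.
struct SharedBlock {
    pthread_mutex_t mutex;
    int value;
};

// Map a shared region and initialise the mutex as process-shared and robust.
// Returns nullptr on failure.
SharedBlock* create_shared_block() {
    void* mem = mmap(nullptr, sizeof(SharedBlock), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    SharedBlock* block = static_cast<SharedBlock*>(mem);
    block->value = 0;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(&block->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(mem, sizeof(SharedBlock));
        return nullptr;
    }
    return block;
}

// Lock the block. If the previous owner died while holding the mutex, the
// lock is still acquired; it is then marked consistent so it stays usable.
// Returns true if the caller now holds the lock.
bool lock_block(SharedBlock* block, bool* owner_died = nullptr) {
    assert(block != nullptr);
    int rc = pthread_mutex_lock(&block->mutex);
    if (owner_died) *owner_died = (rc == EOWNERDEAD);
    if (rc == EOWNERDEAD) {
        // The guarded data may be half-updated here; repair it before use.
        rc = pthread_mutex_consistent(&block->mutex);
    }
    return rc == 0;
}

void unlock_block(SharedBlock* block) {
    pthread_mutex_unlock(&block->mutex);
}
```

A caller that sees owner_died set to true holds the lock but should treat block->value as suspect, since the crashed process may have stopped in the middle of an update.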

lock freedom/atomic operations across 2 processes instead of threads

I am sharing some data across multiple processes by using shared memory; I use inter processes mutexes to achieve synchronization.
My question is the following: is it possible to use lock-free data structures AND/OR atomic operations to achieve faster synchronization without using mutexes between 2 processes?
If not do you know what is the main reason for this?
They are used only to synchronize threads of the same process. Are these concepts portable to processes as well? If they aren't do you know any faster method to share/synchronize data across processes?
Are these concepts portable to processes as well?
Yes, atomic operations work the same for threads and for processes, if and only if the memory that is accessed atomically is shared.
An atomic operation is a specific instruction of the processor itself, and it knows nothing about threads or processes; it is just an all-or-nothing (indivisible) sequence of actions (read; compare; store) with a low-level hardware implementation.
So, you can set up shared memory between processes and put an atomic variable into it (in user space, for example a lock-free std::atomic<int> rather than the kernel's atomic_t).
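A minimal sketch of that idea on a POSIX system: a lock-free std::atomic<int> is placed in an anonymous shared mapping, and a parent and a forked child both increment it. The function name count_across_processes is made up, and fork is used only to keep the example in one file; unrelated processes would map a named region instead.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// A lock-based std::atomic would hide a process-private lock, so insist on
// the lock-free case before sharing it between processes.
static_assert(ATOMIC_INT_LOCK_FREE == 2, "std::atomic<int> must be lock-free");

// Hypothetical demo: parent and child each add 1 to the same counter
// `iterations` times. Returns the final count, or -1 on failure.
int count_across_processes(int iterations) {
    assert(iterations >= 0);
    void* mem = mmap(nullptr, sizeof(std::atomic<int>), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    auto* counter = new (mem) std::atomic<int>(0);  // construct in shared memory

    pid_t child = fork();
    if (child < 0) {
        munmap(mem, sizeof(std::atomic<int>));
        return -1;
    }
    if (child == 0) {  // child process
        for (int i = 0; i < iterations; ++i) counter->fetch_add(1);
        _exit(0);
    }
    for (int i = 0; i < iterations; ++i) counter->fetch_add(1);  // parent process

    int status = 0;
    waitpid(child, &status, 0);  // wait until the child has finished
    int total = counter->load();
    munmap(mem, sizeof(std::atomic<int>));
    return total;
}
```

Because every increment is a single atomic read-modify-write on shared memory, no updates are lost, and the returned total should be twice the iteration count.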
lock-free
Yes, if the lock-free structure is implemented only with atomic operations (which it should be).
data structures
You should check that the shared memory is mapped to the same address in both processes when it is used to store pointers (in data structures).
If the memory is mapped to a different address, the pointers will be broken in the other process. In this case you need to use relative addresses and do a simple address translation.
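One way to picture relative addressing is a linked list whose nodes store byte offsets from the start of the region instead of raw pointers. The sketch below uses made-up names (Node, node_at, build_list, sum_list) and assumes the region's base address is suitably aligned for Node.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical node stored inside the shared region. The link is a byte
// offset from the start of the region, so it stays valid at any mapping address.
struct Node {
    int value;
    std::size_t next_offset;  // 0 means "no next node"
};

// Translate an offset into a pointer valid in this process's mapping.
inline Node* node_at(void* base, std::size_t offset) {
    return offset == 0
        ? nullptr
        : reinterpret_cast<Node*>(static_cast<char*>(base) + offset);
}

// Build the list 1 -> 2 -> 3 inside the region and return the head offset.
// Offset 0 is reserved as the null link, so nodes start at sizeof(Node).
std::size_t build_list(void* base, std::size_t region_size) {
    assert(region_size >= 4 * sizeof(Node));
    for (std::size_t i = 1; i <= 3; ++i) {
        std::size_t next = (i < 3) ? (i + 1) * sizeof(Node) : 0;
        new (static_cast<char*>(base) + i * sizeof(Node))
            Node{static_cast<int>(i), next};
    }
    return sizeof(Node);
}

// Walk the list using only the base address of the local mapping.
int sum_list(void* base, std::size_t head_offset) {
    int sum = 0;
    for (Node* n = node_at(base, head_offset); n != nullptr;
         n = node_at(base, n->next_offset)) {
        sum += n->value;
    }
    return sum;
}
```

Each process passes its own base address to sum_list, so the same bytes give the same list even when the region is mapped at different addresses.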
inter processes mutexes
And I should say that glibc>2.4 (NPTL) uses futex combined with atomic operations for non-contended lock (for Process shared mutexes = inter process mutexes). So, you already use atomic operations in shared memory.
On x86 with NPTL, most of the synchronization primitives have as their fast path just a single interlocked operation with a full memory barrier. Since x86 platforms don't really have anything lighter than that, they are already about the best you can do. Unless the existing atomic operations do exactly what you need to do, there will be no performance boost to pay back the costs of using the semantically lighter primitive.

Is there a disadvantage to using boost::interprocess::interprocess_semaphore within a single multithreaded c++ process?

The disadvantage would be in comparison to a technique that was specialized to work on threads that are running within the same process. For example, does wait/post cause the whole process to yield, rather than just the executing thread, even though anyone waiting for a post would be within the same process?
The semaphore would be used, for example, to solve a producer/consumer problem in a shared buffer between two threads in the same process.
Are there any reasonable alternatives?
Use Boost.Thread condition variables as shown here. The accompanying article has a good summary of Boost.Thread features.
Using interprocess semaphores will work but it's likely to place a tax on your execution due to use of unnecessarily heavyweight underlying OS locking primitives (named kernel objects in Windows, for example).
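For illustration, here is a minimal producer/consumer sketch using the C++11 standard library equivalents (std::mutex and std::condition_variable) rather than Boost.Thread itself. BoundedBuffer and produce_and_consume are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// A fixed-capacity buffer shared by threads of one process.
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the buffer is full.
    void put(int item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(item);
        not_empty_.notify_one();
    }

    // Blocks while the buffer is empty.
    int take() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        int item = items_.front();
        items_.pop_front();
        not_full_.notify_one();
        return item;
    }

private:
    std::size_t capacity_;
    std::deque<int> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// Hypothetical demo: a producer thread sends 1..n through a 4-slot buffer
// while the calling thread consumes the values and sums them.
long produce_and_consume(int n) {
    BoundedBuffer buffer(4);
    std::thread producer([&buffer, n] {
        for (int i = 1; i <= n; ++i) buffer.put(i);
    });
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += buffer.take();
    producer.join();
    return sum;
}
```

A waiting thread blocks only itself; the other threads of the process keep running, which addresses the concern raised in the question.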

Do pthread mutexes work across threads if in shared memory?

I found this:
Fast interprocess synchronization method
I used to believe that a pthread mutex can only be shared between two threads in the same address space.
The question / answers there seem to imply:
If I have two separate processes A & B. They have a shared memory region M. I can put a pThread mutex in M, lock in A, lock in B, unlock in A; and B will no longer block on the mutex. Is this correct? Can pThread mutexes be shared in two separate processes?
Edit: I'm using C++, on MacOSX.
You need to tell the mutex to be process-shared when it's inited:
http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_mutexattr_setpshared.html
Note in particular, "The default value of the attribute is PTHREAD_PROCESS_PRIVATE", meaning that accessing it from different processes is undefined behaviour.
If your C/pthread library is conforming, you should be able to tell whether it supports mutexes shared across multiple processes by checking whether the _POSIX_THREAD_PROCESS_SHARED feature test macro is defined to a value other than -1, or by querying the system configuration at run time using sysconf(_SC_THREAD_PROCESS_SHARED) if that feature test macro is undefined.
EDIT: As Steve pointed out, you'll need to explicitly configure the mutex for sharing across processes assuming the platform supports that feature as I described above.
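Both points (checking for support and configuring the mutex at initialisation) can be sketched as below. The helper names process_shared_supported and init_process_shared_mutex are made up; the caller is assumed to have placed the pthread_mutex_t in memory that every participating process maps.

```cpp
#include <cassert>
#include <pthread.h>
#include <unistd.h>

// Report whether process-shared mutexes are available, using the feature
// test macro when it gives a definite answer and sysconf() otherwise.
bool process_shared_supported() {
#if defined(_POSIX_THREAD_PROCESS_SHARED) && _POSIX_THREAD_PROCESS_SHARED > 0
    return true;
#else
    return sysconf(_SC_THREAD_PROCESS_SHARED) > 0;
#endif
}

// Initialise `mutex` with the PTHREAD_PROCESS_SHARED attribute instead of the
// default PTHREAD_PROCESS_PRIVATE. Returns 0 on success or an error number.
int init_process_shared_mutex(pthread_mutex_t* mutex) {
    assert(mutex != nullptr);
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0) rc = pthread_mutex_init(mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

Only one process should run the initialisation; the others simply map the region and call pthread_mutex_lock and pthread_mutex_unlock on the same object.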
I was concerned that there might be a condition where a mutex in shared memory might fail to behave properly, so I did some digging and came up with some documents which treat the issue like a no-brainer:
https://computing.llnl.gov/tutorials/pthreads/
Further digging, however, showed that older versions of glibc suffered issues in shared memory mutexes: (This is an ancient change, but it illustrates the point.)
in linuxthreads/mutex.c
int __pthread_mutexattr_setpshared(...) {
    /* For now it is not possible to shared a conditional variable. */
    if (pshared != PTHREAD_PROCESS_PRIVATE)
        return ENOSYS;
}
Without more detail on what implementation of pthread you're using, it's difficult to say whether you're safe or not.
My cause for concern is that many implementations (and some entire languages, like Perl, Python, and Ruby) have a global lock object that manages access to shared objects. That object would not be shared between processes and therefore, while your mutexes would probably work most of the time, you might find two processes manipulating the mutex at the same time.
I know that this flies in the face of the definition of a mutex but it is possible:
If two threads are operating at the same time in different processes, it implies that they are on different cores. Both acquire their global lock object and go to manipulate the mutex in shared memory. If the pthread implementation forces the update of the mutex through the caches, both threads could end up updating at the same time, both thinking they hold the mutex. This is just a possible failure vector that comes to mind. There could be any number of others. What are the specifics of your situation - OS, pthreads version, etc.?

Thread communication theory

What is the common theory behind thread communication? I have some primitive idea about how it should work but something doesn't settle well with me. Is there a way of doing it with interrupts?
Really, it's just the same as any concurrency problem: you've got multiple threads of control, and it's indeterminate which statements on which threads get executed when. That means there are a large number of POTENTIAL execution paths through the program, and your program must be correct under all of them.
In general the place where trouble can occur is when state is shared among the threads (aka "lightweight processes" in the old days). That happens when there are shared memory areas.
To ensure correctness, what you need to do is ensure that these data areas get updated in a way that can't cause errors. To do this, you need to identify "critical sections" of the program, where sequential operation must be guaranteed. Those can be as little as a single instruction or line of code; if the language and architecture ensure that these are atomic, that is, can't be interrupted, then you're golden.
Otherwise, you identify that section, and put some kind of guards onto it. The classic way is to use a semaphore, which is an atomic statement that only allows one thread of control past at a time. These were invented by Edsger Dijkstra, and so have names that come from the Dutch, P and V. When you come to a P, only one thread can proceed; all other threads are queued and waiting until the executing thread comes to the associated V operation.
Because these primitives are a little primitive, and because the Dutch names aren't very intuitive, there have been some other larger-scale approaches developed.
Per Brinch Hansen invented the monitor, which is basically just a data structure that has operations which are guaranteed atomic; they can be implemented with semaphores. Monitors are pretty much what Java synchronized statements are based on; they make an object or code block have that particular behavior -- that is, only one thread can be "in" them at a time -- with simpler syntax.
There are other models possible. Haskell and Erlang solve the problem by being functional languages that never allow a variable to be modified once it's created; this means they naturally don't need to worry about synchronization. Some new languages, like Clojure, instead have a structure called "transactional memory", which basically means that when there is an assignment, you're guaranteed the assignment is atomic and reversible.
So that's it in a nutshell. To really learn about it, the best places to look are operating systems texts, e.g. Andy Tanenbaum's text.
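To make P and V concrete, here is a minimal sketch of a counting semaphore built from a C++11 mutex and condition variable. The class name Semaphore and the demo function guarded_increments are made up, and unlike the classic description this sketch makes no promise about the order in which waiting threads are released.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) { assert(initial >= 0); }

    // P: wait until the count is positive, then take one unit.
    void P() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // V: give one unit back and wake one waiting thread, if any.
    void V() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        available_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable available_;
    int count_;
};

// Hypothetical demo: a semaphore with initial count 1 guards a critical
// section, so only one thread at a time runs the increment.
int guarded_increments(int threads, int iterations) {
    Semaphore gate(1);
    int shared = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&gate, &shared, iterations] {
            for (int i = 0; i < iterations; ++i) {
                gate.P();   // enter the critical section
                ++shared;
                gate.V();   // leave the critical section
            }
        });
    }
    for (auto& worker : pool) worker.join();
    return shared;
}
```

With four threads and 5000 iterations each, guarded_increments should return 20000, because no two increments can overlap.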
The two most common mechanisms for thread communication are shared state and message passing.
The most common way for threads to communicate is via some shared data structure, typically a queue. Some threads put information into the queue while others take it out. The queue must be protected by operating system facilities such as mutexes and semaphores. Interrupts have nothing to do with it.
If you're really interested in a theory of thread communications, you may want to look into formalisms like the pi Calculus.
To communicate between threads, you'll need to use whatever mechanism is supplied by your operating system and/or runtime. Interrupts would be unusually low level, although they might be used implicitly if your threads communicate using sockets or named pipes.
A common pattern would be to implement shared state using a shared memory block, relying on an OS-supplied synchronization primitive such as a mutex to spare you from busy-waiting when you read from the block. Remember that if you have threads at all, then you must have some kind of scheduler already (whether it's native from the OS or emulated in your language runtime). So this scheduler can provide synchronization objects and a "sleep" function without necessarily having to rely on hardware support.
Sockets, pipes, and shared memory work between processes too. Sometimes a runtime will give you a lighter-weight way of doing synchronization for threads within the same process. Shared memory is cheaper within a single process. And sometimes your runtime will also give you an atomic message-passing mechanism.