Thread communication theory - C++

What is the common theory behind thread communication? I have some primitive idea about how it should work but something doesn't settle well with me. Is there a way of doing it with interrupts?

Really, it's just the same as any concurrency problem: you've got multiple threads of control, and it's indeterminate which statements on which threads get executed when. That means there are a large number of POTENTIAL execution paths through the program, and your program must be correct under all of them.
In general, the place where trouble can occur is when state is shared among the threads (aka "lightweight processes" in the old days). That happens when there are shared memory areas that more than one thread can read and update.
To ensure correctness, what you need to do is ensure that these data areas get updated in a way that can't cause errors. To do this, you need to identify "critical sections" of the program, where sequential operation must be guaranteed. Those can be as little as a single instruction or line of code; if the language and architecture ensure that these are atomic, that is, can't be interrupted, then you're golden.
Otherwise, you identify that section and put some kind of guard onto it. The classic way is to use a semaphore, which is an atomic primitive that only allows one thread of control past at a time. These were invented by Edsger Dijkstra, and so have names that come from the Dutch, P and V. When you come to a P, only one thread can proceed; all other threads are queued and wait until the executing thread comes to the associated V operation.
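For illustration, a minimal sketch of P and V using C++20's std::binary_semaphore (the worker/counter setup is invented for the example): acquire() plays the role of P and release() the role of V.

```cpp
// A minimal sketch: std::binary_semaphore as Dijkstra's semaphore.
// acquire() is P (wait), release() is V (signal).
#include <semaphore>
#include <thread>
#include <vector>
#include <iostream>

std::binary_semaphore gate{1};  // initially open: one thread may enter
int shared_counter = 0;         // the state the critical section protects

void worker() {
    for (int i = 0; i < 1000; ++i) {
        gate.acquire();   // P: blocks until no other thread is inside
        ++shared_counter; // critical section: guaranteed sequential
        gate.release();   // V: lets one queued thread proceed
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
    std::cout << shared_counter << '\n';  // always 4000
}
```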
Because these primitives are a little primitive, and because the Dutch names aren't very intuitive, there have been some other larger-scale approaches developed.
Per Brinch Hansen invented the monitor, which is basically just a data structure that has operations which are guaranteed atomic; they can be implemented with semaphores. Monitors are pretty much what Java synchronized statements are based on; they make an object or code block have that particular behavior -- that is, only one thread can be "in" them at a time -- with simpler syntax.
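A minimal C++ sketch of the monitor idea (the Counter class is invented for illustration): every operation locks the same mutex, so only one thread can be "in" the object at a time, much like a Java synchronized block.

```cpp
// A monitor-like object in C++: all operations share one mutex,
// so at most one thread is ever "inside" the object.
#include <mutex>

class Counter {
    mutable std::mutex m_;
    int value_ = 0;
public:
    void increment() {
        std::lock_guard<std::mutex> lock(m_);  // enter the monitor
        ++value_;
    }                                          // leave on scope exit
    int get() const {
        std::lock_guard<std::mutex> lock(m_);
        return value_;
    }
};
```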
There are other models possible. Haskell and Erlang solve the problem by being functional languages that never allow a variable to be modified once it's created; this means they naturally don't need to worry about synchronization. Some new languages, like Clojure, instead have a structure called "transactional memory", which basically means that when there is an assignment, you're guaranteed the assignment is atomic and reversible.
So that's it in a nutshell. To really learn about it, the best places to look are operating systems texts, e.g., Andy Tanenbaum's text.

The two most common mechanisms for thread communication are shared state and message passing.

The most common way for threads to communicate is via some shared data structure, typically a queue. Some threads put information into the queue while others take it out. The queue must be protected by operating system facilities such as mutexes and semaphores. Interrupts have nothing to do with it.
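For illustration, a minimal sketch of such a protected queue in C++11 (the BlockingQueue name and interface are assumptions for the example): a mutex guards the queue, and a condition variable lets consumers sleep until something arrives.

```cpp
// A queue protected by a mutex, with a condition variable so that
// consumers sleep instead of busy-waiting.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class BlockingQueue {
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
public:
    void put(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(item));
        }                      // release the lock before waking a consumer
        cv_.notify_one();
    }
    T take() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });  // sleeps if empty
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }
};
```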

If you're really interested in a theory of thread communications, you may want to look into formalisms like the pi calculus.

To communicate between threads, you'll need to use whatever mechanism is supplied by your operating system and/or runtime. Interrupts would be unusually low level, although they might be used implicitly if your threads communicate using sockets or named pipes.
A common pattern would be to implement shared state using a shared memory block, relying on an OS-supplied synchronization primitive such as a mutex to spare you from busy-waiting when you read from the block. Remember that if you have threads at all, then you must have some kind of scheduler already (whether it's native from the OS or emulated in your language runtime). So this scheduler can provide synchronization objects and a "sleep" function without necessarily having to rely on hardware support.
Sockets, pipes, and shared memory work between processes too. Sometimes a runtime will give you a lighter-weight way of doing synchronization for threads within the same process. Shared memory is cheaper within a single process. And sometimes your runtime will also give you an atomic message-passing mechanism.

Related

Cancelling arbitrary jobs running in a thread_pool

Is there a way for a thread-pool to cancel a task underway? Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Extra
Bonus if the cancellation is immediate, but it's acceptable if the cancellation has some time-constraint 'guarantees' (say, cancellation within 0.1 seconds of execution time of the thread in question, for example)
More details
I am not restricted to using Boost.Thread.thread_pool or any specific library. The only limitations are compatibility with C++14 and the ability to work on at least BSD- and Linux-based OSes.
The tasks are usually data-processing related, pre-compiled and loaded dynamically using C-API (extern "C") and thus are opaque entities. The aim is to perform compute intensive tasks with an option to cancel them when the user sends interrupts.
While launching, the thread_id for a specific task is known, and thus some API can be used to find more details if required.
Disclaimer
I know using native thread handles to cancel/exit threads is not recommended and is a sign of bad design. I also can't modify the functions to use boost::this_thread::interruption_point(), but I can wrap them in lambdas/other constructs if that helps. I feel like this is a rock-and-a-hard-place situation, so alternate suggestions are welcome, but they need to be minimally intrusive to existing functionality, and can be dramatic in their scope for the feature-set being discussed.
EDIT:
Clarification
I guess this should have gone in the 'More Details' section, but I want it to remain separate to show that the existing 2 answers are based on limited information. After reading the answers, I went back to the drawing board and came up with the following "constraints", since the question I posed was overly generic. If I should post a new question, please let me know.
My interface promises a "const" input (functional programming style non-mutable input) by using mutexes/copy-by-value as needed and passing by const& (and expecting thread to behave well).
I also misused the term "arbitrary", since the jobs aren't arbitrary (empirically speaking) and have the following constraints:
some, which download from the internet, already use a condition variable
they do not violate const correctness
they can spawn other threads, but those must not outlast the parent
they can use mutexes, but those can't exist outside the function body
output is via an atomic<shared_ptr> passed as an argument
they are pure functions (no shared state with the outside) **
** can be a lambda binding a functor, in which case the function needs to make sure its data structures aren't corrupted (which is usually the case, as the state is one or two atomic<built-in type> values). Usually the internal state is queried from an external DB (an architecture similar to cookie + web server, where the tab/browser can be closed at any time)
These constraints aren't written down as a contract or anything, but rather I generalized based on the "modules" currently in use. The jobs are arbitrary in terms of what they can do: GPU/CPU/internet all are fair play.
It is infeasible to insert a periodic check because of heavy library usage. The libraries (not owned by us) haven't been designed to periodically check a condition variable since it'd incur a performance penalty for the general case and rewriting the libraries is not possible.
Is there a way for a thread-pool to cancel a task underway?
Not at that level of generality, no, and also not if the task running in the thread is implemented natively and arbitrarily in C or C++. You cannot terminate a running task prior to its completion without terminating its whole thread, except with the cooperation of the task.
Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
No. The only way to get (approximately) on-demand preemption of a specific thread is to deliver a signal to it (one that it is not blocking or ignoring) via pthread_kill(). If such a signal terminates the thread but not the whole process, then it does not automatically make any provision for freeing allocated objects or managing the state of mutexes or other synchronization objects. If the signal does not terminate the thread, then the interruption can produce surprising and unwanted effects in code not designed to accommodate such signal usage.
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Note that pthread_cancel() can be blocked by the thread, and that even when not blocked, its effects may be deferred indefinitely. When the effects do occur, they do not necessarily include memory or synchronization-object cleanup. You need the thread to cooperate with its own cancellation to achieve these.
Just what a thread's cooperation with cancellation looks like depends in part on the details of the cancellation mechanism you choose.
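For illustration, here is one minimal form such cooperation can take under C++14 (matching the question's constraint; the names are invented for the sketch): the task polls an atomic flag at points it controls. As the question notes, this is exactly what opaque tasks cannot do without wrapping or modification.

```cpp
// Cooperative cancellation with C++14 facilities: the task checks an
// atomic flag at points it controls and cleans up before returning.
#include <atomic>
#include <thread>

std::atomic<bool> cancel_requested{false};

void task() {
    while (!cancel_requested.load(std::memory_order_relaxed)) {
        // ... one bounded unit of work ...
    }
    // release resources and locks here, then return normally
}

int main() {
    std::thread t(task);
    cancel_requested.store(true);  // any thread may request cancellation
    t.join();                      // completes once task() observes the flag
}
```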
Cancelling a non-cooperative component that was not designed to be cancelled is only possible if that component has limited, constrained, managed interactions with the rest of the system:
the resources owned by the component should be managed externally (the system knows which component uses which resources)
all accesses should be indirect
the modifications of shared resources should be safe and reversible until completion
That would allow the system to clean up resources, stop operations, and cancel incomplete changes...
None of these properties are cheap, and threads have exactly the opposite properties.
Threads have only an implied concept of ownership, apparent only while the thread is running: for a deleted thread, determining what it owned is not possible.
Threads access shared objects directly. A thread can start modifying shared objects; if it is cancelled in the middle of an operation, such modifications are left partial, ineffective, or incoherent.
Cancelled threads could leave locked mutexes around. At least subsequent accesses to these mutexes by other threads trying to access the shared object would deadlock.
Or they might find some data structure in a bad state.
Providing safe cancellation for arbitrary non cooperative threads is not doable even with very large scale changes to thread synchronization objects. Not even by a complete redesign of the thread primitives.
You would have to make threads almost like full processes to be able to do that, but then they wouldn't be called threads!

How do functional languages handle shared state data?

I've been learning about functional programming and see that it can certainly make parallelism easier to handle, but I do not see how it makes handling shared resources easier. I've seen people talk about variable immutability being a key factor, but how does that help two threads accessing the same resource? Say two threads are adding a request to a queue. They both get a copy of the queue, make a new copy with their request added (since the queue is immutable), and then return the new queue. The first request to return will be overridden by the second, as the copies of the queue each thread got did not have the other thread's request present. So I assume there is a locking mechanism a la mutex available in functional languages? How then does that differ from an imperative approach to the problem? Or do practical applications of functional programming still require some imperative operations to handle shared resources?
As soon as your global data can be updated, you're breaking the pure functional paradigm. In that sense, you need some sort of imperative structure. However, this is important enough that most functional languages offer a way to do it, and you need it to be able to communicate with the rest of the world anyway. (The most complicated formal one is the IO monad of Haskell.) Apart from simple bindings for some other synchronization library, they would probably try to implement a lock-free, wait-free data structure if possible.
Some approaches include:
Data that is written only once and never altered can be accessed safely with no locks or waiting on most CPUs. (There is typically a memory fence instruction to ensure that the memory updates in the right order for both the producer and the consumer.)
Some data structures, such as a difference list, have the property that you can tack on updates without invalidating any existing data. Let's say you have the association list [(1,'a'), (2,'b'), (3,'c')] and you want to update by changing the third entry to 'g'. If you express this as (3,'g'):originalList, then you can update the current list with the new version, and keep originalList valid and unaltered. Any thread that saw it can still safely use it.
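A rough C++ analogue of this idea, invented for illustration: new versions share the old nodes through shared_ptr, so a thread holding the old head keeps a valid, unaltered list.

```cpp
// Persistent "update by prepending": the new version shares the old
// nodes, so readers of the old head still see a valid, unchanged list.
#include <memory>
#include <utility>

struct Node {
    std::pair<int, char> entry;
    std::shared_ptr<const Node> next;
};
using List = std::shared_ptr<const Node>;

List cons(std::pair<int, char> entry, List tail) {
    return std::make_shared<const Node>(Node{entry, std::move(tail)});
}

int main() {
    // [(1,'a'), (2,'b'), (3,'c')]
    List original = cons({1, 'a'}, cons({2, 'b'}, cons({3, 'c'}, nullptr)));
    // prepend (3,'g'): lookups scanning from the head now see 'g' for key 3,
    // but 'original' is untouched and safe for any thread still holding it
    List updated = cons({3, 'g'}, original);
}
```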
Even if you have to work around the garbage collector, each thread can make its own thread-local copy of the shared state so long as the original does not get deleted while it is being copied. The underlying low-level implementation would be a producer/consumer model that atomically updates a pointer to the state data and inserts memory-fence instructions before the update and the copy operations.
If the program has a way to compare-and-swap atomically, and the garbage collector is aware, each thread can use the read-copy-update pattern. A thread-aware garbage collector will keep the older data around as long as any thread is using it, and recycle it when the last thread is done with it. This should not require locking in software (for example, on modern ISAs, incrementing or decrementing a word-sized counter is an atomic operation, and atomic compare-and-swap is wait-free).
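A minimal sketch of the compare-and-swap publication step in C++ (invented for illustration; real read-copy-update adds deferred reclamation of the displaced snapshot):

```cpp
// Publishing a new snapshot with an atomic compare-and-swap; real RCU
// would defer freeing the old snapshot until no reader can hold it.
#include <atomic>

struct State { int data; };

std::atomic<State*> current{new State{0}};

void publish(State* fresh) {
    State* expected = current.load();
    // retry until 'fresh' replaces whatever snapshot is installed
    while (!current.compare_exchange_weak(expected, fresh)) {
        // on failure, 'expected' now holds the latest pointer; retry
    }
    // 'expected' is the displaced snapshot: reclaim it only after
    // every reader that might still reference it has finished
}
```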
The functional language can add an extension to call an IPC library written in some other language, and update data in place. In Haskell, this would be defined with the IO monad to ensure sequential memory consistency, but nearly every functional language has some way to exchange data with the system libraries.
So, a functional language does offer some guarantees that are useful for efficient concurrent programming. For example, most current ISAs impose no extra overhead on multiple reader threads when there is at most a single writer, certain consistency bugs cannot occur, and functional languages are well-suited to express this pattern.

Ways to share a variable among threads

I have a general question about parallel programming in C and C++ and would appreciate it if you could answer it. As far as I know, we can declare a variable at least one level higher (in the parent thread) to share it among child threads. So, I was wondering if there is any other way to share a variable among threads with the same parent thread? Is this API-dependent or not?
For POSIX threads, read a pthreads tutorial.
For C++11, read the documentation of its thread library.
All threads of the same process share the same address space in virtual memory. As commented by Marco A., consider also thread_local variables.
Notice that you share data or memory (not variables, which exist only in the source code).
In practice, you'd better protect the shared data with a mutex (for synchronization) to avoid data races.
In the simple case, the mutex and the shared data are in some global variables.
You could also use atomic operations.
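For illustration, a minimal C++11 sketch of both options (the counter names are invented): an atomic counter needs no lock, while a plain one must be guarded by a mutex.

```cpp
// Two ways for threads of one process to share a counter safely.
#include <atomic>
#include <mutex>
#include <thread>

std::atomic<long> atomic_counter{0};  // atomic: no lock needed

std::mutex m;
long plain_counter = 0;               // plain: must be guarded by m

void work() {
    atomic_counter.fetch_add(1);          // safe by itself
    std::lock_guard<std::mutex> lock(m);
    ++plain_counter;                      // safe only under the mutex
}

int main() {
    std::thread a(work), b(work);
    a.join();
    b.join();
}
```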
BTW, you could also develop a parallel application using some message-passing paradigm, e.g. using MPI (or simply using some RPC or other messages, e.g. JSON on sockets). For regular numerical applications you might consider using the GPGPU, e.g. with OpenCL. And of course you might mix all the approaches (using OpenCL, with several threads, and having your parallel software running in several such processes communicating with MPI).
Debugging heavily parallel software can become a nightmare. Performance may depend upon the hardware system and may require tricky tuning. Scalability and synchronization may become a growing concern. Map-reduce is often a useful model.
In C++ and C any memory location (identified by a variable) can be shared among threads. The memory space is the same across all threads. There is no parent/child thread relationship with memory.
The challenge is to control or synchronize access to the memory location among the threads.
That is implementation dependent.
Any global variable is sharable among threads, since threads are lightweight processes sharing the same address space. For synchronization, you need to ensure mutual exclusion while updating/accessing those global variables, using semaphores or wait/notify blocks.

C/C++ - ring buffer in shared memory (POSIX compatible)

I've an application where producers and consumers ("clients") want to send broadcast messages to each other, i.e. an n:m relationship. All could be different programs, so they are different processes and not threads.
To reduce the n:m to something more maintainable, I was thinking of a setup like introducing a little central server. That server would offer a socket that each client connects to.
And each client would send a new message through that socket to the server - resulting in 1:n.
The server would also offer a shared memory that is read only for the clients. It would be organized as a ring buffer where the new messages would be added by the server and overwrite older ones.
This would give the clients some time to process the message - but if it's too slow it's bad luck, it wouldn't be relevant anymore anyway...
The advantage I see in this approach is that I avoid synchronisation as well as unnecessary data copying and buffer hierarchies; the central one should be enough, shouldn't it?
That's the architecture so far - I hope it makes sense...
Now to the more interesting aspect of implementing that:
The index of the newest element in the ring buffer is a variable in shared memory and the clients would just have to wait till it changes. Instead of a stupid while( central_index == my_last_processed_index ) { /* do nothing */ } I want to free CPU resources, e.g. by using a pthread_cond_wait().
But that needs a mutex that I think I don't need - on the other hand Why do pthreads’ condition variable functions require a mutex? gave me the impression that I'd better ask if my architecture makes sense and could be implemented like that...
Can you give me a hint if all of that makes sense and could work?
(Side note: the client programs could also be written in the common scripting languages like Perl and Python. So the communication with the server has to be recreated there and thus shouldn't be too complicated or even proprietary)
If memory serves, the reason for the mutex accompanying a condition variable is that under POSIX, signalling the condition variable causes the kernel to wake up all waiters on the condition variable. In these circumstances, the first thing that consumer threads need to do is check that there is something to consume - by means of accessing a variable shared between producer and consumer threads. The mutex protects against concurrent access to the variable used for this purpose. This of course means that if there are many consumers, n-1 of them are needlessly awoken.
Having implemented precisely the arrangement described above, I can say the choice of IPC object to use is not obvious. We were buffering audio between high-priority real-time threads in separate processes, and didn't want to block the consumer. As the audio was produced and consumed in real time, we were already getting scheduled regularly on both ends, and if there wasn't anything to consume (or space to produce into), we trashed the data because we'd already missed the deadline.
In the arrangement you describe, you will need a mutex to prevent the consumers concurrently consuming items that are queued (and believe me, on a lightly loaded SMP system, they will). However, you don't need to have the producer contend on this as well.
I don't understand your comment about the consumer having read-only access to the shared memory. In the classic lockless ring buffer implementation, the producer writes the queue tail pointer and the consumer(s) the head - whilst all parties need to be able to read both.
You might of course arrange for the queue head and tail to be in a different shared memory region from the queue data itself.
Also be aware that there is a theoretical data coherency hazard on SMP systems when implementing a ring buffer such as this - namely that write-back to memory of the queue content may occur out of order with respect to the head or tail pointer (they sit in cache - usually per-CPU core). There are other variants on this theme to do with synchronization of caches between CPUs. To guard against these, you need to add memory, load, and store barriers to enforce ordering. See Memory Barrier on Wikipedia. You explicitly avoid this hazard by using kernel synchronisation primitives such as mutexes and condition variables.
The C11 atomic operations can help with this.
You do need a mutex on a pthread_cond_wait() as far as I know. The reason is that pthread_cond_wait() is not atomic. The condition variable could change during the call, unless it's protected by a mutex.
It's possible that you can ignore this situation - the client might sleep past message 1, but when the subsequent message is sent then the client will wake up and find two messages to process. If that's unacceptable then use a mutex.
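To illustrate the pattern these answers describe (the variable names echo the question's central_index loop but are otherwise invented): the predicate is checked under the mutex, and the wait sits in a loop to absorb spurious wake-ups. For the cross-process case in the question, the mutex and condition variable would additionally need PTHREAD_PROCESS_SHARED attributes and would have to live in the shared memory segment.

```cpp
// The canonical mutex + condition variable pattern: check the shared
// index under the mutex, wait in a loop to absorb spurious wake-ups.
#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  changed = PTHREAD_COND_INITIALIZER;
int newest_index = 0;  // the shared state the condition is about

void wait_for_new_message(int my_last_processed_index) {
    pthread_mutex_lock(&lock);
    while (newest_index == my_last_processed_index)
        pthread_cond_wait(&changed, &lock);  // atomically unlocks and sleeps
    pthread_mutex_unlock(&lock);
}

void publish_message(void) {
    pthread_mutex_lock(&lock);
    ++newest_index;
    pthread_mutex_unlock(&lock);
    pthread_cond_broadcast(&changed);  // wake every waiting consumer
}
```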
You can probably have a bit of a different design by using sem_t, if your system has it; some POSIX systems are still stuck on the 2001 version of POSIX.
You probably don't necessarily need a mutex/condition pair; this is just how it was designed for POSIX a long time ago.
Modern C (C11) and C++ (C++11) now bring you (or will bring you) atomic operations, a feature that is implemented in all modern processors but long lacked support in higher-level languages. Atomic operations are part of the answer for resolving a race condition for a ring buffer as you want to implement it. But they are not sufficient, because with them you can only do active waiting through polling, which is probably not what you want.
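For illustration, a minimal sketch of that atomics-only part in C++11 (the structure and names are invented; C11's <stdatomic.h> offers the same facilities): the server publishes the newest index with release ordering, consumers poll it with acquire ordering - and to sleep instead of polling, you still need something more, which is where the futex below comes in.

```cpp
// The atomics-only part of the broadcast ring buffer: the server
// publishes the newest slot index; consumers poll it without locks.
#include <atomic>

struct RingHeader {
    std::atomic<unsigned> newest{0};  // written only by the server
};

// Server side: release ordering makes the message payload written
// before this store visible to any consumer that sees the new index.
void publish(RingHeader& h, unsigned slot) {
    h.newest.store(slot, std::memory_order_release);
}

// Client side: acquire ordering pairs with the release above. Note
// this is still polling; sleeping needs a futex or condition variable.
bool has_new(const RingHeader& h, unsigned last_seen) {
    return h.newest.load(std::memory_order_acquire) != last_seen;
}
```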
Linux, as an extension to POSIX, has futexes, which resolve both problems: they avoid races for updates by using atomic operations, and they provide the ability to put waiters to sleep via a system call. Futexes are often considered too low-level for everyday programming, but I think it actually isn't too difficult to use them. I have written up things here.

Simulating Thread with fork()

What's your idea about simulating threads with the fork() function and a shared memory block?
Is it possible?
How reasonable is it to do this for a program? (I mean, will it work well?)
For starters, don't mix threads and fork().
A fork gives you a brand new process, which is a copy of the current process, with the same code segments. As the memory image changes (typically this is due to different behavior of the two processes) you get a separation of the memory images, however the executable code remains the same. Tasks do not share memory unless they use some Inter Process Communication (IPC) primitive.
In contrast, a thread is another execution thread of the same task. One task can have multiple threads, and the task's memory objects are shared among its threads; therefore shared data must be accessed through primitives and synchronization objects that allow you to avoid data corruption.
Yes, it is possible, but I cannot imagine it being a good idea, and it would be a real pain to test.
If you have a shared heap, and you make sure all semaphores etc. are allocated in the heap, and not the stack, then there's no inherent reason you couldn't do something like it. There would be some tricky differences though.
For example, anything you do in an interrupt handler in a multi-threaded program can change data used by all the threads, while in a forked program, you would have to send multiple interrupts, which would be caught at different times, and might lead to unintended effects.
If you want threading behavior, just use a thread.
AFAIK, fork will create a separate process with its own context, stack and so on. It depends on what you mean by "simulating"...
You might want to check this out : http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them
A few of the answers here focus on "don't mix fork and threads". But the way I read your question is: "can you use two different processes, and still communicate quickly and conveniently with shared memory between them, just like how threads have access to each others' memory?"
And the answer is, yes you can, but you have to remember to explicitly mark which memory areas you want shared. You cannot just share your variables between the processes. Also, you can communicate this way between processes that are not related to each other at all; it is not limited to processes forked from each other.
Have a look at shared memory or "shm".
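For illustration, a minimal sketch of sharing memory across fork() with POSIX mmap (works on Linux and the BSDs; error handling trimmed): memory mapped with MAP_SHARED stays shared after the fork, while ordinary globals become copy-on-write.

```cpp
// Explicitly shared memory across fork(): a MAP_SHARED mapping stays
// shared, while ordinary globals become copy-on-write after the fork.
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main() {
    void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 1;
    int* shared = static_cast<int*>(mem);
    *shared = 0;

    if (fork() == 0) {   // child
        *shared = 42;    // visible to the parent through the mapping
        _exit(0);
    }
    wait(nullptr);
    std::printf("%d\n", *shared);  // prints 42
    munmap(mem, sizeof(int));
}
```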