The problem: How to use SysV semaphores for synchronisation between two processes (let’s call them procA and procB), assuming that both are independently run from the shell (none of them is spawned by the fork/exec combination) and that the semaphores must be created by one of these two processes.
Quoting man semget:
The values of the semaphores in a newly created set are
indeterminate. (POSIX.1-2001 is explicit on this point.) Although
Linux, like many other implementations, initializes the semaphore
values to 0, a portable application cannot rely on this: it
should explicitly initialize the semaphores to the desired values.
Assume we would like to write portable code that relies only on POSIX guarantees, but no Linux-specific guarantees. Very well, so it is impossible to atomically create a semaphore set and initialize it. This must be done by two separate calls.
So, the code for creation of the semaphore set for procA would look something like this:
int sem_id = semget(key, nsems, IPC_CREAT | S_IRWXU);
And same for procB – this way, whichever process happens to need the semaphores for the first time, it also creates them; otherwise, it simply obtains the semaphore set’s ID and is ready to use it.
Problems start to appear when initialisation is required. The instruction for initialisation is of course semctl with SETALL, but: • the initialisation should be done only once, and • the initialisation should be done before the semaphores are used. This could of course be enforced by… semaphores, but such a solution is unfortunately circular: we need semaphores to set up semaphores, which themselves need semaphores to be set up, and so forth.
Is it possible to do this only using sysV semaphores, or am I right in my assumption that I have to resort to other IPC facilities like signals or message queues to be able to reliably set up these semaphores?
In my experience this isn't a problem in the real world. I put the IPC creation and initialization in a separate program that is called by a system startup script in advance of any client program being run. The IPC resources are never removed, and only go away when the box is rebooted.
If I had to create the resources on the fly, I'd have the creator program start as a separate user and create the resource with owner-only permissions. It would then initialize it, and finally grant permissions to the client's user, and exit.
The clients would simply retry on ENOENT or EACCES, perhaps with a nanosleep.
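A minimal sketch of that client-side retry loop, assuming a SysV semaphore set; the retry count and sleep interval are illustrative:

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <errno.h>
#include <time.h>

/* Attach to an existing semaphore set, retrying while the creator is
 * still initializing it. Returns the set ID, or -1 on failure. */
int get_sem_retry(key_t key, int nsems, int max_tries) {
    for (int i = 0; i < max_tries; i++) {
        int id = semget(key, nsems, 0);     /* no IPC_CREAT: attach only */
        if (id != -1)
            return id;
        if (errno != ENOENT && errno != EACCES)
            return -1;                      /* unexpected error: give up */
        struct timespec ts = { 0, 10 * 1000 * 1000 };   /* 10 ms */
        nanosleep(&ts, NULL);
    }
    return -1;
}
```

The client keeps looping only on the two errno values named above, so a genuinely broken setup fails fast instead of retrying forever.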
On second thought, I'd probably just use POSIX semaphores, since sem_open(3) lets you specify O_EXCL and an initial value. Old Sys V habits are hard to break.
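For reference, a sketch of that POSIX route; the semaphore name and initial value are illustrative:

```c
#include <semaphore.h>
#include <fcntl.h>
#include <errno.h>

/* Create-or-open: with sem_open(), creation and initialization happen in
 * one atomic step, which is exactly what SysV semget() + semctl(SETALL)
 * cannot give you. */
sem_t *open_sem(const char *name, unsigned init_value) {
    sem_t *s = sem_open(name, O_CREAT | O_EXCL, 0600, init_value);
    if (s == SEM_FAILED && errno == EEXIST)
        s = sem_open(name, 0);   /* another process created it first */
    return s;
}
```

Whichever of procA and procB runs first wins the O_EXCL race and initializes; the other simply attaches, with no window where the semaphore exists uninitialized.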
Related
Is there a way for a thread-pool to cancel a task underway? Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Extra
Bonus if the cancellation is immediate, but it's acceptable if the cancellation has some time constraint 'guarantees' (say cancellation within 0.1 execution seconds of the thread in question for example)
More details
I am not restricted to using Boost.Thread.thread_pool or any specific library. The only limitation is compatibility with C++14, and ability to work on at least BSD and Linux based OS.
The tasks are usually data-processing related, pre-compiled and loaded dynamically using C-API (extern "C") and thus are opaque entities. The aim is to perform compute intensive tasks with an option to cancel them when the user sends interrupts.
While launching, the thread_id for a specific task is known, and thus some API can be used to find more details if required.
Disclaimer
I know using native thread handles to cancel/exit threads is not recommended and is a sign of bad design. I also can't modify the functions to call boost::this_thread::interruption_point(), but can wrap them in lambdas/other constructs if that helps. I feel like this is a rock-and-hard-place situation, so alternate suggestions are welcome, but they need to be minimally intrusive for existing functionality, even if they are dramatic in scope for the feature-set being discussed.
EDIT:
Clarification
I guess this should have gone in the 'More details' section, but I want it to remain separate to show that the two existing answers are based on limited information. After reading the answers, I went back to the drawing board and came up with the following "constraints", since the question I posed was overly generic. If I should post a new question, please let me know.
My interface promises a "const" input (functional programming style non-mutable input) by using mutexes/copy-by-value as needed and passing by const& (and expecting thread to behave well).
I also mis-used the term "arbitrary" since the jobs aren't arbitrary (empirically speaking) and have the following constraints:
some which download from "internet" already use a "condition variable"
not violate const correctness
can spawn other threads, but they must not outlast the parent
can use mutex, but those can't exist outside the function body
output is via atomic<shared_ptr> passed as argument
pure functions (no shared state with outside) **
** can be a lambda binding a functor, in which case the function needs to make sure its data structures aren't corrupted (which is the case, as the state is usually one or two atomic<built-in-type> variables). Usually the internal state is queried from an external DB (an architecture similar to cookie + web server, where the tab/browser can be closed at any time)
These constraints aren't written down as a contract or anything, but rather I generalized based on the "modules" currently in use. The jobs are arbitrary in terms of what they can do: GPU/CPU/internet all are fair play.
It is infeasible to insert a periodic check because of heavy library usage. The libraries (not owned by us) haven't been designed to periodically check a condition variable since it'd incur a performance penalty for the general case and rewriting the libraries is not possible.
Is there a way for a thread-pool to cancel a task underway?
Not at that level of generality, no, and also not if the task running in the thread is implemented natively and arbitrarily in C or C++. You cannot terminate a running task prior to its completion without terminating its whole thread, except with the cooperation of the task.
Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
No. The only way to get (approximately) on-demand preemption of a specific thread is to deliver a signal to it (one that it is not blocking or ignoring) via pthread_kill(). If such a signal terminates the thread but not the whole process, then it does not automatically make any provision for freeing allocated objects or managing the state of mutexes or other synchronization objects. If the signal does not terminate the thread, then the interruption can produce surprising and unwanted effects in code not designed to accommodate such signal usage.
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Note that pthread_cancel() can be blocked by the thread, and that even when not blocked, its effects may be deferred indefinitely. When the effects do occur, they do not necessarily include memory or synchronization-object cleanup. You need the thread to cooperate with its own cancellation to achieve these.
Just what a thread's cooperation with cancellation looks like depends in part on the details of the cancellation mechanism you choose.
Cancelling a non-cooperative component that was not designed to be cancelled is only possible if that component has limited, constrained, managed interactions with the rest of the system:
the resources owned by the component should be managed externally (the system knows which component uses which resources)
all accesses should be indirect
modifications of shared resources should be safe and reversible until completion
That would allow the system to clean up resources, stop operations, and cancel incomplete changes.
None of these properties come cheap; the properties of threads are the exact opposite of these.
Threads only have an implied concept of ownership apparent in the running thread: for a deleted thread, determining what was owned by the thread is not possible.
Threads access shared objects directly. A thread can start modifying shared objects; if it is cancelled in the middle of an operation, those modifications are left partial, ineffective, or incoherent.
Cancelled threads could leave locked mutexes around. At least subsequent accesses to these mutexes by other threads trying to access the shared object would deadlock.
Or they might find some data structure in a bad state.
Providing safe cancellation for arbitrary non-cooperative threads is not doable even with very large-scale changes to thread synchronization objects, nor even by a complete redesign of the thread primitives. You would have to make threads almost into full processes to be able to do that; but then they wouldn't be called threads!
I need to implement some sort of blocking wait for a project requiring synchronization between 64-bit and 32-bit processes. Busy waiting on a shared memory variable introduces performance/scheduling issues and POSIX semaphores do not appear to support IPC between 32-bit and 64-bit processes. Are there other low-overhead alternatives for interprocess synchronization on Linux?
Linux has futexes which are a kernel primitive that provides a way for one process to go to sleep and another process to wake it up. They have extremely good fast paths (avoiding kernel calls in those cases) which matters a lot if you use them as a mutex but not so much if you use them as a semaphore.
You would only need its two most primitive operations. One, FUTEX_WAIT, puts a process to sleep if, and only if, a particular entry in shared memory has a particular value. The other, FUTEX_WAKE, wakes a process that has gone to sleep with FUTEX_WAIT.
Your "wait" code would atomically check a shared variable to see that it needed to sleep and then call FUTEX_WAIT to go to sleep if, and only if, the shared variable has not changed. Your "wake" code would change the value of the atomic shared variable and then call FUTEX_WAKE to wake any thread that was sleeping.
The 32-bit/64-bit issue would not matter at all if you use a 64-bit shared variable but only put meaningful data in the first 32-bits so it would work the same whether addressed as a 64-bit variable or a 32-bit variable.
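As a rough sketch of that wait/wake pair (Linux-specific; error handling is omitted, and the raw syscall is used since glibc exposes no futex() wrapper):

```c
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

/* Thin wrapper over the futex system call. */
static long sys_futex(atomic_int *uaddr, int op, int val) {
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Wait until *flag becomes nonzero. */
void futex_wait_for(atomic_int *flag) {
    while (atomic_load(flag) == 0) {
        /* The kernel sleeps only if *flag is still 0 when it checks,
         * so a wake racing with this call is never lost. */
        sys_futex(flag, FUTEX_WAIT, 0);
    }
}

/* Set *flag and wake any sleeper. */
void futex_post(atomic_int *flag) {
    atomic_store(flag, 1);
    sys_futex(flag, FUTEX_WAKE, 1);   /* wake at most one waiter */
}
```

With the atomic_int placed in shared memory, the two functions can live in different processes; the futex word itself is always 32 bits, which is what makes the 32-bit/64-bit trick above work.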
For inter-process synchronization using blocking waits, simple solutions include either a named pipe (fd) or a System V Semaphore.
Named pipes have a file path associated with them, so that the two processes can open the file independently (one for read, the other for write). For pure synchronization, just putc() to signal, and getc() to wait, one character at a time (value doesn't matter). This creates a unidirectional ("half duplex") channel; for bidirectional signal/waits you'd create two files. You can even queue up multiple signals by performing many putc() calls in a row, kind of like a semaphore which never saturates.
System V semaphores are identified by a key, typically derived from a file path via ftok(), so the two processes can likewise obtain them independently. They behave like a Dijkstra semaphore.
For additional options, check out
https://en.wikipedia.org/wiki/Inter-process_communication
I need to write my own implementation of a condition variable much like pthread_cond_t.
I know I'll need to use the compiler provided primitives like __sync_val_compare_and_swap etc.
Does anyone know how I'd go about this, please?
Thx
Correct implementation of condition variables is HARD. Use one of the many libraries out there instead (e.g. boost, pthreads-win32, my just::thread library)
You need to:
Keep a list of waiting threads (this might be a "virtual" list rather than an actual data structure)
Ensure that when a thread waits you atomically unlock the mutex owned by the waiting thread and add it to the list before that thread goes into a blocking OS call
Ensure that when the condition variable is notified then one of the threads waiting at that time is woken, and not one that waits later
Ensure that when the condition variable is broadcast then all of the threads waiting at that time are woken, and not any threads that wait later.
plus other issues that I can't think of just now.
The details vary with OS, as you are dependent on the OS blocking/waking primitives.
I need to write my own implementation of a condition variable much like pthread_cond_t.
The condition variables cannot be implemented using only the atomic primitives like compare-and-swap.
The purpose in life of condition variables is to provide a flexible mechanism for an application to access the process/thread scheduler: putting a thread to sleep and waking it up.
Atomic ops are implemented by the CPU, while process/thread scheduler is an OS territory. Without some supporting system call (or emulation using existing synchronization primitives) implementing cond vars is impossible.
Edit1. The only sensible example I know and can point you to is the implementation of the historical Linux pthread library which can be found here - e.g. version from 1997. The implementation (found in condvar.c file) is rather easy to read but also highlights the requirements for implementation of the cond vars. Spinlocks (using test-and-set op) are used for synchronizations and POSIX signals are used to put threads into sleep and to wake them up.
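To make that "supporting system call" requirement concrete, here is a deliberately minimal (and Linux-only) sketch of a condition variable built on the futex system call. It handles the snapshot/re-check race, but ignores much of what a real implementation must get right (timeouts, broadcast, requeueing, error handling), so treat it as an illustration, not a usable condvar:

```c
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>
#include <pthread.h>

/* Hypothetical minimal condition variable; the real work is the seq
 * snapshot, which closes the gap between unlocking and sleeping. */
struct cond { atomic_uint seq; };

static long sys_futex(atomic_uint *uaddr, int op, unsigned val) {
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

void cond_wait(struct cond *c, pthread_mutex_t *m) {
    unsigned seq = atomic_load(&c->seq);   /* snapshot before unlocking */
    pthread_mutex_unlock(m);
    /* Not atomic with the unlock, but FUTEX_WAIT re-checks seq in the
     * kernel: a notify landing in the gap bumps seq, so the wait
     * returns immediately instead of missing the wakeup. */
    sys_futex(&c->seq, FUTEX_WAIT, seq);
    pthread_mutex_lock(m);
}

void cond_signal(struct cond *c) {
    atomic_fetch_add(&c->seq, 1);          /* invalidate waiters' snapshots */
    sys_futex(&c->seq, FUTEX_WAKE, 1);
}
```

Even this toy version needs an atomic op (the seq counter), a mutex, and a kernel call, which is precisely the point made above.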
It depends on your requirements. If you have no further requirements, and if your process may consume 100% of available CPU time, then you have the rare chance to experiment and try out different mutex and condition variable designs: just try it out, and learn about the details. Great thing.
But in reality, you are usually bound to an operating system, and so you are captive to the OS's threading primitives, because they represent the only kind of control over process/threading/CPU resource usage! So, in that case, you will not even have the chance to implement your OWN condition variables if they are not based on the primitives that the OS provides you!
So... double check your environment, what do you control? What don't you control? And what makes sense?
I have a main process that uses a single-threaded library, and I can only call the library functions from the main process. I have a thread spawned by the main process that puts info it receives from the network into a queue.
I need to able to tell the main process that something is on the queue. Then it can access the queue and process the objects. The thread cannot process those objects because the library can only be called by one process.
I guess I need to use pipes and signals. I also read in various newsgroups that I need to use the 'self-pipe' trick.
How should this scenario be implemented?
A more specific case of the following post:
How can unix pipes be used between main process and thread?
Why not use a simple FIFO (named pipe)? The main process will automatically block until it can read something.
If it shouldn't block, it must be possible to poll instead, but maybe it will suck CPU. There probably exists an efficient library for this purpose.
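One common shape for this, sketched with an anonymous pipe (a FIFO opened by path works the same way); the queue itself is assumed to be guarded separately, e.g. with a mutex:

```c
#include <poll.h>
#include <unistd.h>

static int wakeup[2];   /* wakeup[0]: read end (main), wakeup[1]: write end */

/* Network thread: after pushing an item onto the (separately locked)
 * queue, write one byte to wake the main loop. */
void notify_main(void) {
    char c = 1;
    write(wakeup[1], &c, 1);
}

/* Main process: sleep (no busy-waiting, no CPU burn) until the thread
 * signals, then drain the wakeup bytes before popping the queue. */
void wait_for_work(void) {
    struct pollfd pfd = { .fd = wakeup[0], .events = POLLIN };
    poll(&pfd, 1, -1);
    char buf[64];
    read(wakeup[0], buf, sizeof buf);
    /* ... now pop every queued item and hand it to the library ... */
}
```

Draining up to 64 bytes at once coalesces bursts of notifications into a single wakeup, which is why the main loop should empty the whole queue each time it wakes.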
I wouldn't recommend using signals because they are easy to get wrong. If you want to use them anyway, the easiest way I've found is:
Mask all signals in every thread,
A special thread handles signals with sigwait(). It may have to wake up another thread which will handle the signal, e.g. using condition variables.
The advantage is that you don't have to worry anymore about which function is safe to call from the handler.
The "optimal" solution depends quite a bit on your concrete setup. Do you have one process with a main thread and a child thread or do you have one parent process and a child process? Which OS and which thread library do you use?
The reason for the last question is that the current C++03 standard has no notion of a 'thread'. This means in particular that whatever solution your OS and your thread library offer are platform specific. The most portable solutions will only hide these specifics from you in their implementation.
In particular, C++ has no notion of threads in its memory model, nor does it have a notion of atomic operations, synchronization, ordered memory accesses, race conditions etc.
Chances are, however, that whatever library you are using already provides a solution for your problem on your platform.
I highly suggest you use a thread-safe queue such as this one (article and source code). I have personally used it and it's very simple to use. The API consists of simple methods such as push(), try_pop(), wait_and_pop() and empty().
Note that it is based on Boost.Thread.
What is the common theory behind thread communication? I have some primitive idea about how it should work but something doesn't settle well with me. Is there a way of doing it with interrupts?
Really, it's just the same as any concurrency problem: you've got multiple threads of control, and it's indeterminate which statements on which threads get executed when. That means there are a large number of POTENTIAL execution paths through the program, and your program must be correct under all of them.
In general, the place where trouble can occur is when state is shared among the threads (aka "lightweight processes" in the old days). That happens when there are shared memory areas.
To ensure correctness, what you need to do is ensure that these data areas get updated in a way that can't cause errors. To do this, you need to identify "critical sections" of the program, where sequential operation must be guaranteed. Those can be as little as a single instruction or line of code; if the language and architecture ensure that these are atomic, that is, can't be interrupted, then you're golden.
Otherwise, you identify that section and put some kind of guard onto it. The classic way is to use a semaphore, which is an atomic primitive that only allows one thread of control past at a time. These were invented by Edsger Dijkstra, and so have names that come from the Dutch: P and V. When you come to a P, only one thread can proceed; all other threads are queued and wait until the executing thread comes to the associated V operation.
Because these primitives are a little, well, primitive, and because the Dutch names aren't very intuitive, there have been some other larger-scale approaches developed.
Per Brinch Hansen invented the monitor, which is basically just a data structure whose operations are guaranteed atomic; they can be implemented with semaphores. Monitors are pretty much what Java synchronized statements are based on; they make an object or code block have that particular behavior (only one thread can be "in" them at a time) with simpler syntax.
There are other models possible. Haskell and Erlang solve the problem by being functional languages that never allow a variable to be modified once it's created; this means they naturally don't need to worry about synchronization. Some newer languages, like Clojure, instead have a structure called "transactional memory", which basically means that when there is an assignment, you're guaranteed the assignment is atomic and reversible.
So that's it in a nutshell. To really learn about it, the best place to look is an operating systems text, like, e.g., Andy Tanenbaum's.
The two most common mechanisms for thread communication are shared state and message passing.
The most common way for threads to communicate is via some shared data structure, typically a queue. Some threads put information into the queue while others take it out. The queue must be protected by operating-system facilities such as mutexes and semaphores. Interrupts have nothing to do with it.
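A minimal sketch of that shared-queue pattern with pthreads; the capacity and element type are illustrative, and a real queue would also need a not-full condition for producers:

```c
#include <pthread.h>

#define QCAP 16
static int items[QCAP];
static int head, tail, count;   /* assumes count < QCAP at all times */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

/* Producer side: the mutex marks the critical section. */
void queue_put(int v) {
    pthread_mutex_lock(&lock);
    items[tail] = v;
    tail = (tail + 1) % QCAP;
    count++;
    pthread_cond_signal(&not_empty);   /* wake one sleeping consumer */
    pthread_mutex_unlock(&lock);
}

/* Consumer side: sleeps instead of spinning when the queue is empty. */
int queue_get(void) {
    pthread_mutex_lock(&lock);
    while (count == 0)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&not_empty, &lock);
    int v = items[head];
    head = (head + 1) % QCAP;
    count--;
    pthread_mutex_unlock(&lock);
    return v;
}
```

The mutex implements the critical section described earlier, and the condition variable is what lets a consumer block without consuming CPU.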
If you're really interested in a theory of thread communications, you may want to look into formalisms like the pi Calculus.
To communicate between threads, you'll need to use whatever mechanism is supplied by your operating system and/or runtime. Interrupts would be unusually low level, although they might be used implicitly if your threads communicate using sockets or named pipes.
A common pattern would be to implement shared state using a shared memory block, relying on an os-supplied synchronization primitive such as a mutex to spare you from busy-waiting when your read from the block. Remember that if you have threads at all, then you must have some kind of scheduler already (whether it's native from the OS or emulated in your language runtime). So this scheduler can provide synchronization objects and a "sleep" function without necessarily having to rely on hardware support.
Sockets, pipes, and shared memory work between processes too. Sometimes a runtime will give you a lighter-weight way of doing synchronization for threads within the same process. Shared memory is cheaper within a single process. And sometimes your runtime will also give you an atomic message-passing mechanism.