Implementing task-local variables for the Concurrency Runtime (C++)

I'm improving an application (Win64, C++) by making it more asynchronous. I'm using the Concurrency Runtime and it's worked great for me so far.
The application basically executes a number of 'jobs' transforming data. To track what each job does, certain subsystems are instrumented with code to track certain operations that the job performs. Previously this used a single global variable representing the currently executing job, so tracking information could be registered without passing context all the way down the call chain. Each job may also in turn use ConcRT to parallelize itself. This all works quite well.
Now though, I am refactoring the application so that we can execute the top-level jobs in parallel. Each job is executed as a ConcRT task, and this works well for all jobs except those which need tracking.
What I basically need is a way to associate some context information with a Task, and have that flow to any other tasks spawned by that task. Basically, I need "Task Local" variables.
With ConcRT we can't simply use thread locals to store the context information, since the job may spawn other jobs using ConcRT and these will execute on any number of threads.
My current approach involves creating a number of Scheduler instances at startup and spawning each job in a scheduler dedicated to that job. I can then use the Concurrency::CurrentScheduler::Id() function to retrieve an integer ID which I can use as a key to look up the context. This works, but single-stepping through Concurrency::CurrentScheduler::Id() in the disassembly makes me wince somewhat, since it performs multiple virtual function calls and safety checks which add quite a lot of overhead. That's a problem, because this lookup needs to be done at an extremely high rate in some cases.
So - is there some better way to accomplish this? I would have loved a first-class TaskLocal/userdata mechanism which allowed me to associate a single context pointer with the current Scheduler/ScheduleGroup/Task, retrievable with very little overhead.
A hook which is called whenever a ConcRT thread grabs a new task would be my ideal: I could then retrieve the Scheduler/ScheduleGroup ID and store it in a thread-local for minimal access overhead. Alas, I can't see any way to register such a hook, and it doesn't seem to be possible to implement custom Scheduler classes for PPL/agents (see this article).
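For reference, a minimal sketch of the scheduler-per-job approach described above (JobContext, g_contexts, and RunJob are hypothetical names; error handling omitted):

```cpp
#include <concrt.h>
#include <concurrent_unordered_map.h>
#include <utility>

// Hypothetical per-job context; the registry maps scheduler IDs to it.
struct JobContext { /* tracking state ... */ };

Concurrency::concurrent_unordered_map<unsigned int, JobContext*> g_contexts;

void RunJob(JobContext* ctx)
{
    // Dedicated scheduler for this job; tasks it spawns stay on it.
    Concurrency::Scheduler* sched =
        Concurrency::Scheduler::Create(Concurrency::SchedulerPolicy());
    sched->Attach();
    g_contexts.insert(std::make_pair(Concurrency::CurrentScheduler::Id(), ctx));

    // ... run the job's task graph here ...

    Concurrency::CurrentScheduler::Detach();
    sched->Release();
}

// The hot-path lookup whose overhead the question is about.
JobContext* CurrentJobContext()
{
    return g_contexts[Concurrency::CurrentScheduler::Id()];
}
```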

Is there some reason that you can't pass some sort of context object to these tasks that gives them an interface for updating their status? Because from where I'm standing, it sounds like you have a really bad problem with Singletons (aka global variables), one that should be solved with dependency injection.
If dependency injection isn't an option, there is another strategy for dealing with Singletons. That strategy is basically allowing the Singleton to be a 'stack'. You can 'push' a new value to the Singleton, and then everybody who accesses it gets this new value. And then you can 'pop' the value back off and the value before pushing is restored. This does not have to be directly modeled with an actual stack, which is why I put the words 'push', 'pop' and 'stack' in quotes.
You can adapt this model to your circumstance by having a thread-local Singleton that is initialized with the value (not the whole stack of values, just the top value) of the parent thread's version of this variable. Then, if a new context is required for this thread and its children, you can push a new value onto the thread-local Singleton.
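A minimal sketch of that 'stack' idea, assuming a hypothetical JobContext type (C++11 thread_local, RAII push/pop):

```cpp
#include <cassert>

struct JobContext { /* per-job tracking state ... */ };

// Top of the per-thread context "stack"; each frame points to the one below.
struct ContextFrame {
    JobContext*   value;
    ContextFrame* below;
};
thread_local ContextFrame* t_top = nullptr;

// RAII guard: "push" on construction, "pop" on destruction.
class ScopedContext {
    ContextFrame frame_;
public:
    explicit ScopedContext(JobContext* ctx) : frame_{ctx, t_top} { t_top = &frame_; }
    ~ScopedContext() { t_top = frame_.below; }
};

// Very cheap accessor: one thread-local read, no locks or virtual calls.
JobContext* CurrentContext()
{
    assert(t_top && "no context pushed on this thread");
    return t_top->value;
}
```

Each task body would then begin with something like `ScopedContext scope(parentContext);`, where parentContext was captured when the task was spawned, so the value flows to child tasks regardless of which thread runs them.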

Related

Cancelling arbitrary jobs running in a thread_pool

Is there a way for a thread-pool to cancel a task underway? Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
Killing the entire process is a bad idea, and using a native handle to call pthread_cancel or a similar API is a last resort only.
Extra
Bonus if the cancellation is immediate, but it's acceptable if the cancellation comes with some timing 'guarantees' (say, cancellation within 0.1 seconds of execution time of the thread in question).
More details
I am not restricted to using Boost.Thread.thread_pool or any specific library. The only limitations are compatibility with C++14 and the ability to work on at least BSD- and Linux-based OSes.
The tasks are usually data-processing related, pre-compiled and loaded dynamically using C-API (extern "C") and thus are opaque entities. The aim is to perform compute intensive tasks with an option to cancel them when the user sends interrupts.
While launching, the thread_id for a specific task is known, and thus some API can be used to find more details if required.
Disclaimer
I know using native thread handles to cancel/exit threads is not recommended and is a sign of bad design. I also can't modify the functions to use boost::this_thread::interruption_point(), but I can wrap them in lambdas/other constructs if that helps. I feel like this is a rock-and-a-hard-place situation, so alternative suggestions are welcome, but they need to be minimally intrusive on existing functionality, and can be dramatic in scope for the feature-set being discussed.
EDIT:
Clarification
I guess this should have gone in the 'More Details' section, but I want it to remain separate to show that the existing 2 answers are based on limited information. After reading the answers, I went back to the drawing board and came up with the following "constraints", since the question I posed was overly generic. If I should post a new question, please let me know.
My interface promises a "const" input (functional-programming-style immutable input) by using mutexes/copy-by-value as needed and passing by const& (and expecting the thread to behave well).
I also misused the term "arbitrary", since the jobs aren't arbitrary (empirically speaking) and have the following constraints:
some jobs which download from the internet already use a condition variable
must not violate const correctness
can spawn other threads, but they must not outlast the parent
can use mutexes, but those can't exist outside the function body
output is via an atomic<shared_ptr> passed as an argument
pure functions (no shared state with outside) **
** can be a lambda binding a functor, in which case the function needs to make sure its data structures aren't corrupted (which is the case, as the state is usually one or two atomic<built-in-type> values). Usually the internal state is queried from an external DB (an architecture similar to cookie + web server, where the tab/browser can be closed at any time)
These constraints aren't written down as a contract or anything, but rather I generalized based on the "modules" currently in use. The jobs are arbitrary in terms of what they can do: GPU/CPU/internet all are fair play.
It is infeasible to insert a periodic check because of heavy library usage. The libraries (not owned by us) haven't been designed to periodically check a condition variable since it'd incur a performance penalty for the general case and rewriting the libraries is not possible.
Is there a way for a thread-pool to cancel a task underway?
Not at that level of generality, no, and also not if the task running in the thread is implemented natively and arbitrarily in C or C++. You cannot terminate a running task prior to its completion without terminating its whole thread, except with the cooperation of the task.
Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
No. The only way to get (approximately) on-demand preemption of a specific thread is to deliver a signal to it (one that it is not blocking or ignoring) via pthread_kill(). If such a signal terminates the thread but not the whole process, then it does not automatically make any provision for freeing allocated objects or managing the state of mutexes or other synchronization objects. If the signal does not terminate the thread, then the interruption can produce surprising and unwanted effects in code not designed to accommodate such signal usage.
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Note that pthread_cancel() can be blocked by the thread, and that even when not blocked, its effects may be deferred indefinitely. When the effects do occur, they do not necessarily include memory or synchronization-object cleanup. You need the thread to cooperate with its own cancellation to achieve these.
Just what a thread's cooperation with cancellation looks like depends in part on the details of the cancellation mechanism you choose.
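To make "cooperation" concrete, here is a sketch of a worker that opts into deferred pthread_cancel(), using a cleanup handler so a cancelled thread doesn't leave its mutex locked (standard POSIX APIs; the work itself is elided):

```cpp
#include <pthread.h>

// Runs if the thread is cancelled while the mutex is held.
static void unlock_on_cancel(void* m)
{
    pthread_mutex_unlock(static_cast<pthread_mutex_t*>(m));
}

void* worker(void* arg)
{
    pthread_mutex_t* mtx = static_cast<pthread_mutex_t*>(arg);
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, nullptr);

    for (;;) {
        pthread_mutex_lock(mtx);
        pthread_cleanup_push(unlock_on_cancel, mtx);

        // ... one bounded unit of work under the lock ...

        pthread_cleanup_pop(1);  // pop the handler and unlock the mutex
        pthread_testcancel();    // explicit, well-defined cancellation point
    }
}
```

Note that this only works because the worker was written to expect cancellation; it is exactly the kind of change the question says cannot be made inside the opaque libraries.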
Cancelling a non-cooperative component that was not designed to be cancelled is only possible if that component has limited, constrained, managed interactions with the rest of the system:
the resources owned by the component should be managed externally (the system knows which component uses what resources)
all accesses should be indirect
modifications of shared resources should be safe and reversible until completion
That would allow the system to clean up resources, stop operations, and cancel incomplete changes.
None of these properties come cheap, and threads have exactly the opposite properties.
Threads have only an implied concept of ownership, apparent only in the running thread: for a dead thread, determining what the thread owned is not possible.
Threads access shared objects directly. A thread can start modifying shared objects; if it is stopped in the middle of an operation, such modifications are left partial, ineffective, or incoherent.
Cancelled threads could leave locked mutexes behind. At the very least, subsequent attempts by other threads to take those mutexes in order to reach the shared object would deadlock.
Or they might find some data structure in a bad state.
Providing safe cancellation for arbitrary non-cooperative threads is not doable, even with very large-scale changes to thread synchronization objects, and not even by a complete redesign of the thread primitives.
You would have to make threads almost like full processes to be able to do that; but then they wouldn't be called threads!

What is the executor pattern in a C++ context?

The author of asio, Christopher Kohlhoff, is working on a library and proposal for executors in C++. His work so far includes this repo and docs. Unfortunately, the rationale portion has yet to be written. So far, the docs give a few examples of what the library does, but I feel like I'm missing something. Surely this is more than a family of fancy invoker functions?
Everything I can find on Google is very Java specific and a lot of it is particular to specific frameworks so I'm having trouble figuring out what this "executor pattern" is all about.
What are executors in this context? What do they do? What are the canonical examples of when they would be helpful? What variations exist among executors? What are the alternatives to executors and how do they compare? In particular, there seems to be a lot of overlap with an event loop where the events are initial input events, execution events, and a shutdown event.
When trying to figure out new abstractions I usually find understanding the motivation key. So for executors, what are we trying to abstract and why? What are we trying to make generic? Without executors, what extra work would we have to do?
The most basic benefit of executors is separating the definition of a program's parallelism from how it's used. Java's executor model exists because, by and large, you don't actually know, when you're first writing code, what parallelism model is best for your scenario. You might have little to gain from parallelism and shouldn't use threads at all, you might do best with a long running dedicated worker thread for each core, or a dynamically scaling pool of threads based on current load that cleans up threads after they've been idle a while to reduce memory usage, context switches, etc., or maybe just launching a thread for every task on demand, exiting when the task is done.
The key here is it's nigh impossible to know which approach is best when you're first writing code. You may know where parallelism might help you, but in traditional threading, you end up intermingling the parallelism "configuration" (when and whether to create threads) with the use of parallelism (determining which functions to call with what arguments). When you do mix the code like this, it's a royal pain to do performance testing of different options, because each and every thread launch is independent, and must be updated separately.
The main benefit of the executor model is that the parallelism configuration is done in one place (where the executor is created), and the users of that executor don't have to know anything about it. They just submit work to the executor, receive a future, and at some later point, retrieve the result (blocking if necessary) from the future. If you want to experiment with other configurations, you change the one line defining the executor and run your code again. Even if you decide you need to use different parallelism models for different sections of your code, refactoring to add a second executor and change some of the users of the first executor to use the second is easy compared to manually rewriting the threading details of every site; as long as the executor's name is (relatively) unique, finding users and changing them to use a different one is pretty easy. Executors both simplify your code (by avoiding intermingling thread creation/management with the tasks the threads do) and simplify performance testing.
As a side-benefit, you also abstract away the complexities of transferring data into and out of a worker thread (the submit method encapsulates the former, the future's result method encapsulates the latter). std::async gets you some of this benefit, but with no real control over the parallelism involved (just a yes/no/maybe choice of whether to force a thread, force deferred execution in the current thread, or let the compiler/library decide, with no fine grained control over whether a thread pool is used, and if so, how it behaves). A true executor framework gives you the control std::async fails to provide, with similar ease of use.
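A minimal sketch of the idea (a hypothetical executor interface, not Kohlhoff's actual API; the fixed int() task signature is a simplification): call sites only see submit(), while the parallelism "configuration" lives entirely in which concrete executor is constructed.

```cpp
#include <functional>
#include <future>
#include <thread>

// Hypothetical interface: callers submit work and get a future back.
struct Executor {
    virtual ~Executor() = default;
    virtual std::future<int> submit(std::function<int()> task) = 0;
};

// One configuration: launch a thread per task.
struct ThreadPerTaskExecutor : Executor {
    std::future<int> submit(std::function<int()> task) override {
        std::packaged_task<int()> pt(std::move(task));
        std::future<int> f = pt.get_future();
        std::thread(std::move(pt)).detach();
        return f;
    }
};

// Another: run inline on the caller's thread (no parallelism at all).
struct InlineExecutor : Executor {
    std::future<int> submit(std::function<int()> task) override {
        std::packaged_task<int()> pt(std::move(task));
        std::future<int> f = pt.get_future();
        pt();
        return f;
    }
};

int main() {
    ThreadPerTaskExecutor ex;  // changing the policy means changing this line only
    std::future<int> f = ex.submit([] { return 6 * 7; });
    return f.get() == 42 ? 0 : 1;
}
```

Swapping ThreadPerTaskExecutor for InlineExecutor (or a pool-backed variant) changes the program's parallelism without touching a single call site, which is the separation the answer above describes.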

Concurrency, tasks

I'm new to the Microsoft Concurrency Runtime (and asynchronous programming in general), and I'm trying to understand what can and can't be done with it. Is it possible to create a task group such that the tasks execute in the order in which they were added, and any task doesn't start until the previous one ends?
I'm trying to understand if there's a more general and devolved way of dealing with tasks compared to chaining several tasks within a single member function. For example, let's say I have a program that creates resources at different points in a program, and the order in which the resources are allocated matters. Could any resource allocation function that was called simply append a task to the end of a central task list, with the result that the tasks execute in the order in which they were added (i.e. the order in which the resource allocation functions were called)?
I'm not sure I understand what you're trying to achieve, but are you looking for the Agent or Actor model?
You post messages to an Async Agent and it processes them. It can then send messages to other agents.
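If plain sequential ordering is all that's needed, PPL task continuations can also express the "central task list" directly; here is a sketch (SequentialQueue is a hypothetical name) in which each appended task starts only after the previous one completes, in submission order:

```cpp
#include <ppltasks.h>
#include <mutex>
#include <utility>

// Hypothetical central task list: each appended task runs only after
// the previous one has completed, in the order the tasks were added.
class SequentialQueue {
    concurrency::task<void> tail_ = concurrency::task_from_result();
    std::mutex mtx_;
public:
    template <typename F>
    void append(F&& work) {
        std::lock_guard<std::mutex> lock(mtx_);
        tail_ = tail_.then(std::forward<F>(work));
    }
};

// Usage: every resource-allocation function just appends its step.
//   queue.append([] { /* allocate resource A */ });
//   queue.append([] { /* allocate resource B, runs after A */ });
```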

what are the benefits of clojure promises over using add-watch?

I am looking at different ways of implementing concurrency in Clojure, and these seem to be two competing ways of doing the same thing, so I was wondering where I should use each technique.
Watches are about one entity in a concurrent system and promises are about two entities.
Promises are more of a way to communicate between events on different timelines. They provide a way for a piece of code to receive a response without having to worry about what mechanism will be providing the answer. The original code path can create a promise and pass it to two different code paths in a single thread, or to different threads, agents, or nodes in a distributed system. Then, when one of the threads/agents/refs needs an answer, it can block on the promise without having to know anything about the entity that will be fulfilling it. And when the other thread/agent/ref/other figures out the answer, it can fulfill the promise without having to know anything about the entity that is waiting on it (or not yet waiting).
Promises are a communication mechanism across timelines, independent of the concurrency mechanism used.
Watches are a way of specifying a function to call when an atom or ref changes. This is a way of communicating intent to all the future states of a single agent/ref, by saying "Hey, make sure this condition is always true", or "log the change here".
Watches and promises are both very useful for concurrency, but are suited to slightly different uses. You may well find that you want to use both in different places in the same application.
Use a watch if you want notification of change in a reference. For example, if one thread is handling events and updates a ref in response to some of these events, you could use add-watch to enable other parts of your system to receive notification of the update. A single watch can handle many updates over time.
Use a promise if you want to pass another thread a handle to access a value that is not yet computed. If the other thread tries to dereference the promise, it will block until the computation of the promise has finished (i.e. the original thread places a value in the promise via "deliver"). A single promise is only intended to be used once; after that it is just a fixed value.
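The same single-use, single-value semantics exist in C++ as std::promise/std::future, which may make the distinction concrete even if you don't write Clojure; a minimal analogous sketch:

```cpp
#include <future>
#include <iostream>
#include <thread>

int main() {
    std::promise<int> p;                  // roughly (def p (promise))
    std::future<int> f = p.get_future();

    std::thread producer([&p] {
        p.set_value(42);                  // roughly (deliver p 42)
    });

    std::cout << f.get() << '\n';         // blocks like @p until delivered
    producer.join();
    return 0;
}
```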

Is checking current thread inside a function ok?

Is it ok to check the current thread inside a function?
For example if some non-thread safe data structure is only altered by one thread, and there is a function which is called by multiple threads, it would be useful to have separate code paths depending on the current thread. If the current thread is the one that alters the data structure, it is ok to alter the data structure directly in the function. However, if the current thread is some other thread, the actual altering would have to be delayed, so that it is performed when it is safe to perform the operation.
Or, would it be better to use some boolean which is given as a parameter to the function to separate the different code paths?
Or do something totally different?
What do you think?
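For concreteness, the pattern being asked about might look like this (a hypothetical sketch; owner_id is recorded once by the thread that owns the structure):

```cpp
#include <thread>

std::thread::id owner_id;  // set once by the thread that owns the structure

void alter_or_defer()
{
    if (std::this_thread::get_id() == owner_id) {
        // Owning thread: safe to mutate the structure directly.
    } else {
        // Any other thread: defer the mutation until it is safe.
    }
}
```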
You are not making all too much sense. You said a non-thread safe data structure is only ever altered by one thread, but in the next sentence you talk about delaying any changes made to that data structure by other threads. Make up your mind.
In general, I'd suggest wrapping the access to the data structure up with a critical section, or mutex.
It's possible to use such animals as reader/writer locks to differentiate between readers and writers of data structures, but for typical cases the performance advantage usually won't merit the additional complexity associated with their use.
From the way your question is stated, I'm guessing you're fairly new to multithreaded development. I highly suggest sticking with the simplest and most commonly used approaches for ensuring data integrity (most books/articles you read on the issue will mention the same uses for mutexes/critical sections). Multithreaded development is extremely easy to get wrong and can be difficult to debug. Also, what seems like the "optimal" solution very often doesn't buy you the huge performance benefit you might expect. It's usually best to implement the simplest approach that will work, then worry about optimizing it after the fact.
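For example, the straightforward mutex approach mentioned above looks like this (std::mutex with a scope-based guard; the vector stands in for your data structure):

```cpp
#include <mutex>
#include <vector>

std::vector<int> data;   // the non-thread-safe structure
std::mutex data_mtx;     // guards every access to it

void alter(int value)
{
    std::lock_guard<std::mutex> lock(data_mtx);  // released at end of scope
    data.push_back(value);
}
```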
There is a trick that could work if, as you said, the other threads make changes only once in a while, although it is still rather hackish:
make sure your "master" thread can't be interrupted by the other ones (higher priority, non-fair scheduling)
check your thread
if "master", just change
if other, put off scheduling, if needed by putting off interrupts, make change, reinstall scheduling
really test to see whether there are no issues in your setup.
As you can see, if requirements change a little bit, this could turn out worse than using normal locks.
As mentioned, the simplest solution when two threads need access to the same data is to use some synchronization mechanism (e.g. a critical section or mutex).
If you already have synchronization in your design, try to reuse it (if possible) instead of adding more. For example, if the main thread receives its work from a synchronized queue, you might be able to have thread 2 queue the data-structure update. The main thread will pick up the request and can perform the update without additional synchronization.
The queuing concept can be hidden from the rest of the design through the Active Object pattern. The active object may also be able to publish the data structure changes to other interested threads through the Observer pattern.
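A sketch of that queuing idea (names are illustrative): other threads enqueue the update as a closure, and the owning thread drains the queue at a safe point in its loop, so all mutations happen on one thread.

```cpp
#include <functional>
#include <mutex>
#include <queue>

class UpdateQueue {
    std::queue<std::function<void()>> pending_;
    std::mutex mtx_;
public:
    // Called from any thread: defer the mutation as a closure.
    void post(std::function<void()> update) {
        std::lock_guard<std::mutex> lock(mtx_);
        pending_.push(std::move(update));
    }

    // Called only from the owning thread, at a safe point in its loop.
    void drain() {
        std::queue<std::function<void()>> local;
        {
            std::lock_guard<std::mutex> lock(mtx_);
            local.swap(pending_);  // take the batch, release the lock quickly
        }
        while (!local.empty()) {
            local.front()();       // apply the deferred change on the owner thread
            local.pop();
        }
    }
};
```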