condition variable usage in detecting an event in C++ scott meyers - c++

I am reading about condition variables in Effective Modern C++ by Scott Meyers book below is text.
std::condition_variable cv
std::mutex m
T1 (detecting task)
...
cv.notify_one();
T2 (reacting task)
...
{
std::unique_lock<std::mutex> lk(m);
cv.wait(lk);
...
}
Here author mentions as below
Mutexs are used to control access to shared data, but it's entirely
possible that the detecting and reacting tasks have no need for such
mediation. For example, the detecting task might be responsible for
initializing a global data structure, then turning it over to reacting
task for use. If the detecting task never access the data structure
after initialzing it, and if the reacting task never access it before
the detecting task indicates that it's ready, the two tasks will stay
out of each other's way through program logic. There will be no need
for mutex.
On above text I have difficulty in understanding
What does author mean by " two tasks will stay out of each other's way through program logic" ?
What does author mean by no need for mutex?

Mutex are used to solve race conditions, e.g.:
A race condition occurs when two or more threads can access shared data and they try to change it at the same time
In your case that doesn't happen since the operations done on your structure will be done in a different time frame, i.e.: not causing any race conditions. Since you don't have two threads writing at the same time you don't need a mutex.
Also consider that most problems arise when you have "check-then-act" (e.g. "check" if the value is X, then "act" to do something that depends on the value being X) and another thread does something to the value in between the "check" and the "act".

The detecting task and the reacting task both access the same data, but it is guaranteed by program logic that they never access that data at the same time. Therefore, they do not need a mutex (or any other mechanism) to prevent concurrent access to the data, because such concurrent access is prevented by other means. The other means are the program logic.
"Program logic" refers to the control flow of the program. I'll reformat the code slightly:
Data shared_data;
std::condition_variable cv;
std::mutex m;
void detecting_task()
{
initialise(shared_data);
cv.notify_one();
}
void reacting_task()
{
{
std::unique_lock<std::mutex> lk(m);
cv.wait(lk);
}
process(shared_data);
}
Even if both detecting_task and reacting_task start at the same time, you can see that they cannot possibly act on shared_data concurrently. The program logic is such that detecting_task only touches the data before cv is notified, and reacting_task only touches the data after cv is notified. So they cannot possibly overlap, and hence don't need to protect shared_data by a mutex.

Essentially the text says: when two threads do not access any shared resources, there is no need for synchronizing access to the resources. That is, if each thread just uses its own data structures and no other thread accesses them, there is no need for any locking via a mutex - no other thread is going to access the respective resource.
Given the context, it seems a simple message passing system, i.e., the thing using the mutex and the condition variable, is described which does all the necessary synchronization: the "detecting thread" notices something and sends a notification via the message passing system to the "reacting thread". The only thing shared between them is the message passing object.

I don't fully understand the citation either. If you are not protecting shared data, mutex is not necessary for memory access, for sure. This may mean, that detecting task will not require to lock the mutex.
But, the c++11 the condition_variable requires a mutex for working. And this is because, at least some underlying OS implementations are requiring a mutex, such as pthreads
See : Why do pthreads’ condition variable functions require a mutex?

Related

std::condition_variable memory writes visibility

Does std::condition_variable::notify_one() or std::condition_variable::notify_all() guarantee that non-atomic memory writes in the current thread prior to the call will be visible in notified threads?
Other threads do:
{
std::unique_lock lock(mutex);
cv.wait(lock, []() { return values[threadIndex] != 0; });
// May a thread here see a zero value and therefore start to wait again?
}
Main thread does:
fillData(values); // All values are zero and all threads wait() before calling this.
cv.notify_all(); // Do need some memory fence or lock before this
// to ensure that new non-zero values will be visible
// in other threads immediately after waking up?
Doesn't notify_all() store some atomic value therefore enforcing memory ordering? I did not clarified it.
UPD: according to Superlokkus' answer and an answer here: we have to acquire a lock to ensure memory writes visibility in other threads (memory propagation), otherwise threads in my case may read zero values.
Also I missed this quote here about condition_variable, which specifically answers my question. Even an atomic variable has to be modified under a lock in a case when the modification must become visible immediately.
Even if the shared variable is atomic, it must be modified under the
mutex in order to correctly publish the modification to the waiting
thread.
I guess you are mixing up memory ordering of so called atomic values and the mechanisms of classic lock based synchronization.
When you have a datum which is shared between threads, lets say an int for example, one thread can not simply read it while the other thread might be write to it meanwhile. Otherwise we would have a data race.
To get around this for long time we used classic lock based synchronization:
The threads share at least a mutex and the int. To read or to write any thread has to hold the lock first, meaning they wait on the mutex. Mutexes are build so that they are fine that this can happen concurrently. If a thread wins gettting the mutex it can change or read the int and then should unlock it, so others can read/write too. Using a conditional variable like you used is just to make the pattern "readers wait for a change of a value by a writer" more efficient, they get woken up by the cv instead of periodically waiting on the lock, reading, and unlocking, which would be called busy waiting.
So because you hold the lock in any after waiting on the mutex or in you case, correctly (mutex is still needed) waiting on the conditional variable, you can change the int. And readers will read the new value after the writer was able to wrote it, never the old. UPDATE: However one thing if have to add, which might also be the cause of confusion: Conditional variables are subject for so called spurious wakeups. Meaning even though you write did not have notified any thread, a read thread might still wake up, with the mutex locked. So you have to check if you writer actually waked you up, which is usually done by the writer by changing another datum just to notify this, or if its suitable by using the same datum you already wanted to share. The lambda parameter overload of std::condition_variable::wait was just made to make the checking and going back to sleep code looking a bit prettier. Based on your question now I don't know if you want to use you values for this job.
However at snippet for the "main" thread is incorrect or incomplete:
You are not synchronizing on the mutex in order to change values.
You have to hold the lock for that, but notifying can be done without the lock.
std::unique_lock lock(mutex);
fillData(values);
lock.unlock();
cv.notify_all();
But these mutex based patters have some drawbacks and are slow, only one thread at a time can do something. This is were so called atomics, like std::atomic<int> came into play. They can be written and read at the same time without an mutex by multiple threads concurrently. Memory ordering is only a thing to consider there and an optimization for cases where you uses several of them in a meaningful way or you don't need the "after the write, I never see the old value" guarantee. However with it's default memory ordering memory_order_seq_cst you would also be fine.

Safely Destroying a Thread Pool

Consider the following implementation of a trivial thread pool written in C++14.
threadpool.h
threadpool.cpp
Observe that each thread is sleeping until it's been notified to awaken -- or some spurious wake up call -- and the following predicate evaluates to true:
std::unique_lock<mutex> lock(this->instance_mutex_);
this->cond_handle_task_.wait(lock, [this] {
return (this->destroy_ || !this->tasks_.empty());
});
Furthermore, observe that a ThreadPool object uses the data member destroy_ to determine if its being destroyed -- the destructor has been called. Toggling this data member to true will notify each worker thread that it's time to finish its current task and any of the other queued tasks then synchronize with the thread that's destroying this object; in addition to prohibiting the enqueue member function.
For your convenience, the implementation of the destructor is below:
ThreadPool::~ThreadPool() {
{
std::lock_guard<mutex> lock(this->instance_mutex_); // this line.
this->destroy_ = true;
}
this->cond_handle_task_.notify_all();
for (auto &worker : this->workers_) {
worker.join();
}
}
Q: I do not understand why it's necessary to lock the object's mutex while toggling destroy_ to true in the destructor. Furthermore, is it only necessary for setting its value or is it also necessary for accessing its value?
BQ: Can this thread pool implementation be improved or optimized while maintaining it's original purpose; a thread pool that can pool N amount of threads and distribute tasks to them to be executed concurrently?
This thread pool implementation is forked from Jakob Progsch's C++11 thread pool repository with a thorough code step through to understand the purpose behind its implementation and some subjective style changes.
I am introducing myself to concurrent programming and there is still much to learn -- I am a novice concurrent programmer as it stands right now. If my questions are not worded correctly then please make the appropriate correction(s) in your provided answer. Moreover, if the answer can be geared towards a client who is being introduced to concurrent programming for the first time then that would be best -- for myself and any other novices as well.
If the owning thread of the ThreadPool object is the only thread that atomically writes to the destroy_ variable, and the worker threads only atomically read from the destroy_ variable, then no, a mutex is not needed to protect the destroy_ variable in the ThreadPool destructor. Typically a mutex is necessary when an atomic set of operations must take place that can't be accomplished through a single atomic instruction on a platform, (i.e., operations beyond an atomic swap, etc.). That being said, the author of the thread pool may be trying to force some type of acquire semantics on the destroy_ variable without restoring to atomic operations (i.e. a memory fence operation), and/or the setting of the flag itself is not considered an atomic operation (platform dependent)... Some other options include declaring the variable as volatile to prevent it from being cached, etc. You can see this thread for more info.
Without some sort of synchronization operation in place, the worst case scenario could end up with a worker that won't complete due to the destroy_ variable being cached on a thread. On platforms with weaker memory ordering models, that's always a possibility if you allowed a benign memory race condition to exist ...
C++ defines a data race as multiple threads potentially accessing an object simultaneously with at least one of those accesses being a write. Programs with data races have undefined behavior. If you were to write to destroy in your destructor without holding the mutex, your program would have undefined behavior and we cannot predict what would happen.
If you were to read destroy elsewhere without holding the mutex, that read could potentially happen while the destructor is writing to it which is also a data race.

Need to mutex-protect (atomic) assignment sought by condition variable?

I understand how to use condition variables (crummy name for this construct, IMO, as the cv object neither is a variable nor indicates a condition). So I have a pair of threads, canonically set up with Boost.Thread as:
bool awake = false;
boost::mutex sync;
boost::condition_variable cv;
void thread1()
{
boost::unique_lock<boost::mutex> lock1(sync);
while (!awake)
cv.wait(lock1);
lock1.unlock(); // this line actually not canonical, but why not?
// proceed...
}
void thread2()
{
//...
boost::unique_lock<boost::mutex> lock2;
awake = true;
lock2.unlock();
cv.notify_all();
}
My question is: does thread2 really need to be protecting the assignment to awake? It seems to me the notify_all() call should be sufficient. If the data being manipulated and checked against were more than a simple "ok to proceed" flag, I see the value in the mutex, but here it seems like overkill.
A secondary question is that asked in the code fragment: Why doesn't the Boost documentation show the lock in thread1 being unlocked before the "process data" step?
EDIT: Maybe my question is really: Is there a cleaner construct than a CV to implement this kind of wait?
does thread2 really need to be protecting the assignment to awake?
Yes. Modifying an object from one thread and accessing it from another without synchronisation gives undefined behaviour. Even if it's just a bool.
For example, on some multiprocessor systems the write might only affect local memory; without an explicit synchronisation operation, other threads might never see the change.
Why doesn't the Boost documentation show the lock in thread1 being unlocked before the "process data" step?
If you unlocked the mutex before clearing the flag, then you might miss another signal.
Is there a cleaner construct than a CV to implement this kind of wait?
In Boost and the standard C++ library, no; a condition variable is flexible enough to handle arbitrary shared state and not particularly over-complicated for this simple case, so there's no particular need for anything simpler.
More generally, you could use a semaphore or a pipe to send a simple signal between threads.
Formally, you definitely need the lock in both threads: if any thread
modifies an object, and more than one thread accesses it, then all
accesses must be synchronized.
In practice, you'll probably get away with it without the lock; it's
almost certain that notify_all will issue the necessary fence or
membar instructions to ensure that the memory is properly synchronized.
But why take the risk?
As to the absense of the unlock, that's the whole point of the scoped
locking pattern: the unlock is in the destructor of the object, so
that the mutex will be unlocked even if an exception passes through.

Cheapest way to wake up multiple waiting threads without blocking

I use boost::thread to manage threads. In my program i have pool of threads (workers) that are activated sometimes to do some job simultaneously.
Now i use boost::condition_variable: and all threads are waiting inside boost::condition_variable::wait() call on their own conditional_variableS objects.
Can i AVOID using mutexes in classic scheme, when i work with conditional_variables? I want to wake up threads, but don't need to pass some data to them, so don't need a mutex to be locked/unlocked during awakening process, why should i spend CPU on this (but yes, i should remember about spurious wakeups)?
The boost::condition_variable::wait() call trying to REACQUIRE the locking object when CV received the notification. But i don't need this exact facility.
What is cheapest way to awake several threads from another thread?
If you don't reacquire the locking object, how can the threads know that they are done waiting? What will tell them that? Returning from the block tells them nothing because the blocking object is stateless. It doesn't have an "unlocked" or "not blocking" state for it to return in.
You have to pass some data to them, otherwise how will they know that before they had to wait and now they don't? A condition variable is completely stateless, so any state that you need must be maintained and passed by you.
One common pattern is to use a mutex, condition variable, and a state integer. To block, do this:
Acquire the mutex.
Copy the value of the state integer.
Block on the condition variable, releasing the mutex.
If the state integer is the same as it was when you coped it, go to step 3.
Release the mutex.
To unblock all threads, do this:
Acquire the mutex.
Increment the state integer.
Broadcast the condition variable.
Release the mutex.
Notice how step 4 of the locking algorithm tests whether the thread is done waiting? Notice how this code tracks whether or not there has been an unblock since the thread decided to block? You have to do that because condition variables don't do it themselves. (And that's why you need to reacquire the locking object.)
If you try to remove the state integer, your code will behave unpredictably. Sometimes you will block too long due to missed wakeups and sometimes you won't block long enough due to spurious wakeups. Only a state integer (or similar predicate) protected by the mutex tells the threads when to wait and when to stop waiting.
Also, I haven't seen how your code uses this, but it almost always folds into logic you're already using. Why did the threads block anyway? Is it because there's no work for them to do? And when they wakeup, are they going to figure out what to do? Well, finding out that there's no work for them to do and finding out what work they do need to do will require some lock since it's shared state, right? So there almost always is already a lock you're holding when you decide to block and need to reacquire when you're done waiting.
For controlling threads doing parallel jobs, there is a nice primitive called a barrier.
A barrier is initialized with some positive integer value N representing how many threads it holds. A barrier has only a single operation: wait. When N threads call wait, the barrier releases all of them. Additionally, one of the threads is given a special return value indicating that it is the "serial thread"; that thread will be the one to do some special job, like integrating the results of the computation from the other threads.
The limitation is that a given barrier has to know the exact number of threads. It's really suitable for parallel processing type situations.
POSIX added barriers in 2003. A web search indicates that Boost has them, too.
http://www.boost.org/doc/libs/1_33_1/doc/html/barrier.html
Generally speaking, you can't.
Assuming the algorithm looks something like this:
ConditionVariable cv;
void WorkerThread()
{
for (;;)
{
cv.wait();
DoWork();
}
}
void MainThread()
{
for (;;)
{
ScheduleWork();
cv.notify_all();
}
}
NOTE: I intentionally omitted any reference to mutexes in this pseudo-code. For the purposes of this example, we'll suppose ConditionVariable does not require a mutex.
The first time through MainTnread(), work is queued and then it notifies WorkerThread() that it should execute its work. At this point two things can happen:
WorkerThread() completes DoWork() before MainThread() can complete ScheduleWork().
MainThread() completes ScheduleWork() before WorkerThread() can complete DoWork().
In case #1, WorkerThread() comes back around to sleep on the CV, and is awoken by the next cv.notify() and all is well.
In case #2, MainThread() comes back around and notifies... nobody and continues on. Meanwhile WorkerThread() eventually comes back around in its loop and waits on the CV but it is now one or more iterations behind MainThread().
This is known as a "lost wakeup". It is similar to the notorious "spurious wakeup" in that the two threads now have different ideas about how many notify()s have taken place. If you are expecting the two threads to maintain synchrony (and usually you are), you need some sort of shared synchronization primitive to control it. This is where the mutex comes in. It helps avoid lost wakeups which, arguably, are a more serious problem than the spurious variety. Either way, the effects can be serious.
UPDATE: For further rationale behind this design, see this comment by one of the original POSIX authors: https://groups.google.com/d/msg/comp.programming.threads/cpJxTPu3acc/Hw3sbptsY4sJ
Spurious wakeups are two things:
Write your program carefully, and make sure it works even if you
missed something.
Support efficient SMP implementations
There may be rare cases where an "absolutely, paranoiacally correct"
implementation of condition wakeup, given simultaneous wait and
signal/broadcast on different processors, would require additional
synchronization that would slow down ALL condition variable operations
while providing no benefit in 99.99999% of all calls. Is it worth the
overhead? No way!
But, really, that's an excuse because we wanted to force people to
write safe code. (Yes, that's the truth.)
boost::condition_variable::notify_*(lock) does NOT require that the caller hold the lock on the mutex. THis is a nice improvement over the Java model in that it decouples the notification of threads with the holding of the lock.
Strictly speaking, this means the following pointless code SHOULD DO what you are asking:
lock_guard lock(mutex);
// Do something
cv.wait(lock);
// Do something else
unique_lock otherLock(mutex);
//do something
otherLock.unlock();
cv.notify_one();
I do not believe you need to call otherLock.lock() first.

Modelling boost::Lockable with semaphore rather than mutex (previously titled: Unlocking a mutex from a different thread)

I'm using the C++ boost::thread library, which in my case means I'm using pthreads. Officially, a mutex must be unlocked from the same thread which locks it, and I want the effect of being able to lock in one thread and then unlock in another. There are many ways to accomplish this. One possibility would be to write a new mutex class which allows this behavior.
For example:
class inter_thread_mutex{
bool locked;
boost::mutex mx;
boost::condition_variable cv;
public:
void lock(){
boost::unique_lock<boost::mutex> lck(mx);
while(locked) cv.wait(lck);
locked=true;
}
void unlock(){
{
boost::lock_guard<boost::mutex> lck(mx);
if(!locked) error();
locked=false;
}
cv.notify_one();
}
// bool try_lock(); void error(); etc.
}
I should point out that the above code doesn't guarantee FIFO access, since if one thread calls lock() while another calls unlock(), this first thread may acquire the lock ahead of other threads which are waiting. (Come to think of it, the boost::thread documentation doesn't appear to make any explicit scheduling guarantees for either mutexes or condition variables). But let's just ignore that (and any other bugs) for now.
My question is, if I decide to go this route, would I be able to use such a mutex as a model for the boost Lockable concept. For example, would anything go wrong if I use a boost::unique_lock< inter_thread_mutex > for RAII-style access, and then pass this lock to boost::condition_variable_any.wait(), etc.
On one hand I don't see why not. On the other hand, "I don't see why not" is usually a very bad way of determining whether something will work.
The reason I ask is that if it turns out that I have to write wrapper classes for RAII locks and condition variables and whatever else, then I'd rather just find some other way to achieve the same effect.
EDIT:
The kind of behavior I want is basically as follows. I have an object, and it needs to be locked whenever it is modified. I want to lock the object from one thread, and do some work on it. Then I want to keep the object locked while I tell another worker thread to complete the work. So the first thread can go on and do something else while the worker thread finishes up. When the worker thread gets done, it unlocks the mutex.
And I want the transition to be seemless so nobody else can get the mutex lock in between when thread 1 starts the work and thread 2 completes it.
Something like inter_thread_mutex seems like it would work, and it would also allow the program to interact with it as if it were an ordinary mutex. So it seems like a clean solution. If there's a better solution, I'd be happy to hear that also.
EDIT AGAIN:
The reason I need locks to begin with is that there are multiple master threads, and the locks are there to prevent them from accessing shared objects concurrently in invalid ways.
So the code already uses loop-level lock-free sequencing of operations at the master thread level. Also, in the original implementation, there were no worker threads, and the mutexes were ordinary kosher mutexes.
The inter_thread_thingy came up as an optimization, primarily to improve response time. In many cases, it was sufficient to guarantee that the "first part" of operation A, occurs before the "first part" of operation B. As a dumb example, say I punch object 1 and give it a black eye. Then I tell object 1 to change it's internal structure to reflect all the tissue damage. I don't want to wait around for the tissue damage before I move on to punch object 2. However, I do want the tissue damage to occur as part of the same operation; for example, in the interim, I don't want any other thread to reconfigure the object in such a way that would make tissue damage an invalid operation. (yes, this example is imperfect in many ways, and no I'm not working on a game)
So we made the change to a model where ownership of an object can be passed to a worker thread to complete an operation, and it actually works quite nicely; each master thread is able to get a lot more operations done because it doesn't need to wait for them all to complete. And, since the event sequencing at the master thread level is still loop-based, it is easy to write high-level master-thread operations, as they can be based on the assumption that an operation is complete (more precisely, the critical "first part" upon which the sequencing logic depends is complete) when the corresponding function call returns.
Finally, I thought it would be nice to use inter_thread mutex/semaphore thingies using RAII with boost locks to encapsulate the necessary synchronization that is required to make the whole thing work.
man pthread_unlock (this is on OS X, similar wording on Linux) has the answer:
NAME
pthread_mutex_unlock -- unlock a mutex
SYNOPSIS
#include <pthread.h>
int
pthread_mutex_unlock(pthread_mutex_t *mutex);
DESCRIPTION
If the current thread holds the lock on mutex, then the
pthread_mutex_unlock() function unlocks mutex.
Calling pthread_mutex_unlock() with a mutex that the
calling thread does not hold will result in
undefined behavior.
...
My counter-question would be - what kind of synchronization problem are you trying to solve with this? Most probably there is an easier solution.
Neither pthreads nor boost::thread (built on top of it) guarantee any order in which a contended mutex is acquired by competing threads.
Sorry, but I don't understand. what will be the state of your mutex in line [1] in the following code if another thread can unlock it?
inter_thread_mutex m;
{
m.lock();
// [1]
m.unlock();
}
This has no sens.
There's a few ways to approach this. Both of the ones I'm going to suggest are going to involve adding an additional piece of information to the object, rather adding a mechanism to unlock a thread from a thread other than the one that owns it.
1) you can add some information to indicate the object's state:
enum modification_state { consistent, // ready to be examined or to start being modified
phase1_complete, // ready for the second thread to finish the work
};
// first worker thread
lock();
do_init_work(object);
object.mod_state = phase1_complete;
unlock();
signal();
do_other_stuff();
// second worker thread
lock()
while( object.mod_state != phase1_complete )
wait()
do_final_work(obj)
object.mod_state = consistent;
unlock()
signal()
// some other thread that needs to read the data
lock()
while( object.mod_state != consistent )
wait();
read_data(obj)
unlock()
Works just fine with condition variables, because obviously you're not writing your own lock.
2) If you have a specific thread in mind, you can give the object an owner.
// first worker
lock();
while( obj.owner != this_thread() ) wait();
do_initial_work(obj);
obj.owner = second_thread_id;
unlock()
signal()
...
This is pretty much the same solution as my first solution, but more flexible in the adding/removing of phases, and less flexible in the adding/removing of threads.
To be honest, I'm not sure how inter thread mutex would help you here. You'd still need a semaphore or condition variable to signal the passing of the work to the second thread.
Small modification to what you already have: how about storing the id of the thread which you want to take the lock, in your inter_thread_whatever? Then unlock it, and send a message to that thread, saying "I want you execute whatever routine it is that tries to take this lock".
Then the condition in lock becomes while(locked || (desired_locker != thisthread && desired_locker != 0)). Technically you've "released the lock" in the first thread, and "taken it again" in the second thread, but there's no way that any other thread can grab it in between, so it's as if you've transferred it directly from one to the other.
There's a potential problem, that if a thread exits or is killed, while it's the desired locker of your lock, then that thread deadlocks. But you were already talking about the first thread waiting for a message from the second thread to say that it has successfully acquired the lock, so presumably you already have a plan in mind for what happens if that message is never received. To that plan, add "reset the desired_locker field on the inter_thread_whatever".
This is all very hairy, though, I'm not convinced that what I've proposed is correct. Is there a way that the "master" thread (the one that's directing all these helpers) can just make sure that it doesn't order any more operations to be performed on whatever is protected by this lock, until the first op is completed (or fails and some RAII thing notifies you)? You don't need locks as such, if you can deal with it at the level of the message loop.
I don't think it is a good idea to say that your inter_thread_mutex (binary_semaphore) can be seen as a model of Lockable. The main issue is that the main feature of your inter_thread_mutex defeats the Locakble concept. If inter_thread_mutex was a model of lockable you will expect in In [1] that the inter_thread_mutex m is locked.
// thread T1
inter_thread_mutex m;
{
unique_lock<inter_thread_mutex> lk(m);
// [1]
}
But as an other thread T2 can do m.unlock() while T1 is in [1], the guaranty is broken.
Binary semaphores can be used as Lockables as far as each thread tries to lock before unlocking. But the main goal of your class is exactly the contrary.
This is one of the reason semaphores in Boost.Interprocess don't use lock/unlock to name the functions, but wait/notify. Curiously these are the same names used by conditions :)
A mutex is a mechanism for describing mutually exclusive blocks of code. It does not make sense for these blocks of code to cross thread boundaries. Trying to use such a concept in such an counter intuitive way can only lead to problems down the line.
It sounds very much like you're looking for a different multi-threading concept, but without more detail it's hard to know what.