Clarification of role of CSingleLock and the sync object it uses - c++

I'm confused by the example given by Leo Davidson in "Is CCriticalSection usable in production?". Leo gives three code blocks introduced as "Wrong (his example)", "Right", and "Even better (so you get RAII)".
After dismissing the first block as "Wrong", Leo acknowledges later that this is something that can occur if a function that obtains a lock calls another function which obtains the same lock. Fine - there is a real danger here to avoid, and the example is not so much "wrong" as an easy trap to fall into through careless programming.
But the second and third examples confuse me completely... because we have one sync object (the CCriticalSection crit) which is used for two CSingleLock locks... implying that crit is not a lockable thing at all, but only the mechanism which does the locking for an independent object or objects. The trouble is, there is a comment saying "crit is unlocked now" right at the end... which contradicts that implication. Also, other comments qualify themselves by the need to test IsLocked()... when, in my understanding, the CCriticalSection cannot time out, and will only ever return if IsLocked() is TRUE.
The Microsoft documentation I have scanned is really not clear about what roles the CSyncObject and the CSingleLock or CMultiLock play. That's my main concern. Can anyone point to documentation that definitively says you can create two locks using a single sync object, as Leo has suggested here?

After dismissing the first block as "Wrong", Leo acknowledges later that this is something that can occur if a function that obtains a lock calls another function which obtains the same lock. Fine - there is a real danger here to avoid, and the example is not so much "wrong" as an easy trap to fall into through careless programming.
The "wrong" first block is always wrong and should never be something you do, whether explicitly or by accident. You cannot use a CSingleLock to obtain multiple locks at the same time.
As its name suggests, CSingleLock is an object which manages one lock on one synchronization object. (The underlying synchronization object may be capable of being locked multiple times, but not via just a single CSingleLock.)
I meant that the other two code-blocks were situations you could run into legitimately.
You never need to lock the same CCriticalSection if you already have a lock on it (since you only need one lock to know you own the object), but you may lock it multiple times (usually as a result of holding the lock, then calling a function which gets the lock itself in case it is called by something that doesn't already have it). That's fine.
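As a minimal sketch of that legitimate pattern (assuming MFC's afxmt.h; the function names are hypothetical): a function that already holds the lock calls another function which takes its own CSingleLock on the same CCriticalSection:
#include <afxmt.h>

CCriticalSection g_crit;

void Inner()
{
    CSingleLock lock(&g_crit, TRUE); // second lock by the same thread: succeeds, no deadlock
    // ... touch the shared state ...
}   // Inner's lock released here

void Outer()
{
    CSingleLock lock(&g_crit, TRUE); // first lock
    Inner();                         // safe even though we already own g_crit
}   // Outer's lock released here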
But the second and third examples confuse me completely... because we have one sync object (the CCriticalSection crit) which is used for two CSingleLock locks... implying that crit is not a lockable thing at all, but only the mechanism which does the locking for an independent object or objects.
You can lock a CCriticalSection directly (and multiple times if you want to). It has Lock and Unlock methods for doing that.
If you do that, though, you have to ensure that you have matching Unlock calls for every one of your Lock calls. It can be easy to miss one (especially if you use early returns or exceptions where an Unlock later in a function may be bypassed entirely).
Using a CSingleLock to lock a CCriticalSection is usually better because it will release the lock it holds automatically when it goes out of scope (including if you return early, throw an exception or whatever).
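To make the difference concrete, a small sketch (again assuming MFC; the function names are hypothetical). The manual version leaks the lock on an early return; the CSingleLock version cannot:
CCriticalSection crit;

void Manual(bool bail)
{
    crit.Lock();
    if (bail)
        return;        // bug: the Unlock() below is bypassed, crit stays locked
    // ... work ...
    crit.Unlock();
}

void Raii(bool bail)
{
    CSingleLock lock(&crit, TRUE); // locked here
    if (bail)
        return;        // fine: lock's destructor releases crit
    // ... work ...
}                      // also released here on the normal path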
Can anyone point to documentation that definitively says you can create two locks using a single sync object as Leo has suggested here?
Although I couldn't find the source, CCriticalSection (like most MFC objects) is almost certainly a very thin wrapper around the Win32 equivalent, in this case CRITICAL_SECTION. The documentation on EnterCriticalSection tells you:
After a thread has ownership of a critical section, it can make additional calls to EnterCriticalSection or TryEnterCriticalSection without blocking its execution. This prevents a thread from deadlocking itself while waiting for a critical section that it already owns. The thread enters the critical section each time EnterCriticalSection and TryEnterCriticalSection succeed. A thread must call LeaveCriticalSection once for each time that it entered the critical section.
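A minimal Win32 sketch of the behavior the quote describes (the function name is hypothetical): the same thread may enter a CRITICAL_SECTION twice, but must leave it the same number of times:
#include <windows.h>

CRITICAL_SECTION cs;

void Demo()
{
    InitializeCriticalSection(&cs);
    EnterCriticalSection(&cs);  // first entry: this thread takes ownership
    EnterCriticalSection(&cs);  // second entry by the owner: does not block
    LeaveCriticalSection(&cs);  // each Enter must be balanced...
    LeaveCriticalSection(&cs);  // ...by a Leave before other threads can enter
    DeleteCriticalSection(&cs);
}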

Related

Why do I need to explicitly detach a short term variable?

Let's say I have a small operation which I want to perform in a separate thread. I do not need to know when it completes, nor do I need to wait for its completion, but I do not want the operation blocking my current thread. When I write the following code, I will get a crash:
#include <thread>

void myFunction() {
    // do other stuff
    std::thread([]()
    {
        // do thread stuff
    });  // the temporary std::thread is destroyed here while still joinable -> std::terminate
}
This crash is solved by assigning the thread to a variable, and detaching it:
void myFunction() {
    // do other stuff
    std::thread t([]()
    {
        // do thread stuff
    });
    t.detach();  // t no longer owns a thread, so destroying it is safe
}
Why is this step necessary? Or is there a better way to create a small single-use thread?
Because the std::thread::~thread() specification says so:
A thread object does not have an associated thread (and is safe to destroy) after
it was default-constructed
it was moved from
join() has been called
detach() has been called
It looks like detach() is the only one of these that makes sense in your case, unless you want to return the thread object (by moving) to the caller.
Why is this step necessary?
Consider that the thread object represents a long-running "thread" of execution (a lightweight process or kernel schedulable entity or similar).
Allowing you to destroy the object while the thread is still executing would leave you no way to subsequently join (and find the result of) that thread. This may be a logical error, but it can also make it hard even to correctly exit your program.
Or is there a better way to create a small single-use thread?
Not obviously, but it's frequently better to use a thread pool for running tasks in the background, instead of starting and stopping lots of short-lived threads.
You might be able to use std::async() instead, but the future it returns may block in its destructor in some circumstances if you try to discard it.
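For illustration, a short sketch of that pitfall (standard C++11, mirroring the question's example): discarding the future returned by std::async makes the call effectively synchronous, because the temporary future's destructor waits for the task:
#include <future>

void myFunction() {
    // do other stuff
    std::async(std::launch::async, []()
    {
        // do thread stuff
    });
    // The discarded std::future is destroyed at the end of the statement
    // above, and its destructor blocks until the task has finished.
}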
See the documentation of the destructor of std::thread:
If *this has an associated thread (joinable() == true), std::terminate() is called.
You should explicitly say that you don't care what's going to happen with the thread, and that you're OK with losing any control over it. And that is what detach() is for.
In general, this looks like a design problem, so crashing makes sense: it's hard to propose a general, unsurprising rule about what should happen in such a case (e.g. your program might normally end its execution - what should happen with the thread?).
Basically, your use case requires a call to detach() because your use case is pretty weird, and not what C++ is trying to make easy.
While Java and .Net blithely let you toss away a Thread object whose associated thread is still running, in the C++ model the Thread is closer to being the thread, in the sense that the existence of the Thread object coincides with the lifetime, or at least joinability, of the execution it refers to. Note how it's not possible to create a Thread without starting it (except in the case of the default constructor, which is really just there in the service of move semantics), or to copy it or to make one from a thread id. C++ wants Thread to outlive the thread.
Maintaining that condition has various benefits. Final cleanup of a thread's control data doesn't have to be done automagically by the OS, because once a Thread goes away, nothing can ever try to join it. It's easier to ensure that variables with thread storage get destroyed in time, since the main thread is the last to exit (barring some move shenanigans). And a missing join -- which is an extremely common type of bug -- gets properly flagged at runtime.
Letting some thread wander off into the distance, in contrast, is allowed, but it's an unusual thing to do. Unless it's interacting with your other threads through sync objects, there's no way to ensure it's done whatever it was meant to do. A detached thread is on the level of reinterpret_cast: You're allowed to tell the compiler that you know something it doesn't, but that has to be explicit, not just the consequence of the function you didn't call.
Consider this: thread A creates thread B and thread A leaves its scope of execution. The handle for thread B is about to be lost. What should happen now? There are several possibilities, with the most obvious as follows:
Thread B is detached and continues its execution independently
Thread A waits (joins) on thread B before quitting its own scope
Now you can argue which is better: 1 or 2? How should we (the compiler) decide which one of these is better?
So what the designers did was something different: terminate the program so that the developer picks one of these solutions explicitly, in order to avoid implicit (perhaps unwanted) behaviour. It's a signal for you: "hey, pay attention now, this piece of code is important and I (the compiler) don't want to decide for you".

Locking in function hierarchies

I am currently running into some design problems regarding concurrent programming in C++, and I was wondering if you could help me out:
Assume that some function func operates on some object obj. It is necessary during these operations to hold a lock (which might be a member variable of obj). Now assume that func calls a subfunction func_2 while it holds the lock. Now func_2 operates on an object which is already locked. However, what if I also want to call func_2 from somewhere else without holding the lock? Should func_2 lock obj or should it not? I see 3 possibilities:
1. I could pass a bool to func_2 indicating whether or not locking is required. This seems to introduce a lot of boilerplate code though.
2. I could use a recursive lock and just always lock obj in func_2. Recursive locks seem to be problematic though, see here.
3. I could assume that every caller of func_2 holds the lock already. I would have to document this and perhaps enforce it (at least in debugging mode). Is it reasonable to have functions make assumptions regarding which locks are / are not held by the calling thread? More generally, how do I decide from a design perspective whether a function should lock obj and which should assume that it is already locked? (Obviously a function that assumes certain locks are held can only call functions which make at least equally strong assumptions, but apart from that?)
My question is the following: Which one of these approaches is used in practice and why?
Thanks in advance
hfhc2
1. Passing an indicator whether to lock or not:
You give the lock choice to the caller. This is error prone:
the caller might not make the right choice
the caller needs to know implementation details about your object, thus breaking the principle of encapsulation
the caller needs access to the mutex
if you have several objects, you eventually create the conditions for deadlocks
2. recursive lock:
You already highlighted the issue.
3. Pass locking responsibility to the caller:
Among the different alternatives that you propose, this seems the most consistent. Contrary to 1, you don't give the caller a choice; you pass complete responsibility for locking. It's part of the contract for using func_2.
You could even assert that a lock is held on the object, to prevent mistakes (although the check would be limited, because you would not necessarily be in a position to verify who owns the lock). A minimal sketch of this contract follows.
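Here is that sketch (standard C++; the _unlocked naming is a hypothetical convention, though it matches the glib API mentioned further below): the public functions lock once, and the unlocked variant documents that the caller must already hold the mutex:
#include <mutex>

struct Obj {
    std::mutex mtx;
    int data = 0;
};

// Precondition: the calling thread already holds obj.mtx.
void func_2_unlocked(Obj& obj) {
    obj.data += 1;                      // the caller's lock protects this
}

// Locking wrapper for callers that do not hold the lock.
void func_2(Obj& obj) {
    std::lock_guard<std::mutex> lock(obj.mtx);
    func_2_unlocked(obj);
}

void func(Obj& obj) {
    std::lock_guard<std::mutex> lock(obj.mtx); // lock once, at the top
    func_2_unlocked(obj);                      // precondition satisfied
}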
4. Reconsider your design:
If you need to ensure in func_2 that the object is locked, it means that you have a critical section therein that you must protect. Chances are that both functions need to lock because they perform some lower-level operations on obj and need to prevent data races on an unstable state of the object.
I'd strongly advise looking at whether it would be feasible to extract these lower-level routines from both func and func_2, and encapsulate them in simpler primitive functions on obj. This approach could also lead to locking for shorter sequences, thus increasing the opportunity for real concurrency.
Ok, just as another follow-up. I recently read the API documentation of glib, in particular the section about message-passing queues. I found that most functions operating on these queues come in two variants, named function and function_unlocked. The idea is that if a programmer wants to execute a single operation, like popping from the queue, this can be done using g_async_queue_pop(). The function automatically takes care of the locking/unlocking of the queue. However, if the programmer wants, for instance, to pop two elements without interruption, the following sequence may be used:
GAsyncQueue *queue = g_async_queue_new();
// ...
g_async_queue_lock(queue);
g_async_queue_pop_unlocked(queue);
g_async_queue_pop_unlocked(queue);
g_async_queue_unlock(queue);
This resembles my third approach. Assumptions regarding the state of certain locks are made here as well; they are required by the API and are documented.

Cheapest way to wake up multiple waiting threads without blocking

I use boost::thread to manage threads. In my program I have a pool of threads (workers) that are activated sometimes to do some job simultaneously.
Now I use boost::condition_variable: all threads are waiting inside boost::condition_variable::wait() calls on their own condition_variable objects.
Can I AVOID using mutexes in the classic scheme when I work with condition variables? I want to wake up threads, but don't need to pass any data to them, so I don't need a mutex to be locked/unlocked during the awakening process. Why should I spend CPU on this (though yes, I should remember about spurious wakeups)?
The boost::condition_variable::wait() call tries to REACQUIRE the locking object when the CV receives a notification. But I don't need this exact facility.
What is the cheapest way to wake several threads from another thread?
If you don't reacquire the locking object, how can the threads know that they are done waiting? What will tell them that? Returning from the block tells them nothing because the blocking object is stateless. It doesn't have an "unlocked" or "not blocking" state for it to return in.
You have to pass some data to them, otherwise how will they know that before they had to wait and now they don't? A condition variable is completely stateless, so any state that you need must be maintained and passed by you.
One common pattern is to use a mutex, condition variable, and a state integer. To block, do this:
1. Acquire the mutex.
2. Copy the value of the state integer.
3. Block on the condition variable, releasing the mutex.
4. If the state integer is the same as it was when you copied it, go to step 3.
5. Release the mutex.
To unblock all threads, do this:
1. Acquire the mutex.
2. Increment the state integer.
3. Broadcast the condition variable.
4. Release the mutex.
Notice how step 4 of the locking algorithm tests whether the thread is done waiting? Notice how this code tracks whether or not there has been an unblock since the thread decided to block? You have to do that because condition variables don't do it themselves. (And that's why you need to reacquire the locking object.)
If you try to remove the state integer, your code will behave unpredictably. Sometimes you will block too long due to missed wakeups and sometimes you won't block long enough due to spurious wakeups. Only a state integer (or similar predicate) protected by the mutex tells the threads when to wait and when to stop waiting.
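For illustration, a sketch of this pattern using the C++11 equivalents of the boost types (the names are illustrative):
#include <condition_variable>
#include <mutex>

std::mutex mtx;
std::condition_variable cv;
unsigned generation = 0;                     // the "state integer"

void block_until_woken() {
    std::unique_lock<std::mutex> lock(mtx);  // step 1
    unsigned seen = generation;              // step 2
    while (generation == seen)               // step 4: still the same? wait again
        cv.wait(lock);                       // step 3: releases mtx while blocked
}                                            // step 5: mutex released

void wake_all_waiters() {
    std::lock_guard<std::mutex> lock(mtx);   // acquire the mutex
    ++generation;                            // increment the state integer
    cv.notify_all();                         // broadcast the condition variable
}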
Also, I haven't seen how your code uses this, but it almost always folds into logic you're already using. Why did the threads block anyway? Is it because there's no work for them to do? And when they wakeup, are they going to figure out what to do? Well, finding out that there's no work for them to do and finding out what work they do need to do will require some lock since it's shared state, right? So there almost always is already a lock you're holding when you decide to block and need to reacquire when you're done waiting.
For controlling threads doing parallel jobs, there is a nice primitive called a barrier.
A barrier is initialized with some positive integer value N representing how many threads it holds. A barrier has only a single operation: wait. When N threads call wait, the barrier releases all of them. Additionally, one of the threads is given a special return value indicating that it is the "serial thread"; that thread will be the one to do some special job, like integrating the results of the computation from the other threads.
The limitation is that a given barrier has to know the exact number of threads. It's really suitable for parallel processing type situations.
POSIX added barriers in 2003. A web search indicates that Boost has them, too.
http://www.boost.org/doc/libs/1_33_1/doc/html/barrier.html
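For illustration, a minimal sketch using boost::barrier (assumes Boost.Thread; the worker function and thread count are illustrative):
#include <boost/thread/barrier.hpp>

boost::barrier bar(4);          // N = 4 participating threads

void worker() {
    // ... do this thread's share of the work ...
    if (bar.wait()) {           // true in exactly one thread per cycle
        // this is the "serial thread": integrate the partial results here
    }
}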
Generally speaking, you can't.
Assuming the algorithm looks something like this:
ConditionVariable cv;

void WorkerThread()
{
    for (;;)
    {
        cv.wait();
        DoWork();
    }
}

void MainThread()
{
    for (;;)
    {
        ScheduleWork();
        cv.notify_all();
    }
}
NOTE: I intentionally omitted any reference to mutexes in this pseudo-code. For the purposes of this example, we'll suppose ConditionVariable does not require a mutex.
The first time through MainThread(), work is queued and then it notifies WorkerThread() that it should execute its work. At this point two things can happen:
WorkerThread() completes DoWork() before MainThread() can complete ScheduleWork().
MainThread() completes ScheduleWork() before WorkerThread() can complete DoWork().
In case #1, WorkerThread() comes back around to sleep on the CV, and is awoken by the next cv.notify() and all is well.
In case #2, MainThread() comes back around and notifies... nobody and continues on. Meanwhile WorkerThread() eventually comes back around in its loop and waits on the CV but it is now one or more iterations behind MainThread().
This is known as a "lost wakeup". It is similar to the notorious "spurious wakeup" in that the two threads now have different ideas about how many notify()s have taken place. If you are expecting the two threads to maintain synchrony (and usually you are), you need some sort of shared synchronization primitive to control it. This is where the mutex comes in. It helps avoid lost wakeups which, arguably, are a more serious problem than the spurious variety. Either way, the effects can be serious.
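For contrast, a sketch of the same loops with the mutex and a piece of shared state restored (C++11 types; pendingWork and the helper declarations are illustrative), which closes the lost-wakeup window:
#include <condition_variable>
#include <mutex>

void DoWork();
void ScheduleWork();

std::mutex mtx;
std::condition_variable cv;
int pendingWork = 0;            // shared state the mutex protects

void WorkerThread()
{
    for (;;)
    {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, []{ return pendingWork > 0; }); // immune to lost and spurious wakeups
        --pendingWork;
        lock.unlock();
        DoWork();
    }
}

void MainThread()
{
    for (;;)
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            ScheduleWork();
            ++pendingWork;
        }
        cv.notify_all();
    }
}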
UPDATE: For further rationale behind this design, see this comment by one of the original POSIX authors: https://groups.google.com/d/msg/comp.programming.threads/cpJxTPu3acc/Hw3sbptsY4sJ
Spurious wakeups are two things:
Write your program carefully, and make sure it works even if you missed something.
Support efficient SMP implementations.
There may be rare cases where an "absolutely, paranoiacally correct" implementation of condition wakeup, given simultaneous wait and signal/broadcast on different processors, would require additional synchronization that would slow down ALL condition variable operations while providing no benefit in 99.99999% of all calls. Is it worth the overhead? No way!
But, really, that's an excuse because we wanted to force people to write safe code. (Yes, that's the truth.)
boost::condition_variable::notify_*() does NOT require that the caller hold the lock on the mutex. This is a nice improvement over the Java model in that it decouples the notification of threads from the holding of the lock.
Strictly speaking, this means the following pointless code SHOULD DO what you are asking:
// Waiting thread:
boost::unique_lock<boost::mutex> lock(mutex); // wait() needs a unique_lock, not a lock_guard
// Do something
cv.wait(lock);
// Do something else

// Notifying thread:
boost::unique_lock<boost::mutex> otherLock(mutex);
// do something
otherLock.unlock();
cv.notify_one();
I do not believe you need to call otherLock.lock() first.

CreateMutex questions

Let's say I call
h = CreateMutex(NULL, FALSE, "full");
y = WaitForSingleObject(h, INFINITE);
// Read from a queue (critical section)
ReleaseMutex(h);
What issues can arise that can lead to an access violation reading a location?
For example, is it possible for multiple threads to enter that critical section at the same time?
Although you're storing the results of those functions in variables, you're not checking them to determine whether the functions succeeded. Perhaps you didn't create or open the given mutex, so h is NULL. Or perhaps instead of acquiring ownership of the mutex, the wait failed. In either case, you should call GetLastError to find out why, and then not execute the protected section of code.
It's possible for a mutex to be abandoned. That means that the thread that previously owned the mutex was terminated before it released ownership of the mutex. (Only a mutex can be abandoned; critical sections and semaphores don't have thread affinity the way mutex objects do.) If that happens, you'll still be granted ownership of the mutex, but you can't really trust the validity of the data that the mutex is supposed to be protecting because the previous owner might not have left things in a stable state before it terminated.
If you call the functions correctly and check for errors, there's no way for multiple threads to enter a critical section simultaneously. That's the whole purpose of synchronization objects.
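For illustration, a sketch of the question's code with those checks added (Win32 API; the wrapper function name is hypothetical):
#include <windows.h>

void ReadFromQueueSafely(void)
{
    HANDLE h = CreateMutex(NULL, FALSE, TEXT("full"));
    if (h == NULL) {
        return;  // creation/opening failed: check GetLastError()
    }
    DWORD y = WaitForSingleObject(h, INFINITE);
    if (y == WAIT_OBJECT_0 || y == WAIT_ABANDONED) {
        // WAIT_ABANDONED still grants ownership, but the previous owner
        // died while holding the mutex, so treat the protected data with suspicion.
        // Read from the queue (critical section).
        ReleaseMutex(h);
    } else {
        // WAIT_FAILED: check GetLastError(); do not enter the critical section.
    }
    CloseHandle(h);
}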

How to synchronize access to many objects

I have a thread pool with some threads (e.g. as many as number of cores) that work on many objects, say thousands of objects. Normally I would give each object a mutex to protect access to its internals, lock it when I'm doing work, then release it. When two threads would try to access the same object, one of the threads has to wait.
Now I want to save some resources and be scalable, as there may be thousands of objects but still only a handful of threads. I'm thinking about a class design where the thread has some sort of mutex or lock object, and assigns the lock to the object when the object should be accessed. This would save resources, as I only have as many lock objects as I have threads.
Now comes the programming part, where I want to transfer this design into code, but don't know quite where to start. I'm programming in C++ and want to use Boost classes where possible, but self written classes that handle these special requirements are ok. How would I implement this?
My first idea was to have a boost::mutex object per thread, and each object has a boost::shared_ptr that initially is unset (or NULL). Now when I want to access the object, I lock it by creating a scoped_lock object and assign it to the shared_ptr. When the shared_ptr is already set, I wait on the present lock. This idea sounds like a heap full of race conditions, so I sort of abandoned it. Is there another way to accomplish this design? A completely different way?
Edit:
The above description is a bit abstract, so let me add a specific example. Imagine a virtual world with many objects (think > 100,000). Users could move through the world and modify objects (e.g. shoot arrows at monsters). When only using one thread, I'm good with a work queue where modifications to objects are queued. I want a more scalable design, though. If 128-core processors are available, I want to use all 128 cores, so use that number of threads, each with a work queue. One solution would be to use spatial separation, e.g. a lock per area. This could reduce the number of locks used, but I'm more interested in whether there's a design which saves as many locks as possible.
You could use a mutex pool instead of allocating one mutex per resource or one mutex per thread. As mutexes are requested, first check the object in question. If it already has a mutex tagged to it, block on that mutex. If not, assign a mutex to that object and signal it, taking the mutex out of the pool. Once the mutex is unsignaled, clear the slot and return the mutex to the pool.
Without knowing it, what you were looking for is Software Transactional Memory (STM).
STM systems manage the needed locks internally to ensure the ACI properties (Atomic, Consistent, Isolated). This is an active research area. You can find a lot of STM libraries; in particular I'm working on Boost.STM (the library is not yet ready for beta test, and the documentation is not really up to date, but you can play with it). There are also some compilers that are introducing TM (such as the Intel, IBM, and Sun compilers). You can get the draft specification from here
The idea is to identify the critical regions as follows
transaction {
    // transactional block
}
and let the STM system manage the needed locks, as long as it ensures the ACI properties.
The Boost.STM approach lets you write things like
int inc_and_ret(stm::object<int>& i) {
    BOOST_STM_TRANSACTION {
        return ++i;
    } BOOST_STM_END_TRANSACTION
}
You can see the BOOST_STM_TRANSACTION/BOOST_STM_END_TRANSACTION pair as a way to delimit a scoped implicit lock.
The cost of this pseudo-transparency is 4 metadata bytes for each stm::object.
Even if this is far from your initial design, I really think it is what was behind your goal and initial design.
I doubt there's any clean way to accomplish your design. The problem is that assigning the mutex to the object looks like it'll modify the contents of the object -- so you need a mutex to protect the object from several threads trying to assign mutexes to it at once; to keep your first mutex assignment safe, you'd need another mutex to protect the first one.
Personally, I think what you're trying to cure probably isn't a problem in the first place. Before I spent much time on trying to fix it, I'd do a bit of testing to see what (if anything) you lose by simply including a Mutex in each object and being done with it. I doubt you'll need to go any further than that.
If you need to do more than that I'd think of having a thread-safe pool of objects, and anytime a thread wants to operate on an object, it has to obtain ownership from that pool. The call to obtain ownership would release any object currently owned by the requesting thread (to avoid deadlocks), and then give it ownership of the requested object (blocking if the object is currently owned by another thread). The object pool manager would probably operate in a thread by itself, automatically serializing all access to the pool management, so the pool management code could avoid having to lock access to the variables telling it who currently owns what object and such.
Personally, here's what I would do. You have a number of objects, all probably have a key of some sort, say names. So take the following list of people's names:
Bill Clinton
Bill Cosby
John Doe
Abraham Lincoln
Jon Stewart
So now you would create a number of lists: one per letter of the alphabet, say. Bill and Bill would go in one list, John and Jon in another, and Abraham in his own.
Each list would be assigned to a specific thread - access would have to go through that thread (you would have to marshal operations on an object onto that thread - a great use of functors). Then you only have two places to lock:
thread() {
    loop {
        scoped_lock lock(list.mutex);
        list.objectAccess();
    }
}

list_add() {
    scoped_lock lock(list.mutex);
    list.add(..);
}
Keep the locks to a minimum, and if you're still doing a lot of locking, you can increase the number of operations you perform per lock acquisition (say, from 1 object to 5) to minimise the time spent acquiring locks. If your data set grows or is keyed by number, you can segregate the data as much as needed to keep the locking to a minimum.
It sounds to me like you need a work queue. If the lock on the work queue became a bottleneck, you could switch it around so that each thread had its own work queue, and some sort of scheduler would give the incoming object to the thread with the least amount of work to do. The next level up from that is work stealing, where threads that have run out of work look at the work queues of other threads. (See Intel's Threading Building Blocks library.)
If I follow you correctly ....
struct table_entry {
    void * pObject;   // substitute with your object
    sem_t  sem;       // init to empty
    int    nPenders;  // init to zero
};

struct table_entry * table;

object_lock (void * pObject) {
    goto label;                    // yes, it is an evil goto
    do {
        pEntry->nPenders++;
        unlock (mutex);
        sem_wait (&pEntry->sem);   // sleep until the current holder releases
label:
        lock (mutex);
        found = search (table, pObject, &pEntry);
    } while (found);
    add_object_to_table (table, pObject);
    unlock (mutex);
}

object_unlock (void * pObject) {
    lock (mutex);
    pEntry = remove (table, pObject);  // assuming it is in the table
    if (pEntry->nPenders != 0) {
        pEntry->nPenders--;
        sem_post (&pEntry->sem);       // wake one waiter
    }
    unlock (mutex);
}
The above should work, but it does have some potential drawbacks such as ...
A possible bottleneck in the search.
Thread starvation. There is no guarantee that any given thread will get out of the do-while loop in object_lock().
However, depending upon your setup, these potential drawbacks might not matter.
Hope this helps.
We here have an interest in a similar model. A solution we have considered is to have a global (or shared) lock but used in the following manner:
A flag that can be atomically set on the object. If you set the flag you then own the object.
You perform your action, then reset the flag and signal (broadcast) a condition variable.
If the acquire failed, you wait on the condition variable. When it is broadcast, you check the flag again to see whether the object is available.
It does appear, though, that we need to lock the mutex each time we change the value of this flag. So there is a lot of locking and unlocking, but you do not need to hold the lock for any long period.
With a "shared" lock you have one lock applying to multiple items. You would use some kind of "hash" function to determine which mutex/condition variable applies to this particular entry.
In answer to the following question, asked under @JohnDibling's post by @LeonardoBernardini:
"Did you implement this solution? I have a similar problem and I would like to know how you solved releasing the mutex back to the pool. I mean, how do you know, when you release the mutex, that it can safely be put back in the queue if you do not know whether another thread is holding it?"
I'm currently trying to solve the same kind of problem. My approach is to create your own mutex struct (call it counterMutex) with a counter field and the real resource mutex field. So every time you try to lock the counterMutex, first you increment the counter, then lock the underlying mutex. When you're done with it, you decrement the counter and unlock the mutex; after that, check the counter to see if it's zero, which means no other thread is trying to acquire the lock. If so, put the counterMutex back into the pool. Is there a race condition when manipulating the counter, you may ask? The answer is no: remember you have a global mutex to ensure that only one thread can access the counterMutex at a time.
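For illustration, a minimal sketch of that counterMutex idea in standard C++ (all names are hypothetical); the global table mutex is what makes the counter manipulation race-free:
#include <mutex>

struct CounterMutex {
    int        counter = 0;   // threads currently wanting/holding the resource
    std::mutex resource;      // the real per-object lock
};

std::mutex g_table;           // global mutex guarding all counters and the pool

void lock_counter_mutex(CounterMutex& cm) {
    {
        std::lock_guard<std::mutex> g(g_table);
        ++cm.counter;         // register interest under the global mutex
    }
    cm.resource.lock();       // then take the real lock (may block)
}

void unlock_counter_mutex(CounterMutex& cm) {
    cm.resource.unlock();
    std::lock_guard<std::mutex> g(g_table);
    if (--cm.counter == 0) {
        // no other thread wants this mutex: safe to return it to the pool
    }
}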