I have four member functions that can be called multiple times, asynchronously, from other pieces of code. Since these functions use the class's member variables, I need to ensure that until one call has finished executing, a second call does not start but instead waits in a queue.
I have heard of the lock guard feature in C++ that locks a code block - in my case it would act as an automatic lock for the duration of a function:
void DoSomeWork()
{
std::lock_guard<std::mutex> lg(m); // Lock will be held from here to end of function
// ... do the work on member variables ...
return;
}
Since my four class methods do independent work, should I have four mutexes, one per lock guard per member function? And will the async calls be queued in some way while a lock guard is active?
I mean, if say 10 calls are made to a member function at the same time, then once the 1st call acquires the lock, will the remaining 9 calls wait until the lock is free and execute one by one?
If a mutex is locked, the next request to lock it will block until the previous thread holding the lock has unlocked it.
Note that attempting to lock a mutex multiple times from a single thread is undefined behavior. Don't do that.
For more information see e.g. this std::mutex reference.
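For the original question, a minimal sketch of the pattern (assuming the four methods share the same member state; the class and method names are made up):

#include <mutex>

class Worker
{
public:
    // Every method locks the same mutex, so concurrent calls from any
    // number of threads are serialized: one runs, the rest block and
    // take their turn as the lock is released.
    void DoSomeWork() {
        std::lock_guard<std::mutex> lg(m_);
        // ... touch member variables safely ...
    }
    void DoOtherWork() {
        std::lock_guard<std::mutex> lg(m_);
        // ... touch member variables safely ...
    }
private:
    std::mutex m_; // guards the member state shared by the four methods
};

If the four methods really do touch disjoint sets of members, four separate mutexes would let different methods run concurrently while still serializing calls to the same one.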
Assuming you mean multiple threads locking the same mutex (based on your prior questions): there is no queuing guarantee for pthreads or POSIX synchronization types. Say multiple threads each run a loop that starts with a lock and ends with an unlock, looping right back to the lock request. In that case the same thread can keep reacquiring the lock, and none of the other threads will run (there is only a small chance that a time slice occurs between the unlock and the lock, switching context to another thread). Condition variables also have an issue with spurious wakeups.
https://en.wikipedia.org/wiki/Spurious_wakeup
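A minimal sketch of that loop scenario (illustrative only):

#include <mutex>

std::mutex m;

// Several threads run this loop. Nothing forces the mutex to rotate
// between them: the thread that just unlocked can immediately relock,
// starving the other waiters on many implementations.
void worker() {
    for (;;) {
        m.lock();
        // ... short critical section ...
        m.unlock();
        // no pause here: the relock often succeeds before anyone else runs
    }
}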
Based on testing, Windows native synchronization objects (CreateMutex, CreateSemaphore, WaitForSingleObject, WaitForMultipleObjects) do queue requests, but I haven't found this documented.
Some server applications on some operating systems install a device driver that runs at kernel level in order to work around the limitations of the synchronization types on those operating systems.
Does std::condition_variable::notify_one() or std::condition_variable::notify_all() guarantee that non-atomic memory writes in the current thread prior to the call will be visible in notified threads?
Other threads do:
{
std::unique_lock lock(mutex);
cv.wait(lock, [&]() { return values[threadIndex] != 0; });
// May a thread here see a zero value and therefore start to wait again?
}
Main thread does:
fillData(values); // All values are zero and all threads wait() before calling this.
cv.notify_all(); // Do I need some memory fence or lock before this
                 // to ensure that the new non-zero values will be visible
                 // in other threads immediately after waking up?
Doesn't notify_all() store some atomic value, thereby enforcing memory ordering? I haven't been able to clarify that.
UPD: according to Superlokkus' answer and an answer here, we have to acquire a lock to ensure the visibility of memory writes in other threads (memory propagation); otherwise the threads in my case may read zero values.
Also, I had missed this quote about condition_variable, which specifically answers my question: even an atomic variable has to be modified under a lock when the modification must become visible immediately to a waiting thread.
Even if the shared variable is atomic, it must be modified under the mutex in order to correctly publish the modification to the waiting thread.
I guess you are mixing up the memory ordering of so-called atomics and the mechanisms of classic lock-based synchronization.
When you have a datum that is shared between threads, let's say an int for example, one thread cannot simply read it while another thread might be writing to it at the same time. Otherwise we would have a data race.
To get around this, we have long used classic lock-based synchronization: the threads share at least a mutex and the int. To read or to write, a thread has to hold the lock first, meaning it waits on the mutex; mutexes are built so that this can safely happen concurrently. If a thread wins the race to get the mutex, it can change or read the int and should then unlock it, so others can read or write too. Using a condition variable as you did just makes the pattern "readers wait for a change of a value by a writer" more efficient: readers get woken up by the cv instead of periodically locking, reading, and unlocking, which would be busy waiting.
So because you hold the lock after waiting on the mutex - or, in your case, after (correctly; the mutex is still needed) waiting on the condition variable - you can change the int, and readers will read the new value after the writer wrote it, never the old one.
UPDATE: However, one thing I have to add, which might also be the cause of confusion: condition variables are subject to so-called spurious wakeups. That means that even though the writer did not notify any thread, a reader thread might still wake up, with the mutex locked. So you have to check whether the writer actually woke you up, which is usually done by the writer changing another datum just to signal this, or, where suitable, by using the same datum you already wanted to share. The lambda-parameter overload of std::condition_variable::wait exists just to make that "check and go back to sleep" code look a bit prettier. From your question I can't tell whether you want to use your values for this job.
However, the snippet for the "main" thread is incorrect, or at least incomplete:
You are not synchronizing on the mutex in order to change values.
You have to hold the lock for that, but notifying can be done without the lock.
std::unique_lock lock(mutex); // take the lock before touching the shared values
fillData(values);             // safe: no reader can observe a partial update
lock.unlock();                // release first ...
cv.notify_all();              // ... then notify; woken threads won't block on the mutex
But these mutex-based patterns have drawbacks and are slow: only one thread at a time can make progress. This is where so-called atomics, like std::atomic<int>, come into play. They can be written and read by multiple threads concurrently, without a mutex. Memory ordering only becomes a consideration there, as an optimization for cases where you use several of them in a meaningful way, or where you don't need the "after the write, I never see the old value" guarantee. With the default memory ordering, memory_order_seq_cst, you would also be fine.
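For contrast, a minimal sketch of the atomic approach (illustrative; the 0-means-not-ready protocol is made up):

#include <atomic>
#include <thread>

std::atomic<int> value{0}; // 0 means "not yet written"

void writer() {
    value.store(42); // memory_order_seq_cst by default
}

void reader() {
    // No mutex needed, but note this is busy waiting; a condition
    // variable avoids burning CPU while nothing changes.
    while (value.load() == 0)
        std::this_thread::yield();
    // value is now 42; seq_cst guarantees we never observe a stale 0 afterwards
}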
I am currently working on a lightweight filter in the NDIS stack. I'm trying to inject a packet which is stored in a global variable as an NBL. During receive NBL processing, if an injected NBL is pending, a lock is taken by the thread before picking the injected NBL up to process it. Originally I was looking at using a spin lock or FAST_MUTEX, but according to the documentation for FAST_MUTEX, any other thread attempting to take the lock will wait for the lock to be released before continuing.
The problem is that the receive NBL handling runs in DPC mode, which would cause a thread running at DPC to pause and wait for the lock to be released. Additionally, I'd like to be able to assert a thread's ownership of a lock.
My question is: does the Windows kernel support unique mutex locks, can these locks be taken in DPC mode, and how expensive is the assertion of ownership of a lock? I'm fairly new to C++, so forgive any syntax errors.
I attempted to define a mutex in the LWF object
// Header file
#pragma once
#include <mutex.h>
class LWFobject
{
public:
    LWFobject();
    std::mutex ExampleMutex;
    std::unique_lock<std::mutex> ExampleLock;
};
//=============================================
// CPP file
#include "LWFobject.h"
LWFobject::LWFobject()
{
    ExampleMutex = CreateMutex(
        NULL,
        FALSE,
        NULL);
    ExampleLock = std::unique_lock<std::mutex>(ExampleMutex, std::defer_lock);
}
Is the use of unique_lock supported in the kernel? When I attempt to compile this, it throws hundreds of compilation errors on mutex.h. I'd like to use try_lock and owns_lock.
You can't use standard ISO C++ synchronization mechanisms while inside a Windows kernel.
A Windows kernel is a whole other world in itself, and requires you to live by its rules (which are vast - see for example these two 700-page books: 1, 2).
Processing inside a Windows kernel is largely asynchronous and event-based; you handle events and schedule deferred calls or use other synchronization techniques for work that needs to be done later.
Having said that, it is possible to have a mutex in the traditional sense inside a Windows driver. It's called a Fast Mutex and requires raising IRQL to APC_LEVEL. Then you can use calls like ExAcquireFastMutex, ExTryToAcquireFastMutex and ExReleaseFastMutex to lock/try-lock/release it.
A fundamental property of a lock is which priority (IRQL) it's synchronized at. A lock can be acquired from lower priorities, but can never be acquired from a higher priority.
(Why? Imagine how the lock is implemented. The lock must raise the current task priority up to the lock's natural priority. If it didn't do this, then a task running at a low priority could grab the lock, get pre-empted by a higher priority task, which would then deadlock if it tried to acquire the same lock. So every lock has a documented natural IRQL, and the lock will first raise the current thread to that IRQL before attempting to acquire exclusivity.)
The NDIS datapath can run at any IRQL between PASSIVE_LEVEL and DISPATCH_LEVEL, inclusive. This means that anything on the datapath must only ever use locks that are synchronized at DISPATCH_LEVEL (or higher). This really limits your choices: you can use KSPIN_LOCKs, NDIS_RW_LOCKs, and a handful of other uncommon ones.
This gets viral: if you have one function that can sometimes run at DISPATCH_LEVEL (like the datapath), that forces its lock to be synchronized at DISPATCH_LEVEL, which in turn forces any other function that takes the lock to also run at DISPATCH_LEVEL. That can be inconvenient; for example, you might want to hold the lock while reading from the registry too.
There are various approaches to design your driver:
* Use spinlocks everywhere. When reading from the registry, read into temporary variables, then grab a spinlock and copy the temporary variables into global state (see the sketch after this list).
* Use mutexes (or better yet: pushlocks) everywhere. Quarantine the datapath into a component that runs at dispatch level, and carefully copy any configuration state into this component's private state.
* Somehow avoid having your datapath interact with the rest of your driver, so there's no shared state, and thus no shared locks.
* Have the datapath rush to PASSIVE_LEVEL by queuing all packets to a worker thread.
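A minimal sketch of the first approach, assuming the WDK spinlock routines; the globals and the CONFIG type are made up for illustration:

#include <ntddk.h>

typedef struct _CONFIG { ULONG Flags; } CONFIG; // hypothetical configuration state

KSPIN_LOCK g_ConfigLock; // protects g_Config; usable at IRQL <= DISPATCH_LEVEL
CONFIG     g_Config;

VOID InitLocks(void)
{
    KeInitializeSpinLock(&g_ConfigLock);
}

// Called at PASSIVE_LEVEL: read the registry into a temporary first,
// then publish it under the spinlock with a short critical section.
VOID PublishConfig(const CONFIG* Fresh)
{
    KIRQL oldIrql;
    KeAcquireSpinLock(&g_ConfigLock, &oldIrql); // raises to DISPATCH_LEVEL
    g_Config = *Fresh;
    KeReleaseSpinLock(&g_ConfigLock, oldIrql);
}

// Called on the datapath, anywhere up to DISPATCH_LEVEL.
VOID SnapshotConfig(CONFIG* Out)
{
    KIRQL oldIrql;
    KeAcquireSpinLock(&g_ConfigLock, &oldIrql);
    *Out = g_Config;
    KeReleaseSpinLock(&g_ConfigLock, oldIrql);
}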
I need to know if there is a way to "queue up" threads that wait on a condition variable so that they are awoken in the correct order...without writing a bunch of queueing code, that is.
In most systems, the following reversal of the producer/consumer model (with blocking on full mailbox) may not ensure ordering:
unique_lock lock1(mutex), lock2(mutex)
ConditionVariable cv
Code Block A: (called by multiple threads)
lock(lock1)
timestampOnEntry = now()
cv.wait(lock1) // Don't worry about spurious notifies, out of scope.
somethingRequiringMonotonicOrderOfTimestamps(timestampOnEntry)
unlock(lock1)
Code Block B: (called by a single thread, typically within a loop)
lock(lock2)
somethingVeryVerySlow()
(1) unlock(lock2) // the ordering here is not a mistake
(2) cv.notify_one(lock2) // prevents needless reblocking in code block A
Note that lines (1) and (2) appear in that order deliberately. This prevents a needless second block on the mutex in code block A, should the notified thread wake up before the mutex is unlocked by the thread in code block B.
The question: if multiple threads are "blocked" on wait, will *notify_one* wake them up in the order in which they blocked? Probably not (as in Java). If not by default, is there a way to specify that?
This could of course be done with a bunch of queueing code, but I'd prefer to use a pre-canned Boost methodology, regardless of how complicated the contents of the can are. Of course, should I convert *cv.notify_one(lock2)* into *cv.notify_all(lock2)*, I would be required to write the queueing code regardless.
No such guarantees are given by the standard; notify_one may wake any thread that is currently waiting (§30.5.1):
void notify_one() noexcept;
Effects: If any threads are blocked waiting for *this, unblocks one of those threads.
The only way to ensure that a specific thread reacts to the event is to wake all threads and then have some additional synchronization mechanism that sends all but the correct thread back to sleep.
This is a fundamental limitation due to the requirements that the platform has to fulfill: Usually condition variables are implemented in a way that the waiting threads are put into a suspended state and will not get scheduled by the system again until a notify occurs. A scheduler implementation is not required to provide the functionality for selecting a specific thread for waking up (and many actually don't).
So this part of the logic inevitably has to be handled by user code, which in turn means you have to wake up all threads to make it work, because this is the only way to ensure that the correct thread will get woken at all.
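For instance, a ticket counter can impose FIFO wakeup on top of notify_all (a sketch; this is hand-written queueing, not a pre-canned facility):

#include <condition_variable>
#include <cstdint>
#include <mutex>

std::mutex m;
std::condition_variable cv;
std::uint64_t next_ticket = 0; // handed out to waiters in arrival order
std::uint64_t now_serving = 0; // advanced by the releasing thread

void wait_in_order() {
    std::unique_lock<std::mutex> lk(m);
    const std::uint64_t my_ticket = next_ticket++;
    // Everyone is woken by notify_all, but only the holder of the lowest
    // outstanding ticket proceeds; all the others go back to sleep.
    cv.wait(lk, [&] { return now_serving == my_ticket; });
    // ... do the work that requires FIFO order ...
}

void release_next() {
    {
        std::lock_guard<std::mutex> lk(m);
        ++now_serving;
    }
    cv.notify_all();
}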
The short answer, as you seem to have suspected, is no. Which thread (or threads) notify_one will rouse is not guaranteed.
That said, I'm not sure what to make of your example code. Specifically, passing a mutex to notify_one doesn't make sense to me (I am unaware of any condition variable implementation on any platform that signals/broadcasts that way). I don't know your use case; perhaps you have a lot of thread-local data that prevents arranging your application state in such a way that any thread can pick up the necessary data to do the next task? My first reaction would be to refactor the code to care less about which particular OS thread does which work and to focus more on the ordering of the work itself.
I need a fully recursive multiple-reader/single-writer lock (shared mutex) for my project. I don't agree with the notion that if you have complete const-correctness you shouldn't need them (there was some discussion about that on the Boost mailing list); in my case the lock should protect a completely transparent cache, which would be mutable in any case.
As for the semantics of recursive MRSW locks, I think the only ones that make sense are that acquiring an exclusive lock in addition to a shared one temporarily releases the shared one, to be reacquired when the exclusive one is released.
This has the somewhat strange effect that unlocking can wait, but I can live with that - writing rarely happens anyway, and recursive locking usually only happens through recursive code paths, in which case the caller has to be prepared for the call to wait in any case. To avoid it, one can still simply upgrade the lock instead of locking recursively.
Acquiring a shared lock on top of an exclusive one should obviously just increase the lock count.
So the question becomes: how should I implement it? The usual approach with a critical section and two semaphores doesn't work here because, as far as I can see, the woken-up thread has to handshake by inserting its thread id into the lock's owner map.
I suppose it would be doable with two condition variables and a couple of mutexes, but the sheer number of synchronization primitives I would end up using sounds like a bit too much overhead for my taste.
An idea which just sprang to mind is to use TLS to remember the type of lock I'm holding (and possibly the local lock counts). I still have to think it through, but I'll post the question for now.
Target platform is Win32 but that shouldn't really matter. Note that I'm specifically targeting Win2k so anything related to the new MRSW lock primitive in Windows 7 is not relevant for me. :-)
Okay, I solved it.
It can be done with just two semaphores, a critical section, and almost no more locking than for a regular non-recursive MRSW lock (there is obviously some more CPU time spent inside the lock because the multiset must be managed) - but it's tricky. The structure I came up with looks like this:
// Protects everything that follows, except mWriterThreadId and mRecursiveUpgrade
CRITICAL_SECTION mLock;
// Semaphore to wait on for a read lock
HANDLE mSemaReader;
// Semaphore to wait on for a write lock
HANDLE mSemaWriter;
// Number of threads waiting for a write lock.
int mWriterWaiting;
// Number of times the writer entered the write lock.
int mWriterActive;
// Number of threads inside a read lock. Note that this does not include
// recursive read locks.
int mReaderActiveThreads;
// Whether or not the current writer obtained the lock by a recursive
// upgrade. Note that this member might be set outside the critical
// section, so it should only be read from by the writer during his
// unlock.
bool mRecursiveUpgrade;
// This member contains the current thread id once for each
// (recursive) read lock held by the current thread in addition to an
// undefined number of other thread ids which may or may not hold a
// read lock, even inside the critical section (!).
std::multiset<unsigned long> mReaderActive;
// If there is no writer this member contains 0.
// If the current thread is the writer this member contains his
// thread-id.
// Otherwise it can contain either of them, even inside the
// critical section (!).
// Also note that it might be set outside the critical section.
unsigned long mWriterThreadId;
Now, the basic idea is this:
Full update of mWriterWaiting and mWriterActive for an unlock is performed by the unlocking thread.
For mWriterThreadId and mReaderActive this is not possible, as the waiting thread needs to insert itself when it was released.
So the rule is that you may never access those two members except to check whether you are holding a read lock or are the current writer. Specifically, they may not be used to check whether there are any readers or writers at all; for that you have to use the (somewhat redundant, but necessary for this reason) mReaderActiveThreads and mWriterActive.
I'm currently running some test code (which has been going on deadlock- and crash-free for 30 minutes or so) - when I'm sure that it's stable and I've cleaned up the code somewhat I'll put it on some pastebin and add a link in a comment here (just in case someone else ever needs this).
Well, I did some thinking. Starting from the simple "two semaphores and a critical section", one adds a writer lock count and an owning-writer TID to the structure.
Unlock still sets most of the new status inside the critsec. Readers still normally increase the lock count; recursive locking simply adds a non-existing reader to the counter.
During the writer's lock() I compare the owning TID, and if the writer already owns the lock, the write lock counter is increased.
Setting the new writer TID can't be done by unlock() - it doesn't know which thread will be woken - but if writers reset it back to zero in their unlock() it's not a problem: the current thread id will never be zero, and setting it is an atomic operation.
That all sounds simple enough, but one nasty problem is left: a recursive reader-reader lock while a writer is waiting will deadlock. And I don't know how to solve that, short of making the lock reader-biased... somehow I need to know whether I already own a reader lock.
Using TLS doesn't sound too great after I realized that the number of available slots might be rather limited...
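For what it's worth, the TLS idea would look something like this (a hypothetical sketch: acquireSharedSlowPath/releaseSharedSlowPath stand in for the normal semaphore path, and a C++11 thread_local stands in for raw TlsAlloc slots):

// Per-thread recursive read depth; answers "do I already hold a read
// lock?" without consulting any shared state.
thread_local int t_readDepth = 0;

void SharedLock()
{
    if (t_readDepth > 0) {
        ++t_readDepth; // recursive read: bypass the writer-waiting gate,
        return;        // which is what prevents the reader-reader deadlock
    }
    acquireSharedSlowPath(); // hypothetical: the usual semaphore/critsec path
    t_readDepth = 1;
}

void SharedUnlock()
{
    if (--t_readDepth == 0)
        releaseSharedSlowPath(); // hypothetical counterpart
}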
As far as I understand, you need to provide your writer exclusive access to the data, while readers can operate simultaneously (if this is not what you want, please clarify your question).
I think you need to implement a sort of "inverse semaphore", i.e. a semaphore that blocks a thread while its count is positive, and signals all waiting threads when the count reaches zero. If you do this, you can use two such semaphores for your program. The operation of your threads could then be the following:
Reader:
(1) wait on sem A
(2) increase sem B
(3) read operation
(4) decrease sem B
Writer:
(1) increase sem A
(2) wait on sem B
(3) write operation
(4) decrease sem A
In this way the writer will perform the write operation as soon as all pending readers have finished reading. As soon as your writer finishes, readers can resume their operation without blocking each other.
I am not familiar with the Windows mutex/semaphore facilities, but I can think of a way to implement such semaphores using the POSIX threads API, combining a mutex, a counter and a condition variable; a sketch follows.
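A minimal sketch of such an inverse semaphore, using that mutex + counter + condition variable combination (shown here with the equivalent C++ standard primitives; the class name and interface are made up):

#include <condition_variable>
#include <mutex>

class InverseSemaphore
{
public:
    void increase() {
        std::lock_guard<std::mutex> lk(m_);
        ++count_;
    }
    void decrease() {
        std::lock_guard<std::mutex> lk(m_);
        if (--count_ == 0)
            cv_.notify_all(); // count hit zero: release everyone blocked in wait()
    }
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ == 0; }); // block while positive
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

A reader would then call wait() on semaphore A and increase()/decrease() on B, and the writer the reverse, exactly as in the steps above.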
Is there any downside to calling pthread_cond_timedwait without taking a lock on the associated mutex first, and also not taking a mutex lock when calling pthread_cond_signal ?
In my case there is really no condition to check, I want a behavior very similar to Java wait(long) and notify().
According to the documentation, there can be "unpredictable scheduling behavior". I am not sure what that means.
An example program seems to work fine without locking the mutexes first.
The first is not OK:
The pthread_cond_timedwait() and pthread_cond_wait() functions shall block on a condition variable. They shall be called with mutex locked by the calling thread or undefined behavior results.
http://opengroup.org/onlinepubs/009695399/functions/pthread_cond_timedwait.html
The reason is that the implementation may want to rely on the mutex being locked in order to safely add you to a waiter list. And it may want to release the mutex without first checking it is held.
The second is disturbing:
if predictable scheduling behaviour is required, then that mutex is locked by the thread calling pthread_cond_signal() or pthread_cond_broadcast().
http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_cond_signal.html
Off the top of my head, I'm not sure what the specific race condition is that messes up scheduler behaviour if you signal without taking the lock. So I don't know how bad the undefined scheduler behaviour can get: for instance maybe with broadcast the waiters just don't get the lock in priority order (or however your particular scheduler normally behaves). Or maybe waiters can get "lost".
Generally, though, with a condition variable you want to set the condition (at least a flag) and signal, rather than just signal, and for this you need to take the mutex. The reason is that otherwise, if you're concurrent with another thread calling wait(), then you get completely different behaviour according to whether wait() or signal() wins: if the signal() sneaks in first, then you'll wait for the full timeout even though the signal you care about has already happened. That's rarely what users of condition variables want, but may be fine for you. Perhaps this is what the docs mean by "unpredictable scheduler behaviour" - suddenly the timeslice becomes critical to the behaviour of your program.
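For illustration, the usual "set the condition under the mutex, then signal" pattern might look like this (a sketch; the ready flag and function names are made up):

#include <pthread.h>

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
bool ready = false; // the actual condition the condvar stands for

void signal_event(void) {
    pthread_mutex_lock(&mtx);
    ready = true;               // record the event under the mutex ...
    pthread_mutex_unlock(&mtx);
    pthread_cond_signal(&cond); // ... so a racing waiter can never miss it
}

void wait_for_event(void) {
    pthread_mutex_lock(&mtx);
    while (!ready)                      // also guards against spurious wakeups
        pthread_cond_wait(&cond, &mtx); // atomically unlocks mtx while waiting
    pthread_mutex_unlock(&mtx);
}

With the flag in place, a wait that starts after the signal has already fired returns immediately instead of blocking for the full timeout.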
Btw, in Java you have to have the lock in order to notify() or notifyAll():
This method should only be called by a thread that is the owner of this object's monitor.
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html#notify()
The Java synchronized {/}/wait/notifty/notifyAll behaviour is analogous to pthread_mutex_lock/pthread_mutex_unlock/pthread_cond_wait/pthread_cond_signal/pthread_cond_broadcast, and not by coincidence.
Butenhof's excellent "Programming with POSIX Threads" discusses this right at the end of section 3.3.3.
Basically, signalling the condvar without locking the mutex is a potential performance optimisation: if the signalling thread has the mutex locked, then the thread waking on the condvar has to immediately block on the mutex that the signalling thread has locked even if the signalling thread is not modifying any of the data the waiting thread will use.
The reason that "unpredictable scheduler behavior" is mentioned is this: if you have a high-priority thread waiting on the condvar (which another thread is going to signal to wake it up), any other lower-priority thread can come along and lock the mutex, so that when the condvar is signalled and the high-priority thread awakens, it has to wait on the lower-priority thread to release the mutex. If the mutex is locked while signalling, then the higher-priority thread will be scheduled on the mutex before the lower-priority thread: basically, you know that when you "awaken" the high-priority thread it will awaken as soon as the scheduler allows it (of course, you might have to wait on the mutex before signalling the high-priority thread, but that's a different issue).
The point of waiting on a condition variable paired with a mutex is to atomically enter the wait and release the lock, i.e. allow other threads to modify the protected state, and then atomically receive notification of the state change and reacquire the lock. What you describe can be done with many other mechanisms, like pipes, sockets, signals, or - probably the most appropriate here - semaphores.
I think this should work (note: untested code):
// initialize a semaphore
#include <errno.h>
#include <semaphore.h>
#include <time.h>

sem_t sem;
sem_init(&sem,
         0,  // not shared between processes
         0); // initial value of 0

// thread A (msecs is the timeout in milliseconds)
struct timespec tm;
clock_gettime(CLOCK_REALTIME, &tm); // sem_timedwait expects an absolute CLOCK_REALTIME time
tm.tv_sec += msecs / 1000;
tm.tv_nsec += (msecs % 1000) * 1000000L;
if (tm.tv_nsec >= 1000000000L) { // carry nanoseconds into seconds
    tm.tv_nsec -= 1000000000L;
    tm.tv_sec++;
}

// wait until timeout or woken up; restart if interrupted by a signal
errno = 0;
while (sem_timedwait(&sem, &tm) == -1 && errno == EINTR) {
    continue;
}
return errno == ETIMEDOUT; // returns true if a timeout occurred

// thread B
sem_post(&sem); // wake up thread A early
Conditions should be signaled outside of the mutex whenever possible. Mutexes are a necessary evil in concurrent programming. Their use leads to contention which robs the system of the maximum performance that it can gain from the use of multiple processors.
The purpose of a mutex is to guard access to some shared variables in the program so that they behave atomically. When a signaling operation is done inside the mutex, it drags hundreds of machine cycles into the critical section which have nothing to do with guarding the shared data; potentially, it calls from user space all the way into the kernel.
The notes about "predictable scheduler behavior" in the standard are completely bogus.
When we want the machine to execute statements in a predictable, well-defined order, the tool for that is the sequencing of statements within a single thread of execution: S1 ; S2. Statement S1 is "scheduled" before S2.
We use threads when we realize that some actions are independent and their scheduling order is not important, and there are performance benefits to be realized, like more timely response to real time events or computing on multiple processors.
At times when scheduling orders do become important among multiple threads, this falls under a concept called priority. Priority resolves what happens first when any one of N statements could potentially be scheduled to execute. Another tool for ordering under multithreading is queuing. Events are placed into a queue by one or more threads and a single service thread processes the events in the queue order.
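For instance, a minimal sketch of that queueing approach (hypothetical names; a single service thread drains events in FIFO order):

#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex m;
std::condition_variable cv;
std::queue<int> events; // stand-in for whatever work must be ordered

void produce(int ev) {
    {
        std::lock_guard<std::mutex> lk(m);
        events.push(ev); // the queue, not the wakeup order, fixes the ordering
    }
    cv.notify_one();
}

void service_thread() {
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !events.empty(); });
        int ev = events.front();
        events.pop();
        lk.unlock();
        // ... process ev; FIFO order is guaranteed by the queue itself ...
    }
}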
The bottom line is, the placement of pthread_cond_broadcast is not an appropriate tool for controlling execution order. It will not make execution order predictable in the sense that the program suddenly has exactly the same, reproducible behavior on every platform.
"unpredictable scheduling behavior" means just that. You don't know what's going to happen.
Nor does the implementation. It could work as expected. It could crash your app. It could work fine for years, and then a race condition makes your app go haywire. It could deadlock.
Basically, if the docs say anything undefined/unpredictable can happen unless you do what they tell you to do, you had better do it, or stuff might blow up in your face.
(And it won't blow up until you put the code into production, just to annoy you even more. At least, that's my experience.)