Using unique locks while in DPC - c++

I'm currently working on a lightweight filter (LWF) in the NDIS stack. I'm trying to inject a packet that is stored in a global variable as an NBL. During receive NBL, if an injected NBL is pending, then the thread takes a lock before picking up the injected NBL to process it. Originally I was looking at using a spin lock or a FAST_MUTEX. But according to the documentation for FAST_MUTEX, any other thread attempting to take the lock will wait for the lock to be released before continuing.
The problem is that receive NBL is running in DPC mode. This would cause a thread running a DPC to pause and wait for the lock to be released. Additionally, I'd like to be able to assert that a thread owns a lock.
My question is: does the Windows kernel support unique mutex locks, can these locks be taken in DPC mode, and how expensive is asserting ownership of the lock? I'm fairly new to C++, so forgive any syntax errors.
I attempted to define a mutex in the LWF object
// Header file
#pragma once
#include <mutex.h>
class LWFobject
{
public:
LWFobject()
std::mutex ExampleMutex;
std::unique_lock ExampleLock;
}
//=============================================
// CPP file
#include "LWFobject.h"
LWFobject::LWFObject()
{
ExmapleMutex = CreateMutex(
NULL,
FALSE,
NULL);
ExampleLock(ExampleMutex, std::defer_lock);
}
Is the use of unique_locks supported in the kernel? When I attempt to compile it, it throws hundreds of compilation errors when attempting to use mutex.h. I'd like to use try_lock and owns_lock.

You can't use standard ISO C++ synchronization mechanisms while inside a Windows kernel.
A Windows kernel is a whole other world in itself, and requires you to live by its rules (which are vast - see for example these two 700-page books: 1, 2).
Processing inside a Windows kernel is largely asynchronous and event-based; you handle events and schedule deferred calls or use other synchronization techniques for work that needs to be done later.
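Having said that, it is possible to have a mutex in the traditional sense inside a Windows driver. It's called a fast mutex (FAST_MUTEX). Acquiring it raises the IRQL to APC_LEVEL, and it may only be acquired at IRQL <= APC_LEVEL, so it cannot be taken from code that is already running at DISPATCH_LEVEL, such as a DPC. You can use calls like ExAcquireFastMutex, ExTryToAcquireFastMutex and ExReleaseFastMutex to lock/try-lock/release it.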

A fundamental property of a lock is which priority (IRQL) it's synchronized at. A lock can be acquired from lower priorities, but can never be acquired from a higher priority.
(Why? Imagine how the lock is implemented. The lock must raise the current task priority up to the lock's natural priority. If it didn't do this, then a task running at a low priority could grab the lock, get pre-empted by a higher priority task, which would then deadlock if it tried to acquire the same lock. So every lock has a documented natural IRQL, and the lock will first raise the current thread to that IRQL before attempting to acquire exclusivity.)
The NDIS datapath can run at any IRQL between PASSIVE_LEVEL and DISPATCH_LEVEL, inclusive. This means that anything on the datapath must only ever use locks that are synchronized at DISPATCH_LEVEL (or higher). This really limits your choices: you can use KSPIN_LOCKs, NDIS_RW_LOCKs, and a handful of other uncommon ones.
This gets viral: if you have one function that can sometimes run at DISPATCH_LEVEL (like the datapath), it forces the lock to be synchronized at DISPATCH_LEVEL, which forces any other functions that hold the lock to also run at DISPATCH_LEVEL. That can be inconvenient, for example you might want to hold the locks while reading from the registry too.
There are various approaches to design your driver:
* Use spinlocks everywhere. When reading from the registry, read into temporary variables, then grab a spinlock and copy the temporary variables into global state.
* Use mutexes (or better yet: pushlocks) everywhere. Quarantine the datapath into a component that runs at dispatch level, and carefully copy any configuration state into this component's private state.
* Somehow avoid having your datapath interact with the rest of your driver, so there's no shared state, and thus no shared locks.
* Have the datapath rush to PASSIVE_LEVEL by queuing all packets to a worker thread.

Related

lock guard - will it queue multiple request

I have four member functions that can be called multiple times asynchronously from other pieces of code. Since these functions use the class's member variables, I need to ensure that a second call does not start until the first one has finished, and instead waits in a queue.
I have heard of the lock guard feature in C++, which turns a code block (in my case a whole function) into an automatically locked region for its duration:
void DoSomeWork()
{
    std::lock_guard<std::mutex> lg(m); // Lock will be held from here to end of function
    // ... do the work ...
    return;
}
Since my four class methods do independent work, should I have four mutexes, one for the lock guard in each member function? Will the async calls be placed in some sort of queue while a lock guard is active?
I mean, if there are, say, 10 calls made to that member function at the same time, then once the first call acquires the lock, will the remaining 9 calls wait until the lock is free and then execute one by one?
If a mutex is locked, the next request to lock it will block until the previous thread holding the lock has unlocked it.
Note that attempting to lock a std::mutex multiple times from a single thread is undefined behavior. Don't do that.
For more information see e.g. this std::mutex reference.
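As a minimal sketch of the blocking behaviour described above: assuming the member functions touch the same member variables, a single mutex shared by all of them is what serializes the calls. The class name `Worker`, its members, and the helper `runConcurrentCalls` are made up for illustration. Calls that arrive while the mutex is held simply block; the standard makes no promise about the order in which the blocked calls proceed.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class: two member functions share one mutex because
// they both modify the same member variable.
class Worker {
public:
    void DoSomeWork() {
        std::lock_guard<std::mutex> lg(m_);  // blocks while another thread holds m_
        ++calls_;
    }
    void DoOtherWork() {
        std::lock_guard<std::mutex> lg(m_);  // same mutex, so it also excludes DoSomeWork
        calls_ += 2;
    }
    int calls() {
        std::lock_guard<std::mutex> lg(m_);
        return calls_;
    }
private:
    std::mutex m_;
    int calls_ = 0;
};

// Starts `threads` threads that each call DoSomeWork() `perThread` times
// and returns the final counter value.
int runConcurrentCalls(int threads, int perThread) {
    Worker w;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&w, perThread] {
            for (int i = 0; i < perThread; ++i) w.DoSomeWork();
        });
    }
    for (auto& th : pool) th.join();
    return w.calls();
}
```

Because every increment happens under the lock, no update is lost, whichever order the threads happen to run in.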
Assuming you mean multiple threads locking the same mutex: based on prior questions, there's no queuing for pthreads or POSIX synchronization types. Say multiple threads each run a loop that starts with a lock and ends with an unlock, looping right back to the lock request. In that case the same thread can keep getting the lock, and none of the other threads will run (there's a very small chance that a time slice ends between the unlock and the lock, switching context to another thread). Using condition variables also has an issue with spurious wakeups.
https://en.wikipedia.org/wiki/Spurious_wakeup
Based on testing, Windows native synchronization types, (CreateMutex, CreateSemaphore, WaitForSingleObject, WaitForMultipleObjects) do queue requests, but I haven't found it documented.
Some server applications on some operating systems will install a device driver that runs at kernel level in order to workaround the limitations of synchronization types on those operating systems.

Is double-check locking safe in C++ for unidirectional data transfer?

I have inherited an application whose performance I'm trying to improve. It currently uses mutexes (std::lock_guard<std::mutex>) to transfer data from one thread to another. One thread is a low-frequency (slow) one which simply modifies the data to be used by the other.
The other thread (which we'll call fast) has rather stringent performance requirements (it needs to do the maximum number of cycles per second possible) and we believe this is being impacted by the use of the mutexes.
Basically, the current logic is:
slow thread:            fast thread:
    occasionally:           very-often:
        claim mutex             claim mutex
        change data             use data
        release mutex           release mutex
In order to get the fast thread running at maximum throughput, I'd like to experiment with reducing the number of mutex locks it has to do.
I suspect a variation of the double-checked locking pattern may be of use here. I know it has serious issues with bi-directional data flow (or singleton creation) but the areas of responsibility in my case are a little more limited in terms of which thread performs which operations (and when) on the shared data.
Basically, the slow thread sets up the data and never reads or writes to it again unless a new change comes in. The fast thread uses and changes the data but never expects to pass any information back to the other thread. In other words, ownership mostly flows strictly one way.
I wanted to see if anyone could pick any holes in the strategy I'm thinking of.
The new idea is to have two sets of data, one current and one pending. There is no need for a queue in my case as incoming data overwrites previous data.
The pending data will only ever be written to by the slow thread under the control of the mutex and it will have an atomic flag to indicate that it has written and relinquished control (for now).
The fast thread will continue to use current data (without the mutex) until such time as the atomic flag is set. Since it is responsible for transferring pending to current, it can ensure the current data is always consistent.
At the point where the flag is set, it will lock the mutex, transfer pending to current, clear the flag, unlock the mutex and carry on.
So, basically, the fast thread runs at full speed and only does mutex locks when it knows the pending data needs to be transferred.
Getting into more concrete details, the class will have the following data members:
std::atomic_bool m_newDataReady;
std::mutex m_protectData;
MyStruct m_pendingData;
MyStruct m_currentData;
The method for receiving new data in the slow thread would be:
void NewData(const MyStruct &newData) {
    std::lock_guard<std::mutex> guard(m_protectData);
    m_newDataReady = false;
    Transfer(newData, 'to', m_pendingData);
    m_newDataReady = true;
}
Clearing the flag prevents the fast thread from even trying to check for new data until the immediate transfer operation is complete.
The fast thread is a little trickier, using the flag to keep mutex locks to a minimum:
while (true) {
    if (m_newDataReady) {
        std::lock_guard<std::mutex> guard(m_protectData);
        if (m_newDataReady) {
            Transfer(m_pendingData, 'to', m_currentData);
            m_newDataReady = false;
        }
    }
    Use (m_currentData);
}
Now it appears to me that the use of this method in the fast thread could improve performance quite a bit:
There is only one place where the atomic flag is used outside the control of the mutex and the fact that it's an atomic means its state should be consistent there.
Even if it's not consistent, the second check inside the mutex-locked area should provide a safety valve (it's rechecked when we know it's consistent).
The transfer of data is only ever performed under the control of the mutex so that should always be consistent.
The outer if in the fast thread means that unnecessary mutex locks will be avoided - they'll only be done if the flag is true (or "half-true", a possibly inconsistent state).
The inner if will take care of that "half-true" possibility that, between checking the flag and locking the mutex, the flag has been cleared.
I can't see any holes in this strategy but, given I'm only just getting into atomics/threading in the standard-C++ world, it may be I'm missing something.
Are there any clear problems in using this method?
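For reference, the strategy described above can be condensed into one small compilable class. This is only a sketch under assumptions: the class name `DataExchange` and the `value` field are invented, plain assignment stands in for `Transfer`, and `Current()` is assumed to be called only from the fast thread.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical payload; the real MyStruct is not shown in the question.
struct MyStruct {
    int value = 0;
};

class DataExchange {
public:
    // Slow thread: publish new data under the mutex, then raise the flag.
    void NewData(const MyStruct& newData) {
        std::lock_guard<std::mutex> guard(m_protectData);
        m_newDataReady = false;
        m_pendingData = newData;   // stands in for Transfer(newData, 'to', m_pendingData)
        m_newDataReady = true;
    }

    // Fast thread only: cheap atomic check first, mutex only when the flag is set.
    MyStruct& Current() {
        if (m_newDataReady) {
            std::lock_guard<std::mutex> guard(m_protectData);
            if (m_newDataReady) {  // re-check now that the mutex is held
                m_currentData = m_pendingData;
                m_newDataReady = false;
            }
        }
        return m_currentData;      // touched by the fast thread only
    }

private:
    std::atomic_bool m_newDataReady{false};
    std::mutex m_protectData;
    MyStruct m_pendingData;
    MyStruct m_currentData;
};
```

In this form m_pendingData is only ever accessed with the mutex held and m_currentData only by the fast thread, so the unlocked read of the flag acts purely as a hint for when to take the mutex.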

Concurrent code without waiting

I'm thinking about a certain kind of synchronisation primitive, but I don't know what this kind of synchronisation is called or if something like this would be working.
So there is one variable (boolean) which basically signals if one thread is still working on a block of memory or not. At the beginning the bool is set to false, meaning the worker thread is not working on your block of memory. Now the main thread gives the worker thread a "todo-list", describing how it should be working on that block of memory. After that, it changes the state of the boolean to true, so that the worker thread knows it is now allowed to do its work. The main thread can now continue its own work and checks at certain locations if the worker thread is now done working, e.g. if the boolean has been set to false again. If it is still true, the main thread just continues its own work and doesn't wait for the worker thread. If the boolean is false, the main thread knows the worker thread is done and starts processing the block of memory.
So the boolean just transfers the ownership over a block of memory between two threads. If one thread currently does not have the ownership of that memory, it just continues with its own work, and checks repeatedly if it now has the ownership again. This way, none of the threads is waiting for one another and can continue its own work.
What is this called and how is such a behavior implemented?
EDIT: Basically it's a mutex. But instead of waiting for the mutex to be unlocked again, it continues/skips the critical code.
It's still a mutex, just with "try" methods.
In standard C++, we're talking about std::mutex::try_lock, which tries to lock the mutex; if it fails, it returns false and execution moves on:
class unlocker {
    std::mutex& m_Parent;
public:
    unlocker(std::mutex& parent) : m_Parent(parent) {}
    ~unlocker() { m_Parent.unlock(); }
};

std::mutex mtx;
if (mtx.try_lock()) {
    // A plain std::lock_guard/unique_lock would try to lock again here;
    // they only accept an already-locked mutex when given std::adopt_lock.
    unlocker unlock(mtx);
    // success, mtx was free and is now held by this thread
} else {
    // do something else
}
In native OS code you have similar functions depending on the operating system you are on, like pthread_mutex_trylock on Unix and TryEnterCriticalSection on Windows. Needless to say, the standard mutex probably uses these functions behind the scenes.
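The same skip-if-busy behaviour can also be written with the standard RAII wrapper, as a sketch: std::unique_lock constructed with std::try_to_lock attempts the lock without blocking, and owns_lock() reports whether the attempt succeeded. The function names `tryDoWork` and `tryWhileHeldElsewhere` are made up for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Runs the critical section only if the mutex is free right now.
// Returns true if the work was done, false if it was skipped.
bool tryDoWork(std::mutex& m, int& counter) {
    std::unique_lock<std::mutex> lock(m, std::try_to_lock);
    if (!lock.owns_lock()) {
        return false;   // someone else holds the mutex: skip, do something else
    }
    ++counter;          // critical section
    return true;        // unique_lock's destructor unlocks the mutex
}

// Demonstration helper: holds the mutex in the calling thread while a
// second thread attempts tryDoWork(), and returns what that attempt reported.
bool tryWhileHeldElsewhere(std::mutex& m, int& counter) {
    std::unique_lock<std::mutex> held(m);
    bool result = true;
    std::thread other([&] { result = tryDoWork(m, counter); });
    other.join();
    return result;
}
```

Note that try_lock is allowed to fail spuriously even when the mutex is free, so code using it should treat "skipped" as something that can happen at any time.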
What will you do if the main thread runs out of work?
Suppose you keep checking and you keep reading true. Eventually you reach a point where the main thread cannot continue without the result from the worker thread. Since you have no more work to do, the only thing left is to keep checking the value of the flag over and over, wasting CPU resources that other threads could use to do useful work.
In general, this is not what you want. Instead, you would like the operating system to put your main thread to sleep and only wake it up once the worker thread has finished processing. All kinds of locks and semaphores that ship with modern operating systems work this way. Underneath there is some flag in memory that indicates who owns the lock, but there is also a bunch of logic around it that ensures the operating system won't schedule threads that have nothing to do but wait for a lock to become ready.
That being said, there are some situations where this is not what you want. If you are sufficiently sure that you won't run into the situation where one thread just spins on a lock, and you want to save the overhead that comes with the OS locks, just checking a flag like you described might be a viable option.
Note though that low-level stuff like this should be reserved for special circumstances, not be the first tool in your toolbox. It's just too easy to end up with an algorithm that is incorrect or an implementation that is not as efficient as you thought. If you decide to go down this road, be prepared to do some serious work to get it working as expected.
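If you do go down that road, a minimal sketch of the flag-based handoff could look like the following. It assumes exactly one main thread and one worker, uses std::atomic<bool> with acquire/release ordering so that writes to the block are visible to whichever thread takes ownership next, and the names `SharedBlock`, `workerOnce` and `runHandoff` are invented for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

struct SharedBlock {
    std::vector<int> data;
    std::atomic<bool> workerOwns{false};  // false: main owns the block, true: worker owns it
};

// Worker: waits until it is given ownership, doubles every element,
// then hands ownership back.
void workerOnce(SharedBlock& block) {
    while (!block.workerOwns.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    for (int& x : block.data) x *= 2;
    block.workerOwns.store(false, std::memory_order_release);
}

// Main side: prepares the block, hands it over, carries on with other
// work while polling, and reads the result once ownership has returned.
int runHandoff() {
    SharedBlock block;
    block.data = {1, 2, 3, 4};
    std::thread worker(workerOnce, std::ref(block));

    block.workerOwns.store(true, std::memory_order_release);  // hand over
    while (block.workerOwns.load(std::memory_order_acquire)) {
        std::this_thread::yield();  // stand-in for the main thread's own useful work
    }

    int sum = std::accumulate(block.data.begin(), block.data.end(), 0);
    worker.join();
    return sum;
}
```

The release store paired with the acquire load is what makes the plain (non-atomic) accesses to `data` safe; a plain bool would not give that guarantee.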

boost lock variable vector during update

Several (2 or more) client threads need to run at a high frequency, but once every minute a background service thread updates a variable used by those threads.
What is the best method of locking a variable (in fact, a vector) during the brief moment of update, with little impact on the client threads?
There is no need to protect the vector during 'normal' (no background thread) operation, since all threads utilize the values.
boost::thread is used with an endless while loop to update the vector and sleep for 60 seconds.
This seems like a good occasion for a Reader-Writer lock. All the clients lock the vector for reading only, and the background service thread locks it for writing only once every minute.
This is the SharedLockable concept from C++14, which is implemented in Boost.Thread as boost::shared_mutex:
The class boost::shared_mutex provides an implementation of a multiple-reader / single-writer mutex. It implements the SharedLockable concept.
Multiple concurrent calls to lock(), try_lock(), try_lock_for(), try_lock_until(), timed_lock(), lock_shared(), try_lock_shared_for(), try_lock_shared_until(), try_lock_shared() and timed_lock_shared() are permitted.
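A minimal sketch of that reader-writer arrangement, written here with std::shared_mutex (the C++17 standard counterpart, assumed to be available) rather than boost::shared_mutex. The wrapper class `SharedVector` and its methods are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <vector>

class SharedVector {
public:
    // Client threads: any number of them may hold the shared lock at once.
    int sum() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        int total = 0;
        for (int v : values_) total += v;
        return total;
    }

    // Background thread: the new contents are built by the caller beforehand,
    // so the exclusive section is only as long as a swap.
    void replace(std::vector<int> fresh) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        values_.swap(fresh);
    }

private:
    mutable std::shared_mutex mutex_;
    std::vector<int> values_;
};
```

Readers still pay for acquiring the shared lock on every access, so whether this is cheap enough depends on how hot the client loop is.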
That said, depending on your actual platform and CPU model you could have better luck with an atomic variable.
If it's a primitive value, just using boost::atomic_int or similar would be fine. For a vector, consider using std::shared_ptr (which has atomic support). See e.g. "Confirmation of thread safety with std::unique_ptr/std::shared_ptr".
You can also do without the dynamic allocation (although, you're using vector already) by using two vectors, and switching a reference to the "actual" version atomically.
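As a sketch of the shared_ptr approach: the writer builds a complete new vector and publishes it with the std::atomic_load/std::atomic_store overloads for std::shared_ptr from `<memory>` (available since C++11, deprecated in C++20 in favour of std::atomic<std::shared_ptr>). The class name `Snapshot` and its methods are made up for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

class Snapshot {
public:
    using Vec = std::vector<int>;
    using Ptr = std::shared_ptr<const Vec>;

    Snapshot() : current_(std::make_shared<Vec>()) {}

    // Client threads: grab the current version. The returned pointer keeps
    // that version alive even if a newer one is published meanwhile.
    Ptr get() const {
        return std::atomic_load(&current_);
    }

    // Background thread: build the new vector first, then swap it in atomically.
    void publish(Vec fresh) {
        Ptr next = std::make_shared<Vec>(std::move(fresh));
        std::atomic_store(&current_, next);
    }

private:
    Ptr current_;
};
```

Clients never see a half-updated vector because a published version is never modified; the old version is freed when the last client drops its pointer.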

How to implement a recursive MRSW lock?

I need a fully-recursive multiple-reader/single-writer lock (shared mutex) for my project - I don't agree with the notion that if you have complete const-correctness you shouldn't need them (there was some discussion about that on the boost mailing list), in my case the lock should protect a completely transparent cache which would be mutable in any case.
As for the semantics of recursive MRSW locks, I think the only ones that make sense are that acquiring an exclusive lock in addition to a shared one temporarily releases the shared one, to be reacquired when the exclusive one is released.
Has the somewhat strange effect that unlocking can wait but I can live with that - writing rarely happens anyway and recursive locking usually only happens through recursive code paths, in which case the caller has to be prepared that the call might wait in any case. To avoid it one can still simply upgrade the lock instead of using recursive locking.
Acquiring a shared lock on top of an exclusive one should obviously just increase the lock count.
So the question becomes - how should I implement it? The usual approach with a critical section and two semaphores doesn't work here because - as far as I can see - the woken up thread has to handshake, by inserting its thread id into the lock's owner map.
I suppose it would be doable with two condition variables and a couple of mutexes but the sheer amount of synchronization primitives that would end up using sounds like a bit too much overhead for my taste.
An idea which just sprang into my mind is to utilize TLS to remember the type of lock I'm holding (and possibly the local lock counts). Have to think it through - but I'll still post the question for now.
Target platform is Win32 but that shouldn't really matter. Note that I'm specifically targeting Win2k so anything related to the new MRSW lock primitive in Windows 7 is not relevant for me. :-)
Okay, I solved it.
It can be done with just 2 semaphores, a critical section and almost no more locking than for a regular non-recursive MRSW lock (there is obviously some more CPU-time spent inside the lock because that multimap must be managed) - but it's tricky. The structure I came up with looks like this:
// Protects everything that follows, except mWriterThreadId and mRecursiveUpgrade
CRITICAL_SECTION mLock;
// Semaphore to wait on for a read lock
HANDLE mSemaReader;
// Semaphore to wait on for a write lock
HANDLE mSemaWriter;
// Number of threads waiting for a write lock.
int mWriterWaiting;
// Number of times the writer entered the write lock.
int mWriterActive;
// Number of threads inside a read lock. Note that this does not include
// recursive read locks.
int mReaderActiveThreads;
// Whether or not the current writer obtained the lock by a recursive
// upgrade. Note that this member might be set outside the critical
// section, so it should only be read from by the writer during his
// unlock.
bool mRecursiveUpgrade;
// This member contains the current thread id once for each
// (recursive) read lock held by the current thread in addition to an
// undefined number of other thread ids which may or may not hold a
// read lock, even inside the critical section (!).
std::multiset<unsigned long> mReaderActive;
// If there is no writer this member contains 0.
// If the current thread is the writer this member contains his
// thread-id.
// Otherwise it can contain either of them, even inside the
// critical section (!).
// Also note that it might be set outside the critical section.
unsigned long mWriterThreadId;
Now, the basic idea is this:
Full update of mWriterWaiting and mWriterActive for an unlock is performed by the unlocking thread.
For mWriterThreadId and mReaderActive this is not possible, as the waiting thread needs to insert itself when it was released.
So the rule is that you may never access those two members except to check whether you are holding a read lock or are the current writer - specifically, they may not be used to check whether or not there are any readers / writers - for that you have to use the (somewhat redundant but necessary for this reason) mReaderActiveThreads and mWriterActive.
I'm currently running some test code (which has been going on deadlock- and crash-free for 30 minutes or so) - when I'm sure that it's stable and I've cleaned up the code somewhat I'll put it on some pastebin and add a link in a comment here (just in case someone else ever needs this).
Well, I did some thinking. Starting from the simple "two semaphores and a critical section" one adds a writer lock count and an owning writer TID to the structure.
Unlock still sets most of the new status in the critsec. Readers still normally increase the lock count - recursive locking simply adds a non-existing reader to the counter.
During the writer's lock() I compare the owning TID, and if the writer already owns it the write lock counter is increased.
Setting the new writer TID can't be done by the unlock() - it doesn't know which one will be woken, but if writers reset it back to zero in their unlock() it's not a problem - the current thread id won't ever be zero and setting it is an atomic operation.
All sounds simple enough - one nasty problem left: A recursive reader-reader lock while a writer is waiting will deadlock. And I don't know how to solve that short of doing a reader-biased lock... somehow I need to know whether or not I already own a reader lock.
Using TLS doesn't sound too great after I realized that the number of available slots might be rather limited...
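For comparison only, here is a sketch of how "do I already own a reader lock?" can be answered with a per-thread counter on a modern toolchain. It assumes C++17 (thread_local and std::shared_mutex), which the Win2k-era target above would not have had, and it does not attempt the upgrade semantics discussed earlier. The class name `RecursiveReadLock` is invented, and because the counter is per thread rather than per lock, the sketch is only valid for a single lock instance.

```cpp
#include <cassert>
#include <shared_mutex>
#include <thread>

class RecursiveReadLock {
public:
    // Only the outermost read acquisition touches the real lock, so a nested
    // read lock cannot queue up behind a waiting writer.
    void lock_shared() {
        if (depth()++ == 0) mutex_.lock_shared();
    }
    void unlock_shared() {
        if (--depth() == 0) mutex_.unlock_shared();
    }

    // Plain (non-recursive) exclusive access.
    void lock() { mutex_.lock(); }
    bool try_lock() { return mutex_.try_lock(); }
    void unlock() { mutex_.unlock(); }

    // Number of read locks the calling thread currently holds.
    static int readDepth() { return depth(); }

private:
    // Per-thread recursion counter. ASSUMPTION: one lock instance only,
    // since every instance would share this counter.
    static int& depth() {
        thread_local int d = 0;
        return d;
    }
    std::shared_mutex mutex_;
};
```

A per-lock version would need a thread-local map keyed by lock address, which is roughly the bookkeeping the multiset in the structure above does by hand.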
As far as I understand, you need to provide your writer exclusive access to the data, while readers can operate simultaneously (if this is not what you want, please clarify your question).
I think you need to implement a sort of "inverse semaphore", i.e. a semaphore that will block a thread when positive, and signal all waiting threads when zero. If you do this, you can use two such semaphores for your program. The operation of your threads could then be the following:
Reader:
(1) wait on sem A
(2) increase sem B
(3) read operation
(4) decrease sem B
Writer:
(1) increase sem A
(2) wait on sem B
(3) write operation
(4) decrease sem A
In this way the writer will perform the write operation as soon as all pending readers have finished reading. As soon as your writer finishes, readers can resume their operation without blocking each other.
I am not familiar with Windows mutex/semaphore facilities but I can think of a way to implement such semaphores using the POSIX threads API (combining a mutex, a counter and a condition variable).
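A minimal sketch of such an "inverse semaphore", built from a mutex, a counter and a condition variable as suggested, but using the C++ standard library equivalents instead of the raw POSIX calls. The class name `InverseSemaphore` and the helper `waiterReleasedAtZero` are made up for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Blocks waiters while the count is positive and releases all of them
// when the count drops back to zero.
class InverseSemaphore {
public:
    void increase() {
        std::lock_guard<std::mutex> lock(m_);
        ++count_;
    }
    void decrease() {
        std::lock_guard<std::mutex> lock(m_);
        if (--count_ == 0) cv_.notify_all();  // wake every thread blocked in wait()
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ == 0; });  // predicate guards against spurious wakeups
    }
    int count() {
        std::lock_guard<std::mutex> lock(m_);
        return count_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Demonstration: a waiter blocked on a positive count proceeds once it reaches zero.
bool waiterReleasedAtZero() {
    InverseSemaphore sem;
    sem.increase();                  // count is 1, so wait() blocks
    bool passed = false;
    std::thread waiter([&] { sem.wait(); passed = true; });
    sem.decrease();                  // count is 0 again, the waiter may proceed
    waiter.join();
    return passed;
}
```

Note that with this primitive alone, the reader's steps (1) and (2) are two separate operations, so a writer could slip in between them; closing that gap would need the check on A and the increment of B to happen under one lock.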