searching concurrent linked list - c++

I have a concurrent linked list. I need to prioritize finds on this list, so if a thread begins iterating over the list and subsequent insert/delete requests show up I'd want to queue those but if there are find requests from other threads I'd let those happen. What's the best way to implement this situation?
EDIT: I don't want to make copies of the list. Too expensive. Mom pays for my hardware.

Sounds like you are looking for a readers-/writer-lock. This is a lock that can be used to let many threads read the data-structure, but when one thread has write-access, all others are locked out.
Boost offers an implementation.

Are you on Windows ?
If so, you can use synchronization objects like Events or Mutexes.
For insert and delete, you can lock the mutex at the begining of those functions and release the mutex at the end of the function.
Steps shall be like below.
Create event object. http://msdn.microsoft.com/en-us/library/ms682655(VS.85).aspx
Lock the Event obect using WaitForSingleObject function at the start of insert/delete function.
http://msdn.microsoft.com/en-us/library/ms687032(VS.85).aspx
Use SetEvent to unlock the event object at the end of insert/delete function, so that the thread waiting on this event object will get the turn to do insertion/deletion.
http://msdn.microsoft.com/en-us/library/ms686211(VS.85).aspx
Reads do not need to acquire the lock. So no need of Event or Mutex for reading. Multiple threads can read concurrently from a shared buffer.
You can find general info on Reader-writer lock at
http://en.wikipedia.org/wiki/Readers-writer_lock
You can get example program at
http://msdn.microsoft.com/en-us/library/ms686915(v=VS.85).aspx

Sounds like a vanilla reader-writer lock is not quite what you want. As soon as you try to acquire the writer side of the lock, further readers will block until the writer completes, whereas you said you wanted additional threads doing reads to gain access even if there are inserts pending.
I'm not sure if what you want is safe. If there are enough reads going on, your insert/deletes could block forever.
If you really want this you can easily build it yourself, just as you can easily build a reader/writer lock on top of a standard mutex.
Note that this code is probably broken but maybe it gives you a general idea.
class UnfairReaderWriterMutex {
Mutex mutex;
CondVar condition;
int readers; // if positive, reader count; if -1, writer lock is held
public:
UnfairReaderWriterMutex() : counter(0) { }
void ReaderLock() {
mutex.Lock();
while (counter < 0) {
condition.Wait();
}
++readers;
mutex.Unlock();
}
void ReaderUnlock() {
mutex.Lock();
assert(readers > 0);
--readers;
condition.Notify(); // only need to wake up one writer
mutex.Unlock();
}
void WriterLock() {
mutex.Lock();
while (counter > 0) {
condition.Wait();
}
readers = -1;
mutex.Unlock();
}
void WriterUnlock() {
mutex.Lock();
assert(readers == -1);
readers = 0;
condition.NotifyAll(); // there may be multiple readers waiting
mutex.Unlock();
}
};
A typical reader/writer lock works just like this except that there's a flag stopping readers from acquiring the lock while there is a writer waiting.

Related

Pthreads: shared memory monitor design

I'm doing an assignment where we are creating 4 threads that all share some memory. Just want to ask if the design of my monitor looks good, since I'm having a deadlock/ stalling somewhere in my code when I try to cancel all threads.
Previously, I have identified the stalling to threads locking up the mutex when they are cancelled, leaving a thread deadlocking on waiting for the mutex to unlock. I've implemented some changes but still seems to stall when I pipe some data into it using
cat input.txt |./app
However if I read the data directly from the file using getline() then it does not stall and all threads are cancelled.
Currently, the monitor contains the 2 shared lists(these are created using the same pool of nodes), a mutex control access to this list, with 4 condition variables 2 per list.
//shared data
static List *sendList;
static List *receivelist;
//locks
static pthread_cond_t sendListIsFullCondVar = PTHREAD_COND_INITIALIZER;
static pthread_cond_t sendListIsEmptyCondVar = PTHREAD_COND_INITIALIZER;
static pthread_cond_t receiveListIsFullCondVar = PTHREAD_COND_INITIALIZER;
static pthread_cond_t receiveListIsEmptyCondVar = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t listAccessMutex = PTHREAD_MUTEX_INITIALIZER;
The monitor interface consist of an add function and a get function for each list. Each threads are expected to be either adding to the list or getting from the list, but not both.
More specifically, the keyboardThread puts data into the sendlist, sendThread get data from sendlist, receiveThread put data into receivelist, and printThread get data from receivelist.
void Monitor_sendlistAdd(void *item)
{
pthread_mutex_lock(&listAccessMutex);
if (List_count(sendList) == MAX_LIST_SIZE)
{
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
pthread_cond_wait(&sendListIsFullCondVar, &listAccessMutex);
}
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
List_prepend(sendList, item);
pthread_cond_signal(&sendListIsEmptyCondVar);
pthread_mutex_unlock(&listAccessMutex);
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
}
void *Monitor_sendlistGet()
{
pthread_mutex_lock(&listAccessMutex);
if (List_count(sendList) == 0)
{
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
pthread_cond_wait(&sendListIsEmptyCondVar, &listAccessMutex);
}
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
void *item = List_trim(sendList);
pthread_cond_signal(&sendListIsFullCondVar);
pthread_mutex_unlock(&listAccessMutex);
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
return item;
}
(I didn't include the interface for receiveList since its is identical.)
I'm changing the cancel states after the if-statement to make sure a thread is not cancelled while locking the mutex, which would stall any process from cancelling if it is waiting for the mutex.
I also giving each thread's a cleaup handler that releases its mutex, again to ensure that no thread cancel without keeping the mutex locked.
void *keyboardThread()
{
pthread_cleanup_push(Monitor_releaseListAccessMutex, NULL);
while (1)
{
... some codes
}
pthread_cleanup_pop(1);
return;
So yeah, I'm just completely stumped as to where else could be blocking in my code. The rest of the code are just making connections to sockets to shoot data between ports and mallocs. I have looked into the manual and it seems like mutex_lock is the only function in my code that can block a thread cancel.
Do not use thread cancellation if you can possibly help it. It has deep problems associated with it.
To break threads out of a pthread_cond_wait(), use pthread_cond_signal() or pthread_cond_broadcast(). Or use pthread_cond_timedwait() in the first place and just let the timeout expire.
"But, wait!" I imagine you saying, "Then my threads will just proceed as if they had been signaled normally!" And there's the rub: your threads need to be able to handle spurious returns from their waits anyway, as those can and do happen. They must check before waiting whether they need to wait at all, and then they must check again, and potentially wait again, after returning from their wait.
What you can do, then, is add a shared flag that informs your thread(s) that they should abort instead of proceeding normally, and have them check that it is not set as one of the conditions for waiting. If it is set before any iteration of the wait loop then the thread should take whatever action is appropriate, such as (releasing all its locked mutexes and) terminating.
You remark:
I also giving each thread's a cleaup handler that releases its mutex
That's probably a bad idea, and it may be directly contributing to your problem. Threads must not attempt to unlock a mutex that they do not hold locked, so for your cleanup handlers to work correctly, you would need to track which mutexes each thread currently has locked, in a form that the cleanup handlers can act upon. It's conceivable that you could do that, but at best it would be messy, and probably fragile, and it might well carry its own synchronization issues. This is among the problems attending use of thread cancellation.

Can a single thread double-lock a mutex?

If I want a thread to pause for a while, until certain conditions are met, can I double-lock the mutex?
Here's a basic example of the idea I'm working with right now:
class foo
{
public:
static std::mutex processingMutex;
void infinite_processing_loop()
{
processingMutex.lock(); // Lock the mutex, initially
while(true)
{
if ( /*ready for this thread to process data*/ )
{
// ... perform one round of data processing ...
}
else // NOT ready for this thread to process data
{
/* Pause this thread,
and wait for another thread to unlock this mutex
(when more data might be ready for processing) */
processingMutex.lock();
}
}
processingMutex.unlock(); // Will never be executed
}
};
Will the processing thread halt, when it tries to double-lock the mutex?
And can another thread unlock the same mutex, causing the halted processing thread to resume?
Or does std::mutex automatically recognize when mutex is locked twice from the same processing thread?
std::mutex will usually deadlock on second attempt to lock by the owner thread. And even if it didn't, it's considered a bug for an application to attempt with this primitive.
std::recursive_mutex will allow re-entrant locks. So if you lock twice, you need to unlock twice before the mutex is available for other threads to grab.
There's a school of thought that any design that involves recursively acquiring a mutex after it's already been locked is a design flaw. I'll try to dig up that thread and add it.

Is it always necessary for a notifying thread to lock the shared data during modification?

When using a condition variable, http://en.cppreference.com/w/cpp/thread/condition_variable describes the typical steps for the thread that notifies as:
acquire a std::mutex (typically via std::lock_guard)
perform the modification while the lock is held
execute notify_one or notify_all on the std::condition_variable (the lock does not need to be held for notification)
For the simple case shown below, is it necessary for the main thread to lock "stop" when it modifies it? While I understand that locking when modifying shared data is almost always a good idea, I'm not sure why it would be necessary in this case.
std::condition_variable cv;
std::mutex mutex;
bool stop = false;
void worker() {
std::unique_lock<std::mutex> lock(mutex);
cv.wait(lock, [] { return stop; })
}
// No lock (Why would this not work?)
void main() {
std::thread(worker);
std::this_thread::sleep_for(1s);
stop = true;
cv.notify_one();
}
// With lock: why is this neccesary?
void main() {
std::thread mythread(worker);
std::this_thread::sleep_for(1s);
{
std::unique_lock<std::mutex>(mutex);
stop = true;
}
cv.notify_one();
mythread.join();
}
For the simple case shown below, is it necessary for the main thread to lock "stop" when it modifies it?
Yes, to prevent race condition. Aside from issues of accessing shared data from different threads, which could be fixed by std::atomic, imagine this order of events:
worker_thread: checks value of `stop` it is false
main_thread: sets `stop` to true and sends signal
worker_thread: sleeps on condition variable
In this situation worker thread would possibly sleep forever and it would miss event of stop set to true simply because wakeup signal was already sent and it missed it.
Acquiring mutex on modification of shared data is required because only then you can treat checking condition and going to sleep or acting on it in worker thread as atomic operation as whole.
Is , it is necessary.
First of all, from the standard point of view, you code exhibits undefined behavior, as stop is not atomic and isn't change under a lock, while other threads may read or write to it. both reading and writing must occur under a lock if multiple threads read and write to a shared memory.
From hardware perspective, if stop isn't changed under a lock, other threads, even through they lock some lock, aren't guaranteed to "see" the change that was done. locks don't just prevent from other threads to intervene, they also force the latest change to be read or written into the main memory, and hence all the threads can work with the latest values.

Implement a high performance mutex similar to Qt's one

I have a multi-thread scientific application where several computing threads (one per core) have to store their results in a common buffer. This requires a mutex mechanism.
Working threads spend only a small fraction of their time writing to the buffer, so the mutex is unlocked most of the time, and locks have a high probability to succeed immediately without waiting for another thread to unlock.
Currently, I have used Qt's QMutex for the task, and it works well : the mutex has a negligible overhead.
However, I have to port it to c++11/STL only. When using std::mutex, the performance drops by 66% and the threads spend most of their time locking the mutex.
After another question, I figured that Qt uses a fast locking mechanism based on a simple atomic flag, optimized for cases where the mutex is not already locked. And falls back to a system mutex when concurrent locking occurs.
I would like to implement this in STL. Is there a simple way based on std::atomic and std::mutex ? I have digged in Qt's code but it seems overly complicated for my use (I do not need locks timeouts, pimpl, small footprint etc...).
Edit : I have tried a spinlock, but this does not work well because :
Periodically (every few seconds), another thread locks the mutexes and flushes the buffer. This takes some time, so all worker threads get blocked at this time. The spinlocks make the scheduling busy, causing the flush to be 10-100x slower than with a proper mutex. This is not acceptable
Edit : I have tried this, but it's not working (locks all threads)
class Mutex
{
public:
Mutex() : lockCounter(0) { }
void lock()
{
if(lockCounter.fetch_add(1, std::memory_order_acquire)>0)
{
std::unique_lock<std::mutex> lock(internalMutex);
cv.wait(lock);
}
}
void unlock();
{
if(lockCounter.fetch_sub(1, std::memory_order_release)>1)
{
cv.notify_one();
}
}
private:
std::atomic<int> lockCounter;
std::mutex internalMutex;
std::condition_variable cv;
};
Thanks!
Edit : Final solution
MikeMB's fast mutex was working pretty well.
As a final solution, I did:
Use a simple spinlock with a try_lock
When a thread fails to try_lock, instead of waiting, they fill a queue (which is not shared with other threads) and continue
When a thread gets a lock, it updates the buffer with the current result, but also with the results stored in the queue (it processes its queue)
The buffer flushing was made much more efficiently : the blocking part only swaps two pointers.
General Advice
As was mentioned in some comments, I'd first have a look, whether you can restructure your program design to make the mutex implementation less critical for your performance .
Also, as multithreading support in standard c++ is pretty new and somewhat infantile, you sometimes just have to fall back on platform specific mechanisms, like e.g. a futex on linux systems or critical sections on windows or non-standard libraries like Qt.
That being said, I could think of two implementation approaches that might potentially speed up your program:
Spinlock
If access collisions happen very rarely, and the mutex is only hold for short periods of time (two things one should strive to achieve anyway of course), it might be most efficient to just use a spinlock, as it doesn't require any system calls at all and it's simple to implement (taken from cppreference):
class SpinLock {
std::atomic_flag locked ;
public:
void lock() {
while (locked.test_and_set(std::memory_order_acquire)) {
std::this_thread::yield(); //<- this is not in the source but might improve performance.
}
}
void unlock() {
locked.clear(std::memory_order_release);
}
};
The drawback of course is that waiting threads don't stay asleep and steal processing time.
Checked Locking
This is essentially the idea you demonstrated: You first make a fast check, whether locking is actually needed based on an atomic swap operation and use a heavy std::mutex only if it is unavoidable.
struct FastMux {
//Status of the fast mutex
std::atomic<bool> locked;
//helper mutex and vc on which threads can wait in case of collision
std::mutex mux;
std::condition_variable cv;
//the maximum number of threads that might be waiting on the cv (conservative estimation)
std::atomic<int> cntr;
FastMux():locked(false), cntr(0){}
void lock() {
if (locked.exchange(true)) {
cntr++;
{
std::unique_lock<std::mutex> ul(mux);
cv.wait(ul, [&]{return !locked.exchange(true); });
}
cntr--;
}
}
void unlock() {
locked = false;
if (cntr > 0){
std::lock_guard<std::mutex> ul(mux);
cv.notify_one();
}
}
};
Note that the std::mutex is not locked in between lock() and unlock() but it is only used for handling the condition variable. This results in more calls to lock / unlock if there is high congestion on the mutex.
The problem with your implementation is, that cv.notify_one(); can potentially be called between if(lockCounter.fetch_add(1, std::memory_order_acquire)>0) and cv.wait(lock); so your thread might never wake up.
I didn't do any performance comparisons against a fixed version of your proposed implementation though so you just have to see what works best for you.
Not really an answer per definition, but depending on the specific task, a lock-free queue might help getting rid of the mutex at all. This would help the design, if you have multiple producers and a single consumer (or even multiple consumers). Links:
Though not directly C++/STL, Boost.Lockfree provides such a queue.
Another option is the lock-free queue implementation in "C++ Concurrency in Action" by Anthony Williams.
A Fast Lock-Free Queue for C++
Update wrt to comments:
Queue size / overflow:
Queue overflowing can be avoided by i) making the queue large enough or ii) by making the producer thread wait with pushing data once the queue is full.
Another option would be to use multiple consumers and multiple queues and implement a parallel reduction but this depends on how the data is treated.
Consumer thread:
The queue could use std::condition_variable and make the consumer thread wait until there is data.
Another option would be to use a timer for checking in regular intervals (polling) for the queue being non-empty, once it is non-empty the thread can continuously fetch data and the go back into wait-mode.

How to avoid race conditions in a condition variable in VxWorks

We're programming on a proprietary embedded platform sitting atop of VxWorks 5.5. In our toolbox, we have a condition variable, that is implemented using a VxWorks binary semaphore.
Now, POSIX provides a wait function that also takes a mutex. This will unlock the mutex (so that some other task might write to the data) and waits for the other task to signal (it is done writing the data). I believe this implements what's called a Monitor, ICBWT.
We need such a wait function, but implementing it is tricky. A simple approach would do this:
bool condition::wait_for(mutex& mutex) const {
unlocker ul(mutex); // relinquish mutex
return wait(event);
} // ul's dtor grabs mutex again
However, this sports a race condition because it allows another task to preempt this one after the unlocking and before the waiting. The other task can write to the date after it was unlocked and signal the condition before this task starts to wait for the semaphore. (We have tested this and this indeed happens and blocks the waiting task forever.)
Given that VxWorks 5.5 doesn't seem to provide an API to temporarily relinquish a semaphore while waiting for a signal, is there a way to implement this on top of the provided synchronization routines?
Note: This is a very old VxWorks version that has been compiled without POSIX support (by the vendor of the proprietary hardware, from what I understood).
This should be quite easy with native vxworks, a message queue is what is required here. Your wait_for method can be used as is.
bool condition::wait_for(mutex& mutex) const
{
unlocker ul(mutex); // relinquish mutex
return wait(event);
} // ul's dtor grabs mutex again
but the wait(event) code would look like this:
wait(event)
{
if (msgQRecv(event->q, sigMsgBuf, sigMsgSize, timeoutTime) == OK)
{
// got it...
}
else
{
// timeout, report error or something like that....
}
}
and your signal code would like something like this:
signal(event)
{
msgQSend(event->q, sigMsg, sigMsgSize, NO_WAIT, MSG_PRI_NORMAL);
}
So if the signal gets triggered before you start waiting, then msgQRecv will return immediately with the signal when it eventually gets invoked and you can then take the mutex again in the ul dtor as stated above.
The event->q is a MSG_Q_ID that is created at event creation time with a call to msgQCreate, and the data in sigMsg is defined by you... but can be just a random byte of data, or you can come up with a more intelligent structure with information regarding who signaled or something else that may be nice to know.
Update for multiple waiters, this is a little tricky: So there are a couple of assumptions I will make to simplify things
The number of tasks that will be pending is known at event creation time and is constant.
There will be one task that is always responsible for indicating when it is ok to unlock the mutex, all other tasks just want notification when the event is signaled/complete.
This approach uses a counting semaphore, similar to the above with just a little extra logic:
wait(event)
{
if (semTake(event->csm, timeoutTime) == OK)
{
// got it...
}
else
{
// timeout, report error or something like that....
}
}
and your signal code would like something like this:
signal(event)
{
for (int x = 0; x < event->numberOfWaiters; x++)
{
semGive(event->csm);
}
}
The creation of the event is something like this, remember in this example the number of waiters is constant and known at event creation time. You could make it dynamic, but the key is that every time the event is going to happen the numberOfWaiters must be correct before the unlocker unlocks the mutex.
createEvent(numberOfWaiters)
{
event->numberOfWaiters = numberOfWaiters;
event->csv = semCCreate(SEM_Q_FIFO, 0);
return event;
}
You cannot be wishy-washy about the numberOfWaiters :D I will say it again: The numberOfWaiters must be correct before the unlocker unlocks the mutex. To make it dynamic (if that is a requirement) you could add a setNumWaiters(numOfWaiters) function, and call that in the wait_for function before the unlocker unlocks the mutex, so long as it always sets the number correctly.
Now for the last trick, as stated above the assumption is that one task is responsible for unlocking the mutex, the rest just wait for the signal, which means that one and only one task will call the wait_for() function above, and the rest of the tasks just call the wait(event) function.
With this in mind the numberOfWaiters is computed as follows:
The number of tasks who will call wait()
plus 1 for the task that calls wait_for()
Of course you can also make this more complex if you really need to, but chances are this will work because normally 1 task triggers an event, but many tasks want to know it is complete, and that is what this provides.
But your basic flow is as follows:
init()
{
event->createEvent(3);
}
eventHandler()
{
locker l(mutex);
doEventProcessing();
signal(event);
}
taskA()
{
doOperationThatTriggersAnEvent();
wait_for(mutex);
eventComplete();
}
taskB()
{
doWhateverIWant();
// now I need to know if the event has occurred...
wait(event);
coolNowIKnowThatIsDone();
}
taskC()
{
taskCIsFun();
wait(event);
printf("event done!\n");
}
When I write the above I feel like all OO concepts are dead, but hopefully you get the idea, in reality wait and wait_for should take the same parameter, or no parameter but rather be members of the same class that also has all the data they need to know... but none the less that is the overview of how it works.
Race conditions can be avoided if each waiting task waits on a separate binary semaphore.
These semaphores must be registered in a container which the signaling task uses to unblock all waiting tasks. The container must be protected by a mutex.
The wait_for() method obtains a binary semaphore, waits on it and finally deletes it.
void condition::wait_for(mutex& mutex) {
SEM_ID sem = semBCreate(SEM_Q_PRIORITY, SEM_EMPTY);
{
lock l(listeners_mutex); // assure exclusive access to listeners container
listeners.push_back(sem);
} // l's dtor unlocks listeners_mutex again
unlocker ul(mutex); // relinquish mutex
semTake(sem, WAIT_FOREVER);
{
lock l(listeners_mutex);
// remove sem from listeners
// ...
semDelete(sem);
}
} // ul's dtor grabs mutex again
The signal() method iterates over all registered semaphores and unlocks them.
void condition::signal() {
lock l(listeners_mutex);
for_each (listeners.begin(), listeners.end(), /* call semGive()... */ )
}
This approach assures that wait_for() will never miss a signal. A disadvantage is the need of additional system resources.
To avoid creating and destroying semaphores for every wait_for() call, a pool could be used.
From the description, it looks like you may want to implement (or use) a semaphore - it's a standard CS algorithm with semantics similar to condvars, and there are tons of textbooks on how to implement them (https://www.google.com/search?q=semaphore+algorithm).
A random Google result which explains semaphores is at: http://www.cs.cornell.edu/courses/cs414/2007sp/lectures/08-bakery.ppt‎ (see slide 32).