I have been programming a pthread application. The application has mutex locks that the parent thread shares across the worker threads. For some reason, it throws the following error:
../nptl/pthread_mutex_lock.c:428: __pthread_mutex_lock_full: Assertion `e != ESRCH || !robust' failed.
The application captures high-speed network traffic using a packet_mmap based approach, with multiple threads each associated with a socket. I am not sure why this is happening. It happens during testing, and I am not able to reproduce the error every time. I have searched a lot but could not find the cause. Thanks for your help.
Update: the error is triggered by a file read. When the fread line is commented out, the error does not occur. It happens on this line:
fread(this->bit_array, sizeof(int), this->m , fp);
where bit_array is a dynamically allocated integer array and m is the number of elements in the array.
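For what it's worth, a defensive version of that call would check fread's return value. This is only a sketch, assuming bit_array really holds m ints and m is an int:

    // Sketch: a short read, or an m larger than the allocation, would leave
    // bit_array partially filled or corrupt adjacent heap memory (which can
    // clobber a nearby mutex).
    size_t got = fread(this->bit_array, sizeof(int), this->m, fp);
    if (got != (size_t)this->m) {
        fprintf(stderr, "short read: got %zu of %d elements\n", got, this->m);
        // handle the error instead of continuing
    }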
Thanks.
In GLIBC 2.31, you were running the following source code from pthread_mutex_lock():
      oldval = atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,
                                                    newval, 0);
      if (oldval != 0)
        {
          /* The mutex is locked.  The kernel will now take care of
             everything.  */
          int private = (robust
                         ? PTHREAD_ROBUST_MUTEX_PSHARED (mutex)
                         : PTHREAD_MUTEX_PSHARED (mutex));
          int e = futex_lock_pi ((unsigned int *) &mutex->__data.__lock,
                                 NULL, private);
          if (e == ESRCH || e == EDEADLK)
            {
              assert (e != EDEADLK
                      || (kind != PTHREAD_MUTEX_ERRORCHECK_NP
                          && kind != PTHREAD_MUTEX_RECURSIVE_NP));
              /* ESRCH can happen only for non-robust PI mutexes where
                 the owner of the lock died.  */
              assert (e != ESRCH || !robust);

              /* Delay the thread indefinitely.  */
              while (1)
                lll_timedwait (&(int){0}, 0, 0 /* ignored */, NULL,
                               private);
            }

          oldval = mutex->__data.__lock;

          assert (robust || (oldval & FUTEX_OWNER_DIED) == 0);
        }
In the above code, the current value of the mutex is read atomically, and it turns out to be different from 0, meaning that the mutex is locked.
The assert is then triggered because the owner of the mutex died while the mutex was not a robust one (i.e. the mutex is not released automatically when its owner thread terminates).
If you can modify the source code, you may want to give the mutex the "robust" attribute (pthread_mutexattr_setrobust()) so that the system releases it automatically when the owner dies. But this is error prone, as the corresponding critical section may not have reached a sane point and may therefore leave some work unfinished...
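For illustration, here is a minimal sketch of how a robust mutex is typically set up and how the EOWNERDEAD case is handled (names and error handling simplified):

    #include <pthread.h>
    #include <errno.h>

    pthread_mutex_t mtx;

    void init_robust_mutex(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        pthread_mutex_init(&mtx, &attr);
        pthread_mutexattr_destroy(&attr);
    }

    void lock_robust_mutex(void)
    {
        int rc = pthread_mutex_lock(&mtx);
        if (rc == EOWNERDEAD) {
            /* The previous owner died while holding the lock: we now own it,
               but the protected data may be inconsistent and must be repaired
               before marking the mutex usable again. */
            /* ... restore the invariants of the shared state here ... */
            pthread_mutex_consistent(&mtx);
        }
    }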
So, it would be better to find the reason why a thread can die without unlocking the mutex: either it is an error, or you forgot to release the mutex in the termination branch.
Per this copy of GLIBC's pthread_mutex_lock.c:
/* ESRCH can happen only for non-robust PI mutexes where
the owner of the lock died. */
assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
Either one of your threads/processes ended without releasing all its locked resources, or you're using pthread_cancel()/kill() and killing threads or processes while they're running.
libc++ counting_semaphore::release:
void release(ptrdiff_t __update = 1)
{
    if(0 < __a.fetch_add(__update, memory_order_release))
        ;
    else if(__update > 1)
        __a.notify_all();
    else
        __a.notify_one();
}
It notifies only if the internal count was zero before the increment, and notifies more than one waiter only if the increment is more than one.
libc++ counting_semaphore::acquire:
void acquire()
{
    auto const __test_fn = [=]() -> bool {
        auto __old = __a.load(memory_order_relaxed);
        return (__old != 0) && __a.compare_exchange_strong(__old, __old - 1, memory_order_acquire, memory_order_relaxed);
    };
    __cxx_atomic_wait(&__a.__a_, __test_fn);
}
It waits for the count to become non-zero, then tries to CAS it to the decremented value.
Now please look into the following 3-threaded case:
counting_semaphore<10> s;
T1: { s.acquire(); /*signal to T3*/ }
T2: { s.acquire(); /*signal to T3*/ }
T3: { /*wait until both signals*/ s.release(1); s.release(1); }
Initially:
__a == 0
(desired parameter passed as 0, any attempt to acquire would block)
Timeline
T1: enters wait
T2: enters wait
T3: fetch add 1 & returns 0, now __a == 1
T3: (0 < 0) is false, so notify_one happens
T1: unblocks from wait
T3: fetch add 1 & returns 1, now __a == 2
T3: (0 < 1) is true, so no notification
T1: loads 2
T1: cas 2 to 1 successfully
T1: returns from acquire
T2: still waits despite __a == 1
Does this look like a valid deadlock?
Why am I asking here instead of reporting the issue?
I reported the issue quite some time ago, with no reply so far.
I want to understand whether there is indeed a deadlock or I am missing something.
The conditional if (0 < ...) is a problem, but it's not the only problem.
The effects of release are stated to be:
Atomically execute counter += update. Then, unblocks any threads that are waiting for counter to be greater than zero.
Note the words "any threads". Plural. This means that, even if a particular update value happened to be 1, all of the threads blocked on this condition must be notified. So calling notify_one is wrong, unless the implementation of notify_one always unblocks all waiting threads. Which would be an... interesting implementation of that function.
Even if you change notify_one to notify_all, that doesn't fix the problem. The logic of the condition basically assumes that thread notification should only happen if all of the threads (logically) notified by a previous release have escaped from their acquire calls. The standard requires no such thing.
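Following that reading of the standard, release would have to notify unconditionally, and notify everyone. A minimal sketch, reusing the same __a member as libc++:

    void release(ptrdiff_t __update = 1)
    {
        __a.fetch_add(__update, memory_order_release);
        // "unblocks any threads": every waiter must get the chance to re-run
        // its test function, so notify all of them, unconditionally.
        __a.notify_all();
    }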
I'm trying to implement a readers-writers solution in C++ with std::thread.
I create several reader threads that run in an infinite loop, pausing for some time between read accesses. I tried to recreate the algorithm presented in Tanenbaum's Operating Systems book:
rc_mtx.lock();                       // lock for incrementing readcount
read_count += 1;
if (read_count == 1)                 // if this is the first reader
    db_mtx.lock();                   // then make a lock on the database
rc_mtx.unlock();
cell_value = data_base[cell_number]; // read data from database
rc_mtx.lock();
read_count -= 1;                     // when finished 'sign this reader off'
if (read_count == 0)                 // if this was the last one
    db_mtx.unlock();                 // release the lock on the database mutex
rc_mtx.unlock();
Of course, the problem is that the thread that satisfies the condition of being the last reader (and therefore does the unlock) is usually not the thread that acquired db_mtx in the first place.
I tried to introduce another 'mother' thread for the readers that takes care of acquiring and releasing the mutex, but I got lost in the process.
If there is an elegant way to overcome this issue (a thread trying to release a mutex it never acquired), I'd love to hear it!
You can use a condition variable to pause writers if readers are in progress, instead of using a separate lock.
// --- read code
rw_mtx.lock(); // will block if there is a write in progress
read_count += 1; // announce intention to read
rw_mtx.unlock();
cell_value = data_base[cell_number];
rw_mtx.lock();
read_count -= 1; // done reading
if (read_count == 0) rw_write_q.notify_one();
rw_mtx.unlock();
// --- write code
std::unique_lock<std::mutex> rw_lock(rw_mtx);
write_count += 1;
rw_write_q.wait(rw_lock, []{return read_count == 0;});
data_base[cell_number] = cell_value;
write_count -= 1;
if (write_count > 0) rw_write_q.notify_one();
This implementation has a fairness issue, because new readers can cut in front of waiting writers. A completely fair implementation would probably involve a proper queue that would allow new readers to wait behind waiting writers, and new writers to wait behind any waiting readers.
In C++14, you can use a shared_timed_mutex instead of mutex to achieve multiple readers/single writer access.
// --- read code
std::shared_lock<std::shared_timed_mutex> read_lock(rw_mtx);
cell_value = data_base[cell_number];
// --- write code
std::unique_lock<std::shared_timed_mutex> write_lock(rw_mtx);
data_base[cell_number] = cell_value;
A plain shared_mutex (without the timed operations) was added in the next C++ standard, C++17.
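For completeness, a self-contained version of the same pattern with C++17's std::shared_mutex might look like this (the data_base/cell names follow the question):

    #include <shared_mutex>
    #include <vector>

    std::shared_mutex rw_mtx;                 // C++17
    std::vector<int> data_base(100);

    int read_cell(std::size_t cell_number)
    {
        std::shared_lock<std::shared_mutex> read_lock(rw_mtx);  // shared: many readers at once
        return data_base[cell_number];
    }

    void write_cell(std::size_t cell_number, int cell_value)
    {
        std::unique_lock<std::shared_mutex> write_lock(rw_mtx); // exclusive: one writer
        data_base[cell_number] = cell_value;
    }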
I am learning Multithreading. With regard to
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html#SCHEDULING
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condition_var = PTHREAD_COND_INITIALIZER;

void *functionCount1();
void *functionCount2();
int count = 0;
#define COUNT_DONE 10
#define COUNT_HALT1 3
#define COUNT_HALT2 6

int main()
{
    pthread_t thread1, thread2;

    pthread_create( &thread1, NULL, &functionCount1, NULL);
    pthread_create( &thread2, NULL, &functionCount2, NULL);
    pthread_join( thread1, NULL);
    pthread_join( thread2, NULL);
    printf("Final count: %d\n", count);

    exit(0);
}

// Write numbers 1-3 and 8-10 as permitted by functionCount2()
void *functionCount1()
{
    for(;;)
    {
        // Lock mutex and then wait for signal to release mutex
        pthread_mutex_lock( &count_mutex );

        // Wait while functionCount2() operates on count.
        // Mutex unlocked if condition variable in functionCount2() signaled.
        pthread_cond_wait( &condition_var, &count_mutex );
        count++;
        printf("Counter value functionCount1: %d\n", count);

        pthread_mutex_unlock( &count_mutex );

        if (count >= COUNT_DONE) return(NULL);
    }
}

// Write numbers 4-7
void *functionCount2()
{
    for(;;)
    {
        pthread_mutex_lock( &count_mutex );

        if (count < COUNT_HALT1 || count > COUNT_HALT2)
        {
            // Condition of if statement has been met.
            // Signal to free waiting thread by freeing the mutex.
            // Note: functionCount1() is now permitted to modify "count".
            pthread_cond_signal( &condition_var );
        }
        else
        {
            count++;
            printf("Counter value functionCount2: %d\n", count);
        }

        pthread_mutex_unlock( &count_mutex );

        if (count >= COUNT_DONE) return(NULL);
    }
}
I want to know the control flow of the code.
As pthread_cond_wait unlocks the mutex and waits for the condition variable cond to be signaled, what I understood about the control flow is:
1) Threads one and two are created, and thread1 is given control first (considering a single-core processor system)
2) When thread1's routine void *functionCount1() encounters pthread_cond_wait( &condition_var, &count_mutex ), it releases the lock and goes into the wait state, passing control to thread2's routine void *functionCount2()
3) In thread2 the variable count is checked, and since it satisfies count < COUNT_HALT1 || count > COUNT_HALT2, it signals thread1 and restarts it to increment count
4) Steps 2 to 3 are repeated, which makes thread1 print 1-3
5) For counts 4-7, thread2 is in action and there is no switching between thread1 and thread2
6) For counts 8-10, steps 2-3 are repeated again.
I want to know whether my understanding is correct. Does thread1 go to sleep and get woken up by thread2 (i.e. are the threads switched) for count values 1-3 and 8-10, i.e. does switching between threads happen 5 times?
EDIT
My main concern in asking this question is to know whether thread1 will go to sleep when it encounters pthread_cond_wait( &condition_var, &count_mutex ), stay inactive until signaled by thread2, and only then increment count. That is, it will not increment 1-3 in one go; rather, for each increment it has to wait for a signal from thread2 before it can proceed.
First: get the book by Butenhof, and study it. The page you
cite is incorrect in several places, and the author obviously
doesn't understand threading himself.
With regards to your questions: the first thing to say is that
you cannot know about the control flow of the code. That's
a characteristic of threading, and on modern processors, you'll
often find the threads really running in parallel, with one core
executing one thread, and another core another. And within each
thread, the processor may rearrange memory accesses in
unexpected ways. This is why you need mutexes, for example.
(You mention "considering single core processing system", but in
practice, single core general purpose systems don't exist any
more.)
Second, how the threads are scheduled is up to the operating
system. In your code, for example, functionCount2 could run
until completion before functionCount1 starts, which would
mean that functionCount1 would wait forever.
Third, a thread in pthread_cond_wait may wake up spuriously.
It is an absolute rule that pthread_cond_wait be in a loop,
which checks whatever condition you're actually waiting for.
Maybe something like:
while ( count > COUNT_HALT1 && count < COUNT_HALT2 ) {
    pthread_cond_wait( &condition_var, &count_mutex );
}
Finally, at times, you're accessing count in a section not
protected by the mutex. This is undefined behavior; all
accesses to count must be protected. In your case, the
locking and unlocking should probably be outside the program
loop, and both threads should wait on the conditional
variable. (But it's obviously an artificial situation—in
practice, there will almost always be a producer thread and
a consumer thread.)
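Putting those points together, a corrected inner loop for functionCount1() might look like this. This is only a sketch using the question's names; the point is that the wait sits in a predicate loop and count is never touched without the mutex held:

    pthread_mutex_lock( &count_mutex );
    // Re-check the real condition in a loop: pthread_cond_wait can wake
    // spuriously, and the predicate may already hold on entry.
    while ( count > COUNT_HALT1 && count < COUNT_HALT2 )
        pthread_cond_wait( &condition_var, &count_mutex );
    count++;
    printf("Counter value functionCount1: %d\n", count);
    int done = (count >= COUNT_DONE);   // test count while still holding the mutex
    pthread_mutex_unlock( &count_mutex );
    if (done) return NULL;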
In the ideal world, yes; in practice, not quite.
You can't predict which thread takes control first. Yes, it's likely to be thread1, but that is still not guaranteed. This is the first race condition in your code.
When thread2 takes control, it will most likely run to completion without stopping, regardless of how many CPUs you have. The reason is that it has no place where it yields unconditionally. The fact that you release the mutex doesn't mean anyone can get a lock on it. This is the second race condition in your code.
So thread1 will print 11, and that is the only part that is guaranteed.
1) Threads are created. Control is not passed to thread1; it's the system scheduler that decides which thread to execute. Both threads are active, and both should receive processor time, but the order is not determined. There might be several context switches; you don't really control this.
2) Correct, thread1 goes into the waiting state and thread2 continues working. Again, control is not passed explicitly.
3) Yes, thread2 notifies the condition variable, so thread1 will wake up and try to reacquire the mutex. Control does not go immediately to thread1.
In general, you should understand that you can't really control which thread executes; that's the job of the system scheduler, which can insert as many context switches as it wants.
UPD: With condition variables you can control the order of task execution in a multithreaded environment. So I think your understanding is more or less correct: thread1 waits on the condition variable for a signal from thread2. When the signal is received, thread1 continues execution (after it reacquires the mutex). But as for switching between threads: there might be many switches, 5 is just the theoretical minimum.
First of all, I know that it can be implemented with a mutex and condition variable, but I want the most efficient implementation possible.
I would like a semaphore with a fast-path when there's no contention. On Linux this is easy with a futex; for example, here's a wait:
if (AtomicDecrementIfPositive(_counter) > 0) return; // Uncontended

AtomicAdd(&_waiters, 1);
do
{
    if (syscall(SYS_futex, &_counter, FUTEX_WAIT_PRIVATE, 0, nullptr, nullptr, 0) == -1) // Sleep
    {
        AtomicAdd(&_waiters, -1);
        throw std::runtime_error("Failed to wait for futex");
    }
}
while (AtomicDecrementIfPositive(_counter) <= 0);
AtomicAdd(&_waiters, -1);
and post:
AtomicAdd(&_counter, 1);
if (Load(_waiters) > 0 && syscall(SYS_futex, &_counter, FUTEX_WAKE_PRIVATE, 1, nullptr, nullptr, 0) == -1)
    throw std::runtime_error("Failed to wake futex"); // Wake one
At first I thought for Windows to just use NtWaitForKeyedEvent(). The problem is it's not a direct substitution because it doesn't atomically check the value at _counter before going into the kernel, and so can miss the wake from NtReleaseKeyedEvent(). Worse, then NtReleaseKeyedEvent() would block.
What's the best solution?
Windows has native semaphores with CreateSemaphore. Until and unless you have some kind of documented performance problem doing it the normal way, you shouldn't even consider optimizations that are fragile or hardware-specific.
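For reference, the straightforward approach looks like this (a minimal sketch; error handling omitted):

    #include <windows.h>
    #include <limits.h>

    HANDLE sem = CreateSemaphore(NULL, 0 /* initial count */, LONG_MAX, NULL);

    // wait (P): blocks until the count is positive, then decrements it
    WaitForSingleObject(sem, INFINITE);

    // post (V): increments the count by one, waking a waiter if any
    ReleaseSemaphore(sem, 1, NULL);

    // when done
    CloseHandle(sem);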
I think something like this should work:
// bottom 16 bits: post count
// top 16 bits: wait count
struct Semaphore { unsigned val; }

wait(struct Semaphore *s)
{
retry:
    do
        old = s->val;
        if old had posts (bottom 16 bits != 0)
            new = old - 1
            wait = false
        else
            new = old + 65536
            wait = true
    until successful CAS of &s->val from old to new
    if wait == true
        wait on keyed event
        goto retry;
}

post(struct Semaphore *s)
{
    do
        old = s->val;
        if old had waiters (top 16 bits != 0)
            // perhaps new = old - 65536 and remove the "goto retry" above?
            // not sure, but this is safer...
            new = old - 65536 + 1
            release = true
        else
            new = old + 1
            release = false
    until successful CAS of &s->val from old to new
    if release == true
        release keyed event
}
edit: that said, I'm not sure this would help you a lot. Your thread pool usually should be big enough that a thread is always ready to process your request. This means that not only waits, but also posts will always take the slow path and go to the kernel. So, counting semaphores are probably the one primitive where you do not really care about a userspace-only fastpath. Stock Win32 semaphores should be good enough. That said, I'm happy to be proven wrong!
I vote for your first idea: a critical section and a condition variable. Critical sections are fast enough, and they do use an interlocked operation before going to sleep. Alternatively, you can experiment with SRWLocks instead of a critical section. Condition variables (and SRWLocks) are very fast; their only problem is that they are not available on XP, but maybe you do not need to target that platform.
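A minimal sketch of that suggestion: a counting semaphore built from a CRITICAL_SECTION and a CONDITION_VARIABLE (Vista and later; names here are illustrative):

    #include <windows.h>

    struct CsSemaphore {
        CRITICAL_SECTION cs;
        CONDITION_VARIABLE cv;
        LONG count;
    };

    void init(CsSemaphore* s, LONG initial)
    {
        InitializeCriticalSection(&s->cs);
        InitializeConditionVariable(&s->cv);
        s->count = initial;
    }

    void wait(CsSemaphore* s)
    {
        EnterCriticalSection(&s->cs);
        while (s->count == 0)                  // predicate loop guards against spurious wakeups
            SleepConditionVariableCS(&s->cv, &s->cs, INFINITE);
        --s->count;
        LeaveCriticalSection(&s->cs);
    }

    void post(CsSemaphore* s)
    {
        EnterCriticalSection(&s->cs);
        ++s->count;
        LeaveCriticalSection(&s->cs);
        WakeConditionVariable(&s->cv);         // wake one waiter to re-check the count
    }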
Qt has all kinds of primitives, like QMutex and QSemaphore, which are implemented in the spirit of what you presented in your question.
Actually, I would suggest replacing the futex stuff with the usual OS-provided synchronization primitives; it should not matter much since that is the slow path anyway.
I have a program that spawns 3 worker threads that do some number crunching, and waits for them to finish like so:
#define THREAD_COUNT 3
volatile LONG waitCount;
HANDLE pSemaphore;
int main(int argc, char **argv)
{
    // ...
    HANDLE threads[THREAD_COUNT];
    pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
    waitCount = 0;

    for (int j=0; j<THREAD_COUNT; ++j)
    {
        threads[j] = CreateThread(NULL, 0, Iteration, p+j, 0, NULL);
    }
    WaitForMultipleObjects(THREAD_COUNT, threads, TRUE, INFINITE);
    // ...
}
The worker threads use a custom Barrier function at certain points in the code to wait until all other threads reach the Barrier:
void Barrier(volatile LONG* counter, HANDLE semaphore, int thread_count = THREAD_COUNT)
{
    LONG wait_count = InterlockedIncrement(counter);
    if ( wait_count == thread_count )
    {
        *counter = 0;
        ReleaseSemaphore(semaphore, thread_count - 1, NULL);
    }
    else
    {
        WaitForSingleObject(semaphore, INFINITE);
    }
}
(Implementation based on this answer)
The program occasionally deadlocks. If at that point I use VS2008 to break execution and dig around in the internals, there is only 1 worker thread waiting on the Wait... line in Barrier(). The value of waitCount is always 2.
To make things even more awkward, the faster the threads work, the more likely they are to deadlock. If I run in Release mode, the deadlock comes about 8 out of 10 times. If I run in Debug mode and put some prints in the thread function to see where they hang, they almost never hang.
So it seems that some of my worker threads are killed early, leaving the rest stuck on the Barrier. However, the threads do literally nothing except read and write memory (and call Barrier()), and I'm quite positive that no segfaults occur. It is also possible that I'm jumping to the wrong conclusions, since (as mentioned in the question linked above) I'm new to Win32 threads.
What could be going on here, and how can I debug this sort of weird behavior with VS?
How do I debug weird thread behaviour?
Not quite what you asked, but the answer is almost always: understand the code really well, understand all the possible outcomes, and work out which one is happening. A debugger becomes less useful here: you can either follow one thread and miss what is causing the other threads to fail, or follow from the parent, in which case execution is no longer sequential and you end up jumping all over the place.
Now, onto the problem.
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
From the MSDN documentation:
lInitialCount [in]: The initial count for the semaphore object. This value must be greater than or equal to zero and less than or equal to lMaximumCount. The state of a semaphore is signaled when its count is greater than zero and nonsignaled when it is zero. The count is decreased by one whenever a wait function releases a thread that was waiting for the semaphore. The count is increased by a specified amount by calling the ReleaseSemaphore function.
And here:
Before a thread attempts to perform the task, it uses the WaitForSingleObject function to determine whether the semaphore's current count permits it to do so. The wait function's time-out parameter is set to zero, so the function returns immediately if the semaphore is in the nonsignaled state. WaitForSingleObject decrements the semaphore's count by one.
So what we're saying here is that a semaphore's count tells you how many threads are allowed to perform a given task at once. When you set your initial count to THREAD_COUNT, you are allowing all your threads access to the "resource", which in this case is permission to continue onwards.
The answer you link uses this creation method for the semaphore:
CreateSemaphore(0, 0, 1024, 0)
Which basically says that initially none of the threads are permitted to use the resource. In your implementation, the semaphore starts out signaled (>0), so everything carries on merrily until one of the threads manages to decrease the count to zero, at which point some other thread waits for the semaphore to become signaled again, which probably isn't happening in sync with your counters. Remember that when WaitForSingleObject returns, it decreases the count of the semaphore.
In the example you've posted, setting:
::ReleaseSemaphore(sync.Semaphore, sync.ThreadsCount - 1, 0);
This works because each of the WaitForSingleObject calls decreases the semaphore's value by 1, and there are thread_count - 1 of them to do so. When those thread_count - 1 WaitForSingleObject calls have all returned, the semaphore is back at 0 and therefore unsignaled again, so on the next pass everybody waits, because nobody is allowed to access the resource at first.
So in short, set your initial value to zero and see if that fixes it.
Edit: a little explanation. To think of it a different way, a semaphore is like a gate with n tickets. What you usually do is this:
// Set the number of tickets:
HANDLE Semaphore = CreateSemaphore(0, 20, 200, 0);
// Later on in a thread somewhere...
// Get a ticket in the queue
WaitForSingleObject(Semaphore, INFINITE);
// Only 20 threads can access this area
// at once. When one thread has entered
// this area the available tickets decrease
// by one. When there are 20 threads here
// all other threads must wait.
// do stuff
ReleaseSemaphore(Semaphore, 1, 0);
// gives back one ticket.
So the use we're putting semaphores to here isn't quite the one for which they were designed.
It's a bit hard to guess exactly what you might be running into. Parallel programming is one of those places that (IMO) it pays to follow the philosophy of "keep it so simple it's obviously correct", and unfortunately I can't say that your Barrier code seems to qualify. Personally, I think I'd have something like this:
// define and initialize the array of events used for the barrier:
HANDLE barrier_[thread_count];
for (int i=0; i<thread_count; i++)
    barrier_[i] = CreateEvent(NULL, true, false, NULL);
// ...
void Barrier(size_t thread_num) {
    // Signal that this thread has reached the barrier:
    SetEvent(barrier_[thread_num]);
    // Then wait for all the threads to reach the barrier:
    WaitForMultipleObjects(thread_count, barrier_, true, INFINITE);
}
Edit:
Okay, now that the intent has been clarified (need to handle multiple iterations), I'd modify the answer, but only slightly. Instead of one array of Events, have two: one for the odd iterations and one for the even iterations:
// define and initialize the arrays of events used for the barrier:
HANDLE barrier_[2][thread_count];
for (int i=0; i<thread_count; i++) {
    barrier_[0][i] = CreateEvent(NULL, true, false, NULL);
    barrier_[1][i] = CreateEvent(NULL, true, false, NULL);
}
// ...
void Barrier(size_t thread_num, int iteration) {
    // Signal that this thread has reached the barrier:
    SetEvent(barrier_[iteration & 1][thread_num]);
    // Then wait for all the threads to reach the barrier:
    WaitForMultipleObjects(thread_count, barrier_[iteration & 1], true, INFINITE);
    ResetEvent(barrier_[iteration & 1][thread_num]);
}
In your barrier, what prevents this line:

    *counter = 0;

from being executed while this other one is executed by another thread?

    LONG wait_count = InterlockedIncrement(counter);