libc++ counting_semaphore::release:
void release(ptrdiff_t __update = 1)
{
if(0 < __a.fetch_add(__update, memory_order_release))
;
else if(__update > 1)
__a.notify_all();
else
__a.notify_one();
}
Notifies only if internal count was zero before increment, notifies more then one waiter only if increment is more than one.
libc++ counting_semaphore::acquire:
void acquire()
{
auto const __test_fn = [=]() -> bool {
auto __old = __a.load(memory_order_relaxed);
return (__old != 0) && __a.compare_exchange_strong(__old, __old - 1, memory_order_acquire, memory_order_relaxed);
};
__cxx_atomic_wait(&__a.__a_, __test_fn);
}
Waits for count to be non-zero, and tries to CAS it with decremented value.
Now please look into the following 3-threaded case:
counting_semaphore<10> s;
T1: { s.acquire(); /*signal to T3*/ }
T2: { s.acquire(); /*signal to T3*/ }
T3: { /*wait until both signals*/ s.release(1); s.release(1); }
Initially:
__a == 0
(desired parameter passed as 0, any attempt to acquire would block)
Timeline
T1: enters wait
T2: enters wait
T3: fetch add 1 & returns 0, now __a == 1
T3: (0 < 0) is false, so notify_one happens
T1: unblocks from wait
T3: fetch add 1 & returns 1, now __a == 2
T3: (0 < 1) is true, so no notification
T1: loads 2
T1: cas 2 to 1 successfully
T1: returns from acquire
T2: still waits despite __a == 1
Does this look like a valid deadlock?
why I am asking here instead of reporting the issue?
I reported the issue quite some time ago, no reply so far.
I want to understand if there's indeed a deadlock or I am missing something.
The conditional if (0 < ...) is a problem, but it's not the only problem.
The effects of release are stated to be:
Atomically execute counter += update. Then, unblocks any threads that are waiting for counter to be greater than zero.
Note the words "any threads". Plural. This means that, even if a particular update value happened to be 1, all of the threads blocked on this condition must be notified. So calling notify_one is wrong, unless the implementation of notify_one always unblocks all waiting threads. Which would be an... interesting implementation of that function.
Even if you change notify_one to notify_all, that doesn't fix the problem. The logic of the condition basically assumes that thread notification should only happen if all of the threads (logically) notified by a previous release have escaped from their acquire calls. The standard requires no such thing.
Related
I have been programming an pthread application. The application has mutex locks shared across threads by the parent thread. For some reason, it throws the following error:
../nptl/pthread_mutex_lock.c:428: __pthread_mutex_lock_full: Assertion `e != ESRCH || !robust' failed.
The application is for capturing high speed network traffic using packet_mmap based approach where there are multiple threads each associated with a socket. I am not sure why it is happening. It happens during testing and I am not able to reproduce the error all times. I googled a lot but I am not able to know about the cause. Thanks for your help.
The cause of the error is due to file read. When the line of file read is commented, the error does not occur. It happens in this line:
fread(this->bit_array, sizeof(int), this->m , fp);
where bit_array is an integer array which is dynamically allocated and m is the size of the array.
Thanks.
In GLIBC 2.31, you were running the following source code of pthread_mutex_lock():
oldval = atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,
newval, 0);
if (oldval != 0)
{
/* The mutex is locked. The kernel will now take care of
everything. */
int private = (robust
? PTHREAD_ROBUST_MUTEX_PSHARED (mutex)
: PTHREAD_MUTEX_PSHARED (mutex));
int e = futex_lock_pi ((unsigned int *) &mutex->__data.__lock,
NULL, private);
if (e == ESRCH || e == EDEADLK)
{
assert (e != EDEADLK
|| (kind != PTHREAD_MUTEX_ERRORCHECK_NP
&& kind != PTHREAD_MUTEX_RECURSIVE_NP));
/* ESRCH can happen only for non-robust PI mutexes where
the owner of the lock died. */
assert (e != ESRCH || !robust);
/* Delay the thread indefinitely. */
while (1)
lll_timedwait (&(int){0}, 0, 0 /* ignored */, NULL,
private);
}
oldval = mutex->__data.__lock;
assert (robust || (oldval & FUTEX_OWNER_DIED) == 0);
}
In the above code, the current value of the mutex is atomically read and it appears that it is different than 0 meaning that the mutex is locked.
Then, the assert is triggered because the owner of the mutex died and the mutex was not a robust one (meaning that the mutex has not been automatically released upon the end of the owner thread).
If you can modify the source code, you may need to add the "robust" attribute to the mutex (pthread_mutexattr_setrobust()) in order to make the system release it automatically when the owner dies. But it is error prone as the corresponding critical section of code may have not reached a sane point and so may leave some un-achieved work...
So, it would be better to find the reason why a thread may die without unlocking the mutex. Either it is an error, either you forgot to release the mutex in the termination branch.
Per this copy of GLIBC's pthread_mutex_lock.c:
/* ESRCH can happen only for non-robust PI mutexes where
the owner of the lock died. */
assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
Either one of your threads/processes ended without releasing all its locked resources, or you're using pthread_cancel()/kill() and killing threads or processes while they're running.
I was reading Silberschatz's book and semaphore that I saw somethings confusing .
There were two type of definitions for wait() in Silberschatz'book .
the first one is :
wait(S) {
while (S<= 0)
; //busy-wait
S--;
}
and the second one is :
wait(semaphore *S) {
S->value--;
if (S->value < 0) {
add this process to S->list;
block();
}
}
Which one is true ?
when we call wait for semaphore , what we should do first ? checking semaphore value or decreasing amount of its value ?
Both are true but as mentioned in the book clearly that the first definition has problem of busy-waiting. That means it keeps the CPU busy but does no work and therefore wastes CPU cycles.
Whereas in the second definition there is no problem of busy waiting because of the fact that whenever the value of semaphore is not greater than 0, then the process is blocked and therefore enters the waiting state which means that it is taken off the CPU and can no more waste CPU cycles.
As it turns out, condition_variable::wait_for should really be called condition_variable::wait_for_or_possibly_indefinitely_longer_than, because it needs to reacquire the lock before really timing out and returning.
See this program for a demonstration.
Is there a way to express, "Look, I really only have two seconds. If myPredicate() is still false at that time and/or the lock is still locked, I don't care, just carry on regardless and give me a way to detect that."
Something like:
bool myPredicate();
auto sec = std::chrono::seconds(1);
bool pred;
std::condition_variable::cv_status timedOut;
std::tie( pred, timedOut ) =
cv.really_wait_for_no_longer_than( lck, 2*sec, myPredicate );
if( lck.owns_lock() ) {
// Can use mutexed resource.
// ...
lck.unlock();
} else {
// Cannot use mutexed resource. Deal with it.
};
I think that you misuse the condition_variable's lock. It's for protecting condition only, not for protecting a time-consuming work.
Your example can be fixed easily by splitting the mutex into two - one is for critical section, another is for protecting modifications of ready condition. Here is the modified fragment:
typedef std::unique_lock<std::mutex> lock_type;
auto sec = std::chrono::seconds(1);
std::mutex mtx_work;
std::mutex mtx_ready;
std::condition_variable cv;
bool ready = false;
void task1() {
log("Starting task 1. Waiting on cv for 2 secs.");
lock_type lck(mtx_ready);
bool done = cv.wait_for(lck, 2*sec, []{log("Checking condition..."); return ready;});
std::stringstream ss;
ss << "Task 1 finished, done==" << (done?"true":"false") << ", " << (lck.owns_lock()?"lock owned":"lock not owned");
log(ss.str());
}
void task2() {
// Allow task1 to go first
std::this_thread::sleep_for(1*sec);
log("Starting task 2. Locking and sleeping 2 secs.");
lock_type lck1(mtx_work);
std::this_thread::sleep_for(2*sec);
lock_type lck2(mtx_ready);
ready = true; // This happens around 3s into the program
log("OK, task 2 unlocking...");
lck2.unlock();
cv.notify_one();
}
It's output:
#2 ms: Starting task 1. Waiting on cv for 2 secs.
#2 ms: Checking condition...
#1002 ms: Starting task 2. Locking and sleeping 2 secs.
#2002 ms: Checking condition...
#2002 ms: Task 1 finished, done==false, lock owned
#3002 ms: OK, task 2 unlocking...
Actually, the condition_variable::wait_for does exactly what you want. The problem with your example is that you locked a 2-second sleep along with the ready = true assignment, making it impossible for the condition variable to even evaluate the predicate before reaching the time limit.
Put that std::this_thread::sleep_for(2*sec); line outside the lock and see for yourself.
Is there a way to express, "Look, I really only have two seconds. If
myPredicate() is still false at that time and/or the lock is still
locked, I don't care, just carry on regardless ..."
Yes, there is a way, but unfortunately in case of wait_for it has to be manual. The wait_for waits indefinitely because of Spurious Wakeup. Imagine your loop like this:
while(!myPredicate())
cv.wait_for(lock, std::chrono::duration::seconds(2);
The spurious wakeup can happen anytime in an arbitrary platform. Imagine that in your case it happens within 200 ms. Due to this, without any external notification wait_for() will wakeup and check for myPredicate() in the loop condition.
As expected, the condition will be false, hence the loop will be true and again it will execute cv.wait_for(..), with fresh 2 seconds. This is how it will run infinitely.
Either you control that updation duration by yourself or use wait_until() which is ultimately called in wait_for().
I am learning Multithreading. With regard to
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html#SCHEDULING
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condition_var = PTHREAD_COND_INITIALIZER;
void *functionCount1();
void *functionCount2();
int count = 0;
#define COUNT_DONE 10
#define COUNT_HALT1 3
#define COUNT_HALT2 6
main()
{
pthread_t thread1, thread2;
pthread_create( &thread1, NULL, &functionCount1, NULL);
pthread_create( &thread2, NULL, &functionCount2, NULL);
pthread_join( thread1, NULL);
pthread_join( thread2, NULL);
printf("Final count: %d\n",count);
exit(0);
}
// Write numbers 1-3 and 8-10 as permitted by functionCount2()
void *functionCount1()
{
for(;;)
{
// Lock mutex and then wait for signal to relase mutex
pthread_mutex_lock( &count_mutex );
// Wait while functionCount2() operates on count
// mutex unlocked if condition varialbe in functionCount2() signaled.
pthread_cond_wait( &condition_var, &count_mutex );
count++;
printf("Counter value functionCount1: %d\n",count);
pthread_mutex_unlock( &count_mutex );
if(count >= COUNT_DONE) return(NULL);
}
}
// Write numbers 4-7
void *functionCount2()
{
for(;;)
{
pthread_mutex_lock( &count_mutex );
if( count < COUNT_HALT1 || count > COUNT_HALT2 )
{
// Condition of if statement has been met.
// Signal to free waiting thread by freeing the mutex.
// Note: functionCount1() is now permitted to modify "count".
pthread_cond_signal( &condition_var );
}
else
{
count++;
printf("Counter value functionCount2: %d\n",count);
}
pthread_mutex_unlock( &count_mutex );
if(count >= COUNT_DONE) return(NULL);
}
}
I want to know the control flow of the code.
As pthread_cond_wait - unlocks the mutex and waits for the condition variable cond to be signaled
What I understood about the control of flow is
1) Thread One,Two are created and thread1 is passed the control (considering Single Core Processor System)
2) When it encounters pthread_cond_wait( &condition_var, &count_mutex ); in thread1 routine void *functionCount1() - it releases the lock and goes to wait state passing control to thread2 void *functionCount1()
3) In thread2 variable count is checked and since it satisfies count < COUNT_HALT1 || count > COUNT_HALT2 - it signals thread1 and restarts is to increment count
4) Steps 2 to 3 is repeated which displays 1-3 by thread1
5) For count 4-7 the thread2 is in action and there is no switching between thread1 and thread2
6)For count 8-10 again steps 2-3 are repeated.
I want to know whether my understanding is correct? Does thread1 goes to sleep and thread2 wakes it up ( i.e threads are switched) for count value 1-3 and 8-10 i.e switching between threads happen 5 times ?
EDIT
My main concern to ask this question is to know if thread1 will go to sleep state when it encounters pthread_cond_wait( &condition_var, &count_mutex ); and won't be active again unless signalled by thread2 and only then increments count i.e. it is not going to increment 1-3 in one go rather for each increment , it has to wait for signal from thread2 only then it can proceed further
First: get the book by Butenhof, and study it. The page you
cite is incorrect in several places, and the author obviously
doesn't understand threading himself.
With regards to your questions: the first thing to say is that
you cannot know about the control flow of the code. That's
a characteristic of threading, and on modern processors, you'll
often find the threads really running in parallel, with one core
executing one thread, and another core another. And within each
thread, the processor may rearrange memory accesses in
unexpected ways. This is why you need mutexes, for example.
(You mention "considering single core processing system", but in
practice, single core general purpose systems don't exist any
more.)
Second, how the threads are scheduled is up to the operating
system. In your code, for example, functionCount2 could run
until completion before functionCount1 starts, which would
mean that functionCount1 would wait forever.
Third, a thread in pthread_cond_wait may wake up spuriously.
It is an absolute rule that pthread_cond_wait be in a loop,
which checks whatever condition you're actually waiting for.
Maybe something like:
while ( count > COUNT_HALT1 && count < COUNT_HALT2 ) {
pthread_cond_wait( &condition_var, &count_mutex );
}
Finally, at times, you're accessing count in a section not
protected by the mutex. This is undefined behavior; all
accesses to count must be protected. In your case, the
locking and unlocking should probably be outside the program
loop, and both threads should wait on the conditional
variable. (But it's obviously an artificial situation—in
practice, there will almost always be a producer thread and
a consumer thread.)
In the ideal world, yes, but, in practice, not quite.
You can't predict which tread takes control first. Yes, it's likely to be thread1, but still not guaranteed. It is the first racing condition in your code.
When thread2 takes control, most likely it will finish without stopping. Regardless of how many CPUs you have. The reason is that it has no place to yield unconditionally. The fact you release mutex, doesn't mean any1 can get a lock on it. Its the second racing condition in your code.
So, the thread1 will print 11, and that is the only part guaranteed.
1) Threads are created. Control is not passed to thread1, it's system scheduler who decides which thread to execute. Both threads are active, both should receive processor time, but the order is not determined. There might be several context switches, you don't really control this.
2) Correct, thread1 comes to waiting state, thread2 continues working. Again, control is not passed explicitly.
3) Yes, thread2 notifies condition variable, so thread1 will awake and try to reacquire the mutex. Control does not go immediately to thread1.
In general you should understand that you can't really control which thread to execute; it's job os the system scheduler, who can put as many context switches as it wants.
UPD: With condition variables you can control the order of tasks execution within multithreading environment. So I think your understanding is more or less correct: thread1 is waiting on condition variable for signal from thread2. When signal is received thread1 continues execution (after it reacquires the mutex). But switching between threads - there might be many of them, 5 is just a theoretical minimum.
I have a program that spawns 3 worker threads that do some number crunching, and waits for them to finish like so:
#define THREAD_COUNT 3
volatile LONG waitCount;
HANDLE pSemaphore;
int main(int argc, char **argv)
{
// ...
HANDLE threads[THREAD_COUNT];
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
waitCount = 0;
for (int j=0; j<THREAD_COUNT; ++j)
{
threads[j] = CreateThread(NULL, 0, Iteration, p+j, 0, NULL);
}
WaitForMultipleObjects(THREAD_COUNT, threads, TRUE, INFINITE);
// ...
}
The worker threads use a custom Barrier function at certain points in the code to wait until all other threads reach the Barrier:
void Barrier(volatile LONG* counter, HANDLE semaphore, int thread_count = THREAD_COUNT)
{
LONG wait_count = InterlockedIncrement(counter);
if ( wait_count == thread_count )
{
*counter = 0;
ReleaseSemaphore(semaphore, thread_count - 1, NULL);
}
else
{
WaitForSingleObject(semaphore, INFINITE);
}
}
(Implementation based on this answer)
The program occasionally deadlocks. If at that point I use VS2008 to break execution and dig around in the internals, there is only 1 worker thread waiting on the Wait... line in Barrier(). The value of waitCount is always 2.
To make things even more awkward, the faster the threads work, the more likely they are to deadlock. If I run in Release mode, the deadlock comes about 8 out of 10 times. If I run in Debug mode and put some prints in the thread function to see where they hang, they almost never hang.
So it seems that some of my worker threads are killed early, leaving the rest stuck on the Barrier. However, the threads do literally nothing except read and write memory (and call Barrier()), and I'm quite positive that no segfaults occur. It is also possible that I'm jumping to the wrong conclusions, since (as mentioned in the question linked above) I'm new to Win32 threads.
What could be going on here, and how can I debug this sort of weird behavior with VS?
How do I debug weird thread behaviour?
Not quite what you said, but the answer is almost always: understand the code really well, understand all the possible outcomes and work out which one is happening. A debugger becomes less useful here, because you can either follow one thread and miss out on what is causing other threads to fail, or follow from the parent, in which case execution is no longer sequential and you end up all over the place.
Now, onto the problem.
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
From the MSDN documentation:
lInitialCount [in]: The initial count for the semaphore object. This value must be greater than or equal to zero and less than or equal to lMaximumCount. The state of a semaphore is signaled when its count is greater than zero and nonsignaled when it is zero. The count is decreased by one whenever a wait function releases a thread that was waiting for the semaphore. The count is increased by a specified amount by calling the ReleaseSemaphore function.
And here:
Before a thread attempts to perform the task, it uses the WaitForSingleObject function to determine whether the semaphore's current count permits it to do so. The wait function's time-out parameter is set to zero, so the function returns immediately if the semaphore is in the nonsignaled state. WaitForSingleObject decrements the semaphore's count by one.
So what we're saying here, is that a semaphore's count parameter tells you how many threads are allowed to perform a given task at once. When you set your count initially to THREAD_COUNT you are allowing all your threads access to the "resource" which in this case is to continue onwards.
The answer you link uses this creation method for the semaphore:
CreateSemaphore(0, 0, 1024, 0)
Which basically says none of the threads are permitted to use the resource. In your implementation, the semaphore is signaled (>0), so everything carries on merrily until one of the threads manages to decrease the count to zero, at which point some other thread waits for the semaphore to become signaled again, which probably isn't happening in sync with your counters. Remember when WaitForSingleObject returns it decreases the counter on the semaphore.
In the example you've posted, setting:
::ReleaseSemaphore(sync.Semaphore, sync.ThreadsCount - 1, 0);
Works because each of the WaitForSingleObject calls decrease the semaphore's value by 1 and there are threadcount - 1 of them to do, which happen when the threadcount - 1 WaitForSingleObjects all return, so the semaphore is back to 0 and therefore unsignaled again, so on the next pass everybody waits because nobody is allowed to access the resource at once.
So in short, set your initial value to zero and see if that fixes it.
Edit A little explanation: So to think of it a different way, a semaphore is like an n-atomic gate. What you do is usually this:
// Set the number of tickets:
HANDLE Semaphore = CreateSemaphore(0, 20, 200, 0);
// Later on in a thread somewhere...
// Get a ticket in the queue
WaitForSingleObject(Semaphore, INFINITE);
// Only 20 threads can access this area
// at once. When one thread has entered
// this area the available tickets decrease
// by one. When there are 20 threads here
// all other threads must wait.
// do stuff
ReleaseSemaphore(Semaphore, 1, 0);
// gives back one ticket.
So the use we're putting semaphores to here isn't quite the one for which they were designed.
It's a bit hard to guess exactly what you might be running into. Parallel programming is one of those places that (IMO) it pays to follow the philosophy of "keep it so simple it's obviously correct", and unfortunately I can't say that your Barrier code seems to qualify. Personally, I think I'd have something like this:
// define and initialize the array of events use for the barrier:
HANDLE barrier_[thread_count];
for (int i=0; i<thread_count; i++)
barrier_[i] = CreateEvent(NULL, true, false, NULL);
// ...
Barrier(size_t thread_num) {
// Signal that this thread has reached the barrier:
SetEvent(barrier_[thread_num]);
// Then wait for all the threads to reach the barrier:
WaitForMultipleObjects(thread_count, barrier_, true, INFINITE);
}
Edit:
Okay, now that the intent has been clarified (need to handle multiple iterations), I'd modify the answer, but only slightly. Instead of one array of Events, have two: one for the odd iterations and one for the even iterations:
// define and initialize the array of events use for the barrier:
HANDLE barrier_[2][thread_count];
for (int i=0; i<thread_count; i++) {
barrier_[0][i] = CreateEvent(NULL, true, false, NULL);
barrier_[1][i] = CreateEvent(NULL, true, false, NULL);
}
// ...
Barrier(size_t thread_num, int iteration) {
// Signal that this thread has reached the barrier:
SetEvent(barrier_[iteration & 1][thread_num]);
// Then wait for all the threads to reach the barrier:
WaitForMultipleObjects(thread_count, &barrier[iteration & 1], true, INFINITE);
ResetEvent(barrier_[iteration & 1][thread_num]);
}
In your barrier, what prevents this line:
*counter = 0;
to be executed while this other one is executed by another thread?
LONG wait_count =
InterlockedIncrement(counter);