Consider the following code snippet
int index = 0;
av::utils::Lock lock(av::utils::Lock::EStrategy::eMutex); // Uses a mutex or a spin lock based on specified strategy.
void fun()
{
for (int i = 0; i < 100; ++i)
{
lock.aquire();
++index;
std::cout << "thread " << std::this_thread::get_id() << " index = " << index << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(500));
lock.release();
}
}
int main()
{
std::thread t1(fun);
std::thread t2(fun);
t1.join();
t2.join();
}
The output that I get with a mutex used for synchronization is first thread 1 gets executed completely followed by thread 2.
While using a spinlock(implemented using std::atomic_flag), I get the order of execution between the threads which is interleaved (one iteration of thread 1 followed by another iteration of thread 2). The latter case happens irrespective of the delay I add in execution of the iteration.
I understand that a mutex only guarantees mutual exclusion and not the order of execution. The question I have is if I want to have an execution order such that two threads are executed in an interleaved manner, is using spinlocks a recommended strategy or not?
The output that I get with a mutex ... is first thread 1 [runs through the whole loop] followed by thread 2.
That's because of how your loop uses the lock: The very last thing the loop body does is, it unlocks the lock. The very next thing it does at the start of the next iteration is, it locks the lock again.
The other thread can be blocked, effectively sleeping, waiting for the mutex. When your thread 1 releases the lock, the OS scheduler may still be running its algorithms, trying to figure out how to respond to that, when thread 1 comes 'round and locks the lock again.
It's like a race to lock the mutex, and thread 1 is on the starting line when the gun goes off, while thread 2 is sitting on the bench, tying its shoes.
While using a spinlock...the order of execution between the threads which is interleaved
That's because the "blocked" thread isn't really blocked. It's still actively running on a different processor while it waits. It has a much better chance at winning the lock when the first thread releases it.
Related
I'm trying to understand how to better use condition variables, and I have the following code.
Behavior.
The expected behavior of the code is that:
Each thread prints "thread n waiting"
The program waits until the user presses enter
When the user presses enter, notify_one is called once for each thread
All the threads print "thread n ready.", and exit
The observed behavior of the code is that:
Each thread prints "thread n waiting" (Expected)
The program waits until the user presses enter (Expected)
When the user presses enter, notify_one is called once for each thread (Expected)
One of the threads prints "thread n ready", but then the code hangs. (???)
Question.
Why does the code hang? And how can I have multiple threads wait on the same condition variable?
Code
#include <condition_variable>
#include <iostream>
#include <string>
#include <vector>
#include <thread>
int main() {
using namespace std::literals::string_literals;
auto m = std::mutex();
auto lock = std::unique_lock(m);
auto cv = std::condition_variable();
auto wait_then_print =[&](int id) {
return [&, id]() {
auto id_str = std::to_string(id);
std::cout << ("thread " + id_str + " waiting.\n");
cv.wait(lock);
// If I add this line in, the code gives me a system error:
// lock.unlock();
std::cout << ("thread " + id_str + " ready.\n");
};
};
auto threads = std::vector<std::thread>(16);
int counter = 0;
for(auto& t : threads)
t = std::thread(wait_then_print(counter++));
std::cout << "Press enter to continue.\n";
std::getchar();
for(int i = 0; i < counter; i++) {
cv.notify_one();
std::cout << "Notified one.\n";
}
for(auto& t : threads)
t.join();
}
Output
thread 1 waiting.
thread 0 waiting.
thread 2 waiting.
thread 3 waiting.
thread 4 waiting.
thread 5 waiting.
thread 6 waiting.
thread 7 waiting.
thread 8 waiting.
thread 9 waiting.
thread 11 waiting.
thread 10 waiting.
thread 12 waiting.
thread 13 waiting.
thread 14 waiting.
thread 15 waiting.
Press enter to continue.
Notified one.
Notified one.
thread 1 ready.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
This is undefined behavior.
In order to wait on a condition variable, the condition variable must be waited on by the same exact thread that originally locked the mutex. You cannot lock the mutex in one execution thread, and then wait on the condition variable in another thread.
auto lock = std::unique_lock(m);
This lock is obtained in the main execution thread. Afterwards, the main execution thread creates all these multiple execution threads. Each one of these execution threads executes the following:
cv.wait(lock)
The mutex lock was not acquired by the execution thread that calls wait() here, therefore this is undefined behavior.
A more closer look at what you are attempting to do here suggests that you will likely get your intended results if you simply move
auto lock = std::unique_lock(m);
inside the lambda that gets executed by each new execution thread.
You also need to simply use notify_all() instead of calling notify_one() multiple times, due to various race conditions. Remember that wait() automatically unlocks the mutex and waits on the condition variable, and wait() returns only after the thread successfully relocked the mutex after being notified by the condition variable.
Using cout in multiple threads might result in interleaved output.
So I tried to protect cout with a mutex.
The following code starts 10 background threads with std::async. When a thread starts, it prints "Started thread ...".
The main thread iterates over the futures of the background threads in the order in which they were created and prints out "Done thread ..." when the corresponding thread finished.
The output is synchronized correctly, but after some threads have started and some have finished (see output below), a deadlock occurres. All background threads left and the main thread are waiting for the mutex.
What is the reason for the deadlock?
When the print function is left or one iteration of the for loop ends, the lock_guard should unlock the mutex, so that one of the waiting threads would be able to proceed.
Why are all the threads left starving?
Code
#include <future>
#include <iostream>
#include <vector>
using namespace std;
std::mutex mtx; // mutex for critical section
int print_start(int i) {
lock_guard<mutex> g(mtx);
cout << "Started thread" << i << "(" << this_thread::get_id() << ") " << endl;
return i;
}
int main() {
vector<future<int>> futures;
for (int i = 0; i < 10; ++i) {
futures.push_back(async(print_start, i));
}
//retrieve and print the value stored in the future
for (auto &f : futures) {
lock_guard<mutex> g(mtx);
cout << "Done thread" << f.get() << "(" << this_thread::get_id() << ")" << endl;
}
cin.get();
return 0;
}
Output
Started thread0(352)
Started thread1(14944)
Started thread2(6404)
Started thread3(16884)
Done thread0(16024)
Done thread1(16024)
Done thread2(16024)
Done thread3(16024)
Your problem lies in the use of future::get:
Returns the value stored in the shared state (or throws its exception)
when the shared state is ready.
If the shared state is not yet ready (i.e., the provider has not yet
set its value or exception), the function blocks the calling thread
and waits until it is ready.
http://www.cplusplus.com/reference/future/future/get/
So if the thread behind the future didn't get to run yet, the function blocks until that thread finishes. However, you take ownership of the mutex before calling future::get, so whichever thread you're waiting for will not be able to attain the mutex for itself.
This should fix your deadlock problem:
int value = f.get();
lock_guard<mutex> g(mtx);
cout << "Done thread" << value << "(" << this_thread::get_id() << ")" << endl;
You lock the mutex and then wait for one of the futures, which in turn requires a lock on the mutex itself. Simple rule: Don't wait with locked mutexes.
BTW: Locking output streams is not very effective, because it can easily be circumvented by code you don't even control. Rather than using those globals, give a stream to code that needs to output something (dependency injection) and then collect the data from that stream in a threadsafe way. Or use a logging library, because that's probably what you wanted to do anyway.
It is good that the reason was spotted from the source. However, quite often the error, as it happens, may be not so easy to locate. And the reason may differ as well. Fortunately, in case of deadlock you can use debugger to investigate it.
I compiled and ran your example, then after attaching to it with gdb (gcc 4.9.2/Linux), there is a backtrace (noisy implementation details skipped):
#0 __lll_lock_wait ()
...
#5 0x0000000000403140 in std::lock_guard<std::mutex>::lock_guard (
this=0x7ffe74903320, __m=...) at /usr/include/c++/4.9/mutex:377
#6 0x0000000000402147 in print_start (i=0) at so_deadlock.cc:9
...
#23 0x0000000000409e69 in ....::_M_complete_async() (this=0xdd4020)
at /usr/include/c++/4.9/future:1498
#24 0x0000000000402af2 in std::__future_base::_State_baseV2::wait (
this=0xdd4020) at /usr/include/c++/4.9/future:321
#25 0x0000000000404713 in std::__basic_future<int>::_M_get_result (
this=0xdd47e0) at /usr/include/c++/4.9/future:621
#26 0x0000000000403c48 in std::future<int>::get (this=0xdd47e0)
at /usr/include/c++/4.9/future:700
#27 0x000000000040229b in main () at so_deadlock.cc:24
This is just what is explained in the other answers - the code in locked section (so_deadlock.cc:24) calls future::get(), which in turn (by forcing the result) trying to acquire the lock again.
It might be not that simple in other cases, there are usually several threads, but it's all there.
As it turns out, condition_variable::wait_for should really be called condition_variable::wait_for_or_possibly_indefinitely_longer_than, because it needs to reacquire the lock before really timing out and returning.
See this program for a demonstration.
Is there a way to express, "Look, I really only have two seconds. If myPredicate() is still false at that time and/or the lock is still locked, I don't care, just carry on regardless and give me a way to detect that."
Something like:
bool myPredicate();
auto sec = std::chrono::seconds(1);
bool pred;
std::condition_variable::cv_status timedOut;
std::tie( pred, timedOut ) =
cv.really_wait_for_no_longer_than( lck, 2*sec, myPredicate );
if( lck.owns_lock() ) {
// Can use mutexed resource.
// ...
lck.unlock();
} else {
// Cannot use mutexed resource. Deal with it.
};
I think that you misuse the condition_variable's lock. It's for protecting condition only, not for protecting a time-consuming work.
Your example can be fixed easily by splitting the mutex into two - one is for critical section, another is for protecting modifications of ready condition. Here is the modified fragment:
typedef std::unique_lock<std::mutex> lock_type;
auto sec = std::chrono::seconds(1);
std::mutex mtx_work;
std::mutex mtx_ready;
std::condition_variable cv;
bool ready = false;
void task1() {
log("Starting task 1. Waiting on cv for 2 secs.");
lock_type lck(mtx_ready);
bool done = cv.wait_for(lck, 2*sec, []{log("Checking condition..."); return ready;});
std::stringstream ss;
ss << "Task 1 finished, done==" << (done?"true":"false") << ", " << (lck.owns_lock()?"lock owned":"lock not owned");
log(ss.str());
}
void task2() {
// Allow task1 to go first
std::this_thread::sleep_for(1*sec);
log("Starting task 2. Locking and sleeping 2 secs.");
lock_type lck1(mtx_work);
std::this_thread::sleep_for(2*sec);
lock_type lck2(mtx_ready);
ready = true; // This happens around 3s into the program
log("OK, task 2 unlocking...");
lck2.unlock();
cv.notify_one();
}
It's output:
#2 ms: Starting task 1. Waiting on cv for 2 secs.
#2 ms: Checking condition...
#1002 ms: Starting task 2. Locking and sleeping 2 secs.
#2002 ms: Checking condition...
#2002 ms: Task 1 finished, done==false, lock owned
#3002 ms: OK, task 2 unlocking...
Actually, the condition_variable::wait_for does exactly what you want. The problem with your example is that you locked a 2-second sleep along with the ready = true assignment, making it impossible for the condition variable to even evaluate the predicate before reaching the time limit.
Put that std::this_thread::sleep_for(2*sec); line outside the lock and see for yourself.
Is there a way to express, "Look, I really only have two seconds. If
myPredicate() is still false at that time and/or the lock is still
locked, I don't care, just carry on regardless ..."
Yes, there is a way, but unfortunately in case of wait_for it has to be manual. The wait_for waits indefinitely because of Spurious Wakeup. Imagine your loop like this:
while(!myPredicate())
cv.wait_for(lock, std::chrono::duration::seconds(2);
The spurious wakeup can happen anytime in an arbitrary platform. Imagine that in your case it happens within 200 ms. Due to this, without any external notification wait_for() will wakeup and check for myPredicate() in the loop condition.
As expected, the condition will be false, hence the loop will be true and again it will execute cv.wait_for(..), with fresh 2 seconds. This is how it will run infinitely.
Either you control that updation duration by yourself or use wait_until() which is ultimately called in wait_for().
For example I want each thread to not start running until the previous one has completed, is there a flag, something like thread.isRunning()?
#include <iostream>
#include <vector>
#include <thread>
using namespace std;
void hello() {
cout << "thread id: " << this_thread::get_id() << endl;
}
int main() {
vector<thread> threads;
for (int i = 0; i < 5; ++i)
threads.push_back(thread(hello));
for (thread& thr : threads)
thr.join();
cin.get();
return 0;
}
I know that the threads are meant to run concurrently, but what if I want to control the order?
There is no thread.isRunning(). You need some synchronization primitive to do it.
Consider std::condition_variable for example.
One approachable way is to use std::async. With the current definition of std::async is that the associated state of an operation launched by std::async can cause the returned std::future's destructor to block until the operation is complete. This can limit composability and result in code that appears to run in parallel but in reality runs sequentially.
{
std::async(std::launch::async, []{ hello(); });
std::async(std::launch::async, []{ hello(); }); // does not run until hello() completes
}
If we need the second thread start to run after the first one is completed, is a thread really needed?
For solution I think try to set a global flag, the set the value in the first thread, and when start the second thread, check the flag first should work.
You can't simply control the order like saying "First, thread 1, then thread 2,..." you will need to make use of synchronization (i.e. std::mutex and condition-variables std::condition_variable_any).
You can create events so as to block one thread until a certain event happend.
See cppreference for an overview of threading-mechanisms in C++-11.
You will need to use semaphore or lock.
If you initialize semaphore to value 0:
Call wait after thread.start() and call signal/ release in the end of thread execution function (e.g. run funcition in java, OnExit function etc...)
So the main thread will keep waiting until the thread in loop has completed its execution.
Task-based parallelism can achieve this, but C++ does not currently offer task model as part of it's threading libraries. If you have TBB or PPL you can use their task-based facilities.
I think you can achieve this by using std::mutex and std::condition_variable from C++11. To be able to run threads sequentially array of booleans in used, when thread is done doing some work it writes true in specific index of the array.
For example:
mutex mtx;
condition_variable cv;
int ids[10] = { false };
void shared_method(int id) {
unique_lock<mutex> lock(mtx);
if (id != 0) {
while (!ids[id - 1]) {
cv.wait(lock);
}
}
int delay = rand() % 4;
cout << "Thread " << id << " will finish in " << delay << " seconds." << endl;
this_thread::sleep_for(chrono::seconds(delay));
ids[id] = true;
cv.notify_all();
}
void test_condition_variable() {
thread threads[10];
for (int i = 0; i < 10; ++i) {
threads[i] = thread(shared_method, i);
}
for (thread &t : threads) {
t.join();
}
}
Output:
Thread 0 will finish in 3 seconds.
Thread 1 will finish in 1 seconds.
Thread 2 will finish in 1 seconds.
Thread 3 will finish in 2 seconds.
Thread 4 will finish in 2 seconds.
Thread 5 will finish in 0 seconds.
Thread 6 will finish in 0 seconds.
Thread 7 will finish in 2 seconds.
Thread 8 will finish in 3 seconds.
Thread 9 will finish in 1 seconds.
I am learning Multithreading. With regard to
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html#SCHEDULING
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condition_var = PTHREAD_COND_INITIALIZER;
void *functionCount1();
void *functionCount2();
int count = 0;
#define COUNT_DONE 10
#define COUNT_HALT1 3
#define COUNT_HALT2 6
main()
{
pthread_t thread1, thread2;
pthread_create( &thread1, NULL, &functionCount1, NULL);
pthread_create( &thread2, NULL, &functionCount2, NULL);
pthread_join( thread1, NULL);
pthread_join( thread2, NULL);
printf("Final count: %d\n",count);
exit(0);
}
// Write numbers 1-3 and 8-10 as permitted by functionCount2()
void *functionCount1()
{
for(;;)
{
// Lock mutex and then wait for signal to relase mutex
pthread_mutex_lock( &count_mutex );
// Wait while functionCount2() operates on count
// mutex unlocked if condition varialbe in functionCount2() signaled.
pthread_cond_wait( &condition_var, &count_mutex );
count++;
printf("Counter value functionCount1: %d\n",count);
pthread_mutex_unlock( &count_mutex );
if(count >= COUNT_DONE) return(NULL);
}
}
// Write numbers 4-7
void *functionCount2()
{
for(;;)
{
pthread_mutex_lock( &count_mutex );
if( count < COUNT_HALT1 || count > COUNT_HALT2 )
{
// Condition of if statement has been met.
// Signal to free waiting thread by freeing the mutex.
// Note: functionCount1() is now permitted to modify "count".
pthread_cond_signal( &condition_var );
}
else
{
count++;
printf("Counter value functionCount2: %d\n",count);
}
pthread_mutex_unlock( &count_mutex );
if(count >= COUNT_DONE) return(NULL);
}
}
I want to know the control flow of the code.
As pthread_cond_wait - unlocks the mutex and waits for the condition variable cond to be signaled
What I understood about the control of flow is
1) Thread One,Two are created and thread1 is passed the control (considering Single Core Processor System)
2) When it encounters pthread_cond_wait( &condition_var, &count_mutex ); in thread1 routine void *functionCount1() - it releases the lock and goes to wait state passing control to thread2 void *functionCount1()
3) In thread2 variable count is checked and since it satisfies count < COUNT_HALT1 || count > COUNT_HALT2 - it signals thread1 and restarts is to increment count
4) Steps 2 to 3 is repeated which displays 1-3 by thread1
5) For count 4-7 the thread2 is in action and there is no switching between thread1 and thread2
6)For count 8-10 again steps 2-3 are repeated.
I want to know whether my understanding is correct? Does thread1 goes to sleep and thread2 wakes it up ( i.e threads are switched) for count value 1-3 and 8-10 i.e switching between threads happen 5 times ?
EDIT
My main concern to ask this question is to know if thread1 will go to sleep state when it encounters pthread_cond_wait( &condition_var, &count_mutex ); and won't be active again unless signalled by thread2 and only then increments count i.e. it is not going to increment 1-3 in one go rather for each increment , it has to wait for signal from thread2 only then it can proceed further
First: get the book by Butenhof, and study it. The page you
cite is incorrect in several places, and the author obviously
doesn't understand threading himself.
With regards to your questions: the first thing to say is that
you cannot know about the control flow of the code. That's
a characteristic of threading, and on modern processors, you'll
often find the threads really running in parallel, with one core
executing one thread, and another core another. And within each
thread, the processor may rearrange memory accesses in
unexpected ways. This is why you need mutexes, for example.
(You mention "considering single core processing system", but in
practice, single core general purpose systems don't exist any
more.)
Second, how the threads are scheduled is up to the operating
system. In your code, for example, functionCount2 could run
until completion before functionCount1 starts, which would
mean that functionCount1 would wait forever.
Third, a thread in pthread_cond_wait may wake up spuriously.
It is an absolute rule that pthread_cond_wait be in a loop,
which checks whatever condition you're actually waiting for.
Maybe something like:
while ( count > COUNT_HALT1 && count < COUNT_HALT2 ) {
pthread_cond_wait( &condition_var, &count_mutex );
}
Finally, at times, you're accessing count in a section not
protected by the mutex. This is undefined behavior; all
accesses to count must be protected. In your case, the
locking and unlocking should probably be outside the program
loop, and both threads should wait on the conditional
variable. (But it's obviously an artificial situation—in
practice, there will almost always be a producer thread and
a consumer thread.)
In the ideal world, yes, but, in practice, not quite.
You can't predict which tread takes control first. Yes, it's likely to be thread1, but still not guaranteed. It is the first racing condition in your code.
When thread2 takes control, most likely it will finish without stopping. Regardless of how many CPUs you have. The reason is that it has no place to yield unconditionally. The fact you release mutex, doesn't mean any1 can get a lock on it. Its the second racing condition in your code.
So, the thread1 will print 11, and that is the only part guaranteed.
1) Threads are created. Control is not passed to thread1, it's system scheduler who decides which thread to execute. Both threads are active, both should receive processor time, but the order is not determined. There might be several context switches, you don't really control this.
2) Correct, thread1 comes to waiting state, thread2 continues working. Again, control is not passed explicitly.
3) Yes, thread2 notifies condition variable, so thread1 will awake and try to reacquire the mutex. Control does not go immediately to thread1.
In general you should understand that you can't really control which thread to execute; it's job os the system scheduler, who can put as many context switches as it wants.
UPD: With condition variables you can control the order of tasks execution within multithreading environment. So I think your understanding is more or less correct: thread1 is waiting on condition variable for signal from thread2. When signal is received thread1 continues execution (after it reacquires the mutex). But switching between threads - there might be many of them, 5 is just a theoretical minimum.