I wrote a simple program to test the performance of std::shared_mutex with a number of threads looping on lock_shared()/unlock_shared(). But from the results it doesn't seem to scale as more threads are added, which doesn't really make sense to me.
You may argue it's the stopFlag limiting the performance, so the second for loop is a test that only increments a local counter, and that one scales almost perfectly at first.
The results in the comments were compiled with MSVC in Release mode.
#include <atomic>
#include <chrono>
#include <iostream>
#include <shared_mutex>
#include <thread>
#include <vector>
int main()
{
const auto threadLimit = std::thread::hardware_concurrency() - 1; //for running main()
struct SharedMutexWrapper
{
std::shared_mutex mut;
void read()
{
mut.lock_shared();
mut.unlock_shared();
}
};
/*Testing shared_mutex */
for (auto i = 1; i <= threadLimit; ++i)
{
std::cerr << "Testing " << i << " threads: ";
SharedMutexWrapper test;
std::atomic<unsigned long long> count = 0;
std::atomic_bool stopFlag = false;
std::vector<std::thread> threads;
threads.reserve(i);
for (auto j = 0; j < i; ++j)
threads.emplace_back([&] {
    unsigned long long local = 0;
    while (!stopFlag) { test.read(); ++local; }
    count += local;
});
std::this_thread::sleep_for(std::chrono::seconds{ 1 });
stopFlag = true;
for (auto& thread : threads)
thread.join();
std::cerr << count << '\n';
}
/*
Testing 1 threads: 60394076
Testing 2 threads: 39703889
Testing 3 threads: 23461029
Testing 4 threads: 16961003
Testing 5 threads: 12750838
Testing 6 threads: 12227898
Testing 7 threads: 12245818
*/
for (auto i = 1; i <= threadLimit; ++i)
{
std::cerr << "Testing " << i << " threads: ";
std::atomic<unsigned long long> count = 0;
std::atomic_bool stopFlag = false;
std::vector<std::thread> threads;
threads.reserve(i);
for (auto j = 0; j < i; ++j)
threads.emplace_back([&] {
    unsigned long long local = 0;
    while (!stopFlag) ++local;
    count += local;
});
std::this_thread::sleep_for(std::chrono::seconds{ 1 });
stopFlag = true;
for (auto& thread : threads)
thread.join();
std::cerr << count << '\n';
}
/*
Testing 1 threads: 3178867276
Testing 2 threads: 6305783667
Testing 3 threads: 9388659151
Testing 4 threads: 12472666861
Testing 5 threads: 15230810694
Testing 6 threads: 18130479890
Testing 7 threads: 20151074046
*/
}
Placing a read lock on a shared mutex modifies the state of that mutex. All your threads do nothing but try to change the state of the same object, the shared mutex. So of course this code is going to scale poorly.
The point of a shared mutex is to let accesses to shared data that is not being modified scale. Your threads never access any shared data other than the mutex itself, so you aren't measuring any of the beneficial properties of a shared mutex here.
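For contrast, here is a minimal sketch (with assumed data and names, not taken from the code above) of the kind of workload a shared mutex is meant for: readers doing real work on shared data they don't modify, with an occasional exclusive writer.
#include <numeric>
#include <shared_mutex>
#include <vector>
struct SharedTable
{
    std::shared_mutex mut;
    std::vector<int> values = std::vector<int>(1024, 1);
    long long sum() // many reader threads can run this concurrently
    {
        std::shared_lock<std::shared_mutex> lock(mut); // shared ownership
        return std::accumulate(values.begin(), values.end(), 0LL);
    }
    void append(int v) // the occasional writer takes exclusive ownership
    {
        std::unique_lock<std::shared_mutex> lock(mut);
        values.push_back(v);
    }
};
With readers that spend their time inside accumulate rather than on the lock itself, adding reader threads can actually increase throughput, which is what the benchmark above never gives the mutex a chance to show.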
Related: C++ mutex doesn't work - synchronization fails
I would like to use as simple a mutex as possible.
#include <iostream>
#include <thread>
#include <vector>
#include <functional>
#include <algorithm>
#include <mutex>
using namespace std;
int sum;
static mutex m;
void addValue(int value)
{
m.lock();
sum += value;
m.unlock();
}
int main()
{
int counter1 = 0;
int counter2 = 0;
for (int i = 0; i < 100; i++)
{
thread t1(addValue, 100);
thread t2(addValue, 200);
if (sum == 300)
{
counter1++;
}
else
{
counter2++;
}
sum = 0;
t1.join();
t2.join();
}
cout << counter1 << endl;
cout << counter2 << endl;
}
Unfortunately, the above code doesn't work as expected. I expect that:
a) sum is always equal to 300
b) counter1 is always 100
c) counter2 is always 0
What is wrong?
EDIT:
When I debug the sum variable in the else condition, I see values like:
200, 400, 100, and even 0 (I assume that addition didn't even happen).
Why does everyone learning this stuff for the first time assume the tried-and-tested synchronization primitives that work for everyone else are broken, and not their assumptions?
The mutex is fine. Your mental model is broken. This should be your starting assumption.
I expect that:
sum is always equal to 300
That would be the case if you joined both threads before checking the value. But you haven't done that, so you're doing an entirely unsynchronized read of sum while two other threads are possibly mutating it. This is a data race. A mutex doesn't protect your data unless you always use the mutex when accessing the data.
Let's say we make the minimal change so sum is always protected:
thread t1(addValue, 100); // a
thread t2(addValue, 200); // b
m.lock();
if (sum == 300) // c
{
counter1++;
}
else
{
counter2++;
}
sum = 0;
m.unlock();
Now some of the available orderings are:
abc - what you expected (and what would be guaranteed if you joined both threads before reading sum)
acb - you read 100 at line c, increment counter2, and the second thread increments sum to 300 after you read it (but you never see this)
cab - you read 0 immediately, before the two threads have even been scheduled to run
bca - you read 200, it's later incremented to 300 after you checked
etc.
Every permutation is permitted unless you make some effort to explicitly order them; a sketch of the join-first version follows below.
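For completeness, a minimal sketch of that join-first version, which makes abc the only possible ordering (and sum == 300 the only possible result):
for (int i = 0; i < 100; i++)
{
    thread t1(addValue, 100);
    thread t2(addValue, 200);
    t1.join();          // both writers have finished before sum is read...
    t2.join();
    if (sum == 300)     // ...so this always sees 300
        counter1++;
    else
        counter2++;
    sum = 0;            // safe: no other thread is running at this point
}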
It is working as intended. The problem is that you didn't expect that "time" won't be the same for all three threads, and you dismissed the obvious fact that one thread starts before the other. That is a clear advantage, even more so when all it has to do is loop 100 times over an increment.
#include <atomic>
#include <iostream>
#include <thread>
#include <mutex>
std::atomic<bool> keep_alive; // shared stop flag; atomic so the workers see the update without a data race
void add_value_mutex(std::mutex * mx, int * trg, int value) {
while (keep_alive){
mx->lock();
(*trg) += value;
mx->unlock();
}
}
int main(){
std::thread thread_1;
std::thread thread_2;
int count_targ = 2000;
int * counter_1 = new int(0);
int * counter_2 = new int(0);
/* --- */
std::mutex mx_1;
std::mutex mx_2;
keep_alive = true;
thread_1 = std::thread(add_value_mutex, &mx_1, counter_1, 1);
thread_2 = std::thread(add_value_mutex, &mx_2, counter_2, 1);
while(1){
if (mx_1.try_lock()){
if (count_targ <= * counter_1){
mx_1.unlock();
break;
}
mx_1.unlock();
}
if (mx_2.try_lock()){
if (count_targ <= * counter_2){
mx_2.unlock();
break;
}
mx_2.unlock();
}
}
keep_alive = false;
thread_1.join();
thread_2.join();
std::cout << "Thread 1 (independent mutex) -> " << * counter_1 << std::endl;
std::cout << "Thread 2 (independent mutex) -> " << * counter_2 << std::endl;
/* --- */
keep_alive = true;
(*counter_1) = 0;
(*counter_2) = 0;
std::mutex mx_s;
thread_1 = std::thread(add_value_mutex, &mx_s, counter_1, 1);
thread_2 = std::thread(add_value_mutex, &mx_s, counter_2, 1);
while(1){
if (mx_s.try_lock()){
if (count_targ <= * counter_1 || count_targ <= * counter_2){
mx_s.unlock();
break;
}
mx_s.unlock();
}
}
std::cout << "Thread 1 (shared mutex) -> " << * counter_1 << std::endl;
std::cout << "Thread 2 (shared mutex) -> " << * counter_2 << std::endl;
keep_alive = false;
thread_1.join();
thread_2.join();
delete counter_1;
delete counter_2;
return 0;
}
If you want another example of mine that measures the time a thread spends waiting, check this one.
I have the following code; I need a random number in a given interval. It seems to work the way I need.
std::default_random_engine eng;
std::uniform_int_distribution<int> dist(3, 7);
int timeout = dist(eng);
But then I run it in several threads, repeated in a loop.
std::default_random_engine defRandEng(std::this_thread::get_id());
std::uniform_int_distribution<int> dist(3, 7);
int timeout; // if I put timeout = dist(defRandEng); here it's all the same
while (true)
{
timeout = dist(defRandEng);
std::cout<<"Thread "<<std::this_thread::get_id()<<" timeout = "<<timeout<<std::endl;
std::this_thread::sleep_for(std::chrono::seconds(timeout));
}
But on every iteration the values are the same in all threads:
Thread 139779167999744 timeout = 6
Thread 139779134428928 timeout = 6
Thread 139779067287296 timeout = 6
Thread 139779117643520 timeout = 6
Thread 139779100858112 timeout = 6
Thread 139779084072704 timeout = 6
Thread 139779151214336 timeout = 6
Thread 139779050501888 timeout = 6
Thread 139779033716480 timeout = 6
next iteration
Thread 139779167999744 timeout = 4
Thread 139779151214336 timeout = 4
Thread 139779134428928 timeout = 4
Thread 139779117643520 timeout = 4
Thread 139779100858112 timeout = 4
Thread 139779084072704 timeout = 4
Thread 139779067287296 timeout = 4
Thread 139779050501888 timeout = 4
Thread 139779033716480 timeout = 4
You need to provide each thread's engine with its own seed, based on some value that differs per thread; otherwise every engine produces the same sequence. The example below, adapted from your code snippets, works fine with 3 threads:
#include <chrono>
#include <iostream>
#include <mutex>
#include <random>
#include <sstream>
#include <string>
#include <thread>
std::mutex lock;
void sample_time_out()
{
std::stringstream ss;
ss << std::this_thread::get_id();
uint64_t thread_id = std::stoull(ss.str());
std::default_random_engine eng(thread_id);
std::uniform_int_distribution<int> dis(3, 7);
for (int i = 0; i < 3; i++)
{
auto timeout = dis(eng);
std::this_thread::sleep_for(std::chrono::seconds(timeout));
{
std::unique_lock<std::mutex> lock1(lock);
std::cout << "Thread " << std::this_thread::get_id() << " timeout = " << timeout << std::endl;
}
}
}
int main()
{
std::thread t1(sample_time_out);
std::thread t2(sample_time_out);
std::thread t3(sample_time_out);
t1.join();
t2.join();
t3.join();
return 0;
}
And the output of my first run is:
Thread 31420 timeout = 3
Thread 18616 timeout = 6
Thread 31556 timeout = 7
Thread 31420 timeout = 4
Thread 18616 timeout = 7
Thread 31420 timeout = 6
Thread 31556 timeout = 7
Thread 18616 timeout = 4
Thread 31556 timeout = 7
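A variant, as a minimal sketch, that seeds each thread's engine from std::random_device instead of the thread id (the function name here is illustrative, not part of the code above):
#include <random>
void sample_time_out_rd()
{
    // Each thread builds its own engine with its own nondeterministic seed,
    // so the threads no longer share one identically seeded sequence.
    std::random_device rd;
    std::default_random_engine eng(rd());
    std::uniform_int_distribution<int> dis(3, 7);
    int timeout = dis(eng);
    // ... sleep and print as in the example above ...
}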
I am trying to implement strict alternation between two threads, but I am not sure how to declare the critical region and non-critical region. Here is the code that I have:
#include <iostream>
#include <pthread.h>
int count;
int turn = 0; // Shared variable used to implement strict alternation
void* myFunction(void* arg)
{
int actual_arg = *((int*) arg);
for(unsigned int i = 0; i < 10; ++i) {
while(1)
{
while(turn != 0)
{
critical_region_0();
turn = 1;
non_critical_region_0();
}
}
// Beginning of the critical region
count++;
std::cout << "Thread #" << actual_arg << " count = " << count <<
std::endl;
// End of the critical region
while(0)
{
while(turn != 1)
{
critical_region_1();
turn = 0;
non_critical_region_1();
}
}
}
pthread_exit(NULL);
}
int main()
{
int rc[2];
pthread_t ids[2];
int args[2];
count = 0;
for(unsigned int i = 0; i < 2; ++i) {
args[i] = i;
rc[i] = pthread_create(&ids[i], NULL, myFunction, (void*) &args[i]);
}
for(unsigned int i = 0; i < 2; ++i) {
pthread_join(ids[i], NULL);
}
std::cout << "Final count = " << count << std::endl;
pthread_exit(NULL);
}
I know that the critical and non-critical regions are written as if they were methods, but I am using those as placeholders. Is there a way to implement strict alternation without the use of these methods?
Here is what the output should look like.
Thread #0 count = 1
Thread #1 count = 2
Thread #0 count = 3
Thread #1 count = 4
Thread #0 count = 5
Thread #1 count = 6
Thread #0 count = 7
Thread #1 count = 8
Thread #0 count = 9
Thread #1 count = 10
Thread #0 count = 11
Thread #1 count = 12
Thread #0 count = 13
Thread #1 count = 14
Thread #0 count = 15
Thread #1 count = 16
Thread #0 count = 17
Thread #1 count = 18
Thread #0 count = 19
Thread #1 count = 20
Final count = 20
The only output I can manage to get is all of thread 1 first, then all of thread 0.
I think this is a classic place for condition variables. The function controlling each thread looks like this (thread 1, for example):
while( ... ) {
    ...
    pthread_cond_signal(&thread1_done_work);
    pthread_cond_wait(&thread2_done_work, &m); // pthread_cond_wait also takes the mutex protecting the shared state
}
where both the work variables are globals of type pthread_cond_t - I think it's more readable with two, but you don't have to use two (a mutex implementation comes to mind).
Thread 2 needs a wait condition as soon as it starts. Here are some details:
https://linux.die.net/man/3/pthread_cond_wait
https://linux.die.net/man/3/pthread_cond_signal
Basically each thread signals that it is done, then waits for the other thread, so they're "talking": my turn, your turn, and so on. If you insist on using the same function for both threads, you can pass the condition variables as arguments (swapped for thread 2 relative to thread 1).
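A minimal sketch of that idea, collapsing the two condition variables into one plus the existing turn flag and reusing count and main() from the question (variable names are illustrative, error handling omitted):
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
int turn = 0; // whose critical region runs next: 0 or 1
void* myFunction(void* arg)
{
    int my_id = *((int*) arg); // 0 or 1, as passed from main()
    for (unsigned int i = 0; i < 10; ++i) {
        pthread_mutex_lock(&m);
        while (turn != my_id)           // not my turn yet: sleep until signalled
            pthread_cond_wait(&cv, &m);
        // critical region
        count++;
        std::cout << "Thread #" << my_id << " count = " << count << std::endl;
        turn = 1 - my_id;               // hand the turn to the other thread
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&m);
    }
    pthread_exit(NULL);
}
Since turn starts at 0, thread #0 enters first and the two threads then alternate strictly, producing the desired output.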
One final small note: strict alternation like this somewhat contradicts the whole purpose of threading.
I am working on a proof-of-concept test program for a game where certain actions are threaded and information is output to the command window for each thread. So far I have gotten the basic threading process to work, but the cout output in my called function is not being written separately for each thread; instead each thread overwrites the others' output.
The desired or expected output is that each thread will output the information printed via cout within the mCycle function of mLaser. Essentially this is meant to be a timer of sorts for each object, counting down the time until that object has completed its task. There should be an output for each thread, so if five threads are running there should be five counters counting down independently.
The current output is such that each thread writes its own information into the same space, which then overwrites what another thread is attempting to output.
Here is an example of the current output of the program:
Time until cycle Time until cycle 74 is complete: 36 is complete:
92 seconds 2 seconds ress any key to continue . . .
You can see the aberrations where numbers and other text are in places they should not be, if you examine how the information is printed from mCycle.
What should be displayed is more along these lines:
Time until cycle 1 is complete:
92 seconds
Time until cycle 2 is complete:
112 seconds
Time until cycle 3 is complete:
34 seconds
Cycle 4 has completed!
I am not sure if this is due to some kind of thread locking caused by how my code is structured, or just an oversight in my coding of the output. If a fresh pair of eyes could look over the code and point out anything that could be at fault, I would appreciate it.
Here is my code; it should compile in any MSVS 2013 install (no custom libraries used):
#include <iostream>
#include <Windows.h>
#include <string>
#include <vector>
#include <random>
#include <thread>
#include <future>
using namespace std;
class mLaser
{
public:
mLaser(int clen, float mamt)
{
mlCLen = clen;
mlMAmt = mamt;
}
int getCLen()
{
return mlCLen;
}
float getMAmt()
{
return mlMAmt;
}
void mCycle(int i1, int mCLength)
{
bool bMCycle = true;
int mCTime_left = mCLength * 1000;
int mCTime_start = GetTickCount(); //Get cycle start time
int mCTime_old = ((mCTime_start + 500) / 1000);
cout << "Time until cycle " << i1 << " is complete: " << endl;
while (bMCycle)
{
cout << ((mCTime_left + 500) / 1000) << " seconds";
bool bNChange = true;
while (bNChange)
{
//cout << ".";
int mCTime_new = GetTickCount();
if (mCTime_old != ((mCTime_new + 500) / 1000))
{
//cout << mCTime_old << " " << ((mCTime_new+500)/1000) << endl;
mCTime_old = ((mCTime_new + 500) / 1000);
mCTime_left -= 1000;
bNChange = false;
}
}
cout << " \r" << flush;
if (mCTime_left == 0)
{
bMCycle = false;
}
}
cout << "Mining Cycle " << i1 << " finished" << endl;
system("Pause");
}
private:
int mlCLen;
float mlMAmt;
};
string sMCycle(mLaser ml, int i1, thread& thread);
int main()
{
vector<mLaser> mlasers;
vector<thread> mthreads;
future<string> futr;
random_device rd;
mt19937 gen(rd());
uniform_int_distribution<> laser(1, 3);
uniform_int_distribution<> cLRand(30, 90);
uniform_real_distribution<float> mARand(34.0f, 154.3f);
int lasers;
int cycle_time;
float mining_amount;
lasers = laser(gen);
for (int i = 0; i < lasers-1; i++)
{
mlasers.push_back(mLaser(cLRand(gen), mARand(gen)));
mthreads.push_back(thread());
}
for (int i = 0; i < mlasers.size(); i++)
{
futr = async(launch::async, [mlasers, i, &mthreads]{return sMCycle(mlasers.at(i), i + 1, mthreads.at(i)); });
//mthreads.at(i) = thread(bind(&mLaser::mCycle, ref(mlasers.at(i)), mlasers.at(i).getCLen(), mlasers.at(i).getMAmt()));
}
for (int i = 0; i < mthreads.size(); i++)
{
//mthreads.at(i).join();
}
//string temp = futr.get();
//float out = strtof(temp.c_str(),NULL);
//cout << out << endl;
system("Pause");
return 0;
}
string sMCycle(mLaser ml, int i1, thread& t1)
{
t1 = thread(bind(&mLaser::mCycle, ref(ml), ml.getCLen(), ml.getMAmt()));
//t1.join();
return "122.0";
}
Writing to std::cout concurrently from multiple threads is required to be free of data races, but there is no guarantee that the output won't be interleaved. I'm not sure whether a single write operation from one thread can be interleaved with a single write operation from another thread, but separate write operations from different threads can certainly be interleaved with each other.
What the standard has to say about concurrent access to the standard stream objects (i.e. std::cout, std::cin, etc.) is in 27.4.1 [iostream.objects.overview] paragraph 4:
Concurrent access to a synchronized (27.5.3.4) standard iostream object’s formatted and unformatted input (27.7.2.1) and output (27.7.3.1) functions or a standard C stream by multiple threads shall not result in a data race (1.10). [ Note: Users must still synchronize concurrent use of these objects and streams by multiple threads if they wish to avoid interleaved characters. —end note ]
If you want to have output appear in some sort of unit, you will need to synchronize access to std::cout, e.g., by using a mutex.
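For instance, a minimal sketch of that synchronization (the mutex and the helper function are illustrative, not part of the program above):
#include <iostream>
#include <mutex>
std::mutex cout_mutex; // guards all writes to std::cout
void print_cycle(int id, int seconds_left)
{
    std::lock_guard<std::mutex> lock(cout_mutex);
    // The whole message is written while holding the lock, so output from
    // other threads cannot be interleaved inside it.
    std::cout << "Time until cycle " << id << " is complete:\n"
              << seconds_left << " seconds" << std::endl;
}
Every thread that writes to std::cout has to go through the same mutex for this to help.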
While Dietmar's answer is sufficient, I decided to go a different, much simpler route. Since I am creating instances of a class and accessing those instances in the threads, I chose to update each object's data during the threading and then read the updated data once the threads have finished executing.
This way I do not have to deal with annoying problems like data races, or with grabbing output from async in a vector of shared_future. Here is my revised code in case anyone else would like to implement something similar:
#include <iostream>
#include <Windows.h>
#include <string>
#include <vector>
#include <random>
#include <thread>
#include <future>
using namespace std; //Tacky, but good enough for a PoC D:
class mLaser
{
public:
mLaser(int clen, float mamt, int time_left)
{
mlCLen = clen;
mlMAmt = mamt;
mCTime_left = time_left;
bIsCompleted = false;
}
int getCLen()
{
return mlCLen;
}
float getMAmt()
{
return mlMAmt;
}
void setMCOld(int old)
{
mCTime_old = old;
}
void mCycle()
{
if (!bIsCompleted)
{
int mCTime_new = GetTickCount(); //Get current tick count for comparison to mCOld_time
if (mCTime_old != ((mCTime_new + 500) / 1000)) //Do calculations to see if time has passed since mCTime_old was set
{
//If it has then update mCTime_old and remove one second from mCTime_left.
mCTime_old = ((mCTime_new + 500) / 1000);
mCTime_left -= 1000;
}
cur_time = mCTime_left;
}
else
{
mCTime_left = 0;
}
}
int getCTime()
{
return cur_time;
}
int getCTLeft()
{
return mCTime_left;
}
void mCComp()
{
bIsCompleted = true;
}
bool getCompleted()
{
return bIsCompleted;
}
private:
int mlCLen; //Time of a complete mining cycle
float mlMAmt; //Amoung of ore produced by one mining cycle (not used yet)
int cur_time; //The current time remaining in the current mining cycle; will be removing this as it is just a copy of mCTime_left that I was going to use for another possibility to make this code work
int mCTime_left; //The current time remaining in the current mining cycle
int mCTime_old; //The last time that mCycle was called
bool bIsCompleted; //Flag to check if a mining cycle has already been accounted for as completed
};
void sMCycle(mLaser& ml, int i1, thread& _thread); //Start a mining cycle thread
//Some global defines
random_device rd;
mt19937 gen(rd());
uniform_int_distribution<> laser(1, 10); //A random range for the number of mlaser entities to use
uniform_int_distribution<> cLRand(30, 90); //A random time range in seconds of mining cycle lengths
uniform_real_distribution<float> mARand(34.0f, 154.3f); //A random float range of the amount of ore produced by one mining cycle (not used yet)
int main()
{
//Init some variables for later use
vector<mLaser> mlasers; //Vector to hold mlaser objects
vector<thread> mthreads; //Vector to hold threads
vector<shared_future<int>> futr; //Vector to hold shared_futures (not used yet, might not be used if I can get the code working like this)
int lasers; //Number of lasers to create
int cycle_time; //Mining cycle time
int active_miners = 0; //Number of active mining cycle threads (one for each laser)
float mining_amount; //Amount of ore produced by one mining cycle (not used yet)
lasers = laser(gen); //Get a random number
active_miners = lasers; //Set this to that random number for the while loop later on
//Create the mlaser objects and push them into the mlasers vector
for (int i = 0; i < lasers; i++)
{
int clength = cLRand(gen);
mlasers.push_back(mLaser(clength, mARand(gen), (clength * 1000)));
//Also push thread obects into mthreads for each laser object
mthreads.push_back(thread());
}
//Setup data for mining cycles
for (int i = 0; i < mlasers.size(); i++)
{
int mCTime_start = GetTickCount(); //Get cycle start time
mlasers.at(i).setMCOld(((mCTime_start + 500) / 1000));
}
//Print initial display for mining cycles
for (int i = 0; i < mlasers.size(); i++)
{
cout << "Mining Laser " << i + 1 << " cycle will complete in " << (mlasers.at(i).getCTLeft() + 500) / 1000 << " seconds..." << endl;
}
while (active_miners > 0)
{
for (int i = 0; i < mlasers.size(); i++)
{
//futr.push_back(async(launch::async, [mlasers, i, &mthreads]{return sMCycle(mlasers.at(i), i + 1, mthreads.at(i)); }));
async(launch::async, [&mlasers, i, &mthreads]{return sMCycle(mlasers.at(i), i + 1, mthreads.at(i)); }); //Launch a thread for the current mlaser object
//mthreads.at(i) = thread(bind(&mLaser::mCycle, ref(mlasers.at(i)), mlasers.at(i).getCLen(), mlasers.at(i).getMAmt()));
}
//Output information from loops
//cout << " \r" << flush; //Return cursor to start of line and flush the buffer for the next info
system("CLS");
for (int i = 0; i < mlasers.size(); i++)
{
if (mlasers.at(i).getCTLeft() != 0) //If mining cycle is not completed
{
cout << "Mining Laser " << i + 1 << " cycle will complete in " << (mlasers.at(i).getCTLeft() + 500) / 1000 << " seconds..." << endl;
}
else if (mlasers.at(i).getCTLeft() == 0) //If it is completed
{
if (!mlasers.at(i).getCompleted())
{
mlasers.at(i).mCComp();
active_miners -= 1;
}
cout << "Mining Laser " << i + 1 << " has completed its mining cycle!" << endl;
}
}
}
/*for (int i = 0; i < mthreads.size(); i++)
{
mthreads.at(i).join();
}*/
//string temp = futr.get();
//float out = strtof(temp.c_str(),NULL);
//cout << out << endl;
system("Pause");
return 0;
}
void sMCycle(mLaser& ml, int i1,thread& _thread)
{
//Start thread
_thread = thread(bind(&mLaser::mCycle, ref(ml)));
//Join the thread
_thread.join();
}
I am trying to learn the Boost library and was going through examples of boost::thread.
The example below illustrates the usage of the boost::lock_guard for thread synchronization, ensuring that the access to std::cout is not concurrent:
#include <boost/thread.hpp>
#include <boost/format.hpp>
#include <iostream>
void wait(const int secs) {
boost::this_thread::sleep(boost::posix_time::seconds(secs));
}
boost::mutex mutex;
void thread1() {
for (int i = 0; i < 10; ++i) {
wait(1); // <-- all works fine if wait is placed here
boost::lock_guard<boost::mutex> lock(mutex);
std::cout << boost::format("thread A here %d\n") % i ;
}
}
void thread2() {
for (int i = 0; i < 10; ++i) {
wait(1); // <-- all works fine if wait is placed here
boost::lock_guard<boost::mutex> lock(mutex);
std::cout << boost::format("thread B here %d\n") % i;
}
}
int main() {
boost::thread t1(thread1);
boost::thread t2(thread2);
t1.join();
t2.join();
}
The results were pretty much what one would expect, i.e. alternating messages printed by the two threads:
thread A here 0
thread B here 0
thread A here 1
thread B here 1
thread A here 2
thread B here 2
thread A here 3
thread B here 3
thread A here 4
thread B here 4
...
However, a small modification -- moving the wait call inside the scope of the lock guard -- led to a surprise:
void thread1() {
for (int i = 0; i < 10; ++i) {
boost::lock_guard<boost::mutex> lock(mutex);
wait(1); // <== !
std::cout << boost::format("thread A here %d\n") % i ;
}
}
void thread2() {
for (int i = 0; i < 10; ++i) {
boost::lock_guard<boost::mutex> lock(mutex);
wait(1); // <== !
std::cout << boost::format("thread B here %d\n") % i;
}
Now either thread1 or thread2 wins the initial "race" for the mutex and then wins again and again on each loop iteration, thereby starving the other thread!
Example output:
thread B here 0
thread B here 1
thread B here 2
thread B here 3
thread B here 4
thread B here 5
thread B here 6
thread B here 7
thread B here 8
thread B here 9
thread A here 0
thread A here 1
thread A here 2
thread A here 3
thread A here 4
thread A here 5
thread A here 6
thread A here 7
thread A here 8
thread A here 9
Can anybody please explain why this is the case?
This is because, once a thread has acquired the lock, the wait call gives the other thread a chance to run; since that thread cannot acquire the lock, it goes into a wait state until the lock becomes available. The lock is only released for an instant at the end of each iteration, and the running thread reacquires it before the sleeping thread has a chance to wake up and grab it, so in practice the lock does not become available to the other thread until the first thread completes its loop.
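The usual remedy, as a minimal sketch, is to never sleep while holding the mutex and to keep the critical section as short as possible, which is essentially what the first version above already does:
void thread1() {
    for (int i = 0; i < 10; ++i) {
        wait(1); // sleep with the mutex *not* held
        {
            boost::lock_guard<boost::mutex> lock(mutex);
            std::cout << boost::format("thread A here %d\n") % i;
        } // lock released right after printing
    }
}
Note that mutexes make no fairness guarantee at all; if strict alternation between the two threads is actually required, a condition variable is needed rather than a bare mutex.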