I am trying to learn the Boost libraries and was going through examples of boost::thread.
The example below illustrates the use of boost::lock_guard for thread synchronization, which ensures that std::cout is never accessed by both threads at once:
#include <boost/thread.hpp>
#include <boost/format.hpp>
#include <iostream>
void wait(const int secs) {
    boost::this_thread::sleep(boost::posix_time::seconds(secs));
}
boost::mutex mutex;
void thread1() {
    for (int i = 0; i < 10; ++i) {
        wait(1); // <-- all works fine if wait is placed here
        boost::lock_guard<boost::mutex> lock(mutex);
        std::cout << boost::format("thread A here %d\n") % i ;
    }
}
void thread2() {
    for (int i = 0; i < 10; ++i) {
        wait(1); // <-- all works fine if wait is placed here
        boost::lock_guard<boost::mutex> lock(mutex);
        std::cout << boost::format("thread B here %d\n") % i;
    }
}
int main() {
    boost::thread t1(thread1);
    boost::thread t2(thread2);
    t1.join();
    t2.join();
}
The results were pretty much what one would expect, i.e. alternating messages printed by the two threads:
thread A here 0
thread B here 0
thread A here 1
thread B here 1
thread A here 2
thread B here 2
thread A here 3
thread B here 3
thread A here 4
thread B here 4
...
However, a small modification (moving the wait call inside the scope of the lock guard) led to a surprise:
void thread1() {
    for (int i = 0; i < 10; ++i) {
        boost::lock_guard<boost::mutex> lock(mutex);
        wait(1); // <== !
        std::cout << boost::format("thread A here %d\n") % i ;
    }
}
void thread2() {
    for (int i = 0; i < 10; ++i) {
        boost::lock_guard<boost::mutex> lock(mutex);
        wait(1); // <== !
        std::cout << boost::format("thread B here %d\n") % i;
    }
}
Now either thread1 or thread2 wins the initial "race" for the mutex and then wins it again on every loop iteration, starving the other thread!
Example output:
thread B here 0
thread B here 1
thread B here 2
thread B here 3
thread B here 4
thread B here 5
thread B here 6
thread B here 7
thread B here 8
thread B here 9
thread A here 0
thread A here 1
thread A here 2
thread A here 3
thread A here 4
thread A here 5
thread A here 6
thread A here 7
thread A here 8
thread A here 9
Can anybody please explain why this is the case?
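This is because, after the lock is acquired, the wait call puts the first thread to sleep and lets the second thread run. Since the second thread cannot acquire the lock, it blocks until the lock becomes available. The lock is then released only for the brief moment between the end of one loop iteration and the start of the next, and the first thread, which is already running, re-acquires it before the blocked thread has even been woken up and scheduled. boost::mutex (like std::mutex) makes no fairness guarantee, so in practice the second thread only gets the lock once the first thread has finished its loop.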
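If you actually want the threads to take turns while the sleep stays inside the critical section, you need a lock that hands itself over in FIFO order. Below is a minimal sketch of a ticket lock, written with standard C++11 facilities instead of Boost; TicketLock and run_workers are hypothetical names, not Boost or standard APIs. Each lock() call takes a numbered ticket and waits until that number is being served, so a thread that releases the lock and immediately asks for it again queues behind the thread that was already waiting.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical FIFO ("ticket") lock: callers are served strictly in the order
// in which they called lock(). std::mutex and boost::mutex make no such promise.
class TicketLock {
public:
    void lock() {
        const unsigned my_ticket = next_ticket_.fetch_add(1);  // take a number
        while (now_serving_.load() != my_ticket)               // wait to be called
            std::this_thread::yield();
    }
    void unlock() { now_serving_.fetch_add(1); }                // call the next number

private:
    std::atomic<unsigned> next_ticket_{0};
    std::atomic<unsigned> now_serving_{0};
};

// Two workers ('A' and 'B') each enter the critical section `iterations`
// times and sleep inside it, like the modified example in the question.
// Returns the order in which the critical sections ran.
std::string run_workers(int iterations, std::chrono::milliseconds hold) {
    TicketLock lock;
    std::string order;  // only modified while `lock` is held
    auto worker = [&](char name) {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<TicketLock> guard(lock);  // TicketLock is BasicLockable
            std::this_thread::sleep_for(hold);
            order += name;
        }
    };
    std::thread a(worker, 'A');
    std::thread b(worker, 'B');
    a.join();
    b.join();
    return order;
}
```

Called as run_workers(5, std::chrono::milliseconds(20)), it should in practice produce an alternating string such as "ABABABABAB", because each thread sleeps long enough for the other one to take a ticket; the order is not guaranteed if the other thread has not yet reached lock(). Spinning with yield() is only reasonable for short waits; for long ones a condition variable is the better tool.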
Related
I would like to use as simple a mutex as possible.
#include <iostream>
#include <thread>
#include <vector>
#include <functional>
#include <algorithm>
#include <mutex>
using namespace std;
int sum;
static mutex m;
void addValue(int value)
{
    m.lock();
    sum += value;
    m.unlock();
}
int main()
{
    int counter1 = 0;
    int counter2 = 0;
    for (int i = 0; i < 100; i++)
    {
        thread t1(addValue, 100);
        thread t2(addValue, 200);
        if (sum == 300)
        {
            counter1++;
        }
        else
        {
            counter2++;
        }
        sum = 0;
        t1.join();
        t2.join();
    }
    cout << counter1 << endl;
    cout << counter2 << endl;
}
Unfortunately, the code above doesn't work as expected. I expect that:
a) sum is always equal to 300
b) counter1 is always 100
c) counter2 is always 0
What is wrong?
EDIT:
When I inspect the sum variable in the else branch with the debugger, I see values like 200, 400, 100, and even 0 (I assume the addition didn't happen at all).
C++ mutex doesn't work - synchronization fails
Why does everyone learning this stuff for the first time assume the tried-and-tested synchronization primitives that work for everyone else are broken, and not their assumptions?
The mutex is fine. Your mental model is broken. This should be your starting assumption.
I expect that:
sum is always equal to 300
That would be the case if you joined both threads before checking the value. But you haven't done that, so you're doing an entirely unsynchronized read of sum while two other threads are possibly mutating it. This is a data race. A mutex doesn't protect your data unless you always use the mutex when accessing the data.
Let's say we make the minimal change so sum is always protected:
thread t1(addValue, 100); // a
thread t2(addValue, 200); // b
m.lock();
if (sum == 300) // c
{
    counter1++;
}
else
{
    counter2++;
}
sum = 0;
m.unlock();
Now some of the possible orderings are:
abc - what you expected (and what would be guaranteed if you joined both threads before reading sum)
acb - you read 100 at line c, increment counter2, and the second thread increments sum to 300 after you read it (but you never see this)
cab - you read 0 immediately, before the two threads have even been scheduled to run
bca - you read 200, it's later incremented to 300 after you checked
etc.
Every permutation is permitted, unless you make some effort to explicitly order them.
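For completeness, here is a minimal sketch of the question's loop with the read moved after both join() calls, which makes the "abc" ordering mandatory. It also swaps the manual lock()/unlock() pair for std::lock_guard; count_successes is a hypothetical helper introduced only for this sketch.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

int sum = 0;
std::mutex m;

void addValue(int value) {
    std::lock_guard<std::mutex> guard(m);  // unlocks automatically at scope exit
    sum += value;
}

// Runs `rounds` trials and returns how many of them observed sum == 300
// (counter1 in the question).
int count_successes(int rounds) {
    int successes = 0;
    for (int i = 0; i < rounds; ++i) {
        std::thread t1(addValue, 100);
        std::thread t2(addValue, 200);
        t1.join();  // after both joins, everything t1 and t2 did is visible here
        t2.join();
        if (sum == 300) ++successes;  // safe: no other thread is running now
        sum = 0;
    }
    return successes;
}
```

count_successes(100) should return 100 on every run. The mutex keeps t1 and t2 from clobbering each other; join() is what makes the final read safe.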
It is working as intended. The problem is that you didn't expect that "time" is not the same for all three threads, and you dismissed the obvious fact that one thread starts before the other. That clearly gives it an advantage, all the more so when all it has to do is loop 100 times doing an increment.
#include <atomic>
#include <iostream>
#include <thread>
#include <mutex>
// Must be atomic: main() writes it while the worker threads read it.
std::atomic<bool> keep_alive{false};
// Repeatedly adds `value` to *trg under *mx until keep_alive is cleared.
void add_value_mutex(std::mutex * mx, int * trg, int value) {
    while (keep_alive) {
        mx->lock();
        (*trg) += value;
        mx->unlock();
    }
}
int main() {
    std::thread thread_1;
    std::thread thread_2;
    int count_targ = 2000;
    int * counter_1 = new int(0);
    int * counter_2 = new int(0);
    /* --- each thread has its own mutex --- */
    std::mutex mx_1;
    std::mutex mx_2;
    keep_alive = true;
    thread_1 = std::thread(add_value_mutex, &mx_1, counter_1, 1);
    thread_2 = std::thread(add_value_mutex, &mx_2, counter_2, 1);
    // Stop as soon as either counter reaches the target.
    while (1) {
        if (mx_1.try_lock()) {
            if (count_targ <= * counter_1) {
                mx_1.unlock();
                break;
            }
            mx_1.unlock();
        }
        if (mx_2.try_lock()) {
            if (count_targ <= * counter_2) {
                mx_2.unlock();
                break;
            }
            mx_2.unlock();
        }
    }
    keep_alive = false;
    thread_1.join();
    thread_2.join();
    std::cout << "Thread 1 (independent mutex) -> " << * counter_1 << std::endl;
    std::cout << "Thread 2 (independent mutex) -> " << * counter_2 << std::endl;
    /* --- both threads share one mutex --- */
    keep_alive = true;
    (*counter_1) = 0;
    (*counter_2) = 0;
    std::mutex mx_s;
    thread_1 = std::thread(add_value_mutex, &mx_s, counter_1, 1);
    thread_2 = std::thread(add_value_mutex, &mx_s, counter_2, 1);
    while (1) {
        if (mx_s.try_lock()) {
            if (count_targ <= * counter_1 || count_targ <= * counter_2) {
                mx_s.unlock();
                break;
            }
            mx_s.unlock();
        }
    }
    keep_alive = false;
    thread_1.join();
    thread_2.join();
    // Print only after join(), when no worker can still be writing the counters.
    std::cout << "Thread 1 (shared mutex) -> " << * counter_1 << std::endl;
    std::cout << "Thread 2 (shared mutex) -> " << * counter_2 << std::endl;
    delete counter_1;
    delete counter_2;
    return 0;
}
If you want another example of mine, one that measures how long a thread spends waiting, check this one.
I wrote a simple program to test the performance of std::shared_mutex across a number of threads that call lock_shared() in a loop. But the results show that it doesn't seem to scale as more threads are added, which doesn't really make sense to me.
You may argue that the stopFlag is what limits performance, so the second for loop is a control test that only increments a local counter; it scales almost perfectly, at least at the beginning.
The results in the comments were obtained with MSVC in a Release build.
#include <atomic>
#include <chrono>
#include <iostream>
#include <shared_mutex>
#include <thread>
#include <vector>
int main()
{
    const auto threadLimit = std::thread::hardware_concurrency() - 1; //for running main()
    struct SharedMutexWrapper
    {
        std::shared_mutex mut;
        void read()
        {
            mut.lock_shared();
            mut.unlock_shared();
        }
    };
    /*Testing shared_mutex */
    for (auto i = 1; i <= threadLimit; ++i)
    {
        std::cerr << "Testing " << i << " threads: ";
        SharedMutexWrapper test;
        std::atomic<unsigned long long> count = 0;
        std::atomic_bool stopFlag = false;
        std::vector<std::thread> threads;
        threads.reserve(i);
        for (auto j = 0; j < i; ++j)
            threads.emplace_back([&] {unsigned long long local = 0; while (!stopFlag) { test.read(); ++local; } count += local; });
        std::this_thread::sleep_for(std::chrono::seconds{ 1 });
        stopFlag = true;
        for (auto& thread : threads)
            thread.join();
        std::cerr << count << '\n';
    }
    /*
    Testing 1 threads: 60394076
    Testing 2 threads: 39703889
    Testing 3 threads: 23461029
    Testing 4 threads: 16961003
    Testing 5 threads: 12750838
    Testing 6 threads: 12227898
    Testing 7 threads: 12245818
    */
    for (auto i = 1; i <= threadLimit; ++i)
    {
        std::cerr << "Testing " << i << " threads: ";
        std::atomic<unsigned long long> count = 0;
        std::atomic_bool stopFlag = false;
        std::vector<std::thread> threads;
        threads.reserve(i);
        for (auto j = 0; j < i; ++j)
            threads.emplace_back([&] {unsigned long long local = 0; while (!stopFlag) ++local; count += local; });
        std::this_thread::sleep_for(std::chrono::seconds{ 1 });
        stopFlag = true;
        for (auto& thread : threads)
            thread.join();
        std::cerr << count << '\n';
    }
    /*
    Testing 1 threads: 3178867276
    Testing 2 threads: 6305783667
    Testing 3 threads: 9388659151
    Testing 4 threads: 12472666861
    Testing 5 threads: 15230810694
    Testing 6 threads: 18130479890
    Testing 7 threads: 20151074046
    */
}
Placing a read lock on a shared mutex modifies the state of that mutex: lock_shared() and unlock_shared() typically perform atomic read-modify-write operations on a reader count stored inside the mutex. All your threads do nothing but try to change the state of the same object, the shared mutex, so the cache line holding it keeps bouncing between cores. Of course this code is going to scale poorly.
The point of a shared mutex is to let accesses to shared data that is not being modified scale. You have no accesses to shared, unmodified data at all, so you aren't measuring any of the beneficial properties of a shared mutex here.
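To see the property a shared mutex does provide, here is a minimal sketch (readers_overlap is a hypothetical name) in which several readers hold a std::shared_lock at the same time. Each reader refuses to release its shared lock until all n readers are inside, then sums a read-only vector. With a plain std::mutex this would deadlock, because only one thread could ever be inside; with std::shared_mutex it completes, since no writer is waiting.

```cpp
#include <atomic>
#include <cassert>
#include <shared_mutex>
#include <thread>
#include <vector>

// Returns true once `n` readers have all held the shared lock simultaneously
// and each has read the shared data correctly.
bool readers_overlap(int n) {
    std::shared_mutex mut;
    const std::vector<int> data(1000, 1);  // read-only shared data
    std::atomic<int> inside{0};
    std::atomic<long long> total{0};

    std::vector<std::thread> readers;
    for (int i = 0; i < n; ++i) {
        readers.emplace_back([&] {
            std::shared_lock<std::shared_mutex> lock(mut);  // shared (read) lock
            ++inside;
            while (inside.load() < n)  // wait until every reader holds the lock
                std::this_thread::yield();
            long long local = 0;
            for (int v : data) local += v;  // the read-only work the lock protects
            total += local;
        });
    }
    for (auto& t : readers) t.join();
    return total.load() == 1000LL * n;
}
```

In a real benchmark, the benefit shows up only when the read-side critical section does enough work on the shared data to outweigh the atomic update on the mutex itself.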
I have the following code; I need a random number in a given interval. It seems to work as I need:
std::default_random_engine eng;
std::uniform_int_distribution<int> dist(3, 7);
int timeout = dist(eng);
But then I run it in several threads, repeatedly in a loop:
std::default_random_engine defRandEng(std::this_thread::get_id());
std::uniform_int_distribution<int> dist(3, 7);
int timeout; // if I put timeout = dist(defRandEng); here it's all the same
while (true)
{
    timeout = dist(defRandEng);
    std::cout<<"Thread "<<std::this_thread::get_id()<<" timeout = "<<timeout<<std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(timeout));
}
But in every iteration, all threads get the same value:
Thread 139779167999744 timeout = 6
Thread 139779134428928 timeout = 6
Thread 139779067287296 timeout = 6
Thread 139779117643520 timeout = 6
Thread 139779100858112 timeout = 6
Thread 139779084072704 timeout = 6
Thread 139779151214336 timeout = 6
Thread 139779050501888 timeout = 6
Thread 139779033716480 timeout = 6
next iteration
Thread 139779167999744 timeout = 4
Thread 139779151214336 timeout = 4
Thread 139779134428928 timeout = 4
Thread 139779117643520 timeout = 4
Thread 139779100858112 timeout = 4
Thread 139779084072704 timeout = 4
Thread 139779067287296 timeout = 4
Thread 139779050501888 timeout = 4
Thread 139779033716480 timeout = 4
You need to seed each thread's random engine with a value that differs between threads. The example below, adapted from your code snippets, works fine with 3 threads:
#include <chrono>
#include <functional>
#include <iostream>
#include <mutex>
#include <random>
#include <thread>
std::mutex lock;  // serializes output to std::cout
void sample_time_out()
{
    // Seed each thread's engine differently; std::hash<std::thread::id> is a
    // portable way to turn the thread id into a number.
    std::size_t thread_id = std::hash<std::thread::id>{}(std::this_thread::get_id());
    std::default_random_engine eng(thread_id);
    std::uniform_int_distribution<int> dis(3, 7);
    for (int i = 0; i < 3; i++)
    {
        auto timeout = dis(eng);
        std::this_thread::sleep_for(std::chrono::seconds(timeout));
        {
            std::lock_guard<std::mutex> lock1(lock);
            std::cout << "Thread " << std::this_thread::get_id() << " timeout = " << timeout << std::endl;
        }
    }
}
int main()
{
    std::thread t1(sample_time_out);
    std::thread t2(sample_time_out);
    std::thread t3(sample_time_out);
    t1.join();
    t2.join();
    t3.join();
    return 0;
}
And the output of my first run is:
Thread 31420 timeout = 3
Thread 18616 timeout = 6
Thread 31556 timeout = 7
Thread 31420 timeout = 4
Thread 18616 timeout = 7
Thread 31420 timeout = 6
Thread 31556 timeout = 7
Thread 18616 timeout = 4
Thread 31556 timeout = 7
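Seeding from the thread id alone gives distinct sequences per thread, but the same ids (and therefore the same timeouts) can recur from run to run. Below is a minimal sketch that mixes std::random_device output with a hash of the thread id through std::seed_seq; sample_timeouts is a hypothetical helper, and std::mt19937 is swapped in for std::default_random_engine.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Builds an engine local to the calling thread, seeded from both
// std::random_device and the thread id, and draws `count` timeouts in [3, 7].
std::vector<int> sample_timeouts(int count) {
    std::random_device rd;
    const std::uint64_t id_hash =
        std::hash<std::thread::id>{}(std::this_thread::get_id());
    std::seed_seq seq{rd(), rd(),
                      static_cast<unsigned>(id_hash),
                      static_cast<unsigned>(id_hash >> 32)};
    std::mt19937 eng(seq);
    std::uniform_int_distribution<int> dist(3, 7);
    std::vector<int> out;
    for (int i = 0; i < count; ++i) out.push_back(dist(eng));
    return out;
}
```

Each worker thread would build its engine this way once at startup and then sleep for the values it draws.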
I am trying to implement strict alternation between 2 processes (threads, in my code), but I am not sure how to declare the critical and non-critical regions. Here is the code that I have:
#include <iostream>
#include <pthread.h>
int count;
int turn = 0; // Shared variable used to implement strict alternation
void* myFunction(void* arg)
{
    int actual_arg = *((int*) arg);
    for(unsigned int i = 0; i < 10; ++i) {
        while(1)
        {
            while(turn != 0)
            {
                critical_region_0();
                turn = 1;
                non_critical_region_0();
            }
        }
        // Beginning of the critical region
        count++;
        std::cout << "Thread #" << actual_arg << " count = " << count <<
            std::endl;
        // End of the critical region
        while(0)
        {
            while(turn != 1)
            {
                critical_region_1();
                turn = 0;
                non_critical_region_1();
            }
        }
    }
    pthread_exit(NULL);
}
int main()
{
    int rc[2];
    pthread_t ids[2];
    int args[2];
    count = 0;
    for(unsigned int i = 0; i < 2; ++i) {
        args[i] = i;
        rc[i] = pthread_create(&ids[i], NULL, myFunction, (void*) &args[i]);
    }
    for(unsigned int i = 0; i < 2; ++i) {
        pthread_join(ids[i], NULL);
    }
    std::cout << "Final count = " << count << std::endl;
    pthread_exit(NULL);
}
I know that the critical and non-critical regions are written as if they were functions, but I am only using those as placeholders. Is there a way to implement strict alternation without these functions?
Here is what the output should look like:
Thread #0 count = 1
Thread #1 count = 2
Thread #0 count = 3
Thread #1 count = 4
Thread #0 count = 5
Thread #1 count = 6
Thread #0 count = 7
Thread #1 count = 8
Thread #0 count = 9
Thread #1 count = 10
Thread #0 count = 11
Thread #1 count = 12
Thread #0 count = 13
Thread #1 count = 14
Thread #0 count = 15
Thread #1 count = 16
Thread #0 count = 17
Thread #1 count = 18
Thread #0 count = 19
Thread #1 count = 20
Final count = 20
The only output I can manage to get is all of thread 1's lines first, then thread 0's.
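I think this is a classic use case for condition variables. The function controlling each thread looks like this (thread 1, for example):
pthread_mutex_lock(&mtx);
while ( ... ) {
    while (turn != 1)                         /* wait until thread 2 has finished its turn */
        pthread_cond_wait(&thread2_done_work, &mtx);
    ...                                       /* critical region */
    turn = 2;
    pthread_cond_signal(&thread1_done_work);  /* tell thread 2 it may go */
}
pthread_mutex_unlock(&mtx);
where both work variables are globals of type pthread_cond_t, and mtx (a pthread_mutex_t) and turn (an int) are shared globals as well. I think it's more readable with two condition variables, but you don't have to use two (a single condition variable, or a different mutex-based scheme, would also work).
Thread 2 needs to wait on its condition as soon as it starts (turn starts at 1). Here are some details:
https://linux.die.net/man/3/pthread_cond_wait
https://linux.die.net/man/3/pthread_cond_signal
Basically each thread announces that it's done, then waits for the other thread. Note that pthread_cond_signal does not block and is not remembered: if the other thread isn't waiting yet, the signal is simply lost. That is why the wait sits inside a loop that checks the shared turn variable under the mutex instead of relying on the signal alone. So the threads are "talking": my turn, your turn, etc. If you insist on using the same function for both threads, you can pass the condition variables as arguments (swapped for thread 2 relative to thread 1), along with each thread's own turn value.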
One final small note: strict alternation somewhat defeats the whole purpose of threading, since only one thread makes progress at a time.
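For reference, here is a minimal sketch of the same turn-taking idea using std::mutex and std::condition_variable in place of the raw pthread_cond_* calls. This is a swap-in, and alternate is a hypothetical helper that collects the lines in a vector instead of printing them. A single condition variable is enough here, because the predicate turn == id tells each thread whether a wakeup was meant for it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Strict alternation between thread 0 and thread 1. `turn` is the shared
// predicate: a thread waits until turn == its id, runs its critical region,
// then hands the turn to the other thread and wakes it.
std::vector<std::string> alternate(int iterations_per_thread) {
    std::mutex mtx;
    std::condition_variable turn_changed;
    int turn = 0;                  // whose turn it is: 0 or 1
    int count = 0;                 // shared counter from the question
    std::vector<std::string> log;  // stands in for std::cout

    auto worker = [&](int id) {
        for (int i = 0; i < iterations_per_thread; ++i) {
            std::unique_lock<std::mutex> lock(mtx);
            turn_changed.wait(lock, [&] { return turn == id; });  // handles spurious wakeups
            // --- critical region ---
            ++count;
            log.push_back("Thread #" + std::to_string(id) +
                          " count = " + std::to_string(count));
            // --- end of critical region ---
            turn = 1 - id;  // hand the turn to the other thread
            lock.unlock();
            turn_changed.notify_one();
            // a non-critical region would go here, running without the lock
        }
    };

    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    log.push_back("Final count = " + std::to_string(count));
    return log;
}
```

alternate(10) returns the 21 lines listed in the question, from "Thread #0 count = 1" to "Final count = 20".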
Q1: Are pseudo-random number generators thread-safe? Can I use a shared generator in multiple threads?
#include <cstdio>
#include <iostream>
#include <thread>
#include <random>
using namespace std;
random_device seed;//Should I use thread_local here?
default_random_engine engine(seed());//Should I use thread_local here?
int random_int(int x, int y)
{
    binomial_distribution<int> distribution(y - x);
    return distribution(engine) + x;
}
int a[10],b[10],c[10];
void thread_task() {
    for (int i = 0; i < 10; i++)
    {
        a[i] = random_int(1, 8);
    }
}
void thread_task1() {
    for (int i = 0; i < 10; i++)
    {
        b[i] = random_int(1, 8);
    }
}
void thread_task2() {
    for (int i = 0; i < 10; i++)
    {
        c[i] = random_int(1, 8);
    }
}
int main()
{
    thread t(thread_task);
    thread t1(thread_task1);
    thread t2(thread_task2);
    t.join();
    t1.join();
    t2.join();
    for (int i = 0; i < 10; i++)
        cout << a[i] << " ";
    cout << endl;
    for (int i = 0; i < 10; i++)
        cout << b[i] << " ";
    cout << endl;
    for (int i = 0; i < 10; i++)
        cout << c[i] << " ";
    cout << endl;
    getchar();
    return 0;
}
result 1:
7 4 4 3 7 5 4 4 4 4
5 4 4 7 2 3 6 5 4 7
4 4 4 6 1 6 3 5 3 4 //seems fine.
result 2:
5 3 5 6 3 4 5 5 3 5
5 6 5 6 8 3 5 7 3 2
4 6 4 5 4 4 4 3 6 7 //still works fine.
Q2: Does thread-safe mean lock-free?
If a class is thread-safe, does that mean I can use a shared instance of it in multiple threads without locking it?
Q3: I used neither a lock nor thread_local, yet it still generates different integer sequences for different threads. So what is a lock good for?
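Calling a standard engine (operator()) modifies its internal state, so sharing one engine between threads without a lock is a data race, which is undefined behavior even when the output looks fine. If you don't need deterministic sequences per thread, you can use a lock around a single shared PRNG. If each thread's pseudo-random sequence must be the same from run to run, use one PRNG per thread.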
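A minimal sketch of both options follows. It uses std::mt19937 with uniform_int_distribution rather than the question's default_random_engine with binomial_distribution, and random_int_locked, random_int_per_thread and fill_draws are hypothetical names. Option 1 shares one engine and serializes every draw with a mutex; option 2 gives each thread its own thread_local engine, so no lock is needed.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Option 1: one shared engine, every draw serialized by a mutex.
// Which values each thread gets depends on how the draws interleave.
int random_int_locked(int lo, int hi) {
    static std::mt19937 engine{std::random_device{}()};  // initialized once, thread-safely
    static std::mutex engine_mutex;
    std::uniform_int_distribution<int> dist(lo, hi);
    std::lock_guard<std::mutex> guard(engine_mutex);  // engine state is shared
    return dist(engine);
}

// Option 2: one engine per thread, so no lock is needed.
int random_int_per_thread(int lo, int hi) {
    thread_local std::mt19937 engine{std::random_device{}()};
    std::uniform_int_distribution<int> dist(lo, hi);
    return dist(engine);
}

// Appends `n` draws in [1, 8] from `draw` (either function above) to `out`.
void fill_draws(std::vector<int>& out, int n, int (*draw)(int, int)) {
    for (int i = 0; i < n; ++i) out.push_back(draw(1, 8));
}
```

To make each thread's sequence reproducible with option 2, seed its engine from a fixed per-thread value (for example, a thread index) instead of std::random_device.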