I am trying to conduct strict alternation on 2 processes, but I am not sure how to declare the critical region and non-critical region. Here is the code that I have:
#include <iostream>
#include <pthread.h>
int count;
int turn = 0; // Shared variable used to implement strict alternation
void* myFunction(void* arg)
{
int actual_arg = *((int*) arg);
for(unsigned int i = 0; i < 10; ++i) {
while(1)
{
while(turn != 0)
{
critical_region_0();
turn = 1;
non_critical_region_0();
}
}
// Beginning of the critical region
count++;
std::cout << "Thread #" << actual_arg << " count = " << count <<
std::endl;
// End of the critical region
while(0)
{
while(turn != 1)
{
critical_region_1();
turn = 0;
non_critical_region_1();
}
}
}
pthread_exit(NULL);
}
int main()
{
int rc[2];
pthread_t ids[2];
int args[2];
count = 0;
for(unsigned int i = 0; i < 2; ++i) {
args[i] = i;
rc[i] = pthread_create(&ids[i], NULL, myFunction, (void*) &args[i]);
}
for(unsigned int i = 0; i < 2; ++i) {
pthread_join(ids[i], NULL);
}
std::cout << "Final count = " << count << std::endl;
pthread_exit(NULL);
}
I know that the critical region and non-critical regions are written as if they are a method but I am using those as placeholders. Is there a way to conduct Strict Alternation without the use of these methods?
Here is what the output should look like.
Thread #0 count = 1
Thread #1 count = 2
Thread #0 count = 3
Thread #1 count = 4
Thread #0 count = 5
Thread #1 count = 6
Thread #0 count = 7
Thread #1 count = 8
Thread #0 count = 9
Thread #1 count = 10
Thread #0 count = 11
Thread #1 count = 12
Thread #0 count = 13
Thread #1 count = 14
Thread #0 count = 15
Thread #1 count = 16
Thread #0 count = 17
Thread #1 count = 18
Thread #0 count = 19
Thread #1 count = 20
Final count = 20
The only output I can manage to get is all of thread 1 first, then thread 0.
I think this is a classic place for signals. Each function controlling the thread looks like (thread 1 for example)
while( ... ) {
...
pthread_cond_signal(&thread1_done_work);
pthread_cond_wait(&thread2_done_work, &mutex); // pthread_cond_wait also needs a locked pthread_mutex_t ("mutex" is a placeholder name)
}
where both the work variables are globals of type pthread_cond_t - I think it's more readable with two, but you don't have to use two (a mutex implementation comes to mind).
Thread 2 needs a wait condition as soon as it starts. Here are some details:
https://linux.die.net/man/3/pthread_cond_wait
https://linux.die.net/man/3/pthread_cond_signal
Basically each thread signals that it is done, then waits for the other thread, so they're "talking": my turn, your turn, and so on. Note that pthread_cond_signal does not block, and a signal sent while nobody is waiting is lost, so the wait should be guarded by a mutex and a shared flag (for example the turn variable). If you insist on using the same function for both threads, you can pass the condition variables as arguments (swapped for thread 2 relative to thread 1).
One final small note - this somewhat contradicts the whole purpose of threading.
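The "my turn, your turn" idea above can be sketched as follows. This is a minimal sketch, not the asker's or the answerer's actual code: it assumes a single condition variable plus the question's turn variable, and the names Alternation, WorkerArg, alternating_worker and run_alternation are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <vector>

// Shared state for the sketch; every name here is illustrative.
struct Alternation {
    pthread_mutex_t mutex;
    pthread_cond_t turn_changed;
    int turn;                // id of the thread allowed into the critical region
    int count;               // shared counter, as in the question
    int rounds;              // iterations per thread
    std::vector<int> order;  // thread ids in the order they entered the critical region
};

struct WorkerArg {
    Alternation* shared;
    int id;  // 0 or 1
};

void* alternating_worker(void* raw) {
    WorkerArg* arg = static_cast<WorkerArg*>(raw);
    Alternation& s = *arg->shared;
    for (int i = 0; i < s.rounds; ++i) {
        pthread_mutex_lock(&s.mutex);
        // Wait for our turn; the loop also covers spurious wakeups.
        while (s.turn != arg->id)
            pthread_cond_wait(&s.turn_changed, &s.mutex);
        // Critical region.
        ++s.count;
        s.order.push_back(arg->id);
        // Hand the turn to the other thread and wake it up.
        s.turn = 1 - arg->id;
        pthread_cond_signal(&s.turn_changed);
        pthread_mutex_unlock(&s.mutex);
        // The non-critical region would go here.
    }
    return nullptr;
}

// Runs two threads for `rounds` iterations each and returns the entry order.
std::vector<int> run_alternation(int rounds) {
    assert(rounds >= 0);
    Alternation shared;
    pthread_mutex_init(&shared.mutex, nullptr);
    pthread_cond_init(&shared.turn_changed, nullptr);
    shared.turn = 0;
    shared.count = 0;
    shared.rounds = rounds;

    WorkerArg args[2] = {{&shared, 0}, {&shared, 1}};
    pthread_t ids[2];
    for (int i = 0; i < 2; ++i)
        pthread_create(&ids[i], nullptr, alternating_worker, &args[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(ids[i], nullptr);

    pthread_cond_destroy(&shared.turn_changed);
    pthread_mutex_destroy(&shared.mutex);
    return shared.order;
}
```

Calling run_alternation(10) should give an entry order of 0, 1, 0, 1, and so on, which corresponds to the output the question asks for. Because the turn is checked under the mutex, it does not matter which thread happens to be scheduled first.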
I would like to use as simple a mutex as possible.
#include <iostream>
#include <thread>
#include <vector>
#include <functional>
#include <algorithm>
#include <mutex>
using namespace std;
int sum;
static mutex m;
void addValue(int value)
{
m.lock();
sum += value;
m.unlock();
}
int main()
{
int counter1 = 0;
int counter2 = 0;
for (int i = 0; i < 100; i++)
{
thread t1(addValue, 100);
thread t2(addValue, 200);
if (sum == 300)
{
counter1++;
}
else
{
counter2++;
}
sum = 0;
t1.join();
t2.join();
}
cout << counter1 << endl;
cout << counter2 << endl;
}
Unfortunately, the code above doesn't work as expected. I expect that:
a) sum is always equal to 300
b) counter1 is always 100
c) counter2 is always 0
What is wrong?
EDIT:
When I debug the sum variable in the else condition, I see values like:
200, 400, 100, and even 0 (I assume that addition didn't even happen).
C++ mutex doesn't work - synchronization fails
Why does everyone learning this stuff for the first time assume the tried-and-tested synchronization primitives that work for everyone else are broken, and not their assumptions?
The mutex is fine. Your mental model is broken. This should be your starting assumption.
I expect that:
sum is always equal to 300
That would be the case if you joined both threads before checking the value. But you haven't done that, so you're doing an entirely unsynchronized read of sum while two other threads are possibly mutating it. This is a data race. A mutex doesn't protect your data unless you always use the mutex when accessing the data.
Let's say we make the minimal change so sum is always protected:
thread t1(addValue, 100); // a
thread t2(addValue, 200); // b
m.lock();
if (sum == 300) // c
{
counter1++;
}
else
{
counter2++;
}
sum = 0;
m.unlock();
Now some of the available orderings are:
abc - what you expected (and what would be guaranteed if you joined both threads before reading sum)
acb - you read 100 at line c and increment counter2; the second thread adds its 200 only after you have read (and reset) sum, so you never see 300
cab - you read 0 immediately, before the two threads have even been scheduled to run
bca - you read 200; the other 100 is added only after you checked
etc.
Every permutation is permitted unless you make some effort to explicitly order them.
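One way to force the abc ordering is to join both threads before reading sum. The following is a minimal sketch of that idea, assuming the same addValue logic as the question; the names sum_mutex, add_value and count_rounds_with_full_sum are illustrative.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex sum_mutex;
int sum = 0;

void add_value(int value) {
    std::lock_guard<std::mutex> guard(sum_mutex);  // unlocks automatically on return
    sum += value;
}

// Returns how many of `rounds` iterations observed sum == 300.
int count_rounds_with_full_sum(int rounds) {
    assert(rounds >= 0);
    int hits = 0;
    for (int i = 0; i < rounds; ++i) {
        std::thread t1(add_value, 100);
        std::thread t2(add_value, 200);
        // Joining first guarantees both additions happened before the read.
        t1.join();
        t2.join();
        if (sum == 300)
            ++hits;
        sum = 0;  // no other thread is running here, so no lock is needed
    }
    return hits;
}
```

With the joins moved before the check, count_rounds_with_full_sum(100) should return 100, which matches expectations a) to c) in the question.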
It is working as intended. The problem is that you didn't expect that "time" will not be the same for all 3 threads, and you dismissed the obvious fact that one thread starts before the other. This clearly gives it an advantage, even more so when all it needs to do is loop 100 times and perform an increment.
#include <atomic>
#include <iostream>
#include <thread>
#include <mutex>
std::atomic<bool> keep_alive; // atomic: written by main while the worker threads read it
void add_value_mutex(std::mutex * mx, int * trg, int value) {
while (keep_alive){
mx->lock();
(*trg) += value;
mx->unlock();
}
}
int main(){
std::thread thread_1;
std::thread thread_2;
int count_targ = 2000;
int * counter_1 = new int(0);
int * counter_2 = new int(0);
/* --- */
std::mutex mx_1;
std::mutex mx_2;
keep_alive = true;
thread_1 = std::thread(add_value_mutex, &mx_1, counter_1, 1);
thread_2 = std::thread(add_value_mutex, &mx_2, counter_2, 1);
while(1){
if (mx_1.try_lock()){
if (count_targ <= * counter_1){
mx_1.unlock();
break;
}
mx_1.unlock();
}
if (mx_2.try_lock()){
if (count_targ <= * counter_2){
mx_2.unlock();
break;
}
mx_2.unlock();
}
}
keep_alive = false;
thread_1.join();
thread_2.join();
std::cout << "Thread 1 (independent mutex) -> " << * counter_1 << std::endl;
std::cout << "Thread 2 (independent mutex) -> " << * counter_2 << std::endl;
/* --- */
keep_alive = true;
(*counter_1) = 0;
(*counter_2) = 0;
std::mutex mx_s;
thread_1 = std::thread(add_value_mutex, &mx_s, counter_1, 1);
thread_2 = std::thread(add_value_mutex, &mx_s, counter_2, 1);
while(1){
if (mx_s.try_lock()){
if (count_targ <= * counter_1 || count_targ <= * counter_2){
mx_s.unlock();
break;
}
mx_s.unlock();
}
}
keep_alive = false;
thread_1.join();
thread_2.join();
// Print only after the workers have stopped, so these reads do not race with their writes
std::cout << "Thread 1 (shared mutex) -> " << * counter_1 << std::endl;
std::cout << "Thread 2 (shared mutex) -> " << * counter_2 << std::endl;
delete counter_1;
delete counter_2;
return 0;
}
If you want another example of mine that measures the time a thread is waiting check this one
I wrote a simple program to test the performance of std::shared_mutex across a number of threads looping on lock_shared(). But judging from the results, it doesn't seem to scale as more threads are added, which doesn't really make sense to me.
You may argue that the stopFlag is limiting the performance, so the second for loop is a test that only increments a local counter, and that scales almost perfectly at the beginning.
The results in the comments are from a build compiled with MSVC with the Release flag.
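#include <atomic>
#include <chrono>
#include <iostream>
#include <shared_mutex>
#include <thread>
#include <vector>
int main()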
{
const auto threadLimit = std::thread::hardware_concurrency() - 1; //for running main()
struct SharedMutexWrapper
{
std::shared_mutex mut;
void read()
{
mut.lock_shared();
mut.unlock_shared();
}
};
/*Testing shared_mutex */
for (auto i = 1; i <= threadLimit; ++i)
{
std::cerr << "Testing " << i << " threads: ";
SharedMutexWrapper test;
std::atomic<unsigned long long> count = 0;
std::atomic_bool stopFlag = false;
std::vector<std::thread> threads;
threads.reserve(i);
for (auto j = 0; j < i; ++j)
threads.emplace_back([&] {unsigned long long local = 0; while (!stopFlag) { test.read(); ++local; } count += local; });
std::this_thread::sleep_for(std::chrono::seconds{ 1 });
stopFlag = true;
for (auto& thread : threads)
thread.join();
std::cerr << count << '\n';
}
/*
Testing 1 threads: 60394076
Testing 2 threads: 39703889
Testing 3 threads: 23461029
Testing 4 threads: 16961003
Testing 5 threads: 12750838
Testing 6 threads: 12227898
Testing 7 threads: 12245818
*/
for (auto i = 1; i <= threadLimit; ++i)
{
std::cerr << "Testing " << i << " threads: ";
std::atomic<unsigned long long> count = 0;
std::atomic_bool stopFlag = false;
std::vector<std::thread> threads;
threads.reserve(i);
for (auto j = 0; j < i; ++j)
threads.emplace_back([&] {unsigned long long local = 0; while (!stopFlag) ++local; count += local; });
std::this_thread::sleep_for(std::chrono::seconds{ 1 });
stopFlag = true;
for (auto& thread : threads)
thread.join();
std::cerr << count << '\n';
}
/*
Testing 1 threads: 3178867276
Testing 2 threads: 6305783667
Testing 3 threads: 9388659151
Testing 4 threads: 12472666861
Testing 5 threads: 15230810694
Testing 6 threads: 18130479890
Testing 7 threads: 20151074046
*/
}
Placing a read lock on a shared mutex modifies the state of that mutex. All your threads do nothing but try to change the state of the same object, the shared mutex. So of course this code is going to scale poorly.
The point of a shared mutex is to allow accesses to shared data that is not modified to scale. You have no accesses to shared data that is not modified. So you aren't measuring any of the beneficial properties of a shared mutex here.
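For contrast, here is a minimal sketch of the kind of access pattern a shared mutex is meant for: several readers inspecting shared data that a writer only occasionally changes. It illustrates intended usage rather than performance, since any speedup depends on how long the reads are relative to the cost of taking the lock. The names Settings and count_inconsistent_reads are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Shared data with an invariant: a and b are always equal.
struct Settings {
    long a = 0;
    long b = 0;
};

// Starts reader threads and one writer; returns how many reads saw a != b.
int count_inconsistent_reads(int reader_count, int reads_per_reader, int write_count) {
    assert(reader_count >= 0 && reads_per_reader >= 0 && write_count >= 0);
    std::shared_mutex mutex;
    Settings settings;
    std::atomic<int> inconsistent{0};
    std::vector<std::thread> threads;

    for (int r = 0; r < reader_count; ++r) {
        threads.emplace_back([&] {
            for (int i = 0; i < reads_per_reader; ++i) {
                // Shared lock: readers may hold it at the same time.
                std::shared_lock<std::shared_mutex> lock(mutex);
                if (settings.a != settings.b)
                    ++inconsistent;
            }
        });
    }
    threads.emplace_back([&] {
        for (int i = 0; i < write_count; ++i) {
            // Exclusive lock: no reader can observe the half-updated state.
            std::unique_lock<std::shared_mutex> lock(mutex);
            ++settings.a;
            ++settings.b;
        }
    });

    for (auto& t : threads)
        t.join();
    return inconsistent.load();
}
```

A call such as count_inconsistent_reads(4, 10000, 1000) should return 0, because the writer's two increments are never visible separately to a reader.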
I have the following code, I need to have a random number in given interval. Seems to work how I need.
std::default_random_engine eng;
std::uniform_int_distribution<int> dist(3, 7);
int timeout = dist(eng);
But then I run it in different threads, repeated in a loop.
std::default_random_engine defRandEng(std::this_thread::get_id());
std::uniform_int_distribution<int> dist(3, 7);
int timeout; // if I put timeout = dist(defRandEng); here it's all the same
while (true)
{
timeout = dist(defRandEng);
std::cout<<"Thread "<<std::this_thread::get_id()<<" timeout = "<<timeout<<std::endl;
std::this_thread::sleep_for(std::chrono::seconds(timeout));
}
But on every iteration the values are the same in all threads:
Thread 139779167999744 timeout = 6
Thread 139779134428928 timeout = 6
Thread 139779067287296 timeout = 6
Thread 139779117643520 timeout = 6
Thread 139779100858112 timeout = 6
Thread 139779084072704 timeout = 6
Thread 139779151214336 timeout = 6
Thread 139779050501888 timeout = 6
Thread 139779033716480 timeout = 6
Next iteration:
Thread 139779167999744 timeout = 4
Thread 139779151214336 timeout = 4
Thread 139779134428928 timeout = 4
Thread 139779117643520 timeout = 4
Thread 139779100858112 timeout = 4
Thread 139779084072704 timeout = 4
Thread 139779067287296 timeout = 4
Thread 139779050501888 timeout = 4
Thread 139779033716480 timeout = 4
You need to seed your random engine with some naturally random value. The example below, which was adapted from your code snippets, works fine with 3 threads:
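#include <chrono>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <random>
#include <sstream>
#include <string>
#include <thread>
std::mutex lock;
void sample_time_out()
{
// Turn the thread id into an integer seed; this assumes the id prints as a decimal number
std::stringstream ss;
ss << std::this_thread::get_id();
uint64_t thread_id = std::stoull(ss.str());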
std::default_random_engine eng(thread_id);
std::uniform_int_distribution<int> dis(3, 7);
for (int i = 0; i < 3; i++)
{
auto timeout = dis(eng);
std::this_thread::sleep_for(std::chrono::seconds(timeout));
{
std::unique_lock<std::mutex> lock1(lock);
std::cout << "Thread " << std::this_thread::get_id() << " timeout = " << timeout << std::endl;
}
}
}
int main()
{
std::thread t1(sample_time_out);
std::thread t2(sample_time_out);
std::thread t3(sample_time_out);
t1.join();
t2.join();
t3.join();
return 0;
}
And the output of my first run is:
Thread 31420 timeout = 3
Thread 18616 timeout = 6
Thread 31556 timeout = 7
Thread 31420 timeout = 4
Thread 18616 timeout = 7
Thread 31420 timeout = 6
Thread 31556 timeout = 7
Thread 18616 timeout = 4
Thread 31556 timeout = 7
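If the seed should not depend on the thread id at all, another option is to seed each thread's engine from std::random_device. The following is a sketch under that assumption; sample_timeouts is a made-up helper name, and the standard allows std::random_device to be a deterministic pseudo-random source on some implementations, so it is worth checking on your platform.

```cpp
#include <cassert>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Each thread draws `draws_per_thread` timeouts in [3, 7] from its own engine.
// Returns all drawn values (the order depends on thread scheduling).
std::vector<int> sample_timeouts(int thread_count, int draws_per_thread) {
    assert(thread_count >= 0 && draws_per_thread >= 0);
    std::vector<int> all;
    std::mutex all_mutex;
    std::vector<std::thread> threads;

    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&all, &all_mutex, draws_per_thread] {
            std::random_device rd;   // one seed source per thread
            std::mt19937 eng(rd());  // per-thread engine, never shared between threads
            std::uniform_int_distribution<int> dist(3, 7);
            for (int i = 0; i < draws_per_thread; ++i) {
                int timeout = dist(eng);
                std::lock_guard<std::mutex> guard(all_mutex);
                all.push_back(timeout);
            }
        });
    }
    for (auto& th : threads)
        th.join();
    return all;
}
```

For example, sample_timeouts(3, 5) returns 15 values between 3 and 7; the threads would then sleep for their own values instead of collecting them.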
My code currently has a loop which calls a Monte-Carlo function to calculate a simple integral (y=x, from 0 to 1) for several sample counts and writes the total time and integration value to a text file. Then the loop increments the number of threads and continues onward. At around 8 threads the time peaks at about 2.6 seconds. The loop goes up to 64 threads, and I see no slowdown beyond 0.2 seconds, and sometimes even a speedup.
For loop calling Monte-Carlo method, increment number of threads:
//this loop runs the main loop 17 times, stepping the thread count from 1 up to 64
for (int j = 1; j <= 17; j++)
{
//tell user how many threads are running monte-carlo currently
cout << "Program is running " << number_threads << " thread(s) currently." << endl;
//reset values for new run
num_of_samples = 1;
integration_result = 0;
//this for loop will run throughout number of circulations running through monte-carlo
//and entering the data into the text folder
for (int i = 1; i <= iteration_num; i++)
{
//call monte carlo function to perform integration and write values to text
monteCarlo(num_of_samples, starting_x, end_x, number_threads);
//increase num of samples for next test round
num_of_samples = 2 * num_of_samples;
} //end of second for loop
//iterate num_threads
if (number_threads == 1)
number_threads = 2;
else if (number_threads >= 32)
number_threads += 8;
else if (number_threads >= 16)
number_threads += 4;
else
number_threads += 2;
} //end of for loop
Parallel portion for Monte-Carlo:
int num_threads;
double x, u, error_difference, fs = 0, integration_result = 0; //fs is a placeholder to hold added values of f(x)
vector< vector<double>> dataHolder(number_threads, vector<double>(1)); //this vector will hold temp values of each thread
//get start time for parallel block of code
double start_time = omp_get_wtime();
omp_set_dynamic(0); // Explicitly disable dynamic teams
omp_set_num_threads(number_threads); // Use number_threads threads for all subsequent parallel regions
#pragma omp parallel default(none) private(x, u) shared(std::cout, end_x, starting_x, num_of_samples, fs, number_threads, num_threads, dataHolder)
{
int i, id, nthrds;
double temp = fs;
//define thread id and num of threads
id = omp_get_thread_num();
nthrds = omp_get_num_threads();
//initialize random seed (note: id 0 always yields seed 0, and rand() is not guaranteed to be thread-safe)
srand(id * time(NULL) * 1000);
//only thread 0 records the actual team size
if(id == 0)
num_threads = nthrds;
//this for loop will calculate a temp value for fs for each thread
for (int i = id; i < num_of_samples; i = i + nthrds)
{
//assign random number under integration from 0 to 1
u = fRand(0, 1); //random number between 0 and 1
x = starting_x + (end_x - starting_x) * u;
//this line of code is from Monte_Carlo Method by Alex Godunov (February 2007)
//calculate y for reciprocal value of x and add it to thread's local fs
temp += function(x);
}
//place temp inside vector dataHolder
dataHolder[id][0] = temp;
//no thread will go beyond this barrier until task is complete
#pragma omp barrier
//one thread will do this task
#pragma omp single
{
//add summations to calc fs
for(i = 0, fs = 0.0; i < num_threads; i ++)
fs += dataHolder[i][0];
} //implicit barrier here, wait for all tasks to be done
}//end of parallel block of code
After implementing the same sort of parallelization over a simple Monte-Carlo walk with light scattering, I was able to see the diminishing returns quite clearly. I think there is a lack of diminishing returns here because the integration calculation is so simple that each thread has little work to do on its own, and thus the overhead is relatively small.
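For comparison, the same integration (y = x from 0 to 1) can be sketched with std::thread and one random engine per thread instead of OpenMP with srand/rand. This swaps in a different technique from the code above, and the function name integrate_identity and the fixed seeds are assumptions made for the sake of a reproducible example.

```cpp
#include <cassert>
#include <random>
#include <thread>
#include <vector>

// Monte-Carlo estimate of the integral of f(x) = x over [0, 1].
double integrate_identity(unsigned thread_count, long samples_per_thread) {
    assert(thread_count > 0 && samples_per_thread > 0);
    std::vector<double> partial(thread_count, 0.0);  // one slot per thread
    std::vector<std::thread> threads;

    for (unsigned id = 0; id < thread_count; ++id) {
        threads.emplace_back([&partial, id, samples_per_thread] {
            std::mt19937_64 eng(12345u + id);  // fixed, distinct seed per thread (assumption)
            std::uniform_real_distribution<double> dist(0.0, 1.0);
            double local = 0.0;  // accumulate locally, write the shared slot once
            for (long i = 0; i < samples_per_thread; ++i)
                local += dist(eng);  // f(x) = x
            partial[id] = local;
        });
    }
    for (auto& t : threads)
        t.join();

    double total = 0.0;
    for (double p : partial)
        total += p;
    // The interval has length 1, so the estimate is simply the sample mean.
    return total / (static_cast<double>(thread_count) * static_cast<double>(samples_per_thread));
}
```

A call like integrate_identity(4, 250000) should land close to the exact value 0.5. Each thread touches shared memory only once, which is consistent with the observation above that the per-thread overhead stays small for such a simple integrand.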
If anyone else has any other information that would prove useful to this problem, please feel free to post. Otherwise I will accept this as my answer.
I am trying to learn the boost library and was going through examples of boost::thread.
The example below illustrates the usage of the boost::lock_guard for thread synchronization, ensuring that the access to std::cout is not concurrent:
#include <boost/thread.hpp>
#include <boost/format.hpp>
#include <iostream>
void wait(const int secs) {
boost::this_thread::sleep(boost::posix_time::seconds(secs));
}
boost::mutex mutex;
void thread1() {
for (int i = 0; i < 10; ++i) {
wait(1); // <-- all works fine if wait is placed here
boost::lock_guard<boost::mutex> lock(mutex);
std::cout << boost::format("thread A here %d\n") % i ;
}
}
void thread2() {
for (int i = 0; i < 10; ++i) {
wait(1); // <-- all works fine if wait is placed here
boost::lock_guard<boost::mutex> lock(mutex);
std::cout << boost::format("thread B here %d\n") % i;
}
}
int main() {
boost::thread t1(thread1);
boost::thread t2(thread2);
t1.join();
t2.join();
}
The results were pretty much what one would expect, i.e. alternating messages printed by the two threads:
thread A here 0
thread B here 0
thread A here 1
thread B here 1
thread A here 2
thread B here 2
thread A here 3
thread B here 3
thread A here 4
thread B here 4
...
However, a small modification -- moving the wait call inside the scope of the lock guard -- led to a surprise:
void thread1() {
for (int i = 0; i < 10; ++i) {
boost::lock_guard<boost::mutex> lock(mutex);
wait(1); // <== !
std::cout << boost::format("thread A here %d\n") % i ;
}
}
void thread2() {
for (int i = 0; i < 10; ++i) {
boost::lock_guard<boost::mutex> lock(mutex);
wait(1); // <== !
std::cout << boost::format("thread B here %d\n") % i;
}
}
Now either thread1 or thread2 wins the initial "race" for the mutex and then wins again and again on each loop iteration, thereby starving the other thread!
Example output:
thread B here 0
thread B here 1
thread B here 2
thread B here 3
thread B here 4
thread B here 5
thread B here 6
thread B here 7
thread B here 8
thread B here 9
thread A here 0
thread A here 1
thread A here 2
thread A here 3
thread A here 4
thread A here 5
thread A here 6
thread A here 7
thread A here 8
thread A here 9
Can anybody please explain why this is the case?
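This is because, after the lock is acquired, the wait call lets the second thread begin executing. Since the second thread cannot acquire the lock, it goes into a wait state until the lock is available. The first thread does release the lock at the end of each iteration, but it re-acquires it immediately at the top of the next one, usually before the waiting thread has been woken up. A mutex makes no fairness guarantee, so in your case the second thread effectively does not get the lock until the first thread completes its loop.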