thread_local in C++11 isn't consistent

Why isn't the output consistently 1, when I declared the Counter object as thread_local?
#include <iostream>
#include <thread>

#define MAX 10

class Counter {
    int c;
public:
    Counter() : c(0) {}
    void increment() {
        ++c;
    }
    ~Counter() {
        std::cout << "Thread " << std::this_thread::get_id()
                  << " having counter value " << c << "\n";
    }
};

thread_local Counter c;

void doWork() {
    c.increment();
}

int main() {
    std::thread t[MAX];
    for (int i = 0; i < MAX; i++)
        t[i] = std::thread(doWork);
    for (int i = 0; i < MAX; i++)
        t[i].join();
    return 0;
}
Output:
Thread 2 having counter value 19469616
Thread 3 having counter value 1
Thread 4 having counter value 19464528
Thread 5 having counter value 19464528
Thread 7 having counter value 1
Thread 6 having counter value 1
Thread 8 having counter value 1
Thread 9 having counter value 1
Thread 10 having counter value 1
Thread 11 having counter value 1

The only interesting thing I got out of exploring this a bit is that MSVC operates differently than clang/gcc.
Here is MSVC's output on the latest compiler:
Thread 34316 having counter value 1
Thread 34312 having counter value 1
Thread 34320 having counter value 1
Thread 34328 having counter value 1
Thread 34324 having counter value 1
Thread 34332 having counter value 1
Thread 34336 having counter value 1
Thread 34340 having counter value 1
Thread 34344 having counter value 1
Thread 34348 having counter value 1
Thread 29300 having counter value 0
c:\dv\TestCpp\out\build\x64-Debug (default)\TestCpp.exe (process 27748) exited with code 0.
Notice: "Thread 29300 having counter value 0". This is the main thread. Since doWork() was never called in the main thread, the counter value is 0. This is how I would expect things to work: "c" is a global Counter object, so it should be created in all threads, including the main thread.
However, when I run under clang and gcc using rextester.com (godbolt doesn't seem to like running multithreaded programs, AFAICT), I get:
Thread 140199006193408 having counter value 1
Thread 140198906181376 having counter value 1
Thread 140198806169344 having counter value 1
Thread 140198706157312 having counter value 1
Thread 140199106205440 having counter value 1
Thread 140199206217472 having counter value 1
Thread 140199306229504 having counter value 1
Thread 140199406241536 having counter value 1
Thread 140199506253568 having counter value 1
Thread 140199606265600 having counter value 1
Notice: No main thread with the counter value of 0. The "c" Counter object was never created and destroyed in the main thread under gcc and clang, whereas it was created and destroyed in the main thread under MSVC.
I believe your particular bug is that you are using an old compiler version that has a bug in it. I couldn't reproduce it with the latest version of any of the compilers I tried, but I am going to file a bug with the MSVC team, because this is Standard C++ and one figures it should produce the same results regardless of the compiler, and it doesn't.
It seems (examining https://en.cppreference.com/w/cpp/language/storage_duration) that MSVC may be doing it wrong, as the thread_local "Counter c" object was never explicitly accessed in the main thread and yet was created and destroyed.
We'll see what they say - I have a few other bugs pending to them...
Further research: The cppreference article seems to indicate that since this is a "pure global" - i.e. it isn't a function-local static thread_local object like:
Foo& GetFooThreadSingleton()
{
    static thread_local Foo foo;
    return foo;
}
then, the way I read the cppreference article, it's a global like any other global, in which case MSVC is doing it correctly.
Someone please correct me if I am wrong. Thanks.
Also, without running it natively in Ubuntu under gdb and checking, I cannot be sure that the global wasn't actually created and destroyed in the main thread under gcc/clang; it could be that we just missed that output for some reason. I'll admit this seems unlikely. I'll try it later, because this issue has intrigued me a bit.
I ran it natively in Ubuntu and got what I expected: The global "Counter c" is only created and destroyed if it is accessed in the main thread. So my thought is that this is a bug in gcc/clang at this point.
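For reference, here is the minimal probe I used for the test, rewritten as a standalone sketch (my own code, not the original program). It makes the odr-use of the thread_local explicit, since cppreference notes that implementations are allowed to defer initialization of a thread_local variable to its first odr-use in a thread:
#include <iostream>
#include <thread>

struct Probe {
    Probe()  { std::cout << "ctor in thread " << std::this_thread::get_id() << "\n"; }
    ~Probe() { std::cout << "dtor in thread " << std::this_thread::get_id() << "\n"; }
};

thread_local Probe probe;

int main() {
    std::thread t([] { (void)&probe; }); // the worker thread odr-uses probe
    t.join();
    (void)&probe; // explicit odr-use in the main thread; try commenting this out
    return 0;
}
Commenting out the access in main() should tell you whether your compiler constructs the main-thread copy eagerly or only on first use.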


OpenMP integer copied after tasks finish

I do not know if this is documented anywhere (if so, I would love a reference to it), but I have found some unexpected behaviour when using OpenMP. I have a simple program below to illustrate the issue. Here, in point form, is what I expect the program to do:
I want to have 2 threads
They both share an integer
The first thread increments the integer
The second thread reads the integer
After incrementing once, an external process must tell the first thread to continue incrementing (via a mutex lock)
The second thread is in charge of unlocking this mutex
As you will see, the counter that is shared between the threads is not updated properly for the second thread. However, if I turn the counter into an integer reference instead, I get the expected result. Here is a simple code example:
#include <mutex>
#include <thread>
#include <chrono>
#include <iostream>
#include <omp.h>

using namespace std;
using std::this_thread::sleep_for;
using std::chrono::milliseconds;

const int sleep_amount = 2000;

int main() {
    int counter = 0; // if I comment this and uncomment the 2 lines below, I get the expected results
    /* int c = 0; */
    /* int &counter = c; */

    omp_lock_t mut;
    omp_init_lock(&mut);

    int counter_1, counter_2;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task default(shared)
        // The first task just increments the counter 3 times
        {
            while (counter < 3) {
                omp_set_lock(&mut);
                counter += 1;
                cout << "increasing: " << counter << endl;
            }
        }

        #pragma omp task default(shared)
        {
            sleep_for(milliseconds(sleep_amount));
            // While sleeping, counter is increased to 1 in the first task
            counter_1 = counter;
            cout << "counter_1: " << counter << endl;
            omp_unset_lock(&mut);

            sleep_for(milliseconds(sleep_amount));
            // While sleeping, counter is increased to 2 in the first task
            counter_2 = counter;
            cout << "counter_2: " << counter << endl;
            omp_unset_lock(&mut);
            // Release one last time to increment the counter to 3
        }
    }

    omp_destroy_lock(&mut);

    cout << "expected: 1, actual: " << counter_1 << endl;
    cout << "expected: 2, actual: " << counter_2 << endl;
    cout << "expected: 3, actual: " << counter << endl;
}
Here is my output:
increasing: 1
counter_1: 0
increasing: 2
counter_2: 0
increasing: 3
expected: 1, actual: 0
expected: 2, actual: 0
expected: 3, actual: 3
gcc version: 9.4.0
Additional discoveries:
If I use OpenMP 'sections' instead of 'tasks', I get the expected result as well. The problem seems to be with 'tasks' specifically
If I use posix semaphores, this problem also persists.
It is not permitted to unlock a mutex from another thread; doing so causes undefined behavior. The general solution in this case is to use semaphores. Condition variables can also help (for real-world use cases). To quote the OpenMP documentation (note that this constraint is shared by nearly all mutex implementations, including pthreads):
A program that accesses a lock that is not in the locked state or that is not owned by the task that contains the call through either routine is non-conforming.
A program that accesses a lock that is not in the uninitialized state through either routine is non-conforming.
Moreover, the two tasks can be executed on the same thread or on different threads. You should not assume anything about their scheduling unless you tell OpenMP to do so with dependencies. Here, it is completely compliant for a runtime to execute the tasks serially. You need to use OpenMP sections so that multiple threads execute different sections. Besides, it is generally considered bad practice to use locks in tasks, as the runtime scheduler is not aware of them.
Finally, you do not need a lock in this case: an atomic operation is sufficient. Fortunately, OpenMP supports atomic operations (as does C++).
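As a rough sketch of those two suggestions combined (sections instead of tasks, atomics instead of the lock) - this is my rewrite, not the original poster's code, and it deliberately drops the lock-based handshake, so the second section may observe any intermediate value:
#include <cstdio>
#include <omp.h>

int main() {
    int counter = 0;

    #pragma omp parallel sections num_threads(2) shared(counter)
    {
        #pragma omp section
        {
            for (int i = 0; i < 3; i++) {
                #pragma omp atomic update
                counter += 1; // lock-free atomic increment
            }
        }
        #pragma omp section
        {
            int snapshot;
            #pragma omp atomic read
            snapshot = counter; // atomic read; any value 0..3 is possible here
            std::printf("observed: %d\n", snapshot);
        }
    }

    std::printf("final: %d\n", counter); // always 3 after the parallel region
    return 0;
}
The final value is always 3 because the atomic updates cannot be lost, even though the snapshot in the second section is unordered with respect to them.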
Additional notes
Note that locks guarantee the consistency of memory accesses across multiple threads thanks to memory barriers. Indeed, an unlock operation on a mutex causes a release memory barrier that makes writes visible to other threads. A lock from another thread performs an acquire memory barrier that forces reads to be done after the lock. When locks/unlocks are not used correctly, memory accesses are no longer safe, and this can cause some variables not to be updated from other threads, for example. More generally, this also tends to create race conditions. So, put shortly: don't do that.
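To make the release/acquire pairing concrete, here is a minimal standalone C++11 sketch (my example, independent of the OpenMP code): the release store publishes the plain write to payload, and the acquire load in the other thread is guaranteed to see it:
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain data, published via the flag below
std::atomic<bool> ready{false};

int main() {
    std::thread producer([] {
        payload = 42;                                  // write before the release
        ready.store(true, std::memory_order_release);  // release: publishes payload
    });
    std::thread consumer([] {
        while (!ready.load(std::memory_order_acquire)) // acquire: pairs with the release
            ;                                          // spin until published
        assert(payload == 42);                         // guaranteed to see the write
    });
    producer.join();
    consumer.join();
    return 0;
}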

weird pthread_mutex_t behavior

Consider the following piece of code:
#include <pthread.h>
#include <iostream>
using namespace std;

int sharedIndex = 10;
pthread_mutex_t mutex;

void* foo(void* arg)
{
    while (sharedIndex >= 0)
    {
        pthread_mutex_lock(&mutex);
        cout << sharedIndex << endl;
        sharedIndex--;
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

int main() {
    pthread_t p1;
    pthread_t p2;
    pthread_t p3;

    pthread_create(&p1, NULL, foo, NULL);
    pthread_create(&p2, NULL, foo, NULL);
    pthread_create(&p3, NULL, foo, NULL);

    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    pthread_join(p3, NULL);
    return 0;
}
I simply created three pthreads and gave them all the same function foo, in the hope that each thread, in turn, would print and decrement sharedIndex.
But this is the output:
10
9
8
7
6
5
4
3
2
1
0
-1
-2
I don't understand why the process doesn't stop when sharedIndex reaches 0.
sharedIndex is protected by a mutex; how come it's accessed after it became 0? Aren't the threads supposed to skip directly to return NULL;?
EDIT
In addition, it seems that only the first thread decrements sharedIndex.
Why doesn't every thread decrement the shared resource in its turn?
Here's the output after a fix:
Current thread: 140594495477504
10
Current thread: 140594495477504
9
Current thread: 140594495477504
8
Current thread: 140594495477504
7
Current thread: 140594495477504
6
Current thread: 140594495477504
5
Current thread: 140594495477504
4
Current thread: 140594495477504
3
Current thread: 140594495477504
2
Current thread: 140594495477504
1
Current thread: 140594495477504
0
Current thread: 140594495477504
Current thread: 140594478692096
Current thread: 140594487084800
I want all of the threads to decrement the shared resource; meaning, on every context switch, a different thread will access the resource and do its thing.
This program's behaviour is undefined.
You have not initialized the mutex. You need to either call pthread_mutex_init or statically initialize it:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
You read this variable outside the critical section:
while(sharedIndex >= 0)
That means you could read a garbage value while another thread is updating it. You should not read the shared variable until you have locked the mutex and have exclusive access to it.
Edit:
it seems that only the first thread decrements the sharedIndex
That's because of the undefined behaviour. Fix the problems above and you should see other threads run.
With your current code the compiler is allowed to assume that the sharedIndex is never updated by other threads, so it doesn't bother re-reading it, but just lets the first thread run ten times, then the other two threads run once each.
Meaning, on every context switch, a different thread will access the resource and do its thing.
There is no guarantee that pthread mutexes behave fairly. If you want to guarantee a round-robin behaviour where each thread runs in turn then you will need to impose that yourself, e.g. by having another shared variable (and maybe a condition variable) that says which thread's turn it is to run, and blocking the other threads until it is their turn.
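A minimal sketch of that turn-taking idea (the names fairFoo, turn and NUM_THREADS are mine, not from the question):
#include <pthread.h>
#include <cstdint>
#include <iostream>

const int NUM_THREADS = 3;
int sharedIndex = 10;
int turn = 0; // index of the thread allowed to decrement next
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void* fairFoo(void* arg)
{
    int me = (int)(intptr_t)arg;
    for (;;)
    {
        pthread_mutex_lock(&mutex);
        while (turn != me && sharedIndex >= 0)
            pthread_cond_wait(&cond, &mutex); // sleep until it is my turn
        if (sharedIndex < 0) // work finished: wake everyone else and exit
        {
            pthread_cond_broadcast(&cond);
            pthread_mutex_unlock(&mutex);
            return NULL;
        }
        std::cout << "thread " << me << ": " << sharedIndex-- << std::endl;
        turn = (me + 1) % NUM_THREADS; // pass the turn to the next thread
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&mutex);
    }
}

int main()
{
    pthread_t t[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&t[i], NULL, fairFoo, (void*)(intptr_t)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
Note that this serializes the threads completely; round-robin fairness and parallel speedup are mutually exclusive here.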
The threads will be hanging out on pthread_mutex_lock(&mutex); waiting to get the lock. Once a thread decrements the value to 0 and releases the lock, the next thread waiting at the lock will then go about its business (making the value -1), and the same for the next thread (making the value -2).
You need to alter your logic on checking value and locking the mutex.
int sharedIndex = 10;
pthread_mutex_t mutex;

void* foo(void* arg)
{
    while (sharedIndex >= 0)
    {
        pthread_mutex_lock(&mutex);
        cout << sharedIndex << endl;
        sharedIndex--;
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}
According to this code, sharedIndex is the shared resource for all the threads.
Thus each access to it (both read and write) should be wrapped by the mutex.
Otherwise, consider the situation where all the threads sample sharedIndex simultaneously while its value is 1: all the threads then enter the while loop, and each one decreases sharedIndex by one, leaving it at -2 at the end.
EDIT
Possible fix (as one of the possible options):
bool is_positive;
do
{
    pthread_mutex_lock(&mutex);
    is_positive = (sharedIndex >= 0);
    if (is_positive)
    {
        cout << sharedIndex << endl;
        sharedIndex--;
    }
    pthread_mutex_unlock(&mutex);
} while (is_positive);
EDIT2
Note that you must initialize the mutex:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

Issue with POSIX thread synchronization and/or pthread_create() argument passing

I am trying to create a number of different threads that are required to wait for all of the threads to be created before they can perform any actions. This is a smaller part of a larger program; I am just trying to take it in steps. As each thread is created, it is immediately blocked by a semaphore. After all of the threads have been created, I loop through and release all the threads. I then want each thread to print out its thread number, to verify that they all waited. I only allow one thread to print at a time using another semaphore.
The issue I'm having is that although I create threads #1-10, a thread prints that it is #11. Also, a few threads say they have the same number as another one. Is the error in how I pass the thread ID, or is it somehow in my synchronization?
Here is relevant code:
//Initialize semaphore to 0. Then each time a thread is spawned it will call
//semWait() making the value negative and blocking that thread. Once all of the
//threads are created, semSignal() will be called to release each of the threads
sem_init(&threadCreation, 0, 0);
sem_init(&tester, 0, 1);

//Spawn all of the opener threads, 1 for each valve
pthread_t threads[T_Valve_Numbers];
int check;

//Loop starts at 1 instead of the standard 0 so that numbering of valves
//is somewhat more logical.
for (int i = 1; i <= T_Valve_Numbers; i++)
{
    cout << "Creating thread: " << i << endl;
    check = pthread_create(&threads[i], NULL, Valve_Handler, (void*)&i);
    if (check)
    {
        cout << "Couldn't create thread " << i << " Error: " << check << endl;
        exit(-1);
    }
}

//Release all of the blocked threads now that they have all been created
for (int i = 1; i <= T_Valve_Numbers; i++)
{
    sem_post(&threadCreation);
}

//Make the main process wait for all the threads before terminating
for (int i = 1; i <= T_Valve_Numbers; i++)
{
    pthread_join(threads[i], NULL);
}
return 0;
}

void* Valve_Handler(void* threadNumArg)
{
    int threadNum = *((int*)threadNumArg);
    sem_wait(&threadCreation); //Blocks the thread until all are spawned

    sem_wait(&tester);
    cout << "I'm thread " << threadNum << endl;
    sem_post(&tester);
}
When T_Valve_Numbers = 10, some sample output is:
Creating thread: 1
Creating thread: 2
Creating thread: 3
Creating thread: 4
Creating thread: 5
Creating thread: 6
Creating thread: 7
Creating thread: 8
Creating thread: 9
Creating thread: 10
I'm thread 11 //Where is 11 coming from?
I'm thread 8
I'm thread 3
I'm thread 4
I'm thread 10
I'm thread 9
I'm thread 7
I'm thread 3
I'm thread 6
I'm thread 6 //How do I have 2 6's?
OR
Creating thread: 1
Creating thread: 2
Creating thread: 3
Creating thread: 4
Creating thread: 5
Creating thread: 6
Creating thread: 7
Creating thread: 8
Creating thread: 9
Creating thread: 10
I'm thread 11
I'm thread 8
I'm thread 8
I'm thread 4
I'm thread 4
I'm thread 8
I'm thread 10
I'm thread 3
I'm thread 9
I'm thread 8 //Now '8' showed up 3 times
"I'm thread..." is printing 10 times so it appears like my semaphore is letting all of the threads through. I'm just not sure why their thread number is messed up.
check=pthread_create(&threads[i], NULL, Valve_Handler,(void*)&i);
                                                             ^^
You're passing the thread start function the address of i. i is changing all the time in the main loop, unsynchronized with the thread functions. You have no idea what the value of i will be once the thread function gets around to actually dereferencing that pointer.
Pass in an actual integer rather than a pointer to the local variable if that's the only thing you'll ever need to pass. Otherwise, create a simple struct with all the parameters, build an array of those (one for each thread) and pass each thread a pointer to its own element.
Example: (assuming your thread index never overflows an int)
#include <stdint.h> // for intptr_t
...
check = pthread_create(..., (void*)(intptr_t)i);
...
int threadNum = (intptr_t)threadNumArg;
Better/more flexible/doesn't require intptr_t (which might not exist) example:
struct thread_args {
    int thread_index;
    int thread_color;
    // ...
};
// ...
struct thread_args args[T_Valve_Numbers];
for (int i = 0; i < T_Valve_Numbers; i++) {
    args[i].thread_index = i;
    args[i].thread_color = ...;
}
// ...
check = pthread_create(..., &(args[i-1])); // or loop from 0, less surprising
A word of caution about this, though: that array of thread arguments needs to stay alive at least as long as the threads will use it. In some situations you might be better off with a dynamic allocation for each structure, passing that pointer (and its ownership) to the thread function (especially if you're going to detach the threads rather than joining them).
If you're going to join the threads at some point, keep those arguments around the same way you keep your pthread_t structures around. (And if you're creating and joining in the same function, the stack is usually fine for that.)
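For completeness, a sketch of the dynamic-allocation variant (my code; the semaphore logic from the question is omitted for brevity): each thread takes ownership of its own argument block and frees it, so the lifetime question disappears.
#include <pthread.h>
#include <cstdlib>
#include <iostream>

struct thread_args {
    int thread_index;
};

void* Valve_Handler(void* arg) {
    thread_args* a = static_cast<thread_args*>(arg);
    std::cout << "I'm thread " << a->thread_index << std::endl;
    delete a;  // the thread owns its argument block and frees it when done
    return NULL;
}

int main() {
    const int T_Valve_Numbers = 10;
    pthread_t threads[T_Valve_Numbers];

    for (int i = 0; i < T_Valve_Numbers; i++) {
        thread_args* a = new thread_args;  // one allocation per thread
        a->thread_index = i + 1;           // keep the question's 1-based numbering
        if (pthread_create(&threads[i], NULL, Valve_Handler, a) != 0) {
            delete a;                      // creation failed: reclaim ownership
            std::exit(-1);
        }
    }
    for (int i = 0; i < T_Valve_Numbers; i++)
        pthread_join(threads[i], NULL);
    return 0;
}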

How do I make threads run sequentially instead of concurrently?

For example, I want each thread not to start running until the previous one has completed. Is there a flag, something like thread.isRunning()?
#include <iostream>
#include <vector>
#include <thread>
using namespace std;

void hello() {
    cout << "thread id: " << this_thread::get_id() << endl;
}

int main() {
    vector<thread> threads;

    for (int i = 0; i < 5; ++i)
        threads.push_back(thread(hello));

    for (thread& thr : threads)
        thr.join();

    cin.get();
    return 0;
}
I know that the threads are meant to run concurrently, but what if I want to control the order?
There is no thread.isRunning(). You need some synchronization primitive to do it.
Consider std::condition_variable for example.
One approachable way is to use std::async. Under the current definition of std::async, the associated state of an operation launched by std::async can cause the returned std::future's destructor to block until the operation is complete. This can limit composability and result in code that appears to run in parallel but actually runs sequentially:
{
    std::async(std::launch::async, []{ hello(); });
    std::async(std::launch::async, []{ hello(); }); // does not run until hello() completes
}
If we need the second thread to start running only after the first one has completed, is a thread really needed?
As a solution, you could try setting a global flag: set its value in the first thread, and check the flag before starting the second thread.
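One caveat with that approach: a plain global bool written by one thread and read by another is a data race, so a sketch of the flag idea should at least use std::atomic (my code, not a definitive pattern - a condition variable avoids the busy-wait):
#include <atomic>
#include <iostream>
#include <thread>

std::atomic<bool> first_done{false};

int main() {
    std::thread t1([] {
        std::cout << "first thread id: " << std::this_thread::get_id() << std::endl;
        first_done = true;   // signal completion of the first thread
    });
    std::thread t2([] {
        while (!first_done)  // busy-wait until the first thread finishes
            std::this_thread::yield();
        std::cout << "second thread id: " << std::this_thread::get_id() << std::endl;
    });
    t1.join();
    t2.join();
    return 0;
}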
You can't simply control the order by saying "first thread 1, then thread 2, ..."; you will need to make use of synchronization (i.e. std::mutex and condition variables such as std::condition_variable_any).
You can create events so as to block one thread until a certain event happens.
See cppreference for an overview of threading-mechanisms in C++-11.
You will need to use a semaphore or a lock.
If you initialize the semaphore to the value 0:
Call wait after thread.start() and call signal/release at the end of the thread execution function (e.g. the run function in Java, an OnExit function, etc.).
That way the main thread will keep waiting until the thread in the loop has completed its execution.
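A minimal POSIX-semaphore sketch of that wait/signal pattern (my code, written against the question's hello() example; the same structure applies with any semaphore API):
#include <semaphore.h>
#include <pthread.h>
#include <iostream>
#include <thread>

sem_t done;  // initialized to 0: main blocks until a thread signals

void* hello(void*) {
    std::cout << "thread id: " << std::this_thread::get_id() << std::endl;
    sem_post(&done);  // signal completion at the end of the thread function
    return NULL;
}

int main() {
    sem_init(&done, 0, 0);
    pthread_t t[5];
    for (int i = 0; i < 5; i++) {
        pthread_create(&t[i], NULL, hello, NULL);
        sem_wait(&done);  // block until this thread has finished its work
    }
    for (int i = 0; i < 5; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&done);
    return 0;
}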
Task-based parallelism can achieve this, but C++ does not currently offer a task model as part of its threading libraries. If you have TBB or PPL you can use their task-based facilities.
I think you can achieve this by using std::mutex and std::condition_variable from C++11. To be able to run the threads sequentially, an array of booleans is used: when a thread is done doing its work, it writes true at its index of the array.
For example:
#include <condition_variable>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <thread>
using namespace std;

mutex mtx;
condition_variable cv;
bool ids[10] = { false };  // ids[i] becomes true once thread i has finished

void shared_method(int id) {
    unique_lock<mutex> lock(mtx);
    if (id != 0) {
        while (!ids[id - 1]) {  // wait until the previous thread is done
            cv.wait(lock);
        }
    }
    int delay = rand() % 4;
    cout << "Thread " << id << " will finish in " << delay << " seconds." << endl;
    this_thread::sleep_for(chrono::seconds(delay));
    ids[id] = true;
    cv.notify_all();
}

void test_condition_variable() {
    thread threads[10];
    for (int i = 0; i < 10; ++i) {
        threads[i] = thread(shared_method, i);
    }
    for (thread& t : threads) {
        t.join();
    }
}
Output:
Thread 0 will finish in 3 seconds.
Thread 1 will finish in 1 seconds.
Thread 2 will finish in 1 seconds.
Thread 3 will finish in 2 seconds.
Thread 4 will finish in 2 seconds.
Thread 5 will finish in 0 seconds.
Thread 6 will finish in 0 seconds.
Thread 7 will finish in 2 seconds.
Thread 8 will finish in 3 seconds.
Thread 9 will finish in 1 seconds.

pthread_cond_wait strange behaviour

I'm trying to solve the Dining Philosophers problem using C++.
The code is compiled with g++ -lpthread.
The entire solution is on the philosophers github. The repository contains two cpp files: main.cpp and philosopher.cpp. "Main.cpp" creates the mutex variable, a semaphore, 5 condition variables, 5 forks, and starts the philosophers. The semaphore is used only to synchronize the start of the philosophers. The other parameters are passed to the philosophers to solve the problem. "Philosopher.cpp" contains the solution for the given problem, but after a few steps a deadlock occurs.
The deadlock occurs when philosopher 0 is eating and philosopher 1 (next to him) wants to take the forks. Then philosopher 1 has taken the mutex and won't give it back until philosopher 0 puts his forks down. Philosopher 0 can't put his forks down because of the taken mutex, so we have a deadlock. The problem is in the Philosopher::take_fork method: the call to pthread_cond_wait(a, b) isn't releasing mutex b. I can't figure out why.
// Taking forks. If either the left or the right fork is taken, wait.
void Philosopher::take_fork(){
    pthread_mutex_lock(&mon);
    std::cout << "Philosopher " << id << " is waiting on forks" << std::endl;
    while (!fork[id] || !fork[(id + 1) % N])
        pthread_cond_wait(cond + id, &mon);
    fork[id] = fork[(id + 1) % N] = false;
    std::cout << "Philosopher " << id << " is eating" << std::endl;
    pthread_mutex_unlock(&mon);
}
Please refer to this code for the rest.
Your call to pthread_cond_wait() is fine, so the problem must be elsewhere. You have three bugs that I can see:
Firstly, in main() you are only initialising the first condition variable in the array. You need to initialise all N condition variables:
for (int i = 0; i < N; i++) {
    fork[i] = true;
    pthread_cond_init(&cond[i], NULL);
}
pthread_mutex_init(&mon, NULL);
Secondly, in put_fork() you have an incorrect calculation for one of the condition variables to signal:
pthread_cond_signal(cond + (id-1)%N); /* incorrect */
When id is equal to zero, (id - 1) % N is equal to -1, so this will try to signal cond - 1, which does not point at a condition variable (it's possible that this pointer actually corrupts your mutex, since it might well be placed directly before cond on the stack). The calculation you actually want is:
pthread_cond_signal(cond + (id + N - 1) % N);
The third bug isn't the cause of your deadlock, but you shouldn't call srand(time(NULL)) every time you call rand(); just call it once, at the start of main().
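For context, here is what the corrected put_fork could look like with that calculation applied (reconstructed from the snippets in the question and answer; I have not copied it from the linked repository):
// Putting the forks down and waking the two neighbours that may be blocked on them.
void Philosopher::put_fork(){
    pthread_mutex_lock(&mon);
    fork[id] = fork[(id + 1) % N] = true;          // release both forks
    pthread_cond_signal(cond + (id + N - 1) % N);  // wake the left neighbour (corrected index)
    pthread_cond_signal(cond + (id + 1) % N);      // wake the right neighbour
    std::cout << "Philosopher " << id << " finished eating" << std::endl;
    pthread_mutex_unlock(&mon);
}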