I ran into a strange phenomenon in OpenMP with shared memory and the print function.
I reproduced this problem in both C++ and Fortran.
In C++:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

int main (int argc, char *argv[])
{
    int i = 1;
    #pragma omp parallel sections shared(i)
    {
        #pragma omp section
        {
            while (true) {
                i = 1;
                printf("thread 1: %i\n", i);
            }
        }
        #pragma omp section
        {
            while (true) {
                i = i - 1000;
                printf("thread 2: %i\n", i);
            }
        }
    }
}
This code is quite simple and the expected result is something like this:
thread 1: 1
thread 2: -999
thread 1: 1
thread 2: -999
thread 2: -1999
thread 1: 1
However, the result I actually get is this:
thread 1: 1
thread 2: -1726999
thread 2: -1727999
thread 2: -1728999
thread 2: -1729999
thread 2: -1730999
thread 2: -1731999
thread 2: -1732999
This is confusing; it looks like i is not shared! I tried commenting out this line:
printf("thread 1: %i\n", i);
and got:
thread 2: 1
thread 2: -999
thread 2: 1
thread 2: 1
thread 2: -999
thread 2: 1
It looks fine now.
In Fortran:
OpenMP behaves a little differently in Fortran.
PROGRAM test
    implicit none
    integer*8 i
    i = 1
!$OMP parallel sections shared(i)
!$OMP section
    do
        i = 1
        print *, "thread 1", i
        !call sleep(1)
    end do
!$OMP section
    do
        i = i - 1000
        print *, "thread 2", i
        !call sleep(1)
    end do
!$OMP end parallel sections
END PROGRAM
This code leads to the same problem as above, but if I comment out thread 1's print, the problem is still there.
I have to add the sleep subroutine (the commented lines) to get the expected result.
Does anyone know the reason?
Another question: can a variable be modified in one thread at the same time as it is being read in another thread?
You are modifying a shared variable from more than one thread without synchronization. This is known as a data race. The result of your program is unspecified - anything can happen. The same applies if you are writing to a variable in one thread and reading from another without synchronization.
See section 1.4.1 of the OpenMP 4.0 standard for more information.
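For illustration, here is a minimal sketch of how the section bodies from the question could be synchronized so that the update and the print happen atomically with respect to each other; using an unnamed critical section is just one option, and the bounded loops are only there so the example terminates:

#include <omp.h>
#include <stdio.h>

int main()
{
    int i = 1;
    #pragma omp parallel sections shared(i)
    {
        #pragma omp section
        {
            for (int n = 0; n < 5; n++) {
                // Unnamed critical sections share one lock, so the write
                // and the print cannot interleave with the other section.
                #pragma omp critical
                {
                    i = 1;
                    printf("thread 1: %i\n", i);
                }
            }
        }
        #pragma omp section
        {
            for (int n = 0; n < 5; n++) {
                #pragma omp critical
                {
                    i = i - 1000;
                    printf("thread 2: %i\n", i);
                }
            }
        }
    }
}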
Related
I do not know if this is documented anywhere (if so, I would love a reference to it), but I have found some unexpected behaviour when using OpenMP. I have a simple program below to illustrate the issue. Here, in point form, is what I expect the program to do:
I want to have 2 threads
They both share an integer
The first thread increments the integer
The second thread reads the integer
After incrementing once, an external process must tell the first thread to continue incrementing (via a mutex lock)
The second thread is in charge of unlocking this mutex
As you will see, the counter which is shared between the threads is not altered properly for the second thread. However, if I turn the counter into an integer reference instead, I get the expected result. Here is a simple code example:
#include <mutex>
#include <thread>
#include <chrono>
#include <iostream>
#include <omp.h>

using namespace std;
using std::this_thread::sleep_for;
using std::chrono::milliseconds;

const int sleep_amount = 2000;

int main() {
    int counter = 0; // if I comment this and uncomment the 2 lines below, I get the expected results
    /* int c = 0; */
    /* int &counter = c; */

    omp_lock_t mut;
    omp_init_lock(&mut);

    int counter_1, counter_2;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task default(shared)
        // The first task just increments the counter 3 times
        {
            while (counter < 3) {
                omp_set_lock(&mut);
                counter += 1;
                cout << "increasing: " << counter << endl;
            }
        }

        #pragma omp task default(shared)
        {
            sleep_for(milliseconds(sleep_amount));
            // While sleeping, counter is increased to 1 in the first task
            counter_1 = counter;
            cout << "counter_1: " << counter << endl;
            omp_unset_lock(&mut);

            sleep_for(milliseconds(sleep_amount));
            // While sleeping, counter is increased to 2 in the first task
            counter_2 = counter;
            cout << "counter_2: " << counter << endl;
            omp_unset_lock(&mut);
            // Release one last time to increment the counter to 3
        }
    }

    omp_destroy_lock(&mut);

    cout << "expected: 1, actual: " << counter_1 << endl;
    cout << "expected: 2, actual: " << counter_2 << endl;
    cout << "expected: 3, actual: " << counter << endl;
}
Here is my output:
increasing: 1
counter_1: 0
increasing: 2
counter_2: 0
increasing: 3
expected: 1, actual: 0
expected: 2, actual: 0
expected: 3, actual: 3
gcc version: 9.4.0
Additional discoveries:
If I use OpenMP 'sections' instead of 'tasks', I get the expected result as well. The problem seems to be with 'tasks' specifically.
If I use POSIX semaphores, the problem also persists.
It is not permitted to unlock a mutex from another thread; doing so causes undefined behavior. The general solution in this case is to use semaphores. Condition variables can also help (for real-world use cases). To quote the OpenMP documentation (note that this constraint is shared by nearly all mutex implementations, including pthreads):
A program that accesses a lock that is not in the locked state or that is not owned by the task that contains the call through either routine is non-conforming.
A program that accesses a lock that is not in the uninitialized state through either routine is non-conforming.
Moreover, the two tasks can be executed on the same thread or on different threads. You should not assume anything about their scheduling unless you tell OpenMP to enforce it with dependencies. Here, it is completely compliant for a runtime to execute the tasks serially. You need to use OpenMP sections if you want multiple threads to execute different blocks of code. Besides, it is generally considered bad practice to use locks in tasks, as the runtime scheduler is not aware of them.
Finally, you do not need a lock in this case: an atomic operation is sufficient. Fortunately, OpenMP supports atomic operations (as does C++).
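As a minimal sketch of that suggestion (the section layout and the bounded loops are illustrative assumptions, not the only way to structure it), the shared counter can be updated and read with OpenMP atomics instead of a lock:

#include <cstdio>
#include <omp.h>

int main() {
    int counter = 0;

    #pragma omp parallel sections num_threads(2) default(shared)
    {
        #pragma omp section
        {
            for (int step = 0; step < 3; ++step) {
                #pragma omp atomic update   // race-free read-modify-write
                counter += 1;
            }
        }
        #pragma omp section
        {
            for (int step = 0; step < 3; ++step) {
                int snapshot;
                #pragma omp atomic read     // race-free read of the shared value
                snapshot = counter;
                std::printf("observed: %d\n", snapshot);
            }
        }
    }
    std::printf("final: %d\n", counter);
}

The snapshots may lag behind the updates because nothing orders the two sections; the point is only that the accesses themselves are no longer a data race.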
Additional notes
Note that locks guarantee the consistency of memory accesses across threads thanks to memory barriers. An unlock operation on a mutex performs a release memory barrier that makes writes visible to other threads, and a subsequent lock on another thread performs an acquire memory barrier that forces reads to happen after the lock. When locks/unlocks are not used correctly, memory accesses are no longer safe, which can, for example, leave a variable looking stale from other threads. More generally, this also tends to create race conditions. Put shortly: don't do that.
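For contrast, here is a minimal sketch of the lock discipline the quoted rules require: each thread only unsets a lock it has set itself, so the release/acquire pairing described above applies (the counter work inside the lock is just a placeholder):

#include <cstdio>
#include <omp.h>

int main() {
    omp_lock_t mut;
    omp_init_lock(&mut);
    int counter = 0;

    #pragma omp parallel num_threads(2) default(shared)
    {
        for (int step = 0; step < 3; ++step) {
            omp_set_lock(&mut);          // acquire: this thread now owns the lock
            counter += 1;                // protected read-modify-write
            std::printf("thread %d saw %d\n", omp_get_thread_num(), counter);
            omp_unset_lock(&mut);        // release: unset by the same thread that set it
        }
    }
    omp_destroy_lock(&mut);
}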
I'm writing some code in C++ using OpenMP to parallelize some chunks. I ran into some strange behavior that I can't quite explain. I've rewritten my code so that it replicates the issue minimally.
First, here is a function I wrote that is to be run in a parallel region.
#include <cstdio>
#include <omp.h>

void foo()
{
    #pragma omp for
    for (int i = 0; i < 3; i++)
    {
        #pragma omp critical
        printf("Hello %d from thread %d.\n", i, omp_get_thread_num());
    }
}
Then here is my whole program.
int main()
{
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        for (int i = 0; i < 2; i++)
        {
            foo();
            #pragma omp critical
            printf("%d\n", i);
        }
    }
    return 0;
}
When I compile and run this code (with g++ -std=c++17), I get the following output on the terminal:
Hello 0 from thread 0.
Hello 1 from thread 1.
Hello 2 from thread 2.
0
0
Hello 2 from thread 2.
Hello 1 from thread 1.
0
Hello 0 from thread 0.
0
1
1
1
1
i is a private variable. I would expect the function foo to be run twice per thread, so I would expect to see eight "Hello %d from thread %d.\n" statements in the terminal, just like the eight numbers printed when printing i. So what gives here? Why does OMP behave so differently within the same loop?
That is because #pragma omp for is a worksharing construct: it distributes the loop iterations among the threads, so the number of threads used does not matter in this respect, only the total number of loop iterations (2*3 = 6).
If you use omp_set_num_threads(1); you also see 6 outputs. If you use more threads than loop iterations, some threads will be idle in the inner loop, but you still see exactly 6 outputs.
On the other hand, if you remove the #pragma omp for line, you will see (number of threads)*2*3 (= 24) outputs.
From the documentation of omp parallel:
Each thread in the team executes all statements within a parallel region except for work-sharing constructs.
Emphasis mine. Since the omp for in foo is a work-sharing construct, it is only executed once per outer iteration, no matter how many threads run the parallel block in main.
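To make the contrast concrete, here is a minimal sketch of the variant without the worksharing directive (foo_replicated is a name introduced here just for illustration); every thread now runs the whole inner loop itself, so with 4 threads and 2 outer iterations you would expect 4*2*3 = 24 lines:

#include <cstdio>
#include <omp.h>

// Same as foo, but without the worksharing "#pragma omp for":
// every thread that calls it runs all 3 iterations on its own.
void foo_replicated()
{
    for (int i = 0; i < 3; i++)
    {
        #pragma omp critical
        printf("Hello %d from thread %d.\n", i, omp_get_thread_num());
    }
}

int main()
{
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        for (int i = 0; i < 2; i++)
        {
            foo_replicated();   // 4 threads * 2 calls * 3 iterations = 24 lines
        }
    }
    return 0;
}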
In C++ I would like two for loops to execute at the same time, without one waiting for the other to go first or to finish.
I would like the two (or more) for loops to finish in the same time it would take a single loop of the same size to finish.
I know this has been asked and answered, but not with an example this simple. I'm hoping to solve this specific problem. I tried combinations of pragma omp examples and couldn't get the result I wanted.
#include <iostream>
using namespace std;

#define N 5

int main(void) {
    int i;
    for (i = 0; i < N; i++) {
        cout << "This is line ONE \n";
    }

    #pragma omp parallel
    #pragma omp for
    for (i = 0; i < N; i++) {
        cout << "This is line TWO \n";
    }
}
Compiling
$ g++ parallel.cpp -fopenmp && ./a.out
The output of the code is this, in the time it takes to run two loops...
This is line ONE
This is line ONE
This is line ONE
This is line ONE
This is line ONE
This is line TWO
This is line TWO
This is line TWO
This is line TWO
This is line TWO
The output I would like is this
They don't have to print one after the other like this, but I would think they would if they were both getting to the print part of the loops at the same times. What I really need is for the loops to start and finish at the same time (with the loops being equal).
This is line ONE
This is line TWO
This is line ONE
This is line TWO
This is line ONE
This is line TWO
This is line ONE
This is line TWO
This is line ONE
This is line TWO
There's this Q&A here, but I don't quite understand the undeclared foo and the //do stuff with item parts. What kinda stuff? What item? I have not been able to extrapolate from examples online to make what I need happen.
As already mentioned in the comments, OpenMP may not be the best solution for this, but if you wish to do it with OpenMP, I suggest the following:
Use sections to start 2 threads, and communicate between the threads using shared variables. The important thing is to use atomic operations to read (#pragma omp atomic read seq_cst) and to write (#pragma omp atomic write seq_cst) these variables. Here is an example:
#pragma omp parallel num_threads(2)
#pragma omp sections
{
    #pragma omp section
    {
        // This is the sensor-controlling part
        while (exit_condition)
        {
            sensor_state = read_sensor();

            // Read the current state of the motor from the other thread
            #pragma omp atomic read seq_cst
            motor_state = shared_motor_state;

            // Based on the motor state and sensor state, send
            // a command to the other thread to control the motor,
            // or wait for the motor to be ready in a loop, etc.
            #pragma omp atomic write seq_cst
            shared_motor_command = /* whatever you wish */;
        }
    }
    #pragma omp section
    {
        // This is the motor-controlling part
        while (exit_condition)
        {
            // Read the motor command from the other thread
            #pragma omp atomic read seq_cst
            motor_command = shared_motor_command;

            // Do whatever you have to based on the motor command, and
            // you can set the state of the motor with the following line
            #pragma omp atomic write seq_cst
            shared_motor_state = /* what you need to pass to the other thread */;
        }
    }
}
I think the issue is that you are not trying to parallelize two loops, but instead you are trying to parallelize the work of one loop. If you added std::cout << "Hello from thread: " << omp_get_thread_num() << "\n"; to your second loop, you would see:
This is line TWO
Hello from thread: 0
This is line TWO
Hello from thread: 1
This is line TWO
Hello from thread: 2
This is line TWO
Hello from thread: 3
This is line TWO
Hello from thread: 0
Depending on the assignment of iterations to threads (with four threads being the default, often the number of cores), the order might vary: for example, (0,1,2,3,0) could be (0,2,3,1,0).
So what happens is that the first loop runs serially, and then the (4, or more/fewer) threads run the second loop in parallel.
The question is whether you REALLY want to use OpenMP to parallelize your code. If so, you could do something like this:
#include <iostream>
#include <omp.h>

int main() {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < 10; i++) {
        int tid = omp_get_thread_num();
        if (tid % 2 == 0) {
            std::cout << "This is line ONE" << "\n";
        } else {
            std::cout << "This is line TWO" << "\n";
        }
    }
    return 0;
}
Here, based on the thread ID, an even thread does task 1 and an odd thread does task 2. But as many other commenters have noted, maybe you should consider using pthreads, depending on the task.
I don't have experience with OpenMP in C++ and I would like to learn how to solve my problem properly. I have 30 files that need to be processed independently by the same function. Each time the function is called, a new output file (out01.txt to out30.txt) is generated to save the results. My machine has 12 processors and I would like to use 10 for this problem.
I need to change my code so that it waits for all 30 files to be processed before executing other routines in C++. At the moment, I'm not able to force my code to wait for the whole OpenMP block to finish before moving on to the second function.
Please find below a draft of my code.
int W = 10;
int i = 1;
ostringstream fileName;
int th_id, nthreads;

omp_set_num_threads(W);
#pragma omp parallel shared(nFiles) private(i, fileName, th_id)
{
    #pragma omp for schedule(static)
    for (i = 1; i <= nFiles; i++)
    {
        th_id = omp_get_thread_num();
        cout << "Th_id: " << th_id << endl;
        // CALCULATION IS PERFORMED HERE FOR EACH FILE
    }
}
// THIS is the point where the program should wait for the whole block to be finished
// Calling the second function ...
Both the "omp for" and "omp parallel" pragmas have an implicit barrier at the end of its scope. Therefore, the code after the parallel section can't be executed until the parallel section has concluded. So your code should run perfectly.
If there is still a problem then it isn't because your code isn't waiting at the end of the parallel region.
Please supply us with more details about what happens during execution of this code. This way we might be able to find the real cause of your problem.
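To illustrate the implicit barrier, here is a minimal sketch of the same structure (processFile and secondFunction are placeholder names standing in for the asker's routines):

#include <cstdio>
#include <omp.h>

// Placeholders for the per-file calculation and the follow-up routine.
void processFile(int fileIndex) { std::printf("processing file %d\n", fileIndex); }
void secondFunction()           { std::printf("all files done\n"); }

int main() {
    const int nFiles = 30;
    omp_set_num_threads(10);

    #pragma omp parallel for schedule(static)
    for (int i = 1; i <= nFiles; i++) {
        processFile(i);
    }   // implicit barrier: every iteration has finished at this point

    secondFunction();   // runs only after the whole loop is done
    return 0;
}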
I'm struggling to set the number of threads to 1 inside a parallel region. I put a barrier so that all threads stop at that point and I can freely set the number of threads to 1 (with no other threads executing). But wherever I placed omp_set_num_threads(1), the reported number of threads was always 3. Is it possible to change the number of threads during runtime? How can I do that?
#include <iostream>
#include <omp.h>
#include <stdio.h>

int main() {
    int num_of_threads;
    std::cin >> num_of_threads;
    omp_set_dynamic(0);

    #pragma omp parallel if(num_of_threads > 1) num_threads(3)
    {
        int t_id = omp_get_thread_num();
        int t_total = omp_get_num_threads();
        printf("Current thread id: %d \n Total number_of_threads: %d \n", t_id, t_total);

        #pragma omp barrier
        #pragma omp single
        {
            omp_set_num_threads(1);
            t_id = omp_get_thread_num();
            t_total = omp_get_num_threads();
            printf("Single section \n Current thread id: %d \n Total number_of_threads: %d \n", t_id, t_total);
        }
    }
}
TL;DR You can't change the number of threads in a parallel region.
Remember that this is a pool of threads which gets forked at the beginning of the parallel region. Inside it they are not even synchronized (unless you tell them to be), so OpenMP would need to terminate some of them at an unknown position - obviously a bad idea.
Your #pragma omp single makes the following code section execute on a single thread, so there is no need to request that via omp_set_num_threads.
BUT it doesn't change your pool; it just tells the runtime to schedule the following section onto one thread, while the rest skip it.
To show this behavior, e.g. for university purposes, I would suggest printing only the thread id in the parallel part and in the single part. That way you can already tell whether it's working or not.
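A minimal sketch of that suggestion might look like this (three threads are requested only to mirror the question):

#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel num_threads(3)
    {
        // Every thread in the pool prints its id...
        printf("parallel part, thread id: %d\n", omp_get_thread_num());

        #pragma omp single
        {
            // ...but only one of them executes the single section.
            printf("single part, thread id: %d\n", omp_get_thread_num());
        }
    }
    return 0;
}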