OpenMP filling array with two threads in series - c++

I have an array. And I need to fill it with two threads each value consequently, using omp_set_lock, and omp_unset_lock. First thread should write first value, then second array should write second value etc. I have no idea how to do that, because, in openmp you cant't explicitly make one thread wait for another. Have any ideas?

Why not try the omp_set_lock/omp_unset_lock functions?
omp_lock_t lock;
omp_init_lock(&lock);
#pragma omp parallel for
bool thread1 = true;
for (int i = 0; i < arr.size(); ++i) {
omp_set_lock(&lock);
if (thread1 == true) {
arr[i] = fromThread1();
thread1 = false;
} else {
arr[i] = fromThread2();
thread1 = true;
}
omp_unset_lock(&lock);
}

Related

using omp threads inside function

I'm trying to play a little with omp threads. Inorder to make "main" function more clean, I want to use omp threads inside function which called by the main function.
Here we have an example:
void main()
{
func();
}
void func()
{
#pragma omp parallel for
for (int i = 0; i < 5; i++)
{
for (int j = 0; j < 5; j++)
{
doSomething();
}
}
}
With complex computations, when running, after thread 0 finishes, the funcion returns while the other threads haven't finish yet. How can I suspend the return until all threads finishes?
Using barrier inside for loop is impossible, so I don't have another idea.

openMP increment add int among threads?

Hye
I'm trying to multithread the function below. I fail to get the counter to be properly shared among OpenMP threads, I tried atomic and int, atomic seem to not be working, neither do INT. Not sure, I'm lost, how can I solve this?
std::vector<myStruct> _myData(100);
int counter;
counter =0
int index;
#pragma omp parallel for private(index)
for (index = 0; index < 500; ++index) {
if (data[index].type == "xx") {
myStruct s;
s.innerData = data[index].rawData
processDataA(s); // processDataA(myStruct &data)
processDataB(s);
_myData[counter++] = s; // each thread should have unique int not going over 100 of initially allocated items in _myData
}
}
Edit. Update bad syntax/missing parts
If you cannot use OpenMP atomic capture, I would try:
std::vector<myStruct> _myData(100);
int counter = 0;
#pragma omp parallel for schedule(dynamic)
for (int index = 0; index < 500; ++index) {
if (data[index].type == "xx") {
myStruct s;
s.innerData = data[index].rawData
processDataA(s);
processDataB(s);
int temp;
#pragma omp critical
temp = counter++;
assert(temp < _myData.size());
_myData[temp] = s;
}
}
Or:
#pragma omp parallel for schedule(dynamic,c)
and experiment with chunk size c.
However, atomics would be likely more efficient than critical sections. There should be some form of atomics supported by your compiler.
Note that your solution is kind of fragile, since it works only if the condition inside the loop is evaluated to true less than 101x. That's why I added assertion into the code. Maybe a better solution:
std::vector<myStruct> _myData;
size_t size = 0;
#pragma omp parallel for reduction(+,size)
for (int index = 0; index < data.size(); ++index)
if (data[index].type == "xx") size++;
v.resize(size);
...
Then, you don't need to care about the vector size and also don't waste memory space.

Locking a part of memory for multithreading

I'm trying to write some code that creates threads that can modify different parts of memory concurrently. I read that a mutex is usually used to lock code, but I'm not sure if I can use that in my situation. Example:
using namespace std;
mutex m;
void func(vector<vector<int> > &a, int b)
{
lock_guard<mutex> lk(m);
for (int i = 0; i < 10E6; i++) { a[b].push_back(1); }
}
int main()
{
vector<thread> threads;
vector<vector<int> > ints(4);
for (int i = 0; i < 10; i++)
{
threads.push_back(thread (func, ref(ints), i % 4));
}
for (int i = 0; i < 10; i++) { threads[i].join(); }
return 0;
}
Currently, the mutex just locks the code inside func, so (I believe) every thread just has to wait until the previous is finished.
I'm trying to get the program to edit the 4 vectors of ints at the same time, but that does realize it has to wait until some other thread is done editing one of those vectors before starting the next.
I think you want the following: (one std::mutex by std::vector<int>)
std::mutex m[4];
void func(std::vector<std::vector<int> > &a, int index)
{
std::lock_guard<std::mutex> lock(m[index]);
for (int i = 0; i < 10E6; i++) {
a[index].push_back(1);
}
}
Have you considered using a semaphore instead of a mutex?
The following questions might help you:
Semaphore Vs Mutex
When should we use mutex and when should we use semaphore
try:
void func(vector<vector<int> > &a, int b)
{
for (int i=0; i<10E6; i++) {
lock_guard<mutex> lk(m);
a[b].push_back(1);
}
}
You only need to lock your mutex while accessing the shared object (a). The way you implemented func means that one thread must finish running the entire loop before the next can start running.

Decreasing number of iterations in OpenMP parallel for

I have a parallel for in a C++ program that has to loop up to some number of iterations. Each iteration computes a possible solution for an algorithm, and I want to exit the loop once I find a valid one (it is ok if a few extra iterations are done). I know the number of iterations should be fixed from the beginning in the parallel for, but since I'm not increasing the number of iterations in the following code, is there any guarantee of that threads check the condition before proceeding with their current iteration?
void fun()
{
int max_its = 100;
#pragma omp parallel for schedule(dynamic, 1)
for(int t = 0; t < max_its; ++t)
{
...
if(some condition)
max_its = t; // valid to make threads exit the for?
}
}
Modifying the loop counter works for most implementations of OpenMP worksharing constructs, but the program will no longer be conforming to OpenMP and there is no guarantee that the program works with other compilers.
Since the OP is OK with some extra iterations, OpenMP cancellation will be the way to go. OpenMP 4.0 introduced the "cancel" construct exactly for this purpose. It will request termination of the worksharing construct and teleport the threads to the end of it.
void fun()
{
int max_its = 100;
#pragma omp parallel for schedule(dynamic, 1)
for(int t = 0; t < max_its; ++t)
{
...
if(some condition) {
#pragma omp cancel for
}
#pragma omp cancellation point for
}
}
Be aware that might there might be a price to pay in terms of performance, but you might want to accept this if the overall performance is better when aborting the loop.
In pre-4.0 implementations of OpenMP, the only OpenMP-compliant solution would be to have an if statement to approach the regular end of the loop as quickly as possible without execution the actual loop body:
void fun()
{
int max_its = 100;
#pragma omp parallel for schedule(dynamic, 1)
for(int t = 0; t < max_its; ++t)
{
if(!some condition) {
... loop body ...
}
}
}
Hope that helps!
Cheers,
-michael
You can't modify max_its as the standard says it must be a loop invariant expression.
What you can do, though, is using a boolean shared variable as a flag:
void fun()
{
int max_its = 100;
bool found = false;
#pragma omp parallel for schedule(dynamic, 1) shared(found)
for(int t = 0; t < max_its; ++t)
{
if( ! found ) {
...
}
if(some condition) {
#pragma omp atomic
found = true; // valid to make threads exit the for?
}
}
}
A logic of this kind may be also implemented with tasks instead of a work-sharing construct. A sketch of the code would be something like the following:
void algorithm(int t, bool& found) {
#pragma omp task shared(found)
{
if( !found ) {
// Do work
if ( /* conditionc*/ ) {
#pragma omp atomic
found = true
}
}
} // task
} // function
void fun()
{
int max_its = 100;
bool found = false;
#pragma omp parallel
{
#pragma omp single
{
for(int t = 0; t < max_its; ++t)
{
algorithm(t,found);
}
} // single
} // parallel
}
The idea is that a single thread creates max_its tasks. Each task will be assigned to a waiting thread. If some of the tasks find a valid solution, then all the others will be informed by the shared variable found.
If some_condition is a logical expression that is "always valid", then you could do:
for(int t = 0; t < max_its && !some_condition; ++t)
That way, it's very clear that !some_condition is required to continue the loop, and there is no need to read the rest of the code to find out that "if some_condition, loop ends"
Otherwise (for example if some_condition is the result of some calculation inside the loop and it's complicated to "move" the some_condition to the for-loop condition, then using break is clearly the right thing to do.

Parallel OpenMP loop with break statement

I know that you cannot have a break statement for an OpenMP loop, but I was wondering if there is any workaround while still the benefiting from parallelism. Basically I have 'for' loop, that loops through the elements of a large vector looking for one element that satisfies a certain condition. However there is only one element that will satisfy the condition so once that is found we can break out of the loop, Thanks in advance
for(int i = 0; i <= 100000; ++i)
{
if(element[i] ...)
{
....
break;
}
}
See this snippet:
volatile bool flag=false;
#pragma omp parallel for shared(flag)
for(int i=0; i<=100000; ++i)
{
if(flag) continue;
if(element[i] ...)
{
...
flag=true;
}
}
This situation is more suitable for pthread.
You could try to manually do what the openmp for loop does, using a while loop:
const int N = 100000;
std::atomic<bool> go(true);
uint give = 0;
#pragma omp parallel
{
uint i, stop;
#pragma omp critical
{
i = give;
give += N/omp_get_num_threads();
stop = give;
if(omp_get_thread_num() == omp_get_num_threads()-1)
stop = N;
}
while(i < stop && go)
{
...
if(element[i]...)
{
go = false;
}
i++;
}
}
This way you have to test "go" each cycle, but that should not matter that much. More important is that this would correspond to a "static" omp for loop, which is only useful if you can expect all iterations to take a similar amount of time. Otherwise, 3 threads may be already finished while one still has halfway to got...
I would probably do (copied a bit from yyfn)
volatile bool flag=false;
for(int j=0; j<=100 && !flag; ++j) {
int base = 1000*j;
#pragma omp parallel for shared(flag)
for(int i = 0; i <= 1000; ++i)
{
if(flag) continue;
if(element[i+base] ...)
{
....
flag=true;
}
}
}
Here is a simpler version of the accepted answer.
int ielement = -1;
#pragma omp parallel
{
int i = omp_get_thread_num()*n/omp_get_num_threads();
int stop = (omp_get_thread_num()+1)*n/omp_get_num_threads();
for(;i <stop && ielement<0; ++i){
if(element[i]) {
ielement = i;
}
}
}
bool foundCondition = false;
#pragma omp parallel for
for(int i = 0; i <= 100000; i++)
{
// We can't break out of a parallel for loop, so this is the next best thing.
if (foundCondition == false && satisfiesComplicatedCondition(element[i]))
{
// This is definitely needed if more than one element could satisfy the
// condition and you are looking for the first one. Probably still a
// good idea even if there can only be one.
#pragma omp critical
{
// do something, store element[i], or whatever you need to do here
....
foundCondition = true;
}
}
}