#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    #pragma omp parallel num_threads(3)
    {
        int tid = omp_get_thread_num();
        printf("Hello world from thread = %d\n", tid);
        if (tid == 0) {
            int nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }
    return 0;
}
I am learning OpenMP, and I don't understand why it runs only one thread when I have specified three threads.
The program output:
Hello world from thread = 0
Number of threads = 1
You need to compile your program with -fopenmp:
g++ a.cc -fopenmp
In Visual Studio, just enable OpenMP support in the project settings. You can refer to https://msdn.microsoft.com/de-de/library/fw509c3b(v=vs.120).aspx
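From the command line, the MSVC equivalent is the /openmp switch (assuming your source file is named a.cpp):
cl /openmp a.cpp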
omp_get_num_threads() returns the total number of threads being used; omp_get_thread_num() returns the current thread's ID.
You should use the former.
Related
So I have a task: I need to make 8 threads and have them write their numbers in reverse order. I know how to make them write in natural order, but I am really confused about the reverse one. Hope anyone can help me!
I don't really understand the purpose of what you are asking, but this works:
#include "omp.h"
#include <iostream>
using namespace std;
int main()
{
#pragma omp parallel
{
int nthreads = omp_get_num_threads();
for(int i=nthreads-1; i>=0; i--)
{
#pragma omp barrier
{
if(i==omp_get_thread_num())
{
#pragma omp critical
cout << "I am thread "<< i <<endl;
}
}
}
}
}
With 8 threads it outputs:
I am thread 7
I am thread 6
I am thread 5
I am thread 4
I am thread 3
I am thread 2
I am thread 1
I am thread 0
What is the performance cost of calling omp_get_thread_num(), compared to looking up the value of a variable?
How can I avoid calling omp_get_thread_num() many times in a simd OpenMP loop?
I can use #pragma omp parallel, but will that make a simd loop?
#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    auto a_size = a.size();
    #pragma omp for simd
    for (int i = 0; i < a_size; ++i) {
        a[i] = omp_get_thread_num();
    }
}
I wouldn't be too worried about the cost of the call, but for code clarity you can do:
#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    auto a_size = a.size();
    #pragma omp parallel
    {
        const auto threadId = omp_get_thread_num();
        #pragma omp for
        for (int i = 0; i < a_size; ++i) {
            a[i] = threadId;
        }
    }
}
As long as you use #pragma omp for (and don't put an extra parallel in there; otherwise each of your n threads will spawn n more threads, which is bad), it will ensure that inside your parallel region the for loop is split up amongst the n threads. Make sure the OpenMP compiler flag is turned on.
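If you specifically want the simd clause from your original snippet as well, the two constructs can be combined inside the parallel region. A minimal sketch (the int cast of the size and the variable names are my own choices; whether the compiler actually vectorizes the body still depends on the loop and on compiler flags):

#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    const int a_size = static_cast<int>(a.size());
    #pragma omp parallel
    {
        const int threadId = omp_get_thread_num(); // queried once per thread
        #pragma omp for simd
        for (int i = 0; i < a_size; ++i) {
            a[i] = threadId;                       // plain variable read inside the loop
        }
    }
}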
I am working in Visual Studio 2012 and trying to use several threads in a very simple hello world example:
#include <omp.h>
#include <stdio.h>

int main() {
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    #pragma omp parallel
    printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
}
But the result I get is:
Hello from thread 0, nthreads 1
Why can't I get 4 threads?
This is an exercise from the OpenMP tutorial at LLNL:
https://computing.llnl.gov/tutorials/openMP/exercise.html
#include "stdafx.h"
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int _tmain(int argc, _TCHAR* argv[])
{
int nthreads, i, tid;
float total;
/*** Spawn parallel region ***/
#pragma omp parallel private(i, tid) // i changed this line
{
/* Obtain thread number */
tid = omp_get_thread_num();
/* Only master thread does this */
if (tid == 0) {
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
printf("Thread %d is starting...\n",tid);
#pragma omp barrier
/* do some work */
total = 0.0;
#pragma omp for schedule(dynamic,10)
for (i=0; i<1000000; i++)
total = total + i*1.0;
printf ("Thread %d is done! Total= %e\n",tid,total);
}
}
The output for this is:
Number of threads = 4
Thread 0 is starting...
Thread 3 is starting...
Thread 2 is starting...
Thread 1 is starting...
Thread 0 is done! Total= 0.000000e+000
Thread 3 is done! Total= 0.000000e+000
Thread 2 is done! Total= 0.000000e+000
Thread 1 is done! Total= 0.000000e+000
which means we have a problem with the variable "total"
Here is my solution; do you think this is the correct way to do it?
#include "stdafx.h"
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int _tmain(int argc, _TCHAR* argv[])
{
int nthreads, i, tid;
float total;
/*** Spawn parallel region ***/
#pragma omp parallel private(total,tid)
{
/* Obtain thread number */
tid = omp_get_thread_num();
total= 0.0;
/* Only master thread does this */
if (tid == 0) {
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
printf("Thread %d is starting...\n",tid);
#pragma omp parallel for schedule(static,10)\
private(i)\
reduction(+:total)
for (i=0; i<1000000; i++)
total = total + i*1.0;
printf ("Thread %d is done! Total= %e\n",tid,total);
} /*** End of parallel region ***/
}
Here is my new output:
Number of threads = 4
Thread 0 is starting...
Thread 1 is starting...
Thread 0 is done! Total= 4.999404e+011
Thread 2 is starting...
Thread 1 is done! Total= 4.999404e+011
Thread 2 is done! Total= 4.999404e+011
Thread 3 is starting...
Thread 3 is done! Total= 4.999404e+011
Yes, you certainly want total to be a thread-private variable. One thing you would presumably do in a real example is reduce the thread-private totals into a single global total at the end (and only let one thread print the result). One way to do that is a simple
#pragma omp atomic
global_total += total;
at the end (there are better ways, though, using reductions).
PS: Loop counters for omp for are by default private, so you actually don't have to explicitly specify that.
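For illustration, here is a minimal sketch of that pattern, with a hypothetical global_total accumulator (a reduction clause on the work-sharing for, as in your own version, is the cleaner alternative):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    float global_total = 0.0f;                /* shared accumulator (hypothetical name) */
    #pragma omp parallel
    {
        float total = 0.0f;                   /* declared inside the region, so thread-private */
        #pragma omp for schedule(static,10)
        for (int i = 0; i < 1000000; i++)
            total = total + i * 1.0f;
        #pragma omp atomic                    /* combine the partial sums safely */
        global_total += total;
    }
    printf("Total = %e\n", global_total);     /* printed once, after the parallel region */
    return 0;
}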
I am writing a simple parallel program in C++ using OpenMP.
I am working on Windows 7 with Microsoft Visual Studio 2010 Ultimate.
I set the project's C/C++ → Language → OpenMP Support property to "Yes (/openmp)" to enable OpenMP.
Here is the code:
#include <iostream>
#include <cstdlib>   // for EXIT_SUCCESS
#include <omp.h>
using namespace std;

double sum;
int i;
int n = 800000000;

int main(int argc, char *argv[])
{
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    sum = 0;
    #pragma omp for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += i / (n / 10);
    cout << "sum=" << sum << endl;
    return EXIT_SUCCESS;
}
But I can't get any speedup by changing the x in omp_set_num_threads(x);
whether or not I use OpenMP, the run time is the same, about 7 seconds.
Does someone know what the problem is?
Your pragma statement is missing the parallel specifier:
#include <iostream>
#include <cstdlib>   // for EXIT_SUCCESS
#include <omp.h>
using namespace std;

double sum;
int i;
int n = 800000000;

int main(int argc, char *argv[])
{
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    sum = 0;
    #pragma omp parallel for reduction(+:sum) // add "parallel"
    for (i = 0; i < n; i++)
        sum += i / (n / 10);
    cout << "sum=" << sum << endl;
    return EXIT_SUCCESS;
}
Sequential:
sum=3.6e+009
2.30071
Parallel:
sum=3.6e+009
0.618365
Here's a version that gets some speedup with Hyper-Threading. I had to increase the number of iterations by 10x and change the data types to long long:
#include <iostream>
#include <cstdlib>
#include <omp.h>
using namespace std;

double sum;
long long i;
long long n = 8000000000;

int main(int argc, char *argv[])
{
    omp_set_dynamic(0);
    omp_set_num_threads(8);
    double start = omp_get_wtime();
    sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += i / (n / 10);
    cout << "sum=" << sum << endl;
    double end = omp_get_wtime();
    cout << end - start << endl;
    system("pause");
    return EXIT_SUCCESS;
}
Threads: 1
sum=3.6e+014
13.0541
Threads: 2
sum=3.6e+010
6.62345
Threads: 4
sum=3.6e+010
3.85687
Threads: 8
sum=3.6e+010
3.285
Apart from the error pointed out by Mystical, you seem to assume that OpenMP can just do magic. At best it can use all the cores on your machine. If you have 2 cores, it may reduce the execution time by a factor of two if you call omp_set_num_threads(np) with np >= 2, but for np much larger than the number of cores the code will be inefficient due to parallelization overheads.
The example from Mystical was apparently run on at least 4 cores with np = 4.
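If you want the thread count to follow the hardware rather than a hard-coded value, you can query it at run time. A small sketch using omp_get_num_procs(), which reports the number of processors available to the program:

#include <iostream>
#include <omp.h>
using namespace std;

int main() {
    int procs = omp_get_num_procs();   // logical processors visible to OpenMP
    omp_set_dynamic(0);
    omp_set_num_threads(procs);        // asking for far more threads than this rarely helps CPU-bound loops
    #pragma omp parallel
    {
        #pragma omp single
        cout << "Running with " << omp_get_num_threads() << " threads" << endl;
    }
}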