#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    #pragma omp parallel num_threads(3)
    {
        int tid = omp_get_thread_num();
        printf("Hello world from thread = %d\n", tid);
        if (tid == 0) {
            int nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }
    return 0;
}
I am learning OpenMP, and I don't understand why it runs only one thread when I have specified three threads.
The program output:
Hello world from thread = 0
Number of threads = 1
You need to compile your program with -fopenmp:
g++ a.cc -fopenmp
In Visual Studio, just enable OpenMP support in the project settings. You can refer to https://msdn.microsoft.com/de-de/library/fw509c3b(v=vs.120).aspx
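From the command line, the MSVC equivalent is the /openmp switch (assuming your source file is named a.cpp):
cl /openmp a.cpp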
omp_get_num_threads() returns the total number of threads being used; omp_get_thread_num() returns the current thread's ID.
You should use the former.
Related
So I have a task: I need to make 8 threads and have them write their numbers in reverse order. I know how to make them write in natural order, but I am really confused about the reverse one. Hope anyone can help me!
I don't really understand the purpose of what you are asking, but this works:
#include "omp.h"
#include <iostream>
using namespace std;
int main()
{
#pragma omp parallel
{
int nthreads = omp_get_num_threads();
for(int i=nthreads-1; i>=0; i--)
{
#pragma omp barrier
{
if(i==omp_get_thread_num())
{
#pragma omp critical
cout << "I am thread "<< i <<endl;
}
}
}
}
}
With 8 threads it outputs:
I am thread 7
I am thread 6
I am thread 5
I am thread 4
I am thread 3
I am thread 2
I am thread 1
I am thread 0
What is the performance cost of calling omp_get_thread_num(), compared to looking up the value of a variable?
How can I avoid calling omp_get_thread_num() many times in a simd OpenMP loop?
I can use #pragma omp parallel, but will that make a simd loop?
#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    auto a_size = a.size();
    #pragma omp for simd
    for (int i = 0; i < a_size; ++i) {
        a[i] = omp_get_thread_num();
    }
}
I wouldn't be too worried about the cost of the call, but for code clarity you can do:
#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    auto a_size = a.size();
    #pragma omp parallel
    {
        const auto threadId = omp_get_thread_num();
        #pragma omp for
        for (int i = 0; i < a_size; ++i) {
            a[i] = threadId;
        }
    }
}
As long as you use #pragma omp for (and don't put an extra parallel in there; otherwise each of your n threads will spawn n more threads, which is bad), it will ensure that inside your parallel region the for loop is split up amongst the n threads. Make sure the OpenMP compiler flag is turned on.
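If you specifically want the simd clause from your original snippet as well, the two constructs can be combined inside the parallel region. A minimal sketch (the int cast of the size and the variable names are my own choices; whether the compiler actually vectorizes the body still depends on the loop and on compiler flags):

#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    const int a_size = static_cast<int>(a.size());
    #pragma omp parallel
    {
        const int threadId = omp_get_thread_num(); // queried once per thread
        #pragma omp for simd
        for (int i = 0; i < a_size; ++i) {
            a[i] = threadId;                       // plain variable read inside the loop
        }
    }
}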
I am working in Visual Studio 2012 and trying to use several threads in a very simple hello world example:
#include <omp.h>
#include <stdio.h>

int main() {
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    #pragma omp parallel
    printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
}
But the result I get is:
Hello from thread 0, nthreads 1
Why can't I get 4 threads?
This is an exercise from the OpenMP tutorial at LLNL:
https://computing.llnl.gov/tutorials/openMP/exercise.html
#include "stdafx.h"
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int _tmain(int argc, _TCHAR* argv[])
{
int nthreads, i, tid;
float total;
/*** Spawn parallel region ***/
#pragma omp parallel private(i, tid) // i changed this line
{
/* Obtain thread number */
tid = omp_get_thread_num();
/* Only master thread does this */
if (tid == 0) {
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
printf("Thread %d is starting...\n",tid);
#pragma omp barrier
/* do some work */
total = 0.0;
#pragma omp for schedule(dynamic,10)
for (i=0; i<1000000; i++)
total = total + i*1.0;
printf ("Thread %d is done! Total= %e\n",tid,total);
}
}
The output for this is:
Number of threads = 4
Thread 0 is starting...
Thread 3 is starting...
Thread 2 is starting...
Thread 1 is starting...
Thread 0 is done! Total= 0.000000e+000
Thread 3 is done! Total= 0.000000e+000
Thread 2 is done! Total= 0.000000e+000
Thread 1 is done! Total= 0.000000e+000
which means we have a problem with the variable "total"
Here is my solution; do you think this is the correct way to do it?
#include "stdafx.h"
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int _tmain(int argc, _TCHAR* argv[])
{
int nthreads, i, tid;
float total;
/*** Spawn parallel region ***/
#pragma omp parallel private(total,tid)
{
/* Obtain thread number */
tid = omp_get_thread_num();
total= 0.0;
/* Only master thread does this */
if (tid == 0) {
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
printf("Thread %d is starting...\n",tid);
#pragma omp parallel for schedule(static,10)\
private(i)\
reduction(+:total)
for (i=0; i<1000000; i++)
total = total + i*1.0;
printf ("Thread %d is done! Total= %e\n",tid,total);
} /*** End of parallel region ***/
}
Here is my new output:
Number of threads = 4
Thread 0 is starting...
Thread 1 is starting...
Thread 0 is done! Total= 4.999404e+011
Thread 2 is starting...
Thread 1 is done! Total= 4.999404e+011
Thread 2 is done! Total= 4.999404e+011
Thread 3 is starting...
Thread 3 is done! Total= 4.999404e+011
Yes, you certainly want total to be a thread-private variable. One thing you would presumably do in a real example is reduce the thread-private totals into a single global total at the end (and only let one thread print the result). One way to do that is a simple
#pragma omp atomic
global_total += total;
at the end (there are better ways, though, using reductions).
PS: Loop counters for omp for are by default private, so you actually don't have to explicitly specify that.
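For illustration, here is a minimal sketch of that pattern, with a hypothetical global_total accumulator (a reduction clause on the work-sharing for, as in your own version, is the cleaner alternative):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    float global_total = 0.0f;                /* shared accumulator (hypothetical name) */
    #pragma omp parallel
    {
        float total = 0.0f;                   /* declared inside the region, so thread-private */
        #pragma omp for schedule(static,10)
        for (int i = 0; i < 1000000; i++)
            total = total + i * 1.0f;
        #pragma omp atomic                    /* combine the partial sums safely */
        global_total += total;
    }
    printf("Total = %e\n", global_total);     /* printed once, after the parallel region */
    return 0;
}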
I am writing a simple parallel program in C++ using OpenMP.
I am working on Windows 7 with Microsoft Visual Studio 2010 Ultimate.
I set the project's C/C++ → Language → OpenMP Support property to "Yes (/openmp)" to enable OpenMP.
Here is the code:
#include <iostream>
#include <cstdlib>   // for EXIT_SUCCESS
#include <omp.h>
using namespace std;

double sum;
int i;
int n = 800000000;

int main(int argc, char *argv[])
{
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    sum = 0;
    #pragma omp for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += i / (n / 10);
    cout << "sum=" << sum << endl;
    return EXIT_SUCCESS;
}
But I can't get any speedup by changing the x in omp_set_num_threads(x);
whether or not I use OpenMP, the run time is the same, about 7 seconds.
Does someone know what the problem is?
Your pragma statement is missing the parallel specifier:
#include <iostream>
#include <cstdlib>   // for EXIT_SUCCESS
#include <omp.h>
using namespace std;

double sum;
int i;
int n = 800000000;

int main(int argc, char *argv[])
{
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    sum = 0;
    #pragma omp parallel for reduction(+:sum) // add "parallel"
    for (i = 0; i < n; i++)
        sum += i / (n / 10);
    cout << "sum=" << sum << endl;
    return EXIT_SUCCESS;
}
Sequential:
sum=3.6e+009
2.30071
Parallel:
sum=3.6e+009
0.618365
Here's a version that gets some speedup with Hyper-Threading. I had to increase the number of iterations by 10x and change the data types to long long:
#include <iostream>
#include <cstdlib>
#include <omp.h>
using namespace std;

double sum;
long long i;
long long n = 8000000000;

int main(int argc, char *argv[])
{
    omp_set_dynamic(0);
    omp_set_num_threads(8);
    double start = omp_get_wtime();
    sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += i / (n / 10);
    cout << "sum=" << sum << endl;
    double end = omp_get_wtime();
    cout << end - start << endl;
    system("pause");
    return EXIT_SUCCESS;
}
Threads: 1
sum=3.6e+014
13.0541
Threads: 2
sum=3.6e+010
6.62345
Threads: 4
sum=3.6e+010
3.85687
Threads: 8
sum=3.6e+010
3.285
Apart from the error pointed out by Mystical, you seem to assume that OpenMP can just do magic. At best it can use all the cores on your machine. If you have 2 cores, it may reduce the execution time by a factor of two if you call omp_set_num_threads(np) with np >= 2, but for np much larger than the number of cores the code will be inefficient due to parallelization overheads.
The example from Mystical was apparently run on at least 4 cores with np = 4.
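If you want the thread count to follow the hardware rather than a hard-coded value, you can query it at run time. A small sketch using omp_get_num_procs(), which reports the number of processors available to the program:

#include <iostream>
#include <omp.h>
using namespace std;

int main() {
    int procs = omp_get_num_procs();   // logical processors visible to OpenMP
    omp_set_dynamic(0);
    omp_set_num_threads(procs);        // asking for far more threads than this rarely helps CPU-bound loops
    #pragma omp parallel
    {
        #pragma omp single
        cout << "Running with " << omp_get_num_threads() << " threads" << endl;
    }
}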