num_threads clause not setting number of threads [closed] - c++

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 4 years ago.
I have the following simple program
#include <iostream>
#include <omp.h>
int main() {
    std::cout << "max threads: " << omp_get_max_threads() << "\n";
#pragma parallel num_threads(4)
    {
        int tid = omp_get_thread_num();
        std::cout << "Hello from " << tid << " of " << omp_get_num_threads() << "\n";
#pragma omp for
        for (int i = 0; i < 5; i++) {
            std::cout << "(" << tid << ", " << i << ")\n";
        }
    }
}
And I am compiling with clang++ -fopenmp=libomp main.cpp. I am able to compile and run other OpenMP programs compiled in this way.
I would expect num_threads(4) to make the parallel region run across 4 threads. Instead I get the following output:
max threads: 4
Hello from 0 of 1
(0, 0)
(0, 1)
(0, 2)
(0, 3)
(0, 4)
Why is the parallel region not running across 4 threads?

You left the omp out of your parallel pragma.
#pragma omp parallel num_threads(4)

Related

Using std::cout in a parallel for loop using OpenMP [duplicate]

This question already has answers here:
Parallelize output using OpenMP
(2 answers)
Closed 3 years ago.
I want to parallelize a for-loop in C++ using OpenMP. In that loop I want to print some results:
#pragma omp parallel for
for (int i = 0; i < 1000000; i++) {
    if (test.successful() == true) {
        std::cout << test.id() << " " << test.result() << std::endl;
    }
}
The result I get without using OpenMP:
3 23923.23
1 32329.32
2 23239.45
However I get the following using a parallel for-loop:
314924.4244.5
434.
4343.43123
How can I avoid such an output?
The reason is that writing to std::cout is not atomic: each chained << is a separate operation, so when several threads print concurrently their output interleaves mid-line.
Solution: serialize the printing with a critical section. (Note that #pragma omp atomic only applies to simple scalar updates such as x += 1, so it cannot be applied to an I/O statement.)
#pragma omp parallel for
for (int i = 0; i < 1000000; i++) {
    if (test.successful()) {
        #pragma omp critical
        std::cout << test.id() << " " << test.result() << std::endl;
    }
}

How do I set the number of threads in OpenMP [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 3 years ago.
I am trying to set the number of threads within a program using OpenMP. For some reason, even though the maximum number of threads is 4, my program only uses 1 core. I'm on macOS, but I'm using the GCC compiler (specifically GCC 9.1.0, OpenMP version 4.5).
#include <iostream>
#include <fstream>
#include <chrono>
#include <omp.h>
int main() {
    int maxthreads = omp_get_max_threads();
    std::cout << "maxthreads: " << maxthreads << std::endl;
    omp_set_dynamic(0);
    omp_set_num_threads(4);
#pragma omp parallel num_threads(4)
    {
        int id = omp_get_thread_num();
#pragma omp critical
        std::cout << "Hi from " << id << std::endl;
    }
}
The result that I get is:
maxthreads: 4
Hi from 0
But I expected "Hi from i" to be printed 4 times.
I needed to add flags to my cmake:
-DCMAKE_CXX_FLAGS=-fopenmp and -DCMAKE_C_FLAGS=-fopenmp
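Passing raw `-fopenmp` flags works, but a sketch of the more idiomatic CMake route (assuming a target named `myapp`, which is not from the original post) is to let CMake discover OpenMP itself:

```cmake
# CMakeLists.txt sketch; "myapp" is a placeholder target name
cmake_minimum_required(VERSION 3.10)
project(myapp CXX)
find_package(OpenMP REQUIRED)
add_executable(myapp main.cpp)
target_link_libraries(myapp PRIVATE OpenMP::OpenMP_CXX)
```

This attaches both the compile flag and the link flag to the target, so the two cannot get out of sync.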

OpenMP only using one thread

I am having a bit of a frustrating problem with OpenMP. When I run the following code it only seems to be running on one thread.
omp_set_num_threads(8);
#pragma omp parallel for schedule(dynamic)
for (size_t i = 0; i < jobs.size(); i++) // jobs is a vector
{
    std::cout << omp_get_thread_num() << "\t" << omp_get_num_threads() << "\t" << omp_in_parallel() << std::endl;
    jobs[i].run();
}
This prints...
0 1 1
for every line.
I can see using top that OpenMP is spawning as many threads as I have the process taskset to. They are mostly idle while it runs. The program is both compiled and linked with the -fopenmp flag with gcc. I am using Red Hat 6. I also tried using the num_threads(8) clause in the pragma, which made no difference. The program is linked against another library which also uses OpenMP, so maybe that is the issue. Does anyone know what might cause this behavior? In all my past OpenMP experience it has just worked.
Can you print your jobs.size()?
I made a quick test and it does work:
#include <stdio.h>
#include <omp.h>
#include <iostream>
int main()
{
    omp_set_num_threads(2);
#pragma omp parallel for ordered schedule(dynamic)
    for (size_t i = 0; i < 4; i++)
    {
#pragma omp ordered
        std::cout << i << "\t" << omp_get_thread_num() << "\t" << omp_get_num_threads() << "\t" << omp_in_parallel() << std::endl;
    }
    return 0;
}
I got:
icpc -qopenmp test.cpp && ./a.out
0 0 2 1
1 1 2 1
2 0 2 1
3 1 2 1
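Since the program links against another OpenMP-using library, it is also worth ruling out an inherited environment cap. A sketch of checking and overriding it from the shell:

```shell
# Show any inherited limit on the OpenMP team size
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
# Force 8 threads for subsequent runs launched from this shell
export OMP_NUM_THREADS=8
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Note that an explicit num_threads(8) clause or an omp_set_num_threads(8) call in code takes precedence over this variable, so if those already had no effect, the environment alone is unlikely to be the cause.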

omp_get_max_threads() returns 1 in parallel region, but it should be 8

I'm compiling a complex C++ project on Linux which uses OpenMP, compiled with CMake and GCC 7.
The strange problem I'm encountering in this particular project is that OpenMP is clearly working, but it thinks that only 1 thread is supported, when it should be 8. However, if I manually specify the number of threads, it does indeed accelerate the code.
logOut << "In parallel? " << omp_in_parallel() << std::endl;
logOut << "Num threads = " << omp_get_num_threads() << std::endl;
logOut << "Max threads = " << omp_get_max_threads() << std::endl;
logOut << "Entering my parallel region: " << std::endl;
//without num_threads(5), only 1 thread is created
#pragma omp parallel num_threads(5)
{
#pragma omp single nowait
{
logOut << "In parallel? " << omp_in_parallel() << std::endl;
logOut << "Num threads = " << omp_get_num_threads() << std::endl;
logOut << "Max threads = " << omp_get_max_threads() << std::endl;
}
}
Output:
[openmp_test] In parallel? 0
[openmp_test] Num threads = 1
[openmp_test] Max threads = 1
[openmp_test] Entering my parallel region:
[openmp_test] In parallel? 1
[openmp_test] Num threads = 5
[openmp_test] Max threads = 1
What makes it even stranger is that a simple standalone OpenMP test program correctly reports the maximum number of threads as 8, both inside and outside a parallel region.
I've been combing through all the CMake files trying to find any indicator of why this project behaves differently, but I've turned up nothing so far. There is no mention of omp_set_num_threads in any of my project files, and I can confirm that OMP_NUM_THREADS is not declared. Furthermore, this problem never happened when I compiled the same project on Windows with MSVC.
Any ideas what the problem could be?
(EDIT: I've expanded the code sample to show that it is not a nested parallel block)
CPU: Intel(R) Core(TM) i7-6700K
OS: Manjaro Linux 17.0.2
Compiler: GCC 7.1.1 20170630
_OPENMP = 201511 (I'm guessing that means OpenMP 4.5)
The values you are seeing inside the parallel region seem correct (assuming that OMP_NESTED is not true). omp_get_max_threads() returns the maximum number of threads that you might obtain if you were to go parallel from the current thread. Since you are already inside a parallel region (and we're assuming that nested parallelism is disabled), that will be one.
3.2.3 omp_get_max_threads
Summary
The omp_get_max_threads routine returns an upper bound on the number of threads that could be used
to form a new team if a parallel construct without a num_threads
clause were encountered after execution returns from this routine.
That doesn't explain why you see the value one outside the parallel region, though. (But it does answer the question in the title, to which the answer is "one is the correct answer").
Your program behaves exactly as if omp_set_num_threads(1) was called before.
Considering this snippet:
#include <iostream>
#include <string>
#include <vector>
#include "omp.h"
int main() {
    omp_set_num_threads(1);
    std::cout << "before parallel section: " << std::endl;
    std::cout << "Num threads = " << omp_get_num_threads() << std::endl;
    std::cout << "Max threads = " << omp_get_max_threads() << std::endl;
    // without num_threads(5), only 1 thread is created
#pragma omp parallel num_threads(5)
    {
#pragma omp single
        {
            std::cout << "inside parallel section: " << std::endl;
            std::cout << "Num threads = " << omp_get_num_threads() << std::endl;
            std::cout << "Max threads = " << omp_get_max_threads() << std::endl;
        }
    }
    return 0;
}
the output is
before parallel section:
Num threads = 1
Max threads = 1
inside parallel section:
Num threads = 5
Max threads = 1
When I run it with the number of threads set to 4 instead of 1 (8 on your machine), the output is as expected:
before parallel section:
Num threads = 1
Max threads = 4
inside parallel section:
Num threads = 5
Max threads = 4
Have you tried calling omp_set_num_threads(8) at the beginning of your code? Or is the number of threads being set to 1 somewhere before this code runs (for example, in a function called earlier)?
One other explanation could be that the OpenMP runtime doesn't find it necessary to use more than one thread, since only a single section appears inside the parallel region. In that case, try adding some code that several threads could usefully execute (e.g. incrementing all the values of a large array of integers, or calling omp_get_thread_num()) outside the single section but inside the parallel region; the reported number of threads should then be different. Keep in mind that omp_set_num_threads only sets an upper limit on the number of threads used.

micro optimisation in for loop c++ [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Let's assume C++ for loop in the form of:
for (int i = 0; i < 10; i++)
So the integer needs to be allocated at the beginning and then incremented and compared at every step. So wouldn't it be faster to do something like this:
for (int i = 0; i++ < 10;)
since the variable doesn't need to be loaded again? Especially when making it volatile?
This little example code gave the same result for all the cases. Does the for loop get optimized anyway or am I missing something?
#include <iostream>
#include <ctime>
int main() {
    time_t start, ende;
    volatile int dummy = 1;
    const int rep = 1000000000;
    // Method 1
    start = time(0);
    for (int i = 0; i < rep; i++)
        dummy = 1;
    ende = time(0);
    std::cout << "Method 1: " << difftime(ende, start) * 1000 << " ms" << std::endl;
    // Method 2
    start = time(0);
    for (int i = 0; i++ < rep; )
        dummy = 1;
    ende = time(0);
    std::cout << "Method 2: " << difftime(ende, start) * 1000 << " ms" << std::endl;
    // Method 3
    start = time(0);
    for (volatile int i = 0; i < rep; i++)
        dummy = 1;
    ende = time(0);
    std::cout << "Method 3: " << difftime(ende, start) * 1000 << " ms" << std::endl;
}
OS: Linux
Compiler: g++
Optimization: standard (no flags)
Compilers are a lot smarter than you think. They won't produce the kind of spurious load that you think they do.
Optimization depends on the compiler and its options.
I'd suggest disassembling your code to see what the optimizer actually produced.
A simple, good disassembler is e.g. HT editor or IDA version 5 (it's free now). For a small piece of code it will be easy enough.