c++ openmp and threadprivate - c++

The same code compiles on one machine (a cluster with high-performance nodes)
but not on my personal computer.
The error is
'var' declared 'threadprivate' after first use.
#pragma omp threadprivate(var)
The related line in the code is in a header file and looks like this
extern const int var;
#pragma omp threadprivate(var);
I didn't write the code, so it is difficult to give a minimal example of the
problem.
Here are some specifications of the computers I use:
cluster (compiles)
red hat 7.5
gcc 4.8.3 EDIT intel 15.0.0
openmp version date : 2011.07 : don't have permission to access yum/apt/...
personal computer (doesn't compile)
debian 8.0
gcc 4.9.2
openmp version date : 2013.07 : libgomp1 v 4.9.2-10
I know this is not much information, but does anyone have an idea?
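For reference, OpenMP requires a threadprivate directive to come after the variable's declaration and lexically before every reference to that variable in the translation unit. One possible cause of the diagnostic is therefore that some other header or inline function touches var before this header's pragma is seen, for instance because of a different include order. Below is a minimal sketch of an ordering that satisfies the rule. The names scratch, bump_scratch and bump_on_every_thread are hypothetical and not taken from the code in question.

```cpp
// Declaration first. In a header this would be `extern int scratch;`,
// with the definition living in exactly one .cpp file.
int scratch = 0;

// The directive follows the declaration directly, before any use.
#pragma omp threadprivate(scratch)

// Only now does code refer to the variable. If this function appeared
// above the pragma, the compiler could report
// "'scratch' declared 'threadprivate' after first use".
inline int bump_scratch() { return ++scratch; }

// Each thread increments its own copy; the value returned is the copy
// belonging to the thread that called this function.
int bump_on_every_thread() {
    scratch = 0;
    #pragma omp parallel
    {
        bump_scratch();
    }
    return scratch;
}
```

Built without OpenMP support the pragmas are ignored and scratch is an ordinary global, so the function behaves the same way on a single thread.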

Related

GCC version versus basic string

I'm building a .cpp file and get a string-related error
"basic_string.tcc: No such file or directory"
which probably comes from some compatibility issue. On my Mac ("clang version 11.0.0") it works, but on my Linux machine with gcc 6.3.0 it fails. This happens in the function
line read_in(std::string filename_begin, std::string fileformat, size_type n_files)
Any idea how to debug this?
thanks, Damir

Intel code coverage -> undefined reference to `std::string::_S_compare(unsigned long, unsigned long)

I compile code under Red Hat 6 using the Intel compiler icc/icpc with the flag -prof-gen:srcpos in order to perform a code coverage analysis. This works fine for some parts of my code, but I have problems with a few libraries.
I get the error
undefined reference to std::string::_S_compare(unsigned long, unsigned long)
I link against /usr/lib64/libstdc++.so.6.0.13.
Unfortunately, I am unable to identify the difference between code that compiles and code that doesn't. One library that does not compile is statically built and linked.
Best regards, Georg
I am using the Intel compiler version 15.0.3 20150407 and gcc 4.4.7 20120313 (Red Hat 4.4.7-17).
After updating to gcc 4.8.2 20140120 it works fine. The old gcc version does not provide the required function.
I have struggled with the same error. Below is a fix to be able to compile the same code on RHEL5 and RHEL6 and not get the error you listed when generating Intel coverage reports. Just place this snippet in the .cpp file that the compiler complains is missing the symbol.
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
// NOTE: The block below is ONLY needed for building with the
// Intel code-coverage flags turned on. For some reason,
// this comes up as an un-resolved symbol. So, for CODE
// COVERAGE BUILDS ONLY, this symbol is defined here.
#if defined __INTEL_CODE_COVERAGE__ && defined __GLIBC__
// Specify that 2.6 is required because we know that 2.5 does NOT need this.
// The macro tests for >=. Will need to tune this if other glibc versions are in use.
// We have RHEL5 using 2.5, RHEL6 using 2.12.
#if __GLIBC_PREREQ(2,6)
namespace std {
template int string::_S_compare(size_type, size_type);
}
#endif /* glibc version >= 2.6 */
#endif /* intel code coverage and using glibc */
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////

__m256 unknown type (clang 5.1/i5 CPU)?

I just started to experiment with intrinsics. I managed to successfully compile a program using __m128 on a Mac using Clang 5.1. The CPU on this Mac is an Intel core i5 M540.
When I tried to compile the same code with __m256, I get the following message:
simple.cpp:4:2: error: unknown type name '__m256'
__m256 A;
The code looks like this:
#include <immintrin.h>
int main()
{
__m256 A;
return 0;
}
And here is the command used to compile it:
c++ -o simple simple.cpp -march=native -O3
Is it just that my CPU is too old to support AVX instruction set? Are all the options I use (on the command line) correct? I checked in the immintrin.h include file, and it does call another including file which seems to be defining AVX intrinsics. Apologies if the question is naive or if the terminology is misused, as I said, I am new to this topic.
The Intel 540M CPU uses the Westmere microarchitecture (sorry for the mistake in the comment), which predates Sandy Bridge, where AVX was introduced, so it doesn't support AVX. The term "Core i5" covers a wide range of architectures from Nehalem to Haswell (current), so having a Core i5 CPU doesn't mean you have support for every instruction set, in particular the latest ones.
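With -march=native the compiler only enables the instruction sets the build machine supports, so on a pre-AVX CPU the __AVX__ macro is not defined, which is likely why the header did not expose __m256 here. A hedged sketch of guarding AVX code on that macro, with a scalar fallback, follows. The function names built_with_avx and add_into are hypothetical.

```cpp
#include <cstddef>

#if defined(__AVX__)
#include <immintrin.h>   // AVX intrinsics, only needed when AVX is enabled
#endif

// True when this translation unit was compiled with AVX enabled,
// e.g. via -mavx, or -march=native on an AVX-capable CPU.
constexpr bool built_with_avx() {
#if defined(__AVX__)
    return true;
#else
    return false;
#endif
}

// Adds b into a element-wise. Processes eight floats per step when AVX
// is enabled; the scalar loop handles the tail, or everything otherwise.
void add_into(float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(a + i, _mm256_add_ps(va, vb));
    }
#endif
    for (; i < n; ++i) {
        a[i] += b[i];
    }
}
```

This is a compile-time check only. A binary built with AVX enabled would still fail at run time on a CPU without it.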

Why OpenMP program runs only in one thread

I just tried OpenMP with a simple C program
test() {
for(int i=0;i<100000000;i++);
}
main() {
printf("Num of CPU: %d\n", omp_get_num_procs());
#pragma omp parallel for num_threads(4)
for(int i=0;i<100;i++) test();
}
Compiled with g++ -fopenmp. It correctly prints that I have 4 CPUs, but all the test calls run on thread 0.
I tried changing OMP_NUM_THREADS, but that has no effect either.
Everything matches the online examples, so why doesn't it work?
Your problem is here:
#pragma omp parallel for num_thread(4) <---
The correct clause is num_threads(4), not num_thread(4). Incorrect OpenMP pragmas are ignored, so you ended up with a sequential program. :)
I'm surprised you didn't get a compiler warning, because I did.
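To check how many threads really execute the loop, one option is to record the thread id of every iteration. This is only a sketch: count_threads_used is a hypothetical helper, and the _OPENMP guard lets it build even when OpenMP support is switched off, in which case it reports a single thread.

```cpp
#include <set>

#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the number of distinct thread ids that ran loop iterations.
int count_threads_used(int iterations) {
    std::set<int> ids;
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < iterations; ++i) {
        int tid = 0;   // without OpenMP everything runs on "thread 0"
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // std::set is not thread-safe, so serialize the insertion.
        #pragma omp critical
        ids.insert(tid);
    }
    return static_cast<int>(ids.size());
}
```

A result of 1 from a build that was meant to be parallel suggests the pragma was ignored (misspelled clause, missing -fopenmp) or the runtime was limited to one thread.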
I had this problem in Visual Studio and finally understood that I had forgotten to enable OpenMP support. It didn't give me any error, but ran the program on just one thread.
First choose Project -> Properties -> C/C++ -> Language -> Open MP Support and set it to Yes.
Then find Conformance Mode, listed above it, and set it to No.
Call omp_set_num_threads(4) before entering the omp parallel section.
Also, how do you determine the number of threads?
Put your printfs in a critical section just to make sure everything gets printed.
I ran into the very same situation on my Ubuntu desktop when extending a numpy module with C code: OpenMP only ran with one thread. I happened to remove libopenblas-base and install libatlas-base-dev (to deal with a numpy installation problem), and multi-threaded OpenMP came back :)
I have tested it on an Ubuntu server with 64 cores and it works just as on my desktop!
I think this is because libopenblas conflicts with libraries like atlas.

Ignore OpenMP on machine that does not have it

I have a C++ program using OpenMP, which will run on several machines that may have or not have OpenMP installed.
How could I make my program know if a machine has no OpenMP and ignore those #include <omp.h>, OpenMP directives (like #pragma omp parallel ...) and/or library functions (like tid = omp_get_thread_num();) ?
OpenMP compilation adds the preprocessor definition "_OPENMP", so you can do:
#if defined(_OPENMP)
#pragma omp ...
#endif
For some examples, see http://bisqwit.iki.fi/story/howto/openmp/#Discussion and the code which follows.
Compilers are supposed to ignore #pragma directives they don't understand; that's the whole point of the syntax. And the functions declared in openmp.h (a project wrapper header, not the standard omp.h) have simple, well-defined meanings on a non-parallel system. In particular, the header checks whether ENABLE_OPENMP is defined and, if it is not, provides the right fallbacks. Note that ENABLE_OPENMP is a macro set by that project's build; the macro compilers themselves define is _OPENMP.
So, all you need is a copy of openmp.h to include. Here's one: http://cms.mcc.uiuc.edu/qmcdev/docs/html/OpenMP_8h-source.html .
The relevant portion of the code, though, is just this:
#if defined(ENABLE_OPENMP)
#include <omp.h>
#else
typedef int omp_int_t;
inline omp_int_t omp_get_thread_num() { return 0;}
inline omp_int_t omp_get_max_threads() { return 1;}
#endif
At worst, you can just take those three lines and put them in a dummy openmp.h file, and use that. The rest will just work.
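As a sketch of how calling code looks with such stubs in place, the block below keys the fallbacks on _OPENMP (the macro compilers define themselves) instead of ENABLE_OPENMP; that swap is my assumption, not what the linked header does. sum_first_n is a hypothetical example function.

```cpp
#include <cstddef>
#include <vector>

#if defined(_OPENMP)
#include <omp.h>
#else
// Serial fallbacks with the same names as the real runtime functions.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

// Sums 1..n. Every thread accumulates into its own slot, and the slots
// are added up serially afterwards.
long sum_first_n(int n) {
    std::vector<long> partial(
        static_cast<std::size_t>(omp_get_max_threads()), 0L);
    #pragma omp parallel for
    for (int i = 1; i <= n; ++i) {
        partial[static_cast<std::size_t>(omp_get_thread_num())] += i;
    }
    long total = 0;
    for (long p : partial) {
        total += p;
    }
    return total;
}
```

The same source builds with or without OpenMP: in the serial build the pragma is ignored and the stubs make the code behave as a one-thread team.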
OpenMP is a compiler runtime thing and not a platform thing.
i.e. if you compile your app using Visual Studio 2005 or higher, then you always have OpenMP available, as the runtime supports it (and if the end user doesn't have the Visual Studio C runtime installed, then your app won't work at all).
So, you don't need to worry, if you can use it, it will always be there just like functions such as strcmp. To make sure they have the CRT, then you can install the visual studio redistributable.
edit:
OK, but GCC 4.1 will not be able to compile your OpenMP app, so the issue is not the target machine but the target compiler. As all compilers have predefined macros giving their version, wrap your OpenMP calls in #ifdef blocks. For example, GCC uses three macros to identify the compiler version: __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__.
"How could I make my program know if a machine has no OpenMP and ignore those #include <omp.h>, OpenMP directives (like #pragma omp parallel ...) and/or library functions (like tid = omp_get_thread_num();) ?"
Here's a late answer, but we just got a bug report due to use of #pragma omp simd on Microsoft compilers.
According to OpenMP Specification, section 2.2:
Conditional Compilation
In implementations that support a preprocessor, the _OPENMP macro name
is defined to have the decimal value yyyymm where yyyy and mm are the
year and month designations of the version of the OpenMP API that the
implementation supports.
It appears modern Microsoft compilers only support OpenMP from sometime between 2000 and 2005. I can only say "sometime between" because OpenMP 2.0 was released in 2000, and OpenMP 2.5 was released in 2005. But Microsoft advertises a version from 2002.
Here are some _OPENMP numbers...
Visual Studio 2012 - OpenMP 200203
Visual Studio 2017 - OpenMP 200203
IBM XLC 13.01 - OpenMP 201107
Clang 7.0 - OpenMP 201107
GCC 4.8 - OpenMP 201107
GCC 8.2 - OpenMP 201511
So if you want to use, say #pragma omp simd to guard a loop, and #pragma omp simd is available in OpenMP 4.0, then:
#if _OPENMP >= 201307
#pragma omp simd
for (size_t i = 0; i < 16; ++i)
data[i] += x[i];
#else
for (size_t i = 0; i < 16; ++i)
data[i] += x[i];
#endif
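To make the date values easier to read, here is a small sketch that maps an _OPENMP value to a version number. The dates used are the release dates of the OpenMP C/C++ specifications as I know them; openmp_version_x10 and kOpenMPVersionX10 are hypothetical names.

```cpp
// Maps an _OPENMP date (yyyymm) to the specification version times ten,
// e.g. 45 for OpenMP 4.5. Returns 0 for no OpenMP or anything before 2.0,
// and 50 for 5.0 or any later release.
constexpr int openmp_version_x10(long yyyymm) {
    return yyyymm >= 201811 ? 50
         : yyyymm >= 201511 ? 45
         : yyyymm >= 201307 ? 40
         : yyyymm >= 201107 ? 31
         : yyyymm >= 200805 ? 30
         : yyyymm >= 200505 ? 25
         : yyyymm >= 200203 ? 20
         : 0;
}

// Version this translation unit was compiled against.
#if defined(_OPENMP)
constexpr int kOpenMPVersionX10 = openmp_version_x10(_OPENMP);
#else
constexpr int kOpenMPVersionX10 = 0;
#endif
```

With the numbers listed above, Visual Studio maps to 20, GCC 4.8 to 31 and GCC 8.2 to 45, so a check like kOpenMPVersionX10 >= 40 corresponds to the #if guard used for simd.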
"which will run on several machines that may have or not have OpenMP installed."
And to be clear, you probably need to build your program on each of those machines. The x86_64 ABI does not guarantee OpenMP is available on x86, x32 or x86_64 machines. And I have not read anywhere that you can build on one machine and then run on another.
There is another approach that I like, borrowed from Bisqwit:
#if defined(_OPENMP)
#include <omp.h>
extern const bool parallelism_enabled = true;
#else
extern const bool parallelism_enabled = false;
#endif
Then, start your OpenMP parallel for loops like this:
#pragma omp parallel for if(parallelism_enabled)
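The if clause is evaluated at run time, so the same flag can be combined with other conditions. A sketch, assuming you also want to skip threading for small inputs: kParallelThreshold is a made-up cutoff, scale_in_place is a hypothetical function, and the flag is written as constexpr here only to keep the example in one file.

```cpp
#include <vector>

#if defined(_OPENMP)
constexpr bool parallelism_enabled = true;
#else
constexpr bool parallelism_enabled = false;
#endif

// Hypothetical cutoff below which threading overhead is assumed to dominate.
constexpr long kParallelThreshold = 10000;

// Multiplies every element by factor. The loop runs in parallel only when
// OpenMP is enabled and the vector is large enough.
void scale_in_place(std::vector<double>& v, double factor) {
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for if(parallelism_enabled && n > kParallelThreshold)
    for (long i = 0; i < n; ++i) {
        v[static_cast<std::size_t>(i)] *= factor;
    }
}
```

When the condition is false the region still executes, just on a single thread, so the result is the same either way.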
Note: there are valid reasons for not using pragma, which is non-standard, hence why Google and others do not support it.