OpenMP with restrict pointers fails with ICC while GCC/G++ succeeds - c++

I implemented a simple matrix vector multiplication for sparse matrices in CRS using an implicit openMP directive in the multiplication loop.
The complete code is in GitHub: https://github.com/torbjoernk/openMP-Examples/blob/icc_gcc_problem/matxvec_sparse/matxvec_sparse.cpp
Note: It's ugly ;-)
To control the private and shared memory I'm using restrict pointers. Compiling it with GCC 4.6.3 on 64bit Linux works fine (besides two warnings about %u and unsigned int in a printf command, but that's not the point).
However, compiling it with ICC 12.1.0 on 64bit Linux fails with the error:
matxvec_sparse.cpp(79): error: "default_n_row" must be specified in a variable list at enclosing OpenMP parallel pragma
#pragma omp parallel \
^
with the definition of the variable and pointer in question
int default_n_row = 4;
int *n_row = &default_n_row;
and the openMP directive defined as
#pragma omp parallel \
    default(none) \
    shared(n_row, aval, acolind, arowpt, vval, yval) \
    private(x, y)
{
    #pragma omp for \
        schedule(static)
    for ( x = 0; x < *n_row; x++ ) {
        yval[x] = 0;
        for ( y = arowpt[x]; y < arowpt[x+1]; y++ ) {
            yval[x] += aval[y] * vval[ acolind[y] ];
        }
    }
} /* end PARALLEL */
Compiled with g++:
c++ -fopenmp -O0 -g -std=c++0x -Wall -o matxvec_sparse matxvec_sparse.cpp
Compiled with icc:
icc -openmp -O0 -g -std=c++0x -Wall -restrict -o matxvec_sparse matxvec_sparse.cpp
Is it an error in usage of GCC/ICC?
Is this a design issue in my code causing undefined behaviour?
If so, which line(s) is/are causing it?
Is it just inconsistency between ICC and GCC?
If so, what would be a good way to achieve compiler independence and compatibility?

Huh. Looking at the code, it's clear what icpc thinks the problem is, but I'm not sure without going through the specification which compiler is doing the right thing here, g++ or icpc.
The issue isn't the restrict keyword; if you take all of those out and drop the -restrict option to icpc, the problem remains. The issue is that the parallel section specifies default(none) shared(n_row...), but n_row is, at the start of the program, a pointer to default_n_row, and icpc requires that default_n_row also be listed in a data-sharing clause (shared, or at least something) on that omp parallel section.
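One way to sidestep the disagreement between the two compilers, sketched here under assumptions rather than taken from the original code, is to hand the row count to the parallel region as a plain int instead of dereferencing a pointer inside it. The function name crs_matvec and its parameter list are hypothetical; the loop body follows the one in the question. This has not been checked against ICC 12.1.

```cpp
#include <cassert>

// Hypothetical helper: y = A * v for a matrix A stored in CRS format.
// n_rows is passed by value, so the parallel region only sees a plain int
// and never has to reason about what a pointer refers to.
void crs_matvec(int n_rows,
                const double* aval, const int* acolind, const int* arowpt,
                const double* vval, double* yval) {
    assert(n_rows >= 0);
    #pragma omp parallel for default(none) \
        shared(n_rows, aval, acolind, arowpt, vval, yval) \
        schedule(static)
    for (int x = 0; x < n_rows; x++) {
        double sum = 0.0;  // declared inside the loop, so private to each thread
        for (int y = arowpt[x]; y < arowpt[x + 1]; y++) {
            sum += aval[y] * vval[acolind[y]];
        }
        yval[x] = sum;
    }
}
```

Declaring the loop counters inside the for statements also removes the need for a private(x, y) clause. For the 3x3 matrix with rows (1 0 2), (0 3 0), (4 0 5), the CRS arrays would be aval = {1, 2, 3, 4, 5}, acolind = {0, 2, 1, 0, 2} and arowpt = {0, 2, 3, 5}; multiplying by v = (1, 2, 3) gives y = (7, 6, 19).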

Related

Why does this OpenMP code compile with g++, but fail with nvcc?

I'm trying to compile this code that uses OpenMP. When I compile it with nvcc, it gives an error that appears to be complaining about a token that isn't even there.
Here's a minimal version of my code:
int main() {
    // this loop somehow prevents the second one from compiling
    for (int foo = 0; foo < 10; foo++) {
        int bar;
        continue;
    }
    #pragma omp parallel for
    for (int baz = 0; baz < 10; baz++) { }
    return 0;
}
Here's the error message it produces:
exp.cu:10:1: error: for statement expected before ‘}’ token
10 | for (int baz = 0; baz < 10; baz++) { }
| ^
I'm compiling it with this command: nvcc -Xcompiler -fopenmp exp.cu
Without the first loop, this program compiles correctly. It also works if I remove either of the lines in the first loop. How does the first loop prevent the second one from compiling? Am I using invalid OpenMP syntax?
If I rename the file to exp.cpp and compile it with g++ -fopenmp exp.cpp, that works without errors. Is there any possibility that this is a bug in nvcc? Unfortunately, I can't just use g++, because I need to be able to use CUDA kernels in other places.
Edit
I'm using CUDA 11.2.
There is evidently a defect in CUDA 11.2 as far as this code example goes.
The problem appears to be resolved in CUDA 11.4 and later.
The solution is to upgrade the CUDA install to CUDA 11.4 or later.

OpenACC nvlink undefined reference to class

I am new to OpenACC and I am writing a new program from scratch (I have a fairly good idea what loops will be computationally costly from working in a similar problem before). I am getting an "Undefined reference" from nvlink. From my research, I found this is because no device code is being generated for the class I created. However, I don't understand why this is happening and how to fix it.
Below is an MWE from my code.
include/vec1.h
#ifndef VEC1_H
#define VEC1_H
class Vec1{
public:
    double data[1];
    #pragma acc routine seq
    Vec1();
    #pragma acc routine seq
    Vec1(double x);
    #pragma acc routine seq
    Vec1 operator* (double x);
};
#endif
src/vec1.cpp
#include "vec1.h"
Vec1::Vec1(){
    data[0] = .0;
}
Vec1::Vec1(double x){
    data[0] = x;
}
Vec1 Vec1::operator*(double c){
    Vec1 r = Vec1(0.);
    r.data[0] = c*data[0];
    return r;
}
vec1_test_gpu.cpp
#include "vec1.h"
#define NUM_VECTORS 1000000
int main(){
    Vec1 vec1_array[NUM_VECTORS];
    for(int iv=0; iv<NUM_VECTORS; ++iv){
        vec1_array[iv] = Vec1(iv);
    }
    #pragma acc data copyin(vec1_array)
    #pragma acc parallel loop
    for(int iv=0; iv<NUM_VECTORS; ++iv){
        vec1_array[iv] = vec1_array[iv]*2;
    }
    return 0;
}
I compile them in the following way
$ nvc++ src/vec1.cpp -c -I./include -O3 -march=native -ta=nvidia:cuda11.2 -fPIC
$ nvc++ -shared -o libvec1.so vec1.o
$ nvc++ vec1_test_gpu.cpp -I./include -O3 -march=native -ta=nvidia:cuda11.2 -L./ -lvec1
The error message appears just after the last command and reads nvlink error : Undefined reference to '_ZN4Vec1mlEd' in '/tmp/nvc++jOtCBiT_m38d.o'
The problem here is that you're trying to call a device routine, "Vec1::operator*", that's contained in a shared object from a kernel in the main program. nvc++'s OpenACC implementation uses CUDA to target NVIDIA devices. Since CUDA doesn't have a dynamic linker for device code, at least not yet, this isn't supported.
You'll need to either link this statically, or move the "parallel loop" into the shared object.
Note that the "-ta" flag has been deprecated. Please consider using "-acc -gpu=cuda11.2" instead.
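The second option, moving the parallel loop into the library, could look roughly like the sketch below. The function scale_all is a hypothetical addition that does not appear in the original code. It is assumed to live in the same source file as the Vec1 member definitions, so that the kernel and the device routine it calls end up in the same object. The copy(v[0:n]) clause is likewise an assumption about the intended data movement.

```cpp
#include <cassert>

// Same class as in include/vec1.h from the question.
class Vec1 {
public:
    double data[1];
    #pragma acc routine seq
    Vec1();
    #pragma acc routine seq
    Vec1(double x);
    #pragma acc routine seq
    Vec1 operator*(double x);
};

Vec1::Vec1() { data[0] = 0.0; }
Vec1::Vec1(double x) { data[0] = x; }
Vec1 Vec1::operator*(double c) {
    Vec1 r = Vec1(0.0);
    r.data[0] = c * data[0];
    return r;
}

// Hypothetical library entry point: the kernel sits next to the device
// routine it calls, so no device symbol has to be resolved across the
// shared-object boundary.
void scale_all(Vec1* v, int n, double c) {
    assert(n >= 0);
    // copy: send v to the device and bring the results back afterwards
    #pragma acc parallel loop copy(v[0:n])
    for (int iv = 0; iv < n; ++iv) {
        v[iv] = v[iv] * c;
    }
}
```

The main program would then call scale_all(vec1_array, NUM_VECTORS, 2.0) as an ordinary host function and would contain no OpenACC directives that reference Vec1's device routines.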

How to silence long long integer constant warning from GCC

I have some code using large integer literals as follows:
if(nanoseconds < 1'000'000'000'000)
This gives the compiler warning integer constant is too large for 'long' type [-Wlong-long]. However, if I change it to:
if(nanoseconds < 1'000'000'000'000ll)
...I instead get the warning use of C++11 long long integer constant [-Wlong-long].
I would like to disable this warning just for this line, but without disabling -Wlong-long or using -Wno-long-long for the entire project. I have tried surrounding it with:
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wlong-long"
...
#pragma GCC diagnostic pop
but that does not seem to work here with this warning. Is there something else I can try?
I am building with -std=gnu++1z.
Edit: minimal example for the comments:
#include <iostream>
auto main()->int {
    double nanoseconds = 10.0;
    if(nanoseconds < 1'000'000'000'000ll) {
        std::cout << "hello" << std::endl;
    }
    return EXIT_SUCCESS;
}
Building with g++ -std=gnu++1z -Wlong-long test.cpp gives test.cpp:6:20: warning: use of C++11 long long integer constant [-Wlong-long]
I should mention this is on a 32bit platform, Windows with MinGW-w64 (gcc 5.1.0) - the first warning does not seem to appear on my 64bit Linux systems, but the second (with the ll suffix) appears on both.
It seems that the C++11 warning when using the ll suffix may be a gcc bug. (Thanks @praetorian)
A workaround (inspired by @nate-eldredge's comment) is to avoid using the literal and have it produced at compile time with constexpr:
int64_t constexpr const trillion = int64_t(1'000'000) * int64_t(1'000'000);
if(nanoseconds < trillion) ...
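A self-contained version of that workaround might look like the following sketch. The names kTrillion and under_one_trillion are illustrative and not from the original post. The constant is built from two factors that each fit in a 32-bit int, so no long long literal appears in the source. Whether this silences -Wlong-long on every GCC version has not been verified here.

```cpp
#include <cassert>
#include <cstdint>

// 10^12 assembled at compile time from two int-sized factors (illustrative name).
constexpr std::int64_t kTrillion = std::int64_t(1000000) * std::int64_t(1000000);

// Compile-time sanity checks that avoid writing the full 13-digit literal.
static_assert(kTrillion / 1000000 == 1000000, "kTrillion should equal 10^12");
static_assert(kTrillion % 1000000 == 0, "kTrillion should equal 10^12");

// Hypothetical wrapper around the comparison from the question.
bool under_one_trillion(double nanoseconds) {
    return nanoseconds < static_cast<double>(kTrillion);
}
```

The comparison from the question then becomes if(under_one_trillion(nanoseconds)), or simply if(nanoseconds < kTrillion).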

omp with gcc and intel compiler

According to this question, the use of threadprivate with openmp is
problematic. Here is a minimum (non-)working example of the problem:
#include"omp.h"
#include<iostream>
extern const int a;
#pragma omp threadprivate(a)
const int a=2;
void my_call(){
    std::cout<<a<<std::endl;
};
int main(){
    #pragma omp parallel for
    for(unsigned int i=0;i<8;i++){
        my_call();
    }
}
This code compiles with intel 15.0.2.164 but not with gcc 4.9.2-10.
gcc says:
g++ -std=c++11 -O3 -fopenmp -O3 -fopenmp test.cpp -o test
test.cpp:5:29: error: ‘a’ declared ‘threadprivate’ after first use
#pragma omp threadprivate(a)
I would be very happy to find a way to compile it with gcc.
Note: I know that global variables are a nightmare, but this example
comes from code I haven't written and that I need to use... It's >11000
lines and I don't want to rewrite everything.

Why do I get undefined behavior when using OpenMP's firstprivate with std::vector on Intel compiler?

I have a problem when using OpenMP in combination with firstprivate and std::vector on the Intel c++ compiler. Take the following three functions:
#include <omp.h>
#include <vector>
void pass_vector_by_value(std::vector<double> p) {
    #pragma omp parallel
    {
        //do sth
    }
}
void pass_vector_by_value_and_use_firstprivate(std::vector<double> p) {
    #pragma omp parallel firstprivate(p)
    {
        //do sth
    }
}
void create_vector_locally_and_use_firstprivate() {
    std::vector<double> p(3, 7);
    #pragma omp parallel firstprivate(p)
    {
        //do sth
    }
}
The code compiles without warnings doing:
icc filename.cpp -openmp -Wall -pedantic
(icc version 14.0.1 (gcc version 4.7.0 compatibility))
or:
g++ filename.cpp -fopenmp -Wall -pedantic
(gcc version 4.7.2 20130108 [gcc-4_7-branch revision 195012] (SUSE Linux))
but after compiling with icc I am getting runtime errors such as:
*** Error in `./a.out': munmap_chunk(): invalid pointer: 0x00007fff31bcc980 ***
when calling the second function (pass_vector_by_value_and_use_firstprivate)
So the error only occurs when the firstprivate clause is used (which should invoke the copy constructor) and the vector is passed by value to the function (which should invoke the copy constructor as well). There is no error when the vector is created locally in the function instead of being passed in, or when firstprivate is not used. With gcc I do not get any errors.
I am wondering if the code somehow produces undefined behavior or if this is a bug in icc ?
I get the same problem with ICC but not GCC. Looks like a bug. Here is a workaround
void pass_vector_by_value2(std::vector<double> p) {
    #pragma omp parallel
    {
        std::vector<double> p_private = p;
        //do sth with p_private
    }
}
On the other hand, in general I don't pass non-POD types by value to functions anyway. I would use a reference, but if you do that you get the error:
error: ‘p’ has reference type for ‘firstprivate’
The solution to that is the code I posted above anyway. Pass it by value or by reference and then define a private copy inside the parallel region as I did in the code above.
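The by-reference variant described above might be sketched as follows. The function name doubled_sum and the work done on the copy are made up for illustration. The point is only that each thread builds its own copy inside the parallel region instead of relying on firstprivate.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: every thread takes its own copy of p, modifies the
// copy freely, and the threads then share the work of summing the result.
double doubled_sum(const std::vector<double>& p) {
    double total = 0.0;
    const int n = static_cast<int>(p.size());
    #pragma omp parallel reduction(+ : total)
    {
        // Explicit per-thread copy; stands in for firstprivate(p).
        std::vector<double> p_private = p;
        assert(p_private.size() == p.size());
        for (double& v : p_private) {
            v *= 2.0;  // only the thread-local copy changes
        }
        // Iterations are split across threads, so each index is added once.
        #pragma omp for
        for (int i = 0; i < n; ++i) {
            total += p_private[i];
        }
    }
    return total;
}
```

Because the copy is made by an ordinary copy constructor call inside the region, the same code works whether p arrives by value or by const reference, and the caller's vector is left unchanged.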