Why does this OpenMP code compile with g++, but fail with nvcc? - c++

I'm trying to compile this code that uses OpenMP. When I compile it with nvcc, it gives an error that appears to be complaining about a token that isn't even there.
Here's a minimal version of my code:
int main() {
// this loop somehow prevents the second one from compiling
for (int foo = 0; foo < 10; foo++) {
int bar;
continue;
}
#pragma omp parallel for
for (int baz = 0; baz < 10; baz++) { }
return 0;
}
Here's the error message it produces:
exp.cu:10:1: error: for statement expected before ‘}’ token
10 | for (int baz = 0; baz < 10; baz++) { }
| ^
I'm compiling it with this command: nvcc -Xcompiler -fopenmp exp.cu
Without the first loop, this program compiles correctly. It also works if I remove either of the lines in the first loop. How does the first loop prevent the second one from compiling? Am I using invalid OpenMP syntax?
If I rename the file to exp.cpp and compile it with g++ -fopenmp exp.cpp, that works without errors. Is there any possibility that this is a bug in nvcc? Unfortunately, I can't just use g++, because I need to be able to use CUDA kernels in other places.
Edit
I'm using CUDA 11.2.

There is evidently a defect in CUDA 11.2 as far as this code example goes.
The problem appears to be resolved in CUDA 11.4 and later.
The solution is to upgrade the CUDA install to CUDA 11.4 or later.
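If the project has to keep building on machines with mixed CUDA installs, a compile-time guard can make the failure easier to diagnose. The following is only a sketch: `nvcc_predates_fix` is a hypothetical helper name, and treating every release before 11.4 as affected is an assumption, since only 11.2 is known to be broken here. The `__CUDACC_VER_MAJOR__` and `__CUDACC_VER_MINOR__` macros are predefined by nvcc.

```cpp
// Sketch only: treating every nvcc release before 11.4 as affected is an
// assumption; the answer above only names 11.2 (broken) and 11.4+ (fixed).
constexpr bool nvcc_predates_fix(int major, int minor) {
    return major < 11 || (major == 11 && minor < 4);
}

#if defined(__CUDACC__)
// Only evaluated when the file is compiled by nvcc; plain g++ skips it.
static_assert(!nvcc_predates_fix(__CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__),
              "nvcc older than 11.4: OpenMP pragmas may be rejected, consider upgrading");
#endif
```

Placed near the top of exp.cu, this should turn the confusing "for statement expected" message into an explicit hint about the toolkit version.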

Related

List initialization returns semicolon error

I'm trying to compile the following C++ code on Visual Studio Code, using the Mac clang compiler.
#include <iostream>
int main() {
    int x { 5 };
    std::cout << x;
    return 0;
}
However, this returns an error, on the line of the list initialization: int x{ 5 };. Specifically, it says I need to insert a semicolon after the x.
I don't get what's wrong with this code, it works fine on an online compiler. How do I fix this?
Running man clang in the Terminal and skimming through, I found this:
The default C++ language standard is gnu++14.
UPDATE: I ran clang++ main.cpp in the terminal and it returned that semicolon error. This isn't a problem with VSCode, so I'll remove that tag.
Here's the error:
main.cpp:3:10: error: expected ';' at end of declaration
    int x { 5 };
         ^
         ;
1 error generated.

Trivial Eigen3 Tensor program does not build without -On

I'm trying to build a piece of software with the Tensor module that eigen3 provides as unsupported. I've written a simple piece of code that builds with a simple use of VectorXd (just printing it to stdout), and also builds with an analogous use of Tensor in place of the VectorXd, but WILL NOT build unless I pass an optimization flag (-On). Note that my build runs inside a conda environment that uses conda-forge compilers, so the g++ in what follows is the g++ obtained from conda-forge for Ubuntu. Its name appears in the error messages below, in case that turns out to be the issue.
I have a feeling this is not about the program I'm trying to write, but just in case I've included an mwe.cpp that seems to produce the error. The code follows:
#include <eigen3/Eigen/Dense>
#include <eigen3/unsupported/Eigen/CXX11/Tensor>
#include <iostream>
using namespace Eigen;
using namespace std;
int main(int argc, char const *argv[])
{
    VectorXd v(6);
    v << 1, 2, 3, 4, 5, 6;
    cout << v.cwiseSqrt() << "\n";
    Tensor<double, 1> t(6);
    for (auto i=0; i<v.size(); i++){
        t(i) = v(i);
    }
    cout << "\n";
    for (auto i=0; i<t.size(); i++){
        cout << t(i) << " ";
    }
    cout << "\n";
    return 0;
}
If the above code is compiled without any optimizations, like:
g++ -I ~/miniconda3/envs/myenv/include/ mwe.cpp -o mwe
I get the following compiler error:
/home/myname/miniconda3/envs/myenv/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld: /tmp/cc2q8gj4.o: in function `Eigen::internal::(anonymous namespace)::get_random_seed()':
mwe.cpp:(.text+0x15): undefined reference to `clock_gettime'
collect2: error: ld returned 1 exit status
If instead I ask for 'n' optimization level, like the following:
g++ -I ~/miniconda3/envs/loos/include/ -On mwe.cpp -o mwe
The program builds without complaint and I get expected output:
$ ./mwe
1
1.41421
1.73205
2
2.23607
2.44949
1 2 3 4 5 6
I have no clue why this little program, or the real program I'm trying to write, would be trying to get a random seed for anything. Any advice would be appreciated. The reason why I would like to build without optimization is so that debugging is easier. I actually thought all this was being caused by debug flags, but I realized that my build tool's debug setting didn't ask for optimization and narrowed that down to the apparent cause. If I throw -g -O1 I do not see the error.
Obviously, if one comments out all the code that has to do with the Tensor module, that is, everything in main above 'return' and below the cwiseSqrt() line, plus the include statement, the code builds and produces the expected output.
Technically, this is a linker error (g++ invokes the compiler as well as the linker, depending on the command-line arguments). You get a linker error whenever an externally defined function is referenced from code that ends up in the object file, even if that code is never reached.
When compiling with optimizations enabled, g++ removes uncalled functions that have internal linkage (such as those in an anonymous namespace), so the reference disappears and you get no linker error. You may want to try -Og instead of -O1 for a better debugging experience.
The following code should produce similar behavior:
int foo(); // externally defined
namespace { // anonymous namespace
    // defined inside this module, but never called
    int bar() {
        return foo();
    }
}
int main() {
    // if you un-comment this line, the
    // optimized version will fail as well:
    // ::bar();
}
According to man clock_gettime you need to link with -lrt if your glibc version is older than 2.17 -- maybe that is the case for your setup:
g++ -I ~/miniconda3/envs/myenv/include/ mwe.cpp -o mwe -lrt

Why do I get undefined behavior when using OpenMP's firstprivate with std::vector on Intel compiler?

I have a problem when using OpenMP in combination with firstprivate and std::vector on the Intel C++ compiler. Take the following three functions:
#include <omp.h>
#include <vector>
void pass_vector_by_value(std::vector<double> p) {
    #pragma omp parallel
    {
        //do sth
    }
}
void pass_vector_by_value_and_use_firstprivate(std::vector<double> p) {
    #pragma omp parallel firstprivate(p)
    {
        //do sth
    }
}
void create_vector_locally_and_use_firstprivate() {
    std::vector<double> p(3, 7);
    #pragma omp parallel firstprivate(p)
    {
        //do sth
    }
}
The code compiles without warnings doing:
icc filename.cpp -openmp -Wall -pedantic
(icc version 14.0.1 (gcc version 4.7.0 compatibility))
or:
g++ filename.cpp -fopenmp -Wall -pedantic
(gcc version 4.7.2 20130108 [gcc-4_7-branch revision 195012] (SUSE Linux))
but after compiling with icc I am getting runtime errors such as:
*** Error in `./a.out': munmap_chunk(): invalid pointer: 0x00007fff31bcc980 ***
when calling the second function (pass_vector_by_value_and_use_firstprivate)
So the error only occurs when the firstprivate clause is used (which should invoke the copy constructor) and the vector is passed by value to the function (which should invoke the copy constructor as well). When either not passing the vector but creating it locally in the function or not using firstprivate there is no error! On gcc I do not get any errors.
I am wondering if the code somehow produces undefined behavior or if this is a bug in icc ?
I get the same problem with ICC but not GCC. It looks like a bug. Here is a workaround:
void pass_vector_by_value2(std::vector<double> p) {
    #pragma omp parallel
    {
        std::vector<double> p_private = p;
        //do sth with p_private
    }
}
On the other hand, in general, I don't pass non-POD by value to functions anyway. I would use a reference but if you do that you get the error
error: ‘p’ has reference type for ‘firstprivate’
The solution to that is the code I posted above anyway. Pass it by value or by reference and then define a private copy inside the parallel region as I did in the code above.
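To make the by-reference variant concrete, here is a minimal sketch. The function `median_via_private_copies` is a hypothetical example, not code from the question: it takes the vector by const reference, lets every thread build its own copy inside the parallel region, and mutates only that copy. It assumes a non-empty input and a compiler that supports the OpenMP 3.1 `max` reduction.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical helper, not from the question: each thread sorts its own copy
// of p and reports the upper median. Assumes p is non-empty.
double median_via_private_copies(const std::vector<double>& p) {
    double result = std::numeric_limits<double>::lowest();
    #pragma omp parallel reduction(max : result)
    {
        // Per-thread copy made inside the region; no firstprivate clause needed.
        std::vector<double> p_private = p;
        std::sort(p_private.begin(), p_private.end());  // mutates only the copy
        // Every thread computes the same value, so the max reduction simply
        // hands that value back regardless of the number of threads.
        result = std::max(result, p_private[p_private.size() / 2]);
    }
    return result;
}
```

Because the copy is an ordinary local variable, its constructor and destructor run per thread in the usual way, which avoids whatever icc does with firstprivate on a by-value parameter.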

Cannot get <random> library to work

I'm having trouble getting the <random> library working on Mac OS X. First off, I tried to compile the following code, saved as rand.cpp, taken from the C++ website:
#include <iostream>
#include <random>
#include <string>
int main()
{
    const int nrolls=10000; // number of experiments
    const int nstars=100; // maximum number of stars to distribute
    std::default_random_engine generator;
    std::normal_distribution<double> distribution(5.0,2.0);
    int p[10]={};
    for (int i=0; i<nrolls; ++i) {
        double number = distribution(generator);
        if ((number>=0.0)&&(number<10.0)) ++p[int(number)];
    }
    std::cout << "normal_distribution (5.0,2.0):" << std::endl;
    for (int i=0; i<10; ++i) {
        std::cout << i << "-" << (i+1) << ": ";
        std::cout << std::string(p[i]*nstars/nrolls,'*') << std::endl;
    }
    return 0;
}
Upon compiling this with g++ rand.cpp -o rand I get the following errors:
rand.cpp:9: error: ‘default_random_engine’ is not a member of ‘std’
rand.cpp:10: error: ‘normal_distribution’ is not a member of ‘std’
Searching around, the suggestion seems to be that the issue is the compiler: apparently this library is only available with C++11. I found a way to update gcc using MacPorts, as shown here: Update GCC on OSX, but I still don't know how to use this new compiler. Running g++ rand.cpp -o rand returns the same errors even when I change the compiler with sudo port select --set gcc gcc40 or sudo port select --set gcc mp-gcc46. I also tried g++ -std=c++11 rand.cpp -o rand, which just returns
cc1plus: error: unrecognized command line option "-std=c++11"
Does anyone know what I am doing wrong?
Try it with Clang++, which should already be installed on your Mac, or with a newer version of GCC.
gcc42: I had this version installed, it didn't work, and didn't recognize -std=c++0x and -std=c++11.
gcc49: Installed this with brew, it gave the same error but -std=c++11 made it work.
Clang++: Worked like a charm without even specifying the standard (it probably defaults to c++11).
Also, check if you have the latest version of the command line tools, if not, download them from the Downloads for Apple Developers website.
What you're doing wrong
The version you installed doesn't have the -std=c++11 option, but it should work with -std=c++0x or -std=gnu++0x, that's what it says in the manual for the 4.6 version.
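To check which mode the selected compiler is actually using, a small probe can help. This is a sketch: `cxx11_enabled` is a hypothetical name, and the extra macro test is there because GCC releases before 4.7 left `__cplusplus` at 1 even with -std=c++0x and defined the GCC-specific `__GXX_EXPERIMENTAL_CXX0X__` instead.

```cpp
// Sketch: report whether this translation unit is compiled as C++11 or later.
// GCC before 4.7 left __cplusplus at 1 even with -std=c++0x, so the
// GCC-specific __GXX_EXPERIMENTAL_CXX0X__ macro is checked as well.
inline bool cxx11_enabled() {
#if __cplusplus >= 201103L || defined(__GXX_EXPERIMENTAL_CXX0X__)
    return true;
#else
    return false;
#endif
}
```

If this returns false for the g++ that port select picked, <random> will not be usable and the -std flag is the first thing to look at.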

OpenMP with restrict pointers fails with ICC while GCC/G++ succeeds

I implemented a simple matrix-vector multiplication for sparse matrices in CRS format using an implicit OpenMP directive in the multiplication loop.
The complete code is in GitHub: https://github.com/torbjoernk/openMP-Examples/blob/icc_gcc_problem/matxvec_sparse/matxvec_sparse.cpp
Note: It's ugly ;-)
To control the private and shared memory I'm using restrict pointers. Compiling it with GCC 4.6.3 on 64bit Linux works fine (besides two warnings about %u and unsigned int in a printf command, but that's not the point).
However, compiling it with ICC 12.1.0 on 64bit Linux fails with the error:
matxvec_sparse.cpp(79): error: "default_n_row" must be specified in a variable list at enclosing OpenMP parallel pragma
#pragma omp parallel \
^
with the definition of the variable and pointer in question
int default_n_row = 4;
int *n_row = &default_n_row;
and the OpenMP directive defined as
#pragma omp parallel \
    default(none) \
    shared(n_row, aval, acolind, arowpt, vval, yval) \
    private(x, y)
{
    #pragma omp for \
        schedule(static)
    for ( x = 0; x < *n_row; x++ ) {
        yval[x] = 0;
        for ( y = arowpt[x]; y < arowpt[x+1]; y++ ) {
            yval[x] += aval[y] * vval[ acolind[y] ];
        }
    }
} /* end PARALLEL */
Compiled with g++:
c++ -fopenmp -O0 -g -std=c++0x -Wall -o matxvec_sparse matxvec_sparse.cpp
Compiled with icc:
icc -openmp -O0 -g -std=c++0x -Wall -restrict -o matxvec_sparse matxvec_sparse.cpp
Is it an error in usage of GCC/ICC?
Is this a design issue in my code causing undefined behaviour?
If so, which line(s) is/are causing it?
Is it just inconsistency between ICC and GCC?
If so, what would be a good way to achieve compiler independence and compatibility?
Huh. Looking at the code, it's clear what icpc thinks the problem is, but I'm not sure without going through the specification which compiler is doing the right thing here, g++ or icpc.
The issue isn't the restrict keyword; if you take all those out and lose the -restrict option to icpc, the problem remains. The issue is that you've got in that parallel section default(none) shared(n_row...), but n_row is, at the start of the program, a pointer to default_n_row. And icpc is requiring that default_n_row also be shared (or, at least, something) in that omp parallel section.
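One way to sidestep the disagreement might be to read the row count into a plain local before the parallel region, so that nothing inside the region dereferences a pointer to another variable. The sketch below is an assumption on my part and has not been checked against icpc 12.1: `crs_matvec` and `rows` are hypothetical names, with `rows` playing the role of `*n_row`, while the array names mirror those in the question.

```cpp
#include <vector>

// Hypothetical CRS matrix-vector product; every variable used inside the
// parallel region is a plain local that is listed explicitly in shared(...).
std::vector<double> crs_matvec(const std::vector<double>& values,
                               const std::vector<int>& col_index,
                               const std::vector<int>& row_ptr,
                               const std::vector<double>& vec) {
    if (row_ptr.empty()) return std::vector<double>();
    int rows = static_cast<int>(row_ptr.size()) - 1;  // stands in for *n_row
    std::vector<double> result(rows, 0.0);

    // Raw pointers taken up front so the region only touches simple locals.
    const double* aval = values.data();
    const int* acolind = col_index.data();
    const int* arowpt = row_ptr.data();
    const double* vval = vec.data();
    double* yval = result.data();

    #pragma omp parallel for default(none) \
        shared(rows, aval, acolind, arowpt, vval, yval) schedule(static)
    for (int x = 0; x < rows; x++) {
        double sum = 0.0;  // declared inside the loop, so private per iteration
        for (int y = arowpt[x]; y < arowpt[x + 1]; y++) {
            sum += aval[y] * vval[acolind[y]];
        }
        yval[x] = sum;
    }
    return result;
}
```

Declaring the loop counters inside the for statements also removes the need for the private(x, y) clause.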