I have a simulation code, written in C++ using standard libraries together with the Eigen library for linear algebra and matrices, and observed the following effect:
If I run it on my personal laptop, which is a Macbook with OS X, I get different results than what I get if I run it on my office PC with Ubutuntu OS.
The difference in the average result is approximately 20%.
I am sure that it is the same code, because both are running the updated version of the same repository.
The code is too complex to provide here, and has a lot of complicated mechanisms inside.
Remark: Running on the same device gives always a similar average, so my results are reproducible as long as they run on the same HW.
How should I interpret this? That my code is erroneous? Or such effects are normal?
Thanks in advance!
Edit1: I am using qmake but when I run the make command, this is an example line to compile my main.cpp on my Ubuntu PC:
g++ -c -pipe -O2 -std=gnu++11 -Wall -W -D_REENTRANT -fPIC -DQT_DEPRECATED_WARNINGS -DQT_NO_DEBUG -DQT_CORE_LIB -I. -ICommon -INetwork -IEigen -I../../../Qt/5.10.0/gcc_64/include -I../../../Qt/5.10.0/gcc_64/include/QtCore -Imocs -I../../../Qt/5.10.0/gcc_64/mkspecs/linux-g++ -o objs/main.o main.cpp
The same line looks like the following on my OS X laptop:
/Library/Developer/CommandLineTools/usr/bin/clang++ -c -pipe -stdlib=libc++ -g -std=gnu++11 -arch x86_64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -mmacosx-version-min=10.13 -Wall -Wextra -fPIC -DQT_DEPRECATED_WARNINGS -DQT_CORE_LIB -I. -ICommon -INetwork -IEigen -I../../../opt/Qt5.14/lib/QtCore.framework/Headers -Imocs -I../../../opt/Qt5.14/mkspecs/macx-clang -F/Users/XXX/opt/Qt5.14/lib -o objs/main.o main.cpp
Edit2: I have an array of doubles which is calculated offline using Eigen libraries. Offline means here, the elements of the array have deterministic values, i.e., a lookup table. So, I have compared the content on both platforms, and the (printed) content is completely identical.
Related
I maintain the C+=-flavored CUDA API wrappers library. The library's current commit is relatively-well-tested, with some example programs and quite a few users. However, sometime very recently (can't say exactly when), and without committing anything new, I now get NVCC warnings during the "dlink" phase of my example programs, e.g.:
/path/to/nvcc /path/to/cuda-api-wrappers/examples/modified_cuda_samples/vectorAdd/vectorAdd.cu -dc -o /path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/./vectorAdd_generated_vectorAdd.cu.o -ccbin /opt/gcc-5.4.0/bin/gcc -m64 -gencode arch=compute_52,code=compute_52 --std=c++11 -Xcompiler -Wall -O3 -DNDEBUG -DNVCC -I/path/to/cuda/include -I/path/to/cuda-api-wrappers/src
/path/to/nvcc -gencode arch=compute_52,code=compute_52 --std=c++11 -Xcompiler -Wall -O3 -DNDEBUG -m64 -ccbin /opt/gcc-5.4.0/bin/gcc -dlink /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/./vectorAdd_generated_vectorAdd.cu.o /path/to/cuda/lib64/libcublas_device.a -o /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/./vectorAdd_intermediate_link.o
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
... this repeats many times ...
but the dlink face does conclude. This is already strange, since I haven't explicitly used any beta features.
/opt/gcc-5.4.0/bin/g++ -Wall -Wpedantic -O2 -DNDEBUG -L/path/to/cuda/lib64 -rdynamic CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/vectorAdd_generated_vectorAdd.cu.o CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o -o examples/bin/vectorAdd lib/libcuda-api-wrappers.a -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -ldl -lrt -lnvToolsExt -Wl,-Bstatic -lcudadevrt -Wl,-Bdynamic
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_25_cublas_compute_70_cpp1_ii_f0559976':
link.stub:(.text+0xe0): undefined reference to `__fatbinwrap_25_cublas_compute_70_cpp1_ii_f0559976'
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_25_xerbla_compute_70_cpp1_ii_cd7f3ad3':
link.stub:(.text+0x190): undefined reference to `__fatbinwrap_25_xerbla_compute_70_cpp1_ii_cd7f3ad3'
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_23_nrm2_compute_70_cpp1_ii_8edbce95':
link.stub:(.text+0x240): undefined reference to `__fatbinwrap_23_nrm2_compute_70_cpp1_ii_8edbce95'
... more udnefined reference errors here ...
My question: Why would this happen and how do I circumvent/avoid/resolve it?
Notes:
I'm using separable compilation
I'm getting these specific errors with CUDA 9.1 and a SM 5.2 device (no 7.0).
The CMakeLists.txt is here.
I'm obviously clearing CMakeCache.txt before building.
This has happened to me both on a GNU/Linux Mint 18.3 and Fedora 26. On the first machine there have been some apt-get dist-upgrade's done, and now GCC is up to version 5.5.0, in case that matters. On the second machine - there really has been no change that I'm aware of; same compiler and CUDA version.
A partial answer / workaround:
This issue only seems to occur when libcublas is involved. If I remove /path/to/cuda/lib64/libcublas_device.a from the -dlink phase command-line, all warnings and errors go away (including from later stages). And in fact, my wrapper library is oblivious of cublas, not sure why CMake is adding it; it's not in $CUDA_LIBRARIES. See also:
Why does CMake force the use of libcublas with separable compilation?
I have Matlab code that uses sparse and '\' as solver for a linear system. I have hand tailored a C++ function that uses the conjage gradient sparse solver from Eigen in order to run the code outside Matlab using the coder toolbox to export the rest of the Matlab code. I export a static library and I'm able to compile and execute it on my remote system without any problem. However, I'm not able to run the code using multi-threading. I have tried to export it as a Matlab executable (mex) and the whole code runs in parallel without problem inside Matlab.
So my conclusions are that it must be something different in the compiler/linker flags on my remote system. I use -fopenmp in both complier and linker and I run the executable with OMP_NUM_THREADS=n and if I read out "n" inside my program I get the same number as I have in my execution.
My question is, do I need to include anything else in my compiler/linker, apart from needed things related to my particular code, in order to get Eigen to run multi-threaded?
UPDATE:
On the remote system I do:
g++ -c -m64 -fopenmp -std=c++11 -I /usr/local/include/Eigen/src/misc/
~/src/myHandTailoredFile.cpp -o ~/src/myHandTailoredFile.o
and with linker options
-fopenmp -L /usr/local/lib64/ -llapack -L /usr/local/lib/ -lcurl
To compile my hand tailored file together with myBigProgram into a Mex-file I do
g++ -DHAVE_LAPACK_CONFIG_H -DLAPACK_COMPLEX_STRUCTURE -DMW_HAVE_LAPACK_DECLS -c -ansi -fexceptions -fPIC -fno-omit-frame-pointer -pthread -D_GNU_SOURCE -DMATLAB_MEX_FILE -std=c++0x -fopenmp -DOMPLIBNAME="\"/usr/local/MATLAB/R2016a/sys/os/glnxa64/libiomp5.so\"" -O -DNDEBUG -I "/usr/local/MATLAB/R2016a/simulink/include" -I "/usr/local/MATLAB/R2016a/toolbox/shared/simtargets" -I "./interface" -I "/usr/local/lib" -I "/usr/local/MATLAB/R2016a/extern/include" -I "." "~/src/myHandTailoredFile.cpp"
with linker options set to
-pthread -Wl,--no-undefined -Wl,-rpath-link,/usr/local/MATLAB/R2016a/bin/glnxa64 -shared -L/usr/local/MATLAB/R2016a/bin/glnxa64 -lmx -lmex -lmat -lm -lstdc++ -lcurl -fPIC -L/usr/local/MATLAB/R2016a/sys/os/glnxa64 -liomp5 -o myBigProgram_mex.mexa64 -L"/usr/local/MATLAB/R2016a/bin/glnxa64" -lmwblas -lmwlapack -lemlrt -lcovrt -lut -lmwmathutil
Note that the compiler and linker options for the later is completely defined by Matlab.
I am on a complex project built in python2.7 that uses the PyDSTool package for analysis of dynamical system. PyDSTool provides two C-based integrators - Radau and Dopri - which I want to use to integrate my system of equations whose source is coded in a bunch of C/C++ files.
I have little control on the package, and when I instantiate the integrator, I can only add headers *.H files, source files (*.C, *.CPP) and pass the directories to include in the search path of the compiler as well as libraries to link to.
Since a consistent part of the code is based on C++11 I am passing to the compiler also the argument -std=C++11.
Eventually, /PyDSTool/Generators/mixins.py launch a setup command (line 185) which in turn runs the command build_ext from distutils to which all the above flags are appended.
For the sake of clarity: the flags that I am appending are:
compile options: '-I/usr/lib64/python2.7/site-packages/numpy/core/include -I/home/maurizio/Dropbox/StabilityAnalysis_tmp -I/usr/local/pydstool/PyDSTool/integrator -I/usr/include/python2_7 -I/usr/include/numpy -I/home/maurizio/Dropbox/Ongoing_Projects/pycustommodules -I/home/maurizio/Dropbox/Ongoing_Projects/c_libraries -I/home/maurizio/Dropbox/Ongoing_Projects/c_libraries/models -I/home/maurizio/Dropbox/Ongoing_Projects/DePitta_PNAS/Software/Stability_Analysis/ -I/usr/lib64/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c'
extra options: '-std=c++11 -w -Wno-return-type -Wall -lpython2.7 -lm -lgsl -lgslcblas -D__DOPRI__'
The resulting compilation command as issued by PyDSTool reads:
error: Command "gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/lib64/python2.7/site-packages/numpy/core/include -I/home/maurizio/Dropbox/StabilityAnalysis_tmp -I/usr/local/pydstool/PyDSTool/integrator -I/usr/include/python2_7 -I/usr/include/numpy -I/home/maurizio/Dropbox/Ongoing_Projects/pycustommodules -I/home/maurizio/Dropbox/Ongoing_Projects/c_libraries -I/home/maurizio/Dropbox/Ongoing_Projects/c_libraries/models -I/home/maurizio/Dropbox/Ongoing_Projects/DePitta_PNAS/Software/Stability_Analysis/ -I/usr/lib64/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c /home/maurizio/Dropbox/StabilityAnalysis_tmp/dop853_temp/ei_network_vf.c -o /home/maurizio/Dropbox/StabilityAnalysis_tmp/dop853_temp/home/maurizio/Dropbox/StabilityAnalysis_tmp/dop853_temp/ei_network_vf.o -std=c++11 -w -Wno-return-type -Wall -lpython2.7 -lm -lgsl -lgslcblas -D__DOPRI__" failed with exit status 1
Once looking into the build.log file automatically generated by PyDSTool, it turns out that the exit status is due to the fact that the compiler does not see the C++ libraries that are in several routines/libs used by my code, e.g.
/usr/include/blitz/blitz.h:45:18: fatal error: string: No such file or directory
#include <string>
^
Compilation Terminated
Now, it is not a problem of my code, because if I compile my code as a standalone in python or through scipy.weave with the same compile and extra options pasted above, it works. It is a problem of making PyDSTool build the code within the integrator. As I am NOT practical with distutils and all gcc options I hope there is some expert here that could provide me with some insight. I suspect in fact that I am missing some options or whatever to pass to the compiler.
Just for the sake of completeness. The issue I pointed out above does not have an easy workaround. PyDSTool C-based integrators (i.e. Radau and Dopri) cannot be compiled with source code for the equations in C++ but only in C. So either you recast your code in plain C or try to edit PyDSTool integrators and recast them in C++. The first option is likely the only one currently possible (at least to some non-experts as who is writing).
I have been writing a c++ program in Ubuntu and window8 using armadillo. Under Windows8 the program compiles without problems.
The program is just using the linear systems solver.
Under Ubuntu the compiler says
"reference to `wrapper_dgels_' not defined"
The compiler line I use is:
mpic++ -O2 -std=c++11 -Wall -fexceptions -O2 -larmadillo -llapack -lblas program.o
However, right before the error I see:
g++ module_of_the_error.o
Which is something I haven't set.
I am using code blocks in Ubuntu, and I compiled armadillo with all the libraries that cmake asked. (BLAS< LAPACK, OpenBLAS, HDF5, ARPACK, etc)
I have no clue what might be causing the problem, since the exact same code compiles in visual studio.I have tried the compiler line modifications suggested but it does not seem to work.
Any help is appreciated.
This is one trap I fell into myself one time. You will not like the likely cause of your error.
The order of the arguments to the linker matters.
Instead of
mpic++ -O2 -std=c++11 -Wall -fexceptions -O2 -larmadillo -llapack -lblas program.o
try:
mpic++ -O2 -std=c++11 -Wall -fexceptions -O2 program.o -larmadillo -llapack -lblas
I.e., put the object files to be linked into the executable before the libraries.
By the way, at this stage you are only linking files that have already been compiled. It is not necessary to repeat command line options that are only relevant for compiling. So this will be equivalent:
mpic++ program.o -larmadillo -llapack -lblas
Moreover, depending on how you installed Armadillo, you are adding either one or two superfluous libraries in that line. One of the following should be enough:
mpic++ program.o -larmadillo
or
mpic++ program.o -llapack -lblas
EDIT: as the answer by rerx states, the problem is probably just a simple ordering of the switches/arguments supplied to g++. All the -l switches need to be after the -o switch. Or in other words, put the -o switch before any -l switches. For example:
g++ prog.cpp -o prog -O3 -larmadillo
original answer:
Looks like your compiler can't find the Armadillo run-time library. The proper solution is to specify the path for armadillo run-time library using the -L switch. For example, g++ -O2 blah.cpp -o blah -L /usr/local/lib/ -larmadillo
Another possible solution is to define ARMA_DONT_USE_WRAPPER before including the armadillo header, and then directly link with LAPACK and BLAS. For example:
#define ARMA_DONT_USE_WRAPPER
#include <armadillo>
More details are available at the Armadillo frequently asked questions page.
I'm using RcppEigen to write some C++ functions for my R code, and I'd like to optimize their compilation as much as possible. When I've used Eigen in the past, I've gotten a significant boost from -O3 and -fopenmp. Following Dirk's advice, I edited ~/.R/Makevars so that my Eigen code would be compiled with these flags:
CPPFLAGS=-O3 -fopenmp
This works--when I check what's happening during compilation (ps ax | grep cpp) I see:
27097 pts/6 R+ 0:00 /usr/libexec/gcc/x86_64-redhat-linux/4.4.7/cc1plus -quiet -I/usr/include/R -I/home/sf/R/x86_64-redhat-linux-gnu-library/3.0/Rcpp/include -I/home/sf/R/x86_64-redhat-linux-gnu-library/3.0/RcppEigen/include -D_GNU_SOURCE -D_REENTRANT -DNDEBUG -D_FORTIFY_SOURCE=2 file69b757e053ad.cpp -quiet -dumpbase file69b757e053ad.cpp -m64 -mtune=generic -auxbase-strip file69b757e053ad.o -g -O3 -O2 -Wall -fopenmp -fpic -fexceptions -fstack-protector --param ssp-buffer-size=4 -o -
The flags I wanted are there, -O3 and -fopenmp. But I also see -O2 there, which is presumably the system-wide default (I verified this by removing ~/.R/Makevars and indeed, -O2 is there but -O3 and -fopenmp are not.)
So the question: how do I get rid of the -O2? Or, does it actually matter? The g++ man page says:
-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also
turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-
after-reload, -ftree-vectorize and -fipa-cp-clone options.
So maybe it's fine to have both -O2 and -O3?
I think you need CXXFLAGS not CPPFLAGS in your ~/.R/Makevars
I set Makevars in the following repo to benchmark various C++ compiler flags in R/Rcpp
https://github.com/jackwasey/optimization-comparison
I use a function from https://github.com/jimhester/covr to do that programmatically, if that's of use to you.
Also, did you see the following? R: C++ Optimization flag when using the inline package