I'm new to nvcc and I've seen a library that is compiled with the -O3 option for both g++ and nvcc:
CC=g++
CFLAGS=--std=c++11 -O3
NVCC=nvcc
NVCCFLAGS=--std=c++11 -arch sm_20 -O3
What does -O3 do?
It selects optimization level 3, which is basically shorthand for
a set of individual optimization options, mostly aimed at speed (see the links below).
I can't find any documentation on it.
... it is one of the best-known compiler options:
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#options-for-altering-compiler-linker-behavior
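To make this a bit more concrete: the flag changes how the code is compiled, not what it computes. Below is a minimal sketch (the function names are made up for illustration) of a loop that an optimizer may unroll or vectorize at higher -O levels, plus a helper that reports whether the build was an optimizing one. It relies on the __OPTIMIZE__ macro, which GCC and Clang define for optimizing builds; other compilers may not define it.

```cpp
#include <cassert>
#include <cstdint>

// Sum of i*i for i in [0, n). The result is the same at every -O level;
// only the generated machine code (and its speed) differs.
std::uint64_t sum_squares(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        total += i * i;
    }
    return total;
}

// True when the compiler defines __OPTIMIZE__ (GCC and Clang do this for
// optimizing builds). Assumption: the macro is not guaranteed elsewhere.
constexpr bool built_with_optimization() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Small sanity check: 0 + 1 + 4 + 9 = 14, with or without -O3.
void check_sum_squares() {
    assert(sum_squares(4) == 14);
}
```

Building this once with -O0 and once with -O3 and comparing the output of built_with_optimization() is a quick way to confirm which flags actually reached the compiler.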
I know how to use CMake to link OpenMP in a cross-platform way:
find_package(OpenMP REQUIRED)
link_libraries(OpenMP::OpenMP_CXX)
But I don't know how to force CMake to link OpenMP statically. In fact, all of the official CMake variables for the OpenMP library seem to refer to the dynamic library.
Anyway, the non-cross-platform way to do it is:
clang++ -std=c++2a test.cpp -Iinclude -march=native -O3 -c
clang++ test.o -o test.x /usr/local/lib/libomp.a -pthread
or if you use gcc
g++-10 -std=c++2a test.cpp -Iinclude -march=native -O3 -c
g++-10 test.o -o test.x /usr/local/opt/gcc/lib/gcc/10/libgomp.a -pthread
By the way, is this a CMake defect, or is there another way to accomplish it?
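For anyone who wants to reproduce this, here is a minimal sketch of what test.cpp could contain (the function names are hypothetical, not taken from my real code). Note that the compile step also needs -fopenmp, or your compiler's equivalent, for the pragma to take effect; without it the loop simply runs serially and gives the same result. After linking, ldd test.x on Linux or otool -L test.x on macOS should show whether the OpenMP runtime is still pulled in as a shared library.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _OPENMP
#include <omp.h>  // only available/needed when OpenMP is enabled
#endif

// Number of threads OpenMP may use; 1 when built without OpenMP support.
int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Sum 1 + 2 + ... + n with a parallel reduction.
// Without OpenMP enabled the pragma is ignored and the loop runs serially.
std::int64_t parallel_sum(std::int64_t n) {
    assert(n >= 0);
    std::int64_t total = 0;
#pragma omp parallel for reduction(+ : total)
    for (std::int64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```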
Not an answer but too much to fit into a comment.
I don't know anything about OpenMP other than that CMake does support it:
https://cmake.org/cmake/help/latest/module/FindOpenMP.html?highlight=openmp
I don't see anything in the documentation referring to static/shared. Maybe you are correct and it only supports shared libs.
Double check by asking on the official CMake Discourse:
https://discourse.cmake.org/
You could also try reading the official FindOpenMP.cmake module since this is all open source.
EDIT:
If you are correct and CMake is lacking this functionality, consider contributing and adding it :)
I want to get the vectorisation report regarding automatic vectorisation and OpenMP SIMD.
# gcc
-fopenmp-simd -O3 -ffast-math -march=native -fopt-info-omp-vec-optimized-missed
# clang
-fopenmp-simd -O3 -ffast-math -march=native -Rpass="loop|vect" -Rpass-missed="loop|vect" -Rpass-analysis="loop|vect"
# icc on Linux
-qopenmp-simd -O3 -ffast-math -march=native -qopt-report-file=stdout -qopt-report-format=vs -qopt-report=5 -qopt-report-phase=loop,vec
# msvc
-openmp -O2 -fp:fast -arch:AVX2 -Qvec-report:2
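A small loop to try the flags above on might look like the following. This is only a sketch under my own assumptions (the function name is made up); whether each compiler actually reports the loop as vectorized depends on the compiler version and target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of two equally sized vectors.
// The "omp simd" pragma with a reduction clause asks the compiler to
// vectorize the loop; it is ignored when OpenMP SIMD support is not enabled.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const float* pa = a.data();
    const float* pb = b.data();
    const std::size_t n = a.size();
    float sum = 0.0f;
#pragma omp simd reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += pa[i] * pb[i];
    }
    return sum;
}
```

Compiling the file with -c and one of the flag sets should be enough to get the report, since the report is produced at compile time and no linking is needed.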
I don't think Apple's flavor of clang supports OpenMP (at least not by default on macOS).
You may find ways to extend it though.
I was trying some vectorisation after upgrading g++ from version 4.8.5 to 5.4.1, with these flags:
g++ particles-v3.cpp -o v3 -O3 -msse4.2 -mfpmath=sse -ftree-vectorizer-verbose=5 -ffast-math -m32 -march=native -std=c++11
While the same command gives over 4000 lines of detailed information about the vectorization with g++-4.8, with g++-5.4 it does not say anything.
Is there some major change in g++-5 that makes -ftree-vectorizer-verbose=X unusable, or is there simply something wrong in the command line? How do I change it so that it works?
EDIT:
Found out that using -fopt-info-vec-all gives exactly the info I wanted. Thus question solved.
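For reference, a minimal sketch of the kind of code this is useful on (not my actual particles-v3.cpp; the file and function names are made up). With something like g++ -O3 -fopt-info-vec-all -c loops.cpp, I would expect the first loop to be reported as vectorized and the second as missed because of its loop-carried dependency, though the exact messages vary between GCC versions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: a typical candidate for auto-vectorization.
void scale(std::vector<float>& v, float factor) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] *= factor;
    }
}

// Each iteration reads the result of the previous one (a loop-carried
// dependency), which compilers typically do not vectorize as written.
void prefix_sum(std::vector<float>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        v[i] += v[i - 1];
    }
}

// Small usage check for both loops.
void check_loops() {
    std::vector<float> v{1.0f, 2.0f, 3.0f, 4.0f};
    scale(v, 2.0f);   // v is now {2, 4, 6, 8}
    prefix_sum(v);    // v is now {2, 6, 12, 20}
    assert(v[3] == 20.0f);
}
```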
I'm using RcppEigen to write some C++ functions for my R code, and I'd like to optimize their compilation as much as possible. When I've used Eigen in the past, I've gotten a significant boost from -O3 and -fopenmp. Following Dirk's advice, I edited ~/.R/Makevars so that my Eigen code would be compiled with these flags:
CPPFLAGS=-O3 -fopenmp
This works--when I check what's happening during compilation (ps ax | grep cpp) I see:
27097 pts/6 R+ 0:00 /usr/libexec/gcc/x86_64-redhat-linux/4.4.7/cc1plus -quiet -I/usr/include/R -I/home/sf/R/x86_64-redhat-linux-gnu-library/3.0/Rcpp/include -I/home/sf/R/x86_64-redhat-linux-gnu-library/3.0/RcppEigen/include -D_GNU_SOURCE -D_REENTRANT -DNDEBUG -D_FORTIFY_SOURCE=2 file69b757e053ad.cpp -quiet -dumpbase file69b757e053ad.cpp -m64 -mtune=generic -auxbase-strip file69b757e053ad.o -g -O3 -O2 -Wall -fopenmp -fpic -fexceptions -fstack-protector --param ssp-buffer-size=4 -o -
The flags I wanted are there, -O3 and -fopenmp. But I also see -O2 there, which is presumably the system-wide default (I verified this by removing ~/.R/Makevars and indeed, -O2 is there but -O3 and -fopenmp are not.)
So the question: how do I get rid of the -O2? Or, does it actually matter? The g++ man page says:
-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also
turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning,
-fgcse-after-reload, -ftree-vectorize and -fipa-cp-clone options.
So maybe it's fine to have both -O2 and -O3?
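One thing worth keeping in mind here: the GCC documentation says that when several -O options are given, the last one is the one that takes effect. In the cc1plus line above, -O2 comes after -O3, so the order matters. A toy model of that rule, purely for illustration (this is not GCC's option parser, and the function name is made up):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the last argument that starts with "-O", or an empty string
// if there is none. Models only the documented "last -O option wins" rule.
std::string effective_opt_flag(const std::vector<std::string>& args) {
    std::string last;
    for (const std::string& arg : args) {
        if (arg.rfind("-O", 0) == 0) {  // arg begins with "-O"
            last = arg;
        }
    }
    return last;
}

// Usage check with the flag order seen in the cc1plus command line.
void check_last_flag_wins() {
    const std::vector<std::string> args{"-g", "-O3", "-O2", "-Wall", "-fopenmp"};
    assert(effective_opt_flag(args) == "-O2");
}
```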
I think you need CXXFLAGS, not CPPFLAGS, in your ~/.R/Makevars.
I set Makevars in the following repo to benchmark various C++ compiler flags in R/Rcpp
https://github.com/jackwasey/optimization-comparison
I use a function from https://github.com/jimhester/covr to do that programmatically, if that's of use to you.
Also, did you see the following? R: C++ Optimization flag when using the inline package
We are trying to implement a JIT compiler whose generated code is supposed to perform the same as code compiled with clang -O4. Is there a place where I could easily get the list of optimization passes that clang invokes when -O4 is specified?
As far as I know, -O4 means the same thing as -O3 with LTO (Link Time Optimization) enabled.
See the following code fragments:
Tools.cpp // Manually translate -O to -O2 and -O4 to -O3;
Driver.cpp // Check for -O4.
Also see here:
You can produce bitcode files from clang using -emit-llvm or -flto, or the -O4 flag which is synonymous with -O3 -flto.
For the optimizations used with the -O3 flag, see the PassManagerBuilder.cpp file (look for the OptLevel variable; it will have the value 3).
Note that as of Apple LLVM version 5.1, -O4 no longer implies link time optimization. If you want that, you need to pass -flto. See the Xcode 5 Release Notes.