I was trying some vectorisation after upgrading g++ from version 4.8.5 to 5.4.1, with these flags:
g++ particles-v3.cpp -o v3 -O3 -msse4.2 -mfpmath=sse -ftree-vectorizer-verbose=5 -ffast-math -m32 -march=native -std=c++11
While the same command gives over 4000 lines of detailed information about the vectorization with g++-4.8, with g++-5.4 it does not say anything.
Is there some major change in g++-5 that makes -ftree-vectorizer-verbose=X unusable, or is there simply something wrong with the command line? How do I change it so that it works?
EDIT:
Found out that using -fopt-info-vec-all gives exactly the info I wanted, so the question is solved.
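A minimal sketch for trying the flag from the edit above on a loop small enough to read the whole report. The file name saxpy.cpp and the function saxpy are hypothetical and are not taken from particles-v3.cpp.

```cpp
// saxpy.cpp: hypothetical test file, not part of the original question.
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i] over the common length of x and y.
// A simple unit-stride loop like this is a typical auto-vectorization candidate.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xp = x.data();
    float* yp = y.data();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Compiling this with g++ -std=c++11 -O3 -c saxpy.cpp -fopt-info-vec-all should print the vectorizer's notes for the loop. If the full output is too noisy, -fopt-info-vec-optimized and -fopt-info-vec-missed restrict it to loops that were or were not vectorized.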
Related
I'm trying to compile C code on a Jetson Nano and I get this error during compilation. I tried removing every occurrence of '-m64', but it seems like it's added automatically. This is the command where it fails: /usr/bin/gcc-7 -Wall -Wextra -Wconversion -pedantic -Wshadow -m64 -Wfatal-errors -O0 -g -o CMakeFiles/dir/testCCompiler.c.o -c /home/user/dir/CMakeFiles/CMakeTmp/testCCompiler.c
uname -a: Linux jetson-nano 4.9.140-tegra aarch64 aarch64 aarch64 GNU/Linux
gcc-7 -v: Using built-in specs.
COLLECT_GCC=gcc-7
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/7/lto-wrapper
Target: aarch64-linux-gnu
gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)
CMAKE_C_COMPILER = gcc-7
CMAKE_CXX_COMPILER = g++-7
CXX_COMPILE_FLAGS = "-Wall -Werror -Wextra -Wnon-virtual-dtor -Wconversion -Wold-style-cast -pedantic -Wshadow"
C_COMPILE_FLAGS = "-Wall -Wextra -Wconversion -pedantic -Wshadow"
gcc-7: error: unrecognized command line option ‘-m64’
error: unrecognized command line option ‘-m64’
I believe you are looking for -march=armv8-a (and friends), and not -m64. The GCC arm64 options are available at 3.18.1 AArch64 Options in the manual.
Aarch64 includes ASIMD in the base specification, so there are no extra gyrations needed for it. ASIMD is "Advanced SIMD instructions", and it is what ARM calls NEON on the Aarch32 and Aarch64 architectures.
If you want to enable extensions, like CRC or Crypto, then the option would look like -march=armv8.1-a+crc or -march=armv8.1-a+crypto or -march=armv8.1-a+crc+crypto.
The rough x86 equivalents are listed below. The ARM port of GCC does not use the same option model as x86, which is confusing for new users (or at least it was confusing for me).
-march=armv8-a → -msse2
-march=armv8.1-a+crc → -msse2 -msse4.1
-march=armv8.1-a+crypto → -msse2 -mpclmul -maes
-march=armv8.1-a+crc+crypto → -msse2 -msse4.1 -mpclmul -maes
The ARM instruction set includes SHA in crypto, so the x86 options should probably include -msha. The problem is that x86 SHA did not arrive until about 8 years after carryless multiplies and AES.
Also, GCC ARM compilers usually don't understand -march=native. Older GCC versions simply crash on it, mid-range versions silently ignore it, and I believe the latest versions honor it.
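One way to check which of the extensions above a given set of -march/-m flags actually enables is to look at the macros the compiler predefines. The sketch below assumes GCC or Clang. The function name target_features is hypothetical, and the list of macros is only a sample (newer ARM toolchains may define __ARM_FEATURE_AES and __ARM_FEATURE_SHA2 rather than __ARM_FEATURE_CRYPTO).

```cpp
#include <string>

// Returns a space-separated list of feature macros that the compiler
// predefined for the flags this translation unit was built with.
std::string target_features() {
    std::string out;
#if defined(__aarch64__)
    out += "aarch64 ";
#endif
#if defined(__ARM_NEON)
    out += "neon ";      // ASIMD, part of the AArch64 baseline
#endif
#if defined(__ARM_FEATURE_CRC32)
    out += "crc ";       // enabled by +crc
#endif
#if defined(__ARM_FEATURE_CRYPTO)
    out += "crypto ";    // enabled by +crypto
#endif
#if defined(__SSE2__)
    out += "sse2 ";
#endif
#if defined(__SSE4_1__)
    out += "sse4.1 ";
#endif
#if defined(__PCLMUL__)
    out += "pclmul ";
#endif
#if defined(__AES__)
    out += "aes ";
#endif
#if defined(__SHA__)
    out += "sha ";
#endif
    if (out.empty()) {
        out = "none";    // none of the sampled macros were defined
    }
    return out;
}
```

Building the same file once with -march=armv8-a and once with -march=armv8.1-a+crc+crypto (or the x86 counterparts) and printing the result shows how the flags map onto features.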
This error often happens when cross-compiling with Rust/Cargo, because Cargo isn't smart enough to find cross-build tools by itself.
You need to set the appropriate env vars. In the example, replace x86_64_unknown_linux_gnu with your target, and the paths with those of your cross-build tools (the example is for Debian). Watch out: the env vars are case-sensitive and inconsistently named!
# for the cc crate
export HOST_CC=gcc
export CC_x86_64_unknown_linux_gnu=/usr/bin/x86_64-linux-gnu-gcc
# for Cargo
export CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=/usr/bin/x86_64-linux-gnu-gcc
I want to get the vectorisation report regarding automated vectorisation and OpenMP SIMD.
# gcc
-fopenmp-simd -O3 -ffast-math -march=native -fopt-info-omp-vec-optimized-missed
# clang
-fopenmp-simd -O3 -ffast-math -march=native -Rpass="loop|vect" -Rpass-missed="loop|vect" -Rpass-analysis="loop|vect"
# icc on Linux
-qopenmp-simd -O3 -ffast-math -march=native -qopt-report-file=stdout -qopt-report-format=vs -qopt-report=5 -qopt-report-phase=loop,vec
# msvc
-openmp -O2 -fp:fast -arch:AVX2 -Qvec-report:2
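A small loop to try the gcc and clang report flags above on. This is a hedged sketch: the function name dot is hypothetical, and whether the loop is reported as vectorized depends on the compiler version and target.

```cpp
#include <cstddef>

// Dot product with an explicit OpenMP SIMD reduction.
// With -fopenmp-simd the pragma asks the compiler to vectorize the loop and
// permits reordering the additions into per-lane partial sums. Without that
// flag the pragma is ignored and the loop is left to the auto-vectorizer.
float dot(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Compiling the file with -c and the gcc or clang flags listed above should produce a remark for this loop, which makes it easy to compare the wording of the two reports.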
I don't think Apple's flavor of clang supports OpenMP (At least not by default on macOS).
You may find ways to extend it though.
I'm new to nvcc and I've seen a library where compilation is done with option -O3, for g++ and nvcc.
CC=g++
CFLAGS=--std=c++11 -O3
NVCC=nvcc
NVCCFLAGS=--std=c++11 -arch sm_20 -O3
What is -O3 doing?
It's optimization at level 3, basically a shortcut for several other options related to speed optimization etc. (see links below).
I can't find any documentation on it.
... it is one of the best known options:
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#options-for-altering-compiler-linker-behavior
I'm using RcppEigen to write some C++ functions for my R code, and I'd like to optimize their compilation as much as possible. When I've used Eigen in the past, I've gotten a significant boost from -O3 and -fopenmp. Following Dirk's advice, I edited ~/.R/Makevars so that my Eigen code would be compiled with these flags:
CPPFLAGS=-O3 -fopenmp
This works--when I check what's happening during compilation (ps ax | grep cpp) I see:
27097 pts/6 R+ 0:00 /usr/libexec/gcc/x86_64-redhat-linux/4.4.7/cc1plus -quiet -I/usr/include/R -I/home/sf/R/x86_64-redhat-linux-gnu-library/3.0/Rcpp/include -I/home/sf/R/x86_64-redhat-linux-gnu-library/3.0/RcppEigen/include -D_GNU_SOURCE -D_REENTRANT -DNDEBUG -D_FORTIFY_SOURCE=2 file69b757e053ad.cpp -quiet -dumpbase file69b757e053ad.cpp -m64 -mtune=generic -auxbase-strip file69b757e053ad.o -g -O3 -O2 -Wall -fopenmp -fpic -fexceptions -fstack-protector --param ssp-buffer-size=4 -o -
The flags I wanted are there, -O3 and -fopenmp. But I also see -O2 there, which is presumably the system-wide default (I verified this by removing ~/.R/Makevars and indeed, -O2 is there but -O3 and -fopenmp are not.)
So the question: how do I get rid of the -O2? Or, does it actually matter? The g++ man page says:
-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and -fipa-cp-clone options.
So maybe it's fine to have both -O2 and -O3?
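I think you need CXXFLAGS, not CPPFLAGS, in your ~/.R/Makevars. The order does matter: when several -O options are given, GCC uses the last one, so a command line with -O3 -O2 is built at -O2.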
I set Makevars in the following repo to benchmark various C++ compiler flags in R/Rcpp
https://github.com/jackwasey/optimization-comparison
I use a function from https://github.com/jimhester/covr to do that programmatically, if that's of use to you.
Also, did you see the following? R: C++ Optimization flag when using the inline package
I've been looking around and I can only seem to find the option to omit the frame pointer.
Here are the compile flags that I'm using right now.
-Wall -Wextra -Wconversion -Wctor-dtor-privacy -Wnon-virtual-dtor -Wold-style-cast -g
Can someone tell me how to enable the frame pointer?
Note I'm using g++ version 4.6.6 on RHEL 6.2
Any optimization level (-O, -O2, -O3) enables -fomit-frame-pointer, but you can revert it with -fno-omit-frame-pointer. You can find more details here: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html