GCC ARM Performance drop - c++

I stumbled upon very strange issue with GCC. The issue is 25% drop in performance. Here is the story.
I have a pice of software which is fp32 compute intensive (neural networks compiled with TVM). I compile it for ARM (rk3399 device), here is info:
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/5/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-armhf/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-armhf --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-armhf --with-arch-directory=arm --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-multilib --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12)
uname -a
Linux FriendlyELEC 4.4.143 #1 SMP Tue Nov 20 11:10:11 CST 2018 aarch64 aarch64 aarch64 GNU/Linux
lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Thread(s) per core: 1
Core(s) per socket: 3
Socket(s): 2
Model name: ARMv8 Processor rev 2 (v8l)
CPU max MHz: 1800.0000
CPU min MHz: 408.0000
Hypervisor vendor: horizontal
Virtualization type: full
The code was initially "slow" and cpp11, I decided to try cpp17 and cpp14. cpp17 was not supported, but cpp14 was. I switched to cpp14 and voila I got boost around 25% in performance. I really tested it to make sure the boost is in fact real and not a measuring mistake. I had this boost for a week then my device rebooted and the boost in performance was gone!
It may sound crazy, but I'm very sure in my code and measurements I had. I didn't have explicit compile flags prior to this gimmick. Now I'm trying to figure out compile flags for GCC to reclaim what was lost, but I don't have much experience with GCC. What could be the issue here? What flags can affect performance that much?
the code uses .so files, compiled with use of llvm and gcc
llvm -device=arm_cpu -target=armv8l-linux-gnueabihf -mattr=+neon,fp-armv8

"What flags can affect performance that much?"
Turning on the optimizer -O1, -O2 or -O3 can have a dramatic effect (default is unoptimized build -O0).
Enabling link time optimization -flto can also often give significant improvements.
See also the manual for more.

It's not GCC fault. It's CPU frequency scaling problem. I had device with ARM with Linux (ubuntu) on board, strange behavior and different benchmarking results are due to strange cpu frequency governing by OS.

Related

Specify thread model on ubuntu mingw (if possible)

Im running a program on linux with g++ and everything works fine. If i compile using x86_64-w64-mingw32-g++ it crashes with error: ‘std::thread’ has not been declared.
The part where my code crashed is fairly simple:
#include <thread>
int main()
{
int cores = std::thread::hardware_concurrency();
}
I figured out that mingw uses win32 threads, i want it to use posix threads. Version is:
:~$ x86_64-w64-mingw32-g++ -v
Using built-in specs.
COLLECT_GCC=x86_64-w64-mingw32-g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-w64-mingw32/9.3-win32/lto-wrapper
Target: x86_64-w64-mingw32
Configured with: ../../src/configure --build=x86_64-linux-gnu --prefix=/usr --includedir='/usr/include' --mandir='/usr/share/man' --infodir='/usr/share/info' --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir='/usr/lib/x86_64-linux-gnu' --libexecdir='/usr/lib/x86_64-linux-gnu' --disable-maintainer-mode --disable-dependency-tracking --prefix=/usr --enable-shared --enable-static --disable-multilib --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --libdir=/usr/lib --enable-libstdcxx-time=yes --with-tune=generic --with-headers=/usr/x86_64-w64-mingw32/include --enable-version-specific-runtime-libs --enable-fully-dynamic-string --enable-libgomp --enable-languages=c,c++,fortran,objc,obj-c++,ada --enable-lto --enable-threads=win32 --program-suffix=-win32 --program-prefix=x86_64-w64-mingw32- --target=x86_64-w64-mingw32 --with-as=/usr/bin/x86_64-w64-mingw32-as --with-ld=/usr/bin/x86_64-w64-mingw32-ld --enable-libatomic --enable-libstdcxx-filesystem-ts=yes --enable-dependency-tracking
Thread model: win32
gcc version 9.3-win32 20200320 (GCC)
As you can see thread model is win32, (how) can i change it? In the mingw include paths there is a 9.3-posix folder, but it seems it isnt used so far.
As you can see here you can:
I'm installing mingw-w64 on Windows and there are two options: win32 threads and posix threads.
Is that possible on linux aswell?
It seems that the package g++-mingw-w64-x86-64 provides two executables:
x86_64-w64-mingw32-g++-posix: which uses posix
x86_64-w64-mingw32-g++-win32: which uses win32
Using the first executable should solve your issue.

Why gdb complain to me that No Source Available (with g++ -ggdb3)

For some reason I reinstalled my OS(manjaro linux) yesterday, and installed gcc and gdb by pacman, then I write a very small example program to make sure my environment is correct, because I am a beginner in linux and gdb. After compile,it went according to my expectations. Then I wanted to take a look at the STL source code through gdb, so I opened my gdb with tui mode. Everything seemed normal at first, but my gdb can't step into ss.insert() that I want to see, with [ No Source Available ] on the top side of the screen. And I've found the directory of the source file of operator<< is different to the version that before reinstall OS, and /usr/shared/c++/ doesn't exist anymore!
This is the version information of GCC(g++) and gdb
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --with-isl --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-install-libiberty --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-werror gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.0 (GCC)
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Why gdb complain to me that No Source Available (with g++ -ggdb3)
Because you are stopped inside libstdc++, which was built in a temporary directory (here /build/gcc/src/gcc-build/...) and that directory is not present on your machine.
It is exceedingly unlikely that you actually need to look at the source of operator<<(), but if you really do want to do that, install GCC sources, and use (gdb) directory to point GDB to the relevant sources.

gcc takes much memory to compile c++ file with very large object on stack

I have a c++ file which uses a std::bitset<size>. When compiling using gcc on windows subsystem linux, if size = 1000000000 (1Gbit) cause about 1.6GB compile-time memory using, size = 10000000000 (10Gbit) cause about ~6GB memory and ~15GB virtual memory (my PC has 8GB memory in total). The memory is allocated gradually and the compilation finishes immediately after the maximum.
The program runs into Segmentation fault (core dumped) as soon as launching if size is large. The turning point is between 10M and 100M.
On MSVC, the program compiles and runs good if size is small. For larger size an exception is thrown: "Stack overflow". If size is really large it gives error "total size of array must not exceed 0x7fffffff bytes" in file bitset
The problem is irrelevant to optimization level. No matter -O0, -O1,
-O2 or O3, it is the same.
this is the output of gcc -v:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.3.0-27ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
This is my test code, very simple
#include <bitset>
int main(){
const size_t size = ...;
std::bitset<size> bs;
return 0;
}
If gcc-8 is used instead of gcc-7 there is no such issue, the compilation finishes quick, and the program runs into segmentation fault if size is large,as it should be.
If I use vector<bool> or create the bitset using new,it runs good.
So there is no problem to solve but my question is:
Why gcc-7 takes so much memory (and time) to compile the file?
This was a known bug in GCC, having to do with constexpr initialisation of very large objects.
There are several bug reports, one of which is bug 56671, reported in March, 2013 against v4.7.2. The bug was marked resolved in June, 2018, which means that it was present through to v7.3, at least, but has now been corrected.
If you need a workaround, it appears that using a different constructor did not trigger the bug. In particular, changing std::bitset<size> bs; to std::bitset<size> bs(0); compiles fine on various versions of gcc, including v.7.3.0 (according to the online compiler I tried).

Gfortran does not have module Funits while ifort does

Once O tried to compile Fortran code that relies on funits. while it is not available in gfortran 4.9.3.
use FUNITS
1
Fatal Error: Can't open module file 'funits.mod' for reading at (1): No such file or directory
here is gfortran information
gfortran -v
Using built-in specs.
COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.9.3/gfortran
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.3/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.9.3/work/gcc-4.9.3/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.9.3 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.9.3 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.9.3/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.9.3/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/include/g++-v4 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.9.3/python --enable-languages=c,c++,java,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.9.3 p1.5, pie-0.6.4' --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-altivec --disable-fixed-point --enable-targets=all --enable-libgomp --disable-libmudflap --disable-libssp --disable-libcilkrts --enable-vtable-verify --enable-libvtv --enable-lto --without-cloog --enable-libsanitizer
Thread model: posix
gcc version 4.9.3 (Gentoo 4.9.3 p1.5, pie-0.6.4)
Instead ifort version 14.0.3 has it pre-installed.
How can I install funits with gfortran as we need to rely on the free software?
FUNITS is a module. There is no such module in standard Fortran. It must be some external library, not an intrinsic module.
It is also not part of Intel Fortran although you claim it to be. It is not part of my Intel Fortran installation and it is not mentioned anywhere in the manual.
You have to find where does it come from, who supplies this library and compile it for gfortran yourself.
Or find out what it does and implement that functionality yourself.
A simple web search found 2 modules called FUNITS on the internet. Both just declare some file unit number constants. I have no idea whether you need one of them or not https://mfix.netl.doe.gov/develop/STI/source_profile_2015-2/mfix_und/MFIX_html/9899.html http://home.chpc.utah.edu/~u0703457/STILT_tutorial/WRF_STILT_code/STILT_updated/funits.f Probably it will be better to search your hard drive.

Strange behavior with std::nth_element

I'm trying to find the median of vectors of (x,y) points using nth_element
cv::Point2f medOffset;
vector<float> tempOffsetsX = offsetsX;
int medLoc = tempOffsetsX.size()/2;
nth_element(tempOffsetsX.begin(), tempOffsetsX.begin()+medLoc, tempOffsetsX.end());
// sort(tempOffsetsX.begin(), tempOffsetsX.end());
medOffset.x = tempOffsetsX[medLoc];
vector<float> tempOffsetsY = offsetsY;
****** debug out line 1 *********
nth_element(tempOffsetsY.begin(), tempOffsetsY.begin()+medLoc, tempOffsetsY.end());
// sort(tempOffsetsY.begin(), tempOffsetsY.end());
medOffset.y = tempOffsetsY[medLoc];
****** debug out line 2 *********
tempOffsetsX is working just fine but, occasionally, tempOffsetsY is giving very strange results after nth_element. Here is sample output at the marked debug lines
tempOffsetsY1: 5.184135 -1.564125 3.751759 0.221855 -0.742348 1.737648
tempOffsetsY2: -0.742348 -1.564125 -8885092352.000000 -8850636800.000000 0.000000 0.000000
The results are pretty repeatable until I recompile, at which point the specifics change the the general problem remains. Clearly the vector is getting corrupted somehow but I can't think of how.
Also, if I use sort instead of nth_element it works without problem. For debugging, I tried doing a sort and then nth_element which worked just fine. So somehow the reordering that happens inside nth_element is getting messed up but I can't think of how.
Any ideas how this is happening?
edit - More info about my environment. I'm running Arch Linux. I just did a system update. I should note that this same code did work without problem before the update, and this is the first time I've ran it after the update. But that's a gap of several days and I'm always hesitant to point at system libraries for what is usually my own problem.
[]$ uname -r
3.11.6-1-ARCH
[]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc-4.8.2/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --enable-gnu-unique-object --enable-linker-build-id --enable-cloog-backend=isl --disable-cloog-version-check --enable-lto --enable-gold --enable-ld=default --enable-plugin --with-plugin-ld=ld.gold --with-linker-hash-style=gnu --disable-install-libiberty --disable-multilib --disable-libssp --disable-werror --enable-checking=release
Thread model: posix
gcc version 4.8.2 (GCC)
[]$ pacman -Qi glibc
Name : glibc
Version : 2.18-8
....
Quite likely you're running into a recent bug in libstdc++, which broke the nth_element function. This has since been fixed, but some Linux releases shipped with the broken version (e.g. Ubuntu 13.10)
A patch and discussion of the bug can be found on the GCC tracker: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58800