gcc / C++ Disable generation of vex instructions - c++

We are debugging memory issues in our large legacy app and would like to use Valgrind to track them down. The app uses the ACE/TAO CORBA library; however, Valgrind complains of illegal "vex" instructions in the library.
==29992== Memcheck, a memory error detector
==29992== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==29992== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==29992== Command: DvMain
==29992==
DvMain. Version 6.0 Build 38B16
vex x86->IR: unhandled instruction bytes: 0xC4 0xE2 0x7B 0xF7
==29992== valgrind: Unrecognised instruction at address 0x5f37a4b.
==29992== at 0x5F37A4B: ACE_Select_Reactor_Impl::bit_ops(int, unsigned long, ACE_Select_Reactor_Handle_Set&, int) (in /usr/local/dvstation/lib3p/ACE/libACE.so.6.2.7)
In another SO question, VTT suggested disabling AVX instructions with -mno-avx, which worked for some things. However, we still have problems.
I've tried -mno-sse2avx -mno-avx -mno-sse4.1 -mno-sse4.2 -mno-sse4 -mno-sse4a but Valgrind still complains of vex instructions in ::bit_ops(). (If you are interested, bit_ops is defined on line 956 of this file.)
How do I completely disable the generation of VEX instructions so I can use Valgrind to debug?
Platform is 32-bit CentOS 6, g++ 4.9.4
(Please don't suggest moving to 64-bit. That's not an option with this product.)
Reference:
Compile line for offending file:
/usr/local/gcc-4.9.4/bin/c++4.9 -mno-sse2avx -fvisibility=hidden
-fvisibility-inlines-hidden -fdiagnostics-color=auto
-mno-avx -mno-sse4.1 -mno-sse4.2 -mno-sse4 -mno-sse4a
-O3 -march=native -pthread -fno-strict-aliasing
-Wall -W -Wpointer-arith -pipe -D_GNU_SOURCE
-c -fPIC -o .shobj/Select_Reactor_Base.o Select_Reactor_Base.cpp

VEX is pretty new. Replacing -march=native with an older target architecture, e.g. -march=pentium4, will disallow VEX instruction encoding while you keep SSE2.
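To see which extensions a given set of flags actually leaves enabled, one option is to inspect the macros the compiler predefines. The sketch below is only an illustration (the function name enabled_isa_extensions is made up here); it relies on the __SSE2__, __AVX__, __AVX2__, __BMI__ and __BMI2__ macros that GCC defines when the corresponding instruction set is active. Note that -march=native can switch on VEX-encoded sets other than AVX, such as BMI and BMI2, which -mno-avx alone does not cover.

```cpp
#include <string>

// Report which x86 ISA extensions this translation unit was compiled for,
// based on the macros GCC predefines when the matching -m flag (or a -march
// value that implies it) is in effect. The function name is hypothetical.
inline std::string enabled_isa_extensions() {
    std::string out;
#ifdef __SSE2__
    out += "sse2 ";
#endif
#ifdef __AVX__
    out += "avx ";
#endif
#ifdef __AVX2__
    out += "avx2 ";
#endif
#ifdef __BMI__
    out += "bmi ";
#endif
#ifdef __BMI2__
    out += "bmi2 ";
#endif
    return out.empty() ? "none" : out;
}
```

Compiling this with the same flags as the offending file and printing the result should show whether anything VEX-encoded is still switched on.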

Perhaps you can use valgrind 3.12 from DTS instead, in the form of the devtoolset-6-valgrind package?
Support for AVX2 instructions was added in valgrind 3.9, so you might avoid recompiling your software.

VEX is the Valgrind abstract machine representation. It is a fundamental part of Valgrind and you cannot turn it off. You either need to tell the compiler to emit machine code that your version of Valgrind understands or else upgrade to a more recent version of Valgrind that understands AVX.
AVX dates from about 2011, whilst the version of Valgrind that you are using was released in September 2012 and probably hadn't added AVX support yet. Confusingly, these extensions also use a "VEX" prefix. In this case the "vex x86->IR" message from Valgrind refers to Valgrind's VEX, not the AVX VEX prefix.
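Although the label in the message comes from Valgrind's own VEX, the reported bytes 0xC4 0xE2 0x7B 0xF7 do begin with 0xC4, the lead byte of the three-byte VEX prefix, so they can be picked apart by hand. The following is a minimal sketch of such a decoder, assuming the standard three-byte VEX layout from the Intel manuals; the struct and function names are invented for illustration. For these bytes it yields opcode map 0F38, implied prefix F2 and opcode F7, which by my reading of the opcode tables is SHRX from BMI2 rather than an AVX instruction. Treat that identification as an assumption; if it holds, -mno-bmi2 (or dropping -march=native) would be the relevant switch.

```cpp
#include <cstdint>

// Fields of a three-byte VEX prefix (lead byte 0xC4) plus the opcode byte.
// Names are hypothetical; the bit layout follows the Intel SDM.
struct Vex3 {
    bool valid;          // true when the first byte is 0xC4
    unsigned map;        // mmmmm: 1 = 0F, 2 = 0F38, 3 = 0F3A
    unsigned pp;         // implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2
    bool w;              // VEX.W
    bool l;              // VEX.L
    unsigned vvvv;       // extra source register (stored inverted in the prefix)
    std::uint8_t opcode; // opcode byte following the prefix
};

inline Vex3 decode_vex3(const std::uint8_t bytes[4]) {
    Vex3 v{};
    v.valid = bytes[0] == 0xC4;
    if (!v.valid) return v;
    v.map = bytes[1] & 0x1F;               // low five bits of byte 1
    v.w = (bytes[2] & 0x80) != 0;          // bit 7 of byte 2
    v.vvvv = (~(bytes[2] >> 3)) & 0x0F;    // bits 6..3, un-inverted
    v.l = (bytes[2] & 0x04) != 0;          // bit 2
    v.pp = bytes[2] & 0x03;                // bits 1..0
    v.opcode = bytes[3];
    return v;
}
```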

Related

Using Valgrind on files made with make

I'm trying to use valgrind on a program which is the output of the make command using this makefile:
# Intlist makefile
P = intlist
C = g++
F = -m32 -g -O0 -Wall
O = IntListTest.o IntList.o
$(P): $(O)
	$(C) $(F) -o $(P) $(O)
IntListTest.o: IntListTest.cpp IntList.h
	$(C) $(F) -c IntListTest.cpp
IntList.o: IntList.cpp IntList.h
	$(C) $(F) -c IntList.cpp
clean:
	rm $(P) $(O)
When I run valgrind on intlist, it generates this:
==130929== Memcheck, a memory error detector
==130929== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==130929== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==130929== Command: intlist
==130929==
valgrind: Fatal error at startup: a function redirection
valgrind: which is mandatory for this platform-tool combination
valgrind: cannot be set up. Details of the redirection are:
valgrind:
valgrind: A must-be-redirected function
valgrind: whose name matches the pattern: strlen
valgrind: in an object with soname matching: ld-linux.so.2
valgrind: was not found whilst processing
valgrind: symbols from the object with soname: ld-linux.so.2
valgrind:
valgrind: Possible fixes: (1, short term): install glibc's debuginfo
valgrind: package on this machine. (2, longer term): ask the packagers
valgrind: for your Linux distribution to please in future ship a non-
valgrind: stripped ld.so (or whatever the dynamic linker .so is called)
valgrind: that exports the above-named function using the standard
valgrind: calling conventions for this platform. The package you need
valgrind: to install for fix (1) is called
valgrind:
valgrind: On Debian, Ubuntu: libc6-dbg
valgrind: On SuSE, openSuSE, Fedora, RHEL: glibc-debuginfo
valgrind:
valgrind: Note that if you are debugging a 32 bit process on a
valgrind: 64 bit system, you will need a corresponding 32 bit debuginfo
valgrind: package (e.g. libc6-dbg:i386).
valgrind:
valgrind: Cannot continue -- exiting now. Sorry.
I feel like there's something wrong with my makefile, but I'm not sure what it is. Any ideas are greatly appreciated. However, this takes place on a college-owned server where I don't have permission to install anything, so installing packages isn't a solution in my case.
You are building a 32-bit executable (the -m32 option in your compile/link lines), but from the output valgrind provides, you don't have all the support libraries available to run valgrind on a 32-bit executable.
Do you really need your program to be 32-bit? If not, the simplest thing to do is remove the -m32 option, then clean and rebuild everything.
If you really have to have a 32-bit binary, then read the output valgrind provides above carefully to determine which extra 32-bit libraries you're missing and need to install.

GNU Fortran compiler optimisation flags for Ivy Bridge architecture

May I please ask for your suggestions on the GNU Fortran compiler (v6.3.0) flags to optimise the code for the Ivy Bridge architecture (Intel Xeon CPU E5-2697v2 Ivy Bridge @ 2.7 GHz)?
At the moment I’m compiling the code with the following flags:
-O3 -march=ivybridge -mtune=ivybridge -ffast-math -mavx -m64 -w
Unless you use intrinsics specific to Ivy Bridge, the Sandy Bridge flag is sufficient. I expect you should find some advantage by additionally setting -funroll-loops --param max-unroll-times=2
Sometimes -O2 -ftree-vectorize will work out better than -O3.
If you have complex data types, you will want to compare against -fno-cx-limited-range, as the -fcx-limited-range behaviour that -ffast-math turns on by default may be too aggressive.
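To illustrate what "too aggressive" can mean here: with -fcx-limited-range the compiler is allowed to use the textbook formulas for complex multiplication and division without the extra checks that rescue NaN results. The thread is about Fortran, but the effect is easy to show in a few lines of C++; the helper naive_mul below is a hypothetical stand-in for the code the compiler may generate, not something taken from GCC.

```cpp
#include <cmath>
#include <complex>
#include <limits>

// Textbook complex product with no special handling of infinities or NaNs.
// This is the kind of shortcut that -fcx-limited-range permits.
inline std::complex<double> naive_mul(std::complex<double> x,
                                      std::complex<double> y) {
    return { x.real() * y.real() - x.imag() * y.imag(),
             x.real() * y.imag() + x.imag() * y.real() };
}

// True when the shortcut turns an infinite operand into NaN + NaN*i:
// (inf + inf*i) * (1 + 0i) contains the product inf * 0, which is NaN.
inline bool shortcut_loses_infinity() {
    const double inf = std::numeric_limits<double>::infinity();
    const std::complex<double> r = naive_mul({inf, inf}, {1.0, 0.0});
    return std::isnan(r.real()) && std::isnan(r.imag());
}
```

For finite, well-scaled data the shortcut gives the usual result, so whether the flag matters depends on the range of values the code handles.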

g++ enables wrong flags at -Os

At the moment I am doing some experiments with the GNU C++ compiler and the -Os optimization option for minimal code size. I checked the enabled compiler flags at -Os with the following command:
g++ -c -Q -Os --help=optimizers | grep "enabled"
I got this list of enabled options:
-faggressive-loop-optimizations [enabled]
-falign-functions [enabled]
-falign-jumps [enabled]
-falign-labels [enabled]
-falign-loops [enabled]
-fasynchronous-unwind-tables [enabled]
...
This seems a bit strange, because I also looked up which flags should be enabled at -Os here, and the -Os section says that all the -falign- options should be disabled to minimize code size.
Q: So is this a bug, or am I doing something wrong here? After reading what the -falign- flags do, I really think they should be disabled at -Os!
My gcc version is 4.9.2 and I am working on Arch Linux.
Thanks in advance for helping :)
Q: So is this a bug or am I doing something wrong here ? Cause after reading what the falign- flags do I really think they should be disabled in -Os
I think Hans did a good job of finding part of the problem. It's definitely a documentation bug. But no one from GCC commented on why -Os enabled them, so you might not have all of the information.
Older ARM devices were very intolerant of unaligned accesses. Older ARM devices included ARMv4 and I think ARMv5. If you performed an unaligned access, you would get a SIGBUS (been there, done that, got the tee shirt).
Modern ARM devices fix up unaligned accesses like x86 processors do, so you no longer get a SIGBUS. Instead, you just take the performance penalty.
You should try to specify an architecture in case those options are an artifact from older ARM device support. For example, -march=armv7. If you find it on ARMv6 and ARMv7, then that could still be a bug. It depends if the GCC team decided the tradeoff was sufficient for ARM (code size vs performance penalty).
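As a side note on the unaligned data accesses described above (the -falign- options themselves govern where code such as functions, jumps, labels and loops is placed), portable code can avoid relying on hardware fix-ups by copying bytes instead of dereferencing a misaligned pointer. This is a minimal sketch; the helper names load_u32 and store_u32 are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Read a 32-bit value from an address that may not be 4-byte aligned.
// memcpy lets the compiler pick instructions that are legal for the target,
// whereas dereferencing a misaligned std::uint32_t pointer is undefined
// behaviour and could raise SIGBUS on strict-alignment hardware.
inline std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Write a 32-bit value to a possibly misaligned address.
inline void store_u32(unsigned char* p, std::uint32_t v) {
    std::memcpy(p, &v, sizeof v);
}
```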

GDB jumps to wrong lines in out of order fashion

Application Setup :
I have a C++11 application consuming the following third-party libraries:
boost 1.51.0
cppnetlib 0.9.4
jsoncpp 0.5.0
The application code relies on several in-house shared objects, all of them developed by my team (classical link time against those shared objects is carried out, no usage of dlopen etc.)
I'm using GCC 4.6.2 and the issue appears when using GDB 7.4 and 7.6.
OS - Red Hat Linux release 7.0 (Guinness) x86-64
The issue
While hitting breakpoints within the shared objects' code and issuing the gdb next command, GDB sometimes jumps backward to certain lines without any plausible reason (especially after exceptions are thrown; there are suitable catch blocks for those exceptions).
Similar issues on the web are answered with something along the lines of "turn off any GCC optimization", but my GCC command line clearly doesn't use any optimization and asks for debug information. Please note the -O0 and -g switches:
COLLECT_GCC_OPTIONS= '-D' '_DEBUG' '-O0' '-g' '-Wall' '-fmessage-length=0' '-v' '-fPIC' '-D' 'BOOST_ALL_DYN_LINK' '-D' 'BOOST_PARAMETER_MAX_ARITY=15' '-D' '_GLIBCXX_USE_NANOSLEEP' '-Wno-deprecated' '-std=c++0x' '-fvisibility=hidden' '-c' '-MMD' '-MP' '-MF' 'Debug_x64/AgentRegisterer.d' '-MT' 'Debug_x64/AgentRegisterer.d' '-MT' 'Debug_x64/AgentRegisterer.o' '-o' 'Debug_x64/AgentRegisterer.o' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
Please also note that, as per Linux DSO best known methods, we use hidden symbol visibility; only the classes we would like to expose are exposed (maybe this is related?).
What should be the next steps in root-causing this issue?
This sort of problem is usually GIGO: gdb is just acting in the way that the compiler has instructed it to act. So, it's typically a compiler bug rather than a gdb bug. I've seen this happen even with -O0 compilations. The example that comes to mind is that some versions of g++ emitted the location of a variable's declaration when emitting a call to the variable's destructor. This led to this odd jumping behavior in otherwise straight-line code.
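The destructor case can be pictured with a small example. This is only a sketch of the situation described, with invented names (Tracer, run_scope): the destructor call happens at the closing brace, but if the compiler tags that call with the declaration's line, stepping with next appears to jump back to the declaration before moving on.

```cpp
#include <string>
#include <utility>
#include <vector>

// Records construction and destruction so the real execution order is visible.
struct Tracer {
    std::vector<std::string>& log;
    std::string name;
    Tracer(std::vector<std::string>& l, std::string n)
        : log(l), name(std::move(n)) { log.push_back("ctor " + name); }
    ~Tracer() { log.push_back("dtor " + name); }
};

inline std::vector<std::string> run_scope() {
    std::vector<std::string> log;
    {
        Tracer t(log, "t");     // declaration line
        log.push_back("body");  // straight-line code
    }                           // ~Tracer runs here; the debugger may report
                                // the declaration line for this call instead
    log.push_back("after");
    return log;
}
```

The recorded order is always constructor, body, destructor, after; only the line number attached to the destructor call is what makes the debugger appear to go backward.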
I had a code producing wrong output, and when I tried to debug it with gdb, the lines were jumping arbitrarily. Finally I figured that it was not a gdb problem but a bug in g++: when -O3 was used the last line of a constructor was getting skipped. If I put a printf line after that line, the code would work fine! After changing CFLAGS from -O3 to -O0, the code gave the correct output. I was using c++11 with gcc-5.4.0
When I had a similar problem on the STM32L4R9I Evaluation board, I changed from compiling with -Os to -O0 and now it's working like a charm.
Be sure that you don't have any other file defining compilation flags.

Successful compilation of SSE instruction with qmake (but SSE2 is not recognized)

I'm trying to compile and run my code, which I migrated from Unix to Windows. My code is pure C++ and does not use Qt classes. It works fine on Unix.
I'm also using Qt Creator as an IDE and qmake.exe with -spec win32-g++ for compiling. As I have SSE instructions within my code, I have to include the emmintrin.h header.
I added:
QMAKE_FLAGS_RELEASE += -O3 -msse4.1 -mssse3 -msse3 -msse2 -msse
QMAKE_CXXFLAGS_RELEASE += -O3 -msse4.1 -mssse3 -msse3 -msse2 -msse
to the .pro file. I have been able to compile my code without errors, but at run time it gives an error when going through some functions that use __m128 or similar types.
When I open emmintrin.h, I see:
#ifndef __SSE2__
# error "SSE2 instruction set not enabled"
#else
and it is undefined after #else.
I don't know how to enable SSE on my computer.
Platform: Windows Vista
System type: 64-bit
Processor: Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz
Does anyone know the solution?
Thanks in advance.
It sounds like your data is not 16-byte aligned, which is a requirement for aligned SSE loads such as _mm_load_ps. You can either:
use _mm_loadu_ps as a temporary workaround. On newer CPUs the performance hit for misaligned loads such as this is fairly small (on older CPUs it's much more significant), but it should still be avoided if possible
or
fix your memory alignment. On Windows/Visual Studio you can use the __declspec(align(16)) attribute for static allocations or _aligned_malloc for dynamic allocations. For gcc and most other civilised platforms/compilers use __attribute__ ((aligned(16))) for the former and posix_memalign for the latter.
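A minimal sketch of the second option, assuming a POSIX system where posix_memalign is available (with MinGW on Windows, _aligned_malloc would take its place). The names is_aligned16, Vec4 and alloc_floats16 are made up for this example, and C++11 alignas is used here as a portable alternative to the compiler-specific attributes.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// True when p sits on a 16-byte boundary, as _mm_load_ps requires.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Static or automatic storage: alignas forces 16-byte alignment of the object.
struct alignas(16) Vec4 {
    float v[4];
};

// Dynamic storage: returns nullptr on failure; release the result with free().
inline float* alloc_floats16(std::size_t count) {
    void* p = nullptr;
    if (posix_memalign(&p, 16, count * sizeof(float)) != 0) return nullptr;
    return static_cast<float*>(p);
}
```

A check such as is_aligned16 in a debug build, placed just before each aligned load, is one way to find the allocation that is causing the crash.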