GCC 8 Cross Compiler outputs ARMv7 executable instead of ARMv6 - c++

I'm trying to compile a C++ application for a Raspberry Pi Zero using GCC 8.2.1.
I'm using this for a relatively large C++17 project that is being built using CMake, and I'm trying to cross-compile it on my x86-64 laptop.
Even with the simplest code possible, I'm not able to compile it for ARMv6:
int main() {}
$ arm-linux-gnueabihf-g++ test.cpp -static -march=armv6 -mfpu=vfp -mfloat-abi=hard
When running the file on the Pi, I get an Illegal instruction error, and readelf returns the following:
$ arm-linux-gnueabihf-readelf -A a.out
Attribute Section: aeabi
File Attributes
Tag_CPU_name: "7-A"
Tag_CPU_arch: v7
Tag_CPU_arch_profile: Application
Tag_ARM_ISA_use: Yes
Tag_THUMB_ISA_use: Thumb-2
Tag_FP_arch: VFPv3
Tag_Advanced_SIMD_arch: NEONv1
Tag_ABI_PCS_wchar_t: 4
Tag_ABI_FP_rounding: Needed
Tag_ABI_FP_denormal: Needed
Tag_ABI_FP_exceptions: Needed
Tag_ABI_FP_number_model: IEEE 754
Tag_ABI_align_needed: 8-byte
Tag_ABI_align_preserved: 8-byte, except leaf SP
Tag_ABI_enum_size: int
Tag_ABI_VFP_args: VFP registers
Tag_CPU_unaligned_access: v6
GCC seems to ignore my architecture flags.
When simply compiling it into an object file, it seems to work just fine, but the linking stage always uses ARMv7:
$ arm-linux-gnueabihf-g++ test.cpp -static -march=armv6 -mfpu=vfp -mfloat-abi=hard -c
$ arm-linux-gnueabihf-readelf -A test.o
Attribute Section: aeabi
File Attributes
Tag_CPU_name: "6"
Tag_CPU_arch: v6
Tag_ARM_ISA_use: Yes
Tag_THUMB_ISA_use: Thumb-1
Tag_FP_arch: VFPv2
Tag_ABI_PCS_wchar_t: 4
Tag_ABI_FP_denormal: Needed
Tag_ABI_FP_exceptions: Needed
Tag_ABI_FP_number_model: IEEE 754
Tag_ABI_align_needed: 8-byte
Tag_ABI_align_preserved: 8-byte, except leaf SP
Tag_ABI_enum_size: int
Tag_ABI_VFP_args: VFP registers
Tag_ABI_optimization_goals: Aggressive Debug
Tag_CPU_unaligned_access: v6
What am I doing wrong?

I ended up compiling GCC from source, following this post. I didn't need all of the steps (I compiled everything using GCC 8 instead of compiling GCC 6.3 first, and I didn't edit any source files.)
I posted a Dockerfile with all build steps on GitHub.
The architecture of the executables produced is correct now, but I can't test it on target yet to check if it actually runs.

By default, newer GCC versions do not create correct binaries for ARMv6. Even though you pass the correct -mcpu= flag to gcc, it will create startup code for the newer ARMv7 architecture. Running them on your RasPI Zero will cause an "Illegal Instruction" exception.
this info is mentioned here https://github.com/Pro/raspi-toolchain

Related

How to compile a 32-bit C++ code on a default 64-bit compiler [duplicate]

I'm trying to compile a 32-bit C application on Ubuntu Server 12.04 LTS 64-bit using gcc 4.8. I'm getting linker error messages about incompatible libraries and skipping -lgcc. What do I need to do to get 32 bit apps compiled and linked?
This is known to work on Ubuntu 16.04 through 22.04:
sudo apt install gcc-multilib g++-multilib
Then a minimal hello world:
main.c
#include <stdio.h>
int main(void) {
puts("hello world");
return 0;
}
compiles without warning with:
gcc -m32 -ggdb3 -O0 -pedantic-errors -std=c89 \
-Wall -Wextra -pedantic -o main.out main.c
And
./main.out
outputs:
hello world
And:
file main.out
says:
main.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=87c87a83878ce7e7d23b6236e4286bf1daf59033, not stripped
and:
qemu-i386 main.out
also gives:
hello world
but fails on an x86_64 executable with:
./main.out: Invalid ELF image for this architecture
Furthermore, I have:
run the compiled file in a 32 bit VM
compiled and run an IA-32 C driver + complex IA-32 code
So I think it works :-)
See also: Cannot find crtn.o, linking 32 bit code on 64 bit system
It is a shame that this package conflicts with the cross compilers like gcc-arm-linux-gnueabihf https://bugs.launchpad.net/ubuntu/+source/gcc-defaults/+bug/1300211
Running versions of the question:
https://unix.stackexchange.com/questions/12956/how-do-i-run-32-bit-programs-on-a-64-bit-debian-ubuntu
https://askubuntu.com/questions/454253/how-to-run-32-bit-app-in-ubuntu-64-bit
We are able to run 32-bit programs directly on 64-bit Ubuntu because the Ubuntu kernel is configured with:
CONFIG_IA32_EMULATION=y
according to:
grep CONFIG_IA32_EMULATION "/boot/config-$(uname -r)"
whose help on the kernel source tree reads:
Include code to run legacy 32-bit programs under a
64-bit kernel. You should likely turn this on, unless you're
100% sure that you don't have any 32-bit programs left.
This is in turn possible because x86 64 bit CPUs have a mode to run 32-bit programs that the Linux kernel uses.
TODO: what options does gcc-multilib get compiled differently than gcc?
To get Ubuntu Server 12.04 LTS 64-bit to compile gcc 4.8 32-bit programs, you'll need to do two things.
Make sure all the 32-bit gcc 4.8 development tools are completely installed:
sudo apt-get install lib32gcc-4.8-dev
Compile programs using the -m32 flag
gcc pgm.c -m32 -o pgm
Multiarch installation is supported by adding the architecture information to the package names you want to install (instead of installing these packages using alternative names, which might or might not be available).
See this answer for more information on (modern) multiarch installations.
In your case you'd be better off installing the 32bit gcc and libc:
sudo apt-get install libc6-dev:i386 gcc:i386
It will install the 32-bit libc development and gcc packages, and all depending packages (all 32bit versions), next to your 64-bit installation without breaking it.

g++ arm-none-eabi upgrade from 4.9 to gcc 8.2. Generated binary do not fit any more in flash

I recently updated my Linux laptop from Ubuntu 16.04 to 18.04.
I had a STM32 (Cortex-M4) Makefile based project that compiled correctly with the arm-none-eabi g++ version provided by Ubuntu. The generated file required 47620 bytes in the .text section.
With the Ubuntu upgrade, I have also installed an up-to-date version of gcc (from ARM website). Version is 8.2.1.
When I compile the same project (make clean && make), the generated binary do not fit in flash (97424 bytes required, more than twice!). The project is exactly the same (sources, link script, startup files, Makefile).
The compiler options are: -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -DSTM32F303x8 -DARMCM4 -O0 -g -Wall -fexceptions -Wno-deprecated.
The linker options are -mthumb -mcpu=cortex-m4 -Tstm32f303K8.ld -mfloat-abi=hard -mfpu=fpv4-sp-d16 --specs=nosys.specs -lm -Wl,--start-group -lm -Wl,--end-group -Wl,--gc-sections -Lsys -Xlinker -Map=test.elf.map
When I look at the .Map generated file, all the user functions take approximatively the same size (new version saves 8 bytes!). But after, it includes C++ sepcific parts, and one is more than 26Kb (from map file):
.text 0x00000000080079e8 0x683c /usr/local/gcc-arm-none-eabi-8-2018-q4-major/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libstdc++.a(cp-demangle.o)
0x000000000800e13c __cxa_demangle
Note: there is no problem with C only projects, only with C++. The library included are the same (gcc 4.9.3 -> armv7e-m/fpu, and gcc 8.2.1 -> thumb/v7e-m+fp/hard):
libm.a libstdc++.a libc.a libnosys.a libgcc.a
Is there a way to get rid of that so that I can compile and flash my (no so old) project?
regards,
I found a solution using the libstdc++_nano (instead of implicit libstc++). With that, the code size is reduced from 84kb to 26kb!
LDFLAGS += -lstdc++_nano
It just works. Thanks #Henrik, #Matthieu and #EOF for your support!
It might be related to exception handling, as std::terminate(), which is used with exceptions, might call the demangling routine. If you don't need exceptions then try disabling them with -fno-exceptions as described here.
Another solution might be to look at the GCC headers:
Demangling routine.
ABI-mandated entry point in the C++ runtime library for demangling.
[...]
returns a pointer to the start of the NUL-terminated demangled
name, or NULL if the demangling fails. The caller is
responsible for deallocating this memory using free.
The prototype is:
char*
__cxa_demangle(const char* __mangled_name, char* __output_buffer,
size_t* __length, int* __status);
So you could probably just supply your own dummy function returning NULL (Given that all library functions are weak, and can be overridden). I'll advise you to look at the disassembled code first though, and find out how and why it is being called in the first place, since it might change behaviour to just discard functionality).
They also give other advise in This forum post, which might be useful for you as well:
Optimize for size with -Os instead of -O0 (possibly add the -Og option instead, if you prefer easily debuggable code, it is often both smaller and faster than -O0).
Optimize at link-time with -flto while compiling and linking.
Maybe disable RTTI if not used.

Is sse2 enabled by default in g++?

When I run g++ -Q --help=target, I get
-msse2 [disabled].
However, if I create the assembly code of with default options as
g++ -g mycode.cpp -o mycode.o; objdump -S mycode.o > default,
and a sse2 version with
g++ -g -msse2 mycode.cpp -o mycode.sse2.o; objdump -S mycode.sse2.o > sse2,
and finally a non-sse2 version with
g++ -g -mno-sse2 mycode.cpp -o mycode.nosse2.o; objdump -S mycode.nosse2.o > nosse2
I see basically no difference between default and sse2, but a big difference between default and nosse2, so this tells me that, by default, g++ is using sse2 instructions, even though I am being told it is disabled ... what is going on here?
I am compiling on a Xeon E5-2680 under Linux with gcc-4.4.7 if it matters.
If you are compiling for 64bit, then this is totally fine and documented behavior.
As stated in the gcc docs the SSE instruction set is enabled by default when using an x86-64 compiler:
-mfpmath=unit
Generate floating point arithmetics for selected unit unit. The choices for unit are:
`387'
Use the standard 387 floating point coprocessor present majority of chips and emulated otherwise. Code compiled with this option will run almost everywhere. The temporary results are computed in 80bit precision instead of precision specified by the type resulting in slightly different results compared to most of other chips. See -ffloat-store for more detailed description.
This is the default choice for i386 compiler.
`sse'
Use scalar floating point instructions present in the SSE instruction set. This instruction set is supported by Pentium3 and newer chips, in the AMD line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of SSE instruction set supports only single precision arithmetics, thus the double and extended precision arithmetics is still done using 387. Later version, present only in Pentium4 and the future AMD x86-64 chips supports double precision arithmetics too.
For the i386 compiler, you need to use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default.
The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80bit.
This is the default choice for the x86-64 compiler.

LLVM & Clang can't compile for a supported arch

Under Ubuntu 64 bit I got
llc --version
LLVM (http://llvm.org/):
LLVM version 3.1
Optimized build with assertions.
Built Oct 15 2012 (18:15:59).
Default target: x86_64-pc-linux-gnu
Host CPU: btver1
Registered Targets:
arm - ARM
mips - Mips
mips64 - Mips64 [experimental]
mips64el - Mips64el [experimental]
mipsel - Mipsel
thumb - Thumb
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
I can't do this
clang -march=arm -x c++ /tmp/cpp.cpp
error: unknown target CPU 'arm'
I'm missing something here ? Why I can't compile for ARM ?
-march is LLVM's internal tools command line option and is not connected with clang at all. If you need to compile for other target you need to specify the target triplet. This can be done in several ways (I do not remember offhand, whether they work with 3.1, but they definitely work with 3.2):
Make a link from clang to your-target-triple-clang, e.g. to
arm-none-linux-gnueabi-clang and compile everything via it
Provide -target option, e.g. clang -target arm-none-linux-gnueabi
To get a list of options of the clang compiler, use:
clang -cc1 -help
To specify the target, use -triple:
clang -cc1 -triple "arm-vendor-os" filename
where "vendor" and "os" should be replaced with the actual vendor and OS name. It can also be replaced with unknown.
-triple is a string of the form ARCHITECTURE-VENDOR-OS or ARCHITECTURE-VENDOR-OS-ENVIRONMENT. For example: x86_64-apple-darwin10
the llvm linker links for the host, which is only one of the targets, it wont link to every target in the list. it will definitely compile for any target. Basically clang goes from C/C++ to bytecode, then llc takes bytecode and makes assembly for the specific target (new experrimental option to take the bytecode straight to object file) then you need to get a cross assembler and a cross linker to take it the final mile (I use gnu binutils). Unfortunately I found that clang to bytecode is not completely generic (I had hoped and expected that it would be), it does in fact change the target independent output based on the target. The example below using the host triple instead of using -march allowed for my examples to build properly on more hosts.
ARMGNU?=arm-none-eabi
LOPS = -Wall -m32 -emit-llvm -ccc-host-triple $(ARMGNU)
OOPS = -std-compile-opts
LLCOPS = -march=thumb -mtriple=$(ARMGNU)
clang $(LOPS) -c blinker03.c -o blinker03.clang.bc
opt $(OOPS) blinker03.clang.bc -o blinker03.clang.thumb.opt.bc
llc $(LLCOPS) blinker03.clang.thumb.opt.bc -o blinker03.clang.thumb.opt.s
$(ARMGNU)-as blinker03.clang.thumb.opt.s -o blinker03.clang.thumb.opt.o
$(ARMGNU)-ld -o blinker03.clang.thumb.opt.elf -T memmap vectors.o blinker03.clang.thumb.opt.o
I have not, but before long will experiment with using the llc straight to object (actually I tried it on a simple test but have not used it on anything larger or posted it anywhere).
You're confusing your flags. clang's -march= wants a processor family. You probably meant to use clang -arch arm instead.
As this comment says this option it's not supported yet under linux, for now.
"-arch arm" is equivalent to "-arch armv4t" in clang. I suppose that a generic "arm" target is not allowed with "-march=", which should require something more precise, such as "armv6", "thumbv7", "armv4t", ...
Try selecting a specific subarch.
Starting Clang 11 (trunk), the list of supported target architectures could be handily printed using the newly added -print-targets flag.

how to port c/c++ applications to legacy linux kernel versions

Ok, this is just a bit of a fun exercise, but it can't be too hard compiling programmes for some older linux systems, or can it?
I have access to a couple of ancient systems all running linux and maybe it'd be interesting to see how they perform under load. Say as an example we want to do some linear algebra using Eigen which is a nice header-only library. Any chance to compile it on the target system?
user#ancient:~ $ uname -a
Linux local 2.2.16 #5 Sat Jul 8 20:36:25 MEST 2000 i586 unknown
user#ancient:~ $ gcc --version
egcs-2.91.66
Maybe not... So let's compile it on a current system. Below are my attempts, mainly failed ones. Any more ideas very welcome.
Compile with -m32 -march=i386
user#ancient:~ $ ./a.out
BUG IN DYNAMIC LINKER ld.so: dynamic-link.h: 53: elf_get_dynamic_info: Assertion `! "bad dynamic tag"' failed!
Compile with -m32 -march=i386 -static: Runs on all fairly recent kernel versions but fails if they are slightly older with the well known error message
user#ancient:~ $ ./a.out
FATAL: kernel too old
Segmentation fault
This is a glibc error which has a minimum kernel version it supports, e.g. kernel 2.6.4 on my system:
$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, for GNU/Linux 2.6.4, not stripped
Compile glibc myself with support for the oldest kernel possible. This post describes it in more detail but essentially it goes like this
wget ftp://ftp.gnu.org/gnu/glibc/glibc-2.14.tar.bz2
tar -xjf glibc-2.14.tar.bz2
cd glibc-2.14
mkdir build; cd build
../configure --prefix=/usr/local/glibc_32 \
--enable-kernel=2.0.0 \
--with-cpu=i486 --host=i486-linux-gnu \
CC="gcc -m32 -march=i486" CXX="g++ -m32 -march=i486"
make -j 4
make intall
Not sure if the --with-cpu and --host options do anything, most important is to force the use of compiler flags -m32 -march=i486 for 32-bit builds (unfortunately -march=i386 bails out with errors after a while) and --enable-kernel=2.0.0 to make the library compatible with older kernels. Incidentially, during configure I got the warning
WARNING: minimum kernel version reset to 2.0.10
which is still acceptable, I suppose. For a list of things which change with different kernels see ./sysdeps/unix/sysv/linux/kernel-features.h.
Ok, so let's link against the newly compiled glibc library, slightly messy but here it goes:
$ export LIBC_PATH=/usr/local/glibc_32
$ export LIBC_FLAGS=-nostdlib -L${LIBC_PATH} \
${LIBC_PATH}/crt1.o ${LIBC_PATH}/crti.o \
-lm -lc -lgcc -lgcc_eh -lstdc++ -lc \
${LIBC_PATH}/crtn.o
$ g++ -m32 -static prog.o ${LIBC_FLAGS} -o prog
Since we're doing a static compile the link order is important and may well require some trial and error, but basically we learn from what options gcc gives to the linker:
$ g++ -m32 -static -Wl,-v file.o
Note, crtbeginT.o and crtend.o are also linked against which I didn't need for my programmes so I left them out. The output also includes a line like --start-group -lgcc -lgcc_eh -lc --end-group which indicates inter-dependence between the libraries, see this post. I just mentioned -lc twice in the gcc command line which also solves inter-dependence.
Right, the hard work has paid off and now I get
$ file ./prog
./prog: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, for GNU/Linux 2.0.10, not stripped
Brilliant I thought, now try it on the old system:
user#ancient:~ $ ./prog
set_thread_area failed when setting up thread-local storage
Segmentation fault
This, again, is a glibc error message from ./nptl/sysdeps/i386/tls.h. I fail to understand the details and give up.
Compile on the new system g++ -c -m32 -march=i386 and link on the old. Wow, that actually works for C and simple C++ programmes (not using C++ objects), at least for the few I've tested. This is not too surprising as all I need from libc is printf (and maybe some maths) of which the interface hasn't changed but the interface to libstdc++ is very different now.
Setup a virtual box with an old linux system and gcc version 2.95. Then compile gcc version 4.x.x ... sorry, but too lazy for that right now ...
???
Have found the reason for the error message:
user#ancient $ ./prog
set_thread_area failed when setting up thread-local storage
Segmentation fault
It's because glibc makes a system call to a function which is only available since kernel 2.4.20. In a way it can be seen as a bug of glibc as it wrongly claims to be compatible with kernel 2.0.10 when it requires at least kernel 2.4.20.
The details:
./glibc-2.14/nptl/sysdeps/i386/tls.h
[...]
/* Install the TLS. */ \
asm volatile (TLS_LOAD_EBX \
"int $0x80\n\t" \
TLS_LOAD_EBX \
: "=a" (_result), "=m" (_segdescr.desc.entry_number) \
: "0" (__NR_set_thread_area), \
TLS_EBX_ARG (&_segdescr.desc), "m" (_segdescr.desc)); \
[...]
_result == 0 ? NULL \
: "set_thread_area failed when setting up thread-local storage\n"; })
[...]
The main thing here is, it calls the assembly function int 0x80 which is a system call to the linux kernel which decides what to do based on the value of eax, which is set to
__NR_set_thread_area in this case and is defined in
$ grep __NR_set_thread_area /usr/src/linux-2.4.20/include/asm-i386/unistd.h
#define __NR_set_thread_area 243
but not in any earlier kernel versions.
So the good news is that point "3. Compiling glibc with --enable-kernel=2.0.0" will probably produce executables which run on all linux kernels >= 2.4.20.
The only chance to make this work with older kernels would be to disable tls (thread-local storage) but which is not possible with glibc 2.14, despite the fact it is offered as a configure option.
The reason you can't compile it on the original system likely has nothing to do with kernel version (it could, but 2.2 isn't generally old enough for that to be a stumbling block for most code). The problem is that the toolchain is ancient (at the very least, the compiler). However, nothing stops you from building a newer version of G++ with the egcs that is installed. You may also encounter problems with glibc once you've done that, but you should at least get that far.
What you should do will look something like this:
Build latest GCC with egcs
Rebuild latest GCC with the gcc you just built
Build latest binutils and ld with your new compiler
Now you have a well-built modern compiler and (most of a) toolchain with which to build your sample application. If luck is not on your side you may also need to build a newer version of glibc, but this is your problem - the toolchain - not the kernel.