When I use gcc to compile a C++ program as a 32-bit binary and disassemble the main function in gdb, the addresses it prints look like 0x585583d0, whereas other people's 32-bit examples show addresses like 0x080483d0. I'm using Kali Linux and am wondering if it's just because it's a different distribution, or am I missing some C libraries?
am wondering if it's just because it's a different distribution, or am I missing some C libraries?
This is because you built a position independent executable, while other people didn't.
The default load address for non-PIE binaries on 32-bit x86 systems is 0x08048000. The default load address for PIE binaries under GDB is somewhere in the 0x5855.... region. Outside of GDB the address is randomized on every run; if you set disable-randomization off in GDB, you'll observe that the executable starts "jumping around" to different addresses there too.
Some newer distributions default to building PIE binaries. You can avoid this with:
gcc -no-pie main.c
The resulting binary should now start around 0x08048xxx.
You can check whether you have a PIE binary or not with file a.out: it will say executable for a non-PIE binary, and shared object for a PIE binary (newer versions of file may print pie executable instead). See also this answer.
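The distinction that file reports comes from the e_type field of the ELF header: 2 (ET_EXEC) for a fixed-address executable and 3 (ET_DYN) for a shared object or PIE. The following is a minimal sketch of reading that field from the first bytes of a file, not the way file itself is implemented. The helper name classify_elf_type and the enum names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Values of the ELF e_type field, as defined by the ELF specification. */
enum { ELF_TYPE_EXEC = 2, ELF_TYPE_DYN = 3 };

/*
 * Return the e_type field of an ELF header held in buf, or -1 if buf is
 * too short, lacks the ELF magic bytes, or has an unknown byte order.
 * classify_elf_type is a hypothetical helper name, not a library call.
 */
int classify_elf_type(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (buf == NULL || len < 18 || memcmp(buf, magic, 4) != 0)
        return -1;

    /* e_type is the 16-bit field at offset 16.
     * Byte 5 (EI_DATA) gives the byte order: 1 = little, 2 = big endian. */
    if (buf[5] == 1)
        return buf[16] | (buf[17] << 8);
    if (buf[5] == 2)
        return (buf[16] << 8) | buf[17];
    return -1;
}
```

To try it, read the first 18 bytes of a.out with fread and pass them in; a result of ELF_TYPE_DYN suggests a PIE (or a shared library), ELF_TYPE_EXEC a non-PIE executable.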
I'm trying to debug a segfault in Android's surfaceflinger daemon on a custom made ARM board. The process crashes before dumping the call stack and register content, including the program counter.
Normally I would've used objdump and searched for the program counter. The problem is that part of the call stack is in a shared library. Without using gdb, how can I correlate the program counter with a line in the source file? That is, can the addresses of shared library instructions be determined without running the program?
The simplest solution is to load the core dump into gdb and use info symbol <program counter address>, see https://stackoverflow.com/a/7648883/72178.
You can also use addr2line, but you will have to account for the library's load address yourself: subtract it from the program counter and pass the resulting library-relative address to addr2line, see How to map function address to function in *.so files.
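The subtraction involved can be sketched as follows, assuming you have a line in /proc/<pid>/maps format for the library (for example from a crash log). The helper name pc_to_file_offset is hypothetical. The result is the offset into the library file, which matches the address addr2line expects only when the mapped segment's virtual address equals its file offset, as is commonly the case for the code segment of a shared library.

```c
#include <stdio.h>

/*
 * Given one line in /proc/<pid>/maps format and a program counter,
 * store the file-relative offset of pc in *file_offset.
 * Returns 1 if pc lies inside the mapping, 0 otherwise.
 * pc_to_file_offset is a hypothetical helper name.
 */
int pc_to_file_offset(const char *maps_line, unsigned long pc,
                      unsigned long *file_offset)
{
    unsigned long start, end, offset;

    /* Line layout: start-end perms offset dev inode path */
    if (sscanf(maps_line, "%lx-%lx %*4s %lx", &start, &end, &offset) != 3)
        return 0;
    if (pc < start || pc >= end)
        return 0;

    /* Distance from the start of the mapping, plus where the mapping
     * begins inside the file. */
    *file_offset = pc - start + offset;
    return 1;
}
```

The offset obtained this way can then be passed to addr2line together with the library file via its -e option.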
You need your program (and all the relevant shared libraries) to be compiled with debug information (in DWARF format), e.g. by passing some -g (or -g2 or -g3) flag to the GCC compiler when they are built. Notice that with GCC such a debugging option can be mixed with optimization options like -O2.
Then you might use utilities like addr2line, or perhaps libraries like libbacktrace. FWIW, the GCC compiler itself (actually its cc1plus) uses that libbacktrace library to print a useful backtrace on SIGSEGV and other terminating signals (on compiler crashes).
BTW, you could (and probably should) enable core(5) dumping and do a post-mortem analysis of the core file with gdb.
Notice that due to ASLR, a shared library is loaded (actually mmap(2)-ed) at some "random" page.
Read Drepper's How to Write Shared Libraries paper.
I have a C++/C application which needs to be compiled as a 32 bit application (as there are certain third-party libraries only available for 32 bit). However, the compilation as well as the execution will happen on CentOS 6.4 x86_64 machine.
I am using GNU autotools for building. After a lot of googling, I finally figured out a set of options to pass to ./configure to create 32-bit executables and shared objects. I set LD_LIBRARY_PATH to search in /lib, /usr/lib/, /usr/lib/gcc/... instead of /lib64, ... and verified with the file command that all the generated .so files and the executable are 32-bit.
But I get the error: "undefined symbol: _ZL22__gthrw_pthread_cancelm" if I run the executable.
Any clues?
It seems you forgot to link to pthreads with -lpthread.
GCC adds a layer of abstraction over pthreads, and this abstraction uses weak symbols, so you can build your executable without a link error and still fail at runtime.
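To see why a weak symbol lets the link succeed, here is a minimal sketch using the GCC/Clang weak attribute on an ELF platform. It is a simplified illustration, not the actual gthr code in GCC, and hypothetical_optional_hook is an invented name that no library defines.

```c
#include <stddef.h>

/*
 * Weak declaration: if no linked object defines this function, the link
 * still succeeds and the symbol's address is NULL at run time.
 * hypothetical_optional_hook is an invented name for illustration.
 */
extern int hypothetical_optional_hook(int value) __attribute__((weak));

/* Call the hook if some library provides it, otherwise return fallback. */
int call_hook_or_default(int value, int fallback)
{
    if (hypothetical_optional_hook != NULL)
        return hypothetical_optional_hook(value);
    return fallback;   /* symbol missing: calling it would crash */
}
```

Code that checks the address like this degrades gracefully; code that assumes the symbol is present only discovers the missing library when it runs.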
Is there a 32-bit pthread library on your target host? If not, I guess you need to get one installed. Also inspect the output of ldd <my-program> on your target host; this might help you find out what is missing.
I was recently fighting some problems trying to compile an open source library on my Mac that depended on another library and got some errors about incompatible library architectures. Can somebody explain the concept behind compiling a C program for a specific architecture? I have seen the -arch compiler flag before and have seen values passed to it such as ppc, i386 and x86_64 which I assume maps to the CPU "language", but my understanding stops there. If one program uses a particular architecture, do all libraries that it loads need to be on the same architecture as well? How can I tell what architecture a given program/process is running under?
Can somebody explain the concept behind compiling a C program for a specific architecture?
Yes. The idea is to translate C to a sequence of native machine instructions, which have the program coded into binary form. The meaning of "architecture" here is "instruction-set architecture", which is how the instructions are coded in binary. For example, every architecture has its own way of coding for an instruction that adds two integers.
The reason to compile to machine instructions is that they run very, very fast.
If one program uses a particular architecture, do all libraries that it loads need to be on the same architecture as well?
Yes. (Exceptions exist but they are rare.)
How can I tell what architecture a given program/process is running under?
If a process is running on your hardware, it is running on the native architecture which on Unix you can discover by running the command uname -m, although for the human reader the output from uname -a may be more informative.
If you have an executable binary or a shared library (.so file), you can discover its architecture using the file command:
% file /lib/libm-2.10.2.so
/lib/libm-2.10.2.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
% file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, stripped
You can see that these binaries have been compiled for the very old 80386 architecture, even though my hardware is a more modern i686. The i686 (Pentium Pro) is backward compatible with 80386 and runs 80386 binaries as well as native binaries. To make this backward compatibility possible, Intel went to a great deal of trouble and expense—but they practically cornered the market on desktop CPUs, so it was worth it!
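The architecture that file prints comes from the e_machine field of the ELF header. As a rough sketch (not how file is actually implemented), the field can be decoded like this. The helper name elf_machine_name is hypothetical and only a handful of machine codes are listed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Describe the architecture recorded in an ELF header.
 * Returns a static string, or NULL if buf is not a usable ELF header.
 * elf_machine_name is a hypothetical helper; only a few e_machine
 * values from the ELF specification are listed.
 */
const char *elf_machine_name(const unsigned char *buf, size_t len)
{
    unsigned machine;

    if (buf == NULL || len < 20 || memcmp(buf, "\177ELF", 4) != 0)
        return NULL;

    /* e_machine is the 16-bit field at offset 18.
     * Byte 5 (EI_DATA) gives the byte order: 1 = little, 2 = big endian. */
    if (buf[5] == 1)
        machine = buf[18] | (buf[19] << 8);
    else if (buf[5] == 2)
        machine = (buf[18] << 8) | buf[19];
    else
        return NULL;

    switch (machine) {
    case 3:   return "Intel 80386";
    case 20:  return "PowerPC";
    case 40:  return "ARM";
    case 62:  return "x86-64";
    case 183: return "AArch64";
    default:  return "unknown";
    }
}
```

Passing the first 20 bytes of a binary and of each library it loads is a quick way to spot the kind of architecture mismatch described in the question.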
One thing that may be confusing here is that the Mac platform has what they call a universal binary, which is really two binaries in one archive, one for Intel and the other for the ppc architecture. Your computer will automatically decide which one to run. You can (sometimes) run a binary for another architecture in an emulation mode, and some architectures are supersets of others (i.e. i386 code will usually run on an i486, i586, i686, etc.), but for the most part the only code you can run is code for your processor's architecture.
For cross compiling, not only the program but all the libraries it uses need to be compatible with the target processor. Sometimes this means having a second compiler installed, sometimes it is just a question of having the right extra module for the compiler available. The cross compiler for gcc is actually a separate executable, though it can sometimes be accessed via a command line switch. The gcc cross compilers for various architectures are most likely separate installs.
To build for an architecture other than your CPU's native one, you will need a cross-compiler, which means that the generated code cannot run natively on the machine you're sitting at. GCC can do this fine. To find out which architecture a program is built for, check out the file command. On Linux-based systems at least, a 32-bit x86 program will require 32-bit x86 libs to go along with it. I guess it's the same for most OSes.
Does ldd help in this case?
According to the LSB scanner, my binary is supposedly incompatible with a specific version of Linux because it uses GLIBCXX_3.4.9 symbols. But when I tried to run the binary myself on that version, everything seems to work fine...
Can a binary even start on a Linux distro if that distro is missing the runtime libraries containing the required symbols?
I'm not sure I've understood the question correctly, but as far as I know, compiling your program against a modern glibc does not necessarily mean that it cannot run on a system with an older version. The following Linux command:
objdump -T "your exe or lib file" | grep GLIB
will show you which version of the glibc the symbols of your program belong to.
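When comparing the version tags from that output against what a target system provides, note that they have to be compared numerically, component by component: 3.4.9 is older than 3.4.21 even though a plain string comparison says otherwise. A small sketch of such a comparison follows. The helper name compare_version_tags is hypothetical and the tag format is assumed to be a prefix, an underscore and a dotted version.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Compare the dotted numeric parts of two version tags such as
 * "GLIBCXX_3.4.9" and "GLIBCXX_3.4.21".
 * Returns <0, 0 or >0 like strcmp. compare_version_tags is a
 * hypothetical helper; tags without '_' are compared from their start.
 */
int compare_version_tags(const char *a, const char *b)
{
    const char *pa = strrchr(a, '_');
    const char *pb = strrchr(b, '_');

    pa = pa ? pa + 1 : a;
    pb = pb ? pb + 1 : b;

    while (*pa != '\0' || *pb != '\0') {
        char *end_a, *end_b;
        long na = strtol(pa, &end_a, 10);   /* a missing component counts as 0 */
        long nb = strtol(pb, &end_b, 10);

        if (na != nb)
            return na < nb ? -1 : 1;
        if (end_a == pa && end_b == pb)
            break;                          /* no digits left on either side */

        /* Step over the '.' separating two components, if present. */
        pa = (*end_a == '.') ? end_a + 1 : end_a;
        pb = (*end_b == '.') ? end_b + 1 : end_b;
    }
    return 0;
}
```

A binary whose highest required tag compares greater than the highest tag the target's library provides will not load there.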
For further information, there is a paper called How to Write Shared Libraries by Ulrich Drepper that explains a lot about how symbols work on Linux, not only for shared libraries but also for executables.
I suspect they're warning you that you're using symbols that, even if they are available on your test system, may not be available on all LSB-compliant systems.
I know the '-fPIC' option has something to do with resolving addresses and independence between individual modules, but I'm not sure what it really means. Can you explain?
PIC stands for Position Independent Code.
To quote man gcc:
If supported for the target machine, emit position-independent code, suitable for dynamic linking and avoiding any limit on the size of the global offset table. This option makes a difference on AArch64, m68k, PowerPC and SPARC.
Use this when building shared objects (*.so) on those mentioned architectures.
The f is the gcc prefix for options that "control the interface conventions used in code generation".
PIC stands for "Position Independent Code"; -fPIC is a specialization of -fpic for m68K and SPARC.
Edit: After reading page 11 of the document referenced by 0x6adb015, and the comment by coryan, I made a few changes:
This option only makes sense for shared libraries. With it you're telling the compiler to use a Global Offset Table (GOT). This means all your address references are relative to the GOT, and the code can be shared across multiple processes.
Otherwise, without this option, the loader would have to modify all the offsets itself.
Needless to say, we almost always use -fpic/PIC.
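As a concrete, deliberately tiny illustration, the source below is the kind of file you would build into a shared object with this option. The file name counter.c, the library name libcounter.so and both identifiers are hypothetical, and the build command in the comment assumes GCC on Linux.

```c
/*
 * counter.c: a hypothetical one-file library used only for illustration.
 * Assumed build command (GCC on Linux):
 *     gcc -fPIC -shared -o libcounter.so counter.c
 */

/* Exported global data: PIC code does not hard-code its address. */
int call_count = 0;

/* Each process that loads the library gets its own copy of call_count,
 * while the machine code of this function can be shared between them. */
int count_call(void)
{
    call_count += 1;
    return call_count;
}
```

Compiling the same file with and without -fPIC and comparing the output of objdump -d is a simple way to see how the access to call_count changes.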
man gcc says:
-fpic
Generate position-independent code (PIC) suitable for use in a shared
library, if supported for the target machine. Such code accesses all
constant addresses through a global offset table (GOT). The dynamic
loader resolves the GOT entries when the program starts (the dynamic
loader is not part of GCC; it is part of the operating system). If
the GOT size for the linked executable exceeds a machine-specific
maximum size, you get an error message from the linker indicating
that -fpic does not work; in that case, recompile with -fPIC instead.
(These maximums are 8k on the SPARC and 32k on the m68k and RS/6000.
The 386 has no such limit.)
Position-independent code requires special support, and therefore
works only on certain machines. For the 386, GCC supports PIC for
System V but not for the Sun 386i. Code generated for the
IBM RS/6000 is always position-independent.
-fPIC
If supported for the target machine, emit position-independent code,
suitable for dynamic linking and avoiding any limit on the size of
the global offset table. This option makes a difference on the m68k
and the SPARC.
Position-independent code requires special support, and therefore
works only on certain machines.