I have some questions regarding debugging symbols and what can be done with them, besides, well, debugging. I'm mostly interested in answers regarding GCC, but I'd also be happy to know what it looks like under other compilers, including MSVC.
First of all:
What are the common formats/types of debugging symbols?
How do they relate to compilers and platforms? Is it always the same format on GCC and MinGW among platforms?
Can I check at runtime whether the build has them and what format they are in?
And some more practical questions... How can I:
Check the current file and line number?
Obtain the (qualified) function name being executed?
Obtain a full current stack trace?
Let me emphasize that I'm talking about run-time checks. All of those can be read and pretty-printed by GDB, but I don't know how much info comes from the debugging symbols themselves and how much from the source code which GDB also has access to.
Maybe there's a library which is able to parse the debugging symbols and yield such information?
Are the debugging symbols standardised well enough that I can expect some degree of portability for such solutions?
What are the common formats/types of debugging symbols?
DWARF and STABS (both are embedded inside the executable, in special sections), and Program Database (PDB; an external file, used by MSVC).
How do they relate to compilers and platforms? Is it always the same format on GCC and MinGW among platforms?
GCC uses DWARF or STABS (the format is selectable at compile time, e.g. with -gdwarf or -gstabs; STABS support was removed in GCC 13) both on Linux (ELF) and Windows (PE); I don't know about other platforms. MSVC always uses PDB.
Can I check at runtime whether the build has them and what format they are in?
You can parse the executable image and see if there are sections with debugging info (see STABS documentation and DWARF specs). PDB files are distributed either with executables or via symbol servers (so if you don't want to go online, check if there is X.pdb for X.exe/X.dll).
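For the ELF case, the "parse the executable image" idea can be sketched roughly as follows. This is only a minimal sketch, assuming Linux with glibc or musl headers (elf.h and link.h) and a file of the machine's native ELF class; the function name elf_has_section is made up for illustration. It walks the section header table and looks for a section by name, so asking for ".debug_info" tests for DWARF and ".stab" for STABS.

```c
#include <elf.h>
#include <link.h>    /* ElfW(): selects Elf32_* or Elf64_* for the native word size */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the ELF file at `path` has a section
 * called `name`, 0 if it does not, and -1 if the file cannot be read or is
 * not an ELF file of the native class. */
int elf_has_section(const char *path, const char *name)
{
    int result = -1;
    size_t names_len = 0;
    ElfW(Ehdr) eh;
    ElfW(Shdr) *sh = NULL;
    char *names = NULL;
    const unsigned char native_class =
        sizeof(ElfW(Addr)) == 8 ? ELFCLASS64 : ELFCLASS32;

    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    /* Read and sanity-check the ELF header. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != native_class ||
        eh.e_shentsize != sizeof(ElfW(Shdr)) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Load the section header table. */
    sh = malloc((size_t)eh.e_shnum * sizeof *sh);
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Load the section-name string table (.shstrtab). */
    names_len = (size_t)sh[eh.e_shstrndx].sh_size;
    names = malloc(names_len + 1);
    if (!names ||
        fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, names_len, f) != names_len)
        goto out;
    names[names_len] = '\0';

    /* Compare every section name against the one requested. */
    result = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < names_len &&
            strcmp(names + sh[i].sh_name, name) == 0) {
            result = 1;
            break;
        }
    }

out:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

On Linux a process can inspect itself by passing "/proc/self/exe" as the path, e.g. elf_has_section("/proc/self/exe", ".debug_info"). Note that this only tells you the section exists; it does not cover debug info kept in a separate file.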
As for how to read and use those symbols: I don't know about DWARF/STABS (there's probably something around GNU binutils that can locate and extract them), but for PDB your best bet is dbghelp. Its usage is pretty well documented and there are a lot of examples available on the net. There's also the DIA SDK, which can be used to query PDB files.
Are the debugging symbols standardised well enough that I can expect some degree of portability for such solutions?
DWARF has a formal specification, and it's complicated as hell. PDB AFAIK is not documented, but dbghelp/DIA are, and are the recommended way.
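On the practical questions (file and line, function name, stack trace), part of this is available without parsing any debug format at all. The following is a minimal sketch, assuming GCC or Clang on a glibc system (execinfo.h is a glibc extension); TRACE_HERE and dump_stack are names made up for illustration. __FILE__, __LINE__ and __func__ are filled in at compile time and need no debugging symbols. backtrace() walks the stack at runtime and resolves names from the dynamic symbol table rather than from DWARF, so you usually need to link with -rdynamic to see function names instead of bare addresses.

```c
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): glibc extension */
#include <stdio.h>
#include <unistd.h>     /* STDERR_FILENO */

/* Expands at the call site, so it reports the caller's location.
 * __FILE__ and __LINE__ come from the preprocessor and __func__ from the
 * compiler; none of them depends on debugging symbols. */
#define TRACE_HERE() \
    fprintf(stderr, "%s:%d in %s()\n", __FILE__, __LINE__, __func__)

/* Writes the current call stack to stderr, one frame per line, and
 * returns the number of frames captured (at most 64). */
int dump_stack(void)
{
    void *frames[64];
    int depth = backtrace(frames, 64);
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    return depth;
}
```

__func__ gives the unqualified name only; in C++ code GCC's __PRETTY_FUNCTION__ includes the full signature. Turning the raw addresses into file and line numbers is where the debug info comes in, for example by feeding them to addr2line from binutils.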
I am developing 8051 firmware in a project and have to use IAR as the toolchain. The build system is CMake. I cannot use the IAR IDE.
In order to optimize my source code I want to look at the disassembly of the resulting binary, preferably with labels. Is there any way to make xlink output something that I can analyse? I know that the IAR IDE debugger has a debug view, but I cannot use the IDE. It seems xlink can output a lot of file formats, but which ones allow extracting debug information from the command line?
The linker is usually not the tool that produces assembly listings. It generally only knows about bytes, sections, references to be resolved, addresses to be located, output formats to generate, and so on. Only a few linkers know how to patch compiler/assembler generated listings with the final addresses.
So you would need to look at the tool that knows about machine code, the compiler or the assembler. Unfortunately IAR does not seem to provide a disassembler.
A quick web search for "iar 8051 compiler option" revealed this compiler guide. Another quick look at the table of contents led to the chapter "Descriptions of compiler options", which describes, among others, the option -l. The guide says:
Use this option to generate an assembler or C/C++ listing to a file. Note that this option can be used one or more times on the command line.
This should be enough to see the generated assembly. If you want the allocated addresses after linking, the linker can provide that.
Anyway, as a last resort you could try a decent disassembler such as Ghidra.
I am learning gdb to make life a little easier. I searched Google for this but couldn't find the answer. I did learn that $1 is a gdb variable that can be used later anywhere in the debugging session, but I found nothing about my actual question.
Code:
if (pthread_create(&mythread[i], NULL, (void *)threadFunction, (void *)i))
{
    printf("\nerror creating thread");
    exit(1);
}
My question is: can I see the definition of threadFunction or of the pthread_create function, assuming gdb knows it?
GDB uses debug information, often in DWARF format, which contains names (e.g. of local variables), source locations (source path, line number), and so on. That debug information exists for code compiled with -g (passed to the GCC compiler). So you can get source location information for any code compiled with -g (and you could recompile the libraries you are using with -g yourself). Most of the time the DWARF info sits in the same shared library or object file (in ELF format), but it can also be kept in a separate file.
pthread_create is a POSIX function provided by your C standard library (it uses system calls such as clone(2); see syscalls(2) for an exhaustive list). Read the pthread_create(3) documentation carefully. On Linux most C standard libraries (notably GNU glibc and musl-libc) are free software, so you can study their source code. You may need to find a debug variant of your libc, e.g. the libc6-dbg package on Debian.
can I see the definition of threadFunction
You can use the list command of GDB. So try list threadFunction (assuming that the source code containing its definition was compiled with -g).
... or of pthread_create
The definition of pthread_create (provided by your C standard library) would appear only if you use a libc compiled with debug info. It may be faster to browse its source code (e.g. src/thread/pthread_create.c of musl-libc) than to recompile your entire libc with debug info.
To understand the behavior of pthread_create you may want to understand clone(2), but it is better to trust the documentation in pthread_create(3). Most of it (of clone) is implemented inside the Linux kernel, which is also free software (downloadable on kernel.org). You could spend many years understanding all the details (but also ask on kernelnewbies.org once you have begun to study some kernel code).
Many Linux distributions are mostly made of free software, totaling more than ten billion lines of source code. You would surely need more than a lifetime to study most of it. Abstraction is practically essential in software development (so choose which details you are willing to forget). Read also about undefined behavior and about leaky abstractions.
Read also Advanced Linux Programming (a bit old) and Operating Systems: Three Easy Pieces. Both are freely downloadable. https://computing.llnl.gov/tutorials/pthreads/ is a good tutorial on POSIX Threads Programming.
If I have a C++ source file, gcc can give all its dependencies, in a tree structure, using the -H option. But given only the C++ executable, is it possible to find all libraries and header files that went into its compilation and linking?
If you've compiled the executable with debugging symbols, then yes, you can use the symbols to get the files.
If you have .pdb files (Visual Studio creates them to store debugging information separately) you can use all kinds of programs to open them and see the source files and methods.
You can even open one with a text editor and you'll see, among the gibberish, a list of the functions and source files.
If you're using Linux (or GNU compilers in general), you can use gdb (again, only if debug symbols were enabled at compile time).
Run gdb on your executable, then run the command: info sources
That's an important reason to build without debug symbols (or to strip them) when going into production: you don't want clients to mess around with your sources, functions, and code.
You cannot do that, because the executable might have been built on a machine on which the header files (or the C++ code, or the libraries) are private or even generated. Also, if a static library is linked in, you have no reliable way to find out.
In practice, however, on Linux, using nm or objdump or ldd on the executable will often (but not always) give you a good clue about the needed libraries.
Also, some executables load plugins dynamically, e.g. using dlopen, so your question might not make sense (since such a plugin is known only at runtime).
Notice also that you might not know whether an executable was obtained by compiling C++ code at all (you might not be able to tell if it was built from C, C++, D, or OCaml, ... source code, or a mixture of them).
On Linux, if you build an executable with static linking and stripping, people won't be able to easily guess the source programming language that you have used.
BTW, on Linux distributions, it is the role of the package management system to deal with such dependencies.
As answered by Yochai Timmer, if the executable contains debug information (e.g. in DWARF format) you should be able to get a lot more information.
At a customer site, a piece of third-party software has crashed. The process and the libraries are stripped (no symbols), and the call stack does not give any useful information. All that I have is the registers, which may not be corrupted. This third-party code is written in C.
I have used gdb until now to debug simpler issues, but this one is a bit complicated. I think the register and raw stack information may be used to correlate where the crash occurred, and I need help with this aspect.
It may not be possible to deploy a non-stripped binary at the customer site, nor would it be possible to reproduce the crash in-house. Also, I am not familiar with this third-party code.
Also I require pointers/sites/documents for the following:
1) ELF and various section headers.
2) How to create a symbol file (during compilation) for a library and a process.
3) How to tell gdb to read symbols from a symbol file.
One thing you should be able to do is open your core file against a non-stripped (with-symbols) version of your process. As long as the compilation process (compiler, optimization flags, etc.) is the same and you only keep the debugging information in addition, GDB should be able to provide you with all the information you can expect from a core.
gdb [options] executable-file core-file
To compile your process with the debugging information (symbols and DWARF data for lines, types, ...), you need to add -g to your compiler flags. The same applies to your custom libraries.
For the system libraries this can sometimes (not always) be convenient: modern Linux distributions (at least Fedora) provide their debugging information directly to gdb.
From a compiled file, can I see which compiler has been used to generate the file?
There's also the good old 'strings' utility. Dumps all ascii-ish looking strings it finds in the binary. Different compilers embed different amounts of information in the binaries they produce, but many will actually include obviously identifying strings.
Many compilers/linkers insert a .comment section in the output file which identifies them. There are also a number of more subtle behaviors you could create compiler fingerprints off of, but I know of no existing tools for this.
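To make the strings idea concrete, here is a rough sketch of what such a scan does. It is not the real strings implementation; print_strings is a made-up name, and it only treats plain printable ASCII as text. Read the binary into a buffer, run it through the function, and look for lines such as "GCC: (GNU) ..." (the kind of text GCC records in the .comment section).

```c
#include <ctype.h>
#include <stdio.h>

/* Prints every run of at least `min_len` printable ASCII characters found
 * in buf[0..len) to `out`, one run per line, and returns how many runs
 * were printed. `min_len` is expected to be at least 1. */
size_t print_strings(FILE *out, const unsigned char *buf, size_t len,
                     size_t min_len)
{
    size_t count = 0;
    size_t start = 0;   /* index where the current run began */

    for (size_t i = 0; i <= len; i++) {
        /* The end of the buffer is handled like a non-printable byte so
         * that a run touching the end is still flushed. */
        if (i < len && buf[i] < 0x80 && isprint(buf[i]))
            continue;
        if (i - start >= min_len) {
            fwrite(buf + start, 1, i - start, out);
            fputc('\n', out);
            count++;
        }
        start = i + 1;
    }
    return count;
}
```

With binutils installed, readelf -p .comment on the file shows the same identification strings without scanning the whole binary.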
If you have the source, the easiest solution would be to try compiling with each compiler in question until you get a binary that matches byte-for-byte (or even closely).
In some cases you can run ldd on the binary and find out which standard library it's linked against. For example on Solaris gcc vs Sun CC vs whatever.
For C++ code, you can also dump some of the symbols, find a mangled function name, and then figure out which demangler generates the correct original name.
Try IDA Pro, which identifies the libraries and tools used to build an executable file.
I was answering a quiz on a Blue Team website, and this was one of the questions. I found the solution using a tool called PE Detective. It looks for signatures in the EXE and works really well.
https://www.softpedia.com/get/System/File-Management/PE-Detective.shtml