how to find all objects (class objects/structs) of a C++ executable


Is there a way, maybe using nm, or gdb, that will let me create a list of all the object types that an executable contains?
To clarify, I have the source code. I need a method for figuring out all the class/struct sizes that are used at runtime. So this is probably a two part problem:
1) create a list of all classes/structs
2) use sizeof() on each of the items on the list, in gdb.

"Types" aren't a property of machine code. They're a property of a high-level, abstract language, which is compiled into machine code. Unless the compiler makes specific arrangements for you to recover information about the source program, type information generally doesn't exist at all.

http://www.hex-rays.com/products/ida/index.shtml : DeCompiler for C++
You will usually not get good C++ out of a binary unless you compiled in debugging information. Prepare to spend a lot of manual labor reversing the code.
If you didn't strip the binaries there is some hope, as IDA Pro can produce C-like code for you to work with.

It's easy to get a list of types from gdb. You just want info types and then ptype if you want to drill down into the type (limiting it to types matching a string just to keep this small):
(gdb) info types Q
All types matching regular expression "Q":
File foo.cpp:
Qq;
(gdb) ptype Qq
type = class Qq {
private:
int qx;
public:
Qq(int);
std::__cxx11::string something(std::__cxx11::list<int, std::allocator<int> >);
int getQ(void);
}
And sizeof tells you how big the structure is (of course, it's the structure itself, so this may or may not be all that useful):
(gdb) p sizeof(Qq)
$1 = 4
(gdb)
You'll probably want to run gdb in a script and parse the output somehow.
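One way to script this, sketched here under some assumptions: generate a gdb command file with one print sizeof(...) line per type, then run it with gdb -batch -x sizes.gdb ./your_executable and read the results back in the same order. The function name make_gdb_sizeof_script and the file name sizes.gdb are made up for illustration; the list of type names would come from parsing the info types output.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Builds the text of a gdb command file: one "print sizeof(T)" per type name.
// The type names are assumed to have been collected from "info types" already.
std::string make_gdb_sizeof_script(const std::vector<std::string>& type_names) {
    std::ostringstream script;
    for (const std::string& name : type_names) {
        script << "print sizeof(" << name << ")\n";
    }
    return script.str();
}
```

For the type shown above, make_gdb_sizeof_script({"Qq"}) yields the single line print sizeof(Qq), and gdb answers each line with one $N = size result.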

Related

can we make GDB display non trivial arguments in backtrace

Currently GDB prints only trivial arguments in backtrace (only scalars); something like below
(gdb) bt 1
(gdb) function1(this=this@entry=0xfff6c20, x1=-1, x2=3, x3=...
and so on. x3 here could be a array/STL vector and by default GDB does not display it.
I am using lot of STL vectors and Blitz arrays in my code.
I have routines in .gdbinit file to display STL vectors, and subroutines in c++ where I can make use of call functionality in GDB, which can display the array contents. To manually print the vector/array contents, I would use
(gdb) printVector vector_name -> this is a routine in my .gdbinit
(gdb) call printBlitzArray(array_name) -> this is a routine inside my executable itself.
How can we make GDB display the non trivial arguments of a function like below.
void myFunc(int x1, int x2, std::vector<int> x3, blitz::Array<bool, 1> x4)
I found that set print frame-arguments all displays some of the non-trivial arguments.
But how can I print arguments of types that GDB has no native support for?
The intent is to automatically print all the arguments at the start of each function (at least those we can).
I can write a GDB script and add prints individually for each vector/array, but doing this for every function would be very time consuming, since I have a large number of functions. This would help a lot to accelerate my debug.
Any suggestion is highly appreciated.
Thanks a lot in advance !

I've just tested this on my own machine: use -rdynamic when compiling.
The -rdynamic flag adds all of your symbols (not just the dynamic or externally referenced ones) to the dynamic symbol table of your executable, so they are loaded into memory at runtime rather than serving only as metadata for the linker. This gives any backtracing mechanism the fully mangled symbol name, which can be demangled into your original function signature (parameter types only, without the parameter names). Hope this helps! :)

How to avoid namespace prefix for symbols in GDB?

I'm working with a C++ library. The library uses several namespaces. When debugging, I have to prefix every symbol name with the namespace prefix. It causes a lot of extra work and typing.
C++ has the using namespace X concept to make symbols available with more ease (lots of hand waving). I'm looking for something similar in GDB. For example, instead of b MyLibNamespace::Foo::bar, I want to type b Foo::bar.
GDB does not appear to have help related to namespaces, but I'm probably doing something wrong:
(gdb) help namespace
Undefined command: "namespace". Try "help".
(gdb) namespace help
Undefined command: "namespace". Try "help".
How do I tell GDB to use a namespace prefix so I don't have to provide it for every symbol name?

How do I tell GDB to use a namespace prefix so I don't have to provide it for every symbol name?
There doesn't appear to be any such support in current GDB (as of 2017-08-13).
You can probably implement it using Python scripting to define a new command. Documentation.
Beware: this is an entirely non-trivial proposition.

How do I tell GDB to use a namespace prefix so I don't have to provide
it for every symbol name?
You might consider a work-around...
I have (on occasion) added one or more (C++) functions to my class definitions file (.cc), but they are not part of the class(es).
They are not part of the application, and are harmlessly removed when you are done with them.
They generally 'dump' info (with names d1(), d2(), etc.)
But they can also do practically anything useful for your debugging effort. Usually, you will not have thought of this specific test ahead of time.
So, your edit/compile/link iteration is simply: stop gdb, open the file, add a useful function, rebuild, and resume gdb. Keep this 'diagnostic' code simple. Hopefully the result is ultimately time saving.
I can find no examples (in my files) at the moment. I suppose I discard these functions quickly once I've overcome a particular challenge.
Anyway ... this demo worked just a few minutes ago ...
When working in gdb near my class Foo_t, part of namespace DTB, etc. the d1 I've created knows how to access a particular instance of Foo_t (in some convenient way), and, can easily dump the current state of the instance using a Foo method to do so. Perhaps d1 can look like this:
void d1() { objDer.f("xxx"); } // a derived instance,
// the class has a long complex name.
Now, in gdb, run to a breakpoint somewhere when that instance exists, and is initialized, and use gdb print command to run d1 ...
(gdb) p d1()
that is a short gdb command to get at the instance and run a method.
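A slightly fuller sketch of this work-around might look like the following. Everything in it is hypothetical (the class body, the dump method, the g_debug_foo pointer); the only point is that d1 lives at global scope, so gdb can call it without typing any namespace prefix.

```cpp
#include <iostream>
#include <string>

namespace DTB {
// Stand-in for a class with a long, namespace-qualified name.
class Foo_t {
public:
    explicit Foo_t(int value) : value_(value) {}
    // Returns a one-line description of the current state.
    std::string dump(const std::string& label) const {
        return label + ": value=" + std::to_string(value_);
    }
private:
    int value_;
};
}  // namespace DTB

// Temporary debugging hook: point this at the instance you care about.
const DTB::Foo_t* g_debug_foo = nullptr;

// Global-scope helper, callable from gdb as "p d1()".
void d1() {
    if (g_debug_foo != nullptr) {
        std::cerr << g_debug_foo->dump("d1") << '\n';
    } else {
        std::cerr << "d1: no instance registered\n";
    }
}
```

Once the program has assigned g_debug_foo and stopped at a breakpoint, p d1() prints the state to stderr. Remove the hook and the helper when you are done.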

Make BFD library find the location of a class member function

I am using the function bfd_find_nearest_line to find the source location of a function (from an executable with debugging symbols --compiled with -g). Naturally one of the arguments is a pointer to the function I want to locate:
boolean
_bfd_elf_find_nearest_line (abfd,
section,
symbols,
offset,
filename_ptr,
functionname_ptr, // <- HERE!
line_ptr)
https://sourceware.org/ml/binutils/2000-08/msg00248.html
After quite a bit of (pure C) boilerplate, I managed to get this to work with normal functions (where the normal function pointer is cast to void*).
For example, this works:
int my_function(){return 5;}
int main(){
_bfd_elf_find_nearest_line (...,
(void*)(&my_function),
...);
}
The question is if bfd_find_nearest_line can be used to locate the source code of a class member function.
struct A{
int my_member_function(){return 5.;}
};
_bfd_elf_find_nearest_line (...,
what_should_I_put_here??,
...)
Class member function pointers (in this case of type int (A::*)()) are not function pointers, and in particular cannot be cast to any function pointer, not even to void*. See here: https://isocpp.org/wiki/faq/pointers-to-members#cant-cvt-memfnptr-to-voidptr
I completely understand the logic behind this; however, the member-function pointer is the only handle I have to the member function, and I need it so that BFD can identify the function. I don't want to use this pointer to call the function.
I know more or less how C++ works: the compiler will silently generate an equivalent free C function,
__A_my_member_function(A* this){...}
But I don't know how to access the address of this free function, or if that is even possible, and whether the bfd library will be able to locate the source location of the original my_member_function via this pointer.
(For the moment at least I am not interested in virtual functions.)
In other words,
1) I need to know if bfd will be able to locate a member function,
2) and if it can how can I map the member function pointer of type int (A::*)() to an argument that bfd can take (void*).
I know by other means (stack trace) that the pointer exists, for example I can get that the free function is called in this case _ZN1A18my_member_functionEv, but the problem is how I can get this from &(A::my_member_function).

Okay, there's good news and bad news.
The good news: It is possible.
The bad news: It's not straightforward.
You'll need the c++filt utility.
And, some way to read the symbol table of your executable, such as readelf. If you can enumerate the [mangled] symbols with a bfd_* call, you may be able to save yourself a step.
Also, here is a biggie: You'll need the c++ name of your symbol in a text string. So, for &(A::my_member_function), you'll need it in a form: "A::my_member_function()" This shouldn't be too difficult since I presume you have a limited number of them that you care about.
You'll need to get a list of symbols and their addresses from readelf -s <executable>. Be prepared to parse this output. You'll need to decode the hex address from the string to get its binary value.
These will be the mangled names. For each symbol, do c++filt -n mangled_name and capture the output (i.e. a pipe) into something (e.g. nice_name). It will give you back the demangled name (i.e. the nice c++ name you'd like).
Now, if nice_name matches "A::my_member_function()", you have a match: you already have the mangled name and, more importantly, the hex address of the symbol. Feed this hex value [suitably cast] to bfd where you were stuffing functionname_ptr.
Note: The above works but can be slow with repeated invocations of c++filt.
A faster way to do this is to capture the piped output of:
readelf -s <executable> | c++filt
It's also [probably] easier to do it this way since you only have to parse the filtered output and look for the matching nice name.
Also, if you had multiple symbols that you cared about, you could get all the addresses in a single invocation.
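The parsing step can be sketched roughly as follows, assuming the usual readelf -s column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name) with the name already demangled by c++filt. The function name find_symbol_address is hypothetical, and real output may need more care (for example versioned or truncated names).

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Scans "readelf -s <executable> | c++filt" output for a FUNC symbol whose
// demangled name equals `wanted`. On success stores the address and returns true.
bool find_symbol_address(std::istream& in, const std::string& wanted,
                         std::uint64_t& address) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string num, value, size, type, bind, vis, ndx;
        // Lines with fewer than seven columns (titles, blanks) are skipped.
        if (!(fields >> num >> value >> size >> type >> bind >> vis >> ndx)) {
            continue;
        }
        if (type != "FUNC") {
            continue;  // also skips the column-header line
        }
        // The demangled name is the rest of the line and may contain spaces.
        std::string name;
        std::getline(fields >> std::ws, name);
        if (name == wanted) {
            address = std::stoull(value, nullptr, 16);  // Value column is hex
            return true;
        }
    }
    return false;
}
```

To use it, feed the function a stream of the captured output (from a file, or from a pipe opened with popen on POSIX systems) and pass the resulting address to bfd.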

Ok, I found a way. First, I discovered that bfd is pretty happy finding debug information for member functions from member pointers, as long as the pointer can be converted to void*.
I was using clang, which wouldn't allow me to cast the member function pointer to any kind of pointer or integer.
GCC allows this but emits a warning.
There is even a flag, -Wno-pmf-conversions, that silences the warning for this pointer-to-member cast.
With that information in mind I did my best to convert a member function pointer into void*, and I ended up doing this using a union.
struct A{
int my_member_function(){return 5.;}
};
union void_caster_t{
int (A::*value)(void);
void* casted_value;
};
void_caster_t void_caster = {&A::my_member_function};
_bfd_elf_find_nearest_line (...,
void_caster.casted_value,
...)
Finally bfd is able to give me debug information of a member function.
What I haven't figured out yet is how to get a pointer to the constructor and destructor member functions.
For example,
void_caster_t void_caster = {&A::~A};
gives the compiler error: "you can't take the address of the destructor".
For the constructor I wasn't even able to find the correct syntax, since this fails with a syntax error:
void_caster_t void_caster = {&A::A};
Again, the logic behind this restriction is that such callbacks would be nonsensical, but my case is different because I want the pointer (or address) to get debug information, not to use it as a callback.
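As an alternative to the union, which reads an inactive union member, the same bits can be copied out with memcpy. This is only a sketch, and it leans on an ABI assumption: under the Itanium C++ ABI (used by GCC and Clang on Linux), the first pointer-sized word of a pointer to a non-virtual member function holds the function's address. That does not hold for virtual functions, and other ABIs may lay the pointer out differently. The helper name member_function_address is made up for illustration.

```cpp
#include <cstring>

struct A {
    int my_member_function() { return 5; }
};

// Copies the first pointer-sized word out of a pointer to member function.
// ASSUMPTION: Itanium C++ ABI layout and a non-virtual member function.
template <typename MemberFn>
void* member_function_address(MemberFn pmf) {
    static_assert(sizeof(pmf) >= sizeof(void*),
                  "unexpected pointer-to-member-function layout");
    void* address = nullptr;
    std::memcpy(&address, &pmf, sizeof(address));
    return address;
}
```

With that in place, member_function_address(&A::my_member_function) gives a void* that can be handed to bfd in the same spot as void_caster.casted_value above.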

does cdb/windbg have an equivalent to autoexp.dat?

I'd like to change the way some types are displayed using either 'dt' or '??' in a manner similar to how you can do that with autoexp.dat. Is there a way to do this?
For example, I have a structure something like this:
struct Foo
{
union Bar
{
int a;
void *p;
} b;
};
And I've got an array of a few hundred of these, all of which I know point to a structure Bar. Is there any way to tell cdb that, in this expression anyway, 'p' is a pointer to Bar? This is the kind of thing you could do with autoexp. (The concrete example here is that I've got a hashtable that can have keys of any type, but I know the keys are strings. The implementation stores them as void pointers.)
Thanks in advance!

I don't think there's anything as simple as autoexp.dat.
You have a couple potential options - you can write a simple script file with the debugger commands to dump the data structure in the way you want and use the "$<filename" command (or one of its variants). Combined with user aliases you can get this to be pretty easy and natural to use.
The second option is quite a bit more involved, but with it comes much more power - write an extension DLL that dumps your data structure. For something like what you're talking about this is probably overkill. But you have immense power with debugger extensions (in fact, much of the power that comes in the Debugging tools package is implemented this way). The SDK is packaged with the debugger, so it's easy to determine if this is what you might need.

You can say du or da to have it dump memory as unicode or ascii strings.

Does an arbitrary instruction pointer reside in a specific function?

I have a very difficult problem I'm trying to solve: Let's say I have an arbitrary instruction pointer. I need to find out if that instruction pointer resides in a specific function (let's call it "Foo").
One approach to this would be to try to find the start and ending bounds of the function and see if the IP resides in it. The starting bound is easy to find:
void *start = &Foo;
The problem is, I don't know how to get the ending address of the function (or how "long" the function is, in bytes of assembly).
Does anyone have any ideas how you would get the "length" of a function, or a completely different way of doing this?
Let's assume that there is no SEH or C++ exception handling in the function. Also note that I am on a win32 platform, and have full access to the win32 api.

This won't work. You're presuming functions are contiguous in memory and that one address will map to one function. The optimizer has a lot of leeway here and can move code from functions around the image.
If you have PDB files, you can use something like the dbghelp or DIA APIs to figure this out. For instance, SymFromAddr. There may be some ambiguity here as a single address can map to multiple functions.
I've seen code that tries to do this before with something like:
#pragma optimize("", off)
void Foo()
{
}
void FooEnd()
{
}
#pragma optimize("", on)
And then FooEnd-Foo was used to compute the length of function Foo. This approach is incredibly error prone and still makes a lot of assumptions about exactly how the code is generated.

Look at the *.map file which can optionally be generated by the linker when it links the program, or at the program's debug (*.pdb) file.

OK, I haven't done assembly in about 15 years. Back then, I didn't do very much. Also, it was 680x0 asm. BUT...
Don't you just need to put a label before and after the function, take their addresses, subtract them for the function length, and then just compare the IP? I've seen the former done. The latter seems obvious.
If you're doing this in C, look first for debugging support --- ChrisW is spot on with map files, but also see if your C compiler's standard library provides anything for this low-level stuff -- most compilers provide tools for analysing the stack etc., for instance, even though it's not standard. Otherwise, try just using inline assembly, or wrapping the C function with an assembly file and an empty wrapper function with those labels.
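Once you have a start address and a length from somewhere (labels, a map file, debug info), the comparison itself is the easy part. Here is a minimal sketch; the name address_in_range is hypothetical, and it assumes the function occupies one contiguous block, which, as noted in another answer, the optimizer does not guarantee.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if `ip` lies in the half-open range [start, start + length).
// ASSUMPTION: the code of interest is one contiguous block of `length` bytes.
bool address_in_range(const void* ip, const void* start, std::size_t length) {
    const auto ip_value = reinterpret_cast<std::uintptr_t>(ip);
    const auto start_value = reinterpret_cast<std::uintptr_t>(start);
    // Check the lower bound first so the subtraction cannot wrap around.
    return ip_value >= start_value && ip_value - start_value < length;
}
```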

The simplest solution is maintaining a state variable:
volatile int FOO_is_running = 0;
int Foo( int par ){
FOO_is_running = 1;
/* do the work */
FOO_is_running = 0;
return 0;
}
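In C++ the same idea can be made a little more robust with a small RAII guard, so the flag is cleared on early returns and exceptions as well. This is a sketch with hypothetical names (FooScope, foo_depth, foo_is_running); it uses a counter instead of a flag so recursion is handled. Note that it reports whether Foo is active on any thread, not whether one particular instruction pointer is inside Foo.

```cpp
#include <atomic>

// Number of Foo calls currently in progress, across all threads.
std::atomic<int> foo_depth{0};

// Increments the counter on entry and decrements it on every exit path.
class FooScope {
public:
    FooScope() { foo_depth.fetch_add(1); }
    ~FooScope() { foo_depth.fetch_sub(1); }
    FooScope(const FooScope&) = delete;
    FooScope& operator=(const FooScope&) = delete;
};

int Foo(int par) {
    FooScope scope;  // counted from here until the function returns
    if (par < 0) {
        return -1;   // early return: the guard still decrements the counter
    }
    /* do the work */
    return foo_depth.load();  // returns the current nesting depth for the demo
}

bool foo_is_running() { return foo_depth.load() > 0; }
```

If the question is whether the current thread is inside Foo, a thread_local int counter in place of the atomic would be the natural variation.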

Here's how I do it, but it's using gcc/gdb.
$ gdb ImageWithSymbols
gdb> info line * 0xYourEIPhere
Edit: Formatting is giving me fits. Time for another beer.
