Make BFD library find the location of a class member function

I am using the function bfd_find_nearest_line to find the source location of a function (from an executable with debugging symbols --compiled with -g). Naturally one of the arguments is a pointer to the function I want to locate:
boolean
_bfd_elf_find_nearest_line (abfd,
section,
symbols,
offset,
filename_ptr,
functionname_ptr, // <- HERE!
line_ptr)
https://sourceware.org/ml/binutils/2000-08/msg00248.html
After quite a bit of (pure C) boilerplate, I managed to get this to work with normal functions (where the normal function pointer is cast to void*).
For example, this works:
int my_function(){return 5;}
int main(){
_bfd_elf_find_nearest_line (...,
(void*)(&my_function),
...);
}
The question is if bfd_find_nearest_line can be used to locate the source code of a class member function.
struct A{
int my_member_function(){return 5.;}
};
_bfd_elf_find_nearest_line (...,
what_should_I_put_here??,
...)
Class member functions (in this case of type int (A::*)()) are not ordinary functions, and in particular pointers to them cannot be cast to any function pointer, not even to void*. See here: https://isocpp.org/wiki/faq/pointers-to-members#cant-cvt-memfnptr-to-voidptr
I completely understand the logic behind this; however, the member-function pointer is the only handle I have from which BFD could identify the function. I don't want to use this pointer to call the function.
I know more or less how C++ works: the compiler silently generates an equivalent free C-style function,
__A_my_member_function(A* this){...}
But I don't know how to access the address of this free function, or whether that is even possible, or whether the bfd library will be able to locate the source location of the original my_member_function via this pointer.
(For the moment at least I am not interested in virtual functions.)
In other words,
1) I need to know if bfd will be able to locate a member function,
2) and if it can how can I map the member function pointer of type int (A::*)() to an argument that bfd can take (void*).
I know by other means (stack trace) that the pointer exists, for example I can get that the free function is called in this case _ZN1A18my_member_functionEv, but the problem is how I can get this from &(A::my_member_function).

Okay, there's good news and bad news.
The good news: It is possible.
The bad news: It's not straightforward.
You'll need the c++filt utility.
And, some way to read the symbol table of your executable, such as readelf. If you can enumerate the [mangled] symbols with a bfd_* call, you may be able to save yourself a step.
Also, here is a biggie: You'll need the c++ name of your symbol in a text string. So, for &(A::my_member_function), you'll need it in a form: "A::my_member_function()" This shouldn't be too difficult since I presume you have a limited number of them that you care about.
You'll need to get a list of symbols and their addresses from readelf -s <executable>. Be prepared to parse this output. You'll need to decode the hex address from the string to get its binary value.
These will be the mangled names. For each symbol, do c++filt -n mangled_name and capture the output (i.e. a pipe) into something (e.g. nice_name). It will give you back the demangled name (i.e. the nice c++ name you'd like).
Now, if nice_name matches "A::my_member_function()", you have a match: you already have the mangled name and, more importantly, the hex address of the symbol. Feed this hex value [suitably cast] to bfd where you were stuffing functionname_ptr.
Note: The above works but can be slow with repeated invocations of c++filt
A faster way to do this is to capture the piped output of:
readelf -s <executable> | c++filt
It's also [probably] easier to do it this way since you only have to parse the filtered output and look for the matching nice name.
Also, if you had multiple symbols that you cared about, you could get all the addresses in a single invocation.
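As a rough illustration of the matching step, here is a minimal C++ sketch that parses one data row of readelf -s output and demangles the symbol name in-process. Note the substitution: instead of piping through c++filt, it calls abi::__cxa_demangle from <cxxabi.h> (available with GCC and Clang). The helper names demangle and parse_symbol_row are hypothetical, and the row layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name) is assumed from typical readelf output rather than taken from the answer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cxxabi.h>
#include <sstream>
#include <string>

// Demangle an Itanium-ABI symbol name.
// Returns the input unchanged when the name cannot be demangled.
std::string demangle(const std::string& mangled) {
    assert(!mangled.empty());
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);  // __cxa_demangle allocates the result with malloc
    return result;
}

// Parse one data row of `readelf -s` output (assumed column layout:
// Num: Value Size Type Bind Vis Ndx Name).
// On success, stores the hex Value column in `value` and the (still mangled)
// Name column in `name`. Returns false for rows that do not fit, such as
// the header row.
bool parse_symbol_row(const std::string& row, std::uint64_t& value, std::string& name) {
    std::istringstream in(row);
    std::string num, size, type, bind, vis, ndx;
    in >> num >> std::hex >> value >> std::dec
       >> size >> type >> bind >> vis >> ndx >> name;
    return !in.fail();
}
```

A caller would loop over the rows, call parse_symbol_row on each, and keep the value of the row whose demangle(name) equals "A::my_member_function()".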

Ok, I found a way. First, I discovered that bfd is pretty happy detecting member functions debug information from member pointers, as long as the pointer can be converted to void*.
I was using clang which wouldn't allow me to cast the member function pointer to any kind of pointer or integer.
GCC allows this but emits a warning.
There is even a flag, -Wno-pmf-conversions, that silences the warning for such pointer-to-member conversions.
With that information in mind I did my best to convert a member function pointer into void* and I ended up doing this using unions.
struct A{
int my_member_function(){return 5.;}
};
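union void_caster_t{
int (A::*value)(void); // pointer to a member function of A
void* casted_value; // same bytes viewed as void*; relies on the compiler's pointer-to-member layout
};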
void_caster_t void_caster = {&A::my_member_function};
_bfd_elf_find_nearest_line (...,
void_caster.casted_value,
...)
Finally bfd is able to give me debug information of a member function.
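The same extraction can be sketched without reading an inactive union member, by copying the leading bytes of the member pointer with std::memcpy. This is only a sketch under assumptions not stated in the post: it assumes an Itanium-style C++ ABI (as used by GCC and Clang on Linux), where the first word of a pointer to a non-virtual member function holds the code address. code_address is a hypothetical helper name, and the result is meaningless for virtual functions.

```cpp
#include <cassert>
#include <cstring>

struct A {
    int my_member_function() { return 5; }
};

// Copy the first pointer-sized word out of a pointer to member function.
// Assumption: Itanium-style ABI and a NON-virtual member function, in which
// case that word is the address of the function's code.
template <class PMF>
void* code_address(PMF pmf) {
    static_assert(sizeof(PMF) >= sizeof(void*),
                  "member pointer is smaller than a data pointer");
    assert(pmf != nullptr);  // a null member pointer has no code address
    void* addr = nullptr;
    std::memcpy(&addr, &pmf, sizeof addr);
    return addr;
}
```

The returned value would then be passed wherever void_caster.casted_value is used above, for example code_address(&A::my_member_function).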
What I haven't figured out yet is how to get pointers to the constructor and destructor member functions.
For example
void_caster_t void_caster = {&A::~A};
Gives compiler error: "you can't take the address of the destructor".
For the constructor I wasn't even able to find the correct syntax, since this fails as a syntax error.
void_caster_t void_caster = {&A::A};
Again, the reasoning behind this restriction is that callbacks to constructors and destructors would be nonsensical, but my case is different: I want the pointer (or address) to get debug information, not to make callbacks.

Related

Changing what a function points to

I have been playing around with pointers and function pointers in C/C++. Since you can get the address of a function, can you change where a function call actually ends?
I tried getting the memory address of a function and then writing a second function's address to that location, but it gave me an access violation error.
Regards,

Function pointers are variables, just like ints and doubles. The address of a function is something different: it is the location of the beginning of the function in the .text section of the binary. You can assign the address of a function to a function pointer of the same type; however, the .text section is read-only and therefore you can't modify it. Writing to the address of a function would attempt to overwrite the code at the beginning of the function and is therefore not allowed.
Note:
If you want to change, at runtime, where function calls end up, you can create something called a virtual dispatch table, or vtable. This is a structure containing function pointers and is used in languages such as C++ for polymorphism.
e.g.:
struct VTable {
int (*foo)(void);
int (*bar)(int);
} vTbl;
At runtime you can change the values of vTbl.foo and vTbl.bar to point to different functions and any calls made to vTbl.foo() or .bar will be directed to the new functions.
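A minimal sketch of that retargeting, reusing the VTable layout from above. The functions foo_v1, foo_v2, bar_double and bar_negate and the helpers retarget and call_foo are made-up names for illustration only.

```cpp
#include <cassert>

// Table of function pointers, mirroring the VTable struct above.
struct VTable {
    int (*foo)(void);
    int (*bar)(int);
};

// Two interchangeable implementations for each slot (hypothetical).
int foo_v1() { return 1; }
int foo_v2() { return 2; }
int bar_double(int x) { return 2 * x; }
int bar_negate(int x) { return -x; }

// The table starts out pointing at the first implementations.
VTable vTbl = {foo_v1, bar_double};

// Redirect both slots at run time. Only the pointers stored in the table
// change; the code of the functions themselves is never written to.
void retarget(VTable& table) {
    table.foo = foo_v2;
    table.bar = bar_negate;
}

// Every call goes through the table, so it follows whatever is stored there.
int call_foo(const VTable& table) {
    assert(table.foo != nullptr);
    return table.foo();
}
```

Before retarget(vTbl), call_foo(vTbl) runs foo_v1. Afterwards the very same call site runs foo_v2.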

If the function you're trying to call is inlined, then you're pretty much out of luck. However, if it's not inlined, then there may be a way:
On Unix systems there's a common feature of the dynamic linker called LD_PRELOAD which allows you to override functions in shared libraries with your own versions. See the question What is the LD_PRELOAD trick? for some discussion of this. If the function you're trying to hijack is not loaded from a shared library (i.e. if it's part of the executable or if it's coming from a statically linked library), you're probably out of luck.
On Windows, there are other attack vectors. If the function to be hooked is exported by some DLL, you could use Import Address Table Patching to hijack it without tinkering with the code of the function. If it's not exported by the DLL but you can get the address of it (i.e. by taking the address of a function) you could use something like the free (and highly recommended) N-CodeHook project.

In some environments, it is possible to "patch" the beginning instructions of a function to make the call go somewhere else. This is an unusual technique and is not used for normal programming. It is sometimes used if you have an existing compiled program and need to change how it interacts with the operating system.
Microsoft Detours is an example of a library that has the ability to do this.

You can change what a function pointer points to, but you can't change a normal function nor can you change what the function contains.

You generally can't find where a function ends. There's no such standard functionality in the language, and the compiler can optimize code in such a way that the function's code isn't contiguous and has no single end point. To find where the code ends, you would need to either use non-standard tools or disassemble the code and make sense of it, which isn't something you can easily write a program to do automatically.

how to find all objects (class objects/structs) of a C++ executable

Is there a way, maybe using nm, or gdb, that will let me create a list of all the object types that an executable contains?
To clarify, I have the source code. I need a method for figuring out all the class/struct sizes that are used at runtime. So this is probably a two part problem:
create a list of all classes/structs
use sizeof() on each of the items on the list, in gdb.

"Types" aren't a property of machine code. They're a property of a high-level, abstract language, which is compiled into machine code. Unless the compiler makes specific arrangements for you to recover information about the source program, type information generally doesn't exist at all.

http://www.hex-rays.com/products/ida/index.shtml : DeCompiler for C++
You will usually not get good C++ out of a binary unless you compiled in debugging information. Prepare to spend a lot of manual labor reversing the code.
If you didn't strip the binaries, there is some hope, as IDA Pro can produce C-like code for you to work with.

It's easy to get a list of types from gdb. You just want info types and then ptype if you want to drill down into the type (limiting it to types matching a string just to keep this small):
(gdb) info types Q
All types matching regular expression "Q":
File foo.cpp:
Qq;
(gdb) ptype Qq
type = class Qq {
private:
int qx;
public:
Qq(int);
std::__cxx11::string something(std::__cxx11::list<int, std::allocator<int> >);
int getQ(void);
}
And sizeof tells you how big the structure is (of course, it's the structure itself, so this may or may not be all that useful):
(gdb) p sizeof(Qq)
$1 = 4
(gdb)
You'll probably want to run gdb in a script and parse the output somehow.
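To illustrate the caveat that sizeof covers only the structure itself, here is a small sketch. Qq is reconstructed loosely from the ptype output above (only the int member and one accessor), while Samples and footprint_after_growth are hypothetical additions. The point is that heap storage owned by a member such as std::vector is not counted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loosely modelled on the Qq class in the gdb output: one int, no virtuals.
class Qq {
    int qx;
public:
    explicit Qq(int x) : qx(x) {}
    int getQ() const { return qx; }
};

// Hypothetical type that owns heap memory through a container.
struct Samples {
    std::vector<int> values;
};

// Grow the container to n elements and report sizeof the object afterwards.
// sizeof is a compile-time constant, so the result does not depend on n:
// the elements live on the heap, outside the object.
std::size_t footprint_after_growth(std::size_t n) {
    Samples s;
    s.values.resize(n);
    assert(s.values.size() == n);
    return sizeof(s);
}
```

So a list of sizeof values gives per-object footprints, not the total memory a type ends up using at runtime.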

Casting ClutterActor* to ClutterStage*

I am exploring the possibility of creating a Clutter binding for the D
language ( http://d-programming-language.org/) and have started by
trying some simple tests using dynamic loading of libclutter. I've run
into a problem that might derive from the GObject inheritance
system, and I'd appreciate any help getting it figured out. Here's the
rundown: using clutter_stage_get_default returns a ClutterActor* which
I can use with the clutter_actor_* methods. But I always get errors or
segfaults when I use the clutter_stage_* or clutter_container_*
methods. Here's my test code: http://pastebin.com/nVrQ69dU
At the clutter_container_add_actor call on line 56, I get the following error:
(<unknown>:11976): Clutter-CRITICAL **: clutter_container_add_actor:
assertion 'CLUTTER_IS_CONTAINER (container)' failed
In example code, I have noticed the CLUTTER_STAGE and
CLUTTER_CONTAINER macros for casting (these obviously are not
available to me), but as far as I could tell, they simply performed
some checks, then did a plain C cast. If this is incorrect, and some
GObject type magic needs to be done on the stage pointer before
casting, please let me know. Binding and using the
clutter_stage_set_title or clutter_stage_set_color with cast(ClutterStage*)stage resulted in
segmentation faults, presumably the same issue.
EDIT: Here's a stripped down example with no external dependencies (if you're not on Linux, you'll need to replace the dl calls with your OS's equivalents). This code fails with a segfault, which according to GDB and Valgrind, is in clutter_stage_set_title (in /usr/lib/libclutter-glx-1.0.so.0.600.14)

The problem is that you don't declare the C functions as extern(C). Because of that dmd thinks you're calling a D function and uses the wrong calling convention. One way to do this correctly is like this:
alias extern(C) void function(void*, const char*) setTitleFunc;
auto clutter_stage_set_title = getSym!(setTitleFunc)("clutter_stage_set_title");
I'm not sure how to make it work without the alias though. DMD refuses to parse anything with extern(C) in a template parameter:
auto clutter_stage_set_title = getSym!(extern(C) void function(void*, const char*))("clutter_stage_set_title"); //Doesn't work
BTW: Your cstring function is dangerous: It returns a char* indicating that the string can be modified, but this is not always true: If you pass a string literal to toStringz it might not allocate new memory but return the pointer of the original string instead. String literals are in readonly memory, so this could lead to problems.
You could just adjust your function types to match the C Types (const gchar* in C --> const char* in D) and use toStringz directly.

Structs in D cannot inherit from each other, and casting struct pointers will return null unless there's an intermediate cast to void* (unlike a C cast). (I got refuted here.)
You're better off adding another abstraction layer using handle-wrapping structs and emulating the checks from those macros when casting.
but what happens if you do
clutter_container_add_actor(cast(ClutterContainer*)(cast(void*)stage), textbox);
(casting to void* first and then to ClutterContainer*)

Function pointers and unknown number of arguments in C++

I came across the following weird chunk of code. Imagine you have the following typedef:
typedef int (*MyFunctionPointer)(int param_1, int param_2);
And then, in a function, we are trying to run a function from a DLL in the following way:
LPCWSTR DllFileName; //Path to the dll stored here
LPCSTR _FunctionName; // (mangled) name of the function I want to test
MyFunctionPointer functionPointer;
HINSTANCE hInstLibrary = LoadLibrary( DllFileName );
FARPROC functionAddress = GetProcAddress( hInstLibrary, _FunctionName );
functionPointer = (MyFunctionPointer) functionAddress;
//The values are arbitrary
int a = 5;
int b = 10;
int result = 0;
result = functionPointer( a, b ); //Possible error?
The problem is that there isn't any way of knowing whether the function whose address we got with GetProcAddress takes two integer arguments. The DLL name is provided by the user at runtime, then the names of the exported functions are listed and the user selects the one to test (again, at runtime).
So, by doing the function call in the last line, aren't we opening the door to possible stack corruption? I know that this compiles, but what sort of run-time error is going to occur in the case that we are passing wrong arguments to the function we are pointing to?

There are three errors I can think of if the expected and used number or type of parameters and calling convention differ:
if the calling convention is different, wrong parameter values will be read
if the function actually expects more parameters than given, random values will be used as parameters (I'll let you imagine the consequences if pointers are involved)
in any case, the return address will be complete garbage, so random code with random data will be run as soon as the function returns.
In two words: Undefined behavior

I'm afraid there is no way to know - the programmer is required to know the prototype beforehand when getting the function pointer and using it.
If you don't know the prototype beforehand then I guess you need to implement some sort of protocol with the DLL where you can enumerate any function names and their parameters by calling known functions in the DLL. Of course, the DLL needs to be written to comply with this protocol.
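One possible shape for such a protocol, sketched in portable C++ without any actual DLL loading. Everything here is hypothetical: the ExportInfo record, the describe_exports entry point, and the sample entries Add and Negate are invented for illustration. In a real setup describe_exports would be the one export with a known prototype, fetched with GetProcAddress.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// One self-description record per exported function (hypothetical format).
struct ExportInfo {
    const char* name;       // exported name
    const char* signature;  // human-readable prototype
    int arg_count;          // number of arguments the function expects
};

// DLL side: the single well-known entry point that describes the others.
extern "C" const ExportInfo* describe_exports(std::size_t* count) {
    static const ExportInfo table[] = {
        {"Add", "int(int, int)", 2},
        {"Negate", "int(int)", 1},
    };
    assert(count != nullptr);
    *count = sizeof table / sizeof table[0];
    return table;
}

// Host side: look a function up by name before deciding how to call it.
// Returns nullptr when the DLL does not describe that name.
const ExportInfo* find_export(const char* name) {
    std::size_t count = 0;
    const ExportInfo* table = describe_exports(&count);
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(table[i].name, name) == 0) return &table[i];
    }
    return nullptr;
}
```

The host would refuse to call any function whose record is missing or whose signature does not match the typedef it intends to cast to.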

If it's a __stdcall function and they've left the name mangling intact (both big ifs, but certainly possible nonetheless), the name will have @nn at the end, where nn is a number. That number is the number of bytes the function expects as arguments and will clear off the stack before it returns.
So, if it's a major concern, you can look at the raw name of the function and check that the amount of data you're putting onto the stack matches the amount of data it's going to clear off the stack.
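A minimal sketch of that check, assuming the 32-bit __stdcall decoration for C functions in which the exported name ends in '@' followed by a decimal byte count (for example _Add@8, a made-up name). stdcall_arg_bytes and matches_call are hypothetical helper names.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Extract the byte count after the last '@' of a decorated name.
// Returns -1 when there is no '@', nothing after it, or a non-digit suffix
// (as in C++-mangled names, which this sketch does not handle).
int stdcall_arg_bytes(const std::string& decorated) {
    const std::string::size_type at = decorated.rfind('@');
    if (at == std::string::npos || at + 1 == decorated.size()) return -1;
    int bytes = 0;
    for (std::string::size_type i = at + 1; i < decorated.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(decorated[i]);
        if (!std::isdigit(c)) return -1;
        bytes = bytes * 10 + (c - '0');
    }
    return bytes;
}

// True when the number of bytes the caller is about to push equals the
// number of bytes the callee says it will pop.
bool matches_call(const std::string& decorated, int bytes_pushed) {
    assert(bytes_pushed >= 0);
    return stdcall_arg_bytes(decorated) == bytes_pushed;
}
```

For the two-int typedef in the question, the host would only proceed when matches_call(name, 8) holds on a 32-bit target.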
Note that this is still only a protection against Murphy, not Machiavelli. When you're creating a DLL, you can use an export file to change the names of functions. This is frequently used to strip off the name mangling, but I'm pretty sure it would also let you rename a function from xxx@12 to xxx@16 (or whatever) to mislead the reader about the parameters it expects.
Edit: (primarily in reply to msalters's comment): it's true that you can't apply __stdcall to something like a member function, but you can certainly use it on things like global functions, whether they're written in C or C++.
For things like member functions, the exported name of the function will be mangled. In that case, you can use UnDecorateSymbolName to get its full signature. Using that is somewhat nontrivial, but not outrageously complex either.

I do not think so. It is a good question; the only provision is that you MUST know what the parameters are for the function pointer to work. If you don't, and blindly stuff the parameters and call it, it will crash or jump off into the woods, never to be seen again... It is up to the programmer to convey what the function expects and the types of its parameters. Luckily, you could disassemble it and work out the parameters by looking at how the function accesses the stack through the stack pointer (sp).
Using PE Explorer for instance, you can find out what functions are used and examine the disassembly dump...
Hope this helps,
Best regards,
Tom.

It will either crash in the DLL code (since it got passed corrupt data), or: I think Visual C++ adds code in debug builds to detect this type of problem. It will say something like: "The value of ESP was not saved across a function call", and will point to code near the call. It helps but isn't totally robust - I don't think it'll stop you passing in the wrong but same-sized argument (eg. int instead of a char* parameter on x86). As other answers say, you just have to know, really.

There is no general answer. The Standard mandates that certain exceptions be thrown in certain circumstances, but aside from that describes how a conforming program will be executed, and sometimes says that certain violations must result in a diagnostic. (There may be something more specific here or there, but I certainly don't remember one.)
What the code is doing there isn't according to the Standard, and since there is a cast the compiler is entitled to go ahead and do whatever stupid thing the programmer wants without complaint. This would therefore be an implementation issue.
You could check your implementation documentation, but it's probably not there either. You could experiment, or study how function calls are done on your implementation.
Unfortunately, the answer is very likely to be that it'll screw something up without being immediately obvious.

Generally, if you are calling LoadLibrary and GetProcAddress, you have documentation that tells you the prototype. Even more commonly, as with all of the Windows DLLs, you are provided a header file. While this will cause an error if wrong, it's usually very easy to observe and not the kind of error that will sneak into production.

Most C/C++ compilers have the caller set up the stack before the call, and readjust the stack pointer afterwards. If the called function does not use pointer or reference arguments, there will be no memory corruption, although the results will be worthless. And as rerun says, pointer/reference mistakes almost always show up with a modicum of testing.

Does an arbitrary instruction pointer reside in a specific function?

I have a very difficult problem I'm trying to solve: Let's say I have an arbitrary instruction pointer. I need to find out if that instruction pointer resides in a specific function (let's call it "Foo").
One approach to this would be to try to find the start and ending bounds of the function and see if the IP resides in it. The starting bound is easy to find:
void *start = &Foo;
The problem is, I don't know how to get the ending address of the function (or how "long" the function is, in bytes of assembly).
Does anyone have any ideas how you would get the "length" of a function, or a completely different way of doing this?
Let's assume that there is no SEH or C++ exception handling in the function. Also note that I am on a win32 platform, and have full access to the win32 api.

This won't work. You're presuming functions are contiguous in memory and that one address will map to one function. The optimizer has a lot of leeway here and can move code from functions around the image.
If you have PDB files, you can use something like the dbghelp or DIA APIs to figure this out, for instance SymFromAddr. There may be some ambiguity here, as a single address can map to multiple functions.
I've seen code that tries to do this before with something like:
#pragma optimize("", off)
void Foo()
{
}
void FooEnd()
{
}
#pragma optimize("", on)
And then FooEnd-Foo was used to compute the length of function Foo. This approach is incredibly error prone and still makes a lot of assumptions about exactly how the code is generated.

Look at the *.map file which can optionally be generated by the linker when it links the program, or at the program's debug (*.pdb) file.
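Once the start addresses from the map file are loaded into memory, the lookup itself is a binary search. The sketch below assumes the map file has already been parsed into a vector sorted by start address. The Symbol struct, symbol_for, and all addresses are hypothetical, and the approach inherits the caveat that a function's code may not be contiguous.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// One entry parsed from the map file (hypothetical representation).
struct Symbol {
    std::uintptr_t start;  // address where the function begins
    std::string name;
};

// Return the name of the symbol with the greatest start address <= ip,
// or an empty string when ip lies before the first symbol.
// Precondition: `symbols` is sorted by ascending start address.
std::string symbol_for(const std::vector<Symbol>& symbols, std::uintptr_t ip) {
    assert(std::is_sorted(symbols.begin(), symbols.end(),
                          [](const Symbol& a, const Symbol& b) { return a.start < b.start; }));
    // First symbol that starts strictly after ip...
    auto next = std::upper_bound(
        symbols.begin(), symbols.end(), ip,
        [](std::uintptr_t addr, const Symbol& s) { return addr < s.start; });
    if (next == symbols.begin()) return "";
    // ...so the one just before it is the candidate owner of ip.
    return std::prev(next)->name;
}
```

The instruction pointer is "in Foo" when symbol_for returns "Foo". The last symbol has no upper bound in this sketch, so a real table would also need the end of the code section.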

OK, I haven't done assembly in about 15 years. Back then, I didn't do very much. Also, it was 680x0 asm. BUT...
Don't you just need to put a label before and after the function, take their addresses, subtract them for the function length, and then just compare the IP? I've seen the former done. The latter seems obvious.
If you're doing this in C, look first for debugging support --- ChrisW is spot on with map files, but also see if your C compiler's standard library provides anything for this low-level stuff -- most compilers provide tools for analysing the stack etc., for instance, even though it's not standard. Otherwise, try just using inline assembly, or wrapping the C function with an assembly file and an empty wrapper function with those labels.

The most simple solution is maintaining a state variable:
volatile int FOO_is_running = 0;
int Foo( int par ){
FOO_is_running = 1;
/* do the work */
FOO_is_running = 0;
return 0;
}

Here's how I do it, but it's using gcc/gdb.
$ gdb ImageWithSymbols
gdb> info line * 0xYourEIPhere
Edit: Formatting is giving me fits. Time for another beer.
