Variable name to string in function call, C++ - c++

I'm expanding our internal debugging library, and I've run into an odd wall. I'd like to output a variable name as a string. From elsewhere on this site, I found that a macro can be used to do this within a file:
#define VarToStr(v) #v
...
printf("%s\n", VarToStr(MatName));
This outputs MatName. But now let's try this through a function across files (Matrix is a defined type):
// DebugHelpers.h
#define VarToStr(v) #v
...
void PrintMatrix(const Matrix &InputMat)
{
printf("%s\n", VarToStr(InputMat));
... // output InputMat contents
}
// DataAnalysis.cc
#include DebugHelpers.h
...
void AnalysisSubProgram342()
{
Matrix MatName;
...
PrintMatrix(MatName);
}
This outputs InputMat, instead of MatName. How can a function in another file get the variable name from the calling file?
While more complex solutions (wrapper classes, etc.) would be useful for the larger community, my implementation needs to minimize impact to preexisting code/classes.
Update:
Inspired by zenith's comments, I implemented both of his proposed solutions for comparison's sake and got both working quickly. The macro works well for simple outputs, while the function allows for more complex work (and type checking/overloading). I hadn't known that preprocessor macros could be so complex. I'll remember both for future use. Thanks !

You can't. Neither C nor C++ retain variable names at runtime.
All your macros are doing is substituting text which happens at compile time.

As mentioned by others, C++ doesn't support runtime reflection, so if you want to have a string whose contents will only be known at runtime (which is when the call to PrintMatrix will happen), you need to pass it as an argument.
And because you always know what your variables' names are you don't need the VarToStr macro:
// DebugHelpers.h
void PrintMatrix(const Matrix &InputMat, const char* MatName)
{
printf("%s\n", MatName);
... // output InputMat contents
}
// DataAnalysis.cc
#include DebugHelpers.h
...
void AnalysisSubProgram342()
{
Matrix MatName;
...
PrintMatrix(MatName, "MatName");
}
But there's another choice: make PrintMatrix a macro itself, since it's only a debug thing anyway:
// DebugHelpers.h
#define PRINT_MATRIX(InputMat)\
printf(#InputMat "\n");\
... // output InputMat contents
// DataAnalysis.cc
#include DebugHelpers.h
...
void AnalysisSubProgram342()
{
Matrix MatName;
...
PRINT_MATRIX(MatName);
}
Now after preprocessing, AnalysisSubProgram342 will look like this:
void AnalysisSubProgram342()
{
Matrix MatName;
...
printf("MatName\n");
... // output InputMat contents
}

In general you cannot do that (getting the name of a variable at runtime, from e.g. its address or in C++ its reference).
I am focusing on Linux:
However, on Linux (and GNU glibc based systems), for global variables (and functions), you might use the GNU specific dladdr(3) function.
If all the relevant code was compiled with -g (to get debug info), you might parse the debug information in DWARF format (perhaps also using __builtin_frame_address, etc.). With some pain, you might be able to get the name of some local variables from its address on the call stack. This would be a significant effort (probably months of work). Ian Taylor's libbacktrace (inside GCC) might be useful as a starting point.
You could also start (assuming everything is compiled with -g), with e.g. popen(3), a gdb -p debugging process.
Notice that recent GDB debugger is scriptable in Python or Guile, so practically speaking developing Python or Guile functions for GDB would be quicker.
You could also simply add debug output like here.

Related

How does it work and compile a C++ extension of TCL with a Macro and no main function

I have a working set of TCL script plus C++ extension but I dont know exactly how it works and how was it compiled. I am using gcc and linux Arch.
It works as follows: when we execute the test.tcl script it will pass some values to an object of a class defined into the C++ extension. Using these values the extension using a macro give some result and print some graphics.
In the test.tcl scrip I have:
#!object
use_namespace myClass
proc simulate {} {
uplevel #0 {
set running 1
for {} {$running} { } {
moveBugs
draw .world.canvas
.statusbar configure -text "t:[tstep]"
}
}
}
set toroidal 1
set nx 100
set ny 100
set mv_dist 4
setup $nx $ny $mv_dist $toroidal
addBugs 100
# size of a grid cell in pixels
set scale 5
myClass.scale 5
The object.cc looks like:
#include //some includes here
MyClass myClass;
make_model(myClass); // --> this is a macro!
The Macro "make_model(myClass)" expands as follows:
namespace myClass_ns { DEFINE_MYLIB_LIBRARY; int TCL_obj_myClass
(mylib::TCL_obj_init(myClass),TCL_obj(mylib::null_TCL_obj,
(std::string)"myClass",myClass),1); };
The Class definition is:
class MyClass:
{
public:
int tstep; //timestep - updated each time moveBugs is called
int scale; //no. pixels used to represent bugs
void setup(TCL_args args) {
int nx=args, ny=args, moveDistance=args;
bool toroidal=args;
Space::setup(nx,ny,moveDistance,toroidal);
}
The whole thing creates a cell-grid with some dots (bugs) moving from one cell to another.
My questions are:
How do the class methods and variables get the script values?
How is possible to have c++ code and compile it without a main function?
What is that macro doing there in the extension and how it works??
Thanks
Whenever a command in Tcl is run, it calls a function that implements that command. That function is written in a language like C or C++, and it is passed in the arguments (either as strings or Tcl_Obj* values). A full extension will also include a function to do the library initialisation; the function (which is external, has C linkage, and which has a name like Foo_Init if your library is foo.dll) does basic setting up tasks like registering the implementation functions as commands, and it's explicit because it takes a reference to the interpreter context that is being initialised.
The implementation functions can do pretty much anything they want, but to return a result they use one of the functions Tcl_SetResult, Tcl_SetObjResult, etc. and they have to return an int containing the relevant exception code. The usual useful ones are TCL_OK (for no exception) and TCL_ERROR (for stuff's gone wrong). This is a C API, so C++ exceptions aren't allowed.
It's possible to use C++ instance methods as command implementations, provided there's a binding function in between. In particular, the function has to get the instance pointer by casting a ClientData value (an alias for void* in reality, remember this is mostly a C API) and then invoking the method on that. It's a small amount of code.
Compiling things is just building a DLL that links against the right library (or libraries, as required). While extensions are usually recommended to link against the stub library, it's not necessary when you're just developing and testing on one machine. But if you're linking against the Tcl DLL, you'd better make sure that the code gets loaded into a tclsh that uses that DLL. Stub libraries get rid of that tight binding, providing pretty strong ABI stability, but are little more work to set up; you need to define the right C macro to turn them on and you need to do an extra API call in your initialisation function.
I assume you already know how to compile and link C++ code. I won't tell you how to do it, but there's bound to be other questions here on Stack Overflow if you need assistance.
Using the code? For an extension, it's basically just:
# Dynamically load the DLL and call the init function
load /path/to/your.dll
# Commands are all present, so use them
NewCommand 3
There are some extra steps later on to turn a DLL into a proper Tcl package, abstracting code that uses the DLL away from the fact that it is exactly that DLL and so on, but they're not something to worry about until you've got things working a lot more.

Merging global arrays at link time / filling a global array from multiple compilation units

I want to define an array of things, like event handlers. The contents of
this array is completely known at compile time, but is defined among
multiple compilation units, distributed amongst multiple libraries that
are fairly decoupled, at least until the final (static) link. I'd like
to keep it that way too - so adding or deleting a compilation unit will
also automatically manage the event handler without having to modify a
central list of event handlers.
Here's an example of what I'd like to do (but does not work).
central.h:
typedef void (*callback_t)(void);
callback_t callbacks[];
central.c:
#include "central.h"
void do_callbacks(void) {
int i;
for (i = 0; i < sizeof(callbacks) / sizeof(*callbacks); ++i)
callbacks[i]();
}
foo.c:
#include "central.h"
void callback_foo(void) { }
callback_t callbacks[] = {
&callback_foo
};
bar.c:
#include "central.h"
void callback_bar(void) { }
callback_t callbacks[] = {
&callback_bar
};
What I'd like to happen is to get a single callbacks array, which contains
two elements: &callback_foo and &callback_bar. With the code above, there's
obviously two problems:
The callbacks array is defined multiple times.
sizeof(callbacks) isn't known when compiling central.c.
It seems to me that the first point could be solved by having the linker merge
the two callbacks symbols instead of throwing an error (possibly through some
attribute on the variable), but I'm not sure if there is something like that.
Even if there is, the sizeof problem should somehow also be solved.
I realize that a common solution to this problem is to just have a startup
function or constructor that "registers" the callback. However, I can see only
two ways to implement this:
Use dynamic memory (realloc) for the callbacks array.
Use static memory with a fixed (bigger than usually needed) size.
Since I'm running on a microcontroller platform (Arduino) with limited memory,
neither of these approaches appeal to me. And given that the entire contents of
the array is known at compile time, I'm hoping for a way to let the compiler
also see this.
I've found this and this solution, but those require a custom
linker script, which is not feasible in the compilation environment I'm
running (especially not since this would require explicitely naming each
of these special arrays in the linker script, so just having a single
linker script addition doesn't work here).
This solution is the best I found so far. It uses a linked list
that is filled at runtime, but uses memory allocated statically in each
compile unit seperately (e.g. a next pointer is allocated with each
function pointer). Still, the overhead of these next pointers should not
be required - is there any better approach?
Perhaps having a dynamic solution combined with link-time optimization can
somehow result in a static allocation?
Suggestions on alternative approaches are also welcome, though the required
elements are having a static list of things, and memory efficiency.
Furthermore:
Using C++ is fine, I just used some C code above for illustrating the problem, most Arduino code is C++ anyway.
I'm using gcc / avr-gcc and though I'd prefer a portable solution, something that is gcc only is also ok.
I have template support available, but not STL.
In the Arduino environment that I use, I have not Makefile or other way to easily run some custom code at compiletime, so I'm looking for something that can be entirely implemented in the code.
As commented in some previous answer, the best option is to use a custom linker script (with a KEEP(*(SORT(.whatever.*))) input section).
Anyway, it can be done without modifying the linker scripts (working sample code below), at least at some platforms with gcc (tested on xtensa embedded device and cygwin)
Assumptions:
We want to avoid using RAM as much as possible (embedded)
We do not want the calling module to know anything about the modules with callbacks (it is a lib)
No fixed size for the list (unknown size at library compile time)
I am using GCC. The principle may work on other compilers, but I have not tested it
Callback funtions in this sample receive no arguments, but it is quite simple to modify if needed
How to do it:
We need the linker to somehow allocate at link time an array of pointers to functions
As we do not know the size of the array, we also need the linker to somehow mark the end of the array
This is quite specific, as the right way is using a custom linker script, but it happens to be feasible without doing so if we find a section in the standard linker script that is always "kept" and "sorted".
Normally, this is true for the .ctors.* input sections (the standard requires C++ constructors to be executed in order by function name, and it is implemented like this in standard ld scripts), so we can hack a little and give it a try.
Just take into account that it may not work for all platforms (I have tested it in xtensa embedded architecture and CygWIN, but this is a hacking trick, so...).
Also, as we are putting the pointers in the constructors section, we need to use one byte of RAM (for the whole program) to skip the callback code during C runtime init.
test.c:
A library that registers a module called test, and calls its callbacks at some point
#include "callback.h"
CALLBACK_LIST(test);
void do_something_and_call_the_callbacks(void) {
// ... doing something here ...
CALLBACKS(test);
// ... doing something else ...
}
callme1.c:
Client code registering two callbacks for module test. The generated functions have no name (indeed they do have a name, but it is magically generated to be unique inside the compilation unit)
#include <stdio.h>
#include "callback.h"
CALLBACK(test) {
printf("%s: %s\n", __FILE__, __FUNCTION__);
}
CALLBACK(test) {
printf("%s: %s\n", __FILE__, __FUNCTION__);
}
void callme1(void) {} // stub to be called in the test sample to include the compilation unit. Not needed in real code...
callme2.c:
Client code registering another callback for module test...
#include <stdio.h>
#include "callback.h"
CALLBACK(test) {
printf("%s: %s\n", __FILE__, __FUNCTION__);
}
void callme2(void) {} // stub to be called in the test sample to include the compilation unit. Not needed in real code...
callback.h:
And the magic...
#ifndef __CALLBACK_H__
#define __CALLBACK_H__
#ifdef __cplusplus
extern "C" {
#endif
typedef void (* callback)(void);
int __attribute__((weak)) _callback_ctor_stub = 0;
#ifdef __cplusplus
}
#endif
#define _PASTE(a, b) a ## b
#define PASTE(a, b) _PASTE(a, b)
#define CALLBACK(module) \
static inline void PASTE(_ ## module ## _callback_, __LINE__)(void); \
static void PASTE(_ ## module ## _callback_ctor_, __LINE__)(void); \
static __attribute__((section(".ctors.callback." #module "$2"))) __attribute__((used)) const callback PASTE(__ ## module ## _callback_, __LINE__) = PASTE(_ ## module ## _callback_ctor_, __LINE__); \
static void PASTE(_ ## module ## _callback_ctor_, __LINE__)(void) { \
if(_callback_ctor_stub) PASTE(_ ## module ## _callback_, __LINE__)(); \
} \
inline void PASTE(_ ## module ## _callback_, __LINE__)(void)
#define CALLBACK_LIST(module) \
static __attribute__((section(".ctors.callback." #module "$1"))) const callback _ ## module ## _callbacks_start[0] = {}; \
static __attribute__((section(".ctors.callback." #module "$3"))) const callback _ ## module ## _callbacks_end[0] = {}
#define CALLBACKS(module) do { \
const callback *cb; \
_callback_ctor_stub = 1; \
for(cb = _ ## module ## _callbacks_start ; cb < _ ## module ## _callbacks_end ; cb++) (*cb)(); \
} while(0)
#endif
main.c:
If you want to give it a try... this the entry point for a standalone program (tested and working on gcc-cygwin)
void do_something_and_call_the_callbacks(void);
int main() {
do_something_and_call_the_callbacks();
}
output:
This is the (relevant) output in my embedded device. The function names are generated at callback.h and can have duplicates, as the functions are static
app/callme1.c: _test_callback_8
app/callme1.c: _test_callback_4
app/callme2.c: _test_callback_4
And in CygWIN...
$ gcc -c -o callme1.o callme1.c
$ gcc -c -o callme2.o callme2.c
$ gcc -c -o test.o test.c
$ gcc -c -o main.o main.c
$ gcc -o testme test.o callme1.o callme2.o main.o
$ ./testme
callme1.c: _test_callback_4
callme1.c: _test_callback_8
callme2.c: _test_callback_4
linker map:
This is the relevant part of the map file generated by the linker
*(SORT(.ctors.*))
.ctors.callback.test$1 0x4024f040 0x0 .build/testme.a(test.o)
.ctors.callback.test$2 0x4024f040 0x8 .build/testme.a(callme1.o)
.ctors.callback.test$2 0x4024f048 0x4 .build/testme.a(callme2.o)
.ctors.callback.test$3 0x4024f04c 0x0 .build/testme.a(test.o)
Try to solve the actual problem. What you need are multiple callback functions, that are defined in various modules, that aren't in the slightest related to each other.
What you have done though, is to place a global variable in a header file, which is accessible by every module including that header. This introduces a tight coupling between all such files, even though they are not related to each other. Furthermore, it seems only the callback handler .c function needs to actually call the functions, yet they are exposed to the whole program.
So the actual problem here is the program design and nothing else.
And there is actually no apparent reason why you need to allocate this array at compile time. The only sane reason would be to save RAM, but that's of course is a valid reason for an embedded system. In which case the array should be declared as const and initialized at compile time.
You can keep something similar to your design if storing the array as read-write objects. Or if the array must be a read-only one for the purpose of saving RAM, you must do a drastic re-design.
I'll give both versions, consider which one is most suitable for your case:
RAM-based read/write array
(Advantage: flexible, can be changed in runtime. Disadvantages: RAM consumption. Slight over-head code for initialization. RAM is more exposed to bugs than flash.)
Let the callback.h and callback.c from a module which is only concerned with the handling of the callback functions. This module is responsible for how the callbacks are allocated and when they are executed.
In callback.h define a type for the callback functions. This should be a function pointer type just as you have done. But remove the variable declaration from the .h file.
In callback.c, declare the callback array of functions as
static callback_t callbacks [LARGE_ENOUGH_FOR_WORST_CASE];
There is no way you can avoid "LARGE_ENOUGH_FOR_WORST_CASE". You are on an embedded system with limited RAM, so you have to actually consider what the worst-case scenario is and reserve enough memory for that, no more, no less. On a microcontroller embedded system, there are no such things as "usually needed" nor "lets save some RAM for other processes". Your MCU either has enough memory to cover the worst case scenario, or it does not, in which case no amount of clever allocations will save you.
In callback.c, declare a size variable that keeps track of how much of the callback array that has been initialized. static size_t callback_size;.
Write an init function void callback_init(void) which initializes the callback module. The prototype should be in the .h file and the caller is responsible for executing it once, at program startup.
Inside the init function, set callback_size to 0. The reason I propose to do this in runtime is because you have an embedded system where a .bss segment may not be present or even undesired. You might not even have a copy-down code that initializes all static variables to zero. Such behavior is non-conformant with the C standard but very common in embedded systems. Therefore, never write code which relies on static variables getting automatically initialized to zero.
Write a function void callback_add (callback_t* callback);. Every module that includes your callback module will call this function to add their specific callback functions to the list.
Keep your do_callbacks function as it is (though as a minor remark, consider renaming to callback_traverse, callback_run or similar).
Flash-based read-only array
(Advantages: saves RAM, true read-only memory safe from memory corruption bugs. Disadvantages: less flexible, depends on every module used in the project, possibly slightly slower access because it's in flash.)
In this case, you'll have to turn the whole program upside-down. By the nature of compile-time solutions, it will be a whole lot more "hard-coded".
Instead of having multiple unrelated modules including a callback handler module, you'll have to make the callback handler module include everything else. The individual modules still don't know when a callback will get executed or where it is allocated. They just declare one or several functions as callbacks. The callback module is then responsible for adding every such callback function to its array at compile-time.
// callback.c
#include "timer_module.h"
#include "spi_module.h"
...
static const callback_t CALLBACKS [] =
{
&timer_callback1,
&timer_callback2,
&spi_callback,
...
};
The advantage of this is that you'll automatically get the worst case scenario handed to you by your own program. The size of the array is now known at compile time, it is simply sizeof(CALLBACKS)/sizeof(callback_t).
Of course this isn't nearly as elegant as the generic callback module. You get a tight coupling from the callback module to every other module in the project, but not the other way around. Essentially, the callback.c is a "main()".
You can still use a function pointer typedef in callback.h though, but it is no longer actually needed: the individual modules must ensure that they have their callback functions written in the desired format anyhow, with or without such a type present.
I too am faced with a similar problem:
...need are multiple callback functions, that are defined in various
modules, that aren't in the slightest related to each other.
Mine is C, on Atmel XMega processor. You mentioned that you are using GCC. The following doesn't solve your problem, it is a variant on the above #1 solution. It exploits the __attribute__((weak)) directive.
1) For each optional module, have a unique (per module name) but similar (per purpose) callback function. E.g.
fooModule.c:
void foo_eventCallback(void) {
// do the foo response here
}
barModule.c:
void bar_eventCallback(void) {
// do the bar response here
}
yakModule.c:
void yak_eventCallback(void) {
// do the yak response here
}
2) Have a callback start point that looks something like:
__attribute__((weak)) void foo_eventCallback(void) { }
__attribute__((weak)) void bar_eventCallback(void) { }
__attribute__((weak)) void yak_eventCallback(void) { }
void functionThatExcitesCallback(void) {
foo_eventCallback();
foo_eventCallback();
foo_eventCallback();
}
The __attribute__((weak)) qualifier basically creates a default implementation with an empty body, which the linker will replace with a different variant IF it finds a non-weak variant by the same name. It doesn't make it completely decoupled, unfortunately. But you can at least put this big super-set-of-all-callbacks in one and only one place, and not get into header file hell with it. And then your different compilation units basically replace the subsets of the superset that they want to. I would love it if there was a way to do this with using the same named function in all modules and just have those called based on what's linked, but haven't yet found something that does that.

Custom stacktrace implementation for ARM

I need to have a stacktrace in my program that is written in C++ and runs on ARM-device. I can't find any reliable way to get starcktrace so I decided to write my own that will be as simple as possible, just to get something like stacktrace in gdb.
Here's an idea: write a macro that will push FUNCTION and __PRETTY_FUNCTION__. There are several questions:
Consider I have such a macro:
#define STACKTRACE_ENTER_FUNC \
... lock mutex
... push info into the global list
... set scope-exit handler to delete info at function exit
... unlock mutex
Now I need to place this macro in every function in my code. But there are too many of them. Is there any better way to achieve the goal or should I really change every function to include this macro:
void foo()
{
STACKTRACE_ENTER_FUNC;
...
}
void bar()
{
STACKTRACE_ENTER_FUNC;
...
}
The next question is: I can use __PRETTY_FUNCTION__ (because we use only gcc of fixed version and the stacktrace implementation is only for debug builds on the fixed platform, no cross-platform or compiler issues). I can even parse it a bit to split the string to function name and function arguments names. But how can I print all function arguments without knowing too much about them: like types or number of arguments? Like:
int foo(char x, float y)
{
PRINT_ARGS("arg1", "arg2"); // Gives me the string: "arg1 = 'A', arg2 = 13.37"
...
}
int main()
{
foo('A', 13.37);
...
}
P.S. If you know a better approach to get stack-trace in running program on ARMv6, please let me know (compiler: arm-openwrt-linux-uclibcgnueabi-gcc 4.7.3, libc: uClibc-0.9.33.2)
Thanks in advance.
The easier solution is to drop down to assembly - stack traces don't exist on C++ level anyway.
From an assembly perspective, you use a map of function addresses (which any linker can generate). The current Instruction Pointer identifies the top frame, the return addresses identify the call stack. The tricky part is tail-call optimization, which is a bit philosophical (do you want the logical or the actual call stack?)

Automatically adding Enter/Exit Function Logs to a Project

I have a 3rd party source code that I have to investigate. I want to see in what order the functions are called but I don't want to waste my time typing:
printf("Entered into %s", __FUNCTION__)
and
printf("Exited from %s", __FUNCTION__)
for each function, nor do I want to touch any source file.
Do you have any suggestions? Is there a compiler flag that automagically does this for me?
Clarifications to the comments:
I will cross-compile the source to run it on ARM.
I will compile it with gcc.
I don't want to analyze the static code. I want to trace the runtime. So doxygen will not make my life easier.
I have the source and I can compile it.
I don't want to use Aspect Oriented Programming.
EDIT:
I found that 'frame' command in the gdb prompt prints the current frame (or, function name, you could say) at that point in time. Perhaps, it is possible (using gdb scripts) to call 'frame' command everytime a function is called. What do you think?
Besides the usual debugger and aspect-oriented programming techniques, you can also inject your own instrumentation functions using gcc's -finstrument-functions command line options. You'll have to implement your own __cyg_profile_func_enter() and __cyg_profile_func_exit() functions (declare these as extern "C" in C++).
They provide a means to track what function was called from where. However, the interface is a bit difficult to use since the address of the function being called and its call site are passed instead of a function name, for example. You could log the addresses, and then pull the corresponding names from the symbol table using something like objdump --syms or nm, assuming of course the symbols haven't been stripped from the binaries in question.
It may just be easier to use gdb. YMMV. :)
You said "nor do I want to touch any source file"... fair game if you let a script do it for you?
Run this on all your .cpp files
sed 's/^{/{ENTRY/'
So that it transforms them into this:
void foo()
{ENTRY
// code here
}
Put this in a header that can be #included by every unit:
#define ENTRY EntryRaiiObject obj ## __LINE__ (__FUNCTION__);
struct EntryRaiiObject {
EntryRaiiObject(const char *f) : f_(f) { printf("Entered into %s", f_); }
~EntryRaiiObject() { printf("Exited from %s", f_); }
const char *f_;
};
You may have to get fancier with the sed script. You can also put the ENTRY macro anywhere else you want to probe, like some deeply nested inner scope of a function.
Use /Gh (Enable _penter Hook Function) and /GH (Enable _pexit Hook Function) compiler switches (if you can compile the sources ofcourse)
NOTE: you won't be able to use those macro's. See here ("you will need to get the function address (in EIP register) and compare it against addresses in the map file that can be generated by the linker (assuming no rebasing has occurred). It'll be very slow though.")
If you're using gcc, the magic compiler flag is -g. Compile with debugging symbols, run the program under gdb, and generate stack traces. You could also use ptrace, but it's probably a lot easier to just use gdb.
Agree with William, use gdb to see the run time flow.
There are some static code analyzer which can tell which functions call which and can give you some call flow graph. One tool is "Understand C++" (support C/C++) but thats not free i guess. But you can find similar tools.

Does an arbitrary instruction pointer reside in a specific function?

I have a very difficult problem I'm trying to solve: Let's say I have an arbitrary instruction pointer. I need to find out if that instruction pointer resides in a specific function (let's call it "Foo").
One approach to this would be to try to find the start and ending bounds of the function and see if the IP resides in it. The starting bound is easy to find:
void *start = &Foo;
The problem is, I don't know how to get the ending address of the function (or how "long" the function is, in bytes of assembly).
Does anyone have any ideas how you would get the "length" of a function, or a completely different way of doing this?
Let's assume that there is no SEH or C++ exception handling in the function. Also note that I am on a win32 platform, and have full access to the win32 api.
This won't work. You're presuming functions are contigous in memory and that one address will map to one function. The optimizer has a lot of leeway here and can move code from functions around the image.
If you have PDB files, you can use something like the dbghelp or DIA API's to figure this out. For instance, SymFromAddr. There may be some ambiguity here as a single address can map to multiple functions.
I've seen code that tries to do this before with something like:
#pragma optimize("", off)
void Foo()
{
}
void FooEnd()
{
}
#pragma optimize("", on)
And then FooEnd-Foo was used to compute the length of function Foo. This approach is incredibly error prone and still makes a lot of assumptions about exactly how the code is generated.
Look at the *.map file which can optionally be generated by the linker when it links the program, or at the program's debug (*.pdb) file.
OK, I haven't done assembly in about 15 years. Back then, I didn't do very much. Also, it was 680x0 asm. BUT...
Don't you just need to put a label before and after the function, take their addresses, subtract them for the function length, and then just compare the IP? I've seen the former done. The latter seems obvious.
If you're doing this in C, look first for debugging support --- ChrisW is spot on with map files, but also see if your C compiler's standard library provides anything for this low-level stuff -- most compilers provide tools for analysing the stack etc., for instance, even though it's not standard. Otherwise, try just using inline assembly, or wrapping the C function with an assembly file and a empty wrapper function with those labels.
The most simple solution is maintaining a state variable:
volatile int FOO_is_running = 0;
int Foo( int par ){
FOO_is_running = 1;
/* do the work */
FOO_is_running = 0;
return 0;
}
Here's how I do it, but it's using gcc/gdb.
$ gdb ImageWithSymbols
gdb> info line * 0xYourEIPhere
Edit: Formatting is giving me fits. Time for another beer.