I have some legacy C++ code that hasn't been maintained in years. I'm trying to learn how it functions at the moment. It takes .xml input and should spit out an output text file. Two different .xml input files take vastly different amounts of time to process, and one of them behaves properly, the other doesn't. They begin the same though. I'd like to output log files of the function calls made when I execute the code with the two different inputs and diff these logs against one another to see where they begin to diverge. I can't just interrupt the code right at the first line of main() and step my way through the control flow in gdb. It's taking way too long. Ideally, I'd like to find a way to do something like
gdb --args old_exec inp1.xml -step >log1.txt
gdb --args old_exec inp2.xml -step >log2.txt
diff log1.txt log2.txt
The "-step" flag isn't real, of course, but maybe some way to tell it to log all the steps does exist. Any thoughts? Thanks!
The GCC compiler has a flag, -finstrument-functions, which causes your functions to call specific functions on entry and exit; you can use this to track your code flow. With this flag in use, you will need to provide the following functions:
void __cyg_profile_func_enter (void *this_fn, void *call_site);
void __cyg_profile_func_exit (void *this_fn, void *call_site);
and keep in mind that when you compile those functions, they must not be compiled with the instrumentation flag!
You can use addr2line to convert pointers to file/function/line numbers. It would generally be better to record the raw pointers at run-time, and perform post mortem address conversion.
See http://balau82.wordpress.com/2010/10/06/trace-and-profile-function-calls-with-gcc/ for more details.
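A minimal sketch of the two hooks, assuming GCC or Clang on a platform where -finstrument-functions is supported. It only writes raw addresses to stderr, in line with the advice above to resolve them afterwards with addr2line. The trace_depth counter is a hypothetical helper used for indentation, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper state: current call depth, used only for indentation.
static int trace_depth = 0;

extern "C" {

// no_instrument_function keeps the hooks from instrumenting themselves
// if this file is accidentally built with -finstrument-functions.
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

// Called just after entry to every instrumented function.
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    assert(trace_depth >= 0);
    std::fprintf(stderr, "%*sE %p %p\n", trace_depth * 2, "", this_fn, call_site);
    ++trace_depth;
}

// Called just before exit from every instrumented function.
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
    if (trace_depth > 0) --trace_depth;
    std::fprintf(stderr, "%*sX %p %p\n", trace_depth * 2, "", this_fn, call_site);
}

}  // extern "C"
```

Build the legacy sources with -finstrument-functions, link this file in, and run something like old_exec inp1.xml 2>log1.txt for each input before diffing the logs. If the executable is position-independent, the addresses may differ between runs; linking with -no-pie is one way around that, assuming your toolchain supports it.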
GCC has an auto-instrumentation option for function entry/exit.
-finstrument-functions Generate instrumentation calls for entry and exit to functions. Just after function entry and just before function exit, the following profiling functions will be called with the address of the current function and its call site. (On some platforms, __builtin_return_address does not work beyond the current function, so the call site information may not be available to the profiling functions otherwise.)
void __cyg_profile_func_enter (void *this_fn, void *call_site);
void __cyg_profile_func_exit (void *this_fn, void *call_site);
I would like to have something like this for every "basic block" so that I can log, dynamically, execution of every branch.
How would I do this?
There is a fuzzer called American Fuzzy Lop that solves a very similar problem: it instruments jumps between basic blocks to gather edge coverage, i.e. if basic blocks are vertices, which jumps (edges) were encountered during execution. Its sources may be worth a look. It has three approaches:
afl-gcc is a wrapper for gcc that replaces as with its own wrapper, which rewrites the assembly code according to basic block labels and jump instructions
a plugin for the Clang compiler
a patch for QEMU for instrumenting already compiled code
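To make "edge coverage" concrete, here is a rough sketch of the bookkeeping that AFL's technical notes describe: each basic block gets an ID, and a counter indexed by the current ID XORed with the shifted previous ID is bumped on every block entry. The names EdgeCoverage, kMapSize and visit are hypothetical, and this only models the counting; it does not insert the instrumentation itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed map size; block IDs are expected to be smaller than this.
constexpr std::size_t kMapSize = 1u << 16;

struct EdgeCoverage {
    std::array<std::uint8_t, kMapSize> hits{};  // one wrapping counter per edge slot
    std::uint32_t prev = 0;                     // shifted ID of the previous block

    // Call on entry to a basic block with that block's ID.
    void visit(std::uint32_t block_id) {
        assert(block_id < kMapSize);
        ++hits[block_id ^ prev];   // the (previous, current) pair selects the slot
        prev = block_id >> 1;      // the shift makes A->B differ from B->A
    }
};
```

Two runs that take the same branches fill the same slots, so comparing the hits arrays is a coarse way to see whether control flow diverged.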
Another, and probably the simplest, option may be to use the DynamoRIO dynamic instrumentation system. Unlike QEMU, it is specifically designed for implementing custom instrumentation (either by rewriting machine code by hand or by simply inserting calls, which may even be inlined automatically in some cases, if I read the documentation correctly). If you think dynamic instrumentation is something very hard, look at their examples: they are only about 100-200 lines each (but you still need to read their documentation, at least here and for the functions you use, since it may contain important points: for example, DR constructs dynamic basic blocks, which are distinct from a compiler's classic basic blocks). With dynamic instrumentation you can even instrument the system libraries your program uses. If that is not what you want, you can restrict instrumentation to the main module with something like
static module_data_t *traced_module;
// in dr_client_main: remember the module of the main executable
traced_module = dr_get_main_module();
// in basic block event handler: leave blocks outside that module untouched
app_pc pc = dr_fragment_app_pc(tag);
if (!dr_module_contains_addr(traced_module, pc)) {
    return DR_EMIT_DEFAULT;
}
For example:
dprintf main,"hello\n"
run
Generates the same output as:
break main
commands
silent
printf "hello\n"
continue
end
run
Is there a significant advantage to using dprintf over commands, e.g. it is considerably faster (if so why?), or has some different functionality?
I imagine that dprintf could in theory be faster, as it could compile and inject code with a mechanism analogous to the compile code GDB command.
Or is it mostly a convenience command?
Source
In the 7.9.1 source, breakpoint.c:dprintf_command, which defines dprintf, calls create_breakpoint which is also what break_command calls, so they both seem to use the same underlying mechanism.
The main difference is that dprintf passes the dprintf_breakpoint_ops structure, which has different callbacks and gets initialized at initialize_breakpoint_ops.
dprintf stores a list of command strings, much like the commands command does, depending on the settings. They are:
set at update_dprintf_command_list
which gets called after a type == bp_dprintf check inside init_breakpoint_sal
which gets called by create_breakpoint.
When a breakpoint is reached:
bpstat_stop_status gets called and invokes b->ops->after_condition_true (bs); for the breakpoint reached
after_condition_true for dprintf is dprintf_after_condition_true
bpstat_do_actions_1 runs the commands
There are two main differences.
First, dprintf has some additional output modes that can be used to make it work in other ways. See help set dprintf-channel, or the manual, for more information. I think these modes are the reason that dprintf was added as a separate entity; though at the same time they are fairly specialized and unlikely to be of general interest.
More usefully, though, dprintf doesn't interfere with next. If you write a breakpoint and use commands, and then next over such a breakpoint, gdb will forget about the next and act as if you had typed continue. This is a longstanding oddity in the gdb scripting language. dprintf doesn't suffer from this problem. (If you need similar functionality from an ordinary breakpoint, you can do this from Python.)
What I want is a mix of what can be obtained by a static code analysis like Doxygen and the stackframe you can see when using GDB. I know which problematic function I'm debugging and I want to see the neighbourhood of the function calls that guided the execution to this function call. For instance, running a simple HelloWorld! would output something like:
main:
Greeter::Greeter()
Greeter::printHello()
Greeter::printWorld()
denoting that from the main function, the constructor was called and then the printHello and printWorld functions were called. Notice that in GDB, if I break at printWorld, I won't be able to see in the stack frame that printHello was called.
Any ideas about how to trace function calls without going through the pain of inserting log messages in a myriad of source files?
Thanks!!
The -finstrument-functions option to gcc instructs the compiler to call a user-provided profiling function at every function entry and exit.
You could use this to write a function that just logs every function entry and exit.
From reading the question I understand that you want a list of all relevant functions executed in order as they're executed.
Unfortunately there is no application to generate this list automatically, but there are helper macros to save you a lot of time. Define a single macro called LOGFUNCTION or whatever you want and define it as:
#define LOGFUNCTION printf("In %s (%s:%d)\n", __PRETTY_FUNCTION__, __FILE__, __LINE__);
Now you do have to paste the line LOGFUNCTION wherever you want a trace to be added.
see http://gcc.gnu.org/onlinedocs/gcc/Function-Names.html and http://gcc.gnu.org/onlinedocs/cpp/Standard-Predefined-Macros.html
GDB features a stack trace, it does what you ask for.
What he wants is to obtain that info (for example, a backtrace from gdb) but printed in a 'nicer' format than gdb does.
I don't think you can. I mean, maybe there is some kind of app that traces your application and does something like that, but I have never heard of one.
The best thing you can do is use GDB, maybe with some kind of bash script that uses gdb to obtain the info and prints it out the way you like.
Of course, your application MUST be compiled with debug symbols (-g param to gcc).
I'm not entirely sure what the problem is with gdb's backtrace, but maybe a profiler is closer to what you want? For example, using valgrind:
valgrind --tool=callgrind ./myprogram
kcachegrind callgrind.out.NNNN
Have you tried to use gprof to generate a call graph? You can also convert gprof output to something easier on the eye with gprof2dot for example.
I have a 3rd party source code that I have to investigate. I want to see in what order the functions are called but I don't want to waste my time typing:
printf("Entered into %s", __FUNCTION__)
and
printf("Exited from %s", __FUNCTION__)
for each function, nor do I want to touch any source file.
Do you have any suggestions? Is there a compiler flag that automagically does this for me?
Clarifications to the comments:
I will cross-compile the source to run it on ARM.
I will compile it with gcc.
I don't want to analyze the static code. I want to trace the runtime. So doxygen will not make my life easier.
I have the source and I can compile it.
I don't want to use Aspect Oriented Programming.
EDIT:
I found that the 'frame' command in the gdb prompt prints the current frame (or function name, you could say) at that point in time. Perhaps it is possible (using gdb scripts) to call the 'frame' command every time a function is called. What do you think?
Besides the usual debugger and aspect-oriented programming techniques, you can also inject your own instrumentation functions using gcc's -finstrument-functions command line option. You'll have to implement your own __cyg_profile_func_enter() and __cyg_profile_func_exit() functions (declare these as extern "C" in C++).
They provide a means to track what function was called from where. However, the interface is a bit difficult to use since the address of the function being called and its call site are passed instead of a function name, for example. You could log the addresses, and then pull the corresponding names from the symbol table using something like objdump --syms or nm, assuming of course the symbols haven't been stripped from the binaries in question.
It may just be easier to use gdb. YMMV. :)
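One wrinkle when pulling names from the symbol table of a C++ binary is that they are mangled. A small sketch of turning them back into readable names, assuming a GCC or Clang toolchain that ships <cxxabi.h>; the helper name demangle is hypothetical:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cxxabi.h>
#include <string>

// Returns the readable form of an Itanium-ABI mangled name, or the input
// unchanged if it does not look mangled or cannot be demangled.
std::string demangle(const char *mangled) {
    assert(mangled != nullptr);
    if (std::strncmp(mangled, "_Z", 2) != 0) return mangled;  // plain C symbol
    int status = 0;
    char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);  // __cxa_demangle allocates the result with malloc
    return result;
}
```

For example, demangle("_ZN7Greeter10printHelloEv") yields "Greeter::printHello()". On the command line, nm -C or piping names through c++filt does the same job.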
You said "nor do I want to touch any source file"... fair game if you let a script do it for you?
Run this on all your .cpp files
sed 's/^{/{ENTRY/'
So that it transforms them into this:
void foo()
{ENTRY
// code here
}
Put this in a header that can be #included by every unit:
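#include <cstdio>
// Two-level concatenation so that __LINE__ is expanded before it is pasted.
#define ENTRY_CONCAT2(a, b) a ## b
#define ENTRY_CONCAT(a, b) ENTRY_CONCAT2(a, b)
#define ENTRY EntryRaiiObject ENTRY_CONCAT(obj, __LINE__) (__FUNCTION__);
struct EntryRaiiObject {
EntryRaiiObject(const char *f) : f_(f) { printf("Entered into %s\n", f_); }
~EntryRaiiObject() { printf("Exited from %s\n", f_); }
const char *f_;
};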
You may have to get fancier with the sed script. You can also put the ENTRY macro anywhere else you want to probe, like some deeply nested inner scope of a function.
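As a variation on the same RAII idea, the sketch below indents each entry by call depth, which gives a nested listing rather than flat enter/exit lines. All names here (trace_stream, trace_depth, ScopeTrace, TRACE_SCOPE, demo_trace and the three sample functions) are hypothetical, and the depth counter is a plain global, so this assumes single-threaded code.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

static std::ostream *trace_stream = &std::cerr;  // where trace lines go
static int trace_depth = 0;                      // current nesting level

struct ScopeTrace {
    explicit ScopeTrace(const char *name) {
        *trace_stream << std::string(trace_depth * 2, ' ') << name << '\n';
        ++trace_depth;
    }
    ~ScopeTrace() {
        assert(trace_depth > 0);
        --trace_depth;
    }
    ScopeTrace(const ScopeTrace &) = delete;
    ScopeTrace &operator=(const ScopeTrace &) = delete;
};

// Two-level concatenation so that __LINE__ expands before pasting.
#define TRACE_CONCAT2(a, b) a##b
#define TRACE_CONCAT(a, b) TRACE_CONCAT2(a, b)
#define TRACE_SCOPE() ScopeTrace TRACE_CONCAT(trace_scope_, __LINE__)(__func__)

static void printHello() { TRACE_SCOPE(); }
static void printWorld() { TRACE_SCOPE(); }
static void greet() { TRACE_SCOPE(); printHello(); printWorld(); }

// Runs the sample call chain with the trace captured into a string.
std::string demo_trace() {
    std::ostringstream captured;
    trace_stream = &captured;
    greet();
    trace_stream = &std::cerr;
    return captured.str();
}
```

Here demo_trace() returns the line greet followed by printHello and printWorld, each indented by two spaces. The same sed approach can insert TRACE_SCOPE(); instead of ENTRY.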
Use the /Gh (Enable _penter Hook Function) and /GH (Enable _pexit Hook Function) compiler switches (if you can compile the sources, of course).
NOTE: you won't be able to use those macros. See here ("you will need to get the function address (in EIP register) and compare it against addresses in the map file that can be generated by the linker (assuming no rebasing has occurred). It'll be very slow though.")
If you're using gcc, the magic compiler flag is -g. Compile with debugging symbols, run the program under gdb, and generate stack traces. You could also use ptrace, but it's probably a lot easier to just use gdb.
Agree with William, use gdb to see the run time flow.
There are some static code analyzers which can tell you which functions call which and can give you a call flow graph. One such tool is "Understand C++" (supports C/C++), but that's not free, I guess. But you can find similar tools.
I have a very difficult problem I'm trying to solve: Let's say I have an arbitrary instruction pointer. I need to find out if that instruction pointer resides in a specific function (let's call it "Foo").
One approach to this would be to try to find the start and ending bounds of the function and see if the IP resides in it. The starting bound is easy to find:
void *start = &Foo;
The problem is, I don't know how to get the ending address of the function (or how "long" the function is, in bytes of assembly).
Does anyone have any ideas how you would get the "length" of a function, or a completely different way of doing this?
Let's assume that there is no SEH or C++ exception handling in the function. Also note that I am on a win32 platform, and have full access to the win32 api.
This won't work. You're presuming functions are contiguous in memory and that one address will map to one function. The optimizer has a lot of leeway here and can move code from functions around the image.
If you have PDB files, you can use something like the dbghelp or DIA APIs to figure this out. For instance, SymFromAddr. There may be some ambiguity here as a single address can map to multiple functions.
I've seen code that tries to do this before with something like:
#pragma optimize("", off)
void Foo()
{
}
void FooEnd()
{
}
#pragma optimize("", on)
And then FooEnd-Foo was used to compute the length of function Foo. This approach is incredibly error prone and still makes a lot of assumptions about exactly how the code is generated.
Look at the *.map file which can optionally be generated by the linker when it links the program, or at the program's debug (*.pdb) file.
OK, I haven't done assembly in about 15 years. Back then, I didn't do very much. Also, it was 680x0 asm. BUT...
Don't you just need to put a label before and after the function, take their addresses, subtract them for the function length, and then just compare the IP? I've seen the former done. The latter seems obvious.
If you're doing this in C, look first for debugging support --- ChrisW is spot on with map files, but also see if your C compiler's standard library provides anything for this low-level stuff -- most compilers provide tools for analysing the stack etc., for instance, even though it's not standard. Otherwise, try just using inline assembly, or wrapping the C function with an assembly file and an empty wrapper function with those labels.
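Once you have a start address and a length from labels, a map file, or debug info, the comparison itself is simple. A sketch, assuming the function's code really is one contiguous range (which, as noted above, the optimizer does not guarantee); ip_in_range is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if ip lies in the half-open range [start, start + length).
bool ip_in_range(const void *ip, const void *start, std::size_t length) {
    assert(start != nullptr);
    const auto p = reinterpret_cast<std::uintptr_t>(ip);
    const auto s = reinterpret_cast<std::uintptr_t>(start);
    // Check p >= s first so the unsigned subtraction cannot wrap around.
    return p >= s && p - s < length;
}
```

Comparing as integers avoids relying on relational operators between pointers into unrelated objects.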
The most simple solution is maintaining a state variable:
volatile int FOO_is_running = 0;
int Foo( int par ){
FOO_is_running = 1;
/* do the work */
FOO_is_running = 0;
return 0;
}
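A hedged variation on the state-variable idea: a small RAII guard resets the state on every exit path, including early returns and exceptions, and a counter instead of a flag copes with recursion. The names foo_depth and DepthGuard are hypothetical. Note that this tells you Foo is somewhere on a call stack, which is not the same as the instruction pointer being inside Foo, and the counter is process-wide rather than per thread.

```cpp
#include <atomic>
#include <cassert>

// Number of Foo activations currently in progress (all threads combined).
std::atomic<int> foo_depth{0};

class DepthGuard {
public:
    explicit DepthGuard(std::atomic<int> &depth) : depth_(depth) { ++depth_; }
    ~DepthGuard() {
        assert(depth_.load() > 0);
        --depth_;
    }
    DepthGuard(const DepthGuard &) = delete;
    DepthGuard &operator=(const DepthGuard &) = delete;

private:
    std::atomic<int> &depth_;
};

int Foo(int par) {
    DepthGuard guard(foo_depth);
    if (par < 0) return -1;    // early return: the guard still decrements
    /* do the work */
    return foo_depth.load();   // for demonstration: depth as seen inside Foo
}
```

A caller can then test foo_depth.load() > 0 instead of the plain int flag.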
Here's how I do it, but it's using gcc/gdb.
$ gdb ImageWithSymbols
gdb> info line * 0xYourEIPhere
Edit: Formatting is giving me fits. Time for another beer.