Compare execution paths of same code under different inputs - c++

I'm debugging a very complex C++ function that gives unexpected results for some inputs. I'd like to compare executions of the same code under different inputs so that I can find out which part causes the bug. What I am looking for is a tool that can compare code execution paths. Please let me know if such a tool exists, or otherwise whether there are techniques I can employ to do the same thing.
To describe my problem concretely, here I'm using a contrived example.
Say this is the function in pseudocode,
double payTax(double income)
{
    if (income < 10000)
        return noTax();
    else if ( 10000 < income < 30000)
        return levelOneTax();
    else if (30000 < income < 48000)
        return levelTwoTax();
    else
        return levelThreeAboveTax();
}
Given input 15000, the function computes the correct amount of tax, but somehow input 16000 gives an erroneous tax amount. Supposedly, inputs 15000 and 16000 should cause the function to go through exactly the same execution path; on the other hand, if they take different paths, then something must have gone wrong within the function. Therefore, a tool that compares execution paths would reveal enough information to help me quickly identify the bug. I'm looking for such a tool, preferably one compatible with Visual Studio 2010. It would be better still if the tool also kept the values of variables.
P.S. Stepping through in a debugger is the last thing I want to do, because the code base I am working with is much, much bigger and more complex than the trivial payTax example.
Please help. Thanks.

The keywords you are looking for are "code coverage", "coverage analysis", or "code coverage analysis".
Which tool you use will naturally depend on the rest of your environment.

I know that this question is almost ten years old now, but I'll still give my answer here, since it may be useful for a random googler.
The approach does not require any additional 3rd party tools, except maybe a usual text diff application to compare text files with execution paths.
You'll need to output the execution path yourself from your application, but instead of adding logging code to every function, you'll use special support from the compiler to make it call your handlers upon each function entry and exit. The Microsoft compiler calls these hook functions and requires separate flags for hooking function entry (/Gh) and exit (/GH). The GNU C++ compiler calls them instrumentation functions and requires a single -finstrument-functions flag for both.
Given those flags, the compilers will add special prologue and epilogue code to each function being compiled, which will call special handlers, one for entry and one for exit. You'll need to provide the implementation of those handlers yourself. On GNU C++ the handlers are already passed the pointer to the function being entered or exited. If you're on MSVC++, you'll need to use the return value of the _ReturnAddress intrinsic, with some adjustment, to get the address of the function.
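For GCC, a minimal sketch of such handlers might look like this (output goes to stderr just to keep the example self-contained; a real implementation would likely buffer to a file):

#include <cstdio>

// The attribute keeps the handlers from instrumenting themselves.
extern "C" void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
extern "C" void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

extern "C" void __cyg_profile_func_enter(void *this_fn, void *)
{
    std::fprintf(stderr, "enter %p\n", this_fn);
}

extern "C" void __cyg_profile_func_exit(void *this_fn, void *)
{
    std::fprintf(stderr, "exit  %p\n", this_fn);
}

Compile everything else with -finstrument-functions and these two callbacks fire on every function entry and exit.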
Then you can output the address as-is and use something like the addr2line tool to translate the address into a function name. Or you can go one step further and do the translation yourself.
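For instance, a typical addr2line invocation (the executable name and address are placeholders):

addr2line -f -C -e ./myapp 0x401234

Here -f prints the function name, -C demangles C++ symbols, and -e names the executable.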
On MSVC++ you can use the DbgHelp API, specifically the SymInitialize and SymFromAddr helpers, to translate the address into the function name. This will require your application to be compiled with debug information.
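A minimal sketch of that lookup, following the usual DbgHelp pattern (error handling omitted; addr is whatever your entry/exit hook captured):

#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

const char *addr_to_name(void *addr)
{
    // SYMBOL_INFO carries a variable-length name, hence the raw buffer.
    static char buffer[sizeof(SYMBOL_INFO) + MAX_SYM_NAME];
    SYMBOL_INFO *sym = reinterpret_cast<SYMBOL_INFO *>(buffer);
    sym->SizeOfStruct = sizeof(SYMBOL_INFO);
    sym->MaxNameLen = MAX_SYM_NAME;

    static BOOL ok = SymInitialize(GetCurrentProcess(), NULL, TRUE);
    if (ok && SymFromAddr(GetCurrentProcess(), (DWORD64)addr, NULL, sym))
        return sym->Name;
    return "<unknown>";
}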
On GNU C++ you may want to use backtrace_symbols to translate the address into a function name, and then perhaps __cxa_demangle to demangle the returned name. This will probably require your executable to be built with -rdynamic.
Having all this in place, you can output the name of each called function with the appropriate indentation and thus obtain the call path. Or even do fancier stuff like this, this or this.
You can use this MSVC++ code or this GCC code as a starting point, or just use your favorite search engine for other examples, of which there are plenty.

The tool you want is printf or std::cerr!
And you have a substantial error in your code: a statement like if ( 10000 < income < 30000) will not work as expected! The first comparison yields 0 or 1, and that 0 or 1 is then compared against 30000, so the condition is almost always true. You want to write it as if( 10000 < income && income < 30000 ).
And to keep testing simple, please use curly brackets as in:
if( 10000 < income && income < 30000 ) {
return levelOneTax();
} else if( ...
Because then it will be much easier to add debug output, as in:
if( 10000 < income && income < 30000 ) {
std::cerr << "using levelOneTax for income=" << income << std::endl;
return levelOneTax();
} else if( ...
EDIT
BTW: "a tool that compares execution paths would reveal enough information [...]", BUT in the sense you are expecting, such a tool would reveal TOO MUCH information to handle. The best thing you can do is debugging and verifying that your code is doing what you expect it to do. A "code coverage" tool would probably be too big for your case (and also such tools are not cheap).

Related

How to speed up program execution

This is a very simple question, but unfortunately I am stuck and do not know what to do. My program simply keeps accepting 3 numbers and outputs the largest of the 3. The program keeps running until the user inputs a character.
As the title says, my question is how I can make this execute faster (there will be a large amount of input data). Any sort of help, which may include using a different algorithm, using different functions, or changing the entire code, is accepted.
I'm not very experienced with the C++ standard library, and thus do not know about all the different functions available, so please do explain your reasoning, and if you're too busy, at least try to provide a link.
Here is my code
#include <stdio.h>
int main()
{
    int a, b, c;
    while (scanf("%d %d %d", &a, &b, &c))
    {
        if (a >= b && a >= c)
            printf("%d\n", a);
        else if (b >= a && b >= c)
            printf("%d\n", b);
        else
            printf("%d\n", c);
    }
    return 0;
}
The way it works is very simple: the while loop will continue to execute until the user inputs a character. As I've explained earlier, the program accepts 3 numbers and outputs the largest. There is no other part to this code; this is all. If you need anything more from my side, please ask (I'll help as much as I can).
I am compiling on an internet platform using CPP 4.9.2 ( That's what is said over there )
Any sort of help will be highly appreciated. Thanks in advance
EDIT
The input is made by a computer, so there is no delay in input.
Also, I will accept answers in c and c++.
UPDATE
I would also like to ask if there are any general library functions, algorithms, or other advice (certain things we must do and things we must not do) for speeding up execution, not just for this code, but in general. Any help would be appreciated (and sorry for asking such an open-ended question without giving any reference material).
Your "algorithm" is very simple, and I would write it with the use of the max() function, just because it is better style.
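For instance, a minimal sketch using C++11's initializer-list overload of std::max:

#include <algorithm>
#include <cstdio>

int main()
{
    int a = 5, b = 3, c = 9;                   // sample values
    std::printf("%d\n", std::max({a, b, c}));  // prints 9
}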
But anyway...
What will take the most time is the scanf; this is your bottleneck. You should write your own read function which reads a huge block with fread and processes it. You may consider doing this asynchronously, but I wouldn't recommend that as a first step (some async implementations are indeed slower than the synchronous ones).
So basically you do the following:
- Read a huge block from the file into memory (this is disk IO, so this is the bottleneck).
- Parse that block and find your three integers. Watch out for the block borders! The first two integers may lie within one block and the third in the next, or a block border may split an integer in the middle, so let your parser catch those cases (the sketch below handles this transparently).
- Do your comparisons. These run like hell compared to the disk IO, so there is no need to improve them.
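A minimal sketch of such a reader, assuming non-negative integers separated by whitespace (buffer size and error handling simplified):

#include <stdio.h>

static char buf[1 << 16];
static size_t buf_len = 0, buf_pos = 0;

// Refills the buffer as needed, so block borders are handled transparently.
static int next_char(void)
{
    if (buf_pos == buf_len)
    {
        buf_len = fread(buf, 1, sizeof buf, stdin);
        buf_pos = 0;
        if (buf_len == 0)
            return -1; // EOF
    }
    return buf[buf_pos++];
}

// Returns 1 on success, 0 on EOF or non-digit input.
static int read_int(int *out)
{
    int c = next_char();
    while (c == ' ' || c == '\n' || c == '\r' || c == '\t')
        c = next_char();
    if (c < '0' || c > '9')
        return 0;
    int v = 0;
    while (c >= '0' && c <= '9')
    {
        v = v * 10 + (c - '0');
        c = next_char();
    }
    *out = v;
    return 1;
}

The main loop then becomes while (read_int(&a) && read_int(&b) && read_int(&c)) { ... }.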
Unless you have a guarantee that the three input numbers are all distinct, I'd worry first about making the program produce correct output. As noted, there's almost nothing to speed up other than input and output buffering, and maybe speeding up the decimal conversions with custom parsing and formatting code instead of the general-purpose scanf and printf.
Watch the tie handling: with strict > comparisons, input values a=5, b=5, c=1 would fail both if conditions and the code would report 1 as the largest of those three values. The >= comparisons shown handle ties correctly, so keep them.
You can minimize the number of comparisons by remembering previous results. You can do this with:
int d;
if (a >= b)
{
    if (a >= c)
        d = a;
    else
        d = c;
}
else
{
    if (b >= c)
        d = b;
    else
        d = c;
}
[then output d as your maximum]
That does exactly two comparisons to find a value for d as max(a,b,c).
Your code uses at least two and maybe up to four.

How to print result of C++ evaluation with GDB?

I've been looking around but was unable to figure out how one could print out in GDB the result of an evaluation. For example, in the code below:
if (strcmp(current_node->word, min_node->word) > 0)
    min_node = current_node;
(above I was trying out a possible method for checking alphabetical order for strings, and wasn't absolutely certain it works correctly.)
Now I could watch min_node and see if the value changes but in more involved code this is sometimes more complicated. I am wondering if there is a simple way to watch the evaluation of a test on the line where GDB / program flow currently is.
There is no expression-level single stepping in gdb, if that's what you are asking for.
Your options are (from most commonly to most infrequently used):
- Evaluate the expression in gdb, doing print strcmp(current_node->word, min_node->word). Surprisingly, this works: gdb can evaluate function calls by injecting code into the running program and having it execute the code. Of course, this is fairly dangerous if the function has side effects or may crash; in this case it is so harmless that people typically won't think about potential problems. An illustrative session follows below.
- Perform instruction-level (assembly) single-stepping (ni/si). When the call instruction is done, you find the result in a register, according to the processor conventions (%eax on x86).
- Edit the code to assign intermediate values to variables, split into separate lines/statements; then use regular single-stepping and inspect the variables.
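For the first option, a session might look like this (the printed value is illustrative):

(gdb) print strcmp(current_node->word, min_node->word)
$1 = 1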
You may simply try typing:
call my_function()
As far as I remember, though, it won't work when the function is inlined.

Is there a way to figure out the top callers of a C function?

Say I have a function that is called a LOT from many different places. I would like to find out who calls this function the most: for example, the top 5 callers, or whoever calls it more than N times.
I am using AS3 Linux, gcc 3.4.
For now I just put a breakpoint and stop there after every 300 hits, thus brute-forcing it...
Does anyone know of tools that can help me?
Thanks
Compile with the -pg option, run the program for a while, and then use gprof. Running a program compiled with -pg will generate a gmon.out file with the execution profile; gprof can read this file and present it in readable form.
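A typical session (the program name is a placeholder) might look like:

gcc -pg -o myprog myprog.c
./myprog
gprof myprog gmon.out > profile.txt

The call-graph section of gprof's report lists, for each function, its callers and how many calls came from each, which is exactly what you need for a top-callers ranking.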
I wrote a call-logging example just for fun. A macro replaces the function call with an instrumented one.
#include <stdio.h>

int funcA( int a, int b ){ return a+b; }

// instrumentation
void call_log(const char *file, const char *function, const int line, const char *args)
{
    printf("file:%s line: %i function: %s args: %s\n", file, line, function, args);
}

#define funcA(...) \
    (call_log(__FILE__, __FUNCTION__, __LINE__, "" #__VA_ARGS__), funcA(__VA_ARGS__))

// testing
void funcB(void)
{
    funcA(7,8);
}

int main(void)
{
    int x = funcA(1,2) +
            funcA(3,4);
    printf( "x: %i (==10)\n", x );
    funcA(5,6);
    funcB();
}
Output:
file:main.c line: 22 function: main args: 1,2
file:main.c line: 24 function: main args: 3,4
x: 10 (==10)
file:main.c line: 28 function: main args: 5,6
file:main.c line: 17 function: funcB args: 7,8
Profiling helps.
Since you mentioned oprofile in another comment, I'll say that oprofile supports generating callgraphs on profiled programs.
See http://oprofile.sourceforge.net/doc/opreport.html#opreport-callgraph for more details.
It's worth noting this is definitely not as clear as the caller profile you get from gprof or another profiler, since the numbers it reports are the number of times oprofile collected a sample in which X was the caller of a given function, not the number of times X actually called the function. But this should be sufficient to figure out the top callers of a given function.
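Assuming call-graph collection was enabled during profiling, the report can be produced with something like:

opreport --callgraph ./myprog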
A somewhat cumbersome method, but one not requiring additional tools:
#define COUNTED_CALL( fn, ... ) \
    ( fprintf( call_log_fp, "%s->%s\n", __FUNCTION__, #fn ), \
      (fn)(__VA_ARGS__) )
(A comma expression is used rather than a do { ... } while(0) block so that the macro yields the function's return value and can appear on the right-hand side of an assignment.)
Then all calls written like:
int input_available = COUNTED_CALL( scanf, "%s", instring ) ;
will be logged to the file associated with call_log_fp (a global FILE* which you must have initialised). The log for the above would look like:
main->scanf
You can then process that log file to extract the data you need. You could even write your own code to do the instrumentation, which would perhaps make it less cumbersome.
Might be a bit ambiguous for C++ class member functions though. I am not sure if there is a __CLASS__ macro.
In addition to the aforementioned gprof profiler, you may also try the gcov code-coverage tool. Information on compiling for and using both should be included in the gcc manual.
Once again, stack sampling to the rescue! Just take a bunch of "stackshots", as many as you like. Discard any samples where your function (call it F) is not somewhere on the stack. (If you're discarding most of them, then F is not a performance problem.)
On each remaining sample, locate the call to F, and see what function (call it G) that call is in. If F is recursive (it appears more than once on the sample) only use the topmost call.
Rank your Gs by how many stacks each one appears in.
If you don't want to do this by hand, you could make a simple tool or script. You don't need a zillion samples. 20 or so will give you reasonably good information.
By the way, if what you're really trying to do is find performance problems, you don't actually need to do all that discarding and ranking. In fact - don't discard the exact locations of the call instruction inside each G. Those can actually tell you a good bit more than just the fact that they were somewhere inside G.
P.S. This is all based on the assumption that when you say "calls it the most" you mean "spends the most wall clock time in calling it", not "calls it the greatest number of times". If you are interested in performance, fraction of wall clock time is more useful than invocation count.

if - else vs if and returns revisited (not asking about multiple returns ok or not)

With regards this example from Code Complete:
Comparison Compare(int value1, int value2)
{
    if ( value1 < value2 )
        return Comparison_LessThan;
    else if ( value1 > value2 )
        return Comparison_GreaterThan;
    else
        return Comparison_Equal;
}
You could also write this as:
Comparison Compare(int value1, int value2)
{
    if ( value1 < value2 )
        return Comparison_LessThan;
    if ( value1 > value2 )
        return Comparison_GreaterThan;
    return Comparison_Equal;
}
Which is more optimal though? (readability, etc aside)
Readability aside, the compiler should be smart enough to generate identical code for both cases.
"Readability, etc aside" I'd expect the compiler to produce identical code from each of them.
You can test that though, if you like: your C++ compiler probably has an option to generate a listing file, so you can see the assembly/opcodes generated from each version ... or, you can see the assembly/opcodes by using your debugger to inspect the code (after you start the executable).
This will generate identical code in just about any compiler (GCC, Visual Studio, etc.). Compilers work on slightly different logic than we do: ifs become if-nots, meaning that in both cases the code would just fall through to that last return statement.
Edit:
More generally, the else statement is just there for the human, it actually doesn't generate anything on most compilers... this is true in your case and just about anything else using the if... else... construct.
The compiler generates identical code. One of the most basic things the compiler does is to build a control graph. Basically, "standing at node X, which nodes can I get to", and then it inserts jump statements for these reachable nodes.
And in your case, the control graph is exactly the same in both cases.
(Of course this is a gross simplification, and the compiler does a lot more before actually generating any actual code)
Readability is the correct answer. Any compiler will produce equivalent code to within a cycle or two, and an optimizer will have no problems parsing and sorting the control flow, either.
That's why readability is more important. Your cost of this code isn't just in writing it and compiling it today. It may have to be maintained in the future by you or someone else. You want your code to be readable so that the next maintainer will not have to waste a lot of time trying to understand it.
<underwear fabric="asbestos"> Not all coding style decisions should be made solely on "efficiency" or cycle count. </underwear> Should you write inefficient code? Of course not. But let the optimizer handle the tiny questions when it can. You're more valuable than that.
It will really depend on your compiler inferring what you are trying to do and placing the "jumps" or not. It is trivial.
When each branch ends in a return statement, there is no difference.
Using else in these cases may just stop you from checking the second condition when you enter the first if. But the performance difference should be really small, unless you have a condition that takes long to evaluate.
The two code samples should compile identically on modern compilers, whether optimizations are turned on or off. The only case where you might encounter something different is with an old compiler that doesn't notice it is emitting inefficient (most likely unused) code.
If you're worried about optimizations, you might consider taking a look at the algorithm being used.
Just execute gcc -S to look at the generated assembler code; it should be identical. Anyway, you could answer this yourself by executing each version 1000000 times and measuring the execution time.
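For example, one quick way to check (file names are placeholders):

g++ -O2 -S -o version1.s version1.cpp
g++ -O2 -S -o version2.s version2.cpp
diff version1.s version2.s

If diff prints nothing, the compiler emitted identical assembly for both versions.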

How to correctly benchmark a [templated] C++ program

<background>
I'm at a point where I really need to optimize C++ code. I'm writing a library for molecular simulations and I need to add a new feature. I already tried to add this feature in the past, but I used virtual functions called in nested loops. I had a bad feeling about that, and the first implementation proved that it was a bad idea. However, it was OK for testing the concept.
</background>
Now I need this feature to be as fast as possible (well, without assembly code or GPU calculation; this still has to be C++, and more readable rather than less).
Now I know a little bit more about templates and class policies (from Alexandrescu's excellent book) and I think that a compile-time code generation may be the solution.
However I need to test the design before doing the huge work of implementing it into the library. The question is about the best way to test the efficiency of this new feature.
Obviously I need to turn optimizations on, because without them g++ (and probably other compilers as well) would keep some unnecessary operations in the object code. I also need to make heavy use of the new feature in the benchmark, because a delta of 1e-3 seconds can make the difference between a good and a bad design (this feature will be called millions of times in the real program).
The problem is that g++ is sometimes "too smart" while optimizing and can remove a whole loop if it considers that the result of a calculation is never used. I've already seen that happen once when looking at the output assembly code.
If I add some printing to stdout, the compiler will then be forced to do the calculation in the loop but I will probably mostly benchmark the iostream implementation.
So how can I do a correct benchmark of a little feature extracted from a library?
Related question: is it a correct approach to do this kind of in vitro test on a small unit, or do I need the whole context?
Thanks for any advice!
There seem to be several strategies, from compiler-specific options allowing fine-tuning to more general solutions that should work with every compiler, like volatile or extern.
I think I will try all of these.
Thanks a lot for all your answers!
If you want to force any compiler not to discard a result, have it write the result to a volatile object. That operation cannot be optimized out, by definition.
template<typename T> void sink(T const& t) {
    volatile T sinkhole = t;
}
No iostream overhead, just a copy that has to remain in the generated code.
Now, if you're collecting results from a lot of operations, it's best not to discard them one by one; these copies can still add some overhead. Instead, somehow collect all results in a single non-volatile object (so that all individual results are needed) and then assign that result object to a volatile. E.g. if your individual operations all produce strings, you can force evaluation by adding all char values together modulo 1<<32. This adds hardly any overhead; the strings will likely be in cache. The result of the addition is subsequently assigned to a volatile, so each char in each string must in fact be calculated, and no shortcuts are allowed.
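A sketch of that pattern, with compute() as a hypothetical stand-in for the operation being benchmarked:

#include <cstddef>

// placeholder work; substitute the real operation under test
unsigned compute(std::size_t i) { return static_cast<unsigned>(i * 2654435761u); }

void benchmark_body(std::size_t n)
{
    unsigned sum = 0;                 // every iteration's result feeds the sum...
    for (std::size_t i = 0; i < n; ++i)
        sum += compute(i);
    volatile unsigned publish = sum;  // ...and one volatile write keeps it all alive
    (void)publish;                    // silence the unused-variable warning
}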
Unless you have a really aggressive compiler (it can happen), I'd suggest calculating a checksum (simply add all the results together) and outputting the checksum.
Other than that, you might want to look at the generated assembly code before running any benchmarks so you can visually verify that any loops are actually being run.
Compilers are only allowed to eliminate code branches that can never be taken. As long as the compiler cannot rule out that a branch will be executed, it will not eliminate it, and as long as there is some data dependency somewhere, the code will be there and will be run. Compilers are not too smart about estimating which parts of a program will never run, and don't try to be, because that is an intractable problem in general. They have some simple checks, such as for if (0), but that's about it.
My humble opinion is that you were possibly hit by some other problem earlier on, such as the way C/C++ evaluates boolean expressions.
But anyway, since this is about a test of speed, you can check for yourself that things get called: run it once without, then another time with a test of the return values, or with a static variable being incremented. At the end of the test, print out the number generated; the results will be equal.
To answer your question about in-vitro testing: yes, do that; if your app is so time-critical, do that. On the other hand, your description hints at a different problem: if your deltas are in a timeframe of 1e-3 seconds, that sounds like a problem of computational complexity, since the method in question must be called very, very often (for few runs, 1e-3 seconds is negligible).
The problem domain you are modeling sounds VERY complex and the datasets are probably huge. Such things are always an interesting effort. Make sure that you absolutely have the right data structures and algorithms first, though, and micro-optimize all you want after that. So, I'd say look at the whole context first. ;-)
Out of curiosity, what is the problem you are calculating?
You have a lot of control over the optimizations in your compilation. -O1, -O2, and so on are just aliases for a bunch of switches.
From the man pages
-O2 turns on all optimization flags specified by -O. It also turns
on the following optimization flags: -fthread-jumps -falign-functions
-falign-jumps -falign-loops -falign-labels -fcaller-saves
-fcrossjumping -fcse-follow-jumps -fcse-skip-blocks
-fdelete-null-pointer-checks -fexpensive-optimizations -fgcse
-fgcse-lm -foptimize-sibling-calls -fpeephole2 -fregmove
-freorder-blocks -freorder-functions -frerun-cse-after-loop
-fsched-interblock -fsched-spec -fschedule-insns -fschedule-insns2
-fstrict-aliasing -fstrict-overflow -ftree-pre -ftree-vrp
You can tweak and use this command to help you narrow down which options to investigate.
...
Alternatively you can discover which binary optimizations are
enabled by -O3 by using:
gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
gcc -c -Q -O2 --help=optimizers > /tmp/O2-opts
diff /tmp/O2-opts /tmp/O3-opts | grep enabled
Once you find the culprit optimization you shouldn't need the couts.
If this is possible for you, you might try splitting your code into:
- the library you want to test, compiled with all optimizations turned on
- a test program, dynamically linking the library, with optimizations turned off
Otherwise, you might specify a different optimization level (it looks like you're using gcc...) for the test function with the optimize attribute (see http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#Function-Attributes).
You could create a dummy function in a separate cpp file that does nothing, but takes as argument whatever is the type of your calculation result. Then you can call that function with the results of your calculation, forcing gcc to generate the intermediate code, and the only penalty is the cost of invoking a function (which shouldn't skew your results unless you call it a lot!).
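A sketch of that idea (all names are placeholders; the key point is that sink() lives in a separate translation unit, so without link-time optimization the compiler must assume it inspects its argument):

// sink.cpp
void sink(double) {}   // intentionally empty; invisible to the optimizer elsewhere

// benchmark.cpp
void sink(double);               // declaration only; the definition is opaque here
double expensive_calculation();  // hypothetical function under test

void run_benchmark()
{
    sink(expensive_calculation());  // the call forces the calculation to happen
}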
#include <iostream>

// Mark coords as extern.
// The compiler is now NOT allowed to optimise coords away,
// thus it cannot remove the loop where you initialise it,
// because the data could be used by another compilation unit.
extern double coords[500][3];
double coords[500][3];

int main()
{
    // perform a simple initialization of all coordinates:
    for (int i = 0; i < 500; ++i)
    {
        coords[i][0] = 3.23;
        coords[i][1] = 1.345;
        coords[i][2] = 123.998;
    }
    std::cout << "hello world !" << std::endl;
    return 0;
}
Edit: the easiest thing you can do is simply to use the data in some spurious way after the function has run and outside your benchmark. Like,
StartBenchmarking(); // i.e., read a performance counter
for (int i = 0; i < 500; ++i)
{
    coords[i][0] = 3.23;
    coords[i][1] = 1.345;
    coords[i][2] = 123.998;
}
StopBenchmarking(); // what comes after this won't go into the timer

// this is just to force the compiler to use coords
double foo = 0;
for (int j = 0; j < 500; ++j)
{
    foo += coords[j][0] + coords[j][1] + coords[j][2];
}
cout << foo;
What sometimes works for me in these cases is to hide the in vitro test inside a function and pass the benchmark data sets through volatile pointers. This tells the compiler that it must not collapse subsequent writes to those pointers (because they might be, e.g., memory-mapped I/O). So,
void test1( volatile double *coords )
{
    // perform a simple initialization of all coordinates:
    for (int i = 0; i < 1500; i += 3)
    {
        coords[i+0] = 3.23;
        coords[i+1] = 1.345;
        coords[i+2] = 123.998;
    }
}
For some reason I haven't figured out yet, it doesn't always work in MSVC, but it often does -- look at the assembly output to be sure. Also remember that volatile will foil some compiler optimizations (it forbids the compiler from keeping the pointer's contents in a register and forces writes to occur in program order), so this is only trustworthy if you're using it for the final write-out of data.
In general in vitro testing like this is very useful so long as you remember that it is not the whole story. I usually test my new math routines in isolation like this so that I can quickly iterate on just the cache and pipeline characteristics of my algorithm on consistent data.
The difference between test-tube profiling like this and running it in "the real world" means you will get wildly varying input data sets (sometimes best case, sometimes worst case, sometimes pathological), the cache will be in some unknown state on entering the function, and you may have other threads banging on the bus; so you should run some benchmarks on this function in vivo as well when you are finished.
I don't know if GCC has a similar feature, but with VC++ you can use #pragma optimize to selectively turn optimizations on and off. If GCC has similar capabilities, you could build with full optimization and just turn it off where necessary to make sure your code gets called.
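For example, a sketch of the MSVC syntax (the empty string selects all optimization options, here disabled for just this function):

#pragma optimize("", off)
void function_under_test()
{
    // ... code that must not be optimized away ...
}
#pragma optimize("", on)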
Just a small example of an unwanted optimization:
#include <vector>
#include <iostream>
using namespace std;

int main()
{
    double coords[500][3];
    // perform a simple initialization of all coordinates:
    for (int i = 0; i < 500; ++i)
    {
        coords[i][0] = 3.23;
        coords[i][1] = 1.345;
        coords[i][2] = 123.998;
    }
    cout << "hello world !" << endl;
    return 0;
}
If you comment out the code from "double coords[500][3]" to the end of the for loop, exactly the same assembly code is generated (just tried with g++ 4.3.2). I know this example is far too simple, and I wasn't able to show this behavior with a std::vector of a simple "Coordinates" structure.
However, I think this example still shows that some optimizations can introduce errors in a benchmark, and I wanted to avoid surprises of this kind when introducing new code into a library. It's easy to imagine that the new context might prevent some optimizations and lead to a very inefficient library.
The same should also apply to virtual functions (but I don't prove it here). Used in a context where a static call would do the job, I'm pretty confident that decent compilers will eliminate the extra indirection of the virtual call. I could try such a call in a loop and conclude that calling a virtual function is not such a big deal...
...and then call it hundreds of thousands of times in a context where the compiler cannot guess the exact type of the pointer, and see a 20% increase in running time...
At startup, read from a file. In your code, say if (input == "x") cout << result_of_benchmark;
The compiler will not be able to eliminate the calculation, and if you ensure the input is never "x", you won't benchmark the iostream.