Pros/Cons of Static and Dynamic Instrumentation - llvm

There are many static and dynamic instrumentation tools. Soot is a static instrumentation tool for Java bytecode. Pin and Valgrind are dynamic instrumentation tools for binaries.
What are the pros and cons of static and dynamic instrumentation tools? I think static instrumentation tools are better in terms of runtime performance, whereas dynamic tools are more powerful. Please compare them in terms of capability and performance.
Also, how does using an instrumentation tool differ from writing an LLVM pass?

The main pro of static instrumentation is that the analysis does not depend on the input: it is performed on the original code and covers all paths through it, giving full coverage. This type of instrumentation usually rewrites the binary, which is then ready for execution without the need for another process at run-time. That also means the code runs fast, with the only overhead coming from the injected code. The drawback of static instrumentation is that the analysis is less detailed, because run-time information is missing, and because of that it can be very difficult to achieve your goals.
Dynamic instrumentation, on the other hand, does include every detail and all information available during the run of the code. In most cases, tools that perform dynamic instrumentation are easy to write. On the other hand, it cannot achieve full code coverage, because the execution path depends on the inputs given. Also, the need for an external process to attach to and instrument the original one makes things slower.
AFAIK, LLVM passes are used for static instrumentation, because the instrumentation code is generated at compile time and written into the final binary, so it certainly comes with all the pros and cons of static instrumentation techniques.
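To make that concrete, here is a minimal sketch of static instrumentation written as an LLVM function pass (assuming the legacy pass manager and an LLVM release that still has typed-pointer helpers; __count_entry is a hypothetical runtime hook you would implement and link in yourself):

```cpp
// Injects a call to __count_entry(name) at every function entry, so a
// linked-in runtime can tally how often each function is executed.
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/Pass.h"

using namespace llvm;

namespace {
struct EntryCounter : public FunctionPass {
  static char ID;
  EntryCounter() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    Module *M = F.getParent();
    LLVMContext &Ctx = M->getContext();

    // Declare: void __count_entry(i8 *name)
    FunctionCallee Hook = M->getOrInsertFunction(
        "__count_entry", Type::getVoidTy(Ctx), Type::getInt8PtrTy(Ctx));

    // Insert the hook call at the top of the entry block.
    IRBuilder<> B(&*F.getEntryBlock().getFirstInsertionPt());
    Value *Name = B.CreateGlobalStringPtr(F.getName());
    B.CreateCall(Hook, {Name});
    return true; // the IR was modified
  }
};
} // namespace

char EntryCounter::ID = 0;
static RegisterPass<EntryCounter> X("entry-counter",
                                    "Instrument every function entry");
```

Because the hook calls are baked into the binary at compile time, running the instrumented program needs no extra process, which is exactly the static trade-off described above.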
To conclude, it's a matter of what you need. You should choose the right tool for your job.

I'm assuming the need is to discover code that takes significant time and that you could optimize to save that time. That is a different goal from just timing routines.
I'm skeptical of static analyzers because everything depends on the input data mix.
Dynamic instrumentation tries to measure properties of functions, such as self time and total time (absolute, average, and as a percentage), as well as call counts and each routine's role in the call graph.
Dynamic instrumentation (a la gprof) has been the de-facto standard for decades, but it is very far from being the last word. For one thing, it is important to realize that most of the statistics it gives you are missing the point in terms of your original need.
These days (IMHO) you need a sampling profiler that samples the call stack, not just the program counter. It should sample on wall-clock time, not just CPU time. Samples need not be drawn at high frequency. It should suppress sampling when the app is waiting for user input. It should give you information at the line or instruction level, not just the function level. The most important statistic it should give you for a line of code is the percentage of samples containing it, because that is the most direct measure of the time that can be saved if that line is optimized.
A few profilers can do this; OProfile and RotateRight/Zoom in particular.
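As a rough illustration of what such a sampler does at its core, here is a sketch for POSIX/glibc (backtrace() is not strictly async-signal-safe; real profilers use careful unwinders and also symbolize and aggregate the captured stacks):

```cpp
// Samples the call stack ~100 times per second on wall-clock time.
#include <execinfo.h>
#include <signal.h>
#include <sys/time.h>
#include <cstdio>

static volatile sig_atomic_t g_samples = 0;

static void on_sample(int) {
  void *frames[64];
  int n = backtrace(frames, 64); // capture the current call stack
  (void)n;                       // a real tool records frames[0..n)
  ++g_samples;
}

int main() {
  signal(SIGALRM, on_sample);
  // ITIMER_REAL fires on wall-clock time, so time spent blocked or
  // waiting is sampled too, not just CPU time.
  struct itimerval tv{{0, 10000}, {0, 10000}}; // every 10 ms
  setitimer(ITIMER_REAL, &tv, nullptr);

  volatile double s = 0; // stand-in workload
  for (long i = 0; i < 500000000L; ++i) s += i;
  std::printf("collected %d samples\n", (int)g_samples);
}
```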

Related

Locate the most costly methods and evaluate/profile them

I would like to know if there is a technique or a tool available that can tell you how much time is required to execute a particular method.
Big-O notation in math/computer science gives you an idea of an algorithm's complexity; I wonder if there is something similar for code analysis.
Profiling is a means of analysing a program to determine the relative amount of time spent in particular functions or methods. It is useful for discovering performance issues in a program empirically. Using GCC, for example, you can:
Compile the program with the -pg option to enable profiling.
Run the executable to produce a file called gmon.out, which contains information about the runtime characteristics of your program as it actually ran.
Run gprof to display the information generated by the instrumented executable.
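For example, the whole cycle on a toy program looks like this (the file and binary names are just illustrative):

```cpp
// toy.cpp -- deliberately slow code to demonstrate the gprof cycle:
//
//   g++ -pg -O2 toy.cpp -o toy   # -pg enables gprof instrumentation
//   ./toy                        # writes gmon.out in the working dir
//   gprof toy gmon.out           # prints flat profile and call graph
//
#include <cstdio>

static double burn(long n) {
  double s = 0.0;
  for (long i = 1; i <= n; ++i) s += 1.0 / i; // O(n) busy work
  return s;
}

int main() {
  std::printf("%f\n", burn(200000000L));
  return 0;
}
```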
Generally speaking, human analysis is the only way to discover the asymptotic (i.e., big-O) complexity of a particular algorithm—there is, so far as I know, no mechanical way to do it.
If you want to know how much time is spent in a function, use a so-called "profiler".
Complexity analysis is outside the scope of a profiler, though, since a profiler tells you what happens when you run a program once, whereas complexity tells you the limit behavior of what happens when you run an infinite sequence of programs with bigger and bigger input.
So: do you want to know which functions are most costly in your program (in which case find a profiler for your C++ implementation and follow its documentation), or do you want to know about time complexity (in which case you pretty much need a human to analyze your code)?
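There is, however, a crude empirical stand-in for complexity analysis: time the routine at doubling input sizes and watch the growth ratio (a sketch; std::sort is just a placeholder for the routine under test):

```cpp
// If time roughly doubles when n doubles, the routine behaves like
// O(n); slightly more than doubles, O(n log n); quadruples, O(n^2).
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

static double time_once(std::size_t n) {
  std::mt19937 rng(42);
  std::vector<int> v(n);
  for (auto &x : v) x = static_cast<int>(rng());
  auto t0 = std::chrono::steady_clock::now();
  std::sort(v.begin(), v.end()); // the routine under test
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
  double prev = 0.0;
  for (std::size_t n = 1u << 16; n <= (1u << 22); n <<= 1) {
    double t = time_once(n);
    std::printf("n=%8zu t=%.4fs ratio=%.2f\n", n, t,
                prev > 0.0 ? t / prev : 0.0);
    prev = t;
  }
}
```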
You should try running your code under callgrind: it will record which functions are called and how many times, though the code will run about 20x slower. Once you have the callgrind output, open it with kcachegrind to view the tree structure of the calls. There you can browse around and see where your bottlenecks are.
Useful links:
kcachegrind http://kcachegrind.sourceforge.net/html/Home.html
callgrind documentation http://valgrind.org/docs/manual/cl-manual.html
(valgrind is the framework, callgrind is a component)
how to start callgrind if the program spawns processes and you want to profile those too (replace sage bench.py with your program) https://github.com/titusnicolae/pynac-callgrind/blob/master/run.sh
edit: "--instr-atstart=no" should be removed from the parameter list if you don't plan on enabling the instrumentation later
How is this different from the halting problem?
Please note that I can trivially solve the halting problem using your automatic complexity analyzer: if it reports any complexity at all for a program, that program must halt. So your problem is HARDER, and the halting problem is already undecidable.

difference between code coverage and profiling

What is the difference between code coverage and profiling?
Which is the best open-source tool for code coverage?
Code coverage is an assessment of how much of your code has been run. This is used to see how well your tests have exercised your code.
Profiling is used to see how various parts of your code perform.
The tools depend on the language and platform you are using. I'm guessing that you are using Java, so I recommend CodeCover. Though you may find NoUnit easier to use.
Coverage is important to see which parts of the code have not been run.
In my experience it has to be accumulated over multiple use cases, because any single run of the software will only use some of the code.
Profiling means different things at different times. Sometimes it means measuring performance. Sometimes it means diagnosing memory leaks. Sometimes it means getting visibility into multi-threading or other low-level activities.
When the goal is to improve software performance, by finding so-called "bottlenecks" and fixing them, don't just settle for any profiler, not even necessarily a highly-recommended or venerable one.
It is essential to use the kind that gets the right kind of information and presents it to you the right way, because there is a lot of confusion about this.
More on that subject.
Added:
For a coverage tool, I've always done it myself. In nearly every routine and basic block, I insert a call like this: Utils.CovTest("file name, routine name, comment that tells what's being done here").
The routine records the fact that it was called, and when the program finishes, all those comments are appended to a text file.
Then there's a post-processing step where that file is "subtracted" from a complete list (gotten by a grep-like program).
The result is a list of what hasn't been tested, requiring additional test cases.
When not doing coverage testing, Utils.CovTest does nothing. I leave it out of the innermost loops anyway, so it doesn't affect performance much.
In C and C++, I do it with a macro that, during normal use, expands to nothing.
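In C++, that technique might look roughly like this (COVTEST and COVERAGE_BUILD are hypothetical names; the recorded tags are dumped at exit for the "subtraction" step):

```cpp
#include <cstdio>
#include <cstdlib>
#include <set>
#include <string>

#ifdef COVERAGE_BUILD
static std::set<std::string> g_hits;
static void dump_coverage() {
  // Post-process: subtract this list from the complete tag list.
  for (const auto &t : g_hits) std::printf("%s\n", t.c_str());
}
[[maybe_unused]] static const int g_reg = (std::atexit(dump_coverage), 0);
#define COVTEST(tag) g_hits.insert(tag)
#else
#define COVTEST(tag) ((void)0) // expands to nothing in normal builds
#endif

static int parse(int x) {
  COVTEST("parser.cpp, parse, positive branch");
  if (x > 0) return x;
  COVTEST("parser.cpp, parse, non-positive branch");
  return -x;
}

int main() {
  std::printf("%d\n", parse(7)); // covers only the positive branch
}
```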

How do I profile code beyond per-function level?

AFAIK profilers can only tell how much time is spent in each function. But since C++ compilers tend to inline code aggressively, and some functions are not that short, it's often useful to know more details: how much time each construct consumes.
How can this be achieved other than by restructuring the code into smaller functions?
If you use a sampling profiler (e.g. Zoom or Shark), rather than an instrumented profiler (e.g. gprof) then you can get much finer grained profiles, down to the statement and instruction level.
If you can use callgrind then you can get a summary of which methods are taking most of the processing time. Then you can use kcachegrind to view the results. It gives a very nice graph, through which you can easily browse and find bottlenecks.

C/C++ compiler feedback optimization

Has anyone seen real-world numbers for different programs that use the feedback-directed optimization C/C++ compilers offer to support branch prediction, cache preloading, etc.?
I searched for it and, amazingly, not even the popular interpreter development groups seem to have checked the effect. And increasing Ruby, Python, PHP, etc. performance by 10% or so should be considered useful.
Is there really no benefit, or is the whole developer community just too lazy to use it?
10% is a good ballpark figure. That said, ...
You have to REALLY care about the performance to go this route. The product I work on (DB2) uses PGO and other invasive and aggressive optimizations. Among the costs are significant build time increases (triple on some platforms) and development and support nightmares.
When something goes wrong it can be non-trivial to map the fault location in the optimized code back to the source. Developers don't usually expect that functions in different modules can end up merged and inlined and this can have "interesting" effects.
Problems with pointer aliasing, which are nasty to track down, also usually show up with these sorts of optimizations. You have the additional fun of non-deterministic builds (an aliasing problem can show up in Monday's build, vanish again till Thursday's, ...).
The line between what is correct or incorrect compiler behaviour under these sorts of aggressive optimizations also becomes fairly blurred. Even with the luxury of having our compiler guys in house (literally) the optimization issues (either in our source or the compiler) are still not easy to understand and resolve.
From unladen-swallow (a project optimizing the CPython VM):
For us, the final nail in PyBench's coffin was when experimenting with gcc's feedback-directed optimization tools, we were able to produce a universal 15% performance increase across our macrobenchmarks; using the same training workload, PyBench got 10% slower.
So some people are at least looking at it. That said, PGO sets some pretty tricky requirements on the build environment that are hard to satisfy for open-source projects meant to be built by a distributed, heterogeneous group of people. Heavy optimization also creates difficult-to-debug heisenbugs. It's less work to give the compiler explicit hints for the performance-critical parts.
That said, I expect significant performance increases from runtime profile guided optimization. JIT'ing allows the optimizer to cope with the profile of data changing across the execution of a program and do many extremely runtime data specific optimizations that would explode the code size for static compilation. Especially dynamic languages need good runtime data based optimization to perform well. With dynamic language performance getting significant attention lately (JavaScript VM's, MS DLR, JSR-292, PyPy and so on) there's a lot of work being done in this area.
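For reference, the GCC workflow on a toy program is short (the flags are standard GCC; the branch bias in step() is the kind of fact the training run teaches the compiler):

```cpp
// pgo_demo.cpp -- exercising feedback-directed optimization:
//
//   g++ -O2 -fprofile-generate pgo_demo.cpp -o pgo_demo
//   ./pgo_demo                    # training run writes *.gcda data
//   g++ -O2 -fprofile-use pgo_demo.cpp -o pgo_demo
//
#include <cstdio>

static long step(long x) {
  if (x % 97 == 0)  // cold branch during the training run
    return x / 97;
  return x + 1;     // hot branch; PGO lays it out as the fall-through
}

int main() {
  long v = 0;
  for (long i = 0; i < 200000000L; ++i) v = step(v ^ i);
  std::printf("%ld\n", v);
  return 0;
}
```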
Traditionally, improving compiler efficiency via profiling has been done with performance analysis tools. However, how the data from those tools can be used in optimization still depends on the compiler you use. For example, GCC is a framework being developed to produce compilers for different domains, and providing a profiling mechanism in such a compiler framework is extremely difficult.
We can rely on statistical data to do certain optimizations. For instance, GCC unrolls a loop if the loop count is less than a constant (say 7). How it fixes that constant is based on statistical results about the code size generated for different target architectures; the sketch below shows how to override that decision explicitly.
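For instance, GCC 8 and later let you make the unrolling decision explicit for a single loop rather than relying on the internal threshold (a sketch, not a recommendation):

```cpp
#include <cstdio>

static int sum8(const int *a) {
  int s = 0;
#pragma GCC unroll 8            // ask GCC to fully unroll this loop
  for (int i = 0; i < 8; ++i)   // trip count known at compile time
    s += a[i];
  return s;
}

int main() {
  int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  std::printf("%d\n", sum8(a));
}
```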
Profile-guided optimizations track specific areas of the source. Details of previous run results need to be stored, which is an overhead. The training input, on the other hand, needs to be statistically representative of how the target application will actually be used, so the complexity rises with the number of different inputs and outputs. In short, profile-guided optimization requires extensive data collection, and automating or embedding such profiling into the source needs careful monitoring; if not, the entire result will be awry, and in our effort to swim we will actually drown.
However, experimentation in this area is ongoing. Just have a look at POGO.

What is profiling?

I am new to this and am trying to learn.
What is profiling?
What are various free tools for profiling .NET and Java EE applications?
Can JavaScript be profiled?
If so, with which tool?
And lastly, how do these profilers work?
Profiling measures how long various parts of the code take to run. JavaScript can be profiled with Firebug: http://getfirebug.com/js.html
Profiling is measuring execution times and correlating them with various classes/methods/functions. (See the Wikipedia page on profiling for some commentary on how profilers can work.)
Think of profilers as debuggers for execution duration bugs.
Profilers are implemented a lot like debuggers too, except that rather than allowing you to stop the program and poke around, they simply let it run and keep track of how much time gets spent in every part of the program. This is particularly useful if you have some code that is running slower than you need it to run, as you can figure out exactly where all the time is going, and concentrate your efforts on fixing just that bottleneck.
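A toy version of that bookkeeping, done in-source with a scope guard (the names are hypothetical, and a real profiler does this without touching your code):

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

// Accumulated wall time per label, printed at the end of main().
static std::map<std::string, double> g_totals;

struct ScopedTimer {
  std::string label;
  std::chrono::steady_clock::time_point start;
  explicit ScopedTimer(std::string l)
      : label(std::move(l)), start(std::chrono::steady_clock::now()) {}
  ~ScopedTimer() {
    auto end = std::chrono::steady_clock::now();
    g_totals[label] += std::chrono::duration<double>(end - start).count();
  }
};

static void work() {
  ScopedTimer t("work"); // charges this call's duration to "work"
  volatile double s = 0;
  for (int i = 0; i < 10000000; ++i) s += i * 0.5;
}

int main() {
  for (int i = 0; i < 5; ++i) work();
  for (const auto &kv : g_totals)
    std::printf("%-10s %.3fs\n", kv.first.c_str(), kv.second);
}
```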
Many developers believe you should never hand-optimize code without using a profiler.
The way you would usually use your profiler is as follows:
Start the profiler and fire up your application under it.
Use your application for a while, or just exercise the features you have identified as bottlenecks and would like to optimize.
Once your application is closed (or sometimes even before that), the profiler can present you with a breakdown of execution times per function. Some will also give you a breakdown of execution times per line or statement within those functions, so you can see where most CPU time was used, taking a top-down approach.
Usually some functions in your application will take an unusually long time to execute. After looking at your profiling results, you should be able to identify them and eliminate performance problems.
Here are some .NET profilers for you to try (free):
Prof-it
NProf
CLR Profiler
I am not a big fan of these. I would recommend one of the commercial products to get the best results:
dotTrace
Ants
Other than that, take a look at Brad Adams' blog posts Profilers for the CLR and .NET Application Profiler.
I personally like dotTrace.
Profiling is a technique for measuring execution times and numbers of invocations of procedures.
It is not however the only or even necessarily the best way to locate things that cause time to be wasted in your code. Look here.
For a different Wikipedia article, try http://en.wikipedia.org/wiki/Performance_tuning#Bottlenecks
For a simple how-to, try http://www.wikihow.com/Optimize-Your-Program%27s-Performance
Wikipedia says:
In software engineering, performance analysis, more commonly today known as profiling, is the investigation of a program's behavior using information gathered as the program executes
Continue reading here http://en.wikipedia.org/wiki/Performance_analysis.
As for JavaScript tools, Firebug (http://getfirebug.com/index.html#install) is an excellent option.
Profiling is the collection of method-level execution-time measurements (functional statistics) as well as run-time information such as consumption of memory, processor, threads, and the number of classes loaded (non-functional statistics), gathered over the period the application runs. It falls under performance analysis (the collection of functional and non-functional statistics) of the application in question as run by one user. JConsole is one of the built-in tools for profiling Java applications.
Profiling ("program profiling") is a dynamic program analysis technique that measures, for example, a program's memory usage or running time, the use or frequency of particular instructions, and the frequency and duration of function calls, to mention a few cases. Typically, profiling information is used to aid program optimization and, more specifically, performance engineering. Profiling is accomplished by instrumenting the program's source code or binary. Profilers employ different methods, such as event-based, statistical, instrumented, and simulation approaches.