How do I profile code beyond per-function level? - c++

AFAIK profilers can only tell how much time is spent in each function. But since C++ compilers tend to inline code aggressively and also some functions are not that short it's often useful to know more details - how much time each construct consumes.
How can this be achieved except restructuring code into smaller functions?

If you use a sampling profiler (e.g. Zoom or Shark), rather than an instrumented profiler (e.g. gprof) then you can get much finer grained profiles, down to the statement and instruction level.

If you can use callgrind then you can get a summary of which methods are taking most of the processing time. Then you can use kcachegrind to view the results. It gives a very nice graph, through which you can easily browse and find bottlenecks.

Related

Profile optimised C++/C code

I have some heavily templated c++ code that I am working with. I can compile and profile with AMD tools and sleepy in debug mode. However without optimisation most of time spent concentrated in the templated code and STL. With optimised compilation, all the profile tools that I know produce garbage information. Does anybody know a good way to profile optimised native code
PS1:
The code that I am writing is also heavily templated. Most of the time spent in the unoptimised code will be optimized away. I am talking about 96-97% of the run time are spent in templated code without optimisation. This is going to corrupt the accuracy of the profiling. And yes I can change many templated code or at least what part of the templated code is introducing the most trouble and I can do better in those places.
You should focus on the code you wrote because that is what you can change, time spent in STL is irrelevant, just ignore it and focus on the callers of that code. If too much time is spent in STL you probably can call some other STL primitive instead of the current one.
Profiling unoptimized code is less interesting, but you can still get some informations. If used algorithms from some parts of code are totally flawed it will show up even there. But you should be able to get useful informations from any good profiling tool in optimized code. What tools do you use exactly and why do you call their output garbage ?
Also it's usually easy enough to instrument your code by hand and find out exactly which parts are efficient and which are not. It's just a matter of calling timer functions (or reading cycle count of processor if possible) at well chosen points. I usually do that from unit tests to have reproducible results, but all depends of the specifics of your program.
Tools or instrumenting code are the easy part of optimization. The hard part is finding ways to get faster code where it's needed.
What do you mean by "garbage information"?
Profiling is only really meaningful on optimized builds, so tools are designed to work with them -- thus if you're getting meaningless results, it's probably due to the profiler not finding the right symbols, or needing to instrument the build.
In the case of Intel VTune, for example, I found I got impossible results from the sampler unless I explicitly told it where to find the PDBs for the executable I was tuning. In the instrumented version, I had to fiddle with the settings until it was reliably putting probes into the function calls.
When #kriss says
You should focus on the code you wrote
because that is what you can change
that's exactly what I was going to say.
I would add that in my opinion it is easier to do performance tuning first on code compiled without optimization, and then later turn on the optimizer, for the same reason. If something you can fix is costing excess time, it will cost proportionally excess time regardless of what the compiler does, and it's easier to find it in code that hasn't been scrambled.
I don't look for such code by measuring time. If the excess time is, say, 20%, then what I do is randomly pause it several times. As soon as I see something that can obviously be improved on 2 or more samples, then I've found it. It's an oddball method, but it doesn't really miss anything. I do measure the overall time before and after to see how much I saved. This can be done multiple times until you can't find anything to fix. (BTW, if you're on Linux, Zoom is a more automated way to do this.)
Then you can turn on the optimizer and see how much it gives you, but when you see what changes you made, you can see there's really no way the compiler could have done it for you.

Compiler optimization for fastest possible code

I would like to select the compiler optimizations to generate the fastest possible application.
Which of the following settings should I set to true?
Dead store elimination
Eliminate duplicate expressions within basic blocks and functions
Enable loop induction variable and strength reduction
Enable Pentium instruction scheduling
Expand common intrinsic functions
Optimize jumps
Use register variables
There is also the option 'Generate the fastest possible code.', which I have obviously set to true. However, when I set this to true, all the above options are still set at false.
So I would like to know if any of the above options will speed up the application if I set them to true?
So I would like to know if any of the above options will speed up the application if I set them to true?
I know some will hate me for this, but nobody here can answer you truthfully. You have to try your program with and without them, and profile each build and see what the results are. Guess-work won't get anybody anywhere.
Compilers already do tons(!) of great optimization, with or without your permission. Your best bet is to write your code in a clean and organized matter, and worry about maintainability and extensibility. As I like to say: Code now, optimize later.
Don't micromanage down to the individual optimization. Compiler writers are very smart people - just turn them all on unless you see a specific need not to. Your time is better spent by optimizing your code (improve algorithmic complexity of your functions, etc) rather than fiddling with compiler options.
My other advice, use a different compiler. Intel has a great reputation as an optimizing compiler. VC and GCC of course are also great choices.
You could look at the generated code with different compiled options to see which is fastest, but I understand nowadays many people don't have experience doing this.
Therefore, it would be useful to profile the application. If there is an obvious portion requiring speed, add some code to execute it a thousand or ten million times and time it using utime() if it's available. The loop should run long enough that other processes running intermittently don't affect the result—ten to twenty seconds is a popular benchmark range. Or run multiple timing trials. Compile different test cases and run it to see what works best.
Spending an hour or two playing with optimization options will quickly reveal that most have minor effect. However, that same time spent thinking about the essence of the algorithm and making small changes (code removal is especially effective) can often vastly improve execution time.

c++ profiling/optimization: How to get better profiling granularity in an optimized function

I am using google's perftools (http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html) for CPU profiling---it's a wonderful tool that has helped me perform a great deal of CPU-time improvements on my application.
Unfortunately, I have gotten to the point that the code is still a bit slow, and when compiled using g++'s -O3 optimization level, all I know is that a specific function is slow, but not which aspects of it are slow.
If I remove the -O3 flag, then the unoptimized portions of the program overtake this function, and I don't get a lot of clarity into the actual parts of the function that are slow. If I leave the -O3 flag in, then the slow parts of the function are inlined, and I can't determine which parts of the function are slow.
Any suggestions? Thanks for your help!
For something like this, I've always used the "old school" way of doing it:
Insert into the routine you want to measure at various points statements which measure the current time (or cputime). Then simply print out or log the differences between them and you will know how long each section of code took. From there you can find out what is eating most of the time, and go in and get fine-grained timing within that section until you know what the problem is, and how to fix it.
If the overhead of the function calls is not the problem, you can also force inlining to be off with -fno-inline-small-functions -fno-inline-functions -fno-inline-functions-called-once -fno-inline (I'm not exactly sure how these switches interact with each other, but I think they are independent). Then you can use your ordinary profiler to look at the call graph profile and see what function calls are taking what amount of time.
If you're on linux, use oprofile.
If you're on Windows, use AMD's CodeAnalyst.
Both will give sample-based profiles down to the level of individual source lines or assembly instructions and you should have no problem identifying "hot spots" within functions.
I've spent decades doing performance tuning.
People love their tools, but I swear by this method.

C++ code performance

When is about writing code into C++ using VS2005, how can you measure the performance of your code?
Is any default tool in VS for that? Can I know which function or class slow down my application?
Are other external tools which can be integrated into VS in order to measure the gaps in my code?
If you have the Team System edition of Visual Studio 2005, you can use the built-in profiler.
http://msdn.microsoft.com/en-gb/library/z9z62c29(VS.80).aspx
AMD CodeAnalyst is available for free for both Windows and Linux and works on most x86 or x64 CPUs (including Intel's).
It has extra features available when you have an AMD processor, of course. It also integrates into Visual Studio.
I've had pretty good luck with it.
Note that there are generally at least two common forms of profiler:
instrumenting: alters your build to record information at the beginning and end of certain areas (usually per function)
sampling: periodically looks at what code is running to record information
The types of information recorded can include (but are not limited to): elapsed time, # of CPU cycles, cache hits/misses, etc.
Instrumenting can be specific to certain areas of the code (just certain files or just code you compile, not libraries you link to). The overhead is much higher (you're adding code to the project, which takes time to execute, so you're altering timing; you may change program behavior for e.g. interrupt handlers or other timing-dependent code). You're guaranteed that you will get information about the functions/areas you instrument, though.
Sampling can miss very small or very sporadic functions, but modern machines have hardware help to allow you to sample much more thoroughly. Note that some sampling systems may still inject timing differences, although they generally will be much much smaller.
Some profiling tools support a mixture of the above, depending on how you use them.
You could also use Intel VTune.
You want a tool called a profiler. For a free one that covers most simple cases, I recommend Very Sleepy. It works by sampling the application's current call stack at regular intervals.
You can always measure the time and performance of you code yourself. Consult MSDN about the the following functions QueryPerformanceCounter() and QueryPerformanceFrequency().
For more in depth analysis of memory allocation and execution times we use Memory Validator and Performance Validator from Software Verify. They have support for several languages other than C++.
I think measuring performance, and locating code to optimize, are different problems, and require different methods.
To locate code to optimize, I swear by this simple method, which is orthogonal to accepted wisdom about profiling, and does not require you to buy or install any tools.
To measure performance, I'm content with the simple process of running the subject code in a loop and timing it.
EDIT: BTW, I just looked at Very Sleepy, and it appears to be on the right track. It samples the entire call stack, and retains each stack. What I can't tell is if it gives you, for each call instruction or regular instruction, the fraction of stack samples containing that instruction. In my opinion, that is the most valuable statistic, and it does not need to be very precise.
dotTrace, on the other hand, also looks like maybe it retains stack samples, but its UI presentation of call-stack info seems to be a call-tree. What I would look for is something that shows the stack-residence percentage of individual instructions (or statements), because they could be in different branches of the call-tree, and thus the call-tree could miss their importance.
For intrusive measurement, use the performance counters. Since you're using C++, you should use a facade over this slightly painful API. STLSoft has a family of such things, with different pros and cons. I suggest winstl::performance_counter for highest resolution, or winstl::threadtimes_counter if you want to monitor the performance of a particular thread regardless of other activity in your process(es). There was an article about this in Dr Dobb's several years ago, in which the design rationale behind the facades was described in detail.
For non-intrusive measurement, you can't go past VTune.
We use Rational quantify which comes as a part of Rational PurifyPlus set of tools.
Its an excellent tool for profiling application performance.
I've recently tried JetBrains dotTrace profiler and it looks very good. It helped me locate a number "black holes" in existing C++ code quite easily.
It works fine in Visual Studio 2005 Professional in a solution which mixes C# and C++ - it uses the right function names for both pieces of code and does an integrated analysis. You can trace for time or memory.
It will be a pity when the evaluation period expires :)
We've had good results from AQTime. It's not free but is cheaper than Visual Studio ;-)

What is profiling?

I am new to this and is trying to learn.
What is profiling?
What are various free tools for profiling .NET, Java EE?
Can Javascript be profiled?
If so, by which tool?
And lastly, how do these profilers work?
Profiling measures how long various parts of the code take to run. Javascript can be profiled with firebug: http://getfirebug.com/js.html
profiling is measuring the execution times and correlating it with various classes/methods/functions. (see the link I gave to the wikipedia page for some commentary on how profilers can work)
Think of profilers as debuggers for execution duration bugs.
Profilers are implemented a lot like debuggers too, except that rather than allowing you to stop the program and poke around, they simply let it run and keep track of how much time gets spent in every part of the program. This is particularly useful if you have some code that is running slower than you need it to run, as you can figure out exactly where all the time is going, and concentrate your efforts on fixing just that bottleneck.
Many developers believe you should never hand-optimize code without using a profiler.
The way you would usually use your profiler is as follows:
Start the profiler, fire up your application using the profiler.
Use your application for some time or just the features in your application that you have identified as bottlenecks and would like to optimize.
Once your application is closed (or sometimes even before that), the profiler can present you a breakdown of execution times per function. Some will also allow you to get a breakdown of execution times per line or function within one of these functions so you can see where cpu most time was used up using a top-down approach.
Usually some functions in your application will take an unusually long time to execute. After looking at your profiling results, you should be able to identify them and eliminate performance problems.
Here are some .NET profilers for you to try (free):
Prof-it
NProf
CLR Profiler
I am not a big fan of these. I would recommend one of the commercial products to get the best results:
dotTrace
Ants
Other than that take a look at Brad Adams blog posts Profilers for the CLR and .NET Application Profiler.
I personally like dotTrace.
Profiling is a technique for measuring execution times and numbers of invocations of procedures.
It is not however the only or even necessarily the best way to locate things that cause time to be wasted in your code. Look here.
For a different Wikipedia article, try http://en.wikipedia.org/wiki/Performance_tuning#Bottlenecks
For a simple how-to, try http://www.wikihow.com/Optimize-Your-Program%27s-Performance
Wikipedia says:
In software engineering, performance analysis, more commonly today known as profiling, is the investigation of a program's behavior using information gathered as the program executes
Continue reading here http://en.wikipedia.org/wiki/Performance_analysis.
So, about javascript tool Firebug(http://getfirebug.com/index.html#install) is an excelent option.
Profiling is a measure of execution time at method level (functional statistics) as well as run-time level information collection such as consumption of memory, processor, threads and number of classes (non-functional statistics) loaded over a period of time the application is running. It falls under performance analysis (functional and non-functional statistics collection) of the application in question as run by one user. JConsole is one of the built-in tools to profile Java applications.
Profiling or programming profiles is the technique of dynamic analysis of programs, which uses resources such as memory space or the temporal complexity of a program, the use of particular instructions or the frequency, as well as the duration of function calls , to mention a few cases. Typically, profiling information is used to aid program optimization and, more specifically, performance engineering. Profiling is accomplished by instrumenting the program's source code. Profilers employ different methods such as event-based, statistical, instrumented, and simulation methods