How to profile under Windows?

How to profile under Windows? - c++

I have a C++ program that I want to profile as it needs too much running time.
Im am using windows for this program and I'd like to use a free profiler. I searched the net and found the AMD CodeAnalyst and very sleepy. The AMD tool does not work very good as I have an intel CPU. So nearly no information is coming out.
When using very sleepy, I have the problem, that I do not see the names of the functions in the summary. That means: The profiling itself works, but I cannot see what function took how long. I see just something like [123456789]as function name and 0 as line number. I think this is a problem of the debugging symbols.
Can you tell me, what I have to do to get it working (Visual Studio 2010)?
Thanks

Visual Studio Profiler here : http://www.microsoft.com/download/en/details.aspx?id=23205
Instructions : http://msdn.microsoft.com/en-us/library/ms182372.aspx

I've tried a number of them, including LTProf and ANTS, but I keep going back to this method.
It's not a tool; it's just a technique.
Here's a step-by-step example of using it.
A lot of other people also use it, if you want links.

There are two kinds of profiler.
The non-intrusive kind, which do not require modification to your code. IMHO these do not provide satisfactory results, although they are easier to use.
The intrusive kind, which require additions to your code. These provide better results, I think. I developed my own profiler of this kind, which has received good reviews. You can check it out at http://ravenspoint.wordpress.com/2010/06/16/timing/

Related

Visual C++ | How to benchmark EVERY FUNCTION and log output? [duplicate]

I've used a few profilers in the past and never found them particularly easy. Maybe I picked bad ones, maybe I didn't really know what I was expecting!
But I'd like to know if there are any 'standard' profilers which simply drop in and work? I don't believe I need massively fine-detailed reports, just to pick up major black-spots. Ease of use is more important to me at this point.
It's VC++ 2008 we're using (I run standard edition personally). I don't suppose there are any tools in the IDE for this, I can't see any from looking at the main menus?

I suggest a very simple method (which I learned from reading Mike Dunlavey's posts on SO):
Just pause the program.
Do it several times to get a reasonable sample. If a particular function is taking half of your program's execution time, the odds are that you will catch it in the act very quickly.
If you improve that function's performance by 50%, then you've just improved overall execution time by 25%. And if you discover that it's not even needed at all (I have found several such cases in the short time I've been using this method), you've just cut the execution time in half.
I must confess that at first I was quite skeptical of the efficacy of this approach, but after trying it for a couple of weeks, I'm hooked.

VS built in:
If you have team edition you can use the Visual Studio profiler.
Other options:
Otherwise check this thread.
Creating your own easily:
I personally use an internally built one based on the Win32 API QueryPerformanceCounter.
You can make something nice and easy to use within a hundred lines of code or less.
The process is simple: create a macro at the top of each function that you want to profile called PROFILE_FUNC() and that will add to internally managed stats. Then have another macro called PROFILE_DUMP() which will dump the outputs to a text document.
PROFILE_FUNC() creates an object that will use RAII to log the amount of time until the object is destroyed. Both the constructor of this RAII object and the destructor will call QueryPerformanceCounter. You could also leave these lines in your code and control the behavior via a #define PROFILING_ON

I always used AMD CodeAnalyst, I find it quite easy to use and gives interesting results. I always used the time based profile, in which I found that it cooperates well with my apps' debug information, letting me find where the time is spent at procedure, C++ instruction and single assembly instruction level.

I used lt prof in the past for a quick run down of my C++ app. It works pretty easy and runs with a compiled program, does not need and source code hooks or tweaks. There is a trial version available I believe.

A very simple (and free) way to profile is to install the Windows debuggers (cdb/windbg), set a bp on the place of interest, and issue the wt command ("Trace and Watch Data"). Check out MSDN for more info.

Another super simple and useful profiling workflow that works on any programming languages is to comment out blocks of codes. After commenting out all of them, uncomment some and run your program to see the performance. If your program starts to run very slow when some code has been uncommented, then you'll probably want to check the performance there.

Edit and Continue on GDB

I know that E&C is a controversial subject and some say that it encourages a wrong approach to debugging, but still - I think we can agree that there are numerous cases when it is clearly useful - experimenting with different values of some constants, redesigning GUI parameters on-the-fly to find a good look... You name it.
My question is: Are we ever going to have E&C on GDB? I understand that it is a platform-specific feature and needs some serious cooperation with the compiler, the debugger and the OS (MSVC has this one easy as the compiler and debugger always come in one package), but... It still should be doable. I've even heard something about Apple having it implemented in their version of GCC [citation needed]. And I'd say it is indeed feasible.
Knowing all the hype about MSVC's E&C (my experience says it's the first thing MSVC users mention when asked "why not switch to Eclipse and gcc/gdb"), I'm seriously surprised that after quite some years GCC/GDB still doesn't have such feature. Are there any good reasons for that? Is someone working on it as we speak?

It is a surprisingly non-trivial amount of work, encompassing many design decisions and feature tradeoffs. Consider: you are debugging. The debugee is suspended. Its image in memory contains the object code of the source, and the binary layout of objects, the heap, the stacks. The debugger is inspecting its memory image. It has loaded debug information about the symbols, types, address mappings, pc (ip) to source correspondences. It displays the call stack, data values.
Now you want to allow a particular set of possible edits to the code and/or data, without stopping the debuggee and restarting. The simplest might be to change one line of code to another. Perhaps you recompile that file or just that function or just that line. Now you have to patch the debuggee image to execute that new line of code the next time you step over it or otherwise run through it. How does that work under the hood? What happens if the code is larger than the line of code it replaced? How does it interact with compiler optimizations? Perhaps you can only do this on a specially compiled for EnC debugging target. Perhaps you will constrain possible sites it is legal to EnC. Consider: what happens if you edit a line of code in a function suspended down in the call stack. When the code returns there does it run the original version of the function or the version with your line changed? If the original version, where does that source come from?
Can you add or remove locals? What does that do to the call stack of suspended frames? Of the current function?
Can you change function signatures? Add fields to / remove fields from objects? What about existing instances? What about pending destructors or finalizers? Etc.
There are many, many functionality details to attend to to make any kind of usuable EnC work. Then there are many cross-tools integration issues necessary to provide the infrastructure to power EnC. In particular, it helps to have some kind of repository of debug information that can make available the before- and after-edit debug information and object code to the debugger. For C++, the incrementally updatable debug information in PDBs helps. Incremental linking may help too.
Looking from the MS ecosystem over into the GCC ecosystem, it is easy to imagine the complexity and integration issues across GDB/GCC/binutils, the myriad of targets, some needed EnC specific target abstractions, and the "nice to have but inessential" nature of EnC, are why it has not appeared yet in GDB/GCC.
Happy hacking!
(p.s. It is instructive and inspiring to look at what the Smalltalk-80 interactive programming environment could do. In St80 there was no concept of "restart" -- the image and its object memory were always live, if you edited any aspect of a class you still had to keep running. In such environments object versioning was not a hypothetical.)

I'm not familiar with MSVC's E&C, but GDB has some of the things you've mentioned:
http://sourceware.org/gdb/current/onlinedocs/gdb/Altering.html#Altering
17. Altering Execution
Once you think you have found an error in your program, you might want to find out for certain whether correcting the apparent error would lead to correct results in the rest of the run. You can find the answer by experiment, using the gdb features for altering execution of the program.
For example, you can store new values into variables or memory locations, give your program a signal, restart it at a different address, or even return prematurely from a function.
Assignment: Assignment to variables
Jumping: Continuing at a different address
Signaling: Giving your program a signal
Returning: Returning from a function
Calling: Calling your program's functions
Patching: Patching your program
Compiling and Injecting Code: Compiling and injecting code in GDB

This is a pretty good reference to the old Apple implementation of "fix and continue". It also references other working implementations.
http://sources.redhat.com/ml/gdb/2003-06/msg00500.html
Here is a snippet:
Fix and continue is a feature implemented by many other debuggers,
which we added to our gdb for this release. Sun Workshop, SGI ProDev
WorkShop, Microsoft's Visual Studio, HP's wdb, and Sun's Hotspot Java
VM all provide this feature in one way or another. I based our
implementation on the HP wdb Fix and Continue feature, which they
added a few years back. Although my final implementation follows the
general outlines of the approach they took, there is almost no shared
code between them. Some of this is because of the architectual
differences (both the processor and the ABI), but even more of it is
due to implementation design differences.
Note that this capability may have been removed in a later version of their toolchain.
UPDATE: Dec-21-2012
There is a GDB Roadmap PDF presentation that includes a slide describing "Fix and Continue" among other bullet points. The presentation is dated July-9-2012 so maybe there is hope to have this added at some point. The presentation was part of the GNU Tools Cauldron 2012.
Also, I get it that adding E&C to GDB or anywhere in Linux land is a tough chore with all the different components.
But I don't see E&C as controversial. I remember using it in VB5 and VB6 and it was probably there before that. Also it's been in Office VBA since way back. And it's been in Visual Studio since VS2005. VS2003 was the only one that didn't have it and I remember devs howling about it. They intended to add it back anyway and they did with VS2005 and it's been there since. It works with C#, VB, and also C and C++. It's been in MS core tools for 20+ years, almost continuous (counting VB when it was standalone), and subtracting VS2003. But you could still say they had it in Office VBA during the VS2003 period ;)
And Jetbrains recently added it too their C# tool Rider. They bragged about it (rightly so imo) in their Rider blog.

Profiling line-by-line in C++

I have a C++ program I am trying to optimize.
Since I want it to run fast, I am not using a lot of function calls. Most profiling tool I have seen can give you profiling info in a function-call resolution. However, I would like it in a line-by-line resolution. Is there some option like this?
I am using Visual Studio 2010 on Windows.
Thanks.

Intel Parallel Amplifier should be capable of what you want. If that is what you want:

If you're running with on an AMD processor, CodeAnalyst is free and can do that (at least, in time-based profiling); you can actually "zoom" in and out seeing what is taking the most CPU time from processes to functions down to single assembly instructions.
However, keep in mind that to get meaningful results to that resolution with time-based profiling you should run the critical part of the code several times, otherwise the statistics you get doesn't have much sense.
By the way, in my opinion you should forget about the less function calls=>faster idea. If the cost of a function call is bigger than its "payload", the compiler should be able to figure out by itself if it's convenient to inline the call, and in some cases even inlining too much can slow down the code.

AQTime is a commercial profiler for Windows and I have found it to work pretty well for both function and line timings. One thing I like about it is that you do not have to fiddle with compiler options or Visual Studio settings -- i.e. you do not need any additional compiler options to enable profiling: All you need to do the profiling is the pdb (symbol) file and the executable. (And yes, you can create a pdb file for your release-compile.)

IMHO, this method is best, for these reasons, and here's an example of a 43x speedup. It's not a well-known technique, except to a small number of people, for one example, and another, and another. You may be surprised that it's very low-tech and manual, but you can't beat the results.
Oh, and by the way, for Visual Studio, LTProf may well be the next best thing. It gives you line-level percents, derived from stack samples taken at random wall-clock times. Don't get sucked in by a lot of fancy UI options or promises of accuracy of timing. Those things don't matter. What matters is that it pinpoints the spots worth optimizing.

What's a very easy C++ profiler (VC++)?

I've used a few profilers in the past and never found them particularly easy. Maybe I picked bad ones, maybe I didn't really know what I was expecting!
But I'd like to know if there are any 'standard' profilers which simply drop in and work? I don't believe I need massively fine-detailed reports, just to pick up major black-spots. Ease of use is more important to me at this point.
It's VC++ 2008 we're using (I run standard edition personally). I don't suppose there are any tools in the IDE for this, I can't see any from looking at the main menus?

I suggest a very simple method (which I learned from reading Mike Dunlavey's posts on SO):
Just pause the program.
Do it several times to get a reasonable sample. If a particular function is taking half of your program's execution time, the odds are that you will catch it in the act very quickly.
If you improve that function's performance by 50%, then you've just improved overall execution time by 25%. And if you discover that it's not even needed at all (I have found several such cases in the short time I've been using this method), you've just cut the execution time in half.
I must confess that at first I was quite skeptical of the efficacy of this approach, but after trying it for a couple of weeks, I'm hooked.

VS built in:
If you have team edition you can use the Visual Studio profiler.
Other options:
Otherwise check this thread.
Creating your own easily:
I personally use an internally built one based on the Win32 API QueryPerformanceCounter.
You can make something nice and easy to use within a hundred lines of code or less.
The process is simple: create a macro at the top of each function that you want to profile called PROFILE_FUNC() and that will add to internally managed stats. Then have another macro called PROFILE_DUMP() which will dump the outputs to a text document.
PROFILE_FUNC() creates an object that will use RAII to log the amount of time until the object is destroyed. Both the constructor of this RAII object and the destructor will call QueryPerformanceCounter. You could also leave these lines in your code and control the behavior via a #define PROFILING_ON

I always used AMD CodeAnalyst, I find it quite easy to use and gives interesting results. I always used the time based profile, in which I found that it cooperates well with my apps' debug information, letting me find where the time is spent at procedure, C++ instruction and single assembly instruction level.

I used lt prof in the past for a quick run down of my C++ app. It works pretty easy and runs with a compiled program, does not need and source code hooks or tweaks. There is a trial version available I believe.

A very simple (and free) way to profile is to install the Windows debuggers (cdb/windbg), set a bp on the place of interest, and issue the wt command ("Trace and Watch Data"). Check out MSDN for more info.

Another super simple and useful profiling workflow that works on any programming languages is to comment out blocks of codes. After commenting out all of them, uncomment some and run your program to see the performance. If your program starts to run very slow when some code has been uncommented, then you'll probably want to check the performance there.

What's your favorite profiling tool (for C++) [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
So far, I've only used Rational Quantify. I've heard great things about Intel's VTune, but have never tried it!
Edit: I'm mostly looking for software that will instrument the code, as I guess that's about the only way to get very fine results.
See also:
What are some good profilers for native C++ on Windows?

For linux development (although some of these tools might work on other platforms). These are the two big names I know of, there's plenty of other smaller ones that haven't seen active development in a while.
Valgrind
TAU - Tuning and Analysis Utilities

For Linux:
Google Perftools
Faster than valgrind (yet, not so fine grained)
Does not need code instrumentation
Nice graphical output (--> kcachegrind)
Does memory-profiling, cpu-profiling, leak-checking

IMHO, sampling using a debugger is the best method. All you need is an IDE or debugger that lets you halt the program. It nails your performance problems before you even get the profiler installed.

My only experience profiling C++ code is with AQTime by AutomatedQA (now SmartBear Software). It has several types of profilers built in (performance, memory, Windows handles, exception tracing, static analysis, etc.), and instruments the code to get the results.
I enjoyed using it - it was always fun to find those spots where a small change in code could make a dramatic improvement in performance.

I have never done profiling before. Yesterday I programmed a ProfilingTimer class with a static timetable (a map<std::string, long long>) for time storage.
The constructor stores the starting tick, and the destructor calculates the elapsed time and adds it to the map:
ProfilingTimer::ProfilingTimer(std::string name)
: mLocalName(name)
{
sNestedName += mLocalName;
sNestedName += " > ";
if(sTimetable.find(sNestedName) == sTimetable.end())
sTimetable[sNestedName] = 0;
mStartTick = Platform::GetTimerTicks();
}
ProfilingTimer::~ProfilingTimer()
{
long long totalTicks = Platform::GetTimerTicks() - mStartTick;
sTimetable[sNestedName] += totalTicks;
sNestedName.erase(sNestedName.length() - mLocalName.length() - 3);
}
In every function (or {block}) that I want to profile i need to add:
ProfilingTimer _ProfilingTimer("identifier");
This line is a bit cumbersome to add in all functions I want to profile since I have to guess which functions take a lot of time. But it works well and the print function shows time consumed in %.
(Is anyone else working with any similar "home-made profiling"? Or is it just stupid? But it's fun! Does anyone have improvement suggestions?
Is there some sort of auto-adding a line to all functions?)

I've used Glowcode extensively in the past and have had nothing but positive experiences with it. Its Visual Studio integration is really nice, and it is the most efficient/intuitive profiler that I've ever used (even compared to profilers for managed code).
Obviously, thats useless if your not running on Windows, but the question leaves it unclear to me exactly what your requirements are.

oprofile, without a doubt; its simple, reliable, does the job, and can give all sorts of nice breakdowns of data.

The profiler in Visual Studio 2008 is very good: fast, user friendly, clear and well integrated in the IDE.

For Windows, check out Xperf. It uses sampled profile, has some useful UI, & does not require instrumentation. Quite useful for tracking down performance problems. You can answer questions like:
Who is using the most CPU? Drill down to function name using call stacks.
Who is allocating the most memory?
Who is doing the most registry queries?
Disk writes? etc.
You will be quite surprised when you find the bottlenecks, as they are probably not where you expected!

Since you don't mention the platform you're working on, I'll say cachegrind under Linux. Definitely. It's part of the Valgrind toolset.
http://valgrind.org/info/tools.html
I've never used its sub-feature Callgrind, since most of my code optimization is for inside functions.
Note that there is a frontend KCachegrind available.

For Windows, I've tried AMD Codeanalyst, Intel VTune and the profiler in Visual Studio Team Edition.
Codeanalyst is buggy (crashes frequently) and on my code, its results are often inaccurate. Its UI is unintuitive. For example, to reach the call stack display in the profile results, you have to click the "Processes" tab, then click the EXE filename of your program, then click a toolbar button with the tiny letters "CSS" on it. But it is freeware, so you may as well try it, and it works (with fewer features) without an AMD processor.
VTune ($700) has a terrible user interface IMO; in a large program, it's hard to find the particular call tree you want, and you can only look at one "node" in a program at a time (a function with its immediate callers and callees)--you cannot look at a complete call tree. There is a call graph view, but I couldn't find a way to make the relative execution times appear on the graph. In other words, the functions in the graph look the same regardless of how much time was spent in them--it's as though they totally missed the point of profiling.
Visual Studio's profiler has the best GUI of the three, but for some reason it is unable to collect samples from the majority of my code (samples are only collected for a few functions in my entire C++ program). Also, I couldn't find a price or way to buy it directly; but it comes with my company's MSDN subscription. Visual Studio supports managed, native, and mixed code; I'm not sure about the other two profilers in that regard.
In conclusion, I don't know of a good profiler yet! I'll be sure to check out the other suggestions here.

There are different requirements for profiling. Is instrumented code ok, or do you need to profile optimized code (or even already compiled code)? Do you need line-by-line profile information? Which OS are you running? Do you need to profile shared libraries as well? What about trace into system calls?
Personally, I use oprofile for everything I do, but that might not be the best choice in every case. Vtune and Shark are both excellent as well.

For Windows development, I've been using Software Verification's Performance Validator - it's fast, reasonably accurate, and reasonably priced. Best yet, it can instrument a running process, and lets you turn data collection on and off at runtime, both manually and based on the callstack - great for profiling a small section of a larger program.

I use devpartner for the pc platform.

I have tried Quantify an AQTime, and Quantify won because of its invaluable 'focus on sub tree' and 'delete sub tree' features.

The only sensitive answer is PTU from Intel. Of course its best to use it on an Intel processor and to get even more valuable results at least on a C2D machine as the architecture itself is easier to give back meaningful profiles.

I've used VTune under Windows and Linux for many years with very good results. Later versions have gotten worse, when they outsourced that product to their Russian development crew quality and performance both went down (increased VTune crashes, often 15+ minutes to open an analysis file).
Regarding instrumentation, you may find out that it's less useful than you think. In the kind of applications I've worked on adding instrumentation often slows the product down so much that it doesn't work anymore (true story: start app, go home, come back next day, app still initializing). Also, with non instrumented profiling you can react to live problems. For example, with VTune remote date collector I can start up a sampling session against a live server with hundreds of simultaneous connections that is experiencing performance problems and catch issues that happen in production that I'd never be able to replicate in a test environment.

ElectricFence works nicely for malloc debugging

My favorite tool is Easy Profiler : http://code.google.com/p/easyprofiler/
It's a compile time profiler : the source code must be manually instrumented using a set of routines so to describe the target regions.
However, once the application is run, and measures automatically written to an XML file, it is only a matter of opening the Observer application and doing few clicks on the analysis/compare tools, before you can see the result in a qualitative chart.

Visual studio 2010 profiler under Windows. VTune had a great call graph tool, but it got broken as of Windows Vista/7. I don't know if they fixed it.

Let me give a plug for EQATEC... just what I was looking for... simple to learn and use and gives me the info I need to find the hotspots quickly. I much prefer it to the one built in to Visual Studio (though I haven't tried the VS 2010 one yet, to be fair).
The ability to take snapshots is HUGE. I often get an extra analysis and optimization done while waiting for the real target analysis to run... love it.
Oh, and its base version is free!
http://www.eqatec.com/Profiler/

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js