Can anybody recommend a good code profiler for C++?
I came across Shiny - any good? http://sourceforge.net/projects/shinyprofiler/
Callgrind for Unix/Linux
DevPartner for Windows
Not C++ specific, but AMD's CodeAnalyst software is free and is feature-packed.
http://developer.amd.com/cpu/codeanalyst/codeanalystwindows/Pages/default.aspx
gprof, if you use gcc. It may not be user-friendly, but it is still useful.
You will probably be interested in Intel VTune. It is rather useful and lets you collect low-level events such as cache misses, which helps a lot in tuning.
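To show the usual gprof workflow, assuming GCC on Linux: compile and link with `-pg`, run the program normally so it writes `gmon.out` on exit, then run `gprof` on the binary and that file to get a flat profile and call graph. The workload below is a hypothetical toy with one deliberately slow function, so it should land near the top of the flat profile. `-fno-inline` is suggested only so the two functions stay separate in the report.

```cpp
// Toy workload for trying gprof (GCC on Linux assumed).
// Build:   g++ -pg -O1 -fno-inline -o prog prog.cpp   (with a main() that calls run_workload())
// Run:     ./prog                  -> writes gmon.out in the current directory on exit
// Report:  gprof ./prog gmon.out
#include <cstdint>

// Deliberately quadratic, so it should dominate gprof's flat profile.
std::uint64_t slow_pair_count(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i)
        for (std::uint64_t j = 0; j < n; ++j)
            total += (i ^ j) & 1u;   // counts (i, j) pairs whose parity differs
    return total;
}

// Linear by comparison; should account for very little time.
std::uint64_t fast_sum(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) total += i;
    return total;
}

std::uint64_t run_workload() {
    return slow_pair_count(5000) + fast_sum(5000);
}
```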
Quantify (part of the IBM/Rational PurifyPlus package) is a very good profiler, but not exactly cheap. It is available on several platforms, too - I've used it on Solaris, Windows and Linux.
Depends on what you need to do:
1. Measure, so you can do regression testing to see if changes in performance happened.
2. Find reasons for suboptimal performance and optimize them.
These are not the same.
For 1, use one of the recommended profilers.
For 2, the profiler I much prefer is one you already have:
http://www.wikihow.com/Optimize-Your-Program%27s-Performance
To see how this goes, check this out.
For C++, as for C# and any language that encourages layers of abstraction, those layers may or may not be good from a software engineering standpoint, but they can kill performance. Every method call is a detour in the execution of your program, and the style encourages you to nest those things, sometimes needlessly. Also the style discourages you from knowing or caring what goes on inside them. You may find them creating and deleting objects underneath at a rate and level of generality far beyond what your application really needs.
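A hypothetical illustration of that hidden churn: a helper that returns a fresh `std::string` looks harmless, but called through a few layers in a hot loop it builds and destroys a temporary on every pass (and heap-allocates once keys outgrow the small-string buffer). Passing in a reusable buffer gives the same result while letting one allocation be reused. All function names here are made up for the example.

```cpp
#include <cstddef>
#include <string>

// Layered style: each call builds and returns a brand-new string.
std::string make_key(const std::string& prefix, int id) {
    return prefix + ":" + std::to_string(id);
}

std::size_t total_key_length_alloc(const std::string& prefix, int count) {
    std::size_t total = 0;
    for (int i = 0; i < count; ++i)
        total += make_key(prefix, i).size();  // temporary created and destroyed each pass
    return total;
}

// Same output, but the caller owns one buffer whose capacity is reused.
void make_key_into(std::string& out, const std::string& prefix, int id) {
    out.clear();                // empties the string but keeps its capacity
    out += prefix;
    out += ':';
    out += std::to_string(id);  // still a small temporary for the digits
}

std::size_t total_key_length_reuse(const std::string& prefix, int count) {
    std::string buf;
    std::size_t total = 0;
    for (int i = 0; i < count; ++i) {
        make_key_into(buf, prefix, i);
        total += buf.size();
    }
    return total;
}
```

Whether this matters in a given program is exactly what sampling or a profiler should tell you; the point is only that the cost is invisible at the call site.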
AQtime (for Windows)
If you are running a Premium version of VS 2010 then you get a profiler with it.
I've also used a couple of other free ones, but they don't compare to the one MS ships. Useful as a second opinion, though.
If you have access to a Mac, then I recommend using Shark from the CHUD tools.
You can use the analyzer that's in Sun Studio 12 on Linux or Solaris. It's free. http://developers.sun.com/sunstudio/index.jsp
If you cannot locate DevPartner it is because we've moved under new ownership. Check us out on the Micro Focus website: http://www.microfocus.com/products/micro-focus-developer/devpartner/index.aspx. Shameless plug: I work on the DevPartner team. Our long awaited 64-bit versions of BoundsChecker and C++/.NET profilers ship on February 4, 2011. We've changed our pricing model so you can choose either the whole suite or just the performance profiler if that's what you need. Please check out the new DPS 10.5 release when it goes live!
Related
We've been running for years with BoundsChecker for Visual C++ 6 (I think it was BoundsChecker 5 or 6, too). We've upgraded to VS2008 (finally!), and now need a follow-up for the outdated BoundsChecker.
How's the landscape?
What tools are out there?
Any new kids in town?
Any new ideas dealing with the problems we used memory profilers for?
Your recent experiences with these tools?
Recommendations?
The main application is C++ with many COM DLLs; we are looking to track native, C++, and COM leaks and objects. At that size, BoundsChecker was already a pain performance-wise, and sorting out the slew of data and working around some of its limitations was a pain too.
Support for managed applications (primarily C#) is required, though that may be a separate tool.
Related (but IMO incomplete) question: Modern equivalent of BoundsChecker for Visual Studio 2008
[edit]
Regarding the comment, "In modern C++, you just use self-checking types, and bounds are never broken":
Reference-counted smart pointers can have cyclic references. Interfacing with COM components is inherently unsafe, as it requires a lot of manual memory management. I've had a UI-less third-party service leak GDI handles until it crashed our overnight tests; the vendor blamed it on a "strange" Microsoft API. I have to interface with C-based libraries, and I have tons of legacy code that assumes allocation trickery in the sense of Numerical Recipes is a good thing and that variable names longer than 3 letters are for typists. I have code from engineers for whom a std::vector<double>::iterator looks much scarier than a double***; good luck developing and testing that code without a solid background in signal processing.
So unless you come here, rewrite and encapsulate the core of a million lines of code in fool-proof C++ classes, and make sure a few dozen products still work as before, keep your smart-assery to yourself. I wish I didn't need a memory checker, but I do. Thank you.
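To make the cyclic-reference point concrete: two objects that own each other through `std::shared_ptr` keep each other's reference count above zero, so neither destructor ever runs, which is a leak that only a memory checker will notice. Making one direction a `std::weak_ptr` breaks the cycle. A minimal sketch; the type and function names are illustrative.

```cpp
#include <memory>

int destroyed_count = 0;  // incremented by every destructor below

// Leaky: each node owns the other, forming a reference cycle.
struct LeakyNode {
    std::shared_ptr<LeakyNode> peer;
    ~LeakyNode() { ++destroyed_count; }
};

// Fixed: ownership goes one way; the back-reference is non-owning.
struct SafeNode {
    std::shared_ptr<SafeNode> next;  // owning
    std::weak_ptr<SafeNode> prev;    // does not affect use_count
    ~SafeNode() { ++destroyed_count; }
};

void make_leaky_pair() {
    auto a = std::make_shared<LeakyNode>();
    auto b = std::make_shared<LeakyNode>();
    a->peer = b;
    b->peer = a;  // each use_count is now 2
}                 // locals released: counts drop to 1, nothing is destroyed

void make_safe_pair() {
    auto a = std::make_shared<SafeNode>();
    auto b = std::make_shared<SafeNode>();
    a->next = b;
    b->prev = a;  // weak: a's use_count stays 1
}                 // a reaches 0 and is destroyed, which releases b
```

Calling `make_leaky_pair()` leaks two objects on purpose; that is the pattern a leak checker should flag.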
Disclaimer and warning: I work for Micro Focus, owner of the DevPartner Studio and BoundsChecker products.
BoundsChecker 10.5, part of DevPartner Studio 10.5 (though you can buy it by itself), supports Visual Studio 2005, 2008 and 2010 unmanaged code for 32 and 64 bit applications in essentially the same way it supported 32 bit applications on Visual Studio 6.0. While enhancing it to support X64 applications, we found and fixed quite a few very old problems, and made a start at working in spite of the .NET 4.0 code present in some VS 2010 applications. I say "in spite of" because .NET 4.0 turns out to do a lot of very nasty things in the process space, doing some things that Microsoft warns everybody else not to do, and has a certain amount of built-in resistance to tools like BoundsChecker, which are essentially gigantic viruses.
Anyway, since that release (February 4th), we have updated it to work on Windows 7 SP1 (which isn't quite public yet), and as far as BoundsChecker is concerned, we work with Visual Studio 2010 SP1 as well. We also discovered a nasty .NET 4.0 trap and figured out how to prevent it from taking us down. These enhancements and fixes will be available in our next public update, hopefully within the next month or so.
I have a massive application (here at work), and the new BoundsChecker 10.5 (which now supports 64-bit apps) pretty much works with it. The trick, Max, is not to turn on all the checker features of DevPartner BoundsChecker at once. Turn on just memory leaks, or just one other feature, then run your app. And by all means exclude modules you don't need. There are quite a few settings you can tune so it runs faster. But yes, it does take a performance hit; that is the name of the ballgame.
Intel's Parallel Inspector gave us thousands and thousands of false positives, so we didn't use it.
Purify only works on 32-bit apps, i.e., small 32-bit native apps. Forget about using it with a managed C++ app.
And just for the record, if you have a large 32-bit app, memory analysis tools in general won't work very well, because of their massive memory overhead. Since you have very limited memory in a 32-bit address space, you quickly run out of room, and the tools fail.
We evaluated BoundsChecker, Intel's Inspector and Purify.
They were all more or less crap.
For our main application, BoundsChecker still would not start it after many hours; it only worked for a couple of smaller applications, but it did find a couple of things (I think we're still in contact with them to figure things out).
Intel's Inspector works, but it does not instrument the code; it runs on the executable only (maybe it works better when used with the whole suite of Intel products).
Purify failed miserably; we were never able to use it.
We're still in limbo about that.
Max.
BoundsChecker: I just bought a (&(^ subscription that only entitles me to use the damned product for 99 days (so I'm pretty damned upset about that), but anyhow, I was having big memory troubles and thought I ought to run this thing. It seems to catch lots of interesting things, but it is so damned slow that, well, put it this way: my application is still in the DLL init code; it has been running for at least a couple of hours, and so far it hasn't even gotten as far as the app normally does in the first COUPLE OF SECONDS. BoundsChecker used to be 'the shit' back in the NuMega days, but it seems like it is really another technological orphan being peddled by an opportunistic business entity, like the Borland compilers.
So I really like it when it works; it has lots of great info. I just need to see whether I will actually be able to get any decent results. It is currently using 4+ GB of RAM and hasn't even started up fully yet. Since I use Win7/64 with a crippled Home edition that will only recognize 12GB, I may run out of memory before anything really interesting happens. And that will be sometime a few days from now...
I have an application that runs on an embedded processor (ARM), and I'd like to profile the application to get an idea of where it's using system resources, like CPU, memory, IO, etc. The application is running on top of Linux, so I'm assuming there's a number of profiling applications available. Does anyone have any suggestions?
Thanks!
edit: I should also add the version of Linux we're using is somewhat old (2.6.18). Unfortunately I don't have a lot of control over that right now.
As bobah said, gprof and valgrind are useful. You might also want to try OProfile. If your application is in C++ (as indicated by the tags), you might want to consider disabling exceptions (if your compiler lets you) and avoiding dynamic casts, as mentioned above by sashang. See also Embedded C++.
If your Linux is not very limited, you may find gprof and valgrind useful.
On a related note, the C++ working group did a technical report on the performance cost of various C++ language features. For example, they analyze the cost of a dynamic_cast one or two levels deep. The report is here: http://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf and it might give you some insight into where the pain points in your embedded application might be.
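Following the advice above about avoiding dynamic casts, one common restructuring is to replace a chain of `dynamic_cast` type tests with a virtual function, so each call is a single indirect dispatch instead of a run-time type check per candidate type. The virtual version also keeps working if RTTI is disabled (for example with GCC's `-fno-rtti`). Whether the difference matters in your application is what the report and a profiler should tell you; the shape classes are hypothetical.

```cpp
#include <memory>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;  // used by the virtual-dispatch version
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

struct Rect : Shape {
    double w, h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area() const override { return w * h; }
};

// Type-test version: each dynamic_cast is a run-time type check (needs RTTI).
double area_by_cast(const Shape& s) {
    if (auto sq = dynamic_cast<const Square*>(&s)) return sq->side * sq->side;
    if (auto r = dynamic_cast<const Rect*>(&s)) return r->w * r->h;
    return 0.0;
}

double total_area_cast(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double total = 0.0;
    for (const auto& s : shapes) total += area_by_cast(*s);
    return total;
}

// Virtual version: one indirect call per element, no type tests.
double total_area_virtual(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double total = 0.0;
    for (const auto& s : shapes) total += s->area();
    return total;
}
```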
gprof may disappoint you.
Assuming the program you are testing is big enough to be useful, chances are the call tree could be pruned, so the best opportunities for optimization are function/method calls that you can remove or avoid. That link shows a good way to find them.
Many people approach this as sort of a hierarchical sleuthing process of measuring times.
Or you can simply catch it in the act, which is what I do.
I'm looking for a good multi-thread-aware debugger capable of showing performance charts of application threads on Linux. I don't know if such a thing exists, perhaps as an Eclipse plugin.
The idea would be to track per-thread memory allocation and CPU usage, as well as being able to interrupt a thread and examine its stack trace, local vars, etc.
It does not have to be an Eclipse plugin or a free tool; have any of you heard of something similar?
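While looking for a tool, the per-thread CPU usage part can at least be measured programmatically on Linux. A minimal sketch, assuming Linux/POSIX, C++14, and compiling with `-pthread`: `clock_gettime` with `CLOCK_THREAD_CPUTIME_ID` returns the CPU time of the calling thread (`pthread_getcpuclockid` gives a clock for some other thread). The worker functions are hypothetical, and this does not cover per-thread memory allocation.

```cpp
#include <time.h>  // clock_gettime, CLOCK_THREAD_CPUTIME_ID (POSIX)
#include <chrono>
#include <thread>

// CPU time consumed so far by the calling thread, in seconds.
double thread_cpu_seconds() {
    timespec ts{};
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) / 1e9;
}

// Busy worker: burns CPU, then records its own CPU time.
void busy_worker(double& cpu_out) {
    volatile unsigned long long acc = 0;  // volatile keeps the loop from being optimized away
    for (unsigned long long i = 0; i < 50'000'000ULL; ++i) acc = acc + i;
    cpu_out = thread_cpu_seconds();
}

// Idle worker: mostly sleeps, so its CPU time should stay near zero.
void idle_worker(double& cpu_out) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    cpu_out = thread_cpu_seconds();
}
```

Start each worker with `std::thread(worker, std::ref(result))`, join, and compare the results; the busy thread should report far more CPU time than the sleeping one even though both ran for a similar wall-clock time.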
Qt Creator does provide information on a per-thread basis. It also has the features you would expect from any standard debugger. (Watches, breakpoints, etc.)
Although designed for compiling Qt applications, it can be used for just about any C++ project. (I have used it for compiling/editing a non-Qt app before.)
TotalView (and MemoryScape) doesn't do precisely what you're asking for in its default presentation, but it provides the data you need. It costs money, but a better C++ debugger for Linux cannot be found.
Free trials are available, and there are a number of cool and useful videos on their support site.
If you're on Linux, you've got access to one of the most powerful debugging tools in the trade: Valgrind. Read about it, especially about its additional tools like Helgrind.
Sure, the visualisation is lacking compared to commercial tools, but you can't beat its level of detail.
Does Windows have any decent sampling (i.e., non-instrumenting) profilers available? Preferably something akin to Shark on Mac OS, although I am willing to accept that I am going to have to pay for such a profiler on Windows.
I've tried the profiler in VS Team Suite and was not overly impressed, and was wondering if there were any other good ones.
[Edit: Erk, I forgot to say this is for C/C++, rather than .NET. Sorry for any confusion.]
For Windows, check out the free Xperf that ships with the Windows SDK. It uses sampled profiling, has some useful UI, & does not require instrumentation. Quite useful for tracking down performance problems. You can answer questions like:
Who is using the most CPU? Drill down to function name using call stacks.
Who is allocating the most memory?
Outstanding memory allocations (leaks)
Who is doing the most registry queries?
Disk writes? etc.
I know I'm adding my answer months after this question was asked, but I thought I'd point out a decent, open-source profiler: Very Sleepy.
It doesn't have the feature count that some of the other profilers mentioned before do, but it's a pretty respectable sampling profiler that will work very well in most situations.
Intel VTune is good and is non-instrumenting. We evaluated a whole bunch of profilers for Windows, and this was the best for working with driver code (though it does unmanaged user level code as well). A particular strength is that it reads all the Intel processor performance counters, so you can get a good understanding of why your code is running slowly, and it was useful for putting prefetch instructions into our code and sorting out data layout to work well with the cache lines, and the way cache lines get invalidated in multi core systems.
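The cache-line invalidation mentioned above is often "false sharing": independent variables updated by different cores happen to sit in the same cache line, so each write invalidates the line in the other cores. A minimal sketch of the data-layout fix, assuming C++17 (so `std::vector` honours over-aligned types) and a 64-byte cache line, which is typical for current x86 parts but should be checked for your CPU. C++17 also defines `std::hardware_destructive_interference_size`, though older standard libraries do not provide it. The type and function names are illustrative.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size; verify for the target processor.
constexpr std::size_t kCacheLine = 64;

// Each counter gets its own cache line, so neighbouring counters
// updated by different threads do not invalidate each other.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

// Unpadded counterpart: several of these share one cache line.
struct PlainCounter {
    std::atomic<long> value{0};
};

// One thread per counter; each thread only touches its own counter.
template <typename Counter>
long run_counters(unsigned threads, long iterations) {
    std::vector<Counter> counters(threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t)
        workers.emplace_back([&counters, t, iterations] {
            for (long i = 0; i < iterations; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& w : workers) w.join();
    long total = 0;
    for (auto& c : counters) total += c.value.load();
    return total;
}
```

Both variants compute the same totals; timing `run_counters<PlainCounter>` against `run_counters<PaddedCounter>` (with a wall-clock timer or under VTune's cache events) is one way to see the cost of the shared lines on a multi-core machine.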
It is commercial, and I have to say it isn't the easiest UI in the world.
AMD's CodeAnalyst is FREE here
We use both VTune and AQTime, and I can vouch for both. Which works best for you depends on your needs. Both have free trial versions - I suggest you give them a go.
The Windows Driver Kit includes a non-instrumenting user/kernel sampling profiler called "kernrate". It seems useful for profiling multi-process applications, applications that spend most of their time in the kernel, and device drivers (of course). It's also available in the KrView (Kernrate Viewer) and Windows Server 2003 Resource Kit Tools packages.
Kernrate works on Windows 2000 and later (unlike Xperf, which requires Vista / Server 2008). It's command-line based and the documentation has a somewhat intimidating list of options. I'm not sure if it can record call stacks or just the program counter. If you use a symbol server, make sure to put an up-to-date dbghelp.dll and symsrv.dll in the same directory as kernrate.exe to prevent it from using the ancient version of dbghelp.dll that is installed in %SystemRoot%\system32.
I tried Intel's VTune with a rather large project about two years ago. It was an instrumenting profiler then, and it took so long to instrument the DLL I was attempting to profile that I eventually lost patience after an hour.
The one tool I have had quite good success with, and which I would highly recommend, is AQTime. It not only provides excellent performance profiling resources, it also does really good memory profiling, which has been of significant help to me in tracking down memory leaks.
Luke Stackwalker seems promising. It's not as polished as I'd like, but it is open source, and it does something that seems very close to what @Mike Dunlavey keeps saying we ought to do. (Of course, it then tries to smoosh it all down into the typically unhelpful call graphs that Mike is so weary of, but it shouldn't be too hard to fix that with the source as our ally.)
It even seems to count time spent waiting in the kernel, as far as I can tell...
I'm not sure what a non-instrumenting profiler is, but I can say for .NET I love RedGate's ANTS Profiler. Version 3 beats the MS version for ease of use and Version 4, which allows arbitrary time slices, makes MS look like a joke.
What good profilers do you know?
What is a good way to measure and tweak the performance of a C++ MFC application?
Is analysis of algorithms really necessary? http://en.wikipedia.org/wiki/Algorithm_analysis
I strongly recommend AQTime if you are staying on the Windows platform. It comes with a load of profilers, including static code analysis, and works with most important Windows compilers and systems, including Visual C++, .NET, Delphi, Borland C++, Intel C++ and even gcc. And it integrates into Visual Studio, but can also be used standalone. I love it.
If you're (still) using Visual C++ 6.0, I suggest using the built-in profiler. For more recent versions you could try Compuware DevPartner Performance Analysis Community Edition.
For Windows, check out Xperf, which ships free with the Windows SDK. It uses sampled profiling, has some useful UI, & does not require instrumentation. Quite useful for tracking down performance problems. You can answer questions like:
Who is using the most CPU? Drill down to function name using call stacks.
Who is allocating the most memory?
Who is doing the most registry queries?
Disk writes? etc.
You will be quite surprised when you find the bottlenecks, as they are probably not where you expected!
It's been a while since I profiled unmanaged code, but when I did, I had good results with Intel's VTune. I'm sure somebody else will tell us if it has been overtaken.
Algorithmic analysis has the potential to improve your performance more profoundly than anything you'll find with a profiler, but only for certain classes of application. If you operate over reasonably large sets of data, algorithmic analysis might find ways to be more efficient in CPU/Memory/both, but if your app is mainly form-fill with a relational database for storage, it might not offer you much.
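As a generic example of the kind of win algorithmic analysis finds (and a profiler only hints at): a duplicate check written as a nested loop is quadratic in the input size, while a hash-set version is expected linear. Both functions are illustrative, not taken from any particular codebase.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// O(n^2): compares every pair. Fine for tiny inputs, painful for large ones.
bool has_duplicate_quadratic(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) return true;
    return false;
}

// Expected O(n): a single pass with a hash set.
bool has_duplicate_hashed(const std::vector<int>& v) {
    std::unordered_set<int> seen;
    seen.reserve(v.size());
    for (int x : v)
        if (!seen.insert(x).second) return true;  // insert fails if x was already seen
    return false;
}
```

A profiler would show the nested loop as a hot spot; deciding that the fix is a different algorithm rather than a faster inner loop is the analysis step.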
Intel Thread Checker via VTune Performance Analyzer: check this picture for the view I use the most, which tells me which function eats up most of my time.
I can drill down further and see which functions inside them take more time, etc. There are different views depending on what you are looking at: total time (time within the function plus its children), self time (time spent only in code inside the function itself), etc.
This tool does a lot more than profiling, but I haven't explored it all. I would definitely recommend it. It is also available for download as a fully functional trial version that runs for 30 days. If you have cost constraints, I would say this window is all you need to pinpoint your problem.
Trial download here - https://registrationcenter.intel.com/RegCenter/AutoGen.aspx?ProductID=907&AccountID=&ProgramID=&RequestDt=&rm=EVAL&lang=
PS: I have also played with Rational, but for some reason I did not take to it. I suspect Rational might be more expensive than Intel, too.
Tools (like TrueTime from DevPartner) that let you see hit counts for source lines let you quickly find algorithms with bad "Big O" complexity. You still have to analyse the algorithm to determine how to reduce the complexity.
Having both AQTime and Compuware's DevPartner, I second AQTime for most cases. The reason is that AQTime will profile any executable that has a valid PDB file, whereas TrueTime requires you to make an instrumented build. This greatly speeds up and simplifies ad hoc profiling. DevPartner is also quite a bit more expensive, if that is an issue. Where DevPartner comes into its own is with BoundsChecker, which I still rate as a better tool for catching leaks and overwrites than AQTime's execution profiler. TrueTime can be slightly more accurate than AQTime, but I have never found this to be an issue.
Is profiling worthwhile? IMO, yes, if you need performance gains on a local application. I think you also learn a lot about how your program and algorithms really work, and about the cost implications of using certain types of object classes for storing and iterating through your data.
Glowcode is a very nice profiler (when it works). It can attach to a running program and requires only symbol files - you don't need to rebuild.
Some versions of Visual Studio 2005 (and maybe 2008) actually come with a pretty good performance profiler.
If you have it, it should be available under the Tools menu,
or you can search for a way to open the "performance explorer" window to start a new performance session.
A link to MSDN
FYI, some versions of Visual Studio only come with a non-optimizing compiler. For one of my MFC apps, if I compile it with MinGW/MSYS (the gcc compiler) with -O3, it runs about 5-10x as fast as my release compile with Visual Studio.
For example, I have an OpenStreetMap XML compiler, and the gcc-compiled version takes about 3 minutes to process a 2.7GB XML file. My Visual Studio compile of the same code takes about 18 minutes.