Performance profiling on Linux - c++

What are the best tools for profiling C/C++ applications on *nix?
(I'm hoping to profile a server that is a mix of (blocking) file IO, epoll for network and fork()/execv() for some heavy lifting; but general help and more general tools are all also appreciated.)
Can you get the big system picture of RAM, CPU, network and disk all in one overview, and drill into it?
There's been a lot of talk on the kernel lists about things like perf timechart, but I haven't found anything turning up in Ubuntu yet.

I recommend taking stackshots, for which pstack is useful. Here's some more information:
Comments on gprof.
How stackshots work.
A blow-by-blow example.
A very short explanation.
If you want to spend money, Zoom looks like a pretty good tool.

For performance, you can try Callgrind, a Valgrind tool. Here is a nice article showing it in action.

Compile with -pg, run the program, and then use gprof.
Compiling (and linking) with -pg adds profiling code and the profiling libraries to the executable, which then produces a file called gmon.out that contains the timing information. gprof displays call graphs and their (absolute and relative) timings.
See man gprof for details.
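As a rough sketch of that workflow, here is a small workload you could profile this way. The file name workload.cpp and the function names are made up for illustration, and the commands in the comments assume a GCC toolchain; the example only aims to give gprof one function that is clearly more expensive than the other.

```cpp
// Hypothetical file: workload.cpp
// Build and profile (GCC assumed; add a main() such as the one shown below):
//   g++ -pg workload.cpp -o workload
//   ./workload                  # writes gmon.out in the current directory
//   gprof ./workload gmon.out   # flat profile and call graph
#include <cassert>
#include <cstdint>

// Deliberately expensive: counts 1 + 2 + ... + n one increment at a time, O(n^2).
std::uint64_t slow_triangle(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i)
        for (std::uint64_t j = 0; j < i; ++j)
            ++total;
    return total;
}

// Same value from the closed form, O(1).
std::uint64_t fast_triangle(std::uint64_t n) {
    return n * (n + 1) / 2;
}

// Illustrative driver: calls both so that both show up in the call graph.
std::uint64_t run_workload(std::uint64_t n) {
    const std::uint64_t slow = slow_triangle(n);
    const std::uint64_t fast = fast_triangle(n);
    assert(slow == fast);  // sanity check: both implementations must agree
    return slow;
}

// A minimal main() for the profiled binary could be:
//   int main() { run_workload(20000); }
```

Building without optimization keeps both functions from being inlined, so they should appear as separate entries in the gprof output.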

The canonical example of a full system profiling tool (for Solaris, OS X, FreeBSD) is DTrace. But it is not yet fully available on Linux (you can try here but the site is down for me at the moment, and I haven't tried it myself). There are many tools, in various states of usefulness, for doing full system profiling and kernel profiling on Linux.
You might consider investigating:
oprofile
SystemTap
bootchart
strace (e.g. this SO answer)

Allinea MAP is a profiler for C++ and other native languages on Linux. It is commercially supported by my employer. It has a graphical interface and source-line level profiling and profiles code with almost no slowdown which makes it very accurate where timing of other subsystems is relevant - such as for IO.
Callgrind has been useful and accurate - but the slowdown was ~5x so I could only do smaller runs. It can actually count the number of times a function is called which is useful for understanding asymptotic behavior.
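To illustrate why call counts help with asymptotic behavior, here is a minimal sketch that counts calls by hand. It is a stand-in for the per-function call counts that Callgrind reports, not Callgrind itself; CallCounter, fib and calls_for are names invented for this example.

```cpp
// With Callgrind you would get call counts without touching the code, e.g.:
//   valgrind --tool=callgrind ./prog
//   callgrind_annotate callgrind.out.<pid>
// The manual counter below only mimics that one number for a single function.
#include <cassert>
#include <cstdint>

struct CallCounter {
    std::uint64_t calls = 0;
};

// Naive recursive Fibonacci; every invocation bumps the counter.
std::uint64_t fib(unsigned n, CallCounter& counter) {
    ++counter.calls;
    return n < 2 ? n : fib(n - 1, counter) + fib(n - 2, counter);
}

// Returns how many times fib() ran in order to compute fib(n).
std::uint64_t calls_for(unsigned n) {
    assert(n <= 40 && "naive fib grows exponentially; keep n small");
    CallCounter counter;
    fib(n, counter);
    return counter.calls;
}
```

Comparing calls_for(10) with calls_for(20) shows the count growing far faster than n, which is the kind of signal a wall-clock figure from a single small run can hide.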

A description of using -pg and gprof is here: http://www.ibm.com/developerworks/library/l-gnuprof.html

oprofile might interest you. Ubuntu should have all the packages you need.

If you can take your application to FreeBSD, OS X, or Solaris, you can use DTrace, although DTrace is an analyst-oriented tool: you need to drive it, that is, script it. Nothing else can give you the level of granularity you need; DTrace can not only profile the latencies of function calls in user-land, it can also follow a context switch into the kernel.

As mentioned in the accepted answer, Zoom can do some amazing things. I've used it to understand thread behavior all the way down to optimizing the assembly generated by the compiler.

The FOSS answer, as already mentioned, is to build with -pg and then use gprof to analyse the output. If it's a product/project that justifies throwing some money at it, I would also be tempted to use IBM Rational's Quantify profiler, as that makes it easier to view the profiling data, drill down to the line level, or look at it from a '10000ft' level.
Of course there might be a viewer for gprof available that can do the same thing, but I am not aware of any.

Related

Difference between Very sleepy and Callgrind for C++ profiling

I am trying to learn the difference between Very Sleepy and Callgrind for profiling. The code that I intend to profile is written in C++ and works under both Linux and Windows.
On Linux, I was able to use Callgrind to look at the self and inclusive relative costs. From what I understand, Callgrind uses an instrumented profiling technique and takes considerable time. However, Very Sleepy uses statistical profiling and is very quick. Since the two use different approaches to profiling, I cannot compare the results from them.
Is there a way that I can do some sort of profile comparison on both Linux and Windows? Unfortunately, Callgrind is unavailable for Windows and Very Sleepy is unavailable for Linux.
No. Such a comparison is between two unlike things. Use sampling when you need accurate timings and cannot afford the overhead of instrumentation. Use instrumentation when you need to understand control flow over time.
Although I couldn't get an answer to the first question, I have found a new tool that works on both Windows and Linux for C++ code profiling. It's called CodeXL, it's from AMD, and it's free.
http://developer.amd.com/tools-and-sdks/opencl-zone/codexl/
As a bonus, if you have AMD processors or Catalyst graphics processors, some of the other capabilities of the tool become available.

profiling in solaris

Can anyone suggest a good tool to profile a program compiled with SunCC compiler.
Also please suggest a good equivalent of valgrind for the same.
DTrace is the best tool for profiling [in] the universe.
DTrace is a comprehensive dynamic tracing framework for the Solaris™ Operating Environment. DTrace provides a powerful infrastructure to permit administrators, developers, and service personnel to concisely answer arbitrary questions about the behavior of the operating system and user programs.
It's not marketing, it really allows just that.
The Solaris Dynamic Tracing Guide describes how to use DTrace to observe, debug, and tune system behavior. The DTrace guide also includes a complete reference for bundled DTrace observability tools and the D programming language.
DTrace is also available in Mac OS X (there's a nice GUI for it, Instruments), and a FreeBSD port that has only kernel mode providers is also available.
The Sun Studio compilers include Performance Analyzer for profiling and Memory Runtime Checking features in the dbx debugger.
See also the answers to Locate bad memory access on Solaris.
On SPARC hardware, you may want to consider IBM Rational Quantify for performance profiling.
On the cheap, you can get away with pstack sampling, prstat -vL, and instrumenting your application with gethrtime().
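A minimal sketch of the gethrtime() instrumentation idea follows. It assumes gethrtime() from <sys/time.h> on Solaris and falls back to std::chrono::steady_clock elsewhere so that it still compiles on other systems; now_ns, Section and ScopedTimer are names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__sun)
#include <sys/time.h>  // gethrtime(): nanoseconds from an arbitrary fixed point
#else
#include <chrono>      // fallback clock for non-Solaris builds (an assumption)
#endif

// Monotonic timestamp in nanoseconds.
inline std::int64_t now_ns() {
#if defined(__sun)
    return static_cast<std::int64_t>(gethrtime());
#else
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
               steady_clock::now().time_since_epoch()).count();
#endif
}

// Accumulated cost of one instrumented region of the program.
struct Section {
    explicit Section(const char* n) : name(n), total_ns(0), calls(0) {}
    const char* name;
    std::int64_t total_ns;
    std::int64_t calls;
};

// RAII helper: adds the lifetime of this object to the given Section.
class ScopedTimer {
public:
    explicit ScopedTimer(Section& section)
        : section_(section), start_(now_ns()) {}
    ~ScopedTimer() {
        const std::int64_t elapsed = now_ns() - start_;
        assert(elapsed >= 0);  // the clock is expected to be monotonic
        section_.total_ns += elapsed;
        ++section_.calls;
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    Section& section_;
    std::int64_t start_;
};
```

In use, you would keep one Section per region of interest, create a ScopedTimer at the top of the block you want to measure, and print total_ns and calls for each Section when the program exits.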

Profiling embedded application

I have an application that runs on an embedded processor (ARM), and I'd like to profile the application to get an idea of where it's using system resources, like CPU, memory, IO, etc. The application is running on top of Linux, so I'm assuming there's a number of profiling applications available. Does anyone have any suggestions?
Thanks!
edit: I should also add the version of Linux we're using is somewhat old (2.6.18). Unfortunately I don't have a lot of control over that right now.
As bobah said, gprof and valgrind are useful. You might also want to try OProfile. If your application is in C++ (as indicated by the tags), you might want to consider disabling exceptions (if your compiler lets you) and avoiding dynamic casts, as mentioned above by sashang. See also Embedded C++.
If your Linux is not very limited, then you may find gprof and valgrind useful.
On a related note, the C++ working group did a technical report on the performance cost of various C++ language features. For example, they analyze the cost of a dynamic_cast one or two levels deep. The report is here: http://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf and it might give you some insight into where the pain points in your embedded application might be.
gprof may disappoint you.
Assuming the program you are testing is big enough to be useful, then chances are the call tree could be pruned, so the best opportunities for optimization are function/method calls that you can remove or avoid. That link shows a good way to find them.
Many people approach this as sort of a hierarchical sleuthing process of measuring times.
Or you can simply catch it in the act, which is what I do.

Good c++ profiler for GCC

I tried to find a related question, but all previous questions are about profilers for native C++ on Windows. I googled a while and learned about gprof, but the output of gprof actually contained a lot of obscure internal functions. Is there a good open-source C++ profiler with good documentation?
Valgrind
I totally recommend this
http://en.wikipedia.org/wiki/Valgrind
Don't use gprof, for the reasons given here.
What you need are stackshots, explained here. One way to take stackshots is the pstack utility. Another way is to use "Pause" or ctrl-break under the debugger. Also lsstack, if you can get a copy.
If you want to spend money, RotateRight makes a nice tool based on stack sampling called Zoom.
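Once you have a handful of stackshots, the reasoning is simple arithmetic: a function that appears on some fraction of the samples is responsible for roughly that fraction of the run time. Here is a small sketch of that tally, assuming each stackshot has already been reduced to a list of function names (for example by hand from pstack output); the type alias and function name are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// One stackshot: the function names on the call stack, outermost first.
using Stack = std::vector<std::string>;

// For each function, the fraction of samples whose stack contains it.
std::map<std::string, double> inclusive_fraction(
        const std::vector<Stack>& samples) {
    assert(!samples.empty());
    std::map<std::string, std::size_t> hits;
    for (const Stack& stack : samples) {
        // Count a function once per sample, even if it recurses.
        const std::set<std::string> seen(stack.begin(), stack.end());
        for (const std::string& fn : seen) ++hits[fn];
    }
    std::map<std::string, double> fraction;
    for (const auto& entry : hits) {
        fraction[entry.first] =
            static_cast<double>(entry.second) / samples.size();
    }
    return fraction;
}
```

With only ten or twenty samples the fractions are coarse, but anything that shows up on a large share of them is a real candidate for removal or rework.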
Compile using the flag -pg and use gprof.
If you don't mind the KDE library dependencies, KCachegrind is very useful with the added visualization. It depends on Callgrind and Valgrind, as one could have guessed, so no special compiler flags are required at compile time.
I've heard oprofile is really, really good for real time apps. Linux only though, AFAIK.
How much detail do you need in your profile reports? If you just want to do some really simple time profiling for a few functions, then the new functionality available via the C++11 chrono classes makes it easy to profile in a cross-platform, cross-compiler way.
See this article for some simple profiling code that works similarly to Matlab's super easy to use tic and toc functions.
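For a sense of what such a helper might look like, here is a minimal tic/toc sketch built on std::chrono::steady_clock. It is not the code from the linked article; the class name TicToc and its interface are assumptions made for this example.

```cpp
#include <cassert>
#include <chrono>

// Minimal stopwatch in the spirit of Matlab's tic and toc.
class TicToc {
public:
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals

    // Start (or restart) the stopwatch.
    void tic() {
        start_ = Clock::now();
        running_ = true;
    }

    // Seconds elapsed since the last tic(); the stopwatch keeps running.
    double toc() const {
        assert(running_ && "toc() called before tic()");
        const std::chrono::duration<double> elapsed = Clock::now() - start_;
        return elapsed.count();
    }

private:
    Clock::time_point start_{};
    bool running_ = false;
};
```

Usage is a matter of calling tic() before the code of interest and printing toc() after it. steady_clock is used rather than system_clock so that adjustments to the wall clock cannot distort the measured interval.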

Decent profiler for Windows? [duplicate]

Does Windows have any decent sampling (i.e. non-instrumenting) profilers available? Preferably something akin to Shark on MacOS, although I am willing to accept that I am going to have to pay for such a profiler on Windows.
I've tried the profiler in VS Team Suite and was not overly impressed, and was wondering if there were any other good ones.
[Edit: Erk, I forgot to say this is for C/C++, rather than .NET -- sorry for any confusion]
For Windows, check out the free Xperf that ships with the Windows SDK. It uses sampled profiling, has some useful UI, and does not require instrumentation. Quite useful for tracking down performance problems. You can answer questions like:
Who is using the most CPU? Drill down to function name using call stacks.
Who is allocating the most memory?
Outstanding memory allocations (leaks)
Who is doing the most registry queries?
Disk writes? etc.
I know I'm adding my answer months after this question was asked, but I thought I'd point out a decent, open-source profiler: Very Sleepy.
It doesn't have the feature count that some of the other profilers mentioned before do, but it's a pretty respectable sampling profiler that will work very well in most situations.
Intel VTune is good and is non-instrumenting. We evaluated a whole bunch of profilers for Windows, and this was the best for working with driver code (though it does unmanaged user level code as well). A particular strength is that it reads all the Intel processor performance counters, so you can get a good understanding of why your code is running slowly, and it was useful for putting prefetch instructions into our code and sorting out data layout to work well with the cache lines, and the way cache lines get invalidated in multi core systems.
It is commercial, and I have to say it isn't the easiest UI in the world.
AMD's CodeAnalyst is FREE here
We use both VTune and AQTime, and I can vouch for both. Which works best for you depends on your needs. Both have free trial versions - I suggest you give them a go.
The Windows Driver Kit includes a non-instrumenting user/kernel sampling profiler called "kernrate". It seems useful for profiling multi-process applications, applications that spend most of their time in the kernel, and device drivers (of course). It's also available in the KrView (Kernrate Viewer) and Windows Server 2003 Resource Kit Tools packages.
Kernrate works on Windows 2000 and later (unlike Xperf, which requires Vista / Server 2008). It's command-line based and the documentation has a somewhat intimidating list of options. I'm not sure if it can record call stacks or just the program counter. If you use a symbol server, make sure to put an up-to-date dbghelp.dll and symsrv.dll in the same directory as kernrate.exe to prevent it from using the ancient version of dbghelp.dll that is installed in %SystemRoot%\system32.
I have tried Intel's vtune with a rather large project about two years ago. It was an instrumenting profiler then and it took so long to instrument the DLL that I was attempting to profile that I eventually lost patience after an hour.
The one tool that I have had quite good success with, and which I would highly recommend, is AQTime. It not only provides excellent performance profiling resources but also does really good memory profiling, which has been of significant help to me in tracking down memory leaks.
Luke Stackwalker seems promising -- it's not as polished as I'd like, but it is open source and it does do something that seems very close to what @Mike Dunlavey keeps saying we ought to do. (Of course, it then tries to smoosh it all down into the typically-unhelpful call graphs that Mike is so weary of, but it shouldn't be too hard to fix that with the source as our ally.)
It even seems to count time spent waiting in the kernel, as far as I can tell...
I'm not sure what a non-instrumenting profiler is, but I can say for .NET I love RedGate's ANTS Profiler. Version 3 beats the MS version for ease of use and Version 4, which allows arbitrary time slices, makes MS look like a joke.