Substitute for google-perftools that works on 64-bit Linux - C++

As some of you might know, Google provides for free a great collection of tools for analyzing C++ code:
http://code.google.com/p/google-perftools/
The problem is that there is apparently a libunwind issue on 64-bit systems, and the authors can't do anything on their side to fix it ("But I don't expect a fix anytime soon: it depends on the libc folks and the libunwind folks working out some locking issues. There's unfortunately not much we ourselves can do."), so I'm searching for a replacement.
Is there any similar tool that provides a nice graphical representation of profiling data?
EDIT: a paste from the README that explains the problem:
2) On x86-64 64-bit systems, while tcmalloc itself works fine, the
cpu-profiler tool is unreliable: it will sometimes work, but sometimes
cause a segfault. I'll explain the problem first, and then some
workarounds.
Note that this only affects the cpu-profiler, which is a
google-perftools feature you must turn on manually by setting the
CPUPROFILE environment variable. If you do not turn on cpu-profiling,
you shouldn't see any crashes due to perftools.
The gory details: The underlying problem is in the backtrace()
function, which is a built-in function in libc. Backtracing is fairly
straightforward in the normal case, but can run into problems when
having to backtrace across a signal frame. Unfortunately, the
cpu-profiler uses signals in order to register a profiling event, so
every backtrace that the profiler does crosses a signal frame.
In our experience, the only time there is trouble is when the signal
fires in the middle of pthread_mutex_lock. pthread_mutex_lock is
called quite a bit from system libraries, particularly at program
startup and when creating a new thread.
The solution: The dwarf debugging format has support for 'cfi
annotations', which make it easy to recognize a signal frame. Some OS
distributions, such as Fedora and gentoo 2007.0, already have added
cfi annotations to their libc. A future version of libunwind should
recognize these annotations; these systems should not see any
crashes.
Workarounds: If you see problems with crashes when running the
cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
your code, rather than setting CPUPROFILE. This will profile only
those sections of the codebase. Though we haven't done much testing,
in theory this should reduce the chance of crashes by limiting the
signal generation to only a small part of the codebase. Ideally, you
would not use ProfilerStart()/ProfilerStop() around code that spawns
new threads, or is otherwise likely to cause a call to
pthread_mutex_lock!
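For reference, the ProfilerStart()/ProfilerStop() workaround described above looks roughly like this; a minimal sketch, assuming the gperftools CPU profiler is linked in with -lprofiler (on older installs the header is <google/profiler.h> rather than <gperftools/profiler.h>), and hot_section() is a hypothetical stand-in for the code you actually want profiled:
#include <gperftools/profiler.h>   // older installs: <google/profiler.h>

// Hypothetical stand-in for the part of the program you actually want to profile.
void hot_section() {
    volatile double x = 0;
    for (int i = 0; i < 100000000; ++i) x += i * 0.5;
}

int main() {
    ProfilerStart("myprog.prof");  // start sampling here instead of setting CPUPROFILE
    hot_section();                 // keep thread creation and heavy locking outside this window
    ProfilerStop();                // stop sampling and flush myprog.prof
    return 0;
}
The resulting profile can then be inspected with pprof, e.g. pprof --text ./myprog myprog.prof.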
--- 17 May 2011

Valgrind has a collection of great tools, including Callgrind to profile the code. The GUI client for Callgrind and Cachegrind is KCachegrind.
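A minimal sketch of that workflow (the output file name includes the process ID of the run):
valgrind --tool=callgrind ./myprogram
# writes callgrind.out.<pid> in the current directory
kcachegrind callgrind.out.<pid>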

Related

How to get Valgrind to not instrument a specific shared object?

I'm using a proprietary library to import data, it uses the GPU (OpenGL) to transform the data.
Running my program through Valgrind (memcheck) causes the data import to take 8-12 hours (instead of a fraction of a second). I need to do my Valgrind sessions overnight (and leave my screen unlocked all night, since the GPU stuff pauses while the screen is locked). This is causing a lot of frustration.
I'm not sure if this is related, but Valgrind shows thousands of out-of-bound read/write errors in the driver for my graphics card:
==10593== Invalid write of size 4
==10593== at 0x9789746: ??? (in /usr/lib/x86_64-linux-gnu/dri/i965_dri.so)
(I know how to suppress those warnings).
I have been unable to find any ways of selectively instrumenting code, or excluding certain shared libraries from being instrumented. I remember using a tool on Windows 20 years or so ago that would skip instrumenting selected binaries. It seems this is not possible with memcheck:
Is it possible to make valgrind ignore certain libraries? -- 2010, says this is not possible.
Can I make valgrind ignore glibc libraries? -- 2010, solutions are to disable warnings.
Restricting Valgrind to a specific function -- 2011, says it's not possible.
using valgrind at specific point while running program -- 2013, no answers.
Valgrind: disable conditional jump (or whole library) check -- 2013, solutions are to disable warnings.
...unless things have changed in the last 6 or 7 years.
My question is: Is there anything at all that can be done to speed up the memory check? Or to not check memory accesses in certain parts of the program?
Right now the only solution I see is to modify the program to read data directly from disk, but I'd rather test the actual program I'm planning to deploy. :)
No, this is not possible. When you run an application under Valgrind, it is not running natively under the OS but rather in a virtual environment.
Some of the tools like Callgrind have options to control the instrumentation. However, even with the instrumentation off the application under test is still running under the Valgrind virtual environment.
There are a few things you can do to make things less slow:
Test an optimized build of your application. You will lose line number information as a result, however.
Turn off leak detection.
Avoid costly options like --track-origins.
The sanitizers are faster and can also detect stack overflows, but at the cost of requiring instrumentation.
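For example, a rough sketch of a faster Memcheck invocation along those lines (the flag values are illustrative, not a recommendation for every workload):
valgrind --tool=memcheck --leak-check=no --track-origins=no --undef-value-errors=no ./myprogram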

Fast cross-platform cpu profiling of C++ functions

I have written software in Qt for a company; it works with a hardware device and shows realtime plots. I have problems with its speed and I need to know which parts are CPU intensive, but there are lots of threads and events, and when I use Valgrind the application gets so slow that serial handlers don't work as expected, timeouts happen, and therefore I can't find what's going on. The code base is huge and simplification is almost impossible, because things depend on each other.
I'm developing on Mac OS X but the application runs on Linux. I wanted to know if there is a faster profiler than Valgrind out there, preferably one that works on Mac OS X, as Valgrind doesn't really work there (it says it does, but there are so many problems that it's just not practical). Thanks in advance.
P.S. If you need any more info, please comment instead of downvoting. I can't offer the code as it is proprietary, but I can say it is well written, and I'm a fairly experienced C++ programmer.

Stack trace running UNIX application

How can I perform a live stack trace on a running UNIX application, and are there any utilities that are useful in digesting the stack trace once it's done?
I'm looking to see if any functions are getting called more often than I would have expected them to be - the application works fine, it just recently slowed down and it doesn't appear that anything else in the system is responsible (no other processes are running with unusual memory/processor usage).
Profiling tools will show which bits of the program are taking up the CPU time. If you have to dig deeper, you may need other tooling. Depending on which flavor of Unix you're using, the tools will vary, as this is sometimes quite platform specific. This article discusses process monitoring on Linux. Different versions of Unix have different sets of utilities for functions that have to interact with the kernel (e.g. DTrace for Solaris); some do work across platforms.
Have you tried using an actual profiler on your application? This will help you much more than just a stack trace. Usually you just have to compile your application with profile information. Then you can run it and use the information written to determine in which functions the most time is being spent, number of calls, etc.
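With gprof, for instance, the workflow is roughly as follows (compiler flags and file names are the usual defaults; adjust to your build):
g++ -pg -O2 -o myapp main.cpp
./myapp
# the run writes gmon.out in the working directory
gprof ./myapp gmon.out > profile.txt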
How can I perform a live stack trace on a running UNIX application
gcore can be used to grab a core file for a live process. This will give you a snapshot of the stack traces for all threads. This approach may, however, be a little too heavyweight for your needs.
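A rough sketch of that approach, assuming the target process ID is 1234:
gcore 1234
# writes core.1234
gdb path/to/exec core.1234
(gdb) thread apply all bt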
If the suspect functions are system calls, you could try using strace to see what's going on.
In any case, I think the first port of call should be a profiler, such as gprof.
I'm guessing you want this at runtime without any debugger involvement. For that you could use glibc's backtrace functions. Documentation here and here (assuming Linux) just to get you started. This Linux Journal article might be of help also.
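A minimal sketch of the glibc backtrace API (Linux/glibc; compile with -rdynamic so symbol names show up in the output):
#include <execinfo.h>
#include <unistd.h>

// Print the current call stack to stderr using glibc's backtrace facilities.
void dump_stack() {
    void *frames[64];
    int n = backtrace(frames, 64);                   // capture up to 64 return addresses
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  // resolve them and write to stderr
}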
Your debugger may have the functionality to attach to a running process. With gdb this looks like
$ gdb path/to/exec 1234
where 1234 is the PID of the running process.
(But consider those answers that direct you to a profiling utility.)
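Once attached, something like the following dumps the stacks of all threads and then lets the process continue (plain gdb commands):
(gdb) thread apply all bt
(gdb) detach
(gdb) quit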

Memory counter - Collision Detection Project

I thought I would ask the experts - see if you can help me :o)
My son has written C++ code for Collision Detection using Brute Force and Octree algorithms.
He has used Debug etc., and to collect stats on memory usage he has used Windows Task Manager, which has given him all the end results he has needed so far. The results are not yet as they were expected to be (that Octree would use more memory overall).
His tutor has suggested he checks memory once each is "initialised" and then plot at points through the test.
He was pointed in the direction of Valgrind... but it looked quite complicated, and because he has autism, he is worried that it might affect his programmes :o)
Can anyone suggest a simple way to grab the information on memory, if not also frame rate and CPU usage?
Any help gratefully received, as I know nothing so can't help him at all, except for typing this on here - as it's the "social" environment he can't deal with.
Thanks
Rosalyn
For the memory leaks:
If you're on Windows, Visual C++ by Microsoft (the Express version is free) has a nice tool for debugging memory leaks that is easy to set up; instructions can be found here. Otherwise, if you're on Linux, Valgrind is one of the standards. I have used the Visual C++ tool often and it's a nice verification that you have no memory leaks. You can also use it to make your programs break on the allocation numbers that you get from the memory-leak log, so it quickly points you to when and where the leaking memory is being allocated. Again, it's easy to implement (just a few header files and then a single function call where you want to dump the leaks).
I have found the best way to implement the VC++ tool is to make the call to dump the memory leaks to the output window right before main returns a value. That way, you can catch the leaks of absolutely everything in your program. This works very well and I have used it for some advanced software.
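A minimal sketch of that CRT debug-heap setup (Windows/MSVC only; the macros and calls come from <crtdbg.h>, and the deliberate leak is just for demonstration):
#define _CRTDBG_MAP_ALLOC   // define before the CRT headers so reports can carry file/line info where available
#include <cstdlib>
#include <crtdbg.h>

int main() {
    // _CrtSetBreakAlloc(42);     // optional: break on an allocation number taken from a previous leak report
    int* leaked = new int[16];    // deliberately leaked for demonstration
    (void)leaked;
    _CrtDumpMemoryLeaks();        // dump the leak report to the Output window right before main returns
    return 0;
}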
For the framerate and CPU usage:
I usually use my own tools for benchmarking since they're not difficult to code once you learn the functions to call; this would usually require OS API calls, but I think Boost has that available and is cross-platform. There might be other tools out there that can track the process in the OS to get benchmarking data as well, but I'm not certain if they would be free or not.
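For the frame-rate part, a rough hand-rolled sketch along those lines, using std::chrono rather than OS-specific calls (the per-frame sleep is a stand-in for real work):
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    using clock = std::chrono::steady_clock;
    int frames = 0;
    auto windowStart = clock::now();
    for (int i = 0; i < 300; ++i) {   // stands in for the real render/update loop
        std::this_thread::sleep_for(std::chrono::milliseconds(16));  // hypothetical per-frame work
        ++frames;
        auto now = clock::now();
        if (now - windowStart >= std::chrono::seconds(1)) {
            std::cout << "FPS: " << frames << '\n';   // frames completed in the last second
            frames = 0;
            windowStart = now;
        }
    }
    return 0;
}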
It looks like you're running under a Windows system. This isn't a programming solution, and you may have already tried it (so feel free to ignore), but if not, you should take a look at Performance Monitor (it's one of the tools that ships with Windows). It'll let you track all sorts of useful stats about individual processes and the system as a whole (CPU, commit size, etc.). It plots the results for you as a graph as the program is running, and you can save the results off for future viewing.
On Windows 7, you get to it from here:
Control Panel\All Control Panel Items\Performance Information and Tools\Advanced Tools
Then Open Performance Monitor.
For older versions of Windows, it used to be one of the Administrative Tools options.

How to profile multi-threaded C++ application on Linux?

I used to do all my Linux profiling with gprof.
However, with my multi-threaded application, its output appears to be inconsistent.
Now, I dug this up:
http://sam.zoy.org/writings/programming/gprof.html
However, it's from a long time ago and in my gprof output, it appears my gprof is listing functions used by non-main threads.
So, my questions are:
In 2010, can I easily use gprof to profile multi-threaded Linux C++ applications? (Ubuntu 9.10)
What other tools should I look into for profiling?
Edit: added another answer on poor man's profiler, which IMHO is better for multithreaded apps.
Have a look at oprofile. The profiling overhead of this tool is negligible and it supports multithreaded applications, as long as you don't want to profile mutex contention (which is a very important part of profiling multithreaded applications).
Have a look at the poor man's profiler. Surprisingly, few other tools do both CPU profiling and mutex contention profiling for multithreaded applications; PMP does both, while not even requiring you to install anything (as long as you have gdb).
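The core of PMP is just repeated gdb stack snapshots; a simplified sketch of one snapshot (assuming the target PID is 1234; run it a handful of times and look for the stacks that keep recurring):
gdb -batch -ex "set pagination 0" -ex "thread apply all bt" -p 1234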
Try the modern Linux profiling tool perf (perf_events): https://perf.wiki.kernel.org/index.php/Tutorial and http://www.brendangregg.com/perf.html:
perf record ./application
# generates profile file perf.data
perf report
Have a look at Valgrind.
As Paul R said, have a look at Zoom. You can also use lsstack, which is a low-tech approach but surprisingly effective, compared to gprof.
Added: Since you clarified that you are running OpenGL at 33ms, my prior recommendation stands. In addition, what I personally have done in situations like that is both effective and non-intuitive. Just get it running with a typical or problematic workload, and just stop it, manually, in its tracks, and see what it's doing and why. Do this several times.
Now, if it only occasionally misbehaves, you would like to stop it only while it's misbehaving. That's not easy, but I've used an alarm-clock interrupt set for just the right delay. For example, if one frame out of 100 takes more than 33ms, at the start of a frame, set the timer for 35ms, and at the end of a frame, turn it off. That way, it will interrupt only when the code is taking too long, and it will show you why. Of course, one sample might miss the guilty code, but 20 samples won't miss it.
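A rough sketch of that alarm-clock trick using setitimer/SIGALRM (POSIX; render_frame() is hypothetical, and the handler simply aborts so you can inspect the core or catch it in a debugger):
#include <csignal>
#include <cstdlib>
#include <sys/time.h>

// Fires only when a frame overruns its budget: abort so the stack can be inspected.
void on_overrun(int) {
    std::abort();
}

// Arm a one-shot real-time timer, or disarm it when usec is 0.
void arm_timer(long usec) {
    itimerval t = {{0, 0}, {0, usec}};
    setitimer(ITIMER_REAL, &t, nullptr);
}

int main() {
    std::signal(SIGALRM, on_overrun);
    for (;;) {                 // stands in for the real render loop
        arm_timer(35000);      // 35 ms budget for this frame
        // render_frame();     // hypothetical per-frame work
        arm_timer(0);          // frame finished in time: disarm
    }
}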
I tried valgrind and gprof. It is a crying shame that none of them work well with multi-threaded applications. Later, I found Intel VTune Amplifier. The good thing is, it handles multi-threading well, works with most of the major languages, works on Windows and Linux, and has many great profiling features. Moreover, the application itself is free. However, it only works with Intel processors.
You can run pstack at random points, e.g. 10 or 20 times, to find out what the stack looks like at a given moment.
The most typical stack is where the application spends most of its time (according to experience, we can assume a Pareto distribution).
You can combine that knowledge with strace or truss (Solaris) to trace system calls, and pmap for the memory print.
If the application runs on a dedicated system, you have also sar to measure cpu, memory, i/o, etc. to profile the overall system.
Since you didn't mention non-commercial, may I suggest Intel's VTune. It's not free but the level of detail is very impressive (and the overhead is negligible).
Putting a slightly different twist on matters, you can actually get a pretty good idea of what's going on in a multithreaded application using ftrace and kernelshark. Collect the right trace and press the right buttons, and you can see the scheduling of individual threads.
Depending on your distro's kernel you may have to build a kernel with the right configuration (but I think that a lot of them have it built in these days).
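A minimal sketch of that workflow using trace-cmd (the usual command-line front end for ftrace) to record scheduler events and then browse them in kernelshark; the event name and application path are illustrative:
trace-cmd record -e sched_switch ./application
# writes trace.dat in the current directory
kernelshark trace.dat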
Microprofile is another possible answer to this. It requires hand-instrumentation of the code, but it seems like it handles multi-threaded code pretty well. And it also has special hooks for profiling graphics pipelines, including what's going on inside the card itself.