How can I use perf to profile my code? (C++)

I'm trying to use "perf" to see what's using all the CPU in my C++ program on Linux. I want to attach to a running process and get a list of symbols or line numbers that I can then go look at to optimize.

To attach to a process and see live updates of hotspots:
perf top -p $(pidof yourapp)
To attach to a process and record a profile for later analysis, do:
perf record -p $(pidof yourapp)
And later:
perf report
For both top and record you can add --call-graph dwarf for DWARF-based call graphs.
Note that you should compile your application with something like -O2 -g to get both optimizations and debug symbols; otherwise you won't see function names, file names, line numbers, and so on.
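To see the whole flow end to end, here is a minimal sketch; hot.cpp and the file names are illustrative placeholders, not part of the original question:

// hot.cpp -- a deliberately CPU-bound toy program to profile
#include <cmath>
#include <cstdio>

int main() {
    double sum = 0.0;
    for (long i = 1; i < 200000000L; ++i)
        sum += std::sqrt(static_cast<double>(i));   // the hot loop perf should flag
    std::printf("%f\n", sum);
    return 0;
}

g++ -O2 -g hot.cpp -o hot                        # optimized, with debug symbols
./hot &                                          # run it in the background
perf record --call-graph dwarf -p $(pidof hot)   # Ctrl-C to stop sampling
perf report                                      # browse the recorded hotspots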

Related

Problems with using gperftools on Mac OS X

I have found several conflicting answers on this topic. This blog post requires libunwind, but that doesn't work on Mac OS X. I included #include <google/profiler.h> in my code, but my compiler (g++) could not find the library. I installed gperftools via Homebrew. In addition, I found this Stack Overflow question showing this:
Then I ran pprof to generate the output:
[hidden ~]$ pprof --text ./a.out cpu.profile
Using local file ./a.out.
Using local file cpu.profile.
Removing __sigtramp from all stack traces.
Total: 282 samples
107 37.9% 37.9% 107 37.9% 0x000000010d72229e
16 5.7% 43.6% 16 5.7% 0x000000010d721a5f
12 4.3% 47.9% 12 4.3% 0x000000010d721de8
...
Running that command (without any of the prior steps) gets me this:
[hidden]$ pprof --text ./a.out cpu.profile
Using remote profile at ./a.out.
Failed to get the number of symbols from http://cpu.profile/pprof/symbol
Why does it try to access an internet site on my machine but a local file on theirs?
Attempting to link libprofiler as a dry run with g++ gets me:
[hidden]$ g++ -l libprofiler
ld: library not found for -llibprofiler
clang: error: linker command failed with exit code 1 (use -v to see invocation)
I have looked at the man pages, the help option text, the official online guide, blog posts, and many other sources.
I am so confused right now. Can someone help me use gperftools?
The result of my conversation with #osgx was this script. I tried to clean it up a bit. It likely contains quite a few unnecessary options too.
The blog post https://dudefrommangalore.wordpress.com/2012/02/09/profiling-c-code-using-google-performance-tools/ "Profiling C++ code using Google Performance Tools" 2012 by dudefrommangalore missed the essential step.
You should link the program you want to profile with the CPU profiler library of gperftools.
Check official manual: http://goog-perftools.sourceforge.net/doc/cpu_profiler.html, part "Linking in the Library"
add -lprofiler to the link-time step for your executable. (It's also probably possible to add in the profiler at run-time using LD_PRELOAD, but this isn't necessarily recommended.)
The second step is to collect the profile by running the code with profiling enabled. On Linux this is done by setting the CPUPROFILE environment variable before running:
CPUPROFILE=name_of_profile ./program_to_be_profiled
The third step is to use pprof (google-pprof on Ubuntu). Check that a non-empty name_of_profile file was generated; if there is no such file, pprof will try to do a remote profile fetch (which is the output you saw).
pprof ./program_to_be_profiled name_of_profile
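Putting the three steps together, a minimal session might look like this (main.cpp and myprog are placeholder names):

g++ -O2 -g main.cpp -o myprog -lprofiler   # step 1: link in the CPU profiler
CPUPROFILE=cpu.profile ./myprog            # step 2: the profile is written on exit
pprof --text ./myprog cpu.profile          # step 3: print a flat text report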
First you need to run your program with profiling enabled.
That usually means linking it with libprofiler and then running it with CPUPROFILE=cpu.profile set.
I.e.
$ CPUPROFILE=cpu.profile my_program
I think that latter step is what you have been missing.
The program will create the cpu.profile file when it exits. You can then use pprof (preferably the one from github.com/google/pprof) on it to visualize and analyze the results.
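If you would rather profile only a specific region instead of the whole run, gperftools also exposes ProfilerStart/ProfilerStop. A minimal sketch (current releases install the header as <gperftools/profiler.h>; very old ones used <google/profiler.h>):

// region.cpp -- profile only the bracketed region
// build with: g++ -O2 -g region.cpp -o region -lprofiler
#include <gperftools/profiler.h>
#include <cmath>

int main() {
    ProfilerStart("cpu.profile");           // begin sampling into cpu.profile
    double sum = 0.0;
    for (long i = 1; i < 100000000L; ++i)   // CPU-bound work to show up in the profile
        sum += std::sqrt(static_cast<double>(i));
    ProfilerStop();                         // flush and close the profile file
    return sum > 0.0 ? 0 : 1;
}

With explicit ProfilerStart/ProfilerStop calls the CPUPROFILE variable is not needed, since the file name is given in code.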

GNU Profiler Empty Flat Profile

I am trying to learn profiling with GNU gprof, and I just tried to reproduce the results from this tutorial by using exactly the same code and applying the same steps: http://www.ibm.com/developerworks/library/l-gnuprof.html
However, when I try to get the flat profile with gprof example1 gmon.out -p, I see nothing. It says
Each sample counts as 0.01 seconds. no time accumulated
and an empty flat profile. And when I try to obtain the call graph, I receive
gprof: gmon.out file is missing call-graph data
I am using Ubuntu on a VirtualBox VM. Could using VirtualBox be the reason?
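For reference, the tutorial's steps boil down to the commands below (example1.c is the tutorial's source file). One plausible cause of the empty output, worth checking before blaming VirtualBox, is that the program uses less CPU time than one 0.01 s sample, in which case gprof reports "no time accumulated":

gcc -pg example1.c -o example1   # -pg inserts the gprof instrumentation
./example1                       # must exit normally; writes gmon.out
gprof example1 gmon.out -p       # print the flat profile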

How to map PC (ARMv5) address to source code?

I'm developing on an ARM9E processor running Linux. Sometimes my application crashes with the following message :
[ 142.410000] Alignment trap: rtspserverd (996) PC=0x4034f61c
Instr=0xe591300c Address=0x0000000d FSR 0x001
How can I translate the PC address to actual source code? In other words, how can I make sense out of this message?
With objdump. Dump your executable, then search for 4034f61c:.
The -x, --disassemble, and -l options are particularly useful.
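For example, assuming the binary is called rtspserverd (substitute your own paths):

objdump -d -l rtspserverd > rtspserverd.lst   # disassembly, annotated with source lines if built with -g
grep -n "4034f61c:" rtspserverd.lst           # locate the faulting instruction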
You can turn on listings in the compiler and tell the linker to produce a map file. The map file will give you the meaning of the absolute addresses up to the function where the problem occurs, while the listing will help you pinpoint the exact location of the exception within the function.
For example in gcc you can do
gcc -Wa,-a,-ad -c foo.c > foo.lst
to produce a listing in the file foo.lst.
-Wa, sends the following options to the assembler (gas).
-a tells gas to produce a listing on standard output.
-ad tells gas to omit debug directives, which would otherwise add a lot of clutter.
The option for the GNU linker to produce a map file is -M or --print-map. If you link with gcc you need to pass the option to the linker with an option starting with -Wl,, for example -Wl,-M.
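For example, to write the map to a file rather than standard output, the GNU linker also accepts -Map=<file>:

gcc foo.c -o foo -Wl,-Map=foo.map   # foo.map lists sections and the address of every symbol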
Alternatively you could also run your application in the debugger (e.g. gdb) and look at the stack dump after the crash with the bt command.

How can I output a C + Assembly program trace using GDB?

I'm debugging a nasty problem where #include-ing a file (not anything I wrote, for the record) causes a crash in my program. This means I have a working build and a broken build, with only one C(++) include statement changed. Some of the libraries I'm using don't have debugging information.
What I would like to do is get GDB to output every line of C++ executed during the program run, plus x86 instructions where source is not available, to a text file in a format that lets me diff the two outputs and hopefully figure out what went wrong.
Is this easily possible in GDB?
You can check the difference between the pre-processed output in each version. For example:
gcc -dD -E a.cc -o a.pre
gcc -dD -E b.cc -o b.pre
diff -u a.pre b.pre
You can experiment with different "-d" settings to make that more verbose/concise. Maybe something in the difference of listings will be obvious. It's usually something like a struct which changes size depending on include files.
Failing that, if you really want to mess with per-instruction or per-line traces, you could probably use valgrind and see where the paths diverge, but I think you may be in for a world of pain. In fact you'll probably find valgrind finds your bug and then 100 more you didn't know about :) I expect the problem is just a struct or other data-size difference, and you won't need to bother.
You could get gdb to automate line tracing, but it would be quite painful. Basically you'd need to script it to run n (next line) repeatedly until the crash, then check the logs. If you can script b main, then run, then an endless loop of n, that would do it. There's probably a built-in command for this, but I'm not aware of it.
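A rough sketch of such a script (trace.gdb is a made-up file name; the loop simply errors out and stops once the process dies):

# trace.gdb -- log every source line executed until the program stops
set pagination off
set logging file trace.log
set logging on           # newer gdb spells this "set logging enabled on"
break main
run
while 1
  next                   # use "stepi" instead for instruction-level traces
end

Run it with gdb -x trace.gdb ./yourprog, then diff the trace.log files from the working and broken builds.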
I don't think GDB can do this; maybe a profiler would help, though? Are you compiling with gcc? Look at the -p and -pg options; I think those might be useful.
The disassemble command at the gdb prompt will disassemble the current function you are stopped in, but I don't think outputting the entire execution path is feasible.
What library are you including? If it is open source, you can recompile it with debugging symbols enabled. Also, if you're using Linux, most distributions have -dbg versions of packages for common libraries.

Analysis of a core file

I'm using Red Hat Linux 3. Can someone explain how it is possible that I am able to analyze, with gdb, a core dump generated on Red Hat Linux 5?
Not that I'm complaining :) but I need to be sure this will always work...
EDIT: the shared libraries are the same version, so no worries about that; they are placed on shared storage so they can be accessed from both Red Hat 5 and Red Hat 3.
Thanks.
You can try the following GDB commands to open a core file:
gdb
(gdb) exec-file <path to executable>
(gdb) set solib-absolute-prefix <path to shared library>
(gdb) core-file <path to core file>
The reason you can't rely on it is that every process uses libc and other system shared libraries, which will certainly have changed from Red Hat 3 to Red Hat 5. So the instruction addresses and the number of instructions in native functions will differ, and that is where the debugger gets goofed up and can show you wrong data to analyze. So it is always best to analyze the core on the same platform; alternatively, you can copy all the required shared libraries to the other machine and set the path with set solib-absolute-prefix.
In my experience, analysing a core file generated on another system does not work, because the standard library (and the other libraries your program uses) will typically be different, so the addresses of the functions differ and you cannot even get a sensible backtrace.
Don't do it, because even if it works sometimes, you cannot rely on it.
You can always run gdb -c /path/to/corefile /path/to/program_that_crashed. However, if program_that_crashed has no debug info (i.e. was not compiled and linked with the -g gcc/ld flag) the core dump is not that useful unless you're a hard-core debugging expert ;-)
Note that the generation of core files can be disabled (and it's very likely that it is disabled by default on most distros). See man ulimit. Run ulimit -c to see the core file size limit; "0" means disabled. Try ulimit -c unlimited in this case. If a size limit is imposed, the core dump will not exceed it, which may cut off valuable information.
Also, the path where a core dump is generated depends on /proc/sys/kernel/core_pattern. Use cat /proc/sys/kernel/core_pattern to query the current pattern. It's actually a path, and if it doesn't start with / then the file will be generated in the current working directory of the process. And if cat /proc/sys/kernel/core_uses_pid returns "1" then the core dump will have the PID of the crashed process as a file extension. You can also set both values; for example, echo -n /tmp/core > /proc/sys/kernel/core_pattern will force all core dumps to be generated in /tmp.
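Putting those pieces together, a typical sequence to obtain and open a core looks like this (myapp and the paths are placeholders):

ulimit -c unlimited                                 # allow core files in this shell
echo -n /tmp/core > /proc/sys/kernel/core_pattern   # as root: cores now land under /tmp
./myapp                                             # crashes, writing /tmp/core (or /tmp/core.<pid>)
gdb ./myapp /tmp/core                               # load the dump for inspection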
I understand the question as: how is it possible that I am able to analyse a core that was produced under one version of an OS under another version of that OS?
Just because you are lucky (even that is questionable). There are a lot of things that can go wrong when trying to do so:
- the toolchains (gcc, gdb, etc.) will be of different versions
- the shared libraries will be of different versions
So no, you shouldn't rely on that.
You have asked a similar question and accepted an answer (of course, by yourself) here: Analyzing core file of shared object
Once you load the core file you can get the stack trace, find the last function call, and check the code for the reason of the crash.
There is a small tutorial here to get started with.
EDIT:
Assuming you want to know how to analyse a core file using gdb on Linux, as your question is a little unclear.
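A minimal such session, with placeholder names, might look like:

gdb ./program_that_crashed ./core
(gdb) bt             # backtrace of the crashing thread
(gdb) frame 2        # select a frame of interest from the backtrace
(gdb) info locals    # inspect that frame's local variables
(gdb) list           # show the surrounding source (requires a -g build)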