I've been trying to profile my C++ application on Linux by following this article on perf record. My understanding is that all I need to do is run perf record program [program_options], where program is the program executable and [program_options] are the arguments I want to pass to the program. However, when I try to profile my application like this:
perf record ./csvJsonTransducer -enable-AVX-deletion test.csv testout.json
perf returns almost immediately with a report. Running ./csvJsonTransducer -enable-AVX-deletion test.csv testout.json without perf takes nearly 30 seconds, though, and I want perf to monitor my program for the entirety of its execution, not return immediately. Why is perf returning so quickly, and how can I make it take the entire run of my program into account?
Your command seems OK. Try changing the paranoid level in /proc/sys/kernel/perf_event_paranoid. Setting this parameter to -1 (as root) should solve permission issues:
echo "-1" > /proc/sys/kernel/perf_event_paranoid
You can also try setting the event that you want to monitor with perf record. The default event is cycles (if supported); check man perf-list.
Try the command:
perf record -e cycles ./csvJsonTransducer -enable-AVX-deletion test.csv testout.json
to force the monitoring of cycles.
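If you want to see which events your machine actually supports before picking one, perf can enumerate them:
perf list    # shows the hardware, software, and tracepoint events available here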
When analyzing with Linux perf:
perf record -g -e cycles my-executable
perf report
... the outermost invocation is shown as using only 50.6% of the execution time, although the manual asserts it should always be 100%. Where do the remaining samples go?
I'm aware a similar question exists; I think my problem is different because my application runs only for ~5s, and I do not expect the core to idle during this time.
How can I measure the memory consumed by a process? The process quits really quickly, so utilities like top are useless. I tried using Massif from Valgrind, but it measures only memory allocated via malloc/new plus the stack, not static variables, for example. --pages-as-heap doesn't help either, because it also shows mapped memory.
Something that might work for you is a script that repeatedly runs ps immediately after your program starts. I've written up the following script that should work for your purposes; just replace the variables at the top with your specific details. It currently runs netstat in the background (notice the & symbol) and samples the memory 10 times with 0.1-second intervals between samples, writing the results of the memory check to a file as it goes. I've run this on Cygwin and it works (minus the -o parameters); I don't have access to a Linux machine at the moment, but it should be simple to adapt if for some reason it doesn't immediately work.
#!/bin/bash
# Sample the memory usage of a program with ps at fixed intervals.
saveFileName=saveFile.txt
userName=jacob
programName=netstat
numberOfSamples=10
delayBetweenSamples=0.1

i=0
$programName &    # start the program to be monitored in the background
while [ $i -lt $numberOfSamples ]
do
    # comm is included so grep can match the program name in ps's output
    ps -u $userName -o rss,vsz,comm | grep $programName >> $saveFileName
    i=$((i + 1))
    sleep $delayBetweenSamples
done
If your program completes so fast that the delay between executing it and the script's first ps call is too long, you might consider starting your program with a delay and using a very high sample frequency to try to catch it. You can do that by using sleep and two ampersands, like sleep 2 && netstat. That will wait 2 seconds and then run netstat.
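A rough sketch of combining the two ideas (the script name is made up; save the script above under it, remove its own $programName & line, and raise numberOfSamples so the sampling window covers the delay):
sleep 2 && netstat &    # start the target with a 2-second head start for the sampler
./memorySampler.sh      # already polling by the time the target starts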
If none of this sounds good to you, perhaps try running your program within a debugger. I believe gdb has some memory tracking options you could look into.
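For instance (a minimal sketch; myprogram is a placeholder for your binary), you can stop at main and inspect the process's memory map from inside gdb:
gdb ./myprogram
(gdb) start                  # run to a temporary breakpoint at main
(gdb) info proc mappings     # list the mapped memory regions of the process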
I am using perf 3.0.4 on Ubuntu 11.10. Its record command works well and reports on the terminal that 256 samples were collected. But when I use perf report, it gives me the following error:
perf.data file has no samples
I have searched a lot for a solution, but no success yet.
This thread has some useful information: http://www.spinics.net/lists/linux-perf-users/msg01436.html
It seems that if you are running in a VM that does not expose the PMU to the guest, the default collection (-e cycles) won't work. Try running with -e cpu-clock. According to that thread, the OP had the same problem on a real host running Ubuntu 10.04 as well, so it might solve it for you too...
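For example (the program name is a placeholder), the software clock event works without any PMU:
perf record -e cpu-clock ./myprog
perf report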
The number of samples reported by the perf record command is an approximation, not the exact number of events (see the perf wiki).
To get the accurate number of events, dump the raw file and use wc -l to count the number of results:
perf report -D -i perf.data | grep RECORD_SAMPLE | wc -l
This command should report 0 in your case where perf report says it can't find events.
Let us know more about how you use perf record: which event you are sampling, on which hardware, and with which program.
EDIT: you can first try adjusting the sampling period or frequency with the -c or -F options; for a short-running program, a smaller period or a higher frequency yields more samples.
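For example (program name again a placeholder), -c sets the period in events between samples and -F the target frequency in samples per second:
perf record -c 10000 ./myprog    # one sample every 10000 events
perf record -F 1000 ./myprog     # aim for roughly 1000 samples per second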
Whenever I run into this on a machine where perf record has worked in the past, it is because I have left something else running that uses the performance counters, e.g., I have perf top running in another terminal tab.
In this case, it seems that perf record simply doesn't record any PMU-related samples.
A program is reading massively from the disk, but I don't know which file it is reading, nor where in the code the reads happen.
Are there any tools on Linux to monitor this?
Related question (Windows): Disk IO profiler for existing applications
So, you can use:
/proc/PID/fd
or
lsof -p PID
to find out which files your process uses.
For example, with lsof -p 22531 (assuming 22531 is the PID of the a.out program) you can see this:
./a.out 22531 me 9w REG 8,5 131072 528280 /home/me/tmp/test.db
./a.out 22531 me 9r REG 8,5 131072 528280 /home/me/tmp/test2.db
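The /proc view gives the same information without any extra tool; each open descriptor shows up as a symlink to the file it refers to (PID from the example above):
ls -l /proc/22531/fd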
If the system is really busy with IO, just look at top and you'll see the IO-bound processes usually stuck in the D (uninterruptible sleep) state.
strace -c myprog is my best friend for a first attempt at all generic 'what is my application doing / where is it spending most of its time' questions. strace can also attach to running processes, so you can observe the program while it's running.
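For example, to get the same per-syscall time summary from a process that is already running (the PID is a placeholder):
strace -c -p 12345    # press Ctrl-C to detach and print the counts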
Another good strace trick is to write its output to a log file (with strace -o myprogrun.log) and then view it with a modern vim, which does a very nice job of syntax-highlighting the log. It's a lot easier to find things this way, as the default strace output is not very human-readable.
An important thing to remember is to log to a different partition or set of disks than the one with the IO problem! Don't induce extra IO problems, as strace can generate a lot of output. I like to use a tmpfs or zram RAM disk for such occasions.
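A minimal sketch of that setup (mount point, size, and program name are arbitrary choices; mounting requires root):
mkdir -p /mnt/straceout
mount -t tmpfs -o size=256m tmpfs /mnt/straceout    # RAM-backed, so logging causes no disk IO
strace -o /mnt/straceout/myprogrun.log ./myprog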
I want to use valgrind to do some profiling, since it does not require rebuilding the program (the program I want to profile is already built with -g).
But valgrind (callgrind) is quite slow... so here's what I want to do:
1. start the server (I want to profile that server)
2. somehow attach to that server
3. before I do some operation on the server, start collecting profile data
4. after the operation is done, stop collecting profile data
5. analyze the profiling data
I can do this kind of thing using Sun Studio on Solaris (using dbx). I just want to know: is it possible to do the same thing using valgrind (callgrind)?
Thanks
You should look at the callgrind documentation and read about callgrind_control.
1. Launch your app: valgrind --tool=callgrind --instr-atstart=no your_server.x
2. See 1. (starting the server under valgrind takes the place of attaching; valgrind cannot attach to an already-running process)
3. Start collecting profile data: callgrind_control -i on
4. Stop collecting profile data: callgrind_control -i off
5. Analyze the data with kcachegrind or callgrind_annotate/cg_annotate
A combined session sketch follows.
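Putting the steps together, one session might look like this (callgrind writes its data to callgrind.out.<pid> by default):
valgrind --tool=callgrind --instr-atstart=no ./your_server.x &
callgrind_control -i on     # just before the operation you care about
# ... exercise the server ...
callgrind_control -i off    # right after the operation
callgrind_control -d        # force a dump without waiting for the server to exit
callgrind_annotate callgrind.out.<pid>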
For profiling only some functions, you may also find CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION from the <valgrind/callgrind.h> header useful, combined with callgrind's --instr-atstart=no option as suggested in Doomsday's answer.
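A minimal sketch of the macro approach (busyWork is a made-up stand-in for the code you actually want to profile); compile as usual and run under valgrind --tool=callgrind --instr-atstart=no:
#include <valgrind/callgrind.h>

// made-up stand-in for the code you actually want to profile
static long busyWork() {
    long sum = 0;
    for (long i = 0; i < 100000000L; ++i)
        sum += i % 7;
    return sum;
}

int main() {
    // instrumentation is off at startup because of --instr-atstart=no
    CALLGRIND_START_INSTRUMENTATION;   // begin collecting here
    long result = busyWork();
    CALLGRIND_STOP_INSTRUMENTATION;    // and stop here
    return result == 0 ? 1 : 0;       // use the result so the loop isn't optimized away
}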
You don't say what OS - I'm assuming Linux - in which case you might want to look at OProfile (free) or Zoom (not free, but you can get an evaluation licence), both of which are sampling profilers and can profile existing code without recompilation. Zoom is much nicer and easier to use (it has a GUI and some nice additional features), but you probably already have OProfile on your system.