perf.data file has no samples - profiling

I am using perf 3.0.4 on Ubuntu 11.10. Its record command works well and reports on the terminal that 256 samples were collected. But when I use perf report, it gives me the following error:
perf.data file has no samples
I searched a lot for the solution but no success yet.

This thread has some useful information: http://www.spinics.net/lists/linux-perf-users/msg01436.html
It seems that if you are running in a VM that does not expose the PMU to the guest, the default collection (-e cycles) won't work. Try running with -e cpu-clock. According to that thread, the OP also had the same problem on a real host running Ubuntu 10.04, so it might solve it for you too...
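A guarded sketch of that fallback (./my-executable is a placeholder; the probe assumes only that perf stat accepts -e cycles):

```shell
#!/bin/sh
# Probe whether the hardware 'cycles' event works; if not (e.g. in a VM with
# no PMU exposed to the guest), fall back to the software 'cpu-clock' event.
EVENT=cycles
if ! perf stat -e cycles -- true >/dev/null 2>&1; then
    EVENT=cpu-clock   # software timer event, needs no PMU
fi
echo "using -e $EVENT"
# perf record -e "$EVENT" ./my-executable
```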

The number of samples reported by the perf record command is an approximation, not the exact number of events (see the perf wiki).
To get the accurate number of events, dump the raw file and use wc -l to count the number of results:
perf report -D -i perf.data | grep RECORD_SAMPLE | wc -l
This command should report 0 in your case where perf report says it can't find events.
Let us know more about how you use perf record: which event you are sampling, which hardware, and which program.
EDIT: you can first try to increase the sampling period or frequency with the -c or -F options.

Whenever I run into this on a machine where perf record has worked in the past, it is because I have left something else running that uses the performance counters, e.g., I have perf top running in another terminal tab.
In that case, it seems that perf record simply doesn't record any PMU-related samples.

Related

perf samples do not add up to 100%

When analyzing with Linux perf:
perf record -g -e cycles my-executable
perf report
... the outermost invocation is shown to only use 50.6% of execution time, although the manual asserts it should always be 100%. Where do these samples go?
I'm aware a similar question exists; I think my problem is different because my application runs only for ~5s, and I do not expect the core to idle during this time.

profiling linux application with perf record

I've been trying to profile my C++ application in Linux by following this article on perf record. My understanding is that all I need to do is run perf record program [program_options], where program is the program executable and [program options] are the arguments I want to pass to it. However, when I try to profile my application like this:
perf record ./csvJsonTransducer -enable-AVX-deletion test.csv testout.json
perf returns almost immediately with a report. Running ./csvJsonTransducer -enable-AVX-deletion test.csv testout.json without perf takes nearly 30 seconds, though, and I want perf to monitor my program for its entire execution, not return immediately. Why is perf returning so quickly? How can I make it take the entire run of my program into account?
Your command seems OK. Try changing the paranoid level in /proc/sys/kernel/perf_event_paranoid. Setting this parameter to -1 (as root) should solve permission issues:
echo "-1" > /proc/sys/kernel/perf_event_paranoid
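Before changing it, it can help to check the current value (the path is standard on Linux; guarded here in case the file is absent):

```shell
# -1 = no restrictions; 0, 1, 2 are progressively stricter.
cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo "not available"
```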
You can also try setting the event you want to monitor with perf record. The default event is cycles (if supported). Check man perf-list.
Try the command:
perf record -e cycles ./csvJsonTransducer -enable-AVX-deletion test.csv testout.json
to force the monitoring of cycles.

Linux process allocated memory usage

How can I measure the memory consumed by a process? The process quits really quickly, so utilities like top are useless. I tried using Valgrind's massif, but it only measures memory allocated via malloc/new plus the stack, not static variables, for example. --pages-as-heap doesn't help either, because it also shows mapped memory.
Something that might work for you is a script that repeatedly runs ps immediately after your program starts. The script below should work for your purposes; just replace the variables at the top with your specific details. It currently runs netstat in the background (notice the & symbol) and samples the memory 10 times at 0.1-second intervals, appending the results to a file as it goes. I've run this on Cygwin and it works (minus the -o rss,vsz parameters); I don't have access to a Linux machine at the moment, but it should be simple to adapt if for some reason it doesn't immediately work.
#!/bin/bash
saveFileName=saveFile.txt
userName=jacob
programName=netstat
numberOfSamples=10
delayBetweenSamples=0.1

i=0
$programName &
while [ "$i" -lt "$numberOfSamples" ]
do
    # comm column added so grep can match the program name
    # (with only rss,vsz the output contains nothing to grep for)
    ps -u "$userName" -o rss,vsz,comm | grep "$programName" >> "$saveFileName"
    i=$((i + 1))
    sleep "$delayBetweenSamples"
done
If your program completes so quickly that the delay between starting it and the script's first ps is too long, consider launching your program after a delay and using a very high sample frequency to try to catch it. You can do that with sleep and two ampersands, e.g. sleep 2 && netstat: that waits 2 seconds and then runs netstat.
If none of this sounds good to you, perhaps try running your program within a debugger. I believe gdb has some memory tracking options you could look into.

How to profile an app running inside KVM guest

Is there any way to profile an application running inside KVM guest using a tool like perf_events?
I've tried to do that using
perf kvm --guestkallsyms=.. --guestmodules=.. --guest record -a
but information in report is pretty useless:
# ========
#
# Samples: 627 of event 'cache-misses'
# Event count (approx.): 295421
#
# Overhead Command Shared Object Symbol
# ........ ....... ................ ......................
#
73.18% :15661 [x_tables] [g] 0xffffffff8176bc80
26.82% :15661 [unknown] [u] 0x00000000004004fe
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
No.
The perf tool runs in the host and has no way to get information about the applications in the guest. I think the attribution of samples to guest kernel space or guest user space is based on the CPU mode at the time the sample was taken (not on higher-level information about what the guest is doing).
You can get some profiling information by running perf directly in the guest. Use perf list to see the options (they are probably all in the 'software' category).
Yes, you probably can. The host can see the guest. You can use raw hardware events to do so (just check that the event number is available on your system).
For me this works as an example:
sudo perf kvm stat -I 1000 -e r1a8 -a
(make sure you are actually monitoring the guest by shutting down the KVM machine after a while and seeing the counts drop to zero)
Yes. What about:
sudo perf kvm stat record -p appPID
According to the perf kvm help it should work, but it does not! It works fine in system-wide mode with -a.

Disk IO profiler for a C++ application on Linux

A program is reading massively from the disk, but I don't know which file it is reading, nor where in the code the reads happen.
Are there any tools on Linux to monitor this?
Related question (windows) : Disk IO profiler for existing applications
So, you can use:
/proc/PID/fd
or
lsof -p PID
to see which files your process uses.
For example, with lsof -p 27666 (assuming 27666 is the PID of your a.out program) you might see:
a.out 27666 me 9w REG 8,5 131072 528280 /home/me/tmp/test.db
a.out 27666 me 10r REG 8,5 131072 528281 /home/me/tmp/test2.db
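The procfs route can be tried directly; a minimal sketch that uses the current shell's own PID ($$) as a stand-in for the process you are investigating:

```shell
# Each entry under /proc/<pid>/fd is a symlink from a file-descriptor number
# to the file, pipe, or socket it refers to.
ls -l /proc/$$/fd
```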
If the system is really busy with I/O, just look at top and you'll usually see the I/O-bound process stuck in the D state.
strace -c myprog is my best friend for a first attempt at any generic "what is my application doing / where is it spending most of its time" question. strace can also attach to running processes, so you can observe the program as it runs.
Another good strace trick is to write the output to a log file (with strace -o myprogrun.log), then view it with a modern vim, which does a very nice job of syntax-highlighting the log. It's much easier to find things this way, as the default strace output is not very human-readable.
The important thing to remember is to log to a different partition/set of disks than the one with the I/O problem! Don't induce extra I/O problems: strace can generate a lot of output. I like to use a tmpfs or zram RAM disk for such occasions.
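A sketch of the -c summary mode, guarded since strace may not be installed (ls stands in for the real program, and /tmp is assumed to live on a different device, often tmpfs):

```shell
#!/bin/sh
# Summarise syscall counts and times for a short command; -o sends the
# summary to a file so it does not mix with the program's own output.
if command -v strace >/dev/null 2>&1; then
    strace -c -o /tmp/strace-summary.txt ls >/dev/null
    echo "summary written to /tmp/strace-summary.txt"
else
    echo "strace not installed"
fi
```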