CPU usage of a process tree using perf? - profiling

I ran the following command to get the CPU usage:
perf record -F 99 -p PID sleep 1
But I think this command gives me the CPU usage of this process only.
If it forks a new process, the CPU usage of that child will not be included in the perf report.
Can someone suggest how to get the CPU usage of a given PID combined with that of all its descendants?
I am also interested in the peak CPU usage over any interval. Is this possible with perf as well?

Related

Profiling part of a program's execution

I have a complex application that executes in a number of phases. I would like to profile only one of the phases.
The C++ application runs on Linux, x86-64.
This program takes several minutes to run. If I use perf to profile the whole thing, the resulting data set is too large for perf report to process. However, at this point I am interested only in profiling the execution of one phase of the program that takes maybe 1/3 of the total time. Perhaps this data set will be easier for perf to report on.
Ideally, I'd like something along the lines of "send yourself SIGUSR1 to start profiling, and SIGUSR2 to stop it". At that point I can easily delineate the execution phase that I want profile information for.
I can always write my own (albeit basic) profiler using SIGPROF, but is there a way I can do this with existing tools such as perf?
A possible way to do this is to attach perf to an existing process.
So, start your program and note its PID. Then, when appropriate, start profiling with the -p <pid> option, and use Ctrl-C (SIGINT) to stop it. This trick only works if you don't need to start and stop profiling many times, because the data-append functionality was removed from perf a long time ago.
Or maybe you can just decrease sampling frequency with -F, so the resulting data becomes more tractable.
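The attach-then-interrupt approach described above could be scripted roughly as follows. This is a minimal sketch, assuming perf is installed and you are allowed to attach to the target process. The helper names build_perf_record_cmd and profile_phase, and the default output file name, are made up for illustration. Only the perf record options -F, -p and -o are used.

```python
import signal
import subprocess
import time


def build_perf_record_cmd(pid, freq=99, output="perf.data"):
    """Return the argv for attaching `perf record` to an existing process."""
    return ["perf", "record", "-F", str(freq), "-p", str(pid), "-o", output]


def profile_phase(pid, seconds, freq=99, output="perf.data"):
    """Attach perf to `pid`, sample for `seconds`, then stop it with SIGINT."""
    perf = subprocess.Popen(build_perf_record_cmd(pid, freq, output))
    try:
        time.sleep(seconds)
    finally:
        # SIGINT is what Ctrl-C sends; perf then writes out its data and exits.
        perf.send_signal(signal.SIGINT)
        perf.wait()
    return perf.returncode
```

Writing each phase to its own output file (for example profile_phase(pid, 30, output="phase2.data")) sidesteps the missing append mode, since every run produces a separate data set that perf report -i can open.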

CPU usage drops on Linux (Ubuntu)

I was using an AWS c5.4xlarge instance, which has 16 vCPUs, and running a Python program with 10 processes. However, the CPU usage of each process gradually dropped to 10% within just 10 seconds, as shown in the picture. The total CPU usage across the 16 vCPUs was only about 6%.
I reduced the number of processes, but the CPU usage of each process was still quite low. Everything is OK on my own macOS machine.
What is wrong here?
OK, I found the answer. It is about processor affinity. For Linux beginners:
https://en.wikipedia.org/wiki/Processor_affinity
On Linux you can assign CPUs to a specific process from the terminal:
$ taskset -cp CPU_ID PID
For example:
$ taskset -cp 0-1 1000
will allocate CPUs 0 and 1 to the process with ID 1000.
You can find the PID by using
$ top
in your terminal.
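Since the program in question is written in Python, the same check and fix can also be done from inside the process. The following is a minimal sketch, assuming Linux (os.sched_getaffinity and os.sched_setaffinity are not available on macOS). The helper names allowed_cpus and pin_to_cpus are made up for illustration.

```python
import os


def allowed_cpus(pid=0):
    """Return the sorted list of CPUs the process may run on (0 = this process)."""
    return sorted(os.sched_getaffinity(pid))


def pin_to_cpus(cpus, pid=0):
    """Restrict the process to `cpus`, similar to `taskset -cp <cpus> <pid>`."""
    os.sched_setaffinity(pid, set(cpus))
    return allowed_cpus(pid)
```

If allowed_cpus() returns far fewer CPUs than the instance has, all worker processes are competing for the same cores, which would match the low per-process usage described above.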

yourkit cpu view and top command - showing different results

I am looking at my process with the top command, and it shows a very high CPU% value. However, when I look at the same process in the YourKit CPU view, it shows a completely different result. How can that be?
The YourKit profiler treats the entire CPU, with all cores, as 100%. This means that if you have 4 cores and 1 core is fully loaded while the other 3 are idle, CPU usage will be 25% (not 100%).
With this explanation, the YourKit results correlate well with "top".
I had the same confusion. From what I understand, the top command displays usage as a percentage of a single CPU, so on multi-core systems you can see percentages greater than 100%:
https://unix.stackexchange.com/questions/145247/understanding-cpu-while-running-top-command
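The conversion between the two conventions is a single division. A small sketch, with the function name top_to_whole_machine made up for illustration:

```python
def top_to_whole_machine(top_percent, num_cores):
    """Convert a per-core percentage (as top shows it) to a whole-machine one."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return top_percent / num_cores


# One fully loaded core on a 4-core machine: top shows 100%, YourKit-style is 25%.
print(top_to_whole_machine(100.0, 4))
```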

Linux Performance Monitoring, any way to monitor per-thread?

I am using Linux Ubuntu, and programming in C++. I have been able to access the performance counters (instruction counts, cache misses etc) using perf_event (actually using programs from this link: https://github.com/castl/easyperf).
However, now I am running a multi-threaded application using pthreads, and need the instruction counts and cycles to completion of each thread separately. Any ideas on how to go about this?
Thanks!
perf is a system profiling tool you can use. It is not like https://github.com/castl/easyperf, which is a library that you call from your code. Follow these steps to profile your program:
Install perf on Ubuntu. Installation can differ considerably between Linux distributions; you can find installation tutorials online.
Simply run your program and get all of its thread IDs:
ps -eLf | grep [application name]
Open a separate terminal and run perf as perf stat -t [threadid]. According to the man page:
usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
-i, --no-inherit child tasks do not inherit counters
-p, --pid <n> stat events on existing process id
-t, --tid <n> stat events on existing thread id
-a, --all-cpus system-wide collection from all CPUs
-c, --scale scale/normalize counters
-v, --verbose be more verbose (show counter open errors, etc)
-r, --repeat <n> repeat command and print average + stddev (max: 100)
-n, --null null run - dont start any counters
-B, --big-num print large numbers with thousands' separators
There is an analysis article about perf that can give you a feel for it.
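The thread-listing step above can also be done by reading /proc instead of grepping ps output. This is a minimal sketch, assuming Linux with /proc mounted. The helper names thread_ids and perf_stat_commands are made up for illustration, and the commands are only built, not executed.

```python
import os


def thread_ids(pid):
    """List the thread IDs of `pid` by reading /proc/<pid>/task (Linux only)."""
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/task"))


def perf_stat_commands(pid, events=("instructions", "cycles")):
    """Build one `perf stat -e <events> -t <tid>` argv per thread of `pid`."""
    event_list = ",".join(events)
    return [["perf", "stat", "-e", event_list, "-t", str(tid)]
            for tid in thread_ids(pid)]
```

Each returned argv can be passed to subprocess.Popen, giving one perf stat instance per thread and therefore separate instruction and cycle counts for each.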
You can use the standard tool for accessing perf_event: perf (from linux-tools). It can work with all threads of your program and report both a summary profile and a per-thread (per-pid/per-tid) profile.
This profile is not an exact hardware counter reading, but rather the result of sampling every N events, with N tuned so that samples arrive at around 99 Hz (times per second). You can also try the -c 2000000 option to get a sample every 2 million hardware events. For example, for the cycles event (for the full list, run perf list, or try some of those listed by perf stat ./program):
perf record -e cycles -F 99 ./program
perf record -e cycles -c 2000000 ./program
Summary over all threads; -n shows the total number of samples:
perf report -n
Per pid (tids are actually used here, so you can select any thread).
The text variant lists all recorded threads with their summary sample counts (with -c 2000000 you can multiply the sample count by 2 million to estimate the hardware event count for the thread):
perf report -n -s pid | cat
Or ncurses-like interactive variant where you can select any thread and see its own profile:
perf report -n -s pid
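The estimate mentioned above is just the sample count multiplied by the fixed period. A small sketch of the arithmetic follows. The thread IDs and sample counts are hypothetical placeholders, not real perf output, and the function name estimate_event_count is made up for illustration.

```python
def estimate_event_count(sample_count, sample_period):
    """Estimate hardware events as samples times the fixed period (the -c value)."""
    return sample_count * sample_period


# Hypothetical per-thread sample counts, as might be read off `perf report -n`.
samples_by_tid = {4101: 1500, 4102: 320}
estimates = {tid: estimate_event_count(n, 2_000_000)
             for tid, n in samples_by_tid.items()}
```

This only holds for runs recorded with a fixed period (-c). With -F the period varies during the run, so the sample count alone does not give the event count.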
Please take a look at the perf tool documentation; it supports some of the events (e.g. both instructions and cache-misses) that you want to profile. Extract from the wiki page linked above:
The perf tool can be used to count events on a per-thread, per-process, per-cpu or system-wide basis. In per-thread mode, the counter only monitors the execution of a designated thread. When the thread is scheduled out, monitoring stops. When a thread migrated from one processor to another, counters are saved on the current processor and are restored on the new one.

Getting current cpu usage in c++/windows for particular process

I want to calculate the current CPU usage of a particular application in my code. I searched the internet and found the PDH library for Windows. When I tried it, I got the overall CPU usage, not the CPU usage of one process.
PdhAddCounter(hquery, TEXT("\\Processor(_Total)\\% Processor Time"),0,&counter);
So what do I do with this line to get the CPU usage of a particular process? I tried replacing _Total with the process name (explorer), and then I got 0 CPU usage. But I checked in Resource Monitor that opening many windows at once increased CPU usage up to 20%. The log file still shows 0 CPU usage.
Can anyone help me with this?
thanks in advance.
You need to use GetProcessTimes.
Unfortunately, it won't give you the "CPU usage"; it gives you the amount of CPU time used since the process started. So to get CPU usage, you need to take one sample, store it, take another sample a known amount of time later, and then divide the difference in CPU time by the elapsed time (and if you want the total usage, you'll need to add the user time and kernel time together, of course).
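The two-sample calculation can be sketched as follows. Note the substitution: this uses Python's time.process_time() (user plus kernel CPU time of the current process) in place of a direct GetProcessTimes call, so it measures the calling process rather than an arbitrary one. The function names are made up for illustration.

```python
import time


def cpu_percent(cpu_start, cpu_end, wall_start, wall_end):
    """CPU usage in percent of one core between two samples (times in seconds)."""
    elapsed = wall_end - wall_start
    if elapsed <= 0:
        raise ValueError("the second sample must be taken later than the first")
    return 100.0 * (cpu_end - cpu_start) / elapsed


def sample_own_cpu_percent(interval=1.0):
    """Sample this process's user + kernel CPU time twice, `interval` apart."""
    cpu_start, wall_start = time.process_time(), time.monotonic()
    time.sleep(interval)
    cpu_end, wall_end = time.process_time(), time.monotonic()
    return cpu_percent(cpu_start, cpu_end, wall_start, wall_end)
```

The result is relative to a single core, so a process with several busy threads can exceed 100%. Divide by the number of cores to get a whole-machine figure like the one Task Manager shows.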
You can check this project for an example; everything is explained there. It reports memory usage based on process ID (the same way Task Manager shows it).
Thanks,
Darshan