Enable/disable perf event collection programmatically - C++

I'm using perf for profiling on Ubuntu 20.04 (though I could use any other free tool). It allows passing a delay on the command line, so that event collection starts a certain time after program launch. However, this start time varies a lot (by 20 seconds out of 1000), and there are tail computations I am not interested in either.
So it would be great to call some API from my program to start perf event collection for the fragment of code I'm interested in, and then stop collection after the code finishes.
It's not really an option to run the code in a loop, because there is a ~30-second initialization phase and a ~10-second measurement phase, and I'm only interested in the latter.

There is an inter-process communication mechanism to achieve this between the program being profiled (or a controlling process) and the perf process: use the --control option in the form --control=fifo:ctl-fifo[,ack-fifo] or --control=fd:ctl-fd[,ack-fd], as discussed in the perf-stat(1) man page. This option specifies either a pair of pathnames of FIFO files (named pipes) or a pair of file descriptors. The first file is used for issuing commands that enable or disable all events in any perf process listening on the same file. The second file, which is optional, is used to check with perf when it has actually executed the command.
There is an example in the manpage that shows how to use this option to control a perf process from a bash script, which you can easily translate to C/C++:
ctl_dir=/tmp/
ctl_fifo=${ctl_dir}perf_ctl.fifo
test -p ${ctl_fifo} && unlink ${ctl_fifo}
mkfifo ${ctl_fifo}
exec {ctl_fd}<>${ctl_fifo} # open read+write; bash allocates a free fd and stores its number in ctl_fd
This first checks that /tmp/perf_ctl.fifo, if it exists, is a named pipe, and only then deletes it. It's not a problem if the file doesn't exist, but if it exists and is not a named pipe, the file should not be deleted and mkfifo should fail instead. mkfifo then creates a named pipe with the pathname /tmp/perf_ctl.fifo. The exec command opens the FIFO for reading and writing and stores the newly allocated file descriptor number in ctl_fd. The equivalent syscalls are stat, unlink, mkfifo, and open. Note that the named pipe will be written to by the shell script (controlling process) or the process being profiled and read from by the perf process. The same commands are repeated for a second named pipe, ctl_fd_ack, which will be used to receive acknowledgements from perf.
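The same setup can be done directly from C++. A minimal sketch, assuming the same /tmp paths and with error handling reduced to the bare minimum:

#include <fcntl.h>      // open
#include <sys/stat.h>   // stat, mkfifo
#include <unistd.h>     // unlink

// Create a named pipe at 'path' (removing a stale one first) and open it read+write,
// like `exec {fd}<>fifo` in the shell. Returns the file descriptor, or -1 on error.
static int open_control_fifo(const char *path)
{
    struct stat st;
    if (stat(path, &st) == 0) {
        if (!S_ISFIFO(st.st_mode))
            return -1;                 // exists but is not a FIFO: refuse to touch it
        unlink(path);
    }
    if (mkfifo(path, 0600) != 0)
        return -1;
    return open(path, O_RDWR);
}

// int ctl_fd = open_control_fifo("/tmp/perf_ctl.fifo");
// int ack_fd = open_control_fifo("/tmp/perf_ctl_ack.fifo");

The shell example then starts perf itself: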
perf stat -D -1 -e cpu-cycles -a -I 1000 \
--control fd:${ctl_fd},${ctl_fd_ack} \
-- sleep 30 &
perf_pid=$!
This forks the current process and runs the perf stat program in the child process, which inherits the same file descriptors. The -D -1 option tells perf to start with all events disabled. You probably need to change the perf options as follows:
perf stat -D -1 -e <your event list> --control fd:${ctl_fd},${ctl_fd_ack} -p pid
In this case, the program to be profiled is the same as the controlling process, so tell perf to profile your already-running program using -p. The equivalent syscalls are fork followed by execv in the child process.
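A sketch of the fork/exec step in C++, assuming the two FIFO descriptors from above and using cpu-cycles as a placeholder event list:

#include <unistd.h>     // fork, execlp, getpid, _exit
#include <string>

// Start `perf stat` against our own PID with all events initially disabled.
// ctl_fd and ack_fd are the control/ack FIFO descriptors opened earlier; perf
// inherits them across fork/exec. Returns perf's pid (or -1 if fork failed).
static pid_t launch_perf(int ctl_fd, int ack_fd)
{
    std::string control = "fd:" + std::to_string(ctl_fd) + "," + std::to_string(ack_fd);
    std::string target  = std::to_string(getpid());

    pid_t pid = fork();
    if (pid == 0) {
        execlp("perf", "perf", "stat", "-D", "-1",
               "-e", "cpu-cycles",                    // replace with your event list
               "--control", control.c_str(),
               "-p", target.c_str(),
               (char *)nullptr);
        _exit(127);                                   // only reached if exec failed
    }
    return pid;
}

Back in the shell example, enabling and disabling the events looks like this: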
sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
The example script sleeps for about 5 seconds, writes 'enable' to the ctl_fd pipe, and then checks perf's response to ensure the events have been enabled, before disabling them again about 10 seconds later. The equivalent syscalls are write and read.
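In C++, the same exchange could look roughly like this (a sketch; perf writes a short acknowledgement, e.g. "ack", to the second pipe once it has applied the command):

#include <cstring>      // strlen
#include <unistd.h>     // write, read

// Send "enable" or "disable" to perf and block until it acknowledges the command.
static void perf_ctl(int ctl_fd, int ack_fd, const char *cmd)
{
    write(ctl_fd, cmd, strlen(cmd));
    char ack[16] = {0};
    read(ack_fd, ack, sizeof(ack) - 1);
}

// perf_ctl(ctl_fd, ack_fd, "enable");
// ... the code you want to measure ...
// perf_ctl(ctl_fd, ack_fd, "disable");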
The rest of the script closes the file descriptors and deletes the pipe files.
Putting it all together now, your program should look like this:
/* PART 1
Initialization code.
*/
/* PART 2
Create named pipes and fds.
Fork perf with disabled events.
perf is running now but nothing is being measured.
You can redirect perf output to a file if you wish.
*/
/* PART 3
Enable events.
*/
/* PART 4
The code you want to profile goes here.
*/
/* PART 5
Disable events.
perf is still running but nothing is being measured.
*/
/* PART 6
Cleanup.
Let this process terminate, which would cause the perf process to terminate as well.
Alternatively, use `kill(pid, SIGINT)` to gracefully kill perf.
perf stat outputs the results when it terminates.
*/
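For PART 6, a graceful shutdown could be as small as this (a sketch; perf_pid is the pid returned by the fork, and the FIFO paths are the ones created earlier):

#include <signal.h>     // kill
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // close, unlink

static void stop_perf(pid_t perf_pid, int ctl_fd, int ack_fd)
{
    kill(perf_pid, SIGINT);             // perf stat prints its results when it terminates
    waitpid(perf_pid, nullptr, 0);      // wait for the report before this process exits
    close(ctl_fd);
    close(ack_fd);
    unlink("/tmp/perf_ctl.fifo");
    unlink("/tmp/perf_ctl_ack.fifo");
}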

Related

C++ pause/resume system on large operation

I have a C++ program that loads a file with a few million lines and starts processing it. The same operation was done by a PHP script, but in order to reduce the execution time I switched to C++.
In the old script, I checked whether a file named after the current operation id exists in a "pause" folder; the file is empty and only signals that a pause is requested. The script checks for this file every 5 iterations, and if it exists, it spins in an empty loop until the file is deleted (i.e. resume):
foreach($lines as $line)
{
    $isFinished = $index >= $countData - 1;
    if($index % 5 == 0)
    {
        do
        {
            $isPaused = file_exists("/home/pauses/".$content->{'drop-id'});
        }while($isPaused);
    }
    // Starts processing the line here
}
But since disk access is relatively slow, I don't want to follow the same approach, so I was thinking of something like the following commands to simulate it:
$ kill cpp_program // C++ program returns the last index checked e.g: 37710
$ ./main 37710
$ // cpp_program escapes the first 37709 lines and continues its job
What do you think of this approach? Is it feasible? Is it too time-consuming? Is there any better approach?
Thank you
Edit: a clarification, because this seems a little ambiguous: this task runs in the background, and there is another application which starts it. I want to be able to send a command from the management app (through Linux commands) to the background task to pause/resume it.
Jumping to line 37710 of a text file sadly requires reading all the lines before it, on most operating systems.
On most operating systems, text files are binary files with a convention about newlines. But the OS doesn't cache where the newlines are.
So to find the newlines, you have to read every byte.
If your program saved the byte offset of the file it had reached, it could seek to that location, however.
You can save the state of your program to some config file as you shut down, and have it resume from there by default when it starts up again. This requires catching the signal you use to shut down, having your main logic notice the signal flag being set, and then shutting down cleanly. It is a very C-esque operation.
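A rough C++ sketch of that idea; the file names, the signal, and the processing step are placeholders, and the key point is that the checkpoint stores a byte offset rather than a line number:

#include <atomic>
#include <csignal>
#include <fstream>
#include <string>

static std::atomic<bool> stop_requested{false};

extern "C" void on_term(int) { stop_requested = true; }   // handler only sets a flag

int main(int argc, char **argv)
{
    std::signal(SIGTERM, on_term);

    std::ifstream in("data.txt");
    if (argc > 1)
        in.seekg(std::stoll(argv[1]));       // resume: seek straight to the saved offset

    std::string line;
    while (!stop_requested && std::getline(in, line)) {
        // process(line);
    }

    if (stop_requested) {                    // checkpoint: remember where we stopped
        std::streamoff off = in.tellg();
        std::ofstream("checkpoint.txt") << off;
    }
}

The controlling application can then kill the process and later restart it with the saved offset as its argument.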
Now, a different traditional way to make a program controllable remotely is to have it listen on a TCP port (and/or stdin) and take command line commands there.
To go that way, you'd write a REPL component, then hook that up to whatever input and output.
Either you'd do the REPL in a coroutine-like way between processing files, or you'd spawn a separate thread for the REPL and have it communicate asynchronously with the processing thread.
However, this could be beyond your skill. Each step of this (writing a REPL system, having it not block the main work, responding to commands, then attaching it to a TCP port) would take some effort and learning on your part.
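For illustration, a minimal sketch of the threaded variant, reading pause/resume commands from standard input (a TCP socket would slot into the same place):

#include <atomic>
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

std::atomic<bool> paused{false};

void command_loop()                            // runs in its own thread
{
    std::string cmd;
    while (std::getline(std::cin, cmd)) {
        if (cmd == "pause")       paused = true;
        else if (cmd == "resume") paused = false;
    }
}

int main()
{
    std::thread repl(command_loop);
    for (long i = 0; i < 1000000; ++i) {       // the processing loop
        while (paused)
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        // process line i here
    }
    repl.detach();                             // sketch only; a real program would shut it down cleanly
}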

Call function when program is killed

I run some of my C++ programs on an HPC cluster, scheduled using SLURM. Sometimes my programs get killed, either because they use too many resources or because they run too long. Usually, if my program finishes running or encounters an internal error, I get a message telling me that, and I can take appropriate action.
But if my program is killed by the queue manager, I do not get any messages (and yes, I specified that I would like to get those messages in the job file, but somehow that does not work properly). Thus I was wondering if there is a possibility to call a function within the program when encountering the kill signal, or another way to tell me when my main program is killed?
Maybe you could look at it the other way around: stop your program a bit before its time limit and perform a clean exit yourself. With SLURM you can use:
#SBATCH --signal=B:USR1@120
to send a signal to your bash script 120 seconds before it reaches its time limit. Just trap this signal and perform a clean exit.
I use it, and it works very well.
#!/bin/bash -l
# job name
#SBATCH --job-name=example
# replace this by your account
#SBATCH --account=...
# one core only
#SBATCH --ntasks=1
# we give this job 4 minutes
#SBATCH --time=0-00:04:00
# asks SLURM to send the USR1 signal 120 seconds before end of the time limit
#SBATCH --signal=B:USR1@120
# define the handler function
# note that this is not executed here, but rather
# when the associated signal is sent
your_cleanup_function()
{
echo "function your_cleanup_function called at $(date)"
# do whatever cleanup you want here
}
# call your_cleanup_function once we receive USR1 signal
trap 'your_cleanup_function' USR1
echo "starting calculation at $(date)"
# the calculation "computes" (in this case sleeps) for 1000 seconds
# but we asked slurm only for 240 seconds so it will not finish
# the "&" after the compute step and "wait" are important
sleep 1000 &
wait
Those lines were extracted from here
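The same trick can also be handled inside the C++ program instead of the batch script, which is useful if the program itself should flush its results. A sketch, assuming SLURM is set up (as above) to deliver SIGUSR1 to your process ahead of the limit; note that the final SIGKILL at the actual limit cannot be caught, which is why the early warning signal is needed:

#include <csignal>
#include <cstdio>

volatile std::sig_atomic_t got_usr1 = 0;

extern "C" void handle_usr1(int) { got_usr1 = 1; }   // keep the handler trivial

int main()
{
    std::signal(SIGUSR1, handle_usr1);

    const long total_steps = 1000000;                // placeholder for the real work loop
    for (long step = 0; step < total_steps; ++step) {
        // ... one unit of work ...
        if (got_usr1) {
            std::fprintf(stderr, "caught SIGUSR1, saving results and exiting\n");
            // write checkpoints / results here
            break;
        }
    }
}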

Linux Performance Monitoring, any way to monitor per-thread?

I am using Linux Ubuntu, and programming in C++. I have been able to access the performance counters (instruction counts, cache misses etc) using perf_event (actually using programs from this link: https://github.com/castl/easyperf).
However, now I am running a multi-threaded application using pthreads, and need the instruction counts and cycles to completion of each thread separately. Any ideas on how to go about this?
Thanks!
perf is a system profiling tool; it's not like https://github.com/castl/easyperf, which is a library you use in your code. Follow these steps to profile your program with it:
Install perf on Ubuntu. The installation can be quite different in other Linux distributions; you can find an installation tutorial online.
Run your program and get all of its thread ids:
ps -eLf | grep [application name]
Open a separate terminal and run perf as perf stat -t [thread id]; from the man page:
usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
-i, --no-inherit child tasks do not inherit counters
-p, --pid <n> stat events on existing process id
-t, --tid <n> stat events on existing thread id
-a, --all-cpus system-wide collection from all CPUs
-c, --scale scale/normalize counters
-v, --verbose be more verbose (show counter open errors, etc)
-r, --repeat <n> repeat command and print average + stddev (max: 100)
-n, --null null run - dont start any counters
-B, --big-num print large numbers with thousands' separators
There is also an analysis article about perf that can give you a feeling for the tool.
You can use the standard tool for accessing perf_event: perf (from linux-tools). It can work with all threads of your program and report both a summary profile and per-thread (per-pid/per-tid) profiles.
This profile is not exact hardware counter values, but rather the result of sampling every N events, with N tuned so that samples are taken about 99 times per second (99 Hz). You can also try the -c 2000000 option to get one sample every 2 million hardware events. For example, for the cycles event (full list: perf list, or try some of the events listed by perf stat ./program):
perf record -e cycles -F 99 ./program
perf record -e cycles -c 2000000 ./program
Summary over all threads; -n shows the total number of samples:
perf report -n
Per-pid breakdown (actually tids are used here, so it will let you select any thread).
The text variant lists all recorded threads with their summary sample counts (with -c 2000000 you can multiply a sample count by 2 million to estimate the hardware event count for that thread):
perf report -n -s pid | cat
Or use the ncurses-like interactive variant, where you can select any thread and see its own profile:
perf report -n -s pid
Please take a look at the perf tool documentation here; it supports some of the events (e.g. both instructions and cache-misses) that you're looking to profile. Extract from the wiki page linked above:
The perf tool can be used to count events on a per-thread, per-process, per-cpu or system-wide basis. In per-thread mode, the counter only monitors the execution of a designated thread. When the thread is scheduled out, monitoring stops. When a thread migrates from one processor to another, counters are saved on the current processor and are restored on the new one.
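Since you already access perf_event programmatically, it may also help to see how a counter is tied to one thread with perf_event_open: pass the thread id as the pid argument and -1 for cpu. A sketch following the perf_event_open(2) man page (error checking omitted):

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// glibc has no wrapper for this syscall, so provide one.
static long perf_event_open(perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

// Count retired instructions of a single thread (tid), on whatever CPU it runs.
static int open_instruction_counter(pid_t tid)
{
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    return (int)perf_event_open(&attr, tid, -1 /* any cpu */, -1, 0);
}

// int fd = open_instruction_counter(tid);          // tid from gettid() inside the thread
// ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
// ... let the thread run ...
// ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
// uint64_t count; read(fd, &count, sizeof(count));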

Make child process spawned with system() keep running after parent gets kill signals and exits

In a Linux/C++ library I'm launching a process via the system() call,
system("nohup processName > /dev/null&");
This seems to work fine with a simple test application that exits on its own, but if I use this from inside a Node.js/V8 extension which gets a kill signal, the child process gets killed. I did find that running
system("sudo nohup processName > /dev/null&");
With the sudoers file set up to not require a password, this manages to keep running even when the parent process (node) exits. Is there some way to entirely detach the child process, so that signals sent to the parent, and the parent exiting, have no effect on the child anymore? Preferably within the system() call and not something that requires getting the process ID and doing something with it.
The procedure to detach from the parent process is simple: Run the command under setsid (so it starts in a new session), redirecting standard input, output and error to /dev/null (or somewhere else, as appropriate), in background of a subshell. Because system() starts a new shell, it is equivalent to such a subshell, so
system("setsid COMMAND </dev/null >/dev/null 2>/dev/null &");
does exactly what is needed. In a shell script, the equivalent is
( setsid COMMAND </dev/null >/dev/null 2>/dev/null & )
(Shell scripts need a subshell, because otherwise the COMMAND would be under job control for the current shell. That is not important when using system(), because it starts a new shell just for the command anyway; the shell will exit when the command exits.)
The redirections are necessary to make sure the COMMAND has no open descriptors to the current terminal. (When the terminal closes, a TERM signal is sent to all such processes.) This means standard input, standard output, and standard error all must be redirected. The above redirections work in both Bash and POSIX shells, but might not work in ancient versions of /bin/sh. In particular, it should work in all Linux distros.
setsid starts a new session, with COMMAND becoming the process group leader of its own process group. Signals can be directed either to a single process or to all processes in a process group. Termination signals are usually sent to entire process groups (since an application may technically consist of multiple related processes). Starting a new session makes sure COMMAND does not get killed if the process group the parent process belongs to is killed by a process-group-wide signal.
My guess is that the whole process group is being killed. You could try setpgid in the child to start a new process group. The first step should be to get rid of system and use fork and execve or posix_spawn.
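A sketch of that route in C++, combining setsid with a double fork so the detached program is re-parented to init and the caller has nothing left to reap (processName is a placeholder):

#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

// Start 'program' fully detached: new session, no controlling terminal,
// stdio redirected to /dev/null, and not a child of the caller.
static void spawn_detached(const char *program)
{
    pid_t pid = fork();
    if (pid == 0) {                              // first child
        setsid();                                // new session and process group
        int devnull = open("/dev/null", O_RDWR);
        dup2(devnull, STDIN_FILENO);
        dup2(devnull, STDOUT_FILENO);
        dup2(devnull, STDERR_FILENO);
        if (devnull > STDERR_FILENO)
            close(devnull);
        if (fork() == 0) {                       // grandchild: becomes the detached program
            execlp(program, program, (char *)nullptr);
            _exit(127);                          // exec failed
        }
        _exit(0);                                // first child exits; grandchild goes to init
    }
    waitpid(pid, nullptr, 0);                    // reap the first child immediately
}

// spawn_detached("processName");                // instead of system("nohup processName &")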

Kill Bash copy child process to simulate crash

I'm trying to test a Bash script which copies files individually and does some stuff to each file. It is meant to be resumable, so I'd like to make sure to test this properly. What is an elegant solution to kill or otherwise abort the script which does the copying from the test script, making sure it does not have time to copy and process all the files?
I have the PID of the child process, I can change the source code of both scripts, and I can create arbitrarily large files to test on.
Clarification: I start the script in the background with &, get the PID as $!, then I have a loop which checks that there is at least one file in the target directory (the test script copies three files). At that point I run kill -9 $PID, but the process is not interrupted: the files are copied successfully. This happens even if the files are big enough that creating them (with dd and /dev/urandom) takes a couple of seconds.
Could it be that the files are only visible to the shell when cp has finished? It would be a bit strange, but it would explain why the kill command is too late.
Also, the idea is not to test resuming the same process, but cutting off the first process (simulate a system crash) and resuming with another invocation.
Send a KILL signal to the child process:
kill -KILL $childpid
You can try and play the timing game by using large files and sleeps, but you may have an issue with the repeatability of the test.
You can add throttling code to the script you're testing and then just throttle it all the way down. You can implement the throttling by passing in a value which is:
a sleep value for sleeping in the loop
the number of files to process
the number of seconds after which the script will die
a nice value to execute the script at
Some of these may work better or worse from a testing point of view. nice'ing may give you variable results, as will setting up a background process to kill your script after N seconds. You can also try more than one of these at the same time, which may give you the control you want. For example, accepting both a sleep value and the kill seconds could give you fine-grained throttling control.