Detaching ncu profiler while leaving the profiled program running - nsight-compute

I am currently using the Nsight Compute CLI to profile DNN training. I use the following command to launch the profiler together with the program. (I will abbreviate the metrics part, since it is not the core concern of this topic.) If I run this script with appropriate command line arguments, the profiler and the program run fine and the log and report are created.
#!/bin/bash
path_to_report=$1
path_to_script=$2
/usr/local/NVIDIA-Nsight-Compute/ncu \
--log-file ./temp-report \
-o $path_to_report \
--print-summary per-gpu \
--target-processes all \
--metrics some_metrics_blah_blah \
--force \
$path_to_script
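For example, it might be invoked along these lines (the names profile.sh, dnn-report and train_dnn.sh are placeholders, not from the original setup):
# $1 = path for the ncu report, $2 = script that launches the training
./profile.sh ./dnn-report ./train_dnn.sh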
However, I want to exit the profiler (or the profiling process) after a certain time (say, 5 minutes) but keep the program running. The profiler adds a large overhead, so I want to stop profiling while the DNN training continues. But when I tried to terminate the ncu-related process with the kill command, not only the ncu process but also the training program exited. So I need help with this issue. Is there any way to achieve my objective?
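For reference, the attempt described above amounts to something like the following sketch, run from a second shell (the pkill pattern is an assumption):
sleep 300                              # wait ~5 minutes of profiling
pkill -f NVIDIA-Nsight-Compute/ncu     # as described above, this also terminated the training program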

Related

Get user events from VTune doesn't work with attach to process

TL;DR:
I am attempting to run a command-line VTune attach-to-process analysis for some code instrumented with the application instrumentation library (ITT) supplied by Intel. I have succeeded in collecting user events when running the application from within VTune (both command line and GUI). When I use the -target-pid command-line option to attach to the same application, the user events do not show up in the profile. The environment setup suggested in the instructions for attaching to a process does not work.
The long version
I have broken this down again and again and reduced it to a minimal reproduction. I am running Ubuntu 20.04 with Intel VTune installed as part of the oneAPI installer package. I have built an example application, which I can share, but it basically spawns threads and does some random computations. I have instrumented the code with ITT as follows:
#include <ittnotify.h>
__itt_event cloud_in_event = __itt_event_create( "CloudIn", 7 );
...
void add() {
    __itt_event_start( cloud_in_event );
    ...
This works correctly when run through the GUI. That is, I compile my application with the following:
g++ -g -O3 -fno-asm -std=c++17 -I/opt/intel/oneapi/vtune/latest/sdk/include -DUSE_THR example.cpp -g -o ./example -lpthread -lm -L/opt/intel/oneapi/vtune/2021.4.0/sdk/lib64 -littnotify -ldl -D_LINUX
I start the GUI using:
. /opt/intel/oneapi/setvars.sh && vtune-gui &
Then I run it using the CPU hotspots analysis in hardware sampling mode. The application runs and I get this in the output (screenshot: the VTune results show the CloudIn user event):
Yay, my user event is there. All is well.
The equivalent command line also works:
/opt/intel/oneapi/vtune/2021.4.0/bin64/vtune -collect hotspots -knob sampling-mode=hw -knob stack-size=0 -app-working-dir /home/development/example/example --app-working-dir=/home/development/example/example -- /home/development/hovermap/example/example
However, if I run the application on its own (with the correct link-path setup in the INTEL_LIBITTNOTIFY environment variables) and then attach to that process with the GUI (or with the command line), there are no user events (i.e., the CloudIn event shown above) in the profiler data.
If I print out the environment variables in the application, there are vast differences between the environment when profiling directly and when attaching. For example, the following variables:
INTEL_JIT_PROFILER32=/opt/intel/oneapi/vtune/2021.4.0/lib32/runtime/libittnotify_collector.so
INTEL_JIT_PROFILER64=/opt/intel/oneapi/vtune/2021.4.0/lib64/runtime/libittnotify_collector.so
ENABLE_JITPROFILING=1
exist in the GUI-based run environment, but the setup instructions say nothing about these environment variables. I have also tried setting them, with no luck.
Any ideas what else I need to set up?
If you want to attach to an application that uses the ITT API, you need to set up additional environment variables before running it, for example:
export INTEL_LIBITTNOTIFY32=/opt/intel/oneapi/vtune/2021.4.0/lib32/runtime/libittnotify_collector.so
export INTEL_LIBITTNOTIFY64=/opt/intel/oneapi/vtune/2021.4.0/lib64/runtime/libittnotify_collector.so
./example
These environment variables are described in the Attach ITT APIs to a Launched Application help topic in the VTune User Guide.
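For reference, a hedged sketch of the full attach workflow under these settings; the result directory name and the use of pidof are assumptions, and the paths match the install from the question:
# launch the instrumented app with the ITT collector configured via the environment
export INTEL_LIBITTNOTIFY32=/opt/intel/oneapi/vtune/2021.4.0/lib32/runtime/libittnotify_collector.so
export INTEL_LIBITTNOTIFY64=/opt/intel/oneapi/vtune/2021.4.0/lib64/runtime/libittnotify_collector.so
./example &
# attach from the command line using the -target-pid option mentioned above
vtune -collect hotspots -knob sampling-mode=hw -target-pid $(pidof example) -r attach_result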

CLion remote debugging will not kill remote process

I have the newest version of CLion (2020.3 EAP at the moment) and I currently use it to remote-debug a program on an embedded target (linux-mipsel).
Everything works as expected after a bit of configuration, using a self-built cross-toolchain and gdbserver.
My only problem is that hitting the "red square" to stop execution will neither kill the running program nor gdbserver itself.
This means that on the next iteration of the edit-compile-debug cycle I will have two copies of both (more, if I keep going), which will not work because each tries to open the same resources (e.g. a serial port) concurrently.
I have to log into the target manually and kill the offending processes.
Am I missing something, or is it a known bug?
Small update:
gdbserver is actually killed (it does not show up in ps ax), but the underlying program (the debuggee) is still there. I am unsure why I was convinced otherwise, my bad.
This is a known issue and will hopefully be fixed soon.
Here is the link to the youtrack issue: https://youtrack.jetbrains.com/issue/CPP-20346
You could try the suggested workarounds:
Add a pre-deploy configuration which kills running instances of the program
Follow the instructions for the GDB configuration in the comments:
GDB Server: /bin/bash
GDB Server args: -c "gdbserver :1234 /home/pi/myapp; pkill -e myapp"
The second configuration did not work for me, so I added an external tool step that runs /bin/bash with the arguments -c "pkill -e myapp || true". The || true is mandatory to avoid an error if the program is not running.
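For example, the pre-deploy / external tool step can be as small as this sketch (run on the target; myapp is the binary name from the workaround above):
#!/bin/sh
# kill any leftover instance of the debuggee before the next debug session;
# `|| true` keeps the step from failing when nothing is running
pkill -e myapp || true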

My System V init script doesn't return

This is the script content, located in /etc/init.d/myserviced:
#!/lib/init/init-d-script
DAEMON="/usr/local/bin/myprogram.py"
NAME="myserviced"
DESC="The description of my service"
When I start the service (either by calling the script directly or by running sudo service myserviced start), I can see the program myprogram.py run, but the command does not return to the prompt.
I guess there must be something that I misunderstood, so what is it?
The system is Debian, running on a Raspberry Pi.
After more work, I finally solved this issue. There are two main reasons:
init-d-script actually calls start-stop-daemon, which does not work well with scripts specified via the --exec option. When stopping scripts, you should only specify the --name option. However, since init-d-script always fills in the --exec option, it cannot be used with script daemons. I had to write the SysV script myself.
start-stop-daemon won't magically daemonize the thing you provide, so the executable you hand to start-stop-daemon should daemonize itself; it cannot be a regular foreground program.
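A minimal, untested sketch of the kind of hand-written SysV script described above, reusing the names from the question; the use of --background/--make-pidfile and the pidfile path are assumptions, not the author's exact script:
#!/bin/sh
### BEGIN INIT INFO
# Provides:          myserviced
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: The description of my service
### END INIT INFO
DAEMON=/usr/local/bin/myprogram.py
NAME=myserviced
PIDFILE=/var/run/$NAME.pid
case "$1" in
  start)
    # --background daemonizes the script; --make-pidfile records its PID
    start-stop-daemon --start --background --make-pidfile \
      --pidfile "$PIDFILE" --startas "$DAEMON"
    ;;
  stop)
    # stop via the pidfile rather than --exec, which does not work for scripts
    start-stop-daemon --stop --pidfile "$PIDFILE"
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac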

running parallel code on PC

I have Fortran code that has been parallelized with OpenMP. I want to test my code on my PC before running it on the HPC system. My PC has a dual-core CPU and I work on Linux Mint. I installed gfortran-multilib and this is my script:
#!/bin/bash
### Job name
#PBS -N pme
### Keep Output and Error
#PBS -j eo
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
### Switch to the working directory;
cd $PBS_O_WORKDIR
### Run:
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
What more should I do to run my code?
OK, I changed the script as suggested in the answers:
#!/bin/bash
### Switch to the working directory;
cd Desktop/test
### Run:
OMP_NUM_THREADS=2
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
My code and its executable are in the folder test on the Desktop, so:
cd Desktop/test
Is this correct?
Then I compile my simple code:
      implicit none
!$OMP PARALLEL
      write(6,*)'hi'
!$OMP END PARALLEL
      end
with the command:
gfortran -fopenmp test.f
and then run it with:
./a.out
but only one "hi" is printed as output. What should I do?
(And a question about this site: in a situation like this, should I edit my post or just add a comment?)
You don't need, and probably don't want, to use the script on your PC. Not even to learn how to use such a script, because these scripts are too closely tied to the specifics of each supercomputer.
I use several supercomputers/clusters and I cannot just reuse the script from one on another, because they are so different.
On your PC you should just do:
optional, it is probably the default
export OMP_NUM_THREADS=2
to set the number of OpenMP threads to 2. Adjust if you need some other number.
cd to the working directory
cd my_working_directory
Your working directory is the directory where you have the required data or where the executable resides. In your case it seems to be the directory where a.out is.
run the damn thing
ulimit -s unlimited
./a.out
That's it.
You can also redirect the standard output and error output to files
./a.out > out.txt 2> err.txt
to mimic the supercomputer behaviour.
The PBS variables are only set when you run the script using qsub. You probably don't have that on your PC and you probably don't want to have it either.
$PBS_O_WORKDIR is the directory where you run the qsub command, unless you set it differently by other means.
$PBS_NUM_PPN is the number you indicated in #PBS -l nodes=1:ppn=2. The queue system reads that and sets this variable for you.
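Putting it together, a minimal PC-only version of the script might look like this sketch (the Desktop/test path and the two threads come from the question; the output redirection is optional):
#!/bin/bash
# no queue system: set by hand what the PBS directives would have provided
export OMP_NUM_THREADS=2        # replaces OMP_NUM_THREADS=$PBS_NUM_PPN
cd ~/Desktop/test               # replaces cd $PBS_O_WORKDIR
ulimit -s unlimited
./a.out > out.txt 2> err.txt    # keep stdout/stderr, as on the supercomputer
echo 'done'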
The script you posted is for the Portable Batch System (https://en.wikipedia.org/wiki/Portable_Batch_System) queue system. That means that the job you want to run on the HPC infrastructure has to go into the queue system first, and when the resources are available the job will run on the system.
Some of the commands (those starting with #PBS) are specific to this queue system. Among these commands, some allow the user to indicate the application process hierarchy (i.e. the number of processes and threads). Also, keep in mind that since all the PBS commands start with #, they are ignored during regular shell script execution. In the case you presented, that is given by:
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
which, as the comment indicates, tells the queue system that you want to run 1 process and that each process will have 2 threads. The queue system is likely to pass these parameters to the process launcher (srun/mpirun/aprun/... for MPI apps, in addition to OMP_NUM_THREADS for OpenMP apps).
If you want to run this job on a computer that does not have a PBS queue, you should be aware of at least two things.
1) The following command
### Switch to the working directory;
cd $PBS_O_WORKDIR
will effectively become a bare "cd" because the environment variable PBS_O_WORKDIR is only defined within a PBS job context. So you should change this command (or execute another cd command just before the execution) in order to set where you want to run the job.
2) Similarly for the PBS_NUM_PPN environment variable:
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
This variable won't be defined if you don't run this within a PBS job context, so you should manually set OMP_NUM_THREADS to the value you want (2, according to your question).
If you want your Linux box environment to be like an HPC login node, you can do the following:
Make sure that your compiler supports OpenMP; test a simple hello-world program with the OpenMP flags.
Install OpenMPI on your system from your favourite package manager, or download the source/binary from the website (OpenMPI Download).
I would not recommend installing a cluster manager like Slurm for your experiments.
After you are done, you can execute your MPI programs through the mpirun wrapper:
mpirun -n <no_of_cores> <executable>
EDIT:
This assumes that you are running MPI only. Note that OpenMP utilizes the cores as well; if you are running MPI+OpenMP, then n * OMP_NUM_THREADS should equal the number of cores on a single node.
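As a concrete sketch of that accounting on the dual-core PC from the question (the rank/thread split is illustrative):
# 2 MPI ranks x 1 OpenMP thread each = 2 cores
export OMP_NUM_THREADS=1
mpirun -n 2 ./a.out
# or 1 MPI rank x 2 OpenMP threads = 2 cores
export OMP_NUM_THREADS=2
mpirun -n 1 ./a.out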

Libssh2: prevent background task from being killed

I am writing a program that logs into another system via SSH using the libssh2 library. Once logged in, I execute a command using:
libssh2_channel_exec(sshchannel, command)
The command executes fine. However, once I close the channel, the running process is killed. In my case the command (which runs a binary executable) will run for a long period of time, and my program cannot wait for it to terminate. I've tried issuing the following commands, all with the same result (the process is still killed upon closing the channel):
/path/myprog
nohup /path/myprog
nohup /path/myprog &
/path/myprog &; disown
Further, I've observed this behavior for both libssh and libssh2. Is there some option or command I am missing?
Thanks in advance.
You can use the Unix at command:
echo "cmd" | at now