Setting an environment variable for a specific MPI process - Fortran

I am running a Fortran code in MPI. I need to set an environment variable in one particular process. Is there a way to do this? Calling "system" from the Fortran code does not seem to have an effect. I am running the code via "aprun".

Launcher solution
You should do this with MPMD launching. It works with mpirun or aprun.
Here is an example, where one sets the OMP_NUM_THREADS environment variable differently on one process than the others.
aprun -n 1 -e OMP_NUM_THREADS=1 ./mpi-openmp-app.x input_file.in :
-n 99 -e OMP_NUM_THREADS=10 ./mpi-openmp-app.x input_file.in
This is the heterogeneous equivalent of
aprun -n 100 -e OMP_NUM_THREADS=10 ./mpi-openmp-app.x input_file.in
Please see the aprun man page (man aprun from the command line) for details.
Note that Cray is in the process of switching many sites from ALPS (i.e. aprun) to SLURM (srun), but I'm sure that SLURM supports the same feature.
The mpirun or mpiexec launchers of MPI implementations support a similar feature. The syntax is not specified by the MPI standard, so you need to read the documentation of your MPI implementation for the specifics.
Source code solution
Assuming your environment variable is parsed after MPI is initialized (and before the library that consumes it starts up, e.g. before the OpenMP runtime creates its thread pool), you can do something like the following using setenv, if the launcher solution does not work.
#include <mpi.h>
#include <stdlib.h>

int requested = MPI_THREAD_FUNNELED, provided;
MPI_Init_thread(&argc, &argv, requested, &provided);

int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) {
    int overwrite = 1;
    int rc = setenv("OMP_NUM_THREADS", "1", overwrite); /* rc != 0 signals an error */
}

Obtain a trace of all function invocations? [duplicate]

How can we list all the functions being called in an application? I tried using GDB, but its backtrace lists only up to the main function call.
I need a deeper list, i.e. a list of all the functions called by the main function, the functions called from those functions, and so on.
Is there a way to get this in gdb? Or could you give me suggestions on how to get this?
How can we list all the functions being called in an application
For any realistically sized application, this list will have thousands of entries, which will probably make it useless.
You can find out all functions defined (but not necessarily called) in an application with the nm command, e.g.
nm /path/to/a.out | egrep ' [TW] '
You can also use GDB to set a breakpoint on each function:
(gdb) set logging on # collect trace in gdb.txt
(gdb) set confirm off # you wouldn't want to confirm every one of them
(gdb) rbreak . # set a breakpoint on each function
Once you continue, you'll hit a breakpoint for each function called. Use the disable and continue commands to move forward. I don't believe there is an easy way to automate that, unless you want to use Python scripting.
The already-mentioned gprof is another good option.
You want a call graph. The tool that you want to use is not gdb, it's gprof. You compile your program with -pg and then run it. When it runs a file gmon.out will be produced. You then process this file with gprof and enjoy the output.
record function-call-history
https://sourceware.org/gdb/onlinedocs/gdb/Process-Record-and-Replay.html
This should be a great hardware-accelerated possibility if you are one of the few people (as of 2015) with a CPU that supports Intel Processor Tracing (Intel PT, intel_pt in /proc/cpuinfo).
GDB docs claim that it can produce output like:
(gdb) list 1, 10
1 void foo (void)
2 {
3 }
4
5 void bar (void)
6 {
7 ...
8 foo ();
9 ...
10 }
(gdb) record function-call-history /ilc
1 bar inst 1,4 at foo.c:6,8
2 foo inst 5,10 at foo.c:2,3
3 bar inst 11,13 at foo.c:9,10
Before using it you need to run:
start
record btrace
which is where a CPU without the required support fails with:
Target does not support branch tracing.
CPU support is further discussed at: How to run record instruction-history and function-call-history in GDB?
Related threads:
how to trace function call in C?
Is there a compiler feature to inject custom function entry and exit code?
For embedded, you might also consider JTAG and supporting hardware like ARM's DSTREAM, but x86 support does not seem very good: debugging x86 kernel using a hardware debugger
This question might need clarification to decide between what are currently two answers; it depends on what you need:
1) You need to know how many times each function is being called in straight list/graph format of functions matched with # of calls. This could lead to ambiguous/inconclusive results if your code is not procedural (i.e. functions calling other functions in a branch out structure without ambiguity of what is calling what). This is basic gprof functionality which requires recompilation with -pg flag.
2) You need a list of functions in the order in which they were called, this depends on your program which is the best/feasible option:
a) IF your program runs and terminates without runtime errors you can use gprof for this purpose.
b) ELSE, the option above using gdb with logging and breakpoints is the remaining option, which I learned about upon reading this.
3) You need to know not only the order but also, for example, the function arguments for each call. My current work is simulations in physics of particle transport, so this would ABSOLUTELY be useful in tracking down where anomalous results are coming from... i.e. when the arguments getting passed around stop making sense. I imagine one way to do this would be a variation on what Employed Russian did, except using the following:
(gdb) info args
Logging the results of this command with every break point (set at every function call) gives the args of the current function.
With gdb, if you can find the most deeply nested child function, you can list all its ancestors like this:
gdb <your-binary>
(gdb) b theMostChildFunction ## put breakpoint on the desired function
(gdb) r ## run the program
(gdb) bt ## backtrace starting from the breakpoint
Otherwise, on Linux, you can use the perf tool to trace programs and their function calls. The advantage of this is that it traces all processes, including child processes, and also shows the usage percentages of the functions in the program.
You can install perf like this:
sudo apt install linux-tools-generic
sudo apt install linux-cloud-tools-generic
Before using perf you may also need to remove some kernel restrictions temporarily:
sudo sh -c 'echo 0 >/proc/sys/kernel/kptr_restrict'
sudo sh -c 'echo 0 >/proc/sys/kernel/perf_event_paranoid'
sudo sh -c 'echo 0 >/proc/sys/kernel/yama/ptrace_scope'
After this, you can run your program binary with perf like this:
perf record -g -s -a <your-binary-and-its-flags>
Then you can either view the output in the terminal like this:
perf report
or in a text file like this:
perf report -i perf.data > output.txt
vim output.txt
When you are recording function calls with perf, you may also want to filter out kernel calls with the --all-user flag:
perf record -g -s -a --all-user <your-binary-and-its-flags>
For further information you can look here: https://perf.wiki.kernel.org/index.php/Tutorial

Executing multiple processes of c++ program

I have a C++ program. I am executing it on Linux. I want to execute multiple instances of this program with different arguments, e.g.:
./executableProgram file.txt
./executableProgram file2.txt
./executableProgram file3.txt
In other words, I want to create multiple processes such that each process runs on a different processor.
How can I achieve this task?
Do I need to make some program using fork()? or I need to write some shell script?
Please provide some guidance in this regard.
You could write a bash script to do this:
#!/bin/bash
# Loop over all of the arguments, setting each one to var in turn,
# and run the program on it; the trailing & backgrounds the process.
for var in "$@"; do
    /path/to/executableProgram "$var" &
done
The & backgrounds each process, and the processes should be allocated to different cores by your operating system's scheduler.
You could then call it with:
./Script file*.txt
The '*' wildcard expands to every file matching file*.txt (file1.txt, file2.txt, etc.), and each of them becomes an argument.
If you install the util-linux package on your Linux distribution, you can use the taskset command to start your process on a specific CPU. To start your program on core 0 and then core 5:
$ taskset 0x1 ./executableProgram file.txt
$ taskset 0x20 ./executableProgram file2.txt

parallel run of executable within MPI in C++

I have been using MPI for a while, but I'm not experienced, so
I'm here to ask for advice on the general structure of the following implementation.
Say, I have the main C++ file with
MPI_Init(&narg,&arg);
int me,nprocs;
MPI_Comm_rank(MPI_COMM_WORLD,&me);
MPI_Comm_size(MPI_COMM_WORLD,&nprocs);

int N = 10;
for (int i=0;i<N;i++) {
    //(1) do some stuff in parallel...
    //(2) gather results and write an input file for the executable
    MPI_Barrier(MPI_COMM_WORLD);
    //(3) run the executable in parallel,
    //    which is usually run from the command line as:
    //
    //    mpirun -np 6 external.exe < input.file
    //
    MPI_Barrier(MPI_COMM_WORLD);
    //(4) gather output from the executable, distribute info among processes and keep running
}
MPI_Finalize();
It's step (3) where I have a problem understanding how to do it, and how to tell how many processors it can use. My confusion is also that some kind of "run" command should probably be executed from a single processor/instance. So how do I make it work and let the parallel executable use all the processors that were provided to the main program, if that is possible?
P.S. I saw similar questions here on Stack Overflow, but no definite answer on whether it is possible.
Do you have control over the exe, i.e. can you change its code? If so, I'd suggest re-developing it so that the exe is simply a wrapper around the behavior you need, and then you can link in the actual action into your application.
If that is not an option, I suggest just calling the executable from your master (rank 0) process, and let the others wait. Not super efficient, but it'll do the job:
if (me == 0) {
    system("mpirun -np 6 external.exe < input.file");
}
You'll have to figure out a way to wait until the command is finished, but according to the docs of system and mpirun it should be as simple as checking that the return value from system(...) is zero, and then continuing (after a barrier, as in your example).


BASH scripts for generating inputs to parallel C++ jobs

I'm an amateur C++ programmer trying to learn about basic shell scripting. I have a complex C++ program that currently reads in different parameter values from Parameters.h and then executes one or more simulations with each parameter value sequentially. These simulations take a long time to run. Since I have a cluster available, I'd like to effectively parallelize this job, running the simulations for each parameter value on a separate processor. I'm assuming it's easier to learn shell scripting techniques for this purpose than OpenMPI. My cluster runs on the LSF platform.
How can I write my input parameters in Bash so that they are distributed among multiple processors, each executing the program with that value? I'd like to avoid interactive submission. Ideally, I'd have the inputs in a text file that Bash reads, and I'd be passing two parameters to each job: an actual parameter value and a parameter ID.
Thanks in advance for any leads and suggestions.
my solution
GNU Parallel does look slick, but I ended up (with the help of an IT admin) writing a simple bash script that echoes to screen three inputs (a treatment identifier, treatment/parameter value, and a simulation identifier):
#!/bin/bash
j=1
for treatment in $(cat treatments.txt); do
    for experiment in $(cat simulations.txt); do
        bsub -oo tr_${j}_sim_${experiment}_screen -eo tr_${j}_sim_${experiment}_err -q short_serial "echo \"$j $treatment $experiment\" | ./a.out"
    done
    let j=$j+1
done
The file treatments.txt contains a list of the values I'd like to vary, simulations.txt contains a list of all the simulation identifiers I'd like to run (currently just 1,...,s, where s is the total number of simulations I want for each treatment), and the treatments are indexed 1...j.
Maybe check out: http://www.gnu.org/software/parallel/
edit:
Or, check out the -P argument to xargs, example:
time echo {1..5} | xargs -n 1 -P 5 sleep
Say you want to run the program simulate with inputs foo, bar, baz and quux in parallel, then the simplest way is:
inputs="foo bar baz quux"

# Launch processes in the background with &
children=""
for x in $inputs; do
    simulate "$x" > "$x.output" &
    children="$children $!"
done

# Wait for each to finish
for pid in $children; do
    wait $pid
done

for x in $inputs; do
    echo "simulate '$x' gave:"
    cat "$x.output"
    rm -f "$x.output"
done
The problem is that all simulations are launched at the same time, so if your number of inputs is much larger than your number of CPUs/cores, they may swamp the system.
My best stab at this: background multiple instances of your program and let the OS's scheduler put them on different processors. AFAIK there is no built-in way in any shell to specify which processor a given process should run on.
Something to the effect of:
#!/bin/sh
for arg in foo bar baz; do
./your_program "$arg" &
done