I am trying to understand parallel data writing from Fortran code with MPI. I came across a simple program here.
I compiled and ran the program with the MPI compiler wrapper and got the following error:
sathish@HP-EliteBook:~/Desktop$ mpif90 test.F90 -o test
sathish@HP-EliteBook:~/Desktop$ mpirun -np 4 test
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
I have seen similar issues in other forums attributing this to a wrong MPI installation location or the like. The following was one of the suggested solutions, using LD_PRELOAD:
sathish@HP-EliteBook:~/Desktop$ mpirun -x LD_PRELOAD=libmpi.so -np 4 test
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
The issue still persists. I could not figure out what the problem is for such a simple program.
Short answer:
run as
mpirun -np 4 ./test
instead of:
mpirun -np 4 test
Details:
This is a common problem that happens when your working directory is not in your PATH. The simple solution is to give the full path to the executable. An alternative might be to add the current directory to your PATH variable. However, in this case, even if you add the current directory to the PATH, the order will matter: Linux systems (which seems to be your case) usually come with a program named test that is in the PATH by default.
What is going on is that you do not start the MPI program directly; instead, you start mpirun, which sets up the MPI machinery and then launches your program so it can make use of that machinery. mpirun has to find your program, and that is where the two options suggested above come in: give the full path to your executable, or add its directory to your search path.
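You can see the name clash directly. On a typical Linux install, test without the leading ./ resolves to the coreutils binary, which exits with status 1 when called without arguments, matching the "non-zero exit code" message above (paths may differ on your system):
$ which test
/usr/bin/test
$ mpirun -np 4 ./test
The ./ forces mpirun to launch the executable from the current directory.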
As I understand from this link, MPI and DPC++ can be used together: https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/Intel-MPI-support-GPU-Computing/td-p/1204653?profile.language=de
I am trying to use GDB on a simple MPI + DPC++ program found on Intel’s GitHub page: https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/DPC%2B%2B/ParallelPatterns/dpc_reduce
When I do
mpirun -n 4 -gdb ./mpi_code
mpi_gdb attaches to the 4 processes. The gdb commands also work, except when I put a breakpoint inside the GPU offloading part (e.g. at line 618): GDB completely skips this breakpoint and moves on to the next one.
Is there anything I am missing? Any parameter, environment variable, or flag I need to set?
Why is gdb showing that the program exited during startup, before stopping at the first breakpoint in the main function?
Some steps:
$ gdb --cd $programhome -tui -tty $reservedtty --args myprogram
b main
r
gdb shows:
Starting program: myprogram
During startup program exited with code 1.
I have already tried breaking at the exit() function, without success.
Why is gdb exiting before stopping at the first breakpoint in the main function?
GDB is not exiting. Your program does.
It does exit before reaching main.
This can happen for a few reasons, such as:
Corrupt binary -- the kernel rejects it in execve system call for some reason and not a single instruction of the program actually runs.
The dynamic linker rejects it (e.g. because some required library or symbol is missing)
Your shell refuses to execute the program (bad ~/.bashrc, bad $PATH, etc).
You can narrow down the actual cause by running the program outside GDB (does it run?), running without ~/.bashrc, using (gdb) catch syscall exit_group (on Linux), etc.
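A minimal session with the syscall catchpoint could look like this (Linux only; the exact catchpoint output varies between GDB versions):
(gdb) catch syscall exit_group
Catchpoint 1 (syscall 'exit_group')
(gdb) run
...
(gdb) bt
The backtrace taken when the catchpoint triggers usually shows who decided to exit before main was reached.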
There was a permission issue accessing the secondary terminal port.
gdb is being started with the -tty parameter, which switches the program's input/output to another tty (in this case a pseudo-terminal, pts).
The problem occurs when the two terminals are opened by different users. Even if you change the user with the su command after the first logon, the user who originally logged in needs to be the same on both ttys.
I have found several conflicting answers on this topic. This blog post requires libunwind, but that doesn't work on Mac OS X. I put #include <google/profiler.h> in my code, but my compiler (g++) could not find it. I installed gperftools via Homebrew. In addition, I found this Stack Overflow question showing this:
Then I ran pprof to generate the output:
[hidden ~]$ pprof --text ./a.out cpu.profile
Using local file ./a.out.
Using local file cpu.profile.
Removing __sigtramp from all stack traces.
Total: 282 samples
107 37.9% 37.9% 107 37.9% 0x000000010d72229e
16 5.7% 43.6% 16 5.7% 0x000000010d721a5f
12 4.3% 47.9% 12 4.3% 0x000000010d721de8
...
Running that command (without any of the prior steps) gets me this:
[hidden]$ pprof --text ./a.out cpu.profile
Using remote profile at ./a.out.
Failed to get the number of symbols from http://cpu.profile/pprof/symbol
Why does it try to access an internet site on my machine but a local file on theirs?
Attempting to link libprofiler as a dry run with g++ gets me:
[hidden]$ g++ -l libprofiler
ld: library not found for -llibprofiler
clang: error: linker command failed with exit code 1 (use -v to see invocation)
I have looked at the man pages, the help option text, the official online guide, blog posts, and many other sources.
I am so confused right now. Can someone help me use gperftools?
The result of my conversation with @osgx was this script. I tried to clean it up a bit. It likely contains quite a few unnecessary options too.
The blog post https://dudefrommangalore.wordpress.com/2012/02/09/profiling-c-code-using-google-performance-tools/ "Profiling C++ code using Google Performance Tools" 2012 by dudefrommangalore missed the essential step.
You should link your program (the one you want to profile) with the CPU profiler library of gperftools.
Check official manual: http://goog-perftools.sourceforge.net/doc/cpu_profiler.html, part "Linking in the Library"
add -lprofiler to the link-time step for your executable. (It's also probably possible to add in the profiler at run-time using LD_PRELOAD, but this isn't necessarily recommended.)
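For example, assuming your sources are in main.cc and gperftools is installed where the linker can find it, the link step would look like:
g++ -o program_to_be_profiled main.cc -lprofiler
(On a Homebrew install you may additionally need to point the compiler at the gperftools include and library directories with -I and -L.)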
The second step is to collect the profile: run the code with profiling enabled. In the Linux world this is done by setting the controlling environment variable CPUPROFILE before running:
CPUPROFILE=name_of_profile ./program_to_be_profiled
The third step is to use pprof (google-pprof in the Ubuntu world). Check that a non-empty name_of_profile profile file was generated; if there is no such file, pprof will try to do a remote profile fetch (you saw the output of such an attempt).
pprof ./program_to_be_profiled name_of_profile
First you need to run your program with profiling enabled.
That usually means first linking your program with libprofiler and then running it with CPUPROFILE=cpu.profile set.
I.e.
$ CPUPROFILE=cpu.profile my_program
I think that latter step is what you have been missing.
The program will create this cpu.profile file when it exits. You can then use pprof (preferably from github.com/google/pprof) on it to visualize/analyze the profile.
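Putting the steps together, assuming a source file main.cc and the file names used above:
g++ -o my_program main.cc -lprofiler
CPUPROFILE=cpu.profile ./my_program
pprof --text ./my_program cpu.profile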
I have Fortran code that has been parallelized with OpenMP. I want to test my code on my PC before running it on an HPC system. My PC has a dual-core CPU and I work on Linux Mint. I installed gfortran-multilib and this is my script:
#!/bin/bash
### Job name
#PBS -N pme
### Keep Output and Error
#PBS -j eo
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
### Switch to the working directory;
cd $PBS_O_WORKDIR
### Run:
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
What more should I do to run my code?
OK, I changed the script as suggested in the answers:
#!/bin/bash
### Switch to the working directory;
cd Desktop/test
### Run:
OMP_NUM_THREADS=2
export OMP_NUM_THREADS
ulimit -s unlimited
./a.out
echo 'done'
My code and its executable file are in the folder test on the Desktop, so:
cd Desktop/test
Is this correct?
Then I compile my simple code:
      implicit none
!$OMP PARALLEL
      write(6,*)'hi'
!$OMP END PARALLEL
      end
by command:
gfortran -fopenmp test.f
and then run it with:
./a.out
but only one "hi" is printed as output. What should I do?
(And a question about this site: in a situation like this, should I edit my post or just add a comment?)
You don't need, and probably don't want, to use the script on your PC. Not even to learn how to use such a script, because these scripts are too tied to the specifics of each supercomputer.
I use several supercomputers/clusters and I cannot just reuse the script from one on another, because they differ so much.
On your PC you should just do:
optional, it is probably the default
export OMP_NUM_THREADS=2
to set the number of OpenMP threads to 2. Adjust if you need some other number.
cd to the working directory
cd my_working_directory
Your working directory is the directory where you have the required data or where the executable resides. In your case it seems to be the directory where a.out is.
run the damn thing
ulimit -s unlimited
./a.out
That's it.
You can also store the standard output and error output to a file
./a.out > out.txt 2> err.txt
to mimic the supercomputer behaviour.
The PBS variables are only set when you run the script using qsub. You probably don't have that on your PC and you probably don't want to have it either.
$PBS_O_WORKDIR is the directory where you run the qsub command, unless you set it differently by other means.
$PBS_NUM_PPN is the number you indicated in #PBS -l nodes=1:ppn=2. The queue system reads that and sets this variable for you.
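For reference, these PBS variables only receive their values when the script goes through the queue system, i.e. when you submit it on the cluster with something like (the script name is just an example):
qsub job_script.pbs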
The script you posted is for the Portable Batch System (https://en.wikipedia.org/wiki/Portable_Batch_System) queue system. That means that the job you want to run on the HPC infrastructure first has to go into the queue system, and when the resources are available the job will run on the system.
Some of the commands (those starting with #PBS) are specific to this queue system. Among these commands, some allow the user to indicate the application process hierarchy (i.e. the number of processes and threads). Also keep in mind that, since all the PBS commands start with #, they are ignored by regular shell script execution. In the case you presented, that hierarchy is given by
### Specify the number of nodes and thread (ppn) for your job.
#PBS -l nodes=1:ppn=2
which, as the comment indicates, tells the queue system that you want to run 1 process and that each process will have 2 threads. The queue system is likely to pass these parameters to the process launcher (srun/mpirun/aprun/... for MPI apps), in addition to setting OMP_NUM_THREADS for OpenMP apps.
If you want to run this job on a computer that does not have PBS queue, you should be aware at least of two things.
1) The following command
### Switch to the working directory;
cd $PBS_O_WORKDIR
will be translated into a plain "cd" (i.e. a change to your home directory), because the environment variable PBS_O_WORKDIR is only defined within a PBS job context. So you should change this command (or execute another cd command just before the execution) to set where you want to run the job.
2) Similarly for PBS_NUM_PPN environment variable,
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
this variable won't be defined if you don't run this within a PBS job context, so you should set OMP_NUM_THREADS to the value you want (2, according to your question) manually.
If you want your Linux box environment to be like an HPC login node, you can do the following:
Make sure that your compiler supports OpenMP; test a simple hello-world program with the OpenMP flags
Install OpenMPI on your system from your favourite package manager or download the source/binary from the website (OpenMPI Download)
I would not recommend installing cluster manager like Slurm for your experiments
After you are done, you can execute your MPI programs through the mpirun wrapper
mpirun -n <no_of_cores> <executable>
EDIT:
This assumes you are running MPI only. Note that OpenMP utilizes the cores as well. If you are running MPI+OpenMP, then n * OMP_NUM_THREADS should equal the number of cores on a single node.
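For example, on a hypothetical 8-core node you could split the cores between MPI ranks and OpenMP threads like this (the numbers are only illustrative):
export OMP_NUM_THREADS=4
mpirun -n 2 ./a.out
Here 2 ranks times 4 threads use all 8 cores.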
How can I redirect runtime errors of a C++ executable in bash? I've found that 2> helps when trying to identify compilation errors:
g++ example.cpp 2> compErr.txt
But running the executable with that command still sends the errors to stdout:
$ ./a.out 2> e.txt
Floating point exception (core dumped)
Actually, the error "Floating point exception (core dumped)" does not come from the executable but from the shell! The messages from bash won't be suppressed by output redirection but there is a flag to enable/disable these messages.
You can install signal handlers for some of the errors that would cause the program to exit, and write something to a suitable destination there. Some signals can't be intercepted and some others are hard to handle. That is the approach you can take from inside your code.
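A minimal sketch of that approach, assuming the error of interest is SIGFPE (e.g. a division by zero); the handler sticks to async-signal-safe calls (write, _exit) and reports to stderr, which can then be redirected with 2>:
#include <csignal>
#include <unistd.h>

void fpe_handler(int) {
    // Only async-signal-safe functions should be called from a signal handler.
    const char msg[] = "caught SIGFPE: floating point exception\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(1);
}

int main() {
    std::signal(SIGFPE, fpe_handler);
    volatile int zero = 0;
    return 1 / zero;   // integer division by zero raises SIGFPE on Linux
}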
If you want to go further you could fork() your program first thing and have the actual work done in the child process. The parent process would essentially just waitpid() for the child process and use the information in the result structure received to report errors to a file.
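A rough sketch of the fork()/waitpid() variant; the child does the real work (here it deliberately crashes), while the parent records in err.txt how the child ended (the file name and real_work() are only placeholders):
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int real_work() {
    volatile int zero = 0;
    return 1 / zero;               // deliberately crashes with SIGFPE
}

int main() {
    pid_t pid = fork();
    if (pid < 0)
        return 1;                  // fork failed
    if (pid == 0)
        return real_work();        // child: do the actual work
    int status = 0;
    waitpid(pid, &status, 0);      // parent: wait for the child to finish
    std::FILE *log = std::fopen("err.txt", "w");
    if (!log)
        return 1;
    if (WIFSIGNALED(status))
        std::fprintf(log, "child killed by signal %d\n", WTERMSIG(status));
    else if (WIFEXITED(status))
        std::fprintf(log, "child exited with code %d\n", WEXITSTATUS(status));
    std::fclose(log);
    return 0;
}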
I found something that worked in my terminal, here: http://bytes.com/topic/c/answers/822874-runtime-error-stderr-gcc-ubuntu-bash
In summary, a participant explained:
In this particular case, the reason that the string "Floating point exception" is not redirected is that it is not produced by the process that runs ./{file} or anything that it invokes. Instead, it is being produced by the command interpreter itself.
You can see this by telling the command interpreter to run another command interpreter, redirecting this sub-interpreter's error output. However, a bit of a trick is required:
$ bash -c './{file}; true' >out 2>err
$ cat out
$ cat err
bash: line 1: 28106 Floating point exception ./test_fail