How can I change MPI error handler before MPI_init? - fortran

I have a Fortran code which is predominantly used for running large MPI calculations, but occasionally I want to run it on a machine which does not have MPI available for quick data processing tasks.
I have a logical variable that determines whether the program runs in MPI mode; when it is false, the code does not call any MPI subroutines. At the moment I decide whether to run in MPI mode by testing whether a dummy file "NO_MPI" exists. I dislike this because it is inelegant and because I usually forget to create the file when running in a non-MPI environment.
When the code is run in a non-MPI environment the MPI_init subroutine will fail and cause the program to crash. So what I would like to do is call it with the optional error output:
call MPI_init(ierr)
and then if an error is returned continue with the node in non-MPI mode.
The issue is that the default MPI error handler will abort the program without returning an error. For any other MPI subroutine the solution would be:
call MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN)
to tell MPI not to abort and to return the error code instead. However, I cannot call this subroutine before MPI_init without getting the error
Attempting to use an MPI routine before initializing MPICH.
Is there any way to change the default error handler before calling MPI_init, perhaps specifying it with a compiler flag? I am using the IFORT compiler with MPICH.
I feel that there must be some way of testing the error code from MPI_init. Otherwise, what is the point of allowing MPI_init to accept the optional error return as an argument when the default behaviour is to abort without returning errors?

Related

MPI_Init_thread function was called before MPI_Init was invoked

I need an example of calling MPI_Init_thread in Fortran.
I tried it without calling MPI_Init first and got the message: "MPI_Init_thread function was called before MPI_Init was invoked".
When I called it after MPI_Init, I got the message "Calling MPI_Init or MPI_Init_thread twice is erroneous.", although I had called each of them only once.
I am confused.
Also, does it take three or four arguments, and what are they? I believe it takes three and that the second one should be MPI_THREAD_MULTIPLE, but the program aborts.
It would be best if someone who uses it from Fortran could post an example call.
For completeness, here is a simple code I am trying to run in hybrid (MPI + OpenMP) mode.
My goal is to run this job on two nodes, each with 32 threads, using OpenMP within each node.
program hello
Use mpi
integer ierr,np,pid,inull1,inul2,hug
call MPI_Init(ierr)
inull2=3
! call MPI_Init_thread(inull1,inull2,ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, pid,ierr)
call MPI_Comm_size(MPI_COMM_WORLD, np, ierr)
write(6,*) 'inull2,MPI_THREAD_MULTIPLE',MPI_THREAD_MULTIPLE
hug=huge(inull1)
!$OMP PARALLEL SHARED(inull1,inull2,hug,MPI_THREAD_MULTIPLE,np) &
!$OMP PRIVATE(pid)
!$OMP DO SCEDULE (STATIC)
do i=1,hug
write(6,*) 'program hello_world i,np,pid,inull2,mtm',i,np,pid,inull2,MPI_THREAD_MULTIPLE
enddo
!$OMP END DO
!$OMP END PARALLEL
call MPI_Finalize(ierr)
end program hello
I compile it with
mpif90 -o hello_world_openmp.exe hello_world_openmp.f90
and run it with the following command (for now)
mpirun -hostfile hostfile -pernode -bind-to none hello_world_openmp.exe
Changing the code to
integer ierr,np,pid,inull1,inul2,hug
! call MPI_Init(ierr)
call MPI_Init_thread(MPI_THREAD_FUNNELLED,inull2,ierr)
also gives the following error:
*** The MPI_Init_thread() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
The error message you quote is strange, but it has been reported before (https://www.mail-archive.com/devel#lists.open-mpi.org/msg19978.html and https://github.com/TRIQS/cthyb/issues/122).
It points to some incompatibility somewhere. In your case it points to an incorrect call to MPI_Init_thread. The main point is:
Always use IMPLICIT NONE!
This cannot be over-stressed. Even in a very short code you must use it. You declared inul2 instead of inull2, and you used MPI_THREAD_FUNNELLED instead of MPI_THREAD_FUNNELED. After correcting that, your code runs correctly for me.
One calls MPI_Init_thread instead of MPI_Init, not after.
You call it like
call MPI_Init_thread(required, provided, ie)
where all arguments are integers, required is an input argument saying which threading level you need and provided is an output argument and says which threading level you got.
Be aware that for many MPI implementations MPI_THREAD_MULTIPLE is either not supported at all or is very slow. I suggest designing your programs without the need for MPI_THREAD_MULTIPLE.
The usage can be something like
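integer :: ie
integer :: required, provided
required = MPI_THREAD_SERIALIZED
call MPI_Init_thread(required, provided, ie)
if (ie /= MPI_SUCCESS) then
   error stop "MPI_Init_thread failed"   ! MPI could not be initialised
end if
if (provided < required) then
   ! the library cannot provide the requested threading level
   print *, "requested thread level", required, "but got", provided
   call MPI_Abort(MPI_COMM_WORLD, 1, ie)
end if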

how to stop a fortran program abnormally

When an exception occurs I would like to terminate my program abnormally. Right now, when an exception happens, a write statement with an explanatory sentence is executed, and then a stop statement is called.
I am debugging the program with idb (the Intel debugger). When the exception happens I see the output of the write statement, but idb treats the program as having terminated normally. I would like the program to terminate abnormally when the exception happens, so that I can get a backtrace and inspect memory at the place where the exception happened.
I have tried changing stop to stop 1, so that a non-zero value is returned, but this doesn't work.
EDIT:
I have implemented the solution from one of the answers:
interface
subroutine abort() bind(C, name="abort")
end subroutine
end interface
print *,1
call abort()
print *,2
end
With this solution I still do not get any backtrace when using ifort 13.0.1, but it works perfectly with ifort 14.0.2.
I have resorted to using idb instead of gdb, because the latter often cannot read the values of allocatable arrays in Fortran.
There are non-standard extensions for this. GFortran provides the backtrace() subroutine to print a backtrace from anywhere; for the Intel equivalent see wander95's answer https://stackoverflow.com/a/38905855/721644.
In ifort and gfortran you can call the abort() subroutine and you will get backtrace if you used the -traceback (Intel) or -g -fbacktrace (gfortran) compiler option.
You could also call the C abort() directly using C interoperability (this is also non-standard and may not work in all circumstances):
interface
subroutine abort() bind(C, name="abort")
end subroutine
end interface
print *,1
call abort()
print *,2
end
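A related option is a small C helper that prints the call stack itself before aborting. This is an alternative to the compiler-specific routines above, not something from the answers. The sketch assumes Linux with glibc: `backtrace()` and `backtrace_symbols_fd()` come from `<execinfo.h>` and are glibc extensions, not ISO C. `dump_backtrace` and `fatal_error` are hypothetical helper names. Because `fatal_error` ends in `abort()`, the process still dies with SIGABRT, so a debugger stops at that point rather than seeing a normal exit.

```c
#include <execinfo.h> /* glibc extension: backtrace, backtrace_symbols_fd */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* STDERR_FILENO */

/* Write up to 64 frames of the current call stack to file descriptor fd.
   Returns the number of frames captured. */
int dump_backtrace(int fd)
{
    void *frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, fd);
    return n;
}

/* Hypothetical error routine: print a message and a backtrace, then
   terminate abnormally with SIGABRT so a debugger stops here. */
void fatal_error(const char *msg)
{
    fprintf(stderr, "fatal error: %s\n", msg);
    fflush(stderr);
    dump_backtrace(STDERR_FILENO);
    abort();
}
```

From Fortran, `fatal_error` could be declared in an interface block like the abort() one above. It would take a `character(kind=c_char), dimension(*)` argument and be called with a string ending in `c_null_char`. To get function names instead of bare addresses, the executable usually needs to be linked with `-rdynamic` (and compiled with `-g`).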
Fortran 2008 introduced the ERROR STOP statement. It is mainly used in Coarray Fortran programs to initiate error termination on all images.
Found this old question by accident. If you want abnormal termination with the Intel compiler, you can use the routine TRACEBACKQQ. The calling sequence can be:
call TRACEBACKQQ(string=string,user_exit_code=user_exit_code)
To quote the manual:
Provides traceback information. Uses the Intel® Fortran run-time library traceback facility to generate a stack trace showing the program call stack as it appeared at the time of the call to TRACEBACKQQ( )
I've never used idb, I've only used gdb, so this might not work. I just put a read statement in at the error point, so that the program stops and waits for input. Then I can CTRL-C it, which causes gdb to pause execution, from which I can get a backtrace, move up and down the stack, look at variables, etc.

C++ and Lua - Unprotected Error (bad callback)? How is this possible

I'm working with LuaJIT's FFI and I'm getting very strange results. This returns a PANIC: Unprotected Error (bad callback):
function idle(ms)
myDLL.myDLL_idle(session, ms)
end
But this simple print has fixed the problem.
function idle(ms)
print("anything")
myDLL.myDLL_idle(session, ms)
end
Another extremely odd fix is to call myDLL.myDLL_idle() directly in the main chunk instead of inside a function. How can this even be possible? Nor can I put just any arbitrary call before it: if the call is inside a function, the only statements guaranteed to make it work are a print or a sleep.
function idle(ms)
myDLL.myDLL_idle(session, ms)
end
myDLL.myDLL_idle(session, ms) -- works
idle(ms) -- doesn't work (unless first line of idle() is a print statement)
It's doing the same thing, just inside another function. And the fact that a print fixes it when the call is in a function only adds to the weirdness. This is a huge problem.
According to the documentation, LuaJIT doesn't allow an FFI call to be JIT-compiled if the C function it calls calls back into Lua via a stored callback. In most cases LuaJIT detects such calls and avoids compiling them, but if it doesn't, it aborts with the "bad callback" error message. The extra print helped because it prevented JIT compilation (print was not compiled at the time of writing).
The proposed solution (instead of calling print) is to explicitly stop the FFI-call from being JIT-compiled using the jit.off function.

boost test unit can not call mpi function

I have looked around thoroughly but could not find any reference to this problem.
I wrote a C++ program that I am testing with the Boost unit test framework. The serial version works fine and the unit test passes.
Now I have made the program parallel via a function that does embarrassingly parallel work with MPI. If I write my own test calling the parallel function (let's call it parafunction), it works well and MPI runs all right.
Compilation is done with mpic++ and I use mpiexec to run the program.
If I call parafunction in a Boost test case, however, MPI goes all wrong: the tests are launched multiple times and the processes crash when MPI::Init is called several times.
Here is an example of the error I get :
The MPI_comm_size() function was called after MPI_FINALIZE was invoked.
This is disallowed by the MPI standard.
Your MPI job will now abort.
My test case is on a test_unit, automatically handled by a master_test_suite. As I said without the parallelisation it works perfectly well.
Parafunction calls MPI::Init and MPI::Finalize, and no other functions or files are supposed to do anything MPI-related.
Has anyone encountered a similar problem before?
My test runs are quite long, so I could really use the parallel version of my program!
Thanks for your help
A function that both initialises and finalises MPI can only be called once, because MPI can be initialised only once and finalised only once during the lifetime of the program. To prevent multiple initialisation calls, put the call to MPI_Init() or MPI_Init_thread() in a conditional:
int already_initialised;
MPI_Initialized(&already_initialised);
if (!already_initialised)
MPI_Init(NULL, NULL);
As for the finalisation, it should be moved outside of your function, probably in an atexit(3) handler if you don't want to pollute the outer scope with MPI calls. For example:
void finalise_mpi(void)
{
int already_finalised;
MPI_Finalized(&already_finalised);
if (!already_finalised)
MPI_Finalize();
}
...
atexit(finalise_mpi);
...
The atexit() call could be part of the initialisation code, e.g.:
int already_initialised;
MPI_Initialized(&already_initialised);
if (!already_initialised)
{
MPI_Init(NULL, NULL);
atexit(finalise_mpi);
}
This would not install the atexit(3) handler if MPI was already initialised. The basic idea is that if MPI was initialised on entry to the function, then it would mean that MPI_Init() was called in the outer scope and one would normally expect that MPI_Finalize() is also called there.
If I were you, I would move MPI initialisation and finalisation out of the parallel processing function. The proper calling sequence would be to initialise MPI, run the tests, then finalise MPI.
I've used the C bindings in the above text as the C++ bindings were deprecated in MPI-2.2 and then deleted in MPI-3.0.

GFortran equivalent of ieee_exceptions

I am trying to write a program that will stop whenever an invalid operation is performed, no matter how it is compiled with GFortran. With ifort I could do something like this:
use ieee_exceptions
....
logical :: halt(size(IEEE_USUAL))  ! one halting flag per exception in IEEE_USUAL
....
call ieee_get_halting_mode(IEEE_USUAL,halt)
call ieee_set_halting_mode(IEEE_USUAL,.True.)
....
! Something that may stop the program
....
call ieee_set_halting_mode(IEEE_USUAL,halt)
Does GFortran have a module similar to ifort's ieee_exceptions? Or, even better, is there a way of controlling the halting mode that does not depend on how the program is compiled or which compiler is used?
GFortran supports the ieee_exceptions module as of the GCC 5 release.
If you're stuck on an older GFortran release, a workaround would be to implement functions in C/asm that get/set the FP trapping status register and call those from Fortran.
PS: GFortran does have a switch (-ffpe-trap=...) for globally enabling traps for FP exceptions, see http://gcc.gnu.org/onlinedocs/gfortran/Debugging-Options.html . But since you explicitly said "no matter how it is compiled with gfortran", I guess you don't want to use that.