MPI_TYPE_CREATE_STRUCT: invalid datatype - fortran

I have a subroutine, part of a larger Fortran program, which will not run when called under MPI on a Mac laptop. The program is compiled with mpifort and runs perfectly well in serial mode.
The same code runs successfully when compiled with mpifort on an HEC; the mpif.h include used on each machine is indicated near the top of the code below.
I have seen a previous post (Invalid datatype when running mpirun) which discusses changing the number of blocks to get around this error; however, that does not explain why the program runs on a different architecture.
subroutine Initialise_f(Nxi, XI, BBC, B, nprofile, &
     kTzprofile, kTpprofile, vzprofile, Nspecies, particle, &
     neighbourLeft, neighbourRight, comm1d)

  use SpecificTypes
  implicit none

  include '/opt/pgi/osx86-64/2017/mpi/mpich/include/mpif.h' ! MPI for Mac
  !include '/usr/shared_apps/packages/openmpi-1.10.0-gcc/include/mpif.h' ! MPI for HEC

  type maxvector
     double precision, allocatable :: maxf0(:)
  end type maxvector

  integer Nxi, Nspecies
  double precision XI(Nxi), BBC(2), B(Nxi), nprofile(Nxi,Nspecies), &
       kTzprofile(Nxi,Nspecies), kTpprofile(Nxi,Nspecies), &
       vzprofile(Nxi,Nspecies)
  type(species) particle(Nspecies)
  integer neighbourLeft, neighbourRight, comm1d

  ! Variables for use with mpi based communication
  integer (kind=MPI_ADDRESS_KIND) :: offsets(2)
  integer ierr, blockcounts(2), tag, oldtypes(2), &
       b_type_sendR, b_type_sendL, b_type_recvR, b_type_recvL, &
       istart, Nmess, rcount, fnodesize, fspecsize, maxf0size, &
       fspecshape(2), requestIndex, receiveLeftIndex, receiveRightIndex

  ! Allocate communication buffers if necessary
  fnodesize = sum( particle(:)%Nvz * particle(:)%Nmu )

  Nmess = 0
  if (neighbourLeft>-1) then
     Nmess = Nmess + 2
     allocate( send_left%ivzoffsets(Nspecies*2) )
     allocate( send_left%fs(fnodesize*2) )
     allocate( receive_left%ivzoffsets(Nspecies*2) )
     allocate( receive_left%fs(fnodesize*2) )
     send_left%ivzoffsets = 0
     send_left%fs = 0.0d0
     receive_left%ivzoffsets = 0
     receive_left%fs = 0.0d0
  end if

  ! Build a few mpi data types for communication purposes
  oldtypes(1) = MPI_INTEGER
  blockcounts(1) = Nspecies*2
  oldtypes(2) = MPI_DOUBLE_PRECISION
  blockcounts(2) = fnodesize*2

  if (neighbourLeft>-1) then
     call MPI_GET_ADDRESS(receive_left%ivzoffsets, offsets(1), ierr)
     call MPI_GET_ADDRESS(receive_left%fs, offsets(2), ierr)
     offsets = offsets - offsets(1)
     call MPI_TYPE_CREATE_STRUCT(2, blockcounts, offsets, oldtypes, b_type_recvL, ierr)
     call MPI_TYPE_COMMIT(b_type_recvL, ierr)

     call MPI_GET_ADDRESS(send_left%ivzoffsets, offsets(1), ierr)
     call MPI_GET_ADDRESS(send_left%fs, offsets(2), ierr)
     offsets = offsets - offsets(1)
     call MPI_TYPE_CREATE_STRUCT(2, blockcounts, offsets, oldtypes, b_type_sendL, ierr)
     call MPI_TYPE_COMMIT(b_type_sendL, ierr)
  end if
This will bail out with the following error:
[dyn-191-250:31563] *** An error occurred in MPI_Type_create_struct
[dyn-191-250:31563] *** reported by process [1687683073,0]
[dyn-191-250:31563] *** on communicator MPI_COMM_WORLD
[dyn-191-250:31563] *** MPI_ERR_TYPE: invalid datatype
[dyn-191-250:31563] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[dyn-191-250:31563] *** and potentially your MPI job)

On your Mac you include the mpif.h from MPICH, but the error message comes from Open MPI. That means your program is compiled with one MPI implementation's header but runs against another's library; handle constants such as MPI_INTEGER and MPI_DOUBLE_PRECISION have different values in the two implementations, so Open MPI's MPI_Type_create_struct sees invalid datatypes. It also explains why the same code works on the HEC, where the included mpif.h (Open MPI 1.10.0) matches the runtime library.
You should simply include 'mpif.h' (with no hard-coded path) and use the MPI wrappers (e.g. mpifort) to build your application, so the header and the library always come from the same installation.
A better option is use mpi, and an even better one is use mpi_f08 if your MPI library and compiler support it (note that the latter requires you to update your code).
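A minimal sketch of the module-based approach (the file name check_mpi.f90 and the program body are mine, not from the original answer; it assumes your MPI installation ships the Fortran mpi module, which both Open MPI and MPICH do). Build and run it with the mpifort and mpirun from the same installation, and the MPI constants are guaranteed to match the runtime:
! check_mpi.f90 -- minimal sketch, not part of the original program
program check_mpi
   use mpi              ! replaces: include '<hard-coded path>/mpif.h'
   implicit none
   integer :: ierr, rank
   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   print *, 'rank', rank, 'sees MPI_ADDRESS_KIND =', MPI_ADDRESS_KIND
   call MPI_Finalize(ierr)
end program check_mpi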

Related

Segmentation fault with MPI_Comm_Rank

I have to work on a code written a few years ago which uses MPI and PETSc.
When I try to run it, I get an error from the function MPI_Comm_rank().
Here is the beginning of the code:
int main(int argc, char **argv)
{
    double mesure_tps2, mesure_tps1;
    struct timeval tv;
    time_t curtime2, curtime1;
    char help[] = "Solves linear system with KSP.\n\n"; // NB: PETSc is defined in "fafemo_Constant_Globales.h"

    std::cout << "PetscInitialize start" << std::endl;
    (void*) PetscInitialize(&argc, &argv, (char *)0, help);
    std::cout << "PetscInitialize done" << std::endl;

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    PetscFinalize();
}
Obviously, there is some code between MPI_Comm_rank() and PetscFinalize().
PetscInitialize and PetscFinalize call MPI_Init and MPI_Finalize, respectively.
In my makefile I have:
PETSC_DIR=/home/thib/Documents/bibliotheques/petsc-3.13.2
PETSC_ARCH=arch-linux-c-debug
include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules
PETSC36 = -I/home/thib/Documents/bibliotheques/petsc-3.13.2/include -I/home/thib/Documents/bibliotheques/petsc-3.13.2/arch-linux-c-debug/include
Mpi_include=-I/usr/lib/x86_64-linux-gnu/openmpi
# a variable with some file names
fafemo_files = fafemo_CI_CL-def.cc fafemo_Flux.cc fafemo_initialisation_probleme.cc fafemo_FEM_setup.cc fafemo_sorties.cc fafemo_richards_solve.cc element_read_split.cpp point_read_split.cpp read_split_mesh.cpp
PETSC_KSP_LIB_VSOIL=-L/home/thib/Documents/bibliotheques/petsc-3.13.2/ -lpetsc_real -lmpi -lmpi++

fafemo: ${fafemo_files} fafemo_Richards_Main.o
        g++ ${CXXFLAGS} -g -o fafemo_CD ${fafemo_files} fafemo_Richards_Main.cc ${PETSC_KSP_LIB_VSOIL} $(PETSC36) ${Mpi_include}
Using g++ or mpic++ doesn't seem to change anything.
It compiles, but when I try to execute it I get:
[thib-X540UP:03696] Signal: Segmentation fault (11)
[thib-X540UP:03696] Signal code: Address not mapped (1)
[thib-X540UP:03696] Failing at address: 0x44000098
[thib-X540UP:03696] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3efd0)[0x7fbfa87e4fd0]
[thib-X540UP:03696] [ 1] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Comm_rank+0x42)[0x7fbfa9533c42]
[thib-X540UP:03696] [ 2] ./fafemo_CD(+0x230c8)[0x561caa6920c8]
[thib-X540UP:03696] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fbfa87c7b97]
[thib-X540UP:03696] [ 4] ./fafemo_CD(+0x346a)[0x561caa67246a]
[thib-X540UP:03696] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node thib-X540UP exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Also, I have other MPI programs on my computer and I have never had such a problem.
Does anyone know why I get this?
In case someone has the same issue:
When I installed PETSc, I ran ./configure with --download-mpich even though MPI was already installed on my computer.
To solve the problem I did rm -rf ${PETSC_ARCH} and ran ./configure again.

How does one debug a fortran application where MPI_INIT succeeds but MPI_INITIALIZED returns false?

In my MPI Fortran application I have:
call MPI_INIT(ierror )
debugging result:
ierror .eq. MPI_SUCCESS
Immediately afterwards, in the same routine, I call
call MPI_INITIALIZED(initialzed_flag , initialzed_ierror )
debugging result: initialzed_ierror .eq. MPI_SUCCESS (i.e. 0),
but: initialzed_flag is .false.,
meaning MPI is present but reports that it was not initialized. How do I debug this situation? I am using the Intel compiler with Intel MPI, ifort version 18.0.5 (2018).
I tried:
export I_MPI_DEBUG=6
export I_MPI_HYDRA_DEBUG=on
but I did not learn anything useful. I suspect the problem occurs at runtime, but what is it?
Here is a minimum working example:
      program main

      include 'mpif.h'

      integer error
      integer id
      integer p
      LOGICAL initialzed_flag
      INTEGER initialzed_ierror
!
! Initialize MPI.
!
      call MPI_Init ( error )

      if (error .ne. MPI_SUCCESS) then
         write(*,*)"MPI_INIT error at", __FILE__, " line ", __LINE__
         stop "Error MPI_INIT"
      endif

c     initialize the error to something other than success
      initialzed_ierror = MPI_ERR_OTHER

      call MPI_INITIALIZED(initialzed_flag , initialzed_ierror )
      write(*,*)'initialzed_ierror =', initialzed_ierror
      if ( initialzed_ierror .ne. MPI_SUCCESS) then
         write(*,*)"Checking for MPI initialzed failed."
      endif

      if(.not. initialzed_flag)then
         write(*,"(A85, 1x, I3)")"MPI not initialized in " //
     &      trim(__FILE__) // " line ", __LINE__
         STOP "MPI initialzing error"
      else
         write(*,*)"MPI was initialzed"
      endif
!
! Get the number of processes.
!
      call MPI_Comm_size ( MPI_COMM_WORLD, p, error )
!
! Get the individual process ID.
!
      call MPI_Comm_rank ( MPI_COMM_WORLD, id, error )
! Every MPI process will print this message.
!
      write ( *, '(a,i1,2x,a)' ) 'P', id, '"Hello, world!"'
!
! Shut down MPI.
!
      call MPI_Finalize ( error )
!
! Terminate.
      stop
      end
The above file was named hello_mpi.F.
makefile:
all: a.out_mpif90 a.out_mpiifort

a.out_mpif90:
        mpif90 -f90=ifort hello_mpi.F -o a.out_mpif90   # mimics my error

a.out_mpiifort:
        mpiifort hello_mpi.F -o a.out_mpiifort
The file a.out_mpif90 seems to reproduce in the MWE the error I am getting with the entire code, even though the full code is built with mpiifort.
I also found that:
ldd -v a.out_mpiifort # gives the same result as the command below
ldd -v a.out_mpif90
output in the incorrect case (mpirun -np 2):
MPI_INIT error athello_mpi.F line 15
MPI_INIT error athello_mpi.F line 15
STOP Error MPI_INIT
STOP Error MPI_INIT

Error occurred in MPI_Send on communicator MPI_COMM_WORLD MPI_ERR_RANK:invalid rank

I am trying to learn MPI. When I send data from one processor to another, I can send it successfully and receive it in a variable on the other side. But when I try to send and receive on both processors, I get the invalid rank error.
Here is my code for the program
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    int world_size;
    int rank;
    char hostname[256];
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    int tag = 4;
    int value = 4;
    int master = 0;
    int rec;
    MPI_Status status;

    // Initialize the MPI environment
    MPI_Init(&argc, &argv);
    // get the total number of processes
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    // get the rank of current process
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    // get the name of the processor
    MPI_Get_processor_name(processor_name, &name_len);
    // get the hostname
    gethostname(hostname, 255);

    printf("World size is %d\n", world_size);

    if (rank == master) {
        MPI_Send(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        MPI_Recv(&rec, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &status);
        printf("In master with value %d\n", rec);
    }
    if (rank == 1) {
        MPI_Send(&tag, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
        MPI_Recv(&rec, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        printf("in slave with rank %d and value %d\n", rank, rec);
    }

    printf("Hello world! I am process number: %d from processor %s on host %s out of %d processors\n",
           rank, processor_name, hostname, world_size);

    MPI_Finalize();
    return 0;
}
Here is my PBS file:
#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=1:00
#PBS -N MPIsample
#PBS -q edu_shared
#PBS -m abe
#PBS -M blahblah#blah.edu
#PBS -e mpitest.err
#PBS -o mpitest.out
#PBS -d /export/home/blah/MPIsample
mpirun -machinefile $PBS_NODEFILE -np $PBS_NP ./mpitest
The output file comes out like this:
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Job complete
If the world size is 1, the world size should be printed once and not 8 times.
The err file is:
[compute-0-34.local:13110] *** An error occurred in MPI_Send
[compute-0-34.local:13110] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13110] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13110] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13107] *** An error occurred in MPI_Send
[compute-0-34.local:13107] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13107] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13107] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13112] *** An error occurred in MPI_Send
[compute-0-34.local:13112] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13112] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13112] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13108] *** An error occurred in MPI_Send
[compute-0-34.local:13108] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13108] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13108] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13109] *** An error occurred in MPI_Send
[compute-0-34.local:13109] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13109] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13109] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13113] *** An error occurred in MPI_Send
[compute-0-34.local:13113] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13113] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13113] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13106] *** An error occurred in MPI_Send
[compute-0-34.local:13106] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13106] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13106] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13111] *** An error occurred in MPI_Send
[compute-0-34.local:13111] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13111] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13111] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Two days ago I was able to send and receive simultaneously, but since then the previously working code has been showing me this error. Is there a problem in my code or with the high-performance computer I am working on?
From an MPI point of view, you did not launch one MPI job with 8 MPI tasks, but rather 8 independent MPI jobs with one MPI task each.
That typically occurs when you mix two MPI implementations (for example, your application was built with Open MPI but you are using MPICH's mpirun).
Before invoking mpirun, I suggest you add to your PBS script:
which mpirun
ldd mpitest
Make sure mpirun and the MPI libraries come from the same MPI implementation (i.e. same vendor and same version).
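Independently of the launcher mismatch, a small guard on the world size makes this failure mode self-explanatory instead of aborting inside MPI_Send. The following is a sketch of such a guard (my addition, not part of the original program); the rank 0 / rank 1 exchange from the question would go where the placeholder comment is:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int world_size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (world_size < 2) {
        /* With a single task, rank 1 does not exist, so any send to it
           fails with MPI_ERR_RANK; stop with a clear message instead. */
        if (rank == 0)
            fprintf(stderr, "This program needs at least 2 MPI tasks, got %d\n", world_size);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... rank 0 / rank 1 send-receive exchange goes here ... */

    MPI_Finalize();
    return 0;
}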
There was a problem with the HPC system: it was not allotting me the required number of processors. Thanks, guys.

RX channel out of range for configured RX frontends

I have a simple C++ test program for an Ettus X310 that used to work but now doesn't. I'm simply trying to set the center frequencies of two channels of a single USRP. The out-of-range error in the title occurs when I try to set anything on the second channel.
I get a crash with a Channel out of range error:
$ ./t2j.out
linux; GNU C++ version 4.8.4; Boost_105400; UHD_003.009.001-0-gf7a15853
-- X300 initialization sequence...
-- Determining maximum frame size... 1472 bytes.
-- Setup basic communication...
-- Loading values from EEPROM...
-- Setup RF frontend clocking...
-- Radio 1x clock:200
-- Initialize Radio0 control...
-- Performing register loopback test... pass
-- Initialize Radio1 control...
-- Performing register loopback test... pass
terminate called after throwing an instance of 'uhd::index_error'
what(): LookupError: IndexError: multi_usrp: RX channel 140445275195320 out of range for configured RX frontends
Aborted (core dumped)
Here is my test program:
int main( void )
{
    // sources
    gr::uhd::usrp_source::sptr usrp1;

    const std::string usrp_addr = std::string( "addr=192.168.10.30" );
    uhd::stream_args_t usrp_args = uhd::stream_args_t( "fc32" );
    usrp_args.channels = std::vector<size_t> ( 0, 1 );

    usrp1 = gr::uhd::usrp_source::make( usrp_addr, usrp_args );
    usrp1->set_subdev_spec( std::string( "A:AB B:AB" ), 0 );
    usrp1->set_clock_source( "external" );
    usrp1->set_samp_rate( 5.0e6 );

    usrp1->set_center_freq( 70e6, 0 );  // this is OK
    usrp1->set_center_freq( 70e6, 1 );  // crashes here with RX chan out of range error!

    printf( "test Done!\n" );
    return 0;
}
The only thing I've found so far in searching is to make sure PYTHONPATH is set correctly (and for the heck of it I made sure it pointed to the site-packages), but that seems to be related to GRC rather than C++.
I am using Ubuntu 14.04.4 and UHD 3.9.1 with GNU Radio 3.7.8.1 (I've also tried 3.7.9.2), with the same result.
The hardware is an Ettus X310 with two BasicRX daughterboards.
Someone from the gnuradio/uhd mailing list helped me. The vector initialization was wrong: std::vector<size_t>( 0, 1 ) calls the (count, value) constructor, so it creates an empty channel list rather than the list {0, 1}.
Replace:
stream_args.channels = std::vector<size_t> ( 0, 1 );
with these two lines:
stream_args.channels.push_back( 0 );
stream_args.channels.push_back( 1 );
There are other, more concise methods (see the sketch below), but this does the trick for now.
-Bob
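A sketch of one of the more concise forms mentioned above, using C++11 brace initialization (this is my illustration, not code from the mailing list; it assumes C++11 or later and the UHD headers are available):
#include <cstddef>
#include <vector>
#include <uhd/stream.hpp>

int main()
{
    uhd::stream_args_t stream_args( "fc32" );
    // Brace initialization builds the two-element list {0, 1} directly,
    // unlike std::vector<size_t>( 0, 1 ), which is the (count, value)
    // constructor and yields an empty vector.
    stream_args.channels = std::vector<size_t>{ 0, 1 };
    return stream_args.channels.size() == 2 ? 0 : 1;
}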

Angelscript - RegisterScriptArray fails

I am trying to get an AngelScript test running; however, calling RegisterScriptArray() fails:
System function (1, 39) : ERR : Expected '<end of file>'
(0, 0) : ERR : Failed in call to function 'RegisterObjectBehaviour' with 'array' and 'array<T># f(int&in type, int&in list) {repeat T}' (Code: -10)
The code is:
engine = asCreateScriptEngine(ANGELSCRIPT_VERSION);
// message callback
int r = engine->SetMessageCallback(asFUNCTION(as_messageCallback), 0, asCALL_CDECL); assert( r >= 0 );
RegisterStdString(engine);
RegisterScriptArray(engine, false);
r = engine->RegisterGlobalFunction("void print(const string &in)", asFUNCTION(as_print), asCALL_CDECL); assert( r >= 0 );
What should I do? If I comment out the call it works, but that's obviously not what I want to achieve, as I want arrays.
After asking on their forums I got a reply (actually quite some time ago).
http://www.gamedev.net/topic/657233-registerscriptarray-fails
In case the link dies:
The main issue was a version mismatch between the plugins (which I compiled and installed manually) and the core (which I installed through my package manager). Now I include the plugins in my code and the core is manually compiled.
Hope it helps others encountering the same issue.
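One cheap way to catch this kind of mismatch early is to compare the angelscript.h your code (and the add-ons) were compiled against with the library that is actually linked in, before any Register* calls. A sketch (my addition, not from the forum thread; asGetLibraryVersion() and ANGELSCRIPT_VERSION_STRING are part of the standard AngelScript API and header):
#include <cstdio>
#include <cstring>
#include <angelscript.h>

// Sketch: detect a header/library version mismatch at startup.
// ANGELSCRIPT_VERSION_STRING comes from the angelscript.h seen at compile
// time; asGetLibraryVersion() reports the version of the linked library.
static bool CheckAngelScriptVersion()
{
    const char *libVersion = asGetLibraryVersion();
    if (std::strcmp(libVersion, ANGELSCRIPT_VERSION_STRING) != 0)
    {
        std::fprintf(stderr, "AngelScript mismatch: headers are %s, library is %s\n",
                     ANGELSCRIPT_VERSION_STRING, libVersion);
        return false;
    }
    return true;
}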