Segmentation fault in MPI_Test() when using singleton/wrapper classes - c++

While writing a C++ wrapper for MPI, I ran into a segmentation fault in MPI_Test(), the cause of which I can't figure out.
The following code is a minimal crashing example, to be compiled and run with mpic++ -std=c++11 -g -o test test.cpp && ./test:
#include <stdlib.h>
#include <stdio.h>
#include <memory>
#include <mpi.h>

class Environment {
public:
    static Environment &getInstance() {
        static Environment instance;
        return instance;
    }
    static bool initialized() {
        int ini;
        MPI_Initialized(&ini);
        return ini != 0;
    }
    static bool finalized() {
        int fin;
        MPI_Finalized(&fin);
        return fin != 0;
    }
private:
    Environment() {
        if(!initialized()) {
            MPI_Init(NULL, NULL);
            _initialized = true;
        }
    }
    ~Environment() {
        if(!_initialized)
            return;
        if(finalized())
            return;
        MPI_Finalize();
    }
    bool _initialized{false};
public:
    Environment(Environment const &) = delete;
    void operator=(Environment const &) = delete;
};

class Status {
private:
    std::shared_ptr<MPI_Status> _mpi_status;
    MPI_Datatype _mpi_type;
};

class Request {
private:
    std::shared_ptr<MPI_Request> _request;
    int _flag;
    Status _status;
};

int main() {
    auto &m = Environment::getInstance();
    MPI_Request r;
    MPI_Status s;
    int a;
    MPI_Test(&r, &a, &s);
    Request r2;
    printf("b\n");
}
Basically, the Environment class is a singleton wrapper around MPI_Init and MPI_Finalize: the first time the class is instantiated, MPI_Init is called, and when the program exits, MPI is finalized. Then I do some MPI calls in the main() function, involving some other simple wrapper objects.
The code above crashes (on my machine: OpenMPI on Linux). However, it works when I do any of the following:
- comment out any one of the private members of Request or Status (even int _flag;)
- comment out the last line, printf("b\n");
- replace auto &m = Environment::getInstance(); with MPI_Init().
There doesn't seem to be a connection between these points and I have no clue where to look for the segmentation fault.
The stack trace is:
[pc13090:05978] *** Process received signal ***
[pc13090:05978] Signal: Segmentation fault (11)
[pc13090:05978] Signal code: Address not mapped (1)
[pc13090:05978] Failing at address: 0x61
[pc13090:05978] [ 0] /usr/lib/libpthread.so.0(+0x11dd0)[0x7fa9cf818dd0]
[pc13090:05978] [ 1] /usr/lib/openmpi/libmpi.so.40(ompi_request_default_test+0x16)[0x7fa9d0357326]
[pc13090:05978] [ 2] /usr/lib/openmpi/libmpi.so.40(MPI_Test+0x31)[0x7fa9d03970b1]
[pc13090:05978] [ 3] ./test(+0xb7ae)[0x55713d1aa7ae]
[pc13090:05978] [ 4] /usr/lib/libc.so.6(__libc_start_main+0xea)[0x7fa9cf470f4a]
[pc13090:05978] [ 5] ./test(+0xb5ea)[0x55713d1aa5ea]
[pc13090:05978] *** End of error message ***
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node pc13090 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Related

g++ and gcc behave differently on pthread_cleanup_push/pop

This is my code
#include <pthread.h>
#include <stdio.h>

void cleanup(void *arg) {
    printf("cleanup: %s\n", (const char*)arg);
}

void *thr_fn1(void *arg) {
    printf("thread 1 strat\n");
    pthread_cleanup_push(cleanup, (void*)"thread 1 first handler");
    pthread_cleanup_push(cleanup, (void*)"thread 1 first handler");
    if(arg)
        return (void*)1;
    pthread_cleanup_pop(0);
    pthread_cleanup_pop(0);
    return (void*)1;
}

void *thr_fn2(void *arg) {
    printf("thread 2 strat\n");
    pthread_cleanup_push(cleanup, (void*)"thread 2 first handler");
    pthread_cleanup_push(cleanup, (void*)"thread 2 first handler");
    if(arg)
        return (void*)2;
    pthread_cleanup_pop(0);
    pthread_cleanup_pop(0);
    return (void*)2;
}

int main() {
    int err;
    pthread_t tid1, tid2;
    void *tret;
    pthread_create(&tid1, NULL, thr_fn1, (void*)1);
    pthread_create(&tid2, NULL, thr_fn2, (void*)1);
    pthread_join(tid1, &tret);
    printf("pthread 1 exit code %ld\n", tret);
    pthread_join(tid2, &tret);
    printf("pthread 2 exit code %ld\n", tret);
    return 0;
}
Now I compile and run it with gcc and with g++:
$ gcc main.c -o main
$ ./main
thread 2 strat
thread 1 strat
pthread 1 exit code 1
pthread 2 exit code 2
$ g++ main.c -o main
$ ./main
thread 1 strat
cleanup: thread 1 first handler
cleanup: thread 1 first handler
thread 2 strat
cleanup: thread 2 first handler
cleanup: thread 2 first handler
pthread 1 exit code 1
pthread 2 exit code 2
$
Why do they behave differently?
Do any other functions behave like this?
I found that the gcc and g++ implementations are different. So which one is the better implementation?
On Linux, the pthread_cleanup_push() and pthread_cleanup_pop() functions are implemented as macros that expand to text containing { and }, respectively.
#  define pthread_cleanup_push(routine, arg) \
  do {                                       \
    __pthread_cleanup_class __clframe (routine, arg)
If compiled with g++, __pthread_cleanup_class is a C++ class:
#ifdef __cplusplus
/* Class to handle cancellation handler invocation.  */
class __pthread_cleanup_class
{
  void (*__cancel_routine) (void *);
  void *__cancel_arg;
  int __do_it;
  int __cancel_type;

public:
  __pthread_cleanup_class (void (*__fct) (void *), void *__arg)
    : __cancel_routine (__fct), __cancel_arg (__arg), __do_it (1) { }
  ~__pthread_cleanup_class () { if (__do_it) __cancel_routine (__cancel_arg); }
  void __setdoit (int __newval) { __do_it = __newval; }
  void __defer () { pthread_setcanceltype (PTHREAD_CANCEL_DEFERRED,
                                           &__cancel_type); }
  void __restore () const { pthread_setcanceltype (__cancel_type, 0); }
};
It behaves like any other class: its destructor runs when the enclosing scope ends, so the handlers fire even when the thread returns from its start function.
In C, using gcc, cleanup handlers only run if the thread terminates via pthread_exit(), but your code uses return. Quoting the pthreads documentation:
When a thread terminates by calling pthread_exit(3), all clean-up handlers are executed as described in the preceding point. (Clean-up handlers are not called if the thread terminates by performing a return from the thread start function.)

Segmentation fault with MPI_Comm_Rank

I have to work on code written a few years ago which uses MPI and PETSc.
When I try to run it, I get an error in the function MPI_Comm_rank().
Here is the beginning of the code:
int main(int argc,char **argv)
{
    double mesure_tps2,mesure_tps1;
    struct timeval tv;
    time_t curtime2,curtime1;
    char help[] = "Solves linear system with KSP.\n\n"; // NB: Petsc is defined in "fafemo_Constant_Globales.h"

    std::cout << "début PetscInitialize" << std::endl;
    (void*) PetscInitialize(&argc,&argv,(char *)0,help);
    std::cout << "début PetscInitialize fait" << std::endl;

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    PetscFinalize();
}
Obviously, there is some code between MPI_Comm_rank() and PetscFinalize().
PetscInitialize and PetscFinalize call MPI_Init and MPI_Finalize respectively.
In my makefile I have:
PETSC_DIR=/home/thib/Documents/bibliotheques/petsc-3.13.2
PETSC_ARCH=arch-linux-c-debug
include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules

PETSC36 = -I/home/thib/Documents/bibliotheques/petsc-3.13.2/include -I/home/thib/Documents/bibliotheques/petsc-3.13.2/arch-linux-c-debug/include
Mpi_include=-I/usr/lib/x86_64-linux-gnu/openmpi

# a variable with some file names
fafemo_files = fafemo_CI_CL-def.cc fafemo_Flux.cc fafemo_initialisation_probleme.cc fafemo_FEM_setup.cc fafemo_sorties.cc fafemo_richards_solve.cc element_read_split.cpp point_read_split.cpp read_split_mesh.cpp

PETSC_KSP_LIB_VSOIL=-L/home/thib/Documents/bibliotheques/petsc-3.13.2/ -lpetsc_real -lmpi -lmpi++

fafemo: ${fafemo_files} fafemo_Richards_Main.o
	g++ ${CXXFLAGS} -g -o fafemo_CD ${fafemo_files} fafemo_Richards_Main.cc ${PETSC_KSP_LIB_VSOIL} $(PETSC36) ${Mpi_include}
Using g++ or mpic++ doesn't seem to change anything.
It compiles, but when I try to execute it I get:
[thib-X540UP:03696] Signal: Segmentation fault (11)
[thib-X540UP:03696] Signal code: Address not mapped (1)
[thib-X540UP:03696] Failing at address: 0x44000098
[thib-X540UP:03696] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3efd0)[0x7fbfa87e4fd0]
[thib-X540UP:03696] [ 1] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Comm_rank+0x42)[0x7fbfa9533c42]
[thib-X540UP:03696] [ 2] ./fafemo_CD(+0x230c8)[0x561caa6920c8]
[thib-X540UP:03696] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fbfa87c7b97]
[thib-X540UP:03696] [ 4] ./fafemo_CD(+0x346a)[0x561caa67246a]
[thib-X540UP:03696] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node thib-X540UP exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Also, I have other MPI programs on my computer and I have never had such a problem.
Does anyone know why I get this?
If someone has the same issue:
When I installed PETSc, I ran ./configure with --download-mpich while I already had MPI installed on my computer.
To solve the problem I did rm -rf ${PETSC_ARCH} and ran ./configure again.

C++: <sys/sysctl.h> fails to declare functions CTL_HW and HW_NCPU

Aloha all!
I'm working with the following code (which I did not write). It is one of many files I've been modifying to get a build/make working on Linux.
Everything I've found online suggests that sys/sysctl.h should properly declare these functions:
CTL_HW and HW_NCPU
However, running the following (called "machineInfo.cpp"):
#include "machineInfo.h"
#include <sys/sysctl.h>
#include <linux/sysctl.h>
#include <cstdio>

#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

int StMachineInfo::numProcs(void) {
    int numCPU = 0;
    int nprocs;
    size_t len = sizeof(nprocs);
    static int mib[2] = { CTL_HW, HW_NCPU };

    /* get the number of CPUs from the system */
    sysctl(mib, 2, &numCPU, &len, NULL, 0);

    if( numCPU < 1 )
    {
        mib[1] = HW_NCPU;
        if (sysctl (mib, ARRAY_SIZE(mib), &nprocs, &len, NULL, 0) == 0 && len == sizeof (nprocs) && 0 < nprocs)
            numCPU = nprocs;
        if( numCPU < 1 )
            numCPU = 1;
    }
    return numCPU;
}
...results in the following error output:
g++ -c machineInfo.cpp
machineInfo.cpp: In function ‘int StMachineInfo::numProcs()’:
machineInfo.cpp:14:24: error: ‘CTL_HW’ was not declared in this scope
static int mib[2] = { CTL_HW, HW_NCPU };
^
machineInfo.cpp:14:32: error: ‘HW_NCPU’ was not declared in this scope
static int mib[2] = { CTL_HW, HW_NCPU };
^
Makefile:33: recipe for target 'machineinfo.o' failed
make: *** [machineinfo.o] Error 1
Is there something wrong with the code itself? Or do I need to #include another header? I've experimented with this and Googled for a couple of hours, to no avail.
Many thanks,
Sean
I believe the problem here is that sysctl does not have a glibc wrapper on Linux. To the best of my understanding, the CTL_HW and HW_NCPU constants are only available on the BSDs.
I'd be happy to be proven wrong, as I'm trying to understand whether this uname -p behavior could ever work on Linux.

Backtrace inside Signal Handler

I'm trying to follow the code from this post to have signal handlers print a backtrace on errors such as floating point and segmentation faults. I'm using seg fault signals as a starting point. Here is the code:
#include <cstdlib>    // for exit()
#include <signal.h>   // signal handling
#include <execinfo.h> // backtrace, backtrace_symbols and backtrace_fd
#include <iostream>
#include <string.h>
#include <stdio.h>

#define TRACE_MSG fprintf(stderr, "TRACE at: %s() [%s:%d]\n", \
                          __FUNCTION__, __FILE__, __LINE__)

void show_stackframe()
{
    void *trace[1024];
    char **messages = (char **) NULL;
    int i, trace_size = 0;

    TRACE_MSG;
    trace_size = backtrace(trace, 1024); // segfault here???
    // More code here to print backtrace, but not needed at the moment..
    TRACE_MSG;
}

void sigSegvHandler( int signum, siginfo_t* info, void* arg )
{
    TRACE_MSG;
    show_stackframe();
    return;
}

double func_b()
{
    show_stackframe(); // Show that backtrace works without being
                       // called inside sighandler.
    TRACE_MSG;
    int int_a[5];
    int_a[0] = 4;
    int_a[11] = 10; // cause a segfault on purpose to see
                    // how the signal handling performs.
    return 1.1;
}

int main()
{
    // Examine and change the seg fault signal
    struct sigaction segvAction; // File: /usr/include/bits/sigaction.h

    // Initialize segvAction struct to all zeros for initialization
    memset( &segvAction, 0, sizeof( segvAction ) );
    segvAction.sa_sigaction = sigSegvHandler;
    segvAction.sa_flags = SA_SIGINFO; // Invoke signal catching function with 3 arguments instead of 1

    // Set the action for the SIGSEGV signal
    sigaction( SIGSEGV, &segvAction, NULL );

    func_b(); // Produce a SIGSEGV error
}
I am compiling using:
g++ -rdynamic testprogram.cpp -o testprogram
I receive the following output from the program:
TRACE at: show_stackframe() [stackoverflow.cpp:15]
TRACE at: show_stackframe() [stackoverflow.cpp:17]
TRACE at: func_b() [stackoverflow.cpp:33]
TRACE at: sigSegvHandler() [stackoverflow.cpp:22]
TRACE at: show_stackframe() [stackoverflow.cpp:15]
Segmentation fault
My question is: why does show_stackframe() cause a segmentation fault inside the signal handler, but work fine when called outside of it? I obviously seem to be setting up the signal handler/action incorrectly, but I haven't been able to find the mistake all day. GDB doesn't seem to be any help in this case.
As stated here, the backtrace function is AS-Unsafe, which means it is unsafe to call from an asynchronous signal handler. Doing so invokes undefined behavior.

MPI_Send MPI_Recv segfault in C++

I have written a simple program in MPI which sends and receives messages between the processors, but it crashes with a segmentation fault.
Here's my entire code:
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <string.h>
#include <strings.h>
#include <sstream>
#include <mpi.h>

using namespace std;

class Case {
public:
    int value;
    std::stringstream sta;
};

int main(int argc, char **argv) {
    int rank,size;
    MPI::Init(argc,argv);
    rank=MPI::COMM_WORLD.Get_rank();
    size=MPI::COMM_WORLD.Get_size();

    if(rank==0){
        Case *s=new Case();
        s->value=1;
        s->sta<<"test";
        cout<<"\nInside send before copy value :"<<s->value;
        fflush(stdout);
        cout<<"\nInside send before copy data :"<<s->sta.str();
        fflush(stdout);

        Case scpy;
        scpy.value=s->value;
        scpy.sta<<(s->sta).rdbuf();
        cout<<"\nInside send after copy value :"<<scpy.value;
        cout<<"\nInside send after copy value :"<<scpy.sta.str();

        MPI::COMM_WORLD.Send(&scpy,sizeof(Case),MPI::BYTE,1,23);
    }
    MPI::COMM_WORLD.Barrier();

    if(rank==1){
        Case r;
        MPI::COMM_WORLD.Recv(&r,sizeof(Case),MPI::BYTE,0,23);
        cout<<"\nRecieve value"<<r.value;
        fflush(stdout);
        cout<<"\nRecieve data"<<r.sta;
        fflush(stdout);
    }

    MPI::Finalize();
    return 0;
}
I got the below error message and I'm not able to figure out what is wrong in this program. Can anyone please explain?
Inside send before copy value :1
Inside send before copy data :test
Inside send after copy value :1
Recieve value1
Recieve data0xbfa5d6b4[localhost:03706] *** Process received signal ***
[localhost:03706] Signal: Segmentation fault (11)
[localhost:03706] Signal code: Address not mapped (1)
[localhost:03706] Failing at address: 0x8e1a210
[localhost:03706] [ 0] [0xe6940c]
[localhost:03706] [ 1] /usr/lib/libstdc++.so.6(_ZNSt18basic_stringstreamIcSt11char_traitsIcESaIcEED1Ev+0xc6) [0x6a425f6]
[localhost:03706] [ 2] ./a.out(_ZN4CaseD1Ev+0x14) [0x8052d8e]
[localhost:03706] [ 3] ./a.out(main+0x2f9) [0x804f90d]
[localhost:03706] [ 4] /lib/libc.so.6(__libc_start_main+0xe6) [0x897e36]
[localhost:03706] [ 5] ./a.out() [0x804f581]
[localhost:03706] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 3706 on node localhost.localdomain exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Problem
I think the problem is that the line:
MPI::COMM_WORLD.Send(&scpy,sizeof(Case),MPI::BYTE,1,23);
sends a copy of the Case structure to the receiver, but it is sending a raw copy of the bytes, which is not very useful. The std::stringstream class will contain a pointer to the actual memory used to store your string, so this code will:
- send a pointer to the receiver (containing an address that will be meaningless to the receiver), and
- not send the actual contents of the string.
The receiver will seg fault when it attempts to dereference the invalid pointer.
Fix 1
One approach to fix this is to send the character data yourself.
In this approach you would send a message pointing to std::stringstream::str().c_str(), of length std::stringstream::str().size() * sizeof(char).
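The serialization half of this fix can be sketched without the MPI calls themselves (my own illustration, with the MPI_Send/MPI_Recv placement indicated in comments): flatten the Case into a length-prefixed byte buffer, ship that buffer as MPI_BYTE data, and rebuild the stringstream on the receiving side.

```cpp
#include <sstream>
#include <string>
#include <vector>
#include <cstring>

struct Case {
    int value;
    std::stringstream sta;
};

// Flatten a Case into contiguous bytes: [int value][size_t len][len chars].
// Unlike the raw struct, this buffer contains no internal pointers, so it
// is safe to ship with e.g.
//   MPI_Send(buf.data(), buf.size(), MPI_BYTE, dest, tag, comm);
std::vector<char> pack(const Case &c) {
    std::string s = c.sta.str();
    size_t len = s.size();
    std::vector<char> buf(sizeof(int) + sizeof(size_t) + len);
    std::memcpy(buf.data(), &c.value, sizeof(int));
    std::memcpy(buf.data() + sizeof(int), &len, sizeof(size_t));
    std::memcpy(buf.data() + sizeof(int) + sizeof(size_t), s.data(), len);
    return buf;
}

// Rebuild a Case from received bytes (the mirror of pack(), to be called
// on the buffer filled in by the matching MPI_Recv).
void unpack(const std::vector<char> &buf, Case &out) {
    size_t len;
    std::memcpy(&out.value, buf.data(), sizeof(int));
    std::memcpy(&len, buf.data() + sizeof(int), sizeof(size_t));
    out.sta.str(std::string(buf.data() + sizeof(int) + sizeof(size_t), len));
}
```

This hand-rolled layout assumes both ranks run on machines with the same int/size_t representation; Fix 2 below sidesteps that concern entirely by letting Boost do the serialization.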
Fix 2
An alternative approach that seems to fit better with the way you are attempting to use MPI and strings is to use the Boost libraries. Boost contains functions for MPI that automatically serialize the data for you.
A useful tutorial on Boost and MPI is available on the boost website.
Here is example code from that tutorial that does a similar task:
#include <boost/mpi.hpp>
#include <iostream>
#include <string>
#include <boost/serialization/string.hpp>

namespace mpi = boost::mpi;

int main(int argc, char* argv[])
{
    mpi::environment env(argc, argv);
    mpi::communicator world;

    if (world.rank() == 0) {
        world.send(1, 0, std::string("Hello"));
        std::string msg;
        world.recv(1, 1, msg);
        std::cout << msg << "!" << std::endl;
    } else {
        std::string msg;
        world.recv(0, 0, msg);
        std::cout << msg << ", ";
        std::cout.flush();
        world.send(0, 1, std::string("world"));
    }
    return 0;
}