I have a problem with the MPI library. I have to read text from a file and send it to the other processes, for example as a vector.
I've written the following code:
#include "mpi.h"
#include <stdio.h>
#include<string.h>
#include<stdlib.h>
#include<string>
#include <fstream>
#include <cstring>
#include <vector>
class PatternAndText
{
public:
static std::string textPreparaation()
{
std::ifstream t("file.txt");
std::string str((std::istreambuf_iterator<char>(t)), std::istreambuf_iterator<char>());
std::string text = str;
return text;
}
};
int main(int argc, char* argv[])
{
int size, rank ;
std::string text;
std::vector<char> cstr;
MPI_Init(&argc, &argv);
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (rank == 0)
{
text = PatternAndText::textPreparaation();
std::vector<char> cstr(text.c_str(), text.c_str() + text.size() + 1);
}
MPI_Bcast(cstr.data(), cstr.size(), MPI_CHAR,0,MPI_COMM_WORLD);
if (rank != 0 )
{
std::cout<<"\n";
std::cout<<cstr[1]<<" "<<rank;
std::cout<<"\n";
}
MPI_Finalize();
return 0;
}
I want to read the text from the file in the main process and broadcast it to the others.
When I try to run it, I get:
[alek:26408] *** Process received signal ***
[alek:26408] Signal: Segmentation fault (11)
[alek:26408] Signal code: Address not mapped (1)
[alek:26408] Failing at address: 0x1
[alek:26408] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7fc7c1a8af20]
[alek:26408] [ 1] spli(+0xc63d)[0x55b0104bb63d]
[alek:26408] [ 2] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fc7c1a6db97]
[alek:26408] [ 3] spli(+0xc3ba)[0x55b0104bb3ba]
[alek:26408] *** End of error message ***
[... the remaining ranks print the same segmentation-fault trace, increasingly interleaved with one another; every one fails at address 0x1 with the same two frames inside the spli binary ...]
[warn] Epoll ADD(4) on fd 88 failed. Old events were 0; read change was 0 (none); write change was 1 (add): Bad file descriptor
--------------------------------------------------------------------------
mpirun noticed that process rank 5 with PID 0 on node alek exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
When I print the size of cstr instead of an element, the other processes print 0.
What should I change to make it work?
The local variable std::vector<char> cstr inside if (rank == 0) {...} shadows the one declared in main, so the cstr in main is never filled.
To assign data to the outer cstr, use cstr.assign(...):
if (rank == 0) {
    const std::string text = PatternAndText::textPreparaation();
    cstr.assign(text.c_str(), text.c_str() + text.size() + 1);
}
The other processes must first allocate storage in cstr by calling cstr.resize(...), and to do that they need to know its size. So first broadcast the size (called len below, to avoid colliding with the int size already declared in main) and resize cstr:
unsigned long long len = cstr.size();
MPI_Bcast(&len, 1, MPI_UNSIGNED_LONG_LONG, 0, MPI_COMM_WORLD);
if (rank != 0)
    cstr.resize(len);

before broadcasting the vector itself:

MPI_Bcast(cstr.data(), len, MPI_CHAR, 0, MPI_COMM_WORLD);
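Putting it together, a minimal corrected main might look like this (a sketch that just combines the snippets above; it keeps the question's PatternAndText class and uses len to avoid the shadowing problem):

int main(int argc, char* argv[])
{
    int size, rank;
    std::vector<char> cstr;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
    {
        const std::string text = PatternAndText::textPreparaation();
        cstr.assign(text.c_str(), text.c_str() + text.size() + 1);
    }

    // Everyone has to agree on the length before the payload broadcast.
    unsigned long long len = cstr.size();
    MPI_Bcast(&len, 1, MPI_UNSIGNED_LONG_LONG, 0, MPI_COMM_WORLD);
    if (rank != 0)
        cstr.resize(len);

    MPI_Bcast(cstr.data(), len, MPI_CHAR, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}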
I have to work on code written a few years ago that uses MPI and PETSc.
When I try to run it, I get an error in the function MPI_Comm_rank().
Here is the beginning of the code:
int main(int argc, char **argv)
{
    double mesure_tps2, mesure_tps1;
    struct timeval tv;
    time_t curtime2, curtime1;
    char help[] = "Solves linear system with KSP.\n\n"; // NB: Petsc is defined in "fafemo_Constant_Globales.h"

    std::cout << "début PetscInitialize" << std::endl;
    (void*) PetscInitialize(&argc, &argv, (char *)0, help);
    std::cout << "début PetscInitialize fait" << std::endl;

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    PetscFinalize();
}
Obviously, there is some code between MPI_Comm_rank() and PetscFinalize().
PetscInitialize and PetscFinalize call MPI_Init and MPI_Finalize, respectively.
In my makefile I have:
PETSC_DIR=/home/thib/Documents/bibliotheques/petsc-3.13.2
PETSC_ARCH=arch-linux-c-debug
include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules

PETSC36 = -I/home/thib/Documents/bibliotheques/petsc-3.13.2/include -I/home/thib/Documents/bibliotheques/petsc-3.13.2/arch-linux-c-debug/include
Mpi_include=-I/usr/lib/x86_64-linux-gnu/openmpi

# a variable with some file names
fafemo_files = fafemo_CI_CL-def.cc fafemo_Flux.cc fafemo_initialisation_probleme.cc fafemo_FEM_setup.cc fafemo_sorties.cc fafemo_richards_solve.cc element_read_split.cpp point_read_split.cpp read_split_mesh.cpp

PETSC_KSP_LIB_VSOIL=-L/home/thib/Documents/bibliotheques/petsc-3.13.2/ -lpetsc_real -lmpi -lmpi++

fafemo: ${fafemo_files} fafemo_Richards_Main.o
	g++ ${CXXFLAGS} -g -o fafemo_CD ${fafemo_files} fafemo_Richards_Main.cc ${PETSC_KSP_LIB_VSOIL} $(PETSC36) ${Mpi_include}
Using g++ or mpic++ doesn't seem to change anything.
It compiles, but when I try to execute it I get:
[thib-X540UP:03696] Signal: Segmentation fault (11)
[thib-X540UP:03696] Signal code: Address not mapped (1)
[thib-X540UP:03696] Failing at address: 0x44000098
[thib-X540UP:03696] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3efd0)[0x7fbfa87e4fd0]
[thib-X540UP:03696] [ 1] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Comm_rank+0x42)[0x7fbfa9533c42]
[thib-X540UP:03696] [ 2] ./fafemo_CD(+0x230c8)[0x561caa6920c8]
[thib-X540UP:03696] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fbfa87c7b97]
[thib-X540UP:03696] [ 4] ./fafemo_CD(+0x346a)[0x561caa67246a]
[thib-X540UP:03696] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node thib-X540UP exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Also, I have other MPI programs on my computer and I have never had such a problem.
Does anyone know why I get this?
If someone has the same issue: when I installed PETSc, I ran ./configure with --download-mpich while I already had MPI installed on my computer.
To solve the problem I did rm -rf ${PETSC_ARCH} and ran ./configure again.
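In the same spirit, a quick diagnostic (a sketch I'm adding here, not part of the original fix) is to ask the library itself which implementation the binary actually initializes, via the standard MPI_Get_library_version call:

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    // Prints something like "Open MPI v2.1.1, ..." or "MPICH Version: 3.3 ..."
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len;
    MPI_Get_library_version(version, &len);
    printf("%s\n", version);

    MPI_Finalize();
    return 0;
}

If this reports MPICH while mpirun comes from Open MPI (or the other way around), the two implementations are mixed, which matches the failure mode described above.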
I have the vectors

std::vector<double**> blocks(L);
std::vector<double**> localblocks(blocks_local);

I then use send commands to send the data that resides on rank 0 to the other ranks (I think that is the correct terminology):
for(i=1;i<numnodes-1;i++)
{
for(j=0;j<blocks_local;j++)
{
MPI_Send(blocks[i*blocks_local+j],N*N,MPI_DOUBLE,i,j,MPI_COMM_WORLD);
}
}
Up until this point the code runs perfectly fine: no errors. Then the remaining ranks run the following code:
for(i=0;i<blocks_local;i++)
{
MPI_Recv(&localblocks[i],N*N,MPI_DOUBLE,0,i,MPI_COMM_WORLD,&status);
}
It is at this point that I get an invalid pointer error.
The full output is:
6.8297e-05
3.6895e-05
4.3906e-05
4.4463e-05 << these just show the time it takes for a process to complete. Shows that the program has excited successfully.
free(): invalid pointer
[localhost:16841] *** Process received signal ***
[localhost:16841] Signal: Aborted (6)
[localhost:16841] Signal code: (-6)
free(): invalid pointer
free(): invalid pointer
[... ranks 16840 and 16842 abort the same way; the three backtraces are interleaved, but each one runs from free() through ompi_win_finalize and ompi_mpi_finalize in libmpi.so back to main in ./a.out ...]
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node localhost exited on signal 6 (Aborted).
--------------------------------------------------------------------------
I have deleted my cleanup code, so this cleanup has to come from MPI; I am unsure how to resolve this.
void BlockMatVecMultiplication(int mynode, int numnodes, int N, int L, std::vector<double **> blocks, double *x, double *y)
{
    int i, j;
    int local_offset, blocks_local, last_blocks_local;
    int *count;
    int *displacements;

    // The number of rows each processor is dealt
    blocks_local = L/numnodes;
    double *temp = new double[N*blocks_local];
    std::vector<double**> localblocks(blocks_local);

    // the offset
    local_offset = mynode*blocks_local;

    MPI_Status status;

    if(mynode == (numnodes-1))
    {
        blocks_local = L - blocks_local*(numnodes-1);
    }

    /* Distribute the blocks across the processes */
    // At this point node 0 has the matrix. So we only need
    // to distribute among the remaining nodes, using the
    // last node as a cleanup.
    if(mynode == 0)
    {
        // This deals the matrix between processes 1 to numnodes-2
        for(i = 1; i < numnodes-1; i++)
        {
            for(j = 0; j < blocks_local; j++)
            {
                MPI_Send(blocks[i*blocks_local+j], N*N, MPI_DOUBLE, i, j, MPI_COMM_WORLD);
            }
        }
        // Here we use the last process to "clean up". For small N
        // the load is poorly balanced.
        last_blocks_local = L - blocks_local*(numnodes-1);
        for(j = 0; j < last_blocks_local; j++)
        {
            MPI_Send(blocks[(numnodes-1)*blocks_local+j], N*N, MPI_DOUBLE, numnodes-1, j, MPI_COMM_WORLD);
        }
    }
    else
    {
        /* This code allows other processes to obtain the chunks of data
           sent by process 0 */
        /* blocks_local has a different value on the last processor, remember */
        for(i = 0; i < blocks_local; i++)
        {
            MPI_Recv(&localblocks[i], N*N, MPI_DOUBLE, 0, i, MPI_COMM_WORLD, &status);
        }
    }
}
The above method is called from the code below:
#include <iostream>
#include <iomanip>
#include <mpi.h>
#include "SCmathlib.h"
#include "SCchapter7.h"
using namespace std;
int main(int argc, char * argv[])
{
int i,j, N = 10,L=10;
double **A,*x,*y;
int totalnodes,mynode;
std::vector<double**> blocks;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
//output variable
y = new double[N];
//input variable
x = new double[N];
for(i=0;i<N;i++)
{
x[i] = 1.0;
}
// forms identity matrix on node 0
if(mynode==0)
{
for(j=0;j<L;j++)
{
A = CreateMatrix(N,N);
// fills the block
for(i=0;i<N;i++)
{
A[i][i] = 1.0;
}
blocks.push_back(A);
}
}
double start = MPI_Wtime();
BlockMatVecMultiplication(mynode,totalnodes,N,L,blocks,x,y);
double end = MPI_Wtime();
if(mynode==0)
{
for(i=0;i<L;i++)
{
//DestroyMatrix(blocks[i],N,N);
}
//delete[] x;
//delete[] y;
}
std::cout << end- start << std::endl;
MPI_Finalize();
}
The "includes" just provide basic matrix functionality. The following function creates a matrix
double ** CreateMatrix(int m, int n){
    double ** mat;
    mat = new double*[m];
    for(int i = 0; i < m; i++){
        mat[i] = new double[n];
        for(int j = 0; j < n; j++)   // note: the original looped j < m here
            mat[i][j] = 0.0;
    }
    return mat;
}
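As an aside on the failing transfers above: blocks[k] is a double**, so both MPI_Send(blocks[...], N*N, MPI_DOUBLE, ...) and MPI_Recv(&localblocks[i], N*N, MPI_DOUBLE, ...) hand MPI the address of pointer storage rather than a buffer of N*N doubles; the receive then overwrites the vector's internal pointer array, which is consistent with the free(): invalid pointer abort at finalize time. A sketch of one way to move such a block (PackBlock is a helper I am introducing here, not part of the original code):

#include <mpi.h>
#include <algorithm>
#include <vector>

// Hypothetical helper: copy an N x N block that was allocated row by row
// (as by CreateMatrix) into one contiguous buffer that MPI can send directly.
void PackBlock(double **block, int N, std::vector<double> &flat)
{
    flat.resize(static_cast<size_t>(N) * N);
    for (int r = 0; r < N; r++)
        std::copy(block[r], block[r] + N, flat.data() + static_cast<size_t>(r) * N);
}

// Sender:   PackBlock(blocks[k], N, flat);
//           MPI_Send(flat.data(), N*N, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
// Receiver: std::vector<double> flat(N*N);
//           MPI_Recv(flat.data(), N*N, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);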
During my work writing a C++ wrapper for MPI, I ran into a segmentation fault in MPI_Test() whose cause I can't figure out.
The following code is a minimal crashing example, to be compiled and run with mpic++ -std=c++11 -g -o test test.cpp && ./test:
#include <stdlib.h>
#include <stdio.h>
#include <memory>
#include <mpi.h>
class Environment {
public:
static Environment &getInstance() {
static Environment instance;
return instance;
}
static bool initialized() {
int ini;
MPI_Initialized(&ini);
return ini != 0;
}
static bool finalized() {
int fin;
MPI_Finalized(&fin);
return fin != 0;
}
private:
Environment() {
if(!initialized()) {
MPI_Init(NULL, NULL);
_initialized = true;
}
}
~Environment() {
if(!_initialized)
return;
if(finalized())
return;
MPI_Finalize();
}
bool _initialized{false};
public:
Environment(Environment const &) = delete;
void operator=(Environment const &) = delete;
};
class Status {
private:
std::shared_ptr<MPI_Status> _mpi_status;
MPI_Datatype _mpi_type;
};
class Request {
private:
std::shared_ptr<MPI_Request> _request;
int _flag;
Status _status;
};
int main() {
auto &m = Environment::getInstance();
MPI_Request r;
MPI_Status s;
int a;
MPI_Test(&r, &a, &s);
Request r2;
printf("b\n");
}
Basically, the Environment class is a singleton wrapper around MPI_Init and MPI_Finalize. The first time the class is instantiated, MPI_Init is called, and when the program exits, MPI is finalized. Then I do some MPI stuff in the main() function, involving some other simple wrapper objects.
The code above crashes (on my machine, Open MPI & Linux). However, it works when I
- comment out any of the private members of Request or Status (even int _flag;),
- comment out the last line, printf("b\n");, or
- replace auto &m = Environment::getInstance(); with a direct MPI_Init call.
There doesn't seem to be a connection between these points, and I have no clue where to look for the segmentation fault.
The stack trace is:
[pc13090:05978] *** Process received signal ***
[pc13090:05978] Signal: Segmentation fault (11)
[pc13090:05978] Signal code: Address not mapped (1)
[pc13090:05978] Failing at address: 0x61
[pc13090:05978] [ 0] /usr/lib/libpthread.so.0(+0x11dd0)[0x7fa9cf818dd0]
[pc13090:05978] [ 1] /usr/lib/openmpi/libmpi.so.40(ompi_request_default_test+0x16)[0x7fa9d0357326]
[pc13090:05978] [ 2] /usr/lib/openmpi/libmpi.so.40(MPI_Test+0x31)[0x7fa9d03970b1]
[pc13090:05978] [ 3] ./test(+0xb7ae)[0x55713d1aa7ae]
[pc13090:05978] [ 4] /usr/lib/libc.so.6(__libc_start_main+0xea)[0x7fa9cf470f4a]
[pc13090:05978] [ 5] ./test(+0xb5ea)[0x55713d1aa5ea]
[pc13090:05978] *** End of error message ***
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node pc13090 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
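For reference, MPI_Test is only defined for a request handle that MPI itself produced (e.g. from MPI_Isend/MPI_Irecv) or for MPI_REQUEST_NULL; the r in the example above is uninitialized, so the call reads garbage. A minimal well-defined variant of the failing lines might be (a sketch, not the original wrapper code):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Request r = MPI_REQUEST_NULL;  // a defined starting value for a request
    MPI_Status s;
    int a;
    MPI_Test(&r, &a, &s);  // well-defined: a null request completes immediately, a == 1
    printf("flag = %d\n", a);

    MPI_Finalize();
    return 0;
}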
I am trying to learn MPI. When I send data from one processor to another, I can successfully send it and receive it in a variable on the other side. But when I try to send and receive on both processors, I get the invalid rank error.
Here is my code for the program:
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv) {
int world_size;
int rank;
char hostname[256];
char processor_name[MPI_MAX_PROCESSOR_NAME];
int name_len;
int tag = 4;
int value = 4;
int master = 0;
int rec;
MPI_Status status;
// Initialize the MPI environment
MPI_Init(&argc,&argv);
// get the total number of processes
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
// get the rank of current process
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
// get the name of the processor
MPI_Get_processor_name(processor_name, &name_len);
// get the hostname
gethostname(hostname,255);
printf("World size is %d\n",world_size);
if(rank == master){
MPI_Send(&value,1,MPI_INT,1,tag,MPI_COMM_WORLD);
MPI_Recv(&rec,1,MPI_INT,1,tag,MPI_COMM_WORLD,&status);
printf("In master with value %d\n",rec);
}
if(rank == 1){
MPI_Send(&tag,1,MPI_INT,0,tag,MPI_COMM_WORLD);
MPI_Recv(&rec,1,MPI_INT,0,tag,MPI_COMM_WORLD,&status);
printf("in slave with rank %d and value %d\n",rank, rec);
}
printf("Hello world! I am process number: %d from processor %s on host %s out of %d processors\n", rank, processor_name, hostname, world_size);
MPI_Finalize();
return 0;
}
Here is my PBS file:
#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=1:00
#PBS -N MPIsample
#PBS -q edu_shared
#PBS -m abe
#PBS -M blahblah@blah.edu
#PBS -e mpitest.err
#PBS -o mpitest.out
#PBS -d /export/home/blah/MPIsample
mpirun -machinefile $PBS_NODEFILE -np $PBS_NP ./mpitest
The output file comes out like this:
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Job complete
If the world size is 1, then it should be printed once, not 8 times.
The err file is:
[compute-0-34.local:13110] *** An error occurred in MPI_Send
[compute-0-34.local:13110] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13110] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13110] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[... the same four-line MPI_ERR_RANK message is repeated for each of the other seven PIDs (13106-13113) ...]
Two days ago I was able to send and receive simultaneously, but since then the previously working code has been showing me this error. Is there a problem with my code, or with the high-performance computer that I am working on?
From an MPI point of view, you did not launch one MPI job with 8 MPI tasks, but 8 independent MPI jobs with one MPI task each.
That typically occurs when you mix two MPI implementations (for example, your application was built with Open MPI, and you are using the MPICH mpirun).
Before invoking mpirun, I suggest you add to your PBS script:

which mpirun
ldd mpitest

Make sure mpirun and the MPI libraries come from the same implementation (i.e. same vendor and same version).
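Independently of the launcher issue, the program itself assumes at least two ranks. A small guard (my addition, not in the original code), inserted right after the printf("World size is %d\n", ...) line, makes the failure mode explicit instead of an MPI_ERR_RANK abort:

if(world_size < 2){
    if(rank == master)
        fprintf(stderr, "This program needs at least 2 MPI tasks, got %d\n", world_size);
    MPI_Finalize();
    return 1;
}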
It turned out there was a problem with the HPC system: it was not allotting me the required number of processors. Thanks, guys.
I am trying to access an environment variable from a C++ program, so I made a test program, which works fine:
#include <stdio.h>
#include <stdlib.h>
int main ()
{
printf("MANIFOLD : %s\n", getenv("MANIFOLD_DIRECTORY"));
return(0);
}
Output: MANIFOLD : /home/n1603031f/Desktop/manifold-0.12.1/kitfox_configuration/input.config
Note: the signature of getenv is:
char *getenv(const char *name);
But when I use this as part of a bigger program with many files linked together:
energy_introspector->configure (getenv("MANIFOLD_DIRECTORY"));
The above does not work.
char *a = new char [1000];
a = getenv("MANIFOLD_DIRECTORY");
energy_introspector->configure (a);
The above does not work either.
Note: the signature of the configure function is:
void configure(const char *ConfigFile);
Error message:
Number of LPs = 1
[Ubuntu10:18455] *** Process received signal ***
[Ubuntu10:18455] Signal: Segmentation fault (11)
[Ubuntu10:18455] Signal code: Address not mapped (1)
[Ubuntu10:18455] Failing at address: (nil)
[Ubuntu10:18455] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330) [0x7f9a38149330]
[Ubuntu10:18455] [ 1] /lib/x86_64-linux-gnu/libc.so.6(strlen+0x2a) [0x7f9a37dfc9da]
[Ubuntu10:18455] [ 2] /home/n1603031f/Desktop/manifold-0.12.1/simulator/smp/QsimLib/smp_llp() [0x5bf8c4]
[Ubuntu10:18455] [ 3] /home/n1603031f/Desktop/manifold-0.12.1/simulator/smp/QsimLib/smp_llp() [0x5a4ac6]
[Ubuntu10:18455] [ 4] /home/n1603031f/Desktop/manifold-0.12.1/simulator/smp/QsimLib/smp_llp() [0x5a4df8]
[Ubuntu10:18455] [ 5] /home/n1603031f/Desktop/manifold-0.12.1/simulator/smp/QsimLib/smp_llp() [0x4283b6]
[Ubuntu10:18455] [ 6] /home/n1603031f/Desktop/manifold-0.12.1/simulator/smp/QsimLib/smp_llp() [0x41e197]
[Ubuntu10:18455] [ 7] /home/n1603031f/Desktop/manifold-0.12.1/simulator/smp/QsimLib/smp_llp() [0x41de7a]
[Ubuntu10:18455] [ 8] /home/n1603031f/Desktop/manifold-0.12.1/simulator/smp/QsimLib/smp_llp() [0x41d906]
[Ubuntu10:18455] [ 9] /home/n1603031f/Desktop/manifold-0.12.1/simulator/smp/QsimLib/smp_llp() [0x41710b]
[Ubuntu10:18455] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f9a37d95f45]
[Ubuntu10:18455] [11] /home/n1603031f/Desktop/manifold-0.12.1/simulator/smp/QsimLib/smp_llp() [0x41697f]
[Ubuntu10:18455] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 18455 on node Ubuntu10 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
But this works:
energy_introspector->configure ("/home/n1603031f/Desktop/manifold-0.12.1/kitfox_configuration/input.config");
getenv returns a pointer to memory managed by the runtime library; that memory is not owned by your program. Your

a = new char [1000]

line shows you did not recognize this and assume you need to supply the memory. That is not true; in particular, you must never free the memory returned by getenv.
(Even if that were correct, the simple pointer assignment

a = getenv...

would still be wrong, as you are just replacing the pointer, not copying the memory. That line is also a memory leak, since you lose the pointer to the 1000 allocated chars.)
If you want your program to own that memory so you can later free it, you need to copy it into your own memory space.
a = new char [1000];
e = getenv (<whatever>);
if (e != NULL)    /* getenv returns NULL when the variable is not set */
    strcpy (a, e);
Unfortunately, I cannot see what you do with the pointer later on in your other examples, in particular whether you try to free or delete it. Both would lead to an error.
The first explicit error in your code is allocating a char array and then assigning the result of getenv to the same pointer; this leaks the allocation. In your case, copy the result into a std::string instead, but check for NULL first, since constructing a std::string from a null pointer is undefined:

const char *e = getenv("MANIFOLD_DIRECTORY");
std::string a = (e != NULL) ? e : "";

This stores the result in the variable a, which your program owns, and the empty-string fallback keeps the code well-defined even if the variable is unset.
If getenv returns NULL, then no variable with the specified name is in the environment passed to your application. Try listing all available environment variables with code like the below:
extern char** environ;

for (int i = 0; environ[i] != NULL; ++i) {
    std::cout << environ[i] << std::endl;
}
If your variable is not listed, then most probably the problem is in how you launch your application; the other option is that the variable has been unset.
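A small helper in the same spirit (a sketch; the name getenv_or_default is made up here) packages the NULL check and the copy:

#include <cstdlib>
#include <string>

// Hypothetical helper: return the variable's value as an owned std::string,
// or a fallback when the variable is not set in the environment.
std::string getenv_or_default(const char *name, const std::string &fallback = "")
{
    const char *value = std::getenv(name);
    return value ? std::string(value) : fallback;
}

// Usage (hypothetical): the temporary's c_str() stays valid for the call.
// energy_introspector->configure(getenv_or_default("MANIFOLD_DIRECTORY").c_str());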