PMPI_Waitall error - C++

I'm having some problems with this piece of code.
NEIGHBOORDHOOD_FLAG = 1234;
int number_of_nodes = 90;
MPI_Request neig_request[2 * number_of_nodes];
MPI_Status neig_status[2 * number_of_nodes];
int neig_counter = 0;
int is_neighbour_send[number_of_nodes];
int is_neighbour_receive[number_of_nodes];
// tell every node whether it is one of my neighbours
for (int v = 2; v < number_of_nodes + 2; ++v) {
    is_neighbour_send[v - 2] = (util.check_targets(neighbours, v) == true) ? 1 : 0;
    this->communicate_neighborhood(&(is_neighbour_send[v - 2]), v, &(neig_request[neig_counter]));
    ++neig_counter;
}
// receive the same message from the other nodes, to learn whose neighbour I am
for (int v = 2; v < number_of_nodes + 2; ++v) {
    this->receive_neighborhood(&(is_neighbour_receive[v - 2]), v, &(neig_request[neig_counter]));
    ++neig_counter;
}
MPI_Waitall(neig_counter, neig_request, neig_status);
The two methods above are essentially thin wrappers around the MPI send and receive calls:
void NodeAgent::communicate_neighborhood(int *neigh, int dest_node, MPI_Request *req) {
    MPI_Class::send_message(neigh, MPI_INT, 1, dest_node, NEIGHBOORDHOOD_FLAG, req);
}

void NodeAgent::receive_neighborhood(int *is_neighbour, int node_mitt, MPI_Request *req) {
    MPI_Class::receive_message(is_neighbour, MPI_INT, 1, node_mitt, NEIGHBOORDHOOD_FLAG, req);
}
with the send and receive static methods defined as:
void MPI_Class::send_message(void *content, MPI_Datatype datatype, int length, int dest, int tag, MPI_Request *request) {
    int np = MPI_Class::count();
    MPI_Isend(content, length, datatype, (dest % np), tag, MPI_COMM_WORLD, request);
}

void MPI_Class::receive_message(void *buffer, MPI_Datatype datatype, int length, int dest, int tag, MPI_Request *request) {
    int np = MPI_Class::count();
    MPI_Irecv(buffer, length, datatype, (dest % np), tag, MPI_COMM_WORLD, request);
}
The number of nodes is the number of processes minus 2 (there are two processes that I use for other operations), and I've used the "%" in send and receive to ensure that the sender and the recipient are always in the range that I want.
The interesting thing is that this code runs properly even with a large number of processes (100). The problem arises when I launch the program on two or more machines...
It gives me: Fatal error in PMPI_Waitall: See the MPI_ERROR field in MPI_Status for the error code
I tried to print the MPI_ERROR of each status, but they are all zeroes...
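For reference, this is roughly how I check the per-request error codes - a minimal, simplified sketch rather than my actual code; it only relies on the standard MPI error-handling calls (MPI_Comm_set_errhandler, MPI_Error_string):
// Make MPI return error codes instead of aborting, so the statuses can be inspected.
MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

int rc = MPI_Waitall(neig_counter, neig_request, neig_status);
if (rc == MPI_ERR_IN_STATUS) {
    for (int i = 0; i < neig_counter; ++i) {
        if (neig_status[i].MPI_ERROR != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(neig_status[i].MPI_ERROR, msg, &len);
            printf("request %d failed: %s\n", i, msg);
        }
    }
}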
Can someone help me?
Many thanks in advance.
Edit: here is the MCVE:
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int my_id;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Status *statuses = (MPI_Status *) malloc(2 * world_size * sizeof(MPI_Status));
    MPI_Request *requestes = (MPI_Request *) malloc(2 * world_size * sizeof(MPI_Request));
    int counter = 0;
    int *tosend = (int *) malloc((world_size - 2) * sizeof(int));
    int *toreceive = (int *) malloc((world_size - 2) * sizeof(int));
    for (int i = 0; i < world_size - 2; ++i) {
        tosend[i] = 0;
    }
    if (my_id > 1) {
        for (int i = 2; i < world_size; ++i) {
            MPI_Isend(&tosend[i - 2], 1, MPI_INT, i, 0, MPI_COMM_WORLD, &requestes[counter]);
            ++counter;
        }
    }
    if (my_id > 1) {
        for (int i = 2; i < world_size; ++i) {
            MPI_Irecv(&(toreceive[i - 2]), 1, MPI_INT, i, 0, MPI_COMM_WORLD, &(requestes[counter]));
            ++counter;
        }
    }
    MPI_Waitall(counter, requestes, statuses);
    free(statuses);
    free(requestes);
    free(tosend);
    free(toreceive);
    MPI_Finalize();
    return 0;
}
This code also runs correctly on a single machine. The problem arises when I launch it on two machines that are mutually authenticated over ssh. I just discovered that after the MPI_Waitall fatal error, if I try to connect via ssh from one machine to the other, it asks me for the password...
Why? At this point I think the problem is the ssh communication and not the code.
Edit 2: solved! It was an ssh problem that I fixed somehow (honestly I don't know how; I redid the authentication procedure over and over until it worked).

Related

MPI Slave processes hang when there is no more work

I have a serial C++ program that I wish to parallelize. I know the basics of MPI: MPI_Send, MPI_Recv, etc. Basically, I have a data generation algorithm that runs significantly faster than the data processing algorithm. Currently they run in series, but I was thinking of running the data generation in the root process, doing the data processing on the slave processes, and sending a message from the root to a slave containing the data to be processed. This way, each slave processes a data set and then waits for its next data set.
The problem is that, once the root process is done generating data, the program hangs because the slaves are waiting for more.
This is an example of the problem:
#include "mpi.h"
#include <cassert>
#include <cstdio>
class Generator {
public:
Generator(int min, int max) : value(min - 1), max(max) {}
bool NextValue() {
++value;
return value < max;
}
int Value() { return value; }
private:
int value, max;
Generator() {}
Generator(const Generator &other) {}
Generator &operator=(const Generator &other) { return *this; }
};
long fibonnaci(int n) {
assert(n > 0);
if (n == 1 || n == 2) return 1;
return fibonnaci(n-1) + fibonnaci(n-2);
}
int main(int argc, char **argv) {
MPI_Init(&argc, &argv);
int rank, num_procs;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
if (rank == 0) {
Generator generator(1, 2 * num_procs);
int proc = 1;
while (generator.NextValue()) {
int value = generator.Value();
MPI_Send(&value, 1, MPI_INT, proc, 73, MPI_COMM_WORLD);
printf("** Sent %d to process %d.\n", value, proc);
proc = proc % (num_procs - 1) + 1;
}
} else {
while (true) {
int value;
MPI_Status status;
MPI_Recv(&value, 1, MPI_INT, 0, 73, MPI_COMM_WORLD, &status);
printf("** Received %d from process %d.\n", value, status.MPI_SOURCE);
printf("Process %d computed %d.\n", rank, fibonnaci(2 * (value + 10)));
}
}
MPI_Finalize();
return 0;
}
Obviously not everything above is "good practice", but it is sufficient to get the point across.
If I remove the while(true) from the slave processes, then the program exits when each of the slaves has exited. I would like the program to exit only after the root process has done its job AND all of the slaves have processed everything that has been sent.
If I knew how many data sets would be generated, I could start that many processes and everything would exit nicely, but that isn't the case here.
Any suggestions? Is there anything in the API that will do this? Could this be solved better with a better topology? Would MPI_Isend or MPI_Irecv do this better? I am fairly new to MPI, so bear with me.
Thanks
The usual practice is to send all worker processes an empty message with a special tag that signals them to exit the infinite processing loop. Let's say this tag is 42. You would do something like this in the worker loop:
while (true) {
    int value;
    MPI_Status status;
    MPI_Recv(&value, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    if (status.MPI_TAG == 42) {
        printf("Process %d exiting work loop.\n", rank);
        break;
    }
    printf("** Received %d from process %d.\n", value, status.MPI_SOURCE);
    printf("Process %d computed %d.\n", rank, fibonnaci(2 * (value + 10)));
}
The manager process would do something like this after the generator loop:
for (int i = 1; i < num_procs; i++)
    MPI_Send(&i, 0, MPI_INT, i, 42, MPI_COMM_WORLD);
Regarding your other question: using MPI_Isend() in the master process would de-serialise the execution and increase performance. The truth, however, is that you are sending very small messages and those are typically buffered internally (WARNING - implementation dependent!), so your MPI_Send() is effectively non-blocking and you already have non-serial execution. MPI_Isend() returns an MPI_Request handle that you need to take care of later. You could either wait for it to finish with MPI_Wait() or MPI_Waitall(), or you could just call MPI_Request_free() on it and it will be freed automatically when the operation is over. The latter is usually done when you want to send many messages asynchronously and don't care when the sends complete, but it is bad practice nevertheless, since having a large number of outstanding requests can consume lots of precious memory. As for the worker processes - they need the data in order to proceed with the computation, so using MPI_Irecv() is not necessary.
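For illustration, here is a minimal sketch of that fire-and-forget pattern dropped into the manager loop from the question. It assumes #include <vector>, reuses generator, proc and num_procs from the code above, and keeps every send buffer alive for the whole loop, since a freed request gives you no way to know when the buffer can be reused - which is exactly the hidden cost of this pattern:
// Preallocate one slot per generated value so every send buffer stays valid
// until the matching receive completes (the generator produces fewer than
// 2*num_procs values here).
std::vector<int> values(2 * num_procs);
int count = 0;
while (generator.NextValue()) {
    values[count] = generator.Value();
    MPI_Request req;
    MPI_Isend(&values[count], 1, MPI_INT, proc, 73, MPI_COMM_WORLD, &req);
    MPI_Request_free(&req);   // MPI releases the request internally once the send completes
    proc = proc % (num_procs - 1) + 1;
    ++count;
}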
Welcome to the wonderful world of MPI programming!

client/server application using MPI

I have two questions; the first one is:
I'm going to use MS-MPI, and by "only MPI" I meant that we mustn't use sockets. My application is about a scalable distributed data structure. Initially we have a server containing a file of variable size (the size can grow through insertions and shrink through deletions). When the size of the file exceeds a certain limit, the file is split: one half stays on the first server and the other half is moved to a new server, and so on. The client always needs to know the address of the data it wants to retrieve, so it should keep an image of the file's split operations. I hope that makes it clearer.
And the second one is:
I've tried to compile a simple client/server application (the source code is below) with MS-MPI and MPICH2, and it doesn't work: it gives me the error message "fatal error in mpi_open_port()" plus other stack errors. So I installed Open MPI on Ubuntu 11.10 and tried to run the same example. The server side worked and gave me a port name, but on the client side I got this error message:
[user-Compaq-610:03833] [[39604,1],0] ORTE_ERROR_LOG: Not found in file ../../../../../../ompi/mca/dpm/orte/dpm_orte.c at line 155
[user-Compaq-610:3833] *** An error occurred in MPI_Comm_connect
[user-Compaq-610:3833] *** on communicator MPI_COMM_WORLD
[user-Compaq-610:3833] *** MPI_ERR_INTERN: internal error
[user-Compaq-610:3833] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 3833 on
node toufik-Compaq-610 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
So I'm confused about what the problem is, and I've spent a while trying to fix it. I'd be grateful if anybody could help me with it; thank you in advance.
The source code is here:
/* the server side */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int my_id;
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm newcomm;
    int passed_num;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    passed_num = 111;

    if (my_id == 0)
    {
        MPI_Open_port(MPI_INFO_NULL, port_name);
        printf("%s\n\n", port_name); fflush(stdout);
    } /* endif */

    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &newcomm);

    if (my_id == 0)
    {
        MPI_Send(&passed_num, 1, MPI_INT, 0, 0, newcomm);
        printf("after sending passed_num %d\n", passed_num); fflush(stdout);
        MPI_Close_port(port_name);
    } /* endif */

    MPI_Finalize();
    exit(0);
} /* end main() */
And on the client side:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int passed_num;
    int my_id;
    MPI_Comm newcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_WORLD, &newcomm);

    if (my_id == 0)
    {
        MPI_Status status;
        MPI_Recv(&passed_num, 1, MPI_INT, 0, 0, newcomm, &status);
        printf("after receiving passed_num %d\n", passed_num); fflush(stdout);
    } /* endif */

    MPI_Finalize();
    return 0;
} /* end main() */
How exactly do you run the application? It seems that the provided client and server codes are the same.
Usually the code is the same for all MPI processes, and the program decides what to execute based on its rank, as in this snippet: if (my_id == 0) { ... }. The application is executed with mpiexec. For example, mpiexec -n 2 ./application would run two MPI processes with ranks 0 and 1 in one MPI_COMM_WORLD communicator. Where exactly the processes are executed (on the same node or on different ones) depends on the configuration.
Nevertheless, you should create a port with MPI_Open_port and then pass it to MPI_Comm_connect. Here is an example of how to use these functions: MPI_Comm_connect
Moreover, for each MPI_Recv there must be a corresponding MPI_Send; otherwise the receiving process will wait forever.
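As an aside, instead of copying the long port string by hand you can let MPI pass it around by name. Below is a minimal sketch of the MPI-2 name-publishing calls; the service name "my_dds_server" is just an illustrative placeholder, and with Open MPI this needs a running name server (such as ompi-server) to work across separately launched jobs, so treat it as an illustration rather than a drop-in fix:
/* server side, rank 0: publish the port under a service name */
char port_name[MPI_MAX_PORT_NAME];
MPI_Open_port(MPI_INFO_NULL, port_name);
MPI_Publish_name("my_dds_server", MPI_INFO_NULL, port_name);
MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &newcomm);

/* client side: look up the same service name instead of reading argv[1] */
char port_name[MPI_MAX_PORT_NAME];
MPI_Lookup_name("my_dds_server", MPI_INFO_NULL, port_name);
MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &newcomm);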

C++ windows threading and mutex issue

I am a bit rusty with threaded programs, especially on Windows.
I have created a simple MEX file in MATLAB that is meant to read a number of files, with each file being read in its own thread.
The file doesn't do anything really useful, but it is a precursor to a more complicated version that will use all of the functionality I've put into this file.
Here is the code:
#include <windows.h>
#include "mex.h"
#include <fstream>

typedef unsigned char uchar;
typedef unsigned int uint;

using namespace std;

int N;
int nThreads;
const int BLOCKSIZE = 1024;
char *buffer;
char *out;
HANDLE hIOMutex;

DWORD WINAPI runThread(LPVOID argPos) {
    int pos = *(reinterpret_cast<int*>(argPos));
    DWORD dwWaitResult = WaitForSingleObject(hIOMutex, INFINITE);
    if (dwWaitResult == WAIT_OBJECT_0) {
        char buf[20];
        sprintf(buf, "test%i.dat", pos);
        ifstream ifs(buf, ios::binary);
        if (!ifs.fail()) {
            mexPrintf("Running thread:%i\n", pos);
            for (int i = 0; i < N / BLOCKSIZE; i++) {
                if (ifs.eof()) {
                    mexPrintf("File %s exited at i=%i\n", buf, (i - 1) * BLOCKSIZE);
                    break;
                }
                ifs.read(&buffer[pos * BLOCKSIZE], BLOCKSIZE);
            }
        }
        else {
            mexPrintf("Could not open file %s\n", buf);
        }
        ifs.close();
        ReleaseMutex(hIOMutex);
    }
    else
        mexPrintf("The Mutex failed in thread:%i \n", pos);
    return TRUE;
}

// 0 - N is data size
// 1 - nThreads is number of threads
// 2 - this is the output array
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    N = mxGetScalar(prhs[0]);
    nThreads = mxGetScalar(prhs[1]);
    out = (char*)mxGetData(prhs[2]);
    buffer = (char*)malloc(BLOCKSIZE * nThreads);

    hIOMutex = CreateMutex(NULL, FALSE, NULL);
    HANDLE *hArr = (HANDLE*)malloc(sizeof(HANDLE) * nThreads);
    int *tInd = (int*)malloc(sizeof(int) * nThreads);

    for (int i = 0; i < nThreads; i++) {
        tInd[i] = i;
        hArr[i] = CreateThread(NULL, 0, runThread, &tInd[i], 0, NULL);
        if (!hArr[i]) {
            mexPrintf("Failed to start thread:%i\n", i);
            break;
        }
    }

    WaitForMultipleObjects(nThreads, hArr, TRUE, INFINITE);
    for (int i = 0; i < nThreads; i++)
        CloseHandle(hArr[i]);
    CloseHandle(hIOMutex);

    mexEvalString("drawnow");
    mexPrintf("Finished all threads.\n");

    free(hArr);
    free(tInd);
    free(buffer);
}
I compile it like this in Matlab:
mex readFile.cpp
And then run it like this:
out = zeros(1024*1024,1,'uint8');
readFile(1024*1024,nFiles,out);
The problem is that when I set nFiles to be less than or equal to 64 everything works as expected and I get the following output:
Running thread:0
.
.
.
Running thread:62
Running thread:63
Finished all threads.
However when I set nFiles to 65 or larger I get:
Running thread:0
Running thread:1
Running thread:2
Running thread:3
The Mutex failed in thread:59
The Mutex failed in thread:60
The Mutex failed in thread:61
.
.
.
(up to nFiles-1)
Finished all threads.
I have also tested it without threading and it works fine.
I cannot see what I'm doing wrong, or why the cutoff for using the mutex would be so arbitrary, so I am assuming there is something I am not taking into account.
Can anyone see where I have made a blatant mistake related to the error I'm seeing?
The documentation for WaitForMultipleObjects says "The maximum number of object handles is MAXIMUM_WAIT_OBJECTS.", which is 64 on most systems.
This is also (almost) a duplicate of this thread. The summary is really just that yes, the limit is 64, and that you can use the information in the Remarks section of WaitForMultipleObjects to build up a tree of threads to wait on; a simpler batched wait is sketched below.
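For illustration, here is a minimal sketch of the batched approach: it waits on the existing hArr handles in groups of at most MAXIMUM_WAIT_OBJECTS, which serialises the waits but keeps the code small (a sketch against the code in the question, not a tuned solution):
// Replace the single WaitForMultipleObjects call with waits on batches of <= 64 handles.
for (int i = 0; i < nThreads; i += MAXIMUM_WAIT_OBJECTS) {
    int remaining = nThreads - i;
    DWORD count = (DWORD)(remaining < MAXIMUM_WAIT_OBJECTS ? remaining : MAXIMUM_WAIT_OBJECTS);
    WaitForMultipleObjects(count, &hArr[i], TRUE, INFINITE);
}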

MPI communicator error

I had a problem with a program that uses MPI, and I have just fixed it; however, I don't seem to understand what was wrong in the first place. I'm quite green with programming-related stuff, so please be forgiving.
The program is:
#include <iostream>
#include <cstdlib>
#include <mpi.h>

#define RNumber 3

using namespace std;

int main() {
    /* Initialize MPI */
    int my_rank;        // My process rank
    int comm_sz;        // Number of processes
    MPI_Comm GathComm;  // Communicator for MPI_Gather
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    /* Initialize an array for results */
    long rawT[RNumber];
    long *Times = NULL;  // Results from threads
    if (my_rank == 0) Times = (long*) malloc(comm_sz * RNumber * sizeof(long));

    /* Fill rawT with results at threads */
    for (int i = 0; i < RNumber; i++) {
        rawT[i] = i;
    }

    if (my_rank == 0) {
        /* Main thread receives data from other threads */
        MPI_Gather(rawT, RNumber, MPI_LONG, Times, RNumber, MPI_LONG, 0, GathComm);
    }
    else {
        /* Other threads send calculation results to main thread */
        MPI_Gather(rawT, RNumber, MPI_LONG, Times, RNumber, MPI_LONG, 0, GathComm);
    }

    /* Finalize MPI */
    MPI_Finalize();
    return 0;
}
On execution the program returns the following message:
Fatal error in PMPI_Gather: Invalid communicator, error stack:
PMPI_Gather(863): MPI_Gather(sbuf=0xbf824b70, scount=3, MPI_LONG,
rbuf=0x98c55d8, rcount=3, MPI_LONG, root=0, comm=0xe61030) failed
PMPI_Gather(757): Invalid communicator Fatal error in PMPI_Gather:
Invalid communicator, error stack: PMPI_Gather(863):
MPI_Gather(sbuf=0xbf938960, scount=3, MPI_LONG, rbuf=(nil), rcount=3,
MPI_LONG, root=0, comm=0xa6e030) failed PMPI_Gather(757): Invalid
communicator
After I removed GathComm altogether and substituted the default communicator MPI_COMM_WORLD, everything works fine.
Could anyone be so kind as to explain what I was doing wrong and why this adjustment made everything work?
That's because GathComm has not been assigned a valid communicator. "MPI_Comm GathComm;" only declares the variable to hold a communicator but doesn't create one.
You can use the default communicator (MPI_COMM_WORLD) if you simply want to include all procs in the operation.
Custom communicators are useful when you want to organise your procs into separate groups or when using virtual communication topologies.
To find out more, check out this article, which describes groups, communicators and topologies.
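For illustration, here is a minimal sketch of one way to obtain a valid custom communicator in the code above: duplicating MPI_COMM_WORLD (MPI_Comm_split would be the choice if you actually wanted subgroups). It reuses rawT, Times and RNumber from the question:
MPI_Comm GathComm;
MPI_Comm_dup(MPI_COMM_WORLD, &GathComm);   // GathComm now refers to a real communicator

MPI_Gather(rawT, RNumber, MPI_LONG, Times, RNumber, MPI_LONG, 0, GathComm);

MPI_Comm_free(&GathComm);                  // release it before MPI_Finalize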

Can anybody help me to identify the runtime MPI error in this code sample?

This code sample is used to learn MPI programming. The MPI package I use is MPICH2 1.3.1. The code below is my first step in learning MPI_Isend(), MPI_Irecv() and MPI_Wait(). The code has a master and several workers: the master receives data from the workers while the workers send data to the master. As usual, the data size is very large, so the workers split the data into trunks and send the trunks sequentially. I use a trick to overlap computation and communication when sending trunks. The method is very simple: keep two buffers to hold two trunks for each sending cycle.
#include <mpi.h>
#include <iostream>
#include <vector>
using namespace std;

int test_mpi_wait_2(int argc, char* argv[])
{
    int rank;
    int numprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int trunk_num = 6;       // assume there are six trunks
    int trunk_size = 10000;  // assume each trunk has 10,000 data points

    if (rank == 0)
    {
        // allocate a receiving buffer for every worker
        int** recv_buf = new int* [numprocs];
        for (int i = 0; i < numprocs; i++)
            recv_buf[i] = new int [trunk_size];

        // collect the first trunk from all workers
        MPI_Request* requests = new MPI_Request[numprocs];
        for (int i = 1; i < numprocs; i++)
            MPI_Irecv(recv_buf[i], trunk_size, MPI_INT, i, 0, MPI_COMM_WORLD, &requests[i]);

        // define a counter used to record how many trunks have been collected from each worker
        vector<int> counter(numprocs);
        MPI_Status status;

        // assume there are N-1 workers, so the total number of trunks to collect is (N-1)*trunk_num
        for (int i = 0; i < (numprocs - 1) * trunk_num; i++)
        {
            // wait until one trunk is received from any worker
            int active_index;
            MPI_Waitany(numprocs - 1, requests + 1, &active_index, &status);
            int request_index = active_index + 1;
            int procs_index = active_index + 1;

            // check whether all trunks from this worker have been collected
            if (++counter[procs_index] != trunk_num)
            {
                // receive the next trunk from this worker
                MPI_Irecv(recv_buf[procs_index], trunk_size, MPI_INT, procs_index, 0, MPI_COMM_WORLD, &requests[request_index]);
            }
        }

        for (int i = 0; i < numprocs; i++)
            delete [] recv_buf[i];
        delete [] recv_buf;
        delete [] requests;
        cout << rank << " done" << endl;
    }
    else
    {
        // each worker first fills one trunk and sends it to the master;
        // for efficiency, the computation of a trunk and the communication to the master are overlapped.
        // two buffers are allocated to implement the overlapping
        int* send_buf[2];
        send_buf[0] = new int [trunk_size];  // Buffer A
        send_buf[1] = new int [trunk_size];  // Buffer B
        MPI_Request requests[2];

        // fill the first trunk
        for (int i = 0; i < trunk_size; i++)
            send_buf[0][i] = 0;
        // send this trunk
        MPI_Isend(send_buf[0], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[0]);

        if (trunk_num > 1)
        {
            // fill the second trunk
            for (int i = 0; i < trunk_size; i++)
                send_buf[1][i] = i;
            // send this trunk
            MPI_Isend(send_buf[1], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[1]);
        }

        // for the remaining trunks, keep cycling until all trunks are sent
        for (int i = 2; i < trunk_num; i += 2)
        {
            // wait until the trunk data in buffer A has been sent
            MPI_Wait(&requests[0], MPI_STATUS_IGNORE);
            // fill buffer A with the next trunk data
            for (int j = 0; j < trunk_size; j++)
                send_buf[0][j] = j * i;
            // send buffer A
            MPI_Isend(send_buf[0], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[0]);

            // if more trunks remain, fill buffer B and send it
            if (i + 1 < trunk_num)
            {
                MPI_Wait(&requests[1], MPI_STATUS_IGNORE);
                for (int j = 0; j < trunk_size; j++)
                    send_buf[1][j] = j * (i + 1);
                MPI_Isend(send_buf[1], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[1]);
            }
        }

        // wait until the last two trunks have been sent
        if (trunk_num == 1)
        {
            MPI_Wait(&requests[0], MPI_STATUS_IGNORE);
        }
        else
        {
            MPI_Wait(&requests[0], MPI_STATUS_IGNORE);
            MPI_Wait(&requests[1], MPI_STATUS_IGNORE);
        }

        delete [] send_buf[0];
        delete [] send_buf[1];
        cout << rank << " done" << endl;
    }

    MPI_Finalize();
    return 0;
}
Not much of an answer but this compiles and runs on my version of MPI, with up to 4 processors. The code does seem a bit involved, but I also cannot see any reason why it should not work.
I see several obvious problems: some for loops are not terminated, some cout statements aren't terminated, etc. I believe the code wasn't formatted properly when it was posted...