What is the equivalent of socket programming's select() in MPI?

In socket programming, the select() function lets us check multiple sockets simultaneously. Is there a similar feature in the MPI library?
In the first for loop of the following code, I post multiple nonblocking send and receive requests from one node to every other node. In the second for loop, instead of waiting for each node in sequential order, I want to start processing the data of whichever node delivers its data first. Is there any way to do that?
for(id=0; id<numtasks; id++){
    if(id == taskid) continue;
    if(sendCount[id] != 0) MPI_Isend(sendBuffer[id], N*sendCount[id], MPI_DOUBLE, id, tag, MPI_COMM_WORLD, &reqs[id]);
    if(recvCount[id] != 0) MPI_Irecv(recvBuffer[id], N*recvCount[id], MPI_DOUBLE, id, tag, MPI_COMM_WORLD, &reqs[id]);
}
for(id=0; id<numtasks; id++){
    if(id == taskid) continue;
    if(recvCount[id] != 0){
        MPI_Wait(&reqs[id], &status);
        for(i=0; i<recvCount[id]; i++)
            splitData(N, recvBuffer[id] + N*i, U[toRecv[id][i]]);
    }
}
Following the answers given, I have tried to modify my code, but I am still getting a segmentation fault at run time. Please help me find the error.
for(id=0; id<numtasks; id++){
    if(id == taskid) continue;
    if(sendCount[id] != 0) MPI_Isend(sendBuffer[id], N*sendCount[id], MPI_DOUBLE, id, tag, MPI_COMM_WORLD, &reqs[id]);
    if(recvCount[id] != 0) MPI_Irecv(recvBuffer[id], N*recvCount[id], MPI_DOUBLE, id, tag, MPI_COMM_WORLD, &reqs[id]);
}
reqs[taskid] = reqs[numtasks-1];
for(i=0; i<numtasks-1; i++){
    MPI_Waitany(numtasks-1, reqs, &id, &status);
    if(id == taskid) id = numtasks-1;
    for(i=0; i<recvCount[id]; i++)
        splitData(N, recvBuffer[id] + N*i, U[toRecv[id][i]]);
}

The closest equivalent would be MPI_Waitsome: you provide a list of requests and it returns as soon as at least one request has completed. However, there is no timeout as in select. There are also MPI_Waitany and MPI_Waitall, as well as MPI_Testany, MPI_Testall and MPI_Testsome.
The any and some variants mainly differ in the way the interface informs you about one or multiple completed requests.
Edit: You need to use a separate request for each operation, specifically one for the send and one for the receive.
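To make the advice about separate requests more concrete, here is a minimal sketch of the bookkeeping only, written without MPI calls so it compiles anywhere. The helper name build_recv_sources is hypothetical. It assumes the receive requests are kept in their own compact array, apart from the send requests, with one entry per peer whose recvCount is nonzero. The index that MPI_Waitany reports can then be translated back to a rank through the returned vector, rather than by swapping entries inside reqs.

```cpp
#include <vector>

// Hypothetical helper (not part of MPI): returns the ranks for which a
// receive would be posted, in the order their requests are stored in a
// compact receive-request array. Entry k is the source rank of request k.
std::vector<int> build_recv_sources(int taskid, const std::vector<int>& recvCount) {
    std::vector<int> sources;
    for (int id = 0; id < static_cast<int>(recvCount.size()); ++id) {
        if (id == taskid) continue;        // never receive from ourselves
        if (recvCount[id] == 0) continue;  // no receive posted for this rank
        sources.push_back(id);
    }
    return sources;
}
```

With such a vector, the wait loop would run sources.size() times, call MPI_Waitany on the receive-request array only, and use sources[index] as the id passed to splitData. The send requests would live in a second array and could be completed afterwards with MPI_Waitall.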

Related

MPI Isend and Irecv problems

I'm having trouble with my MPI_Isend and MPI_Irecv blocks of code. I need to send a number Cin to the next process up the line, then the current process can go about its business.
The receiving process needs to receive before it can go further in its calculations, but when I don't have MPI_Wait it never gets the data, and when I do it just hangs forever. What am I doing wrong?
Note: I only set Cin to 3 in order to see when the message doesn't go through. Currently it just hangs.
void ComputeS5C()
{
    MPI_Request send_request, recv_request;
    MPI_Status status;
    int Cin[1] = {3};
    if(my_rank == 0){Cin[0] = 0;}
    else {
        MPI_Irecv(Cin, 1, MPI_INT, my_rank - 1, 0, MPI_COMM_WORLD, &recv_request);
        MPI_Wait(&recv_request, &status);
        fprintf(stderr, "RANK:%d Message Received from rank%d: Cin=%d\n", my_rank, my_rank-1, Cin[0]);
    }
    int k;
    for(k = 0; k < Size_5; k++)
    {
        int s5clast;
        if(k==0)
        {
            s5clast = Cin[0];
        }
        else
        {
            s5clast = s5c[k-1];
        }
        s5c[k] = s5g[k] | (s5p[k]&s5clast);
    }
    //if not highest rank, pass the carryin upstream
    if(my_rank < world_size - 1){
        MPI_Isend(&s5c[k], 1, MPI_INT, my_rank+1, 1, MPI_COMM_WORLD, &send_request);
        fprintf(stderr, "RANK:%d Message sent to rank%d: Cin=%d\n", my_rank, my_rank+1, s5c[k]);
    }
    MPI_Wait(&send_request, &status);
}
The error in your code has to do with a mismatch of tags. Messages are sent with tag = 1 and received with tag = 0. Sends and receives therefore never match, which explains why all processes are stuck waiting for the sent messages to be consumed. Change the tags so that they match.
A note: when using MPI_Irecv you always need an MPI_Wait to know when it is safe to consume the received data. I think in your example MPI_Recv is more appropriate.
It seems that you communicate one rank after the other sequentially, which incurs quite a large overhead.
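Independently of the tag mismatch, the posted code appears to read s5c[k] after the loop has finished. At that point k equals Size_5, so the index is one past the last carry that was computed (assuming s5c holds Size_5 entries). The sketch below isolates the carry recurrence in plain C++ so that the value to forward is explicit. The function name propagate_carry and the use of std::vector are assumptions made for illustration; the original works on global arrays.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-alone version of the carry loop from the question.
// g and p are the generate/propagate bits (assumed to have equal length),
// cin is the carry received from the previous rank. The carries are written
// to c, and the carry out of the last position is returned. That return
// value is what would be sent on to the next rank.
int propagate_carry(const std::vector<int>& g, const std::vector<int>& p,
                    int cin, std::vector<int>& c) {
    c.assign(g.size(), 0);
    int last = cin;
    for (std::size_t k = 0; k < g.size(); ++k) {
        c[k] = g[k] | (p[k] & last);  // same recurrence as s5c[k] in the question
        last = c[k];
    }
    return last;  // equals c[size - 1] whenever size > 0
}
```

In the MPI code this would correspond to sending &s5c[Size_5 - 1] rather than &s5c[k].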

Make slaves wait for each other in MPI

I have a master-slave model in my MPI program. I want to make slaves wait for each other before going to the next iteration.
if (rank == 0) {
    // master process
} else {
    // slave process
    for (int i = 0; i < 10; i++) {
        // do stuff
        // wait for all slaves to end iteration i
    }
}
Basically, I don't want any process to go into the next iteration before all other slaves have completed their current iteration. How can I do this? With MPI_Barrier?
You can create a communicator comprising all slave processes and pass it to MPI_Barrier().
For creating this communicator, the simplest and safest way is to use MPI_Comm_split() like this:
MPI_Comm slaves;
MPI_Comm_split( MPI_COMM_WORLD, ( rank == 0 ), rank, &slaves );
This will actually globally create 2 communicators: one comprising only the master process and one comprising all processes but the master.
For actual use, you can do it this way:
if (rank == 0) {
    // master process
} else {
    // slave process
    for (int i = 0; i < 10; i++) {
        // do stuff
        // wait for all slaves to end iteration i
        MPI_Barrier( slaves );
    }
}
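To see which ranks end up together, the grouping rule of MPI_Comm_split can be mimicked without MPI: processes that pass the same color land in the same communicator, ordered by key. The function below is a hypothetical simulation for illustration only. It is not an MPI call, and it only mirrors the color expression ( rank == 0 ) used in the answer.

```cpp
#include <map>
#include <vector>

// Simulates the grouping performed by
//   MPI_Comm_split( MPI_COMM_WORLD, ( rank == 0 ), rank, &slaves );
// Returns color -> list of world ranks. Ranks are visited in increasing
// order, which matches using the world rank itself as the key.
std::map<int, std::vector<int>> simulate_split(int world_size) {
    std::map<int, std::vector<int>> groups;
    for (int rank = 0; rank < world_size; ++rank) {
        int color = (rank == 0) ? 1 : 0;  // integer value of the boolean expression
        groups[color].push_back(rank);
    }
    return groups;
}
```

For a world of 4 processes, color 1 holds only rank 0 and color 0 holds ranks 1 to 3, so a barrier on the second group never involves the master.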

MPI Slave processes hang when there is no more work

I have a serial C++ program that I wish to parallelize. I know the basics of MPI: MPI_Send, MPI_Recv, etc. Basically, I have a data generation algorithm that runs significantly faster than the data processing algorithm. Currently they run in series, but I was thinking of running the data generation in the root process, having the data processing done on the slave processes, and sending a message from the root to a slave containing the data to be processed. This way, each slave processes a data set and then waits for its next data set.
The problem is that, once the root process is done generating data, the program hangs because the slaves are waiting for more.
This is an example of the problem:
#include "mpi.h"
#include <cassert>
#include <cstdio>
class Generator {
public:
    Generator(int min, int max) : value(min - 1), max(max) {}
    bool NextValue() {
        ++value;
        return value < max;
    }
    int Value() { return value; }
private:
    int value, max;
    Generator() {}
    Generator(const Generator &other) {}
    Generator &operator=(const Generator &other) { return *this; }
};
long fibonnaci(int n) {
    assert(n > 0);
    if (n == 1 || n == 2) return 1;
    return fibonnaci(n-1) + fibonnaci(n-2);
}
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, num_procs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    if (rank == 0) {
        Generator generator(1, 2 * num_procs);
        int proc = 1;
        while (generator.NextValue()) {
            int value = generator.Value();
            MPI_Send(&value, 1, MPI_INT, proc, 73, MPI_COMM_WORLD);
            printf("** Sent %d to process %d.\n", value, proc);
            proc = proc % (num_procs - 1) + 1;
        }
    } else {
        while (true) {
            int value;
            MPI_Status status;
            MPI_Recv(&value, 1, MPI_INT, 0, 73, MPI_COMM_WORLD, &status);
            printf("** Received %d from process %d.\n", value, status.MPI_SOURCE);
            // fibonnaci() returns long, so the format specifier is %ld
            printf("Process %d computed %ld.\n", rank, fibonnaci(2 * (value + 10)));
        }
    }
    MPI_Finalize();
    return 0;
}
Obviously not everything above is "good practice", but it is sufficient to get the point across.
If I remove the while(true) from the slave processes, then the program exits when each of the slaves has exited. I would like the program to exit only after the root process has done its job AND all of the slaves have processed everything that has been sent.
If I knew how many data sets would be generated, I could have that many processes running and everything would exit nicely, but that isn't the case here.
Any suggestions? Is there anything in the API that will do this? Could this be solved better with a better topology? Would MPI_Isend or MPI_Irecv do this better? I am fairly new to MPI so bear with me.
Thanks
The usual practice is to send all worker processes an empty message with a special tag that signals them to exit the infinite processing loop. Let's say this tag is 42. You would do something like this in the worker loop:
while (true) {
    int value;
    MPI_Status status;
    MPI_Recv(&value, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    if (status.MPI_TAG == 42) {
        printf("Process %d exiting work loop.\n", rank);
        break;
    }
    printf("** Received %d from process %d.\n", value, status.MPI_SOURCE);
    // fibonnaci() returns long, so the format specifier is %ld
    printf("Process %d computed %ld.\n", rank, fibonnaci(2 * (value + 10)));
}
The manager process would do something like this after the generator loop:
for (int i = 1; i < num_procs; i++)
    MPI_Send(&i, 0, MPI_INT, i, 42, MPI_COMM_WORLD);
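The control flow of this termination scheme can be tried out without an MPI installation. The sketch below is a simulation under stated assumptions: the Message struct, the run_worker and next_worker functions, and the std::queue standing in for MPI_Recv are all hypothetical names introduced for illustration. Only the tag values 73 and 42 and the round-robin expression come from the code above.

```cpp
#include <queue>
#include <vector>

// Stand-in for a received message: MPI would report the tag in
// status.MPI_TAG and deliver the payload into the receive buffer.
struct Message {
    int tag;
    int value;
};

const int WORK_TAG = 73;  // tag used for data in the question
const int STOP_TAG = 42;  // tag chosen for termination in the answer

// Hypothetical worker loop: pops messages until the stop tag arrives and
// returns the values it processed. The queue replaces MPI_Recv here.
std::vector<int> run_worker(std::queue<Message> inbox) {
    std::vector<int> processed;
    while (!inbox.empty()) {
        Message m = inbox.front();
        inbox.pop();
        if (m.tag == STOP_TAG) break;  // leave the loop, ignore the payload
        processed.push_back(m.value);
    }
    return processed;
}

// Round-robin over worker ranks 1..num_procs-1, as in the question.
int next_worker(int proc, int num_procs) {
    return proc % (num_procs - 1) + 1;
}
```

Because messages between one sender and one receiver on the same communicator are delivered in order, a stop message sent after the last data message is also seen last by the worker.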
Regarding your next question: using MPI_Isend() in the master process would deserialise the execution and increase the performance. The truth, however, is that you are sending very small messages, and those are typically buffered internally (WARNING: implementation dependent!), so your MPI_Send() effectively does not block and you already have non-serial execution. MPI_Isend() returns an MPI_Request handle that you need to take care of later. You could either wait for it to finish with MPI_Wait() or MPI_Waitall(), or you could just call MPI_Request_free() on it and it will be freed automatically when the operation is over. This is usually done when you'd like to send many messages asynchronously and do not care when the sends complete, but it is bad practice nevertheless, since a large number of outstanding requests can consume lots of precious memory. As for the worker processes: they need the data in order to proceed with the computation, so using MPI_Irecv() is not necessary.
Welcome to the wonderful world of MPI programming!

What might cause an infinite loop error

I am working on network programming and I have this code:
void WorkHandler::workLoop(){
    .
    .
    .
    while(1){
        if(remainLength >= MAX_LENGTH)
            currentSentLength = send(client->getFd(), sBuffer, MAX_LENGTH, MSG_NOSIGNAL);
        else
            currentSentLength = send(client->getFd(), sBuffer, remainLength,MSG_NOSIGNAL);
        if(currentSentLength == -1){
            log("WorkHandler::workLoop, connection has been lost \n");
            break;
        }
        sBuffer += currentSentLength;
        remainLength -= currentSentLength;
        if(remainLength == 0)
            break;
    }
}
Also, I am creating a child thread like this
bool WorkHandler::initThreads(){
    for(int i=0; i < m_maxThreads; i++){
        pthread_t *thread(new pthread_t);
        m_workThreadList.push_back(thread);
        if(pthread_create(thread, NULL, runWorkThread, reinterpret_cast<void *>(this))!=0){
            log("WorkHandler::initThreads, pthread_create error \n");
            return false;
        }
        pthread_detach(*thread);
    }
    return true;
}
void* WorkHandler::runWorkThread(void *delegate){
    printf("WorkHandler::runWorkThread, called\n");
    WorkHandler *ptr = reinterpret_cast<WorkHandler*>(delegate);
    ptr->workLoop();
    return NULL;
}
I am running this code under gdb and it doesn't crash, but it gets stuck at the second send call in the if/else statement. I put log statements on every single line; it prints the log right above the second send call and then stops.
currentSentLength = send(client->getFd(), sBuffer, remainLength, MSG_NOSIGNAL);
What might cause this problem and how do I fix this issue?
Thanks in advance..
With blocking I/O, send will block if the kernel buffer is full, and it stays blocked until the client has read the data. Do you send large chunks? If so, check your client.
If you don't trust clients (they can abuse this to do denial of service attacks) there are a couple of ways to do this properly: poll (with timeout) on the sockets for writeability, send with timeout, use nonblocking I/O, ...
I guess you're calling send() with a negative size...
Your test to exit the while should be
remainLength <= 0
and not
remainLength == 0
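A loop that cannot run with a negative remaining length can be sketched as follows. This is not the asker's code: send_all is a hypothetical helper, and the send_fn parameter is an assumption introduced so that the loop can be exercised without a real socket. In the real program it would wrap send(client->getFd(), buf, n, MSG_NOSIGNAL).

```cpp
#include <cstddef>

// Hypothetical helper: sends len bytes in pieces of at most max_chunk.
// send_fn(buf, n) stands in for send(fd, buf, n, MSG_NOSIGNAL) and must
// return the number of bytes accepted (at most n), or -1 on error.
// Returns true when everything was sent, false otherwise.
template <typename SendFn>
bool send_all(SendFn send_fn, const char* buf, std::size_t len, std::size_t max_chunk) {
    std::size_t remaining = len;  // unsigned, so it can never become negative
    while (remaining > 0) {
        std::size_t want = remaining < max_chunk ? remaining : max_chunk;
        long sent = send_fn(buf, want);
        if (sent <= 0) return false;  // error, or no progress was made
        buf += sent;
        remaining -= static_cast<std::size_t>(sent);
    }
    return true;
}
```

The loop condition remaining > 0 plays the role of the suggested remainLength <= 0 exit test, and a length of zero simply skips the loop.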

Can anybody help me to identify the runtime MPI error in this code sample?

This code sample is used to learn MPI programming. The MPI package I use is MPICH2 1.3.1. The code below is my first step in learning MPI_Isend(), MPI_Irecv() and MPI_Wait(). The code has a master and several workers. The master receives data from the workers, while the workers send data to the master. As usual, the data size is very large, so the workers split the data into chunks (called trunks in the code) and send them sequentially. I use a trick to overlap computation and communication when sending the chunks. The method is very simple: keep two buffers that hold two chunks for each sending cycle.
#include <mpi.h>
#include <iostream>
#include <vector>
using namespace std;

int test_mpi_wait_2(int argc, char* argv[])
{
    int rank;
    int numprocs;
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    int trunk_num = 6;// assume there are six trunks
    int trunk_size = 10000;// assume each trunk has 10,000 data points
    if(rank == 0)
    {
        //allocate receiving buffer for all workers
        int** recv_buf = new int* [numprocs];
        for(int i=0;i<numprocs;i++)
            recv_buf[i] = new int [trunk_size];
        //collecting first trunk from all workers
        MPI_Request* requests = new MPI_Request[numprocs];
        for(int i=1;i<numprocs;i++)
            MPI_Irecv(recv_buf[i], trunk_size, MPI_INT, i, 0, MPI_COMM_WORLD, &requests[i]);
        //counter used to record how many trunks have been collected from each worker
        vector<int> counter(numprocs);
        MPI_Status status;
        //assume there are N-1 workers, then the total number of trunks to collect is (N-1)*trunk_num
        for(int i=0;i<(numprocs-1)*trunk_num;i++)
        {
            //wait until one trunk has been received from any worker
            int active_index;
            MPI_Waitany(numprocs-1, requests+1, &active_index, &status);
            int request_index = active_index + 1;
            int procs_index = active_index + 1;
            //check whether all trunks from this worker have been collected
            if(++counter[procs_index] != trunk_num)
            {
                //receive next trunk from this worker
                MPI_Irecv(recv_buf[procs_index], trunk_size, MPI_INT, procs_index, 0, MPI_COMM_WORLD, &requests[request_index]);
            }
        }
        for(int i=0;i<numprocs;i++)
            delete [] recv_buf[i];
        delete [] recv_buf;
        delete [] requests;
        cout<<rank<<" done"<<endl;
    }
    else
    {
        //each worker first fills one trunk and sends it to the master
        //for efficiency, the computation of a trunk and the communication to the master are overlapped.
        //two buffers are allocated to implement the overlapped computation
        int* send_buf[2];
        send_buf[0] = new int [trunk_size];//Buffer A
        send_buf[1] = new int [trunk_size];//Buffer B
        MPI_Request requests[2];
        //fill first trunk
        for(int i=0;i<trunk_size;i++)
            send_buf[0][i] = 0;
        //send this trunk
        MPI_Isend(send_buf[0], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[0]);
        if(trunk_num > 1)
        {
            //fill second trunk
            for(int i=0;i<trunk_size;i++)
                send_buf[1][i] = i;
            //send this trunk
            MPI_Isend(send_buf[1], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[1]);
        }
        //for the remaining trunks, keep cycling until all trunks are sent
        for(int i=2;i<trunk_num;i+=2)
        {
            //wait till trunk data in buffer A is sent
            MPI_Wait(&requests[0], MPI_STATUS_IGNORE);
            //fill buffer A with next trunk data
            for(int j=0;j<trunk_size;j++)
                send_buf[0][j] = j * i;
            //send buffer A
            MPI_Isend(send_buf[0], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[0]);
            //if more trunks remain, fill buffer B and send it
            if(i+ 1 < trunk_num)
            {
                MPI_Wait(&requests[1], MPI_STATUS_IGNORE);
                for(int j=0;j<trunk_size;j++)
                    send_buf[1][j] = j * (i + 1);
                MPI_Isend(send_buf[1], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[1]);
            }
        }
        //wait until last two trunks have been sent
        if(trunk_num == 1)
        {
            MPI_Wait(&requests[0], MPI_STATUS_IGNORE);
        }
        else
        {
            MPI_Wait(&requests[0], MPI_STATUS_IGNORE);
            MPI_Wait(&requests[1], MPI_STATUS_IGNORE);
        }
        delete [] send_buf[0];
        delete [] send_buf[1];
        cout<<rank<<" done"<<endl;
    }
    MPI_Finalize();
    return 0;
}
Not much of an answer but this compiles and runs on my version of MPI, with up to 4 processors. The code does seem a bit involved, but I also cannot see any reason why it should not work.
I see several obvious ones: some for loops are not terminated, some cout statements aren't terminated, etc. I believe the code wasn't formatted properly...