MPI hello world not working - C++

I wrote a simple hello-world program in Visual C++ 2010 Express with the MPI library, and I can't understand why my code is not working.
MPI_Init( NULL, NULL );
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
int a, b = 5;
MPI_Status st;
MPI_Send( &b, 1, MPI_INT, 0,0, MPI_COMM_WORLD );
MPI_Recv( &a, 1, MPI_INT, 0,0, MPI_COMM_WORLD, &st );
MPI_Send tells me "DEADLOCK: attempting to send a message to the local process without a prior matching receive". If I put the Recv first, the program gets stuck there (no data arrives, and the receive blocks).
What am I doing wrong?
I am using Visual C++ 2010 Express, with MPI from the HPC SDK 2008 (32-bit).

You need something like this:
assert(size >= 2);
if (rank == 0)
    MPI_Send( &b, 1, MPI_INT, 1,0, MPI_COMM_WORLD );
if (rank == 1)
    MPI_Recv( &a, 1, MPI_INT, 0,0, MPI_COMM_WORLD, &st );
In MPI, every process runs the same program, so sometimes you do need to be aware of which participant (rank) you are in the "world." In this case, assuming you have two members (as per my assert), you need to make one of them send and the other receive.
Note also that I changed the "dest" parameter of the send: 0 needs to send to 1, so 1 needs to receive from 0.
You can later do it the other way around if you wish (if each needs to tell the other something), but in that case you may find more efficient ways to do it using "collective operations", where you can exchange data (both send and receive) with all the peers.

In your example code, you're sending to and receiving from rank 0. If you are only running your MPI program with 1 process (which makes no sense, but we'll accept it for the sake of argument), you could make this work by using non-blocking calls instead of the blocking version. It would change your program to look like this:
MPI_Init( NULL, NULL );
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
int a, b = 5;
MPI_Status st[2];
MPI_Request request[2];
MPI_Isend( &b, 1, MPI_INT, 0,0, MPI_COMM_WORLD, &request[0] );
MPI_Irecv( &a, 1, MPI_INT, 0,0, MPI_COMM_WORLD, &request[1] );
MPI_Waitall( 2, request, st );
That would let both the send and the receive complete at the same time. The reason your MPI implementation doesn't like your original code (and it is very nice of it to tell you so) is that the call to MPI_SEND could block until the matching MPI_RECV has been posted, which in this case would never happen, because MPI_RECV only gets called after MPI_SEND returns: a circular dependency.
In MPI, when you add an 'I' before an MPI call, it means "Immediate": the call returns immediately, and the work is only guaranteed to be finished when you call MPI_WAIT (or one of its variants, like MPI_WAITALL in this example). So what we did here was to make the send and receive return immediately, basically just telling MPI that we intend to do a send and a receive with rank 0 at some point in the future; then, on the next line, we tell MPI to go ahead and finish those calls now.
The benefit of using the immediate versions of these calls is that, in theory, MPI can make progress on the send and receive in the background while your application does something else that doesn't depend on that data. Then, when you later call MPI_WAIT*, the data is available and you can do whatever you need with it.

Related

MPI dynamically allocate tasks

I have a C++ MPI program that runs on a Windows HPC cluster (12 nodes, 24 cores per node).
The logic of the program is really simple:
1) There is a pool of tasks.
2) At the start, the program divides the tasks equally among the MPI processes.
3) Each MPI process executes its tasks.
4) After everything is finished, MPI reduce gathers the results to the root process.
There is one problem. Each task can have a drastically different execution time, and there is no way to know it in advance. Distributing the tasks equally leaves many processes waiting idle, which wastes a lot of computing resources and makes the total execution time longer.
I am thinking of one solution that might work.
The process is like this:
1) The task pool is divided into small parcels (say, 10 tasks per parcel).
2) Each MPI process takes one parcel at a time whenever it is idle (it has not received a parcel yet, or it has finished the previous one).
3) Step 2 continues until the task pool is exhausted.
4) MPI reduce gathers all the results to the root process.
As far as I understand, this scheme needs a universal counter across nodes/processes (so that different MPI processes do not execute the same parcel), and changing it needs some lock/sync mechanism. It certainly has its overhead, but with proper tuning I think it can help improve the performance.
I am not very familiar with MPI and have some implementation issues. I can think of two ways to implement this universal counter:
1) Using MPI I/O: write the counter to a file and increase it whenever a parcel is taken (this will certainly need a file lock mechanism).
2) Using MPI one-sided communication/shared memory: put the counter in shared memory and increase it whenever a parcel is taken (this will certainly need a sync mechanism).
Unfortunately, I am not familiar with either technique and want to explore the feasibility, implementation, and possible drawbacks of the two methods above. Sample code would be greatly appreciated.
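Whatever the transport, option 2 boils down to an atomic fetch-and-increment: an idle process reads the counter and bumps it in one indivisible step, so no two processes can claim the same parcel. The sketch below shows only that claiming logic, under stated assumptions: C++ threads stand in for MPI processes, std::atomic<int> stands in for a counter exposed through an MPI window, and claimParcels is a hypothetical name. In MPI-3 one-sided communication, the corresponding call would be MPI_Fetch_and_op with MPI_SUM, issued inside a passive-target epoch (for example MPI_Win_lock/MPI_Win_unlock on the rank that holds the counter).

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sketch: threads stand in for MPI processes and std::atomic<int>
// stands in for a counter held in an MPI window. fetch_add is one indivisible
// read-modify-write, so each parcel index is handed out exactly once.
// Returns, for each parcel, the id of the worker that claimed it.
std::vector<int> claimParcels(int numParcels, int numWorkers) {
    assert(numParcels >= 0 && numWorkers > 0);
    std::atomic<int> nextParcel(0);            // the "universal counter"
    std::vector<int> owner(numParcels, -1);    // owner[p] = worker that ran parcel p
    std::vector<std::thread> workers;
    for (int w = 0; w < numWorkers; ++w) {
        workers.emplace_back([&nextParcel, &owner, numParcels, w] {
            while (true) {
                int p = nextParcel.fetch_add(1);  // claim the next parcel atomically
                if (p >= numParcels) break;       // pool exhausted: stop
                owner[p] = w;                     // "execute" parcel p
            }
        });
    }
    for (std::thread& t : workers) t.join();   // wait for all workers to finish
    return owner;
}
```

Every entry of the returned vector ends up holding a valid worker id, and no parcel is left at -1, because the counter never hands out the same index twice.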
If you have other ways to solve the problem or suggestions, that will also be great. Thanks.
Follow-ups:
Thanks for all the useful suggestions. I implemented a test program following the scheme of using process 0 as the task distributor.
#include <iostream>
#include <mpi.h>
using namespace std;
void doTask(int rank, int i){
    cout<<rank<<" got task "<<i<<endl;
}
int main ()
{
    int numTasks = 5000;
    int parcelSize = 100;
    int numParcels = (numTasks/parcelSize) + (numTasks%parcelSize==0?0:1);
    //cout<<numParcels<<endl;
    MPI_Init(NULL, NULL);
    int rank, nproc;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Status status;
    MPI_Request request;
    int ready = 0;
    int i = 0;
    int maxParcelNow = 0;
    if(rank == 0){
        for(i = 0; i <numParcels; i++){
            MPI_Recv(&ready, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            //cout<<i<<"Yes"<<endl;
            MPI_Send(&i, 1, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
            //cout<<i<<"No"<<endl;
        }
        maxParcelNow = i;
        cout<<maxParcelNow<<" "<<numParcels<<endl;
    }else{
        int counter = 0;
        while(true){
            if(maxParcelNow == numParcels) {
                cout<<"Yes exiting"<<endl;
                break;
            }
            //if(maxParcelNow == numParcels - 1) break;
            ready = 1;
            MPI_Send(&ready, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            //cout<<rank<<"send"<<endl;
            MPI_Recv(&i, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            //cout<<rank<<"recv"<<endl;
            doTask(rank, i);
        }
    }
    MPI_Bcast(&maxParcelNow, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
It does not work and it never stops. Any suggestions on how to make it work? Does this code reflect the idea right or am I missing something? Thanks
[Converting my comments into an answer...]
Given n processes, you can have your first process p0 dispatch tasks for the other n - 1 processes. First, it will do point-to-point communication to the other n - 1 processes so that everyone has work to do, and then it will block on a Recv. When any given process completes, say p3, it will send its result back to p0. At this point, p0 will send another message to p3 with one of two things:
1) Another task
or
2) Some kind of termination signal if there are no tasks remaining. (Using the 'tag' of the message is one easy way.)
Obviously, p0 will loop over that logic until there is no task left, in which case it will call MPI_Finalize too.
Unlike what you thought in your comments, this isn't round-robin. It first gives a job to every process, or worker, and then gives back another job whenever one completes...
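As for why the follow-up code never stops: only rank 0 ever assigns maxParcelNow (the MPI_Bcast comes after the loops), so on the workers it stays 0, the exit test maxParcelNow == numParcels is never true, and each worker eventually blocks waiting for a parcel that rank 0 no longer sends. The dispatch scheme described above can be checked without MPI by simulating it. The sketch below is a hypothetical, sequential model, not MPI code: simulateDispatch and the per-parcel durations are assumptions for illustration, and a priority queue ordered by finish time plays the role of rank 0's blocking receive from whichever worker reports first.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sequential simulation of the master/worker scheme (no MPI calls).
// Rank 0 first hands one parcel to each worker rank 1..nproc-1; whenever a worker
// reports back, it gets the next parcel or, if none are left, the termination
// signal. durations[p] is an assumed run time for parcel p.
// Returns how many parcels each rank completed (rank 0 only dispatches).
std::vector<int> simulateDispatch(const std::vector<long>& durations, int nproc) {
    assert(nproc >= 2);
    typedef std::pair<long, int> Event;                 // (finish time, worker rank)
    std::priority_queue<Event, std::vector<Event>, std::greater<Event> > busy;
    std::vector<int> done(nproc, 0);
    std::size_t next = 0;                               // next parcel to hand out

    // Initial round: one parcel per worker, so everyone has work.
    for (int w = 1; w < nproc && next < durations.size(); ++w, ++next)
        busy.push(Event(durations[next], w));

    // Rank 0 "blocks on a Recv": the earliest-finishing worker reports first.
    while (!busy.empty()) {
        Event e = busy.top();
        busy.pop();
        ++done[e.second];
        if (next < durations.size()) {
            busy.push(Event(e.first + durations[next], e.second));  // another task
            ++next;
        }
        // else: worker e.second would receive the termination signal here
    }
    return done;
}
```

With durations {5, 1, 1, 1, 1, 1} and three ranks, the long first parcel keeps worker 1 busy while worker 2 completes the other five parcels, so the result is {0, 1, 5}; an equal up-front split of three parcels each would instead leave worker 2 idle while worker 1 works through the long one.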

MPI Programming in C - MPI_Send() and MPI_Recv() Address Trouble

I'm currently working on a C program using MPI, and I've run into a roadblock regarding the MPI_Send() and MPI_Recv() functions, that I hope you all can help me out with. My goal is to send (with MPI_Send()), and receive (with MPI_Recv()), the address of "a[0][0]" (Defined Below), and then display the CONTENTS of that address after I've received it from MPI_Recv(), in order to confirm my send and receive is working. I've outlined my problem below:
I have a 2-d array, "a", that works like this:
a[0][0] Contains my target ADDRESS
*a[0][0] Contains my target VALUE
i.e. printf("a[0][0] Value = %3.2f, a[0][0] Address = %p\n", *a[0][0], a[0][0]);
So, I run my program and memory is allocated for a. Debug confirms that a[0][0] contains the address 0x83d6260, and the value stored at address 0x83d6260, is 0.58. In other words, "a[0][0] = 0x83d6260", and "*a[0][0] = 0.58".
So, I pass the address, "a[0][0]", as the first parameter of MPI_Send():
-> MPI_Send(a[0][0], 1, MPI_FLOAT, i, 0, MPI_COMM_WORLD);
// I put 1 as the second parameter because I only want to receive this one address
MPI_Send() executes and returns 0, which is MPI_SUCCESS, which means that it succeeded, and my Debug confirms that "0x83d6260" is the address passed.
However, when I attempt to receive the address by using MPI_Recv(), I get a segmentation fault:
MPI_Recv(a[0][0], 1, MPI_FLOAT, iNumProcs-1, 0, MPI_COMM_WORLD, &status);
The address 0x83d6260 was sent successfully using MPI_Send(), but I can't receive the same address with MPI_Recv(). My question is: why does MPI_Recv() cause a segmentation fault? I simply want to print the value contained in a[0][0] immediately after the MPI_Recv() call, but the program crashes.
MPI_Send(a[0][0], 1, MPI_FLOAT ...) will send sizeof(float) bytes of memory starting at the address a[0][0].
So basically the value sent is *a[0][0] (the float stored at that address), not the address itself.
Therefore, if a[0][0] is 0x83d6260 and *a[0][0] is 0.58f, then MPI_Recv(&buff, 1, MPI_FLOAT, ...) will set buff (a float, which needs to be allocated on the receiving side) to 0.58.
One important thing is that different MPI processes should NEVER share pointers (even if they run on the same node). They do not share a virtual address space, so even if an address is valid on one rank, accessing the same address on another rank will most likely give you a segfault (or silently read unrelated data).
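The point that MPI transfers the data at the pointer, rather than the pointer itself, can be illustrated without MPI by replacing the network with std::memcpy. fakeTransfer below is a hypothetical helper, not part of MPI; it only models the buffer convention (a start address plus an element count) that MPI_Send and MPI_Recv share, not the communication itself.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in (not an MPI function) for what MPI_Send/MPI_Recv do with
// their buffer argument: copy count * sizeof(float) bytes starting AT the pointer.
// The pointer value (the address) itself is never transmitted.
void fakeTransfer(const float* sendBuf, float* recvBuf, int count) {
    assert(count >= 0);
    std::memcpy(recvBuf, sendBuf, count * sizeof(float));
}
```

Calling it with a pointer to a float holding 0.58f and the address of a separate float on the "receiving" side leaves that separate float equal to 0.58f, while the two pointers remain different addresses.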
EDIT
This code works for me:
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
int main(int argc, char* argv[])
{
    int size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    switch(rank)
    {
        case 0:
        {
            float*** a;
            a = malloc(sizeof(float**));
            a[0] = malloc(sizeof(float* ));
            a[0][0] = malloc(sizeof(float ));
            *a[0][0] = 0.58;
            /* a[0][0] points to the float; MPI_Send transmits the value stored there */
            MPI_Send(a[0][0], 1, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
            printf("rank 0 send done\n");
            free(a[0][0]);
            free(a[0] );
            free(a );
            break;
        }
        case 1:
        {
            float buffer;   /* rank 1 receives into its own float, not into a pointer */
            MPI_Recv(&buffer, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 recv done : %f\n", buffer);
            break;
        }
    }
    MPI_Finalize();
    return 0;
}
Results:
mpicc mpi.c && mpirun -n 2 ./a.out
> rank 0 send done
> rank 1 recv done : 0.580000
I think the problem is that you're trying to put the value into the array of pointers (which is probably causing the segfault). Try making a new buffer to receive the value:
MPI_Send(a[0][0], 1, MPI_FLOAT, i, 0, MPI_COMM_WORLD);
....
float buff;  // must be a float to match MPI_FLOAT
MPI_Recv(&buff, 1, MPI_FLOAT, iNumProcs-1, 0, MPI_COMM_WORLD, &status);
MPI_Send/MPI_Recv treat the pointer you pass as the start of the data buffer, so they transfer the value stored there, not the address.
You also haven't given us enough information to tell if your source/destination values are correct.

What should I do for receiving when the number of sent messages is unknown in MPI?

I am programming in MPI. I want to send something to another processor and receive it there, but I don't know how many messages I will send. In fact, the number of messages sent to the other processor depends on a file that I read during the program, so I don't know how many receives I should write on the other side. Which method and which function should I use?
You can still use sends and receives, but you would also add a new kind of message that tells the receiving process that there will be no more messages. Usually this is done by sending with a different tag. So your program would look something like this:
if (sender) {  /* the sender is rank 0 here */
    while (data_to_send == true) {
        MPI_Send(data, count, datatype, receiving_rank, 0, MPI_COMM_WORLD);
    }
    /* tag 1 = "no more data": notify every other rank, but not ourselves */
    for (i = 0; i < nprocs; i++) {
        if (i != rank)
            MPI_Send(NULL, 0, MPI_INT, i, 1, MPI_COMM_WORLD);
    }
} else {
    while (1) {
        MPI_Recv(data, count, datatype, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        if (status.MPI_TAG == 1) break;  /* termination message */
        /* Do processing */
    }
}
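The receive loop above relies on MPI's non-overtaking rule: messages from the same sender on the same communicator that match the same receive arrive in the order they were sent, so with MPI_ANY_TAG the tag-1 message cannot be received before the data sent ahead of it. The sketch below models only that loop, under assumptions: a std::queue stands in for the ordered channel from one sender, and Message, DATA_TAG, DONE_TAG and receiveUntilDone are hypothetical names, not MPI APIs.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Illustrative names for the tags 0 and 1 used in the code above.
const int DATA_TAG = 0;
const int DONE_TAG = 1;

// Hypothetical message record: the tag plus a single int of payload.
struct Message {
    int tag;
    int payload;
};

// Receives until the termination tag arrives; returns the payloads processed.
// The FIFO queue models MPI's in-order delivery from a single sender.
std::vector<int> receiveUntilDone(std::queue<Message>& channel) {
    std::vector<int> processed;
    while (true) {
        assert(!channel.empty());          // a real MPI_Recv would block here instead
        Message m = channel.front();       // like MPI_Recv(..., MPI_ANY_TAG, ...)
        channel.pop();
        if (m.tag == DONE_TAG) break;      // status.MPI_TAG == 1: no more data
        processed.push_back(m.payload);    // "Do processing"
    }
    return processed;
}
```

The receiver never needs to know the message count in advance; it only needs to recognize the termination tag.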
There is a better way if you have non-blocking collectives (from MPI-3). Before you start receiving data, you post a non-blocking barrier (MPI_Ibarrier). Then you start posting non-blocking receives. Instead of waiting only on the receives, you use a waitany (MPI_Waitany) on both the receive request and the barrier request, and when the barrier is done, you know there won't be any more data. On the sender side, you keep sending data until there's no more, using synchronous sends (MPI_Issend) so that a completed send means the message has been matched by a receive; once all of your sends have completed, you enter the non-blocking barrier to finish things off.

Simple MPI_Scatter try

I am just learning OpenMPI and tried a simple MPI_Scatter example:
#include <iostream>
#include <mpi.h>
using namespace std;
int main() {
    int numProcs, rank;
    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int* data;
    int num;
    data = new int[5];
    data[0] = 0;
    data[1] = 1;
    data[2] = 2;
    data[3] = 3;
    data[4] = 4;
    MPI_Scatter(data, 5, MPI_INT, &num, 5, MPI_INT, 0, MPI_COMM_WORLD);
    cout << rank << " received " << num << endl;
    MPI_Finalize();
    return 0;
}
But it didn't work as expected ...
I was expecting something like
0 received 0
1 received 1
2 received 2 ...
But what I got was
32609 received
1761637486 received
1 received
33 received
1601007716 received
What's with the weird ranks? Does it have something to do with my scatter? Also, why are sendcount and recvcount the same? At first I thought that, since I'm scattering 5 elements to 5 processes, each would get 1, so I should be using:
MPI_Scatter(data, 5, MPI_INT, &num, 1, MPI_INT, 0, MPI_COMM_WORLD);
But this gives an error:
[JM:2861] *** An error occurred in MPI_Scatter
[JM:2861] *** on communicator MPI_COMM_WORLD
[JM:2861] *** MPI_ERR_TRUNCATE: message truncated
[JM:2861] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
I am wondering, though: why do I need to differentiate between root and child processes? It seems that in this case the source/root will also get a copy? Also, will the other processes run the scatter too? Probably not, but why? I thought all processes would run this code, since it's not inside the typical if (rank == xxx) { ... } block I see in MPI programs.
UPDATE
I noticed that, for it to run, the send and receive buffers must be the same length... and the data should be declared like:
int data[5][5] = { {0}, {5}, {10}, {3}, {4} };
Notice that the columns are declared with length 5, but I only initialized 1 value per row. What is actually happening here? Is this code correct? Suppose I only want each process to receive 1 value.
sendcount is the number of elements you want to send to each process, not the count of elements in the send buffer. MPI_Scatter will just take sendcount * [number of processes in the communicator] elements from the send buffer of the root process and scatter them to all processes in the communicator.
So to send 1 element to each of the processes in the communicator (assume there are 5 processes), set sendcount and recvcount to be 1.
MPI_Scatter(data, 1, MPI_INT, &num, 1, MPI_INT, 0, MPI_COMM_WORLD);
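The sequential sketch below models which elements each rank ends up with; scatterLayout is a hypothetical helper, not an MPI function, and it assumes the root is rank 0. It also accounts for the garbage in the question: with sendcount 5 and 5 processes, the root reads 25 ints from a 5-int buffer, and each rank writes 5 ints into the single int num. And it accounts for the UPDATE: int data[5][5] holds 25 ints (the entries not explicitly initialized are zero), so if it is scattered with sendcount 5, rank r gets row r, that is data[r][0] followed by four zeros.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of MPI_Scatter's data layout (no MPI calls): the root's send
// buffer must hold at least sendcount * nprocs elements, and rank r receives the
// block [r * sendcount, (r + 1) * sendcount). The root itself receives block 0.
std::vector<std::vector<int> > scatterLayout(const std::vector<int>& sendbuf,
                                             int sendcount, int nprocs) {
    assert(sendcount >= 0 && nprocs > 0);
    assert(static_cast<int>(sendbuf.size()) >= sendcount * nprocs);
    std::vector<std::vector<int> > recv(nprocs);
    for (int r = 0; r < nprocs; ++r)
        recv[r].assign(sendbuf.begin() + r * sendcount,
                       sendbuf.begin() + (r + 1) * sendcount);
    return recv;
}
```

With data = {0, 1, 2, 3, 4}, sendcount 1 and 5 processes, rank r receives exactly the value r, which is the output the question expected.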
There are restrictions on the possible datatype pairs and they are the same as for point-to-point operations. The type map of recvtype should be compatible with the type map of sendtype, i.e. they should have the same list of underlying basic datatypes. Also the receive buffer should be large enough to hold the received message (it might be larger, but not smaller). In most simple cases, the data type on both send and receive sides are the same. So sendcount - recvcount pair and sendtype - recvtype pair usually end up the same. An example where they can differ is when one uses user-defined datatype(s) on either side:
MPI_Datatype vec5int;
MPI_Type_contiguous(5, MPI_INT, &vec5int);
MPI_Type_commit(&vec5int);
MPI_Scatter(data, 5, MPI_INT, local_data, 1, vec5int, 0, MPI_COMM_WORLD);
This works since the sender constructs messages of 5 elements of type MPI_INT while each receiver interprets the message as a single instance of a 5-element integer vector.
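Type matching can be modeled by flattening each side's type into its list of basic types. The toy model below is illustrative only: Signature, repeat and signaturesMatch are hypothetical names, and a vector of strings stands in for a real MPI type map. It checks that 5 copies of MPI_INT on the send side have the same signature as 1 copy of vec5int on the receive side. It requires an exact match, which is what collectives such as MPI_Scatter need; a point-to-point receive may additionally be longer than the message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy type signature: the flat list of basic types a datatype contains.
typedef std::vector<std::string> Signature;

// Concatenates count copies of a signature (like count elements of that type,
// or like building a contiguous type out of count elements).
Signature repeat(const Signature& type, int count) {
    assert(count >= 0);
    Signature flat;
    for (int i = 0; i < count; ++i)
        flat.insert(flat.end(), type.begin(), type.end());
    return flat;
}

// Send and receive sides match when their flattened signatures are identical.
bool signaturesMatch(const Signature& sendtype, int sendcount,
                     const Signature& recvtype, int recvcount) {
    return repeat(sendtype, sendcount) == repeat(recvtype, recvcount);
}
```

In this model, vec5int is simply repeat({"MPI_INT"}, 5), so (MPI_INT, 5) against (vec5int, 1) matches, while (MPI_INT, 5) against (MPI_INT, 1) does not, which mirrors the MPI_ERR_TRUNCATE in the question.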
(Note that in MPI_Recv you specify the maximum number of elements to be received, and the actual amount received might be less, which can be obtained with MPI_Get_count. In contrast, in MPI_Scatter you supply the exact number of elements to be received in recvcount, so a mismatch between the received message length and the promised one is an error, reported in your case as MPI_ERR_TRUNCATE.)
You probably know by now that the weird ranks printed out are caused by stack corruption: num can only hold 1 int, but MPI_Scatter receives 5 ints into it.
I am wondering, though: why do I need to differentiate between root and child processes? It seems that in this case the source/root will also get a copy? Also, will the other processes run the scatter too? Probably not, but why? I thought all processes would run this code, since it's not inside the typical if (rank == xxx) { ... } block I see in MPI programs.
It is necessary to differentiate between the root and the other processes in the communicator (they are not child processes of the root, since they can be on a separate computer) in some operations such as Scatter and Gather, because these are collective (group) communications with a single source/destination. That single source/destination (the odd one out) is called the root. All the processes need to know the source/destination (root process) to set up the send and receive correctly.
The root process, in case of Scatter, will also receive a piece of data (from itself), and in case of Gather, will also include its data in the final result. There is no exception for the root process, unless "in place" operations are used. This also applies to all collective communication functions.
There are also root-less global communication operations like MPI_Allgather, where one does not provide a root rank. Rather all ranks receive the data being gathered.
All processes in the communicator must call the function (try excluding one process in the communicator and you will get a deadlock). You can imagine processes on different computers running the same code blindly. However, since each of them may belong to different communicators and has a different rank, the function will behave differently on each. Each process knows whether it is a member of the communicator, knows its own rank, and can compare it to the rank of the root process (if any), so it can set up the communication or do extra actions accordingly.

MPI_Isend/Recv- Is there a deadlock?

I have a total of 8 messages being passed on 4 nodes using MPI. I noticed that there were two messages whose arrays did not provide meaningful results. I have copied an excerpt of the code below. These are some related questions I had based on the code/results below:
Does the MPI_Isend also require a wait? I am not sure if there is a deadlock. I also tried just passing these two variables from one node to the other, and the array values were still NULL.
Will MPI_Sendrecv improve the efficiency of the code, as suggested in "Non Blocking communication in MPI" and "MPI Wait Issue. Not all information is passed correctly"? If so, how/why? I would also appreciate some pointers on setting that up.
Thanks!
Source Code:
if ((my_rank) == 0)
{
    MPI_Irecv(A, Rows, MPI_DOUBLE, my_rank+1, MPI_ANY_TAG, MPI_COMM_WORLD, &request[6]);
    MPI_Wait(&request[6], &status[6]);
}
if ((my_rank) == 1)
{
    MPI_Isend(AA, Rows, MPI_DOUBLE, my_rank-1, 0, MPI_COMM_WORLD, &request[6]);
}
if ((my_rank) == 2)
{
    MPI_Isend(B, Rows, MPI_DOUBLE, my_rank+1, 0, MPI_COMM_WORLD, &request[7]);
}
if ((my_rank) == 3)
{
    MPI_Irecv(BB, Rows, MPI_DOUBLE, my_rank-1, MPI_ANY_TAG, MPI_COMM_WORLD, &request[7]);
    MPI_Wait(&request[7], &status[7]);
}
Yes. All non-blocking calls (MPI_Isend, MPI_Irecv, etc.) require a matching MPI_Wait (or a variant such as MPI_Waitall, or an MPI_Test that reports completion). The call is not guaranteed to complete until MPI_Wait is called. You should not modify the send buffer (or read the receive buffer) until after MPI_Wait returns.
https://computing.llnl.gov/tutorials/mpi/
To use MPI_Sendrecv, the same task has to both send a message and wait to receive one. That pattern doesn't hold for your code, where each rank either only sends or only receives.