Consecutive MPI non-blocking calls - c++

I have been wondering how the MPI runtime differentiates messages between multiple non-blocking calls (inside the same communicator).
For example, say we have multiple Iallgather operations:
...
auto res1 = MPI_Iallgather(... , MPI_COMM_WORLD, &req[0]);
auto res2 = MPI_Iallgather(... , MPI_COMM_WORLD, &req[1]);
MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
...
The Isend/Irecv routines take an int tag parameter, but the other non-blocking calls have no tag parameter.
When we create an MPI_Request object, does it create a unique tag?

As you observe, there is no tag: non-blocking collectives are matched purely by the order in which they are issued on a communicator. There would therefore be a problem if two processes issued the Iallgathers in different orders, so all processes need to issue their non-blocking collectives in the same order. The request object offers no help here, because the first request corresponds to whatever you issue first, on whatever process, so you can still have mismatches.
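For illustration, here is a minimal self-contained sketch of that ordering rule (the int payloads and names such as all_a and all_b are assumptions, not from the question): neither collective carries a tag, so they are matched purely by the order in which each rank posts them.

#include <mpi.h>
#include <vector>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int a = rank, b = 10 * rank;
    std::vector<int> all_a(nprocs), all_b(nprocs);
    MPI_Request req[2];

    // Every rank posts the 'a' gather first and the 'b' gather second;
    // the library matches them by this order, not by any tag or request.
    MPI_Iallgather(&a, 1, MPI_INT, all_a.data(), 1, MPI_INT, MPI_COMM_WORLD, &req[0]);
    MPI_Iallgather(&b, 1, MPI_INT, all_b.data(), 1, MPI_INT, MPI_COMM_WORLD, &req[1]);
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}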


What does this sentence in the OpenMPI documentation mean?

In the OpenMPI documentation, the following sentence appears in the "When communicator is an Inter-communicator" section:
The send buffer argument of the processes in the first group must be consistent with the receive buffer argument of the root process in the second group.
This section appears only in the documentation of the non-blocking functions; in my case the function is MPI_Igatherv.
I have an inter-communicator connecting two groups. The first group contains only one process, the master (distributing and collecting data). The second group contains one or more worker processes (receiving data, doing work, and sending results back). All the workers run the same code, and the master has its own separate code. The master starts the workers with MPI_Comm_spawn.
However, I am concerned that I am not calling the function correctly.
When the master collects data, I use the following call:
MPI_Igatherv(nullptr, 0, MPI_DOUBLE, recv_buf, sizes, offsets, MPI_DOUBLE, MPI_ROOT, inter_comm, &mpi_request);
The master does not contribute any data, so the send buffer here is a nullptr with zero size.
On the other hand, all workers send data like this:
MPI_Igatherv(send_buf, size, MPI_DOUBLE, nullptr, nullptr, nullptr, MPI_DOUBLE, 0, inter_comm, &mpi_request);
The workers do not receive any data, so the receive buffer is a nullptr with no sizes or offsets.
Is this the correct way?
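For context, here is a sketch of how the two sides described above might fit together; the worker executable name, nworkers, and the buffer and count variables are assumptions, not taken from the actual program.

// Master side (single process): spawn the workers, then gather over the inter-communicator.
MPI_Comm inter_comm;
MPI_Comm_spawn("worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL, 0,
               MPI_COMM_SELF, &inter_comm, MPI_ERRCODES_IGNORE);
MPI_Request mpi_request;
// The root of the gather is in this (local) group, so it passes MPI_ROOT and contributes no data.
MPI_Igatherv(nullptr, 0, MPI_DOUBLE, recv_buf, sizes, offsets, MPI_DOUBLE,
             MPI_ROOT, inter_comm, &mpi_request);
MPI_Wait(&mpi_request, MPI_STATUS_IGNORE);

// Worker side: obtain the inter-communicator to the parent and contribute data.
MPI_Comm parent_comm;
MPI_Comm_get_parent(&parent_comm);
MPI_Request mpi_request;
// The root is rank 0 of the remote (master) group; workers receive nothing.
MPI_Igatherv(send_buf, size, MPI_DOUBLE, nullptr, nullptr, nullptr, MPI_DOUBLE,
             0, parent_comm, &mpi_request);
MPI_Wait(&mpi_request, MPI_STATUS_IGNORE);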

How does one send a custom MPI_Datatype to a different process?

Suppose that I create custom MPI_Datatypes for subarrays of different sizes on each of the MPI processes allocated to a program. Now I wish to send these subarrays to the master process and assemble them into a bigger array block by block. The master process is unaware of the individual datatypes (defined by the local sizes) on the other processes. Naively, therefore, I might attempt to send over these custom datatypes to the master process in the following manner.
MPI_Datatype localarr_type;
MPI_Type_create_subarray( NDIMS, array_size, local_size, box_corner, MPI_ORDER_C, MPI_FLOAT, &localarr_type );
MPI_Type_commit(&localarr_type);
if (rank == master)
{
    for (int id = 1; id < nprocs; ++id)
    {
        MPI_Recv( &localarr_type, 1, MPI_Datatype, id, tag1[id], comm_cart, MPI_STATUS_IGNORE );
        MPI_Recv( big_array, 1, localarr_type, id, tag2[id], comm_cart, MPI_STATUS_IGNORE );
    }
}
else
{
    MPI_Send( &localarr_type, 1, MPI_Datatype, master, tag1[rank], comm_cart );
    MPI_Send( local_box, 1, localarr_type, master, tag2[rank], comm_cart );
}
However, this results in a compilation error: the GNU and Clang compilers report the first message below, and the Intel compiler reports the second.
/* GNU OR CLANG COMPILER */
error: unexpected type name 'MPI_Datatype': expected expression
/* INTEL COMPILER */
error: type name is not allowed
This means that either (1) I am attempting to send a custom MPI_Datatype to a different process in the wrong way, or (2) this is not possible at all. I would like to know which it is, and if it is (1), what the correct way of communicating a custom MPI_Datatype is. Thank you.
Note.
I am aware of other ways of solving the above problem without needing to communicate MPI_Datatypes. For example, one could communicate the local array sizes and manually reconstruct the MPI_Datatype from other processes inside the master process before using it in the subsequent communication of subarrays. This is not what I am looking for.
I wish to communicate the custom MPI_Datatype itself (as attempted in the example above), not data that is an instance of the datatype (which is doable, as also shown in the example code above).
First of all: you cannot send a datatype like that. MPI_Datatype is the name of a type, not a value of type MPI_Datatype. (It's a cute idea, though.) You could send the parameters with which the datatype is constructed and then reconstruct it on the receiving side.
However, you are probably misunderstanding the nature of MPI. In your code, with the same datatype on workers and manager, you are sort of assuming that everyone has data of the same size/shape. That is not compatible with the manager gathering everything together.
If you're gathering data on a manager process (usually not a good idea: are you really sure you need that?), then each contributing process has its data in a small array, say at indices 0..99, so it can send it as an ordinary contiguous buffer. The "manager" has a much larger array and places each contribution in a disjoint location. So at most the manager needs to create subarray types to indicate where the received data goes in the big array.
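A sketch of that suggestion, assuming NDIMS-dimensional float data: each worker sends its subsizes and starts in one small metadata message, followed by its data as a contiguous buffer, and the manager rebuilds a subarray type for each sender. The metadata layout and names such as meta and nlocal are assumptions.

if (rank == master)
{
    for (int id = 1; id < nprocs; ++id)
    {
        // Receive the construction parameters, not the datatype itself.
        int meta[2 * NDIMS]; // first NDIMS entries: local_size, last NDIMS: box_corner
        MPI_Recv(meta, 2 * NDIMS, MPI_INT, id, tag1[id], comm_cart, MPI_STATUS_IGNORE);

        // Rebuild the subarray type describing where id's block goes in big_array.
        MPI_Datatype block_type;
        MPI_Type_create_subarray(NDIMS, array_size, &meta[0], &meta[NDIMS],
                                 MPI_ORDER_C, MPI_FLOAT, &block_type);
        MPI_Type_commit(&block_type);
        MPI_Recv(big_array, 1, block_type, id, tag2[id], comm_cart, MPI_STATUS_IGNORE);
        MPI_Type_free(&block_type);
    }
}
else
{
    // Send the parameters the master needs, then the data as a plain contiguous buffer.
    int meta[2 * NDIMS];
    for (int d = 0; d < NDIMS; ++d) { meta[d] = local_size[d]; meta[NDIMS + d] = box_corner[d]; }
    MPI_Send(meta, 2 * NDIMS, MPI_INT, master, tag1[rank], comm_cart);
    // nlocal = number of elements in the local block (product of local_size).
    MPI_Send(local_box, nlocal, MPI_FLOAT, master, tag2[rank], comm_cart);
}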

MPI - how to send a value to a specific position in an array

I want to send a value to a position in an array of another process.
So:
1st process: MPI_Isend(&val, ..., process, ...)
2nd process: MPI_Recv(&array[i], ..., process, ...)
I know the index i on the first process. I also know that I can't simply use a variable on the receiving side (first send i, then val), because other processes can change it (the 2nd process accepts messages from many others).
First of all, other sends/receives should not (and need not) overwrite i. Keep your messages clearly separated; that's what the tag is for! Also, rank_2 can tell which rank sent the data, so you can keep one i for every rank you await a message from.
Finally, you might want to look at one-sided MPI communication (MPI_Win). With that technique rank_1 can "drop" the value directly into rank_2's array at a position known only to rank_1.
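A minimal sketch of the one-sided idea, assuming fence synchronization, an int array exposed by rank 2, and that rank 1 holds val and the target index i (all names are assumptions):

MPI_Win win;
// rank 2 exposes its array in a window; the other ranks expose nothing.
MPI_Win_create(rank == 2 ? array : nullptr,
               rank == 2 ? array_len * sizeof(int) : 0,
               sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

MPI_Win_fence(0, win);
if (rank == 1)
{
    // Write val directly into array[i] on rank 2; i is known only on rank 1.
    MPI_Put(&val, 1, MPI_INT, /*target_rank=*/2, /*target_disp=*/i, 1, MPI_INT, win);
}
MPI_Win_fence(0, win); // after this fence, rank 2's array[i] holds val
MPI_Win_free(&win);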

MPI Busy wait for response

I had something like
while (j < nOSlaves)
{
    // Iterate through all the slaves.
    for (int i = 1; i < nOSlaves && j < nOSlaves; i++)
    {
        // Create a taskMessage which contains length and distance.
        MPI_Status st;
        MPI_Recv(&buffer, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &st);
        if (buffer > 0)
        {
            // Handle the message....
        }
    }
}
The problem is that I have to wait for each slave until its message arrives. I wanted it to be faster and tried it asynchronously:
MPI_Irecv(&buffer, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &rq);
int flag = 0;
MPI_Test(&rq, &flag, &st);
// If the asynchronous message has been received, advance; else try again later.
if (flag)
{
    // Handle the message....
}
But after each iteration of the for loop I lose the request.
Is there a way to iterate through all the "slaves" and check whether some of them have already answered?
It sounds like you have some blocking somewhere: your sends and receives are not happening at the same time.
MPI_Irecv is the non-blocking receive function, while MPI_Recv is a blocking one. Since you do not include your sending code, it is difficult to tell what is causing the blocking here, but that is the likely cause. I suggest looking at Hristo Iliev's tutorials (he seems to be active here); one of the hardest things in MPI is getting the blocking right. One way to ensure that everything is caught up is to use MPI_Barrier, though I mostly use that for debugging. If you are concerned with speed, passing a single integer at a time is not a great idea unless you are using it for indexing. You can also use MPI_Scatterv if you want to send out uneven chunks. If you are sending the same section of the buffer, which it looks like you are, you might try MPI_Bcast.
I find that it helps to write down some parts of the code to make sure that they are not blocking each other.
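The answer above does not show it, but one common pattern for the "has some slave already answered?" part is to keep one outstanding MPI_Irecv per slave and poll the whole array with MPI_Testany, so no request is lost between iterations (the buffer and count names below are assumptions):

std::vector<int> buffers(nOSlaves);
std::vector<MPI_Request> reqs(nOSlaves, MPI_REQUEST_NULL);

// One outstanding receive per slave, kept alive across iterations.
for (int i = 1; i < nOSlaves; i++)
    MPI_Irecv(&buffers[i], 1, MPI_INT, i, 0, MPI_COMM_WORLD, &reqs[i]);

int handled = 0;
while (handled < nOSlaves - 1)
{
    int index, flag;
    MPI_Testany(nOSlaves, reqs.data(), &index, &flag, MPI_STATUS_IGNORE);
    if (flag && index != MPI_UNDEFINED)
    {
        // buffers[index] was filled by the slave with rank 'index'.
        // Handle the message....
        handled++;
    }
    // Otherwise nothing has arrived yet; do other work and poll again.
}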

Retrieve buffer with multiple overlapped I/O requests

There is something I'd like to know about overlapped I/O under Windows, both with and without I/O completion ports.
I know in advance how many packets I will be receiving after using WSASend().
So I'd like to do this:
for (int i = 0; i < n; i++)
    WSARecv(sock, &buffer_array[i], 1, NULL, 0, &overlapped, completion_routine);
My problem is: how can I know which buffer has been filled when the completion notification arrives? I mean, without guessing from the order of the calls (buffer[0], buffer[1], buffer[2], etc.).
I would find an alternative solution that gives me the buffer pointer at the time of the notification much cleaner, and more easily adaptable as the design of my application evolves.
Thanks.
Right now you are starting n concurrent receive operations. Instead, start them one after the other. Start the next one when the previous one has completed.
When using a completion routine, the hEvent field in the OVERLAPPED block is unused and can be used to pass context info into the completion routine. Typically, this would be a pointer to a buffer class instance or an index to an array of buffer instances. Often, the OVL block would be a struct member of the instance since you need a separate OVL per call.
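A sketch of the struct-member approach just described (the RecvContext name and 4096-byte buffer are assumptions): the WSAOVERLAPPED is the first member, so the pointer handed to the completion routine identifies exactly which buffer was filled.

#include <winsock2.h>

struct RecvContext
{
    WSAOVERLAPPED ovl;   // must be zeroed before each WSARecv
    WSABUF wsabuf;
    char data[4096];
};

void CALLBACK completion_routine(DWORD error, DWORD bytes_transferred,
                                 LPWSAOVERLAPPED lpOverlapped, DWORD flags)
{
    // lpOverlapped points at ctx->ovl, the first member of RecvContext,
    // so it tells us exactly which buffer this completion belongs to.
    RecvContext* ctx = reinterpret_cast<RecvContext*>(lpOverlapped);
    if (error == 0 && bytes_transferred > 0)
    {
        // ctx->data holds bytes_transferred bytes for this request;
        // handle them, then re-post the next WSARecv with the same ctx.
    }
}

// Posting one receive with its own context:
// RecvContext* ctx = new RecvContext{};
// ctx->wsabuf.buf = ctx->data;
// ctx->wsabuf.len = sizeof(ctx->data);
// DWORD recv_flags = 0;
// WSARecv(sock, &ctx->wsabuf, 1, NULL, &recv_flags, &ctx->ovl, completion_routine);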