I am trying to fake shared memory while using the MPI library in C++. I have an array A of size n+1, where n is given by the user, and processor 0 generates the integers for that array. I need to share the array that processor 0 created with all the other processes, so I Bcast it to the others. However, when I have each processor print out its array, I get a signal 11 (Segmentation Fault) from a processor that isn't zero. If I comment that section out, it runs with no problems. I would like to be able to see that my array was sent and stored correctly on all of the processors.
int *A = new int[n+1];
if (my_rank == 0)
{
    srand(1251);
    A[0] = 0;
    for (int i = 1; i <= n; i++) {
        A[i] = rand() % 100;
    }
    MPI_Bcast(&A, n+1, MPI_INT, 0, MPI_COMM_WORLD);
}
else {
    MPI_Bcast(&A, n+1, MPI_INT, 0, MPI_COMM_WORLD);
}
cout << "My rank is " << my_rank << " and this is my array:" << endl;
for (int i = 0; i <= n; i++)
{
    cout << A[i] << " " << endl;
}
cout << endl;
You are incorrectly passing &A as the buffer address to MPI_Bcast. That is the address of the pointer variable; MPI needs the address of the data, i.e. A.
MPI_Bcast(A, n+1, MPI_INT, 0, MPI_COMM_WORLD);
Move that code outside of the if/else block. It is the same call for all ranks.
I have a task to speed up a program using MPI.
Let's assume I have a large 2d array (1000x1000 or bigger) as input. I have a working sequential program that divides the 2d array into chunks (for example 10x10) and calculates a result, which is a double, for each chunk (so we have a function whose argument is a 10x10 2d array and whose result is a double).
My first idea to speed up:
Create a 1d array of size N*N (for example 10x10 = 100) and send it to another process
double* buffer = new double[dataPortionSize];
//copy some data to buffer
MPI_Send(buffer, dataPortionSize, MPI_DOUBLE, currentProcess, 1, MPI_COMM_WORLD);
Receive it in another process, calculate the result, and send the result back
double* buf = new double[dataPortionSize];
MPI_Recv(buf, dataPortionSize, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, status);
double result = function->calc(buf);
MPI_Send(&result, 1, MPI_DOUBLE, 0, 3, MPI_COMM_WORLD);
This program was much slower than the sequential version. It looks like MPI needs a lot of time to pass an array to another process.
My second idea:
Pass the whole 2d input array to all processes
// data is protected field in base class, it is injected during runtime
MPI_Send(&(data[0][0]), dataSize * dataSize, MPI_DOUBLE, currentProcess, 1, MPI_COMM_WORLD);
And receive data like this
double **arrayAlloc( int size ) {
    double **result = new double*[ size ];
    for ( int i = 0; i < size; i++ )
        result[ i ] = new double[ size ];
    return result;
}
double **data = arrayAlloc(dataSize);
MPI_Recv(&data[0][0], dataSize * dataSize, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, status);
Unfortunately, I got a bunch of errors during execution:
Those crashes are pretty random. It happened 2 times that the program ended successfully
My third idea:
Pass memory address to all processes, but I found this:
MPI processes cannot read each others' memory, and virtual addressing makes one process' pointer completely meaningless to another.
Does anyone have an idea how to speed it up? I understand that the key to speed is passing the array/arrays to the processes efficiently, but I don't know how to do this.
You have multiple issues here. I'll try to go through them in some arbitrary order.
As someone else explained, your second attempt fails because MPI expects a single contiguous array, not an array of pointers. So you want to allocate something like matrix = new double[rows * cols] and then access individual rows as &matrix[row * cols] or an individual value as matrix[row * cols + col].
This would be a data structure that you can send, receive, scatter, and gather with MPI. It would also be faster in general.
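A minimal sketch of that layout, written without any MPI calls so it compiles on its own. The FlatMatrix name and its members are made up for illustration; the only point is that data() yields one contiguous block of rows * cols doubles, which is the kind of buffer a send, receive, scatter or gather expects.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MPI): a 2D matrix stored in one
// contiguous block, row after row.
struct FlatMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> values;

    FlatMatrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), values(r * c, 0.0) {}

    // Element (row, col) lives at offset row * cols + col.
    double& at(std::size_t row, std::size_t col) {
        return values[row * cols + col];
    }

    // Pointer to the first element of a row; the row's cols values follow it.
    double* row_ptr(std::size_t row) { return &values[row * cols]; }

    // Pointer to the whole block, rows * cols doubles long.
    double* data() { return values.data(); }
};
```

With such a matrix, the buffer argument of a send would be m.data() and the element count m.rows * m.cols.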
You are correct to assume that MPI takes time to transfer data. Even in the best case it costs as much as a memcpy, and usually significantly more. If your program does too little work per transferred block, it will not be faster.
Your first attempt may have failed because the first process doesn't do anything useful while waiting for the result. You didn't include the receive operation in your code sample. However, if you wrote something like this:
for(int block = 0; block < nblocks; ++block) {
generate_data(buf);
MPI_Send(buf, ...);
MPI_Recv(buf, ...);
}
Then you cannot expect a speedup because the process is not doing anything useful while waiting for the result. You can avoid this with double buffering. Let the first process generate the next data block before waiting in the receive operation for the result. Something like this:
generate_data(0, input); /* 0-th block */
MPI_Send(input, ...);
for(int block = 1; block < nblocks; ++block) {
generate_data(block, input); /* 1st up to nth block */
MPI_Recv(output, ...); /* 0-th up to n-1-th block */
MPI_Send(input, ...);
}
MPI_Recv(output, ...); /* n-th block */
Now calculations in both processes can overlap.
You shouldn't need MPI_Send and MPI_Recv for this to begin with. MPI provides collective operations like MPI_Scatter and MPI_Gather for this pattern. What you should do is generate N blocks for N processes and MPI_Scatter them across all processes. Then let each process compute its result, and MPI_Gather the results back at the root process.
Even better, let every process work independently, if possible. Of course this depends on your data but if you can generate and process data blocks independently from one another, don't do any communication. Just let them all work alone. Something like this:
int rank, worldsize;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &worldsize);
for(int block = rank; block < nblocks; block += worldsize) {
process_data(block);
}
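To see which blocks each rank ends up with under that loop, here is a small sketch with no MPI calls. The function name blocks_for_rank is hypothetical; it only replays the loop above for a given rank and world size.

```cpp
#include <vector>

// Hypothetical helper: the block indices visited by the loop
// for (block = rank; block < nblocks; block += worldsize).
std::vector<int> blocks_for_rank(int rank, int worldsize, int nblocks) {
    std::vector<int> blocks;
    for (int block = rank; block < nblocks; block += worldsize) {
        blocks.push_back(block);
    }
    return blocks;
}
```

With 3 ranks and 8 blocks, rank 0 gets blocks 0, 3, 6, rank 1 gets 1, 4, 7, and rank 2 gets 2, 5. Every block is handled exactly once and no rank has to talk to another.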
I have an MPI program in which worker ranks (rank != 0) make a bunch of MPI_Send calls, and the master rank (rank == 0) receives all these messages. However, I run into a Fatal error in MPI_Recv - MPI_Recv(...) failed, Out of memory.
Here is the code that I am compiling in Visual Studio 2010.
I run the executable like so:
mpiexec -n 3 MPIHelloWorld.exe
int main(int argc, char* argv[]){
int numprocs, rank, namelen, num_threads, thread_id;
char processor_name[MPI_MAX_PROCESSOR_NAME];
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Get_processor_name(processor_name, &namelen);
if(rank == 0){
for(int k=1; k<numprocs; k++){
for(int i=0; i<1000000; i++){
double x;
MPI_Recv(&x, 1, MPI_DOUBLE, k, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
}
}
else{
for(int i=0; i<1000000; i++){
double x = 5;
MPI_Send(&x, 1, MPI_DOUBLE, 0, i, MPI_COMM_WORLD);
}
}
}
If I run with only 2 processes, the program does not crash. So it seems like the problem is when there is an accumulation of the MPI_Send calls from a third rank (aka a second worker node).
If I decrease the number of iterations to 100,000 then I can run with 3 processes without crashing. However, the amount of data being sent with one million iterations is ~ 8 MB (8 bytes for double * 1000000 iterations), so I don't think the "Out of Memory" is referring to any physical memory like RAM.
Any insight is appreciated, thanks!
For small messages like these, MPI_Send typically copies the data into a system buffer and returns before the matching receive has been posted. The size of this buffer and where it is stored are implementation specific (I remember hearing that it can even be in the interconnects). In my case (Linux with MPICH) I don't get a memory error. One way to control this buffer explicitly is to use MPI_Buffer_attach with MPI_Bsend. There may also be a way to change the system buffer size (e.g. the MP_BUFFER_MEM environment variable on IBM systems).
However, this situation of unmatched messages piling up should probably not occur in practice. In your example above, the order of the k and i loops could be swapped to prevent this build-up of messages.
I've got problems with receiving an MPI array. I'm doing something like this:
int *b = new int[5];
for(int i = 0; i < 5; i++) {
b[i] = i;
}
MPI_Send(&b[0], 5, MPI_INT, procesDocelowy, 0, MPI_COMM_WORLD);
this is how I send my array.
Receiving:
int *b = new int[5];
MPI_Recv(&b, 5, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
My problem is that I can't receive arrays which were allocated dynamically. My process hangs just after MPI_Recv and I get:
job aborted:
rank: node: exit code: message
0: Majster: terminated
1: Majster: terminated
2: Majster: 0xc0000005: process exited without calling finalize
3: Majster: terminated
It's quite interesting, because if I declare my array statically, I mean
int b[5]; when receiving and
int b[] = {1,2,3,4,5}; while sending
everything works fine.
I can't declare the arrays statically, I have to allocate them dynamically. Any ideas how to resolve this problem?
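It's because you pass &b as the buffer when you call MPI_Recv(). When b is a pointer to dynamically allocated memory, &b is the address of the pointer variable, not the address of the array, so MPI_Recv writes the incoming data over the pointer and whatever follows it. Pass b (or &b[0]) instead. With a static array, &b and b refer to the same address, which is why that version works.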
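The difference between the static and the dynamic case can be checked without MPI at all. The sketch below only compares addresses; the function names are made up for illustration, and buffer stands for whatever would be passed as the first argument of MPI_Recv.

```cpp
// Returns true when buffer, the address that would be handed to
// MPI_Recv, is the address of the first array element.
bool points_at_first_element(const void* buffer, const int* first_element) {
    return buffer == static_cast<const void*>(first_element);
}

// int b[5]: &b and &b[0] both refer to the start of the array storage.
bool static_array_case() {
    int b[5] = {1, 2, 3, 4, 5};
    return points_at_first_element(&b, &b[0]);
}

// int *b = new int[5]: &b is the address of the pointer variable itself,
// which lives somewhere else than the allocated elements.
bool dynamic_address_of_pointer_case() {
    int* b = new int[5];
    bool result = points_at_first_element(&b, &b[0]);
    delete[] b;
    return result;
}

// Passing b gives the address of the allocated elements.
bool dynamic_pointer_value_case() {
    int* b = new int[5];
    bool result = points_at_first_element(b, &b[0]);
    delete[] b;
    return result;
}
```

Only dynamic_address_of_pointer_case() returns false, which matches the observation in the question: the &b form happens to work for a static array and breaks for a dynamically allocated one.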
I'm currently working on a C program using MPI, and I've run into a roadblock regarding the MPI_Send() and MPI_Recv() functions, that I hope you all can help me out with. My goal is to send (with MPI_Send()), and receive (with MPI_Recv()), the address of "a[0][0]" (Defined Below), and then display the CONTENTS of that address after I've received it from MPI_Recv(), in order to confirm my send and receive is working. I've outlined my problem below:
I have a 2-d array, "a", that works like this:
a[0][0] Contains my target ADDRESS
*a[0][0] Contains my target VALUE
i.e. printf("a[0][0] Value = %3.2f, a[0][0] Address = %p\n", *a[0][0], a[0][0]);
So, I run my program and memory is allocated for a. Debug confirms that a[0][0] contains the address 0x83d6260, and the value stored at address 0x83d6260, is 0.58. In other words, "a[0][0] = 0x83d6260", and "*a[0][0] = 0.58".
So, I pass the address, "a[0][0]", as the first parameter of MPI_Send():
-> MPI_Send(a[0][0], 1, MPI_FLOAT, i, 0, MPI_COMM_WORLD);
// I put 1 as the second parameter because I only want to receive this one address
MPI_Send() executes and returns 0, which is MPI_SUCCESS, which means that it succeeded, and my Debug confirms that "0x83d6260" is the address passed.
However, when I attempt to receive the address by using MPI_Recv(), I get Segmentation fault:
MPI_Recv(a[0][0], 1, MPI_FLOAT, iNumProcs-1, 0, MPI_COMM_WORLD, &status);
The address 0x83d6260 was sent successfully using MPI_Send(), but I can't receive the same address with MPI_Recv(). My question is: why does MPI_Recv() cause a segmentation fault? I simply want to print the value contained in a[0][0] immediately after the MPI_Recv() call, but the program crashes.
MPI_Send(a[0][0], 1, MPI_FLOAT ...) will send sizeof(float) bytes of memory starting at the address a[0][0].
So basically the value sent is *(reinterpret_cast<float*>(a[0][0])).
Therefore, if a[0][0] is 0x83d6260 and *a[0][0] is 0.58f, then MPI_Recv(&buff, 1, MPI_FLOAT...) will set buff (a float that must already exist on the receiving side) to 0.58.
One important thing is that different MPI processes should NEVER share pointers (even if they run on the same node). They do not share a virtual address space, and even if you were able to access the address on one of the ranks, the other ones would likely give you a segfault if you tried to access the same address in their context.
EDIT
This code works for me :
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
int main(int argc, char* argv[])
{
int size, rank;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
switch(rank)
{
case 0:
{
float*** a;
a = malloc(sizeof(float**));
a[0] = malloc(sizeof(float* ));
a[0][0] = malloc(sizeof(float ));
*a[0][0] = 0.58;
MPI_Send(a[0][0], 1, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
printf("rank 0 send done\n");
free(a[0][0]);
free(a[0] );
free(a );
break;
}
case 1:
{
float buffer;
MPI_Recv(&buffer, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("rank 1 recv done : %f\n", buffer);
break;
}
}
MPI_Finalize();
return 0;
}
results are :
mpicc mpi.c && mpirun -n 2 ./a.out
> rank 0 send done
> rank 1 recv done : 0.580000
I think the problem is that you're trying to put the value into the array of pointers (which is probably causing the segfault). Try making a new buffer to receive the value:
MPI_Send(a[0][0], 1, MPI_FLOAT, i, 0, MPI_COMM_WORLD);
....
float buff;
MPI_Recv(&buff, 1, MPI_FLOAT, iNumProcs-1, 0, MPI_COMM_WORLD, &status);
If I remember correctly the MPI_Send/Recv will dereference the pointer giving you the value, not the address.
You also haven't given us enough information to tell if your source/destination values are correct.
The issue I am trying to resolve is the following:
The C++ serial code I have computes across a large 2D matrix. To optimize this process, I wish to split this large 2D matrix and run on 4 nodes (say) using MPI. The only communication that occurs between nodes is the sharing of edge values at the end of each time step. Every node shares the edge array data, A[i][j], with its neighbor.
Based on reading about MPI, I have the following scheme to be implemented.
if (myrank == 0)
{
for (i= 0 to x)
for (y= 0 to y)
{
C++ CODE IMPLEMENTATION
....
MPI_SEND(A[x][0], A[x][1], A[x][2], Destination= 1.....)
MPI_RECEIVE(B[0][0], B[0][1]......Sender = 1.....)
MPI_BARRIER
}
if (myrank == 1)
{
for (i = x+1 to xx)
for (y = 0 to y)
{
C++ CODE IMPLEMENTATION
....
MPI_SEND(B[x][0], B[x][1], B[x][2], Destination= 0.....)
MPI_RECEIVE(A[0][0], A[0][1]......Sender = 1.....)
MPI BARRIER
}
I wanted to know if my approach is correct and would also appreciate any guidance on other MPI functions to look into for the implementation.
Thanks,
Ashwin.
Just to amplify Joel's points a bit:
This goes much easier if you allocate your arrays so that they're contiguous (something C's "multidimensional arrays" don't give you automatically:)
int **alloc_2d_int(int rows, int cols) {
int *data = (int *)malloc(rows*cols*sizeof(int));
int **array= (int **)malloc(rows*sizeof(int*));
for (int i=0; i<rows; i++)
array[i] = &(data[cols*i]);
return array;
}
/*...*/
int **A;
/*...*/
A = alloc_2d_int(N,M);
Then, you can do sends and receives of the entire NxM array with
MPI_Send(&(A[0][0]), N*M, MPI_INT, destination, tag, MPI_COMM_WORLD);
and when you're done, free the memory with
free(A[0]);
free(A);
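As a quick check that this allocation scheme really gives one contiguous block, here is a sketch that can be compiled as C++ without MPI. alloc_2d_int follows the function above; is_contiguous and free_2d_int are hypothetical helpers added for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Same scheme as above: one data block plus an array of row pointers.
int** alloc_2d_int(int rows, int cols) {
    int* data = static_cast<int*>(
        std::malloc(static_cast<std::size_t>(rows) * cols * sizeof(int)));
    int** array = static_cast<int**>(std::malloc(rows * sizeof(int*)));
    for (int i = 0; i < rows; i++)
        array[i] = &data[cols * i];
    return array;
}

// Hypothetical check: true when row i starts exactly i * cols ints
// after row 0, i.e. the rows form one gap-free block.
bool is_contiguous(int** array, int rows, int cols) {
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(array[0]);
    for (int i = 1; i < rows; i++) {
        const std::uintptr_t row = reinterpret_cast<std::uintptr_t>(array[i]);
        if (row != base + static_cast<std::size_t>(i) * cols * sizeof(int))
            return false;
    }
    return true;
}

// Release the data block first, then the row pointers.
void free_2d_int(int** array) {
    std::free(array[0]);
    std::free(array);
}
```

Because the rows are back to back, &(A[0][0]) can be used as a buffer of N*M ints while the A[i][j] syntax still works for the computation.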
Also, MPI_Recv is a blocking receive, and MPI_Send can be a blocking send. One thing that means, as per Joel's point, is that you definitely don't need barriers. Further, it means that if you have a send/receive pattern as above, you can get yourself into a deadlock situation: everyone is sending, no one is receiving. Safer is:
if (myrank == 0) {
MPI_Send(&(A[0][0]), N*M, MPI_INT, 1, tagA, MPI_COMM_WORLD);
MPI_Recv(&(B[0][0]), N*M, MPI_INT, 1, tagB, MPI_COMM_WORLD, &status);
} else if (myrank == 1) {
MPI_Recv(&(A[0][0]), N*M, MPI_INT, 0, tagA, MPI_COMM_WORLD, &status);
MPI_Send(&(B[0][0]), N*M, MPI_INT, 0, tagB, MPI_COMM_WORLD);
}
Another, more general, approach is to use MPI_Sendrecv:
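int *sendptr, *recvptr;
int neigh = MPI_PROC_NULL;
int sendtag, recvtag;
if (myrank == 0) {
    sendptr = &(A[0][0]);
    recvptr = &(B[0][0]);
    sendtag = tagA;   /* rank 0 sends A and receives B */
    recvtag = tagB;
    neigh = 1;
} else {
    sendptr = &(B[0][0]);
    recvptr = &(A[0][0]);
    sendtag = tagB;   /* rank 1 sends B and receives A */
    recvtag = tagA;
    neigh = 0;
}
MPI_Sendrecv(sendptr, N*M, MPI_INT, neigh, sendtag, recvptr, N*M, MPI_INT, neigh, recvtag, MPI_COMM_WORLD, &status);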
or nonblocking sends and/or receives.
First, you don't need that many barriers.
Second, you should really send your data as a single block, as multiple blocking sends/receives will result in poor performance.
This question has already been answered quite thoroughly by Jonathan Dursi; however, as Jonathan Leffler has pointed out in his comment to Jonathan Dursi's answer, C's multi-dimensional arrays are a contiguous block of memory. Therefore, I would like to point out that for a not-too-large 2d array, a 2d array could simply be created on the stack:
int A[N][M];
Since the memory is contiguous, the array can be sent as it is:
MPI_Send(A, N*M, MPI_INT,1, tagA, MPI_COMM_WORLD);
On the receiving side, the array can be received into a 1d array of size N*M (which can then be copied into a 2d array if necessary):
int A_1d[N*M];
MPI_Recv(A_1d, N*M, MPI_INT,0,tagA, MPI_COMM_WORLD,&status);
//copying the array to a 2d-array
int A_2d[N][M];
for (int i = 0; i < N; i++){
for (int j = 0; j < M; j++){
A_2d[i][j] = A_1d[(i*M)+j];
}
}
Copying the array does cause twice the memory to be used, so it would be better to simply use A_1d by accessing its elements through A_1d[(i*M)+j].