I am trying to write an MPI code to process a large 2D matrix. I divide the matrix into chunks and give those chunks to individual processes. The processes complete their task and send the processed array back to the master. However, after calling MPI_Finalize(), a random process gives me a segfault. I have checked and debugged the addressing (e.g. whether a process is accessing invalid memory) but haven't found any issues there. I have attached my code below:
#include "mpi/mpi.h"
#include <iostream>
using namespace std;

#define n 6
#define iter 1
#define MASTER 0
#define FROM_MASTER 1
#define FROM_SLAVE 2

int main(int argc, char* argv[]) {
    int num_tasks, num_workers, task;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_tasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &task);
    num_workers = num_tasks - 1;
    MPI_Status status;

    double *mat;
    int rows_per_task = (n - 2) / num_workers;
    int index = 0;
    int size;

    if(task == MASTER) {
        // Allocate n*n matrix
        mat = (double *)malloc(sizeof(double) * n * n);
        // Initialize the matrix
        for(int i = 1; i <= num_workers; ++i) {
            // Accommodate for extra rows per task
            size = (i <= (n % num_workers)) ? rows_per_task + 1 : rows_per_task;
            // Send rows to the Slave processes
            MPI_Send(&index, 1, MPI_INT, i, FROM_MASTER, MPI_COMM_WORLD); // Start index
            MPI_Send(&size, 1, MPI_INT, i, FROM_MASTER, MPI_COMM_WORLD);  // Size of the array chunk
            MPI_Send(&mat[index], n * size, MPI_DOUBLE, i, FROM_MASTER, MPI_COMM_WORLD); // Array
            index += n * size;
        }
        for(int i = 1; i <= num_workers; ++i) {
            // Get array size, start index, actual array from all the processes
            MPI_Recv(&size, 1, MPI_INT, i, FROM_SLAVE, MPI_COMM_WORLD, &status);
            MPI_Recv(&index, 1, MPI_INT, i, FROM_SLAVE, MPI_COMM_WORLD, &status);
            MPI_Recv(&mat[index], n * size, MPI_DOUBLE, i, FROM_SLAVE, MPI_COMM_WORLD, &status);
        }
        printf("MASTER DONE!\n");
    }
    if(task > 0) {
        // Get index, size, rows
        MPI_Recv(&index, 1, MPI_INT, MASTER, FROM_MASTER, MPI_COMM_WORLD, &status);
        MPI_Recv(&size, 1, MPI_INT, MASTER, FROM_MASTER, MPI_COMM_WORLD, &status);
        MPI_Recv(&mat[index], n * size, MPI_DOUBLE, MASTER, FROM_MASTER, MPI_COMM_WORLD, &status);
        // Repeat
        for(int it = 0; it < iter; ++it) {
            for(int i = 0; i < size; ++i) {
                for(int j = 0; j < n; ++j) {
                    int idx = index + n * i + j; // 2D -> 1D index transformation
                    // Do something with the array element mat[idx]
                }
            }
        }
        MPI_Send(&size, 1, MPI_INT, MASTER, FROM_SLAVE, MPI_COMM_WORLD);
        MPI_Send(&index, 1, MPI_INT, MASTER, FROM_SLAVE, MPI_COMM_WORLD);
        MPI_Send(&mat[index], n * size, MPI_DOUBLE, MASTER, FROM_SLAVE, MPI_COMM_WORLD);
        printf("SLAVE %d DONE\n", task);
    }
    printf("TASK %d calling finalize\n", task);
    MPI_Finalize();
    printf("TASK %d called finalize\n", task);
    return 0;
}
In this example, n is set to 6. If my number of processes is 3 (6 is divisible by it), the rows are shared equally and the program works. If I change n to 7, I start getting an error:
SLAVE 2 DONE
TASK 2 calling finalize
SLAVE 1 DONE
TASK 1 calling finalize
SLAVE 3 DONE
TASK 3 calling finalize
MASTER DONE!
TASK 0 calling finalize
[<node_name>:70041] *** Process received signal ***
[<node_name>:70041] Signal: Segmentation fault (11)
[<node_name>:70041] Signal code: (128)
[<node_name>:70041] Failing at address: (nil)
[<node_name>:70041] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7f24107b6090]
[<node_name>:70041] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x20)[0x7f241080d6f0]
[<node_name>:70041] [ 2] /lib/x86_64-linux-gnu/libopen-pal.so.40(+0x409e2)[0x7f24106269e2]
[<node_name>:70041] [ 3] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_datatype_finalize+0x79)[0x7f2410bc2eb9]
[<node_name>:70041] [ 4] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_finalize+0x773)[0x7f2410bb12e3]
[<node_name>:70041] [ 5] ./a.out(+0xd2cd)[0x5642eb9e52cd]
[<node_name>:70041] [ 6] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f2410797083]
[<node_name>:70041] [ 7] ./a.out(+0xc5ce)[0x5642eb9e45ce]
[<node_name>:70041] *** End of error message ***
TASK 1 called finalize
TASK 0 called finalize
TASK 3 called finalize
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node <node_name> exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
From the output, I see that the processes finish their work and every task (including the master) calls finalize. Then a segfault happens for task 2 while the other three tasks finish. Although this only happens when I change the value of n such that n % number_of_slave_tasks != 0, I am positive nothing is wrong with that particular logic and that I am accessing valid memory locations.
Help me out with this please!
Related
Why these lines of code:
if(my_rank != 0) {
    sprintf(msg, "Hello from %d of %d...", my_rank, comm_sz);
    if(my_rank == 2) {
        sleep(2);
        sprintf(msg, "Hello from %d of %d, I have slept 2 seconds...", my_rank, comm_sz);
    }
    MPI_Send(msg, strlen(msg), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
}
else {
    printf("Hello from the chosen Master %d\n", my_rank);
    for(i = 1; i < comm_sz; i++) {
        MPI_Recv(msg, MAX_STRING, MPI_CHAR, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%s\n", msg);
    }
}
give this result?
Hello from the chosen Master 0
Hello from 1 of 5...
Hello from 2 of 5, I have slept 2 seconds...
Hello from 3 of 5... have slept 2 seconds...
Hello from 4 of 5... have slept 2 seconds...
Doesn't each process have its own copy of 'msg'?
strlen() does not include the null terminator, so it is never sent to the master. The shorter message received from rank 3 therefore does not overwrite the tail of the longer string already sitting in the master's buffer, which is why that tail is still printed. Use strlen(msg) + 1 as the send count.
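For illustration, the corrected send might look like this (reusing the msg variable from the question; only the count changes):
// Send the terminating '\0' as well, so the receiver's buffer is
// properly cut off no matter what it contained before.
MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);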
I'm having trouble with my MPI_Isend and MPI_Irecv blocks of code. I need to send a number Cin to the next process up the line, then the current process can go about its business.
The receiving process needs to receive before it can go further in its calculations, but when I don't have MPI_Wait it never gets the data, and when I do it just hangs forever. What am I doing wrong?
Note: I only set Cin to 3 in order to see when the message doesn't go through. Currently it just hangs.
void ComputeS5C()
{
    MPI_Request send_request, recv_request;
    MPI_Status status;
    int Cin[1] = {3};
    if(my_rank == 0) { Cin[0] = 0; }
    else {
        MPI_Irecv(Cin, 1, MPI_INT, my_rank - 1, 0, MPI_COMM_WORLD, &recv_request);
        MPI_Wait(&recv_request, &status);
        fprintf(stderr, "RANK:%d Message Received from rank%d: Cin=%d\n", my_rank, my_rank-1, Cin[0]);
    }
    int k;
    for(k = 0; k < Size_5; k++)
    {
        int s5clast;
        if(k == 0)
        {
            s5clast = Cin[0];
        }
        else
        {
            s5clast = s5c[k-1];
        }
        s5c[k] = s5g[k] | (s5p[k] & s5clast);
    }
    // if not highest rank, pass the carry-in upstream
    if(my_rank < world_size - 1) {
        MPI_Isend(&s5c[k], 1, MPI_INT, my_rank+1, 1, MPI_COMM_WORLD, &send_request);
        fprintf(stderr, "RANK:%d Message sent to rank%d: Cin=%d\n", my_rank, my_rank+1, s5c[k]);
    }
    MPI_Wait(&send_request, &status);
}
The error in your code is a mismatch of tags: messages are sent with tag 1 but received with tag 0. Since the sends and receives never match, every process is stuck waiting for its sent message to be consumed. Change the tags so that they match.
A note: when using MPI_Irecv you always need a matching MPI_Wait (or MPI_Test) to know when it is safe to consume the received data. Since you wait immediately after posting the receive anyway, a plain MPI_Recv is more appropriate in your example.
It also looks like you communicate one rank after the other sequentially, which adds quite a lot of overhead.
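For illustration, a minimal sketch of the fix with matching tags and a blocking receive (variable names are taken from the question; note that after the loop k equals Size_5, so the last computed carry is s5c[Size_5 - 1]):
// Receive the carry-in from the previous rank with tag 0
MPI_Recv(Cin, 1, MPI_INT, my_rank - 1, 0, MPI_COMM_WORLD, &status);
// ... compute s5c[0 .. Size_5 - 1] as in the question ...
// Forward the carry-out to the next rank using the SAME tag 0
if(my_rank < world_size - 1)
    MPI_Send(&s5c[Size_5 - 1], 1, MPI_INT, my_rank + 1, 0, MPI_COMM_WORLD);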
I'm finishing off a simple MPI program and I'm struggling on the last part of the project.
I send two ints containing a start point and an end point to the slave node. Using these, I need to create and populate an array, then send it back to the master node. Slave code below:
printf("Client waiting for start point and endpoint array\n"); fflush(stdout);
int startEnd[2];
MPI_Recv(startEnd, 2, MPI_INT, 0, 100, MPI_COMM_WORLD, &status);
int end = startEnd[1];
int start = startEnd[0];
printf("Received Start End of %d \t %d\n", startEnd[0], startEnd[1]); fflush(stdout);

unsigned char TargetHash[MAX_HASH_LEN];
MPI_Recv(TargetHash, MAX_HASH_LEN, MPI_CHAR, 0, 100, MPI_COMM_WORLD, &status);

int sizeToCompute = (end - start);
uint64* pStartPosIndexE = new uint64[sizeToCompute];

int iterator = 0;
for (int nPos = end; nPos >= start; nPos--)
{
    cwc.SetHash(TargetHash);
    cwc.HashToIndex(nPos);
    int i;
    for (i = nPos + 1; i <= cwc.GetRainbowChainLength() - 2; i++)
    {
        cwc.IndexToPlain();
        cwc.PlainToHash();
        cwc.HashToIndex(i);
    }
    pStartPosIndexE[iterator] = cwc.GetIndex();
}
Is this the correct way to create an array of dynamic length, and how would I send this array back to the master node?
Sending dynamically allocated arrays is no different from sending static ones. When the array size varies, the receiving code gets a bit more involved, but not by much:
// ---------- Sender code ----------
MPI_Send(pStartPosIndexE, sizeToCompute, MPI_UINT64_T, 99, ...);
// --------- Receiver code ---------
// Wait for a message with tag 99
MPI_Status status;
MPI_Probe(MPI_ANY_SOURCE, 99, MPI_COMM_WORLD, &status);
// Get the number of elements in the message
int nElems;
MPI_Get_elements(&status, MPI_UINT64_T, &nElems);
// Allocate buffer of appropriate size
uint64 *result = new uint64[nElems];
// Receive the message
MPI_Recv(result, nElems, MPI_UINT64_T, status.MPI_SOURCE, 99, ...);
Using MPI_Probe with source rank of MPI_ANY_SOURCE is what is usually done in master/worker applications where workers are processed on a first-come-first-served basis.
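For completeness, here is a hedged sketch of how the two halves might pair up with all arguments spelled out (uint64, pStartPosIndexE and sizeToCompute come from the question; MASTER_RANK and TAG_RESULT are hypothetical names, and rank 0 as master / tag 99 are assumptions carried over from the snippet above):
// ---- Worker side (sketch, assumed names) ----
const int MASTER_RANK = 0;   // hypothetical: master is rank 0
const int TAG_RESULT  = 99;  // must match the tag probed for on the master
MPI_Send(pStartPosIndexE, sizeToCompute, MPI_UINT64_T,
         MASTER_RANK, TAG_RESULT, MPI_COMM_WORLD);

// ---- Master side (sketch) ----
MPI_Status status;
MPI_Probe(MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);  // wait for any worker's result
int nElems;
MPI_Get_elements(&status, MPI_UINT64_T, &nElems);                // element count of that message
uint64 *result = new uint64[nElems];                             // allocate exactly what is needed
MPI_Recv(result, nElems, MPI_UINT64_T, status.MPI_SOURCE,
         TAG_RESULT, MPI_COMM_WORLD, MPI_STATUS_IGNORE);         // receive from that worker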
Please correct me if I am misunderstanding how MPI_Send and MPI_Recv work, since I have just started learning MPI.
My current understanding is that the MPI standard guarantees that two messages which are sent one after another from one sender to one receiver will always appear to the receiver in the same order they were sent. This suggests to me that some kind of queuing must be happening either at the receiver, or the sender, or as part of some distributed state.
I am trying to understand the nature of this queue, so I wrote a simple ping-pong program in which each odd-ranked node sends to and receives from the even-ranked node directly below it.
The idea is that if there is a global queue shared across all the nodes in the cluster, then running with a higher number of nodes should substantially increase the latency observed at each node. On the other hand, if the queue is at each receiver, then the latency increase should be relatively small. However, I get very mixed results, so I am not sure how to interpret them.
Can someone provide an interpretation of the following results, with respect to where the queue is resident?
$ mpirun -np 2 simple
Rank = 0, Message Length = 0, end - start = 0.000119
$ mpirun -np 2 simple
Rank = 0, Message Length = 0, end - start = 0.000117
$ mpirun -np 4 simple
Rank = 2, Message Length = 0, end - start = 0.000119
Rank = 0, Message Length = 0, end - start = 0.000253
$ mpirun -np 4 simple
Rank = 2, Message Length = 0, end - start = 0.000129
Rank = 0, Message Length = 0, end - start = 0.000303
$ mpirun -np 6 simple
Rank = 4, Message Length = 0, end - start = 0.000144
Rank = 2, Message Length = 0, end - start = 0.000122
Rank = 0, Message Length = 0, end - start = 0.000415
$ mpirun -np 8 simple
Rank = 4, Message Length = 0, end - start = 0.000119
Rank = 0, Message Length = 0, end - start = 0.000336
Rank = 2, Message Length = 0, end - start = 0.000323
Rank = 6, Message Length = 0, end - start = 0.000287
$ mpirun -np 10 simple
Rank = 2, Message Length = 0, end - start = 0.000127
Rank = 8, Message Length = 0, end - start = 0.000158
Rank = 0, Message Length = 0, end - start = 0.000281
Rank = 4, Message Length = 0, end - start = 0.000286
Rank = 6, Message Length = 0, end - start = 0.000278
This is the code that implements the pingpong.
#include "mpi.h" // MPI_I*
#include <cstdio>
#include <iostream>
#include <stdlib.h>

#define MESSAGE_COUNT 100

int main(int argc, char* argv[]){
    if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
        std::cerr << "MPI Failed to Initialize" << std::endl;
        return 1;
    }
    int rank = 0, size = 0;
    // Get processor's ID within the communicator
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    size_t message_len = 0;
    char* buf = new char[message_len];
    MPI_Status status;

    // Pingpong between even and odd machines
    if (rank & 1) { // Odd ranked machine will just pong
        for (int i = 0; i < MESSAGE_COUNT; i++) {
            MPI_Recv(buf, (int) message_len, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            MPI_Send(buf, (int) message_len, MPI_CHAR, rank - 1, 0, MPI_COMM_WORLD);
        }
    }
    else { // Even ranked machine will ping and time.
        double start = MPI_Wtime();
        for (int i = 0; i < MESSAGE_COUNT; i++) {
            MPI_Send(buf, (int) message_len, MPI_CHAR, rank + 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, (int) message_len, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        }
        double end = MPI_Wtime();
        printf("Rank = %d, Message Length = %zu, end - start = %f\n", rank, message_len, end - start);
    }
    delete[] buf;
    MPI_Finalize();
    return 0;
}
This code sample is used to learn MPI programming. The MPI package I use is MPICH2 1.3.1. The code below is my first step in learning MPI_Isend(), MPI_Irecv() and MPI_Wait(). It has a master and several workers: the master receives data from the workers while the workers send data to the master. As usual the data size is very large, so each worker splits its data into trunks and sends the trunks sequentially. I use a small trick to overlap computation and communication when sending the trunks: each worker simply keeps two buffers, so one trunk can be filled while the other is being sent.
int test_mpi_wait_2(int argc, char* argv[])
{
    int rank;
    int numprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int trunk_num = 6;      // assume there are six trunks
    int trunk_size = 10000; // assume each trunk has 10,000 data points

    if(rank == 0)
    {
        // allocate a receiving buffer for every worker
        int** recv_buf = new int* [numprocs];
        for(int i = 0; i < numprocs; i++)
            recv_buf[i] = new int [trunk_size];

        // collect the first trunk from every worker
        MPI_Request* requests = new MPI_Request[numprocs];
        for(int i = 1; i < numprocs; i++)
            MPI_Irecv(recv_buf[i], trunk_size, MPI_INT, i, 0, MPI_COMM_WORLD, &requests[i]);

        // counter records how many trunks have been collected from each worker
        vector<int> counter(numprocs);
        MPI_Status status;

        // assuming there are N-1 workers, the total number of trunks to collect is (N-1)*trunk_num
        for(int i = 0; i < (numprocs-1)*trunk_num; i++)
        {
            // wait until one trunk has been received from any worker
            int active_index;
            MPI_Waitany(numprocs-1, requests+1, &active_index, &status);
            int request_index = active_index + 1;
            int procs_index = active_index + 1;
            // check whether all trunks from this worker have been collected
            if(++counter[procs_index] != trunk_num)
            {
                // receive the next trunk from this worker
                MPI_Irecv(recv_buf[procs_index], trunk_size, MPI_INT, procs_index, 0, MPI_COMM_WORLD, &requests[request_index]);
            }
        }

        for(int i = 0; i < numprocs; i++)
            delete [] recv_buf[i];
        delete [] recv_buf;
        delete [] requests;
        cout << rank << " done" << endl;
    }
    else
    {
        // each worker first fills one trunk and sends it to the master;
        // for efficiency, the computation of a trunk is overlapped with the
        // communication of the previous one, using two alternating buffers
        int* send_buf[2];
        send_buf[0] = new int [trunk_size]; // Buffer A
        send_buf[1] = new int [trunk_size]; // Buffer B
        MPI_Request requests[2];

        // fill the first trunk
        for(int i = 0; i < trunk_size; i++)
            send_buf[0][i] = 0;
        // send this trunk
        MPI_Isend(send_buf[0], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[0]);

        if(trunk_num > 1)
        {
            // fill the second trunk
            for(int i = 0; i < trunk_size; i++)
                send_buf[1][i] = i;
            // send this trunk
            MPI_Isend(send_buf[1], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[1]);
        }

        // for the remaining trunks, keep cycling until all trunks are sent
        for(int i = 2; i < trunk_num; i += 2)
        {
            // wait until the trunk in buffer A has been sent
            MPI_Wait(&requests[0], MPI_STATUS_IGNORE);
            // fill buffer A with the next trunk of data
            for(int j = 0; j < trunk_size; j++)
                send_buf[0][j] = j * i;
            // send buffer A
            MPI_Isend(send_buf[0], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[0]);
            // if more trunks remain, fill buffer B and send it
            if(i + 1 < trunk_num)
            {
                MPI_Wait(&requests[1], MPI_STATUS_IGNORE);
                for(int j = 0; j < trunk_size; j++)
                    send_buf[1][j] = j * (i + 1);
                MPI_Isend(send_buf[1], trunk_size, MPI_INT, 0, 0, MPI_COMM_WORLD, &requests[1]);
            }
        }

        // wait until the last two trunks have been sent
        if(trunk_num == 1)
        {
            MPI_Wait(&requests[0], MPI_STATUS_IGNORE);
        }
        else
        {
            MPI_Wait(&requests[0], MPI_STATUS_IGNORE);
            MPI_Wait(&requests[1], MPI_STATUS_IGNORE);
        }

        delete [] send_buf[0];
        delete [] send_buf[1];
        cout << rank << " done" << endl;
    }

    MPI_Finalize();
    return 0;
}
Not much of an answer, but this compiles and runs on my version of MPI with up to 4 processors. The code does seem a bit involved, but I also cannot see any reason why it should not work.
I see several obvious ones: some for loops are not terminated, some cout statements aren't terminated, and so on. I believe the code wasn't formatted properly when it was posted...