MPI end program with Broadcast when some process finds a solution - c++

I am having problems ending my program using MS-MPI.
All return values seem fine, but I have to press Ctrl+C in cmd to end it (it doesn't look like it's still computing, so the exit condition seems fine).
I want to run a program using N processes. When one of them finds a solution, it should set flag to false, send it to all the others, and then on the next iteration they should all stop and the program ends.
The actual program does some more advanced calculations; I'm working with a simplified version here for clarity. I just wanted to make sure that the communication works.
#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // sets c as 0 -> (N-1) depending on the number of processes running
    int c = world_rank;
    bool flag = true;

    while (flag) {
        std::cout << "process: " << world_rank << " value: " << c << std::endl;
        c += world_size;
        // dummy condition just to test stopping
        if (c == 13) {
            flag = false;
        }
        MPI_Barrier(MPI_COMM_WORLD);
        // I have also tried using MPI_Bcast without that if
        if (!flag) MPI_Bcast(&flag, 1, MPI_C_BOOL, world_rank, MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);
    } // end of while

    MPI_Finalize();
    return 0;
}
How I think it works:
Every process starts by defining its own c and flag; on each pass of the while loop it increments its c by a fixed amount (the number of processes). When a process hits the stop condition, it sets flag to false and is supposed to send that to all the remaining processes. What I get when I run it with 4 processes:
process: 0 value: 0
process: 2 value: 2
process: 1 value: 1
process: 3 value: 3
process: 1 value: 5
process: 3 value: 7
process: 0 value: 4
process: 2 value: 6
process: 3 value: 11
process: 1 value: 9
process: 2 value: 10
process: 0 value: 8
process: 3 value: 15
process: 2 value: 14
process: 0 value: 12
(I am fine with those few extra values)
But after that I have to manually terminate it with ctrl + c. When running on 1 process it gets smoothly from 1 to 12 and exits.

MPI_Bcast() is a collective operation, and all the ranks of the communicator have to use the same value for the root argument (in your program, they all use a different value).
A valid approach (though unlikely the optimal one) is to send a termination message to rank 0, update flag accordingly and have all the ranks call MPI_Bcast(..., root=0, ...).
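Here is a minimal sketch of that approach, adapted from the loop in the question (the variable names and the dummy c == 13 stop condition are kept from the original; the Reduce-then-Bcast pairing is one way of realising "everybody uses the same root", not the only one):

    #include <mpi.h>
    #include <iostream>

    int main(int argc, char* argv[])
    {
        MPI_Init(&argc, &argv);

        int world_size, world_rank;
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        int c = world_rank;
        bool local_done = false;   // true once *this* rank hits the stop condition
        bool global_done = false;  // true once *any* rank has hit it

        while (!global_done) {
            std::cout << "process: " << world_rank << " value: " << c << std::endl;
            c += world_size;
            if (c == 13) local_done = true;  // dummy stop condition from the question

            // Combine every rank's flag on rank 0 with a logical OR ...
            bool any_done = false;
            MPI_Reduce(&local_done, &any_done, 1, MPI_C_BOOL, MPI_LOR, 0, MPI_COMM_WORLD);
            if (world_rank == 0) global_done = any_done;
            // ... then broadcast the combined result from the same root on every rank.
            MPI_Bcast(&global_done, 1, MPI_C_BOOL, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

The Reduce/Bcast pair can also be collapsed into a single MPI_Allreduce with MPI_LOR, which avoids funnelling the decision through rank 0.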

Related

MPI stopped working on multiple cores suddenly

This piece of code was working fine before with MPI:
#include <mpi.h>
#include <iostream>
using namespace std;

int id, p;

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    cout << "Processor " << id << " of " << p << endl;
    cout.flush();
    MPI_Barrier(MPI_COMM_WORLD);
    if (id == 0) cout << "Every process has got to this point now!" << endl;
    MPI_Finalize();
}
Giving the output:
Processor 0 of 4
Processor 1 of 4
Processor 2 of 4
Processor 3 of 4
Every process has got to this point now!
when run on 4 cores with the command mpiexec -n 4 ${executable filename}$.
I restarted my laptop (I'm not sure if this is the cause) and ran the same code; now it outputs, on one core:
Processor 0 of 1
Every process has got to this point now!
Processor 0 of 1
Every process has got to this point now!
Processor 0 of 1
Every process has got to this point now!
Processor 0 of 1
Every process has got to this point now!
I'm using Microsoft MPI and the project configuration hasn't changed.
I'm not really sure what to do about this.
I also installed Intel Parallel Studio and integrated it with Visual Studio before restarting,
but I'm still compiling with Visual C++ (same configuration as when it was working fine).
The easy fix was to uninstall Intel Parallel Studio. (Most likely its MPI runtime's mpiexec was being picked up instead of MS-MPI's, so each process was started as an independent single-rank job.)

MPI to generate first 20 numbers

Here is code with which I was trying to generate the first 20 numbers, starting from 0, in an attempt to learn MPI.
My code is given below:
#include <mpi.h>
#include <stdio.h>

int i = 0;

void test(int edge_count){
    while(i < edge_count){
        printf("Edge count %d\n",i);
        i++;
    }
}

int main(int argc, char** argv) {
    int edge_count = 20;
    // int *p = &i;

    // Initialize the MPI environment. The two arguments to MPI Init are not
    // currently used by MPI implementations, but are there in case future
    // implementations might need the arguments.
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    test(edge_count);
    printf("The value of i is %d \n",i);

    // Finalize the MPI environment. No more MPI calls can be made after this
    MPI_Finalize();
}
My output is :
Hello world from processor ENG401651, rank 0 out of 2 processors
Edge count 0
Edge count 1
Edge count 2
Edge count 3
Edge count 4
Edge count 5
Edge count 6
Edge count 7
Edge count 8
Edge count 9
Edge count 10
Edge count 11
Edge count 12
Edge count 13
Edge count 14
Edge count 15
Edge count 16
Edge count 17
Edge count 18
Edge count 19
The value of i is 20
Hello world from processor ENG401651, rank 1 out of 2 processors
Edge count 0
Edge count 1
Edge count 2
Edge count 3
Edge count 4
Edge count 5
Edge count 6
Edge count 7
Edge count 8
Edge count 9
Edge count 10
Edge count 11
Edge count 12
Edge count 13
Edge count 14
Edge count 15
Edge count 16
Edge count 17
Edge count 18
Edge count 19
The value of i is 20
The code that I used to run it is:
mpirun -np 2 execFile
I was expecting that the two processes would communicate and generate each number from 0 to 19 only once, but it seems like each process is generating its own set of numbers independently.
What am I doing wrong? I am new to MPI and could not figure out the reason behind this.
Computers only do what you tell them to. This is true not just of MPI but of any kind of programming.
Where in your code did you explicitly tell the processes to divide the work between them? The thing is, you didn't, and it won't happen automagically.
The following modified version of your code shows how you can use world_size and world_rank to have each process independently calculate which share of the work it should perform.
To better demonstrate the gains of parallelism, I use thread sleeping to simulate the time that would otherwise be taken by work in an actual implementation.
#include <mpi.h>
#include <stdio.h>
#include <chrono>
#include <thread>

void test(int start, int end){
    for(int i=start;i<end;i++){
        printf("Edge count %d\n",i);
        //Simulates complicated, time-consuming work
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
    }
}

int main(int argc, char** argv) {
    int edge_count = 20;
    // int *p = &i;

    // Initialize the MPI environment. The two arguments to MPI Init are not
    // currently used by MPI implementations, but are there in case future
    // implementations might need the arguments.
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    const int interval = edge_count/world_size;
    const int iter_start = world_rank*interval;
    const int iter_end = (world_rank+1)*interval;
    test(iter_start, iter_end);

    // Finalize the MPI environment. No more MPI calls can be made after this
    MPI_Finalize();
}
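One caveat with this split (not addressed in the answer above): the integer division edge_count/world_size drops any remainder, so with, say, 20 items and 3 ranks the last two items would never be processed. A simple way to handle that, as a sketch, is to let the last rank absorb the leftover iterations:

    // Split [0, edge_count) as evenly as possible; the last rank takes the remainder.
    const int interval   = edge_count / world_size;
    const int iter_start = world_rank * interval;
    const int iter_end   = (world_rank == world_size - 1)
                         ? edge_count
                         : (world_rank + 1) * interval;
    test(iter_start, iter_end);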

Bizarre deadlock in MPI_Allgather

After much Googling, I have no idea what's causing this issue. Here it is:
I have a simple call to MPI_Allgather in my code which I have double, triple, and quadruple-checked to be correct (send/receive buffers are properly sized; the send/receive sizes in the call are correct), but for 'large' numbers of processes I get either a deadlock or an MPI_ERR_TRUNCATE. The communicator being used for the Allgather is split from MPI_COMM_WORLD using MPI_Comm_split. For my current testing, rank 0 goes to one communicator, and the remaining ranks go to a second communicator. For 6 total ranks or fewer, the Allgather works just fine. If I use 7 ranks, I get an MPI_ERR_TRUNCATE. 8 ranks, deadlock. I have verified that the communicators were split correctly (MPI_Comm_rank and MPI_Comm_size are correct on all ranks for both Comms).
I have manually verified the size of each send and receive buffer, and the maximal number of receives. My first workaround was to swap the MPI_Allgather for a loop of MPI_Gathers to each process. This worked for that one case, but changing the meshes given to my code (CFD grids being partitioned using METIS) brought the problem back. Now my solution, which I haven't been able to break (yet), is to replace the Allgather with an Allgatherv, which I suppose is more efficient anyway since a different number of pieces of data is being sent from each process.
Here's the (I hope) relevant offending code in context; if I've missed something, the Allgather in question is on line 599 of this file.
// Get the number of mpiFaces on each processor (for later communication)
// 'nProcGrid' is the size of the communicator 'gridComm'
vector<int> nMpiFaces_proc(nProcGrid);

// This MPI_Allgather works just fine, every time
// int nMpiFaces is assigned on preceding lines
MPI_Allgather(&nMpiFaces,1,MPI_INT,nMpiFaces_proc.data(),1,MPI_INT,gridComm);

int maxNodesPerFace = (nDims==2) ? 2 : 4;
int maxNMpiFaces = getMax(nMpiFaces_proc);

// The matrix class is just a fancy wrapper around std::vector that
// allows for (i,j) indexing. The getSize() and getData() methods just
// call the size() and data() methods, respectively, of the underlying
// vector<int> object.
matrix<int> mpiFaceNodes_proc(nProcGrid,maxNMpiFaces*maxNodesPerFace);

// This is the MPI_Allgather which (sometimes) doesn't work.
// vector<int> mpiFaceNodes is assigned in preceding lines
MPI_Allgather(mpiFaceNodes.data(),mpiFaceNodes.size(),MPI_INT,
              mpiFaceNodes_proc.getData(),maxNMpiFaces*maxNodesPerFace,
              MPI_INT,gridComm);
I am currently using OpenMPI 1.6.4, g++ 4.9.2, and an AMD FX-8350 8-core processor with 16GB of RAM, running the latest updates of Elementary OS Freya 0.3 (basically Ubuntu 14.04). However, I have also had this issue on another machine using CentOS, Intel hardware, and MPICH2.
Any ideas? I have heard that it could be possible to change MPI's internal buffer size(s) to fix similar issues, but a quick try to do so (as shown in http://www.caps.ou.edu/pipermail/arpssupport/2002-May/000361.html) had no effect.
For reference, this issue is very similar to the one shown here: https://software.intel.com/en-us/forums/topic/285074, except that in my case, I have only 1 processor with 8 cores, on a single desktop computer.
UPDATE
I've managed to put together a minimalist example of this failure:
#include <iostream>
#include <vector>
#include <stdlib.h>
#include <time.h>
#include "mpi.h"
using namespace std;

int main(int argc, char* argv[])
{
    MPI_Init(&argc,&argv);

    int rank, nproc, newID, newRank, newSize;
    MPI_Comm newComm;
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    MPI_Comm_size(MPI_COMM_WORLD,&nproc);

    newID = rank%2;
    MPI_Comm_split(MPI_COMM_WORLD,newID,rank,&newComm);
    MPI_Comm_rank(newComm,&newRank);
    MPI_Comm_size(newComm,&newSize);

    srand(time(NULL));

    // Get a different 'random' number for each rank on newComm
    //int nSend = rand()%10000;
    //for (int i=0; i<newRank; i++) nSend = rand()%10000;
    /*! -- Found a set of #'s which fail for nproc=8: -- */
    int badSizes[4] = {2695,7045,4256,8745};
    int nSend = badSizes[newRank];

    cout << "Comm " << newID << ", rank " << newRank << ": nSend = " << nSend << endl;

    vector<int> send(nSend);
    for (int i=0; i<nSend; i++)
        send[i] = rand();

    vector<int> nRecv(newSize);
    MPI_Allgather(&nSend,1,MPI_INT,nRecv.data(),1,MPI_INT,newComm);

    int maxNRecv = 0;
    for (int i=0; i<newSize; i++)
        maxNRecv = max(maxNRecv,nRecv[i]);

    vector<int> recv(newSize*maxNRecv);

    MPI_Barrier(MPI_COMM_WORLD);
    cout << "rank " << rank << ": Allgather-ing data for communicator " << newID << endl;
    MPI_Allgather(send.data(),nSend,MPI_INT,recv.data(),maxNRecv,MPI_INT,newComm);
    cout << "rank " << rank << ": Done Allgathering-data for communicator " << newID << endl;

    MPI_Finalize();
    return 0;
}
The above code was compiled and run as:
mpicxx -std=c++11 mpiTest.cpp -o mpitest
mpirun -np 8 ./mpitest
with the following output on both my 16-core CentOS and my 8-core Ubuntu machines:
Comm 0, rank 0: nSend = 2695
Comm 1, rank 0: nSend = 2695
Comm 0, rank 1: nSend = 7045
Comm 1, rank 1: nSend = 7045
Comm 0, rank 2: nSend = 4256
Comm 1, rank 2: nSend = 4256
Comm 0, rank 3: nSend = 8745
Comm 1, rank 3: nSend = 8745
rank 5: Allgather-ing data for communicator 1
rank 6: Allgather-ing data for communicator 0
rank 7: Allgather-ing data for communicator 1
rank 0: Allgather-ing data for communicator 0
rank 1: Allgather-ing data for communicator 1
rank 2: Allgather-ing data for communicator 0
rank 3: Allgather-ing data for communicator 1
rank 4: Allgather-ing data for communicator 0
rank 5: Done Allgathering-data for communicator 1
rank 3: Done Allgathering-data for communicator 1
rank 4: Done Allgathering-data for communicator 0
rank 2: Done Allgathering-data for communicator 0
Note that only 2 of the ranks from each communicator exit the Allgather; this isn't what happens in my actual code (no ranks on the 'broken' communicator exit the Allgather), but the end result is the same - the code hangs until I kill it.
I'm guessing this has something to do with the differing number of sends on each process, but as far as I can tell from the MPI documentation and tutorials I've seen, this is supposed to be allowed, correct? Of course, the MPI_Allgatherv is a little more applicable, but for reasons of simplicity I have been using Allgather instead.
You must use MPI_Allgatherv if the input counts are not identical across all processes.
To be precise, what must match is the type signature (count, type), since technically you can get to the same fundamental representation with different datatypes (e.g. N elements vs. 1 element that is a contiguous type of N elements); but if you use the same argument everywhere, which is the common usage of MPI collectives, then your counts must match everywhere.
The relevant portion of the latest MPI standard (3.1) is on page 165:
"The type signature associated with sendcount, sendtype, at a process must be equal to the type signature associated with recvcount, recvtype at any other process."
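For completeness, here is a sketch of what the MPI_Allgatherv replacement might look like in the minimal example above, reusing the per-rank counts already collected into nRecv by the first MPI_Allgather (the displacement computation is an addition, not part of the original code):

    // Build displacements from the per-rank counts gathered into nRecv.
    std::vector<int> displs(newSize, 0);
    int total = 0;
    for (int i = 0; i < newSize; i++) {
        displs[i] = total;     // where rank i's block starts in the receive buffer
        total    += nRecv[i];
    }
    std::vector<int> recvAll(total);
    MPI_Allgatherv(send.data(), nSend, MPI_INT,
                   recvAll.data(), nRecv.data(), displs.data(), MPI_INT, newComm);

No padding is needed, since each rank's block lands exactly where displs says it should go.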

MPI Segmentation fault

I am using MPI to run a program in parallel and measure the execution time. I am currently splitting the computation between the processes by giving a start and end index as parameters to the "voxelise" function. Each process then works on a different section of the data set and stores its result in "p_voxel_data".
I then want to send all of these sub-arrays to the root process using MPI_Gather so the data can be written to a file and the timer stopped.
The program executes fine when I have the MPI_Gather line commented out; I get output similar to this:
Computing time: NODE 3 = 1.07 seconds.
Computing time: NODE 2 = 1.12 seconds.
But when that line is included I get
"APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)"
and the computing time for root node 0 shows up as a negative number, "-1.40737e+08".
Can anyone suggest any issues with my call to MPI_Gather?
int main(int argc, char** argv)
//-----------------------------
{
    int rank;
    int nprocs;
    MPI_Comm comm;
    MPI::Init(argc, argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Set up data for voxelise function */
    . . . . . .

    clock_t start(clock());

    // Generate the density field
    voxelise(density_function,
             a,
             b,
             p_control_point_set,
             p_voxel_data,
             p_number_of_voxel,
             p_voxel_size,
             p_centre,
             begin,
             endInd );

    std::vector<float> completeData(512);
    std::vector<float> cpData(toProcess);
    std::copy(p_voxel_data.begin() + begin, p_voxel_data.begin() + endInd, cpData.begin());

    MPI_Gather(&cpData, toProcess, MPI::FLOAT, &completeData, toProcess, MPI::FLOAT, 0, MPI_COMM_WORLD);

    // Stop the timer
    clock_t end(clock());
    float number_of_seconds(float(end - start) / CLOCKS_PER_SEC);
    std::cout << "Computing time:\t" << "NODE " << rank << " = " << number_of_seconds << " seconds." << std::endl;

    if(rank == 0) {
        MPI::Finalize();
        return (EXIT_SUCCESS);
    }
You are giving MPI_Gather the address of the vector object, not the address of the vector's data.
You must do:
MPI_Gather(&cpData[0], toProcess, MPI::FLOAT, &completeData[0], ...
Of course, you have to make sure the sizes are correct too.
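As a sketch, a corrected call using the C datatype MPI_FLOAT instead of the deprecated C++ binding MPI::FLOAT might look like this (it assumes every rank contributes exactly toProcess floats, so the root's receive buffer must hold nprocs * toProcess elements, not a fixed 512):

    std::vector<float> completeData;
    if (rank == 0)
        completeData.resize(static_cast<std::size_t>(nprocs) * toProcess);

    // The receive buffer is only significant on the root, so the empty vector is fine elsewhere.
    MPI_Gather(cpData.data(), toProcess, MPI_FLOAT,
               completeData.data(), toProcess, MPI_FLOAT,
               0, MPI_COMM_WORLD);

With C++11, cpData.data() is equivalent to &cpData[0] and, unlike the latter, is also valid for an empty vector.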

MPI: How to start three functions which will be executed in different threads

I have 3 functions and 4 cores. I want to execute each function in a new thread using MPI and C++.
I wrote this:
int rank, size;

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
size--;

if (rank == 0)
{
    Thread1();
}
else
{
    if(rank == 1)
    {
        Thread2();
    }
    else
    {
        Thread3();
    }
}

MPI_Finalize();
But it executes just Thread1(). How must I change the code?
Thanks!
Print the current value of the variable size to the screen (possibly without decrementing it) and you will find 1. That is: "there is 1 process running".
You are likely running your compiled code the wrong way. Consider using mpirun (or mpiexec, depending on your MPI implementation) to execute it, i.e.
mpirun -np 4 ./MyCompiledCode
The -np parameter specifies the number of processes you will start (doing so, your MPI_Comm_size will be 4 as you expect).
Currently, though, you are not using anything specific to C++. You could consider a C++ binding of MPI such as Boost.MPI.
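For reference, a minimal Boost.MPI sketch looks like this (assuming Boost.MPI is installed and the program is linked against boost_mpi and boost_serialization):

    #include <boost/mpi.hpp>
    #include <iostream>

    int main(int argc, char* argv[])
    {
        boost::mpi::environment  env(argc, argv);  // handles MPI_Init/MPI_Finalize
        boost::mpi::communicator world;            // wraps MPI_COMM_WORLD
        std::cout << "rank " << world.rank() << " of " << world.size() << std::endl;
        return 0;
    }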
I worked a little on the code you provided and changed it slightly, producing this working MPI code (I made the needed corrections in capital letters).
FYI:
compilation (under gcc, mpich):
$ mpicxx -c mpi1.cpp
$ mpicxx -o mpi1 mpi1.o
execution
$ mpirun -np 4 ./mpi1
output
size is 4
size is 4
size is 4
2 function started.
thread2
3 function started.
thread3
3 function ended.
2 function ended.
size is 4
1 function started.
thread1
1 function ended.
Be aware that the stdout output is likely interleaved.
Are you sure you are compiling your code the right way?
Your problem is that MPI provides no way to feed console input into many processes, only into the process with rank 0. Because of the first three lines in main:
int main(int argc, char *argv[]){
    int oper;
    std::cout << "Enter Size:";
    std::cin >> oper; // <------- The problem is right here
    Operations* operations = new Operations(oper);

    int rank, size;
    MPI_Init(&argc, &argv);
    int tid;
    MPI_Comm_rank(MPI_COMM_WORLD, &tid);

    switch(tid)
    {
all processes except rank 0 block waiting for console input that they can never receive. You should rewrite the beginning of your main function as follows:
int main(int argc, char *argv[]){
    int oper;
    MPI_Init(&argc, &argv);
    int tid;
    MPI_Comm_rank(MPI_COMM_WORLD, &tid);

    if (tid == 0) {
        std::cout << "Enter Size:";
        std::cin >> oper;
    }
    MPI_Bcast(&oper, 1, MPI_INT, 0, MPI_COMM_WORLD);

    Operations* operations = new Operations(oper);

    switch(tid)
    {
It works as follows: only rank 0 displays the prompt and reads the console input into oper. The value of oper is then broadcast from rank 0 so that all other processes obtain the correct value, create the Operations object, and branch to the appropriate function.