How to fix the MPI_Gatherv problem with 0xC0000005 - c++

This is my first question in the community, so if there is anything wrong with my format, please tell me.
I am using MPI_Gatherv to collect data. I may have to collect something like vector<vector<int>>. I heard MPI_Gatherv can only handle a single vector, so I decided to send the data vector by vector. Below is an example of my idea. However, it fails in MPI_Finalize() with error 0xC0000005. If I delete the MPI_Gatherv(&c,count.at(i),MPI_INT,&temre,&count.at(i),displs,MPI_INT,0,MPI_COMM_WORLD) call, it works. I wonder whether it has something to do with address conflicts.
Thanks for any help!
#include <mpi.h>
#include <vector>
using namespace std;

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int gsize;
    int myrank;
    MPI_Comm_size(MPI_COMM_WORLD, &gsize);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    vector<vector<int>> a, reci;
    vector<int> count(1, 10);
    vector<int> b(10, 1), NU(1, 0);
    a.push_back(b);
    reci.push_back(NU);
    for (int i = 0; i < myrank * 3 + 2; i++)
    {
        b.push_back(i);
        count.push_back(11 + i);
        a.push_back(b);
        reci.push_back(NU);
    }

    vector<int> c, temre(1, 1);
    int displs[1] = {0};
    for (int i = 0; i < myrank * 3 + 3; i++)
    {
        c = a.at(i);
        MPI_Gatherv(&c, count.at(i), MPI_INT,
                    &temre, &count.at(i), displs, MPI_INT, 0, MPI_COMM_WORLD);
        reci.at(i).swap(temre);
    }

    MPI_Finalize();
    return 0;
}
Many thanks for any comments and answers.
After several days of work, I found the error. For vectors in MPI, you must use &c[0] instead of &c in
MPI_Gatherv(&c,count.at(i),MPI_INT, &temre,&count.at(i),displs,MPI_INT,0,MPI_COMM_WORLD); (and likewise &temre[0] instead of &temre).
However, it currently works in serial but not in parallel. I am trying to fix the problem and will post the working code.
Thanks again for any help!

You have to realize that MPI is strictly a C library, so you cannot directly use a std::vector as a send/receive buffer. The buffer has to be an int* (or a pointer to whatever type you use). So in your case: MPI_Gatherv(c.data(),...
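For illustration, here is a minimal, self-contained sketch of such a gather. It is not the poster's exact program: for simplicity it assumes every rank contributes the same number of elements, and all names are illustrative.

#include <mpi.h>
#include <vector>

// Minimal sketch: every rank sends `sendCount` ints and rank 0 gathers them
// contiguously. Pass raw pointers (vector::data()), never &vector itself.
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int sendCount = 10;                      // illustrative value
    std::vector<int> sendBuf(sendCount, rank);     // data this rank contributes
    std::vector<int> recvCounts(size, sendCount);  // one count per rank
    std::vector<int> displs(size);
    for (int r = 0; r < size; ++r)
        displs[r] = r * sendCount;                 // where each rank's data lands

    std::vector<int> recvBuf;
    if (rank == 0)
        recvBuf.resize(size * sendCount);          // only the root needs storage

    MPI_Gatherv(sendBuf.data(), sendCount, MPI_INT,
                recvBuf.data(), recvCounts.data(), displs.data(), MPI_INT,
                0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}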

Related

Limitation of data exchange using MPI c++

I am writing an MPI C++ code for data exchange; below is the sample code:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int size, rank;
    int dest, tag, i;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("SIZE = %d RANK = %d\n", size, rank);
    if (rank == 0)
    {
        double data[500];
        for (i = 0; i < 500; i++)
        {
            data[i] = i;
        }
        dest = 1;
        tag = 1;
        MPI_Send(data, 500, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
It looks like 500 is the maximum that I can send. If the number of elements increases to 600, the code seems to stop at MPI_Send without making further progress. I suspect there is some limitation on the amount of data that can be transferred with MPI_Send. Can someone enlighten me?
Thanks in advance,
Kan
Long story short, your program is incorrect and you are lucky it did not hang with a small count.
Per the MPI standard, you cannot assume MPI_Send() will return unless a matching MPI_Recv() has been posted.
From a pragmatic point of view, "short" messages are generally sent in eager mode, and MPI_Send() likely returns immediately.
On the other hand, "long" messages usually involve a rendezvous protocol, and hence hang until a matching receive has been posted.
"Short" and "long" depend on several factors, including the interconnect you are using. And once again, you cannot assume MPI_Send() will always return immediately just because you send messages that are small enough.

Simple MPI_Gather test with memcpy error

I am learning MPI and trying to create examples of some of the functions. I've gotten several to work, but I am having issues with MPI_Gather. I had a much more complex fitting test, but I trimmed it down to the simplest code. I am still, however, getting the following error:
root#master:/home/sgeadmin# mpirun ./expfitTest5
Assertion failed in file src/mpid/ch3/src/ch3u_request.c at line 584: FALSE
memcpy argument memory ranges overlap, dst_=0x1187e30 src_=0x1187e40 len_=400
internal ABORT - process 0
I am running one master instance and two node instances through AWS EC2. I have all the appropriate libraries installed, as I've gotten other MPI examples to work. My program is:
#include <mpi.h>
#include <iostream>
#include <cassert>
#include <cstdlib>
using namespace std;

int main()
{
    int world_size, world_rank;
    int nFits = 100;
    double arrCount[100];
    double *rBuf = NULL;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    assert(world_size != 1);

    int nElements = nFits / (world_size - 1);
    if (world_rank > 0) {
        for (int k = 0; k < nElements; k++)
        {
            arrCount[k] = k;
        }
    }
    MPI_Barrier(MPI_COMM_WORLD);

    if (world_rank == 0)
    {
        rBuf = (double*) malloc(nFits * sizeof(double));
    }
    MPI_Gather(arrCount, nElements, MPI_DOUBLE, rBuf, nElements, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (world_rank == 0) {
        for (int i = 0; i < nFits; i++)
        {
            cout << rBuf[i] << "\n";
        }
    }
    MPI_Finalize();
    exit(0);
}
Is there something I am not understanding in malloc or MPI_Gather? I've compared my code to other samples, and can't find any differences.
The root process in a gather operation does participate in the operation, i.e. it sends data to its own receive buffer. That also means you must allocate memory for its part in the receive buffer.
Now you could use MPI_Gatherv and specify a recvcounts[0]/sendcount of 0 at the root to follow your example closely. But usually you would prefer to write an MPI application in such a way that the root participates equally in the operation, i.e. int nElements = nFits/world_size.
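A minimal sketch of that second suggestion, assuming nFits divides evenly by the number of ranks; every rank, including the root, fills and contributes its own block:

#include <mpi.h>
#include <cstdio>
#include <vector>

// Sketch: all ranks (root included) contribute nElements doubles to the gather.
// Assumes nFits is divisible by the number of ranks.
int main(int argc, char **argv)
{
    const int nFits = 100;
    MPI_Init(&argc, &argv);

    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    int nElements = nFits / world_size;
    std::vector<double> arrCount(nElements);
    for (int k = 0; k < nElements; k++)
        arrCount[k] = world_rank * nElements + k;  // every rank fills its block

    std::vector<double> rBuf;
    if (world_rank == 0)
        rBuf.resize(nFits);                        // receive buffer only on root

    MPI_Gather(arrCount.data(), nElements, MPI_DOUBLE,
               rBuf.data(), nElements, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (world_rank == 0)
        for (int i = 0; i < nFits; i++)
            std::printf("%g\n", rBuf[i]);

    MPI_Finalize();
    return 0;
}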

Multithreading: set number of threads via command-line arguments or hardware_concurrency

Edit/Solved: Joachim Pileborg's answer did the job for me. THX
Please be gentle as this is my first question.
I am actually learning and playing with C++, in particular threading. I looked for an answer (and it would astonish me if there is not already one out there), but I wasn't able to find it.
So back to topic:
My "play" code looks something like this (Console application)
void foo(){
//do something
}
int _tmain(int argc, _TCHAR* argv[])
{
std::thread t[threadcount];
for (int i = 0; i < threadcount; ++i) {
t[i] = std::thread(foo);
}
for (int i = 0; i < threadcount; ++i) {
t[i].join();
}
}
Is it possible to set the value of threadcount through argv?
If not, could someone please give me a short snippet on how to use
std::thread::hardware_concurrency()
as the threadcount, because Visual Studio also gives me an error there when setting
const int threadcount = std::thread::hardware_concurrency();
Thanks in advance.
As the number of threads is to be controlled by threadcount, setting it from the command line can be implemented by adding
int threadcount = atoi(argv[1]);
to the implementation. Some error checking could be done, e.g. reporting an error on a non-positive number of threads.
If the number of threads is to be determined programmatically, depending on the specific platform, this question could be interesting.
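A minimal sketch that combines both approaches: take the count from argv[1] when it is given, otherwise fall back to std::thread::hardware_concurrency(); a std::vector<std::thread> avoids the non-standard variable-length array from the question. foo() here is just a placeholder.

#include <cstdlib>
#include <iostream>
#include <thread>
#include <vector>

void foo() {
    // do something
}

int main(int argc, char* argv[])
{
    // Thread count from the command line if provided and positive,
    // otherwise ask the implementation (which may report 0, so guard it).
    int requested = (argc > 1) ? std::atoi(argv[1]) : 0;
    unsigned threadcount = requested > 0
        ? static_cast<unsigned>(requested)
        : std::thread::hardware_concurrency();
    if (threadcount == 0) {
        std::cerr << "Could not determine a thread count, defaulting to 1\n";
        threadcount = 1;
    }

    std::vector<std::thread> t;
    for (unsigned i = 0; i < threadcount; ++i)
        t.emplace_back(foo);
    for (auto& th : t)
        th.join();
}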

MPI_Comm_Spawn called multiple times

We are writing code to solve a nonlinear problem using an iterative method (Newton). The problem is that we don't know a priori how many MPI processes will be needed from one iteration to the next, due to e.g. remeshing, adaptivity, etc. And there are quite a lot of iterations...
We would hence like to use MPI_Comm_spawn at each iteration to create as many MPI processes as we need, gather the results, and "destroy" the subprocesses. We know this limits the scalability of the code due to the gathering of information; however, we have been asked to do it :)
I did a couple of tests of MPI_Comm_spawn on my laptop (Windows 7/64-bit) using Intel MPI and Visual Studio Express 2013. I tried these simple codes:
//StackMain
#include <iostream>
#include <mpi.h>
#include <vector>

int main(int argc, char *argv[])
{
    int ierr = MPI_Init(&argc, &argv);
    for (int i = 0; i < 10000; i++)
    {
        std::cout << "Loop number " << i << std::endl;
        MPI_Comm children;
        std::vector<int> err(4);
        ierr = MPI_Comm_spawn("StackWorkers.exe", NULL, 4, MPI_INFO_NULL, 0,
                              MPI_COMM_WORLD, &children, &err[0]);
        MPI_Barrier(children);
        MPI_Comm_disconnect(&children);
    }
    ierr = MPI_Finalize();
    return 0;
}
And the program launched by the spawned processes:
//StackWorkers
#include <mpi.h>

int main(int argc, char *argv[])
{
    int ierr = MPI_Init(&argc, &argv);
    MPI_Comm parent;
    ierr = MPI_Comm_get_parent(&parent);
    MPI_Barrier(parent);
    ierr = MPI_Finalize();
    return 0;
}
The program is launched using one MPI process:
mpiexec -np 1 StackMain.exe
It seems to work; I do have some questions, however...
1- The program freezes during iteration 4096, and this number does not change if I relaunch the program. If during each iteration I launch 4 processes twice, then it stops at iteration 2048...
Is it a limitation of the operating system?
2- When I look at the memory occupied by "mpiexec" during the program, it grows continuously (never going down). Do you know why? I thought that when the subprocesses finished their job, they would release the memory they used...
3- Should I disconnect/free the children communicator or not? If yes, must MPI_Comm_disconnect(...) be called on both the spawning and the spawned processes, or only on the spawned ones?
Thanks a lot!

Segmentation fault of an MPI program

I am writing a C++ program that uses MPI. The simplified version of my code is:
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <mpi.h>

#define RNumber 3000000 //Number of loops to go
using namespace std;

class LObject {
    /*Something here*/
public:
    void FillArray(long * RawT){
        /*Does something*/
        for (int i = 0; i < RNumber; i++){
            RawT[i] = i;
        }
    }
};

int main() {
    int my_rank;
    int comm_sz;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    LObject System;
    long rawT[RNumber];
    long * Times = NULL;

    if (my_rank == 0) Times = (long*) malloc(comm_sz*RNumber*sizeof(long));

    System.FillArray(rawT);

    if (my_rank == 0) {
        MPI_Gather(rawT, RNumber, MPI_LONG, Times, RNumber,
                   MPI_LONG, 0, MPI_COMM_WORLD);
    }
    else {
        MPI_Gather(rawT, RNumber, MPI_LONG, Times, RNumber,
                   MPI_LONG, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
The program compiles fine, but gives a Segmentation fault error on execution. The message is
=================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
When I reduce RNumber, the program works fine. Maybe somebody could explain what precisely goes wrong? Am I trying to allocate too much space for an array? If that's the case, will this problem be solved by storing the results in a file instead of an array?
If it is possible, could you please also give broad comments on the things I am doing wrong.
Thank you for your time and effort!
A couple of possible issues:
long rawT[RNumber];
That's rather a large array to be putting on the stack. There is usually a limit to stack size (especially in a multithreaded program), and a typical size is one or two megabytes. You'd be better off with a std::vector<long> here.
Times = (long*) malloc(comm_sz*RNumber*sizeof(long));
You should check that the memory allocation succeeded. Or better still, use std::vector<long> here as well (which will also fix your memory leak).
if (my_rank == 0) {
// do stuff
} else {
// do exactly the same stuff
}
I'm guessing the else block should do something different; in particular, something that doesn't involve Times, since that is null unless my_rank == 0.
UPDATE: to use a vector instead of a raw array, just initialise it with the size you want, and then use a pointer to the first element wherever you would have used (a pointer to) the array:
std::vector<long> rawT(RNumber);
System.FillArray(&rawT[0]);
std::vector<long> Times(comm_sz*RNumber);
MPI_Gather(&rawT[0], RNumber, MPI_LONG, &Times[0], RNumber,
MPI_LONG, 0, MPI_COMM_WORLD);
Beware that the pointer will be invalidated if you resize the vector (although you won't need to do that if you're simply using it as a replacement for an array).
You may want to check what comes back from
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
e.g. comm_sz==0 would cause this issue.
You are not checking the return value from malloc. Considering that you are attempting to allocate over three million longs, it is quite plausible that malloc would fail.
This might not be what is causing your problem though.
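For completeness, a minimal fragment showing the kind of check being suggested; it is meant as a drop-in replacement for the allocation line inside the question's main (it reuses the question's variable names and assumes <cstdio> is included):

// Check the allocation before relying on it; abort cleanly if it fails.
long *Times = NULL;
if (my_rank == 0) {
    size_t bytes = (size_t) comm_sz * RNumber * sizeof(long);
    Times = (long *) malloc(bytes);
    if (Times == NULL) {
        fprintf(stderr, "malloc of %zu bytes failed\n", bytes);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}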