example of MPI thread in c++ - c++

I have installed MPI and found a simple example using threads that I don't fully understand. Can I get a simple example of MPI with threads that prints something? Thanks
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
    int errs = 0;
    int provided, flag, claimed; // claimed is never used in this fragment
    // Note: MPI_Init_thread replaces MPI_Init -- calling both is an error,
    // so the plain MPI_Init(&argc, &argv) call has been dropped.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided); // what does MPI_THREAD_MULTIPLE mean?
    MPI_Is_thread_main(&flag); // why is there a flag here?
    if (!flag) {
        errs++; // why is there a counter for errs?
        printf("..."); fflush(stdout); // doesn't print anything
    }
    MPI_Finalize();
    return errs; // I get an error here (valid once it is inside a function)
}
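Not part of the original snippet, but for context: MPI_THREAD_MULTIPLE is the highest of MPI's four thread-support levels (MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED, MPI_THREAD_MULTIPLE) and means any thread may call MPI at any time. MPI_Is_thread_main sets flag to true in the thread that called MPI_Init_thread, which is why the branch above never prints when run normally. A minimal sketch that does print from several threads, assuming a C++11 compiler and an MPI library built with thread support:

#include <mpi.h>
#include <cstdio>
#include <thread>
#include <vector>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // The library may grant less than requested; always check 'provided'.
    if (provided < MPI_THREAD_MULTIPLE && rank == 0)
        printf("full thread support not available (level %d)\n", provided);

    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([rank, t]() {
            // plain printf from each thread; no MPI calls needed here
            printf("hello from thread %d of rank %d\n", t, rank);
        });
    for (std::thread &w : workers) w.join();

    MPI_Finalize();
    return 0;
}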

Related

Meaning of shown methods

In one constructor that I am analysing, the methods below appear:
if (validParOptions.found(optionName))
{
    parRunControl_.runPar(argc, argv);
    break; // leave loop
}
with
ParRunControl parRunControl_; //- Switch on/off parallel mode.
and
void runPar(int& argc, char**& argv)
{
    RunPar = true; // bool RunPar;
    if (!Pstream::init(argc, argv))
    {
        Info<< "Failed to start parallel run" << endl;
        Pstream::exit(1);
    }
}
and herein
bool Foam::UPstream::init(int& argc, char**& argv) // Spawns slave processes and
{                                                  // initialises inter-communication
    FatalErrorIn("UPstream::init(int& argc, char**& argv)")
        << "Trying to use the dummy Pstream library." << nl
        << "This dummy library cannot be used in parallel mode"
        << Foam::exit(FatalError);

    return false;
}
Within the first if condition, the existence of certain options among the command-line arguments is checked, and according to its description the last method, init, should spawn slave processes and initialise inter-communication. Two questions:
I don't see where in the method init a process is spawned. Rather, I only see an error message within the method. Am I missing something?
Do options in the command-line arguments generally spawn slave processes?
Greetings, streight
I'm assuming FatalErrorIn is a macro because it seems to have lots of different invocations. However, the error message is staring you right in the face: it won't let you use that code.
Take a look at the other source files:
https://github.com/OpenFOAM/OpenFOAM-2.2.x/blob/95dc52c102041058f0bcfc8b6aab6b41b20dc313/src/Pstream/dummy/UPstream.C
https://github.com/OpenFOAM/OpenFOAM-2.2.x/blob/95dc52c102041058f0bcfc8b6aab6b41b20dc313/src/Pstream/dummy/UOPwrite.C
https://github.com/OpenFOAM/OpenFOAM-2.2.x/blob/95dc52c102041058f0bcfc8b6aab6b41b20dc313/src/Pstream/dummy/UIPread.C
They either have empty definitions or contain notImplemented. The big hint is that they all lie in the dummy directory.
My guess is you're probably pulling from the wrong headers. Take a look at this:
https://github.com/OpenFOAM/OpenFOAM-2.2.x/blob/master/src/Pstream/mpi/UPstream.C
It actually has code:
bool Foam::UPstream::init(int& argc, char**& argv)
{
    MPI_Init(&argc, &argv);
    int numprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myProcNo_);
    /* SNIP */
    return true;
}

MPI Slave processes hang when there is no more work

I have a serial C++ program that I wish to parallelize. I know the basics of MPI: MPI_Send, MPI_Recv, etc. Basically, I have a data generation algorithm that runs significantly faster than the data processing algorithm. Currently they run in series, but I was thinking of running the data generation in the root process, doing the data processing on the slave processes, and sending messages from the root to the slaves containing the data to be processed. This way, each slave processes a data set and then waits for its next data set.
The problem is that, once the root process is done generating data, the program hangs because the slaves are waiting for more.
This is an example of the problem:
#include "mpi.h"
#include <cassert>
#include <cstdio>
class Generator {
public:
Generator(int min, int max) : value(min - 1), max(max) {}
bool NextValue() {
++value;
return value < max;
}
int Value() { return value; }
private:
int value, max;
Generator() {}
Generator(const Generator &other) {}
Generator &operator=(const Generator &other) { return *this; }
};
long fibonnaci(int n) {
assert(n > 0);
if (n == 1 || n == 2) return 1;
return fibonnaci(n-1) + fibonnaci(n-2);
}
int main(int argc, char **argv) {
MPI_Init(&argc, &argv);
int rank, num_procs;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
if (rank == 0) {
Generator generator(1, 2 * num_procs);
int proc = 1;
while (generator.NextValue()) {
int value = generator.Value();
MPI_Send(&value, 1, MPI_INT, proc, 73, MPI_COMM_WORLD);
printf("** Sent %d to process %d.\n", value, proc);
proc = proc % (num_procs - 1) + 1;
}
} else {
while (true) {
int value;
MPI_Status status;
MPI_Recv(&value, 1, MPI_INT, 0, 73, MPI_COMM_WORLD, &status);
printf("** Received %d from process %d.\n", value, status.MPI_SOURCE);
printf("Process %d computed %d.\n", rank, fibonnaci(2 * (value + 10)));
}
}
MPI_Finalize();
return 0;
}
Obviously not everything above is "good practice", but it is sufficient to get the point across.
If I remove the while(true) from the slave processes, then the program exits when each of the slaves has exited. I would like the program to exit only after the root process has done its job AND all of the slaves have processed everything that has been sent.
If I knew how many data sets would be generated, I could have that many processes running and everything would exit nicely, but that isn't the case here.
Any suggestions? Is there anything in the API that will do this? Could this be solved better with a better topology? Would MPI_Isend or MPI_Irecv do this better? I am fairly new to MPI, so bear with me.
Thanks
The usual practice is to send every worker process an empty message with a special tag that signals it to exit the infinite processing loop. Let's say this tag is 42. You would do something like this in the worker loop:
while (true) {
    int value;
    MPI_Status status;
    MPI_Recv(&value, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    if (status.MPI_TAG == 42) {
        printf("Process %d exiting work loop.\n", rank);
        break;
    }
    printf("** Received %d from process %d.\n", value, status.MPI_SOURCE);
    printf("Process %d computed %ld.\n", rank, fibonnaci(2 * (value + 10)));
}
The manager process would do something like this after the generator loop:
for (int i = 1; i < num_procs; i++)
    MPI_Send(&i, 0, MPI_INT, i, 42, MPI_COMM_WORLD);
Regarding your next question: using MPI_Isend() in the master process would remove the serialisation and increase performance. The truth, however, is that you are sending very small messages, and those are typically buffered internally (WARNING - implementation dependent!), so your MPI_Send() is effectively non-blocking and you already have non-serial execution.

MPI_Isend() returns an MPI_Request handle that you need to take care of later (see the sketch below). You could either wait for it to finish with MPI_Wait() or MPI_Waitall(), or you could just call MPI_Request_free() on it and it will be freed automatically when the operation is over. The latter is usually done when you'd like to send many messages asynchronously and do not care when the sends complete, but it's bad practice nevertheless, since a large number of outstanding requests can consume lots of precious memory.

As for the worker processes: they need the data in order to proceed with the computation, so using MPI_Irecv() is not necessary.
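For illustration only (not from the original code), a sketch of what the manager's send loop might look like with MPI_Isend and MPI_Waitall; the container names are made up, and the send buffers must stay alive until the requests complete:

// drop-in replacement for the rank-0 branch above; needs <deque> and <vector>
std::deque<int> values;            // deque: push_back never invalidates references
std::vector<MPI_Request> reqs;
Generator generator(1, 2 * num_procs);
int proc = 1;
while (generator.NextValue()) {
    values.push_back(generator.Value()); // buffer must outlive the send
    MPI_Request req;
    MPI_Isend(&values.back(), 1, MPI_INT, proc, 73, MPI_COMM_WORLD, &req);
    reqs.push_back(req);
    proc = proc % (num_procs - 1) + 1;
}
MPI_Waitall((int)reqs.size(), reqs.data(), MPI_STATUSES_IGNORE);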
Welcome to the wonderful world of MPI programming!

client/server application using MPI

I have two questions; the first one is:
I'm going to use MSMPI, and by "only MPI" I meant that we mustn't use sockets. My application is about a scalable distributed data structure. Initially, we have a server containing a file of variable size (the size can grow through insertions and shrink through deletions). When the size of the file exceeds a certain limit, the file is split: one half remains on the first server and the other half is moved to a new server, and so on. The client always needs to know the address of the data it wants to retrieve, so it should keep an image of the file's split operations. Finally, I hope I have made it clearer.
And the second one is:
I've tried to compile a simple client/server application (the source code is below) with MSMPI and MPICH2, and it doesn't work; it gives me the error message "fatal error in mpi_open_port()" and other stack errors. So I installed Open MPI on Ubuntu 11.10 and tried to run the same example. It worked on the server side and gave me a port name, but on the client side it gave me the error message:
[user-Compaq-610:03833] [[39604,1],0] ORTE_ERROR_LOG: Not found in file ../../../../../../ompi/mca/dpm/orte/dpm_orte.c at line 155
[user-Compaq-610:3833] *** An error occurred in MPI_Comm_connect
[user-Compaq-610:3833] *** on communicator MPI_COMM_WORLD
[user-Compaq-610:3833] *** MPI_ERR_INTERN: internal error
[user-Compaq-610:3833] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 3833 on
node toufik-Compaq-610 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
So I'm confused about what the problem is, and I've spent a while trying to fix it.
I'd be grateful if anybody could help me with it. Thank you in advance.
The source code is here:
/* the server side */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int my_id;
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm newcomm;
    int passed_num;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    passed_num = 111;

    if (my_id == 0)
    {
        MPI_Open_port(MPI_INFO_NULL, port_name);
        printf("%s\n\n", port_name); fflush(stdout);
    } /* endif */

    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &newcomm);

    if (my_id == 0)
    {
        MPI_Send(&passed_num, 1, MPI_INT, 0, 0, newcomm);
        printf("after sending passed_num %d\n", passed_num); fflush(stdout);
        MPI_Close_port(port_name);
    } /* endif */

    MPI_Finalize();
    return 0;
} /* end main() */
And on the client side:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int passed_num;
    int my_id;
    MPI_Comm newcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_WORLD, &newcomm);

    if (my_id == 0)
    {
        MPI_Status status;
        MPI_Recv(&passed_num, 1, MPI_INT, 0, 0, newcomm, &status);
        printf("after receiving passed_num %d\n", passed_num); fflush(stdout);
    } /* endif */

    MPI_Finalize();
    return 0;
} /* end main() */
How exactly do you run the application? It seems that the provided client and server codes are nearly the same.
Usually the code is the same for all MPI processes, and the program decides what to execute based on its rank, as in the snippet if (my_id == 0) { ... }. The application is executed with mpiexec. For example, mpiexec -n 2 ./application would run two MPI processes with ranks 0 and 1 in one MPI_COMM_WORLD communicator. Where exactly the processes are executed (on the same node or on different ones) depends on the configuration.
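(Not stated in the original answer, but for a client/server pair like this the two binaries are typically launched separately, e.g. mpiexec -n 1 ./server in one terminal, then mpiexec -n 1 ./client "<port name printed by the server>" in another; the exact invocation depends on the MPI implementation.)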
Nevertheless, you should create a port with MPI_Open_port and then pass it to MPI_Comm_connect. Here is an example of how to use these functions: MPI_Comm_connect
Moreover, for every MPI_Recv there must be a corresponding MPI_Send. Otherwise the receiving process would wait forever.
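As a side note (not from the original answer), MPI-2 also provides a name-publishing interface so the client does not have to copy the port string by hand; whether it works out of the box depends on the runtime (Open MPI, for instance, may need a name server such as ompi-server). A rough sketch, with "my_service" as a made-up service name:

/* server sketch: publish the port under a service name */
char server_port[MPI_MAX_PORT_NAME];
MPI_Open_port(MPI_INFO_NULL, server_port);
MPI_Publish_name("my_service", MPI_INFO_NULL, server_port);
MPI_Comm_accept(server_port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &newcomm);

/* client sketch: look the port up instead of reading argv[1] */
char client_port[MPI_MAX_PORT_NAME];
MPI_Lookup_name("my_service", MPI_INFO_NULL, client_port);
MPI_Comm_connect(client_port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &newcomm);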

MPI communicator error

I had a problem with a program that uses MPI, and I have just fixed it; however, I don't seem to understand what was wrong in the first place. I'm quite green with programming-related stuff, so please be forgiving.
The program is:
#include <iostream>
#include <cstdlib>
#include <mpi.h>

#define RNumber 3

using namespace std;

int main() {
    /* Initialize MPI */
    int my_rank;        // My process rank
    int comm_sz;        // Number of processes
    MPI_Comm GathComm;  // Communicator for MPI_Gather
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    /* Initialize an array for results */
    long rawT[RNumber];
    long *Times = NULL; // Results from threads
    if (my_rank == 0) Times = (long*) malloc(comm_sz*RNumber*sizeof(long));

    /* Fill rawT with results at threads */
    for (int i = 0; i < RNumber; i++) {
        rawT[i] = i;
    }

    if (my_rank == 0) {
        /* Main thread receives data from other threads */
        MPI_Gather(rawT, RNumber, MPI_LONG, Times, RNumber, MPI_LONG, 0, GathComm);
    }
    else {
        /* Other threads send calculation results to main thread */
        MPI_Gather(rawT, RNumber, MPI_LONG, Times, RNumber, MPI_LONG, 0, GathComm);
    }

    /* Finalize MPI */
    MPI_Finalize();
    return 0;
}
On execution the program returns the following message:
Fatal error in PMPI_Gather: Invalid communicator, error stack:
PMPI_Gather(863): MPI_Gather(sbuf=0xbf824b70, scount=3, MPI_LONG,
rbuf=0x98c55d8, rcount=3, MPI_LONG, root=0, comm=0xe61030) failed
PMPI_Gather(757): Invalid communicator Fatal error in PMPI_Gather:
Invalid communicator, error stack: PMPI_Gather(863):
MPI_Gather(sbuf=0xbf938960, scount=3, MPI_LONG, rbuf=(nil), rcount=3,
MPI_LONG, root=0, comm=0xa6e030) failed PMPI_Gather(757): Invalid
communicator
After I removed GathComm altogether and substituted the default communicator MPI_COMM_WORLD for it, everything worked fine.
Could anyone be so kind as to explain what I was doing wrong and why this adjustment made everything work?
That's because GathComm has not been assigned a valid communicator. "MPI_Comm GathComm;" only declares a variable to hold a communicator but doesn't create one.
You can use the default communicator (MPI_COMM_WORLD) if you simply want to include all procs in the operation.
Custom communicators are useful when you want to organise your procs into separate groups or when using virtual communication topologies.
To find out more, check out this article, which describes Groups, Communicators and Topologies.
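For completeness, a minimal sketch (not part of the original answer) of one way to give GathComm a real communicator, by duplicating MPI_COMM_WORLD:

MPI_Comm GathComm;
MPI_Comm_dup(MPI_COMM_WORLD, &GathComm); // collective call; creates a valid communicator
MPI_Gather(rawT, RNumber, MPI_LONG, Times, RNumber, MPI_LONG, 0, GathComm);
MPI_Comm_free(&GathComm);                // release it before MPI_Finalize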

global static boolean pointer causes segmentation fault using pthread

New to pthread programming, and stuck on this error while working on mixed C++ and C code.
What I have done is call the C code in the thread created by the C++ code. There is a static boolean pointer is_center used in the thread, and it should be freed when the thread finishes.
However, I noticed that every time the program enters the C function, the value of the boolean pointer is changed, and the segmentation fault then happens in free(). The problem only occurs when the C code is used; remove the C code and the multi-threaded C++ part works well.
The detailed code is as follows:
static bool *is_center;

// omit other codes in between ...

void streamCluster( PStream* stream )
{
    // some code here ...
    while (1) {
        // some code here ...
        is_center = (bool*)calloc(points.num, sizeof(bool));

        // start the parallel thread here.
        // the c code is invoked in this function.
        localSearch(&points, kmin, kmax, &kfinal); // parallel

        free(is_center);
    }
}
And the function using the parallel threads is as follows (my C code is invoked in each thread):
void localSearch( Points* points, long kmin, long kmax, long* kfinal ) {
    pthread_barrier_t barrier;
    pthread_t* threads = new pthread_t[nproc];
    pkmedian_arg_t* arg = new pkmedian_arg_t[nproc];

    pthread_barrier_init(&barrier, NULL, nproc);
    for( int i = 0; i < nproc; i++ ) {
        arg[i].points = points;
        arg[i].kmin = kmin;
        arg[i].kmax = kmax;
        arg[i].pid = i;
        arg[i].kfinal = kfinal;
        arg[i].barrier = &barrier;
        pthread_create(threads+i, NULL, localSearchSub, (void*)&arg[i]);
    }
    for ( int i = 0; i < nproc; i++) {
        pthread_join(threads[i], NULL);
    }

    delete[] threads;
    delete[] arg;
    pthread_barrier_destroy(&barrier);
}
Finally, the function calling my C code:
void* localSearchSub(void* arg_) {
    int eventSet = PAPI_NULL;
    begin_papi_thread(&eventSet);

    pkmedian_arg_t* arg = (pkmedian_arg_t*)arg_;
    pkmedian(arg->points, arg->kmin, arg->kmax, arg->kfinal, arg->pid, arg->barrier);

    end_papi_thread(&eventSet);
    return NULL;
}
And from gdb, what I got for is_center is:
Breakpoint 2, localSearchSub (arg_=0x600000000000bc40) at streamcluster.cpp:1711
1711 end_papi_thread(&eventSet);
(gdb) s
Hardware watchpoint 1: is_center
Old value = (bool *) 0x600000000000bba0
New value = (bool *) 0xa93f3
0x400000000000d8d1 in localSearchSub (arg_=0x600000000000bc40) at streamcluster.cpp:1711
1711 end_papi_thread(&eventSet);
Any suggestions? Thanks in advance!
Some new information about the code: for the C code, I am using the PAPI package. I wrote my own PAPI wrapper to initialize and read system counters. The code is as follows:
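// NB: return_value, error_string, event_code, event_num and dummy_values
// appear to be file-scope globals defined elsewhere in the wrapper (omitted here).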
void begin_papi_thread(int* eventSet)
{
    int thread_id = pthread_self();

    // Events
    if (PAPI_create_eventset(eventSet)) {
        PAPI_perror(return_value, error_string, PAPI_MAX_STR_LEN);
        printf("*** ERROR *** Failed to create event set for thread %d: %s\n.", thread_id, error_string);
    }
    if ((return_value = PAPI_add_events(*eventSet, event_code, event_num)) != PAPI_OK)
    {
        printf("*** ERROR *** Failed to add event for thread %d: %d.\n", thread_id, return_value);
    }

    // Start counting
    if ((return_value = PAPI_start(*eventSet)) != PAPI_OK) {
        PAPI_perror(return_value, error_string, PAPI_MAX_STR_LEN);
        printf("*** ERROR *** PAPI failed to start the event for thread %d: %s.\n", thread_id, error_string);
    }
}

void end_papi_thread(int* eventSet)
{
    int thread_id = pthread_self();
    int i;
    long long * count_values = (long long*)malloc(sizeof(long long) * event_num);

    if (PAPI_read(*eventSet, count_values) != PAPI_OK)
        printf("*** ERROR *** Failed to load count values.\n");

    if (PAPI_stop(*eventSet, &dummy_values) != PAPI_OK) {
        PAPI_perror(return_value, error_string, PAPI_MAX_STR_LEN);
        printf("*** ERROR *** PAPI failed to stop the event for thread %d: %s.\n", thread_id, error_string);
        return;
    }

    if (PAPI_cleanup_eventset(*eventSet) != PAPI_OK)
        printf("*** ERROR *** Clean up failed for the thread %d.\n", thread_id);
}
I don't think you've posted enough code to really understand your problem, but it looks suspicious that you've declared is_center global. I assume you're using it in more than one place, possibly from multiple threads (localSearchSub, which is your worker thread function, mentions it).
If is_center is being read or written by multiple threads, you probably want to protect it with a pthread mutex (see the sketch below). You say it is "freed when the thread finishes", but you should be aware that there are nproc threads, and it looks like they're all working on one array of points.num bools (is_center). If points != nproc, this could be a bad thing[1]. Each thread should probably work on its own array, and localSearch should aggregate the results.
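A minimal sketch of that suggestion (the helper name and index type are illustrative, not from the question's code; assumes the surrounding file's includes plus <pthread.h>):

static pthread_mutex_t is_center_lock = PTHREAD_MUTEX_INITIALIZER;

// hypothetical helper: serialise all writes to the shared is_center array
void set_is_center(long index, bool value) {
    pthread_mutex_lock(&is_center_lock);
    is_center[index] = value;
    pthread_mutex_unlock(&is_center_lock);
}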
The xxx_papi_thread functions don't get any hits on Google, so I can only imagine they're your own... it's unlikely we'll be able to help you if the problem is in there :)
[1]: Even if points == nproc, it is not necessarily OK to write to different elements of an array from multiple threads (it's compiler- and processor-dependent). Be safe, use a mutex.
Also, this is tagged C++. Can you replace the calloc and dynamic arrays (using new) with vectors? It might end up easier to debug, and it certainly ends up easier to maintain. Why do you hate and want to punish the readers of your code? ;)