How to stop and resume all processes from one process in MPI? - c++

In MPI, what is the correct way for one process (I call it process A) to stop, at a specific moment, all processes that are executing the application tasks, and then have that process A resume them all? For example:
#include <stdio.h>
#include <mpi.h>
#include <unistd.h>
int main(int argc, char **argv)
{
    int num_procs, my_rank, my_id;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    my_id = getpid();
    printf("Hello! I'm process with rank %i out of %i processes and my id is %i\n",
           my_rank, num_procs, my_id);
    // here I want process A to stop all processes
    // ....
    usleep(3000000); // sleep for 3 seconds
    // here I want process A to resume all processes
    printf("Bye! I'm process %i out of %i processes and my id is %i\n",
           my_rank, num_procs, my_id);
    MPI_Finalize();
    return 0;
}
I tried using kill(pid, SIGSTOP); and kill(pid, SIGCONT);, but the problem is that process A needs the id of each process, obtained through getpid(), in order to stop and resume them. I tried to see whether I could collect the process ids so that one process could access them all, and modified the above code as follows.
#include <stdio.h>
#include <iostream>
#include <mpi.h>
#include <unistd.h>
#include <vector>
std::vector<int> ids {};
int main(int argc, char **argv)
{
    int num_procs, my_rank;
    ids.push_back(getpid());
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    if (my_rank == 0) {
        for (const int& i : ids) {
            std::cout << i << "\n";
        }
    }
    MPI_Finalize();
    return 0;
}
But because each MPI process has its own copy of ids, this didn't work: the for loop only prints out the id of the process with rank 0.
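One way this could work, as a minimal sketch, is to collect the pids on rank 0 with MPI_Gather after MPI_Init (assuming all ranks run on the same node, since kill() cannot reach processes on other machines):
#include <stdio.h>
#include <mpi.h>
#include <unistd.h>
#include <vector>
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int num_procs, my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    // Every rank contributes its own pid; rank 0 receives all of them.
    int my_id = getpid();
    std::vector<int> all_ids(num_procs);
    MPI_Gather(&my_id, 1, MPI_INT, all_ids.data(), 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (my_rank == 0) {
        for (int i = 0; i < num_procs; ++i)
            printf("rank %d has pid %d\n", i, all_ids[i]);
        // rank 0 could now call kill(all_ids[i], SIGSTOP) / kill(all_ids[i], SIGCONT)
        // (this needs <signal.h>), but only for ranks running on the same node.
    }
    MPI_Finalize();
    return 0;
}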
To rephrase my question: is there any way in MPI to stop and resume processes?
UPDATE
Since I was asked in the comments why I need to do this, I'll add more explanation here. I intend to implement a coordinated checkpointing approach. It works as follows. One process acts as a coordinator and, at a specific time, stops the other processes from executing the main program tasks. Each process then takes its checkpoint and informs the coordinator that its checkpointing is done. When all processes have finished taking their checkpoints, the coordinator resumes the other processes so they carry on executing the main program tasks. The coordinator can be one of the processes that execute the main program, and at checkpointing time it also takes its own checkpoint.
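To illustrate what I have in mind, here is a rough, cooperative sketch of that protocol using plain MPI calls. It assumes each process periodically calls a checkpoint_sync() function from its main loop rather than being interrupted asynchronously; checkpoint_sync and take_local_checkpoint are placeholder names of mine.
#include <stdio.h>
#include <mpi.h>
void take_local_checkpoint(int rank)
{
    // Placeholder for writing this process's state to stable storage.
    printf("rank %d: checkpoint taken\n", rank);
}
void checkpoint_sync(int my_rank, MPI_Comm comm)
{
    const int coordinator = 0;
    int do_checkpoint = (my_rank == coordinator) ? 1 : 0; // the coordinator decides when
    // "Stop": every rank learns whether it should pause its normal work now.
    MPI_Bcast(&do_checkpoint, 1, MPI_INT, coordinator, comm);
    if (do_checkpoint) {
        take_local_checkpoint(my_rank);
        // "Checkpointing done": the barrier completes only once every rank,
        // the coordinator included, has finished its checkpoint.
        MPI_Barrier(comm);
        // "Resume": after the barrier each rank returns to its normal work.
    }
}
int main(int argc, char **argv)
{
    int my_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    // ... main program work, calling checkpoint_sync(my_rank, MPI_COMM_WORLD) periodically ...
    checkpoint_sync(my_rank, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}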

Related

MPI_Comm_Spawn called multiple times

We are writing code to solve a nonlinear problem using an iterative method (Newton). Anyway, the problem is that we don't know a priori how many MPI processes will be needed from one iteration to another, due to e.g. remeshing, adaptivity, etc. And there are quite a lot of iterations...
We would hence like to use MPI_Comm_Spawn at each iteration to create as many MPI processes as we need, gather the results and "destroy" the subprocesses. We know this limits the scalability of the code due to the gathering of information; however, we have been asked to do it :)
I did a couple of tests of MPI_Comm_Spawn on my laptop (Windows 7, 64-bit) using Intel MPI and Visual Studio Express 2013. I tried these simple codes:
//StackMain
#include <iostream>
#include <mpi.h>
#include <vector>
int main(int argc, char *argv[])
{
    int ierr = MPI_Init(&argc, &argv);
    for (int i = 0; i < 10000; i++)
    {
        std::cout << "Loop number " << i << std::endl;
        MPI_Comm children;
        std::vector<int> err(4);
        ierr = MPI_Comm_spawn("StackWorkers.exe", NULL, 4, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &children, &err[0]);
        MPI_Barrier(children);
        MPI_Comm_disconnect(&children);
    }
    ierr = MPI_Finalize();
    return 0;
}
And the program launched by the spawned processes:
//StackWorkers
#include <mpi.h>
int main(int argc, char *argv[])
{
    int ierr = MPI_Init(&argc, &argv);
    MPI_Comm parent;
    ierr = MPI_Comm_get_parent(&parent);
    MPI_Barrier(parent);
    ierr = MPI_Finalize();
    return 0;
}
The program is launched using one MPI process:
mpiexec -np 1 StackMain.exe
It seems to work; I do, however, have some questions...
1- The program freezes during iteration 4096, and this number does not change if I relaunch the program. If during each iteration I spawn 4 processes twice, then it stops at iteration 2048...
Is it a limitation of the operating system?
2- When I look at the memory occupied by "mpiexec" during the program, it grows continuously (never going down). Do you know why? I thought that, when the subprocesses finished their job, they would release the memory they used...
3- Should I disconnect/free the children communicator or not? If yes, must MPI_Comm_disconnect(...) be called on both the parent and the spawned processes, or only on the spawned ones?
Thanks a lot!
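To make question 3 concrete, this is the kind of change I have in mind on the spawned side (just a sketch, not something I have verified):
//StackWorkers, with the child also disconnecting from the parent
#include <mpi.h>
int main(int argc, char *argv[])
{
    int ierr = MPI_Init(&argc, &argv);
    MPI_Comm parent;
    ierr = MPI_Comm_get_parent(&parent);
    MPI_Barrier(parent);
    // Disconnect from the parent before finalizing; MPI_Comm_disconnect is
    // collective, so this call would pair with the parent's disconnect.
    MPI_Comm_disconnect(&parent);
    ierr = MPI_Finalize();
    return 0;
}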

My C++ MPI code doesn't finish its runs; it gets stuck at the 27th run

I'm working on the ABC algorithm, using MPI to optimize the Rastrigin function. My code's structure goes as follows:
I defined the control parameters.
I defined my variables and arrays.
I wrote my functions.
I called them in a for loop right in my main.
Here is my main, which is where I think the problem is: I have defined a total of 30 runs, but when I run it, it gets stuck on the 27th run. I'm running it on 4 nodes, but it still gets stuck. Any help?
Here is my main code:
int main (int argc, char* argv[])
{
    int iter, run, j;
    double mean;
    mean = 0;
    srand(time(NULL));
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProc);
    for (run = 0; run < runtime; run++)
    {
        if (myRank == 0) {
            initial();
            MemorizeBestSource();
        }
        for (iter = 0; iter < maxCycle; iter++)
        {
            SendEmployedBees();
            if (myRank == 0) {
                CalculateProbabilities();
                SendOnlookerBees();
                MemorizeBestSource();
                SendScoutBees();
            }
        }
        if (myRank == master) {
            for (j = 0; j < D; j++)
                printf("GlobalParam[%d]: %f\n", j + 1, GlobalParams[j]);
            printf("%d. run: %e \n", run + 1, GlobalMin);
            GlobalMins[run] = GlobalMin;
            mean = mean + GlobalMin;
        }
    }
    if (myRank == master) {
        mean = mean / runtime;
        printf("Means of %d runs: %e\n", runtime, mean);
        getch();
        MPI_Finalize();
    }
}

MPI: How to start three functions which will be executed in different threads

I have 3 functions and 4 cores. I want to execute each function in a new thread using MPI and C++.
I wrote this:
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
size--;
if (rank == 0)
{
    Thread1();
}
else
{
    if (rank == 1)
    {
        Thread2();
    }
    else
    {
        Thread3();
    }
}
MPI_Finalize();
But it executes just Thread1(). How must I change the code?
Thanks!
Print to screen the current value of variable size (possibly without decrementing it) and you will find 1. That is: "there is 1 process running".
You are likely running your compiled code the wrong way. Consider using mpirun (or mpiexec, depending on your MPI implementation) to execute it, i.e.
mpirun -np 4 ./MyCompiledCode
The -np parameter specifies the number of processes you will start (doing so, your MPI_Comm_size will be 4 as you expect).
Currently, though, you are not using anything specific to C++. You could consider a C++ binding of MPI such as Boost.MPI.
I worked a bit on the code you provided and changed it into this working MPI code (I marked the needed corrections in capital letters).
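A minimal sketch of roughly what the corrected program could look like (a reconstruction for readability; the exact listing with the capital-letter corrections is not reproduced here):
#include <iostream>
#include <mpi.h>
void Thread1()
{
    std::cout << "1 function started." << std::endl;
    std::cout << "thread1" << std::endl;
    std::cout << "1 function ended." << std::endl;
}
void Thread2()
{
    std::cout << "2 function started." << std::endl;
    std::cout << "thread2" << std::endl;
    std::cout << "2 function ended." << std::endl;
}
void Thread3()
{
    std::cout << "3 function started." << std::endl;
    std::cout << "thread3" << std::endl;
    std::cout << "3 function ended." << std::endl;
}
int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    std::cout << "size is " << size << std::endl;
    if (rank == 0)
        Thread1();
    else if (rank == 1)
        Thread2();
    else
        Thread3();
    MPI_Finalize();
    return 0;
}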
FYI:
compilation (under gcc, mpich):
$ mpicxx -c mpi1.cpp
$ mpicxx -o mpi1 mpi1.o
execution
$ mpirun -np 4 ./mpi1
output
size is 4
size is 4
size is 4
2 function started.
thread2
3 function started.
thread3
3 function ended.
2 function ended.
size is 4
1 function started.
thread1
1 function ended.
Be aware that the stdout of the different processes is likely to be interleaved.
Are you sure you are compiling your code the right way?
Your problem is that MPI provides no way to feed console input to every process; only the process with rank 0 gets it. Because of the first three lines in main:
int main(int argc, char *argv[]){
    int oper;
    std::cout << "Enter Size:";
    std::cin >> oper; // <------- The problem is right here
    Operations* operations = new Operations(oper);
    int rank, size;
    MPI_Init(&argc, &argv);
    int tid;
    MPI_Comm_rank(MPI_COMM_WORLD, &tid);
    switch(tid)
    {
all processes but rank 0 block waiting for console input which they cannot get. You should rewrite the beginning of your main function as follows:
int main(int argc, char *argv[]){
    int oper;
    MPI_Init(&argc, &argv);
    int tid;
    MPI_Comm_rank(MPI_COMM_WORLD, &tid);
    if (tid == 0) {
        std::cout << "Enter Size:";
        std::cin >> oper;
    }
    MPI_Bcast(&oper, 1, MPI_INT, 0, MPI_COMM_WORLD);
    Operations* operations = new Operations(oper);
    switch(tid)
    {
It works as follows: only rank 0 displays the prompt and then reads the console input into oper. Then a broadcast of the value of oper from rank 0 is performed so all other processes obtain the correct value, create the Operations object and then branch to the appropriate function.

make main program wait for threads to finish

In the following code I create some number of threads, and each thread sleeps for some seconds.
However, my main program doesn't wait for the threads to finish; I was under the assumption that threads would continue to run until they finished by themselves.
Is there some way of making the threads continue to run even though the calling thread finishes?
#include <pthread.h>
#include <unistd.h>
#include <iostream>
#include <cstdio>
#include <cstdlib>

int sample(int min, int max) {
    int r = rand();
    return (r % max + min);
}

void *worker(void *p) {
    long i = (long) p;
    int s = sample(1, 10);
    fprintf(stdout, "\tid:%ld will sleep: %d \n", i, s);
    sleep(s);
    fprintf(stdout, "\tid:%ld done sleeping \n", i);
    return NULL;
}

pthread_t thread1;

int main() {
    int nThreads = sample(1, 10);
    for (long i = 0; i < nThreads; i++) {
        fprintf(stderr, "\t-> Creating: %ld of %d\n", i, nThreads);
        int iret1 = pthread_create(&thread1, NULL, worker, (void*) i);
        pthread_detach(thread1);
    }
    // sleep(10); // works if this is not commented out.
    return 0;
}
Thanks
Edit:
Sorry for not clarifying: is it possible without explicitly keeping track of my currently running threads and joining them?
Each program has a main thread. It is the thread in which your main() function executes. When the execution of that thread finishes, the program finishes along with all its threads. If you want your main thread to wait for the other threads, you must use the pthread_join function.
You need to keep track of the threads. You are not doing that, because you are using the same thread1 variable for every thread you create.
You track threads by creating a list (or array) of pthread_t values that you pass to pthread_create(). Then you pthread_join() the threads in that list.
edit:
Well, it's really lazy of you not to keep track of the running threads. But you can accomplish what you want by having a global variable (protected by a mutex) that gets incremented just before a thread finishes. Then in your main thread you can wait until that variable reaches the value you want, say nThreads in your sample code, as in the sketch below.
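A minimal sketch of that counter idea, using a condition variable so the main thread doesn't have to spin (the names worker_done and wait_for_workers are mine; worker() would call worker_done() just before returning, and main() would call wait_for_workers(nThreads) instead of sleeping):
#include <pthread.h>

static int finished = 0; // how many workers have finished
static pthread_mutex_t finished_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t finished_cond = PTHREAD_COND_INITIALIZER;

void worker_done(void) // call at the end of worker()
{
    pthread_mutex_lock(&finished_mutex);
    finished++;
    pthread_cond_signal(&finished_cond);
    pthread_mutex_unlock(&finished_mutex);
}

void wait_for_workers(int nThreads) // call at the end of main()
{
    pthread_mutex_lock(&finished_mutex);
    while (finished < nThreads)
        pthread_cond_wait(&finished_cond, &finished_mutex);
    pthread_mutex_unlock(&finished_mutex);
}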
You need to join each thread you create:
int main()
{
    int nThreads = sample(1, 10);
    std::vector<pthread_t> threads(nThreads);
    for (long i = 0; i < nThreads; i++)
    {
        pthread_create(&threads[i], NULL, worker, (void*) i);
    }
    /* Wait on the other threads */
    for (int i = 0; i < nThreads; i++)
    {
        void* status;
        pthread_join(threads[i], &status);
    }
}
You learned your assumption was wrong. Main is special. Exiting main will kill your threads. So there are two options:
Use pthread_exit to exit main. This function will allow you to exit main but keep other threads running.
Do something to keep main alive. This can be anything from a loop (stupid and inefficient) to any blocking call. pthread_join is common since it blocks, gives you the return status of the threads if you are interested, and cleans up the dead threads' resources. But for the purpose of keeping main from terminating, any blocking call will do, e.g. select, reading from a pipe, blocking on a semaphore, etc.
Since Martin showed join(), here's pthread_exit():
int main(){
    int nThreads = sample(1, 10);
    for (long i = 0; i < nThreads; i++) {
        fprintf(stderr, "\t-> Creating: %ld of %d\n", i, nThreads);
        int iret1 = pthread_create(&thread1, NULL, worker, (void*) i);
        pthread_detach(thread1);
    }
    pthread_exit(NULL);
}

Multi-Threaded MPI Process Suddenly Terminating

I'm writing an MPI program (Visual Studio 2k8 + MSMPI) that uses Boost::thread to spawn two threads per MPI process, and have run into a problem I'm having trouble tracking down.
When I run the program with: mpiexec -n 2 program.exe, one of the processes suddenly terminates:
job aborted:
[ranks] message
[0] terminated
[1] process exited without calling finalize
---- error analysis -----
[1] on winblows
program.exe ended prematurely and may have crashed. exit code 0xc0000005
---- error analysis -----
I have no idea why the first process is suddenly terminating, and can't figure out how to track down the reason. This happens even if I put the rank zero process into an infinite loop at the end of all of its operations... it just suddenly dies. My main function looks like this:
int _tmain(int argc, _TCHAR* argv[])
{
    /* Initialize the MPI execution environment. */
    MPI_Init(0, NULL);
    /* Create the worker threads. */
    boost::thread masterThread(&Master);
    boost::thread slaveThread(&Slave);
    /* Wait for the local test thread to end. */
    masterThread.join();
    slaveThread.join();
    /* Shutdown. */
    MPI_Finalize();
    return 0;
}
Where the Master and Slave functions do some arbitrary work before ending. I can confirm that the master thread, at the very least, reaches the end of its operations. The slave thread is always the one that isn't done before the execution gets aborted. Using print statements, it seems like the slave thread isn't actually hitting any errors... it's happily moving along and just gets taken out in the crash.
So, does anyone have any ideas for:
a) What could be causing this?
b) How should I go about debugging it?
Thanks so much!
Edit:
Posting minimal versions of the Master/Slave functions. Note that the goal of this program is purely demonstration... so it isn't doing anything useful. Essentially, the master thread sends a dummy payload to the slave thread of the other MPI process.
void Master()
{
    int myRank;
    int numProcs;
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    /* Create a message with numbers 0 through 39 as the payload, addressed
     * to this thread. */
    int *payload = new int[40];
    for (int n = 0; n < 40; n++) {
        payload[n] = n;
    }
    if (myRank == 0) {
        MPI_Send(payload, 40, MPI_INT, 1, MPI_ANY_TAG, MPI_COMM_WORLD);
    } else {
        MPI_Send(payload, 40, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD);
    }
    /* Free memory. */
    delete(payload);
}
void Slave()
{
    MPI_Status status;
    int *payload = new int[40];
    MPI_Recv(payload, 40, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    /* Free memory. */
    delete(payload);
}
You have to use a thread-safe version of the MPI runtime. Read up on MPI_Init_thread.
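A minimal sketch of requesting full thread support at startup, assuming the rest of the program stays as in the question:
#include <cstdio>
#include <mpi.h>
int main(int argc, char *argv[])
{
    // Ask for full multi-threaded support instead of plain MPI_Init.
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        // The runtime cannot give the requested level, so calling MPI from
        // both the master and slave threads would not be safe.
        std::fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* ... create and join the Master/Slave threads as before ... */
    MPI_Finalize();
    return 0;
}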