I have written the sample code below:
#include <stdio.h>
#include <mpi.h>

double x;

int main (int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) x = 10.1;
    MPI_Barrier(MPI_COMM_WORLD);
    printf("%f\n", x);
    MPI_Finalize();
    return 0;
}
As one may notice, this program defines a global variable called x, and the zeroth thread tries to assign some value to it. When I run this program on an SMP (symmetric multiprocessing) machine with 4 cores, I get the following results:
10.1
0
0
0
More interestingly, when I change the code so that each thread prints the address of the variable x, i.e. &x, they all print the same thing.
My first question is: how is it possible that a number of threads on an SMP system share the same value for the address of a variable while they do not share the same value for the variable itself?
And my second question is: how should I change the above code so that I get the following results?
10.1
10.1
10.1
10.1
Regarding the first question: the ranks of an MPI job are separate processes, not threads. Each process has its own virtual address space, so the same virtual address (the value of &x) refers to a different physical copy of x in every process. Rank 0 therefore assigns 10.1 only to its own copy, and MPI_Barrier merely synchronizes the processes without moving any data.
To get the value everywhere, you could use a broadcast:
MPI_Bcast(&x, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
This will send the value of x on process 0 to all other processes.
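For reference, here is a minimal sketch of the full corrected program (the code above with the barrier replaced by the broadcast):

#include <stdio.h>
#include <mpi.h>

double x;

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) x = 10.1;                          /* only rank 0's copy is set */
    MPI_Bcast(&x, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);  /* copy rank 0's x to every rank */
    printf("%f\n", x);                                /* every rank now prints 10.1 */
    MPI_Finalize();
    return 0;
}

Note that the explicit MPI_Barrier is no longer needed: as far as x is concerned, MPI_Bcast already orders the assignment on rank 0 before the prints on the other ranks.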
Related
I have code that I compile and run using Open MPI. Lately, I wanted to run this same code using Intel MPI, but the code is not working as expected.
I dug into the code and found out that MPI_Send behaves differently in the two implementations.
I got the advice on a different forum to use MPI_Isend instead of MPI_Send, but that requires a lot of work to modify the code. Is there any workaround in Intel MPI to make it work just like it does in Open MPI? Maybe some flags, increasing a buffer, or something else? Thanks in advance for your answers.
#include <stdio.h>
#include <mpi.h>

/* MPI_TAG is assumed to be a tag constant defined elsewhere in the project */

int main(int argc, char **argv) {
    int numRanks;
    int rank;
    char cmd[] = "Hello world";
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numRanks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < numRanks; i++) {
            printf("Calling MPI_Send() from rank %d to %d\n", rank, i);
            MPI_Send(&cmd, sizeof(cmd), MPI_CHAR, i, MPI_TAG, MPI_COMM_WORLD);
            printf("Returned from MPI_Send()\n");
        }
    }
    MPI_Recv(&cmd, sizeof(cmd), MPI_CHAR, 0, MPI_TAG, MPI_COMM_WORLD, &status);
    printf("%d received from 0 %s\n", rank, cmd);
    MPI_Finalize();
}
OpenMPI Result
# mpirun --allow-run-as-root -n 2 helloworld_openmpi
Calling MPI_Send() from rank 0 to 0
Returned from MPI_Send()
Calling MPI_Send() from rank 0 to 1
Returned from MPI_Send()
0 received from 0 Hello world
1 received from 0 Hello world
Intel MPI Result
# mpiexec.hydra -n 2 /root/helloworld_intel
Calling MPI_Send() from rank 0 to 0
Stuck at MPI_Send.
It is incorrect to assume MPI_Send() will return before a matching receive is posted, so your code is incorrect with respect to the MPI standard, and you are lucky it did not hang with Open MPI.
MPI implementations usually send small messages eagerly, so MPI_Send() can return immediately, but this is an implementation choice not mandated by the standard, and what counts as a "small" message depends on the library version, the interconnect you are using, and other factors.
The only safe and portable choice here is to write correct code.
FWIW, MPI_Bcast(cmd, ...) is a better fit here, assuming all ranks already know the length of the string, including its NUL terminator.
Last but not least, the buffer argument is cmd and not &cmd.
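For illustration, here is a sketch of the same exchange rewritten around MPI_Bcast (one possible correct version, not the only one); it removes both the self-send and the reliance on eager buffering:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int numRanks, rank;
    char cmd[] = "Hello world";   /* every rank knows the buffer size at compile time */

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numRanks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Collective call: rank 0's buffer is copied into the buffer of every
       other rank; no matching send/receive pairs and no self-messaging. */
    MPI_Bcast(cmd, sizeof(cmd), MPI_CHAR, 0, MPI_COMM_WORLD);

    printf("%d received from 0 %s\n", rank, cmd);
    MPI_Finalize();
    return 0;
}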
We are writing code to solve a nonlinear problem using an iterative (Newton) method. The problem is that we don't know a priori how many MPI processes will be needed from one iteration to the next, due to e.g. remeshing, adaptivity, etc. And there are quite a lot of iterations...
We would hence like to use MPI_Comm_spawn at each iteration to create as many MPI processes as we need, gather the results, and "destroy" the subprocesses. We know this limits the scalability of the code due to the gathering of information; however, we have been asked to do it :)
I did a couple of tests of MPI_Comm_spawn on my laptop (Windows 7, 64-bit) using Intel MPI and Visual Studio Express 2013. I tried these simple codes:
//StackMain
#include <iostream>
#include <mpi.h>
#include <vector>

int main(int argc, char *argv[])
{
    int ierr = MPI_Init(&argc, &argv);
    for (int i = 0; i < 10000; i++)
    {
        std::cout << "Loop number " << i << std::endl;
        MPI_Comm children;
        std::vector<int> err(4);
        ierr = MPI_Comm_spawn("StackWorkers.exe", NULL, 4, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &children, &err[0]);
        MPI_Barrier(children);
        MPI_Comm_disconnect(&children);
    }
    ierr = MPI_Finalize();
    return 0;
}
And the program launched by the spawned processes:
//StackWorkers
#include <mpi.h>

int main(int argc, char *argv[])
{
    int ierr = MPI_Init(&argc, &argv);
    MPI_Comm parent;
    ierr = MPI_Comm_get_parent(&parent);
    MPI_Barrier(parent);
    ierr = MPI_Finalize();
    return 0;
}
The program is launched using one MPI process:
mpiexec -np 1 StackMain.exe
It seems to work; I do, however, have some questions...
1- The program freezes during iteration 4096, and this number does not change if I relaunch the program. If during each iteration I spawn 4 processes twice, then it stops at iteration 2048...
Is this a limitation of the operating system?
2- When I watch the memory used by mpiexec during the run, it grows continuously (never going down). Do you know why? I thought that when the subprocesses finished their job, they would release the memory they used...
3- Should I disconnect/free the children communicator or not? If yes, must MPI_Comm_disconnect(...) be called on both the spawning and the spawned processes? Or only on the spawning side?
Thanks a lot!
Super EDIT:
Adding the broadcast step results in ncols getting printed by the two processes spawned by the master node (from which I can check the output). But why? I mean, all of the variables that are broadcast already have a value on the line of their declaration!
I have some code based on this example.
I checked that the cluster configuration is OK with this simple program, which also printed the IP of the machine that it would run on:
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* get number of processes */
    printf("Hello world from process %d of %d\n", rank, size);
    // removed code that printed IP address
    MPI_Finalize();
    return 0;
}
which printed every machine's IP twice.
EDIT_2
If I print (only) the grid, like in the example, I am getting for one computer:
Processes grid pattern:
0 1
2 3
and for two:
Processes grid pattern:
Executables were not the same in both nodes!
When I configured the cluster I had a very hard time, so when I ran into a problem with mounting, I just skipped it. As a result, code changes would appear on only one node. The code would behave weirdly unless some (or all) of the code was the same on both nodes.
I'm working on the ABC (artificial bee colony) algorithm, using MPI to optimize the Rastrigin function. My code's structure goes as follows:
I defined the control parameters,
I defined my variables and arrays,
I wrote my functions,
and called them in a for loop in my main.
Here is where I think my problem is: I have defined a total of 30 runs, but when I run the program it gets stuck on the 27th run. I'm running it on 4 nodes, but it still gets stuck! Any help?
Here is my main code:
int main (int argc, char* argv[])
{
    /* myRank, numProc, runtime, maxCycle, master, D, GlobalParams,
       GlobalMin and GlobalMins are globals defined earlier (omitted here) */
    int iter, run, j;
    double mean;
    mean = 0;
    srand(time(NULL));
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProc);

    for (run = 0; run < runtime; run++)
    {
        if (myRank == 0) {
            initial();
            MemorizeBestSource();
        }
        for (iter = 0; iter < maxCycle; iter++)
        {
            SendEmployedBees();
            if (myRank == 0) {
                CalculateProbabilities();
                SendOnlookerBees();
                MemorizeBestSource();
                SendScoutBees();
            }
        }
        if (myRank == master) {
            for (j = 0; j < D; j++)
                printf("GlobalParam[%d]: %f\n", j + 1, GlobalParams[j]);
            printf("%d. run: %e \n", run + 1, GlobalMin);
            GlobalMins[run] = GlobalMin;
            mean = mean + GlobalMin;
        }
    }
    if (myRank == master) {
        mean = mean / runtime;
        printf("Means of %d runs: %e\n", runtime, mean);
        getch();
        MPI_Finalize();
    }
}
I have 3 functions and 4 cores. I want to execute each function in a new thread using MPI and C++.
I wrote this:
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
size--;
if (rank == 0)
{
    Thread1();
}
else
{
    if (rank == 1)
    {
        Thread2();
    }
    else
    {
        Thread3();
    }
}
MPI_Finalize();
But it executes just Thread1(). How must I change the code?
Thanks!
Print to screen the current value of the variable size (possibly without decrementing it) and you will find 1. That is: "there is 1 process running".
You are likely running your compiled code the wrong way. Use mpirun (or mpiexec, depending on your MPI implementation) to execute it, i.e.
mpirun -np 4 ./MyCompiledCode
The -np parameter specifies the number of processes to start (this way, your MPI_Comm_size will be 4, as you expect).
Currently, though, you are not using anything specific to C++. You could consider a C++ binding of MPI such as Boost.MPI.
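For example, a minimal hello-world with Boost.MPI (assuming the Boost.MPI library is available) could look like:

#include <boost/mpi.hpp>
#include <iostream>

int main(int argc, char *argv[])
{
    boost::mpi::environment env(argc, argv);  // RAII wrapper: MPI_Init here, MPI_Finalize on destruction
    boost::mpi::communicator world;           // defaults to MPI_COMM_WORLD
    std::cout << "rank " << world.rank() << " of " << world.size() << std::endl;
    return 0;
}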
I worked a little on the code you provided, changing it a bit to produce a working MPI program.
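A minimal sketch of such a corrected program, consistent with the output shown below (the ThreadN bodies are assumed here to be simple printing stubs), is:

#include <cstdio>
#include <mpi.h>

// Stub implementations assumed for illustration; the real functions can do anything.
void Thread1() { printf("1 function started.\n"); printf("thread1\n"); printf("1 function ended.\n"); }
void Thread2() { printf("2 function started.\n"); printf("thread2\n"); printf("2 function ended.\n"); }
void Thread3() { printf("3 function started.\n"); printf("thread3\n"); printf("3 function ended.\n"); }

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("size is %d\n", size);
    if (rank == 0)
        Thread1();
    else if (rank == 1)
        Thread2();
    else
        Thread3();   // with -np 4, ranks 2 and 3 both run Thread3
    MPI_Finalize();
    return 0;
}

Since the ranks write to stdout concurrently, the output order (and interleaving) below may vary from run to run.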
FYI:
compilation (under gcc, mpich):
$ mpicxx -c mpi1.cpp
$ mpicxx -o mpi1 mpi1.o
execution
$ mpirun -np 4 ./mpi1
output
size is 4
size is 4
size is 4
2 function started.
thread2
3 function started.
thread3
3 function ended.
2 function ended.
size is 4
1 function started.
thread1
1 function ended.
Be aware that the stdout output of the different ranks is likely to be interleaved.
Are you sure you are compiling your code the right way?
Your problem is that MPI provides no way to feed console input into many processes; it goes only to the process with rank 0. The cause is the first few lines of main:
int main(int argc, char *argv[]) {
    int oper;
    std::cout << "Enter Size:";
    std::cin >> oper; // <------- The problem is right here
    Operations* operations = new Operations(oper);
    int rank, size;
    MPI_Init(&argc, &argv);
    int tid;
    MPI_Comm_rank(MPI_COMM_WORLD, &tid);
    switch (tid)
    {
All processes but rank 0 block waiting for console input that they cannot get. You should rewrite the beginning of your main function as follows:
int main(int argc, char *argv[]) {
    int oper;
    MPI_Init(&argc, &argv);
    int tid;
    MPI_Comm_rank(MPI_COMM_WORLD, &tid);
    if (tid == 0) {
        std::cout << "Enter Size:";
        std::cin >> oper;
    }
    MPI_Bcast(&oper, 1, MPI_INT, 0, MPI_COMM_WORLD);
    Operations* operations = new Operations(oper);
    switch (tid)
    {
It works as follows: only rank 0 displays the prompt and then reads the console input into oper. The value of oper is then broadcast from rank 0 so that all other processes obtain the correct value; each process then creates the Operations object and branches to the appropriate function.