Consider the following program:
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[]){
int num_proc;
#ifdef MPI_VERSION
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
MPI_Finalize();
#else
num_proc = 1;
#endif
printf("%d\n", num_proc);
}
I want it to build as both an MPI and a non-MPI version.
That means when it is compiled and run without the MPI toolchain, as below, num_proc is set to 1.
g++ main.cpp && ./a.out
Whereas, if it is compiled and run with the MPI wrapper, as below, num_proc is set to 2.
mpicxx main.cpp && mpiexec -n 2 ./a.out
Is this possible? How?
I am trying to run the following example MPI code that launches 20 threads and keeps those threads busy for a while. However, when I check the CPU utilization using a tool like nmon or top I see that only a single thread is being used.
#include <cstdlib>  // for exit()
#include <iostream>
#include <thread>
#include <mpi.h>
using namespace std;
int main(int argc, char *argv[]) {
int provided, rank;
MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
if (provided != MPI_THREAD_FUNNELED)
exit(1);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
auto f = [](float x) {
float result = 0;
for (float i = 0; i < x; i++) { result += 10 * i + x; }
cout << "Result: " << result << endl;
};
thread threads[20];
for (int i = 0; i < 20; ++i)
threads[i] = thread(f, 100000000.f); // do some work
for (auto& th : threads)
th.join();
MPI_Finalize();
return 0;
}
I compile this code using mpicxx: mpicxx -std=c++11 -pthread example.cpp -o example and run it using mpirun: mpirun -np 1 ./example.
I am using Open MPI version 4.1.4 that is compiled with posix thread support (following the explanation from this question).
$ mpicxx --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
$ mpirun --version
mpirun (Open MPI) 4.1.4
$ ompi_info | grep -i thread
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
FT Checkpoint support: no (checkpoint thread: no)
$ mpicxx -std=c++11 -pthread example.cpp -o example
$ ./example
My CPU has 10 cores and 20 threads and runs the example code above without MPI on all 20 threads. So, why does the code with MPI not run on all threads?
I suspect I might need to do something with MPI bindings, which I see being mentioned in some answers on the same topic (1, 2), but other answers entirely exclude these options, so I'm unsure whether this is the correct approach.
mpirun -np 1 ./example assigns a single core to your program (so the 20 threads end up time-sharing it): this is the default behavior for Open MPI, which binds each MPI process to one core when running with -np 1 or -np 2.
./example (i.e. singleton mode) should use all the available cores, unless you are already running on a subset.
If you want to use all the available cores with mpirun, you can
mpirun --bind-to none -np 1 ./example
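To verify what binding was actually applied, Open MPI can print it per rank (a sketch; these flags are Open MPI-specific and their exact output varies by version):

```shell
# Show how each rank was bound; with --bind-to none the 20 threads
# are free to migrate across all cores.
mpirun --report-bindings --bind-to none -np 1 ./example

# Alternatively, bind each rank to a whole socket rather than one core.
mpirun --bind-to socket -np 1 ./example
```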
The library PETSc runs some test programs during configuration while checking the environment. One of those test programs is the following (reduced by removing two project-relative headers):
#include <stdlib.h>
#include <mpi.h>
int main() {
int size;
int ierr;
MPI_Init(0,0);
ierr = MPI_Type_size(MPI_LONG_DOUBLE, &size);
if(ierr || (size == 0)) exit(1);
MPI_Finalize();
return 0;
}
Configuration fails due to a timeout. When debugging the program, it gets stuck at the line MPI_Init(0, 0);, even though this call should be perfectly legal (passing null pointers to MPI_Init is allowed). I am using Open MPI 2 with g++ 9.2.1, running on openSUSE Tumbleweed.
The program is compiled using
mpicxx -O0 -g mpi_test.cpp -o mpi_test
I'm working on ABC algorithm using MPI to optimize Rastrigin function. My code's structure goes as follows:
I defined the control parameters.
I defined my variables and arrays.
I wrote my functions and called them in a for loop inside main.
I think my problem is in main: I set the total number of runs to 30, but when I run on 4 nodes it gets stuck on the 27th run. Any help?
Here is my main code:
int main (int argc, char* argv[])
{
int iter,run,j;
double mean;
mean=0;
srand(time(NULL));
MPI_Init (&argc, &argv);
MPI_Comm_rank (MPI_COMM_WORLD, &myRank);
MPI_Comm_size (MPI_COMM_WORLD, &numProc);
for(run=0;run<runtime;run++)
{
if(myRank==0){
initial();
MemorizeBestSource();
}
for (iter=0;iter<maxCycle;iter++)
{
SendEmployedBees();
if(myRank==0){
CalculateProbabilities();
SendOnlookerBees();
MemorizeBestSource();
SendScoutBees();
}
}
if(myRank==master){
for(j=0;j<D;j++)
printf("GlobalParam[%d]: %f\n",j+1,GlobalParams[j]);
printf("%d. run: %e \n",run+1,GlobalMin);
GlobalMins[run]=GlobalMin;
mean=mean+GlobalMin;
}
}
if(myRank==master){
mean=mean/runtime;
printf("Means of %d runs: %e\n",runtime,mean);
getch();
MPI_Finalize ();
}
}
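One likely culprit in the code above (a hedged guess, since the helper functions are not shown): MPI_Finalize() sits inside the if (myRank == master) branch, but it must be reached by every rank, or the other ranks hang waiting for it. If SendEmployedBees() performs communication on every rank while its peers run only on rank 0, mismatched sends/receives could likewise deadlock mid-run. A minimal sketch of the corrected tail of main:

```c
/* Sketch: only the report stays rank-specific; every rank finalizes.
 * getch() is non-standard (conio.h) and would also block; omitted here. */
if (myRank == master) {
    mean = mean / runtime;
    printf("Means of %d runs: %e\n", runtime, mean);
}
MPI_Finalize();  /* collective: called by all ranks, not only the master */
return 0;
```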
I have written a sample code below:
#include <stdio.h>
#include <mpi.h>
double x;
int main (int argc, char **argv) {
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (rank==0) x=10.1;
MPI_Barrier(MPI_COMM_WORLD);
printf("%f\n", x);
MPI_Finalize();
return 0;
}
As one may notice, this program actually defines a global variable called x and the zeroth thread tries to assign some value to it. When I have this program run on an SMP (Symmetric multiprocessing) machine with 4 cores I get the following results:
10.1
0
0
0
More interestingly, when I change my code so that each thread prints the address of variable x, i.e. &x, they all print the same thing.
My question is how it is possible that a number of threads on an SMP system share the same value for the address of a variable while they do not share the same value?
And my second question: how should I change the above code so that I get the following results?
10.1
10.1
10.1
10.1
You could use broadcast:
MPI_Bcast(&x,1,MPI_DOUBLE,0,MPI_COMM_WORLD);
This will send the value of x on process 0 to all other processes.
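For completeness, here is how the broadcast might slot into the sample program above (a sketch; note that it replaces the barrier, which only synchronizes and never moves data by itself):

```c
#include <stdio.h>
#include <mpi.h>

double x;  /* every process gets its own copy of this global */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) x = 10.1;                    /* only rank 0's copy changes */
    MPI_Bcast(&x, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);  /* copy rank 0's x to all */

    printf("%f\n", x);                          /* every rank prints 10.100000 */
    MPI_Finalize();
    return 0;
}
```

The addresses match across processes because each process has its own virtual address space laid out identically, not because the memory is shared; that is why the explicit broadcast is needed.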
I have 3 functions and 4 cores. I want to execute each function in a new thread using MPI and C++.
I wrote this:
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
size--;
if (rank == 0)
{
Thread1();
}
else
{
if(rank == 1)
{
Thread2();
}
else
{
Thread3();
}
}
MPI_Finalize();
But it executes just Thread1(). How must I change the code?
Thanks!
Print the current value of the variable size (before decrementing it) and you will find it is 1. That is: there is only 1 process running.
You are likely running your compiled code the wrong way. Consider using mpirun (or mpiexec, depending on your MPI implementation) to execute it, i.e.
mpirun -np 4 ./MyCompiledCode
the -np parameter specifies the number of processes you will start (doing so, your MPI_Comm_size will be 4 as you expect).
Currently, though, you are not using anything specific to C++. You could consider a C++ binding of MPI such as Boost.MPI.
I worked on the code you provided and changed it slightly, producing this working MPI code (the needed corrections are marked in capital letters).
FYI:
compilation (under gcc, mpich):
$ mpicxx -c mpi1.cpp
$ mpicxx -o mpi1 mpi1.o
execution
$ mpirun -np 4 ./mpi1
output
size is 4
size is 4
size is 4
2 function started.
thread2
3 function started.
thread3
3 function ended.
2 function ended.
size is 4
1 function started.
thread1
1 function ended.
Be aware that the stdout of the different ranks is likely to be interleaved.
Are you sure you are compiling your code the right way?
Your problem is that MPI feeds console input only to the process with rank 0, not to every process. Because of the first lines in main:
int main(int argc, char *argv[]){
int oper;
std::cout << "Enter Size:";
std::cin >> oper; // <------- The problem is right here
Operations* operations = new Operations(oper);
int rank, size;
MPI_Init(&argc, &argv);
int tid;
MPI_Comm_rank(MPI_COMM_WORLD, &tid);
switch(tid)
{
all processes but rank 0 block waiting for console input which they cannot get. You should rewrite the beginning of your main function as follows:
int main(int argc, char *argv[]){
int oper;
MPI_Init(&argc, &argv);
int tid;
MPI_Comm_rank(MPI_COMM_WORLD, &tid);
if (tid == 0) {
std::cout << "Enter Size:";
std::cin >> oper;
}
MPI_Bcast(&oper, 1, MPI_INT, 0, MPI_COMM_WORLD);
Operations* operations = new Operations(oper);
switch(tid)
{
It works as follows: only rank 0 displays the prompt and then reads the console input into oper. Then the value of oper is broadcast from rank 0 so all other processes obtain the correct value; every process then creates the Operations object and branches to the appropriate function.