My test program works fine when I run multiple processes on a single machine.
$ ./mpirun -np 2 ./mpi-test
Hi I'm A:0
Hi I'm A:1
A:1 sending 11...
A:1 sent 11
A:0 received 11 from 1
all workers checked in!
When I run the same program on multiple hosts, a process is spawned on each host, but MPI_Send never returns.
$ ./mpirun -np 2 -host A,B ./mpi-test
Hi I'm A:0
Hi I'm B:1
B:1 sending 11...
I've tried a couple of other sample MPI programs I found and ran into the same problem. Any idea what is going wrong?
EDIT: the program also runs fine on a remote machine if all the processes are spawned on that machine.
Code:
#include <mpi.h>
#include <cstdio> // printf

int main(int argc, char** argv)
{
    MPI::Init();
    int rank = MPI::COMM_WORLD.Get_rank();
    int size = MPI::COMM_WORLD.Get_size();
    char name[256];
    int len;
    MPI::Get_processor_name(name, len);
    printf("Hi I'm %s:%d\n", name, rank);
    if (rank == 0) {
        // rank 0 collects one message from every other rank
        while (size > 1) {
            int val;
            MPI::Status status;
            MPI::COMM_WORLD.Recv(&val, 1, MPI::INT, MPI::ANY_SOURCE, MPI::ANY_TAG, status);
            int source = status.Get_source();
            printf("%s:0 received %d from %d\n", name, val, source);
            size--;
        }
        printf("all workers checked in!\n");
    }
    else {
        // every other rank sends one int to rank 0
        int val = rank + 10;
        printf("%s:%d sending %d...\n", name, rank, val);
        MPI::COMM_WORLD.Send(&val, 1, MPI::INT, 0, 0);
        printf("%s:%d sent %d\n", name, rank, val);
    }
    MPI::Finalize();
    return 0;
}
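(Note, as an aside and not the cause of the hang: the MPI:: C++ bindings used here were deprecated in MPI-2.2 and removed in MPI-3, so newer MPI installations may not ship them. A minimal sketch of the same test using the plain C API, which should behave identically:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);
    printf("Hi I'm %s:%d\n", name, rank);

    if (rank == 0) {
        while (size > 1) { // collect one message from every worker
            int val;
            MPI_Status status;
            MPI_Recv(&val, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            printf("%s:0 received %d from %d\n", name, val, status.MPI_SOURCE);
            size--;
        }
        printf("all workers checked in!\n");
    } else {
        int val = rank + 10;
        printf("%s:%d sending %d...\n", name, rank, val);
        MPI_Send(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        printf("%s:%d sent %d\n", name, rank, val);
    }

    MPI_Finalize();
    return 0;
}
)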
EDIT: ompi_info
$ ./mpirun --bynode -host A,B --tag-output ompi_info -v ompi full --parsable
[1,0]<stdout>:package:Open MPI user@A Distribution
[1,0]<stdout>:ompi:version:full:1.4.3
[1,0]<stdout>:ompi:version:svn:r23834
[1,0]<stdout>:ompi:version:release_date:Oct 05, 2010
[1,0]<stdout>:orte:version:full:1.4.3
[1,0]<stdout>:orte:version:svn:r23834
[1,0]<stdout>:orte:version:release_date:Oct 05, 2010
[1,0]<stdout>:opal:version:full:1.4.3
[1,0]<stdout>:opal:version:svn:r23834
[1,0]<stdout>:opal:version:release_date:Oct 05, 2010
[1,0]<stdout>:ident:1.4.3
[1,1]<stdout>:package:Open MPI user@B Distribution
[1,1]<stdout>:ompi:version:full:1.4.3
[1,1]<stdout>:ompi:version:svn:r23834
[1,1]<stdout>:ompi:version:release_date:Oct 05, 2010
[1,1]<stdout>:orte:version:full:1.4.3
[1,1]<stdout>:orte:version:svn:r23834
[1,1]<stdout>:orte:version:release_date:Oct 05, 2010
[1,1]<stdout>:opal:version:full:1.4.3
[1,1]<stdout>:opal:version:svn:r23834
[1,1]<stdout>:opal:version:release_date:Oct 05, 2010
[1,1]<stdout>:ident:1.4.3
I ended up upgrading to 1.5.3 on A and installing 1.5.3 on C. I'm not sure whether it was the upgrade, or an issue with B, but everything is working now.
For reference:
original setup: node A (Arch Linux, Open MPI 1.4.3), node B (Ubuntu, Open MPI 1.4.3)
working setup: node A (Arch Linux, Open MPI 1.5.3), node C (Arch Linux, Open MPI 1.5.3)
The usual reason for this is that something is not set up properly on the remote host; it could be login/network problems, or that the MPI libraries/executables or the program itself isn't found on the remote host.
What happens if you try the following command?
mpirun -np 2 -host A,B hostname
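If that prints both hostnames, the launcher is working and the hang is more likely in the MPI communication itself; with Open MPI a common culprit is a firewall blocking the TCP connections the processes open between hosts. If the hostname test also hangs, check the remote-login path first: ssh B hostname should complete without a password prompt, and the Open MPI executables/libraries and your program need to be reachable on B at the same paths as on A.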
Related
When the number of nodes is set to 3, the program runs normally, as the following commands show:
[changmx@gpu02 mpiTest]$ mpiexec -n 4 -host gpu02,gpu03,gpu04 helloworld
[gpu04:16537] [[37424,0],2] remote spawn is NULL!
[gpu03:01562] [[37424,0],1] remote spawn is NULL!
Hello World! Process 1 of 4 on gpu02
Hello World! Process 3 of 4 on gpu04
Hello World! Process 0 of 4 on gpu02
Hello World! Process 2 of 4 on gpu03
[changmx@gpu02 mpiTest]$ mpiexec -n 4 -host gpu02,gpu03,gpu05 helloworld
[gpu03:01597] [[37381,0],1] remote spawn is NULL!
[gpu05:26312] [[37381,0],2] remote spawn is NULL!
Hello World! Process 0 of 4 on gpu02
Hello World! Process 1 of 4 on gpu02
Hello World! Process 2 of 4 on gpu03
Hello World! Process 3 of 4 on gpu05
But when the number of nodes is 4, the program neither runs nor exits unless I press Ctrl-C:
[changmx@gpu02 mpiTest]$ mpiexec -n 4 -host gpu02,gpu03,gpu04,gpu05 helloworld
[gpu04:16671] [[37833,0],2] remote spawn is NULL!
[gpu03:01731] [[37833,0],1] remote spawn is NULL!
Below is my source code:
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <mpi.h>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>

int main(int argc, char *argv[])
{
    int myrank, numprocs;
    int namelen = 20;
    char process_name[namelen];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Get_processor_name(process_name, &namelen);
    printf("Hello World! Process %d of %d on %s\n", myrank, numprocs, process_name);
    MPI_Finalize();
    return 0;
}
My Open MPI version is 1.8.8.
This problem was most likely caused by an incorrect installation of Open MPI: after I changed the Open MPI version, the problem did not appear again.
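For anyone debugging something similar: since mpiexec can launch any executable, one quick consistency check (the same trick as the ompi_info run earlier on this page) is to run ompi_info on every node through mpiexec, e.g. mpiexec -n 4 -host gpu02,gpu03,gpu04,gpu05 ompi_info --parsable, and verify that every node reports the same Open MPI version and build information.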
I need some help using ngspice as a library in a WebAssembly (wasm) project.
I installed emsdk and the newest version of emcc (1.39.20), and downloaded the source of ngspice version 32.
To my greatest surprise, I was able to compile ngspice to the wasm target by following this guide:
emconfigure ./configure --with-ngshared --disable-debug
emmake make
(I had to patch configure a little to pass the checks, by adding a.out.js and a.out.wasm to this line:)
# The possible output files:
ac_files="a.out a.out.js a.out.wasm conftest.exe conftest a.exe a_out.exe b.out conftest.*"
This produced a libngspice.so.0.0.0 file that I tried to link to from C++ code. However, that failed with duplicate symbol: main. So it seemed that libngspice.so.0.0.0 contained a main function, which shouldn't have been there if I understand the purpose of the configure script's --with-ngshared option correctly.
So I manually removed the main function from ngspice's main.c and recompiled using the above method. This time I could successfully compile my own project, linking against ngspice. However, when I call ngSpice_Init, I receive the following runtime errors:
stderr Note: can't find init file.
exception thrown: RuntimeError: unreachable executed,@http://localhost:8001/sim.js line 1802 > WebAssembly.instantiate:wasm-function[67]:0x24e9
@http://localhost:8001/sim.js line 1802 > WebAssembly.instantiate:wasm-function[88]:0x423b
...
Minimal reproducible steps:
compile ngspice as above
compile the code below using em++ -o sim.html sim.cpp lib/libngspice.so
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "sharedspice.h"

using namespace std;

int recieve_char(char* str, int id, void* p){
    printf("recieved %s\n", str);
}

int recieve_stat(char* status, int id, void* p){
    printf("status: %s\n", status);
}

int ngexit(int status, bool unload, bool exit, int id, void* p){
    printf("exit: %d\n", status);
}

int recieve_data(vecvaluesall* data, int numstructs, int id, void* p){
    printf("data recieved: %f\n", data->vecsa[0]->creal);
}

int recieve_init_data(vecinfoall* data, int id, void* p){
    printf("init data recieved from: %d\n", id);
}

int ngrunning(bool running, int id, void* p){
    if(running){
        printf("ng is running\n");
    }else{
        printf("ng is not running\n");
    }
}

int main(){
    ngSpice_Init(&recieve_char, &recieve_stat, &ngexit,
                 &recieve_data, &recieve_init_data, &ngrunning, (void*)NULL);

    char** circarray = (char**)malloc(sizeof(char*) * 7);
    circarray[0] = strdup("test array");
    circarray[1] = strdup("V1 1 0 1");
    circarray[2] = strdup("R1 1 2 1");
    circarray[3] = strdup("C1 2 0 1 ic=0");
    circarray[4] = strdup(".tran 10u 3 uic");
    circarray[5] = strdup(".end");
    circarray[6] = NULL;

    ngSpice_Circ(circarray);
    ngSpice_Command("run");
    return 0;
}
So could someone please help me correctly compile the ngspice library to the wasm target?
(Before someone asks, yes, I've seen this question, but it didn't help much)
I was able to compile the library and my example code after making several changes to the ngspice source. The patch and a guide on how to compile ngspice to wasm can be found here.
(The issue leading to the error shown in my question was the example code not returning anything from functions that by signature should return int. This is not tolerated in wasm.)
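For completeness, this is what the fix looks like applied to the callbacks from my question; a minimal sketch that just returns 0 from each (sharedspice.h declares them as returning int, and as far as I can tell ngspice does not use the return values):

int recieve_char(char* str, int id, void* p){
    printf("recieved %s\n", str);
    return 0; // was missing: falling off the end of a non-void function traps in wasm
}
int recieve_stat(char* status, int id, void* p){
    printf("status: %s\n", status);
    return 0;
}
// ...and likewise add "return 0;" to ngexit, recieve_data,
// recieve_init_data and ngrunning.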
I have to work on a code written a few years ago which uses MPI and PETSc.
When I try to run it, I get an error in the call to MPI_Comm_rank().
Here is the beginning of the code:
int main(int argc, char **argv)
{
    double mesure_tps2, mesure_tps1;
    struct timeval tv;
    time_t curtime2, curtime1;
    char help[] = "Solves linear system with KSP.\n\n"; // NB: Petsc is defined in "fafemo_Constant_Globales.h"

    std::cout << "starting PetscInitialize" << std::endl;
    (void*) PetscInitialize(&argc, &argv, (char *)0, help);
    std::cout << "PetscInitialize done" << std::endl;

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    PetscFinalize();
}
Obviously, there is some code between MPI_Comm_rank() and PetscFinalize().
PetscInitialize and PetscFinalize call MPI_Init and MPI_Finalize, respectively.
In my makefile I have:
PETSC_DIR=/home/thib/Documents/bibliotheques/petsc-3.13.2
PETSC_ARCH=arch-linux-c-debug
include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules
PETSC36 = -I/home/thib/Documents/bibliotheques/petsc-3.13.2/include -I/home/thib/Documents/bibliotheques/petsc-3.13.2/arch-linux-c-debug/include
Mpi_include=-I/usr/lib/x86_64-linux-gnu/openmpi
# a variable with some file names
fafemo_files = fafemo_CI_CL-def.cc fafemo_Flux.cc fafemo_initialisation_probleme.cc fafemo_FEM_setup.cc fafemo_sorties.cc fafemo_richards_solve.cc element_read_split.cpp point_read_split.cpp read_split_mesh.cpp
PETSC_KSP_LIB_VSOIL=-L/home/thib/Documents/bibliotheques/petsc-3.13.2/ -lpetsc_real -lmpi -lmpi++
fafemo: ${fafemo_files} fafemo_Richards_Main.o
g++ ${CXXFLAGS} -g -o fafemo_CD ${fafemo_files} fafemo_Richards_Main.cc ${PETSC_KSP_LIB_VSOIL} $(PETSC36) ${Mpi_include}
Using g++ or mpic++ doesn't seem to change anything.
It compiles, but when I try to execute it, I get:
[thib-X540UP:03696] Signal: Segmentation fault (11)
[thib-X540UP:03696] Signal code: Address not mapped (1)
[thib-X540UP:03696] Failing at address: 0x44000098
[thib-X540UP:03696] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3efd0)[0x7fbfa87e4fd0]
[thib-X540UP:03696] [ 1] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Comm_rank+0x42)[0x7fbfa9533c42]
[thib-X540UP:03696] [ 2] ./fafemo_CD(+0x230c8)[0x561caa6920c8]
[thib-X540UP:03696] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fbfa87c7b97]
[thib-X540UP:03696] [ 4] ./fafemo_CD(+0x346a)[0x561caa67246a]
[thib-X540UP:03696] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node thib-X540UP exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Also, I have other MPI programs on my computer and I have never had such a problem.
Does anyone know why I get this?
If someone has the same issue:
When I installed PETSc, I ran ./configure with --download-mpich while I already had MPI installed on my computer.
To solve the problem I did "rm -rf ${PETSC_ARCH}" and ran ./configure again.
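(My understanding of what went wrong, for context: --download-mpich builds PETSc against its own MPICH, while the link line in the makefile above pulls in -lmpi from the system Open MPI, so the executable ended up mixing two different MPI implementations; the first real MPI call, MPI_Comm_rank, then crashed. Rebuilding PETSc against a single MPI installation removes the mismatch.)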
I downloaded and installed mpich2-1.0.8p1-win-x86-64.msi from the console with administrator rights. I created an empty Win32 console project, created a file code.cpp, and pasted in this example code.
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int ProcNum, ProcRank, RecvRank;
    MPI_Status Status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &ProcNum);
    MPI_Comm_rank(MPI_COMM_WORLD, &ProcRank);

    if (ProcRank == 0)
    {
        printf("\n Hello from process %3d", ProcRank);
        for (int i = 1; i < ProcNum; i++)
        {
            MPI_Recv(&RecvRank, 1, MPI_INT, MPI_ANY_SOURCE,
                     MPI_ANY_TAG, MPI_COMM_WORLD, &Status);
            printf("\n Hello from process %3d", RecvRank);
        }
    }
    else
        MPI_Send(&ProcRank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
Later I went to the project properties, to VC++ Directories, and added the include and library directories. In Linker / Input / Additional Dependencies I added mpi.lib, and in C/C++ / Language I enabled OpenMP support. When I compiled my project I got strange errors. Can you help me? I can't understand what I did wrong, because I followed tutorials.
Your first (and only) warning states that you are linking a 64-bit library into a 32-bit build. You need to either provide a 32-bit library or build for a 64-bit architecture to get rid of the linker errors. (In Visual Studio you can switch the target architecture under Build / Configuration Manager by setting the active solution platform to x64.)
I am trying to build a multithreading web service. Single threading is working; in my main function I use this:
int main(int argc, char **argv) {
    CardSoapBindingService CardSrvc;
    Config Conf;
    Conf.update();
    int port = Conf.listener_port;
    if (!port)
        CardSrvc.serve();
    else {
        if (CardSrvc.run(port)) {
            CardSrvc.soap_stream_fault(std::cerr);
            exit(-1);
        }
    }
    return 0;
}
But I want multithreading, so I looked in the documentation and found their example, which I tried instead of my code. While compiling I get these errors:
main.cpp: In function `int main(int, char**)':
main.cpp:56: error: `soap_serve' undeclared (first use this function)
main.cpp:56: error: (Each undeclared identifier is reported only once for each
function it appears in.)
main.cpp: In function `void* process_request(void*)':
main.cpp:101: error: `soap_serve' undeclared (first use this function)
make: *** [main.o] Fehler 1
How can I get this working?
Important:
This code requires gsoap version 2.8.5 as a minimum. It was initially built on Solaris 8 with gsoap version 2.8.3; porting the code to Ubuntu and running under valgrind showed that the 2.8.3 gsoap++ library was corrupting memory, which led to a SIGSEGV. It should be noted that as of 25/11/11 the version of gsoap that Ubuntu installs using apt-get is the broken 2.8.3. A manual download and build of the latest version of gsoap was required (make sure to install flex and bison before you configure the gsoap build!).
Using gsoap 2.8.5, the code below happily creates threads and serves SOAP messages to multiple clients, and valgrind now reports 0 errors in memory allocation.
Looking at your code, the example you have working was built with the -i (or -j) option to create C++ objects. The thread examples in the gsoap documentation are written in standard C; hence the reference to functions such as soap_serve(), which you don't have.
Below is my quick rewrite of the multithreaded example to use the generated C++ objects. It is based on the following definition file:
// Content of file "calc.h":
//gsoap ns service name: Calculator
//gsoap ns service style: rpc
//gsoap ns service encoding: encoded
//gsoap ns service location: http://www.cs.fsu.edu/~engelen/calc.cgi
//gsoap ns schema namespace: urn:calc
//gsoap ns service method-action: add ""
int ns__add(double a, double b, double &result);
int ns__sub(double a, double b, double &result);
int ns__mul(double a, double b, double &result);
int ns__div(double a, double b, double &result);
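For reference, and assuming a stock gsoap toolchain: running the gsoap compiler on this header with the -i option (soapcpp2 -i calc.h) is what produces the soapCalculatorService.h header and the CalculatorService class used below; the -i option makes the generated classes inherit the soap context, while -j is similar but keeps a pointer to it instead.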
The main server code then looks like this:
#include "soapCalculatorService.h" // get server object
#include "Calculator.nsmap" // get namespace bindings
#include <pthread.h>
void *process_request(void *calc) ;
int main(int argc, char* argv[])
{
CalculatorService c;
int port = atoi(argv[1]) ;
printf("Starting to listen on port %d\n", port) ;
if (soap_valid_socket(c.bind(NULL, port, 100)))
{
CalculatorService *tc ;
pthread_t tid;
for (;;)
{
if (!soap_valid_socket(c.accept()))
return c.error;
tc = c.copy() ; // make a safe copy
if (tc == NULL)
break;
pthread_create(&tid, NULL, (void*(*)(void*))process_request, (void*)tc);
printf("Created a new thread %ld\n", tid) ;
}
}
else {
return c.error;
}
}
void *process_request(void *calc)
{
pthread_detach(pthread_self());
CalculatorService *c = static_cast<CalculatorService*>(calc) ;
c->serve() ;
c->destroy() ;
delete c ;
return NULL;
}
This is a very basic threading model but it shows how to use the C++ classes generated by gsoap to build a multithreaded server.
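The key design point is that every thread gets its own copy of the soap context (via copy()) rather than sharing c, since a single context cannot serve two requests at once. To exercise the server, here is a minimal client sketch; I am assuming that soapcpp2 -i also generated a CalculatorProxy class and a soapCalculatorProxy.h header (the exact names can differ between gsoap versions, so treat this as illustrative only):

#include "soapCalculatorProxy.h" // assumed name of the generated client proxy header
#include "Calculator.nsmap"
#include <cstdio>
#include <iostream>

int main()
{
    // assumed endpoint: the server above, listening on localhost:8080
    CalculatorProxy calc("http://localhost:8080");
    double result = 0.0;
    if (calc.add(1.0, 2.0, result) == SOAP_OK)
        printf("1 + 2 = %g\n", result);
    else
        calc.soap_stream_fault(std::cerr);
    return 0;
}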