Detecting not using MPI when running with mpirun/mpiexec

Detecting not using MPI when running with mpirun/mpiexec - c++

I am writing a program (in C++11) that can optionally be run in parallel using MPI. The project uses CMake for its configuration, and CMake automatically disables MPI if it cannot be found and displays a warning message about it.
However, I am worrying about a perfectly plausible use case whereby a user configures and compiles the program on an HPC cluster, forgets to load the MPI module, and does not notice the warning. That same user might then try to run the program, notice that mpirun is not found, include the MPI module, but forget to recompile. If the user then runs the program with mpirun, this will work, but the program will just run a number of times without any parallelization, as MPI was disabled at compile time. To prevent the user from thinking the program is running in parallel, I would like to make the program display an error message in this case.
My questions is: how can I detect that my program is being run in parallel without using MPI library functions (as MPI was disabled at compile time)? mpirun just launches the program a number of times, but does not tell the processes it launches about them being run in parallel, as far as I know.
I thought about letting the program write some test file, and then check if that file already exists, but apart from the fact that this might be tricky to do due to concurrency problems, there is no guarantee that mpirun will even launch the various processes on nodes that share a file system.
I also considered using a system variable to communicate between the two processes, but as far as I know, there is no system independent way of doing this (and again, this might cause concurrency issues, as there is no way to coordinate system calls between the various processes).
So at the moment, I have run out of ideas, and I would very much appreciate any suggestions that might help me achieve this. Preferred solutions should by operating system independent, although a UNIX-only solution would already be of great help.

Basically, you want to run a a detection of whether you are being run by mpirun etc. in your non-MPI code-path. There is a very similar question: How can my program detect, whether it was launch via mpirun that already presents one non-portable solution.
Check for environment variables that are set by mpirun. See e.g.:
http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
As another option, you could get the process id of the parent process and it's process name and compare it with a list of known MPI launcher binaries such as orted,slurmstepd,hydra??1. Everything about that is unfortunately again non-portable.
Since launching itself is not clearly defined by the MPI standard, there cannot be a standard way to detect it.
1: Only from my memory, please don't take the list literally.
From a user experience point of view, I would argue that always showing a clear message how the program is being run, such as:
Running FancySimulator serially. If you see this as part of mpirun, rebuild FancySimuilator with FANCYSIM_MPI=True.
or
Running FancySimulator in parallel with 120 MPI processes.
would "solve" the problem. A user getting 120 garbled messages will hopefully notice.

Related

A Function in an application(.exe) should be called only once regardless of how many times I run the same application

Suppose there are two functions, one to print "hello" and other to print "world" and I call these two functions inside the main function. Now, when I compile it will create a .exe file. When I run this .exe for the first time both functions will print "hello world".This .exe is terminated.
But if I run the same .exe for the second time or multiple times, only one function must execute ie. it should print only "world". I want to a piece of code or function that should only run once and after that, it should destroy itself and should be not be executed again regardless of how many times I run the application(.exe)
I can achieve this by accessing locally or windows registry and write some value for once and can check if that value is present, no need to execute this piece of code or function.
Can I achieve it without any external help that the application itself should be capable of performing this behaviour?
Any ideas are appreciated. thanks for reading

There is no coherent or portable way1 to do this from software without requiring the use of an external resource of some kind.
The issue is that you want the invocation of this process to be aware of the amount of times it has been executed, but the amount of times it has been executed is not a property that is recorded anywhere2. A program itself has no memory of its previous executions unless you program it do so.
Your best bet is to write out this information in some canonicalized location so that it can be read on later executions. This could be as a file in the filesystem (such as a hidden .firstrun file or something), or it could be through the registry (Windows specific), or some other environment-specific form of communication.
The main thing is that this must persist between executions and be available to your process.
1 You could potentially write code that overwrites the executable itself after the first invocation -- but this is extraordinarily brittle, and will be highly specific to the executable format. This is not an ideal nor recommended approach to solving this problem.
2 This is not a capability defined in the C or C++ standard. It's possible that there may be some specialized operating systems/flavors of linux that allow querying this -- but this is not something seen in most general-purpose operating systems. Generally the approach is communicate via an external resource.

Can I achieve it without any external help that the application itself
should be capable of performing this behaviour?
Not by any means defined by C or C++, and probably not on Windows at all.
You have to somehow, somewhere memorialize the fact that the one-time function has been called. If you have nothing but the compiled program to use for that, then the only alternative is to modify the program. Neither C nor C++ provides for live modification of the running program, much less for writing it back to the executable file containing its image.
Conceivably, if the program knows where to find or how to recreate its own source code, and if it knows how to run the compiler, then it could compile a modified version of itself. On Windows, however, it very likely could not overwrite its own executable file while it was running (though that would be possible on various other operating systems), so that would not solve the problem.
Moreover, note that any approach that involves modifying the executable would be at least a bit wonky, for different copies of the program would have their own, semi-independent idea of whether the one-time function had been run.
Basically, then, no.

Sharing data between processes when using MPI on windows

I have done a significant amount of testing trying to use windows named shared memory between multiple independently run programs, across two MPI hosts. The result has been MPI with admin rights not having windows privileges to access the Global\ shared memory.
If the MPI was to launch the EXE would they be considered child processes, and windows would allow memory access to them?
One of the processes contains DirectX, seems like it would be messy to incorporate DirectX directly into a MPI program, therefore I have kept them as independent EXE.
Previously asked about windows privileges for Intel MPI, on Intel's forms but no solutions found yet. ( https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/635157 ) Asking here in a more general sense to see if there are other approaches to this that I haven't been able to find.
Still looking for solutions to:
Gaining windows privileges for programs run with mpiexec
Running a Direct3D app with mpiexec
Decided to go with WinSockets seems promising.

How MPI launches the processes is highly implementation and configuration defined. Any assumptions will result in a poorly portable program. Evidently you cannot even assume that the processes are launched on the same node.
I see no reason not to incorporate MPI into a DirectX program. One solution to keep separate exe (although all using MPI), is to rely on MPMD.
See the OpenMPI FAQ, this should work similarly for IntelMPI or MPICH:
mpirun -np 2 app.exe : -np 1 dx-app.exe
Then use MPI communicators to separate the different kinds of processes and normal MPI facilities for communication (e.g. MPI-3 RMA, if you don't want messages).
You could even go for inter-communicators with MPI_COMM_SPAWN / MPI_COMM_CONNECT, but I see little benefit the way you describe the use-case.

MPI Fundamentals

I have a basic question regarding MPI, to get a better understanding of it (I am new to MPI and multiple processes so please bear with me on this one). I am using a simulation environment in C++ (RepastHPC) that makes extensive use of MPI (using the Boost libraries) to allow parallel operations. In particular, the simulation consists of multiple instances of the respective classes (i.e. agents), that are supposed to interact with each other, exchange information etc. Now given that this takes place on multiple processes (and given my rudimentary understanding of MPI) the natural question or fear I have is, that agents on different processes don't intereact with each other anymore because they cannot connect (I know, this contradicts the entire idea of MPI).
After reading the manual my understanding is this: the available libraries of Boost.MPI (and also the libaries of the above mentionend package) take care of all of the communication and sending packages back and forth between processes, i.e. each process has copies of the instances from other processes (I guess this is some form of call by value, b/c the original instance cannot be changed from a process that has only a copy), then an updating takes place, to ensure that the copies of the instances have the same information as the originals and so on.
Does this mean, that in terms of the final outcomes of the simulations runs, I get the same as if I would be doing the entire thing on one process? Put differently, the multiple processes are just supposed to speed up things but not change the design of simulation (thus I don't have to worry about it)?

I think you have a fundamental misunderstanding of MPI here. MPI is not an automatic parallelization library. It isn't a distributed shared memory mechanism. It doesn't do any magic for you.
What it does do is make it simpler to communicate between different processes on the same or different machines. Each process has its own address space which does not overlap with the other processes (unless you're doing something else outside of MPI). Assuming you set up your MPI installation correctly, it will do all of the pain of setting up the communication channels between your processes for you. It also gives you some higher level abstractions like collective communication.
When you use MPI, you compile your code differently than normal. Instead of using g++ -o code code.cpp (or whatever your compiler is), you use mpicxx -o code code.cpp. This will automatically link with all of the MPI stuff necessary. Then when you run your application, you use mpiexec -n <num_processes> ./code (other arguments aren't required, but are probably necessary) . The argument num_processes will tell MPI how many processes to launch. This isn't done at compile/link time.
You will also have to rewrite your code to use MPI. MPI has lots of functions (the standard is available here and there are lots of tutorials available on the web that are easier to understand) that you can use. The basics are MPI_Send() and MPI_Recv(), but there's lots and lots more. You'll have to find a tutorial for that.

Locating segmentation fault for multithread program running on cluster

It's quite straightforward to use gdb in order to locate a segmentation fault while running a simple program in interactive mode. But consider we have a multithread program - written by pthread - submitted to a cluster node (by qsub command). So we don't have an interactive operation.
How we can locate the segmentation fault? I am looking for a general approach, a program or test tool. I can not provide a reproducible example as the program is really big and crashes on the cluster in some unknown situations.
I need to find a problem in such hard situation because the program runs correctly on the local machine with any number of threads.

The "normal" approach is to have the environment produce a core file and get hold of those. If this isn't an option, you might want to try installing a signal handler for SIGSEGV which obtains, at least, a stack trace dumped somewhere. Of course, this immediately leads to the question "how to get a stack trace" but this is answered elsewhere.
The easiest approach is probably to get hold of a core file. Assuming you have a similar machine where the core file can be read, you can use gdb program corefile to debug the program program which produced the core file corefile: You should be able to look at the different threads, their data (to some extend), etc. If you don't have a suitable machine it may be necessary to cross-compile gdb matching the hardware of the machine where it was run.
I'm a bit confused about the statement that the core files are empty: You can set the limits for core files using ulimit on the shell. If the size for cores is set to zero it shouldn't produce any core file. Producing an empty one seems odd. However, if you cannot change the limits on your program you are probably down to installing a signal handler and dumping out a stack trace from the offending thread.
Thinking of it, you may be able to put the program to sleep in the signal handler and attach to it using a debugger, assuming you can run a debugger on the corresponding machine. You would determine the process ID (using, e.g., ps -elf | grep program) and then attach to it using
gdb program pid
I'm not sure how to put a program to sleep from within the program, though (possibly installing the handler for SIGSTOP for SIGSEGV...).
That said, I assume you tried running your program on your local machine...? Some problems are more fundamental than needing a distributed system of many threads running on each node. This is, obviously, not a replacement for the approach above.

Segmentation fault only when running on multi-core

I am using a c++ library that is meant to be multi-threaded and the number of working threads can be set using a variable. The library uses pthreads. The problem appears when I run the application ,that is provided as a test of library, on a quad-core machine using 3 threads or more. The application exits with a segmentation fault runtime error. When I try to insert some tracing "cout"s in some parts of library, the problem is solved and application finishes normally.
When running on single-core machine, no matter what number of threads are used, the application finishes normally.
How can I figure out where the problem seam from?
Is it a kind of synchronization error? how can I find it? is there any tool I can use too check the code ?

Sounds like you're using Linux (you mention pthreads). Have you considered running valgrind?
Valgrind has tools for checking for data race conditions (helgrind) and memory problems (memcheck). Valgrind may be to find such an error in debug mode without needing to produce the crash that release mode produces.

Some general debugging recommendations.
Make sure your build has symbols (compile with -g). This option is orthogonal to other build options (i.e. the decision to build with symbols is independent of the optimization level).
Once you have symbols, take a close look at the call stack of where the seg fault occurs. To do this, first make sure your environment is configured to generate core files (ulimit -c unlimited) and then after the crash, load the program/core in the debugger (gdb /path/to/prog /path/to/core). Once you know what part of your code is causing the crash, that should give you a better idea of what is going wrong.

You are running into a race condition.
Where multiple threads are interacting on the same resource.
There are a whole host of possible culprits, but without the source anything we say is a guess.
You want to create a core file; then debug the application with the core file. This will set up the debugger to the state of the application at the point it crashed. This will allow you to examin the variables/registers etc.
How to do this will very depending on your system.
A quick Google revealed this:
http://www.codeguru.com/forum/archive/index.php/t-299035.html
Hope this helps.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js