Correct way to STOP with MPI - Fortran

I'm using MPI, and at some points I want to use STOP (or another method) to exit the program with an error message.
Right now, I'm doing something like this:
STOP 'Error'
But I have a feeling I'm doing something wrong. Do I need to call MPI_FINALIZE first? Is there something else to be doing?

In a catastrophic error condition, the usual way to exit is call MPI_Abort(MPI_COMM_WORLD, errcode, ierr). In most implementations, this will kill all tasks. In less drastic situations, you could make sure all the tasks know of the condition and then have them all exit more gracefully with MPI_Finalize.

Take a look at MPI_Abort:
The behavior of MPI_ABORT(comm, errorcode), for comm other than MPI_COMM_WORLD, is implementation-dependent. On the other hand, a call to MPI_ABORT(MPI_COMM_WORLD, errorcode) should always cause all processes in the group of MPI_COMM_WORLD to abort.

Testing this on NERSC supercomputers, I found that
call MPI_FINALIZE(ierr)
stop
cannot stop the whole program. The following works:
call MPI_Abort(MPI_COMM_WORLD, errcode, ierr)
stop

Related

Difference between exit and kill in C++

I have written a signal handler to handle a signal, and I want to kill the process if I receive that signal too many times. So which of the following is better, or should I use both?
exit(-1); // or some other exit code
kill(getpid(), SIGKILL);

You probably don't want either one, but what you do want is much closer to exit than to kill.
kill is something else coming in from the outside, and forcibly destroying a process. exit is the process itself deciding to quit executing. The latter is generally preferable.
As to why exit isn't the right answer either: most C++ code depends on destructors to clean up objects as you exit from a scope. If you call exit, that won't happen for local (automatic) objects. exit returns control to the OS without unwinding the stack, so their destructors never run. Only objects with static storage duration and functions registered with atexit are cleaned up.
Instead, you generally want to throw an exception that's typically only caught in main, and exits gracefully when it is caught:
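struct time_to_die {};  // exception type used only to request a graceful exit

void do_stuff();  // the program's real work; throws time_to_die{} to quit

int main() {
    try {
        do_stuff();
    }
    catch (time_to_die const &) {
        // the stack has been unwound; fall through to a normal return
    }
}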
The advantage in this case is that when you execute throw time_to_die{};, it automatically unwinds the stack, executing the destructors for all local objects as it goes. When it gets back to main, you get a normal exit with all destructors having executed. Assuming proper use of RAII, all your files, network connections, database connections, etc. have been closed as expected, any caches have been flushed, and so on, so you get a nice, graceful exit.
Short summary: as a rule of thumb, C++ code should never call exit. If your code is completely stuck in a crack and you want to exit immediately, call abort. If you want a semi-normal exit, throw an exception that will get you back to main, so you can clean things up and exit gracefully.
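Below is a minimal, self-contained sketch of the exception-based approach described above. `time_to_die`, `Resource`, `cleanup_log`, `do_stuff` and `run_program` are hypothetical names. `Resource` stands in for a file or connection, and `run_program` plays the role of `main` so the sketch can be exercised. It shows that both destructors run, innermost first, before control reaches the catch block.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical exception type used only to request a graceful shutdown.
struct time_to_die {};

// Records which cleanup actions actually ran, so the effect is observable.
std::vector<std::string> cleanup_log;

// Minimal RAII type: its destructor stands in for closing a file,
// a socket, a database connection, and so on.
struct Resource {
    std::string name;
    explicit Resource(std::string n) : name(std::move(n)) {}
    ~Resource() { cleanup_log.push_back("closed " + name); }
};

void inner_work() {
    Resource conn("connection");
    throw time_to_die{};  // instead of calling exit()
}

void do_stuff() {
    Resource file("file");
    inner_work();
}

// Plays the role of main(); returns the process exit status.
int run_program() {
    try {
        do_stuff();
    } catch (time_to_die const &) {
        // Both Resource destructors already ran during stack unwinding.
        return 1;
    }
    return 0;
}
```

In a real program, main would simply `return run_program();`. Had `inner_work` called `exit(1)` instead of throwing, `cleanup_log` would stay empty.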

One difference is that the kill function is not specified in the C++ standard library; it is specified only by POSIX. exit is standard C++.
Another difference is that kill(getpid(), SIGKILL) causes the operating system to terminate the process forcefully. exit instead performs cleanup (calling atexit callbacks, flushing streams, etc.) and terminates execution voluntarily.
So, which of the following code is better, or should I use them both?
Depends on the use case, but usually exit is more sensible, since one usually wants the cleanup that it provides.
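The sketch below makes the difference concrete. It uses POSIX only (Linux/macOS), so it is not portable C++. It forks a child that registers an atexit handler and then ends either with exit(-1) or with kill(getpid(), SIGKILL), as in the question. The handler writes one byte to a pipe, so the parent can tell whether cleanup ran. `Outcome`, `report_cleanup` and `run_child` are hypothetical names.

```cpp
#include <cstdlib>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// How a child process ended, as observed by its parent.
struct Outcome {
    bool exited_normally;   // true if the child called exit()
    int exit_status;        // meaningful only when exited_normally
    bool killed_by_signal;  // true if a signal terminated the child
    int signal_number;      // meaningful only when killed_by_signal
    bool handler_ran;       // true if the atexit handler wrote its byte
};

static int g_pipe_write_end = -1;

// atexit handler: tells the parent that exit-time cleanup happened.
static void report_cleanup() {
    char byte = 'x';
    (void)write(g_pipe_write_end, &byte, 1);
}

// Forks a child that registers report_cleanup, then terminates via
// exit(-1) or via kill(getpid(), SIGKILL).
Outcome run_child(bool use_kill) {
    int fds[2];
    if (pipe(fds) != 0) std::abort();
    pid_t pid = fork();
    if (pid < 0) std::abort();
    if (pid == 0) {
        close(fds[0]);
        g_pipe_write_end = fds[1];
        std::atexit(report_cleanup);
        if (use_kill) kill(getpid(), SIGKILL);  // dies here; nothing below runs
        std::exit(-1);                          // runs atexit handlers first
    }
    close(fds[1]);  // so read() sees EOF once the child is gone
    char byte = 0;
    bool handler_ran = read(fds[0], &byte, 1) == 1;
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    Outcome out{};
    out.exited_normally = WIFEXITED(status);
    out.exit_status = out.exited_normally ? WEXITSTATUS(status) : -1;
    out.killed_by_signal = WIFSIGNALED(status);
    out.signal_number = out.killed_by_signal ? WTERMSIG(status) : 0;
    out.handler_ran = handler_ran;
    return out;
}
```

With exit(-1) the parent sees a normal exit with status 255 (the low 8 bits of -1) and the handler's byte. With SIGKILL it sees a signal death and no byte.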

I would recommend exit(1).
Typically, an app would want to terminate gracefully if at all possible. SIGKILL is instant death for your process: your exit handlers won't be called, for example. In your case you would also have to call getpid as well as kill itself, whereas exit immediately initiates the graceful exit process. It's the right choice for your needs.
There's rarely a good architectural reason to use SIGKILL in general. There are many other signals (in POSIX you have SIGINT, SIGTERM, ...; see http://man7.org/linux/man-pages/man7/signal.7.html), and, to reiterate, there's no reason not to die gracefully if you can.
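Here is a sketch of the graceful alternative for the original question ("exit after too many signals"). It assumes SIGUSR1 and a threshold of 3; `kMaxSignals`, `counting_handler`, `install_handler` and `main_loop` are all hypothetical. The handler only counts and sets a flag, because exit() is not async-signal-safe. The main loop notices the flag and returns normally, so destructors and atexit handlers still run. Here `main_loop` raises the signal itself purely to simulate arrivals.

```cpp
#include <signal.h>

// Hypothetical threshold: how many signals we tolerate before shutting down.
constexpr int kMaxSignals = 3;

// A handler may safely write only to volatile sig_atomic_t objects.
volatile sig_atomic_t g_signal_count = 0;
volatile sig_atomic_t g_should_exit = 0;

extern "C" void counting_handler(int) {
    g_signal_count = g_signal_count + 1;
    if (g_signal_count >= kMaxSignals) {
        g_should_exit = 1;  // request shutdown; don't call exit() here
    }
}

// Installs counting_handler for signo with POSIX sigaction.
bool install_handler(int signo) {
    struct sigaction sa {};
    sa.sa_handler = counting_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, nullptr) == 0;
}

// Stand-in for the program's main loop: works until asked to stop, then
// returns normally so the caller can unwind and exit cleanly.
int main_loop(int signo) {
    int iterations = 0;
    while (!g_should_exit) {
        ++iterations;
        raise(signo);  // simulate one signal per iteration
    }
    return iterations;
}
```

`raise` does not return until the handler has run, so the loop stops after exactly `kMaxSignals` iterations.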

What is the difference between "stop" and "exit" in Fortran?

What is the difference between stop and exit in Fortran?
Both can terminate the program immediately with some error information.

exit in Fortran is a statement which terminates loops or completes execution of other constructs. However, the question is clearly about the non-standard extension, offered by many compilers as either a function or a subroutine, which is closely related to the stop statement.
For example, gfortran offers such a thing.
As this use of exit is non-standard you should refer to a particular implementation's documentation as to what form this takes and what effects it has.
The stop statement, on the other hand, is a standard Fortran statement. This statement initiates normal termination of execution of a Fortran program (and can be compared with the error stop statement which initiates error termination).
Other than knowing that (normal) termination of the program follows a stop statement and that there is a stop code, the way that termination happens is again left open to the implementation. There are some recommendations (but these are only recommendations) as to what happens. For example, in Fortran 2008 it is suggested that
the stop code (which may be an integer or a string) be printed to an "error" unit;
if the stop code is an integer it be used as the process exit status;
if there is no stop code (or it's a character) zero be returned as the exit status.
This is fairly vague, as in many settings these concepts don't apply.
Typically in practice, exit will be similar to the C library's function of that name and its effect will be like stop without a stop code (but still passing the given status back to the OS).
In summary, Fortran doesn't describe a difference between stop and exit. Use of exit (for termination) is non-portable and even the effect of stop is not entirely defined.

stop is a Fortran statement, but exit is a (non-standard) subroutine that just happens to terminate the program.
The stop statement will output its argument [which can also be a string] to stderr
stop 123
and, depending on the compiler, it may return a zero status to the parent process (compilers that follow the Fortran 2008 recommendation return the integer stop code instead).
Whereas exit is a subroutine and must be called like any other. It will also be silent (i.e. no message):
call exit(123)
and the argument to exit will be returned to the parent process as its status.

main: return 0 hangs, exit 0 closes. How to debug?

I have a program that spawns three threads, does some communication between them, then closes them. The main thread waits for the last thread to close and then calls return 0.
But for some strange reason my program does not close but hangs when exiting with return 0, it closes fine with exit(0) however. I have already checked that the threads are really closed, I even forced them to close by issuing pthread_kill(pid, 0). I also tried valgrind to look for leaking memory.
As far as I understand the only thing exit() is not doing is calling the destructors of locally scoped non-static objects, but neither is there one in my main function nor would that explain why it hangs.
What is causing that behavior? How could I debug this?
code:
main.cpp: http://pastebin.com/7aN9KA6T
publisher.hpp: http://pastebin.com/Vhz1FKau
publisher.cpp: http://pastebin.com/09nh5YBs
boxoffice.hpp: http://pastebin.com/kaEbgNMJ
boxoffice.cpp: http://pastebin.com/wafaVcGV

You need to join each of your threads before returning.
bo_thread.join();
pub_thread.join();
sub_thread.join();
Also, pthread_kill(pid, 0) in the way you're using it has two issues.
It takes a pthread_t, not a pid. This can be obtained through boost::thread::native_handle.
Calling it doesn't actually 'kill' the thread. What it does depends on the second argument: with 0, no signal is sent and it only checks whether the thread exists. See the man page here: http://man7.org/linux/man-pages/man3/pthread_kill.3.html
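Below is a minimal sketch of both points. It uses std::thread rather than the question's boost::thread (an assumption; the pastebin code is not reproduced here). It also assumes a POSIX platform where std::thread::native_handle() yields a pthread_t, as with libstdc++ on Linux. `worker`, `g_stop`, `g_finished` and `run_and_join` are hypothetical names.

```cpp
#include <atomic>
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <vector>

// Shared state standing in for the publisher/box-office/subscriber threads.
std::atomic<bool> g_stop{false};
std::atomic<int> g_finished{0};

void worker() {
    while (!g_stop.load()) {
        std::this_thread::yield();  // placeholder for real work
    }
    g_finished.fetch_add(1);
}

// Starts three workers, checks they exist, asks them to stop, and joins
// every one of them before returning.
bool run_and_join() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 3; ++i) threads.emplace_back(worker);

    // pthread_kill with signal 0 sends nothing; it only reports whether
    // the thread ID is valid (return value 0 means it is).
    bool all_alive = true;
    for (auto &t : threads) {
        if (pthread_kill(t.native_handle(), 0) != 0) all_alive = false;
    }

    g_stop.store(true);
    for (auto &t : threads) t.join();  // a joinable std::thread must not be destroyed
    return all_alive;
}
```

With std::thread, forgetting a join is loud: destroying a still-joinable thread calls std::terminate.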

c++ How to stop all functions when a certain parameter is met

In C++, how can I stop or pause all other functions while this condition is met?
else if (packet[0] == 0x5)
{
}
Basically, I have a separate void function running a constant loop; however, if packet[0] == 0x5, I need it to stop all other threads (or pause them).

I don't think there is a direct way to stop or pause other threads (other than stopping the entire program with exit(0)). The only approach I'm aware of is a cooperative one, where the threads that need to stop or pause are notified in some way and proactively act upon this notification. What the notification looks like depends largely on your system. One approach could be an atomic flag indicating that there is a need to act. If your threads support messaging, sending each one a message is probably more lightweight.
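Here is a minimal sketch of the atomic-flag approach. It assumes the packet is a byte buffer and uses std::thread. `background_loop`, `handle_packet`, `g_stop_requested` and `g_iterations` are hypothetical names. The worker is never killed: it polls the flag and returns on its own.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Cooperative cancellation: the worker polls this flag; nobody kills it.
std::atomic<bool> g_stop_requested{false};
std::atomic<long> g_iterations{0};

// Stand-in for the "separate void function running a constant loop".
void background_loop() {
    while (!g_stop_requested.load(std::memory_order_acquire)) {
        g_iterations.fetch_add(1, std::memory_order_relaxed);  // placeholder work
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // Leaves the loop and returns normally, so its locals are destroyed.
}

// Hypothetical packet handler mirroring the condition in the question.
void handle_packet(const std::uint8_t *packet) {
    if (packet[0] == 0x5) {
        g_stop_requested.store(true, std::memory_order_release);
    }
}
```

Usage: start `std::thread t(background_loop);`, call `handle_packet` as packets arrive, and `t.join()` once the flag has been set. A pause rather than a stop could follow the same pattern, with the worker waiting on a std::condition_variable instead of returning.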

In <stdlib.h> there is a function exit(int Status_Code) which terminates execution of the whole program. You should call exit(1).
If you only want to terminate that function, just use return, e.g.
if(condition_met)
{
return;
}

Maybe put all those function calls into a thread and, when you realise you need to stop those functions, kill the thread.

Pthread Cancellation in c++

The following (pseudo-)code induces some behaviour which I don't understand. I have 2 threads running in parallel. Both do the same (complex) calculation, but I don't know which will finish first. Over repeated runs, there are cases where the first is faster and cases where the second is faster. This is okay and works as intended. Then the first successful thread should cancel the other thread, and both should be joined back in main. However, if the first solver finishes first, everything works out, but if the second finishes first, the join does not recognize that the first solver has been cancelled (so the join waits forever and the program does not continue). Any ideas what I did wrong or what I could do differently?
void* thread_function(...)
{
    do_some_complex_task();
    if (successful && first_finished)
    {
        pthread_cancel(other_thread);
    }
    return NULL;
}
int main()
{
    // the thread handles themselves; pthread_create writes into them
    std::vector<pthread_t> store(2);
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
    pthread_create(&store[0], NULL, thread_function, ...);
    pthread_create(&store[1], NULL, thread_function, ...);
    pthread_join(store[0], NULL);
    pthread_join(store[1], NULL);
}
PS. If the pseudo code is not detailed enough please let me know.

Based on the pseudo code, one problem may be that the threads have deferred cancellation (the default) instead of asynchronous cancellation. If the cancelled thread never reaches a cancellation point, then pthread_join would block. Calling pthread_setcanceltype in main does not help, because every newly created thread starts with cancellation enabled and deferred. When calling pthread_create, the newly created thread inherits the calling thread's:
signal mask (pthread_sigmask)
floating point environment (fenv)
capabilities
CPU affinity mask (sched_setaffinity)
Try invoking pthread_setcanceltype from within threads you wish to cancel. You may also want to consider attaching a debugger to the program to identify the current state of the threads.
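The following sketch illustrates the point about cancellation points. It keeps the default deferred cancel type and makes the computation cancellable by calling pthread_testcancel() between chunks. Without that call, or some other cancellation point in the loop, the pthread_join below could block forever. `solver`, `g_chunks_done` and `cancel_and_join` are hypothetical names.

```cpp
#include <atomic>
#include <pthread.h>

std::atomic<long> g_chunks_done{0};

// Hypothetical solver: a long computation split into chunks. Pure arithmetic
// contains no cancellation point, so one is added explicitly per chunk.
extern "C" void *solver(void *) {
    for (;;) {
        volatile double x = 0;
        for (int i = 0; i < 1000; ++i) x = x + i;  // one chunk of computation
        g_chunks_done.fetch_add(1);
        pthread_testcancel();  // a pending deferred cancel is acted on here
    }
    return nullptr;
}

// Starts the solver, cancels it, and reports whether join saw the cancel.
bool cancel_and_join() {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, solver, nullptr) != 0) return false;
    while (g_chunks_done.load() == 0) {
        // wait until the solver is actually running
    }
    pthread_cancel(tid);
    void *result = nullptr;
    pthread_join(tid, &result);
    return result == PTHREAD_CANCELED;
}
```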
Ideally, if it is at all possible, consider avoiding pthread_cancel. Although the pthread-calls are documented, it can be difficult to obtain and maintain the exact behavior due to all of the minor details. It is generally easier to set a flag that indicates that the thread is set to exit, and begin to return from functions when it is set. In do_some_complex_task, consider breaking the complex task into smaller tasks, and check for cancellation between each smaller task.
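Here is a sketch of that flag-based alternative, applied to the question's two racing solvers. It uses std::thread instead of raw pthreads (an assumption). `solve`, `race`, `g_done` and `g_winner` are hypothetical names. The sleep stands in for one small sub-task, and the step counts only make the race outcome predictable.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Shared "someone already finished" flag, replacing pthread_cancel.
std::atomic<bool> g_done{false};
std::atomic<int> g_winner{-1};

// Hypothetical solver split into small steps; it checks the flag between them.
void solve(int id, int steps) {
    for (int i = 0; i < steps; ++i) {
        if (g_done.load()) return;  // the other solver won: stop early, cleanly
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // one sub-task
    }
    bool expected = false;
    if (g_done.compare_exchange_strong(expected, true)) {
        g_winner.store(id);  // only the first finisher gets here
    }
}

// Runs both solvers, joins both (neither join can block forever), and
// returns the id of the winner.
int race(int steps0, int steps1) {
    std::thread a(solve, 0, steps0);
    std::thread b(solve, 1, steps1);
    a.join();
    b.join();
    return g_winner.load();
}
```

The compare-and-exchange guarantees a single winner even if both solvers finish at the same moment. The losing solver returns from its own function, so its locals are destroyed normally.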

Don't use pthread_cancel; use a global state and check it inside the "main loop" of each thread. Set it "on" just before the thread's function exits.
