exit fails to set error code - c++

I have a C++ Windows program that fails to set the exit code. The program is very complex and I'm currently unable to reproduce this with a simple test case. I do know that the program calls exit(1) because I have a breakpoint on that line. Immediately after I step over it, the debugger (VS2010) prints The program program.exe has exited with code 0 (0x0). When I run it from the shell, %ERRORLEVEL% is also set to 0.
I use subsystem:console and plain old main (no WinMain).
This only happens on Windows Server 2008 R2, not on my Windows 8.1 laptop. I'm running the same executable on both.
I have tried to use exit, _exit, ExitProcess, and return (the offending call is in main), but none of those seem to have any effect. I also have tried to return other codes, also with no result.
There's a similar question but I cannot reproduce the results described in it. My program does use threads.
How can I approach debugging this issue? I'm rather baffled.

I have tried to use exit, _exit, ExitProcess, and return
You've eliminated all reasonable explanations, particularly with ExitProcess(). There is only one possibility left: you need to try TerminateProcess(). If that still doesn't set the exit code then you need to shove that machine out of a 4th story window.
But with the expectation that it now works. The difference between ExitProcess() and TerminateProcess() is that the former ensures that all DLLs are notified of the termination. Their DllMain() function gets called with fdwReason = DLL_PROCESS_DETACH, which gives a DLL the opportunity to do something icky like calling Exit/TerminateProcess() itself, thus screwing up the exit code.
Finding such a DLL can be difficult if you don't have all the source code. It could be an injected one as well; there are entirely too many around these days. The best thing to do is to set a breakpoint on the system call so you can catch it in the act; you probably want to do this regardless.
Once you step into main(), use Debug > New Breakpoint > Break at Function and enter {,,ntdll.dll}_NtTerminateProcess#8. Press F5 and the debugger now stops just before the program terminates. Look at the Call Stack to find the evil-doer.

Strange symptoms involving exit(), _exit(), ExitProcess(), and others in a multithreaded program - particularly if the symptoms vary between hosts - have a smell of a variable being modified or accessed by different threads, without synchronisation.
Looking at the other thread you linked to, it appears you are using a volatile variable to communicate between threads, but not using any form of synchronisation (for example, code which accesses the value of that variable and code that modifies that value need to cooperate via means of a critical section, mutex, or comparable construct).
That little bit of indirect evidence makes the smell even stronger.
The basic problem I suspect is that declaring a variable as volatile is neither necessary nor sufficient to ensure that variable always has values that will make sense to your program. In particular, it is not sufficient to prevent a thread which is modifying a variable from being preempted when the modification is only partly complete, and for another thread to attempt accessing or modifying the affected variable.
If you look up some articles by Herb Sutter (particularly those concerned with thread synchronisation in his "Guru of the Week" series) you will find detailed explanations of why that is so. Other authors also describe such things, but Sutter's articles are ones that I recall offhand.
The solution is to introduce some means of synchronisation, and for EVERY thread in your program to religiously use it before accessing or modifying variables shared between them. This avoids the various problems (race conditions, operations being preempted partway through) that would cause symptoms like you describe.
Such problems are rarely caught by stepping through with a debugger. The reason for that is that the symptoms are an emergent property. Several unlikely and often independent occurrences, in disparate threads of execution, must occur together. Debuggers do typically change the timing of events in programs, and timing is a critical consideration in the symptoms emerging.
Options include making key variables atomic (so particular operations cannot be preempted), critical sections (where the threads explicitly cooperate within a program), or mutexes (which, depending on definition, allow threads in different programs to explicitly cooperate before accessing shared memory).
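As a minimal sketch of two of these options, assuming C++11 and using made-up function names (count_with_atomic and count_with_mutex are not from the question), the code below has several threads increment one shared counter. Both versions should arrive at the exact total; a plain or volatile counter without synchronisation carries no such guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: each of `threads` threads adds 1 to a shared
// std::atomic counter `per_thread` times. fetch_add is indivisible, so
// no increment can be lost when a thread is preempted.
long count_with_atomic(int threads, int per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// Same work, but the counter is a plain long that is only ever touched
// while holding a mutex.
long count_with_mutex(int threads, int per_thread) {
    long counter = 0;
    std::mutex guard;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, &guard, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(guard);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();  // join before reading the result
    return counter;
}
```

The atomic version is usually the cheaper choice for a single variable; the mutex version generalises to updates that touch several variables at once.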
Yes, this introduces a bottleneck in your program - a point where every thread must rendezvous and potentially wait for each other. That can affect throughput of your program. Some people advocate using volatile variables to avoid such concerns. More often than not, the result is intermittent symptoms in long running programs like you have described in this question and the "similar question" you linked to.
It doesn't matter whether you use standard means of synchronisation (e.g. introduced in C++11) or windows specific means (WIN API functions). The important thing is that you use a deliberate synchronisation method, rather than just making variables volatile. Different options for synchronisation have different trade-offs, so you will need to make a decision relevant to needs of your program.
Another consideration is to signal all threads so they close cleanly, wait until they are all closed, capture their exit codes, and THEN exit the program. It is often less error prone to do this in the thread running main() - which ultimately starts the process, so is more likely to have access to information it needs to cleanup correctly. If another thread decides the program needs to exit, then better if it communicates that need back to main() to do it.
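The shutdown pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the asker's code: the Shutdown struct and run_workers function are hypothetical names, and the "work" is a placeholder loop. Workers only request an exit code; the thread running main() joins everyone and is the only one that ends the process.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared state: workers only *request* an exit code;
// they never terminate the process themselves.
struct Shutdown {
    std::atomic<bool> stop{false};
    std::atomic<int> exit_code{0};

    void request(int code) {
        int expected = 0;
        // Keep the first non-zero code that any thread reports.
        exit_code.compare_exchange_strong(expected, code);
        stop.store(true);
    }
};

// Runs `workers` threads; the worker whose index equals `failing` reports
// an error (pass -1 for a clean run). Returns the code main() should return.
int run_workers(int workers, int failing) {
    Shutdown shutdown;
    std::vector<std::thread> pool;
    for (int id = 0; id < workers; ++id) {
        pool.emplace_back([&shutdown, id, failing] {
            if (id == failing) {
                shutdown.request(1);  // tell main() that we must exit with 1
                return;
            }
            while (!shutdown.stop.load()) {
                std::this_thread::yield();  // stand-in for real work
            }
        });
    }
    // Nobody will fail: main() itself asks the workers to stop.
    if (failing < 0 || failing >= workers) shutdown.stop.store(true);
    for (auto& th : pool) th.join();  // wait for every thread to finish
    return shutdown.exit_code.load();
}
```

With this shape, main() can simply end with `return run_workers(8, -1);`, so the exit code is set from one place after all other threads are gone.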

Related

How to release resources if a program crashes

I have a program that uses services from others. If the program crashes, what is the best way to close those services? On the server side, I would define some checkers that periodically monitor whether a client is invalid. But can we do anything at the client? I am not sure if normal RAII still works in this case. My code is written in C and C++.
If your application experiences a hard crash, then no, your carefully crafted cleanup code will not run, whether it is part of an RAII paradigm or a method you call at the end of main. None of an application's cleanup code runs after a crash that causes the application to be terminated.
Of course, this is not true for exceptions. Although those might eventually cause the application to be terminated, they still trigger this termination in a controlled way. Generally, the runtime library will catch an unhandled exception and trigger termination. Along the way, your RAII-based cleanup code will typically be executed (strictly speaking, whether the stack is unwound for an exception that nothing catches is implementation-defined, so a catch block in main makes this reliable), unless it also throws an exception. Then you're back to being unceremoniously ripped out of memory.
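A small sketch of the exception case, assuming the exception is caught somewhere up the stack. ServiceHandle is a hypothetical stand-in for whatever external service the program holds; it only records events in a log so the order is visible.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a remote service handle: the destructor
// records the release in a log so the order of events can be inspected.
class ServiceHandle {
public:
    ServiceHandle(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }
    ~ServiceHandle() { log_.push_back("release " + name_); }
    ServiceHandle(const ServiceHandle&) = delete;
    ServiceHandle& operator=(const ServiceHandle&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Acquires the service and then fails with an exception.
void use_service(std::vector<std::string>& log) {
    ServiceHandle handle("db", log);
    throw std::runtime_error("request failed");
}

// Returns the sequence of events observed around the failure.
std::vector<std::string> run_and_recover() {
    std::vector<std::string> log;
    try {
        use_service(log);
    } catch (const std::runtime_error& e) {
        // By the time we get here, the handle has already been destroyed.
        log.push_back(std::string("caught ") + e.what());
    }
    return log;
}
```

The release is logged before the handler runs, which is the stack unwinding the answer refers to. None of this happens for a hard crash such as an access violation.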
But even if your application's cleanup code can't run, the operating system will still attempt to clean up after you. This solves the problem of unreleased memory, handles, and other system objects. In general, if you crash, you need not worry about releasing these things. Your application's state is inconsistent, so trying to execute a bunch of cleanup code will just lead to unpredictable and potentially erroneous behavior, not to mention wasting a bunch of time. Just crash and let the system deal with your mess. As Raymond Chen puts it:
The building is being demolished. Don't bother sweeping the floor and emptying the trash cans and erasing the whiteboards. And don't line up at the exit to the building so everybody can move their in/out magnet to out. All you're doing is making the demolition team wait for you to finish these pointless housecleaning tasks.
Do what you must; skip everything else.
The only problem with this approach is, as you suggest in this question, when you're managing resources that are not controlled by the operating system, such as a remote resource on another system. In that case, there is very little you can do. The best scenario is to make your application as robust as possible so that it doesn't crash, but even that is not a perfect solution. Consider what happens when the power is lost, e.g. because a user's cat pulled the cord from the wall. No cleanup code could possibly run then, so even if your application never crashes, there may be termination events that are outside of your control. Therefore, your external resources must be robust in the event of failure. Time-outs are a standard method, and a much better solution than polling.
Another possible solution, depending on the particular use case, is to run consistency-checking and cleanup code at application initialization. This might be something that you would do for a service that is intended to run continuously and will be restarted promptly after termination. The next time it restarts, it checks its data and/or external resources for consistency, releases and/or re-initializes them as necessary, and then continues on as normal. Obviously this is a bad solution for a typical application, because there is no guarantee that the user will relaunch it in a timely manner.
As the other answers make clear, hoping to clean up after an uncontrolled crash (i.e., a failure which doesn't trigger the C++ exception unwind mechanism) is probably a path to nowhere. Even if you cover some cases, there will be other cases that fail and you are building in a serious vulnerability to those cases.
You mention that the source of the crashes is that you are "us[ing] services from others". I take this to mean that you are running untrusted code in-process, which is the potential source of crashes. In this case, you might consider running the untrusted code "out of process" and communicating back to your main process through a pipe or shared memory or whatever. Then you isolate the crashes to this child process, and can do controlled cleanup in your main process. A separate process is really the lightest-weight thing you can do that gives you the strong isolation you need to avoid corruption in the calling code.
If forking a process per-call is performance-prohibitive, you can try to keep the child process alive for multiple calls.
One approach would be for your program to have two modes: normal operation and monitoring.
When started in the usual way, it would:
Act as a background monitor.
Launch a subprocess of itself, passing it an internal argument (something that wouldn't clash with normal arguments passed to it, if any).
When the subprocess exits, it would release any resources held at the server.
When started with the internal argument, it would:
Expose the user interface and "act normally", using the resources of the server.
You might look into atexit, which may give you the functionality you need to release resources upon program termination. I don't believe it is infallible, though.
Having said that, however, you should really be focusing on making sure your program doesn't crash; if you're hitting an error that is "unrecoverable", you should still invest in some error-handling code. If the error is caused by a Seg-Fault or some other similar OS-related error, you can either enable SEH exceptions (not sure if this is Windows-specific or not) to enable you to catch them with a normal try-catch block, or write some Signal Handlers to intercept those errors and deal with them.

C++ graceful shutdown best practices

I'm writing a multi-threaded c++ application for *nix operating systems. What are some best practices for terminating such an application gracefully? My instinct is that I'd want to install a signal handler on SIGINT (SIGTERM?) which stops/joins my threads. Also, is it possible to "guarantee" that all destructors are called (provided no other errors or exceptions are thrown while handling the signal)?
Some considerations come to mind:
designate 1 thread to be responsible for orchestrating the shutdown, eg, as Dithermaster suggested, this could be the main thread if you are writing a standalone application. Or if you are writing a library, provide an interface (eg function call) whereby a client program can terminate the objects created within the library.
you cannot guarantee destructors are called; that is up to you, and requires carefully calling delete for each new. Maybe smart pointers will help you. But, really, this is a design consideration. The major components should have start & stop semantics, which you could choose to invoke from the class constructor & destructor.
the shutdown sequence for a set of interacting objects is something that can require some effort to get correct. E.g., before you delete an object, are you sure some timer mechanism is not going to try calling it a few microseconds or milliseconds later? Trial and error is your friend here; develop a framework which can repeatedly & rapidly start and stop your application to tease out shutdown-related race conditions.
signals are one way to trigger an event; others might be periodically polling for a known file, or opening a socket and receiving some data on it. Either way, you want to decouple the shutdown sequence code from the trigger event.
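To make the start/stop idea concrete, here is a minimal sketch assuming C++11. Ticker and stress_start_stop are hypothetical names: the component owns a timer-like thread, stop() is called from the destructor so the thread can never call into a destroyed object, and a small loop starts and stops it repeatedly to shake out shutdown races.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical component with explicit start/stop semantics.
// start() and stop() are meant to be called from one controlling thread.
class Ticker {
public:
    Ticker() { start(); }
    ~Ticker() { stop(); }  // the timer thread never outlives the object
    Ticker(const Ticker&) = delete;
    Ticker& operator=(const Ticker&) = delete;

    void start() {
        if (thread_.joinable()) return;  // already running
        stopping_ = false;               // safe: no timer thread exists now
        thread_ = std::thread([this] { run(); });
    }

    void stop() {
        if (!thread_.joinable()) return;  // already stopped
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();  // interrupt the timed wait immediately
        thread_.join();
    }

    long ticks() const { return ticks_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopping_) {
            ticks_.fetch_add(1);
            // Sleep for one period, or wake early if stop() was called.
            wake_.wait_for(lock, std::chrono::milliseconds(1),
                           [this] { return stopping_; });
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    bool stopping_ = false;  // guarded by mutex_ while the thread runs
    std::atomic<long> ticks_{0};
    std::thread thread_;
};

// Start and stop the component `cycles` times; returns the cycles completed.
int stress_start_stop(int cycles) {
    int completed = 0;
    for (int i = 0; i < cycles; ++i) {
        Ticker ticker;   // constructor starts the thread
        ticker.stop();
        ticker.start();
        ++completed;     // destructor stops it again at end of scope
    }
    return completed;
}
```

The trigger (a signal, a socket message, a file appearing) stays outside the class; whatever notices it only has to call stop().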
My recommendation is that the main thread shut down all worker threads before exiting itself. Send each worker an event telling it to clean up and exit, and wait for each one to do so. This will allow all C++ destructors to run.
Regarding signal management, the only thing you can portably and safely do inside a signal handler is to write to a variable of type sig_atomic_t (possibly volatile-qualified) and return. In general, you cannot call most functions and must not write to global memory. In other words, the handler should just set a flag to be tested inside your main routine, at some point you find appropriate, and the action resulting from the signal itself should be performed from there.
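A minimal sketch of that pattern, using only the standard <csignal> facilities. The names g_stop_requested, on_stop_signal and run_until_signalled are made up for the example, and the loop raises SIGINT at itself purely to simulate Ctrl+C.

```cpp
#include <cassert>
#include <csignal>

namespace {
// The only object the handler ever touches.
volatile std::sig_atomic_t g_stop_requested = 0;
}

extern "C" void on_stop_signal(int) {
    g_stop_requested = 1;  // set the flag and return; nothing else
}

// Hypothetical main loop: performs units of "work" until the flag is seen.
// To simulate Ctrl+C it raises SIGINT at itself on iteration `raise_after`
// (which must be >= 1). Returns the number of iterations performed.
int run_until_signalled(int raise_after) {
    g_stop_requested = 0;
    std::signal(SIGINT, on_stop_signal);
    int iterations = 0;
    while (!g_stop_requested) {
        ++iterations;  // stand-in for real work
        if (iterations == raise_after) std::raise(SIGINT);
    }
    // The actual shutdown work (joining threads, flushing files)
    // belongs here, in normal code, not in the handler.
    std::signal(SIGINT, SIG_DFL);
    return iterations;
}
```

In a multi-threaded program the main loop would typically react to the flag by notifying the worker threads through ordinary synchronisation primitives.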
(Since there might be blocking I/O involved, consider studying POSIX Thread Cancellation. Your Unix clone (most notably Linux) might have peculiarities with respect to this and to the above.)
Regarding destructors, no magic is involved. They will be executed if control leaves a given scope through any means defined in the language. Leaving a scope through other means (for example, longjmp() or even exit()) does not trigger the destructors of the local objects in that scope.
Regarding general shutdown practices, there are divergent opinions on the field.
Some state that a "graceful termination", in the sense of releasing every resource ever allocated, should be performed. In C++, this usually means that all destructors should be properly executed before the process terminates. This is tricky in practice and often a source of much grief, especially in multithreaded programs, for a variety of reasons. Signals further complicate things by the very nature of asynchronous signal dispatching.
Because most of this work is totally useless, some others, like me, contend that the program must just terminate immediately, possibly shortly after undoing persistent changes to the system (like removing temporary files or restoring the screen resolution) and saving configuration. An apparently tidier cleanup is not only a waste of time (because the operating system will clean up most things like allocated memory, dangling threads and open file descriptors), but might be a serious waste of time (deallocators might touch paged out memory, uselessly forcing the system to page them in just for releasing them soon after the process terminates, for example), not mentioning the possibility of deadlocks being originated from joining threads.
Just say no. When you want to leave, call exit() (or even _exit(), but watch out for unflushed I/O) and that's it. More annoying than slow starting programs are slow terminating programs.

How do you stop a thread and flush its registers into the stack?

I'm creating a concurrent memory reclamation algorithm in C++. Periodically, the stacks of executing mutator threads need to be inspected, so that I can see what references the threads are currently holding. In the process of doing this, I need to also check the registers of the mutator thread to check any references that might be in there.
Clearly many JVM's and C# vm's have no problem doing this as part of their garbage collection cycles. However, I haven't been able to find a definitive solution to this issue.
I can't quite tease apart what is going on in the Boehm garbage collector in order to inspect the root set. If you can (or know how it's done), I'd really like to know.
Ideally I would be able to cause the mutator thread to be interrupted, and execute a piece of handler code which would report its PC and also flush any register-based references into the stack, and then perhaps help finish the collection cycle. I believe that most compilers in most systems will automatically flush the registers when interrupt or signal handlers are called, but I'm not clear on the specifics, or how to access that data. It seems that separate stacks might be used for interrupt and signal handlers. Additionally, I can't find any information about how to target a particular thread, or how to send a signal. Windows does not seem to support this form of signaling anyway, and I would like my system to run on both Linux and Windows on x86-64 processors.
Edit: SuspendThread() is used in some situations, although safepoints seem to be preferred. Any ideas on why? Is there any way to deal with long-lasting I/O waits or other waits for kernel code to return?
I thought this was a very interesting question, so I dug into it a bit. It turns out that the Hotspot JVM uses a mechanism called "safepoints" which cause the threads of the JVM to cooperatively all stop themselves so that the GC can begin. In other words, the thread initiating GC doesn't forcibly stop the other threads, the other threads voluntarily suspend themselves by various clever mechanisms.
I don't believe the JVM scans registers, because a safepoint is defined such that all roots are known (I presume this means in memory).
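To illustrate the cooperative idea only, here is a heavily simplified sketch in C++11. It is not how HotSpot implements safepoints; the Safepoint class, its method names and the world_stops demo are all assumptions made for the example. Mutator threads call poll() at points where their roots are in memory, and the collector blocks in stop_the_world() until every mutator has parked itself.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical cooperative safepoint, not HotSpot's actual mechanism.
class Safepoint {
public:
    // Called by mutators at convenient points in their own code.
    void poll() {
        if (!requested_.load()) return;  // fast path: nobody wants a stop
        std::unique_lock<std::mutex> lock(mutex_);
        ++parked_;
        changed_.notify_all();  // let the collector re-check the count
        changed_.wait(lock, [this] { return !requested_.load(); });
        --parked_;
    }

    // Called by the collector; returns once `mutators` threads are parked.
    void stop_the_world(int mutators) {
        std::unique_lock<std::mutex> lock(mutex_);
        requested_.store(true);
        changed_.wait(lock, [this, mutators] { return parked_ == mutators; });
    }

    void resume_the_world() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            requested_.store(false);
        }
        changed_.notify_all();
    }

private:
    std::atomic<bool> requested_{false};
    std::mutex mutex_;
    std::condition_variable changed_;
    int parked_ = 0;  // guarded by mutex_
};

// Demo: checks that no mutator makes progress while the world is stopped.
bool world_stops(int mutators) {
    Safepoint safepoint;
    std::atomic<bool> done{false};
    std::atomic<long> steps{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < mutators; ++i) {
        pool.emplace_back([&] {
            while (!done.load()) {
                steps.fetch_add(1);  // stand-in for mutator work
                safepoint.poll();
            }
        });
    }
    safepoint.stop_the_world(mutators);
    const long before = steps.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    const long after = steps.load();  // a collector would scan roots here
    done.store(true);
    safepoint.resume_the_world();
    for (auto& t : pool) t.join();
    return before == after;
}
```

A thread blocked in a long system call never reaches poll(), which is one reason real collectors need extra handling for such threads.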
For more information see:
HotSpot Glossary -- which defines safepoints
safepoint.cpp -- the source in HotSpot that implements safepoints
A slide deck that describes safepoints in some detail (look 10 slides or so in)
In regards to your desire to "interrupt" all threads, according to the slide deck I referenced above, thread suspension is "unreliable on Solaris and Linux, e.g., spurious signals." I'm not sure what mechanism even exists for thread suspension that the slides would be referring to.
On Windows you should be able to get this done using SuspendThread (and ResumeThread) along with GetThreadContext (as Hans mentioned). All of these functions take handles to the specific thread you intend to target.
To get a list of all threads in the current process, see this (toolhelp32 works on x64, despite its bad naming scheme...).
As a point of interest, one way to flush registers to the stack on 32-bit x86 is to use the PUSHAD assembly instruction. Note that PUSHAD is not available in 64-bit mode, where the general-purpose registers have to be pushed individually.

Changing Thread Task?

I know you cannot kill a boost thread, but can you change its task?
Currently I have an array of 8 threads. When a button is pressed, these threads are assigned a task. The task which they are assigned to do is completely independent of the main thread and the other threads. None of the threads have to wait or anything like that, so an interruption point is never reached.
What I need is to be able, at any time, to change the task that each thread is doing. Is this possible? I have tried looping through the array of threads and changing what each thread object points to to a new one, but of course that doesn't do anything to the old threads.
I know you can interrupt pThreads, but I cannot find a working link to download the library to check it out.
A thread is not some sort of magical object that can be made to do things. It is a separate path of execution through your code. Your code cannot be made to jump arbitrarily around its codebase unless you specifically program it to do so. And even then, it can only be done within the rules of C++ (ie: calling functions).
You cannot kill a boost::thread because killing a thread would utterly wreck some of the most fundamental assumptions a programmer makes. You now have to take into account the possibility that the next line doesn't execute for reasons that you can neither predict nor prevent.
This isn't like exception handling, where C++ specifically requires destructors to be called, and you have the ability to catch exceptions and do special cleanup. You're talking about executing one piece of code, then suddenly inserting a call to some random function in the middle of already compiled code. That's not going to work.
If you want to be able to change the "task" of a thread, then you need to build that thread with "tasks" in mind. It needs to check every so often that it hasn't been given a new task, and if it has, then it switches to doing that. You will have to define when this switching is done, and what state the world is in when switching happens.
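One possible shape for such a task-aware thread, sketched with std::thread rather than boost::thread. The Task enum and Worker struct are hypothetical, and the "tasks" are trivial counters so that the switch is easy to observe. The worker re-reads its assignment before every step, which defines exactly when a switch can happen.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical task identifiers for the example.
enum class Task { Idle, CountUp, CountDown, Quit };

struct Worker {
    std::atomic<Task> task{Task::Idle};
    std::atomic<long> value{0};
    std::thread thread;  // declared last so the atomics exist before it runs

    Worker() : thread([this] { run(); }) {}
    ~Worker() {
        assign(Task::Quit);  // ask the loop to finish, then wait for it
        thread.join();
    }

    // Can be called from any thread at any time.
    void assign(Task next) { task.store(next); }

private:
    void run() {
        for (;;) {
            // The assignment is re-checked before every step, so a new
            // task takes effect at the next step boundary.
            switch (task.load()) {
                case Task::Quit:      return;
                case Task::CountUp:   value.fetch_add(1); break;
                case Task::CountDown: value.fetch_sub(1); break;
                case Task::Idle:      std::this_thread::yield(); break;
            }
        }
    }
};
```

A caller might do `Worker w; w.assign(Task::CountUp);` and later `w.assign(Task::CountDown);`. Real tasks would need their steps kept short enough that the check happens often, and any state shared between tasks must be left consistent at each step boundary.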

Thread related issues and debugging them

This is my follow up to the previous post on memory management issues. The following are the issues I know.
1)data races (atomicity violations and data corruption)
2)ordering problems
3)misusing of locks leading to dead locks
4)heisenbugs
Any other issues with multi threading ? How to solve them ?
Eric's list of four issues is pretty much spot on. But debugging these issues is tough.
For deadlock, I've always favored "leveled locks". Essentially you give each type of lock a level number, and then require that a thread acquire locks in strictly increasing level order.
To do leveled locks, you can declare a structure like this:
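#include <assert.h>
#include <stddef.h>
/* os_mutex, os_lock_acquire and __tls are placeholders for the platform's
   mutex type, lock call and thread-local storage qualifier. */
typedef struct my_lock_struct {
    os_mutex actual_lock;
    int level;
    struct my_lock_struct *prev_lock_in_thread;  /* lock held before this one */
} my_lock_struct;
static __tls my_lock_struct *last_lock_in_thread;  /* most recent lock taken by this thread */
void my_lock_acquire(int level, my_lock_struct *lock) {
    /* Taking a lock at the same or a lower level could deadlock: fail loudly. */
    if (last_lock_in_thread != NULL) assert(last_lock_in_thread->level < level);
    os_lock_acquire(&lock->actual_lock);
    lock->level = level;
    lock->prev_lock_in_thread = last_lock_in_thread;
    last_lock_in_thread = lock;
}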
What's cool about leveled locks is that the possibility of deadlock causes an assertion. And with some extra magic with __func__ and __LINE__ you know exactly what badness your thread did.
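For readers on C++11 or later, the same idea might look roughly like the sketch below. LeveledLock is a hypothetical class, not part of any library; it throws std::logic_error instead of asserting so that a violation can be observed in a test, and it assumes locks are released in reverse order of acquisition.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical C++11 take on leveled locks: each lock has a fixed level,
// and a thread may only take a lock whose level is higher than the level
// of the last lock it acquired.
class LeveledLock {
public:
    explicit LeveledLock(int level) : level_(level) {}

    void lock() {
        if (current_ != nullptr && current_->level_ >= level_) {
            throw std::logic_error("lock level violation");
        }
        mutex_.lock();
        previous_ = current_;  // safe to write: we own the mutex now
        current_ = this;
    }

    void unlock() {
        // Assumes locks are released in reverse order of acquisition.
        current_ = previous_;
        previous_ = nullptr;
        mutex_.unlock();
    }

private:
    static thread_local LeveledLock* current_;  // last lock taken by this thread
    std::mutex mutex_;
    const int level_;
    LeveledLock* previous_ = nullptr;  // lock this thread held before this one
};

thread_local LeveledLock* LeveledLock::current_ = nullptr;
```

Because the class provides lock() and unlock(), it works with std::lock_guard<LeveledLock>, which also takes care of the reverse-order release.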
For data races and lack of synchronization, the current situation is pretty poor. There are static tools that try to identify issues. But false positives are high.
The company I work for ( http://www.corensic.com ) has a new product called Jinx that actively looks for cases where race conditions can be exposed. This is done by using virtualization technology to control the interleaving of threads on the various CPUs and zooming in on communication between CPUs.
Check it out. You probably have a few more days to download the Beta for free.
Jinx is particularly good at finding bugs in lock free data structures. It also does very well at finding other race conditions. What's cool is that there are no false positives. If your code testing gets close to a race condition, Jinx helps the code go down the bad path. But if the bad path doesn't exist, you won't be given false warnings.
Unfortunately there's no good pill that helps automatically solve most/all threading issues. Even unit tests that work so well on single-threaded pieces of code may never detect an extremely subtle race condition.
One thing that will help is keeping the thread-interaction data encapsulated in objects. The smaller the interface/scope of the object, the easier it will be to detect errors in review (and possibly testing, but race conditions can be a pain to detect in test cases). By keeping a simple interface that can be used, clients that use the interface will also be correct just by default. By building up a bigger system from lots of smaller pieces (only a handful of which actually do thread-interaction), you can go a long way towards averting threading errors in the first place.
The four most common problems with threading are
1-Deadlock
2-Livelock
3-Race Conditions
4-Starvation
How to solve [issues with multi threading]?
A good way to "debug" MT applications is through logging. A good logging library with extensive filtering options makes it easier. Of course, logging itself influences the timing, so you still can have "heisenbugs", but it's much less likely than when you're actually breaking into the debugger.
Prepare and plan for that. Include a good logging facility into your application from the start.
Make your threads as simple as possible.
Try not to use global variables. Global constants (actual constants that never change) are fine. When you do need to use global or shared variables you need to protect them with some type of mutex/lock (semaphore, monitor, ...).
Make sure that you actually understand how your mutexes work. There are a few different implementations which can work differently.
Try to organize your code so that the critical sections (places where you hold some type of lock(s) ) are as quick as possible. Be aware that some functions may block (sleep or wait on something and keep the OS from allowing that thread to continue running for some time). Do not use these while holding any locks (unless absolutely necessary or during debugging as it can sometimes show other bugs).
Try to understand what more threads actually do for you. Blindly throwing more threads at a problem is very often going to make things worse. Different threads compete for the CPU and for locks.
Deadlock avoidance requires planning. Try to avoid having to acquire more than one lock at a time. If this is unavoidable decide on an ordering you will use to acquire and release the locks for all threads. Make sure you know what deadlock really means.
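As one way to fix an acquisition order, the sketch below always locks the object with the lower address first. Account, transfer and opposing_transfers are hypothetical names for the example. The standard library's std::lock (C++11) and std::scoped_lock (C++17) are alternatives that avoid deadlock without a hand-picked order.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical shared object with its own lock.
struct Account {
    std::mutex guard;
    long balance;
    explicit Account(long initial) : balance(initial) {}
};

// Always locks the account with the lower address first, so every thread
// agrees on the acquisition order no matter which way the money flows.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;  // nothing to do, and avoids locking twice
    Account* first = &from;
    Account* second = &to;
    if (std::less<Account*>()(second, first)) std::swap(first, second);
    std::lock_guard<std::mutex> lock_first(first->guard);
    std::lock_guard<std::mutex> lock_second(second->guard);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions, the classic deadlock setup.
// Returns the combined balance once both have finished.
long opposing_transfers(int rounds) {
    Account a(1000), b(1000);
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

If transfer locked `from` before `to` unconditionally, the two threads could each hold one lock and wait forever for the other.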
Debugging multi-threaded or distributed applications is difficult. If you can do most of the debugging in a single threaded environment (maybe even just forcing other threads to sleep) then you can try to eliminate non-threading centric bugs before jumping into multi-threaded debugging.
Always think about what the other threads might be up to. Comment this in your code. If you are doing something a certain way because you know that at that time no other thread should be accessing a certain resource write a big comment saying so.
You may want to wrap calls to mutex locks/unlocks in other functions like:
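int my_lock_get(lock_type lock, const char * file, unsigned line, const char * msg) {
    thread_id_type me = this_thread();
    /* Log before blocking, so a hang shows who was waiting and where. */
    logf("%u\t%s (%u)\t%s:%u\t%s\t%s\n", time_now(), thread_name(me), me, file, line, "get", msg);
    int rc = lock_get(lock);  /* assumes the made-up lock_get returns a status code */
    /* Log again once the lock is actually held. */
    logf("%u\t%s (%u)\t%s:%u\t%s\t%s\n", time_now(), thread_name(me), me, file, line, "in", msg);
    return rc;
}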
And a similar version for unlock. Note, the functions and types used in this are all made up and not overly based on any one API.
Using something like this you can come back if there is an error and use a perl script or something like it to run queries on your logs to examine where things went wrong (matching up locks and unlocks, for instance).
Note that your print or logging functionality may need to have locks around it as well. Many libraries already have this built in, but not all do. These locks need to not use the printing version of the lock_[get|release] functions or you'll have infinite recursion.
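A possible C++11 rendering of the same idea, as a sketch only. EventLog, LoggedMutex and the LOGGED_LOCK / LOGGED_UNLOCK macros are invented for this example; timestamps and thread names are left out to keep it short. The log is guarded by a plain std::mutex rather than by LoggedMutex, which is how the recursion mentioned above is avoided.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory log. It is protected by a plain std::mutex, not by
// LoggedMutex, so that logging never recurses into itself.
class EventLog {
public:
    void add(const std::string& line) {
        std::lock_guard<std::mutex> lock(guard_);
        lines_.push_back(line);
    }
    std::vector<std::string> snapshot() {
        std::lock_guard<std::mutex> lock(guard_);
        return lines_;
    }

private:
    std::mutex guard_;
    std::vector<std::string> lines_;
};

// Mutex wrapper that records every acquire and release with its call site.
class LoggedMutex {
public:
    LoggedMutex(std::string name, EventLog& log)
        : name_(std::move(name)), log_(log) {}

    void lock(const char* file, unsigned line) {
        record("get", file, line);  // about to block
        mutex_.lock();
        record("in", file, line);   // lock is now held
    }
    void unlock(const char* file, unsigned line) {
        record("out", file, line);  // about to release
        mutex_.unlock();
    }

private:
    void record(const char* what, const char* file, unsigned line) {
        std::ostringstream out;
        out << name_ << '\t' << what << '\t' << file << ':' << line;
        log_.add(out.str());
    }

    std::string name_;
    EventLog& log_;
    std::mutex mutex_;
};

// Capture the call site automatically.
#define LOGGED_LOCK(m)   (m).lock(__FILE__, __LINE__)
#define LOGGED_UNLOCK(m) (m).unlock(__FILE__, __LINE__)
```

A "get" line without a matching "in" line in the log points at the thread and source line that was stuck waiting.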
Beware of global variables even if they are const, in particular in C++. Only PODs that are statically initialized "à la" C are good here. As soon as a run-time constructor comes into play, be extremely careful. AFAIR, variables with static storage duration that live in different compilation units are initialized in an unspecified order. Maybe C++ classes that initialize all their members properly and have an empty constructor body could be OK nowadays, but I once had a bad experience with that, too.
This is one of the reasons why, on the POSIX side, pthread_mutex_t is much easier to program with than sem_t: it has a static initializer, PTHREAD_MUTEX_INITIALIZER.
Keep critical sections as short as possible, for two reasons: it might be more efficient in the end, but more importantly it is easier to maintain and to debug.
A critical section should never be longer than a screen, including the locking and unlocking that is needed to protect it, and including the comments and assertions that help the reader to understand what is happening.
Start implementing critical sections very rigidly, maybe with one global lock for them all, and relax the constraints afterwards.
Logging is difficult if many threads start to write at the same time. If every thread does a reasonable amount of work, try to have them each write a file of their own, such that they don't interlock each other.
But beware, logging changes the behavior of code. This can be bad when bugs disappear, or beneficial when bugs appear that you otherwise wouldn't have noticed.
To make a post-mortem analysis of such a mess you have to have accurate timestamps on each line, such that all the files can be merged and give you a coherent view of the execution.
-> Add priority inversion to that list.
As another poster alluded to, log files are wonderful things. For deadlocks, using a LogLock instead of a Lock can help pinpoint when your entities stop working. That is, once you know you've got a deadlock, the log will tell you when and where locks were instantiated and released. This can be enormously helpful in tracking these things down.
I've found that race conditions when using an Actor model following the same message->confirm->confirm received style seem to disappear. That said, YMMV.