C++ What exactly does exit() do when using multiple threads? - c++

I'm working on some code that uses an existing code base which is now a DLL. What I'm trying to do is terminate all the threads that were started, but keep my main program running.
This is the basic structure of the code:
void Mainprogram()
{
    tempProcessingThreadHandle = (HMODULE)_beginthread(SomeDLLEntry, 0, (void*)&Params); //SomeDLLEntry is a function in some.dll
    //Other Code I Want to Run
}
In the DLL:
void SomeDLLEntry()
{
    tempProcessingThreadHandle = (HMODULE)_beginthread(SomeOtherDLLThing1, 0, (void*)&Params);
    tempProcessingThreadHandle = (HMODULE)_beginthread(SomeOtherDLLThing2, 0, (void*)&Params);
    if (someCondition)
        return;
}
void SomeOtherDLLThing1()
{
    if (someOtherCondition)
        exit(1);
}
I thought that returning from SomeDLLEntry() would cause the threads started in the DLL (SomeOtherDLLThing1 and 2) to terminate as well, but that's not the case, as seen in the debugger; SomeDLLEntry() thread would disappear, but the others are still running.
Now, if I set someOtherCondition to true, and exit(1) is called from SomeOtherDLLThing1(), what should happen? When I debug over this, the debugger seems to crash, but it appears to go past the exit(1) line and give me:
Unhandled exception at 0x6B4E87CD (Mso20win32client.dll) in Mainprogram.exe: 0xC0000005: Access violation reading location 0x0000018C.
Is this because the whole process (including Mainprogram()) has been terminated? What exactly does calling exit(1) in SomeOtherDLLThing1() do? How can I properly terminate all of the DLL-related threads and continue with my Mainprogram()?

The only way to safely terminate threads and continue to execute is to coordinate with those threads, have them cleanly finish, then join the thread handles.
There is no free lunch.
Other choices are "hack, force halt of threads, and pray you get lucky" or "do your work in another process, and pray summary shutdown causes no problems".
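To make "coordinate, finish cleanly, then join" concrete, here is a minimal sketch using standard C++11 threads instead of _beginthread. The names worker and run_and_stop_workers are made up for illustration, and it assumes you can change the DLL's thread functions so that they poll a stop flag instead of calling exit():

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the DLL's worker threads: each one polls a shared
// stop flag between units of work instead of running forever or calling exit().
void worker(const std::atomic<bool>& stop, std::atomic<int>& finished) {
    while (!stop.load()) {
        // ... one bounded unit of work ...
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
    // Per-thread clean-up happens here, on the worker's own thread.
    ++finished;
}

// Starts `count` workers, asks them to stop, and joins them.
// Returns how many workers finished cleanly.
int run_and_stop_workers(int count) {
    assert(count >= 0);
    std::atomic<bool> stop{false};
    std::atomic<int> finished{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < count; ++i)
        threads.emplace_back(worker, std::cref(stop), std::ref(finished));

    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    stop.store(true);                    // request shutdown
    for (auto& t : threads) t.join();    // wait until every worker has returned
    return finished.load();              // the caller keeps running afterwards
}
```

The caller is still alive after the joins, which is the point: nothing here ends the process, it only waits for the workers to return on their own.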

Related

Better way to monitor and kill other program's stalled process in linux?

I need my program to run some other program, but if the other program won't return within some time limit, I need to kill it. I came up with the following solution that seems to be working.
int main()
{
    int retval, timeout=10;
    pid_t proc1=fork();
    if(proc1>0)
    {
        while(timeout)
        {
            waitpid(proc1, &retval, WNOHANG);
            if(WIFEXITED(retval)) break; //normal termination
            sleep(1);
            --timeout;
            if(timeout==0)
            {
                printf("attempt to kill process\n");
                kill(proc1, SIGTERM);
                break;
            }
        }
    }
    else if(proc1==0)
    {
        execlp("./someprogram", "./someprogram", "-a", "-b", NULL);
    }
    //else if fork failed etc.
    return 0;
}
I need my program to be as robust as possible but I am new to programming under linux so I may not be aware of possible problems with it. My questions are:
1. Is this a proper solution to this particular problem or are there better methods?
2. Does anyone see possible problems or bugs that can lead to an unexpected behavior or a leak of system resources?
(WIFEXITED(retval)) won't return true if the program is killed by a signal (including say a crash due to segmentation violation).
Probably best to just check for a successful return from waitpid. That will only happen if the program is terminated (whether voluntarily or not).
Depending on how important it is to make sure the process is gone...
After killing the process with SIGTERM, you could sleep another second or so and if it's still not gone, use SIGKILL to be sure.
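Putting those suggestions together, a rough sketch could look like the following. It checks the return value of waitpid instead of WIFEXITED, sends SIGTERM at the deadline, and falls back to SIGKILL after a grace period. The function name run_with_timeout and the one-second grace period are my own choices, not something from the question:

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs argv[0] (looked up in PATH) and waits up to timeout_sec seconds for it.
// Returns the raw wait status, or -1 if fork() or waitpid() failed.
int run_with_timeout(char* const argv[], int timeout_sec) {
    assert(argv != nullptr && argv[0] != nullptr);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                       // only reached if exec failed
    }
    int status = 0;
    for (int waited = 0; ; ++waited) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) return status;      // child ended, normally or by a signal
        if (r < 0) return -1;
        if (waited >= timeout_sec) break; // deadline reached, child still running
        sleep(1);
    }
    kill(pid, SIGTERM);                   // polite request first
    sleep(1);                             // grace period
    if (waitpid(pid, &status, WNOHANG) == pid) return status;
    kill(pid, SIGKILL);                   // cannot be caught or ignored
    if (waitpid(pid, &status, 0) != pid) return -1;   // reap it, so no zombie is left
    return status;
}
```

The caller can then inspect the result with WIFEXITED / WIFSIGNALED to tell a normal exit from a kill. The final blocking waitpid matters: without it the killed child stays around as a zombie.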

c++ floats and valgrind strange behaviour

I have valgrind 3.6.0, I've searched everywhere and found nothing.
The problem is that when I try to access a float number while running under valgrind, I get a segfault, but when I run the program as is, without valgrind, everything goes as expected.
This is the piece of code:
class MyClass {
public:
    void end() {
        float f;
        f = 1.23;
        std::stringstream ss;
        ss << f;
        std::cout << ss.str();
    }
};
MyClass *mc; // declared before the handler that uses it
extern "C" void clean_exit_on_sig(int sig) {
    //Code logging the error
    mc->end();
    exit(1);
}
int main(int argc, char *argv[]) {
    signal(SIGINT , clean_exit_on_sig);
    signal(SIGABRT , clean_exit_on_sig);
    signal(SIGILL , clean_exit_on_sig);
    signal(SIGFPE , clean_exit_on_sig);
    signal(SIGSEGV, clean_exit_on_sig);
    signal(SIGTERM , clean_exit_on_sig);
    mc = new MyClass();
    while(true) {
        // Main program loop
    }
}
When I press Control+C, the program catches the signal correctly and everything goes fine. But when I run the program under valgrind, a segfault is raised when it tries to execute ss << f; (inside MyClass) :-/
I've tried this too:
std::string stm = boost::lexical_cast<std::string>(f);
But I keep getting a segfault when boost accesses the float number too.
This is the backtrace when I get segfault with boost:
./a.out(_Z17clean_exit_on_sigi+0x1c)[0x420e72]
/lib64/libc.so.6(+0x32920)[0x593a920]
/usr/lib64/libstdc++.so.6(+0x7eb29)[0x51e6b29]
/usr/lib64/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE15_M_insert_floatIdEES3_S3_RSt8ios_baseccT_+0xd3)[0x51e8f43]
/usr/lib64/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE6do_putES3_RSt8ios_basecd+0x19)[0x51e9269]
/usr/lib64/libstdc++.so.6(_ZNSo9_M_insertIdEERSoT_+0x9f)[0x51fc87f]
./a.out(_ZN5boost6detail26lexical_stream_limited_srcIcSt15basic_streambufIcSt11char_traitsIcEES4_E9lcast_putIfEEbRKT_+0x8f)[0x42c251]
./a.out(_ZN5boost6detail26lexical_stream_limited_srcIcSt15basic_streambufIcSt11char_traitsIcEES4_ElsEf+0x24)[0x42a150]
./a.out(_ZN5boost6detail12lexical_castISsfLb0EcEET_NS_11call_traitsIT0_E10param_typeEPT2_m+0x75)[0x428349]
./a.out(_ZN5boost12lexical_castISsfEET_RKT0_+0x3c)[0x426fbb]
./a.out(This line of code corresponds to the line where boost tries to do the conversion)
and this is with the default stringstream conversion:
./a.out(_Z17clean_exit_on_sigi+0x1c)[0x41deaa]
/lib64/libc.so.6(+0x32920)[0x593a920]
/usr/lib64/libstdc++.so.6(+0x7eb29)[0x51e6b29]
/usr/lib64/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE15_M_insert_floatIdEES3_S3_RSt8ios_baseccT_+0xd3)[0x51e8f43]
/usr/lib64/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE6do_putES3_RSt8ios_basecd+0x19)[0x51e9269]
/usr/lib64/libstdc++.so.6(_ZNSo9_M_insertIdEERSoT_+0x9f)[0x51fc87f]
./a.out(This line of code corresponds to the line where I try to do the conversion)
a.out is my program, and I run valgrind this way: valgrind --tool=memcheck ./a.out
Another weird thing is that when I call mc->end(); while the program is running normally (no signal received, the object just finished its work), I don't get a segfault at all (with or without valgrind).
Please don't tell me 'Don't close your program with Control+C blah blah...'. This piece of code is for logging any error the program may have without losing data in case of a segfault, being killed because of a deadlock, or something else.
EDIT: Maybe it is a valgrind bug (I don't know, I searched on Google but found nothing, don't kill me); any workaround will be accepted too.
EDIT2: Just realized that boost calls ostream too (it's clearer here than in vim :-/); going to try a sprintf float conversion.
EDIT3: Tried this sprintf(fl, "%.1g", f); but still crashes, backtrace:
./a.out(_Z17clean_exit_on_sigi+0x40)[0x41df24]
/lib64/libc.so.6(+0x32920)[0x593a920]
/lib64/libc.so.6(sprintf+0x56)[0x5956be6]
./a.out(Line where sprintf is)
OK, after some hours of reading and research, I found the problem. I'm going to answer my own question because no one else has; there is only a comment by @Kerrek SB [ https://stackoverflow.com/users/596781/kerrek-sb ], and I cannot accept a comment. (Thank you)
The explanation is simple: inside a signal handler you can only safely call a limited set of functions: http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html
If you call functions that are not async-signal-safe, they may work, but not always.
If you want to call non-async-safe functions inside a signal handler, you can do this:
1. Create 2 pipes: int pip1[2]; int pip2[2]; pipe(pip1); pipe(pip2);
2. Create a new thread and make it wait for data from the first pipe: read(pip1[0], msg, 1);
3. When the signal handler is called, use the async-signal-safe write function to write to the first pipe: write(pip1[1], "0", 1);
4. Then make the signal handler wait on the second pipe with read(pip2[0], msg, 1);
5. The thread will wake up and do all the work it has to do (saving data to a database in this case). After that, make the thread write to the second pipe: write(pip2[1], "0", 1);
6. Now the main thread will wake up and finish with _Exit(1) or something else.
Info:
I'm using 2 pipes because if I write to a pipe and read from it right afterwards, the 2nd thread might never wake up, since the main thread could read back the data it has just written. And I'm using the second pipe to block the main thread because I don't want it to exit while the 2nd thread is saving data.
Keep in mind that the signal handler may have been called while a shared resource was being modified. If your 2nd thread accesses that resource, you may hit a second segfault, so be careful when accessing shared resources (global variables or anything else) from your 2nd thread.
If you are testing with valgrind and don't want 'false' memory leak reports when a signal arrives, you can call pthread_join(2ndthread, NULL) before exiting and use exit(1) instead of _Exit(1). These functions are not async-signal-safe, but at least you can check for memory leaks and close your app with a signal without getting 'false' leak reports.
Hope this helps someone. Thanks again @Kerrek SB.
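For anyone who wants to see the two-pipe handoff in one piece, here is a small sketch of it. The names on_signal, helper and demo_handoff are made up, SIGUSR1 and raise() stand in for a real signal arriving, and the handler returns instead of calling _Exit(1) so the flow can be followed to the end:

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <thread>
#include <unistd.h>

static int pip1[2];              // handler -> helper thread: "please save"
static int pip2[2];              // helper thread -> handler: "saving is done"
static bool g_saved = false;     // written only by the helper thread

extern "C" void on_signal(int) {
    char msg = '0';
    // write() and read() are both on the POSIX async-signal-safe list.
    if (write(pip1[1], &msg, 1) != 1) _exit(2);
    // Block until the helper answers; retry if another signal interrupts the read.
    while (read(pip2[0], &msg, 1) < 0 && errno == EINTR) {}
    // A real handler would call _Exit(1) here; this sketch returns instead.
}

static void helper() {
    char msg;
    if (read(pip1[0], &msg, 1) != 1) return;
    g_saved = true;              // stands in for the non-async-safe work (streams, database, ...)
    ssize_t n = write(pip2[1], &msg, 1);
    assert(n == 1);
    (void)n;
}

// Hypothetical driver: returns true if the helper finished its work
// before the signal handler returned.
bool demo_handoff() {
    if (pipe(pip1) != 0 || pipe(pip2) != 0) return false;
    std::thread t(helper);
    std::signal(SIGUSR1, on_signal);
    std::raise(SIGUSR1);         // the handler runs on this thread and blocks inside read()
    t.join();
    return g_saved;
}
```

The handler itself only touches write, read and _exit; everything that is not async-signal-safe lives in helper, which runs as an ordinary thread.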
Debuggers and stuff sometimes toss signals to the process that you don't normally get. I had to alter a function that used recv to work under gdb for example. Check to see what your signal is and verify that mc is not null before trying to use it. See if that starts getting you closer to an answer.
I am thinking perhaps your use of new (or something else maybe) is possibly causing valgrind to send a signal that is being caught by your handler before mc is initialized.
It's also clear you didn't paste actual code because your use of 'class' without making the end() function public means this should not compile.

C++ program exited with code 0 error

I'm doing c++ at my job for the first time in years and am trying to track down a problem. I wrote code that goes out and enumerates the processes running on a machine and returns performance metrics. My problem is that some sort of unhandled error occurs and in the debug window I get a message saying the program has exited with code 0. Here is the code in the main function
int _tmain(int argc, _TCHAR* argv[])
{
    while(nRun == 1)
    {
        try
        {
            WriteHeartBeat();
            DoProcessLoop(dwTotalRAM, nCheckPause, oPMeter, cFileName, oProcess, oCPUUsage, nProcCount, ddsCaps2, lpDD);
            CopyPerfFileToDest(cFileName);
            nRun = 1;
            tEnd = time(NULL);
        }catch(...){
            AddToLog("Error in Main Function");
        }
    }
    AddToLog("App Stopped");
    return 0;
}
The program runs for a long time, but after a while it just comes back saying it exited with code 0, and that "App Stopped" line is never written to the log. Does anyone know what kind of error I could have or what issue could be occurring? Is that try/catch block sufficient to catch any error that could occur, or is there something else I could do? Any help you could offer would be really appreciated.
EDIT: The log file should get 3 entries from here if it exits correctly. They are "Doing Process Loop" for the "DoProcessLoop" function, "Copying File" for the "CopyPerfFileToDest" function, and "App Stopped" if it stops correctly. When I make it stop correctly myself I get all 3 lines; when it stops incorrectly I only get "Doing Process Loop" in the log and then it exits with code 0. The error must be in there. I was curious whether there is a generic error trap I can use to catch any and all errors.
This can happen if one of the functions called from _tmain calls exit(0):
http://www.cplusplus.com/reference/clibrary/cstdlib/exit/
http://msdn.microsoft.com/en-us/library/6wdz5232.aspx
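One way to check whether that is what happens: handlers registered with atexit run when exit() is called as well as when main returns, so a handler can report which of the two it was. This is only a sketch; g_reached_end_of_main, report_exit and install_exit_trace are made-up names, and in the real program you would call AddToLog instead of writing to stderr:

```cpp
#include <cstdio>
#include <cstdlib>

// Set this to true as the last statement before `return 0;` in main.
static bool g_reached_end_of_main = false;

// Runs during normal process termination: after main returns or when exit() is called.
static void report_exit() {
    if (!g_reached_end_of_main)
        std::fputs("exit() was called before main finished\n", stderr);
}

// Call once at the top of main. Returns true if the handler was registered.
bool install_exit_trace() {
    static bool installed = false;
    if (!installed) installed = (std::atexit(report_exit) == 0);
    return installed;
}
```

If the message shows up, put a breakpoint on exit in the debugger to find the caller. Note that this does not fire for abort(), _exit() or a hard crash.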
Sometimes files are not flushed right so if the AddToLog function defers writing to a file right before exit then it might not write out the value. You can debug the program to see if something strange is happening, or add a variable like status and set it in your catch function then return it at the end, so you know based on the value if there was an error.
Try changing AddToLog to return a boolean and wrapping the call like this:
bool logged = false;
while (!logged) {
    logged = AddToLog("App Stopped");
}
This should prevent the program from exiting until "App Stopped" is written, which may be the problem.

Interrupt running program and save data

How to design a C/C++ program so that it can save some data after receiving interrupt signal.
I have a long running program that I might need to kill (say, by pressing Ctrl-C) before it finished running. When killed (as opposed to running to conclusion) the program should be able to save some variables to disk. I have several big Linux books, but not very sure where to start. A cookbook recipe would be very helpful.
Thank you!
To do that, you need to make your program watch something, for example a global variable, that tells it to stop what it is doing.
For example, supposing your long-running program executes a loop, you can do this:
g_shouldAbort = 0;
while(!finished)
{
    // (do some computing)
    if (g_shouldAbort)
    {
        // save variables and stuff
        break; // exit the loop
    }
}
with g_shouldAbort defined as a global volatile variable, like this:
static volatile int g_shouldAbort = 0;
(It is very important to declare it "volatile"; otherwise the compiler, seeing that nothing in the loop writes to it, may assume that if (g_shouldAbort) is always false and optimize the check away.)
Then, using for example the signal API that other users suggested, you can do this:
void signal_handler(int sig_code)
{
    if (sig_code == SIGUSR1) // user-defined signal 1
        g_shouldAbort = 1;
}
You need to register this handler, of course:
signal(SIGUSR1, signal_handler);
Then, when you send the SIGUSR1 signal to your program (with the kill command, for example), g_shouldAbort will be set to 1 and your program will stop computing.
Hope this helps!
NOTE: this technique is easy but crude. Using signals and global variables makes it difficult to use multiple threads, of course, as other users have outlined.
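Here is the same flag idea as one compilable sketch. It differs from the snippets above in two ways that are my own choices: the flag is a volatile sig_atomic_t (the type the standard guarantees can be written safely from a handler) and the handler is registered with sigaction. The function compute_until_aborted and the raise() call inside the loop are only there to make the demo self-contained; normally the signal would come from outside, e.g. from the kill command.

```cpp
#include <cassert>
#include <signal.h>

static volatile sig_atomic_t g_shouldAbort = 0;

extern "C" void signal_handler(int sig_code) {
    if (sig_code == SIGUSR1)      // user-defined signal 1
        g_shouldAbort = 1;        // only set the flag, nothing else
}

// Runs up to max_iterations steps and returns the step at which the loop stopped.
// send_signal_at is a demo hook: at that step the process signals itself.
long compute_until_aborted(long max_iterations, long send_signal_at) {
    struct sigaction sa = {};
    sa.sa_handler = signal_handler;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGUSR1, &sa, nullptr);
    assert(rc == 0);
    (void)rc;

    g_shouldAbort = 0;
    long i = 0;
    for (; i < max_iterations; ++i) {
        if (i == send_signal_at) raise(SIGUSR1);   // stands in for an external kill -USR1
        // (do some computing)
        if (g_shouldAbort) {
            // save variables here, in normal (non-handler) context
            break;
        }
    }
    return i;
}
```

For example, compute_until_aborted(1000, 10) stops at step 10, while compute_until_aborted(1000, -1) never receives the signal and runs all 1000 steps.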
What you want to do isn't trivial. You can start by installing a signal handler for SIGINT (Ctrl-C) using signal or sigaction, but then the hard part starts.
The main problem is that in a signal handler you can only call async-signal-safe functions (or reentrant functions). Most library functions can't be reliably considered reentrant. For instance, stdio functions, malloc, free and many others aren't reentrant.
So how do you handle this? Set a flag in your handler (set some global variable done to 1) and look out for EINTR errors. It should be safe to do the cleanup outside the handler.
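As a rough illustration of "look out for EINTR": a blocking call such as read can return -1 with errno set to EINTR when a signal handler ran in the middle of it, and that is the moment to re-check the flag. The wrapper name read_unless_done is made up, and the sketch assumes the handler was installed without SA_RESTART (otherwise the kernel restarts the read and you never see EINTR):

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Assumed to be set to 1 by the signal handler, as described above.
volatile sig_atomic_t done = 0;

// Reads up to n bytes from fd.
// Returns the byte count, 0 at end of file, or -1 on a real error
// or when the done flag is set.
ssize_t read_unless_done(int fd, void* buf, size_t n) {
    assert(buf != nullptr);
    for (;;) {
        if (done) return -1;             // a signal asked us to stop: go do the cleanup
        ssize_t r = read(fd, buf, n);
        if (r >= 0) return r;
        if (errno != EINTR) return -1;   // real error
        // EINTR: a handler ran during the call; loop and re-check the flag
    }
}
```

There is still a small window between the flag check and the read in which a signal can arrive and the read blocks anyway; closing that window needs something like pselect or a pipe written from the handler.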
What you are trying to do falls under the rubric of checkpoint/restart.
There are several big problems with using a signal-driven scheme for checkpoint/restart. One is that signal handlers have to be very compact and very primitive. You cannot write the checkpoint inside your signal handler. Another problem is that your program can be anywhere in its execution state when the signal is sent. That random location almost certainly is not a safe point from which a checkpoint can be dropped. Yet another problem is that you need to outfit your program with some application-side checkpoint/restart capability.
Rather than rolling your own checkpoint/restart capability, I suggest you look into using a free one that already exists. gdb on linux provides a checkpoint/restart capability. Another is DMTCP, see http://dmtcp.sourceforge.net/index.html .
Use signal(2) or sigaction(2) to assign a function pointer to the SIGINT signal, and do your cleanups there.
Make sure you enter your save function only once:
// somewhere in main
signal( SIGTERM, signalHandler );
signal( SIGINT, signalHandler );
void saveMyData()
{
    // save some data here
}
void signalHandler( int signalNumber )
{
    static pthread_once_t semaphore = PTHREAD_ONCE_INIT;
    std::cout << "signal " << signalNumber << " received." << std::endl;
    pthread_once( & semaphore, saveMyData );
}
If your process gets 2 or more signals before you finish writing your file, you'll save weird data.

Make main() "uncrashable"

I want to program a daemon-manager that takes care that all daemons are running, like so (simplified pseudocode):
void watchMe(filename)
{
    while (true)
    {
        system(filename); //freezes as long as filename runs
        //oh, filename must have crashed. Never mind, it will be restarted
    }
}
int main()
{
    _beginThread(watchMe, "foo.exe");
    _beginThread(watchMe, "bar.exe");
}
This part is already working, but now I am facing the problem that when an observed application, say foo.exe, crashes, the corresponding system call freezes until I confirm the crash message box.
This makes the daemon useless.
What I think might be a solution is to make the main() of the observed programs (which I control) "uncrashable" so they are shutting down gracefully without showing this ugly message box.
Like so:
try
{
    char *p = NULL;
    *p = 123; //nice null pointer exception
}
catch (...)
{
    cout << "Caught Exception. Terminating gracefully" << endl;
    return 0;
}
But this doesn't work as it still produces this error message:
("Untreated exception ... Write access violation ...")
I've tried SetUnhandledExceptionFilter and all other stuff, but without effect.
Any help would be highly appreciated.
Greets
This looks more like an SEH exception than a C++ exception, and it needs to be handled differently. Try the following code:
__try
{
    char *p = NULL;
    *p = 123; //nice null pointer exception
}
__except(GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION ?
         EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
{
    cout << "Caught Exception. Terminating gracefully" << endl;
    return 0;
}
But that's a remedy and not a cure; you might have better luck running the processes within a sandbox.
You can change the /EHsc to /EHa flag in your compiler command line (Properties/ C/C++ / Code Generation/ Enable C++ exceptions).
See this for a similar question on SO.
You can run the watched process asynchronously, and use kernel objects to communicate with it. For instance, you can:
Create a named event.
Start the target process.
Wait on the created event
In the target process, when the crash is encountered, open the named event, and set it.
This way, your monitor will continue to run as soon as the crash is encountered in the watched process, even if the watched process has not ended yet.
BTW, you might be able to control the appearance of the first error message using drwtsn32 (or whatever is used in Win7), and I'm not sure, but the second error message might only appear in debug builds. Building in release mode might make it easier for you, though the most important thing, IMHO, is solving the cause of the crashes in the first place - which will be easier in debug builds.
I did this a long time ago (in the 90s, on NT4). I don't expect the principles to have changed.
The basic approach is once you have started the process to inject a DLL that duplicates the functionality of UnhandledExceptionFilter() from KERNEL32.DLL. Rummaging around my old code, I see that I patched GetProcAddress, LoadLibraryA, LoadLibraryW, LoadLibraryExA, LoadLibraryExW and UnhandledExceptionFilter.
The hooking of the LoadLibrary* functions dealt with making sure the patching was present for all modules. The revised GetProcAddress had to provide addresses of the patched versions of the functions rather than the KERNEL32.DLL versions.
And, of course, the UnhandledExceptionFilter() replacement does what you want. For example, start a just in time debugger to take a process dump (core dumps are implemented in user mode on NT and successors) and then kill the process.
My implementation had the patched functions implemented with __declspec(naked), and dealt with all the registers by hand because the compiler can destroy the contents of some registers that callers from assembly might not expect to be destroyed.
Of course there was a bunch more detail, but that is the essential outline.