We have a custom application we use built around VB/C++ code.
This code will run for days, weeks, months, without this throwing up an exception error.
I'm trying to learn more about how this error is thrown, and how to interpret (if you can) the error listed when an exception is thrown. I've googled some information and read the Microsoft provided error description, but I'm still stuck with the task of trouble shooting something that occurs once in a blue moon. There is no known set of interactions with the software that causes this and appears to happen randomly.
Is the first exception the root cause? Is it all the way down the stack call? Can anyone provide any insight on how to read these codes so I could interpret where I actually need to look.
Any information or guidance on reading the exception or making any use of it, and then trouble shooting it would be helpful. The test below is copied from windows log when the event was thrown.
Thanks in advance for any help.
Application: Epic.exe Framework Version: v4.0.30319 Description: The process was terminated due to an unhandled exception. Exception Info: System.AccessViolationException [![enter image description here][1]][1]
at MemMap.ComBuf.IsCharAvailable(Int32)
at HMI.frmPmacStat.RefreshTimer_Elapsed(System.Object, System.Timers.ElapsedEventArgs)
at System.Timers.Timer.MyTimerCallback(System.Object)
at System.Threading.TimerQueueTimer.CallCallbackInContext(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext,
System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext,
System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.TimerQueueTimer.CallCallback()
at System.Threading.TimerQueueTimer.Fire()
at System.Threading.TimerQueue.FireQueuedTimerCompletion(System.Object)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
There are exceptions that are thrown by the c++ runtime environment, as a result of executing a throw expression, and there are other types of errors caused by the operating system or hardware trapping your instruction. Invalid access to memory is generally not thrown by code in c++, but is a side effect of evaluating an expression trying to access memory at an invalid address, resulting in the OS signaling the process, usually killing it. Because it's outside C++, it tends to be platform specific, but typical errors are:
reading a null pointer
using a pointer to an object that has been deleted
going outside an array's valid range of elements
using invalidated iterators into STL containers
Generally speaking, you can test for null and array bounds at runtime to detect the problem before it happens. Using a dangling pointer is more difficult to track down, because the time between the delete and the mis-use of that pointer can be long, and it can be difficult to find why it happened without a memory debugger, such as valgrind. Using smart pointers instead of raw pointers can help mitigate the problems of mis-managing memory, and can help avoid such errors.
Invalid iterators are subset of the general dangling pointer problem, but are common enough to be worth mentioning as their own category. Understanding your containers and which operations invalidate them is crucial, and some implementations can be compiled in "debug mode" which can help detect use of invalidated iterators.
As others have noted, this type of error is tricky to identify without digging into the code and running tests (automated or manual). The more pieces of the system you can pull out and still reproduce it, the better. Divide and conquer is your friend here.
Beyond that, it all depends how important it is for you to resolve this and how much effort you're willing to put in. There are at least three classes of tools that can help with such intermittent problems:
Application monitors that track potential errors as your application runs. These tend to slow your program significantly (10x or more slowdown). Examples include:
Microsoft's Application Verifier
Open-source and cross-platform Dr. Memory
Google's Crashpad. Unlike the previous two programs, this one requires instrumenting your code. It is also (allegedly -- haven't tried it) easier to use with helpers like Backtrace's commercial integration for analyzing Crashpad output
Google's Sanitizers - free and some are built into gcc and clang. There's also a Windows port of Address Sanitizer, but a cursory look suggests it may be a little bit of a second-class citizen.
If you can run and repro it also run it on Linux, you could use valgrind; rr (see this CppCast ep) which is a free extension for gdb that records and replays your program so you can record a debug session that crashed and then step through it to see what went wrong; and/or UndoDB and friends from Undo software, which is a more sophisticated, commercial product like rr.
Static analysis of the code. This is a set of tools that looks for common errors in your code. It generally has a low signal-to-noise ratio, so there are a lot of minor things to dig through if your run it on a large, existing project (best to start a project using these things from the beginning if possible). That said, many of the warnings are invaluable. Examples:
Most compilers have a subset of this functionality built in. If you're using Visual Studio, add /W4 /WX to your compilation flags for the C++ code to force maximum warnings, then fix all the warnings. For gcc and clang, add '-Wall -Wpedantic -Werror` to enforce no warnings.
PVS-Studio (commercial)
PC-Lint (commercial)
If you can instrument the code to write log messages, something like Debugview++ may be of assistance.
Things get harder if you have multithreading going on, which it looks like you do, because the non-determinism gets harder to track, there are new classes of possible errors that are introduced, and some of the above tools won't work well (e.g., I think rr is single-threaded only). Beyond a full IDE like Visual Studio, you'd need to go with something like Intel's Inspector (formerly Intel Thread Checker), or on Linux, Valgrind's Helgrind and DRD and ThreadSanitizer (in the sanitizers above, but also Linux only AFAIK). But hopefully this list gives you a place to start.
Related
Google's C++ style guide says "We do not use exceptions". The style does not mention STL with respect to usage of exception. Since STL allocators can fail, how do they handle exceptions thrown by containers?
If they use STL, how is the caller informed of allocation failures? STL methods like push_back() or map operator[] do not return any status codes.
If they do not use STL, what container implementation do they use?
They say that they don't use exceptions, not that nobody should use them. If you look at the rationale they also write:
Because most existing C++ code at Google is not prepared to deal with exceptions, it is comparatively difficult to adopt new code that generates exceptions.
The usual legacy problem. :-(
We simply don't handle exceptions thrown by containers, at least in application-level code.
I've been an engineer at Google Search working in C++ since 2008. We do use STL containers often. I cannot personally recall a single major failure or bug that was ever traced back to something like vector::push_back() or map::operator[] failing, where we said "oh man, we have to rewrite this code because the allocation could fail" or "dang, if only we used exceptions, this could have been avoided." Does a process ever run out of memory? Yes, but this is usually a simple mistake (e.g., someone added a large new data file to the program and forgot to increase the RAM allocation) or a catastrophic failure where there's no good way to recover and proceed. Our system already manages and restarts jobs automatically to be robust to machines with faulty disks, cosmic rays, etc., and this is really no different.
So as far as I can tell, there is no problem here.
I'm pretty sure that they mean they do not use exceptions in their code. If you check out their cpplint script, it does check to ensure you are including the correct headers for STL containers (like vector, list, etc).
I have found that Google mentions this explicitly about STL and exceptions (emphasis is mine):
Although you should not use exceptions in your own code, they are used
extensively in the ATL and some STLs, including the one that comes
with Visual C++. When using the ATL, you should define
_ATL_NO_EXCEPTIONS to disable exceptions. You should investigate whether
you can also disable exceptions in your STL, but if not, it is
OK to turn on exceptions in the compiler. (Note that this is only to
get the STL to compile. You should still not write exception handling
code yourself.)
I don't like such decisions (lucky that I am not working for Google), but they are quite clear about their behaviour and intentions.
You can’t handle allocation failures anyway on modern operating systems; as a performance optimization, they typically over-commit memory. For instance, if you call malloc() and ask for a really huge chunk of memory on Linux, it will succeed even if the memory required to back it actually isn’t there. It’s only when you access it that the kernel actually tries to allocate pages to back it, and at that point it’s too late to tell you that the allocation failed anyway.
So:
Except in special cases, don’t worry about allocation failures. If the machine runs out of memory, that’s a catastrophic failure from which you can’t reliably recover.
Nevertheless, it’s good practice to catch unhandled exceptions and log the e.what() output, then re-throw, since that may be more informative than a backtrace, and typical C++ library implementations don’t do that automatically for you.
The whole huge thread above about how you can’t rely on crashing when you run out of memory is complete and utter rubbish. The C(++) standard may not guarantee it, but on modern systems crashing is the only thing you can rely on if you run out of memory. In particular, you can’t rely on getting a NULL or indeed any other indication from your allocator, up to and include a C++ exception.
If you find yourself on an embedded system where page zero is accessible, I strongly suggest you fix that by mapping an inaccessible page at that location. Human beings cannot be relied upon to check for NULL pointers everywhere, but you can fix that by mapping a page once rather than trying to correct every possible (past, present and future) location at which someone might have missed a NULL.
I will qualify the above by saying that it’s possible you’re using some kind of custom allocator, or that you’re on a system that doesn’t over-commit (embedded systems without swap are one example of that, but not the only example). In that case, maybe you can handle out-of-memory conditions gracefully, on your system. But in general in the 21st century I’m afraid you are unlikely to get the chance; the first you’ll know that your system is out of memory is when things start crashing.
Stl itself is directly only throwing in case of memory allocation failure. But usually a real world application can fail for a variety of reasons, memory allocation failure just one of them. On 32 bit systems memory allocation failure is not something which should be ignored, as it can occur. So the entire discussion above that memory allocation failure is not going to happen is kind of pointless. Even assuming this, one would have to write ones code using two step initialization. And C++ exception handling predates 64 bit architectures by a long time.
I'm not certain how far I should go in dignifying the negative professionalism shown here by google and only answer the question asked. I remember some paper from IBM in around 1997 stating how well some people at IBM understood & appreciated the implications of C++ Exception Handling. Ok professionalism is not necessary
an indicator of success.
So giving up exception handling is not only a problem if one uses STL. It is a problem if one uses C++ as such. It means giving up on
constructor failure
being able to use member objects and base class objects as arguments for any of the following base/member class constructors ( without any testing). It is no wonder that people used two step construction before C++ exception handling existed.
giving up on hierarchical & rich error messages in an environment which allows for code to be provided by customers or third parties and throw errors, which the original writer of the calling code could not possible have foreseen when writing the calling code and could have provided space for in his return error code range.
avoids such hacks as returning a pointer to a static memory object to message allocation failure which the authors of FlexLm did
being able to use a memory allocator returning addresses into a memory mapped sparse file. In this case allocation failure happens when one accesses the memory in question.(ok currently this works only on windows but apple forced the gnu team to provide the necessary functionality in the G++ compiler. Some more pressure from Linux g++ developer will be necessary to provide the this functionality also for them) (oops -- this even applies to STL)
being able to leave this C style coding behind us (ignoring return values) and having to use a debugger with debug executable to find out what is failing in a non trivial environment with child processes and shared libraries provided by third parties or doing remote execution
being able to return rich error information to the caller without just dumping everything to stderr
There is only one possibility to handle allocation failure under assumptions outlined in the question:
that allocator force application exit on allocation failure. In particular, this requires the cusror allocator.
Index-out-of-bound exceptions are less interesting in this context,
because application can ensure they won't happen using pre-checks.
Late to the party here although I didn't see any comparisons between C++ exception handling at Google and exception handling in Go. Specifically, Go only has error handling via a built-in error type. The linked Golang blog post explicitly concludes
Proper error handling is an essential requirement of good software. By employing the techniques described in this post you should be able to write more reliable and succinct Go code.
The creation of Golang certainly took considerations of best practices from working with C++ into account. The underlying intuition is that less can be more. I haven't worked at Google but do find their use of C++ and creation of Golang to be potentially suggestive of underlying best practices at the company.
I know that E&C is a controversial subject and some say that it encourages a wrong approach to debugging, but still - I think we can agree that there are numerous cases when it is clearly useful - experimenting with different values of some constants, redesigning GUI parameters on-the-fly to find a good look... You name it.
My question is: Are we ever going to have E&C on GDB? I understand that it is a platform-specific feature and needs some serious cooperation with the compiler, the debugger and the OS (MSVC has this one easy as the compiler and debugger always come in one package), but... It still should be doable. I've even heard something about Apple having it implemented in their version of GCC [citation needed]. And I'd say it is indeed feasible.
Knowing all the hype about MSVC's E&C (my experience says it's the first thing MSVC users mention when asked "why not switch to Eclipse and gcc/gdb"), I'm seriously surprised that after quite some years GCC/GDB still doesn't have such feature. Are there any good reasons for that? Is someone working on it as we speak?
It is a surprisingly non-trivial amount of work, encompassing many design decisions and feature tradeoffs. Consider: you are debugging. The debugee is suspended. Its image in memory contains the object code of the source, and the binary layout of objects, the heap, the stacks. The debugger is inspecting its memory image. It has loaded debug information about the symbols, types, address mappings, pc (ip) to source correspondences. It displays the call stack, data values.
Now you want to allow a particular set of possible edits to the code and/or data, without stopping the debuggee and restarting. The simplest might be to change one line of code to another. Perhaps you recompile that file or just that function or just that line. Now you have to patch the debuggee image to execute that new line of code the next time you step over it or otherwise run through it. How does that work under the hood? What happens if the code is larger than the line of code it replaced? How does it interact with compiler optimizations? Perhaps you can only do this on a specially compiled for EnC debugging target. Perhaps you will constrain possible sites it is legal to EnC. Consider: what happens if you edit a line of code in a function suspended down in the call stack. When the code returns there does it run the original version of the function or the version with your line changed? If the original version, where does that source come from?
Can you add or remove locals? What does that do to the call stack of suspended frames? Of the current function?
Can you change function signatures? Add fields to / remove fields from objects? What about existing instances? What about pending destructors or finalizers? Etc.
There are many, many functionality details to attend to to make any kind of usuable EnC work. Then there are many cross-tools integration issues necessary to provide the infrastructure to power EnC. In particular, it helps to have some kind of repository of debug information that can make available the before- and after-edit debug information and object code to the debugger. For C++, the incrementally updatable debug information in PDBs helps. Incremental linking may help too.
Looking from the MS ecosystem over into the GCC ecosystem, it is easy to imagine the complexity and integration issues across GDB/GCC/binutils, the myriad of targets, some needed EnC specific target abstractions, and the "nice to have but inessential" nature of EnC, are why it has not appeared yet in GDB/GCC.
Happy hacking!
(p.s. It is instructive and inspiring to look at what the Smalltalk-80 interactive programming environment could do. In St80 there was no concept of "restart" -- the image and its object memory were always live, if you edited any aspect of a class you still had to keep running. In such environments object versioning was not a hypothetical.)
I'm not familiar with MSVC's E&C, but GDB has some of the things you've mentioned:
http://sourceware.org/gdb/current/onlinedocs/gdb/Altering.html#Altering
17. Altering Execution
Once you think you have found an error in your program, you might want to find out for certain whether correcting the apparent error would lead to correct results in the rest of the run. You can find the answer by experiment, using the gdb features for altering execution of the program.
For example, you can store new values into variables or memory locations, give your program a signal, restart it at a different address, or even return prematurely from a function.
Assignment: Assignment to variables
Jumping: Continuing at a different address
Signaling: Giving your program a signal
Returning: Returning from a function
Calling: Calling your program's functions
Patching: Patching your program
Compiling and Injecting Code: Compiling and injecting code in GDB
This is a pretty good reference to the old Apple implementation of "fix and continue". It also references other working implementations.
http://sources.redhat.com/ml/gdb/2003-06/msg00500.html
Here is a snippet:
Fix and continue is a feature implemented by many other debuggers,
which we added to our gdb for this release. Sun Workshop, SGI ProDev
WorkShop, Microsoft's Visual Studio, HP's wdb, and Sun's Hotspot Java
VM all provide this feature in one way or another. I based our
implementation on the HP wdb Fix and Continue feature, which they
added a few years back. Although my final implementation follows the
general outlines of the approach they took, there is almost no shared
code between them. Some of this is because of the architectual
differences (both the processor and the ABI), but even more of it is
due to implementation design differences.
Note that this capability may have been removed in a later version of their toolchain.
UPDATE: Dec-21-2012
There is a GDB Roadmap PDF presentation that includes a slide describing "Fix and Continue" among other bullet points. The presentation is dated July-9-2012 so maybe there is hope to have this added at some point. The presentation was part of the GNU Tools Cauldron 2012.
Also, I get it that adding E&C to GDB or anywhere in Linux land is a tough chore with all the different components.
But I don't see E&C as controversial. I remember using it in VB5 and VB6 and it was probably there before that. Also it's been in Office VBA since way back. And it's been in Visual Studio since VS2005. VS2003 was the only one that didn't have it and I remember devs howling about it. They intended to add it back anyway and they did with VS2005 and it's been there since. It works with C#, VB, and also C and C++. It's been in MS core tools for 20+ years, almost continuous (counting VB when it was standalone), and subtracting VS2003. But you could still say they had it in Office VBA during the VS2003 period ;)
And Jetbrains recently added it too their C# tool Rider. They bragged about it (rightly so imo) in their Rider blog.
I have a fairly complex (approx 200,000 lines of C++ code) application that has decided to crash, although it crashes a little differently on a couple of different systems. The trick is that it doesn't crash or trap out in debugger. It only crashes when the application .EXE is run independently (either the debug EXE or the release EXE - both behave the same way). When it crashes in the debug EXE, and I get it to start debugging, the call stack is buried down into the windows/MFC part of things, and isn't reflecting any of my code. Perhaps I'm seeing a stack corruption of some sort, but I'm just not sure at the moment. My question is more general - it's about tools and techniques.
I'm an old programmer (C and assembly language days), and a relative newcomer (couple/few years) to C++ and Visual Studio (2003 for this projecT).
Are there tricks or techniques anyone's had success with in tracking down crashing issues when you cannot make the software crash in a debugger session? Stuff like permission issues, for example?
The only thing I've thought of is to start plugging in debug/status messages to a logfile, but that's a long, hard way to go. Been there, done that. Any better suggestions? Am I missing some tools that would help? Is VS 2008 better for this kind of thing?
Thanks for any guidance. Some very smart people here (you know who you are!).
cheers.
lint.
C/C++ Free alternative to Lint?
I've not done C++ professionally for over 10 years, but back in the day I used Rational PurifyPlus, which will be a good start, as is BoundsChecker (if it still exists!) These products find out of bounds accesses, corrupted memory, corrupted stack and other problems that can go undetected until "boom" and then you have no idea where you are.
I would try these first. If that fails, then you can start typing in logging statements.
If the debugger mitigates the crash, this can be for these reasons:
memory corruption: under a debug build memory is allocated with space before an after, so rogue writes may not corrupt under a debug session
timing and multi-threading: the debugger alters timing of threads and can make tricky multi-threaded problems hard to nail down.
If it's memory corruption, a memory tracking/diagnostic tool (I used to use BoundsChecker to great effect in the good old days of C++) may help you to locate and fix the cause in minutes, where any other technique coud take days or even months.
For other cases, you've suggested another approach yourself: a sometimes labour-intensive but very effective approach to getting a "real" stack trace is to simply use printf - a vastly underrated debugging tool available in every environment. If you have a rough idea you can straddle the crash area with only a few log messages to narrow down the location, and then add more as you home in on the problem area. This can often unearth enough clues that you can isolate the cause of the crash in a few minutes, even though it can seem like a lot of work and perhaps a hopeless cause before you start.
edit:
Also, if you have the application under source control, then get a historical version from when you think it was working, and then do a binary chop between that date and "now" to isolate when the issue began to occur. This can often narrow down a bug to the precise checkin that introduced the bug, and if you're lucky it will point you at a few lines of code. (If you're unlucky the bug won't be so easily repeatable, or you'll narrow it down to a 500-file checkin where a major refactoring or similar took place)
Get the debugging tool kit from MS ( http://www.microsoft.com/whdc/devtools/debugging/default.mspx ).
Set adplus up for crash mode monitoring ( http://www.microsoft.com/whdc/devtools/debugging/default.mspx ).
This should get you a crash dump when the app crashes. Load the dump up in WindDbg from the debugging toolkit and analyze using that. It is a painful, but very powerful, process to anaylyze out-of-debugger crashes.
There are quite a few resources around for using WinDbg - a good book on general Windows unmanaged debugging and the tools in the debugging kits is: http://www.amazon.com/Advanced-Windows-Debugging-ebook/dp/B000XPNUMW
I couldn't recommend more the blog of Mark Rusinovich. Absolutely brilliant guy from whom you can learn a whole bunch of debugging techniques for windows and many more. Especially try read some of the "The Case of" series! Amazing stuff!
For example take a look at this case he had investigated - a crash of IE. He shows how to capture the stack of the failing thread and many more interesting stuff. His main tools are windows debugging tools and also his sysinternals tools!
Enough said. Go read it!
Also I would recommend the book: Windows Internals 5. Again by Mark and company.
Might be that you have a too big object on the stack...
Explainations (from comments):
I gives this answer because that's the only case I've seen that a debuger (VS or CodeWarrior) couldn't catch and seeemed mysterious. Most of the time, that was the big application object that was defined on the stack in the main() function, and having members not allocated on the heap. Just calling new to instantiate the object fixed the obscure problem. Didn't need to get a specific tool for that in the end.
My experience is that sometime indeed program launched by the debugger (release or debug mode) don't crash as they do when launch on their own.
But I don't recall a case when the very same program launch on it's own, and then attached and continued through a debugger don't reproduce the crash.
An other and better approach if the crash doesn't always happens, would be to be able to produce a minidump (equivalent of unix coredump) and do a postmortem analysis,
there are plenty of tools on windows to do that, for example look at:
http://www.codeproject.com/KB/debug/postmortemdebug_standalone1.aspx?df=100&forumid=3419&exp=0&select=1114393
(perhaps someone may have a better link that this one).
I have been working for the last few weeks trying to track down a really difficult bug that crashes my application. First, the application was crashing on the assign of a std::string, then during the free of a local variable.
After careful inspection of the code, there was no reason for it to crash at these locations; however, it always crashed while trying to free an invalid pointer (i.e. a pointer that pointed to invalid memory). And I have no idea why this pointer was not pointing to the right location.
I suspect that the issue has to do with a memory corruption problem or pointer corruption problem of some sort. The problem is that I can't visually track it down....yet. I have no idea where to start looking in the code, and there are thousands of lines of code to go through so this does not seem like a realistic approach to the problem.
So in comes Valgrind...
A tool that I have depended upon many a time to find issues within the code that may lead to a crash of this type. However, this time it has come up empty handed! I do not see any errors in valgrind when the problem occurs and so hence the reason for me asking this question.
Are there any other applications that can complement valgrind and help find issues in the code that may cause a crash mentioned above?
Thanks!
I assume you're using valgrind's memcheck tool, which is what it is famous for. Since you are using valgrind already you might also try running your program through valgrind --tool=exp-sgcheck (formerly exp-ptrcheck), which is an experimental tool that is designed to catch certain types of errors that memcheck will miss, including access checks for stack and global arrays, and use of pointers that happen to point to a valid object but not the object that was intended. It does this by using a completely different mechanism, essentially tracking each pointer into memory rather than tracking the memory itself, and through use of heuristics.
Be aware that the tool is experimental, but you may find that it catches something significant. Currently it does not yet support OS X or non-Intel processors.
In my experience, coverity and purify have founds such kind of errors than valgrind didn't (in fact all found problems which weren't seen by the others).
But sometimes no tool give an hint and you have to dig more, add instrumentation, play with breakpoints on "modify memory at address", try to simply the testcase which fails and so on to find out the root cause. That's can be very painful.
My experience is that often this sort of problem is caused by a heap overflow. Electric Fence is a relatively simple allocation debugging tool I like to use. Its main use is as a dynamic analysis tool to check for heap overflows, a complement to "-fstack-protector-all" which checks for stack overflows.
More links to efence stuff.
Is it possible some stack corruption is occurring? If so, try enabling stack canaries with the -fstack-protector-all option, assuming you are using g++.
Other than that, have you cranked up warning flags to help identify suspicious code?
In my opinion, using a debugger with "reverse debugging" capabilities could help.
You would be able to step back in time and hopefully find out what was the real source of the problem.
Here are a couple of links:
http://www.gnu.org/software/gdb/news/reversible.html
http://undo-software.com/ (which apparently is free for non-commercial applications)
You didn't specify the platform, but I can recommend Gimpel PC-lint as an excellent static analysis tool (don't be fooled by the name!). They also offer FlexeLint for other platforms, but I have no personal experience of that product.
Have you tried using lint, flexlint or cppcheck. These may help identify a problem.
If you know what area of memory is being corrupted have you tried marking this memory as protected. This may mask your problem and not help at all but if it still crashes the point at where the memory is modified will help resolve your problem.
If valgrind can identify the bad pointer being passed to free(), you could try running the program under DDD, which can set a hardware watchpoing on the memory location and halt the program when it is getting a bad value. If the pointer is getting changed a lot you may have to write some code around malloc and free to keep track of which values are good and bad.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
We are producing a portable code (win+macOs) and we are looking at how to make the code more rubust as it crashes every so often... (overflows or bad initializations usually) :-(
I was reading that Google Chrome uses a process for every tab so if something goes wrong then the program does not crash compleatelly, only that tab. I think that is quite neat, so i might give it a go!
So i was wondering if someone has some tips, help, reading list, comment, or something that can help me build more rubust c++ code (portable is always better).
In the same topic i was also wondering if there is a portable library for processes (like boost)?
Well many Thanks.
I've developed on numerous multi-platform C++ apps (the largest being 1.5M lines of code and running on 7 platforms -- AIX, HP-UX PA-RISC, HP-UX Itanium, Solaris, Linux, Windows, OS X). You actually have two entirely different issues in your post.
Instability. Your code is not stable. Fix it.
Use unit tests to find logic problems before they kill you.
Use debuggers to find out what's causing the crashes if it's not obvious.
Use boost and similar libraries. In particular, the pointer types will help you avoid memory leaks.
Cross-platform coding.
Again, use libraries that are designed for this when possible. Particularly for any GUI bits.
Use standards (e.g. ANSI vs gcc/MSVC, POSIX threads vs Unix-specific thread models, etc) as much as possible, even if it requires a bit more work. Minimizing your platform specific code means less overall work, and fewer APIs to learn.
Isolate, isolate, isolate. Avoid in-line #ifdefs for different platforms as much as possible. Instead, stick platform specific code into its own header/source/class and use your build system and #includes to get the right code. This helps keep the code clean and readable.
Use the C99 integer types if at all possible instead of "long", "int", "short", etc -- otherwise it will bite you when you move from a 32-bit platform to a 64-bit one and longs suddenly change from 4 bytes to 8 bytes. And if that's ever written to the network/disk/etc then you'll run into incompatibility between platforms.
Personally, I'd stabilize the code first (without adding any more features) and then deal with the cross-platform issues, but that's up to you. Note that Visual Studio has an excellent debugger (the code base mentioned above was ported to Windows just for that reason).
The Chrome answer is more about failure mitigation and not about code quality. Doing what Chrome is doing is admitting defeat.
Better QA that is more than just programmer testing their own work.
Unit testing
Regression testing
Read up on best practices that other
companies use.
To be blunt, if your software is crashing often due to overflows and bad initializations, then you have a very basic programming quality problem that isn't going to be easily fixed. That sounds a hash and mean, that isn't my intent. My point is that the problem with the bad code has to be your primary concern (which I'm sure it is). Things like Chrome or liberal use to exception handling to catch program flaw are only distracting you from the real problem.
You don't mention what the target project is; having a process per-tab does not necessarily mean more "robust" code at all. You should aim to write solid code with tests regardless of portability - just read about writing good C++ code :)
As for the portability section, make sure you are testing on both platforms from day one and ensure that no new code is written until platform-specific problems are solved.
You really, really don't want to do what Chrome is doing, it requires a process manager which is probably WAY overkill for what you want.
You should investigate using smart pointers from Boost or another tool that will provide reference counting or garbage collection for C++.
Alternatively, if you are frequently crashing you might want to perhaps consider writing non-performance critical parts of your application in a scripting language that has C++ bindings.
Scott Meyers' Effective C++ and More Effective C++ are very good, and fun to read.
Steve McConnell's Code Complete is a favorite of many, including Jeff Atwood.
The Boost libraries are probably an excellent choice. One project where I work uses them. I've only used WIN32 threading myself.
I agree with Torlack.
Bad initialization or overflows are signs of poor quality code.
Google did it that way because sometimes, there was no way to control the code that was executed in a page (because of faulty plugins, etc.). So if you're using low quality plug ins (it happens), perhaps the Google solution will be good for you.
But a program without plugins that crashes often is just badly written, or very very complex, or very old (and missing a lot of maintenance time). You must stop the development, and investigate each and every crash. On Windows, compile the modules with PDBs (program databases), and each time it crashes, attach a debugger to it.
You must add internal tests, too. Avoid the pattern:
doSomethingBad(T * t)
{
if(t == NULL) return ;
// do the processing.
}
This is very bad design because the error is there, and you just avoid it, this time. But the next function without this guard will crash. Better to crash sooner to be nearer from the error.
Instead, on Windows (there must be a similar API on MacOS)
doSomethingBad(T * t)
{
if(t == NULL) ::DebugBreak() ; // it will call the debugger
// do the processing.
}
(don't use this code directly... Put it in a define to avoid delivering it to a client...)
You can choose the error API that suits you (exceptions, DebugBreak, assert, etc.), but use it to stop the moment the code knows something's wrong.
Avoid the C API whenever possible. Use C++ idioms (RAII, etc.) and libraries.
Etc..
P.S.: If you use exceptions (which is a good choice), don't hide them inside a catch. You'll only make your problem worse because the error is there, but the program will try to continue and will probably crash sometimes after, and corrupt anything it touches in the mean time.
You can always add exception handling to your program to catch these kinds of faults and ignore them (though the details are platform specific) ... but that is very much a two edged sword. Instead consider having the program catch the exceptions and create dump files for analysis.
If your program has behaved in an unexpected way, what do you know about your internal state? Maybe the routine/thread that crashed has corrupted some key data structure? Maybe if you catch the error and try to continue the user will save whatever they are working on and commit the corruption to disk?
Beside writing more stable code, here's one idea that answers your question.
Whether you are using processes or threads. You can write a small / simple watchdog program. Then your other programs register with that watchdog. If any process dies, or a thread dies, it can be restarted by the watchdog. Of course you'll want to put in some test to make sure you don't keep restarting the same buggy thread. ie: restart it 5 times, then after the 5th, shutdown the whole program and log to file / syslog.
Build your app with debug symbols, then either add an exception handler or configure Dr Watson to generate crash dumps (run drwtsn32.exe /i to install it as the debugger, without the /i to pop the config dialog). When your app crashes, you can inspect where it went wrong in windbg or visual studio by seeing a callstack and variables.
google for symbol server for more info.
Obviously you can use exception handling to make it more robust and use smart pointers, but fixing the bugs is best.
I would recommend that you compile up a linux version and run it under Valgrind.
Valgrind will track memory leaks, uninitialized memory reads and many other code problems. I highly recommend it.
After over 15 years of Windows development I recently wrote my first cross-platform C++ app (Windows/Linux). Here's how:
STL
Boost. In particular the filesystem and thread libraries.
A browser based UI. The app 'does' HTTP, with the UI consisting of XHTML/CSS/JavaScript (Ajax style). These resources are embedded in the server code and served to the browser when required.
Copious unit testing. Not quite TDD, but close. This actually changed the way I develop.
I used NetBeans C++ for the Linux build and had a full Linux port in no time at all.
Build it with the idea that the only way to quit is for the program to crash and that it can crash at any time. When you build it that way, crashing will never/almost never lose any data. I read an article about it a year or two ago. Sadly, I don't have a link to it.
Combine that with some sort of crash dump and have it email you it so you can fix the problem.