So I have some code that looks like this, written in and compiled with Visual Studio 2010:
if ( outputFile.is_open() )
{
outputFile.close();
}
if ( !outputFile.is_open() ) // condition for sanity-checking
{
outputFile.open("errorOut.txt", ios::out);
}
This crashes on an access violation. Attaching a debugger shows that the first condition is false (outputFile is not open), the second condition is true (outputFile is closed, which is good since I just checked it). Then open() gets called, and eventually locale::getloc() attempts to dereference a null pointer, but I have no idea why that would be happening (since that's now three classes deep into the Standard Library).
Interestingly, the file "errorOut.txt" does get created, even though the open call crashes.
I've spent a few hours watching this in a debugger, but I honestly have no idea what's going on. Anyone have any ideas for even trying to determine what's wrong with the code? It's entirely possible that some code elsewhere is contributing to this situation (inherited code), but there's a lot of it and I don't even know where to look. Everything up to that point seems fine.
OK, I'm not really sure if this is the best way to handle this, but since this involved some truly strange behavior (crashing in the middle of an STL function, and some other oddities like hanging on exit(1); and the like), I'll leave an explanation here for the future.
In our case, the error seemed to derive from some memory corruption going on in some truly awful code that we inherited. Cleaning up the code in general eliminated this crash and other strange behaviors displayed by the program.
I don't know if this will be useful to anyone; maybe it would have been better to simply delete the question. I'm actually somewhat curious if I should have, if anyone wants to leave a comment.
Related
I dislike pointers, and generally try to write as much code as I can using refs instead.
I've written a very rudimentary "vertical layout" system for a small Win32 app. Most of the Layout methods look like this:
void Control::DoLayout(int availableWidth, int &consumedYAmt)
{
textYPosition = consumedYAmt;
consumedYAmt += measureText(font, availableWidth);
}
They are looped through like so:
int innerYValue = 0;
foreach(control in controls) {
control->DoLayout(availableWidth, innerYValue);
}
int heightOfControl = innerYValue;
It's not drawing its content here, just calculating exactly how much space this control will require (usually it's adding padding too, etc). This has worked great for me.......in debug mode.
I found that in Release mode, I could suddenly see tangible, loggable issues where, when I'm looping through controls and calling DoLayout(), the consumedYAmt variable actually stays at 0 in the outside loop. The most annoying part is that if I put in breakpoints and walk through the code line by line, this stops happening and parts of it are properly updated by the inside "add" methods.
I'm kind of thinking about whether this would be some compiler optimization where they think I'm simply adding the ref flag to ints as a way to optimize memory; or if there's any possibility this actually works in a way different from how it seems.
I would give a minimum reproducible example, but I wasn't able to do so with a tiny commandline app. I get the sense that if this is an optimization, it only kicks in for larger code blocks and indirections.
EDIT: Again sorry for generally low information, but I'm now getting hints that this might be some kind of linker issue. I skipped one part of the inheritance model in my pseudocode: The calling class actually calls "Layout()", which is a non-virtual function on the root definition of the class. This function performs some implementation-neutral logic, and then calls DoLayout() with the same arguments. However, I'm now noticing that if I try adding a breakpoint to Layout(), Visual Studio claims that "The breakpoint will not be hit. No executable code of the debugger's target code type is associated with this line." I am able to add breakpoints to certain other lines, but I'm beginning to notice weird stepping logic where it refuses to go inside certain functions, like Layout. Already tried completely clearing the build folders and rebuilding. I'm going to have to keep looking, since I have to admit this isn't a lot to go on.
Also, random addition: The "controls" list is a vector containing shared_ptr objects. I hadn't suspected the looping mechanism previously but now I'm looking more closely.
"the consumedYAmt variable actually stays at 0"
The behavior you describe is typical for a specific optimization that's more due to the CPU than the compiler. I suspect you're logging consumedYAmt from another thread. The updates to consumedYAmt simply don't make it to that other thread.
This is legal for the CPU, because the C++ compiler didn't put in memory fences. And the CPU compiler didn't put in fences because the variable isn't atomic.
In a small program without threads, this simply doesn't show up, nor does it show in debug mode.
Written by OP
Okay, eventually figured this one out. As simple as the issue was, pinning it down became difficult because of Release mode's debugger seemingly acting in inconsistent ways. When I changed tactic to adding Logging statements in lots of places, I found that my Control class had an "mShowing" variable that was uninitialized in its constructor. In debug mode, it apparently retained uninitialized memory which I guess made it "true" - but in release mode, my best analysis is that memory protections made it default to "false", which as it turns out skipped the main body of the DoLayout method most of the time.
Since through the process, responders were low on information to work with (certainly could've been easier if I posted a longer example), I instead simply upvoted each comment that mentioned uninitialized variables.
I'm currently dealing with one of the strangest bugs I have ever seen. I have this "else if" statement and inside the else-if I have a line of code that is causing a bug to happen elsewhere (my program is kind of complicated so I don't think it would help to post a short snippet of code here because it would be too difficult to explain -- so I apologize in advance if this post seems rather vague).
The issue is that the line of code that is causing the bug is not being called at all. I put a break point at the line and also put a print statement before it but the program never enters that particular "if-else" statement. The bug goes away when I comment out the line and shows up again when I uncomment it. This leads me to believe that the line must be getting called somehow but my break point and prints suggest otherwise.
Has anyone ever heard of something like this happening? Why would a line of code that is not even being called affect the rest of my program? Are there other ways to detect if the line is being called somehow besides using breakpoints and print statements?
I'm using XCode as my IDE and this is a single threaded program (so it's not some weird asynchronous bug)
PROBLEM HAS BEEN SOLVED. SEE TOP ANSWER
It actually may happen in some cases indeed and I already saw it before. If the bug is some slight buffer overflow then the presence/abasence of that line may make the compiler differently optimize the memory layout (ie. not allocate some variables or arrange them in a different way or place segments differently for example) that will by chance not trigger the problem anymore.
The same applies if the bug is a strange race condition: the lack of that line may change slightly the timings (due to differently optimized code) and make the bug come out.
Very long shot: that code may even somehow trigger a compiler bug. But this may be less the case, but it may.
So: yes it's definitely possible and I already saw it. And if something like this is happening to you and you're 100% sure your code is correct then be very careful since something quite nasty may be hiding in the code.
Not that there is enough information here to really answer this question, but perhaps I can try to provide some pointers.
Optimization. Try turning it off if it is turned on at all. When the compiler is optimizing your code, it may make different decisions when a statement is present vs. when it isn't. Whenever I am dealing with bugs that go away when a a seemingly unrelated code construct is changed, it is usually that the optimizer is doing something differently. I suppose a very simple example would be if the statement accesses something that the compiler would otherwise think is never accessed and can be folded away. The "unrelated" line may take an address of something which affects aliasing information, etc. All this isn't to say that the optimizer is wrong, your code likely still does have a bug, but it just explains the weird behaviour.
Debugging. It is very unlikely that the bug is in this line that is not reached. What I would focus on is setting watch points for the variables that are getting incorrect values (assuming you can narrow it down to what is receiving the wrong value).
These very weird issues often indicate an uninitialized variable or something along those lines. If the underlying compiler supports it, you can try providing an option that will cause the program to trap when an uninitialized memory location is being accessed.
Finally, this could be an indication that something is overwriting areas of the stack (and perhaps less likely other memory areas - heap, data). If you have anything that is writing to arrays allocated on the stack, check if you're walking past the end of the array.
Make sure you're using { and } around the bodies in your if() and else clauses. Without them it's easy to wind up with something like this:
if (a)
do_a_work();
if (b)
do_a_and_b_work();
else
do_not_a_work();
Which actually equates to this (and is not what's implied by the indenting):
if (a) {
do_a_work();
}
if (b) {
do_a_and_b_work();
} else {
do_not_a_work();
}
because the brace-less if (a) only takes the first statement below it for its body.
Commenting out your mystery line may be changing which code belongs to which if-else.
I get a segmentation fault when attempting to delete this.
I know what you think about delete this, but it has been left over by my predecessor. I am aware of some precautions I should take, which have been validated and taken care of.
I don't get what kind of conditions might lead to this crash, only once in a while. About 95% of the time the code runs perfectly fine but sometimes this seems to be corrupted somehow and crash.
The destructor of the class doesn't do anything btw.
Should I assume that something is corrupting my heap somewhere else and that the this pointer is messed up somehow?
Edit : As requested, the crashing code:
long CImageBuffer::Release()
{
long nRefCount = InterlockedDecrement(&m_nRefCount);
if(nRefCount == 0)
{
delete this;
}
return nRefCount;
}
The object has been created with a new, it is not in any kind of array.
The most obvious answer is : don't delete this.
If you insists on doing that, then use common ways of finding bugs :
1. use valgrind (or similar tool) to find memory access problems
2. write unit tests
3. use debugger (prepare for loooong staring at the screen - depends on how big your project is)
It seems like you've mismatched new and delete. Note that delete this; can only be used on an object which was allocated using new (and in case of overridden operator new, or multiple copies of the C++ runtime, the particular new that matches delete found in the current scope)
Crashes upon deallocation can be a pain: It is not supposed to happen, and when it happens, the code is too complicated to easily find a solution.
Note: The use of InterlockedDecrement have me assume you are working on Windows.
Log everything
My own solution was to massively log the construction/destruction, as the crash could well never happen while debugging:
Log the construction, including the this pointer value, and other relevant data
Log the destruction, including the this pointer value, and other relevant data
This way, you'll be able to see if the this was deallocated twice, or even allocated at all.
... everything, including the stack
My problem happened in Managed C++/.NET code, meaning that I had easy access to the stack, which was a blessing. You seem to work on plain C++, so retrieving the stack could be a chore, but still, it remains very very useful.
You should try to load code from internet to print out the current stack for each log. I remember playing with http://www.codeproject.com/KB/threads/StackWalker.aspx for that.
Note that you'll need to either be in debug build, or have the PDB file along the executable file, to make sure the stack will be fully printed.
... everything, including multiple crashes
I believe you are on Windows: You could try to catch the SEH exception. This way, if multiple crashes are happening, you'll see them all, instead of seeing only the first, and each time you'll be able to mark "OK" or "CRASHED" in your logs. I went even as far as using maps to remember addresses of allocations/deallocations, thus organizing the logs to show them together (instead of sequentially).
I'm at home, so I can't provide you with the exact code, but here, Google is your friend, but the thing to remember is that you can't have a __try/__except handdler everywhere (C++ unwinding and C++ exception handlers are not compatible with SEH), so you'll have to write an intermediary function to catch the SEH exception.
Is your crash thread-related?
Last, but not least, the "I happens only 5% of the time" symptom could be caused by different code path executions, or the fact you have multiple threads playing together with the same data.
The InterlockedDecrement part bothers me: Is your object living in multiple threads? And is m_nRefCount correctly aligned and volatile LONG?
The correctly aligned and LONG part are important, here.
If your variable is not a LONG (for example, it could be a size_t, which is not a LONG on a 64-bit Windows), then the function could well work the wrong way.
The same can be said for a variable not aligned on 32-byte boundaries. Is there #pragma pack() instructions in your code? Does your projet file change the default alignment (I assume you're working on Visual Studio)?
For the volatile part, InterlockedDecrement seem to generate a Read/Write memory barrier, so the volatile part should not be mandatory (see http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx).
Sorry if this sounds like an "It compiles, so it must work!" question, but I want to understand why something is happening (or not happening, as the case may be).
In Project Settings, I set Basic Runtime Checks to Both. The debugger informs me that:
Run-Time Check Failure #2 - Stack around the variable 'beg' was corrupted.
But if I set it to the default, which is none, the program runs and completes normally, throwing no exceptions and causing no errors.
My question is, can I safely ignore this (because MSVC++ could be somehow wrong) or is this a real problem? I don't see how the program can continue successfully when the stack has been screwed up.
Edit:
The function that causes this error looks exactly like this:
int fun(list<int>::iterator&, const list<int>::iterator&);
int foo(list<int>& l) {
list<int>::iterator beg = l.begin();
list<int>::iterator end = l.end();
return fun(beg, end);
}
fun increments and operates on beg and when it returns, beg == end, and when MSVC++ breaks, it points to the closing }.
Edit 2:
I have isolated the problem. In some situations, fun removes some elements from the list who owns the items it iterates. This is what causes the error.
Your question isn't answerable without code to reproduce the problem.
But to give a vague answer to your general problem - If the compiler or debugger detected a problem, you probably have one.
In C++, just because something "goes wrong" doesn't mean your program will crash - it might keep running with completely unpredictable results. It may even complete with the results you desired. But just because it ran well on your system doesn't give you any guarantee for other systems, compilers, times of day, or even for additional runs of the same program.
This is called undefined behavior, and is caused by using the language incorrectly (but not in a way that causes a compile failure). A buffer overrun is only one of dozens of examples.
It turned out something was wrong with my Visual Studio installation, so reinstalling it fixed the problem.
I'm changing over from Visual Studio 2008 -> 2010 and I've come across a weird bug in my code when evaluating a find on a std::set of pointers.
I know that this version brings about a change where set::iterator has the same type as set::const_iterator to bring about some compatability with the standard. But I can't figure out why this section of code which previously worked now causes a crash?
void checkStop(Stop* stop)
{
set<Stop*> m_mustFindStops;
if (m_mustFindStops.find(stop) != m_mustFindStops.end()) // this line crashes for some reason??
{
// do some stuff
}
}
PS m_mustFindStops is empty when it crashes.
EDIT: Thanks for the quick replies... I can't get it to reproduce with a simple case either - it's probably not a problem with the set its self. I think that heap corruption may be a culprit - I just wish I knew why changing compilers would suddenly cause corruption for the same code and same input data.
The only thing I can think of is that you have multiple threads, and m_mustfindStops is in fact a member or global variable and not a local to this function. There is no way the code above can cause problems, if correct and taken in isolation.
If you have multiple threads, then read access concurrent with write access will cause random errors - even if the container looks empty, it might not have been when the find call started.
Another possibility is that some other code has corrupted the heap, in which case however any of your code that uses heap memory could malfunction. With that in mind, if it's always this logic that breaks, my bet would be on a threading issue.
btw - there is absolutely nothing wrong with std::set in Visual C++ v10 - your code must have a bug.