I'm moving from Visual Studio 2008 to 2010 and I've come across a weird bug in my code when evaluating a find on a std::set of pointers.
I know this version changes set::iterator to be the same type as set::const_iterator for compatibility with the standard, but I can't figure out why this section of code, which previously worked, now causes a crash?
set<Stop*> m_mustFindStops; // member variable, shown here for context

void checkStop(Stop* stop)
{
    if (m_mustFindStops.find(stop) != m_mustFindStops.end()) // this line crashes for some reason??
    {
        // do some stuff
    }
}
PS m_mustFindStops is empty when it crashes.
EDIT: Thanks for the quick replies... I can't get it to reproduce with a simple case either, so it's probably not a problem with the set itself. I suspect heap corruption may be the culprit; I just wish I knew why changing compilers would suddenly cause corruption with the same code and the same input data.
The only thing I can think of is that you have multiple threads, and m_mustFindStops is in fact a member or global variable rather than a local to this function. Taken in isolation, the code above cannot cause problems if it is correct.
If you have multiple threads, then read access concurrent with write access will cause random errors - even if the container looks empty, it might not have been when the find call started.
Another possibility is that some other code has corrupted the heap, in which case any of your code that uses heap memory could malfunction. With that in mind, if it's always this logic that breaks, my bet would be on a threading issue.
btw - there is absolutely nothing wrong with std::set in Visual C++ v10 - your code must have a bug.
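If threading is indeed the issue, a minimal sketch of the usual fix, assuming a C++11 <mutex> is available (VS2010 predates it; a Win32 CRITICAL_SECTION plays the same role). The mutex name is invented:

#include <mutex>
#include <set>

struct Stop; // forward declaration; pointers are all the set needs

std::mutex stopsMutex;           // assumed name; guards m_mustFindStops
std::set<Stop*> m_mustFindStops;

void checkStop(Stop* stop)
{
    std::lock_guard<std::mutex> lock(stopsMutex); // serialize find against insert/erase
    if (m_mustFindStops.find(stop) != m_mustFindStops.end())
    {
        // do some stuff
    }
}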
I dislike pointers, and generally try to write as much code as I can using refs instead.
I've written a very rudimentary "vertical layout" system for a small Win32 app. Most of the Layout methods look like this:
void Control::DoLayout(int availableWidth, int &consumedYAmt)
{
    textYPosition = consumedYAmt;
    consumedYAmt += measureText(font, availableWidth);
}
They are looped through like so:
int innerYValue = 0;
for (auto it = controls.begin(); it != controls.end(); ++it) {
    (*it)->DoLayout(availableWidth, innerYValue);
}
int heightOfControl = innerYValue;
It's not drawing its content here, just calculating exactly how much space this control will require (usually it's adding padding too, etc.). This has worked great for me... in debug mode.
I found that in Release mode I could suddenly see tangible, loggable issues: as I loop through the controls and call DoLayout(), the consumedYAmt variable stays at 0 in the outer loop. The most annoying part is that if I put in breakpoints and walk through the code line by line, this stops happening and parts of it are properly updated by the inner "add" logic.
I'm wondering whether this is some compiler optimization - as if the compiler assumes the int reference is merely a memory-saving device - or whether there's any possibility that references actually work differently from how they seem to.
I would give a minimal reproducible example, but I wasn't able to produce one with a tiny command-line app. I get the sense that if this is an optimization, it only kicks in for larger code blocks and indirections.
EDIT: Again, sorry for the generally low information, but I'm now getting hints that this might be some kind of linker issue. I skipped one part of the inheritance model in my pseudocode: the calling class actually calls Layout(), a non-virtual function on the root definition of the class, which performs some implementation-neutral logic and then calls DoLayout() with the same arguments. However, I'm now noticing that if I try to add a breakpoint to Layout(), Visual Studio claims that "The breakpoint will not be hit. No executable code of the debugger's target code type is associated with this line." I can add breakpoints to certain other lines, but I'm beginning to notice weird stepping behavior where the debugger refuses to go inside certain functions, like Layout. I've already tried completely clearing the build folders and rebuilding. I'm going to have to keep looking, since I admit this isn't a lot to go on.
Also, random addition: The "controls" list is a vector containing shared_ptr objects. I hadn't suspected the looping mechanism previously but now I'm looking more closely.
"the consumedYAmt variable actually stays at 0"
The behavior you describe is typical for a specific optimization that's more due to the CPU than the compiler. I suspect you're logging consumedYAmt from another thread. The updates to consumedYAmt simply don't make it to that other thread.
This is legal for the CPU because the C++ compiler didn't put in memory fences, and the compiler didn't put in fences because the variable isn't atomic.
In a small program without threads this simply doesn't show up, nor does it show up in debug mode.
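If another thread really is reading the running total, a minimal sketch of the fix, assuming a C++11-capable toolchain (VS2010 itself predates std::atomic); the function names are invented:

#include <atomic>

// A plain 'int consumedYAmt' carries no cross-thread visibility guarantee;
// std::atomic does, with no hand-written fences needed at this level.
std::atomic<int> consumedYAmt(0);

void addConsumed(int amount)        // called from the layout thread
{
    consumedYAmt.fetch_add(amount); // the update is published to other threads
}

int readConsumed()                  // safe to call from a logging thread
{
    return consumedYAmt.load();
}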
Written by OP
Okay, I eventually figured this one out. As simple as the issue was, pinning it down was difficult because Release mode's debugger seemingly acted in inconsistent ways. When I switched tactics to adding logging statements in lots of places, I found that my Control class had an mShowing variable that was never initialized in its constructor. In debug mode the uninitialized memory apparently read as "true" (the debug build's fill pattern is nonzero), but in release mode the leftover value evidently came out "false" most of the time, which skipped the main body of the DoLayout method.
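A minimal reconstruction of that kind of bug, with names taken from the post and everything else assumed:

class Control
{
    bool mShowing; // never set in the original constructor
public:
    Control() {}   // the fix was the missing initializer: Control() : mShowing(true) {}

    void DoLayout(int availableWidth, int &consumedYAmt)
    {
        if (!mShowing)
            return; // a garbage value here silently skips the layout work
        // ... measure text, advance consumedYAmt, etc. ...
    }
};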
Since the responders had little information to work with throughout (it certainly would have been easier if I had posted a longer example), I simply upvoted each comment that mentioned uninitialized variables.
I've run into a strange problem. The Release version of my application seems to run fine, but recently when I switched to the Debug version, I got an access violation immediately on start-up. The access violation is occurring when a block of allocated memory is freed. All of this occurs in the constructor for a static variable.
I believe the problem doesn't occur in the Release version simply because NDEBUG is defined there, which disables assertions in the C runtime.
I've been able to narrow things down a bit. If I add the following code to the constructor before the usual calls, then I get the same error:
int *temp = new int[3];
delete[] temp;
This makes me think that something outside of this block of code is causing the problem, e.g., perhaps there is a problem with the way the C runtime is being linked. However, I'm at a loss to say what that problem might be, and after a day of poking at the problem I'm running out of ideas for where to poke next.
Any help would be greatly appreciated. I am using Visual Studio 2010 to compile the application and running Windows 7.
Thanks so much!
In Debug mode, additional checks are added, so it's not unusual for a program to run perfectly well in Release mode but give an access violation in Debug mode. This doesn't mean the Release version is OK; it only means that some error is not caught when the Release version runs but is caught in Debug mode.
Debugging a corrupted-memory problem in C/C++ is very hard because the error can be caused by any other instruction affecting memory. For example, if you have two arrays that follow one another in allocated memory and the first array is overrun, the overrun corrupts the header placed before the second array (each memory allocation is prefixed by a header, which the delete and delete[] operators use when deallocating the memory). However, the access violation will occur only when you try to deallocate the second array, even though the error in the code involves the first array.
Of course, you can have other problems with the second array. For example, you might find that some or all of its values have been corrupted when you try to read from it. But that's not always the case; on many occasions both arrays might behave perfectly well for reads and writes. Reading and writing an array without apparent problems doesn't prove you aren't overstepping its boundary and corrupting the memory above (or below) it. Sometimes the problem only shows up when you try to deallocate the array; other times it shows up differently, for example as the display of corrupted values.
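A contrived sketch of that failure mode (the exact symptom depends on the allocator and build settings):

void illustrateOverrun()
{
    int *first  = new int[4];
    int *second = new int[4];

    first[4] = 42;    // out-of-bounds write: may stomp the heap bookkeeping

    delete[] second;  // the debug CRT may report the corruption here...
    delete[] first;   // ...far from the line that actually caused it
}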
I was able to produce a minimal example by cutting away essentially all of my application code. I reduced the InitInstance function to the following:
BOOL CTempApp::InitInstance()
{
    int *temp = new int[3];
    delete[] temp;
    return FALSE;
}
Even with all of my application code stripped away, the problem persisted. After this, I created a new project in Visual Studio 2010, once again replaced InitInstance with my minimal version, and then added all of the Additional Dependencies from my original project in the linker options. With this configuration, the problem was reproduced.
Next, I started removing libraries from the list of dependencies. I managed to whittle the list down to a single third-party library which was causing the following linker warning despite being labeled as the debug version:
LINK : warning LNK4098: defaultlib 'msvcrtd.lib' conflicts with use of other libs; use /NODEFAULTLIB:library
I think what's happening here is that the vendor has linked his debug library against the non-debug runtime. Each runtime maintains its own heap, so presumably when my application calls delete[], the call can end up routed through the wrong runtime, which then tries to free memory that its heap never allocated.
I have tried adding the following to my Ignore Specific Default Libraries list:
libc.lib, libcmt.lib, msvcrt.lib, libcd.lib, libcmtd.lib
as suggested here. However, the problem persists. I think the solution will be to contact the vendor and have them resolve the conflict.
Thanks again for your suggestions. I hope this answer will help someone else with debugging a similar problem.
So I have some code that looks like this, written in and compiled with Visual Studio 2010:
if ( outputFile.is_open() )
{
    outputFile.close();
}

if ( !outputFile.is_open() ) // condition for sanity-checking
{
    outputFile.open("errorOut.txt", ios::out);
}
This crashes on an access violation. Attaching a debugger shows that the first condition is false (outputFile is not open), the second condition is true (outputFile is closed, which is good since I just checked it). Then open() gets called, and eventually locale::getloc() attempts to dereference a null pointer, but I have no idea why that would be happening (since that's now three classes deep into the Standard Library).
Interestingly, the file "errorOut.txt" does get created, even though the open call crashes.
I've spent a few hours watching this in a debugger, but I honestly have no idea what's going on. Anyone have any ideas for even trying to determine what's wrong with the code? It's entirely possible that some code elsewhere is contributing to this situation (inherited code), but there's a lot of it and I don't even know where to look. Everything up to that point seems fine.
OK, I'm not really sure if this is the best way to handle this, but since it involved some truly strange behavior (crashing in the middle of an STL function, hanging on exit(1);, and other oddities), I'll leave an explanation here for the future.
In our case, the error seemed to derive from some memory corruption going on in some truly awful code that we inherited. Cleaning up the code in general eliminated this crash and other strange behaviors displayed by the program.
I don't know if this will be useful to anyone; maybe it would have been better to simply delete the question. I'm actually somewhat curious if I should have, if anyone wants to leave a comment.
I get a segmentation fault when attempting to delete this.
I know what you think about delete this, but it has been left over by my predecessor. I am aware of some precautions I should take, which have been validated and taken care of.
I can't figure out what conditions might lead to this crash, since it happens only once in a while. About 95% of the time the code runs perfectly fine, but sometimes this seems to be corrupted somehow and the delete crashes.
The destructor of the class doesn't do anything btw.
Should I assume that something is corrupting my heap somewhere else and that the this pointer is messed up somehow?
Edit: As requested, the crashing code:
long CImageBuffer::Release()
{
    long nRefCount = InterlockedDecrement(&m_nRefCount);
    if (nRefCount == 0)
    {
        delete this;
    }
    return nRefCount;
}
The object was created with new; it is not in any kind of array.
The most obvious answer is: don't delete this.
If you insist on doing that, then use the common ways of finding bugs:
1. use valgrind (or a similar tool) to find memory access problems
2. write unit tests
3. use a debugger (prepare for loooong staring at the screen - depends on how big your project is)
It seems like you've mismatched new and delete. Note that delete this; can only be used on an object that was allocated with new (and, in the case of an overridden operator new or multiple copies of the C++ runtime, with the particular new that matches the delete found in the current scope).
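For reference, the classic mismatches being warned about (both are undefined behavior):

void mismatchExamples()
{
    int *one  = new int;
    int *many = new int[8];

    delete[] one; // wrong: allocated with new, must be freed with delete
    delete many;  // wrong: allocated with new[], must be freed with delete[]
}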
Crashes upon deallocation can be a pain: It is not supposed to happen, and when it happens, the code is too complicated to easily find a solution.
Note: The use of InterlockedDecrement has me assume you are working on Windows.
Log everything
My own solution was to massively log the construction/destruction, as the crash could well never happen while debugging:
Log the construction, including the this pointer value, and other relevant data
Log the destruction, including the this pointer value, and other relevant data
This way, you'll be able to see whether this was deallocated twice, or was even allocated at all.
... everything, including the stack
My problem happened in Managed C++/.NET code, meaning I had easy access to the stack, which was a blessing. You seem to be working in plain C++, so retrieving the stack could be a chore, but it remains very, very useful.
You could find code on the internet to print out the current stack for each log entry. I remember playing with http://www.codeproject.com/KB/threads/StackWalker.aspx for that.
Note that you'll need either a debug build or the PDB file alongside the executable to make sure the stack is fully printed.
... everything, including multiple crashes
I believe you are on Windows, so you could try to catch the SEH exception. This way, if multiple crashes are happening, you'll see them all instead of only the first, and each time you'll be able to mark "OK" or "CRASHED" in your logs. I even went as far as using maps to remember the addresses of allocations/deallocations, organizing the logs to show them together (instead of sequentially).
I'm at home, so I can't provide you with the exact code, but Google is your friend here. The thing to remember is that you can't put a __try/__except handler everywhere (C++ unwinding and C++ exception handlers are not compatible with SEH), so you'll have to write an intermediary function to catch the SEH exception.
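A hedged sketch of that intermediary-function trick (all names invented; the filter expression does the logging):

#include <windows.h>
#include <cstdio>

static int logSehFilter(unsigned int code)
{
    std::printf("CRASHED: SEH exception 0x%08X\n", code);
    return EXCEPTION_EXECUTE_HANDLER;
}

static void callGuarded(void (*fn)())
{
    __try
    {
        fn(); // keep this function free of C++ objects needing unwinding
    }
    __except (logSehFilter(GetExceptionCode()))
    {
        // already logged by the filter; execution continues here
    }
}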
Is your crash thread-related?
Last, but not least, the "it happens only 5% of the time" symptom could be caused by different code paths being executed, or by multiple threads playing with the same data.
The InterlockedDecrement part bothers me: Is your object living in multiple threads? And is m_nRefCount correctly aligned and a volatile LONG?
The correctly aligned and LONG parts are important here.
If your variable is not a LONG (for example, it could be a size_t, which is not a LONG on 64-bit Windows), then the function could well work the wrong way.
The same can be said for a variable not aligned on a 32-bit boundary. Are there #pragma pack() instructions in your code? Does your project file change the default alignment (I assume you're working in Visual Studio)?
For the volatile part, InterlockedDecrement seems to generate a read/write memory barrier, so the volatile qualifier should not be mandatory (see http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx).
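Pulling those checks together, a hedged sketch of what the class could look like, with the logging advice from earlier folded in (printf stands in for a real logging facility; everything beyond the names in the question is assumed):

#include <windows.h>
#include <cstdio>

class CImageBuffer
{
public:
    CImageBuffer() : m_nRefCount(1)
    {
        std::printf("ctor    this=%p\n", static_cast<void*>(this));
    }

    long AddRef()
    {
        return InterlockedIncrement(&m_nRefCount);
    }

    long Release()
    {
        long nRefCount = InterlockedDecrement(&m_nRefCount);
        std::printf("release this=%p count=%ld\n", static_cast<void*>(this), nRefCount);
        if (nRefCount == 0)
            delete this; // valid only because instances always come from new
        return nRefCount;
    }

private:
    ~CImageBuffer() // private: destruction must go through Release()
    {
        std::printf("dtor    this=%p\n", static_cast<void*>(this));
    }

    volatile LONG m_nRefCount; // a LONG, naturally aligned, as discussed above
};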
Sorry if this sounds like an "It compiles, so it must work!" question, but I want to understand why something is happening (or not happening, as the case may be).
In Project Settings, I set Basic Runtime Checks to Both. The debugger informs me that:
Run-Time Check Failure #2 - Stack around the variable 'beg' was corrupted.
But if I set it to the default, which is none, the program runs and completes normally, throwing no exceptions and causing no errors.
My question is, can I safely ignore this (because MSVC++ could somehow be wrong) or is this a real problem? I don't see how the program can continue successfully when the stack has been screwed up.
Edit:
The function that causes this error looks exactly like this:
int fun(list<int>::iterator&, const list<int>::iterator&);

int foo(list<int>& l) {
    list<int>::iterator beg = l.begin();
    list<int>::iterator end = l.end();
    return fun(beg, end);
}
fun increments and operates on beg, and when it returns, beg == end; when MSVC++ breaks, it points at the closing }.
Edit 2:
I have isolated the problem. In some situations, fun removes elements from the list that owns the items it iterates over. This is what causes the error.
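For anyone hitting the same thing, a sketch of the usual erase pattern; shouldRemove is a hypothetical stand-in for whatever condition fun actually tests:

#include <list>
using namespace std;

bool shouldRemove(int value) { return value < 0; } // hypothetical condition

void removeSome(list<int>& l)
{
    for (list<int>::iterator it = l.begin(); it != l.end(); /* advanced below */)
    {
        if (shouldRemove(*it))
            it = l.erase(it); // erase invalidates 'it'; continue from the returned iterator
        else
            ++it;
    }
}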
Your question isn't answerable without code to reproduce the problem.
But to give a vague answer to your general problem - If the compiler or debugger detected a problem, you probably have one.
In C++, just because something "goes wrong" doesn't mean your program will crash - it might keep running with completely unpredictable results. It may even complete with the results you desired. But just because it ran well on your system doesn't give you any guarantee for other systems, compilers, times of day, or even for additional runs of the same program.
This is called undefined behavior, and is caused by using the language incorrectly (but not in a way that causes a compile failure). A buffer overrun is only one of dozens of examples.
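A tiny example of the point: the overrun below is undefined behavior and corrupts the stack around buf, yet the program may appear to run normally when the runtime checks are off:

void overrun()
{
    int buf[4];
    for (int i = 0; i <= 4; ++i) // off-by-one: the last iteration writes buf[4]
        buf[i] = i;
}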
It turned out something was wrong with my Visual Studio installation, so reinstalling it fixed the problem.