Nonsensical C++ and Objective-C++ crash on Mavericks - c++

I've got a large Mac app that runs for a couple of days at a time operating on a large data set. It's a mix of Objective-C++ and C++. It runs great on Mountain Lion, but on Mavericks, after running for about 10 to 20 minutes (in which a couple of million objects are allocated and destroyed), it crashes. It behaves as if it's crashing with an invalid pointer (i.e. calling a function on a deleted C++ object), but the object it's pointing to is in a state that makes absolutely no sense.
All my C++ classes inherit from a common base class where the constructor looks something like this:
MyClass::MyClass()
{
mCreated = 12345; //int member variable set here and NEVER TOUCHED AGAIN.
//other initialization stuff
}
When it crashes, the debugger shows that in the bad object, the value for mCreated is 0. It's behaving as if the object never ran its constructor!
I don't think it's memory stomping, because this value is never anything other than 0 or its expected value, and none of the other fields in the object have values that look like the garbage you'd expect from memory stomping.
I've also tried running with scribble turned on, and the 0x555 and 0xaaa values don't show up anywhere. I've also tried Guard Edges.
In-depth investigation has not revealed anything. The bad object isn't even always the same class. All I can think of is that something with the new memory stuff in Mavericks (compressing unused memory) is causing some new behavior (maybe a bug or maybe some previously unknown, mostly-unenforced rule that now really matters).
Has anyone seen anything similar? Or does anyone know of any mostly-unknown memory rules that would apply more strongly under Mavericks?

I think you're right about the invalid pointer suspicion. It might be a pointer to a deleted object or it might be a garbage pointer. Either one would be consistent with the mCreated member being different than you expect. In the case of a deleted object, the memory could be used for something else and therefore set to some other value. In the case of a garbage pointer, you're not pointing to anything that ever was an instance of your class.
I don't know how well the Allocations instrument works for C++ objects, but you could try reproducing the crash under that. When it stops in the debugger, get the this pointer and then get the history of that address from Instruments.
If Instruments doesn't work, you can set the MallocStackLoggingNoCompact environment variable. Then, when it stops in the debugger, examine the this pointer and use the following commands to view the history of that address:
(lldb) script import lldb.macosx.heap
(lldb) malloc_info --stack-history 0x10010d680
(Use the this address instead of 0x10010d680, of course.)
Alternatively, you can use the malloc_history command from a shell to investigate the history, if doing it within LLDB is cumbersome.

Related

Checking if a C++ pointer is valid (in Objective-C(++))

I've adapted Matt Gallagher's "Testing if an arbitrary pointer is a valid object pointer" for an iOS project that uses Objective-C++. It works fine with Objective-C objects, but it always tells me that my C++ pointers are invalid, regardless of whether they actually work. Sometimes the code crashes at the pointer; sometimes it runs fine. Either way, the test method always reports the pointer as invalid.
Does anybody know how to adapt this code to C++ classes and objects as well? I suspect the code only works with Objective-C because of its use of "Class".
A pointer variable holds one of the following: a null pointer, a valid pointer to an object, a valid pointer to an array element or one past the last element of an array, or some invalid pointer.
If it is an invalid pointer, then any attempt to use it invokes undefined behaviour. That includes any attempt to check that it is an invalid pointer. And there you are stuck. All you can do is check whether it is a null pointer, or whether it is equal to some other valid pointer.
You should go with the Objective-C philosophy: Trying to use an invalid pointer is a programming error. You don't try to detect and handle this at runtime. You let it crash and fix the bug in your code.
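If you control how the objects are allocated, one way to sidestep the question is to let the standard library track liveness for you. To be clear, this is a different technique and not an adaptation of Matt Gallagher's code: the sketch assumes the object is owned by a std::shared_ptr and that observers hold a std::weak_ptr. Widget, is_alive, read_or and weak_ptr_demo are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class standing in for any C++ object you want to observe.
struct Widget {
    int value = 42;
};

// Returns true only while some shared_ptr still owns the object.
// This consults the control block, never the object's own memory,
// so it is safe to call after the object has been destroyed.
inline bool is_alive(const std::weak_ptr<Widget>& observer) {
    return !observer.expired();
}

// Reads the value if the object still exists, otherwise returns fallback.
inline int read_or(const std::weak_ptr<Widget>& observer, int fallback) {
    if (std::shared_ptr<Widget> locked = observer.lock()) {
        return locked->value;  // the object stays alive while 'locked' exists
    }
    return fallback;
}

// Usage sketch: the observer notices when the owner lets go.
inline void weak_ptr_demo() {
    std::shared_ptr<Widget> owner = std::make_shared<Widget>();
    std::weak_ptr<Widget> observer = owner;
    assert(is_alive(observer));
    owner.reset();
    assert(!is_alive(observer));
}
```

This does nothing for raw pointers you already have; it only helps when ownership can be expressed with smart pointers from the start.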
C++ pointers simply reference an address in memory. You could look at what's there in memory using a memory viewer tool, but that wouldn't even guarantee that the memory is still valid. For example:
char* test = new char[13];
strcpy(test, "Hello World!");
delete[] test;
.
.
.
printf("%s", test);
In some cases this will print successfully. Sometimes it will print a garbage string. And sometimes it will segfault. There is nothing there to speak to the pointer's validity.
If you're looking at a program that has just segfaulted and you're trying to see what happened there are a few options available to you:
You can look at the memory through a memory viewer; combined with the line you faulted on, that can provide insight.
You can seed your memory with a recognizable pattern before running to make this clearer; use 0xbadf00d5 or something similar.
You can run under Valgrind, which is a great tool if you can deal with the overhead.
The best option is to do error checking in your code. It sounds like you don't have that or you wouldn't be here. Preconditions and postconditions are great and will save you a ton of time in the long run (like now.) However as a silver lining you should exploit this to exact better coding standards in your organization for the future.
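As a rough illustration of what precondition and postcondition checks can look like, here is a minimal sketch. The function copy_checked is hypothetical, and the choice of exceptions for reporting is an assumption; note that checks like these catch null pointers and undersized buffers, but they cannot detect a dangling pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical helper: copies a C string into a caller-supplied buffer.
// Preconditions are checked up front and reported instead of crashing later.
inline std::size_t copy_checked(char* dest, std::size_t dest_size, const char* src) {
    if (dest == nullptr || src == nullptr) {
        throw std::invalid_argument("copy_checked: null pointer");
    }
    const std::size_t needed = std::strlen(src) + 1;  // include the terminator
    if (needed > dest_size) {
        throw std::length_error("copy_checked: destination too small");
    }
    std::memcpy(dest, src, needed);
    // Postcondition: the destination is terminated within its bounds.
    assert(dest[needed - 1] == '\0');
    return needed - 1;  // characters copied, excluding the terminator
}
```

Called with a 13-byte buffer and "Hello World!" it copies 12 characters; called with a 4-byte buffer it throws std::length_error instead of overrunning.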

C++ function used to work, now returning 0xfdfdfdfd

I have some code I wrote a few years ago. It has been working fine, but after a recent rebuild with some new, unrelated code elsewhere, it is no longer working. This is the code:
//myobject.h
...
inline CMapStringToOb* GetMap(void) {return (m_lpcMap);};
...
The above is accessed from the main app like so:
//otherclass.cpp
...
CMapStringToOb* lpcMap = static_cast<CMyObject*>(m_lpcBaseClass)->GetMap();
...
Like I said, this WAS working for a long time, but it's just decided to start failing as of our most recent build. I have debugged into this, and I am able to see that, in the code where the pointer is set, it is correctly setting the memory address to an actual value. I have even been able to step into the set function, write down the memory address, then move to this function, let it get 0xfdfdfdfd, and then manually get the memory address in the debugger. This causes the code to work. Now, from what I've read, 0xfdfdfdfd means guarding bytes or "no man's land", but I don't really understand what the implications of that are. Supposedly it also means an off by one error, but I don't understand how that could happen, if the code was working before.
I'm assuming from the Hungarian notation that you're using Visual Studio. Since you do know the address that holds the map pointer, start your program in the debugger and set a data breakpoint when that map pointer changes (the memory holding the map pointer, not the map pointed to). Then you'll find out exactly when it's getting overwritten.
0xfdfdfdfd typically implies that you have accessed memory that you weren't supposed to.
There is a good chance the memory was allocated and subsequently freed. So you're using freed memory.
static_cast can modify a pointer and you have an explicit cast to CMyObject and an implicit cast to CMapStringToOb. Check the validity of the pointer directly returned from GetMap().
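To show what "static_cast can modify a pointer" means, here is a small sketch with multiple inheritance. First, Second and Both are hypothetical types; nothing in the question says CMyObject uses multiple inheritance, so treat this as an illustration of the mechanism only. The address difference is what mainstream ABIs do in practice.

```cpp
#include <cassert>

struct First  { int a = 1; virtual ~First() {} };
struct Second { int b = 2; virtual ~Second() {} };
struct Both : First, Second {};

// Returns true when converting to the second base changed the address.
inline bool second_base_is_offset(Both& obj) {
    void* whole  = static_cast<void*>(&obj);
    void* second = static_cast<void*>(static_cast<Second*>(&obj));
    return whole != second;
}

// Round trip: static_cast undoes the adjustment, so the original object comes back.
inline bool round_trip_ok(Both& obj) {
    Second* as_second = &obj;                    // implicit upcast, address adjusted
    Both* back = static_cast<Both*>(as_second);  // downcast, adjusted back
    assert(back->b == 2);
    return back == &obj;
}
```

The adjustment is only correct when the pointer really points at the type the cast claims; if the base pointer refers to some other class, the result is garbage.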
Scenarios where "magic" happens almost always come back to memory corruption. I suspect that somewhere else in your code you've modified memory incorrectly, and it's resulting in this peculiar behavior. Try testing some different ways of entering this part of the code. Is the behavior consistent?
This could also be caused by an incorrectly built binary. Try cleaning and rebuilding your project.

stl::map issues

This must be me doing something stupid, but has anyone seen this behaviour before:
I have a map in a class member defined like so:
std::map <const std::string, int> m_fCurveMap;
Everything behaves fine in debug, but it all goes wrong in release mode. The map gets initialised to some crazy number: m_fCurveMap [14757395258967641292]()
Any member I have after the map gets completely corrupted, ie if I put an int on the line after the map like this:
std::map <const std::string, int> m_fCurveMap;
int m_myIntThing;
and in my constructors set m_myIntThing to 0, after the constructor has been called m_myIntThing is some crazy number. If I move m_myIntThing to the line above the map everything for m_myIntThing is fine. This ends up causing big problems for me further down the line. Do I need to do something to the map in my constructor? I'm not at the moment.
I am using Visual Studio; this works fine with gcc. I only see the problem in release. The project is a DLL.
If you have seen this kind of madness before, please help; it's driving me mad. :-)
Many thanks,
Mark
This has happened to me lots of times. Although it's hard to say in your case, a very likely reason is that you have different versions of the C run time library in between different projects. Check your "code generation" tab in the compiler settings for your different projects and make certain they are the same.
What's effectively happening is that different versions of the C run time libraries implement STL containers in different ways. Then, when the different projects try to talk to each other, the layout of a std::map (for instance) has changed between them and the two are no longer binary compatible.
The strange behavior is very likely some kind of heap corruption, or if it's being passed as a parameter to a function, stack corruption.
The problem is memory corruption of some kind.
A bug that I have seen often in C++ projects is using an object after it has been deleted.
Another possibility is a buffer overflow. It could be any object on the same stack or nearby on the heap.
A pretty good way to catch the culprit is to set a debugger breakpoint that fires on memory change. While the object is still good, set your breakpoint. Then wait until some code writes into that memory location. That should reveal your bug.
If you're getting your information from the VS debugger, I wouldn't trust what it is telling you for a Release DLL. The debugger can only be really trusted with Debug DLLs.
If program output is telling you this, then that's different -- in that case, you're not providing enough information.
Are you mixing a release DLL with a debug app?
Otherwise it sounds like memory corruption, although I can't say for sure.
Something else is stomping on memory
You're accessing deleted memory
You're returning a temporary by pointer or reference
etc
Any of these could appear to work fine in some cases as they're undefined behavior, and only in release mode do they blow up.
I had the exact same problem on g++, and I resolved it by removing a block of pragmas that preceded the map. Even though the code is correct, I wonder if this is a compiler bug on the platform that shows up when using std::map in some situations.
#pragma pack(push,1)
xxxx
#pragma pack(pop)
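For reference, here is a small sketch of how a balanced push/pop pair is meant to behave. Natural, Packed and AfterPop are hypothetical structs. Whether an unbalanced or misspelled pop was the real cause in the answer above is not known; the sketch only shows that packing changes layout and that a correct pop restores it.

```cpp
#include <cassert>

struct Natural {        // default alignment: padding after 'tag' is typical
    char tag;
    int  value;
};

#pragma pack(push, 1)   // save the current packing, then pack to 1 byte
struct Packed {
    char tag;
    int  value;
};
#pragma pack(pop)       // restore the saved packing

struct AfterPop {       // declared after the pop, so laid out like Natural
    char tag;
    int  value;
};

// Checks that packing removed the padding and that the pop restored the default.
inline void check_layouts() {
    assert(sizeof(Packed) == sizeof(char) + sizeof(int));
    assert(sizeof(AfterPop) == sizeof(Natural));
}
```

If the pop were missing, every type declared afterwards in that translation unit, including standard library types from headers included later, would be laid out with 1-byte packing.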
Just to give a concrete example for the memory corruption:
typedef std::map<int, int> mymap_t;
static mymap_t static_init() { return mymap_t(); }
class foo {
foo(): mymap(static_init()) {}
//!> d'oh, don't reference!
const mymap_t &mymap;
};
By accident, I declared a reference member instead of the member variable itself. It gets initialized all right, but the temporary map returned by static_init() is destroyed once the constructor finishes, and the ref will then show up in the debugger as "std::map with 140737305218461 elements" (pretty-printed) or similar, as it points to now-unallocated memory (or worse).
Beware of accidental references!
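A minimal sketch of the corrected version, for comparison: the member is an object, not a reference, so the class owns its map. foo_fixed and the sample values are hypothetical.

```cpp
#include <cassert>
#include <map>

typedef std::map<int, int> mymap_t;

// Hypothetical initializer that returns a populated temporary.
static mymap_t static_init() {
    mymap_t m;
    m[1] = 10;
    m[2] = 20;
    return m;
}

class foo_fixed {
public:
    foo_fixed() : mymap(static_init()) {}
    // Read-only access to the owned map.
    const mymap_t& map() const { return mymap; }
private:
    const mymap_t mymap;  // a member object, not a reference: it owns its contents
};

// Usage sketch: the map is still intact after construction.
inline void foo_fixed_demo() {
    foo_fixed f;
    assert(f.map().size() == 2);
}
```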

Access violation exception when calling a method

I've got a strange problem here. Assume that I have a class with some virtual methods. Under certain circumstances an instance of this class should call one of those methods. Most of the time no problems occur at that stage, but sometimes it turns out that the virtual method cannot be called because the pointer to that method is NULL (as shown in VS), so a memory access violation exception occurs. How could that happen?
Application is pretty large and complicated, so I don't really know what low-level steps lead to this situation. Posting raw code wouldn't be useful.
UPD: Ok, I see that my presentation of the problem is rather indefinite, so schematically code looks like
void MyClass::FirstMethod() const { /* Do stuff */ }
void MyClass::SecondMethod() const
{
// This is where exception occurs,
// description of this method during runtime in VS looks like 0x000000
FirstMethod();
}
No constructors or destructors involved.
Heap corruption is a likely candidate. The v-table pointer in the object is vulnerable, it is usually the first field in the object. A buffer overflow for some kind of other object that happens to be adjacent to the object will wipe the v-table pointer. The call to a virtual method, often much later, will blow.
Another classic case is having a bad "this" pointer, usually NULL or a low value. That happens when the object reference on which you call the method is bad. The method will run as usual but blow up as soon as it tries to access a class member. Again, heap corruption or using a pointer that was deleted will cause this. Good luck debugging this; it is never easy.
Possibly you're calling the function (directly or indirectly) from a constructor of a base class which itself doesn't have that function.
Possibly there's a broken cast somewhere (such as a reinterpret_cast of a pointer when there's multiple inheritance involved) and you're looking at the vtable for the wrong class.
Possibly (but unlikely) you have somehow trashed the vtable.
Is the pointer to the function null just for this object, or for all other objects of the same type? If the former, then the vtable pointer is broken, and you're looking in the wrong place. If the latter, then the vtable itself is broken.
One scenario this could happen in is if you tried to call a pure virtual method in a destructor or constructor. At this point the virtual table pointer for the method may not be initialized causing a crash.
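The constructor case can be sketched as follows. Base and Derived are hypothetical; the virtual function here is deliberately not pure, so the behavior is well defined and the example can run. With a pure virtual function the same call would be undefined behavior.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string seen_in_ctor;
    Base() {
        // During Base's constructor the object is still "only a Base",
        // so this call resolves to Base::name(), not Derived::name().
        seen_in_ctor = name();
    }
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// Usage sketch: the constructor saw the base version, later calls see the override.
inline void ctor_dispatch_demo() {
    Derived d;
    assert(d.seen_in_ctor == "Base");
    assert(d.name() == "Derived");
}
```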
Is it possible the "this" pointer is getting deleted during SecondMethod's processing?
Another possibility is that SecondMethod is actually being called with an invalid pointer right up front, and that it just happens to work (by undefined behavior) up to the nested function call which then fails. If you're able to add print code, check to see if "this" and/or other pointers being used is something like 0xcdcdcdcd or 0xfdfdfdfd at various points during execution of those methods. Those values are (I believe) used by VS on memory alloc/dealloc, which may be why it works when compiled in debug mode.
What you are most likely seeing is a side-effect of the actual problem. Most likely heap or memory corruption, or referencing a previously freed object or null pointer.
If you can consistently make it crash at the same place and can figure out where the null pointer is being loaded from, then I suggest using the debugger to put a breakpoint on 'write' at that memory location. Once the breakpoint is triggered, you are most likely looking at the code that actually caused the corruption.
If the memory access violation happens only when Studio fails to show the method address, then it could be caused by missing debug information. You are probably debugging code compiled with release (non-debug) compiler/linker flags.
Try enabling some debug info in the C++ properties of the project, then rebuild and restart the debugger. If that helps, you will see all the normal traceable things like the stack, variables, etc.
If your this pointer is NULL, corruption is unlikely. Unless you're zeroing memory you shouldn't have.
You didn't say whether you're debugging a Debug (not optimized) or Release (optimized) build. Typically, in a Release build the optimizer will remove the this pointer if it is not needed. So, if you're debugging an optimized build, seeing the this pointer as 0 doesn't mean anything. You have to rely on the disassembly to tell you what's going on. Try turning off optimization in your Release build if you cannot reproduce the problem in a Debug build. When debugging an optimized build, you're debugging assembly, not C++.
If you're already debugging a non-optimized build, make sure you do a clean rebuild before spending too much time debugging corrupted images. Debug builds are typically linked incrementally, and incremental linkers are known to produce problems like this. If you're running a clean Debug build and still can't figure out what went wrong, post the stack dump and more code. I'm sure we can help you figure it out.

<list> throws unhandled exception when calling push_front()

I'm working on a GUI in SDL. I've created a slave/master class that contains a std::list of pointers to its own slaves to create a hierarchy in the GUI (a window containing buttons, a button containing a label, and so on). It worked fine for a good while, until I edited a completely different class that doesn't affect the slave/master class directly. The call to list.push_front() in the old slave/master class now throws the following error when debugging in VS C++ 2008 Express, and I can't find what's causing it.
"Unhandled exception at 0x00b6decd in workbench.exe: 0xC0000005: Access violation reading location 0x00000004."
*workbench.exe is my project.
The exception is raised in the _Insert method in the list code on row 718:
_Nodeptr _Newnode = _Buynode(_Pnode, _Prevnode(_Pnode), _Val);
The list is created in the master/slave class's definition, and the slave/master class is created on the heap to be inserted in another master's slave list. The list that crashes is empty when push_front() is called, but it is second in line in the hierarchy, so it worked once. As I said, it worked fine before, and the slave/master class hasn't been altered to cause the error.
The new class does use lists as well. Can the use of several lists cause clashes? Might I have accidentally screwed up the heap?
Any help and tips to what I could look for is appreciated.
P.S The code is rather large now so I would guess it's better to not include it. Especially since I'm not exactly sure just what causes the error. Sorry if it's a bit scarce
Update: I've replaced the push_front() with creating an iterator and using insert(). The result was an iterator pointing to "baadf00d" after assigning list.begin() to it. As far as I can tell, baadf00d is some error/NULL value that VS uses for objects that haven't been assigned anything. I guess it's another sign that the list is corrupt?
Usually errors like this with addresses like 0x00000004 indicate dereferencing a NULL pointer, e.g.
struct point {
int x;
int y;
};
struct point *pt = NULL;
printf("%d\n", pt->y);
can create an error like that.
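The reason the faulting address is 4 rather than 0 is that the member's offset is added to the null base address. A small sketch, assuming a 4-byte int and a null pointer represented as address 0 (both typical, neither guaranteed); offset_of_y is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

struct point {
    int x;
    int y;
};

// Offset of 'y' inside point. Reading pt->y through a null pt makes the CPU
// access address 0 plus this offset, which is what the debugger reports.
inline std::size_t offset_of_y() {
    return offsetof(point, y);
}

// Usage sketch: 'y' sits right after 'x', so its offset equals sizeof(int).
inline void offset_demo() {
    assert(offset_of_y() == sizeof(int));
}
```

So a read at 0x00000004 suggests a null pointer to some object whose accessed member sits 4 bytes in.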
Doesn't smell like heap corruption to me; those errors usually tend to be subtler. I bet this is a case of a NULL pointer. I'd go up the call stack and hunt for null pointers: it could be a member of the object you're pushing on to the front of the list, or that object itself. If you do think this is a heap corruption issue, you can use gflags, which is free, to enable page heap and the like, which will let you detect heap corruption earlier, hopefully as it happens, rather than by the side effects it causes later.
Probably, you have caused some kind of buffer overrun or other memory corruption in your original code which did not manifest until now. There is no risk of conflict between different list instances, and as you say, the new code does not interact with the old code. Therefore barring magic, you have coded a bug.
Given the lack of code, the best I can do is give you some scenarios I can think of:
The most obvious, and hardest to find, is memory corruption. The list has been walked on, so adding an item means the list manipulates garbage memory.
You can test this by moving variables around in their declaration, and by changing the order of assignment. If this makes the error go away or move, you are looking at a memory problem. You could also try changing the list to a vector, and see what that does.
A second possibility is that you have a list of pointers/references and the items are being deallocated before/after being put on the stack. This can easily happen if you put the address of a stack object into a list allocated elsewhere. You say you created the object on the heap, so I guess it isn't this.
My guess, based on seeing the combination of _Prevnode and push_front, is that the list was corrupted earlier, possibly by misusing an iterator. Another way to corrupt a list is removing an element from an empty list. Make sure you have iterator debugging turned on in VS2008. It will catch many problems a lot earlier.
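Both kinds of misuse mentioned here can be avoided with a little care. The following is a minimal sketch using a std::list<int> for brevity (the question's list holds pointers); pop_front_if_any and erase_all are hypothetical helper names.

```cpp
#include <cassert>
#include <list>

// Removes the first element if there is one; returns whether anything was removed.
// Calling pop_front() on an empty std::list is undefined behavior, so check first.
inline bool pop_front_if_any(std::list<int>& items) {
    if (items.empty()) {
        return false;
    }
    items.pop_front();
    return true;
}

// Erases every element equal to 'value' without using an invalidated iterator.
inline void erase_all(std::list<int>& items, int value) {
    for (std::list<int>::iterator it = items.begin(); it != items.end(); ) {
        if (*it == value) {
            it = items.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
}

// Usage sketch: popping from an empty list is refused instead of corrupting it.
inline void list_demo() {
    std::list<int> items;
    assert(!pop_front_if_any(items));
}
```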
Finally after looking through every nook and cranny I've found the bug! It was completely unexpected and I feel a bit embarrassed about it.
I had recently re-arranged the files. Prior to that I had generic classes in one folder and my user interface files in a subfolder. I copied the GUI files to the main folder and I thought I had linked everything up correctly, but obviously I missed one line, and it never occurred to me when it started acting up. My library compiled, since that was linked fine, but my testing program wasn't... it simply looked at the old header files! It worked fine to begin with, of course, since the headers were the same, but then I edited one, and the class declared in it started acting funny as mentioned, since the test program was compiled against a different declaration than the library. It looked like corrupted memory, so that's what I looked for.
Lesson learned: Don't keep two versions close to each other or at all.