Program state and debugger disagree - c++

I'm on Windows 7 using VS2013 building against the 2010 compiler (we've migrated our dev environments, but not all the projects).
I don't really know how to characterize this problem, or I'd google it. I have a pointer into a byte buffer; it's our wire protocol (the code base predates Google and their protocol buffers). We have headers that indicate an id and a type; cast the pointer to the appropriate type and you can access the data and, if the data is dynamic in size like a string field, a length. None of this should be surprising, if a bit old school...
But what I'm seeing is I've got code that checks the field id - it should never be zero. But the condition is hitting, and when I inspect the element in the debugger, the buffer contents and pointer position are all correct - the field is non-zero.
So my questions to you:
1) How would I be able to better express this problem so I can google it?
2) Have you seen this before? Any ideas?

This is a long shot, but I have seen this when the project was not built correctly. Try cleaning the solution and rebuilding it.

There are a few (combinations of) problems that can present like this.
Code that fails in a production setting but not when stepping through while debugging. The real culprit here is, most often, some other unrelated code misusing a pointer (and overwriting memory it shouldn't). The developer gets a report of an error and then steps through the code with a debugger. Apart from the debugger itself, the other difference is that the production code is compiled with some form of "full" optimisation, while the code is recompiled without optimisation (and with symbols) for debugging. That changes the memory layout of data (and even code) in the program. The offending code is still molesting its pointer, but something else in memory is being overwritten, so the symptom disappears when debugging. The only fix for this is careful examination of the other code that is executed BEFORE the point where the crash is reported.
Another possibility is that the build process has been messed up, and is including obsolete implementations of functions that exhibit an old bug. Try doing a "make clean" and "make build".
A third possibility is that the code does different things in debug and production settings. For example, there is code wrapped in #ifdef DEBUG ... #endif which is only active when debugging. Such code is often used to produce "debugging output". It also changes the layout of memory in the program, so it affects the symptoms of pointer misuse.
In the scenario you describe, it is also possible that the typecast is invalid. It is quite common, when casting a char pointer into a pointer to X, to implicitly assume that X has a specific size. The problem is that (other than the char types) the sizes of all types are implementation defined. This sort of mismatch (e.g. the program writing the stream and the program interpreting it assuming different sizes) is a potential culprit when the code is rebuilt using a different compiler. [This doesn't explain the symptom appearing and disappearing between debug and production settings, but it is a potential cause to look at.]

Found it. A hard-to-see semicolon terminated the condition, so I was hitting the condition body every time. Further heap corruptions were fixed with better pointer arithmetic, and "realloc(buff, size)" should have been "buff = realloc(buff, size)".

Related

C++ code migration: handling uninitialized pointers

As per the title, I am planning to move some legacy code developed a decade+ ago for AIX. The problem is the code base is huge. The developers didn't initialize their pointers in the original code. Now while migrating the code to the latest servers, I see some problems with it.
I know that the best solution is to run through all the code and initialize all the variables wherever required. However, I am keen to know if there are any other solutions to this problem. I tried Google but couldn't find an appropriate answer.
The most preventive long-term approach is to initialize all pointers at the location they're declared, changing the code to use appropriate smart pointers to manage the lifetime. If you have any sort of unit tests this refactoring can be relatively painless.
In a shorter term and if you're porting to Linux you could use valgrind and get a good shot at tracking down the one or two real issues that are biting you, giving you time to refactor at a more leisurely pace.
Just initializing all the variables may not be a good idea.
Reliable behavior generally depends on variables having values known to be correct ("guaranteed by construction" to be correct). The problem with uninitialized variables isn't simply that they have unknown values. Obviously an unknown value is a problem, but the desired state is having known and correct values. Initializing a variable to a known value that is not correct does not yield reliable behavior.
Not infrequently it happens that there is no 'default' value that is correct to use as a fallback if more complicated initialization fails. A program may choose not to initialize a variable with a value if that value must be over-written before the variable can be used.
Initializing a variable to a default value may cause a few problems in such cases. Often 'default' values are inoffensive, in that if they are used the consequences aren't immediately obvious. That's not generally desirable, because as the developer you want to notice when things go wrong. You can avoid this by picking default values that have obvious consequences, but that doesn't solve a second issue: static analyzers can often detect and report when an uninitialized variable is used. If there's a problem with some complicated initialization logic such that no value is set, you want that to be detectable. Setting a default value prevents static analysis from detecting such cases. So there are cases where you do not want to initialize variables.
With pointers the default value is typically nullptr, which to a certain extent avoids the first issue discussed above because dereferencing a null pointer typically produces an immediate crash (good for debugging). However code might also detect a null pointer and report an error (good for debugging) or might fall back to some other method (bad for debugging). You may be better off using static analysis to detect usages of uninitialized pointers rather than initializing them. Though static analysis may detect dereferencing of null pointers it won't detect when null pointers cause error reporting or fallback routines to be used.
In response to your comment:
The major problems that i see are
Pointers to local variables are returned from functions.
Almost all the pointer variables are not initialized. I am sure that AIX did provide this comfort on the earlier platform, but I really doubt that the code would run flawlessly on Linux when it is put to the real test (production).
I cannot deliver partial solutions which may merely happen to work. I prefer to give the best to my customer, who pays me for my work, so I won't use workarounds.
Quality cannot be compromised.
Fix them (and pay special attention to cleaning up correctly).
As I argue above simply lacking an initializer is not in and of itself a defect. There is only a defect if the uninitialized value is actually used in an illegal manner. I'm not sure what you mean about AIX providing comfort.
As I argue above the 'partial solution' and 'workaround' would be to blindly initialize everything.
Again, blindly initializing everything can result not only in useless work, but it can actually compromise quality by taking away some tools for detecting bugs.

C++ function used to work, now returning 0xfdfdfdfd

I have some code I wrote a few years ago. It has been working fine, but after a recent rebuild with some new, unrelated code elsewhere, it is no longer working. This is the code:
//myobject.h
...
inline CMapStringToOb* GetMap(void) { return m_lpcMap; }
...
The above is accessed from the main app like so:
//otherclass.cpp
...
CMapStringToOb* lpcMap = static_cast<CMyObject*>(m_lpcBaseClass)->GetMap();
...
Like I said, this WAS working for a long time, but it's just decided to start failing as of our most recent build. I have debugged into this, and I am able to see that, in the code where the pointer is set, it is correctly setting the memory address to an actual value. I have even been able to step into the set function, write down the memory address, then move to this function, let it get 0xfdfdfdfd, and then manually get the memory address in the debugger. This causes the code to work. Now, from what I've read, 0xfdfdfdfd means guarding bytes or "no man's land", but I don't really understand what the implications of that are. Supposedly it also means an off by one error, but I don't understand how that could happen, if the code was working before.
I'm assuming from the Hungarian notation that you're using Visual Studio. Since you do know the address that holds the map pointer, start your program in the debugger and set a data breakpoint when that map pointer changes (the memory holding the map pointer, not the map pointed to). Then you'll find out exactly when it's getting overwritten.
0xfdfdfdfd is the fill pattern the Visual C++ debug heap writes into the "no man's land" guard bytes on either side of each allocation, so reading it typically means you have read just outside the bounds of an allocated block (freed memory is filled with 0xdddddddd instead). Either way, you are accessing memory you weren't supposed to.
static_cast can adjust a pointer value (with multiple inheritance, for example), and you have an explicit cast to CMyObject* plus an implicit conversion to CMapStringToOb*. Check the validity of the pointer returned directly from GetMap().
Scenarios where "magic" happens almost always come back to memory corruption. I suspect that somewhere else in your code you've modified memory incorrectly, and it's resulting in this peculiar behavior. Try testing some different ways of entering this part of the code. Is the behavior consistent?
This could also be caused by an incorrectly built binary. Try cleaning and rebuilding your project.

Why is it necessary to do a rebuild after adding a new member variable to a class?

This morning, in Visual Studio 2005, I tried adding a new private member variable to a class and found that it was giving me all sorts of weird segmentation faults and the like. When I went into debug mode, I found that my debugger didn't even see the new member variable, and thus it was giving me some strange behavior.
It required a "rebuild all" in order to get my program working again (and to get the debugger to see the new member variables I had made). Why was it necessary to rebuild all? Why was just doing a regular build insufficient?
I already solved the problem, but I feel like I understanding the build process better will help me in the future. Let me know if there's any more information you need.
Thanks in advance!
When you add or remove members of a class you change the memory layout of its objects. If you don't recompile everything that uses the class, you break the One Definition Rule (ODR), and the segmentation faults are just the effect of that.
As to why that happens: old code might be allocating memory for the old size, then passing that object (without the new member) to newly compiled code that accesses beyond the end of the allocation to reach the new variable. Note that the access specifier makes no difference here; if the member is private, it will simply be the class's own member functions doing the access.
If you did not add the field to the end, but rather to the middle of the object, the same effect will be seen when accessing the fields that the compiler lays out at higher memory addresses.
The fact that you needed to use the rebuild all feature is an indication that the dependencies of your project are not correctly configured, and you should fix that as soon as possible. Having the right dependencies will force the compiler into rebuilding when needed, and will mean less useless debugging hours.
One obvious answer would be: "because Visual Studio is broken, and doesn't handle dependencies correctly". In fact, however, I don't think you've given us enough information for me to be able to make that statement (and Visual Studio does get the simple cases right).
When you add members (private or public, it doesn't matter), especially data members, but also virtual functions, you change the physical layout of the class in memory. All code which depends on that physical layout must be recompiled. Normally, the build system takes care of this automatically, but a broken makefile, or a bug in the system, can easily mean that it doesn't. (The correct answer isn't to invoke a rebuild/make clean, but to fix the problem with the build system.)

stl::map issues

This must be me doing something stupid, but has anyone seen this behaviour before:
I have a map in a class member defined like so:
std::map <const std::string, int> m_fCurveMap;
all behaves fine in debug but all goes wrong in release mode. map gets initialised to some crazy number: m_fCurveMap [14757395258967641292]()
Any member I have after the map gets completely corrupted, ie if I put an int on the line after the map like this:
std::map <const std::string, int> m_fCurveMap;
int m_myIntThing;
and in my constructors set m_myIntThing to 0, after the constructor has been called m_myIntThing is some crazy number. If I move m_myIntThing to the line above the map everything for m_myIntThing is fine. This ends up causing big problems for me further down the line. Do I need to do something to the map in my constructor? I'm not at the moment.
I am using visual studio, this works fine with gcc. I only see the problem in release. The project is a dll.
If you have seen this kind of madness before please help its driving me mad. :-)
Many thanks,
Mark
This has happened to me lots of times. Although it's hard to say in your case, a very likely reason is that you have different versions of the C run time library in between different projects. Check your "code generation" tab in the compiler settings for your different projects and make certain they are the same.
What's effectively happening is that different versions of the C run time library implement STL containers in different ways. When the different projects try to talk to each other, the meaning of what a std::map is (for instance) has changed, and the two sides are no longer binary compatible.
The strange behavior is very likely some kind of heap corruption, or if it's being passed as a parameter to a function, stack corruption.
The problem is memory corruption of some kind.
A bug that I have seen often in C++ projects is using an object after it has been deleted.
Another possibility is a buffer overflow. It could be any object on the same stack or nearby on the heap.
A pretty good way to catch the culprit is to set a debugger breakpoint that fires on memory change. While the object is still good, set your breakpoint. Then wait until some code writes into that memory location. That should reveal your bug.
If you're getting your information from the VS debugger, I wouldn't trust what it is telling you for a Release DLL. The debugger can only be really trusted with Debug DLLs.
If program output is telling you this, then that's different -- in that case, you're not providing enough information.
Are you mixing a release DLL with a debug app?
Otherwise it sounds like memory corruption, although I can't say for sure.
Something else is stomping on memory
You're accessing deleted memory
You're returning a temporary by pointer or reference
etc
Any of these could appear to work fine in some cases as they're undefined behavior, and only in release mode do they blow up.
I had the exact same problem on g++, and I resolved it by removing a #pragma pack block that appeared before the map. Even though the code is correct, I wonder if this is a compiler bug on the platform showing up when using std::map in some situations.
#pragma pack(push,1)
xxxx
#pragma pack(pop)
Just to give a concrete example for the memory corruption:
typedef std::map<int, int> mymap_t;
static mymap_t static_init() { return mymap_t(); }

class foo {
    foo() : mymap(static_init()) {}
    //!> d'oh, don't reference!
    const mymap_t& mymap;
};
Accidentally, I declared the member as a reference rather than a value. It gets initialized all right, but as soon as the temporary returned by static_init() is destroyed, the reference dangles, and the map will just show up in the debugger as "std::map with 140737305218461 elements" (pretty-printed) or similar, since it points to deallocated memory (or worse).
Beware of accidental references!

Can objects be unwound before they are created on the stack?

We have been debugging a strange case for some days now, and have somewhat isolated the bug, but it still doesn't make any sense. Perhaps anyone here can give me a clue about what is going on.
The problem is an access violation that occur in a part of the code.
Basically we have something like this:
void aclass::somefunc() {
    try {
        erroneous_member_function(*someptr);
    }
    catch (AnException) {
    }
}

void aclass::erroneous_member_function(const SomeObject& ref) {
    // { //<-- scope here, error goes away
    LargeObject obj = Singleton()->Object.someLargeObj; //<- remove this, error goes away
    // DummyDestruct dummy1; //<-- this is not destroyed before the unreachable
    throw AnException();
    // } //<-- end scope here, error goes away
    UnreachableClass unreachable; //<- remove this, and the error goes away
    DummyDestruct dummy2; //<- destructor of this object is called!
}
While in the debugger, it actually looks like it is destructing the UnreachableClass, and when I insert the DummyDestruct object it does not get destroyed before the strange destructor is called. So it does not seem like the destruction of the LargeObject is going awry.
All this is in the middle of production code, and it is very hard to isolate it to a small example.
My question is, does anyone have a clue about what is causing this, and what is happening? I have a quite full featured debugger available (Embarcadero RAD studio), but now I am not sure what to do with it.
Can anyone give me some advice on how to proceed?
Update:
I placed a DummyDestruct object beneath the throw clause and placed a breakpoint in its destructor. The destructor for this object is entered (and its only use is in this piece of code).
With the information you have provided, and if everything is as you state, the only possible answer is a bug in the compiler/optimizer. Just add the extra scope with a comment (This is, again, if everything is exactly as you have stated).
Stuff like this sometimes happens due to writing through uninitialized pointers, out of bounds array access, etc. The point at which the error is caused may be quite removed from the place where it manifests. However, based on the symptoms you describe it seems to be localized in this function. Could the copy constructor of LargeObject be misbehaving? Is ref being used? Perhaps somePtr isn't pointing to a valid SomeObject. Is Singleton() returning a pointer to a valid object?
Compiler error is also a possibility, especially with aggressive optimization turned on. I would try to recreate the bug with no optimizations.
Time to practice my telepathic debugging skills:
My best guess is your application has a stack corruption bug. This can write junk over the call stack, which means the debugger is incorrectly reporting the function when you break, and it's not really in the destructor. Either that or you are incorrectly interpreting the debugger's information and the object really is being destructed correctly, but you don't know why!
If stack corruption is the case you're going to have a really tough time working out what the root cause is. This is why it's important to implement tonnes of diagnostics (eg. asserts) throughout your program so you can catch the stack corruption when it happens, rather than getting stuck on its weird side effects.
This might be a real long shot but I'm going to put it out there anyway...
You say you use borland - what version? And you say you see the error in a string - STL? Do you include winsock2 at all in your project?
The reason I ask is that I had a problem when using borland 6 (2002) and winsock - the header seemed to mess up the structure packing and meant different translation units had a different idea of the memory layout of std::string, depending on what headers were included by the translation unit, with predictably disastrous results.
Here's another wild guess, since you mentioned strings. I know of at least one implementation where (STL) string copying is done in a lazy manner (i.e., no actual copying of the string contents takes place until a change is made; the "copying" is done by simply having the target string object point to the same buffer as the source). In that particular implementation (GNU) there is a bug whereby excessive copying causes the reference counter (how many objects are using the same actual string memory after supposedly copying it) to roll over to 0, resulting in all sorts of mischief. I haven't encountered this bug myself, but have been told about it by someone who has. (I say this because one would think that the ref counter would be a 32 bit number and the chances of that ever rolling over are pretty slim, to say the least, so I may not be describing the problem properly.)