The title is not that clear, and if anybody has a better suggestion please tell me.
Now to business:
I am calling a method on a member object:
m_someObject.Clear();
The problem is that when I look at the address of m_someObject before the call, the debugger shows it at one address, and when I step into the Clear method, the debugger shows this at a different address.
The result is that, after returning from Clear, the m_someObject instance on which it was called does not seem to have been affected.
Does anybody have any idea what could cause this kind of behavior?
Working on Microsoft Visual Studio 2010 64-bit.
You are probably passing m_someObject by value to some other function (and thus getting a copy) and calling Clear() only on the copy. In that case you will not see any change in the original object.
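To make the by-value scenario concrete, here is a minimal sketch. SomeObject, clearByValue and clearByReference are made-up names for illustration; the real class behind m_someObject is not shown in the question.

```cpp
#include <vector>

// Hypothetical stand-in for the class behind m_someObject.
struct SomeObject {
    std::vector<int> items{1, 2, 3};
    void Clear() { items.clear(); }
};

// Takes its argument by value: Clear() runs on a private copy,
// which lives at a different address than the caller's object.
void clearByValue(SomeObject obj) { obj.Clear(); }

// Takes its argument by reference: Clear() runs on the caller's object.
void clearByReference(SomeObject& obj) { obj.Clear(); }
```

If the address seen inside Clear() differs from the one seen at the call site, checking the signatures along the call chain for a missing & is a cheap first step.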
Can you please check if you have two different variables with the same name? One defined in the immediate scope and another one, maybe in the global scope?
The most common reason is Multiple Inheritance. Unlike C# and Java, in C++ a class can have multiple base classes. Obviously, not all can be located at offset 0. This means that this has to be adjusted if you're using a method from a base class that's located at a non-zero offset.
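A small sketch of that adjustment, with made-up types (Left, Right and Both are not from the question). Both base subobjects hold data, so they cannot occupy the same address, and this inside a method of one base differs from this inside a method of the other even though the call is made on the same object.

```cpp
// Two hypothetical, non-empty base classes.
struct Left  { int l = 0; const void* whereAmI() const { return this; } };
struct Right { int r = 0; const void* whereAmI() const { return this; } };

// One object containing both base subobjects.
struct Both : Left, Right {};

// Returns true when the two base-class methods see different `this` values
// for the same complete object.
bool basesHaveDifferentThis(const Both& obj) {
    const void* viaLeft  = obj.Left::whereAmI();
    const void* viaRight = obj.Right::whereAmI();
    return viaLeft != viaRight;
}
```

Note that this kind of adjustment is normal and harmless: the method still operates on the right object, so on its own it would not explain Clear() having no effect.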
Well, apparently the debugger was lying.. I wasn't aware of this, but apparently some of the code was compiled in release mode. Conclusion - Debugger No, printf - Yes.
Background
I work with Watusimoto on the game Bitfighter. We use a variation of LuaWrapper to connect our C++ objects with Lua objects in the game. We also use a variation of Lua called lua-vec to speed up vector operations.
We have been working to solve a bug for some time that has eluded us. Random crashes will occur that suggest corrupt metatables. See here for Watusimoto's post on the issue. I'm not sure it is because of a corrupt metatable and have seen some really odd behavior about which I wish to ask here.
The Problem Manifestation
As an example, we create an object and add it to a level like this:
t = TextItem.new()
t:setText("hello")
levelgen:addItem(t)
However, the game will sometimes (not always) crash. With an error:
attempt to call missing or unknown method 'addItem' (a nil value)
Using a suggestion given in answer to Watusimoto's post mentioned above, I have changed the last line to the following:
local ok, res = pcall(function() levelgen:addItem(t) end)
if not ok then
    local s = "Invalid levelgen value: "..tostring(levelgen).." "..type(levelgen).."\n"
    for k, v in pairs(getmetatable(levelgen)) do
        s = s.."meta "..tostring(k).." "..tostring(v).."\n"
    end
    error(res..s)
end
This prints out the metatable for levelgen if something went wrong when calling a method on it.
However, and this is crazy, when it fails and prints out the metatable, the metatable is exactly how it should be (with the correct addItem call and everything). If I print the metatable for levelgen upon script load, and when it fails using pcall above, they are identical, every call and pointer to userdata is the same and as it should be.
It is as though the metatable for levelgen is spontaneously disappearing at random.
Would anyone have any idea what is going on?
Thank you
Note: This doesn't happen only with the levelgen object. For instance, it has also happened with the TextItem object mentioned above. In fact, that same code crashes on my computer at the line levelgen:addItem(t), but crashes on another developer's computer at the line t:setText("hello") with the same kind of error message: missing or unknown method 'setText' (a nil value)
As with any mystery, you will need to peel it off layer by layer. I recommend going through the same steps Lua is going and trying to detect where the path taken diverge from your expectations:
What does getmetatable(levelgen).__index return? If it's a table, then check its content for addItem. If it's a function, then try to call it with (table, "addItem") and see what it returns.
Check if getmetatable returns reference to the same object before and after the call (or when it fails).
Are there several levels of metatable indirection that the call is going through? If so, try to follow the same path with explicit calls and see where the differences are.
Are you using weak keys that may cause values to disappear if there are no other references?
Can you provide a "default" value when you detect that it fails and continue to see if it "finds" this method again later? Or when it's broken, it's broken for every call after that?
What if you save a proper value for addItem and "fix" it when you detect it's broken?
What if you simply handle the error (as you do) and call it 10 times? Would it show valid results at least once (after it fails)? 100 times? If you keep calling the same method when it works, will it fail? This may help you to come up with a more reproducible error.
I'm not familiar enough with LuaWrapper to ask more specific questions, but these are the steps I'd take if I were you.
I strongly suspect the issue is that you have a class or struct similar to this:
struct Foo
{
    Bar bar;
    // Other fields follow
};
And that you've exposed both Foo and Bar to Lua via LuaWrapper. The important bit here is that bar is the first field on your Foo struct. Alternatively, you may have some class that inherits from some other base class and both the derived and base class are exposed to LuaWrapper.
LuaWrapper uses a function called an Identifier to uniquely track each object (for example, to tell whether a given object has already been added to the Lua state). By default it uses the object's address as the key. In cases like the one above, it is possible for both Foo and Bar to have the same address in memory, and thus LuaWrapper can get confused.
This may result in grabbing the wrong object's metatable when attempting to look up a method. Clearly, since it's looking at the wrong metatable it won't find the method you want, and so it will appear as if your metatable has mysteriously lost entries.
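To show why keying on the address alone is ambiguous, here is a rough sketch. It is not LuaWrapper's actual code: the two registries, registerObject and lookupByType are hypothetical names invented for the example, and the strings stand in for metatables.

```cpp
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

struct Bar { int value = 0; };
struct Foo {
    Bar bar;        // first member: shares its address with the enclosing Foo
    int other = 0;
};

// Hypothetical registry keyed by address only: a Foo and its first member collide.
std::map<const void*, std::string> byAddress;

// Hypothetical registry keyed by (type, address): the two entries stay separate.
std::map<std::pair<std::type_index, const void*>, std::string> byTypeAndAddress;

template <typename T>
void registerObject(const T& obj, const std::string& metatableName) {
    byAddress[&obj] = metatableName;  // a later registration overwrites an earlier one
    byTypeAndAddress[{std::type_index(typeid(T)), &obj}] = metatableName;
}

template <typename T>
std::string lookupByType(const T& obj) {
    return byTypeAndAddress.at({std::type_index(typeid(T)), &obj});
}
```

Registering a Foo and then its bar member leaves a single entry in byAddress, and that entry now names the wrong type for the Foo. That is the kind of mix-up that would make a method lookup fail while the metatable itself is still intact.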
I've checked in a change that tracks each object's data per type rather than in one giant pile. If you update your copy of LuaWrapper to the latest one from the repository, I'm fairly certain your problem will be fixed.
After merging with upstream (commit 3c54015) LuaWrapper, this issue has disappeared. It appears to have been a bug in LuaWrapper.
Thanks Alex!
I'm getting a seg fault that I do not understand. I'm using the Wt library and doing some fancy things with signals (which I only mention because it has enabled me to attempt to debug this).
I'm getting a pointer to one of my widgets from a vector and trying to call a method on the object it points to. Gdb shows that the pointer resolves, and if I examine the object it points to, it is exactly the one I need to modify. In this instance, the widget is broadcasting to itself, so it is registered as both the broadcaster and the listener; therefore, I was also able to verify that the 'broadcaster' pointer and the 'listener' pointer are accessing the same object. They do!
However, even though I can see that the object exists, and is initialized, and is in fact the correct object, when I try to call a method on the object, I get an immediate seg fault. I've tried a few different methods (including a few boolean returns that don't modify the object). I've tried calling them through the broadcaster pointer and the listener pointer, again, just to try to debug.
The debugger doesn't even enter the object; the segfault occurs immediately on attempting to call a method.
Code!
/* listeners is a vector of pointers to widgets to whom the broadcasting widget
 * is trying to signal.
 */
unsigned int num_listeners = listeners.size();
for (unsigned int w = 0; w < num_listeners; w++)  // unsigned index avoids a signed/unsigned comparison
{
    // Moldable is an abstraction of another widget type
    Moldable* widget = listeners.at(w);
    /* Because in this case, the broadcaster and the listener are one and the same,
     * these two point to the same location in memory; this part works. I know, therefore,
     * that the object has been instantiated, exists, and is happy, or we wouldn't
     * have gotten to this point to begin with. I can also examine the fields with gdb
     * and can verify that all of this is correct.
     */
    Moldable* broadcaster_debug = broadcast->getBroadcaster();
    /* setStyle is a method I created, and have tested in other instances and it
     * works just fine; I've also used native Wt methods for testing this problem and
     * they are also met with segfaults.
     */
    widget->setStyle(new_style); // segfault goes here!
}
I have read since researching that storing pointers in vectors is not the greatest idea and I should look into boost::shared_ptr. That may be so, and I will look into it, but it doesn't explain why calling a method on an object known to exist causes a segfault. I'd like to understand why this is happening.
Thanks for any assistance.
Edit:
I have created a gist with the vector operations detailed because it was more code than would comfortably fit in the post.
https://gist.github.com/3111137
I have not shown the code where the widgets are created because it's a recursive algorithm and in order to do that, I would have to show the entire class decision tree for creating widgets. Suffice to say that the widgets are being created; I can see them on the page when viewing the application in a browser. Everything works fine until I start playing with my fancy signals.
Moar Edit:
When I take a look at the disassembly in instruction stepping mode, I can see that just before the segfault occurs, the following operation takes place, the first argument of which is listed as 'void'. Admittedly, I know nothing about Assembly much to my chagrin, but this seems to be important. Can anyone explain what this instruction means and whether it might be the cause of my woes?
add $0x378,%rax //$0x378 is listed as 'void'
Another Edit:
At someone's suggestion, I created a non-virtual method that I am able to successfully call just before the seg fault, meaning the object is in fact there. If I take the same method and make it virtual, the seg fault occurs. So, why do only virtual methods create a seg fault?
I've now discovered that if, in the calling class, I explicitly qualify the calls as Moldable::debug_test (and Moldable::setStyle), the seg fault does not take place. However, this seems to have a similar effect to const bubbling -- every virtual method seems to want this qualifier. I've never witnessed this behaviour before. While I'm willing to correct my code if that's REALLY how it's supposed to be, I'm not sure whether the root problem is something else.
Getting there!
Well, I figured out the problem, though I'm sad to say it was a totally newbish mistake that due to the nature of the project was super difficult to find. I'll put the answer here, and I've also voted to close the question as too localized. Please feel free to do the same.
The BroadcastMessage class had a __broadcaster field (Moldable* __broadcaster;). When passing in the pointer to the broadcaster into the BroadcastMessage constructor, I forgot to assign the inbound pointer to that field, meaning __broadcaster was not a fully realised instance of the Moldable class.
Therefore, some methods were in fact working -- those that could be inlined, or my dummy functions that I created for testing (one of which returned a value of 1, for instance), so it was appearing that there was a full object there when in fact there was not. It wasn't until calling a more specialized method that tried to access some specific, dynamic property of the object that the segfault occurred.
What's more, most of the broadcast message lifespan was in its constructor, meaning that most of its purpose was fulfilled without issue, because the broadcaster was available in the local scope of the constructor.
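For anyone hitting the same thing, here is a simplified sketch of the fix. The classes below are stripped-down stand-ins, not the real Wt-based code, and broadcaster_ and styleId are names assumed for the example. The member-initializer list stores the inbound pointer, and the in-class default of nullptr means a forgotten assignment at least shows up as a null pointer instead of an indeterminate one.

```cpp
// Simplified stand-in for the widget type; the virtual method matters here,
// because a virtual call has to read the object's vtable pointer first.
struct Moldable {
    virtual ~Moldable() = default;
    virtual int styleId() const { return 42; }
};

class BroadcastMessage {
public:
    // Storing the inbound pointer is the step that was missing in the bug.
    explicit BroadcastMessage(Moldable* broadcaster) : broadcaster_(broadcaster) {}

    Moldable* getBroadcaster() const { return broadcaster_; }

private:
    // Defaulting to nullptr makes a forgotten assignment easy to spot in a debugger.
    Moldable* broadcaster_ = nullptr;
};
```

A virtual call through a garbage pointer has to load the vtable from whatever memory the pointer happens to reference, which is consistent with the non-virtual test methods appearing to work while the virtual ones crashed immediately.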
However, using Valgrind as suggested, I did uncover some other potential issues. I also pretty much stripped-down and re-built the entire project. I trashed tons of unnecessary code and it runs a lot faster now as a side effect.
Anyway, thanks for all the assistance. Sorry the solution wasn't more of a discovery.
This morning, in Visual Studio 2005, I tried adding a new private member variable to a class and found that it was giving me all sorts of weird segmentation faults and the like. When I went into debug mode, I found that my debugger didn't even see the new member variable, and thus it was giving me some strange behavior.
It required a "rebuild all" in order to get my program working again (and to get the debugger to see the new member variables I had made). Why was it necessary to rebuild all? Why was just doing a regular build insufficient?
I already solved the problem, but I feel like understanding the build process better will help me in the future. Let me know if there's any more information you need.
Thanks in advance!
When you add or remove members of a class, you change the memory layout of the object. If you don't recompile every translation unit that uses the class, you are violating the One Definition Rule (ODR), and the segmentation faults are just an effect of that.
As to why that happens: old code might be allocating memory for the old size and then passing that object (without the new member) to new code, which will read or write beyond the end of the allocated memory when it accesses the new variable. Note that the access specifier makes no difference here; if the member is private, it will simply be the class's own member functions that access it.
If you did not add the field to the end, but rather to the middle of the object, the same effect will be seen while accessing those fields that are laid out by the compiler in the higher memory addresses.
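As a rough illustration of how the layout shifts, the sketch below puts the "before" and "after" versions of a class side by side under two different names so they can be compared in one program. WidgetOld, WidgetNew and their fields are invented for the example; in the real situation both layouts carry the same class name and live in different object files, which is exactly the problem.

```cpp
#include <cstddef>

// Layout as seen by object files compiled before the change.
struct WidgetOld {
    int    id;
    double weight;
};

// Layout as seen by object files compiled after a member was added in the middle.
struct WidgetNew {
    int    id;
    int    flags[4];  // the newly added member
    double weight;
};

// True when the new layout is larger and `weight` has moved to a higher offset,
// so stale code would allocate too little memory and read `weight` from the wrong place.
bool layoutChanged() {
    return sizeof(WidgetNew) > sizeof(WidgetOld) &&
           offsetof(WidgetNew, weight) > offsetof(WidgetOld, weight);
}
```

A single small member can sometimes slip into existing padding without moving anything, which is one reason this kind of bug can appear only intermittently.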
The fact that you needed to use the rebuild all feature is an indication that the dependencies of your project are not correctly configured, and you should fix that as soon as possible. Having the right dependencies will force the compiler into rebuilding when needed, and will mean less useless debugging hours.
One obvious answer would be: "because Visual Studio is broken, and doesn't handle dependencies correctly". In fact, however, I don't think you've given us enough information for me to be able to make that statement (and Visual Studio does get the simple cases right).
When you add members (private or public, it doesn't matter), especially data members, but also virtual functions, you change the physical layout of the class in memory. All code which depends on that physical layout must be recompiled. Normally, the build system takes care of this automatically, but a broken makefile, or a bug in the system, can easily mean that it doesn't. (The correct answer isn't to invoke a rebuild/make clean, but to fix the problem with the build system.)
I'm chasing a bug where a member value of an object seems to magically change, without any methods being called which modify it. No doubt something obvious but proving hard to track down.
I know I can put conditional break-points in methods based on the variable value, but is it in any way possible to actually put a breakpoint on a variable itself? e.g a breakpoint which fires when x==4? I know I can put watches on, what about breakpoints?
Edit: this is a native-only project, no managed malarkey.
You can use a data breakpoint. There are a number of restrictions about how and when they can be used, namely that they work only in native code.
(To the best of my knowledge, you can only tell it to break when the variable changes, not when it changes to a specific value, but I'm not entirely sure; most of my code is mixed managed/native and thus can't use data breakpoints).
What you should do is wrap the variable behind a set/get pair: not just template functions, but an actual separate class, where set/get MUST be used for access. Then put a breakpoint in there. Alternatively, to make it easier to swap in and out, you could wrap the value in a class and use operator overloads (with appropriate breakpoints inside) to modify it. That's probably the cleanest and most portable solution.
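A minimal sketch of such a wrapper follows. Watched and writeCount are names I made up; the write counter is an extra I added so the behaviour can be checked without a debugger. Every write funnels through set(), so a single breakpoint there (optionally with a condition such as newValue == 4) catches all modifications made through the wrapper.

```cpp
template <typename T>
class Watched {
public:
    Watched() = default;
    explicit Watched(T initial) : value_(initial) {}

    const T& get() const { return value_; }

    // Single choke point for writes: set the breakpoint on the next line.
    void set(const T& newValue) {
        value_ = newValue;
        ++writeCount_;
    }

    // Operator overloads so existing code such as `x = 4;` and `int y = x;`
    // keeps compiling after the variable's type is swapped.
    Watched& operator=(const T& newValue) { set(newValue); return *this; }
    operator const T&() const { return value_; }

    unsigned writeCount() const { return writeCount_; }

private:
    T value_{};
    unsigned writeCount_ = 0;
};
```

This only catches writes that go through the wrapper; a stray pointer or buffer overrun that scribbles over the storage will bypass it, which is where a data breakpoint is still the better tool.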
You may also find that the variable is being modified in a way you don't expect. The best example I've got is an unsigned int that I subtracted from at zero when I meant to increment from zero, so when I was looking at the places that I knew modified it, nothing flagged up. Couldn't work out wtf was going on.
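For reference, the unsigned case looks like this in isolation (stepDownFromZero and isWrapped are just illustrative names). Subtracting from an unsigned zero is well-defined and wraps around to the largest representable value, so the variable suddenly holds a huge number with no crash and no warning at run time.

```cpp
#include <limits>

// Decrements an unsigned counter that starts at zero.
unsigned int stepDownFromZero() {
    unsigned int counter = 0;
    counter -= 1;  // wraps around modulo 2^N instead of becoming -1
    return counter;
}

// True when the value is the wrapped-around maximum.
bool isWrapped(unsigned int value) {
    return value == std::numeric_limits<unsigned int>::max();
}
```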
However, as far as I know, VC++ supports no other mechanism to break on arbitrary changes if the data breakpoint won't work for you, for example if the value was changed by stack/heap corruption. But if you're running a debug build, I'd expect VC++ to break on those.
A while ago I read the Debugging Windows Programs book, and one of the tricks that it talked about extensively was calling functions from the Visual C++ debugger (quick)watch window.
As luck would have it, I don't have a copy on hand and the little documentation that I could find about this is really really poor.
So how DO you call a member function in the watch window? What if the function lives in a DLL? What if it is part of a namespace? Can you pass non-trivial parameters?
Let's use this example: I want to call the size() method of QList<MyType>, where MyType is a custom type.
Thanks!
It works and is hugely useful. You can evaluate expressions in the watch window or open the quick watch window (ctrl-alt-Q -- a very handy shortcut to know). It will let you call most forms of member functions. The only time it commonly fails is when you've got overloaded operators, e.g. with smart pointers. For a simple class without overloaded operators, it should work well. I think it should accept non-trivial parameters (though obviously it depends how non-trivial!). As well as calling functions that return values, you can also call functions that modify the object -- you are not restricted to calling getter methods.
The other kind-of-obvious thing to remember is that all variables are evaluated in the local stack frame, so ensure the variable is visible from the current point in the stack.
I'd say just write list.size() in the watch window, where list is an instance of your QList, but I'm not sure this works for all classes
Are you sure that you can call methods of objects while debugging code in Visual Studio? Because I was never able to do so. The closest debugging features I know of are quick watch on objects (including local objects on the stack, navigating through the call stack) and edit and continue (I used it in VC6), which allows you to change the code, recompile and continue debugging from the last statement...