C++/CLI System.NullReferenceException being thrown in Native Code - c++

I'm working on a native C++ application, and I am trying to utilize some managed classes from a separate assembly to read/write data containers that were originally developed in C#. The interop layer is basically loading the data, and then mirroring the managed data into a functionally equivalent native data container for use by the application (and then obviously back again for writing).
To be totally honest, it's not been super fun trying to do this interop, but it's mostly working at this point. There's one step of the translation that's holding everything up, though, and while debugging it, a managed exception (System.NullReferenceException) is being thrown on a line that is native C++ only. I don't understand at all why this is happening, and I'm hoping that someone can make sense of what's going on. I wouldn't even have expected it to be in a managed portion of the stack at the location where it's throwing...
This is a stripped down version of what our code is doing:
// The native data types.
class cUnmanagedInterpolator
{
virtual std::vector<std::pair<double, cUnmanagedInterpolator>& GetPoints() { return mPoints; }
std::vector<std::pair<double, cUnmanagedInterpolator> mPoints;
};
class cUnmanagedInterpolator_Const : public cUnmanagedInterpolator
{
virtual double GetValue() const { return mValue; }
double mValue
};
// The managed data types (please forgive syntax errors here; they're actually written in C# in our software, and this is just to get the point across).
class ManagedInterpolator
{
property List<KeyValuePair<double, ManagedInterpolator^>^ Points = gcnew List<KeyValuePair<double, ManagedInterpolator^>();
};
class ManagedInterpolator_Const : public ManagedInterpolator
{
property double DependentValue;
};
// The function to mirror the managed data container into a native counterpart.
void CopyManagedPoints( ManagedInterpolator^ rhManagedInterpolator, cUnmanagedInterpolator* pUnmanagedInterpolator )
{
// Go through each managed point in the interpolator and add a corresponding unmanaged one.
for each( auto point in rhManagedContainer->Points )
{
// If this is a constant interpolator, just copy the values.
if( dynamic_cast<ManagedInterpolator_Const^>( point->Value ) != nullptr )
{
// Create a new unmanaged copy of the point.
// I even tried making x and r pointers and allocating the doubles on the heap with "new" to make sure they weren't somehow ending up as CLI types, but it didn't make a difference.
double x = point->Key;
double r = dynamic_cast<ManagedInterpolator_Const^>( point->Value )->DependentValue;
std::pair<double, cUnmanagedInterpolator> newPoint( x, cUnmanagedInterpolator_Const( r ) );
// The unmanaged point data was looking weird, and this appeared to be where it was happening, so create a message with what it thinks the point is at this point.
// ***The next line is where the System.NullReferenceException is thrown.***
std::string debugMessage = MakeString( newPoint.first ) + ", " + MakeString( dynamic_cast<cUnmanagedInterpolator_Const*>( &( newPoint.second ) )->GetValue() );
// Add the copy to the unmanaged interpolator.
pUnmanagedInterpolator->GetPoints().push_back( newPoint );
// ***Trying to reference the newly created point by using pUnmanagedInterpolator->GetPoints().back() also results in an exception.
// Show the debug message to the user.
AfxMessageBox( debugMessage.c_str() );
}
// Otherwise, add a new base class interpolator.
else
{
cUnmanagedInterpolator* pNewInterp = new cUnmanagedInterpolator();
// Recurse as deep as it goes.
if( pNewInterp )
{
pUnmanagedInterpolator->GetPoints().push_back( std::make_pair( point->Key, std::move( *pNewInterp ) ) );
CopyManagedPoints( point->Value, &( pUnmanagedInterpolator->GetPoints().back().second ) );
delete pNewInterp;
pNewInterp = nullptr;
}
}
}
}
// Roughly how the function would be used.
int main()
{
ManagedInterpolator^ rhManagedInterpolator = gcnew ManagedInterpolator( /*initialization information*/ );
cUnmanagedInterpolator* pNewInterp = new cUnmanagedInterpolator();
CopyManagedPoints( rhManagedInterpolator, pNewInterp );
// Use the data...
return 0;
}
The exception occurs in the inner if statement (there are comments preceded by three asterisks ("***") marking where the problems appear to be occurring).
A quick summary of the code in case it's not clear enough:
Basically, the data container is an "interpolator" that contains pairs of independent and dependent values ("points"). The dependent values can be either another interpolator or a scalar value. Dependent values that are interpolators can recurse as deeply as desired, but must end in a fixed value, which is a class derived from the interpolator.

You're using handles (^) so this is not native code. The problem is that your pair newPoint has a cUnmanagedInterpolator object as its second value, which you try to dynamic_cast to a cUnmanagedInterpolator_Const on the line that generates the NullReferenceException. This cast will fail and return a nullptr, which when dereferenced will cause the exception.
Fundamentally, while you start with a cUnmanagedInterpolator_Const, it is sliced to a cUnmanagedInterpolator when you create the pair, losing its identity as a cUnmanagedInterpolator_Const and becoming a cUnmanagedInterpolator.

Related

Pointer gives abnormal values after deferencing

I am trying to replicate a big C++ library. It has the following structure of code
RobotRunner.h
class RobotRunner
{
public:
VectorNavData* vectorNavData;
};
RobotRunner.cpp
void RobotRunner::run()
{
printf("Quaternion[0]: %f \n", vectorNavData->quat[0]); //Output: 243235653487854 - Abnormal values
}
SimulationBridge.h
class SimulationBridge
{
private:
VectorNavData _vectorNavData; // Actual value of vectornavdata from the robot is stored here
RobotRunner* _robotRunner = nullptr; //Pointer to the RobotRunner Object
}
SimulationBridge.cpp
void SimulationBridge::init()
{
_robotRunner = new RobotRunner();
printf("Quaternion[0]: %f \n", _vectorNavData.quat[0]); // Output: 0.43 - Normal and expected
_robotRunner->vectorNavData = &_vectorNavData;
}
void SimulationBridge::run()
{
_robotRunner->run();
}
//This function runs continuously and updates the _vectorNavData in a separate thread
void SimulationBridge::readIMU()
{
while(true)
{
//_lowState stores the values of different robot parameters at a given time
_vectorNavData.accelerometer[0] = _lowState.imu.accelerometer[0];
_vectorNavData.accelerometer[1] = _lowState.imu.accelerometer[1];
_vectorNavData.accelerometer[2] = _lowState.imu.accelerometer[2];
_vectorNavData.quat[0] = _lowState.imu.quaternion[1];
_vectorNavData.quat[1] = _lowState.imu.quaternion[2];
_vectorNavData.quat[2] = _lowState.imu.quaternion[3];
_vectorNavData.quat[3] = _lowState.imu.quaternion[0];
_vectorNavData.gyro[0] = _lowState.imu.gyroscope[0];
_vectorNavData.gyro[1] = _lowState.imu.gyroscope[1];
_vectorNavData.gyro[2] = _lowState.imu.gyroscope[2];
}
}
VectorNavData is a struct which stores the details about the orientation of the robot. It has the following definition
struct VectorNavData {
Vec3<float> accelerometer;
Vec3<float> gyro;
Quat<float> quat;
};
I have included only the necessary part of the code here for brevity.
Code Explanation:
SimulationBridge class communicates with the robot in the simulation. It takes in vectorNavData and stores it in the member variable _vectorNavData. SimulationBridge also contains the pointer to the RobotRunner class as one of it's member. I am allocating the address of _vectorNavData object to the pointer _robotRunner->vectorNavData (check SimulationBridge.cpp). Inside the RobotRunner class I deference this pointer and use the values in other parts of the code.
Problem:
If I print the vectorNavData inside the SimulationBridge.cpp the values seems to be normal. But after assigning the pointer of the same object to the robot runner, if I print the values there the values seems to be abnormally high. My question is, is this way of using pointers for dynamic allocation recommended? If not what is the best alternative way I can use?
Another important point to note is, I am compiling the code with CMake and "-O3" optimization flag is set to the CMAKE_CXX_FLAGS. If I remove this flag, the code sorta works fine for the above object pointer but I am still getting similar error for another object pointer in another part of the code. I have not included that here because it's pretty complex to describe the code structure and the problem essentially is the same.

Why does PyCXX handle new-style classes in the way it does?

I'm picking apart some C++ Python wrapper code that allows the consumer to construct custom old style and new style Python classes from C++.
The original code comes from PyCXX, with old and new style classes here and here. I have however rewritten the code substantially, and in this question I will reference my own code, as it allows me to present the situation in the greatest clarity that I am able. I think there would be very few individuals capable of understanding the original code without several days of scrutiny... For me it has taken weeks and I'm still not clear on it.
The old style simply derives from PyObject,
template<typename FinalClass>
class ExtObj_old : public ExtObjBase<FinalClass>
// ^ which : ExtObjBase_noTemplate : PyObject
{
public:
// forwarding function to mitigate awkwardness retrieving static method
// from base type that is incomplete due to templating
static TypeObject& typeobject() { return ExtObjBase<FinalClass>::typeobject(); }
static void one_time_setup()
{
typeobject().set_tp_dealloc( [](PyObject* t) { delete (FinalClass*)(t); } );
typeobject().supportGetattr(); // every object must support getattr
FinalClass::setup();
typeobject().readyType();
}
// every object needs getattr implemented to support methods
Object getattr( const char* name ) override { return getattr_methods(name); }
// ^ MARKER1
protected:
explicit ExtObj_old()
{
PyObject_Init( this, typeobject().type_object() ); // MARKER2
}
When one_time_setup() is called, it forces (by accessing base class typeobject()) creation of the associated PyTypeObject for this new type.
Later when an instance is constructed, it uses PyObject_Init
So far so good.
But the new style class uses much more complicated machinery. I suspect this is related to the fact that new style classes allow derivation.
And this is my question, why is the new style class handling implemented in the way that it is? Why is it having to create this extra PythonClassInstance structure? Why can't it do things the same way the old-style class handling does? i.e. Just type convert from the PyObject base type? And seeing as it doesn't do that, does this mean it is making no use of its PyObject base type?
This is a huge question, and I will keep amending the post until I'm satisfied it represents the issue well. It isn't a good fit for SO's format, I'm sorry about that. However, some world-class engineers frequent this site (one of my previous questions was answered by the lead developer of GCC for example), and I value the opportunity to appeal to their expertise. So please don't be too hasty to vote to close.
The new style class's one-time setup looks like this:
template<typename FinalClass>
class ExtObj_new : public ExtObjBase<FinalClass>
{
private:
PythonClassInstance* m_class_instance;
public:
static void one_time_setup()
{
TypeObject& typeobject{ ExtObjBase<FinalClass>::typeobject() };
// these three functions are listed below
typeobject.set_tp_new( extension_object_new );
typeobject.set_tp_init( extension_object_init );
typeobject.set_tp_dealloc( extension_object_deallocator );
// this should be named supportInheritance, or supportUseAsBaseType
// old style class does not allow this
typeobject.supportClass(); // does: table->tp_flags |= Py_TPFLAGS_BASETYPE
typeobject.supportGetattro(); // always support get and set attr
typeobject.supportSetattro();
FinalClass::setup();
// add our methods to the extension type's method table
{ ... typeobject.set_methods( /* ... */); }
typeobject.readyType();
}
protected:
explicit ExtObj_new( PythonClassInstance* self, Object& args, Object& kwds )
: m_class_instance{self}
{ }
So the new style uses a custom PythonClassInstance structure:
struct PythonClassInstance
{
PyObject_HEAD
ExtObjBase_noTemplate* m_pycxx_object;
}
PyObject_HEAD, if I dig into Python's object.h, is just a macro for PyObject ob_base; -- no further complications, like #if #else. So I don't see why it can't simply be:
struct PythonClassInstance
{
PyObject ob_base;
ExtObjBase_noTemplate* m_pycxx_object;
}
or even:
struct PythonClassInstance : PyObject
{
ExtObjBase_noTemplate* m_pycxx_object;
}
Anyway, it seems that its purpose is to tag a pointer onto the end of a PyObject. This will be because Python runtime will often trigger functions we have placed in its function table, and the first parameter will be the PyObject responsible for the call. So this allows us to retrieve the associated C++ object.
But we also need to do that for the old-style class.
Here is the function responsible for doing that:
ExtObjBase_noTemplate* getExtObjBase( PyObject* pyob )
{
if( pyob->ob_type->tp_flags & Py_TPFLAGS_BASETYPE )
{
/*
New style class uses a PythonClassInstance to tag on an additional
pointer onto the end of the PyObject
The old style class just seems to typecast the pointer back up
to ExtObjBase_noTemplate
ExtObjBase_noTemplate does indeed derive from PyObject
So it should be possible to perform this typecast
Which begs the question, why on earth does the new style class feel
the need to do something different?
This looks like a really nice way to solve the problem
*/
PythonClassInstance* instance = reinterpret_cast<PythonClassInstance*>(pyob);
return instance->m_pycxx_object;
}
else
return static_cast<ExtObjBase_noTemplate*>( pyob );
}
My comment articulates my confusion.
And here, for completeness is us inserting a lambda-trampoline into the PyTypeObject's function pointer table, so that Python runtime can trigger it:
table->tp_setattro = [] (PyObject* self, PyObject* name, PyObject* val) -> int
{
try {
ExtObjBase_noTemplate* p = getExtObjBase( self );
return ( p -> setattro(Object{name}, Object{val}) );
}
catch( Py::Exception& ) { /* indicate error */
return -1;
}
};
(In this demonstration I'm using tp_setattro, note that there are about 30 other slots, which you can see if you look at the doc for PyTypeObject)
(in fact the major reason for working this way is that we can try{}catch{} around every trampoline. This saves the consumer from having to code repetitive error trapping.)
So, we pull out the "base type for the associated C++ object" and call its virtual setattro (just using setattro as an example here). A derived class will have overridden setattro, and this override will get called.
The old-style class provides such an override, which I've labelled MARKER1 -- it is in the top listing for this question.
The only the thing I can think of is that maybe different maintainers have used different techniques. But is there some more compelling reason why old and new style classes require different architecture?
PS for reference, I should include the following methods from new style class:
static PyObject* extension_object_new( PyTypeObject* subtype, PyObject* args, PyObject* kwds )
{
PyObject* pyob = subtype->tp_alloc(subtype,0);
PythonClassInstance* o = reinterpret_cast<PythonClassInstance *>( pyob );
o->m_pycxx_object = nullptr;
return pyob;
}
^ to me, this looks absolutely wrong.
It appears to be allocating memory, re-casting to some structure that might exceed the amount allocated, and then nulling right at the end of this.
I'm surprised it hasn't caused any crashes.
I can't see any indication anywhere in the source code that these 4 bytes are owned.
static int extension_object_init( PyObject* _self, PyObject* _args, PyObject* _kwds )
{
try
{
Object args{_args};
Object kwds{_kwds};
PythonClassInstance* self{ reinterpret_cast<PythonClassInstance*>(_self) };
if( self->m_pycxx_object )
self->m_pycxx_object->reinit( args, kwds );
else
// NOTE: observe this is where we invoke the constructor, but indirectly (i.e. through final)
self->m_pycxx_object = new FinalClass{ self, args, kwds };
}
catch( Exception & )
{
return -1;
}
return 0;
}
^ note that there is no implementation for reinit, other than the default
virtual void reinit ( Object& args , Object& kwds ) {
throw RuntimeError( "Must not call __init__ twice on this class" );
}
static void extension_object_deallocator( PyObject* _self )
{
PythonClassInstance* self{ reinterpret_cast< PythonClassInstance* >(_self) };
delete self->m_pycxx_object;
_self->ob_type->tp_free( _self );
}
EDIT: I will hazard a guess, thanks to insight from Yhg1s on the IRC channel.
Maybe it is because when you create a new old-style class, it is guaranteed it will overlap perfectly a PyObject structure.
Hence it is safe to derive from PyObject, and pass a pointer to the underlying PyObject into Python, which is what the old-style class does (MARKER2)
On the other hand, new style class creates a {PyObject + maybe something else} object.
i.e. It wouldn't be safe to do the same trick, as Python runtime would end up writing past the end of the base class allocation (which is only a PyObject).
Because of this, we need to get Python to allocate for the class, and return us a pointer which we store.
Because we are now no longer making use of the PyObject base-class for this storage, we cannot use the convenient trick of typecasting back to retrieve the associated C++ object.
Which means that we need to tag on an extra sizeof(void*) bytes to the end of the PyObject that actually does get allocated, and use this to point to our associated C++ object instance.
However, there is some contradiction here.
struct PythonClassInstance
{
PyObject_HEAD
ExtObjBase_noTemplate* m_pycxx_object;
}
^ if this is indeed the structure that accomplishes the above, then it is saying that the new style class instance is indeed fitting exactly over a PyObject, i.e. It is not overlapping into the m_pycxx_object.
And if this is the case, then surely this whole process is unnecessary.
EDIT: here are some links that are helping me learn the necessary ground work:
http://eli.thegreenplace.net/2012/04/16/python-object-creation-sequence
http://realmike.org/blog/2010/07/18/introduction-to-new-style-classes-in-python
Create an object using Python's C API
to me, this looks absolutely wrong. It appears to be allocating memory, re-casting to some structure that might exceed the amount allocated, and then nulling right at the end of this. I'm surprised it hasn't caused any crashes. I can't see any indication anywhere in the source code that these 4 bytes are owned
PyCXX does allocate enough memory, but it does so by accident. This appears to be a bug in PyCXX.
The amount of memory Python allocates for the object is determined by the first call to the following static member function of PythonClass<T>:
static PythonType &behaviors()
{
...
p = new PythonType( sizeof( T ), 0, default_name );
...
}
The constructor of PythonType sets the tp_basicsize of the python type object to sizeof(T). This way when Python allocates an object it knows to allocate at least sizeof(T) bytes. It works because sizeof(T) turns out to be larger that sizeof(PythonClassInstance) (T is derived from PythonClass<T> which derives from PythonExtensionBase, which is large enough).
However, it misses the point. It should actually allocate only sizeof(PythonClassInstance) . This appears to be a bug in PyCXX - that it allocates too much, rather than too little space for storing a PythonClassInstance object.
And this is my question, why is the new style class handling implemented in the way that it is? Why is it having to create this extra PythonClassInstance structure? Why can't it do things the same way the old-style class handling does?
Here's my theory why new style classes are different from the old style classes in PyCXX.
Before Python 2.2, where new style classes were introduced, there was no tp_init member int the type object. Instead, you needed to write a factory function that would construct the object. This is how PythonExtension<T> is supposed to work - the factory function converts the Python arguments to C++ arguments, asks Python to allocate the memory and then calls the constructor using placement new.
Python 2.2 added the new style classes and the tp_init member. Python first creates the object and then calls the tp_init method. Keeping the old way would have required that the objects would first have a dummy constructor that creates an "empty" object (e.g. initializes all members to null) and then when tp_init is called, would have had an additional initialization stage. This makes the code uglier.
It seems that the author of PyCXX wanted to avoid that. PyCXX works by first creating a dummy PythonClassInstance object and then when tp_init is called, creates the actual PythonClass<T> object using its constructor.
... does this mean it is making no use of its PyObject base type?
This appears to be correct, the PyObject base class does not seem to be used anywhere. All the interesting methods of PythonExtensionBase use the virtual self() method, which returns m_class_instance and completely ignore the PyObject base class.
I guess (only a guess, though) is that PythonClass<T> was added to an existing system and it seemed easier to just derive from PythonExtensionBase instead of cleaning up the code.

Access violation on std::function assignement using lambdas

Hy everyone, here again. Continuing the code from my previous question : Is this a bad hack? memcpy with virtual classes
I corrected that, using the Clone approach as suggested, but I'm having an error that also happened before I tried the memcpy thing(read question above).
What I'm trying to do is to create a lambda that captures the current script and executes it, and then pass and store that lambda in an object ( Trigger*), in the member InternalCallback.
I get an access violation error on the lambda assignment: http://imgur.com/OKLMJpa
The error happens only at the 4th iteration of this code:
if(CheckHR(EnginePTR->iPhysics->CreateFromFile(physicsPath,StartingTriggerID,trans,scale,-1,false,engPtr)) == HR_Correct)
{
_Lua::ScriptedEntity * newScript = EntityBase->Clone(vm);//nullptr;
string luaPath = transforms.next_sibling().next_sibling().first_attribute().as_string();
if(UseRelativePaths)
{
stringstream temp2;
temp2 << _Core::ExePath() << LuaSubfolder << "\\" << luaPath;
luaPath = temp2.str();
}
newScript->CompileFile(luaPath.c_str());
newScript->EnginePTR_voidptr = engPtr;
auto callback = [=](_Physics::Trigger* trigger,PxTriggerPair* pairs, PxU32 count)
{
newScript->SelectScriptFunction("TriggerCallback");
newScript->AddParam(trigger->Id);
auto data = (_Physics::RayCastingStats*)pairs->otherShape->userData;
newScript->AddParam((PxU8)pairs->flags);
newScript->AddParam(data->ID);
newScript->AddParam((int)data->Type);
newScript->AddParam((int)count);
newScript->Go(1);
return;
};
((_Physics::Trigger*)EnginePTR->iPhysics->GetPhysicObject(StartingTriggerID))->InternalCallback = callback;
StartingTriggerID++;
}
This is the code for Trigger
class Trigger : public PhysicObject
{
public:
Trigger()
{
ActorDynamic = nullptr;
ActorStatic = nullptr;
InternalCallback = nullptr;
}
virtual HRESULT Update(float ElapsedTime,void * EnginePTR);
virtual HRESULT Cleanup(); // Release the actor!!
long Id;
ShapeTypes Type;
static const PhysicObjectType PhysicsType = PhysicObjectType::Trigger;
PxVec3 Scale;
void* UserData;
void Callback(PxTriggerPair* pairs,PxU32 count)
{
InternalCallback(this,pairs,count);
}
function<void(_Physics::Trigger* trigger,PxTriggerPair* pairs, PxU32 count)> InternalCallback;
};
By iteration I mean that is part of a for loop.
My system is Win 7 64 bits, Intel i3, NVIDIA GTX 480, and the compiler Visual Studio 2012 Express, using the C++11 toolset.
I'm really out of ideas. I tested for heap corruption, it appears to be good, I changed the capture in the lambda, changed nothing, I skip the 4th object and it works.
Any help would be really appreciated.
Edit: As required, here is the callstack: http://imgur.com/P7P3t4k
Solved. It was a design error. I store a lot of objects in a map, and they all derive from an object class ( like above, where Trigger derives from PhysicObject ).
The problem was that I was having IDs collisions, so the object stored in ID 5 wasn't a Trigger, so the cast created a bad object, and so the program crashed.
Silly error, really specific, but it might help somebody to remember to check temporal objects.

Null function pointers in a new object aren't actually nullptr

I am using C++(11) w/ Visual Studio 2012.
I create windows using a custom made wrapper class.
CUIWindow* winA = new CUIWindow ( NULL, TEXT("winAClassName"), TEXT("winACaption"), 200, 300 );
Each window has a series of "sockets" for pluggabe events.
public:
LPFNCUIWINDOWONCLOSE OnClose;
LPFNCUIWINDOWONDESTROY OnDestroy;
LPFNCUIWINDOWONNOTIFY OnNotify;
LPFNCUIWINDOWONSIZE OnSize;
LPFNCUIWINDOWONHOTKEY OnHotkey;
I use the following macro for calling the different sockets that can be assigned to my window class in the message loop:
#define MapEvent(e, fn) \
{ \
case e: \
if ( fn != nullptr ) \
return static_cast<LPARAM>( HANDLE_##e((hWnd), (wParam), (lParam), fn) ); \
}
I have a situation as below;
You can assume pWindow is a valid pointer to a CUIWindow object.
At the shown breakpoint some of the uninitialized OnXXXX events are defined as 0xCDCDCDCD and are getting called when their message arrives (regardless of me never actually explicitly setting them after class creation). This gives me an 0x0BADFOOD exception because the function pointer is bad. I would have assumed that null function pointers would have been caught by if ( fn != nullptr ) however now I am not so sure and I'm requesting help to;
Explain why this is occurring
Find the best way to stop this from occurring, preferably without explicitly setting all the function pointers to zero in the constructor since sometimes there are many, many, sockets.
In the constructor for the CUIWindow class you need to explicitly set those members to nullptr (or NULL or 0); since all those pointers are raw pointers, they won't automatically get set to 0 in the CUIWindow constructor:
CUIWindow ( /* whatever parameters... */)
: OnClose(nullptr)
, OnDestroy(nullptr)
, OnNotify(nullptr)
, OnSize(nullptr)
, OnHotkey(nullptr)
{
// the constructor logic...
}
Uninitialized pointers don't generally get set to a null pointer automatically. Is changing your class slightly an option? If so, you can set all of them without naming all of them.
struct CUIWindowEvents
{
LPFNCUIWINDOWONCLOSE OnClose;
LPFNCUIWINDOWONDESTROY OnDestroy;
LPFNCUIWINDOWONNOTIFY OnNotify;
LPFNCUIWINDOWONSIZE OnSize;
LPFNCUIWINDOWONHOTKEY OnHotkey;
}
class CUIWindow
{
public:
CUIWindowEvents Events; // move all events to a simple struct
CUIWindow() // and in your constructor
: Events() // initialise Events; this sets all the pointers to null
{ ... }
};

luabind: cannot retrieve values from table indexed by non-built-in classes‏

I'm using luabind 0.9.1 from Ryan Pavlik's master distribution with Lua 5.1, cygwin on Win XP SP3 + latest patches x86, boost 1.48, gcc 4.3.4. Lua and boost are cygwin pre-compiled versions.
I've successfully built luabind in both static and shared versions.
Both versions pass all the tests EXCEPT for the test_object_identity.cpp test which fails in both versions.
I've tracked down the problem to the following issue:
If an entry in a table is created for NON built-in class (i.e., not int, string, etc), the value CANNOT be retrieved.
Here's a code piece that demonstrates this:
#include "test.hpp"
#include <luabind/luabind.hpp>
#include <luabind/detail/debug.hpp>
using namespace luabind;
struct test_param
{
int obj;
};
void test_main(lua_State* L)
{
using namespace luabind;
module(L)
[
class_<test_param>("test_param")
.def_readwrite("obj", &test_param::obj)
];
test_param temp_object;
object tabc = newtable(L);
tabc[1] = 10;
tabc[temp_object] = 30;
TEST_CHECK( tabc[1] == 10 ); // passes
TEST_CHECK( tabc[temp_object] == 30 ); // FAILS!!!
}
tabc[1] is indeed 10 while tabc[temp_object] is NOT 30! (actually, it seems to be nil)
However, if I use iterate to go over tabc entries, there're the two entries with the CORRECT key/value pairs.
Any ideas?
BTW, overloading the == operator like this:
#include <luabind/operator.hpp>
struct test_param
{
int obj;
bool operator==(test_param const& rhs) const
{
return obj == rhs.obj;
}
};
and
module(L)
[
class_<test_param>("test_param")
.def_readwrite("obj", &test_param::obj)
.def(const_self == const_self)
];
Doesn't change the result.
I also tried switching to settable() and gettable() from the [] operator. The result is the same. I can see with the debugger that default conversion of the key is invoked, so I guess the error arises from somewhere therein, but it's beyond me to figure out what exactly the problem is.
As the following simple test case show, there're definitely a bug in Luabind's conversion for complex types:
struct test_param : wrap_base
{
int obj;
bool operator==(test_param const& rhs) const
{ return obj == rhs.obj ; }
};
void test_main(lua_State* L)
{
using namespace luabind;
module(L)
[
class_<test_param>("test_param")
.def(constructor<>())
.def_readwrite("obj", &test_param::obj)
.def(const_self == const_self)
];
object tabc, zzk, zzv;
test_param tp, tp1;
tp.obj = 123456;
// create new table
tabc = newtable(L);
// set tabc[tp] = 5;
// o k v
settable( tabc, tp, 5);
// get access to entry through iterator() API
iterator zzi(tabc);
// get the key object
zzk = zzi.key();
// read back the value through gettable() API
// o k
zzv = gettable(tabc, zzk);
// check the entry has the same value
// irrespective of access method
TEST_CHECK ( *zzi == 5 &&
object_cast<int>(zzv) == 5 );
// convert key to its REAL type (test_param)
tp1 = object_cast<test_param>(zzk);
// check two keys are the same
TEST_CHECK( tp == tp1 );
// read the value back from table using REAL key type
zzv = gettable(tabc, tp1);
// check the value
TEST_CHECK( object_cast<int>(zzv) == 5 );
// the previous call FAILS with
// Terminated with exception: "unable to make cast"
// this is because gettable() doesn't return
// a TRUE value, but nil instead
}
Hopefully, someone smarter than me can figure this out,
Thx
I've traced the problem to the fact that Luabind creates a NEW DISTINCT object EVERY time you use a complex value as key (but it does NOT if you use a primitive one or an object).
Here's a small test case that demonstrates this:
struct test_param : wrap_base
{
int obj;
bool operator==(test_param const& rhs) const
{ return obj == rhs.obj ; }
};
void test_main(lua_State* L)
{
using namespace luabind;
module(L)
[
class_<test_param>("test_param")
.def(constructor<>())
.def_readwrite("obj", &test_param::obj)
.def(const_self == const_self)
];
object tabc, zzk, zzv;
test_param tp;
tp.obj = 123456;
tabc = newtable(L);
// o k v
settable( tabc, tp, 5);
iterator zzi(tabc), end;
std::cerr << "value = " << *zzi << "\n";
zzk = zzi.key();
// o k v
settable( tabc, tp, 6);
settable( tabc, zzk, 7);
for (zzi = iterator(tabc); zzi != end; ++zzi)
{
std::cerr << "value = " << *zzi << "\n";
}
}
Notice how tabc[tp] first has the value 5 and then is overwritten with 7 when accessed through the key object. However, when accessed AGAIN through tp, a new entry gets created. This is why gettable() fails subsequently.
Thx,
David
Disclaimer: I'm not an expert on luabind. It's entirely possible I've missed something about luabind's capabilities.
First of all, what is luabind doing when converting test_param to a Lua key? The default policy is copy. To quote the luabind documentation:
This will make a copy of the parameter. This is the default behavior when passing parameters by-value. Note that this can only be used when passing from C++ to Lua. This policy requires that the parameter type has an accessible copy constructor.
In pratice, what this means is that luabind will create a new object (called "full userdata") which is owned by the Lua garbage collector and will copy your struct into it. This is a very safe thing to do because it no longer matters what you do with the c++ object; the Lua object will stick around without really any overhead. This is a good way to do bindings for by-value sorts of objects.
Why does luabind create a new object each time you pass it to Lua? Well, what else could it do? It doesn't matter if the address of the passed object is the same, because the original c++ object could have changed or been destroyed since it was first passed to Lua. (Remember, it was copied to Lua by value, not by reference.) So, with only ==, luabind would have to maintain a list of every object of that type which had ever been passed to Lua (possibly weakly) and compare your object against each one to see if it matches. luabind doesn't do this (nor do I think should it).
Now, let's look at the Lua side. Even though luabind creates two different objects, they're still equal, right? Well, the first problem is that, besides certain built-in types, Lua can only hold objects by reference. Each of those "full userdata" that I mentioned before is actually a pointer. That means that they are not identical.
But they are equal, if we define an __eq meta operation. Unfortunately, Lua itself simply does not support this case. Userdata when used as table keys are always compared by identity, no matter what. This actually isn't special for userdata; it is also true for tables. (Note that to properly support this case, Lua would need to override the hashcode operation on the object in addition to __eq. Lua also does not support overriding the hashcode operation.) I can't speak for the authors of Lua why they did not allow this (and it has been suggested before), but there it is.
So, what are the options?
The simplest thing would be to convert test_param to an object once (explicitly), and then use that object to index the table both times. However, I suspect that while this fixes your toy example, it isn't very helpful in practice.
Another option is simply not to use such types as keys. Actually, I think this is a very good suggestion, since this kind of light-weight binding is quite useful, and the only other option is to discard it.
It looks like you can define a custom conversion on your type. In your example, it might be reasonable to convert your type to a Lua number which will behave well as a table index.
Use a different kind of binding. There will be some overhead, but if you want identity, you'll have to live with it. It sounds like luabind has some support for wrappers, which you may need to use to preserve identity:
When a pointer or reference to a registered class with a wrapper is passed to Lua, luabind will query for it's dynamic type. If the dynamic type inherits from wrap_base, object identity is preserved.