I can't get __dealloc__ be called when deleting an object

I can't get __dealloc__ be called when deleting an object - c++

I have the following C++ class :
.H
class ALabSet: public LabSet {
public:
PyObject *m_obj;
ALabSet(PyObject *obj);
virtual ~ALabSet();
PyObject *GetPyObj();
};
.CPP
ALabSet::ALabSet(PyObject *obj): LabSet() {
this->m_obj = obj;
// Provided by "cyelp_api.h"
if (import_cyelp()) {
} else {
Py_XINCREF(this->m_obj);
}
}
ALabSet::~ALabSet() {
Py_XDECREF(this->m_obj);
}
PyObject *ALabSet::GetPyObj() {
return this->m_obj;
}
I exposed it as follows with Cython :
cdef extern from "adapter/ALabSiteSetsManager.h" namespace "elps" :
cdef cppclass ALabSet:
ALabSet(PyObject *)
PyObject *GetPyObj()
cdef class PyLabSet:
cdef ALabSet *thisptr
def __cinit__(self):
self.thisptr = new ALabSet(<PyObject *>self)
def __dealloc__(self):
print "delete from PY !"
if self.thisptr:
del self.thisptr
My problem is that I can't figure out how to get the destructor called from Python. The following does exactly nothing :
a_set = PyLabSet()
del a_set
I can't find similar issues on the web. Does any of you has an idea of is appening to me here ?
I'm I missing something about reference counting management, or ...
Thanks a lot

del a_set removes a reference to the object (the local variable). There's still another reference, in the C++ object. This is known as a reference cycle. The cycle GC could collect this after a while. However, there is no guarantee when (or even if) this happens, so you should not rely on it1.
For example, reference cycles containing pure Python objects with a __del__ special method are documented to not be freed at all:
Changed in version 3.4: Following PEP 442, objects with a __del__() method don’t end up in gc.garbage anymore.
I don't know whether Cython's implementation of __dealloc__ triggers this behavior, but as outlined before, destruction isn't deterministic anyway. If you want to free some resource (e.g. a block of memory that isn't a Python object, a file, a connection, a lock, etc.) you should expose an explicit way of doing so manually (cf. the close methods of various objects). Context managers can simplify client code doing this.
Disclaimer: Almost all of this is CPython-specific.
1 Some people prefer thinking of GC as an abstraction that simulates availability of infinite memory, rather than something that destroys unreachable objects. With this approach, it becomes quite obvious that destruction is not deterministic and not even guaranteed.

Related

c++ passing json object by reference

In the below code, I am taking requests from a client, put them together on a json object on my server class and sending it to a pusher(directly connected to a website, putting my data in there so I can search data easily)
The code is working perfectly fine, but my manager said that I need to pass json by reference in this code, and I have no idea what to do.
On Server Class:
grpc::Status RouteGuideImpl::PubEvent(grpc::ServerContext *context,
const events::PubEventRequest *request,
events::PubEventResponse *response){
for(int i=0; i<request->event_size();i++){
nhollman::json object;
auto message = request->events(i);
object["uuid"]=message.uuid();
object["topic"]=message.type();
pusher.jsonCollector(obj);
}
...
}
On Pusher Class:
private:
nholmann::json queue = nlohmann::json::array();
public:
void Pusher::jsonCollector(nlohmann::json dump){
queue.push_back(dump);
}
void Pusher::curlPusher(){
std::string str = queue.dump();
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, str.data());
...
}
As much as I understand, I need to send the json object by reference. How can I do that?

The simple answer is to change
void Pusher::jsonCollector(nlohmann::json dump)
to
void Pusher::jsonCollector(const nlohmann::json& dump)
(note that if this is inside the class then Pusher:: is a non-standard visual studio extension).
This will reduce the number of times the object is copied from 2 to 1 however you can avoid the copy completely by using std::move:
void Pusher::jsonCollector(nlohmann::json dump){
queue.push_back(std::move(dump));
}
And call it with:
pusher.jsonCollector(std::move(obj));
If you want to enforce this behaviour to ensure that callers of jsonCollector always use std::move you can change jsonCollector to:
void Pusher::jsonCollector(nlohmann::json&& dump){
queue.push_back(std::move(dump));
}

Well, references are one of the many, many features, that distinguishes C from C++.
In other languages, like python or java, when you pass an object (not basic types) to a function and change it there, it is changed in the caller entity as well. In these languages, you don't have pointers, but you need to pass the object, not a copy.
That's what you have with references in C++. They are used like value types, but they are no copy.
Pointers can be nullptr (or NULL in C), references cannot. The address a pointer points to can be changed (assigned), you cannot change what object a reference refers to.
Have a look at this https://en.cppreference.com/w/cpp/language/reference for more information.

Downcasting trouble

This is my first experience with downcasting in C++ and I just can't understand the problem.
AInstruction and CInstruction inherit from AssemblerInstruction.
Parser takes the info in its ctor and creates one of those derived instruction types for its mInstruction member (accessed by getInstruction). In the program, a method of the base AssemblerInstruction class is used, for happy polymorphism.
But when I want to test that the Parser has created the correct instruction, I need to query the derived instruction members, which means I need to downcast parser.getInstruction() to an AInstruction or CInstruction.
As far as I can tell this needs to be done using a bunch of pointers and references. This is how I can get the code to compile:
TEST(ParserA, parsesBuiltInConstants)
{
AssemblerInstruction inst = Parser("#R3", 0).getInstruction();
EXPECT_EQ(inst.getInstructionType(), AssemblerInstruction::InstructionType::A);
AssemblerInstruction* i = &(inst);
AInstruction* a = dynamic_cast<AInstruction*>(i);
EXPECT_EQ(a->getLine(), "R3");
}
Running this gives this error:
unknown file: error: SEH exception with code 0xc0000005 thrown in the test body.
And stepping through the code, when the debugger is on the final line of the function, a is pointing to
0x00000000 <NULL>.
I imagine this is an instance where I don't have a full enough understanding of C++, meaning that I could be making a n00b mistake. Or maybe it's some bigger crazy problem. Help?
Update
I've been able to make this work by making mInstruction into a (dumb) pointer:
// in parser, when parsing
mInstructionPtr = new AInstruction(assemblyCode.substr(1), lineNumber);
// elsewhere in AssemblerInstruction.cpp
AssemblerInstruction* AssemblyParser::getInstructionPtr() { return mInstructionPtr; }
TEST(ParserA, parsesBuiltInConstants)
{
auto ptr = Parser("#R3", 0).getInstructionPtr();
AInstruction* a = dynamic_cast<AInstruction*>(ptr);
EXPECT_EQ(a->getLine(), "R3");
}
However I have trouble implementing it with a unique_ptr:
(I'm aware that mInstruction (non-pointer) is redundant, as are two types of pointers. I'll get rid of it later when I clean all this up)
class AssemblyParser
{
public:
AssemblyParser(std::string assemblyCode, unsigned int lineNumber);
AssemblerInstruction getInstruction();
std::unique_ptr<AssemblerInstruction> getUniqueInstructionPtr();
AssemblerInstruction* getInstructionPtr();
private:
AssemblerInstruction mInstruction;
std::unique_ptr<AssemblerInstruction> mUniqueInstructionPtr;
AssemblerInstruction* mInstructionPtr;
};
// in AssemblyParser.cpp
// in parser as in example above. this works fine.
mUniqueInstructionPtr = make_unique<AInstruction>(assemblyCode.substr(1), lineNumber);
// this doesn't compile!!!
unique_ptr<AssemblerInstruction> AssemblyParser::getUniqueInstructionPtr()
{
return mUniqueInstructionPtr;
}
In getUniqueInstructionPtr, there is a squiggle under mUniqueInstructionPtr with this error:
'std::unique_ptr<AssemblerInstruction,std::default_delete>::unique_ptr(const std::unique_ptr<AssemblerInstruction,std::default_delete> &)': attempting to reference a deleted function
What!? I haven't declared any functions as deleted or defaulted!

You can not downcast an object to something which doesn't match it's dynamic type. In your code,
AssemblerInstruction inst = Parser("#R3", 0).getInstruction();
inst has a fixed type, which is AssemblerInstruction. Downcasting it to AInstruction leads to undefined behavior - manifested as crash - because that is not what it is.
If you want your getInstruction to return a dynamically-typed object, it has to return a [smart] pointer to base class, while constructing an object of derived class. Something like that (pseudo code):
std::unique_ptr<AssemblerInstruction> getInstruction(...) {
return std::make_unique<AInstruction>(...);
}
Also, if you see yourself in need of downcasting object based on a value of a class, you are doing something wrong, as you are trying to home-brew polymorphism. Most of the times it does indicate a design flaw, and should instead be done using built-in C++ polymorphic support - namely, virtual functions.

c++, dealing with exceptions from constructors

I have a class which is loaded from an external file, so ideally I would want its constructor to load from a given path if the load fails, I will want to throw an error if the file is not found/not readable (Throwing errors from constructors is not a horrible idea, see ISO's FAQ).
There is a problem with this though, I want to handle errors myself in some controlled manner, and I want to do that immediately, so I need to put a try-catch statement around the constructor for this object ... and if I do that, the object is not declared outside the try statement, i.e.:
//in my_class.hpp
class my_class
{
...
public:
my_class(string path);//Throws file not found, or other error error
...
};
//anywhere my_class is needed
try
{
my_class my_object(string);
}
catch(/*Whatever error I am interesetd in*/)
{
//error handling
}
//Problem... now my_object doesn't exist anymore
I have tried a number of ways of getting around it, but I don't really like any of them:
Firstly, I could use a pointer to my_class instead of the class itself:
my_class* my_pointer;
try
{
my_class my_pointer = new my_class(string);
}
catch(/*Whatever error I am interesetd in*/)
{
//error handling
}
The problem is that the instance of this object doesn't always end up in the same object which created it, so deleting all pointers correctly would be easy to do wrong, and besides, I personally think it is ugly to have some objects be pointers to objects, and have most others be "regular objects".
Secondly, I could use a vector with only one element in much the same way:
std::vector<my_class> single_vector;
try
{
single_vector.push_back(my_class(string));
single_vector.shrink_to_fit();
}
catch(/*Whatever error I am interesetd in*/)
{
//error handling
}
I don't like the idea of having a lot of single-element vectors though.
Thirdly, I can create an empty faux constructor and use another loading function, i.e.
//in my_class.hpp
class my_class
{
...
public:
my_class() {}// Faux constructor which does nothing
void load(string path);//All the code in the constructor has been moved here
...
};
//anywhere my_class is needed
my_class my_object
try
{
my_object.load(path);
}
catch(/*Whatever error I am interesetd in*/)
{
//error handling
}
This works, but largely defeats the purpose of having a constructor, so I don't really like this either.
So my question is, which of these methods for constructing an object, which may throw errors in the constructor, is the best (or least bad)? and are there better ways of doing this?
Edit: Why don't you just use the object within the try-statement
Because the object may need to be created as the program is first started, and stopped much later. In the most extreme case (which I do actually need in this case also) that would essentially be:
int main()
{
try
{
//... things which might fail
//A few hundred lines of code
}
catch(/*whaveter*/)
{
}
}
I think this makes my code hard to read since the catch statement will be very far from where things actually went wrong.

One possibility is to wrap the construction and error handling in a function, returning the constructed object. Example :
#include <string>
class my_class {
public:
my_class(std::string path);
};
my_class make_my_object(std::string path)
{
try {
return {std::move(path)};
}
catch(...) {
// Handle however you want
}
}
int main()
{
auto my_object = make_my_object("this path doesn't exist");
}
But beware that the example is incomplete because it isn't clear what you intend to do when construction fails. The catch block has to either return something, throw or terminate.
If you could return a different instance, one with a "bad" or "default" state, you could have just initialized your instance to that state in my_class(std::string path) when it was determined the path is invalid. So in that case, the try/catch block is not needed.
If you rethrow the exception, then there is no point in catching it in the first place. In that case, the try/catch block is also not needed, unless you want to do a bit of extra work, like logging.
If you want to terminate, you can just let the exception go uncaught. Again, in that case, the try/catch block is not needed.
The real solution here is probably to not use a try/catch block at all, unless there is actually error handling you can do that shouldn't be implemented as part of my_class which isn't made apparent in the question (maybe a fallback path?).

and if I do that, the object is not declared outside the try statement
I have tried a number of ways of getting around it
That doesn't need to be a problem. There's not necessarily need to get around it. Simply use the object within the try statement.
If you really cannot have the try block around the entire lifetime, then this is a use case for std::optional:
std::optional<my_class> maybe_my_object;
try {
maybe_my_object.emplace(string);
} catch(...) {}
The problem is that the instance of this object doesn't always end up in the same object which created it, so deleting all pointers correctly would be easy to do wrong,
A pointer returned by new is correct to delete. In the error case, simply set the pointer to null and there would be no problem. That said, use a smart pointer instead for dynamic allocation, if you were to use this approach.
single_vector.push_back(my_class(string));
single_vector.shrink_to_fit();
Don't push and shrink when you know the number of objects that are going to be in the vector. Use reserve instead if you were to use this approach.

The object creation can fail because a resource is unavailable. It's not the creation which fails; it is a prerequisite which is not fulfilled.
Consequently, separate these two concerns: First obtain all resources and then, if that succeeded, create the object with these resources and use it. The object creation as such in this design cannot fail, the constructor is nothrow; it is trivial boilerplate code (copy data etc.). If, on the other hand, resource acquisition failed, object creation and object use are both skipped: Your problem with existing but unusable objects is gone.
Responding to your edit about try/catch comprising the entire program: Exceptions as error indicators are better suited for things which are done in many places at various times in a program because they guarantee error handling (by default through an abort) while separating it from the normal control flow. This is impossible to do with classic return value examination, which leaves us with a choice between unreadable or unreliable programs.
But if you have long-lived objects which are created only rarely (in your example: only at startup) you don't need exceptions. As you said, constructor exceptions guarantee that only properly initialized objects can be used. But if such an object is only created at startup this danger is low. You check for success one way or another and exit the program which cannot perform its purpose if the initial resource acquisition failed. This way the error is handled where it occurred. Even in less extreme cases (e.g. when an object is created at the beginning of a large function other than main) this may be the simpler solution.
In code, my suggestion looks like this:
struct T2;
struct myEx { myEx(const char *); };
void exit(int);
T1 *acquireResource1(); // e.g. read file
T2 *acquireResource2(); // e.g. connect to db
void log(const char *what);
class ObjT
{
public:
struct RsrcT
{
T1 *mT1;
T2 *mT2;
operator bool() { return mT1 && mT2; }
};
ObjT(const RsrcT& res) noexcept
{
// initialize from file data etc.
}
// more member functions using data from file and db
};
int main()
{
ObjT::RsrcT rsrc = { acquireResource1(), acquireResource2() };
if(!rsrc)
{
log("bummer");
exit(1);
}
///////////////////////////////////////////////////
// all resources are available. "Real" code starts here.
///////////////////////////////////////////////////
ObjT obj(rsrc);
// 1000 lines of code using obj
}

Why does PyCXX handle new-style classes in the way it does?

I'm picking apart some C++ Python wrapper code that allows the consumer to construct custom old style and new style Python classes from C++.
The original code comes from PyCXX, with old and new style classes here and here. I have however rewritten the code substantially, and in this question I will reference my own code, as it allows me to present the situation in the greatest clarity that I am able. I think there would be very few individuals capable of understanding the original code without several days of scrutiny... For me it has taken weeks and I'm still not clear on it.
The old style simply derives from PyObject,
template<typename FinalClass>
class ExtObj_old : public ExtObjBase<FinalClass>
// ^ which : ExtObjBase_noTemplate : PyObject
{
public:
// forwarding function to mitigate awkwardness retrieving static method
// from base type that is incomplete due to templating
static TypeObject& typeobject() { return ExtObjBase<FinalClass>::typeobject(); }
static void one_time_setup()
{
typeobject().set_tp_dealloc( [](PyObject* t) { delete (FinalClass*)(t); } );
typeobject().supportGetattr(); // every object must support getattr
FinalClass::setup();
typeobject().readyType();
}
// every object needs getattr implemented to support methods
Object getattr( const char* name ) override { return getattr_methods(name); }
// ^ MARKER1
protected:
explicit ExtObj_old()
{
PyObject_Init( this, typeobject().type_object() ); // MARKER2
}
When one_time_setup() is called, it forces (by accessing base class typeobject()) creation of the associated PyTypeObject for this new type.
Later when an instance is constructed, it uses PyObject_Init
So far so good.
But the new style class uses much more complicated machinery. I suspect this is related to the fact that new style classes allow derivation.
And this is my question, why is the new style class handling implemented in the way that it is? Why is it having to create this extra PythonClassInstance structure? Why can't it do things the same way the old-style class handling does? i.e. Just type convert from the PyObject base type? And seeing as it doesn't do that, does this mean it is making no use of its PyObject base type?
This is a huge question, and I will keep amending the post until I'm satisfied it represents the issue well. It isn't a good fit for SO's format, I'm sorry about that. However, some world-class engineers frequent this site (one of my previous questions was answered by the lead developer of GCC for example), and I value the opportunity to appeal to their expertise. So please don't be too hasty to vote to close.
The new style class's one-time setup looks like this:
template<typename FinalClass>
class ExtObj_new : public ExtObjBase<FinalClass>
{
private:
PythonClassInstance* m_class_instance;
public:
static void one_time_setup()
{
TypeObject& typeobject{ ExtObjBase<FinalClass>::typeobject() };
// these three functions are listed below
typeobject.set_tp_new( extension_object_new );
typeobject.set_tp_init( extension_object_init );
typeobject.set_tp_dealloc( extension_object_deallocator );
// this should be named supportInheritance, or supportUseAsBaseType
// old style class does not allow this
typeobject.supportClass(); // does: table->tp_flags |= Py_TPFLAGS_BASETYPE
typeobject.supportGetattro(); // always support get and set attr
typeobject.supportSetattro();
FinalClass::setup();
// add our methods to the extension type's method table
{ ... typeobject.set_methods( /* ... */); }
typeobject.readyType();
}
protected:
explicit ExtObj_new( PythonClassInstance* self, Object& args, Object& kwds )
: m_class_instance{self}
{ }
So the new style uses a custom PythonClassInstance structure:
struct PythonClassInstance
{
PyObject_HEAD
ExtObjBase_noTemplate* m_pycxx_object;
}
PyObject_HEAD, if I dig into Python's object.h, is just a macro for PyObject ob_base; -- no further complications, like #if #else. So I don't see why it can't simply be:
struct PythonClassInstance
{
PyObject ob_base;
ExtObjBase_noTemplate* m_pycxx_object;
}
or even:
struct PythonClassInstance : PyObject
{
ExtObjBase_noTemplate* m_pycxx_object;
}
Anyway, it seems that its purpose is to tag a pointer onto the end of a PyObject. This will be because Python runtime will often trigger functions we have placed in its function table, and the first parameter will be the PyObject responsible for the call. So this allows us to retrieve the associated C++ object.
But we also need to do that for the old-style class.
Here is the function responsible for doing that:
ExtObjBase_noTemplate* getExtObjBase( PyObject* pyob )
{
if( pyob->ob_type->tp_flags & Py_TPFLAGS_BASETYPE )
{
/*
New style class uses a PythonClassInstance to tag on an additional
pointer onto the end of the PyObject
The old style class just seems to typecast the pointer back up
to ExtObjBase_noTemplate
ExtObjBase_noTemplate does indeed derive from PyObject
So it should be possible to perform this typecast
Which begs the question, why on earth does the new style class feel
the need to do something different?
This looks like a really nice way to solve the problem
*/
PythonClassInstance* instance = reinterpret_cast<PythonClassInstance*>(pyob);
return instance->m_pycxx_object;
}
else
return static_cast<ExtObjBase_noTemplate*>( pyob );
}
My comment articulates my confusion.
And here, for completeness is us inserting a lambda-trampoline into the PyTypeObject's function pointer table, so that Python runtime can trigger it:
table->tp_setattro = [] (PyObject* self, PyObject* name, PyObject* val) -> int
{
try {
ExtObjBase_noTemplate* p = getExtObjBase( self );
return ( p -> setattro(Object{name}, Object{val}) );
}
catch( Py::Exception& ) { /* indicate error */
return -1;
}
};
(In this demonstration I'm using tp_setattro, note that there are about 30 other slots, which you can see if you look at the doc for PyTypeObject)
(in fact the major reason for working this way is that we can try{}catch{} around every trampoline. This saves the consumer from having to code repetitive error trapping.)
So, we pull out the "base type for the associated C++ object" and call its virtual setattro (just using setattro as an example here). A derived class will have overridden setattro, and this override will get called.
The old-style class provides such an override, which I've labelled MARKER1 -- it is in the top listing for this question.
The only the thing I can think of is that maybe different maintainers have used different techniques. But is there some more compelling reason why old and new style classes require different architecture?
PS for reference, I should include the following methods from new style class:
static PyObject* extension_object_new( PyTypeObject* subtype, PyObject* args, PyObject* kwds )
{
PyObject* pyob = subtype->tp_alloc(subtype,0);
PythonClassInstance* o = reinterpret_cast<PythonClassInstance *>( pyob );
o->m_pycxx_object = nullptr;
return pyob;
}
^ to me, this looks absolutely wrong.
It appears to be allocating memory, re-casting to some structure that might exceed the amount allocated, and then nulling right at the end of this.
I'm surprised it hasn't caused any crashes.
I can't see any indication anywhere in the source code that these 4 bytes are owned.
static int extension_object_init( PyObject* _self, PyObject* _args, PyObject* _kwds )
{
try
{
Object args{_args};
Object kwds{_kwds};
PythonClassInstance* self{ reinterpret_cast<PythonClassInstance*>(_self) };
if( self->m_pycxx_object )
self->m_pycxx_object->reinit( args, kwds );
else
// NOTE: observe this is where we invoke the constructor, but indirectly (i.e. through final)
self->m_pycxx_object = new FinalClass{ self, args, kwds };
}
catch( Exception & )
{
return -1;
}
return 0;
}
^ note that there is no implementation for reinit, other than the default
virtual void reinit ( Object& args , Object& kwds ) {
throw RuntimeError( "Must not call __init__ twice on this class" );
}
static void extension_object_deallocator( PyObject* _self )
{
PythonClassInstance* self{ reinterpret_cast< PythonClassInstance* >(_self) };
delete self->m_pycxx_object;
_self->ob_type->tp_free( _self );
}
EDIT: I will hazard a guess, thanks to insight from Yhg1s on the IRC channel.
Maybe it is because when you create a new old-style class, it is guaranteed it will overlap perfectly a PyObject structure.
Hence it is safe to derive from PyObject, and pass a pointer to the underlying PyObject into Python, which is what the old-style class does (MARKER2)
On the other hand, new style class creates a {PyObject + maybe something else} object.
i.e. It wouldn't be safe to do the same trick, as Python runtime would end up writing past the end of the base class allocation (which is only a PyObject).
Because of this, we need to get Python to allocate for the class, and return us a pointer which we store.
Because we are now no longer making use of the PyObject base-class for this storage, we cannot use the convenient trick of typecasting back to retrieve the associated C++ object.
Which means that we need to tag on an extra sizeof(void*) bytes to the end of the PyObject that actually does get allocated, and use this to point to our associated C++ object instance.
However, there is some contradiction here.
struct PythonClassInstance
{
PyObject_HEAD
ExtObjBase_noTemplate* m_pycxx_object;
}
^ if this is indeed the structure that accomplishes the above, then it is saying that the new style class instance is indeed fitting exactly over a PyObject, i.e. It is not overlapping into the m_pycxx_object.
And if this is the case, then surely this whole process is unnecessary.
EDIT: here are some links that are helping me learn the necessary ground work:
http://eli.thegreenplace.net/2012/04/16/python-object-creation-sequence
http://realmike.org/blog/2010/07/18/introduction-to-new-style-classes-in-python
Create an object using Python's C API

to me, this looks absolutely wrong. It appears to be allocating memory, re-casting to some structure that might exceed the amount allocated, and then nulling right at the end of this. I'm surprised it hasn't caused any crashes. I can't see any indication anywhere in the source code that these 4 bytes are owned
PyCXX does allocate enough memory, but it does so by accident. This appears to be a bug in PyCXX.
The amount of memory Python allocates for the object is determined by the first call to the following static member function of PythonClass<T>:
static PythonType &behaviors()
{
...
p = new PythonType( sizeof( T ), 0, default_name );
...
}
The constructor of PythonType sets the tp_basicsize of the python type object to sizeof(T). This way when Python allocates an object it knows to allocate at least sizeof(T) bytes. It works because sizeof(T) turns out to be larger that sizeof(PythonClassInstance) (T is derived from PythonClass<T> which derives from PythonExtensionBase, which is large enough).
However, it misses the point. It should actually allocate only sizeof(PythonClassInstance) . This appears to be a bug in PyCXX - that it allocates too much, rather than too little space for storing a PythonClassInstance object.
And this is my question, why is the new style class handling implemented in the way that it is? Why is it having to create this extra PythonClassInstance structure? Why can't it do things the same way the old-style class handling does?
Here's my theory why new style classes are different from the old style classes in PyCXX.
Before Python 2.2, where new style classes were introduced, there was no tp_init member int the type object. Instead, you needed to write a factory function that would construct the object. This is how PythonExtension<T> is supposed to work - the factory function converts the Python arguments to C++ arguments, asks Python to allocate the memory and then calls the constructor using placement new.
Python 2.2 added the new style classes and the tp_init member. Python first creates the object and then calls the tp_init method. Keeping the old way would have required that the objects would first have a dummy constructor that creates an "empty" object (e.g. initializes all members to null) and then when tp_init is called, would have had an additional initialization stage. This makes the code uglier.
It seems that the author of PyCXX wanted to avoid that. PyCXX works by first creating a dummy PythonClassInstance object and then when tp_init is called, creates the actual PythonClass<T> object using its constructor.
... does this mean it is making no use of its PyObject base type?
This appears to be correct, the PyObject base class does not seem to be used anywhere. All the interesting methods of PythonExtensionBase use the virtual self() method, which returns m_class_instance and completely ignore the PyObject base class.
I guess (only a guess, though) is that PythonClass<T> was added to an existing system and it seemed easier to just derive from PythonExtensionBase instead of cleaning up the code.

Python C-API Object Allocation

I want to use the new and delete operators for creating and destroying my objects.
The problem is python seems to break it into several stages. tp_new, tp_init and tp_alloc for creation and tp_del, tp_free and tp_dealloc for destruction. However c++ just has new which allocates and fully constructs the object and delete which destructs and deallocates the object.
Which of the python tp_* methods do I need to provide and what must they do?
Also I want to be able to create the object directly in c++ eg "PyObject *obj = new MyExtensionObject(args);" Will I also need to overload the new operator in some way to support this?
I also would like to be able to subclass my extension types in python, is there anything special I need to do to support this?
I'm using python 3.0.1.
EDIT:
ok, tp_init seems to make objects a bit too mutable for what I'm doing (eg take a Texture object, changing the contents after creation is fine, but change fundamental aspects of it such as, size, bitdept, etc will break lots of existing c++ stuff that assumes those sort of things are fixed). If I dont implement it will it simply stop people calling __init__ AFTER its constructed (or at least ignore the call, like tuple does). Or should I have some flag that throws an exception or somthing if tp_init is called more than once on the same object?
Apart from that I think ive got most of the rest sorted.
extern "C"
{
//creation + destruction
PyObject* global_alloc(PyTypeObject *type, Py_ssize_t items)
{
return (PyObject*)new char[type->tp_basicsize + items*type->tp_itemsize];
}
void global_free(void *mem)
{
delete[] (char*)mem;
}
}
template<class T> class ExtensionType
{
PyTypeObject *t;
ExtensionType()
{
t = new PyTypeObject();//not sure on this one, what is the "correct" way to create an empty type object
memset((void*)t, 0, sizeof(PyTypeObject));
static PyVarObject init = {PyObject_HEAD_INIT, 0};
*((PyObject*)t) = init;
t->tp_basicsize = sizeof(T);
t->tp_itemsize = 0;
t->tp_name = "unknown";
t->tp_alloc = (allocfunc) global_alloc;
t->tp_free = (freefunc) global_free;
t->tp_new = (newfunc) T::obj_new;
t->tp_dealloc = (destructor)T::obj_dealloc;
...
}
...bunch of methods for changing stuff...
PyObject *Finalise()
{
...
}
};
template <class T> PyObjectExtension : public PyObject
{
...
extern "C" static PyObject* obj_new(PyTypeObject *subtype, PyObject *args, PyObject *kwds)
{
void *mem = (void*)subtype->tp_alloc(subtype, 0);
return (PyObject*)new(mem) T(args, kwds)
}
extern "C" static void obj_dealloc(PyObject *obj)
{
~T();
obj->ob_type->tp_free(obj);//most of the time this is global_free(obj)
}
...
};
class MyObject : PyObjectExtension<MyObject>
{
public:
static PyObject* InitType()
{
ExtensionType<MyObject> extType();
...sets other stuff...
return extType.Finalise();
}
...
};

The documentation for these is at http://docs.python.org/3.0/c-api/typeobj.html and
http://docs.python.org/3.0/extending/newtypes.html describes how to make your own type.
tp_alloc does the low-level memory allocation for the instance. This is equivalent to malloc(), plus initialize the refcnt to 1. Python has it's own allocator, PyType_GenericAlloc, but a type can implement a specialized allocator.
tp_new is the same as Python's __new__. It's usually used for immutable objects where the data is stored in the instance itself, as compared to a pointer to data. For example, strings and tuples store their data in the instance, instead of using a char * or a PyTuple *.
For this case, tp_new figures out how much memory is needed, based on the input parameters, and calls tp_alloc to get the memory, then initializes the essential fields. tp_new does not need to call tp_alloc. It can for example return a cached object.
tp_init is the same as Python's __init__. Most of your initialization should be in this function.
The distinction between __new__ and __init__ is called two-stage initialization, or two-phase initialization.
You say "c++ just has new" but that's not correct. tp_alloc corresponds a custom arena allocator in C++, __new__ corresponds to a custom type allocator (a factory function), and __init__ is more like the constructor. That last link discusses more about the parallels between C++ and Python style.
Also read http://www.python.org/download/releases/2.2/descrintro/ for details about how __new__ and __init__ interact.
You write that you want to "create the object directly in c++". That's rather difficult because at the least you'll have to convert any Python exceptions that occurred during object instantiation into a C++ exception. You might try looking at Boost::Python for some help with this task. Or you can use a two-phase initialization. ;)

I don't know the python APIs at all, but if python splits up allocation and initialization, you should be able to use placement new.
e.g.:
// tp_alloc
void *buffer = new char[sizeof(MyExtensionObject)];
// tp_init or tp_new (not sure what the distinction is there)
new (buffer) MyExtensionObject(args);
return static_cast<MyExtensionObject*>(buffer);
...
// tp_del
myExtensionObject->~MyExtensionObject(); // call dtor
// tp_dealloc (or tp_free? again I don't know the python apis)
delete [] (static_cast<char*>(static_cast<void*>(myExtensionObject)));

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js