How to enumerate types / store type info across modules in C++ - c++

I'm working on a game engine incorporating an ECS. I have a pretty straightforward setup - my engine lives in a DLL and my game module is the .EXE which loads the DLL. I have built an ECS using dense memory block allocations and I'm focusing quite a lot on performance. At the very basis of the ECS I use a type info structure to enumerate the different entity component types. I basically use pointers to templated functions as a way to enumerate types: each instantiation of the function template has a distinct address, which serves as the id. It is defined as follows:
struct ComponentTypeInfo
{
    typedef void (*_InternalId)();

    _InternalId TypeId;
    size_t TypeSize;

protected:
    ComponentTypeInfo(_InternalId id, size_t size) : TypeId(id), TypeSize(size) {}

    // One empty instantiation per component type; its unique address is the id.
    template<typename>
    static void TypeIdFunc() {}
};

template<typename TComponent>
struct GetComponentTypeInfo : public ComponentTypeInfo
{
    GetComponentTypeInfo() : ComponentTypeInfo(&TypeIdFunc<TComponent>, sizeof(TComponent)) {}
};
The problem: In practice this works fine, but it breaks when types are compared across modules, because each module gets its own instantiation of the function. This causes the function pointers to differ, and therefore the ids to differ. I have tried a bunch of things, but they all run into the same issue at module boundaries.
What I have thought of so far:
Making sure only one module uses type info - Practically impossible, since the engine has its own predefined component types that its systems use.
Using a counter/increment-style id - Runs into the same problem: any way of attaching ids to types fails because templated functionality is instantiated separately in each module, so the ids never match (see the sketch after this list).
Using inheritance - Since the ECS uses densely packed memory blocks, type erasure happens very early on, and the logic working with memory blocks and stacks doesn't have compile-time information about the stored types. Just the type ids.
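A sketch of the counter approach I mean (simplified, not my exact code): the counter and the per-type statics get instantiated once per module, so the same component type can receive different ids in the .EXE and the DLL.
#include <cstddef>

inline std::size_t NextComponentTypeId()
{
    // Every module that inlines this gets its own copy of the counter.
    static std::size_t counter = 0;
    return counter++;
}

template<typename TComponent>
std::size_t GetComponentTypeId()
{
    // Also instantiated per module: the same TComponent can end up
    // with different ids on either side of the DLL boundary.
    static const std::size_t id = NextComponentTypeId();
    return id;
}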
I want to stay away from STL unless I absolutely need to use it. However I might end up using a wrapper around it if nothing else works. Libraries (boost, etc) are not a consideration.
Is there any sensible way to enumerate types across modules? Is RTTI only made to be available through the STL?

Related

Is it safe to use strings as private data members in a class used across a DLL boundary?

My understanding is that exposing functions that take or return STL containers (such as std::string) across DLL boundaries can cause problems due to differences in STL implementations of those containers in the 2 binaries. But is it safe to export a class like:
class Customer
{
public:
wchar_t * getName() const;
private:
std::wstring mName;
};
Without some sort of hack, mName is not going to be usable by the executable, so it won't be able to execute methods on mName, nor construct/destruct this object.
My gut feeling is "don't do this, it's unsafe", but I can't figure out a good reason.
It is not a problem, because it is trumped by a bigger problem: you cannot create an object of that class in code that lives in a module other than the one that contains the code for the class. Code in another module cannot accurately know the required object size; its implementation of the std::string class may well be different, which, as declared, also affects the size of the Customer object. Even the same compiler cannot guarantee this, for example when mixing optimized and debugging builds of these modules. Albeit that this is usually pretty easy to avoid.
So you must create a class factory for Customer objects, a factory that lives in that same module. Which then automatically implies that any code that touches the "mName" member also lives in the same module. And is therefore safe.
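Such a factory might look like this (a sketch with hypothetical names; the point is only that both functions live in the module that implements Customer, so construction, destruction and every touch of mName happen on one side):
// Exported from the same module that implements Customer.
extern "C" __declspec(dllexport) Customer* CreateCustomer(const wchar_t* name);
extern "C" __declspec(dllexport) void DestroyCustomer(Customer* customer);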
Next step then is to not expose Customer at all but to expose a pure abstract base class (aka interface). Now you can prevent the client code from creating an instance of Customer and shooting their leg off. And you'll trivially hide the std::string as well. Interface-based programming techniques are common in module interop scenarios; it is also the approach taken by COM.
As long as instances of the class are allocated and deallocated by code built with the same settings, you should be OK, but you are right to avoid this.
Differences between the .exe and .dll in debug/release configuration or code generation (Multi-threaded DLL vs. Single threaded) could cause problems in some scenarios.
I would recommend using abstract classes in the DLL interface with creation and deletion done solely inside the DLL.
Interfaces like:
class A {
protected:
virtual ~A() {}
public:
virtual void func() = 0;
};
//exported create/delete functions
A* create_A();
void destroy_A(A*);
DLL Implementation like:
class A_Impl : public A{
public:
~A_Impl() {}
void func() { do_something(); }
};
A* create_A() { return new A_Impl; }
void destroy_A(A* a) {
A_Impl* ai=static_cast<A_Impl*>(a);
delete ai;
}
Should be ok.
Even if your class has no data members, you cannot expect it to be usable from code compiled with a different compiler. There is no common ABI for C++ classes. You can expect differences in name mangling just for starters.
If you are prepared to constrain clients to use the same compiler as you, or provide source to allow clients to compile your code with their compiler, then you can do pretty much anything across your interface. Otherwise you should stick to C style interfaces.
If you want to provide an object oriented interface in a DLL that is truly safe, I would suggest building it on top of the COM object model. That's what it was designed for.
Any other attempt to share classes between code that is compiled by different compilers has the potential to fail. You may be able to get something that seems to work most of the time, but it can't be guaranteed to work.
The chances are that at some point you're going to be relying on undefined behaviour in terms of calling conventions or class structure or memory allocation.
The C++ standard does not say anything about the ABI provided by implementations. Even on a single platform changing the compiler options may change binary layout or function interfaces.
Thus to ensure that standard types can be used across DLL boundaries it is your responsibility to ensure that either:
Resource acquisition/release for standard types is done by the same DLL. (Note: you can have multiple CRTs in a process, but a resource acquired by crt1.dll must be released by crt1.dll.)
This is not specific to C++. In C for example malloc/free, fopen/fclose call pairs must each go to a single C runtime.
This can be done by either of the below:
By explicitly exporting acquisition/release functions (Photon's answer). In this case you are forced to use a factory pattern and abstract types. Basically COM or a COM clone.
Forcing a group of DLLs to link against the same dynamic CRT. In this case you can safely export any kind of functions/classes.
There are also two "potential bugs" (among others) you must take care of, since they relate to what is "under" the language.
The first is that std::string is a template, and hence it is instantiated in every translation unit. If they are all linked into the same module (exe or dll), the linker will resolve identical functions to the same code, and inconsistent code (the same function with a different body) is treated as an error.
But if they are linked into different modules (an exe and a dll) there is nothing (compiler- or linker-wise) in common. So - depending on how the modules were compiled - you may have different implementations of the same class with different members and memory layouts (for example, one may have debugging or profiling features added that the other has not). Accessing an object created on one side with methods compiled on the other side, if you have no other way to guarantee implementation consistency, may end in tears.
The second problem (more subtle) relates to the allocation/deallocation of memory: because of the way Windows works, every module can have a distinct heap. But standard C++ does not specify how new and delete keep track of which heap an object comes from. And if the string buffer is allocated in one module, then handed to a string instance in another module, you risk (upon destruction) giving the memory back to the wrong heap (it depends on how new/delete and malloc/free are implemented with respect to HeapAlloc/HeapFree: this merely relates to the level of "awareness" the STL implementation has of the underlying OS. The operation is not itself destructive - it just fails - but it leaks from the origin's heap).
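The hazard in miniature (a sketch, assuming the exe and dll each link their own static CRT; make_name is a made-up example):
// dll.cpp - the string's buffer comes from the DLL's CRT heap.
#include <string>
__declspec(dllexport) std::string make_name()
{
    return std::string("long enough to force a heap allocation");
}

// exe.cpp
void use()
{
    std::string n = make_name(); // buffer allocated by the DLL's CRT
} // n is destroyed here: the EXE's CRT frees memory it never allocated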
All that said, it is not impossible to pass a container. It is just up to you to guarantee a consistent implementation on both sides, since the compiler and linker have no way to cross-check.

Implications of using std::vector in a dll exported function

I have two dll-exported classes A and B. A's declaration contains a function which uses a std::vector in its signature like:
class EXPORT A{
// ...
std::vector<B> myFunction(std::vector<B> const &input);
};
(EXPORT is the usual macro that expands to __declspec(dllexport)/__declspec(dllimport) accordingly.)
Reading about the issues related to using STL classes in a DLL interface, I gather in summary:
Using std::vector in a DLL interface would require all the clients of that DLL to be compiled with the same version of the same compiler, because STL containers are not binary compatible. Even worse, depending on the use of that DLL by clients in conjunction with other DLLs, the 'unstable' DLL API can break these client applications when system updates are installed (e.g. Microsoft KB packages) (really?).
Despite the above, if required, std::vector can be used in a DLL API by exporting std::vector<B> like:
template class EXPORT std::allocator<B>;
template class EXPORT std::vector<B>;
though, this is usually mentioned in the context when one wants to use std::vector as a member of A (http://support.microsoft.com/kb/168958).
The following Microsoft Support Article discusses how to access std::vector objects created in a DLL through a pointer or reference from within the executable (http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q172396). The above solution to use template class EXPORT ... seems to be applicable too. However, the drawback summarized under the first bullet point seems to remain.
To completely get rid of the problem, one would need to wrap std::vector and change the signature of myFunction, use PIMPL, etc.
My questions are:
Is the above summary correct, or do I miss here something essential?
Why does compilation of my class 'A' not generate warning C4251 (class 'std::vector<_Ty>' needs to have dll-interface to be used by clients of...)? I don't have any compiler warnings turned off, and I don't get any warning on using std::vector in myFunction in exported class A (with VS2005).
What needs to be done to correctly export myFunction in A? Is it viable to just export std::vector<B> and B's allocator?
What are the implications of returning std::vector by value? Assuming a client executable which has been compiled with a different compiler(-version): does trouble persist when returning by value, where the vector is copied? I guess yes. Similarly for passing std::vector as a constant reference: could access to std::vector<B> (which might have been constructed by an executable compiled with a different compiler(-version)) lead to trouble within myFunction? I guess yes again.
Is the last bullet point listed above really the only clean solution?
Many thanks in advance for your feedback.
Unfortunately, your list is very much spot-on. The root cause of this is that DLL-to-DLL or DLL-to-EXE interaction is defined at the level of the operating system, while the interface between functions is defined at the level of the compiler. In a way, your task is similar (although somewhat easier) to that of client-server interaction, where the client and the server lack binary compatibility.
The compiler maps what it can to the way the DLL importing and exporting is done in a particular operating system. Since language specifications give compilers a lot of liberty when it comes to binary layout of user-defined types and sometimes even built-in types (recall that the exact size of int is compiler-dependent, as long as minimal sizing requirements are met), importing and exporting from DLLs needs to be done manually to achieve binary-level compatibility.
When you use the same version of the same compiler, this last issue above does not create a problem. However, as soon as a different compiler enters the picture, all bets are off: you need to go back to the plainly-typed interfaces, and introduce wrappers to maintain nice-looking interfaces inside your code.
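For illustration, a minimal sketch of such a plainly-typed boundary plus a client-side wrapper (the names and the size-query convention are assumptions, and B must be trivially copyable for this to be safe):
// Exported C-style boundary: the caller owns the buffer, so no
// std::vector and no allocation crosses the DLL boundary.
// Calling with a null buffer returns the required element count.
extern "C" EXPORT int get_items(B* buffer, int capacity);

// Client-side wrapper restoring a convenient C++ interface.
std::vector<B> get_items_wrapped()
{
    int count = get_items(nullptr, 0);   // query the required size
    std::vector<B> result(count);
    if (count > 0)
        get_items(result.data(), count); // fill caller-owned storage
    return result;
}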
I've been having the same problem and discovered a neat solution to it.
Instead of passing std::vector, you can pass a QVector from the Qt library.
The problems you quote are then handled inside the Qt library and you do not need to deal with it at all.
Of course, the cost is having to use the library and accept its slightly worse performance.
In terms of the amount of coding and debugging time it saves you, this solution is well worth it.

Game entity type allocation

I am migrating a tile based 2D game to C++, because I am really not a fan of Java (some features are nice, but I just can't get used to it). I am using TMX tiled maps. This question is concerning how to translate the object definitions into actual game entities. In Java, I used reflection to allocate an object of the specified type (given that it derived from the basic game entity).
This worked fine, but this feature is not available in C++ (I understand why, and I'm not complaining. I find reflection messy, and I was hesitant to use it in Java, haha). I was just wondering what the best way is to translate this data. My idea was a base class from which all entities could derive (this seems pretty standard), then have the loader allocate the derived types based on the 'type' value from the TMX map. I have thought of two ways to do this.
A giant switch-case block. Lengthy and disgusting. I'm doubtful that anyone would suggest this (but it is the obvious).
Use a std::map, which would map arbitrary type names to a function to allocate said classes corresponding to said type names.
Lastly, I had thought of making entities of one base class, and using scripting for different entity types. The scripts themselves would register their entity type with the system, although the game would need to load said entity type scripts upon loading (this could be done via one main entity type declaration script, which would bring the number of edits per entity down to 2: entity creation, and entity registration).
While option two looks pretty good, I don't like having to change 3 pieces of code for each type (defining the entity class, defining an allocate function, and adding the function to the std::map). Option 3 sounds great except for two things in my mind: I'm afraid of the speed of purely script-driven entities, and I know that adding scripting to my engine is going to be a big project in itself (adding all the helper functions for interfacing with the library will be interesting).
Does anyone know of a better solution? Maybe not better, but just cleaner. With less code edits per entity type.
You can reduce the number of code changes in solution 2 if you use self-registration with a factory. The drawback is that the entities know about this factory (self-registration) and the factory has to be a global (e.g. a singleton) instance. If this is no problem to you, this pattern can be very nice. Each new type requires only compilation and linking of one new file.
You can implement self registration like this:
// foo.cpp
namespace
{
    // FooCreator is a free function that returns a new Foo through the
    // common base interface. The dummy bool exists only to force the
    // Register() call to run during static initialization, when the
    // file is linked into the program.
    bool dummy = FactoryInstance().Register("FooKey", FooCreator);
}
Abstract Factory, Template Style, by Jim Hyslop and Herb Sutter
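For completeness, a minimal sketch of what such a factory could look like (the names Entity, EntityFactory and FactoryInstance are illustrative, matching the snippet above rather than any published code):
#include <map>
#include <string>

class Entity
{
public:
    virtual ~Entity() {}
};

class EntityFactory
{
public:
    typedef Entity* (*Creator)();

    // Called by the self-registration dummies at static init time.
    bool Register(const std::string& key, Creator creator)
    {
        creators_[key] = creator;
        return true;
    }

    // Returns a new entity for the given type name, or null if unknown.
    Entity* Create(const std::string& key) const
    {
        std::map<std::string, Creator>::const_iterator it = creators_.find(key);
        return it == creators_.end() ? 0 : it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// Meyers singleton used by the registration snippet above.
EntityFactory& FactoryInstance()
{
    static EntityFactory instance;
    return instance;
}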

How to implement an adapter framework in C++ that works in both Linux and Windows

Here is what I am trying to do:
I am developing a cross-platform IDE (Linux and Windows) that supports plug-ins. I need to support extensibility using an adapter framework similar to the one that Eclipse provides. See here for more details, but basically I need the following:
Let Adaptee and Adapted be completely unrelated classes which already exist and which we are not allowed to change in any way. I want to create an AdapterManager class which has a method
template <class Adaptee, class Adapted> Adapted* adapt( Adaptee* object);
which will create an instance of Adapted given an instance of Adaptee. How exactly the instance is created depends on an adapter function which will have to be registered with AdapterManager. Each new plug-in should be able to contribute adapter functions for arbitrary types.
Here are my thoughts about a possible solution and why it does not work:
C++11's RTTI functions and the type_info class provide a hash_code() method which returns a unique integer for each type in the program. See here. Thus AdapterManager could simply contain a map that given the hash codes for the Adaptee and Adapter classes returns a function pointer to the adapter function. This makes the implementation of the adapt() function above trivial:
template <class Adaptee, class Adapted>
Adapted* AdapterManager::adapt(Adaptee* object)
{
    AdapterMapKey mk(typeid(Adapted).hash_code(), typeid(Adaptee).hash_code());
    AdapterFunction af = adapterMap.get(mk);
    if (!af) return nullptr;
    return static_cast<Adapted*>(af(object));
}
Any plug-in can easily extend the framework by simply inserting an additional function into the map. Also note that any plug-in can try to adapt any class to any other class and succeed if there exists a corresponding adapter function registered with AdapterManager regardless of who registered it.
A problem with this is the combination of templates and plug-ins (shared objects / DLLs). Since two plug-ins can instantiate a template class with the same parameters, this can lead to two separate instances of the corresponding type_info structures and potentially different hash_code() results, which would break the mechanism above. Adapter functions registered from one plug-in might not always work in another plug-in.
In Linux, the dynamic linker seems to be able to deal with multiple declarations of types in different shared libraries under some conditions according to this (point 4.2). However the real problem is in Windows, where it seems that each DLL will get its own version of a template instantiation regardless of whether it is also defined in other loaded DLLs or the main executable. The dynamic linker seems quite inflexible compared to the one used in Linux.
I have considered using explicit template instantiations, which seems to reduce the problem but still does not solve it, as two different plug-ins might still instantiate the same template in the same way.
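For reference, this is what I mean by explicit instantiation (a sketch with a hypothetical template Foo; the extern template form is C++11):
// In exactly one .cpp of the core module: force the instantiation here.
template class Foo<int>;

// In the header seen by plug-ins: suppress their local instantiations.
extern template class Foo<int>;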
Questions:
Does anyone know of a way to achieve this in Windows? If you were allowed to modify existing classes, would this help?
Do you know of another approach to achieve this functionality in C++, while still preserving all the desired properties: no change to existing classes, works with templates, supports plug-ins and is cross-platform?
Update 1:
This project uses the Qt framework for many things including the plug-in infrastructure. Qt really helps with cross platform development. If you know of a Qt specific solution to the problem, that's also welcome.
Update 2:
n.m.'s comment made me realize that I only know about the problem in theory and have not actually tested it. So I did some testing in both Windows and Linux using the following definition:
template <class T>
class TypeIdTest {
public:
virtual ~TypeIdTest() {}
static int data;
};
template <class T> int TypeIdTest<T>::data;
This class is instantiated in two different shared libraries/DLLs with T=int. Both libraries are explicitly loaded at run-time. Here is what I found:
In Linux everything just works:
The two instantiations used the same vtable.
The object returned by typeid was at the same address.
Even the static data member was the same.
So the fact that the template was instantiated in multiple dynamically loaded shared libraries made absolutely no difference. The linker seems to simply use the first loaded instantiation and ignore the rest.
In Windows the two instantiations are 'somewhat' distinct:
The typeid for the different instances returns type_info objects at different addresses. These objects however are equal when tested with ==. The corresponding hash codes are also equal. It seems like on Windows equality between types is established using the type's name - which makes sense. So far so good.
However the vtables for the two instances were different. I'm not sure how much of a problem this is. In my tests I was able to use dynamic_cast to downcast an instance of TypeIdTest to a derived type across shared library boundaries.
What's also a problem is that each instantiation used its own copy of the static field data. That can cause a lot of problems and basically disallows static fields in template classes.
Overall, it seems that even in Windows things are not as bad as I thought, but I'm still reluctant to use this approach given that template instantiations still use distinct vtables and static storage. Does anyone know how to avoid this problem? I did not find any solution.
I think Boost Extension deals with exactly this problem domain:
http://boost-extension.redshoelace.com/docs/boost/extension/index.html
(in preparation for this library's submission to Boost for review)
In particular you'd be interested in what the author wrote in the blog post "Resource Management Across DLL Boundaries":
RTTI does not always function as expected across DLL boundaries. Check out the type_info classes to see how I deal with that.
I'm not sure whether his solution is actually robust, but he has sure given this thought before. In fact, there are some samples using Boost Extension that you can give a go; you might want to use it.

Module and classes handling (dynamic linking)

Run into a bit of an issue, and I'm looking for the best solution concept/theory.
I have a system that needs to use objects. Each object that the system uses has a known interface, likely implemented as an abstract class. The interfaces are known at build time, and will not change. The exact implementation to be used will vary and I have no idea ahead of time what module will be providing it. The only guarantee is that they will provide the interface. The class name and module (DLL) come from a config file or may be changed programmatically.
Now, I have all that set up at the moment using a relatively simple system, set up something like so (rewritten pseudo-code, just to show the basics):
class Module; // forward declaration, since ClassID refers to it

struct ClassID
{
    Module * module;
    int number;
};

class Module
{
    HMODULE module;
    function<IObject * (int)> createfunc; // factory exported by the DLL

public:
    static Module * Load(String filename);

    IObject * CreateClass(int number)
    {
        return createfunc(number);
    }
};

class ModuleManager
{
public:
    bool LoadModule(String filename);

    IObject * CreateClass(String classname)
    {
        ClassID id = AvailableClasses.find(classname);
        return id.module->CreateClass(id.number);
    }

private:
    vector<Module*> LoadedModules;
    map<String, ClassID> AvailableClasses;
};
Modules have a few exported functions to give the number of classes they provide and the names/IDs of those, which are then stored. All classes derive from IObject, which has a virtual destructor, stores the source module and has some methods to get the class' ID, what interface it implements and such.
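The exported enumeration functions might look roughly like this (a sketch with assumed names and signatures, not the author's exact exports):
// Exported by each module so the ModuleManager can build its
// name -> ClassID table when the module is loaded.
extern "C" __declspec(dllexport) int GetClassCount();
extern "C" __declspec(dllexport) const char* GetClassNameAt(int index);
extern "C" __declspec(dllexport) IObject* CreateObject(int number);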
The only issue with this is each module has to be manually loaded somewhere (listed in the config file, at the moment). I would like to avoid doing this explicitly (outside of the ModuleManager, inside that I'm not really concerned as to how it's implemented).
I would like to have a similar system without having to handle loading the modules, just create an object and (once it's all set up) it magically appears.
I believe this is similar to what COM is intended to do, in some ways. I looked into the COM system briefly, but it appears to be overkill beyond belief. I only need the classes known within my system and don't need all the other features it handles, just implementations of interfaces coming from somewhere.
My other idea is to use the registry and keep a key with all the known/registered classes and their source modules and numbers, so I can just look them up and it will appear that Manager::CreateClass finds and makes the object magically. This seems like a viable solution, but I'm not sure if it's optimal or if I'm reinventing something.
So, after all that, my question is: How to handle this? Is there an existing technology, if not, how best to set it up myself? Are there any gotchas that I should be looking out for?
COM very likely is what you want. It is very broad but you don't need to use all the functionality. For example, you don't need to require participants to register GUIDs, you can define your own mechanism for creating instances of interfaces. There are a number of templates and other mechanisms to make it easy to create COM interfaces. What's more, since it is a standard, it is easy to document the requirements.
One very important thing to bear in mind is that importing/exporting C++ objects requires all participants to be using the same compiler. If you think that could ever be a problem for you, then you should use COM. If you are happy to accept that restriction, then you can carry on as you are.
I don't know if any technology exists to do this.
I do know that I worked with a system very similar to this. We used XML files to describe the various classes that different modules made available. Our equivalent of ModuleManager would parse the XML files to determine what to create for the user at run time, based on the class name they provided and the configuration of the system. (Requesting an object that implemented interface 'I' could give back any of objects 'A', 'B' or 'C' depending on how the system was configured.)
The big gotcha we found was that the system was very brittle and at times hard to debug/understand. Just reading through the code, it was often near impossible to see what concrete class was being instantiated. We also found that maintaining the XML created more bugs and overhead than expected.
If I were to do this again, I would keep the design pattern of exposing classes from DLLs through interfaces, but I would not try to build a central registry of classes, nor would I derive everything from a base class such as IObject.
I would instead make each module responsible for exposing its own factory function(s) to instantiate objects.
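A minimal sketch of that per-module approach (hypothetical names; each DLL simply exports plain factory pairs for whatever it implements):
// Inside each plug-in DLL: no central registry, just exported factories
// that create and destroy objects behind a known interface.
extern "C" __declspec(dllexport) IObject* CreateFoo();
extern "C" __declspec(dllexport) void DestroyFoo(IObject* object);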