How to handle memory management when passing pointers to library functions - c++

I'm working on a project in C++/CLI currently, and I'm trying to wrap my brain around memory allocation. In my code I need to call methods from an external library that I don't have the ability to edit or read the source code. A specific issues I've ran into is the following. They have a class something like this:
class Item {
public:
void SetName(const char* name);
};
Nothing too unusual, it has a method for setting the name... but wait a minute... how am I supposed to free that char* pointer? If I call it from my code like this:
void MyMethod(Item* myItem){
char* name = CallOtherMethodThatGivesMeTheName();
myItem->SetName(name);
};
When am I supposed to deallocate name? I can't stay inside this function forever waiting until myItem no longer uses it, and I have absolutely no clue how the library has implemented SetName().
Should I just delete name within this scope? Won't that mean myItem no longer has a valid pointer though? Maybe I should use some sort of smart pointer? In general, is there a standard way to handle this situation?
And since I am using C++ / CLI, my actual method looks like the following, but I'm looking for a general solution (this might be wrong too).
void MyMethod(Item* myItem, String^ name){
pin_ptr<const wchar_t> pinned_name = PtrToStringChars(name)
myItem->SetName(pinned_name);
};
Thanks for any help!

Ignoring the CLI portion of your question, I provide the following answer. I don't think using CLI is relevant to the core of your question. What you really want to know is when the third party library deems it safe for you to release some allocated string you passed it.
Given the code you provided:
class Item {
public:
void SetName(const char* name);
};
and assuming there is no documentation, I would assume the library makes an internal copy of a c-style string that you provide and that there is no reason to concern yourself with freeing it any more than you would normally. I make this assumption from the context. That would be the proper thing to do. Granted not all third party libraries do the "proper thing", but you cannot be omnipotent.
For example, I would assume:
#include <string>
int main()
{
Item item;
{
std::string name("Sam");
item.SetName(name.c_str());
}
// Carry on doing things with item...
return 0;
}
works just fine, until I examine some behavior that says otherwise. If I am concerned about a third party library behaving poorly, then I test the above in a loop and watch the memory, or witness access violations, but not until I have reason to do so.
In the end:
) Read the documentation for the library.
) Ask on their user groups or forums.
) If none is available, use intuition and context.
) If something unexpected happens, then test.

When you change the value of name after you called SetName and the library uses the originally provided value you know it has been copied.
Otherwise it uses always the current pointer and you cannot deallocate it safely due to the fact you don't know in which order the compiler releases the objects when the program ends.

Usually you only "delete" what you create with "new".
Libraries that return data with pointers, work in two way:
1- Ask you to allocate memory and pass them the pointer (datatype * p)
2- Allocate for you and give you the pointer (return datatype *) or (argument datatype ** p) or (argument datatype * & p)
In first way you know when to free your memory.
In second way, when you finished working with data, you need to call another function from the same library to free the memory for you, pair functions like this pattern are quite common (open close) (alloc free) ...
Of course there can be other ways, so always reading documentation is the best.
In C++/CLI you don't delete objects defined with "^" sign, garbage-collector will take care of them.

Related

Align A Pointer To Have A Functional -> Operator? C++

This quesiton is composed of a couple parts, the first has to do with the -> operator in a class. Does it take some sort of input (according to the C++ standard)? For example
some_return_type? operator->( long address ) {
cast the address to some sort of pointer and do something with it...
return something?...possibly... maybe not?;
}
So in reality A::SomeMethod() would refer to an address for a function in memory passed to ->. Or
A::someStaticOrNonStaticDataMember would refer to an address for a field?
If so (given that we do not have access to the actual type of the class), or something like this exists, what is it, and can we reconstruct part of a pointer, or align a pointer, (or write a class with an algorithm to do this), for a class based on some information about that class, so that it had an operable -> operator, so one could write:
somePointer->A::SomeMethod();
and have it call A::SomeMethod()? And maybe make context for the memory used in the class?
From the comments it seems you want to control how Compiler handles and generates -> tokens. This is for your bad luck not possible, because Compiler doesn't expose such information, nor is it required by Standard to do so
It is like you are trying to have "dynamic" (the C# type) but in C++, unluckily this is not possible. What could be similiar is wrapping some sort of "Closure collection" addressed by strings (a sort of scripting language) but that would be really heavy and not very nice.
Actually doing what you want with the syntax you showed is not possible.
If the type of an object is not known, then you have that object hided behind a "void *". That means basically that the only way you can use that object is by casting it back to its original type.
Suppose you have a DLL that expose 2 functions (with header files)
// creates an object of given type or null_ptr if no object match
void* getObject(std::string obj_type);
// call a method on that object
void callMethod(void* obj, std::string method_name, void* args, void* returnval);
Actually that solution (even if ugly) allows to call methods on objects that you don't know (it could be a lot better than that.)
But that force you to use void* and strings. That's because how C++ resolve method names (in reality also in C# the "dynamic" type generates behind the scenes reflection code that use strings with method names and is particulary slow)
So something similiar can be achieved with
float fuelLiters = 3.0f;
void * myObj = createObject("SomeCar");
callMethod(myObj,"loadFuel", &fuelLiters, null_ptr);
you probably can make the syntax a little better with templates or some macro, but you'll never be able to do something like
myObj->A::loadFuel(fuelLiters);
What you can do is having the externally loaded class, use the same interfaces of your application, says:
class ICar{
public:
void loadFuel(float liters)=0;
};
In that case you can use a function that cast the opaque object handle to ICar. This is what I already doing in a library I wrote 2 years ago:
So you just need the DLL expose a method for casting the class (downcast)
//if given object is implementing a ICar, the correct pointer is returned, else
// this function will return nullptr (or throw exception if you like more)
void * downcast( typeof(ICar), myObj);
You'll need simply
ICar *myCar = static_cast<ICar>(downcast( typeof(ICar), myObj));
myCar->loadFuel(3.0f);
However note that both the DLL and your application should "know" about what "ICar" is, so they must include the "ICar" header.
doing that is definitely possible, I did it already in 2 different ways, so If you need more details about implementation I'll be happy to show a possible way (given I understood correctly your question).
The arrow operator (->) is a dereference operator that is used exclusively with pointers to objects that have members.
foo->bar() is the same as (*foo).bar()
If you want to overload -> you should also overload *

Dynamic Libraries, plugin frameworks, and function pointer casting in c++

I am trying to create a very open plugin framework in c++, and it seems to me that I have come up with a way to do so, but a nagging thought keeps telling me that there is something very, very wrong with what I am doing, and it either won't work or it will cause problems.
The design I have for my framework consists of a Kernel that calls each plugin's init function. The init function then turns around and uses the Kernel's registerPlugin and registerFunction to get a unique id and then register each function the plugin wants to be accessible using that id, respectively.
The function registerPlugin returns the unique id. The function registerFunction takes that id, the function name, and a generic function pointer, like so:
bool registerFunction(int plugin_id, string function_name, plugin_function func){}
where plugin_function is
typedef void (*plugin_function)();
The kernel then takes the function pointer and puts it in a map with the function_name and plugin_id. All plugins registering their function must caste the function to type plugin_function.
In order to retrieve the function, a different plugin calls the Kernel's
plugin_function getFunction(string plugin_name, string function_name);
Then that plugin must cast the plugin_function to its original type so it can be used. It knows (in theory) what the correct type is by having access to a .h file outlining all the functions the plugin makes available. Plugins, by the by, are implemented as dynamic libraries.
Is this a smart way to accomplish the task of allowing different plugins to connect with each other? Or is this a crazy and really terrible programming technique? If it s, please point me in the direction of the correct way to accomplish this.
EDIT: If any clarification is needed, ask and it will be provided.
Function pointers are strange creatures. They're not necessarily the same size as data pointers, and hence cannot be safely cast to void* and back. But, the C++ (and C) specifications allow any function pointer to be safely cast to another function pointer type (though you have to later cast it back to the earlier type before calling it if you want defined behaviour). This is akin to the ability to safely cast any data pointer to void* and back.
Pointers to methods are where it gets really hairy: a method pointer might be larger than a normal function pointer, depending on the compiler, whether the application is 32- or 64-bit, etc. But even more interesting is that, even on the same compiler/platform, not all method pointers are the same size: Method pointers to virtual functions may be bigger than normal method pointers; if multiple inheritance (with e.g. virtual inheritance in the diamond pattern) is involved, the method pointers can be even bigger. This varies with compiler and platform too. This is also the reason that it's difficult to create function objects (that wrap arbitrary methods as well as free functions) especially without allocating memory on the heap (it's just possible using template sorcery).
So, by using function pointers in your interface, it becomes unpractical for the plugin authors to pass back method pointers to your framework, even if they're using the same compiler. This might be an acceptable constraint; more on this later.
Since there's no guarantee that function pointers will be the same size from one compiler to the next, by registering function pointers you're limiting the plugin authors to compilers that implement function pointers having the same size as your compiler does. This wouldn't necessarily be so bad in practice, since function pointer sizes tend to be stable across compiler versions (and may even be the same for multiple compilers).
The real problems start to arise when you want to call the functions pointed to by the function pointers; you can't safely call the function at all if you don't know its true signature (you will get poor results ranging from "not working" to segmentation faults). So, the plugin authors would be further limited to registering only void functions that take no parameters.
It gets worse: the way a function call actually works at the assembler level depends on more than just the signature and function pointer size. There's also the calling convention, the way exceptions are handled (the stack needs to be properly unwound when an exception is thrown), and the actual interpretation of the bytes of function pointer (if it's larger than a data pointer, what do the extra bytes signify? In what order?). At this point, the plugin author is pretty much limited to using the same compiler (and version!) that you are, and needs to be careful to match the calling convention and exception handling options (with the MSVC++ compiler, for example, exception handling is only explicitly enabled with the /EHsc option), as well as use only normal function pointers with the exact signature you define.
All the restrictions so far can be considered reasonable, if a bit limiting. But we're not done yet.
If you throw in std::string (or almost any part of the STL), things get even worse though, because even with the same compiler (and version), there are several different flags/macros that control the STL; these flags can affect the size and meaning of the bytes representing string objects. It is, in effect, like having two different struct declarations in separate files, each with the same name, and hoping they'll be interchangeable; obviously, this doesn't work. An example flag is _HAS_ITERATOR_DEBUGGING. Note that these options can even change between debug and release mode! These types of errors don't always manifest themselves immediately/consistently and can be very difficult to track down.
You also have to be very careful with dynamic memory management across modules, since new in one project may be defined differently from new in another project (e.g. it may be overloaded). When deleting, you might have a pointer to an interface with a virtual destructor, meaning the vtable is needed to properly delete the object, and different compilers all implement the vtable stuff differently. In general, you want the module that allocates an object to be the one to deallocate it; more specifically, you want the code that deallocates an object to have been compiled under the exact same conditions as the code that allocated it. This is one reason std::shared_ptr can take a "deleter" argument when it is constructed -- because even with the same compiler and flags (the only guaranteed safe way to share shared_ptrs between modules), new and delete may not be the same everywhere the shared_ptr can get destroyed. With the deleter, the code that creates the shared pointer controls how it is eventually destroyed too. (I just threw this paragraph in for good measure; you don't seem to be sharing objects across module boundaries.)
All of this is a consequence of C++ having no standard binary interface (ABI); it's a free-for-all, where it is very easy to shoot yourself in the foot (sometimes without realising it).
So, is there any hope? You betcha! You can expose a C API to your plugins instead, and have your plugins also expose a C API. This is quite nice because a C API can be interoperated with from virtually any language. You don't have to worry about exceptions, apart from making sure they can't bubble up above the plugin functions (that's the authors' concern), and it's stable no matter the compiler/options (assuming you don't pass STL containers and the like). There's only one standard calling convention (cdecl), which is the default for functions declared extern "C". void*, in practice, will be the same across all compilers on the same platform (e.g. 8 bytes on x64).
You (and the plugin authors) can still write your code in C++, as long as all the external communication between the two uses a C API (i.e. pretends to be a C module for the purposes of interop).
C function pointers are also likely compatible between compilers in practice, though if you'd rather not depend on this you could have the plugin register a function name (const char*) instead of address, and then you could extract the address yourself using, e.g., LoadLibrary with GetProcAddress for Windows (similarly, Linux and Mac OS X have dlopen and dlsym). This works because name-mangling is disabled for functions declared with extern "C".
Note that there's no direct way around restricting the registered functions to be of a single prototype type (otherwise, as I've said, you can't call them properly). If you need to give a particular parameter to a plugin function (or get a value back), you'll need to register and call the different functions with different prototypes separately (though you could collapse all the function pointers down to a common function pointer type internally, and only cast back at the last minute).
Finally, while you cannot directly support method pointers (which don't even exist in a C API, but are of variable size even with a C++ API and thus cannot be easily stored), you can allow the plugins to supply a "user-data" opaque pointer when registering their function, which is passed to the function whenever it's called; this gives the plugin authors an easy way to write function wrappers around methods and store the object to apply the method to in the user-data parameter. The user-data parameter can also be used for anything else the plugin author wants, which makes your plugin system much easier to interface with and extend. Another example use is to adapt between different function prototypes using a wrapper and extra arguments stored in the user-data.
These suggestions lead to code something like this (for Windows -- the code is very similar for other platforms):
// Shared header
extern "C" {
typedef void (*plugin_function)(void*);
bool registerFunction(int plugin_id, const char* function_name, void* user_data);
}
// Your plugin registration code
hModule = LoadLibrary(pluginDLLPath);
// Your plugin function registration code
auto pluginFunc = (plugin_function)GetProcAddress(hModule, function_name);
// Store pluginFunc and user_data in a map keyed to function_name
// Calling a plugin function
pluginFunc(user_data);
// Declaring a plugin function
extern "C" void aPluginFunction(void*);
class Foo { void doSomething() { } };
// Defining a plugin function
void aPluginFunction(void* user_data)
{
static_cast<Foo*>(user_data)->doSomething();
}
Sorry for the length of this reply; most of it can be summed up with "the C++ standard doesn't extend to interoperation; use C instead since it at least has de facto standards."
Note: Sometimes it's simplest just to design a normal C++ API (with function pointers or interfaces or whatever you like best) under the assumption that the plugins will be compiled under exactly the same circumstances; this is reasonable if you expect all the plugins to be developed by yourself (i.e. the DLLs are part of the project core). This could also work if your project is open-source, in which case everybody can independently choose a cohesive environment under which the project and the plugins are compiled -- but then this makes it hard to distribute plugins except as source code.
Update: As pointed out by ern0 in the comments, it's possible to abstract the details of the module interoperation (via a C API) so that both the main project and the plugins deal with a simpler C++ API. What follows is an outline of such an implementation:
// iplugin.h -- shared between the project and all the plugins
class IPlugin {
public:
virtual void register() { }
virtual void initialize() = 0;
// Your application-specific functionality here:
virtual void onCheeseburgerEatenEvent() { }
};
// C API:
extern "C" {
// Returns the number of plugins in this module
int getPluginCount();
// Called to register the nth plugin of this module.
// A user-data pointer is expected in return (may be null).
void* registerPlugin(int pluginIndex);
// Called to initialize the nth plugin of this module
void initializePlugin(int pluginIndex, void* userData);
void onCheeseBurgerEatenEvent(int pluginIndex, void* userData);
}
// pluginimplementation.h -- plugin authors inherit from this abstract base class
#include "iplugin.h"
class PluginImplementation {
public:
PluginImplementation();
};
// pluginimplementation.cpp -- implements C API of plugin too
#include <vector>
struct LocalPluginRegistry {
static std::vector<PluginImplementation*> plugins;
};
PluginImplementation::PluginImplementation() {
LocalPluginRegistry::plugins.push_back(this);
}
extern "C" {
int getPluginCount() {
return static_cast<int>(LocalPluginRegistry::plugins.size());
}
void* registerPlugin(int pluginIndex) {
auto plugin = LocalPluginRegistry::plugins[pluginIndex];
plugin->register();
return (void*)plugin;
}
void initializePlugin(int pluginIndex, void* userData) {
auto plugin = static_cast<PluginImplementation*>(userData);
plugin->initialize();
}
void onCheeseBurgerEatenEvent(int pluginIndex, void* userData) {
auto plugin = static_cast<PluginImplementation*>(userData);
plugin->onCheeseBurgerEatenEvent();
}
}
// To declare a plugin in the DLL, just make a static instance:
class SomePlugin : public PluginImplementation {
virtual void initialize() { }
};
SomePlugin plugin; // Will be created when the DLL is first loaded by a process
// plugin.h -- part of the main project source only
#include "iplugin.h"
#include <string>
#include <vector>
#include <windows.h>
class PluginRegistry;
class Plugin : public IPlugin {
public:
Plugin(PluginRegistry* registry, int index, int moduleIndex)
: registry(registry), index(index), moduleIndex(moduleIndex)
{
}
virtual void register();
virtual void initialize();
virtual void onCheeseBurgerEatenEvent();
private:
PluginRegistry* registry;
int index;
int moduleIndex;
void* userData;
};
class PluginRegistry {
public:
registerPluginsInModule(std::string const& modulePath);
~PluginRegistry();
public:
std::vector<Plugin*> plugins;
private:
extern "C" {
typedef int (*getPluginCountFunc)();
typedef void* (*registerPluginFunc)(int);
typedef void (*initializePluginFunc)(int, void*);
typedef void (*onCheeseBurgerEatenEventFunc)(int, void*);
}
struct Module {
getPluginCountFunc getPluginCount;
registerPluginFunc registerPlugin;
initializePluginFunc initializePlugin;
onCheeseBurgerEatenEventFunc onCheeseBurgerEatenEvent;
HMODULE handle;
};
friend class Plugin;
std::vector<Module> registeredModules;
}
// plugin.cpp
void Plugin::register() {
auto func = registry->registeredModules[moduleIndex].registerPlugin;
userData = func(index);
}
void Plugin::initialize() {
auto func = registry->registeredModules[moduleIndex].initializePlugin;
func(index, userData);
}
void Plugin::onCheeseBurgerEatenEvent() {
auto func = registry->registeredModules[moduleIndex].onCheeseBurgerEatenEvent;
func(index, userData);
}
PluginRegistry::registerPluginsInModule(std::string const& modulePath) {
// For Windows:
HMODULE handle = LoadLibrary(modulePath.c_str());
Module module;
module.handle = handle;
module.getPluginCount = (getPluginCountFunc)GetProcAddr(handle, "getPluginCount");
module.registerPlugin = (registerPluginFunc)GetProcAddr(handle, "registerPlugin");
module.initializePlugin = (initializePluginFunc)GetProcAddr(handle, "initializePlugin");
module.onCheeseBurgerEatenEvent = (onCheeseBurgerEatenEventFunc)GetProcAddr(handle, "onCheeseBurgerEatenEvent");
int moduleIndex = registeredModules.size();
registeredModules.push_back(module);
int pluginCount = module.getPluginCount();
for (int i = 0; i < pluginCount; ++i) {
auto plugin = new Plugin(this, i, moduleIndex);
plugins.push_back(plugin);
}
}
PluginRegistry::~PluginRegistry() {
for (auto it = plugins.begin(); it != plugins.end(); ++it) {
delete *it;
}
for (auto it = registeredModules.begin(); it != registeredModules.end(); ++it) {
FreeLibrary(it->handle);
}
}
// When discovering plugins (e.g. by loading all DLLs in a "plugins" folder):
PluginRegistry registry;
registry.registerPluginsInModule("plugins/cheeseburgerwatcher.dll");
for (auto it = registry.plugins.begin(); it != registry.plugins.end(); ++it) {
(*it)->register();
}
for (auto it = registry.plugins.begin(); it != registry.plugins.end(); ++it) {
(*it)->initialize();
}
// And then, when a cheeseburger is actually eaten:
for (auto it = registry.plugins.begin(); it != registry.plugins.end(); ++it) {
auto plugin = *it;
plugin->onCheeseBurgerEatenEvent();
}
This has the benefit of using a C API for compatibility, but also offering a higher level of abstraction for plugins written in C++ (and for the main project code, which is C++). Note that it lets multiple plugins be defined in a single DLL. You could also eliminate some of the duplication of function names by using macros, but I chose not to for this simple example.
All of this, by the way, assumes plugins that have no interdependencies -- if plugin A affects (or is required by) plugin B, you need to devise a safe method for injecting/constructing dependencies as needed, since there's no way of guaranteeing what order the plugins will be loaded in (or initialized). A two-step process would work well in that case: Load and register all plugins; during registration of each plugin, let them register any services they provide. During initialization, construct requested services as needed by looking at the registered service table. This ensures that all services offered by all plugins are registered before any of them are attempted to be used, no matter what order plugins get registered or initialized in.
The approach you took is sane in general, but I see a few possible improvements.
Your kernel should export C functions with a conventional calling convention (cdecl, or maybe stdcall if you are on Windows) for the registration of plugins and functions. If you use a C++ function then you are forcing all plugin authors to use the same compiler and compiler version that you use, since many things like C++ function name mangling, STL implementation and calling conventions are compiler specific.
Plugins should only export C functions like the kernel.
From the definition of getFunction it seems each plugin has a name, which other plugins can use to obtain its functions. This is not a safe practice, two developers can create two different plugins with the same name, so when a plugin asks for some other plugin by name it may get a different plugin than the expected one. A better solution would be for plugins to have a public GUID. This GUID can appear in each plugin's header file, so that other plugins can refer to it.
You have not implemented versioning. Ideally you want your kernel to be versioned because invariably you will change it in the future. When a plugin registers with the kernel it passes the version of the kernel API it was compiled against. The kernel then can decide if the plugin can be loaded. For example, if kernel version 1 receives a registration request for a plugin that requires kernel version 2 you have a problem, the best way to address that is to not allow the plugin to load since it may need kernel features that are not present in the older version. The reverse case is also possible, kernel v2 may or may not want to load plugins that were created for kernel v1, and if it does allow it it may need to adapt itself to the older API.
I'm not sure I like the idea of a plugin being able to locate another plugin and call its functions directly, as this breaks encapsulation. It seems better to me if plugins advertise their capabilities to the kernel, so that other plugins can find services they need by capability instead of by addressing other plugins by name or GUID.
Be aware that any plugin that allocates memory needs to provide a deallocation function for that memory. Each plugin could be using a different run-time library, so memory allocated by a plugin may be unknown to other plugins or the kernel. Having allocation and deallocation in the same module avoids problems.
C++ has no ABI. So what you want doing has a restriction: the plugins and your framework must compile & link by same compiler & linker with same parameter in same os. That is meaningless if the achievement is inter-operation in form of binary distribution because each plugin developed for framework has to prepare many version which target at different compiler on different os. So distrbute source code will be more practical than this and that's the way of GNU(download a src, configure and make)
COM is a chose, but it is too complex and out-of-date. Or managed C++ on .Net runtime. But they are only on ms os. If you want a universal solution, I suggest you change to another language.
As jean mentions, since there is no standard C++ ABI and standard name mangling conventions you are stuck to compile things with same compiler and linker. If you want a shared library/dll kind of plugins you have to use something C-ish.
If all will be compiled with same compiler and linker, you may want to also consider std::function.
typedef std::function<void ()> plugin_function;
std::map<std::string, plugin_function> fncMap;
void register_func(std::string name, plugin_function fnc)
{
fncMap[name] = fnc;
}
void call(std::string name)
{
auto it = fncMap.find(name);
if (it != fncMap.end())
(it->second)(); // it->second is a function object
}
///////////////
void func()
{
std::cout << "plain" << std::endl;
}
class T
{
public:
void method()
{
std::cout << "method" << std::endl;
}
void method2(int i)
{
std::cout << "method2 : " << i << std::endl;
}
};
T t; // of course "t" needs to outlive the map, you could just as well use shared_ptr
register_func("plain", func);
register_func("method", std::bind(&T::method, &t));
register_func("method2_5", std::bind(&T::method2, &t, 5));
register_func("method2_15", std::bind(&T::method2, &t, 15));
call("plain");
call("method");
call("method2_5");
call("method2_15");
You can also have plugin functions that take argumens. This will use the placeholders for std::bind, but soon you can find that it is somewhat lacking behind boost::bind. Boost bind has nice documentation and examples.
There is no reason why you should not do this. In C++ using this style of pointer is the best since it's just a plain pointer. I know of no popular compiler that would do anything as brain-dead as not making a function pointer like a normal pointer. It is beyond the bounds of reason that someone would do something so horrible.
The Vst plugin standard operates in a similar way. It just uses function pointers in the .dll and does not have ways of calling directly to classes. Vst is a very popular standard and on windows people use just about any compiler to do Vst plugins, including Delphi which is pascal based and has nothing to do with C++.
So I would do exactly what you suggest personally. For the common well-known plugins I would not use a string name but an integer index which can be looked up much faster.
The alternative is to use interfaces but I see no reason to if your thinking is already based around function pointers.
If you use interfaces then it is not so easy to call the functions from other languages. You can do it from Delphi but what about .NET.
With your function pointer style suggestion you can use .NET to make one of the plugins for example. Obviously you would need to host Mono in your program to load it but just for hypothetical purposes it illustrates the simplicity of it.
Besides, when you use interfaces you have to get into reference counting which is nasty. Stick your logic in function pointers like you suggest and then wrap the control in some C++ classes to do the calling and stuff for you. Then other people can make the plugins with other languages such as Delphi Pascal, Free Pascal, C, Other C++ compilers etc...
But as always, regardless of what you do, exception handling between compilers will remain an issue so you have to think about the error handling. Best way is that the plugins own method catches own plugin exceptions and returns an error code to the kernel etc...
With all the excellent answers above, I'll just add that this practice is actually pretty wide distributed. In my practice, I've seen it both in commercial projects and in freeware/opensource ones.
So - yes, it's good and proven architecture.
You don't need to register functions manually. Really? Really.
What you could use is a proxy implementation for your plugin interface, where each function loads its original from the shared library on demand, transparently, and calls it. Whoever reaches a proxy object of that interface definition just can call the functions. They will be loaded on demand.
If plugins are singletons, then there is no need for manual binding at all (otherwise the correct instance has to be chosen first).
The idea for the developer of a new plugin would be to describe the interface first, then have a generator which generates a stub for the implementation for the shared library, and additionally a plugin proxy class with the same signature but with the autoloading on demand which then is used in the client software. Both should fulfill the same interface (in C++ a pure abstract class).

Avoid Segfault in C++ code if user redefines initialize() in Ruby with Rice

One problem I am struggling with while writing a C++ extension for Ruby is to make it really safe even if the user does silly things. He should get exceptions then, but never a SegFault. A concrete problem is the following: My C++ class has a non-trivial constructor. Then I use the Rice API to wrap my C++ class. If the user redefines initialize() in his Ruby code, then the initialize() function created by Rice is overwritten and the object is neither allocated nor initialized. One toy example could be the following:
class Person {
public:
Person(const string& name): m_name (name) {}
const string& name() const { return m_name; }
private:
string m_name;
}
Then I create the Ruby class like this:
define_class<Person>("Person")
.define_constructor(Constructor<Person, const string&>(), Arg("name"))
.define_method("name", &Person::name);
Then the following Ruby Code causes a Segfault
require 'MyExtension'
class Person
def initialize
end
end
p = Person.new
puts p.name
There would be two possibilities I would be happy about: Forbid overwriting the initialize function in Ruby somehow or check in C++, if the Object has been allocated correctly and if not, throw an exception.
I once used the Ruby C API directly and then it was easy. I just allocated a dummy object consisting of a Null Pointer and a flag that is set to false in the allocate() function and in the initialize method, I allocated the real object and set the flag to true. In every method, I checked for that flag and raised an exception, if it was false. However, I wrote a lot of stupid repetitive code with the Ruby C API, I first had to wrap my C++ classes such that they were accessible from C and then wrap and unwrap Ruby types etc, additionally I had to check for this stupid flag in every single method, so I migrated to Rice, which is really nice and I am very glad about that.
In Rice, however, the programmer can only provide a constructor which is called in the initialize() function created by rice and the allocate() function is predefined and does nothing. I don't think there is an easy way to change this or provide an own allocate function in an "official" way. Of course, I could still use the C API to define the allocate function, so I tried to mix the C API and Rice somehow, but then I got really nasty, I got strange SegFaults and it was really ugly, so I abandoned that idea.
Does anyone here have experiences with Rice or does anyone know how to make this safe?
How about this
class Person
def initialize
puts "old"
end
alias_method :original_initialize, :initialize
def self.method_added(n)
if n == :initialize && !#adding_initialize_method
method_name = "new_initialize_#{Time.now.to_i}"
alias_method method_name, :initialize
begin
#adding_initialize_method = true
define_method :initialize do |*args|
original_initialize(*args)
send method_name, *args
end
ensure
#adding_initialize_method = false
end
end
end
end
class Person
def initialize
puts "new"
end
end
Then calling Person.new outputs
old
new
i.e. our old initialize method is still getting called
This uses the method_added hook which is called whenever a method is added (or redefined) at this point the new method already exists so it's too late to stop them from doing it. Instead we alias the freshly defined initialize method (you might want to work a little harder to ensure the method name is unique) and define another initialize that calls the old initialize method first and then the new one.
If the person is sensible and calls super from their initialize then this would result in your original initialize method being called twice - you might need to guard against this
You could just throw an exception from method_added to warn the user that they are doing a bad thing, but this doesn't stop the method from being added: the class is now in an unstable state. You could of course realias your original initialize method on top of their one.
In your comment you say that in the c++ code, this is a null pointer. If it is possible to call a c++ class that way from ruby, I'm afraid there is no real solution. C++ is not designed to be fool-proof. Basically this happens in c++;
Person * p = 0;
p->name();
A good c++ compiler will stop you from doing this, but you can always rewrite it in a way the compiler cannot detect what is happening. This results in undefined behaviour, the program can do anything, including crash.
Of course you can check for this in every non-static function;
const string& Person::name() const
{
if (!this) throw "object not allocated";
return m_name;
}
To make it easier and avoid double code, create a #define;
#define CHECK if (!this) { throw "object not allocated"; }
const string& name() const { CHECK; return m_name; }
int age() const { CHECK; return m_age; }
However it would have been better to avoid in ruby that the user can redefine initialize.
This is an interesting problem, not endemic to Rice, but to any extension which separates allocation and initialization into separate methods. I do not see an obvious solution.
In Ruby 1.6 days, we did not have allocate/initialize; we had new/initialize. There may still be code left over in Rice which defines MyClass.new instead of MyClass.allocate and MyClass#initialize. Ruby 1.8 separated allocation and initialization into separate methods (http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/23358) and Rice uses the new "allocation framework". But it has the problem you pointed out.
The allocate method cannot construct the object, because it does not have the parameters to pass to the constructor.
Rice could define .new instead (as was done on 1.6), but this doesn't work with #dup and Marshal.load. However, this is probably the safer (and right) solution.
I now think that this is a problem of the Rice library. If you use Rice in the way it is documented, you get these problems and there is no obvious way to solve it and all workarounds have drawbacks and are terrible. So I guess the solution is to fork Rice and fix this since they seem to ignore bug reports.

What vb6 type is ABI-compatible with std::vector?

I've been writing a DLL in C++, now I must call this DLL from a VB6 application.
Here's a code sample from this DLL :
#include <vector>
#include <string>
using namespace std;
void __stdcall DLLFunction (vector<Object>*)
{
// performs a few operations on the Objects contained in the vector.
}
struct Object
{
long CoordX;
long CoordY;
long Width;
long Height;
LPSTR Id;
};
I also defined the "Object struct" in VB6
Private Type Object
CoordX As Integer
CoordY As Integer
Width As Integer
Height As Integer
Id As String
End Type
The issue is I don't know what vb6 type could stand for std::vector in order to call the DLL's function.
Notes :
- I use a vector for the DLL to be able to add objects.
- I use a pointer in order to use as less memory as possible.
- Sorry for my english, it ain't my home language at all.
- Thank you for reading and trying to help me.
Edit :
- I fixed the typing issues (Ids are definitely ended by NullChar, so LPSTR should do the trick).
- I read your answers, and I'd like to thank both of you, your answers are close to one another and a major issue remains. My DLL definitely needs to add elements to the container. Thus, I'm wondering how I could do the trick. Maybe I could add a return type to my function and then make that the function is able to return the items it created (instead of putting it directly into the container) so that the vb6 application gets these items and is able to process them, but I can't figure out how to do this
Edit bis :
#Rook : I feel like I could achieve this by using a new struct.
struct ObjectArrayPointer
{
Object* Pointer;
size_t Counter;
}
And then call my function this way :
void __stdcall DLLFunction (ObjectArrayPointer*);
I would then be able to add objects and edit the size parameter for my VB6 application to find these new objects. Was that what you meant?
You should not be trying to export template containers from a DLL anyway. They're likely to break when faced with newer compilers and libraries (eg. a library built under C++03 will not play well with code built using C++11).
The least painful thing to do is to accept a pointer to a buffer and a length parameter,
void __stdcall DLLFunction (Object* buffer, size_t nObjects);
if the size of the container will not change during execution. This interface is about as simple as it gets, and is easily accessible by any language that understand C calling conventions (eg. almost every single one.)
You've already thrown away most of the use of a std::vector because you've already specialised it to Object; you could consider going all the way and creating your own ObjectCollection class which uses a std::vector internally but presents a non-templated interface. Here's a simple example :
// In your public API header file:
typedef struct object_collection_t *object_collection;
object_collection CreateObjectCollection();
void DestroyObjectCollect(object_collection collection);
void AddObjectToCollection(object_collection collection, Object* object);
// etc
No template types are exposed in any form in the header. This is good.
// And the corresponding code file:
struct object_collection_t
{
std::vector<Object*> objects;
};
object_collection CreateObjectCollection() { return new object_collection_t; }
void DestroyObjectCollect(object_collection collection) { delete collection; }
void AddObjectToCollection(object_collection collection, Object* object)
{
collection->objects.push_back(object);
}
// etc
All of templating code is hidden away, leaving you with a fairly clean and simple interface which present an opaque pointer type that can be passed around by external code but only queried and modified by your own, etc.
EDIT: Incidentally, I've used Object* throughout the above code. It may well be safer and impler to use just plain old Object and avoid all of the issues associated with memory management and pointer manipulation by client code. If Object is sufficiently small and simple, passing by value may be a better approach.
(NB: not checked for compilability or functionality. E&OE. Caveat Implementor!)
You can't do that as it's a C++ class/template. Internally, it's an array but not in a way that can be created from VB6.
Your best bet is to change the function to accept a pointer to an array with a count parameter.
You'll also need to be very careful as to how the type is structured.
C++ ints are Longs in VB6.
Also, the Id string won't be compatible. VB6 will have a pointer to a unicode BString (unless you make it fixed length) where as a C++ will have std::string which is an array of ANSI chars. VB6 MAY marshal this if you pass an array of the objects (rather than a pointer)
The VB6 ABI is the COM Automation ABI.
Therefore, if you need an arry which is VB6 ABI compatible, you should probably use SAFEARRAY. I suggest you should also be using the Compiler COM Support classes:
http://msdn.microsoft.com/en-US/library/5yb2sfxk(v=vs.80).aspx
This question appears to do exactly what you want, using ATL's CComSafeArray class:
conversion between std::vector and _variant_t
You may also want to look at these:
https://stackoverflow.com/search?q=safearray+_variant_t
Alternatives to SAFEARRAY
The alternative to SAFEARRAY is to supply a COM Collection object. This is simply a COM object with a Dispinterface or Dual interface with the methods Count and Item. Item should have dispid=0 to be the default method. You may also want to supply _NewEnum with DISPID_NEWENUM to support the For Each syntax.

I need some C++ guru's opinions on extending std::string

I've always wanted a bit more functionality in STL's string. Since subclassing STL types is a no no, mostly I've seen the recommended method of extension of these classes is just to write functions (not member functions) that take the type as the first argument.
I've never been thrilled with this solution. For one, it's not necessarily obvious where all such methods are in the code, for another, I just don't like the syntax. I want to use . when I call methods!
A while ago I came up with the following:
class StringBox
{
public:
StringBox( std::string& storage ) :
_storage( storage )
{
}
// Methods I wish std::string had...
void Format();
void Split();
double ToDouble();
void Join(); // etc...
private:
StringBox();
std::string& _storage;
};
Note that StringBox requires a reference to a std::string for construction... This puts some interesting limits on it's use (and I hope, means it doesn't contribute to the string class proliferation problem)... In my own code, I'm almost always just declaring it on the stack in a method, just to modify a std::string.
A use example might look like this:
string OperateOnString( float num, string a, string b )
{
string nameS;
StringBox name( nameS );
name.Format( "%f-%s-%s", num, a.c_str(), b.c_str() );
return nameS;
}
My question is: What do the C++ guru's of the StackOverflow community think of this method of STL extension?
I've never been thrilled with this solution. For one, it's not necessarily obvious where all such methods are in the code, for another, I just don't like the syntax. I want to use . when I call methods!
And I want to use $!---& when I call methods! Deal with it. If you're going to write C++ code, stick to C++ conventions. And a very important C++ convention is to prefer non-member functions when possible.
There is a reason C++ gurus recommend this:
It improves encapsulation, extensibility and reuse. (std::sort can work with all iterator pairs because it isn't a member of any single iterator or container class. And no matter how you extend std::string, you can not break it, as long as you stick to non-member functions. And even if you don't have access to, or aren't allowed to modify, the source code for a class, you can still extend it by defining nonmember functions)
Personally, I can't see the point in your code. Isn't this a lot simpler, more readable and shorter?
string OperateOnString( float num, string a, string b )
{
string nameS;
Format(nameS, "%f-%s-%s", num, a.c_str(), b.c_str() );
return nameS;
}
// or even better, if `Format` is made to return the string it creates, instead of taking it as a parameter
string OperateOnString( float num, string a, string b )
{
return Format("%f-%s-%s", num, a.c_str(), b.c_str() );
}
When in Rome, do as the Romans, as the saying goes. Especially when the Romans have good reasons to do as they do. And especially when your own way of doing it doesn't actually have a single advantage. It is more error-prone, confusing to people reading your code, non-idiomatic and it is just more lines of code to do the same thing.
As for your problem that it's hard to find the non-member functions that extend string, place them in a namespace if that's a concern. That's what they're for. Create a namespace StringUtil or something, and put them there.
As most of us "gurus" seem to favour the use of free functions, probably contained in a namespace, I think it safe to say that your solution will not be popular. I'm afraid I can't see one single advantage it has, and the fact that the class contains a reference is an invitation to that becoming a dangling reference.
I'll add a little something that hasn't already been posted. The Boost String Algorithms library has taken the free template function approach, and the string algorithms they provide are spectacularly re-usable for anything that looks like a string: std::string, char*, std::vector, iterator pairs... you name it! And they put them all neatly in the boost::algorithm namespace (I often use using namespace algo = boost::algorithm to make string manipulation code more terse).
So consider using free template functions for your string extensions, and look at Boost String Algorithms on how to make them "universal".
For safe printf-style formatting, check out Boost.Format. It can output to strings and streams.
I too wanted everything to be a member function, but I'm now starting to see the light. UML and doxygen are always pressuring me to put functions inside of classes, because I was brainwashed by the idea that C++ API == class hierarchy.
If the scope of the string isn't the same as the StringBox you can get segfaults:
StringBox foo() {
string s("abc");
return StringBox(s);
}
At least prevent object copying by declaring the assignment operator and copy ctor private:
class StringBox {
//...
private:
void operator=(const StringBox&);
StringBox(const StringBox&);
};
EDIT: regarding API, in order to prevent surprises I would make the StringBox own its copy of the string. I can think fo 2 ways to do this:
Copy the string to a member (not a reference), get the result later - also as a copy
Access your string through a reference-counting smart pointer like std::tr1::shared_ptr or boost:shared_ptr, to prevent extra copying
The problem with loose functions is that they're loose functions.
I would bet money that most of you have created a function that was already provided by the STL because you simply didn't know the STL function existed, or that it could do what you were trying to accomplish.
It's a fairly punishing design, especially for new users. (The STL gets new additions too, further adding to the problem.)
Google: C++ to string
How many results mention: std::to_string
I'm just as likely to find some ancient C method, or some homemade version, as I am to find the STL version of any given function.
I much prefer member methods because you don't have to struggle to find them, and you don't need to worry about finding old deprecated versions, etc,. (ie, string.SomeMethod, is pretty much guaranteed to be the method you should be using, and it gives you something concrete to Google for.)
C# style extension methods would be a good solution.
They're loose functions.
They show up as member functions via intellisense.
This should allow everyone to do exactly what they want.
It seems like it could be accomplished in the IDE itself, rather than requiring any language changes.
Basically, if the interpreter hits some call to a member that doesn't exist, it can check headers for matching loose functions, and dynamically fix it up before passing it on to the compiler.
Something similar could be done when it's loading up the intellisense data.
I have no idea how this could be worked for existing functions, no massive change like this should be taken lightly, but, for new functions using a new syntax, it shouldn't be a problem.
namespace StringExt
{
std::string MyFunc(this std::string source);
}
That can be used by itself, or as a member of std::string, and the IDE can handle all the grunt work.
Of course, this still leaves the problem of methods being spread out over various headers, which could be solved in various ways.
Some sort of extension header: string_ext which could include common methods.
Hmm....
That's a tougher issue to solve without causing issues...
If you want to extend the methods available to act on string, I would extend it by creating a class that has static methods that take the standard string as a parameter.
That way, people are free to use your utilities, but don't need to change the signatures of their functions to take a new class.
This breaks the object-oriented model a little, but makes the code much more robust - i.e. if you change your string class, then it doesn't have as much impact on other code.
Follow the recommended guidelines, they are there for a reason :)
The best way is to use templated free functions. The next best is private inheritance struct extended_str : private string, which happens to get easier in C++0x by the way as you can using constructors. Private inheritance is too much trouble and too risky just to add some algorithms. What you are doing is too risky for anything.
You've just introduced a nontrivial data structure to accomplish a change in code punctuation. You have to manually create and destroy a Box for each string, and you still need to distinguish your methods from the native ones. You will quickly get tired of this convention.