Class conflicts in loaded shared libraries - c++

Say I have a class in my main process:
class MyClass
{
void doStuff();
int myThing;
int mySecondThing;
bool myThirdThing;
};
And I load a shared library, mysharedlib.so, with a newer version of the class compiled in:
class MyClass
{
void doStuff();
int myThing;
int mySecondThing;
bool myThirdThing;
std::string myFourthThing;
u32 myFifthThing;
};
What happens when I create an instance of MyClass / pass existing instances around between the library's functions and the main executable's functions?
I know the two libraries live in a different address space but passing the data between the library and the executable is what confuses me.
Does this behave differently when using gmodule?

Problems might occur when using MyClass objects.
It depends on how you use them.
Take the following scenario (bogus code here).
MyClass* ptr = SharedLibHandle->CreateMyClass();
ptr->doStuffNonVirtual(); // 1: this might work fine
ptr->doStuffVirtual(); // 2: this will work fine
ptr->myThing = 5; // 3: this might work fine
MyClass* localAllocPtr = new MyClass();
SharedLibHandle->DoSomethingWithTheClass(localAllocPtr);
...
void DoSomethingWithTheClass(MyClass* ptr)
{
    ptr->myFourthThing = " mama "; // 4: this might seem to work fine
}
In the example above there are several possible use cases, depending on where the class is instantiated and where it is used:
ptr covers the scenario where the class is instantiated in the .so, with the size defined in the .so, and then used by your executable with the size defined there.
localAllocPtr covers the reverse scenario (the class is instantiated in your executable, then passed to the .so).
Taking each one in turn:
1. Call to a non-virtual function
Non-virtual functions are resolved at compile/link time. If your executable contains its own implementation of the function, the call goes to that implementation instead of the one in the .so. It will work as expected if the code is the same in both the executable and the .so, and the layout of the members it touches is the same (which is most likely the case here).
2. Call to a virtual function
This will work fine, since the call goes through the object's vtable, which points to the correct address in the .so. The .so initialized the object, so the offsets and jumps are all valid, provided the virtual functions are declared in the same order in both definitions.
3. Access to a commonly defined member
This will work fine only if myThing is at the same offset inside the structure in both definitions, for example at *(ptr+0). If in your executable's class myThing is first and mySecondThing is second, while in the .so mySecondThing is first and myThing is second, then you will change the wrong member. Ironically, this has no visible effect as long as you keep using the object only inside your executable and never pass it back to the .so (ignorance is bliss).
4. Access to a non-allocated member
When your executable allocates localAllocPtr, it allocates sizeof(MyClass) bytes as defined in your executable, where the class has no std::string and no u32. When you pass this object to the .so, the .so treats it as having the members and size of its own definition. Accessing myFourthThing then touches memory past the end of the allocation (the exact offset depends on padding). If that memory is in use by something else, you write outside your object's bounds into someone else's memory; it might seem to work fine, but you will end up with one of the hardest bugs to find. If nothing is mapped there, you get lucky and get a segmentation fault.
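The mismatch behind the third and fourth cases can be made visible with a small sketch. MyClassV1 and MyClassV2 are hypothetical stand-ins for the two builds of MyClass (a real program cannot hold both definitions under one name), u32 is assumed to be a 32-bit unsigned integer, and offsetIn and bytesBeyondAllocation are helper names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the executable's (older) definition of MyClass.
struct MyClassV1 {
    int myThing;
    int mySecondThing;
    bool myThirdThing;
};

// Hypothetical stand-in for the newer definition compiled into the library.
struct MyClassV2 {
    int myThing;
    int mySecondThing;
    bool myThirdThing;
    std::string myFourthThing;
    std::uint32_t myFifthThing;  // "u32" in the question; assumed 32-bit unsigned
};

// Byte offset of a member inside the object that contains it, measured at run time.
template <typename Object, typename Member>
std::size_t offsetIn(const Object& object, const Member& member) {
    const char* objectAddress = reinterpret_cast<const char*>(&object);
    const char* memberAddress = reinterpret_cast<const char*>(&member);
    assert(memberAddress >= objectAddress);  // the member must belong to the object
    return static_cast<std::size_t>(memberAddress - objectAddress);
}

// Number of bytes the library believes exist beyond what the executable allocated.
std::size_t bytesBeyondAllocation() {
    static_assert(sizeof(MyClassV2) > sizeof(MyClassV1),
                  "the newer definition is larger than the older one");
    return sizeof(MyClassV2) - sizeof(MyClassV1);
}
```

Comparing offsetIn(v1, v1.myThing) with offsetIn(v2, v2.myThing) shows why access to the shared members tends to work, while bytesBeyondAllocation() is the amount of memory the library would touch without the executable ever having allocated it.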
To avoid the sort of problem you are describing, a common approach is the pImpl idiom, which keeps the class's implementation details private behind a pointer, so you can add member variables and internal functions while the exposed definition of the class stays the same.
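A minimal sketch of the pImpl idiom, assuming C++11 and squeezed into one file for brevity: in a real project the first part would live in the public header and the second part in the library's own source file. The accessor names setThing and thing are invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>

// ---- public header part: all the executable ever compiles against ----
class MyClass {
public:
    MyClass();
    ~MyClass();  // defined in the library, where Impl is a complete type
    MyClass(const MyClass&) = delete;
    MyClass& operator=(const MyClass&) = delete;

    void setThing(int value);
    int thing() const;

private:
    struct Impl;                  // layout known only to the library
    std::unique_ptr<Impl> impl_;  // the only data member the executable sees
};

// ---- library source part: free to change between library versions ----
struct MyClass::Impl {
    int myThing = 0;
    int mySecondThing = 0;
    bool myThirdThing = false;
    std::string myFourthThing;  // added later; sizeof(MyClass) is unaffected
};

MyClass::MyClass() : impl_(new Impl) {}
MyClass::~MyClass() = default;

void MyClass::setThing(int value) {
    assert(impl_ != nullptr);
    impl_->myThing = value;
}

int MyClass::thing() const {
    assert(impl_ != nullptr);
    return impl_->myThing;
}
```

Because the constructor, destructor and accessors are all defined on the library side, the object is always allocated, used and destroyed with the library's idea of Impl, whatever version the executable was compiled against.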

Related

Why aren't C++ constructors capable of adding in new member variables?

It would make more intuitive sense to me if you could add in member variables in the constructor. This way, the class can adapt to changing input.
In C++ an object always has a fixed size. If constructors could add members at runtime, that guarantee would go out the window. In addition, in C++ all objects of the same type have the same size. Since a class can have multiple different constructors, the different constructors could specify different sizes.
This single, fixed size is the magic sauce that makes a number of C++'s high-performance tricks work, and in C++ convenience often gives way to speed. For example, an array of objects actually holds the objects. Not references to the objects, literally the objects. It can do this because everything in the array is the same size and the compiler can generate all of the indexing at compile time. CPUs love this because access is dead predictable and it can make full use of caches (assuming the access patterns you write allow it to do so). The more that's known and fixed at compile time, the more optimization opportunities the compiler has.
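As a small illustration of why the fixed size matters for arrays, the sketch below recomputes the address of an array element by hand. Particle and elementAt are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element type; every Particle has the same, fixed size.
struct Particle {
    float x;
    float y;
    float z;
};

// What array indexing boils down to: base address + i * sizeof(Particle).
const Particle* elementAt(const Particle* base, std::size_t i) {
    assert(base != nullptr);
    const char* bytes = reinterpret_cast<const char*>(base);
    return reinterpret_cast<const Particle*>(bytes + i * sizeof(Particle));
}
```

For any array of Particle, elementAt(array, i) yields the same address as &array[i], which is only possible because sizeof(Particle) is a single compile-time constant.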
What you can do is add a member like a std::map or std::unordered_map that maps an identifier to its data. If all data is of the same type, this can be as easy as
std::map<std::string, int> members;
and access looks something like
members["hit points"] -= damage;
Note that while the map is inside the object, these mapped variables are not "inside" the object. They first have to be looked up in the map, and then the data needs to be loaded from wherever it resides in dynamic memory. This can slow down access considerably compared to a member that is known at compile time and reduced to an offset from the beginning of the object, at a memory location that was probably already loaded into cache with the rest of the object.
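Put together, the map-based approach might look like the sketch below. Character, level and applyDamage are hypothetical names; only the members map and the "hit points" key come from the answer.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical class mixing a fixed member with run-time "members".
struct Character {
    int level = 1;                       // fixed member: a compile-time offset
    std::map<std::string, int> members;  // dynamic "members": looked up at run time
};

// Each access pays for a map lookup; operator[] creates the entry if it is missing.
void applyDamage(Character& character, int damage) {
    assert(damage >= 0);
    character.members["hit points"] -= damage;
}
```

sizeof(Character) stays the same no matter how many entries the map holds, because the entries live in dynamically allocated nodes outside the object.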
Let's say we add that to the standard. Let's say we decide to introduce a new keyword, append, to create a new class member from within a constructor. Then one could have the following class:
struct A
{
A(int n) {
append int x = n;
}
A(std::string s) {
append std::string str = s;
}
};
Now, what is sizeof(A)? Is it sizeof(int) or sizeof(std::string)? Remember that sizeof is a compile-time operation. The compiler must be able to know the answer; it cannot be deferred to runtime.
And one more example:
void foo(A a)
{
std::cout << a.x; //should this compile?
std::cout << a.str; //or should this compile?
}
How would the compiler know whether a has member x or member str accessible to foo? Compilation in C++ is done in translation units, with each translation unit being compiled completely separately from the others. If foo() is defined in foo.cpp and called from main.cpp, the compiler would have no idea which operation is valid. Moreover, both could be valid, just for different A objects.
C++ has many ways to add some flexibility to the set of members in a class, notably inheritance (to add new members) and templates (to create members of different types in the same class template). There is no need to try to introduce mechanisms from interpreted languages like Python.
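A rough sketch of those two alternatives, reusing the name A from the example above as a class template; NamedInt is a hypothetical derived type added for illustration. In both cases every type still has one size that is known at compile time.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Template: A<int> and A<std::string> are two distinct types, each with a fixed size.
template <typename T>
struct A {
    explicit A(T v) : value(std::move(v)) {}
    T value;
};

// Inheritance: the extra member lives in a new type, not in A<int> itself.
struct NamedInt : A<int> {
    NamedInt(int v, std::string n) : A<int>(v), name(std::move(n)) {}
    std::string name;
};

static_assert(sizeof(NamedInt) > sizeof(A<int>),
              "the derived type is bigger, but its size is still a compile-time constant");
```

A function such as foo can now state exactly which type it accepts, A<int> or A<std::string>, so the compiler knows which members exist without looking at any other translation unit.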

Compiler Optimizations - Function has no address

I have not used pointers to member functions much, but I think I have found some dangerous scenarios when using them.
The problem comes when the compiler decides not to assign an address to a function because of some optimization. It happened with VS 2015 even in Debug, x86 (with optimization disabled, /Od). I am refactoring an old system, moving some code into a common static library (common.lib) so that it can be used from several projects. Even if it is not the best pattern, the old implementation depends heavily on member function pointers, and I do not want to change this. For example, I added the interface ModuleBase to one very big old class, something like:
class ModuleBase
{
public:
typedef void (ModuleBase::*Main)() const; // moved from old module
virtual void FunctionMain() const = 0; // Function has no address, possibly due to compiler optimizations.
virtual void FunctionSecondary() const = 0; // Function has no address, possibly due to compiler optimizations.
};
class OldModule : public ModuleBase
{
public:
virtual void FunctionMain() const {}
virtual void FunctionSecondary() const {}
};
The idea was to move ModuleBase into the static library, while OldModule remains in the main EXE project. While ModuleBase was in the main project it worked fine, but when I moved it into the static Common.lib it started crashing! It took me about 2 days to finally notice that in several places the compiler decided (but only for the static library) not to assign addresses to FunctionMain(), FunctionSecondary(), etc. from ModuleBase. So when pointers to these virtual functions were passed to other routines, they were zero.
For example, in the code below:
new Manager::ModuleDecription(
"Test Module",
"Secondary Scene",
"Description",
PosX,
PosY,
Proc,
&ModuleBase::FunctionSecondary // contains nullptr when in static library!!!!!
);
The last member in the structure was zero, but only when the code is in the static library. It was quite nasty because I had to check many other things before noticing this. Also, other pointers were not zero because the structure was not zeroed in the constructor, so one has to notice that the address value is different, and it crashes when trying to call the function.
So my questions are:
1) Am I seeing this right? Is this a valid situation (the compiler removing function addresses for the same code when it is moved into a static library)?
2) How can I force the compiler to always keep the member function addresses?
My apologies: I found no problems with the addresses of pointers to member functions in Visual Studio. Pointers to the base interface's virtual functions are resolved correctly, even when placed in a static library. The reasons for my problems were:
1) The debugger sometimes shows function addresses of template classes as zeroes.
2) The reason for the crashes was that the main project had the /vmg compiler option, but I forgot to set it in the static library project. In such a case one should be careful to use /vmg consistently in all referenced library projects (the complications it causes are another topic).
Anyway, using pointers to member functions together with the object pointer is usually a sign of bad underlying design.
I hope this may help someone.
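For reference, a minimal sketch of how a pointer to a pure virtual member function behaves when every translation unit is built with the same settings. It reuses the class names from the question, but the functions return std::string instead of void so that the result is observable, and invoke is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

class ModuleBase {
public:
    typedef std::string (ModuleBase::*Main)() const;
    virtual ~ModuleBase() {}
    virtual std::string FunctionMain() const = 0;
    virtual std::string FunctionSecondary() const = 0;
};

class OldModule : public ModuleBase {
public:
    std::string FunctionMain() const override { return "main"; }
    std::string FunctionSecondary() const override { return "secondary"; }
};

// Calls through a pointer to member; dispatch is still virtual, so the
// OldModule override runs even though the pointer names ModuleBase's function.
std::string invoke(const ModuleBase& module, ModuleBase::Main function) {
    assert(function != nullptr);
    return (module.*function)();
}
```

Taking &ModuleBase::FunctionSecondary is valid even though the function is pure virtual. What must match across projects is the representation of such pointers, which is what the /vmg mismatch described above broke.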

Class static members contributing to program memory footprint even if class is not used

In a class I want to have a constant array of constant C strings:
.cpp
const char* const Colors::Names[] = {
"red",
"green"
};
.h
class Colors {
public:
static const char* const Names[];
};
The array should be common to all instances of class Colors (I plan to have just one instance, but that should not matter), hence the array is declared static.
The requirement is that if the class is not instantiated, the array should not consume any memory in the binary file.
However, with the above solution, it does:
.rodata._ZN6Colors5NamesE
0x00000000 0x8
I am not sure about the C strings themselves, as I cannot find them in the map file, but I assume they consume memory as well.
I know that one solution would be to use constexpr and C++17, where it is no longer necessary to define static constexpr members outside of the class.
However, for some reasons (i.e. higher compilation times in my build system and a slightly higher program memory footprint) I don't want to change the C++ standard version.
Another idea is to drop static (as I plan to have one instance anyway). However, the first issue with this solution is that I have to specify the array size, which I would prefer not to do; otherwise I get:
error: flexible array member 'Colors::Names' in an otherwise empty 'class Colors'
The second issue is that the array is placed in a RAM section (inside the class object), and only the C strings are placed in FLASH memory.
Does anyone know other solutions to this issue?
PS. My platform is an STM32 MCU and I am using the GCC ARM compiler.
EDIT (to address some of the answers in comments)
As suggested in the comments, this can't be done with just static members.
So the question should probably be: how do I create a (non-static) class array member that is placed in read-only memory (not initialized), is placed in memory only if the class is actually used in the program, and is preferably common to all instances of that class? The array itself is only used from that class.
Some background info:
Let's say the array has 256 entries and each C string has 40 chars. That's 1 kB for the array + 10 kB for the C strings (32-bit architecture). The class is part of a library that is used by different projects (programs). If the class is not used in a project, then I don't want it (and its array) to occupy even a single byte, because I need that FLASH space for other things; therefore compression is not an option.
If there are no other solutions, I will consider having the linker remove unused sections (although I was hoping for a simpler solution).
Thanks for all suggestions.
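One direction that might be worth trying, offered as a sketch that does not come from the thread: keep the table as a function-local static inside a member function defined in the class body. The function is then implicitly inline, and a compiler normally emits it, together with its table, only in translation units that actually call it. The accessor name and the bounds check are assumptions made for this example, and the effect should be confirmed in the map file for the actual toolchain.

```cpp
#include <cassert>
#include <cstddef>

class Colors {
public:
    // Defined in the class body, hence implicitly inline. If nothing calls
    // name(), neither the function nor the table is expected to be emitted.
    static const char* name(std::size_t index) {
        static const char* const names[] = {
            "red",
            "green"
        };
        const std::size_t count = sizeof(names) / sizeof(names[0]);
        return index < count ? names[index] : nullptr;
    }
};
```

This needs nothing newer than C++11. The linker route mentioned in the question is the other common option: with GCC that usually means compiling with -ffunction-sections and -fdata-sections and linking with --gc-sections.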

c++ plugin : Is it ok to pass polymorphic objects?

When using dynamic libraries, I understand that we should only pass Plain Old Data structures across boundaries. So can we pass a pointer to base?
My idea is that the application and the library could both be aware of a common Interface (pure virtual method, = 0).
The library could instantiate a subtype of that Interface,
and the application could use it.
For instance, is the following snippet safe?
// file interface.h
class IPrinter{
public: // print() must be public for main() to call it
virtual void print(std::string str) = 0;
};
-
// file main.cpp
int main(){
//load plugin...
IPrinter* printer = plugin_get_printer();
printer->print( std::string{"hello"} );
}
-
// file plugin.cpp (compiled by another compiler)
IPrinter* plugin_get_printer(){
return new PrinterImpl{};
}
This snippet is not safe:
the two sides of your DLL boundary do not use the same compiler. This means that the name mangling (for function names) and the vtable layout (for virtual functions) might not be the same (both are implementation specific).
the heap on both sides may also be managed differently, so there is a risk when deleting your object if the deletion is not done in the DLL.
This article presents very well the main challenges with binary compatible interfaces.
You may, however, pass a pointer to the other side of the mirror as part of a POD, as long as the other side doesn't use it by itself. For example, your app passes a pointer to a configuration object to the DLL, and later another DLL function returns that pointer to your app. Your app can then use it as expected (at least if it wasn't a pointer to a local object that no longer exists).
The presence of virtual functions in your class means that your class is going to have a vtable, and different compilers implement vtables differently.
So, if you use classes with virtual methods across DLL calls where the compiler used on the other side is different from the compiler that you are using, the result is likely to be spectacular crashes.
In your case, the PrinterImpl created by the DLL will have a vtable constructed in a certain way, but the printer->print() call in your main() will attempt to interpret the vtable of IPrinter in a different way in order to resolve the print() method call.
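One way to sidestep both the vtable and the heap problem is to expose a plain C interface with an opaque handle, so that only pointers and C strings cross the boundary and the plugin both creates and destroys its own objects. This is only a sketch: printer_handle, plugin_create_printer, plugin_print and plugin_destroy_printer are hypothetical names, and both sides are shown in one file for brevity.

```cpp
#include <cassert>
#include <cstdio>

// ---- shared header: plain C declarations, no classes, no std::string ----
extern "C" {
struct printer_handle;  // opaque to the application
printer_handle* plugin_create_printer();
int plugin_print(printer_handle* printer, const char* text);
void plugin_destroy_printer(printer_handle* printer);
}

// ---- plugin side: the only code that knows what a printer_handle contains ----
struct printer_handle {
    int lines_printed;
};

extern "C" printer_handle* plugin_create_printer() {
    return new printer_handle{0};
}

// Prints one line; returns the running line count, or -1 on bad arguments.
extern "C" int plugin_print(printer_handle* printer, const char* text) {
    if (printer == nullptr || text == nullptr) {
        return -1;
    }
    std::puts(text);
    return ++printer->lines_printed;
}

// Frees the object with the same runtime (heap) that allocated it.
extern "C" void plugin_destroy_printer(printer_handle* printer) {
    delete printer;
}
```

The application would call plugin_create_printer(), pass the handle to plugin_print(), and hand it back to plugin_destroy_printer() instead of calling delete itself.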

Persistent class variables

I have a question regarding static variables, or some other way to achieve this.
I have a master class, PatternMatcher. I have several classes derived from it, depending on which matcher is used. Each subclass needs to store a vector of floats, which is constant within each class. The data for that vector is read during initialization, and can be up to 1GB in size (the smallest I have is 1MB, the biggest is 1GB).
Currently, when I have for example two instances of Matcher_A, the memory is allocated twice. I do not know in advance which matchers are to be used (per run it will be three matchers, and you can use the same matcher several times). I would prefer not to check at run time whether the wanted matcher is already initialized somewhere, since this would require additional code for every change I make.
Currently I allocate the 3 matchers with
PatternMatcher* a = new PMMatcherA();
PatternMatcher* b = new PMMatcherB();
PatternMatcher* c = new PMMatcherC();
, but since they are user selected, it could happen that A and C are the same, for example. When I run a check via typeid(a).name();, it gives me PatternMatcher as the result, no matter which class I instantiated. PatternMatcher is basically a purely virtual class.
I always thought that static means that a variable stays the same across different allocations, but when I define my vector as static, I get a linker resolve error. In an earlier iteration I had these vectors as globals, but I would prefer them to be local to their classes.
What are the keywords I need to use so that the vector from one initialization is already available for the next one? A simple check whether the vector size is greater than 0 would already be enough, but every object uses its own vector.
The static keyword is the way to go: it stores exactly one copy of a member for the whole class. What you were missing is an actual definition of the static member in a compilation unit, so that the linker can find it. For instance:
header file foo.h:
struct Foo {
static int s_int;
};
source file foo.cpp:
#include "foo.h"
int Foo::s_int; // optionally =0 for initialization
The second part is vital, as it allocates the storage for the static member.
Keep in mind, though, that:
static members are all initialized before main(), which means your 1GB of data will be read regardless of whether anyone ever uses that particular class
You can work around the above-mentioned issue, but then you have to check at run time whether the data has been loaded and initialized
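One way to get that run-time check without writing it by hand is a function-local static, which C++11 guarantees is initialized exactly once, on first use, even with multiple threads. The sketch below is an assumption about how a matcher could be laid out: load() stands in for the real file reading, and loadCount exists only to make the behaviour observable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class PMMatcherA {
public:
    static int loadCount;  // demonstration only: how many times load() ran

    // Every instance returns a reference to the same vector.
    const std::vector<float>& data() const { return sharedData(); }

private:
    static const std::vector<float>& sharedData() {
        // Initialized once, the first time any instance asks for the data.
        static const std::vector<float> data = load();
        return data;
    }

    // Placeholder for reading the real pattern file.
    static std::vector<float> load() {
        ++loadCount;
        return std::vector<float>(4, 1.0f);
    }
};

int PMMatcherA::loadCount = 0;
```

Creating two PMMatcherA objects and calling data() on both triggers a single load, and a matcher class that is never used never loads its data at all.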
There's another option for you, however. If you store your floats "as-is" (i.e. 32 bits each, in binary format), you can simply map the files into memory and access them as if they were already loaded; the OS will take care of loading the appropriate 4K pages into RAM when needed.
Read more about mmap at http://en.wikipedia.org/wiki/Mmap
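A minimal POSIX sketch of that idea, assuming the file contains nothing but raw floats in the machine's native format. MappedFloats, mapFloatFile and unmapFloatFile are hypothetical names; Windows would need a different API.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only view of a file of raw floats; data is nullptr if mapping failed.
struct MappedFloats {
    const float* data;
    std::size_t count;  // number of whole floats in the file
    std::size_t bytes;  // length of the mapping, needed again for munmap
};

MappedFloats mapFloatFile(const char* path) {
    MappedFloats result = {nullptr, 0, 0};
    const int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return result;
    }
    struct stat info;
    if (fstat(fd, &info) == 0 && info.st_size > 0) {
        const std::size_t bytes = static_cast<std::size_t>(info.st_size);
        void* address = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
        if (address != MAP_FAILED) {
            result.data = static_cast<const float*>(address);
            result.count = bytes / sizeof(float);
            result.bytes = bytes;
        }
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
    return result;
}

void unmapFloatFile(const MappedFloats& mapping) {
    if (mapping.data != nullptr) {
        munmap(const_cast<float*>(mapping.data), mapping.bytes);
    }
}
```

Pages are only read from disk when they are first touched, and several mappings of the same file within one process are typically backed by the same physical pages.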
Yes, static is what you need. You can use it like this:
class MyClass
{
private:
static std::vector< float > data_;
};
std::vector< float > MyClass::data_;
Please note that in the class itself you only declare static variables. But you also need to define them outside of the class exactly once. That's why we have the line std::vector< float > MyClass::data_;, if you omit that, you will have linker errors.
After this, every object of MyClass class will share the same data_ vector.
You can access it either through any object of the class (from code that is allowed to, since data_ is private here):
MyClass a;
a.data_.push_back(0);
or through the class name:
MyClass::data_.push_back(0);
"when I define my vector as static, I would get a linker resolve error."
This is because you declare the static variable (in your header file) but never define it in one of your implementation files (.cpp).
For example:
//AClass.h
class AClass
{
private:
static std::vector<int> static_vector;
};
and in the .cpp implementation file:
std::vector<int> AClass::static_vector;