C++ operator new, object versions, and the allocation sizes


I have a question about different versions of an object, their sizes, and allocation. The platform is Solaris 8 (and higher).
Let's say we have programs A, B, and C that all link to a shared library D. Some class is defined in the library D, let's call it 'classD', and assume the size is 100 bytes. Now, we want to add a few members to classD for the next version of program A, without affecting existing binaries B or C. The new size will be, say, 120 bytes. We want program A to use the new definition of classD (120 bytes), while programs B and C continue to use the old definition of classD (100 bytes). A, B, and C all use the operator "new" to create instances of D.
The question is: when does operator "new" learn how much memory to allocate, at compile time or at run time? My fear is that programs B and C expect classD to occupy, and therefore allocate, 100 bytes, whereas the new shared library D requires 120 bytes for classD, and that this inconsistency may cause memory corruption in programs B and C if I link them with the new library D. In other words, the extra 20 bytes that the new classD requires may already be allocated to other variables by programs B and C. Is this assumption correct?
Thanks for your help.

Changing the size of a class is binary incompatible. That means that if you change the size of classD without recompiling the code that uses it, you get undefined behavior (most likely crashes).
A common trick to get around this limitation is to design classD so that it can be safely extended in a binary compatible way, for example by using the Pimpl idiom.
In any case, if you want different programs to use different versions of your class, I think you have no choice but releasing multiple versions of the shared library and have those programs linked to the appropriate version.
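A minimal sketch of the Pimpl idiom mentioned above. The member names and the value() accessor are hypothetical and only serve the illustration. The public class holds a single pointer, so sizeof(classD) should stay the same when members are added to Impl, provided every constructor and destructor is defined inside the library.

```cpp
#include <cassert>
#include <memory>

// --- what would live in the public header ---
class classD {
public:
    classD();
    ~classD();                              // defined where Impl is complete
    classD(const classD&) = delete;
    classD& operator=(const classD&) = delete;
    int value() const;                      // hypothetical accessor
private:
    struct Impl;                            // declared only; clients never see its size
    std::unique_ptr<Impl> impl_;
};

// --- what would live inside the shared library ---
struct classD::Impl {
    int value = 42;
    int added_in_v2 = 0;                    // changes sizeof(Impl), not sizeof(classD)
};

classD::classD() : impl_(new Impl) {}
classD::~classD() = default;
int classD::value() const { return impl_->value; }

// Client-side usage: only the pointer-sized handle is allocated by the client.
inline void pimpl_example() {
    classD d;
    assert(d.value() == 42);
}
```

The trade-off is one extra heap allocation and one indirection per object.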

Compile time. You should not change the size of a shared object's classes underneath its clients.
There is a simple workaround for that:
class foo
{
public:
// declared only; the definition lives in the library so it is not inlined into clients
static foo* Create();
};
// in the library's source file
foo* foo::Create()
{
return new foo();
}
// at the client
foo* f = foo::Create();

You are correct: the memory size is fixed at compile time, and applications B and C would be in danger of serious memory corruption problems.
There is no way to handle this explicitly at the language level. You need to work with the OS to get the appropriate shared libraries to the application.
You need to version your libraries.
As there is no explicit way of doing this with the build tools, you need to do it with file names. If you look at most products, this is approximately how they work.
In the lib directory:
libD.1.00.so
libD.1.so -> libD.1.00.so // Symbolic link
libD.so -> libD.1.so // Symbolic link
Now at compile time you specify -lD and it links against libD.1.00.so because it follows the symbolic links. At run time it knows to use this version as this is the version it compiled against.
So you now update lib D to version 2.0
In the lib directory:
libD.1.00.so
libD.2.00.so
libD.1.so -> libD.1.00.so // Symbolic link
libD.2.so -> libD.2.00.so // Symbolic link
libD.so -> libD.2.so // Symbolic link
Now when you build with -lD it links against version 2. Thus you rebuild A and it will use version 2 of the lib from now on, while B and C will still use version 1. If you rebuild B or C, it will use the new version of the library unless you explicitly name the old version when building (for example by passing the path to libD.1.so).
Some linkers do not follow symbolic links very well, so there are linker options that record the name to look up at run time. With gcc on macOS this is the '-install_name' flag; GNU ld calls it '-soname' and the Solaris linker '-h'. Your linker may use a slightly differently named flag.
As a runtime check it is usually a good idea to put version information into your shared objects (global variable, function call, etc.). Thus at runtime you can retrieve the shared library's version information and check that your application is compatible. If not, you should exit with an appropriate error message.
Also note: if you serialize objects of D to a file, you now need to make sure that version information about D is maintained. libD.2 may know how to read version 1 D objects (with some explicit work), but the inverse would not be true.
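The runtime check described above could look roughly like the following sketch. The names LIBD_ABI_VERSION, libD_abi_version and check_libD_version are hypothetical, and all three parts are shown in one file only to keep the example self-contained. It throws instead of exiting so that the caller decides how to report the mismatch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// --- header shipped with the library (hypothetical names) ---
#define LIBD_ABI_VERSION 2                 // version the client was compiled against
extern "C" int libD_abi_version();         // version of the library loaded at run time

// --- inside the shared library ---
extern "C" int libD_abi_version() { return 2; }

// --- in the client, called once at start-up ---
inline void check_libD_version(int compiled_against = LIBD_ABI_VERSION) {
    const int loaded = libD_abi_version();
    if (loaded != compiled_against) {
        throw std::runtime_error(
            "libD ABI mismatch: built against " + std::to_string(compiled_against) +
            ", loaded " + std::to_string(loaded));
    }
}
```

A client built against the version 1 header would pass 1 here and fail fast rather than corrupt memory later.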

The allocation size is figured out at compile time. Changing the size of a class in D therefore requires recompiling every client that uses it.
Consider publicly deriving from the class in question to extend it, if that would apply. Or, compose it in another object.

The amount of memory to allocate is determined at compile time when doing something like
new Object();
but it can be a dynamic parameter such as in
new unsigned char[variable];
I really advise you to go through some middleware to achieve what you want. C++ guarantees nothing in terms of binary interfaces.
Have you looked at protobuf?
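To see the compile-time sizing from this answer in action, here is a small sketch that uses a class-specific operator new to record the size the calling code requests. The Object type and the last_requested counter are hypothetical. The size arrives as an ordinary argument that the caller's compiler filled in from sizeof(Object).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct Object {
    char payload[100];

    // Size passed in by the most recent new-expression.
    static std::size_t last_requested;

    static void* operator new(std::size_t size) {
        last_requested = size;
        if (void* p = std::malloc(size)) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void* p) noexcept { std::free(p); }
};

std::size_t Object::last_requested = 0;

inline std::size_t requested_size_for_new_object() {
    Object* o = new Object();   // compiled as operator new(sizeof(Object)) plus a constructor call
    assert(o != nullptr);
    delete o;
    return Object::last_requested;
}
```

If the library later grows Object without the caller being recompiled, the caller still passes the old size, which is exactly the mismatch the question worries about.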

In addition to the mentioned 'ad hoc' techniques, you can also model compatibility into your system by saying that your new class A is really a subclass of the 'old' class A. That way, your old code keeps working, but all code that needs the extended functionality needs to be revised.
This design principle is clearly visible in the COM world, where especially interfaces are never changed over versions, only extended by inheritance. Next to that, they only construct classes by the CreateInstance method, which moves the allocation problem to the library containing the class.
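A rough sketch of that pattern in plain C++ (not actual COM). The interface names IWidget and IWidget2 and the CreateWidget function are hypothetical. The released interface is never edited, new functionality arrives in a derived interface, and the library performs the allocation.

```cpp
#include <cassert>
#include <memory>

// Version 1 interface: frozen once released.
struct IWidget {
    virtual ~IWidget() = default;
    virtual int width() const = 0;
};

// Version 2 extends by inheritance instead of editing IWidget.
struct IWidget2 : IWidget {
    virtual int height() const = 0;
};

// Implementation lives inside the library; clients never see its size.
class WidgetImpl final : public IWidget2 {
public:
    int width() const override { return width_; }
    int height() const override { return height_; }
private:
    int width_ = 120;
    int height_ = 100;
};

// The library, not the client, performs the allocation.
inline std::unique_ptr<IWidget> CreateWidget() {
    return std::unique_ptr<IWidget>(new WidgetImpl);
}

// New client code asks for the extended interface (loosely analogous to QueryInterface).
inline int height_of(IWidget& w) {
    auto* w2 = dynamic_cast<IWidget2*>(&w);
    assert(w2 != nullptr);
    return w2->height();
}
```

Old clients keep calling width() through IWidget and are unaffected by whatever WidgetImpl gains later.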

Related

How to dynamically register class in a factory class at runtime period with c++

I have implemented a factory class that dynamically creates a class from an identification string; please see the following code:
void IOFactory::registerIO()
{
Register("NDAM9020", []() -> IOBase * {
return new NDAM9020();
});
Register("BK5120", []() -> IOBase * {
return new BK5120();
});
}
std::unique_ptr<IOBase> IOFactory::createIO(std::string ioDeviceName)
{
std::unique_ptr<IOBase> io = createObject(ioDeviceName);
return io;
}
So we can create the IO class with the registered name:
IOFactory ioFactory;
auto io = ioFactory.createIO("BK5120");
The problem with this method is that if we add another IO component, we have to add another registration to the registerIO function and compile the whole project again. So I was wondering if I could dynamically register class from a configure file(see below) at runtime.
io_factory.conf
------------------
NDAM9020:NDAM9020
BK5120:BK5120
------------------
The first field is the identification name and the second is the class name.
I have tried with Macros, but the parameter in Macros can't be string. So I was wondering if there are some other ways. Thanks in advance.
Update:
I didn't expect so many comments and answers, Thank you all and sorry for replying late.
Our current OS is Ubuntu16.04 and we use the builtin compiler that is gcc/g++5.4.0, and we use CMake to manage the build.
And I should mention that it is not a must that I should register class at runtime period, it is also OK if there is a way to do this in compile period. What I want is just avoiding the recompiling when I want to register another class.

So I was wondering if I could dynamically register class from a configure file(see below) at runtime.
No. As of C++20, C++ has no reflection features allowing it. But you could do it at compile time by generating a simple C++ implementation file from your configuration file.

How to dynamically register class in a factory class at runtime period with c++
Read much more about C++: at least a good C++ programming book and a good C++ reference website, and later n3337, the C++11 standard. Read also the documentation of your C++ compiler (perhaps GCC or Clang) and, if you have one, of your operating system. If plugins are possible in your OS, you can register a factory function at runtime (by referring to that function after a plugin providing it has been loaded). For example, the Mozilla Firefox browser, recent GCC compilers (e.g. GCC 10 with plugins enabled), and the fish shell are doing this.
So I was wondering if I could dynamically register class from a configure file(see below) at runtime.
Most C++ programs are running under an operating system, such as Linux. Some operating systems provide a plugin mechanism. For Linux, see dlopen(3), dlsym(3), dlclose(3), dladdr(3) and the C++ dlopen mini-howto. For Windows, dive into its documentation.
So, with a recent C++ implementation and some recent operating systems, you can register a factory class at runtime (using plugins), and you could find libraries (e.g. Qt or POCO) to help you.
However, in pure standard C++, the set of translation units is statically known and plugins do not exist. So the set of functions, lambda-expressions, or classes in a given program is finite and does not change with time.
In pure C++, the set of valid function pointers, or the set of valid possible values for a given std::function variable, is finite. Anything else is undefined behavior. In practice, many real-life C++ programs accept plugins thru their operating systems or JIT-compiling libraries.
You could of course consider using JIT-compiling libraries such as asmjit or libgccjit or LLVM. They are implementation specific, so your code won't be portable.
On Linux, a lot of Qt or GTKmm applications (e.g. KDE, and most web browsers, e.g. Konqueror, Chrome, or Firefox) are coded in C++ and do load plugins with factory functions. Check with strace(1) and ltrace(1).
Microsoft's Trident browser engine is rumored to be coded in C++ and probably accepts plugins.
I have tried with Macros, but the parameter in Macros can't be string.
A macro parameter can be stringized. And you could play x-macros tricks.
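As a sketch of both points, the following uses the # operator to stringize a macro argument and an X-macro list to generate the registrations. The class names come from the question; the stub class bodies, IO_DEVICE_LIST, REGISTER_IO and makeRegistry are hypothetical stand-ins.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Stub classes standing in for the real devices.
struct IOBase { virtual ~IOBase() = default; virtual const char* name() const = 0; };
struct NDAM9020 : IOBase { const char* name() const override { return "NDAM9020"; } };
struct BK5120 : IOBase { const char* name() const override { return "BK5120"; } };

// The single list to edit when a device is added.
#define IO_DEVICE_LIST(X) \
    X(NDAM9020)           \
    X(BK5120)

using IOCreator = std::function<std::unique_ptr<IOBase>()>;

inline std::map<std::string, IOCreator> makeRegistry() {
    std::map<std::string, IOCreator> registry;
    // #cls turns the macro argument into a string literal, e.g. "BK5120".
#define REGISTER_IO(cls) \
    registry[#cls] = [] { return std::unique_ptr<IOBase>(new cls()); };
    IO_DEVICE_LIST(REGISTER_IO)
#undef REGISTER_IO
    return registry;
}
```

This still needs a recompile of the file containing the list, but only of that one file.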
What I want is just avoiding the recompiling when I want to register another class.
On Ubuntu, I recommend accepting plugins in your program or library.
Use dlopen(3) with an absolute file path; the plugin would typically be passed as a program option (like RefPerSys does, or like GCC does) and dlopen-ed at program or library initialization time. Practically speaking, you can have lots of plugins (tens of thousands, see manydl.c and check with pmap(1) or proc(5)). The dlsym(3)-ed C++ functions in your plugins should be declared extern "C" to disable name mangling.
A single C++ file plugin (in yourplugin.cc) can be compiled with g++ -Wall -O -g -fPIC -shared yourplugin.cc -o yourplugin.so and later you would dlopen "./yourplugin.so" or an absolute path (or configure suitably your $LD_LIBRARY_PATH -see ld.so(8)- and pass "yourplugin.so" to dlopen). Be also aware of Rpath.
Consider also (after upgrading your GCC to GCC 9 at least, perhaps by compiling it from its source code) using libgccjit (it is faster than generating temporary C++ code in some file and compiling that file into a temporary plugin).
For ease of debugging your loaded plugins, you might be interested by Ian Taylor's libbacktrace.
Notice that your program's global symbols (declared as extern "C") can be accessed by name by passing a nullptr file path to dlopen(3), then using dlsym(3) on the obtained handle. You want to pass -rdynamic -ldl when linking your program (or your shared library).
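A minimal sketch of the loading side on Linux, assuming each plugin exports a factory function under the hypothetical name create_io. The function loadFactory and the CreateFn alias are also illustrative. Depending on the glibc version, linking may need -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <stdexcept>
#include <string>

struct IOBase;                               // defined by the host program
using CreateFn = IOBase* (*)();              // assumed plugin entry point signature

// Loads the plugin at `path` and returns its factory function.
// The plugin would define it as:
//   extern "C" IOBase* create_io() { return new BK5120(); }
inline CreateFn loadFactory(const std::string& path) {
    void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        const char* err = dlerror();
        throw std::runtime_error(err ? err : "dlopen failed");
    }
    dlerror();                               // clear any stale error state
    void* sym = dlsym(handle, "create_io");
    if (const char* err = dlerror()) {
        std::string msg = err;               // copy before the handle is closed
        dlclose(handle);
        throw std::runtime_error(msg);
    }
    // POSIX requires this object-to-function pointer conversion to work.
    return reinterpret_cast<CreateFn>(sym);
}
```

The handle is deliberately kept open on success, since the returned function pointer is only valid while the plugin stays loaded.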
What I want is just avoiding the recompiling when I want to register another class.
You might register classes in a different translation unit (a short one, presumably). You could take inspiration from RefPerSys multiple #include-s of its generated/rps-name.hh file. Then you would simply recompile a single *.cc file and relink your entire program or library. Notice that Qt plays similar tricks in its moc, and I recommend taking inspiration from it.
Read also J.Pitrat's book on Artificial Beings: the Conscience of a Conscious Machine ISBN which explains why a metaprogramming approach is useful. Study the source code of GCC (or of RefPerSys), and use or take inspiration from SWIG, ANTLR, GNU bison (they all generate C++ code) when relevant.

You seem to have asked for more dynamism than you actually need. You want to avoid the factory itself having to be aware of all of the classes registered in it.
Well, that's doable without going all the way to runtime code generation!
There are several implementations of such a factory; but I am obviously biased in favor of my own: einpoklum's Factory class (gist.github.com)
simple example of use:
#include "Factory.h"
// we now have:
//
// template<typename Key, typename BaseClass, typename... ConstructionArgs>
// class Factory;
//
#include <string>
struct Foo { Foo(int x) { } };
struct Bar : Foo { Bar(int x) : Foo(x) { } };
int main()
{
util::Factory<std::string, Foo, int> factory;
factory.registerClass<Bar>("key_for_bar");
auto* my_bar_ptr = factory.produce("key_for_bar");
}
Notes:
The std::string is used as a key; you could have a factory with numeric values as keys instead, if you like.
All registered classes must be subclasses of the BaseClass value chosen for the factory. I believe you can change the factory to avoid that, but then you'll always be getting void *s from it.
You can wrap this in a singleton template to get a single, global, static-initialization-safe factory you can use from anywhere.
Now, if you load some plugin dynamically (see @BasileStarynkevitch's answer), you just need that plugin to expose an initialization function which makes registerClass() calls on the factory; and call this initialization function right after loading the plugin. Or, if you have a static-initialization-safe singleton factory, you can make the registration calls in a static block in your plugin shared library - but be careful with that, I'm not an expert on shared library loading.

Definitely YES!
There's an old post from 2006 that has served me well for many years. The implementation revolves around a centralized registry with a decentralized registration method that is expanded using a REGISTER_X macro; check it out:
https://web.archive.org/web/20100618122920/http://meat.net/2006/03/cpp-runtime-class-registration/
I have to admit that @einpoklum's factory looks awesome too. I created a header-only sample gist containing the code and a sample:
https://gist.github.com/h3r/5aa48ba37c374f03af25b9e5e0346a86
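The general shape of such a centralized registry with decentralized registration might look like the sketch below. This is not the code from the linked post or gist; IORegistry, REGISTER_IO and the stub classes are hypothetical. Each class registers itself through a static initializer placed next to its definition.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct IOBase { virtual ~IOBase() = default; };

class IORegistry {
public:
    using Creator = std::function<std::unique_ptr<IOBase>()>;

    // Function-local static: constructed on first use, so it is ready even
    // when called from another translation unit's static initializer.
    static IORegistry& instance() {
        static IORegistry registry;
        return registry;
    }
    bool add(const std::string& key, Creator creator) {
        return creators_.emplace(key, std::move(creator)).second;
    }
    std::unique_ptr<IOBase> create(const std::string& key) const {
        auto it = creators_.find(key);
        if (it == creators_.end()) return nullptr;
        return it->second();
    }
private:
    std::map<std::string, Creator> creators_;
};

// Placed next to each class definition; the initializer runs during static initialization.
#define REGISTER_IO(cls)                                             \
    static const bool cls##_registered = IORegistry::instance().add( \
        #cls, [] { return std::unique_ptr<IOBase>(new cls()); })

struct BK5120 : IOBase {};
REGISTER_IO(BK5120);
```

One caveat: when such a class lives in a static library, the linker may drop an object file that nothing references, and its registration with it.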

Class size is different across modules (DLLs). How and why?

I have two Visual Studio 2015 C++ projects, namely A and B. A is compiled as a shared library and B uses it. They both utilize a class Foo defined in A.
The problem occurs at a line in B that looks like:
auto p = std::make_shared<Foo>(3);
raising an AccessViolationException.
I realized that A and B recognize the size of Foo differently and that it makes the constructor of Foo in B go over the boundary of memory allocated by A's make_shared. Using the Watch window of Visual studio, I could see sizeof(Foo) is 1832 when the code is running in A module, while the same watch entry gives a value of 1813 when the code is running in B module.
I tried deleting all intermediate and output files of both projects and rebuilding the entire solution, but it never helped.
So, how can a single class appear in different sizes in different modules? What determines the memory layout of a class? Finally and most importantly, how can I fix the problem?

You cannot reason about object sizes (or layout) from a purist standpoint unless one of the following is true:
The type is a standard layout type.
or
Everything about the build is constant: in particular the compiler flags, the compiler used, the compiler version, etc.
Finally and most importantly, how can I fix the problem?
Basically you have two options.
Use a standard layout type or make sure both modules are built the same way. For this reason, most publicly available DLL or shared library interfaces don't use advanced (i.e. non-standard-layout) types in their interfaces, and some even stay with C-compatible code (sometimes in order to actually be compatible with C).
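One way to enforce the first option is a compile-time check on the types that cross the module boundary. In this sketch, FooData and FooRich are hypothetical types. Note that standard layout by itself does not protect against modules built with different packing options.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Plain data with fixed-width members: a reasonable candidate for a DLL interface.
struct FooData {
    std::int32_t id;
    double       scale;
    char         name[32];
};
static_assert(std::is_standard_layout<FooData>::value,
              "FooData must be standard layout");
static_assert(std::is_trivially_copyable<FooData>::value,
              "FooData must be trivially copyable");

// A virtual function makes the type non-standard-layout, and a std::string
// member makes its size depend on the standard library build.
struct FooRich {
    virtual ~FooRich() = default;
    std::string name;
};
static_assert(!std::is_standard_layout<FooRich>::value,
              "FooRich is not standard layout");
```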

C++ Passing std::string by reference to function in dll

I have the problem with passing by reference std::string to function in dll.
This is function call:
CAFC AFCArchive;
std::string sSSS = std::string("data\\gtasa.afc");
AFCER_PRINT_RET(AFCArchive.OpenArchive(sSSS.c_str()));
//AFCER_PRINT_RET(AFCArchive.OpenArchive(sSSS));
//AFCER_PRINT_RET(AFCArchive.OpenArchive("data\\gtasa.afc"));
This is function header:
#define AFCLIBDLL_API __declspec(dllimport)
AFCLIBDLL_API EAFCErrors CAFC::OpenArchive(std::string const &_sFileName);
I tried stepping through the function call in the debugger and looking at the value of _sFileName inside the function.
Inside the function, _sFileName holds an arbitrary value (for example, t4gs..\n\t).
I tried to detect heap corruption, but no error is reported.
The DLL was compiled with debug settings. The .exe program was compiled in debug too.
What's wrong? Help!
P.S. I used Visual Studio 2013. WinApp.
EDIT
I have changed the function header to this code:
AFCLIBDLL_API EAFCErrors CAFC::CreateArchive(char const *const _pArchiveName)
{
std::string _sArchiveName(_pArchiveName);
...
I really don't know how to fix this bug...
About the heap: it is allocated in the virtual memory of our process, right? In that case, the shared virtual memory is common.

The issue has little to do with STL, and everything to do with passing objects across application boundaries.
1) The DLL and the EXE must be compiled with the same project settings. You must do this so that the struct alignment and packing are the same, the members and member functions do not have different behavior, and even more subtle, the low-level implementation of a reference and reference parameters is exactly the same.
2) The DLL and the EXE must use the same runtime heap. To do this, you must use the DLL version of the runtime library.
You would have encountered the same problem if you created a class that does similar things (in terms of memory management) as std::string.
Probably the reason for the memory corruption is that the object in question (std::string in this case) allocates and manages dynamically allocated memory. If the application uses one heap, and the DLL uses another heap, how is that going to work if you instantiated the std::string in say, the DLL, but the application is resizing the string (meaning a memory allocation could occur)?

C++ classes like std::string can be used across module boundaries, but doing so places significant constraints on the modules. Simply put, both modules must use the same instance of the runtime.
So, for instance, if you compile one module with VS2013, then you must do so for the other module. What's more, you must link to the dynamic runtime rather than statically linking the runtime. The latter results in distinct runtime instances in each module.
And it looks like you are exporting member functions. That also requires a common shared runtime. And you should use __declspec(dllexport) on the entire class rather than individual members.
If you control both modules, then it is easy enough to meet these requirements. If you wish to let other parties produce one or other of the modules, then you are imposing a significant constraint on those other parties. If that is a problem, then consider using more portable interop. For example, instead of std::string use const char*.
Now, it's possible that you are already using a single shared instance of the dynamic runtime. In which case the error will be more prosaic. Perhaps the calling conventions do not match. Given the sparse level of detail in your question, it's hard to say anything with certainty.
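A sketch of the more portable interop suggested above, where only a const char* crosses the module boundary. The function name afc_open_archive and its return convention are hypothetical and not part of the asker's library.

```cpp
#include <cassert>
#include <string>

// Exported with C linkage; only built-in types cross the module boundary.
extern "C" int afc_open_archive(const char* file_name) {
    if (file_name == nullptr) return -1;
    // The std::string is created and destroyed entirely inside this module,
    // so only one runtime ever touches its memory.
    std::string path(file_name);
    return path.empty() ? -1 : 0;
}

// Caller side: convert at the boundary with c_str().
inline int open_from_client(const std::string& path) {
    return afc_open_archive(path.c_str());
}
```

The pointer only needs to stay valid for the duration of the call, which c_str() on a live string guarantees.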

I encountered a similar problem.
I resolved it by synchronizing the Configuration Properties -> C / C++ settings.
If you want debug mode:
Set _DEBUG definition in Preprocessor Definitions in both projects.
Set /MDd in Code Generation -> Runtime Library in both projects.
If you want release mode:
Remove _DEBUG definition in Preprocessor Definitions in both projects.
Set /MD in Code Generation -> Runtime Library in both projects.
By both projects I mean the exe and the dll project.
This works for me especially when I don't want to change any settings of the dll but only adjust to them.

C++ run time type mismatch with Python module?

Unfortunately I can't post the source code for this, but I will try to set it up as best I can.
I have a case where dynamic_cast fails to cast to a derived class type, and I know it should succeed (ie, I know the actual type of the instance).
Also typeid for a heap allocated object doesn't equal the typeid for a stack allocated object!! IE,
Foo mstack;
Foo*mheap = new Foo();
typeid(mstack) == typeid(*mheap); // returns FALSE!?
So there is clearly a RTTI problem somewhere. The class implementation (for both base and derived classes) is in one shared library, the malfunctioning code is in a second shared library which is loaded as a Python module in the Python interpreter (all on linux, same problem when using either gcc 4 or Intel C++ compiler). If I write a simple little test executable that links both shared libraries, everything works fine. I've tried --export-dynamic when linking the shared libraries without success (looks like it's intended for use with executables).
Anybody have any pointers for where to look? Is there something particular about the way Python uses dlopen() that causes this kind of problem?

This is caused by Python loading the extension module with RTLD_LOCAL, and the solution is to force Python to load it with RTLD_GLOBAL instead (see OP's comment).

memory address of a member variable of argument objects changes when dll function is called

class SomeClass
{
//some members
MemberClass one_of_the_mem_;
}
I have a function foo( SomeClass *object ) within a dll, it is being called from an exe.
Problem
address of one_of_the_mem_ changes during the time the dll call is dispatched.
Details:
before the call is made (from exe):
'&(this).one_of_the_mem_' - `0x00e913d0`
after - in the dll itself :
'&(this).one_of_the_mem_' - `0x00e913dc`
The address of object remains constant. It is only the member whose address shifts, by 0xc every time.
I want some pointers regarding how can I troubleshoot this problem.
Code :
Code from Exe
stat = module->init ( this,
object_a,
&object_b,
object_c,
con_dir
);
Code in DLL
Status_C ModuleClass( SomeClass *object, int index, Config *conf, const char* name)
{
_ASSERT(0); //DEBUGGING HOOK
...
Update 1:
I compared the Offsets of members following Michael's instruction and they are the same in both cases.
Update 2:
I found a way to dump the class layout and noticed the difference in size, I have to figure out why is that happening though.
linked is the question that I found to dump class layout.
Update 3:
Final Update : Solved the problem, much thanks to Michael Burr.
It turned out that one of the builds was using 32-bit time (_USE_32BIT_TIME_T was defined in it) and the other one was using 64-bit time. So a different layout was generated for the object; attached is the difference file.

Your DLL was probably compiled with different set of compiler options (or maybe even a slightly different header file) and the class layout is different as a result.
For example, if one was built using debug flags and other wasn't or even if different compiler versions were used. For example, the libraries used by different compiler versions might have subtle differences and if your class incorporates a type defined by the library you could have different layouts.
As a concrete example, with Microsoft's compiler, iterators and containers are sensitive to the release/debug, _SECURE_SCL on/off, and _HAS_ITERATOR_DEBUGGING on/off settings (at least up through VS 2008 - VS 2010 may have changed some of this to a certain extent). See http://connect.microsoft.com/VisualStudio/feedback/details/352699/secure-scl-is-broken-in-release-builds for some details.
These kinds of issues make using C++ classes across DLL boundaries a bit more fragile than using straight C interfaces. They can occur in C structures as well, but it seems like C++ libraries have these differences more often (I think that's the nature of having richer functionality).
Another layout-changing issue that occurs every now and then is having a different structure packing option in effect in the different compiles. One thing that can 'hide' this is that pragmas are often used in headers to set structure packing to a certain value, and sometimes you may come across a header that does this without changing it back to the default (or more correctly the previous setting). If you have such a header, it's easy to have it included in the build for one module, but not another.
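The packing pitfall can be sketched with two otherwise identical structs. The struct names are hypothetical. The push/pop pair is what a well-behaved header would use so that its setting does not leak into whatever is included afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct DefaultPacked {
    char         tag;
    std::int32_t value;
};

#pragma pack(push, 1)      // save the current packing, then switch to 1-byte packing
struct BytePacked {
    char         tag;
    std::int32_t value;
};
#pragma pack(pop)          // restore; a header that omits this changes later structs too

// Declared after the pop, so it uses the previous packing again.
struct AfterPop {
    char         tag;
    std::int32_t value;
};

static_assert(sizeof(BytePacked) == 5, "no padding with 1-byte packing");
static_assert(sizeof(AfterPop) == sizeof(DefaultPacked), "packing was restored");
```

If one module saw the struct under 1-byte packing and the other under the default, the member offsets would differ even though the source is identical.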

That sounds a bit weird; you should show more code. It shouldn't 'move' if it is being passed by reference; it sounds more like a copy of it is being made and the member function is being called on that copy.
Perhaps the DLL was compiled against a different version than the one you are referencing. Check and make sure the header file is for the same version as the DLL.
Recompile the library if you can.
