Firstly, I know that writing a class to disk is bad, but you should see some of our other code. D:
My question is: can I write a polymorphic class to disk and then read it in later and not get undefined behaviour? I am going to guess not because of vtables (I think these are generated at runtime and unique to the object?)
I.e.
class A {
virtual ~A() {}
virtual void foo() = 0;
};
class B : public A {
virtual ~B() {}
virtual void foo() {}
};
A * a = new B;
fwrite( a, 1, sizeof( B ), fp );
delete a;
a = new B;
fread( a, 1, sizeof( B ), fp );
a->foo();
delete a;
Thank-you!
I'll suggest you to take a look at Boost Serialization.
we use the term "serialization" to mean the reversible deconstruction of an arbitrary set of C++ data structures to a sequence of bytes. Such a system can be used to reconstitute an equivalent structure in another program context. Depending on the context, this might used implement object persistence, remote parameter passing or other facility.
You might be able to get away with it if such objects are always read back during the same execution of the program that wrote them (though I really don't recommend it). But if the data in the file must persist between different executions of the program, then using the raw bytes of the in-memory objects will almost certainly lead to significant problems.
Each vtable itself is generated at compile time and stored somewhere in the resulting executable. What each object instance contains is just a pointer to the appropriate vtable, and that pointer does not change for the lifetime of any given object. (Multiple inheritance can be a little more complicated, but for this discussion those details aren't relevant. The pointers are still constant.)
So if an object has a vtable pointer and you write the raw bytes of that object to disk, then the vtable pointer is written to disk as well. If you then read back those bytes during the same execution of the program and push them into an appropriate object, it may work since the vtable will still be in the same location and thus the vtable pointer will still be correct.
(However note that everything I just explained there is an implementation detail. While many compilers typically implement virtual functions in that manner, I don't think any of the exact details are guaranteed by the C++ standard. So there could be additional potential problems.)
Now, if this might be possible, why not store such objects for longer durations? Because you have no guarantee that any particular virtual table will be in the same memory location.
Some operating systems may change the memory layout for each execution of the same program. I don't know whether or not this actually affects virtual table locations, but that's certainly a serious risk.
Furthermore, if you ever compile a new version of the program, the location of each virtual table is completely up to the whims of the compiler. Changes to seemingly unrelated parts of the code may cause the compiler to place the relevant virtual tables in different locations. Obviously, that happening would completely break this scheme. And you have no way to prevent it from happening.
(And beyond the vtables, what if new data members need to be added to those objects in subsequent versions of the program? You might have to deal with reading past versions of raw objects' bytes into new versions that have new members or a different layout of members. That can get complicated and ugly as well as error prone.)
Now, even if you only intend to store the objects temporarily for each execution of the program. I still don't think it's a good idea. You are highly restricted as to what kinds of variables these objects can contain. No smart objects (std::string, std::vector, etc). No pointers to memory allocated per each object. Any strings must therefore be stored in raw character arrays. Other dynamic allocation would have to be turned into fixed members or member arrays. That means you lose a lot of C++'s benefits everywhere these objects are used.
Furthermore, these objects and the scheme of writing this directly to disk would need to be accompanied by comments and documentation warning of all the dangers I've described. Otherwise, some future programmer might unknowingly decide to add the wrong kind of data member. Or even worse, they might decide to try storing such objects longer than the execution of the program, opening them up to serious crashes and failures that might not happen until much later in the future (and probably at the worst possible time).
In the end, I strongly suggest using a scheme that stores the data in a format specifically intended for the file. As someone else already mentioned, Boost Serialization is a good option. If not that, there are may be other usable serialization libraries. Or else depending on your needs, you may be able to roll your own mechanism without too much trouble.
The problem is not the vtable. It is stored per class type, not per instance, so you won't write it to file. Basically your code should work (haven't tried it).
However, you should keep in mind that reading pointers/handles from file does not work.
Related
C++ supports dynamic binding through virtual mechanism. But as I understand the virtual mechanism is an implementation detail of the compiler and the standard just specifies the behaviors of what should happen under specific scenarios. Most compilers implement the virtual mechanism through the virtual table and virtual pointer. This is not about implementation detail of virtual pointers and table. My questions are:
Are there any compilers which implement dynamic dispatch of virtual functions in any other way other than the virtual pointer and virtual table mechanism? As far as I have seen most (read G++, Microsoft Visual Studio) implement it through virtual table, pointer mechanism. So practically are there any other compiler implementations at all?
The sizeof of any class with just a virtual function will be size of an pointer (vptr inside this) on that compiler. So given that virtual pointer and TBL mechanism itself is compiler implementation, will this statement I made above be always true?
It is not true that vtable pointers in objects are always the most efficient. My compiler for another language used to use in-object pointers for similar reasons but no longer does: instead it uses a separate data structure which maps the object address to the required meta-data: in my system this happens to be shape information for use by the garbage collector.
This implementation costs a bit more storage for a single simple object, is more efficient for complex objects with many bases, and it is vastly more efficient for arrays, since only a single entry is required in the mapping table for all objects in the array. My particular implementation can also find the meta-data given a pointer to any point interior to the object.
The actual lookup is extremely fast, and the storage requirements very modest, because I am using the best data structure on the planet: Judy arrays.
I also know of no C++ compiler using anything other than vtable pointers, but it is not the only way. In fact, the initialisation semantics for classes with bases make any implementation messy. This is because the complete type has to see-saw around as the object is constructed. As a consequence of these semantics, complex mixin objects lead to massive sets of vtables being generated, large objects, and slow object initialisation. This probably isn't a consequence of the vtable technique as much as needing to slavishly follow the requirement that the run-time type of a subobject be correct at all times. Actually there's no good reason for this during construction, since constructors are not methods and can't sensibly use virtual dispatch: this isn't so clear to me for destruction since destructors are real methods.
To my knowledge, all C++ implementations use a vtable pointer, although it would be quite easy (and perhaps not so bad perf wise as you might think given caches) to keep a small type-index in the object (1-2 B) and subsequently obtain the vtable and type information with a small table lookup.
Another interesting approach might be BIBOP (http://foldoc.org/BIBOP) -- big bag of pages -- although it would have issues for C++. Idea: put objects of the same type on a page. Get a pointer to the type descriptor / vtable at the top of the page by simply and'ing off the less signficant bits of the object pointer. (Doesn't work well for objects on the stack, of course!)
Another other approach is to encode certain type tags/indices in the object pointers themselves. For example, if by construction all objects are 16-byte aligned, you can use the 4 LSBs to put a 4-bit type tag in there. (Not really enough.) Or (particularly for embedded systems) if you have guaranteed unused more-significant-bits in addresses, you can put more tag bits up there, and recover them with a shift and mask.
While both these schemes are interesting (and sometimes used) for other language implementations, they are problematic for C++. Certain C++ semantics, such as which base class virtual function overrides are called during (base class) object construction and destruction, drive you to a model where there is some state in the object that you modify as you enter base class ctors/dtors.
You may find my old tutorial on the Microsoft C++ object model implementation interesting.
http://www.openrce.org/articles/files/jangrayhood.pdf
Happy hacking!
I don't think there are any modern compilers with an approach other than vptr/vtable. Indeed, it would be hard to figure out something else that is not just plain inefficient.
However, there is still a pretty large room for design tradeoffs within that approach. Maybe especially regarding how virtual inheritance is handled. So it makes sense to make this implementation-defined.
If you are interested in this kind of stuff, I strongly suggest reading Inside the C++ Object Model.
sizeof class depends on the compiler. If you want portable code, don't make any assumptions.
Are there any compilers which implement Virtual Mechanism in any other way other than the virtual pointer and virtual table mechanism? As far as i have seen most(read g++,Microsoft visual studio) implement it through virtual table, pointer mechanism. So practically are there any other compiler implementations at all?
All current compilers that I know of use the vtable mechanism.
This is an optimization that's possible because C++ is statically type checked.
In some more dynamic languages there is instead a dynamic search up the base class chain(s), searching for an implementation of a member function that's called virtually, starting in the most derived class of the object. For example, that's how it worked in original Smalltalk. And the C++ standard describes the effect of a virtual call as if such a search had been used.
In Borland/Turbo Pascal in the 1990's such dynamic search was employed for finding handlers of Windows API "window messages". And I think possibly the same in Borland C++. It was in addition to the normal vtable mechanism, used solely for message handlers.
If it was used in Borland/Turbo C++ – I can't remember – then it was in support of a language extensions that allowed you to associate message id's with message handler functions.
The sizeof of any class with just a virtual function will be size of an pointer(vptr inside the this) on that compiler, So given that virtual ptr and tbl mechanism itself is compiler implementation, will this statement I made above be always true?
Formally no (even with assumption of vtable mechanism), it depends on the compiler. Since the standard doesn't require the vtable mechanism it says nothing about placement of vtable pointer in each object. And other rules let the compiler freely add padding, unused bytes, at the end.
But in practice perhaps. ;-)
However it's not something that you should rely on, or that you need to rely on. But in the other direction you can require this, for example if you're defining an ABI. Then any compiler that doesn't, simply doesn't conform to your requirement.
Cheers & hth.,
Are there any compilers which implement Virtual Mechanism in any other way other than the virtual pointer and virtual table mechanism? As far as i have seen most(read g++,Microsoft visual studio) implement it through virtual table, pointer mechanism. So practically are there any other compiler implementations at all?
None that I'm aware of C++ compilers using, though you might find it interesting to read about Binary Tree Dispatch. If you're interested in exploiting the expectation of virtual dispatch tables in any way, you should be aware that compilers can - where the types are known at compile time - sometimes resolve virtual function calls at compile time, so may not consult the table.
The sizeof of any class with just a virtual function will be size of an pointer(vptr inside the this) on that compiler, So given that virtual ptr and tbl mechanism itself is compiler implementation, will this statement I made above be always true?
Assuming no base classes with their own virtual members, and no virtual base classes, it's overwhelmingly likely to be true. Alternatives can be envisaged - such as whole-program analysis revealing only one member in the class heirarchy, and a switch to compile-time dispatch. If run-time dispatch is required, it's hard to imagine why any compiler would introduce further indirection. Still, the Standard deliberately doesn't stipulate these things precisely so that implementations can vary, or be varied in future.
In trying to imagine an alternative scheme, I have come up with the following, along the lines of Yttril's answer. As far as I'm aware, no compiler uses it!
Given a sufficiently large virtual address space and flexible OS memory allocation routines, it would be possible for new to allocate objects of different types in fixed, non-overlapping address ranges. Then the type of an object could be inferred quickly from its address using a right-shift operation, and the result used to index a table of vtables, thus saving 1 vtable pointer per object.
At first glance this scheme might seem to run into problems with stack-allocated objects, but this can be handled cleanly:
For each stack-allocated object, the compiler adds code that adds a record to a global array of (address range, type) pairs when the object is created and removes the record when it is destroyed.
The address range comprising the stack would map to a single vtable containing a large number of thunks that read the this pointer, scan the array to find the corresponding type (vptr) for the object at that address, and call the corresponding method in the vtable pointed to. (I.e. the 42nd thunk will call the 42nd method in the vtable -- if the most virtual functions used in any class is n, then at least n thunks are required.)
This scheme obviously incurs non-trivial overhead (at least O(log n) for the lookup) for virtual method calls on stack-based objects. In the absence of arrays or composition (containment within another object) of stack-based objects, a simpler and faster approach can be used in which the vptr is placed on the stack immediately before the object (note that it is not considered part of the object and does not contribute to its size as measured by sizeof). In this case thunks simply subtract sizeof (vptr) from this to find the correct vptr to use, and forward as before.
IIRC Eiffel uses a different approach and all overrides of a method end up merged and compiled in the same address with a prologue where the object type is checked (so every object must have a type ID, but it's not a pointer to a VMT). This for C++ would require of course that the final function is created at link time.
I don't know any C++ compiler that uses this approach, however.
I've never heard of or seen any compiler that uses any alternative implementation. The reason that vtables are so popular is because that not only is it the most efficient implementation, but it's also the easiest design and most obvious implementation.
On pretty much any compiler you care to use, it's almost certainly true. However, it's not guaranteed and not always true- you can't depend on it, even though it's pretty much always the case. Your favourite compiler could also alter it's alignment, increasing it's size, for funsies, without telling you. From memory, it can also insert whatever debug information and whatever it likes.
C++/CLI deviates from both assumptions. If you define a ref class, it doesn't get compiled into machine code at all; instead, the compiler compiles it into .NET managed code. In the intermediate language, classes are a built-in feature, and the set of virtual methods is defined in the metadata, rather than a method table.
The specific strategy to implement object layout and dispatch depends on the VM. In Mono, an object containing just one virtual method has not the size of one pointer, but needs two pointers in the MonoObject struct; the second one for the synchronization of the object. As this is implementation-defined and also not really useful to know, sizeof is not supported for ref classes in C++/CLI.
First, there were mentioned Borland's proprietary extension to C++, Dynamic Dispatch Virtual Tables (DDVT), and you can read something about it in a file named DDISPATC.ZIP. Borland Pascal had both virtual and dynamic methods, and Delphi introduced yet another "message" syntax, similar to dynamic, but for messages. At this point I'm not sure if Borland C++ had the same features. There was no multiple inheritance in either Pascal or Delphi, so Borland C++ DDVT might be different from either Pascal or Delphi.
Second, in 1990s and a bit earlier there was experimenting with different object models, and Borland was not the most advanced one. I personally think that shutting down IBM SOMobjects did a damage to the world that we all still suffering from. Before shutting down SOM there were experiments with Direct-to-SOM C++ compilers. So instead of C++'s way of invoking methods SOM is used. It is in many ways similar to C++ vtable, with several exceptions. First, to prevent fragile base class problem, programs do not use offsets inside of vtable, because they don't know this offset. It can change if base class introduces new methods. Instead, callers invoke a thunk created in runtime that has this knowledge in its assembly code. And there is one more difference. In C++, when multiple inheritance is used, an object can contain several VMTs IIRC. In contrast to C++, each SOM object has just one VMT, so dispatch code should be different from "call dword ptr [VMT+offset]".
There is a document related to SOM, Release-to-Release Binary Compatibility in SOM. You can find comparison of SOM with another projects I know little of, like Delta/C++ and Sun OBI. They solve a subset of problems that SOM solves, and by doing so they are also having somewhat tweaked invokation code.
I have recently found Visual Age C++ v3.5 for Windows compiler fragment enough to get things running and actually touch it. Most of users are not likely to get OS/2 VM just to play with DTS C++, but having Windows compiler is completely another matter. VAC v3.5 is the first and the last version to support Direct-to-SOM C++ feature. VAC v3.6.5 and v4.0 are not appropriate.
Download VAC 3.5 fixpak 9 from IBM FTP. This fixpak contain many files, so you don't even need to full compiler (I have 3.5.7 distro, but fixpak 9 was big enough to do some tests).
Unpack to e. g. C:\home\OCTAGRAM\DTS
Start command line and run subsequent commands there
Run: set SOMBASE=C:\home\OCTAGRAM\DTS\ibmcppw
Run: C:\home\OCTAGRAM\DTS\ibmcppw\bin\SOMENV.BAT
Run: cd C:\home\OCTAGRAM\DTS\ibmcppw\samples\compiler\dts
Run: nmake clean
Run: nmake
hhmain.exe and its dll are in different directories, so we must make them find each other somehow; since I was doing several experiments, I executed "set PATH=%PATH%;C:\home\OCTAGRAM\DTS\ibmcppw\samples\compiler\dts\xhmain\dtsdll" once, but you can just copy dll near to hhmain.exe
Run: hhmain.exe
I've got an output this way:
Local anInfo->x = 5
Local anInfo->_get_x() = 5
Local anInfo->y = A
Local anInfo->_get_y() = B
{An instance of class info at address 0092E318
}
Tony D's answer correctly points out that compilers are allowed to use whole-program analysis to replace a virtual function call with a static call to the unique possible function implementation; or to compile obj->method() into the equivalent of
if (auto frobj = dynamic_cast<FrequentlyOccurringType>(obj)) {
frobj->FrequentlyOccurringType::method(); // static dispatch on hot path
} else {
obj->method(); // vtable dispatch on cold path
}
Karel Driesen and Urs Hölzle wrote a really fascinating paper way back in 1996 in which they simulated the effect of perfect whole-program optimization on typical C++ applications: "The Direct Cost of Virtual Function Calls in C++". (The PDF is available for free if you Google for it.) Unfortunately, they only benchmarked vtable dispatch versus perfect static dispatch; they didn't compare it to binary tree dispatch.
They did point out that there are actually two kinds of vtables, when you're talking about languages (like C++) that support multiple inheritance. With multiple inheritance, when you call a virtual method that is inherited from the second base class, you need to "fix up" the object pointer so it points to an instance of the second base class. This fixup offset can be stored as data in the vtable, or it can be stored as code in a "thunk". (See the paper for more details.)
I believe all decent compilers these days use thunks, but it did take 10 or 20 years for that market penetration to reach 100%.
I'm attempting to implement a Save/Load feature into my small game. To accomplish this I have a central class that stores all the important variables of the game such as position, etc. I then save this class as binary data to a file. Then simply load it back for the loading function. This seems to work MOST of the time, but if I change certain things then try to do a save/load the program will crash with memory access violations. So, are classes guaranteed to have the same structure in memory on every run of the program or can the data be arranged at random like a struct?
Response to Jesus - I mean the data inside the class, so that if I save the class to disk, when I load it back, will everything fit nicely back.
Save
fout.write((char*) &game,sizeof Game);
Load
fin.read((char*) &game, sizeof Game);
Your approach is extremely fragile. With many restrictions, it can work. These restrictions are not worth subjecting your users (or yourself!) to in typical cases.
Some Restrictions:
Never refer to external memory (e.g. a pointer or reference)
Forbid ABI changes/differences. Common case: memory layout and natural alignment on 32 vs 64 will vary. The user will need a new 'game' for each ABI.
Not endian compatible.
Altering your type's layouts will break your game. Changing your compiler options can do this.
You're basically limited to POD data.
Use offsets instead of pointers to refer to internal data (This reference would be in contiguous memory).
Therefore, you can safely use this approach in extremely limited situations -- that typically applies only to components of a system, rather than the entire state of the game.
Since this is tagged C++, "boost - Serialization" would be a good starting point. It's well tested and abstracts many of the complexities for you.
Even if this would work, just don't do it. Define a file format at the byte-level and write sensible 'convert to file format' and 'convert from file format' functions. You'll actually know the format of the file. You'll be able to extend it. Newer versions of the program will be able to read files from older versions. And you'll be able to update your platform, build tools, and classes without fear of causing your program to crash.
Yes, classes and structures will have the same layout in memory every time your program runs., although I can't say if the standard enforces this. The machine code generated by C++ compilers use "hard-coded" offsets to access type fields, so they are fixed. Realistically, the layout will only change if you modify the C++ class definition (field sizes, order, virtual methods, etc.), compile with a different compiler or change compiler options.
As long as the type is POD and without pointer fields, it should be safe to simply dump it to a file and read it back with the exact same program. However, because of the above-mentionned concerns, this approach is quite inflexible with regard to versionning and interoperability.
[edit]
To respond to your own edit, do not do this with your "Game" object! It certainly has pointers to other objects, and those objects will not exist anymore in memory or will be elsewhere when you'll reload your file.
You might want to take a look at this.
Classes are not guaranteed to have the same structure in memory as pointers can point to different locations in memory each time a class is created.
However, without posting code it is difficult to say with certainty where the problem is.
I open a piece of shared memory and get a handle of it. I'm aware there are several vectors of data stored in the memory. I'd like to access those vectors of data and perform some actions on them. How can I achieve this? Is it appropriate to treat the shared memory as an object so that we can define those vectors as fields of the object and those needed actions as member functions of the object?
I've never dealt with shared memory before. To make things worse, I'm new to C++ and POSIX. Could someone please provide some guidance? Simple examples would be greatly appreciated.
int my_shmid = shmget(key,size,shmflgs);
...
void* address_of_my_shm1 = shat(my_shmid,0,shmflags);
Object* optr = static_cast<Object*>(address_of_my_shm1);
...or, in some other thread/process to which you arranged to pass the address_of_my_shm1
...by some other means
void* address_of_my_shm2 = shat(my_shmid,address_of_my_shm1,shmflags);
You may want to assert that address_of_shm1 == address_of_shm2. But note that I say "may" - you don't actually have to do this. Some types/structs/classes can be read equally well at different addresses.
If the object will appear in different address spaces, then pointers outside the shhm in process A may not point to the same thing as in process B. In general, pointers outside the shm are bad. (Virtual functions are pointers outside the object, and outside the shm. Bad, unless you have other reason to trust them.)
Pointers inside the shm are usable, if they appear at the same address.
Relative pointers can be quite usable, but, again, so long as they point only inside the shm. Relative pointers may be relative to the base of an object, i.e. they may be offsets. Or they may be relative to the pointer itself. You can define some nice classes/templates that do these calculations, with casting going on under the hood.
Sharing of objects through shmem is simplest if the data is just POD (Plain Old Data). Nothing fancy.
Because you are in different processes that are not sharing the whole address space, you may not be guaranteed that things like virtual functions will appear at the same address in all processes using the shm shared memory segment. So probably best to avoid virtual functions. (If you try hard and/or know linkage, you may in some circumstances be able to share virtual functions. But that is one of the first things I would disable if I had to debug.)
You should only do this if you are aware of your implementation's object memory model. And if advanced (for C++) optimizations like splitting structs into discontiguous hot and cold parts are disabled. Since such optimizations rae arguably not legal for C++, you are probably safe.
Obviously you are better off if you are casting to the same object type/class on all sides.
You can get away with non-virtual functions. However, note that it can be quite easy to have the same class, but different versions of the class - e.g. differing in size, e.g. adding a new field and changing the offsets of all of the other fields - so you need to be quite careful to ensure all sides are using the same definitions and declarations.
Pointers cannot be persisted directly to file, because they point to absolute addresses. To address this issue I wrote a relative_ptr template that holds an offset instead of an absolute address.
Based on the fact that only trivially copyable types can be safely copied bit-by-bit, I made the assumption that this type needed to be trivially copyable to be safely persisted in a memory-mapped file and retrieved later on.
This restriction turned out to be a bit problematic, because the compiler generated copy constructor does not behave in a meaningful way. I found nothing that forbid me from defaulting the copy constructor and making it private, so I made it private to avoid accidental copies that would lead to undefined behaviour.
Later on, I found boost::interprocess::offset_ptr whose creation was driven by the same needs. However, it turns out that offset_ptr is not trivially copyable because it implements its own custom copy constructor.
Is my assumption that the smart pointer needs to be trivially copyable to be persisted safely wrong?
If there's no such restriction, I wonder if I can safely do the following as well. If not, exactly what are the requirements a type must fulfill to be usable in the scenario I described above?
struct base {
int x;
virtual void f() = 0;
virtual ~base() {} // virtual members!
};
struct derived : virtual base {
int x;
void f() { std::cout << x; }
};
using namespace boost::interprocess;
void persist() {
file_mapping file("blah");
mapped_region region(file, read_write, 128, sizeof(derived));
// create object on a memory-mapped file
derived* d = new (region.get_address()) derived();
d.x = 42;
d->f();
region.flush();
}
void retrieve() {
file_mapping file("blah");
mapped_region region(file, read_write, 128, sizeof(derived));
derived* d = region.get_address();
d->f();
}
int main() {
persist();
retrieve();
}
Thanks to all those that provided alternatives. It's unlikely that I will be using something else any time soon, because as I explained, I already have a working solution. And as you can see from the use of question marks above, I'm really interested in knowing why Boost can get away without a trivially copyable type, and how far can you go with it: it's quite obvious that classes with virtual members will not work, but where do you draw the line?
To avoid confusion let me restate the problem.
You want to create an object in mapped memory in such a way that after the application is closed and reopened the file can be mapped once again and object used without further deserialization.
POD is kind of a red herring for what you are trying to do. You don't need to be binary copyable (what POD means); you need to be address-independent.
Address-independence requires you to:
avoid all absolute pointers.
only use offset pointers to addresses within the mapped memory.
There are a few correlaries that follow from these rules.
You can't use virtual anything. C++ virtual functions are implemented with a hidden vtable pointer in the class instance. The vtable pointer is an absolute pointer over which you don't have any control.
You need to be very careful about the other C++ objects your address-independent objects use. Basically everything in the standard library may break if you use them. Even if they don't use new they may use virtual functions internally, or just store the address of a pointer.
You can't store references in the address-independent objects. Reference members are just syntactic sugar over absolute pointers.
Inheritance is still possible but of limited usefulness since virtual is outlawed.
Any and all constructors / destructors are fine as long as the above rules are followed.
Even Boost.Interprocess isn't a perfect fit for what you're trying to do. Boost.Interprocess also needs to manage shared access to the objects, whereas you can assume that you're only one messing with the memory.
In the end it may be simpler / saner to just use Google Protobufs and conventional serialization.
Yes, but for reasons other than the ones that seem to concern you.
You've got virtual functions and a virtual base class. These lead to a host of pointers created behind your back by the compiler. You can't turn them into offsets or anything else.
If you want to do this style of persistence, you need to eschew 'virtual'. After that, it's all a matter of the semantics. Really, just pretend you were doing this in C.
Even PoD has pitfalls if you are interested in interoperating across different systems or across time.
You might look at Google Protocol Buffers for a way to do this in a portable fashion.
Not as much an answer as a comment that grew too big:
I think it's going to depend on how much safety you're willing to trade for speed/ease of usage. In the case where you have a struct like this:
struct S { char c; double d; };
You have to consider padding and the fact that some architectures might not allow you to access a double unless it is aligned on a proper memory address. Adding accessor functions and fixing the padding tackles this and the structure is still memcpy-able, but now we're entering territory where we're not really gaining much of a benefit from using a memory mapped file.
Since it seems like you'll only be using this locally and in a fixed setup, relaxing the requirements a little seems OK, so we're back to using the above struct normally. Now does the function have to be trivially copyable? I don't necessarily think so, consider this (probably broken) class:
1 #include <iostream>
2 #include <utility>
3
4 enum Endian { LittleEndian, BigEndian };
5 template<typename T, Endian e> struct PV {
6 union {
7 unsigned char b[sizeof(T)];
8 T x;
9 } val;
10
11 template<Endian oe> PV& operator=(const PV<T,oe>& rhs) {
12 val.x = rhs.val.x;
13 if (e != oe) {
14 for(size_t b = 0; b < sizeof(T) / 2; b++) {
15 std::swap(val.b[sizeof(T)-1-b], val.b[b]);
16 }
17 }
18 return *this;
19 }
20 };
It's not trivially copyable and you can't just use memcpy to move it around in general, but I don't see anything immediately wrong with using a class like this in the context of a memory mapped file (especially not if the file matches the native byte order).
Update:
Where do you draw the line?
I think a decent rule of thumb is: if the equivalent C code is acceptable and C++ is just being used as a convenience, to enforce type-safety, or proper access it should be fine.
That would make boost::interprocess::offset_ptr OK since it's just a helpful wrapper around a ptrdiff_t with special semantic rules. In the same vein struct PV above would be OK as it's just meant to byte swap automatically, though like in C you have to be careful to keep track of the byte order and assume that the structure can be trivially copied. Virtual functions wouldn't be OK as the C equivalent, function pointers in the structure, wouldn't work. However something like the following (untested) code would again be OK:
struct Foo {
unsigned char obj_type;
void vfunc1(int arg0) { vtables[obj_type].vfunc1(this, arg0); }
};
That is not going to work. Your class Derived is not a POD, therefore it depends on the compiler how it compiles your code. In another words - do not do it.
by the way, where are you releasing your objects? I see are creaing in-place your objects, but you are not calling destructor.
Absolutely not. Serialisation is a well established functionality that is used in numerous of situations, and certainly does not require PODs. What it does require is that you specify a well defined serialisation binary interface (SBI).
Serialisation is needed anytime your objects leave the runtime environment, including shared memory, pipes, sockets, files, and many other persistence and communication mechanisms.
Where PODs help is where you know you are not leaving the processor architecture. If you will never be changing versions between writers of the object (serialisers) and readers (deserialisers) and you have no need for dynamically-sized data, then PODs allow easy memcpy based serialisers.
Commonly, though, you need to store things like strings. Then, you need a way to store and retrieve the dynamic information. Sometimes, 0 terminated strings are used, but that is pretty specific to strings, and doesn't work for vectors, maps, arrays, lists, etc. You will often see strings and other dynamic elements serialized as [size][element 1][element 2]… this is the Pascal array format. Additionally, when dealing with cross machine communications, the SBI must define integral formats to deal with potential endianness issues.
Now, pointers are usually implemented by IDs, not offsets. Each object that needs to be serialise can be given an incrementing number as an ID, and that can be the first field in the SBI. The reason you usually don't use offsets is because you may not be able to easily calculate future offsets without going through a sizing step or a second pass. IDs can be calculated inside the serialisation routine on first pass.
Additional ways to serialize include text based serialisers using some syntax like XML or JSON. These are parsed using standard textual tools that are used to reconstruct the object. These keep the SBI simple at the cost of pessimising performance and bandwidth.
In the end, you typically build an architecture where you build serialisation streams that take your objects and translate them member by member to the format of your SBI. In the case of shared memory, it typically pushes the members directly on to the memory after acquiring the shared mutex.
This often looks like
void MyClass::Serialise(SerialisationStream & stream)
{
stream & member1;
stream & member2;
stream & member3;
// ...
}
where the & operator is overloaded for your different types. You may take a look at boost.serialize for more examples.
In my application I have quite some void-pointers (this is because of historical reasons, application was originally written in pure C). In one of my modules I know that the void-pointers points to instances of classes that could inherit from a known base class, but I cannot be 100% sure of it. Therefore, doing a dynamic_cast on the void-pointer might give problems. Possibly, the void-pointer even points to a plain-struct (so no vptr in the struct).
I would like to investigate the first 4 bytes of the memory the void-pointer is pointing to, to see if this is the address of the valid vtable. I know this is platform, maybe even compiler-version-specific, but it could help me in moving the application forward, and getting rid of all the void-pointers over a limited time period (let's say 3 years).
Is there a way to get a list of all vtables in the application, or a way to check whether a pointer points to a valid vtable, and whether that instance pointing to the vtable inherits from a known base class?
I would like to investigate the first
4 bytes of the memory the void-pointer
is pointing to, to see if this is the
address of the valid vtable.
You can do that, but you have no guarantees whatsoever it will work. Y don't even know if the void* will point to the vtable. Last time I looked into this (5+ years ago) I believe some compiler stored the vtable pointer before the address pointed to by the instance*.
I know this is platform, maybe even
compiler-version-specific,
It may also be compiler-options speciffic, depending on what optimizations you use and so on.
but it could help me in moving the
application forward, and getting rid
of all the void-pointers over a
limited time period (let's say 3
years).
Is this the only option you can see for moving the application forward? Have you considered others?
Is there a way to get a list of all
vtables in the application,
No :(
or a way to check whether a pointer
points to a valid vtable,
No standard way. What you can do is open some class pointers in your favorite debugger (or cast the memory to bytes and log it to a file) and compare it and see if it makes sense. Even so, you have no guarantees that any of your data (or other pointers in the application) will not look similar enough (when cast as bytes) to confuse whatever code you like.
and whether that instance pointing to
the vtable inherits from a known base
class?
No again.
Here are some questions (you may have considered them already). Answers to these may give you more options, or may give us other ideas to propose:
how large is the code base? Is it feasible to introduce global changes, or is functionality to spread-around for that?
do you treat all pointers uniformly (that is: are there common points in your source code where you could plug in and add your own metadata?)
what can you change in your sourcecode? (If you have access to your memory allocation subroutines or could plug in your own for example you may be able to plug in your own metadata).
If different data types are cast to void* in various parts of your code, how do you decide later what is in those pointers? Can you use the code that discriminates the void* to decide if they are classes or not?
Does your code-base allow for refactoring methodologies? (refactoring in small iterations, by plugging in alternate implementations for parts of your code, then removing the initial implementation and testing everything)
Edit (proposed solution):
Do the following steps:
define a metadata (base) class
replace your memory allocation routines with custom ones which just refer to the standard / old routines (and make sure your code still works with the custom routines).
on each allocation, allocate the requested size + sizeof(Metadata*) (and make sure your code still works).
replace the first sizeof(Metadata*) bytes of your allocation with a standard byte sequence that you can easily test for (I'm partial to 0xDEADBEEF :D). Then, return [allocated address] + sizeof(Metadata*) to the application. On deallocation, take the recieved pointer, decrement it by `sizeof(Metadata*), then call the system / previous routine to perform the deallocation. Now, you have an extra buffer allocated in your code, specifically for metadata on each allocation.
In the cases you're interested in having metadata for, create/obtain a metadata class pointer, then set it in the 0xDEADBEEF zone. When you need to check metadata, reinterpret_cast<Metadata*>([your void* here]), decrement it, then check if the pointer value is 0xDEADBEEF (no metadata) or something else.
Note that this code should only be there for refactoring - for production code it is slow, error prone and generally other bad things that you do not want your production code to be. I would make all this code dependent on some REFACTORING_SUPPORT_ENABLED macro that would never allow your Metadata class to see the light of a production release (except for testing builds maybe).
I would say it is not possible without related reference (header declaration).
If you want to replace those void pointers to correct interface type, here is what I think to automate it:
Go through your codebase to get a list of all classes that has virtual functions, you could do this fast by writing script, like Perl
Write an function which take a void* pointer as input, and iterate over those classes try to dynamic_cast it, and log information if succeeded, such as interface type, code line
Call this function anywhere you used void* pointer, maybe you could wrap it with a macro so you could get file, line information easy
Run a full automation (if you have) and analyse the output.
The easier way would be to overload operator new for your particular base class. That way, if you know your void* pointers are to heap objects, then you can also with 100% certainty determine whether they're pointing to your object.