C++ (semi) Reflection for file save/load? (Hack?) - c++

I have a bunch of structs in C++. I'd like to save them to a file and load them up again. The problem is that a few of the struct members are pointers to base classes (or base structs), so I'd need a way to figure out the actual type and create it. They really are just POD: they all have public members and no constructors.
What is the easiest way to save and load them from a file? I have a LOT of structs, and the only types I use are ints, pointers, or C strings. I am thinking I could do some macro hacks, but really I have no idea what I should do.

Have you tried the Boost serialization library?

Don't roll your own here; use something well developed and tested. One option is Protocol Buffers.

The pointers pose a specific issue: I suppose that multiple structs may actually refer to the same object, and that you'd like a single object to be recreated when deserializing...
The first idea, to avoid boilerplate code, is to use a compile-time reflection tool:
BOOST_FUSION_ADAPT_STRUCT
BOOST_FUSION_ADAPT_STRUCT_NAMED
Those 2 macros will generate some wicked information on your struct so that you can then use them with Fusion algorithms, which cross the gap between compile-time and run-time.
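If pulling in Boost is not an option, the same idea (list each struct's fields once, then generate the save/load code from that list) can be sketched with a plain preprocessor X-macro. To be clear, this is not Boost.Fusion, just a hand-rolled stand-in; the struct Player and the macro PLAYER_FIELDS are made up for illustration, and only int fields are handled.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical struct: its field list is written exactly once, as an X-macro.
#define PLAYER_FIELDS(X) \
    X(int, id)           \
    X(int, score)        \
    X(int, level)

struct Player {
#define DECLARE_FIELD(type, name) type name;
    PLAYER_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// Writes every field as "name value\n", in declaration order.
inline void save(std::ostream& out, const Player& p) {
#define WRITE_FIELD(type, name) out << #name << ' ' << p.name << '\n';
    PLAYER_FIELDS(WRITE_FIELD)
#undef WRITE_FIELD
}

// Reads the fields back in the same order; the stored names are read and discarded.
inline void load(std::istream& in, Player& p) {
    std::string ignored_name;
#define READ_FIELD(type, name) in >> ignored_name >> p.name;
    PLAYER_FIELDS(READ_FIELD)
#undef READ_FIELD
}

// Usage: round-trip one struct through an in-memory stream.
inline void roundtrip_demo() {
    Player a{7, 1200, 3};
    std::stringstream buf;
    save(buf, a);
    Player b{};
    load(buf, b);
    assert(b.id == 7 && b.score == 1200 && b.level == 3);
}
```

Adding a field then means touching only the field list. Pointer members would still need the ID treatment described in the rest of this answer.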
Now, you need something that will be able to serialize and deserialize your data. Deserialization is usually a bit more difficult, though here you have the advantage of no polymorphism (which always makes things difficult).
Normally, on a first pass you identify the graph of objects to serialize, assign each of them an ID, and write this ID in place of the pointer when serializing. For deserializing, you use a three-column map:
the map is ID -> (pointer to allocated object, list of pointers that could not be set yet)
allocate the objects one by one, recording each allocated object's pointer in the map under its ID
when you need to deserialize an ID, look it up in the map; if the object is not there yet, put a pointer to your pointer in the corresponding list
when you put the pointer to the allocated object in the map, take the time to fill in all the 'not set' pointers (and clear the list at the same time)
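The deserialization steps above can be sketched roughly as follows. This is only an illustration under assumptions: Node, Record, Slot and load_nodes are hypothetical names, the records are assumed to be already parsed from the file, and ID 0 is assumed to mean a null pointer.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical POD with one pointer member.
struct Node {
    int   value;
    Node* next;
};

// On-disk form of a Node: the pointer is replaced by the target's ID (0 = null).
struct Record {
    int id;
    int value;
    int next_id;
};

// One map entry: the allocated object plus the pointers still waiting for it.
struct Slot {
    Node*               object = nullptr;
    std::vector<Node**> pending;
};

// Rebuilds all nodes. Records may come in any order and may form cycles.
// The returned vector owns the nodes, in record order.
inline std::vector<std::unique_ptr<Node>> load_nodes(const std::vector<Record>& records) {
    std::map<int, Slot> slots;
    std::vector<std::unique_ptr<Node>> owned;
    owned.reserve(records.size());

    for (const Record& r : records) {
        owned.emplace_back(new Node{r.value, nullptr});
        Node* node = owned.back().get();

        // Register the new object and patch every pointer that was waiting for it.
        Slot& self = slots[r.id];
        self.object = node;
        for (Node** waiting : self.pending) *waiting = node;
        self.pending.clear();

        // Resolve this node's own pointer, or queue it if the target is not loaded yet.
        if (r.next_id != 0) {
            Slot& target = slots[r.next_id];
            if (target.object) node->next = target.object;
            else               target.pending.push_back(&node->next);
        }
    }
    return owned;
}

// Usage: three records forming a cycle 1 -> 2 -> 3 -> 1.
inline void fixup_demo() {
    std::vector<Record> records = {{1, 10, 2}, {2, 20, 3}, {3, 30, 1}};
    auto nodes = load_nodes(records);
    assert(nodes[0]->next == nodes[1].get());
    assert(nodes[2]->next == nodes[0].get());
}
```

A pointer whose target ID never shows up in the input simply stays null in this sketch; real code would probably want to report that as a corrupt file.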
Of course, it's better to have a framework handle this for you. You may try out s11n; if I remember correctly, it handles cycles of references.

Related

What is a good way to efficiently store a list of C++ objects from different classes using Flatbuffers?

I have a list of C++ objects derived from different classes. I would like to use Flatbuffers to persist/restore them.
An obvious way to do this is with a union, but I do not want to waste bytes or use a vector of pointers to a superclass (assuming this is an option in Flatbuffers).
Another way is to store the concatenated bytes of all the objects and a separate map from the object index to the byte offset and class type, but perhaps there is a better way.
Another way would be to use internal links in the objects which allow them to refer to each other. This would allow efficient storage of arbitrary data structures. This is currently my preferred route, but I am not sure if it is possible, and it may not be the best option. Perhaps if I overrode the pack/unpack mechanism it might be possible to place byte offsets into the link fields. Usage of full reflection would be OK.
any assistance would be appreciated!
thanks
As per Mr van Oortmerssen, creator of Flatbuffers, it is possible to create a union of tables, where each table corresponds to an object class, then reference that union in each object in the list.
Example at https://github.com/batwicket/flatbuffers_list_test.

C++ implementation of game tree

I'm going to represent a chess game as a C++ structure. I think the best option is a tree structure (because at each depth there are several possible moves).
Is it a good approach?
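#include <vector>

struct TreeElement {
    SomeMoveType move;                   // the move that leads to this node
    TreeElement *parent;                 // null for the root
    std::vector<TreeElement*> children;  // one child per possible next move
};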
How can I efficiently store this kind of data in a file? Is there a way to make sure that the whole tree structure is stored in the same region of memory, so that it can be used with mmap?
To store the data in the same section of memory, you probably want to supply an Allocator object for the std::vector<TreeElement *> you use, and have that allocate from your block.
To be able to serialize it, instead of storing actual pointers, you might consider storing offsets in the block. Then when you read the data back in, you can add the address of the start of the block to each offset to turn it back into an address.
Depending on the OS/compiler you're using, there may be some support for this already. For example, Microsoft's compiler supports __based pointers, that are pretty much what I've described: a base address, and each pointer based off that address is really just an offset, not a full pointer. The mention of mmap indicates that's probably not available to you directly, but it's possible that the compiler/OS you're using has something similar. Otherwise, you'll probably have to do the job on your own (e.g., with a based_pointer class).
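A minimal sketch of the offset idea, under a few assumptions that are mine and not the question's: the names FlatNode, FlatTree and kNone are made up, the move is a std::uint32_t standing in for SomeMoveType, the children vector is replaced by first-child/next-sibling links so that every node has the same size, and the "offsets" are element indices from the start of the block rather than byte offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Offset = std::uint32_t;
constexpr Offset kNone = 0xFFFFFFFFu;  // plays the role of a null pointer

// Fixed-size node: every link is an offset into the block, never a raw pointer.
struct FlatNode {
    std::uint32_t move;  // stand-in for SomeMoveType (assumed trivially copyable)
    Offset parent;
    Offset first_child;
    Offset next_sibling;
};

struct FlatTree {
    // One contiguous block; its bytes stay meaningful wherever they are loaded.
    std::vector<FlatNode> block;

    // Appends a node and links it in front of its parent's child list.
    Offset add(std::uint32_t move, Offset parent) {
        const Offset self = static_cast<Offset>(block.size());
        block.push_back(FlatNode{move, parent, kNone, kNone});
        if (parent != kNone) {
            block[self].next_sibling = block[parent].first_child;
            block[parent].first_child = self;
        }
        return self;
    }

    // Turns an offset back into an address: start of the block plus offset.
    const FlatNode* at(Offset off) const {
        return off == kNone ? nullptr : block.data() + off;
    }
};

// Usage: copying the block elsewhere (as a reload would) keeps all links valid.
inline void offsets_demo() {
    FlatTree tree;
    const Offset root = tree.add(0, kNone);
    const Offset e4 = tree.add(1, root);
    const Offset d4 = tree.add(2, root);
    FlatTree reloaded;
    reloaded.block = tree.block;  // different address, same offsets
    assert(reloaded.at(root)->first_child == d4);
    assert(reloaded.at(d4)->next_sibling == e4);
}
```

Because nothing in the block depends on where it sits in memory, the same bytes could be written to a file and mapped back later; the byte-order and struct-layout concerns of doing that across machines are ignored here.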
The real question is why you're trying to serialize a move tree at all. In a typical case, you're better off just saving the current board position (or, roughly equivalently, the move history) and regenerating the move tree from that when/if needed. That is so much smaller that it's really easy to store.
I think using a std::vector is not the best way to build the tree; a singly-linked-list style of tree construction is probably simpler and better.

Is it possible to write a truly generic disk-backed B+Tree implementation?

I wrote a generic in-memory B+Tree implementation in C++ some time ago, and I'm thinking about making it persistent on disk (which is what B+Trees were designed for in the first place).
My first thought was to use mmap (I'm on Linux) so that I can manipulate the file as normal memory: overload operator new for my node classes so that it returns pointers into the mapped region, and create a smart pointer that converts RAM addresses to file offsets to link my nodes to each other.
But I want my implementation to be generic, so the user can store an int, a std::string, or whatever custom class they want in the B+Tree.
That's where the problem occurs: for primitive types or aggregate types that do not contain pointers it all works, but as soon as the object contains a pointer/reference to a heap-allocated object, this approach no longer works.
So my question is: is there some known way to overcome this difficulty? My own searches on the topic were unsuccessful, but maybe I missed something.
As far as I know, there are three (somewhat) easy ways to solve this.
Approach 1: write a std::streambuf that points to some pre-allocated memory.
This approach allows you to use operator<< and use whatever existing code already exists to get a string representation of what you want.
Pro: re-use loads of existing code.
Con: no control over how operator<< spits out content.
Con: text-based representations only.
Approach 2: write your own (many times overloaded) output function.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Con: you have to rewrite many output functions, and writing overloads for new types is a pain for clients because they shouldn't add functions to your library's namespace... unless you rely on Koenig (argument-dependent) lookup!
Approach 3: write a btree_traits<> template.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Pro: more control over output and format than a function; may contain metadata and all.
Con: still requires you / your library's users to write lots of custom overloads.
Pro: have the btree_traits<> default to using operator<< unless someone overrides the traits?
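To make approach 3 a bit more concrete, here is a rough sketch of what such a btree_traits<> could look like, with the default falling back to operator<< / operator>> and one override for std::string. The member names write/read and the helpers store_value/fetch_value are my own invention, and the stream stands in for wherever the tree actually keeps its pages.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Default traits: fall back to the type's own operator<< and operator>>.
template <typename T>
struct btree_traits {
    static void write(std::ostream& out, const T& value) { out << value << '\n'; }
    static void read(std::istream& in, T& value) {
        in >> value;
        in.ignore();  // drop the '\n' separator written above
    }
};

// Override for std::string: length-prefixed, so embedded spaces survive.
template <>
struct btree_traits<std::string> {
    static void write(std::ostream& out, const std::string& s) {
        out << s.size() << ' ';
        out.write(s.data(), static_cast<std::streamsize>(s.size()));
    }
    static void read(std::istream& in, std::string& s) {
        std::size_t length = 0;
        in >> length;
        in.ignore();  // drop the ' ' after the length
        s.resize(length);
        if (length > 0) in.read(&s[0], static_cast<std::streamsize>(length));
    }
};

// What a node would call; it never needs to know how T is encoded.
template <typename T>
void store_value(std::ostream& out, const T& value) {
    btree_traits<T>::write(out, value);
}

template <typename T>
T fetch_value(std::istream& in) {
    T value{};
    btree_traits<T>::read(in, value);
    return value;
}

// Usage: an int goes through the default, a string through the override.
inline void traits_demo() {
    std::stringstream page;
    store_value(page, 42);
    store_value(page, std::string("hello world"));
    assert(fetch_value<int>(page) == 42);
    assert(fetch_value<std::string>(page) == "hello world");
}
```

A user with a custom class would add their own specialization in the same way as the std::string one, without touching the tree code.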
You cannot write a truly generic and transparent version, since if the pointer in a non-trivial item was allocated with malloc (or new and new[]), then the data it points to already lives on the heap, outside the mapped file.
A non-transparent solution is to serialize the class, which can be done relatively easily: you call the serialization function before you store the object, and the deserialization function when you pull it back out. Boost has good serialization features that you could make work with your B+Tree.
Handling pointers and references in a generic way means you will need to inspect the type of the structure you're trying to store, and its fields. C++ is a language not known for its reflectiveness.
But even in a language with powerful reflection, a generic solution to this problem is difficult. You might be able to get it to work for a subset of types in higher level languages like Python, Ruby, etc. A related and more powerful paradigm is the persistent programming language.
The function you want is usually implemented by delegating responsibility for writing the data block to the target type itself. It's called serialization. It simply means writing an interface with a method to dump data, and a method to load data. Any class that wants to be persisted in your B-tree then simply implements this interface.
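Such an interface might look roughly like this. The names Serializable, dump and load are placeholders rather than anything from an existing library, and Person is just a made-up value type whose std::string member keeps its characters on the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical interface: one method to dump the data, one to load it back.
class Serializable {
public:
    virtual ~Serializable() = default;
    virtual void dump(std::ostream& out) const = 0;
    virtual void load(std::istream& in) = 0;
};

// Example type with heap-allocated state (the characters of the name).
class Person : public Serializable {
public:
    Person() = default;
    Person(std::string name, int age) : name_(std::move(name)), age_(age) {}

    // Layout: "<age> <name length> <name bytes>"
    void dump(std::ostream& out) const override {
        out << age_ << ' ' << name_.size() << ' ' << name_;
    }

    void load(std::istream& in) override {
        std::size_t length = 0;
        in >> age_ >> length;
        in.ignore();  // skip the single space before the name bytes
        name_.resize(length);
        if (length > 0) in.read(&name_[0], static_cast<std::streamsize>(length));
    }

    const std::string& name() const { return name_; }
    int age() const { return age_; }

private:
    std::string name_;
    int age_ = 0;
};

// Usage: the tree only ever sees the flat bytes produced by dump().
inline void serializable_demo() {
    Person original("Ada Lovelace", 36);
    std::stringstream stored;
    original.dump(stored);
    Person restored;
    restored.load(stored);
    assert(restored.name() == "Ada Lovelace");
    assert(restored.age() == 36);
}
```

The tree itself then stores and reads back opaque blocks, and the pointer problem stays inside each class.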

A design issue with a tree of information in C++

Sorry in advance for the lengthy explanation!
I have a C++ application that uses a hash_map to store a tree of information that was parsed from a text file. The values in the map are either a child hash_map or a string. These values were parsed from a text file and then stored into the map.
I wanted to avoid having to pass the strings and maps by copy to the hash map assignment function, so when the file was parsed, I created a pointer to a new string() or a new hash_map() and stored the value into the map as "arbitrary" data (a pointer to void).
However, this poses a pretty big problem when it comes to clean-up, as deleting a void pointer doesn't behave like one would want it to (and that makes sense). I looked for an easy solution by just creating an Object class and child classes called StringObj and HashMap, which store their respective data, so that the appropriate destructor is called now that the hash_map value type is a pointer to an Object.
Is there an easier way to solve this? I looked into dynamic casting and thought it might work well, since I could catch the exception from the failed cast, and treat it appropriately, but I can't help but feel there might be a simpler solution, or that I'm over-complicating it a bit.
Suggestions?
Thanks in advance,
Jengerer
Use boost::variant (which is equivalent to a C++ union for user-defined types), C++ union (applicable in this case as you're working with only pointers) or boost::any (which can store any type) to store a pointer to either hash_map or string.
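For illustration only, here is roughly what the variant route looks like. I am using std::variant and std::unordered_map from C++17 in place of boost::variant and hash_map, so this needs a C++17 compiler; the names Value, Map and find_string are made up.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <variant>

struct Value;  // forward declaration so the map type can refer to it
using Map = std::unordered_map<std::string, std::unique_ptr<Value>>;

// Each entry holds either a string or a nested map, and knows which one.
struct Value {
    std::variant<std::string, Map> data;
};

// Returns the string stored under `key`, or nullptr if the key is absent
// or the entry holds a nested map instead.
inline const std::string* find_string(const Map& map, const std::string& key) {
    const auto it = map.find(key);
    if (it == map.end()) return nullptr;
    return std::get_if<std::string>(&it->second->data);
}

// Usage: build a two-level tree and query it.
inline void variant_demo() {
    Map root;
    root["title"] = std::make_unique<Value>();
    root["title"]->data = std::string("settings");
    root["child"] = std::make_unique<Value>();
    root["child"]->data = Map{};
    assert(*find_string(root, "title") == "settings");
    assert(find_string(root, "child") == nullptr);
}  // root goes out of scope here and frees every nested Value
```

Clean-up falls out of the types: there is no void pointer to delete and no need for a hand-written Object hierarchy or a dynamic_cast.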
One other option is that you could store a std::pair<hash_map*, string*> for each entry in the hash map. Obviously set the unused pointer in each pair to NULL so you can tell which is used and which isn't.
Debatable whether that's neater than your approach or not, although I would hazard that it's less code since you don't need definitions of Object, StringObj and HashMap.

Fast method to read and store serialized objects with pointers and pointers to pointers in C++

I need a fast method to read and store objects with pointers and pointers to pointers in XML files in C++. Every object has its own id, name, and class type.
You should build a map of pointers to IDs as you serialise your data.
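A small sketch of that pointer-to-ID map on the writing side. Item, IdTable and write_item are hypothetical names, ID 0 is assumed to mean a null pointer, and the output is only XML-like (no escaping of the name is attempted).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical object with one pointer to another object of the same kind.
struct Item {
    std::string name;
    Item*       partner;  // may be null, may point at an item written later
};

class IdTable {
public:
    // Returns the ID for `p`, assigning the next free one the first time
    // the pointer is seen. 0 is reserved for null.
    int id_of(const Item* p) {
        if (!p) return 0;
        const auto it = ids_.find(p);
        if (it != ids_.end()) return it->second;
        const int id = static_cast<int>(ids_.size()) + 1;
        ids_[p] = id;
        return id;
    }

private:
    std::map<const Item*, int> ids_;
};

// Writes one element per item; pointers appear in the file only as IDs.
inline void write_item(std::ostream& out, IdTable& ids, const Item& item) {
    const int self = ids.id_of(&item);
    const int partner = ids.id_of(item.partner);
    out << "<item id=\"" << self << "\" name=\"" << item.name
        << "\" partner=\"" << partner << "\"/>\n";
}

// Usage: two items that point at each other.
inline void id_table_demo() {
    Item a{"a", nullptr};
    Item b{"b", &a};
    a.partner = &b;
    IdTable ids;
    std::ostringstream out;
    write_item(out, ids, a);
    write_item(out, ids, b);
    assert(out.str().find("id=\"2\" name=\"b\" partner=\"1\"") != std::string::npos);
}
```

When reading the file back, the IDs are mapped to freshly allocated objects again, so a pointer to a pointer is handled the same way one level at a time.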
You can't store the pointers themselves; you'll need to define some other way of identifying objects, such as GUIDs or some other unique identifiers. In many cases you can just store the objects themselves instead of pointers.
Have you checked out Boost.Serialization?
I'm pretty sure that it automatically handles one level of pointer indirection, and it is capable of writing a "form" of XML.
I've tried Boost, but it's too big for the size of my project (the project is big, but it is quite simple: about 4-5 classes).