Write class with pointer arrays and other classes to file - c++

Is there an easy way to write a class that contains other classes and arrays of pointers to other classes to a file?
Thanks!

What you want to look into is called serialisation. It's the method of turning an object into a stream of bytes. The opposite is called deserialisation and is for constructing an object from a stream of bytes.
There is no way to do this automatically in vanilla C++. If you just write the object to a file straight from its address and size, you only write the pointer values it contains, not the data they point to.
Serialisation can be done in many ways, either manually or with a third-party library. I personally really like the Cereal library for serialisation/deserialisation in C++.
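For the original question, a hand-written serialiser might look like the sketch below. The class names (Point, Item, Inventory) and the free save/load functions are hypothetical. The pointer array is written as a count followed by the pointed-to objects, and reading allocates fresh objects instead of restoring addresses. The sketch assumes item names contain no whitespace and that no element of the array is null.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical classes: one contains another class, and one holds an array of pointers.
struct Point {
    int x = 0;
    int y = 0;
};

struct Item {
    std::string name;  // assumed to contain no whitespace
    Point position;    // contained object, saved field by field
};

struct Inventory {
    Point origin;                              // contained object
    std::vector<std::unique_ptr<Item>> items;  // array of pointers (assumed non-null)
};

// Serialise: write the pointed-to data, never the pointer values.
void save(std::ostream& out, const Point& p) { out << p.x << ' ' << p.y << ' '; }

void save(std::ostream& out, const Item& it) {
    out << it.name << ' ';
    save(out, it.position);
}

void save(std::ostream& out, const Inventory& inv) {
    save(out, inv.origin);
    out << inv.items.size() << ' ';                       // element count first
    for (const auto& item : inv.items) save(out, *item);  // then each pointee
}

// Deserialise: rebuild the objects and allocate new ones for the pointers.
void load(std::istream& in, Point& p) { in >> p.x >> p.y; }

void load(std::istream& in, Item& it) {
    in >> it.name;
    load(in, it.position);
}

void load(std::istream& in, Inventory& inv) {
    load(in, inv.origin);
    std::size_t count = 0;
    in >> count;
    inv.items.clear();
    for (std::size_t i = 0; i < count; ++i) {
        auto item = std::make_unique<Item>();
        load(in, *item);
        inv.items.push_back(std::move(item));
    }
}
```

With a real file, pass a std::ofstream to save() and a std::ifstream to load(); the format itself does not depend on where the bytes end up.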

Related

Writing complex C++ objects to file

I have a C++ object that looks like this:
#include <vector>

class OtherClass;
class AnotherClass;

class myClass
{
    std::vector<OtherClass*> otherClassVector;
    AnotherClass* anotherClassObj;
    // A few other primitive types and functions
};
What is the best way to store this to disk and read it back programmatically?
Will using fstream read/write in binary mode work? Or should I use boost serialization? And why?
I don't require the stored file to be human readable.
Using boost::serialization is simpler than writing your own serializer. If OtherClass is a concrete type (not a base class), serializing with read/write is simple: for the vector, save its size and then the elements (as long as the types involved are POD), and then store the object that the anotherClassObj pointer points to.
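A minimal sketch of that manual approach for the myClass above. It assumes OtherClass and AnotherClass are concrete, trivially copyable types (their members here are made up), that anotherClassObj is non-null when saving, and that load() receives a freshly constructed object. A std::stringstream opened in binary mode stands in for the file; with a real file you would use std::ofstream/std::ifstream with std::ios::binary. The raw-byte layout is only readable by a build with the same type sizes, padding and endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <vector>

// Hypothetical concrete, trivially copyable element types.
struct OtherClass { std::int32_t id; double weight; };
struct AnotherClass { std::int32_t flags; };

struct myClass {
    std::vector<OtherClass*> otherClassVector;  // owns its elements in this sketch
    AnotherClass* anotherClassObj = nullptr;    // assumed non-null when saving
    std::int32_t counter = 0;                   // one of the "primitive" members

    myClass() = default;
    myClass(const myClass&) = delete;  // owning raw pointers: forbid shallow copies
    myClass& operator=(const myClass&) = delete;
    ~myClass() {
        for (OtherClass* p : otherClassVector) delete p;
        delete anotherClassObj;
    }
};

template <typename T>
void writePod(std::ostream& out, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "raw bytes only for trivially copyable types");
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

template <typename T>
void readPod(std::istream& in, T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "raw bytes only for trivially copyable types");
    in.read(reinterpret_cast<char*>(&value), sizeof value);
}

void save(std::ostream& out, const myClass& obj) {
    writePod(out, obj.counter);
    writePod(out, static_cast<std::uint64_t>(obj.otherClassVector.size()));  // size first
    for (const OtherClass* p : obj.otherClassVector) writePod(out, *p);       // then the pointees
    writePod(out, *obj.anotherClassObj);  // the object pointed to, not its address
}

void load(std::istream& in, myClass& obj) {
    readPod(in, obj.counter);
    std::uint64_t count = 0;
    readPod(in, count);
    for (std::uint64_t i = 0; i < count; ++i) {
        auto* p = new OtherClass{};
        readPod(in, *p);
        obj.otherClassVector.push_back(p);
    }
    obj.anotherClassObj = new AnotherClass{};
    readPod(in, *obj.anotherClassObj);
}
```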
You can serialize objects by dumping their raw bytes into ofstream f("filename", std::ios::binary); only if those objects are POD types.
Anything else needs to be handled manually. For a simple example, if the object contains any pointers, the addresses of those will be saved, not the data they point at.
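To see the pointer problem concretely, the sketch below copies the bytes of a hypothetical Holder struct exactly as a raw write of the object would. The 100 ints it points to cannot fit in those bytes; only their address is kept, and that address means nothing in another run of the program.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical type: the element data lives elsewhere, not inside the object.
struct Holder {
    int count;
    int* data;  // points to `count` ints somewhere else in memory
};

// Copies the object's own bytes, which is what f.write((char*)&h, sizeof h) would store.
std::vector<unsigned char> rawBytes(const Holder& h) {
    std::vector<unsigned char> bytes(sizeof h);
    std::memcpy(bytes.data(), &h, sizeof h);
    return bytes;
}

// Rebuilds a Holder from those bytes: the pointer value comes back, the data does not.
Holder fromRawBytes(const std::vector<unsigned char>& bytes) {
    Holder h;
    std::memcpy(&h, bytes.data(), sizeof h);
    return h;
}
```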
For more complex types, you will have to either serialize them completely manually (write a class or function that saves all the POD data from the class and does something tricky with all the "special" data) or use boost serialization.
The C++_Middleware Writer may be of interest. It has some advantages over other approaches.

How to override what happens when boost::serialize gets a POINTER to an object

Hey, so I understand Boost serializes pointers automatically as long as you've defined the serialization function for the object they point to, but what do I do if I want to write a Boost serialization function that takes a myClass pointer?
I don't want Boost to do the default action of saving the pointed-to object and then restoring the pointer to point to it. I want Boost to do something different.
If you need a behaviour that is different from the normal pointer serialization for your special class, you have two options:
Maybe you can adjust the serialize() methods of all structs/classes that contain your myClass pointer in order to achieve the behaviour that you want. However, if you have many such pointers around, this won't be an option. Another possibility might be to use a free function as described in http://www.boost.org/doc/libs/1_47_0/libs/serialization/doc/index.html (written for the myClass pointer).
The other option only works if you are using no more than one archive type (e.g. the binary archive). You can derive from the archive classes and add an overload for the method save (and load, respectively).

C++ (semi) Reflection for file save/load? (Hack?)

I have a bunch of structs in C++ that I'd like to save to a file and load again. The problem is that a few of my structs contain pointers to base classes (structs), so I'd need a way to figure out the type and create it. They really are just POD: they all have public members and no constructors.
What is the easiest way to save and load them from file? I have a LOT of structs and the only types I use are ints, pointers or C strings. I am thinking I could do some macro hacks, but really I have no idea what I should do.
Have you tried the Boost serialization library?
Don't roll your own here; use something well developed and tested. One idea is Protocol Buffers.
The pointers pose a specific issue: I suppose that multiple structs may actually point to the same object, and that you'd like a single object to be recreated when deserializing...
The first idea, to avoid boilerplate code, is to create a compile-time reflection tool:
BOOST_FUSION_ADAPT_STRUCT
BOOST_FUSION_ADAPT_STRUCT_NAMED
Those two macros generate metadata about your struct so that you can then use it with Fusion algorithms, which bridge the gap between compile time and run time.
Now, you need something that will be able to serialize and deserialize your data. Deserialization is usually a bit more difficult, though here you have the advantage of no polymorphism (which always makes things difficult).
Normally, on a first pass you identify the graph of objects to serialize, assign each one an ID, and use this ID in place of the pointer when serializing. For deserializing, you use a three-column map:
the map is ID -> (pointer to allocated object, list of pointers that could not be set yet)
allocate all objects, filling the ID map with a pointer to the allocated object each time
when you need to deserialize an ID, look it up in the map; if the object is absent, put a pointer to your pointer in the corresponding list
when you put the pointer to the allocated object in the map, take the time to fill in all the 'not set' pointers (and clear the list at the same time)
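The ID scheme described above can be sketched as follows, using a hypothetical Node type whose next pointer may be shared or cyclic. The sketch assumes the first pass (collecting every reachable node exactly once) has already produced the nodes vector. ID 0 stands for a null pointer, and a plain text format is used for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical node type: `next` may be shared by several nodes or form cycles.
struct Node {
    int value = 0;
    Node* next = nullptr;
};

// Serialise: give every node an ID (0 means null) and write IDs instead of pointers.
void saveGraph(std::ostream& out, const std::vector<Node*>& nodes) {
    std::map<const Node*, int> ids;
    for (const Node* n : nodes) ids.emplace(n, static_cast<int>(ids.size()) + 1);
    out << nodes.size() << '\n';
    for (const Node* n : nodes) {
        int target = n->next ? ids.at(n->next) : 0;
        out << ids.at(n) << ' ' << n->value << ' ' << target << '\n';
    }
}

// One row of the ID map: the allocated object plus pointers still waiting for it.
struct Slot {
    Node* object = nullptr;
    std::vector<Node**> pending;
};

std::vector<std::unique_ptr<Node>> loadGraph(std::istream& in) {
    std::map<int, Slot> table;
    std::vector<std::unique_ptr<Node>> owned;
    std::size_t count = 0;
    in >> count;
    for (std::size_t i = 0; i < count; ++i) {
        int id = 0, value = 0, target = 0;
        in >> id >> value >> target;
        owned.push_back(std::make_unique<Node>());
        Node* node = owned.back().get();
        node->value = value;

        // Register the new object and patch every pointer that was waiting for it.
        Slot& self = table[id];
        self.object = node;
        for (Node** waiting : self.pending) *waiting = node;
        self.pending.clear();

        // Resolve our own pointer now, or queue it until the target is read.
        if (target != 0) {
            Slot& dest = table[target];
            if (dest.object) node->next = dest.object;
            else dest.pending.push_back(&node->next);
        }
    }
    return owned;
}
```

A cycle such as a -> b -> c -> a comes back as a cycle, and two nodes that shared a target still share a single restored object.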
Of course, it's better to have a framework handle it for you. You may try out s11n; if I remember correctly, it handles cycles of references.

Is it possible to write a truly generic disk-backed B+Tree implementation?

I wrote a generic in-memory B+Tree implementation in C++ some time ago, and I'm thinking about making it persistent on disk (which is what B+Trees were originally designed for).
My first thought was to use mmap (I'm on Linux) so I can manipulate the file as normal memory: I would overload the new operator of my node classes so that it returns pointers into the mapped region, and create a smart pointer that converts RAM addresses to file offsets to link my nodes to each other.
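The offset-based smart pointer described above can be sketched as below. FileOffsetPtr and DiskNode are hypothetical names, and a std::vector<char> copied to a new address stands in for a file that gets mapped at different addresses in different runs; a real implementation would work on the mmap'ed region instead. (Boost.Interprocess offers a ready-made offset_ptr built on the same idea.) This only helps for data stored inside the region, not for members that point into the ordinary heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical "file offset" pointer: stores the target's distance from the
// start of the mapped region instead of a RAM address, so the link stays
// valid wherever the region happens to be mapped.
template <typename T>
struct FileOffsetPtr {
    std::size_t offset = 0;  // 0 is used as "null" in this sketch

    T* get(char* base) const {
        return offset == 0 ? nullptr : reinterpret_cast<T*>(base + offset);
    }
    void set(char* base, T* target) {
        offset = target ? static_cast<std::size_t>(reinterpret_cast<char*>(target) - base) : 0;
    }
};

// Hypothetical node stored inside the region.
struct DiskNode {
    int key;
    FileOffsetPtr<DiskNode> next;
};
```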
But I want my implementation to be generic, so the user can store an int, an std::string, or whatever custom class he wants in the B+tree.
That's where the problem occurs: for primitive types, or aggregate types that do not contain pointers, it's all good, but as soon as the object contains a pointer or reference to a heap-allocated object, this approach no longer works.
So my question is: is there some known way to overcome this difficulty? My personal searches on the topic have been unsuccessful, but maybe I missed something.
As far as I know, there are three (somewhat) easy ways to solve this.
Approach 1: write a std::streambuf that points to some pre-allocated memory.
This approach allows you to use operator<< and reuse whatever existing code already produces a string representation of what you want.
Pro: reuse loads of existing code.
Con: no control over how operator<< spits out content.
Con: text-based representations only.
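A minimal sketch of Approach 1, assuming the destination is a fixed-size slot (for example part of a node page). FixedBufferStreambuf is a hypothetical name; it only uses the standard setp()/overflow() hooks of std::streambuf, so any existing operator<< can write into the slot, and the stream goes bad instead of writing past the end.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// A std::streambuf that writes into a caller-supplied, fixed-size buffer.
// When the buffer is full, overflow() reports failure and the stream sets badbit.
class FixedBufferStreambuf : public std::streambuf {
public:
    FixedBufferStreambuf(char* begin, std::size_t size) { setp(begin, begin + size); }

    // Number of characters written into the buffer so far.
    std::size_t written() const { return static_cast<std::size_t>(pptr() - pbase()); }

protected:
    int_type overflow(int_type) override { return traits_type::eof(); }  // no room left
};
```

Usage: wrap the buffer in a std::ostream (std::ostream os(&sb);) and stream values into it as usual.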
Approach 2: write your own (many times overloaded) output function.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Con: you have to rewrite so many output functions... and writing overloads for new types is a pain for clients, because they shouldn't write functions that live in your library's namespace... unless you resort to Koenig (argument-dependent) lookup!
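Approach 2 with the Koenig-lookup escape hatch might look like the sketch below. The namespaces btree and client and the write_item/store names are hypothetical. The library calls write_item unqualified, so a client can add an overload next to their own type instead of inside the library's namespace.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

namespace btree {
// Hypothetical library-side overloads for built-in types.
inline void write_item(std::ostream& out, int v) { out << "i:" << v; }
inline void write_item(std::ostream& out, const std::string& s) { out << "s:" << s.size() << ':' << s; }

// The library calls write_item unqualified, so argument-dependent (Koenig)
// lookup also finds overloads declared in the namespace of the item's type.
template <typename T>
void store(std::ostream& out, const T& item) {
    write_item(out, item);
}
}  // namespace btree

namespace client {
struct Point { int x, y; };
// Declared in the client's own namespace, not in btree.
inline void write_item(std::ostream& out, const Point& p) { out << "p:" << p.x << ',' << p.y; }
}  // namespace client
```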
Approach 3: write a btree_traits<> template.
Pro: can come up with binary representation.
Pro: exact control over every single output format.
Pro: more control over output and format than a function; the traits may also carry metadata and so on.
Con: still requires you / your library's users to write lots of custom overloads.
Pro: btree_traits<> can default to operator<< unless someone specializes the traits.
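A sketch of Approach 3, assuming the primary btree_traits<> falls back on operator<< and a client specializes it for their own type. The write member, the binary flag (an example of metadata a trait can carry), the Fixed type and write_key are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <ostream>
#include <sstream>

// Primary template: by default, fall back on the type's operator<< (text).
template <typename T>
struct btree_traits {
    static constexpr bool binary = false;  // example of extra metadata a trait can carry
    static void write(std::ostream& out, const T& value) { out << value; }
};

// Hypothetical client type that prefers a fixed-size binary layout.
struct Fixed { std::uint32_t id; };

template <>
struct btree_traits<Fixed> {
    static constexpr bool binary = true;
    static void write(std::ostream& out, const Fixed& value) {
        out.write(reinterpret_cast<const char*>(&value.id), sizeof value.id);
    }
};

// Library code only ever talks to the traits.
template <typename T>
void write_key(std::ostream& out, const T& key) {
    btree_traits<T>::write(out, key);
}
```

The design choice here is that clients customise by specialising a class template rather than by adding overloads, so all per-type decisions (format, metadata) live in one place.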
You cannot write a truly generic and transparent version, because if a pointer inside a non-trivial item was allocated with malloc (or new or new[]), the data it points to already lives on the ordinary heap, outside your mapped file.
A non-transparent solution is to serialize the class, and this can be done relatively easily. Before you store the class you'd call the serialization function, and when you pull it back out you'd call the deserialization function. Boost has good serialization features that you could make work with your B+Tree.
Handling pointers and references in a generic way means you will need to inspect the type of the structure you're trying to store, and its fields. C++ is a language not known for its reflectiveness.
But even in a language with powerful reflection, a generic solution to this problem is difficult. You might be able to get it to work for a subset of types in higher level languages like Python, Ruby, etc. A related and more powerful paradigm is the persistent programming language.
The function you want is usually implemented by delegating responsibility for writing the data block to the target type itself. It's called serialization. It simply means writing an interface with a method to dump data, and a method to load data. Any class that wants to be persisted in your B-tree then simply implements this interface.
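A minimal sketch of such an interface; Persistable, dump, load and the Customer class are hypothetical names. The example value owns heap data (a std::string), which is exactly what a raw memory copy into the tree's file could not persist. The string is written with a length prefix so names may contain spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical interface: anything stored in the B-tree knows how to
// write itself out and read itself back.
class Persistable {
public:
    virtual ~Persistable() = default;
    virtual void dump(std::ostream& out) const = 0;
    virtual void load(std::istream& in) = 0;
};

// Example value that owns heap data (the characters of a std::string).
class Customer : public Persistable {
public:
    Customer() = default;
    Customer(int id, std::string name) : id_(id), name_(std::move(name)) {}

    void dump(std::ostream& out) const override {
        out << id_ << ' ' << name_.size() << ' ' << name_;  // length-prefixed string
    }
    void load(std::istream& in) override {
        std::size_t len = 0;
        in >> id_ >> len;
        in.get();  // skip the single separating space
        name_.assign(len, '\0');
        in.read(&name_[0], static_cast<std::streamsize>(len));
    }

    int id() const { return id_; }
    const std::string& name() const { return name_; }

private:
    int id_ = 0;
    std::string name_;
};
```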

I serialized a C++ object, how to allocate memory for it without knowing what type it is?

I have serialized a C++ object and I wish to allocate space for it, but I can't use the "new" operator because I do not know the object's class. I tried using malloc(sizeof(object)), but when I cast the pointer to the type of the serialized object, the program crashed. Where is the information about the object's class stored?
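#include <cstdlib>
#include <iostream>

class object
{
public:
    virtual void somefunc() {}
    int someint;
};
class objectchild : public object
{
};

int main()
{
    // reserve raw storage for an objectchild
    object *o = (object*)std::malloc(sizeof(objectchild));
    // the program crashes on this line
    std::cout << dynamic_cast<objectchild*>(o) << std::endl;
}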
This causes the program to crash.
Thank you in advance.
I have serialized a C++ object
I'm not sure you have. If you've written anything like this:
object *p = new objectchild();
some_file.write((char*)p, sizeof(objectchild));
then you haven't serialized your object. You've written some data to file, and (in most implementations) that data includes a pointer to a vtable and type information. When you "deserialize" the data, on another machine or in another run of the same program, the vtable will not in general be at the same address, and the pointer is useless.
The only way to serialize an object in C++ is to write its data members, in a known format you design. That known format should include enough information to work out the type of the object. There are frameworks that can help you with this, but unlike Java there is no mechanism built into the language or standard libraries.
You should not mix C++ and C memory routines. dynamic_cast checks the actual (dynamic) type of the object; in your case you have raw memory cast to object *, so there is no object whose type could be checked.
Rewrite your code so that you can read the type of the object in some way from your serialized archive. You can do this by string or by some custom values you use, but it probably won't be generic.
For example, if you are writing a CFoo object, first stream the value "1". If you are writing a CBar, stream the value "2".
Then, when reading back the archive, if you see a "1" you know you have to "new" a CFoo, and if you read a "2" you know you have to new a CBar.
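The tag-then-data scheme above can be sketched like this, assuming a small hypothetical hierarchy (CBase, CFoo, CBar) where each class writes and reads its own members. readObject is a hand-written factory: the tag decides which class to new, and then the object reads its fields.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical polymorphic hierarchy matching the CFoo/CBar example.
struct CBase {
    virtual ~CBase() = default;
    virtual int typeTag() const = 0;
    virtual void writeFields(std::ostream& out) const = 0;
    virtual void readFields(std::istream& in) = 0;
};

struct CFoo : CBase {
    int count = 0;
    int typeTag() const override { return 1; }
    void writeFields(std::ostream& out) const override { out << count << ' '; }
    void readFields(std::istream& in) override { in >> count; }
};

struct CBar : CBase {
    double ratio = 0.0;
    int typeTag() const override { return 2; }
    void writeFields(std::ostream& out) const override { out << ratio << ' '; }
    void readFields(std::istream& in) override { in >> ratio; }
};

// Write the tag first, then the data members.
void writeObject(std::ostream& out, const CBase& obj) {
    out << obj.typeTag() << ' ';
    obj.writeFields(out);
}

// Read the tag, "new" the matching class, then let it read its members.
std::unique_ptr<CBase> readObject(std::istream& in) {
    int tag = 0;
    in >> tag;
    std::unique_ptr<CBase> obj;
    switch (tag) {
        case 1: obj = std::make_unique<CFoo>(); break;
        case 2: obj = std::make_unique<CBar>(); break;
        default: throw std::runtime_error("unknown type tag");
    }
    obj->readFields(in);
    return obj;
}
```

Because the tag values are fixed in the format rather than derived from vtable addresses, the archive stays readable across runs and machines.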
Alternatively, you could use a full-featured serialization library (or use it as inspiration).
See for example boost::serialization
You need the following code
object *o = new objectchild;
to use dynamic_cast.
You're trying to dynamic_cast a memory location with nothing in it. malloc has given you free space in which to place an object, but until a constructor runs (via new) no object is there, so when dynamic_cast does its type-safety check, it fails. static_cast would skip that runtime check, but the memory would still not contain an object, so using the result would still be undefined behaviour. Really, you shouldn't mix C and C++ allocation/casting styles like that.
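If you really need malloc'd storage, the object still has to be constructed in it before dynamic_cast can work. A minimal sketch using placement new, with the question's classes (given a virtual destructor and an inline somefunc() so the example links) and a hypothetical demo() helper:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

class object {
public:
    virtual ~object() = default;
    virtual void somefunc() {}
    int someint = 0;
};

class objectchild : public object {};

// Returns true if an objectchild constructed in malloc'd storage can be
// recovered through dynamic_cast.
bool demo() {
    void* raw = std::malloc(sizeof(objectchild));  // raw storage only, no object yet
    if (!raw) return false;
    object* o = new (raw) objectchild();                  // placement new constructs the object here
    bool ok = dynamic_cast<objectchild*>(o) != nullptr;   // now well-defined
    o->~object();                                         // virtual destructor ends the lifetime
    std::free(raw);                                       // release storage with the matching call
    return ok;
}
```

This still requires knowing the concrete type at the point of construction, which is why the serialized data needs to record the type, as described in the earlier answers.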