Python-style pickling for C++? - c++

Does anyone know of a "language level" facility for pickling in C++? I don't want something like Boost serialization, or Google Protocol Buffers. Instead, something that could automatically serialize all the members of a class (with an option to exclude some members, either because they're not serializable, or else because I just don't care to save them for later). This could be accomplished with an extra action at parse time, that would generate code to handle the automatic serialization. Has anyone heard of anything like that?

I don't believe there's any way to do this in a language with no run-time introspection capabilities.

something that could automatically serialize all the members of a class
This is not possible in C++. Python, C#, Java et al. use run-time introspection to achieve this. You can't do that in C++; RTTI is not powerful enough.
In essence, there is nothing in the C++ language that would enable someone to discover the member variables of an object at run-time. Without that, you can't automatically serialize them.

There's the standard C++ serialization with the << and >> operators, although you'll have to implement these for each of your classes (which it sounds like you don't want to do). Some practitioners say you should always implement these operators, although of course, most of us rarely do.
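For illustration, here is a minimal sketch of what implementing those operators by hand might look like. The `Point` type and the `roundTripText` helper are made up for this example and are not from the question; the format is plain whitespace-separated text.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical example type; not taken from the question.
struct Point {
    int x;
    int y;
};

// Write the members as whitespace-separated text.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << p.x << ' ' << p.y;
}

// Read the members back in the same order they were written.
std::istream& operator>>(std::istream& is, Point& p) {
    return is >> p.x >> p.y;
}

// Round-trip helper that exercises both operators through an in-memory stream.
Point roundTripText(const Point& in) {
    std::stringstream buffer;
    buffer << in;
    Point out{};
    buffer >> out;
    assert(buffer && "extraction failed");
    return out;
}
```

The cost the answer alludes to is visible here: every member has to be listed twice, once per operator, and both lists must be kept in sync by hand.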

Perhaps XML data binding? gSOAP is just one of many options. You can automatically generate code for mapping between data structures and an XML schema. I'm not sure that setting this up would be easier than the other options you mention.

One quick way to do this that I got working once when I needed to save a struct to a file was to cast my struct to a char array and write it out to a file. Then when I wanted to load my struct back in, I would read the entire file (in binary mode), and cast the whole thing to my struct's type. Easy enough and exploits the fact that structs are stored as a contiguous block in memory. I wouldn't expect this to work with convoluted data structures or pointers, though, but food for thought.
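A rough sketch of that raw-bytes approach, under some loud assumptions: the `Settings` struct and the function names are invented for this example, the type must be trivially copyable (no pointers, no virtual functions), and the bytes are only meaningful when read back by a program built with the same compiler, layout and endianness.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical record type: only fixed-size members, no pointers.
struct Settings {
    int width;
    int height;
    double scale;
};

static_assert(std::is_trivially_copyable<Settings>::value,
              "raw byte I/O is only valid for trivially copyable types");

// Dump the object's bytes to a stream.
bool saveRaw(std::ostream& out, const Settings& s) {
    out.write(reinterpret_cast<const char*>(&s), sizeof s);
    return static_cast<bool>(out);
}

// Read the bytes straight back into an existing object.
bool loadRaw(std::istream& in, Settings& s) {
    in.read(reinterpret_cast<char*>(&s), sizeof s);
    return static_cast<bool>(in);
}

// In-memory round trip; a file stream opened with std::ios::binary
// can be passed to saveRaw/loadRaw in exactly the same way.
Settings roundTripRaw(const Settings& original) {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    bool ok = saveRaw(buffer, original);
    Settings copy{};
    ok = ok && loadRaw(buffer, copy);
    assert(ok);
    return copy;
}
```

Reading into an existing object, rather than casting the file buffer to the struct type, sidesteps alignment and aliasing problems.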

Related

How to go from a handle contained in "sub" structure in C to a simple object in C++?

The question is in the title but I think it deserves some explanation as it can be very unclear :
I must rewrite in C++ an API currently written in C. The parameters taken in the functions can be handles, contained in a structure of structures (of structures)...
It means that, to manipulate a handle, the user of the API must write something like : getHandleValue(struct1.subStruct1.myHandle);
One of my main objectives by rewriting the code in C++ is to implement all of this in Object Oriented style.
So I'd like something like: myObject->getValue(); it's also to avoid the tedious calling of the handle with all the structures and sub structures (reminder: struct1.subStruct1.myHandle)
The main issue I encounter is that two handles from two different subStructures can have the same name. Same for the subStructures, two can have the same name in two different structures.
So I have that question:
Is it possible to forget the tedious calling with all the . and make the type of calling I want possible ? if it's not with an object, is it possible with a simple handle(getHandleValue(myHandle)), somehow "hiding" the whole actual address of the handle to the user ?
And in any case, when you call handle1 for instance, how can you tell whether you are calling the handle1 from subStructure1 or the handle1 from subStructure2?
If you wanted to make your question more useful for both yourself and others, you'd probably need to tell us a bit more about the problem domain, and what the API is for. As it stands, it's a question whose original form would not be useful to anyone, yourself included, since its narrow scope bypasses everything that you really would like to know but don't know yet that you need to know :) You don't want to make the question too wide in scope, since then it may become off-topic on SO, thus your application-specific details would be needed. I'm sure you could present them in a generic way so that you wouldn't spill any secrets - but we do need to know the "concrete shape" of the problem domain whose API you'd be reimplementing.
It's a trivial task as presented, but it's up to you to decide which handle is actually needed, so if multiple handles have the same name, you have to distinguish between them somehow, e.g. by using different getter method names:
auto MyClass::getBarHandle() const { return foo.bar.h1; }
auto MyClass::getBazHandle() const { return foo.baz.h1; }
Alas, you don't really want the answer to this detail yet - the implementation details have obscured the big picture here, and this is a classical XY problem. I'd be very leery of assuming that the concept of low-level "handles" needs to be captured directly in your C++ API. It may be that iterators, object references and values are all that the user will need - who knows at this point. This has to be a conscious choice, not just parroting the C API.
You're not "porting" an API to C++. There's no such thing. Whoever uses such a term has no idea what they are talking about. You have to design a new API in C++, and then reuse the C code (or even the C API as-is, if needed) to implement it. Thus you need to understand the C++ idioms - how anyone writing C++ expects a C++ API to behave. It should be idiomatic C++. Same could be said of any expressive high level language, e.g. if you wanted to have a Python API, it should be pythonic (meaning: idiomatic Python), and probably far removed from how the C API might look.
Points to consider (and that's necessarily just a fraction of what you need to think about):
iterator support so that your data structures can be traversed - and that must work with range-for, otherwise your API will be universally hated.
useful range/iterator adapters and predicate functions , so that the data can be filtered to answer commonly asked questions without tedium (say you want to iterate over elements that fulfill certain properties).
value semantics support where appropriate, so that you don't prematurely pessimize performance by forcing the users to only store the objects on the heap. Modern C++ is really good at making value types useful, so the "everything is accessed via a pointer" mindset is rather counterproductive.
object and sub-object ownership - this ties into value semantics, too.
appropriate support of both non-modifying and modifying access, i.e. const iterators, const references, potential optimizations implied by non-modifying access, etc.
see whether PIMPL would be helpful as an implementation detail, and if so - does it make sense to leverage it for implicit sharing, while also keeping in mind the pitfalls.
You need to have real use cases in mind - ways to easily accomplish complex tasks using the power of the language and its standard library - so that your API won't be in the way. A good C++ API will not resemble its counterpart C API at all, really, since the level of abstraction expected of C++ APIs is much higher.
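To make the first few points concrete, here is a small sketch of a value type that can be traversed with a range-based for loop. The `Page` and `Line` names, and every member shown, are hypothetical placeholders rather than anything from the actual API under discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value types: a page made of lines.
struct Line {
    std::string text;
};

class Page {
public:
    using const_iterator = std::vector<Line>::const_iterator;

    void addLine(std::string text) { lines_.push_back(Line{std::move(text)}); }

    // Checked access by position; the index must be in range.
    const Line& line(std::size_t index) const {
        assert(index < lines_.size());
        return lines_[index];
    }

    // begin()/end() are all that a range-based for loop needs.
    const_iterator begin() const { return lines_.begin(); }
    const_iterator end() const { return lines_.end(); }

private:
    std::vector<Line> lines_;  // lines are held by value
};

// Example of client code: non-modifying traversal with range-for.
std::size_t totalLength(const Page& page) {
    std::size_t n = 0;
    for (const Line& line : page) {
        n += line.text.size();
    }
    return n;
}
```

Client code never sees a handle or a nested struct path; it iterates over values and const references.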
implement all of this in Object Oriented style.
The task isn't to write in some bastardized "C with objects" language, since that's not what C++ is all about. In C++, all encapsulated data types are classes, but that doesn't mean much - in C you also would be operating on objects, and a good C API would provide a degree of encapsulation too. The term "object" as it applies to C++ usually means a value of some type, and an integer variable is just as much an object as std::vector variable would be.
It's a task that starts at a high level. And once the big picture is in place, the details needed to fill it in would become self-evident, although this certainly requires experience in C++. C++ APIs designed by fresh converts to C++ are universally terrible unless said converts are mentored to do the right thing or have enough software engineering experience to explore the field and learn quickly. You'd do well to explore various other well-regarded C++ APIs, but this isn't something that can be done in one afternoon, I'm afraid. If your application domain is similar to other products that offer C++ APIs, you may wish to limit your search to that domain, but you're not guaranteed that the APIs will be of high quality, since most commercial offerings lag severely behind the state of the art in C++ API design.
@Unslander Monica:
First, thanks for your fast and dense answer. There's a lot of useful information and some technical terms I didn't know about so thanks very much !
You're not "porting" an API to C++. There's no such thing. Whoever uses such a term has no idea what they are talking about.
I didn't say I was porting the API, I just said that I was rewriting it, doing another version in a different language. And yes, I'm a "fresh convert" as you say but I'm not a complete ignorant. :)
I did do a high level work, for instance I made a class diagram and use cases. I also put myself in a user's shoes and called the API functions the way I'd see it.
But, now that it comes to the implementation, I ask myself some questions of feasibility. The question I asked in my publication was more a question of curiosity than a distress call...
Anyway, as you guessed I can't talk much about my project since it's private. But what I can do is give you the big picture
Currently : This is generated automatically from a XML file. We parse it, then create the following type of structure :
struct {
    HANDLE hPage;
    struct {
        HANDLE hLine1;
        struct {
            HANDLE hWord;
        } tLine1;
        HANDLE hLine2;
        struct {
            HANDLE hWord;
        } tLine2;
    } tPage;
} tBook;
The user then calls any object via its handle. For example getValue(tBook.tPage.tLine2.hWord);
This is in C. In C++, it won't be structures but classes with a collection of objects defined by me. The class Page will have a collection of Lines for instance.
class Page {
private:
    std::list<Line> lines;  // needs #include <list>; Line is defined elsewhere
};
The functions available for the user are mostly basic ones (set/get value or state, wait...) The API's job is to call with its functions, several functions from diverse underlying software components.
Concerning your remarks,
Thus you need to understand the C++ idioms - how anyone writing C++ expects a C++ API to behave. It should be idiomatic C++.
I've already thought of ways to introduce RAII, STL lib, smart pointers, overloaded operators... etc
iterator support so that your data structures can be traversed - and that must work with range-for
What do you mean by "range-for" ? Do you mean range-based for loops ?
so the "everything is accessed via a pointer" mindset is rather counterproductive.
That's more the philosophy of the current API in C, not mine :)
The task isn't to write in some bastardized "C with objects" language
No, of course. But the current API's functioning is very, very hard to understand and some functions are really dense and sometimes too complicated to even rewrite them in a different way.
For timing constraints, unfortunately I won't be able to adapt all of the API, and my first thought when I saw the code was "OK... how do I do it in C++? In C, it's handles stored in structures; in C++ would it be classes storing handles, or directly objects?" Hence me saying "rewrite it Object Oriented style" ;) Sorry if that came out wrong.
Also you're right about exploring other APIs, that's what I've been doing with Qt framework. And, I lack C++ experience, that's why I come here, maybe I'm missing something simple here, or something I just don't know yet !
I'm here to learn, because I don't want to make a "terrible API", just like you said in your pep talk... ;)
Anyway, I hope that this answer helps you to understand a little more my problem!

(C++) serialize object of class from external library

Is there any way to serialize (or, more generally to save to file) an object from a class that I can't modify?
All of the serialization approaches that I've found so far require some manner of intrusion, at the very least adding methods to the class, adding friends etc.
If the object is from a class in a library that I didn't write myself, I don't want to have to change the classes in that library.
More concretely: I'm developing a module-based application which also involves inter-module communication (which is as simple as: one module, the source, is the owner of some data, and another module, the destination, has a pointer to that data, and there's a simple updating/hand shaking protocol between the modules). I need to be able to save the state of the modules (which data was last communicated between the modules). Now, it would be ideal if this communication could handle any type, including ones from external libraries. So I need to save this data when saving the module state.
In general, there is no native serialization in C++, at least according to the wikipedia article on the topic of serialization.
There's a good FAQ resource on this topic here that might be useful if you haven't found it already. This is just general info on serialization, though; it doesn't address the specific problem you raise.
There are a couple of (very) ugly approaches I can think of. You've likely already thought of and rejected these, but I'm throwing them out there just in case:
Do you have access to all the data members of the class? Worst case, you could read those out into a c-style structure, apply your compiler's version of "packed" to keep the sizes of things predictable, and get a transferable binary blob that way.
You could cast a pointer to the object as a pointer to uint8_t, and treat the whole thing like an array. This will get messy if there are references, pointers, or a vtable in there. This approach might work for POD objects, though.
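A sketch combining the two ideas above, assuming the external class at least exposes its state through public accessors and a constructor. `ExternalSample` is a made-up stand-in for the third-party class and `SampleRecord` is a plain mirror struct we own; fixed-width integer types are used instead of a compiler-specific "packed" attribute.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Stand-in for a third-party class we cannot modify (hypothetical).
class ExternalSample {
public:
    ExternalSample(std::int32_t id, double value) : id_(id), value_(value) {}
    std::int32_t id() const { return id_; }
    double value() const { return value_; }

private:
    std::int32_t id_;
    double value_;
};

// Plain mirror struct that we own.
struct SampleRecord {
    std::int32_t id;
    double value;
};

static_assert(std::is_trivially_copyable<SampleRecord>::value,
              "the mirror struct must be safe to copy byte by byte");

// Copy the state out through the public accessors, then take the raw bytes.
std::vector<std::uint8_t> toBytes(const ExternalSample& s) {
    const SampleRecord rec{s.id(), s.value()};
    std::vector<std::uint8_t> bytes(sizeof rec);
    std::memcpy(bytes.data(), &rec, sizeof rec);
    return bytes;
}

// Rebuild the external object from the bytes via its public constructor.
ExternalSample fromBytes(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() == sizeof(SampleRecord));
    SampleRecord rec;
    std::memcpy(&rec, bytes.data(), sizeof rec);
    return ExternalSample(rec.id, rec.value);
}
```

The library class itself is never touched; only the mirror struct is treated as a byte array, which avoids the vtable and pointer problems mentioned above.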

Efficient ways to save and load data from C++ simulation

I would like to know what the best way is to save and load C++ data.
I am mostly interested in saving classes and matrices (not sparse) I use in my simulations.
Now I just save them as txt files, but if I add a member to a class I then have to modify the function that loads the data (it has to parse and check for the value in the txt file), which I think is not ideal.
What would you recommend in general? (p.s. as I'd like to release my code I'd really like to use only standard c++ or libraries that can be redistributed).
In this case, there is no "best." What is best for you is highly dependent upon your situation. But let's have an example to get you thinking about your details and how deep this rabbit hole can go.
If you absolutely positively must have the fastest save possible without question (and you're willing to pay the price), you can define your own memory management to put all objects into a contiguous array of a common type (such as integers). This allows you to write that array to disk as binary data very rapidly. You might need this in a simulation that uses threads efficiently to load every core/processor to run at real time.
Why is this a rather horrible solution? Because it takes a LOT of work and runs many risks for problems in the name of "optimization."
It requires you to build your own memory management (operator new() and operator delete()) which may need to be thread safe.
If you try to load from this array, you will have to placement new all objects with a unique non-modifying constructor in order to ensure all virtual pointers are set properly. Oh, and you have to track the type of each address to know how to do this.
For portability with other systems and between versions of the binary, you will need to have utilities to convert from the binary format to something generic enough to be cross platform (including repopulating pointers to other objects).
I have done this. It was highly unpleasant. I have no doubt there are still problems with it and I have only listed a few here. But, it was very, very fast and very, very, very problematic.
You must design to your needs. Generally, the first need is "Make it work." Don't care about efficiency, just about something that accurately persists and that you have the information known and accessible at some point to do it. Also, you should encapsulate the process of saving and loading. Then, if the need "Make it better" steps in, you should be able to change that one bit of code and the rest should work. You might even make the saving format selectable on user needs instead of your needs which you must assume for all users.
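One way the "encapsulate saving and loading" advice could look in code is sketched below. All names here (`Matrix`, `MatrixFormat`, `TextFormat`, `roundTrip`) are invented for this example, and the text format is just a first "make it work" choice that a binary format could later replace behind the same interface.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical dense matrix stored row-major.
struct Matrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> data;
};

// The rest of the program only sees this interface.
class MatrixFormat {
public:
    virtual ~MatrixFormat() = default;
    virtual void save(std::ostream& out, const Matrix& m) const = 0;
    virtual Matrix load(std::istream& in) const = 0;
};

// First "make it work" format: plain text.
class TextFormat : public MatrixFormat {
public:
    void save(std::ostream& out, const Matrix& m) const override {
        // Enough digits for a double to survive the text round trip.
        out.precision(std::numeric_limits<double>::max_digits10);
        out << m.rows << ' ' << m.cols << '\n';
        for (double v : m.data) out << v << ' ';
    }

    Matrix load(std::istream& in) const override {
        Matrix m;
        in >> m.rows >> m.cols;
        m.data.resize(m.rows * m.cols);
        for (double& v : m.data) in >> v;
        assert(in && "malformed matrix data");
        return m;
    }
};

// Client code depends only on the interface, so formats can be swapped later.
Matrix roundTrip(const MatrixFormat& format, const Matrix& m) {
    std::stringstream buffer;
    format.save(buffer, m);
    return format.load(buffer);
}
```

If "make it better" ever becomes a requirement, a second class derived from `MatrixFormat` is the only code that has to be written.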
Given all the assumptions, pros and cons listed, you should be able to elaborate your particular needs for this question.
Given that performance is not your concern -- which is a critical part of the answer -- the Boost Serialization library is a great answer.
The link in the comment leads to the documentation. Read the tutorial (which is overkill for what you are initially wanting, but well worth it).
Finally, since you have mostly array matrices, try to encapsulate the entire process of save and load so that, should you need to change it later, you are writing a new implementation and choosing between the existing ones. I expect the added time for the smarts of Boost Serialization would not be great; however, you might find a future requirement moves you to something else or multiple something elses.
The C++ Middleware Writer automates the creation of marshalling functions. When you add a member to a class, it updates the marshalling functions for you.

Object Reflection

Does anyone have any references for building a full Object/Class reflection system in C++ ?
I've seen some crazy macro / template solutions, however I've never found a system which solves everything to a level I'm comfortable with.
Thanks!
Using templates and macros to automatically, or semi-automatically, define everything is pretty much the only option in C++. C++ has very weak reflection/introspection abilities. However, if what you want to do is mainly serialization and storage, this has already been implemented in the Boost Serialization libraries. You can do this by either implementing a serializer method on the class, or have an external function if you don't want to modify the class.
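As a rough idea of what a semi-automatic template approach can look like (this is a hand-rolled sketch, not how Boost Serialization is implemented), the class lists its members once and generic code does the rest. `Player`, `TextWriter`, `TextReader` and the function names are all hypothetical, and the text format assumes string members contain no whitespace.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical class; its author lists the members to persist once, by hand.
struct Player {
    std::string name;
    int score;
    int cachedRank;  // deliberately not listed below, so it is never saved

    template <typename Self, typename Visitor>
    static void visitMembers(Self& self, Visitor& visit) {
        visit(self.name);
        visit(self.score);
    }
};

// Visitor that writes each visited member as text.
struct TextWriter {
    std::ostringstream out;
    template <typename T>
    void operator()(const T& value) { out << value << ' '; }
};

// Visitor that reads each visited member back in the same order.
struct TextReader {
    std::istringstream in;
    template <typename T>
    void operator()(T& value) { in >> value; }
};

template <typename T>
std::string serialize(const T& obj) {
    TextWriter writer;
    T::visitMembers(obj, writer);
    return writer.out.str();
}

template <typename T>
void deserialize(const std::string& text, T& obj) {
    TextReader reader;
    reader.in.str(text);
    T::visitMembers(obj, reader);
    assert(reader.in && "input did not match the member list");
}
```

Saving and loading share one member list, so adding a member means touching one place only; leaving a member out of the list is how it gets excluded.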
This doesn't seem to be what you were asking though. I'm guessing you want something like automatic serialization which requires no extra effort on the part of the class implementer. They have this in Python, and Java, and many other languages, but not C++. In order to get what you want, you would need to implement your own object system like, perhaps, the meta-object system that IgKh mentioned in his answer.
If you want to do that, I'd suggest looking at how JavaScript implements objects. JavaScript uses a prototype based object system, which is reasonably simple, yet fairly powerful. I recommend this because it seems to me like it would be easier to implement if you had to do it yourself. If you are in the mood for reading a VERY long-winded explanation on the benefits and elegance of prototypes, you can find an essay on the subject at Steve Yegge's blog. He is a very experienced programmer, so I give his opinions some credence, but I have never done this myself so I can only point to what others have said.
If you wanted to remain with the more C++ style of classes and instances instead of the less familiar prototypes, look at how Python objects and serialization work. Python also uses a "properties" approach to implementing its objects, but the properties are used to implement classes and inheritance instead of a prototype based system, so it may be a little more familiar.
Sorry that I don't have a simpler answer to your question! But hopefully this will help.
I'm not entirely sure that I understood your intention, however the Qt framework contains a powerful meta object system that lets you do most operations expected from a reflection system: getting the class name as a string, checking if an object is an instance of a given type, listing and invoking methods, etc.
I've used ROOT's Reflex library with good results. Rather than using crazy macro / template solutions like you described, it processes your C++ header files at build time to create reflection dictionaries then operates off of those.

XML Representation of C++ Objects

I'm trying to create a message validation program and would like to create easily modifiable rules that apply to certain message types. Due to the risk of the rules changing I've decided to define these validation rules external to the object code.
I've created a basic interface that defines a rule and am wondering what the best way to store this simple data would be. I was leaning towards XML but it seems like it might be too heavy.
Each rule would only need a very small set of data (i.e. type of rule, value, applicable mask, etc).
Does anyone know of a good resource that I could look at that would perform a similar functionality. I'd rather not dig too deep into XML on a problem that seems to barely need a subset of the functionality I see in most of the examples I bump into.
If I can find a concise example to examine I would be able to decide on whether or not to just go with a flat file.
Thanks in advance for your input!
Personally, for small, easily modifiable XML, I find TinyXML to be an excellent library. You can make each class understand its own format, so your object hierarchy is represented directly in the XML.
However, if you don't think you need XML, you might want to go with a lighter storage format like YAML. I find it is much easier to understand the underlying data, modify it and extend functionality.
(Also, boost::serialization has an XML archive, but it isn't what I'd call easily modifiable)
The simplest is to use a flat file designed to be easy to parse using the C++ >> operator. Just simple tokens separated by whitespace.
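A minimal sketch of such a flat file reader, assuming one rule per record in the form "type value mask". The `Rule` layout is a guess based on the fields mentioned in the question, and the function names are invented for this example.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical rule layout based on the fields mentioned in the question.
struct Rule {
    std::string type;
    int value;
    unsigned mask;
};

// One rule per record: "<type> <value> <mask>", separated by any whitespace.
std::istream& operator>>(std::istream& in, Rule& r) {
    return in >> r.type >> r.value >> r.mask;
}

// Read rules until the input is exhausted.
std::vector<Rule> loadRules(std::istream& in) {
    std::vector<Rule> rules;
    Rule r{};
    while (in >> r) rules.push_back(r);
    // Stopping anywhere other than end of input means a malformed record.
    assert(in.eof() && "malformed rule record");
    return rules;
}

// Convenience wrapper for rules held in a string instead of a file.
std::vector<Rule> loadRulesFromText(const std::string& text) {
    std::istringstream in(text);
    return loadRules(in);
}
```

Passing a `std::ifstream` to `loadRules` reads the same format from disk, and the file stays editable in any text editor.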
Well, if you want your rules to be human readable, XML is the way to go, and you can interface it nicely with C++ using Xerces. If you want performance and/or small size, you could save the data in binary form using simple structs.
Another way to implement this would be to define your rules in XML Schema and then have an XML Data Binding tool generate the corresponding C++ object model along with the XML parsing and serialization code. One such tool (that I happen to be working on) is CodeSynthesis XSD:
http://www.codesynthesis.com/products/xsd/
For a 2-minute overview of the idea, see the "Hello World" example in the C++/Tree mapping documentation.