Suggestion on C++ object serialization techniques - c++

I'm creating a C++ object serialization library. This is mainly for self-learning and my own enhancements, so I don't want to use an off-the-shelf library such as Boost or Google Protocol Buffers.
Please share your experience or comments on good ways to go about it (for example, designing a tag-value encoding).
I would like to start by supporting PODs, followed by support for non-linear data structures.
If you need serialization for inter-process communication, then I suggest using an interface language (IDL or ASN.1) to define the interfaces.
That makes it easier to support languages other than C++, and it also makes it easier to implement a code/stub generator.

I have been working on something similar for the last few months. I couldn't use Boost because the task was to serialize a bunch of existing classes (huge existing codebase) and it was inappropriate to have the classes inherit from the interface which had the serialize() virtual function (we did not want multiple inheritance).
The approach taken had the following salient features:
Create a helper class for each existing class, designated with the task of serializing that particular class, and make the helper class a friend of the class being serialized. This avoids introduction of inheritance in the class being serialized, and also allows the helper class access to private variables.
Have each of the helper classes (let's call them 'serializers') register themselves into a global map. Each serializer class implements a clone() virtual function ('prototype' pattern), which allows one to retrieve a pointer to a serializer, given the name of the class, from this map. The name is obtained by using compiler-specific RTTI information. The registration into the global map is taken care of by instantiating static pointers and 'new'ing them, since static variables get created before the program starts.
A special stream object was created (derived from std::fstream), that contained template functions to serialize non-pointer, pointer, and STL data types. The stream object could only be opened in read-only or write-only modes (by design), so the same serialize() function could be used to either read from the file or write into the file, depending on the mode in which the stream was opened. Thus, there is no chance of any mismatch in the order of reading versus writing of the class members.
For every object being saved or restored, a unique tag (integer) was created based on the address of the variable and stored in a map. If the same address occurred again, only the tag was saved, not the deep-copied object itself. Thus, each object was deep copied only once into the file.
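A minimal sketch of the helper-class and single serialize() ideas described above. All names here (Archive, io, Account, AccountSerializer) are hypothetical and not the poster's code. It uses plain iostreams rather than a class derived from std::fstream, and it leaves out the global serializer registry and the address-to-tag map for shared objects.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical archive: constructed either for writing or for reading, never both.
class Archive {
public:
    explicit Archive(std::ostream& out) : out_(&out) {}
    explicit Archive(std::istream& in) : in_(&in) {}

    bool writing() const { return out_ != nullptr; }

    // Trivially copyable values are copied byte for byte in either direction.
    template <typename T>
    void io(T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "raw copy needs a trivially copyable type");
        if (writing())
            out_->write(reinterpret_cast<const char*>(&value), sizeof value);
        else
            in_->read(reinterpret_cast<char*>(&value), sizeof value);
    }

    // Strings are stored as a 32-bit length followed by the characters.
    void io(std::string& s) {
        assert(s.size() <= 0xFFFFFFFFu);
        std::uint32_t n = static_cast<std::uint32_t>(s.size());
        io(n);
        if (!writing()) s.resize(n);
        if (n == 0) return;
        if (writing()) out_->write(s.data(), n);
        else in_->read(&s[0], n);
    }

private:
    std::ostream* out_ = nullptr;
    std::istream* in_ = nullptr;
};

// Stands in for an existing class that must not gain a base class.
class Account {
public:
    Account() = default;
    Account(std::string owner, double balance)
        : owner_(std::move(owner)), balance_(balance) {}
    const std::string& owner() const { return owner_; }
    double balance() const { return balance_; }

private:
    friend class AccountSerializer;  // the only change made to the existing class
    std::string owner_;
    double balance_ = 0.0;
};

// Helper class: one function lists the members once, for both directions.
class AccountSerializer {
public:
    static void serialize(Archive& ar, Account& a) {
        ar.io(a.owner_);
        ar.io(a.balance_);
    }
};
```

Writing goes through an Archive built on a std::ostream and reading through one built on a std::istream, both calling AccountSerializer::serialize, so the member order cannot drift between the two paths. The raw byte copy is not portable across machines with different endianness or type sizes.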
A page on the web captures some of these ideas shared above: Hope that helps.

I wrote an article some years ago. Code and tools can be obsolete, but concepts can remain the same.
Maybe this can help you.



DECLARE_DYNCREATE provides exactly the same feature of DECLARE_DYNAMIC along with its dynamic object creation ability. Then why should anyone use DECLARE_DYNAMIC instead of DECLARE_DYNCREATE?
The macros are documented to provide different functionality.
DECLARE_DYNAMIC: Adds the ability to access run-time information about an object's class when deriving a class from CObject.
This provides the functionality for introspection, similar to RTTI (Run-Time Type Information) provided by C++. An application can query a CObject-derived class instance for its run-time type through the associated CRuntimeClass Structure. It is useful in situations where you need to check that an object is of a particular type, or has a specific base class type. The examples at CObject::IsKindOf should give you a good idea.
DECLARE_DYNCREATE: Enables objects of CObject-derived classes to be created dynamically at run time.
This macro enables dynamic creation of class instances at run-time. The functionality is provided through the class factory method CRuntimeClass::CreateObject. It can be used when you need to create class instances at run-time based on the class type's string representation. An example would be a customizable GUI, that is built from an initialization file.
Both features are implemented through the same CRuntimeClass Structure, which may lead to the conclusion that they can be used interchangeably. In fact, code that uses an inappropriate macro will compile just fine, and expose the desired run-time behavior. The difference is purely semantic: The macros convey different intentions, and should be used according to the desired features, to communicate developer intent.
There's also a third related macro, DECLARE_SERIAL:
Generates the C++ header code necessary for a CObject-derived class that can be serialized.
It enables serialization of respective CObject-derived class instances, for example to a file, memory stream, or network socket. Since the deserialization process requires dynamic creation of objects from the serialized stream, it includes the functionality of DECLARE_DYNCREATE.
Put together, the following list should help you pick the right macro for your specific scenarios:
Use DECLARE_DYNAMIC if your code needs to retrieve an object's run-time type.
Use DECLARE_DYNCREATE if, in addition, you need to dynamically create class instances based on the type's string representation.
Use DECLARE_SERIAL if, in addition, you need to provide serialization support.
You're asking "why buy a Phillips screwdriver when I own a flathead?" The answer is that you should use the tool that suits your needs: if you need to drive only flathead screws, don't buy a Phillips driver. Otherwise, buy one.
If you need the features provided by DECLARE_DYNCREATE (e.g. because you're creating a view that's auto-created by the framework when a document is opened), then you should use DECLARE_DYNCREATE. If you don't, and DECLARE_DYNAMIC works, you should use that instead.

Setting a function from one object to be called by another object in arduino library

I'm a little new to writing in C, so I hope I'm not too far off base here.
I'm working on a library to handle the control of multiple types of LED ICs. There are a ton of different types of RGB Pixel libraries each with their own unique naming, but all really perform the same basic actions. A "strip" or "strand" object is created, each pixel gets a color value set, and the strip then gets updated.
My library handles getting pixel color values from software in the background and providing the user with the most recent values from an array belonging to the object.
What I would like is to allow the user to instantiate their LED strip object and pass a reference to that object to my library, and then allow them to pass their object's "setPixelColor()" and "UpdateStrip()" functions to the library as well. If this is achievable, then I believe my library could handle all of the light control operations for any given pixel library.
I believe what I'm looking for is the proper way to pass a function pointer between objects? I'm not looking for someone to do this for me, just for directed guidance. I've been searching Google for a while this morning, but I don't know that I'm even using the proper terms. Any advice or guidance would be a big help. Thanks!
Sounds like what you need is a base class or virtual base/interface. You define a class with common data and methods which work across all your LEDs. This common or abstract class defines the common functions. Each of your LED strand types will then inherit the base class/interface and implement the specific functions to set an LED for example.
Using this approach the application code works using the Base class/interface methods treating all the strands the same way.
If you use this approach, I also recommend you create a static factory method which returns a base class/interface pointer after creating the specifically required object.
auto* abstractController = CreateLEDStrandController("Strand Type"); // Creates the right object and returns an abstract base class pointer.
abstractController->SetLEDColor("RED"); // Actually calls the specific object's SetLEDColor
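As a rough sketch of the base class/interface idea, applied to the question's wish to hand an existing strip object to the library: a small adapter class wraps the user's strip and forwards the two common operations. Every name below (LedStrand, FakeVendorStrip, FakeVendorAdapter, LightController) is made up for illustration, and FakeVendorStrip only stands in for a real pixel library. No factory is shown here.

```cpp
#include <stdint.h>

// Common interface the library codes against.
class LedStrand {
public:
    virtual ~LedStrand() {}
    virtual void setPixelColor(uint16_t index, uint8_t r, uint8_t g, uint8_t b) = 0;
    virtual void updateStrip() = 0;
};

// Stand-in for a vendor pixel library that uses its own naming.
class FakeVendorStrip {
public:
    void setPixel(uint16_t i, uint32_t rgb) { if (i < 8) pixels[i] = rgb; }
    void show() { ++showCount; }
    uint32_t pixels[8] = {};
    int showCount = 0;
};

// Adapter: translates the common interface into the vendor's calls.
class FakeVendorAdapter : public LedStrand {
public:
    explicit FakeVendorAdapter(FakeVendorStrip& strip) : strip_(strip) {}
    void setPixelColor(uint16_t index, uint8_t r, uint8_t g, uint8_t b) override {
        // Pack the three channels into the 0xRRGGBB value the fake vendor expects.
        strip_.setPixel(index, (uint32_t(r) << 16) | (uint32_t(g) << 8) | b);
    }
    void updateStrip() override { strip_.show(); }

private:
    FakeVendorStrip& strip_;
};

// The library only ever sees LedStrand, never a concrete strip type.
class LightController {
public:
    explicit LightController(LedStrand& strand) : strand_(strand) {}
    void fill(uint16_t count, uint8_t r, uint8_t g, uint8_t b) {
        for (uint16_t i = 0; i < count; ++i) strand_.setPixelColor(i, r, g, b);
        strand_.updateStrip();
    }

private:
    LedStrand& strand_;
};
```

Supporting another pixel library then means writing one more adapter of a few lines. The cost is one virtual call per pixel, which is usually acceptable but worth measuring on a small microcontroller.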

Design Pattern for an EEPROM burner

I've built myself a basic EEPROM burner using a Teensy++ 2.0 for my PC bridge, and it's working great, but as I look to expand its compatibility, my code is getting rather hacky. I'm looking for some advice for a proper design for making this code expandable. I've taken a class in software design patterns, but it was awhile ago, and I'm currently drawing a blank. Basically, here's the use case:
I have several methods, such as ReadByte(), WriteByte(), ProgramByte() (for FlashROMs that require a multi-byte write sequence in order to program), EraseChip(), etc. so basically I have an EEPROM pure virtual base class that gets implemented by concrete classes for each chip type that I want to support. The tricky part is determining which chip type object to generate. I'm currently using a pseudo-terminal front-end on the Teensy++ serial input, a basic command-line type interface with parameters, to send options like the chip type to the Teensy++. The question is, is there a design pattern (in C/C++), something like a Factory Pattern, that would take a string input of the chip type (because that's what I'm getting from the user), and return an EEPROM object of the correct derived type, without having to manually create some big switch statement or something ugly like that where I'd have to add the new chip to a list any time I create a new chip derived class? So something like:
const EEPROM & GetEEPROM(const std::string & id);
and if I pass it the string "am29f032b" it returns a reference to an AM29F032B object, or if I pass it the string "sst39sf040" it returns a reference to an SST39SF040 object, which I could then call the previously mentioned functions on, and it would work for the specified chip.
This code will be run on an AVR microcontroller, so I can't have anything with a huge OOP overhead, but the particular microcontroller I'm using does have a relatively large amount of program flash and work RAM, so it's not like I'm trying to operate in 2kb, but I do have to keep in mind limited resources.
What you're looking for is a Pluggable Factory. There's a good description here. It's attributed to John Vlissides (one of the Gang of Four) and takes the Abstract Factory pattern a few steps further. This pattern happens to also be the architectural foundation of COM.
The usual way of implementing one in C++ is to maintain a static registry of abstract factories. With judicious use of a few templates and static initialisers, you can wrap the whole lot up in a few lines of boilerplate which you include in each concrete product (e.g. chip type).
The use of static initialisers allows a complete decoupling of concrete products from both the registry and the code wanting to create products, and has the possibility of implementing each as a plug-in.
You could have a singleton factory manager that keeps a map of string->factory object. Then each factory class would have a global instance that registers itself with the manager on startup.
I try to avoid globals in general, and singletons in particular, but any other approach would require some form of explicit list which you're trying to avoid. You will have to be careful with timing issues (you can't really assume anything about the order in which the various factories will be created).
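A minimal sketch of such a self-registering factory, assuming a hosted C++11 standard library. EepromFactory and EepromRegistrar are hypothetical names, and the chip classes are empty placeholders. The registry lives in a function-local static, so it is constructed on first use and the order in which the registrars run does not matter.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

class EEPROM {
public:
    virtual ~EEPROM() = default;
    virtual const char* name() const = 0;
    // ReadByte(), WriteByte(), EraseChip() etc. would be declared here.
};

class EepromFactory {
public:
    using Creator = std::unique_ptr<EEPROM> (*)();

    // Constructed on first call, so it is safe to use from static initialisers.
    static EepromFactory& instance() {
        static EepromFactory factory;
        return factory;
    }

    bool add(const std::string& id, Creator creator) {
        assert(creator != nullptr);
        return creators_.emplace(id, creator).second;
    }

    // Returns an empty pointer for an unknown id.
    std::unique_ptr<EEPROM> create(const std::string& id) const {
        auto it = creators_.find(id);
        if (it == creators_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// One static instance of this per chip class performs the registration.
template <typename Chip>
struct EepromRegistrar {
    explicit EepromRegistrar(const char* id) {
        EepromFactory::instance().add(id, []() -> std::unique_ptr<EEPROM> {
            return std::unique_ptr<EEPROM>(new Chip);
        });
    }
};

// Each chip lives in its own source file together with its registrar line.
class AM29F032B : public EEPROM {
public:
    const char* name() const override { return "AM29F032B"; }
};
static EepromRegistrar<AM29F032B> registerAm29f032b("am29f032b");

class SST39SF040 : public EEPROM {
public:
    const char* name() const override { return "SST39SF040"; }
};
static EepromRegistrar<SST39SF040> registerSst39sf040("sst39sf040");
```

Adding a chip means adding one class and one registrar line, with no central list to edit. Two caveats: an AVR toolchain may not ship std::map or std::string, in which case a fixed-size array of id/creator pairs could replace the map, and a linker may drop an object file from a static library if nothing references it, which would silently skip that chip's registration.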

What is the best design pattern to register data "chunks"?

I have a library which can save/load "chunks" on disk. Chunks are POD structs with a constant size and a unique static CHUNK_ID field. So loading looks something like this:
void Load(int docId, char* ptr, int type, size_t& size)...
If you want to add a new chunk, you just add a struct with a new CHUNK_ID and use the Save and Load functions on it.
What I want is to force all "chunks" to have functions like PrintHumanReadable, CompareThisTypeOfChunk, etc. (ideally the program should not compile without such functions). I also want to mark/register/enumerate all chunk structs.
I have a few ideas but all of them have problems.
Create a base class with pure virtual functions PrintHumanReadable and CompareThisTypeOfChunk. Problem: this breaks the POD type and requires rewriting the library.
Implement a factory which creates a chunk struct from its CHUNK_ID. Problem: the code still compiles when I add a new chunk without the required functions.
Could you recommend an elegant design solution for my problem?
Implement a simple code generator. You can use something like Mako or Cheetah (both Python libraries). Make a text file containing all the class names, then have the generator build the factory method and a series of methods which aren't really used but which refer to the desired methods in all the classes. This will also make it straightforward to enumerate the classes (again, using generated code).
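To make the idea concrete, here is a hand-written sketch of what the generated code might boil down to. It swaps the generated reference methods for a template, RegisterChunk, whose static_asserts refer to the required free functions, so a chunk without them fails to compile while the struct itself stays POD. HeaderChunk, CompareChunks, ChunkInfo, ChunkRegistry and RegisterChunk are all hypothetical names; a generator would only need to emit the final registration line for each chunk.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Example chunk: a plain struct with a static id, no virtual functions.
struct HeaderChunk {
    static const int CHUNK_ID = 1;
    int version;
    int flags;
};

// Required free functions, written next to the chunk.
inline std::string PrintHumanReadable(const HeaderChunk& c) {
    return "HeaderChunk v" + std::to_string(c.version);
}
inline bool CompareChunks(const HeaderChunk& a, const HeaderChunk& b) {
    return a.version == b.version && a.flags == b.flags;
}

struct ChunkInfo {
    int id;
    const char* name;
    std::size_t size;
};

// Function-local static, so registration order across files does not matter.
inline std::vector<ChunkInfo>& ChunkRegistry() {
    static std::vector<ChunkInfo> registry;
    return registry;
}

// Compilation fails here if a chunk lacks either required function.
template <typename Chunk>
bool RegisterChunk(const char* name) {
    static_assert(std::is_trivial<Chunk>::value &&
                      std::is_standard_layout<Chunk>::value,
                  "chunks must remain POD structs");
    static_assert(
        std::is_same<decltype(PrintHumanReadable(std::declval<const Chunk&>())),
                     std::string>::value,
        "PrintHumanReadable(const Chunk&) must return std::string");
    static_assert(
        std::is_same<decltype(CompareChunks(std::declval<const Chunk&>(),
                                            std::declval<const Chunk&>())),
                     bool>::value,
        "CompareChunks(const Chunk&, const Chunk&) must return bool");
    const int id = Chunk::CHUNK_ID;
    ChunkRegistry().push_back(ChunkInfo{id, name, sizeof(Chunk)});
    return true;
}

// The line a generator would emit for each chunk name in the text file.
static const bool kHeaderChunkRegistered = RegisterChunk<HeaderChunk>("HeaderChunk");
```

Iterating over ChunkRegistry() then enumerates every registered chunk. A chunk that is never registered is not checked, which is the gap the generated list of class names is meant to close.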
The proper design pattern for this is called "use Boost.Serialization". It's really the best tool for writing objects to a format and then reading them back later. It can write in text, binary, and even XML formats (and others if you write a proper stream for them). It can be non-intrusive, so you don't need to modify the objects to serialize them. And so forth.
Once you're using the proper tool for this job, you can then use whatever class hierarchy or other method you like to ensure that the proper functions for an object exist.
If you can't/won't use Boost.Serialization, then you're pretty much stuck with a runtime solution. And since the solution is runtime rather than compile time, there's no way to ensure at compile time that any particular chunk ID has the requisite functions.

Parsing huge data with c++

In my job, I need to parse different kinds of data files from different data sources. Sometimes I parse them by writing C++ code directly (with the help of Qt and Boost :D), sometimes manually with a helper program.
I must note that the data types are so different from each other that it is hard to create a common interface for all of them. But I want to do this job in a more generic way. I am planning to write a library to convert them, and it should be easy to add a new parser utility in the future. I am also planning to invoke the other helper programs from inside my program rather than manually.
My question is: what kind of architecture or pattern do you suggest? The basic condition is that the library must be extendable via new classes or DLLs, and also configurable.
By the way, the data can be in text, ASCII or something like CSV (comma-separated values), and most of the formats are specific to a certain kind of data.
Not to blow my own trumpet, but my small Open Source utility CSVfix has an extensible architecture based on deriving new C++ classes with a very simple interface. I did consider using a plugin architecture with DLLs, but it seemed like overkill for such a simple utility. If interested, you can get the binaries & sources here.
I'd suggest a 3-part model, where the common data-format is a String which should be able to contain every value:
Reader: In this layer the values are read from the source (e.g. a CSV file) using some sort of file-format descriptor. The values are then stored in some sort of intermediate data structure.
Connector/Converter: This layer is responsible for mapping the reader data to the writer fields.
Writer: This layer is responsible for writing a specific data structure to the target (e.g. another file format or a database).
This way you can write different Readers for different input files.
I think the hardest part would be creating the definition of the intermediate storage format/structure so that it is future-proof and flexible.
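The three layers could be sketched roughly as follows, assuming the intermediate structure is simply a map from field name to string value. Record, Reader, Writer, FieldMapper, SimpleCsvReader, KeyValueWriter and convertStream are invented names for illustration, and the CSV reader is deliberately naive (header line first, no quoting or escaping).

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Intermediate format: field name -> value, everything kept as a string.
using Record = std::map<std::string, std::string>;

class Reader {
public:
    virtual ~Reader() = default;
    virtual std::vector<Record> read(std::istream& in) = 0;
};

class Writer {
public:
    virtual ~Writer() = default;
    virtual void write(const std::vector<Record>& records, std::ostream& out) = 0;
};

// Connector/Converter: maps reader field names to writer field names.
class FieldMapper {
public:
    explicit FieldMapper(std::map<std::string, std::string> mapping)
        : mapping_(std::move(mapping)) {}
    Record convert(const Record& in) const {
        Record out;
        for (const auto& kv : mapping_) {
            auto it = in.find(kv.first);
            if (it != in.end()) out[kv.second] = it->second;
        }
        return out;
    }

private:
    std::map<std::string, std::string> mapping_;
};

// Naive CSV reader: the first line names the fields.
class SimpleCsvReader : public Reader {
public:
    std::vector<Record> read(std::istream& in) override {
        std::vector<Record> records;
        std::string line;
        if (!std::getline(in, line)) return records;
        const std::vector<std::string> header = split(line);
        while (std::getline(in, line)) {
            if (line.empty()) continue;
            const std::vector<std::string> cells = split(line);
            Record r;
            for (std::size_t i = 0; i < header.size() && i < cells.size(); ++i)
                r[header[i]] = cells[i];
            records.push_back(r);
        }
        return records;
    }

private:
    static std::vector<std::string> split(const std::string& line) {
        std::vector<std::string> cells;
        std::stringstream ss(line);
        std::string cell;
        while (std::getline(ss, cell, ',')) cells.push_back(cell);
        return cells;
    }
};

// Writes each record as "key=value;" pairs on one line.
class KeyValueWriter : public Writer {
public:
    void write(const std::vector<Record>& records, std::ostream& out) override {
        for (const Record& r : records) {
            for (const auto& kv : r) out << kv.first << '=' << kv.second << ';';
            out << '\n';
        }
    }
};

// Glue: any Reader can feed any Writer through the mapper.
void convertStream(Reader& reader, const FieldMapper& mapper, Writer& writer,
                   std::istream& in, std::ostream& out) {
    std::vector<Record> converted;
    for (const Record& r : reader.read(in)) converted.push_back(mapper.convert(r));
    writer.write(converted, out);
}
```

A new input format then only needs a new Reader subclass. The mapper and the writers stay untouched.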
One method I used for defining data structure in my datafile read/write classes is to use std::map<std::string, std::vector<std::string>, string_compare> where the key is the variable name and the vector of strings is the data. While this is expensive in memory, it does not lock me down to only numeric data. And, this method allows for different lengths of data within the same file.
I had the base class implement this generic storage, while the derived classes implemented the reader/writer capability. I then used a factory to get to the desired handler, using another class that determined the file format.
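A small sketch of that arrangement, under the assumption that the default string ordering is acceptable (the custom string_compare comparator is not shown in the post and is left out here). DataFile, ColonListFile, makeHandler and the "colon-list" format are all invented for illustration.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Base class: owns the generic storage, variable name -> values as strings.
class DataFile {
public:
    using Columns = std::map<std::string, std::vector<std::string>>;
    virtual ~DataFile() = default;
    virtual void read(std::istream& in) = 0;

    // Returns an empty vector for an unknown variable name.
    const std::vector<std::string>& column(const std::string& name) const {
        static const std::vector<std::string> empty;
        auto it = columns_.find(name);
        return it == columns_.end() ? empty : it->second;
    }

protected:
    Columns columns_;
};

// Derived class for a made-up format: "name: v1 v2 v3" on each line.
// Variables may have different numbers of values.
class ColonListFile : public DataFile {
public:
    void read(std::istream& in) override {
        std::string line;
        while (std::getline(in, line)) {
            const std::string::size_type colon = line.find(':');
            if (colon == std::string::npos) continue;
            std::vector<std::string>& col = columns_[line.substr(0, colon)];
            std::istringstream values(line.substr(colon + 1));
            std::string v;
            while (values >> v) col.push_back(v);
        }
    }
};

// Factory: picks the handler once the file format has been determined.
std::unique_ptr<DataFile> makeHandler(const std::string& format) {
    if (format == "colon-list") return std::unique_ptr<DataFile>(new ColonListFile);
    return nullptr;
}
```

Callers convert the strings to numbers only where they need to, which is what keeps the storage independent of any one data type.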