Script to binary conversion & Object serialization - c++

All the code I write (C++ or AS3) is heavily scripted (JSON or XML). My problem is that parsing can be very slow at times, especially with less powerful devices like mobiles.
Here is an example of a Flash script of mine:
<players class="fanlib.gfx.TSprite" vars="x=0|y=-50|visible=Bool:true">
<player0 class="fanlib.gfx.TSprite" vars="x=131|y=138">
<name class="fanlib.text.TTextField" format="Myriad Pro,18,0xffffff,true,,,,,center,,,,0" alignX="center" alignY="bottom" filters="DropShadow,2" vars="background=Bool:false|backgroundColor=0|embedFonts=Bool:true|multiline=Bool:false|mouseEnabled=Bool:false|autoSize=center|text=fae skata|y=-40"/>
<avatar class="fanlib.gfx.FBitmap" alignX="center" alignY="center" image="userDefault.png"/>
<chip class="fanlib.gfx.FBitmap" alignX="center" alignY="center" image="chip1.png" vars="x=87|y=68"/>
<info class="fanlib.text.TTextField" format="Myriad Pro,18,0xffffff,true,,,,,center,,,,0" alignX="center" alignY="top" filters="DropShadow,2" css=".win {color: #40ff40}" vars="y=40|background=Bool:false|backgroundColor=0|embedFonts=Bool:true|multiline=Bool:false|mouseEnabled=Bool:false|autoSize=center"/>
</player0>
<player1 class="Copy:player0" vars="x=430|y=70">
<chip class="Child:chip" image="chip2.png" vars="x=-82|y=102"/>
</player1>
<player2 class="Copy:player0" vars="x=778|y=70">
<chip class="Child:chip" image="chip3.png" vars="x=88|y=103"/>
</player2>
<player3 class="Copy:player0" vars="x=1088|y=137">
<chip class="Child:chip" image="chip4.png" vars="x=-111|y=65"/>
</player3>
<player4 class="Copy:player0" vars="x=1088|y=533">
<chip class="Child:chip" image="chip5.png" vars="x=-88|y=-23"/>
</player4>
<player5 class="Copy:player0" vars="x=585|y=585">
<chip class="Child:chip" image="chip6.png" vars="x=82|y=-54"/>
</player5>
<player6 class="Copy:player0" vars="x=117|y=533">
<chip class="Child:chip" image="chip7.png" vars="x=85|y=-26"/>
</player6>
</players>
The script above creates "native" (as in "non-dynamic") Flash objects. TSprite is a Sprite descendant, FBitmap inherits from Bitmap, etc. At 71 KB, it takes tens of seconds to parse on my Sony Xperia.
Instead of optimizing the parser (which probably wouldn't gain much anyway), I am contemplating converting my scripts to binaries, so that scripts are used for debugging and the finalized binaries for release builds.
One question is, how does one handle pointers from one object to another when serializing them? How are pointers translated from memory to disk file-friendly format, then back to memory?
Another question is, what about "nested" objects? In Flash for example, an object can be a graphics container of other objects. Could such a state be serialized? Or must objects be saved separately and, when loaded from disk, added to their parents through the nesting functions (i.e. addChild etc...)?
If possible, I would prefer generic guidelines that could apply to languages as different as C++ or AS3.

As far as I understand, your idea is to save time by replacing the creation of objects from a fixture script (XML/JSON) with the deserialization of previously serialized objects from a binary. If that's the case, I believe you've taken the wrong approach to this issue.
Since you've asked for general guidelines, I'll try to explain my reasoning without delving too deeply into language-specific details. Note that I'll talk about the common case, and there may be exceptions to it. There is no silver bullet, and you should analyze your scenario to pick the best solution for it.
From one point of view, creating a set of objects from a fixture/script is not that different from deserializing objects from a binary. In the end, both turn some "condensed" state into objects you can use. Yes, binaries are usually smaller, but formats like JSON do not carry much overhead in the common case (XML is usually more redundant, though). In general you won't save much time or memory by deserializing this state from a binary instead of parsing it from a script. Here is a real-world example from something I worked with: Mental Ray is a de facto standard for rendering 3D scenes and special effects in the movie industry. It uses a textual scene file format that is similar to JSON in many respects. Mental Ray is computation-heavy and performance is one of its key concerns, yet it lives perfectly well without a binary scene file format. So, from this angle, there is no substantial difference between the two approaches.
From another point of view, there is a difference that may come into play. While deserializing an object implies only creating it and loading state into its fields, creating an object from a script may also include extra initialization on top of that. So in some cases there may be a benefit to the deserialization approach.
However, in the end I would argue that it is not a good idea to simply replace your scripted objects with serialized objects, because scripting and serialization are conceptually different things with different purposes (though they do have something in common). With the serialization approach you'll lose flexibility in modifying your fixture state (it is usually much harder for humans to edit binaries than JSON/XML), as well as the ability to do initialization work.
So, think about what you actually need in your scenario and stick with it. That's the best way.
Now, if it turns out that you actually need your objects to be scripted but this approach is not fast enough, I would investigate speeding it up in one of two ways:
Analyze whether you can restructure your data so that it takes less time to load. This is not always possible, but it may be worth trying.
Analyze what else your scripting engine does to initialize objects besides creating them and loading state into their fields, and try to optimize it. This approach actually has the most potential, since this is the only part with a substantial performance difference (between the scripting and deserialization approaches), and it does not lead to misuse of concepts. See whether you can reduce the amount of work needed to initialize an object. If you are currently using a generic scripting engine or framework, it may be a good idea to tailor something more specific to your needs.
Now, answering your original questions...
how does one handle pointers from one object to another when serializing them?
References are a headache that most serialization implementations avoid dealing with.
One approach is to use something that identifies each object during serialization (its pointer, for example), serialize the object while preserving this identity, and store references from other objects to it not as a primitive type but as a reference type (basically saving the identity). When deserializing, keep track of all deserialized objects and reuse them when deserializing a reference-typed field.
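The identity-based approach described above could be sketched roughly as follows. This is only an illustration under assumptions: Node, saveGraph and loadGraph are hypothetical names, the identity is simply the object's index in the saved set, and a text stream is used for brevity (a binary stream would store the same ids).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical object type: one plain field plus one pointer to another node.
// Names are assumed to contain no whitespace, to keep the text format trivial.
struct Node {
    std::string name;
    Node* target = nullptr;  // non-owning reference; may be null
};

// Write every node, replacing each pointer with the index of the node it points to.
void saveGraph(const std::vector<std::unique_ptr<Node>>& nodes, std::ostream& out) {
    std::unordered_map<const Node*, std::size_t> ids;  // pointer -> identity
    for (std::size_t i = 0; i < nodes.size(); ++i) ids[nodes[i].get()] = i;

    out << nodes.size() << '\n';
    for (const auto& n : nodes) {
        long long ref = -1;  // -1 stands for a null reference
        if (n->target) {
            const auto it = ids.find(n->target);
            assert(it != ids.end());  // the target must be part of the saved set
            ref = static_cast<long long>(it->second);
        }
        out << n->name << ' ' << ref << '\n';
    }
}

// Pass 1 creates all objects; pass 2 turns the stored ids back into pointers.
std::vector<std::unique_ptr<Node>> loadGraph(std::istream& in) {
    std::size_t count = 0;
    in >> count;
    std::vector<std::unique_ptr<Node>> nodes;
    std::vector<long long> refs(count, -1);
    for (std::size_t i = 0; i < count; ++i) {
        auto n = std::make_unique<Node>();
        in >> n->name >> refs[i];
        nodes.push_back(std::move(n));
    }
    for (std::size_t i = 0; i < count; ++i) {
        if (refs[i] >= 0)
            nodes[i]->target = nodes[static_cast<std::size_t>(refs[i])].get();
    }
    return nodes;
}
```

Because pointers are resolved only after every object exists, cycles (A points to B, B points to A) need no special handling in this sketch.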
How are pointers translated from memory to disk file-friendly format,then back to memory?
Serialization rarely deals with raw memory. That approach is only good for primitive types and does not work well with pointers/references. Languages that support reflection/introspection usually use it to inspect field values when serializing objects. In pure C++, where reflection is poor, there is no other reliable way than to have the object itself define how to serialize itself to a byte stream, and to use those methods.
Another question is, what about "nested" objects? In Flash for example, an object can be a graphics container of other objects. Could such a state be serialized? Or must objects be saved separately and, when loaded from disk, added to their parents through the nesting functions (i.e. addChild etc...)?
If we talk about deserialization, these should probably be treated like references (see the answer above). Using methods like addChild is not a good idea, since they can contain logic that will mess things up.
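For a strictly tree-shaped container hierarchy, one possible simplification is to write each object followed by its children and to restore the links directly while loading. The sketch below is an assumption-laden illustration, not Flash's or any library's API: Sprite, saveTree and loadTree are hypothetical names, and a text stream stands in for a binary one.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical container type: a position plus owned children.
struct Sprite {
    int x = 0;
    int y = 0;
    Sprite* parent = nullptr;  // back-reference, rebuilt on load
    std::vector<std::unique_ptr<Sprite>> children;
};

// Depth-first: each object is written with its child count, then its children.
void saveTree(const Sprite& s, std::ostream& out) {
    out << s.x << ' ' << s.y << ' ' << s.children.size() << '\n';
    for (const auto& child : s.children) saveTree(*child, out);
}

// Restores state and parent/child links directly, without going through
// an addChild-style method that might run extra logic.
std::unique_ptr<Sprite> loadTree(std::istream& in, Sprite* parent = nullptr) {
    auto s = std::make_unique<Sprite>();
    std::size_t childCount = 0;
    in >> s->x >> s->y >> childCount;
    assert(!in.fail());  // malformed input is not handled in this sketch
    s->parent = parent;
    s->children.reserve(childCount);
    for (std::size_t i = 0; i < childCount; ++i)
        s->children.push_back(loadTree(in, s.get()));
    return s;
}
```

If children can be shared between several parents, this no longer works and the nesting has to be stored as references by identity instead.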
Hope this answers your questions.

You should really take a look at Adobe Remote Object.
Typically, using serialization can cause you problems, such as:
I have a serialized object from application version 2.3, and in the new version 2.4 the class has been modified: a property removed or added. This makes my serialized object unparsable.
While developing a serialization protocol that supports multiple platforms, debugging can be agonizing. I remember doing this and spending hours to find out that my Flash code was using big-endian and my C# code was using little-endian.
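If you do roll your own format, both problems above can be reduced by fixing the byte order in the format itself and by writing a version number first. A minimal sketch under assumed names (Player, encode, decode, and the version numbers are all hypothetical, and this is unrelated to how AMF lays out its data):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in little-endian order, regardless of the host CPU.
void putU32LE(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
}

// Read a 32-bit little-endian value starting at pos and advance pos.
std::uint32_t getU32LE(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return v;
}

// Hypothetical record: "level" was added in format version 2.
struct Player {
    std::uint32_t score = 0;
    std::uint32_t level = 1;
};

std::vector<std::uint8_t> encode(const Player& p) {
    std::vector<std::uint8_t> out;
    out.push_back(2);  // format version comes first
    putU32LE(out, p.score);
    putU32LE(out, p.level);
    return out;
}

Player decode(const std::vector<std::uint8_t>& in) {
    assert(!in.empty());
    std::size_t pos = 0;
    const std::uint8_t version = in[pos++];
    Player p;
    p.score = getU32LE(in, pos);
    if (version >= 2) p.level = getU32LE(in, pos);  // version 1 data keeps the default
    return p;
}
```

Shifting and masking instead of copying raw memory is what makes the output identical on big-endian and little-endian hosts.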
Adobe solved those problems for you: they created a nice binary protocol called AMF (Action Message Format). It has many implementations on various platforms that can communicate with your ActionScript.
Here you may find some C++ implementations.

Related

(C++) serialize object of class from external library

Is there any way to serialize (or, more generally to save to file) an object from a class that I can't modify?
All of the serialization approaches that I've found so far require some manner of intrusion, at the very least adding methods to the class, adding friends etc.
If the object is from a class in a library that I didn't write myself, I don't want to have to change the classes in that library.
More concretely: I'm developing a module-based application which also involves inter-module communication (which is as simple as: one module, the source, is the owner of some data, and another module, the destination, has a pointer to that data, and there's a simple updating/hand shaking protocol between the modules). I need to be able to save the state of the modules (which data was last communicated between the modules). Now, it would be ideal if this communication could handle any type, including ones from external libraries. So I need to save this data when saving the module state.
In general, there is no native serialization in C++, at least according to the Wikipedia article on serialization.
There's a good FAQ resource on this topic here that might be useful if you haven't found it already. This is just general info on serialization, though; it doesn't address the specific problem you raise.
There are a couple of (very) ugly approaches I can think of. You've likely already thought of and rejected these, but I'm throwing them out there just in case:
Do you have access to all the data members of the class? Worst case, you could read those out into a c-style structure, apply your compiler's version of "packed" to keep the sizes of things predictable, and get a transferable binary blob that way.
You could cast a pointer to the object as a pointer to uint8_t, and treat the whole thing like an array. This will get messy if there are references, pointers, or a vtable in there. This approach might work for POD objects, though.
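The first idea can be done without touching the library class, as long as its state is reachable through its public interface. The following is a sketch under assumptions: extlib::Point is a made-up stand-in for a third-party class, and PointRecord, toBytes and fromBytes are hypothetical helpers. The resulting bytes are only meaningful for the same compiler and platform.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Stand-in for a class from an external library that we cannot modify.
namespace extlib {
class Point {
public:
    Point(double x, double y) : x_(x), y_(y) {}
    double x() const { return x_; }
    double y() const { return y_; }
private:
    double x_;
    double y_;
};
}  // namespace extlib

// Our own flat mirror of the library object's state.
struct PointRecord {
    double x;
    double y;
};
static_assert(std::is_trivially_copyable<PointRecord>::value,
              "only trivially copyable records may be copied byte-wise");

// Non-intrusive save: read the state through the public accessors.
std::vector<std::uint8_t> toBytes(const extlib::Point& p) {
    const PointRecord rec{p.x(), p.y()};
    std::vector<std::uint8_t> bytes(sizeof rec);
    std::memcpy(bytes.data(), &rec, sizeof rec);
    return bytes;
}

// Non-intrusive load: rebuild the object through its public constructor.
extlib::Point fromBytes(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() == sizeof(PointRecord));
    PointRecord rec;
    std::memcpy(&rec, bytes.data(), sizeof rec);
    return extlib::Point(rec.x, rec.y);
}
```

Using memcpy on a separate record avoids the vtable and pointer problems mentioned above, because the library object itself is never reinterpreted as bytes.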

Efficient ways to save and load data from C++ simulation

I would like to know what the best ways are to save and load C++ data.
I am mostly interested in saving classes and matrices (not sparse) I use in my simulations.
Now I just save them as txt files, but if I add a member to a class I then have to modify the function that loads the data (it has to parse and check for the value in the txt file), which I think is not ideal.
What would you recommend in general? (p.s. as I'd like to release my code I'd really like to use only standard c++ or libraries that can be redistributed).
In this case, there is no "best." What is best for you is highly dependent upon your situation. But let's have an example to get you thinking about your details and how deep this rabbit hole can go.
If you absolutely positively must have the fastest save possible without question (and you're willing to pay the price), you can define your own memory management to put all objects into a contiguous array of a common type (such as integers). This allows you to write that array to disk as binary data very rapidly. You might need this in a simulation that uses threads efficiently to load every core/processor to run at real time.
Why is this a rather horrible solution? Because it takes a LOT of work and runs many risks for problems in the name of "optimization."
It requires you to build your own memory management (operator new() and operator delete()) which may need to be thread safe.
If you try to load from this array, you will have to placement new all objects with a unique non-modifying constructor in order to ensure all virtual pointers are set properly. Oh, and you have to track the type of each address to know how to do this.
For portability with other systems and between versions of the binary, you will need to have utilities to convert from the binary format to something generic enough to be cross platform (including repopulating pointers to other objects).
I have done this. It was highly unpleasant. I have no doubt there are still problems with it and I have only listed a few here. But, it was very, very fast and very, very, very problematic.
You must design to your needs. Generally, the first need is "Make it work." Don't worry about efficiency; just make sure that something persists the data accurately and that the information needed to do so is known and accessible at that point. Also, you should encapsulate the process of saving and loading. Then, if the need "Make it better" steps in, you should be able to change that one bit of code and the rest should keep working. You might even make the saving format selectable according to the user's needs, instead of according to needs you would have to assume for all users.
Given all the assumptions, pros and cons listed, you should be able to elaborate your particular needs for this question.
Given that performance is not your concern -- which is a critical part of the answer -- the Boost Serialization library is a great answer.
The link in the comment leads to the documentation. Read the tutorial (which is overkill for what you are initially wanting, but well worth it).
Finally, since you have mostly array matrices, try to encapsulate the entire process of save and load so that, should you need to change it later, you are writing a new implementation and choosing between the existing ones. I expect the added time for the smarts of Boost Serialization would not be great; however, you might find that a future requirement moves you to something else, or to multiple something elses.
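One way to encapsulate saving and loading so the format can be swapped later is a small interface with one implementation per format. The sketch below uses only the standard library; Matrix, MatrixFormat, TextFormat and RawBinaryFormat are hypothetical names, and the raw binary variant is assumed to be read back on the same kind of machine that wrote it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Dense row-major matrix; data.size() == rows * cols.
struct Matrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> data;
};

// The rest of the program only ever sees this interface.
class MatrixFormat {
public:
    virtual ~MatrixFormat() = default;
    virtual void save(const Matrix& m, std::ostream& out) const = 0;
    virtual Matrix load(std::istream& in) const = 0;
};

// Human-readable and portable, but larger and slower.
class TextFormat : public MatrixFormat {
public:
    void save(const Matrix& m, std::ostream& out) const override {
        assert(m.data.size() == m.rows * m.cols);
        out.precision(17);  // enough digits to round-trip a double
        out << m.rows << ' ' << m.cols << '\n';
        for (double v : m.data) out << v << ' ';
    }
    Matrix load(std::istream& in) const override {
        Matrix m;
        in >> m.rows >> m.cols;
        m.data.resize(m.rows * m.cols);
        for (double& v : m.data) in >> v;
        return m;
    }
};

// Fast raw dump: fixed-width dimensions followed by the doubles as stored in memory.
class RawBinaryFormat : public MatrixFormat {
public:
    void save(const Matrix& m, std::ostream& out) const override {
        assert(m.data.size() == m.rows * m.cols);
        const std::uint64_t dims[2] = {static_cast<std::uint64_t>(m.rows),
                                       static_cast<std::uint64_t>(m.cols)};
        out.write(reinterpret_cast<const char*>(dims), sizeof dims);
        out.write(reinterpret_cast<const char*>(m.data.data()),
                  static_cast<std::streamsize>(m.data.size() * sizeof(double)));
    }
    Matrix load(std::istream& in) const override {
        std::uint64_t dims[2] = {0, 0};
        in.read(reinterpret_cast<char*>(dims), sizeof dims);
        Matrix m;
        m.rows = static_cast<std::size_t>(dims[0]);
        m.cols = static_cast<std::size_t>(dims[1]);
        m.data.resize(m.rows * m.cols);
        in.read(reinterpret_cast<char*>(m.data.data()),
                static_cast<std::streamsize>(m.data.size() * sizeof(double)));
        return m;
    }
};
```

Calling code holds a const MatrixFormat& and never needs to know which implementation is behind it, so adding a third format later does not touch the simulation code. With real files, the binary variant should be opened with std::ios::binary.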
The C++ Middleware Writer automates the creation of marshalling functions. When you add a member to a class, it updates the marshalling functions for you.

C++ Boost.serialization vs simple load/save

I am a computational scientist who works with large amounts of simulation data, and I often find myself saving/loading data to/from disk. For simple things, like a vector, this is usually as simple as dumping a bunch of numbers into a file and that's it.
For more complex stuff, like objects and such, I have save/load member functions. Now, I'm not a computer scientist, so I often see terminology here on SO that I just do not understand (but would love to). One term I've come across recently is serialization, along with the Boost.Serialization library.
From what I understand, serialization is simply the process of converting your objects into something that can be saved to or loaded from disk, or transmitted over a network and such. Considering that at most I need to save/load my objects to/from disk, is there any reason I should switch from the simple load/save functions to Boost.Serialization? What would Boost.Serialization give me other than what I'm already doing?
That library takes into account many details that may not be very apparent from a purely 'applicative' point of view.
For instance: data portability with respect to big/little-endian numeric representation, lifetime of pointed-to data, structured containers, versioning, non-intrusive extensions, and more. Moreover, it interacts correctly with other std or Boost infrastructure, and dictates a way of structuring code that will reward you with easier maintenance. You will find ready-to-use serializers for many (all std and Boost?) containers.
And if you need to share your data with someone else, chances are that referring to a published, maintained, and debugged schema will make things much easier.

How can I decrease complexity in library without increasing complexity elsewhere?

I am tasked with maintaining and updating a library that lets a computer send commands to a hardware device and then receive its responses. Currently the code is set up so that every single command the device can receive is sent via its own function. Code repetition is everywhere; a DRY advocate's worst nightmare.
Obviously there is much opportunity for improvement. The problem is that each command has a different payload. Currently the payload data is passed to each command function as arguments. It's difficult to consolidate functionality without pushing the complexity up to the level that calls the library.
When a response is received from the device, its data is put into an object of a class whose sole responsibility is holding this data; these classes do nothing else. There are hundreds of them. The app layer then uses these objects to access the returned data.
My objectives:
Thoroughly reduce code repetition
Maintain a similar level of complexity at the application layer
Make it easier to add new commands
My idea:
Have one function to send a command and one to receive (the receiving function is automatically called when a response from the device is detected). Have a struct holding all command/response data which will be passed to sending function and returned by receiving function. Since each command has a corresponding enum value, have a switch statement which sets up any command specific data for sending.
Is my idea the best way to do it? Is there a design pattern I could use here? I've looked and looked but nothing seems to fit my needs.
Thanks in advance! (Please let me know if clarification is necessary)
This reminds me of the REST vs. SOA debate, albeit on a smaller physical scale.
If I understand you correctly, right now you have calls like
device->DoThing();
device->DoOtherThing();
and then sometimes I get a callback like
callback->DoneThing(ThingResult&);
callback->DoneOtherThing(OtherThingResult&);
I suggest that the user is the key component here. Do the current library users like the interface at the level it is designed? Is the interface consistent, even if it is large?
You seem to want to propose
device->Do(ThingAndOtherThingParameters&)
callback->Done(ThingAndOtherThingResult&)
so to have a single entry point with more complex data.
The downside from a library user's perspective may be that I now have to use a manual switch() or similar statement to tell what really happened. Dispatching to the appropriate result callback used to be done for me; now you have made it a burden on the library user.
Unless this bought me, as a user, some flexibility that I actually wanted, I would consider this a step backwards.
For your part as an implementor, one suggestion would be to go to the generic form internally, and then offer both interfaces externally. Perhaps the old specific interface could even be auto-generated somehow.
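As a rough sketch of "generic inside, both interfaces outside": the specific functions become one-line wrappers around a single generic send, and the library keeps doing the switch on incoming responses so the user does not have to. Device, Message, Kind and the callback members are hypothetical names, and a single int stands in for the real payloads.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class Kind { Thing, OtherThing };

// Generic internal form shared by every command and response.
struct Message {
    Kind kind;
    int value;
};

class Device {
public:
    // Specific interface: thin wrappers over the generic path.
    void doThing(int amount) { send({Kind::Thing, amount}); }
    void doOtherThing(int amount) { send({Kind::OtherThing, amount}); }

    // Generic interface: here it only records what would go to the hardware.
    void send(const Message& m) { sent_.push_back(m); }

    // Typed callbacks the user may register.
    std::function<void(int)> onThingDone;
    std::function<void(int)> onOtherThingDone;

    // Called by the library when a response arrives; the switch lives here,
    // not in user code.
    void receive(const Message& m) {
        switch (m.kind) {
            case Kind::Thing:
                if (onThingDone) onThingDone(m.value);
                break;
            case Kind::OtherThing:
                if (onOtherThingDone) onOtherThingDone(m.value);
                break;
            default:
                assert(false && "unknown message kind");
        }
    }

    const std::vector<Message>& sent() const { return sent_; }

private:
    std::vector<Message> sent_;
};
```

The wrappers are mechanical enough that they could be generated from a table of commands, which is one reading of "auto-generated" above.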
Good Luck.
Well, your question implies that there is a balance between the library's complexity and the client's. When those are the only two choices, one almost always goes with making the client's life easier. However, those are rarely really the only two choices.
Now in the text you talk about a command processing architecture where each command has a different set of data associated with it. In the olden days, this would typically be implemented with a big honking case statement in a loop, where each case called a different routine with different parameters and perhaps some setup code. Grisly. McCabe complexity analysers hate this.
These days what you can do with an OO language is use dynamic dispatch. Create a base abstract "command" class with a standard "handle()" method, and have each different command inherit from it to add their own members (to represent the different "arguments" to the different commands). Then you create a big honking array of these at startup, usually indexed by the command ID. For languages like C++ or Ada it has to be an array of pointers to "command" objects, for the dynamic dispatch to work. Then you can just call the appropriate command object for the command ID you read from the client. The big honking case statement is now handled implicitly by the dynamic dispatch.
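A minimal sketch of that command table, with made-up commands: Command and handle() follow the description above, while PingCommand, SetSpeedCommand, CommandId, makeCommandTable and dispatch are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Abstract base: every command handles its own payload.
class Command {
public:
    virtual ~Command() = default;
    virtual std::string handle(const std::vector<std::uint8_t>& payload) = 0;
};

class PingCommand : public Command {
public:
    std::string handle(const std::vector<std::uint8_t>&) override { return "pong"; }
};

class SetSpeedCommand : public Command {
public:
    std::string handle(const std::vector<std::uint8_t>& payload) override {
        assert(!payload.empty());  // this command expects one byte
        return "speed=" + std::to_string(static_cast<int>(payload[0]));
    }
};

// Command IDs double as indices into the table.
enum CommandId : std::size_t { kPing = 0, kSetSpeed = 1, kCommandCount = 2 };

// Built once at startup: an array of pointers to command objects.
std::vector<std::unique_ptr<Command>> makeCommandTable() {
    std::vector<std::unique_ptr<Command>> table(kCommandCount);
    table[kPing] = std::make_unique<PingCommand>();
    table[kSetSpeed] = std::make_unique<SetSpeedCommand>();
    return table;
}

// The virtual call replaces the big case statement.
std::string dispatch(const std::vector<std::unique_ptr<Command>>& table,
                     std::size_t id,
                     const std::vector<std::uint8_t>& payload) {
    assert(id < table.size() && table[id]);
    return table[id]->handle(payload);
}
```

Adding a command then means adding one class and one table entry; the dispatch code itself does not change.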
Where you can get the big savings in this scenario is in subclassing. Do you have several commands that use the exact same parameters? Make a subclass for them, and then derive all of those commands from that subclass. Do you have several commands that have to perform the same operation on one of the parameters? Make a subclass for them with that one method implemented for that operation, and then derive all those commands from that subclass.
Your first objective should be to produce a library that decouples higher software layers from the hardware. Users of your library shouldn't care that you have a hardware device that can execute a number of functions with different payloads. They should only care about what the device does at a higher level. In this sense, it is in my opinion a good thing that every command is mapped to its own function.
My plan would be:
Identify the objects the higher data layers need to get the job done. Model the objects in C++ classes from their perspective, not from the perspective of the hardware
Define the interface of the library using the above objects
Start the implementation of the library. Perhaps an intermediate layer that maps software objects to hardware objects is necessary
There are many things you can do to reduce code repetition. You can use polymorphism. Define a class with the base functionality and extend it. You can also use utility classes, that implement functions needed for many commands.

What's a pattern for getting two "deep" parts of a multi-threaded program talking to each other?

I have this general problem in design, refactoring or "triage":
I have an existing multi-threaded C++ application which searches for data using a number of plugin libraries. With the current search interface, a given plugin receives a search string and a pointer to a QList object. Running on a different thread, the plugin goes out and searches various data sources (locally and on the web) and adds the objects of interest to the list. When the plugin returns, the main program, still on the separate thread, adds this data to the local data store (with further processing), guarding this insertion point using a mutex. Thus each plugin can return data asynchronously.
The Qt-based plugin library is based on message passing. There are a fair number of plugins which are already written and tested for the application and they work fairly well.
I would like to write some more plugins and leverage the existing application.
The problem is that the new plugins will need more information from the application. They will need intermittent access to the local data store itself as they search. So to get this, they would need direct or indirect access to both the hash array storing the data and the mutex which guards multiple access to the store. I assume the access would be encapsulated by adding an extra method in a "catalog" object.
I can see three ways to write these new plugins.
When loading a plugin, pass them a pointer to my "catalog" at the start. This becomes an extra, "invisible" interface for the new plugins. This seems quick, easy, completely wrong according to OO but I can't see what the future problems would be.
Add a method/message to the existing interface so I have a second function which could be called for the new plugin libraries, the message would pass a pointer to the catalog to the plugins. This would be easy for the plugins but it would complicate my main code and seems generally bad.
Redesign the plugin interface. This seems "best" according to OO, could have other added benefits but would require all sorts of rewriting.
So, my questions are
A. Can anyone tell me the concrete dangers of option 1?
B. Is there a known pattern that fits this kind of problem?
Edit1:
A typical function for calling the plugin routines looks like:
elsewhere(spec){
    QList<CatItem> results;
    plugins->getResults(spec, &results);
    use_list(results);
}
...
void PluginHandler::getResults(QString* spec, QList<CatItem>* results)
{
    if (id->count() == 0) return;
    foreach(PluginInfo info, plugins) {
        if (info.loaded)
            info.obj->msg(MSG_GET_RESULTS, (void*) spec, (void*) results);
    }
}
It's repeated throughout the code. I'd rather extend it than break it.
Why is it "completely wrong according to OO"? If your plugin needs access to that object, and it doesn't violate any abstraction you want to preserve, it is the correct solution.
To me it seems like you blew your abstractions the moment you decided that your plugin needs access to the list itself. You just blew up your entire application's architecture. Are you sure you need access to the actual list itself? Why? What do you need from it? Can that information be provided in a more sensible way, one which 1) doesn't increase contention over a shared resource (and with it the risk of subtle multithreading bugs like race conditions and deadlocks), and 2) doesn't undermine the architecture of the rest of the app (which specifically preserves a separation between the list and its clients, to allow asynchronicity)?
If you think it's bad OO, then it is because of what you're fundamentally trying to do (violate the basic architecture of your application), not how you're doing it.
Well, option 1 is option 3, in the end. You are redesigning your plugin API to receive extra data from the main app.
It's a simple redesign that, as long as the 'catalog' is well implemented and hides every implementation detail of your hash and mutex backing store, is not bad, and can serve the purpose well enough IMO.
Now if the catalog leaks implementation details then you would better use messages to query the store, receiving responses with the needed data.
Sorry, I just re-read your question 3 times and I think my answer may have been too simple.
Is your "Catalog" an independent object? If not, you could wrap it as its own object. The Catalog should be completely safe (including threadsafe), or better yet immutable.
With this done, it would be perfectly valid OO to pass your catalog to the new plugins. If you are worried about passing them through many layers, you can create a factory for the catalog.
Sorry if I'm still misunderstanding something, but I don't see anything wrong with this approach. If your catalog is an object outside your control, however, such as a database object or collection then you really HAVE to encapsulate it in something you can control with a nice, clean interface.
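To make the "nice, clean interface" concrete, here is one possible shape for such a wrapper. It is a sketch under assumptions: standard library types replace the Qt ones from the question, string keys and values stand in for CatItem, and Catalog's methods and CachingPlugin are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

// Owns both the hash store and the mutex; neither is visible to plugins.
class Catalog {
public:
    void insert(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_[key] = value;
    }

    // Copies the value out, so no reference into the store escapes the lock.
    bool find(const std::string& key, std::string& valueOut) const {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto it = items_.find(key);
        if (it == items_.end()) return false;
        valueOut = it->second;
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, std::string> items_;
};

// A new-style plugin receives the catalog when it is constructed.
class CachingPlugin {
public:
    explicit CachingPlugin(Catalog& catalog) : catalog_(catalog) {}

    std::string search(const std::string& spec) {
        assert(!spec.empty());
        std::string cached;
        if (catalog_.find(spec, cached)) return cached;
        // Note: find-then-insert is two separate locked steps, so two threads
        // may both compute the result; the store itself stays consistent.
        const std::string result = "result for " + spec;
        catalog_.insert(spec, result);
        return result;
    }

private:
    Catalog& catalog_;
};
```

Plugins written this way cannot hold the lock across their own slow work, which limits the contention and deadlock risks raised in the other answer.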
If your Catalog is used by many pieces across your program, you might look at a factory (which, at its simplest, degrades to a Singleton). Using a factory you should be able to summon your Catalog with a Catalog.getType("Clothes"); or whatever. That way you are giving out the same object to everyone who wants one without passing it around.
(This is very similar to a singleton, by the way, but coding it as a factory reminds you that there will almost certainly be more than one. Also remember to allow a Catalog.setType("Clothes", ...); for testing.)