XML Representation of C++ Objects - c++

I'm trying to create a message validation program and would like to create easily modifiable rules that apply to certain message types. Due to the risk of the rules changing I've decided to define these validation rules external to the object code.
I've created a basic interface that defines a rule and am wondering what the best way to store this simple data would be. I was leaning towards XML but it seems like it might be too heavy.
Each rule would only need a very small set of data (i.e. type of rule, value, applicable mask, etc).
Does anyone know of a good resource that I could look at that would perform a similar functionality. I'd rather not dig too deep into XML on a problem that seems to barely need a subset of the functionality I see in most of the examples I bump into.
If I can find a concise example to examine I would be able to decide on whether or not to just go with a flat file.
Thanks in advance for your input!

Personally, for small, easily modifiable XML, I find TinyXML to be an excellent library. You can make each class understand it's own format, so your object hierarchy is represented directly in the XML.
However, if you don't think you need XML, you might want to go with a lighter storage like yaml. I find it is much easier to understand the underlying data, modify it and extend functionality.
(Also, boost::serialization has an XML archive, but it isn't what I'd call easily modifiable)

The simplest is to use a flat file designed to be easy to parse using the C++ >> operator. Just simple tokens separated by whitespace.

Well, if you want your rules to be human readable, XML is the way to go, and you can interface it nicely with c++ using xerces. If you want performance and or size, you could save the data as binaries using simple structs.

Another way to implement this would be to define your rules in XML Schema and then have an XML Data Binding tool generate the corresponding C++ object model along with the XML parsing and serialization code. One such tool (that I happen to be working on) is CodeSynthesis XSD:
http://www.codesynthesis.com/products/xsd/
For a 2-minutes overview of the idea, see the "Hello World" example in the C++/Tree mapping documentation.

Related

How should I document a Lua API/object model written in C++ code?

I am working on documenting a new and expanded Lua API for the game Bitfighter (http://bitfighter.org). Our Lua object model is a subset of the C++ object model, and the methods exposed to Lua that I need to document are a subset of the methods available in C++. I want to document only the items relevant to Lua, and ignore the rest.
For example, the object BfObject is the root of all the Lua objects, but is itself in the middle of the C++ object tree. BfObject has about 40 C++ methods, of which about 10 are relevant to Lua scripters. I wish to have our documentation show BfObject as the root object, and show only those 10 relevant methods. We would also need to show its children objects in a way that made the inheritance of methods clear.
For the moment we can assume that all the code is written in C++.
One idea would be to somehow mark the objects we want to document in a way that a system such as doxygen would know what to look at and ignore the rest. Another would be to preprocess the C++ code in such a way as to delete all the non-relevant bits, and document what remains with something like doxygen. (I actually got pretty far with this approach using luadoc, but could not find a way to make luadoc show object hierarchy.)
One thing that might prove helpful is that every Lua object class is registered in a consistent manner, along with its parent class.
There are a growing number of games out there that use Lua for scripting, and many of them have decent documentation. Does anyone have a good suggestion on how to produce it?
PS To clarify, I'm happy to use any tool that will do the job -- doxygen and luadoc are just examples that I am somewhat familiar with.
I have found a solution, which, while not ideal, works pretty well. I cobbled together a Perl script which rips through all the Bitfighter source code and produces a second set of "fake" source that contains only the elements I want. I can then run this secondary source through Doxygen and get a result that is 95% of what I'm looking for.
I'm declaring victory.
One advantage of this approach is that I can document the code in a "natural" way, and don't need to worry about marking what's in and what's out. The script is smart enough to figure it out from the code structure.
If anyone is interested, the Perl script is available in the Bitfighter source archive at https://code.google.com/p/bitfighter/source/browse/luadoc.pl. It is only about 80% complete, and is missing a few very important items (such as properly displaying function args), but the structure is there, and I am satisfied the process will work. The script will improve with time.
The (very preliminary) results of the process can be seen at http://bitfighter.org/luadocs/index.html. The templates have hardly been modified, so it has a very "stock" look, but it shows that things more-or-less work.
Since some commenters have suggested that it is impossible to generate good documentation with Doxygen, I should note that almost none of our inline docs have been added yet. To get a sense of what they will look like, see the Teleporter class. It's not super good, but I think it does refute the notion that Doxygen always produces useless docs.
My major regret at this point is that my solution is really a one-off and does not address what I think is a growing need in the community. Perhaps at some point we'll standardize on a way of merging C++ and Lua and the task of creating a generalized documentation tool will be more manageable.
PS You can see what the markup in the original source files looks like... see https://code.google.com/p/bitfighter/source/browse/zap/teleporter.cpp, and search for #luaclass
Exclude either by namespace (could be class as well) of your C++ code, but not the lua code
EXCLUDE_SYMBOLS = myhier_cpp::*
in the doxygen config file or cherry pick what to exclude by using
/// #cond
class aaa {
...
...
}
/// #endcond
in your c++ code.
I personally think that separating by namespace is better since it reflects the separation in code + documentation, which leads to a namespace based scheme for separation of pure c++ from lua bindings.
Separating via exclusion is probably the most targeted approach but that would involve an extra tool to parse the code, mark up relevant lua parts and add the exclusion to the code. (Additionally you could also render special info like graphs separately with this markup and add them via an Image to your documentation, at least that's easy to do with Doxygen.). Since there has to be some kind of indication of lua code, the markup is probably not too difficult to derive.
Another solution is to use LDoc. It also allows you to write C++ comments, which will be parsed by LDoc and included into the documentation.
An advantage is that you can just the same tool to document your lua code as well. A drawback is that the project seems to be unmaintained. It may also not be possible to document complex object hierarchies, like the questioner mentioned.
I forked it myself for some small adjustments regarding c++. Have a look here.

XML library optimized for big XML with memory constraints

I need to handle big XML files, but I want to make relatively small set of changes to it. I also want the program to adhere strict memory constraints. We must never use more than, say, 300Mb of ram.
Is there a library that allows me not to keep all the DOM in memory, and parse the XML on the go, while I traverse the DOM?
I know you can do that with call-back based approach, but I don't want that. I want to have my cake and eat it too. I want to use the DOM API, but to parse each element lazily, so that existing code that use the DOM API won't have to change.
There are two possible approaches I thought of for this problem:
Parse the lazily XML, each call to getChildren() will parse the next bit of XML.
Parse the entire XML tree, but cache whatever you're not using right now on the disk.
Two of the approaches are acceptable, is there an existing solution.
I'm looking for a native solution, but I'll be interested with hearing about libraries in other languages.
It sounds like what you want is something similar to the Streaming API for XML (StAX).
While it does not use the standard DOM API, it is similar in principle to your "getChildren()" approach. It does not have the memory overheads of the DOM approach, nor the complexity of the callback (SAX) approach.
There are a number of implementations linked on the Wikipedia page for StAX most of which are for Java, but there are a couple for C++ too - Ambiera irrXML and Llamagraphics LlamaXML.
edit: Since you mention "small changes" to the document, if you don't need to use the document contents for anything else, you might also consider Streaming Transformations for XML (STX) (described in this XML.com introduction to STX). STX is to XSLT something like what SAX/StAX is to DOM.
I want to use the DOM API, but to parse each element lazily, so that existing code that use the DOM API won't have to change.
You want a streaming DOM-style API? Such a thing generally does not exist, and for good reason: it would be difficult if not impossible to make it actually work.
XML is generally intended to be read one-way: from front to back. What you're suggesting would require being able to random-access an XML file.
I suppose you could do something where you build a table of elements, with file offsets pointing to where that element is in the file. But at that point, you've already read and parsed the file more or less. Unless most of your data is in text elements (which is entirely possible), you may as well be using a DOM.
Really, you would be much better off just rewriting your existing code to use an xmlReader or SAX-style API.
How to do streaming transformations is a big, open, unsolved problem. There are numerous partial solutions, depending on what restrictions you are prepared to accept. Current releases of Saxon-EE, for example, have the capability to do some XSLT transformations in a streaming fashion: see http://www.saxonica.com/html/documentation/sourcedocs/streaming.html. Also, as already mentioned, there is STX (though implementations are not especially mature).
Your title suggests you want to write the transformation in C++. That's severely limiting, because it pretty well means the programmer has to cope with the complexities rather than leaving it to the transformation engine. You can of course hand-code streaming transformations using SAX-like or StAX-like parser APIs, but both are hard work, and each case will need to be approached from scratch.
Google for "streaming XML transformation"

Python-style pickling for C++?

Does anyone know of a "language level" facility for pickling in C++? I don't want something like Boost serialization, or Google Protocol Buffers. Instead, something that could automatically serialize all the members of a class (with an option to exclude some members, either because they're not serializable, or else because I just don't care to save them for later). This could be accomplished with an extra action at parse time, that would generate code to handle the automatic serialization. Has anyone heard of anything like that?
I don't believe there's any way to do this in a language with no run-time introspection capabilities.
something that could automatically
serialize all the members of a class
This is not possible in C++. Python, C#, Java et al. use run-time introspection to achieve this. You can't do that in C++, RTTI is not powerful enough.
In essence, there is nothing in the C++ language that would enable someone to discover the member variables of an object at run-time. Without that, you can't automatically serialize them.
There's the standard C++ serialization with the << and >> operators, although you'll have to implement these for each of your classes (which it sounds like you don't want to do). Some practitioners say you should alway implement these operators, although of course, most of us rarely do.
perhaps xml Data Binding? gsoap is just one of many options. You can automatically generate code for mapping between data structure and xml schema. Not sure that setting this up would be easier than other options you mention
One quick way to do this that I got working once when I needed to save a struct to a file was to cast my struct to a char array and write it out to a file. Then when I wanted to load my struct back in, I would read the entire file (in binary mode), and cast the whole thing to my struct's type. Easy enough and exploits the fact that structs are stored as a contiguous block in memory. I wouldn't expect this to work with convoluted data structures or pointers, though, but food for thought.

Converting from XML to a C++ Object

I'm working on a C++ project, and wanted to get some inputs from developers with similar experience.
The task is to connect to a web service which gives the results in an XML form. My role in the task is once I receive the XML form, I need to convert the XML into a C++ object and parse the XML data to the C++ object.
Following are my clarifications.
a) One way is to handcraft the whole thing but I need to do this for around hundreds of web services. I am aware there are simpler tools for C# and Java to do the same.
Is there a tool/utility for C++ too?
Any suggestions, would be helpful.
In the past, I've used TinyXML for my XML parsing needs. My parsing code operated under the assumption that all XML input conforms to a particular XSD schema I wrote. It worked fairly well but the ripple effects were annoying - if I wanted to change the XSD, I had to update all my XML test files as well as my parsing code. While it's not so bad in the case of parsing one schema, I'd hate to have to do it for hundreds of them.
I'm not sure what the common solution is, but CodeSynthesis XSD sounds pretty promising. I haven't used it, but it appears that it generates a data layer, a parser and serialisation code for you. Could save you a lot of time.
If you're asking if there's a way to dynamically create an object representation of an XML data stream (such that you can access it like topLevel.subObject.value), it's not possible. C++ is a statically-typed language, which means all objects need to be defined a compile time. The best you could do is something like: xmlData.getSubObject("objectName").getValue().
As for toolsets for parsing into something usable dynamically (as per my later example), there are several. For Windows, for example, you could use the "built-in" MSXML objects. There's nothing in the base C++ libraries to do so, however, as far as I am aware.
Hope that helps.

Using strings with "general purpose" XML in WS - good or bad?

We're working now on the design of a new API for our product, which will be exposed via web services. We have a dispute whether we should use strict parameters with well defined types (my opinion) or strings that will contain XML in whatever structure needed. It is quite obvious that ideally using a strict signature is safer, and it will allow our users to use tools like wsdl2java. OTOH, our product is developing rapidly, and if the parameters of a service will have to be changed, using XML (passed as a string or anyType - not complex type, which is well defined type) will not require the change of the interface.
So, what I'm asking for is basically rule of thumb recommendations - would you prefer using strict types or flexible XML? Have you had any significant problems using either way?
Thanks,
Eran
I prefer using strict types. That gives you access to client tools that make that end of the job much easier. You also state that if the messaging changes, the string approach will not require changing the interface. Personally, I see this as a disadvantage, not an advantage. If the interface changes, you will know very quickly which clients need to be updated.
Strings containing XML is an extremely bad idea and asking for trouble. Use messages that have a defined schema.I had to rewrite significant portions of an app that used a lot of XML internally instead of types. It was horribly slow and impossible to figure out what was happening.