Lightweight C++ SAX XML parser

I know of at least three lightweight C++ XML parsers: RapidXML, TinyXML and PugiXML. However, all three use a DOM-based interface (i.e., they build their own in-memory representation of the XML document and then provide an interface to traverse and manipulate it). For most situations that I have to deal with, I much prefer the SAX interface (where the parser just spits out a stream of events like start-of-tag, and the application code is responsible for doing whatever it wants based on those events).
Can anyone recommend a lightweight C++ XML library with a SAX interface?
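To make the push-style (SAX) model concrete, here is a minimal sketch in standard C++17. The names SaxHandler, parseSax and EventRecorder are hypothetical and not taken from any of the libraries mentioned. The toy parser only understands start, end and empty tags plus text; it skips attributes and does not handle comments, CDATA, entities or DTDs.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical SAX-style handler: the parser "pushes" each event into it.
struct SaxHandler {
    virtual ~SaxHandler() = default;
    virtual void startElement(std::string_view name) = 0;
    virtual void endElement(std::string_view name) = 0;
    virtual void characters(std::string_view text) = 0;
};

// Toy push parser: start/end/empty tags and text only; attributes are skipped.
// Returns false on an unterminated or empty tag.
inline bool parseSax(std::string_view xml, SaxHandler& h) {
    std::size_t i = 0;
    while (i < xml.size()) {
        if (xml[i] != '<') {                         // text run up to the next '<'
            std::size_t end = xml.find('<', i);
            if (end == std::string_view::npos) end = xml.size();
            h.characters(xml.substr(i, end - i));
            i = end;
            continue;
        }
        std::size_t close = xml.find('>', i);
        if (close == std::string_view::npos) return false;
        std::string_view tag = xml.substr(i + 1, close - i - 1);
        i = close + 1;
        if (tag.empty()) return false;
        if (tag[0] == '/') {                         // </name>
            h.endElement(tag.substr(1));
            continue;
        }
        bool selfClosing = tag.back() == '/';        // <name/> or <name a="b"/>
        if (selfClosing) tag.remove_suffix(1);
        std::size_t n = 0;                           // name ends at first whitespace
        while (n < tag.size() && !std::isspace(static_cast<unsigned char>(tag[n]))) ++n;
        std::string_view name = tag.substr(0, n);
        h.startElement(name);
        if (selfClosing) h.endElement(name);
    }
    return true;
}

// Example handler that simply records the events it receives.
struct EventRecorder : SaxHandler {
    std::vector<std::string> events;
    void startElement(std::string_view n) override { events.push_back("start:" + std::string(n)); }
    void endElement(std::string_view n) override { events.push_back("end:" + std::string(n)); }
    void characters(std::string_view t) override { events.push_back("text:" + std::string(t)); }
};
```

The application decides what each event means; nothing is kept after the callback returns, which is why SAX parsers can handle documents larger than memory.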
Edit: I should also note the Microsoft XmlLite library, which does use a SAX interface (well, actually a "pull" interface which is possibly even better). Unfortunately, it's ruled out for me at the moment since as far as I know it's closed source and Windows only (please correct me if I'm wrong on this).
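For comparison, a pull interface like the one XmlLite offers inverts control: the application asks the reader for the next event. The PullReader class below is a hypothetical C++17 sketch, not XmlLite's actual API, with the same toy grammar (tags and text only, attributes skipped).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical pull reader: the application calls next() for each event
// instead of the parser calling back into the application.
class PullReader {
public:
    enum class Kind { StartElement, EndElement, Text, Done, Error };

    explicit PullReader(std::string_view xml) : xml_(xml) {}

    Kind next() {
        if (pendingEnd_) {                           // second half of <empty/>
            pendingEnd_ = false;
            return Kind::EndElement;                  // value_ still holds the name
        }
        if (pos_ >= xml_.size()) return Kind::Done;
        if (xml_[pos_] != '<') {                     // text up to the next '<'
            std::size_t end = xml_.find('<', pos_);
            if (end == std::string_view::npos) end = xml_.size();
            value_ = xml_.substr(pos_, end - pos_);
            pos_ = end;
            return Kind::Text;
        }
        std::size_t close = xml_.find('>', pos_);
        if (close == std::string_view::npos || close == pos_ + 1) return Kind::Error;
        std::string_view tag = xml_.substr(pos_ + 1, close - pos_ - 1);
        pos_ = close + 1;
        if (tag[0] == '/') { value_ = tag.substr(1); return Kind::EndElement; }
        if (tag.back() == '/') { tag.remove_suffix(1); pendingEnd_ = true; }
        std::size_t n = 0;                           // element name ends at whitespace
        while (n < tag.size() && !std::isspace(static_cast<unsigned char>(tag[n]))) ++n;
        value_ = tag.substr(0, n);
        return Kind::StartElement;
    }

    // Name of the current element, or the content of the current text node.
    std::string_view value() const { return value_; }

private:
    std::string_view xml_;
    std::size_t pos_ = 0;
    std::string_view value_;
    bool pendingEnd_ = false;
};
```

A typical loop calls next() until it returns Kind::Done or Kind::Error and inspects value() after each event. Because the application owns the loop, it can stop early or hand the reader to a helper function for a sub-tree.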

I've used expat when I needed to parse XML. It's very lightweight (well, it used to be; it's been a while since I've done XML stuff) and does the job.

You can try https://github.com/thinlizzy/die-xml; it seems to be very small and easy to use.
This is a recently written, open-source C++0x XML SAX parser, and the author welcomes feedback.
It parses an input stream and generates events through callbacks compatible with std::function.
The stack machine uses finite automata as a backend, and some events (start tags and text nodes) use iterators in order to minimize buffering, making it pretty lightweight.
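The callback style described above can be sketched as follows. This illustrates the idea under stated assumptions and is not die-xml's actual API: XmlCallbacks and parseStream are hypothetical names, the parser is a small finite-state machine reading one character at a time from a std::istream, attributes are skipped, and a '>' inside a quoted attribute value is not handled.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical callback set: each event is a std::function; unset ones are skipped.
struct XmlCallbacks {
    std::function<void(const std::string&)> onStart;
    std::function<void(const std::string&)> onEnd;
    std::function<void(const std::string&)> onText;
};

// Finite-state machine over a stream. Returns false if the stream ends inside a tag.
inline bool parseStream(std::istream& in, const XmlCallbacks& cb) {
    enum class State { Text, TagName, TagRest };
    State state = State::Text;
    std::string buf;          // pending text, or the tag name being read
    bool closing = false;     // tag started with "</"
    bool selfClose = false;   // last character before '>' was '/'

    auto fire = [](const std::function<void(const std::string&)>& f, const std::string& s) {
        if (f) f(s);          // unset callbacks are ignored
    };
    auto finishTag = [&] {
        if (closing) {
            fire(cb.onEnd, buf);
        } else {
            fire(cb.onStart, buf);
            if (selfClose) fire(cb.onEnd, buf);
        }
        buf.clear();
        state = State::Text;
    };

    char c;
    while (in.get(c)) {
        switch (state) {
        case State::Text:
            if (c == '<') {
                if (!buf.empty()) fire(cb.onText, buf);
                buf.clear();
                closing = selfClose = false;
                state = State::TagName;
            } else {
                buf += c;
            }
            break;
        case State::TagName:
            if (c == '/' && buf.empty() && !closing) closing = true;
            else if (c == '>') finishTag();
            else if (c == '/') { selfClose = true; state = State::TagRest; }
            else if (std::isspace(static_cast<unsigned char>(c))) state = State::TagRest;
            else buf += c;
            break;
        case State::TagRest:  // attributes: skipped in this sketch
            if (c == '>') finishTag();
            else selfClose = (c == '/');
            break;
        }
    }
    if (state != State::Text) return false;
    if (!buf.empty()) fire(cb.onText, buf);
    return true;
}
```

Using std::function lets callers pass lambdas, free functions or bound member functions interchangeably, at the cost of a possible indirect call per event.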

PugiXML and RapidXML do not have DOM-conforming interfaces; those APIs come with severe limitations in functionality and conformance. You might want to investigate VTD-XML, which is significantly more advanced than either DOM or SAX/Pull.

Related

Streaming/progressive C++ XML creation library?

I'm looking for an XML library that writes out the XML stream as it goes. I've looked at TinyXML, pugixml, etc. and it seems these only write the stream when the entire DOM is built in memory. I want a library that will write each object as soon as all children and attributes are available. Is there such a thing?
The word you're looking for is SAX.
Xerces is one such C++ SAX library. If you're in the MS world then MSXML supports SAX2 too.
I wrote my own library in the end. I'm willing to share the source if it's of interest to anyone - it's a little clunky and minimal though.
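A minimal sketch of such a streaming writer, using only the C++17 standard library. XmlStreamWriter is a hypothetical name, not the API of Xerces, MSXML or any other library. Only the names of still-open elements are kept in memory; a start tag stays open just long enough to accept attributes and becomes <name/> if the element turns out to be empty.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical streaming writer: output goes to the stream as soon as it is known.
class XmlStreamWriter {
public:
    explicit XmlStreamWriter(std::ostream& out) : out_(out) {}

    void startElement(std::string_view name) {
        closeStartTag();
        out_ << '<' << name;
        open_.emplace_back(name);
        startTagOpen_ = true;              // attributes may still follow
    }
    void attribute(std::string_view name, std::string_view value) {
        if (!startTagOpen_) return;        // only valid right after startElement()
        out_ << ' ' << name << "=\"";
        escape(value);
        out_ << '"';
    }
    void text(std::string_view value) {
        closeStartTag();
        escape(value);
    }
    void endElement() {
        if (open_.empty()) return;
        if (startTagOpen_) {               // no children: emit <name/>
            out_ << "/>";
            startTagOpen_ = false;
        } else {
            out_ << "</" << open_.back() << '>';
        }
        open_.pop_back();
    }

private:
    void closeStartTag() {
        if (startTagOpen_) { out_ << '>'; startTagOpen_ = false; }
    }
    void escape(std::string_view s) {
        for (char c : s) {
            switch (c) {
            case '&': out_ << "&amp;"; break;
            case '<': out_ << "&lt;"; break;
            case '>': out_ << "&gt;"; break;
            case '"': out_ << "&quot;"; break;
            default: out_ << c;
            }
        }
    }

    std::ostream& out_;
    std::vector<std::string> open_;        // names of elements not yet closed
    bool startTagOpen_ = false;
};
```

For example, startElement("item"), attribute("id", "1"), text("a < b"), endElement() writes <item id="1">a &lt; b</item> directly to the stream, so memory use depends only on nesting depth, not document size.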

XML file Generator

Any recommendations for XML generators in C++?
There are quite a few XML generators for C++. Some of them work with DOM, others can serialize your classes, and yet others work in entirely different ways, like Boost.PropertyTree. Which one you should choose depends entirely on your requirements.
If you need to write a small set of data to an XML file (and may also want to write this data to other formats in the end), consider using Boost.PropertyTree. If you want to serialize C++ classes to XML, or make C++ class representation of XSD Schemas, consider using a binding generator such as CodeSynthesis XSD. And if you just want to manipulate the XML directly, you can use a DOM parser/writer like the cross-platform Xerces C++.
MSXML is a sensible option if you're limiting your application to Windows.
Xerces could prove useful if you're wanting to write code that can be ported to other platforms.
I'm sure a few already exist, but if you were inclined to implement your own, it would be an easy task using Boost.Spirit.Karma.

Single file non-validating xml parser/reader

I'm looking for a simple non-validating XML parser in either C or C++.
Several years back I found one that was just a single-file solution, but I can't find it anymore.
I'm after some links and suggestions for parsers that are very small and lightweight, ideally suited for an embedded platform.
Expat
You can work with or without validation and in "streaming mode". It is very lightweight.
What about something like pugixml. From their site...
pugixml is a light-weight C++ XML processing library. It features:
DOM-like interface with rich traversal/modification capabilities
Extremely fast non-validating XML parser which constructs the DOM tree from an XML file/buffer
XPath 1.0 implementation for complex data-driven tree queries
Full Unicode support with Unicode interface variants and automatic encoding conversions
The library is extremely portable and easy to integrate and use.
pugixml is developed and maintained since 2006 and has many users. All code is distributed under the MIT license, making it completely free to use in both open-source and proprietary applications.
Also, this answer has more info.
There is also tinyxml and RapidXml.
There is definitely a tiny, pure-C XML parser available. It was cited in an earlier answer on SO, but I can't find it right now. If I remember right, it's just a few hundred lines of code.
Update: Here's the question/answer that references it:
Is there a good tiny XML parser for an embedded C project?
And the actual code:
http://mercurial.intuxication.org/hg/cstuff/file/tip/tinyxml
RapidXML is a single-header (multiple headers if you want extra functionality), ultra-lightweight, ultra-fast implementation. It can operate in "destructive" mode, which means it sets pointers directly into the XML buffer and may overwrite parts of it, avoiding all extra memory allocations and data copies.
tinyxml is not strictly single-header, but it is still fairly lightweight compared to other parsers. I've used it for half a decade without ever encountering an issue. The author has recently started work on "tinyxml-2", which is supposedly better and even more lightweight, but I haven't had occasion to try that one yet.
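In-situ ("destructive") processing can avoid allocations because decoding never makes text longer, so the result can be written over the source buffer. The function below is an illustrative C++17 sketch, not RapidXML's code: decodeInPlace is a hypothetical name, and only the five predefined XML entities are handled (numeric character references are not).

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Decodes predefined entities inside [begin, end) in place and returns a view of
// the decoded text. The write cursor never overtakes the read cursor, so no
// separate output buffer is needed; the original bytes are overwritten.
inline std::string_view decodeInPlace(char* begin, char* end) {
    static const struct { const char* entity; char ch; } table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'}, {"&quot;", '"'}, {"&apos;", '\''}};
    char* out = begin;
    for (char* in = begin; in < end;) {
        bool replaced = false;
        if (*in == '&') {
            for (const auto& e : table) {
                std::size_t len = std::strlen(e.entity);
                if (static_cast<std::size_t>(end - in) >= len &&
                    std::memcmp(in, e.entity, len) == 0) {
                    *out++ = e.ch;         // entity collapses to one character
                    in += len;
                    replaced = true;
                    break;
                }
            }
        }
        if (!replaced) *out++ = *in++;     // ordinary or unrecognised characters copied
    }
    return std::string_view(begin, static_cast<std::size_t>(out - begin));
}
```

The returned view points into the caller's buffer, which is also why nodes in such a parser "do not own" their text: the buffer must outlive the parsed tree.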
Regarding http://mercurial.intuxication.org/hg/cstuff/file/tip/tinyxml: can this parser work with nested XML like
<CServiceType>
<serno>61</serno>
<caption1 />
<caption2>Satelite</caption2>
<caption3 />
</CServiceType>

Any native XML DOM on Windows with control over memory allocation?

I'm looking to replace MSXML with a library that will allow us to use DOM processing but with our own allocation, so we can ensure it is mapped directly onto a memory-mapped file. This avoids having to sync the DOM back to the file. Can anyone suggest which of the various libraries out there is most likely to be easily customised in this manner?
We are using simple XPaths as well as hierarchical DOM navigation. As a secondary preference, we would like it to have an API close to the .Net DOM classes, to keep application code similar.
I am quite capable of customising or wrapping libraries if necessary, having written expatpp the OO wrapper for expat. In benchmarks, it seems RapidXML and LibXML2 are ahead of expat in performance and include DOM code which I'd otherwise have to write. Another contender is pugixml.
It sounds like RapidXML is already close to what I need, judging from this comment in the manual: "nodes and attributes do not own the text of their names and values. This is because normally they only store pointers to the source text."
Have a look also at pugixml's manual, especially the section on custom memory allocation/deallocation functions.
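For the memory-mapped DOM scenario, one approach is to route all of the library's allocations into a single region. Below is a hedged sketch of a bump (arena) allocator in standard C++17. The names Arena, arenaAllocate and arenaDeallocate are hypothetical, and a fixed member array stands in for the memory-mapped file. The free functions use a plain void*(size_t) / void(void*) shape; check the pugixml manual for the exact hook signature before wiring them in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bump allocator over one fixed region (here a member array; in the scenario
// above it would be the mapped file, which is an assumption of this sketch).
class Arena {
public:
    void* allocate(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t start = (used_ + align - 1) / align * align;  // round up
        if (start > sizeof(region_) || size > sizeof(region_) - start)
            return nullptr;                                            // out of space
        used_ = start + size;
        return region_ + start;
    }
    void reset() { used_ = 0; }   // releases everything at once

private:
    alignas(std::max_align_t) unsigned char region_[64 * 1024];
    std::size_t used_ = 0;
};

inline Arena& globalArena() {
    static Arena arena;           // stand-in for the memory-mapped region
    return arena;
}

// C-style hooks cannot carry a context pointer, hence the global instance.
inline void* arenaAllocate(std::size_t size) { return globalArena().allocate(size); }
inline void arenaDeallocate(void*) { /* no-op: individual frees are ignored */ }
```

Since individual frees are no-ops, this only suits a document that is built once and discarded as a whole. Also note that a DOM stored this way contains absolute pointers, so the file would have to be mapped at the same address each time for the stored tree to be reusable.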
Pugixml has XPath support and is actively maintained.

Best XML serialization library for a MFC C++ app

I have an application written in C++ using the MFC and Stingray libraries. The application works with a wide variety of large data types, which are all currently serialized using MFC's Document/View serialization functionality. I have also added options for XML serialization based on the Stingray libraries, which implement DOM via the Microsoft XML SDK. While this was easy to implement, the performance is terrible, to the extent that it is unusable on anything other than very small documents.
What other XML serialization tools would you folks recommend for this scenario? I don't want DOM, as it seems to be a memory hog, and I'm already dealing with large in-memory data. Ideally, I'd like a streaming parser that is fast and easy to use with MFC. My current front runner is expat, which is fast and simple, but would require a lot of class-by-class serialization code to be added. Are there any other efficient and easier-to-implement alternatives out there that people would recommend?
The Boost Serialization library supports XML. This library basically sets out to:
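For reference, hand-written class-by-class serialization of the kind described above might look like this minimal C++17 sketch. The names writeField, writeXml, Point and Shape are hypothetical; values are written with operator<< and text is not escaped, so this only illustrates how much per-class code is involved.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// One line per field: <name>value</name>, using the value's operator<<.
template <typename T>
void writeField(std::ostream& os, const char* name, const T& value) {
    os << '<' << name << '>' << value << "</" << name << '>';
}

struct Point { double x = 0, y = 0; };
struct Shape { std::string label; std::vector<Point> points; };

// Every serializable class needs its own writer like these.
inline void writeXml(std::ostream& os, const Point& p) {
    os << "<Point>";
    writeField(os, "x", p.x);
    writeField(os, "y", p.y);
    os << "</Point>";
}

inline void writeXml(std::ostream& os, const Shape& s) {
    os << "<Shape>";
    writeField(os, "label", s.label);
    for (const Point& p : s.points) writeXml(os, p);   // nested objects reuse their writer
    os << "</Shape>";
}
```

Writing is the easy half; reading the same format back with a streaming parser such as expat needs a matching per-class state machine, which is where most of the extra code tends to go.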
Start from the principles of MFC serialization and take all the good things it provides.
Solve every single issue of MFC serialization!
Among the improvements compared to MFC is support for XML.
Note that you don't necessarily control the XML schema of this serialization. It uses its own schema.
This is an age old problem. I was the team lead of the development team with the most critical path dependencies on the largest software project in the world during 1999 and 2000 and this very issue was the focus of my work during that time. I am convinced that the wheel was invented by multiple engineers who were unaware that others had already invented it. The same is true of XML Data binding in C++. I invented it too, and I've been perfecting it for over 10 years on various projects. I have a solution that addresses the issues noted here and some additional issues that repeatedly arise:
XML Updates. This is the ability to re-apply a subset of XML into an existing object model. In many cases the XML is bound to indexed objects and we cannot afford to re-index for each update.
COM and CORBA interface management. In the same way that XML data binding can be automated through object-oriented practices, so can the instances of interface objects that provide that data to the application layer.
State Tracking. The application often needs to distinguish between an empty value and a missing value, since both show up as an empty string. This provides validation along with data binding.
The source code uses the least restrictive license, less restrictive than the GPL. The project is supported and managed here:
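The update and state-tracking points above can be illustrated with std::optional. This is a minimal C++17 sketch with hypothetical names (Attributes, lookup, Customer, applyUpdate), not XMLFoundation's API: an attribute missing from the XML leaves the existing field untouched, while an attribute that is present but empty clears it.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

using Attributes = std::map<std::string, std::string>;

// nullopt means "not in the XML"; an engaged empty string means "present but empty".
inline std::optional<std::string> lookup(const Attributes& attrs, const std::string& key) {
    auto it = attrs.find(key);
    if (it == attrs.end()) return std::nullopt;
    return it->second;
}

struct Customer {
    std::optional<std::string> middleName;   // nullopt = never supplied
};

// Re-applying a partial update: only attributes present in the XML overwrite the
// existing object, so the object does not have to be rebuilt (or re-indexed).
inline void applyUpdate(Customer& c, const Attributes& attrs) {
    if (auto v = lookup(attrs, "middleName")) c.middleName = *v;
}
```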
http://www.codeproject.com/KB/XML/XMLFoundation.aspx
Now that it's the year 2010, I believe that nobody else will attempt to reinvent the wheel because there are a few to choose from. IMHO - this wheel is the most polished and well rounded implementation available.
Enjoy.
A good solution would be libxml. It provides lightweight SAX parsing and data structures for XML processing. There are several DOM libraries which are built on top of libxml.
Unfortunately it is a C library, but C++ wrappers are available.
A few years ago I switched from MSXML to libxml because of the performance issues you mentioned.
If you decide to use libxml, you should also take a look at libxslt.
We use Xerces-C++. It was easy to set up, and performance is good enough that we don't need to think about changing. However, we aren't XML-heavy.
I did listen to a Hanselminutes podcast by Scott Hanselman where they discuss the XML performance of MSXML and XSLT.
What about RapidXML? I am using it in an MFC app, with some modifications to support UTF-16 with std::string. I am quite satisfied with it so far.
The gSOAP toolkit auto-serializes native C and C++ data to/from XML and supports the full XML schema specification through XML data bindings:
gSOAP SourceForge Project
It has evolved since 1999 into a significant code base with code generation tools and libraries. It supports many databinding and customization features, which are especially critical for mapping XML schema types to/from the C and C++ types. It can serialize any C/C++ type and also STL containers, container templates, and cyclic data structures. It has been used in the W3C Schema Patterns for Databinding working group (with 100% schema pattern coverage success for years). There is an active open source user base, and the gSOAP development functionality has been used in many industrial projects and by Fortune 100 companies to develop SOAP/XML infrastructures.
This is late in the game, but I just want to mention that we also use libxml. It's robust and reliable, and has worked well. It is a little too low-level, though; you'll want to build some wrappers on top of its functions.
For instance, you'll get a different sequence of function returns depending on whether you have this:
<tag attribute="value"/>
or this:
<tag attribute="value"> </tag>
Sometimes you may want that, sometimes you don't care.
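One such wrapper can normalize the two forms. The sketch below assumes events have already been collected into a vector (the Event type is hypothetical, not libxml2's API) and drops whitespace-only text, so both inputs produce the same start/end sequence. Whether that is appropriate depends on whether whitespace is significant in your documents.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical event record; adapt it to whatever your callbacks receive.
struct Event {
    enum Kind { Start, Text, End } kind;
    std::string data;
    bool operator==(const Event& o) const { return kind == o.kind && data == o.data; }
};

inline bool isBlank(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isspace(c) != 0; });
}

// <tag .../> gives Start+End, while <tag ...> </tag> gives Start+Text(" ")+End;
// removing whitespace-only Text events makes the two sequences identical.
inline std::vector<Event> dropBlankText(const std::vector<Event>& in) {
    std::vector<Event> out;
    for (const Event& e : in)
        if (!(e.kind == Event::Text && isBlank(e.data))) out.push_back(e);
    return out;
}
```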
We use TinyXML for all our XML needs be it MFC or straight C++.
http://sourceforge.net/projects/tinyxml