C/C++ XML library question

I know that a lot of c/c++ XML library questions have been asked already (I tried to read through all of them before getting to this).
Here are the things I'm going to need in my own project:
Excellent performance
SAX2
Validation
Open source
Cross platform
I was going to use Xerces-C, but I see that a simple SAX2 setup with nothing going on in the filter is taking 5 seconds to run. (Perhaps I'm doing something wrong here?)
I would like to use libxml++, but as I tried to get it set up on my MacBook, there were some crazy dependencies that took me all the way back to gtk-doc, at which point I sort of tabled the idea.
So now I'm at libxml2. Is this the way to go? Have I missed an important option, bearing in mind the five requirements above? I don't mind using a (good) C library like libxml2, but a C++ interface would be nice. (I don't like Xerces-C's API very much.)
I am willing to bend on the SAX2 requirement if comparable functionality is available.

Having spent a goodly amount of time on this same problem, it was my conclusion that libxml2 is the best option available under your guidelines. The C interface is not too difficult to use and it's very fast.
There are some other good options for commercial libraries, but most of the other comparable open-source options are either painfully slow or are mired in a deep, annoying vat of dependency soup.

You say you need these things in your project, but don't give any idea of the pipeline. For example, we had a whole load of static XML files which needed to be loaded quickly, but only validated rarely. So we validated them in a separate batch process (using RELAX NG, as it is human-writable markup) and loaded the XML using Expat. The system also used XMPP, so we checked the streaming input, but that didn't require validating against a schema (partly because it was streamed, and mostly because most of the possible errors were not expressible in a schema).

If you need a whole host of other facilities, you can consider Qt, which has good XML support. Be warned though, it's WAY more than an XML processing library; it's a full blown application framework with support for GUIs, networking and a whole host of other things.
You can also try Poco. It's another application framework, but not as huge as Qt (i.e. no GUI-related things etc.)
Lastly, if you don't mind a C library, you can use Expat. It's not SAX per se, but writing code using Expat is somewhat like SAX. It has C++ wrappers, but they're not officially part of the project IIRC, and may not be as well-maintained or designed. I'm not too sure though.
Hope this helps!
EDIT: I misread your original post: not too sure about the validation features of these libraries, I've never used them before.

Deciding on a language/framework for a modular OpenCV application

What's this about?
We have a C++ application dealing with image processing and computer vision on videos using OpenCV, we're going to rewrite it from scratch and need some help deciding what technologies to use. More specifically, I need help on how to choose the technology I'd use.
About the app
The app's functionality is divided into modules that are called in an order defined by a configuration XML file. The order can also be changed at runtime, but not in realtime (i.e. the application doesn't need to close, but the processing will start from scratch). These modules share data in a central datapool.
Why are we starting from scratch?
This application wasn't planned to be as dynamic as it currently strives to be, so it has grown into a collection of buggy patches, macros and workarounds. It's now full of memory leaks, unnecessary Qt dependencies and slow conversions between Qt and OpenCV image formats, and compilation and testing times have grown too much.
Language choice
The original code used C++, just because the guy who originally started the project only knew C++. This may be a good choice, because we need it to be as fast as possible, but there may be better choices to account for the dynamic nature of the application.
We're limited by the languages supported by OpenCV (C++, Java and Python mainly; although I've read there is also support for Ruby, Ch, C# and any JVM language)
What is needed
Speed: We're aiming for realtime tracking. This may rule out Python and Ruby.
Class Instantiation by name: Although our C++ macros and class registration system work, a language designed to be dynamic that has its own runtime would be nice. Maybe Objective-C++, or Java.
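For comparison, instantiating classes by name in plain C++ usually comes down to a map from strings to factory functions. The sketch below is only a minimal illustration of that idea, not the application's actual registration system: `Module`, `ModuleRegistry`, `BlurModule`, `TrackerModule` and `makeDefaultRegistry` are all hypothetical names, and it assumes C++14 for `std::make_unique`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class for a processing module.
struct Module {
    virtual ~Module() = default;
    virtual std::string name() const = 0;
};

// Maps a class name to a factory that creates an instance of that class.
class ModuleRegistry {
public:
    using Factory = std::function<std::unique_ptr<Module>()>;

    // Registers a factory; returns false if the name is already taken.
    bool add(const std::string& name, Factory factory) {
        return factories_.emplace(name, std::move(factory)).second;
    }

    // Creates a module by name; returns nullptr for an unknown name.
    std::unique_ptr<Module> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end()) {
            return nullptr;
        }
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Two hypothetical modules, used only for illustration.
struct BlurModule : Module {
    std::string name() const override { return "blur"; }
};

struct TrackerModule : Module {
    std::string name() const override { return "tracker"; }
};

// Builds a registry that knows both example modules.
ModuleRegistry makeDefaultRegistry() {
    ModuleRegistry registry;
    const bool blurAdded =
        registry.add("blur", [] { return std::make_unique<BlurModule>(); });
    const bool trackerAdded =
        registry.add("tracker", [] { return std::make_unique<TrackerModule>(); });
    assert(blurAdded && trackerAdded);
    return registry;
}
```

A configuration loader could then call `create(name)` for each module name it reads from the XML file and treat a null result as an unknown module.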
What would be ideal
Module/Plugin/Extension/Component Framework: Why reinvent the wheel? Using a good framework for this would let us focus on what's special about our app. There are many options here. Objective-C has its NSBundles; C++ has libraries like Boost.Extension, Pluma, DynObj, FxEngine, etc.; C has C-Pluff. I'd even say there are too many options.
Runtime class loading and reloading: From a development point of view, it would be interesting to be able to compile and reload just one module. I've seen this done via code injection in Objective-C and using Java's reflection.
What am I missing?
I have too many interesting options!
Here's where I need help: based on your experience in modular app development, and with these constraints, what kind of language/framework features should I be looking for?
What question should I make myself about this project that would let me narrow my search?
Edit
I hadn't noticed that OpenCV had GPU bindings only for C++, so I'm stuck with it.
Now that the language is fixed, the search has narrowed a lot. I could use Objective-C++ to get the dynamism needed (Obj-C runtime + NSBundle from Cocoa/GNUstep/Cocotron), which sounds complicated; or C++ with a framework.
So I'll now narrow my question to:
Is using NSBundle in a cross-platform way with Objective-C++ easier than it sounds?
What C++ framework will provide me with hot-swappable modules?
The main reason for swapping modules at runtime is to be able to change code quickly. Would Runtime-Compiled C++ be a better solution?
Meta: I did my research on how to ask a question like this, I hope it's acceptable.
"What question should I make myself about this project that would let me narrow my search?"
If you need GPU support (CUDA/OpenCL), your only choice is C++.
You can safely discard C, as it will stop being supported in the near future.
Have no fear of Python: even if you need direct pixel access, that's all NumPy arrays (running C code again).
I'd be a bit sceptical of Ruby, C#, Ch and the like, since those bindings are community-based and might not be up to date or properly maintained, while the Java and Python bindings are machine-generated from the C++ API and are part of the official distribution.
If you're looking for portability and have plenty of memory at your disposal, then you can go with Java.
The performance hit between C++ and Java is not that bad. I'm still not sure about conversion between Mat and other image formats, because it needs a deep copy, so if your code can display the image in OpenCV's native format you can speed up the application.
Pros:
You can stop worrying about memory leaks
The project is much more portable compared to C/C++ (though this advantage shrinks if, in C/C++, you avoid primitive datatypes whose size is not consistent across platforms and, for example, always use the int*_t types)
Cons:
Slower than C/C++
More memory and CPU clock needed
http://www.ibm.com/developerworks/java/library/j-jtp09275/index.html
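The remark above about the int*_t types can be made concrete with a small sketch. It assumes a hypothetical `FrameHeader` struct and `frameBytes` helper (neither is part of OpenCV) and only shows that the fixed-width types from `<cstdint>` pin down sizes that plain `int` or `long` leave to the platform.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Fixed-width types have the same size on every platform that provides them,
// unlike int or long, whose sizes are implementation-defined.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t is exactly 32 bits");
static_assert(sizeof(std::uint8_t) * CHAR_BIT == 8, "uint8_t is exactly 8 bits");

// Hypothetical frame description, used only for illustration.
struct FrameHeader {
    std::uint32_t width;
    std::uint32_t height;
    std::uint8_t channels;
};

// Number of bytes needed for one frame with 8-bit channels.
// The first operand is widened to 64 bits so the product cannot overflow.
std::uint64_t frameBytes(const FrameHeader& h) {
    assert(h.channels >= 1 && h.channels <= 4);  // hypothetical sanity check
    return static_cast<std::uint64_t>(h.width) * h.height * h.channels;
}
```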

C++ code/XML generation tools

I'm not sure what exactly the right term is, kind of like ORM using XML as the data store. Are there any decent tools which will autogenerate C++ classes (including data and serialization/deserialization) based on an XML schema? Or will create XML-sync code and schema based on a C++ class definition?
TinyXML is great but it's so old-school to spend all that time writing code to load/save XML data to classes. I've seen similar tools focused on SOAP/WSDL, but they generated all kinds of other code on top of the basics.
Any good open-source libraries out there?
The only thing I've seen that attempts to do this is CodeSynthesis XSD.
If you are looking for an open source and commercially licensed tool to auto-generate C++ classes, including data and serialization/deserialization, based on an XML schema, then I strongly recommend gSOAP. It is easy to use, compliant with industry standards, and actively maintained.
See also http://www.rpbourret.com/xml/XMLDataBinding.htm
I was disappointed with many other C++ XML tools that promise full data bindings but fail to process more extensive sets of WSDLs and schemas such as ONVIF. Having to retool an entire project was a pain. I know that gSOAP will do the job. A winner IMHO.
Not open source, but won't XML Thunder work for you?

Best XML Library in C++, Fast Set-Up

I was wondering what is the best XML Library in C++ (I'm using Visual Studio), considering fast set-up is critical. Basically, I want to create a file to save annotations on various .avi files.
Thank you in advance.
You should be able to get TinyXML set up and working in a matter of minutes.
TinyXML is simple enough for almost all uses (if you don't mind having the whole XML representation in memory), but other libraries offer important features that it lacks:
RapidXML is made to be really, really fast. It's used in the boost::property_tree library for its XML file read/write features. If you already use Boost, using boost::property_tree directly might be a good idea, if it fits your needs, as it is readily available and has a simple interface.
pugixml has been mentioned as a good replacement for RapidXML by someone on the Boost mailing list, but I'm not aware of the differences.
Xerces-C++ is made to allow high-level manipulation of XML, like validation using XSD files, but it is really heavy in terms of both speed and memory use.
Wrappers around classic C XML libraries (like libxml2) might be an interesting choice if you don't find what you're looking for in the previous ones.
I've used Xerces-C++ in the past and it was relatively painless to get working and to work with.
I'm currently using MSXML and it is painful.

What Linux Full Text Indexing Tool Has A Good C++ API?

I'm looking to add full text indexing to a Linux desktop application written in C++. I am thinking that the easiest way to do this would be to call an existing library or utility. This article reviews various open source utilities available for the Gnome and KDE desktops; MetaTracker, Recoll and Strigi are all written in C++, so they each seem reasonable. But I cannot find any notable documentation on how to use them as libraries or through an API. I could, instead, use something like CLucene or Xapian, which are generic full text indexing libraries. They seem more straightforward, but if I used them, I'd have to implement my own indexing daemon, an unappealing prospect.
Also, Xesam seems to be the latest thing. Does anyone have any evidence that it works?
So, does anyone have experience using any of the applications or libraries? How did you use it and what documentation was useful?
I used CLucene, which you mentioned (and also Lucene.NET), and found it to be pretty good.
There's also Strigi which AFAIK works with Xesam and is the default used in KDE.
After further looking around, I found and worked with Recoll. I believe that it has the best C++ interface to a full text search engine, in this case Xapian.
It is important to realize that CLucene and Xapian are both highly complex libraries designed primarily for multi-user server applications. Cutting them down to a level appropriate for a client system is not easy. If I remember correctly, Strigi has a complex, pure C interface which isn't well adapted to this.
CLucene also doesn't seem to be actively maintained at the moment, whereas Xapian does. But the main point is the existence of Recoll, which allows you to index particular files without the massive, massive setup that raw Xapian or CLucene requires (creating your own "stemming" set is not normally desirable, etc.).

Best XML serialization library for a MFC C++ app

I have an application written in C++ using the MFC and Stingray libraries. The application works with a wide variety of large data types, which are all currently serialized using functionality derived from MFC Document/View serialization. I have also added options for XML serialization based on the Stingray libraries, which implement DOM via the Microsoft XML SDK. While this was easy to implement, the performance is terrible, to the extent that it is unusable on anything other than very small documents.
What other XML serialization tools would you folks recommend for this scenario? I don't want DOM, as it seems to be a memory hog, and I'm already dealing with large in-memory data. Ideally, I'd like a streaming parser that is fast and easy to use with MFC. My current front runner is Expat, which is fast and simple, but would require a lot of class-by-class serialization code to be added. Are there any other efficient and easier-to-implement alternatives that people would recommend?
The Boost Serialization library supports XML. Its design basically consists of:
Start from the principles of MFC serialization and take all the good things it provides.
Solve every single issue of MFC serialization!
Among the improvements compared to MFC is support for XML.
Note that you don't necessarily control the XML schema of this serialization. It uses its own schema.
This is an age old problem. I was the team lead of the development team with the most critical path dependencies on the largest software project in the world during 1999 and 2000 and this very issue was the focus of my work during that time. I am convinced that the wheel was invented by multiple engineers who were unaware that others had already invented it. The same is true of XML Data binding in C++. I invented it too, and I've been perfecting it for over 10 years on various projects. I have a solution that addresses the issues noted here and some additional issues that repeatedly arise:
XML Updates. This is the ability to re-apply a subset of XML into an existing object model. In many cases the XML is bound to indexed objects and we cannot afford to re-index for each update.
COM and CORBA interface management. In the same respect that the XML Data Binding can be automated through object oriented practices - so can the instances of interface objects that provide that data to the application layer.
State Tracking. The application often needs to distinguish between an empty value vs. a missing value - both create an empty string. This provides the validation along with Data Binding.
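To illustrate the empty-versus-missing distinction from the last point: a minimal sketch, assuming C++17 and a hypothetical flat `Record` type. This is not the API of the library described here; it only shows how `std::optional` keeps the two states apart where a plain string lookup would return an empty string for both.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical flat record: element name -> text content.
using Record = std::map<std::string, std::string>;

// Returns std::nullopt when the element is absent, and an engaged
// (possibly empty) string when it is present.
std::optional<std::string> lookup(const Record& record, const std::string& key) {
    auto it = record.find(key);
    if (it == record.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Names the three states an application may need to tell apart.
std::string describe(const std::optional<std::string>& value) {
    if (!value) {
        return "missing";
    }
    if (value->empty()) {
        return "empty";
    }
    return "value";
}

// Small self-check: both lookups would collapse to "" without the optional.
void demoStateTracking() {
    const Record record{{"title", ""}, {"author", "someone"}};
    assert(describe(lookup(record, "title")) == "empty");
    assert(describe(lookup(record, "subtitle")) == "missing");
    assert(lookup(record, "title").value_or("") == lookup(record, "subtitle").value_or(""));
}
```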
The source code uses the least restrictive license, less so than the GPL. The project is supported and managed from here:
http://www.codeproject.com/KB/XML/XMLFoundation.aspx
Now that it's the year 2010, I believe that nobody else will attempt to reinvent the wheel because there are a few to choose from. IMHO - this wheel is the most polished and well rounded implementation available.
Enjoy.
A good solution would be libxml. It provides lightweight SAX parsing and data structures for XML processing. There are several DOM libraries which are built on top of libxml.
Unfortunately it is a C library, but C++ wrappers are available.
A few years ago I switched from MSXML to libxml because of the performance issues you mentioned.
If you decide to use libxml, you should also take a look at libxslt.
We use Xerces-C++. It was easy to set up and performance is good enough that we don't need to think about changing. However, we aren't XML heavy.
I did listen to a podcast by Scott Hanselman (from Hanselminutes) where they discuss the XML performance of MSXML and XSLT.
What about RapidXML? I am using it in an MFC app with some modification to support UTF-16 with std::string. I am quite satisfied with it so far.
The gSOAP toolkit auto-serializes native C and C++ data to/from XML and supports the full XML schema specification through XML data bindings:
gSOAP SourceForge Project
It has evolved since 1999 into a significant code base with code generation tools and libraries. It supports many databinding and customization features, which is especially critical for mapping XML schema types to/from the C and C++ types. It can serialize any C/C++ type and also STL containers, container templates, and cyclic data structures. It has been used in the W3C Schema Patterns for Databinding working group (with 100% schema pattern coverage for years). There is an active open source user base, and the gSOAP development functionality has been used in many industrial projects and Fortune 100 companies to develop SOAP/XML infrastructures.
This is late in the game, but I just want to mention that we also use libxml. It's robust and reliable, and has worked well. It is a little too low-level, though, so you'll want to build some wrappers on top of its functions.
For instance, you'll get a different sequence of function returns depending on whether you have this:
<tag attribute="value"/>
or this:
<tag attribute="value"> </tag>
Sometimes you may want that, sometimes you don't care.
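One kind of wrapper that helps when you don't care about the difference is a filter that drops whitespace-only text before the application sees it. The sketch below is only an illustration under assumptions: `Event` is a hypothetical, simplified event type rather than anything from libxml, and it assumes the two forms above differ only by a whitespace-only text event between the start and the end of the element.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical, simplified parser event; not a libxml type.
struct Event {
    enum class Kind { Start, Text, End } kind;
    std::string data;  // element name for Start/End, content for Text
};

// True when the string is empty or consists only of whitespace.
bool isBlank(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isspace(c) != 0; });
}

// Returns a copy of the event stream without whitespace-only text events,
// so both forms of the element produce the same sequence.
std::vector<Event> dropBlankText(const std::vector<Event>& events) {
    std::vector<Event> out;
    for (const Event& e : events) {
        if (e.kind == Event::Kind::Text && isBlank(e.data)) {
            continue;
        }
        out.push_back(e);
    }
    return out;
}

// Small self-check using the two forms from the text above.
void demoDropBlankText() {
    const std::vector<Event> selfClosing{
        {Event::Kind::Start, "tag"}, {Event::Kind::End, "tag"}};
    const std::vector<Event> withSpace{
        {Event::Kind::Start, "tag"}, {Event::Kind::Text, " "}, {Event::Kind::End, "tag"}};
    assert(dropBlankText(selfClosing).size() == dropBlankText(withSpace).size());
}
```

When the whitespace is significant, the application can simply skip the filter and handle the raw sequence.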
We use TinyXML for all our XML needs be it MFC or straight C++.
http://sourceforge.net/projects/tinyxml