How to generate a C++ library with Xerces for a specific XML file

I've gone through this Xerces C++ tutorial, which shows how you might write a nice C++ class that lets you access data from the XML using simple function calls. The problem is that 200 lines of C++ seems like an excessive amount of work just to grab a couple of pieces of data from an XML file. I am hoping to find something that will take in my XML file and spit out C++. I have searched online for a tool to generate this for me, but I can't find anything.

Related

Parsing a C++ file to a XML file

I am looking for parsers that will help me convert a C++ file that describes a test case into an XML file.
I have found one parser named GCC-XML but I didn't find any other. I need a parser that will convert everything in my C++ file.
Has anyone ever used a C++ to XML parser?
If you're talking about just the data members in the C++ files, then I personally like using TinyXML for outputting data to an XML file. If you mean that you literally want your entire C++ file in some XML form, then I apologize for being unable to give you a good answer. TinyXML has a pretty simple set of functions to use if that is what you're looking for, and it shouldn't take more than 10 - 20 minutes to learn how to efficiently output your data.

xmlReadFile() (C++ Ubuntu) core dumps on broken XML

I am using the libxml2 libraries to parse XML sent to me (my program) as a file from another program. With care that should mean that I never get bad XML, but twice already I've made hand tweaks that broke the XML in the received file. By broken I mean that the elements have errors, end tags not matching start tags, random characters in between tags, etc.
The file is small so there are no particular memory worries about loading all of it into the parser, so I use xmlReadFile() to read in the doc.
My problem comes when the XML is broken. xmlReadFile() does an abend and core dumps. I can't catch it with an exception nor does setting the flag to "recover" work.
I've looked at Google with minimal success. I found xmllint, but I really would like not to have to call system() or popen() every time I get a new XML file. I looked at DTDs but can't seem to figure out how to tell a DTD to actually validate the value passed in a tag. (Many of the tags in the doc have values that are one of a set of, say, 5 possible answers.) Of course, if DTD worked I at least wouldn't crash the xmlReadFile().
Any suggestions on how to validate the XML before xmlReadFile() or with xmlReadFile() and how to prevent the crashes? Does xmllint have a C++ interface that I just haven't found?
No boost. No changing libraries.
Have you tried xmlReaderForFile(... XML_PARSE_RECOVER ...) ?

High performance XML parsing in C++

Well, a lot of questions have been asked about parsing XML in C++ and so on...
But, instead of a generic problem, mine is very specific.
I am asking for a very efficient XML parser for C++. In particular I have a VERY VERY BIG XML file to parse.
My application must open this file and retrieve data. It must also insert new nodes and save the final result in the file again.
To do this I initially used rapidxml, but it requires me to open the file, parse all of it (this library has no functions to access the file directly without loading the entire tree first), then edit the tree and store the final tree in the file by overwriting it. This consumes too many resources.
Is there an XML parser that does not require me to load the entire file, but that I can use to insert, quickly, new nodes and retrieve data? Can you please indicate solutions for this problem of mine?
You want a streaming XML parser rather than what is called a DOM parser.
There are two types of streaming parsers: pull and push. A pull parser is good for quickly writing XML parsers that load data into program memory. A push parser is good for writing a program to translate one document to another (which is what you are trying to accomplish). I think, therefore, that a push parser would be best for your problem.
In order to use a push parser, you need to write what is essentially an event handler for parsing events. By "parsing event", I mean events like "start tag reached", "end tag reached", "text found", "attribute parsed", etc.
I suggest that as you read in the document, you write out the transformed document to a separate, temporary file. Thus, your XML parsing event handlers will need to be written so that they are stateful and write out the XML of the translated document incrementally.
Three excellent push parser libraries for C++ include Expat, Xerces-C++, and libxml2.
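As a rough illustration of the event-handler idea above, here is a minimal sketch that uses only the standard library. The XmlEvents struct, pushParse, and insertBeforeClose are names made up for this example, and the tokenizer is a toy stand-in for a real push parser such as Expat: it assumes plain elements and text only (no comments, CDATA, processing instructions, or entities). It streams the input to an output stream and writes a new node just before a chosen closing tag.

```cpp
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical callback set for this sketch. Real push parsers (Expat,
// Xerces-C++ SAX, libxml2 SAX) define their own handler interfaces.
struct XmlEvents {
    std::function<void(const std::string&)> startTag;  // raw tag body, e.g. item id="1"
    std::function<void(const std::string&)> endTag;    // element name
    std::function<void(const std::string&)> text;      // character data
};

// Toy tokenizer: reads one character at a time and holds only the current
// token in memory. Not a conforming XML parser (see assumptions above).
void pushParse(std::istream& in, const XmlEvents& ev) {
    std::string buf;
    char c;
    while (in.get(c)) {
        if (c != '<') { buf += c; continue; }
        if (!buf.empty()) { ev.text(buf); buf.clear(); }
        std::string tag;
        while (in.get(c) && c != '>') tag += c;
        if (!tag.empty() && tag[0] == '/') {
            ev.endTag(tag.substr(1));
        } else if (!tag.empty() && tag.back() == '/') {
            // Self-closing element: report it as a start followed by an end.
            tag.pop_back();
            ev.startTag(tag);
            ev.endTag(tag.substr(0, tag.find(' ')));
        } else {
            ev.startTag(tag);
        }
    }
    if (!buf.empty()) ev.text(buf);
}

// Copies the document to `out` and writes `newNode` just before every
// closing tag of an element named `parent`.
void insertBeforeClose(std::istream& in, std::ostream& out,
                       const std::string& parent, const std::string& newNode) {
    XmlEvents ev;
    ev.startTag = [&](const std::string& t) { out << '<' << t << '>'; };
    ev.text     = [&](const std::string& t) { out << t; };
    ev.endTag   = [&](const std::string& n) {
        if (n == parent) out << newNode;
        out << "</" << n << '>';
    };
    pushParse(in, ev);
}
```

In practice `in` would be a std::ifstream on the big file and `out` a std::ofstream on the temporary file; with a real library the three callbacks would be registered through that library's own handler API instead.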
Search for "SAX parser". They are mostly tokenizers, i.e. they emit tag by tag without building a tree.
SAX parsers are generally faster and use less memory than DOM parsers because a DOM parser builds an in-memory representation of the entire XML document, whereas a SAX parser behaves like an event listener and reports each piece of the document as it reads the file, without building a tree.
As you mentioned, Xerces is a good C++ SAX parser.
I would recommend looking into ways of breaking the XML document into smaller XML documents as that seems to be part of your problem.
Okay, here is one off the beaten track: asmxml. I looked at it but haven't really used it myself. Its authors claim performance bar none; the downside is that you need x86 assembler.
If you really want a high-performance XML stream parser, then libhpxml is likely the right thing for you.
I’m convinced that no XML library exists that allows you to modify a file without loading it first. This simply isn’t possible because files don’t work that way: you cannot insert (or remove) in the middle of a file. You can only overwrite a block of identical size, or append at the end. But your request would require inserting or removing data in the middle of the file.
Reading only parts of an XML file may be possible. But writing … no way.
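The usual workaround for the limitation described above is to rewrite the file: copy everything before the insertion point, write the new bytes, copy the rest, then swap the new file in. A minimal sketch, assuming the byte offset is already known; insertIntoFile and the ".tmp" suffix are hypothetical choices for this example:

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Inserts `payload` at byte offset `pos` by rewriting the whole file into a
// temporary file and then renaming it over the original.
bool insertIntoFile(const std::string& path, std::size_t pos,
                    const std::string& payload) {
    const std::string tmp = path + ".tmp";  // arbitrary temp name for this sketch
    {
        std::ifstream in(path, std::ios::binary);
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!in || !out) return false;
        char c;
        std::size_t copied = 0;
        // Copy the head of the file up to the insertion point.
        while (copied < pos && in.get(c)) { out.put(c); ++copied; }
        out << payload;
        // Copy the tail (nothing is copied if the file was shorter than pos).
        while (in.get(c)) out.put(c);
        out.flush();
        if (!out) return false;
        // Both streams close at the end of this scope, before the rename.
    }
    // On POSIX systems std::rename replaces an existing target; other
    // platforms may need the target removed first.
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

This still reads and writes every byte once, so it does not avoid the cost the question complains about; it only avoids holding the whole document in memory.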
Go for template libraries as much as possible, like Boost::property_tree or Boost::XMLParser or POCO::XML, and Folly has an XML parser in it.
Avoid old C libraries; they all use old code designs.
Some say the QtXML module performs well for huge XML files.

How to start using xml with C++

(Not sure if this should be CW or not, you're welcome to comment if you think it should be).
At my workplace, we have many many different file formats for all kinds of purposes. Most, if not all, of these file formats are just written in plain text, with no consistency. I'm only a student working part-time, and I have no experience with using xml in production, but it seems to me that using xml would improve productivity, as we often need to parse, check and compare these outputs.
So my questions are: given that I can only control one small application and its output (only - the inputs are formats that are used in other applications as well), is it worth trying to change the output to be xml-based? If so, what are the best known ways to do that in C++ (i.e., xml parsers/writers, etc.)? Also, should I also provide a plain-text output to make it easy for the users (which are also programmers) to get used to xml? Should I provide a script to translate xml-plaintext? What are your experiences with this subject?
Thanks.
Don't just use XML because it's XML.
Use XML because:
other applications (that only accept XML) are going to read your output
you have a hierarchical data structure that lends itself perfectly to XML
you want to transform the data to other formats using XSL (e.g. to HTML)
EDIT:
A nice personal experience:
Customer: your application MUST be able to read XML.
Me: Er, OK, I will adapt my application so it can read XML.
Same customer (a few days later): your application MUST be able to read fixed width files, because we just realized our mainframe cannot generate XML.
Amir, to parse XML you can use TinyXML, which is incredibly easy to use and get started with. Check its documentation for a quick overview, and read the "what it does not do" clause carefully. I've been using it for reading and all I can say is that this tiny library does the job very well.
As for writing - if your XML files aren't complex you might build them manually with a string object. "Aren't complex" for me means that you're only going to store text at most.
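If you do build XML by hand with strings, the main thing to get right is escaping reserved characters in the text. A minimal sketch; escapeXml and element are hypothetical helpers, not part of any library mentioned here:

```cpp
#include <string>

// Replaces the five characters that XML reserves in text and attribute values.
std::string escapeXml(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Builds <name>text</name>. `name` is assumed to already be a valid XML name.
std::string element(const std::string& name, const std::string& text) {
    return "<" + name + ">" + escapeXml(text) + "</" + name + ">";
}
```

For example, element("title", "Tom & Jerry") yields the string <title>Tom &amp;amp; Jerry</title> with the ampersand written as the &amp;amp; entity in the real output, which a plain string concatenation would get wrong.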
For more complex XML reading/writing you had better check Xerces, which is heavier than TinyXML. I haven't used it myself, but I've seen it in production and it delivers.
You can try using the boost::property_tree class.
http://www.boost.org/doc/libs/1_43_0/doc/html/property_tree.html
http://www.boost.org/doc/libs/1_43_0/doc/html/boost_propertytree/tutorial.html
http://www.boost.org/doc/libs/1_43_0/doc/html/boost_propertytree/parsers.html#boost_propertytree.parsers.xml_parser
It's pretty easy to use, but the page does warn that it doesn't support the XML format completely. If you do use this though, it gives you the freedom to easily use XML, INI, JSON, or INFO files without changing more than just the read_xml line.
If you want that ability though, you should avoid xml attributes. To use an attribute, you have to look at the key , which won't transfer between filetypes (although you can manually create your own subnodes).
Although using TinyXML is probably better. I've seen it used before in a couple of projects I've worked on, but don't have any experience with it.
Another approach to handling XML in your application is to use a data binding tool, such as CodeSynthesis XSD. Such a tool will generate C++ classes that hide all the gory details of parsing/serializing XML -- all that you see are objects corresponding to your XML vocabulary and functions that you can call to get/set the data, for example:
Person p = person ("person.xml");
cout << p.name ();
p.name ("John");
p.age (30);
ofstream ofs ("person.xml");
person (ofs, p);
Here's what previous SO threads have said on the topic. Please add others you know of that are relevant:
What is the best open XML parser for C++?
What is XML good for and when should i be using it?
What are good alternative data formats to XML?
BTW, before you decide on an XML parser, you may want to make sure that it will actually be able to parse all XML documents instead of just the "simple" ones, as discussed in this article:
Are you using a real XML parser?

xsd-based code generator to build xml?

I have a schema (xsd), and I want to create xml files that conform to it.
I've found code generators that generate classes which can be loaded from an xml file (CodeSynthesis). But I'm looking to go the other direction.
I want to generate code that will let me build an object which can easily be written out as an xml file. In C++. I might be able to use Java for this, but C++ would be preferable. I'm on Solaris, so a Visual Studio plugin (such as xsd2code) won't help me.
Is there a code generator that lets me do this?
To close this out: I did wind up using CodeSynthesis. It worked very well, as long as I used a single xsd as its source. Since I actually had two xsds (one imported the other), I had to manually merge them (they did some weird inheritance that needed manual massaging).
But yes, CodeSynthesis was the way to go.