Parsing an XML file and finding any errors - C++

I have a configuration XML file which has some values like
<config>
<map>100,1,200,1</map>
<image>abc.bmp</image>
.
.
.
.
</config>
etc.
I have already imported the file and read it line by line. Now I have to validate the fields in the file, for example:
1. <map> " "</map> must not be empty and must not contain junk values,
2. <image>abc**,**bmp</im*E*ge> (spelling mistake)
3. <image>abc.bmp </config> ( missing tags)
I have to develop my own algorithm, so I can't use libraries. Is there any approach other than loading the file and checking it character by character?

I'd recommend using a third-party library for XML parsing. Getting all the details and pitfalls of XML parsing right is much harder than you might think.
Your points 2 and 3 are handled well by any complete XML parser. Point 1 needs either an XML Schema (XSD) definition and a parser that supports schema validation, or extra validation code that you write yourself.
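As a rough illustration of the "extra validation code" route for point 1, here is a minimal sketch using only the standard library. It assumes a valid <map> value is a non-empty, comma-separated list of unsigned decimal integers, as in the question's 100,1,200,1 example; the function name isValidMapValue is made up for this sketch, and the text is assumed to have been extracted from the element already.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `text` is a non-empty,
// comma-separated list of unsigned decimal integers such as "100,1,200,1".
bool isValidMapValue(const std::string& text) {
    if (text.empty()) return false;
    std::istringstream fields(text);
    std::string field;
    std::size_t count = 0;
    while (std::getline(fields, field, ',')) {
        if (field.empty()) return false;            // e.g. "100,,1"
        for (unsigned char ch : field) {
            if (!std::isdigit(ch)) return false;    // junk, signs, or whitespace
        }
        ++count;
    }
    // getline does not report an empty field after a trailing comma,
    // so that case is rejected explicitly.
    return count > 0 && text.back() != ',';
}
```

Whether whitespace or negative numbers should be accepted depends on the real configuration format; this sketch rejects both.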
If you're concerned about impact (code/memory usage) you should refer to these lightweight C/C++ XML parsers:
C++ expat (can be used in commercial projects)
TinyXml (can be used in commercial projects)
Other XML parsers
POCO XML
Xerces C++ (provides XML Schema (XSD) validation)
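If a library really is ruled out, as the question states, misspelled and missing closing tags (points 2 and 3) can be roughly detected with a stack of open element names. The following is only a sketch and not a conforming XML parser: it assumes there are no comments, CDATA sections, processing instructions, or attribute values containing '>', and the name tagsAreBalanced is invented for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: checks only that element tags are properly nested.
bool tagsAreBalanced(const std::string& xml) {
    std::vector<std::string> open;   // names of elements not yet closed
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        const std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) return false;      // unterminated tag
        const std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty()) return false;                   // "<>"
        if (tag.front() == '/') {
            // Closing tag: must match the most recently opened element.
            if (open.empty() || open.back() != tag.substr(1)) return false;
            open.pop_back();
        } else if (tag.back() == '/') {
            continue;                                    // self-closing <x/>
        } else {
            // Opening tag: the element name ends at the first whitespace.
            open.push_back(tag.substr(0, tag.find_first_of(" \t\r\n")));
        }
    }
    return open.empty();   // anything left open is a missing closing tag
}
```

This still reads every character once, which is hard to avoid: a well-formedness error can sit anywhere in the file.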

Related

XML bindings for Microsoft XmlLite

I have a C++ project in which I am using Microsoft XmlLite for parsing several XML files. Now I have a new file that I need to parse and I have an XSD schema for it. I know there are many C++ XML binding tools out there, but all I have found so far require me to include yet another XML parsing library, which I would like to avoid. Hence my question: is there any open source or commercial tool that generates C++ XML bindings based on Microsoft XmlLite?
CodeSynthesis seems to be the closest tool that provides in-memory XML data binding and could be integrated with XmlLite.
The C++/Tree mapping generates C++ classes that represent data types defined in XML Schema, a set of parsing functions that convert XML documents to a tree-like in-memory object model, and a set of serialization functions that convert the object model back to XML.

c++ linux library for creating an xml and reading from an xml (serialize/ deserialize)

I am working in Ubuntu. I have a .h file with a class and a lot of nested classes. I would like to create an XML file from an object. Can someone please give me a library that creates XML files, serializes, and deserializes objects? I am compiling with g++.
Try libxml2.
But it seems like you want to serialize and deserialize an object to and from XML. Boost.Serialization might come in handy; it also supports XML archives.
Here you can find an example for Boost::serialization with XML.
If you want to handle XML in C++ you may have a look at these projects
http://xmlsoft.org/
http://www.grinninglizard.com/tinyxml/
http://xerces.apache.org/xerces-c/
It doesn't serialize with XML (which I consider a feature, personally), but Google protocol buffers does a good job of serializing (in a binary format) objects that are defined in the .proto language.
You may want to explore XML data binding. The main idea is that, given an XML schema, the data binding software generates a class hierarchy corresponding to the schema, plus the code to serialize and deserialize it (called marshal / unmarshal). There are several tools that can do this: gSOAP is a free one, XMLSpy is one of the commercial ones.
What you describe is an XML data binding for C++. There are several tools for what you want to do, see e.g. XML Data Binding Tools. I've used gSOAP for several C++ projects, including starting from C++ files with classes which is really nice (other tools force you to start from XML schemas or WSDLs). With gSOAP I have been able to generate XML schemas and XML, see e.g. map C/C++ types to XML schema.
A super-lightweight, simple xml library is pugixml.
Though keep in mind that C++ does not have the reflection capabilities that .NET has. No library will generate the serialization/deserialization code for you (which I guess you hoped for).

High performance XML parsing in C++

Well a lot of questions have been made about parsing XML in C++ and so on...
But, instead of a generic problem, mine is very specific.
I am asking for a very efficient XML parser for C++. In particular I have a VERY VERY BIG XML file to parse.
My application must open this file and retrieve data. It must also insert new nodes and save the final result in the file again.
To do this I initially used rapidxml, but it requires me to open the file, parse all of it (this library has no way to access the file without loading the entire tree first), then modify the tree and store the final tree by overwriting the file. It consumes too many resources.
Is there an XML parser that does not require me to load the entire file, but that I can use to insert, quickly, new nodes and retrieve data? Can you please indicate solutions for this problem of mine?
You want a streaming XML parser rather than what is called a DOM parser.
There are two types of streaming parsers: pull and push. A pull parser is good for quickly writing XML parsers that load data into program memory. A push parser is good for writing a program to translate one document to another (which is what you are trying to accomplish). I think, therefore, that a push parser would be best for your problem.
In order to use a push parser, you need to write what is essentially an event handler for parsing events. By "parsing event", I mean events like "start tag reached", "end tag reached", "text found", "attribute parsed", etc.
I suggest that as you read in the document, you write out the transformed document to a separate, temporary file. Thus, your XML parsing event handlers will need to be written so that they are stateful and write out the XML of the translated document incrementally.
Three excellent push parser libraries for C++ include Expat, Xerces-C++, and libxml2.
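To make the "stateful event handler" idea concrete, here is a minimal sketch that does not depend on any particular library. The class AppendingWriter and its method names are invented for illustration; with a real push parser such as Expat or a SAX interface, the library would call equivalent callbacks as it reads the input. The sketch copies every event to the output stream and writes one extra node just before the root element closes. It assumes text arrives already escaped and ignores attributes.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical handler: echoes parsing events to `out` and inserts
// `newNodeXml` immediately before the root element's end tag.
class AppendingWriter {
public:
    AppendingWriter(std::ostream& out, std::string newNodeXml)
        : out_(out), newNodeXml_(std::move(newNodeXml)) {}

    void onStartElement(const std::string& name) {
        ++depth_;
        out_ << '<' << name << '>';
    }
    void onText(const std::string& text) { out_ << text; }
    void onEndElement(const std::string& name) {
        if (depth_ == 1) out_ << newNodeXml_;   // we are closing the root
        out_ << "</" << name << '>';
        --depth_;
    }

private:
    std::ostream& out_;
    std::string newNodeXml_;
    int depth_ = 0;   // the only state needed: current nesting depth
};

// Usage: the events are fed by hand here, standing in for a real parser.
std::string runDemo() {
    std::ostringstream out;
    AppendingWriter writer(out, "<item>3</item>");
    writer.onStartElement("list");
    writer.onStartElement("item");
    writer.onText("1");
    writer.onEndElement("item");
    writer.onEndElement("list");
    return out.str();
}
```

In a real program `out` would be a std::ofstream opened on the temporary file, so memory use stays flat regardless of the size of the input document.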
Search for "SAX parser". They are mostly tokenizers, i.e. they emit tag by tag without building a tree.
SAX parsers are faster than DOM parsers because a DOM parser builds an in-memory representation of the entire XML document, whereas a SAX parser behaves like an event source: it reports each element to your handler as it reads the file and never builds the whole tree. Go here for an explanation.
As you mentioned, Xerces is a good C++ SAX parser.
I would recommend looking into ways of breaking the XML document into smaller XML documents as that seems to be part of your problem.
Okay, here is one off the beaten track: asmxml. I looked at it but haven't really used it myself. Its authors claim performance second to none; the downside is that it needs x86 assembler.
If you really seek high performance XML stream parser then libhpxml is likely the right thing for you.
I’m convinced that no XML library exists that allows you to modify a file without loading it first. This simply isn’t possible because files don’t work that way: you cannot insert (or remove) in the middle of a file. You can only overwrite a block of identical size, or append at the end. But your request would require inserting or removing data in the middle of the file.
Reading only parts of an XML file may be possible. But writing … no way.
Go for template libraries as much as possible, like Boost::property_tree or Boost::XMLParser or POCO::XML; Folly also has an XML parser in it.
Avoid old C libraries; their code designs are dated.
Some people say the QtXML module performs well on huge XML files.

C++ Logger-Should I use an ordinary xml parser?

I'm working on a logging system for my 2D engine, and I'm confused on how I should go about creating/editing the file, and how I should output that file.
I've learned that XML is more of a data carrier rather than a data displayer like HTML is. I've read that I can use XML to HTML converters. One method I've thought about is writing characters to a file in HTML.
Clarity on these matters is what I ask of you, stack overflow.
Creating an XML (or HTML) file doesn't need any special library. Straightforward string concatenation is usually good enough, though you may have to encode some special characters (e.g. > into &gt;).
But as Owen says, plain text is a lot more common for log files. One reasonable compromise is comma-separated values in a text file, which gives you a little bit of structure without much overhead. For example, the Windows web server (IIS) uses this format by default, and if you have some fields that are output for each line, such as timestamp or source filename and line number, this makes it easy to separate those out again.
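A minimal sketch of that encoding step, using only the standard library. The function name escapeXml is an assumption for this example; it replaces the five characters that XML predefines entities for, which is enough for element text and quoted attribute values.

```cpp
#include <string>

// Hypothetical helper: replaces XML's five special characters
// with their predefined entities.
std::string escapeXml(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char ch : text) {
        switch (ch) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += ch;       break;
        }
    }
    return out;
}
```

A log entry could then be built by concatenation, for example "<entry>" + escapeXml(message) + "</entry>".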
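A comma-separated log line along those lines might be produced as follows. This is a sketch under assumptions: the field order (timestamp, source file, line number, message) and the names csvField and csvLogLine are made up here, and quoting follows the common CSV convention of wrapping a field in double quotes and doubling any embedded quote.

```cpp
#include <string>

// Hypothetical helper: quotes a field only if it contains a comma,
// a double quote, or a line break.
std::string csvField(const std::string& value) {
    if (value.find_first_of(",\"\r\n") == std::string::npos) return value;
    std::string quoted = "\"";
    for (char ch : value) {
        if (ch == '"') quoted += '"';   // double embedded quotes
        quoted += ch;
    }
    quoted += '"';
    return quoted;
}

// Hypothetical helper: builds one log line without a trailing newline.
std::string csvLogLine(const std::string& timestamp, const std::string& file,
                       int line, const std::string& message) {
    return csvField(timestamp) + ',' + csvField(file) + ',' +
           std::to_string(line) + ',' + csvField(message);
}
```

Because each entry is a single self-contained line, the file can be appended to at any time.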
Just about every log I've ever worked with has been pure text delimited by newlines. If you're going to depart from that, you may want to ask yourself what it is about your logging needs that you want to accomplish with markup.
If you must go the way of markup, I would suggest an XML format that contains a minimal set of markup that would be useful in your situation. You could use XML to capture structure in your log entries (timestamp, severity, and operational code, for example) that would be inconvenient to code for in HTML.
Note that you could also go hybrid and embed some XHTML tags in an XML element whose purpose is to capture displayable text, if you want.
The problem with XML or HTML files is that you cannot append at any time. You have to close the final tag (document tag) properly at the end of writing.
Therefore, it's not a popular format for logging.
For logging, I suggest using one of the existing log engines, such as Apache logger, or, John Torjo's boost log candidate. They will support log levels, runtime configuration, etc.
If you are considering writing logs in XML files, please, stop.
Log files should be simple plain text files, XML-izing it is introducing needless complexity. They are not structured data, they are meant to be read by people, not automated tools.
It all starts with XML logs, and then it goes downhill from there.

XML usage for c++ application

I have a couple of questions about XML.
Can XML be used for normal c++ application instead of using a text file ?
If so, does this method have advantages?
and finally, how can I use XML to store data? what tools are needed?
Regards.
You can use XML for storing information. It's less human-readable than a plain text file, but it is easier to exchange with other systems and programming languages.
If all you need is a few text/numeric properties, stick to a property file.
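For comparison, reading such a property file takes very little code. The sketch below assumes one key=value pair per line and '#' for comment lines; those conventions and the function names are assumptions for this example, not a standard format. It does not trim whitespace or handle Windows line endings.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parses "key=value" lines, skipping blank lines,
// lines starting with '#', and lines without '='.
std::map<std::string, std::string> parseProperties(std::istream& in) {
    std::map<std::string, std::string> props;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        const std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        props[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return props;
}

// In-memory convenience wrapper, handy for trying the parser out.
std::map<std::string, std::string> parsePropertiesText(const std::string& text) {
    std::istringstream in(text);
    return parseProperties(in);
}
```

Once the data becomes nested or needs validation against a schema, this approach stops scaling.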
If you need a mix of configuration options, and you want to use validation (which can be accomplished using XML Schema), automatic modification (e.g. XSL transformations), or easy exchange with web services, then XML is useful.
If you want to store binary data, XML is probably not the answer, though you can store the data in the filesystem and use XML for the metadata (i.e. where each file is located).
Take a look at Apache Xerces-C for C++ XML code - http://xerces.apache.org/xerces-c/
XML can be parsed as a text file by your application. There are libraries available.
Advantage: the files can be exchanged with other applications more easily, especially if you provide an XML-schema file.
Storing data in XML can be done with boost.serialization
It depends on the kind of data you want to read and write, but XML is generally a good way to store structured and hierarchical data.
You can use libraries such as TinyXML to easily parse and write XML files in C++.
The main drawback is that XML is verbose; that's why you can also use an alternative such as JSON to store your data.