std::string xml string to object - c++

What would be the most effective way to take an XML string held in a std::string and convert it to an in-memory XML object? The object structure is of no importance. What I'm after is whether I'd need to go through the string char by char and pick out all the pieces, or whether there is some easier way.

The easiest way is probably to use a library to do that.
If you want to do that yourself, you'll need to parse the string containing XML code. There are many ways to do that; easiest is probably a recursive-descent parser.
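To give a feel for the do-it-yourself route, here is a minimal sketch of a recursive-descent parser in standard C++. It is not a conforming XML parser. The names XmlNode, TinyXmlParser and parseXml are made up for this example, and it assumes the input contains only elements, quoted attributes and plain text (no comments, CDATA, entities, declarations or namespaces).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory representation of one element.
struct XmlNode {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::string text;  // character data directly inside this element
    std::vector<std::unique_ptr<XmlNode>> children;
};

// One member function per grammar rule; each consumes input starting at pos_.
class TinyXmlParser {
public:
    explicit TinyXmlParser(std::string input) : s_(std::move(input)) {}

    // document := element
    std::unique_ptr<XmlNode> parse() {
        skipSpace();
        std::unique_ptr<XmlNode> root = parseElement();
        skipSpace();
        if (pos_ != s_.size()) fail("trailing content after root element");
        return root;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    [[noreturn]] void fail(const std::string& why) const {
        throw std::runtime_error(why + " at offset " + std::to_string(pos_));
    }
    char peek() const { return pos_ < s_.size() ? s_[pos_] : '\0'; }
    void expect(char c) {
        assert(c != '\0');  // '\0' is the end-of-input sentinel, never expected
        if (peek() != c) fail(std::string("expected '") + c + "'");
        ++pos_;
    }
    void skipSpace() {
        while (std::isspace(static_cast<unsigned char>(peek()))) ++pos_;
    }
    std::string parseName() {
        const std::size_t start = pos_;
        while (std::isalnum(static_cast<unsigned char>(peek())) || peek() == '_' ||
               peek() == '-' || peek() == '.' || peek() == ':') ++pos_;
        if (pos_ == start) fail("expected a name");
        return s_.substr(start, pos_ - start);
    }

    // element := '<' name attribute* ( '/>' | '>' content '</' name '>' )
    std::unique_ptr<XmlNode> parseElement() {
        expect('<');
        auto node = std::make_unique<XmlNode>();
        node->name = parseName();
        skipSpace();
        while (peek() != '>' && peek() != '/') {  // attribute := name '=' quoted
            const std::string key = parseName();
            skipSpace(); expect('='); skipSpace();
            const char quote = peek();
            if (quote != '"' && quote != '\'') fail("expected a quoted value");
            const std::size_t end = s_.find(quote, pos_ + 1);
            if (end == std::string::npos) fail("unterminated attribute value");
            node->attributes[key] = s_.substr(pos_ + 1, end - pos_ - 1);
            pos_ = end + 1;
            skipSpace();
        }
        if (peek() == '/') { ++pos_; expect('>'); return node; }  // <empty/>
        expect('>');
        parseContent(*node);
        expect('<'); expect('/');
        if (parseName() != node->name) fail("mismatched closing tag");
        skipSpace();
        expect('>');
        return node;
    }

    // content := ( text | element )*, ending just before a "</"
    void parseContent(XmlNode& node) {
        while (pos_ < s_.size()) {
            if (s_[pos_] != '<') { node.text += s_[pos_++]; continue; }
            if (pos_ + 1 < s_.size() && s_[pos_ + 1] == '/') return;
            node.children.push_back(parseElement());
        }
        fail("unexpected end of input");
    }
};

std::unique_ptr<XmlNode> parseXml(const std::string& xml) {
    return TinyXmlParser(xml).parse();
}
```

Calling parseXml(myString) returns the root node or throws std::runtime_error on malformed input. For anything beyond toy input, one of the libraries mentioned in the other answers is the safer choice.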

Take a look at Arabica: http://www.jezuk.co.uk/cgi-bin/view/arabica. Of all the XML libraries/wrappers I am aware of, it is the most std-friendly.

Try Expat, compact, user-friendly and free:
http://expat.sourceforge.net/

You could use MSXML for this. It will take the string and produce a DOM (Document Object Model) representation. The documentation is OK, but this is not the easiest library to use. Windows only, of course.
Pros - reliable, widely used.
Cons - you have to learn the COM programming model to a degree. Not the most intuitive to use.
A simpler option would be Xerces. Sample file-parsing code here; there are other samples as well. I've used both this and MSXML in different jobs.

Use libxml2. Not a highly complicated library and easy to use. Portable, written in C but bindings to other languages available, and loads of examples to use and learn from.

Related

Sharing data structures between perl and cpp

I have a perl script which generates a very large data structure (which starts life as an array of array references). This is then written to a text file using some weird home-brew serialisation scheme.
The data from the text file is stored as the value in a key-value store db.
A c++ file then retrieves the data, and deserializes it (into a hashmap, although can potentially be flexible on how this data is structured).
What I'm interested in is finding if there are any good ways of sharing a data structure between perl and c++ (something like Storable, but that is meant for perl->perl not perl->c++). The current method is a headache to maintain, and may not have the best performance.
The most important factors are speed of deserialisation, and the size of the serialized structure in that order. Anyone know of something that might do the trick?
Storable is one way to dump and load perl data structures. I wouldn't actually recommend it for general usage though - it's handy in that it's part of core and easy to use.
But for multi-platform (and language) portability, it's far better to use a standard data representation. Which you choose is probably a matter of what sort of data you're holding in your structure, but core contenders are:
JSON - good for arrays and hashes (key-value).
YAML - Excellent for 'config file' style data (but extends in ways similar to JSON)
And if you must, XML - but bear in mind that XML is designed for documents-with-metadata, and so IMO isn't suitable for most of the applications it's used for.
As standards, they've got documented formatting and parsers are widely available. And implementing your own isn't too hard, if that's the route you want to go. Just make sure you follow the spec and you're good.
Note that because XML and JSON (and I think YAML?) are recursively nested formats, you can parse them as a stream rather than as a single standalone object. (Trap, process and discard each item as you hit 'close brackets' in JSON, or 'close tags' in XML.)
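On the C++ side, the "trap, process and discard" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: JsonRecordSplitter is a made-up name, the input is assumed to be a well-formed JSON array of objects, and the callback receives each record as raw text that a real JSON parser would then decode.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: accepts input in arbitrary chunks and hands every
// complete top-level {...} record to a callback, then forgets it.
class JsonRecordSplitter {
public:
    explicit JsonRecordSplitter(std::function<void(const std::string&)> onRecord)
        : onRecord_(std::move(onRecord)) {}

    void feed(const std::string& chunk) {
        for (char c : chunk) consume(c);
    }

private:
    void consume(char c) {
        if (depth_ > 0) current_ += c;  // buffer only while inside a record
        if (inString_) {                // braces inside strings do not count
            if (escaped_) escaped_ = false;
            else if (c == '\\') escaped_ = true;
            else if (c == '"') inString_ = false;
            return;
        }
        if (c == '"') { inString_ = true; return; }
        if (c == '{') {
            if (depth_ == 0) current_ = "{";  // a new record starts here
            ++depth_;
        } else if (c == '}' && depth_ > 0) {
            if (--depth_ == 0) {              // the record just closed
                onRecord_(current_);
                current_.clear();
            }
        }
    }

    std::function<void(const std::string&)> onRecord_;
    std::string current_;
    int depth_ = 0;
    bool inString_ = false;
    bool escaped_ = false;
};

void exampleUsage() {
    std::vector<std::string> records;
    JsonRecordSplitter splitter(
        [&records](const std::string& r) { records.push_back(r); });
    splitter.feed("[{\"id\":1},");   // chunks may end anywhere
    splitter.feed(" {\"id\":2}]");
    assert(records.size() == 2);
    assert(records[1] == "{\"id\":2}");
}
```

Memory use is bounded by the largest single record rather than by the whole structure, which matters when deserialisation speed and size are the main concerns.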
Easy job.
I like Perl, and I also like C/C++. To make the best of both, I wrote a GitHub project to solve this issue.
Please see:
https://github.com/tlqtangok/perlcpp
A short example:
P_eval("$a=2;$a=$a**10;");
Int("a") ; // a= 1024

Swift: fastest way to parse HTML

I have a large file of source code that I need to parse some specific text out of. I want to get it done as fast as possible. What would be the fastest way to do this in Swift? These are all the options I could think of:
Using a third-party library of string functions. I've tried this. It works well, but I imagine this is much slower compared to other lower-level methods in general, unless there are some notably fast ones out there specifically for Swift.
Using a third-party HTML parser. I've looked into a few, but I'm not sure if they will suit my needs. Before I proceed with this, I just want to know if these are generally faster, if there are any notably fast ones out there, and if I'm able to tweak them to get specifically what I want from the source code.
Using String or NSString. From what I understand, using String vs NSString should give no difference in speed. I am quite comfortable with this approach, and it's lower level than some of the other ones, so should I expect fairly fast performance?
Using regular expressions. I've been told that since these are lower-level, they should ideally be the fastest. I've used regular expressions before, but not on iOS. Is it easy to do string parsing with NSRegularExpression, and is it faster?
Thank you!
Came upon this link while researching your question: http://benedictcohen.co.uk/blog/archives/74
The author explains an older approach to what @CodaFi suggested, but there is a relevant update at the end you should check out:
The easiest way to parse HTML is to treat it as XML and use the
NSXMLParser. iOS comes with LibTidy which is capable of fixing a
multitude of markup sins. Use LibTidy to create clean XML and pass
this XML to NSXMLParser. Only use the approach outlined above if it’s
not possible to use NSXMLParser.
So perhaps option 4 or 5 for you to check out?

Tiny C++ YAML reader/writer

I'm writing an embedded C++ program, and need to add serialization/deserialization. The format should be human readable and writeable, and I would much prefer to use (a subset of) a standard format like YAML. I also prefer YAML to JSON since it is more concise.
While yaml-cpp has the exact functionality I'd like, the source code is almost 300K and would almost double my code size, which seems excessive to me just in order to add human readable serialization/deserialization.
Before I start writing my own reader/writer for a subset of YAML, I'd like to first check whether this already exists? I have not been able to find one, but would much prefer to use existing code rather than rolling my own. Are there any C or C++ YAML readers/writers out there of, say, 50K code or less? I only need functionality for the basic data structures (scalar, array, hash), not any advanced stuff.
With many thanks in advance.
The Oops library does what you are looking for. It is written for serialization using reflection and supports the YAML format as well.
https://bitbucket.org/barczpe/oops

XML Representation of C++ Objects

I'm trying to create a message validation program and would like to create easily modifiable rules that apply to certain message types. Due to the risk of the rules changing I've decided to define these validation rules external to the object code.
I've created a basic interface that defines a rule and am wondering what the best way to store this simple data would be. I was leaning towards XML but it seems like it might be too heavy.
Each rule would only need a very small set of data (e.g. type of rule, value, applicable mask).
Does anyone know of a good resource I could look at that provides similar functionality? I'd rather not dig too deep into XML for a problem that seems to need barely a subset of the functionality I see in most of the examples I bump into.
If I can find a concise example to examine I would be able to decide on whether or not to just go with a flat file.
Thanks in advance for your input!
Personally, for small, easily modifiable XML, I find TinyXML to be an excellent library. You can make each class understand its own format, so your object hierarchy is represented directly in the XML.
However, if you don't think you need XML, you might want to go with a lighter storage format like YAML. I find it much easier to understand the underlying data, modify it and extend functionality.
(Also, boost::serialization has an XML archive, but it isn't what I'd call easily modifiable)
The simplest is to use a flat file designed to be easy to parse using the C++ >> operator. Just simple tokens separated by whitespace.
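A minimal sketch of that flat-file approach, assuming one rule per line in the form "type value mask" with the mask written in hexadecimal. The Rule struct, its field names and readRules are hypothetical and only mirror the fields mentioned in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical rule record: "type of rule, value, applicable mask".
struct Rule {
    std::string type;
    int value = 0;
    std::uint32_t mask = 0;
};

// Reads one rule: three whitespace-separated tokens, the mask in hexadecimal.
// The stream is switched back to decimal afterwards.
std::istream& operator>>(std::istream& in, Rule& rule) {
    return in >> rule.type >> rule.value >> std::hex >> rule.mask >> std::dec;
}

// Reads rules until the stream ends or a malformed token is hit.
std::vector<Rule> readRules(std::istream& in) {
    std::vector<Rule> rules;
    for (Rule rule; in >> rule;) rules.push_back(rule);
    return rules;
}

void exampleUsage() {
    // A std::ifstream opened on the rules file would be used the same way.
    std::istringstream file("range 10 ff\nequals 42 0f\n");
    const std::vector<Rule> rules = readRules(file);
    assert(rules.size() == 2);
    assert(rules[1].type == "equals" && rules[1].value == 42 && rules[1].mask == 0x0f);
}
```

The trade-off is that the file has no self-description: adding a field means changing both the file layout and the extraction operator.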
Well, if you want your rules to be human readable, XML is the way to go, and you can interface it nicely with C++ using Xerces. If you want performance and/or small size, you could save the data in binary form using simple structs.
Another way to implement this would be to define your rules in XML Schema and then have an XML Data Binding tool generate the corresponding C++ object model along with the XML parsing and serialization code. One such tool (that I happen to be working on) is CodeSynthesis XSD:
http://www.codesynthesis.com/products/xsd/
For a 2-minute overview of the idea, see the "Hello World" example in the C++/Tree mapping documentation.

A lightweight XML parser efficient for large files?

I need to parse potentially huge XML files, so I guess this rules out DOM parsers.
Is there any good lightweight SAX parser out there for C++, comparable with TinyXML in footprint?
The structure of the XML is very simple; no advanced things like namespaces and DTDs are needed. Just elements, attributes and CDATA.
I know about Xerces, but its sheer size of over 50 MB gives me shivers.
Thanks!
If you are using C, then you can use LibXML from the Gnome project. You can choose from DOM and SAX interfaces to your document, plus lots of additional features that have been developed over years. If you really want C++, then you can use libxml++, which is a C++ OO wrapper around LibXML.
The library has been proven again and again, is high performance, and can be compiled on almost any platform you can find.
I like ExPat
http://expat.sourceforge.net/
It is C based but there are several C++ wrappers around to help.
RapidXML is quite a fast parser for XML written in C++.
http://sourceforge.net/projects/wsdlpull is a straight C++ port of the Java xmlpull API (http://www.xmlpull.org/).
I would highly recommend this parser. I had to customize it for use on my embedded device (no STL support), but I have found it to be very fast with very little overhead. I had to make my own string and vector classes, and even with those it compiles to about 60 KB on Windows.
I think that pull parsing is a lot more intuitive than something like SAX. The code much more closely mirrors the XML document, making it easy to correlate the two.
The one downside is that it is forward only, meaning that you need to parse the elements as they come. We have a fairly messed up design for reading our config files, and I need to parse a whole subtree, make some checks, then set some defaults, then parse again. With this parser the only real way to handle something like that is to make a copy of the state, parse with that, then continue on with the original. It still ends up being a big win in terms of resources vs our old DOM parser.
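The copy-the-state trick can be illustrated with a toy pull parser. This is not the wsdlpull/xmlpull API. PullParser, Event and subtreeContains are hypothetical names, and the sketch assumes simple markup with no attributes, self-closing tags or comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Event { StartTag, EndTag, Text, EndDocument };

// Hypothetical minimal pull parser: the caller asks for the next event.
// The state is a pointer to the document, a cursor and the last value,
// so copying the object gives a cheap checkpoint.
// The document string must outlive the parser and all of its copies.
class PullParser {
public:
    explicit PullParser(const std::string& doc) : doc_(&doc) {}

    Event next() {
        const std::string& s = *doc_;
        if (pos_ >= s.size()) return Event::EndDocument;
        if (s[pos_] != '<') {                       // text up to the next tag
            std::size_t end = s.find('<', pos_);
            if (end == std::string::npos) end = s.size();
            value_ = s.substr(pos_, end - pos_);
            pos_ = end;
            return Event::Text;
        }
        const std::size_t close = s.find('>', pos_);
        if (close == std::string::npos) {           // truncated tag: stop
            pos_ = s.size();
            return Event::EndDocument;
        }
        const bool isEnd = s[pos_ + 1] == '/';
        const std::size_t start = pos_ + (isEnd ? 2 : 1);
        value_ = s.substr(start, close - start);    // tag name
        pos_ = close + 1;
        return isEnd ? Event::EndTag : Event::StartTag;
    }

    const std::string& value() const { return value_; }

private:
    const std::string* doc_;
    std::size_t pos_ = 0;
    std::string value_;
};

// Looks ahead on a copy until the element opened just before the call is
// closed; the caller's parser is passed by value and is not advanced.
bool subtreeContains(PullParser lookahead, const std::string& tag) {
    int depth = 1;
    while (depth > 0) {
        const Event e = lookahead.next();
        if (e == Event::EndDocument) break;
        if (e == Event::StartTag) {
            if (lookahead.value() == tag) return true;
            ++depth;
        } else if (e == Event::EndTag) {
            --depth;
        }
    }
    return false;
}

void exampleUsage() {
    const std::string doc = "<config><net><port>80</port></net></config>";
    PullParser parser(doc);
    parser.next();                               // <config>
    parser.next();                               // <net>
    assert(subtreeContains(parser, "port"));     // checked on a copy
    assert(parser.next() == Event::StartTag);    // original resumes at <port>
    assert(parser.value() == "port");
}
```

With a file-backed parser the copy would also have to capture the read position and any buffered bytes, which is where most of the real cost of this workaround lies.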
If your XML structure is very simple, you can consider building a simple lexer/scanner based on lex/yacc (flex/bison). The sources at the W3C may inspire you: http://www.w3.org/XML/9707/parser.y and http://www.w3.org/XML/9707/scanner.l.
See also the SAX2 interface in libxml
firstobject's CMarkup is a C++ class that works as a lightweight pull parser for huge files (I recommend a pull parser rather than SAX), and as a huge XML file writer too. It adds up to about 250 KB to your executable. When used in-memory it has 1/3 the footprint of TinyXML, according to one user's report. When used on a huge file it only holds a small buffer (like 16 KB) in memory. CMarkup is currently a commercial product, so it is supported, documented, and designed to be easy to add to your project with a single cpp and h file.
The easiest way to try it out is with a script in the free firstobject XML editor such as this:
ParseHugeXmlFile()
{
    CMarkup xml;
    xml.Open( "HugeFile.xml", MDF_READFILE );
    while ( xml.FindElem("//record") )
    {
        // process record...
        str sRecordId = xml.GetAttrib( "id" );
        xml.IntoElem();
        xml.FindElem( "description" );
        str sDescription = xml.GetData();
    }
    xml.Close();
}
From the File menu, select New Program, paste this in and modify it for your elements and attributes, press F9 to run it or F10 to step through it line by line.
You can try https://github.com/thinlizzy/die-xml. It seems to be very small and easy to use.
It is a recently written open-source C++0x XML SAX parser, and the author welcomes feedback.
It parses an input stream and generates events through callbacks compatible with std::function.
The stack machine uses finite automata as a backend, and some events (start tag and text nodes) use iterators in order to minimize buffering, making it pretty lightweight.
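To show the general shape of such a callback-driven (SAX-style) interface, here is a toy sketch built on std::function. It is not die-xml's actual API. SaxHandlers and parseSax are hypothetical names, and the sketch skips attributes and does not handle comments, CDATA, entities or a '>' inside an attribute value.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical handler set; any callback may be left empty.
struct SaxHandlers {
    std::function<void(const std::string& name)> startTag;
    std::function<void(const std::string& name)> endTag;
    std::function<void(const std::string& text)> text;
};

// Reads the stream one character at a time. Only the current tag or text
// run is buffered, so memory use does not grow with the document size.
void parseSax(std::istream& in, const SaxHandlers& h) {
    std::string buffer;
    char c;
    while (in.get(c)) {
        if (c != '<') { buffer += c; continue; }
        if (!buffer.empty()) {               // flush pending text
            if (h.text) h.text(buffer);
            buffer.clear();
        }
        std::string tag;                     // everything between '<' and '>'
        while (in.get(c) && c != '>') tag += c;
        if (tag.empty()) continue;
        const bool closing = tag.front() == '/';
        const bool selfClosing = tag.back() == '/';
        const std::size_t begin = closing ? 1 : 0;
        const std::size_t end = tag.find_first_of(" \t\r\n/", begin);
        const std::string name = tag.substr(
            begin, end == std::string::npos ? std::string::npos : end - begin);
        if (closing) {
            if (h.endTag) h.endTag(name);
        } else {
            if (h.startTag) h.startTag(name);
            if (selfClosing && h.endTag) h.endTag(name);
        }
    }
}

void exampleUsage() {
    std::vector<std::string> events;
    SaxHandlers h;
    h.startTag = [&events](const std::string& n) { events.push_back("+" + n); };
    h.endTag = [&events](const std::string& n) { events.push_back("-" + n); };
    h.text = [&events](const std::string& t) { events.push_back("#" + t); };
    std::istringstream in("<log><record id=\"1\">hi</record><br/></log>");
    parseSax(in, h);
    const std::vector<std::string> expected = {
        "+log", "+record", "#hi", "-record", "+br", "-br", "-log"};
    assert(events == expected);
}
```

Because the input is any std::istream, the same function works on a std::ifstream for a huge file without loading it into memory.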
I'd look at tools that generate a DTD/Schema-specific parser if you want small and fast. These are very good for huge documents.
I highly recommend pugixml
pugixml is a light-weight C++ XML processing library.
"pugixml is a C++ XML processing library, which consists of a DOM-like interface with rich traversal/modification capabilities, an extremely fast XML parser which constructs the DOM tree from an XML file/buffer, and an XPath 1.0 implementation for complex data-driven tree queries. Full Unicode support is also available, with Unicode interface variants and conversions between different Unicode encodings."
I have tested a few XML parsers including a few expensive ones before choosing and using pugixml in a commercial product.
pugixml was not only the fastest parser but also had the most mature and friendly API. I highly recommend it. It is a very stable product! I have been using it since version 0.8; it is now at 1.7.
A great bonus of this parser is its XPath 1.0 implementation! For more complex tree queries, XPath is a godsend!
The DOM-like interface with rich traversal/modification capabilities is extremely useful for tackling real-life "heavy" XML files.
It is a small, fast parser. It is a good choice even for an iOS or Android app if you do not mind linking C++ code.
Benchmarks can tell a lot. See: http://pugixml.org/benchmark.html
A few examples (x86):
pugixml is more than 38 times faster than TinyXML,
4.1 times faster than CMarkup,
2.7 times faster than expat or libxml.
On x64, pugixml is the fastest parser I know of.
Check also the usage of the memory by your XML parser. Some parsers just gobble precious memory!