It's often quoted that
"It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." — Alan Perlis
and this is something that is heavily observed in Clojure & it's libraries too e.g
In the gist here, Rich has explained the design principle of clojure.xml where once the XML file is read, it is converted to a map and then you have all the functions available to manipulate these maps however you need and that you can reuse these functions with other maps
I'm having trouble understanding how these functions implemented to manipulate maps representing XML's to be used anywhere else?
I mean wouldn't the 100 functions I would've written be specific to the XML domain (i.e. specific to the map schema for XML), so the only reuse of these fn's would be if the map adheres to the same schema
Am I missing anything?
From the linked gist:
Look at the way clojure.xml processes XML - it deals with an external entity, a SAX parser fronting some XML source, gets the data out of it and into a map, and it's done. There aren't and won't be many more functions specific to XML. The world of functions for manipulating maps will now apply. And if it becomes necessary to have more XQuery-like capabilities in order to handle XML applications, they will be built for maps generally, and be available and reusable on all maps. Contrast this with your typical XML DOM, a huge collection of functions useful nowhere else.
Wouldn't the 100 functions I would've written be specific to the XML domain?
No.
Because you would not be dealing with XML anymore. XML is merely a markup language. In other language ecosystems (say, Java) when someone refers to XML, it means the markup language AND (almost always) it's extensions and the tools built around them. The way Clojure (at least clojure.xml) handles XML is to read the data and then continue processing it in Clojure's native maps.
It is worth noting (as of now) clojure.xml contains only one function, parse. This means it does not support serializing to XML, neither does it support in-place editing of XML data elements.
I think you are confusing XML with the particular instance of an XML file.
Given a specific XML file like say:
<xml>
<node1>One</node1>
<node2>Two</node1>
<node3>
<node31>
ThreeOne
</node31>
<node32>
ThreeTwo
</node31>
</node1>
</xml>
It is true that for example, you might want to have logic that grabs the values form the child nodes under node3. And so maybe you'd have a function called get-three-node-child-values. That function would not be reusable given another XML with a different structure that did not have a node3 with children as the above XML does.
But this is not the functions Rich is talking about being reused. The functions being reused are the ones that you use to implement the logic of get-three-node-child-values. Because if what you had was an XML Object, that XML Object would need to have a method to get the node3, and another method to get children of that node, and another method to get the value of that node. All these methods only work for the XML Class of objects, and had to be written for that. But if you turn the XML into a Map, you don't need these methods at all, and don't need to implement them. Since a Map already has methods to navigate and loop over its nodes.
Hope that made it more clear.
Related
In Qt there are a number of different ways to work with XML. To keep this simple I only want to look at the QXml* classes and the QDom* classes.
I'm trying to figure out which one to use but they both look to have similar functionality.
What's are the main differences between QXml and QDom?
Hypothetical example: Does one read the whole xml file into memory making it slow at startup but faster after startup?
What scenarios should require you to you to use one method over the other? and why should you use one over the other?
Hypothetical example: let's say you you are doing a "one-pass" versus "multi-pass"...
In short, QXml* classes implement SAX (Simple API for XML) XML parser while QDom* implement DOM (Document Object Model) XML parser.
The main difference is that SAX is a sequential access parser, so it parses the document as it reads it, and makes first chunks of parsed data available almost instantly. DOM needs to load the whole document into the memory to get it parsed, but it might be a bit easier to handle in terms of code overhead (for SAX you have to implement XML handler class). In general, SAX is more lightweight and faster.
There's lots of reading online regarding comparison of SAX and DOM:
why is sax parsing faster than dom parsing ? and how does stax work?
http://developerlife.com/tutorials/?p=28
And here's a nice document comparing various multiplatform XML parsers (including QXml* and QDom*). Your best choice depends on your use case, if you're working with huge XML documents, you'd prefer SAX. For tiny XMLs you'd be better off using DOM, since it's just a few lines of code to get data you need from a file.
I need to handle big XML files, but I want to make relatively small set of changes to it. I also want the program to adhere strict memory constraints. We must never use more than, say, 300Mb of ram.
Is there a library that allows me not to keep all the DOM in memory, and parse the XML on the go, while I traverse the DOM?
I know you can do that with call-back based approach, but I don't want that. I want to have my cake and eat it too. I want to use the DOM API, but to parse each element lazily, so that existing code that use the DOM API won't have to change.
There are two possible approaches I thought of for this problem:
Parse the lazily XML, each call to getChildren() will parse the next bit of XML.
Parse the entire XML tree, but cache whatever you're not using right now on the disk.
Two of the approaches are acceptable, is there an existing solution.
I'm looking for a native solution, but I'll be interested with hearing about libraries in other languages.
It sounds like what you want is something similar to the Streaming API for XML (StAX).
While it does not use the standard DOM API, it is similar in principle to your "getChildren()" approach. It does not have the memory overheads of the DOM approach, nor the complexity of the callback (SAX) approach.
There are a number of implementations linked on the Wikipedia page for StAX most of which are for Java, but there are a couple for C++ too - Ambiera irrXML and Llamagraphics LlamaXML.
edit: Since you mention "small changes" to the document, if you don't need to use the document contents for anything else, you might also consider Streaming Transformations for XML (STX) (described in this XML.com introduction to STX). STX is to XSLT something like what SAX/StAX is to DOM.
I want to use the DOM API, but to parse each element lazily, so that existing code that use the DOM API won't have to change.
You want a streaming DOM-style API? Such a thing generally does not exist, and for good reason: it would be difficult if not impossible to make it actually work.
XML is generally intended to be read one-way: from front to back. What you're suggesting would require being able to random-access an XML file.
I suppose you could do something where you build a table of elements, with file offsets pointing to where that element is in the file. But at that point, you've already read and parsed the file more or less. Unless most of your data is in text elements (which is entirely possible), you may as well be using a DOM.
Really, you would be much better off just rewriting your existing code to use an xmlReader or SAX-style API.
How to do streaming transformations is a big, open, unsolved problem. There are numerous partial solutions, depending on what restrictions you are prepared to accept. Current releases of Saxon-EE, for example, have the capability to do some XSLT transformations in a streaming fashion: see http://www.saxonica.com/html/documentation/sourcedocs/streaming.html. Also, as already mentioned, there is STX (though implementations are not especially mature).
Your title suggests you want to write the transformation in C++. That's severely limiting, because it pretty well means the programmer has to cope with the complexities rather than leaving it to the transformation engine. You can of course hand-code streaming transformations using SAX-like or StAX-like parser APIs, but both are hard work, and each case will need to be approached from scratch.
Google for "streaming XML transformation"
We're doing a mapping process from an XML file generated by a legacy system to EDI 834/837 files. We have BizTalk 2010 and are using the Microsoft built in EDI schemas.
The EDI files are fairly complex, and the XML file we are getting is also complex, with a lot of pieces bolted on. I started going through the mapping tool, but it seemed like there was a lot of repitition that I could eliminate by running the XML file through an XSLT.
I found the following link, but I'm not happy with just one source. http://blog.eliasen.dk/2009/07/08/CustomXSLTScriptingFunctoidOrBuiltinFunctoidsAQuestionAboutReligion.aspx
So, any other advantages on using the mapping tool over just building a custom XSLT?
My experience with BizTalk maps is that things that are very simple to do with XSLT can be very complex with maps.
For good counter-examples of BizTalk maps, look at the book "Pro Mapping in BizTalk Server 2009". The book has some examples of very complex things you can achieve with BizTalk maps, but the downside to it is that in fact they have hidden all the complexity in scripting functoids. Therefore, the maps are not visual at all anymore (they don't even have links between nodes to provide at least hints to deduce what the map is doing).
XSLT can be more visual than a map, since you can see the resulting XML in the XSLT (keep in mind that "text" does not imply "not visual" - if you are transforming between text formats, then a natural way to visualize the transformation is by looking at text)
BizTalk maps can be used for very simple mappings, where you are essentially copying a set of properties from one structure to another structure with the same properties. However, as soon as you have to map a structure to another different structure, you quickly get something that's hard to write AND hard to read/understand.
Not really, I prefer XSLT too. It's easier to document (using comments in the source) and therefore to maintain. However, keep in mind that in BizTalk 2006 R2 you could not import external XSLTs, which reduces your options for reuse. I have no idea if this has changed in subsequent versions of BizTalk, that's for you to find out and perhaps let us all know...
Not really an answer, more sharing of expierence;
In my team we've had discussion on this issue. The argument for maps was that it is understood by most colleagues (as it is touched by every basic BizTalk training), and XSLT not.
I've personally worked with XSLT for a long time, before i started working with BizTalk, and find the mapper tool very .. unintuitive. Every connection i make raises more questions than it gives me comfort in knowing what the result is. What happens when the source node is nil, not present, or repeating? Whathappens when the target node is defined as minOccurs=2? What does the table mapping functoid do exactly? What does the table value extract functoid do when a value is not found? How do i create a node with an autonumbering sequence, and how do i relate other created nodes that can relate to those nodes by using the generated number?
Working with XSLT gives me the control back, i know exactly what happens.
XSLT maps have the added value of being text-based, wich works well with branching and mering in source control, and allows us to add coments in the sources. Ever tried to merge changes from a map from two diffrent branches?
End result is that we now prefer XSLT for mapping, but not every developer is fluent in XSLT. That requires some training.
One last tip: invest in unit test tooling for your maps. Find an open source toolkit, or write some plumbing to test your maps yourself. Most BizTalk artifacts are perfectly testable, even when it doesn't seem that way, with possible exception for orchestrations (which you should use as a last resort only anyway).
IMO:
Benefits of XSLT
You get better DRY by reusing mapping functionality using XSLT apply + call
templates and custom script functions (e.g. C# script) in the same
map. Unfortunately AFAIK <xsl:include> doesn't work, so you will
need to copy-paste to get reuse across multiple map xslt files.
XSLT native call templates tend to be more performant than C# script (which is how most of the functoids are implemented anyhow)
You can use the XSLT debugger in Visual Studio.
And to emphasize ckarras' point that for complex maps, XSLT is actually easier to understand than a visual spider web.
Benefits of Visual Map
Productivity for trivial maps, e.g. where all elements are exactly the same name and type and can be mapped at the root level, or if you need a dummy map with hard coded output element values.
And I guess the hurdle rate for XSLT may be quite high.
As someone with experience in both BizTalk as well as another GUI-based mapping tool (BridgeGate), I can say that for the non-programmer these applications contain solutions in the form of their mapping interface to solve most problems. When they fall short, they offer a back door to exit to a more code-based solution in the form of a scripting functoid. So while XSLT is certainly an alternative, I find that those who prefer it often are those with more comfort writing code than those who are not.
My experience specifically with 837P and 837I files was with the prior mapping tool (BridgeGate), and it WAS arduous--but that was mainly the fault of the complexity of the file. What I CAN say and what is not being mentioned is that changes later to the process to accommodate client change requests WAS much easier in the GUI-based maps; I can only imagine how it would have been to have to dive into an XSLT big enough to handle 837 transformations and make changes to touch every node involved with a change request. You know how big an 837 is, and how complex the looping can be. Keep that in mind when making your choice.
I don't envy your task, but know the satisfaction when you complete it will make it all worthwhile. Good luck!
I'm working on a C++ project, and wanted to get some inputs from developers with similar experience.
The task is to connect to a web service which gives the results in an XML form. My role in the task is once I receive the XML form, I need to convert the XML into a C++ object and parse the XML data to the C++ object.
Following are my clarifications.
a) One way is to handcraft the whole thing but I need to do this for around hundreds of web services. I am aware there are simpler tools for C# and Java to do the same.
Is there a tool/utility for C++ too?
Any suggestions, would be helpful.
In the past, I've used TinyXML for my XML parsing needs. My parsing code operated under the assumption that all XML input conforms to a particular XSD schema I wrote. It worked fairly well but the ripple effects were annoying - if I wanted to change the XSD, I had to update all my XML test files as well as my parsing code. While it's not so bad in the case of parsing one schema, I'd hate to have to do it for hundreds of them.
I'm not sure what the common solution is, but CodeSynthesis XSD sounds pretty promising. I haven't used it, but it appears that it generates a data layer, a parser and serialisation code for you. Could save you a lot of time.
If you're asking if there's a way to dynamically create an object representation of an XML data stream (such that you can access it like topLevel.subObject.value), it's not possible. C++ is a statically-typed language, which means all objects need to be defined a compile time. The best you could do is something like: xmlData.getSubObject("objectName").getValue().
As for toolsets for parsing into something usable dynamically (as per my later example), there are several. For Windows, for example, you could use the "built-in" MSXML objects. There's nothing in the base C++ libraries to do so, however, as far as I am aware.
Hope that helps.
I'm trying to create a message validation program and would like to create easily modifiable rules that apply to certain message types. Due to the risk of the rules changing I've decided to define these validation rules external to the object code.
I've created a basic interface that defines a rule and am wondering what the best way to store this simple data would be. I was leaning towards XML but it seems like it might be too heavy.
Each rule would only need a very small set of data (i.e. type of rule, value, applicable mask, etc).
Does anyone know of a good resource that I could look at that would perform a similar functionality. I'd rather not dig too deep into XML on a problem that seems to barely need a subset of the functionality I see in most of the examples I bump into.
If I can find a concise example to examine I would be able to decide on whether or not to just go with a flat file.
Thanks in advance for your input!
Personally, for small, easily modifiable XML, I find TinyXML to be an excellent library. You can make each class understand it's own format, so your object hierarchy is represented directly in the XML.
However, if you don't think you need XML, you might want to go with a lighter storage like yaml. I find it is much easier to understand the underlying data, modify it and extend functionality.
(Also, boost::serialization has an XML archive, but it isn't what I'd call easily modifiable)
The simplest is to use a flat file designed to be easy to parse using the C++ >> operator. Just simple tokens separated by whitespace.
Well, if you want your rules to be human readable, XML is the way to go, and you can interface it nicely with c++ using xerces. If you want performance and or size, you could save the data as binaries using simple structs.
Another way to implement this would be to define your rules in XML Schema and then have an XML Data Binding tool generate the corresponding C++ object model along with the XML parsing and serialization code. One such tool (that I happen to be working on) is CodeSynthesis XSD:
http://www.codesynthesis.com/products/xsd/
For a 2-minutes overview of the idea, see the "Hello World" example in the C++/Tree mapping documentation.