We're doing a mapping process from an XML file generated by a legacy system to EDI 834/837 files. We have BizTalk 2010 and are using the Microsoft built in EDI schemas.
The EDI files are fairly complex, and the XML file we are getting is also complex, with a lot of pieces bolted on. I started going through the mapping tool, but it seemed like there was a lot of repitition that I could eliminate by running the XML file through an XSLT.
I found the following link, but I'm not happy with just one source. http://blog.eliasen.dk/2009/07/08/CustomXSLTScriptingFunctoidOrBuiltinFunctoidsAQuestionAboutReligion.aspx
So, any other advantages on using the mapping tool over just building a custom XSLT?
My experience with BizTalk maps is that things that are very simple to do with XSLT can be very complex with maps.
For good counter-examples of BizTalk maps, look at the book "Pro Mapping in BizTalk Server 2009". The book has some examples of very complex things you can achieve with BizTalk maps, but the downside to it is that in fact they have hidden all the complexity in scripting functoids. Therefore, the maps are not visual at all anymore (they don't even have links between nodes to provide at least hints to deduce what the map is doing).
XSLT can be more visual than a map, since you can see the resulting XML in the XSLT (keep in mind that "text" does not imply "not visual" - if you are transforming between text formats, then a natural way to visualize the transformation is by looking at text)
BizTalk maps can be used for very simple mappings, where you are essentially copying a set of properties from one structure to another structure with the same properties. However, as soon as you have to map a structure to another different structure, you quickly get something that's hard to write AND hard to read/understand.
Not really, I prefer XSLT too. It's easier to document (using comments in the source) and therefore to maintain. However, keep in mind that in BizTalk 2006 R2 you could not import external XSLTs, which reduces your options for reuse. I have no idea if this has changed in subsequent versions of BizTalk, that's for you to find out and perhaps let us all know...
Not really an answer, more sharing of expierence;
In my team we've had discussion on this issue. The argument for maps was that it is understood by most colleagues (as it is touched by every basic BizTalk training), and XSLT not.
I've personally worked with XSLT for a long time, before i started working with BizTalk, and find the mapper tool very .. unintuitive. Every connection i make raises more questions than it gives me comfort in knowing what the result is. What happens when the source node is nil, not present, or repeating? Whathappens when the target node is defined as minOccurs=2? What does the table mapping functoid do exactly? What does the table value extract functoid do when a value is not found? How do i create a node with an autonumbering sequence, and how do i relate other created nodes that can relate to those nodes by using the generated number?
Working with XSLT gives me the control back, i know exactly what happens.
XSLT maps have the added value of being text-based, wich works well with branching and mering in source control, and allows us to add coments in the sources. Ever tried to merge changes from a map from two diffrent branches?
End result is that we now prefer XSLT for mapping, but not every developer is fluent in XSLT. That requires some training.
One last tip: invest in unit test tooling for your maps. Find an open source toolkit, or write some plumbing to test your maps yourself. Most BizTalk artifacts are perfectly testable, even when it doesn't seem that way, with possible exception for orchestrations (which you should use as a last resort only anyway).
IMO:
Benefits of XSLT
You get better DRY by reusing mapping functionality using XSLT apply + call
templates and custom script functions (e.g. C# script) in the same
map. Unfortunately AFAIK <xsl:include> doesn't work, so you will
need to copy-paste to get reuse across multiple map xslt files.
XSLT native call templates tend to be more performant than C# script (which is how most of the functoids are implemented anyhow)
You can use the XSLT debugger in Visual Studio.
And to emphasize ckarras' point that for complex maps, XSLT is actually easier to understand than a visual spider web.
Benefits of Visual Map
Productivity for trivial maps, e.g. where all elements are exactly the same name and type and can be mapped at the root level, or if you need a dummy map with hard coded output element values.
And I guess the hurdle rate for XSLT may be quite high.
As someone with experience in both BizTalk as well as another GUI-based mapping tool (BridgeGate), I can say that for the non-programmer these applications contain solutions in the form of their mapping interface to solve most problems. When they fall short, they offer a back door to exit to a more code-based solution in the form of a scripting functoid. So while XSLT is certainly an alternative, I find that those who prefer it often are those with more comfort writing code than those who are not.
My experience specifically with 837P and 837I files was with the prior mapping tool (BridgeGate), and it WAS arduous--but that was mainly the fault of the complexity of the file. What I CAN say and what is not being mentioned is that changes later to the process to accommodate client change requests WAS much easier in the GUI-based maps; I can only imagine how it would have been to have to dive into an XSLT big enough to handle 837 transformations and make changes to touch every node involved with a change request. You know how big an 837 is, and how complex the looping can be. Keep that in mind when making your choice.
I don't envy your task, but know the satisfaction when you complete it will make it all worthwhile. Good luck!
Related
I need to handle big XML files, but I want to make relatively small set of changes to it. I also want the program to adhere strict memory constraints. We must never use more than, say, 300Mb of ram.
Is there a library that allows me not to keep all the DOM in memory, and parse the XML on the go, while I traverse the DOM?
I know you can do that with call-back based approach, but I don't want that. I want to have my cake and eat it too. I want to use the DOM API, but to parse each element lazily, so that existing code that use the DOM API won't have to change.
There are two possible approaches I thought of for this problem:
Parse the lazily XML, each call to getChildren() will parse the next bit of XML.
Parse the entire XML tree, but cache whatever you're not using right now on the disk.
Two of the approaches are acceptable, is there an existing solution.
I'm looking for a native solution, but I'll be interested with hearing about libraries in other languages.
It sounds like what you want is something similar to the Streaming API for XML (StAX).
While it does not use the standard DOM API, it is similar in principle to your "getChildren()" approach. It does not have the memory overheads of the DOM approach, nor the complexity of the callback (SAX) approach.
There are a number of implementations linked on the Wikipedia page for StAX most of which are for Java, but there are a couple for C++ too - Ambiera irrXML and Llamagraphics LlamaXML.
edit: Since you mention "small changes" to the document, if you don't need to use the document contents for anything else, you might also consider Streaming Transformations for XML (STX) (described in this XML.com introduction to STX). STX is to XSLT something like what SAX/StAX is to DOM.
I want to use the DOM API, but to parse each element lazily, so that existing code that use the DOM API won't have to change.
You want a streaming DOM-style API? Such a thing generally does not exist, and for good reason: it would be difficult if not impossible to make it actually work.
XML is generally intended to be read one-way: from front to back. What you're suggesting would require being able to random-access an XML file.
I suppose you could do something where you build a table of elements, with file offsets pointing to where that element is in the file. But at that point, you've already read and parsed the file more or less. Unless most of your data is in text elements (which is entirely possible), you may as well be using a DOM.
Really, you would be much better off just rewriting your existing code to use an xmlReader or SAX-style API.
How to do streaming transformations is a big, open, unsolved problem. There are numerous partial solutions, depending on what restrictions you are prepared to accept. Current releases of Saxon-EE, for example, have the capability to do some XSLT transformations in a streaming fashion: see http://www.saxonica.com/html/documentation/sourcedocs/streaming.html. Also, as already mentioned, there is STX (though implementations are not especially mature).
Your title suggests you want to write the transformation in C++. That's severely limiting, because it pretty well means the programmer has to cope with the complexities rather than leaving it to the transformation engine. You can of course hand-code streaming transformations using SAX-like or StAX-like parser APIs, but both are hard work, and each case will need to be approached from scratch.
Google for "streaming XML transformation"
While learning Sitecore I have found that the majority of Sitecore sample code on the web is in XSL instead of .NET.
What would be the advantage of choosing XSL over the processes I have become accustomed to as a .NET developer?
Are there processing speed advantages to using XSL?
Is XSL actually easier once you are comfortable with the syntax?
I'll just add my 2 cents too:
I find that there are too many limitations in XSLT that need to be overcome with either external "libraries" or with you developing a method in C# that can be used in XSLT.
So I find using Asp.Net simpler. But then I'm also a lot better with Asp.Net than with XSLT.
But XSLT has some good things:
good when getting fields from the current context item
good with simple content etc.
doesn't force the solution to recycle/rebuild
usually a nice way it fails, ie. the page still works, but the xslt that failed says it fails
When I first started working with Sitecore, my company used quite a bit of XSLT, but we've slowly gone away from that, because of it's limitations and because most people here are more familiar with Asp.Net/C#.
Some folks prefer XSL because of existing team skill set, the availability of XSL talent, or the belief that XSL is easier or cheaper to learn.
In Sitecore, ASP.NET-based sublayouts actually perform much better than XSL renderings. If that's what you are comfortable with, go for it. I've never created an XSL rendering myself.
XSLT is a powerful language; its main advantages over languages like ASP.NET tend to come when you want to reuse and customize logic over a wide variety of different pages or different source document structures with common shared elements and other variable structures. To achieve this it uses a rule-based processing model which some people find quite difficult to get to grips with on first encounter. Learning it is an investment that will pay off over time, but it can be daunting at first.
As for performance, I've never come across a site where it isn't fast enough for the job, and that includes some pretty high-stress services; when people have had performance problems they've usually turned out to be in other parts of the processing pipeline (or simply due to bad coding).
The choice between XSLT and .Net components in Sitecore is largely one of taste and skillset. XSLT in Sitecore does have some drawbacks though - it tends to be outperformed by .NET components for all but the most simple renderings and the places where it might seem most logical to use it, such as replicating content tree structure as a site menu, are actually those that tend to take the biggest performance hit. In the right situations XSLT is an incredibly powerful tool and well worth learning, but I've yet to see a convincing argument for making much use of it in Sitecore. It's also worth noting that some of the standard patterns of XSLT programming aren't the most efficient in Sitecore.
The only real advantage I can think of, would be that XSLT renderings are easier to deploy in isolation. Say, for instance, that you're updating your "News Spots" rendering and you want to deploy this change to test/production right away - it would be a simple case of uploading the .xsl file itself.
Using .NET development (and enduring the Web Application Project model), a deployment of the code base would implicitly deploy any and all changes to the affected assemblies - including whatever work you have in progress.
There are, of course, ways you can manage this. Source code branching/merging and so on - but that's an additional layer of complexity to your solution.
That being said, I use .NET for well over 95% of all my Sitecore development myself :-)
"In summary, a primary goal of software design and coding is conquering complexity. The motivation behind many programming practices is to reduce a program's complexity. Reducing complexity is a key to being an effective programmer." -Steve McConnell (1993)
Let that guide when to use XSLT over C#.
I'm working on a C++ project, and wanted to get some inputs from developers with similar experience.
The task is to connect to a web service which gives the results in an XML form. My role in the task is once I receive the XML form, I need to convert the XML into a C++ object and parse the XML data to the C++ object.
Following are my clarifications.
a) One way is to handcraft the whole thing but I need to do this for around hundreds of web services. I am aware there are simpler tools for C# and Java to do the same.
Is there a tool/utility for C++ too?
Any suggestions, would be helpful.
In the past, I've used TinyXML for my XML parsing needs. My parsing code operated under the assumption that all XML input conforms to a particular XSD schema I wrote. It worked fairly well but the ripple effects were annoying - if I wanted to change the XSD, I had to update all my XML test files as well as my parsing code. While it's not so bad in the case of parsing one schema, I'd hate to have to do it for hundreds of them.
I'm not sure what the common solution is, but CodeSynthesis XSD sounds pretty promising. I haven't used it, but it appears that it generates a data layer, a parser and serialisation code for you. Could save you a lot of time.
If you're asking if there's a way to dynamically create an object representation of an XML data stream (such that you can access it like topLevel.subObject.value), it's not possible. C++ is a statically-typed language, which means all objects need to be defined a compile time. The best you could do is something like: xmlData.getSubObject("objectName").getValue().
As for toolsets for parsing into something usable dynamically (as per my later example), there are several. For Windows, for example, you could use the "built-in" MSXML objects. There's nothing in the base C++ libraries to do so, however, as far as I am aware.
Hope that helps.
A coworker of mine mentioned that one use of XSLT is processing business rules. He mentioned that there were systems that allowed users to write business rules in some kind of text format, and then the program uses XSLT to process the text and apply the rules at run-time in the application.
Can someone shed some light on this subject for me?
Thanks!
Ouch. I wouldn't recommend that.
As the first responder said, XSL-T is for transforming XML. It's not a rules engine. I think it sounds like a misuse of the technology.
XSL-T transforms are not intuitive to write. If one of your goals for business rules is allowing business folks to update and maintain the rules, I can't imagine a more obtuse and difficult technology for doing so than XSL-T.
I suppose your colleague was refering to BPEL, the Business Process Execution Language. BPEL is an XML-based executable language for describing business processes.
Being an XML format, business rules may be generated or transformed using XSLT. However, I'm not familiar with BPEL so I don't know any system doing something like that.
Yes. The somewhat-like text format is called Excel, and users tend to do all kinds of complex things with it. The programmer then spends an awful lot of time trying to process it with every shiny new technology he can find, including XSLT, and finally decides to hand-code around all the inconsistencies. It is not fully automated, as no sane user trusts the programmer to get it right first time.
XSLT stands for XSL Transform. It is used to change an XML document from one form to another.
As for systems, Microsoft BizTalk uses XSLT in mapping operations that map one XML document into another. Within the XSLT the user can make use of .net code to do more complex processing.
I'm sure someone else will have a much nicer explanation but you can easily find out more by Googling XSLT tutorials. It's a huge topic.
It should be possible: write your rules in XML, the case data should also be in XML, and then a generic XSLT could be written that compares the case data against the rules and executes the relevant rules in the correct sequence.
The business users don't need to know XSLT, they just need to know how to write the rules.
I need a tool to execute XSLTs against very large XML files. To be clear, I don't need anything to design, edit, or debug the XSLTs, just execute them. The transforms that I am using are already well optimized, but the large files are causing the tool I have tried (Saxon v9.1) to run out of memory.
I found a good solution: Apache's Xalan C++. It provides a pluggable memory manager, allowing me to tune allocation based on the input and transform.
In multiple cases it is consuming ~60% less memory (I'm looking at private bytes) than the others I have tried.
You may want to look into STX for streaming-based XSLT-like transformations. Alternatively, I believe StAX can integrate with XSLT nicely through the Transformer interface.
It sounds like you're sorted - but often, another potential approach is to split the data first. Obviously this only works with some transformations (i.e. where different chunks of data can be treated in isolation from the whole) - but then you can use a simple parser (rather than a DOM) to do the splitting into manageable pieces, then process each chunk separately and reassemble.
Since I'm a .NET bod, things like XmlReader can do the chunking without a DOM; I'm sure there are equivalents for every language.
Again - just for completeness.
[edit re question]
I'm not aware of any specific name; maybe Divide and Conquer.
For an example; if your data is actually a flat list of like objects, then you could simply split the first-level children - i.e. rather than having 2M rows, you split it into 10 lots of 200K rows, or 100 lots of 20K rows. I've done this before lots of times for working with bulk data (for example, uploading in chunks of data [all valid] and re-assembling at the server so that each individual upload is small enough to be robust).
For what it's worth, I suspect that for Java, Saxon is as good as it gets, if you need to use XSLT. It is quite efficient (both cpu and memory) for larger documents, but XSLT itself essentially forces full in-memory tree of contents to be created and retained, except for limited cases. Saxon-SA (for-fee version) supposedly has extensions to allow taking advantage of such "streaming" cases, so that might be worth checking out.
But the advice to split up the contents is the best one: if you are dealing with independent records, just split the input using other techniques (like, use Stax! :-) )
I have found that a custom tool built to run the XSLT using earlier versions of MSXML makes it very fast, but also consumes incredible amounts of memory, and will not actually complete if it is too large. You also lose out on some advanced XSLT functionality as the earlier versions of MSXML don't support the full xpath stuff.
It is worth a try if your other options take too long.
That's an interesting question. XSLT could potentially be optimized for space, but I expect all but the most obscure implementations around start by parsing the source document into DOM, which is bound to use a low multiple of the document size in memory.
Unless the stylesheet is specially designed to support a single-pass transformation, reasonable time performance would probably require parsing the source document into a disk-based hierarchical database.
I do not have an answer, though.
It appears that Saxon 9.2 may provide an answer to your problem. If your document can be transformed without using predicates (does not reference any siblings of the current node) you may be able to use Streaming XSLT.
See this link
I have not tried this myself, I am just reading about it. But I hope it works.
Are you using the Java version of Saxon, or the .Net port? You can assign more memory to the Java VM running Saxon, if you are running out of memory (using the -Xms command line parameter).
I've also found that the .Net version of Saxon runs out of memory less easily than the Java version.
For .NET you can use solution suggestion on Microsoft Knowledge Base:
http://support.microsoft.com/kb/307494
XPathDocument srcDoc = new XPathDocument(srcFile);
XslCompiledTransform myXslTransform = new XslCompiledTransform();
myXslTransform.Load(xslFile);
using (XmlWriter destDoc = XmlWriter.Create(destFile))
{
myXslTransform.Transform(srcDoc, destDoc);
}
Take a look at Xselerator