XSLT set difference, but matching on a subsection of the node

I've implemented this recursively, but since most XML editors seem to run out of stack space, I suspect there is a more efficient solution out there.
I've looked at Jeni Tennison's set difference template:
http://www.exslt.org/set/functions/difference/set.difference.template.xsl
but I need something slightly different. I need node equality to be defined
as concat(name(.), @name).
There is a predefined set of nodes:
<a name="Adam"><!-- don't care about contents for equality purposes --></a>
<b name="Berty"><!-- don't care about contents for equality purposes --></b>
<a name="Charly"><!-- don't care about contents for equality purposes --></a>
I want to find out the subset of the below nodes that are not in the above list:
<b name="Berty"><!-- different contents --></b>
<b name="Boris"><!-- different contents --></b>
The result I'm after would be a node set of:
<b name="Boris"><!-- different contents --></b>
To complicate things, I can't use xsl:key, as the nodes are in different documents (overriding imported definitions is the reason I'm trying to process this).
Also, this needs to be XSLT 1.0, as I need to render in IE / Firefox.
Any thoughts, suggestions, or guidance welcome!

Have you taken a look at the technique in the XSLT Cookbook?
http://books.google.com/books?id=POJkiuHIAfoC&lpg=PP1&pg=PA324#v=onepage&q=&f=false
Mr. Mangano has a recipe for set difference, and a fairly well written explanation as well. Mind you, when you are comparing two elements that seem to be the same but have two different source documents, XSLT will usually report them as different, so you must test by the value of the element, attributes, etc.
You might want to poke at the example code from the book, provided here:
http://oreilly.com/catalog/9780596009748
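For the equality used in the question (element name plus name attribute), one non-recursive XSLT 1.0 option is a plain predicate test inside xsl:for-each. The sketch below is not the Cookbook recipe. The file name base.xml, the base-uri parameter, the difference wrapper element, and the assumption that both lists sit directly under their document elements are all invented for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical location of the document holding the predefined nodes -->
  <xsl:param name="base-uri" select="'base.xml'"/>

  <!-- Assumption: the predefined nodes are children of that document's root element -->
  <xsl:variable name="defined" select="document($base-uri)/*/*"/>

  <!-- Assumption: the candidate nodes are children of the input's root element -->
  <xsl:template match="/*">
    <difference>
      <xsl:for-each select="*">
        <!-- current() is the node being iterated; keep it only if no
             predefined node has the same element name and name attribute -->
        <xsl:if test="not($defined[name() = name(current())
                                   and @name = current()/@name])">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </difference>
  </xsl:template>

</xsl:stylesheet>
```

This compares every candidate against every predefined node, so the cost grows with the product of the two list sizes, but it involves no recursion and therefore should not exhaust the stack. It also works across documents, since it compares values rather than node identity.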

Related

How to skip a self closing tag in a ST function on an SAP system?

I have a problem handling an XML file in my SAP ABAP-based software with a Simple Transformation.
The files I receive normally contain no empty tags like <test></test>, but sometimes I receive a self-closing tag like <test/>.
This is an example of what I'm thinking of using. The first condition skips the element if ref('test') is initial (blank). The second one reads the value if there is one.
<tt:cond check="initial(ref('test'))">
<tt:skip count="*" name="test"/>
</tt:cond>
<tt:cond check="not-initial(ref('test'))">
<test tt:value-ref="test"/>
</tt:cond>
The idea is: if we get the tag <test/>, we need to skip it; otherwise, we need to assign the data. This works in the first case, since no data is read, but not in the second: the data is not read there either.
Can someone help?
Thanks in advance.
The XDM tree representations of <test></test> and <test/> are 100% identical, so there is no way an XSLT stylesheet can distinguish them or treat them differently. The idea of attaching different meanings to the two constructs is completely misguided: you can never be sure which representation an XML library will choose to use.
It is of course possible to distinguish an element that contains a value (such as <test>value</test>) from one that is empty - but both the above examples represent empty elements and must be treated as equivalent.

RegEx to remove specific XML elements

I'm using Kate to process text to create an XML file but I've hit a roadblock. The text now contains additional data that I need to remove based on its content.
To be specific, I have an XML element called <officers> that contains 0 or more <officer> elements, which contain further elements such as <title>, <name>, etc.. While I probably could exclude these at run time using XSL, the file also drives another process that I don't want to touch - it's a general purpose data importer for Scribus so I don't want to touch the coding.
What I want to do is remove an <officer> element if the <title> content isn't what I want. For example, I don't want the First VP, so I'd like to remove:
<officer>
<title>First VP</title>
<incumbent>Joe Somebody</incumbent>
<address>....</address>
<address>....</address>
......
</officer>
I don't know how many lines will be in any <officer> element, nor what position they will have within the <officers> element.
The easy part is getting to the start of the content I want removed. The hard part is getting to the </officer> end tag. All the solutions I've found so far just result in Kate deciding that the RegEx is invalid.
Any suggestions are appreciated.
Regex is the wrong tool for this job; never process XML without a proper parser, except possibly for a one-off job on a single document where you will throw the code away after running it and checking the results by hand. You might find a regex that works on one sample document, but you'll never get it to work properly on a well-designed set of 100 test documents.
And it's easily done using XSLT. It's a stylesheet with two template rules: a default "identity template" rule to copy elements unchanged, and a second rule to delete the elements you don't want. In fact in XSLT 3.0 it gets even simpler:
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="officer[title='First VP']"/>
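For processors that only support XSLT 1.0 or 2.0, the same two-rule idea can be sketched with an explicit identity template. The element names officer and title come from the question; nothing else is assumed about the file.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: a matching officer element produces no output.
       The predicate gives it a higher default priority than the identity rule. -->
  <xsl:template match="officer[title='First VP']"/>

</xsl:stylesheet>
```

Because the whole <officer> element is dropped as a unit, it does not matter how many child lines it has or where it sits inside <officers>.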

Truncate text formatted via HTML with XSLT 1.0

I am trying to truncate some text that has been formatted via HTML, but I need to keep the HTML intact. I am doing so in SharePoint 2007, so I am using XSLT 1.0.
I found this bit of XSLT here: http://symphony-cms.com/download/xslt-utilities/view/20816/
I was able to implement it, but it is telling me that the variable or parameter "Limit" has been defined twice.
However, the author has named many variables and parameters "Limit" and I am not sure which one I need to change.
I am fairly new to XSLT, and any help is greatly appreciated.
This is because at the top of the XSLT the author has defined limit as a parameter:
<xsl:param name="limit"/>
But a few lines down, he then defines it as a variable:
<xsl:variable name="limit">
Perhaps he had a 'buggy' XSLT processor which allowed variables to be redefined, but it should not actually be valid.
I did try renaming the variable to newlimit, but it is hard to know, when he subsequently refers to limit, whether it is the parameter or the variable he is referring to (I couldn't actually get it to output useful HTML).
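To illustrate the naming rule only, here is a minimal sketch in which a template takes a limit parameter and stores the derived value under a different name. It is not the utility from the link: the truncate template, the text parameter, the safe-limit variable, and the fallback of 100 are all hypothetical, and it truncates plain text rather than preserving HTML.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:call-template name="truncate">
      <xsl:with-param name="text" select="string(.)"/>
      <xsl:with-param name="limit" select="10"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: "limit" is declared once, as a parameter -->
  <xsl:template name="truncate">
    <xsl:param name="text"/>
    <xsl:param name="limit"/>

    <!-- The derived value gets its own name instead of re-declaring "limit" -->
    <xsl:variable name="safe-limit">
      <xsl:choose>
        <xsl:when test="number($limit) &gt; 0">
          <xsl:value-of select="$limit"/>
        </xsl:when>
        <xsl:otherwise>100</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:value-of select="substring($text, 1, $safe-limit)"/>
  </xsl:template>

</xsl:stylesheet>
```

With distinct names, every later reference is unambiguous: $limit is always the caller's value and $safe-limit is always the derived one.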
You are probably better off looking for something else to meet your needs. There may even be similar questions here on StackOverflow if you search about. For example, perhaps this meets your needs
XSLT - Using substring with copy-of to preserve inner HTML tags
I am sure there may be others if you look. If not, feel free to ask a new question, giving your input HTML, and your expected output, so that it is clear what your requirements are.

How to deal with presence or not of xml namespaces using xslt

I have some XML/TEI documents, and I'm writing an XSLT 2.0 stylesheet to extract their content.
Almost all of the TEI documents have no namespace, but one has the default namespace (xmlns="http://www.tei-c.org/ns/1.0").
So all the documents look the same, with unqualified tags like <TEI> or <teiHeader>. When I try to extract the content, everything works for the non-namespaced documents, but nothing (of course) is extracted from the namespaced document.
So I used the attribute xpath-default-namespace="http://www.tei-c.org/ns/1.0", and now (of course) the only document that works is the namespaced one.
I can't edit the documents at all, so what I'm asking is whether there's a way to change xpath-default-namespace dynamically, so that XPaths like //teiHeader work with both namespaced and non-namespaced documents.
If you are using XSLT 2.0, then you do have the option for a wildcard match for the namespace in a node test.
e.g. //*:teiHeader
http://www.w3.org/TR/xpath20/#node-tests
"A node test can also have the form *:NCName. In this case, the node test is true for any node of the principal node kind of the step axis whose local name matches the given NCName, regardless of its namespace or lack of a namespace."
This is functionally equivalent to Dimitre Novatchev's example, but a little shorter/easier to type.
However, this will only work in XSLT/XPath 2.0.
There isn't really a clean way to do precisely what you are asking. However, there are workarounds available. You could use a two stage process whereby you strip the namespace from the document if it's present and then pass it through the same templates for all content.
There is a good example (in XSLT 1) of doing this in the DocBook XSLT. Take a look at html/docbook.xsl and common/stripns.xsl
Basically, you would need to assign the result of stripping the namespace to a variable and then call your existing templates (for the non namespaced) content but select the variable.
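A minimal sketch of that two-stage approach in XSLT 2.0 might look like the following. It is not the DocBook stripns.xsl code; the strip-ns mode, the stripped variable, and the header output element are names made up for this example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Stage 1: a namespace-free copy of the input, held in a variable -->
  <xsl:variable name="stripped">
    <xsl:apply-templates select="/" mode="strip-ns"/>
  </xsl:variable>

  <!-- Rebuild each element using only its local name, so it ends up in no namespace -->
  <xsl:template match="*" mode="strip-ns">
    <xsl:element name="{local-name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()" mode="strip-ns"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="text()|comment()|processing-instruction()" mode="strip-ns">
    <xsl:copy/>
  </xsl:template>

  <!-- Stage 2: the existing non-namespaced templates run against the copy -->
  <xsl:template match="/">
    <xsl:apply-templates select="$stripped/TEI/teiHeader"/>
  </xsl:template>

  <xsl:template match="teiHeader">
    <header>
      <xsl:value-of select="normalize-space(.)"/>
    </header>
  </xsl:template>

</xsl:stylesheet>
```

Documents that never had a namespace pass through stage 1 unchanged in structure, so the same stage 2 templates serve both kinds of input. In XSLT 1.0 the variable would be a result tree fragment and would need a node-set() extension function before it could be navigated.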
It is ugly, but this gives you what you want:
//*[name()='teiHeader']
If you use this style for all location steps in any XPath expression, the XPath expressions will select elements only by name, regardless whether or not the elements belong to any namespace.

Preprocessing in XSLT

Is it at all possible to 'pre-process' in XSLT?
By preprocessing I mean updating the in-memory representation of the source tree.
Is this possible, or do I need to do multiple transforms for it?
Use case:
We have DocBook reference manuals for our clients, but for certain clients these need different 'skins' (different images, etc.). So what I was hoping to do is transform the image fileref path depending on a parameter, then apply the rest of the normal DocBook XSL templates.
Expanding on Eamon's answer...
In the case of either XSLT 1.0 or 2.0, you'd start by putting the intermediate (pre-processed) result in an <xsl:variable> element, declared either globally (top-level) or locally (inside a template).
<xsl:variable name="intermediate-result">
<!-- code to create pre-processed result, e.g.: -->
<xsl:apply-templates mode="pre-process"/>
</xsl:variable>
In XSLT 2.0, the value of the $intermediate-result variable is a node sequence consisting of one document node (was called "root node" in XSLT/XPath 1.0). You can access and use it just as you would any other variable, e.g., select="$intermediate-result/doc"
But in XSLT 1.0, the value of the $intermediate-result variable is not a first-class node-set. Instead, it's something called a "result tree fragment". It behaves like a node-set containing one root node, but you're restricted in how you can use it. You can copy it and get its string-value, but you can't drill down using XPath, as in select="$intermediate-result/doc". To do that, you must first convert it to a first-class node-set using your processor's node-set() extension function. In Saxon 6.5, libxslt, and 4xslt, you can use exsl:node-set() (as in Eamon's answer). In MSXML, you'd need to use msxsl:node-set(), where xmlns:msxsl="urn:schemas-microsoft-com:xslt", and in Xalan, I believe it's called xalan:nodeset() (without the hyphen, but you'll have to Google for the namespace URI). For example: select="exsl:node-set($intermediate-result)/doc"
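Applied to the use case in the question, an XSLT 1.0 sketch using exsl:node-set() could look like this. The skin parameter, the pre-process mode, and the way the path is prefixed are assumptions for illustration, and the element names assume un-namespaced DocBook (as in DocBook 4).

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <!-- Hypothetical parameter naming the client skin -->
  <xsl:param name="skin" select="'default'"/>

  <!-- Pass 1: build the pre-processed tree (a result tree fragment in XSLT 1.0) -->
  <xsl:variable name="intermediate-result">
    <xsl:apply-templates select="/" mode="pre-process"/>
  </xsl:variable>

  <!-- Pass 2: convert the fragment to a node-set and process it normally -->
  <xsl:template match="/">
    <xsl:apply-templates select="exsl:node-set($intermediate-result)/*"/>
  </xsl:template>

  <!-- Pre-process mode: copy everything unchanged by default -->
  <xsl:template match="@*|node()" mode="pre-process">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="pre-process"/>
    </xsl:copy>
  </xsl:template>

  <!-- ...except image paths, which get the skin directory prepended -->
  <xsl:template match="imagedata/@fileref" mode="pre-process">
    <xsl:attribute name="fileref">
      <xsl:value-of select="concat($skin, '/', .)"/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

On its own this stylesheet has no default-mode element templates, so pass 2 would fall back to the built-in rules. In practice it would import the regular DocBook stylesheets, whose templates would then handle the rewritten tree.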
XSLT 2.0 simply abolished the result tree fragment, making node-set() unnecessary.
This is not possible with standards-compliant XSLT 1.0. It is possible in every actual implementation I've used, though the extensions needed to do it differ by engine. It is also possible in standard XSLT 2.0 (which is in any case much easier to work with, so if you can, just use that).
If your XSLT processor supports EXSLT, the exsl:node-set() function does what you're looking for. MSXML has an identically named extension function as well (but with a different namespace URI, so the functions are unfortunately not trivially compatible).
Since you are trying to generate slightly different output from the same DocBook XML source, you might want to look into the "profiling" (conditional markup) support in DocBook XSL stylesheets. See Chapter 26 in DocBook XSL: The Complete Guide by Bob Stayton:
"Profiling is the term used in DocBook to describe conditional text. Conditional text means you can create a single XML document with some elements marked as conditional. When you process such a document, you can specify which conditions apply for that version of output, and the stylesheet will include or exclude the marked text to satisfy the conditions. This feature is useful when you need to produce more than one version of a document, and the versions differ in minor ways."
For example, to use different images for, say, Windows and Mac versions of the same document, you might have a DocBook XML fragment like this:
<figure>
<title>The Foo dialog</title>
<mediaobject>
<imageobject os="windows">
<imagedata fileref="screenshots/windows/foo.png"/>
</imageobject>
<imageobject os="mac">
<imagedata fileref="screenshots/mac/foo.png"/>
</imageobject>
</mediaobject>
</figure>
Then, you would use the profiling-enabled versions of the DocBook XSL stylesheets with the profile.os parameter set to windows or mac.
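One way to set this up is a small customization layer that imports the profiling stylesheet and fixes the parameter. This is a sketch: the import URI assumes the standard DocBook XSL distribution layout for HTML output, so adjust it to wherever the stylesheets are installed.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed location of the profiling-enabled HTML stylesheet -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/profile-docbook.xsl"/>

  <!-- Keep only content marked os="windows" (plus unmarked content) -->
  <xsl:param name="profile.os" select="'windows'"/>

</xsl:stylesheet>
```

The same parameter can instead be passed on the processor's command line, which avoids keeping one customization file per variant.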
Maybe you should use XSLT "OOP" methods here. Put all the templates common to all clients in one stylesheet, and create a stylesheet for each client with specific templates overriding the common ones. Import the common stylesheet into the specific ones with xsl:import, and you'll need only one processing pass, by calling the stylesheet corresponding to a client.
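A per-client stylesheet following this pattern might be as small as the sketch below. The file names common.xsl and client-acme.xsl, the skins/acme/ directory, and the img output are hypothetical; the real override would mirror whatever the common template for imagedata produces.

```xml
<!-- client-acme.xsl: hypothetical per-client stylesheet -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import must come before any other top-level element -->
  <xsl:import href="common.xsl"/>

  <!-- Templates here have higher import precedence than those in common.xsl,
       so this rule replaces the common handling of imagedata -->
  <xsl:template match="imagedata">
    <img src="{concat('skins/acme/', @fileref)}"/>
  </xsl:template>

</xsl:stylesheet>
```

Everything not overridden falls through to the imported templates, so each client file contains only its differences.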