Given this (simplified) XML:
<p>
<hi rend="italic">Some text</hi>, <hi rend="italic">and some more</hi>: <hi rend="italic"
>followed by some more.</hi>
<hi rend="bold">This text is fully in bold.</hi> Here we have plain text, which should't be
touched. <hi rend="bold">Here we go with bold</hi>, <hi rend="bold">yet again.</hi>
</p>
I would like to merge all the nodes that have the same name and same attribute together with all the text nodes between them but only if the normalize-space() of the text nodes can be reduced to punctuation signs.
In other words, if two or more hi[#rend='italic'] or hi[#rend='bold'] nodes are separated by text nodes containing only punctuation and spaces, they should be merged.
If, on the other hand, the text node between two hi[#rend='italic'] or two hi[#rend='bold'] nodes is not reducible to punctuation, it shouldn't be touched.
I would like to learn how to do this without hard-coding element hi and attribute #rend, i.e. I would like the stylesheet to merge any identical element/attribute combos separated by punctuation text nodes.
The punctuation characters should be matched by the regex \p{P}.
The output should look like this:
<p>
<hi rend="italic">Some text, and some more: followed by some more.</hi>
<hi rend="bold">This text is fully in bold.</hi> Here we have plain text, which should't be
touched. <hi rend="bold">Here we go with bold, yet again.</hi>
</p>
Many thanks in advance.
I am not sure there is a one step solution, one approach I could think of is a two step transformation where in a first step the inter element punctuation text nodes are transformed into elements so that the second transformation step can use group-adjacent. In the following I have used XSLT 3 and a composite grouping key composed of the element's node-name() and the sequence of node-name() sorted attribute values:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="#all"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:mode name="text-to-el" on-no-match="shallow-copy"/>
<xsl:function name="mf:match" as="xs:boolean">
<xsl:param name="e1" as="element()"/>
<xsl:param name="e2" as="element()"/>
<xsl:sequence
select="deep-equal(($e1!(node-name(), mf:sort(#* except #mf:punctuation)!data())), ($e2!(node-name(), mf:sort(#* except #mf:punctuation)!data())))"/>
</xsl:function>
<xsl:function name="mf:sort" as="attribute()*">
<xsl:param name="attributes" as="attribute()*"/>
<xsl:perform-sort select="$attributes">
<xsl:sort select="node-name()"/>
</xsl:perform-sort>
</xsl:function>
<xsl:template match="text()[matches(normalize-space(.), '^\p{P}+$') and mf:match(preceding-sibling::node()[1], following-sibling::node()[1])]" mode="text-to-el">
<xsl:element name="{node-name(preceding-sibling::node()[1])}" namespace="{namespace-uri(preceding-sibling::node()[1])}">
<xsl:apply-templates select="preceding-sibling::node()[1]/#*" mode="#current"/>
<xsl:attribute name="mf:punctuation">true</xsl:attribute>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:variable name="punctuation-text-to-element">
<xsl:apply-templates mode="text-to-el"/>
</xsl:variable>
<xsl:template match="/">
<xsl:apply-templates select="$punctuation-text-to-element/node()"/>
</xsl:template>
<xsl:template match="*[*]">
<xsl:copy>
<xsl:apply-templates select="#*"/>
<xsl:for-each-group select="node()" composite="yes" group-adjacent="if (. instance of element()) then (node-name(), mf:sort(#* except #mf:punctuation)!data()) else false()">
<xsl:choose>
<xsl:when test="current-grouping-key() instance of xs:boolean and not(current-grouping-key())">
<xsl:apply-templates select="current-group()"/>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates select="#*, current-group()/node()"/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/gWvjQf6
In XSLT 2 you don't have composite grouping keys but of course it is possible to string-join the sequence used in the XSLT 3 sample as a grouping key into some single string grouping key, you just have to make sure you use a separator character with string-join that doesn't occur in the element names and attribute values.
Instead of using xsl:mode the identity transformation would need to be spelled out and the use of ! has to be replaced with for .. return expressions or / steps where possible:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="#all"
version="2.0">
<xsl:param name="sep" as="xs:string">|</xsl:param>
<xsl:template match="#*|node()" mode="#all">
<xsl:copy>
<xsl:apply-templates select="#*|node()" mode="#current"/>
</xsl:copy>
</xsl:template>
<xsl:function name="mf:match" as="xs:boolean">
<xsl:param name="e1" as="element()"/>
<xsl:param name="e2" as="element()"/>
<xsl:sequence
select="deep-equal(($e1/(node-name(.), for $att in mf:sort(#* except #mf:punctuation) return data($att))), ($e2/(node-name(.), for $att in mf:sort(#* except #mf:punctuation) return data($att))))"/>
</xsl:function>
<xsl:function name="mf:sort" as="attribute()*">
<xsl:param name="attributes" as="attribute()*"/>
<xsl:perform-sort select="$attributes">
<xsl:sort select="node-name(.)"/>
</xsl:perform-sort>
</xsl:function>
<xsl:template match="text()[matches(normalize-space(.), '^\p{P}+$') and mf:match(preceding-sibling::node()[1], following-sibling::node()[1])]" mode="text-to-el">
<xsl:element name="{node-name(preceding-sibling::node()[1])}" namespace="{namespace-uri(preceding-sibling::node()[1])}">
<xsl:apply-templates select="preceding-sibling::node()[1]/#*" mode="#current"/>
<xsl:attribute name="mf:punctuation">true</xsl:attribute>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:variable name="punctuation-text-to-element">
<xsl:apply-templates mode="text-to-el"/>
</xsl:variable>
<xsl:template match="/">
<xsl:apply-templates select="$punctuation-text-to-element/node()"/>
</xsl:template>
<xsl:template match="*[*]">
<xsl:copy>
<xsl:apply-templates select="#*"/>
<xsl:for-each-group select="node()" group-adjacent="if (. instance of element()) then string-join((string(node-name(.)), for $att in mf:sort(#* except #mf:punctuation) return data($att)), $sep) else false()">
<xsl:choose>
<xsl:when test="current-grouping-key() instance of xs:boolean and not(current-grouping-key())">
<xsl:apply-templates select="current-group()"/>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates select="#*, current-group()/node()"/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
http://xsltransform.net/asnmyS
Related
Using XSLT 2.0, suppose you have current-group() = { A, X, B, B, X } where A, B, and X are elements. What is an efficient and legible way to split it on the first occurrence of B to get two sequences S1 and S2 such that S1 = { A, X } and S2 = { B, B, X }? Is it possible to accomplish this using a xsl:for-each-group construct?
EDIT: The elements of the current-group() are not guaranteed to be siblings but are guaranteed to be in document order.
First attempt: Using xsl:for-each-group with group-starting-with
<xsl:for-each-group select="current-group()" group-starting-with="B[1]">
<xsl:choose>
<xsl:when test="position() = 1">
<!-- S1 := current-group() -->
</xsl:when>
<xsl:otherwise>
<!-- S2 := current-group() -->
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
This works provided there is no preceding sibling B to the first B of the current-group().
I would have thought the position predicate [1] would be scoped to the select clause since current-group()[self::B][1] returns the correct B. I'm curious to know why it doesn't scope this way.
XML
<root>
<A>A1</A>
<B>B1-1</B>
<B>B1-2</B>
<A>A2</A>
<B>B2-1</B>
<B>B2-2</B>
</root>
XSLT
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="*" group-starting-with="A">
<xsl:for-each-group select="current-group()" group-starting-with="B[1]">
<xsl:choose>
<xsl:when test="position() = 1">
<S1><xsl:copy-of select="current-group()" /></S1>
</xsl:when>
<xsl:otherwise>
<S2><xsl:copy-of select="current-group()" /></S2>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
Result
<root>
<S1>
<A>A1</A>
</S1>
<S2>
<B>B1-1</B>
<B>B1-2</B>
</S2>
<S1>
<A>A2</A>
<B>B2-1</B>
<B>B2-2</B>
</S1>
</root>
As you can see the first group is correctly split, but the second group is not. This will work, however, if you wrap the current-group() in a parent and then pass that to the select clause, but that seems inefficient.
The functx library defines a functions functx:index-of-node (http://www.xsltfunctions.com/xsl/functx_index-of-node.html):
<xsl:function name="functx:index-of-node" as="xs:integer*"
xmlns:functx="http://www.functx.com">
<xsl:param name="nodes" as="node()*"/>
<xsl:param name="nodeToFind" as="node()"/>
<xsl:sequence select="
for $seq in (1 to count($nodes))
return $seq[$nodes[$seq] is $nodeToFind]
"/>
</xsl:function>
That would reduce your second approach to
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="*" group-starting-with="A">
<xsl:variable name="pos" select="functx:index-of-node(current-group(), (current-group()[self::B])[1])"/>
<S1>
<xsl:copy-of select="current-group()[position() lt $pos]"/>
</S1>
<S2>
<xsl:copy-of select="current-group()[position() ge $pos]"/>
</S2>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
In the "new" "XSLT 4" world of Saxon 10 PE or EE with the extension functions saxon:items-before and saxon:items-from and syntax extension for anonymous functions you could write it as
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:saxon="http://saxon.sf.net/"
exclude-result-prefixes="#all" version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output indent="yes"/>
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="*" group-starting-with="A">
<S1>
<xsl:apply-templates
select="saxon:items-before(current-group(), .{ . instance of element(B) })"/>
</S1>
<S2>
<xsl:apply-templates
select="saxon:items-from(current-group(), .{ . instance of element(B) })"/>
</S2>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
There are many questions about how to remove duplicate elements when you can group those elements by a certain attribute or value, however, in my case the attributes are being dynamically generated in the XSLT already and I don't want to have to program in every attribute for every element to use as a grouping key.
How do you remove duplicate elements without knowing in advance their attributes? So far, I've tried using generate-id() on each element and grouping by that, but the problem is generate-id isn't generating the same ID for elements with the same attributes:
<xsl:template match="root">
<xsl:variable name="tempIds">
<xsl:for-each select="./*>
<xsl:copy>
<xsl:copy-of select="#*"/>
<xsl:attribute name="tempID">
<xsl:value-of select="generate-id(.)"/>
</xsl:attribute>
<xsl:copy-of select="node()"/>
</xsl:copy>
</xsl:for-each>
</xsl:variable>
<xsl:for-each-group select="$tempIds" group-by="#tempID">
<xsl:sequence select="."/>
</xsl:for-each-group>
</xsl:template>
Test data:
<root>
<child1>
<etc/>
</child1>
<dynamicElement1 a="2" b="3"/>
<dynamicElement2 c="3" d="4"/>
<dynamicElement2 c="3" d="5"/>
<dynamicElement1 a="2" b="3"/>
</root>
With the end result being only one of the two dynamicElement1 elements remaining:
<root>
<child1>
<etc/>
</child1>
<dynamicElement1 a="2" b="3"/>
<dynamicElement2 c="3" d="4"/>
<dynamicElement2 c="3" d="5"/>
</root>
In XSLT 3 as shown in https://xsltfiddle.liberty-development.net/pPqsHTi you can use a composite key of all attributes with e.g.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output indent="yes"/>
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="*" composite="yes" group-by="#*">
<xsl:sequence select="."/>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Note that technically attributes are not ordered so it might be safer to group by a sort of the attributes by node-name() or similar, as done with XSLT 3 without higher-order functions in https://xsltfiddle.liberty-development.net/pPqsHTi/2
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:mf="http://example.com/mf"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output indent="yes"/>
<xsl:function name="mf:node-sort" as="node()*">
<xsl:param name="input-nodes" as="node()*"/>
<xsl:perform-sort select="$input-nodes">
<xsl:sort select="namespace-uri()"/>
<xsl:sort select="local-name()"/>
</xsl:perform-sort>
</xsl:function>
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="*" composite="yes" group-by="mf:node-sort(#*)">
<xsl:sequence select="."/>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
or as you could do with Saxon EE simply with
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="*" composite="yes" group-by="sort(#*, (), function($att) { namespace-uri($att), local-name($att) })">
<xsl:sequence select="."/>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="root/*[#a= following-sibling::*/#a]|root/*[#c= following-sibling::*/#c and #d= following-sibling::*/#d]"/>
You may try this
With XSLT 2.0, I am trying to create a list of relations between all children of given elements, in a document such as:
<doc>
<part1>
<name>John</name>
<name>Paul</name>
<name>George</name>
<name>Ringo</name>
<place>Liverpool</place>
</part1>
<part2>
<name>Romeo</name>
<name>Romeo</name>
<name>Juliet</name>
<fam>Montague</fam>
<fam>Capulet</fam>
</part2>
</doc>
The result I would like to obtain, ideally by conflating and weighing the identical relations, would be (in whatever order) something like:
<doc>
<part1>
<rel><name>John</name><name>Paul</name></rel>
<rel><name>John</name><name>George</name></rel>
<rel><name>John</name><name>Ringo</name></rel>
<rel><name>Paul</name><name>George</name></rel>
<rel><name>Paul</name><name>Ringo</name></rel>
<rel><name>George</name><name>Ringo</name></rel>
<rel><name>John</name><place>Liverpool</place></rel>
<rel><name>Paul</name><place>Liverpool</place></rel>
<rel><name>George</name><place>Liverpool</place></rel>
<rel><name>Ringo</name><place>Liverpool</place></rel>
</part1>
<part2>
<rel weight="2"><name>Romeo</name><name>Juliet</name></rel>
<rel weight="2"><name>Romeo</name><fam>Montague</fam></rel>
<rel weight="2"><name>Romeo</name><fam>Capulet</fam></rel>
<rel><name>Juliet</name><fam>Montague</fam></rel>
<rel><name>Juliet</name><fam>Capulet</fam></rel>
<rel><fam>Montague</fam><fam>Capulet</fam></rel>
</part2>
</doc>
—but I'm not sure how to proceed. Many thanks in advance for your help.
You still haven't explained the logic that needs to be applied here, so this is based largely on a guess:
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="/">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="doc/*">
<!-- first pass-->
<xsl:variable name="unique-items">
<xsl:for-each-group select="*" group-by="concat(name(), '|', .)">
<item name="{name()}" count="{count(current-group())}" value="{.}"/>
</xsl:for-each-group>
</xsl:variable>
<!-- output -->
<xsl:copy>
<xsl:for-each select="$unique-items/item">
<xsl:variable name="left" select="."/>
<xsl:for-each select="following-sibling::item">
<xsl:variable name="weight" select="$left/#count * #count" />
<rel>
<xsl:if test="$weight gt 1">
<xsl:attribute name="weight" select="$weight"/>
</xsl:if>
<xsl:apply-templates select="$left | ." />
</rel>
</xsl:for-each>
</xsl:for-each>
</xsl:copy>
</xsl:template>
<xsl:template match="item">
<xsl:element name="{#name}">
<xsl:value-of select="#value"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
The idea here is to remove duplicates in the first pass, then enumerate all combinations in the second (final) pass. The weight is computed by multiplying the number of occurrences of each member of a combination pair and shown only when it exceeds 1.
At least the combinatoric part of your problem could be solved with the following XSLT script. It does not solve the elimination of duplicates, but that could possibly be done in a second transformation.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- standard copy template -->
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()|#*" />
</xsl:copy>
</xsl:template>
<xsl:template match="doc/*">
<xsl:copy>
<xsl:variable name="l" select="./*"/>
<xsl:for-each select="$l">
<xsl:variable name="a" select="."/>
<xsl:variable name="posa" select="position()"/>
<xsl:variable name="namea" select="name()"/>
<xsl:for-each select="$l">
<xsl:if test="position() > $posa and (. != $a or name() != $namea)">
<rel>
<xsl:copy-of select="$a"/>
<xsl:copy-of select="."/>
</rel>
</xsl:if>
</xsl:for-each>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When applied to the first part of your example, this produces:
<part1>
<rel><name>John</name><name>Paul</name></rel>
<rel><name>John</name><name>George</name></rel>
<rel><name>John</name><name>Ringo</name></rel>
<rel><name>John</name><place>Liverpool</place></rel>
<rel><name>Paul</name><name>George</name></rel>
<rel><name>Paul</name><name>Ringo</name></rel>
<rel><name>Paul</name><place>Liverpool</place></rel>
<rel><name>George</name><name>Ringo</name></rel>
<rel><name>George</name><place>Liverpool</place></rel>
<rel><name>Ringo</name><place>Liverpool</place></rel>
</part1>
Which seems about correct. If have no idea if the duplicate elimination (or weighting, as you call it) could be done in the same transformation.
I have two types of input xml, one with namespace prefix and another one without prefix and i want to replace the namespaces.
Sample1
<v1:Library xmlns:v1="http://testlibrary" xmlns:v2="http://commonprice">
<v1:Books_details>
<v1:Name>test1</v1:Name>
<v1:title>test2</v1:title>
<v2:price xmlns="http://commonprice">12</v2:price>
</v1:Books_details>
</v1:Library>
Sample2
<Library xmlns="http://testlibrary">
<Books_details>
<Name>test1</Name>
<title>test2</title>
<price xmlns="http://commonprice">12</price>
</Books_details>
</Library>
I have written following XSLT to change the namespace from "http://testlibrary" to "http://newlibrary" and it works fine for sample1 but it doesn't work for the sample2. It gives wrong result. It also the change the namespace of the price element even though it doesn't have namespace to be replaced.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:param name="old_namespace"/>
<xsl:param name="new_namespace"/>
<xsl:template match="/">
<xsl:apply-templates select="#* | node()"/>
</xsl:template>
<xsl:template match="text() | comment() | processing-instruction()">
<xsl:copy>
<xsl:apply-templates select="text() | comment() | processing-instruction()"/>
</xsl:copy>
</xsl:template>
<!-- Template used to copy elements -->
<xsl:template match="*">
<xsl:variable name="name">
<xsl:choose>
<xsl:when test="contains(name(), ':')">
<xsl:value-of select="name()"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="local-name()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:element name="{$name}" namespace="{$new_namespace}">
<!-- Copy all namespace through except for namespace to be changed -->
<xsl:for-each select="namespace::*">
<xsl:if test="string(.)!=$old_namespace">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:for-each>
<xsl:apply-templates select="node()|#*"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Note: My first answer was wrong because in XSLT 1.0, you can't use variable references within a match pattern.
This style-sheet will take either Sample1 or Sample2 as input document and replace all occurrences of the old namespace, from the element names, with the new namespace. Note: you can change the xsl:variable for xsl:param.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:variable name="old_namespace" select="'http://testlibrary'" />
<xsl:variable name="new_namespace" select="'http://newlibrary'" />
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*">
<xsl:choose>
<xsl:when test="namespace-uri()=$old_namespace">
<xsl:element name="{local-name()}" namespace="{$new_namespace}">
<xsl:apply-templates select="#*|node()"/>
</xsl:element>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Caveat
The above style-sheet will only change the namespace of the elements. It will not change the namespaces of attributes, nor will it remove extraneous namespaces nodes.
On a more specific solution
It is very unusual to require a general solution for an operation on a variable namespace. Namespaces, being what they are tend to be fixed and known. Consider carefully, if you really need a generalized solution. If you need a specific solution, meaning replacing occurrences of a specific namespace, then things get a lot easier, such as with this style-sheet...
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:old="http://testlibrary"
xmlns:new="http://newlibrary">
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="old:*">
<xsl:element name="{local-name()}" namespace="http://newlibrary">
<xsl:apply-templates select="#*|node()" />
</xsl:element>
</xsl:template>
</xsl:stylesheet>
UPDATE
I just noticed Dimitre's solution to a very similar question here.
I'm trying to match a series of xml tags that are comma separated, and to then apply an xslt transformation on the whole group of nodes plus text. For example, given the following partial XML:
<p>Some text here
<xref id="1">1</xref>,
<xref id="2">2</xref>,
<xref id="3">3</xref>.
</p>
I would like to end up with:
<p>Some text here <sup>1,2,3</sup>.</p>
A much messier alternate would also be acceptable at this point:
<p>Some text here <sup>1</sup><sup>,</sup><sup>2</sup><sup>,</sup><sup>3</sup>.</p>
I have the transformation to go from a single xref to a sup:
<xsl:template match="xref"">
<sup><xsl:apply-templates/></sup>
</xsl:template>
But I'm at a loss as to how to match a group of nodes separated by commas.
Thanks.
Update: Thanks to #Flynn1179 who alerted me that the solution wasn't producing exactly the wanted output, I have slightly modified it. Now the wanted "good" format is produced.
This XSLT 1.0 transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()[1]|#*"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:template>
<xsl:template match=
"xref[not(preceding-sibling::node()[1]
[self::text() and starts-with(.,',')]
)
]">
<xsl:variable name="vBreakText" select=
"following-sibling::text()[not(starts-with(.,','))][1]"/>
<xsl:variable name="vPrecedingTheBreak" select=
"$vBreakText/preceding-sibling::node()"/>
<xsl:variable name="vFollowing" select=
".|following-sibling::node()"/>
<xsl:variable name="vGroup" select=
"$vFollowing[count(.|$vPrecedingTheBreak)
=
count($vPrecedingTheBreak)
]
"/>
<sup>
<xsl:apply-templates select="$vGroup" mode="group"/>
</sup>
<xsl:apply-templates select="$vBreakText"/>
</xsl:template>
<xsl:template match="text()" mode="group">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
</xsl:stylesheet>
when applied on the following XML document (based on the provided one, but made more complex and interesting):
<p>Some text here
<xref id="1">1</xref>,
<xref id="2">2</xref>,
<xref id="3">3</xref>.
<ttt/>
<xref id="4">4</xref>,
<xref id="5">5</xref>,
<xref id="6">6</xref>.
<zzz/>
</p>
produces exactly the wanted, correct result:
<p>Some text here
<sup>1,2,3</sup>.
<ttt/>
<sup>4,5,6</sup>.
<zzz/>
</p>
Explanation:
We use a "fined-grained" identity rule, which processes the document node-by node in document order and copies the matched node "as-is"
We override the identity rule with a template that matches any xref element that is the first in a group of xref elements, each of which (but the last one) is followed by an immediate text-node-sibling that starts with the ',' character. Here we find the first text-node-sibling that breaks the rule (its starting character isn't ','.
Then we find all the nodes in the group, using the Kayessian (after #Michael Kay) formula for the intersection of two nodesets. This formula is: $ns1[count(.|$ns2) = count($ns2)]
Then we process all nodes in the group in a mode named "group".
Finally, we apply templates (in anonymous mode) to the breaking text node (that is the first node following the group), so that the chain of processing continues.
Interesting question. +1.
Here's an XSLT 2.0 solution:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:variable name="comma-regex">^\s*,\s*$</xsl:variable>
<!-- Identity transform -->
<xsl:template match="#* | node()">
<xsl:copy>
<xsl:apply-templates select="#* | node()"/>
</xsl:copy>
</xsl:template>
<!-- Don't directly process xrefs that are second or later in a comma-separated series.
Note that this template has a higher default priority than the following one,
because of the predicate. -->
<xsl:template match="xref[preceding-sibling::node()[1]/
self::text()[matches(., $comma-regex)]/
preceding-sibling::*[1]/self::xref]" />
<!-- Don't directly process comma text nodes that are in the middle of a series. -->
<xsl:template match="text()[matches(., $comma-regex) and
preceding-sibling::*[1]/self::xref and following-sibling::*[1]/self::xref]" />
<!-- for xrefs that first (or solitary) in a comma-separated series: -->
<xsl:template match="xref">
<sup>
<xsl:call-template name="process-xref-series">
<xsl:with-param name="next" select="." />
</xsl:call-template>
</sup>
</xsl:template>
<xsl:template name="process-xref-series">
<xsl:param name="next"/>
<xsl:if test="$next">
<xsl:value-of select="$next"/>
<xsl:variable name="followingXref"
select="$next/following-sibling::node()[1]/
self::text()[matches(., $comma-regex)]/
following-sibling::*[1]/self::xref"/>
<xsl:if test="$followingXref">
<xsl:text>,</xsl:text>
<xsl:call-template name="process-xref-series">
<xsl:with-param name="next" select="$followingXref"/>
</xsl:call-template>
</xsl:if>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
(This could be simplified if we could make some assumptions about the input.)
Run against the sample input you gave, the result is:
<p>Some text here
<sup>1,2,3</sup>.
</p>
The second alternative can be achieved with
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="#* | node()">
<xsl:copy>
<xsl:apply-templates select="#* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p/text()[normalize-space() = ',' and preceding-sibling::node()[1][self::xref]]">
<sup>,</sup>
</xsl:template>
<xsl:template match="xref">
<sup>
<xsl:apply-templates/>
</sup>
</xsl:template>
</xsl:stylesheet>
There's an almost trivial solution to your 'messy alternative':
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="xref">
<sup>
<xsl:apply-templates />
</sup>
</xsl:template>
<xsl:template match="text()[normalize-space(.)=',']">
<sup>,</sup>
</xsl:template>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*| node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
EDIT: I just noticed it's almost a clone of Martin's solution, except without the additional check of a preceding xref element on the commas. His is probably safer :)
And a slightly less trivial solution to your preferred result, although this only works if you only have one collection of xref tags in any p tag. You didn't mention the possibility of more than one collection, and even if there are, I would have thought it unlikely they'd be within the same containing p tag. If that can happen though, it's possible to extend it further to allow for that, although it will get a lot more complicated.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="xref[not(preceding-sibling::text()[normalize-space(.)=','])]">
<sup>
<xsl:value-of select="." />
<xsl:for-each select="following-sibling::text() | following-sibling::xref">
<xsl:if test="following-sibling::text()[substring(.,1,1)='.']">
<xsl:value-of select="normalize-space(.)" />
</xsl:if>
</xsl:for-each>
</sup>
</xsl:template>
<xsl:template match="xref | text()[normalize-space(.)=',']" />
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*| node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
In case you can use XSLT 2.0 (e.g. with Saxon 9 or AltovaXML or XQSharp) then here is an XSLT 2.0 solution that should produce the first output you asked for:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="#* | node()">
<xsl:copy>
<xsl:apply-templates select="#* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p">
<xsl:for-each-group select="node()" group-adjacent="self::xref or self::text()[normalize-space() = ',']">
<xsl:choose>
<xsl:when test="current-grouping-key()">
<sup>
<xsl:value-of select="current-group()/normalize-space()" separator=""/>
</sup>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="current-group()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>