XSLT 2.0 Splitting current-group() by first occurrence of an element - xslt

Using XSLT 2.0, suppose you have current-group() = { A, X, B, B, X } where A, B, and X are elements. What is an efficient and legible way to split it on the first occurrence of B to get two sequences S1 and S2 such that S1 = { A, X } and S2 = { B, B, X }? Is it possible to accomplish this using an xsl:for-each-group construct?
EDIT: The elements of the current-group() are not guaranteed to be siblings but are guaranteed to be in document order.
First attempt: Using xsl:for-each-group with group-starting-with
<xsl:for-each-group select="current-group()" group-starting-with="B[1]">
<xsl:choose>
<xsl:when test="position() = 1">
<!-- S1 := current-group() -->
</xsl:when>
<xsl:otherwise>
<!-- S2 := current-group() -->
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
This works provided there is no preceding sibling B to the first B of the current-group().
I would have thought the position predicate [1] would be scoped to the select clause since current-group()[self::B][1] returns the correct B. I'm curious to know why it doesn't scope this way.
XML
<root>
<A>A1</A>
<B>B1-1</B>
<B>B1-2</B>
<A>A2</A>
<B>B2-1</B>
<B>B2-2</B>
</root>
XSLT
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="*" group-starting-with="A">
<xsl:for-each-group select="current-group()" group-starting-with="B[1]">
<xsl:choose>
<xsl:when test="position() = 1">
<S1><xsl:copy-of select="current-group()" /></S1>
</xsl:when>
<xsl:otherwise>
<S2><xsl:copy-of select="current-group()" /></S2>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
Result
<root>
<S1>
<A>A1</A>
</S1>
<S2>
<B>B1-1</B>
<B>B1-2</B>
</S2>
<S1>
<A>A2</A>
<B>B2-1</B>
<B>B2-2</B>
</S1>
</root>
As you can see the first group is correctly split, but the second group is not. This will work, however, if you wrap the current-group() in a parent and then pass that to the select clause, but that seems inefficient.

The functx library defines a function functx:index-of-node (http://www.xsltfunctions.com/xsl/functx_index-of-node.html):
<xsl:function name="functx:index-of-node" as="xs:integer*"
xmlns:functx="http://www.functx.com">
<xsl:param name="nodes" as="node()*"/>
<xsl:param name="nodeToFind" as="node()"/>
<xsl:sequence select="
for $seq in (1 to count($nodes))
return $seq[$nodes[$seq] is $nodeToFind]
"/>
</xsl:function>
That would reduce your second approach to
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="*" group-starting-with="A">
<xsl:variable name="pos" select="functx:index-of-node(current-group(), (current-group()[self::B])[1])"/>
<S1>
<xsl:copy-of select="current-group()[position() lt $pos]"/>
</S1>
<S2>
<xsl:copy-of select="current-group()[position() ge $pos]"/>
</S2>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
In the "new" "XSLT 4" world of Saxon 10 PE or EE with the extension functions saxon:items-before and saxon:items-from and syntax extension for anonymous functions you could write it as
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:saxon="http://saxon.sf.net/"
exclude-result-prefixes="#all" version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output indent="yes"/>
<xsl:template match="root">
<xsl:copy>
<xsl:for-each-group select="*" group-starting-with="A">
<S1>
<xsl:apply-templates
select="saxon:items-before(current-group(), .{ . instance of element(B) })"/>
</S1>
<S2>
<xsl:apply-templates
select="saxon:items-from(current-group(), .{ . instance of element(B) })"/>
</S2>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
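On the scoping question: group-starting-with takes a pattern rather than an expression, so B[1] matches any B that is the first B child of its parent, regardless of which items the select attribute supplies. A minimal sketch of a workaround that stays with xsl:for-each-group in XSLT 2.0, assuming the first B of the group is bound to a local variable (the name first-b is made up for illustration) and compared by node identity inside the pattern:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:template match="root">
    <xsl:copy>
      <xsl:for-each-group select="*" group-starting-with="A">
        <!-- Hypothetical helper variable: the first B of this group, or empty if there is none -->
        <xsl:variable name="first-b" select="(current-group()[self::B])[1]"/>
        <!-- "is" compares node identity, so preceding sibling Bs no longer matter -->
        <xsl:for-each-group select="current-group()"
            group-starting-with="B[. is $first-b]">
          <xsl:choose>
            <!-- The subgroup that starts at the first B is S2 -->
            <xsl:when test="current-group()[1] is $first-b">
              <S2><xsl:copy-of select="current-group()"/></S2>
            </xsl:when>
            <!-- Everything before the first B (or a group without any B) is S1 -->
            <xsl:otherwise>
              <S1><xsl:copy-of select="current-group()"/></S1>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input from the question, the second group should now come out as an S1 holding A2 and an S2 holding B2-1 and B2-2. Testing the first item of the subgroup rather than position() also covers a group that begins with B.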

Related

Predicates vs recursive templates vs other

Consider this simple problem:
We wish to map this input to the same output, except that on the first occurrence of a 'foo' element with "@bar = '1'" we add a new attribute @wibble. So this:
<root>
<foo/>
<foo/>
<foo/>
<foo bar="1"/>
<foo bar="1"/>
<foo/>
<foo/>
<foo/>
<foo/>
<foo/>
</root>
goes to this:
<root>
<foo />
<foo />
<foo />
<foo wibble="2" bar="1" />
<foo bar="1" />
<foo />
<foo />
<foo />
<foo />
<foo />
</root>
I could implement this mapping using the identity pattern (not sure what this pattern is called), but it would go like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates select="root" mode="findFirst"/>
</xsl:template>
<xsl:template match="@* | node()" mode="findFirst">
<xsl:copy>
<xsl:apply-templates select="@* | node()" mode="findFirst"/>
</xsl:copy>
</xsl:template>
<xsl:template match="foo[@bar='1'][1]" mode="findFirst">
<xsl:copy>
<xsl:attribute name="wibble">2</xsl:attribute>
<xsl:apply-templates select="@* | node()" mode="findFirst"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
i.e. we override the identity template with some match statement which matches the specific scenario we want to match, implement our overriding mapping, and then continue.
I use this style a lot.
Sometimes, though, the match pattern is complex (we saw this in another question recently about mapping lines of code). I find these sorts of matches problematic. In the above scenario the use case is simple, but sometimes the logic isn't easily (or at all) expressible inside the match pattern, in which case I'm tempted to fall back on recursive functional patterns. In this case I'd write a recursive template like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<root>
<xsl:apply-templates select="root/foo[1]" mode="findFirst">
<xsl:with-param name="isFound" select="false()"/>
</xsl:apply-templates>
</root>
</xsl:template>
<xsl:template match="foo" mode="findFirst">
<xsl:param name="isFound"/>
<xsl:copy>
<xsl:if test="$isFound = false() and @bar = '1'">
<xsl:attribute name="wibble">2</xsl:attribute>
</xsl:if>
<xsl:apply-templates select="@* | node()" mode="identity"/>
</xsl:copy>
<xsl:choose>
<xsl:when test="$isFound = false() and @bar = '1'">
<xsl:apply-templates select="following-sibling::foo[1]" mode="findFirst">
<xsl:with-param name="isFound" select="true()"/>
</xsl:apply-templates>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="following-sibling::foo[1]" mode="findFirst">
<xsl:with-param name="isFound" select="$isFound"/>
</xsl:apply-templates>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="@* | node()" mode="identity">
<xsl:copy>
<xsl:apply-templates select="@* | node()" mode="identity"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This basically treats the node-set as a functional 'list', taking the head (and passing the tail implicitly).
Now we can implement much more complex logic and use parameters to pass the current state of what is effectively a fold through the recursion, but at the cost of extra complexity.
BUT....
Is this style of programming sustainable in XSLT? - I always worry about stack overflow (ironically!), due to probable non tail recursion in the XSLT engine of the recursive template.
My knowledge of XSLT 3.0 is extremely limited (any references to good learning resources always appreciated), but in a FP language the alternative to direct recursion would be to use fold, where fold is written as a tail recursive function, and fold IS available in XSLT 3.0, but is this a sensible alternative?
Are there other patterns of usage that I can use?
XSLT 3.0 has xsl:iterate (https://www.w3.org/TR/xslt-30/#iterate), which lets you implement your sibling recursion in a declarative way that looks a bit like a loop and, due to its structure and implementation, avoids stack overflow from deep recursion. Example:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:template match="/*">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:iterate select="node()">
<xsl:param name="found" select="false()"/>
<!-- true only for the first foo with bar = 1; text nodes and later foo elements yield false -->
<xsl:variable name="is-first-foo" select="not($found) and exists(self::foo[@bar = 1])"/>
<xsl:choose>
<xsl:when test="$is-first-foo">
<xsl:copy>
<xsl:attribute name="wibble" select="2"/>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="."/>
</xsl:otherwise>
</xsl:choose>
<xsl:next-iteration>
<!-- once set, the flag stays set, so a third matching foo is not marked again -->
<xsl:with-param name="found" select="$found or $is-first-foo"/>
</xsl:next-iteration>
</xsl:iterate>
</xsl:copy>
</xsl:template>
<xsl:mode on-no-match="shallow-copy"/>
</xsl:stylesheet>
fold-left is certainly also available at the XPath 3.1 level; integrating it with the XML syntax of XSLT (3.0) is a bit more convoluted than in XQuery 3.1, where basically everything is an expression. But it is certainly an option; example:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
xmlns:mf="http://example.com/mf"
expand-text="yes">
<xsl:function name="mf:add-attribute" as="element()">
<xsl:param name="element" as="element()"/>
<xsl:copy select="$element">
<xsl:attribute name="wibble" select="2"/>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:function>
<xsl:template match="/*">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:sequence
select="fold-left(
node(),
map { 'found-foos' : 0, 'nodes' : () },
function($a, $n) {
let $is-foo := $n instance of element(foo) and boolean($n/self::foo[@bar = 1]),
$is-first-foo := $a?found-foos = 0 and $is-foo
return
map {
'found-foos' : if ($is-foo) then $a?found-foos + 1 else $a?found-foos,
'nodes': ($a?nodes, if ($is-first-foo) then mf:add-attribute($n) else $n)
}
}
)?nodes"/>
</xsl:copy>
</xsl:template>
<xsl:mode on-no-match="shallow-copy"/>
</xsl:stylesheet>
And for your sample an accumulator might allow you to check your conditions in a declarative way and then use its value in your match pattern to check whether you need to add your attribute. Online sample of accumulator use:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:param name="pattern" static="yes" as="xs:string" select="'foo[@bar = 1][1]'"/>
<xsl:accumulator name="have-first-foo-bar" as="xs:boolean" initial-value="false()">
<xsl:accumulator-rule _match="{$pattern}" select="true()"/>
<xsl:accumulator-rule phase="end" _match="{$pattern}" select="false()"/>
</xsl:accumulator>
<xsl:template match="foo[accumulator-before('have-first-foo-bar')]">
<xsl:copy>
<xsl:attribute name="wibble" select="2"/>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:mode on-no-match="shallow-copy" use-accumulators="#all"/>
</xsl:stylesheet>
A pattern I sometimes use for this is a global variable combined with a template rule:
<xsl:variable name="special-nodes" select="//foo[@bar='1'][1]"/>
<xsl:template match="$special-nodes">...</xsl:template>
It only works, of course, in a "single document" scenario where the global variable applies to the same document that you're processing with the template rule.
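A minimal complete sketch of this pattern for the foo/@wibble sample, assuming an XSLT 3.0 processor (a variable reference used as the whole match pattern is not available in XSLT 2.0) and the shallow-copy mode declaration used in the other answers here:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Everything that is not special is copied unchanged -->
  <xsl:mode on-no-match="shallow-copy"/>
  <!-- Evaluated once against the document that is the global context item -->
  <xsl:variable name="special-nodes" select="//foo[@bar='1'][1]"/>
  <!-- Matches exactly the nodes held in the global variable -->
  <xsl:template match="$special-nodes">
    <xsl:copy>
      <xsl:attribute name="wibble" select="2"/>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because any XPath expression can compute the variable, this keeps complex selection logic out of the match pattern itself.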

XSLT 2.0: Merge sibling nodes if separated by punctuation only

Given this (simplified) XML:
<p>
<hi rend="italic">Some text</hi>, <hi rend="italic">and some more</hi>: <hi rend="italic"
>followed by some more.</hi>
<hi rend="bold">This text is fully in bold.</hi> Here we have plain text, which should't be
touched. <hi rend="bold">Here we go with bold</hi>, <hi rend="bold">yet again.</hi>
</p>
I would like to merge all the nodes that have the same name and same attribute together with all the text nodes between them but only if the normalize-space() of the text nodes can be reduced to punctuation signs.
In other words, if two or more hi[@rend='italic'] or hi[@rend='bold'] nodes are separated by text nodes containing only punctuation and spaces, they should be merged.
If, on the other hand, the text node between two hi[@rend='italic'] or two hi[@rend='bold'] nodes is not reducible to punctuation, it shouldn't be touched.
I would like to learn how to do this without hard-coding element hi and attribute @rend, i.e. I would like the stylesheet to merge any identical element/attribute combos separated by punctuation text nodes.
The punctuation characters should be matched by the regex \p{P}.
The output should look like this:
<p>
<hi rend="italic">Some text, and some more: followed by some more.</hi>
<hi rend="bold">This text is fully in bold.</hi> Here we have plain text, which should't be
touched. <hi rend="bold">Here we go with bold, yet again.</hi>
</p>
Many thanks in advance.
I am not sure there is a one-step solution. One approach I could think of is a two-step transformation: the first step transforms the inter-element punctuation text nodes into elements, so that the second step can use group-adjacent. In the following I have used XSLT 3 and a composite grouping key composed of the element's node-name() and the sequence of attribute values, with the attributes sorted by node-name():
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="#all"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:mode name="text-to-el" on-no-match="shallow-copy"/>
<xsl:function name="mf:match" as="xs:boolean">
<xsl:param name="e1" as="element()"/>
<xsl:param name="e2" as="element()"/>
<xsl:sequence
select="deep-equal(($e1!(node-name(), mf:sort(@* except @mf:punctuation)!data())), ($e2!(node-name(), mf:sort(@* except @mf:punctuation)!data())))"/>
</xsl:function>
<xsl:function name="mf:sort" as="attribute()*">
<xsl:param name="attributes" as="attribute()*"/>
<xsl:perform-sort select="$attributes">
<xsl:sort select="node-name()"/>
</xsl:perform-sort>
</xsl:function>
<xsl:template match="text()[matches(normalize-space(.), '^\p{P}+$') and mf:match(preceding-sibling::node()[1], following-sibling::node()[1])]" mode="text-to-el">
<xsl:element name="{node-name(preceding-sibling::node()[1])}" namespace="{namespace-uri(preceding-sibling::node()[1])}">
<xsl:apply-templates select="preceding-sibling::node()[1]/@*" mode="#current"/>
<xsl:attribute name="mf:punctuation">true</xsl:attribute>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:variable name="punctuation-text-to-element">
<xsl:apply-templates mode="text-to-el"/>
</xsl:variable>
<xsl:template match="/">
<xsl:apply-templates select="$punctuation-text-to-element/node()"/>
</xsl:template>
<xsl:template match="*[*]">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:for-each-group select="node()" composite="yes" group-adjacent="if (. instance of element()) then (node-name(), mf:sort(@* except @mf:punctuation)!data()) else false()">
<xsl:choose>
<xsl:when test="current-grouping-key() instance of xs:boolean and not(current-grouping-key())">
<xsl:apply-templates select="current-group()"/>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates select="@*, current-group()/node()"/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/gWvjQf6
In XSLT 2 you don't have composite grouping keys but of course it is possible to string-join the sequence used in the XSLT 3 sample as a grouping key into some single string grouping key, you just have to make sure you use a separator character with string-join that doesn't occur in the element names and attribute values.
Instead of using xsl:mode the identity transformation would need to be spelled out and the use of ! has to be replaced with for .. return expressions or / steps where possible:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="#all"
version="2.0">
<xsl:param name="sep" as="xs:string">|</xsl:param>
<xsl:template match="@*|node()" mode="#all">
<xsl:copy>
<xsl:apply-templates select="@*|node()" mode="#current"/>
</xsl:copy>
</xsl:template>
<xsl:function name="mf:match" as="xs:boolean">
<xsl:param name="e1" as="element()"/>
<xsl:param name="e2" as="element()"/>
<xsl:sequence
select="deep-equal(($e1/(node-name(.), for $att in mf:sort(@* except @mf:punctuation) return data($att))), ($e2/(node-name(.), for $att in mf:sort(@* except @mf:punctuation) return data($att))))"/>
</xsl:function>
<xsl:function name="mf:sort" as="attribute()*">
<xsl:param name="attributes" as="attribute()*"/>
<xsl:perform-sort select="$attributes">
<xsl:sort select="node-name(.)"/>
</xsl:perform-sort>
</xsl:function>
<xsl:template match="text()[matches(normalize-space(.), '^\p{P}+$') and mf:match(preceding-sibling::node()[1], following-sibling::node()[1])]" mode="text-to-el">
<xsl:element name="{node-name(preceding-sibling::node()[1])}" namespace="{namespace-uri(preceding-sibling::node()[1])}">
<xsl:apply-templates select="preceding-sibling::node()[1]/@*" mode="#current"/>
<xsl:attribute name="mf:punctuation">true</xsl:attribute>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:variable name="punctuation-text-to-element">
<xsl:apply-templates mode="text-to-el"/>
</xsl:variable>
<xsl:template match="/">
<xsl:apply-templates select="$punctuation-text-to-element/node()"/>
</xsl:template>
<xsl:template match="*[*]">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:for-each-group select="node()" group-adjacent="if (. instance of element()) then string-join((string(node-name(.)), for $att in mf:sort(@* except @mf:punctuation) return data($att)), $sep) else false()">
<xsl:choose>
<xsl:when test="current-grouping-key() instance of xs:boolean and not(current-grouping-key())">
<xsl:apply-templates select="current-group()"/>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates select="@*, current-group()/node()"/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
http://xsltransform.net/asnmyS

XSLT – creating a network from all children elements

With XSLT 2.0, I am trying to create a list of relations between all children of given elements, in a document such as:
<doc>
<part1>
<name>John</name>
<name>Paul</name>
<name>George</name>
<name>Ringo</name>
<place>Liverpool</place>
</part1>
<part2>
<name>Romeo</name>
<name>Romeo</name>
<name>Juliet</name>
<fam>Montague</fam>
<fam>Capulet</fam>
</part2>
</doc>
The result I would like to obtain, ideally by conflating and weighing the identical relations, would be (in whatever order) something like:
<doc>
<part1>
<rel><name>John</name><name>Paul</name></rel>
<rel><name>John</name><name>George</name></rel>
<rel><name>John</name><name>Ringo</name></rel>
<rel><name>Paul</name><name>George</name></rel>
<rel><name>Paul</name><name>Ringo</name></rel>
<rel><name>George</name><name>Ringo</name></rel>
<rel><name>John</name><place>Liverpool</place></rel>
<rel><name>Paul</name><place>Liverpool</place></rel>
<rel><name>George</name><place>Liverpool</place></rel>
<rel><name>Ringo</name><place>Liverpool</place></rel>
</part1>
<part2>
<rel weight="2"><name>Romeo</name><name>Juliet</name></rel>
<rel weight="2"><name>Romeo</name><fam>Montague</fam></rel>
<rel weight="2"><name>Romeo</name><fam>Capulet</fam></rel>
<rel><name>Juliet</name><fam>Montague</fam></rel>
<rel><name>Juliet</name><fam>Capulet</fam></rel>
<rel><fam>Montague</fam><fam>Capulet</fam></rel>
</part2>
</doc>
—but I'm not sure how to proceed. Many thanks in advance for your help.
You still haven't explained the logic that needs to be applied here, so this is based largely on a guess:
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="doc/*">
<!-- first pass-->
<xsl:variable name="unique-items">
<xsl:for-each-group select="*" group-by="concat(name(), '|', .)">
<item name="{name()}" count="{count(current-group())}" value="{.}"/>
</xsl:for-each-group>
</xsl:variable>
<!-- output -->
<xsl:copy>
<xsl:for-each select="$unique-items/item">
<xsl:variable name="left" select="."/>
<xsl:for-each select="following-sibling::item">
<xsl:variable name="weight" select="$left/@count * @count" />
<rel>
<xsl:if test="$weight gt 1">
<xsl:attribute name="weight" select="$weight"/>
</xsl:if>
<xsl:apply-templates select="$left | ." />
</rel>
</xsl:for-each>
</xsl:for-each>
</xsl:copy>
</xsl:template>
<xsl:template match="item">
<xsl:element name="{@name}">
<xsl:value-of select="@value"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
The idea here is to remove duplicates in the first pass, then enumerate all combinations in the second (final) pass. The weight is computed by multiplying the number of occurrences of each member of a combination pair and shown only when it exceeds 1.
At least the combinatoric part of your problem could be solved with the following XSLT script. It does not solve the elimination of duplicates, but that could possibly be done in a second transformation.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- standard copy template -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
</xsl:copy>
</xsl:template>
<xsl:template match="doc/*">
<xsl:copy>
<xsl:variable name="l" select="./*"/>
<xsl:for-each select="$l">
<xsl:variable name="a" select="."/>
<xsl:variable name="posa" select="position()"/>
<xsl:variable name="namea" select="name()"/>
<xsl:for-each select="$l">
<xsl:if test="position() > $posa and (. != $a or name() != $namea)">
<rel>
<xsl:copy-of select="$a"/>
<xsl:copy-of select="."/>
</rel>
</xsl:if>
</xsl:for-each>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When applied to the first part of your example, this produces:
<part1>
<rel><name>John</name><name>Paul</name></rel>
<rel><name>John</name><name>George</name></rel>
<rel><name>John</name><name>Ringo</name></rel>
<rel><name>John</name><place>Liverpool</place></rel>
<rel><name>Paul</name><name>George</name></rel>
<rel><name>Paul</name><name>Ringo</name></rel>
<rel><name>Paul</name><place>Liverpool</place></rel>
<rel><name>George</name><name>Ringo</name></rel>
<rel><name>George</name><place>Liverpool</place></rel>
<rel><name>Ringo</name><place>Liverpool</place></rel>
</part1>
Which seems about correct. I have no idea if the duplicate elimination (or weighting, as you call it) could be done in the same transformation.

Create a recursive string function for adding a sequence

I have a challenging problem that I have not been able to solve so far.
Within my XSLT I have a variable which contains a string.
I need to insert the sequence [eol] into this string
at fixed positions, namely every 65 characters.
I thought of using a function or template to add this sequence recursively,
because the length of the string can vary.
<xsl:function name="funct:insert-eol" as="xs:string" >
<xsl:param name="originalString" as="xs:string?"/>
<xsl:variable name="length">
<xsl:value-of select="string-length($originalString)"/>
</xsl:variable>
<xsl:variable name="start" as="xs:integer">
<xsl:value-of select="1"/>
</xsl:variable>
<xsl:variable name="eol" as="xs:integer">
<xsl:value-of select="65"/>
</xsl:variable>
<xsl:variable name="newLines">
<xsl:value-of select="$length idiv number('65')"/>
</xsl:variable>
<xsl:for-each select="1 to $newLines">
<xsl:value-of select="substring($originalString, $start, $eol)" />
</xsl:for-each>
</xsl:function>
The more code I write, the more variables I need to introduce, which shows that I still don't fully understand this.
For example, if we want an [eol] every 5 characters:
aaaaaaabbbbbbccccccccc
aaaaa[eol]aabbb[eol]bbbcc[eol]ccccc[eol]cc
Hope someone has a starting point for me..
Regards Dirk
Rather straight-forward and short -- no recursion is necessary (and can even be specified as a single XPath expression):
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:param name="pLLength" select="5"/>
<xsl:template match="/*">
<xsl:variable name="vText" select="string()"/>
<xsl:for-each select="1 to string-length($vText) idiv $pLLength +1">
<xsl:value-of select="substring($vText, $pLLength*(position()-1)+1, $pLLength)"/>
<xsl:if test="position() ne last()">[eol]</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on this XML document:
<t>aaaaaaabbbbbbccccccccc</t>
the wanted, correct result is produced:
aaaaa[eol]aabbb[eol]bbbcc[eol]ccccc[eol]cc
When this XML document is processed:
<t>aaaaaaabbbbbbcccccccccddd</t>
again the wanted, correct result is produced:
aaaaa[eol]aabbb[eol]bbbcc[eol]ccccc[eol]ccddd[eol]
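The single-expression form mentioned above might look like the following sketch. The function name my:insert-eol and its namespace are made up for illustration. Unlike the xsl:for-each version, string-join only places [eol] between chunks, so an input whose length is an exact multiple of the chunk size gets no trailing [eol].

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">
  <xsl:output method="text"/>
  <!-- Hypothetical helper: cuts $s into chunks of $n characters joined by [eol] -->
  <xsl:function name="my:insert-eol" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:param name="n" as="xs:integer"/>
    <!-- $i is the zero-based chunk index; substring() positions are one-based -->
    <xsl:sequence select="string-join(
        for $i in 0 to (string-length($s) - 1) idiv $n
        return substring($s, $i * $n + 1, $n),
        '[eol]')"/>
  </xsl:function>
  <xsl:template match="/*">
    <xsl:value-of select="my:insert-eol(string(.), 5)"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the first sample document, this should give aaaaa[eol]aabbb[eol]bbbcc[eol]ccccc[eol]cc, the same as the transformation above.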
You can treat it as a grouping problem, using for-each-group:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="xs mf">
<xsl:function name="mf:insert-eol" as="xs:string">
<xsl:param name="input" as="xs:string"/>
<xsl:param name="chunk-size" as="xs:integer"/>
<xsl:value-of>
<xsl:for-each-group select="string-to-codepoints($input)" group-by="(position() - 1) idiv $chunk-size">
<xsl:if test="position() gt 1"><xsl:sequence select="'eol'"/></xsl:if>
<xsl:sequence select="codepoints-to-string(current-group())"/>
</xsl:for-each-group>
</xsl:value-of>
</xsl:function>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text">
<xsl:copy>
<xsl:sequence select="mf:insert-eol(., 5)"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
That stylesheet transforms
<root>
<text>aaaaaaabbbbbbccccccccc</text>
</root>
into
<root>
<text>aaaaaeolaabbbeolbbbcceolccccceolcc</text>
</root>
Try this one:
<?xml version='1.0' ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:param name="TextToChange" select="'aaaaaaabbbbbbccccccccc'"/>
<xsl:param name="RequiredLength" select="xs:integer(5)"/>
<xsl:template match="/">
<xsl:call-template name="AddText"/>
</xsl:template>
<xsl:template name="AddText">
<xsl:param name="Text" select="$TextToChange"/>
<xsl:param name="TextLength" select="string-length($TextToChange)"/>
<xsl:param name="start" select="xs:integer(1)"/>
<xsl:param name="end" select="$RequiredLength"/>
<xsl:choose>
<xsl:when test="$TextLength gt $RequiredLength">
<xsl:value-of select="substring($Text,$start,$end)"/>
<xsl:text>[eol]</xsl:text>
<xsl:call-template name="AddText">
<xsl:with-param name="Text" select="substring-after($Text, substring($Text,$start,$end))"/>
<xsl:with-param name="TextLength"
select="string-length(substring-after($Text, substring($Text,$start,$end)))"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$Text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>

xslt generate children based on split and parent node name

Is it possible to do the following in XSL? I'm trying to split the contents of an element and create sub-elements based on the split. To make things trickier, there is the occasional exception (i.e. node-4 doesn't get split). I'm wondering if there is a way I can do this without explicit splits hardcoded for each element. Again, not sure if this is possible. Thanks for the help!
original XML:
<document>
<node>
<node-1>hello world1</node-1>
<node-2>hello^world2</node-2>
<node-3>hello^world3</node-3>
<node-4>hello^world4</node-4>
</node>
</document>
transformed XML
<document>
<node>
<node-1>hello world1</node-1>
<node-2>
<node2-1>hello</node2-1>
<node2-2>world2</node2-2>
</node-2>
<node-3>
<node3-1>hello</node3-1>
<node3-2>world3</node3-2>
</node-3>
<node-4>hello^world4</node-4>
</node>
</document>
"To make things trickier there are the occasional exception (ie node-4 doesn't get split). I'm wondering if there is a way i can do this without explicit splits hardcoded for each element."
This stylesheet pattern-matches the text nodes that need tokenizing, which is more semantic:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()[contains(.,'^')]" name="tokenize">
<xsl:param name="pString" select="concat(.,'^')"/>
<xsl:param name="pCount" select="1"/>
<xsl:if test="$pString">
<xsl:element name="{translate(name(..),'-','')}-{$pCount}">
<xsl:value-of select="substring-before($pString,'^')"/>
</xsl:element>
<xsl:call-template name="tokenize">
<xsl:with-param name="pString"
select="substring-after($pString,'^')"/>
<xsl:with-param name="pCount" select="$pCount + 1"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
<xsl:template match="node-4/text()">
<xsl:value-of select="."/>
</xsl:template>
</xsl:stylesheet>
Output:
<document>
<node>
<node-1>hello world1</node-1>
<node-2>
<node2-1>hello</node2-1>
<node2-2>world2</node2-2>
</node-2>
<node-3>
<node3-1>hello</node3-1>
<node3-2>world3</node3-2>
</node-3>
<node-4>hello^world4</node-4>
</node>
</document>
Note: this is a classic tokenizer (in fact, it uses a normalized string, which allows empty items in the sequence), combined with pattern matching and overriding rules (which preserve the node-4 text node).
Here's an XSL 1.0 solution. I presume that the inconsistency in node-4 in your sample output was just a typo. Otherwise you'll have to define why node3 was split and node4 wasn't.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<document>
<node>
<xsl:apply-templates select="document/node/*"/>
</node>
</document>
</xsl:template>
<xsl:template match="*">
<xsl:variable name="tag" select="name()"/>
<xsl:choose>
<xsl:when test="contains(text(),'^')">
<xsl:element name="{$tag}">
<xsl:element name="{concat($tag,'-1')}">
<xsl:value-of select="substring-before(text(),'^')"/>
</xsl:element>
<xsl:element name="{concat($tag,'-2')}">
<xsl:value-of select="substring-after(text(),'^')"/>
</xsl:element>
</xsl:element>
</xsl:when>
<xsl:otherwise>
<xsl:copy-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
This works as long as all the nodes you want split are at the same level, under /document/node. If the real document structure is different you will have to tweak the solution to match.
Can you use XSLT 2.0? If so, it sounds like <xsl:analyze-string> is right up your alley. You can split based on a regexp.
If you need further details, ask...
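A minimal XSLT 2.0 sketch along those lines. Note that it uses the tokenize() function rather than xsl:analyze-string, since tokenize() returns only the pieces between separators and position() then numbers them directly. It assumes the children to split sit under node and that node-4 is the only exception, as in the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Split any child of node that contains ^, except node-4 -->
  <xsl:template match="node/*[not(self::node-4)][contains(., '^')]">
    <!-- Child names drop the hyphen of the parent name: node-2 becomes node2 -->
    <xsl:variable name="prefix" select="translate(name(), '-', '')"/>
    <xsl:copy>
      <!-- The caret is escaped because the second argument is a regular expression -->
      <xsl:for-each select="tokenize(., '\^')">
        <xsl:element name="{$prefix}-{position()}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Further exceptions can be added to the not(...) predicate without touching the splitting logic.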
Solution I used:
<xsl:output omit-xml-declaration="yes" method="xml" indent="yes"/>
<xsl:preserve-space elements="*"/>
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()[1]|@*"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:template>
<xsl:template match="node()" mode="copy">
<xsl:call-template name="identity"/>
</xsl:template>
<xsl:template match="node-2 | node-3" name="subFieldCarrotSplitter">
<xsl:variable name="tag" select="name()"/>
<xsl:element name="{$tag}">
<xsl:for-each select="str:split(text(),'^')">
<xsl:element name="{concat($tag,'-',position())}">
<xsl:value-of select="text()"/>
</xsl:element>
</xsl:for-each>
</xsl:element>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:template>