XSLT 2 - find first missing element in source list

I have a problem with XSLT and/or XPath. Let's say I have this XML input:
<context>
<pdpid-set>
<list>
<item>1</item>
<item>2</item>
<item>4</item>
<item>6</item>
<item>7</item>
<item>8</item>
</list>
</pdpid-set>
</context>
The task is: find the FIRST missing element in the array pdpid-set/list. In the example above the answer is 3.
I tried to use <xsl:for-each> to find the missing element, but there is no way to break out of such a loop, so my XSL produces more than one element in the output:
<xsl:variable name="list" select="context/pdpid-set/list"/>
<xsl:variable name="length" select="count(context/pdpid-set/list/item)"/>
<xsl:for-each select="1 to ($length)">
<xsl:variable name="position" select="position()"/>
<xsl:if test="$list/item[$position] > $position">
<missing-value>
<xsl:value-of select="$position"/>
</missing-value>
</xsl:if>
</xsl:for-each>
With the code above the output will be:
<missing-value>3</missing-value><missing-value>4</missing-value><missing-value>5</missing-value>...
I don't want more than one missing-value element. Any suggestions?

Even in XPath 1.0
/context
/pdpid-set
/list
/item[not(position()=.)][1]
Note that this selects the first item that is out of step with the ascending sequence (in the example, <item>4</item>). I still think position() is better than the following-sibling axis, both for performance and for code clarity. It also lets you easily change the starting number and the step, as in:
/context
/pdpid-set
/list
/item[not((position() - 1) * $step + $start = .)][1]
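
For anyone who wants to sanity-check the position-based idea outside an XSLT processor, here is a minimal Python sketch. The function name first_missing and its start/step parameters are my own illustration, not part of the answer, and it assumes the items are sorted ascending. It returns the expected value at the first out-of-step position, which is the missing number rather than the item itself.

```python
import xml.etree.ElementTree as ET

XML = """<context><pdpid-set><list>
<item>1</item><item>2</item><item>4</item>
<item>6</item><item>7</item><item>8</item>
</list></pdpid-set></context>"""

def first_missing(xml_text, start=1, step=1):
    """Return the first expected value absent from an ascending item list.

    Same test as item[not((position() - 1) * $step + $start = .)][1]:
    the first item whose value differs from the expected value marks the gap.
    Returns None when every item is in step.
    """
    root = ET.fromstring(xml_text)  # root is <context>
    items = root.findall("./pdpid-set/list/item")
    for index, item in enumerate(items):
        expected = start + index * step
        if int(item.text) != expected:
            return expected
    return None

print(first_missing(XML))  # 3
```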

Task is: find FIRST missing element in array pdpid-set/list. In
example above answer is 3
Here is a correct XPath 1.0 expression that evaluates to the wanted result (3):
/*/*/*/item[not(. +1 = following-sibling::*[1])][1] + 1
The XPath expression in the currently selected answer, on the other hand, selects this element:
<item>4</item>
And the complete correct XSLT 1.0 transformation is:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<missing-value>
<xsl:copy-of select="/*/*/*/item[not(. +1 = following-sibling::*[1])][1] + 1"/>
</missing-value>
</xsl:template>
</xsl:stylesheet>
When applied on the provided XML document, the wanted, correct result is produced:
<missing-value>3</missing-value>
Finally, if the task is to find all missing elements:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match=
"item[following-sibling::* and not(number(.) +1 = following-sibling::*[1]/number())]">
<xsl:for-each select="xs:integer(.) + 1 to following-sibling::*[1]/xs:integer(.) -1">
<missing-value><xsl:copy-of select="."/></missing-value>
</xsl:for-each>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
when this XSLT 2.0 transformation is applied on the following XML document (missing 3, 5, and 6):
<context>
<pdpid-set>
<list>
<item>1</item>
<item>2</item>
<item>4</item>
<item>7</item>
<item>8</item>
</list>
</pdpid-set>
</context>
the wanted, correct result is produced:
<missing-value>3</missing-value>
<missing-value>5</missing-value>
<missing-value>6</missing-value>
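
The "all missing values" logic can be cross-checked with a short Python sketch. This is only an illustration under the same assumption as the stylesheet (items in ascending order); the name all_missing is hypothetical. For each pair of neighbouring items it emits the integers strictly between them, which is what the xs:integer(.) + 1 to ... - 1 range does.

```python
import xml.etree.ElementTree as ET

XML = """<context><pdpid-set><list>
<item>1</item><item>2</item><item>4</item><item>7</item><item>8</item>
</list></pdpid-set></context>"""

def all_missing(xml_text):
    """List every integer skipped between consecutive item values."""
    values = [int(item.text) for item in ET.fromstring(xml_text).iter("item")]
    missing = []
    # Pair each item with its following sibling, like following-sibling::*[1].
    for current, following in zip(values, values[1:]):
        missing.extend(range(current + 1, following))
    return missing

print(all_missing(XML))  # [3, 5, 6]
```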

Related

XSLT 3.0 Streaming with Grouping and Sum/Accumulator

I'm trying to figure out how to use XSLT Streaming (to reduce memory usage) in a scenario that requires grouping (with an arbitrary number of groups) and summing the group. So far I haven't been able to find any examples. Here's an example XML
<?xml version='1.0' encoding='UTF-8'?>
<Data>
<Entry>
<Genre>Fantasy</Genre>
<Condition>New</Condition>
<Format>Hardback</Format>
<Title>Birds</Title>
<Count>3</Count>
</Entry>
<Entry>
<Genre>Fantasy</Genre>
<Condition>New</Condition>
<Format>Hardback</Format>
<Title>Cats</Title>
<Count>2</Count>
</Entry>
<Entry>
<Genre>Non-Fiction</Genre>
<Condition>New</Condition>
<Format>Paperback</Format>
<Title>Dogs</Title>
<Count>4</Count>
</Entry>
</Data>
In XSLT 2.0 I would use this to group by Genre, Condition and Format and Sum the counts.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="yes" />
<xsl:template match="/">
<xsl:call-template name="body"/>
</xsl:template>
<xsl:template name="body">
<xsl:for-each-group select="Data/Entry" group-by="concat(Genre,Condition,Format)">
<xsl:value-of select="Genre"/>
<xsl:value-of select="Condition"/>
<xsl:value-of select="Format"/>
<xsl:value-of select="sum(current-group()/Count)"/>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
For output I would get two lines, a sum of 5 for Fantasy, New, Hardback and a sum of 4 for Non-Fiction, New, Paperback.
Obviously this won't work with streaming because sum() accesses the whole group. I think I need to iterate through the document twice. The first time I could build a map of the groups (creating a new group if one doesn't exist yet). The problem is that the second time I would also need an accumulator for each group, with a rule that matches the group, and it doesn't seem you can create accumulators dynamically.
Is there a way to create accumulators on the fly? Is there another/easier way to do this with Streaming?
To be able to use streamed grouping with XSLT 3.0 one option that I see is to first transform the element based data you have into attribute based data using a stylesheet like
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
exclude-result-prefixes="xs math"
version="3.0">
<xsl:mode streamable="yes" on-no-match="shallow-copy"/>
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Entry/*">
<xsl:attribute name="{name()}" namespace="{namespace-uri()}" select="."/>
</xsl:template>
</xsl:stylesheet>
then you can use streamed grouping (as far as a streamed group-by is possible at all; as far as I understand, some buffering will be necessary) as follows:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
exclude-result-prefixes="xs math"
version="3.0">
<xsl:mode streamable="yes"/>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:fork>
<xsl:for-each-group select="Data/Entry" composite="yes" group-by="@Genre, @Condition, @Format">
<xsl:value-of select="current-grouping-key(), sum(current-group()/@Count)"/>
<xsl:text>
</xsl:text>
</xsl:for-each-group>
</xsl:fork>
</xsl:template>
</xsl:stylesheet>
I don't know whether first creating an attribute centric document is an option but I think it is better to share suggestions with code in an answer instead of trying to put them into a comment. And the answer in XSLT Streaming Chained Transform shows how to use Saxon 9 with Java or Scala to chain two streaming transformations without the need to write a temporary output file for the first transformation step.
As for doing it with copy-of on the original input format, Saxon 9.7 EE assesses the following as streamable and executes it with the right result:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="xs math"
version="3.0">
<xsl:mode streamable="yes"/>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each-group select="copy-of(Data/Entry)" composite="yes"
group-by="Genre, Condition, Format">
<xsl:value-of select="current-grouping-key(), sum(current-group()/Count)"/>
<xsl:text>
</xsl:text>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
I am not sure, however, that it consumes less memory than normal, tree-based grouping. Perhaps you can measure with your real input data.
As a third alternative, to use a map as you seemed to want to do, here is an xsl:iterate example that iterates through the Entry elements, collecting the accumulated Count value in a map:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
xmlns:map="http://www.w3.org/2005/xpath-functions/map" exclude-result-prefixes="xs math map"
version="3.0">
<xsl:mode streamable="yes"/>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:iterate select="Data/Entry">
<xsl:param name="groups" as="map(xs:string, xs:integer)" select="map{}"/>
<xsl:on-completion>
<xsl:value-of select="map:keys($groups)!(. || ' ' || $groups(.))" separator="&#10;"/>
</xsl:on-completion>
<xsl:variable name="current-entry" select="copy-of()"/>
<xsl:variable name="key"
select="string-join($current-entry/(Genre, Condition, Format), '|')"/>
<xsl:next-iteration>
<xsl:with-param name="groups"
select="
if (map:contains($groups, $key)) then
map:put($groups, $key, map:get($groups, $key) + xs:integer($current-entry/Count))
else
map:put($groups, $key, xs:integer($current-entry/Count))"
/>
</xsl:next-iteration>
</xsl:iterate>
</xsl:template>
</xsl:stylesheet>
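
To illustrate the single-pass, map-based idea independently of any XSLT processor, here is a rough Python sketch using the standard library's incremental parser. It is not a statement about how Saxon streams; the function name sum_counts and the '|' key separator are just choices for this example. Each Entry is folded into a dictionary as soon as its end tag is seen, and its content is then discarded.

```python
import io
import xml.etree.ElementTree as ET

DATA = b"""<Data>
<Entry><Genre>Fantasy</Genre><Condition>New</Condition><Format>Hardback</Format><Title>Birds</Title><Count>3</Count></Entry>
<Entry><Genre>Fantasy</Genre><Condition>New</Condition><Format>Hardback</Format><Title>Cats</Title><Count>2</Count></Entry>
<Entry><Genre>Non-Fiction</Genre><Condition>New</Condition><Format>Paperback</Format><Title>Dogs</Title><Count>4</Count></Entry>
</Data>"""

def sum_counts(stream):
    """Sum Count per (Genre, Condition, Format) in one pass over the input."""
    totals = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "Entry":
            key = "|".join(elem.findtext(name) for name in ("Genre", "Condition", "Format"))
            totals[key] = totals.get(key, 0) + int(elem.findtext("Count"))
            elem.clear()  # release the children and text of the processed Entry
    return totals

print(sum_counts(io.BytesIO(DATA)))
# {'Fantasy|New|Hardback': 5, 'Non-Fiction|New|Paperback': 4}
```

Memory use then grows with the number of distinct groups rather than with the number of entries, which is the same trade-off the xsl:iterate version makes.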

How do I iterate over an xs:list?

Consider the schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="TheList">
<xs:simpleType>
<xs:list itemType="xs:string" />
</xs:simpleType>
</xs:element>
</xs:schema>
And the xml:
<TheList>
This list has 5 values.
</TheList>
How can I iterate over each of the words in the list? To create something like:
<item>This</item>
<item>list</item>
<item>has</item>
<item>5</item>
<item>values.</item>
Based on the answers I've found here and here, I should do something like:
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="TheList">
<xsl:for-each select="tokenize(., ' ')">
<item><xsl:value-of select="." /></item>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
However, at least in Altova's XMLSpy, I am getting this error:
Wrong occurrence to match required sequence type: The supplied sequence ('5' item(s)) has the wrong occurrence to match the sequence type xs:string ('zero or one')
Using the built in debugger, I have been able to determine that the error is thrown when calling tokenize on an element that has been declared as an xs:list. Which makes sense, since the element should already be split according to the rules regarding xs:list. To me, this suggests:
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="TheList">
<xsl:for-each select=".">
<item><xsl:value-of select="." /></item>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
However, this treats the list as a single item and does not create a new item element for each word.
The for-each command seems to treat the xs:list as a single element while the tokenize function seems to treat the same xs:list as multiple elements. What am I missing?
If you're using a schema-aware transformation, then you don't need to tokenize the value yourself - the process of atomization does it for you automatically.
<xsl:template match="TheList">
<xsl:for-each select="data(.)">
<item><xsl:value-of select="." /></item>
</xsl:for-each>
</xsl:template>
If you want the code to work in both schema-aware and non-schema-aware environments you can write
<xsl:template match="TheList">
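<!-- string(.) avoids atomizing into a sequence; normalize-space() trims the surrounding whitespace and line breaks -->
<xsl:for-each select="tokenize(normalize-space(string(.)), ' ')">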
<item><xsl:value-of select="." /></item>
</xsl:for-each>
</xsl:template>
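
As a quick illustration of what the list value looks like once it is split, here is a small Python sketch (the function name list_items is made up for this example). The lexical form of an xs:list is a whitespace-separated sequence of tokens, so splitting on runs of whitespace gives the same items a schema-aware processor would get from atomization, at least for a list of xs:string.

```python
import xml.etree.ElementTree as ET

XML = """<TheList>
This list has 5 values.
</TheList>"""

def list_items(xml_text):
    """Split the element's text on any run of whitespace, ignoring the ends."""
    text = ET.fromstring(xml_text).text or ""
    return text.split()

for token in list_items(XML):
    print(f"<item>{token}</item>")
```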

XSL condition to check if node exists

I want to check whether my XML contains a node whose type attribute contains the string type_attachment_.
Is this a correct way to check it?
<xsl:if test="count(*[contains(@type, 'type_attachment_')]) > 0">
something
</xsl:if>
I don't know how deeply nested this node can be. It can be, for example, as simple as this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl"?>
<hello-world>
<greeter>
<dsdsds>An XSLT Programmer
<greeting type = 'type_attachment_'>Hello, World!
</greeting>
</dsdsds>
</greeter>
</hello-world>
but the node can also be nested inside various other elements.
An expression that selects at least one node is truthy; an expression that selects no nodes is falsy.
Therefore, you don't need to count the nodes returned. Simply test whether anything matches. Note that * only looks at child elements of the context node; to find the node at any depth, use .//* instead.
<xsl:if test="*[contains(@type, 'type_attachment_')]">
something
</xsl:if>
Here is an example:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- version 2.0: XSLT 1.0 cannot navigate into $filt without a node-set() extension -->
<xsl:output indent="yes"/>
<xsl:param name="filt">
<filters>
<ritem type="type_attachment_" relateditemnumber="8901037"/>
<ritem relateditemnumber="8901038"/>
<ritem type="type_attachment_" relateditemnumber="8901039"/>
<ritem relateditemnumber="8901040"/>
</filters>
</xsl:param>
<xsl:template match="/">
<xsl:for-each select="$filt/filters/ritem[@type='type_attachment_']">
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
OUTPUT:
<ritem type="type_attachment_" relateditemnumber="8901037"/>
<ritem type="type_attachment_" relateditemnumber="8901039"/>
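
Since the question says the node may be nested at any depth, here is a small Python sketch of the "does any descendant match" test, roughly what .//*[contains(@type, 'type_attachment_')] expresses. The function name has_attachment_node is hypothetical, and the sample is a trimmed version of the XML from the question.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<hello-world><greeter><dsdsds>An XSLT Programmer
<greeting type='type_attachment_'>Hello, World!</greeting>
</dsdsds></greeter></hello-world>"""

def has_attachment_node(xml_text, needle="type_attachment_"):
    """True if any element, at any depth, has a type attribute containing needle."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root and all of its descendants in document order.
    return any(needle in elem.get("type", "") for elem in root.iter())

print(has_attachment_node(SAMPLE))  # True
```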

Transform a non-numbered list to a numbered list with XSLT

I am trying to transform
<Address>
<Line>Some street1</Line>
<Line>Some street2</Line>
<Line>Some street3</Line>
...
</Address>
into
<Address1>Some street1</Address1>
<Address2>Some street2</Address2>
<Address3>Some street3</Address3>
<Address4></Address4>
<Address5></Address5>
The first XML is malleable and can be redefined if necessary; however, the second XML is part of a legacy system which cannot be changed.
Most of what I find points me, correctly, to using attributes, but unfortunately it's the element name itself that I wish to change.
Would anyone be able to assist or, if not, point me in the right direction?
As easy as this, and probably the shortest solution:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="Line">
<xsl:element name="Address{position()}"><xsl:apply-templates/></xsl:element>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<Address>
<Line>Some street1</Line>
<Line>Some street2</Line>
<Line>Some street3</Line>
</Address>
the wanted, correct result is produced:
<Address1>Some street1</Address1>
<Address2>Some street2</Address2>
<Address3>Some street3</Address3>
Explanation:
Proper use of xsl:element and AVTs (Attribute Value Templates).
Have a look at the <xsl:element> element. In its name attribute, you can also supply an expression that is computed while running the XSLT:
<xsl:template match="Line">
<xsl:element name="{concat('Address', position())}"><xsl:value-of select="text()"/></xsl:element>
</xsl:template>
Update: position() is one-based.
It can be done by building the new element name from the current position():
<xsl:template match="/Address">
<Addresses>
<xsl:for-each select="Line">
<xsl:variable name="elename" select="concat('Address', string(position()))"></xsl:variable>
<xsl:element name="{$elename}">
<xsl:value-of select="text()"/>
</xsl:element>
</xsl:for-each >
</Addresses>
</xsl:template>
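
The answers above produce one AddressN element per Line, but the target format in the question also shows empty Address4 and Address5 elements. As a sketch of that padding step, assuming the legacy format always wants exactly five slots (my reading of the question, not something it states), here is the same renaming in Python. The name number_lines and the slots parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<Address>
<Line>Some street1</Line>
<Line>Some street2</Line>
<Line>Some street3</Line>
</Address>"""

def number_lines(xml_text, slots=5):
    """Rename each Line to Address1, Address2, ... and pad with empty elements."""
    lines = [line.text or "" for line in ET.fromstring(xml_text).findall("Line")]
    lines += [""] * (slots - len(lines))  # no padding if there are already enough lines
    result = []
    for n, text in enumerate(lines, start=1):  # one-based, like position()
        elem = ET.Element(f"Address{n}")
        elem.text = text
        result.append(ET.tostring(elem, encoding="unicode", short_empty_elements=False))
    return result

print("\n".join(number_lines(XML)))
```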

XSL: How to concatenate nodes with conditions?

I have the following code (eg):
<response>
<parameter>
<cottage>
<cot>
<res>
<hab desc="Lakeside">
<reg cod="OB" prr="600.84>
<lwz>TR#2#AB#200.26#0#QB#OK#20120829#20120830#EU#3-0#</lwz>
<lwz>TR#2#AB#200.26#0#QB#OK#20120830#20120831#EU#3-0#</lwz>
<lwz>TR#2#AB#200.26#0#QB#OK#20120831#20120901#EU#3-0#</lwz>
I need to create a concatenated string that includes the whole of the first 'lwz' line and then the price (200.26, but it can be different in each line) for each corresponding line.
So the output, separating each line with | would be:
TR#2#AB#200.26#0#QB#OK#20120829#20120830#EU#3-0#|200.26|200.26
Thanks
This XSLT 1.0 transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="lwz[1]">
<xsl:value-of select="."/>
</xsl:template>
<xsl:template match="lwz[position() >1]">
<xsl:value-of select=
"concat('&#xA;',
substring-before(substring-after(substring-after(substring-after(.,'#'),'#'),'#'),'#')
)
"/>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
when applied on the provided text (converted to a well-formed XML document !!!):
<response>
<parameter>
<cottage>
<cot>
<res>
<hab desc="Lakeside">
<reg cod="OB" prr="600.84">
<lwz>TR#2#AB#200.26#0#QB#OK#20120829#20120830#EU#3-0#</lwz>
<lwz>TR#2#AB#200.26#0#QB#OK#20120830#20120831#EU#3-0#</lwz>
<lwz>TR#2#AB#200.26#0#QB#OK#20120831#20120901#EU#3-0#</lwz>
</reg>
</hab>
</res>
</cot>
</cottage>
</parameter>
</response>
produces the wanted, correct result:
TR#2#AB#200.26#0#QB#OK#20120829#20120830#EU#3-0#
200.26
200.26
II XSLT 2.0 solution:
This transformation:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="lwz[1]">
<xsl:value-of select="."/>
</xsl:template>
<xsl:template match="lwz[position() >1]">
<xsl:value-of select=
"concat('&#xA;', tokenize(.,'#')[4])"/>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
when applied on the above XML document, again produces the wanted, correct result. Note the use of the standard XPath 2.0 function tokenize():
TR#2#AB#200.26#0#QB#OK#20120829#20120830#EU#3-0#
200.26
200.26
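
Both transformations above separate the values with line breaks, while the question asked for a | separator. As an illustration of the same field extraction with that separator, here is a short Python sketch (the function name summarize is made up). Note that tokenize(.,'#')[4] is one-based, so the price is index 3 in Python.

```python
LWZ = [
    "TR#2#AB#200.26#0#QB#OK#20120829#20120830#EU#3-0#",
    "TR#2#AB#200.26#0#QB#OK#20120830#20120831#EU#3-0#",
    "TR#2#AB#200.26#0#QB#OK#20120831#20120901#EU#3-0#",
]

def summarize(lwz_lines):
    """Keep the first line whole, then append the 4th '#'-field of each later line."""
    first, *rest = lwz_lines
    prices = [line.split("#")[3] for line in rest]
    return "|".join([first, *prices])

print(summarize(LWZ))
# TR#2#AB#200.26#0#QB#OK#20120829#20120830#EU#3-0#|200.26|200.26
```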
You can use the XPath substring functions to select substrings from your lwz node data. You don't give much detail about your problem; if you want a more detailed answer, please provide the full XML document and your best-guess XSLT.