XSLT 2.0 using key with except returns unexpected result

NB: title changed to reflect the problem better.
My XML documents contain <tei:seg @type @xml:id @corresp> elements, which wrap little 'stories'. The @corresp attribute lets me connect these stories to a master story. For example, these seg elements are all connected by their @corresp:
doc1.xml//seg[@type='dep_event' @corresp='#JKL' @xml:id='doc1-05']
doc2.xml//seg[@type='dep_event' @corresp='#JKL' @xml:id='doc2-06']
doc6.xml//seg[@type='dep_event' @corresp='#JKL' @xml:id='doc6-03']
My objective is: when the XSLT template finds a @corresp, find the other seg elements in other documents with the same @corresp and output their respective @xml:id.
So, in the above example, if the current seg was @xml:id='doc1-05', the template outputs a list: Corresponds to doc2-06, doc6-03
Until I can solve the current problems with XSLT collection() in eXist-DB, I'm falling back on my previous solution: a 'TEI corpus' xml document which maintains a master list of all related tei-xml documents via xi:include. This way I provide a single document node whereby the processor can access and search all the xml documents.
So, I declare the corpus document:
<xsl:variable name="corpus" select="doc('ms609_corpus.xml')"/>
Then create a key for the @corresp:
<xsl:key name="correspkey" match="//tei:seg[@type='dep_event' and @corresp]" use="@corresp"/>
Then I use the key with the doc() to search:
<xsl:when test="tei:seg[@type='dep_event' and @corresp]">
<xsl:variable name="correspvar"
select="data(self::seg[@type='dep_event' and @corresp]/@corresp)"/>
<xsl:text>Corresponds to </xsl:text>
<xsl:value-of select="data($corpus/(key('correspkey',$correspvar) except $correspvar)/@xml:id)" separator=", "/>
</xsl:when>
It returns the results, but the except should exclude the current @corresp. Yet it is included in the results.

The except operator works on sequences of nodes and compares them by node identity; see https://www.w3.org/TR/xpath20/#combining_seq, which states:
The except operator takes two node sequences as operands and returns a
sequence containing all the nodes that occur in the first operand but
not in the second operand ... All these operators eliminate duplicate
nodes from their result sequences based on node identity
Based on that, I think you simply want
<xsl:value-of select="$corpus/(key('correspkey', current()/@corresp) except current())/@xml:id" separator=", "/>
Calling data() on nodes atomizes them to values, and except only works on nodes, so applying except to the result of data() doesn't make sense.
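To illustrate the node-identity point, here is a minimal sketch. It assumes a hypothetical single input document with three seg elements in no namespace; the TEI namespace and the corpus document from the question are left out for brevity.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical input (an assumption, not the asker's data):
       <doc>
         <seg type="dep_event" corresp="#JKL" xml:id="a1"/>
         <seg type="dep_event" corresp="#JKL" xml:id="a2"/>
         <seg type="dep_event" corresp="#JKL" xml:id="a3"/>
       </doc> -->

  <xsl:key name="correspkey"
           match="seg[@type='dep_event' and @corresp]"
           use="@corresp"/>

  <xsl:template match="/">
    <xsl:apply-templates select="doc/seg"/>
  </xsl:template>

  <xsl:template match="seg">
    <xsl:value-of select="@xml:id"/>
    <xsl:text> corresponds to </xsl:text>
    <!-- Both operands of "except" are node sequences, so the current seg
         is removed by node identity before @xml:id is read. -->
    <xsl:value-of select="(key('correspkey', @corresp) except .)/@xml:id"
                  separator=", "/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For the first seg this prints "a1 corresponds to a2, a3". Note that node identity means the excluded node must be the very node the key returns; a copy of the same seg living in another document is a different node and would not be removed.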

Related

Simple XSLT template failing in some cases

Part of some XSLT I am working on is this very simple template to show up an unresolved reference type of error.
<!-- a basic check when matching on copying index elements - are they referring to a defined item element -->
<xsl:template match="index" mode="expand">
<xsl:variable name="index_name_xml"><xsl:value-of select="@name"/></xsl:variable>
<xsl:if test="not(//item[@name=$index_name_xml])">
<xsl:message terminate="yes"><xsl:value-of select="concat('FAIL : cannot find &quot;',$index_name_xml,'&quot; in items')"/></xsl:message>
</xsl:if>
</xsl:template>
When this element
<index name="User X Ordinate"/>
is matched in the input document, the above template is called, and the template's XPath SHOULD find this node (in the input document)
<item name="User X Ordinate" address="UserXOrd_s" usage="realtime" type="uint16_t" unit="unit_ordinate_q8" />
but it doesn't and I get my fail message
FAIL : cannot find "User X Ordinate" in dbitems Error at char 7 in xsl:value-of/@select on line 253 column 130 of db_expander.xsl:
XTMM9000: Processing terminated by xsl:message at line 253 in db_expander.xsl
and I am scratching my head, as there are dozens of cases in my transformation where the template does what I want, and TWO cases where it doesn't (a clue I can't figure out yet). I can't see any spelling errors, and the two slashes in the XPath should mean ALL 'item' elements at any level in the document are checked. I can't see how it doesn't work.
EDIT :: Apologies for this amateurish post. I got lost trying to recreate a simple version of the problem where I could post the whole source. My partial understanding is that the problem may be related to how the XSL passes the node/context into the template. It's slightly out of my depth at the moment, but could it be a result tree fragment rather than a context in the source XML?
However, if I add a 'root' variable and use it in the template (shown below), the template does what I want and the problems are gone, so the problem seems to relate to the context being passed. I tried but failed to make a small stand-alone failing example to post here; my tests kept working, so I am obviously still not grasping some finer point.
<xsl:variable name="root" select="/"/>
<xsl:template match="index" mode="expand">
<xsl:variable name="index_name" > <xsl:value-of select="@name"/></xsl:variable>
<xsl:choose>
<xsl:when test="$root//dbgroup//item[@name=$index_name]">
<!--xsl:message terminate="no">
<xsl:value-of select="concat('item found for : ',$index_name, ' (parent is ',parent::node()/@name,')')"/>
</xsl:message-->
</xsl:when>
<xsl:otherwise>
<xsl:message terminate="no">
<xsl:value-of select="concat('item NOT found for : ',$index_name, ' (parent is ',parent::node()/@name,')')"/>
</xsl:message>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
I need to do some more reading, as I don't know a good way to debug this other than xsl:message...
If a template is applied to a node in an RTF (result tree fragment), then unfortunately '//*' no longer refers to every element in the source document, but to every element in the RTF, which (in my case, and in most cases) does not contain all the elements of the source document.
That is why I needed the $root variable inside the template in my 'EDIT' above: it gives access to elements that are not in the RTF.
The trick is knowing that you end up with an RTF whenever you apply templates inside a variable declaration in order to populate it. When nodes from that variable are then passed to a template, you need another way to get back to the context of your source document.
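The effect can be reproduced with a small sketch. It assumes XSLT 2.0 (where a variable built from instructions is a temporary tree that can be navigated directly; in 1.0 a node-set() extension would be needed) and a hypothetical input in which one index element refers to one item:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical input (an assumption, not the asker's data):
       <db>
         <dbgroup><item name="A"/></dbgroup>
         <index name="A"/>
       </db> -->

  <!-- Evaluated against the source document, so it keeps pointing there. -->
  <xsl:variable name="root" select="/"/>

  <xsl:template match="/">
    <!-- Building a variable from instructions creates a new temporary tree. -->
    <xsl:variable name="copied">
      <xsl:copy-of select="//index"/>
    </xsl:variable>
    <xsl:apply-templates select="$copied/index" mode="expand"/>
  </xsl:template>

  <xsl:template match="index" mode="expand">
    <!-- "//" starts at the root of the tree containing the context node,
         which here is the temporary tree, not the source document. -->
    <xsl:value-of select="concat(@name, ': //item finds ', count(//item),
                                 ', $root//item finds ', count($root//item),
                                 '&#10;')"/>
  </xsl:template>

</xsl:stylesheet>
```

With that input the copied index element sees no item through //item but one through $root//item, which matches the behaviour described above.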

Consuming the same node twice in Streaming XSLT

I am trying to convert some XML to the interstitial JSON representation specified in XSLT 3.0 (for later conversion to JSON via xml-to-json). My current XSLT works well as a non-streaming stylesheet, but I'm running into issues during the conversion to streaming. Specifically, there are situations where I need to consume the same node twice, especially in the case of repeating tags in XML, which I am converting to arrays in equivalent JSON representation.
<xsl:if test="boolean(cdf:AdjudicatorName)">
<array key="AdjudicatorName">
<xsl:for-each select="cdf:AdjudicatorName">
<string>
<xsl:value-of select="."/>
</string>
</xsl:for-each>
</array>
</xsl:if>
boolean(cdf:AdjudicatorName) tests for the existence of the tag, and creates an array if so.
This code fails in Oxygen (Saxon-EE) with the following message
Template rule is declared streamable but it does not satisfy the streamability rules. * There is more than one consuming operand: {fn:exists(...)} on line {X}, and {<string {(attr{key=...}, ...)}/>} on line {Y}
I am aware of the copy-of workaround, however, many items in the source file can repeat at the highest level, so use of this approach would yield minimal memory savings.
This looks like the perfect use case for xsl:where-populated:
<xsl:where-populated>
<array key="AdjudicatorName">
<xsl:for-each select="cdf:AdjudicatorName">
<string>
<xsl:value-of select="."/>
</string>
</xsl:for-each>
</array>
</xsl:where-populated>
The xsl:where-populated instruction (which was invented for exactly this purpose) logically evaluates its child instructions, and then eliminates any items in the resulting sequence that are "deemed empty", where an element is deemed empty if it has no children. In a streaming implementation the start tag (<array>) will be "held back" until its first child is generated, and when the corresponding end tag (</array>) is emitted, the pair of tags will be discarded if there were no intervening children emitted.

Using <xsl:for-each> for incremental element names

I have an XML document that is converted from a Java map, so all the map keys are converted into node names. The XML structure is as below
<map>
<firstName>AAA</firstName>
<firstName1>BBB</firstName1>
<firstName2>CCC</firstName2>
<firstName3>DDD</firstName3>
</map>
I am trying to write a for-each loop to extract data from this XML to create an output XML. I have tried most of the options available, such as name(), local-name(), contains(), etc., but couldn't come up with something that worked. What are the options available, since the incremental node name can go up to a count of 100 or more? Any input on coding the loop would be of great help. I am using XSLT 1.0.
There are many ways to select the children of the top element (map):
/*/*
This selects all elements that are children of the top element of the XML document.
/*/*[starts-with(name(), 'firstName')]
This selects all child elements of the top element whose name starts with the string 'firstName'.
/*/*[starts-with(name(), 'firstName')
and floor(substring-after(name(), 'firstName')) = substring-after(name(), 'firstName')]
This selects all child elements of the top element whose name starts with the string 'firstName' and whose remaining substring is an integer.
/*/*[starts-with(name(), 'firstName')
and translate(name(), '0123456789', '') = 'firstName']
This selects all child elements of the top element whose name starts with the string 'firstName' and whose remaining substring contains only digits.
Finally, in XPath 2.0 (XSLT 2.0) one can use regular expressions:
/*/*[matches(name(), '^firstName\d+$')]
This will select all the first level elements and their information, which you can then use as you wish:
<xsl:for-each select="/*/*">
<xsl:value-of select="local-name()"/>
<xsl:value-of select="."/>
</xsl:for-each>
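Put together as a complete XSLT 1.0 stylesheet, the loop might look like the sketch below. The output element names (names, name) and the source attribute are hypothetical; only the map input and the starts-with() test come from the discussion above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/map">
    <!-- "names", "name" and "source" are made-up output names -->
    <names>
      <!-- Visit every child whose name begins with firstName,
           however many numbered variants there are. -->
      <xsl:for-each select="*[starts-with(name(), 'firstName')]">
        <name source="{name()}">
          <xsl:value-of select="."/>
        </name>
      </xsl:for-each>
    </names>
  </xsl:template>

</xsl:stylesheet>
```

Because the predicate only inspects the start of the name, the same loop handles firstName, firstName1 and firstName100 without listing them individually.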

XSL unique value key

Goal
(XSLT 1.0). My goal is to take a set of elements, S, and produce another set, T, where T contains the unique elements in S. And to do so as efficiently as possible. (Note: I don't have to create a variable containing the set, or anything like that. I just need to loop over the elements that are unique).
Example Input and Key
<!-- My actual input consists of a bunch of <Result> elements -->
<AllMyResults>
<Result>
<someElement>value</someElement>
<otherElement>value 2</otherElement>
<subject>Get unique subjects!</subject>
</Result>
</AllMyResults>
<xsl:key name="SubjectKey" match="AllMyResults/Result" use="subject"/>
I think the above works, but when I go to use my key, it is incredibly slow. Below is the code for how I use my key.
<xsl:for-each select="Result[count(. | key('SubjectKey', subject)[1]) = 1]">
<xsl:sort select="subject" />
<!-- Do something with the unique subject value -->
<xsl:value-of select="subject" />
</xsl:for-each>
Additional Info
I believe I am doing this wrong because it slowed down my XSL considerably. As some additional info, the code shown above is in a separate XSL file from my main XSL file. From the main XSL, I am calling a template that contains the xsl:key and the for-each shown above. The input to this template is an xsl:param containing my node-set (similar to the example input shown above).
I can't see any reason from the information given why the code should be slow. It might be worth seeing if the slowness is something that happens on all XSLT processors, or if it's peculiar to one.
Try substituting
count(. | key('SubjectKey', subject)[1]) = 1
with:
generate-id() = generate-id(key('SubjectKey', subject)[1])
In some XSLT processors the latter is much faster.
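For reference, a minimal self-contained XSLT 1.0 sketch of the generate-id() variant, assuming the input is the AllMyResults document shown in the question rather than a parameter passed from another stylesheet:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- xsl:key is a top-level declaration; it cannot sit inside a template. -->
  <xsl:key name="SubjectKey" match="Result" use="subject"/>

  <xsl:template match="/AllMyResults">
    <!-- Keep only the first Result in each group of equal subjects. -->
    <xsl:for-each
        select="Result[generate-id() = generate-id(key('SubjectKey', subject)[1])]">
      <xsl:sort select="subject"/>
      <xsl:value-of select="subject"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Both forms of the predicate select the same nodes; the generate-id() form compares two strings instead of building a node-set union for every Result, which may explain why some processors run it faster.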

XSL: Combining grouping and call-template

I've read with interest the techniques available on the web to extract a unique list of items from an XML file containing duplicates using XSL.
These fall into two categories:
1) The Muenchian method (example: http://www.jenitennison.com/xslt/grouping/)
2) Or the previous-sibling look-up
These both rely on an XPath expression to select the data to group by.
However, in the XML file that I'm trying to work out, the data is not present "natively" in the XML file. I am using a xsl:template to compute some aggregated data from my elements. And I would like to group based on the aggregated data.
For example I have:
<filmsreview>
<record><data name='movie'>Star Wars</data><data name='ratings'>John:Good, Mary:Good</data></record>
<record><data name='movie'>Indiana Jones</data><data name='ratings'>John:Good, Mary:Bad, Helen:Average</data></record>
<record><data name='movie'>Titanic</data><data name='ratings'>John:Bad, Helen:Good</data></record>
</filmsreview>
I know that the structure of the data is not perfect and that creating sub-elements would make this easier, but I cannot change the data source easily, so let's take this as a challenge.
And I would like to build a recap where I have John's distinct ratings:
John's ratings:
Good
Bad
I have a xsl:template that can take a record element and return John's rating for this record:
Example:
<xsl:template name="get_rating">
<xsl:param name="reviewer" />
<!-- I use some string manipulation, and xsl:value-of to return the review for John-->
</xsl:template>
I can just call it under a xsl:for-each to get the exhaustive list of John's review. But I cannot combine this call with any of the methods to get unique values.
Do I have to use an intermediary XSL to convert my XML file to a more structured way? Or can I do in a single step?
Many thanks
Gerard
Hmm... This should be possible using xslt variables and the nodeset method, perhaps something like this:
<xsl:variable name="_johnsRatings">
<xsl:apply-templates select="/" mode="johnsRatings" />
</xsl:variable>
<xsl:variable name="johnsRatings" select="msxsl:node-set($_johnsRatings)" />
<xsl:template match="/" mode="johnsRatings">
<Ratings>
<xsl:for-each select="/filmsreview/record/data[@name='ratings']">
<Rating><xsl:call-template name="get_rating"><xsl:with-param name="reviewer" select="'John'"/></xsl:call-template></Rating>
</xsl:for-each>
</Ratings>
</xsl:template>
At this point, it should be possible to query the $johnsRatings variable using standard XPath queries, and you can use either of the two methods you mentioned above to retrieve unique values from it...
Hope that helps
EDIT:
I don't know what XSLT engine you are using; I assumed you have access to the msxsl:node-set() function. However, most XSLT 1.0 processors have a similar function (for example EXSLT's exsl:node-set()), so you might have to search around for the equivalent in your processor.