How to find unmatched rows with XSLT

I have two large xml files, one of which has the following format:
<Persons>
<Person>
<ID>1</ID>
<LAST_NAME>London</LAST_NAME>
</Person>
<Person>
<ID>2</ID>
<LAST_NAME>Twain</LAST_NAME>
</Person>
<Person>
<ID>3</ID>
<LAST_NAME>Dikkens</LAST_NAME>
</Person>
</Persons>
The second file has the following format:
<SalesPersons>
<SalesPerson>
<ID>2</ID>
<LAST_NAME>London</LAST_NAME>
</SalesPerson>
<SalesPerson>
<ID>3</ID>
<LAST_NAME>Dikkens</LAST_NAME>
</SalesPerson>
</SalesPersons>
I need to find the records from file 1 that do not exist in file 2. I already have this working with a for-each loop, but that approach takes a substantial amount of time. Is there a different approach that would run faster?

Using a key can help to improve performance on lookups:
<xsl:key name="sales-person" match="SalesPerson" use="concat(ID, '|', LAST_NAME)"/>
<xsl:template match="/">
<xsl:for-each select="Persons/Person">
<xsl:variable name="person" select="."/>
<!-- need to change context document for key function use -->
<xsl:for-each select="$doc2">
<xsl:if test="not(key('sales-person', concat($person/ID, '|', $person/LAST_NAME)))">
<xsl:copy-of select="$person"/>
</xsl:if>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
That assumes you have bound doc2 as a variable or parameter with e.g. <xsl:param name="doc2" select="document('sales-persons.xml')"/>.
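For reference, a minimal sketch of how the pieces above might fit into a complete stylesheet. The file name sales-persons.xml (resolved relative to the stylesheet) and the Unmatched wrapper element are assumptions for illustration, not part of the original question:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the second file is called sales-persons.xml and sits next to the stylesheet -->
  <xsl:param name="doc2" select="document('sales-persons.xml')"/>

  <!-- Index every SalesPerson by the combination of ID and LAST_NAME -->
  <xsl:key name="sales-person" match="SalesPerson" use="concat(ID, '|', LAST_NAME)"/>

  <xsl:template match="/">
    <!-- Hypothetical wrapper element so the result has a single root -->
    <Unmatched>
      <xsl:for-each select="Persons/Person">
        <xsl:variable name="person" select="."/>
        <!-- key() searches the document of the context node, so switch to the second file -->
        <xsl:for-each select="$doc2">
          <xsl:if test="not(key('sales-person', concat($person/ID, '|', $person/LAST_NAME)))">
            <xsl:copy-of select="$person"/>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </Unmatched>
  </xsl:template>
</xsl:stylesheet>
```

With the two sample files from the question, this should copy the Person elements with ID 1 and ID 2, because neither '1|London' nor '2|Twain' occurs as a key value in the second file. If matching on ID alone is enough, the key could simply use ID instead of the concatenation.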

Related

How to get XSLT key-function working with my scenario?

Here is my in-data:
<Results>
<Result>
<Id>1</Id>
</Result>
<Result>
<Id>2</Id>
</Result>
</Results>
<Results>
<RefId>1</RefId>
<Text>One</Text>
</Results>
<Results>
<RefId>2</RefId>
<Text>Two</Text>
</Results>
How the output should be:
<OBR></OBR>
<OBX>One</OBX>
<OBR></OBR>
<OBX>Two</OBX>
My xslt-code
<xsl:key name="test" match="Results/Result" use="Id"/>
<xsl:template match="Results/Result">
<OBR></OBR>
<xsl:for-each select="Results[key('test', RefId)/RefId]">
<OBX><xsl:value-of select="Text" /></OBX>
</xsl:for-each>
</xsl:template>
It does not work. My result is:
<OBR></OBR>
<OBX>One</OBX>
<OBX>Two</OBX>
<OBR></OBR>
<OBX>One</OBX>
<OBX>Two</OBX>
I assume that the problem is with the for-each in my template. It's looping twice every time the template runs. Any suggestions?
I overcomplicated it with the key function. I solved it by creating a variable named ID based on the Id field. Then, in the for-each, I just test whether the variable and the RefId element match, and it works perfectly.
<xsl:template match="Results/Result">
<xsl:variable name="ID" select="Id"></xsl:variable>
<OBR></OBR>
<xsl:for-each select="Results[(RefId = $ID)]">
<OBX><xsl:value-of select="Text" /></OBX>
</xsl:for-each>
</xsl:template>
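If you do want to keep a key, it probably needs to index the elements you are looking up (the Results that carry a RefId) rather than the Result elements you are iterating over. A rough sketch, assuming the fragments shown in the question are wrapped in some common root element; the key name text-by-ref is made up for this example:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- Index the Results elements that carry a RefId, by that RefId (hypothetical key name) -->
  <xsl:key name="text-by-ref" match="Results[RefId]" use="RefId"/>

  <xsl:template match="/">
    <!-- Assumption: all fragments live under one root, so // finds every Result -->
    <xsl:apply-templates select="//Results/Result"/>
  </xsl:template>

  <xsl:template match="Results/Result">
    <OBR></OBR>
    <!-- Look up only the Results whose RefId equals this Result's Id -->
    <xsl:for-each select="key('text-by-ref', Id)">
      <OBX><xsl:value-of select="Text"/></OBX>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The lookup key('text-by-ref', Id) returns only the entries for the current Id, so each OBR should be followed by just its own OBX.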

XSLT using extra for-each or apply-templates using Muenchian method?

I've started learning XSLT and I've used the Muenchian method in an exercise. I've found 2 different ways of getting my expected result. With the apply-templates and with an extra for-each.
The key:
<xsl:key name="tech" match="technology" use="."/>
The first solution using the apply-templates:
<xsl:for-each select="//./technology[generate-id(.)=generate-id(key('tech', .)[1])]">
<team>
<xsl:variable name="selectedTech" select="."/>
<xsl:apply-templates select="../../person[./technology=$selectedTech]"/>
</team>
</xsl:for-each>
<xsl:template match="person">
<member><xsl:value-of select="name"/></member>
</xsl:template>
The second solution using an additional for-each:
<xsl:for-each select="//./technology[generate-id(.)=generate-id(key('tech', .)[1])]">
<team>
<xsl:variable name="selectedTech" select="."/>
<xsl:for-each select="key('tech', .)">
<member><xsl:value-of select="../name"/></member>
</xsl:for-each>
</team>
</xsl:for-each>
Input is something like this:
<employees>
<person>
<name>Bert</name>
<technology>IBM</technology>
</person>
<person>
<name>Jack</name>
<technology>Microsoft</technology>
</person>
<person>
<name>Karel</name>
<technology>IBM</technology>
</person>
<person>
<name>Bill</name>
<technology>Microsoft</technology>
</person>
<person>
<name>Joris</name>
<technology>OpenSource</technology>
</person>
<person>
<name>Piet</name>
<technology>OpenSource</technology>
</person>
</employees>
Is it better to use a particular solution of these 2? Or which one of these do you recommend and why?
Once you have defined a key and want to access the items in a group it is certainly more efficient to use key('key-name', keyValueExpression) to do that instead of walking an axis and writing a predicate.
So in my view instead of ../../person[./technology=$selectedTech] (where I wonder whether it does not need to be ../person[./technology=$selectedTech]) I would certainly use key('tech', .) to find the items in a group.
The decision between apply-templates or for-each is another question as you can use both.
Generally, a stylesheet that uses apply-templates with separate templates is better structured and more readable, but for quick and short ones for-each might suffice.
For the whole problem I would define the key on person
<xsl:key name="tech" match="person" use="technology"/>
<xsl:for-each select="//person[generate-id(.)=generate-id(key('tech', technology)[1])]">
<team>
<xsl:apply-templates select="key('tech', technology)"/>
</team>
</xsl:for-each>
<xsl:template match="person">
<member><xsl:value-of select="name"/></member>
</xsl:template>
And of course the first for-each could also be eliminated using apply-templates and a mode:
<xsl:key name="tech" match="person" use="technology"/>
<xsl:template match="root">
<xsl:copy>
<xsl:apply-templates select="//person[generate-id(.)=generate-id(key('tech', technology)[1])]" mode="team"/>
</xsl:copy>
</xsl:template>
<xsl:template match="person" mode="team">
<team>
<xsl:apply-templates select="key('tech', technology)"/>
</team>
</xsl:template>
<xsl:template match="person">
<member><xsl:value-of select="name"/></member>
</xsl:template>
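Put together for the sample input, a complete stylesheet could look roughly like the following. The teams wrapper element and the name attribute on team are additions for illustration only; the match on /employees assumes the root element shown in the question:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Group persons by their technology -->
  <xsl:key name="tech" match="person" use="technology"/>

  <xsl:template match="/employees">
    <!-- Hypothetical wrapper element for the result -->
    <teams>
      <!-- Muenchian step: pick the first person of each technology group -->
      <xsl:apply-templates
          select="person[generate-id() = generate-id(key('tech', technology)[1])]"
          mode="team"/>
    </teams>
  </xsl:template>

  <xsl:template match="person" mode="team">
    <!-- The name attribute is an illustrative addition -->
    <team name="{technology}">
      <!-- All members of the group come straight from the key -->
      <xsl:apply-templates select="key('tech', technology)"/>
    </team>
  </xsl:template>

  <xsl:template match="person">
    <member><xsl:value-of select="name"/></member>
  </xsl:template>
</xsl:stylesheet>
```

For the six sample persons this should produce three team elements (IBM, Microsoft, OpenSource) with two member elements each.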

Change template so XSLT Outputs a sum instead of a list of values

I have an XSLT template that is working fine.
<xsl:template match="Row[contains(BenefitType, 'MyBenefit')]">
<value>
<xsl:value-of select="BenefitList/Row/Premium* 12" />
</value>
</xsl:template>
The output is
<value>100</value>
<value>110</value>
What I would prefer is if it would just output 220. So, basically in the template I would need to use some sort of variable or looping to do this and then output the final summed value?
XSLT 1 compliance is required.
The template is being used as follows:
<xsl:apply-templates select="Root/Row[contains(BenefitType, 'MyBenefit')]" />
For some reason, when I use contains here it only sums the first structure that matches, not all of them. If the values' parent didn't depend on having a sibling element that matches a specific value, a plain sum() approach would work.
The direct solution to the problem was already mentioned in the comments, but assuming you really want to do the same with some variables, this might be interesting for you:
XML:
<Root>
<Row>
<BenefitType>MyBenefit</BenefitType>
<BenefitList>
<Premium>100</Premium>
</BenefitList>
</Row>
<Row>
<BenefitType>MyBenefit, OtherBenefit</BenefitType>
<BenefitList>
<Premium>100</Premium>
</BenefitList>
</Row>
<Row>
<BenefitType>OtherBenefit</BenefitType>
<BenefitList>
<Premium>1000</Premium>
</BenefitList>
</Row>
<Row>
<BenefitType>OtherBenefit</BenefitType>
<BenefitList>
<Premium>1000</Premium>
</BenefitList>
</Row>
</Root>
XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="exsl">
<xsl:template match="/">
<total>
<xsl:variable name="valuesXml">
<values>
<xsl:apply-templates select="Root/Row[contains(BenefitType, 'MyBenefit')]" />
</values>
</xsl:variable>
<xsl:variable name="values" select="exsl:node-set($valuesXml)/values/value" />
<xsl:value-of select="sum($values)" />
</total>
</xsl:template>
<xsl:template match="Row[contains(BenefitType, 'MyBenefit')]">
<value>
<xsl:value-of select="BenefitList/Premium * 12" />
</value>
</xsl:template>
</xsl:stylesheet>
Here the same result set generated in your question is saved in another variable, which can then again be processed.
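If the extension function is not available, one alternative (not necessarily the one from the comments) is to rely on the fact that multiplying each premium by 12 and summing gives the same number as summing first and multiplying once. A small sketch against the sample XML above:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <total>
      <!-- sum() takes a node-set in XSLT 1.0, so filter the rows first and multiply afterwards -->
      <xsl:value-of
          select="sum(Root/Row[contains(BenefitType, 'MyBenefit')]/BenefitList/Premium) * 12"/>
    </total>
  </xsl:template>
</xsl:stylesheet>
```

For the sample XML this yields <total>2400</total>, the same value as the node-set version, since (100 + 100) * 12 = 2400. This shortcut only works because the per-row calculation is a constant factor; for anything more involved the intermediate variable is still the way to go.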

xslt wrapping duplicate lines case insensitive inside a for-each

I am trying to write a loop using XSLT so that it automatically groups all items with the same ID but in a case insensitive way. Unfortunately the data that I am trying to parse through is client driven so I cannot change it prior to load.
Regardless, here is the XML structure:
<Document>
<Row>
<Cell>ID</Cell>
</Row>
<Row>
<Cell>hi</Cell>
</Row>
<Row>
<Cell>Hi</Cell>
</Row>
<Row>
<Cell>Hello</Cell>
</Row>
<Row>
<Cell>Hello</Cell>
</Row>
<Row>
<Cell>Hola</Cell>
</Row>
</Document>
This is the XSLT I am currently using...
<xsl:template match="Document">
<NewDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsl:for-each select="//Row[position() > 1]/Cell[1][not(.=preceding::Row/Cell[1])]">
<xsl:variable name="currentOrderID" select="." />
<xsl:variable name="currentOrderGroup" select="//Row[Cell[1] = $currentOrderID]" />
<MainID>
<xsl:value-of select="$currentOrderGroup[1]/Cell[1]"/>
</MainID>
<IDs>
<xsl:for-each select="$currentOrderGroup">
<id>
<xsl:value-of select="Cell[1]"/>
</id>
</xsl:for-each>
</IDs>
</xsl:for-each>
</NewDocument>
</xsl:template>
This is just wrapping up things as expected in a CaSe SeNSiTiVe way...
I've been trying to use a translate in there in order to make everything uppercase, however I can't seem to get the syntax just right.
The result I am trying to achieve here is this:
<NewDocument>
<MainID>hi</MainID>
<IDs>
<id>hi</id>
<id>Hi</id>
</IDs>
<MainID>Hello</MainID>
<IDs>
<id>Hello</id>
<id>Hello</id>
</IDs>
<MainID>Hola</MainID>
<IDs>
<id>Hola</id>
</IDs>
</NewDocument>
Can't seem to find anything specifically for what I need.
Thanks!
In XSLT1.0, to convert strings to lower case you need to use the rather cumbersome translate function in xpath.
translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')
Furthermore, your problem is one of grouping, and in XSLT1.0 that usually means a technique known as Muenchian Grouping. To do this, you first define a key to look up items in the groups you require
<xsl:key
name="Cell"
match="Cell"
use="translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')"/>
Here we are looking up cells based on their (lower-case) text content.
To find the first element in each group, you look for Cell elements in the XML which also happen to be the first element occurring in your look-up key
<xsl:apply-templates
select="Row/Cell
[generate-id()
= generate-id(
key('Cell',
translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'))[1])]"/>
Then, when you match the first element, you can then match all elements within the group by looking at the key.
Here is the full XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="Cell" match="Cell" use="translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')"/>
<xsl:template match="Document">
<NewDocument>
<xsl:apply-templates select="Row/Cell[generate-id() = generate-id(key('Cell', translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'))[1])]"/>
</NewDocument>
</xsl:template>
<xsl:template match="Cell">
<MainID>
<xsl:value-of select="."/>
</MainID>
<IDs>
<xsl:apply-templates select="key('Cell', translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'))" mode="group"/>
</IDs>
</xsl:template>
<xsl:template match="Cell" mode="group">
<id>
<xsl:value-of select="."/>
</id>
</xsl:template>
</xsl:stylesheet>
Note the use of the mode attribute, to distinguish between the two templates matching Cell elements.
When applied to your XML, the following is output:
<NewDocument>
<MainID>ID</MainID>
<IDs>
<id>ID</id>
</IDs>
<MainID>hi</MainID>
<IDs>
<id>hi</id>
<id>Hi</id>
</IDs>
<MainID>Hello</MainID>
<IDs>
<id>Hello</id>
<id>Hello</id>
</IDs>
<MainID>Hola</MainID>
<IDs>
<id>Hola</id>
</IDs>
</NewDocument>
Note, I wasn't sure what to do with the Cell with ID as a value, so I left it in. If you do want to exclude it, just add this line to the XSLT
<xsl:template match="Cell[. = 'ID']" />

Produce context data for first and last occurrences of every value of an element

Given the following xml:
<container>
<val>2</val>
<id>1</id>
</container>
<container>
<val>2</val>
<id>2</id>
</container>
<container>
<val>2</val>
<id>3</id>
</container>
<container>
<val>4</val>
<id>1</id>
</container>
<container>
<val>4</val>
<id>2</id>
</container>
<container>
<val>4</val>
<id>3</id>
</container>
I'd like to return something like
2 - 1
2 - 3
4 - 1
4 - 3
Using a nodeset I've been able to get the last occurrence via:
exsl:node-set($list)/container[not(val = following::val)]
but I can't figure out how to get the first one.
To get the first and the last occurrence (document order) in each "<val>" group, you can use an <xsl:key> like this:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:output method="text" />
<xsl:key name="ContainerGroupByVal" match="container" use="val" />
<xsl:variable name="ContainerGroupFirstLast" select="//container[
generate-id() = generate-id(key('ContainerGroupByVal', val)[1])
or
generate-id() = generate-id(key('ContainerGroupByVal', val)[last()])
]" />
<xsl:template match="/">
<xsl:for-each select="$ContainerGroupFirstLast">
<xsl:value-of select="val" />
<xsl:text> - </xsl:text>
<xsl:value-of select="id" />
<xsl:value-of select="'&#10;'" /><!-- LF -->
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
EDIT #1: A bit of an explanation since this might not be obvious right away:
The <xsl:key> returns all <container> nodes having a given <val>. You use the key() function to query it.
The <xsl:variable> is where it all happens. It reads as:
for each of the <container> nodes in the document ("//container") check…
…if it has the same unique id (generate-id()) as the first node returned by key() or the last node returned by key()
where key('ContainerGroupByVal', val) returns the set of <container> nodes matching the current <val>
if the unique ids match, include the node in the selection
the <xsl:for-each> does the output. It could just as well be a <xsl:apply-templates>.
EDIT #2: As Dimitre Novatchev rightfully points out in the comments, you should be wary of using the "//" XPath shorthand. If you can avoid it, by all means, do so — partly because it potentially selects nodes you don't want, and mainly because it is slower than a more specific XPath expression. For example, if your document looks like:
<containers>
<container><!-- ... --></container>
<container><!-- ... --></container>
<container><!-- ... --></container>
</containers>
then you should use "/containers/container" or "/*/container" instead of "//container".
EDIT #3: An alternative syntax of the above would be:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:output method="text" />
<xsl:key name="ContainerGroupByVal" match="container" use="val" />
<xsl:variable name="ContainerGroupFirstLast" select="//container[
count(
.
| key('ContainerGroupByVal', val)[1]
| key('ContainerGroupByVal', val)[last()]
) = 2
]" />
<xsl:template match="/">
<xsl:for-each select="$ContainerGroupFirstLast">
<xsl:value-of select="val" />
<xsl:text> - </xsl:text>
<xsl:value-of select="id" />
<xsl:value-of select="'&#10;'" /><!-- LF -->
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Explanation: The XPath union operator "|" combines its arguments into a node-set. By definition, a node-set cannot contain duplicate nodes; for example, ". | . | ." will create a node-set containing exactly one node (the current node).
This means that if we create a union node-set from the current node ("."), the "key(…)[1]" node and the "key(…)[last()]" node, its node count will be 2 if the current node is one of the two other nodes, and 3 if it is neither. One caveat: in a group with a single member all three are the same node, so the count is 1; testing for "count(…) < 3" instead of "= 2" would cover that case as well.
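The same node-identity trick is often used for plain Muenchian grouping, where the test is whether the current node is the first one in its group. A minimal sketch, assuming the container elements sit directly under a single root element:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="ContainerGroupByVal" match="container" use="val"/>

  <xsl:template match="/">
    <!-- The union has exactly one node only when "." is the first node of its group -->
    <xsl:for-each select="/*/container[count(. | key('ContainerGroupByVal', val)[1]) = 1]">
      <xsl:value-of select="val"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample data this should print each distinct val once (2 and 4), one per line.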
Basic XPath:
//container[position() = 1] <- this is the first one
//container[position() = last()] <- this is the last one
Here's a set of XPath functions in more detail.
I. XSLT 1.0
Basically the same solution as the one by Tomalak, but more understandable. It is also complete, so you only need to copy and paste the XML document and the transformation and then press the "Transform" button of your favourite XSLT IDE:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:key name="kContByVal" match="container"
use="val"/>
<xsl:template match="/*">
<xsl:for-each select=
"container[generate-id()
=
generate-id(key('kContByVal',val)[1])
]
">
<xsl:variable name="vthisvalGroup"
select="key('kContByVal', val)"/>
<xsl:value-of select=
"concat($vthisvalGroup[1]/val,
'-',
$vthisvalGroup[1]/id,
'&#xA;'
)
"/>
<xsl:value-of select=
"concat($vthisvalGroup[last()]/val,
'-',
$vthisvalGroup[last()]/id,
'&#xA;'
)
"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the originally provided XML document (edited to be well-formed):
<t>
<container>
<val>2</val>
<id>1</id>
</container>
<container>
<val>2</val>
<id>2</id>
</container>
<container>
<val>2</val>
<id>3</id>
</container>
<container>
<val>4</val>
<id>1</id>
</container>
<container>
<val>4</val>
<id>2</id>
</container>
<container>
<val>4</val>
<id>3</id>
</container>
</t>
the wanted result is produced:
2-1
2-3
4-1
4-3
Do note:
We use the Muenchian method for grouping to find one container element for each set of such elements that have the same value for val.
From the whole node-list of container elements with the same val value, we output the required data for the first container element in the group and for the last container element in the group.
II. XSLT 2.0
This transformation:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output method="text"/>
<xsl:template match="/*">
<xsl:for-each-group select="container"
group-by="val">
<xsl:for-each select="current-group()[1], current-group()[last()]">
<xsl:value-of select=
"concat(val, '-', id, '&#xA;')"/>
</xsl:for-each>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
when applied to the same XML document as above, produces the wanted result:
2-1
2-3
4-1
4-3
Do note:
The use of the <xsl:for-each-group> XSLT instruction.
The use of the current-group() function.