XSLT - merge nodes and remove duplicates - is this feasible? - xslt

XSLT experts, can you please advise on the following requirement?
Source XML file:
<root>
<G1>
<EMPLOYEE_ID>1</EMPLOYEE_ID>
</G1>
<G1>
<EMPLOYEE_ID>2</EMPLOYEE_ID>
</G1>
<G2>
<EMPLOYEE_ID>1</EMPLOYEE_ID>
</G2>
<G2>
<EMPLOYEE_ID>3</EMPLOYEE_ID>
</G2>
<G3>
<EMPLOYEE_ID>1</EMPLOYEE_ID>
</G3>
</root>
I need to get all unique EMPLOYEE_IDs and print them as a pipe-delimited flat file, as follows. There are 3 unique IDs: 1, 2 and 3. ID 1 is common to G1, G2 and G3, so it should be printed for all three groups. ID 2 occurs only in G1 and ID 3 only in G2, so each of those is printed once for its group.
G1|1|
G2|1|
G3|1|
G1|2|
G2|3|
To achieve this, I think I have to merge G1, G2 and G3 and remove duplicate values, along these lines (pseudocode):
for each unique EMPLOYEE_ID [ 1,2,3 ]
var current_employee select="EMPLOYEE_ID[$index]"
for each G1[EMPLOYEE_ID = $current_employee ]
<xsl:text>G1</xsl:text>
<xsl:text>|</xsl:text>
<xsl:value-of select="EMPLOYEE_ID"/>
for each G2[EMPLOYEE_ID = $current_employee ]
<xsl:text>G2</xsl:text>
<xsl:text>|</xsl:text>
<xsl:value-of select="EMPLOYEE_ID"/>
for each G3[EMPLOYEE_ID = $current_employee ]
<xsl:text>G3</xsl:text>
<xsl:text>|</xsl:text>
<xsl:value-of select="EMPLOYEE_ID"/>
How can I get the unique EMPLOYEE_ID values from the groups G1, G2 and G3 and store them in an array, so that I can run a loop as described above?
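One possible approach, sketched here under the assumption that an XSLT 2.0 processor (for example Saxon) is available and that the source is well-formed (each group closed with `</G1>`, `</G2>`, `</G3>`): no array is needed, because `xsl:for-each-group` can group the children of `root` by `EMPLOYEE_ID` and a nested grouping by element name removes duplicates within a group. The variable name `id` is purely illustrative.

```xslt
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/root">
    <!-- Outer grouping: one pass per distinct EMPLOYEE_ID, in order of first appearance -->
    <xsl:for-each-group select="*[EMPLOYEE_ID]" group-by="EMPLOYEE_ID">
      <xsl:variable name="id" select="current-grouping-key()"/>
      <!-- Inner grouping: one line per distinct group element name (G1, G2, ...) -->
      <xsl:for-each-group select="current-group()" group-by="name()">
        <xsl:value-of select="concat(current-grouping-key(), '|', $id, '|', '&#10;')"/>
      </xsl:for-each-group>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

For the sample above this should print G1|1|, G2|1|, G3|1|, G1|2| and G2|3|, one per line. In XSLT 1.0 the same effect would need Muenchian grouping with `xsl:key` instead.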

Related

combine multiple fields with the same name using regexp

I have multiple fields with the same name "item" inside a JSON object. How can I combine their values under one field (as an array) using a regex?
sample input:
{"item":"A","item":"B","item":"C"}
expected output:
"item": ["A","B","C"]
Note that there could be more than 3 items in the sample above.
I have tried {"item":"(.*?)","item":"(.*?)","item":"(.*?)"} but that limits me to 3 items; I need something that works for any number of items.
Thanks
The conversion you want can be performed quite easily in XSLT 2.0 or higher using:
<xsl:text>"item": [</xsl:text>
<xsl:value-of select="for $i in tokenize(translate($input, '{}', ''), ',') return substring-after($i, ':')" separator=","/>
<xsl:text>]</xsl:text>
Demo: https://xsltfiddle.liberty-development.net/eiorv1e
Of course, if your ultimate goal is to parse the input in order to use its values, then it's not necessary to put it back as JSON.
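For context, a minimal sketch of how the snippet above could sit in a complete XSLT 2.0 stylesheet. The parameter name `input` matches the `$input` variable used in the answer; supplying it as a stylesheet parameter with the sample string as its default is an assumption made for illustration, as is running the stylesheet against any dummy source document.

```xslt
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumed: the raw string arrives as a parameter; the default is the sample input -->
  <xsl:param name="input">{"item":"A","item":"B","item":"C"}</xsl:param>

  <xsl:template match="/">
    <xsl:text>"item": [</xsl:text>
    <!-- Strip the braces, split on commas, keep what follows the first colon of each pair -->
    <xsl:value-of select="for $i in tokenize(translate($input, '{}', ''), ',')
                          return substring-after($i, ':')" separator=","/>
    <xsl:text>]</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Note that this splitting is purely textual, so it would break on values that themselves contain commas or colons.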

XSL joining 2 group statements for output

Using Saxon, I'm trying to print the totals from a table fetched by SQL to a CSV in a specific format.
The structure should be something like the following, where the totals I'm trying to get can also be seen:
Keep in mind that col2 can hold any letter, but col4 can only hold a, b or c.
A rough idea of how the input looks:
<ROW>
<ROW[1]>
<col1>1</col1>
<col2>a</col2>
<col3>1</col3>
<col4>a</col4>
</ROW[1]>
<ROW[2]>
<col1>2</col1>
<col2>a</col2>
<col3>2</col3>
<col4>a</col4>
</ROW[2]>
<ROW[3]>
<col1>3</col1>
<col2>a</col2>
<col3>3</col3>
<col4>a</col4>
</ROW[3]>
<ROW[4]>
<col1>6</col1>
<col2>b</col2>
<col3>2</col3>
<col4>a</col4>
</ROW[4]>
...
</ROW>
Doing totals just for col1 should be easy enough:
<xsl:template name = "totals">
<xsl:for-each-group select = "//ROW" group-by="col2">
<xsl:variable name="group_total">
<xsl:value-of select="sum(current-group()/col1)"/>
<xsl:value-of select="';'"/>
</xsl:variable>
<xsl:value-of select="'&#10;'"/>
</xsl:for-each-group>
</xsl:template>
But what would be the best way to put the totals for col3 in the same row, so that the col3 totals for a sit next to the col1 totals for a?
I thought of calling a template and passing the col2 value to compute the totals on the same principle, but that breaks down when a value in col4 never occurs in col2.
I am not good at imagining input structures but if you declare a key <xsl:key name="by-col4" match="ROW" use="col4"/> and then use sum(key('by-col4', current-grouping-key())/col3) inside of your for-each-group it might give the result you have shown.
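A minimal sketch of that suggestion, assuming XSLT 2.0, that every data row is a `ROW` element with `col1` to `col4` children, and a `letter;col1-total;col3-total` output layout (the layout is a guess, since the desired format is not shown):

```xslt
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index rows by col4 so that col3 can be summed per letter -->
  <xsl:key name="by-col4" match="ROW" use="col4"/>

  <xsl:template match="/">
    <xsl:for-each-group select="//ROW[col2]" group-by="col2">
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>;</xsl:text>
      <!-- Total of col1 over the rows whose col2 is the current letter -->
      <xsl:value-of select="sum(current-group()/col1)"/>
      <xsl:text>;</xsl:text>
      <!-- Total of col3 over the rows whose col4 is the same letter -->
      <xsl:value-of select="sum(key('by-col4', current-grouping-key())/col3)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Letters that occur in col2 but never in col4 get a col3 total of 0, because `sum()` of an empty sequence is 0. Letters that occur only in col4 would not be listed by this sketch, since the outer grouping is driven by col2.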

XSLT: generate-id() and 2 equal arguments

Using XSLT 2.0:
At line number 8370 there is this code:
<TestCaseElement>
<Name><![CDATA[DUT_AC_ON]]></Name>
<TaggedValues>
</TaggedValues>
<Description>
<Line><![CDATA[{TEXT_LANG} DUT AC ON]]></Line>
<Line><![CDATA[{TEXT_ENGL} DUT AC ON]]></Line>
</Description>
<ModelingToolID><![CDATA[EAID_E9ACC0C9_D383_4ef0_99FF_F87C90BDF43C]]></ModelingToolID>
<Hash><![CDATA[1238228468]]></Hash>
<ID><![CDATA[1115]]></ID>
<Stereotypes>
<Stereotype><![CDATA[StepStart]]></Stereotype>
</Stereotypes>
<Role><![CDATA[TESTSTEP]]></Role>
</TestCaseElement>
and later in the XML document the same ModelingToolID appears again.
Here is an external Link to the picture to visualize: http://i.imgur.com/vTmki.png
I generate IDs with this XSL code:
<xsl:for-each select="/TestCases/TestCase/TestCaseElement/ModelingToolID[
( not( ../Stereotypes ) or ( ../Stereotypes/Stereotype != 'Precondition' and
../Stereotypes/Stereotype != 'Postcondition' ) ) and
(../Stereotypes/Stereotype = 'StepStart') and
( ../Role = 'TESTSTEP' or ../Role = 'VP' ) and
../Description and
( generate-id() = generate-id( key( 'ModelingToolID', .)[ 1 ] ) ) ]">
As you can see, lines 8370 and 10296 contain two identical ModelingToolIDs.
I need both TestCaseElements in my transformation and in my desired output.
But, understandably, only the first is taken.
What can I do to get both TestCaseElements?
The function key() (without a predicate appended to it) by definition produces a node-set of nodes, each having the same key as the second argument.
Therefore, inside the xsl:for-each instruction you need:
key( 'ModelingToolID', .)
This selects all nodes that match the match pattern in the match attribute of the xsl:key named "ModelingToolID" -- exactly what you want to obtain.
You can use this expression in various XSLT instructions:
<xsl:variable name="vGroup" select="key( 'ModelingToolID', .)"/>
Or:
<xsl:for-each select="key( 'ModelingToolID', .)">
<!-- Process the group here -->
</xsl:for-each>
Or whatever you need to do.
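Put together, this could look like the sketch below. The key declaration is an assumption, since the question does not show it, and the output element names `steps`, `group` and `step` are hypothetical. The question's additional filter conditions on Stereotype, Role and Description are left out for brevity.

```xslt
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed key declaration: index every ModelingToolID by its own value -->
  <xsl:key name="ModelingToolID"
           match="TestCaseElement/ModelingToolID" use="."/>

  <xsl:template match="/">
    <steps>
      <!-- Outer loop: the first node of each key group, i.e. one pass per distinct ID -->
      <xsl:for-each select="/TestCases/TestCase/TestCaseElement/ModelingToolID[
          generate-id() = generate-id(key('ModelingToolID', .)[1])]">
        <group id="{.}">
          <!-- Inner loop: every TestCaseElement sharing that ID, duplicates included -->
          <xsl:for-each select="key('ModelingToolID', .)/..">
            <step name="{Name}" hash="{Hash}"/>
          </xsl:for-each>
        </group>
      </xsl:for-each>
    </steps>
  </xsl:template>
</xsl:stylesheet>
```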

how to fetch the other tags by targeting a particular tag

Here I have pasted a sample XML with tag names like 21A, 50F, 21D and 22B.
Normally if I need to fetch a particular tag, I can use logic below easily in XSLT:
<xsl:choose>
<xsl:when test="tag[name = '21A'] ">
<xsl:choose>
<xsl:when test="substring(tag[name = '21A']/value,1,1) = '/'">
<xsl:variable name="result" select="concat(translate(tag[name = '21A']/value,',',' '),'&#10;')"/>
<xsl:value-of select="substring(substring-before(substring-after($result,'&#10;'),'&#10;'),1,11)"/>
</xsl:when>
<xsl:when test="substring(tag[name = '21A']/value,1,1) != '/'">
<xsl:value-of select="substring(tag[name = '21A']/value,1,11)"/>
</xsl:when>
</xsl:choose>
</xsl:when>
</xsl:choose>
But now the requirement distinguishes Sequence A and Sequence B:
the tags that appear above 50F belong to sequence A;
the tags that appear below 50F belong to sequence B.
So I need to fetch the tags relative to the 50F tag. Can anyone please give a suggestion?
<local>
<message>
<block4>
<tag>
<name>21A</name>
<value>ALW1031</value>
</tag>
<tag>
<name>50F</name>
<value>TESTING CITI BANK EFT9</value>
</tag>
<tag>
<name>21D</name>
<value>OUR</value>
</tag>
<tag>
<name>22B</name>
<value>ipubby</value>
</tag>
</block4>
</message>
</local>
output required:
ALW1031,OUR
Previously, if 21A was populated twice, I used the positions [1] and [2] when fetching the tag values. Now the 21 tag will still be repeated, but its suffix may be A or D, so I need to anchor on the 50F tag instead. Whichever 21 tag (A or D) appears before 50F must be fetched, and likewise whichever appears after 50F, without relying on positions.
Summary:
@Treemonkey: I hope you had a glance at my sample XML. It has tags like 21A, 50F and so on. Assume I have two fields, field1 and field2. Earlier the same tag 21A was repeated, so I fetched field1 by position [1] (tag[name = '21A'][1])
and field2 by position [2]. Now the 21 tag is still repeated, but the suffix differs (A or D). field1 should come from sequence A and field2 from sequence B, so positions no longer matter: 50F is the demarcation. Whatever appears before 50F is treated as sequence A, and whatever appears after 50F is treated as sequence B.
So the XSLT needs to target 50F. To display the 21A field of the sample XML, which sits before 50F, the logic should select tag 21A before the 50F tag to produce field1; for field2 it should select 21D after 50F.
Your requirements aren't exactly clear, but the following expressions should help.
Select all siblings before the tag whose child name has a particular value:
/*/*/*/tag[name='50F']/preceding-sibling::*
Select following siblings:
/*/*/*/tag[name='50F']/following-sibling::*
Select just the immediately preceding element:
/*/*/*/tag[name='50F']/preceding-sibling::*[1]
Select preceding siblings having a child name with a particular value:
/*/*/*/tag[name='50F']/preceding-sibling::*[name='21A']
Select following siblings having a child name with a particular value:
/*/*/*/tag[name='50F']/following-sibling::*[name='21D']
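As a rough sketch of how these expressions could produce the required output, assuming the sample structure above (`local/message/block4/tag`) and that "a 21 tag" means any tag whose name starts with 21; the variable name `sep` is purely illustrative:

```xslt
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The 50F tag is the demarcation between sequence A and sequence B -->
    <xsl:variable name="sep" select="/local/message/block4/tag[name = '50F']"/>
    <!-- Sequence A: nearest 21x tag before 50F -->
    <xsl:value-of select="$sep/preceding-sibling::tag[starts-with(name, '21')][1]/value"/>
    <xsl:text>,</xsl:text>
    <!-- Sequence B: nearest 21x tag after 50F -->
    <xsl:value-of select="$sep/following-sibling::tag[starts-with(name, '21')][1]/value"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample XML this should yield ALW1031,OUR without relying on absolute positions.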

Walk/loop through an XSL key: how?

Is there a way to walk-through a key and output all the values it contains?
<xsl:key name="kElement" match="Element/Element[@idref]" use="@idref" />
I thought of it this way:
<xsl:for-each select="key('kElement', '.')">
<li><xsl:value-of select="." /></li>
</xsl:for-each>
However, this does not work. I simply want to list all the values in a key for testing purposes.
The question is simply: how can this be done?
You can't. That's not what keys are for.
You can loop through every element in a key using a single call to key() if and only if the key of each element is the same.
If you need to loop over everything the key is defined over, you can use the expression in the match="..." attribute of your <key> element.
So if you had a file like this:
<root>
<element name="Bill"/>
<element name="Francis"/>
<element name="Louis"/>
<element name="Zoey"/>
</root>
And a key defined like this:
<xsl:key name="survivors" match="element" use="@name"/>
You can loop through what the key uses by using the contents of its match attribute:
<xsl:for-each select="element">
<!-- stuff -->
</xsl:for-each>
Alternatively, if each element had something in common:
<root>
<element name="Bill" class="survivor"/>
<element name="Francis" class="survivor"/>
<element name="Louis" class="survivor"/>
<element name="Zoey" class="survivor"/>
</root>
Then you could define your key like this:
<xsl:key name="survivors" match="element" use="@class"/>
And iterate over all elements like this:
<xsl:for-each select="key('survivors', 'survivor')">
<!-- stuff -->
</xsl:for-each>
Because each element shares the value "survivor" for the class attribute.
In your case, your key is
<xsl:key name="kElement" match="Element/Element[@idref]" use="@idref" />
So you can loop through everything it has like this:
<xsl:for-each select="Element/Element[@idref]">
<!-- stuff -->
</xsl:for-each>
You CAN create a key to use for looping - if you simply specify a constant in the use attribute of the key element:
<xsl:key name="survivors" match="element" use="'all'"/>
Then you can loop over all elements in the following way:
<xsl:for-each select="key('survivors','all')">
...
</xsl:for-each>
Or count them:
<xsl:value-of select="count(key('survivors','all'))"/>
Note that the constant can be any string or even a number - but 'all' reads well.
However, you cannot use this key to lookup information about the individual entries (because they all have the same key).
In other words there are two types of possible keys:
"lookup keys" = standard keys with varying indexes in the use attribute
"looping keys" = keys with a constant in the use attribute
I do not know how efficient this method is to execute, it does however make the maintenance of the XSL more efficient by avoiding repetition of the same (potentially very complex) XPath expression throughout the XSL code.
Rather than think of the XSL keys in programming language terms, think of them as record sets of SQL. That will give a better understanding. For a given key index created as
<xsl:key name="paths" match="path" use="keygenerator()"/>
it can be "iterated" or "walked through" as below
<xsl:for-each select="//path[generate-id()=generate-id(key('paths',keygenerator())[1])]">
To understand this magic number [1], let's go through the example below:
Consider this XML snippet
<root>
<Person>
<name>Johny</name>
<date>Jan10</date>
<cost itemID="1">34</cost>
<cost itemID="1">35</cost>
<cost itemID="2">12</cost>
<cost itemID="3">09</cost>
</Person>
<Person>
<name>Johny</name>
<date>Jan09</date>
<cost itemID="1">21</cost>
<cost itemID="1">41</cost>
<cost itemID="2">11</cost>
<cost itemID="2">14</cost>
</Person>
</root>
transformed using this XSL.
<xsl:for-each select="*/Person">
<personrecords>
<xsl:value-of select="generate-id(.)" />--
<xsl:value-of select="name"/>--
<xsl:value-of select="date"/>--
</personrecords>
</xsl:for-each>
<xsl:for-each select="*/*/cost">
<costrecords>
<xsl:value-of select="generate-id(.)" />--
<xsl:value-of select="../name"/>--
<xsl:value-of select="../date"/>--
<xsl:value-of select="@itemID"/>--
<xsl:value-of select="text()"/>
</costrecords>
</xsl:for-each>
The above XSL transformation lists the unique id of the Person nodes and the cost nodes in the form of idpxxxxxxx as the result below shows.
1. <personrecords>idp2661952--Johny--Jan10-- </personrecords>
2. <personrecords>idp4012736--Johny--Jan09-- </personrecords>
3. <costrecords>idp2805696--Johny-- Jan10-- 1-- 34</costrecords>
4. <costrecords>idp4013568--Johny-- Jan10-- 1-- 35</costrecords>
5. <costrecords>idp2808192--Johny-- Jan10-- 2-- 12</costrecords>
6. <costrecords>idp2808640--Johny-- Jan10-- 3-- 09</costrecords>
7. <costrecords>idp2609728--Johny-- Jan09-- 1-- 21</costrecords>
8. <costrecords>idp4011648--Johny-- Jan09-- 1-- 41</costrecords>
9. <costrecords>idp2612224--Johny-- Jan09-- 2-- 11</costrecords>
10.<costrecords>idp2610432--Johny-- Jan09-- 2-- 14</costrecords>
Let us create a key on the cost records using a combination of name and itemID values.
<xsl:key name="keyByNameItem" match="cost" use="concat(../name, '+', @itemID)"/>
Manually looking at the XML, the number of unique keys for the above would be three : Johny+1, Johny+2 and Johny+3.
Now lets test out this key by using the snippet below.
<xsl:for-each select="*/*/cost">
<costkeygroup>
<xsl:value-of select="generate-id(.)" />--
(1)<xsl:value-of select="generate-id(key('keyByNameItem',concat(../name, '+', @itemID) )[1] ) " />--
(2)<xsl:value-of select="generate-id(key('keyByNameItem',concat(../name, '+', @itemID) )[2] ) " />--
(3)<xsl:value-of select="generate-id(key('keyByNameItem',concat(../name, '+', @itemID) )[3] ) " />--
(4)<xsl:value-of select="generate-id(key('keyByNameItem',concat(../name, '+', @itemID) )[4] ) " />
</costkeygroup>
</xsl:for-each>
And here is the result:
1. <costkeygroup>idp2805696-- (1)idp2805696-- (2)idp4013568-- (3)idp2609728-- (4)idp4011648</costkeygroup>
2. <costkeygroup>idp4013568-- (1)idp2805696-- (2)idp4013568-- (3)idp2609728-- (4)idp4011648</costkeygroup>
3. <costkeygroup>idp2808192-- (1)idp2808192-- (2)idp2612224-- (3)idp2610432-- (4)</costkeygroup>
4. <costkeygroup>idp2808640-- (1)idp2808640-- (2)-- (3)-- (4)</costkeygroup>
5. <costkeygroup>idp2609728-- (1)idp2805696-- (2)idp4013568-- (3)idp2609728-- (4)idp4011648</costkeygroup>
6. <costkeygroup>idp4011648-- (1)idp2805696-- (2)idp4013568-- (3)idp2609728-- (4)idp4011648</costkeygroup>
7. <costkeygroup>idp2612224-- (1)idp2808192-- (2)idp2612224-- (3)idp2610432-- (4)</costkeygroup>
8. <costkeygroup>idp2610432-- (1)idp2808192-- (2)idp2612224-- (3)idp2610432-- (4)</costkeygroup>
Our interest is in understanding the meaning of [1], [2], [3] and [4]. In our case, the keygenerator is concat(../name, '+', @itemID).
For a given key value, [1] refers to the first occurrence of a node that produces that value. Similarly, [2] refers to the second occurrence of a node that produces the same value. Thus [2], [3], [4], etc. are all nodes with the same key value, and can be considered duplicates for that key value. The number of duplicates depends on the input XML. Thus:
Key Johny+1 matches 4 nodes (1)idp2805696-- (2)idp4013568-- (3)idp2609728-- (4)idp4011648
Key Johny+2 matches 3 nodes (1)idp2808192-- (2)idp2612224-- (3)idp2610432-- (4)
Key Johny+3 matches 1 node (1)idp2808640-- (2)-- (3)-- (4)
Thus we see that ALL 8 cost nodes of the XML can be accessed through the key.
Here is an image that combines the transformation results to help better understand.
The red squares indicate the matching nodes for Johny+1. The green squares indicate the matching nodes for Johny+3. Match the idpxxxxxxx values in <costkeygroup> to the values in <costrecords>. The <costrecords> help map the idpxxxxxxx values to the source XML.
The takeaway is that,
an XSL key does not filter or eliminate nodes. All nodes including duplicates can be accessed through the key. Thus when we say "walk through" of the key, there is no concept of a resultant subset of nodes from the original set of nodes made available to the key for processing.
To "walk through" only unique nodes of the key in the above example, use
<xsl:for-each select="*/*/cost[generate-id()=generate-id(key('keyByNameItem', concat(../name, '+', @itemID) )[1] ) ] ">
[1] signifies that the first record for a given key value is denoted as the unique record. [1] is almost always used because there will exist at least one node that satisfies a given key value. If we are sure that there will be a minimum of 2 records to satisfy each key value in the key, we can go ahead and use [2] to identify the second record in the record set as the unique record.
P.S The words nodes / records / elements are used interchangeably.
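As a compact illustration of the points above, the following sketch (assuming the Person/cost sample and the `keyByNameItem` key from this answer; the variable name `k` is illustrative) visits only the first node of each key value and reports how many nodes share it:

```xslt
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="keyByNameItem" match="cost"
           use="concat(../name, '+', @itemID)"/>

  <xsl:template match="/">
    <!-- Keep only the cost nodes that are the first node for their key value -->
    <xsl:for-each select="*/*/cost[generate-id() =
        generate-id(key('keyByNameItem', concat(../name, '+', @itemID))[1])]">
      <xsl:variable name="k" select="concat(../name, '+', @itemID)"/>
      <!-- All nodes sharing the key value remain reachable through key() -->
      <xsl:value-of select="concat($k, ': ', count(key('keyByNameItem', $k)), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample XML this should print Johny+1: 4, Johny+2: 3 and Johny+3: 1, matching the node counts listed earlier.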
There is no way to walk through the keys themselves, although we can output all the values a key contains. In XSLT 2.0 this is much easier than in XSLT 1.0 (where you would use generate-id() as in the previous answer).
Using fn:distinct-values
<xsl:variable name="e" select="."/>
<xsl:for-each select="distinct-values(Element/Element[@idref]/@idref)">
<li key="{.}"><xsl:value-of select="key('kElement', ., $e )" /></li>
</xsl:for-each>
Using xsl:for-each-group
<xsl:for-each-group select="Element/Element[@idref]" group-by="@idref">
<li key="{current-grouping-key()}"><xsl:value-of select="current-group()" /></li>
</xsl:for-each-group>