XSLT: remove all but the first occurrence of a given node - xslt

I have XML something like this:
<MyXml>
<RandomNode1>
<TheNode>
<a/>
<b/>
<c/>
</TheNode>
</RandomNode1>
<RandomNode2>
</RandomNode2>
<RandomNode3>
<RandomNode4>
<TheNode>
<a/>
<b/>
<c/>
</TheNode>
</RandomNode4>
</RandomNode3>
</MyXml>
Where <TheNode> appears throughout the XML but not at the same level, often deep within other nodes. What I need to do is eliminate all occurrences of <TheNode> EXCEPT the first. The rest are redundant and taking up space. What would be the XSL that could do this?
I have something like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*" />
</xsl:copy>
</xsl:template>
<xsl:template match="//TheNode[position()!=1]">
</xsl:template>
</xsl:stylesheet>
But that is not correct. Any suggestions?

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="TheNode[preceding::TheNode]"/>
</xsl:stylesheet>

//TheNode[position()!=1] does not work because here, position() is relative to the parent context of each <TheNode>. It would select all <TheNode>s which are not first within their respective parent.
But you were on the right track. What you meant was:
(//TheNode)[position()!=1]
Note the parentheses - they cause the predicate to be applied to the entire selected node-set, instead of to each node individually.
Unfortunately, even though this is a valid XPath expression, it is not valid as a match pattern. A match pattern must be meaningful (applicable) to an individual node; it cannot be an arbitrary select expression.
So @Alohci's solution,
//TheNode[preceding::TheNode]
is the correct way to express what you want.
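The difference between a per-parent position and a position within the whole node-set can be tried outside XSLT. The following is only a rough Python sketch using the standard library's xml.etree.ElementTree, whose limited XPath subset also evaluates a positional predicate relative to each parent. The variable names are made up for illustration, and list slicing stands in for (//TheNode)[position()!=1], which ElementTree cannot express directly.

```python
import xml.etree.ElementTree as ET

# A well-formed version of the sample document from the question.
DOC = """<MyXml>
  <RandomNode1><TheNode><a/><b/><c/></TheNode></RandomNode1>
  <RandomNode2/>
  <RandomNode3><RandomNode4><TheNode><a/><b/><c/></TheNode></RandomNode4></RandomNode3>
</MyXml>"""

root = ET.fromstring(DOC)

# Step-level predicate: "the first TheNode within its own parent".
# Both TheNode elements qualify, because each is the first one under its parent.
first_per_parent = root.findall(".//TheNode[1]")

# Whole-node-set position: collect every TheNode in document order,
# then skip the first. This mirrors (//TheNode)[position()!=1].
all_nodes = root.findall(".//TheNode")
all_but_first = all_nodes[1:]

print(len(first_per_parent))  # 2
print(len(all_but_first))     # 1
```

Because every TheNode here is the first under its parent, a predicate such as [position()!=1] applied at the step level matches nothing in this document, which is why the original template removed no nodes.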

Another approach for the pattern would be:
<xsl:template match="TheNode[generate-id()
!= generate-id(/descendant::TheNode[1])]"/>
Note: An absolute expression is more likely to be optimized than a relative expression such as preceding::TheNode.

Related

XSLT to remove duplicates completely

I know there are solutions plenty to remove duplicates, but this one is slightly different. I need to remove the element from the output if it is a duplicate.
Input:
<SanctionList>
<row>
<PersonId>1000628</PersonId>
<PersonId>1000634</PersonId>
<PersonId>1113918</PersonId>
<PersonId>1133507</PersonId>
<PersonId>1113918</PersonId>
</row>
</SanctionList>
Output expected:
<SanctionList>
<row>
<PersonId>1000628</PersonId>
<PersonId>1000634</PersonId>
<PersonId>1133507</PersonId>
</row>
</SanctionList>
Here is what I tried, but the parser returns 1 for each of the groups. Shouldn't it return 2 for PersonId 1113918, since it appears twice in the list?
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="SanctionList">
<xsl:for-each-group select="row" group-by="PersonId">
<xsl:text> Count for </xsl:text>
<xsl:value-of select="current-grouping-key()" />
<xsl:text> is </xsl:text>
<xsl:value-of select="count(current-group())" />
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
Thanks kindly!
I know there are solutions plenty to remove duplicates, but this one
is slightly different. I need to remove the element from the output if
it is a duplicate
Use this short and simple transformation (both in XSLT 2.0 and XSLT 1.0):
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kPersonByVal" match="PersonId" use="."/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="PersonId[key('kPersonByVal', .)[2]]"/>
</xsl:stylesheet>
when the transformation is applied on the provided XML document:
<SanctionList>
<row>
<PersonId>1000628</PersonId>
<PersonId>1000634</PersonId>
<PersonId>1113918</PersonId>
<PersonId>1133507</PersonId>
<PersonId>1113918</PersonId>
</row>
</SanctionList>
the wanted, correct result is produced:
<SanctionList>
<row>
<PersonId>1000628</PersonId>
<PersonId>1000634</PersonId>
<PersonId>1133507</PersonId>
</row>
</SanctionList>
Explanation:
A well-known design pattern for copying an existing XML document while deleting, replacing, or inserting some nodes in the copy is to override the identity rule.
In this particular case the task is to delete <PersonId> elements. This is done by providing a matching template with no (empty) body.
The criterion for deletion is that the element must have a duplicate -- that is, at least two <PersonId> elements must exist, having the same string value. This is most conveniently done using an <xsl:key> declaration and the key() function to get all elements with the same string value.
Finally, in the match pattern of the empty (deleting) template we check if the node-set of equally-valued elements has a second element.
Note: You can learn more about the <xsl:key> declaration and the key() function in module 9 of my Pluralsight training course "XSLT 2.0 and 1.0 foundations"
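The deletion criterion described in the explanation (drop an element when at least two elements share its string value) can be sketched outside XSLT as well. This is only an illustrative Python equivalent built on the standard library; the Counter plays the role of the xsl:key index, and the names counts and remaining are assumptions introduced for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DOC = """<SanctionList>
  <row>
    <PersonId>1000628</PersonId>
    <PersonId>1000634</PersonId>
    <PersonId>1113918</PersonId>
    <PersonId>1133507</PersonId>
    <PersonId>1113918</PersonId>
  </row>
</SanctionList>"""

root = ET.fromstring(DOC)

# Stand-in for <xsl:key>: count every PersonId by its string value.
counts = Counter(p.text for p in root.iter("PersonId"))

# Stand-in for the empty template: remove each PersonId whose value
# occurs at least twice anywhere in the document.
# The element list is materialized first so removal does not disturb iteration.
for parent in list(root.iter()):
    for child in list(parent):
        if child.tag == "PersonId" and counts[child.text] >= 2:
            parent.remove(child)

remaining = [p.text for p in root.iter("PersonId")]
print(remaining)  # ['1000628', '1000634', '1133507']
```

As in the stylesheet, every copy of a duplicated value is removed, not just the later ones.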

Adding new xml element and changing the value of an element within the same parent element

I need to make certain modifications to my XML input, depending on certain conditions. I am using XSLT 1.0.
The value of the message_type element (a child element of m_control) should be changed.
A new element message_status should be added (as a child of the m_control element).
These changes are reflected in the expected output XML. With my current XSLT code, I am only able to achieve the second requirement.
Input XML:
<?xml version="1.0"?>
<message xmlns="http://www.origoservices.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<m_control>
<control_timestamp>2013-04-12T09:24:38.902</control_timestamp>
<message_id>a50ec030-72ab</message_id>
<retry_number>0</retry_number>
<message_type>Request</message_type>
<message_version>test.XSD</message_version>
<expected_response_type>synchronous</expected_response_type>
<initiator_id>FST</initiator_id>
<initiator_orchestration_id>1637280</initiator_orchestration_id>
<responder_id>mycomp</responder_id>
</m_control>
<m_content>
<b_control>
<service_provider_reference_number>650971</service_provider_reference_number>
<intermediary_case_reference_number>Sample1</intermediary_case_reference_number>
<quote_type>Comparison</quote_type>
<quote_or_print>Print</quote_or_print>
<message_version_number>3.7</message_version_number>
<submission_date>0001-04-12</submission_date>
</b_control>
</m_content>
</message>
Expected Output:
<message xmlns="http://www.origoservices.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<m_control>
<control_timestamp>2013-04-12T09:24:38.902</control_timestamp>
<message_id>a50ec030-72ab</message_id>
<retry_number>0</retry_number>
<message_type>Response</message_type>
<message_version>test.XSD</message_version>
<expected_response_type>synchronous</expected_response_type>
<initiator_id>FST</initiator_id>
<initiator_orchestration_id>1637280</initiator_orchestration_id>
<responder_id>mycomp</responder_id>
<message_status>User not allowed access</message_status>
</m_control>
<m_content>
<b_control>
<service_provider_reference_number>650971</service_provider_reference_number>
<intermediary_case_reference_number>Sample1</intermediary_case_reference_number>
<quote_type>Comparison</quote_type>
<quote_or_print>Print</quote_or_print>
<message_version_number>3.7</message_version_number>
<submission_date>0001-04-12</submission_date>
<quote_response_status>Error</quote_response_status>
<quote_error_note>
<reason>[Error] Check if the User has access to the requested service</reason>
</quote_error_note>
</b_control>
</m_content>
</message>
XSLT code: Based on the value of the DataPower variable (var://service/error-message), I need to produce the expected output.
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:dp="http://www.datapower.com/extensions" version="1.0" extension-element-prefixes="dp" exclude-result-prefixes="dp">
<xsl:output method="xml" encoding="UTF-8"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="//*[contains(name(),'m_control')]">
<xsl:choose>
<xsl:when test="dp:variable('var://service/error-message') = 'not present'">
<m_control xmlns="http://www.origoservices.com">
<xsl:apply-templates select="@* | *"/>
<message_status>User not recognized</message_status>
</m_control>
</xsl:when>
</xsl:choose>
</xsl:template>
<xsl:template match="//*[contains(name(),'b_control')]">
<xsl:choose>
<xsl:when test="dp:variable('var://service/error-subcode')='0x01d30002'">
<b_control xmlns="http://www.origoservices.com">
<xsl:apply-templates select="@* | *"/>
<quote_response_status>Error</quote_response_status>
<quote_error_note>
<reason>[Error] Check if the User has access to the requested service</reason>
</quote_error_note>
</b_control>
</xsl:when>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
The following stylesheet meets both of your requirements. It does a common identity transform (which your XSLT does, too) with exceptions.
Note that I have not taken into consideration any changes that are performed by your stylesheet but not listed as a requirement (i.e. changing quote_error_note and quote_response_status).
This line:
<xsl:template match="text()[parent::ori:message_type]">
meets your first requirement, the one you were unable to code. It matches the text content of message_type and outputs "Response" instead.
But this solution differs from yours in another way: it does not match elements along the lines of:
<xsl:template match="//*[contains(name(),'m_control')]">
Rather, their correct namespace is identified:
<xsl:template match="ori:m_control">
Now, what's the difference? Your way of describing the template match allows elements of any namespace to be matched. This might not be a problem in your case (no conflicting namespaces) but it could be one in general.
Full stylesheet
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ori="http://www.origoservices.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
exclude-result-prefixes="ori xsi">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ori:m_control">
<xsl:copy>
<xsl:apply-templates/>
<message_status xmlns="http://www.origoservices.com">
<xsl:text>User not allowed access</xsl:text>
</message_status>
</xsl:copy>
</xsl:template>
<xsl:template match="text()[parent::ori:message_type]">
<xsl:text>Response</xsl:text>
</xsl:template>
</xsl:stylesheet>

Removing empty tags within a variable

My question is related to another poster's StackOverflow question on Two Phase Processing. I didn't want to use mode="#all" without fully understanding it and how it could affect the rest of my XSLT. I think the code below accomplishes the same thing without risking interference with other templates, but I would like confirmation. It seems as though I am processing $completepolicy twice without needing to.
Empty tag definition: <field/> <field></field>. Tags can have attributes but there will never be an empty tag that has an attribute. There will also never be nodes with <field> </field> where the white space could represent many other things.
Given this XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<!-- many other apply-templates here -->
<xsl:variable name="completepolicy" as="element()">
<holder>
<TABLE1 type="global">
<col1>Red</col1>
<col2/>
</TABLE1>
<TABLE2>
<field1>Blue</field1>
<field2/>
</TABLE2>
</holder>
</xsl:variable>
<xsl:apply-templates mode="emptytags" select="$completepolicy/*"/>
</xsl:template>
<xsl:template match="*[not(node())]" mode="emptytags"/>
<xsl:template match="node() | @*" mode="emptytags">
<xsl:copy>
<xsl:apply-templates select="node() | @*" mode="#current"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Results in this output for $completepolicy:
<TABLE1 type="global">
<col1>Red</col1>
</TABLE1>
<TABLE2>
<field1>Blue</field1>
</TABLE2>
Why do you think the $completepolicy variable is being processed twice? This cannot be seen in the provided code.
I confirm that the provided code looks good to me.
I would recommend never to use mode="#all". This is too powerful and dangerous -- this is almost never needed.

Replacing particular String using xsl

I have a block as below.
<rightOperand>.*ifIndedx.*</rightOperand>
But I need to change the above snippet to the one below:
<rightOperand>(?i)(?s).*ifIndex.*</rightOperand>
This translation needs to be done only when the right operand starts and ends with the string .*
Please provide me with some pointers.
You can do this by overriding the identity transform with an extra template that matches only the text within rightOperand that meets your criteria:
<xsl:template match="rightOperand/text()
[starts-with(., '.*')]
[substring(., string-length(.) - 1, 2) = '.*']">
Note that XSLT 1.0 does not have the ends-with function, which is why there is the extra work of checking the ending with substring. If you were using XSLT 2.0, you could simplify this with ends-with().
Here is the full XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="rightOperand/text()
[starts-with(., '.*')]
[substring(., string-length(.) - 1, 2) = '.*']">
<xsl:text>(?i)(?s)</xsl:text><xsl:copy />
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When applied to your sample XML, the following is output:
<rightOperand>(?i)(?s).*ifIndedx.*</rightOperand>
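The substring arithmetic that stands in for ends-with can be checked by hand. The following Python sketch emulates the XPath 1.0 substring() function for integer arguments only, using 1-based positions; the helper names xpath_substring and ends_with_dot_star are hypothetical and exist only for this illustration.

```python
def xpath_substring(s: str, start: int, length: int) -> str:
    """Emulate XPath 1.0 substring() for integer arguments (1-based positions)."""
    # XPath keeps the characters whose position p satisfies
    # start <= p < start + length, where the first character has position 1.
    first = max(start, 1)
    end = start + length  # exclusive upper bound, still 1-based
    return s[first - 1 : max(end - 1, first - 1)]


def ends_with_dot_star(s: str) -> bool:
    # Same test as the predicate: substring(., string-length(.) - 1, 2) = '.*'
    return xpath_substring(s, len(s) - 1, 2) == ".*"


print(ends_with_dot_star(".*ifIndedx.*"))  # True
print(ends_with_dot_star(".*ifIndex"))     # False
```

Starting at string-length minus 1 and taking 2 characters selects exactly the last two characters, so the comparison behaves like an ends-with test for a two-character suffix.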

I want to select all texts (including node names) under a node

I currently have a xml file like this:
<aaa>
<b>I am a <i>boy</i></b>.
</aaa>
How can I get the exact string as: <b>I am a <i>boy</i></b>.? Thanks.
You have to tell XSLT that you want to copy elements through as well. That can be done with an additional rule. Note that I use custom select clauses on my apply-templates elements to select attributes as well as all node-type objects. Also note that the rule for aaa takes precedence, and does not copy the aaa element itself to the output.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="aaa">
<xsl:apply-templates select="@*|node()"/>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
<aaa>
<b>I am a <i>boy</i></b>.
</aaa>
How can I get the exact string as:
<b>I am a <i>boy</i></b>.?
The easiest/shortest way to do this in your case is to output the result of the following XPath expression:
/*/node()
This means: "Select all nodes that are children of the top element."
Of course, there are some white-space-only text nodes that we don't want selected, but XSLT can take care of this, so the XPath expression is just as simple as shown above.
Now, to get the result with an XSLT transformation, we use the following:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="/*/node()"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document, the wanted result is produced:
<b>I am a <i>boy</i></b>.
Do note:
The use of the <xsl:copy-of> XSLT instruction (not <xsl:value-of>), which copies nodes, not string values.
The use of the <xsl:strip-space elements="*"/> XSLT instruction, directing the XSLT processor to ignore any white-space-only text node in the XML document.