Uncomment XML content with XSLT

I have a problem with XML and XSLT.
I have an XML file that contains a commented-out element, and I want to uncomment it.
For example:
<my-app>
<name>
</name>
<!-- <class>
<line></line>
</class>-->
</my-app>
I want to uncomment this commented tag.

<!-- the identity template copies everything
(unless more specific templates apply) -->
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*" />
</xsl:copy>
</xsl:template>
<!-- this template matches comments and uncomments them -->
<xsl:template match="comment()">
<xsl:value-of select="." disable-output-escaping="yes" />
</xsl:template>
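Be aware that disable-output-escaping="yes" writes the comment text to the output as-is, so the result is only well-formed if the comment contents are themselves well-formed XML. It is also an optional feature that takes effect only when the processor serializes the result itself.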

If you use Saxon, saxon:parse() is cleaner, because it creates a real XML structure.
saxon:parse($xml as xs:string) ==> document-node()
In the xsl:stylesheet element, add xmlns:saxon="http://saxon.sf.net/".
Example:
<xsl:template match="comment()">
<xsl:variable name="comment" select="saxon:parse(.)" as="document-node()"/>
<xsl:copy-of select="$comment"/>
</xsl:template>
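If a processor that supports XSLT 3.0 is available (an assumption; the question does not say which processor is in use), the standard XPath function parse-xml-fragment() can play the same role without a vendor extension. A minimal sketch of a complete stylesheet along those lines:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template: copy everything not matched more specifically -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>

  <!-- parse the comment text as an XML fragment and copy the resulting nodes;
       this raises an error if the comment text is not well-formed -->
  <xsl:template match="comment()">
    <xsl:copy-of select="parse-xml-fragment(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

As with saxon:parse(), the comment is turned into real nodes, so the processor reports malformed comment contents instead of silently writing broken output.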

Related

XSL copy without values is it possible?

I want to compare two XML files.
1. First compare the XML structure/schema.
2. Then compare the values.
I am using the Beyond Compare tool for the comparison. Because the two files contain different values, the comparison report shows many differences that I am not interested in, since my focus for now is only on the structure/schema.
I tried to copy the files with the following template, and others as well, but the values are copied every time.
From what I found on Google, xsl:copy copies everything for the selected node/element.
Is there any way to filter out the values so that only the structure is copied?
My Data :
<root>
<Child1>xxxx</Child1>
<Child2>yyy</Child2>
<Child3>
<GrandChild1>dddd</GrandChild1>
<GrandChild2>erer</GrandChild2>
</Child3>
</root>
Template used :
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<!-- for all elements (tags) -->
<xsl:template match="*">
<!-- create a copy of the tag (without attributes and children) in the output -->
<xsl:copy>
<!-- For all attributes of the current tag -->
<xsl:for-each select="@*">
<xsl:sort select="name( . )" order="ascending" case-order="lower-first" />
<xsl:copy/>
</xsl:for-each>
<!-- recurse through all child tags -->
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()|comment()|processing-instruction()">
<xsl:copy/>
</xsl:template>
Required output:
Something like:
<root>
<Child1></Child1>
<Child2></Child2>
<Child3>
<GrandChild1></GrandChild1>
<GrandChild2></GrandChild2>
</Child3>
</root>
At the moment, you have a template matching text() that copies it. Remove text() from that template's match, and add a separate, empty template that matches only non-whitespace text, so that such text is dropped.
<xsl:template match="comment()|processing-instruction()">
<xsl:copy/>
</xsl:template>
<xsl:template match="text()[normalize-space()]" />
Whitespace-only text nodes (such as indentation) are not matched by this template, so XSLT's built-in template rules copy them to the output.
For attributes, use xsl:attribute to create a new attribute without a value, rather than xsl:copy, which copies the whole attribute.
<xsl:attribute name="{name()}" />
Note the use of Attribute Value Templates (the curly braces) to indicate the expression is to be evaluated to get the string to use.
Try this XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- for all elements (tags) -->
<xsl:template match="*">
<!-- create a copy of the tag (without attributes and children) in the output -->
<xsl:copy>
<!-- For all attributes of the current tag -->
<xsl:for-each select="@*">
<xsl:sort select="name( . )" order="ascending" case-order="lower-first" />
<xsl:attribute name="{name()}" />
</xsl:for-each>
<!-- recurse through all child tags -->
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="comment()|processing-instruction()">
<xsl:copy/>
</xsl:template>
<xsl:template match="text()[normalize-space()]" />
</xsl:stylesheet>
Also note that attributes are considered to be unordered in XML, so although you have code to sort the attributes, and they probably will appear in the right order, you can't guarantee it.

how to exclude footer element content from body using xslt 1.0

I have an HTML page like the example below:
<html><head><title>test</title></head><body><div>test1</div><footer><div>test2</div></footer></body></html>
I have written an XSLT 1.0 stylesheet to transform it and extract the title and body content, but my requirement is to ignore only the footer content and use all other element values inside the body. How can I achieve this?
<xsl:template match="/">
<document >
<xsl:copy-of select="@*" />
<xsl:apply-templates select="html/head" />
<xsl:apply-templates select="html/body" />
</document>
</xsl:template>
<xsl:template match="html/head">
<content name="title">
<xsl:value-of select="title" />
</content>
</xsl:template>
<xsl:template match="html/body">
<content name="snippet">
<xsl:value-of select="viv:replace(viv:replace(.,'<[^>]*>',' ', 'gi'),'&nbsp;','','gi')"/>
</content>
</xsl:template>
Q: how to exclude footer element content from body using xslt 1.0
If this is really your question, it has been answered a hundred times.
Start with an identity transform and add empty templates for the elements to ignore.
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="body/footer"/>
Looking at your XSLT, let us assume some strange_other_things are required.
To do strange_other_things for the body without the footer, put the result of the identity transform into a variable.
<xsl:template match="body" mode="strange_other_things">
<xsl:variable name="body" >
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:variable>
<!-- use $body but I'm out here -->
</xsl:template>
Further guess: with viv:replace(.,'<[^>]*>',' ', 'gi') you are trying to remove XML element names. This will not work, because . is used in a text context and will only return the text inside the current node.
So if I'm right, the question is a little deceptive.
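If the actual goal is the text of the body with the footer text left out (an assumption about what the question wants), plain XSLT 1.0 can select the text nodes directly and skip those that have a footer ancestor. A minimal sketch, reusing the document/content element names from the question:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <document>
      <content name="title">
        <xsl:value-of select="html/head/title"/>
      </content>
      <content name="snippet">
        <!-- walk the text nodes under body, skipping any inside a footer -->
        <xsl:for-each select="html/body//text()[not(ancestor::footer)]">
          <xsl:value-of select="."/>
          <!-- separate the pieces with a space, as the regex in the question did -->
          <xsl:text> </xsl:text>
        </xsl:for-each>
      </content>
    </document>
  </xsl:template>
</xsl:stylesheet>
```

Because only text nodes are selected, no tag stripping with a regular expression is needed.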

Remove all elements with a specific namespace in xslt

I have an XML document in which I'd like to change the namespace of most elements, remove some specific elements by name, and also remove elements that are in a specific namespace. An example of such a document:
<root xmlns="somenamespace">
<elem1>sometext</elem1>
<ns0:elem2 xmlns:ns0="othernamespace">
<ns1:elem3 xmlns:ns1="thirdnamespace" />
</ns0:elem2>
<elem4>sometext</elem4>
</root>
I am trying to use the following xslt:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*[namespace-uri() = 'somenamespace']">
<xsl:choose>
<!-- change element name from root to root2 -->
<xsl:when test="local-name(.)='root'">
<xsl:element name="root2" namespace="mynamespace">
<xsl:apply-templates select="@* | node()" />
</xsl:element>
</xsl:when>
<!-- skip these elements that are not in root2 -->
<xsl:when test="local-name(.)='elem1'" />
<xsl:when test="namespace-uri()='othernamespace'" />
<!-- Copy other elements -->
<xsl:otherwise>
<xsl:element name="{name()}" namespace="mynamespace">
<xsl:apply-templates select="@* | node()" />
</xsl:element>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- Copy the rest -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The output xml should be
<root2 xmlns="mynamespace">
<elem4>sometext</elem4>
</root2>
However the result is
<root2 xmlns="mynamespace" xmlns:ns0="othernamespace">
<ns0:elem2>
<ns1:elem3 xmlns:ns1="thirdnamespace" />
</ns0:elem2>
<elem4>sometext</elem4>
</root2>
It seems that most elements of the xslt are working except the one that is supposed to remove all elements of a specific namespace. Is there anything wrong in the xslt above?
Your first template is only matching elements in the somenamespace namespace. The other namespaces (othernamespace,thirdnamespace) are matched by the identity transform (last template) and are output as-is.
To strip all elements that aren't in the somenamespace namespace, add this template:
<xsl:template match="*[not(namespace-uri()='somenamespace')]" priority="1"/>
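The same rules can also be written as separate templates instead of one xsl:choose. The following is only a sketch of that layout; the prefix s bound to somenamespace is a hypothetical name introduced for the match patterns and does not appear in the question.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s="somenamespace"
    exclude-result-prefixes="s">

  <!-- rename root to root2 and move it into the new namespace -->
  <xsl:template match="s:root">
    <xsl:element name="root2" namespace="mynamespace">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>

  <!-- drop elem1 -->
  <xsl:template match="s:elem1"/>

  <!-- move every other element of the source namespace into the new one -->
  <xsl:template match="s:*">
    <xsl:element name="{local-name()}" namespace="mynamespace">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>

  <!-- drop elements in any other namespace, together with their subtrees -->
  <xsl:template match="*[not(namespace-uri() = 'somenamespace')]"/>

  <!-- copy attributes, text, comments and processing instructions -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The default priorities already order these rules: the named patterns s:root and s:elem1 win over s:*, and the predicate pattern wins over the identity template, so no explicit priority attribute is needed in this layout.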

how to merge element using xslt?

I have a reference-type paragraph that contains emph elements.
Example
Input file:
<reference>
<emph type="bold">Antony</emph><emph type="bold">,</emph> <emph type="bold">R.</emph>
<emph type="bold">and</emph> <emph type="bold">Micheal</emph><emph type="bold">,</emph> <emph type="bold">V.</emph>
<emph type="italic">reference title</emph></reference>
Output received now:
<p class="reference"><strong>Antony</strong><strong>,</strong> <strong>R.</strong>
<strong>and</strong> <strong>Micheal</strong><strong>,</strong> <strong>V.</strong>
<em>reference title</em></p>
Required output file:
<p class="reference"><strong>Antony, R. and Micheal, V.</strong> <em>reference title</em></p>
My xslt scripts:
<xsl:template match="reference">
<p class="reference"><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="emph">
<xsl:if test="@type='bold'">
<strong><xsl:apply-templates/></strong>
</xsl:if>
<xsl:if test="@type='italic'">
<em><xsl:apply-templates/></em>
</xsl:if>
</xsl:template>
What needs to be corrected in the XSLT so that the <strong> element appears only once, as in the required output?
Can anyone please advise?
By,
Antny.
This is an XSLT 1.0 solution:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:output method="xml" encoding="utf-8" />
<!-- the identity template copies everything verbatim -->
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*" />
</xsl:copy>
</xsl:template>
<!-- this matches the first <emph> nodes of their kind in a row -->
<xsl:template match="emph[not(@type = preceding-sibling::emph[1]/@type)]">
<xsl:variable name="elementname">
<xsl:choose>
<xsl:when test="@type='bold'">strong</xsl:when>
<xsl:when test="@type='italic'">em</xsl:when>
</xsl:choose>
</xsl:variable>
<xsl:if test="$elementname != ''">
<!-- the first preceding node with a different type is the group separator -->
<xsl:variable
name="boundary"
select="generate-id(preceding-sibling::emph[@type != current()/@type][1])"
/>
<xsl:element name="{$elementname}">
<!-- select all <emph> nodes of the row with the same type... -->
<xsl:variable
name="merge"
select=". | following-sibling::emph[
@type = current()/@type
and
generate-id(preceding-sibling::emph[@type != current()/@type][1]) = $boundary
]"
/>
<xsl:apply-templates select="$merge" mode="text" />
</xsl:element>
</xsl:if>
</xsl:template>
<!-- default: keep <emph> nodes out of the identity template mechanism -->
<xsl:template match="emph" />
<!-- <emph> nodes get their special treatment here -->
<xsl:template match="emph" mode="text">
<!-- effectively, this copies the text node via the identity template -->
<xsl:apply-templates />
<!-- copy the first following node - if it is a text node
(this is to get interspersed spaces into the output) -->
<xsl:if test="
generate-id(following-sibling::node()[1])
=
generate-id(following-sibling::text()[1])
">
<xsl:apply-templates select="following-sibling::text()[1]" />
</xsl:if>
</xsl:template>
</xsl:stylesheet>
It results in:
<reference>
<strong>Antony, R. and Micheal, V.</strong>
<em>reference title</em>
</reference>
I'm not overly happy with
<xsl:variable
name="merge"
select=". | following-sibling::emph[
@type = current()/@type
and
generate-id(preceding-sibling::emph[@type != current()/@type][1]) = $boundary
]"
/>
if someone has a better idea, please tell me.
Here is my method, which uses recursive calls of a template to match elements with the same type.
It first matches the first 'emph' element, and then recursively calls a template matching 'emph' elements of the same type. Next, it repeats the process for the next 'emph' element whose type differs from the one currently matched.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" encoding="utf-8"/>
<!-- Match root element -->
<xsl:template match="reference">
<p class="reference">
<!-- Match first emph element -->
<xsl:apply-templates select="emph[1]"/>
</p>
</xsl:template>
<!-- Used to match first occurrence of an emph element for any type -->
<xsl:template match="emph">
<xsl:variable name="elementname">
<xsl:if test="@type='bold'">strong</xsl:if>
<xsl:if test="@type='italic'">em</xsl:if>
</xsl:variable>
<xsl:element name="{$elementname}">
<xsl:apply-templates select="." mode="match">
<xsl:with-param name="type" select="@type"/>
</xsl:apply-templates>
</xsl:element>
<!-- Find next emph element with a different type -->
<xsl:apply-templates select="following-sibling::emph[@type!=current()/@type][1]"/>
</xsl:template>
<!-- Used to match emph elements of a specific type -->
<xsl:template match="*" mode="match">
<xsl:param name="type"/>
<xsl:if test="@type = $type">
<xsl:value-of select="."/>
<xsl:apply-templates select="following-sibling::*[1]" mode="match">
<xsl:with-param name="type" select="$type"/>
</xsl:apply-templates>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Where this currently fails, though, is that it drops the whitespace between the 'emph' elements.
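If XSLT 2.0 is an option (an assumption; both answers above target 1.0), xsl:for-each-group with group-adjacent expresses the merge more directly. The sketch below assumes every emph carries a type attribute and that the only types are bold and italic; anything that is not bold is written as em.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="reference">
    <p class="reference">
      <!-- group only the emph children; each run of equal @type forms one group -->
      <xsl:for-each-group select="emph" group-adjacent="@type">
        <xsl:variable name="last" select="current-group()[last()]"/>
        <xsl:element name="{if (current-grouping-key() = 'bold') then 'strong' else 'em'}">
          <xsl:for-each select="current-group()">
            <xsl:value-of select="."/>
            <!-- keep text that sits between two members of the same run -->
            <xsl:if test="not(. is $last)">
              <xsl:value-of select="following-sibling::node()[1][self::text()]"/>
            </xsl:if>
          </xsl:for-each>
        </xsl:element>
        <!-- text directly after the run goes outside the merged element -->
        <xsl:value-of select="$last/following-sibling::node()[1][self::text()]"/>
      </xsl:for-each-group>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

The text node that directly follows each emph is written either inside the merged element (between members of a run) or after it (at the end of a run), which is how the spaces between the original elements are kept.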

How to remove <b/> from a document

I'm trying to have an XSLT that copies most of the tags but removes empty "<b/>" tags. That is, it should copy as-is "<b> </b>" or "<b>toto</b>" but completely remove "<b/>".
I think the template would look like :
<xsl:template match="b">
<xsl:if test=".hasChildren()">
<xsl:element name="b">
<xsl:apply-templates/>
</xsl:element>
</xsl:if>
</xsl:template>
But of course, the "hasChildren()" part doesn't exist ... Any idea ?
dsteinweg put me on the right track ... I ended up doing :
<xsl:template match="b">
<xsl:if test="./* or ./text()">
<xsl:element name="b">
<xsl:apply-templates/>
</xsl:element>
</xsl:if>
</xsl:template>
This transformation ignores any <b> elements that do not have any node child. A node in this context means an element, text, comment or processing instruction node.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="b[not(node())]"/>
</xsl:stylesheet>
Notice that here we use one of the most fundamental XSLT design patterns -- using the identity transform and overriding it for specific nodes.
The overriding template will be selected only for nodes that are elements named "b" and do not have (any nodes as) children. This template is empty (does not have any contents), so the effect of its application is that the matching node is ignored/discarded and is not reproduced in the output.
This technique is very powerful and is widely used for such tasks, and also for renaming, changing the contents or attributes of, or adding children or siblings to any specific node that can be matched (every type of node except a namespace node can be used as a match pattern in the "match" attribute of <xsl:template/>).
Hope this helped.
Cheers,
Dimitre Novatchev
I wonder if this will work?
<xsl:template match="b">
<xsl:if test="b/text()">
...
See if this will work.
<xsl:template match="b">
<xsl:if test=".!=''">
<xsl:element name="b">
<xsl:apply-templates/>
</xsl:element>
</xsl:if>
</xsl:template>
An alternative would be to do the following:
<xsl:template match="b[not(text())]" />
<xsl:template match="b">
<b>
<xsl:apply-templates/>
</b>
</xsl:template>
You could put all the logic in the predicate, and set up a template to match only what you want and delete it:
<xsl:template match="b[not(node())]" />
This assumes that you have an identity template later on in the transform, which it sounds like you do. That will automatically copy any "b" tags with content, which is what you want:
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
Edit: Now uses node(), as in Dimitre's answer.
If you are able to update the original XML, you could try using xml:space="preserve" on the root element:
<html xml:space="preserve">
...
</html>
This way, the space in the empty <b> </b> tag is preserved, and so can be distinguished from <b /> in the XSLT.
<xsl:template match="b">
<xsl:if test="text() != ''">
....
</xsl:if>
</xsl:template>