How to convert and move XML comment nodes to element nodes in a different location - xslt

I have a document that contains comment nodes in a variety of locations. I want to move these comments to a single new location in the document, and convert them to p elements.
I am an XSLT beginner, and with the help of W3Schools and Stack Overflow I’ve been able to convert these comments and put them in the correct location. However, the converted comments are copied, not moved, so they also stay in their original locations.
For example, given the following input:
<?xml version="1.0" encoding="UTF-8"?>
<!--Comment before root element-->
<concept>
<!--Comment element is parent-->
<title>Test Topic</title>
<shortdesc>This is a shortdesc element <!-- shortdesc element is parent --></shortdesc>
<conbody>
<!-- Conbody element is parent -->
<p>This is para 1 </p>
<section>
<!--Section element is parent; comment is before title-->
<title>Section 1</title>
<!--Section element is parent; comment is after title-->
<p>This is para 1 in section1</p>
</section>
</conbody>
</concept>
I want the following output:
<?xml version="1.0" encoding="UTF-8"?>
<concept>
<title>Test Topic</title>
<shortdesc>This is a shortdesc element </shortdesc>
<conbody>
<p>This is para 1 </p>
<section>
<title>Section 1</title>
<p>This is para 1 in section1</p>
</section>
<section outputclass="authorNote">
<p>Comment before root element </p>
<p>Comment element is parent</p>
<p>shortdesc element is parent</p>
<p>Conbody element is parent</p>
<p>Section element is parent; comment is before title</p>
<p>Section element is parent; comment is after title</p>
</section>
</conbody>
</concept>
This is my stylesheet:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" encoding="UTF-8"/>
<xsl:template match="/">
<xsl:copy>
<xsl:apply-templates select="node() | @*" />
</xsl:copy>
</xsl:template>
<!-- Create new section to hold converted comments -->
<xsl:template match="conbody" >
<xsl:copy>
<xsl:apply-templates/>
<section outputclass="authorNote">
<xsl:apply-templates select="//comment()"/>
</section>
</xsl:copy>
</xsl:template>
<!-- Convert comment nodes to p elements -->
<xsl:template match="//comment()">
<p><xsl:value-of select="."/></p>
</xsl:template>
</xsl:stylesheet>
Which produces the following output:
<?xml version="1.0" encoding="UTF-8"?>
<p>Comment before root element </p>
<concept>
<p>Comment element is parent</p>
<title>Test Topic</title>
<shortdesc>This is a shortdesc element <p>shortdesc element is parent</p></shortdesc>
<conbody>
<p>Conbody element is parent</p>
<p>This is para 1 </p>
<section>
<p>Section element is parent; comment is before title</p>
<title>Section 1</title>
<p>Section element is parent; comment is after title</p>
<p>This is para 1 in section1</p>
</section>
<section outputclass="authorNote">
<p>Comment before root element </p>
<p>Comment element is parent</p>
<p>shortdesc element is parent</p>
<p>Conbody element is parent</p>
<p>Section element is parent; comment is before title</p>
<p>Section element is parent; comment is after title</p>
</section>
</conbody>
</concept>
What am I doing wrong? The articles I have found so far are all about manipulating elements, rather than comment nodes. Can someone give me some tips on how to address this problem?

Separate the two tasks with modes: suppress the comments in the default mode, so they are not copied to their original positions, and convert them in a separate mode:
<!-- suppress treatment of comments through default mode -->
<xsl:template match="comment()"/>
<!-- Create new section to hold converted comments -->
<xsl:template match="conbody" >
<xsl:copy>
<xsl:apply-templates/>
<section outputclass="authorNote">
<xsl:apply-templates select="//comment()" mode="transform"/>
</section>
</xsl:copy>
</xsl:template>
<!-- Convert comment nodes to p elements -->
<xsl:template match="comment()" mode="transform">
<p><xsl:value-of select="."/></p>
</xsl:template>
Online sample at http://xsltransform.net/93dEHGn.
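Put together, a complete stylesheet might look like the sketch below. It assumes the input contains a single conbody element (otherwise each conbody would receive every comment in the document) and adds the identity template that the snippet above relies on but does not show. The comment-suppressing template gets an explicit priority, because comment() and node() share the same default priority (-0.5) and would otherwise conflict. normalize-space() trims the spaces inside comments such as <!-- shortdesc element is parent -->, which the expected output omits.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Identity template: copy elements, attributes and text unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Default mode: drop every comment from its original position.
       priority="1" beats the identity template's node() (priority -0.5). -->
  <xsl:template match="comment()" priority="1"/>

  <!-- Copy conbody, then append one section holding all comments -->
  <xsl:template match="conbody">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <section outputclass="authorNote">
        <xsl:apply-templates select="//comment()" mode="transform"/>
      </section>
    </xsl:copy>
  </xsl:template>

  <!-- "transform" mode: turn a comment into a p with trimmed text -->
  <xsl:template match="comment()" mode="transform">
    <p><xsl:value-of select="normalize-space(.)"/></p>
  </xsl:template>
</xsl:stylesheet>
```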

Related

Strip all the text from a specific node and remove all tags from XML using XSLT 1.0

I'm trying to strip all tags from an XML document, and I need to strip all text from one specific node only. For more clarity, see the example below:
<root>
<p>My 1st Semester Visual</p>
<p>
<b>Self Reflection</b>
</p>
<p>The activity</p>
<content-block>
<div class="imageWrapper" />
</content-block>
<p id="5fce699db97470099ea6c7e6"> </p>
<content-block>
<div class="carousel">
<div class="carouselHeader" />
<div class="carouselNavbar">
<div class="carouselNavbarThumbnails" />
</div>
</div>
My Space Unit Flyer
</content-block>
<div>
<br />
</div>
</root>
Current result:
<root><text>My 1st Semester VisualSelf ReflectionThe activity
My Space Unit Flyer
</text><contentBlocks>2</contentBlocks></root>
Expected result (I also need to remove the text that is inside <content-block>):
<root><text>My 1st Semester VisualSelf ReflectionThe activity
</text><contentBlocks>2</contentBlocks></root>
My xslt:
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" encoding="UTF-8" indent="no" omit-xml-declaration="yes"/>
<!-- Strip out white space -->
<xsl:strip-space elements="*"/>
<!-- Strip out all html tags, only leaving text contents -->
<xsl:template match="*">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="root">
<root>
<text>
<xsl:apply-templates/>
</text>
<contentBlocks>
<xsl:if test="//content-block">
<xsl:value-of select="count(//content-block)"/>
</xsl:if>
<xsl:if test="figure">
<xsl:value-of select="count(figure)"/>
</xsl:if>
</contentBlocks>
</root>
</xsl:template>
</xsl:transform>
Thanks in advance
<!-- Add this to your code. It suppresses content-block. -->
<xsl:template match="content-block"/>
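For context, this is where the empty template fits into the question's stylesheet. Because the pattern content-block (default priority 0) is more specific than * (priority -0.5), it wins for content-block elements, so neither the element nor any text inside it is output. The count(//content-block) in the root template queries the source document, not the output, so it should still report 2.

```xml
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8" indent="no" omit-xml-declaration="yes"/>
  <!-- Strip out whitespace-only text nodes -->
  <xsl:strip-space elements="*"/>

  <!-- Strip out all tags, only leaving text contents -->
  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Suppress content-block together with all text inside it -->
  <xsl:template match="content-block"/>

  <xsl:template match="root">
    <root>
      <text>
        <xsl:apply-templates/>
      </text>
      <contentBlocks>
        <!-- count() reads the source tree, so suppressed blocks are still counted -->
        <xsl:if test="//content-block">
          <xsl:value-of select="count(//content-block)"/>
        </xsl:if>
        <xsl:if test="figure">
          <xsl:value-of select="count(figure)"/>
        </xsl:if>
      </contentBlocks>
    </root>
  </xsl:template>
</xsl:transform>
```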

Apply templates not working as desired

Given the following xml inputs:
file1:
<?xml version="1.0" encoding="UTF-8"?>
<File1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<code code="file1_code" displayName="file1_display" codeSystem="file1_cs" codeSystemName="file1_csn"/>
<title>Title of file1</title>
<component typeCode="COMP">
<structuredBody classCode="DOCBODY">
<component typeCode="COMP">
<section>
<templateId root="someRoot_file1" assigningAuthorityName="someAuhthority_file1"/>
<code code="file1-sec1_code" displayName="file1_sec1_display" codeSystem="file1_sec1_cs" codeSystemName="file1_sec1_csn"/>
<title>Tile of sec 1 from file1</title>
<text>
<content styleCode="Italics">
Text of sec 1 from file1
</content>
</text>
<entry> file 1 sec 1
</entry>
</section>
</component>
<component typeCode="COMP">
<section classCode="DOCSECT">
<code code="file1_sec2_code" codeSystem="file2_sec2_cs" displayName="file2_sec2_display" codeSystemName="file2_sec2_csn"/>
<title>Tile from sec 2 file 1</title>
<text>
<content styleCode="Italics">
Text from file1 sec 2
</content>
</text>
<entry typeCode="test"> file2 sec 2
</entry>
</section>
</component>
</structuredBody>
</component>
</File1>
file2:
<?xml version="1.0"?>
<A>
<title value="Title of file2"/>
<text>
<status value="generated"/>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>File 2 Text</p>
</div>
</text>
<section>
<code>
<coding>
<system value="sec 1 file2 sys"/>
<code value="sec 1 file 2 code"/>
<display value="sec 1 file 2 display"/>
</coding>
</code>
<title>Title of sec 1 file2</title>
<text>
<content styleCode="Italics">Section 1 Text
</content>
</text>
<entry>
<someEntry>
</someEntry>
</entry>
</section>
<section>
<code>
<coding>
<system value="sec 2 file2 sys"/>
<code value="sec 2 file 2 code"/>
<display value="sec 2 file 2 display"/>
</coding>
</code>
<title>Title of sec 2 file2</title>
<text>
<content styleCode="Italics">Section 2 file2 Text
</content>
</text>
<entry>
<someEntry> entry sec 2 file 2
</someEntry>
</entry>
</section>
</A>
and the following xslt:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*" />
<xsl:variable name="input" select="/" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<Bundle>
<id value="test"/>
<type value="document"/>
<entry>
<resource>
<xsl:apply-templates select="document('file2.xml')/*"/>
</resource>
</entry>
</Bundle>
</xsl:template>
<xsl:template match="text">
<text>
<status value="generated"/>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>This is the text from the stylesheet </p>
</div>
</text>
</xsl:template>
<xsl:template match="title">
<xsl:apply-templates select="$input/File1/title"/>
</xsl:template>
<xsl:template match="section[1]">
<xsl:apply-templates select="$input/File1/component/structuredBody/component/section"/>
</xsl:template>
<xsl:template match="section[2]"/>
<xsl:template match="File1/title">
<title>
<xsl:attribute name="value">
<xsl:value-of select="." />
</xsl:attribute>
</title>
</xsl:template>
<xsl:template match = "File1/component/structuredBody/component/section">
<section>
<xsl:apply-templates/>
</section>
</xsl:template>
</xsl:stylesheet>
And this is the output:
<?xml version="1.0" encoding="UTF-8"?>
<Bundle>
<id value="test"/>
<type value="document"/>
<entry>
<resource>
<A>
<title value="Title of file1"/>
<text>
<status value="generated"/>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>This is the text from the stylesheet </p>
</div>
</text>
<section>
<templateId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" root="someRoot_file1" assigningAuthorityName="someAuhthority_file1"/>
<code xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" code="file1-sec1_code" displayName="file1_sec1_display" codeSystem="file1_sec1_cs" codeSystemName="file1_sec1_csn"/>
<title value="Title of file1"/>
<text>
<status value="generated"/>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>This is the text from the stylesheet </p>
</div>
</text>
<entry xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
</section>
<section>
<code xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" code="file1_sec2_code" codeSystem="file2_sec2_cs" displayName="file2_sec2_display" codeSystemName="file2_sec2_csn"/>
<title value="Title of file1"/>
<text>
<status value="generated"/>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>This is the text from the stylesheet </p>
</div>
</text>
<entry xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" typeCode="test"/>
</section>
</A>
</resource>
</entry>
</Bundle>
And this is the expected output:
<?xml version="1.0" encoding="UTF-8"?>
<Bundle>
<id value="test"/>
<type value="document"/>
<entry>
<resource>
<A>
<title value="Title of file1"/>
<text>
<status value="generated"/>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>This is the text from the stylesheet </p>
</div>
</text>
<section>
<templateId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" root="someRoot_file1" assigningAuthorityName="someAuhthority_file1"/>
<code xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" code="file1-sec1_code" displayName="file1_sec1_display" codeSystem="file1_sec1_cs" codeSystemName="file1_sec1_csn"/>
<title>Tile of sec 1 from file1</title>
<text>
<content styleCode="Italics">
Text of sec 1 from file1
</content>
</text>
<entry xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
</section>
<section>
<code xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" code="file1_sec2_code" codeSystem="file2_sec2_cs" displayName="file2_sec2_display" codeSystemName="file2_sec2_csn"/>
<title>Tile from sec 2 file 1</title>
<text>
<content styleCode="Italics">
Text from file1 sec 2
</content>
</text>
<entry xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" typeCode="test"/>
</section>
</A>
</resource>
</entry>
</Bundle>
I have the following questions:
1. Why is the title in the section elements coming from the main title (i.e. File1/title) when the xsl:apply-templates call is inside the template for File1/component/structuredBody/component/section? I expected the section's own title to be output, which is what I want. Even more confusing, the other elements in the section, like code and entry, are indeed output, but title and text (see question 2 below) seem to be treated differently, and I can't for the life of me understand why.
2. Same with text. Why is the text for section not being output?
Here is my (presumably wrong) understanding of the process:
We start with <xsl:template match="/"> and create the elements Bundle, id, etc. Then, using <xsl:apply-templates select="document('file2.xml')/*"/>, we match the top element of file2 (A). Since no template matches it explicitly, the identity template is called, which copies it and processes its child elements: title, text and section. For each of these child elements, the processor looks for a matching template, finds one and applies it.
For the section elements, however, only the first one is matched, because of <xsl:template match="section[1]">. Then, because of <xsl:apply-templates select="$input/File1/component/structuredBody/component/section"/> in that template, the processor looks for templates matching the children of section in File1, which are templateId, code, title, text and entry. It finds no explicitly defined template for them, so it calls the identity template, which copies and processes them to the end. At least, that is my understanding.
Why is the title in the section elements coming from the main title
Because any time the processor is instructed to apply templates to a title, it looks for the best-matching template to apply, and finds this:
<xsl:template match="title">
<xsl:apply-templates select="$input/File1/title"/>
</xsl:template>
This changes the context to the title in File1.xml, and the best-matching template for this one is:
<xsl:template match="File1/title">
<title>
<xsl:attribute name="value">
<xsl:value-of select="." />
</xsl:attribute>
</title>
</xsl:template>
and that is the result you see.
Same with text. Why is the text for section not being output?
-- edited in response to the following clarification: --
When I say text I am talking about text elements only.
The original text element (child of section in File1.xml) is not being output because you have a specific template matching it and outputting something else instead:
<xsl:template match="text">
<text>
<status value="generated"/>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>This is the text from the stylesheet </p>
</div>
</text>
</xsl:template>
@michael.hor257k and @Michael Kay, yes, that was definitely the case: I misunderstood how xsl:apply-templates works with regard to context. I thought that because I called xsl:apply-templates from within <xsl:template match="File1/component/structuredBody/component/section">, it would only look for templates matching the children of section. In other words, I thought it would look for templates like <xsl:template match="File1/component/structuredBody/component/section/title">, but that is clearly not the case.
xsl:apply-templates simply selects the children and then, for each one, looks for the best-matching template, regardless of the template it was called from. So any template that matches title or text will be used for them.
The easiest solution I could find that seems to solve the problem is to add a path to the title and text templates. In other words, instead of just <xsl:template match="text"> I should have <xsl:template match="A/text">, and the same for title. This way, <xsl:template match="A/text"> does not match the text element inside section, because that element is not a child of A. Since no other explicit template matches it, the identity template is applied and outputs the section's text (and, likewise, its title) as desired.
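A sketch of the stylesheet with that change applied (XSLT 1.0, otherwise unchanged). The section templates are qualified with A/ as well: an unqualified section[1] also matches each section in File1, because each one is the first section child of its component, and it has the same default priority (0.5) as the File1/component/structuredBody/component/section template. Processors typically resolve such a tie by choosing the template that comes last in the stylesheet, possibly with a warning, which is why the original happened to pick the right one.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Root of the primary input document (file1) -->
  <xsl:variable name="input" select="/"/>

  <!-- Identity template -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <Bundle>
      <id value="test"/>
      <type value="document"/>
      <entry>
        <resource>
          <xsl:apply-templates select="document('file2.xml')/*"/>
        </resource>
      </entry>
    </Bundle>
  </xsl:template>

  <!-- Only the text and title that are direct children of A are replaced -->
  <xsl:template match="A/text">
    <text>
      <status value="generated"/>
      <div xmlns="http://www.w3.org/1999/xhtml">
        <p>This is the text from the stylesheet </p>
      </div>
    </text>
  </xsl:template>

  <xsl:template match="A/title">
    <xsl:apply-templates select="$input/File1/title"/>
  </xsl:template>

  <!-- Replace the sections of file2 with the sections of File1 -->
  <xsl:template match="A/section[1]">
    <xsl:apply-templates select="$input/File1/component/structuredBody/component/section"/>
  </xsl:template>
  <xsl:template match="A/section[2]"/>

  <xsl:template match="File1/title">
    <title>
      <xsl:attribute name="value">
        <xsl:value-of select="."/>
      </xsl:attribute>
    </title>
  </xsl:template>

  <!-- The section's children (title, text, code, entry...) now fall
       through to the identity template -->
  <xsl:template match="File1/component/structuredBody/component/section">
    <section>
      <xsl:apply-templates/>
    </section>
  </xsl:template>
</xsl:stylesheet>
```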

catch the first occurrence value

I have the XML below:
<chapter num="1">
<section level="sect2">
<page>22</page>
</section>
<section level="sect3">
<page>23</page>
</section>
</chapter>
Here I'm trying to get the first occurrence of <page>.
I'm using the XSLT below:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ntw="Number2Word.uri" exclude-result-prefixes="ntw">
<xsl:output method="html"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="ThisDocument" select="document('')"/>
<xsl:template match="/">
<xsl:text disable-output-escaping="yes"><![CDATA[<!DOCTYPE html>]]></xsl:text>
<html>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="chapter">
<section class="tr_chapter">
<xsl:value-of select="//page[1]/text()"/>
<div class="chapter">
</div>
</section>
</xsl:template>
</xsl:stylesheet>
But in the output I get all the page values printed; I only want the first one.
Current output:
<!DOCTYPE html>
<html>
<body>
<section class="tr_chapter">2223
<div class="chapter">
</div>
</section>
</body>
</html>
The page values are printed here after <section class="tr_chapter">: I want only 22, but I'm getting 2223.
I'm using //page[1]/text() because I can't be sure that the page is inside a section; its position varies.
Please let me know how I can get only the first page value.
Here is the transformation: http://xsltransform.net/3NzcBsR
Try:
<xsl:value-of select="(//page)[1]"/>
http://xsltransform.net/3NzcBsR/1
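Note that this gets the value of the first page element in the entire document. Your original //page[1] is short for /descendant-or-self::node()/child::page[1], so the [1] predicate is applied per parent: it selects every page element that is the first page child of its own parent, and here both page elements qualify. The parentheses apply [1] to the whole node-set in document order instead.
If you want to search the contents of the chapter context element in your template for its first page descendant, use <xsl:value-of select="descendant::page[1]"/> or <xsl:value-of select="(.//page)[1]"/>.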
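The context-relative form might be used as in this sketch, which keeps only the parts of the question's stylesheet the example needs (the DOCTYPE output, the unused ntw namespace and the ThisDocument variable are left out for brevity). Each chapter then reports its own first page rather than the first one in the document.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="chapter">
    <section class="tr_chapter">
      <!-- descendant::page[1]: the first page below this chapter in document
           order, however deeply nested. By contrast, //page[1] selects every
           page that is the first page child of its own parent. -->
      <xsl:value-of select="descendant::page[1]"/>
      <div class="chapter"/>
    </section>
  </xsl:template>
</xsl:stylesheet>
```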

xslt 1.0 - transform nodes before specific element into other element

I have the following input:
<p>
XYZZ
<nl/>
DEF
<process>gggg</process>
KKK
<nl/>
JKLK
<nl/>
QQQQ
</p>
I need each run of nodes separated by <nl/> elements to be output in its own <title> element:
<p>
<title>XYZZ</title>
<title>
DEF<process>gggg</process>KKK
</title>
<title>JKLK</title>
<title>QQQQ</title>
</p>
Please suggest a way to get the specified output.
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kFollowing" match="/*/node()[not(self::nl)]"
use="generate-id(preceding-sibling::nl[1])"/>
<xsl:key name="kPreceding" match="/*/node()[not(self::nl)]"
use="generate-id(following-sibling::nl[1])"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/*">
<xsl:copy>
<xsl:apply-templates select="@*|nl"/>
</xsl:copy>
</xsl:template>
<xsl:template match="nl" name="groupFollowing">
<title>
<xsl:apply-templates select="key('kFollowing',generate-id())"/>
</title>
</xsl:template>
<xsl:template match="nl[1]">
<title>
<xsl:apply-templates select="key('kPreceding',generate-id())"/>
</title>
<xsl:call-template name="groupFollowing"/>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<p>
XYZZ
<nl/>
DEF
<process> gggg </process>
KKK
<nl/>
JKLK
<nl/>
QQQQ
</p>
produces the wanted, correct result:
<p>
<title>
XYZZ
</title>
<title>
DEF
<process> gggg </process>
KKK
</title>
<title>
JKLK
</title>
<title>
QQQQ
</title>
</p>
Do note:
1. The identity rule is used to copy nodes "as-is".
2. There are specific templates matching the top element, the first nl child of the top element, and any nl child of the top element.
3. Two keys are defined: kPreceding indexes each non-nl child by the nl element that immediately follows it, and kFollowing indexes each non-nl child by the nl element that immediately precedes it.
4. An nl element is replaced by a title element, and all immediately following non-nl nodes are processed, with the result put into this title element.
5. For the first nl child of its parent, there is an initial step: a title element is added, and all immediately preceding non-nl nodes are processed, with the result put into this title element. Then the processing in step 4 above is performed.
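A possible simplification that needs only one key, sketched under the assumption that nl elements occur only as children of p: each non-nl child is indexed by the nearest nl before it, and the nodes before the first nl are selected directly with a predicate. The key name kAfterNl is illustrative. This version always emits a title for the nodes before the first nl, which will be empty if p starts with an nl.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index each non-nl child of p by the id of the nearest nl before it -->
  <xsl:key name="kAfterNl" match="p/node()[not(self::nl)]"
           use="generate-id(preceding-sibling::nl[1])"/>

  <!-- Identity rule: copy nodes as-is -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- First group: the nodes before the first nl -->
      <title>
        <xsl:apply-templates select="node()[not(self::nl)][not(preceding-sibling::nl)]"/>
      </title>
      <!-- One group per nl: the nodes after it, up to the next nl -->
      <xsl:for-each select="nl">
        <title>
          <xsl:apply-templates select="key('kAfterNl', generate-id())"/>
        </title>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```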

Converting plain text into HTML-style lists using XSLT, or grouping elements according to their contents and their positions using XSLT

I'm trying to convert a plain-text document into an HTML document using XSLT, and I am struggling with unordered lists.
I have:
<item>some text</item>
<item>- a list item</item>
<item>- another list item</item>
<item>more plain text</item>
<item>more and more plain text</item>
<item>- yet another list item</item>
<item>even more plain text</item>
What I want:
<p>some text</p>
<ul>
<li>a list item</li>
<li>another list item</li>
</ul>
<p>more plain text</p>
<p>more and more plain text</p>
<ul>
<li>yet another list item</li>
</ul>
<p>even more plain text</p>
I looked at Muenchian grouping, but it would combine all list items into one group and all the plain-text items into another. Then I tried to select only the items whose first character differs from the first character of the preceding item. But when I try to combine everything, I still get all the li elements in one ul.
Do you have any hints for me?
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kFollowing"
match="item[contains(., 'list')]
[preceding-sibling::item[1][contains(.,'list')]]"
use="generate-id(preceding-sibling::item
[not(contains(.,'list'))]
[1]
/following-sibling::item[1]
)"/>
<xsl:template match="item[contains(.,'list')]
[preceding-sibling::item[1][not(contains(.,'list'))]]">
<ul>
<xsl:apply-templates mode="list"
select=".|key('kFollowing',generate-id())"/>
</ul>
</xsl:template>
<xsl:template match="item" mode="list">
<li><xsl:value-of select="."/></li>
</xsl:template>
<xsl:template match="item[not(contains(.,'list'))]">
<p><xsl:value-of select="."/></p>
</xsl:template>
<xsl:template match="item[contains(.,'list')]
[preceding-sibling::item[1][contains(.,'list')]]"/>
</xsl:stylesheet>
when applied on the provided XML document (corrected from severely malformed into a well-formed XML document):
<t>
<item>some text</item>
<item>- a list item</item>
<item>- another list item</item>
<item>more plain text</item>
<item>more and more plain text</item>
<item>- yet another list item</item>
<item>even more plain text</item>
</t>
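produces the following result (note that it recognizes list items by the word "list" in their text, which only works for this sample, and keeps the leading "- " in each li):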
<p>some text</p>
<ul>
<li>- a list item</li>
<li>- another list item</li>
</ul>
<p>more plain text</p>
<p>more and more plain text</p>
<ul>
<li>- yet another list item</li>
</ul>
<p>even more plain text</p>
This stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()">
<xsl:apply-templates select="node()[1]|following-sibling::node()[1]"/>
</xsl:template>
<xsl:template match="item">
<p>
<xsl:value-of select="."/>
</p>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:template>
<xsl:template match="item[starts-with(.,'- ')]">
<ul>
<xsl:call-template name="open"/>
</ul>
<xsl:apply-templates
select="following-sibling::node()
[not(self::item[starts-with(.,'- ')])][1]"/>
</xsl:template>
<xsl:template match="node()" mode="open"/>
<xsl:template match="item[starts-with(.,'- ')]" mode="open" name="open">
<li>
<xsl:value-of select="substring-after(.,'- ')"/>
</li>
<xsl:apply-templates select="following-sibling::node()[1]" mode="open"/>
</xsl:template>
</xsl:stylesheet>
Output:
<p>some text</p>
<ul>
<li>a list item</li>
<li>another list item</li>
</ul>
<p>more plain text</p>
<p>more and more plain text</p>
<ul>
<li>yet another list item</li>
</ul>
<p>even more plain text</p>
Note: this is like wrapping adjacent siblings, using fine-grained traversal (each template processes one node and then moves on to its next sibling).
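If an XSLT 2.0 processor is available (the question does not say which version is in use), xsl:for-each-group with group-adjacent does this grouping directly: consecutive items that share the same grouping key, here whether the item starts with "- ", form one group. A sketch, assuming the same t wrapper element used in the answers above:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="t">
    <!-- Consecutive items with the same key (list item or not) form one group -->
    <xsl:for-each-group select="item" group-adjacent="starts-with(., '- ')">
      <xsl:choose>
        <!-- A run of list items becomes one ul -->
        <xsl:when test="current-grouping-key()">
          <ul>
            <xsl:for-each select="current-group()">
              <li><xsl:value-of select="substring-after(., '- ')"/></li>
            </xsl:for-each>
          </ul>
        </xsl:when>
        <!-- A run of plain items becomes one p per item -->
        <xsl:otherwise>
          <xsl:for-each select="current-group()">
            <p><xsl:value-of select="."/></p>
          </xsl:for-each>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```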