How to match only the first <dateline> descendant of a <body> tag using XSLT 1.0 and skip it in the output

Given the sample input and the corresponding output below, I need an XSL transformation that skips only the first occurrence of the <dateline> element under the <body> parent tag.
<!--Given sample Input XML: -->
<content>
<data>
<datatext>
<message name="message">
<p>Test message paragraph.
<dateline name="dateline">Message datelines</dateline>?
<annotation type="note">Test message Note.</annotation>
</p>
</message>
<head name="head">
<p>Test Head paragraph <annotation type="note">Head notes </annotation> paragraph.
<dateline name="dateline">Head dateline</dateline>
</p>
</head>
<body name="body">
<p>
Test first Body paragraph.
<annotation type="note">First Body notes.</annotation>
</p>
<p>Test Second Body paragraph.</p>
<p>
<annotation type="note">Second Body notes.</annotation>
Test third Body paragraph.
<dateline name="dateline">SECOND DATELINE</dateline>
</p>
<p>Test Fouth Body paragraph.</p>
<p>
<dateline name="dateline">THIRD DATELINE</dateline>
Test fourth Body paragraph.
<annotation type="note">Third Body notes.</annotation>
</p>
</body>
</datatext>
</data>
</content>
In the expected output, the first occurrence of the <dateline> tag under <body> should be removed:
<!-- Expected Output XML -->
<content>
<data>
<datatext>
<message name="message">
<p>Test message paragraph.
<dateline name="dateline">Message datelines</dateline>?
<annotation type="note">Test message Note.</annotation>
</p>
</message>
<head name="head">
<p>Test Head paragraph <annotation type="note">Head notes </annotation> paragraph.
<dateline name="dateline">Head dateline</dateline>
</p>
</head>
<body name="body">
<p>
Test first Body paragraph.
<annotation type="note">First Body notes.</annotation>
</p>
<p>Test Second Body paragraph.</p>
<p>
<annotation type="note">Second Body notes.</annotation>
Test third Body paragraph.
</p>
<p>Test Fouth Body paragraph.</p>
<p>
<dateline name="dateline">THIRD DATELINE</dateline>
Test fourth Body paragraph.
<annotation type="note">Third Body notes.</annotation>
</p>
</body>
</datatext>
</data>
</content>

"skip only the first occurrence of the <dateline> field under <body> parent tag"
First, body is an ancestor of dateline, not the parent.
Now, since you want to copy everything except one node, it would be best to start with the identity transform template (that copies everything) as the rule, and add an exception for the node in question:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="body//dateline[generate-id()=generate-id(ancestor::body/descendant::dateline[1])]"/>
</xsl:stylesheet>
Why must this be so complicated:
In order to select the first dateline descendant of body, you must use the expression:
body/descendant::dateline[1]
and not:
body//dateline[1]
This is explained in the XPath specification:
NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
However, the expression:
body/descendant::dateline[1]
is not a valid match pattern. Although patterns may use the // operator, they must not use the descendant axis: https://www.w3.org/TR/xslt/#patterns
Therefore I have chosen to match any dateline that is a descendant of body, and add a predicate that compares the unique id of the current dateline with the one that is truly the first descendant of the ancestor body. This works because the descendant axis is allowed in a predicate.
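For comparison, here is a minimal sketch of the same exception written with the count(. | $node) = 1 node-identity idiom instead of generate-id(). It is an alternative spelling, not the method used above, and it assumes (as the stylesheet above also does) that body elements are never nested inside one another.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform: copy everything by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The union of the current dateline and the first dateline descendant
       of its ancestor body holds exactly one node only when the two are
       the same node, so only that dateline is suppressed. -->
  <xsl:template match="body//dateline[count(. | ancestor::body/descendant::dateline[1]) = 1]"/>
</xsl:stylesheet>
```

As with the generate-id() version, the descendant axis appears only inside the predicate, which is where XSLT 1.0 patterns allow it.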

Here's one possible solution.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:variable name="bdl" select="//body//dateline"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="dateline[index-of($bdl,.) = 1]"/>
</xsl:stylesheet>
At first I thought you could get by with just
<xsl:template match="//body//dateline[1]"/>
But this doesn't work since the [1] predicate is focus and context dependent, and both dateline tags in the body are first under their immediate parent. This solution first builds a sequence of all body dateline tags (in $bdl) and then deletes only the one that matches the first entry in the list.
There is probably a "better" or more idiomatic way of accomplishing this, and I hope one of the XSLT gurus will answer as well.
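One caveat with index-of() is that it compares atomized values rather than node identity, so a later dateline whose text happens to equal that of the first one would also be removed. A hedged sketch of a variant that uses the XPath 2.0 "is" operator, keeping the same $bdl variable, might look like this:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- every dateline below any body, in document order -->
  <xsl:variable name="bdl" select="//body//dateline"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- "is" tests node identity, so only the first node in $bdl matches,
       regardless of what text the other dateline elements contain -->
  <xsl:template match="dateline[. is $bdl[1]]"/>
</xsl:stylesheet>
```

Like the stylesheet above, this needs an XSLT 2.0 processor, since XSLT 1.0 does not allow variable references in match patterns.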

Related

How can I get the text value from a body node?

I need to select a node only if it contains text.
So when I process a <p> tag, I need to check whether the previous <topic> has text in its <body> or in any child element of <body>.
I tried the following XPath expression:
ancestor::topic[1]/preceding-sibling::topic[1]/body/child::node()[(self::text() and normalize-space()) or self::*][position() = last()]
But for some reason it is not working. Why?
<topic>
<body>Topic 3 with only a paragraph, no topic title</body>
</topic>
<topic>
<body>
<p> <!-- from here -->
<image href="" />
</p> <!-- and from here -->
</body>
</topic>
<topic>
<body>Topic 5 with only a paragraph, no topic title</body>
</topic>
I think you need to tell us in more detail what you are trying and exactly how it fails for you; when I convert your input snippet into a well-formed input document
<root>
<topic>
<body>Topic 3 with only a paragraph, no topic title</body>
</topic>
<topic>
<body>
<p> <!-- from here -->
<image href="" />
</p> <!-- and from here -->
</body>
</topic>
<topic>
<body>Topic 5 with only a paragraph, no topic title</body>
</topic>
</root>
and run it through a stylesheet that matches a p with your posted condition in a predicate, it does find the single p you have, i.e.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0">
<xsl:mode on-no-match="shallow-skip"/>
<xsl:template match="p[ancestor::topic[1]/preceding-sibling::topic[1]/body/child::node()[(self::text() and normalize-space()) or self::*][position() = last()]]">
found
</xsl:template>
</xsl:stylesheet>
outputs found as the match happens.
So explain, with minimal but complete samples, what you are trying, which output you expect, and how it fails (i.e. which error or which wrong output you get); then we can perhaps tell what is wrong.
Sorry for the non-answer, but I couldn't stuff the code example that shows your expression does seem to work into a comment.

How to convert and move XML comment nodes to element nodes in a different location

I have a document that contains comment nodes in a variety of locations. I want to move these comments to a single new location in the document, and convert them to p elements.
I am an XSLT beginner, and with the help of W3Schools and Stack Overflow I've been able to get these comments converted and in the correct location. However, the converted comments are copied, not moved, so they stay in their original locations.
For example, given the following input:
<?xml version="1.0" encoding="UTF-8"?>
<!--Comment before root element-->
<concept>
<!--Comment element is parent-->
<title>Test Topic</title>
<shortdesc>This is a shortdesc element <!-- shortdesc element is parent --></shortdesc>
<conbody>
<!-- Conbody element is parent -->
<p>This is para 1 </p>
<section>
<!--Section element is parent; comment is before title-->
<title>Section 1</title>
<!--Section element is parent; comment is after title-->
<p>This is para 1 in section1</p>
</section>
</conbody>
</concept>
I want the following output:
<?xml version="1.0" encoding="UTF-8"?>
<concept>
<title>Test Topic</title>
<shortdesc>This is a shortdesc element </shortdesc>
<conbody>
<p>This is para 1 </p>
<section>
<title>Section 1</title>
<p>This is para 1 in section1</p>
</section>
<section outputclass="authorNote">
<p>Comment before root element </p>
<p>Comment element is parent</p>
<p>shortdesc element is parent</p>
<p>Conbody element is parent</p>
<p>Section element is parent; comment is before title</p>
<p>Section element is parent; comment is after title</p>
</section>
</conbody>
</concept>
This is my stylesheet:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" encoding="UTF-8"/>
<xsl:template match="/">
<xsl:copy>
<xsl:apply-templates select="node() | @*" />
</xsl:copy>
</xsl:template>
<!-- Create new section to hold converted comments -->
<xsl:template match="conbody" >
<xsl:copy>
<xsl:apply-templates/>
<section outputclass="authorNote">
<xsl:apply-templates select="//comment()"/>
</section>
</xsl:copy>
</xsl:template>
<!-- Convert comment nodes to p elements -->
<xsl:template match="//comment()">
<p><xsl:value-of select="."/></p>
</xsl:template>
</xsl:stylesheet>
Which produces the following output:
<?xml version="1.0" encoding="UTF-8"?>
<p>Comment before root element </p>
<concept>
<p>Comment element is parent</p>
<title>Test Topic</title>
<shortdesc>This is a shortdesc element <p>shortdesc element is parent</p></shortdesc>
<conbody>
<p>Conbody element is parent</p>
<p>This is para 1 </p>
<section>
<p>Section element is parent; comment is before title</p>
<title>Section 1</title>
<p>Section element is parent; comment is after title</p>
<p>This is para 1 in section1</p>
</section>
<section outputclass="authorNote">
<p>Comment before root element </p>
<p>Comment element is parent</p>
<p>shortdesc element is parent</p>
<p>Conbody element is parent</p>
<p>Section element is parent; comment is before title</p>
<p>Section element is parent; comment is after title</p>
</section>
</conbody>
</concept>
What am I doing wrong? The articles I have found so far are all about manipulating elements, rather than comment nodes. Can someone give me some tips on how to address this problem?
Separate the two tasks with modes (don't copy the comments with the default mode, transform them with a different mode):
<!-- suppress treatment of comments through default mode -->
<xsl:template match="comment()"/>
<!-- Create new section to hold converted comments -->
<xsl:template match="conbody" >
<xsl:copy>
<xsl:apply-templates/>
<section outputclass="authorNote">
<xsl:apply-templates select="//comment()" mode="transform"/>
</section>
</xsl:copy>
</xsl:template>
<!-- Convert comment nodes to p elements -->
<xsl:template match="comment()" mode="transform">
<p><xsl:value-of select="."/></p>
</xsl:template>
Online sample at http://xsltransform.net/93dEHGn.
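Put together, the pieces above could form a complete stylesheet along these lines. This is a sketch under a few assumptions that are not in the answer: an identity template is added for all other nodes, the comment-suppressing template gets an explicit priority so it does not tie with the identity template, and normalize-space() trims the blanks inside the comments.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- identity transform for everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- default mode: drop comments where they stand; the explicit priority
       avoids a tie with the identity template, which also matches comments -->
  <xsl:template match="comment()" priority="1"/>

  <!-- copy conbody, then append one section holding all converted comments -->
  <xsl:template match="conbody">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <section outputclass="authorNote">
        <xsl:apply-templates select="//comment()" mode="transform"/>
      </section>
    </xsl:copy>
  </xsl:template>

  <!-- "transform" mode: turn each comment into a p element -->
  <xsl:template match="comment()" mode="transform">
    <p><xsl:value-of select="normalize-space(.)"/></p>
  </xsl:template>
</xsl:stylesheet>
```

Because //comment() is evaluated from the document root, the comment that precedes the root element is collected as well, and the p elements appear in document order.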

Recursively replacing elements in XSLT

I need to replace the <tref> element with other tags from elsewhere in my document. For example, I have:
<tref id="57236"/>
and
<Topic>
<ID>57236</ID>
<Text>
<p id="4">
<cs id="56792">1090-189-01 </cs>
<href id="57237">
<cs id="56792">Document Name</cs>
</href>
</p>
</Text>
</Topic>
Obtaining the following is not a problem:
<p id="4">
<cs id="56792">1090-189-01 </cs>
<href id="57237">
<cs id="56792">Document Name</cs>
</href>
</p>
With this stylesheet:
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="tref">
<xsl:variable name="NodeID"><xsl:value-of select="@id"/></xsl:variable>
<xsl:copy-of select="//Topic[ID = $NodeID]/Text/p/node()"/>
</xsl:template>
What I cannot do is replacing trefs nested into other trefs. For example, consider the following:
<tref id="57236"/>
and:
<Topic>
<ID>57236</ID>
<Text>
<p id="251">
<tref id="37287"/>
</p>
</Text>
</Topic>
My stylesheet duly replaces the tref with the content of the tag - which also contains a tref:
<p id="251">
<tref id="37287"/>
</p>
My current solution is to call <xsl:template match="tref"> from two different stylesheets. It does the job, but it is not very elegant, and what if trefs are nested at an even deeper level? And recursion is the bread and butter of XSLT.
Is there a way to recursively replace all trefs in XSLT?
Instead of using xsl:copy-of, use xsl:apply-templates
<xsl:apply-templates select="//Topic[ID = $NodeID]/Text/p/node()"/>
Or, to eliminate the use of the variable
<xsl:apply-templates select="//Topic[ID = current()/@id]/Text/p/node()"/>
Note you can make use of an xsl:key to look-up the Topic elements
<xsl:key name="topic" match="Topic" use="ID" />
Then you can write this
<xsl:apply-templates select="key('topic', @id)/Text/p/node()"/>
Be wary of infinite recursion if you have a tref referring to a Topic that is an ancestor of it.
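One possible way to guard against such a cycle in XSLT 1.0 is to carry a list of the ids already expanded along the recursion. The sketch below is only an illustration: the visited parameter and its "|id|" string format are hypothetical names chosen here, and the identity template has to forward the parameter because XSLT 1.0 has no tunnel parameters.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="topic" match="Topic" use="ID"/>

  <!-- identity transform that passes the visited list down unchanged -->
  <xsl:template match="@*|node()">
    <xsl:param name="visited" select="'|'"/>
    <xsl:copy>
      <xsl:apply-templates select="@*|node()">
        <xsl:with-param name="visited" select="$visited"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="tref">
    <xsl:param name="visited" select="'|'"/>
    <xsl:choose>
      <!-- this id is already being expanded further up: stop here
           and leave the tref as it is instead of looping forever -->
      <xsl:when test="contains($visited, concat('|', @id, '|'))">
        <xsl:copy-of select="."/>
      </xsl:when>
      <!-- otherwise expand the referenced Topic and remember its id -->
      <xsl:otherwise>
        <xsl:apply-templates select="key('topic', @id)/Text/p/node()">
          <xsl:with-param name="visited" select="concat($visited, @id, '|')"/>
        </xsl:apply-templates>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Nested trefs are still expanded to any depth, since each expansion goes through apply-templates; only a reference back to an id that is currently being expanded is left untouched.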

With XSLT, how can I process normally, but hold some nodes until the end and then output them all at once (e.g. footnotes)?

I have an XSLT application which reads the internal format of Microsoft Word 2007/2010 zipped XML and translates it into HTML5 with XSLT. I am investigating how to add the ability to optionally read OpenOffice documents instead of MSWord.
Microsoft stores XML for footnote text separately from the XML of the document text, which happens to suit me because I want the footnotes in a block at the end of the output HTML page.
However, unfortunately for me, OpenOffice puts each footnote right next to its reference, inline with the text of the document. Here is a simple paragraph example:
<text:p text:style-name="Standard">The real breakthrough in aerial mapping
during World War II was trimetrogon
<text:note text:id="ftn0" text:note-class="footnote">
<text:note-citation>1</text:note-citation>
<text:note-body>
<text:p text:style-name="Footnote">Three separate cameras took three
photographs at once, a direct downward and an oblique on each side.</text:p>
</text:note-body>
</text:note>
photography, but the camera was large and heavy, so there were problems finding
the right aircraft to carry it.
</text:p>
My question is, can XSLT process the XML as normal, but hold each of the text:note items until the end of the document text, and then emit them all at one time?
You're thinking of your logic as being driven by the order of things in the input, but in XSLT you need to be driven by the order of things in the output. When you get to the point where you want to output the footnotes, go find the footnote text wherever it might be in the input. Admittedly that doesn't always play too well with the apply-templates recursive descent processing model, which is explicitly input-driven; but nevertheless, that's the way you have to do it.
Don't think of it as "holding" the text:note items, instead simply ignore them in the main pass and then gather them at the end with a //text:note and process them there, e.g.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:text="whateveritshouldbe">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<!-- normal mode - replace text:note element by [reference] -->
<xsl:template match="text:note">
<xsl:value-of select="concat('[', text:note-citation, ']')" />
</xsl:template>
<xsl:template match="/">
<document>
<xsl:apply-templates select="*" />
<footnotes>
<xsl:apply-templates select="//text:note" mode="footnotes"/>
</footnotes>
</document>
</xsl:template>
<!-- special "footnotes" mode to de-activate the usual text:note template -->
<xsl:template match="@*|node()" mode="footnotes">
<xsl:copy>
<xsl:apply-templates select="@*|node()" mode="footnotes" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
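Since the stated goal is HTML output, the same two-pass idea might be adapted roughly as follows. This is a sketch with several assumptions that are not in the question: the text prefix is bound to the OpenDocument text namespace URI, the html/body/ol wrapper elements are made up for illustration, and only text:p and text:note are handled explicitly.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    exclude-result-prefixes="text">
  <xsl:output method="html"/>

  <!-- main pass: the running text first, then all footnotes in one block -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
        <ol class="footnotes">
          <xsl:apply-templates select="//text:note" mode="footnotes"/>
        </ol>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="text:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- in the running text, a note shrinks to a link showing its citation -->
  <xsl:template match="text:note">
    <sup><a href="#{@text:id}"><xsl:value-of select="text:note-citation"/></a></sup>
  </xsl:template>

  <!-- in "footnotes" mode, the same note is written out as a list item -->
  <xsl:template match="text:note" mode="footnotes">
    <li id="{@text:id}">
      <xsl:apply-templates select="text:note-body/node()"/>
    </li>
  </xsl:template>
</xsl:stylesheet>
```

The text:id attribute from the input doubles as the link target, so each reference in the text points at its footnote in the block at the end.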
You could use <xsl:apply-templates mode="..."/>. I'm not sure of the exact syntax for your use case, but the example below may give you a clue on how to approach the problem.
The basic idea is to process your nodes twice. The first iteration is pretty much the same as now; the second iteration looks only for footnotes and outputs only those. You differentiate the iterations by setting the mode attribute.
Note that I used different tags than in your code, to keep the example simpler.
XSLT sheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" />
<xsl:template match="doc">
<xml>
<!-- First iteration - skip footnotes -->
<doc>
<xsl:apply-templates select="text" />
</doc>
<!-- Second iteration, extract all footnotes.
'mode' = footnotes -->
<footnotes>
<xsl:apply-templates select="text" mode="footnotes" />
</footnotes>
</xml>
</xsl:template>
<!-- Note: no mode attribute -->
<xsl:template match="text">
<text>
<xsl:for-each select="p">
<p>
<xsl:value-of select="text()" />
</p>
</xsl:for-each>
</text>
</xsl:template>
<!-- Note: mode = footnotes -->
<xsl:template match="text" mode="footnotes">
<xsl:for-each select=".//footnote">
<footnote>
<xsl:value-of select="text()" />
</footnote>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<doc>
<text>
<p>
some text
<footnote>footnote1</footnote>
</p>
<p>
other text
<footnote>footnote2</footnote>
</p>
</text>
<text>
<p>
some text2
<footnote>footnote3</footnote>
</p>
<p>
other text2
<footnote>footnote4</footnote>
</p>
</text>
</doc>
Output XML:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<!-- Output from first iteration -->
<doc>
<text>
<p>some text</p>
<p>other text</p>
</text>
<text>
<p>some text2</p>
<p>other text2</p>
</text>
</doc>
<!-- Output from second iteration -->
<footnotes>
<footnote>footnote1</footnote>
<footnote>footnote2</footnote>
<footnote>footnote3</footnote>
<footnote>footnote4</footnote>
</footnotes>
</xml>

Parsing XML string using XSLT

I have an XML document that has a TextBlock that contains HTML code.
<TextBlock>
<h1>This is a header.</h1>
<p>This is a paragraph.</p>
</TextBlock>
In the actual XML, however, it is coded like this:
<TextBlock>
&lt;h1&gt;This is a header.&lt;/h1&gt;
&lt;p&gt;This is a paragraph.&lt;/p&gt;
</TextBlock>
So when I use <xsl:value-of select="TextBlock"/> it displays all of the markup as text on the page. Is there a way using XSLT to convert &lt; to < within the TextBlock element?
<xsl:value-of select="TextBlock" disable-output-escaping="yes"/>
and the result:
<h1>This is a header.</h1>
<p>This is a paragraph.</p>
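In context, a minimal stylesheet using that attribute might look like the sketch below; the div wrapper and its class name are illustrative assumptions. Note that XSLT 1.0 does not require processors to support disable-output-escaping, and it can only take effect when the processor serializes the result itself.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>

  <xsl:template match="/">
    <div class="text-block">
      <!-- write the escaped markup stored in TextBlock as-is,
           without escaping its angle brackets again -->
      <xsl:value-of select="//TextBlock" disable-output-escaping="yes"/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```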
Firefox has a corresponding bug: https://bugzilla.mozilla.org/show_bug.cgi?id=98168, which has a lot of comments and makes for interesting reading.
I am looking for a fix now.
EDIT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="disable-output-escaping.xsl"/>
<!-- https://bug98168.bugzilla.mozilla.org/attachment.cgi?id=434081 -->
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:template match="/TextBlock">
<xsl:copy>
<xsl:call-template name="disable-output-escaping"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When inspecting via Firebug, the result looks correct:
<textblock>
<h1>This is a header.</h1>
<p>This is a paragraph.</p>
</textblock>