Input contains a paragraph character that needs to be removed - xslt

I have been attempting to modify the text of the parent element from within the XSL. How can I delete the element from the XSL code (I do not control the input)? I only want to delete the preceding line break, not all line breaks in the body. The preceding 'some text here' may take the form of multiple paragraphs.
Xsl
<xsl:template match="element">
<!-- attempting to add fix here -->
<xsl:apply-templates />
</xsl:template>
Input
<body>
<p>
some text here
</p>
<element>
some more text
</element>
</body>
Output
some text here
some more text
Desired Output
some text here some more text

Does
<xsl:template match="p[following-sibling::*[1][self::element]]//text() | element[preceding-sibling::*[1][self::p]]//text()">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
do what you want?
You don't need the <xsl:template match="element"><xsl:apply-templates/></xsl:template> as the built-in template will do that anyway.
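For reference, the built-in rules that make the explicit template redundant behave as if the stylesheet contained the two templates below. This is only a sketch of the XSLT 1.0 defaults written out by hand (comments and processing instructions are additionally ignored by default); you do not need to add it to your stylesheet.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Built-in rule for the root node and for elements: recurse into the children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Built-in rule for text and attribute nodes: output the string value. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```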
I found some time to test code, now I have
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="p[following-sibling::*[1][self::element]]//text() |
element[preceding-sibling::*[1][self::p]]//text()">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
<xsl:template match="text()[preceding-sibling::*[1][self::p] and following-sibling::*[1][self::element] and not(normalize-space())]">
<xsl:text> </xsl:text>
</xsl:template>
</xsl:stylesheet>
transforms
<body>
<p>
some text here
</p>
<element>
some more text
</element>
</body>
into
some text here some more text

Related

XSLT wrap element and following-sibling text

Kindly help me to wrap the img.inline element with the following sibling text comma (if comma exists):
text <img id="1" class="inline" src="1.jpg"/> another text.
text <img id="2" class="inline" src="2.jpg"/>, another text.
Should be changed to:
text <img id="1" class="inline" src="1.jpg"/> another text.
text <span class="img-wrap"><img id="2" class="inline" src="2.jpg"/>,</span> another text.
Currently, my XSLT wraps the img.inline element and adds a comma inside the span; now I want to remove the original comma that follows.
text <span class="img-wrap"><img id="2" class="inline" src="2.jpg"/>,</span>
, <!--remove this extra comma--> another text.
My XSLT:
<xsl:template match="//img[@class='inline']">
<xsl:copy>
<xsl:choose>
<xsl:when test="starts-with(following-sibling::text(), ',')">
<span class="img-wrap">
<xsl:apply-templates select="node()|@*"/>
<xsl:text>,</xsl:text>
</span>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="node()|@*"/>
</xsl:otherwise>
</xsl:choose>
</xsl:copy>
<!-- checking following-sibling::text() -->
<xsl:apply-templates select="following-sibling::text()" mode="commatext"/>
</xsl:template>
<!-- here I want to match the following text, if comma, then remove it -->
<xsl:template match="the following comma" mode="commatext">
<!-- remove comma -->
</xsl:template>
Is my approach correct, or should this be handled differently? Please suggest.
Currently you are copying the img and then embedding the span within that copy. Also, you do <xsl:apply-templates select="node()|@*"/>, which selects the child nodes of img (of which there are none) and its attributes, so the attributes end up being added to the span.
You don't actually need the xsl:choose here as you can add the condition to the match attribute.
<xsl:template match="//img[@class='inline'][starts-with(following-sibling::node()[1][self::text()], ',')]">
Note I have changed the condition, as following-sibling::text() selects ALL text nodes that follow the img node (starts-with() then tests the first of them, wherever it sits). You only want the node immediately after the img node, and only if it is a text node.
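To see the difference between the two conditions, here is a small diagnostic sketch. It assumes a hypothetical input of <p>text <img id="3" class="inline" src="3.jpg"/><b>bold</b>, another text.</p>, where the comma follows the img but is not adjacent to it. The labels any-following and immediately-following are made up for this illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//img[@class='inline']">
      <xsl:value-of select="@id"/>
      <!-- Tests the first text node anywhere among the following siblings. -->
      <xsl:text>: any-following=</xsl:text>
      <xsl:value-of select="starts-with(following-sibling::text(), ',')"/>
      <!-- Tests only the very next sibling node, and only if it is a text node. -->
      <xsl:text> immediately-following=</xsl:text>
      <xsl:value-of select="starts-with(following-sibling::node()[1][self::text()], ',')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For that input the first test reports true and the second false, because the node right after the img is the b element rather than the text that starts with a comma.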
Also, trying to select the following text node with xsl:apply-templates is probably not the right approach, assuming you have a template that matches the parent node which selects all child nodes (not just img ones). I am assuming you were using the identity template here.
Anyway, try this XSLT instead
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" indent="no" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="//img[@class='inline'][starts-with(following-sibling::node()[1][self::text()], ',')]">
<span class="img-wrap">
<xsl:copy-of select="." />
<xsl:text>,</xsl:text>
</span>
</xsl:template>
<xsl:template match="text()[starts-with(., ',')][preceding-sibling::node()[1][self::img]/@class='inline']">
<xsl:value-of select="substring(., 2)" />
</xsl:template>
</xsl:stylesheet>

XSLT: in text element, how to replace line break (<br/>) with blank space?

NOTE: I am using xsltproc on OS X Yosemite.
The source content for an XSLT transformation is HTML. Some
text nodes contain line breaks (<br/>). In the transformed
content (an XML file), I wish to convert the line breaks to spaces.
For example, I have:
<div class="location">London<br />Hyde Park<br /></div>
I want to transform this element like so:
<xsl:element name="location">
<xsl:variable name="location" select="div[@class='location']"/>
<xsl:value-of select="$location"/>
</xsl:element>
What happens is that the <br /> elements are simply dropped from the output:
<location>LondonHyde Park</location>
I do have other templates that are involved:
<xsl:template match="node()|script"/>
<xsl:template match="*">
<xsl:apply-templates/>
</xsl:template>
What XSLT operations are required to transform the <br />'s here
to a single space?
I would use xsl:apply-templates instead of xsl:value-of and add a template to handle <br/>.
You would also need to modify <xsl:template match="node()|script"/> because node() also selects text nodes. You can replace node() with processing-instruction()|comment() if you need to, but they would not be output by default anyway.
Here's a working example:
Input
<div class="location">London<br />Hyde Park<br /></div>
XSLT 1.0
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="script"/>
<xsl:template match="*">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="div[@class='location']">
<location><xsl:apply-templates/></location>
</xsl:template>
<xsl:template match="br">
<xsl:text> </xsl:text>
</xsl:template>
</xsl:stylesheet>
Output
<location>London Hyde Park </location>
If you don't want the trailing space, you could either...
put the xsl:apply-templates in a variable ($var) and use normalize-space() in an xsl:value-of. Like: <xsl:value-of select="normalize-space($var)"/>
update the match for the br element. Like: br[not(position()=last())]
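The first option could look roughly like the sketch below. It reuses the templates from the working example and assumes the same input; the variable name var is just the placeholder used in the description above. In XSLT 1.0 the variable holds a result tree fragment, which normalize-space() converts to its string value.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:template match="script"/>
  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="div[@class='location']">
    <!-- Collect the processed children first, spaces from br included. -->
    <xsl:variable name="var">
      <xsl:apply-templates/>
    </xsl:variable>
    <!-- Then trim leading/trailing whitespace and collapse internal runs. -->
    <location><xsl:value-of select="normalize-space($var)"/></location>
  </xsl:template>
  <xsl:template match="br">
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With the input above this should give <location>London Hyde Park</location>. Note that any markup produced inside the variable is flattened to text, so this only suits elements whose content you want as a plain string.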

Transforming node contents to remove whitespace

If the contents of a citations node is something like the following:
<p>
WAJWAJADS:
</p>
<p>
asdf
</p>
<p>
ALSOAS:
</p>
<p>
lorem ipsum...<br />
lorem<br />
blah blah <i>
adfas &amp; dasdsaafs
</i>, April 2011.<br />
lorem lorem dear lord the whitespace
</p>
Is there any way to transform this to properly formatted HTML with XSLT?
normalize-space() just concats everything together. The best I've managed to do is normalize-space() on all p descendants within a for-each loop and wrap them in a p element. However, then any inner tags are still lost.
Is there a better way to parse this WYSIWYG generated trainwreck? Unfortunately I have no control over the generated XML.
I've slightly modified the answer by Martin Honnen:
<xsl:template match="text()">
<xsl:value-of select="normalize-space(.)"/>
<xsl:if test="substring(., string-length(.)) = ' ' and substring(., string-length(.) - 1, string-length(.)) != '  '">
<xsl:text> </xsl:text>
</xsl:if>
</xsl:template>
It tests whether the last character is a space and the last two characters are not both spaces; if so, it inserts a single space.
You first need to have a well-formed XML with a root.
Assuming you have that, you can apply an identity transform to copy the source tree to the result, strip spaces between the tags, optionally generate output in HTML (without the XML declaration) and indented, and use normalize-space() only in the text nodes.
Try this stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes" method="html"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()">
<xsl:value-of select="normalize-space(.)"/>
</xsl:template>
</xsl:stylesheet>
The result applied to the data you provided will be:
<p>WAJWAJADS:</p>
<p>asdf</p>
<p>ALSOAS:</p>
<p>lorem ipsum...<br>lorem<br>blah blah<i>adfas &amp; dasdsaafs</i>, April 2011.<br>lorem lorem dear lord the whitespace
</p>
You can see the result applied to your example in this XSLT Fiddle
UPDATE 1: to add an extra space around each text node (and avoid concatenation when the string value of the node is calculated) you can replace the last template with:
<xsl:template match="text()">
<xsl:value-of select="concat(' ',normalize-space(.),' ')"/>
</xsl:template>
Result:
<html>
<p> WAJWAJADS: </p>
<p> asdf </p>
<p> ALSOAS: </p>
<p> lorem ipsum... <br> lorem <br> blah blah <i> adfas &amp; dasdsaafs </i> , April 2011. <br> lorem lorem dear lord the whitespace
</p>
</html>
See: http://xsltransform.net/3NzcBsE/1
UPDATE 2: to add a space or newline after each copied element. Place this <xsl:text>
</xsl:text> (for a newline) or this <xsl:text> </xsl:text> (for a space) after the </xsl:copy> in the first template:
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
<xsl:text>
</xsl:text>
</xsl:template>
Result:
<html>
<p>WAJWAJADS:</p>
<p>asdf</p>
<p>ALSOAS:</p>
<p>lorem ipsum...<br>
lorem<br>
blah blah<i>adfas &amp; dasdsaafs</i>
, April 2011.<br>
lorem lorem dear lord the whitespace
</p>
</html>
See: http://xsltransform.net/3NzcBsE/2
Use the identity transformation template plus a template for text nodes doing the normalize-space:
<xsl:template match="text()"><xsl:value-of select="normalize-space()"/></xsl:template>
This question would have been a lot easier to understand if the example contained real text instead of gibberish. "No additional whitespace between node start/end and text." is not an accurate enough description of the expected result.
I am going to take a guess here and assume you actually want to perform a "run of spaces to one space" operation on all the text nodes. This could be done as follows:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()" priority="1">
<xsl:variable name="temp" select="normalize-space(concat('x', ., 'x'))" />
<xsl:value-of select="substring($temp, 2, string-length($temp) - 2)"/>
</xsl:template>
</xsl:stylesheet>
When applied to the following test input:
<chapter>
<p>
This question would have
been a lot <b> easier </b> to understand
if the example contained
<i> real </i> text instead of
gibberish.
</p>
<p>
Here is an example of preserving zero spaces
between text nodes:<br/>(continued) on a new line.
</p>
<p>
Here is another example of
preserving zero spaces within a text
node: <i>some text in italic</i> followed
by normal text.
</p>
</chapter>
the result will be:
<?xml version="1.0" encoding="UTF-8"?>
<chapter>
<p> This question would have been a lot <b> easier </b> to understand if the example contained <i> real </i> text instead of gibberish. </p>
<p> Here is an example of preserving zero spaces between text nodes:<br/>(continued) on a new line. </p>
<p> Here is another example of preserving zero spaces within a text node: <i>some text in italic</i> followed by normal text. </p>
</chapter>
--
Note that there will be no difference between the input and output when rendered in HTML.
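The reason for wrapping the text in 'x' characters may be easier to see in isolation. normalize-space() strips leading and trailing whitespace entirely, so the sentinels protect one space at each end while runs are still collapsed. The following sketch uses a made-up test string and prints both variants in brackets; it can be run against any XML input.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Hypothetical sample: two leading, three inner and two trailing spaces. -->
    <xsl:variable name="s" select="'  a   b  '"/>
    <xsl:variable name="temp" select="normalize-space(concat('x', $s, 'x'))"/>
    <!-- Plain normalize-space: the edge whitespace disappears. -->
    <xsl:text>[</xsl:text>
    <xsl:value-of select="normalize-space($s)"/>
    <xsl:text>]&#10;[</xsl:text>
    <!-- Sentinel version: drop the first and last character (the two x's). -->
    <xsl:value-of select="substring($temp, 2, string-length($temp) - 2)"/>
    <xsl:text>]&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The first line of output is [a b] and the second is [ a b ], which is why text next to an inline element such as b or i keeps its separating space.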

How do I understand linebreaks in XSLT?

I have a piece of XML that looks like:
<bunch of other things>
<bunch of other things>
<errorLog> error1 \n error2 \n error3 </errorLog>
I want to modify the XSLT that this XML runs through to apply newlines after error1 through error3.
I can completely control the output of errorLog or the contents of the XSLT file, but I'm not sure how to craft either the XML or the XSLT to make the output HTML show line breaks. Is it easier to change the XML output into some special character that will cause a newline, or do I modify the XSLT to interpret \n as newlines?
There is an example on this site that contains something akin to what I want, but my <errorLog> XSLT is nested in another template, and I'm not sure how templates inside templates can work.
Backslash is used as an escape character in a number of languages including C and Java, but not in XML or XSLT. If you put \n in your stylesheet, that's not a newline; it's two characters: a backslash followed by "n". The XML way of writing a newline is &#10;. However, if you send a newline to the browser in HTML, it displays it as a space. If you want a newline displayed by the browser, you need to send a <br/> element.
If you have control over your errorLog element then you may as well use a literal LF character in there. It is no different from any other character as far as XSLT is concerned.
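If errorLog does contain real line feeds, a recursive named template can turn each one into a <br/>. The sketch below is one way to do that in XSLT 1.0; the template name lines-to-br is invented for this example, and it assumes the errors are separated by LF characters (an XML parser normalizes CRLF to LF on input).

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="errorLog">
    <p>
      <xsl:call-template name="lines-to-br">
        <xsl:with-param name="string" select="string(.)"/>
      </xsl:call-template>
    </p>
  </xsl:template>

  <!-- Emits the text before the first LF, a br, then recurses on the rest. -->
  <xsl:template name="lines-to-br">
    <xsl:param name="string"/>
    <xsl:choose>
      <xsl:when test="contains($string, '&#10;')">
        <xsl:value-of select="substring-before($string, '&#10;')"/>
        <br/>
        <xsl:call-template name="lines-to-br">
          <xsl:with-param name="string" select="substring-after($string, '&#10;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- Last line: no separator left, so no trailing br. -->
        <xsl:value-of select="$string"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The character reference &#10; inside the select attribute survives attribute-value normalization as a genuine line feed, which is why it can be used as the search string.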
As for creating HTML that displays with line breaks, you will want to add a <br/> element in place of whatever marker you have in your XML source. It would be easiest of all if you could put each error within a separate element, like this
<errorLog>
<error>error1</error>
<error>error2</error>
<error>error3</error>
</errorLog>
then the XSLT doesn't have to go through the rather clumsy process of splitting up the text itself.
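With one error per element, the stylesheet side could shrink to something like this sketch. It assumes the errorLog/error structure shown just above and HTML output.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Wrap the whole log in one paragraph. -->
  <xsl:template match="errorLog">
    <p><xsl:apply-templates select="error"/></p>
  </xsl:template>

  <!-- Each error becomes its text followed by a line break. -->
  <xsl:template match="error">
    <xsl:value-of select="."/>
    <br/>
  </xsl:template>
</xsl:stylesheet>
```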
With this XML data taken from your question
<document>
<bunch-of-other-things/>
<bunch-of-other-things/>
<errorLog>error1 \n error2 \n error3</errorLog>
</document>
this stylesheet
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes" />
<xsl:template match="/document">
<html>
<head>
<title>Error Log</title>
</head>
<body>
<xsl:apply-templates select="*"/>
</body>
</html>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="errorLog">
<p>
<xsl:call-template name="split-on-newline">
<xsl:with-param name="string" select="."/>
</xsl:call-template>
</p>
</xsl:template>
<xsl:template name="split-on-newline">
<xsl:param name="string"/>
<xsl:choose>
<xsl:when test="contains($string, '\n')">
<xsl:value-of select="substring-before($string, '\n')"/>
<br/>
<xsl:call-template name="split-on-newline">
<xsl:with-param name="string" select="substring-after($string, '\n')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$string"/>
<br/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
will produce this output
<html>
<head>
<title>Error Log</title>
</head>
<body>
<bunch-of-other-things/>
<bunch-of-other-things/>
<p>error1 <br/> error2 <br/> error3<br/>
</p>
</body>
</html>

With XSLT, how can I process normally, but hold some nodes until the end and then output them all at once (e.g. footnotes)?

I have an XSLT application which reads the internal format of Microsoft Word 2007/2010 zipped XML and translates it into HTML5 with XSLT. I am investigating how to add the ability to optionally read OpenOffice documents instead of MSWord.
Microsoft stores XML for footnote text separately from the XML of the document text, which happens to suit me because I want the footnotes in a block at the end of the output HTML page.
However, unfortunately for me, OpenOffice puts each footnote right next to its reference, inline with the text of the document. Here is a simple paragraph example:
<text:p text:style-name="Standard">The real breakthrough in aerial mapping
during World War II was trimetrogon
<text:note text:id="ftn0" text:note-class="footnote">
<text:note-citation>1</text:note-citation>
<text:note-body>
<text:p text:style-name="Footnote">Three separate cameras took three
photographs at once, a direct downward and an oblique on each side.</text:p>
</text:note-body>
</text:note>
photography, but the camera was large and heavy, so there were problems finding
the right aircraft to carry it.
</text:p>
My question is, can XSLT process the XML as normal, but hold each of the text:note items until the end of the document text, and then emit them all at one time?
You're thinking of your logic as being driven by the order of things in the input, but in XSLT you need to be driven by the order of things in the output. When you get to the point where you want to output the footnotes, go find the footnote text wherever it might be in the input. Admittedly that doesn't always play too well with the apply-templates recursive descent processing model, which is explicitly input-driven; but nevertheless, that's the way you have to do it.
Don't think of it as "holding" the text:note items; instead, simply ignore them in the main pass and then gather them at the end with a //text:note and process them there, e.g.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:text="whateveritshouldbe">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<!-- normal mode - replace text:note element by [reference] -->
<xsl:template match="text:note">
<xsl:value-of select="concat('[', text:note-citation, ']')" />
</xsl:template>
<xsl:template match="/">
<document>
<xsl:apply-templates select="*" />
<footnotes>
<xsl:apply-templates select="//text:note" mode="footnotes"/>
</footnotes>
</document>
</xsl:template>
<!-- special "footnotes" mode to de-activate the usual text:note template -->
<xsl:template match="@*|node()" mode="footnotes">
<xsl:copy>
<xsl:apply-templates select="@*|node()" mode="footnotes" />
</xsl:copy>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
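Since the target in the question is HTML, the same two-pass idea might be adapted along the lines of the sketch below. Several things here are assumptions rather than part of the answer above: the namespace URI is the usual OpenDocument text namespace and should be checked against the xmlns:text declaration in your input, the ol/li/sup markup is only one possible layout, and only text:p and text:note are handled explicitly.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    exclude-result-prefixes="text">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- Main pass: body text, with each note reduced to a reference. -->
        <xsl:apply-templates/>
        <!-- Second pass: gather every note, wherever it sits in the input. -->
        <ol class="footnotes">
          <xsl:apply-templates select="//text:note" mode="footnotes"/>
        </ol>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="text:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Main pass: output a linked citation and skip the note body. -->
  <xsl:template match="text:note">
    <sup><a href="#{@text:id}"><xsl:value-of select="text:note-citation"/></a></sup>
  </xsl:template>

  <!-- Footnotes pass: output the note body under a matching id. -->
  <xsl:template match="text:note" mode="footnotes">
    <li id="{@text:id}">
      <xsl:apply-templates select="text:note-body/text:p/node()"/>
    </li>
  </xsl:template>
</xsl:stylesheet>
```

The link between reference and footnote relies on the text:id attribute already present on each text:note, so no extra numbering has to be generated.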
You could use <xsl:apply-templates mode="..."/>. I'm not sure of the exact syntax for your use case, but the example below may give you a clue on how to approach your problem.
The basic idea is to process your nodes twice. The first iteration would be pretty much the same as now, and the second iteration only looks for footnotes and only outputs those. You differentiate the iterations by setting the mode attribute.
Note that I used different tags than in your code, to keep the example simpler.
XSLT sheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" />
<xsl:template match="doc">
<xml>
<!-- First iteration - skip footnotes -->
<doc>
<xsl:apply-templates select="text" />
</doc>
<!-- Second iteration, extract all footnotes.
'mode' = footnotes -->
<footnotes>
<xsl:apply-templates select="text" mode="footnotes" />
</footnotes>
</xml>
</xsl:template>
<!-- Note: no mode attribute -->
<xsl:template match="text">
<text>
<xsl:for-each select="p">
<p>
<xsl:value-of select="text()" />
</p>
</xsl:for-each>
</text>
</xsl:template>
<!-- Note: mode = footnotes -->
<xsl:template match="text" mode="footnotes">
<xsl:for-each select=".//footnote">
<footnote>
<xsl:value-of select="text()" />
</footnote>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<doc>
<text>
<p>
some text
<footnote>footnote1</footnote>
</p>
<p>
other text
<footnote>footnote2</footnote>
</p>
</text>
<text>
<p>
some text2
<footnote>footnote3</footnote>
</p>
<p>
other text2
<footnote>footnote4</footnote>
</p>
</text>
</doc>
Output XML:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<!-- Output from first iteration -->
<doc>
<text>
<p>some text</p>
<p>other text</p>
</text>
<text>
<p>some text2</p>
<p>other text2</p>
</text>
</doc>
<!-- Output from second iteration -->
<footnotes>
<footnote>footnote1</footnote>
<footnote>footnote2</footnote>
<footnote>footnote3</footnote>
<footnote>footnote4</footnote>
</footnotes>
</xml>