outputting entities in xml source to valid html - xslt

I'm trying to transform some XML whose content is encoded as HTML entities. I want to output that content as valid HTML.
The XML looks like this:
<?xml version="1.0" encoding="UTF-8"?><memo Version="1.0">
<header>
<meta title="==PROGRAMMING=="/>
<meta favourite="false"/>
<meta uuid="85f94ab2-77a8-XXXXXXXXXXXXXXX"/>
<meta createdTime="1551038092051"/>
</header>
<contents>
<content>&lt;p value="memo2" &gt;=====&lt;/p&gt;&lt;p&gt;https://medium.freecodecamp.org/&lt;/p&gt;&lt;p&gt;=====&lt;/p&gt;
</content>
</contents>
</memo>
I have some XSLT like this:
xslt_src = '''
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<xsl:apply-templates select="memo/header/meta"/>
</head>
<body>
<xsl:apply-templates select="memo/contents"/>
</body>
</html>
</xsl:template>
<xsl:template match="memo/header/meta">
<xsl:apply-templates select="@title"/>
</xsl:template>
<xsl:template match="memo/contents">
<div class='content'>
<xsl:value-of select="content/text()"/>
</div>
</xsl:template>
<xsl:template match="@title">
<span id='title'>
<xsl:value-of select="."/>
</span>
</xsl:template>
</xsl:stylesheet>
'''
I process it with lxml in Python:
from lxml import etree
xslt = etree.XML(xslt_src)
transform = etree.XSLT(xslt)
src = open('simple.xml').read()
xml = etree.XML(str.encode(src))
result = transform(xml)
root = result.getroot()
print('-----------------------out 1')
print(etree.tostring(root, pretty_print=True).decode('utf-8'))
print('-----------------------out 2')
content = root.xpath('/html/body/div/text()')
print(content[0])
etree.tostring(root) prints the structured document, but leaves the HTML entities encoded as they were in the original XML:
-----------------------out 1
<html>
<head>
<span id="title">==PROGRAMMING==</span>
</head>
<body>
<div class="content">&lt;p value="memo2" &gt;=====&lt;/p&gt;&lt;p&gt;https://medium.freecodecamp.org/&lt;/p&gt;&lt;p&gt;=====&lt;/p&gt;
</div>
</body>
</html>
But if I print root.xpath('/html/body/div/text()')[0] (the text node holding the HTML content), I get what I want:
-----------------------out 2
<p value="memo2" >=====</p><p>https://medium.freecodecamp.org/</p><p>=====</p>
My question is: how can I make etree.tostring(root) output that content as real HTML markup instead of escaped entities, as it appears when I print the text node directly?
Cheers!
bitrat

Instead of:
<xsl:value-of select="content/text()"/>
try:
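<xsl:value-of select="content" disable-output-escaping="yes"/>
With disable-output-escaping="yes", the string value of content is written out as-is, so the < and & characters in it are not escaped a second time. The XSLT 1.0 specification treats this as an optional feature that only works when the processor itself controls how the result tree is serialized, so use it only when there is no alternative.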
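If the result is post-processed in Python anyway, a different technique (not the XSLT fix above) is to re-parse the decoded text of content as a fragment, so it becomes real child elements instead of escaped text. A minimal sketch with the standard library, assuming the embedded markup is well-formed XML (HTML such as an unclosed <br> would make the second parse fail); MEMO is a trimmed copy of the question's input and content_as_html is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the memo from the question; the markup inside <content>
# is stored as escaped text (&lt;p&gt;...), not as child elements.
MEMO = """<memo Version="1.0">
<contents>
<content>&lt;p value="memo2" &gt;=====&lt;/p&gt;&lt;p&gt;https://medium.freecodecamp.org/&lt;/p&gt;&lt;p&gt;=====&lt;/p&gt;
</content>
</contents>
</memo>"""

def content_as_html(memo_xml: str) -> str:
    """Re-parse the escaped text of <content> into real elements inside a div."""
    root = ET.fromstring(memo_xml)
    # The XML parser has already decoded the entities, so .text is raw markup.
    markup = root.find("contents/content").text.strip()
    # Wrap the fragment in one element so it has a single root, then parse it.
    # This only works if the embedded markup is well-formed XML.
    div = ET.fromstring(f'<div class="content">{markup}</div>')
    return ET.tostring(div, encoding="unicode")

print(content_as_html(MEMO))
# <div class="content"><p value="memo2">=====</p><p>https://medium.freecodecamp.org/</p><p>=====</p></div>
```

Because the paragraphs are now element nodes, any serializer (including etree.tostring in lxml) writes them as markup rather than as &lt;p&gt; text.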

Related

XSL embed original XML file

I am transforming XML files with XSL to HTML files. Is it possible to embed the original XML file in the HTML output? If so, how?
Update 1: To clarify what I need: my HTML file should contain a form from which the original XML file can be downloaded. Therefore I have to embed the original XML file in my HTML file (e.g. as a hidden input field).
Thanks
If you want to copy the nodes through, you can simply use <xsl:copy-of select="/"/> where you want to insert them. However, putting arbitrary XML nodes into HTML usually does not make sense. If you want to serialize an XML document to plain text to render it, you can use a solution such as http://lenzconsulting.com/xml-to-string/, for instance:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:import href="http://lenzconsulting.com/xml-to-string/xml-to-string.xsl"/>
<xsl:output method="html" doctype-public="XSLT-compat" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<xsl:template match="/">
<html>
<head>
<title>Test</title>
</head>
<body>
<section>
<h1>Test</h1>
<xsl:apply-templates/>
<section>
<h2>Source</h2>
<pre>
<xsl:apply-templates mode="xml-to-string"/>
</pre>
</section>
</section>
</body>
</html>
</xsl:template>
<xsl:template match="data">
<ul>
<xsl:apply-templates/>
</ul>
</xsl:template>
<xsl:template match="item">
<li>
<xsl:apply-templates/>
</li>
</xsl:template>
</xsl:transform>
transforms an XML input like
<data>
<item att="value">
<!-- comment -->
<foo>bar</foo>
</item>
</data>
into the HTML
<!DOCTYPE html
PUBLIC "XSLT-compat">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Test</title>
</head>
<body>
<section>
<h1>Test</h1>
<ul>
<li>
bar
</li>
</ul>
<section>
<h2>Source</h2><pre>&lt;data&gt;
&lt;item att="value"&gt;
&lt;!-- comment --&gt;
&lt;foo&gt;bar&lt;/foo&gt;
&lt;/item&gt;
&lt;/data&gt;</pre></section>
</section>
</body>
</html>
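For comparison, when the page is generated from Python instead of XSLT, the same idea (serialize the XML to a string and let HTML escaping protect it) can be sketched with the standard library. The function name embed_source and the hidden field name "source" are hypothetical; the escaped text works both as <pre> content for display and as a hidden input value that a form can submit back:

```python
import html
import xml.etree.ElementTree as ET

SOURCE = '<data><item att="value"><!-- comment --><foo>bar</foo></item></data>'

def embed_source(xml_text: str) -> str:
    """Return an HTML fragment that both shows and carries the original XML."""
    ET.fromstring(xml_text)  # raises ParseError if the input is not well-formed
    # html.escape turns &, <, > and both quote characters into references,
    # so the XML survives as plain text in element content and attribute values.
    escaped = html.escape(xml_text)
    return (
        '<section><h2>Source</h2>'
        f'<pre>{escaped}</pre>'
        f'<input type="hidden" name="source" value="{escaped}">'
        '</section>'
    )

print(embed_source(SOURCE))
```

A browser decodes the references again, so the <pre> shows the original text and the submitted field value is the original XML string.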

using XSLT to display image series with captions in XHTML

I'm trying to display a series of images, each with its own caption, using XSLT. I've coded the images and the captions by nesting <img> and then <figcaption> within <figure>, but the resulting HTML does not display as intended (the captions do not line up with the corresponding images). Is there a way to nest the <xsl:for-each> for the captions within the one for the images? Here's the XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output method="html"/>
<xsl:template match="letter">
<html>
<head>
<style type="text/css">
#wrapper {min-height: 100%;}
#figcaption {
text-align: left;
}
#main {
padding-top: 15px;;
width: 1200px;
}
</style>
</head>
<body>
<div id="wrapper">
<div id="images">
<figure>
<xsl:if test="image">
<xsl:for-each select="image/@xlink:href">
<img>
<xsl:attribute name="src">
<xsl:value-of select="."/>
</xsl:attribute>
</img>
</xsl:for-each>
</xsl:if>
<xsl:if test="image/@label">
<xsl:for-each select="image/@label">
<figcaption><xsl:value-of select="."/></figcaption>
</xsl:for-each>
</xsl:if>
</figure>
</div>
</div>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Here's the corresponding XML:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="XSLT.xsl"?>
<letter xmlns:xlink="http://www.w3.org/1999/xlink">
<image label="page 1" xlink:href="http://tinyurl.com/nu7zmhc"/>
<image label="page 2" xlink:href="http://tinyurl.com/pysyztr"/>
<title>Letter from Shawn Schuyler</title>
<date>1963-06-30</date>
<language>English</language>
<creator>
<firstName>William</firstName>
<lastName>Schultz</lastName>
<street>United States Disciplinary Barracks</street>
<city>Fort Leavenworth</city>
<state abbr="KS">Kansas</state>
</creator>
</letter>
My desired output in HTML is basically this, for each image:
<figure>
<img src='image.jpg'/>
<figcaption>Caption</figcaption>
</figure>
Or simply:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xlink="http://www.w3.org/1999/xlink"
exclude-result-prefixes="xlink">
<xsl:template match="/letter">
<html>
<head>
<style type="text/css">
#wrapper {min-height: 100%;}
#figcaption {
text-align: left;
}
#main {
padding-top: 15px;;
width: 1200px;
}
</style>
</head>
<body>
<div id="wrapper">
<div id="images">
<xsl:for-each select="image">
<figure>
<img src='{@xlink:href}'/>
<figcaption>
<xsl:value-of select="@label"/>
</figcaption>
</figure>
</xsl:for-each>
</div>
</div>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Note:
There's nothing wrong with using xsl:for-each, especially in a simple case like this.
There is something wrong with using xsl:element when you can use a literal result element. And while XSLT is naturally verbose, using attribute value templates can reduce the code (quite significantly, as you can see in this case).
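The essential fix is to iterate over each image element once and read both attributes from that same element, instead of looping over all hrefs and then over all labels. The same grouping can be sketched in Python with ElementTree; this is an illustration of the structure, not XSLT, and figures_html is a hypothetical helper name (the input is a trimmed copy of the question's XML):

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

LETTER = f"""<letter xmlns:xlink="{XLINK}">
<image label="page 1" xlink:href="http://tinyurl.com/nu7zmhc"/>
<image label="page 2" xlink:href="http://tinyurl.com/pysyztr"/>
<title>Letter from Shawn Schuyler</title>
</letter>"""

def figures_html(letter_xml: str) -> str:
    """Build one <figure> per <image>, pairing each src with its own caption."""
    root = ET.fromstring(letter_xml)
    images = ET.Element("div", id="images")
    for image in root.findall("image"):  # one iteration per image element
        figure = ET.SubElement(images, "figure")
        # Namespaced attributes are addressed as {namespace-uri}local-name.
        ET.SubElement(figure, "img", src=image.get(f"{{{XLINK}}}href"))
        caption = ET.SubElement(figure, "figcaption")
        caption.text = image.get("label")
    # method="html" writes void elements such as img without a closing tag.
    return ET.tostring(images, encoding="unicode", method="html")

print(figures_html(LETTER))
```

Each figure now contains exactly one img followed by its own figcaption, which is what the nested for-each in the question was trying to achieve.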
Try this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output method="html" indent="yes" />
<xsl:template match="letter">
<html>
<head>
<style type="text/css">
#wrapper {min-height: 100%;}
#figcaption {
text-align: left;
}
#main {
padding-top: 15px;;
width: 1200px;
}
</style>
</head>
<body>
<div id="wrapper">
<div id="images">
<xsl:apply-templates select="./image"></xsl:apply-templates>
</div>
</div>
</body>
</html>
</xsl:template>
<xsl:template match="letter/image">
<xsl:element name="figure">
<xsl:element name="img">
<xsl:attribute name="src">
<xsl:value-of select="./@xlink:href"/>
</xsl:attribute>
</xsl:element>
<xsl:apply-templates select="./@label"></xsl:apply-templates>
</xsl:element>
</xsl:template>
<xsl:template match="letter/image/@label">
<xsl:element name="figcaption">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
xsl:apply-templates selects the nodes given by its select expression (the leading ./ means "relative to the current node") and inserts the output of whichever templates match those nodes at that point.
xsl:template declares, through the pattern in its match attribute, which nodes it handles. Conceptually, each selected node is processed independently, and the results are stitched together in the order the apply-templates instructions dictate.
NB: depending on your XSLT engine, <xsl:output method="html"/> may affect your img element differently. In HTML, img is a void element that has no end tag, so with the html output method the engine writes <img ...> without closing it. Arguments about whether that inconsistency with XML is a good choice can be found throughout the net.
Ref: Are (non-void) self-closing tags valid in HTML5?
A good article on this alternate approach to for-each can be found here: http://gregbee.ch/blog/using-xsl-for-each-is-almost-always-wrong
You'll find with XSLT that once the concept of templates clicks, your code becomes much shorter and simpler to maintain.

output escaping in alt text with xslt1

In my source XML, the less-than sign is represented as &lt;, but in the output (HTML, as alt text) it is written as a literal < sign, which causes problems in post-processing.
I'm using saxon655 with this command line:
java -cp saxon655/saxon.jar com.icl.saxon.StyleSheet test.xml test.xsl
This really doesn't make sense to me. Here are the details:
The DocBook XML:
<chapter xmlns="http://docbook.org/ns/docbook">
<info><title>The Chapter</title></info>
<para>
<informalequation>
<mediaobject>
<imageobject>
<imagedata fileref="images/g0589.png" />
</imageobject>
<textobject role="tex"><phrase>|z_ s-z_ t|&lt;r</phrase></textobject>
</mediaobject>
</informalequation>
</para>
</chapter>
The XSLT (if you copy this, change the path to the DocBook stylesheets):
<xsl:stylesheet version="1.0"
xmlns:d="http://docbook.org/ns/docbook"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="/path/to/docbook/xsl-1.78.1/html/docbook.xsl" />
<xsl:template match="d:mediaobject/d:imageobject/d:imagedata">
<xsl:element name="img">
<xsl:attribute name="alt">
<xsl:value-of select="../../d:textobject[@role='tex']/d:phrase" />
</xsl:attribute>
<xsl:attribute name="src">
<xsl:value-of select="@fileref" />
</xsl:attribute>
</xsl:element>
<xsl:apply-templates />
</xsl:template>
</xsl:stylesheet>
And the resulting HTML portion:
<div class="informalequation">
<div class="mediaobject">
<img alt="|z_ s-z_ t|<r" src="images/g0589.png"></div>
</div>
Am I doing something wrong?
According to the W3C HTML validator, the output is fine for text/html. I created a minimal HTML 4.01 document with your markup at http://home.arcor.de/martin.honnen/html/test2015040301.html; it has the content
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>img alt attribute test</title>
</head>
<body>
<div class="informalequation">
<div class="mediaobject">
<img alt="|z_ s-z_ t|<r" src="images/g0589.png"></div>
</div>
</body>
</html>
and the validator (http://validator.w3.org/check?uri=http%3A%2F%2Fhome.arcor.de%2Fmartin.honnen%2Fhtml%2Ftest2015040301.html&charset=%28detect+automatically%29&doctype=Inline&group=0) reports "This document was successfully checked as HTML 4.01 Strict!". So I think Saxon is creating correct HTML: the XSLT 1.0 html output method is specified not to escape < characters in attribute values. I don't know how you post-process the result of the XSLT transformation, but an HTML or SGML parser should handle it fine.
With XML output (method="xml"), Saxon does escape the less-than sign in the attribute value.
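Python's ElementTree serializer happens to draw the same distinction between its html and xml methods, which makes the difference easy to see side by side. This is only an illustration of the two serialization rules, not Saxon, and AltCollector is a hypothetical helper; the standard-library HTML parser then shows that the unescaped < is read back as the intended attribute value:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

ALT = "|z_ s-z_ t|<r"

img = ET.Element("img", alt=ALT, src="images/g0589.png")
as_html = ET.tostring(img, encoding="unicode", method="html")
as_xml = ET.tostring(img, encoding="unicode", method="xml")
print(as_html)  # <img alt="|z_ s-z_ t|<r" src="images/g0589.png">
print(as_xml)   # <img alt="|z_ s-z_ t|&lt;r" src="images/g0589.png" />

class AltCollector(HTMLParser):
    """Collects the decoded alt values of img start tags."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.alts.append(dict(attrs).get("alt"))

parser = AltCollector()
parser.feed(as_html)
parser.close()
print(parser.alts)  # ['|z_ s-z_ t|<r']
```

If the post-processing step uses an XML parser instead, serializing with the XML method (or producing XHTML) avoids the problem.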

catch the first occurrence value

I have the XML below:
<chapter num="1">
<section level="sect2">
<page>22</page>
</section>
<section level="sect3">
<page>23</page>
</section>
</chapter>
Here I'm trying to get the first occurrence of <page>.
I'm using the XSLT below:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ntw="Number2Word.uri" exclude-result-prefixes="ntw">
<xsl:output method="html"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="ThisDocument" select="document('')"/>
<xsl:template match="/">
<xsl:text disable-output-escaping="yes"><![CDATA[<!DOCTYPE html>]]></xsl:text>
<html>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="chapter">
<section class="tr_chapter">
<xsl:value-of select="//page[1]/text()"/>
<div class="chapter">
</div>
</section>
</xsl:template>
</xsl:stylesheet>
But in the output I get, all the page values are printed. I only want the first one.
Current output:
<!DOCTYPE html>
<html>
<body>
<section class="tr_chapter">2223
<div class="chapter">
</div>
</section>
</body>
</html>
The page values are printed here after <section class="tr_chapter">. I want only 22, but I'm getting 2223.
I'm using //page[1]/text() because I'm not sure whether the page element comes within a section; its position varies.
Please let me know how I can get only the first page value.
Here is the transformation: http://xsltransform.net/3NzcBsR
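//page[1] is an abbreviation of /descendant-or-self::node()/child::page[1], so the [1] predicate is applied per parent: it selects every page element that is the first page child of its own parent. Each section here has one, so both values are output. Put the path in parentheses so that [1] applies to the whole result instead. Try: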
<xsl:value-of select="(//page)[1]"/>
http://xsltransform.net/3NzcBsR/1
Note that this gets the value of the first page element in the entire document.
If you want to search the contents of the chapter context element in your template for the first page descendant then use <xsl:value-of select="descendant::page[1]"/> or <xsl:value-of select="(.//page)[1]"/>.
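The per-parent behaviour of a positional predicate can be checked without an XSLT processor: ElementTree's limited XPath subset evaluates [1] against each element's parent in the same way. ElementTree has no parenthesized (//page)[1] syntax, so in this sketch find(), which returns the first match in document order, stands in for it:

```python
import xml.etree.ElementTree as ET

CHAPTER = """<chapter num="1">
<section level="sect2"><page>22</page></section>
<section level="sect3"><page>23</page></section>
</chapter>"""

root = ET.fromstring(CHAPTER)

# Like //page[1]: [1] is tested against each page's own parent, so both match.
per_parent = [p.text for p in root.findall(".//page[1]")]

# Like (//page)[1]: only the first page element in document order.
first = root.find(".//page").text

print(per_parent)  # ['22', '23']
print(first)       # 22
```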

XSLT validation error javax.xml.transform.TransformerConfigurationException:

I'm getting this error when I try to validate my XSLT:
javax.xml.transform.TransformerConfigurationException:
javax.xml.transform.TransformerException:
javax.xml.transform.TransformerException:
A node test that matches either NCName:* or QName was expected.
This is my XSLT:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"><xsl:output method="html" />
<xsl:template match="\Apps">
<html>
<head> <title>Apps List</title>
<link rel="StyleSheet" href="table_style.css" type="text/css"/>
<style type="text/css">
body {font-family: Helvetica;}
h1 { color : Grey;}
h2 {color : Blue;}</style>
</head>
<body>
<h1> Apps List: <xsl:value-of select="\@List_Type" /></h1>
<p>This is a list of all currently hot apps:</p>
<xsl:for-each select="\App">
<xsl:if test="\App\@installed == true">
<h2 style="color:Green;"><xsl:value-of select="\App\app_name" />(installed)</h2>
</xsl:if>
<xsl:otherwise>
<h2><xsl:value-of select="\App\app_name" /></h2>
</xsl:otherwise>
<p style="font-style:bold;">App info:</p>
<table id="#gradient-style">
<tr><th>Category:</th><td><xsl:value-of select="\App\catogry" /></td></tr>
<tr><th>Version:</th><td><xsl:value-of select="\App\version" /></td></tr>
<tr><th>Description:</th><td><xsl:value-of select="\App\description" /></td></tr>
<tr><th>App Reviews:</th><td><xsl:for-each select="\App\reviews\review">
<span style="font-style:bold;"><xsl:value-of select="\App\reviews\review\reviewer_name" /></span>
| <xsl:value-of select="\App\reviews\review\review_date" />
| <xsl:value-of select="\App\reviews\review\review_Time" /><br/>
<span style="font-style:bold;">Rating:</span>
<xsl:value-of select="string(\App\reviews\review\rating" /> <br/>
<xsl:value-of select="\App\reviews\review\ontent" /><br/>
----------------------------------------------------------
</xsl:for-each>
</td></tr>
</table>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
This is the XML I tried it with:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="ShdenXSLT.xsl"?>
<Apps xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" List_Type="new releases" >
<App device_type="tablet" app_id="120">
<app_name>Meeting Manager</app_name>
<catogry>LifeStyle </catogry>
<catogry>Bussnisse </catogry>
<version>1.0</version>
<description>This app is about managing the bussnisse meeting</description>
<reviews>
<review>
<reviewer_name>Shaden</reviewer_name>
<review_date>2012-02-13</review_date>
<review_time>11:35:02</review_time>
<content>it was a useful app</content>
<rating>4.5</rating>
</review>
<review>
<reviewer_name>Mohamed</reviewer_name>
<review_date>2012-03-01</review_date>
<review_time>12:15:00</review_time>
<content>i really loved this app</content>
<rating>5.0</rating>
</review>
</reviews>
</App>
<App device_type="tablet" app_id="100">
<app_name>ToDoList</app_name>
<catogry>LifeStyle </catogry>
<version>3.4.2</version>
<description>a simple To Do List applecation</description>
<reviews>
<review>
<reviewer_name>Fahad</reviewer_name>
<review_date>2010-02-05</review_date>
<review_time>09:40:55</review_time>
<content>nice app</content>
<rating>4.0</rating>
</review>
</reviews>
</App>
</Apps>
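You are using the backslash (\) as your XPath separator (e.g. <xsl:value-of select="\@List_Type" />), which is incorrect: XPath uses the forward slash (/). Note also that inside <xsl:for-each select="App"> paths are evaluated relative to the current App, so write app_name rather than /App/app_name. Once the separators are fixed, the processor will report the remaining errors, such as xsl:otherwise used outside xsl:choose, == instead of XPath's =, and the missing ) in string(...).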
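Python's ElementTree uses the same forward-slash path syntax (a limited XPath subset), so the intended navigation can be sketched outside XSLT: read @List_Type from the root, loop over App children, and use paths relative to each App inside the loop. The input is a trimmed copy of the question's XML and summarize_apps is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

APPS = """<Apps List_Type="new releases">
<App device_type="tablet" app_id="120">
<app_name>Meeting Manager</app_name>
<version>1.0</version>
<reviews>
<review><reviewer_name>Shaden</reviewer_name><rating>4.5</rating></review>
<review><reviewer_name>Mohamed</reviewer_name><rating>5.0</rating></review>
</reviews>
</App>
<App device_type="tablet" app_id="100">
<app_name>ToDoList</app_name>
<version>3.4.2</version>
<reviews>
<review><reviewer_name>Fahad</reviewer_name><rating>4.0</rating></review>
</reviews>
</App>
</Apps>"""

def summarize_apps(apps_xml: str) -> list:
    root = ET.fromstring(apps_xml)
    lines = [f"Apps List: {root.get('List_Type')}"]  # like @List_Type on the root
    for app in root.findall("App"):  # like <xsl:for-each select="App">
        # Paths inside the loop are relative to the current App, with "/" separators.
        reviewers = [r.findtext("reviewer_name") for r in app.findall("reviews/review")]
        lines.append(f"{app.findtext('app_name')} {app.findtext('version')}: {', '.join(reviewers)}")
    return lines

for line in summarize_apps(APPS):
    print(line)
```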