Dereferencing entities using XSLT

I have XML files that use embedded entity declarations. In the result file I need to replace the entity names with the actual files they point to. But as far as I know, entities cannot be referenced from an XSLT stylesheet, since they are supposed to be handled by the XML parser.
Sample source file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic SYSTEM "../../../App/Converted-DTD.dtd" [
<!-- Begin Document Specific Declarations -->
<!NOTATION pdf SYSTEM "">
<!ENTITY image1 SYSTEM "graphics/img-1.svg" NDATA pdf>
<!ENTITY image2 SYSTEM "graphics/img-2.svg" NDATA pdf>
<!-- End Document Specific Declarations -->
]>
<topic>
<title>My test topic</title>
<body>
<p>This is an image:
<image entity="image1"/></p>
<p>This is another image
<image entity="image2"/></p>
</body>
</topic>
What I need is a file that changes the entity reference to an actual href:
<topic>
<title>My test topic</title>
<body>
<p>This is an image:
<image href="graphics/img-1.svg"/></p>
<p>This is another image
<image href="graphics/img-2.svg"/></p>
</body>
</topic>
I cannot rely on the entities being declared in the external DTD, as every file has different entities pointing to different files.
Maybe it is trivial, but I could not find a way to get at the embedded, document-specific declarations.

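You can try the following XSLT 2.0 template (xsl:attribute with a select attribute needs 2.0 or later), together with an identity template that copies everything else:
<xsl:template match="image/@entity">
<xsl:attribute name="href" select="unparsed-entity-uri(.)"/>
</xsl:template>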
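The lookup that `unparsed-entity-uri()` performs can also be checked outside XSLT, which helps confirm that the parser really reports the declarations from the internal subset. The following is a minimal Python sketch using only the standard library, not part of the answer above. The function names `unparsed_entities` and `entity_to_href` are hypothetical. It assumes the entity name always sits in an `entity` attribute on `image` elements, and it copies the system identifier verbatim, whereas an XSLT processor's `unparsed-entity-uri()` may return it resolved against the document's base URI.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def unparsed_entities(xml_text: str) -> dict:
    """Map each unparsed (NDATA) entity name to its system identifier."""
    found = {}
    parser = expat.ParserCreate()

    # Expat reports <!ENTITY name SYSTEM "..." NDATA notation> declarations
    # from the internal DTD subset through this callback; the external DTD
    # named in the DOCTYPE is not fetched.
    def on_unparsed(name, base, system_id, public_id, notation):
        found[name] = system_id

    parser.UnparsedEntityDeclHandler = on_unparsed
    parser.Parse(xml_text, True)
    return found


def entity_to_href(xml_text: str) -> ET.Element:
    """Replace image/@entity with image/@href holding the entity's system ID."""
    entities = unparsed_entities(xml_text)
    root = ET.fromstring(xml_text)
    for image in root.iter("image"):
        name = image.attrib.pop("entity", None)
        if name is not None:
            # Unknown entity names map to an empty href instead of raising.
            image.set("href", entities.get(name, ""))
    return root


SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic SYSTEM "Converted-DTD.dtd" [
<!NOTATION pdf SYSTEM "">
<!ENTITY image1 SYSTEM "graphics/img-1.svg" NDATA pdf>
<!ENTITY image2 SYSTEM "graphics/img-2.svg" NDATA pdf>
]>
<topic><body>
<p>This is an image: <image entity="image1"/></p>
<p>This is another image <image entity="image2"/></p>
</body></topic>"""

if __name__ == "__main__":
    print(ET.tostring(entity_to_href(SAMPLE), encoding="unicode"))
```

Because each file carries its own internal subset, the mapping is rebuilt per document, which matches the requirement that every file declares different entities.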


How can I get the text value from a body node?

I need to get a node if it contains text.
So when I process a <p> element, I need to check whether the previous <topic> has text in its <body> or in any child element of <body>.
I use the following XPath expression:
ancestor::topic[1]/preceding-sibling::topic[1]/body/child::node()[(self::text() and normalize-space()) or self::*][position() = last()]
But for some reason it is not working. Why?
<topic>
<body>Topic 3 with only a paragraph, no topic title</body>
</topic>
<topic>
<body>
<p> <!-- from here -->
<image href="" />
</p> <!-- and from here -->
</body>
</topic>
<topic>
<body>Topic 5 with only a paragraph, no topic title</body>
</topic>
I think you need to tell us in more detail what you are trying and how exactly it fails for you. When I convert your input snippet into a well-formed input document
<root>
<topic>
<body>Topic 3 with only a paragraph, no topic title</body>
</topic>
<topic>
<body>
<p> <!-- from here -->
<image href="" />
</p> <!-- and from here -->
</body>
</topic>
<topic>
<body>Topic 5 with only a paragraph, no topic title</body>
</topic>
</root>
and run it through a stylesheet that matches a p element with your posted condition in a predicate, it finds the single p you have, i.e.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0">
<xsl:mode on-no-match="shallow-skip"/>
<xsl:template match="p[ancestor::topic[1]/preceding-sibling::topic[1]/body/child::node()[(self::text() and normalize-space()) or self::*][position() = last()]]">
found
</xsl:template>
</xsl:stylesheet>
outputs found, so the match happens.
So please explain, with minimal but complete samples, what you are trying, which output you expect and how it fails (i.e. which error or which wrong output you get); then we can perhaps tell you what is wrong.
Sorry for the non-answer, but I couldn't stuff the code example that shows your expression does seem to work into a comment.
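To see what the expression actually tests, it can help to spell out each step imperatively. The sketch below is a hypothetical Python re-implementation using only `xml.etree.ElementTree`, whose limited path language has no `ancestor` or `preceding-sibling` axes (hence the manual parent map); `prev_topic_has_content` is an illustrative name, not an existing API. As a boolean, the trailing `[position() = last()]` does not change the outcome: it keeps one node of a non-empty set, so only the existence of a qualifying child of `body` matters.

```python
import xml.etree.ElementTree as ET


def prev_topic_has_content(root: ET.Element, p: ET.Element) -> bool:
    """Mimic the predicate
    ancestor::topic[1]/preceding-sibling::topic[1]/body/child::node()
        [(self::text() and normalize-space()) or self::*]
    """
    parent = {child: par for par in root.iter() for child in par}

    # ancestor::topic[1]: the nearest enclosing topic
    topic = parent.get(p)
    while topic is not None and topic.tag != "topic":
        topic = parent.get(topic)
    if topic is None or topic not in parent:
        return False

    # preceding-sibling::topic[1]: the nearest earlier sibling named topic
    siblings = [c for c in parent[topic] if c.tag == "topic"]
    index = siblings.index(topic)
    if index == 0:
        return False
    body = siblings[index - 1].find("body")
    if body is None:
        return False

    # A non-blank leading text node, or any element child, qualifies.
    # (Tail text always follows an element child, so the element test covers it.)
    return bool(body.text and body.text.strip()) or len(body) > 0


SAMPLE = """<root>
<topic><body>Topic 3 with only a paragraph, no topic title</body></topic>
<topic><body><p><image href=""/></p></body></topic>
<topic><body>Topic 5 with only a paragraph, no topic title</body></topic>
</root>"""

if __name__ == "__main__":
    doc = ET.fromstring(SAMPLE)
    print(prev_topic_has_content(doc, doc.find(".//p")))
```

Run against the same input as the answer's stylesheet, it reaches the same conclusion: the p in the second topic qualifies because the first topic's body has non-blank text.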

XSL embed original XML file

I am transforming XML files with XSL to HTML files. Is it possible to embed the original XML file in the HTML output? When yes, how is that possible?
Update 1: To make my need clearer: in my HTML file, I want a form from which I can download the original XML file. Therefore I have to embed the original XML file in my HTML file (e.g. as a hidden input field).
Thanks
If you want to copy the nodes through, you can simply use <xsl:copy-of select="/"/> where you want to insert them; however, putting arbitrary XML nodes into HTML usually does not make sense. If you want to serialize an XML document to plain text in order to render it, you can use solutions like http://lenzconsulting.com/xml-to-string/, for instance:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:import href="http://lenzconsulting.com/xml-to-string/xml-to-string.xsl"/>
<xsl:output method="html" doctype-public="XSLT-compat" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<xsl:template match="/">
<html>
<head>
<title>Test</title>
</head>
<body>
<section>
<h1>Test</h1>
<xsl:apply-templates/>
<section>
<h2>Source</h2>
<pre>
<xsl:apply-templates mode="xml-to-string"/>
</pre>
</section>
</section>
</body>
</html>
</xsl:template>
<xsl:template match="data">
<ul>
<xsl:apply-templates/>
</ul>
</xsl:template>
<xsl:template match="item">
<li>
<xsl:apply-templates/>
</li>
</xsl:template>
</xsl:transform>
transforms an XML input like
<data>
<item att="value">
<!-- comment -->
<foo>bar</foo>
</item>
</data>
into the HTML
<!DOCTYPE html
PUBLIC "XSLT-compat">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Test</title>
</head>
<body>
<section>
<h1>Test</h1>
<ul>
<li>
bar
</li>
</ul>
<section>
<h2>Source</h2><pre>&lt;data&gt;
&lt;item att="value"&gt;
&lt;!-- comment --&gt;
&lt;foo&gt;bar&lt;/foo&gt;
&lt;/item&gt;
&lt;/data&gt;</pre></section>
</section>
</body>
</html>
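For the update's actual goal (offering the original XML for download from the HTML page), the source text only has to be escaped, not reparsed; in XSLT 3.0 the standard `serialize()` function can produce the string inside the stylesheet, much as xml-to-string does above. The sketch below shows just the escaping step in Python with `html.escape`, as an illustration rather than an XSLT solution. The `embed_source` function, the `name="source"` field and the form without an action are hypothetical placeholders; a real page still needs whatever handler turns the posted value into a download.

```python
import html


def embed_source(xml_text: str) -> str:
    """Return an HTML fragment that both shows and carries the original XML."""
    # quote=True also escapes " and ', which matters inside value="..."
    escaped = html.escape(xml_text, quote=True)
    return (
        "<section><h2>Source</h2>"
        f"<pre>{escaped}</pre>"
        '<form method="post">'  # hypothetical: action and handler not shown
        f'<input type="hidden" name="source" value="{escaped}">'
        '<button type="submit">Download XML</button>'
        "</form></section>"
    )


SOURCE = """<data>
<item att="value">
<!-- comment -->
<foo>bar</foo>
</item>
</data>"""

if __name__ == "__main__":
    print(embed_source(SOURCE))
```

Escaping is lossless here: unescaping the field value on the receiving side gives back the original text, including the comment.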

Saxon including boolean itemscope value and closing source tag in HTML output

I'm using Saxon-HE 9.6.0.1J from Saxonica to generate HTML documents (xsl:output method="html"). It's generally good at omitting the value of boolean attributes and closing tags for empty elements, but I've found a few situations where it fails:
The microdata itemscope="itemscope" attribute is not output as simply itemscope
empty source elements are given closing tags
Here is an example stylesheet that demonstrates the problem:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="html" encoding="utf-8" include-content-type="no"/>
<xsl:template match="/">
<html>
<head>
<meta charset="UTF-8" />
<title>HTML test</title>
</head>
<body>
<div itemscope="itemscope" itemtype="http://example.com/dummy/">
<span itemprop="prop1">val1</span>
</div>
<audio autoplay="autoplay" controls="controls">
<source type="audio/mpeg" src="example.mp3" />
<source type="audio/x-wav" src="example.wav" />
</audio>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Sample XML:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="example.xsl"?>
<example/>
Command:
java -cp saxon9he.jar net.sf.saxon.Transform -s:example.xml -a
Results:
<html>
<head>
<meta charset="UTF-8">
<title>HTML test</title>
</head>
<body>
<div itemscope="itemscope" itemtype="http://example.com/dummy/"><span itemprop="prop1">val1</span></div>
<audio autoplay controls>
<source type="audio/mpeg" src="example.mp3"></source>
<source type="audio/x-wav" src="example.wav"></source>
</audio>
</body>
</html>
As demonstrated, meta is properly empty but source is not, and the values for autoplay and controls are properly omitted but not for itemscope.
Is this a bug, or am I missing a way to tell Saxon how to treat those elements and attributes? I've searched the docs on saxonica.com and the questions here for a clue, but have not found anything.
Thanks in advance!
Quick update: In XSLT 3.0, you can specify the html-version attribute on xsl:output, which you can set to 5.0 if you want to use XHTML5. Doing so solved the <source> issue for me while still using method="xhtml".
The "source" element is recognized as an empty element if you specify version="5.0" on xsl:output.
The list of attributes that Saxon recognizes as boolean attributes when you specify method="html" version="5.0" comes from here:
http://www.w3.org/TR/html5/index.html#attributes-1
which does not include "itemscope". I'm afraid I can't help you with the history of how it comes to be present in some flavours of HTML and not in the W3C flavour, but Saxon inevitably follows the W3C specs.
Perhaps we should provide some way of extending the list (if you're really keen you can do it by writing your own serializer factory class that customizes the Saxon serializer, but that's serious hackery).
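The rule described above (emit a listed boolean attribute in minimized form when its value equals its name, and never give a void element an end tag) can be sketched as a toy serializer. This is an illustration in Python, not Saxon's serializer or its extension API. The `VOID` and `BOOLEAN` sets are assumptions (the boolean set is deliberately partial), the case-insensitive comparison is a simplification, and the `extra_boolean` parameter only stands in for the kind of user-extensible list the answer suggests.

```python
from html import escape

# HTML5 void elements: never given an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# A partial list of HTML5 boolean attributes (assumption, not exhaustive).
BOOLEAN = {"autofocus", "autoplay", "checked", "controls", "disabled",
           "hidden", "loop", "multiple", "muted", "readonly", "required",
           "selected"}


def start_tag(tag: str, attrs: dict, extra_boolean: frozenset = frozenset()) -> str:
    """Serialize a start tag, minimizing recognized boolean attributes."""
    boolean = BOOLEAN | extra_boolean
    parts = [tag]
    for name, value in attrs.items():
        # Minimize only when the attribute is listed and its value equals
        # its name (compared case-insensitively in this sketch).
        if name in boolean and value.lower() == name:
            parts.append(name)
        else:
            parts.append(f'{name}="{escape(value)}"')
    return "<" + " ".join(parts) + ">"


def element(tag: str, attrs: dict, content: str = "", **kw) -> str:
    """Serialize an element; void elements get no end tag."""
    if tag in VOID:
        return start_tag(tag, attrs, **kw)
    return start_tag(tag, attrs, **kw) + content + f"</{tag}>"


if __name__ == "__main__":
    source = element("source", {"type": "audio/mpeg", "src": "example.mp3"})
    print(element("audio", {"autoplay": "autoplay", "controls": "controls"}, source))
    print(start_tag("div", {"itemscope": "itemscope"}, frozenset({"itemscope"})))
```

With `itemscope` absent from the list, the attribute is written out in full, which is the behaviour reported in the question; adding it to the list is all that minimization needs.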

Select DIV tag with XSLT?

I'm trying to create a new XHTML document from another XHTML document. I only want to use some of the DIV tags in the old XHTML document, but I'm not sure if I'm doing it right. To start, if I want to select a specific DIV tag with id="mbContent", could I use
<xsl:template match="x:div[@id='mbContent']">
This DIV tag contains other DIVs and content such as images. What do I do if I want to use the same CSS style that is applied to the content? Is there a way to copy the CSS style, or do I have to add a new CSS style, and how do I do that? Since the new XHTML document is going to be part of another XHTML document, I can't use a HEAD tag with a reference to the CSS stylesheet.
Hmm, but if I use the CSS stylesheet that is going to be in the HEAD of the main XHTML document, perhaps I could apply those CSS styles to this DIV? How do I apply styles in the new XHTML document?
I'm a little bit confused, but I hope my question isn't too confusing! :)
Hi! I need some more help, since the code below isn't working for me. This is the part that isn't working:
xmlns:x="http://www.w3.org/1999/xhtml" and "x:div[@id='mbContent']"
I think it's because I'm using a CMS tool with a proxy module that, for some strange reason, does not accept this code. Therefore I'm looking for an alternative way to add a CLASS or ID, and also add values, to DIV elements by using xsl:apply-templates select="//*[@id='mbSubMenu']" instead, together with copy as in the example below. I'd appreciate some new help! Thanks! :)
The XPath used in the expression is fine as long as you declare 'x' as the prefix for the XHTML namespace (xmlns:x) in the XSLT document.
The template will match the <div> whose id is mbContent, and the selected context gives you access to all of its descendants.
You can change the inline CSS for the elements. Since you said that this part is going to be inside some other XHTML document, you can choose XML as the output method.
You can also assign the elements different classes so that they pick up the global styles automatically.
The idea is that, given an XML document, you are transforming it into another XML document.
Therefore, you can apply styles as you like.
I hope that answers your question.
P.S. Use the proper xmlns in the XPath expression.
Let's assume the following is the HTML document.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title></title>
</head>
<body>
<div id="mbContent">
<div>
<span>Some complex structure</span>
</div>
</div>
</body>
</html>
Apply the following XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:x="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="x">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
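<!-- Identity template: copy every node and attribute unchanged -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
</xsl:copy>
</xsl:template>
<!-- Copy the mbContent div, adding a class attribute before its own attributes and children -->
<xsl:template match="x:div[@id='mbContent']">
<xsl:copy>
<xsl:attribute name="class">
<xsl:text>someNewStyle</xsl:text>
</xsl:attribute>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>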
This will result in the following output.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title/>
</head>
<body>
<div class="someNewStyle" id="mbContent">
<div>
<span>Some complex structure</span>
</div>
</div>
</body>
</html>
You can change the XSL to suit your need.
Regards,
Ravish.
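The namespace point, and why the follow-up's `//*[@id='mbSubMenu']` works without any prefix, can be checked quickly with Python's `xml.etree.ElementTree`, whose path language supports prefix maps and `[@attr='value']` predicates. This is a minimal sketch that mirrors the stylesheet's effect (adding `class="someNewStyle"`) rather than running XSLT; `PAGE` and `NS` are illustrative names, and the input is assumed to look like the answer's XHTML document.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.w3.org/1999/xhtml"}

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title></title></head>
<body>
<div id="mbContent"><div><span>Some complex structure</span></div></div>
</body>
</html>"""

root = ET.fromstring(PAGE)

# Unprefixed "div" means an element in no namespace, so this finds nothing.
plain = root.find(".//div[@id='mbContent']")

# Prefix bound to the XHTML namespace, like xmlns:x in the stylesheet.
prefixed = root.find(".//x:div[@id='mbContent']", NS)

# Wildcard step, like //*[@id='mbSubMenu']: any element name, any namespace.
wildcard = root.find(".//*[@id='mbContent']")

# The same change the answer's template makes.
prefixed.set("class", "someNewStyle")

if __name__ == "__main__":
    print(plain, prefixed.attrib)
```

Selecting by `*` plus the id avoids the prefix entirely, which may be why that form survives a CMS that rejects the namespace declaration, at the cost of matching any element that carries that id.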

XSLT collapses my <script> contents to one line, commenting out the JavaScript!

UPDATE: Apologies, all: it was my HTTP server stripping whitespace from the XSLT before it was sent, without being aware of JavaScript comments. (I should really delete the question but cannot.)
My XSLT looks like:
<?xml version="1.0"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml">
<xsl:output
method="xml"
indent="yes"/>
<xsl:template match="/root">
<html>
<head>
<title>Title</title>
<script type="text/javascript"><![CDATA[
// ©2011
function function(){
// do stuff...
}
]]></script>
</head>
<body>
<p> blah blah... </p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
But the resulting XML is always collapsed to one line, so my JavaScript is commented out from the initial comment onward! This happens in all major browsers, despite indent="yes".
I couldn't repro this.
With all of the following nine XSLT processors (MSXML3 included -- so in IE you should get a good result):
MSXML (3, 4, 6)
.NET (XslCompiledTransform and XslTransform)
Altova (XML-SPY)
Saxon 6.5.4
Saxon 9.1.07 (XSLT 2.0 processor)
XQSharp (XSLT 2.0 processor)
when I perform the provided XSLT transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/root">
<html>
<head>
<title>Title</title>
<script type="text/javascript">
<![CDATA[
// ©2011
function function()
{
// do stuff...
}
]]>
</script>
</head>
<body>
<p> blah blah... </p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
on this XML document (as no source XML document is provided in the question):
<root/>
the result is the same:
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Title</title>
<script type="text/javascript">
// ©2011
function function()
{
// do stuff...
}
</script>
</head>
<body>
<p> blah blah... </p>
</body>
</html>
Therefore, this is behavior of a buggy XSLT processor, not on the above list -- or there is some missing data in the question.
Try wrapping your JavaScript in an <xsl:text> element instead of the CDATA section. This will at least keep the line breaks you made inside it. I'm not sure whether CDATA sections care about line breaks.
<script type="text/javascript"><xsl:text>
// ©2011
function function(){
// do stuff...
}
</xsl:text></script>
You should also try to use method="html" instead of xml, because you're generating HTML content.
In addition, I think indent="yes" only applies to the indentation of the XML elements. I don't think that mechanism cares about text or CDATA sections, so you have to do the line breaks yourself (as you already did in your JavaScript).
Three things to try:
You're generating HTML, so why have output method XML?
The CDATA section is used by the XML parser on input to the XSLT engine and is not carried through (CDATA doesn't appear in the XML information model).
Would using xml:space='preserve' on the script element help?
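Why whitespace stripping breaks the script, and why the CDATA section itself is not the culprit, can be shown in a few lines of Python: once parsed, a CDATA section is ordinary text with its newlines intact, but joining that text onto one line turns everything after the first `//` into a comment. This is a minimal sketch; `collapse` is a hypothetical stand-in for the whitespace-stripping HTTP server from the update, not a real server setting.

```python
import xml.etree.ElementTree as ET

STYLESHEET_FRAGMENT = """<script type="text/javascript"><![CDATA[
// ©2011
function function(){
// do stuff...
}
]]></script>"""


def collapse(text: str) -> str:
    """Join all non-blank lines with single spaces, as a whitespace-stripping proxy might."""
    return " ".join(line.strip() for line in text.splitlines() if line.strip())


script = ET.fromstring(STYLESHEET_FRAGMENT)
original = script.text       # CDATA is now plain text; its newlines survive parsing
one_line = collapse(original)

if __name__ == "__main__":
    print(repr(original))
    print(one_line)  # the leading "//" now comments out the whole function
```

The same reasoning explains why the XSLT processors in the repro above all kept the line breaks: nothing in parsing or serialization removes newlines from text content.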