How can I preserve HTML entities with Diazo? - xslt

I have the following simple Diazo rules file:
<rules
xmlns="http://namespaces.plone.org/diazo"
xmlns:css="http://namespaces.plone.org/diazo/css"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<theme href="theme/theme.html" />
<replace css:theme-children="#content" css:content-children=".content" />
</rules>
and theme:
<html>
<body>
<div id="content">
Lorem ipsum ...
</div>
</body>
</html>
The source I want to transform is:
<html>
<body>
<div class="content">
info
</div>
</body>
</html>
What I get is
... info ...
but I want to keep the HTML entities of the href attribute intact. How can I do this with Diazo?

Note that numeric character references are not entity references, so your title is a bit misleading (the answer for preserving or not preserving entity references such as "&nbsp;" is very different).
I don't know Diazo but in XSLT if you add
<xsl:output encoding="US-ASCII"/>
to your stylesheet, then any non-ASCII characters will be output using numeric references.
However, the references in your example stand for plain ASCII characters (such as "." written as a numeric character reference). There isn't any standard way in XSLT 1.0 to keep those (and there should never be any reason to, if the document is going to be processed by a conforming HTML or XML system). Any such system will expand those references to their characters before processing starts. (That is why XSLT cannot preserve them: the XML parser has already replaced them before XSLT sees the input data.)
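To see the parser behaviour described above outside of XSLT, here is a minimal Python sketch using the standard library's ElementTree. The `mailto:` link is a made-up stand-in for the question's markup, not the original input; it only illustrates that the reference is gone after parsing and that ASCII output escapes non-ASCII characters only.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the "m" of "mailto" is written as the numeric
# character reference &#109; (this is an assumption, not the asker's data).
source = '<a href="&#109;ailto:someone@example.org">info</a>'

link = ET.fromstring(source)

# The parser has already replaced the reference with the character itself,
# so nothing downstream can tell how the "m" was originally written.
href = link.get("href")
print(href)

# Serializing as US-ASCII escapes only characters outside ASCII: the
# original &#109; does not come back, but a non-ASCII character is escaped.
link.set("title", "caf\u00e9")
serialized = ET.tostring(link, encoding="us-ascii").decode("ascii")
print(serialized)
```

The same holds for any conforming XML parser, which is why the transformation step never gets a chance to preserve the original spelling.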

Related

XSLT to rename link targets in result-document()

When splitting a deeply nested XML document into multiple output files using result-document(), is there a method to rewrite the @href values to point to ids inside the new documents? For example, splitting a book into multiple documents, with each book-part becoming a new file named after book-part/@id. The output file for chapter 1 may contain a link to a target in the output file for chapter 2; that link value used to be relative within the single file. Now that the link points to a different file, it should have the file name of chapter 2 followed by # and the original target value. The linking element itself also has to change (to related-object), but it is the target value that I'm specifically trying to generate.
i.e., link target pattern: [outputfilename.xml]#[original-filetarget-id]
It seems that I need to gather the value of each @rid in the original file, check whether the target will end up in a different file before I insert the filename, and write the output @document-id according to the file in which the target will be output. But I'm having trouble understanding how I would know the output file name and where in the XSLT to rewrite the target.
source xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//NLM//DTD Book DTD v2.1 20050630//EN" "book.dtd">
<book dtd-version="3.0">
<book-meta>
<book-id>123.4567890</book-id>
</book-meta>
<body>
<book-part book-part-type="chapter" id="book.123.4567890.ch01">
<book-part-meta>
<title-group>
<title>Chapter 1</title>
</title-group>
</book-part-meta>
<body>
<p> some text with a <xref rid="a">link to chapter 1</xref></p>
<p> some text with a <xref rid="b">link to chapter 2</xref></p>
<p id="a">a target id in chapter 1</p>
</body>
</book-part>
<book-part book-part-type="chapter" id="book.123.4567890.ch02">
<book-part-meta>
<title-group>
<title>Chapter 2</title>
</title-group>
</book-part-meta>
<body>
<p> some text with a <xref rid="a">link to chapter 1</xref></p>
<p> some text with a <xref rid="b">link to chapter 2</xref></p>
<p id="b">a target id in chapter 2</p>
</body>
</book-part>
</body>
</book>
output book.123.4567890.ch01.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//NLM//DTD Book DTD v2.1 20050630//EN" "book.dtd">
<book dtd-version="3.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table">
<book-meta>
<book-id>123.4567890</book-id>
</book-meta>
<body>
<book-part book-part-type="chapter" id="book.123.4567890.ch01">
<book-part-meta>
<title-group>
<title>Chapter 1</title>
</title-group>
</book-part-meta>
<body>
<p> some text with a <xref rid="a">link to chapter 1</xref></p>
<p> some text with a <related-object document-type="chapter" object-id="book.123.4567890.ch02.xml#b">link to chapter 2</related-object></p>
<p id="a">a target id in chapter 1</p>
</body>
</book-part>
</body>
</book>
output book.123.4567890.ch02.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//NLM//DTD Book DTD v2.1 20050630//EN" "book.dtd">
<book dtd-version="3.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table">
<book-meta>
<book-id>123.4567890</book-id>
</book-meta>
<body>
<book-part book-part-type="chapter" id="book.123.4567890.ch02">
<book-part-meta>
<title-group>
<title>Chapter 2</title>
</title-group>
</book-part-meta>
<body>
<p> some text with a <related-object document-type="chapter" object-id="book.123.4567890.ch01.xml#a">link to chapter 1</related-object></p>
<p> some text with a <xref rid="b">link to chapter 2</xref></p>
<p id="b">a target id in chapter 2</p>
</body>
</book-part>
</body>
</book>
The short answer is: yes, you have understood correctly what you need to do.
You need to figure out, for each hyperlink, whether its target will be in the same output file as the source of the link, or a different one. And you have correctly identified the challenge here: knowing what the new file name will be. It's not really as difficult as it may look at first; just take a deep breath and work it out.
You are at an xref element; it has an rid attribute. You want to know: will the xref and the target be in the same output file or different ones? To decide this, you must:
1) Ascend from the xref element to the containing book-part, and figure out what its filename will be. Put this value in a variable (fn-xref).
2) Go to the target element (id(@rid)), then ascend from that element to the containing book-part, and figure out what its filename will be. Put this value in a variable (fn-rid).
3) Compare the values of $fn-xref and $fn-rid. If they are equal, do the right thing (keep the xref with its local rid). If they differ, do the other right thing (emit a related-object whose target is the other file's name, then #, then the rid).
I'm guessing you don't need help turning this prose description into XSLT, but speak up if you do.
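For readers who want to check the logic before writing the XSLT, the comparison can be sketched in Python with the standard library. This is not the stylesheet itself: the trimmed sample book, the `file_of` and `target_by_id` lookups, and the `link_target` helper are all illustrative assumptions, and the sketch assumes book-part elements are not nested.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the question's book (assumed, for illustration).
BOOK = """<book><body>
<book-part id="book.123.4567890.ch01"><body>
<p><xref rid="a">link to chapter 1</xref></p>
<p><xref rid="b">link to chapter 2</xref></p>
<p id="a">target in chapter 1</p>
</body></book-part>
<book-part id="book.123.4567890.ch02"><body>
<p id="b">target in chapter 2</p>
</body></book-part>
</body></book>"""

root = ET.fromstring(BOOK)

file_of = {}       # element -> name of the output file it will land in
target_by_id = {}  # id value -> element carrying that id
for part in root.iter("book-part"):
    filename = part.get("id") + ".xml"   # output file named after book-part/@id
    for node in part.iter():             # the book-part and all its descendants
        file_of[node] = filename
        if node.get("id") is not None:
            target_by_id[node.get("id")] = node


def link_target(xref):
    """Return the rewritten target value for one xref element."""
    rid = xref.get("rid")
    fn_xref = file_of[xref]               # step 1: file containing the link
    fn_rid = file_of[target_by_id[rid]]   # step 2: file containing the target
    if fn_xref == fn_rid:                 # step 3: compare the two
        return rid                        # same file: keep the local id
    return fn_rid + "#" + rid             # other file: filename#id


targets = [link_target(x) for x in root.iter("xref")]
print(targets)
```

In a stylesheet, the two lookups would typically be done by navigating from the current node up to its ancestor book-part rather than through precomputed dictionaries.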

XSLT: Getting XML namespace as an attribute

I have the following xml:
<article article-type="research-article">
<body>
<graphic xlink:href="zee9991370930006.g.eps"/>
<self-uri xlink:title="pdf" xlink:href="zee00813002857.pdf" />
</body>
</article>
I need to convert this to:
<article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
<body>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="zee9991370930006.g.eps"/>
<self-uri xlink:title="pdf" xlink:href="zee00813002857.pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</body>
</article>
I used the following command in XSLT 2.0 for each of the elements for which namespace attribute is required:
<xsl:namespace name="xlink" select="'http://www.w3.org/1999/xlink'"/>
<xsl:namespace name="mml" select="'http://www.w3.org/1998/Math/MathML'"/>
But the issue is I am getting the namespace attribute only for one element i.e. article. I have declared the namespaces at the beginning of my xslt as well. Can't figure out what is the exact issue. Help of any kind would be truly appreciated. Thanks.
XML generators are not supposed to do what you want. They will produce your XML according to the specs. It is not recommended to redeclare the same namespaces on every element that uses them; that makes the document verbose and is an odd way of doing things.
What is the problem if the namespace is declared only at the top (on the root element)? You can then use the prefix in just the elements that require it.
(OP's comment: I need it at the root and I have declared it. But it is not available for the nodes under it, i.e. graphic and self-uri in my case.)
Have you checked whether your XML is well-formed? If what you posted here is the complete XML, then graphic and self-uri should always have the namespace available. You should aim for the following output, for the reasons given above.
<article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
<body>
<graphic xlink:href="zee9991370930006.g.eps"/>
<self-uri xlink:title="pdf" xlink:href="zee00813002857.pdf"/>
</body>
</article>
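A quick way to confirm that a declaration on the root element is in scope for its descendants is to parse the recommended output and look at the expanded attribute names. The following is a small Python sketch using the standard library; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# The recommended output: namespaces declared once, on the root element.
doc = """<article article-type="research-article"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <body>
    <graphic xlink:href="zee9991370930006.g.eps"/>
    <self-uri xlink:title="pdf" xlink:href="zee00813002857.pdf"/>
  </body>
</article>"""

article = ET.fromstring(doc)
graphic = article.find("body/graphic")
self_uri = article.find("body/self-uri")

# ElementTree stores attributes under their expanded {namespace}local names,
# so these lookups succeed only if the xlink prefix was resolved for the
# child elements through the declaration on the root.
graphic_href = graphic.get(f"{{{XLINK}}}href")
self_uri_title = self_uri.get(f"{{{XLINK}}}title")
print(graphic_href, self_uri_title)
```

If the root declaration were missing, the parse itself would fail with an unbound-prefix error, which is one way to spot the well-formedness problem mentioned above.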

Edit html document using regex replace and matching contents of only immediate child

I have html that looks like so:
<ul style="list-style-type: square;">
<br />
<li margin-left="80px">
<br />first line
<br />
<br />second line
</li>
<br />
<li margin-left="80px">
<br />text line 1
</li>
<br />
<li margin-left="80px">
<br />text line 2
</li>
<br />
</ul>
I want to match contents of the ul, but I don't want to match contents of the li elements
The end goal is to get rid of the <br /> tags that are directly under the <ul></ul> and not under the <li></li>
Note: for clarity I formatted the HTML above, but in my real-world scenario it arrives as a single giant string without any \r\n line breaks,
here:
<p margin-left="40px"><br /> <b>[What is the nature of the Services?]</b></p><br /><p><br /> [What are the overarching goals, objectives and outcomes you want to achieve?]</p><br /><p margin-left="80px"><br /> <b><i><u>[How should the Services be delivered?]</u></i></b></p><br /><ul style="list-style-type: square;"><br /> <li margin-left="80px"><br /> gfhsdfsdf<br /><br /> some line here</li><br /> <li margin-left="80px"><br /> sfdsfsdfsdf</li><br /> <li margin-left="80px"><br /> sdfsdfsdf</li><br /></ul><br /><p><br /> [Is the appointment of this Supplier exclusive?]</p><br /><p><br /> [Refer to any proposal prepared by the Supplier if this helps describes any aspects of the Service]</p><br />
Anyway the first thing in my mind was to
use this to extract the contents of the <ul>
<ul[^>]*>(.*)</ul>
and then maybe do a subsequent one to select all the li
<li[^>]*>.*</li>
and then somehow get rid of anything else that's left over
but that's kind of lame and then again
<li[^>]*>.*</li>
matches a whole bunch of li's at once
this entire string gets captured:
<li margin-left="80px"><br />\t\tgfhsdfsdf<br /><br />\t\tsome line here</li><br />\t<li margin-left="80px"><br />\t\tsfdsfsdfsdf</li><br />\t<li margin-left="80px"><br />\t\tsdfsdfsdf</li>
I know it's because the dot is greedy, but I'm not sure how to avoid it
something like [^</li>]* wouldn't work because it is treated as a set of characters, not a string
any help much appreciated
So I have 2 problems
1) I don't like the way I'm approaching this and need better ideas (I'm considering the set operations of LINQ to XML). I'd still like to do this with regex, so if anyone knows exactly how, please share
2) how do I capture each li as a separate group, instead of capturing everything from the first opening <li> to the last closing </li>?
I think you should go look at this...
RegEx match open tags except XHTML self-contained tags
Then recognize that parsing HTML with a regex is not that easy. Personally, I would load the HTML into an HTML DOM object and then crawl the document. You might look at this project for some help:
http://htmlagilitypack.codeplex.com/
Since you don't say which regex flavor you're using, here's a JavaScript-compatible regex to match a <br /> that's inside a <ul> element but not inside a <li> element:
<br\s*/>(?=[^<]*(?:<(?!/?ul\b)[^<]*)*</ul>)(?![^<]*(?:<(?!/?li\b)[^<]*)*</li>)
Breaking that down,
<br\s*/> matches the BR tag, of course.
(?=[^<]*(?:<(?!/?ul\b)[^<]*)*</ul>) looks ahead for the next occurrence of </ul>, but only if it doesn't encounter a <ul> tag first.
(?![^<]*(?:<(?!/?li\b)[^<]*)*</li>) does the same thing with </li> and <li> tags, but this time negating the result.
Being JS compatible, this should work in Dreamweaver as well as in editors with solid regex support, like EditPad and TextMate. It's also compatible with most Perl-derived flavors (Python, .NET, Java, etc.), though some syntactic tweaking will probably be needed.
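As a rough check of the pattern in one of the Perl-derived flavors mentioned, here is a Python sketch using the `re` module. The sample string is a shortened, made-up version of the question's HTML. The second pattern, `LI_ITEM`, is not from the answer above: it is the usual lazy-quantifier variant (`.*?`) for the asker's second problem, added here as an assumption about what was wanted. Neither pattern is meant to cope with nested lists.

```python
import re

# The answer's pattern, split over three lines for readability:
# a <br />, followed by a </ul> with no <ul> in between,
# and NOT followed by a </li> with no <li> in between.
BR_UNDER_UL = re.compile(
    r"<br\s*/>"
    r"(?=[^<]*(?:<(?!/?ul\b)[^<]*)*</ul>)"
    r"(?![^<]*(?:<(?!/?li\b)[^<]*)*</li>)"
)

# Lazy quantifier: stop at the first </li> instead of the last one.
LI_ITEM = re.compile(r"<li[^>]*>.*?</li>", re.DOTALL)

# Shortened stand-in for the question's single-line HTML (assumed data).
html = (
    '<p><br /> intro</p><br />'
    '<ul style="list-style-type: square;"><br /> '
    '<li margin-left="80px"><br /> first<br /><br /> second</li><br /> '
    '<li margin-left="80px"><br /> third</li><br />'
    '</ul><br />'
)

# Remove only the <br /> tags that sit directly under the <ul>.
cleaned = BR_UNDER_UL.sub("", html)
print(cleaned)

# Each <li>...</li> is now matched on its own.
items = LI_ITEM.findall(html)
print(items)
```

The `<br />` tags inside the `<p>` and `<li>` elements and the ones outside the list are left in place; only the three direct children of the `<ul>` are removed.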

html templates in Yii

I'm new to Yii but I want to learn the best practices.
For example, I have the following HTML:
<html>
<head></head>
<body>
<!-- begin header -->
<div id="header"></div>
<!-- end header -->
<!-- begin main -->
<div id="main"></div>
<!-- end main -->
<!-- begin footer -->
<div id="footer"></div>
<!-- end footer -->
</body>
</html>
I usually cut the portions of HTML and distributed them in different files so that I had something like this:
<html>
<head></head>
<body>
<!-- begin header -->
<?php require_once('header.php')?>
<!-- end header -->
<!-- begin main -->
<?php require_once('main.php')?>
<!-- end main -->
<!-- begin footer -->
<?php require_once('footer.php')?>
<!-- end footer -->
</body>
</html>
That way, any change I made in "header.php" showed up in every template that required the file. What is the correct way to do this in Yii?
thanks for your answers
......header here......
<?php echo $content; ?>
......footer here......
Read this first
In Yii, the site-wide markup lives in the layout file at views/layouts/main.php. This is where you handle all of the changes that take effect throughout the entire site. For more complex sites you can use multiple layouts, column layouts, etc.
If you decide to use one of the multiple column layouts then they still refer back to the main layout for the header, footer, etc.

XHTML Transitional 1.0 Highlight single word within unordered ul tag

I have an unordered list and each item is a sentence. I would like to highlight a word or two within the sentence but the font color tag
<font color="CC9966">highlighted words </font>
generates an error. Can anyone please suggest a fix for this? I have tried putting it in
<li> </li>
and while that works and generates no error, it is not the formatting I want, as I have a rollover on the li in my CSS style sheet. Thank you.
Wrap a span around the word you want to highlight. It would probably be easier to make a class in the CSS called highlight and use that though.
<span style="color:#DD0000">wordToHighlight</span>
You can check a document's validity by using the W3C Markup Validator. These are the people that make the rules. If you switch to the "Validate by Direct Input" tab and put in:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>test</title>
</head>
<body>
<ul>
<li>this is a <span style="color:#FF0000">test</span> sentence</li>
</ul>
</body>
</html>
You will see that a span in a li is valid markup.