docbook saxon toolchain does not recognize customization hard-page-break - xslt

I cannot get the DocBook toolchain to produce the hard page break
described at the end of http://www.sagehill.net/docbookxsl/PageBreaking.html
(I used to have this working but seem to have lost the mojo.)
Here is the script that invokes DocBook and Saxon:
#!/bin/sh
export CLASSPATH=/home/leffstudent/saxon-6.0.1.jar:/home/leffstudent/docbook-sl-1.79.1/saxon65.jar
echo $CLASSPATH
java com.icl.saxon.StyleSheet \
-o $1.fo $1 stO.xsl \
use.extensions=1 default.table.width=auto title.margin.left=0pc insert.xref.page.number=yes
(stO.xsl also sets my parameters for how xref should display page numbers. That is
not working either. Thus, I suspect that my invocation of com.icl.saxon.StyleSheet
is ignoring my customization layer.)
Here is the test DocBook file I tried. (The real file is a 500-page
set of class notes.)
<section><title> </title>
<para>
abc
</para>
<?hard-pagebreak?>
<para>
def
</para>
</section>
Here is the style sheet, stO.xsl
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" version="1.0">
<xsl:import href="./titlepage.xsl"/>
<xsl:import href="/home/leffstudent/docbook-xsl-1.79.1/fo/docbook.xsl"/>
<xsl:template match="processing-instruction('hard-pagebreak')">
<fo:block break-after='page'/>
</xsl:template>
<xsl:attribute-set name="formal.object.properties">
<xsl:attribute name="keep-together.within-column">auto</xsl:attribute>
</xsl:attribute-set>
<xsl:param name="local.l10n.xml" select="document('')"/>
<l:i18n xmlns:l="http://docbook.sourceforge.net/xmlns/l10n/1.0">
<l:l10n language="en">
<l:context name="xref">
<l:template name="section" text="%t on Page Number %p"/>
<l:template name="mediaobject" text="%t on Page Number %p"/>
<l:template name="imageobject" text="%p"/>
</l:context>
<l:context name="xref-number-and-title">
<l:template name="section" text="%t on Page Number %p"/>
<l:template name="imageobject" text="%p"/>
</l:context>
</l:l10n>
</l:i18n>
</xsl:stylesheet>

I got this working finally with XSLTPROC:
#!/bin/sh
xsltproc --output $1.fo sd.xsl $1
It prints a separate page where I have the hard-pagebreak processing instruction.
Here is the customization layer, sd.xsl
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" version="1.0">
<xsl:import href="/home/leffstudent/docbook-xsl-1.79.1/fo/docbook.xsl"/>
<xsl:template match="processing-instruction('hard-pagebreak')">
<fo:block break-after='page'/>
</xsl:template>
</xsl:stylesheet>
I have tried again to get my xrefs to work with pictures. (That, of course, is
with a larger file than sd.xsl. But that is a separate issue, both literally
and figuratively.)
I still have not been able to get this working with Xalan. See
Question 55941299.
I have to check again to see if I can get this to work with Saxon.
That is what I used to use to prepare my class notes.
However, I can produce my 530-page class notes with xsltproc with proper page breaks.

Related

Adjust font size XSL-FO with Apache FOP

I have the following Docbook 5.1 document
<?xml version="1.0" encoding="UTF-8"?>
<book xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.1">
<info>
<title>Test</title>
</info>
<chapter>
<title>Switchs of <systemitem class="domainname">corp.net</systemitem></title>
<para>
Switchs: <systemitem class="fqdomainname">switch-<replaceable>id</replaceable>.corp.net</systemitem>.
</para>
</chapter>
</book>
And I use the following customization stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:import href="http://docbook.sourceforge.net/release/xsl-ns/current/fo/profile-docbook.xsl"/>
<xsl:param name="monospace.font.family">Liberation Mono</xsl:param>
<xsl:attribute-set name="monospace.properties">
<!-- XXX -->
</xsl:attribute-set>
</xsl:stylesheet>
Unfortunately, the monospace font I use is too large, so I want to reduce
it down to 85% of the serif font's size.
The first thing I tried was
<xsl:attribute name="font-size">0.85em</xsl:attribute>
It worked well except when there is nested monospaced text. In this case, the inner one
is too small.
Then I tried
<xsl:attribute name="font-size">
<xsl:value-of select="$body.font.master * 0.85"/>
<xsl:text>pt</xsl:text>
</xsl:attribute>
It doesn't work when the monospaced text is "inside" larger text (e.g. titles).
I've read http://docbook.sourceforge.net/release/xsl/1.78.1/doc/fo/monospace.properties.html, which recommends using font-size-adjust, but FOP 2.3 doesn't support it.
Is there a way to simulate the font-size-adjust behavior?

Choose XSL transform according to document content

I have a large number of XHTML documents which are created by different publishers, determined by a meta tag:
<meta name="citation_publisher" content="ACME publisher"/>
or in a different document
<meta name="citation_publisher" content="BETA publisher"/>
etc.
I have written stylesheets (about 1 page each) such as,
acme.xsl
beta.xsl
etc.
However I do not know the name of the publisher until I read the XHTML file.
It is possible, though very messy, to write a gigantic stylesheet of the form:
<xsl:choose>
<xsl:when test="$publisher='ACME publisher'">
<!-- acme.xsl sheet-->
</xsl:when>
<xsl:when test="$publisher='BETA publisher'">
<!-- beta.xsl sheet-->
</xsl:when>
</xsl:choose>
but there are at least 100 XSL files.
Is there any way, in XSL1, to select the stylesheet chunk according to the publisher? It would be nice to have the stylesheets as separate files and <xsl:import> them rather than have a single huge file.
UPDATE:
I think @Dimitre has answered the question correctly (and so I have accepted). I suspect that @MichaelKay's is actually better, but it does depend on having a pipeline managing the transformer. I shall try the <xsl:include> as a prototype and see whether it has downsides.
I wouldn't attempt to do this within a single XSLT stylesheet. It sounds to me like a good candidate for XProc, or some similar pipeline technology (e.g. Orbeon). Step 1, use XPath to classify the document, Step 2, transform it using the stylesheet chosen according to the results of Step 1.
but there are at least 100 XSL files. Is there any way, in XSL1, to
select the stylesheet chunk according to the publisher? It would be
nice to have the stylesheets as separate files and <xsl:import> them
rather than have a single huge file.
Here is one way to do this (I show it with just two publisher types; it can be extended to as many as needed):
Primary stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="unknown.xsl"/>
<xsl:import href="acme.xsl"/>
<xsl:import href="beta.xsl"/>
</xsl:stylesheet>
acme.xsl:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*[meta[@content='ACME publisher']]">
<xsl:value-of select="x * y"/>
</xsl:template>
</xsl:stylesheet>
beta.xsl:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*[meta[@content='BETA publisher']]">
<xsl:value-of select="x + y"/>
</xsl:template>
</xsl:stylesheet>
unknown.xsl:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/*">
<xsl:message terminate="yes">Error: Unknown content source</xsl:message>
</xsl:template>
</xsl:stylesheet>
When the transformation specified in the primary stylesheet is applied on this XML document:
acme.xml:
<t>
<meta name="citation_publisher" content="ACME publisher"/>
<x>6</x>
<y>4</y>
</t>
the wanted, correct result (x*y) is produced:
24
When the same transformation is applied on this XML document:
beta.xml:
<t>
<meta name="citation_publisher" content="BETA publisher"/>
<x>6</x>
<y>4</y>
</t>
again the correct result (x+y) is produced:
10
Finally, when the same transformation is applied on this XML document:
other.xml:
<t>
<meta name="citation_publisher" content="OTHER publisher"/>
<x>6</x>
<y>4</y>
</t>
the result of the transformation is the wanted termination with error message:
Error: Unknown content source
Processing terminated by xsl:message at line 5
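For the XHTML documents described in the question, the match pattern would probably need adapting. The following is only a sketch of what acme.xsl might look like, assuming the documents are in the XHTML namespace and the meta element sits inside head (the simplified samples above show neither). The x prefix and the template body are placeholders, not part of the original answer.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="x">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <!-- Assumption: XHTML namespace, meta inside head.
       Matches the root html element only when head carries the ACME publisher meta tag. -->
  <xsl:template match="/x:html[x:head/x:meta[@name='citation_publisher'
                                             and @content='ACME publisher']]">
    <!-- Placeholder body: the real ACME-specific rules would go here -->
    <xsl:value-of select="x:head/x:title"/>
  </xsl:template>
</xsl:stylesheet>
```

The approach relies on import precedence: unknown.xsl is imported first, so it has the lowest precedence and its catch-all /* template only fires when no publisher-specific template matches.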

Regex of XML with multiple tags

I'm trying to find all text that is not within the XML markup:
<transcript>
<text start="9.75" dur="5.94">welcome to about my property here you
can learn more about how your property</text>
<text start="15.69" dur="4.71">was assessed see the information impact
has on file and compare your property to</text>
<text start="20.4" dur="1.3">others in your neighborhood</text>
<text start="21.7" dur="5.32">interested in learning about market
trends in your municipality no problem</text>
<text start="105.79" dur="6.23">I have all of this and more about life property
. see your property assessment know more</text>
<text start="112.02" dur="0.11">about</text>
</transcript>
I am using the following regex pattern, but obviously it is not correct because it grabs all of the text between the opening and closing <transcript> tags:
<transcript>[\s\S]*?<\/transcript>
How can I modify this regex pattern to select only the text that is not within any of the markup tags?
Use XSLT. XSLT is a language specifically designed to convert XML into another output format (back to valid XML again, or something else such as (X)HTML, plain text, or any other format – but preferably, based on plain text).
In this case the smallest XSLT necessary is just this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0" >
<xsl:output method="text" indent="no" />
<xsl:template match="text">
<!-- do NOTHING here! -->
</xsl:template>
</xsl:stylesheet>
This works because the default for processing an element is to recursively apply templates to its children, and plain text is copied to the output by default. The only element inside your <transcript> is <text>, and you process it by doing 'nothing', i.e., by not copying its contents to the output. The line inside that template is just a comment.
All other text nodes, i.e. text that is not wrapped in a <text> element, are copied to the output.
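The defaults referred to here are the built-in template rules of XSLT 1.0. As an illustration, they behave roughly as if the following templates had been written explicitly (real built-in rules have lower import precedence than anything in the stylesheet, so this is a sketch of the behavior, not something you need to add):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Elements and the document root: recurse into the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Text nodes and attributes: copy their string value to the output -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>
  <!-- Processing instructions and comments: produce nothing -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

An empty template for the text element overrides the first rule for that element only, so the recursion stops there and its text children are never reached.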
Alternatively, if you have more types of tags than just <text> elements and you want to skip all of them, apply templates to / and transcript to process each and apply another to * (which will select all remaining tags not specified elsewhere) to not process them:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0" >
<xsl:output method="text" indent="no" />
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="transcript">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="*">
<!-- do NOTHING here! -->
</xsl:template>
</xsl:stylesheet>
Again, the plain untagged text falls through to the default rule, so it is copied to the output.
Both XSLT stylesheets will output only I ha, the only part in your sample text that is not surrounded by tags.
Do you want to find
welcome to about my property here you can learn more about how your property
from
<text start="9.75" dur="5.94">welcome to about my property here you can learn more about how your property</text>
?
Then this will work:
(?<=>).+?(?=<)
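If that is the goal, the same result can also be sketched in XSLT rather than with a regex. The stylesheet below is an illustration only: it assumes the input is the transcript sample from the question, and the choice to collapse line breaks inside each caption with normalize-space() is mine, not something the question asks for.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <!-- Drop the whitespace-only text nodes between the <text> elements -->
  <xsl:strip-space elements="transcript"/>
  <!-- Matches elements named "text" (not text() nodes) -->
  <xsl:template match="text">
    <!-- Emit the caption with internal whitespace collapsed, one caption per line -->
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the regex, this does not depend on how the markup happens to be laid out across lines.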

XSL Generating CSV

Trying to convert this:
<list>
<entry>
<parentFeed>
<feedUrl>http://rss.nzherald.co.nz/rss/xml/nzhrsscid_000000001.xml</feedUrl>
<id>68</id>
</parentFeed>
<content>Schools will have to put up with problematic pay administered through Novopay for another eight weeks after the Government announced it would persist with the unstable system.Minister responsible for Novopay, Steven Joyce, delayed...</content>
<link>http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&amp;objectid=10872300&amp;ref=rss</link>
<title>Novopay: Govt sticks with unstable system</title>
<id>55776</id>
<published class="sql-timestamp">2013-03-19 03:38:55.0</published>
<timestamp>2013-03-19 07:31:16.358 UTC</timestamp>
</entry>
</list>
into this, using XSLT:
Title, Link, Date
Novopay: Govt sticks with unstable system, http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10872300&ref=rss, 2013-03-19 03:38:55.0
But try as I might, I can't get rid of the blank line at the beginning of the document. My stylesheet follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:csv="csv:csv"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/list"> <xsl:for-each select="entry">
Title, Link, Date
<xsl:value-of select="title"/>, <xsl:value-of select="link"/>, <xsl:value-of select="published"/>
</xsl:for-each></xsl:template>
</xsl:stylesheet>
I've tried putting in <xsl:text>
</xsl:text> as suggested here which stripped the last linebreak, so I moved it to the top of the file, at which point it turned into a no-op. The solution here actually adds a blank line (which makes sense, as the hex code is for newline, according to the ascii manpage).
As a workaround, I've been using Java to generate the CSV output.
However, I do feel XSLT would be a lot faster as it is designed to transform XML to various other formats. A similar XSLT generates HTML, RSS, and ATOM feeds perfectly.
Your logic is spot on. However, keep in mind that when you are outputting text, the line breaks and indentation around literal text in your XSLT end up in the output, so your XSLT should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:csv="csv:csv"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/list"> <xsl:for-each select="entry">Title, Link, Date
<xsl:value-of select="title"/>, <xsl:value-of select="link"/>, <xsl:value-of select="published"/>
<xsl:text>
</xsl:text>
</xsl:for-each></xsl:template>
</xsl:stylesheet>
Run the above XSLT and it will work perfectly.
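Note that the stylesheet above writes the header line inside xsl:for-each, so an input with several entry elements would repeat it. A variant that emits the header once and keeps every literal inside xsl:text (so indentation in the stylesheet cannot leak into the output) might look like the sketch below. It assumes the field values contain no commas or quotes; real CSV quoting is not handled.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/list">
    <!-- Header row, written once before the loop -->
    <xsl:text>Title, Link, Date&#10;</xsl:text>
    <xsl:for-each select="entry">
      <xsl:value-of select="title"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="link"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="published"/>
      <!-- One newline terminates each data row -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Whitespace-only text nodes in a stylesheet are stripped except inside xsl:text, which is why this version can be indented freely.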

After transformation getting output in text instead of xml nodes

My problem is that after executing the XSLT file I get the output as text, all on one line, rather than as XML as required. My XML and XSLT files are as follows.
<root>
<Jobs Found="10" Returned="50">
<Job ID="8000000" PositionID="600002">
<Title>Development Manager</Title>
<Summary>
<![CDATA[ An experienced Development Manager with previous experience leading a small to mid-size team of developers in a Java/J2EE environment. A hands on role, you will be expected to manage and mentor a team of developers working on a mix of greenfield and maintenance projects.   My client, a well known investment bank, requires an experienced Development Manager to join their core technology team. This t
]]>
</Summary>
<DateActive Date="2009-10-06T19:36:43-05:00">10/6/2009</DateActive>
<DateExpires Date="2009-11-05T20:11:34-05:00">11/5/2009</DateExpires>
<DateUpdated Date="2009-10-06 20:12:00">10/6/2009</DateUpdated>
<Location>
<Country>xxxx</Country>
<State>xxx</State>
<City>xxx</City>
<PostalCode>xxx</PostalCode>
</Location>
<CompanyName>abc Technology</CompanyName>
<BuilderFields />
<DisplayOptions />
<AddressType>1234</AddressType>
</Job>
</Jobs>
</root>
XSLT stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" media-type="application/xml"
cdata-section-elements="Summary"/>
<!-- default: copy everything using the identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- override: for Location and Salary nodes, just process the children -->
<xsl:template match="Location|Salary">
<xsl:apply-templates select="node()"/>
</xsl:template>
<!-- override: for selected elements, convert attributes to elements -->
<xsl:template match="Jobs/@*|Job/@*">
<xsl:element name="{name()}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<!-- override: for selected elements, remove attributes -->
<xsl:template match="DateActive/@*|DateExpires/@*|DateUpdated/@*"/>
</xsl:stylesheet>
Current Output in text is:
492 50 83000003 61999998 Market-leading company With a newly created role High Profile Position With Responsibilty, Visibility & Opportunity Must Have Solid BA Skills Honed in a SDLC environment Market-leading company With a newly created role High Profile Position With Responsibilty, Visibility & Opportunity Must Have Solid BA Skills Honed in a SDLC environment My client is a market-leader who continue to go from strengt 10/5/2009 11/4/2009 10/5/2009 Australia NSW Sydney 2000 Skill Quest 90,000.00 120,000.00 Per Year AUD 6
I want this output in XML. Please help me find a solution.
Do you have a line like this at the top of your XSLT file??
<xsl:output method="xml" indent="yes"/>
That defines what the output format is. "xml" is the default (unless the root element of the result is html, in which case it is "html"); "text" is the other option.
I don't know what you're doing, but when I run your XSLT file on the sample XML file provided, I get this as output:
<?xml version="1.0" encoding="utf-8"?>
<root>
<Jobs><Found>10</Found><Returned>50</Returned>
<Job><ID>8000000</ID><PositionID>600002</PositionID>
<Title>Development Manager</Title>
<Summary>
An experienced Development Manager with previous experience leading a small to mid-size team of developers in a Java/J2EE environment. A hands on role, you will be expected to manage and mentor a team of developers working on a mix of greenfield and maintenance projects.&#160;&#160; My client, a well known investment bank, requires an experienced Development Manager to join their core technology team. This t
</Summary>
<DateActive>10/6/2009</DateActive>
<DateExpires>11/5/2009</DateExpires>
<DateUpdated>10/6/2009</DateUpdated>
<Country>xxxx</Country>
<State>xxx</State>
<City>xxx</City>
<PostalCode>xxx</PostalCode>
<CompanyName>abc Technology</CompanyName>
<BuilderFields />
<DisplayOptions />
<AddressType>1234</AddressType>
</Job>
</Jobs>
</root>
Marc
I suspect you are watching the transformation result in a browser.
The transformation itself works perfectly, but the browser displays the plain text of the XML (since it expects HTML contents by default and ignores any tags it does not recognize, displaying their text contents only).
Try media-type="text/xml" and see if that makes any difference. If it doesn't, don't let the browser display confuse you - there is nothing wrong with the XSLT. You should use another XSLT processor to confirm/debug the XSLT.
You probably write out the inner text of an xml node instead of calling apply-templates in one of your nodes. I couldn't find your attached xsl, so it's not easy to guess. But post the xslt, and I'll tell you.