Converting BITS XML to TeX: section heads inside <boxed-text>

I am trying to convert XML (BITS structure) to a TeX file and have a problem with section heads. When section heads occur inside <boxed-text>, the hierarchy is not output correctly.
For example, the <boxed-text> sits in a 3rd-level section and contains its own section heads.
Those heads are treated as the next level below the 3rd-level section (i.e., as 4th-level heads). Inside the <boxed-text> they should instead start again at the 1st level, and so on, but only within the <boxed-text>.
I need the output shown below, but I am getting the wrong output. How can I get the desired output? Please suggest.
\section*{First Level Head}
Text under Head
\subsection*{Second Level Head}
\subsubsection*{Third Level Head}
Text under Head
\begin{annotebox}
Boxed TeXT Astrophysical S factor
\section{Box Level 1 head }
\subsection{Box Level 2 head}
The powers.......
\subsubsection{Box Level 3 head}
The powers......
\paragraph{Box Level 4 head}
.....
\subsection{Box Level 4 head}
\end{annotebox}
Text under Boxed TeXT
\paragraph*{4th Level Head}
I am using the XSL below to convert the file:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:m="http://www.w3.org/1998/Math/MathML"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:mml="http://www.w3.org/1998/Math/MathML"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:functx="http://www.functx.com"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
version="2.0" >
<xsl:output omit-xml-declaration="yes" indent="no" encoding="UTF-8" use-character-maps="latex"/>
<xsl:template match="title">
<xsl:choose>
<xsl:when test="parent::sec"><!--Section Head-->
<xsl:if test="parent::sec/label!=''">
<xsl:value-of select="if(ancestor::sec[5]) then '\def\thesubparagraph{' else if(ancestor::sec[4]) then '\def\theparagraph{' else if(ancestor::sec[3]) then '\def\thesubsubsection{' else if(ancestor::sec[2]) then '\def\thesubsection{' else if(ancestor::sec[1]) then '\def\thesection{' else ''" disable-output-escaping="yes"/>
<xsl:value-of select="parent::sec/label"/>
<xsl:text disable-output-escaping="yes">}
</xsl:text>
</xsl:if>
<xsl:value-of select="if(ancestor::sec[5])
then (if(parent::sec/label!='') then '\subparagraph{' else '\subparagraph*{')
else if(ancestor::sec[4])
then (if(parent::sec/label!='') then '\paragraph{' else '\paragraph*{')
else if(ancestor::sec[3])
then (if(parent::sec/label!='') then '\subsubsection{' else '\subsubsection*{')
else if(ancestor::sec[2])
then (if(parent::sec/label!='') then '\subsection{' else '\subsection*{')
else if(ancestor::sec[1])
then (if(parent::sec/label!='') then '\section{' else '\section*{')
else ''" disable-output-escaping="yes"/>
<xsl:apply-templates/>
<xsl:value-of select="if(ancestor::sec[5]) then '}
' else if(ancestor::sec[4]) then '}
' else if(ancestor::sec[3]) then '}
' else if(ancestor::sec[2]) then '}
' else if(ancestor::sec[1]) then '}
' else ''" disable-output-escaping="yes"/>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
My MWE is:
<?xml version="1.0" encoding="utf-8"?>
<book xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="eng">
<book-body>
<book-part book-part-type="chapter" id="isbn-9876543212345-book-part-002">
<book-part-meta>
<book-part-id book-part-id-type="doi">10.1093/9876543212345.003.0002</book-part-id>
<title-group id="isbn-9876543212345-book-part-002-title-group-001">
<label>8</label>
<title>Germany</title>
<subtitle>Chapter subtitle</subtitle>
</title-group>
</book-part-meta>
<body>
<sec id="isbn-9876543212345-book-part-001-sec-003">
<title>First Level Head</title>
<p>Text under Head</p>
<sec id="isbn-9876543212345-book-part-001-sec-003">
<title>Second Level Head</title>
<sec id="isbn-9876543212345-book-part-001-sec-003">
<title>Third Level Head</title>
<p>Text under Head</p>
<boxed-text content-type="annotation" id="isbn-9876543212345-book-part-002-boxed-text-002">
<p>Boxed TeXT Astrophysical S factor</p>
<sec id="isbn-9876543212345-book-part-002-sec-0006">
<label>1</label><title>Box Level 1 head </title>
<sec id="isbn-9876543212345-book-part-002-sec-0007">
<label>1.1</label><title>Box Level 2 head</title>
<p>The powers and procedures in this Code must be used fairly, responsibly, with respect for the people to whom they apply and without unlawful
discrimination</p>
<sec id="isbn-9876543212345-book-part-002-sec-0007">
<label>1.1.1</label><title>Box Level 3 head</title>
<p>The powers and procedures in this Code must be used fairly, responsibly, with respect for the people to whom they apply and without unlawful
discrimination</p>
<sec id="isbn-9876543212345-book-part-002-sec-0008">
<label>1.1.1.1</label><title>Box Level 4 head</title>
<p>All persons in custody must be dealt with expeditiously, and released as soon as the need for detention no longer applies.Use Quote to identify short prose
quotes of material.</p>
<p>The Quote typecode can include a Source typecode for attribution, but otherwise won't have much structure. For</p>
<p>longer structured extracts use the Extract code.</p>
</sec>
</sec>
</sec>
<sec id="isbn-9876543212345-book-part-002-sec-0006">
<label>1</label><title>Box Level 1 head </title>
<p>Boxed TeXT Astrophysical S factor</p>
</sec>
</sec>
</boxed-text>
<p>Text under Boxed TeXT</p>
<sec id="isbn-9876543212345-book-part-001-sec-003">
<title>4th Level Head</title>
<p>Text under Head</p>
</sec>
</sec>
</sec>
</sec>
<sec id="isbn-9876543212345-book-part-001-sec-003">
<title>First Level Head</title>
<p>Text under Head</p>
</sec>
</body>
</book-part>
</book-body>
</book>

Perhaps this helps:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:mode on-multiple-match="use-last"/>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates select="book//body/node()"/>
</xsl:template>
<xsl:template match="sec/label"/>
<xsl:template match="sec/title">\{string-join((1 to count(ancestor::sec) - 1)!'sub', '')}section*{{{.}}}</xsl:template>
<xsl:template match="sec[label]/title">\{string-join((1 to count(ancestor::sec) - 1)!'sub', '')}section{{{.}}}</xsl:template>
<xsl:template match="sec/sec/sec/sec/title">\paragraph*{{{.}}}</xsl:template>
<xsl:template match="sec/sec/sec/sec[label]/title">\paragraph{{{.}}}</xsl:template>
<xsl:template match="boxed-text" expand-text="no">\begin{annotebox}<xsl:apply-templates/>\end{annotebox}</xsl:template>
<xsl:template match="boxed-text//sec/title">\{string-join((1 to count(ancestor::sec[not(descendant::boxed-text)]) - 1)!'sub', '')}section*{{{.}}}</xsl:template>
<xsl:template match="boxed-text//sec[label]/title">\{string-join((1 to count(ancestor::sec[not(descendant::boxed-text)]) - 1)!'sub', '')}section{{{.}}}</xsl:template>
<xsl:template match="boxed-text/sec/sec/sec/sec/title">\paragraph*{{{.}}}</xsl:template>
<xsl:template match="boxed-text//sec/sec/sec/sec[label]/title">\paragraph{{{.}}}</xsl:template>
</xsl:stylesheet>
Outputs
\section*{First Level Head}
Text under Head
\subsection*{Second Level Head}
\subsubsection*{Third Level Head}
Text under Head
\begin{annotebox}
Boxed TeXT Astrophysical S factor
\section{Box Level 1 head }
\subsection{Box Level 2 head}
The powers and procedures in this Code must be used fairly, responsibly, with respect for the people to whom they apply and without unlawful
discrimination
\subsubsection{Box Level 3 head}
The powers and procedures in this Code must be used fairly, responsibly, with respect for the people to whom they apply and without unlawful
discrimination
\paragraph{Box Level 4 head}
All persons in custody must be dealt with expeditiously, and released as soon as the need for detention no longer applies.Use Quote to identify short prose
quotes of material.
The Quote typecode can include a Source typecode for attribution, but otherwise won't have much structure. For
longer structured extracts use the Extract code.
\subsection{Box Level 1 head }
Boxed TeXT Astrophysical S factor
\end{annotebox}
Text under Boxed TeXT
\paragraph*{4th Level Head}
Text under Head
\section*{First Level Head}
Text under Head
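The level rule itself (count only the sec ancestors inside the nearest boxed-text, and add a star when there is no label) can be cross-checked outside XSLT. The following is a minimal Python sketch, not part of the stylesheet above; the function name heads, the LEVELS list, and the cut-down BOX_SAMPLE input are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# LaTeX sectioning commands, outermost first.
LEVELS = ["section", "subsection", "subsubsection", "paragraph", "subparagraph"]

# Hypothetical, cut-down input: one box nested two sections deep.
BOX_SAMPLE = """
<body>
  <sec><title>First</title>
    <sec><title>Second</title>
      <boxed-text>
        <sec><label>1</label><title>Box 1</title>
          <sec><label>1.1</label><title>Box 2</title></sec>
        </sec>
      </boxed-text>
      <sec><title>Third</title></sec>
    </sec>
  </sec>
</body>
"""

def heads(elem, depth=0, out=None):
    """Collect LaTeX head commands in document order."""
    if out is None:
        out = []
    if elem.tag == "boxed-text":
        depth = 0  # heads inside the box start again at level 1
    elif elem.tag == "sec":
        depth += 1
        title = elem.find("title")
        if title is not None:
            label = elem.find("label")
            numbered = label is not None and (label.text or "").strip() != ""
            star = "" if numbered else "*"
            # Anything deeper than the last known level reuses that level.
            name = LEVELS[min(depth, len(LEVELS)) - 1]
            out.append("\\%s%s{%s}" % (name, star, (title.text or "").strip()))
    for child in elem:
        # depth is passed by value, so the reset ends when the box ends
        heads(child, depth, out)
    return out

for line in heads(ET.fromstring(BOX_SAMPLE)):
    print(line)
```

Because the depth is passed down by value, the section that follows the box continues at the outer depth, which is the behaviour the question asks for.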

Related

XML to parse eCFR - again

Once more into the breach.
I am looking to derive ordered pairs of information from an XML source for use in a lookup table in a database. The XML is very flat as its structure relates instructions for typesetting the documents. Data is not differentiated except by its format in this XML. A sample of the XML is as follows:
<APPENDIX>
<EAR>Pt. 774, Supp. 1</EAR>
<HD SOURCE="HED">Supplement No. 1 to Part 774—The Commerce Control List</HD>
<HD SOURCE="HD1">Category 0—Nuclear Materials, Facilities, and Equipment [and Miscellaneous Items]</HD>
<HD SOURCE="HD1">A. “End Items,” “Equipment,” “Accessories,” “Attachments,” “Parts,” “Components,” and “Systems”</HD>
<FP SOURCE="FP-2">
<E T="02">0A002Power generating or propulsion equipment “specially designed” for use with space, marine or mobile “nuclear reactors”. (These items are “subject to the ITAR.” See 22 CFR parts 120 through 130.)</E>
</FP>
<FP SOURCE="FP-2">
<E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
</FP>
<FP SOURCE="FP-1">
<E T="04">License Requirements</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">Reason for Control:</E> NS, AT, UN</FP>
<GPOTABLE CDEF="s50,r50" COLS="2" OPTS="L2">
<BOXHD>
<CHED H="1">Control(s)</CHED>
<CHED H="1">Country Chart (See Supp. No. 1 to part 738)</CHED>
</BOXHD>
<ROW>
<ENT I="01">NS applies to entire entry</ENT>
<ENT>NS Column 1.</ENT>
</ROW>
<ROW>
<ENT I="01">AT applies to entire entry</ENT>
<ENT>AT Column 1.</ENT>
</ROW>
<ROW>
<ENT I="01">UN applies to entire entry</ENT>
<ENT>See § 746.1(b) for UN controls.</ENT>
</ROW>
</GPOTABLE>
<FP SOURCE="FP-1">
<E T="05">List Based License Exceptions (See Part 740 for a description of all license exceptions)</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">LVS:</E> $3,000 for 0A018.b</FP>
<FP SOURCE="FP-1">$1,500 for 0A018.c and .d</FP>
<FP SOURCE="FP-1">
<E T="03">GBS:</E> N/A</FP>
<FP SOURCE="FP-1">
<E T="03">CIV:</E> N/A</FP>
<FP SOURCE="FP-1">
<E T="04">List of Items Controlled</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">Related Controls:</E> (1) See also 0A979, 0A988, and 22 CFR 121.1 Categories I(a), III(b-d), and X(a). (2) See ECCN 0A617.y.1 and .y.2 for items formerly controlled by ECCN 0A018.a. (3) See ECCN 1A613.c for military helmets providing less than NIJ Type IV protection and ECCN 1A613.y.1 for conventional military steel helmets that, immediately prior to July 1, 2014, were classified under 0A018.d and 0A988. (4) See 22 CFR 121.1 Category X(a)(5) and (a)(6) for controls on other military helmets.</FP>
<FP SOURCE="FP-1">
<E T="03">Related Definitions:</E> N/A</FP>
<FP>
<E T="03">Items:</E> a. [Reserved]</FP>
<P>b. “Specially designed” components and parts for ammunition, except cartridge cases, powder bags, bullets, jackets, cores, shells, projectiles, boosters, fuses and components, primers, and other detonating devices and ammunition belting and linking machines (all of which are “subject to the ITAR.” (See 22 CFR parts 120 through 130);</P>
<NOTE>
<HD SOURCE="HED">
<E T="03">Note:</E>
</HD>
<P>
<E T="03">0A018.b does not apply to “components” “specially designed” for blank or dummy ammunition as follows:</E>
</P>
<P>
<E T="03">a. Ammunition crimped without a projectile (blank star);</E>
</P>
</APPENDIX>
Also attached are two XSL samples. The first will obtain the ECCN numbers from the nodes FP/E where the attributes are "FP-2" and "02", respectively. The second uses an xsl:if statement to obtain the "Reasons for Control" also from node FP. In this latter case the IF statement is used to determine whether the E node within the FP node includes the "Reason/s for Control" text.
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//FP[@SOURCE = 'FP-2']/E[@T='02']">
<xsl:value-of select="."/>\n
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//FP[@SOURCE = 'FP-1']">
<xsl:if test= "E='Reason for Control:' or E='Reasons for Control:'">
<xsl:value-of select="."/>\n
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The output that I need is an ordered pair of the preceding ECCN and Reasons for Control information. My thought would be that if one were to move down the list to each FP node and perform a test on its attributes, keeping the correct ones as suggested above by the XSL samples, I should get a 1D list of the necessary information with an ECCN followed by its matching Reasons for Control, if any. However, I get most of the text of the original XML with a whole lot of "Nothing" thrown in. In other words, I am apparently matching the FP nodes, but the 'when' statements are not being satisfied for some reason.
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="FP">
<xsl:choose>
<xsl:when test="@Source='FP-2'">
<xsl:value-of select="."/>\n
</xsl:when>
<xsl:when test="@Source='FP-1'">
<xsl:if test= "E='Reason for Control:' or E='Reasons for Control:'">
<xsl:value-of select="."/>\n
</xsl:if>
</xsl:when>
<xsl:otherwise>
Nothing
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
I believe that if I can obtain a 1D list as described above, that I would be able to later get this into a Filemaker database. Given these premises, can anyone offer any advice on how to proceed?

Here is what I think I understand from this very confusing description:
There are two types of nodes of interest here; the first one can be selected by:
/APPENDIX/FP[@SOURCE='FP-2'][E[@T='02']]
and the second one by:
/APPENDIX/FP[@SOURCE='FP-1'][E[@T='03']='Reason for Control:']
These nodes are siblings.
Each node of the second type is related to the nearest preceding sibling of the first type; not every node of the first type has a related node of the second type.
Based on these assumptions, the following stylesheet:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="k" match="FP[@SOURCE='FP-1'][E[@T='03']='Reason for Control:']" use="generate-id(preceding-sibling::FP[@SOURCE='FP-2'][E[@T='02']][1])" />
<xsl:template match="/APPENDIX">
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<xsl:for-each select="FP[@SOURCE='FP-2'][E[@T='02']]">
<ROW>
<COL><DATA><xsl:value-of select="substring(E[@T='02'], 1, 5)"/></DATA></COL>
<COL><DATA><xsl:value-of select="key('k', generate-id())/text()"/></DATA></COL>
</ROW>
</xsl:for-each>
</RESULTSET>
</FMPXMLRESULT>
</xsl:template>
</xsl:stylesheet>
when applied to your input example (after correcting the unclosed <NOTE> element!), will produce:
Result
<?xml version="1.0" encoding="UTF-8"?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<ROW>
<COL>
<DATA>0A002</DATA>
</COL>
<COL>
<DATA/>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A018</DATA>
</COL>
<COL>
<DATA> NS, AT, UN</DATA>
</COL>
</ROW>
</RESULTSET>
</FMPXMLRESULT>
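The pairing rule behind the key (each "Reason for Control" node belongs to the nearest preceding ECCN node) can also be checked with a single pass over the siblings. This is only a Python sketch under the same assumptions as the answer above, not a replacement for the stylesheet; the function name pair_eccn_with_reasons and the shortened ECCN_SAMPLE are invented for illustration, and unlike the XSLT result it trims the leading space of the reason text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the eCFR sample from the question.
ECCN_SAMPLE = """<APPENDIX>
  <FP SOURCE="FP-2"><E T="02">0A002Power generating or propulsion equipment</E></FP>
  <FP SOURCE="FP-2"><E T="02">0A018Items on the Wassenaar Munitions List</E></FP>
  <FP SOURCE="FP-1"><E T="04">License Requirements</E></FP>
  <FP SOURCE="FP-1"><E T="03">Reason for Control:</E> NS, AT, UN</FP>
</APPENDIX>"""

REASON_LABELS = ("Reason for Control:", "Reasons for Control:")

def pair_eccn_with_reasons(appendix):
    """Return (ECCN, reasons) pairs; reasons is '' when none follows the ECCN."""
    rows = []
    for fp in appendix.findall("FP"):  # direct children only, in document order
        e = fp.find("E")
        if e is None:
            continue
        source = fp.get("SOURCE")  # attribute names are case-sensitive
        if source == "FP-2" and e.get("T") == "02":
            # Assumes, like the stylesheet, that the ECCN is the first 5 characters.
            rows.append([(e.text or "")[:5], ""])
        elif (source == "FP-1" and e.get("T") == "03"
              and (e.text or "").strip() in REASON_LABELS
              and rows and not rows[-1][1]):
            # The reasons are the text that follows the <E> element.
            rows[-1][1] = (e.tail or "").strip()
    return [tuple(row) for row in rows]

for eccn, reasons in pair_eccn_with_reasons(ET.fromstring(ECCN_SAMPLE)):
    print(eccn, "|", reasons)
```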

With XSLT, how can I process normally, but hold some nodes until the end and then output them all at once (e.g. footnotes)?

I have an XSLT application which reads the internal format of Microsoft Word 2007/2010 zipped XML and translates it into HTML5 with XSLT. I am investigating how to add the ability to optionally read OpenOffice documents instead of MSWord.
Microsoft stores XML for footnote text separately from the XML of the document text, which happens to suit me because I want the footnotes in a block at the end of the output HTML page.
However, unfortunately for me, OpenOffice puts each footnote right next to its reference, inline with the text of the document. Here is a simple paragraph example:
<text:p text:style-name="Standard">The real breakthrough in aerial mapping
during World War II was trimetrogon
<text:note text:id="ftn0" text:note-class="footnote">
<text:note-citation>1</text:note-citation>
<text:note-body>
<text:p text:style-name="Footnote">Three separate cameras took three
photographs at once, a direct downward and an oblique on each side.</text:p>
</text:note-body>
</text:note>
photography, but the camera was large and heavy, so there were problems finding
the right aircraft to carry it.
</text:p>
My question is, can XSLT process the XML as normal, but hold each of the text:note items until the end of the document text, and then emit them all at one time?

You're thinking of your logic as being driven by the order of things in the input, but in XSLT you need to be driven by the order of things in the output. When you get to the point where you want to output the footnotes, go find the footnote text wherever it might be in the input. Admittedly that doesn't always play too well with the apply-templates recursive descent processing model, which is explicitly input-driven; but nevertheless, that's the way you have to do it.

Don't think of it as "holding" the text:note items, instead simply ignore them in the main pass and then gather them at the end with a //text:note and process them there, e.g.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:text="whateveritshouldbe">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<!-- normal mode - replace text:note element by [reference] -->
<xsl:template match="text:note">
<xsl:value-of select="concat('[', text:note-citation, ']')" />
</xsl:template>
<xsl:template match="/">
<document>
<xsl:apply-templates select="*" />
<footnotes>
<xsl:apply-templates select="//text:note" mode="footnotes"/>
</footnotes>
</document>
</xsl:template>
<!-- special "footnotes" mode to de-activate the usual text:note template -->
<xsl:template match="@*|node()" mode="footnotes">
<xsl:copy>
<xsl:apply-templates select="@*|node()" mode="footnotes" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
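The same "ignore in the main pass, gather at the end" idea can be sketched outside XSLT. The Python below is only an illustration: the function name render and NOTE_SAMPLE are assumptions, and the namespace URI is the OpenDocument text namespace, which the stylesheet above leaves as a placeholder.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the text: prefix (OpenDocument text namespace).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
NOTE = "{%s}note" % TEXT_NS
CITATION = "{%s}note-citation" % TEXT_NS
BODY = "{%s}note-body" % TEXT_NS

# Hypothetical, shortened paragraph modelled on the question's example.
NOTE_SAMPLE = (
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    'trimetrogon<text:note text:id="ftn0">'
    '<text:note-citation>1</text:note-citation>'
    '<text:note-body><text:p>Three cameras at once.</text:p></text:note-body>'
    '</text:note> photography.</text:p>'
)

def render(elem, notes):
    """Return elem's text with every note replaced by [citation].

    Each note body is appended to `notes` as (citation, text) so the caller
    can emit all footnotes together after the main text.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == NOTE:
            citation = child.findtext(CITATION, default="")
            body = child.find(BODY)
            text = "" if body is None else " ".join("".join(body.itertext()).split())
            notes.append((citation, text))
            parts.append("[%s]" % citation)
        else:
            parts.append(render(child, notes))
        parts.append(child.tail or "")  # text that follows the child
    return "".join(parts)

collected = []
print(render(ET.fromstring(NOTE_SAMPLE), collected))
for citation, text in collected:
    print("%s. %s" % (citation, text))
```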

You could use <xsl:apply-templates mode="..."/>. I'm not sure of the exact syntax for your use case, but the example below may give you a clue on how to approach the problem.
The basic idea is to process your nodes twice. The first iteration is pretty much the same as now; the second iteration looks only for footnotes and outputs only those. You differentiate the iterations by setting the "mode" attribute.
Note that I used different tags than in your code, to keep the example simple.
XSLT sheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" />
<xsl:template match="doc">
<xml>
<!-- First iteration - skip footnotes -->
<doc>
<xsl:apply-templates select="text" />
</doc>
<!-- Second iteration, extract all footnotes.
'mode' = footnotes -->
<footnotes>
<xsl:apply-templates select="text" mode="footnotes" />
</footnotes>
</xml>
</xsl:template>
<!-- Note: no mode attribute -->
<xsl:template match="text">
<text>
<xsl:for-each select="p">
<p>
<xsl:value-of select="text()" />
</p>
</xsl:for-each>
</text>
</xsl:template>
<!-- Note: mode = footnotes -->
<xsl:template match="text" mode="footnotes">
<xsl:for-each select=".//footnote">
<footnote>
<xsl:value-of select="text()" />
</footnote>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<doc>
<text>
<p>
some text
<footnote>footnote1</footnote>
</p>
<p>
other text
<footnote>footnote2</footnote>
</p>
</text>
<text>
<p>
some text2
<footnote>footnote3</footnote>
</p>
<p>
other text2
<footnote>footnote4</footnote>
</p>
</text>
</doc>
Output XML:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<!-- Output from first iteration -->
<doc>
<text>
<p>some text</p>
<p>other text</p>
</text>
<text>
<p>some text2</p>
<p>other text2</p>
</text>
</doc>
<!-- Output from second iteration -->
<footnotes>
<footnote>footnote1</footnote>
<footnote>footnote2</footnote>
<footnote>footnote3</footnote>
<footnote>footnote4</footnote>
</footnotes>
</xml>

XSLT - How to ignore a sub tag in XML if supertag contains a particular term

I am trying to change my XSLT so that when a super tag in my XML contains a particular word the sub tag is not selected.
In this example I do not want the sub tag <para> displayed when the super tag <formalpara> contains the word "Galaxy"
Thanks in advance.
My XML
<formalpara>
Galaxy
<para>
<bridgehead>Galaxy Zoo</bridgehead>
<sliceXML>Galaxy</sliceXML>
The human eye is far better at identifying characteristics of galaxies
than any computer. So Galaxy Zoo has called for everyday citizens to
help in a massive identification project. Well over a hundred thousand
people have helped identify newly discovered galaxies. Now you can, too.
</para>
</formalpara>
My XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" version="1.0">
<xsl:template match="/">
<xsl:call-template name="results"/>
<xsl:message>FROM simpleHMHTransform XSLT8</xsl:message>
</xsl:template>
<xsl:template name="results">
<xsl:for-each select="//formalpara">
<xsl:call-template name="formalpara"/>
</xsl:for-each>
<xsl:for-each select="//para">
<xsl:call-template name="para"/>
</xsl:for-each>
</xsl:template>
<xsl:template name="formalpara">
<div id="formalpara">
<xsl:copy-of select="text()"/>
</div>
</xsl:template>
<xsl:template name="para">
<div id="para">
<xsl:copy-of select="text()"/>
</div>
</xsl:template>
</xsl:stylesheet>
My current output
<?xml version="1.0" encoding="UTF-8"?><div xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" id="formalpara">
Galaxy
</div><div xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" id="para">
The human eye is far better at identifying characteristics of galaxies
than any computer. So Galaxy Zoo has called for everyday citizens to
help in a massive identification project. Well over a hundred thousand
people have helped identify newly discovered galaxies. Now you can, too.
</div>
My desired output
<?xml version="1.0" encoding="UTF-8"?><div xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" id="formalpara">
Galaxy
</div>

You should really do xsl:apply-templates instead of calling named templates. You could then add this template:
<xsl:template match="formalpara[contains(text(),'Galaxy')]/para"/>
I can give a full example later.
Full Example:
XML Input
<formalpara>
Galaxy
<para>
<bridgehead>Galaxy Zoo</bridgehead>
<sliceXML>Galaxy</sliceXML>
The human eye is far better at identifying characteristics of galaxies
than any computer. So Galaxy Zoo has called for everyday citizens to
help in a massive identification project. Well over a hundred thousand
people have helped identify newly discovered galaxies. Now you can, too.
</para>
</formalpara>
XSLT 1.0
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:sparql-results="http://www.w3.org/2005/sparql-results#">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="formalpara">
<div id="formalpara">
<xsl:apply-templates/>
</div>
</xsl:template>
<xsl:template match="formalpara[contains(text(),'Galaxy')]/para"/>
<xsl:template match="para">
<div id="para">
<xsl:apply-templates/>
</div>
</xsl:template>
</xsl:stylesheet>
XML Output
<div xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" id="formalpara">
Galaxy
</div>
NOTE: I also changed the contains() to contains(text(),'Galaxy') so it only looks at the text that is a direct child of formalpara.
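To make the "direct child text only" point concrete, here is a small Python sketch; the helper names own_text and paras_to_show and the two sample elements are hypothetical. It checks every text node that is a direct child of the element, whereas in XSLT 1.0 contains(text(),'Galaxy') converts only the first text node child to a string, so the two agree on the example above but not in general.

```python
import xml.etree.ElementTree as ET

def own_text(elem):
    """Text directly inside elem: its leading text plus the tails of its children."""
    return (elem.text or "") + "".join(child.tail or "" for child in elem)

def paras_to_show(formalpara, word="Galaxy"):
    """Return the para children, or nothing if the element's own text has the word."""
    if word in own_text(formalpara):
        return []
    return formalpara.findall("para")

# "Galaxy" is direct text of formalpara, so the para is suppressed.
hidden = ET.fromstring(
    "<formalpara>Galaxy<para><sliceXML>Galaxy</sliceXML>body</para></formalpara>")
# "Galaxy" occurs only in a descendant, so the para is kept.
shown = ET.fromstring(
    "<formalpara>Stars<para><sliceXML>Galaxy</sliceXML>body</para></formalpara>")

print(len(paras_to_show(hidden)), len(paras_to_show(shown)))
```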

Serialize XML file on the basis of Character Count during an XSL transformation

I have an XML document (A.xml) that is transformed into another XML document (B.xml), which is simply a replica of A.xml with a unique @id added to each element. This part is done.
Now I would like to implement a mechanism that tracks the character count of every text node within B.xml (held in a temporary tree) and, based on a maximum character count, splits and serializes B.xml into one or several parts.
Source XML Document (A.xml):
<?xml version="1.0" encoding="UTF-8"?>
<root>
<!--
Rules for splitting:
1. «head/text()» is common for all splits.
2. split files can have 600 characters max each.
3. «title» elements could not be the last element of the any result document.
-->
<head><!-- 8 characters -->Kinesics</head>
<section>
<para><!-- 37 characters -->From Wikipedia, the free encyclopedia</para>
<para><!-- 204 characters [space normalized]-->Kinesics is the interpretation of body
language such as facial expressions and gestures — or, more formally, non-verbal
behavior related to movement, either of any part of the body or the body as a
whole. </para>
<section>
<title><!-- 19 characters -->Birdwhistell's work</title>
<para><!-- 432 characters [space normalized]-->The term was first used (in 1952) by Ray
Birdwhistell, an anthropologist who wished to study how people communicate through
posture, gesture, stance, and movement. Part of Birdwhistell's work involved making
film of people in social situations and analyzing them to show different levels of
communication not clearly seen otherwise. The study was joined by several other
anthropologists, including Margaret Mead and Gregory Bateson.</para>
<para><!-- 453 characters [space normalized]--> Drawing heavily on descriptive
linguistics, Birdwhistell argued that all movements of the body have meaning (i.e.
are not accidental), and that these non-verbal forms of language (or paralanguage)
have a grammar that can be analyzed in similar terms to spoken language. Thus, a
"kineme" is "similar to a phoneme because it consists of a group of movements which
are not identical, but which may be used interchangeably without affecting social
meaning".</para>
</section>
<section>
<title><!-- 19 characters -->Modern applications</title>
<para><!-- 390 characters [space normalized]-->Kinesics are an important part of
non-verbal communication behavior. The movement of the body, or separate parts,
conveys many specific meanings and the interpretations may be culture bound. As many
movements are carried out at a subconscious or at least a low-awareness level,
kinesic movements carry a significant risk of being misinterpreted in an
intercultural communications situation.</para>
</section>
</section>
</root>
XSL File
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
<xsl:output method="xml" encoding="UTF-8" indent="no"/>
<!--update 1-->
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:variable name="root-replica">
<xsl:call-template name="create-root-replica">
<xsl:with-param name="context" select="*"/>
</xsl:call-template>
</xsl:variable>
<xsl:copy-of select="$root-replica"/>
<!--
<xsl:call-template name="split-n-serialize">
<xsl:with-param name="context" select="$root-replica"/>
</xsl:call-template>
-->
</xsl:template>
<xsl:template name="split-n-serialize">
<xsl:param name="context"/>
<xsl:for-each select="$context">
<xsl:result-document encoding="utf-8" href="{concat('split_',position(),'.xml')}" method="xml" indent="no">
<xsl:sequence select="."/>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
<xsl:template name="create-root-replica">
<xsl:param name="context"/>
<root>
<head>
<xsl:value-of select="$context/head"/>
</head>
<xsl:apply-templates select="$context/*[not(self::head)]"/>
</root>
</xsl:template>
<xsl:template match="element()">
<xsl:element name="{local-name()}">
<xsl:attribute name="id">
<xsl:value-of select="generate-id()"/>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<!--update 2-->
<xsl:template match="text()">
<xsl:value-of select="normalize-space(.)"/>
</xsl:template>
</xsl:transform>
My input XML contains 1562 characters (assuming \s+ is equal to a single space), and I would like to split A.xml into 4 parts using the rules given in the source XML document.
Does anyone have any idea how to do this? Any ideas or comments are greatly appreciated.
Update 3
Details of split files
1st file: 8 + 37 + 204 = 249
2nd file: 8 + 19 + 432 = 459
3rd file: 8 + 453 = 461
4th file: 8 + 19 + 390 = 417
Details of the split procedure:
The contents of the «head» element should be part of every XML file.
Files may be split in the middle of a section but not in the middle of a paragraph.
No «title» element may come at the end of a split.
The maximum number of characters (excluding opening and closing tags) in a split file is 600.
Sample output files (indents are used for better readability)
1st file
<?xml version="1.0" encoding="UTF-8"?>
<root>
<head>Kinesics</head>
<section id="d1e6">
<para id="d1e7">From Wikipedia, the free encyclopedia</para>
<para id="d1e10">Kinesics is the interpretation of body language such as facial expressions and gestures — or, more formally, non-verbal behavior related to movement, either of any part of the body or the body as a whole.</para>
</section>
</root>
2nd file
<?xml version="1.0" encoding="UTF-8"?>
<root>
<head>Kinesics</head>
<section id="d1e6">
<section id="d1e13">
<title id="d1e14">Birdwhistell's work</title>
<para id="d1e17">The term was first used (in 1952) by Ray Birdwhistell, an anthropologist who wished to study how people communicate through posture, gesture, stance, and movement. Part of Birdwhistell's work involved making film of people in social situations and analyzing them to show different levels of communication not clearly seen otherwise. The study was joined by several other anthropologists, including Margaret Mead and Gregory Bateson.</para>
</section>
</section>
</root>
3rd File
<?xml version="1.0" encoding="UTF-8"?>
<root>
<head>Kinesics</head>
<section id="d1e6">
<section id="d1e13">
<para id="d1e20">Drawing heavily on descriptive linguistics, Birdwhistell argued that all movements of the body have meaning (i.e. are not accidental), and that these non-verbal forms of language (or paralanguage) have a grammar that can be analyzed in similar terms to spoken language. Thus, a "kineme" is "similar to a phoneme because it consists of a group of movements which are not identical, but which may be used interchangeably without affecting social meaning".</para>
</section>
</section>
</root>
4th file
<?xml version="1.0" encoding="UTF-8"?>
<root>
<head>Kinesics</head>
<section id="d1e6">
<section id="d1e23">
<title id="d1e24">Modern applications</title>
<para id="d1e27">Kinesics are an important part of non-verbal communication behavior. The movement of the body, or separate parts, conveys many specific meanings and the interpretations may be culture bound. As many movements are carried out at a subconscious or at least a low-awareness level, kinesic movements carry a significant risk of being misinterpreted in an intercultural communications situation.</para>
</section>
</section>
</root>

You would use string-length() to get the "character count" and then xsl:result-document to split your result tree into parts.
Do you need further help coding it up?
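Before coding this in XSLT, it may help to pin down the grouping rule on its own. The Python sketch below is an assumption about how the question's rules could be applied greedily, not the answerer's method: split_units, UNITS, and HEAD are hypothetical names, and the lengths are the space-normalized counts given in the question.

```python
# (kind, length) for each title/para in document order, taken from the question.
UNITS = [
    ("para", 37), ("para", 204),
    ("title", 19), ("para", 432), ("para", 453),
    ("title", 19), ("para", 390),
]
HEAD = 8  # length of «head», repeated in every split

def split_units(units, head_len, limit):
    """Greedily group units so that head_len + group length <= limit.

    A title never ends a group: when a group is closed, trailing titles are
    carried over to the start of the next group. A title that is the very
    last unit of the input stays where it is.
    """
    chunks, current, total = [], [], head_len
    for kind, length in units:
        if current and total + length > limit:
            carried = []
            while current and current[-1][0] == "title":
                carried.insert(0, current.pop())
            if current:
                chunks.append(current)
            current = carried
            total = head_len + sum(n for _, n in current)
        current.append((kind, length))
        total += length
    if current:
        chunks.append(current)
    return chunks

totals = [HEAD + sum(n for _, n in chunk) for chunk in split_units(UNITS, HEAD, 600)]
print(totals)
```

With a limit of 600 this reproduces the four totals listed in the question. In XSLT 2.0 the same lengths would come from string-length(normalize-space(.)), with each group written out through xsl:result-document.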

XSLT Match attribute and then its element

In my source XML, any element can have an @n attribute. If one does, I want to output it before processing the element and all its children.
For example
<line n="2">Ipsum lorem</line>
<verse n="5">The sounds of silence</verse>
<verse>Four score and seven</verse>
<sentence n="3">
<word n="1">Hello</word>
<word n="2">world</word>
</sentence>
I have templates that match "line", "verse", "sentence" and "word". If any of those elements has an @n value, I want to output it in front of whatever the element's template generates.
The above might come out something like
2 <div class="line">Ipsum lorem</div>
5 <span class="verse">The sounds of silence</span>
<span class="verse">Four score and seven</span>
3 <p class="sentence">
1 <span class="word">Hello</span>
2 <span class="word">world</span>
</p>
where the templates for "line", "verse", etc. generated the div, span and p elements.
How should I think of this problem? -- Match the attribute and then apply-templates to its parent? (What would the syntax for that be?) Put a call-template at the beginning of every element's template? (That's unappealing.) Something else? (Probably!)
I tried a few things but got either an infinite loop, or nothing, or processing of the attribute and then its parent's children, but not the parent itself.

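To simplify matters, I've placed the mapping from XML to HTML elements in an in-document data structure (accessible via document(''), which returns the stylesheet itself). Now only one template is needed, and the @n attribute gets its special processing in only one place.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="urn:example:map" exclude-result-prefixes="my">
<xsl:output method="html"/>
<!-- Top-level data elements must be in a non-null namespace; urn:example:map is an arbitrary placeholder URI -->
<my:map>
<elt xml="line" html="div"/>
<elt xml="verse" html="span"/>
<elt xml="sentence" html="p"/>
<elt xml="word" html="span"/>
</my:map>
<xsl:template match="line|verse|sentence|word">
<!-- Output @n, if present, before the element itself -->
<xsl:if test="@n"><xsl:value-of select="@n"/><xsl:text> </xsl:text></xsl:if>
<!-- current() is the matched source element; inside the predicate, a bare name() would refer to elt -->
<xsl:element name="{document('')/*/my:map/elt[@xml=name(current())]/@html}">
<xsl:attribute name="class"><xsl:value-of select="name()"/></xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>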

Here is one simple way to do this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="*/*[@n]">
<xsl:value-of select="concat('&#xA;', @n, ' ')"/>
<xsl:apply-templates select="self::*" mode="content"/>
</xsl:template>
<xsl:template match="*/*[not(@*)]">
<xsl:apply-templates select="." mode="content"/>
</xsl:template>
<xsl:template match="line" mode="content">
<div class="line"><xsl:apply-templates/></div>
</xsl:template>
<xsl:template match="verse | word" mode="content">
<span class="{name()}"><xsl:apply-templates mode="content"/></span>
</xsl:template>
<xsl:template match="sentence" mode="content">
<p class="sentence"><xsl:apply-templates/></p>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the provided XML document:
<t>
<line n="2">Ipsum lorem</line>
<verse n="5">The sounds of silence</verse>
<verse>Four score and seven</verse>
<sentence n="3">
<word n="1">Hello</word>
<word n="2">world</word>
</sentence>
</t>
the wanted, correct result is produced:
2 <div class="line">Ipsum lorem</div>
5 <span class="verse">The sounds of silence</span>
<span class="verse">Four score and seven</span>
3 <p class="sentence">
1 <span class="word">Hello</span>
2 <span class="word">world</span>
</p>
Explanation: Appropriate use of templates and modes.
