Once more into the breach.
I am looking to derive ordered pairs of information from an XML source for use in a lookup table in a database. The XML is very flat as its structure relates instructions for typesetting the documents. Data is not differentiated except by its format in this XML. A sample of the XML is as follows:
<APPENDIX>
<EAR>Pt. 774, Supp. 1</EAR>
<HD SOURCE="HED">Supplement No. 1 to Part 774—The Commerce Control List</HD>
<HD SOURCE="HD1">Category 0—Nuclear Materials, Facilities, and Equipment [and Miscellaneous Items]</HD>
<HD SOURCE="HD1">A. “End Items,” “Equipment,” “Accessories,” “Attachments,” “Parts,” “Components,” and “Systems”</HD>
<FP SOURCE="FP-2">
<E T="02">0A002Power generating or propulsion equipment “specially designed” for use with space, marine or mobile “nuclear reactors”. (These items are “subject to the ITAR.” See 22 CFR parts 120 through 130.)</E>
</FP>
<FP SOURCE="FP-2">
<E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
</FP>
<FP SOURCE="FP-1">
<E T="04">License Requirements</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">Reason for Control:</E> NS, AT, UN</FP>
<GPOTABLE CDEF="s50,r50" COLS="2" OPTS="L2">
<BOXHD>
<CHED H="1">Control(s)</CHED>
<CHED H="1">Country Chart (See Supp. No. 1 to part 738)</CHED>
</BOXHD>
<ROW>
<ENT I="01">NS applies to entire entry</ENT>
<ENT>NS Column 1.</ENT>
</ROW>
<ROW>
<ENT I="01">AT applies to entire entry</ENT>
<ENT>AT Column 1.</ENT>
</ROW>
<ROW>
<ENT I="01">UN applies to entire entry</ENT>
<ENT>See § 746.1(b) for UN controls.</ENT>
</ROW>
</GPOTABLE>
<FP SOURCE="FP-1">
<E T="05">List Based License Exceptions (See Part 740 for a description of all license exceptions)</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">LVS:</E> $3,000 for 0A018.b</FP>
<FP SOURCE="FP-1">$1,500 for 0A018.c and .d</FP>
<FP SOURCE="FP-1">
<E T="03">GBS:</E> N/A</FP>
<FP SOURCE="FP-1">
<E T="03">CIV:</E> N/A</FP>
<FP SOURCE="FP-1">
<E T="04">List of Items Controlled</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">Related Controls:</E> (1) See also 0A979, 0A988, and 22 CFR 121.1 Categories I(a), III(b-d), and X(a). (2) See ECCN 0A617.y.1 and .y.2 for items formerly controlled by ECCN 0A018.a. (3) See ECCN 1A613.c for military helmets providing less than NIJ Type IV protection and ECCN 1A613.y.1 for conventional military steel helmets that, immediately prior to July 1, 2014, were classified under 0A018.d and 0A988. (4) See 22 CFR 121.1 Category X(a)(5) and (a)(6) for controls on other military helmets.</FP>
<FP SOURCE="FP-1">
<E T="03">Related Definitions:</E> N/A</FP>
<FP>
<E T="03">Items:</E> a. [Reserved]</FP>
<P>b. “Specially designed” components and parts for ammunition, except cartridge cases, powder bags, bullets, jackets, cores, shells, projectiles, boosters, fuses and components, primers, and other detonating devices and ammunition belting and linking machines (all of which are “subject to the ITAR.” (See 22 CFR parts 120 through 130);</P>
<NOTE>
<HD SOURCE="HED">
<E T="03">Note:</E>
</HD>
<P>
<E T="03">0A018.b does not apply to “components” “specially designed” for blank or dummy ammunition as follows:</E>
</P>
<P>
<E T="03">a. Ammunition crimped without a projectile (blank star);</E>
</P>
</APPENDIX>
Also attached are two XSL samples. The first will obtain the ECCN numbers from the nodes FP/E where the attributes are "FP-2" and "02", respectively. The second uses an xsl:if statement to obtain the "Reasons for Control" also from node FP. In this latter case the IF statement is used to determine whether the E node within the FP node includes the "Reason/s for Control" text.
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//FP[#SOURCE = 'FP-2']/E[#T='02']">
<xsl:value-of select="."/>\n
</xsl:for-each>
</xsl:template>
</xsl:stylesheet
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//FP[#SOURCE = 'FP-1']">
<xsl:if test= "E='Reason for Control:' or E='Reasons for Control:'">
<xsl:value-of select="."/>\n
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The output that I need is an ordered pair of the preceding ECCN and Reasons for Control information. My thought would be that if one were to move down the list to each FP node and perform a test on its attributes, keeping the correct ones as suggested above by the XSL samples, I should get a 1D list of the necessary information with an ECCN followed by its matching Reasons for Control, if any. However, I get most of the text of the original XML with a whole lot of "Nothing" thrown in. In other words, I am apparently matching the FP nodes, but the 'when' statements are not being satisfied for some reason.
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="FP">
<xsl:choose>
<xsl:when test="#Source='FP-2'">
<xsl:value-of select="."/>\n
</xsl:when>
<xsl:when test="#Source='FP-1'">
<xsl:if test= "E='Reason for Control:' or E='Reasons for Control:'">
<xsl:value-of select="."/>\n
</xsl:if>
</xsl:when>
<xsl:otherwise>
Nothing
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
I believe that if I can obtain a 1D list as described above, that I would be able to later get this into a Filemaker database. Given these premises, can anyone offer any advice on how to proceed?
Here is what I think I understand from this very confusing description:
There are two types of nodes of interest here; the first one can be selected
by:
/APPENDIX/FP[#SOURCE='FP-2'][E[#T='02']]
and the second one by:
/APPENDIX/FP[#SOURCE='FP-1'][E[#T='03']='Reason for Control:']
These nodes are siblings.
The nodes of the second type are related to the first
preceding-sibling node of the first type; Not every node of the first
type has a related node of the second type.
Based on these assumptions, the following styleheet:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="k" match="FP[#SOURCE='FP-1'][E[#T='03']='Reason for Control:']" use="generate-id(preceding-sibling::FP[#SOURCE='FP-2'][E[#T='02']][1])" />
<xsl:template match="/APPENDIX">
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<xsl:for-each select="FP[#SOURCE='FP-2'][E[#T='02']]">
<ROW>
<COL><DATA><xsl:value-of select="substring(E[#T='02'], 1, 5)"/></DATA></COL>
<COL><DATA><xsl:value-of select="key('k', generate-id())/text()"/></DATA></COL>
</ROW>
</xsl:for-each>
</RESULTSET>
</FMPXMLRESULT>
</xsl:template>
</xsl:stylesheet>
when applied to your input example (after correcting the unclosed <NOTE> element!), will produce:
Result
<?xml version="1.0" encoding="UTF-8"?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<ROW>
<COL>
<DATA>0A002</DATA>
</COL>
<COL>
<DATA/>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A018</DATA>
</COL>
<COL>
<DATA> NS, AT, UN</DATA>
</COL>
</ROW>
</RESULTSET>
</FMPXMLRESULT>
Related
I am looking to shorten my XSLT codebase by seeing if XSLT can increase a text number for each match. The text number exists in both the attribute value "label-period0" and the "xls:value-of" value.
The code works, no errors so this is more a question of how to shorten the code and make use of some sort of iteration on a specific character in a string.
I added 2 similar code structures for "period0" and "period1" to better see what exactly are the needed changes in terms of the digit in the text strings.
Source XML file:
<data>
<periods>
<period0><from>2016-01-01</from><to>2016-12-01</to></period0>
<period1><from>2015-01-01</from><to>2015-12-01</to></period1>
<period2><from>2014-01-01</from><to>2014-12-01</to></period2>
<period3><from>2013-01-01</from><to>2013-12-01</to></period3>
</periods>
<balances>
<balance0><instant>2016-12-31</instant></balance0>
<balance1><instant>2015-12-31</instant></balance1>
<balance2><instant>2014-12-31</instant></balance2>
<balance3><instant>2013-12-31</instant></balance3>
</balances>
</data>
XSL file:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform
version="3.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:output method="xml" indent="yes"/>
<!-- Block all data that has no user defined template -->
<xsl:mode on-no-match="shallow-skip"/>
<xsl:template match="data">
<results>
<periods>
<periods label="period0">
<xsl:value-of
select =
"concat(periods/period0/from, '--', periods/period0/to)"
/>
</periods>
<periods label="period1">
<xsl:value-of
select =
"concat(periods/period1/from, '--', periods/period1/to)"
/>
</periods>
<!-- Etc for period [2 and 3]-->
</periods>
<balances>
<balance label="balance0">
<xsl:value-of select ="balances/balance0/instant"/>
</balance>
<!-- Etc for balance [1,2 and 3] -->
</balances>
</results>
</xsl:template>
</xsl:transform>
Result:
<?xml version="1.0" encoding="UTF-8"?>
<results>
<periods>
<periods label="period0">2016-01-01--2016-12-01</periods>
<periods label="period1">2015-01-01--2015-12-01</periods>
</periods>
<balances>
<balance label="balance0">2016-12-31</balance>
</balances>
</results>
Wanted result:
(with an XSL that steps the digit in the text string, or any other logics in XSL that could cater for manipulating the digit in text string)
<?xml version="1.0" encoding="UTF-8"?>
<results>
<periods>
<periods label="period0">2016-01-01--2016-12-01</periods>
<periods label="period1">2015-01-01--2015-12-01</periods>
<periods label="period2">2014-01-01--2015-12-01</periods>
<periods label="period3">2013-01-01--2015-12-01</periods>
</periods>
<balances>
<balance label="balance0">2016-12-31</balance>
<balance label="balance1">2015-12-31</balance>
<balance label="balance2">2014-12-31</balance>
<balance label="balance3">2013-12-31</balance>
</balances>
</results>
Couldn't you do simply something like:
<xsl:template match="/data">
<results>
<periods>
<xsl:for-each select="periods/*">
<periods label="{name()}">
<xsl:value-of select="from"/>
<xsl:text>--</xsl:text>
<xsl:value-of select="to"/>
</periods>
</xsl:for-each>
</periods>
<balances>
<xsl:for-each select="balances/*">
<balance label="{name()}">
<xsl:value-of select="instant"/>
</balance>
</xsl:for-each>
</balances>
</results>
</xsl:template>
If you want to do your own numbering, you can change:
<periods label="{name()}">
to:
<periods label="period{position() - 1}">
I have a large xml document containing annotated speech transcripts. Following is a short fragment.
<?xml version="1.0" encoding="UTF-8"?>
<U>
<A/>
<C type="start" id="cb01s"/>
<P/>
<T>a</T>
<T>woman</T>
<P/>
<T>took</T>
<T>off</T>
<T>the</T>
<T>train</T>
<C type="end" id="cb02e"/>
<P/>
<T>but</T>
<P/>
<F/>
<RT>
<O>
<C type="start" id="cb03s"/>
<T>her</T>
<T>bag</T>
<P/>
<T>are</T>
</O>
<P/>
<E>
<C type="start" id="cb04s"/>
<T>her</T>
<T>bag</T>
<T>are</T>
</E>
</RT>
<P/>
<T>still</T>
<P/>
<T>in</T>
<T>the</T>
<T>train</T>
<C type="end" id="cb05e"/>
<PC>.</PC>
</U>
The basic task I need to do is to get the number of <T> nodes between certain pairs of <C> nodes. I've used the following stylesheet fragment to do this (illustrating with one specific pair of <C> nodes).
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="U">
<xsl:variable name="start-node" select="descendant::C[#id = 'cb01s']"/>
<xsl:variable name="end-node" select="descendant::C[#id = 'cb02e']"/>
<xsl:text>Result: </xsl:text>
<xsl:value-of select="count($start-node/following::T[following::C[generate-id(.) = generate-id($end-node)]])"/>
</xsl:template>
</xsl:stylesheet>
This works fine on such a short XML fragment as above and gives the correct result: Result: 6.
However, the actual XML document contains tens of thousands of <C> nodes and even more <T> nodes. So when I try to run the stylesheet on it the result comes back very slowly. (It would probably take days to finish completely.) I suppose the problem must be that on each run of the <xsl:value-of... line, the processor (Saxon) is checking all <T> nodes and generating id's for <C> nodes multiples times (i.e., exponentially) and that slows everything down.
Is there a way to speed up the process while still using generate-id()? Or do I need to get the number of <T> nodes with some alternate approach?
You do not need generate-id() just to avoid matching <C> elements intervening between the start and end nodes. You are matching <C> elements by their id attributes in the first place, and I see no reason not to use that more directly. For example,
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="U">
<xsl:variable name="start-id" select="cb01s"/>
<xsl:variable name="end-id" select="cb02e"/>
<xsl:text>Result: </xsl:text>
<xsl:value-of select="count(descendant::C[#id = $start-id]/following::T[following::C[#id = $end-id][1]])"/>
</xsl:template>
</xsl:stylesheet>
You can simplify that by removing the [1] position predicate if you can rely on the <C> element #ids to be unique in the document.
If generate-id() is indeed the primary cause of your performance problem, then avoiding it altogether ought to provide a big boost.
I have an XSLT template that is working fine.
<xsl:template match="Row[contains(BenefitType, 'MyBenefit')]">
<value>
<xsl:value-of select="BenefitList/Row/Premium* 12" />
</value>
</xsl:template>
The output is
<value>100</value>
<value>110</value>
What I would prefer is if it would just output 220. So, basically in the template I would need to use some sort of variable or looping to do this and then output the final summed value?
XSLT 1 compliance is required.
The template is being used as follows:
<xsl:apply-templates select="Root/Row[contains(BenefitType, 'MyBenefit')]" />
For some reason, when I use the contains here it only sums the first structure that matches and not all of them. If The XML values parent wasn't dependent on having a sibling element that matched a specific value then a'sum' approach would work.
The direct solution to the problem was already mentioned in the comments, but assuming you really want to do the same with some variables, this might be interesting for you:
XML:
<Root>
<Row>
<BenefitType>MyBenefit</BenefitType>
<BenefitList>
<Premium>100</Premium>
</BenefitList>
</Row>
<Row>
<BenefitType>MyBenefit, OtherBenefit</BenefitType>
<BenefitList>
<Premium>100</Premium>
</BenefitList>
</Row>
<Row>
<BenefitType>OtherBenefit</BenefitType>
<BenefitList>
<Premium>1000</Premium>
</BenefitList>
</Row>
<Row>
<BenefitType>OtherBenefit</BenefitType>
<BenefitList>
<Premium>1000</Premium>
</BenefitList>
</Row>
</Root>
XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="exsl">
<xsl:template match="/">
<total>
<xsl:variable name="valuesXml">
<values>
<xsl:apply-templates select="Root/Row[contains(BenefitType, 'MyBenefit')]" />
</values>
</xsl:variable>
<xsl:variable name="values" select="exsl:node-set($valuesXml)/values/value" />
<xsl:value-of select="sum($values)" />
</total>
</xsl:template>
<xsl:template match="Row[contains(BenefitType, 'MyBenefit')]">
<value>
<xsl:value-of select="BenefitList/Premium * 12" />
</value>
</xsl:template>
</xsl:stylesheet>
Here the same result set generated in your question is saved in another variable, which can then again be processed.
I have this XML with two sets of table data (element names generalized for simplicity).
<root>
<table>
<Row type="1">
<Id>AAAA</Id>
<Properties>
<Property>A</Property>
<Property>D</Property>
</Properties>
</Row>
<Row type="1">
<Id>BBBB</Id>
<Properties>
<Property>B</Property>
</Properties>
</Row>
<Row type="1">
<Id>CCCC</Id>
<Properties>
<Property>G</Property>
<Property>H</Property>
</Properties>
</Row>
</table>
<table>
<Row type="2">
<Id>123abc</Id>
<Properties>
<Property>A</Property>
<Property>D</Property>
<Property>E</Property>
</Properties>
</Row>
<Row type="2">
<Id>456def</Id>
<Properties>
<Property>B</Property>
<Property>C</Property>
<Property>I</Property>
</Properties>
</Row>
<Row type="2">
<Id>798ghi</Id>
<Properties>
<Property>F</Property>
<Property>G</Property>
<Property>H</Property>
</Properties>
</Row>
</table>
</root>
I am trying to write a transform to output a new table that associates a row from table 1 to a row in table 2 based on their properties. Rows in table 1 are not required to have all of the properties present in a row in table 2; any one property is all that's required to be considered a match. There will always only be a one to one relationship between a row in table 1 and a row in table 2.
My desired output is:
<root>
<Row>
<Name>NewTable:AAAA</Name>
<Table2Id>123abc</Table2Id>
</Row>
<Row>
<Name>NewTable:BBBB</Name>
<Table2Id>456def</Table2Id>>
</Row>
<Row>
<Name>NewTable:CCCC</Name>
<Table2Id>789ghi</Table2Id>
</Row>
</root>
I've started with this and have been trying to follow this logic: Find me the Id tag of the row in table 2 who has at least one property that matches one of the properties of the current row being processed in table 1.
Here's what I have so far. It's not working, but I feel like I'm close.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:template match="//Row[#type != '2']">
<xsl:variable name="Name" select="concat('NewTable:', ./Id)"/>
<Row>
<Name>
<xsl:value-of select="$Name"/>
</Name>
<Table2Id>
<xsl:value-of select="//Row[#type = '2'][Property = ./Property]/Id"/>
</Table2Id>
</Row>
</xsl:template>
<xsl:template match="/">
<root>
<xsl:apply-templates/>
</root>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
There are two things that can be very useful here: one is the key feature of XSLT that allows you to create a relationship based on matching values. The other is set comparison, where if at least one member of a set matches at least one member of another set, the sets will be considered matching.
Try the following stylesheet that takes advantage of both:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:key name="table2" match="Row[#type=2]" use="Properties/Property" />
<xsl:template match="/">
<root>
<xsl:for-each select="root/table/Row[#type=1]">
<row>
<Name>
<xsl:value-of select="Id" />
</Name>
<Table2Id>
<xsl:value-of select="key('table2', Properties/Property)/Id" />
</Table2Id>
</row>
</xsl:for-each>
</root>
</xsl:template>
</xsl:stylesheet>
BTW, your method would have worked too (albeit less efficiently) if only you had used:
[Properties/Property = ./Properties/Property]
instead of just:
[Property = ./Property]
since you are in the context of a <Row> (a grandparent of <Property>).
I have an XSLT application which reads the internal format of Microsoft Word 2007/2010 zipped XML and translates it into HTML5 with XSLT. I am investigating how to add the ability to optionally read OpenOffice documents instead of MSWord.
Microsoft stores XML for footnote text separately from the XML of the document text, which happens to suit me because I want the footnotes in a block at the end of the output HTML page.
However, unfortunately for me, OpenOffice puts each footnote right next to its reference, inline with the text of the document. Here is a simple paragraph example:
<text:p text:style-name="Standard">The real breakthrough in aerial mapping
during World War II was trimetrogon
<text:note text:id="ftn0" text:note-class="footnote">
<text:note-citation>1</text:note-citation>
<text:note-body>
<text:p text:style-name="Footnote">Three separate cameras took three
photographs at once, a direct downward and an oblique on each side.</text:p>
</text:note-body>
</text:note>
photography, but the camera was large and heavy, so there were problems finding
the right aircraft to carry it.
</text:p>
My question is, can XSLT process the XML as normal, but hold each of the text:note items until the end of the document text, and then emit them all at one time?
You're thinking of your logic as being driven by the order of things in the input, but in XSLT you need to be driven by the order of things in the output. When you get to the point where you want to output the footnotes, go find the footnote text wherever it might be in the input. Admittedly that doesn't always play too well with the apply-templates recursive descent processing model, which is explicitly input-driven; but nevertheless, that's the way you have to do it.
Don't think of it as "holding" the text:note items, instead simply ignore them in the main pass and then gather them at the end with a //text:note and process them there, e.g.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:text="whateveritshouldbe">
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()" />
</xsl:copy>
</xsl:template>
<!-- normal mode - replace text:note element by [reference] -->
<xsl:template match="text:note">
<xsl:value-of select="concat('[', text:note-citation, ']')" />
</xsl:template>
<xsl:template match="/">
<document>
<xsl:apply-templates select="*" />
<footnotes>
<xsl:apply-templates select="//text:note" mode="footnotes"/>
</footnotes>
</document>
</xsl:template>
<!-- special "footnotes" mode to de-activate the usual text:node template -->
<xsl:template match="#*|node()" mode="footnotes">
<xsl:copy>
<xsl:apply-templates select="#*|node()" mode="footnotes" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
You could use <xsl:apply-templates mode="..."/>. I'm not sure on the exact syntax and your use case, but maybe the example below will give you a clue on how to approach your problem.
Basic idea is to process your nodes twice. First iteration would be pretty much the same as now, and the second iteration only looks for footnotes and only outputs those. You differentiate those iteration by setting "mode" parameter.
Maybe this example will give you a clue how to approach your problem. Note that I used different tags that in your code, so the example would be simpler.
XSLT sheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" />
<xsl:template match="doc">
<xml>
<!-- First iteration - skip footnotes -->
<doc>
<xsl:apply-templates select="text" />
</doc>
<!-- Second iteration, extract all footnotes.
'mode' = footnotes -->
<footnotes>
<xsl:apply-templates select="text" mode="footnotes" />
</footnotes>
</xml>
</xsl:template>
<!-- Note: no mode attribute -->
<xsl:template match="text">
<text>
<xsl:for-each select="p">
<p>
<xsl:value-of select="text()" />
</p>
</xsl:for-each>
</text>
</xsl:template>
<!-- Note: mode = footnotes -->
<xsl:template match="text" mode="footnotes">
<xsl:for-each select=".//footnote">
<footnote>
<xsl:value-of select="text()" />
</footnote>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<doc>
<text>
<p>
some text
<footnote>footnote1</footnote>
</p>
<p>
other text
<footnote>footnote2</footnote>
</p>
</text>
<text>
<p>
some text2
<footnote>footnote3</footnote>
</p>
<p>
other text2
<footnote>footnote4</footnote>
</p>
</text>
</doc>
Output XML:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<!-- Output from first iteration -->
<doc>
<text>
<p>some text</p>
<p>other text</p>
</text>
<text>
<p>some text2</p>
<p>other text2</p>
</text>
</doc>
<!-- Output from second iteration -->
<footnotes>
<footnote>footnote1</footnote>
<footnote>footnote2</footnote>
<footnote>footnote3</footnote>
<footnote>footnote4</footnote>
</footnotes>
</xml>