How to make a range from section attribute "specific-use" values - XSLT

I am trying to collapse the specific-use attribute values of sections into number ranges inside a single p element. I group the <span style="case"> elements with group-by and fetch the value from ancestor::sec[1]. When the same <span style="case"> value continues through descendant:: or following-sibling:: sections, is it possible to output a range that ends with the last specific-use value?
Input XML
<book>
<sec disp-level="1" specific-use="1">
<p>aaaaaa</p>
<sec disp-level="2" specific-use="2">
<p><span style="case">super</span></p>
</sec>
<sec disp-level="2" specific-use="3">
<blockquote>
<p><span style="case">super</span></p>
</blockquote>
</sec>
<sec disp-level="2" specific-use="4">
<p><span style="case">super</span></p>
</sec>
</sec>
<sec disp-level="1" specific-use="5">
<p><span style="case">super</span>, <span style="case">active</span></p>
<sec disp-level="2" specific-use="6">
<p><span style="case">active</span></p>
</sec>
</sec>
<sec disp-level="1" specific-use="7">
<p><span style="case">active</span></p>
<sec disp-level="2" specific-use="8">
<p><span style="case">active</span></p>
</sec>
<sec disp-level="2" specific-use="9">
<p><span style="case">super</span></p>
</sec>
<sec disp-level="2" specific-use="10">
<p><span style="case">active</span></p>
</sec>
</sec>
</book>
XSLT Code
<xsl:template match="book">
<xsl:copy>
<xsl:for-each-group select="//*" group-by="span[@style='case']">
<xsl:text>
</xsl:text>
<p>
<case><xsl:value-of select="current-grouping-key()"/></case><xsl:text> </xsl:text>
<sec>
<xsl:for-each select="current-group()">
<number>
<xsl:value-of select="ancestor::sec[1]/@specific-use"/>
</number>
<xsl:if test="position()!=last()">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:for-each>
</sec>
</p>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
Output of my XSLT: https://xsltfiddle.liberty-development.net/3NSTbfj/7
My Expected Output
<book>
<p><case>super</case> <sec><number>2</number>–<number>4</number>, <number>5</number>, <number>9</number></sec></p>
<p><case>active</case> <sec><number>5</number>–<number>8</number>, <number>10</number></sec></p>
</book>
For Example:
super 2–4, 5, 9
active 5–8, 10
2–4: a range is created because <span style="case">super</span> continues with the same value through child:: or following-sibling:: sections.
5–8: a range is created because <span style="case">active</span> continues with the same value through child:: or following-sibling:: sections.

Related

XSLT : How to split strings of text into phrases or words

I apologize in advance, my knowledge of XSL transformations is poor.
I have created a xml document with paragraphs of text. These paragraphs contain other elements like <em>, <title>, <pb>...
for example :
<document>
<body xmlns:epub="http://www.idpf.org/2007/ops" epub:type="bodymatter">
<chap>
<tit>Une adorable petite dévoreuse de livres</tit>
<p>A <em>trois ans</em>, Matilda avait appris toute seule à lire en s'exerçant avec les journaux et les magazines qui traî<pb ed="original" n="14">14</pb>naient à la maison. A quatre ans, elle lisait couramment et, tout naturellement, se mit à rêver de livres. Le seul disponible dans ce foyer de haute culture, <title type="oeuvre">La Cuisine pour tous</title>, appartenait à sa mère et, lorsqu'elle l'eut épluché de la première page à la dernière et appris toutes les recettes par cœur, elle décida de se lancer dans des lectures plus intéressantes. </p>
</chap>
</document>
I need to split my text into sentences, adding an element <span class="sentence"> around them.
And then I need, for each sentence, to add an element <span class="word"> around each word, taking into account the other elements already there.
so I need to obtain something like :
<p><span class="sentence">A <em>trois ans</em>, Matilda avait appris toute seule à lire en s'exerçant avec les journaux et les magazines qui traî<pb ed="original" n="14">14</pb>naient à la maison.</span> ...</p>
and then :
<p><span class="sentence"><span class="word">A</span> <em><span class="word">trois</span> <span class="word">ans</span></em>, <span class="word">Matilda</span> <span class="word">avait</span> <span class="word">appris</span> <span class="word">toute</span> <span class="word">seule</span> <span class="word">à</span> <span class="word">lire</span> <span class="word">en</span> <span class="word">s'exerçant</span> <span class="word">avec</span> <span class="word">les</span> <span class="word">journaux</span> <span class="word">et</span> <span class="word">les</span> <span class="word">magazines</span> <span class="word">qui</span> <span class="word">traî<pb ed="original" n="14">14</pb>naient</span> <span class="word">à</span> <span class="word">la</span> <span class="word">maison</span>.</span> ...</p>
As you can see, sometimes I need the elements to be inside other elements (inside the <em> because there are multiple words inside) and other times I don't take elements into account (<pb ed="original" n="14">14</pb> won't appear, it's only there to refer to the location of the pages on the printed version)
Is this kind of splitting possible ?
I thank you for any help you could offer me.
I have tried to first wrap sentences and then to wrap anything not being whitespace as a word, using XSLT 3:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:param name="sentence-markers" as="xs:string" select="'[!?\.]'"/>
<xsl:output method="html"/>
<xsl:template match="p">
<xsl:variable name="mark-sentence-ends">
<xsl:apply-templates mode="sentence"/>
</xsl:variable>
<p>
<xsl:for-each-group select="$mark-sentence-ends/node()" group-ending-with="eos">
<span class="sentence">
<xsl:apply-templates select="current-group()" mode="wrap-words"/>
</span>
</xsl:for-each-group>
</p>
</xsl:template>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:mode name="wrap-words" on-no-match="shallow-copy"/>
<xsl:template mode="wrap-words" match="text()">
<xsl:analyze-string select="." regex="\s+">
<xsl:matching-substring>
<xsl:value-of select="."/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<span class="word">{.}</span>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template mode="wrap-words" match="pb"/>
<xsl:template mode="wrap-words" match="title">
<span class="{@type}">
<xsl:apply-templates mode="#current"/>
</span>
</xsl:template>
<xsl:template mode="wrap-words" match="eos">
<xsl:apply-templates/>
</xsl:template>
<xsl:mode name="sentence" on-no-match="shallow-copy"/>
<xsl:template mode="sentence" match="text()">
<xsl:analyze-string select="." regex="{$sentence-markers}">
<xsl:matching-substring>
<eos>{.}</eos>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template match="document">
<html>
<head>
<title>Example</title>
</head>
<xsl:apply-templates/>
</html>
</xsl:template>
<xsl:template match="body">
<body>
<xsl:apply-templates/>
</body>
</xsl:template>
<xsl:template match="tit">
<h1>
<xsl:apply-templates/>
</h1>
</xsl:template>
<xsl:template match="chap">
<section>
<xsl:apply-templates/>
</section>
</xsl:template>
</xsl:stylesheet>
Here's something you could possibly use as your starting point. It uses the \w "word character" character class to split the text into individual words (as mentioned in comments, XSLT's regex has no support for the \b word boundary anchor).
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="no"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p//text()">
<xsl:analyze-string select="." regex="\w+">
<xsl:matching-substring>
<span class="word">
<xsl:value-of select="." />
</span>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="." />
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
Given your example (after correcting it for well-formedness!), this will replace the p element with:
<p><span class="word">A</span> <em><span class="word">trois</span> <span class="word">ans</span></em>, <span class="word">Matilda</span> <span class="word">avait</span> <span class="word">appris</span> <span class="word">toute</span> <span class="word">seule</span> <span class="word">à</span> <span class="word">lire</span> <span class="word">en</span> <span class="word">s</span>'<span class="word">exerçant</span> <span class="word">avec</span> <span class="word">les</span> <span class="word">journaux</span> <span class="word">et</span> <span class="word">les</span> <span class="word">magazines</span> <span class="word">qui</span> <span class="word">traî</span><pb ed="original" n="14"><span class="word">14</span></pb><span class="word">naient</span> <span class="word">à</span> <span class="word">la</span> <span class="word">maison</span>. <span class="word">A</span> <span class="word">quatre</span> <span class="word">ans</span>, <span class="word">elle</span> <span class="word">lisait</span> <span class="word">couramment</span> <span class="word">et</span>, <span class="word">tout</span> <span class="word">naturellement</span>, <span class="word">se</span> <span class="word">mit</span> <span class="word">à</span> <span class="word">rêver</span> <span class="word">de</span> <span class="word">livres</span>. 
<span class="word">Le</span> <span class="word">seul</span> <span class="word">disponible</span> <span class="word">dans</span> <span class="word">ce</span> <span class="word">foyer</span> <span class="word">de</span> <span class="word">haute</span> <span class="word">culture</span>, <title type="oeuvre"><span class="word">La</span> <span class="word">Cuisine</span> <span class="word">pour</span> <span class="word">tous</span></title>, <span class="word">appartenait</span> <span class="word">à</span> <span class="word">sa</span> <span class="word">mère</span> <span class="word">et</span>, <span class="word">lorsqu</span>'<span class="word">elle</span> <span class="word">l</span>'<span class="word">eut</span> <span class="word">épluché</span> <span class="word">de</span> <span class="word">la</span> <span class="word">première</span> <span class="word">page</span> <span class="word">à</span> <span class="word">la</span> <span class="word">dernière</span> <span class="word">et</span> <span class="word">appris</span> <span class="word">toutes</span> <span class="word">les</span> <span class="word">recettes</span> <span class="word">par</span> <span class="word">cœur</span>, <span class="word">elle</span> <span class="word">décida</span> <span class="word">de</span> <span class="word">se</span> <span class="word">lancer</span> <span class="word">dans</span> <span class="word">des</span> <span class="word">lectures</span> <span class="word">plus</span> <span class="word">intéressantes</span>. </p>
Demo: https://xsltfiddle.liberty-development.net/93wniUm
(Change the indent to "yes" to better observe the splitting action.)
If this provides a satisfactory output, you could move to the next step of creating the sentence wrappers. However, I suspect that you will need a human eye to go over the result and make corrections where necessary, and - as I already mentioned in the comments - you may have to put in a lot more work if a sentence or a word boundary can be within another element.
--- added ---
To prevent an apostrophe from breaking up a word, try:
<xsl:analyze-string select="." regex="[\w']+">
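For context, this is roughly how that instruction would sit in the text template from the stylesheet above. It is only a sketch: it assumes the context node is the text node itself (hence select="."), and it only covers the straight apostrophe ('), not a typographic one.

```xml
<!-- Sketch: same template as in the stylesheet above, with the apostrophe
     added to the character class so that s'exerçant stays one word. -->
<xsl:template match="p//text()">
  <xsl:analyze-string select="." regex="[\w']+">
    <xsl:matching-substring>
      <!-- a run of word characters and/or apostrophes -->
      <span class="word">
        <xsl:value-of select="."/>
      </span>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <!-- whitespace and punctuation are copied through unchanged -->
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
```

Note that a word split by an inline element such as pb is still wrapped in two pieces, because each text node is processed separately.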

Use for each in xslt

Input :
<lq>
<ol class="- topic/ol ">
<li>Text 1
<ol class="- topic/ol ">
<li>Text 2</li>
<li>Text 3</li>
</ol>
</li>
<li>Text 4</li>
<li>Text 5</li>
</ol>
</lq>
Output should be:
<node>
<p type="extract_number_1">Text 1</p>
<p type="extract_number_2">Text 2</p>
<p type="extract_number_2">Text 3</p>
<p type="extract_number_1">Text 4</p>
<p type="extract_number_1">Text 5</p>
</node>
Tried code :
<xsl:template match="lq/ol">
<xsl:for-each select="li">
<p type="extract_number_{position()}">
<xsl:apply-templates/>
</p>
</xsl:for-each>
</xsl:template>
Above are my input, the expected output, and the code I tried. The @type of each <p> should reflect the position of its <li>.
My code does not produce the expected output. I am using XSLT 2.0. How can I solve this? Thank you.
If you want the number in the type attribute to show the nesting level then use e.g.
<xsl:template match="lq/ol">
<xsl:apply-templates select=".//li"/>
</xsl:template>
<xsl:template match="lq/ol//li">
<p type="extract_number_{count(ancestor::ol)}">
<xsl:value-of select="text()"/>
</p>
</xsl:template>
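A complete stylesheet along the same lines might look like the sketch below. Two things in it are assumptions rather than part of the answer above: the <node> wrapper is created from a template matching lq, and normalize-space(text()[1]) is used instead of text(), because in XSLT 2.0 value-of select="text()" outputs every text child of the li, including the whitespace around a nested ol.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Emit one <node> wrapper and process every li, at any depth,
       in document order. -->
  <xsl:template match="lq">
    <node>
      <xsl:apply-templates select="ol//li"/>
    </node>
  </xsl:template>

  <!-- count(ancestor::ol) is the nesting depth of the current li:
       1 for the outer list, 2 for the list nested inside it. -->
  <xsl:template match="li">
    <p type="extract_number_{count(ancestor::ol)}">
      <!-- only the li's own leading text, trimmed -->
      <xsl:value-of select="normalize-space(text()[1])"/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

With the input from the question this should give the five p elements in the order Text 1 to Text 5, with depth 2 for Text 2 and Text 3.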

xslt: XPath select elements with specific attribute value

Input:
<list list-type="simple" specific-use="front">
<list-item><p>Preface <xref rid="b-9781783084944-FM-001" ref-type="sec">00</xref></p></list-item>
<list-item><p>Series Title <xref rid="b-9781783084944-FM-003" ref-type="sec">00</xref></p></list-item>
<list-item><p>Dedication</p></list-item>
<list-item><p>Acknowledgments <xref rid="b-9781783084944-FM-005" ref-type="sec">00</xref></p></list-item>
<list-item><p>Contributors <xref rid="b-9781783084944-FM-006" ref-type="sec">00</xref></p></list-item>
<list-item><p>Glossary <xref rid="b-9781783084944-FM-008" ref-type="sec">00</xref></p></list-item>
</list>
I need output like below:
<div class="pagebreak" id="b-9781783084944-FM-002">
<h2 class="PET">CONTENTS</h2>
<div class="TocPrelims">Preface</div>
<div class="TocPrelims">Series Title </div>
</div>
My xslt:
<xsl:template match="list[@specific-use='front'][@list-type='simple']/list-item/p">
<div class="TocPrelims">
<a>
<xsl:attribute name="href">
<xsl:text>#toc</xsl:text>
<xsl:copy-of select="//list[@specific-use='front'][@list-type='simple']/list-item/p/xref[@rid]"/>
</xsl:attribute>
<xsl:apply-templates/>
</a>
</div>
</xsl:template>
My code above is not correct. Please give suggestions.
You have a problem with this line:
<xsl:copy-of select="//list[@specific-use='front'][@list-type='simple']/list-item/p/xref[@rid]"/>
Firstly, the expression selects all xref elements in the document, but you only need the one for the current p you are positioned on. Secondly, it selects the xref element if it has a rid attribute, but you actually want to select the rid attribute itself. You also really want to use xsl:value-of here:
<xsl:value-of select="xref/@rid"/>
Try this template instead:
<xsl:template match="list[@specific-use='front'][@list-type='simple']/list-item/p">
<div class="TocPrelims">
<a>
<xsl:attribute name="href">
<xsl:text>#toc</xsl:text>
<xsl:value-of select="xref/@rid"/>
</xsl:attribute>
<xsl:value-of select="text()[1]" />
</a>
</div>
</xsl:template>
In fact, you can make use of Attribute Value Templates to simplify it to this:
<xsl:template match="list[@specific-use='front'][@list-type='simple']/list-item/p">
<div class="TocPrelims">
<a href="#toc{xref/@rid}">
<xsl:value-of select="text()[1]" />
</a>
</div>
</xsl:template>
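One case the templates above do not cover: the Dedication item in the input has no xref, so its href would come out as just #toc. A minimal sketch, assuming such entries should be written as plain text without a link (the priority value is an assumption, added only so that this rule wins over the general one, which would otherwise have the same default priority):

```xml
<!-- Sketch: list items without an xref/@rid get no <a> wrapper. -->
<xsl:template priority="1"
    match="list[@specific-use='front'][@list-type='simple']/list-item/p[not(xref/@rid)]">
  <div class="TocPrelims">
    <xsl:value-of select="normalize-space(text()[1])"/>
  </div>
</xsl:template>
```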

Using Xsl-Key and generate-id() function

I would like to associate the first IMG with the first ATTACHED_FILENAME and
the second IMG with the second ATTACHED_FILENAME.
This is my XML:
<INSTRUCTION_LIST_ITEM>
<NTC_SD_INSTRUCT>
<ACTION>Sostituire</ACTION>
<PLACEMENT>le righe 10 ÷ 18 con:</PLACEMENT>
<DESCRIPTION>
<P>Il porto è protetto da un molo foraneo.</P>
<P>
<IMG border="0" hspace="0" alt="" align="baseline" src="C:\Users\l_sturla\Desktop\albany.jpg"/>
</P>
<P>Ben visibile da nord è il faro della Vittoria.</P>
<P>
<IMG border="0" hspace="0" alt="" align="baseline" src="C:\Users\l_sturla\Desktop\Faro vittoria.JPG"/>
</P>
<P> </P>
<P>Mantenersi a distanza di sicurezza.</P>
</DESCRIPTION>
<ATTACHMENT_LIST>
<ATTACHMENT>
<ATTACHED_FILENAME>albany.jpg</ATTACHED_FILENAME>
</ATTACHMENT>
<ATTACHMENT>
<ATTACHED_FILENAME>Faro vittoria.JPG</ATTACHED_FILENAME>
</ATTACHMENT>
</ATTACHMENT_LIST>
</NTC_SD_INSTRUCT>
</INSTRUCTION_LIST_ITEM>
I created this XSLT:
<xsl:template match="//IMG">
<span style="font-style:italic">
<xsl:choose>
<xsl:when test="count(ancestor::DESCRIPTION//IMG) = count(ancestor::DESCRIPTION/following-sibling::ATTACHMENT_LIST/ATTACHMENT/ATTACHED_FILENAME)">
<img>
<xsl:attribute name="src">
<xsl:value-of select="ancestor::NTC_SD_INSTRUCT/ATTACHMENT_LIST/ATTACHMENT/ATTACHED_FILENAME"/>
</xsl:attribute>
</img>
</xsl:when>
</xsl:choose>
</span>
</xsl:template>
But this always gives the first image. The content of the ATTACHED_FILENAME element should become the value of the src attribute.
Try
<xsl:template match="IMG">
<xsl:variable name="counter">
<xsl:number level="any" from="DESCRIPTION"/>
</xsl:variable>
<img src="{(//ATTACHED_FILENAME)[number($counter)]}"/>
</xsl:template>
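The (//ATTACHED_FILENAME) lookup is document-wide, so if the input contains more than one NTC_SD_INSTRUCT the counter would index into the wrong list. A hedged variant, assuming each IMG should only see the attachments of its own NTC_SD_INSTRUCT:

```xml
<xsl:template match="IMG">
  <!-- 1 for the first IMG inside the current DESCRIPTION, 2 for the second, ... -->
  <xsl:variable name="counter">
    <xsl:number level="any" from="DESCRIPTION"/>
  </xsl:variable>
  <!-- pick the ATTACHMENT at the same position within the enclosing instruction -->
  <img src="{ancestor::NTC_SD_INSTRUCT/ATTACHMENT_LIST/ATTACHMENT[number($counter)]/ATTACHED_FILENAME}"/>
</xsl:template>
```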
If you define a key <xsl:key name="attachment-by-pos" match="NTC_SD_INSTRUCT/ATTACHMENT_LIST/ATTACHMENT" use="concat(generate-id(ancestor::NTC_SD_INSTRUCT), '|', count(preceding-sibling::ATTACHMENT))"/>, then you can use a template
<xsl:template match="IMG">
<span style="font-style:italic">
<img src="{key('attachment-by-pos', concat(generate-id(ancestor::NTC_SD_INSTRUCT), '|', count(../preceding-sibling::*//IMG)))/ATTACHED_FILENAME}"/>
</span>
</xsl:template>

XSLT 1.0 Grouping on multiple values on multiple levels

edited in response to comments *
Hello,
I am an XSLT noob and need some help. I am trying to do a filter/group combination with XSLT 1.0 (can't use XSLT 2.0 for this application).
Here is an example of the xml
<entry>
<item>
<name>Widget 2</name>
<rank>2</rank>
<types>
<type>Wood</type>
<type>Fixed</type>
<type>Old</type>
</types>
</item>
<item>
<name>Widget 1</name>
<rank>2</rank>
<types>
<type>Metal</type>
<type>Broken</type>
<type>Old</type>
</types>
</item>
<item>
<name>Widget 3</name>
<rank>1</rank>
<types>
<type>Metal</type>
<type>New</type>
</types>
</item>
</entry>
Now what I want to do is output html where I get a subset of the XML based on <type> and then group on rank. For example, if the user selects all items with the type Metal, the output should be:
<p class="nospace"><font color="#800000">
<b>Rank 1</b></font></p>
<li id="mylist"><b>Widget 3</b></li>
<br\>
<p class="nospace"><font color="#800000">
<b>Rank 2</b></font></p>
<li id="mylist"><b>Widget 1</b></li>
<br\>
or if the user chooses the type Old the output would be
<p class="nospace"><font color="#800000">
<b>Rank 2</b></font></p>
<li id="mylist"><b>Widget 1</b></li>
<li id="mylist"><b>Widget 2</b></li>
<br\>
I can group using keys on rank alone easily enough, but trying to do both is not working. Here is a sample of the xslt I have tried:
<xsl:param name="typeParam"/>
<xsl:key name="byRank" use="rank" match="item"/>
<xsl:for-each select="item[count(.|key('byRank',rank)[1])=1]">
<xsl:sort data-type="number" select="rank"/>
<xsl:for-each select="key('byRank',rank)">
<xsl:sort select="name"/>
<xsl:if test="count(rank)>0">
<p class="nospace"><font color="#800000"><b>Rank<xsl:value-of select="rank"/></b></font></p>
<xsl:for-each select="types[types=$typeParam]">
<li id="mylist"><b><xsl:value-of select="../name"/></b></li>
</xsl:for-each>
<br/>
</xsl:if>
</xsl:for-each>
</xsl:for-each>
The result I get from this is I do indeed get the subset of my xml that I want but it also displays all of the various rank values. I want to limit it to just the ranks of the type that is specified in $typeParam.
I have tried moving the for-each statement to earlier in the code as well as modifying the if statement to select for $typeParam but neither works. I have also tried concat-ing my key with rank and type but that doesn't seem to work either (It only works if the type in $typeParam is the first child under types).
Thanks
jeff
This stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="kItemByRank" match="item" use="rank"/>
<xsl:param name="pType" select="'Metal'"/>
<xsl:template match="entry">
<xsl:for-each select="item[count(.|key('kItemByRank',rank)[1])=1]">
<xsl:sort select="rank" data-type="number"/>
<xsl:variable name="vGroup" select="key('kItemByRank',rank)[
types/type = $pType
]"/>
<xsl:if test="$vGroup">
<p class="nospace">
<font color="#800000">
<b>
<xsl:value-of select="concat('Rank ',rank)"/>
</b>
</font>
</p>
<xsl:apply-templates select="$vGroup">
<xsl:sort select="name"/>
</xsl:apply-templates>
<br/>
</xsl:if>
</xsl:for-each>
</xsl:template>
<xsl:template match="item">
<li id="mylist">
<b>
<xsl:value-of select="name"/>
</b>
</li>
</xsl:template>
</xsl:stylesheet>
Output:
<p class="nospace">
<font color="#800000">
<b>Rank 1</b>
</font>
</p>
<li id="mylist">
<b>Widget 3</b>
</li>
<br />
<p class="nospace">
<font color="#800000">
<b>Rank 2</b>
</font>
</p>
<li id="mylist">
<b>Widget 1</b>
</li>
<br />
And with pType param set to 'Old', output:
<p class="nospace">
<font color="#800000">
<b>Rank 2</b>
</font>
</p>
<li id="mylist">
<b>Widget 1</b>
</li>
<li id="mylist">
<b>Widget 2</b>
</li>
<br />