I am quite new to programming and XSLT: I try to improve the way I ask questions and explain problems, but I still have a long way to go. Sorry if there is something unclear.
I need to detect various alphabets in my XML document, which looks like this, with a lot more different language options.
<text>
<p>Some text. dise´mbər Some text. Some text.</p> <!-- text in International Phonetic Alphabet + English -->
<p>Some text. dise´mbər Some text. Издательство Академии Наук СССР Some text.</p> <!-- text in International Phonetic Alphabet + English + Cyrillic alphabet -->
<p>Some text. Издательство Академии Наук СССР dise´mbər Some text. Some text.</p>
<p>Some text. Some text. Издательство Академии Наук СССР Some text.</p> <!-- text in English + Cyrillic alphabet -->
</text>
What I started to do in XSLT is this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output method="xml" indent="no" encoding="UTF-8" omit-xml-declaration="no" />
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:for-each select="#*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:for-each>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="processing-instruction()">
<xsl:processing-instruction name="{local-name()}"><xsl:apply-templates></xsl:apply-templates></xsl:processing-instruction>
</xsl:template>
<xsl:template name="IPA">
<xsl:variable name="text" ><xsl:copy-of select="."/></xsl:variable>
<xsl:analyze-string select="$text" regex="((\p{{IsIPAExtensions}}|\p{{IsPhoneticExtensions}})+)" >
<xsl:matching-substring>
<IPA><xsl:value-of select="regex-group(1)"/></IPA>
</xsl:matching-substring>
<xsl:non-matching-substring><xsl:copy-of select="."></xsl:copy-of></xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template name="Cyrillic">
<xsl:variable name="texte" ><xsl:call-template name="IPA"></xsl:call-template></xsl:variable>
<xsl:analyze-string select="$texte" regex="(\p{{IsCyrillic}}+)" >
<xsl:matching-substring>
<Cyrillic><xsl:apply-templates select="regex-group(1)"/></Cyrillic>
</xsl:matching-substring>
<xsl:non-matching-substring><xsl:call-template name="IPA"></xsl:call-template></xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template match="text()">
<xsl:call-template name="Cyrillic"></xsl:call-template>
</xsl:template>
</xsl:stylesheet>
So that I could get an XML like this:
<?xml version="1.0" encoding="UTF-8"?><text>
<p>Some text. dise´mb<IPA>ə</IPA>r Some text. Some text.</p>
<p>Some text. dise´mb<IPA>ə</IPA>r Some text. <Cyrillic>Издательство</Cyrillic> <Cyrillic>Академии</Cyrillic> <Cyrillic>Наук</Cyrillic> <Cyrillic>СССР</Cyrillic> Some text.</p>
<p>Some text. <Cyrillic>Издательство</Cyrillic> <Cyrillic>Академии</Cyrillic>
<Cyrillic>Наук</Cyrillic> <Cyrillic>СССР</Cyrillic> dise´mb<IPA>ə</IPA>r Some text. Some text.</p>
<p>Some text. Some text. <Cyrillic>Издательство</Cyrillic> <Cyrillic>Академии</Cyrillic>
<Cyrillic>Наук</Cyrillic> <Cyrillic>СССР</Cyrillic> Some text.</p>
</text>
This is what I needed, however, there is a ten or so regex blocks that I use and the processing time will be quite long if I use this method. What would you do instead? Do you think XSLT is appropriate for this?
Thank you !
Maria
(XSLT 2, Saxon-HE 9.8.0.8)
Edit: here's the profile:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Analysis of Stylesheet Execution Time</title>
</head>
<body>
<h1>Analysis of Stylesheet Execution Time</h1>
<p>Total time: 72128.065 milliseconds</p>
<h2>Time spent in each template, function or global variable:</h2>
<p>The table below is ordered by the total net time spent in the template, function
or global variable. Gross time means the time including called templates and functions
(recursive calls only count from the original entry); net time means time excluding
time spent in called templates and functions.
</p>
<table border="border" cellpadding="10">
<thead>
<tr>
<th>file</th>
<th>line</th>
<th>instruction</th>
<th>count</th>
<th>average time (gross/ms)</th>
<th>total time (gross/ms)</th>
<th>average time (net/ms)</th>
<th>total time (net/ms)</th>
</tr>
</thead>
<tbody>
<tr>
<td> "*code/unicode.xsl" </td>
<td>21</td>
<td>template Greek</td>
<td align="right">2,755,968</td>
<td align="right">0.017</td>
<td align="right">46,854.785</td>
<td align="right">0.017</td>
<td align="right">46,854.785</td>
</tr>
<tr>
<td> "*code/unicode.xsl" </td>
<td>32</td>
<td>template Hebrew</td>
<td align="right">1,329,696</td>
<td align="right">0.043</td>
<td align="right">57,529.163</td>
<td align="right">0.008</td>
<td align="right">10,674.378</td>
</tr>
<tr>
<td> "*code/unicode.xsl" </td>
<td>54</td>
<td>template IPA</td>
<td align="right">333,984</td>
<td align="right">0.206</td>
<td align="right">68,964.076</td>
<td align="right">0.019</td>
<td align="right">6,381.186</td>
</tr>
<tr>
<td> "*code/unicode.xsl" </td>
<td>43</td>
<td>template Cyrillic</td>
<td align="right">665,392</td>
<td align="right">0.094</td>
<td align="right">62,582.890</td>
<td align="right">0.008</td>
<td align="right">5,053.727</td>
</tr>
<tr>
<td> "*code/unicode.xsl" </td>
<td>65</td>
<td>template Arabic</td>
<td align="right">167,068</td>
<td align="right">0.421</td>
<td align="right">70,284.800</td>
<td align="right">0.008</td>
<td align="right">1,320.724</td>
</tr>
<tr>
<td> "*code/unicode.xsl" </td>
<td>76</td>
<td>template Arrows</td>
<td align="right">83,536</td>
<td align="right">0.849</td>
<td align="right">70,945.946</td>
<td align="right">0.008</td>
<td align="right">661.146</td>
</tr>
<tr>
<td> "*code/unicode.xsl" </td>
<td>8</td>
<td>template *</td>
<td align="right">12,122</td>
<td align="right">5.959</td>
<td align="right">72,238.100</td>
<td align="right">0.034</td>
<td align="right">413.937</td>
</tr>
<tr>
<td> "*code/unicode.xsl" </td>
<td>87</td>
<td>template Dingbats</td>
<td align="right">41,768</td>
<td align="right">1.708</td>
<td align="right">71,323.074</td>
<td align="right">0.009</td>
<td align="right">377.128</td>
</tr>
<tr>
<td> "*code/unicode.xsl" </td>
<td>98</td>
<td>template Private</td>
<td align="right">20,884</td>
<td align="right">3.427</td>
<td align="right">71,576.916</td>
<td align="right">0.012</td>
<td align="right">253.842</td>
</tr>
<tr>
<td> "*code/unicode.xsl" </td>
<td>18</td>
<td>template processing-instruction()</td>
<td align="right">6,907</td>
<td align="right">0.014</td>
<td align="right">98.490</td>
<td align="right">0.014</td>
<td align="right">98.490</td>
</tr>
<tr>
<td> "*code/unicode.xsl" </td>
<td>121</td>
<td>template text()</td>
<td align="right">20,884</td>
<td align="right">3.429</td>
<td align="right">71,600.976</td>
<td align="right">0.001</td>
<td align="right">24.060</td>
</tr>
</tbody>
</table>
</body>
</html>
The profile of Martin Honnen's code:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Analysis of Stylesheet Execution Time</title>
</head>
<body>
<h1>Analysis of Stylesheet Execution Time</h1>
<p>Total time: 2900.594 milliseconds</p>
<h2>Time spent in each template, function or global variable:</h2>
<p>The table below is ordered by the total net time spent in the template, function
or global variable. Gross time means the time including called templates and functions
(recursive calls only count from the original entry); net time means time excluding
time spent in called templates and functions.
</p>
<table border="border" cellpadding="10">
<thead>
<tr>
<th>file</th>
<th>line</th>
<th>instruction</th>
<th>count</th>
<th>average time (gross/ms)</th>
<th>total time (gross/ms)</th>
<th>average time (net/ms)</th>
<th>total time (net/ms)</th>
</tr>
</thead>
<tbody>
<tr>
<td> "*code/unicode.xsl" </td>
<td>44</td>
<td>template text()</td>
<td align="right">222,968</td>
<td align="right">0.009</td>
<td align="right">1,949.720</td>
<td align="right">0.009</td>
<td align="right">1,949.720</td>
</tr>
<tr>
<td> "*code/unicode.xsl" </td>
<td>26</td>
<td>template text()</td>
<td align="right">20,884</td>
<td align="right">0.135</td>
<td align="right">2,823.597</td>
<td align="right">0.042</td>
<td align="right">873.877</td>
</tr>
</tbody>
</table>
</body>
</html>
Regular expressions such as \p{IsIPAExtensions} should be reasonably efficient: most of the blocks are a single consecutive range of codepoints and testing a character should simply check whether it is in that range. The cost, I suspect, arises not so much from the cost of checking one character against one Unicode block, but from the number of characters and the number of blocks.
It might be worth getting a profile at the Java level to see where it is spending its time. I can guess, but a profile would reveal if my guess is right.
The thing that can kill performance with regular expressions is backtracking, but I don't immediately see any risk of backtracking with this code.
The only other approach that comes to mind is to generate an enormous translate() call that classifies characters into groups (so all latin characters become "1", all Cyrillic characters become "2", etc) and then to process the result using `<xsl:for-each-group select="string-to-codepoints(.)" group-adjacent=".">. But there's no guarantee that would perform any better, and it's a lot of work to do the experiments to find out.
In XSLT 3, I would consider the following approach:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
exclude-result-prefixes="#all"
version="3.0">
<xsl:param name="scripts"
as="map(xs:string, xs:string)*"
select="map { 'Cyrillic' : '\p{IsCyrillic}+'},
map { 'IPA' : '[\p{IsIPAExtensions}\p{IsPhoneticExtensions}]+' }"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="text()">
<xsl:iterate select="$scripts">
<xsl:param name="input" select="."/>
<xsl:on-completion>
<xsl:sequence select="$input"/>
</xsl:on-completion>
<xsl:next-iteration>
<xsl:with-param name="input">
<xsl:apply-templates select="$input" mode="wrap">
<xsl:with-param name="script-map" tunnel="yes" select="."/>
</xsl:apply-templates>
</xsl:with-param>
</xsl:next-iteration>
</xsl:iterate>
</xsl:template>
<xsl:mode name="wrap" on-no-match="shallow-copy"/>
<xsl:template match="text()" mode="wrap">
<xsl:param name="script-map" tunnel="yes"/>
<xsl:analyze-string select="." regex="{$script-map?*}">
<xsl:matching-substring>
<xsl:element name="{map:keys($script-map)}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
I haven't measured whether it performs better but for the regular expressions I would [\p{IsIPAExtensions}\p{IsPhoneticExtensions}]+ consider to be easier than (\p{IsIPAExtensions}|\p{IsPhoneticExtensions})+.
The other improvements are to rely on the xsl:mode based identity transformation and xsl:iterate.
I have a code that return each row, but how can i filter them based on one of the returned column values?
<xsl:for-each select="//Sqls/Stuff/Row">
<tr>
<td>
<xsl:value-of select="First" />
</td>
<td>
<xsl:value-of select="Second" />
</td>
<td>
<xsl:value-of select="Third" />
</td>
<td>
<xsl:value-of select="Fourth" />
</td>
</tr>
</xsl:for-each>
So how can i make this code return only rows where "Third" equals to "SHOWTHIS"?
Thanks.
Try:
<xsl:for-each select="//Sqls/Stuff/Row[Third='SHOWTHIS']">
You could wrap the tr in an xsl:if block and test the value of //Sqls/Stuff/Row/Third
https://msdn.microsoft.com/en-us/library/ms256209.aspx
I woud like to make some nested loops over my xml doc
Here is my xml
<evolutionlist>
<date id="22_05_2014">
<objet>
<identifier>1VD5-3452-8R5</identifier>
<link>Link1</link>
<title>EXCHANGE OF ELEMENTS</title>
</objet>
<objet>
<identifier>1V24-34A2-8C5</identifier>
<link>Link1</link>
<title>NEW ELEMENT</title>
</objet>
</date>
<date id="21_05_2014">
<identifier>1VV4-34A2-8C5</identifier>
<link>Link2</link>
<title>REPLACE</title>
</date>
</evolutionlist>
Ideally, I woudl like to display something like
22_05_2014
objet1 (with add infos)
objet2 (with add infos)
21_05_2014
objet3 (with add infos)
I made:
<xsl:for-each select="//date">
<xsl:value-of select="#id"/>
<xsl:for-each select="objet">
<tr>
<td>
<xsl:value-of select="identifier"/>
</td>
<td>
<xsl:value-of select="link"/>
</td>
<td>
<xsl:value-of select="title"/>
</td>
</tr>
</xsl:for-each>
</xsl:for-each>
but I got
22_05_2014 21_05_2014
objet1
objet2
objet3
where did I go wrong ?
EDIT
I tried
<xsl:for-each select="./objet">
for the second loop but that did not work either
Shame on me !
I forgot to do this:
<tr>
<td>
<xsl:value-of select="./#id"/>
</td>
</tr>
You were right, Ian Roberts, by encouraging me do write the real code
I use xslt to extract information form a xml file.
the xml file is :
<object>
<identifier>identifier</identifier>
<link>UD</link>
<title>Current Title</tite>
<impact>
<product>The product</product>
<evolution id="Evo 1">
<descr>Current description</descr>
</evolution>
</impact>
</object>
my xlst file is :
<xsl:for-each select="//object">
<tr>
<td>
<xsl:value-of select="identifier"/>
</td>
<td>
<xsl:value-of select="link"/>
</td>
<td>
<xsl:value-of select="title"/>
</td>
<td>
<xsl:value-of select="impact[1]/product"/>
</td>
<td>
<xsl:value-of select="impact[1]/product/evolution[1]/#id"/>
</td>
<td>
<xsl:value-of select="impact[1]/product/evolution[1]/descr"/>
</td>
</tr>
However, I can't seem to get the last two xsl values, I must have made a mistake (the first 4 columns are OK). Could you explain me why ?
Evolution is a sibling, not a child of product - it is a direct child of impact.
<xsl:value-of select="impact[1]/evolution[1]/#id"/>
<xsl:value-of select="impact[1]/evolution[1]/descr"/>
I have a Library object that contains a collection of Books... The Library object has properties like Name, Address, Phone... While the Book object has properties like ISDN, Title, Author, and Price.
XML looks something like this...
<Library>
<Name>Metro Library</Name>
<Address>1 Post Rd. Brooklyn, NY 11218</Address>
<Phone>800 976-7070</Phone>
<Books>
<Book>
<ISDN>123456789</ISDN>
<Title>Fishing with Luke</Title>
<Author>Luke Miller</Author>
<Price>18.99</Price>
</Book>
<Book>
<ISDN>234567890</ISDN>
<Title>Hunting with Paul</Title>
<Author>Paul Worthington</Author>
<Price>28.99</Price>
</Book>
...
And more books
...
</Books>
</Library>
I have a template with space for only 10 per page for example. There can be hundreds of books in the list of Books... So I need to limit the number of books and repeat the template every 10 books.
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<div>
<table>
<tr>
<td>NAME</td>
<td><xsl:value-of select="/Library/Name"/></td>
</tr>
<tr>
<td>ADDRESS</td>
<td><xsl:value-of select="/Library/Address"/></td>
</tr>
<tr>
<td>PHONE</td>
<td><xsl:value-of select="/Library/Phone"/></td>
</tr>
</table>
<table>
<xsl:for-each select="/Library/Books/Book">
<tr>
<td><xsl:value-of select="position()"/></td>
<td><xsl:value-of select="ISDN"/></td>
<td><xsl:value-of select="Title"/></td>
<td><xsl:value-of select="Author"/></td>
<td><xsl:value-of select="Price"/></td>
</tr>
</xsl:for-each>
</table>
</div>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
How can I get the Library information to appear on all repeating pages and add 10 books per page?... First page has Library info with Books 1 thru 10, Second page has Library info with Books 11 thru 20, and so on??
Thanks
For starters, try not to use for-each, apply-templates allows the engine to optimise the order that events are processed.
It appears that you are calling this stylesheet from some other system, so the approach I've taken is to define a pagination param. In the host language, when you call this just change the root parameter. This then allows you to select the require pages in this line here:
Books/Book[($page - 1)*10 < position() and position() <= ($page)*10]
This should do the trick.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:param name="page" select="1"/>
<xsl:template match="/Library">
<html>
<body>
<div>
<table>
<tr>
<td>NAME</td>
<td>
<xsl:value-of select="/Name"/>
</td>
</tr>
<tr>
<td>ADDRESS</td>
<td>
<xsl:value-of select="/Address"/>
</td>
</tr>
<tr>
<td>PHONE</td>
<td>
<xsl:value-of select="/Phone"/>
</td>
</tr>
</table>
<table>
<xsl:apply-templates select="Books/Book[($page - 1)*10 < position() and position() <= ($page)*10]"/>
</table>
</div>
</body>
</html>
</xsl:template>
<xsl:template match="/Book">
<tr>
<td>
<xsl:value-of select="position()"/>
</td>
<td>
<xsl:value-of select="ISDN"/>
</td>
<td>
<xsl:value-of select="Title"/>
</td>
<td>
<xsl:value-of select="Author"/>
</td>
<td>
<xsl:value-of select="Price"/>
</td>
</tr>
</xsl:template>
</xsl:stylesheet>