XSLT for-each, iterating through text and nodes - xslt

I'm still learning XSLT, and have a question about the for-each loop.
Here's the XML I have:
<body>Here is a great URL<link>http://www.somesite.com</link>Some More Text</body>
What I'd like is for the for-each loop to iterate through these chunks:
1. Here is a great URL
2. http://www.somesite.com
3. Some More Text
This might be simple, or impossible, but if anyone can help me out I'd appreciate it!
Thanks,
Michael

You should be able to do so with something like the following:
<xsl:for-each select=".//text()">
<!-- . will have the value of each chunk of text. -->
<someText>
<xsl:value-of select="." />
</someText>
</xsl:for-each>
Or this may be preferable, because it gives you a single template that you can invoke from several different places:
<xsl:apply-templates select=".//text()" mode="processText" />
<xsl:template match="text()" mode="processText">
<!-- . will have the value of each chunk of text. -->
<someText>
<xsl:value-of select="." />
</someText>
</xsl:template>
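For completeness, here is a minimal self-contained sketch of the for-each approach, assuming the <body> element from the question is the document root. The chunks and chunk element names are made up for illustration. If the real input is pretty-printed, whitespace-only text nodes would be selected too; a predicate such as text()[normalize-space()] is one way to skip them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <!-- Assumes <body> is the root element, as in the question's sample. -->
  <xsl:template match="/body">
    <!-- "chunks" and "chunk" are hypothetical output element names. -->
    <chunks>
      <!-- .//text() selects every descendant text node in document order,
           including the text inside <link>. -->
      <xsl:for-each select=".//text()">
        <chunk index="{position()}">
          <xsl:value-of select="." />
        </chunk>
      </xsl:for-each>
    </chunks>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this should yield three chunk elements, one for each piece of text, in the order listed in the question.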

Related

Exclude first element of a certain type when doing apply-templates

This is my source XML:
<DEFINITION>
<DEFINEDTERM>criminal proceeding</DEFINEDTERM>
<TEXT> means a prosecution for an offence and includes –</TEXT>
<PARAGRAPH>
<TEXT>a proceeding for the committal of a person for trial or sentence for an offence; and</TEXT>
</PARAGRAPH>
<PARAGRAPH>
<TEXT>a proceeding relating to bail –</TEXT>
</PARAGRAPH>
<TEXT>but does not include a prosecution that is a prescribed taxation offence within the meaning of Part III of the Taxation Administration Act 1953 of the Commonwealth;</TEXT>
</DEFINITION>
This is my XSL:
<xsl:template name="DEFINITION" match="DEFINITION">
<xsl:element name="body">
<xsl:attribute name="break">before</xsl:attribute>
<xsl:element name="defn">
<xsl:attribute name="id" />
<xsl:attribute name="scope" />
<xsl:value-of select="DEFINEDTERM" />
</xsl:element>
<xsl:element name="text">
<xsl:value-of select="replace(TEXT[1],'–','--')" />
</xsl:element>
</xsl:element>
<xsl:apply-templates select="*[not(self::TEXT[1])]" />
</xsl:template>
As per my XSL, I want to do something with the DEFINEDTERM element and the TEXT element that immediately follows it.
Then I want to apply-templates to the rest of the elements, except for the DEFINEDTERM and TEXT element that have already been dealt with. Most importantly, I don't want to apply templates to the first TEXT element.
How do I achieve this? My XSL above does not work.
I have other templates for TEXT and PARAGRAPH, but not DEFINEDTERM. I have <xsl:template match="*|@*" /> at the top of the XSL.
You did not post the expected result nor a minimal reproducible example, so I can only guess you want to do:
<xsl:template match="DEFINITION">
<body break="before">
<defn id="" scope="">
<xsl:value-of select="DEFINEDTERM" />
</defn>
<text>
<xsl:value-of select="replace(DEFINEDTERM/following-sibling::TEXT[1],'–','--')" />
</text>
</body>
<xsl:apply-templates select="* except (DEFINEDTERM | DEFINEDTERM/following-sibling::TEXT[1])" />
</xsl:template>
At least that's how I understand:
"I want to do something with the DEFINEDTERM element and the TEXT element that immediately follows it."
This is assuming you are using XSLT 2.0 or higher (otherwise you would not be able to use the replace() function).
--
P.S. You might want to make this a bit more efficient by defining DEFINEDTERM/following-sibling::TEXT[1] as a variable first, then referring to the variable instead.
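A sketch of the variable-based version mentioned in the P.S., assuming XSLT 2.0 and a single DEFINEDTERM per DEFINITION. The variable name firstText is made up here, and the templates for TEXT and PARAGRAPH that the question mentions are assumed to exist elsewhere in the stylesheet.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="DEFINITION">
    <!-- Hypothetical variable name: the first TEXT sibling after DEFINEDTERM. -->
    <xsl:variable name="firstText" select="DEFINEDTERM/following-sibling::TEXT[1]" />
    <body break="before">
      <defn id="" scope="">
        <xsl:value-of select="DEFINEDTERM" />
      </defn>
      <text>
        <xsl:value-of select="replace($firstText, '–', '--')" />
      </text>
    </body>
    <!-- Process every child except the two nodes already handled above. -->
    <xsl:apply-templates select="* except (DEFINEDTERM | $firstText)" />
  </xsl:template>
</xsl:stylesheet>
```

The variable holds the node itself, not a copy, so the except expression still removes the original TEXT element from the selection.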

optimization of XSLT code

While searching for some profiling tools for XSLT, I came across this post. Since a lot of people there suggested to just post the code and offered to give feedback on that, I was wondering if anyone could give me some feedback on mine. I tried this (http://www.saxonica.com/documentation/#!using-xsl/performanceanalysis), but the output html is not very detailed.
I'm new to XSLT and usually work with python/perl, where regex support is much better (however, I won't rule out the possibility that it's just my very basic understanding of XSLT). For the purpose of this project however, I had to work with XSLT. It could be that I'm forcing it to do things in a very unnatural way. Any comments are welcome, on performance in particular, but also on anything else, as I'd like to learn!
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions">
<xsl:template name="my_terms">
<xsl:variable name="excludes" select="not (codeblock or draft-comment or filepath or shortdesc or uicontrol or varname)"/>
<!-- leftover example of how to work with excludes var -->
<!--<xsl:if test=".//*[$excludes]/text()[contains(.,'access management console')]"><li class="prodterm"><b>PB QA:access management console should be "AppCenter"</b></li></xsl:if>-->
<!-- Loop through all sentences and check for deprecated stuff -->
<xsl:for-each select=".//*[$excludes]/text()">
<xsl:variable name="sentenceList" select="tokenize(., '[\.!\?:;]\s+')"/>
<xsl:variable name="segment" select="."/>
<!-- main sentence loop -->
<xsl:for-each select="$sentenceList">
<xsl:variable name="sentence" select="."/>
<!-- very rudimentary sentence length check -->
<xsl:if test="count(tokenize(., '\W+')) > 30"> <li class="prodterm"><b>Sentence too long:</b> <xsl:value-of select="."/></li></xsl:if>
<!-- efforts to flag the shady case of the gerund -->
<xsl:if test="matches(., '\w+ \w+ing (the|a)')">
<!-- some extra checks to weed out the false positives -->
<xsl:if test="not(matches(., '\b(on|about|for|before|while|when|after|by|a|the|an|some|all|every) \w+ing (the|a)', '!i')) and not(matches(., 'during'))">
<li class="prodterm"><b>Possible unclear usage of gerund. If so, consider rewriting:</b> <xsl:value-of select="."/></li>
</xsl:if>
</xsl:if>
<!-- comma's after certain starting phrases -->
<xsl:if test="matches(., '^\s*Therefore[^,]')"><li class="prodterm"><b>Use a comma after starting a sentence with 'Therefore':</b> <xsl:value-of select="."/></li></xsl:if>
<xsl:if test="matches(., '^\s*(If you|Before|When)[^,]+$')"><li class="prodterm"><b>Use a comma after starting a sentence with 'Before', 'If you' or 'When':</b> <xsl:value-of select="."/></li></xsl:if>
<!-- experimenting with phrasal verbs (if there are a lot of verbs in phrasalVerbs.xml, it will be better to have this as the main loop (and do it outside the sentence loop)) -->
<xsl:for-each select="document('phrasalVerbs.xml')/verbs/verb[matches($sentence, concat('.* ', ./@text, ' .*'))]">
<xsl:variable name="verbPart" select="."/>
<xsl:for-each select="$verbPart/particles/particle/@text[matches($sentence, .) and not(matches($sentence, concat($verbPart/@text, ' ', .)))]">
<xsl:variable name="particle" select="."/>
<li class="prodterm"><b>Separated phrasal verb found in:</b> <xsl:value-of select="$sentence"/></li>
</xsl:for-each>
</xsl:for-each>
<!-- checking if conditionals (should be followed by then) -->
<xsl:if test="matches($sentence, '^\s*If\b', '!i') and not(matches($sentence, '\bthen\b', '!i'))"><li class="prodterm"><b>Conditional If found, but no then:</b> <xsl:value-of select="."/></li></xsl:if>
<!-- very dodgy way of detecting passive voice -->
<!--<xsl:if test="matches($sentence, '\b(are|can be|must be) \w+ed\b', '!i')"><li class="prodterm"><b>PB QA:Possible passive voice. If so, consider using active voice for:</b> <xsl:value-of select="."/></li></xsl:if>-->
<xsl:for-each select='document("generalDeprecatedTermsAndPhrases.xml")/terms/dt'>
<xsl:variable name="pattern" select="./@pattern"/>
<xsl:variable name="message" select="./@message"/>
<xsl:variable name="regexFlag" select="./@regexFlag"/>
<!-- <xsl:if test="matches($sentence, $pattern, $regexFlag)"> -->
<xsl:if test="matches($sentence, concat('(^|\W)', $pattern, '($|\W)'), $regexFlag)"> <!-- This is the work around for not being able to use \b when variable is passed on inside matches() -->
<li class="prodterm"><b><xsl:value-of select="$message"/> in: </b> <xsl:value-of select="$sentence"/> </li>
</xsl:if>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
To get an idea, the stripped down version of my "generalDeprecatedTermsAndPhrases.xml" looks like this:
<terms>
<dt pattern='to be able to' message="Use 'to' instead of 'to be able to'" regexFlag="i"></dt>
</terms>
The reason that Saxon's profile is not very detailed is that your code is so monolithic: it's all in one great big template rule.
However, being monolithic isn't by itself the cause of any performance problems.
First observation is a functionality problem: your variable
<xsl:variable name="excludes" select="not (codeblock or draft-comment or filepath or shortdesc or uicontrol or varname)"/>
doesn't do what you think. It's evaluated with the root document node as the context item, and its value is a boolean which is true if the outermost element has a name which is not one of those listed. So I think your xsl:for-each that uses [$excludes] as a predicate is applying to all elements, whereas I suspect you intended it to apply to selected elements. I don't know how much that affects the performance.
The main influence on performance will be the cost of evaluating the regular expressions. The best way to find out which ones are causing the problem is to measure the impact of removing them one-by-one. When you've isolated the problem, there may be a way of rewriting the regular expression to make it perform better (e.g. by making it avoid backtracking).
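On the $excludes point, one possible rewrite is to put the test directly in the predicate, so that it is evaluated once per candidate element rather than once for the whole variable. This is only a sketch and guesses at the intent: it skips text whose parent is one of the listed elements. If text nested deeper inside those elements should be skipped as well, ancestor-or-self:: would be needed instead of self::. The loop body is a placeholder.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template name="my_terms">
    <!-- The condition sits inside the predicate, so each element is tested
         against the list of excluded names individually. -->
    <xsl:for-each select=".//*[not(self::codeblock or self::draft-comment or self::filepath
                                or self::shortdesc or self::uicontrol or self::varname)]/text()">
      <!-- Placeholder body; the original sentence checks would go here. -->
      <li class="prodterm"><xsl:value-of select="." /></li>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```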

Using XSL Choose to Substitute Empty String

I'm trying to substitute an empty value in a csv file with a number.
Here's an example:
1111111,,11222
So I tried this:
<xsl:template match="/">
<xsl:apply-templates select="//tr" />
</xsl:template>
<xsl:template match="tr">
<document>
<content name="title">
<xsl:value-of select="td[1]/text()" />
</content>
<content name="loanID">
<xsl:value-of select="td[1]/text()" />
</content>
<content name="cNumber">
<xsl:variable name="score" select="td[2]/text()" />
<xsl:choose>
<xsl:when test="$score=''">
<xsl:value-of select="550" />
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="td[18]/text()" />
</xsl:otherwise>
</xsl:choose>
</content>
</document>
</xsl:template>
I constantly get a null value for the cNumber node when the value is empty in the row, and I'm expecting my code to substitute '550' for the empty value. What am I doing wrong? I checked this question here: and it seems like this should work. I'm using a special application for this but my guess is the fault lies with me.
Thanks
If the td element is empty, then td/text() returns an empty sequence/node-set, and when you compare an empty sequence to '', the result is false. This is one of the reasons that many people advise against using text(). You're not interested here in the text node, you are interested in the string value of the td element, and to get that you should use
<xsl:variable name="score" select="string(td[2])" />
Your other uses of text() are also incorrect, though you're only likely to see a problem if your XML input contains comments or processing instructions. But you should get out of this coding habit, and replace
<xsl:value-of select="td[1]/text()" />
by
<xsl:value-of select="td[1]" />
As a general rule, when you see /text() in an XPath expression it's usually wrong.
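Putting that together, the cNumber part might look like the sketch below. Two things here are assumptions rather than part of the advice above: the otherwise branch outputs the same td[2] value that was tested (the question's code reads td[18] there), and normalize-space() is added so that a cell containing only whitespace also counts as empty.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="tr">
    <content name="cNumber">
      <!-- string() of an empty or missing td is '', so the test below can succeed. -->
      <xsl:variable name="score" select="string(td[2])" />
      <xsl:choose>
        <!-- Assumption: whitespace-only cells are treated as empty too. -->
        <xsl:when test="normalize-space($score) = ''">550</xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$score" />
        </xsl:otherwise>
      </xsl:choose>
    </content>
  </xsl:template>
</xsl:stylesheet>
```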

When is match-attribute of xsl:template applied? (and how can it be overridden)

As I understand the match attribute of a template tag, it defines what part of the XML tree will be enclosed in the template.
However, there seem to be some exceptions. I have a working piece of code, like this:
<xsl:template match="/root/content">
<xsl:for-each select="/root/meta/errors/error">
<p>
<strong>Error:</strong> <xsl:value-of select="message" /> (<xsl:value-of select="data/param" />)<br />
<xsl:for-each select="data/option">
<xsl:value-of select="." /><br />
</xsl:for-each>
</p>
<br /><br />
</xsl:for-each>
</xsl:template>
But when I try to add a conditional like this:
<xsl:template match="/root/content">
<xsl:if test="not(/root/meta/error/errors/data/param)">
<xsl:for-each select="/root/meta/errors/error">
<p>
<strong>Error:</strong> <xsl:value-of select="message" /> (<xsl:value-of select="data/param" />)<br />
<xsl:for-each select="data/option">
<xsl:value-of select="." /><br />
</xsl:for-each>
</p>
<br /><br />
</xsl:for-each>
<xsl:call-template name="trip_form">
<xsl:with-param name="type" select="'driver'" />
<xsl:with-param name="size" select="'savetrip'" />
</xsl:call-template>
</xsl:if>
</xsl:template>
It doesn't work any more, why, and how can I make it work again?
Templates that match attributes are applied only when you ask for them; otherwise they are ignored. (You are pulling nodes with a complex and unneeded for-each, so no attribute matching happens at all.) That's why the copy idiom applies templates to attributes explicitly:
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="* | @*" />
</xsl:copy>
</xsl:template>
When it comes to the order in which they are applied, the order is the document order, which means: after the element is applied, its attributes will be applied (in undetermined order) and then the element's children are applied. Attributes never have children and their parent is the containing element.
"it defines what part of the xml tree that will be enclosed in the template."
No. It is called when the processor encounters input that matches the specification, or when you specifically apply this input by using xsl:apply-templates. Your code should not use xsl:for-each, that's rarely needed. Instead, use xsl:apply-templates. This will also give you the possibility to match the attributes when you like.
Normally, you don't need to specify the parent in the match attribute of xsl:template. And you surely shouldn't write out the whole path inside the templates each time; that will wreak havoc on the usability of your stylesheet. Try something like this instead, and have a look at some XSL tutorials on the net (w3schools provides some basic information, and Tennison's book is next to invaluable for learning about this variant of functional programming):
<xsl:template match="/">
<xsl:apply-templates select="/root/content" />
</xsl:template>
<xsl:template match="content">
<xsl:apply-templates select="/root/meta/errors/error" />
</xsl:template>
<xsl:template match="error">
<p>
<strong>Error:</strong>
<xsl:value-of select="message" />
(<xsl:value-of select="data/param" />)
<br />
<xsl:apply-templates select="data/option" />
</p>
<br /><br />
</xsl:template>
<xsl:template match="option">
<xsl:value-of select="." /><br />
</xsl:template>
"It doesn't work any more, why, and how can I make it work again?"
Because your if-statement is probably always true (or always false). Reason: the XPath in the test starts at the root, so if it selects a node anywhere in your document, not(...) is always false; if it never selects anything, it is always true. Using xsl:if with an XPath that starts at the root will, for the life of the transformation, always yield the same result. Not sure what you are after, so I cannot really help you further here. Normally, instead of xsl:if, we tend to use a matching template (again, yes, I know it gets boring ;).
Note: you ask something about attributes in your question, this I tried to answer in the opening paragraph (before this edit). However, there's nothing about attributes inside your code, so I don't know how to really help you.
Note on the note: LarsH suggests that you perhaps mean to ask about the match attribute of xsl:template. If so, the answer lies in the text above, where I talk about apply-templates and the like. In short: the input document is processed node by node, possibly directed by xsl:apply-templates, and the processor tries to find a matching template for each node it is currently at. That's all there is to it.
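To illustrate the remark about preferring a matching template over xsl:if, here is a small sketch. It is not necessarily what the question is after; it only shows a condition written as a relative predicate in a match pattern, so that it is tested separately for each error node. The element names come from the question; splitting the output into two rules is an assumption made for the example.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:apply-templates select="/root/meta/errors/error" />
  </xsl:template>

  <!-- Chosen for errors that have a data/param child; the predicate is
       evaluated relative to each error node, not from the root. -->
  <xsl:template match="error[data/param]">
    <p><strong>Error:</strong> <xsl:value-of select="message" /> (<xsl:value-of select="data/param" />)</p>
  </xsl:template>

  <!-- Fallback for all other errors; a pattern without a predicate has a
       lower default priority than the rule above. -->
  <xsl:template match="error">
    <p><strong>Error:</strong> <xsl:value-of select="message" /></p>
  </xsl:template>
</xsl:stylesheet>
```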

How do I render a comma delimited list using xsl:for-each

I am rendering a list of tickers to html via xslt and I would like the list to be comma delimited. Assuming I was going to use xsl:for-each...
<xsl:for-each select="/Tickers/Ticker">
<xsl:value-of select="TickerSymbol"/>,
</xsl:for-each>
What is the best way to get rid of the trailing comma? Is there something better than xsl:for-each?
<xsl:for-each select="/Tickers/Ticker">
<xsl:if test="position() > 1">, </xsl:if>
<xsl:value-of select="TickerSymbol"/>
</xsl:for-each>
In XSLT 2.0 you could do it (without a for-each) using the string-join function:
<xsl:value-of select="string-join(/Tickers/Ticker/TickerSymbol, ',')"/>
In XSLT 1.0, another alternative to using xsl:for-each would be to use xsl:apply-templates
<xsl:template match="/">
<!-- Output first element without a preceding comma -->
<xsl:apply-templates select="/Tickers/Ticker[position()=1]" />
<!-- Output subsequent elements with a preceding comma -->
<xsl:apply-templates select="/Tickers/Ticker[position()>1]">
<xsl:with-param name="separator">,</xsl:with-param>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="Ticker">
<xsl:param name="separator" />
<xsl:value-of select="$separator" /><xsl:value-of select="TickerSymbol" />
</xsl:template>
I know you said xsl 2.0 is not an option, and it has been a long time since the question was asked, but for all those searching for a possibility to do what you wanted to achieve:
There is an easier way in xsl 2.0 or higher:
<xsl:value-of separator=", " select="/Tickers/Ticker/TickerSymbol" />
This will read the TickerSymbol of each /Tickers/Ticker element and insert ', ' as a separator where needed.
If there is an even easier way to do this, I look forward to your advice.
Regards Kevin
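As a self-contained sketch of the separator approach, assuming an XSLT 2.0 processor: the sample input in the comment is made up for illustration, since the question does not show the actual XML.

```xml
<!-- Hypothetical input:
<Tickers>
  <Ticker><TickerSymbol>AAPL</TickerSymbol></Ticker>
  <Ticker><TickerSymbol>MSFT</TickerSymbol></Ticker>
  <Ticker><TickerSymbol>GOOG</TickerSymbol></Ticker>
</Tickers>
-->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- The separator is inserted only between items, so there is no
         trailing comma to remove. -->
    <xsl:value-of select="/Tickers/Ticker/TickerSymbol" separator=", " />
  </xsl:template>
</xsl:stylesheet>
```

For that sample input the output is AAPL, MSFT, GOOG.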