xslt 1.0 template that reduces multiple spaces to a single space - xslt

In my XSLT 2.0 stylesheet, I use the following template reduces multiple spaces to a single space.
<xsl:template match="text()">
<xsl:value-of select="replace(., '\s+', ' ')"/>
</xsl:template>
I'd like to do the same thing in a XSLT 1.0 stylesheet, but the "replace" function is not supported. Any suggestions for what I can do?

You could use normalize-space():
<xsl:template match="text()">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
This will remove any leading and trailing whitespace and reduce multiple spaces to a single space.
For reference: https://developer.mozilla.org/en-US/docs/Web/XPath/Functions/normalize-space

Related

One pesky blank line in an XSLT transform

I'm using XSLT to extract some data from a trademark XML file from the Patent and Trademark Office. It's mostly okay, except for one blank line. I can get rid of it with a moderately ugly workaround, but I'd like to know if there a better way.
Here's a subset of my XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tm="http://www.wipo.int/standards/XMLSchema/trademarks" xmlns:pto="urn:us:gov:doc:uspto:trademark:status">
<xsl:output method="text" encoding="utf-8" />
<xsl:strip-space elements="*"/>
<xsl:template match="tm:Transaction">
<xsl:apply-templates select=".//tm:TradeMark"/>
<xsl:apply-templates select=".//tm:ApplicantDetails"/>
<xsl:apply-templates select=".//tm:MarkEvent"/>
</xsl:template>
<xsl:template match="tm:TradeMark">
MarkCurrentStatusDate,"<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"/>"<xsl:text/>
ApplicationNumber,"<xsl:value-of select="normalize-space(tm:ApplicationNumber)"/>"<xsl:text/>
ApplicationDate,"<xsl:value-of select="normalize-space(tm:ApplicationDate)"/>"<xsl:text/>
RegistrationNumber,"<xsl:value-of select="normalize-space(tm:RegistrationNumber)"/>"<xsl:text/>
RegistrationDate,"<xsl:value-of select="normalize-space(tm:RegistrationDate)"/>"<xsl:text/>
<xsl:apply-templates select="tm:WordMarkSpecification"/>
<xsl:apply-templates select="tm:TradeMarkExt"/>
<xsl:apply-templates select="tm:PublicationDetails"/>
<xsl:apply-templates select="tm:RepresentativeDetails"/>
</xsl:template>
<xsl:template match="tm:WordMarkSpecification">
MarkVerbalElementText,"<xsl:value-of select="normalize-space(tm:MarkVerbalElementText)"/>"<xsl:text/>
</xsl:template>
It has a few more templates, but that's the gist of it. I always get a blank line at the very beginning of the output, before any data; I don't get any other blank lines. My circumvention is to combine the two lines:
<xsl:template match="tm:TradeMark">
MarkCurrentStatusDate,"<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"/>"<xsl:text/>
into a single line:
<xsl:template match="tm:TradeMark">MarkCurrentStatusDate,"<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"/>"<xsl:text/>
This works, and I guess I'm okay with it if there's nothing better, but it seems inelegant and like a kludge to me. None of the other templates need this treatment (e.g. the tm:WordMarkSpecification template or another six after that)), and I'm confused why it's needed here. Any ideas?
Because I can see the specific point in the XSLT that's inserting the blank line, I presume it's not helpful to provide the XML I'm testing on, but if you do need to see it, you can get it at https://tsdrapi.uspto.gov/ts/cd/casestatus/rn2178784/download.zip ; it's the XML file in that archive.
Use at the beginning of the template the same trick you are using at the end of the template to chop up the stylesheet node tree with empty <xsl:text/> instructions:
<xsl:template match="tm:TradeMark">
<xsl:text/>MarkCurrentStatusDate,"<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"/>"<xsl:text/>
Personally, I think it's cleaner to use a concat() when you need to combine static text and dynamic values:
<xsl:template match="tm:TradeMark">
<xsl:value-of
select="concat(
'MarkCurrentStatusDate,"', normalize-space(tm:MarkCurrentStatusDate), '"',
'ApplicationNumber,"', normalize-space(tm:ApplicationNumber), '"',
'ApplicationDate,"', normalize-space(tm:ApplicationDate), '"',
'RegistrationNumber,"', normalize-space(tm:RegistrationNumber), '"',
'RegistrationDate,"', normalize-space(tm:RegistrationDate), '"'
)"/>
<xsl:apply-templates select="tm:WordMarkSpecification"/>
<xsl:apply-templates select="tm:TradeMarkExt"/>
<xsl:apply-templates select="tm:PublicationDetails"/>
<xsl:apply-templates select="tm:RepresentativeDetails"/>
</xsl:template>
This should also solve your issue with the blank spaces showing up.
Remember you can always just play with the XML syntax to ignore end-of-line sequences that are inside of start and end tag delimiters:
<xsl:template match="tm:TradeMark"
>MarkCurrentStatusDate,"<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"
/>"ApplicationNumber,"<xsl:value-of select="normalize-space(tm:ApplicationNumber)"
/>"ApplicationDate,"<xsl:value-of select="normalize-space(tm:ApplicationDate)"
/>"RegistrationNumber,"<xsl:value-of select="normalize-space(tm:RegistrationNumber)"
/>"RegistrationDate,"<xsl:value-of select="normalize-space(tm:RegistrationDate)"
/>"<xsl:apply-templates select="tm:WordMarkSpecification"/>
<xsl:apply-templates select="tm:TradeMarkExt"/>
<xsl:apply-templates select="tm:PublicationDetails"/>
<xsl:apply-templates select="tm:RepresentativeDetails"/>
</xsl:template>
There is no rule in XML that a tag's closing delimiter /> has to be on the same line as the tag's opening delimiter <. White-space inside of a tag is ignored (where innocuous), and an end-of-line sequence is considered white-space.
If you want all of the text to be emitted without extra line breaks or white-space, then put the literal text inside of <xsl:text> elements.
<xsl:template match="tm:TradeMark">
<xsl:text>MarkCurrentStatusDate,"</xsl:text>
<xsl:value-of select="normalize-space(tm:MarkCurrentStatusDate)"/>
<xsl:text>"</xsl:text>
<xsl:text>ApplicationNumber,"</xsl:text>
<xsl:value-of select="normalize-space(tm:ApplicationNumber)"/>
<xsl:text>"</xsl:text>
<xsl:text>ApplicationDate,"</xsl:text>
<xsl:value-of select="normalize-space(tm:ApplicationDate)"/>
<xsl:text>"</xsl:text>
<xsl:text>RegistrationNumber,"</xsl:text>
<xsl:value-of select="normalize-space(tm:RegistrationNumber)"/>
<xsl:text>"</xsl:text>
<xsl:text>RegistrationDate,"</xsl:text>
<xsl:value-of select="normalize-space(tm:RegistrationDate)"/>
<xsl:text>"</xsl:text>
<xsl:apply-templates select="tm:WordMarkSpecification"/>
<xsl:apply-templates select="tm:TradeMarkExt"/>
<xsl:apply-templates select="tm:PublicationDetails"/>
<xsl:apply-templates select="tm:RepresentativeDetails"/>
</xsl:template>
That way, none of the line breaks and white-space inside of the <xsl:template> will be seen as significant and will not be included in the result tree output.

XSL Analyze-String -> Matching-Substring into multiple variables

I was wondering if it is possible to use analyze-string and set multiple groups within the RegEx and then store all of the matching groups in variables to use later on.
like so:
<xsl:analyze-string regex="^Blah\s+(\d+)\s+Bloo\s+(\d+)\s+Blee" select=".">
<xsl:matching-substring>
<xsl:variable name="varX">
<xsl:value-of select="regex-group(1)"/>
</xsl:variable>
<xsl:variable name="varY">
<xsl:value-of select="regex-group(2)"/>
</xsl:variable>
</xsl:matching-substring>
</xsl:analyze-string>
This doesn't actually work, but that's the sort of thing I'm after, I know I can wrap the analyze-string in a variable, but that seems daft that for every group I have to process the RegEx, not very efficient, I should be able to process the regex once and store all of the groups for use later on.
Any ideas?
Well does
<xsl:variable name="groups" as="element(group)*">
<xsl:analyze-string regex="^Blah\s+(\d+)\s+Bloo\s+(\d+)\s+Blee" select=".">
<xsl:matching-substring>
<group>
<x><xsl:value-of select="regex-group(1)"/></x>
<y><xsl:value-of select="regex-group(2)"/></y>
</group>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:variable>
help? That way you have a variable named groups which is a sequence of group elements with the captures.
This transformation shows that xsl:analyze-string isn't necessary to obtain the wanted results -- a simpler and generic solution exists.:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="*[matches(., '^Blah\s+(\d+)\s+Bloo\s+(\d+)\s+Blee')]">
<xsl:variable name="vTokens" select=
"tokenize(replace(., '^Blah\s+(\d+)\s+Bloo\s+(\d+)\s+Blee', '$1 $2'), ' ')"/>
<xsl:variable name="varX" select="$vTokens[1]"/>
<xsl:variable name="varY" select="$vTokens[2]"/>
<xsl:sequence select="$varX, $varY"/>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document:
<t>Blah 123 Bloo 4567 Blee</t>
which produces the wanted, correct result:
123 4567
Here we don't rely on knowing the RegEx (can be supplied as parameter) and the string -- we just replace the string with a delimited string of the RegEx groups, which we then tokenize and every item in the sequence produced by tokenize() can readily be assigned to a corresponding variable.
We don't have to find the wanted results buried in a temp. tree -- we just get them all in a result sequence.

Formatting string (Removing leading zeros)

I am newbie to xslt. My requirement is to transform xml file into text file as per the business specifications. I am facing an issue with one of the string formatting issue. Please help me out if you have any idea.
Here is the part of input xml data:
"0001295"
Expected result to print into text file:
1295
My main issue is to remove leading Zeros. Please share if you have any logic/function.
Just use this simple expression:
number(.)
Here is a complete example:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="t">
<xsl:value-of select="number(.)"/>
</xsl:template>
</xsl:stylesheet>
When applied on this XML document:
<t>0001295</t>
the wanted, correct result is produced:
1295
II. Use format-number()
format-number(., '#')
There are a couple of ways you can do this. If the value is entirely numeric (for example not a CSV line or part of a product code such as ASN0012345) you can convert from a string to a number and back to a string again :
string(number($value)).
Otherwise just replace the 0's at the start :
replace( $value, '^0*', '' )
The '^' is required (standard regexp syntax) or a value of 001201 will be replaced with 121 (all zero's removed).
Hope that helps.
Dave
Here is one way you could do it in XSLT 1.0.
First, find the first non-zero element, by removing all the zero elements currently in the value
<xsl:variable name="first" select="substring(translate(., '0', ''), 1, 1)" />
Then, you can find the substring-before this first character, and then use substring-after to get the non-zero part after this
<xsl:value-of select="substring-after(., substring-before(., $first))" />
Or, to combine the two statements into one
<xsl:value-of select="substring-after(., substring-before(., substring(translate(., '0', ''), 1, 1)))" />
So, given the following input
<a>00012095Kb</a>
Then using the following XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/a">
<xsl:value-of select="substring-after(., substring-before(., substring(translate(., '0', ''), 1, 1)))" />
</xsl:template>
</xsl:stylesheet>
The following will be output
12095Kb
As a simple alternative in XSLT 2.0 that can be used with numeric or alpha-numeric input, with or without leading zeros, you might try:
replace( $value, '^0*(..*)', '$1' )
This works because ^0* is greedy and (..*) captures the rest of the input after the last leading zero. $1 refers to the captured group.
Note that an input containing only zeros will output 0.
XSLT 2.0
Remove leading zeros from STRING
<xsl:value-of select="replace( $value, '^0+', '')"/>
You could use a recursive template that will remove the leading zeros:
<xsl:template name="remove-leading-zeros">
<xsl:param name="text"/>
<xsl:choose>
<xsl:when test="starts-with($text,'0')">
<xsl:call-template name="remove-leading-zeros">
<xsl:with-param name="text"
select="substring-after($text,'0')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
Invoke it like this:
<xsl:call-template name="remove-leading-zeros">
<xsl:with-param name="text" select="/path/to/node/with/leading/zeros"/>
</xsl:call-template>
</xsl:template>
<xsl:value-of select="number(.) * 1"/>
works for me
All XSLT1 parser, like the popular libXML2's module for XSLT, have the registered functions facility... So, we can suppose to use it. Suppose also that the language that call XSLT, is PHP: see this wikibook about registerPHPFunctions.
The build-in PHP function ltrim can be used in
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fn="http://php.net/xsl">
<xsl:output method="xml" encoding="utf-8" indent="yes"/>
<xsl:template match="test">
show <xsl:value-of select="fn:function('ltrim',string(.),'0')" />",
</xsl:template>
</xsl:stylesheet>
Now imagine a little bit more complex problem, to ltrim a string with more than 1 number, ex. hello 002 and 021, bye.
The solution is the same: use registerPHPFunctions, except to change the build-in function to a user defined one,
function ltrim0_Multi($s) {
return preg_replace('/(^0+|(?<= )0+)(?=[1-9])/','',$s);
}
converts the example into hello 2 and 21, bye.

Trim whitespace from parent element only

I'd like to trim the leading whitespace inside p tags in XML, so this:
<p> Hey, <em>italics</em> and <em>italics</em>!</p>
Becomes this:
<p>Hey, <em>italics</em> and <em>italics</em>!</p>
(Trimming trailing whitespace won't hurt, but it's not mandatory.)
Now, I know normalize-whitespace() is supposed to do this, but if I try to apply it to the text nodes..
<xsl:template match="text()">
<xsl:text>[</xsl:text>
<xsl:value-of select="normalize-space(.)"/>
<xsl:text>]</xsl:text>
</xsl:template>
...it's applied to each text node (in brackets) individually and sucks them dry:
[Hey,]<em>[italics]</em>[and]<em>[italics]</em>[!]
My XSLT looks basically like this:
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
So is there any way I can let apply-templates complete and then run normalize-space on the output, which should do the right thing?
This stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()|#*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p//text()[1][generate-id()=
generate-id(ancestor::p[1]
/descendant::text()[1])]">
<xsl:variable name="vFirstNotSpace"
select="substring(normalize-space(),1,1)"/>
<xsl:value-of select="concat($vFirstNotSpace,
substring-after(.,$vFirstNotSpace))"/>
</xsl:template>
</xsl:stylesheet>
Output:
<p>Hey, <em>italics</em> and <em>italics</em>!</p>
Edit 2: Better expression (now only three function calls).
Edit 3: Matching the first descendant text node (not just the first node if it's a text node). Thanks to #Dimitre's comment.
Now, with this input:
<p><b> Hey, </b><em>italics</em> and <em>italics</em>!</p>
Output:
<p><b>Hey, </b><em>italics</em> and <em>italics</em>!</p>
I would do something like this:
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
<!-- strip leading whitespace -->
<xsl:template match="p/node()[1][self::text()]">
<xsl:call-template name="left-trim">
<xsl:with-param name="s" value="."/>
</xsl:call-template>
</xsl:template>
This will strip left space from the initial node child of a <p> element, if it is a text node. It will not strip space from the first text node child, if it is not the first node child. E.g. in
<p><em>Hey</em> there</p>
I intentionally avoid stripping the space from the front of 'there', because that would make the words run together when rendered in a browser. If you did want to strip that space, change the match pattern to
match="p/text()[1]"
If you also want to strip trailing whitespace, as your title possibly implies, add these two templates:
<!-- strip trailing whitespace -->
<xsl:template match="p/node()[last()][self::text()]">
<xsl:call-template name="right-trim">
<xsl:with-param name="s" value="."/>
</xsl:call-template>
</xsl:template>
<!-- strip leading/trailing whitespace on sole text node -->
<xsl:template match="p/node()[position() = 1 and
position() = last()][self::text()]"
priority="2">
<xsl:value-of select="normalize-space(.)"/>
</xsl:template>
The definitions of the left-trim and right-trim templates are at Trim Template for XSLT (untested). They might be slow for documents with lots of <p>s. If you can use XSLT 2.0, you can replace the call-templates with
<xsl:value-of select="replace(.,'^\s+','')" />
and
<xsl:value-of select="replace(.,'\s+$','')" />
(Thanks to Priscilla Walmsley.)
You want:
<xsl:template match="text()">
<xsl:value-of select=
"substring(
substring(normalize-space(concat('[',.,']')),2),
1,
string-length(.)
)"/>
</xsl:template>
This wraps the string in "[]", then performs normalize-string(), then finally removes the wrapping characters.

XSLT: Regular Expression function does not work?

Ok, this one has been driving me up the wall...
I have a xslt function that is supposed to split out the Zip-code part from a Zip+City string depending on the country. I cannot get it to work! This is what I got so far:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/functions" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:function name="exslt:GetZip" as="xs:string">
<xsl:param name="zipandcity" as="xs:string"/>
<xsl:param name="countrycode" as="xs:string"/>
<xsl:choose>
<xsl:when test="$countrycode='DK'">
<xsl:analyze-string select="$zipandcity" regex="(\d{4}) ([A-Za-zÆØÅæøå]{3,24})">
<xsl:matching-substring>
<xsl:value-of select="regex-group(1)"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:text>fail</xsl:text>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:when>
<xsl:otherwise>
<xsl:text>error</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
I am running it on a source XML where the following values are passed to the function:
zipandcity: "DK-2640 København SV"
countrycode: "DK"
...will output 'fail'!
I think there is something I am misunderstanding here...
Aside from that facts that regexes aren't supported until XSLT 2.0 and braces have to be escaped (but backslashes don't), there's one more reason why that code won't work: XSLT regexes are implicitly anchored at both ends. Given the string DK-2640 København SV, your regex only matches 2640 København, so you need to "pad" it to make it consume the whole string:
regex=".*(\d{{4}}) ([A-Za-zÆØÅæøå]{{3,24}}).*"
.* is probably sufficient in this case, but sometimes you have to be more specific. For example, if there's more than one place where \d{4} could match, you might use \D* at the beginning to make sure the first capturing group matches the first bunch of digits.
The regex attribute is parsed as an attribute value template whery curly braces have a special meaning. If this is in fact an XSL 2.0 Stylesheet, you need to escape the curly braces in the regex attribute by doubling them: (\d{{4}}) ([A-Za-zÆØÅæøå]{{3,24}})
Alternatively you could define a variable containing your pattern like this:
<xsl:variable name="pattern">(\d{4}) ([A-Za-zÆØÅæøå]{3,24})</xsl:variable
<xsl:analyze-string select="$zipandcity" regex="{$pattern}">
Regular expressions are only supported in XSLT 2.x -- not in XSLT 1.0.