Regex search in XSL, select string after match

I have a solution where the filename has a prefix showing the file size of a PDF. I need to pick up that value into an XML file that contains a lot of other information collected by the XSLT.
However, I can't get this one regex match to work.
Filenames have this structure, for example:
776524_P9466_Novilon_Broschyr_SE_Omslag.xml, where the digits before the first underscore are the file size.
I have a regex search pattern of _(.*), and I have verified that it matches the first underscore and everything after it.
Here is my XSL that I'm having problems with:
<xsl:param name="find_size">
<xsl:text>(_.*)</xsl:text>
</xsl:param>
<xsl:variable name="filename_of_start"><xsl:value-of select="replace($filename_of_file, '$find_size', '')"/></xsl:variable>
<artwork_size><xsl:value-of select="$filename_of_start"/></artwork_size>
$filename_of_file has the string: 776524_P9466_Novilon_Broschyr_SE_Omslag.xml
I have also tried to match the digits before the underscore and replace the string with that match, but I haven't got that to work either. Other replacements that remove other matches from the beginning of the string do work.
Thanks

How about using the substring-before() XPath function?
<xsl:variable name="file_size" select="substring-before($filename, '_')" />
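For comparison outside XSLT, here is a minimal Python sketch of the same idea: keep only the text before the first underscore. The function names are hypothetical; `str.partition` plays the role of `substring-before()`, and the regex variant mirrors the question's replace-based approach.

```python
import re

def size_prefix(filename: str) -> str:
    """Text before the first underscore, like XPath substring-before()."""
    before, sep, _after = filename.partition("_")
    # substring-before() returns "" when the separator is absent; mirror that.
    return before if sep else ""

def size_prefix_regex(filename: str) -> str:
    """Regex variant: delete the first underscore and everything after it."""
    return re.sub(r"_.*", "", filename, count=1)

name = "776524_P9466_Novilon_Broschyr_SE_Omslag.xml"
print(size_prefix(name))        # 776524
print(size_prefix_regex(name))  # 776524
```

The two approaches differ only when there is no underscore: `substring-before()` gives an empty string, while the replace-based version returns the whole filename unchanged.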

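Instead of replace($filename_of_file, '$find_size', '') you want replace($filename_of_file, $find_size, ''). With the quotes, the pattern is the literal text $find_size rather than the value of the parameter; in an XPath regex $ is an end-of-string anchor, so that pattern can never match and replace() returns the input unchanged. Without the quotes, the pattern (_.*) removes the first underscore and everything after it, leaving exactly the size prefix.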
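Python's `re` also treats `$` as an end-of-string anchor, so this minimal sketch can reproduce both calls side by side. It is only an analogy: `re.sub` stands in for XPath `replace()`, and the variable names simply mirror the question.

```python
import re

find_size = "(_.*)"
filename_of_file = "776524_P9466_Novilon_Broschyr_SE_Omslag.xml"

# Quoted: the pattern is the literal text "$find_size". "$" anchors at the end
# of the string, and nothing can follow the end, so the pattern never matches.
quoted = re.sub("$find_size", "", filename_of_file)

# Unquoted: the pattern is the variable's value, "(_.*)", which removes the
# first underscore and everything after it.
unquoted = re.sub(find_size, "", filename_of_file)

print(quoted)    # 776524_P9466_Novilon_Broschyr_SE_Omslag.xml
print(unquoted)  # 776524
```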

Related

How does XSLT handle regex \w?

I have an input XML file named "718322_c341b0-TEST_NOC_20160423121052.XML", which my XSLT assigns to $SourceFile. I am trying to test whether $SourceFile contains the string "-TEST" using the following code:
<xsl:if test="matches($SourceFile, '^\w+-TEST.*')">
However, it did not match. So I updated the code to
<xsl:if test="matches($SourceFile, '^[A-Za-z0-9_]+-TEST.*')">
Then I got a match. I did more testing and the following code got a match, too.
<xsl:if test="matches($SourceFile, '^\w+_\w+-TEST.*')">
Here's what confuses me: I thought \w means [A-Za-z0-9_], correct? Why did \w not work in this case? It seems to have trouble including the underscore. Thanks!
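See https://www.w3.org/TR/xmlschema-2/#charcter-classes: the class \w is defined as [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}], explained as 'all characters except the set of "punctuation", "separator" and "other" characters'. The underscore _ belongs to the Unicode category Pc (connector punctuation), so it is excluded. That is why ^\w+-TEST fails: \w+ can only consume 718322, which is followed by _ rather than -, while ^\w+_\w+-TEST matches the underscore explicitly.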
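To make that definition concrete, here is a small Python sketch that approximates XML Schema's \w by checking Unicode general categories with the standard `unicodedata` module. It is an approximation: the function names are hypothetical, and Python's bundled Unicode version may differ from the one your XSLT processor uses.

```python
import unicodedata

def is_xsd_word_char(ch: str) -> bool:
    """Approximate XML Schema \\w: any character whose Unicode general
    category is not punctuation (P*), separator (Z*) or other (C*)."""
    return unicodedata.category(ch)[0] not in "PZC"

def xsd_word_run(s: str) -> str:
    """Leading run of XSD \\w characters, i.e. what ^\\w+ can consume."""
    end = 0
    while end < len(s) and is_xsd_word_char(s[end]):
        end += 1
    return s[:end]

source_file = "718322_c341b0-TEST_NOC_20160423121052.XML"
print(unicodedata.category("_"), is_xsd_word_char("_"))  # Pc False
print(xsd_word_run(source_file))                         # 718322
```

A side effect of this definition is that symbols such as $ (category Sc) do count as \w in XSD regexes, unlike in most other regex flavours.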

regex pattern for replacement of string

I am trying to replace a string using the regex pattern below, but it is not being replaced. I also tried different combinations, but nothing worked. Any ideas?
<regex pattern="jre64\/1\.6\.0"
replacement="jre64/1.7.0" />
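The question does not say which tool evaluates this regex element, so the following is only a hedged check of the pattern itself in Python's `re`, not in that tool. In Python, \/ is an unnecessary but harmless escape for /. The sample paths are hypothetical. The pattern matches a forward-slash path but not a backslash (Windows-style) one, which is one possible thing to rule out.

```python
import re

# Pattern and replacement copied from the question.
pattern = r"jre64\/1\.6\.0"
replacement = "jre64/1.7.0"

forward = "C:/tools/jre64/1.6.0/bin/java.exe"    # hypothetical input
backward = "C:\\tools\\jre64\\1.6.0\\bin\\java.exe"  # hypothetical input

print(re.sub(pattern, replacement, forward))   # C:/tools/jre64/1.7.0/bin/java.exe
print(re.sub(pattern, replacement, backward))  # unchanged: "\" is not "/"
```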

XSLT 2.0: how to replace $ with an escaped dollar (for conversion to LaTeX)

I am new to XSLT. I googled extensively but couldn't figure out how to do the following:
I am transforming XML to LaTeX, and LaTeX requires characters such as $ and # to be escaped. I tried the following calls to the replace function, but they do not work. (The templates do work without the replace function.)
<xsl:template match="xyz:doc">
\subsubsection{<xsl:value-of select="replace( xyz:headline, '(\$)', '\$1' )"/>}
...
</xsl:template>
<xsl:template match="xyz:doc">
\subsubsection{<xsl:value-of select="replace( xyz:headline, '\$', '\$' )"/>}
...
</xsl:template>
Possible content to be escaped is:
"Locally defined field #931" or
"Locally defined subfield $b"
What am I doing wrong?
Many thanks for your answers!
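If you want to replace a dollar symbol $ in the input with \$ in the output, then use replace(xyz:headline, '\$', '\\\$'). In the replacement string both \ and $ are special: \\ produces a literal backslash and \$ a literal dollar sign. (In the attempts above, '\$1' yields the literal text $1 and '\$' just a plain $, so no backslash is ever produced.)
If there are several characters that need the same escaping, then replace(xyz:headline, '([$#])', '\\$1') should work; here $1 inserts the captured character after the backslash.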
Sample at http://xsltransform.net/bdxtqX/1
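The same capture-group technique can be sketched in Python for comparison. The function name is hypothetical, and Python's replacement syntax differs from XPath's: a group is written \1 rather than $1, while a literal backslash is \\ in both.

```python
import re

def escape_latex_specials(text: str) -> str:
    r"""Prefix each $ or # with a backslash, like
    replace(., '([$#])', '\\$1') in XPath 2.0."""
    # Replacement r"\\\1": "\\" is a literal backslash, "\1" is group 1.
    return re.sub(r"([$#])", r"\\\1", text)

print(escape_latex_specials("Locally defined subfield $b"))  # Locally defined subfield \$b
print(escape_latex_specials("Locally defined field #931"))   # Locally defined field \#931
```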

How to find a word within text using XSLT 2.0 and REGEX (which doesn't have \b word boundary)?

I am attempting to scan a string of words and look for the presence of a particular word (case-insensitive) in an XSLT 2.0 stylesheet using regular expressions.
I have a list of words that I wish to iterate over, determining whether each one exists within a given string.
I want to match a word anywhere within the given text, but not inside another word (e.g. a search for foo should not match "food", and a search for bar should not match "rebar").
XSLT 2.0 regular expressions do not support the word-boundary assertion (\b), so I need to emulate it as best I can.
You can use alternation to avoid repetition:
<xsl:if test="matches($prose, concat('(^|\W)', $word, '($|\W)'),'i')">
If your XSLT 2.0 processor is Saxon 9, then you can use Java regular expression syntax (including \b) with the functions matches, tokenize and replace by starting the flags argument with an exclamation mark:
<xsl:value-of select="matches('all foo is bar', '\bfoo\b', '!i')"/>
Michael Kay mentioned that option recently on the XSL mailing list.
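For illustration, here is a minimal Python sketch of the alternation approach from the first answer, (^|\W)word($|\W) with a case-insensitive flag. The function name is hypothetical. Unlike the concat() version, it escapes the word so characters such as . or + stay literal. Python's \W also differs from XSD's \W (Python counts _ as a word character), so results can differ around underscores.

```python
import re

def contains_word(prose: str, word: str) -> bool:
    """Case-insensitive whole-word test without \\b, using the
    (^|\\W)word($|\\W) alternation idea."""
    pattern = r"(^|\W)" + re.escape(word) + r"($|\W)"
    return re.search(pattern, prose, flags=re.IGNORECASE) is not None

print(contains_word("All food is rebar", "foo"))  # False
print(contains_word("all FOO is bar", "foo"))     # True
```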

XSL disable-output-escaping removes whitespaces

Part of the XML:
<text><b>Title</b> <b>Happy</b></text>
In my XSL I have:
<xsl:value-of select="text" disable-output-escaping="yes" />
My output becomes
**TitleHappy**
My spacing went missing: there's supposed to be a space between </b> and <b>.
I tried normalize-space(), it doesn't work.
Any suggestions? Thanks!
If you want to output whitespace from the stylesheet, use:
<xsl:text> </xsl:text>
Whitespace is only preserved if it is part of a text node that also contains other characters (e.g. in " a " both spaces are kept); whitespace-only text nodes can be stripped.
Whitespace-only text from the original source XML has to be preserved by telling the parser to keep it, for example:
parser.setPreserveWhitespace(true);
As you're outputting HTML, you could substitute a non-breaking space (&#160;) for your space.
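To see that the space is a separate whitespace-only text node in the source, here is a small sketch using Python's standard `xml.etree.ElementTree`. It is an illustration only, not the parser or XSLT processor from the question. ElementTree keeps the node, so the string value still contains the space. A parser or processor that strips whitespace-only text nodes would drop exactly this node and produce TitleHappy.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<text><b>Title</b> <b>Happy</b></text>")

# The space between </b> and <b> is a whitespace-only text node;
# ElementTree stores it as the "tail" of the first <b>.
print(repr(root[0].tail))        # ' '

# Concatenating all text gives the string value that xsl:value-of would
# return if the whitespace-only node is kept.
print("".join(root.itertext()))  # Title Happy
```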
Do you have any control over the generation of the original XML? If so, you could try adding xml:space="preserve" to the text element which should tell the processor to keep the whitespace.
<text xml:space="preserve"><b>Title</b> <b>Happy</b></text>
Alternatively, try looking at the "xsl:preserve-space" element in XSLT.
<xsl:preserve-space elements="text"/>
Although I have never used this personally, it might be of some help. See W3Schools for more information.
Thank you for everyone's input.
Currently I am using MattH's suggestion, which is to test for a space and substitute a non-breaking space. Another method I thought of is to test for "</b> <b>" and substitute " </b><b>"; a space contained within the bold tags is actually output. Both methods worked, though I don't know what the implications are. And I still can't figure out why the space is removed when it sits between two separate bold tags.