How to remove a space in XML using a stylesheet and RegEx

I have an XML document and want to find a particular tag (in this case "FirstName") and remove a space in its value only if there is a - character before the space.
In other words, I want to keep spaces that have no - in front of them. I want to do this using an XSL stylesheet with RegEx matching and the replace function.
The expected result is Sam-Louise, with the space between "Sam-" and "Louise" removed:
<?xml version="1.0" encoding="utf-8"?>
<NCV Version="1.14">
<Invoice>
<customer>
<customerId>12785</customerId>
<FirstName>Sam- Louise</FirstName>
<LastName>Jones</LastName>
</customer>
</Invoice>
</NCV>

This is one possible XSLT:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="html" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:template match="FirstName">
<FirstName>
<xsl:value-of select="replace(., '-\s+', '-')"/>
</FirstName>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:transform>
Output:
<NCV Version="1.14">
<Invoice>
<customer>
<customerId>12785</customerId>
<FirstName>Sam-Louise</FirstName>
<LastName>Jones</LastName>
</customer>
</Invoice>
</NCV>

You can match with the following RegEx:
(\<FirstName\>.*?-)\s+
and replace the match with the first captured group, $1.
The RegEx (\<FirstName\>.*?-)\s+ matches as follows:
\<FirstName\>.*?-: The literal <FirstName> followed by any characters, non-greedy, up to the first hyphen. This part of the match is stored in the captured group.
\s+: One or more whitespace characters.
Replacing the match with $1 keeps everything up to the hyphen and removes the whitespace after it.
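The replacement described above can be tried outside XSLT. The following is a minimal sketch in Python, chosen only because it is easy to run; the sample text and the function name remove_space_after_hyphen are made up for illustration. Python writes the group reference as \1 rather than $1, and < and > need no escaping there.

```python
import re

# Hypothetical sample text; only the FirstName value comes from the question.
xml_text = (
    "<customer>\n"
    "<FirstName>Sam- Louise</FirstName>\n"
    "<LastName>Jones</LastName>\n"
    "</customer>"
)

def remove_space_after_hyphen(text):
    # Group 1 captures "<FirstName>" plus everything up to the first hyphen.
    # The whitespace matched by \s+ is outside the group, so it is dropped.
    return re.sub(r"(<FirstName>.*?-)\s+", r"\1", text)

cleaned = remove_space_after_hyphen(xml_text)
print(cleaned)
```

A value without a hyphen, such as Mary Ann, is left untouched because the pattern needs a hyphen before the whitespace. Note that this pattern only handles the first hyphen in each FirstName value.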

Related

XSLT - Get String between commas

How can I get the value 'four' in XSLT?
<root>
<entry>(one,two,three,four,five,six)</entry>
</root>
Thanks in advance.
You didn't specify the XSLT version, so I assume version 2.0.
I also assume that the word four is only a "marker" showing which part
of the string you want (the text between the 3rd and 4th commas).
To get the fragment you want, you can:
Use the tokenize function to "cut" the whole content of entry
into pieces, using a comma as the cutting pattern.
Take the fourth item of the resulting sequence.
This expression can be used e.g. in a template matching entry.
So the example script can look like this:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="entry">
<xsl:copy>
<xsl:value-of select="tokenize(., ',')[4]"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>
</xsl:transform>
For your input XML it gives:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<entry>four</entry>
</root>
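To see what the individual tokens look like, here is a small Python sketch of the same split. It only imitates tokenize(., ',') with str.split and is not XSLT; the variable names are illustrative.

```python
entry = "(one,two,three,four,five,six)"

# Splitting on each comma mirrors tokenize(., ',') for this input.
tokens = entry.split(",")

# XPath positions are 1-based, so [4] in XPath is index 3 in Python.
fourth = tokens[3]

print(tokens)
print(fourth)
```

The first and last tokens keep their parentheses, "(one" and "six)", so picking position 1 or 6 would need an extra clean-up step.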

XSLT 2.0 regular expression replace

I have the following XML:
<t>a_35345_0_234_345_666_888</t>
I would like to replace the first occurrence of number after "_" with a fixed number 234. So the result should look like:
<t>a_234_0_234_345_666_888</t>
I have tried using the following but it does not work:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:value-of select='replace(., "(.*)_\d+_(.*)", "$1_234_$2")'/>
</xsl:template>
</xsl:stylesheet>
UPDATE
The following works for me (thanks @Chris85). Just remove the second underscore and add a ? to make the first group non-greedy.
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:value-of select='replace(., "(.*?)_\d+(.*)", "$1_234$2")'/>
</xsl:template>
</xsl:stylesheet>
Your regex is greedy: the .* consumes as much as it can, and the engine only backtracks as far as needed for the rest of the pattern to match.
So
(.*)_\d+_(.*)
was putting
a_35345_0_234_345
into $1. Then _\d+_ matched _666_, the last number between underscores, and 888 went into $2, so the wrong number was replaced.
To make it non-greedy, add a ? after the .*. This tells the * to match as few characters as possible, so it stops at the first occurrence of the next character.
Functional example:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:value-of select='replace(., "(.*?)_\d+(.*)", "$1_234$2")'/>
</xsl:template>
</xsl:stylesheet>
Here's some more documentation on repetition and greediness: http://www.regular-expressions.info/repeat.html.
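To compare the two patterns on the sample string, here is a minimal sketch in Python. Python's re module is not the XPath 2.0 regex engine, but these patterns use only features both share; the replacement syntax is \1 instead of $1, and the variable names are illustrative.

```python
import re

s = "a_35345_0_234_345_666_888"

# Greedy: (.*) grabs as much as possible, so _\d+_ ends up matching
# the LAST number that sits between two underscores.
greedy = re.sub(r"(.*)_\d+_(.*)", r"\1_234_\2", s)

# Lazy: (.*?) grabs as little as possible, so _\d+ matches the FIRST number.
lazy = re.sub(r"(.*?)_\d+(.*)", r"\1_234\2", s)

print(greedy)
print(lazy)
```

Only the lazy variant changes the first number; the greedy one leaves 35345 in place and rewrites a later number instead.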

XSLT - match doesn't seem to work

I have this line in my transform:
<xsl:template match="simplesect[@kind='since']">
When I apply it to the following:
<detaileddescription>
<para><simplesect kind="since">
<para>yesterday</para>
</simplesect></para></detaileddescription>
I expect it to work.
However, I noticed that a space needs to exist between the <para> and <simplesect> tags.
So the following matches where the above doesn't:
<detaileddescription>
<para> <simplesect kind="since">
<para>yesterday</para>
</simplesect></para></detaileddescription>
I'm stumped. Any ideas why, or is there a call I can make? Right now, the only solution I have is to find every instance of <para><simplesect kind="since"> and change it to <para> <simplesect kind="since">. Notice the space between <para> and <simplesect>.
This stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="simplesect[@kind='since']">
<modified><xsl:apply-templates/></modified>
</xsl:template>
</xsl:stylesheet>
when applied to the first input, produces:
<?xml version="1.0" encoding="utf-8"?>
<detaileddescription>
<para>
<modified>
<para>yesterday</para>
</modified>
</para>
</detaileddescription>

XSL: Find text not currently in elements and wrap?

Given the following fragment:
<recipe>
get a bowl
<ingredient>flour</ingredient>
<ingredient>milk</ingredient>
mix it all together!
</recipe>
How can I match "get a bowl" and "mix it all together!" and wrap them in another element to create the following?
<recipe>
<action>get a bowl</action>
<ingredient>flour</ingredient>
<ingredient>milk</ingredient>
<action>mix it all together!</action>
</recipe>
You could define a template matching text nodes that are a direct child of recipe:
<xsl:template match="recipe/text()">
<action><xsl:value-of select="normalize-space()" /></action>
</xsl:template>
Full XSLT example:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:strip-space elements="*" />
<xsl:output method="xml" indent="yes" />
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
</xsl:template>
<xsl:template match="recipe/text()">
<action><xsl:value-of select="normalize-space()" /></action>
</xsl:template>
</xsl:stylesheet>
Note that normalize-space() is required even with xsl:strip-space, which only affects text nodes that contain only whitespace; it doesn't strip leading and trailing space from nodes that contain any non-whitespace characters. If you had
<action><xsl:value-of select="." /></action>
then the result would be something like
<recipe>
<action>
get a bowl
</action>
<ingredient>flour</ingredient>
<ingredient>milk</ingredient>
<action>
mix it all together!
</action>
</recipe>

How to detect a leading space in the text

I want to add some strings in the text right after a leading space. Any ideas how to detect the leading space? Thanks.
For example, I would like to add "def" in front of abc but after the leading space.
<AAA>
<CCC> abc</CCC>
</AAA>
Output should become: " defabc"
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()[starts-with(.,' ')]">
<xsl:value-of select="concat(' ', 'def', substring(.,2))"/>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<AAA>
<CCC> abc</CCC>
</AAA>
produces the wanted, correct result:
<AAA>
<CCC> defabc</CCC>
</AAA>
Assuming, from your tag, that you are trying to do this in xslt, I'd use XSLT's starts-with function.
If you provide some example XSLT code, it'd be easier to explain more.
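Besides Dimitre's answer with proper use of pattern matching, this XPath expression could help you:
concat(substring($AddString, 1 div starts-with($String,' ')), $String)
In XPath 1.0, 1 div true() is 1, so substring() returns all of $AddString when $String starts with a space; otherwise 1 div false() is Infinity and substring() returns an empty string, leaving $String unchanged.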