Replace special characters in XSLT

I want to remove all non-alphabetic characters from a string in XSLT. For example:
<Name>O'Niel</Name> = <Name>ONiel</Name>
<Name>St Peter</Name> = <Name>StPeter</Name>
<Name>A.David</Name> = <Name>ADavid</Name>
Can we use regular expressions in XSLT to do this? What is the right way to implement this?
EDIT: This needs to be done in XSLT 1.0.

There is a pure XSLT way to do this.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:variable name="vAllowedSymbols"
select="'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'"/>
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()">
<xsl:value-of select="
translate(
.,
translate(., $vAllowedSymbols, ''),
''
)
"/>
</xsl:template>
</xsl:stylesheet>
Result against this sample:
<t>
<Name>O'Niel</Name>
<Name>St Peter</Name>
<Name>A.David</Name>
</t>
Will be:
<t>
<Name>ONiel</Name>
<Name>StPeter</Name>
<Name>ADavid</Name>
</t>

Here's a 2.0 option:
EDIT: Sorry...the 1.0 requirement was added after I started on my answer.
XML
<?xml version="1.0" encoding="UTF-8"?>
<doc>
<Name>O'Niel</Name>
<Name>St Peter</Name>
<Name>A.David</Name>
</doc>
XSLT 2.0
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="*|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()">
<xsl:value-of select="replace(.,'[^a-zA-Z]','')"/>
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" encoding="UTF-8"?>
<doc>
<Name>ONiel</Name>
<Name>StPeter</Name>
<Name>ADavid</Name>
</doc>
Here are a couple more ways of using replace()...
Using "i" (case-insensitive mode) flag:
replace(.,'[^A-Z]','','i')
Using category escapes:
replace(.,'\P{L}','')

I just created a function based on the code in this example...
<xsl:function name="lancet:stripSpecialChars">
<xsl:param name="string" />
<xsl:variable name="AllowedSymbols" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789()*%$#@!~&lt;>,.?[]=- + /\ '"/>
<xsl:value-of select="
translate(
$string,
translate($string, $AllowedSymbols, ''),
' ')
"/>
</xsl:function>
and an example of the usage would be as follows:
<xsl:value-of select="lancet:stripSpecialChars($string)"/>

The quickest way is <xsl:value-of select="translate(Name,translate(Name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',''),'')" />
The inner translate removes the letters (the characters you want to keep), so its result contains only the unwanted characters. The outer translate then removes exactly those unwanted characters from the original string.
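To make the two steps visible outside XSLT, here is a minimal Python sketch. `xpath_translate` and `keep_only` are hypothetical helpers written for this illustration; `xpath_translate` only imitates the documented behaviour of XPath 1.0 `translate()` and is not part of any XSLT processor.

```python
import string


def xpath_translate(s, frm, to):
    """Hypothetical stand-in for XPath 1.0 translate(s, frm, to)."""
    out = []
    for ch in s:
        i = frm.find(ch)       # the first occurrence in frm decides the mapping
        if i == -1:
            out.append(ch)     # not listed in frm: keep unchanged
        elif i < len(to):
            out.append(to[i])  # listed, with a counterpart in to: replace
        # listed, but to is too short: the character is dropped
    return "".join(out)


LETTERS = string.ascii_letters  # a-z followed by A-Z


def keep_only(s, allowed=LETTERS, filler=""):
    unwanted = xpath_translate(s, allowed, "")   # inner call: everything that must go
    return xpath_translate(s, unwanted, filler)  # outer call: remove (or map) it


for name in ("O'Niel", "St Peter", "A.David"):
    print(name, "->", keep_only(name))
```

The same helper also shows what a one-character third argument such as `' '` (as in the `lancet:stripSpecialChars` function above) would do: only the first unwanted character found is mapped to a space, and any further unwanted characters are dropped, because the replacement string is shorter than the list of characters to replace.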

Related

xsl:strip-space combined with xsl:text messes up automatic indentation

Using XSLT 1.0 I would like to comment out certain XML elements and replace other XML elements, while keeping the XML nicely formatted.
For example, the following XML document
<doc>
<e1>foo</e1>
<e2>bar</e2>
</doc>
should be converted to
<doc>
<!--<e1>foo</e1>-->
<e3>foobar</e3>
<e4>foobar</e4>
</doc>
I am using the following XSL transformation and xsltproc for testing it:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*" />
<xsl:output indent="yes" />
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
</xsl:copy>
</xsl:template>
<xsl:template match="e1">
<xsl:text disable-output-escaping="yes">&lt;!--</xsl:text> <!--*-->
<xsl:copy>
<xsl:apply-templates />
</xsl:copy>
<xsl:text disable-output-escaping="yes">--&gt;</xsl:text> <!--*-->
</xsl:template>
<xsl:template match="e2">
<e3>foobar</e3><e4>foobar</e4>
</xsl:template>
</xsl:stylesheet>
But what I get is this:
<doc><!--<e1>foo</e1>--><e3>foobar</e3><e4>foobar</e4></doc>
The problem seems to be caused by the lines marked with '*' in my transformation; more specifically, by inserting <!-- and -->. When I remove these two elements, the result is indented as expected.
Is there a way to wrap elements in comments while still keeping the output document nicely formatted?
Try whether outputting a comment with the serialization of the element, as in
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:import href="http://lenzconsulting.com/xml-to-string/xml-to-string.xsl"/>
<xsl:strip-space elements="*"/>
<xsl:output indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
</xsl:copy>
</xsl:template>
<xsl:template match="e1">
<xsl:comment>
<xsl:call-template name="xml-to-string"></xsl:call-template>
</xsl:comment>
</xsl:template>
<xsl:template match="e2">
<e3>foobar</e3><e4>foobar</e4>
</xsl:template>
</xsl:stylesheet>
gives you a better result.

XSLT regular expression to remove sequences text

I have an XML, something like this:
<?xml version="1.0" encoding="UTF-8"?>
<earth>
<computer>
<parts>;;remove;;This should stay;;remove too;;This stay;;yeah also remove;;this stay </parts>
</computer>
</earth>
I want to create an XSLT 2.0 transform to remove all text which starts and ends with ;;
<?xml version="1.0" encoding="utf-8"?>
<earth>
<computer>
<parts>This should stay This stay this stay </parts>
</computer>
</earth>
I tried something like this, but with no luck:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fn="http://www.w3.org/2005/xpath-functions"
exclude-result-prefixes="fn">
<xsl:output encoding="utf-8" method="xml" indent="yes" />
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="parts">
<xsl:element name="parts" >
<xsl:value-of select="replace(., ';;.*;;','')" />
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Wow, what a dumb way to markup text. You have XML at your disposal, why not use it? And even if marking this way, why not use different symbols for opening and closing the marked parts?
Anyway, I believe this returns the expected result:
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="parts">
<xsl:copy>
<xsl:value-of select="replace(., ';;.+?;;', '')" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Another approach would be to tokenize on ";;" as the separator, then remove all even-numbered tokens:
<xsl:template match="parts">
<parts>
<xsl:value-of select="tokenize(.,';;')[position() mod 2 = 1]"
separator=""/>
</parts>
</xsl:template>
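The difference between the greedy pattern from the question, the lazy pattern, and the tokenizing approach can be checked outside XSLT. The following is a rough Python sketch, assuming that Python's `re.sub` and `str.split` behave like XPath 2.0 `replace()` and `tokenize()` for this simple input; the variable names are made up for the illustration.

```python
import re

text = ";;remove;;This should stay;;remove too;;This stay;;yeah also remove;;this stay "

# Greedy: .* runs from the first ";;" to the last ";;", swallowing the text to keep.
greedy = re.sub(r";;.*;;", "", text)

# Lazy: .+? stops at the nearest closing ";;", so each marked section is removed on its own.
lazy = re.sub(r";;.+?;;", "", text)

# Tokenize on ";;" and keep tokens 1, 3, 5, ... (indexes 0, 2, 4, ... in Python).
tokens = text.split(";;")
odd_tokens = "".join(tokens[0::2])

print(repr(greedy))
print(repr(lazy))
print(repr(odd_tokens))
```

Note that neither variant inserts a space where a marked section was removed, so the kept pieces end up directly next to each other.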
XSLT 1.0
For this kind of thing I'd use recursion. With the string functions substring-before() and substring-after() you can get what comes before and after a certain character (or set of characters). All you need to do is keep processing the string until there are no more occurrences of the delimiter, as follows:
<xsl:template name="string-remove-between">
<xsl:param name="text" />
<xsl:param name="remove" />
<xsl:choose>
<xsl:when test="contains($text, $remove)">
<xsl:value-of select="substring-before($text,$remove)" />
<xsl:call-template name="string-remove-between">
<xsl:with-param name="text" select="substring-after(substring-after($text,$remove), $remove)" />
<xsl:with-param name="remove" select="$remove" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
Then you'd just call the template with your text and the section you want to remove:
<xsl:call-template name="string-remove-between">
<xsl:with-param name="text" select="parts"/>
<xsl:with-param name="remove">;;</xsl:with-param>
</xsl:call-template>
Note that there are two substring-after calls. This skips ahead to the second instance of the delimiter ';;', so the text between the two delimiters is not pulled in.
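The recursion can be traced in a few lines of Python. This is only a sketch of the same logic, assuming a non-empty delimiter; `str.partition` stands in for `substring-before` and `substring-after`, and the function name simply mirrors the template name.

```python
def string_remove_between(text, remove):
    """Drop every section enclosed by a pair of `remove` delimiters."""
    if remove not in text:
        return text                         # xsl:otherwise: nothing left to remove
    before, _, rest = text.partition(remove)  # substring-before / first substring-after
    _, _, after = rest.partition(remove)      # second substring-after skips the section
    return before + string_remove_between(after, remove)


sample = ";;remove;;This should stay;;remove too;;This stay;;yeah also remove;;this stay "
print(repr(string_remove_between(sample, ";;")))
```

One consequence of the double skip: if a delimiter has no closing partner, everything after it is dropped, because the second lookup finds nothing and returns an empty string.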

Change the value of a tag only if another tag has an specific value

I'm using this XSL to change two tags in some XML:
xsl
xsltproc - "filename" << EOF
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="root/attr1/text()">
<xsl:text>new-text</xsl:text>
</xsl:template>
<xsl:template match="root/group1/attr1/text()">
<xsl:text>another-new-text</xsl:text>
</xsl:template>
</xsl:stylesheet>
EOF
xml
<root>
<attr1>someold</attr1>
<group1>
<attr1>anotherold</attr1>
</group1>
<attr2>0</attr2>
</root>
output
<root>
<attr1>new-text</attr1>
<group1>
<attr1>another-new-text</attr1>
</group1>
<attr2>0</attr2>
</root>
This XSL works great for my needs, but now I need to check attr2 before the transformation. If attr2 is 0, I need to change the values; otherwise I should leave the old values.
I have hundreds of XML files to convert, each one with hundreds of lines, so I'm looking for an automatic way to do this check. I tried xsl:if but couldn't figure out where to place the tag or how to build the test attribute.
How can I change the value of a tag only if another tag has a specific value? Other improvements to the XSL are also welcome.
You can add conditions in match patterns, e.g. <xsl:template match="root[attr2 = 0]/attr1/text()">...</xsl:template> and/or <xsl:template match="root[attr2 = 0]/group1/attr1/text()">.
You can store attr2 in a variable and test that variable in your conditions:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:variable name="attr2" select="/root/attr2"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="root/attr1/text()">
<xsl:choose>
<!-- keep the old value unless attr2 is 0 -->
<xsl:when test="$attr2 != 0">
<xsl:value-of select="."/>
</xsl:when>
<xsl:otherwise>
<xsl:text>new-text</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- wrap this in the same xsl:choose if it should also depend on attr2 -->
<xsl:template match="root/group1/attr1/text()">
<xsl:text>another-new-text</xsl:text>
</xsl:template>
</xsl:stylesheet>

regex error in XSL

Regex should verify such words a23-abcefghijk
<xsl:variable name="myRegex">
<xsl:value-of select="a\\d{2}\\-[a-zA-Z]+" />
</xsl:variable>
but I am getting a syntax error. I tried escaping characters but haven't found a solution yet.
The value of your variable needs to be a string, so you need to quote it inside the select...
<xsl:variable name="myRegex" select="'a\d{2}-[a-zA-Z]+'"/>
As it is now, the processor is trying to evaluate your select as an XPath expression.
Also note that if you're using XSLT 1.0, you're going to have to use an extension function (and a processor that supports it).
Here's a 2.0 example...
XML Input
<doc>
<test>a23-abcefghijk</test>
<test>qwerty</test>
</doc>
XSLT 2.0
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="myRegex" select="'a\d{2}-[a-zA-Z]+'"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="test">
<test matches="{if (matches(.,$myRegex)) then 'yes' else 'no'}">
<xsl:value-of select="."/>
</test>
</xsl:template>
</xsl:stylesheet>
XML Output
<doc>
<test matches="yes">a23-abcefghijk</test>
<test matches="no">qwerty</test>
</doc>
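The pattern itself can be sanity-checked outside XSLT. Below is a small Python sketch; it assumes that `re.search` is a fair stand-in for XPath 2.0 `matches()`, which also succeeds when the pattern matches any part of the string. The function and constant names are made up for the illustration.

```python
import re

MY_REGEX = r"a\d{2}-[a-zA-Z]+"


def matches(value, pattern=MY_REGEX):
    # Like XPath matches(): true if the pattern matches anywhere in the value.
    return re.search(pattern, value) is not None


for value in ("a23-abcefghijk", "qwerty", "xxa23-abc!!"):
    print(value, "->", "yes" if matches(value) else "no")
```

Because the match is not anchored, a value such as `xxa23-abc!!` also counts as a match. If the whole value has to have this shape, wrap the pattern in `^` and `$`.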
<xsl:variable name="myRegex">
<xsl:value-of select="regExp:match('This is a test string', '(a\d{2}\-[a-zA-Z]+)', 'g')" />
</xsl:variable>

Removing empty tags from XML via XSLT

I had an xml of the following pattern
<?xml version="1.0" encoding="UTF-8"?>
<Person>
<FirstName>Ahmed</FirstName>
<MiddleName/>
<LastName>Aboulnaga</LastName>
<CompanyInfo>
<CompanyName>IPN Web</CompanyName>
<Title/>
<Role></Role>
<Department>
</Department>
</CompanyInfo>
</Person>
I used the following xslt (got from forums) in my attempt to remove empty tags
<xsl:template match="@*|node()">
<xsl:if test=". != '' or ./@* != ''">
<xsl:copy>
<xsl:copy-of select = "@*"/>
<xsl:apply-templates />
</xsl:copy>
</xsl:if>
</xsl:template>
The xslt used is successful in removing tags like
<Title/>
<Role></Role>
...but fails when empty tags are on two lines, eg:
<Department>
</Department>
Is there any fix for this?
This transformation doesn't need any conditional XSLT instructions at all and uses no explicit priorities:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match=
"*[not(@*|*|comment()|processing-instruction())
and normalize-space()=''
]"/>
</xsl:stylesheet>
When applied on the provided XML document:
<Person>
<FirstName>Ahmed</FirstName>
<MiddleName/>
<LastName>Aboulnaga</LastName>
<CompanyInfo>
<CompanyName>IPN Web</CompanyName>
<Title/>
<Role></Role>
<Department>
</Department>
</CompanyInfo>
</Person>
it produces the wanted, correct result:
<Person>
<FirstName>Ahmed</FirstName>
<LastName>Aboulnaga</LastName>
<CompanyInfo>
<CompanyName>IPN Web</CompanyName>
</CompanyInfo>
</Person>
<xsl:template match="@*|node()">
<xsl:if test="normalize-space(.) != '' or ./@* != ''">
<xsl:copy>
<xsl:copy-of select = "@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:if>
</xsl:template>
(..) Is there any fix for this?
The tag that spans two lines is not an empty tag: it contains whitespace (a newline and possibly other whitespace characters). The XPath 1.0 function normalize-space() strips leading and trailing whitespace and collapses each internal run of whitespace into a single space, so whitespace-only content becomes the empty string.
Once you have applied the function to the tag content, you can check for the empty string. A good way to do this is to apply the XPath 1.0 boolean() function to the result: if it is a zero-length string, the result is false.
Finally, you can build this check into a slightly modified identity transform. You do not need xsl:if instructions or any additional template.
The final transform will look like this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates
select="node()[boolean(normalize-space())]
|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
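The whitespace point can be seen directly by inspecting the text content of the elements. This is a minimal Python sketch, not XSLT: `normalize_space` is a hypothetical helper that imitates XPath 1.0 `normalize-space()` (which treats space, tab, carriage return and line feed as whitespace), and the sample is a cut-down copy of the document from the question.

```python
import re
import xml.etree.ElementTree as ET


def normalize_space(s):
    """Hypothetical stand-in for XPath 1.0 normalize-space()."""
    # Collapse runs of space, tab, CR and LF, then trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


SAMPLE = """<CompanyInfo>
  <CompanyName>IPN Web</CompanyName>
  <Title/>
  <Role></Role>
  <Department>
  </Department>
</CompanyInfo>"""

info = ET.fromstring(SAMPLE)
dept_text = info.find("Department").text or ""
role_text = info.find("Role").text or ""

print(repr(role_text))                   # truly empty
print(repr(dept_text))                   # a newline plus indentation, so not empty
print(repr(normalize_space(dept_text)))  # empty once normalized
```

This is why a plain comparison with `''` keeps `<Department>`, while a test on the normalized value treats it like `<Title/>` and `<Role></Role>`.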
Additional note
Your xsl:if instruction currently also checks for empty attributes. That way you are actually also removing non-empty tags with empty attributes, which does not sound like just "removing empty tags". So be careful: either your question is missing some detail, or you are using unsafe code.
Your question is underspecified. What does empty mean? Is <outer> empty here?
<outer><inner/></outer>
Anyway, here's one approach that might fit your bill:
<xsl:template match="*[not(.//@*) and not( normalize-space() )]" priority="3"/>
Note you might have to tweak the priority to fit your needs.
From what I have found on the net, this is the most correct answer:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>
<xsl:template match="/">
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="*">
<xsl:if test=".!=''">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
You can use the following xslt to remove empty tags/attributes:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="node()">
<xsl:if test="normalize-space(string(.)) != ''
or count(@*[normalize-space(string(.)) != '']) > 0
or count(descendant::*[normalize-space(string(.)) != '']) > 0
or count(descendant::*/@*[normalize-space(string(.)) != '']) > 0">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:if>
</xsl:template>
<xsl:template match="@*">
<xsl:if test="normalize-space(string(.)) != ''">
<xsl:copy>
<xsl:apply-templates select="@*" />
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>