Regex applied to replace function in XSLT 3 not working

I believe this is a simple problem. I'm trying to apply a regex in a replace() call inside a variable in XSLT 3, using the latest version of Saxon. I know replace() accepts a regex, but the one I'm using seems to be wrong: it works in Java but not in XSLT.
Here is my variable with the replace call:
<xsl:variable name="namePrefix" select="replace(@name, '/(.*_[^_]+)/')" />
I want the variable namePrefix to hold a specific part of the name of my node (taken from the attribute @name). Here is an example name:
ALBA_MASTER_FIX_Test
I want the regex and the replace call to give my variable everything before the last _.
Am I applying the regex correctly, or should I do it a different way or use a different regex? Thanks :)

To get "everything before the last _", I believe this could be simply:
replace(@name, '_[^_]*$', '')
Alternatively, you could use something like:
<xsl:value-of select="tokenize(@name, '_')[position() lt last()]" separator="_"/>
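As a quick sanity check outside an XSLT processor, the two suggestions above can be sketched in Python. This is only an analogue: Python's re module is not the XPath regex engine, although this particular pattern behaves the same way in both, and the variable names are made up for illustration.

```python
import re

name = "ALBA_MASTER_FIX_Test"

# Regex approach: remove the last underscore and everything after it.
prefix_by_regex = re.sub(r"_[^_]*$", "", name)

# Tokenize approach: split on "_" and rejoin every token except the last.
prefix_by_split = "_".join(name.split("_")[:-1])

# The two approaches differ when the name contains no underscore at all:
# the regex leaves the input unchanged, the split/join yields an empty string.
no_underscore_regex = re.sub(r"_[^_]*$", "", "ALBA")
no_underscore_split = "_".join("ALBA".split("_")[:-1])

print(prefix_by_regex)
print(prefix_by_split)
```

The same difference should apply to the XPath versions: replace() returns the input unchanged when nothing matches, while the tokenize() expression selects no tokens.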

This works in the very useful xslt3fiddle, which is based on SaxonJS:
replace(@name, '(.*)_(.*)', '$1')
You can see it used in this stylesheet:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all">
<xsl:output method="xml" indent="yes" />
<xsl:variable name="inputElement" as="element()" >
<input name="ALBA_MASTER_FIX_Test" />
</xsl:variable>
<xsl:template name="xsl:initial-template">
<output value="{replace($inputElement/@name, '(.*)_(.*)', '$1')}" />
</xsl:template>
</xsl:stylesheet>
with output
<output value="ALBA_MASTER_FIX"/>
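The reason '(.*)_(.*)' keeps everything before the last underscore is that the first group is greedy. A small Python sketch of that behaviour follows; it is an analogue only, since Python writes the group reference as \1 where XPath uses $1, and the variable names are illustrative.

```python
import re

name = "ALBA_MASTER_FIX_Test"

# Greedy first group: it grabs as much as possible, so the "_" in the
# pattern ends up matching the LAST underscore in the name.
greedy = re.sub(r"(.*)_(.*)", r"\1", name)

# Reluctant first group: it grabs as little as possible, so the "_" in the
# pattern matches the FIRST underscore instead.
reluctant = re.sub(r"(.*?)_(.*)", r"\1", name)

print(greedy)
print(reluctant)
```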

Related

Can't get the "s" flag to work in regex in Saxon 9.5

I have an XML envelope/payload structure like this:
<RootEnvelopeTag>
<EnvelopeTag />
<EnvelopeTag />
<EnvelopeTagContainingPayload>
<WantedPayloadTag>Some text and nested tags.</WantedPayloadTag><UnwantedPayloadTag>Lots of text and nested tags.</UnwantedPayloadTag>
</EnvelopeTagContainingPayload>
</RootEnvelopeTag>
To extract the payload, by removing all envelope elements, I use the following XSLT:
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="text" encoding="utf-8"/>
<xsl:template match="/">
<xsl:apply-templates select="*/EnvelopeTagContainingPayload"/>
</xsl:template>
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:value-of select="."/>
</xsl:template>
</xsl:transform>
The result is a new text file that, once parsed as XML, allows me to work only with the payload XML.
This works fine in both Saxon HE 9.5 and AltovaXML 2013. However, I now also need to remove part of the payload: one specific element, including its tags and all of its content (the <UnwantedPayloadTag>ALL TEXT IN BETWEEN</UnwantedPayloadTag>).
Since, in the original XML file, the payload is just a string, I use replace() with a regular expression that matches the unwanted element and the empty string as replacement string. I include the "s" flag, to get the "." in the regex to match newlines present within the unwanted element. So, the template for the container envelope element changes to:
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:variable name="removeUnwanted" as="xs:string" select="replace(., '&lt;UnwantedPayloadTag.*UnwantedPayloadTag>', '', 's')" />
<xsl:value-of select="$removeUnwanted"/>
</xsl:template>
In AltovaXML, this works seamlessly. The result is exactly as expected. But in Saxon, it wreaks havoc. No output is generated; instead, I get in the command line an endless repetition of the following error message that clutters the whole DOS command line window:
at net.sf.saxon.regex.Operation$OpStar.exec(Operation.java:235)
at net.sf.saxon.regex.REMatcher.matchNodes(REMatcher.java:413)
The problem appears only when I use the "s" flag. But if I drop it, I won't get the match. I tried an alternative that doesn't require the flag and does the same:
<xsl:variable name="removeUnwanted" as="xs:string" select="replace(., '&lt;UnwantedPayloadTag[\s\S]*UnwantedPayloadTag>', '')" />
But I get the same error in Saxon. And again, Altova gets it right. I'm unsure whether the problem is in my code, since it works fine in Altova. But I would really like to get this to work in Saxon as well. So, what's wrong?
As Saxon 9.6 is now available, and even the Home Edition (HE) supports XPath 3.0 functions like parse-xml-fragment, the right approach to your problem is now:
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:template match="/">
<xsl:apply-templates select="*/EnvelopeTagContainingPayload"/>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:apply-templates select="parse-xml-fragment(.)"/>
</xsl:template>
<xsl:template match="UnwantedPayloadTag"/>
</xsl:transform>
as that way you simply parse the markup as XML and then use templates to filter out any elements you don't want.
You're getting a stack overflow in the Saxon regular expression engine because there's too much backtracking. We've got a fix for that in the future 9.6 release, but in the meantime you need to be careful about regular expressions that do too much backtracking.
Really, your approach is wrong. Regular expressions should not be used to parse XML. Your expression is wrong, because it can match things that it shouldn't match, e.g. something in a comment that looks like an end tag. You can't get it right by tweaking the regex, because XML has a recursive grammar and regular expressions can't handle recursive grammars. Saxon provides parse-xml() for this purpose.
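The parse-then-filter idea can be sketched in Python as well, to show the shape of the approach without an XSLT processor. This is an illustration under assumptions, not the Saxon mechanism: the wrapper element is a hypothetical name added because a fragment may have several top-level elements, and the sketch only removes unwanted elements that are direct children of the fragment.

```python
import xml.etree.ElementTree as ET

# The payload as it would arrive: a string holding an XML fragment,
# including a newline inside the unwanted element.
payload = (
    "<WantedPayloadTag>Some text and nested tags.</WantedPayloadTag>"
    "<UnwantedPayloadTag>Lots of text\nand nested tags.</UnwantedPayloadTag>"
)

# Wrap the fragment in a hypothetical root so it parses as one document.
root = ET.fromstring("<wrapper>" + payload + "</wrapper>")

# Remove the unwanted elements by name instead of by regex.
for unwanted in root.findall("UnwantedPayloadTag"):
    root.remove(unwanted)

# Serialize the remaining top-level elements without the wrapper.
result = "".join(ET.tostring(child, encoding="unicode") for child in root)
print(result)
```

Because the parser understands the markup, text that merely looks like an end tag (inside a comment, for example) cannot cause a false match.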

How to use an xsl variable in a javascript block

I am trying to use an xsl variable in a javascript block in my xslt file and I am at my wit's end.
Here is the XSLT (edited for public consumption):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" indent="yes"/>
<xsl:template match="a">
<xsl:variable name="myVar" select="xpath to the node"/>
<script type='text/javascript'>
googletag.cmd.push(function() {
...
googletag.pubads().setTargeting('label', '<xsl:value-of select="$myVar"/>');
googletag.enableServices();
});
</script>
</xsl:template>
</xsl:stylesheet>
If I transform the XML in Oxygen, this code works fine. But when I run it through my servlet, which uses javax.xml.transform.Transformer.transform(Source xmlSource, Result outputTarget) throws TransformerException, I get this:
googletag.pubads().setTargeting('label', ' ');
Can anyone suggest a possible reason for this discrepancy between Oxygen and my servlet?
Try this instead as contents of your variable:
<xsl:variable name="myVar" select="'xpath to the node'"/>
This ensures the content is seen as a literal string. Possibly Oxygen is forgiving enough to ignore it and your Java is (correctly) not: it's an error, because inside the quoted string you should put an expression.
The outer double quotes do not take part in forming the expression, only what's inside. That way you can express either the string '1+2' or the sum 1+2 in a variable.

Escaping Double Quotes, Space and Allowing for an Extra Forward Slash

I have XML
<?xml version="1.0" encoding="UTF-8"?>
<icestats>
<stats_connections>0</stats_connections>
<source mount="/live">
<bitrate>Some data</bitrate>
<server_description>This is what I want to return</server_description>
</source>
</icestats>
And I have XSL
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:copy-of select="/icestats/source mount="/live"/server_description/node()" />
</xsl:template>
</xsl:stylesheet>
I want the output
This is what I want to return
If I remove the double quotes, space and forward slash from the source it works, but I haven't been able to successfully escape the non standard characters yet using suggested methods in other posts.
For clarity, below is the solution thanks to Lego Stormtroopr
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:copy-of select="/icestats/source[@mount='/live']/server_description/node()" />
</xsl:template>
</xsl:stylesheet>
There are a couple of issues you will need to resolve before your processor will produce the output you're looking for.
1) Your XML input must be made well-formed. The closing tag of the source element should not include the mount attribute that is specified on the opening tag.
<source mount="/live">
...
</source>
2) The XPath on your xsl:copy-of element must be made valid. The syntax for an XPath expression is (fortunately) not like the syntax for XML elements and attributes. Specifying which source element to match is done by predicating on an attribute value, like you have done, except that you need to use square brackets:
/icestats/source[@mount="/live"]/server_description
In order to use this XPath expression in an XSLT select statement, you will need to make sure that you enclose the entire select attribute value with one type of quotes, and use the other type of quotes within the attribute value, e.g.:
<xsl:value-of select="/icestats/source[@mount='/live']/server_description" />
With this input
<?xml version="1.0" encoding="UTF-8"?>
<icestats>
<stats_connections>0</stats_connections>
<source mount="/live">
<bitrate>Some data</bitrate>
<server_description>This is what I want to return</server_description>
</source>
</icestats>
and this stylesheet
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="/icestats/source[@mount='/live']/server_description" />
</xsl:template>
</xsl:stylesheet>
I get the following line of text from xsltproc and saxon:
This is what I want to return
The xsl:value-of element will return the string value of an element (here, that one text node). If you actually wanted the server_description element, then you can use xsl:copy-of to get the whole thing, tags and all. (You would have to update xsl:output as well.)
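The attribute predicate itself can be tried outside XSLT. The sketch below uses Python's ElementTree, which supports only a limited subset of XPath and evaluates paths relative to the root element, so the leading /icestats step is dropped. It is an analogue for checking the predicate syntax, not what xsltproc or Saxon do internally.

```python
import xml.etree.ElementTree as ET

doc = """<icestats>
  <stats_connections>0</stats_connections>
  <source mount="/live">
    <bitrate>Some data</bitrate>
    <server_description>This is what I want to return</server_description>
  </source>
</icestats>"""

root = ET.fromstring(doc)

# Square brackets select the source element by its mount attribute;
# single quotes inside the path avoid clashing with outer double quotes.
description = root.findtext("source[@mount='/live']/server_description")

# A mount value that does not exist selects nothing.
missing = root.findtext("source[@mount='/other']/server_description")

print(description)
```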
It looks like you are doing a select based on the attribute, so you just need to properly capture the attribute in the XPath. The quotes you use in the document and the XPath don't need to match, so you can switch them to single quotes ('):
<xsl:copy-of select="/icestats/source[@mount='/live']/server_description/node()" />
(Edited to correct the missing / from the mount attribute.)
Also, your original document isn't valid XML, as XML doesn't allow attributes in the closing tag.
I think all you need to do is escape the quotes in the attribute string with &quot;:
<xsl:copy-of select="/icestats/source mount=&quot;/live&quot;/server_description/node()" />

whitespace URL in XSLT

My XSLT does not encode whitespace in URLs correctly.
Instead of %20, it outputs only %.
URL:
http://localhost:8888/tire/details/Bridgestone/ECOPIA%EP001S/Bridgestone,ECOPIA%EP001S,195--65%R15%91H,TL,ECO,0
XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:x="http://www.w3.org/1999/xhtml" version="1.0">
<xsl:param name="extractorHost" />
<xsl:template match="/">
<links>
<xsl:apply-templates />
</links>
</xsl:template>
<xsl:template match="//x:form/x:a[@class='arrow-link forward']">
<xsl:variable name="url" select="translate(@href, ' ', '%20')"/>
<link href="{concat($extractorHost, $url)}" />
</xsl:template>
<xsl:template match="text()" />
</xsl:stylesheet>
The correct URL should be:
http://localhost:8888/tire/details/Bridgestone/ECOPIA%20EP001S/Bridgestone,ECOPIA%20EP001S,195--65%20R15%2091H,TL,ECO,0
Is my XSLT wrong? Thanks.
The XPath translate function doesn't work the way you think it does. That is, it is not a replace-string function.
It maps individual characters from one list to the corresponding characters in the other list.
So this,
translate(@href, ' ', '%20')
means, translate a space into %. The 20 part of the third argument is ignored.
Take a look here: XSLT string replace
You can use already existing templates that will let you use "replace" function.

Using xsl:value-of is converting & in url into &amp;

I have an XML document something like this:
<root>
<testxml>
<details name="test" url="http://www.test.com/test.aspx?val=100&val2=200" />
</testxml>
</root>
When I transform this XML to another XML using XSLT
<xsl:output method="html" indent="yes" encoding="UTF-8" />
<xsl:template match="root/testxml/details">
<convert>
<xsl:attribute name="url">
<xsl:value-of select="@url"/>
</xsl:attribute>
</convert>
I get this output:
<convert url="http://www.test.com/test.aspx?val=100&amp;val2=200">
instead of
<convert url="http://www.test.com/test.aspx?val=100&val2=200">
The issue is that & in the URL is changed to &amp;. How can I avoid this? I want & in the URL to stay as & (not &amp;).
I also tried <xsl:value-of disable-output-escaping="yes" select="@url" />, but it was no use.
Could anyone please help me with this?
"issue here is: & in url is getting changed to &amp;"
There is no issue, and the URL is not changed at all.
To verify this, use a transformation like this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/*">
<xsl:value-of select="@url"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<convert url="http://www.test.com/test.aspx?val=100&amp;val2=200"/>
The result is:
http://www.test.com/test.aspx?val=100&val2=200
So, the URL hasn't been changed in any way and there is still only a single & character in it.
What you observe is the mandatory escaping of some special characters (typically < and &) in XML, as dictated by the XML Spec.
The escaping of some special characters never alters the content of the string that is escaped -- only how it looks when it is a part of an XML document.
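The same point can be checked with any XML parser. The sketch below uses Python's ElementTree as a stand-in for the XSLT processor: the attribute value seen by a program contains a plain &, and the escape reappears only when the document is serialized again.

```python
import xml.etree.ElementTree as ET

# In the serialized document the ampersand must be written as &amp;.
document = '<convert url="http://www.test.com/test.aspx?val=100&amp;val2=200"/>'

element = ET.fromstring(document)

# The parsed attribute value is the real URL, with a single unescaped "&".
url = element.get("url")

# Serializing the element again restores the escape in the markup.
serialized = ET.tostring(element, encoding="unicode")

print(url)
print(serialized)
```

Any consumer that reads the output with an XML parser therefore receives the URL with a single &, which is why there is nothing to fix.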
"how can i avoid this, i want & in url as & (not &amp;)"
You cannot, and as shown above, there is nothing to avoid.
& has to be escaped as &amp; in a URL in an XML document. You can't have an unescaped ampersand like that.
Dimitre is quite correct - an ampersand in XML must always be escaped, and it's not clear why you're trying so hard to prevent it.
I'm puzzled though - you say you're generating XML, but you use the HTML output method, but you are generating a <convert> element which isn't defined in HTML. So I fear you are a little confused by something.