XSLT-1.0 to read a specific substring from a message - xslt

I have the message below in a variable:
7c
 {"code":3001,"message":"issued"}
 0
I would like to extract the part of the message that starts with '{' and ends with '}' using XSLT. I tried using the substring() and starts-with() functions, but without success.
My final output should be:
{"code":3001,"message":"issued"}

In XSLT 2.0 you could use analyze-string with matching-substring inside, to process the captured regex.
Let's move to an example. Start with a source XML given below:
<?xml version="1.0" encoding="UTF-8"?>
<main>
<message>7c {"code":3001,"message":"issued"} 0</message>
</main>
Then we can use such XSLT:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="message">
<xsl:copy>
<xsl:analyze-string select="." regex="\{{(.*)\}}">
<xsl:matching-substring>
<xsl:value-of select="regex-group(1)"/>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:copy>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>
</xsl:transform>
Note the content of regex attribute.
In XSLT curly braces must be doubled in order to tell them apart from
an attribute value template.
But these curly braces are here literal curly braces, i.e. we are looking
just for { and } chars (they are not here as delimiters of repetition
counts for the preceding regex). For this reason we have to precede
each of them with a backslash.
Between these curly braces we have a capturing group (...).
We refer to the content of the captured group in regex-group(1) below.
If you need, you can put more capturing groups in the regex, to capture
individual parts of the message and then make some use of them.
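As a rough sketch of the multi-group idea, assuming the message always has exactly the shape shown above (a numeric code followed by a quoted message string), the stylesheet below captures the two values separately. The output element names code and text are made up for illustration and are not part of the original example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="message">
    <xsl:copy>
      <!-- The attribute is single-quoted so the regex can hold literal double quotes.
           The braces are doubled because regex is an attribute value template. -->
      <xsl:analyze-string select="."
          regex='\{{"code":(\d+),"message":"([^"]*)"\}}'>
        <xsl:matching-substring>
          <!-- Group 1 is the numeric code, group 2 is the message text -->
          <code><xsl:value-of select="regex-group(1)"/></code>
          <text><xsl:value-of select="regex-group(2)"/></text>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:copy>
  </xsl:template>
</xsl:transform>
```

Because there is no xsl:non-matching-substring, the text outside the braces (7c and 0) is simply dropped.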
But if you are really limited to XSLT 1.0 you can:
Start from substring-before to cut off } and everything after.
Then use substring-after to cut off { and everything before.
Or maybe you need the text with surrounding curly braces?
Then use concat to prepend { and append }.
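Put together, the XSLT 1.0 steps above might look like the following minimal sketch. It assumes the same source XML as before and a flat message with no nested braces, since substring-before and substring-after both stop at the first occurrence. The variable names head and body are only illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="message">
    <!-- Cut off the first } and everything after it -->
    <xsl:variable name="head" select="substring-before(., '}')"/>
    <!-- Cut off the first { and everything before it -->
    <xsl:variable name="body" select="substring-after($head, '{')"/>
    <!-- Put the surrounding braces back -->
    <xsl:value-of select="concat('{', $body, '}')"/>
  </xsl:template>
</xsl:stylesheet>
```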

I tried substring-after and before which give me text after '{' and
before '}'
If you're using XSLT 1.0, then do exactly that, and add the missing separators as text - for example:
<xsl:variable name="var">7c {"code":3001,"message":"issued"} 0 </xsl:variable>
<xsl:text>{</xsl:text>
<xsl:value-of select="substring-before(substring-after($var, '{'), '}')"/>
<xsl:text>}</xsl:text>
returns:
{"code":3001,"message":"issued"}
In XSLT 2.0, you could do simply:
<xsl:value-of select="replace($var, '.*(\{.*\}).*', '$1')"/>
to get the same result.
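One caveat: the message in the question appears to span several lines, and in XPath 2.0 regular expressions the dot does not match a newline unless the "s" flag is given. A small sketch of that variant, with a made-up multi-line value for $var:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical multi-line input; the character reference 10 is a line feed -->
    <xsl:variable name="var"
        select="'7c&#10;{&quot;code&quot;:3001,&quot;message&quot;:&quot;issued&quot;}&#10;0'"/>
    <!-- The 's' (dot-all) flag lets . match the line feeds around the braces too -->
    <xsl:value-of select="replace($var, '.*(\{.*\}).*', '$1', 's')"/>
  </xsl:template>
</xsl:stylesheet>
```

Without the flag, only the line containing the braces would be matched, so the 7c and 0 lines would stay in the result.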

Related

How to remove last N character using XSLT

I have the following code:
<xsl:value-of select="concat(substring(DBColumn, string-length(DBColumn)-3), concat('-', DBColumn))"/>
It gives me
230-Virginia-230.
I want it as 230-Virginia.
Originally, in the database, the value is Virginia-230.
Furthermore, given
ABC, 230-Virginia
how can I trim the whitespace in the same code so that the result looks like ABC,230-Virginia?
It's not clear what exactly your question is.
To answer the question as stated in your title: you can remove the last N characters from a string using:
substring($string, 1, string-length($string) - $N)
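For instance, a minimal self-contained sketch of that expression, using the value from the question and N = 4 (both hard-coded here purely for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="string" select="'Virginia-230'"/>
    <xsl:variable name="N" select="4"/>
    <!-- Keep characters 1 to (length - N); XPath string positions start at 1 -->
    <xsl:value-of select="substring($string, 1, string-length($string) - $N)"/>
  </xsl:template>
</xsl:stylesheet>
```

Here the string is 12 characters long, so the first 8 are kept and the result is Virginia.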
Trying to illustrate with an input document that contains the data that you mentioned:
<input>
<DBColumn>Virginia-230</DBColumn>
<other>ABC </other> <!-- N.B. trailing space -->
</input>
This XSLT 3.0 stylesheet does some of the things that you mentioned in the "proposed value". I've also included the input value and the "old-value" with the value-of expression that you mentioned in your post.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
exclude-result-prefixes="#all">
<xsl:output method="xml" indent="yes" />
<xsl:template match="/input">
<output>
<input-value><xsl:value-of select="DBColumn" /></input-value>
<old-output-value>
<xsl:value-of
select="concat(substring(DBColumn, string-length(DBColumn)-3),
concat('-', DBColumn))"/>
</old-output-value>
<proposed-value>
<xsl:value-of
select="normalize-space(other)
|| ',' ||
string-join(reverse(tokenize(DBColumn, '-')), '-')"
/>
</proposed-value>
</output>
</xsl:template>
</xsl:stylesheet>
which produces:
<output>
<input-value>Virginia-230</input-value>
<old-output-value>-230-Virginia-230</old-output-value>
<proposed-value>ABC,230-Virginia</proposed-value>
</output>
For an xsl:value-of that I believe works in XSLT 1.0 (but I won't guarantee it), you could try:
<xsl:value-of
select="concat(other, ',',
substring-after(DBColumn, '-'),
'-',
substring-before(DBColumn, '-'))" />
which does not address the trailing space in other but at least suggests how to reverse the two values around the '-' char in DBColumn.
For suggestions on removing leading/trailing spaces on string, see: XSLT 1.0 to remove leading and trailing spaces
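If collapsing whitespace inside the value is acceptable, one possible XSLT 1.0 sketch is to wrap other in normalize-space(), which strips leading and trailing whitespace and also reduces internal runs of whitespace to a single space. This assumes the same input document as above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/input">
    <!-- normalize-space drops the trailing space in other;
         the two parts of DBColumn are swapped around the '-' -->
    <xsl:value-of
        select="concat(normalize-space(other), ',',
                       substring-after(DBColumn, '-'),
                       '-',
                       substring-before(DBColumn, '-'))"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this should give ABC,230-Virginia.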

Can't get the "s" flag to work in regex in Saxon 9.5

I have an XML envelope/payload structure like this:
<RootEnvelopeTag>
<EnvelopeTag />
<EnvelopeTag />
<EnvelopeTagContainingPayload>
<WantedPayloadTag>Some text and nested tags.</WantedPayloadTag><UnwantedPayloadTag>Lots of text and nested tags.</UnwantedPayloadTag>
</EnvelopeTagContainingPayload>
</RootEnvelopeTag>
To extract the payload, by removing all envelope elements, I use the following XSLT:
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="text" encoding="utf-8"/>
<xsl:template match="/">
<xsl:apply-templates select="*/EnvelopeTagContainingPayload"/>
</xsl:template>
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:value-of select="."/>
</xsl:template>
</xsl:transform>
The result is a new text file that, once parsed as XML, allows me to work only with the payload XML.
This works fine in both Saxon HE 9.5, and AltovaXML 2013. However, I am now in the need to also remove part of the payload, specifically, one element, including the tags and all of its content (the <UnwantedPayloadTag>ALL TEXT IN BETWEEN</UnwantedPayloadTag>).
Since, in the original XML file, the payload is just a string, I use replace() with a regular expression that matches the unwanted element and the empty string as replacement string. I include the "s" flag, to get the "." in the regex to match newlines present within the unwanted element. So, the template for the container envelope element changes to:
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:variable name="removeUnwanted" as="xs:string" select="replace(., '&lt;UnwantedPayloadTag.*UnwantedPayloadTag>', '', 's')" />
<xsl:value-of select="$removeUnwanted"/>
</xsl:template>
In AltovaXML, this works seamlessly. The result is exactly as expected. But in Saxon, it wreaks havoc. No output is generated; instead, I get in the command line an endless repetition of the following error message that clutters the whole DOS command line window:
at net.sf.saxon.regex.Operation$OpStar.exec(Operation.java:235)
at net.sf.saxon.regex.REMatcher.matchNodes(REMatcher.java:413)
The problem appears only when I use the "s" flag. But if I drop it, I won't get the match. I tried an alternative that doesn't require the flag and does the same:
<xsl:variable name="removeUnwanted" as="xs:string" select="replace(., '&lt;UnwantedPayloadTag[\s\S]*UnwantedPayloadTag>', '')" />
But I get the same error on Saxon. And again, Altova gets it right. I'm unsure of whether the problem is on my code, since it works fine in Altova. But I would really like to get this to work in Saxon, as well. So, what's wrong?
Saxon 9.6 is now available, and even the Home Edition (HE) supports XPath 3.0 functions like parse-xml-fragment, so the right approach to your problem is now:
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:template match="/">
<xsl:apply-templates select="*/EnvelopeTagContainingPayload"/>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:apply-templates select="parse-xml-fragment(.)"/>
</xsl:template>
<xsl:template match="UnwantedPayloadTag"/>
</xsl:transform>
as that way you simply parse the markup as XML and then use templates to filter out any elements you don't want.
You're getting a stack overflow in the Saxon regular expression engine because there's too much backtracking. We've got a fix for that in the future 9.6 release, but in the meantime you need to be careful about regular expressions that do too much backtracking.
Really, your approach is wrong. Regular expressions should not be used to parse XML. Your expression is wrong, because it can match things that it shouldn't match, e.g. something in a comment that looks like an end tag. You can't get it right by tweaking the regex, because XML has a recursive grammar and regular expressions can't handle recursive grammars. Saxon provides parse-xml() for this purpose.

Escaping Double Quotes, Space and Allowing for an Extra Forward Slash

I have XML
<?xml version="1.0" encoding="UTF-8"?>
<icestats>
<stats_connections>0</stats_connections>
<source mount="/live">
<bitrate>Some data</bitrate>
<server_description>This is what I want to return</server_description>
</source>
</icestats>
And I have XSL
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:copy-of select="/icestats/source mount="/live"/server_description/node()" />
</xsl:template>
</xsl:stylesheet>
I want the output
This is what I want to return
If I remove the double quotes, space and forward slash from the source it works, but I haven't been able to successfully escape the non standard characters yet using suggested methods in other posts.
For clarity, below is the solution thanks to Lego Stormtroopr
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:copy-of select="/icestats/source[@mount='/live']/server_description/node()" />
</xsl:template>
</xsl:stylesheet>
There are a couple of issues you will need to resolve before your processor will produce the output you're looking for.
1) Your XML input must be made well-formed. The closing tag of the source element should not include the mount attribute that is specified on the opening tag.
<source mount="/live">
...
</source>
2) The XPath on your xsl:copy-of element must be made valid. The syntax for an XPath expression is (fortunately) not like the syntax for XML elements and attributes. Specifying which source element to match is done by predicating on an attribute value, like you have done, except that you need to use square brackets:
/icestats/source[@mount="/live"]/server_description
In order to use this XPath expression in an XSLT select statement, you will need to make sure that you enclose the entire select attribute value with one type of quotes, and use the other type of quotes within the attribute value, e.g.:
<xsl:value-of select="/icestats/source[@mount='/live']/server_description" />
With this input
<?xml version="1.0" encoding="UTF-8"?>
<icestats>
<stats_connections>0</stats_connections>
<source mount="/live">
<bitrate>Some data</bitrate>
<server_description>This is what I want to return</server_description>
</source>
</icestats>
and this stylesheet
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="/icestats/source[@mount='/live']/server_description" />
</xsl:template>
</xsl:stylesheet>
I get the following line of text from xsltproc and saxon:
This is what I want to return
The xsl:value-of element will return the string value of an element (here, that one text node). If you actually wanted the server_description element, then you can use xsl:copy-of to get the whole thing, tags and all. (You would have to update xsl:output as well.)
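To make the difference concrete, here is a small sketch that runs both instructions against the same input. The wrapper elements result, as-text and as-copy are invented for this illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <result>
      <!-- String value only: just the text of server_description -->
      <as-text>
        <xsl:value-of select="/icestats/source[@mount='/live']/server_description"/>
      </as-text>
      <!-- Deep copy: the server_description element itself, tags included -->
      <as-copy>
        <xsl:copy-of select="/icestats/source[@mount='/live']/server_description"/>
      </as-copy>
    </result>
  </xsl:template>
</xsl:stylesheet>
```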
It looks like you are doing a select based on the attribute, so you just need to properly capture the attribute in the XPath. The quotes you use in the document and the XPath don't need to match, so you can switch them to single quotes ('):
<xsl:copy-of select="/icestats/source[@mount='/live']/server_description/node()" />
(Edited to correct the missing / from the mount attribute.)
Also, your original document isn't valid XML, as XML doesn't allow attributes in the closing tag.
I think all you need to do is escape the quotes in the attribute string with &quot;:
<xsl:copy-of select="/icestats/source mount=&quot;/live&quot;/server_description/node()" />

whitespace URL in XSLT

I have an XSLT that does not encode the whitespace in URLs correctly.
Each space comes out as a bare % instead of %20.
URL:
http://localhost:8888/tire/details/Bridgestone/ECOPIA%EP001S/Bridgestone,ECOPIA%EP001S,195--65%R15%91H,TL,ECO,0
XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:x="http://www.w3.org/1999/xhtml" version="1.0">
<xsl:param name="extractorHost" />
<xsl:template match="/">
<links>
<xsl:apply-templates />
</links>
</xsl:template>
<xsl:template match="//x:form/x:a[@class='arrow-link forward']">
<xsl:variable name="url" select="translate(@href, ' ', '%20')"/>
<link href="{concat($extractorHost, $url)}" />
</xsl:template>
<xsl:template match="text()" />
</xsl:stylesheet>
The correct URL should be:
http://localhost:8888/tire/details/Bridgestone/ECOPIA%20EP001S/Bridgestone,ECOPIA%20EP001S,195--65%20R15%2091H,TL,ECO,0
Is my XSLT wrong? Thanks.
The XPath translate function doesn't work the way you think it does. That is, it is not a replace-string function.
It maps individual characters from one list to the corresponding characters in the other list.
So this,
translate(@href, ' ', '%20')
means, translate a space into %. The 20 part of the third argument is ignored.
Take a look here: XSLT string replace
You can use already existing templates that will let you use "replace" function.
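One common shape for such a template in XSLT 1.0 is a recursive named template. The sketch below is an illustration rather than the specific template from the linked answer; the name replace-string and its parameter names are made up here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml" version="1.0">
  <xsl:param name="extractorHost"/>

  <!-- Hypothetical helper: replaces every occurrence of $search in $text -->
  <xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="search"/>
    <xsl:param name="replacement"/>
    <xsl:choose>
      <!-- The empty-string check guards against endless recursion -->
      <xsl:when test="$search != '' and contains($text, $search)">
        <xsl:value-of select="substring-before($text, $search)"/>
        <xsl:value-of select="$replacement"/>
        <!-- Recurse on the remainder after the first match -->
        <xsl:call-template name="replace-string">
          <xsl:with-param name="text" select="substring-after($text, $search)"/>
          <xsl:with-param name="search" select="$search"/>
          <xsl:with-param name="replacement" select="$replacement"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <links>
      <xsl:apply-templates/>
    </links>
  </xsl:template>

  <xsl:template match="x:form/x:a[@class='arrow-link forward']">
    <!-- Replace each space with the three-character sequence %20 -->
    <xsl:variable name="url">
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="@href"/>
        <xsl:with-param name="search" select="' '"/>
        <xsl:with-param name="replacement" select="'%20'"/>
      </xsl:call-template>
    </xsl:variable>
    <link href="{concat($extractorHost, $url)}"/>
  </xsl:template>

  <xsl:template match="text()"/>
</xsl:stylesheet>
```

This only handles spaces. Other characters that are not allowed in URLs would need their own replacements, or an XSLT 2.0 processor with encode-for-uri().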

not adding new line in my XSLT

I am not certain why my xslt won't put a new line in my output...
This is my xslt....
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
<xsl:output method="text" encoding="iso-8859-1"/>
<xsl:variable name="newline"></xsl:variable>
<xsl:template name="FairWarningTransform" match="/"> <!--@* | node()">-->
<xsl:for-each select="//SelectFairWarningInformationResult">
<xsl:value-of select="ApplicationID"/>,<xsl:value-of select="USERID"/>
</xsl:for-each>
* Note. This report outlines Fair warning entries into reported for the above time frame.
</xsl:template>
</xsl:stylesheet>
Here is my output...
1,TEST1,test2,
I want it to look like...
1,TEST
1,test2,
Why isn't this character, &#xA;, creating a newline?
Try replacing
&#xA;
with
<xsl:text>&#xA;</xsl:text>
That helps XSLT distinguish it from other whitespace in your stylesheet that is part of the stylesheet formatting (not part of the desired output).
XSLT's default behavior is to ignore any text nodes in the stylesheet that are entirely whitespace (this is true even if some of the whitespace is encoded as entities like &#xA;), except for text inside <xsl:text>, which is preserved.
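Applied to the stylesheet from the question, that suggestion could look roughly like this. It is a trimmed-down sketch that keeps only the loop and assumes the same element names as the original.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="iso-8859-1"/>

  <xsl:template match="/">
    <xsl:for-each select="//SelectFairWarningInformationResult">
      <xsl:value-of select="ApplicationID"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="USERID"/>
      <!-- Text inside xsl:text is never stripped, so the line feed survives -->
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Putting the comma in its own xsl:text as well keeps the stylesheet indentation from leaking into the output.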
I suggest replacing these lines:
<xsl:value-of select="ApplicationID"/>,<xsl:value-of select="USERID"/>
with this:
<xsl:value-of select="concat(ApplicationID, ',', USERID, '&#xA;')"/>
That way the newline should be ensured to be included in the output.
Try using this as your newline instead of the escaped character:
<xsl:text>
</xsl:text>