I'm trying to split a tab-separated value string using substring-before, but I'm having a hard time because the XSLT mediator does not behave as I expect.
The XSLT stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:p576="http://p576.test.ws" version="1.0">
<xsl:output encoding="UTF-8" method="xml" indent="yes"></xsl:output>
<xsl:param name="NAMESPACE"></xsl:param>
<xsl:param name="LOG_ID"></xsl:param>
<xsl:template match="/">
<xsl:element name="Response" namespace="{$NAMESPACE}">
<xsl:if test="$LOG_ID">
<xsl:element name="LogId" namespace="{$NAMESPACE}">
<xsl:value-of select="$LOG_ID"></xsl:value-of>
</xsl:element>
</xsl:if>
<xsl:element name="Text" namespace="{$NAMESPACE}">
<xsl:value-of select="substring-before(//p576:execMRPCResponse/p576:execMRPCReturn, '&#9;')"></xsl:value-of>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
The string:
999900418559 59 4730 Payment Created & Posted
Instead of cutting the string at "999900418559", it cuts it here: "999900418559 59 4730 Payment", where a space is. I get the same result if I use substring-before with '&#32;', which suggests that tabs and spaces are both treated as spaces when the string is searched.
Is there any way to keep this from happening? Or, if the problem is with the version of Saxon used by WSO2, should I update it (I'm using WSO2ESB v4.8.1)?
If tabs in the source document are being converted to spaces, then this is happening before Saxon gets to see the data. It could happen, for example, if the data is put through a schema validator and the type of the relevant element uses a whitespace facet of "collapse".
To prevent it happening, you first need to find out when and where it is happening, and there are no clues to that in your post (at least for someone who has never heard of WSO2ESB).
As a first diagnostic step, try to capture the exact form of the source document supplied as input to the transformation, and check the state of the whitespace it contains.
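One way to run that check from inside the transformation itself is to print the code point of every character the processor actually sees. This is only a sketch: it assumes an XSLT 2.0 processor (the question mentions Saxon), and it reuses the namespace and element names from the stylesheet in the question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:p576="http://p576.test.ws" version="2.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Print the code point of every character in the first matching element.
         A 9 in the output means a tab reached the processor; 32 is a space. -->
    <xsl:value-of
        select="string-to-codepoints((//p576:execMRPCResponse/p576:execMRPCReturn)[1])"
        separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

If no 9 appears in the output, the tabs were already lost before the transformation started.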
Having written that, I think there is another possibility, which is that the character reference in your stylesheet is being corrupted by whatever process it is that compiles the stylesheet. You could defend against that by replacing '&#9;' with codepoints-to-string(9).
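A minimal sketch of that defence, assuming the processor accepts XSLT 2.0 (codepoints-to-string is not available in XSLT 1.0). The variable name TAB is made up for illustration; the element names come from the question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:p576="http://p576.test.ws" version="2.0">
  <xsl:output method="text"/>
  <!-- Build the tab from its code point, so there is no character
       reference in the stylesheet that could be altered upstream. -->
  <xsl:variable name="TAB" select="codepoints-to-string(9)"/>
  <xsl:template match="/">
    <xsl:value-of
        select="substring-before((//p576:execMRPCResponse/p576:execMRPCReturn)[1], $TAB)"/>
  </xsl:template>
</xsl:stylesheet>
```

This only helps if the tabs are still present in the input; it does nothing if they were replaced before the transformation ran.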
I have an XML envelope/payload structure like this:
<RootEnvelopeTag>
<EnvelopeTag />
<EnvelopeTag />
<EnvelopeTagContainingPayload>
<WantedPayloadTag>Some text and nested tags.</WantedPayloadTag><UnwantedPayloadTag>Lots of text and nested tags.</UnwantedPayloadTag>
</EnvelopeTagContainingPayload>
</RootEnvelopeTag>
To extract the payload, by removing all envelope elements, I use the following XSLT:
<xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="text" encoding="utf-8"/>
<xsl:template match="/">
<xsl:apply-templates select="*/EnvelopeTagContainingPayload"/>
</xsl:template>
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:value-of select="."/>
</xsl:template>
</xsl:transform>
The result is a new text file that, once parsed as XML, allows me to work only with the payload XML.
This works fine in both Saxon HE 9.5 and AltovaXML 2013. However, I now also need to remove part of the payload: specifically, one element, including the tags and all of its content (the <UnwantedPayloadTag>ALL TEXT IN BETWEEN</UnwantedPayloadTag>).
Since, in the original XML file, the payload is just a string, I use replace() with a regular expression that matches the unwanted element and the empty string as replacement string. I include the "s" flag, to get the "." in the regex to match newlines present within the unwanted element. So, the template for the container envelope element changes to:
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:variable name="removeUnwanted" as="xs:string" select="replace(., '&lt;UnwantedPayloadTag.*UnwantedPayloadTag>', '', 's')" />
<xsl:value-of select="$removeUnwanted"/>
</xsl:template>
In AltovaXML, this works seamlessly. The result is exactly as expected. But in Saxon, it wreaks havoc. No output is generated; instead, I get in the command line an endless repetition of the following error message that clutters the whole DOS command line window:
at net.sf.saxon.regex.Operation$OpStar.exec(Operation.java:235)
at net.sf.saxon.regex.REMatcher.matchNodes(REMatcher.java:413)
The problem appears only when I use the "s" flag. But if I drop it, I won't get the match. I tried an alternative that doesn't require the flag and does the same:
<xsl:variable name="removeUnwanted" as="xs:string" select="replace(., '&lt;UnwantedPayloadTag[\s\S]*UnwantedPayloadTag>', '')" />
But I get the same error on Saxon. And again, Altova gets it right. I'm unsure whether the problem is in my code, since it works fine in Altova. But I would really like to get this to work in Saxon as well. So, what's wrong?
Now that Saxon 9.6 is available, and even the Home Edition (HE) supports XPath 3.0 functions such as parse-xml-fragment, the right approach to your problem is to use
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:template match="/">
<xsl:apply-templates select="*/EnvelopeTagContainingPayload"/>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* , node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="EnvelopeTagContainingPayload">
<xsl:apply-templates select="parse-xml-fragment(.)"/>
</xsl:template>
<xsl:template match="UnwantedPayloadTag"/>
</xsl:transform>
as that way you simply parse the markup as XML and then use templates to filter out any elements you don't want.
You're getting a stack overflow in the Saxon regular expression engine because there's too much backtracking. We've got a fix for that in the future 9.6 release, but in the meantime you need to be careful about regular expressions that do too much backtracking.
Really, your approach is wrong. Regular expressions should not be used to parse XML. Your expression is wrong, because it can match things that it shouldn't match, e.g. something in a comment that looks like an end tag. You can't get it right by tweaking the regex, because XML has a recursive grammar and regular expressions can't handle recursive grammars. Saxon provides parse-xml() for this purpose.
Short question: How do I deal with a raw ampersand in an XML input file?
ADDED: I'm not even selecting the field with the ampersand. The parser complains at the presence of the ampersand within the file.
Long explanation:
I'm dealing with XML that is generated from a URL response.
<NOTE>I%20hope%20this%20won%27t%20require%20a%20signature%3f%20%20
There%20should%20be%20painters%20%26%20stone%20guys%20at%20the
%20house%20on%20Wednesday%2c%20but%20depending%20on%20what%20time%20
it%20is%20delivered%2c%20I%20can%27t%20guarantee%21%20%20
Also%2c%20just%20want%20to%20make%20sure%20the%20billing%20address
%20is%20different%20from%20shipping%20address%3f
</NOTE>
which is url decoded into this:
<NOTE>I hope this won't require a signature?
There should be painters & stone guys at the
house on Wednesday, but depending on what time it is delivered, I can't guarantee!
Also, just want to make sure the billing address is different from shipping address?
</NOTE>
The Problem:
xsltproc chokes on that last string because of the '&' in "painters & stone guys"
with the following error:
xmlParseEntityRef: no name
<NOTE>I hope this won't require a signature? There should be painters &
It looks like xsltproc expects a closing </NOTE>
I've tried all manner of disable-output-escaping="yes" in various locations: xsl:output and xsl:value-of.
I also tried xsltproc --decode-uri but can't figure that one out. No documentation.
Note:
I wonder if it's worth keeping the input in URL-encoded format and using a DOCTYPE such as the following (not sure how to do that). The output eventually goes to a browser.
<!DOCTYPE xsl:stylesheet [
<!ENTITY nbsp "&#160;">
<!ENTITY copy "&#169;">
<!ENTITY reg "&#174;">
]>
The XML is malformed if there's an ampersand that's not escaped. If you put the string inside <![CDATA[]]>, then it should work.
<NOTE><![CDATA[I hope this won't require a signature?
There should be painters & stone guys at the
house on Wednesday, but depending on what time it is delivered, I can't guarantee!
Also, just want to make sure the billing address is different from shipping address?]]>
</NOTE>
Or, of course, use &amp; instead of &.
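For illustration, the escaped form of the offending line would look roughly like this (the text is shortened from the question; nothing else is assumed):

```xml
<!-- The raw ampersand is written as the predefined entity, so the parser accepts it. -->
<NOTE>There should be painters &amp; stone guys at the house on Wednesday</NOTE>
```

An XSLT processor reading this sees a single "&" character in the text node, so nothing needs to be undone in the stylesheet.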
Edit: You can also translate the URL escapes into numeric character references if the XSLT processor supports disable-output-escaping (and xsltproc does):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="NOTE">
<xsl:copy>
<xsl:call-template name="decodeURL"/>
</xsl:copy>
</xsl:template>
<xsl:template name="decodeURL">
<xsl:param name="URL" select="string()"/>
<xsl:choose>
<xsl:when test="contains($URL,'%')">
<xsl:variable name="remainingURL" select="substring-after($URL,'%')"/>
<xsl:value-of disable-output-escaping="yes" select="concat(
substring-before($URL,'%'),
'&amp;#x',
substring($remainingURL,1,2),
';')"/>
<xsl:call-template name="decodeURL">
<xsl:with-param name="URL" select="substring($remainingURL,3)"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$URL"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Of course you don't have to use this transformation as a preprocessing step, you can re-use decodeURL in a stylesheet that transforms your source XML that includes the URL encoded string to HTML or whatever.
Here I have pasted a sample XML for tag 50G. Earlier I fetched the data from the tag below by splitting on CRLF, but now I want to split on line feed instead, because I need the data to come out correctly. For example, asking for line 1 should return AE012345677890,
and asking for line 2 should return Bank code. How can I refer to a line feed in XSLT?
<local>
<message>
<block4>
<tag>
<name>50G</name>
<value>AE012345677890
Bank code
country name
country code</value>
</tag>
</block4>
</message>
</local>
output required :
AE012345677890,Bank code,country name,country code
It's obviously bad use of XML. The point of XML is that you shouldn't need any other parsing and here you do need one, namely splitting on newline. Anyway, when you already have that, you can split on newline using the core XPath functions substring-before and substring-after.
First line should be something like
substring-before(value, '&#10;')
(that's an XPath expression, so you have to put it into an <xsl:value-of> or similar tag) and the remaining lines should be
substring-after(value, '&#10;')
You can combine these two, so the second line is
substring-before(substring-after(value, '&#10;'), '&#10;')
the third line is
substring-before(substring-after(substring-after(value, '&#10;'), '&#10;'), '&#10;')
etc.
PS: I am not sure whether you need to use &#10; or \n for newline.
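The nesting above gets unwieldy after a few lines, so the same substring-before / substring-after idea can be wrapped in a recursive named template. This is a sketch in plain XSLT 1.0; the template name join-lines and its parameter name are made up for illustration, and it assumes the lines are separated by &#10; as in the sample.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="value">
    <xsl:call-template name="join-lines">
      <xsl:with-param name="text" select="string(.)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Emits the part before the first line feed, then a comma,
       then recurses on the remainder until no line feed is left. -->
  <xsl:template name="join-lines">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&#10;')">
        <xsl:value-of select="normalize-space(substring-before($text, '&#10;'))"/>
        <xsl:text>,</xsl:text>
        <xsl:call-template name="join-lines">
          <xsl:with-param name="text" select="substring-after($text, '&#10;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="normalize-space($text)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Suppress all other text, such as the content of <name>. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

normalize-space also strips a trailing carriage return, so CRLF input should come out the same way. An empty line in the input would produce an empty field between two commas.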
Depending on the value space for the different constituent types (e.g. if it is known that they don't contain a space), one of these simple XSLT 1.0 solutions may be what you need:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="value">
<xsl:value-of select=
"translate(., '&#xA; ', ',')"/>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
Produces:
AE012345677890,Bankcode,countryname,countrycode
And this transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="value">
<xsl:value-of select=
"normalize-space(translate(., '&#xA;', ','))"/>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
produces:
AE012345677890, Bank code, country name, country code
If neither of these two XSLT 1.0 transformations satisfies your requirements, you may need to perform a trim operation. There is a trim function/template in FXSL -- ready to use.
II. A quick XSLT 2.0 solution:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="value">
<xsl:variable name="vLines" select="tokenize(., '&#xD;?&#xA;')"/>
<xsl:for-each select="$vLines">
<xsl:value-of select=
"translate(replace(., '(^[ \t\r]+)|([ \t\r]+$)', '~~'), '~', '')"/>
<xsl:if test="not(position() eq last())">,</xsl:if>
</xsl:for-each>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
produces exactly the wanted result:
AE012345677890,Bank code,country name,country code
If you're using XSLT 2.0, you can also do this using the tokenize function:
<xsl:template match="value">
<!-- loop through each segment that's before a line break, output
its normalised value and add a comma if required -->
<xsl:for-each select="tokenize(., '&#10;')">
<xsl:value-of select="normalize-space(current())"/>
<xsl:if test="not(position()=last())">,</xsl:if>
</xsl:for-each>
</xsl:template>
This produces the desired result:
AE012345677890,Bank code,country name,country code
(As Dimitre Novatchev points out below, it will also collapse multiple white space, ie: spaces and tabs inside each line, into one single space, so you might want to experiment and see if that's okay with your data)
If you are limited to XSLT 1.0, you may be able to implement the EXSLT library which also contains tokenize (see the tokenize page and click on "How To" in the upper lefthand menu for more information on implementing the library).
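As a rough sketch of the EXSLT route: str:tokenize lives in the http://exslt.org/strings namespace and returns a node-set of token elements. This assumes your XSLT 1.0 processor actually implements that extension (xsltproc does; check the documentation for others).

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    exclude-result-prefixes="str">
  <xsl:output method="text"/>

  <xsl:template match="value">
    <!-- Split on line feeds; each token arrives as a <token> element. -->
    <xsl:for-each select="str:tokenize(., '&#10;')">
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="text()"/>
</xsl:stylesheet>
```

As with the XSLT 2.0 version, normalize-space collapses runs of whitespace inside each line, so check that this is acceptable for your data.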
Hi all, I have written some logic in my XSLT code to concatenate more than two pieces of data at a time, but I'm not getting my expected output. Can anyone give some suggestions?
Here is my XML:
<Swift>
<block4>
<tag>
<name>50K</name>
<value>
0101/0457887750
SAMAROCA
MENENDEZ Y PELAYO
</value>
</tag>
</block4>
</Swift>
i have written an xslt here :
<xsl:template match="swift/message/block4/tag [name='50K']">
<xsl:variable name ="del50k" select ="(translate(substring-after(value,'&#10;'),'&#10;','~'))"/>
<xsl:value-of select="concat(substring(value, 1, 5), ',',substring(substring-before(value,'&#10;'),6), ',',$del50k)" />
</xsl:template>
Is this the correct way to do it? Can anyone help?
EXPECTED OUTPUT:
0101/,0457887750,SAMAROCA~MENENDEZ Y PELAYO
I'm giving you a full working example based on your input. A few notes:
Use normalize-space() and split the string by space.
Just play with substring-before and substring-after.
make sure to use xsl:strip-space.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:output omit-xml-declaration="yes" method="text"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="space" select="' '"/>
<xsl:template match="block4/tag[name='50K']">
<xsl:variable name="value" select="normalize-space(value)"/>
<xsl:variable name="code" select="substring-before($value,$space)"/>
<xsl:variable name="string1" select="concat(
substring-before($code,'/'),
'/,',
substring-after($code,'/'))"/>
<xsl:variable name="string2" select="substring-before(
substring-after($value,$space),
$space)"/>
<xsl:variable name="string3" select="substring-after(
substring-after($value,$space),
$space)"/>
<xsl:value-of select="concat($string1,',',$string2,'~',$string3)"/>
</xsl:template>
<xsl:template match="name|value"/>
</xsl:stylesheet>
Your biggest problem is that value is your context node (defined in your template's match attribute), but you're referring to value in your XPath. This will look for a value node within the value node, which is obviously wrong.
In your <xsl:variable> and <xsl:value-of> statements, change references to value to ., to refer to the current node instead.
I think that's probably not the only issue, but given that your template isn't going to match anything in that document anyway, it's difficult to work out where else it could be going wrong. One possible additional problem is that the substring-before(value,'&#10;') call within your <xsl:value-of> isn't going to return anything with the formatting given, as there's a newline before the 0101/etc. Now that I think about it, that's also going to be an issue for the substring-after in the previous line. That depends heavily on how the input is actually formatted, but from what you've given here, it is a problem.
I want to produce a newline for text output in XSLT. Any ideas?
The following XSL code will produce a newline (line feed) character:
<xsl:text>&#xa;</xsl:text>
For a carriage return, use:
<xsl:text>&#xd;</xsl:text>
My favoured method for doing this looks something like:
<xsl:stylesheet>
<xsl:output method='text'/>
<xsl:variable name='newline'><xsl:text>
</xsl:text></xsl:variable>
<!-- note that the layout there is deliberate -->
...
</xsl:stylesheet>
Then, whenever you want to output a newline (perhaps in csv) you can output something like the following:
<xsl:value-of select="concat(elem1,elem2,elem3,$newline)" />
I've used this technique when outputting sql from xml input. In fact, I tend to create variables for commas, quotes and newlines.
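A small self-contained sketch of that pattern for CSV output. The input structure (a rows element containing row elements with elem1, elem2 and elem3 children) is hypothetical, and the newline is written as a character reference here rather than as a literal line break.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Reusable separators, defined once at the top level. -->
  <xsl:variable name="newline" select="'&#10;'"/>
  <xsl:variable name="comma" select="','"/>

  <!-- Hypothetical input: <rows><row><elem1/><elem2/><elem3/></row>...</rows> -->
  <xsl:template match="/rows">
    <xsl:for-each select="row">
      <xsl:value-of select="concat(elem1, $comma, elem2, $comma, elem3, $newline)"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Using a character reference instead of a literal line break means that re-indenting the stylesheet in an editor cannot change the value of the variable.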
Include the attribute method="text" on the xsl:output tag and include newlines in your literal content in the XSL at the appropriate points. If you prefer to keep the source code of your XSL tidy, use the entity &#10; where you want a new line.
You can use: <xsl:text>&#10;</xsl:text>
see the example
<xsl:variable name="module-info">
<xsl:value-of select="@name" /> = <xsl:value-of select="@rev" />
<xsl:text>&#10;</xsl:text>
</xsl:variable>
if you write this in file e.g.
<redirect:write file="temp.prop" append="true">
<xsl:value-of select="$module-info" />
</redirect:write>
this variable will produce a new line infile as:
commons-dbcp_commons-dbcp = 1.2.2
junit_junit = 4.4
org.easymock_easymock = 2.4
IMHO no more info than @Florjon gave is needed. Maybe some small details are left to understand why it might not work for us sometimes.
First of all, the &#xa; (hex) or &#10; (dec) inside a <xsl:text/> will always work, but you may not see it.
There is no newline in HTML markup. Using a simple <br/> will do fine. Otherwise you'll see a white space. Viewing the source from the browser will tell you what really happened. However, there are cases where you want this behaviour, especially if the consumer is not directly a browser. For instance, you may want to create an HTML page and view its structure formatted nicely with empty lines and indents before serving it to the browser.
Remember where you need to use disable-output-escaping and where you don't. Take the following example where I had to create an xml from another and declare its DTD from a stylesheet.
The first version does escape the characters (default for xsl:text)
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" encoding="utf-8"/>
<xsl:template match="/">
<xsl:text>&lt;!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd"&gt;&#xa;</xsl:text>
<xsl:copy>
<xsl:apply-templates select="*" mode="copy"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@*|node()" mode="copy">
<xsl:copy>
<xsl:apply-templates select="@*|node()" mode="copy"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
and here is the result:
<?xml version="1.0" encoding="utf-8"?>
&lt;!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd"&gt;
<Subscriptions>
<User id="1"/>
</Subscriptions>
Ok, it does what we expect: escaping is done so that the characters we used are displayed properly. The formatting of the XML inside the root node is handled by indent="yes". But with a closer look we see that the newline character &#xa; was not escaped and was translated as is, performing a double linefeed! I don't have an explanation for this; it would be good to know. Anyone?
The second version does not escape the characters so they're producing what they're meant for. The change made was:
<xsl:text disable-output-escaping="yes">&lt;!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd"&gt;&#xa;</xsl:text>
and here is the result:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd">
<Subscriptions>
<User id="1"/>
</Subscriptions>
and that will be ok. Both cr and lf are properly rendered.
Don't forget we're talking about nl, not crlf (nl=lf). My first attempt was to use only cr (&#xd;), and while the output XML was validated properly by the DOM parser,
what I was viewing was a corrupted XML:
<?xml version="1.0" encoding="utf-8"?>
<Subscriptions>riptions SYSTEM "Subscriptions.dtd">
<User id="1"/>
</Subscriptions>
The DOM parser disregarded control characters but the renderer didn't. I spent quite some time bumping my head before I realised how silly I was not seeing this!
For the record, I do use a variable inside the body with both CRLF just to be 100% sure it will work everywhere.
You can try,
<xsl:text>&#10;</xsl:text>
It will work.
I added the DOCTYPE directive you see here:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY nl "&#xa;">
]>
<xsl:stylesheet xmlns:x="http://www.w3.org/2005/02/query-test-XQTSCatalog"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
This allows me to use &nl; instead of &#xa; to produce a newline in the output. Like other solutions, this is typically placed inside a <xsl:text> tag.
I second Nic Gibson's method, this was
always my favorite:
<xsl:variable name='nl'><xsl:text>
</xsl:text></xsl:variable>
However I have been using the Ant task <echoxml> to
create stylesheets and run them against files. The
task will do attribute value templates, e.g. ${DSTAMP} ,
but it will also reformat your xml, so in some
cases, the entity reference is preferable.
<xsl:variable name='nl'><xsl:text>&#xa;</xsl:text></xsl:variable>
I have found a difference between literal newlines in <xsl:text> and newlines written as &#xA;.
While literal newlines worked fine in my environment (using both Saxon and the default Java XSLT processor) my code failed when it was executed by another group running in a .NET environment.
Changing to entities (&#xA;) got my file generation code running consistently on both Java and .NET.
Also, literal newlines are vulnerable to being reformatted by IDEs and can inadvertently get lost when the file is maintained by someone 'not in the know'.
I've noticed from my experience that producing a new line INSIDE a <xsl:variable> clause doesn't work.
I was trying to do something like:
<xsl:variable name="myVar">
<xsl:choose>
<xsl:when test="@myValue != ''">
<xsl:text>My value: </xsl:text>
<xsl:value-of select="@myValue" />
<xsl:text></xsl:text> <!--NEW LINE-->
<xsl:text>My other value: </xsl:text>
<xsl:value-of select="@myOtherValue" />
</xsl:when>
</xsl:choose>
</xsl:variable>
<div>
<xsl:value-of select="$myVar"/>
</div>
Anything I tried to put in that "new line" (the empty <xsl:text> node) just didn't work (including most of the simpler suggestions in this page), not to mention the fact that HTML just won't work there, so eventually I had to split it to 2 variables, call them outside the <xsl:variable> scope and put a simple <br/> between them, i.e:
<xsl:variable name="myVar1">
<xsl:choose>
<xsl:when test="@myValue != ''">
<xsl:text>My value: </xsl:text>
<xsl:value-of select="@myValue" />
</xsl:when>
</xsl:choose>
</xsl:variable>
<xsl:variable name="myVar2">
<xsl:choose>
<xsl:when test="@myValue != ''">
<xsl:text>My other value: </xsl:text>
<xsl:value-of select="@myOtherValue" />
</xsl:when>
</xsl:choose>
</xsl:variable>
<div>
<xsl:value-of select="$myVar1"/>
<br/>
<xsl:value-of select="$myVar2"/>
</div>
Yeah, I know, it's not the most sophisticated solution but it works, just sharing my frustrating experience with XSLs ;)
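A possible alternative that keeps a single variable is sketched below. The likely reason markup "won't work" inside the variable is that xsl:value-of outputs only the string value, which drops any elements; xsl:copy-of copies the variable's content as nodes, so a <br/> placed inside it survives. The match pattern item is hypothetical, and the attribute names are taken from the example above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- "item" is a made-up element carrying the two attributes. -->
  <xsl:template match="item">
    <xsl:variable name="myVar">
      <xsl:if test="@myValue != ''">
        <xsl:text>My value: </xsl:text>
        <xsl:value-of select="@myValue"/>
        <br/>
        <xsl:text>My other value: </xsl:text>
        <xsl:value-of select="@myOtherValue"/>
      </xsl:if>
    </xsl:variable>
    <div>
      <!-- copy-of keeps the <br/>; value-of would flatten it away. -->
      <xsl:copy-of select="$myVar"/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```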
I couldn't just use the <xsl:text>&#xA;</xsl:text> approach because if I format the XML file using XSLT the entity will disappear. So I had to use a slightly more roundabout approach using variables
<xsl:variable name="nl" select="'&#xA;'"/>
<xsl:template match="/">
<xsl:value-of select="$nl" disable-output-escaping="no"/>
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:text xml:space="preserve">
</xsl:text>
just add this tag:
<br/>
it works for me ;) .