Short question: How to deal with a raw ampersand in xml input file.
ADDED: Im not even selecting the field with the ampersand. The parser complains at the presence of the ampersand within the file.
Long explanation:
im dealing with xml that is generated via a url response.
<NOTE>I%20hope%20this%20won%27t%20require%20a%20signature%3f%20%20
There%20should%20be%20painters%20%26%20stone%20guys%20at%20the
%20house%20on%20Wednesday%2c%20but%20depending%20on%20what%20time%20
it%20is%20delivered%2c%20I%20can%27t%20guarantee%21%20%20
Also%2c%20just%20want%20to%20make%20sure%20the%20billing%20address
%20is%20different%20from%20shipping%20address%3f
</NOTE>
which is url decoded into this:
<NOTE>I hope this won't require a signature?
There should be painters & stone guys at the
house on Wednesday, but depending on what time it is delivered, I can't guarantee!
Also, just want to make sure the billing address is different from shipping address?
</NOTE>
The Problem:
xslproc chokes on that last string because of the '&' in "painters & stone guys"
with the following error:
xmlParseEntityRef: no name
<NOTE>I hope this won't require a signature? There should be painters &
It looks like xsltproc expects a closing </NOTE>
Ive tried all manner of disable-output-escaping="yes" in various locations. xsl:output and xsl:value-of
And also tried xsltproc --decode-uri but cant figure out that one out. No documentation.
Note:
I wonder if its worth keeping the input in urlencoded format. And using a DOCTYPE..such as the following (not sure how to do that). The output is eventually a browser.
<!DOCTYPE xsl:stylesheet [
<!ENTITY nbsp " ">
<!ENTITY copy "©">
<!ENTITY reg "®">
]>
The XML is malformed if there's an ampersand that's not escaped. If you put the string inside <![CDATA[]]>, then it should work.
<NOTE><![CDATA[I hope this won't require a signature?
There should be painters & stone guys at the
house on Wednesday, but depending on what time it is delivered, I can't guarantee!
Also, just want to make sure the billing address is different from shipping address?]]>
</NOTE>
Or, of course, use & instead of &.
Edit: You can also translate the URL escapes into numeric character references if the XSLT processor supports disable-output-escaping (and xsltproc does):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="NOTE">
<xsl:copy>
<xsl:call-template name="decodeURL"/>
</xsl:copy>
</xsl:template>
<xsl:template name="decodeURL">
<xsl:param name="URL" select="string()"/>
<xsl:choose>
<xsl:when test="contains($URL,'%')">
<xsl:variable name="remainingURL" select="substring-after($URL,'%')"/>
<xsl:value-of disable-output-escaping="yes" select="concat(
substring-before($URL,'%'),
'&#x',
substring($remainingURL,1,2),
';')"/>
<xsl:call-template name="decodeURL">
<xsl:with-param name="URL" select="substring($remainingURL,3)"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$URL"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Of course you don't have to use this transformation as a preprocessing step, you can re-use decodeURL in a stylesheet that transforms your source XML that includes the URL encoded string to HTML or whatever.
Related
I have an example document that looks like this
<document>
<memo>
<to>Allen</to>
<p>Hello! My name is <bold>Josh</bold></p>
<p>It's nice to meet you <bold>Allen</bold>. I hope that we get to meet up more often.</p>
<from>Josh</from>
<memo>
</document>
and this is my XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:company="http://my.company">
<xsl:output method="html"/>
<xsl:variable name="link" select="company:generate-link()"/>
<xsl:template match="/document/memo">
<h1>To: <xsl:value-of select="to"/></h1>
<xsl:for-each select="p">
<p><xsl:apply-templates select="node()" mode="paragraph"/></p>
</xsl:for-each>
<xsl:if test="from">
<p>From: <strong><xsl:value-of select="from"/></strong></p>
</xsl:if>
<xsl:copy-of select="$link"/>
</xsl:template>
<xsl:template match="bold" mode="paragraph">
<b><xsl:value-of select="."/></b>
</xsl:template>
<xsl:template match="text()" mode="paragraph">
<xsl:value-of select="."/>
</xsl:template>
</xsl:stylesheet>
and the variable link contains the following example node:
When I do a copy-of out the variable link it prints out the node correctly (but obviously without any text). I want to insert it into the document and replace the text using XSLT. For example, the text could be:
View all of <xsl:value-of select="/document/memo/from"/>'s documents.
So the resulting document would look like:
<h1>To: Allen</h1>
<p>Hello! My name is <b>Josh</b></p>
<p>It's nice to meet you <b>Allen</b>. I hope that we get to meet up more often.</p>
<from>Josh</from>
View all of Josh's documents.
I have been searching the internet on how to do this but I couldn't find anything. If anyone could help I would appreciate it a lot!
Thanks,
David.
You haven't said which XSLT processor that is nor have you shown the code of that extension function to allow us to understand what it returns but based on your comment saying it returns a node you can usually process it further with templates so if you use <xsl:apply-templates select="$link"/> and then write a template
<xsl:template match="a[#href]">
<xsl:copy>
<xsl:copy-of select="#*"/>
View all of <xsl:value-of select="$main-doc/document/memo/from"/>'s documents.
</xsl:copy>
</xsl:template>
where you declare a global variable <xsl:variable name="main-doc" select="/"/> you should be able to transform the node returned from your function.
I'm trying to split a tab separated value string using substring-before I'm having a hard time because the XSLT mediator does not work as it should.
The xslt stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:p576="http://p576.test.ws" version="1.0">
<xsl:output encoding="UTF-8" method="xml" indent="yes"></xsl:output>
<xsl:param name="NAMESPACE"></xsl:param>
<xsl:param name="LOG_ID"></xsl:param>
<xsl:template match="/">
<xsl:element name="Response" namespace="{$NAMESPACE}">
<xsl:if test="$LOG_ID">
<xsl:element name="LogId" namespace="{$NAMESPACE}">
<xsl:value-of select="$LOG_ID"></xsl:value-of>
</xsl:element>
</xsl:if>
<xsl:element name="Text" namespace="{$NAMESPACE}">
<xsl:value-of select="substring-before(//p576:execMRPCResponse/p576:execMRPCReturn, ' ')"></xsl:value-of>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
The string:
999900418559 59 4730 Payment Created & Posted
Instead of cutting the string "999900418559" it cuts it here: "999900418559 59 4730 Payment" where a space is. The same result is achieved if you use substring-before '$#32;' meaning tab and space are converted to space when looking into the string.
Is there any way to keep this from happening? or if the problem is with the version of saxon used by wso2, should I update it (I'm using WSO2ESB v4.8.1)?
If tabs in the source document are being converted to spaces, then this is happening before Saxon gets to see the data. It could happen, for example, if the data is put through a schema validator and the type of the relevant element uses a whitespace facet of "collapse".
To prevent it happening, you first need to find out when and where it is happening, and there are no clues to that in your post (at least for someone who has never heard of WSO2ESB).
As a first diagnostic step, try to capture the exact form of the source document supplied as input to the transformation, and check the state of the whitespace it contains.
Having written that, I think there is another possibility, which is that the character reference in your stylesheet is being corrupted by whatever process it is that compiles the stylesheet. You could defend against that by replacing ' ' with codepoints-to-string(9).
I've been trying to figure out a way to use a param/variable as an argument to a function.
At the very least, I'd like to be able to use basic string parameters as arguments as follows:
<xsl:param name="stringValue" default="'abcdef'"/>
<xsl:value-of select="substring(string($stringValue),1,3)"/>
The above code generates no output.
I feel like I'm missing a simple way of doing this. I'm happy to use exslt or some other extension if an xslt 1.0 processor does not allow this.
Edit:
I am using XSL 1.0 and transforming using Nokogiri, which supports XPATH 1.0 . Here is a more complete snippet of what I am trying to do:
I want to pass column numbers as parameters using nokogiri as follows
document = Nokogiri::XML(File.read('table.xml'))
template = Nokogiri::XSLT(File.read('extractTableData.xsl'))
transformed_document = template.transform(document,
["tableName","'Problems'", #Table Heading
"tablePath","'Table'", #Takes an absolute XPATH String
"nameColumnIndex","2", #column number
"valueColumnIndex","3"]) #column number
File.open('FormattedOutput.xml', 'w').write(transformed_document)
My xsl then wants to access every TD[valueColumnIndex] and and retrieve the first 3 characters at that position, which is why I am using a substring function. So I want to do something like:
<xsl:value-of select="substring(string(TD[$valueColumnIndex]),1,3)"/>
Since I was unable to do that, I tried to extract TD[$valueColumnIndex] to another param valueCode and then do substring(string(valueCode),1,3)
That did not work either (which is to say, no text was output, whereas <xsl:value-of select="$valueCode"/> gave me the expected output).
As a result, i decided to understand how to use parameters better, I would just use a hard coded string, as mentioned in my earlier question.
Things I have tried:
using single quotes around abcdef (and not) while
using string() around the param name (and not)
Based on the comments below, it seems I am handicapped in my ability to understand the error because Nokogiri does not report an error for these situations. I am in the process of installing xsltproc right now and seeing if I receive any errors.
Finally, here is my entire xsl. I use a separate template forLoop because of the valueCode param I am creating. The lines of interest are the last 5 or so. I cannot include the xml as there are data use issues involved.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ext="http://exslt.org/common"
xmlns:dyn="http://exslt.org/dynamic"
exclude-result-prefixes="ext dyn">
<xsl:param name="tableName" />
<xsl:param name="tablePath" />
<xsl:param name= "nameColumnIndex" />
<xsl:param name= "valueColumnIndex"/>
<xsl:template match="/">
<xsl:param name="tableRowPath">
<xsl:value-of select="$tablePath"/><xsl:text>/TR</xsl:text>
</xsl:param>
<!-- Problems -->
<section>
<name>
<xsl:value-of select="$tableName" />
</name>
<!-- <xsl:for-each select="concat($tablePath,'/TR')"> -->
<xsl:for-each select="dyn:evaluate($tableRowPath)">
<!-- Encode record section -->
<xsl:call-template name="forLoop"/>
</xsl:for-each>
</section>
</xsl:template>
<xsl:template name="forLoop">
<xsl:param name="valueCode">
<xsl:value-of select="./TD[number($valueColumnIndex)][text()]"/>
</xsl:param>
<xsl:param name="RandomString" select="'Try123'"/>
<section>
<name>
<xsl:value-of select="./TD[number($nameColumnIndex)]"/>
</name>
<code>
<short>
<xsl:value-of select="substring(string($valueCode),1,3)"/>
</short>
<long>
<xsl:value-of select="$valueCode"/>
</long>
</code>
</section>
</xsl:template>
</xsl:stylesheet>
Use it this way:
<xsl:param name="stringValue" select="'abcdef'"/>
<xsl:value-of select="substring($stringValue,1,3)"/>
I have following xml
<xml>
<xref>
is determined “in prescribed manner”
</xref>
</xml>
I want to see if we can process xslt 2 and return the following result
<xml>
<xref>
is
</xref>
<xref>
determined
</xref>
<xref>
“in prescribed manner”
</xref>
</xml>
I tried few options like replace the space and entities and then using for-each loop but not able to work it out. May be we can use tokenize function of xslt 2.0 but don't know how to use it. Any hint will be helpful.
# JimGarrison: Sorry, I couldn't resist. :-) This XSLT is definitely not elegant but it does (I assume) most of the job:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
<xsl:variable name="left_quote" select="'<'"/>
<xsl:variable name="right_quote" select="'>'"/>
<xsl:template name="protected_tokenize">
<xsl:param name="string"/>
<xsl:variable name="pattern" select="concat('^([^', $left_quote, ']+)(', $left_quote, '[^', $right_quote, ']*', $right_quote,')?(.*)')"/>
<xsl:analyze-string select="$string" regex="{$pattern}">
<xsl:matching-substring>
<!-- Handle the prefix of the string up to the first opening quote by "normal" tokenizing. -->
<xsl:variable name="prefix" select="concat(' ', normalize-space(regex-group(1)))"/>
<xsl:for-each select="tokenize(normalize-space($prefix), ' ')">
<xref>
<xsl:value-of select="."/>
</xref>
</xsl:for-each>
<!-- Handle the text between the quotes by simply passing it through. -->
<xsl:variable name="protected_token" select="normalize-space(regex-group(2))"/>
<xsl:if test="$protected_token != ''">
<xref>
<xsl:value-of select="$protected_token"/>
</xref>
</xsl:if>
<!-- Handle the suffix of the string. This part may contained protected tokens again. So we do it recursively. -->
<xsl:variable name="suffix" select="normalize-space(regex-group(3))"/>
<xsl:if test="$suffix != ''">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="$suffix"/>
</xsl:call-template>
</xsl:if>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:template>
<xsl:template match="*|#*">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="xref">
<xsl:call-template name="protected_tokenize">
<xsl:with-param name="string" select="text()"/>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
Notes:
There is the general assumption that white space only serves as a token delimiter and need not be preserved.
“ and rdquo; seem to be invalid in XML although they are valid in HTML. In the XSLT there are variables defined holding the quote characters. They will have to be adapted once you find the right XML representation. You can also eliminate the variables and put the characters right into the regular expression pattern. It will be significantly simplified by this.
<xsl:analyze-string> does not allow a regular expression which may evaluate into an empty string. This comes as a little problem since either the prefix and/or the proteced token and/or the suffix may be empty. I take care of this by artificially adding a space at the beginning of the pattern which allows me to search for the prefix using + (at least one occurence) instead of * (zero or more occurences).
I want to produce a newline for text output in XSLT. Any ideas?
The following XSL code will produce a newline (line feed) character:
<xsl:text>
</xsl:text>
For a carriage return, use:
<xsl:text>
</xsl:text>
My favoured method for doing this looks something like:
<xsl:stylesheet>
<xsl:output method='text'/>
<xsl:variable name='newline'><xsl:text>
</xsl:text></xsl:variable>
<!-- note that the layout there is deliberate -->
...
</xsl:stylesheet>
Then, whenever you want to output a newline (perhaps in csv) you can output something like the following:
<xsl:value-of select="concat(elem1,elem2,elem3,$newline)" />
I've used this technique when outputting sql from xml input. In fact, I tend to create variables for commas, quotes and newlines.
Include the attribute Method="text" on the xsl:output tag and include newlines in your literal content in the XSL at the appropriate points. If you prefer to keep the source code of your XSL tidy use the entity
where you want a new line.
You can use: <xsl:text>
</xsl:text>
see the example
<xsl:variable name="module-info">
<xsl:value-of select="#name" /> = <xsl:value-of select="#rev" />
<xsl:text>
</xsl:text>
</xsl:variable>
if you write this in file e.g.
<redirect:write file="temp.prop" append="true">
<xsl:value-of select="$module-info" />
</redirect:write>
this variable will produce a new line infile as:
commons-dbcp_commons-dbcp = 1.2.2
junit_junit = 4.4
org.easymock_easymock = 2.4
IMHO no more info than #Florjon gave is needed. Maybe some small details are left to understand why it might not work for us sometimes.
First of all, the 
 (hex) or 
 (dec) inside a <xsl:text/> will always work, but you may not see it.
There is no newline in a HTML markup. Using a simple <br/> will do fine. Otherwise you'll see a white space. Viewing the source from the browser will tell you what really happened. However, there are cases you expect this behaviour, especially if the consumer is not directly a browser. For instance, you want to create an HTML page and view its structure formatted nicely with empty lines and idents before serving it to the browser.
Remember where you need to use disable-output-escaping and where you don't. Take the following example where I had to create an xml from another and declare its DTD from a stylesheet.
The first version does escape the characters (default for xsl:text)
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" encoding="utf-8"/>
<xsl:template match="/">
<xsl:text><!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd">
</xsl:text>
<xsl:copy>
<xsl:apply-templates select="*" mode="copy"/>
</xsl:copy>
</xsl:template>
<xsl:template match="#*|node()" mode="copy">
<xsl:copy>
<xsl:apply-templates select="#*|node()" mode="copy"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
and here is the result:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd">
<Subscriptions>
<User id="1"/>
</Subscriptions>
Ok, it does what we expect, escaping is done so that the characters we used are displayed properly. The XML part formatting inside the root node is handled by ident="yes". But with a closer look we see that the newline character 
 was not escaped and translated as is, performing a double linefeed! I don't have an explanation on this, will be good to know. Anyone?
The second version does not escape the characters so they're producing what they're meant for. The change made was:
<xsl:text disable-output-escaping="yes"><!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd">
</xsl:text>
and here is the result:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Subscriptions SYSTEM "Subscriptions.dtd">
<Subscriptions>
<User id="1"/>
</Subscriptions>
and that will be ok. Both cr and lf are properly rendered.
Don't forget we're talking about nl, not crlf (nl=lf). My first attempt was to use only cr:
 and while the output xml was validated by DOM properly.
I was viewing a corrupted xml:
<?xml version="1.0" encoding="utf-8"?>
<Subscriptions>riptions SYSTEM "Subscriptions.dtd">
<User id="1"/>
</Subscriptions>
DOM parser disregarded control characters but the rendered didn't. I spent quite some time bumping my head before I realised how silly I was not seeing this!
For the record, I do use a variable inside the body with both CRLF just to be 100% sure it will work everywhere.
You can try,
<xsl:text>
</xsl:text>
It will work.
I added the DOCTYPE directive you see here:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY nl "
">
]>
<xsl:stylesheet xmlns:x="http://www.w3.org/2005/02/query-test-XQTSCatalog"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
This allows me to use &nl; instead of
to produce a newline in the output. Like other solutions, this is typically placed inside a <xsl:text> tag.
I second Nic Gibson's method, this was
always my favorite:
<xsl:variable name='nl'><xsl:text>
</xsl:text></xsl:variable>
However I have been using the Ant task <echoxml> to
create stylesheets and run them against files. The
task will do attribute value templates, e.g. ${DSTAMP} ,
but is also will reformat your xml, so in some
cases, the entity reference is preferable.
<xsl:variable name='nl'><xsl:text>
</xsl:text></xsl:variable>
I have found a difference between literal newlines in <xsl:text> and literal newlines using
.
While literal newlines worked fine in my environment (using both Saxon and the default Java XSLT processor) my code failed when it was executed by another group running in a .NET environment.
Changing to entities (
) got my file generation code running consistently on both Java and .NET.
Also, literal newlines are vulnerable to being reformatted by IDEs and can inadvertently get lost when the file is maintained by someone 'not in the know'.
I've noticed from my experience that producing a new line INSIDE a <xsl:variable> clause doesn't work.
I was trying to do something like:
<xsl:variable name="myVar">
<xsl:choose>
<xsl:when test="#myValue != ''">
<xsl:text>My value: </xsl:text>
<xsl:value-of select="#myValue" />
<xsl:text></xsl:text> <!--NEW LINE-->
<xsl:text>My other value: </xsl:text>
<xsl:value-of select="#myOtherValue" />
</xsl:when>
</xsl:choose>
<xsl:variable>
<div>
<xsl:value-of select="$myVar"/>
</div>
Anything I tried to put in that "new line" (the empty <xsl:text> node) just didn't work (including most of the simpler suggestions in this page), not to mention the fact that HTML just won't work there, so eventually I had to split it to 2 variables, call them outside the <xsl:variable> scope and put a simple <br/> between them, i.e:
<xsl:variable name="myVar1">
<xsl:choose>
<xsl:when test="#myValue != ''">
<xsl:text>My value: </xsl:text>
<xsl:value-of select="#myValue" />
</xsl:when>
</xsl:choose>
<xsl:variable>
<xsl:variable name="myVar2">
<xsl:choose>
<xsl:when test="#myValue != ''">
<xsl:text>My other value: </xsl:text>
<xsl:value-of select="#myOtherValue" />
</xsl:when>
</xsl:choose>
<xsl:variable>
<div>
<xsl:value-of select="$myVar1"/>
<br/>
<xsl:value-of select="$myVar2"/>
</div>
Yeah, I know, it's not the most sophisticated solution but it works, just sharing my frustration experience with XSLs ;)
I couldn't just use the <xsl:text>
</xsl:text> approach because if I format the XML file using XSLT the entity will disappear. So I had to use a slightly more round about approach using variables
<xsl:variable name="nl" select="'
'"/>
<xsl:template match="/">
<xsl:value-of select="$nl" disable-output-escaping="no"/>
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:text xml:space="preserve">
</xsl:text>
just add this tag:
<br/>
it works for me ;) .