eXist-db special characters in transformation and xmldb:store - xslt

I've got a question concerning output escaping in eXist-db 4.5:
I'm using transform:transform (with $serialization-options = method=text media-type=application/text) and xmldb:store (with $mime-type = text/plain) to save the output of an XSL transformation back to the database. Inside my XSLT stylesheet I'm using
<xsl:value-of select="concat('Tom ', '&amp;', ' Peter')"/>
But the output that is saved back to eXist is Tom &amp; Peter instead of the Tom & Peter I was expecting.
When I specify disable-output-escaping="yes", eXist terminates with an error...
<xsl:value-of select="concat('Tom ', '&amp;', ' Peter')" disable-output-escaping="yes"/>
Using transform:stream-transform like suggested here doesn't work either because I need the output to be saved to a text-file.
How can I make sure that I can use concat and special characters like & in my XSL Transformation?
Edit: Added Example
Say you've got an eXist collection named temp under /db/apps/ with the following files in it:
input.xml
<?xml version="1.0" encoding="UTF-8"?>
<testxml>
<name>Peter</name>
</testxml>
stylesheet.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
version="2.0">
<xsl:template match="/">
<!-- Ampersand is not encoded: --> <xsl:value-of select="concat('Tom ', '&amp; ', testxml/name)"/>
<!-- transformation fails: <xsl:value-of disable-output-escaping="yes" select="concat('Tom ', '&amp;', testxml/name)"/> -->
<!-- Doesn't work obviously: <xsl:value-of select="concat('Tom ', '&', testxml/name)"/> -->
</xsl:template>
</xsl:stylesheet>
And
transformation.xq
xquery version "3.1";
declare function local:xml2tex() as xs:string
{
let $mime-type := "text/plain"
let $stylesheet := doc("/db/apps/temp/stylesheet.xsl")
let $serializationoptions := "method=text media-type=application/text"
let $doc := doc("/db/apps/temp/input.xml")
let $filename := (replace(util:document-name($doc), "\.xml$", "") || ".tex")
let $transform := transform:transform(
$doc,
$stylesheet,
(),
(),
$serializationoptions)
let $store := xmldb:store("/db/apps/temp", $filename, $transform, $mime-type)
return
$filename
};
local:xml2tex()
When you evaluate transformation.xq with the three contained value-of select options, you see that the working one produces a *.tex file with the content Tom &amp; Peter, which is not what is intended (that would be Tom & Peter).

According to eXist's function documentation for transform:transform(), this function returns a node() (or an empty sequence). As a result, as much as you might try to force the result of your XSLT transformation to be a plain old string (as you did by supplying the method=text serialization parameter), the function will still return this string as a node - a text node.
When you pass a text node to the xmldb:store() function to store a text file (a .tex file in your case), serialization comes into play again, because the text node has to be serialized into the binary form that eXist uses for text files. The default serialization method is the XML method, which escapes strings when serializing text nodes.
To test this hypothesis, run the following query and examine the resulting files:
xmldb:store("/db", "01-text-node.txt", text { "Tom &amp; Peter" } ),
xmldb:store("/db", "02-string.txt", "Tom &amp; Peter" )
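The same effect can be reproduced outside eXist. The following is a minimal Python sketch that uses the standard library's ElementTree rather than eXist's serializer, so it is only an analogy: it serializes one text value with the XML method and with the text method. The element name wrapper is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element: ElementTree cannot serialize a bare text node,
# so the value is attached to an element first.
wrapper = ET.Element("wrapper")
wrapper.text = "Tom & Peter"

# XML method: character data is escaped on output.
as_xml = ET.tostring(wrapper, encoding="unicode", method="xml")

# Text method: only the string value is written, without escaping.
as_text = ET.tostring(wrapper, encoding="unicode", method="text")

print(as_xml)
print(as_text)
```

The in-memory value is identical in both cases; only the serialization method decides whether the ampersand is written as an entity.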
To avoid this problem and ensure the transformed value is stored using the text method, you should use one of several methods of deriving the text node's string value - here I'm applying these methods to your $transform variable:
1. Use the cast as operator: $transform cast as xs:string
2. Use the fn:string() function: string($transform) or $transform/string().
3. Use the fn:serialize() function: serialize($transform, map { "method": "text" } )
Update: An issue reported in the comments below may cause the transform:transform() function to return more than one node(), in which case solutions 1 and 2 above raise an unexpected cardinality error. A workaround is to use the fn:string-join() function. Solution 3 works without adjustment.

Related

Regex to replace everything that doesn't match

Sample <AAA:BBB CCC:DDD EEE:FFF><GGG:HHH III:JJJ><KKK>
What I want is a substitution that removes everything except <BBB><HHH><KKK>
I've tried loads of things and just keep falling over.
If it's easier to do one brace at a time, that would be fine.
As you can probably guess, it's XML using LibXML, and I'm parsing all the elements against a list of paths and nodes in arrays. I just want the node name, not things like
<com.fnf:NodeName/> needs to be <NodeName/>
or worse still <\com.com.com:NodeName xmlns:com.com.com="http://www.some.domain"> just needs to say <NodeName>
I think this short program will do what you need. It uses XML::Twig to process the XML data, and defines a twig handler which is called for all elements in the data, and removes the element's namespace prefix and all attributes.
I've had to make a guess at what your XML data really looks like, as what you show in your question is far from being valid XML.
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new;
$twig->setTwigHandler(_all_ => sub {
$_->set_name($_->local_name);
$_->del_atts;
});
$twig->parse( \*DATA );
$twig->print(pretty_print => 'indented');
__DATA__
<root>
<aaa:bbb ccc="ddd" eee="fff">
<ggg:hhh iii="jjj">
<kkk></kkk>
</ggg:hhh>
</aaa:bbb>
</root>
output
<root>
<bbb>
<hhh>
<kkk></kkk>
</hhh>
</bbb>
</root>
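For comparison, here is a rough Python sketch of the same idea (rename each element to its local name and drop its attributes) using only the standard library's ElementTree. It is not part of the Perl answer. It assumes the prefixes are declared, because ElementTree rejects undeclared prefixes; the urn:example:* namespace URIs and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed input: the sample data with namespace declarations added
# (the urn:example:* URIs are placeholders, not from the question).
SAMPLE = """<root xmlns:aaa="urn:example:aaa" xmlns:ggg="urn:example:ggg">
  <aaa:bbb ccc="ddd" eee="fff">
    <ggg:hhh iii="jjj">
      <kkk></kkk>
    </ggg:hhh>
  </aaa:bbb>
</root>"""


def strip_namespaces_and_attributes(root):
    """Rename every element to its local name and drop its attributes, in place."""
    for element in root.iter():
        # ElementTree stores qualified names as "{namespace-uri}local".
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
        element.attrib.clear()
    return root


root = strip_namespaces_and_attributes(ET.fromstring(SAMPLE))
names = [element.tag for element in root.iter()]
print(names)
```

Because the work is done on a parsed tree instead of on the raw markup, attribute values that happen to contain colons or angle brackets cannot confuse it.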
Use XML::Parser and set Namespaces to true:
Namespaces
This is an Expat option. If this is set to a true value, then namespace processing is done during the parse. See "Namespaces" in XML::Parser::Expat for further discussion of namespace processing.
…
When this option is given with a true value, then the parser does namespace processing. By default, namespace processing is turned off. When it is turned on, the parser consumes xmlns attributes and strips off prefixes from element and attributes names where those prefixes have a defined namespace. A name's namespace can be found using the "namespace" method and two names can be checked for absolute equality with the "eq_name" method.
An idea: this can be done with an xsl transformation:
the xsl file:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" method="xml" encoding="utf-8" omit-xml-declaration="yes"/>
<!-- template for all elements -->
<xsl:template match="*">
<!-- local-name() gets the tagname without namespace -->
<xsl:element name="{local-name()}">
<xsl:apply-templates select="node()"/>
</xsl:element>
</xsl:template>
<!-- template to copy all that is not a tag or an attribute -->
<xsl:template match="comment() | text() | processing-instruction()">
<xsl:copy/>
</xsl:template>
</xsl:stylesheet>
The perl code:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXSLT;
use XML::LibXML;
my $xslt = XML::LibXSLT->new();
my $source = XML::LibXML->load_xml(location => 'removens.xml');
my $style_doc = XML::LibXML->load_xml(location => 'removens.xsl');
my $stylesheet = $xslt->parse_stylesheet($style_doc);
my $results = $stylesheet->transform($source);
print $stylesheet->output_as_bytes($results);
Or, instead of using Perl, you can run xsltproc directly in a terminal:
xsltproc removens.xsl removens.xml

BaseX: where to declare the XML document on which to perform a query

With the program BaseX I was able to use XPath and XQuery in order to query an XML document located at my home directory, but I have a problem with doing the same in XSLT.
The document I'm querying is BookstoreQ.xml.
XPath version, running totally fine:
doc("/home/ioannis/Desktop/BookstoreQ.xml")/Bookstore/Book/Title
XSLT code which I want to execute:
<xsl:stylesheet version = "2.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
<xsl:output method= "xml" indent = "yes" omit-xml-declaration = "yes" />
<xsl:template match = "Book"></xsl:template>
</xsl:stylesheet>
I read BaseX's documentation on XSLT, but didn't manage to find a solution. How can I run the given XSLT?
BaseX has no direct support for XSLT; you have to call it using XQuery functions (which is easy, though). There are two functions for doing this: one returning XML nodes (xslt:transform(...)) and one returning text as a string (xslt:transform-text(...)). You need the second one.
xslt:transform-text(doc("/home/ioannis/Desktop/BookstoreQ.xml"),
<xsl:stylesheet version = "2.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
<xsl:output method= "xml" indent = "yes" omit-xml-declaration = "yes" />
<xsl:template match = "Book"></xsl:template>
</xsl:stylesheet>
)
Both can be called with the XSLT given as nodes (used here), as a string, or as a path to a file containing the XSLT code.

XSLT if body has a class then do something

I'd like my xslt to show some html depending on whether the class of "video" displays on the body tag. Is this possible? For many reasons, I can't use Javascript.
In XSLT one can almost completely avoid explicit conditional instructions:
<xsl:template match=
"/html/body[contains(concat(' ', @class, ' '),' video ')]">
<!-- Wanted processing here -->
</xsl:template>
Of course, in order to be selected for execution, this template needs to match a node from the node-set specified in the select attribute of a corresponding <xsl:apply-templates> -- either explicitly or as part of the XSLT default processing (as part of a built-in XSLT template).
<xsl:if test="contains(/html/body/@class, 'video')">
</xsl:if>
Of course, this will also evaluate to true for my-video and other classes. If such collisions are possible, consider using
<xsl:if test="/html/body/@class = 'video' or
contains(/html/body/@class, ' video ') or
starts-with(/html/body/@class, 'video ') or
ends-with(/html/body/@class, ' video')">
</xsl:if>
If using XSLT 2.0 (which the ends-with() call above already requires), you can also use the matches() function.
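To see why the space-padded test avoids the my-video collision, here is a small Python sketch of the two string checks. It only mirrors the XPath logic on plain strings; the function names are made up for illustration.

```python
def has_class_naive(class_attr: str, name: str) -> bool:
    # Mirrors contains(@class, 'video'): a plain substring test.
    return name in class_attr


def has_class_token(class_attr: str, name: str) -> bool:
    # Mirrors contains(concat(' ', @class, ' '), ' video '):
    # pad both sides so the name must be a whole space-separated token.
    return f" {name} " in f" {class_attr} "


for value in ["video", "big video", "video big", "my-video", "videos"]:
    print(value, has_class_naive(value, "video"), has_class_token(value, "video"))
```

Like the XPath expression it mirrors, the padded check assumes class names are separated by single spaces; in XPath, wrapping @class in normalize-space() would also cover tabs and line breaks.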

XSLT; parse escaped text to a node-set and extract subelements

I've been fighting with this problem all day and am just about at my wit's end.
I have an XML file in which certain portions of data are stored as escaped text but are themselves well-formed XML. I want to convert the whole hierarchy in this text node to a node-set and extract the data therein. No combination of variables and functions I can think of works.
The way I'd expect it to work would be:
<xsl:variable name="a" select="InnerXML"/>
<xsl:for-each select="exsl:node-set($a)/*">
<!-- do something -->
</xsl:for-each>
The input element InnerXML contains text of the form
<root><elementa>text</elementa><elementb><elementc/><elementd>text</elementd></elementb></root>
but that doesn't really matter. I just want to navigate the xml like a normal node-set.
Where am I going wrong?
In case you can use Saxon 9.x, it provides the saxon:parse() extension function exactly for solving this task.
What I've done is put an msxsl script in the XSLT (this is in a Windows .NET environment):
<msxsl:script implements-prefix="cs" language="C#" >
<![CDATA[
public XPathNodeIterator parse(String strXML)
{
System.IO.StringReader rdr = new System.IO.StringReader(strXML);
XPathDocument doc = new XPathDocument(rdr);
XPathNavigator nav = doc.CreateNavigator();
XPathExpression expr;
expr = nav.Compile("/");
XPathNodeIterator iterator = nav.Select(expr);
return iterator;
}
]]>
</msxsl:script>
Then you can call it like this:
<xsl:variable name="itemHtml" select="cs:parse(EscapedNode)" />
That variable now contains XML you can iterate through.
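Both answers come down to the same step: the escaped text is just a string, so it has to be run through an XML parser a second time before it can be navigated. exsl:node-set() does not do that; it converts a result tree fragment into a node-set without parsing any strings. A minimal Python sketch of the two-pass idea, using only the standard library (the doc wrapper element is an assumption about the surrounding document):

```python
import xml.etree.ElementTree as ET

# Assumed outer document: InnerXML holds escaped, well-formed XML as text.
OUTER = (
    "<doc><InnerXML>"
    "&lt;root&gt;&lt;elementa&gt;text&lt;/elementa&gt;"
    "&lt;elementb&gt;&lt;elementc/&gt;&lt;elementd&gt;text&lt;/elementd&gt;&lt;/elementb&gt;"
    "&lt;/root&gt;"
    "</InnerXML></doc>"
)

# First pass: parsing the outer document unescapes the text content.
outer = ET.fromstring(OUTER)
inner_markup = outer.find("InnerXML").text

# Second pass: parse that string as XML in its own right.
inner_root = ET.fromstring(inner_markup)

child_names = [child.tag for child in inner_root]
all_names = [element.tag for element in inner_root.iter()]
print(child_names)
print(all_names)
```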

Xslt transform on special characters

I have an XML document that needs to pass text inside an element with an '&' in it.
This is called from .NET to a Web Service and comes over the wire with the correct encoding &amp;
e.g.
T&amp;O
I then need to use XSLT to create a transform but need to query SQL Server through a SP without the encoding on the ampersand, e.g. T&O would go to the DB.
(Note this all has to be done through XSLT, I do have the choice to use .NET encoding at this point)
Anyone have any idea how to do this from XSLT?
Note my XSLT knowledge isn’t the best to say the least!
Cheers
<xsl:text disable-output-escaping="yes">&amp;<!--&--></xsl:text>
More info at: http://www.w3schools.com/xsl/el_text.asp
If you have the choice to use .NET you can convert between an HTML-encoded and regular string using (this code requires a reference to System.Web):
string htmlEncodedText = System.Web.HttpUtility.HtmlEncode("T&O");
string text = System.Web.HttpUtility.HtmlDecode(htmlEncodedText);
Update
Since you need to do this in plain XSLT you can use xsl:value-of to decode the HTML encoding:
<xsl:variable name="test">
<xsl:value-of select="'T&amp;O'"/>
</xsl:variable>
The variable string($test) will have the value T&O. You can pass this variable as an argument to your extension function then.
Supposing your XML looks like this:
<root>T&amp;O</root>
you can use this XSLT snippet to get the text out of it:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="root"> <!-- Select the root element... -->
<xsl:value-of select="." /> <!-- ...and extract all text from it -->
</xsl:template>
</xsl:stylesheet>
Output (from Saxon 9, that is):
T&O
The point is the <xsl:output/> element. The default would be to output XML, where the ampersand would still be encoded.
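The underlying point, that &amp; exists only in the serialized form and never in the parsed value, can be checked with a short Python sketch using the standard library's ElementTree. This is an analogy to the XSLT output methods, not the behaviour of any particular XSLT processor.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root>T&amp;O</root>")

# After parsing, the text content holds a plain ampersand.
in_memory = root.text

# Serializing as XML escapes it again; serializing as text does not.
as_xml = ET.tostring(root, encoding="unicode", method="xml")
as_text = ET.tostring(root, encoding="unicode", method="text")

print(in_memory)
print(as_xml)
print(as_text)
```

So a value passed from the transformation to a stored procedure as a string already contains the bare ampersand; the entity only reappears if the result is written out as XML.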