XSLT: parse escaped text to a node-set and extract subelements

I've been fighting with this problem all day and am just about at my wit's end.
I have an XML file in which certain portions of data are stored as escaped text but are themselves well-formed XML. I want to convert the whole hierarchy in this text node to a node-set and extract the data therein. No combination of variables and functions I can think of works.
The way I'd expect it to work would be:
<xsl:variable name="a" select="InnerXML"/>
<xsl:for-each select="exsl:node-set($a)/*">
<!-- do something -->
</xsl:for-each>
The input element InnerXML contains text of the form
<root><elementa>text</elementa><elementb><elementc/><elementd>text</elementd></elementb></root>
but that doesn't really matter. I just want to navigate the xml like a normal node-set.
Where am I going wrong?

In case you can use Saxon 9.x, it provides the saxon:parse() extension function exactly for solving this task.
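Here is a rough illustration of why a separate parse step is needed. After the outer document is parsed, the content of InnerXML is only a string. exsl:node-set() does not turn a string into a tree: given a string, it returns a single text node. Something has to parse that string a second time, which is the job saxon:parse() does. The minimal Python sketch below mimics those two steps with the standard library's xml.etree.ElementTree rather than an XSLT processor. The wrapper element doc and the sample content are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: InnerXML holds well-formed XML stored as escaped text.
SOURCE = (
    "<doc><InnerXML>"
    "&lt;root&gt;&lt;elementa&gt;text&lt;/elementa&gt;"
    "&lt;elementb&gt;&lt;elementc/&gt;&lt;elementd&gt;text&lt;/elementd&gt;&lt;/elementb&gt;"
    "&lt;/root&gt;"
    "</InnerXML></doc>"
)


def parse_inner_xml(source: str) -> ET.Element:
    """Parse the outer document, then parse InnerXML's text a second time."""
    outer = ET.fromstring(source)
    # After the first parse, the escaped markup is just an ordinary string.
    inner_text = outer.find("InnerXML").text
    # The second parse turns that string into a navigable element tree.
    return ET.fromstring(inner_text)


if __name__ == "__main__":
    root = parse_inner_xml(SOURCE)
    for child in root:
        print(child.tag)  # elementa, then elementb
    print(root.find("elementb/elementd").text)  # text
```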

What I've done is add an msxsl:script block to the XSLT (this is in a Windows .NET environment):
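<msxsl:script implements-prefix="cs" language="C#" >
<![CDATA[
// Parses a string of well-formed XML and returns an iterator over its
// document node, which the stylesheet then sees as a node-set.
public XPathNodeIterator parse(String strXML)
{
System.IO.StringReader rdr = new System.IO.StringReader(strXML);
XPathDocument doc = new XPathDocument(rdr);
XPathNavigator nav = doc.CreateNavigator();
// Select the document node ("/") so the caller can navigate from the top.
XPathExpression expr = nav.Compile("/");
XPathNodeIterator iterator = nav.Select(expr);
return iterator;
}
]]>
</msxsl:script>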
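Then you can call it like this:
<xsl:variable name="itemHtml" select="cs:parse(EscapedNode)" />
and that variable now contains a node-set you can iterate through. (The stylesheet also has to bind the msxsl prefix to urn:schemas-microsoft-com:xslt and the cs prefix to a namespace URI of your choice. With XslCompiledTransform, scripts only run if XsltSettings.EnableScript is true.)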

Related

eXist-db special characters in transformation and xmldb:store

I've got a question concerning output escaping in eXist-db 4.5:
I'm using transform:transform (with $serialization-options = method=text media-type=application/text) and xmldb:store (with $mime-type = text/plain) to save the output of an XSL transformation back to the database. Inside my XSLT stylesheet I'm using
<xsl:value-of select="concat('Tom ', '&amp;', ' Peter')"/>
But the output that is saved back to eXist looks like Tom &amp; Peter instead of Tom & Peter, as I was expecting.
eXist terminates with an error when I specify disable-output-escaping="yes":
<xsl:value-of select="concat('Tom ', '&amp;', ' Peter')" disable-output-escaping="yes"/>
Using transform:stream-transform, as suggested here, doesn't work either, because I need the output to be saved to a text file.
How can I make sure that I can use concat and special characters like & in my XSL transformation?
Edit: Added Example
Say you've got an eXist collection named temp under /db/apps/ with the following files in it:
input.xml
<?xml version="1.0" encoding="UTF-8"?>
<testxml>
<name>Peter</name>
</testxml>
stylesheet.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
version="2.0">
<xsl:template match="/">
<!-- Ampersand is not encoded: --> <xsl:value-of select="concat('Tom ', '&amp; ', testxml/name)"/>
<!-- transformation fails: <xsl:value-of disable-output-escaping="yes" select="concat('Tom ', '&amp;', testxml/name)"/> -->
<!-- Doesn't work obviously: <xsl:value-of select="concat('Tom ', '&', testxml/name)"/> -->
</xsl:template>
</xsl:stylesheet>
And
transformation.xq
xquery version "3.1";
declare function local:xml2tex() as xs:string
{
let $mime-type := "text/plain"
let $stylesheet := doc("/db/apps/temp/stylesheet.xsl")
let $serializationoptions := "method=text media-type=application/text"
let $doc := doc("/db/apps/temp/input.xml")
let $filename := (replace(util:document-name($doc), "\.xml$", "") || ".tex")
let $transform := transform:transform(
$doc,
$stylesheet,
(),
(),
$serializationoptions)
let $store := xmldb:store("/db/apps/temp", $filename, $transform, $mime-type)
return
$filename
};
local:xml2tex()
When you evaluate transformation.xq with each of the three value-of options, you see that the working one produces a .tex file with the content Tom &amp; Peter, which is not what is intended (that would be Tom & Peter).
According to eXist's function documentation for transform:transform(), this function returns a node() (or an empty sequence). As a result, as much as you might try to force the result of your XSLT transformation to be a plain old string (as you did by supplying the method=text serialization parameter), the function will still return this string as a node - a text node.
When you pass a text node to the xmldb:store() function to store a text file (a .tex file in your case), serialization comes into play again, because the text node has to be serialized into the binary form that eXist uses for text files. The default serialization method is the XML method, which escapes strings when serializing text nodes.
To test this hypothesis, run the following query and examine the resulting files:
xmldb:store("/db", "01-text-node.txt", text { "Tom &amp; Peter" } ),
xmldb:store("/db", "02-string.txt", "Tom &amp; Peter" )
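The same escaping behaviour is not specific to eXist. The sketch below is a rough analogy that uses Python's xml.etree.ElementTree, not eXist's serializer. Serializing an element's text content with the XML method escapes the ampersand, while the text method writes the string value unchanged. The wrapper element name r is an arbitrary assumption.

```python
import xml.etree.ElementTree as ET


def serialize(elem: ET.Element, method: str) -> str:
    """Serialize an element with the given ElementTree output method."""
    return ET.tostring(elem, encoding="unicode", method=method)


# A wrapper element whose only content is the text "Tom & Peter".
wrapper = ET.Element("r")
wrapper.text = "Tom & Peter"

xml_out = serialize(wrapper, "xml")    # '<r>Tom &amp; Peter</r>': the & is escaped
text_out = serialize(wrapper, "text")  # 'Tom & Peter': string value written as-is

print(xml_out)
print(text_out)
```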
To avoid this problem and ensure the transformed value is stored using the text method, you should use one of several methods of deriving the text node's string value - here I'm applying these methods to your $transform variable:
1. Use the cast as operator: $transform cast as xs:string
2. Use the fn:string() function: string($transform) or $transform/string()
3. Use the fn:serialize() function: serialize($transform, map { "method": "text" } )
Update: An issue reported in the comments below may cause transform:transform() to return more than one node. In that case, solutions 1 and 2 above fail with an unexpected cardinality error. A workaround is to use the fn:string-join() function, e.g. string-join($transform, ''). Solution 3 works without adjustment.

Reference XSD in XSLT file and match on element name to get element type

I am a beginner with XSLT, so I am not sure if this is even feasible. I'd really appreciate your help.
I am using XSLT to transform from one format to another.
The source XML does not contain the type.
I need to reference the XSD to get the type for a given element.
<xsl:template match="*[not(*)]">
<elementName>
<key>
<xsl:value-of select="name()"/>
</key>
<type>
<!-- if name() matches an element with name="id" in the XSD file, -->
<!-- get its type (here: xs:string) -->
</type>
</elementName>
</xsl:template>
This is a sample XSD:
<xs:complexType name="test">
<xs:sequence>
<xs:element name="id" type="xs:string"/>
</xs:sequence>
</xs:complexType>
Using an XSLT 2.0 schema-aware transformation, you can write:
<xsl:import-schema>
... schema goes here, either inline or by reference ...
</xsl:import-schema>
<xsl:template match="element(id, xs:string)">
...
</xsl:template>
The normal way of using this assumes that you write your stylesheet knowing what is in the schema. Saxon has extended this with extension functions allowing you to discover what is in your schema. For example:
<xsl:variable name="type" select="saxon:type-annotation(.)"/>
<key><xsl:value-of select="name()"/></key>
<type>Q{<xsl:value-of select="namespace-uri-from-QName($type)"/>}<xsl:value-of select="local-name-from-QName($type)"/></type>
See the saxon:type-annotation and saxon:schema extension functions at http://www.saxonica.com/documentation/index.html#!functions/saxon
Analysing a schema document from XSLT directly is possible in theory but it's an enormous amount of work to get it right, if you're going to handle things such as xs:include/import/redefine, named types and anonymous types, global and local element declarations, substitution groups, etc. etc.
Yet another approach is to analyse the "precompiled schema" in XML format (SCM), which you can output from Saxon. This eliminates many of these difficulties.
Other products also offer APIs to access the schema, but there is no real standard.
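To make the scale of that problem concrete, here is a deliberately naive Python sketch (standard library only, not Saxon) that maps element names to their type attribute. It assumes every declaration is an xs:element with a name and a type attribute inside a single schema document. It ignores everything listed above: includes and imports, anonymous types, ref attributes, substitution groups, and same-named local elements in different contexts. The xs:schema wrapper around the question's sample is an assumption.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The sample schema from the question, wrapped in a hypothetical xs:schema root.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="test">
    <xs:sequence>
      <xs:element name="id" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def element_types(schema_text: str) -> dict:
    """Naively map each named xs:element to its type attribute (or None)."""
    schema = ET.fromstring(schema_text)
    types = {}
    for decl in schema.iter(XS + "element"):
        name = decl.get("name")
        if name is not None:
            # The type is returned as the lexical QName ("xs:string"); its
            # prefix is not resolved, and a later declaration with the same
            # name silently overwrites an earlier one.
            types[name] = decl.get("type")
    return types


if __name__ == "__main__":
    print(element_types(SCHEMA))  # {'id': 'xs:string'}
```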

XPath selector not working during PDF transformation

I have a DITA bookmap where I am storing image paths:
<bookmap>
<bookmeta>
<data id="productLogo">
<image href="images/_notrans/frontcover/productLogo.svg" />
</data>
<data id="productPhoto" >
<image href="images/_notrans/frontcover/productPhoto.jpg" />
</data>
</bookmeta>
</bookmap>
Then I attempt to grab the href values by data[@id]:
<xsl:variable name="productLogo"><xsl:value-of select="//data[@id='productLogo']/image/@href" /></xsl:variable>
<xsl:variable name="productPhoto"><xsl:value-of select="//data[@id='productPhoto']/image/@href" /></xsl:variable>
(These XPath expressions match the href when I test against my bookmap in Oxygen.)
During transformation I output:
<xsl:message>productPhoto: <xsl:value-of select="$productPhoto"/></xsl:message>
The value-of is always empty.
However, everything works as expected if I replace the id predicate with positional indexes:
<xsl:variable name="productLogo"><xsl:value-of select="//data[1]/image/@href" /></xsl:variable>
<xsl:variable name="productPhoto"><xsl:value-of select="//data[2]/image/@href" /></xsl:variable>
What am I doing wrong that's preventing me from using @id="whatever"?
The XSLT is not applied directly to the bookmap contents. It is applied to an XML document that contains the bookmap with all topic references expanded in it and with some preprocessing applied.
If you set the "clean.temp" parameter to "no", you will find a file called something like "mapName_MERGED.xml" in the temporary files folder. That is the XML document to which the XSLT is applied. As you will see, all IDs in it have been changed to be unique in the context of the entire document.
When working with data elements, you should usually set the @name attribute on them, like:
<data name="productLogo">
and match that name in the XSLT code.
There are examples of using "data" in the DITA 1.2 specs as well:
http://docs.oasis-open.org/dita/v1.2/os/spec/langref/data.html#data
Another option, depending on your needs, is to develop a naming convention for the product photos and use the element to build the URI. As the product logo shouldn't change for a product family, it wouldn't hurt to hard-code that in the XSLT code.
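The sketch below is a rough Python illustration of why matching on @name is more robust. It uses ElementTree's limited XPath support, not the DITA Open Toolkit. If preprocessing rewrites id values, a lookup by the original id finds nothing, while a lookup by name still succeeds. The rewritten id value below is a made-up placeholder, not the actual format DITA-OT produces.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical merged document: the id was rewritten during preprocessing,
# but the name attribute on the data element was left alone.
MERGED = """<bookmap><bookmeta>
  <data id="unique_42_productLogo" name="productLogo">
    <image href="images/_notrans/frontcover/productLogo.svg"/>
  </data>
</bookmeta></bookmap>"""


def image_href(doc: ET.Element, attr: str, value: str) -> Optional[str]:
    """Return image/@href for the data element whose attr equals value, if any."""
    image = doc.find(f".//data[@{attr}='{value}']/image")
    return None if image is None else image.get("href")


if __name__ == "__main__":
    doc = ET.fromstring(MERGED)
    print(image_href(doc, "id", "productLogo"))    # None: the id no longer matches
    print(image_href(doc, "name", "productLogo"))  # images/_notrans/frontcover/productLogo.svg
```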

Using a parameter passed into xslt stylesheet

I am using Saxon to perform a transformation of an XML document in my .NET application. I am passing in a parameter to my xslt document but I have no idea how to use it in my template.
Here is what I have done so far:
var zipcode = _db.AXCustomers.FirstOrDefault(x => x.ACCOUNTNUM == accNo).ZIPCODE;
transformer.SetParameter(new QName("CustomerZipCode"), new XdmAtomicValue(zipcode));
Then in my xslt document I am specifying the parameter like so:
<xsl:template match="/">
<xsl:param name="CustomerZipCode" />
But when I try to use the parameter, nothing appears. I am using it like so:
<xsl:value-of select="substring-before($CustomerZipCode, ' ')"/>
But nothing is output, even though my zipcode does contain a value.
You are using xsl:param inside an xsl:template element, which means the parameter belongs to that template. The parameter you are passing from the .NET code is a stylesheet (transformer) parameter, so the corresponding xsl:param must be declared at the top level of the stylesheet, as a direct child of the xsl:stylesheet element.
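Once the xsl:param is at the top level, one more detail in the question's expression is worth checking. XPath's substring-before() returns the empty string when the separator does not occur in the input, so a zip code that contains no space would still produce no output. The Python helper below is hypothetical and only emulates those semantics for illustration. It is not part of Saxon's API, and the sample zip codes are made up.

```python
def substring_before(value: str, separator: str) -> str:
    """Emulate XPath fn:substring-before: text before the first separator, else ''."""
    index = value.find(separator)
    # XPath returns '' when the separator is absent. An empty separator
    # also yields '' (find returns 0, so the slice is empty).
    return "" if index < 0 else value[:index]


if __name__ == "__main__":
    print(repr(substring_before("SW1A 1AA", " ")))  # 'SW1A'
    print(repr(substring_before("90210", " ")))     # '': no space, nothing output
```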

Xslt transform on special characters

I have an XML document that needs to pass text inside an element with an '&' in it.
This is called from .NET to a Web Service and comes over the wire with the correct encoding &amp;
e.g.
T&amp;O
I then need to use XSLT to create a transform, but I need to query SQL Server through a stored procedure without the encoding on the ampersand, e.g. T&O would go to the DB.
(Note this all has to be done through XSLT, I do have the choice to use .NET encoding at this point)
Anyone have any idea how to do this from XSLT?
Note my XSLT knowledge isn’t the best to say the least!
Cheers
<xsl:text disable-output-escaping="yes">&amp;<!--&--></xsl:text>
More info at: http://www.w3schools.com/xsl/el_text.asp
If you have the choice to use .NET, you can convert between an HTML-encoded string and a regular string with the following code (it requires a reference to System.Web):
string htmlEncodedText = System.Web.HttpUtility.HtmlEncode("T&O");
string text = System.Web.HttpUtility.HtmlDecode(htmlEncodedText);
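For comparison only, Python's standard html module offers the same round trip. This is an analogy to the System.Web calls above, not a drop-in replacement for them.

```python
import html

# html.escape / html.unescape play roughly the role of HtmlEncode / HtmlDecode.
encoded = html.escape("T&O")      # 'T&amp;O'
decoded = html.unescape(encoded)  # 'T&O'
print(encoded)
print(decoded)
```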
Update
Since you need to do this in plain XSLT you can use xsl:value-of to decode the HTML encoding:
<xsl:variable name="test">
<xsl:value-of select="'T&amp;O'"/>
</xsl:variable>
The expression string($test) will then have the value T&O. You can pass this variable as an argument to your extension function.
Supposing your XML looks like this:
<root>T&amp;O</root>
you can use this XSLT snippet to get the text out of it:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="root"> <!-- Select the root element... -->
<xsl:value-of select="." /> <!-- ...and extract all text from it -->
</xsl:template>
</xsl:stylesheet>
Output (from Saxon 9, that is):
T&O
The key is the <xsl:output method="text"/> element. The default would be to output XML, where the ampersand would still be escaped as &amp;.
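The same effect shows up in any XML toolkit. In this minimal Python analogy, xml.etree.ElementTree stands in for the XSLT processor. The parser has already decoded &amp; to &, and only the output method decides whether it gets escaped again.

```python
import xml.etree.ElementTree as ET

# Parsing decodes the &amp; entity, so the element's text is the plain string.
root = ET.fromstring("<root>T&amp;O</root>")
parsed_text = root.text  # 'T&O'

# The XML output method escapes the ampersand again...
as_xml = ET.tostring(root, encoding="unicode")  # '<root>T&amp;O</root>'
# ...while the text output method writes the string value unchanged.
as_text = ET.tostring(root, encoding="unicode", method="text")  # 'T&O'

print(as_xml)
print(as_text)
```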