XPath - Querying two XML documents - xslt

I have two XML docs:
XML1:
<Books>
<Book id="11">
.......
<AuthorName/>
</Book>
......
</Books>
XML2:
<Authors>
<Author>
<BookId>11</BookId>
<AuthorName>Smith</AuthorName>
</Author>
</Authors>
I'm trying to do the following:
Get the value of XML2/Author/AuthorName where XML1/Book/@id equals XML2/Author/BookId.
XML2/Author/AuthorName[../BookId = XML1/Book/@id]

An XPath 1.0 expression cannot refer to more than one XML document, unless the references to the additional documents have been set up in the context of the XPath engine by the hosting language. For example, if XSLT is the hosting language, then it makes its document() function available to the XPath engine it is hosting.
document($xml2Uri)/Authors/Author[BookId = $mainDoc/Books/Book/@id]
Note that even the main XML document needs to be referenced via an <xsl:variable>, named here $mainDoc, because inside the predicate the context node belongs to the second document.
The document() function is available only if XPath is hosted by XSLT! This is not mentioned in the answer of Doc Brown and is misleading the readers.
An XPath 2.x expression may refer to any additional XML document using the XPath 2.0 doc() function.
for $doc in /,
$doc2 in doc($someUri)
return
$doc2/Authors/Author[BookId = $doc/Books/Book/@id]
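Where no XSLT or XPath 2.0 host is available, the hosting program can perform the join itself. The following is only a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module; the sample data is made up to resemble the two documents in the question, and the function name author_names_for_books is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modelled on the two documents in the question.
BOOKS_XML = """<Books>
  <Book id="11"><AuthorName/></Book>
  <Book id="12"><AuthorName/></Book>
</Books>"""

AUTHORS_XML = """<Authors>
  <Author><BookId>11</BookId><AuthorName>Smith</AuthorName></Author>
  <Author><BookId>99</BookId><AuthorName>Jones</AuthorName></Author>
</Authors>"""


def author_names_for_books(books_xml, authors_xml):
    """Return {book id: author name} for every Author whose BookId matches a Book/@id."""
    books = ET.fromstring(books_xml)
    authors = ET.fromstring(authors_xml)
    # Collect the ids from the first document once, then filter the second one.
    book_ids = {book.get("id") for book in books.findall("Book")}
    matches = {}
    for author in authors.findall("Author"):
        book_id = author.findtext("BookId")
        if book_id in book_ids:
            matches[book_id] = author.findtext("AuthorName")
    return matches


result = author_names_for_books(BOOKS_XML, AUTHORS_XML)
print(result)
```

The lookup set plays the role of the $mainDoc variable above: the ids from the first document are gathered up front, so the filter over the second document does not depend on which document is the current context.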

The document() function is your friend; here is a short tutorial on how to combine multiple input files.
EDIT: Of course, that works only if you are using XPath in an XSLT script.

Related

How to process HTML entities in XSLT

I am trying to transform XHTML that contains the &nbsp; entity. Saxon complains that the entity is not defined. How can I define it?
Is it possible to add the entity definition at the beginning of the stylesheet? As suggested
here:
http://s-n-ushakov.blogspot.com/2011/09/xslt-entities-java-xalan.html
or here:
Using an HTML entity in XSLT (e.g. &nbsp;)
My puny attempt, ignored by Saxon, was to add the following to the beginning of the XSLT:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE stylesheet [
<!ENTITY nbsp "&#160;">
]>
I am using Saxon 9.9 PE.
The HTML I am trying to transform is a complete document, not just a fragment.
One possibility is to pass the URL of the XHTML to the XSLT as a parameter. The stylesheet would then read the XHTML as text using the unparsed-text() function, expand the entity reference using the replace() function, and parse the result using the parse-xml() function. For example:
<xsl:template name="xsl:initial-template">
<xsl:param name="source"/>
<xsl:apply-templates select="
$source
=> unparsed-text()
=> replace('&amp;nbsp;', '&#x000A0;')
=> parse-xml()
"/>
</xsl:template>
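The same read-as-text, replace, then parse idea can be tried outside XSLT. This is a rough Python sketch, assuming the standard-library xml.etree.ElementTree parser; the sample markup and the function name parse_xhtml_with_nbsp are hypothetical, and real XHTML would also carry a namespace that is left out here for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: well-formed apart from the undeclared &nbsp; reference.
SAMPLE = "<html><body><p>a&nbsp;b</p></body></html>"


def parse_xhtml_with_nbsp(text):
    """Swap the undeclared &nbsp; for a numeric character reference, then parse."""
    return ET.fromstring(text.replace("&nbsp;", "&#xA0;"))


# Without the replacement the XML parser rejects the document.
try:
    ET.fromstring(SAMPLE)
    raw_parse_failed = False
except ET.ParseError:
    raw_parse_failed = True

root = parse_xhtml_with_nbsp(SAMPLE)
paragraph_text = root.find("body/p").text
print(raw_parse_failed, repr(paragraph_text))
```

A plain string replacement like this also rewrites any &nbsp; that appears inside a CDATA section or comment, so it is a workaround rather than a substitute for fixing the input.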
If the input document contains an entity reference that isn't declared in the DOCTYPE declaration, then it isn't a well-formed XML document, and therefore it isn't a well-formed XHTML document; and if it isn't well-formed, then Saxon can't handle it.
It would be best to look at the processing workflow that generated this ill-formed document and fix it so the documents it produces are well-formed.
If you can't do that, then you might be able to parse it as HTML. Saxon has an extension function saxon:parse-html(); or if your application is in Java then you could create a SAXSource that uses validator.nu as its XMLReader.
You should consider using the tool Tidy to convert the HTML files into XHTML. It corrects all such things.
Just run tidy with the argument -asxml.

XSLT copy from external document

In XSLT 2.0 I am transforming a TEI XML document into HTML. In that transformation I need content from another document: I want to copy/transform a small set of nodes from the second document into the HTML output.
While processing the principal tei document I get the id and assign it to a variable:
<xsl:variable name="licenseid" select="./replace(@corresp,'#','')"/>
Then I go out to the other document and fetch the node using the variable, with the returned node assigned to a variable:
<xsl:variable name="licenseloc" select="doc(concat($somepath,'includes_sourcedesc.xml'))//tei:list[@type='copyright_type']/tei:item[@xml:id=$licenseid]"/>
This node I obtain looks like this:
<list type="copyright_type">
<item xml:id="copyright-cc-by-nc-sa-4.0">
<desc xml:lang="en">This work is made available by the author under the
<ref target="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike
4.0 International License</ref>.</desc>
</item>
</list>
And I want to transform it (from desc) to this:
<span>This work is made available by the author under the
<a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike
4.0 International License</a>.</span>
If this were in my 'current' tei document I would handle it through templates, but I'm unsure how to copy and transform the nested layers from within a different 'current' document.

Reference XSD in XSLT file and match on element name to get element type

I am a beginner in using XSLT so I am not sure if this is even feasible. I really appreciate your help.
I am using XSLT to transform from one format to another.
The source XML does not have the type.
I need to reference the XSD to get the type for the given element.
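<xsl:template match="*[not(*)]">
<elementName>
<key>
<xsl:value-of select="name()"/>
</key>
<type>
<!-- pseudocode: if name() matches an element name in the XSD file, -->
<!-- e.g. name() = "id" matches name="id" in the XSD, output its type (xs:string) -->
</type>
</elementName>
</xsl:template>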
This is sample XSD:
<xs:complexType name="test">
<xs:sequence>
<xs:element name="id" type="xs:string"/>
</xs:sequence>
</xs:complexType>
Using an XSLT 2.0 schema-aware transformation, you can write:
<xsl:import-schema>
... schema goes here, either inline or by reference ...
</xsl:import-schema>
<xsl:template match="element(id, xs:string)">
...
</xsl:template>
The normal way of using this assumes that you write your stylesheet knowing what is in the schema. Saxon has extended this with extension functions allowing you to discover what is in your schema. For example:
<xsl:variable name="type" select="saxon:type-annotation()"/>
<key><xsl:value-of select="name()"/></key>
<type><xsl:value-of select="'Q{' || namespace-uri-from-QName($type) || '}' || local-name-from-QName($type)"/></type>
See the saxon:type-annotation and saxon:schema extension functions at http://www.saxonica.com/documentation/index.html#!functions/saxon
Analysing a schema document from XSLT directly is possible in theory but it's an enormous amount of work to get it right, if you're going to handle things such as xs:include/import/redefine, named types and anonymous types, global and local element declarations, substitution groups, etc. etc.
Yet another approach is to analyse the "precompiled schema" in XML format (SCM) which you can output from Saxon, which eliminates many of these difficulties.
Other products also offer APIs to access the schema, but there is no real standard.
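For the very simple schema shown in the question, reading the schema document directly can be sketched in a few lines. The following Python sketch is only an illustration of that naive approach, assuming the standard-library xml.etree.ElementTree module; the sample schema, the sample source document, and the function names are made up. It deliberately ignores everything listed above as hard (xs:include/import/redefine, anonymous types, local declarations that reuse a name, substitution groups).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema and source document modelled on the question.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="test">
    <xs:sequence>
      <xs:element name="id" type="xs:string"/>
      <xs:element name="count" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

SOURCE = "<test><id>abc</id><count>3</count></test>"


def element_types(schema_xml):
    """Map element name -> type attribute for every xs:element that has both."""
    schema = ET.fromstring(schema_xml)
    return {
        el.get("name"): el.get("type")
        for el in schema.iter(XS + "element")
        if el.get("name") and el.get("type")
    }


def leaf_keys_and_types(source_xml, types):
    """List (name, type) for each element without child elements, like *[not(*)]."""
    root = ET.fromstring(source_xml)
    return [(el.tag, types.get(el.tag)) for el in root.iter() if len(el) == 0]


pairs = leaf_keys_and_types(SOURCE, element_types(SCHEMA))
print(pairs)
```

The type values are the literal strings written in the schema (for example xs:string with whatever prefix the schema author chose); resolving them to real QNames is one more thing a schema-aware processor does for you.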

Adding xml attribute in spring rest xml response

How do I add an attribute to an inner element of a Spring REST web service XML response?
Sample xml returned by my webservice:
<books>
<bookId>5</bookId>
<bookName>testBook</bookName>
</books>
I want to use the book id as an XML attribute of bookName:
<books>
<bookName id="5">testBook</bookName>
</books>
Any suggestions?
Depending on how your Java types are defined, you can either annotate your classes with @XmlAttribute or you may need to write a custom XmlAdapter.

YQL XSLT implementation limitations

For some reason, YQL's XSLT table can't parse my stylesheet. I have used the stylesheet successfully with the W3C's XSLT service. Here's an example of the problem in YQL Console. Why does this not work in YQL?
Also, I have yet to figure out how to pass the results of a YQL query to the XSLT table as the XML to be transformed while also specifying a stylesheet url. Current workaround is to abuse the W3C's service.
Your stylesheet is declared as version 1.0, but you're using replace() and tokenize(), which are part of the XPath 2.0 standard. It is, however, a fully valid XSLT/XPath 2.0 stylesheet.
As an addition to Per T's answer, change this:
<xsl:variable name="r">
<xsl:value-of select="replace(tr/td/p/a/following-sibling::text(),
'\s*-\s*(\d+)\.(\d+)\.(\d+)\s*',
'$1,$2,$3')" />
</xsl:variable>
With this:
<xsl:variable name="r"
select="translate(tr/td/p/a/following-sibling::text(),'. -',',')"/>
These:
tokenize($r,',')[1]
tokenize($r,',')[2]
tokenize($r,',')[3]
With these:
substring-before($r,',')
substring-before(substring-after($r,','),',')
substring-after(substring-after($r,','),',')
Note: This is just in case you don't know the number of digits in advance; otherwise you could do:
substring($r,1,2)
substring($r,4,2)
substring($r,7)
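To check that the XPath 1.0 replacements above yield the same three parts as the tokenize() calls, the functions can be emulated outside XSLT. This is a small Python sketch under stated assumptions: the input string " - 12.05.2010" is a made-up example of the text after the link, and the helper names are hypothetical emulations of translate(), substring-before() and substring-after(), not part of any library.

```python
def xpath_translate(value, from_chars, to_chars):
    """Emulate XPath 1.0 translate(): map by position, delete chars with no counterpart."""
    table = {}
    for index, char in enumerate(from_chars):
        if ord(char) in table:
            continue  # the first occurrence of a character wins
        table[ord(char)] = to_chars[index] if index < len(to_chars) else None
    return value.translate(table)


def substring_before(value, sep):
    """Emulate substring-before(): empty string when sep is empty or not found."""
    if sep == "":
        return ""
    head, found, _ = value.partition(sep)
    return head if found else ""


def substring_after(value, sep):
    """Emulate substring-after(): whole string for an empty sep, empty if not found."""
    if sep == "":
        return value
    _, found, tail = value.partition(sep)
    return tail if found else ""


# Hypothetical text node content following the <a> element.
r = xpath_translate(" - 12.05.2010", ". -", ",")
first = substring_before(r, ",")
second = substring_before(substring_after(r, ","), ",")
third = substring_after(substring_after(r, ","), ",")
print(r, first, second, third)
```

In translate($text, '. -', ','), only the dot has a counterpart in the third argument, so dots become commas while spaces and hyphens are removed.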
Also, this
replace(tr/td/p[@class='t11bold']/a,'\s+',' ')
It should be just this:
normalize-space(tr/td/p[@class='t11bold']/a)
And finally this:
replace($d,'^[^\[]*\[\s*(\d+:\d{2})?\s*-?\s*([^\]]*)\]\s*$','$2')
Could be:
normalize-space(substring-after(substring-before(substring-after($d,'['),']'),'-'))