Exclude certain child nodes when data structure is unknown - xslt

EDIT: I've figured out the solution to my problem and posted a Q&A here.
I'm looking to process XML conforming to the Library of Congress EAD standard (found here). Unfortunately, the standard is very loose regarding the structure of the XML.
For example the <bioghist> tag can exist within the <archdesc> tag, or within a <descgrp> tag, or nested within another <bioghist> tag, or a combination of the above, or can be left out entirely. I've found it to be very difficult to select just the bioghist tag I'm looking for without also selecting others.
Below are a few different possible EAD XML documents my XSLT might have to process:
First example
<ead>
  <eadheader>
    <archdesc>
      <bioghist>first</bioghist>
      <dsc>
        <c01>
          <descgrp>
            <bioghist>second</bioghist>
          </descgrp>
          <c02>
            <descgrp>
              <bioghist>
                <bioghist>third</bioghist>
              </bioghist>
            </descgrp>
          </c02>
        </c01>
      </dsc>
    </archdesc>
  </eadheader>
</ead>
Second example
<ead>
  <eadheader>
    <archdesc>
      <descgrp>
        <bioghist>
          <bioghist>first</bioghist>
        </bioghist>
      </descgrp>
      <dsc>
        <c01>
          <c02>
            <descgrp>
              <bioghist>third</bioghist>
            </descgrp>
          </c02>
          <bioghist>second</bioghist>
        </c01>
      </dsc>
    </archdesc>
  </eadheader>
</ead>
Third example
<ead>
  <eadheader>
    <archdesc>
      <descgrp>
        <bioghist>first</bioghist>
      </descgrp>
      <dsc>
        <c01>
          <c02>
            <bioghist>third</bioghist>
          </c02>
        </c01>
      </dsc>
    </archdesc>
  </eadheader>
</ead>
As you can see, an EAD XML file might have a <bioghist> tag almost anywhere. The actual output I'm supposed to produce is too complicated to post here. A simplified version of the output for the three EAD examples above might look like this:
Output for First example
<records>
  <primary_record>
    <biography_history>first</biography_history>
  </primary_record>
  <child_record>
    <biography_history>second</biography_history>
  </child_record>
  <grandchild_record>
    <biography_history>third</biography_history>
  </grandchild_record>
</records>
Output for Second example
<records>
  <primary_record>
    <biography_history>first</biography_history>
  </primary_record>
  <child_record>
    <biography_history>second</biography_history>
  </child_record>
  <grandchild_record>
    <biography_history>third</biography_history>
  </grandchild_record>
</records>
Output for Third example
<records>
  <primary_record>
    <biography_history>first</biography_history>
  </primary_record>
  <child_record>
    <biography_history></biography_history>
  </child_record>
  <grandchild_record>
    <biography_history>third</biography_history>
  </grandchild_record>
</records>
If I want to pull the "first" bioghist value and put it in the <primary_record>, I can't simply use <xsl:apply-templates select="/ead/eadheader/archdesc/bioghist"/>, because that tag might not be a direct child of the <archdesc> tag. It might be wrapped in a <descgrp> or a <bioghist>, or a combination of the two. I also can't use select="//bioghist", because that will pull all the <bioghist> tags. I can't even use select="//bioghist[1]", because there might not be a <bioghist> tag at that level. In that case I would pull the value below the <c01>, which is "second" and should be processed later.
This is already a long post, but there is one more wrinkle: there can be an unlimited number of <cxx> nodes, nested up to twelve levels deep. I'm currently processing them recursively. I've tried saving the node I'm currently processing (<c01>, for example) in a variable called RN, then running <xsl:apply-templates select=".//bioghist[name(..)=name($RN) or name(../..)=name($RN)]"/>. This works for some forms of EAD where the <bioghist> tag isn't nested too deeply. It will fail, though, if it ever has to process an EAD file created by someone who loves wrapping tags in other tags (which is totally fine according to the EAD standard).
What I'd love is some way of saying:
Get any <bioghist> tag anywhere below the current node but
don't dig deeper if you hit a <c??> tag
I hope that I've made the situation clear. Please let me know if I've left anything ambiguous. Any assistance you can provide would be greatly appreciated. Thanks.
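The rule "collect every <bioghist> below the current node, but stop at nested <c??> components" is a pruned descendant search. Below is a minimal sketch of that traversal using Python's standard-library ElementTree. It is not XSLT and it is not the asker's eventual solution. It assumes that components are named c or c01 to c12, and that a <bioghist> wrapped in another <bioghist> should be read as a single value (the outermost element's text). The names own_bioghist and bioghist_text are hypothetical helpers.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: EAD components are <c> or numbered <c01> .. <c12>.
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])?")


def own_bioghist(node):
    """Return the outermost <bioghist> elements below node, without
    descending into nested component elements."""
    found = []
    for child in node:
        if COMPONENT.fullmatch(child.tag):
            continue  # a lower-level record; handled when that component is processed
        if child.tag == "bioghist":
            found.append(child)  # keep the wrapper; nested <bioghist> text is read via itertext()
        else:
            found.extend(own_bioghist(child))  # look through <descgrp>, <dsc>, etc.
    return found


def bioghist_text(node):
    """Join the text of this level's <bioghist> elements ("" if there are none)."""
    return " ".join("".join(b.itertext()).strip() for b in own_bioghist(node))


# The second example's structure, with the values from the expected output.
SECOND_EXAMPLE = """
<ead><eadheader><archdesc>
  <descgrp><bioghist><bioghist>first</bioghist></bioghist></descgrp>
  <dsc><c01>
    <c02><descgrp><bioghist>third</bioghist></descgrp></c02>
    <bioghist>second</bioghist>
  </c01></dsc>
</archdesc></eadheader></ead>
"""

if __name__ == "__main__":
    archdesc = ET.fromstring(SECOND_EXAMPLE).find("eadheader/archdesc")
    c01 = archdesc.find("dsc/c01")
    c02 = c01.find("c02")
    for record, node in [("primary_record", archdesc), ("child_record", c01), ("grandchild_record", c02)]:
        print(record, repr(bioghist_text(node)))
```

For a component with no <bioghist> of its own (the <c01> in the third example), bioghist_text returns an empty string, which corresponds to the empty <biography_history> in the expected output.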

As the requirements are rather vague, any answer only reflects the guesses its author has made.
Here is mine:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:my="my:my" exclude-result-prefixes="my">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <my:names>
    <n>primary_record</n>
    <n>child_record</n>
    <n>grandchild_record</n>
  </my:names>
  <xsl:variable name="vNames" select="document('')/*/my:names/*"/>
  <xsl:template match="/">
    <xsl:apply-templates select=
     "//bioghist[following-sibling::node()[1]
                   [self::descgrp]
                ]"/>
  </xsl:template>
  <xsl:template match="bioghist">
    <xsl:variable name="vPos" select="position()"/>
    <xsl:element name="{$vNames[position() = $vPos]}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<ead>
  <eadheader>
    <archdesc>
      <bioghist>first</bioghist>
      <descgrp>
        <bioghist>first</bioghist>
        <bioghist>
          <bioghist>first</bioghist>
        </bioghist>
      </descgrp>
      <dsc>
        <c01>
          <bioghist>second</bioghist>
          <descgrp>
            <bioghist>second</bioghist>
            <bioghist>
              <bioghist>second</bioghist>
            </bioghist>
          </descgrp>
          <c02>
            <bioghist>third</bioghist>
            <descgrp>
              <bioghist>third</bioghist>
              <bioghist>
                <bioghist>third</bioghist>
              </bioghist>
            </descgrp>
          </c02>
        </c01>
      </dsc>
    </archdesc>
  </eadheader>
</ead>
the wanted result is produced:
<primary_record>first</primary_record>
<child_record>second</child_record>
<grandchild_record>third</grandchild_record>

I worked out a solution on my own and posted it in this Q&A, because the solution is quite specific to a certain XML standard and seemed out of the scope of this question. If people feel it would be best to post it here as well, I can update this answer with a copy.

Related

Passing a node as parameter to a XSL stylesheet

I need to pass a node as a parameter to an XSL stylesheet. The issue is that the parameter gets sent as a string. I have seen several SO questions on this topic, and I know that the solution (in XSLT 1.0) is to use an extension node-set() function to transform the string into a node-set.
My issue is that I am using eXist-db, and I cannot get its XSLT processor to locate any such function. I have tried the EXSLT node-set() from the namespace http://exslt.org/common, as well as both the Saxon and Xalan versions. (I think eXist used to use Xalan, but now it might use Saxon.)
Are these extensions even allowed in the XSLT processor used by eXist? If not, is there something else I can do?
To reference or transform documents from the database, pass the path as a parameter to the transformation, and then refer to it through an xsl:param and a variable:
(: xquery :)
let $path-to-document := "/db/test/testa.xml"
let $stylesheet :=
  <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:param name="source" required="no"/>
    <xsl:variable name="error"><error>doc not available</error></xsl:variable>
    <xsl:variable name="theDoc" select="if (doc-available($source)) then doc($source) else $error"/>
    <xsl:template match="/">
      <result><xsl:value-of select="$source"/> - <xsl:value-of select="node-name($theDoc/*)"/></result>
    </xsl:template>
  </xsl:stylesheet>
return transform:transform(<dummy/>, $stylesheet,
  <parameters><param name="source" value="xmldb:exist://{$path-to-document}"/></parameters>)
Based on Martin Honnen's comments, I don't think it is possible to pass an XML node via the <parameters> structure of the transform:transform() function in eXist. The function seems to strip away any XML tags passed to it as a value.
As a workaround, I will wrap both my input XML and my parameter XML in a root element and pass that as the input to the transform function.
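The wrapping workaround can be illustrated outside eXist with Python's ElementTree. Both documents become children of one root element, so the stylesheet can reach the "parameter" as ordinary nodes of its input. The element names wrapper, input and param, and the function wrap_for_transform, are hypothetical. In eXist the wrapping itself would be done in XQuery before calling transform:transform().

```python
import xml.etree.ElementTree as ET


def wrap_for_transform(input_xml: str, param_xml: str) -> ET.Element:
    """Combine the real input and the would-be parameter under one root.

    Hypothetical layout: <wrapper><input>...</input><param>...</param></wrapper>.
    A stylesheet would then read /wrapper/input/* and /wrapper/param/* instead
    of receiving the node through a string-valued xsl:param.
    """
    root = ET.Element("wrapper")
    ET.SubElement(root, "input").append(ET.fromstring(input_xml))
    ET.SubElement(root, "param").append(ET.fromstring(param_xml))
    return root


if __name__ == "__main__":
    combined = wrap_for_transform("<doc><item>1</item></doc>", "<settings mode='full'/>")
    print(ET.tostring(combined, encoding="unicode"))
```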

Scope (root node, context) of XSLT "key" Element

I have an XSLT key defined. I need to access the key from within a for-each loop, where that loop is processing a node-set that is outside the scope of where the key was defined.
Snippet, where I've marked two lines, one which works and one which does not:
<xsl:value-of select="key('name', 'use')"/> <!-- works -->
<xsl:for-each select="$outOfScopeNodeSet">
  <xsl:value-of select="key('name', 'use')"/> <!-- does not work -->
</xsl:for-each>
Is there a way to access the key from within the for-each loop?
XSLT 1.0, msxsl engine.
(I could not think of a reasonable way to provide a full working example for this. I'm also not sure of the correct terminology, such as "scope" - perhaps if I knew the correct terminology I'd be able to find my answer already. If the question is not clear enough please let me know and I'll try to edit it into better shape.)
In XSLT 1.0, the key() function only searches the document that contains the context node, so keys do not work across documents. It seems that your $outOfScopeNodeSet contains nodes whose root node is different from the root node of the XML document being processed (perhaps created by a node-set() extension function such as msxsl:node-set()?). The key, however, is supposed to fetch a value from the processed XML document.
To resolve this problem, switch the context back to the processed XML document before calling the key() function, for example:
<xsl:variable name="root" select="/" />
<xsl:for-each select="$outOfScopeNodeSet">
  <xsl:variable name="use" select="some-value" />
  <xsl:for-each select="$root">
    <xsl:value-of select="key('name', $use)"/>
  </xsl:for-each>
</xsl:for-each>

Value-of select in <a href=> (XSLT)

I'm trying to construct a link with:
<xsl:element name="a">
  <xsl:attribute name="href">
    <xsl:value-of select="concat('file:///', substring-before('%RolesPath%', 'roles'),'Flores.chm')"/>
  </xsl:attribute>
  Help
</xsl:element>
but I get the error:
File file:///Flores.chm not found
I'm pretty sure that the variable %RolesPath% works fine; I use it elsewhere in my code. And if I use only the following in my code:
<xsl:value-of select="concat('file:///', substring-before('%RolesPath%', 'roles'),'Flores.chm')"/>
I get
file:///C:\Flores\Flores.chm
which is the right path. Where am I making a mistake?
Edit: %RolesPath% stores the path to a specific folder of the program that works with this code. In my case, %RolesPath% stores "C:\Flores\roles\".
To clarify my problem: I need to open a file (Flores.chm) in the program's root folder. The program can be installed anywhere on the PC, and the only way I can get the path is probably via %RolesPath%.
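What you are passing to substring-before() is just a string literal ('%RolesPath%'). It appears that you are trying to use a Windows environment variable, but XSLT does not expand environment variables inside string literals, so this isn't going to work the way you're using it. The literal also contains "Roles" with a capital R, so substring-before('%RolesPath%', 'roles') returns an empty string. That is why the link comes out as file:///Flores.chm.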
I think you have 2 options:
Option 1
Pass the value of the environment variable as an xsl:param when you call the stylesheet. This would work in either XSLT 1.0 or 2.0.
You would need the xsl:param:
<xsl:param name="RolesPath"/>
and this is how you would reference it:
<a href="{concat('file:///', substring-before($RolesPath, 'roles'),'Flores.chm')}"/>
Option 2
Use the environment-variable() function. This would only work with an XSLT 3.0 processor, such as Saxon-PE or EE.
Example:
<a href="{concat('file:///', substring-before(environment-variable('RolesPath'), 'roles'),'Flores.chm')}"/>
Here's another example of environment-variable() to show the function actually working:
XSLT 3.0
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <environment-variable name="TEMP" value="{environment-variable('TEMP')}"/>
  </xsl:template>
</xsl:stylesheet>
Output (when applied to any well-formed XML)
<environment-variable name="TEMP" value="C:\Users\dhaley\AppData\Local\Temp"/>
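You can check why the link comes out as file:///Flores.chm by emulating the href expression outside XSLT. The sketch below is Python, not XSLT. substring_before mimics XPath substring-before(), which returns an empty string when the separator is absent, and help_href is a hypothetical helper that builds the same URL. Passing the literal '%RolesPath%' reproduces the error. Passing the expanded value gives the intended path. Under Option 1, the calling program would read that value from the environment and hand it to the stylesheet as the RolesPath parameter; reading it via os.environ here is an assumption about the host.

```python
import os


def substring_before(s: str, sep: str) -> str:
    """Mimic XPath substring-before(): "" when sep does not occur in s."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def help_href(roles_path: str) -> str:
    """Same result as concat('file:///', substring-before($RolesPath, 'roles'), 'Flores.chm')."""
    return "file:///" + substring_before(roles_path, "roles") + "Flores.chm"


if __name__ == "__main__":
    # The literal string: "Roles" has a capital R, so 'roles' is never found.
    print(help_href("%RolesPath%"))
    # What a host program might pass as the RolesPath parameter (falls back to the asker's value).
    print(help_href(os.environ.get("RolesPath", "C:\\Flores\\roles\\")))
```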
Use this shorter expression:
<a href="file:///{substring-before($RolesPath, 'roles')}Flores.chm"/>
where $RolesPath is passed as an external, global parameter to the transformation.
How exactly to pass an external parameter to the transformation varies from one XSLT processor to another; see your XSLT processor's documentation. Some XSLT processors also allow string-typed parameters to be passed to the transformation from a command-line execution utility.

Getting the value of an attribute in an XML document using XSL

I'm trying to get the value iWantToGetThis.jpg and put it into an <img> during my XSL transformation. This is how my XML is structured:
<article>
  <info>
    <mediaobject>
      <imageobject>
        <imagedata fileref='iWantToGetThis.jpg'/>
      </imageobject>
    </mediaobject>
  </info>
</article>
Here's what I've come up with for the XSL:
<xsl:template name="user.header.content">
  <xsl:stylesheet xmlns:d='http://docbook.org/ns/docbook'>
    <img><xsl:attribute name="src">../icons/<xsl:value-of select='ancestor-or-self::d:article/info/mediaobject/imageobject/imagedata/@fileref' /></xsl:attribute></img>
  </xsl:stylesheet>
</xsl:template>
The image is being added to the output, but the src attribute is set to "../icons/", so I'm assuming it's not finding the fileref attribute in the XML. This looks perfectly valid to me, so I'm not sure what I'm missing.
I am not sure how you can get anything back at all, because that does not look like a valid XSLT document (I would expect the error "Keyword xsl:stylesheet may not contain img.").
However, you may be showing only a fragment of the code. If so, the issue may be that you have specified the namespace prefix only for the article element, when you need to specify it for every element step in your XPath. Try this:
<xsl:value-of
  select="ancestor-or-self::d:article/d:info/d:mediaobject/d:imageobject/d:imagedata/@fileref"/>
Another possible problem may be because you are using the 'ancestor-or-self' xpath axis to find the attribute. This would only work if your current context was already on the article element, or one of its descendants.
As a side note, you can simplify the code by using an attribute value template here:
<img src="../icons/{ancestor-or-self::d:article/d:info/d:mediaobject/d:imageobject/d:imagedata/@fileref}" />
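The namespace point can be reproduced with Python's ElementTree, whose path syntax also treats an unprefixed step as "no namespace". The sketch assumes the source is DocBook 5, meaning every element is in the http://docbook.org/ns/docbook namespace, as the question's xmlns:d declaration suggests. It illustrates the rule but is not XSLT. The fileref helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: a DocBook 5 document, so every element is in the DocBook namespace.
DOC = """<article xmlns="http://docbook.org/ns/docbook">
  <info><mediaobject><imageobject>
    <imagedata fileref="iWantToGetThis.jpg"/>
  </imageobject></mediaobject></info>
</article>"""

NS = {"d": "http://docbook.org/ns/docbook"}


def fileref(root, prefixed: bool):
    """Look up imagedata/@fileref with or without the d: prefix on each step."""
    steps = ["info", "mediaobject", "imageobject", "imagedata"]
    path = "/".join(("d:" + s) if prefixed else s for s in steps)
    node = root.find(path, NS)
    return None if node is None else node.get("fileref")


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(fileref(root, prefixed=False))  # unprefixed steps look in no namespace and match nothing
    print(fileref(root, prefixed=True))   # prefixing every step finds the attribute
```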

XSLT - Two separate data sources merged into one XSLT

I have two XML data sources that are completely separate: UserDetails.xml and UserSites.xml.
The UserDetails.xml contains:
<a:UserDetails>
  <a:user>
    <a:username>Clow</a:username>
    <a:userid>9834</a:userid>
  </a:user>
  <a:user>
    <a:username>Adam</a:username>
    <a:userid>9867</a:userid>
  </a:user>
</a:UserDetails>
UserSites.xml contains:
<a:UserSites>
  <a:site>
    <a:createdby>9834</a:createdby>
    <a:type>blog</a:type>
  </a:site>
  <a:site>
    <a:createdby>9867</a:createdby>
    <a:type>web</a:type>
  </a:site>
</a:UserSites>
What I would like to do is use data in both of these data sources to indicate which users have sites created and what type of site they have.
How can this be made possible in XSLT 1.0?
Use the document() function to access nodes in an external document.
For example, the following stylesheet applied to UserDetails.xml:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:a="a">
  <xsl:template match="/">
    <test>
      <xsl:value-of
       select="document('UserSites.xml')/a:UserSites/a:site/a:createdby"/>
    </test>
  </xsl:template>
</xsl:stylesheet>
This outputs the following result from UserSites.xml. In XSLT 1.0, xsl:value-of on a node-set returns the string value of the first node only.
9834
Note: Your example XML is not well-formed, so I had to make minor adjustments before processing.
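The answer above only reads UserSites.xml. The remaining step is the join itself: matching a:site/a:createdby against a:user/a:userid. Below is a minimal sketch of that join logic in Python's ElementTree, run on well-formed copies of the two files. It assumes the a prefix is bound to the namespace URI "a", as in the stylesheet above. In XSLT 1.0, with an a:user as the current node, a predicate such as document('UserSites.xml')/a:UserSites/a:site[a:createdby = current()/a:userid]/a:type would select the same site types. The function name sites_per_user is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"a": "a"}  # assumption: same namespace URI as the answer's xmlns:a="a"

USER_DETAILS = """<a:UserDetails xmlns:a="a">
  <a:user><a:username>Clow</a:username><a:userid>9834</a:userid></a:user>
  <a:user><a:username>Adam</a:username><a:userid>9867</a:userid></a:user>
</a:UserDetails>"""

USER_SITES = """<a:UserSites xmlns:a="a">
  <a:site><a:createdby>9834</a:createdby><a:type>blog</a:type></a:site>
  <a:site><a:createdby>9867</a:createdby><a:type>web</a:type></a:site>
</a:UserSites>"""


def sites_per_user(details_xml: str, sites_xml: str) -> dict:
    """Map each username to the types of the sites created under their userid."""
    sites = ET.fromstring(sites_xml)
    # Index site types by creator id (the role a predicate or lookup plays in XSLT).
    types_by_creator = {}
    for site in sites.findall("a:site", NS):
        creator = site.findtext("a:createdby", namespaces=NS)
        types_by_creator.setdefault(creator, []).append(site.findtext("a:type", namespaces=NS))
    details = ET.fromstring(details_xml)
    # Users without any site get an empty list.
    return {
        user.findtext("a:username", namespaces=NS):
            types_by_creator.get(user.findtext("a:userid", namespaces=NS), [])
        for user in details.findall("a:user", NS)
    }


if __name__ == "__main__":
    print(sites_per_user(USER_DETAILS, USER_SITES))
```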