XSLT workflow with variable number of source files - xslt

I have a bunch of XML files with a fixed, country-based naming schema: report_en.xml, report_de.xml, report_fr.xml, etc. Now I want to write an XSLT stylesheet that reads each of these files via the document() XPath function, extracts some values and generates one XML file with a summary. My question is: How can I iterate over the source files without knowing the exact names of the files I will process?
At the moment I'm planning to generate an auxiliary XML file that holds all the file names and use the auxiliary XML file in my stylesheet to iterate. The file list will be generated with a small PHP or bash script. Are there better alternatives?
I am aware of XProc, but investing much time into it is not an option for me at the moment. Maybe someone can post an XProc solution. Preferably the solution includes workflow steps where the reports are downloaded as HTML and tidied up :)
I will be using Saxon as my XSLT processor, so if there are Saxon-specific extensions I can use, these would also be OK.

You can use the standard XPath 2.x collection() function, as implemented in Saxon 9.x.
The Saxon implementation allows a search pattern in the URI argument of the function, so after the directory path you can specify a pattern that matches any filename starting with report_ and ending with .xml.
Example:
This XPath expression:
collection('file:///c:/?select=report_*.xml')
selects the document nodes of every XML document that resides in c:\ in a file whose name starts with report_, continues with zero or more characters, and ends with .xml.
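As a rough sketch of how this could be used for the summary, the stylesheet below iterates over the collection and writes one element per report. The report-dir parameter, the main template name and the summary/report output elements are made up for illustration, and the extraction logic is only a placeholder; it assumes Saxon 9.x and its ?select= pattern as described above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: URI of the directory holding the report_*.xml files -->
  <xsl:param name="report-dir" select="'file:///c:/'"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Named entry template, so no primary source document is required -->
  <xsl:template name="main">
    <summary>
      <!-- One iteration per document node returned by collection() -->
      <xsl:for-each select="collection(concat($report-dir, '?select=report_*.xml'))">
        <report source="{document-uri(.)}">
          <!-- Placeholder extraction: name of the root element of each report -->
          <xsl:value-of select="name(*)"/>
        </report>
      </xsl:for-each>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon's command line, a stylesheet like this would typically be started at the named template (the -it:main option) rather than with a source document.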

The answer by Dimitre looks like the quickest solution in your case. But since you asked, here is an XProc alternative:
<p:declare-step version="1.0" xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" exclude-inline-prefixes="#all" name="main">
<!-- create context for p:variable with base-uri pointing to the location of this file -->
<p:input port="source"><p:inline><x/></p:inline></p:input>
<!-- any params passed in from outside get passed through to p:xslt automatically! -->
<p:input port="parameters" kind="parameter"/>
<!-- configuration options for steering input and output -->
<p:option name="input-dir" select="'./'"/>
<p:option name="input-filter" select="'^report_.*\.xml$'"/>
<p:option name="output-dir" select="'./'"/>
<!-- resolve any path to base uri of this file, to make sure they are absolute -->
<p:variable name="abs-input-dir" select="resolve-uri($input-dir, base-uri(/))"/>
<p:variable name="abs-output-dir" select="resolve-uri($output-dir, base-uri(/))"/>
<!-- first step: get list of all files in input-dir -->
<p:directory-list>
<p:with-option name="path" select="$abs-input-dir"/>
</p:directory-list>
<!-- iterate over each file to load it -->
<p:for-each>
<p:iteration-source select="//c:file[matches(@name, $input-filter)]"/>
<p:load>
<p:with-option name="href" select="resolve-uri(/c:file/@name, $abs-input-dir)"/>
</p:load>
</p:for-each>
<!-- wrap all files in a reports element to be able to hand it in to the xslt as a single input document -->
<p:wrap-sequence wrapper="reports"/>
<!-- apply the xslt (stylesheet is loaded below) -->
<p:xslt>
<p:input port="stylesheet">
<p:pipe step="style" port="result"/>
</p:input>
</p:xslt>
<!-- store the result in the output dir -->
<p:store>
<p:with-option name="href" select="resolve-uri('merged-reports.xml', $abs-output-dir)"/>
</p:store>
<!-- loading of the stylesheet.. -->
<p:load href="process-reports.xsl" name="style"/>
</p:declare-step>
Store the above as process-reports.xpl, for instance. You can run it with XMLCalabash (http://xmlcalabash.com/download/) like this:
java -jar calabash.jar process-reports.xpl input-dir=./ output-dir=./
The above code assumes a process-reports.xsl that takes one document that wraps all reports, and does a bit of processing on it. You could do the processing in pure XProc as well, but you might prefer it this way.
You could also move the p:xslt step up to within the p:for-each (below the p:load), that would cause the xslt to be applied to each report individually.
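For illustration, a process-reports.xsl for the wrapped input could look roughly like the sketch below. It assumes the single input document has the reports wrapper element produced by p:wrap-sequence, with each original report's root element as a child; the summary/report output and the counted values are placeholders, not part of the original pipeline.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- The pipeline hands in one document: <reports> wrapping every loaded report -->
  <xsl:template match="/reports">
    <summary>
      <!-- Each child element is the root element of one report file -->
      <xsl:for-each select="*">
        <!-- Placeholder values: root element name and number of descendant elements -->
        <report root="{name()}" elements="{count(.//*)}"/>
      </xsl:for-each>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```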
Good luck!

Related

eXist-db / XSLT / Saxon collection() slow as molasses (or errors out with memory limit)

Coming from this question, I managed one entirely unsatisfactory solution for accessing an eXist-DB collection() from an XSLT 2.0 document loaded from within an eXist-db/Xquery transformation function:
The XSLT file declares a variable :
<xsl:variable name="coll" select="collection('xmldb:exist:///db/apps/deheresi/data/collection_ms609.xml')"/>
This points to a catalog xml file I created (per Saxon documentation) that looks like this, in order to load the actual collection:
<collection stable="true">
<doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0001.xml"/>
<doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0002.xml"/>
...
...
<doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0709.xml"/>
<doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0710.xml"/>
</collection>
This allows the XSLT file to use a key that needs to search across all these files:
<xsl:key name="correspkey" match="tei:seg[@type='dep_event' and @corresp]" use="@corresp"/>
<xsl:variable name="correspvar" select="self::seg[@type='dep_event' and @corresp]/@corresp"/>
<xsl:value-of select="$coll/(key('correspid',$correspvar) except $correspvar)/@id" separator=", "/>
As it stands, if I have 50 documents in the catalog, I get a result in 2 minutes; with all 710 I get a java GC error after 4 minutes.
I have set indexes on relevant nodes in eXist-DB, but this does nothing to performance. It seems to me Saxon is working 'outside' eXist-DB's optimisations, treating eXist-DB as a simple file system.
(For what it's worth, setting href="/db/apps/deheresi/data/ms609_0001.xml" does not let Saxon see the documents.)
I suspect all of this is why the eXist-DB documentation is non-existent.
As it goes, I am looking for solutions for intensive searches of collections from within XSLT 2.0 loaded within eXist-DB by Xquery transform().
If anything, I hope this post helps future searchers encountering the same problem.
The general architectural principle is: try to move the searching closer to the data. In this case this means: use eXist to find the documents of interest, don't extract every possible candidate document from eXist and then ask Saxon to do the searching. Select the actual documents of interest in an eXist XQuery, and then pass the list of these documents to Saxon in a stylesheet parameter.
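On the stylesheet side, receiving such a list could be sketched as follows. This assumes the XQuery passes the selected document URIs as a single whitespace-separated string; the doc-uris parameter name and the selected/doc output elements are hypothetical, and how the parameter is supplied depends on how eXist's transform function is called.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: whitespace-separated URIs of the documents
       that the eXist XQuery has already selected -->
  <xsl:param name="doc-uris" select="''"/>

  <!-- Load only the preselected documents instead of the whole collection -->
  <xsl:variable name="coll"
      select="for $u in tokenize(normalize-space($doc-uris), ' ') return doc($u)"/>

  <xsl:template match="/">
    <selected count="{count($coll)}">
      <xsl:for-each select="$coll">
        <doc uri="{document-uri(.)}"/>
      </xsl:for-each>
    </selected>
  </xsl:template>

</xsl:stylesheet>
```

Any key() lookups would then run against $coll, which holds a handful of documents rather than all 710.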

xs3p: handling include-tags

I just wanted to generate a documentation of a schema with xs3p.
The problem is, as far as I understand it, that the schema is split into several files and that xs3p did not process the include-tags of the master file: The result is a documentation containing only the root element.
What did I do exactly?
I unzipped the xs3p-download into a certain directory
I copied all schema files into the directory
I called saxonb-xslt master.xsd xs3p.xsl >doku.html (under Ubuntu Trusty, if that matters)
Can you give me any help? I assume there are two possible approaches to solving the problem:
Making xs3p process the include-tags
Integrating all xsd-files into a single one — how would this work?
Thank you in advance!
You have to set the following xsl:param values in xs3p.xsl:
<!-- If 'true', searches 'included' schemas for schema components
when generating links and XML Instance Representation tables. -->
<xsl:param name="searchIncludedSchemas">true</xsl:param>
<!-- If 'true', searches 'imported' schemas for schema components
when generating links and XML Instance Representation tables. -->
<xsl:param name="searchImportedSchemas">true</xsl:param>
<!-- File containing the mapping from file locations of external
(e.g. included, imported, refined) schemas to file locations
of their XHTML documentation. -->
<xsl:param name="linksFile">xs3p_links.xml</xsl:param>
Your included/imported schemas must already have been transformed into HTML files (for example my_transformed_included_schema.html), and you need to define an xs3p_links.xml file that maps each imported/included schema to the location of its documentation, such as:
<?xml version="1.0"?>
<links xmlns="http://titanium.dstc.edu.au/xml/xs3p">
<schema file-location="my-included-schema.xsd" docfile-location="./my_transformed_included_schema.html"/>
</links>
Hope that'll help !

find other files in same directory as XML file

I'm trying to get content from another XML file in the same directory as my XML file. However, I don't know how to get the uri of the source XML. The XSLT keeps relating to its own directory.
How can I get the URI of the source XML?
I would recommend document-uri() rather than base-uri(). It will usually be the same, but base-uri() is affected by the xml:base attribute and by use of XML external entities, while document-uri() is not.
You can use the base-uri() function:
<!-- your external XML -->
<xsl:variable name="doc" select="document('http://www.xyz.com./path/your-doc.xml')"/>
<!-- the base URI of your external XML -->
<xsl:variable name="doc-base-uri" select="base-uri($doc)"/>
I found an answer that worked for me.
I got it using base-uri(.).
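A minimal sketch of reading a sibling file relative to the source document rather than the stylesheet; the file name other.xml and the result element are made up for illustration. The second argument of document() supplies the node whose base URI is used to resolve the relative reference.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Resolve 'other.xml' (hypothetical name) against the source document,
         not against the stylesheet location -->
    <xsl:variable name="sibling" select="document('other.xml', .)"/>

    <!-- resolve-uri('.', ...) yields the directory part of the source URI -->
    <result source-dir="{resolve-uri('.', base-uri(.))}">
      <xsl:copy-of select="$sibling/*"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

The two-argument form of document() is also available in XSLT 1.0, whereas base-uri() and resolve-uri() require XSLT 2.0.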

Processing the output of another XSLT Stylesheet

I have an XSLT stylesheet that produces some output in XML. I want to process that output with another stylesheet. Is there a way to tell the latter stylesheet to "run and use" the results from the former?
There is not, as far as I know, a standard way to tell an XSLT processor to run another stylesheet on given input and do something with the output. In some cases you can process the input against one set of templates and save the result in a variable, then apply a different set of templates to the value of the variable, something like this:
<xsl:template match="/">
<xsl:variable name="temp">
<xsl:apply-templates mode="first-pass"/>
</xsl:variable>
<xsl:apply-templates select="$temp" mode="second-pass"/>
</xsl:template>
This assumes you're running XSLT 2.0. In XSLT 1.0 you will need a processor that supports the node-set extension (many do), and you'll need to change the reference to $temp to something like exsl:node-set($temp).
As you will perceive, this won't work very well if your two stylesheets both use the default mode and operate on overlapping sets of element types. So some XSLT processors have added extensions to provide the kind of functionality you describe (see, for example, discussions of the Xalan pipe:pipeDocument extension element).
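For the XSLT 1.0 case, a complete two-pass stylesheet might look like the sketch below. It assumes a processor that supports the EXSLT node-set function (exsl:node-set in the http://exslt.org/common namespace); the item, entry and line element names are invented for the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:template match="/">
    <!-- First pass: the result is a result tree fragment in XSLT 1.0 -->
    <xsl:variable name="temp">
      <xsl:apply-templates mode="first-pass"/>
    </xsl:variable>
    <!-- Convert the fragment to a node-set so templates can be applied to it -->
    <xsl:apply-templates select="exsl:node-set($temp)" mode="second-pass"/>
  </xsl:template>

  <!-- First pass (hypothetical): rename item elements to entry -->
  <xsl:template match="item" mode="first-pass">
    <entry><xsl:apply-templates mode="first-pass"/></entry>
  </xsl:template>

  <!-- Second pass (hypothetical): turn each entry into a line -->
  <xsl:template match="entry" mode="second-pass">
    <line><xsl:value-of select="."/></line>
  </xsl:template>

</xsl:stylesheet>
```

Keeping each pass in its own mode is what prevents the two sets of templates from colliding.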
Of course, you can also handle the pipe outside of XSLT. The simplest way to do it depends upon the environment you are running in.
If you're running XSLT from an operating system shell and your XSLT processor accepts input on stdin, you can pipe the output from one stylesheet into the other:
xsltproc a.xsl in.xml | xsltproc b.xsl - > out.xml
And as mohammed moh has already pointed out, many scripting environments make it possible to do similar things: he mentions PHP, and of course there's XProc.
Yes, you can. You must transform the source node into a DOMDocument. I don't know which programming language you are using; in PHP, for example, the method is transformToDoc(). After the transformation, you can run a new XSLT stylesheet on the resulting DOMDocument.

Dynamic XSL file

I have 3 XSL files which have paths in them to something like C:\templates\Test\file.pdf
This path isn't always going to be the same, and rather than having it hard coded in the XSL, I'd like the path C:\templates\test\ to be replaced with a tag [BASEPATH]. When I read the XSL file into the XslTransform object (yes, I know it's been deprecated; I may move over to XslCompiledTransform at the same time), I'd like the tag [BASEPATH] to be replaced with the absolute file path of the web folder (or Server.MapPath("~"), seeing as it is in .NET).
I thought I might be able to make an XSLLoader aspx page which takes the name of the XSL file through the querystring and then returns the XSL file with an XML content type. When I try this, I get a 503 error though, so I'm not sure if you can pass URLs like this into the XslTransform.Load method.
Any ideas what to do?
Have you looked at XSL parameters?
<xsl:param name="basepath" select="'C:\Users\Graeme\'" />
<xsl:value-of select="concat($basepath, 'test.pdf')" />
Then, most decent XSLT engines have a way to set a root level parameter from outside.