XSLT - Two separate data sources merged into one XSLT - xslt

I have two XML data sources which are completely separate: UserDetails.xml and UserSites.xml.
The UserDetails.xml contains:
<a:UserDetails>
<a:user>
<a:username>Clow</a:username>
<a:userid>9834</a:userid>
</a:user>
<a:user>
<a:username>Adam</a:username>
<a:userid>9867</a:userid>
</a:user>
</a:UserDetails>
UserSites.xml contains:
<a:UserSites>
<a:site>
<a:createdby>9834</a:userid>
<a:type>blog</a:type>
</a:site>
<a:site>
<a:createdby>9867</a:username>
<a:type>web</a:type>
</a:site>
What I would like to do is use data in both of these data sources to indicate which users have sites created and what type of site they have.
How can this be made possible in XSLT 1.0?

Use the document() function to access nodes in an external document.
For example, the following stylesheet applied to UserDetails.xml:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:a="a">
<xsl:template match="/">
<test>
<xsl:value-of
select="document('UserSites.xml')/a:UserSites/a:site/a:createdby"/>
</test>
</xsl:template>
</xsl:stylesheet>
It outputs the following result from UserSites.xml (in XSLT 1.0, xsl:value-of returns the string value of only the first selected node):
9834
Note: Your example XML is not well-formed, so I had to make minor adjustments before processing.
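To go from reading a single value to the join the question asks for, the same document() call can be combined with a lookup on the user id. The following is only a sketch: it assumes both files have been made well-formed, that the a prefix is bound to the namespace URI "a" as in the stylesheet above, and that UserSites.xml sits next to the stylesheet. The output element names (users, user, site) are made up for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="a"
    exclude-result-prefixes="a">
  <xsl:output method="xml" indent="yes"/>

  <!-- Load the second source once; a relative URI given as a string
       is resolved against the location of this stylesheet. -->
  <xsl:variable name="sites"
      select="document('UserSites.xml')/a:UserSites/a:site"/>

  <!-- Apply this stylesheet to UserDetails.xml. -->
  <xsl:template match="/a:UserDetails">
    <users>
      <xsl:for-each select="a:user">
        <xsl:variable name="id" select="a:userid"/>
        <!-- Hypothetical output shape: one user element per user. -->
        <user name="{a:username}">
          <!-- Every site whose createdby equals this user's id. -->
          <xsl:for-each select="$sites[a:createdby = $id]">
            <site type="{a:type}"/>
          </xsl:for-each>
        </user>
      </xsl:for-each>
    </users>
  </xsl:template>
</xsl:stylesheet>
```

Users without any site would still be listed, just with an empty user element.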

Related

In XSLT 2.0, can I import a common XSLT file into two files where one outputs XML and the other outputs HTML?

I'm using xslt 2.0. Can I have a common xslt file that is imported by two main stylesheets, where one of those outputs html and the other outputs xml?
For example, say I have common.xsl. It transforms xml to xml.
Then I have main_output_xml.xsl. This will import common.xsl and its output format will be xml.
I also have main_output_html.xsl. This will also import common.xsl, but it will have an output format of html.
Is this possible?
As Ken Holman says, the answer is yes, you can do exactly what you are suggesting.
@Nalaka526 if you read the question then you will see that it only demands a "yes" or "no" answer. The only reason my answer is longer is that SO doesn't allow short answers.
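As a minimal sketch of that setup, using the file names from the question, main_output_html.xsl could look like the following. The indent setting is an assumption added for illustration. main_output_xml.xsl would be identical apart from method="xml".

```xml
<!-- main_output_html.xsl -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import must come before any other top-level declaration. -->
  <xsl:import href="common.xsl"/>

  <!-- Declared in the importing module, so it has higher import
       precedence than any xsl:output found in common.xsl. -->
  <xsl:output method="html" indent="yes"/>

</xsl:stylesheet>
```

Because attributes of an xsl:output declaration in the importing stylesheet take precedence over those in imported modules, common.xsl does not need to know which serialization will be used.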
Short: "Yes".
Long: use named output.
First, define the different output options at the top level of your XSLT.
"20 Serialization.
...
A stylesheet may contain multiple xsl:output declarations and may include or import stylesheet modules that also contain xsl:output declarations. The name of an xsl:output declaration is the value of its name attribute, if any."
<xsl:output name="text" method="text" indent="no" encoding="utf-8" />
<xsl:output name="default" indent="no" method="html" encoding="utf-8" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" />
Then reference one of the named output definitions in the format attribute of xsl:result-document:
<xsl:result-document href="output_file.txt" format="text">
...
</xsl:result-document>
"19.1 Creating Final Result Trees
...
The xsl:result-document instruction defines the URI of the result tree, and may optionally specify the output format to be used for serializing this tree."
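Putting the two pieces together, a complete stylesheet might look roughly like this. It is a sketch only: the output names as-html and as-xml and the file names out.html and out.xml are invented for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Two named output definitions (names are hypothetical). -->
  <xsl:output name="as-html" method="html" indent="yes"/>
  <xsl:output name="as-xml" method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Serialized with the html output method. -->
    <xsl:result-document href="out.html" format="as-html">
      <html>
        <body>
          <p><xsl:value-of select="."/></p>
        </body>
      </html>
    </xsl:result-document>

    <!-- Serialized with the xml output method. -->
    <xsl:result-document href="out.xml" format="as-xml">
      <xsl:copy-of select="*"/>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
```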

Processing hl7 type message using xslt or regex, or combination of two (XSLT 1.0)

I have this HL7-type message that I have to transform using either regex or XSLT, or a combination of the two.
The format of this message is DateTime (as in YYYYMMDDHHMMSS)^UnitName^room^bed|. Each location is terminated with a pipe, so each person can have one or more locations.
The message looks like this (when a patient has only one location):
20130602201605^Some Hospital^ABFG^411|
End xml result should look like this:
<Location>
<item>
<when>20130602201605</when>
<UnitName>Some Hospital</UnitName>
<room>ABFG</room>
<bed>411</bed>
</item>
</Location>
I would probably use substring type of function if it was only one location.
The problem I am running into is when there is more than one. I am relatively new to xslt and regex in general so I don't know how to use recursion in these instances.
So if I have a message like this with multiple locations:
20130601003203^GBMC^XXYZ^110|20130602130600^Sanai^ABC^|20130602150003^John Hopkins^J615^A|
The end result should be:
<Location>
<item>
<when>20130601003203</when>
<UnitName>GBMC</UnitName>
<room>XXYZ</room>
<bed>110</bed>
</item>
<item>
<when>20130602130600</when>
<UnitName>Sanai</UnitName>
<room>ABC</room>
<bed></bed>
</item>
<item>
<when>20130602150003</when>
<UnitName>John Hopkins</UnitName>
<room>J615</room>
<bed>A</bed>
</item>
</Location>
So how would I solve this? Thanks in advance.
Given that your HL7 message is "|^~\&" encoded and not in an XML format, it is not clear how you will be using an XSLT 1.0 processor for your task. Can you describe your processing pipeline in greater detail? Your snippets are not complete messages, and it is not clear whether you will be starting with complete messages or attempting to parse isolated fields handed to a larger processing task through parameters or similar.
If your processing starts with a complete HL7 message, I would suggest looking into the HAPI project, or a similar set of libraries, to have the messages converted from |^~\& to </> format, then invoking your XSLT on that version of the data. (You could also use the HAPI libraries in a full-Java solution. In either case, there are code examples at the HAPI site and at an Apache site on HL7.) If you are not interested in using Java at all, but are open to partial non-XSLT solutions, there are other projects that provide similar serialization options (e.g., Net::HL7 for Perl, nHAPI for VB/C#, etc.).
If you have isolated "|^~\&" encoded data in an otherwise XML formatted file, then I would suggest looking into the str:tokenize function in the XSLT 1.0 exslt functions. (XSLT 2.0 has a built-in tokenize function.) You can have str:tokenize split your data on the field or component separators, then create elements using the tokenized substrings.
Here is a stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:str="http://exslt.org/strings"
extension-element-prefixes="str"
version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="data">
<Location>
<xsl:for-each select="str:tokenize(.,'|')">
<xsl:call-template name="handle-field">
<xsl:with-param name="field" select="."/>
</xsl:call-template>
</xsl:for-each>
</Location>
</xsl:template>
<xsl:template name="handle-field">
<xsl:param name="field"/>
<xsl:variable name="components" select="str:tokenize($field,'^')"/>
<item>
<when><xsl:value-of select="$components[1]"/></when>
<UnitName><xsl:value-of select="$components[2]"/></UnitName>
<room><xsl:value-of select="$components[3]"/></room>
<bed><xsl:value-of select="$components[4]"/></bed>
</item>
</xsl:template>
</xsl:stylesheet>
that runs over this input
<?xml version="1.0" encoding="UTF-8"?>
<data>20130601003203^GBMC^XXYZ^110|20130602130600^Sanai^ABC^|20130602150003^John Hopkins^J615^A|</data>
to produce this output with xsltproc:
<?xml version="1.0"?>
<Location>
<item>
<when>20130601003203</when>
<UnitName>GBMC</UnitName>
<room>XXYZ</room>
<bed>110</bed>
</item>
<item>
<when>20130602130600</when>
<UnitName>Sanai</UnitName>
<room>ABC</room>
<bed/>
</item>
<item>
<when>20130602150003</when>
<UnitName>John Hopkins</UnitName>
<room>J615</room>
<bed>A</bed>
</item>
</Location>
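If the EXSLT extension is not available, the recursion the question asks about can be sketched in plain XSLT 1.0 with substring-before and substring-after. This version assumes the same data wrapper element as above, that every location ends with a pipe, and that every location has exactly four caret-separated components. The template name split-locations is made up for the example. Unlike tokenizers that skip empty tokens, it keeps empty components in place, so an empty room would not shift the bed value.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="data">
    <Location>
      <xsl:call-template name="split-locations">
        <xsl:with-param name="rest" select="normalize-space(.)"/>
      </xsl:call-template>
    </Location>
  </xsl:template>

  <!-- Emits one item for the first pipe-terminated location,
       then calls itself with whatever follows that pipe. -->
  <xsl:template name="split-locations">
    <xsl:param name="rest"/>
    <xsl:if test="contains($rest, '|')">
      <xsl:variable name="loc" select="substring-before($rest, '|')"/>
      <!-- Peel components off the front, one caret at a time. -->
      <xsl:variable name="afterWhen" select="substring-after($loc, '^')"/>
      <xsl:variable name="afterUnit" select="substring-after($afterWhen, '^')"/>
      <item>
        <when><xsl:value-of select="substring-before($loc, '^')"/></when>
        <UnitName><xsl:value-of select="substring-before($afterWhen, '^')"/></UnitName>
        <room><xsl:value-of select="substring-before($afterUnit, '^')"/></room>
        <bed><xsl:value-of select="substring-after($afterUnit, '^')"/></bed>
      </item>
      <!-- Recurse on the remainder; stops when no pipe is left. -->
      <xsl:call-template name="split-locations">
        <xsl:with-param name="rest" select="substring-after($rest, '|')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Text after the last pipe is ignored, so a final location without a trailing pipe would be dropped under these assumptions.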
Your source message is a plain string, so you need to create a parser that uses regex to split the message first on pipes and then on carets. Refer to "Unable to parse ^ character", which has my original code for the parser; the solution there gives a different approach to it.
After you have the individual elements, you need to add them to your XML as nodes.

Why is my xsl processing instruction missing a question mark?

I want to create an HTML document with a PHP block (just for learning purposes) from an XSL transformation of an XML document. I am using the <xsl:processing-instruction> tag.
<xsl:stylesheet version="1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:processing-instruction name="php">
<xsl:text>
setcookie("cookiename", "cookievalue");
echo "";
</xsl:text>
</xsl:processing-instruction>
<html>
<head>
<meta charset="utf-8" />
</head>
<body>
<xsl:apply-templates />
</body>
</html>
</xsl:template>
<xsl:template match="pagina">
<xsl:for-each select="paragraf">
<p>
<xsl:value-of select="."/>
</p>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The result is:
<?php
setcookie("ceva", "textceva");
echo "";>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
</head>
<body>
<p>
text 1
</p>
<p>
text 2
</p>
</body>
</html>
Why is the second question mark missing? I was expecting something like <?php setcookie(...).. ?> .
It's because your PI (processing instruction) is being serialized as an SGML processing instruction (HTML is SGML-based), which ends with > rather than ?>. The default output method for XSLT is normally XML, but XSLT 1.0 switches the default to HTML when the first element of the result tree is an html element in no namespace, as it is here (or your processor is defaulting to HTML for some other reason). Another clue pointing to this is that your meta elements aren't closed in the output.
Example (note the method="html"):
XSLT 1.0
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" method="html"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<html>
<xsl:processing-instruction name="test">pi</xsl:processing-instruction>
</html>
</xsl:template>
</xsl:stylesheet>
Output (using the XSLT as the input (or any XML file))
<html><?test pi></html>
To force an XML pi, add the xsl:output:
<xsl:output indent="yes" method="xml"/>
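Applied to the stylesheet from the question, that change might look like the sketch below. The omit-xml-declaration setting is an addition made here so that the PHP block stays at the top of the output, and the paragraf template is a simplified stand-in for the original pagina template.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The xml output method terminates processing instructions with ?> -->
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <xsl:processing-instruction name="php">setcookie("cookiename", "cookievalue");</xsl:processing-instruction>
    <html>
      <head>
        <meta charset="utf-8"/>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Simplified: one p element per paragraf element. -->
  <xsl:template match="paragraf">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
```

The trade-off is that the rest of the page is then serialized as XML too, so empty elements such as meta are written in self-closing form and no Content-Type meta element is added.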
It's my understanding that the correct representation of a processing instruction in HTML is (or was, at the relevant point in time) to omit the question mark, and the specification for XSLT serialization says that this is what should be done by the HTML output method. Sorry, I don't have time to consult the specs just now to confirm this.
Of course, you are trying to generate stuff which is defined in the PHP specification rather than the HTML specification, and the XSLT serialization spec knows nothing of PHP.

using xsl:param variable in xs:schema element output

I am trying to output the following line from an XSLT script. It is the first line just after xsl:template match="/". What I am trying to do is transform an XML document into an XML schema, and I need to output the xs:schema tag in a particular way.
<xs:schema xmlns:ed="http://test1" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" targetNamespace="{$ns_name}" xmlns:tns="{$ns_name}" elementFormDefault="qualified" attributeFormDefault="unqualified" xsi:schemaLocation="http://test1 file://XmlSchemaAppinfo.xsd">
The $ns_name is an xsl:param (name="ns_name"). It is resolved correctly in targetNamespace="{$ns_name}", but in xmlns:tns="{$ns_name}" it is output literally:
<xs:schema targetNamespace="akolodk" elementFormDefault="qualified" attributeFormDefault="unqualified" xsi:schemaLocation="http://test1 file://XmlSchemaAppinfo.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ed="test1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="{$ns_name}">
Namespace declarations are not treated the same as attributes, even though they look the same. The xmlns:tns will have been processed by the XML parser when the stylesheet was parsed, before it gets to the XSLT processor.
If you have XSLT 2.0 you can use
<xsl:namespace name="tns" select="$ns_name"/>
to create a namespace node in the result tree, but there's no easy way I know of to generate a dynamic namespace in XSLT 1.0. You can't use xsl:attribute; the spec explicitly states that while
<xsl:attribute name="xmlns:xsl" namespace="whatever">http://www.w3.org/1999/XSL/Transform</xsl:attribute>
is not an error, it will generate an attribute, not a namespace declaration - the processor is required to ignore the xmlns prefix specified in the name and must use a different prefix to output the attribute.
If your processor supports the EXSLT node-set extension function, then the following might work:
<xsd:schema .....>
<xsl:variable name="tnsElement">
<xsl:element name="tns:dummy" namespace="{$ns_name}"/>
</xsl:variable>
<xsl:copy-of select="exsl:node-set($tnsElement)/*/namespace::tns"/>
but again the processor is allowed to ignore the prefix of the xsl:element name attribute and use a different prefix bound to the same URI, so you'll have to test it with your processor.
(You'll also have to add xmlns:exsl="http://exslt.org/common" exclude-result-prefixes="exsl" to your xsl:stylesheet element.)
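For the XSLT 2.0 route mentioned at the start of this answer, a complete stylesheet could be sketched as follows. The default value urn:example:target for ns_name is a placeholder invented for the example, and most attributes from the question's xs:schema element are left out for brevity.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="xml" indent="yes"/>

  <!-- Placeholder default; pass the real URI when invoking the transform. -->
  <xsl:param name="ns_name" select="'urn:example:target'"/>

  <xsl:template match="/">
    <!-- Ordinary attributes accept attribute value templates. -->
    <xs:schema targetNamespace="{$ns_name}"
               elementFormDefault="qualified"
               attributeFormDefault="unqualified">
      <!-- The namespace node must be created before any child
           elements are added to xs:schema. -->
      <xsl:namespace name="tns" select="$ns_name"/>
    </xs:schema>
  </xsl:template>
</xsl:stylesheet>
```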
In XSL only some attributes can be written using attribute value templates (using the '{}' notation). In particular, xmlns attributes do not support the notation.

Exclude certain child nodes when data structure is unknown

EDIT -
I've figured out the solution to my problem and posted a Q&A here.
I'm looking to process XML conforming to the Library of Congress EAD standard (found here). Unfortunately, the standard is very loose regarding the structure of the XML.
For example the <bioghist> tag can exist within the <archdesc> tag, or within a <descgrp> tag, or nested within another <bioghist> tag, or a combination of the above, or can be left out entirely. I've found it to be very difficult to select just the bioghist tag I'm looking for without also selecting others.
Below are a few different possible EAD XML documents my XSLT might have to process:
First example
<ead>
<eadheader>
<archdesc>
<bioghist>one</bioghist>
<dsc>
<c01>
<descgrp>
<bioghist>two</bioghist>
</descgrp>
<c02>
<descgrp>
<bioghist>
<bioghist>three</bioghist>
</bioghist>
</descgrp>
</c02>
</c01>
</dsc>
</archdesc>
</eadheader>
</ead>
Second example
<ead>
<eadheader>
<archdesc>
<descgrp>
<bioghist>
<bioghist>one</bioghist>
</bioghist>
</descgrp>
<dsc>
<c01>
<c02>
<descgrp>
<bioghist>three</bioghist>
</descgrp>
</c02>
<bioghist>two</bioghist>
</c01>
</dsc>
</archdesc>
</eadheader>
</ead>
Third example
<ead>
<eadheader>
<archdesc>
<descgrp>
<bioghist>one</bioghist>
</descgrp>
<dsc>
<c01>
<c02>
<bioghist>three</bioghist>
</c02>
</c01>
</dsc>
</archdesc>
</eadheader>
</ead>
As you can see, an EAD XML file might have a <bioghist> tag almost anywhere. The actual output I'm supposed to produce is too complicated to post here. A simplified example of the output for the above three EAD examples might be like:
Output for First example
<records>
<primary_record>
<biography_history>first</biography_history>
</primary_record>
<child_record>
<biography_history>second</biography_history>
</child_record>
<grandchild_record>
<biography_history>third</biography_history>
</grandchild_record>
</records>
Output for Second example
<records>
<primary_record>
<biography_history>first</biography_history>
</primary_record>
<child_record>
<biography_history>second</biography_history>
</child_record>
<grandchild_record>
<biography_history>third</biography_history>
</grandchild_record>
</records>
Output for Third example
<records>
<primary_record>
<biography_history>first</biography_history>
</primary_record>
<child_record>
<biography_history></biography_history>
</child_record>
<grandchild_record>
<biography_history>third</biography_history>
</grandchild_record>
</records>
If I want to pull the "first" bioghist value and put that in the <primary_record>, I can't simply use <xsl:apply-templates select="/ead/eadheader/archdesc/bioghist"/>, as that tag might not be a direct child of the <archdesc> tag. It might be wrapped by a <descgrp> or a <bioghist> or a combination thereof. And I can't select="//bioghist", because that will pull all the <bioghist> tags. I can't even select="//bioghist[1]" because there might not actually be a <bioghist> tag there and then I'll be pulling the value below the <c01>, which is "Second" and should be processed later.
This is already a long post, but one other wrinkle is that there can be an unlimited number of <cxx> nodes, nested up to twelve levels deep. I'm currently processing them recursively. I've tried saving the node I'm currently processing (<c01> for example) as a variable called 'RN', then running <xsl:apply-templates select=".//bioghist [name(..)=name($RN) or name(../..)=name($RN)]">. This works for some forms of EAD, where the <bioghist> tag isn't nested too deeply, but it will fail if it ever has to process an EAD file created by someone who loves wrapping tags in other tags (which is totally fine according to the EAD Standard).
What I'd love is some way of saying:
Get any <bioghist> tag anywhere below the current node but
don't dig deeper if you hit a <c??> tag
I hope that I've made the situation clear. Please let me know if I've left anything ambiguous. Any assistance you can provide would be greatly appreciated. Thanks.
As the requirements are rather vague, any answer only reflects the guesses its author has made.
Here is mine:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="my:my" exclude-result-prefixes="my">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<my:names>
<n>primary_record</n>
<n>child_record</n>
<n>grandchild_record</n>
</my:names>
<xsl:variable name="vNames" select="document('')/*/my:names/*"/>
<xsl:template match="/">
<xsl:apply-templates select=
"//bioghist[following-sibling::node()[1]
[self::descgrp]
]"/>
</xsl:template>
<xsl:template match="bioghist">
<xsl:variable name="vPos" select="position()"/>
<xsl:element name="{$vNames[position() = $vPos]}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<ead>
<eadheader>
<archdesc>
<bioghist>first</bioghist>
<descgrp>
<bioghist>first</bioghist>
<bioghist>
<bioghist>first</bioghist></bioghist>
</descgrp>
<dsc>
<c01>
<bioghist>second</bioghist>
<descgrp>
<bioghist>second</bioghist>
<bioghist>
<bioghist>second</bioghist></bioghist>
</descgrp>
<c02>
<bioghist>third</bioghist>
<descgrp>
<bioghist>third</bioghist>
<bioghist>
<bioghist>third</bioghist></bioghist>
</descgrp>
</c02>
</c01>
</dsc>
</archdesc>
</eadheader>
</ead>
the wanted result is produced:
<primary_record>first</primary_record>
<child_record>second</child_record>
<grandchild_record>third</grandchild_record>
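The rule stated in the question (take any bioghist below the current node, but stop at a nested component) can also be expressed with a key that indexes each outermost bioghist by its nearest enclosing archdesc or numbered component. This is only a sketch under several assumptions: the elements are in no namespace, as in the question's samples; components are named c followed by two digits (unnumbered c elements are not handled); and the record and source names in the output are invented for the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index each outermost bioghist by the id of its nearest owner:
       archdesc or a numbered component such as c01 or c02. -->
  <xsl:key name="bioghist-by-owner"
           match="bioghist[not(ancestor::bioghist)]"
           use="generate-id(ancestor::*[self::archdesc or
                  (starts-with(name(), 'c') and string-length(name()) = 3 and
                   translate(substring(name(), 2), '0123456789', '') = '')][1])"/>

  <xsl:template match="/">
    <records>
      <!-- One record per archdesc or numbered component, in document order. -->
      <xsl:for-each select="//archdesc | //*[starts-with(name(), 'c') and
                            string-length(name()) = 3 and
                            translate(substring(name(), 2), '0123456789', '') = '']">
        <record source="{name()}">
          <!-- Only the bioghist elements owned by this node. -->
          <xsl:for-each select="key('bioghist-by-owner', generate-id())">
            <biography_history>
              <xsl:value-of select="normalize-space(.)"/>
            </biography_history>
          </xsl:for-each>
        </record>
      </xsl:for-each>
    </records>
  </xsl:template>
</xsl:stylesheet>
```

A component with no bioghist of its own produces an empty record element here, and a bioghist nested inside another bioghist is reported once, through its outermost wrapper.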
I worked out a solution on my own and posted it at this Q&A because the solution is quite specific to a certain XML standard and seemed out of the scope of this question. If people feel it would be best to post it here as well, I can update this answer with a copy.