Find element that has both children and string in XSLT

My source is:
<content>
<caption>text 1</caption>
<element1>Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text <bold>file</bold> is a <a>file</a> type typically identified by the .txt file name extension.</element1>
<section1>
<element2>Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text file is a file type typically identified by the .txt file name extension.</element2>
</section1>
</content>
I am trying to extract, and create a unique ID for, the elements (they may be any element) that have both child elements (character-level elements) and text, and also the elements that have only text. The <bold> and <a> elements should not be separated. The expected output is:
<caption id="id1">Text 1</caption>
<element1 id="id2">Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text <bold>file</bold> is a <a>file</a> type typically identified by the .txt file name extension.</element1>
<element2 id="id3">Notepad....</element2>
Any idea would be greatly appreciated...

I am not quite sure whether you want to preserve the hierarchy or output a flat list of the elements you have described; the following simply extracts the described elements as a flat list (while preserving their content), with the ids generated by the XSLT processor:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="*[not(*) and text()[normalize-space()]] | *[* and text()[normalize-space()]]">
<xsl:copy>
<xsl:attribute name="id" select="generate-id()"/>
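<xsl:apply-templates select="@* , node()" mode="copy"/>
</xsl:copy>
</xsl:template>
<!-- copy mode: copy attributes, child elements and text of a matched element unchanged -->
<xsl:template match="@* | node()" mode="copy">
<xsl:copy>
<xsl:apply-templates select="@* , node()" mode="#current"/>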
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When applied to your input sample, Saxon 9 outputs
<?xml version="1.0" encoding="UTF-8"?>
<caption id="d1e2">text 1</caption>
<element1 id="d1e4">Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text <bold>file</bold> is a <a>file</a> type typically identified by the .txt file name extension.</element1>
<element2 id="d1e13">Notepad is a basic text-editing program and it's most commonly used to view or edit text files. A text file is a file type typically identified by the .txt file name extension.</element2>
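If only an XSLT 1.0 processor is available, the same approach can be written without the 2.0-only features used above (the comma operator in `select` and the `select` attribute on `xsl:attribute`). The sketch below also folds the two union branches into a single pattern, since `*[not(*) and text()[normalize-space()]] | *[* and text()[normalize-space()]]` matches exactly the elements that have at least one non-whitespace text child. It is an untested sketch against the sample only, and the generated id values will differ between processors.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Any element with at least one non-whitespace text child,
       whether or not it also has child elements -->
  <xsl:template match="*[text()[normalize-space()]]">
    <xsl:copy>
      <xsl:attribute name="id">
        <xsl:value-of select="generate-id()"/>
      </xsl:attribute>
      <xsl:apply-templates select="@* | node()" mode="copy"/>
    </xsl:copy>
  </xsl:template>

  <!-- Inside a matched element, copy everything (bold, a, text) unchanged,
       so nested elements do not get an id of their own -->
  <xsl:template match="@* | node()" mode="copy">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="copy"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Wrapper elements such as `content` and `section1` fall through to the built-in rule, which just processes their children, so only caption, element1 and element2 appear in the output.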

Related

Carriage return is deleted during xsl transformation

I'm transforming an XML file with the Ant xslt task.
Each line in the original file ends with CR-LF, but only LF remains in the transformed file, so all the content is displayed on one line.
XSL code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" indent="yes" encoding="UTF-8" />
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="model[@name='docInfoDefaultDetails']">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
<xsl:if test="not(action[@name='attachments'])">
<action name="attachments" type="object"/>
</xsl:if>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
And the Ant task (${UpdateTest_xml} is the path to the original XML file, ${UpdateTest_xsl} is the path to the XSL file):
<xslt extension=".xml" in="${UpdateTest_xml}" out="${UpdateTest_xml}.bak.xml"
style="${UpdateTest_xsl}">
</xslt>
How to preserve CRs?
As @MartinHonnen points out, XML parsers are required to normalize line endings to a single NL character, so by the time XSLT sees the content, the CR characters have already gone.
The serialization rules for XSLT permit the processor to output line endings as CRLF rather than as a single NL, but that's not much help to you if your chosen processor doesn't offer this option -- and I don't know of any that does.
(The output parameter saxon:newline mentioned by @MartinHonnen only affects the text output method, not the HTML or XML output methods.)
I think the simplest solution is: don't view XML in Notepad. There are plenty of utilities on the Windows platform that display files with NL line endings quite happily; Notepad is about the only one that doesn't.
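If the file on disk really must have CR-LF line endings, one option outside XSLT is to post-process the output with Ant's `fixcrlf` task right after the `xslt` task. A sketch, reusing the property names from the question and assuming Ant 1.7 or later (where `fixcrlf` accepts a single `file` attribute); both tasks go inside the same `<target>`:

```xml
<!-- Transform as before; the serializer writes LF line endings -->
<xslt extension=".xml" in="${UpdateTest_xml}" out="${UpdateTest_xml}.bak.xml"
      style="${UpdateTest_xsl}"/>
<!-- Then rewrite the line endings of the generated file as CR-LF, in place -->
<fixcrlf file="${UpdateTest_xml}.bak.xml" eol="crlf"/>
```

This leaves the stylesheet untouched; the line-ending conversion is purely a file-level step after serialization.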

Can we Create a Sequence in XSLT?

I am generating a text file using XSLT: when I pass the XML input, the XSLT converts it into a text file. Can we provide a sequence number for each invocation
and store it in some variable?
1) On the first execution, one text file is created, and a variable inside the XSLT (<sequence>) should be assigned the number 1, like below:
<sequence>1</sequence>
2) On the second execution, another text file is created, so the sequence variable should increase:
<sequence>2</sequence>
3) On the third execution, another text file is created, so the sequence becomes:
<sequence>3</sequence>
Normally we can do this by creating a sequence in an Oracle database and calling that sequence inside the XSLT, so that the sequence is incremented on each execution:
<sequence>CallOracleSequence</sequence>
Can anyone suggest a way to do this without using an Oracle sequence? Can we handle this inside the XSLT?
XSLT does not maintain state between executions of a transform.
One option would be to use an external config file that contains the sequence number. Using an entity reference, you can make that XML config file part of the XSLT document so that its current value can be read; then, when the XSLT executes, increment the number and overwrite the config file with the new sequence number using <xsl:result-document>.
Below is a working example of an XSLT 2.0 stylesheet that assumes there is a sequence file called sequence.xml in the same directory as the XSLT being executed:
<?xml version="1.0" encoding="UTF-8"?>
<!--declare an entity that references the sequence file-->
<!DOCTYPE xsl:stylesheet [
<!ENTITY sequenceFile SYSTEM "sequence.xml">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output name="sequenceOutput" method="xml" indent="yes"/>
<!--this variable is used to store the expanded entity reference for
the current sequence.xml file
When the XSLT is parsed it will "look" like this to the XML parser:
<xsl:variable name="sequence"><sequence>1</sequence></xsl:variable>
-->
<xsl:variable name="sequence">
<!--
this entity reference will expand to:
<sequence>x</sequence>
when the XSLT is parsed
-->
&sequenceFile;
</xsl:variable>
<!--
Use the document() function with an empty value to read the XSLT
and then parse the sequence value produced by the entity reference
-->
<xsl:variable name="currentSequenceValue"
select="number(document('')/*/xsl:variable[@name='sequence']/sequence)"/>
<xsl:template match="/">
<!--do normal processing of the XML document-->
<xsl:apply-templates />
<!--
This will overwrite the sequence file with the incremented value
-->
<xsl:result-document format="sequenceOutput" href="sequence.xml">
<sequence><xsl:value-of select="$currentSequenceValue+1"/></sequence>
</xsl:result-document>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Usually the sequence number will relate to something in the input, so you can use position() or xsl:number. But the details depend on the structure of the input.
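For that case, a minimal XSLT 1.0 sketch, assuming a hypothetical input made of `record` elements, numbers each record within a single run and writes one line per record to the text output. Nothing is persisted between runs.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input: <records><record>...</record>...</records> -->
  <xsl:template match="/">
    <xsl:for-each select="//record">
      <!-- position() is the 1-based index of this record in the selected node-set -->
      <xsl:value-of select="position()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Inside a template matching `record`, `<xsl:number/>` gives a comparable count based on preceding `record` siblings, and `<xsl:number level="any"/>` counts matching elements anywhere earlier in the document.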
This can be achieved with a single XSLT 2.0 transformation by writing out two files -- the result of the transformation and the updated number of executions:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vRepetitions" select=
"document('file:///c:/temp/delete/numberOfRepetitions.xml')/* +1"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:result-document
href="file:///c:/temp/delete/iteration{$vRepetitions}.xml">
<xsl:apply-templates/>
</xsl:result-document>
<xsl:result-document href="file:///c:/temp/delete/numberOfRepetitions.xml">
<n><xsl:value-of select="$vRepetitions"/></n>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
There are two files involved: the source XML document and a file containing the current number of executions. The latter must initially be created with this content:
<n>0</n>
When the above transformation is applied to any source XML document (for this demo it just applies the identity rule to it), it does its usual processing and produces the wanted result. Additionally, the transformation reads the XML document that contains the current number of executions, updates this number, and writes the updated document back to disk:
Saxon 9.1.0.5J from Saxonica
Java version 1.6.0_31
Stylesheet compilation time: 625 milliseconds
Processing file:/C:/Program%20Files/Java/jre6/bin/marrowtr.xml
Building tree for file:/C:/Program%20Files/Java/jre6/bin/marrowtr.xml using class net.sf.saxon.tinytree.TinyBuilder
Tree built in 16 milliseconds
Tree size: 4 nodes, 4 characters, 0 attributes
Loading net.sf.saxon.event.MessageEmitter
Building tree for file:///c:/temp/delete/numberOfRepetitions.xml using class net.sf.saxon.tinytree.TinyBuilder
Tree built in 0 milliseconds
Tree size: 4 nodes, 1 characters, 0 attributes
Writing to file:/c:/temp/delete/iteration2.xml
Writing to file:/c:/temp/delete/numberOfRepetitions.xml
Execution time: 140 milliseconds
Memory used: 11477344
NamePool contents: 16 entries in 16 chains. 6 prefixes, 7 URIs
Here we see that at the second execution the transformation creates two files:
iteration2.xml contains the result from the second execution of the transformation.
numberOfRepetitions.xml: if we examine this file, its contents after the second execution are as expected:
. . . .
<n>2</n>

Having trouble selecting properties with XSLT

I need to select Property1 and SubProperty2 and strip out any other properties. I need to make this future-proof so that any new properties added to the XML won't break validation; in other words, new fields have to be stripped by default.
<Root>
<Property1/>
<Property2/>
<Thing>
<SubProperty1/>
<SubProperty2/>
</Thing>
<VariousProperties/>
</Root>
So in my XSLT I did this:
<xsl:template match="Property1">
<Property1>
<xsl:apply-templates/>
</Property1>
</xsl:template>
<xsl:template match="/Thing">
<SubProperty1>
<xsl:apply-templates select="SubProperty1" />
</SubProperty1>
</xsl:template>
<xsl:template match="*" />
The last line should strip anything I haven't defined to be selected.
This works to select my property1 but it always selects an empty node for SubProperty. The match on * seems to strip out the deeper object before my match on them can work.
I removed the match on * and it selects my SubProperty with a value. So how can I select the sub-properties and still strip away everything that I am not using?
Thanks for any advice.
There are two problems:
<xsl:template match="*"/>
This ignores any element for which there isn't an overriding, more specific template.
Because there is no specific template for the top element Root, it is ignored together with its whole subtree (which is the complete document), so no output at all is produced.
The second problem is here:
<xsl:template match="/Thing">
This template matches the top element named Thing.
However, in the provided document the top element is named Root. Therefore the above template doesn't match any node in the provided XML document and is never selected for execution. As the code inside its body is supposed to generate SubProperty1, no such output is generated.
Solution:
Change
<xsl:template match="*"/>
to:
<xsl:template match="text()"/>
And change
<xsl:template match="/Thing">
to
<xsl:template match="Thing">
The whole transformation becomes:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="Property1">
<Property1>
<xsl:apply-templates/>
</Property1>
</xsl:template>
<xsl:template match="Thing">
<SubProperty1>
<xsl:apply-templates select="SubProperty1" />
</SubProperty1>
</xsl:template>
<xsl:template match="text()" />
</xsl:stylesheet>
When this transformation is applied to the following XML document (the originally provided one was severely malformed and had to be fixed):
<Root>
<Property1/>
<Property2/>
<Thing>
<SubProperty1/>
<SubProperty2/>
</Thing>
<VariousProperties/>
</Root>
the result now is what is wanted:
<Property1/>
<SubProperty1/>

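One caveat: because `text()` is suppressed everywhere, any text inside Property1 or SubProperty1 is dropped as well, so elements that do carry values come out empty. If the values must survive, a whitelist sketch like the one below copies the wanted elements whole and, for every other element, outputs nothing but keeps looking inside it. Following the question text, it keeps SubProperty2 rather than SubProperty1; the element names are just those of the sample.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Whitelist: copy these elements with all their attributes and content -->
  <xsl:template match="Property1 | Thing/SubProperty2">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Any other element (including ones added in the future): emit nothing
       for it, but descend into its child elements; its own text is never output -->
  <xsl:template match="*">
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
```

On the sample document this produces `<Property1/>` and `<SubProperty2/>`; the whitelist patterns have a higher default priority than `*`, so they always win.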
remove elements based on external file

I have an external settings file with some nodes holding attribute values from the main XML document. I need to remove certain nodes from the main XML file if their attribute values are present in the settings file.
My setting file looks like this:
setting.xml
<xml>
<removenode titlename="abc" subtitlename="xyz"></removenode>
<removenode titlename="dvd" subtitlename="dvd"></removenode>
</xml>
Main.xml
<xml>
<title titlename="abc">
<subtitle subtitlename="xyz"></subtitle>
</title>
<title titlename="book">
<subtitle subtitlename="book sub title"></subtitle>
</title>
</xml>
I need a script that reads the setting.xml file and removes a title element from main.xml if its titlename and subtitlename are found in setting.xml. The output should be:
output.xml
<xml>
<title titlename="book">
<subtitle subtitlename="book sub title"></subtitle>
</title>
</xml>
I tried using document() to read the setting.xml file, but I can't work out how to do the match on the main.xml file:
<xsl:variable name="SuppressionSettings" select="document('Setting.xml')" />
<xsl:variable name="SuppressSetting" select="$SuppressionSettings/xml/removenode" />
Any hint how to implement it?
The key is to use an identity/copy pattern and, before each output, check that the current (context) node isn't prohibited by the suppression-rules node-set:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- get suppression settings -->
<xsl:variable name='suppression_settings' select="document('http://www.mitya.co.uk/xmlp/settings.xml')/xml/removenode" />
<!-- begin identity/copy -->
<xsl:template match="node()|@*">
<xsl:if test='not($suppression_settings[@titlename = current()/@titlename and @subtitlename = current()/subtitle/@subtitlename])'>
<xsl:copy>
<xsl:apply-templates select='node()|@*' />
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
You can run it here (see output source - the 'abc' title node is omitted):
http://www.xmlplayground.com/9oCYKp
The XSLT below works for the given document.
Note that I'm storing the contents of Setting.xml in a variable as you did; however, I then use that variable directly in my queries.
An important issue here is that in XSLT 1.0, variable references cannot be used in a template's match pattern. Therefore, my template matches any <title> element and then determines in an <xsl:choose> element whether its attributes match any values given in the settings file; if so, the <title> element is omitted from the output.
To see why the test attribute in the <xsl:when> does what it should, think of a comparison someAttribute = someOtherAttribute not as requiring that the attribute someAttribute have the same value as the attribute someOtherAttribute, but as the condition that there exists some pair of nodes, one selected by each side, with the same value.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="SuppressionSettings" select="document('Setting.xml')" />
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="title">
<xsl:choose>
<xsl:when test="(@titlename = $SuppressionSettings/xml/removenode/@titlename) and (subtitle/@subtitlename = $SuppressionSettings/xml/removenode/@subtitlename)"/>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Here's a more generic answer where the names of the attributes are not hard-coded into the XSLT. As O. R. Mapper pointed out, in XSLT 1.0 you can't use variable references in the match pattern, so I put the document() call directly in the predicate. This may not be as efficient as using a variable and then testing the variable.
XSLT 1.0
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[@* = document('setting.xml')/*/removenode/@*]"/>
</xsl:stylesheet>
XML Output (using your 2 xml files with main.xml as the input)
<xml>
<title titlename="book">
<subtitle subtitlename="book sub title"/>
</title>
</xml>
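The `xsl:choose` test in the second answer is satisfied by a titlename from one removenode together with a subtitlename from another. If both values must come from the same removenode (as in the first answer's predicate), and an XSLT 2.0 processor such as Saxon is available, the variable can be used directly in the match pattern, which 2.0 permits. A sketch, reusing the file and attribute names from the question:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Suppression rules, loaded once (relative to the stylesheet's location) -->
  <xsl:variable name="removals" select="doc('setting.xml')/xml/removenode"/>

  <!-- Identity rule: copy everything by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop a title only when ONE removenode matches both attributes;
       XSLT 2.0 allows the variable reference inside the pattern -->
  <xsl:template match="title[some $r in $removals satisfies
                        ($r/@titlename = @titlename and
                         $r/@subtitlename = subtitle/@subtitlename)]"/>
</xsl:stylesheet>
```

For the two sample files this should give the same output as above, since only the abc/xyz title is matched by a single removenode.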

Using xslt to alter nodes from a copied xml file

I would like to copy and alter data from another XML file. In addition to the normal two input XML files, I have an extra XML file. I would like to embed the entire content of this XML file into my output XML and then change some aspects of it. I have managed to copy the entire file into the right place, as desired, like this (thanks to another post):
<test>
<xsl:copy-of select="document('filename.xml')/*"/>
</test>
The problem is that I want to change some of the data in filename.xml, and I don't know how to do this. Something along these lines, perhaps?
<xsl:template match="document('filename.xml')/root/elemntToBeChanged">
<xsl:apply-templates select="Test/changeItToThis"/>
Try this:
XSLT 2.0:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>
<xsl:template match="root"><!--Input document's root element -->
<xsl:copy><xsl:apply-templates select="*|document('External-Doc.xml')"/></xsl:copy><!--External document called here -->
</xsl:template>
</xsl:stylesheet>
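The essential step is replacing `xsl:copy-of` with `xsl:apply-templates`, so that the external document's nodes pass through the same template rules: an identity rule copies everything, and a more specific rule changes what needs changing. A sketch in XSLT 1.0 that only shows the external-document part; `filename.xml` and `elemntToBeChanged` are taken from the question as placeholders, and renaming the element to the hypothetical `changeItToThis` stands in for whatever change is actually needed:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Identity rule: copies any node it is applied to, from any document -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Instead of xsl:copy-of, push the external document through the templates -->
  <xsl:template match="/">
    <test>
      <xsl:apply-templates select="document('filename.xml')/*"/>
    </test>
  </xsl:template>

  <!-- Override for the element to change: here it is simply renamed,
       keeping its attributes and content -->
  <xsl:template match="root/elemntToBeChanged">
    <changeItToThis>
      <xsl:apply-templates select="@* | node()"/>
    </changeItToThis>
  </xsl:template>
</xsl:stylesheet>
```

Note that match patterns are not tied to a particular document, so the override would also fire for a `root/elemntToBeChanged` in the main input if the main input were processed with the same templates; a mode on the external-document templates would keep them separate.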