XSLT - Remove Sub Tag from results - xslt

I am looking to return the content of a particular XML tag <para> without its sub tags <bridgehead> or <sliceXML> in the results. I am testing the following using http://xslttest.appspot.com/. Any help is, as always, much appreciated.
My XML
<para>
<bridgehead>Galaxy Zoo</bridgehead>
<sliceXML>Galaxy</sliceXML>
The human eye is far better at identifying characteristics of galaxies
than any computer. So Galaxy Zoo has called for everyday citizens to
help in a massive identification project. Well over a hundred thousand
people have helped identify newly discovered galaxies. Now you can, too.
</para>
My XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" version="1.0">
<xsl:template match="/">
<xsl:call-template name="results"/>
<xsl:message>FROM simpleHMHTransform XSLT8</xsl:message>
</xsl:template>
<xsl:template name="results">
<xsl:for-each select="//para">
<xsl:call-template name="para"/>
</xsl:for-each>
</xsl:template>
<xsl:template name="para">
<div id="para">
<xsl:value-of select="."/>
</div>
</xsl:template>
</xsl:stylesheet>
My current results
<?xml version="1.0" encoding="UTF-8"?><div xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" id="para">
Galaxy Zoo
Galaxy
The human eye is far better at identifying characteristics of galaxies
than any computer. So Galaxy Zoo has called for everyday citizens to
help in a massive identification project. Well over a hundred thousand
people have helped identify newly discovered galaxies. Now you can, too.
</div>
My desired results
<?xml version="1.0" encoding="UTF-8"?><div xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" id="para">
The human eye is far better at identifying characteristics of galaxies
than any computer. So Galaxy Zoo has called for everyday citizens to
help in a massive identification project. Well over a hundred thousand
people have helped identify newly discovered galaxies. Now you can, too.
</div>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" version="1.0">
<xsl:template match="para">
<div id="para"><xsl:copy-of select="text()"/></div>
</xsl:template>
</xsl:stylesheet>
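The stylesheet above works because text() selects only the text nodes that are direct children of <para>, so the contents of <bridgehead> and <sliceXML> never reach the output. A possible variant, sketched here on the assumption that those two elements are the only ones you want to drop, is to process all children and suppress the unwanted ones with empty templates:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="para">
    <!-- Process every child; the built-in template copies text nodes through -->
    <div id="para"><xsl:apply-templates/></div>
  </xsl:template>
  <!-- Empty templates: these elements and everything inside them produce no output -->
  <xsl:template match="bridgehead | sliceXML"/>
</xsl:stylesheet>
```

The difference shows up if <para> ever contains other inline elements: copy-of select="text()" drops their text, while this variant keeps it.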

Related

docbook saxon toolchain does not recognize customization hard-page-break

I cannot get docbook tool chain to do the hard page break
as described at the end of http://www.sagehill.net/docbookxsl/PageBreaking.html
(I used to have this working for me but seem to have lost the mojo.)
Here is the script to invoke docbook and saxon
#!/bin/sh
export CLASSPATH=/home/leffstudent/saxon-6.0.1.jar:/home/leffstudent/docbook-sl-1.79.1/saxon65.jar
echo $CLASSPATH
java com.icl.saxon.StyleSheet \
-o $1.fo $1 stO.xsl \
use.extensions=1 default.table.width=auto title.margin.left=0pc insert.xref.page.number=yes
(stO.xsl also sets my xref parameters for how xref should display page numbers. That is
not working, either. Thus, I suspect that my invocation of com.icl.saxon.StyleSheet
is ignoring my customization layer.)
Here is the test docbook file I tried. (The real file is a 500-page
set of class notes.)
<section><title> </title>
<para>
abc
</para>
<?hard-pagebreak?>
<para>
def
</para>
</section>
Here is the style sheet, stO.xsl
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" version="1.0">
<xsl:import href="./titlepage.xsl"/>
<xsl:import href="/home/leffstudent/docbook-xsl-1.79.1/fo/docbook.xsl"/>
<xsl:template match="processing-instruction('hard-pagebreak')">
<fo:block break-after='page'/>
</xsl:template>
<xsl:attribute-set name="formal.object.properties">
<xsl:attribute name="keep-together.within-column">auto</xsl:attribute>
</xsl:attribute-set>
<xsl:param name="local.l10n.xml" select="document('')"/>
<l:i18n xmlns:l="http://docbook.sourceforge.net/xmlns/l10n/1.0">
<l:l10n language="en">
<l:context name="xref">
<l:template name="section" text="%t on Page Number %p"/>
<l:template name="mediaobject" text="%t on Page Number %p"/>
<l:template name="imageobject" text="%p"/>
</l:context>
<l:context name="xref-number-and-title">
<l:template name="section" text="%t on Page Number %p"/>
<l:template name="imageobject" text="%p"/>
</l:context>
</l:l10n>
</l:i18n>
</xsl:stylesheet>
I got this working finally with XSLTPROC:
#!/bin/sh
xsltproc --output $1.fo sd.xsl $1
It prints a separate page where I have the hard-pagebreak processing instruction.
Here is the customization layer, sd.xsl
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" version="1.0">
<xsl:import href="/home/leffstudent/docbook-xsl-1.79.1/fo/docbook.xsl"/>
<xsl:template match="processing-instruction('hard-pagebreak')">
<fo:block break-after='page'/>
</xsl:template>
</xsl:stylesheet>
I have tried again to get my xrefs to work with pictures. (That, of course, is
with a larger file than sd.xsl, but that is a separate issue, both literally
and figuratively.)
I still have not been able to get this working with Xalan. See
Question 55941299.
I have to check again to see if I can get this to work with Saxon,
which is what I used to use to prepare my class notes.
However, I can produce my 530-page class notes with xsltproc with proper page breaks.

XSLT - Extract and manipulate portion of XML data

The input XML:
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Description><![CDATA[Audience: Andrew Reed, Senior Training Specialist, Microsoft Corporation<br/>This session is for individuals who spend significant time writing and creating documents and have some familiarity with Microsoft Word.<br/>Thanks.]]></Description>
</root>
The XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:output method="html" indent="yes"/>
<xsl:template match="/root">
<div>
<xsl:value-of disable-output-escaping="yes" select="Description"/>
</div>
</xsl:template>
</xsl:stylesheet>
I need to add a couple of more BR tags after first occurrence of BR, that's after Audience line and before other description starts.
Can you please modify my XSLT to get the desired output?
So I want output like below:
Audience: Andrew Reed, Senior Training Specialist, Microsoft Corporation
This session is for individuals who spend significant time writing and creating documents and have some familiarity with Microsoft Word.
Thanks.
It would be nice if your input data had the <br/> elements as actual elements, instead of being escaped, so that they could be selected directly using XPath.
But since they are as they are, you can use regexp replace, relying on the assumption that they will always conform to a limited range of patterns. You will often be warned not to parse XML or HTML in general using regexps, and rightly so, because regexps aren't a general solution. But for limited uses they can be sufficient.
If I've guessed your requirements correctly, you could use something like
<xsl:value-of select="replace(Description, '&lt;[Bb][Rr] ?/?>', '&#10;')"/>
That will give you the sample output you showed, as opposed to adding a couple of more BR tags after first occurrence of BR. It will tolerate some variation, e.g. <br> or <BR />.
This is assuming you can use XSLT 2.0, because replace() isn't available in 1.0. If you're limited to 1.0, please let me know.
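For XSLT 1.0, where replace() is not available, the usual workaround is a recursive named template built on contains(), substring-before() and substring-after(). The following is only a sketch: the template name br-to-newline is made up for illustration, it matches the exact string <br/> only (no <BR> or <br /> variants), and it switches the output method to text for simplicity.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/root">
    <xsl:call-template name="br-to-newline">
      <xsl:with-param name="text" select="string(Description)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: replaces each literal "<br/>" with a newline -->
  <xsl:template name="br-to-newline">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&lt;br/&gt;')">
        <!-- Emit the part before the first match, then a newline -->
        <xsl:value-of select="substring-before($text, '&lt;br/&gt;')"/>
        <xsl:text>&#10;</xsl:text>
        <!-- Recurse on whatever follows the match -->
        <xsl:call-template name="br-to-newline">
          <xsl:with-param name="text" select="substring-after($text, '&lt;br/&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Handling several spellings of the tag in 1.0 would need one pass per spelling, which is why the regex version is more convenient when 2.0 is an option.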

XSLT - How to ignore a sub tag in XML if supertag contains a particular term

I am trying to change my XSLT so that when a super tag in my XML contains a particular word the sub tag is not selected.
In this example I do not want the sub tag <para> displayed when the super tag <formalpara> contains the word "Galaxy"
Thanks in advance.
My XML
<formalpara>
Galaxy
<para>
<bridgehead>Galaxy Zoo</bridgehead>
<sliceXML>Galaxy</sliceXML>
The human eye is far better at identifying characteristics of galaxies
than any computer. So Galaxy Zoo has called for everyday citizens to
help in a massive identification project. Well over a hundred thousand
people have helped identify newly discovered galaxies. Now you can, too.
</para>
</formalpara>
My XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" version="1.0">
<xsl:template match="/">
<xsl:call-template name="results"/>
<xsl:message>FROM simpleHMHTransform XSLT8</xsl:message>
</xsl:template>
<xsl:template name="results">
<xsl:for-each select="//formalpara">
<xsl:call-template name="formalpara"/>
</xsl:for-each>
<xsl:for-each select="//para">
<xsl:call-template name="para"/>
</xsl:for-each>
</xsl:template>
<xsl:template name="formalpara">
<div id="formalpara">
<xsl:copy-of select="text()"/>
</div>
</xsl:template>
<xsl:template name="para">
<div id="para">
<xsl:copy-of select="text()"/>
</div>
</xsl:template>
</xsl:stylesheet>
My current output
<?xml version="1.0" encoding="UTF-8"?><div xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" id="formalpara">
Galaxy
</div><div xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" id="para">
The human eye is far better at identifying characteristics of galaxies
than any computer. So Galaxy Zoo has called for everyday citizens to
help in a massive identification project. Well over a hundred thousand
people have helped identify newly discovered galaxies. Now you can, too.
</div>
My desired output
<?xml version="1.0" encoding="UTF-8"?><div xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" id="formalpara">
Galaxy
</div>
You should really do xsl:apply-templates instead of calling named templates. You could then add this template:
<xsl:template match="formalpara[contains(text(),'Galaxy')]/para"/>
I can give a full example later.
Full Example:
XML Input
<formalpara>
Galaxy
<para>
<bridgehead>Galaxy Zoo</bridgehead>
<sliceXML>Galaxy</sliceXML>
The human eye is far better at identifying characteristics of galaxies
than any computer. So Galaxy Zoo has called for everyday citizens to
help in a massive identification project. Well over a hundred thousand
people have helped identify newly discovered galaxies. Now you can, too.
</para>
</formalpara>
XSLT 1.0
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:sparql-results="http://www.w3.org/2005/sparql-results#">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="formalpara">
<div id="formalpara">
<xsl:apply-templates/>
</div>
</xsl:template>
<xsl:template match="formalpara[contains(text(),'Galaxy')]/para"/>
<xsl:template match="para">
<div id="para">
<xsl:apply-templates/>
</div>
</xsl:template>
</xsl:stylesheet>
XML Output
<div xmlns:sparql-results="http://www.w3.org/2005/sparql-results#" id="formalpara">
Galaxy
</div>
NOTE: I also changed the contains() to contains(text(),'Galaxy') so it only looks at the text that is a direct child of formalpara.
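One caveat about contains(text(),'Galaxy'): in XSLT 1.0 a node-set passed to a string function is reduced to its first node, so only the first text child of formalpara is tested, and an XSLT 2.0 processor raises an error if more than one text child is passed. A slightly more defensive pattern, offered as a sketch rather than a tested drop-in, moves the test into a predicate on text() so that every direct text child is checked:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="formalpara">
    <div id="formalpara">
      <xsl:apply-templates/>
    </div>
  </xsl:template>
  <!-- Matches when ANY direct text child of formalpara contains the word -->
  <xsl:template match="formalpara[text()[contains(., 'Galaxy')]]/para"/>
  <xsl:template match="para">
    <div id="para">
      <xsl:apply-templates/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

The empty template still wins over the plain para template because a pattern with a path and predicate gets a higher default priority than a bare element name.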

Serialize XML file on the basis of Character Count during an XSL transformation

I have an XML document (A.xml) that is being transformed into another XML document (B.xml), which is simply a replica of A.xml with a unique @id added to each element. This part is done.
Now I would like to implement a mechanism that tracks the character count of every text node within B.xml (held in a temporary tree) and, based on a maximum character count, splits and serializes B.xml into one or several parts.
Source XML Document (A.xml):
<?xml version="1.0" encoding="UTF-8"?>
<root>
<!--
Rules for splitting:
1. «head/text()» is common for all splits.
2. split files can have 600 characters max each.
3. «title» elements could not be the last element of the any result document.
-->
<head><!-- 8 characters -->Kinesics</head>
<section>
<para><!-- 37 characters -->From Wikipedia, the free encyclopedia</para>
<para><!-- 204 characters [space normalized]-->Kinesics is the interpretation of body
language such as facial expressions and gestures — or, more formally, non-verbal
behavior related to movement, either of any part of the body or the body as a
whole. </para>
<section>
<title><!-- 19 characters -->Birdwhistell's work</title>
<para><!-- 432 characters [space normalized]-->The term was first used (in 1952) by Ray
Birdwhistell, an anthropologist who wished to study how people communicate through
posture, gesture, stance, and movement. Part of Birdwhistell's work involved making
film of people in social situations and analyzing them to show different levels of
communication not clearly seen otherwise. The study was joined by several other
anthropologists, including Margaret Mead and Gregory Bateson.</para>
<para><!-- 453 characters [space normalized]--> Drawing heavily on descriptive
linguistics, Birdwhistell argued that all movements of the body have meaning (i.e.
are not accidental), and that these non-verbal forms of language (or paralanguage)
have a grammar that can be analyzed in similar terms to spoken language. Thus, a
"kineme" is "similar to a phoneme because it consists of a group of movements which
are not identical, but which may be used interchangeably without affecting social
meaning".</para>
</section>
<section>
<title><!-- 19 characters -->Modern applications</title>
<para><!-- 390 characters [space normalized]-->Kinesics are an important part of
non-verbal communication behavior. The movement of the body, or separate parts,
conveys many specific meanings and the interpretations may be culture bound. As many
movements are carried out at a subconscious or at least a low-awareness level,
kinesic movements carry a significant risk of being misinterpreted in an
intercultural communications situation.</para>
</section>
</section>
</root>
XSL File
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
<xsl:output method="xml" encoding="UTF-8" indent="no"/>
<!--update 1-->
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:variable name="root-replica">
<xsl:call-template name="create-root-replica">
<xsl:with-param name="context" select="*"/>
</xsl:call-template>
</xsl:variable>
<xsl:copy-of select="$root-replica"/>
<!--
<xsl:call-template name="split-n-serialize">
<xsl:with-param name="context" select="$root-replica"/>
</xsl:call-template>
-->
</xsl:template>
<xsl:template name="split-n-serialize">
<xsl:param name="context"/>
<xsl:for-each select="$context">
<xsl:result-document encoding="utf-8" href="{concat('split_',position(),'.xml')}" method="xml" indent="no">
<xsl:sequence select="."/>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
<xsl:template name="create-root-replica">
<xsl:param name="context"/>
<root>
<head>
<xsl:value-of select="$context/head"/>
</head>
<xsl:apply-templates select="$context/*[not(self::head)]"/>
</root>
</xsl:template>
<xsl:template match="element()">
<xsl:element name="{local-name()}">
<xsl:attribute name="id">
<xsl:value-of select="generate-id()"/>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<!--update 2-->
<xsl:template match="text()">
<xsl:value-of select="normalize-space(.)"/>
</xsl:template>
</xsl:transform>
My input XML contains 1562 characters (assuming each run of whitespace, \s+, counts as a single space), and I would like to split A.xml into 4 parts using the rules listed in the source XML document.
Does anyone have any idea how to do this? Any ideas or comments are greatly appreciated.
Update 3
Details of split files
1st File
8
37
204 = 249
2nd File
8
19
432 = 459
3rd File
8
453 = 461
4th File
8
19
390 = 417
Details of the split procedure:
The contents of the «head» element should be part of every XML file.
Files may be split in the middle of a section but not in the middle of a paragraph.
No «title» element should come at the end of a split.
The maximum number of characters (excluding opening and closing tags) in a split file is 600.
Sample output files (indents are used for better readability)
1st file
<?xml version="1.0" encoding="UTF-8"?>
<root>
<head>Kinesics</head>
<section id="d1e6">
<para id="d1e7">From Wikipedia, the free encyclopedia</para>
<para id="d1e10">Kinesics is the interpretation of body language such as facial expressions and gestures — or, more formally, non-verbal behavior related to movement, either of any part of the body or the body as a whole.</para>
</section>
</root>
2nd file
<?xml version="1.0" encoding="UTF-8"?>
<root>
<head>Kinesics</head>
<section id="d1e6">
<section id="d1e13">
<title id="d1e14">Birdwhistell's work</title>
<para id="d1e17">The term was first used (in 1952) by Ray Birdwhistell, an anthropologist who wished to study how people communicate through posture, gesture, stance, and movement. Part of Birdwhistell's work involved making film of people in social situations and analyzing them to show different levels of communication not clearly seen otherwise. The study was joined by several other anthropologists, including Margaret Mead and Gregory Bateson.</para>
</section>
</section>
</root>
3rd File
<?xml version="1.0" encoding="UTF-8"?>
<root>
<head>Kinesics</head>
<section id="d1e6">
<section id="d1e13">
<para id="d1e20">Drawing heavily on descriptive linguistics, Birdwhistell argued that all movements of the body have meaning (i.e. are not accidental), and that these non-verbal forms of language (or paralanguage) have a grammar that can be analyzed in similar terms to spoken language. Thus, a "kineme" is "similar to a phoneme because it consists of a group of movements which are not identical, but which may be used interchangeably without affecting social meaning".</para>
</section>
</section>
</root>
4th file
<?xml version="1.0" encoding="UTF-8"?>
<root>
<head>Kinesics</head>
<section id="d1e6">
<section id="d1e23">
<title id="d1e24">Modern applications</title>
<para id="d1e27">Kinesics are an important part of non-verbal communication behavior. The movement of the body, or separate parts, conveys many specific meanings and the interpretations may be culture bound. As many movements are carried out at a subconscious or at least a low-awareness level, kinesic movements carry a significant risk of being misinterpreted in an intercultural communications situation.</para>
</section>
</section>
</root>
You would use string-length() to get the "character count" and then xsl:result-document to split your result tree into parts.
Do you need further help coding it up?
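To make that suggestion a little more concrete, here is a rough XSLT 2.0 sketch of a greedy split. It is built on several assumptions that are not in the question: the f: namespace, the function f:len, the template name split and the intermediate <chunk> wrapper are all invented for illustration; it works on a flat list of title and para elements, so it does not rebuild the nested <section> wrappers or add the id attributes; and a chunk consisting only of a title is always given the next element even if that overshoots the budget.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:split"
    exclude-result-prefixes="xs f">

  <!-- Assumed parameter: character budget per output file -->
  <xsl:param name="max" as="xs:integer" select="600"/>
  <xsl:variable name="head" select="/root/head"/>

  <!-- Hypothetical helper: total whitespace-normalized text length of the given nodes -->
  <xsl:function name="f:len" as="xs:integer">
    <xsl:param name="nodes" as="node()*"/>
    <xsl:sequence select="sum(for $n in $nodes return string-length(normalize-space($n)))"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- First build all chunks in a temporary tree, then serialize each one -->
    <xsl:variable name="chunks">
      <xsl:call-template name="split">
        <xsl:with-param name="todo" select="root//title | root//para"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:for-each select="$chunks/chunk">
      <xsl:result-document href="split_{position()}.xml" method="xml">
        <root>
          <head><xsl:value-of select="normalize-space($head)"/></head>
          <xsl:copy-of select="*"/>
        </root>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Recursive greedy split: $todo = elements still to place, $chunk = current group -->
  <xsl:template name="split">
    <xsl:param name="todo" as="element()*"/>
    <xsl:param name="chunk" as="element()*" select="()"/>
    <!-- A title at the end of the current group must move to the next group -->
    <xsl:variable name="carry" select="$chunk[last()][self::title]"/>
    <xsl:variable name="keep" select="$chunk except $carry"/>
    <xsl:choose>
      <xsl:when test="empty($todo)">
        <xsl:if test="exists($chunk)">
          <chunk><xsl:copy-of select="$chunk"/></chunk>
        </xsl:if>
      </xsl:when>
      <!-- Next element fits (head text counted too), or nothing could be closed yet -->
      <xsl:when test="empty($keep) or f:len(($head, $chunk, $todo[1])) le $max">
        <xsl:call-template name="split">
          <xsl:with-param name="todo" select="subsequence($todo, 2)"/>
          <xsl:with-param name="chunk" select="($chunk, $todo[1])"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- Close the group without its trailing title, then start the next one -->
        <chunk><xsl:copy-of select="$keep"/></chunk>
        <xsl:call-template name="split">
          <xsl:with-param name="todo" select="$todo"/>
          <xsl:with-param name="chunk" select="$carry"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

If the character counts given in the question are accurate, tracing this by hand gives the same four groups as in "Details of split files". Restoring the ancestor <section> elements around each group would be a separate step, for example by copying ancestor::section for the first element of each chunk before the temporary tree is built.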

After transformation getting output in text instead of xml nodes

My problem is that after executing the XSLT file I am getting the output as text, all on one line, rather than as XML as required. My XML and XSLT files are as follows.
<root>
<Jobs Found="10" Returned="50">
<Job ID="8000000" PositionID="600002">
<Title>Development Manager</Title>
<Summary>
<![CDATA[ An experienced Development Manager with previous experience leading a small to mid-size team of developers in a Java/J2EE environment. A hands on role, you will be expected to manage and mentor a team of developers working on a mix of greenfield and maintenance projects.   My client, a well known investment bank, requires an experienced Development Manager to join their core technology team. This t
]]>
</Summary>
<DateActive Date="2009-10-06T19:36:43-05:00">10/6/2009</DateActive>
<DateExpires Date="2009-11-05T20:11:34-05:00">11/5/2009</DateExpires>
<DateUpdated Date="2009-10-06 20:12:00">10/6/2009</DateUpdated>
<Location>
<Country>xxxx</Country>
<State>xxx</State>
<City>xxx</City>
<PostalCode>xxx</PostalCode>
</Location>
<CompanyName>abc Technology</CompanyName>
<BuilderFields />
<DisplayOptions />
<AddressType>1234</AddressType>
</Job>
</Jobs>
</root>
XSLT stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" media-type="application/xml"
cdata-section-elements="Summary"/>
<!-- default: copy everything using the identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- override: for Location and Salary nodes, just process the children -->
<xsl:template match="Location|Salary">
<xsl:apply-templates select="node()"/>
</xsl:template>
<!-- override: for selected elements, convert attributes to elements -->
<xsl:template match="Jobs/@*|Job/@*">
<xsl:element name="{name()}">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<!-- override: for selected elements, remove attributes -->
<xsl:template match="DateActive/@*|DateExpires/@*|DateUpdated/@*"/>
</xsl:stylesheet>
Current Output in text is:
492 50 83000003 61999998 Market-leading company With a newly created role High Profile Position With Responsibilty, Visibility & Opportunity Must Have Solid BA Skills Honed in a SDLC environment Market-leading company With a newly created role High Profile Position With Responsibilty, Visibility & Opportunity Must Have Solid BA Skills Honed in a SDLC environment My client is a market-leader who continue to go from strengt 10/5/2009 11/4/2009 10/5/2009 Australia NSW Sydney 2000 Skill Quest 90,000.00 120,000.00 Per Year AUD 6
I want this output in XML. Please help me find a solution.
Do you have a line like this at the top of your XSLT file?
<xsl:output method="xml" indent="yes"/>
That defines what the output format is: "xml" is the default (unless the document element is html), and "html" and "text" are the other options.
I don't know what you're doing, but when I run your XSLT file on the sample XML file provided, I get this as output:
<?xml version="1.0" encoding="utf-8"?>
<root>
<Jobs><Found>10</Found><Returned>50</Returned>
<Job><ID>8000000</ID><PositionID>600002</PositionID>
<Title>Development Manager</Title>
<Summary>
An experienced Development Manager with previous experience leading a small to mid-size team of developers in a Java/J2EE environment. A hands on role, you will be expected to manage and mentor a team of developers working on a mix of greenfield and maintenance projects.&#160;&#160; My client, a well known investment bank, requires an experienced Development Manager to join their core technology team. This t
</Summary>
<DateActive>10/6/2009</DateActive>
<DateExpires>11/5/2009</DateExpires>
<DateUpdated>10/6/2009</DateUpdated>
<Country>xxxx</Country>
<State>xxx</State>
<City>xxx</City>
<PostalCode>xxx</PostalCode>
<CompanyName>abc Technology</CompanyName>
<BuilderFields />
<DisplayOptions />
<AddressType>1234</AddressType>
</Job>
</Jobs>
</root>
Marc
I suspect you are watching the transformation result in a browser.
The transformation itself works perfectly, but the browser displays the plain text of the XML (since it expects HTML contents by default and ignores any tags it does not recognize, displaying their text contents only).
Try media-type="text/xml" and see if that makes any difference. If it doesn't, don't let the browser display confuse you - there is nothing wrong with the XSLT. You should use another XSLT processor to confirm/debug the XSLT.
You probably write out the inner text of an xml node instead of calling apply-templates in one of your nodes. I couldn't find your attached xsl, so it's not easy to guess. But post the xslt, and I'll tell you.