Problems Trying to Pretty Print XSLT Output

This is my first post, so please let me know if I can make it more constructive in any way. I have read the forum guidelines, so if I inadvertently break them in any way it will be nothing more than an innocent mistake.
The Question
Is a simple one:
How do I pretty print the output of an XSL file?
But with some criteria:
Using only native XSL functionality.
Without having to use a second XSL file to do a 'second pass'.
It must also work for elements with mixed content.
I have googled this reasonably thoroughly but have not found a clear answer to this question. I have only used XSL for about a week so go easy if I have somehow missed the answer elsewhere.
An Example
This XML...
<email>
<attachedItem>priceless photograph.jpg</attachedItem>
<attachedItem>important document.doc</attachedItem>
<attachedItem>access codes.pdf</attachedItem>
</email>
...Transformed by this XSL...
<!-- Pretty Print Output -->
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<email>
"Please find attached the stuff."
<xsl:apply-templates/>
</email>
</xsl:template>
<xsl:template match="attachedItem">
<xsl:copy/>
</xsl:template>
...Produces this result...
<?xml version="1.0" encoding="utf-8"?>
<email>
"Please find attached the stuff."
<attachedItem>priceless photograph.jpg</attachedItem>
<attachedItem>important document.doc</attachedItem>
<attachedItem>access codes.pdf</attachedItem>
</email>
Using the Saxon 6.5.5 engine.
Desired Output
<?xml version="1.0" encoding="utf-8"?>
<email>
"Please find attached the stuff."
<attachedItem>priceless photograph.jpg</attachedItem>
<attachedItem>important document.doc</attachedItem>
<attachedItem>access codes.pdf</attachedItem>
</email>
My Own Progress on the Problem
From the XSL above you will see I have discovered the use of <xsl:strip-space> and <xsl:output>. This meets the first 2 criteria but not the 3rd. In other words, it produces nicely pretty-printed XML when there is no mixed content, but with mixed content I receive the undesired output you can see above.
I know that the reason I get this output is because of the way whitespace is preserved in the source XML. White space is always preserved if it is part of a text node that contains other non-whitespace characters, regardless of the <xsl:strip-space> instructions. However despite my understanding I still cannot think of a solution.
Although I have addressed the first 2 criteria myself I would still like to know if this is the best way to achieve a pretty printed result.
Thanks in advance!

The following stylesheet produces exactly the output you request. The transformation was performed with Saxon 6.5.5. The correct indentation can only be achieved by meticulously typing all the line feed (&#xA;) and space (&#x20;) characters.
Note that pretty printing XML has no meaning when text content is concerned. The indentation of element tags can be easily controlled, but text nodes of elements with mixed content are always a problem. An application that takes XML as input should never rely on the exact indentation or whitespace handling of text content in XML.
In general, it is considered a bad idea to directly output literal text in an XSLT stylesheet. Always put text content inside xsl:text. xsl:strip-space has an effect only on whitespace-only text nodes of elements that belong to the input XML document (as suggested by @TobiasKlevenz already).
Stylesheet
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- Pretty Print Output -->
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<email>
<xsl:text>
"Please find attached the stuff."
</xsl:text>
<xsl:apply-templates/>
</email>
</xsl:template>
<xsl:template match="attachedItem|text()">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:transform>
Output
<?xml version="1.0" encoding="utf-8"?>
<email>
"Please find attached the stuff."
<attachedItem>priceless photograph.jpg</attachedItem>
<attachedItem>important document.doc</attachedItem>
<attachedItem>access codes.pdf</attachedItem>
</email>
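As a hedged illustration of what "typing all the line feed and space characters" can look like, here is a minimal sketch of the same stylesheet with the whitespace written as explicit character references inside xsl:text. The indent width of two spaces is an assumption, not something taken from the answer, and the result may still vary between processors because indent="yes" leaves the details to the serializer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <email>
      <!-- &#xA; is a line feed, &#x20; is a space; two spaces of indent are assumed -->
      <xsl:text>&#xA;&#x20;&#x20;"Please find attached the stuff."&#xA;&#x20;&#x20;</xsl:text>
      <xsl:apply-templates/>
    </email>
  </xsl:template>

  <!-- Copy attachedItem elements and their text, as in the answer's stylesheet -->
  <xsl:template match="attachedItem|text()">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:transform>
```

Writing the whitespace as character references keeps it visible and protects it from editors that reformat the stylesheet.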

You can wrap "Please find attached the stuff." in an <xsl:text> element, which would produce what I assume is your desired result. If not, please post a 'desired output' example.

Related

How use regex for character after a digit

I want to select the @reId that has a character after a digit (fig-FigF.3A).
Input:
<p type="TOC_Level Two Entry">
<doclink refType="anchor" refId="fig-FigF.3A">Figure F.3A—Text<c
type="TOC_Leader Dots"><t/></tps:c></tps:doclink>
<ref format="TOC Page Number" refType="anchor" refId="fig-FigF.3A"/>
<p>
Output should be:
<p type="TOC_Level Two Entry"><doclink refType="anchor"
refId="fig-FigF.3A">F.3A<tps:t/>Text<c
type="TOC_Leader Dots"><t/></c></tps:doclink><ref
format="TOC Page Number" refType="anchor" refId="fig-FigF.3A"/></tps:p>
Tried code:
I tried to solve this with the regex ^(Figure )(\d+|[A-Z].\d+)(—)(.*), but it does not work.
How can I solve this? I am using XSLT 2.0.
Your input is not well-formed; please check it.
If you only want to change the text, use this code with the replace function:
Input:
<?xml version="1.0" encoding="UTF-8"?>
<p type="TOC_Level Two Entry">
<tps:doclink refType="anchor" refId="fig-FigF.3A" xmlns:tps="htttp:\\tps">Figure F.3A—Text<tps:c type="TOC_Leader Dots"><t/></tps:c></tps:doclink>
<ref format="TOC Page Number" refType="anchor" refId="fig-FigF.3A"/>
</p>
code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output method="xml" omit-xml-declaration="no"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()">
<xsl:value-of select="replace(., '(Figure )([A-Z])([.])([0-9A-Z]+)(.+?)([A-Za-z]+)', '$2$3$4')"/>
</xsl:template>
</xsl:stylesheet>
output:
<?xml version="1.0" encoding="UTF-8"?>
<p type="TOC_Level Two Entry">
<tps:doclink xmlns:tps="htttp:\\tps" refType="anchor" refId="fig-FigF.3A">F.3A<tps:c type="TOC_Leader Dots"><t/></tps:c></tps:doclink>
<ref format="TOC Page Number" refType="anchor" refId="fig-FigF.3A"/>
</p>
DEMO: https://xsltfiddle.liberty-development.net/ncntCS9/1
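One likely reason the pattern in the question fails on this input: after [A-Z].\d+ has matched "F.3", the pattern requires the dash immediately, but the text continues with "A". The following is a hedged XSLT 2.0 sketch, assuming the goal is the output stated in the question (drop "Figure ", keep the label, and replace the dash with an empty tps:t element). The namespace URI is copied from the sample input above, and the optional trailing letter [A-Z]? is an assumption about what other labels look like.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tps="htttp:\\tps">

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Only rewrite text that sits directly inside tps:doclink -->
  <xsl:template match="tps:doclink/text()">
    <xsl:analyze-string select="." regex="^Figure ([A-Z]\.[0-9]+[A-Z]?)—(.*)$">
      <xsl:matching-substring>
        <!-- Group 1 is the label (for example F.3A), group 2 is the title -->
        <xsl:value-of select="regex-group(1)"/>
        <tps:t/>
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text that does not fit the pattern is passed through unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

xsl:analyze-string is used here rather than replace() because the replacement has to contain an element, which a string function cannot produce.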
So, trying to extract a clear requirements statement from this, it seems you want the input "fig-FigF.3A" to result in the output "F.3A". Alternatively, perhaps you want to treat "Figure F.3A—Text" as the input? On the one hand you say you are selecting the @reId attribute -- which doesn't exist in your input; on the other hand your attempt at a solution is looking for the text "Figure" which appears in a text node, rather than an attribute.
So I think we need a much clearer requirements statement.
The other problem with this as a requirements statement is that you only really give one example, not a general rule. There's a hint of a general rule in your question "which has a character after a digit". But what does this mean? Your example seems to be looking for the pattern letter-dot-digit, which doesn't match your description of the problem at all.
Sorry, SO moderators, this isn't an answer, it's a comment on the question. It started as an answer, until I realised the question wasn't clear, but by then it was too long for a comment.

XSLT: How to discard unwanted HTML nodes from source?

I am using XSLT 1.0, and using xsltproc on OS X Yosemite.
The source content is HTML; the target content is XML.
The issue is a fairly common one. I want all "uninteresting"
nodes simply to be discarded from the output. I've seen catch-all
directives like this:
<xsl:template match="node()|script"/>
<xsl:template match="*">
<xsl:apply-templates/>
</xsl:template>
This is close to what I need. But unfortunately, it's too strong when I need to add another template that visits one of the text nodes caught by node(). For example, suppose I added this template:
<xsl:template match="a/div[@class='location']/br">
<xsl:text> </xsl:text>
</xsl:template>
which simply replaces certain <br/> elements with spaces.
Well, node() precludes this latter template from taking effect,
because the relevant text node containing the line-break is discarded
already!
Well, to correct the issue, here's what I have done in lieu of the catch-all node():
<xsl:template match="html/head|div[@id='banner_parent']|button|ul|div[@id='feed_title']|span|div[@class='submit_event']|script"/>
But this is precisely the problem: I am now piecing together a template
whose matching criteria is likely to be error-prone when the source
content changes.
Is there a simpler directive that would accomplish the same thing? I'm aiming for something like this:
<xsl:template match="node()[not(locations)]|script"/>
Thanks.
If I understood correctly, you want only some nodes in the output and don't care about the rest. In this example I catch only li elements and throw the rest away. I'm not sure if this is what you want, though: http://xsltransform.net/gWmuiKk
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="html" doctype-public="XSLT-compat" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<!-- Let's pretend li is interesting for you -->
<xsl:template match="li">
<xsl:text>Interesting Node Only!
</xsl:text>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:apply-templates select="@*|node()"/>
</xsl:template>
</xsl:transform>
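An empty template that matches node() swallows an element together with all of its descendants, so a br nested inside a discarded element is never visited. A hedged XSLT 1.0 sketch of the inverse approach follows: drop text by default, let the built-in element rule keep walking the tree, and switch to a separate mode inside the elements worth keeping. The locations and location result elements and the "keep" mode name are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical wrapper so the result has a single root element -->
  <xsl:template match="/">
    <locations>
      <xsl:apply-templates/>
    </locations>
  </xsl:template>

  <!-- Default mode: drop all text. The built-in element rule keeps descending. -->
  <xsl:template match="text()"/>

  <!-- The interesting element: switch to "keep" mode for its content -->
  <xsl:template match="a/div[@class='location']">
    <location>
      <xsl:apply-templates mode="keep"/>
    </location>
  </xsl:template>

  <!-- In "keep" mode the built-in rules copy text through; only br needs a rule -->
  <xsl:template match="br" mode="keep">
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With this layout, new markup in the source is ignored unless a template explicitly asks for it, so the stylesheet does not need a growing list of elements to suppress.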

Saxon line break issue in output XML

I am using Saxon HE 9.5 as my XSLT processor. Since the source is a large XML document, I need to minimize the size of the output. However, Saxon HE adds line breaks between element tags, as in the following example:
<Element1>
<attr1>
test1
</attr1>
</Element1>
I want it to be like:
<Element1> <attr1> test1 </attr1> </Element1>
so that I can minimize the size of the output XML. Is there any way to do it?
I have tried setting indent="no", but then the output XML fails to open.
Thank you!
You can use <xsl:output indent="no"/> to turn off the indenting, but your line breaks in elements that contain text will still be there (even with <xsl:strip-space elements="*"/>). You can use normalize-space() to remove them.
Example...
XML Input
<Element1>
<attr1>
test1
</attr1>
</Element1>
XSLT 2.0 (works as 1.0 too)
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="no"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|*|processing-instruction()|comment()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()">
<xsl:value-of select="normalize-space(.)"/>
</xsl:template>
</xsl:stylesheet>
XML Output
<Element1><attr1>test1</attr1></Element1>
The option indent="no" is the default. If you are getting indented output, then either (a) you have asked for it using indent="yes", or (b) the whitespace is present in the result tree before serialization. If the whitespace is present in the result tree, then either (b1) the stylesheet added it to the result tree, or (b2) it was copied from the source document. If (b2) is the cause, then putting <xsl:strip-space elements="*"/> in your stylesheet might be the answer (assuming you don't have any significant whitespace in the source document that needs to be preserved).
We can't give anything other than general advice unless you show us your code.

XSL Generating CSV

Trying to convert this:
<list>
<entry>
<parentFeed>
<feedUrl>http://rss.nzherald.co.nz/rss/xml/nzhrsscid_000000001.xml</feedUrl>
<id>68</id>
</parentFeed>
<content>Schools will have to put up with problematic pay administered through Novopay for another eight weeks after the Government announced it would persist with the unstable system.Minister responsible for Novopay, Steven Joyce, delayed...</content>
<link>http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&amp;objectid=10872300&amp;ref=rss</link>
<title>Novopay: Govt sticks with unstable system</title>
<id>55776</id>
<published class="sql-timestamp">2013-03-19 03:38:55.0</published>
<timestamp>2013-03-19 07:31:16.358 UTC</timestamp>
</entry>
</list>
into this, using XSLT:
Title, Link, Date
Novopay: Govt sticks with unstable system, http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10872300&ref=rss, 2013-03-19 03:38:55.0
But try as I might, I can't get rid of the blank line at the beginning of the document. My stylesheet follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:csv="csv:csv"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/list"> <xsl:for-each select="entry">
Title, Link, Date
<xsl:value-of select="title"/>, <xsl:value-of select="link"/>, <xsl:value-of select="published"/>
</xsl:for-each></xsl:template>
</xsl:stylesheet>
I've tried putting in <xsl:text>&#xA;</xsl:text> as suggested here, which stripped the last line break, so I moved it to the top of the file, at which point it turned into a no-op. The solution here actually adds a blank line (which makes sense, as the hex code is for newline, according to the ascii manpage).
As a workaround, I've been using Java to generate the CSV output.
However, I do feel XSLT would be a lot faster as it is designed to transform XML to various other formats. A similar XSLT generates HTML, RSS, and ATOM feeds perfectly.
You have done it perfectly; your logic is spot on. However, keep in mind that when you are outputting text, all the indentation in your XSLT will affect the output, so your XSLT should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:csv="csv:csv"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/list"> <xsl:for-each select="entry">Title, Link, Date
<xsl:value-of select="title"/>, <xsl:value-of select="link"/>, <xsl:value-of select="published"/>
<xsl:text>
</xsl:text>
</xsl:for-each></xsl:template>
</xsl:stylesheet>
Run the above XSLT and it will work perfectly.
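The blank line comes from the text node between the xsl:for-each start tag and "Title": it begins with a line break, and because the node also contains non-whitespace characters the processor keeps all of it. A hedged alternative sketch is shown below. It puts every literal character inside xsl:text, so the layout of the stylesheet no longer leaks into the output, and it writes the header once instead of once per entry. Fields are not quoted, so this assumes titles and links contain no commas.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/list">
    <!-- Header row, written once; &#xA; is the line feed that ends the row -->
    <xsl:text>Title, Link, Date&#xA;</xsl:text>
    <xsl:for-each select="entry">
      <xsl:value-of select="title"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="link"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="published"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Whitespace-only text nodes in a stylesheet are stripped, so the indentation between the instructions above produces no output.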

XSL for-each loop is not working

I'm using Java to transform an XML document to text:
Transformer transformer = tFactory.newTransformer(stylesource);
transformer.transform(source, result);
This seems to work except when there are colons in the XML document. I tried this example:
XML file:
<?xml version="1.0" encoding="UTF-8"?>
<test:TEST >
<one.two:three id="my id" name="my name" description="my description" >
</one.two:three>
<one.two:three id="some id" name="some name" description="some description" />
</test:TEST>
XSL file:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xmi="http://www.omg.org/XMI"
xmlns:one.two="http://www.one.two/one.two:three" >
<xsl:output method="text" indent="yes" omit-xml-declaration="yes"/>
<xsl:variable name="myVariable">one.two:three</xsl:variable>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="*[substring(name(),1,9)='test:TEST']" >
<xsl:for-each select="./$myVariable">
inFirstLoop
</xsl:for-each>
<xsl:for-each select="./one.two:three">
inSecondLoop
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The result of the transformation I'm getting is a single line:
inFirstLoop
I'm expecting 4 lines of output
inFirstLoop
inFirstLoop
inSecondLoop
inSecondLoop
How do I fix this? Any help is greatly appreciated. Thanks.
There are multiple things wrong here. I'm surprised your transformation managed to run at all, instead of failing on parse errors and other errors.
One big problem is that your input XML uses namespace prefixes (that's what the colons are for) without declaring them. Declarations like
xmlns:one.two="http://www.one.two/one.two:three"
need to be in the source XML, as well as in the XSL. Otherwise your source XML is not well-formed (according to namespace rules).
A second problem is the XPath expression
./$myVariable
which should have thrown an error. I think what you wanted was
*[name() = $myVariable]
The third change I would make is not an error in the XSLT, but just a poor way of doing things... If you want to match <test:TEST>, you should use namespace tools to refer to namespaces. Therefore, instead of
<xsl:template match="*[substring(name(),1,9)='test:TEST']" >
use
<xsl:template match="test:TEST">
Much cleaner. Then you need to put in a namespace declaration on the outermost element of the stylesheet, as you already have to do in the input XML document:
xmlns:test="...test..."
XML namespaces, like driving a car, are a topic better learned from a little training than by trial-and-error. Reading a brief article like this will help you avoid a lot of confusion and pain down the road.
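Putting the three points together, a hedged sketch of a corrected stylesheet could look like the following. The URI bound to the test prefix is a placeholder assumption, since the question never shows one, and the source document must declare the same two namespace URIs on its root element. Note that the name() comparison only works while the source uses the same prefix as the variable's value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed source root: <test:TEST xmlns:test="http://example.com/test"
     xmlns:one.two="http://www.one.two/one.two:three"> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:test="http://example.com/test"
    xmlns:one.two="http://www.one.two/one.two:three">
  <xsl:output method="text"/>

  <xsl:variable name="myVariable">one.two:three</xsl:variable>

  <xsl:template match="test:TEST">
    <!-- Compare the element's qualified name with the variable's string value -->
    <xsl:for-each select="*[name() = $myVariable]">
      <xsl:text>inFirstLoop&#xA;</xsl:text>
    </xsl:for-each>
    <!-- Namespace-aware selection; independent of the prefix used in the source -->
    <xsl:for-each select="one.two:three">
      <xsl:text>inSecondLoop&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The second loop is the more robust of the two, because it matches on the namespace URI rather than on whatever prefix the source document happens to use.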