Trying to convert this:
<list>
<entry>
<parentFeed>
<feedUrl>http://rss.nzherald.co.nz/rss/xml/nzhrsscid_000000001.xml</feedUrl>
<id>68</id>
</parentFeed>
<content>Schools will have to put up with problematic pay administered through Novopay for another eight weeks after the Government announced it would persist with the unstable system.Minister responsible for Novopay, Steven Joyce, delayed...</content>
<link>http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&amp;objectid=10872300&amp;ref=rss</link>
<title>Novopay: Govt sticks with unstable system</title>
<id>55776</id>
<published class="sql-timestamp">2013-03-19 03:38:55.0</published>
<timestamp>2013-03-19 07:31:16.358 UTC</timestamp>
</entry>
</list>
into this, using XSLT:
Title, Link, Date
Novopay: Govt sticks with unstable system, http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10872300&ref=rss, 2013-03-19 03:38:55.0
But try as I might, I can't get rid of the blank line at the beginning of the document. My stylesheet follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:csv="csv:csv"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/list"> <xsl:for-each select="entry">
Title, Link, Date
<xsl:value-of select="title"/>, <xsl:value-of select="link"/>, <xsl:value-of select="published"/>
</xsl:for-each></xsl:template>
</xsl:stylesheet>
I've tried putting in <xsl:text>&#10;</xsl:text> as suggested here, which stripped the last line break, so I moved it to the top of the file, at which point it turned into a no-op. The solution here actually adds a blank line (which makes sense, as the hex code is for newline, according to the ascii manpage).
As a workaround, I've been using Java to generate the CSV output.
However, I do feel XSLT would be a lot faster as it is designed to transform XML to various other formats. A similar XSLT generates HTML, RSS, and ATOM feeds perfectly.
Your logic is spot on. What you need to keep in mind is that, when you output text, any literal text in your XSLT is copied to the output together with the line breaks and indentation next to it, so your XSLT should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:csv="csv:csv"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/list"><xsl:text>Title, Link, Date&#10;</xsl:text>
<xsl:for-each select="entry"><xsl:value-of select="title"/>, <xsl:value-of select="link"/>, <xsl:value-of select="published"/>
<xsl:text>&#10;</xsl:text>
</xsl:for-each></xsl:template>
</xsl:stylesheet>
Run the above XSLT and it will work perfectly.
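As a variant, here is a minimal sketch that does not depend on the layout of the stylesheet at all: every separator and line feed is produced explicitly, either inside xsl:text or through concat(). It assumes the same input structure as in the question (a list element containing entry elements) and that the header should appear only once.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/list">
    <!-- Header row, emitted once, followed by an explicit line feed. -->
    <xsl:text>Title, Link, Date&#10;</xsl:text>
    <xsl:for-each select="entry">
      <!-- One row per entry; separators come from concat(), not from stylesheet whitespace. -->
      <xsl:value-of select="concat(title, ', ', link, ', ', published, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because whitespace-only text nodes in a stylesheet are stripped, the indentation here should not leak into the output. Note that this sketch does no CSV quoting, so a title containing a comma would still need extra handling.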
This is a follow-up question to
how to get 'excel' new lines in spreadsheetML (MSXSLT)
but asked as a new question to keep it a separate issue, as the behaviour seems to differ between engines (I'll leave the specific context in the other question; this one is purely about how to achieve the functional result).
This XSLT (in Saxon HE) will create what I want.
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<root>
<bar>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
</bar>
</root>
</xsl:template>
</xsl:stylesheet>
and gives the output
<root>
<bar>&#10;</bar>
</root>
This one won't:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
version="1.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="foo">
<bar>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
</bar>
</xsl:variable>
<root>
<xsl:copy-of select="exsl:node-set($foo)"/>
</root>
</xsl:template>
</xsl:stylesheet>
It gives:
<bar>&amp;#10;</bar>
(The question is about XSLT 1.0, but interestingly XSLT 3.0 can be made to work like this:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="foo">
<bar>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
</bar>
</xsl:variable>
<root>
<xsl:sequence select="$foo"/>
</root>
</xsl:template>
</xsl:stylesheet>
whilst
<xsl:copy-of select="$foo"/>
doesn't. Even following the 'sequence' pattern, I don't seem to be able to preserve the non-escaping in anything but a trivial XSLT. I've got a complex transformation using call-template/apply-templates etc., and I think understanding how nodes are interpreted and serialised is not trivial.)
There's actually a long history to this question, which was known in the working group as the "sticky d-o-e problem" (d-o-e being disable-output-escaping). The question is, does d-o-e have any effect when writing to a temporary tree (an xsl:variable), or is it only effective when writing to serialized output?
The XSLT 1.0 specification is pretty clear on the matter:
It is an error for output escaping to be disabled for a text node that
is used for something other than a text node in the result tree. Thus,
it is an error to disable output escaping for an xsl:value-of or
xsl:text element that is used to generate the string-value of a
comment, processing instruction or attribute node; it is also an error
to convert a result tree fragment to a number or a string if the
result tree fragment contains a text node for which escaping was
disabled. In both cases, an XSLT processor may signal the error; if it
does not signal the error, it must recover by ignoring the
disable-output-escaping attribute.
XSLT 2.0 deprecated d-o-e, but retained the rule in a slightly different form:
This [property], however, can be set only within a final result tree
that is being passed to the serializer.
But in between those two versions, the working group dithered. The XSLT 1.1 working draft (which never became a recommendation, but was popularised by the first version of my XSLT book) says:
When a root node is copied using an xsl:copy-of element ... and
escaping was disabled for a text node descendant of that root node,
then escaping should also be disabled for the resulting copy of that
text node. For example
<xsl:variable name="x">
<xsl:text disable-output-escaping="yes">&lt;</xsl:text>
</xsl:variable>
<xsl:copy-of select="$x"/>
This is the "sticky d-o-e" - the d-o-e property is attached to the text node in the temporary tree and springs into life when the text node is eventually serialized. So this behaviour was endorsed at some stage in the life of XSLT, and you may be using a processor that implements this version of the spec.
Generally, though, try to forget that d-o-e exists. Whatever the problem, it's not the best solution. It's an incredibly messy feature because it requires a breaking of the architectural boundary between the transformation processor and the serializer, and breaking this boundary leads to close coupling of the transformation and serialization, and prevents you reusing the same code in a different pipeline configuration.
I'm afraid that researching the history of the W3C spec on this is rather easier than researching exactly what was implemented in early versions of Saxon (which are now nearly a quarter of a century old).
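For processors that support XSLT 2.0 or later (so not the 1.0 engines the question is about), the standard replacement for d-o-e is a character map, which is applied by the serializer and therefore should survive a trip through a temporary tree. The following is a minimal sketch under that assumption; the map name excel-lf and the private-use placeholder character U+E000 are arbitrary choices made for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- At serialization time, replace the placeholder character with the literal string &#10; -->
  <xsl:character-map name="excel-lf">
    <xsl:output-character character="&#xE000;" string="&amp;#10;"/>
  </xsl:character-map>
  <xsl:output method="xml" encoding="UTF-8" use-character-maps="excel-lf"/>
  <xsl:template match="/">
    <!-- The placeholder is an ordinary character, so it is copied through the variable unchanged. -->
    <xsl:variable name="output">
      <Cell>Alpha&#xE000;Bravo&#xE000;Charlie</Cell>
    </xsl:variable>
    <xsl:copy-of select="$output"/>
  </xsl:template>
</xsl:stylesheet>
```

The transformation itself only ever deals with a normal character; the mapping to the character reference happens in the serializer, which keeps the two stages decoupled.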
So, taking the information from Michael Kay's answer, which explains how the XSLT 1.0 specification handles this, we CAN implement a solution.
First, a recap of the underlying issue.
Excel SpreadsheetML requires data to be formatted with the specific characters "&#10;" to interpret a line feed in a cell (but this solution applies generally).
<Cell>Alpha&#10;Bravo&#10;Charlie</Cell>
If we try to write an XSLT to generate this, let's say naively:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text>&amp;#10;</xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text>&amp;#10;</xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:template>
</xsl:stylesheet>
our &#10; will get escaped and we get this:
<Cell>Alpha&amp;#10;Bravo&amp;#10;Charlie</Cell>
This (thanks to the answer on how to get 'excel' new lines in spreadsheetML (MSXSLT)) can be fixed by using
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:template>
</xsl:stylesheet>
which produces this:
<Cell>Alpha&#10;Bravo&#10;Charlie</Cell>
Unfortunately this 'breaks' if you process your output document via some intermediate internal document, e.g. even this:
<xsl:stylesheet version="1.0"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="msxsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="output">
<Cell>
<xsl:text>Alpha</xsl:text>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
<xsl:text>Bravo</xsl:text>
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:variable>
<xsl:copy-of select="msxsl:node-set($output)"/>
</xsl:template>
</xsl:stylesheet>
reverts to:
<Cell>Alpha&amp;#10;Bravo&amp;#10;Charlie</Cell>
because (see Michael Kay's answer) the disable-output-escaping attribute gets ignored if the text is passed through some internal document (i.e. the variable).
So... how can you get around this?
If you generate a token for the LF, you can construct your pseudo-Excel output almost in its entirety, using a custom element to flag the LF character, and then process that DIRECTLY into the result tree, interpreting the custom element as an unescaped "&#10;".
So this:
<xsl:stylesheet version="1.0"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:kookerella="kookerella.com"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="msxsl kookerella">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="output">
<Cell>
<xsl:text>Alpha</xsl:text>
<kookerella:LF/>
<xsl:text>Bravo</xsl:text>
<kookerella:LF/>
<xsl:text>Charlie</xsl:text>
</Cell>
</xsl:variable>
<!-- process data directly into the result tree only -->
<xsl:apply-templates select="msxsl:node-set($output)" mode="injectLF"/>
</xsl:template>
<!-- Inject LF -->
<xsl:template match="@* | node()" mode="injectLF">
<xsl:copy>
<xsl:apply-templates select="@* | node()" mode="injectLF"/>
</xsl:copy>
</xsl:template>
<xsl:template match="kookerella:LF" mode="injectLF">
<xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
<xsl:apply-templates select="@* | node()" mode="injectLF"/>
</xsl:template>
</xsl:stylesheet>
now results in:
<Cell>Alpha&#10;Bravo&#10;Charlie</Cell>
P.S.
As an aside, this seems to work for me in both the various MSXSLT engines and Saxon HE, but I have had an instance of using the MSXSLT engine where even this doesn't work, presumably due to some configuration or output serialisation issue.
Since Amazon shut off its XSLT support, I wanted to move it to my own server using PHP 5's XSL extension. My output needs to be in a text format for my JS to process it for a web page. Amazon's XML response (very abbreviated) looks like this:
<?xml version="1.0" ?>
<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
/............./
</ItemLookupResponse>
My problem is that my XSL stylesheet works fine only as long as I remove the xmlns="http://...". What is needed in the stylesheet to have it bypass or just ignore that?
All the nodes I need are well inside that outer one.
Here is the xslt:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="CallBack" select="'amzJSONCallback'"/>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="$CallBack"/>
<xsl:text>( { "Item" : </xsl:text><xsl:apply-templates/><xsl:text> } ) </xsl:text>
</xsl:template>
<xsl:template match="OperationRequest"></xsl:template>
<xsl:template match="Request"></xsl:template>
<xsl:template match="Items">
<xsl:apply-templates select="Item"/>
</xsl:template>
<xsl:template match="Item">
<xsl:text> {</xsl:text>
<xsl:text>"title":"</xsl:text><xsl:apply-templates select="ItemAttributes/Title"/><xsl:text>",</xsl:text>
<xsl:text>"author":"</xsl:text><xsl:apply-templates select="ItemAttributes/Author"/><xsl:text>",</xsl:text>
<xsl:text>"pubbdate":"</xsl:text><xsl:apply-templates select="ItemAttributes/PublicationDate"/><xsl:text>"</xsl:text>
<xsl:text>} </xsl:text>
</xsl:template>
</xsl:stylesheet>
You should probably learn how XML namespaces work. In a nutshell, you have to define a namespace prefix in your XSL file like this:
<xsl:stylesheet ... xmlns:awse="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
Then, you have to use qualified names to match and select elements under that namespace:
<xsl:template match="awse:ItemLookupResponse">
(With XSLT 2.0, you can set the xpath-default-namespace attribute so that unprefixed names in match and select expressions refer to that namespace. But since you're using PHP, you're probably limited to XSLT 1.0.)
It looks like nwellnhof is correct. I was using the wrong namespace in my testing. All I did was add:
<xsl:stylesheet ... xmlns:aws="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
Then the elements look like
<xsl:template match="aws:ItemLookupResponse">
Now the conversion works perfectly. I don't know why it didn't work the first time I tried it.
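Putting the fragments above together, a sketch of the asker's stylesheet with the prefix applied to every element name might look like the following. The prefix aws is arbitrary (only the namespace URI has to match the response), and the element names are taken from the stylesheet in the question rather than verified against an actual Amazon response.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:aws="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
  <xsl:param name="CallBack" select="'amzJSONCallback'"/>
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="$CallBack"/>
    <xsl:text>( { "Item" : </xsl:text><xsl:apply-templates/><xsl:text> } ) </xsl:text>
  </xsl:template>
  <!-- Suppress the parts of the response that are not needed. -->
  <xsl:template match="aws:OperationRequest | aws:Request"/>
  <xsl:template match="aws:Items">
    <xsl:apply-templates select="aws:Item"/>
  </xsl:template>
  <!-- Every step of every path needs the prefix, not just the first one. -->
  <xsl:template match="aws:Item">
    <xsl:text> {</xsl:text>
    <xsl:text>"title":"</xsl:text><xsl:apply-templates select="aws:ItemAttributes/aws:Title"/><xsl:text>",</xsl:text>
    <xsl:text>"author":"</xsl:text><xsl:apply-templates select="aws:ItemAttributes/aws:Author"/><xsl:text>",</xsl:text>
    <xsl:text>"pubbdate":"</xsl:text><xsl:apply-templates select="aws:ItemAttributes/aws:PublicationDate"/><xsl:text>"</xsl:text>
    <xsl:text>} </xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

An unprefixed name in an XSLT 1.0 match or select expression always means "no namespace", which is why the original templates silently stopped matching once the default namespace was present on the response.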
I am a novice when it comes to XSLT.
I run the below select
<xsl:value-of select="./@name"/>
I get the following result
TestSomething.Cancel(GIVEN WHEN THEN)
I want the output to say
GIVEN WHEN THEN
instead of TestSomething.Cancel(GIVEN WHEN THEN)
Would be thankful if someone could point me in the right direction.
Use ...
<xsl:value-of select="substring-before(substring-after(./@name,'('),')')" />
It would help if you could post the source XML and some information on the xslt processor you are using, but at a guess I'd say this.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:value-of select="substring-before(substring-after(./@name, '('), ')')"/>
</xsl:template>
</xsl:stylesheet>
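Since the source XML was not posted, here is a sketch that assumes the name attribute sits on a hypothetical element called testcase. The point is only that the expression has to be evaluated with the element carrying the attribute as the context node.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <!-- "testcase" is a hypothetical element name; replace it with the real one. -->
  <xsl:template match="testcase">
    <!-- Keep what lies between the first "(" and the first ")" that follows it. -->
    <xsl:value-of select="substring-before(substring-after(@name, '('), ')')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For an input such as <testcase name="TestSomething.Cancel(GIVEN WHEN THEN)"/>, this should yield GIVEN WHEN THEN. If the text in parentheses can itself contain a closing parenthesis, the result would be cut short at that point.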
I'm going to be brief. I'm doing XSLT on the client. The output is an HTML report with data. The report consists of several blocks, i.e. one block is one child element of the root node in the XML file.
There are n reports residing in n different XSLT files in my project, and the reports can share the same block. That means if there is a problem with a block that appears in n reports, I have to update all n reports (XSLT files).
So I want to put all my blocks in templates (a kind of business layer) that I can reuse for my reports by pulling them in with xsl:include.
So the pseudo is something like this:
<?xml version="1.0".....?>
<xsl:stylesheet version="1.0"....>
<xsl:include href="../../Blocks/MyBlock.xslt"/>
<xsl:template match='/'>
<xsl:apply-templates />
</xsl:template>
</xsl:stylesheet>
MyBlock.xslt:
<?xml version="1.0"....?>
<xsl:stylesheet version="1.0".....>
<xsl:template match='/root/rating'>
HTML OUTPUT
</xsl:template>
</xsl:stylesheet>
I hope someone out there understands my question. I need pointers on how to go about this, if this is one way to do it. But it doesn't seem to work.
Below is how I deal with this, from my own experience.
This example is a modified version of your code.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:include href="../../Blocks/MyBlock.xslt"/>
<xsl:template match="/">
<xsl:apply-templates select="node()" mode="callingNode1"/>
</xsl:template>
</xsl:stylesheet>
MyBlock.xslt:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template mode="callingNode1" match="*">
HTML OUTPUT
</xsl:template>
<xsl:template mode="callingNode2" match="/root/rating">
HTML OUTPUT
</xsl:template>
</xsl:stylesheet>
Here I am selecting the templates based on both the mode and the match pattern.
I need to parse the following node:
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
into a valid string, preferably "keyword1,keyword2,keyword3" but I would settle for removing the cdata completely.
Trying to access the node gives me the text "keyword1,keyword2keyword3" and I can't tell where the CDATA begins.
original xml (simplified version of mRSS feed)
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
<channel>
<item>
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
</item>
</channel>
</rss>
xsl (simplified):
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:media="http://search.yahoo.com/mrss/" exclude-result-prefixes="xs xsi fn">
<xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>
<xsl:template match="/">
<test>
<xsl:variable name="items" select="/rss/channel/item"/>
<xsl:for-each select="$items">
<xsl:variable name="mediakw" select="media:keywords"/>
<xsl:element name="mediaKeyWords">
<xsl:value-of select="$mediakw"/>
</xsl:element>
</xsl:for-each>
</test>
</xsl:template>
</xsl:stylesheet>
and the output:
<test xmlns:media="http://search.yahoo.com/mrss/"><mediaKeyWords>keyword1,keyword2keyword3</mediaKeyWords></test>
Thanks a lot!
XML and XSLT cannot help you here.
XSLT operates on the XPath data model, in which there is no such thing as a "CDATA node"; the element has just a single text() node:
"keyword1,keyword2keyword3"
The XML document needs to be corrected and a comma be inserted between the substrings "keyword2" and "keyword3"
One solution would be to process the CDATA DOM node using DOM, and only then initiate the XSLT transformation.
By the time the XSLT processor sees the text, the CDATA is gone. You cannot see the incoming CDATA, and have very little control over how output CDATA is generated (all or nothing for a given tag).
Can't be done in standard XSLT.
The input XML you're receiving,
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
is indistinguishable (to XSLT) from
<media:keywords>keyword1,keyword2keyword3</media:keywords>
because the CDATA markup is just a way of escaping the data inside it. There is really no special markup to escape in this case, so the CDATA happens to be a no-op. But XSLT has no way of knowing what data was originally expressed using CDATA, what was expressed using character entities, etc.
The solution would be to tell whoever is providing this XML that they need to put a delimiter between keyword2 and keyword3.