XSLT 2.0: processing an invalid node mixing text and CDATA

I need to parse the following node:
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
into a valid string, preferably "keyword1,keyword2,keyword3" but I would settle for removing the cdata completely.
Trying to access the node gives me the text "keyword1,keyword2keyword3" and I can't tell where the CDATA begins.
Original XML (simplified version of an mRSS feed):
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
<channel>
<item>
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
</item>
</channel>
</rss>
xsl (simplified):
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:media="http://search.yahoo.com/mrss/" exclude-result-prefixes="xs xsi fn">
<xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>
<xsl:template match="/">
<test>
<xsl:variable name="items" select="/rss/channel/item"/>
<xsl:for-each select="$items">
<xsl:variable name="mediakw" select="media:keywords"/>
<xsl:element name="mediaKeyWords">
<xsl:value-of select="$mediakw"/>
</xsl:element>
</xsl:for-each>
</test>
</xsl:template>
</xsl:stylesheet>
and the output:
<test xmlns:media="http://search.yahoo.com/mrss/"><mediaKeyWords>keyword1,keyword2keyword3</mediaKeyWords></test>
Thanks a lot!

XML and XSLT cannot help you here.
XSLT works on a tree model derived from the XML Infoset, in which there is no such thing as a "CDATA node"; the element simply contains a single text() node:
"keyword1,keyword2keyword3"
The XML document needs to be corrected: a comma has to be inserted between the substrings "keyword2" and "keyword3".
One solution would be to process the CDATA node with a DOM API, which can expose CDATA sections as separate nodes, and only then start the XSLT transformation.
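
A minimal sketch of that DOM pre-processing step, written in Python with the standard library's xml.dom.minidom purely for illustration (the answer does not name a language or DOM implementation). The function name join_cdata_with_commas is hypothetical, and the rule "prefix each CDATA section with a comma" is an assumption about what the feed author intended.

```python
from xml.dom import minidom

SOURCE = (
    '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">'
    "<channel><item>"
    "<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>"
    "</item></channel></rss>"
)

MEDIA_NS = "http://search.yahoo.com/mrss/"


def join_cdata_with_commas(xml_text):
    """Replace each CDATA section in media:keywords with plain ',text'."""
    doc = minidom.parseString(xml_text)
    for element in doc.getElementsByTagNameNS(MEDIA_NS, "keywords"):
        # Iterate over a copy because the child list is modified in the loop.
        for child in list(element.childNodes):
            if child.nodeType == child.CDATA_SECTION_NODE:
                # Assumption: a delimiter is only wanted when something precedes the section.
                prefix = "," if child.previousSibling is not None else ""
                replacement = doc.createTextNode(prefix + child.data)
                element.replaceChild(replacement, child)
    return doc.documentElement.toxml()


cleaned = join_cdata_with_commas(SOURCE)
print(cleaned)
```

The cleaned string can then be handed to the XSLT processor, which will see an ordinary comma-separated text node.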

By the time the XSLT processor sees the text, the CDATA is gone. You cannot see the incoming CDATA, and you have very little control over how output CDATA is generated: the cdata-section-elements attribute of xsl:output is all or nothing for a given element name.

Can't be done in standard XSLT.
The input XML you're receiving,
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
is indistinguishable (to XSLT) from
<media:keywords>keyword1,keyword2keyword3</media:keywords>
because the CDATA markup is just a way of escaping the data inside it. There is really no special markup to escape in this case, so the CDATA happens to be a no-op. But XSLT has no way of knowing what data was originally expressed using CDATA, what was expressed using character entities, etc.
The solution would be to tell whoever is providing this XML that they need to put a delimiter between keyword2 and keyword3.
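
The point that the two documents are indistinguishable can be checked with any parser whose tree model does not record CDATA boundaries. The sketch below uses Python's xml.etree.ElementTree as a stand-in for such a model; it is only an analogy and not the XSLT processor itself.

```python
import xml.etree.ElementTree as ET

NS_DECL = 'xmlns:media="http://search.yahoo.com/mrss/"'
with_cdata = (
    f"<media:keywords {NS_DECL}>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>"
)
without_cdata = (
    f"<media:keywords {NS_DECL}>keyword1,keyword2keyword3</media:keywords>"
)

# Both documents produce exactly the same text content once parsed.
text_a = ET.fromstring(with_cdata).text
text_b = ET.fromstring(without_cdata).text
print(text_a == text_b, text_a)
```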

Related

Is it possible to store the parser error message in a variable using XSLT 2.0 or XSLT 3.0?

I am transforming an XML file using XSLT, and I want to display the XSLT parser's error message in an element.
Note: the error message should be the parser's original message.
I am not sure there is a way to capture XML parsing errors for the primary input document of an apply-templates based XSLT 3 transformation. In general, though, XSLT 3 with xsl:try/xsl:catch lets you capture and handle run-time errors. So if you can organize the rest of your code (for instance by using a named template as the starting point) to load and parse any XML documents with the doc or document function, you can use try/catch to handle parsing errors. An example is https://xsltfiddle.liberty-development.net/ej9EGcg/2
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:err="http://www.w3.org/2005/xqt-errors"
exclude-result-prefixes="#all"
version="3.0">
<xsl:template match="/">
<root>
<xsl:try>
<xsl:variable name="doc1" select="doc('https://raw.githubusercontent.com/martin-honnen/martin-honnen.github.io/master/xslt/2019/test2019032601.xml')"/>
<xsl:value-of select="count($doc1//item)"/>
<xsl:catch>Error code: <xsl:value-of select="$err:code"/>
Reason: <xsl:value-of select="$err:description"/>
</xsl:catch>
</xsl:try>
</root>
</xsl:template>
</xsl:stylesheet>
which, depending on your needs, can also be reduced to directly use the relevant XPath expression with the select attribute of the xsl:try element e.g. https://xsltfiddle.liberty-development.net/ej9EGcg/3
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:err="http://www.w3.org/2005/xqt-errors"
exclude-result-prefixes="#all"
version="3.0">
<xsl:template match="/">
<root>
<xsl:try select="count(doc('https://raw.githubusercontent.com/martin-honnen/martin-honnen.github.io/master/xslt/2019/test2019032601.xml'))">
<xsl:catch>Error code: <xsl:value-of select="$err:code"/>
Reason: <xsl:value-of select="$err:description"/>
</xsl:catch>
</xsl:try>
</root>
</xsl:template>
</xsl:stylesheet>

How to convert soap response containing CDATA to new formatted xml using xslt?

I want to convert the XML below into a differently formatted XML document.
Input for the XSLT transformation:
<soap:Envelope
xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'>
<soap:Body>
<rejectQuoteXMLResponse
xmlns='http://xxx.group.com'>
<out>
<![CDATA[
<TFGCPLResultXMLDO>
<processInstanceName>reejectQuoteXML</processInstanceName>
<duration>0</duration>
<accumulatedNumberOfExceptions>0</accumulatedNumberOfExceptions>
<accumulatedNumberOfErrors>0</accumulatedNumberOfErrors>
<accumulatedNumberOfWarnings>0</accumulatedNumberOfWarnings>
<numOfExceptions>0</numOfExceptions>
<numOfErrors>0</numOfErrors>
<numOfWarnings>0</numOfWarnings>
<BusinessMessages>
<BusinessErrors/>
<BusinessWarnings/>
<BusinessGenericMessages/>
</BusinessMessages>
<requestedTransSuccessfulInd>true</requestedTransSuccessfulInd>
<modifySuccessfulInd>false</modifySuccessfulInd>
<copySuccessfulInd>false</copySuccessfulInd>
<responseXMLString>
<RejectPolicyRes>
<status>Success</status>
<message>Reject Policy successful</message>
</RejectPolicyRes>
</responseXMLString>
</TFGCPLResultXMLDO>
]]>
</out>
</rejectQuoteXMLResponse>
</soap:Body>
</soap:Envelope>
Required Output:
<QuoteRejectRs>
<UserId>344758</UserId>
<QuoteDetails>
<QuoteNumber>PA-Q450000</QuoteNumber>
<Status>Success</Status>
<Message>Reject Policy successful</Message>
</QuoteDetails>
</QuoteRejectRs>
XSLT Transformation code:
<!-- <xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:md1="http://http://xxx.group.com" xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="xs md1"
extension-element-prefixes="exsl"
version="2.0"> -->
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<!--XML Response for Quote Reject which tell request has been submitted with success or fail -->
<QuoteRejectRs>
<UserId></UserId>
<QuoteDetails>
<QuoteNumber></QuoteNumber>
<Status>
<xsl:value-of select="substring-before(substring-after(.,'Envelope/Body/TFGCPLResultXMLDO>'), '&lt;Envelope/Body/TFGCPLResultXMLDO/responseXMLString/RejectPolicyRes/status')"/>
</Status>
<Message>
<xsl:value-of select="substring-before(substring-after(.,'Envelope/Body/TFGCPLResultXMLDO>'), '&lt;Envelope/Body/TFGCPLResultXMLDO/responseXMLString/RejectPolicyRes/message')"/>
</Message>
<!-- <Status><xsl:value-of select="soap:Envelope/soap:Body/md1:rejectQuoteXMLResponse/md1:out/md1:status"/></Status><Message><xsl:value-of select="Envelope/Body/TFGCPLResultXMLDO/responseXMLString/RejectPolicyRes/message"/></Message>-->
</QuoteDetails>
</QuoteRejectRs>
</xsl:template>
</xsl:stylesheet>
I have tried this and can convert XML that contains namespaces (soap:xxx), but I am unable to convert XML that contains CDATA.
When I try, the response I get contains &lt; &gt; escapes.
Does anyone know a solution for this task?
You complain that you are
getting response containing &lt; &gt; format.
That's easy enough to fix. xsl:value-of has an attribute disable-output-escaping specifically for that purpose, so something like
<Status>
<xsl:value-of
select="substring-before(substring-after(.,'Envelope/Body/TFGCPLResultXMLDO>'), '&lt;Envelope/Body/TFGCPLResultXMLDO/responseXMLString/RejectPolicyRes/status')"
disable-output-escaping="yes"/>
</Status>
<Message>
<xsl:value-of
select="substring-before(substring-after(.,'Envelope/Body/TFGCPLResultXMLDO>'), '&lt;Envelope/Body/TFGCPLResultXMLDO/responseXMLString/RejectPolicyRes/message')"
disable-output-escaping="yes"/>
</Message>
ought to do the job you seem to be looking for.
HOWEVER, I urge you to take a different approach. Picking apart and parsing XML text via string functions is bad news. What you really should do is parse the embedded XML as an XML document, and transform that. Unfortunately, XSLT 1.0 and 2.0 do not have a standard mechanism for that, but some implementations add that as an extension, and depending on how you are performing the transformation, it may be comparatively easy to install your own extension function for the purpose. Alternatively, you could extract the embedded XML as a preliminary step, and then transform just that.
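
As a rough illustration of the "extract the embedded XML as a preliminary step" alternative, here is a sketch in Python using only xml.etree.ElementTree. The input is a shortened copy of the question's SOAP response, and the prefix "q" is an arbitrary choice. UserId and QuoteNumber are left out because their values do not appear anywhere in the input shown.

```python
import xml.etree.ElementTree as ET

SOAP_RESPONSE = """<soap:Envelope xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'>
  <soap:Body>
    <rejectQuoteXMLResponse xmlns='http://xxx.group.com'>
      <out><![CDATA[
        <TFGCPLResultXMLDO>
          <responseXMLString>
            <RejectPolicyRes>
              <status>Success</status>
              <message>Reject Policy successful</message>
            </RejectPolicyRes>
          </responseXMLString>
        </TFGCPLResultXMLDO>
      ]]></out>
    </rejectQuoteXMLResponse>
  </soap:Body>
</soap:Envelope>"""

NAMESPACES = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "q": "http://xxx.group.com",  # arbitrary prefix; only the URI has to match
}

# Step 1: parse the envelope and pull out the CDATA payload, which is plain text.
envelope = ET.fromstring(SOAP_RESPONSE)
out = envelope.find("soap:Body/q:rejectQuoteXMLResponse/q:out", NAMESPACES)

# Step 2: parse that text as an XML document in its own right.
inner = ET.fromstring(out.text.strip())
result = inner.find("responseXMLString/RejectPolicyRes")

# Step 3: build the reply from real elements instead of substring arithmetic.
reply = ET.Element("QuoteRejectRs")
details = ET.SubElement(reply, "QuoteDetails")
ET.SubElement(details, "Status").text = result.findtext("status")
ET.SubElement(details, "Message").text = result.findtext("message")
reply_xml = ET.tostring(reply, encoding="unicode")
print(reply_xml)
```

The same split works if step 3 is an XSLT transformation applied to the extracted inner document rather than Python code.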

XSLT transformation and CDATA

I have to transform my input XML using XSLT.
It contains CDATA; I need to extract the elements from the CDATA and then rename a tag.
Below is my input XML:
<getArtifactContentResponse>
<return>
<![CDATA[
<metadata>
<overview>
<name>scannapp</name>
<developerId>developer702</developerId>
<stateId>2</stateId>
<serverURL>dddd</serverURL>
<id>cspapp1103</id>
<description>scann doc</description>
<hostingTypeId>1</hostingTypeId>
</overview>
</metadata>
]]>
</return>
</getArtifactContentResponse>
And the expected output is :
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<Information>
<name>scannapp</name>
<developerId>developer702</developerId>
<stateId>2</stateId>
<serverURL>ddddd</serverURL>
<id>cspapp1103</id>
<description>scann doc</description>
<hostingTypeId>1</hostingTypeId>
</Information>
</metadata>
XSLT I am using is below :
<xsl:output method="xml" version="1.0" encoding="UTF-8" />
<xsl:template match="/">
<xsl:value-of select="//ns:getArtifactContentResponse/ns:return/text()" disable-output-escaping="yes"/>
</xsl:template>
<xsl:template match="overview">
<Information>
<xsl:apply-templates select="@* | node()" />
</Information>
</xsl:template>
With this I am able to extract the CDATA, but it does not rename the element 'overview' to 'Information'.
Transformed xml is below :
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<overview>
<name>scannapp</name>
<developerId>developer702</developerId>
<stateId>2</stateId>
<serverURL>dddddd</serverURL>
<id>cspapp1103</id>
<description>scann doc</description>
<hostingTypeId>1</hostingTypeId>
</overview>
</metadata>
Can someone tell me how I can rename the tag after extracting the CDATA?
I don't understand what I am missing here?
Thanks in Advance
There are no elements in your CDATA, there is only text. That's what CDATA means: "this stuff might look like markup, but I want it treated as text".
Turning text into elements is called parsing, so to extract the elements from the text in your CDATA you are going to have to parse it. There's no direct way to do this in XSLT until you get to XSLT 3.0 (which has a parse-xml() function). Some XSLT processors have an extension function to do it; in some (I believe) the exslt:node-set() function does this if you supply a string as input. With others, you can call out to your own Java or Javascript code to do the parsing. So it all becomes processor-dependent.
Another approach is to output the XML in your CDATA section using the disable-output-escaping trick, and then process it in a second transformation.
The best approach is to get rid of the CDATA tags before you start. They should never have been put there in the first place.
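
To make the "CDATA is only text until you parse it" point concrete, here is a small sketch in Python (standard library only) that parses the text content of return as XML and only then renames overview. The input is a trimmed copy of the question's document without the ns: namespace, which the question does not show.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<getArtifactContentResponse>
  <return><![CDATA[
    <metadata>
      <overview>
        <name>scannapp</name>
        <id>cspapp1103</id>
      </overview>
    </metadata>
  ]]></return>
</getArtifactContentResponse>"""

outer = ET.fromstring(RESPONSE)

# The outer parse sees no <metadata> element, only a string inside <return>.
assert outer.find("return/metadata") is None

# Parsing that string is what turns the text into elements.
metadata = ET.fromstring(outer.findtext("return").strip())
for element in metadata.iter("overview"):
    element.tag = "Information"

renamed = ET.tostring(metadata, encoding="unicode")
print(renamed)
```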

XSL Generating CSV

Trying to convert this:
<list>
<entry>
<parentFeed>
<feedUrl>http://rss.nzherald.co.nz/rss/xml/nzhrsscid_000000001.xml</feedUrl>
<id>68</id>
</parentFeed>
<content>Schools will have to put up with problematic pay administered through Novopay for another eight weeks after the Government announced it would persist with the unstable system.Minister responsible for Novopay, Steven Joyce, delayed...</content>
<link>http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&amp;objectid=10872300&amp;ref=rss</link>
<title>Novopay: Govt sticks with unstable system</title>
<id>55776</id>
<published class="sql-timestamp">2013-03-19 03:38:55.0</published>
<timestamp>2013-03-19 07:31:16.358 UTC</timestamp>
</entry>
</list>
into this, using XSLT:
Title, Link, Date
Novopay: Govt sticks with unstable system, http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10872300&ref=rss, 2013-03-19 03:38:55.0
But try as I might, I can't get rid of the blank line at the beginning of the document. My stylesheet follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:csv="csv:csv"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/list"> <xsl:for-each select="entry">
Title, Link, Date
<xsl:value-of select="title"/>, <xsl:value-of select="link"/>, <xsl:value-of select="published"/>
</xsl:for-each></xsl:template>
</xsl:stylesheet>
I've tried putting in <xsl:text>
</xsl:text> as suggested here which stripped the last linebreak, so I moved it to the top of the file, at which point it turned into a no-op. The solution here actually adds a blank line (which makes sense, as the hex code is for newline, according to the ascii manpage).
As a workaround, I've been using Java to generate the CSV output.
However, I do feel XSLT would be a lot faster as it is designed to transform XML to various other formats. A similar XSLT generates HTML, RSS, and ATOM feeds perfectly.
Your logic is spot on. What you need to keep in mind is that when you output text, any whitespace inside a text node that also contains other characters (such as the line break before "Title, Link, Date") is copied to the output, so your XSLT should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:csv="csv:csv"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="UTF-8"/>
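<!-- The header is written once, before the loop, so it is not repeated for every entry. -->
<xsl:template match="/list"><xsl:text>Title, Link, Date&#10;</xsl:text><xsl:for-each select="entry">
<xsl:value-of select="title"/>, <xsl:value-of select="link"/>, <xsl:value-of select="published"/>
<xsl:text>&#10;</xsl:text>
</xsl:for-each></xsl:template>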
</xsl:stylesheet>
Run the above XSLT and it will work perfectly.

Problem with XSLT getting values from tags with namespace prefixes

I have a specific problem getting values for width and height out of some XML that has namespace prefixes defined. I can get other values such as SomeText from RelatedMaterial quite easily using normal xpath with namespace "n:" but unable to get values for width and height.
Sample XML:
<Description>
<Information>
<GroupInformation xml:lang="en">
<BasicDescription>
<RelatedMaterial>
<SomeText>Hello</SomeText>
<t:ContentProperties>
<t:ContentAttributes>
<t:Width>555</t:Width>
<t:Height>444</t:Height>
</t:ContentAttributes>
</t:ContentProperties>
</RelatedMaterial>
</BasicDescription>
</GroupInformation>
</Information>
</Description>
Here is an extract from the XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:n="urn:t:myfoo:2010" xmlns:tva2="urn:t:myfoo:extended:2008"
<xsl:apply-templates select="n:Description/n:Information/n:GroupInformation"/>
<xsl:template match="n:GroupInformation">
<width>
<xsl:value-of select="n:BasicDescription/n:RelatedMaterial/t:ContentProperties/t:ContentAttributes/t:Width"/>
</width>
</xsl:template>
The above XSLT does not work for getting the width. Any ideas?
I'm not sure you have realised that both your input and your XSLT are invalid; it's always better to provide working examples.
Anyway, look at the XPath expression n:BasicDescription/n:RelatedMaterial/t:ContentProperties/t:ContentAttributes/t:Width. You're using the prefix n, which is mapped to urn:t:myfoo:2010, but your sample data declares no namespace for those elements. The same goes for the t prefix, which isn't declared at all in either the input data or the XSLT.
You need to declare the namespaces on "both sides", in the XML data and in the XSLT transformation, and they need to match: not the prefixes, but the URIs.
Somebody else could probably explain this better than me.
I've corrected your example and added a few things to make this work.
Input:
<?xml version="1.0" encoding="UTF-8"?>
<Description
xmlns="urn:t:myfoo:2010"
xmlns:t="something...">
<Information>
<GroupInformation xml:lang="en">
<BasicDescription>
<RelatedMaterial>
<SomeText>Hello</SomeText>
<t:ContentProperties>
<t:ContentAttributes>
<t:Width>555</t:Width>
<t:Height>444</t:Height>
</t:ContentAttributes>
</t:ContentProperties>
</RelatedMaterial>
</BasicDescription>
</GroupInformation>
</Information>
</Description>
XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
xmlns:n="urn:t:myfoo:2010"
xmlns:t="something...">
<xsl:template match="/">
<xsl:apply-templates select="n:Description/n:Information/n:GroupInformation"/>
</xsl:template>
<xsl:template match="n:GroupInformation">
<xsl:element name="width">
<xsl:value-of select="n:BasicDescription/n:RelatedMaterial/t:ContentProperties/t:ContentAttributes/t:Width"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Output:
<?xml version="1.0" encoding="UTF-8"?>
<width>555</width>
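
The rule that the URIs must match while the prefixes are free can also be seen outside XSLT. The sketch below uses Python's xml.etree.ElementTree with deliberately different prefixes (main, props) from those in the document. The URI urn:example:hypothetical-t is a made-up placeholder standing in for the elided "something..." value.

```python
import xml.etree.ElementTree as ET

DOC = """<Description xmlns="urn:t:myfoo:2010" xmlns:t="urn:example:hypothetical-t">
  <Information><GroupInformation><BasicDescription><RelatedMaterial>
    <t:ContentProperties><t:ContentAttributes>
      <t:Width>555</t:Width><t:Height>444</t:Height>
    </t:ContentAttributes></t:ContentProperties>
  </RelatedMaterial></BasicDescription></GroupInformation></Information>
</Description>"""

# Prefixes differ from the document's on purpose; the URIs are identical.
NS = {"main": "urn:t:myfoo:2010", "props": "urn:example:hypothetical-t"}

root = ET.fromstring(DOC)
path = (
    "main:Information/main:GroupInformation/main:BasicDescription/"
    "main:RelatedMaterial/props:ContentProperties/"
    "props:ContentAttributes/props:Width"
)
width = root.findtext(path, namespaces=NS)

# Without a prefix the step means "Information in no namespace", which matches nothing.
unqualified = root.find("Information")
print(width, unqualified)
```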