I have to transform my input XML using XSLT.
It contains CDATA, and I need to extract the elements from the CDATA and then rename one of the tags.
Below is my input XML:
<getArtifactContentResponse>
<return>
<![CDATA[
<metadata>
<overview>
<name>scannapp</name>
<developerId>developer702</developerId>
<stateId>2</stateId>
<serverURL>dddd</serverURL>
<id>cspapp1103</id>
<description>scann doc</description>
<hostingTypeId>1</hostingTypeId>
</overview>
</metadata>
]]>
</return>
</getArtifactContentResponse>
And the expected output is:
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<Information>
<name>scannapp</name>
<developerId>developer702</developerId>
<stateId>2</stateId>
<serverURL>dddd</serverURL>
<id>cspapp1103</id>
<description>scann doc</description>
<hostingTypeId>1</hostingTypeId>
</Information>
</metadata>
The XSLT I am using is below:
<xsl:output method="xml" version="1.0" encoding="UTF-8" />
<xsl:template match="/">
<xsl:value-of select="//ns:getArtifactContentResponse/ns:return/text()" disable-output-escaping="yes"/>
</xsl:template>
<xsl:template match="overview">
<Information>
<xsl:apply-templates select="@* | node()" />
</Information>
</xsl:template>
With this I am able to extract the CDATA, but it does not rename the element 'overview' to 'Information'.
The transformed XML is below:
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<overview>
<name>scannapp</name>
<developerId>developer702</developerId>
<stateId>2</stateId>
<serverURL>dddd</serverURL>
<id>cspapp1103</id>
<description>scann doc</description>
<hostingTypeId>1</hostingTypeId>
</overview>
</metadata>
Can someone tell me how I can rename the tag after extracting the CDATA?
I don't understand what I am missing here.
Thanks in advance.
There are no elements in your CDATA; there is only text. That's what CDATA means: "this stuff might look like markup, but I want it treated as text".
Turning text into elements is called parsing, so to extract the elements from the text in your CDATA you are going to have to parse it. There's no direct way to do this in XSLT until you get to XSLT 3.0 (which has a parse-xml() function). Some XSLT processors have an extension function to do it; in some (I believe) the exslt:node-set() function does this if you supply a string as input. With others, you can call out to your own Java or JavaScript code to do the parsing. So it all becomes processor-dependent.
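For illustration, here is a minimal sketch of the XSLT 3.0 parse-xml() route for the question above. It assumes a processor with XSLT 3.0 support (for example a recent Saxon) and an input with no namespace, exactly as shown in the question; the question's own stylesheet uses an ns: prefix, so if the real response is namespaced, that namespace would have to be declared and the first path qualified accordingly.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Nodes without a more specific template are copied unchanged -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- The text of <return> (the former CDATA section) is parsed into a
       separate document; its root element is then processed normally -->
  <xsl:template match="/">
    <xsl:apply-templates
        select="parse-xml(string(/getArtifactContentResponse/return))/*"/>
  </xsl:template>

  <!-- Now that overview is a real element, this rename actually fires -->
  <xsl:template match="overview">
    <Information>
      <xsl:apply-templates select="@* | node()"/>
    </Information>
  </xsl:template>
</xsl:stylesheet>
```

Selecting /* of the parsed document (rather than the document node itself) keeps the match="/" template from being applied a second time to the parsed tree.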
Another approach is to output the XML in your CDATA section using the disable-output-escaping trick, and then process it in a second transformation.
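In that two-pass approach, the first pass can be the question's own template (xsl:value-of with disable-output-escaping="yes"), serialized to a file so that the CDATA content becomes real markup. A minimal XSLT 1.0 sketch of the second pass, assuming that intermediate file is well-formed: it combines the standard identity template with the rename rule from the question, which is the piece that was never able to fire while overview was only text.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copies every attribute and node as-is
       unless a more specific template matches -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- More specific match for overview: same content, new name -->
  <xsl:template match="overview">
    <Information>
      <xsl:apply-templates select="@* | node()"/>
    </Information>
  </xsl:template>
</xsl:stylesheet>
```

The overview pattern has a higher default priority than node(), so it wins over the identity template for that one element.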
The best approach is to get rid of the CDATA tags before you start. They should never have been put there in the first place.
I want to convert the XML below into differently structured, formatted XML.
Input for the XSLT transformation:
<soap:Envelope
xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'>
<soap:Body>
<rejectQuoteXMLResponse
xmlns='http://xxx.group.com'>
<out>
<![CDATA[
<TFGCPLResultXMLDO>
<processInstanceName>reejectQuoteXML</processInstanceName>
<duration>0</duration>
<accumulatedNumberOfExceptions>0</accumulatedNumberOfExceptions>
<accumulatedNumberOfErrors>0</accumulatedNumberOfErrors>
<accumulatedNumberOfWarnings>0</accumulatedNumberOfWarnings>
<numOfExceptions>0</numOfExceptions>
<numOfErrors>0</numOfErrors>
<numOfWarnings>0</numOfWarnings>
<BusinessMessages>
<BusinessErrors/>
<BusinessWarnings/>
<BusinessGenericMessages/>
</BusinessMessages>
<requestedTransSuccessfulInd>true</requestedTransSuccessfulInd>
<modifySuccessfulInd>false</modifySuccessfulInd>
<copySuccessfulInd>false</copySuccessfulInd>
<responseXMLString>
<RejectPolicyRes>
<status>Success</status>
<message>Reject Policy successful</message>
</RejectPolicyRes>
</responseXMLString>
</TFGCPLResultXMLDO>
]]>
</out>
</rejectQuoteXMLResponse>
</soap:Body>
</soap:Envelope>
Required Output:
<QuoteRejectRs>
<UserId>344758</UserId>
<QuoteDetails>
<QuoteNumber>PA-Q450000</QuoteNumber>
<Status>Success</Status>
<Message>Reject Policy successful</Message>
</QuoteDetails>
</QuoteRejectRs>
XSLT Transformation code:
<!-- <xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:md1="http://xxx.group.com" xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="xs md1"
extension-element-prefixes="exsl"
version="2.0"> -->
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<!--XML Response for Quote Reject which tell request has been submitted with success or fail -->
<QuoteRejectRs>
<UserId></UserId>
<QuoteDetails>
<QuoteNumber></QuoteNumber>
<Status>
<xsl:value-of select="substring-before(substring-after(.,'Envelope/Body/TFGCPLResultXMLDO>'), '&lt;Envelope/Body/TFGCPLResultXMLDO/responseXMLString/RejectPolicyRes/status')"/>
</Status>
<Message>
<xsl:value-of select="substring-before(substring-after(.,'Envelope/Body/TFGCPLResultXMLDO>'), '&lt;Envelope/Body/TFGCPLResultXMLDO/responseXMLString/RejectPolicyRes/message')"/>
</Message>
<!-- <Status><xsl:value-of select="soap:Envelope/soap:Body/md1:rejectQuoteXMLResponse/md1:out/md1:status"/></Status><Message><xsl:value-of select="Envelope/Body/TFGCPLResultXMLDO/responseXMLString/RejectPolicyRes/message"/></Message>-->
</QuoteDetails>
</QuoteRejectRs>
</xsl:template>
</xsl:stylesheet>
I have managed to convert XML that contains namespaces (soap:xxx), but not XML that contains CDATA.
I have tried, but I am getting a response containing &lt; &gt; format.
Does anyone know a solution for this kind of task?
You complain that you are
getting response containing &lt; &gt; format.
That's easy enough to fix. xsl:value-of has an attribute disable-output-escaping specifically for that purpose, so something like
<Status>
<xsl:value-of
select="substring-before(substring-after(.,'Envelope/Body/TFGCPLResultXMLDO>'), '&lt;Envelope/Body/TFGCPLResultXMLDO/responseXMLString/RejectPolicyRes/status')"
disable-output-escaping="yes"/>
</Status>
<Message>
<xsl:value-of
select="substring-before(substring-after(.,'Envelope/Body/TFGCPLResultXMLDO>'), '&lt;Envelope/Body/TFGCPLResultXMLDO/responseXMLString/RejectPolicyRes/message')"
disable-output-escaping="yes"/>
</Message>
ought to do the job you seem to be looking for.
HOWEVER, I urge you to take a different approach. Picking apart and parsing XML text via string functions is bad news. What you really should do is parse the embedded XML as an XML document, and transform that. Unfortunately, XSLT 1.0 and 2.0 do not have a standard mechanism for that (XSLT 3.0 adds the parse-xml() function), but some implementations add one as an extension, and depending on how you are performing the transformation, it may be comparatively easy to install your own extension function for the purpose. Alternatively, you could extract the embedded XML as a preliminary step, and then transform just that.
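Where an XSLT 3.0 processor is available, the standard parse-xml() function makes the parse-it-properly approach straightforward. A minimal sketch, assuming the namespaces exactly as in the question's input (md1 bound to http://xxx.group.com, the default namespace of rejectQuoteXMLResponse and out) and that the CDATA content itself is in no namespace. UserId and QuoteNumber do not occur anywhere in the input, so they are left empty here, as in the question's own stylesheet.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:md1="http://xxx.group.com"
    exclude-result-prefixes="soap md1">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Parse the text of md1:out (the CDATA content) into a document -->
    <xsl:variable name="embedded" select="parse-xml(string(
        soap:Envelope/soap:Body/md1:rejectQuoteXMLResponse/md1:out))"/>
    <!-- From here on, ordinary XPath works on the parsed document -->
    <xsl:variable name="res"
        select="$embedded/TFGCPLResultXMLDO/responseXMLString/RejectPolicyRes"/>
    <QuoteRejectRs>
      <!-- Not present in the input: left empty -->
      <UserId/>
      <QuoteDetails>
        <QuoteNumber/>
        <Status><xsl:value-of select="$res/status"/></Status>
        <Message><xsl:value-of select="$res/message"/></Message>
      </QuoteDetails>
    </QuoteRejectRs>
  </xsl:template>
</xsl:stylesheet>
```

No string slicing or disable-output-escaping is involved, so the result does not depend on how the embedded XML happens to be formatted.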
I'm trying to find all text that is not within the XML markup:
<transcript>
<text start="9.75" dur="5.94">welcome to about my property here you
can learn more about how your property</text>
<text start="15.69" dur="4.71">was assessed see the information impact
has on file and compare your property to</text>
<text start="20.4" dur="1.3">others in your neighborhood</text>
<text start="21.7" dur="5.32">interested in learning about market
trends in your municipality no problem</text>
<text start="105.79" dur="6.23">I have all of this and more about life property
. see your property assessment know more</text>
<text start="112.02" dur="0.11">about</text>
</transcript>
I am using the following regex pattern, but obviously it is not correct because it grabs all of the text between the opening and closing <transcript> tags:
<transcript>[\s\S]*?<\/transcript>
How can I modify this regex pattern to select only the text that is not within any of the markup tags?
Use XSLT. XSLT is a language specifically designed to convert XML into another output format (back to valid XML again, or something else such as (X)HTML, plain text, or any other format – but preferably, based on plain text).
In this case the smallest XSLT necessary is just this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0" >
<xsl:output method="text" indent="no" />
<xsl:template match="text">
<!-- do NOTHING here! -->
</xsl:template>
</xsl:stylesheet>
This works because the built-in template rule for the root node and for elements recursively applies templates to their children, and the built-in rule for text nodes copies them to the output. The only element inside your <transcript> is <text>, and you process it by doing 'nothing', i.e., by not copying its contents to the output. The line inside that template is just a comment.
All other nodes, in XML terminology, are the text between those elements, which has no surrounding <text> tag and so is copied to the output.
Alternatively, if you have more types of elements than just <text> and you want to skip all of them, add templates matching / and transcript that apply templates to their children, and another matching * (which will select all remaining elements not matched elsewhere) that does nothing:
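The built-in template rules this relies on are defined in the XSLT specification; written out explicitly, they behave like the stylesheet below. It is shown only to make the default behaviour visible; you never need to include these rules yourself.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Root and elements: output nothing themselves, just process the children -->
  <xsl:template match="* | /">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text nodes and attributes: output their string value
       (attributes only when templates are explicitly applied to them) -->
  <xsl:template match="text() | @*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: output nothing -->
  <xsl:template match="processing-instruction() | comment()"/>
</xsl:stylesheet>
```

An empty template for text, as in the answer, simply overrides the first rule for those elements, so their text children are never reached.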
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0" >
<xsl:output method="text" indent="no" />
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="transcript">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="*">
<!-- do NOTHING here! -->
</xsl:template>
</xsl:stylesheet>
Again, the plain untagged text falls through to the built-in template rule for text nodes, so it is copied to the output.
Both XSLT stylesheets will output only "I ha", the only part of your sample text that is not surrounded by tags.
Do you want to find
welcome to about my property here you can learn more about how your property
from
<text start="9.75" dur="5.94">welcome to about my property here you can learn more about how your property</text>
?
Then this will work:
(?<=>).+?(?=<)
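If what you want is indeed the text inside each <text> element, the same thing can be done with XSLT instead of a regex. This also copes with the line breaks inside the elements: in most regex flavours . does not match a newline by default, which is why the text element has to be on a single line for the pattern above to match it. A minimal sketch that writes one caption per line:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- normalize-space() folds the internal line breaks into single
         spaces and trims leading/trailing whitespace -->
    <xsl:for-each select="transcript/text">
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```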
I have this XML:
<?xml version="1.0" encoding="UTF-8"?>
<icestats>
<stats_connections>0</stats_connections>
<source mount="/live">
<bitrate>Some data</bitrate>
<server_description>This is what I want to return</server_description>
</source>
</icestats>
And I have this XSL:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:copy-of select="/icestats/source mount="/live"/server_description/node()" />
</xsl:template>
</xsl:stylesheet>
I want the output:
This is what I want to return
If I remove the double quotes, the space, and the forward slash from the source it works, but I haven't yet been able to escape these special characters successfully using the methods suggested in other posts.
For clarity, below is the solution, thanks to Lego Stormtroopr:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:copy-of select="/icestats/source[@mount='/live']/server_description/node()" />
</xsl:template>
</xsl:stylesheet>
There are a couple of issues you will need to resolve before your processor will produce the output you're looking for.
1) Your XML input must be made well-formed. The closing tag of the source element should not include the mount attribute that is specified on the opening tag.
<source mount="/live">
...
</source>
2) The XPath on your xsl:copy-of element must be made valid. The syntax for an XPath expression is (fortunately) not like the syntax for XML elements and attributes. Specifying which source element to match is done by predicating on an attribute value, like you have done, except that you need to use square brackets:
/icestats/source[@mount="/live"]/server_description
In order to use this XPath expression in an XSLT select statement, you will need to make sure that you enclose the entire select attribute value with one type of quotes, and use the other type of quotes within the attribute value, e.g.:
<xsl:value-of select="/icestats/source[@mount='/live']/server_description" />
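If you would rather keep double quotes inside the XPath, they can also be written as the &quot; entity. The XML parser turns each &quot; back into a literal quote before the XPath is evaluated, so the processor sees [@mount="/live"]. A minimal fragment equivalent to the one above:

```xml
<xsl:value-of select="/icestats/source[@mount=&quot;/live&quot;]/server_description" />
```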
With This input
<?xml version="1.0" encoding="UTF-8"?>
<icestats>
<stats_connections>0</stats_connections>
<source mount="/live">
<bitrate>Some data</bitrate>
<server_description>This is what I want to return</server_description>
</source>
</icestats>
and this stylesheet
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="/icestats/source[@mount='/live']/server_description" />
</xsl:template>
</xsl:stylesheet>
I get the following line of text from xsltproc and saxon:
This is what I want to return
The xsl:value-of element outputs the string value of the selected element (here, its single text node). If you actually want the server_description element itself, you can use xsl:copy-of to get the whole thing, tags and all. (You would have to update xsl:output as well.)
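A sketch of that copy-of variant, using the same input. The output method goes back to xml, and (an arbitrary choice here) the XML declaration is omitted:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <!-- copy-of copies the element node itself: tags, attributes and content -->
    <xsl:copy-of select="/icestats/source[@mount='/live']/server_description" />
  </xsl:template>
</xsl:stylesheet>
```

With the input above, this should produce <server_description>This is what I want to return</server_description>.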
It looks like you are doing a select based on the attribute, so you just need to properly capture the attribute in the XPath. The quotes you use in the document and the XPath don't need to match, so you can switch them to single quotes ('):
<xsl:copy-of select="/icestats/source[@mount='/live']/server_description/node()" />
(Edited to correct the missing / from the mount attribute.)
Also, your original document isn't well-formed XML, as XML doesn't allow attributes in the closing tag.
I think all you need to do is escape the quotes in the attribute string with &quot;:
<xsl:copy-of select="/icestats/source mount=&quot;/live&quot;/server_description/node()" />
I need to parse the following node:
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
into a valid string, preferably "keyword1,keyword2,keyword3", but I would settle for removing the CDATA completely.
Trying to access the node gives me the text "keyword1,keyword2keyword3" and I can't tell where the CDATA begins.
Original XML (simplified version of an mRSS feed):
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
<channel>
<item>
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
</item>
</channel>
</rss>
XSL (simplified):
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:media="http://search.yahoo.com/mrss/" exclude-result-prefixes="xs xsi fn">
<xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>
<xsl:template match="/">
<test>
<xsl:variable name="items" select="/rss/channel/item"/>
<xsl:for-each select="$items">
<xsl:variable name="mediakw" select="media:keywords"/>
<xsl:element name="mediaKeyWords">
<xsl:value-of select="$mediakw"/>
</xsl:element>
</xsl:for-each>
</test>
</xsl:template>
</xsl:stylesheet>
And the output:
<test xmlns:media="http://search.yahoo.com/mrss/"><mediaKeyWords>keyword1,keyword2keyword3</mediaKeyWords></test>
Thanks a lot!
XML and XSLT cannot help you here.
XSLT uses the Infoset model, in which there is no such thing as a "CDATA node"; there is just a single text() node:
"keyword1,keyword2keyword3"
The XML document needs to be corrected by inserting a comma between the substrings "keyword2" and "keyword3".
One solution would be to handle the CDATA section with a DOM API (a DOM parser can expose it as a separate CDATASection node), and only then start the XSLT transformation.
By the time the XSLT processor sees the text, the CDATA is gone. You cannot see the incoming CDATA, and have very little control over how output CDATA is generated (all or nothing for a given tag).
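That output-side control is the cdata-section-elements attribute of xsl:output, which writes the text content of the listed elements as CDATA sections. A sketch adapted from the question's stylesheet; it also adds media to exclude-result-prefixes, which removes the stray xmlns:media declaration seen in the question's output. It still cannot recover the missing separator: the result should be a single CDATA section around keyword1,keyword2keyword3.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:media="http://search.yahoo.com/mrss/"
    exclude-result-prefixes="media">
  <!-- All text content of every mediaKeyWords element is written as CDATA -->
  <xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"
              cdata-section-elements="mediaKeyWords"/>
  <xsl:template match="/">
    <test>
      <xsl:for-each select="/rss/channel/item">
        <mediaKeyWords>
          <xsl:value-of select="media:keywords"/>
        </mediaKeyWords>
      </xsl:for-each>
    </test>
  </xsl:template>
</xsl:stylesheet>
```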
Can't be done in standard XSLT.
The input XML you're receiving,
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
is indistinguishable (to XSLT) from
<media:keywords>keyword1,keyword2keyword3</media:keywords>
because the CDATA markup is just a way of escaping the data inside it. There is really no special markup to escape in this case, so the CDATA happens to be a no-op. But XSLT has no way of knowing what data was originally expressed using CDATA, what was expressed using character entities, etc.
The solution would be to tell whoever is providing this XML that they need to put a delimiter between keyword2 and keyword3.
I have a specific problem getting the values for width and height out of some XML that has namespace prefixes defined. I can get other values, such as SomeText from RelatedMaterial, quite easily using normal XPath with the namespace prefix "n:", but I am unable to get the values for width and height.
Sample XML:
<Description>
<Information>
<GroupInformation xml:lang="en">
<BasicDescription>
<RelatedMaterial>
<SomeText>Hello</SomeText>
<t:ContentProperties>
<t:ContentAttributes>
<t:Width>555</t:Width>
<t:Height>444</t:Height>
</t:ContentAttributes>
</t:ContentProperties>
</RelatedMaterial>
</BasicDescription>
</GroupInformation>
</Information>
</Description>
Here is an extract from the XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:n="urn:t:myfoo:2010" xmlns:tva2="urn:t:myfoo:extended:2008"
<xsl:apply-templates select="n:Description/n:Information/n:GroupInformation"/>
<xsl:template match="n:GroupInformation">
<width>
<xsl:value-of select="n:BasicDescription/n:RelatedMaterial/t:ContentProperties/t:ContentAttributes/t:Width"/>
</width>
</xsl:template>
The above XSLT does not work for getting the width. Any ideas?
I'm not sure you have realised that both your input and your XSLT are invalid; it's always better to provide working examples.
Anyway, if we look at the XPath expression n:BasicDescription/n:RelatedMaterial/t:ContentProperties/t:ContentAttributes/t:Width, you're using a prefix n which is mapped to urn:t:myfoo:2010, when the data is in fact in no namespace at all (there is no default namespace declaration). The same goes for the t prefix, which isn't declared in either the input data or the XSLT.
You need to declare the namespaces on both sides, in the XML data and in the XSLT transformation, and they need to be the same: not the prefixes, but the URIs.
Somebody else could probably explain this better than me.
I've corrected your example and added a few things to make this work.
Input:
<?xml version="1.0" encoding="UTF-8"?>
<Description
xmlns="urn:t:myfoo:2010"
xmlns:t="something...">
<Information>
<GroupInformation xml:lang="en">
<BasicDescription>
<RelatedMaterial>
<SomeText>Hello</SomeText>
<t:ContentProperties>
<t:ContentAttributes>
<t:Width>555</t:Width>
<t:Height>444</t:Height>
</t:ContentAttributes>
</t:ContentProperties>
</RelatedMaterial>
</BasicDescription>
</GroupInformation>
</Information>
</Description>
XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
xmlns:n="urn:t:myfoo:2010"
xmlns:t="something...">
<xsl:template match="/">
<xsl:apply-templates select="n:Description/n:Information/n:GroupInformation"/>
</xsl:template>
<xsl:template match="n:GroupInformation">
<xsl:element name="width">
<xsl:value-of select="n:BasicDescription/n:RelatedMaterial/t:ContentProperties/t:ContentAttributes/t:Width"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Output:
<?xml version="1.0" encoding="UTF-8"?>
<width>555</width>
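Because only the URIs have to match, the prefixes in the stylesheet are free to differ from those in the document. A sketch that reads both width and height from the corrected input above, using the arbitrary prefixes a and b (hypothetical names chosen for illustration); the placeholder URI something... is carried over from the corrected input and stands in for whatever the real t namespace is.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="urn:t:myfoo:2010"
    xmlns:b="something..."
    exclude-result-prefixes="a b">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <!-- a and b are arbitrary prefixes: only their URIs must match the input -->
    <xsl:for-each select="a:Description/a:Information/a:GroupInformation/a:BasicDescription/a:RelatedMaterial/b:ContentProperties/b:ContentAttributes">
      <size>
        <width><xsl:value-of select="b:Width"/></width>
        <height><xsl:value-of select="b:Height"/></height>
      </size>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the corrected input, this should produce a single size element containing width 555 and height 444.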