Escaping Double Quotes, Space and Allowing for an Extra Forward Slash - xslt

I have XML
<?xml version="1.0" encoding="UTF-8"?>
<icestats>
<stats_connections>0</stats_connections>
<source mount="/live">
<bitrate>Some data</bitrate>
<server_description>This is what I want to return</server_description>
</source>
</icestats>
And I have XSL
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:copy-of select="/icestats/source mount="/live"/server_description/node()" />
</xsl:template>
</xsl:stylesheet>
I want the output
This is what I want to return
If I remove the double quotes, space and forward slash from the source it works, but I haven't been able to successfully escape the non standard characters yet using suggested methods in other posts.
For clarity, below is the solution thanks to Lego Stormtroopr
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<xsl:copy-of select="/icestats/source[@mount='/live']/server_description/node()" />
</xsl:template>
</xsl:stylesheet>

There are a couple of issues you will need to resolve before your processor will produce the output you're looking for.
1) Your XML input must be made well-formed. The closing tag of the source element should not include the mount attribute that is specified on the opening tag.
<source mount="/live">
...
</source>
2) The XPath on your xsl:copy-of element must be made valid. The syntax for an XPath expression is (fortunately) not like the syntax for XML elements and attributes. Specifying which source element to match is done by predicating on an attribute value, like you have done, except that you need to use square brackets:
/icestats/source[@mount="/live"]/server_description
In order to use this XPath expression in an XSLT select statement, you will need to make sure that you enclose the entire select attribute value with one type of quotes, and use the other type of quotes within the attribute value, e.g.:
<xsl:value-of select="/icestats/source[@mount='/live']/server_description" />
With this input
<?xml version="1.0" encoding="UTF-8"?>
<icestats>
<stats_connections>0</stats_connections>
<source mount="/live">
<bitrate>Some data</bitrate>
<server_description>This is what I want to return</server_description>
</source>
</icestats>
and this stylesheet
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="/icestats/source[@mount='/live']/server_description" />
</xsl:template>
</xsl:stylesheet>
I get the following line of text from xsltproc and saxon:
This is what I want to return
The xsl:value-of element will return the string value of an element (here, that one text node). If you actually wanted the server_description element, then you can use xsl:copy-of to get the whole thing, tags and all. (You would have to update xsl:output as well.)
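As a minimal sketch of the xsl:copy-of variant mentioned above (assuming the same input document; omit-xml-declaration is only a presentation choice here, not something the original post specifies):

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- XML output, because an element is copied rather than a string -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <!-- Copies the whole server_description element, tags included -->
    <xsl:copy-of select="/icestats/source[@mount='/live']/server_description"/>
  </xsl:template>
</xsl:stylesheet>
```

With the input shown earlier, this should emit the server_description element itself instead of only its text.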

It looks like you are doing a select based on the attribute, so you just need to properly capture the attribute in the XPath. The quotes you use in the document and the XPath don't need to match, so you can switch them to single quotes ('):
<xsl:copy-of select="/icestats/source[@mount='/live']/server_description/node()" />
(Edited to correct the missing / from the mount attribute.)
Also, your original document isn't valid XML, as XML doesn't allow attributes in the closing tag.

I think all you need to do is escape the quotes in the attribute string with &quot;:
<xsl:copy-of select="/icestats/source mount=&quot;/live&quot;/server_description/node()" />

Related

XSLT transformation and CDATA

I have to transform my input XML using XSLT.
It contains CDATA, and I need to extract the elements from the CDATA and then rename one of the tags.
Below is my input xml :
<getArtifactContentResponse>
<return>
<![CDATA[
<metadata>
<overview>
<name>scannapp</name>
<developerId>developer702</developerId>
<stateId>2</stateId>
<serverURL>dddd</serverURL>
<id>cspapp1103</id>
<description>scann doc</description>
<hostingTypeId>1</hostingTypeId>
</overview>
</metadata>
]]>
</return>
</getArtifactContentResponse>
And the expected output is :
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<Information>
<name>scannapp</name>
<developerId>developer702</developerId>
<stateId>2</stateId>
<serverURL>ddddd</serverURL>
<id>cspapp1103</id>
<description>scann doc</description>
<hostingTypeId>1</hostingTypeId>
</Information>
</metadata>
XSLT I am using is below :
<xsl:output method="xml" version="1.0" encoding="UTF-8" />
<xsl:template match="/">
<xsl:value-of select="//ns:getArtifactContentResponse/ns:return/text()" disable-output-escaping="yes"/>
</xsl:template>
<xsl:template match="overview">
<Information>
<xsl:apply-templates select="@* | node()" />
</Information>
</xsl:template>
With this I am able to extract the CDATA, but it is not renaming the element 'overview' to 'Information'.
Transformed xml is below :
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<overview>
<name>scannapp</name>
<developerId>developer702</developerId>
<stateId>2</stateId>
<serverURL>dddddd</serverURL>
<id>cspapp1103</id>
<description>scann doc</description>
<hostingTypeId>1</hostingTypeId>
</overview>
</metadata>
Can someone tell me how I can rename the tag after extracting the CDATA?
I don't understand what I am missing here?
Thanks in Advance
There are no elements in your CDATA, there is only text. That's what CDATA means: "this stuff might look like markup, but I want it treated as text".
Turning text into elements is called parsing, so to extract the elements from the text in your CDATA you are going to have to parse it. There's no direct way to do this in XSLT until you get to XSLT 3.0 (which has a parse-xml() function). Some XSLT processors have an extension function to do it; in some (I believe) the exslt:node-set() function does this if you supply a string as input. With others, you can call out to your own Java or Javascript code to do the parsing. So it all becomes processor-dependent.
Another approach is to output the XML in your CDATA section using the disable-output-escaping trick, and then process it in a second transformation.
The best approach is to get rid of the CDATA tags before you start. They should never have been put there in the first place.
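For the XSLT 3.0 route mentioned above, a minimal sketch might look like the following. It assumes an XSLT 3.0 processor, that the input elements are in no namespace (as in the sample input, unlike the ns: prefix used in the question's stylesheet), and that the CDATA text is itself well-formed XML:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Copy every node that no explicit template matches -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Replace the wrapper with the parsed content of the CDATA text -->
  <xsl:template match="/getArtifactContentResponse">
    <!-- parse-xml() turns the string into a document node; /* selects its root element -->
    <xsl:apply-templates select="parse-xml(string(child::return))/*"/>
  </xsl:template>

  <!-- Rename overview to Information, keeping its attributes and children -->
  <xsl:template match="overview">
    <Information>
      <xsl:apply-templates select="@* | node()"/>
    </Information>
  </xsl:template>
</xsl:stylesheet>
```

Because the text is parsed into real nodes before templates are applied, the overview template now has an element to match, which is what the disable-output-escaping version lacks.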

Regex of XML with multiple tags

I'm trying to find all text that is not within the XML markup:
<transcript>
<text start="9.75" dur="5.94">welcome to about my property here you
can learn more about how your property</text>
<text start="15.69" dur="4.71">was assessed see the information impact
has on file and compare your property to</text>
<text start="20.4" dur="1.3">others in your neighborhood</text>
<text start="21.7" dur="5.32">interested in learning about market
trends in your municipality no problem</text>
<text start="105.79" dur="6.23">I have all of this and more about life property
. see your property assessment know more</text>
<text start="112.02" dur="0.11">about</text>
</transcript>
I am using the following regex pattern, but obviously it is not correct because it grabs all of the text between the opening and closing <transcript> tags:
<transcript>[\s\S]*?<\/transcript>
How can I modify this regex pattern to select only the text that is not within any of the markup tags?
Use XSLT. XSLT is a language specifically designed to convert XML into another output format (back to valid XML again, or something else such as (X)HTML, plain text, or any other format – but preferably, based on plain text).
In this case the smallest XSLT necessary is just this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0" >
<xsl:output method="text" indent="no" />
<xsl:template match="text">
<!-- do NOTHING here! -->
</xsl:template>
</xsl:stylesheet>
This works because the built-in template rules recursively apply templates to the children of each element, and plain text is always copied. The only tag inside your <transcript> is <text>, and you process it by doing 'nothing', i.e., by not copying its contents to the output. The line inside that template is just a comment.
All other text nodes, that is, text without a surrounding <text> tag, are copied to the output.
Alternatively, if you have more types of tags than just <text> elements and you want to skip all of them, apply templates to / and transcript to process each and apply another to * (which will select all remaining tags not specified elsewhere) to not process them:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0" >
<xsl:output method="text" indent="no" />
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="transcript">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="*">
<!-- do NOTHING here! -->
</xsl:template>
</xsl:stylesheet>
Again, the plain untagged text will fall through and not get processed, so their contents will be copied to output.
Both XSLT stylesheets will output only I ha, the only part in your sample text that is not surrounded by tags.
Do you want to find
welcome to about my property here you can learn more about how your property
from
<text start="9.75" dur="5.94">welcome to about my property here you can learn more about how your property</text>
?
If so, this will work:
(?<=>).+?(?=<)
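If the goal is indeed the text inside each <text> element, the same result can be sketched in XSLT instead of a regex. This is an illustration under that assumption, not necessarily what the asker intended; it prints one line per <text> element and collapses the line breaks inside each one:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/transcript">
    <xsl:for-each select="text">
      <!-- normalize-space() trims and collapses internal whitespace -->
      <xsl:value-of select="normalize-space(.)"/>
      <!-- Line feed after each caption -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the regex, this does not depend on how the markup is laid out, and attribute values are never picked up by accident.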

XSL default namespace with lookup file

I have an input XML with a default namespace, e.g. as below.
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="aaa">
<subroot>
<country>aaa</country>
<country>bbb</country>
<country>ccc</country>
</subroot>
</root>
While transforming I use xpath-default-namespace="aaa", because otherwise the XPaths will not match. I also have to read a lookup XML using the XSLT key() function, e.g. as below:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xpath-default-namespace="aaa" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:variable name="LookupDoc" select="document('lookup.xml')" />
<xsl:key name="ObjectType-lookup" match="lookup" use="@att1" />
<xsl:template match="//country">
<countrynew>
<xsl:apply-templates select="$LookupDoc/*">
<xsl:with-param name="curr-code" select="string(.)" />
</xsl:apply-templates>
</countrynew>
</xsl:template>
<xsl:template match='lookups'>
<xsl:param name="curr-code" />
<xsl:value-of select="key( 'ObjectType-lookup' , normalize-space($curr-code))/@att2" />
</xsl:template>
</xsl:stylesheet>
With the default namespace on the stylesheet element, the XPath "//country" works fine. The problem arises when I read the lookup XML, which doesn't have any namespace, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<lookups>
<lookup att1="aaa" att2="zzz"/>
<lookup att1="bbb" att2="yyy"/>
<lookup att1="ccc" att2="xxx"/>
</lookups>
Is there any way that I can specify in the template matching "lookups" to ignore xpath-default-namespace, or to match any namespace including no namespace?
Thank you
Is there any way that I can specify in the template matching "lookups" to ignore xpath-default-namespace, or to match any namespace including no namespace?
You can specify xpath-default-namespace anywhere in the stylesheet: an XPath expression will look up the tree and use the "nearest ancestor" value.
For any element in the stylesheet, this attribute has an effective value, which is the value of the [xsl:]xpath-default-namespace on that element or on the innermost containing element that specifies such an attribute
(From the XSLT 2.0 spec)
So you could say
<xsl:template match='lookups' xpath-default-namespace=''>
to override the default namespace specified on the xsl:stylesheet element. You can even specify it on a literal result element in the stylesheet, as xsl:xpath-default-namespace:
<something xsl:xpath-default-namespace="bbb" attr="{foo}" />
This would create a <something attr="xxx" /> where xxx is the value of the {bbb}foo child element of the current context node.
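Applied to the stylesheet in the question, the override could be sketched as follows. It assumes that lookup.xml has an un-prefixed <lookups> root in no namespace, and that the xsl:key declaration needs the same override, since its match pattern is also resolved against the default XPath namespace:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xpath-default-namespace="aaa"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:variable name="LookupDoc" select="document('lookup.xml')"/>

  <!-- lookup elements are in no namespace, so reset the default here -->
  <xsl:key name="ObjectType-lookup" match="lookup" use="@att1"
      xpath-default-namespace=""/>

  <!-- country is resolved in namespace "aaa" via the stylesheet-level default -->
  <xsl:template match="country">
    <countrynew>
      <xsl:apply-templates select="$LookupDoc/*">
        <xsl:with-param name="curr-code" select="string(.)"/>
      </xsl:apply-templates>
    </countrynew>
  </xsl:template>

  <!-- Everything inside this template uses no default namespace -->
  <xsl:template match="lookups" xpath-default-namespace="">
    <xsl:param name="curr-code"/>
    <xsl:value-of select="key('ObjectType-lookup', normalize-space($curr-code))/@att2"/>
  </xsl:template>
</xsl:stylesheet>
```

Attributes such as att1 and att2 are never affected by xpath-default-namespace, so only the element names need the override.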
I did solve the problem, but I am sure there are other ways. What I did was move the template matching lookup and the xsl:key declaration to a different stylesheet document, and on its xsl:stylesheet element I put xpath-default-namespace="". So for those XPath matches, XSLT uses no default namespace.
Still, I'm curious whether there is a way to specify in the template itself that no namespace should be used while matching.

XSLT 2.0 processing invalid node mixing text and cdata

I need to parse the following node:
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
into a valid string, preferably "keyword1,keyword2,keyword3" but I would settle for removing the cdata completely.
Trying to access the node gives me the text "keyword1,keyword2keyword3" and I can't tell where the CDATA begins.
original xml (simplified version of mRSS feed)
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
<channel>
<item>
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
</item>
</channel>
</rss>
xsl (simplified):
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:media="http://search.yahoo.com/mrss/" exclude-result-prefixes="xs xsi fn">
<xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>
<xsl:template match="/">
<test>
<xsl:variable name="items" select="/rss/channel/item"/>
<xsl:for-each select="$items">
<xsl:variable name="mediakw" select="media:keywords"/>
<xsl:element name="mediaKeyWords">
<xsl:value-of select="$mediakw"/>
</xsl:element>
</xsl:for-each>
</test>
</xsl:template>
</xsl:stylesheet>
and the output:
<test xmlns:media="http://search.yahoo.com/mrss/"><mediaKeyWords>keyword1,keyword2keyword3</mediaKeyWords></test>
Thanks a lot!
XML and XSLT cannot help you here.
XSLT uses the INFOSET model, in which there is no such thing as a "CDATA node"; there is just a single text() node:
"keyword1,keyword2keyword3"
The XML document needs to be corrected and a comma be inserted between the substrings "keyword2" and "keyword3"
One solution would be to process the CDATA DOM node using DOM, and only then initiate the XSLT transformation.
By the time the XSLT processor sees the text, the CDATA is gone. You cannot see the incoming CDATA, and have very little control over how output CDATA is generated (all or nothing for a given tag).
Can't be done in standard XSLT.
The input XML you're receiving,
<media:keywords>keyword1,keyword2<![CDATA[keyword3]]></media:keywords>
is indistinguishable (to XSLT) from
<media:keywords>keyword1,keyword2keyword3</media:keywords>
because the CDATA markup is just a way of escaping the data inside it. There is really no special markup to escape in this case, so the CDATA happens to be a no-op. But XSLT has no way of knowing what data was originally expressed using CDATA, what was expressed using character entities, etc.
The solution would be to tell whoever is providing this XML that they need to put a delimiter between keyword2 and keyword3.

Insert value using XSLT

I have an ABX tag and need to assign a value to its attribute within my XSLT
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
I need to assign a value to the trId attribute, but the way I have it now doesn't work;
what is the right way to do it?
<ABX trId="<xsl:value-of select="CODE_VALUE"/>">
</xsl:template>
</xsl:stylesheet>
<ABX>
<xsl:attribute name="trId"><xsl:value-of select="CODE_VALUE"/></xsl:attribute>
</ABX>
The XSLT <attribute> tag will do exactly what you want.
Or you could simply do this:
<ABX trId="{CODE_VALUE}"/>
The expression inside curly braces is evaluated and the result is put into the attribute value. See Section 7.6.2, Attribute Value Templates in the spec.
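Put together as a complete stylesheet, the attribute value template might be used like this. It is a sketch that assumes, purely for illustration, that CODE_VALUE is the document element of the input (for example <CODE_VALUE>42</CODE_VALUE>), since the question does not show the input:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <!-- {CODE_VALUE} is evaluated as XPath relative to the current node -->
    <ABX trId="{CODE_VALUE}"/>
  </xsl:template>
</xsl:stylesheet>
```

The xsl:attribute form does the same job and is mainly useful when the attribute name itself has to be computed or the attribute is added conditionally.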