XSLT wrap element and following-sibling text

Kindly help me to wrap the img.inline element with the following sibling text comma (if comma exists):
text <img id="1" class="inline" src="1.jpg"/> another text.
text <img id="2" class="inline" src="2.jpg"/>, another text.
Should be changed to:
text <img id="1" class="inline" src="1.jpg"/> another text.
text <span class="img-wrap"><img id="2" class="inline" src="2.jpg"/>,</span> another text.
Currently, my XSLT wraps the img.inline element and adds the comma inside the span; now I want to remove the original comma that follows it.
text <span class="img-wrap"><img id="2" class="inline" src="2.jpg"/>,</span>
, <!--remove this extra comma--> another text.
My XSLT:
<xsl:template match="//img[@class='inline']">
<xsl:copy>
<xsl:choose>
<xsl:when test="starts-with(following-sibling::text(), ',')">
<span class="img-wrap">
<xsl:apply-templates select="node()|@*"/>
<xsl:text>,</xsl:text>
</span>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="node()|@*"/>
</xsl:otherwise>
</xsl:choose>
</xsl:copy>
<!-- checking following-sibling::text() -->
<xsl:apply-templates select="following-sibling::text()" mode="commatext"/>
</xsl:template>
<!-- here I want to match the following text, if comma, then remove it -->
<xsl:template match="the following comma" mode="commatext">
<!-- remove comma -->
</xsl:template>
Is my approach correct, or should this be handled differently? Please suggest.

Currently you are copying the img and then embedding the span within it. Also, <xsl:apply-templates select="node()|@*"/> selects the child nodes of img (of which there are none) and its attributes, which end up being added to the span.
You don't actually need the xsl:choose here as you can add the condition to the match attribute.
<xsl:template match="//img[@class='inline'][starts-with(following-sibling::node()[1][self::text()], ',')]">
Note I have changed the condition, as following-sibling::text() selects ALL text nodes that follow the img node. You only want the node immediately after the img node, and only if it is a text node.
Also, trying to select the following text node with xsl:apply-templates is probably not the right approach, assuming you have a template that matches the parent node which selects all child nodes (not just img ones). I am assuming you were using the identity template here.
Anyway, try this XSLT instead
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" indent="no" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="//img[@class='inline'][starts-with(following-sibling::node()[1][self::text()], ',')]">
<span class="img-wrap">
<xsl:copy-of select="." />
<xsl:text>,</xsl:text>
</span>
</xsl:template>
<xsl:template match="text()[starts-with(., ',')][preceding-sibling::node()[1][self::img]/@class='inline']">
<xsl:value-of select="substring(., 2)" />
</xsl:template>
</xsl:stylesheet>

Related

XSL copy without values is it possible?

I want to compare two xmls.
1. First compare XML structure/schema.
2. Compare values.
I am using the Beyond Compare tool for the comparison. Because the two XML files contain different values, the comparison report shows a great many differences that I am not interested in, since my focus for now is only on the structure/schema.
I tried to copy the XML files with the following template, among others, but the values are always included.
From what I found on Google, xsl:copy copies everything for the selected node/element.
Is there any way to filter out the values so that only the structure is copied?
My Data :
<root>
<Child1>xxxx</Child1>
<Child2>yyy</Child2>
<Child3>
<GrandChild1>dddd</GrandChild1>
<GrandChild2>erer</GrandChild2>
</Child3>
</root>
Template used :
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<!-- for all elements (tags) -->
<xsl:template match="*">
<!-- create a copy of the tag (without attributes and children) in the output -->
<xsl:copy>
<!-- For all attributes of the current tag -->
<xsl:for-each select="@*">
<xsl:sort select="name( . )" order="ascending" case-order="lower-first" />
<xsl:copy/>
</xsl:for-each>
<!-- recurse through all child tags -->
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()|comment()|processing-instruction()">
<xsl:copy/>
</xsl:template>
OutPut Required :
Something like..
<root>
<Child1></Child1>
<Child2></Child2>
<Child3>
<GrandChild1></GrandChild1>
<GrandChild2></GrandChild2>
</Child3>
</root>
At the moment, you have a template matching text() to copy it. What you need to do is remove this match from that template, and have a separate template match, that matches only non-whitespace text, and remove it.
<xsl:template match="comment()|processing-instruction()">
<xsl:copy/>
</xsl:template>
<xsl:template match="text()[normalize-space()]" />
Whitespace-only text nodes (as used in indentation) will be matched by XSLT's built-in templates, which copy them to the output.
For attributes, use xsl:attribute to create a new attribute, without a value, rather than using xsl:copy which will copy the whole attribute.
<xsl:attribute name="{name()}" />
Note the use of Attribute Value Templates (the curly braces) to indicate the expression is to be evaluated to get the string to use.
Try this XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- for all elements (tags) -->
<xsl:template match="*">
<!-- create a copy of the tag (without attributes and children) in the output -->
<xsl:copy>
<!-- For all attributes of the current tag -->
<xsl:for-each select="@*">
<xsl:sort select="name( . )" order="ascending" case-order="lower-first" />
<xsl:attribute name="{name()}" />
</xsl:for-each>
<!-- recurse through all child tags -->
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="comment()|processing-instruction()">
<xsl:copy/>
</xsl:template>
<xsl:template match="text()[normalize-space()]" />
</xsl:stylesheet>
Also note that attributes are considered to be unordered in XML, so although you have code to sort the attributes, and they probably will appear in the right order, you can't guarantee it.

how to exclude footer element content from body using xslt 1.0

I have a html page for an example like below
<html><head><title>test</title></head><body><div>test1</div><footer><div>test2</div></footer></body></html>
I have written xslt 1.0 to transform and extract the title and body content, but my requirement is to ignore footer content alone and consider all other element values inside body content. How to achieve this ?
<xsl:template match="/">
<document >
<xsl:copy-of select="@*" />
<xsl:apply-templates select="html/head" />
<xsl:apply-templates select="html/body" />
</document>
</xsl:template>
<xsl:template match="html/head">
<content name="title">
<xsl:value-of select="title" />
</content>
</xsl:template>
<xsl:template match="html/body">
<content name="snippet">
<xsl:value-of select="viv:replace(viv:replace(.,'<[^>]*>',' ', 'gi'),'&nbsp;','','gi')"/>
</content>
</xsl:template>
Q: how to exclude footer element content from body using xslt 1.0
If this is really your question, it has been answered a hundred times already.
Start with an identity transform and add empty templates for the elements to ignore.
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="body/footer"/>
Looking at your XSLT, let us assume some strange_other_things are requested.
To do strange_other_things on the body without the footer, put the result of the identity transform into a variable.
<xsl:template match="body" mode="strange_other_things">
<xsl:variable name="body" >
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:variable>
<!-- use $body but I'm out here -->
</xsl:template>
Further guess: with viv:replace(.,'<[^>]*>',' ', 'gi') you are trying to strip the XML tags. This will not work, because . is evaluated as a string here and so yields only the concatenated text inside the current node, with no markup left to match.
So if I'm right, the question is a little deceptive.
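A minimal sketch of how the two ideas above might be combined: rather than regex-stripping tags, walk the body in a dedicated mode, drop the footer with an empty template, and emit only the text nodes. The mode name `snippet` and the single-space separator after each text node are assumptions made for illustration; the `document` and `content` element names come from the question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="no"/>

  <xsl:template match="/">
    <document>
      <content name="title">
        <xsl:value-of select="html/head/title"/>
      </content>
      <content name="snippet">
        <!-- Walk the body in the (hypothetical) "snippet" mode. -->
        <xsl:apply-templates select="html/body" mode="snippet"/>
      </content>
    </document>
  </xsl:template>

  <!-- An empty template: the footer and everything inside it is skipped. -->
  <xsl:template match="footer" mode="snippet"/>

  <!-- Emit each text node followed by a space so adjacent blocks do not run together. -->
  <xsl:template match="text()" mode="snippet">
    <xsl:value-of select="."/>
    <xsl:text> </xsl:text>
  </xsl:template>

  <!-- Other elements fall through to the built-in rule, which recurses in the same mode. -->
</xsl:stylesheet>
```

For the sample page in the question, the snippet content should contain only the text of the first div (test1); the footer's test2 is dropped because the footer is never descended into.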

How to fold by a tag a group of selected (neighbor) tags with XSLT1?

I have a set of sequential nodes that must be enclosed into a new element. Example:
<root>
<c>cccc</c>
<a gr="g1">aaaa</a> <b gr="g1">1111</b>
<a gr="g2">bbbb</a> <b gr="g2">2222</b>
</root>
that must be enclosed by fold tags, resulting (after XSLT) in:
<root>
<c>cccc</c>
<fold><a gr="g1">aaaa</a> <b gr="g1">1111</b></fold>
<fold><a gr="g2">bbbb</a> <b gr="g2">2222</b></fold>
</root>
So, I have a "label for grouping" (@gr), but I cannot work out how to produce the correct fold tags.
I am trying to use the clues of this question, or this other one... But since I have a "label for grouping", I understand that my solution does not need the key() function.
My non-general solution is:
<xsl:template match="/">
<root>
<xsl:copy-of select="root/c"/>
<fold><xsl:for-each select="//*[@gr='g1']">
<xsl:copy-of select="."/>
</xsl:for-each></fold>
<fold><xsl:for-each select="//*[@gr='g2']">
<xsl:copy-of select="."/>
</xsl:for-each></fold>
</root>
</xsl:template>
I need a general solution (!), looping over all @gr values and copying (identity) all content that does not have @gr... perhaps using the identity transform.
Another (future) problem is to do this recursively, with folds of folds.
In XSLT 1.0 the standard technique to handle this sort of thing is called Muenchian grouping, and involves the use of a key that defines how the nodes should be grouped and a trick using generate-id to extract just the first node in each group as a proxy for the group as a whole.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:strip-space elements="*" />
<xsl:output indent="yes" />
<xsl:key name="elementsByGr" match="*[@gr]" use="@gr" />
<xsl:template match="@*|node()" name="identity">
<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>
<!-- match the first element with each @gr value -->
<xsl:template match="*[@gr][generate-id() =
generate-id(key('elementsByGr', @gr)[1])]" priority="2">
<fold>
<xsl:for-each select="key('elementsByGr', @gr)">
<xsl:call-template name="identity" />
</xsl:for-each>
</fold>
</xsl:template>
<!-- ignore subsequent ones in template matching, they're handled within
the first element template -->
<xsl:template match="*[@gr]" priority="1" />
</xsl:stylesheet>
This achieves the grouping you're after, but just like your non-general solution it doesn't preserve the indentation and the whitespace text nodes between the a and b elements, i.e. it will give you
<root>
<c>cccc</c>
<fold>
<a gr="g1">aaaa</a>
<b gr="g1">1111</b>
</fold>
<fold>
<a gr="g2">bbbb</a>
<b gr="g2">2222</b>
</fold>
</root>
Note that if you were able to use XSLT 2.0 then the whole thing becomes one for-each-group:
<xsl:template match="root">
<!-- string() gives elements without @gr an empty-string key; a bare @gr would be an empty sequence there -->
<xsl:for-each-group select="*" group-adjacent="string(@gr)">
<xsl:choose>
<!-- wrap each group in a fold -->
<xsl:when test="@gr">
<fold><xsl:copy-of select="current-group()" /></fold>
</xsl:when>
<!-- or just copy as-is for elements that don't have a @gr -->
<xsl:otherwise>
<xsl:copy-of select="current-group()" />
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:template>

XSLT matching PAGEID to an element ID

How would I match two separate numbers in an XML document? There are multiple <PgIndexElementInfo> elements in my XML document, each representing a different navigation element, each with a unique <ID>. Later in the document a <PageID> specifies a number that sometimes matches an <ID> used above. How could I go about matching the <PageID> to the <ID> specified above?
<Element>
<Content>
<PgIndexElementInfo>
<ElementData>
<Items>
<PgIndexElementItem>
<ID>1455917</ID>
</PgIndexElementItem>
</Items>
</ElementData>
</PgIndexElementInfo>
</Content>
</Element>
<Element>
<Content>
<CustomElementInfo>
<PageID>1455917</PageID>
</CustomElementInfo>
</Content>
</Element>
EDIT:
I added the solution below to my code. The xsl:apply-templates that is present is used to recreate the nested lists that are lost between HTML and XML. What I now need to do is match the PageID to the ID of a <PgIndexElementItem> and add a CSS class to the <ul> it is a part of. I hope that makes sense.
<xsl:key name="kIDByValue" match="ID" use="."/>
<xsl:template match="PageID[key('kIDByValue',.)]">
<xsl:apply-templates select="//PgIndexElementItem[not(contains(Description, '.'))]" />
</xsl:template>
<xsl:template match="PgIndexElementItem">
<li>
<xsl:value-of select="Title"/>
<xsl:variable name="prefix" select="concat(Description, '.')"/>
<xsl:variable name="childOptions"
select="../PgIndexElementItem[starts-with(Description, $prefix)
and not(contains(substring-after(Description, $prefix), '.'))]"/>
<xsl:if test="$childOptions">
<ul>
<xsl:apply-templates select="$childOptions" />
</ul>
</xsl:if>
</li>
</xsl:template>
The XSLT way for dealing with cross references is with keys.
Matching: a rule matching every PageID element whose value is also the value of some ID element.
<xsl:key name="kIDByValue" match="ID" use="."/>
<xsl:template match="PageID[key('kIDByValue',.)]">
<!-- Template content -->
</xsl:template>
Selecting: an expression selecting every PageID element with a specific value.
<xsl:key name="kPageIDByValue" match="PageID" use="."/>
<xsl:template match="ID">
<xsl:apply-templates select="key('kPageIDByValue',.)"/>
</xsl:template>
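As a rough sketch of how the same key could serve the follow-up in the EDIT (marking the navigation entry that corresponds to the current page), the item template can test the key and add a class attribute. The class name `current` is a placeholder, and placing the class on the `li` rather than the enclosing `ul` is a simplification; neither comes from the question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Index every PageID by its value, as in the "Selecting" example. -->
  <xsl:key name="kPageIDByValue" match="PageID" use="."/>

  <xsl:template match="PgIndexElementItem">
    <li>
      <!-- key() returns a non-empty node-set when some PageID equals this item's ID. -->
      <xsl:if test="key('kPageIDByValue', ID)">
        <!-- "current" is a placeholder class name, not taken from the question. -->
        <xsl:attribute name="class">current</xsl:attribute>
      </xsl:if>
      <xsl:value-of select="Title"/>
    </li>
  </xsl:template>
</xsl:stylesheet>
```

To mark the enclosing `ul` instead, the same test could be applied to the set of child items, for example `$childOptions[key('kPageIDByValue', ID)]`, with the `xsl:attribute` emitted inside the `ul` before any of its children.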

Trim whitespace from parent element only

I'd like to trim the leading whitespace inside p tags in XML, so this:
<p> Hey, <em>italics</em> and <em>italics</em>!</p>
Becomes this:
<p>Hey, <em>italics</em> and <em>italics</em>!</p>
(Trimming trailing whitespace won't hurt, but it's not mandatory.)
Now, I know normalize-space() is supposed to do this, but if I try to apply it to the text nodes...
<xsl:template match="text()">
<xsl:text>[</xsl:text>
<xsl:value-of select="normalize-space(.)"/>
<xsl:text>]</xsl:text>
</xsl:template>
...it's applied to each text node (in brackets) individually and sucks them dry:
[Hey,]<em>[italics]</em>[and]<em>[italics]</em>[!]
My XSLT looks basically like this:
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
So is there any way I can let apply-templates complete and then run normalize-space on the output, which should do the right thing?
This stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p//text()[1][generate-id()=
generate-id(ancestor::p[1]
/descendant::text()[1])]">
<xsl:variable name="vFirstNotSpace"
select="substring(normalize-space(),1,1)"/>
<xsl:value-of select="concat($vFirstNotSpace,
substring-after(.,$vFirstNotSpace))"/>
</xsl:template>
</xsl:stylesheet>
Output:
<p>Hey, <em>italics</em> and <em>italics</em>!</p>
Edit 2: Better expression (now only three function calls).
Edit 3: Matching the first descendant text node (not just the first node if it's a text node). Thanks to @Dimitre's comment.
Now, with this input:
<p><b> Hey, </b><em>italics</em> and <em>italics</em>!</p>
Output:
<p><b>Hey, </b><em>italics</em> and <em>italics</em>!</p>
I would do something like this:
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
<!-- strip leading whitespace -->
<xsl:template match="p/node()[1][self::text()]">
<xsl:call-template name="left-trim">
<xsl:with-param name="s" select="."/>
</xsl:call-template>
</xsl:template>
This will strip left space from the initial node child of a <p> element, if it is a text node. It will not strip space from the first text node child, if it is not the first node child. E.g. in
<p><em>Hey</em> there</p>
I intentionally avoid stripping the space from the front of 'there', because that would make the words run together when rendered in a browser. If you did want to strip that space, change the match pattern to
match="p/text()[1]"
If you also want to strip trailing whitespace, as your title possibly implies, add these two templates:
<!-- strip trailing whitespace -->
<xsl:template match="p/node()[last()][self::text()]">
<xsl:call-template name="right-trim">
<xsl:with-param name="s" select="."/>
</xsl:call-template>
</xsl:template>
<!-- strip leading/trailing whitespace on sole text node -->
<xsl:template match="p/node()[position() = 1 and
position() = last()][self::text()]"
priority="2">
<xsl:value-of select="normalize-space(.)"/>
</xsl:template>
The definitions of the left-trim and right-trim templates are at Trim Template for XSLT (untested). They might be slow for documents with lots of <p>s. If you can use XSLT 2.0, you can replace the call-templates with
<xsl:value-of select="replace(.,'^\s+','')" />
and
<xsl:value-of select="replace(.,'\s+$','')" />
(Thanks to Priscilla Walmsley.)
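Since the linked template definitions are not reproduced here, the following is a minimal recursive sketch of what `left-trim` and `right-trim` could look like in XSLT 1.0. It is not the linked implementation; only the template names and the `s` parameter are taken from the calls above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Remove leading whitespace, one character per recursive call. -->
  <xsl:template name="left-trim">
    <xsl:param name="s"/>
    <xsl:choose>
      <!-- normalize-space() of a single whitespace character is the empty string;
           the length check stops the recursion on an empty string. -->
      <xsl:when test="string-length($s) &gt; 0 and normalize-space(substring($s, 1, 1)) = ''">
        <xsl:call-template name="left-trim">
          <xsl:with-param name="s" select="substring($s, 2)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$s"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Remove trailing whitespace, one character per recursive call. -->
  <xsl:template name="right-trim">
    <xsl:param name="s"/>
    <xsl:choose>
      <xsl:when test="string-length($s) &gt; 0 and normalize-space(substring($s, string-length($s))) = ''">
        <xsl:call-template name="right-trim">
          <xsl:with-param name="s" select="substring($s, 1, string-length($s) - 1)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$s"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The recursion depth equals the number of whitespace characters removed, so this is fine for ordinary indentation but could be slow or deep for very long runs of whitespace.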
You want:
<xsl:template match="text()">
<xsl:variable name="vWrapped"
select="normalize-space(concat('[', ., ']'))"/>
<xsl:value-of select=
"substring($vWrapped, 2, string-length($vWrapped) - 2)"/>
</xsl:template>
This wraps the string in "[]", then performs normalize-space(), then finally removes the wrapping characters.