Capture Following Nodes Until Specific Node - xslt

I have data like this:
<h4>Test1</h4>
<p>test</p>
<div>test</div>
<p>test</p>
<h4>Test2</h4>
<p>test</p>
<div>test</div>
<p>test</p>
<h4>Test3</h4>
<p>test</p>
<div>test</div>
<p>test</p>
I'm trying to capture all sibling nodes of an H4 until I reach another H4.
I'm currently using:
<xsl:for-each select="//h4">
<xsl:copy-of select="following-sibling::*[generate-id(preceding-sibling::h4[1]) = generate-id(current())]"/>
</xsl:for-each>
This works, but it also captures the next h4 tag, which I want to exclude. The output currently looks like this:
<p>test</p>
<div>test</div>
<p>test</p>
<h4>Test2</h4>
Is there a way to not capture the h4?

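You can use following-sibling::*[not(self::h4)] instead of following-sibling::*, so the h4 that starts the next group is filtered out:
<xsl:copy-of select="following-sibling::*[not(self::h4)][generate-id(preceding-sibling::h4[1]) = generate-id(current())]"/>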
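The XSLT assigns each sibling to its nearest preceding h4. To cross-check that logic outside an XSLT processor, here is a minimal sketch of the same grouping using Python's standard xml.etree.ElementTree. The function name group_by_heading and the <body> wrapper are hypothetical; the wrapper is there only so the question's fragment parses as one document.

```python
import xml.etree.ElementTree as ET

def group_by_heading(parent, heading="h4"):
    """Return (heading, [following siblings]) pairs; each group stops at the
    next heading, so a heading is never a member of the previous group."""
    groups = []
    for child in parent:                 # direct children, in document order
        if child.tag == heading:
            groups.append((child, []))   # start a new group for this heading
        elif groups:                     # ignore anything before the first heading
            groups[-1][1].append(child)
    return groups

# The question's fragment, wrapped in a hypothetical <body> root so it parses.
body = ET.fromstring(
    "<body>"
    "<h4>Test1</h4><p>test</p><div>test</div><p>test</p>"
    "<h4>Test2</h4><p>test</p><div>test</div><p>test</p>"
    "<h4>Test3</h4><p>test</p><div>test</div><p>test</p>"
    "</body>"
)

for h4, members in group_by_heading(body):
    print(h4.text, [m.tag for m in members])
# Test1 ['p', 'div', 'p']
# Test2 ['p', 'div', 'p']
# Test3 ['p', 'div', 'p']
```

Unlike the XPath predicate, which looks back from every candidate sibling, this walks the siblings only once.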

Related

Insert element in a tag on the fly (all in the "content" side)

I need to modify the "content" side of a tag on the fly, appending some text.
I have (on the content side) the classic portal-tabs:
<ul class="nav" id="portal-globalnav">
....
<li id="portaltab-events" class="plain">
Eventi
</li>
</ul>
I need to append (via Diazo), on the fly, the content of another tag (#numbers) to obtain something like:
<ul class="nav" id="portal-globalnav">
....
<li id="portaltab-events" class="plain">
Eventi
<div id="numbers">33</div>
</li>
</ul>
How can I solve this?
Thanks
You might see if this helps: http://docs.diazo.org/en/latest/recipes/modifying-text/index.html
Also, where does the #numbers div come from? If you append it to each LI tag, you'll end up with invalid HTML (more than one element with the same ID).
A content replace containing a little XSL should do it.
<replace css:content="#portaltab-events a">
<xsl:copy-of select="." />
<xsl:copy-of select="//*[@id='numbers']" />
<xsl:apply-templates />
</replace>
If you separately drop the #numbers div, you'll need to add mode="raw" to the apply-templates to prevent it from being dropped here.
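For readers outside Diazo, the same tree operation can be sketched with Python's standard ElementTree. This only illustrates what the rule does to the markup; it is not Diazo's API. The page below is a simplified, hypothetical sample (no a inside the li, and a <div id="numbers"> elsewhere on the page). The copy is appended to the tab and the original is removed, mirroring "copy it in, then drop it separately", so the id stays unique.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, simplified page: the tab plus a separate #numbers div.
page = ET.fromstring(
    '<body>'
    '<ul class="nav" id="portal-globalnav">'
    '<li id="portaltab-events" class="plain">Eventi</li>'
    '</ul>'
    '<div id="numbers">33</div>'
    '</body>'
)

tab = page.find(".//li[@id='portaltab-events']")
numbers = page.find("div[@id='numbers']")   # a direct child of <body> in this sample

moved = copy.deepcopy(numbers)  # like xsl:copy-of: a deep copy of the element
moved.tail = None               # don't carry over any text that followed the original
tab.append(moved)               # lands after the tab's existing text ("Eventi")
page.remove(numbers)            # drop the original so id="numbers" stays unique

print(ET.tostring(page, encoding="unicode"))
```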

Using colon in attribute names in xsl transformations

<xsl:template name="AddThis">
<div class="AddThis">
<!-- AddThis Button BEGIN -->
<div class="addthis_toolbox addthis_default_style" addthis:url="{be:GetFullBlogUrl(@Date, @Title)}" addthis:title="{@Title}" xmlns:addthis="http://www.addthis.com">
<a class="addthis_button_facebook_like" fb:like:width="115"> </a>
<a class="addthis_button_tweet"></a>
<a class="addthis_counter addthis_pill_style addthis_nonzero"></a>
</div>
<script type="text/javascript" src="http://s7.addthis.com/js/250/addthis_widget.js#pubid=xa-4f86b27a69737a92"></script>
<!-- AddThis Button END -->
</div>
</xsl:template>
I need to add the fb:like:width="115" attribute, as described here:
http://support.addthis.com/customer/portal/articles/125587-facebook-like-button-width#.UZyl2rVM_2P
but the XSL transformation can't handle it because of namespace issues.
Any idea how to resolve this? Is there an option to just write it out as plain text?
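AFAIK there is no way to generate an attribute whose name contains two colons: fb:like:width is not a valid qualified name, so a namespace-aware XML parser rejects it. (A name with a single colon, such as fb:..., can be handled with an ordinary xmlns prefix declaration.)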
So instead, you can write the markup as escaped text inside xsl:text with disable-output-escaping="yes", so that it is emitted verbatim (note that XSLT 1.0 processors are not required to support disable-output-escaping):
<xsl:text disable-output-escaping="yes">
&lt;a class="addthis_button_facebook_like" fb:like:width="115"&gt; &lt;/a&gt;
</xsl:text>
Output:
<a class="addthis_button_facebook_like" fb:like:width="115"> </a>

xslt matching all nodes prior to a specific node

I am trying to match all the nodes before a specific node. Input XML:
<story>
<content>
<p>This is the text I want</p>
<p>This is the text I want</p>
<p>This is the text I want</p>
<ul>
<li></li>
...
</ul>
....
....
</content>
</story>
With that as my input XML, I am trying (and failing) to grab all the <p> tags prior to the <ul> tags and render them. There could be zero <p> tags or any number of them. Any thoughts on how to do this with XSLT 1.0? Thanks!
/story/content/p[not(preceding-sibling::ul)]
Use:
//p[not(preceding::ul or ancestor::ul)]
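This is generally wrong:
/story/content/p[not(preceding-sibling::ul)]
because it only considers p children of /story/content and only checks their siblings for a ul. It doesn't select p elements that come before any ul but aren't siblings of one, and it does select p elements that follow a ul elsewhere in the document.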
For example, given this XML document:
<story>
<div>
<p>Must be selected</p>
</div>
<ul>
<li><p>Must not be selected</p></li>
</ul>
<content>
<p>Must not be selected</p>
<div>
<p>Must not be selected</p>
</div>
<p>Must not be selected</p>
<p>Must not be selected</p>
<ul>
<li></li>
<li><p>This must not be selected</p></li>
</ul>
....
....
</content>
</story>
The wrong expression above selects:
<p>Must not be selected</p>
<p>Must not be selected</p>
<p>Must not be selected</p>
and doesn't select the wanted element:
<p>Must be selected</p>
But the correct expression at the start of this answer selects just the wanted p element:
<p>Must be selected</p>
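Another way to read the correct expression: in a well-formed tree, a ul that starts before a p is either an ancestor of it or ends before it, so //p[not(preceding::ul or ancestor::ul)] selects exactly the p elements that start before the first ul in document order. Here is a minimal Python sketch of that reading, using stdlib ElementTree (which has no preceding axis) on the example document above. The function name p_before_any_ul is hypothetical, and the "...." lines are omitted.

```python
import xml.etree.ElementTree as ET

def p_before_any_ul(root):
    """Return the text of every <p> that starts before the first <ul>."""
    selected = []
    for el in root.iter():       # pre-order walk = XPath document order
        if el.tag == "ul":
            # Every <p> visited from here on is either inside this <ul>
            # (ancestor::ul) or after it (preceding::ul), so stop.
            break
        if el.tag == "p":
            selected.append(el.text)
    return selected

story = ET.fromstring("""
<story>
  <div><p>Must be selected</p></div>
  <ul><li><p>Must not be selected</p></li></ul>
  <content>
    <p>Must not be selected</p>
    <div><p>Must not be selected</p></div>
    <p>Must not be selected</p>
    <p>Must not be selected</p>
    <ul><li></li><li><p>This must not be selected</p></li></ul>
  </content>
</story>
""".strip())

print(p_before_any_ul(story))  # ['Must be selected']
```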

Edit html document using regex replace and matching contents of only immediate child

I have html that looks like so:
<ul style="list-style-type: square;">
<br />
<li margin-left="80px">
<br />first line
<br />
<br />second line
</li>
<br />
<li margin-left="80px">
<br />text line 1
</li>
<br />
<li margin-left="80px">
<br />text line 2
</li>
<br />
</ul>
I want to match the contents of the ul, but not the contents of the li elements.
The end goal is to get rid of the <br /> tags that are directly under the <ul></ul> and not under an <li></li>.
Note: for clarity I formatted the HTML above, but in my real-world scenario it comes as a single giant string without any \r\n's.
Here it is:
<p margin-left="40px"><br /> <b>[What is the nature of the Services?]</b></p><br /><p><br /> [What are the overarching goals, objectives and outcomes you want to achieve?]</p><br /><p margin-left="80px"><br /> <b><i><u>[How should the Services be delivered?]</u></i></b></p><br /><ul style="list-style-type: square;"><br /> <li margin-left="80px"><br /> gfhsdfsdf<br /><br /> some line here</li><br /> <li margin-left="80px"><br /> sfdsfsdfsdf</li><br /> <li margin-left="80px"><br /> sdfsdfsdf</li><br /></ul><br /><p><br /> [Is the appointment of this Supplier exclusive?]</p><br /><p><br /> [Refer to any proposal prepared by the Supplier if this helps describes any aspects of the Service]</p><br />
Anyway, the first thing that came to mind was to use this to extract the contents of the <ul>:
<ul[^>]*>(.*)</ul>
and then maybe a subsequent one to select all the li elements:
<li[^>]*>.*</li>
and then somehow get rid of anything else that's left over.
But that's kind of lame, and besides,
<li[^>]*>.*</li>
matches a whole bunch of li's at once; this entire string gets captured:
<li margin-left="80px"><br />\t\tgfhsdfsdf<br /><br />\t\tsome line here</li><br />\t<li margin-left="80px"><br />\t\tsfdsfsdfsdf</li><br />\t<li margin-left="80px"><br />\t\tsdfsdfsdf</li>
I know it's because .* is greedy, but I'm not sure how to avoid it.
Something like [^</li>]* wouldn't work, because a character class is a set of characters, not a string.
Any help is much appreciated.
So I have two problems:
1) I don't like the way I'm approaching this, so better ideas are welcome (I'm considering using LINQ to XML set operations). I'd still like to do this with regex, though, so if anyone knows exactly how, please share.
2) How do I capture each li separately instead of capturing everything from the first opening <li> to the last closing </li>?
I think you should go look at this...
RegEx match open tags except XHTML self-contained tags
Then recognize that parsing HTML with a regex is not that easy. Personally, I would load the HTML into an HTML DOM object and then crawl the document. You might look at this project for some help:
http://htmlagilitypack.codeplex.com/
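HtmlAgilityPack is a .NET library. As a language-neutral illustration of the DOM approach, here is a minimal Python sketch using the standard xml.etree.ElementTree. It assumes the markup is well-formed XML, like the question's sample; real-world HTML usually needs a proper HTML parser. The function name and the <root> wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET

def drop_br_directly_under_ul(root):
    """Remove <br/> children of every <ul>; <br/> inside <li> is untouched."""
    for ul in list(root.iter("ul")):      # snapshot, since we mutate the tree below
        kept = None                       # last child of this <ul> we are keeping
        for child in list(ul):
            if child.tag != "br":
                kept = child
                continue
            # ElementTree stores the text that follows an element in .tail,
            # so hand it to the previous kept node before removing the <br/>.
            if child.tail:
                if kept is None:
                    ul.text = (ul.text or "") + child.tail
                else:
                    kept.tail = (kept.tail or "") + child.tail
            ul.remove(child)
    return root

html = (
    '<ul style="list-style-type: square;"><br />'
    '<li margin-left="80px"><br />first line<br /><br />second line</li><br />'
    '<li margin-left="80px"><br />text line 1</li><br />'
    '<li margin-left="80px"><br />text line 2</li><br />'
    '</ul>'
)

# Wrapped in a hypothetical <root> because the real string has several top-level nodes.
root = drop_br_directly_under_ul(ET.fromstring("<root>" + html + "</root>"))
print(ET.tostring(root, encoding="unicode"))
```

Working on the tree also answers the second question without any regex: each li is its own element, so there is nothing for a greedy .* to run across.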
Since you don't say which regex flavor you're using, here's a JavaScript-compatible regex to match a <br /> that's inside a <ul> element but not inside a <li> element:
<br\s*/>(?=[^<]*(?:<(?!/?ul\b)[^<]*)*</ul>)(?![^<]*(?:<(?!/?li\b)[^<]*)*</li>)
Breaking that down:
<br\s*/> matches the BR tag, of course.
(?=[^<]*(?:<(?!/?ul\b)[^<]*)*</ul>) looks ahead for the next occurrence of </ul>, but only if it doesn't encounter a <ul> tag first.
(?![^<]*(?:<(?!/?li\b)[^<]*)*</li>) does the same thing with </li> and <li> tags, but this time negating the result.
Being JS compatible, this should work in Dreamweaver as well as in editors with solid regex support, like EditPad and TextMate. It's also compatible with most Perl-derived flavors (Python, .NET, Java, etc.), though some syntactic tweaking will probably be needed.
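To sanity-check the pattern, here is a minimal Python sketch. Python's re module accepts it unchanged in a raw string. The input is a trimmed, hypothetical version of the question's markup, not the full string.

```python
import re

# The answer's regex, split into its three parts: the BR tag itself; a lookahead
# requiring a </ul> ahead with no <ul> or </ul> tag in between; and a negative
# lookahead rejecting matches that have a </li> ahead with no <li> or </li> in between.
BR_DIRECTLY_IN_UL = re.compile(
    r"<br\s*/>"
    r"(?=[^<]*(?:<(?!/?ul\b)[^<]*)*</ul>)"
    r"(?![^<]*(?:<(?!/?li\b)[^<]*)*</li>)"
)

html = (
    '<p><br /> intro</p><br />'
    '<ul style="list-style-type: square;"><br />'
    ' <li margin-left="80px"><br /> first line<br /><br /> second line</li><br />'
    ' <li margin-left="80px"><br /> text line 1</li><br />'
    '</ul><br /><p><br /> outro</p>'
)

print(len(BR_DIRECTLY_IN_UL.findall(html)))  # 3: only the BRs directly under <ul>
cleaned = BR_DIRECTLY_IN_UL.sub("", html)
print(cleaned)
```

The BRs inside the p elements and inside each li survive, because the first lookahead fails outside the ul and the negative lookahead fires inside an li.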

Using XSLT, how to turn each tag into a div with a class matching the tag name?

Using XSLT, I'd like to be able to transform this:
<doc>
<tag1>AAA</tag1>
Hello !
<tag2>BBB</tag2>
</doc>
into this:
<div class="doc">
<div class="tag1">AAA</div>
Hello !
<div class="tag2">BBB</div>
</div>
...but without explicitly specifying any tag name in the stylesheet (there are too many in the real world).
What would be the best way to do this?
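Something along the lines of the following (text nodes such as "Hello !" are copied through by XSLT's built-in template for text):
<xsl:template match="*">
<div class="{local-name()}">
<xsl:apply-templates />
</div>
</xsl:template>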
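For comparison, here is a minimal sketch of the same generic transformation with Python's standard ElementTree. As in the template, every element becomes a div whose class is its local name, attributes are dropped, and mixed text such as "Hello !" is kept. The function name tags_to_divs is hypothetical.

```python
import xml.etree.ElementTree as ET

def tags_to_divs(root):
    """Rename every element to <div class="original-local-name">, in place."""
    for el in list(root.iter()):            # every element, in document order
        local_name = el.tag.split("}")[-1]  # strip any "{namespace-uri}" prefix, like local-name()
        el.attrib.clear()                   # the XSLT template doesn't copy attributes either
        el.set("class", local_name)
        el.tag = "div"
    return root

doc = ET.fromstring("<doc><tag1>AAA</tag1>Hello !<tag2>BBB</tag2></doc>")
print(ET.tostring(tags_to_divs(doc), encoding="unicode"))
# <div class="doc"><div class="tag1">AAA</div>Hello !<div class="tag2">BBB</div></div>
```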