R: how to get xpath expression for the nested structure - regex

The following is the HTML code:
<div class="grid1-4">
<a class="largeButton javascript sponsorProject button orangeGrad" href="javascript:;">
<div class="button">
<div class="button progress">
<div class="progressWrapper">
<div class="meter">
<div class="progress" style="width:52%"> </div>
</div>
<p class="progressText">
<span>52% Raised of $20,000 Goal</span>
I want to extract the sentence at the very bottom of the code, that is, 52% Raised of $20,000 Goal.
What is the XPath expression for that? I googled and searched for hints but couldn't get much out of them. I even used Firebug to find the XPath expression, but still no progress.
Thank you.
PS: Due to the nature of my project, I CANNOT write
//p[@class="progressText"]//span
The XPath expression HAS TO INVOLVE
<div class="grid1-4">

This XPath:
//div[@class="grid1-4"]//text()[contains(., 'Raised of')]
Yields:
52% Raised of $20,000 Goal
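As a rough check of the idea, here is a minimal Python sketch that uses only the standard library. It rests on two assumptions. First, the closing tags that the question's snippet omits have been added, and the fragment is wrapped in a hypothetical <body> element so that it parses as XML. Second, xml.etree.ElementTree supports only a subset of XPath (no text() or contains()), so the path only anchors on the grid1-4 div and the 'Raised of' filter is applied in Python instead of inside the expression.

```python
import xml.etree.ElementTree as ET

# Assumption: closing tags and the <body> wrapper are added here so the
# fragment is well-formed; the question's snippet stops after the <span>.
MARKUP = """
<body>
<div class="grid1-4">
<a class="largeButton javascript sponsorProject button orangeGrad" href="javascript:;">
<div class="button">
<div class="button progress">
<div class="progressWrapper">
<div class="meter">
<div class="progress" style="width:52%"> </div>
</div>
<p class="progressText">
<span>52% Raised of $20,000 Goal</span>
</p>
</div>
</div>
</div>
</a>
</div>
</body>
""".strip()

root = ET.fromstring(MARKUP)

# ElementTree's XPath subset: anchor on the grid1-4 div, then take every
# <span> anywhere below it.
spans = root.findall(".//div[@class='grid1-4']//span")

# contains(., 'Raised of') is not available in ElementTree, so filter here.
matches = [s.text.strip() for s in spans if s.text and "Raised of" in s.text]
print(matches)
```

With a full XPath 1.0 engine, the expression from the answer can be used as written and no Python-side filter is needed.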

Related

regex repeat group

I am trying to capture the URLs of the images (however many there may be on a specific site). I am able to do so, but when I then try to capture other things after them, the entire thing falls apart. I would greatly appreciate any help.
Working regex:
.(?:src="(http:\/\/website\.bla\.com\/Live.+?)".+?)
Non-working regex:
.(?:src="(http:\/\/website\.bla\.com\/Live.+?)".+?).*Status.*\s(Sld|Rtr)
Sample code:
<div ng-class="{
'active': active
}" class="item text-center ng-isolate-scope" ng-transclude="" ng-repeat="slide in slides" active="slide.active">
<img class="image-circle ng-scope" ng-src="http://website.bla.com/Live/photos/FULL/18/134/W3764134_18.jpg" src="http://website.bla.com/Live/photos/FULL/18/134/W3764134_18.jpg">
</div><!-- end ngRepeat: slide in slides --><div ng-class="{
'active': active
}" class="item text-center ng-isolate-scope" ng-transclude="" ng-repeat="slide in slides" active="slide.active">
<img class="image-circle ng-scope" ng-src="http://website.bla.com/Live/photos/FULL/19/134/W3764134_19.jpg" src="http://website.bla.com/Live/photos/FULL/19/134/W3764134_19.jpg">
</div><!-- end ngRepeat: slide in slides --><div ng-class="{
'active': active
}" class="item text-center ng-isolate-scope" ng-transclude="" ng-repeat="slide in slides" active="slide.active">
<img class="image-circle ng-scope" ng-src="http://website.bla.com/Live/photos/FULL/20/134/W3764134_20.jpg" src="http://website.bla.com/Live/photos/FULL/20/134/W3764134_20.jpg">
</div><!-- end ngRepeat: slide in slides -->
</div>
<b class="ng-binding">Status:</b> Sld
For this simple example, use alternation.
But this can get complicated if additional requirements have to be implemented. In that case you might want to use an HTML parser such as jsoup.
With lots of assumptions, you could try this:
src="(http://website\.bla\.com/Live.+?)"(?:(?:[^s]|s[^r]|sr[^c])*?Status.*? (Sld|Rtr))?
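The alternation suggestion can be sketched in Python roughly as follows. This is an illustration only, not the demo the answer linked to: the pattern is an assumption, with one branch for image URLs and one for the status, and the leading \s is there so that the ng-src attribute is not captured a second time. The sample is a shortened copy of the question's markup.

```python
import re

# Shortened copy of the question's sample: two slides plus the status line.
SAMPLE = '''<img class="image-circle ng-scope" ng-src="http://website.bla.com/Live/photos/FULL/18/134/W3764134_18.jpg" src="http://website.bla.com/Live/photos/FULL/18/134/W3764134_18.jpg">
</div><!-- end ngRepeat: slide in slides -->
<img class="image-circle ng-scope" ng-src="http://website.bla.com/Live/photos/FULL/19/134/W3764134_19.jpg" src="http://website.bla.com/Live/photos/FULL/19/134/W3764134_19.jpg">
</div><!-- end ngRepeat: slide in slides -->
</div>
<b class="ng-binding">Status:</b> Sld'''

# Assumed pattern: branch 1 captures an image URL (group 1),
# branch 2 captures the status code (group 2).
PATTERN = re.compile(
    r'\ssrc="(http://website\.bla\.com/Live[^"]+)"'
    r'|Status:</b>\s*(Sld|Rtr)'
)

urls = []
status = None
for m in PATTERN.finditer(SAMPLE):
    if m.group(1) is not None:
        urls.append(m.group(1))   # matched the image branch
    else:
        status = m.group(2)       # matched the status branch

print(urls)
print(status)
```

Each match fills exactly one of the two groups, so any number of images can precede the status without a repeated capture group.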

Parse specific div from raw text using regex?

I'm in a situation that requires parsing raw HTML data as a string. Unfortunately this is unavoidable; otherwise I wouldn't post this. I only need a regex to match the class of a div that has an img tag as a child.
So this is the code example that I'm dealing with:
<div class="summary">
<h3>Example</h3>
<div class="explanation">
<span>This serves as an example for the site.</span>
</div>
<div class="user-details">
mheathershaw<br>
<img src="res/badge522.png"/> <span class="score">522</span>
</div>
<div class="help">
Help
</div>
</div>
The div whose class I'd like to retrieve is the one that contains the image. The exact capture I'd like from this example (optimally) is user-details. The only criterion for capturing it is that it has <img ... /> as a child.
Anyone able to help? Thanks!
You may try this:
/<div\b[^>]*\bclass="([^"]*)"[^>]*>(?:(?!<\/div>)[\s\S])*?<img\b[^>]*>(?:(?!<\/div>)[\s\S])*?<\/div>/
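To see what this pattern captures, here is a small Python sketch that runs it against the question's sample. The pattern is the same one without the surrounding / delimiters and without the escaped slashes, which Python's re module does not need. The names HTML and DIV_WITH_IMG are made up for the illustration.

```python
import re

# The sample markup from the question.
HTML = """<div class="summary">
<h3>Example</h3>
<div class="explanation">
<span>This serves as an example for the site.</span>
</div>
<div class="user-details">
mheathershaw<br>
<img src="res/badge522.png"/> <span class="score">522</span>
</div>
<div class="help">
Help
</div>
</div>"""

DIV_WITH_IMG = re.compile(
    r'<div\b[^>]*\bclass="([^"]*)"[^>]*>'  # opening div, class in group 1
    r'(?:(?!</div>)[\s\S])*?'              # anything, but never across a </div>
    r'<img\b[^>]*>'                        # the required img tag
    r'(?:(?!</div>)[\s\S])*?'              # rest of the div body
    r'</div>'
)

match = DIV_WITH_IMG.search(HTML)
div_class = match.group(1) if match else None
print(div_class)
```

Note that the pattern really tests for an img before the next closing div tag, which is not quite the same as a direct child, so markup with more deeply nested elements may need an HTML parser instead.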

Find and replace Dreamweaver - Adding a number each time?

I've just been doing some research about regular expressions in Dreamweaver and had no idea you could do so much with find and replace.
One thing I haven't been able to figure out is this. If I have:
<div class="item active"> <img src="images/gates2/large/gates1.jpg">
<div class="carousel-caption">WG1</div>
</div>
<div class="item"><img src="images/gates2/large/gates2.jpg">
<div class="carousel-caption">WG1</div>
</div>
<div class="item"><img src="images/gates2/large/gates3.jpg">
<div class="carousel-caption">WG1</div>
</div>
<div class="item"><img src="images/gates2/large/gates4.jpg">
<div class="carousel-caption">WG1</div>
</div>
Can I automatically change the 1 to 2, 3, etc., so that they end up in order, rather than going through each one manually?
Kind Regards,
Shaun
Unfortunately, you won't be able to do what you're describing using the Find & Replace method in Dreamweaver. I know this probably isn't the answer you want to hear, but you'll have to go through them manually. Sorry to be the bearer of bad news.
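If stepping outside Dreamweaver is an option, a short script can do the renumbering. The Python sketch below is one possible approach, under the assumption that the goal is to turn the repeated WG1 captions into WG1, WG2, WG3 and so on in document order. The function name renumber_captions and its prefix parameter are hypothetical.

```python
import re
from itertools import count

def renumber_captions(html: str, prefix: str = "WG") -> str:
    """Replace the number after each caption prefix with 1, 2, 3, ... in order."""
    counter = count(1)
    pattern = re.compile(
        r'(<div class="carousel-caption">' + re.escape(prefix) + r')\d+(</div>)'
    )
    # The replacement function runs once per match, left to right,
    # so each caption receives the next number from the counter.
    return pattern.sub(
        lambda m: f"{m.group(1)}{next(counter)}{m.group(2)}", html
    )

# Shortened copy of the question's markup (three slides).
SLIDES = """<div class="item active"> <img src="images/gates2/large/gates1.jpg">
<div class="carousel-caption">WG1</div>
</div>
<div class="item"><img src="images/gates2/large/gates2.jpg">
<div class="carousel-caption">WG1</div>
</div>
<div class="item"><img src="images/gates2/large/gates3.jpg">
<div class="carousel-caption">WG1</div>
</div>"""

renumbered = renumber_captions(SLIDES)
print(renumbered)
```

Only the caption text is touched; the image paths are left as they are.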

Transform HTML grab class and inner string in Sublime Text with Regex

I'm well aware of the dangers of parsing HTML with Regular Expressions but I'm in a pickle and need to speed things up! I have this code block:
<div class="row">
<div class="span1"><i class="fc-icon-letterpress-circle"></i>
</div>
<div class="span4">This design is letterpressed onto the finest quality salvaged paper made here in the USA.</div>
</div>
<div class="row">
<div class="span1"><i class="fc-icon-madeinusa-circle"></i>
</div>
<div class="span4">We work hard to be able to say that this product is proudly created right here in the USA.</div>
</div>
<div class="row">
<div class="span1"><i class="fc-icon-cotton-circle"></i>
</div>
<div class="span4">This product is made from tree-free paper sourced from salvaged cotton from the textile industry.</div>
</div>
And I need it to be transformed to this:
letterpress-circle, "This design is letterpressed onto the finest quality salvaged paper made here in the USA.";
madeinusa-circle, "We work hard to be able to say that this product is proudly created right here in the USA.";
cotton-circle, "This product is made from tree-free paper sourced from salvaged cotton from the textile industry.";
Preferably with one regular expression, in Sublime Text.
This is what I have so far.
.+i class=\"fc-icon-(.+)\".+
$1
This is what I landed on:
<div class=\"row\">
<div class=\"span1\"><i class=\"fc-icon-(.+)\"></i>
</div>
<div class=\"span4\">(.+)</div>
</div>
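For reference, the same find pattern can be tried outside the editor. The Python sketch below assumes the blocks have no indentation, exactly as shown above, and uses a replacement string of \1, "\2"; which is my guess at the replace field since it is not given here (in Sublime Text it would presumably be written with $1 and $2). The sample sentences are shortened.

```python
import re

# Two of the blocks from the question, with shortened sentences.
BLOCKS = """<div class="row">
<div class="span1"><i class="fc-icon-letterpress-circle"></i>
</div>
<div class="span4">This design is letterpressed.</div>
</div>
<div class="row">
<div class="span1"><i class="fc-icon-cotton-circle"></i>
</div>
<div class="span4">This product is made from tree-free paper.</div>
</div>"""

# The multi-line find pattern, with its line breaks written as \n.
PATTERN = re.compile(
    r'<div class="row">\n'
    r'<div class="span1"><i class="fc-icon-(.+)"></i>\n'
    r'</div>\n'
    r'<div class="span4">(.+)</div>\n'
    r'</div>'
)

# Assumed replacement: icon name, then the sentence in quotes, then a semicolon.
result = PATTERN.sub(r'\1, "\2";', BLOCKS)
print(result)
```

Because . does not match a newline by default, each (.+) stays on its own line, which is what keeps the two captures from running into the next block.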

Regular expression for exactly one match

I am using the following regular expression in my code editor (sublime text) in order to search for the ASP.NET comments.
<%--.*(\n.*)*--%>
I want this regular expression to stop as soon as the first --%> is found, but it keeps going until the last comment's --%>. My guess is that I have to use some kind of flag to make it stop at the first --%>, but I am unable to figure it out.
Can anyone please tell me how I may modify this regex?
UPDATE
I forgot to post some sample markup. Here it is:
<div class="modal-footer">
<%--<button class="btn" data-dismiss="modal">
Close</button>
<button id="btnAddCountry" class="btn btn-primary" data-dismiss="modal">
Save changes</button>--%>
</div>
</div>
<div class="row-fluid">
<div class="span12">
<div class="box paint_hover">
<div class="title">
<h3>Sale Voucher</span>
</h3>
</div>
<div class="content">
<ul id="tabExample1" class="nav nav-tabs">
<li class="active"><a id="lnkAddEditVoucher" href="#AddEditVoucher" data-toggle="tab">Add/Update Sale Voucher</a></li>
<li><a id="lnkViewVouchers" href="#ViewVouchers" data-toggle="tab">Search Sale Voucher</a></li>
<%-- <li><a id="lnkViewParties" href="#ViewParties" data-toggle="tab">Search Parties</a></li>--%>
</ul>
I just want to match the first comment and not the second one.
You need to make the * quantifiers non-greedy. Usually this is done by adding a ? after them, e.g. .*? instead of just .*.
I've also simplified the regex a bit. Sublime Text supports the (?s) modifier at the beginning of the pattern to make the dot match even newlines:
(?s)<%--.*?--%>
If you prefer matching the newline explicitly:
<%--(.|\n)*?--%>
The problem you seem to have is that you use the greedy version of .*, which matches anything (including --%>). Try using <%--.*?(\n.*?)*?--%> instead to make it non-greedy.
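The difference between the greedy and non-greedy versions can be reproduced with a few lines of Python. This is only a sketch: Python's re module is not the engine Sublime Text uses, although the three patterns below behave the same way in both as far as I know, and the markup is a shortened copy of the question's sample.

```python
import re

# Shortened copy of the question's markup with two separate comments.
MARKUP = """<div class="modal-footer">
<%--<button class="btn" data-dismiss="modal">
Close</button>--%>
</div>
<ul id="tabExample1" class="nav nav-tabs">
<%-- <li><a id="lnkViewParties" href="#ViewParties" data-toggle="tab">Search Parties</a></li>--%>
</ul>"""

# Original pattern: greedy, so it runs on to the LAST --%> in the text.
greedy = re.search(r"<%--.*(\n.*)*--%>", MARKUP).group(0)

# Non-greedy with (?s), so the dot also matches newlines.
lazy = re.search(r"(?s)<%--.*?--%>", MARKUP).group(0)

# Non-greedy with the newline matched explicitly.
explicit_newline = re.search(r"<%--(.|\n)*?--%>", MARKUP).group(0)

print(greedy.count("--%>"))      # number of comment terminators swallowed
print(lazy)
print(lazy == explicit_newline)

# findall with the lazy pattern returns each comment separately.
comments = re.findall(r"(?s)<%--.*?--%>", MARKUP)
print(len(comments))
```

The greedy match spans both comments and everything between them, while the two non-greedy patterns stop at the first --%>.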