Selenium Python UnboundLocalError: local variable 'element' referenced before assignment - python-2.7

I am trying to click on a span tag which contains the text "Clean feed crm"
using an XPATH locator.
I get the error:
UnboundLocalError: local variable 'element' referenced before assignment
Full error trace:
Traceback (most recent call last):
File "C:\Webdriver\ClearCore\TestCases\OperationsPage_TestCase.py", line 56, in test_add_and_run_clean_process
process_lists_page.click_clean_feed_task_from_groups_tab(Globals.process_lists_clean_feed_task_crm)
File "C:\Webdriver\ClearCore\Pages\operations.py", line 90, in click_clean_feed_task_from_groups_tab
clean_feed_crm_element = self.get_element(By.XPATH, '//div[#id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
File "C:\Webdriver\ClearCore 501\Pages\base.py", line 31, in get_element
return element
UnboundLocalError: local variable 'element' referenced before assignment
If i use the absolute full XPATH it works fine. The relative XPATH it shows the error.
The full absolute XPATH which works is:
(By.XPATH, 'html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[7]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[3]/div/div[2]/div/div[1]/div/div/div/div/div[1]/div[1]/div[2]/div/div[1]/div[1]/div/div/div[2]/div/div[2]/span[1]/span')
The relative XPATH which does not work is:
(By.XPATH, '//div[#id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
The HTML is:
<div id="operations_add_process_list_ct_groups_and_tasks" class="GPI5XK1CDG" __gwtcellbasedwidgetimpldispatchingfocus="true" __gwtcellbasedwidgetimpldispatchingblur="true" role="tree">
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-expanded="true" aria-level="1">
<div class="GPI5XK1CIF GPI5XK1CAG" style="padding-left: 0px;">
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-level="2">
<div class="GPI5XK1CIF" style="padding-left: 16px;">
<div class="GPI5XK1CIF GPI5XK1CKF" style="padding-left: 16px;position:relative;" onclick="">
<div style="position:absolute;display:none;"/>
<div class="GPI5XK1CLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0" style="width:16px;height:16px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAC2UlEQVR42mNgQANlZ1PqG84XfA7YYVsG5AqBaK+lFnM0E5RUgXx2IGZkwAbq99ezgCTTj4Wu//jrw/+68/mf3BeZ7ei5XPdv6+M1/y2btBcC5TWghqCCwlPxzZ2Xqz/FHfbe3Xu57uudTzf+H3i+4//eZ1v/H3257//aB4v/a8XKrxE1Esw2LFDzBGrhAGImuAFdl2s+/fjz/f+r7y/+H3qxG6zx0ruz/8++Of5/x5P1/7c8Xv2/9lzu/7Izqf/9N9t8E9bidwZq44d5h9Fvo9XMlffn/z/x6iBY040Pl/8vuTvj/7bH68BiMENAdOnplP/SdqJTgfr0wd7xXmnu6DTTcOmcWxP+73m25f+tj9f+B+6w/afoJb5TNVRmHVDDl11PN/0HhQPQm//NazQ/sQuzgwxwB2IehsiDLr9AzgXZClJ49f0FYIDpXAVKLgHiCu81lis3PloBNgAYoP8NclVuc0lwTgbKOQAxN0PIPvuXR17u+X/wxS6wE0++PgyKgT9mNRpL/TZat0253v5m3cMl/0GGgLy56v6C/4YFqg+Bmr3B4WDdot3usdjsg+96K7AhIFeAYgCkCaQBRIP8D2JXns0AG6CdqHAXqLkYiGVBgSjMq8CdELjd9jvIAFDAgTDIyTA2SBPI/0CNT2UcRHfwynLNRTaAxahQrQ8U1yvuzQX7E2TrmgeLwDHRdKH4/9K7s/5Pu9H1X85VfC9Q/UwgzgVie2hUMjDopSs5A13wy3GywX+TMvXHwFj4DfJz4mG//1bNOv8yjoX+m3trIsjvT4HKa6FRyAfEzLC0xCnvKp4ibiI4CcieoREpe7DjUuV/0wqNn8yczEsF1XkXgdiiBoKrgPLlQCyHnidAHBEgtgRiJzlXiUagS35J24psAfJBhkYwczAnQjUHgsINW34CpW1OcOKAGOYBypxAHAXE8kAsBsSKUDkWBgIApEAISSMr1JWM6E4HAJKeit5kyDtvAAAAAElFTkSuQmCC) no-repeat 0px 0px;" src="http://justin-pc.infoshare.local:8080/clearcore501/ClearCore/clear.cache.gif" onload="this.__gwtLastUnhandledEvent="load";"/>
</div>
<div>
<span>
<span class=" myinlineblock" title="Clean feed crm" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;width:100%;margin-right:-14px;">Clean feed crm</span>
</span>
<span>
<span class="" title="Turn task off or on." style="">
<input type="checkbox" checked="" tabindex="-1"/>
</span>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="2" aria-level="2">
<div class="GPI5XK1CIF" style="padding-left: 16px;">
<div class="GPI5XK1CIF GPI5XK1CKF" style="padding-left: 16px;position:relative;" onclick="">
<div style="position:absolute;display:none;"/>
<div class="GPI5XK1CLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0" style="width:16px;height:16px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAC2UlEQVR42mNgQANlZ1PqG84XfA7YYVsG5AqBaK+lFnM0E5RUgXx2IGZkwAbq99ezgCTTj4Wu//jrw/+68/mf3BeZ7ei5XPdv6+M1/y2btBcC5TWghqCCwlPxzZ2Xqz/FHfbe3Xu57uudTzf+H3i+4//eZ1v/H3257//aB4v/a8XKrxE1Esw2LFDzBGrhAGImuAFdl2s+/fjz/f+r7y/+H3qxG6zx0ruz/8++Of5/x5P1/7c8Xv2/9lzu/7Izqf/9N9t8E9bidwZq44d5h9Fvo9XMlffn/z/x6iBY040Pl/8vuTvj/7bH68BiMENAdOnplP/SdqJTgfr0wd7xXmnu6DTTcOmcWxP+73m25f+tj9f+B+6w/afoJb5TNVRmHVDDl11PN/0HhQPQm//NazQ/sQuzgwxwB2IehsiDLr9AzgXZClJ49f0FYIDpXAVKLgHiCu81lis3PloBNgAYoP8NclVuc0lwTgbKOQAxN0PIPvuXR17u+X/wxS6wE0++PgyKgT9mNRpL/TZat0253v5m3cMl/0GGgLy56v6C/4YFqg+Bmr3B4WDdot3usdjsg+96K7AhIFeAYgCkCaQBRIP8D2JXns0AG6CdqHAXqLkYiGVBgSjMq8CdELjd9jvIAFDAgTDIyTA2SBPI/0CNT2UcRHfwynLNRTaAxahQrQ8U1yvuzQX7E2TrmgeLwDHRdKH4/9K7s/5Pu9H1X85VfC9Q/UwgzgVie2hUMjDopSs5A13wy3GywX+TMvXHwFj4DfJz4mG//1bNOv8yjoX+m3trIsjvT4HKa6FRyAfEzLC0xCnvKp4ibiI4CcieoREpe7DjUuV/0wqNn8yczEsF1XkXgdiiBoKrgPLlQCyHnidAHBEgtgRiJzlXiUagS35J24psAfJBhkYwczAnQjUHgsINW34CpW1OcOKAGOYBypxAHAXE8kAsBsSKUDkWBgIApEAISSMr1JWM6E4HAJKeit5kyDtvAAAAAElFTkSuQmCC) no-repeat 0px 0px;" src="http://justin-pc.infoshare.local:8080/clearcore501/ClearCore/clear.cache.gif" onload="this.__gwtLastUnhandledEvent="load";"/>
</div>
<div>
<span>
<span class=" myinlineblock" title="Clean feed escr" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;width:100%;margin-right:-14px;">Clean feed escr</span>
</span>
<span>
<span class="" title="Turn task off or on." style="">
<input type="checkbox" checked="" tabindex="-1"/>
</span>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
My method implementation is:
def click_clean_feed_task_from_groups_tab(self, feed):
# Params: feed: clean feed crm, clean feed escr or clean feed orchard
#clean_feed_crm_element = self.driver.find_element(By.XPATH, '//span[#class="myinlineblock" and contains(text(), "%s") % feed]')
clean_feed_crm_element = self.get_element(By.XPATH, '//div[#id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
#clean_feed_crm_element = WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//div[#id="operations_add_process_list_ct_groups_and_tasks"]//..//.//..//..//..//..//..//..//../span[contains(text(), "%s")] % feed ]')))
clean_feed_crm_element.click()
return self
From my TestCase class i call th method:
project_navigator = ProjectNavigatorPage(self.driver)
process_lists_page = project_navigator.select_projectNavigator_item("Process Lists")
process_lists_page.click_add_button_for_process_lists()
process_lists_page.click_clean_task_arrow_to_expand_it_from_groups_tab("add")
process_lists_page.click_clean_feed_task_from_groups_tab(Globals.process_lists_clean_feed_task_crm)
Globals.py is:
process_lists_clean_feed_task_crm = "Clean feed crm"
I havea also tried using WebDriverWait still the same error:
clean_feed_crm_element = WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable(((By.XPATH, '//div[#id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "%s") % feed]')))
%s, % feed the value is "Clean feed crm" as I am looking for this text (passed in as a parameter into my method.
What am i doing wrong? What XPATH could i use then to click the element which has the text "Clean feed crm"?
Thanks,
Riaz

If we recall some elements from the XPath sintax:
The expression "//" selects nodes in the document from the current
node that match the selection no matter where they are.
The expression ".." selects the parent of the current node.
Therefore when you write:
//div[#id="operations_add_process_list_ct_groups_and_tasks"]//..
You are selecting the div node itself. From that node the relative XPath should be:
//div[#id="operations_add_process_list_ct_groups_and_tasks"]//span[contains(text(), "Clean feed crm")]
That way you select the div node with the id selected, and look inside for the span tag which contains the text.

Related

Parallax Not functioning properly : Django

I have downloaded the template : http://keenthemes.com/preview/megakit/
On the index page there is a parallax :
<div class="js__parallax-window" style="background: url(img/1920x1080/04.jpg) 50% 0 no-repeat fixed;">
<div class="container g-text-center--xs g-padding-y-80--xs g-padding-y-125--sm">
<p class="text-uppercase g-font-size-14--xs g-font-weight--700 g-color--white-opacity g-letter-spacing--2 g-margin-b-50--xs">Testimonials</p>
<div class="s-swiper js__swiper-testimonials">
<!-- Swiper Wrapper -->
<div class="swiper-wrapper g-margin-b-50--xs">
<div class="swiper-slide g-padding-x-130--sm g-padding-x-150--lg">
<div class="g-padding-x-20--xs g-padding-x-50--lg">
<div class="g-margin-b-40--xs">
<p class="g-font-size-22--xs g-font-size-28--sm g-color--white"><i>" I have purchased many great templates over the years but this product and this company have taken it to the next level. Exceptional customizability. "</i></p>
</div>
<div class="center-block g-hor-divider__solid--white-opacity-lightest g-width-100--xs g-margin-b-30--xs"></div>
<h4 class="g-font-size-15--xs g-font-size-18--sm g-color--white-opacity-light g-margin-b-5--xs">Jake Richardson / Google</h4>
</div>
</div>
<div class="swiper-slide g-padding-x-130--sm g-padding-x-150--lg">
<div class="g-padding-x-20--xs g-padding-x-50--lg">
<div class="g-margin-b-40--xs">
<p class="g-font-size-22--xs g-font-size-28--sm g-color--white"><i>" I have purchased many great templates over the years but this product and this company have taken it to the next level. Exceptional customizability. "</i></p>
</div>
<div class="center-block g-hor-divider__solid--white-opacity-lightest g-width-100--xs g-margin-b-30--xs"></div>
<h4 class="g-font-size-15--xs g-font-size-18--sm g-color--white-opacity-light g-margin-b-5--xs">Jake Richardson / Google</h4>
</div>
</div>
<div class="swiper-slide g-padding-x-130--sm g-padding-x-150--lg">
<div class="g-padding-x-20--xs g-padding-x-50--lg">
<div class="g-margin-b-40--xs">
<p class="g-font-size-22--xs g-font-size-28--sm g-color--white"><i>" I have purchased many great templates over the years but this product and this company have taken it to the next level. Exceptional customizability. "</i></p>
</div>
<div class="center-block g-hor-divider__solid--white-opacity-lightest g-width-100--xs g-margin-b-30--xs"></div>
<h4 class="g-font-size-15--xs g-font-size-18--sm g-color--white-opacity-light g-margin-b-5--xs">Jake Richardson / Google</h4>
</div>
</div>
</div>
<!-- End Swipper Wrapper -->
<!-- Arrows -->
<div class="g-font-size-22--xs g-color--white-opacity js__swiper-fraction"></div>
<!-- End Arrows -->
</div>
</div>
</div>
The Parallax image is basically defined with properties in the line :
<div class="js__parallax-window" style="background: url(img/1920x1080/04.jpg) 50% 0 no-repeat fixed;">
Now to use the same in django I changed the url of the image to following:
<div class="js__parallax-window" style="background: url({% static 'img_events/1920x1080/04.jpg' %}) 50% 0 no-repeat fixed;" >
The only problem the parallax is not working properly in this case, the parallax image size should only be 50% but it causes a mismatch in the height and placement of the testimonials and the background image whereas the same code works in the template
The only problem here was that <!DOCTYPE html> was not included at the beginning of the page.

Extract information from all matching nodes without looping xpath

<ul class="products-grid">
<li class="item">
<div class="product-block">
<div class="product-block-inner">
<img src="#/producta.jpg">
<h2 class="product-name">Product A</h2>
<div class="price-box">
<span class="regular-price" id="#">
<span class="price">Rs 1,849</span>
</span>
</div>
</div>
</div>
</li>
<li class="item">
<div class="product-block">
<div class="product-block-inner">
<img src="#/productb.jpg">
<h2 class="product-name">Product B</h2>
<div class="price-box">
<span class="regular-price" id="#">
<span class="price">Rs 1,849</span>
</span>
</div>
</div>
</div>
</li>
</ul>
I am at this moment scraping the item in a loop.
products = response.xpath('//ul[#class="products-grid"]//li//div[#class="product-block"]//div[#class="product-block-inner"]').extract()
After getting the product-block-inner node, I save it into products and then I will have to loop like
for product in products:
// parse the div.product-block-inner further deep down
// to get name, price, image etc
// and save it to a dict and yeild
pass
Is this possible that i get text, href for all div.product-block-inner in the final list without looping
Yes, but it's very confusing, for example you could try this:
products = response.xpath(
'//ul[#class="products-grid"]//li//div[#class="product-block"]//div[#class="product-block-inner"]'
).css(
'.product-name a::attr(href), .product-name a::text, .price::text'
).extract()
but I would suggest to always loop (btw, why do you call extract() when you assign it to products?)
products = response.xpath(
'//ul[#class="products-grid"]//li//div[#class="product-block"]//div[#class="product-block-inner"]'
)
for product in products:
yield {'name': product.css('.product-name a::text').extract_first()
'url': product.css('.product-name a::attr(href)').extract_first()
'price': product.css('.price::text').extract_first()}
(I've used css selectors in this case because the equivalent xpaths are longer, but the same can also be achieved using xpath)

Selenium Python Xpath how to select the correct span text from many nested div tags

I have a web page with a left hand menu. It is made up of many div tags.
I have noticed when my Selenium Python script runs it is not clicking the text I want clicked from the left hand menu. It is clicking something else.
My Xpath is not correct.
I would like to locate the text "Statistics" (it is in a div\span tag) which has the parent div text "Analysis"
It is not clicking the correct text "Statistics" because there maybe another "Statistics" somewhere in the HTML source. If i start from the div tag which has the text "Analysis" and then find the text "Statistics" then I will get the correct element.
My Xpath is:
.//div//span[#title="Analysis"]/following::div[5]//span[text()="Statistics"]
The HTML is:
<div>
<span class="" title="Analysis"
style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;">Analysis</span>
</div>
</div>
</div>
</div>
</div>
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-expanded="false"
aria-level="2">
<div class="GJPPK2LBIF" style="padding-left: 16px;">
<div class="GJPPK2LBIF GJPPK2LBKF" style="padding-left: 16px;position:relative;" onclick="">
<div class="GJPPK2LBJF" style="left: 0px;width: 15px;height: 15px;position:absolute;">
<img border="0"
style="width:15px;height:15px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA8AAAAPCAYAAAA7/HbnjJn53wAAAABJRU5ErkJggg==) no-repeat 0px 0px;"
src="http://test1:8080/clearcore/ClearCore/clear.cache.gif"
onload="this.__gwtLastUnhandledEvent=" load";"/>
</div>
<div class="GJPPK2LBLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0"
style="width:16px;height:16px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAABmJLR0QA/wD/AP+gvaekJggg==) no-repeat 0px 0px;"
src="http://test1:8080/clearcore/ClearCore/clear.cache.gif"
onload="this.__gwtLastUnhandledEvent=" load";"/>
</div>
<div>
<span class="" title="Statistics"
style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;">Statistics</span>
</div>
</div>
</div>
</div>
</div>
</div>
Thanks,
Riaz
If you have FireFox with FirePath you can test the xpath and see how many and which matches you get. For instance:
//span[text()="Statistics"]
This may result in 1 matching node but also in more. Let's assume there's two matches and the one you want is the second one. Then you'd choose:
//span[text()="Statistics"][2]

perl regex for complex multiline search replace

I know there are many questions on this topic, but most are fairly trivial and
I'm unable to find a solution for my case.
I have a set of HTML files with many, many "media" items like the following,
each of which is a "paragraph", separated by "\n\n". Here is a link to a sample file of the type I'm working on.
<li class="media">
<div class="media-left">
<a href="#">
<img class="media-object" src="4_17-HE-assoc.png" width="250" alt="...">
</a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 4.17</h4>
Association plot for the hair-color eye-color data. Left: marginal table, collapsed over
gender; right: full table.
</div>
</li>
For each <img ...> tag, I need to find the src="file" value, and replace the href="#" on the previous line
by href="file" class="fancybox. i.e., so that item will then look like
<li class="media">
<div class="media-left">
<a href="4_17-HE-assoc.png" class="fancybox">
<img class="media-object" src="4_17-HE-assoc.png" width="250" alt="...">
</a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 4.17</h4>
Association plot for the hair-color eye-color data. Left: marginal table, collapsed over
gender; right: full table.
</div>
</li>
I tried the following as a one-liner, but it has no effect, i.e., it doesn't make the changes.
perl -pi~ -e '$/ = "";s|<a href="#">\n(\s*<img class="media object") src=(".*png")|<a class="fancybox" href="\2">\n\1 src=\2|ms' ch03.html
Can someone help with this? I'd be happy with a simple script that I could
use for this and modify for other similar modifications of a collection of web files.
edit: I'm aware of the advantages of using perl modules such as HTML::TreeBuilder to avoid having to parse HTML directly. If someone
could give me a start, I could probably take it from there.
use XML::LibXML qw( );
my $qfn = 'ch03.html';
my $in_qfn = $qfn . "~";
my $out_qfn = $qfn;
rename($qfn, $in_qfn)
or die("Can't rename \"qfn\": $!\n");
my $parser = XML::LibXML->new();
my $doc = $parser->parse_html_file($in_qfn);
for my $a_node ($doc->findnodes('//a[#href="#"]')) {
my ($src_node) = $a_node->findnodes('img[1]/#src')
or next;
$a_node->setAttribute('href', $src_node->value());
$a_node->setAttribute('class', 'fancybox');
}
my $html = $doc->toStringHTML();
open(my $fh, '>', $out_qfn)
or die("Can't create \"$out_qfn\": $!\n");
print($fh $html);
Tested:
$ diff -u ch03.html{~,}
--- ch03.html~ 2016-01-20 12:41:30.809203040 -0800
+++ ch03.html 2016-01-20 12:41:31.009201042 -0800
## -1,7 +1,7 ##
-<div class="contents">
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
+<html><body><div class="contents">
<h1 class="tocpage">Chapter 3: Fitting and Graphing Discrete Distributions</h1>
<hr class="tocpage">
-
<div class="row">
<div class="col-md-6">
<!-- prelude-inserted -->
## -18,7 +18,7 ##
<div class="col-md-6">
<h3>Contents</h3>
<dl class="chaptoc">
- <dd>3.1. Introduction to discrete distributions</dd>
+<dd>3.1. Introduction to discrete distributions</dd>
<dd>3.2. Characteristics of discrete distributions</dd>
<dd>3.3. Fitting discrete distributions</dd>
<dd>3.4. Diagnosing discrete distributions: Ord plots</dd>
## -27,8 +27,7 ##
<dd>3.7. Chapter summary</dd>
<dd>3.8. Lab exercises</dd>
</dl>
-
- </div>
+</div>
</div>
<!-- more-content -->
## -38,11 +37,10 ##
<h3>Selected figures</h3>
<a class="btn btn-primary" href="../../Rcode/ch03.R" role="button">view R code</a>
<ul class="media-list">
- <li class="media">
+<li class="media">
<div class="media-left">
- <a href="#">
- <img class="media-object" src="saxony-barplot.png" width="250" alt="males in Saxony families">
- </a>
+ <a href="saxony-barplot.png" class="fancybox">
+ <img class="media-object" src="saxony-barplot.png" width="250" alt="males in Saxony families"></a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 3.2</h4>
## -52,9 +50,8 ##
<li class="media">
<div class="media-left">
- <a href="#">
- <img class="media-object" src="dbinom2-plot2-1.png" width="250" alt="Binomial distributions">
- </a>
+ <a href="dbinom2-plot2-1.png" class="fancybox">
+ <img class="media-object" src="dbinom2-plot2-1.png" width="250" alt="Binomial distributions"></a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 3.9</h4>
## -64,9 +61,8 ##
<li class="media">
<div class="media-left">
- <a href="#">
- <img class="media-object" src="dpois-xyplot2-1.png" width="250" alt="Poisson distributions">
- </a>
+ <a href="dpois-xyplot2-1.png" class="fancybox">
+ <img class="media-object" src="dpois-xyplot2-1.png" width="250" alt="Poisson distributions"></a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 3.11</h4>
## -76,9 +72,8 ##
<li class="media">
<div class="media-left">
- <a href="#">
- <img class="media-object" src="Fed0-plots2-1.png" width="250" alt="Hanging rootogram">
- </a>
+ <a href="Fed0-plots2-1.png" class="fancybox">
+ <img class="media-object" src="Fed0-plots2-1.png" width="250" alt="Hanging rootogram"></a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 3.15</h4>
## -89,9 +84,8 ##
<li class="media">
<div class="media-left">
- <a href="#">
- <img class="media-object" src="ordplot1-1.png" width="250" alt="Ord plot for the Butterfly data">
- </a>
+ <a href="ordplot1-1.png" class="fancybox">
+ <img class="media-object" src="ordplot1-1.png" width="250" alt="Ord plot for the Butterfly data"></a>
</div>
<div class="media-body">
<h4 class="media-heading">Figure 3.18</h4>
## -100,9 +94,10 ##
</div>
</li>
- </ul> <!-- media-list -->
- </div> <!-- col-md-12 -->
+ </ul>
+<!-- media-list -->
+</div> <!-- col-md-12 -->
<!-- footer -->
</div> <!-- row -->
-</div>
+</div></body></html>
I couldn't resist but write this one-off, super unstable, sends-me-to-parse-html-with-regex-hell sed command:
sed -i.bak '/<a href="#"/ {
N
/\n.*<img class=/ {
s/^\( *<a href="\).*\(\n.*src="\)\([^"]*\)\(.*\)/\1\3" class="fancybox">\2\3\4/
}
}' ch03.html
This looks for a line with href="#", appends the next line and then substitutes the filename and fancybox into the a tag.
Diffing the result and the input file:
43c43
< <a href="#">
---
> <a href="saxony-barplot.png" class="fancybox">
55c55
< <a href="#">
---
> <a href="dbinom2-plot2-1.png" class="fancybox">
67c67
< <a href="#">
---
> <a href="dpois-xyplot2-1.png" class="fancybox">
79c79
< <a href="#">
---
> <a href="Fed0-plots2-1.png" class="fancybox">

Extracting all dojo attach point values from HTML

I have a saved HTML page which I've opened in notepad++. I would like to extract all the attach points out of the html file. Example from the HTML below:
<div class="contentBar">
<div class="banner" style="">
<span class="bannerRepeat"></span>
<span class="bannerDecal"></span>
</div>
<div>
<div class="logo" data-dojo-attach-point="pageLogoPt">
ABC
</div>
<div class="title" data-dojo-attach-point="pageTitlePt">
ABC
</div>
<div class="userPane">
<div>
<span class="LoginCell LoginText"><span data-dojo-attach-point="welcomeBlockPt">Welcome</span>, <b data-dojo-attach-point="usernameBlockPt">User Name</b></span>
<span widgetid="acme_Button_0" id="acme_Button_0" class="LoginCell Button" data-dojo-type="acme.Button" data-dojo-props="size: 'small'" data-dojo-attach-point="logOutButtonPt"><span widgetid="dijit_form_Button_0" class="dijit dijitReset dijitInline dijitButton ButtonSmall" role="presentation"><span class="dijitReset dijitInline dijitButtonNode" data-dojo-attach-event="ondijitclick:__onClick" role="presentation"><span style="-moz-user-select: none;" aria-disabled="false" id="dijit_form_Button_0" tabindex="0" class="dijitReset dijitStretch dijitButtonContents" data-dojo-attach-point="titleNode,focusNode" role="button" aria-labelledby="dijit_form_Button_0_label"><span class="dijitReset dijitInline dijitIcon dijitNoIcon" data-dojo-attach-point="iconNode"></span><span class="dijitReset dijitToggleButtonIconChar">●</span><span class="dijitReset dijitInline dijitButtonText" id="dijit_form_Button_0_label" data-dojo-attach-point="containerNode">Logout</span></span></span><input value="" class="dijitOffScreen" data-dojo-attach-event="onclick:_onClick" tabindex="-1" role="presentation" aria-hidden="true" data-dojo-attach-point="valueNode" type="button"></span></span>
</div>
<div>
<span id="printLink" style="display:none;">Print</span>
<span id="zoomPercentageDisplay"><span data-dojo-attach-point="zoomBlockPt">Zoom</span>: 100%</span>
<span id="smallFontSizeLink" style="font-size: .8em;">A</span>
<span id="defaultFontSizeLink" style="font-size: 1em;">AA</span>
<span id="largeFontSizeLink" style="font-size: 1.2em;">AAA</span>
</div>
</div>
</div>
</div>
I would like to get:
pageLogoPt
pageTitlePt
welcomeBlockPt
usernameBlockPt
etc ...
Is this possible? Thanks
You can do the following:
Replace (data-dojo-attach-point="[^"]+)(?=") with \n\1\n. This will put what you're looking for on separate lines.
Mark All based on the regex data-dojo-attach-point="[^"]+. Tick "Bookmark line" checkbox.
Search -> Bookmark -> Remove Unmarked Lines
Replace data-dojo-attach-point=" with blank.
This will give you your list with each item in its own line.
Tested on Notepad++ 6.8.8.
Inspired by https://superuser.com/questions/477628/export-all-regular-expression-matches-in-textpad-or-notepad-as-a-list.