Extracting all dojo attach point values from HTML - regex

I have a saved HTML page which I've opened in Notepad++. I would like to extract all the attach points from the HTML file. Here is an example of the HTML:
<div class="contentBar">
<div class="banner" style="">
<span class="bannerRepeat"></span>
<span class="bannerDecal"></span>
</div>
<div>
<div class="logo" data-dojo-attach-point="pageLogoPt">
ABC
</div>
<div class="title" data-dojo-attach-point="pageTitlePt">
ABC
</div>
<div class="userPane">
<div>
<span class="LoginCell LoginText"><span data-dojo-attach-point="welcomeBlockPt">Welcome</span>, <b data-dojo-attach-point="usernameBlockPt">User Name</b></span>
<span widgetid="acme_Button_0" id="acme_Button_0" class="LoginCell Button" data-dojo-type="acme.Button" data-dojo-props="size: 'small'" data-dojo-attach-point="logOutButtonPt"><span widgetid="dijit_form_Button_0" class="dijit dijitReset dijitInline dijitButton ButtonSmall" role="presentation"><span class="dijitReset dijitInline dijitButtonNode" data-dojo-attach-event="ondijitclick:__onClick" role="presentation"><span style="-moz-user-select: none;" aria-disabled="false" id="dijit_form_Button_0" tabindex="0" class="dijitReset dijitStretch dijitButtonContents" data-dojo-attach-point="titleNode,focusNode" role="button" aria-labelledby="dijit_form_Button_0_label"><span class="dijitReset dijitInline dijitIcon dijitNoIcon" data-dojo-attach-point="iconNode"></span><span class="dijitReset dijitToggleButtonIconChar">●</span><span class="dijitReset dijitInline dijitButtonText" id="dijit_form_Button_0_label" data-dojo-attach-point="containerNode">Logout</span></span></span><input value="" class="dijitOffScreen" data-dojo-attach-event="onclick:_onClick" tabindex="-1" role="presentation" aria-hidden="true" data-dojo-attach-point="valueNode" type="button"></span></span>
</div>
<div>
<span id="printLink" style="display:none;">Print</span>
<span id="zoomPercentageDisplay"><span data-dojo-attach-point="zoomBlockPt">Zoom</span>: 100%</span>
<span id="smallFontSizeLink" style="font-size: .8em;">A</span>
<span id="defaultFontSizeLink" style="font-size: 1em;">AA</span>
<span id="largeFontSizeLink" style="font-size: 1.2em;">AAA</span>
</div>
</div>
</div>
</div>
I would like to get:
pageLogoPt
pageTitlePt
welcomeBlockPt
usernameBlockPt
etc ...
Is this possible? Thanks

You can do the following (set Search Mode to "Regular expression" for the regex steps):
1. Replace (data-dojo-attach-point="[^"]+)(?=") with \n\1\n. This puts each attach point on its own line.
2. Mark All using the regex data-dojo-attach-point="[^"]+ with the "Bookmark line" checkbox ticked.
3. Search -> Bookmark -> Remove Unmarked Lines.
4. Replace data-dojo-attach-point=" with nothing.
This gives you your list with each item on its own line. Attributes that hold several names, such as titleNode,focusNode, stay together on one line.
Tested on Notepad++ 6.8.8.
Inspired by https://superuser.com/questions/477628/export-all-regular-expression-matches-in-textpad-or-notepad-as-a-list.
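If a scripting language is available, the same extraction can be done outside Notepad++. Below is a minimal Python sketch using the standard re module. It assumes the attribute values are always double-quoted, as in the sample, and splits comma-separated values such as titleNode,focusNode so each attach point gets its own entry. The function name extract_attach_points is hypothetical.

```python
import re

# Assumes attributes look like data-dojo-attach-point="..." (double quotes).
ATTACH_POINT = re.compile(r'data-dojo-attach-point="([^"]+)"')


def extract_attach_points(html):
    """Return every attach point name found in html, in document order."""
    names = []
    for value in ATTACH_POINT.findall(html):
        # One attribute can list several names, e.g. "titleNode,focusNode".
        names.extend(part.strip() for part in value.split(",") if part.strip())
    return names


sample = ('<div class="logo" data-dojo-attach-point="pageLogoPt">ABC</div>'
          '<span data-dojo-attach-point="titleNode,focusNode"></span>')
for name in extract_attach_points(sample):
    print(name)
```

Running it over the whole saved page prints one name per line, which matches the list the question asks for.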

Related

Radio list appears horizontally on IE when it should have been vertical

The radio button list should appear vertically rather than horizontally, but IE 11 shows it horizontally. It appears correctly in Chrome and Firefox. We are using AngularJS 1.7.
[Screenshot: radio group in IE 11]
[Screenshot: radio group in Firefox 60.3.0 ESR]
HTML code snippet for the radio group:
<div ng-if="feedbackChoices && feedbackChoices.length > 1" id="629934096_not_helpful" class="choiceText f-b-choices-body ng-scope" ng-show="showChoices">
<br>
<div class="responsePrompt ng-scope" translate="">Where did we go wrong?</div>
<br>
<ul class="selectionRadio f-b-choices selectionGroup629934096_feedback_choices">
<!-- ngRepeat: choice in feedbackChoices --><li ng-repeat="choice in feedbackChoices" class="ng-scope">
<div>
<label><input id="948060864" class="f-b-input" name="selectionGroup" ng-click="submitFeedbackChoice(choice)" type="radio">
<label for="948060864" class="f-b-choices-label ng-scope ng-binding" translate="">The instructions did not apply</label>
</label>
</div>
</li><!-- end ngRepeat: choice in feedbackChoices --><li ng-repeat="choice in feedbackChoices" class="ng-scope">
<div>
<label><input id="948060864_2" class="f-b-input" name="selectionGroup" ng-click="submitFeedbackChoice(choice)" type="radio">
<label for="948060864_2" class="f-b-choices-label ng-scope ng-binding" translate="">The instructions were too long/hard to follow</label>
</label>
</div>
</li><!-- end ngRepeat: choice in feedbackChoices --><li ng-repeat="choice in feedbackChoices" class="ng-scope">
<div>
<label><input id="948060864_3" class="f-b-input" name="selectionGroup" ng-click="submitFeedbackChoice(choice)" type="radio">
<label for="948060864_3" class="f-b-choices-label ng-scope ng-binding" translate="">I need you to do this for me</label>
</label>
</div>
</li><!-- end ngRepeat: choice in feedbackChoices -->
</ul>
</div>

preg_replace regular expression to replace links within particular tags

I need some help: I want to replace the href links with my own link, but only within a particular div class.
<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb">
<b class="icon-star"></b> N/A
</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>
</div>
Here I want to change http://oldsite.com/ to http://newsite.com/?id=
I want the href links to look like this:
<a href="http://newsite.com/?id=the-fate-of-the-furious">
Please help me with a preg_replace regular expression.
Thanks
This may help you:
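// WordPress template tags: the current post's content and its permalink
$content = get_the_content();
// Match the URL inside href="..." or href='...' without consuming the quotes
$pattern = "/(?<=href=(\"|'))[^\"']+(?=(\"|'))/";
$newurl = get_permalink();
// Replace every href URL in the content with the permalink
$content = preg_replace($pattern,$newurl,$content);
echo $content;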
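For comparison, here is a minimal Python sketch of the same lookbehind/lookahead replacement using the standard re module. The sample anchor and the replacement URL are hypothetical, and, like the PHP pattern, it rewrites every href in the content, not only those inside a particular div.

```python
import re

# Match the URL between href=" (or href=') and the closing quote,
# without consuming the quotes themselves (fixed-width lookbehind).
HREF_URL = re.compile(r"""(?<=href=["'])[^"']+(?=["'])""")


def replace_hrefs(content, new_url):
    """Replace the URL of every href attribute in content with new_url."""
    # A function replacement keeps backslashes in new_url from being interpreted.
    return HREF_URL.sub(lambda match: new_url, content)


html = '<a href="http://oldsite.com/the-fate-of-the-furious">Watch</a>'
print(replace_hrefs(html, "http://newsite.com/?id=the-fate-of-the-furious"))
```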
Lookbehinds are comparatively expensive; use \K to restart the full-string match and avoid a capture group.
The pattern <a href="\K[^"]+\/ should be very efficient. Note that it will match ALL <a href URLs, not only those inside a particular div. It also matches greedily up to the last / in the URL; I assume this is okay given your input sample.
Code:
$in='<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb"><b class="icon-star"></b> N/A</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>';
echo preg_replace('/<a href="\K[^"]+\//','http://newsite.com/?id=',$in);
Output:
<div id="slider1" class="owl-carousel owl-theme">
<div class="item">
<div class="imagens">
<img src="https://image.oldste.org" alt="The Fate of the Furious" width="100%" height="100%" />
<span class="imdb"><b class="icon-star"></b> N/A</span>
</div>
<span class="ttps">The Fate of the Furious</span>
<span class="ytps">2017</span>
</div>
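Python's standard re module does not support \K, so a sketch of the same replacement keeps the <a href=" prefix in a capture group and writes it back. The sample anchor is hypothetical (the posted HTML contains no <a> tag), and rebase_links is an illustrative name.

```python
import re

# <a href=" is captured and re-inserted; [^"]+/ greedily runs to the last "/"
# inside the quoted URL, just like the PHP pattern.
OLD_BASE = re.compile(r'(<a href=")[^"]+/')


def rebase_links(html, new_base="http://newsite.com/?id="):
    """Replace everything up to the last '/' of each <a href URL with new_base."""
    return OLD_BASE.sub(lambda m: m.group(1) + new_base, html)


html = '<a href="http://oldsite.com/the-fate-of-the-furious">The Fate of the Furious</a>'
print(rebase_links(html))
```

Because the match starts at <a href=", image sources such as https://image.oldste.org are left alone.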

jQuery regex: get several keys, not only one

I would like to get:
PA-1400-11PA ADP-40PH ABA
Here is the HTML code:
</div>
<div class="ref">
<h2 id='affiche_sous_titre'>eee :</h2> <p>
<a href='eee' title='PA-1400-11PA' class='lien_menu'>PA-1400-11PA</a> - <a href='uuu' title='ADP-40PH ABA' class='lien_menu'>ADP-40PH ABA</a> </p>
</div>
<div class="modele_tout">
</div>
<div class="star-customer">
Here is my regex code:
line=line.replace(/[\"\']lien_menu[\"\']>(.*?)<\/a>/ig,"$1\n")
But I only get:
ADP-40PH ABA
What is the problem? I don't understand.
Thanks for your help
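Without more of the surrounding code it is hard to say why only one value comes back, but if the goal is a list of values, collecting every match is more direct than rewriting the line with replace(). A minimal sketch in Python with re.findall, assuming the anchors keep the class='lien_menu' markup from the sample; in JavaScript, String.prototype.matchAll with a /g regex plays a similar role.

```python
import re

# Capture the link text of every <a ... class='lien_menu'>...</a> on the line.
LIEN_MENU = re.compile(r"""["']lien_menu["']>(.*?)</a>""", re.IGNORECASE)

line = ("<a href='eee' title='PA-1400-11PA' class='lien_menu'>PA-1400-11PA</a> - "
        "<a href='uuu' title='ADP-40PH ABA' class='lien_menu'>ADP-40PH ABA</a>")

# findall returns the captured text of every match, in order.
keys = LIEN_MENU.findall(line)
print(" ".join(keys))
```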

Selenium Python XPath: how to select the correct span text from many nested div tags

I have a web page with a left hand menu. It is made up of many div tags.
I have noticed that when my Selenium Python script runs, it does not click the text I want in the left-hand menu; it clicks something else.
My XPath is not correct.
I would like to locate the text "Statistics" (it is in a div/span tag) that belongs to the parent div with the text "Analysis".
It is not clicking the correct "Statistics" because there may be another "Statistics" somewhere in the HTML source. If I start from the div that has the text "Analysis" and then find the text "Statistics", I will get the correct element.
My XPath is:
.//div//span[@title="Analysis"]/following::div[5]//span[text()="Statistics"]
The HTML is:
<div>
<span class="" title="Analysis"
style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;">Analysis</span>
</div>
</div>
</div>
</div>
</div>
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-expanded="false"
aria-level="2">
<div class="GJPPK2LBIF" style="padding-left: 16px;">
<div class="GJPPK2LBIF GJPPK2LBKF" style="padding-left: 16px;position:relative;" onclick="">
<div class="GJPPK2LBJF" style="left: 0px;width: 15px;height: 15px;position:absolute;">
<img border="0"
style="width:15px;height:15px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA8AAAAPCAYAAAA7/HbnjJn53wAAAABJRU5ErkJggg==) no-repeat 0px 0px;"
src="http://test1:8080/clearcore/ClearCore/clear.cache.gif"
onload="this.__gwtLastUnhandledEvent=" load";"/>
</div>
<div class="GJPPK2LBLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0"
style="width:16px;height:16px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAABmJLR0QA/wD/AP+gvaekJggg==) no-repeat 0px 0px;"
src="http://test1:8080/clearcore/ClearCore/clear.cache.gif"
onload="this.__gwtLastUnhandledEvent=" load";"/>
</div>
<div>
<span class="" title="Statistics"
style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;">Statistics</span>
</div>
</div>
</div>
</div>
</div>
</div>
Thanks,
Riaz
If you have Firefox with FirePath, you can test the XPath and see how many matches you get and which nodes they are. For instance:
//span[text()="Statistics"]
This may return one matching node, or more. Suppose there are two matches and the one you want is the second. Wrap the expression in parentheses so the index applies to the whole result set rather than to each parent's children:
(//span[text()="Statistics"])[2]
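Picking a match by index breaks as soon as the page gains another "Statistics" entry. Anchoring on "Analysis", as the question suggests, can be written in Selenium as //span[@title="Analysis"]/following::span[text()="Statistics"][1], which should select the first "Statistics" span after the "Analysis" span in document order. Python's xml.etree.ElementTree has no following:: axis, so this minimal sketch emulates it by walking the tree in document order. The menu markup and the ids are hypothetical, added only to tell the two entries apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified menu: a "Statistics" entry appears both before and
# after the "Analysis" entry; the ids only exist to tell them apart.
MENU = """
<div>
  <div><span id="other" title="Statistics">Statistics</span></div>
  <div><span title="Analysis">Analysis</span></div>
  <div><div><span id="wanted" title="Statistics">Statistics</span></div></div>
</div>
"""


def first_span_after(root, anchor_title, target_text):
    """Return the first <span> with text target_text that comes after the
    <span> titled anchor_title in document order (roughly XPath following::).
    Unlike following::, descendants of the anchor span are not excluded."""
    seen_anchor = False
    for span in root.iter("span"):  # iter() visits elements in document order
        if not seen_anchor:
            seen_anchor = span.get("title") == anchor_title
        elif span.text == target_text:
            return span
    return None


root = ET.fromstring(MENU.strip())
print(first_span_after(root, "Analysis", "Statistics").get("id"))
```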

Selenium Python UnboundLocalError: local variable 'element' referenced before assignment

I am trying to click on a span tag that contains the text "Clean feed crm" using an XPath locator.
I get the error:
UnboundLocalError: local variable 'element' referenced before assignment
Full error trace:
Traceback (most recent call last):
File "C:\Webdriver\ClearCore\TestCases\OperationsPage_TestCase.py", line 56, in test_add_and_run_clean_process
process_lists_page.click_clean_feed_task_from_groups_tab(Globals.process_lists_clean_feed_task_crm)
File "C:\Webdriver\ClearCore\Pages\operations.py", line 90, in click_clean_feed_task_from_groups_tab
clean_feed_crm_element = self.get_element(By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
File "C:\Webdriver\ClearCore 501\Pages\base.py", line 31, in get_element
return element
UnboundLocalError: local variable 'element' referenced before assignment
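The traceback ends in base.py, which is not shown. An UnboundLocalError on return element means element was never assigned on the code path that ran. One common way this happens (an assumption here) is a try/except around the lookup whose except branch only logs the failure. A minimal sketch of a helper that surfaces the real lookup failure instead; ElementLookupError and FakeDriver are hypothetical names, and real Selenium code would catch NoSuchElementException (from selenium.common.exceptions) rather than the LookupError stand-in.

```python
class ElementLookupError(Exception):
    """Hypothetical error raised when a locator matches no element."""


def get_element(driver, by, locator):
    """Return driver.find_element(by, locator) or raise ElementLookupError.

    Returning from inside the try block means there is no code path on which
    a variable is read before it has been assigned.
    """
    try:
        return driver.find_element(by, locator)
    except LookupError as exc:  # stand-in for selenium's NoSuchElementException
        raise ElementLookupError(f"No element found for {by}={locator!r}") from exc


class FakeDriver:
    """Hypothetical stub that knows a single locator, so both paths can be exercised."""

    def find_element(self, by, locator):
        if locator == "//span[@title='Clean feed crm']":
            return "<span: Clean feed crm>"
        raise LookupError(locator)


driver = FakeDriver()
print(get_element(driver, "xpath", "//span[@title='Clean feed crm']"))
```

With a helper like this, a bad relative XPath shows up as an error naming the locator, which points at the XPath rather than at base.py.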
If I use the full absolute XPath, it works fine. With the relative XPath, I get the error.
The full absolute XPath that works is:
(By.XPATH, 'html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[7]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[3]/div/div[2]/div/div[1]/div/div/div/div/div[1]/div[1]/div[2]/div/div[1]/div[1]/div/div/div[2]/div/div[2]/span[1]/span')
The relative XPath that does not work is:
(By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
The HTML is:
<div id="operations_add_process_list_ct_groups_and_tasks" class="GPI5XK1CDG" __gwtcellbasedwidgetimpldispatchingfocus="true" __gwtcellbasedwidgetimpldispatchingblur="true" role="tree">
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-expanded="true" aria-level="1">
<div class="GPI5XK1CIF GPI5XK1CAG" style="padding-left: 0px;">
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-level="2">
<div class="GPI5XK1CIF" style="padding-left: 16px;">
<div class="GPI5XK1CIF GPI5XK1CKF" style="padding-left: 16px;position:relative;" onclick="">
<div style="position:absolute;display:none;"/>
<div class="GPI5XK1CLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0" style="width:16px;height:16px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAC2UlEQVR42mNgQANlZ1PqG84XfA7YYVsG5AqBaK+lFnM0E5RUgXx2IGZkwAbq99ezgCTTj4Wu//jrw/+68/mf3BeZ7ei5XPdv6+M1/y2btBcC5TWghqCCwlPxzZ2Xqz/FHfbe3Xu57uudTzf+H3i+4//eZ1v/H3257//aB4v/a8XKrxE1Esw2LFDzBGrhAGImuAFdl2s+/fjz/f+r7y/+H3qxG6zx0ruz/8++Of5/x5P1/7c8Xv2/9lzu/7Izqf/9N9t8E9bidwZq44d5h9Fvo9XMlffn/z/x6iBY040Pl/8vuTvj/7bH68BiMENAdOnplP/SdqJTgfr0wd7xXmnu6DTTcOmcWxP+73m25f+tj9f+B+6w/afoJb5TNVRmHVDDl11PN/0HhQPQm//NazQ/sQuzgwxwB2IehsiDLr9AzgXZClJ49f0FYIDpXAVKLgHiCu81lis3PloBNgAYoP8NclVuc0lwTgbKOQAxN0PIPvuXR17u+X/wxS6wE0++PgyKgT9mNRpL/TZat0253v5m3cMl/0GGgLy56v6C/4YFqg+Bmr3B4WDdot3usdjsg+96K7AhIFeAYgCkCaQBRIP8D2JXns0AG6CdqHAXqLkYiGVBgSjMq8CdELjd9jvIAFDAgTDIyTA2SBPI/0CNT2UcRHfwynLNRTaAxahQrQ8U1yvuzQX7E2TrmgeLwDHRdKH4/9K7s/5Pu9H1X85VfC9Q/UwgzgVie2hUMjDopSs5A13wy3GywX+TMvXHwFj4DfJz4mG//1bNOv8yjoX+m3trIsjvT4HKa6FRyAfEzLC0xCnvKp4ibiI4CcieoREpe7DjUuV/0wqNn8yczEsF1XkXgdiiBoKrgPLlQCyHnidAHBEgtgRiJzlXiUagS35J24psAfJBhkYwczAnQjUHgsINW34CpW1OcOKAGOYBypxAHAXE8kAsBsSKUDkWBgIApEAISSMr1JWM6E4HAJKeit5kyDtvAAAAAElFTkSuQmCC) no-repeat 0px 0px;" src="http://justin-pc.infoshare.local:8080/clearcore501/ClearCore/clear.cache.gif" onload="this.__gwtLastUnhandledEvent="load";"/>
</div>
<div>
<span>
<span class=" myinlineblock" title="Clean feed crm" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;width:100%;margin-right:-14px;">Clean feed crm</span>
</span>
<span>
<span class="" title="Turn task off or on." style="">
<input type="checkbox" checked="" tabindex="-1"/>
</span>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="2" aria-level="2">
<div class="GPI5XK1CIF" style="padding-left: 16px;">
<div class="GPI5XK1CIF GPI5XK1CKF" style="padding-left: 16px;position:relative;" onclick="">
<div style="position:absolute;display:none;"/>
<div class="GPI5XK1CLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0" style="width:16px;height:16px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAC2UlEQVR42mNgQANlZ1PqG84XfA7YYVsG5AqBaK+lFnM0E5RUgXx2IGZkwAbq99ezgCTTj4Wu//jrw/+68/mf3BeZ7ei5XPdv6+M1/y2btBcC5TWghqCCwlPxzZ2Xqz/FHfbe3Xu57uudTzf+H3i+4//eZ1v/H3257//aB4v/a8XKrxE1Esw2LFDzBGrhAGImuAFdl2s+/fjz/f+r7y/+H3qxG6zx0ruz/8++Of5/x5P1/7c8Xv2/9lzu/7Izqf/9N9t8E9bidwZq44d5h9Fvo9XMlffn/z/x6iBY040Pl/8vuTvj/7bH68BiMENAdOnplP/SdqJTgfr0wd7xXmnu6DTTcOmcWxP+73m25f+tj9f+B+6w/afoJb5TNVRmHVDDl11PN/0HhQPQm//NazQ/sQuzgwxwB2IehsiDLr9AzgXZClJ49f0FYIDpXAVKLgHiCu81lis3PloBNgAYoP8NclVuc0lwTgbKOQAxN0PIPvuXR17u+X/wxS6wE0++PgyKgT9mNRpL/TZat0253v5m3cMl/0GGgLy56v6C/4YFqg+Bmr3B4WDdot3usdjsg+96K7AhIFeAYgCkCaQBRIP8D2JXns0AG6CdqHAXqLkYiGVBgSjMq8CdELjd9jvIAFDAgTDIyTA2SBPI/0CNT2UcRHfwynLNRTaAxahQrQ8U1yvuzQX7E2TrmgeLwDHRdKH4/9K7s/5Pu9H1X85VfC9Q/UwgzgVie2hUMjDopSs5A13wy3GywX+TMvXHwFj4DfJz4mG//1bNOv8yjoX+m3trIsjvT4HKa6FRyAfEzLC0xCnvKp4ibiI4CcieoREpe7DjUuV/0wqNn8yczEsF1XkXgdiiBoKrgPLlQCyHnidAHBEgtgRiJzlXiUagS35J24psAfJBhkYwczAnQjUHgsINW34CpW1OcOKAGOYBypxAHAXE8kAsBsSKUDkWBgIApEAISSMr1JWM6E4HAJKeit5kyDtvAAAAAElFTkSuQmCC) no-repeat 0px 0px;" src="http://justin-pc.infoshare.local:8080/clearcore501/ClearCore/clear.cache.gif" onload="this.__gwtLastUnhandledEvent="load";"/>
</div>
<div>
<span>
<span class=" myinlineblock" title="Clean feed escr" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;width:100%;margin-right:-14px;">Clean feed escr</span>
</span>
<span>
<span class="" title="Turn task off or on." style="">
<input type="checkbox" checked="" tabindex="-1"/>
</span>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
My method implementation is:
def click_clean_feed_task_from_groups_tab(self, feed):
    # Params: feed: clean feed crm, clean feed escr or clean feed orchard
    #clean_feed_crm_element = self.driver.find_element(By.XPATH, '//span[@class="myinlineblock" and contains(text(), "%s") % feed]')
    clean_feed_crm_element = self.get_element(By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
    #clean_feed_crm_element = WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//..//.//..//..//..//..//..//..//../span[contains(text(), "%s")] % feed ]')))
    clean_feed_crm_element.click()
    return self
From my TestCase class I call the method:
project_navigator = ProjectNavigatorPage(self.driver)
process_lists_page = project_navigator.select_projectNavigator_item("Process Lists")
process_lists_page.click_add_button_for_process_lists()
process_lists_page.click_clean_task_arrow_to_expand_it_from_groups_tab("add")
process_lists_page.click_clean_feed_task_from_groups_tab(Globals.process_lists_clean_feed_task_crm)
Globals.py is:
process_lists_clean_feed_task_crm = "Clean feed crm"
I have also tried using WebDriverWait, but I still get the same error:
clean_feed_crm_element = WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable(((By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "%s") % feed]')))
Here the value substituted for %s via % feed is "Clean feed crm", the text I am looking for (it is passed into my method as a parameter).
What am I doing wrong? What XPath could I use to click the element that has the text "Clean feed crm"?
Thanks,
Riaz
Recall some elements of the XPath syntax:
The expression "//" selects nodes in the document from the current node that match the selection, no matter where they are.
The expression ".." selects the parent of the current node.
Therefore, when you write:
//div[@id="operations_add_process_list_ct_groups_and_tasks"]//..
you are stepping back up: ".." returns the parent of each node reached by "//", which includes the div node itself (and even its parent), instead of moving down towards the span. Starting from the div node, the relative XPath should be:
//div[@id="operations_add_process_list_ct_groups_and_tasks"]//span[contains(text(), "Clean feed crm")]
That way you select the div with the given id and then look inside it for the span tag that contains the text.
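One way to sanity-check the corrected path without a browser is Python's xml.etree.ElementTree, which supports a small XPath subset (.//, [@attr='value']) but not contains() or text(). This sketch therefore matches on the title attribute the spans already carry, and uses a trimmed, well-formed copy of the posted tree. In Selenium, the equivalent XPath would be //div[@id="operations_add_process_list_ct_groups_and_tasks"]//span[@title="Clean feed crm"]; matching on @title is a swapped-in alternative to contains(text(), ...), not what the answer above uses.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the posted tree (styles and images removed).
TREE = """
<div id="operations_add_process_list_ct_groups_and_tasks" role="tree">
  <div role="treeitem">
    <span><span class=" myinlineblock" title="Clean feed crm">Clean feed crm</span></span>
    <span><span class="" title="Turn task off or on."><input type="checkbox"/></span></span>
  </div>
  <div role="treeitem">
    <span><span class=" myinlineblock" title="Clean feed escr">Clean feed escr</span></span>
  </div>
</div>
"""

# Wrap in <body> so the .//div step can also match the tree's top-level div.
root = ET.fromstring("<body>" + TREE + "</body>")
path = (".//div[@id='operations_add_process_list_ct_groups_and_tasks']"
        "//span[@title='Clean feed crm']")
matches = root.findall(path)
print(len(matches), matches[0].text)
```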