How to get text out of a span in Python using Scrapy?


<div id="eventInfoContainer">
<table>
<tbody><tr>
<td class="verticalTop">
<script type="text/javascript"><!--
google_ad_client = "ca-pub-2475575566915822";
/* listing page */
google_ad_slot = "4647770957";
google_ad_width = 160;
google_ad_height = 600;
//-->
</script>
<script type="text/javascript" src="https://pagead2.googlesyndication.com/pagead/show_ads.js">
</script><ins id="aswift_0_expand" style="display:inline-table;border:none;height:600px;margin:0;padding:0;position:relative;visibility:visible;width:160px;background-color:transparent;"><ins id="aswift_0_anchor" style="display:block;border:none;height:600px;margin:0;padding:0;position:relative;visibility:visible;width:160px;background-color:transparent;"><iframe width="160" height="600" frameborder="0" marginwidth="0" marginheight="0" vspace="0" hspace="0" allowtransparency="true" scrolling="no" allowfullscreen="true" onload="var i=this.id,s=window.google_iframe_oncopy,H=s&&s.handlers,h=H&&H[i],w=this.contentWindow,d;try{d=w.document}catch(e){}if(h&&d&&(!d.body||!d.body.firstChild)){if(h.call){setTimeout(h,0)}else if(h.match){try{h=s.upd(h,i)}catch(e){}w.location.replace(h)}}" id="aswift_0" name="aswift_0" style="left:0;position:absolute;top:0;width:160px;height:600px;"></iframe></ins></ins>
</td>
<td class="spacer30w"></td>
<td class="verticalTop">
<span id="eventNameHeader">The Future of Medicine, Health Care and Biological Studies</span>
<br>
<br>
<span id="smallerHeading">Conference</span>
<br>
<br>
<span id="eventDate">16th to 17th October 2017</span>
<br>
<span id="eventCountry">Rockville, Maryland, United States of America</span>
<br>
<br>
<span id="eventWebsite">
<span id="smallerHeading">Website: </span>
http://rais.education/the-future-of-medicine-health-care-and-biological-studies/
</span>
<br>
<span id="eventContactPerson"><span id="smallerHeading">Contact person: </span>Eduard David</span>
<br>
<br>
<span id="eventDescription">We gladly invite you to attend the International Conference The Future of Medicine, Health Care and Biological Studies which will be held at Johns Hopkins University, just 20 miles away from Washington DC. </span>
<br>
<br>
<span id="eventOrganiser"><span style="font-weight: bold; color: #696969;">Organized by: </span>Research Association for Interdisciplinary Studies (RAIS)</span> <br><span id="eventDeadline"><span style="font-weight: bold; color: #696969;">Deadline for abstracts/proposals: </span>21st August 2017</span> <br>
<br>
Check the event website for more details.
<br>
<br>
<br>
<br>
<br>
<br>
<table>
<tbody><tr>
<td class="verticalMiddle">
<form><input type="button" value="Back" onclick="history.go(-1); return true;"></form>
</td>
<td class="spacer15w"></td>
<td class="verticalMiddle">
<a title="Share this conference on Facebook" href="http://www.facebook.com/sharer.php?
s=100
&p[url]=http://www.conferencealerts.com/show-event?id=187457 &p[title]=The Future of Medicine, Health Care and Biological Studies &p[summary]=We gladly invite you to attend the International Conference The Future of Medicine, Health Care and Biological Studies which will be held at Johns Hopkins University, just 20 miles away from Washington DC. " target="_blank" class="fb_share_link">Share on Facebook</a>
</td>
<td class="spacer15w"></td>
<td>
<img src="http://www.google.com/calendar/images/ext/gc_button6.gif" border="0" align="left">
</td>
</tr>
<tr><td class="spacer5"></td></tr>
<tr>
<td colspan="5">
<script type="text/javascript"><!--
google_ad_client = "ca-pub-2475575566915822";
/* show event under content */
google_ad_slot = "8943315143";
google_ad_width = 300;
google_ad_height = 250;
//-->
</script>
<script type="text/javascript" src="https://pagead2.googlesyndication.com/pagead/show_ads.js">
</script><ins id="aswift_1_expand" style="display:inline-table;border:none;height:250px;margin:0;padding:0;position:relative;visibility:visible;width:300px;background-color:transparent;"><ins id="aswift_1_anchor" style="display:block;border:none;height:250px;margin:0;padding:0;position:relative;visibility:visible;width:300px;background-color:transparent;"><iframe width="300" height="250" frameborder="0" marginwidth="0" marginheight="0" vspace="0" hspace="0" allowtransparency="true" scrolling="no" allowfullscreen="true" onload="var i=this.id,s=window.google_iframe_oncopy,H=s&&s.handlers,h=H&&H[i],w=this.contentWindow,d;try{d=w.document}catch(e){}if(h&&d&&(!d.body||!d.body.firstChild)){if(h.call){setTimeout(h,0)}else if(h.match){try{h=s.upd(h,i)}catch(e){}w.location.replace(h)}}" id="aswift_1" name="aswift_1" style="left:0;position:absolute;top:0;width:300px;height:250px;"></iframe></ins></ins>
</td>
</tr>
</tbody></table>
<br>
</td>
</tr>
</tbody></table>
</div>
How do I get the text "The Future of Medicine, Health Care and Biological Studies" from the HTML above in Python using Scrapy?
I tried this code:
response.css('div.eventInfoContainer table tbody tr td:nth-child(3) span::text').extract()
But the output is just an empty list: "[]"

As the span element that contains the required information has an id attribute (which should be unique), this should suffice:
text = response.css('span#eventNameHeader::text').extract_first()
EDIT:
Using XPath, it's similar:
text = response.xpath('//span[@id="eventNameHeader"]/text()').extract_first()

Related

Regex: match number in table cell

I have the following table cell:
<td class="text-right"
onmouseenter="$(this).find('.overlay-viewable-box:first').show();"
onmouseleave="$(this).find('.overlay-viewable-box:first').hide();">
2.004
</td>
It contains spaces and line breaks too. The class="text-right" isn't unique on the page, but this cell is the first one with that class, if that helps.
I want to match only the number (this one, 2.004, or any other; the cell always contains exactly one number), with or without a point and/or comma in it.
PS: Yes, I fully agree that parsing HTML with regex is not the best idea, but any other method would be so much overhead that it would not be worth it. :(
PPS: Please post your recommendations as answers, not as comments, so I can accept and upvote them.
Solution: (?:<td\b.*?text-right\b.*?\D*?;">)([\s\S\d]*?)(?=\D*?<\/)
Edit: the full-length HTML:
<div class="box " >
<div class="box-head " >
<div class="box-icon">
<span class="icon "></span> </div>
<span class="divider"></span>
<div class="box-title box-title-space-1">
<span>Keyword-Profile</span></div>
<div class="box-options dropdown box-options-no-divider">
<div class="divider "></div>
<div class="box-icon "><a
class="button">
<span class="icon "></span> </a></div>
<ul class="dropdown-menu">
<li
> <a onclick="" class="modal"><div><div class="icon"><div></div></div><div class="text"> Add to Dashboard</div></div></a>
</li>
<li
><span class="box-menu-seperator"></span> <a onclick="
" href="" class="modal"><div><div class="icon"><div></div></div><div class="text"> Add to Report</div></div></a>
</li>
</ul>
</div>
</div>
<div class="module-loading-blocker">
<div class="module-loading-blocker-icon">
<div style="width: 40px; height: 40px; display: inline-block;">
<svg width="100%" height="100%" class="loading-circular" viewBox="0 0 50 50">
<circle class="loading-path" cx="25" cy="25" r="20" fill="none" stroke-width="5" stroke-miterlimit="10"/>
</svg>
</div> </div>
</div>
<div class="box-content box-body box-table" > <table class="table table-spaced">
<tr>
<td>
Top-10
</td>
<td class="text-right"
onmouseenter="$(this).find('.overlay-viewable-box:first').show();"
onmouseleave="$(this).find('.overlay-viewable-box:first').hide();">
2.004
</td>
</tr>
<tr>
<td>
Top-100
</td>
<td class="text-right"
onmouseenter="$(this).find('.overlay-viewable-box:first').show();"
onmouseleave="$(this).find('.overlay-viewable-box:first').hide();">
237.557
</td>
</tr>
<tr>
<td>
∅ Position
</td>
<td class="text-right"
onmouseenter="$(this).find('.overlay-viewable-box:first').show();"
onmouseleave="$(this).find('.overlay-viewable-box:first').hide();">
60
</td>
</tr>
</table>
</div></div><div class="module" style="display: none;">x</div>

Update (JavaScript RegExp)
To get the number within <td>
Setting aside the fact that the code as posted will not function, here is a regex that gets the number in the first td.text-right only:
/(?:<td\b.*?text-right\b.*?\D*?)([0-9]+?[.,]*?[0-9]*?)(?=\D*?<\/)/
It has three parts:
1. (?:<td\b.*?text-right\b.*?\D*?) is a non-capturing group: a literal <td, a word boundary \b, lazily any characters .*? up to a literal text-right, another word boundary \b, then lazily any characters .*? and any non-digit characters \D*?.
2. ([0-9]+?[.,]*?[0-9]*?) is the capture group: one or more digits [0-9]+?, zero or more literal . or , characters [.,]*?, then zero or more digits [0-9]*?, all matched lazily.
3. (?=\D*?<\/) is a positive lookahead: any number of non-digit characters \D*? followed by a literal </ (with the forward slash escaped).
Better Regex
This one relies on the fact that each target is in the last column, by adding <\/td>\s*?<\/tr> in a positive lookahead:
/\b([0-9]+?[.,]*?[0-9]*?)(?=\D*?<\/td>\s*?<\/tr>)/g;
The result is cleaner: the full match and the capture group are the same, and there is no non-capturing group with side effects.
Demo
var rgx = /\b([0-9]+?[.,]*?[0-9]*?)(?=\D*?<\/td>\s*?<\/tr>)/g;
var str = document.documentElement.innerHTML;
let hits;
while ((hits = rgx.exec(str)) !== null) {
if (hits.index === rgx.lastIndex) {
rgx.lastIndex++;
}
hits.forEach(function(hit, idx) {
console.log(`Found match, group ${idx}: ${hit}`);
});
}
<div class="box ">
<div class="box-head ">
<div class="box-icon">
<span class="icon ">&#xf0ae;</span> </div>
<span class="divider"></span>
<div class="box-title box-title-space-1">
<span>Keyword-Profile</span></div>
<div class="box-options dropdown box-options-no-divider">
<div class="divider "></div>
<div class="box-icon ">
<a class="button">
<span class="icon ">&#xf013;</span> </a>
</div>
<ul class="dropdown-menu">
<li>
<a onclick="" class="modal">
<div>
<div class="icon">
<div>&#xf055;</div>
</div>
<div class="text"> Add to Dashboard</div>
</div>
</a>
</li>
<li><span class="box-menu-seperator"></span>
<a onclick="
" href="" class="modal">
<div>
<div class="icon">
<div>&#xf055;</div>
</div>
<div class="text"> Add to Report</div>
</div>
</a>
</li>
</ul>
</div>
</div>
<div class="module-loading-blocker">
<div class="module-loading-blocker-icon">
<div style="width: 40px; height: 40px; display: inline-block;">
<svg width="100%" height="100%" class="loading-circular" viewBox="0 0 50 50">
<circle class="loading-path" cx="25" cy="25" r="20" fill="none" stroke-width="5" stroke-miterlimit="10"/>
</svg>
</div>
</div>
</div>
<div class="box-content box-body box-table">
<table class="table table-spaced">
<tr>
<td>
Top-10
</td>
<td class="text-right" onmouseenter="$(this).find('.overlay-viewable-box:first').show();" onmouseleave="$(this).find('.overlay-viewable-box:first').hide();">
2.004
</td>
</tr>
<tr>
<td>
Top-100
</td>
<td class="text-right" onmouseenter="$(this).find('.overlay-viewable-box:first').show();" onmouseleave="$(this).find('.overlay-viewable-box:first').hide();">
237.557
</td>
</tr>
<tr>
<td>
∅ Position
</td>
<td class="text-right" onmouseenter="$(this).find('.overlay-viewable-box:first').show();" onmouseleave="$(this).find('.overlay-viewable-box:first').hide();">
60
</td>
</tr>
</table>
</div>
</div>
<div class="module" style="display: none;">x</div>
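The same "last column" idea can also be tried from Python with the standard re module. This is a rough port of the better regex above (the JavaScript exec() loop becomes findall() on a compiled pattern), run against a trimmed excerpt of the question's table rather than a live page; it inherits the same assumption that each number is the last cell of its row.

```python
import re

# Trimmed excerpt of the question's table
HTML = """
<table class="table table-spaced">
<tr>
<td>
Top-10
</td>
<td class="text-right" onmouseenter="$(this).find('.overlay-viewable-box:first').show();">
2.004
</td>
</tr>
<tr>
<td>
Top-100
</td>
<td class="text-right" onmouseenter="$(this).find('.overlay-viewable-box:first').show();">
237.557
</td>
</tr>
<tr>
<td>
∅ Position
</td>
<td class="text-right" onmouseenter="$(this).find('.overlay-viewable-box:first').show();">
60
</td>
</tr>
</table>
"""

# A number only counts if, after some non-digit characters, the row's
# closing </td> ... </tr> follows, i.e. it sits in the last column.
# The "10" in "Top-10" fails because its </td> is followed by another <td>.
LAST_COLUMN_NUMBER = re.compile(r'\b([0-9]+?[.,]*?[0-9]*?)(?=\D*?</td>\s*?</tr>)')

numbers = LAST_COLUMN_NUMBER.findall(HTML)
print(numbers)
```

Since the pattern has one capture group, findall() returns just the captured numbers, in document order.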

A simple solution, provided that your regex engine can search across lines and supports lookarounds (including variable-length lookbehind):
(?<=>\s*)([0-9]+(?:\.[0-9]+)?)(?=\s*<)
Explained:
The first part is (?<=>\s*). (?<=regex) is called a positive lookbehind: it tells the engine to check that text matching regex comes right before the actual match, without including it in the match. Here it requires a > followed by any amount of whitespace.
The core part, [0-9]+(?:\.[0-9]+)?, matches one or more digits, optionally followed by a dot and another group of one or more digits. The final ? makes the decimal part optional.
The last part is (?=\s*<). (?=regex) is called a positive lookahead: it tells the engine to check that text matching regex comes right after the actual match. Here it requires any amount of whitespace followed by a <.
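One engine where the "supports lookarounds" caveat bites is Python's built-in re module: it only accepts fixed-width lookbehinds, so (?<=>\s*) is rejected at compile time (the third-party regex module does allow variable-length lookbehind). A minimal sketch of the usual workaround, assuming the same HTML shape as the question: consume the > and the whitespace instead of looking behind, and keep the number in a capture group.

```python
import re

HTML = """<td class="text-right">
2.004
</td>
<td>
Top-100
</td>
<td class="text-right">
237.557
</td>"""

# Python's re only allows fixed-width lookbehind, so \s* inside (?<=...) fails.
try:
    re.compile(r'(?<=>\s*)([0-9]+(?:\.[0-9]+)?)(?=\s*<)')
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False

# Workaround: match ">" plus whitespace normally and capture only the number;
# findall() returns the capture group, not the whole match.
NUMBER_CELL = re.compile(r'>\s*([0-9]+(?:\.[0-9]+)?)\s*<')

print(lookbehind_supported)       # False
print(NUMBER_CELL.findall(HTML))  # ['2.004', '237.557']
```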

Assuming your regex engine understands PCRE, try
/>[\s]*([[:digit:]]+(\.[[:digit:]]+)?)[\s]*<\//g
to match a number, optionally surrounded by whitespace (including newline/linefeed characters), that is the sole textual content of an HTML element. Capture group 1 holds the number.
You may need to adjust the pattern inside the capture group to cover whatever formats you'd consider a 'number'.
Drop the start and the end of the expression (i.e. > and <\/) if the assumed structural HTML context is too restrictive for your purposes. As your question acknowledges, doing so increases the risk of false positives.
See it live at Regex101
By the way, there are HTML parser libraries for most programming languages that tolerate syntax errors and offer simple interfaces to iterate over all textual content. Just for the sake of argument, if jQuery or similar functionality is available, you could proceed along the lines of this SO answer: just replace the inner return expression with a regex test. JavaScript regexes don't support POSIX classes like [[:digit:]], so use \d instead (untested code):
// no 'g' flag: with 'g', test() keeps state in lastIndex between calls
var re = /^\d+(?:\.\d+)?$/;
$.fn.findByREText = function (re) {
    return $('*').contents().filter(function () {
        return re.test($(this).text().trim());
    });
};
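As a concrete instance of the parser-library route mentioned above, here is a minimal sketch using only Python 3's standard library html.parser; the class and function names (TextRightCellParser, text_right_numbers) are made up for illustration. It collects the text of every td whose class list contains text-right and keeps the cells whose whole text is a number, so the first entry is the value the question asks for.

```python
import re
from html.parser import HTMLParser

# A "number" here: digits, optionally followed by "." or "," and more digits.
NUMBER = re.compile(r'^[0-9]+(?:[.,][0-9]+)?$')


class TextRightCellParser(HTMLParser):
    """Collect the text of every <td> whose class list contains "text-right"."""

    def __init__(self):
        super().__init__()
        self.in_target_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'td' and 'text-right' in classes:
            self.in_target_cell = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_target_cell = False

    def handle_data(self, data):
        if self.in_target_cell:
            self.cells[-1] += data


def text_right_numbers(html):
    """Return the stripped text of each td.text-right cell that is a number."""
    parser = TextRightCellParser()
    parser.feed(html)
    parser.close()
    stripped = (cell.strip() for cell in parser.cells)
    return [cell for cell in stripped if NUMBER.match(cell)]


HTML = """
<table class="table table-spaced">
<tr><td>Top-10</td>
<td class="text-right" onmouseenter="$(this).find('.overlay-viewable-box:first').show();">
2.004
</td></tr>
<tr><td>Top-100</td>
<td class="text-right" onmouseenter="$(this).find('.overlay-viewable-box:first').show();">
237.557
</td></tr>
<tr><td>∅ Position</td>
<td class="text-right" onmouseenter="$(this).find('.overlay-viewable-box:first').show();">
60
</td></tr>
</table>
"""

numbers = text_right_numbers(HTML)
print(numbers[0])  # the first td.text-right, as asked in the question
```

Unlike the regex versions, this does not care about line breaks, attribute order, or whether the cell is the last one in its row.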

BeautifulSoup: my extracted data from an HTML table does not print in the same table format. Can I keep the table format?

I have a HTML Test Report file with a list of test cases. Each test case is in a row in an HTML table.
I have managed to get the test cases out of the table, one per row.
When I write them into my email, they are not laid out as a table like in the HTML. I would like to keep the grid lines for the rows and columns so it displays nicely as a table.
My method to extract the data is:
def extract_testcases_from_report_htmltestrunner():
    filename = (r"E:\test_runners project\selenium_regression_test\TestReport\ClearCore_Automated_GUI_Regression_Project_TestReport.html")
    html_report_part = open(filename,'r')
    soup = BeautifulSoup(html_report_part, "html.parser")
    for div in soup.select("#result_table tr div.testcase"):
        yield div.text.strip().encode('utf-8'), div.find_next("a").text.strip().encode('utf-8')
When I write it into my email, the output I get is:
test_000001_login_valid_user
pass
test_000002_select_a_project
pass
test_000003_verify_Lademo_CRM_DataPreview_is_present
pass
test_000004_view_data_preview_Lademo_CRM_and_test_scrollpage
pass
My desired output is the following format, ideally with table lines, or in the same table format as in the HTML:
test_000001_login_valid_user pass
test_000002_select_a_project pass
test_000003_verify_Lademo_CRM_DataPreview_is_present pass
test_000004_view_data_preview_Lademo_CRM_and_test_scrollpage pass
The HTML snippet is:
<div class='heading'>
<h1>Test Report</h1>
<p class='attribute'><strong>Start Time:</strong> 2016-10-27 10:06:59</p>
<p class='attribute'><strong>Duration:</strong> 0:57:01.842000</p>
<p class='attribute'><strong>Status:</strong> Pass 93</p>
<p class='description'>Selenium - ClearCore Regression Project Automated Test</p>
</div>
<p id='show_detail_line'>Show
<a href='javascript:showCase(0)'>Summary</a>
<a href='javascript:showCase(1)'>Failed</a>
<a href='javascript:showCase(2)'>All</a>
</p>
<table id='result_table'>
<colgroup>
<col align='left' />
<col align='right' />
<col align='right' />
<col align='right' />
<col align='right' />
<col align='right' />
</colgroup>
<tr id='header_row'>
<td>Test Group/Test case</td>
<td>Count</td>
<td>Pass</td>
<td>Fail</td>
<td>Error</td>
<td>View</td>
</tr>
<tr class='passClass'>
<td>Regression_TestCase.RegressionProject_TestCase</td>
<td>47</td>
<td>47</td>
<td>0</td>
<td>0</td>
<td>Detail</td>
</tr>
<tr id='pt1.1' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000001_login_valid_user</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.1')" >
pass</a>
<div id='div_pt1.1' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.1').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.1: *** test_login_valid_user ***
test login with a valid user - Passed
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.2' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000002_select_a_project</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.2')" >
pass</a>
<div id='div_pt1.2' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.2').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.2: *** test_login_valid_user ***
test login with a valid user - Passed
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.3' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000057_run_clean_and_match_process</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.3')" >
pass</a>
<div id='div_pt1.3' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.3').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.3: *** test_login_valid_user ***
test login with a valid user - Passed
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.4' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000058_view_all_records_report_CRM_CRM2_ESCR</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.4')" >
pass</a>
<div id='div_pt1.4' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.4').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.4: *** test_login_valid_user ***
test login with a valid user - Passed
*** Test view_all_records_report - CRM, CRM2, ESCR ***
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
<tr id='pt1.5' class='hiddenRow'>
<td class='none'><div class='testcase'>test_000059_view_matches_report_CRM_CRM2_ESCR</div></td>
<td colspan='5' align='center'>
<!--css div popup start-->
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.5')" >
pass</a>
<div id='div_pt1.5' class="popup_window">
<div style='text-align: right; color:red;cursor:pointer'>
<a onfocus='this.blur();' onclick="document.getElementById('div_pt1.5').style.display = 'none' " >
[x]</a>
</div>
<pre>
pt1.5: *** test_login_valid_user ***
test login with a valid user - Passed
*** Test view_all_records_report - CRM, CRM2, ESCR ***
</pre>
</div>
<!--css div popup end-->
</td>
</tr>
Is it possible?
The word pass goes onto a new line. If I could separate it into a column, or with a few spaces, that would be good.
The word pass is in an a tag in the HTML, and the following line of code finds it. Could I add a few spaces, or put it in another column, when I extract it?
yield div.text.strip().encode('utf-8'), div.find_next("a").text.strip().encode('utf-8')
My email message code snippet which writes it out is:
msg = MIMEText("\n ClearCore Automated GUI Project Test Report \n " + "\n" +
"".join([' - '.join(seq) for seq in extract_status_from_report_htmltestrunner()]) + "\n\n" +
'\n'.join([elem
for seq in extract_testcases_from_report_htmltestrunner()
for elem in seq]) + "\n" +
"\n Report location = : \\\storage-1\Testing\Selenium_Test_Report_Results\ClearCore\Selenium VM \n" + "\n")
My code to extract the status from the report is:
def extract_status_from_report_htmltestrunner():
filename = (
r"E:\test_runners 2 edit project\selenium_regression_test\TestReport\ClearCore_Automated_GUI_Regression_Project_TestReport.html")
html_report_part = open(filename, 'r')
soup = BeautifulSoup(html_report_part, "html.parser")
div_heading = soup.find('div', {'class': 'heading'})
p_status = div_heading.find('strong', text='Status:').parent
p_status.find(text=True, recursive=False)
print p_status.text
return p_status.text
Thanks, Riaz

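from bs4 import BeautifulSoup
# text holds the HTML of the test report (e.g. read from the report file)
soup = BeautifulSoup(text, 'lxml')
trs = soup.find_all(class_='hiddenRow')
for tr in trs:
    row1 = tr.find('td').get_text()
    row2 = tr.find('a').get_text(strip=True)
    # left-align the test case name in 55 characters, right-align the status in 5
    print('{:<55}{:>5}'.format(row1, row2))
Output: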
test_000001_login_valid_user pass
test_000002_select_a_project pass
test_000057_run_clean_and_match_process pass
test_000058_view_all_records_report_CRM_CRM2_ESCR pass
test_000059_view_matches_report_CRM_CRM2_ESCR pass
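Since the question also asks for grid lines in the email, here is a hedged sketch (Python 3, standard library only) that renders the (test case, status) pairs as an HTML table and sends it as a text/html MIMEText part, plus a plain-text variant that pads the first column to the longest name instead of a fixed 55. The helper names rows_to_html_table and rows_to_text_table are hypothetical, and the rows are assumed to be str rather than the UTF-8 bytes the question's generator yields.

```python
from email.mime.text import MIMEText
from html import escape


def rows_to_html_table(rows):
    """Render (test case, status) pairs as an HTML table with grid lines."""
    lines = ['<table border="1" cellpadding="4" style="border-collapse: collapse;">',
             '<tr><th>Test case</th><th>Status</th></tr>']
    for name, status in rows:
        lines.append('<tr><td>{}</td><td>{}</td></tr>'.format(escape(name), escape(status)))
    lines.append('</table>')
    return '\n'.join(lines)


def rows_to_text_table(rows):
    """Plain-text fallback: pad the first column to the longest test-case name."""
    width = max(len(name) for name, _ in rows)
    return '\n'.join('{:<{w}}  {}'.format(name, status, w=width) for name, status in rows)


# Example rows in the shape yielded by extract_testcases_from_report_htmltestrunner()
rows = [('test_000001_login_valid_user', 'pass'),
        ('test_000002_select_a_project', 'pass')]

# The 'html' subtype makes mail clients render the table instead of showing raw tags.
msg = MIMEText(rows_to_html_table(rows), 'html')
print(rows_to_text_table(rows))
```

In the question's message code, the '\n'.join(...) over every element is what puts each name and status on separate lines; passing the pairs to a table helper like this instead keeps each test case and its status on one row. The rest of an HTML body (status line, report location) would then need HTML markup too, for example wrapping each line in <p> tags.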

Selenium Python UnboundLocalError: local variable 'element' referenced before assignment

I am trying to click on a span tag that contains the text "Clean feed crm" using an XPath locator.
I get the error:
UnboundLocalError: local variable 'element' referenced before assignment
Full error trace:
Traceback (most recent call last):
File "C:\Webdriver\ClearCore\TestCases\OperationsPage_TestCase.py", line 56, in test_add_and_run_clean_process
process_lists_page.click_clean_feed_task_from_groups_tab(Globals.process_lists_clean_feed_task_crm)
File "C:\Webdriver\ClearCore\Pages\operations.py", line 90, in click_clean_feed_task_from_groups_tab
clean_feed_crm_element = self.get_element(By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
File "C:\Webdriver\ClearCore 501\Pages\base.py", line 31, in get_element
return element
UnboundLocalError: local variable 'element' referenced before assignment
If I use the full absolute XPath, it works fine. With the relative XPath, I get the error.
The full absolute XPATH which works is:
(By.XPATH, 'html/body/div[2]/div[2]/div/div[4]/div/div[2]/div/div[3]/div/div[7]/div/div[3]/div/div[4]/div/div[2]/div/div[4]/div/div[3]/div/div[3]/div/div[2]/div/div[1]/div/div/div/div/div[1]/div[1]/div[2]/div/div[1]/div[1]/div/div/div[2]/div/div[2]/span[1]/span')
The relative XPATH which does not work is:
(By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
The HTML is:
<div id="operations_add_process_list_ct_groups_and_tasks" class="GPI5XK1CDG" __gwtcellbasedwidgetimpldispatchingfocus="true" __gwtcellbasedwidgetimpldispatchingblur="true" role="tree">
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-expanded="true" aria-level="1">
<div class="GPI5XK1CIF GPI5XK1CAG" style="padding-left: 0px;">
<div style="overflow: hidden;">
<div>
<div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="1" aria-level="2">
<div class="GPI5XK1CIF" style="padding-left: 16px;">
<div class="GPI5XK1CIF GPI5XK1CKF" style="padding-left: 16px;position:relative;" onclick="">
<div style="position:absolute;display:none;"/>
<div class="GPI5XK1CLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0" style="width:16px;height:16px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAC2UlEQVR42mNgQANlZ1PqG84XfA7YYVsG5AqBaK+lFnM0E5RUgXx2IGZkwAbq99ezgCTTj4Wu//jrw/+68/mf3BeZ7ei5XPdv6+M1/y2btBcC5TWghqCCwlPxzZ2Xqz/FHfbe3Xu57uudTzf+H3i+4//eZ1v/H3257//aB4v/a8XKrxE1Esw2LFDzBGrhAGImuAFdl2s+/fjz/f+r7y/+H3qxG6zx0ruz/8++Of5/x5P1/7c8Xv2/9lzu/7Izqf/9N9t8E9bidwZq44d5h9Fvo9XMlffn/z/x6iBY040Pl/8vuTvj/7bH68BiMENAdOnplP/SdqJTgfr0wd7xXmnu6DTTcOmcWxP+73m25f+tj9f+B+6w/afoJb5TNVRmHVDDl11PN/0HhQPQm//NazQ/sQuzgwxwB2IehsiDLr9AzgXZClJ49f0FYIDpXAVKLgHiCu81lis3PloBNgAYoP8NclVuc0lwTgbKOQAxN0PIPvuXR17u+X/wxS6wE0++PgyKgT9mNRpL/TZat0253v5m3cMl/0GGgLy56v6C/4YFqg+Bmr3B4WDdot3usdjsg+96K7AhIFeAYgCkCaQBRIP8D2JXns0AG6CdqHAXqLkYiGVBgSjMq8CdELjd9jvIAFDAgTDIyTA2SBPI/0CNT2UcRHfwynLNRTaAxahQrQ8U1yvuzQX7E2TrmgeLwDHRdKH4/9K7s/5Pu9H1X85VfC9Q/UwgzgVie2hUMjDopSs5A13wy3GywX+TMvXHwFj4DfJz4mG//1bNOv8yjoX+m3trIsjvT4HKa6FRyAfEzLC0xCnvKp4ibiI4CcieoREpe7DjUuV/0wqNn8yczEsF1XkXgdiiBoKrgPLlQCyHnidAHBEgtgRiJzlXiUagS35J24psAfJBhkYwczAnQjUHgsINW34CpW1OcOKAGOYBypxAHAXE8kAsBsSKUDkWBgIApEAISSMr1JWM6E4HAJKeit5kyDtvAAAAAElFTkSuQmCC) no-repeat 0px 0px;" src="http://justin-pc.infoshare.local:8080/clearcore501/ClearCore/clear.cache.gif" onload="this.__gwtLastUnhandledEvent="load";"/>
</div>
<div>
<span>
<span class=" myinlineblock" title="Clean feed crm" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;width:100%;margin-right:-14px;">Clean feed crm</span>
</span>
<span>
<span class="" title="Turn task off or on." style="">
<input type="checkbox" checked="" tabindex="-1"/>
</span>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
<div aria-selected="false" role="treeitem" aria-setsize="3" aria-posinset="2" aria-level="2">
<div class="GPI5XK1CIF" style="padding-left: 16px;">
<div class="GPI5XK1CIF GPI5XK1CKF" style="padding-left: 16px;position:relative;" onclick="">
<div style="position:absolute;display:none;"/>
<div class="GPI5XK1CLF">
<div style="padding-left: 22px;position:relative;zoom:1;">
<div style="left:0px;margin-top:-8px;position:absolute;top:50%;line-height:0px;">
<img border="0" style="width:16px;height:16px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAC2UlEQVR42mNgQANlZ1PqG84XfA7YYVsG5AqBaK+lFnM0E5RUgXx2IGZkwAbq99ezgCTTj4Wu//jrw/+68/mf3BeZ7ei5XPdv6+M1/y2btBcC5TWghqCCwlPxzZ2Xqz/FHfbe3Xu57uudTzf+H3i+4//eZ1v/H3257//aB4v/a8XKrxE1Esw2LFDzBGrhAGImuAFdl2s+/fjz/f+r7y/+H3qxG6zx0ruz/8++Of5/x5P1/7c8Xv2/9lzu/7Izqf/9N9t8E9bidwZq44d5h9Fvo9XMlffn/z/x6iBY040Pl/8vuTvj/7bH68BiMENAdOnplP/SdqJTgfr0wd7xXmnu6DTTcOmcWxP+73m25f+tj9f+B+6w/afoJb5TNVRmHVDDl11PN/0HhQPQm//NazQ/sQuzgwxwB2IehsiDLr9AzgXZClJ49f0FYIDpXAVKLgHiCu81lis3PloBNgAYoP8NclVuc0lwTgbKOQAxN0PIPvuXR17u+X/wxS6wE0++PgyKgT9mNRpL/TZat0253v5m3cMl/0GGgLy56v6C/4YFqg+Bmr3B4WDdot3usdjsg+96K7AhIFeAYgCkCaQBRIP8D2JXns0AG6CdqHAXqLkYiGVBgSjMq8CdELjd9jvIAFDAgTDIyTA2SBPI/0CNT2UcRHfwynLNRTaAxahQrQ8U1yvuzQX7E2TrmgeLwDHRdKH4/9K7s/5Pu9H1X85VfC9Q/UwgzgVie2hUMjDopSs5A13wy3GywX+TMvXHwFj4DfJz4mG//1bNOv8yjoX+m3trIsjvT4HKa6FRyAfEzLC0xCnvKp4ibiI4CcieoREpe7DjUuV/0wqNn8yczEsF1XkXgdiiBoKrgPLlQCyHnidAHBEgtgRiJzlXiUagS35J24psAfJBhkYwczAnQjUHgsINW34CpW1OcOKAGOYBypxAHAXE8kAsBsSKUDkWBgIApEAISSMr1JWM6E4HAJKeit5kyDtvAAAAAElFTkSuQmCC) no-repeat 0px 0px;" src="http://justin-pc.infoshare.local:8080/clearcore501/ClearCore/clear.cache.gif" onload="this.__gwtLastUnhandledEvent="load";"/>
</div>
<div>
<span>
<span class=" myinlineblock" title="Clean feed escr" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;width:100%;margin-right:-14px;">Clean feed escr</span>
</span>
<span>
<span class="" title="Turn task off or on." style="">
<input type="checkbox" checked="" tabindex="-1"/>
</span>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
My method implementation is:
def click_clean_feed_task_from_groups_tab(self, feed):
    # Params: feed: clean feed crm, clean feed escr or clean feed orchard
    #clean_feed_crm_element = self.driver.find_element(By.XPATH, '//span[@class="myinlineblock" and contains(text(), "%s") % feed]')
    clean_feed_crm_element = self.get_element(By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "Clean feed crm")]')
    #clean_feed_crm_element = WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//..//.//..//..//..//..//..//..//../span[contains(text(), "%s")] % feed ]')))
    clean_feed_crm_element.click()
    return self
I call the method from my TestCase class:
project_navigator = ProjectNavigatorPage(self.driver)
process_lists_page = project_navigator.select_projectNavigator_item("Process Lists")
process_lists_page.click_add_button_for_process_lists()
process_lists_page.click_clean_task_arrow_to_expand_it_from_groups_tab("add")
process_lists_page.click_clean_feed_task_from_groups_tab(Globals.process_lists_clean_feed_task_crm)
Globals.py is:
process_lists_clean_feed_task_crm = "Clean feed crm"
I have also tried using WebDriverWait, but I still get the same error:
clean_feed_crm_element = WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable(((By.XPATH, '//div[@id="operations_add_process_list_ct_groups_and_tasks"]//../span[contains(text(), "%s") % feed]')))
The %s / % feed value is "Clean feed crm", as I am looking for this text (it is passed into my method as a parameter).
What am I doing wrong? What XPath could I use to click the element that has the text "Clean feed crm"?
Thanks,
Riaz

If we recall some elements of the XPath syntax:
The expression "//" selects nodes in the document from the current node that match the selection, no matter where they are.
The expression ".." selects the parent of the current node.
Therefore when you write:
//div[@id="operations_add_process_list_ct_groups_and_tasks"]//..
you are selecting the parent of every node at or below the div, which includes the div itself and even the div's own parent, so the path climbs back up the tree instead of staying inside the div. Starting from the div node, the relative XPath should be:
//div[@id="operations_add_process_list_ct_groups_and_tasks"]//span[contains(text(), "Clean feed crm")]
That way you select the div node with the given id and look inside it for the span tag that contains the text.
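The question's attempts also put % feed inside the string literal, so the XPath sent to Selenium literally contains "%s") % feed instead of the feed name. A minimal sketch of building the locator from the parameter first is below; clean_feed_locator and GROUPS_AND_TASKS_ID are hypothetical names, and it relies on By.XPATH in Selenium's Python bindings being the plain string "xpath", so the tuple can be built without importing Selenium. It assumes feed contains no double quotes. Separately, the UnboundLocalError suggests that get_element() in base.py returns a variable that was never assigned when the lookup fails; base.py isn't shown, so that part is an inference.

```python
# By.XPATH in Selenium's Python bindings is the string "xpath".
XPATH = "xpath"

GROUPS_AND_TASKS_ID = "operations_add_process_list_ct_groups_and_tasks"


def clean_feed_locator(feed):
    """Return a (by, value) locator for the task <span> whose text contains `feed`.

    The % formatting runs on the Python string before it reaches Selenium;
    written inside the quotes, "% feed" would just be literal XPath text.
    Assumes `feed` contains no double quote characters.
    """
    xpath = '//div[@id="%s"]//span[contains(text(), "%s")]' % (GROUPS_AND_TASKS_ID, feed)
    return (XPATH, xpath)


by, value = clean_feed_locator("Clean feed crm")
print(value)
```

With Selenium imported, the tuple can be passed straight to a wait, e.g. WebDriverWait(self.driver, 20).until(EC.element_to_be_clickable(clean_feed_locator(feed))), or unpacked into self.driver.find_element(*clean_feed_locator(feed)).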

Nokogiri parsing with xpath returns empty string

I have the following HTML:
<div>
<table>
<tr>
<td>
<div class="w135">
<div style="float: left; padding-right: 10px;" class="imageThumbnail playerDiv">
<a href="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10" target="_parent">
<img src="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10" border="0" class="imageThumbnail">
</a>
</div>
</div>
</td>
</tr>
</table>
</div>
When I run the rake task, I get the error:
NoMethodError: undefined method `at_css' for ["id","ctl00_cphBody_ctl01_DataList1_ctl00_Thumbnail1_Layout17"]:Array
This is the code:
@request = HTTParty.get(url)
@html = Nokogiri::HTML(@request.body)
@html.css(".w135")[0].map do |item|
  url = item.at_css("div.playerDiv a")
  puts url.inspect
end
I'm really not sure what the issue is and have been trying to fix this for a while. The error occurs on this line: url = item.at_css("div.playerDiv a")
Any suggestion is appreciated!
Thanks

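The error comes from [0].map: css returns a NodeSet, [0] takes a single Node out of it, and iterating over a Node yields its attributes as [name, value] pairs, so item ends up being an Array like ["id", "..."], which has no at_css method. Searching for the links directly avoids that. I'd do it using something like: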
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<div>
<table>
<tr>
<td>
<div class="w135">
<div style="float: left; padding-right: 10px;" class="imageThumbnail playerDiv">
<a href="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10" target="_parent">
<img src="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10" border="0" class="imageThumbnail">
</a>
</div>
</div>
</td>
</tr>
</table>
</div>
EOT
puts doc.search('.w135 div.playerDiv a').map(&:inspect)
Which outputs:
# >> #<Nokogiri::XML::Element:0x3ff0918b132c name="a" attributes=[#<Nokogiri::XML::Attr:0x3ff0918b1250 name="href" value="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html">, #<Nokogiri::XML::Attr:0x3ff0918b123c name="id" value="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10">, #<Nokogiri::XML::Attr:0x3ff0918b1228 name="target" value="_parent">] children=[#<Nokogiri::XML::Text:0x3ff0918a5b6c "\n ">, #<Nokogiri::XML::Element:0x3ff0918a5360 name="img" attributes=[#<Nokogiri::XML::Attr:0x3ff0918a4d20 name="src" value="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg">, #<Nokogiri::XML::Attr:0x3ff0918a4cbc name="id" value="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10">, #<Nokogiri::XML::Attr:0x3ff0918a4b90 name="border" value="0">, #<Nokogiri::XML::Attr:0x3ff0918a4a28 name="class" value="imageThumbnail">]>, #<Nokogiri::XML::Text:0x3ff091871920 "\n ">]>
If you're after the "href" attribute rather than the whole node, use this instead of inspect:
puts doc.search('.w135 div.playerDiv a').map{ |n| n['href'] }
# >> /sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html

Find control of repeater nested in datalist on repeater item command

In my code, a Repeater is nested in a DataList, and the Repeater contains check boxes and radio buttons.
I want to perform a database operation when a check box's checked state changes, so I wrote that logic in the Repeater's ItemCommand handler. However, I can't find the Repeater's controls there. Please tell me how to find the controls of the Repeater.
My design is like this:
<asp:DataList ID="DatalistQues" runat="server" DataKeyField="QuestionID" OnSelectedIndexChanged="DataListText_SelectedIndexChanged"
Width="100%" OnItemCommand="Repeater1_ItemCommand">
<ItemTemplate>
<table width="100%">
<tr>
<td style="width: 13%">
<asp:LinkButton ID="LinkButton6" runat="server" CommandName="Select" CssClass="ppppp">
<asp:Image ID="Image1" runat="server" ImageUrl='<%# Eval("Image") %>' Height="60px"
Width="65px" />
</asp:LinkButton>
</td>
<td style="width: 87%">
<asp:LinkButton ID="LinkButton7" runat="server" CommandName="Select" CssClass="ppppp">
<asp:Label ID="Label3" runat="server" Text='<%# Eval("name") %>'></asp:Label>
</asp:LinkButton>
</td>
</tr>
<tr>
<td style="width: 13%">
</td>
<td style="width: 87%; white-space: pre-line">
<asp:Label ID="TextBox2" runat="server" Text='<%# Eval("Question") %>'></asp:Label>
</td>
</tr>
<tr>
<td style="width: 13%">
</td>
<td style="width: 87%">
<script type="text/javascript" language="javascript">
function fnCheckUnCheck(objId) {
var grd = document.getElementById("TabContainer1_TabPanel2_DatalistQues");
alert(grd);
//Collect A
var rdoArray = grd.getElementsByTagName("input");
alert(rdoArray);
for (var i = 0; i < rdoArray.length; i++) {
if (rdoArray[i].type == 'radio') {
if (rdoArray[i].id != objId) {
rdoArray[i].checked = false;
}
}
}
}
</script>
<asp:Repeater ID="RepeaterQues" runat="server" OnItemCommand="Repeater1_ItemCommand">
<HeaderTemplate>
</HeaderTemplate>
<ItemTemplate>
<table style="border: none">
<tr>
<td style="width: 100px">
<asp:LinkButton ID="LinkButton23" runat="server" CommandName="radiob">
<asp:RadioButton ID="RadioButton1" runat="server" onclick="fnCheckUnCheck(this.id);" /></asp:LinkButton><asp:LinkButton ID="LinkButton24" runat="server" CommandName="checkb">
<asp:CheckBox ID="CheckBox1" runat="server" Text='<%#Eval("QOption") %>' /></asp:LinkButton>
</td>
<td style="width: 100px">
<asp:Label ID="empname" Text='<%#Eval("QOption") %>' runat="server"></asp:Label>
</td>
</tr>
</table>
</ItemTemplate>
</asp:Repeater>
</td>
</tr>
</table>
</ItemTemplate>
</asp:DataList>

Try this:
((Repeater)e.Item.Parent).ID // this gives the ID as specified on the .aspx page
If you need the page-unique ID that is rendered to the browser, use this instead:
((Repeater)e.Item.Parent).ClientID

In the Repeater's ItemCommand handler, first get the NamingContainer of the sender; it returns the DataListItem.
Then, in that DataListItem, find the Repeater that belongs to that particular DataList item.
Once you have the Repeater, you can easily find the controls in the Repeater's items.

Related

Regex: match number in table cell

BeautifulSoup My extracted data from HTML table does not print out in the same table format. Can i keep the table format

Selenium Python UnboundLocalError: local variable 'element' referenced before assignment

Nokogiri parsing with xpath returns empty string

Find control of repeater nested in datalist on repeater item command
