I want to use regular expression to match the following html table:
<tbody class=\"DocTableBody \">
<tr data-fastRow=\"1\" class=\"DataRow TDRE\">
<td id=\"g-f-1\" class=\"TDC FieldDisabled Field TCLeft CellText g-f\" >
<div class=\"DTC\">
<label id=\"c_g-f-1\" class=\"DCC\" >01-Apr-2015</label>
</div>
</td>
<td id=\"g-g-1\" class=\"TDC FieldDisabled Field TCLeft CellTextHtml g-g\" >
<div class=\"DTC\">
<label id=\"c_g-g-1\" class=\"DCC\" >ACTIVE</label>
</div>
</td>
</tr>
<tr data-fastRow=\"2\" class=\"DataRow TDRO\">
<td id=\"g-f-2\" class=\"TDC FieldDisabled Field TCLeft CellText g-f\" >
<div class=\"DTC\">
<label id=\"c_g-f-2\" class=\"DCC\" >01-Apr-2015</label>
</div>
</td>
<td id=\"g-g-2\" class=\"TDC FieldDisabled Field TCLeft CellTextHtml g-g\" >
<div class=\"DTC\">
<label id=\"c_g-g-2\" class=\"DCC\" >ACTIVE</label>
</div>
</td>
</tr>
</tbody>
I expected to extract the following value:
"1"
01-Apr-2015
ACTIVE
"2"
01-Apr-2015
ACTIVE
I tried the following to extract the value in data-fastRow:
(?sUi)<tr data-fastRow=\\"(\d+)\\".+>.*<\/tr>
But I couldn't extract the nested items in <label.+>(.*)</label> in single regular expression.
Is that possible to extract parent and nested items in single regular expression?
It's a really bad idea to parse HTML with regular expressions.
Each languahe has its own libraries to parse HTML.
In Python for example you have BeautifulSoup.
It's by far much better to use such libraries.
Usually, such libraries has jQuery-Selector-like interface (or something like that), which allows you to find your data with extremely easy queries.
Related
I'm trying to grab the value from the lights node, based on a house number set in a parameter. The problem is, based on certain conditions, houses may be in different row positions.
If the parameter being sent to me for the house number is House237, then how to I get the number of lights located within the row-2-Lights node?
Also, how do I do the same if the next run, the house number is House867? Below is my HTML:
<?xml version='1.0' encoding='utf-8'?>
<table id="neighborhood">
<tr onmouseover="leave('1')">
<td id="row-1-house">
<div class="houseCol">
<a href="#" onClick="goHome('867');return false">
House867
</a>
</div>
</td>
<td id="row-1-Lights">
<div class="decimal">14</div>
</td>
</tr>
<tr onmouseover="leave('2')">
<td id="row-2-house">
<div class="houseCol">
<a href="#" onClick="goHome('237');return false">
House237
</a>
</div>
</td>
<td id="row-2-Lights">
<div class="decimal">12</div>
</td>
</tr>
</table>
You can try the following XPath-1.0 expression. The parameter is the 'HouseXXX' string, the child of the a element.
/table[#id='neighborhood']/tr[td/div[#class='houseCol']/a[normalize-space(text())='House237']]/td[contains(#id,'Lights')]/div[#class='decimal']/text()
The output of this is
12
In this example the parameter is set to 'House237'. How you incorporate the parameter into the XPath expression depends on your usecase scenario.
For example, in XSLT you would replace 'House237' with a variable like $HouseNumber to set the parameter.
I have site and the HTML looks like this:
<tr role="row" class="odd">
<td class="sorting_1">555</td>
<td>
FruitType1 : Fruit1
</td>
<td>Fruit1</td>
<td>FruitType1</td>
<td>Somwhere</td>
<td></td>
<td>0</td>
<td>
<button class="copy_button btn_gray_inverse" id="555">Copy</button>
</td>
<td>
<button class="fruit check_btn" id="555" href="" value="0">
<i class="bt_check"></i>
</button>
</td>
<td>
<a class="fruit remove_btn" id="555" href="#">
<i class="bt_remove">
::before
</i>
</a>
</td>
</tr>
Im trying to click on button (<button class="fruit check_btn" id="555" href="" value="0">) inside this specific row. Rows can be different only in text under tr (Fruit1, Fruit2), with this code and its not working:
FruitList = self.driver.find_elements_by_xpath\
('//tr[#role="row" and contains (., "Fruit1")]')
for Fruit in FruitList:
Enabled = Fruit.find_element_by_xpath('//i[#class="bt_check"]')
Enabled.click()
It allways clicks on button from first awailable row on the page, not the one that is containing text "Fruit1".
Please help
To find the check button for Fruit1, I suggest you to do it in two steps.
First find the row for Fruit1:
fruit_row = driver.find_element_by_xpath("//td[text()='Fruit1']/..")
Note that I use /.. at the end of the Xpath in order to select the tr element that contains the td with text 'Fruit1' instead of the td himself.
Second step, find the button and click on it:
fruit_row.find_element_by_class_name("check_btn").click()
To click on <td> with text as Fruit1, you can use the following line of code :
driver.find_element_by_xpath("//td/a[contains(#href,'/fruit_list/555')]//following::td[text()='Fruit1']")
You can be more generic with :
driver.find_element_by_xpath("//td/a[contains(#href,'/fruit_list/555')]//following::td[1]")
i have a very specific task to achieve with a single regex.
Here's the pattern of the text i have to extract the data from (note i'm parsing HTML-like code, stored in an immutable file) :
<tr>
<td > <a ><img /></a>
</td>
<td > <a ><span >RootData</span></a>
</td>
<td > Data1.1
</td>
<td > <a ><img /></a>
</td>
<td > <a ><span >Data1.2</span></a>
</td>
<td >
</td></tr>
<tr>
<td > Data2.1
</td>
<td > <a ><img /></a>
</td>
<td > <a ><span >Data2.2</span></a>
</td>
<td >
</td></tr>
...
First there's a root contained inside the first "tr". Still inside this one, there's some datq (Data1.1 and Data1.2) to extract.
Then comes a finite number of "tr" block each containing data to extract.
I'd like the matches to be like this :
match 1 : 'RootData' 'Data1.1' 'Data1.2'
match 2 : 'RootData' 'Data2.1' 'Data2.2'
etc
So far i see what to do with 2 regex and 2 loops (like 1 searching for the Root, and the other to find all datas from this root) but i'd like it to be in a single regex.
If some of you already encountered that and could help, that'd be nice :)
Thanks in advance.
If I understand you correctly, you'd like to have a single regular expression provide more than one match for the same input. Regular expressions do not work that way, and are probably just not the right tool for the problem you're trying to solve.
I have the following HTML structure. A table with rows, the 2nd column has the text displayed e.g. Title (in row 1), FName (in row 2), SNAME (in row3), GENDER etc.
The 3rd column has a checkbox for each row.
I am trying to select a particular checkbox. E.g. my method will accept a parameter (the name of the text value in the row e.g. TITLE).
The method will select the checkbox for TITLE.
When i call the method again with parameter FNAME, the checkbox for FNAME will be clicked.
I am using Selenium Webdriver with Python
I have tried the following XPATH to identify the checkbox:
//span[#title="TITLE" and contains(text(), "TITLE")]/following-sibling::*
//span [text()="TITLE"]/../../preceding-sibling::td/div/input[#type="checkbox"]
These do not find the checkbox for the row called TITLE
I can get to the TITLE with the following XPATH.
//span [text()="TITLE"]
My code snippet is:
def add_mapping2(self, name):
try:
checkbox = self.driver.find_element(By.XPATH, '//span [text()="+name+"]/../../preceding-sibling::td/div/input[#type="checkbox"]')
checkbox.click()
except NoSuchElementException, e:
return False
return True
From my unittest.Testcase class I call the method as follows:
class MappingsPage_TestCase(BaseTestCase):
def test_add_mappings(self):
mappingsPage = projectNavigator.select_projectNavigator_item("Mappings")
mappingsPage.add_mapping2("TITLE")
mappingsPage.add_mapping2("SNAME")
The HTML is:
<table id="data_configuration_edit_mapping_tab_mappings_ct_mapping_body" cellspacing="0" style="table-layout: fixed; width: 100%;">
<colgroup>
<tbody>
<tr class="GOFU2OVFG" __gwt_subrow="0" __gwt_row="0">
<tr class="GOFU2OVEH GOFU2OVGH GOFU2OVPG GOFU2OVMG" __gwt_subrow="0" __gwt_row="1">
<td class="GOFU2OVEG GOFU2OVFH GOFU2OVHG GOFU2OVHH GOFU2OVAH GOFU2OVNG">
<td class="GOFU2OVEG GOFU2OVFH GOFU2OVHH GOFU2OVAH GOFU2OVNG">
<div __gwt_cell="cell-gwt-uid-792" style="outline-style:none;">
<span title="TITLE" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;padding-right: 1px;">TITLE</span>
</div>
</td>
<td class="GOFU2OVEG GOFU2OVFH GOFU2OVHH GOFU2OVBH GOFU2OVOG GOFU2OVAH GOFU2OVNG">
<div __gwt_cell="cell-gwt-uid-793" style="outline-style:none;" tabindex="0">
<input type="checkbox" checked="" tabindex="-1"/>
</div>
</td>
</tr>
<tr class="GOFU2OVFG" __gwt_subrow="0" __gwt_row="2">
<td class="GOFU2OVEG GOFU2OVGG GOFU2OVHG">
<div __gwt_cell="cell-gwt-uid-791" style="outline-style:none;">
<input type="radio" name="rbCrossRow124"/>
</div>
</td>
<td class="GOFU2OVEG GOFU2OVGG">
<div __gwt_cell="cell-gwt-uid-792" style="outline-style:none;">
<span class="" title="FNAME" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;padding-right: 1px;">FNAME</span>
</div>
</td>
<td class="GOFU2OVEG GOFU2OVGG GOFU2OVBH">
<div __gwt_cell="cell-gwt-uid-793" style="outline-style:none;">
<input type="checkbox" tabindex="-1"/>
</div>
</td>
</tr>
<tr class="GOFU2OVEH" __gwt_subrow="0" __gwt_row="3">
<tr class="GOFU2OVFG" __gwt_subrow="0" __gwt_row="4">
<tr class="GOFU2OVEH" __gwt_subrow="0" __gwt_row="5">
<tr class="GOFU2OVFG" __gwt_subrow="0" __gwt_row="6">
more rows with names with checkboxes etc......
</tbody>
</table>
What XPATH could I use to get the checkbox for TITLE, FNAME etc?
I have the table ID "data_configuration_edit_mapping_tab_mappings_ct_mapping_body"
Maybe there is a way to start from the table ID and use a for loop to iterate through the rows and find the particular checkbox?
Thanks.
Riaz
you would use the following xpath expression
String xpath = "//span[#title = 'TITLE']/ancestor::tr[1]//input[#type = 'checkbox']"
What it does:
first search for a span element with your parameter (pls change 'TITLE' to the variable you are using)
then find the first ancestor element that is a tr-element
from there find the input element that is your checkbox within this tr-element
You could then refine to sth like this:
WebElement table = driver.findElement(By.id("data_configuration_edit_mapping_tab_mappings_ct_mapping_body"));
WebElement checkbox = table.findElement(By.xpath(xpath));
For the interest of others whom come across a similar problem. My Python code which I have used for the above answer is:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
def add_mapping(self, name):
wait = WebDriverWait(self.driver, 10)
try:
checkbox = wait.until(EC.element_to_be_clickable((By.XPATH, '//span[#title = "%s"]/ancestor::tr[1]//input[#type = "checkbox"]' % name)))
#table_id = self.driver.find_element(wait.until(EC.element_to_be_clickable(By.ID, 'data_configuration_edit_mapping_tab_mappings_ct_mapping_body')))
#checkbox = table_id.find_element(By.XPATH, '//span[#title = "TITLE"]/ancestor::tr[1]//input[#type = "checkbox"]')
checkbox.click()
except NoSuchElementException, e:
return False
return True
The table id i commented out also works.
I am automating the test for web application. I have a scenario for creating an admin, for which i have to enter the name, email address and phone number text boxes. But ids of this text boxes are dynamic.
userName, id='oe-field-input-41'
Email, id='oe-field-input-42'
phone number, id='oe-field-input-43'
First Query:
The numbers in the ids are dynamic, it keep changes
I tired to use the xpath for handling the dynamic value.
xpath = //*[starts-with(#id,'oe-field-input-')]
In this it enter the text into first text box successfully
Second Query:
I am not able use the same xpath for next two text boxes, as it enters the email and phone number into name field only
Please help me to resolve this dynamic value handling.
Edited: added the html code,
<table class="oe_form_group " cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_many2one oe_form_field_with_button">
<a class="oe_m2o_cm_button oe_e" tabindex="-1" href="#" draggable="false" style="display: inline;">/</a>
<div>
</span>
</td>
</tr>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_email">
<div>
<input id="oe-field-input-35" type="text" maxlength="240">
</div>
</span>
</td>
</tr>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_char">
<input id="oe-field-input-36" type="text" maxlength="32">
</span>
</td>
</tr>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_char">
<input id="oe-field-input-37" type="text" maxlength="32">
</span>
</td>
</tr>
<tr class="oe_form_group_row">
</tbody>
you can try alternate way for locating unique element by label or so. For example:
css=.oe_form_group_row:contains(case_sensitive_text) input
xpath=//tr[#class = 'oe_form_group_row'][contains(.,'case_sensitive_text')]//input
If you are using ISFW you should create custom component for such form fields.
You do have some classes which are good for identification, e.g. oe_form_field_email, oe_form_field_char. It's a little complicated to use them because they're not on the input fields themselves, and the second one is not unique; but it's quite possible:
.//span[contains(#class, 'oe_form_field_email')]//input
That is an xpath which identifies the Email field as being the input which is a descendant of a span with the oe_form_field_email class. You could also use the same logic in a css selector like this, more efficiently:
span.oe_form_field_email input
For the two other fields, there is no unique class which can tell them apart so you're going to have to rely on the order (I'm assuming username comes before phone number), and that means you have to use xpaths:
(//tr//span[contains(#class, 'oe_form_field_char')])[1]//input
(//tr//span[contains(#class, 'oe_form_field_char')])[2]//input
Those xpaths pick out the first and second fields respectively, which are inputs which are descendants of a span of class oe_form_field_char.
P.S. I used Firepath in firefox to verify the xpath and css locators.
The problem here is, that your XPath does the correct selection, but Selenium will always pick the first one if multiple results are returned for your query.
You can select each of the input fields directly by using:
//input[1]
//input[2]
//input[3]
If there are other input fields, you can tighten your selection by selecting only input nodes with oe-field-input in their id attribute like this:
//input[starts-with(#id,'oe-field-input-')][1]
//input[starts-with(#id,'oe-field-input-')][2]
//input[starts-with(#id,'oe-field-input-')][3]
Use the following xpath works like a charm. Although I don't recommend this kind of an xpath. Since we don't have text against the text box no other choice.
//div/input[contains(#id, 'oe-field-input')] - First text box
//tr[#class = 'oe_form_group_row'][2]//input - Second text box
//tr[#class = 'oe_form_group_row'][3]//input - Third text box
You can use below XPATH.
//tr[#class = 'oe_form_group_row'][2]//input for First Text box
//tr[#class = 'oe_form_group_row'][3]//input for Second Text box
//tr[#class = 'oe_form_group_row'][4]//input for Third text box.
I have tested avove xpath.
But the better way if you have development access then ask developers to make is standaralized and recommand tags like "name" , "value", or attach text e.g. Email:, Password. So you can use these in your xpath.