selenium without display missing content - django

I am using Django 1.10, Selenium 3.4, and Python 3.6.
I use my PC to check that my code works, using the Chrome WebDriver.
The production service runs on an Amazon EC2 Ubuntu instance, also using the Chrome WebDriver but without a display, like below.
from pyvirtualdisplay import Display
display = Display(visible=0, size=(1024, 768))
display.start()
I am using Selenium to find a piece of information that resides in HTML like the code below.
<div id="myid">
<table>
<tbody>
<tr>...</tr>
<tr>
<td>...</td>
<td ...>
<div ...>
<div ...>
<table ...>
<tbody>
<tr>...</tr>
<tr ...>
<td ...></td>
<td ...></td>
<td ...></td>
<td ...></td>
<td ...></td>
<td ...></td>
<td ...>This is what I am searching for</td>
<td ...></td>
<td ...></td>
..........
Unfortunately, there are few ids or unique classes that I can use for searching.
So I am using this XPath:
driver.find_element_by_xpath("//div[@id='treegrid_QUICKSEARCH_TABLE']/table/tbody/tr[2]/td[2]/div/div[1]/table/tbody/tr[2]/td[7]").text
I can find the element I want on my PC with a visible browser,
but not on my EC2 instance without a display.
When I check its parent row:
driver.find_element_by_xpath("//div[@id='treegrid_QUICKSEARCH_TABLE']/table/tbody/tr[2]/td[2]/div/div[1]/table/tbody/tr[2]").text
I found that Selenium is indeed not getting the text I am searching for.
I even tried increasing the Display size to (1920, 1920),
but Selenium still misses only that text.
Once again, the code works just fine on my PC with a visible browser.
Can someone tell me what the problem is?
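One way to narrow this down is to confirm that the positional path itself resolves against the markup, independent of any browser. The sketch below uses Python's standard xml.etree.ElementTree on a hypothetical, well-formed reconstruction of the snippet above (the outer id "myid" comes from the snippet, not from the real page, and the elided cells are left empty). If the path finds the cell here, the path is probably not the problem, and the difference lies in the page as the headless browser renders it.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed reconstruction of the question's markup: the "..."
# attributes are dropped and the elided cells are left empty.
SAMPLE = """<div id="myid"><table><tbody>
<tr><td/></tr>
<tr>
  <td/>
  <td><div><div><table><tbody>
    <tr><td/></tr>
    <tr><td/><td/><td/><td/><td/><td/>
        <td>This is what I am searching for</td><td/><td/></tr>
  </tbody></table></div></div></td>
</tr>
</tbody></table></div>"""

# Same steps as the question's XPath. ElementTree supports only an XPath
# subset: paths must be relative (".//" rather than "//"), but [@id='...']
# and 1-based positional predicates such as tr[2] work.
PATH = ".//div[@id='myid']/table/tbody/tr[2]/td[2]/div/div[1]/table/tbody/tr[2]/td[7]"


def cell_text(markup, path=PATH):
    # Wrap in a dummy root so that ".//" can also match the outer <div>.
    root = ET.fromstring("<root>" + markup + "</root>")
    # findtext returns the text of the first match, or None if nothing matches.
    return root.findtext(path)


print(cell_text(SAMPLE))
```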

Related

XPath - Retrieving text value when condition contains a tag

I have a section of a table and I am trying to get the value "Distributor 10":
<table class="d">
<tr>
<td class="ah">supplier<td>
<td class="ad">
Supplier 10
</td>
</tr>
<tr>
<td class="ah">distributor<pre><td>
<td class="ad">
Distributor 10
</td>
</tr>
</table>
If I am within Chrome Developer, I get this value by using the following xpath string
//tr/td[text()="distributor"]/following-sibling::td[@class="ad"]/a/text()
But when I code this in Python, it returns an empty list. From what I can see, it is because of the <pre> tag next to "distributor".
When I change the XPath above to look for "supplier" instead of "distributor", it works perfectly well.
Any suggestions would be welcome.
Assuming you're using lxml, you can use one of the following XPath expressions to get this working:
//tr[contains(.,"distributor")]//a/text()
//a[parent::td[@class="ad"] and starts-with(@href,"/D")]/text()
Piece of code:
from lxml import etree
from io import StringIO
html = '''<table class="d">
<tr>
<td class="ah">supplier<td>
<td class="ad">
Supplier 10
</td>
</tr>
<tr>
<td class="ah">distributor<pre><td>
<td class="ad">
Distributor 10
</td>
</tr>
</table>'''
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html), parser)
data = tree.xpath('//tr[contains(.,"distributor")]//a/text()')
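print(data)
Output on the real page, where (as the /a/text() steps indicate) the value sits inside an <a> link in the td.ad cell: ['Distributor 10']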
Alternative: use the lxml HTML Cleaner class (its remove_tags option) to remove the pre element from your page.
References:
https://lxml.de/api/lxml.html.clean.Cleaner-class.html
https://lxml.de/lxmlhtml.html#cleaning-up-html
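For comparison, here is a minimal sketch that depends on neither lxml nor XPath, using only the standard library's html.parser (a different technique from the answer above). It treats every <td> start tag as a new cell, so the unclosed cells and the stray <pre> cannot merge two cells together, collects each row's cells as (class, text) pairs, and returns the text of the class="ad" cell in the row whose class="ah" label matches. It runs on the markup exactly as shown in the question, where the value is plain text rather than inside an <a>. RowCollector and value_for_label are hypothetical names.

```python
from html.parser import HTMLParser

# The markup exactly as shown in the question (stray <td>/<pre> tags included).
HTML = """<table class="d">
<tr>
<td class="ah">supplier<td>
<td class="ad">
Supplier 10
</td>
</tr>
<tr>
<td class="ah">distributor<pre><td>
<td class="ad">
Distributor 10
</td>
</tr>
</table>"""


class RowCollector(HTMLParser):
    """Collect table rows as lists of (class, text) cells."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows
        self._row = None  # cells of the row being read
        self._cell = None  # [class, text parts] of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            # Every <td> opens a new cell, even if the previous one was never closed.
            self._cell = [dict(attrs).get("class"), []]
            self._row.append(self._cell)
        # Other tags such as <pre> are ignored; only their text is kept.

    def handle_endtag(self, tag):
        if tag == "td":
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append([(cls, "".join(parts).strip())
                              for cls, parts in self._row])
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[1].append(data)


def value_for_label(markup, label):
    """Return the td.ad text of the row whose td.ah label equals label."""
    parser = RowCollector()
    parser.feed(markup)
    parser.close()
    for row in parser.rows:
        labels = [text.lower() for cls, text in row if cls == "ah"]
        if label.lower() in labels:
            for cls, text in row:
                if cls == "ad":
                    return text
    return None


print(value_for_label(HTML, "distributor"))
```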

regular expression: what's wrong with my expression?

I'm having difficulty building a regex.
Suppose there is an HTML snippet like the one below.
I want to use JavaScript to extract each <tbody> block that contains the "apple" link (the <a> is inside <td class="by">).
I constructed the following expression:
/<tbody.*?text[\s\S]*?<td class="by"[\s\S]*?<a.*?>apple<\/a>[\s\S]*?<\/tbody>/g
But the result is not what I wanted: each match contains more than one <tbody> block. How should it be written? Regards!
(I tested it with https://regex101.com/ and got the unexpected selection. Sorry, I can't figure out the problem.)
<tbody id="text_0">
<td class="by">
...lots of other tags
cat
...lots of other tags
</td>
</tbody>
<tbody id="text_1">
...lots of other tags
<td class="by">
apple
</td>
...lots of other tags
</tbody>
<tbody id="text_2">
...lots of other tags
<td class="by">
cat
</td>
...lots of other tags
</tbody>
<tbody id="text_3">
...lots of other tags
<td class="by">
...lots of other tags
tiger
</td>
...lots of other tags
</tbody>
<tbody id="text_4">
<td class="by">
banana
</td>
</tbody>
<tbody id="text_5">
<td class="by">
peach
</td>
</tbody>
<tbody id="text_6">
<td class="by">
apple
</td>
</tbody>
<tbody id="text_7">
<td class="by">
banana
</td>
</tbody>
And this is what I expect to get:
<tbody id="text_1">
<td class="by">
apple
</td>
</tbody>
<tbody id="text_6">
<td class="by">
apple
</td>
</tbody>
This is not an answer to the regex part of the question, but shouldn't the td elements be embedded in tr elements? tr stands for "table row", while tbody stands for "table body". tbody usually groups the table rows. It is not prohibited to have more than one tbody in the same table, but it is usually not necessary. (tbody is actually optional; you can have tr directly inside the table element.)
First, regex is not a good tool for parsing HTML, XML, or anything like them.
I can fix your pattern to work with this specific example, but I can't guarantee that it will work in all cases. Regex just is not the right tool for the job.
But anyway, replace the first two instances of [\s\S] in your pattern with [^<]:
/<tbody.*?text[^<]*?<td class="by"[^<]*?<a.*?>apple<\/a>[\s\S]*?<\/tbody>/g
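Why the original pattern over-matches: [\s\S]*? is lazy but not bounded. The engine tries each <tbody in turn, and once a match has started at an earlier block, the lazy token keeps extending across </tbody> into later blocks until it reaches an "apple". A regex-only alternative (not the fix suggested above) is a "tempered" token that refuses to step over a closing tag. The sketch below uses Python's re on a trimmed copy of the markup as shown in the question, where "apple" is plain cell text rather than inside an <a>; the lookahead, [\s\S] and \b constructs behave the same way in JavaScript, where the / in </tbody> must be written <\/tbody> inside a regex literal. It is still not an HTML parser.

```python
import re

# Trimmed copy of the markup shown in the question (five of the eight blocks).
SAMPLE = """<tbody id="text_0">
<td class="by">
...lots of other tags
cat
...lots of other tags
</td>
</tbody>
<tbody id="text_1">
...lots of other tags
<td class="by">
apple
</td>
...lots of other tags
</tbody>
<tbody id="text_2">
...lots of other tags
<td class="by">
cat
</td>
...lots of other tags
</tbody>
<tbody id="text_6">
<td class="by">
apple
</td>
</tbody>
<tbody id="text_7">
<td class="by">
banana
</td>
</tbody>"""

# Tempered tokens: consume one character at a time, but only where the given
# closing tag does not start, so a match cannot leak past that tag.
NOT_PAST_TBODY = r"(?:(?!</tbody>)[\s\S])*?"
NOT_PAST_TD = r"(?:(?!</td>)[\s\S])*?"

APPLE_BLOCK = re.compile(
    r"<tbody\b[^>]*>" + NOT_PAST_TBODY    # block start, then anything inside it
    + r'<td class="by">' + NOT_PAST_TD    # the "by" cell, without leaving it
    + r"\bapple\b"                        # the word we are looking for
    + NOT_PAST_TBODY + r"</tbody>"        # the rest of the same block
)

matches = APPLE_BLOCK.findall(SAMPLE)
ids = [re.search(r'id="([^"]+)"', m).group(1) for m in matches]
print(ids)  # ['text_1', 'text_6']
```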
Start with this working regexp and go from there:
/<a href="(.*?)">apple<\/a>/g
If that is too broad and you want to make it more specific, add the next surrounding tag:
/<td.*?>\s*<a href="(.*?)">apple<\/a>/g
Then continue:
/<tbody.*?>\s*<td.*?>\s*<a href="(.*?)">apple<\/a>/g
Also, consider an alternative such as XPath. Regular expressions can't reliably parse all variations of HTML.

html parsing using jsoup and coldfusion

This is a continuation of my previous question. Below is the script I am trying to build to parse HTML like the example below. I am getting the error "Value must be initialised before use." I am not able to attach the full error.
I have to make an HTTP call using jsoup where I need to provide a username and password for the server login. Is the code below the right way to do it? I looked at the Bennals blog for HTML parsing using jsoup.
I have this in my Application.cfc:
component {
this.name = "jsoupTest";
this.javaSettings = {loadPaths=["/jsoup/jsoup-1.7.3.jar"], loadColdFusionClassPath=true};
}
Example of the HTML to be parsed:
Note that there are at least 5000 rows like the ones below, and I need to parse them and extract only the text from each TD.
<tbody>
<tr>
<td class="drpdetailtablerowdetailleft">Robert M Best Jr.</td>
<td class="drpdetailtablerowdetailleft">AAI</td>
<td class="drpdetailtablerowdetail">7948</td>
<td class="drpdetailtablerowdetail">1</td>
<td class="drpdetailtablerowdetail">MC</td>
<td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td>
</tr>
</tbody>
<tbody>
<tr>
<td class="drpdetailtablerowdetailleft">Robert M Best Jr.</td>
<td class="drpdetailtablerowdetailleft">ABWS</td>
<td class="drpdetailtablerowdetail">4884</td>
<td class="drpdetailtablerowdetail">4</td>
<td class="drpdetailtablerowdetail">NMC</td>
<td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td> <td class="drpdetailtablerowdetail">Compliant</td>
</tr>
</tbody>
Updated code to be used:
<cfhttp url="https://intra.att.com/itscmetrics/EM2/LTMR.cfm" method="get" username="abc" password="zxyr">
<cfhttpparam type="url" name="LTMX" value="Andre Fuetsch / Shelly K Lazzaro">
</cfhttp>
<cfset jsoup = createObject("java", "org.jsoup.Jsoup") />
<cfset document = jsoup.parse(myPage.filecontent) />
<cfset content = doc.getElementById("contentwrapper")>
<!--- Let's see what we got. --->
<cfdump var="#content#" />
The myPage variable is used for the first time in your parse command, without ever having been set.
I think you need to add result="myPage" to your cfhttp call.
<cfhttp result="myPage" url="https://intra.att.com/itscmetrics/EM2/LTMR.cfm" method="get" username="abc" password="zxyr">
It looks like it is not working because you have not called the constructor on the Jsoup class.
Try changing the line to:
var jSoupClass = createObject( "java", "org.jsoup.Jsoup" ).init(); // note calling init calls the constructor for the Java class
Did you install your jar file correctly?
ColdFusion searches for the objects in the following order:
The ColdFusion Java Dynamic Class Load directories:
Java archive (.jar) files in web_root/WEB-INF/lib
Class (.class) files in web_root/WEB-INF/classes
Quoted from: About ColdFusion, Java, and J2EE
So copy your jar file to web_root/WEB-INF/lib, restart CF, and try again.
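Separately from the ColdFusion errors, the extraction step itself (the text of every TD, row by row) is only a few lines in most DOM APIs; in jsoup it would roughly mean selecting the rows with select("tr") and calling text() on each td. As an illustration of that logic, here is a minimal sketch with Python's standard xml.etree.ElementTree, run on the two sample rows from the question with the repeated "Compliant" cells trimmed to one. It assumes the markup is well-formed, as the sample is; real pages often are not, which is what a lenient HTML parser such as jsoup is for. rows_as_text is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The two sample rows from the question, wrapped in <table> so they parse as
# one document; the repeated "Compliant" cells are trimmed to one per row.
SAMPLE = """<table>
<tbody><tr>
<td class="drpdetailtablerowdetailleft">Robert M Best Jr.</td>
<td class="drpdetailtablerowdetailleft">AAI</td>
<td class="drpdetailtablerowdetail">7948</td>
<td class="drpdetailtablerowdetail">1</td>
<td class="drpdetailtablerowdetail">MC</td>
<td class="drpdetailtablerowdetail">Compliant</td>
</tr></tbody>
<tbody><tr>
<td class="drpdetailtablerowdetailleft">Robert M Best Jr.</td>
<td class="drpdetailtablerowdetailleft">ABWS</td>
<td class="drpdetailtablerowdetail">4884</td>
<td class="drpdetailtablerowdetail">4</td>
<td class="drpdetailtablerowdetail">NMC</td>
<td class="drpdetailtablerowdetail">Compliant</td>
</tr></tbody>
</table>"""


def rows_as_text(markup):
    """Return one list of stripped cell texts per <tr>, in document order."""
    root = ET.fromstring(markup)
    return [[(td.text or "").strip() for td in tr.findall("td")]
            for tr in root.iter("tr")]


for row in rows_as_text(SAMPLE):
    print(row)
```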

How to handle dynamically changing ids with a similar starting name using WebDriver

I am automating tests for a web application. I have a scenario for creating an admin, in which I have to fill in the name, email address, and phone number text boxes. But the ids of these text boxes are dynamic.
userName, id='oe-field-input-41'
Email, id='oe-field-input-42'
phone number, id='oe-field-input-43'
First Query:
The numbers in the ids are dynamic; they keep changing.
I tried to use an XPath to handle the dynamic value:
xpath = //*[starts-with(@id,'oe-field-input-')]
With this, it enters the text into the first text box successfully.
Second Query:
I am not able to use the same XPath for the next two text boxes, as it enters the email and phone number into the name field only.
Please help me to resolve this dynamic value handling.
Edit: added the HTML code:
<table class="oe_form_group " cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_many2one oe_form_field_with_button">
<a class="oe_m2o_cm_button oe_e" tabindex="-1" href="#" draggable="false" style="display: inline;">/</a>
<div>
</span>
</td>
</tr>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_email">
<div>
<input id="oe-field-input-35" type="text" maxlength="240">
</div>
</span>
</td>
</tr>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_char">
<input id="oe-field-input-36" type="text" maxlength="32">
</span>
</td>
</tr>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_char">
<input id="oe-field-input-37" type="text" maxlength="32">
</span>
</td>
</tr>
<tr class="oe_form_group_row">
</tbody>
You can try an alternative way of locating a unique element, for example by its label text:
css=.oe_form_group_row:contains(case_sensitive_text) input
xpath=//tr[@class = 'oe_form_group_row'][contains(.,'case_sensitive_text')]//input
If you are using ISFW, you should create a custom component for such form fields.
You do have some classes which are good for identification, e.g. oe_form_field_email, oe_form_field_char. It's a little complicated to use them because they're not on the input fields themselves, and the second one is not unique; but it's quite possible:
.//span[contains(@class, 'oe_form_field_email')]//input
That is an xpath which identifies the Email field as being the input which is a descendant of a span with the oe_form_field_email class. You could also use the same logic in a css selector like this, more efficiently:
span.oe_form_field_email input
For the two other fields, there is no unique class which can tell them apart so you're going to have to rely on the order (I'm assuming username comes before phone number), and that means you have to use xpaths:
(//tr//span[contains(@class, 'oe_form_field_char')])[1]//input
(//tr//span[contains(@class, 'oe_form_field_char')])[2]//input
Those xpaths pick out the first and second fields respectively, which are inputs which are descendants of a span of class oe_form_field_char.
P.S. I used Firepath in firefox to verify the xpath and css locators.
The problem here is that your XPath does the correct selection, but Selenium's find_element_by_xpath will always pick the first one if multiple results are returned for your query.
You can select each of the input fields directly by using:
(//input)[1]
(//input)[2]
(//input)[3]
If there are other input fields, you can tighten your selection by selecting only input nodes whose id starts with oe-field-input-, like this:
(//input[starts-with(@id,'oe-field-input-')])[1]
(//input[starts-with(@id,'oe-field-input-')])[2]
(//input[starts-with(@id,'oe-field-input-')])[3]
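Positional predicates are easy to get wrong: //input[2] means "an input that is the second input child of its parent", not "the second input in the page", so on this form, where every input has its own parent, it matches nothing; (//input)[2] is the form that counts in document order. Python's standard xml.etree.ElementTree uses the same sibling-counting rule for [n], so it is a quick offline way to see the difference. It has no starts-with() or grouping, so the document-order version is written as a filter plus list indexing (in Selenium the analogue is find_elements_by_xpath, plural, which returns every match). FORM is a trimmed, well-formed stand-in for the HTML above, and input_in_span is a hypothetical helper mirroring the span-class answer.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the form in the question: each input sits
# in its own row and span; the ids are the sample values from the HTML above.
FORM = """<table><tbody>
<tr><td><span class="oe_form_field oe_form_field_email"><div>
<input id="oe-field-input-35" type="text"/></div></span></td></tr>
<tr><td><span class="oe_form_field oe_form_field_char">
<input id="oe-field-input-36" type="text"/></span></td></tr>
<tr><td><span class="oe_form_field oe_form_field_char">
<input id="oe-field-input-37" type="text"/></span></td></tr>
</tbody></table>"""

root = ET.fromstring(FORM)

# Like XPath //input[n], ElementTree's [n] counts same-named siblings. Every
# input is alone in its parent, so [1] matches all three and [2] matches none.
first_in_parent = root.findall(".//input[1]")
second_in_parent = root.findall(".//input[2]")

# XPath (//input[starts-with(@id,'oe-field-input-')])[2] means "the second
# match in document order"; here that is a filter plus list indexing.
dynamic = [e for e in root.iter("input")
           if e.get("id", "").startswith("oe-field-input-")]
second_overall = dynamic[1]


def input_in_span(tree, css_class, n=1):
    """Return the input inside the n-th span whose class list has css_class."""
    spans = [s for s in tree.iter("span")
             if css_class in s.get("class", "").split()]
    return next(spans[n - 1].iter("input"))


print(len(first_in_parent), len(second_in_parent), second_overall.get("id"))
```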
The following XPaths work like a charm, although I don't recommend this kind of XPath. Since there is no label text next to the text boxes, there is no other choice.
//div/input[contains(@id, 'oe-field-input')] - First text box
//tr[@class = 'oe_form_group_row'][2]//input - Second text box
//tr[@class = 'oe_form_group_row'][3]//input - Third text box
You can use the XPaths below.
//tr[@class = 'oe_form_group_row'][2]//input for the first text box
//tr[@class = 'oe_form_group_row'][3]//input for the second text box
//tr[@class = 'oe_form_group_row'][4]//input for the third text box
I have tested the XPaths above.
But the better way, if you have development access, is to ask the developers to standardize the markup and add attributes such as "name" or "value", or label text such as "Email:" or "Password:", so that you can use these in your XPath.

Scraping No Link

I'm learning XPath and web scraping using django-dynamic-scraper, aka DDS (Django + Scrapy), and I'm trying to retrieve data from a website with the following markup:
<tr valign="top">
<td align="center" valign="top">
<p><img src="someimage.jpg"></p>
</td>
<td>
</td>
<td>
<div align="left">
<span class="style1">
<strong>Title1</strong>
</span>
<span class="style2">Title2:</span>ContentA<br />
<span class="style2">Title3:</span>ContentB<br />
<span class="style2">Title4:</span>ContentC<br />
</div>
</td>
</tr>
My questions:
What's the XPath for a DDS URL object if there's no link in that markup?
What's the XPath to retrieve the image file if the first <td> has no class?
How do I retrieve ContentA, ContentB, and ContentC separately if the spans all have the same class?
What's the XPath for a DDS URL object if there's no link in that markup?
I don't understand the question; could you please explain?
What's the XPath to retrieve the image file if the first <td> has no class?
//tr[1]/td[1]//img/@src
How do I retrieve ContentA, ContentB, and ContentC separately if the spans all have the same class?
//text()[preceding-sibling::span[@class="style2"]]
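Python's standard xml.etree.ElementTree has no text() node test or preceding-sibling axis, but the same idea can be sketched with it: the text that follows a span, which is what that XPath selects, is stored on the span as its .tail. A minimal sketch on the row from the question, made well-formed by self-closing the <img>; it also shows the image lookup, written relative to the <tr>. Note that the XPath also returns the whitespace-only text after each <br />, which this sketch avoids by reading only the span tails. ROW, image_src and fields are illustrative names.

```python
import xml.etree.ElementTree as ET

# The row from the question, made well-formed for the XML parser
# (the <img> tag is self-closed).
ROW = """<tr valign="top">
<td align="center" valign="top"><p><img src="someimage.jpg"/></p></td>
<td></td>
<td><div align="left">
<span class="style1"><strong>Title1</strong></span>
<span class="style2">Title2:</span>ContentA<br />
<span class="style2">Title3:</span>ContentB<br />
<span class="style2">Title4:</span>ContentC<br />
</div></td>
</tr>"""

root = ET.fromstring(ROW)  # root is the <tr> element itself

# Equivalent of //tr[1]/td[1]//img/@src, written relative to the <tr> root.
image_src = root.find("./td[1]//img").get("src")

# The text node right after each span.style2 is that span's .tail.
fields = {span.text.rstrip(":"): (span.tail or "").strip()
          for span in root.iter("span")
          if span.get("class") == "style2"}

print(image_src, fields)
```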