PhantomJS with selenium doesn't scroll to bottom - python-2.7

With the Firefox driver, the code below works (it scrolls to the bottom of the page), but not with the PhantomJS webdriver. The page has infinite scroll, so I need to scroll down to gather more information. Kindly help me identify why this doesn't work with PhantomJS.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time

driver = webdriver.PhantomJS()
driver.maximize_window()
driver.get("http://www.betpawa.co.ke/upcoming")
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.events-wrapper")))
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(2)
soup = BeautifulSoup(driver.page_source.encode('utf-8'), "html.parser")
print len(soup.findAll("div", {"class": "prematch"}))
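Two PhantomJS quirks are likely at play here: maximize_window() has little effect on the headless viewport (driver.set_window_size(1366, 768) is the usual substitute), and a single scrollTo can fire before the AJAX content arrives. For an infinite-scroll page it is more robust to keep scrolling until the document height stops growing. A sketch, written against any driver object that exposes execute_script:

```python
import time

def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom repeatedly until document.body.scrollHeight
    stops growing, i.e. the infinite scroll has run out of new items."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the AJAX call time to append new rows
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new loaded; this is the real bottom
        last_height = new_height
    return last_height
```

Calling scroll_until_stable(driver) in place of the single execute_script/sleep pair should collect every batch of events, with PhantomJS or Firefox alike.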

Related

Python, Why is selenium script running so slow [duplicate]

Selenium's driver.get(url) waits until the page has fully loaded. But the page I am scraping tries to load a dead JS script, so my Python script waits for it and does nothing for several minutes. This problem can occur on every page of the site.
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.cortinadecor.com/productos/17/estores-enrollables-screen/estores-screen-corti-3000')
# It tries to load: https://www.cetelem.es/eCommerceCalculadora/resources/js/eCalculadoraCetelemCombo.js
driver.find_element_by_name('ANCHO').send_keys("100")
How can I limit the wait time, block the AJAX load of that file, or is there another way?
I am testing my script with webdriver.Chrome(), but I will use PhantomJS() or possibly Firefox(). So if a method relies on changing browser settings, it must be universal.
By default, when Selenium loads a page/URL it uses a configuration with pageLoadStrategy set to normal. To keep Selenium from waiting for the full page load, we can configure pageLoadStrategy. pageLoadStrategy supports three values:
normal (full page load)
eager (interactive)
none
Here is the code block to configure the pageLoadStrategy :
Firefox :
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities().FIREFOX
caps["pageLoadStrategy"] = "normal" # complete
#caps["pageLoadStrategy"] = "eager" # interactive
#caps["pageLoadStrategy"] = "none"
driver = webdriver.Firefox(desired_capabilities=caps, executable_path=r'C:\path\to\geckodriver.exe')
driver.get("http://google.com")
Chrome :
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities().CHROME
caps["pageLoadStrategy"] = "normal" # complete
#caps["pageLoadStrategy"] = "eager" # interactive
#caps["pageLoadStrategy"] = "none"
driver = webdriver.Chrome(desired_capabilities=caps, executable_path=r'C:\path\to\chromedriver.exe')
driver.get("http://google.com")
Note: the pageLoadStrategy values normal, eager, and none are required by the WebDriver W3C Editor's Draft, but the eager value is still a work in progress within the ChromeDriver implementation. You can find a detailed discussion in “Eager” Page Load Strategy workaround for Chromedriver Selenium in Python.
undetected Selenium's answer works well, but the Chrome part did not work for me; use the code below for Chrome instead:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

options = Options()
capa = DesiredCapabilities.CHROME
capa["pageLoadStrategy"] = "none"
browser = webdriver.Chrome(desired_capabilities=capa, executable_path='PATH', options=options)
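An alternative that does not touch capabilities at all is a hard page-load budget via driver.set_page_load_timeout(), which behaves the same across Firefox, Chrome, and PhantomJS. A driver-agnostic sketch (in real Selenium code you would pass selenium.common.exceptions.TimeoutException as timeout_exceptions):

```python
def get_with_budget(driver, url, timeout, timeout_exceptions=(Exception,)):
    """Load url with a hard page-load timeout; if the load times out,
    stop any outstanding requests and continue with what has rendered."""
    driver.set_page_load_timeout(timeout)
    try:
        driver.get(url)
    except timeout_exceptions:
        # the partially loaded DOM is usually still usable
        driver.execute_script("window.stop();")
```

After the call returns, find_element and page_source operate on whatever part of the page managed to load before the budget ran out, so a dead external script can no longer stall the whole run.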

browser.click() & browser.send_keys() conflict - Selenium 3.0 Python 2.7

I am currently trying to implement a subtitle downloader with the help of the http://www.yifysubtitles.com website.
The first part of my code clicks the accept-cookies button and then sends keys to search for the movie of interest.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "http://www.yifysubtitles.com"
profile = SetProfile()  # A function returning my favorite profile for Firefox
browser = webdriver.Firefox(profile)
WindowSize(400, 400)
browser.get(url)
accept_cookies = WebDriverWait(browser, 100).until(
    EC.element_to_be_clickable((By.CLASS_NAME, "cc_btn.cc_btn_accept_all")))
accept_cookies_btn = browser.find_element_by_class_name("cc_btn.cc_btn_accept_all")
accept_cookies_btn.click()
search_bar = browser.find_element_by_id("qSearch")
search_bar.send_keys("Harry Potter and the Chamber of Secrets")
search_bar.send_keys(Keys.RETURN)
print "Successfully clicked!"
But it only works once, if not randomly. If I turn on my computer and run the code, it clicks, performs the search, and prints the last statement. The second time, it doesn't click but still performs the search and prints the final statement.
After each try, I close the session with the browser.quit() method.
Any idea on what might be the issue here?
Specify an explicit wait for both the button and the search bar; that should solve your problem.
Thanks, D
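This kind of intermittent failure is typical of a cookie banner that is still animating or being re-rendered between the wait and the click. Besides adding explicit waits, a common workaround is to wrap the lookup-and-click in a small retry loop. A generic sketch (in real code you would pass Selenium's WebDriverException as exceptions rather than catching everything):

```python
import time

def retry(action, attempts=3, delay=0.5, exceptions=(Exception,)):
    """Call action() up to `attempts` times, sleeping `delay` seconds
    after each failure; re-raise the last error if every attempt fails."""
    for attempt in range(attempts):
        try:
            return action()
        except exceptions:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```

Used here it would look like retry(lambda: browser.find_element_by_class_name("cc_btn.cc_btn_accept_all").click()), which re-finds the element on each attempt instead of clicking a possibly stale reference.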

With selenium I do not get the data

I have successfully navigated to an iframe with Selenium + PhantomJS, but I do not get the data.
If I open the iframe URL in the Midori browser I can see the result, but with the webdriver the table is missing.
Here is my test code:
from selenium import webdriver

link = 'http://ebelediye.fatih.bel.tr/alfa/servlet/hariciprogramlar.online.rayic?caid=1449'

def get_site():
    driver = webdriver.PhantomJS()
    driver.get(link)
    driver.find_element_by_name('btnlistele').click()
    src = driver.find_element_by_tag_name('iframe').get_attribute('src')
    driver.get(src)
    print driver.page_source
This seems to be a security measure triggered by how frequently you are sending requests; the page responds with:
FloodGuard Güvenlik uyarısı !!! (FloodGuard security warning!!!)
Bu kadar sık istek gönderemezsiniz !!! (You cannot send requests this often!!!)
Just add some delay as below:
import time
from selenium import webdriver

link = 'http://ebelediye.fatih.bel.tr/alfa/servlet/hariciprogramlar.online.rayic?caid=1449'

def get_site():
    driver = webdriver.PhantomJS()
    driver.get(link)
    time.sleep(1)
    driver.find_element_by_name('btnlistele').click()
    src = driver.find_element_by_tag_name('iframe').get_attribute('src')
    driver.get(src.replace('ISSK_KOD=', 'ISSK_KOD=999'))
    print driver.page_source
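A fixed sleep works for one page, but if the scraper will make many requests it is cleaner to enforce a minimum interval between them, since that interval is what the site's FloodGuard is checking. A small stdlib sketch:

```python
import time

class Throttle(object):
    """Enforce a minimum interval (in seconds) between successive calls
    to wait(), so requests never exceed the server's rate limit."""
    def __init__(self, interval):
        self.interval = interval
        self.last = None
    def wait(self):
        if self.last is not None:
            remaining = self.interval - (time.time() - self.last)
            if remaining > 0:
                time.sleep(remaining)
        self.last = time.time()
```

Create one throttle = Throttle(1.0) for the whole session and call throttle.wait() before each driver.get(); unlike a flat sleep, it only pauses when the previous request was less than a second ago.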

How to cronly fill ajax form in django app?

I have the following problem: every hour, I need my app to
1. Go to https://airqualityegg.wickeddevice.com/download and fill out the form (I found the django-cron package. Is it a good idea to use it?)
2. Wait until the "Download file" button appears after the JS effects
3. Download the .zip file by clicking on that button
4. Extract the .csv file from the .zip archive and work with it
What should I do? So far I've used the selenium webdriver:
from datetime import timedelta
from django.utils import timezone
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

def get_zipfile_file_link(self):
    display = Display(visible=0, size=(1024, 768))
    display.start()
    driver = webdriver.Firefox()
    driver.get("https://airqualityegg.wickeddevice.com/download")
    driver.find_element_by_id("serial_numbers").send_keys(self.egg_id)
    driver.find_element_by_id("start_date").send_keys(timezone.now() - timedelta(hours=5))
    driver.find_element_by_id("zipfilename").send_keys("q")
    driver.find_element_by_id("download_submit").click()
    WebDriverWait(driver, 150).until(lambda d: d.find_element_by_xpath('/html/body/table/tbody/tr[9]/td[2]/a'))
    url = driver.find_element_by_xpath('/html/body/table/tbody/tr[9]/td[2]/a').get_attribute('href')
    driver.close()
    display.stop()
    return url
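Step 4 (pulling the .csv out of the downloaded .zip) is not shown above, and it needs no browser at all. A stdlib sketch that takes the archive as a byte string (fetch it first with, for example, urllib2.urlopen(url).read() on Python 2, where url is the value returned by get_zipfile_file_link):

```python
import io
import zipfile

def extract_csv_from_zip(zip_bytes):
    """Return the contents of the first .csv member found in a zip
    archive held in memory as a byte string."""
    archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
    for name in archive.namelist():
        if name.lower().endswith(".csv"):
            return archive.read(name)
    raise ValueError("no .csv file found in archive")
```

For the hourly schedule, django-cron's CronJobBase with a 60-minute schedule is a reasonable fit, or a plain OS cron entry invoking a management command; either way the browser work and the zip handling stay cleanly separated.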

Using Mechanize to login to Bing on Python 2.7.5

I am trying to use Python 2.7.5 and the mechanize library to create a program that logs me into my Microsoft account on bing.com. To start out, I have created this program to print the forms on this webpage so I can reference them in later code. My current code is this (sorry about the long URL):
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Firefox')]
br.open("https://login.live.com/ppsecure/post.srf?wa=wsignin1.0&rpsnv=11&ct=1375231095&rver=6.0.5286.0&wp=MBI&wreply=http:<%2F%2Fwww.bing.com%2FPassport.aspx%3Frequrl%3Dhttp%253a%252f%252fwww.bing.com%252f&lc=1033&id=264960&bk=1375231423")
print br.title()
forms_printed = 0
for form in br.forms():
    print form
    forms_printed += 1
if forms_printed == 0:
    print "No forms to print!"
Although I see the username and password form when I visit the page in Firefox, running this code always prints "No forms to print!" Am I using mechanize incorrectly, or is the website intentionally hiding those forms from me? Any tips and/or advice are greatly appreciated.
If you read the HTML that you are receiving, you will see that the webpage requires JavaScript.
Example:
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Firefox')]
page = br.open("https://login.live.com/ppsecure/post.srf?wa=wsignin1.0&rpsnv=11&ct=1375231095&rver=6.0.5286.0&wp=MBI&wreply=http:<%2F%2Fwww.bing.com%2FPassport.aspx%3Frequrl%3Dhttp%253a%252f%252fwww.bing.com%252f&lc=1033&id=264960&bk=1375231423")
print page.read()
print br.title()
forms_printed = 0
for form in br.forms():
    print form
    forms_printed += 1
if forms_printed == 0:
    print "No forms to print!"
Output:
Microsoft account
JavaScript required to sign in
Microsoft account requires JavaScript to sign in. This web browser either does not support JavaScript, or scripts are being blocked.
To find out whether your browser supports JavaScript, or to allow scripts, see the browser's online help.