I am writing a script that will save the complete contents of a web page. If I use urllib2 and bs4, it only writes the contents of the logon page and none of the content that appears after navigating to a search within the page. However, if I press Ctrl+S on the search results page, an HTML file is saved to disk that, when opened in a text editor, has all of the contents of the search results.
I've read several posts here on the subject and am trying to use the steps in this one:
How to save "complete webpage" not just basic html using Python
However, after installing geckodriver and setting the system PATH variable, I continue to get errors. Here is my limited code:
>>> from selenium import webdriver
>>> from selenium.webdriver.common.action_chains import ActionChains
>>> from selenium.webdriver.common.keys import Keys
>>> br = webdriver.Firefox()
Here is the error:
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
File "C:\Python27\lib\site-packages\selenium\webdriver\firefox\webdriver.py", line 142, in __init__
self.service.start()
File "C:\Python27\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
os.path.basename(self.path), self.start_error_message)
WebDriverException: Message: 'geckodriver' executable needs to be in PATH.
And here is where I set the sys path variable:
I've restarted after setting sys path variable.
UPDATE:
I am now trying to use chromedriver, as this seemed more straightforward. I downloaded chromedriver_win32.zip (I'm on a Windows laptop) from chromedriver's download page and set the PATH environment variable to:
C:\Python27\Lib\site-packages\selenium\webdriver\chrome\chromedriver.exe
but I am getting a similar error:
>>> br = webdriver.Chrome()
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
File "C:\Python27\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 62, in __init__
self.service.start()
File "C:\Python27\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
os.path.basename(self.path), self.start_error_message)
WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
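Both messages are about the driver executable, not the browser: Selenium starts `geckodriver`/`chromedriver` by name, so the *directory* that contains the .exe has to be listed in PATH. PATH entries are directories, so an entry that ends in `chromedriver.exe` itself (as in the update above) is never searched. A minimal sketch to check what the current Python process can actually see; the helper name `find_on_path` is hypothetical, and on Python 3.3+ `shutil.which` does the same job:

```python
import os


def find_on_path(name, path_value=None):
    """Return the full path of `name` if it sits in a directory listed in PATH, else None.

    Sketch only: mimics a PATH lookup; on Windows it also tries `name + '.exe'`.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    candidates = [name]
    if os.name == "nt" and not name.lower().endswith(".exe"):
        candidates.append(name + ".exe")
    for directory in path_value.split(os.pathsep):
        directory = directory.strip('"')  # Windows PATH entries are sometimes quoted
        if not directory:
            continue
        for candidate in candidates:
            full = os.path.join(directory, candidate)
            # A PATH entry pointing at the .exe file itself fails here,
            # because "...\chromedriver.exe\chromedriver.exe" is not a file.
            if os.path.isfile(full):
                return full
    return None
```

Running `print(find_on_path("chromedriver"))` in the same interpreter that calls `webdriver.Chrome()` should print a full path; `None` means Selenium will not find the driver either. Alternatively, the driver's full path can be passed explicitly via `executable_path`.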
You also have to add the path of Firefox to the system variables manually.
You may have installed Firefox in some other location, while Selenium tries to find Firefox and launch it from the default location, which fails. You need to provide the installed Firefox binary location explicitly:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
# Replace with the actual location of the Firefox executable on your machine
binary = FirefoxBinary('path/to/installed firefox binary')
browser = webdriver.Firefox(firefox_binary=binary)
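Once the driver starts, the HTML of the search results page, as the browser has rendered it after logging in and searching, is available through the WebDriver `page_source` property and can be written to disk. A minimal sketch; the helper name `save_page_source` is hypothetical, and unlike Ctrl+S with "Web Page, complete" it saves only the HTML, not images, stylesheets or scripts:

```python
import io


def save_page_source(browser, filename):
    """Write the current DOM of a Selenium WebDriver session to `filename`.

    Only the HTML text is saved; linked resources are not downloaded.
    Returns the number of characters written.
    """
    html = browser.page_source  # HTML of the page as currently rendered
    # io.open with an explicit encoding behaves the same on Python 2.7 and 3
    with io.open(filename, "w", encoding="utf-8") as fh:
        fh.write(html)
    return len(html)
```

For example, after navigating to the results page with `browser`, call `save_page_source(browser, "results.html")`.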
Related
I installed the selenium and django modules and ran the following code:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://localhost:8000')
assert 'Django' in browser.title
For the selenium module, I need geckodriver for the Firefox browser.
So I installed geckodriver in different ways: 1. npm, 2. brew, 3. direct install (downloaded from here and moved to /usr/local/bin or /usr/bin). None of them worked for the test code above.
I got the following error message:
Traceback (most recent call last):
File "functional_tests.py", line 3, in <module>
browser.get('http://localhost:8000')
File "/Users/kiyeonj/opt/anaconda3/envs/tdd_practice/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
self.execute(Command.GET, {'url': url})
File "/Users/kiyeonj/opt/anaconda3/envs/tdd_practice/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/Users/kiyeonj/opt/anaconda3/envs/tdd_practice/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror?e=connectionFailure&u=http%3A//localhost%3A8000/&c=UTF-8&d=Firefox%20can%E2%80%99t%20establish%20a%20connection%20to%20the%20server%20at%20localhost%3A8000.
Please let me know what the problem is.
I think you need geckodriver as well.
You can download it from here and put it into a folder, preferably the one where you have your .py files, and then use it like this:
from selenium import webdriver
# Raw string so the backslash in the Windows path is taken literally
browser = webdriver.Firefox(executable_path=r"D:\geckodriver.exe")
browser.maximize_window()
browser.get('http://localhost:8000')
assert 'Django' in browser.title
PS: set executable_path to the full file path of your geckodriver executable.
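Going by the traceback in the question, though, `webdriver.Firefox()` on line 2 returned normally and the failure happens at `browser.get(...)` on line 3 with `connectionFailure`: Firefox did start, but nothing was accepting connections on localhost:8000. The Django development server (`python manage.py runserver`, which listens on port 8000 by default) has to be running before this test. A small sketch to check that before launching the browser; the helper name `server_is_up` is hypothetical:

```python
import socket


def server_is_up(host="localhost", port=8000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Python 3 sketch: connection refused, timeouts and DNS errors are all
    OSError subclasses, so they all count as "not up".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, put `assert server_is_up(), "start the dev server with: python manage.py runserver"` before `webdriver.Firefox()` in functional_tests.py.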
I have a strange import problem in Python 2.7.
My app lives in a directory with some subdirectories, and several other Python apps run simultaneously, communicating with each other through the Pyro Name Server.
When I run one of my apps, it crashes with an ImportError when calling one of its methods.
Here is the exception:
Traceback (most recent call last):
File "ps_logic.py", line 15840, in <module>
ps_logic = PSLogic(pyro_objects, cfg_handler, status_distributor, voip_processing)
File "ps_logic.py", line 590, in __init__
self.smarthopper_initial_check()
File "ps_logic.py", line 12824, in smarthopper_initial_check
counters_compared = self.smarthopper_maintenance_action()
File "ps_logic.py", line 12928, in smarthopper_maintenance_action
status = self.smart_hopper_logic.status_get()
File "/home/app_core/flexcore/003-480/ps_logic/smart_devices_logic.py", line 203, in status_get
return SmartStatusAugmented(self.smart_obj.queue_status_get(), self.smart_obj)
File "/usr/lib/python2.7/site-packages/Pyro/core.py", line 381, in __call__
return self.__send(self.__name, args, kwargs)
File "/usr/lib/python2.7/site-packages/Pyro/core.py", line 456, in _invokePYRO
return self.adapter.remoteInvocation(name, Pyro.constants.RIF_VarargsAndKeywords, vargs, kargs)
File "/usr/lib/python2.7/site-packages/Pyro/protocol.py", line 497, in remoteInvocation
return self._remoteInvocation(method, flags, *args)
File "/usr/lib/python2.7/site-packages/Pyro/protocol.py", line 536, in _remoteInvocation
answer = pickle.loads(answer)
ImportError: No module named drivers.smart.smart_common_const
It clearly says that it cannot import drivers.smart.smart_common_const, BUT the problem is that I do not have that line in my code.
If I try to find which file contains that line (because I have already fixed it in some files), grep finds nothing:
app_core#003-481 ~/flexcore/003-480 $ grep -R "from drivers.smart.smart_common_const import" .
./drivers/.svn/pristine/23/23e13acbf9e604f179d4625e18b2b992116a98a1.svn-base:from drivers.smart.smart_common_const import *
./drivers/.svn/pristine/65/65655973d3c70a16cc982db59db8f2989366524b.svn-base:from drivers.smart.smart_common_const import *
./drivers/.svn/pristine/3b/3ba2e2518e64db9188b63247b763926544bddd90.svn-base:from drivers.smart.smart_common_const import *
app_core#003-481 ~/flexcore/003-480 $
except the SVN pristine copies.
I have been running my Python app with the -v option to find out where it tries to import that module, BUT it does not print any debug line before the exception, so I guess it is something imported earlier, or -v shows nothing when an import fails.
I have also deleted all *.pyc files and rebooted the machine to be sure nothing is left in memory, but the problem persists.
Is there any other way to find out where the problem is? I am starting to get desperate.
That ps_logic (whatever it may be) seems to use Pyro to make remote calls to a server. In particular, the following line seems to be a remote call:
self.smart_obj.queue_status_get()
The server sends back a custom object, and because it uses pickle as the serialization format, your client program tries to reconstruct that object. Apparently the required modules are not available to your client code: it is pickle that fails when it tries to import the module it needs to reassemble the response into objects.
There has to be something in the documentation of that ps_logic module about how to use it correctly; perhaps you should have it installed on the client as well.
(It is advised to not use pickle by the way, and stick to Pyro's default serializer, but that's another story)
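To see which modules a pickled payload will import when it is loaded, and which of those are missing on the client, the standard `pickletools` module can walk the opcodes without executing them. A sketch, assuming you can get hold of the raw response bytes and that they were written with pickle protocol 3 or lower (these encode class references with the `GLOBAL` opcode; protocol 4+ uses `STACK_GLOBAL`, which this does not handle). The helper names are hypothetical:

```python
import importlib
import pickle
import pickletools


def referenced_globals(payload):
    """Return the (module, name) pairs a pickle refers to, without unpickling it.

    Only the GLOBAL opcode (protocols 0-3) is inspected.
    """
    found = []
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name == "GLOBAL":
            # pickletools reports the GLOBAL argument as "module name"
            module, name = arg.split(" ", 1)
            found.append((module, name))
    return found


def missing_modules(payload):
    """List modules referenced by the pickle that cannot be imported here."""
    missing = []
    for module, _name in referenced_globals(payload):
        try:
            importlib.import_module(module)
        except ImportError:
            if module not in missing:
                missing.append(module)
    return missing
```

Running `missing_modules` on a captured response in the client's environment shows which module the server-side objects depend on, before `pickle.loads` fails with the ImportError.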
I'm trying to loop through the list of companies in the link. The link for each company name is dynamic; for example, in http://ae.bizdirlib.com/node/946273 the number 946273 changes from company to company. I want to open each of these links in the browser, but I'm really confused about how to do this. This is what I have tried so far:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time
browser = webdriver.Firefox() # Get local session of firefox
#wait until the pages are loaded
browser.implicitly_wait(3)
browser.get("http://ae.bizdirlib.com/taxonomy/term/1493") # Load page
browser.refresh()
page_source = browser.page_source
for node in page_source:
link = browser.find_element_by_link_text('node').click
Running this code gives an error:
Traceback (most recent call last):
File "C:/Python27/automation scripts/ggulf/large data.py", line 29, in <module>
link = browser.find_element_by_link_text('node').click
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 276, in find_element_by_link_text
return self.find_element(by=By.LINK_TEXT, value=link_text)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 684, in find_element
{'using': by, 'value': value})['value']
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 195, in execute
self.error_handler.check_response(response)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 170, in check_response
raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: Unable to locate element: {"method":"link text","selector":"node"}
Stacktrace:
at FirefoxDriver.prototype.findElementInternal_ (file:///c:/users/akrakhan/appdata/local/temp/tmppveyk8/extensions/fxdriver#googlecode.com/components/driver-component.js:10299)
at fxdriver.Timer.prototype.setTimeout/<.notify (file:///c:/users/akrakhan/appdata/local/temp/tmppveyk8/extensions/fxdriver#googlecode.com/components/driver-component.js:603)
You are better off looking for something more specific rather than looping through the page source. All the company links are inside H2 tags, so you can find them with the CSS selector h2 > a, which reads as "find all A tags that are a child of (>) an H2 element".
browser.get("http://ae.bizdirlib.com/taxonomy/term/1493") # Load page
links = browser.find_elements_by_css_selector("h2 > a")
for link in links:
    link.click()  # click() needs parentheses; link.click alone does not call it
This isn't the final solution, because clicking the link will take you off the main page, but it parallels what you were trying to accomplish. A better approach would probably be to store the URLs of all the company links in a list of strings and then loop through that list, navigating to each URL... or something like that. An exercise for the reader... :)
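The "collect the URLs first, then visit them" approach can be sketched as below. It assumes the Selenium 2/3-era `find_elements_by_css_selector` used in the answer (newer Selenium releases replace it with `find_elements(By.CSS_SELECTOR, ...)`); the function name `visit_company_pages` is hypothetical. Reading the `href` strings up front matters because, once the browser navigates away, the WebElements found on the list page can no longer be used.

```python
def visit_company_pages(browser, list_url, selector="h2 > a"):
    """Collect every company URL from the list page, then open each one.

    `browser` is a Selenium WebDriver (Selenium 2/3-era API assumed).
    Returns the URLs that were visited, in order.
    """
    browser.get(list_url)
    links = browser.find_elements_by_css_selector(selector)
    # Copy the hrefs into plain strings before navigating anywhere.
    urls = [href for href in (link.get_attribute("href") for link in links) if href]
    visited = []
    for url in urls:
        browser.get(url)
        # ... scrape the company page here ...
        visited.append(url)
    return visited
```

For example: `visit_company_pages(webdriver.Firefox(), "http://ae.bizdirlib.com/taxonomy/term/1493")`.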
I am currently using the following versions:
Python 2.7.10 (32-bit, Windows)
AndroidViewClient androidviewclient-10.7.1-py2.7.egg
I have a simple program, shown below:
import sys
import os
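try:
    # Prefer a source checkout if ANDROID_VIEW_CLIENT_HOME is set; otherwise use the installed egg
    sys.path.insert(0, os.path.join(os.environ['ANDROID_VIEW_CLIENT_HOME'], 'src'))
except KeyError:
    pass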
from com.dtmilano.android.viewclient import ViewClient
device, serialno = ViewClient.connectToDeviceOrExit()
vc = ViewClient(device=device, serialno=serialno)
device.takeSnapshot().save('Menu.png','PNG')
This gives me the following error:
Traceback (most recent call last):
File "dump.py", line 14, in <module>
device.takeSnapshot().save('Menu.png','PNG')
File "C:\Python27\lib\site-packages\androidviewclient-10.7.1-py2.7.egg\com\dtmilano\android\adb\adbclient.py", line 678, in takeSnapshot
image = Image.open(stream)
File "C:\Python27\lib\site-packages\PIL\Image.py", line 2126, in open
% (filename if filename else fp))
IOError: cannot identify image file <cStringIO.StringI object at 0x023462A8>
The same snippet works for some devices and not for others.
How can I figure out what is wrong with the devices where it doesn't work?
Also, please help me identify any configuration issues, as I am new to this.
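`IOError: cannot identify image file` means PIL did not recognize the bytes it was handed as any image format, so a first step is to look at what a failing device actually returns. A hedged diagnostic, assuming you save a raw screenshot yourself (for example with `adb exec-out screencap -p`, which bypasses AndroidViewClient entirely): a PNG always starts with the 8-byte signature `\x89PNG\r\n\x1a\n`. One known failure mode with `adb shell` output is that every `\n` byte is rewritten as `\r\n`, which this sketch detects and undoes. The helper names are hypothetical:

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def diagnose_screenshot(data):
    """Classify raw screenshot bytes by their leading magic bytes."""
    if not data:
        return "empty"  # the device returned nothing at all
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\r\n"):
        # Signature's "\r\n" became "\r\r\n": newlines were translated to CRLF
        return "png-crlf-mangled"
    return "unknown"  # often an error message or raw framebuffer data


def undo_crlf(data):
    """Reverse a blanket "\n" -> "\r\n" translation of binary data."""
    return data.replace(b"\r\n", b"\n")
```

Opening the saved file in binary mode (`open("Menu.png", "rb").read()`) and passing the bytes to `diagnose_screenshot` on a working and a failing device should show whether the failing one returns something other than a PNG.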
I have HTML that looks like the three following sample statements:
...
12
13
(I'd presently be on pg. 11.)
I don't know the Py/Selenium/Splinter syntax for selecting one of the page numbers in a list and clicking on it to go to that page. (Also, I need to be able to identify the element in the argument as, for example, 'Page$10' or 'Page$12', as seen in the __doPostBack notation. Maybe just a 'next page', in so many words, would be fine, but I don't even know how to do that.)
Thank you for any help.
UPDATE II: Here's the code I have to work from:
import time
import win32ui
import win32api
import win32con
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from ctypes import *
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('http://[site]');
UPDATE III:
Traceback (most recent call last):
File "montpa_05.py", line 47, in <module>
continue_link = driver.find_element_by_link_text('4')
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 246, in find_element_by_link_text
return self.find_element(by=By.LINK_TEXT, value=link_text)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 680, in find_element
{'using': by, 'value': value})['value']
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 165, in execute
self.error_handler.check_response(response)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 164, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: u'no such element\n (Session info: chrome=28.0.1500.95)\n (Driver info: chromedriver=2.2,platform=Windows NT 6.1 SP1 x86_64)'
The <a> element is defined as a link. That means that you can select it by link text.
I don't know Python, but the Java syntax would be By.linkText("##"), where ## is the number you want to click on.
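In Python, the Selenium 2/3-era equivalent of `By.linkText` is `driver.find_element_by_link_text("12")`, the same call used in the question's code. To pick a page by its `__doPostBack` argument ('Page$10', 'Page$12') instead, an XPath on the `href` attribute is one option. The sketch below assumes, hypothetically, that each pager link's href ends in `'Page$<n>')`, and includes the closing quote so that `Page$1` does not also match `Page$10`; the helper names are hypothetical too:

```python
def page_link_xpath(page_number):
    """Build an XPath for the pager link whose href contains Page$<n>'.

    Assumes (hypothetically) hrefs shaped like
    javascript:__doPostBack('...','Page$12')
    The trailing quote keeps Page$1 from matching Page$10.
    """
    needle = "Page$%d'" % int(page_number)
    return '//a[contains(@href, "%s")]' % needle


def go_to_page(driver, page_number):
    """Click the pager link for `page_number` (Selenium 2/3-era API)."""
    link = driver.find_element_by_xpath(page_link_xpath(page_number))
    link.click()
    return link
```

For example, `go_to_page(driver, 12)` after `driver.get(...)`. If the link is rendered late, the lookup can still raise NoSuchElementException, so an explicit wait may be needed.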