Python 2.7 Selenium unable to extract data

Python 2.7 Selenium unable to extract data - python-2.7

I am trying to extra data by return error
NoSuchElementException: Message: u'Unable to locate element: {"method":"xpath","selector":"//*[#id=\'searchpopbox\']"}' ; Stacktrace:
at FirefoxDriver.findElementInternal_ (file:///tmp/tmpjVcHQR/extensions/fxdriver#googlecode.com/components/driver_component.js:8444)
at FirefoxDriver.findElement (file:///tmp/tmpjVcHQR/extensions/fxdriver#googlecode.com/components/driver_component.js:8453)
at DelayedCommand.executeInternal_/h (file:///tmp/tmpjVcHQR/extensions/fxdriver#googlecode.com/components/command_processor.js:10456)
at DelayedCommand.executeInternal_ (file:///tmp/tmpjVcHQR/extensions/fxdriver#googlecode.com/components/command_processor.js:10461)
at DelayedCommand.execute/< (file:///tmp/tmpjVcHQR/extensions/fxdriver#googlecode.com/components/command_processor.js:10401)
My code is as below and I am trying to get the list from the link
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.manager.showWhenStarting', False)
browser = webdriver.Firefox(profile)
url = 'https://www.bursamarketplace.com/index.php?tpl=th001_search_ajax'
browser.get(url)
time.sleep(15)
a = browser.find_element_by_xpath("//*[#id='searchpopbox']")
print a
I am seeking your help to get the right xpath for the url.

This gets all the listing for that table.
from webdriver_manager.chrome import ChromeDriverManager
from selenium import webdriver
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://www.bursamarketplace.com/index.php?tpl=th001_search_ajax")
time.sleep(15)
a = driver.find_element_by_xpath("//*[#id='searchpopbox']")
print(a.text)
Or without chromedrivermanager same thing applies to firefox
.Chrome(executable_path='absolutepathofchromedriver.exe')

Related

How can I add my web scrape process using bs4 to selenium automation in Python to make it one single process which just asks for a zipcode?

I am using selenium to go to a website and then go to the search button type a zipcode which I am entering beforehand and then for that zip code I want the link that the webpage has to feed my web scraper created using beautiful soup and once the link comes up I can scrape required data to get my csv.
What I want:
I am having trouble getting that link to the beautiful soup URL. I basically want to automate it so that I just have to enter a zip code and it gives me my CSV.
What I am able to get:
I am able to enter the zip code and search using selenium and then add that url to my scraper to give csv.
Code I am using for selenium :
driver = webdriver.Chrome('/Users/akashgupta/Desktop/Courses and Learning/Automating Python and scraping/chromedriver')
driver.get('https://www.weather.gov/')
messageField = driver.find_element_by_xpath('//*[#id="inputstring"]')
messageField.click()
messageField.send_keys('75252')
time.sleep(3)
showMessageButton = driver.find_element_by_xpath('//*[#id="btnSearch"]')
showMessageButton.click()
#web scraping Part:
url="https://forecast.weather.gov/MapClick.php?lat=32.99802500000004&lon=-96.79775499999994#.Xo5LnFNKgWo"
res= requests.get(url)
soup=BeautifulSoup(res.content,'html.parser')
tag=soup.find_all('div',id='seven-day-forecast-body')
weekly=soup.find_all(class_='tombstone-container')
main=soup.find_all(class_='period-name')
description=soup.find_all(class_='short-desc')
temp=soup.find_all(class_='temp')
Period_Name=[]
Desc=[]
Temp=[]
for a in range(0,len(main)):
Period_Name.append(main[a].get_text())
Desc.append(description[a].get_text())
Temp.append(temp[a].get_text())
df = pd.DataFrame(list(zip(Period_Name, Desc,Temp)),columns =['Period_Name', 'Short_Desc','Temperature'])

from selenium import webdriver
import time
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome('chromedriver.exe')
driver.get('https://www.weather.gov/')
messageField = driver.find_element_by_xpath('//*[#id="inputstring"]')
messageField.click()
messageField.send_keys('75252')
time.sleep(3)
showMessageButton = driver.find_element_by_xpath('//*[#id="btnSearch"]')
showMessageButton.click()
WebDriverWait(driver, 10).until(EC.url_contains("https://forecast.weather.gov/MapClick.php")) # here you are waiting until url will match your output pattern
currentURL = driver.current_url
print(currentURL)
time.sleep(3)
driver.quit()
#web scraping Part:
res= requests.get(currentURL)
....

Sentiment analysis using TextBlob for Dutch language

I want to do sentiment analysis of a list of tweets fetched based on a particular keyword. The tweets coming in are mostly in dutch language and TextBlob needs them converted to English in order to compute the tweet's polarity and subjectivity value. How can I convert the tweet to English language? I basically need a FREE API to do the translation. Having trouble using the MS Bing Translator. I have tried using goslate , langdetect , translate and translation libraries but none of them worked. Here's the code that I am using :
#!/usr/bin/env python
import tweepy
import goslate
from langdetect import detect
from translation import baidu, google, youdao, iciba
from translate import Translator
import os
import time
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
t=time.time()
#karan's api keys
consumer_key = 'xxx'
consumer_secret = 'xxx'
access_key = 'xxx'
access_secret = 'xxx'
gs=goslate.Goslate()
translator= Translator(to_lang="en")
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
search_results = api.search(q="football", count=2, geocode="52.132633,5.2912659999999505,300km")
f=open('tweets_football.txt','wb')
for i in range(0,len(search_results)):
try:
print search_results[i].text
print search_results[i].id
print search_results[i].user.screen_name
trans=search_results[i].text
#print(gs.translate(trans,'en'))
print(translator.translate(trans))
if search_results[i].text not in search_results:
f.write(search_results[i].text)
f.write("\n")
print "Written to file!"
except Exception as e:
print str(e)
f.close()
print time.time()-t
Please point me in the right direction. If there's an easier method for this process, please suggest that as well. Thanks in advance.

You can try this code:
from translate import Translator
from googletrans import Translator
text = 'hallo_allemaal'
entext = Translator().translate(text, src='nl',dest='en').text
print(entext)
Google API has some limitations of number of hits from one IP. if you get that error. Check out for that too.

There is a Ducth package for textblob as textblob-nl. You can install it via pip install textblob
Official sample:
from textblob import TextBlob
from textblob_nl import PatternTagger, PatternAnalyzer
text = u"De kat wil wel vis eten maar geen poot nat maken."
blob = TextBlob(text, pos_tagger=PatternTagger(), analyzer=PatternAnalyzer())
blob.sentiment
>>> (-0.1, 0.4)
You can find more information: https://github.com/gvisniuc/textblob-nl

Selenium - chromeDriver u'unknown error : chrome field to start

I'm pretty new in selenium and getting an error with ChromeWebDriver.
I'm using: Chrome 36, ChromeWebDriver 2.10, Windows 7
Here's my code:
from selenium import webdriver
webD = webdriver.Chrome();
But I get the response
unknown error : chrome field to start
How can I fix this?

You may need to download the chrome executable driver from http://chromedriver.storage.googleapis.com/index.html and set the executable path accordingly.
Sample Python Code :
import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
chromedriver = "./chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
#driver = webdriver.Firefox()
driver.get("http://www.python.org")
print driver.title
assert "Python" in driver.title
For more information and end to end script follow
Reference

Python requests 503 erros when trying to access localhost:8000

I am facing a bit of a situation,
Scenario: I got a django rest api running on my localhost:8000 and I want to access the api using my command line. I have tried urllib2 and python requests libs to talk to the api but failed(i'm getting a 503 error). But when I pass google.com as the url, I am getting the expected response. So I believe my approach is correct but I'm doing something wrong. please see the code below :
import urllib, urllib2, httplib
url = 'http://localhost:8000'
httplib.HTTPConnection.debuglevel = 1
print "urllib"
data = urllib.urlopen(url);
print "urllib2"
request = urllib2.Request(url)
opener = urllib2.build_opener()
feeddata = opener.open(request).read()
print "End\n"
Envioroments:
OS Win7
python v2.7.5
Django==1.6
Markdown==2.3.1
colorconsole==0.6
django-filter==0.7
django-ping==0.2.0
djangorestframework==2.3.10
httplib2==0.8
ipython==1.0.0
jenkinsapi==0.2.14
names==0.3.0
phonenumbers==5.8b1
requests==2.1.0
simplejson==3.3.1
termcolor==1.1.0
virtualenv==1.10.1
Thanks

I had a similar problem, but found that it was the company's proxy that was preventing from pinging myself.
503 Reponse when trying to use python request on local website
Try:
>>> import requests
>>> session = requests.Session()
>>> session.trust_env = False
>>> r = session.get("http://localhost:5000/")
>>> r
<Response [200]>
>>> r.content
'Hello World!'

If you are registering your serializers with DefaultRouter then your api will appear at
http://localhost:8000/api/ for an html view of the index
http://localhost:8000/api/.json for a JSON view of the index
http://localhost:8000/api/appname for an html view of the individual resource
http://localhost:8000/api/appname/.json for a JSON view of the individual resource
you can check the response in your browser to make sure your URL is working as you expect.

Importing CSV to Django and settings not recognised

So i'm getting to grips with Django, or trying to. I have some code that isn't dependent on being called by the webpage - it's designed to populate the database with information. Eventually it will be set up as a cron job to run overnight. This is the first crack at it, which is to do an initial population (once I have that working, I'll move to an add structure, where only new records are pushed.) I'm using Python 2.7, Django 1.5 and Sqlite3. When I run this code, I get
Requested setting DATABASES, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.
That seems fairly obvious, but I've spent a couple of hours now trying to work out how to adjust that setting. How do I call / open a connection / whatever the right terminology is here? I have a number of functions like this that will be scheduled jobs, and this has been frustrating me all afternoon.
import urllib2
import csv
import requests
from django.db import models
from gmbl.models import Match
master_data_file = urllib2.urlopen("http://www.football-data.co.uk/mmz4281/1213/E0.csv", "GET")
data = list(tuple(rec) for rec in csv.reader(master_data_file, delimiter=','))
for row in data:
current_match = Match(matchdate=row[1],
hometeam=row[2],
awayteam = row [3],
homegoals = row [4],
awaygoals = row[5],
homeshots = row[10],
awayshots = row[11],
homeshotsontarget = row[12],
awayshotsontarget = row[13],
homecorners = row[16],
awaycorners = row[17])
current_match.save()
I had originally started out with http://django-csv-importer.readthedocs.org/en/latest/ but I had the same error, and the documentation doesn't make much sense trying to debug it. When I tried calling settings.configure in the function, it said it didn't exist; presumably I had to import it, but couldn't make that work.

Make sure Django, and your project are in PYTHONPATH then you can do:
import urllib2
import csv
import requests
from django.core.management import setup_environ
from django.db import models
from yoursite import settings
setup_environ(settings)
from gmbl.models import Match
master_data_file = urllib2.urlopen("http://www.football-data.co.uk/mmz4281/1213/E0.csv", "GET")
data = list(tuple(rec) for rec in csv.reader(master_data_file, delimiter=','))
# ... your code ...
Reference: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
Hope it helps!

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Python 2.7 Selenium unable to extract data - python-2.7

Related

How can I add my web scrape process using bs4 to selenium automation in Python to make it one single process which just asks for a zipcode?

Sentiment analysis using TextBlob for Dutch language

Selenium - chromeDriver u'unknown error : chrome field to start

Python requests 503 erros when trying to access localhost:8000

Importing CSV to Django and settings not recognised

Categories

Resources