Cannot find a link - python-2.7

I am trying to click a tab (Regulatory Regional) on a webpage: https://www5.fdic.gov/idasp/advSearchLanding.asp
However, it does not recognize the command. Here, I have attached the code.
import urllib2
import urllib
from bs4 import BeautifulSoup
import subprocess
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome("/usr/local/bin/chromedriver")
import time
s1_url = 'https://www5.fdic.gov/idasp/advSearchLanding.asp'
browser.get(s1_url)
Problem: I try to select the Regulatory Regional tab, but it is not clicked.
browser.find_element_by_xpath('//*[@id="Banks_Regulatory_Tab"]/a').click()
I get an exception:
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="Banks_Regulatory_Tab"]/a"}

The required element is located inside an iframe. To interact with it, you first need to switch to that iframe:
browser.switch_to.frame("content")
browser.find_element_by_link_text("Regulatory Regional").click()
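As a rough sketch of the switch described above, the two calls can be wrapped in a small helper that also returns to the top-level document afterwards. The name click_link_in_frame is hypothetical and not part of Selenium; the frame name "content" and the link text come from the answer, and the Selenium 3 style find_element_by_link_text API used in this thread is assumed.

```python
def click_link_in_frame(driver, frame_name, link_text):
    """Switch into an iframe, click a link by its text, then switch back.

    `driver` is any Selenium WebDriver instance; this helper name is
    illustrative and not part of Selenium itself.
    """
    driver.switch_to.frame(frame_name)
    try:
        driver.find_element_by_link_text(link_text).click()
    finally:
        # Always return to the top-level document so later lookups
        # are not silently scoped to the iframe.
        driver.switch_to.default_content()
```

It would be called as click_link_in_frame(browser, "content", "Regulatory Regional"). If the next steps also happen inside the iframe, you would have to switch into it again, or drop the default_content() call.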

Related

Open link using Selenium on new page

I am clicking the link "Images" on a new page (after searching 'bugs bunny') on Google. It is not retrieving images of the search, rather it is opening the link 'Images' on the old page.
My Code:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get('http://www.google.com')
search = browser.find_element_by_name('q')
search.send_keys("bugs bunny")
search.send_keys(Keys.RETURN) # hit return after you enter search text
browser.current_window_handle
print(browser.current_url)
browser.find_element_by_link_text("Images").click()
Your problem is that send_keys performs the action and does not wait for the new page to load:
search.send_keys(Keys.RETURN) # hit return after you enter search text
The click that follows therefore runs almost immediately, against the current page, before the results have loaded. You need to add a delay after the return key so that the results page can load; once it has loaded, you can do the click.
So what you need is a simple sleep delay.
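A fixed sleep works, but polling for a sign that the results page has arrived tends to be more robust. The sketch below is a dependency-free illustration of that idea, not Selenium's own API: wait_until and click_after_page_change are hypothetical names, and it assumes the results page title contains the search text, which the thread does not confirm.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or None if `timeout` seconds elapse first.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            return None
        time.sleep(poll)


def click_after_page_change(driver, title_fragment, link_text,
                            timeout=10.0, poll=0.5):
    """Wait for the page title to contain `title_fragment`, then click a link.

    Returns True if the link was clicked, False if the wait timed out.
    """
    if not wait_until(lambda: title_fragment in driver.title, timeout, poll):
        return False
    driver.find_element_by_link_text(link_text).click()
    return True
```

With the question's code this would be click_after_page_change(browser, "bugs bunny", "Images"). Selenium's built-in equivalent of the waiting part, using the imports the question already has, is WebDriverWait(browser, 10).until(EC.title_contains("bugs bunny")).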

Trying to automate to buy a shoe from Amazon

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://www.amazon.com/")
driver.find_element_by_partial_link_text("Sign in").click()
driver.find_element_by_name("email").send_keys("** UR EMAILID **")
driver.find_element_by_name("password").send_keys(" ** UR PASSWORD **")
driver.find_element_by_id("signInSubmit").click()
driver.find_element_by_id("twotabsearchtextbox").send_keys("Mens shoes")
driver.find_element_by_css_selector("#nav-search > form > div.nav-right > div > input").click()
driver.find_element_by_partial_link_text("Fashion Sneakers").click()
dropdown = driver.find_element_by_css_selector("#native_dropdown_selected_size_name")
select = Select(dropdown)
select.select_by_value("4,B01CE7QQPY")
driver.find_element_by_xpath("//*[@id='add-to-cart-button']").click()
I'm able to log in, go to men's shoes, select fashion sneakers, and select a type of shoe with a particular size. Despite selecting its size, I'm not allowed to add it to the cart. The page says "select the size from the left to add to shopping cart". There are no errors on the terminal/command line, but I'm unable to proceed. I attached a screenshot of the page for reference.

Python and Beautiful Soup Web Scraping

I am trying to scrape the stats off the table on this webpage: http://stats.nba.com/teams/traditional/ but I am unable to find the html for the table. This is in python 2.7.10.
from bs4 import BeautifulSoup
import json
import urllib
html = urllib.urlopen('http://stats.nba.com/teams/traditional/').read()
soup = BeautifulSoup(html, "html.parser")
for table in soup.find_all('tr'):
    print(table)
This is the code I have now, but nothing is printed.
If I try this with different elements on the page it works fine.
The table is loaded dynamically, so when you grab the html, there are no tr tags in it to be found.
The table you're looking for is NOT in that specific page/URL.
The stats you're trying to scrape come from this url:
http://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2016-17&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=
When you open a webpage/URL in a modern browser, more requests are made "behind the scenes", in addition to the original URL, to fully render the whole page.
I know this sounds counter-intuitive; you can check out this answer for a more detailed explanation.
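To illustrate reading that endpoint directly instead of parsing HTML, here is a minimal sketch using only the standard library. The function names are hypothetical. The response layout (a "resultSets" list whose entries carry "headers" and "rowSet") and the need for a browser-like User-Agent header are assumptions not stated in this thread, so check them against a real response first.

```python
import json

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2.7


def fetch_json(url, user_agent="Mozilla/5.0"):
    """Download `url` and decode the body as JSON.

    The User-Agent header is an assumption; the server may or may
    not require one.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    response = urlopen(request)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


def rows_as_dicts(payload, result_index=0):
    """Pair each row with the column headers of one result set.

    Assumes the layout
    {"resultSets": [{"headers": [...], "rowSet": [[...], ...]}]}.
    """
    result = payload["resultSets"][result_index]
    headers = result["headers"]
    return [dict(zip(headers, row)) for row in result["rowSet"]]
```

Usage would be rows_as_dicts(fetch_json(stats_url)), where stats_url holds the long leaguedashteamstats URL quoted above.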
Try this code. It gives me the HTML. I am using requests to fetch the page.
import requests
from bs4 import BeautifulSoup
url = "http://stats.nba.com/teams/traditional/"
data = requests.get(url)
# Report the status code when the request did not fail
if data.status_code < 400:
    print("AUTHENTICATED:STATUS_CODE" + " " + str(data.status_code))
sample = data.content
soup = BeautifulSoup(sample, 'html.parser')
print(soup)
You can use selenium and PhantomJS (or chromedriver, Firefox, etc.) to load the page, thereby also running all the JavaScript. All you need is to download selenium and the PhantomJS webdriver, then place a sleep timer after the get(url) to ensure that the page loads (actually, using a function such as WebDriverWait would be much better than sleep, but you can look more into that if you need it). Now your soup content will look exactly like what you see when looking at the site through your browser.
from bs4 import BeautifulSoup
from selenium import webdriver
from time import sleep
url = 'http://stats.nba.com/teams/traditional/'
browser = webdriver.PhantomJS('*path to PhantomJS driver')
browser.get(url)
sleep(10)
soup = BeautifulSoup(browser.page_source, "html.parser")
for table in soup.find_all('tr'):
    print(table)

StaleElementReferenceException occurs during scraping infinite scroll with Selenium in Python

I am trying to scroll down an infinite-scroll page and collect the news links. The problem is that after scrolling the page, say, 100 times and then trying to get the links, Python raises an error: "StaleElementReferenceException: Message: stale element reference: element is not attached to the page document". I think this is because the page gets updated and the elements from the earlier state of the page are no longer available. Here is my code for scrolling the page with Selenium WebDriver:
from __future__ import print_function # must come before any other import
import urllib2
from bs4 import BeautifulSoup
from selenium import webdriver #open webdriver for specific browser
from selenium.webdriver.common.keys import Keys # for necessary browser action
from selenium.webdriver.common.by import By # For selecting html code
import time
driver = webdriver.Chrome('C:\\Program Files (x86)\\Google\\Chrome\\chromedriver.exe')
driver.get('http://seekingalpha.com/market-news/top-news')
for i in range(0,100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(15)
URL = driver.find_elements_by_class_name('market_current_title')
print(URL)
And the code for getting the URLs:
for a in URL:
    links = a.get_attribute('href')
    print(links)
I am wondering whether there is any way to solve this problem, or whether it is possible to get the URLs for this specific page with the requests library, as I couldn't do that.
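One possible way around the stale references, offered as a sketch rather than something confirmed in this thread, is to never hold WebElement objects across scrolls: after each scroll, ask the browser for the href strings directly and accumulate them in Python. The helper name scroll_and_collect_links is hypothetical, and it assumes the elements with class market_current_title are anchors with an href, as the question's code implies.

```python
import time

# JavaScript that returns plain href strings for every element with the
# given class name, so no WebElement references cross into Python.
_COLLECT_HREFS_JS = """
var nodes = document.getElementsByClassName(arguments[0]);
var hrefs = [];
for (var i = 0; i < nodes.length; i++) {
    if (nodes[i].href) { hrefs.push(nodes[i].href); }
}
return hrefs;
"""


def scroll_and_collect_links(driver, class_name, scrolls=100, pause=15):
    """Scroll to the bottom `scrolls` times, gathering unique hrefs as we go.

    Links are returned in the order they were first seen.
    """
    seen = set()
    links = []
    for _ in range(scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load the next batch
        for href in driver.execute_script(_COLLECT_HREFS_JS, class_name):
            if href not in seen:
                seen.add(href)
                links.append(href)
    return links
```

With the question's setup this would be links = scroll_and_collect_links(driver, 'market_current_title'). Because only strings are kept, a later re-render of the page cannot invalidate them.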

scrapy-linkedin for LinkedIn data extraction

I'm using scrapy-0.16 for data extraction from LinkedIn.
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.http import Request
from scrapy import log
from linkedin.items import LinkedinItem, PersonProfileItem
from os import path
from linkedin.parser.HtmlParser import HtmlParser
import os
import urllib
from bs4 import UnicodeDammit
from linkedin.db import MongoDBClient
https://github.com/pondering/scrapy-linkedin
The error is:
Traceback (most recent call last):
  File "C:\Users\TAWANE DUDEZ\Desktop\linkedin\linkedin\spiders\LinkedinSpider.py", line 6, in <module>
    from linkedin.items import LinkedinItem, PersonProfileItem
ImportError: No module named linkedin.items
It cannot find the linkedin.items module.
My suspicion is that you're trying to run the scrapy crawl LinkedinSpider command from the wrong directory. Try navigating to C:\Users\TAWANE DUDEZ\Desktop\linkedin and then running the command again.
Since the crawler is now starting, you also need to have a MongoDB instance running before starting the crawl. The README of the GitHub project being used says to type mongod to start an instance. Just to check, you do have MongoDB and pymongo installed, right?