scraping the text from source code using python

I'm trying to scrape google search results using python and selenium. I'm able to get only the first search result. Here is the code I'm using.
driver.get(url)
res = driver.find_elements_by_css_selector('div.g')
link = res[0].find_element_by_tag_name("a")
href = link.get_attribute("href")
How can I get all the search results?

Try getting the list of links as shown below. This covers the first page only; if you need to scrape more pages, click the "Next" button in a loop and append the results from each following page:
href = [link.get_attribute("href") for link in driver.find_elements_by_css_selector('div.g a')]
P.S. You could also use the solutions from this question to fetch the results as the response to a GET request with the requests library.
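The "click Next in a loop" idea from the answer could be sketched roughly like this. It is only a sketch under assumptions: `collect_result_links` and `max_pages` are names made up for illustration, `a#pnnext` is an assumed selector for the "Next" button that should be checked against the live page, and a real script would also need to wait for each new page to load after the click. It uses the generic `find_elements(by, value)` call with the plain locator string `"css selector"` rather than the older `find_elements_by_css_selector` helper.

```python
def collect_result_links(driver, max_pages=3):
    """Gather hrefs from 'div.g a' on up to max_pages result pages."""
    hrefs = []
    for page in range(max_pages):
        # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
        for link in driver.find_elements("css selector", "div.g a"):
            href = link.get_attribute("href")
            if href and href not in hrefs:
                hrefs.append(href)
        if page == max_pages - 1:
            break  # page limit reached, no need to click "Next" again
        # ASSUMPTION: "a#pnnext" identifies the "Next" button on the page.
        next_buttons = driver.find_elements("css selector", "a#pnnext")
        if not next_buttons:
            break  # no further pages
        next_buttons[0].click()
        # A real script should wait here until the next page has loaded.
    return hrefs
```

With a real browser this would be called as `collect_result_links(driver)` after `driver.get(url)`; duplicates are skipped because a single result block often contains several anchors pointing to the same URL.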

Related

How to link to html page at specific ID bookmark?

Using Diagrams.net (draw.io), I would like to link specific elements to web pages. This is easily accomplished currently by creating a link for the element (say a rectangle).
However, I would like to navigate directly to a specific id bookmark in the HTML page. I cannot seem to get that to work.
For example, if I try to use this syntax (which works in the browser location bar):
https://en.wikipedia.org/wiki/Canada#Geography
I will be taken to the main page:
https://en.wikipedia.org/wiki/Canada
However, the goal is to go to the "Geography" section of this page.
I have also tried the json syntax without any success:
data:action/json,{"actions":[{"open":"https://en.wikipedia.org/wiki/Canada#Geography"}]}
I have also played with different action syntax such as:
data:action/json,{"actions":[{"open":"https://en.wikipedia.org/wiki/Canada"},{"scroll":{"tags":["Geography"]}}]}
Note: I'm using the diagrams.net desktop version 14.1.8.
Thank you for taking the time to read this question.
Paul

On Windows this only seems to work if the browser isn't already open. There is not much we can do to fix this as we're passing the link to the OS.

Element only available after inspect element in selenium

I am trying to get the contact names from here and I'm facing a very strange problem. The content is visible in the browser but when I use selenium to find the element using xpath, I get no data. As soon as I click inspect element, selenium will find the data.
from selenium import webdriver
mydriver = webdriver.Chrome()
print 'Webdriver Started'
mydriver.get('http://listings.fta-companies-au.com/l/101662595/BNP-Paribas-in-Sydney-NSW')
contact_persons = mydriver.find_elements_by_xpath('//div[@class="data-block is-editable no-header"]//div[@class="srp-float-wrap flt-scroll-wrap"]//table[@class="srp-widget-table"]/tbody/tr')
for p in contact_persons:
    print p.text
When I just load the page and try to find the data, I get an empty list, but as soon as I click inspect element, I get the required data.
I've also tried using requests and lxml to parse the page, but they return empty data too.

It seems that the details table is only set up when you scroll to it. Try moving to the h2 tag with the text "Employees and Executives", using the driver.moveto.... function. This should make the details available.
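The function name in the answer is cut off, so the following is not necessarily what the answerer had in mind. As one possible way to bring the heading into view, this sketch swaps in JavaScript's `scrollIntoView` through `execute_script`. The helper name `scroll_to_heading` is hypothetical, and the heading text is assumed to contain no double quotes.

```python
def scroll_to_heading(driver, heading_text):
    """Scroll the first <h2> containing heading_text into view and return it."""
    # ASSUMPTION: heading_text has no double quotes, so it can be embedded directly.
    xpath = '//h2[contains(normalize-space(.), "{0}")]'.format(heading_text)
    # "xpath" is the string value behind Selenium's By.XPATH.
    heading = driver.find_element("xpath", xpath)
    # Ask the browser to scroll the element into the viewport.
    driver.execute_script("arguments[0].scrollIntoView(true);", heading)
    return heading
```

After `scroll_to_heading(mydriver, "Employees and Executives")`, the table rows could be looked up again with the XPath from the question, ideally after a short wait so the lazily built table has time to appear.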

content empty when using scrapy

Thanks to everyone in advance.
I encountered a problem when using Scrapy on Python 2.7.
The webpage I tried to crawl is a discussion board for the Chinese stock market.
When I try to get the first number, "42177", just under the banner of this page, I always get empty content. (The number you see on the webpage may differ from the one in the picture shown here, because it is the number of times the article has been read and is updated in real time.) I am aware that this might be a dynamic content issue, but I don't yet have a clue how to crawl it properly.
The code I used is:
item["read"] = info.xpath("div[@id='zwmbti']/div[@id='zwmbtilr']/span[@class='tc1']/text()").extract()
I think the XPath is correct. I checked the return value for this response, and it indeed shows that there is nothing under this path. The result is shown here: 'read': [u'<div id="zwmbtilr"></div>']
If the element had any content, there would be something between <div id="zwmbtilr"> and </div>.
I'd really appreciate it if you could share any thoughts on this!

I just opened your link in Firefox with NoScript enabled. There is nothing inside the <div id='zwmbtilr'></div>. If I enable JavaScript, I can see the content you want. So, as you already knew, it is a dynamic content issue.
Your first option is to try to identify the request generated by the JavaScript. If you can do that, you can send the same request from Scrapy. If you can't, the next option is usually to use a package with JavaScript/browser emulation, such as ScrapyJS or Scrapy + Selenium.
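To illustrate the first option, here is a minimal sketch of replaying a JavaScript-generated request by hand. It uses the standard library's `urllib` rather than Scrapy, just to keep the example self-contained. Everything specific is an assumption: the actual endpoint has to be found in the browser's network panel, the response is assumed to be JSON or JSONP, and the helper names `fetch_text` and `parse_jsonp` are made up for illustration.

```python
import json
import re
from urllib.request import Request, urlopen


def fetch_text(url, referer=None):
    """Fetch a URL the way the page's script would, returning the body as text."""
    headers = {"User-Agent": "Mozilla/5.0"}
    if referer:
        # Some endpoints check that the request appears to come from the page.
        headers["Referer"] = referer
    with urlopen(Request(url, headers=headers), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_jsonp(body):
    """Decode JSON, first stripping an optional JSONP wrapper like cb({...});"""
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", body, re.S)
    payload = match.group(1) if match else body
    return json.loads(payload)


# Hypothetical usage; the URL is whatever the network panel shows:
# data = parse_jsonp(fetch_text(ajax_url, referer=page_url))
```

In a Scrapy spider the same idea would mean yielding a second request for the discovered URL and parsing its body in the callback, instead of running XPath against the initial HTML.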

Find and click links in ugly table with Python and Selenium webdriver

I'm trying to get Selenium Webdriver to click x number of links in a table, and I can't get it to work. I can print the links like this:
links = driver.find_elements_by_xpath("//table[2]/tbody/tr/td/p/strong/a")
for i in range(0, len(links)):
    print links[i].text
But when I try to do a links[i].click() instead of printing, python throws me an error.
The site uses JSP, and the hrefs of the links look like this: "javascript:loadTestResult(169)"
This is a sub/sub-page that cannot be accessed by direct URL, and the table containing the links is very messy and large, so instead of pasting the whole source here I saved the page at this URL.
http://wwwe.aftonbladet.se/redaktion/martin/badplats.html
(I'm hunting the 12 blue links in the left column)
Any ideas?
Thanks
Martin

Sorry, too trigger-happy.
Simple solution to my own problem:
linkList = driver.find_elements_by_css_selector("a[href*='loadTestResult']")
for i in range(0, len(linkList)):
    # Look the links up again on every pass before clicking the i-th one
    links = driver.find_elements_by_css_selector("a[href*='loadTestResult']")
    links[i].click()
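The likely reason this works is that a click can change the page, which invalidates element references found earlier, so looking the links up again before each click avoids stale references. The same pattern can be wrapped in a small reusable helper; this is a sketch, and `click_each` is a hypothetical name, not part of Selenium.

```python
def click_each(driver, selector):
    """Click every element matching selector, re-locating the list each time.

    Returns the number of elements that were clicked.
    """
    # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
    total = len(driver.find_elements("css selector", selector))
    clicked = 0
    for i in range(total):
        # Re-query on every pass: references from before a click may be stale.
        links = driver.find_elements("css selector", selector)
        if i >= len(links):
            break  # the page now has fewer matches than at the start
        links[i].click()
        clicked += 1
    return clicked
```

For the page in the question this would be called as `click_each(driver, "a[href*='loadTestResult']")`.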

Using Selenium Python on Google page to click links

I'm trying to write a very simple script in Selenium Python. I open the Google page with a search string, but then I am not able to locate any of the HTML elements like "Images" or "Maps", or any of the links that appear as part of the search results, even though I am using Firebug. The only thing that worked is the following:
links = driver.find_elements_by_tag_name("a")
for link in links:
    print ("hello")
What to do if I want to click on "Images" or "Maps"?
What to do if I want to click on the 1st, 2nd, or a particular numbered link, or click a link by partial text?
Any help would be appreciated.

Something like:
driver.get('http://www.google.com')
driver.find_element_by_xpath('//a[starts-with(@href,"https://maps.google")]').click()
But please note that your browser would often redirect 'http://www.google.com' to a slightly different URL, and that the web-page it displays might be slightly different.

What to do if I want to click on images or maps?
Images:
driver.find_element_by_css_selector('img#myImage').click()
Maps:
driver.find_element_by_css_selector("map[for='myImage']").click()
1st, 2nd, or Nth link:
# n is a zero-based index into the list of all links
driver.find_elements_by_tag_name('a')[n].click()
or:
# replace 3 with the position you need; nth-child counts from 1
driver.find_element_by_css_selector("div#someParent > a:nth-child(3)").click()
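For the "click by partial text" and "click the Nth link" parts of the question, a minimal sketch could look like the following. The helper names are hypothetical; the locator strings `"partial link text"` and `"tag name"` are the plain values behind Selenium's `By.PARTIAL_LINK_TEXT` and `By.TAG_NAME`. Whether "Images" or "Maps" is really rendered as a link with that visible text depends on the page Google serves, so treat that as an assumption.

```python
def click_link_by_text(driver, text):
    """Click the first <a> whose visible text contains the given text."""
    driver.find_element("partial link text", text).click()


def click_nth_link(driver, n):
    """Click the n-th <a> on the page, counting from 1."""
    links = driver.find_elements("tag name", "a")
    if not 1 <= n <= len(links):
        raise IndexError("page has %d links, cannot click link %d" % (len(links), n))
    links[n - 1].click()
```

For example, `click_link_by_text(driver, "Images")` would try the "Images" link, and `click_nth_link(driver, 2)` the second anchor in document order.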
