browser.click() & browser.send_keys() conflict - Selenium 3.0, Python 2.7

I am currently trying to implement a subtitle downloader with the help of the http://www.yifysubtitles.com website.
The first part of my code is to click on the accept-cookies button and then send keys to search for the movie of interest.
url = "http://www.yifysubtitles.com"
profile = SetProfile()  # A function returning my favorite profile for Firefox
browser = webdriver.Firefox(profile)
WindowSize(400, 400)
browser.get(url)

# Wait until the cookie banner's accept button is clickable, then click it
accept_cookies = WebDriverWait(browser, 100).until(
    EC.element_to_be_clickable((By.CLASS_NAME, "cc_btn.cc_btn_accept_all")))
accept_cookies_btn = browser.find_element_by_class_name("cc_btn.cc_btn_accept_all")
accept_cookies_btn.click()

# Type the movie title into the search bar and submit
search_bar = browser.find_element_by_id("qSearch")
search_bar.send_keys("Harry Potter and the Chamber of Secrets")
search_bar.send_keys(Keys.RETURN)
print "Successfully clicked!"
But it only works once, and seemingly at random. If I turn on my computer and run the code, it clicks, performs the search and prints the last statement. The second time, it doesn't click but still performs the search and prints the final statement.
After each try, I close the session with the browser.quit() method.
Any idea on what might be the issue here?

Specify an explicit wait for both the button and the search bar; that should solve your problem (see the sketch below).
Thanks, D
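A minimal sketch of what that could look like, reusing the locators from the question (untested, and assuming the same class name and id):

wait = WebDriverWait(browser, 100)

# Wait for the cookie button to be clickable before clicking it
accept_cookies_btn = wait.until(
    EC.element_to_be_clickable((By.CLASS_NAME, "cc_btn.cc_btn_accept_all")))
accept_cookies_btn.click()

# Also wait for the search bar instead of locating it immediately
search_bar = wait.until(
    EC.element_to_be_clickable((By.ID, "qSearch")))
search_bar.send_keys("Harry Potter and the Chamber of Secrets")
search_bar.send_keys(Keys.RETURN)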

Related

Python win32com to forward a selected email with added content

Some time ago I wrote a simple Python app which asks users for input and generates a new mail via the Outlook app based on that input. Now I've been asked to add some functionality so the app no longer generates a new mail, but instead forwards a selected email and adds content to it. While I was able to write code which generates a new mail, I'm completely lost on how to approach forwarding selected mails.
At the moment I use something like this to send a new email:
import win32com.client
from win32com.client import Dispatch

const = win32com.client.constants
olMailItem = 0x0

obj = win32com.client.Dispatch("Outlook.Application")
newMail = obj.CreateItem(olMailItem)
newMail.SentOnBehalfOfName = 'mail@mail.com'
newMail.Subject = ""
newMail.BodyFormat = 2  # olFormatHTML
newMail.HTMLBody = output
newMail.To = ""
newMail.CC = ""
newMail.display()
And I know that by using something like this you can access the email currently selected in Outlook so Python can interact with it:
obj = win32com.client.Dispatch("Outlook.Application")
selection = obj.ActiveExplorer().Selection
How do I merge these two together so the app forwards a selected email and adds new content at the top? I tried to figure it out by trial and error, but finally gave up. The Microsoft API documentation also wasn't very helpful for me, as I wasn't really able to understand much of it (I'm not a dev). Any help appreciated.
Replace the line newMail = obj.CreateItem(olMailItem) with
newMail = obj.ActiveExplorer().Selection.Item(1).Forward()
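Putting that together with the original snippet, a rough sketch might look like this (assuming Outlook is running, exactly one email is selected, and output holds the HTML to prepend):

import win32com.client

obj = win32com.client.Dispatch("Outlook.Application")

# Forward the first currently selected message instead of creating a new one
newMail = obj.ActiveExplorer().Selection.Item(1).Forward()
newMail.BodyFormat = 2  # olFormatHTML

# Prepend the new content above the quoted original body
newMail.HTMLBody = output + newMail.HTMLBody
newMail.To = ""
newMail.CC = ""
newMail.display()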

Pepper: pass variable from Python to web JS

I'm programming an app for Aldebaran's Pepper robot. I'm using Choregraphe and I made an HTML page to display on the robot's tablet. I have made the boxes for displaying the web page, and I need to pass a variable from Python to the page's JavaScript.
Is there any way to do it?
The Python code is the same as the default Raise Event box; it receives the string "IMAGE" on its onStart input:
class MyClass(GeneratedClass):
    def __init__(self):
        GeneratedClass.__init__(self)
        pass

    def onLoad(self):
        self.memory = ALProxy("ALMemory")

    def onUnload(self):
        self.memory = None

    def onInput_onStart(self, p):
        self.memory.raiseEvent(self.getParameter("key"), p)
        self.onStopped(p)

    def onInput_onStop(self):
        self.onUnload()  # it is recommended to call onUnload of this box in an onStop method, as the code written in onUnload is used to stop the box as well
        pass
And the Javascript code is this:
$('document').ready(function(){
    var session = new QiSession();
    session.service("ALMemory").done(function (ALMemory) {
        ALMemory.subscriber("PepperQiMessaging/totablet").done(function(subscriber) {
            $("#log").text("AAA");
            subscriber.signal.connect(toTabletHandler);
        });
    });

    function toTabletHandler(value) {
        $("#log").text("-> ");
    }
});
The JavaScript reaches the first #log write ("AAA") but never the second one inside toTabletHandler.
Yes, I think the web page is loading too late to catch your event. One quick solution would be to send an event from JavaScript when the page is ready, and wait for this event in your Python script. Once this event is received, you know that the web page is ready and you can send the "sendImage" event.
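A rough sketch of the Python side of that idea, polling an ALMemory key the page would set once it is ready (the key name PepperQiMessaging/pageReady and the 10-second timeout are assumptions, not part of the original code):

import time

def onInput_onStart(self, p):
    # Poll ALMemory until the page signals it is ready (assumed key), max 10 s
    deadline = time.time() + 10
    while time.time() < deadline:
        try:
            if self.memory.getData("PepperQiMessaging/pageReady"):
                break
        except RuntimeError:
            pass  # key not inserted yet by the page
        time.sleep(0.2)
    # Now the page should be subscribed, so raise the event as before
    self.memory.raiseEvent(self.getParameter("key"), p)
    self.onStopped(p)

On the JavaScript side, the page would set that key (for example via ALMemory.insertData) right after its subscriber is connected.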
I solved the problem by putting a 2-second Delay box between the Show HTML box and the sendImage box.
I think the problem was that the string sent to the tablet arrived before the web page was ready to receive it; with the 2-second delay (1 second doesn't work), the page has time to prepare for receiving data.

How to scrape hidden text from a web page?

I am trying to scrape some text from a web page. On my web page there is a list of words. Some of them are visible, others only become visible when I click on "+ More". Once clicked, the list of words is always the same (same order, same words). However, some of them are in bold and some are struck through (inside del tags). So basically each item of the database has some features. What I would like to do is, for each item, tell which features are available and which are not. My problem is overcoming the "+ More" button.
My script works fine only for those words which are shown, not for those which are hidden behind "+ More". What I am trying to do is collect all the words that fall under the "del" node. I initially thought that lxml would load the web page as it appears in Chrome's Inspect Element, and I wrote my code accordingly:
from lxml import html

tree = html.fromstring(br.open(current_url).get_data())
mydata = {}
if len(tree.xpath('//del[text()="some text"]')) > 0:
    mydata['some text'] = 'text is deleted from the web page!'
else:
    mydata['some text'] = 'text is not deleted'
Every time I run this code, what I collect is only the part of the data already shown on the web page, not the complete list of words that would appear after clicking "+ More".
I have tried Selenium, but as far as I understand it is not meant for parsing but rather for interacting with the web page. However, if I run this:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.mywebpage.co.uk')
a = driver.find_element_by_xpath('//del[text()="some text"]')
I either get the element or an error. I would like to get an empty list so I could do:
mydata = {}
if len(driver.find_element_by_xpath('//del[text()="some text"]')) > 0:
    mydata['some text'] = 'text is deleted from the web page!'
else:
    mydata['some text'] = 'text is not deleted'
or find another way to get these "hidden" elements captured by the script.
My question is: has anyone had this type of problem? How did they sort it out?
If I understand correctly, you want to get the element back in a list. However, Selenium throws a NoSuchElementException if the element is not available on the page, instead of returning an empty list.
The question I have is: why do you want a list? Judging by your example, you want to see whether an element is present on the page or not. You can easily achieve this with a try/except.
from selenium.common.exceptions import NoSuchElementException

try:
    driver.find_element_by_xpath('//del[text()="some text"]')
    mydata['some text'] = 'text is deleted from the web page!'
except NoSuchElementException:
    mydata['some text'] = 'text is not deleted'
Now, if you really do need a list, you can search the page for multiple elements. This will return all the elements that match the locator as a list (possibly empty).
To do this replace:
driver.find_element_by_xpath('//del[text()="some text"]')
With the plural form (elements):
driver.find_elements_by_xpath('//del[text()="some text"]')
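With that, the check from the question can then be written as follows (a small sketch reusing the same XPath):

mydata = {}

# find_elements_* returns a (possibly empty) list instead of raising
matches = driver.find_elements_by_xpath('//del[text()="some text"]')
if len(matches) > 0:
    mydata['some text'] = 'text is deleted from the web page!'
else:
    mydata['some text'] = 'text is not deleted'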

How to fill an AJAX form from a cron job in a Django app?

I have the following problem:
Every hour, I need my app to:
1. Go to https://airqualityegg.wickeddevice.com/download and fill out the form
(I found the django-cron package. Is it a good idea to use it?)
2. Wait until the "Download file" button appears after the JS effects finish
3. Download the .zip file by clicking on that button
4. Extract the .csv file from this .zip archive and work with it
What should I do?
So far I've used the Selenium webdriver:
def get_zipfile_file_link(self):
    display = Display(visible=0, size=(1024, 768))
    display.start()
    driver = webdriver.Firefox()
    driver.get("https://airqualityegg.wickeddevice.com/download")

    # Fill out the form fields (send_keys expects a string)
    driver.find_element_by_id("serial_numbers").send_keys(self.egg_id)
    driver.find_element_by_id("start_date").send_keys(str(timezone.now() - timedelta(hours=5)))
    driver.find_element_by_id("zipfilename").send_keys("q")
    driver.find_element_by_id("download_submit").click()

    # Wait until the download link shows up, then grab its URL
    WebDriverWait(driver, 150).until(lambda d: d.find_element_by_xpath('/html/body/table/tbody/tr[9]/td[2]/a'))
    url = driver.find_element_by_xpath('/html/body/table/tbody/tr[9]/td[2]/a').get_attribute('href')

    driver.close()
    display.stop()
    return url
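For steps 3 and 4, once get_zipfile_file_link() returns the URL, a minimal sketch of downloading the archive and pulling out the .csv could look like this (assuming the requests library is available and the link points directly at the zip; fetch_csv is just an illustrative name):

import io
import zipfile

import requests  # assumption: not part of the original code

def fetch_csv(url):
    # Step 3: download the .zip the link points to
    response = requests.get(url)
    response.raise_for_status()

    # Step 4: extract the first .csv found in the archive
    with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
        csv_names = [name for name in archive.namelist() if name.endswith(".csv")]
        csv_file = archive.open(csv_names[0])
        return csv_file.read()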

Scraping data off flipkart using scrapy

I am trying to scrape some information from flipkart.com; for this purpose I am using Scrapy. The information I need is for every product on Flipkart.
I have used the following code for my spider
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.selector import HtmlXPathSelector

from tutorial.items import TutorialItem


class WebCrawler(CrawlSpider):
    name = "flipkart"
    allowed_domains = ['flipkart.com']
    start_urls = ['http://www.flipkart.com/store-directory']
    rules = [
        Rule(LinkExtractor(allow=['/(.*?)/p/(.*?)']), 'parse_flipkart', cb_kwargs=None, follow=True),
        Rule(LinkExtractor(allow=['/(.*?)/pr?(.*?)']), follow=True)
    ]

    @staticmethod
    def parse_flipkart(response):
        hxs = HtmlXPathSelector(response)
        item = FlipkartItem()
        item['featureKey'] = hxs.select('//td[@class="specsKey"]/text()').extract()
        yield item
My intent is to crawl through every product category page (matched by the second rule) and follow the product pages (first rule) within each category page to scrape data from the product pages.
One problem is that I cannot find a way to control the crawling and scraping.
Second, Flipkart uses AJAX on its category pages and displays more products when the user scrolls to the bottom.
I have read other answers and assessed that Selenium might help solve the issue, but I cannot find a proper way to fit it into this structure.
Suggestions are welcome. :)
ADDITIONAL DETAILS
I had earlier used a similar approach
the second rule I used was
Rule(LinkExtractor(allow=['/(.*?)/pr?(.*?)']), 'parse_category', follow=True)
@staticmethod
def parse_category(response):
    hxs = HtmlXPathSelector(response)
    count = hxs.select('//td[@class="no_of_items"]/text()').extract()
    # count is a list of strings, so take the first entry and convert it
    for num in range(1, int(count[0]), 15):
        ajax_url = response.url + "&start=" + str(num) + "&ajax=true"
        return Request(ajax_url, callback="parse_category")
Now I am confused about what to use for the callback: "parse_category" or "parse_flipkart".
Thank you for your patience
Not sure what you mean when you say that you can't find a way to control the crawling and scraping. Creating a spider for this purpose is already taking it under control, isn't it? If you create proper rules and parse the responses properly, that is all you need. In case you are referring to the actual order in which the pages are scraped, you most likely don't need to do this. You can just parse all the items in whichever order, but gather their location in the category hierarchy by parsing the breadcrumb information above the item title. You can use something like this to get the breadcrumb in a list:
response.css(".clp-breadcrumb").xpath('./ul/li//text()').extract()
You don't actually need Selenium, and I believe it would be overkill for this simple issue. Using your browser (I'm currently using Chrome), press F12 to open the developer tools. Go to one of the category pages and open the Network tab in the developer window. If there is anything there, click the Clear button to tidy things up a bit. Now scroll down until you see that additional items are being loaded, and you will see additional requests listed in the Network panel. Filter them by Documents and click on one of those requests. You can see the URL for the request and the query parameters that you need to send. Note the start parameter, which is the most important one, since you will have to issue this request multiple times while increasing its value to get new items. You can check the response in the Preview pane, and you will see that the response from the server is exactly what you need: more items. The rule you use for the items should pick up those links too.
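As a rough illustration (not the answerer's code), the spider could page through that AJAX endpoint itself; the URL pattern and the step of 15 are assumptions based on the question:

from scrapy.http import Request

def parse_category(self, response):
    # Items visible in this chunk of the category page would be scraped here,
    # e.g. by reusing the logic from parse_flipkart.
    base_url = response.meta.get('base_url', response.url)
    start = response.meta.get('start', 0) + 15
    ajax_url = base_url + "&start=" + str(start) + "&ajax=true"
    # In a real spider you would stop once a chunk comes back with no items.
    yield Request(ajax_url, callback=self.parse_category,
                  meta={'base_url': base_url, 'start': start})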
For a more detailed overview of scraping with Firebug, you can check out the official documentation.
Since there is no need to use Selenium for your purpose, I shall not cover this point more than adding a few links that show how to use Selenium with Scrapy, if the need ever occurs:
https://gist.github.com/cheekybastard/4944914
https://gist.github.com/irfani/1045108
http://snipplr.com/view/66998/