With Selenium I do not get the data - python-2.7

I have successfully navigated to an iframe with Selenium + PhantomJS, but I do not get the data.
If I open the iframe URL in the Midori browser I can see the result, but with the webdriver the page comes back without the table.
Here is my test code:
Here is my test code:
from selenium import webdriver

link = 'http://ebelediye.fatih.bel.tr/alfa/servlet/hariciprogramlar.online.rayic?caid=1449'

def get_site():
    driver = webdriver.PhantomJS()
    driver.get(link)
    driver.find_element_by_name('btnlistele').click()
    # load the iframe's source URL directly
    src = driver.find_element_by_tag_name('iframe').get_attribute('src')
    driver.get(src)
    print driver.page_source

This seems to be a security measure: the site's FloodGuard is blocking you because of the high frequency of the requests you're sending. The page it returns contains this warning:
FloodGuard Güvenlik uyarısı !!!
Bu kadar sık istek gönderemezsiniz !!!
(FloodGuard security warning: you cannot send requests this often.)
Just add some delay, as below:
import time
from selenium import webdriver

link = 'http://ebelediye.fatih.bel.tr/alfa/servlet/hariciprogramlar.online.rayic?caid=1449'

def get_site():
    driver = webdriver.PhantomJS()
    driver.get(link)
    time.sleep(1)  # slow down so FloodGuard does not flag the request rate
    driver.find_element_by_name('btnlistele').click()
    src = driver.find_element_by_tag_name('iframe').get_attribute('src')
    # substitute a value for the empty ISSK_KOD query parameter before loading the iframe URL
    driver.get(src.replace('ISSK_KOD=', 'ISSK_KOD=999'))
    print driver.page_source
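A fixed time.sleep(1) can be flaky. As an alternative (a sketch reusing the same page and element names as above), an explicit wait retries until the elements are actually ready instead of pausing for a fixed interval:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_site_with_waits():
    driver = webdriver.PhantomJS()
    driver.get(link)
    # wait until the button is clickable rather than sleeping blindly
    button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.NAME, 'btnlistele')))
    button.click()
    # wait for the iframe to appear before reading its src
    iframe = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, 'iframe')))
    driver.get(iframe.get_attribute('src'))
    print driver.page_source
Note that if FloodGuard rate-limits purely by time between requests, you may still need the sleep in addition to the wait.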

Related

Flask - Generated PDF can be viewed but cannot be downloaded

I recently started learning Flask and created a simple webapp which randomly generates kids' math worksheets in PDF based on user input.
The PDF opens automatically in a browser and can be viewed, but when I try downloading it, both on a PC and in Chrome on iOS, I get error messages (Chrome PC: "Failed - Network error"; Chrome iOS: "the file could not be downloaded at this time").
You can try it out here: kidsmathsheets.com
I suspect it has something to do with the way I'm generating and returning the PDF file. FYI I'm using ReportLab to generate the PDF. My code is below (hosted on PythonAnywhere):
from reportlab.lib.pagesizes import A4, letter
from reportlab.pdfgen import canvas
from reportlab.platypus import Table
from flask import Flask, render_template, request, Response
import io
from werkzeug.wsgi import FileWrapper  # FileWrapper lives in werkzeug.wsgi

# ... other code to take in input and generate data ...
if request.method == 'POST':  # condition implied by the dangling else below
    filename = io.BytesIO()
    if letter_size:
        c = canvas.Canvas(filename, pagesize=letter)
    else:
        c = canvas.Canvas(filename, pagesize=A4)
    pdf_all(c, p_set, answer=answers, letter=letter_size)
    c.save()
    filename.seek(0)
    wrapped_file = FileWrapper(filename)
    return Response(wrapped_file, mimetype="application/pdf",
                    direct_passthrough=True)
else:
    return render_template('index.html')
Any idea what's causing the issue? Help is much appreciated!
Please check whether you are using an AJAX POST request to invoke the endpoint that generates your data and displays the PDF. If this is the case, quite probably that causes the behaviour you observe. You might want to try invoking the endpoint with a GET request to /my-endpoint/some-hashed-non-reusable-id-of-my-document, where some-hashed-non-reusable-id-of-my-document tells the endpoint which document to serve without allowing users to play around with guesstimates about what other documents you might have. You might try it first like:
from flask import send_file

@app.route('/display-document/<document_id>')
def display_document(document_id):
    document = get_my_document_from_wherever_it_is(document_id)
    binary = get_binary_data_from_document(document)
    # ... prepare the response here ...
    return send_file(binary, mimetype="application/pdf")
Kind note: a right click and 'Print to PDF' will work, but this is not the solution we want.
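One minimal way to implement the non-reusable id is a one-shot token. The sketch below is only an assumption about how you might wire it up; register_document and the in-memory _pending_documents store are hypothetical names, and a real deployment would use a cache with expiry instead of a module-level dict:
import io
import uuid
from flask import send_file, abort

# hypothetical one-shot store: token -> PDF bytes
# (assumes the existing Flask `app` object from the question)
_pending_documents = {}

def register_document(pdf_bytes):
    token = uuid.uuid4().hex  # non-guessable, single-use id
    _pending_documents[token] = pdf_bytes
    return token

@app.route('/display-document/<document_id>')
def display_document(document_id):
    pdf_bytes = _pending_documents.pop(document_id, None)  # one use only
    if pdf_bytes is None:
        abort(404)  # unknown or already-used id
    return send_file(io.BytesIO(pdf_bytes), mimetype="application/pdf")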

PhantomJS with selenium doesn't scroll to bottom

With the Firefox driver, the code below works (scrolls to the bottom of the page), but not with the PhantomJS webdriver. The page below has infinite scroll, so I need to scroll down to gather more information. Kindly help me identify why this doesn't work with PhantomJS.
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS()
driver.maximize_window()
driver.get("http://www.betpawa.co.ke/upcoming")
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.events-wrapper")))
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(2)
soup = BeautifulSoup(driver.page_source.encode('utf-8'), "html.parser")
print len(soup.findAll("div", {"class": "prematch"}))
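One pattern that often helps on infinite-scroll pages (a sketch reusing the driver above, not a confirmed fix for PhantomJS) is to give the headless browser an explicit window size, since maximize_window has no real screen to work with, and to scroll in a loop until the document height stops growing:
import time

driver.set_window_size(1024, 768)  # PhantomJS starts with a small default viewport

def scroll_to_bottom(driver, pause=2, max_rounds=20):
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the AJAX content time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # height stopped growing; assume we reached the bottom
        last_height = new_height

scroll_to_bottom(driver)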

web scraping with BeautifulSoup and python

I'm trying to print out all the IP addresses from this website https://hidemy.name/es/proxy-list/#list but nothing happens.
Code in Python 2.7:
import requests
from bs4 import BeautifulSoup

def trade_spider(max_pages):  # go through max pages of the website starting from 1
    page = 0
    value = 0
    print('proxies')
    while page <= 18:
        value += 64
        url = 'https://hidemy.name/es/proxy-list/?start=' + str(value) + '#list'  # add page number to link
        source_code = requests.get(url)  # get website html code
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        for link in soup.findAll('td', {'class': 'tdl'}):  # get the cells of this class
            proxy = link.string  # get the string of the cell
            print(proxy)
        page += 1

trade_spider(1)
You aren't seeing any output because there are no matching elements in your soup.
I tried dumping all the variables to the output stream and figured out that this website is blocking crawlers. Try printing the plain_text variable. It will probably contain only a warning message like:
It seems you are bot. If so, please use separate API interface. It
cheap and easy to use.
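One thing worth trying first (an assumption; the site may fingerprint bots in other ways too) is to send a browser-like User-Agent header, since requests identifies itself as python-requests by default:
import requests

headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/53.0.2785.143 Safari/537.36')
}
source_code = requests.get(url, headers=headers)
print(source_code.text[:500])  # inspect what the server actually returned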

need to get the exact redirect link

I need to get the final URL of the link, but this code only gives me a link to its store.
It returns this link: http://www.amazon.in/electronics/b?ie=UTF8&node=976419031
But what I need is: http://www.amazon.in/Samsung-G-550FY-On5-Pro-Gold/dp/B01FM7GGFI?tag=prdeskdetailmob-21&ascsubtag=desktop-mobile-15920-blank-27092016
import mechanize

br = mechanize.Browser()
br.open("https://priceraja.com/r/go2store.php?mpc=mobile--1178916--15920--deskdetail")
br.select_form(nr=0)
br.submit()
x = br.geturl()
print x
import time
from selenium import webdriver

chrome_path = r"C:\Users\Bhanwar\Desktop\price raja mobile\working\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
link = "https://priceraja.com/r/go2store.php?mpc=mobile--1185105--15236--deskdetail"
driver.get(link)
while link == driver.current_url:  # poll until the JavaScript redirect fires
    time.sleep(3)
redirected_url = driver.current_url
print redirected_url
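If your Selenium version ships expected_conditions.url_changes (available in newer Selenium releases; treat this as an assumption for older installs), an explicit wait is a cleaner alternative to the sleep loop:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get(link)
# block until current_url differs from link, up to 30 seconds
WebDriverWait(driver, 30).until(EC.url_changes(link))
print driver.current_url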

How to fill an AJAX form from a cron job in a Django app?

I have the following problem: every hour, I need my app to
1. go to https://airqualityegg.wickeddevice.com/download and fill out the form (I found the django-cron package. Is it a good idea to use it?),
2. wait until the "Download file" button appears after the JS effects,
3. download the .zip file by clicking on that button,
4. extract the .csv file from that .zip archive and work with it (see the sketch after the code below).
What should I do? I've used the Selenium webdriver:
from datetime import timedelta

from django.utils import timezone
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

def get_zipfile_file_link(self):
    display = Display(visible=0, size=(1024, 768))  # headless X display
    display.start()
    driver = webdriver.Firefox()
    driver.get("https://airqualityegg.wickeddevice.com/download")
    driver.find_element_by_id("serial_numbers").send_keys(self.egg_id)
    driver.find_element_by_id("start_date").send_keys(timezone.now() - timedelta(hours=5))
    driver.find_element_by_id("zipfilename").send_keys("q")
    driver.find_element_by_id("download_submit").click()
    # wait (up to 150 s) for the download link to appear in the results table
    WebDriverWait(driver, 150).until(
        lambda d: d.find_element_by_xpath('/html/body/table/tbody/tr[9]/td[2]/a'))
    url = driver.find_element_by_xpath('/html/body/table/tbody/tr[9]/td[2]/a').get_attribute('href')
    driver.close()
    display.stop()
    return url
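The function above only covers steps 1 and 2 and returns the link. For steps 3 and 4, here is a minimal sketch using only the standard library; fetch_csv_rows is a hypothetical helper name:
import csv
import io
import urllib2
import zipfile

def fetch_csv_rows(zip_url):
    # step 3: download the .zip into memory
    data = urllib2.urlopen(zip_url).read()
    archive = zipfile.ZipFile(io.BytesIO(data))
    # step 4: parse the first .csv member of the archive
    for name in archive.namelist():
        if name.endswith('.csv'):
            return list(csv.reader(archive.open(name)))
    return []
You can then call both functions from a django-cron job, or from a management command scheduled in crontab to run every hour.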