How to download from an HTML link (href) in Python? - python-2.7

How can I download a movie from a link (one that normally starts with a click)? This is the HTML code for the download file in the web page. I want to do this in Python, as a client that downloads the movie multiple times without saving it (just simulating traffic on the web page).

In case you have the url:
import requests
url="http://....."
response = requests.get(url)
You can then print the response or parse it:
response.headers is a dict-like object holding the response headers.
response.content is the body of the response, as bytes.
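Since the question asks to download the file repeatedly without saving it, one possible approach is to stream the body and drop each chunk as it arrives. The following is only a sketch under assumptions: the helper names download_and_discard and simulate_traffic, the chunk size, and the 30 second timeout are illustrative choices, not part of the original answer.

```python
import requests


def download_and_discard(url, session=None, chunk_size=8192):
    """Stream the body at `url`, discard it, and return the byte count."""
    if session is None:
        session = requests.Session()
    # stream=True keeps requests from loading the whole body into memory.
    with session.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        total = 0
        for chunk in response.iter_content(chunk_size=chunk_size):
            total += len(chunk)  # count the bytes, then drop the chunk
    return total


def simulate_traffic(url, times, session=None):
    """Download `url` `times` times; return the bytes received each time."""
    if session is None:
        session = requests.Session()
    return [download_and_discard(url, session=session) for _ in range(times)]
```

Calling simulate_traffic(url, 10) with the real download URL would then fetch the file ten times over one reused connection pool, without writing anything to disk.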

Related

how to send csv file from postman- post request- to CherryPy

I have a POST request to my Python code. In the JSON body I need to send one parameter, and then I need to upload a CSV file.
I have two questions: 1. How do I upload the CSV from the Postman side? 2. How do I receive it in my Python code?
My POST request and my Python code are attached.
My code in Python:
@cherrypy.tools.json_in()
@cherrypy.tools.json_out()
@cherrypy.tools.accept(media='application/json')
def POST(self):
    body = cherrypy.request.json
If you want to use JSON to upload the data, you need to convert the CSV file into a base64 string before you post it, because the JSON format does not support files.
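The base64 route could look roughly like the sketch below. It uses only the standard library; the function names and the JSON field names "parameter" and "csv_base64" are hypothetical and chosen purely for illustration.

```python
import base64
import csv
import io
import json


def build_json_body(csv_bytes, parameter):
    """Client side: wrap one parameter and a base64-encoded CSV in JSON."""
    encoded = base64.b64encode(csv_bytes).decode("ascii")
    return json.dumps({"parameter": parameter, "csv_base64": encoded})


def read_json_body(body):
    """Server side: decode the CSV from an already parsed JSON body."""
    csv_text = base64.b64decode(body["csv_base64"]).decode("utf-8")
    rows = list(csv.reader(io.StringIO(csv_text)))
    return body["parameter"], rows
```

In a handler decorated with cherrypy.tools.json_in(), the parsed dict in cherrypy.request.json is what would be passed to read_json_body.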
If you just want to get your data across, you can select "form-data" in Postman.
Code for "form-data":
import cherrypy
class HelloWorld(object):
    @cherrypy.expose
    @cherrypy.tools.json_out()
    def uploadcsv(self, img=None, other=None):
        # img and other are the form-data field names set in Postman.
        print(img)
        print(other)
        return 'ok'
cherrypy.quickstart(HelloWorld())

How to make a request to get minimum data from server?

I want to make an HTTP request so that I get the minimum amount of data from the server. For example, if the user's device is a mobile, the server will send less data.
I was doing this in Python:
req = urllib2.Request(__url, headers={'User-Agent' : "Magic Browser"})
html = urllib2.urlopen(req).read()
But it still takes some time to download all this.
If it helps, this is the domain from which I want to download pages: https://in.bookmyshow.com
Is there any other way so that I can download a page, quickly with minimum data? Is it even possible?
You can use requests to fetch data and to upload files. For example, to get the cookies:
import requests
r = requests.get('https://in.bookmyshow.com')
print(r.cookies.get_dict())
Or, to upload a file:
import requests
# Each entry is (filename, file object, content type of that file);
# requests sets the multipart/form-data request header itself.
files = {'file': ('filename.txt', open('filename.txt', 'rb'), 'text/plain')}
data = {
    "ButtonValueNameInHtml": "Submit",
}
r = requests.post('https://in.bookmyshow.com', files=files, data=data)
Replace in.bookmyshow.com with your own URL.
You can do many things with requests.

A web server developed in Python: how to transfer an image file so that it is shown in a web browser?

I want to develop a simple web server in Python to handle some simple HTTP requests. I have learned how to respond to requests, such as by transferring HTML pages or other files. When I transfer an image file and a client uses a browser to get it, the URL looks like this:
http://114.212.82.104:8080/1.png
I set 'Content-Type' to 'application/x-png', but the browser downloads the file directly instead of displaying it. That is unlike the image below,
https://www.baidu.com/img/bd_logo1.png
which is displayed in the browser. How can I display the image in the browser?
Can someone help me?
I know I can fix it by encoding the image file into an HTML page, with code like the following:
import os
import BaseHTTPServer
class RequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        path = os.getcwd() + self.path
        if os.path.isfile(path):
            with open(path, 'rb') as fileTrans:
                # Embed the image as base64 text inside the HTML page.
                content = fileTrans.read().encode('base64').replace('\n', '')
            #self.sendContent(200, content)
            self.send_response(200)
            page = "<p>\"fef\"</p><img src=\"data:image/jpg;base64,{0}\"/>"
            contentPage = page.format(content)
            self.send_header('Content-Type', 'text/html')
            self.send_header("Content-Length", str(len(contentPage)))
            self.end_headers()
            self.wfile.write(contentPage)
        else:
            self.sendContent(404, "file do not exists")
But I know there must be another way. Looking at the source code of https://www.baidu.com/,
it just uses
<img hidefocus="true" src="//www.baidu.com/img/bd_logo1.png" width="270" height="129"></div><a href="/" id="result_logo" onmousedown="return c({'fm':'tab','tab':'logo'})">
which is different from my page:
<p>"fef"</p><img src="......
OK, I think I have solved this problem.
Just set the 'Content-Type' header to 'image/png' instead of 'application/x-png'.
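For illustration, here is a minimal sketch of a handler that sends the raw image bytes with the 'image/png' header instead of embedding them in HTML. It is written for Python 3, where http.server replaces the Python 2 BaseHTTPServer module used in the question. The names guess_content_type, ImageRequestHandler, and serve are hypothetical, and choosing the type with mimetypes is an assumption rather than something the original post does.

```python
import mimetypes
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def guess_content_type(path):
    """Return a MIME type for `path`, such as 'image/png' for '1.png'."""
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or "application/octet-stream"


class ImageRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve only plain file names from the current directory.
        name = os.path.basename(self.path.split("?", 1)[0])
        path = os.path.join(os.getcwd(), name)
        if not name or not os.path.isfile(path):
            self.send_error(404, "File does not exist")
            return
        with open(path, "rb") as source:
            body = source.read()
        self.send_response(200)
        # A recognised image type lets the browser render the file inline.
        self.send_header("Content-Type", guess_content_type(path))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve files from the current directory until interrupted."""
    HTTPServer(("", port), ImageRequestHandler).serve_forever()
```

Running serve() in a directory that contains 1.png and opening /1.png on port 8080 should then show the image in the browser rather than trigger a download.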

Test scrapy spider still working - find page changes

How can I test a Scrapy spider against online data?
I know from this post that it is possible to test a spider against offline data.
My goal is to check whether my spider still extracts the right data from a page, or whether the page has changed. I extract the data via XPath, and sometimes the page receives an update and my scraper no longer works. I would love to have the test as close to my code as possible, e.g. using the spider and Scrapy setup and just hooking into the parse method.
Referring to the link you provided, you could try this method for online testing, which I used for a problem similar to yours. Instead of reading the responses from a file, use the Requests library to fetch the live web page, and compose a Scrapy response from the response that Requests returns, as below:
import requests
from scrapy.http import TextResponse, Request
def online_response_from_url(url=None):
    if not url:
        url = 'http://www.example.com'
    request = Request(url=url)
    # Fetch the live page with Requests instead of reading it from a file.
    oresp = requests.get(url)
    # TextResponse (not the base Response) accepts a text body and an encoding.
    response = TextResponse(url=url, request=request,
                            body=oresp.text, encoding='utf-8')
    return response

Extract "Liked" songs from Pandora using python

I am attempting to use Python's urllib2 to extract info on my "liked" tracks in Pandora. I'm getting discrepancies when comparing the HTML returned by the following code with the HTML seen via Chrome's inspect element:
import urllib2
headers={ 'User-Agent' : 'Mozilla/5.0' }
url='http://www.pandora.com/profile/likes/myusername'
request=urllib2.Request(url,None,headers)
response = urllib2.urlopen(request)
html = response.read()
I'm thinking this might be due to the lack of authentication even though I'm still able to load the same page logged out using Chrome's incognito mode.
So I added the following lines to attempt to use basic authentication on my request:
SERVER='pandora.com'
authinfo = urllib2.HTTPPasswordMgrWithDefaultRealm()
authinfo.add_password(None, SERVER, "login", "password")
handler=urllib2.HTTPBasicAuthHandler(authinfo)
myopener=urllib2.build_opener(handler)
opened=urllib2.install_opener(myopener)
headers={ 'User-Agent' : 'Mozilla/5.0' }
url='http://www.pandora.com/profile/likes/chris.r.armstrong'
request=urllib2.Request(url,None,headers)
response = urllib2.urlopen(request)
html = response.read()
Still not getting the right HTML response back. Any suggestions?
The DOM (HTML page) you see inside the browser is not the payload of the HTTP response. Once an HTTP request has been made by a browser, a number of transformations happen, depending on how complex the page is. At the basic level, the parser might reorder and/or reorganize the content as mandated by the HTML5 parsing algorithm. Then JS scripts and XMLHttpRequests will modify and add content to the DOM.
If you really need the DOM as seen in the browser, you might want to use a webdriver, so that you get back what the browser sees and not only what the HTTP client sees.
Hope it helps.