I am attempting to use Python's urllib2 to extract info on my "liked" tracks in Pandora. I'm getting discrepancies when comparing the HTML yielded by the following code with the HTML seen via Chrome's inspect element:
import urllib2
headers={ 'User-Agent' : 'Mozilla/5.0' }
url='http://www.pandora.com/profile/likes/myusername'
request=urllib2.Request(url,None,headers)
response = urllib2.urlopen(request)
html = response.read()
I'm thinking this might be due to the lack of authentication, even though I'm still able to load the same page logged out using Chrome's incognito mode.
So I added the following lines to attempt to use basic authentication on my request:
SERVER='pandora.com'
authinfo = urllib2.HTTPPasswordMgrWithDefaultRealm()
authinfo.add_password(None, SERVER, "login", "password")
handler=urllib2.HTTPBasicAuthHandler(authinfo)
myopener=urllib2.build_opener(handler)
urllib2.install_opener(myopener)
headers={ 'User-Agent' : 'Mozilla/5.0' }
url='http://www.pandora.com/profile/likes/chris.r.armstrong'
request=urllib2.Request(url,None,headers)
response = urllib2.urlopen(request)
html = response.read()
Still not getting the right HTML response back. Any suggestions?
The DOM (HTML page) you see inside the browser is not the payload of the HTTP request. Once an HTTP request has been made by a browser, depending on how complex the page is, a number of transformations happen. At the most basic level, the parser might reorder and/or reorganize the content as mandated by the HTML5 parsing algorithm. Then JS scripts and XMLHttpRequests modify and add content to the DOM.
If you really need the DOM as seen in the browser, you might want to use a webdriver, so you can get back what the browser sees and not only what the HTTP client sees.
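For example, a minimal sketch with Selenium (the driver setup is an assumption; you need a browser driver such as chromedriver installed):

from selenium import webdriver

driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
driver.get('http://www.pandora.com/profile/likes/myusername')
# page_source reflects the DOM after scripts have run,
# not just the raw HTTP payload
html = driver.page_source
driver.quit()

You may still need to wait for asynchronous requests to finish before reading page_source.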
Hope it helps.
To be more precise, I have the following:
path('',views.home,name='home'),
path('json', json, name="json"),
My home page calls the json view to plot a chart; however, if a user types the URL '.../json' they can access the JSON data directly. I want users not to have access to the path /json, but I still want my home page to be able to request it.
Thanks
This is not quite possible, because when your homepage is loaded by a user, the page tells their browser to request the JSON data from your server. The browser will make a request for the data, and there is almost no difference between the user going to http://yoursite.com and the browser making a call to http://yoursite.com/json.
One thing you could try, if the data is only loaded once, is to embed a single-use token in the home page (different for every request) and only allow requests that present this token:
from django.http import HttpResponse, JsonResponse
from django.shortcuts import render
import secrets

def home(request):
    # build the page as usual, then hand it a single-use token
    context = {'token': secrets.token_urlsafe(32)}
    request.session['json_token'] = context['token']
    return render(request, 'home.html', context)

def json(request):
    # serve the data only when the token matches, then invalidate it
    if request.GET.get('token', '') == request.session.get('json_token'):
        del request.session['json_token']
        data = {}  # generate the data here
        return JsonResponse(data)
    return HttpResponse(status=403)  # deny direct access
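On the home page itself, the script that requests the chart data then has to include the token, e.g. by fetching /json?token={{ token }}. Since the view deletes the token after the first use, pasting the URL into the address bar afterwards is rejected.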
I am trying to set up a Django API that receives POST requests with some JSON data and basically sends emails to a list of recipients. The logic is rather simple:
First, I have the view for creating a blog post. In the template I include the csrf_token, as specified in the Django documentation. When I hit the submit button, behind the scenes the create-post view, in addition to creating the post, makes a request (using the requests module) to the API endpoint in charge of sending the emails. This is the piece of logic that sends the request to the API:
data = {
"title": new_post.title,
"summary": new_post.summary,
"link": var["BASE_URL"] + f"blog/post/{new_post.slug}"
}
csrf_token = get_token(request)
# print(csrf_token)
headers = {"X-CSRFToken": csrf_token}
requests.post(var["BASE_URL"] + "_api/send-notification/", json=data, headers=headers)
As you can see, I am adding the X-CSRFToken header, which I generate through the get_token() method as per the Django docs. However, the response from the API is a 403 Forbidden status: CSRF token not set.
I have checked the headers in the request and the token is indeed present. In addition, I have added other routes to the API and have been using them for AJAX calls, which is very simple (just follow the Django docs), and they work perfectly well.
The problem only seems to arise when I make the call from within the view; the AJAX calls are handled by JavaScript in static files and, as I said, they work fine.
I thought Django might not allow two CSRF tokens on the same page (one for the submit form and the other generated in the view by get_token()), but that's not the problem.
This is typically the error I get:
>>> Forbidden (CSRF cookie not set.): /_api/send-notification/
>>> "POST /_api/send-notification/ HTTP/1.1" 403 2864
I have read several similar questions on SO, but they mostly involve using the csrf_exempt decorator, which in my opinion is not really a solution; it just gets rid of the CSRF token's usefulness altogether.
Does anyone have any idea what might be causing this problem?
Thanks
The error is telling you that you also need to send the token as a cookie, like this:
cookies = {'csrftoken': csrf_token}
requests.post(var["BASE_URL"] + "_api/send-notification/", json=data, headers=headers, cookies=cookies)
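This is because Django's CSRF middleware compares the token in the X-CSRFToken header against the csrftoken cookie; if the cookie is missing from the request, validation fails with "CSRF cookie not set" even though the header is present.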
How can I download a movie from a link (one that normally starts the download when clicked)? This is the HTML code for the download file on the web page. I am looking to do this in Python, as a client that downloads the movie multiple times without saving it (just simulating traffic on the web page).
In case you have the url:
import requests
url="http://....."
response = requests.get(url)
You can print the response or parse it: response.headers is a dict of the response headers, and response.content is the body of the response.
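Since you only want to simulate traffic rather than keep the file, a minimal sketch (the URL is a placeholder) could stream the body and discard it:

import requests

url = 'http://example.com/movie.mp4'  # placeholder download link

for _ in range(5):  # download the movie several times
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=1 << 16):
            pass  # read each chunk and throw it away without saving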
How can I test a scrapy spider against online data?
I know from this post that it is possible to test a spider against offline data.
My goal is to check whether my spider still extracts the right data from a page, or whether the page has changed. I extract the data via XPath, and sometimes the page receives an update and my scraper no longer works. I would love to have the test as close to my code as possible, e.g. using the spider and scrapy setup and just hooking into the parse method.
Referring to the link you provided, you could try this method for online testing, which I used for a problem similar to yours. All you have to do is, instead of reading the requests from a file, use the requests library to fetch the live web page and compose a scrapy response from the response you get, like below:
import requests
from scrapy.http import TextResponse, Request

def online_response_from_url(url=None):
    if not url:
        url = 'http://www.example.com'
    # fetch the live page and wrap it in a scrapy TextResponse
    request = Request(url=url)
    oresp = requests.get(url)
    response = TextResponse(url=url, request=request,
                            body=oresp.text, encoding='utf-8')
    return response
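You can then feed this response straight into your spider's parse method in a test; a sketch, assuming your spider class is called MySpider:

# hypothetical test using the helper above
def test_parse_live_page():
    spider = MySpider()
    response = online_response_from_url('http://www.example.com/some-page')
    items = list(spider.parse(response))
    assert items, 'spider extracted nothing from the live page'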
I am trying to create an admin command that will simulate some API calls associated with a view, but I don't want to hard-code the URL (for example url='http://127.0.0.1:8000/api/viewname') in order to send the request.
If I use reverse I only obtain the path part of the URL, /api/viewname.
If I try to post the request this way:
url = reverse('name-of-view')
requests.post(url, data=some_data)
I get
requests.exceptions.MissingSchema: Invalid URL '/api/viewname/': No schema supplied. Perhaps you meant http:///api/viewname/?
Do I have to check whether the server is running on localhost, or is there a more generic way?
The requests module needs the absolute URL to post to. You need:
url = 'http://%s%s' % (request.META['HTTP_HOST'], reverse('name-of-view'))
requests.post(url, data=some_data)
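If you have a request object available, request.build_absolute_uri(reverse('name-of-view')) builds the same URL more robustly. In a management command, where there is no incoming request, one option is a sketch like the following, assuming a BASE_URL value in your settings (a hypothetical setting, not something Django provides):

import requests
from django.conf import settings
from django.urls import reverse

# BASE_URL is a hypothetical setting, e.g. 'http://127.0.0.1:8000'
url = settings.BASE_URL + reverse('name-of-view')
requests.post(url, data=some_data)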