Python : urllib2 put request returns 301 error - python-2.7

I'm trying to make a PUT request via the urllib2 module of Python 2.7. When I perform a GET it works just fine, but when I turn it into a PUT it returns a 301 HTTP error.
My code is below:
import base64
import urllib2

opener = urllib2.build_opener(urllib2.HTTPHandler)
req = urllib2.Request(reqUrl)
base64string = base64.encodestring('%s:%s' % (v_username, v_password)).replace('\n', '')
req.add_header("Authorization", "Basic %s" % base64string)
req.add_header("Content-Type", "application/rdf+xml")
req.add_header("Accept", "application/rdf+xml")
req.add_header("OSLC-Core-Version", "2.0")
req.get_method = lambda: 'PUT'
req.allow_redirects = True  # note: urllib2.Request ignores this attribute (allow_redirects is a requests option)
url = opener.open(req)
If I remove the line
req.get_method = lambda: 'PUT'
it works, but then it's a GET request (or a POST if I pass some data). It has to be a PUT, and I don't know how to do that differently with this module.
The error is
urllib2.HTTPError: HTTP Error 301: Moved Permanently.
Does anyone understand this better than I do? I'm quite new to REST requests, and some specifics remain obscure to me.

I'm not certain, but could it be that urllib is handling the 301 automatically for the GET but not for the PUT? According to the RFC, user agents can redirect GETs automatically, but not PUTs.
This page seems to suggest that urllib does indeed handle the 301 redirection automatically, and given the RFC it seems plausible that it wouldn't automatically redirect a PUT. You should probably find out where the redirect points and send the request there.
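A minimal sketch of that idea, written for Python 3's urllib.request (the successor of urllib2) rather than the 2.7 code above: catch the HTTPError that urllib raises for a redirected PUT, read its Location header, and send the same PUT there. The helper name put_following_redirects and the max_hops limit are hypothetical; 303 is deliberately not repeated, since it tells the client to follow up with a GET.

```python
import urllib.error
import urllib.parse
import urllib.request

# Redirect codes after which the *same* PUT may be repeated at the new URL.
# 303 is left out on purpose: it asks the client to follow up with a GET.
REPEATABLE = (301, 302, 307, 308)


def put_following_redirects(url, data=b"", headers=None, max_hops=5):
    """Send a PUT; if the server answers with a redirect, re-send the PUT
    to the Location target instead of failing (urllib's default redirect
    handler raises HTTPError for redirected PUTs)."""
    headers = dict(headers or {})
    for _ in range(max_hops):
        req = urllib.request.Request(url, data=data, headers=headers, method="PUT")
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            location = err.headers.get("Location")
            if err.code not in REPEATABLE or not location:
                raise
            err.close()  # discard the redirect body before retrying
            # Location may be relative ("/new"), so resolve it against url.
            url = urllib.parse.urljoin(url, location)
    raise RuntimeError("gave up after %d redirects" % max_hops)
```

Usage: resp = put_following_redirects(reqUrl, data=body, headers={"Content-Type": "application/rdf+xml"}), where body is the bytes you want to upload.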

Thanks Ken F, you helped me understand the problem. I changed the handler directly in the urllib2.py file (not sure if it's very clean but whatever) so it can handle PUT requests:
if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
or code in (301, 302, 303) and m in ("POST", "PUT")):
Indeed, when the request was neither GET nor POST, it automatically raised an error. I'm surprised I couldn't find anyone else with the same issue.
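Editing urllib2.py changes redirect behaviour for every program that uses that Python installation. A less invasive option, sketched here for Python 3's urllib.request under the assumption that you want PUTs repeated verbatim at the new URL, is to subclass HTTPRedirectHandler, override its redirect_request method, and pass the subclass to build_opener. PutRedirectHandler and build_put_opener are illustrative names, not part of the standard library.

```python
import urllib.request


class PutRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that also follows redirects for PUT requests,
    keeping the method, body and headers. Everything else is delegated to
    the standard handler, so GET/HEAD/POST behave as before."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if req.get_method() == "PUT" and code in (301, 302, 307, 308):
            return urllib.request.Request(
                newurl,
                data=req.data,
                # req.headers holds only the caller's headers; Host and
                # Content-Length are recomputed for the new request.
                headers=dict(req.headers),
                origin_req_host=req.origin_req_host,
                unverifiable=True,
                method="PUT",
            )
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def build_put_opener(*handlers):
    # Passing the subclass makes build_opener use it in place of the
    # default HTTPRedirectHandler.
    return urllib.request.build_opener(PutRedirectHandler, *handlers)
```

Usage: opener = build_put_opener(), then opener.open(urllib.request.Request(url, data=body, method="PUT")). The same subclassing idea applies to urllib2.HTTPRedirectHandler in 2.7, although there Request has no method argument, so the method is set via get_method as in the question.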

Related

How to ignore HTTP Errors while scraping URLs with python 2.7

I'm crawling several URLs to find specific keywords in their source code. However, partway through the list my spider suddenly stops because of HTTP errors such as 404 or 503.
My crawler:
import urllib2
keyword = ['viewport']
with open('listofURLs.csv') as f:
    for line in f:
        strdomain = line.strip()
        if strdomain:
            req = urllib2.Request(strdomain.strip())
            response = urllib2.urlopen(req)
            html_content = response.read()
            for searchstring in keyword:
                if searchstring.lower() in str(html_content).lower():
                    print (strdomain, keyword, 'found')
f.close()
What code should I add to skip bad URLs that return HTTP errors and let the crawler continue scraping?
You can use a try-except block as demonstrated here. This lets you apply your logic to the valid URLs and different logic to the URLs that return HTTP errors.
Applying the solution in the link to your code gives:
import urllib2
keyword = ['viewport']
with open('listofURLs.csv') as f:
    for line in f:
        strdomain = line.strip()
        if strdomain:
            req = urllib2.Request(strdomain)
            try:
                response = urllib2.urlopen(req)
                html_content = response.read()
                for searchstring in keyword:
                    if searchstring.lower() in str(html_content).lower():
                        print strdomain, searchstring, 'found'
            except urllib2.HTTPError as err:
                # Do something here, e.g. log the status code, then move on
                print strdomain, 'skipped, HTTP error', err.code
This is the right solution for the code you've provided. However, eLRuLL makes a great point that you really should look at using scrapy for your web crawling needs.
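For reference, a Python 3 sketch of the same try/except approach, assuming the input file has one full URL per line as in the question. It catches urllib.error.HTTPError (404, 503, ...) before its parent class urllib.error.URLError (DNS failures, refused connections), and ValueError for malformed lines, so every kind of bad URL is recorded and skipped instead of stopping the loop. find_keywords is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def find_keywords(urls, keywords, timeout=10):
    """Return (hits, failures): hits is a list of (url, keyword) pairs,
    failures a list of (url, reason) for URLs that could not be fetched."""
    hits, failures = [], []
    for url in urls:
        url = url.strip()
        if not url:
            continue
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                html = response.read().decode("utf-8", errors="replace").lower()
        except urllib.error.HTTPError as err:
            # Must come before URLError, because HTTPError subclasses it.
            failures.append((url, err.code))
            continue
        except urllib.error.URLError as err:
            # No HTTP status at all: DNS failure, refused connection, ...
            failures.append((url, str(err.reason)))
            continue
        except ValueError as err:
            # Malformed line, e.g. a missing "http://" scheme.
            failures.append((url, str(err)))
            continue
        for keyword in keywords:
            if keyword.lower() in html:
                hits.append((url, keyword))
    return hits, failures
```

Usage: with open('listofURLs.csv') as f: hits, failures = find_keywords(f, ['viewport']). Returning the failures instead of printing them lets you retry or report them after the crawl finishes.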
I would recommend using the Scrapy framework for crawling.

Why does the Python script to send data to Slack web hook not work when variable is pulled from a line?

Language: Python 2.7
Hello all. I found a really helpful script here: Python to Slack Web Hook
that shows how to send messages to a Slack web hook.
import json
import requests
# Set the webhook_url to the one provided by Slack when you create the webhook at https://my.slack.com/services/new/incoming-webhook/
webhook_url = 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
slack_data = {"text": "<https://alert-system.com/alerts/1234|Click here> for details!"}
response = requests.post(
    webhook_url, data=json.dumps(slack_data),
    headers={'Content-Type': 'application/json'}
)
if response.status_code != 200:
    raise ValueError(
        'Request to slack returned an error %s, the response is:\n%s'
        % (response.status_code, response.text)
    )
It works flawlessly when I run the .py file.
Now I have a file with many lines of messages that I want to send to Slack. Each line is already formatted correctly, with no extra spaces. It's just a matter of reading each line and passing it, so that slack_data is line 1, then line 2, and so on.
So, I modify the file with something like this:
with open('export.txt', 'r') as e:
    for line in e:
        slack_data = line
Now if I print slack_data right after that, the information appears on the screen exactly as it should, so I assumed it was fine. I haven't begun making it work for every line yet, because it doesn't even work for the first one.
I get a 400 "invalid payload" error when I run it.
EDIT: Slack support said that what they were receiving had escape characters inserted into it for some reason:
"{\"text\": \"<https://alert-system.com/alerts/1234|Click here> for details!\"}\n"
Any direction or assistance is appreciated.
Thanks!!
Just posting as it might help somebody. For me the below snippet worked:
data = json.dumps(slack_data)
response = requests.post(
URL, json={"text": data},
headers={'Content-Type': 'application/json'}
)
As @Geo pointed out, the final payload we send must have the key "text"; otherwise it will fail.
Moreover, in the post call I had to replace data= with json=; otherwise it kept returning a 400 invalid payload error.
Since I already had the data preformatted in the file as JSON, it was just a matter of removing json.dumps from the code.
OLD:
#response = requests.post(webhook_url, data=json.dumps(slack_data), headers={'Content-Type': 'application/json'})
NEW:
response = requests.post(webhook_url, data=slack_data, headers={'Content-Type': 'application/json'})
Once I did that, everything worked like a charm.
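Why removing json.dumps helped: each line of export.txt is already a JSON document, so calling json.dumps on it serialises the string a second time, producing a quoted string literal with escaped quotes and a trailing \n, which matches what Slack support reported. A sketch under the assumption that every non-blank line is one complete JSON message: parse each line with json.loads (which also validates it and drops the newline) and serialise it exactly once. It uses the standard library's urllib.request instead of requests so it is self-contained; payload_from_line and post_lines are hypothetical names.

```python
import json
import urllib.request


def payload_from_line(line):
    """Turn one pre-formatted JSON line into request-body bytes.

    json.loads() validates the line and drops the trailing newline;
    json.dumps() then serialises the message exactly once."""
    return json.dumps(json.loads(line)).encode("utf-8")


def post_lines(webhook_url, path="export.txt"):
    """Send every non-blank line of `path` to the webhook as its own message."""
    with open(path) as lines:
        for line in lines:
            if not line.strip():
                continue
            req = urllib.request.Request(
                webhook_url,
                data=payload_from_line(line),
                headers={"Content-Type": "application/json"},
            )
            # urlopen raises urllib.error.HTTPError on a 4xx/5xx answer,
            # such as Slack's 400 for an invalid payload.
            with urllib.request.urlopen(req) as resp:
                resp.read()
```

With requests, the equivalent call would be requests.post(webhook_url, data=payload_from_line(line), headers={'Content-Type': 'application/json'}), or requests.post(webhook_url, json=json.loads(line)).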
If you change the code to this:
with open('export.txt', 'r') as e:
    slack_data = e.read()
do you still get the 400?

Setting Spotify credentials using Spotipy

I am trying out spotipy with Python 2.7.10 (preinstalled on my Mac, OS X 10.10), specifically the add_a_saved_track.py example. Here is the code as copied from GitHub:
# Add tracks to 'Your Collection' of saved tracks
import pprint
import sys
import spotipy
import spotipy.util as util
scope = 'user-library-modify'
if len(sys.argv) > 2:
    username = sys.argv[1]
    tids = sys.argv[2:]
else:
    print("Usage: %s username track-id ..." % (sys.argv[0],))
    sys.exit()
token = util.prompt_for_user_token(username, scope)
if token:
    sp = spotipy.Spotify(auth=token)
    sp.trace = False
    results = sp.current_user_saved_tracks_add(tracks=tids)
    pprint.pprint(results)
else:
    print("Can't get token for", username)
I registered the application at developer.spotify.com/my-applications and received a client_id and client_secret. I am a bit unclear about how to choose the redirect_uri, so I set it to 'https://play.spotify.com/collection/songs'.
Running this from terminal I get an error that says:
You need to set your Spotify API credentials. You can do this by
setting environment variables like so:
export SPOTIPY_CLIENT_ID='your-spotify-client-id'
export SPOTIPY_CLIENT_SECRET='your-spotify-client-secret'
export SPOTIPY_REDIRECT_URI='your-app-redirect-url'
I put that into my code with the id, secret, and url as strings, just following the imports but above the util.prompt_for_user_token method.
That caused a traceback:
File "add-track.py", line 8
export SPOTIPY_CLIENT_ID='4f...6'
^
SyntaxError: invalid syntax
I noticed that Text Wrangler does not recognize 'export' as a special word. And I searched docs.python.org for 'export' and came up with nothing helpful. What is export? How am I using it incorrectly?
I next tried passing the client_id, client_secret, and redirect_uri as arguments in the util.prompt_for_user_token method like so:
util.prompt_for_user_token(username,scope,client_id='4f...6',client_secret='xxx...123',redirect_uri='https://play.spotify.com/collection/songs')
When I tried that, this is what happens in terminal:
User authentication requires interaction with your
web browser. Once you enter your credentials and
give authorization, you will be redirected to
a url. Paste that url you were directed to to
complete the authorization.
Opening https://accounts.spotify.com/authorize?scope=user-library-modify&redirect_uri=https%3A%2F%2Fplay.spotify.com%2Fcollection%2Fsongs&response_type=code&client_id=4f...6 in your browser
Enter the URL you were redirected to:
I entered https://play.spotify.com/collection/songs and then got this traceback:
Traceback (most recent call last):
File "add-track.py", line 21, in <module>
token = util.prompt_for_user_token(username, scope, client_id='4f...6', client_secret='xxx...123', redirect_uri='https://play.spotify.com/collection/songs')
File "/Library/Python/2.7/site-packages/spotipy/util.py", line 86, in prompt_for_user_token
token_info = sp_oauth.get_access_token(code)
File "/Library/Python/2.7/site-packages/spotipy/oauth2.py", line 210, in get_access_token
raise SpotifyOauthError(response.reason)
spotipy.oauth2.SpotifyOauthError: Bad Request
It seems like I am missing something, perhaps another part of Spotipy needs to be imported, or some other python module. It seems I am missing the piece that sets client credentials. How do I do that? I am fairly new at this (if that wasn't obvious). Please help.
UPDATE: I changed redirect_uri to localhost:8888/callback. That causes a Firefox tab to open with an "unable to connect to server" error, since I do not have a server running (I thought about installing node.js as in the Spotify Web API tutorial, but have not done so yet). The Python script then asks me to copy and paste the URL I was redirected to. Even though Firefox could not open the page, I got this to work by copying the entire URL, including the "code=BG..." part that follows localhost:8888/callback?. I am not sure this is an ideal setup, but at least it works.
Does it matter if I set up node.js or not?
The process you've followed (including your update) is exactly as the example intends and you are not missing anything! Obviously, it is a fairly simple tutorial, but it sets you up with a token and you should be able to get the information you need.
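For the credentials, you can set them directly in your Terminal by running each of the export commands. export is a shell command, not Python, which is why putting it inside the script caused a SyntaxError. Read more about export here: https://www.cyberciti.biz/faq/linux-unix-shell-export-command/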
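If you would rather keep everything in the script, the Python counterpart of those export lines is assigning to os.environ before calling util.prompt_for_user_token, which (per the error message in the question) looks for the SPOTIPY_* variables. A sketch only: the credential values are placeholders, the redirect URI is assumed to be the localhost:8888/callback address from the update (it must match what you registered), and set_spotify_env and get_token are hypothetical helpers.

```python
import os

# Placeholder values (assumptions): substitute the id/secret from your
# Spotify app settings and the exact redirect URI you registered there.
CREDENTIALS = {
    "SPOTIPY_CLIENT_ID": "your-spotify-client-id",
    "SPOTIPY_CLIENT_SECRET": "your-spotify-client-secret",
    "SPOTIPY_REDIRECT_URI": "http://localhost:8888/callback",
}


def set_spotify_env(credentials=CREDENTIALS, override=False):
    """Python equivalent of the shell `export` lines: put the values into
    this process's environment (inherited by any child processes).
    Existing variables are kept unless override=True."""
    for name, value in credentials.items():
        if override or name not in os.environ:
            os.environ[name] = value


def get_token(username, scope="user-library-modify"):
    # Imported lazily so set_spotify_env works without spotipy installed.
    import spotipy.util as util

    set_spotify_env()
    return util.prompt_for_user_token(username, scope)
```

Unlike export in a Terminal, this only affects the running script, so the values have to be set again on every run; keeping override=False lets real exported values win over the placeholders.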

Python requests 503 errors when trying to access localhost:8000

I am facing a bit of a situation,
Scenario: I have a Django REST API running on localhost:8000, and I want to access the API from the command line. I have tried the urllib2 and requests libraries to talk to the API, but both failed (I get a 503 error). When I use google.com as the URL, I get the expected response, so I believe my approach is correct but I'm doing something wrong somewhere. Please see the code below:
import urllib, urllib2, httplib
url = 'http://localhost:8000'
httplib.HTTPConnection.debuglevel = 1
print "urllib"
data = urllib.urlopen(url);
print "urllib2"
request = urllib2.Request(url)
opener = urllib2.build_opener()
feeddata = opener.open(request).read()
print "End\n"
Environment:
OS Win7
python v2.7.5
Django==1.6
Markdown==2.3.1
colorconsole==0.6
django-filter==0.7
django-ping==0.2.0
djangorestframework==2.3.10
httplib2==0.8
ipython==1.0.0
jenkinsapi==0.2.14
names==0.3.0
phonenumbers==5.8b1
requests==2.1.0
simplejson==3.3.1
termcolor==1.1.0
virtualenv==1.10.1
Thanks
I had a similar problem, but found that it was my company's proxy that was preventing me from reaching my own machine.
503 Reponse when trying to use python request on local website
Try:
>>> import requests
>>> session = requests.Session()
>>> session.trust_env = False
>>> r = session.get("http://localhost:5000/")
>>> r
<Response [200]>
>>> r.content
'Hello World!'
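If the proxy is configured through the http_proxy/HTTP_PROXY environment variables (or, on Windows, the system proxy settings), urllib may also route localhost requests through it unless the host is listed in no_proxy. A rough standard-library counterpart to session.trust_env = False is an opener built with an empty ProxyHandler mapping, which disables proxies for that opener. A sketch for Python 3; open_without_proxy is a hypothetical name.

```python
import urllib.request


def open_without_proxy(url, timeout=10):
    """Fetch url directly, bypassing any proxy picked up from the
    environment (http_proxy/https_proxy) or the OS settings."""
    # ProxyHandler({}) means "no proxies at all"; passing an instance to
    # build_opener replaces the default, environment-reading ProxyHandler.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as response:
        return response.status, response.read()
```

Usage: status, body = open_without_proxy("http://localhost:8000/"). If this returns 200 while plain urlopen gets a 503, the 503 was coming from the proxy rather than from Django.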
If you are registering your viewsets with DefaultRouter, your API will appear at
http://localhost:8000/api/ for an html view of the index
http://localhost:8000/api/.json for a JSON view of the index
http://localhost:8000/api/appname for an html view of the individual resource
http://localhost:8000/api/appname/.json for a JSON view of the individual resource
You can check the response in your browser to make sure the URL works as you expect.

Djrill throwing 'TypeError' when send_mail method is used

I am currently trying to integrate Mandrill into this Django-based website for emails. Djrill is the recommended package for Django; it replaces the default SMTP/email backend and passes emails through to a Mandrill account.
When I try to test that this new backend is working by running this command:
send_mail('Test email', body, 'noreply@*********.com', [user.email], fail_silently=False)
It throws the following error: http://pastebin.ca/2239978
Can anybody point me to my mistake?
Update:
As @DavidRobinson mentions in a comment, you are not getting a successful response from the Mandrill API authentication call. You should double-check your API key.
If that is correct, try using curl to post {"key": <your api key>, "email": <your from email>} to MANDRILL_API_URL + "/users/verify-sender.json" and see if you get a 200.
Something like this:
curl -d key=1234567890 -d email=noreply@mydomain.com http://mandrill.whatever.com/users/verify-sender.json
Original answer:
There is also an issue in Djrill that prevents a useful error message from propagating up. That last line of the stack trace is the problem.
This is the entire open method taken from the source:
def open(self, sender):
    """
    """
    self.connection = None
    valid_sender = requests.post(
        self.api_verify, data={"key": self.api_key, "email": sender})
    if valid_sender.status_code == 200:
        data = json.loads(valid_sender.content)
        if data["is_enabled"]:
            self.connection = True
            return True
    else:
        if not self.fail_silently:
            raise
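See how it just says raise without an exception argument? A bare raise re-raises the exception currently being handled, so it only makes sense inside an except block. Here no exception is active, so in Python 2 the raise itself fails with "TypeError: exceptions must be old-style classes or derived from BaseException, not NoneType", which hides the real problem: the unsuccessful response from the API.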
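A sketch of what that check could look like with a concrete exception instead of the bare raise, written as a standalone Python 3 function so it can be tried without Djrill. MandrillAPIError and verify_sender are hypothetical names (Djrill's own error classes may differ), and post is any callable with the requests.post(url, data=...) calling convention; in real use you would pass requests.post itself.

```python
import json


class MandrillAPIError(Exception):
    """Hypothetical error type carrying the API's answer."""


def verify_sender(post, api_verify_url, api_key, sender, fail_silently=False):
    """Return True if the verify-sender call reports `sender` as enabled."""
    response = post(api_verify_url, data={"key": api_key, "email": sender})
    if response.status_code == 200 and json.loads(response.content).get("is_enabled"):
        return True
    if not fail_silently:
        # Raise a real exception that says what went wrong, instead of a
        # bare `raise` with no exception currently being handled.
        raise MandrillAPIError(
            "sender %r not verified: HTTP %s, body %r"
            % (sender, response.status_code, response.content)
        )
    return False
```

Unlike the original, this version also raises (or returns False) when the API answers 200 but is_enabled is false, rather than silently returning None; whether that is desirable is a design choice.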
An open issue in Djrill mentions a send failure and links a fork that supposedly fixes it. I suspect Djrill isn't well supported, so you might try that fork or another solution entirely.