Basic authentication error with urllib2 since python 2.7 update - python-2.7

This piece of python code worked fine an hour ago, before I ran an apt-get upgrade on my raspberry pi.
This is now my python version: Python 2.7.9 (default, Sep 17 2016, 20:26:04)
import urllib, urllib2
from PIL import Image
URL="http://server.local/picture.jpg"
headers = {'Authorization': 'Basic ' + base64.encodestring('Guess:Thepassword')}
req = urllib2.Request(URL, None, headers)
img=Image.open(urllib2.urlopen(req,timeout=1))
But now I get this error:
File "/usr/lib/python2.7/httplib.py", line 1017, in putheader
raise ValueError('Invalid header value %r' % (one_value,))
ValueError: Invalid header value 'Basic TGlvbjpSdW5SYWJiaXRSdW4=\n'
I assume something has changed, but can't figure out what..

You can't have a new line character \n at the end of your header. Instead of using base64.encodestring, use base64.b64encode.
I don't think this has anything to do with an update to Python, since this behaviour has been there since the base64 module was included back in Python 2.4 (see the bolded text):
Encode the string s, which can contain arbitrary binary data, and
return a string containing one or more lines of base64-encoded data.
encodestring() returns a string containing one or more lines of
base64-encoded data always including an extra trailing newline ('\n').

FYI - I believe the change in functionality can be traced to this security update:
https://launchpad.net/ubuntu/+source/python2.7/2.7.6-8ubuntu0.3:
SECURITY UPDATE: CRLF injection vulnerability in the
HTTPConnection.putheader
- debian/patches/CVE-2016-5699.patch: disallow newlines in
putheader() arguments when not followed by spaces or tabs in
Lib/httplib.py, add tests in Lib/test/test_httplib.py
- CVE-2016-5699

Related

Django encoding error when reading from a CSV

When I try to run:
import csv
with open('data.csv', 'rU') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
pgd = Player.objects.get_or_create(
player_name=row['Player'],
team=row['Team'],
position=row['Position']
)
Most of my data gets created in the database, except for one particular row. When my script reaches the row, I receive the error:
ProgrammingError: You must not use 8-bit bytestrings unless you use a
text_factory that can interpret 8-bit bytestrings (like text_factory = str).
It is highly recommended that you instead just switch your application to Unicode strings.`
The particular row in the CSV that causes this error is:
>>> row
{'FR\xed\x8aD\xed\x8aRIC.ST-DENIS', 'BOS', 'G'}
I've looked at the other similar Stackoverflow threads with the same or similar issues, but most aren't specific to using Sqlite with Django. Any advice?
If it matters, I'm running the script by going into the Django shell by calling python manage.py shell, and copy-pasting it in, as opposed to just calling the script from the command line.
This is the stacktrace I get:
Traceback (most recent call last):
File "<console>", line 4, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
row = self.reader.next()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 302, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte
EDIT: I decided to just manually import this entry into my database, rather than try to read it from my CSV, based on Alastair McCormack's feedback
Based on the output from your question, it looks like the person who made the CSV mojibaked it - it doesn't seem to represent FRÉDÉRIC.ST-DENIS. You can try using windows-1252 instead of utf-8 but I think you'll end up with FRíŠDíŠRIC.ST-DENIS in your database.
I suspect you're using Python 2 - open() returns str which are simply byte strings.
The error is telling you that you need to decode your text to Unicode string before use.
The simplest method is to decode each cell:
with open('data.csv', 'r') as csvfile: # 'U' means Universal line mode and is not necessary
reader = csv.DictReader(csvfile)
for row in reader:
pgd = Player.objects.get_or_create(
player_name=row['Player'].decode('utf-8),
team=row['Team'].decode('utf-8),
position=row['Position'].decode('utf-8)
)
That'll work but it's ugly add decodes everywhere and it won't work in Python 3. Python 3 improves things by opening files in text mode and returning Python 3 strings which are the equivalent of Unicode strings in Py2.
To get the same functionality in Python 2, use the io module. This gives you a open() method which has an encoding option. Annoyingly, the Python 2.x CSV module is broken with Unicode, so you need to install a backported version:
pip install backports.csv
To tidy your code and future proof it, do:
import io
from backports import csv
with io.open('data.csv', 'r', encoding='utf-8') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
# now every row is automatically decoded from UTF-8
pgd = Player.objects.get_or_create(
player_name=row['Player'],
team=row['Team'],
position=row['Position']
)
Encode Player name in utf-8 using .encode('utf-8') in player name
import csv
with open('data.csv', 'rU') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
pgd = Player.objects.get_or_create(
player_name=row['Player'].encode('utf-8'),
team=row['Team'],
position=row['Position']
)
In Django, decode with latin-1, csv.DictReader(io.StringIO(csv_file.read().decode('latin-1'))), it would devour all special characters and all comma exceptions you get in utf-8.

python3 convert str to bytes-like obj without use encode

I wrote a httpserver to serve html files for python2.7 and python3.5.
def do_GET(self):
...
#if resoure is api
data = json.dumps({'message':['thanks for your answer']})
#if resource is file name
with open(resource, 'rb') as f:
data = f.read()
self.send_response(response)
self.send_header('Access-Control-Allow-Origin', '*')
self.end_headers()
self.wfile.write(data) # this line raise TypeError: a bytes-like object is required, not 'str'
the code works in python2.7, but in python 3, it raised the above the error.
I could use bytearray(data, 'utf-8') to convert str to bytes, but the html is changed in web.
My question:
How to do to support python2 and python3 without use 2to3 tools and without change the file's encoding.
is there a better way to read a file and sent it content to client with the same way in python2 and python3 ?
thanks in advance.
You just have to open your file in binary mode, not in text mode:
with open(resource,"rb") as f:
data = f.read()
then, data is a bytes object in python 3, and a str in python 2, and it works for both versions.
As a positive side-effect, when this code hits a Windows box, it still works (else binary files like images are corrupt because of the endline termination conversion when opened in text mode).

Function assertIn causes the UnicodeDecodeError

During the test for my Django 1.9 project I get an error:
Python swears on this code:
def test_students_list(self):
# make request to the server to get homepage page
response = self.client.get(self.url)
# do we have student name on a page?
self.assertIn('Vitaliy', response.content)
How to set the same encoding for the arguments in function assertIn?
I tried so:
self.assertIn(u"Vitaliy", response.content.decode('utf8'))
The result is the same...
P.S. I have Python 2.7.6 on Ubuntu 14.04
Did you try to define your Python source code encoding using:
# -- coding: utf-8 --
As suggest in PEP 0263.
See https://docs.djangoproject.com/en/1.9/ref/request-response/#django.http.HttpResponse.content
The HTTPResponse.content is noted in the docs as being a bytestring (https://github.com/django/django/blob/master/django/http/response.py#L225), it should be encoded as DEFAULT_CHARSET which by default is utf-8 but in both our cases this doesn't seem to get through to the test.
My solution is to tell Python request.content should have unicode encoding:
def test_students_list(self):
# make request to the server to get homepage page
response = self.client.get(self.url)
# do we have student name on a page?
self.assertIn('Vitaliy', unicode(response.content, encoding='utf-8'))
Use self.assertContains(response, 'Vitaliy') instead.

Gzip and Encode file in Python 2.7

with gzip.open(sys.argv[5] + ".json.gz", mode="w", encoding="utf-8") as outfile:
It throws:
TypeError: open() got an unexpected keyword argument 'encoding'
But the docs says it exists
https://docs.python.org/3/library/gzip.html
Update
How can i encode and zip the file in Python 2.7?
I tried now this:
(but it don't work)
with gzip.open(sys.argv[5] + ".json.gz", mode="w") as outfile:
outfile = io.TextIOWrapper(outfile, encoding="utf-8")
json.dump(fdata, outfile, indent=2, ensure_ascii=False)
TypeError: must be unicode, not str
What can i do?
Those are the Python 3 docs. The Python 2 version of gzip does not allow encoding= as a keyword argument to gzip.open().
Seems the question has been answered sufficiently, but for your peace of mind: Alternatively to ensure that Python2 uses utf-8 as standard perhaps try the following, as it then becomes unnecessary to specify an encoding:
import sys
reload(sys)
sys.setdefaultencoding('UTF8')

"UnicodeEncodeError: 'ascii' codec can't encode character"

I'm trying to pass big strings of random html through regular expressions and my Python 2.6 script is choking on this:
UnicodeEncodeError: 'ascii' codec can't encode character
I traced it back to a trademark superscript on the end of this word: Protection™ -- and I expect to encounter others like it in the future.
Is there a module to process non-ascii characters? or, what is the best way to handle/escape non-ascii stuff in python?
Thanks!
Full error:
E
======================================================================
ERROR: test_untitled (__main__.Untitled)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Python26\Test2.py", line 26, in test_untitled
ofile.write(Whois + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 1005: ordinal not in range(128)
Full Script:
from selenium import selenium
import unittest, time, re, csv, logging
class Untitled(unittest.TestCase):
def setUp(self):
self.verificationErrors = []
self.selenium = selenium("localhost", 4444, "*firefox", "http://www.BaseDomain.com/")
self.selenium.start()
self.selenium.set_timeout("90000")
def test_untitled(self):
sel = self.selenium
spamReader = csv.reader(open('SubDomainList.csv', 'rb'))
for row in spamReader:
sel.open(row[0])
time.sleep(10)
Test = sel.get_text("//html/body/div/table/tbody/tr/td/form/div/table/tbody/tr[7]/td")
Test = Test.replace(",","")
Test = Test.replace("\n", "")
ofile = open('TestOut.csv', 'ab')
ofile.write(Test + '\n')
ofile.close()
def tearDown(self):
self.selenium.stop()
self.assertEqual([], self.verificationErrors)
if __name__ == "__main__":
unittest.main()
You're trying to convert unicode to ascii in "strict" mode:
>>> help(str.encode)
Help on method_descriptor:
encode(...)
S.encode([encoding[,errors]]) -> object
Encodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that is able to handle UnicodeEncodeErrors.
You probably want something like one of the following:
s = u'Protection™'
print s.encode('ascii', 'ignore') # removes the ™
print s.encode('ascii', 'replace') # replaces with ?
print s.encode('ascii','xmlcharrefreplace') # turn into xml entities
print s.encode('ascii', 'strict') # throw UnicodeEncodeErrors
You're trying to pass a bytestring to something, but it's impossible (from the scarcity of info you provide) to tell what you're trying to pass it to. You start with a Unicode string that cannot be encoded as ASCII (the default codec), so, you'll have to encode by some different codec (or transliterate it, as #R.Pate suggests) -- but it's impossible for use to say what codec you should use, because we don't know what you're passing the bytestring and therefore don't know what that unknown subsystem is going to be able to accept and process correctly in terms of codecs.
In such total darkness as you leave us in, utf-8 is a reasonable blind guess (since it's a codec that can represent any Unicode string exactly as a bytestring, and it's the standard codec for many purposes, such as XML) -- but it can't be any more than a blind guess, until and unless you're going to tell us more about what you're trying to pass that bytestring to, and for what purposes.
Passing thestring.encode('utf-8') rather than bare thestring will definitely avoid the particular error you're seeing right now, but it may result in peculiar displays (or whatever it is you're trying to do with that bytestring!) unless the recipient is ready, willing and able to accept utf-8 encoding (and how could WE know, having absolutely zero idea about what the recipient could possibly be?!-)
The "best" way always depends on your requirements; so, what are yours? Is ignoring non-ASCII appropriate? Should you replace ™ with "(tm)"? (Which looks fancy for this example, but quickly breaks down for other codepoints—but it may be just what you want.) Could the exception be exactly what you need; now you just need to handle it in some way?
Only you can really answer this question.
First of all, try installing translations for English language (or any other if needed):
sudo apt-get install language-pack-en
which provides translation data updates for all supported packages (including Python).
And make sure you use the right encoding in your code.
For example:
open(foo, encoding='utf-8')
Then double check your system configuration like value of LANG or configuration of locale (/etc/default/locale) and don't forget to re-login your session.