Python 2 and 3 differences in hashing a uuid - python-2.7

I have some code that works in Python 2:
import uuid
import hashlib
playername = "OfflinePlayer:%s" % name # name is just a text only username
m = hashlib.md5()
m.update(playername)
d = bytearray(m.digest())
d[6] &= 0x0f
d[6] |= 0x30
d[8] &= 0x3f
d[8] |= 0x80
print(uuid.UUID(bytes=str(d)))
However, when the code is run in Python 3, it raises "TypeError: Unicode-objects must be encoded before hashing" when m.update() is called. I tried to encode it first with the default UTF-8 encoding:
m.update(playername.encode())
but now this line -
print(uuid.UUID(bytes=str(d)))
produces this error:
File "/usr/lib/python3.5/uuid.py", line 149, in __init__
raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
I then tried to decode it back, but the bitwise operations have evidently ruined it (I am guessing?):
print(uuid.UUID(bytes=(d.decode())))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 2: invalid start byte
I don't honestly know what the purpose of the bitwise operations is in the "big-picture". The code snippet in general is supposed to produce the same expected UUID every time based on the spelling of the username.
I just want this code to do the same job in Python3 that it did in Python 2 :(
Thanks in advance.

A couple of things:
m.update(playername.encode('utf-8'))
Should properly encode your string.
print(uuid.UUID(bytes=bytes(d)))
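Should return the UUID properly. In Python 3, str(d) on a bytearray returns its repr ("bytearray(b'...')") rather than the 16 raw bytes, which is why uuid.UUID rejected it.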
Example:
https://repl.it/EyuG/0
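Putting both fixes together, a minimal Python 3 sketch might look like the following. The function name offline_player_uuid and the sample username are made up for illustration. As for the bitwise operations the question asks about: they overwrite the UUID version field (high nibble of byte 6, set to 3 for a name-based MD5 UUID) and the variant field (top two bits of byte 8, set to the RFC 4122 pattern), so the 16 digest bytes form a well-formed UUID.

```python
import hashlib
import uuid


def offline_player_uuid(name):
    """Derive a deterministic UUID from a username (hypothetical helper name)."""
    playername = "OfflinePlayer:%s" % name
    # hashlib needs bytes in Python 3, so encode the text explicitly.
    digest = bytearray(hashlib.md5(playername.encode("utf-8")).digest())
    digest[6] = (digest[6] & 0x0F) | 0x30  # version field = 3 (name-based, MD5)
    digest[8] = (digest[8] & 0x3F) | 0x80  # variant field = RFC 4122
    # bytes(bytearray) yields the 16 raw bytes; str(bytearray) would yield its repr.
    return uuid.UUID(bytes=bytes(digest))


print(offline_player_uuid("Steve"))  # "Steve" is just a sample username
```

The same username always yields the same UUID, because MD5 is deterministic and the bit masks are fixed.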

Related

UnicodeDecodeError: 'utf8' codec can't decode byte 0xaf in position 3: invalid start byte in python 2.7

using windows10 python 2.7
my code for decryption
def decrypt(self, enc):
    enc = b64decode(enc)
    iv = enc[:16]
    cipher = AES.new(self.key, AES.MODE_CBC, iv)
    print cipher,"======"
    dec = cipher.decrypt(enc[16:])
    #print dec,"========",dec
    unp = unpad(dec)
    print unp,"=========","=fdkjfsdklfsdjndjdjk"
    decode = unp.decode('utf8')
    #decode = unp.decode('utf8')
    print decode
    # unpad(cipher.decrypt(enc[16:])).decode('utf8')
    return decode
While decrypting the encrypted response, the line cipher.decrypt(enc[16:]) gives me the output below, but it should actually be XML.
)^»3(Fm╠¡Oå┤╖¢iOÑ>s▌B¿▌╥≥┐Éj6╬░¢√(å¥ 2?J≤ôGOL═\¥°t╬╚ΓÜ▐╝Φ÷═AQw≥[&nΣ±ƒ∩(╩ûGN~[3bgrHPÜ4%╖H⌡▄wÅ|■Çq≥½÷σHñxìdºwë±!│▐íWÇÿΘ╦σ╖è#X▓┤2ÿ ┘╟ƒΣ°Y░çNßæÅαb3f«─O(Wo9┐A╕t£╧{K [X┴┬ÜHΘ⌠X4┬Æ≡~╠h3ε┘σmÉfú.Fú╜₧c!_╒▐wα²A/╒|─sY%=⌐▒Yö╕[╞ε░::tA┴₧µ≤²∙C─A█₧╕╧τ╙x≤rƒú░uú█å┬-╤`╡f╕^∞tΦ½q╗&╪─╘¥&┐Σ₧▌(╙┌JüñÇäQ¥/*ó▐H!C┬+δà\Bah╘áÆXu╥C█│¼)ë╩╓*E(÷·├à√¿╨╧1Θ·0≈º²║Ås┬xOò}a╪╔╫HÜq┬gqÅÖ⌐4~v╖·9╥Ü$wçZ▌╗┬? /Zj12^}&t$F=SBKhöåε è╝o╪█º8fìîé╫=«·gO:Z╢≡2╔K«Θ uè/╩ {⌐Åwwε^α┼µk4┘Ñ╧:ƒ16║╞ⁿB°¢üdó?eB┼P┌L_90]\5W╥µA⌐
#Mq╤ìⁿ²ç≥Θ·▓F₧▀) ç#ë╒╖às2╡}πL╕╨60ä┌ù6▒.rn╔jⁿR¢∙µIëÉ╝µè}c≈σß_αäcª/╤"lK*└qX2H öφq#â½æΘjÄ% é6#üY█▓aFßα█÷I║n+⌡▄Ä!jTÄ√∩yr¥d"╛¬z√ⁿµº½êYⁿπ¬2[╕¿≡ÿ │Uv?{τæτ°QÜĵ╨íkUFπ╚BπÆ! Hiåƒ╒£αì≥Æεtr█[╤àÆ█oíΩ("┤╞åMÜò╝D3╬¿VτΩr▓ÜÆÿ$┌)⌠≡\~╩▀Rr≡y£₧≤║L>╙ ╘µv9ÿæ├#B≡µ£╕Ew╗yÿtXeY.αÑsú Y±£∩=yy¥óüΣÆF╧╦á─} Oƒ≥-9[≤¢fúΣe3&Öÿ░ìç·ntÄO
l∙m¥\╞&KêëR»s╔E2╨ª│OV≥░m═╬2┬₧ú(ûöz¢¼╣\≤5nqò+╝±Äm{Gσ╝ROφNµàg╛RV╨;Lδa ,é/ⁿY╜|┤ñ╔÷πvⁿ╞W▓π}Rå#h$*πAò¼2╝CÅk*l"h╕≥aÆhæt)9▐░╝.]B}-╢└∩Iσw┬╚D&5≡▒²`WJ╔╫⌡K1∩ fú~A▌c▄mÑ┴?ôQ╩ƒⁿ|╨{ç▒·ΘB╡Φτ▌⌠─╘q?nⁿC/v>σ°┬#'L┌ 0Kè£
╩[Érekx«wë,\¥─K\a╡·┐PDIF╩l╤YH╞F$c6≈G¡Üc^r=pbiµΦ┘±ÿ▓zΦ¿0░ì┐á7┌o■«-ⁿ#,
Decoding, i.e. the line unp.decode('utf8'), gives me the following error:
Traceback (most recent call last):
File "nic_dycrypt_encrypt.py", line 99, in
print('Ciphertext:', AESCipher(key).decrypt(ciphertext))
File "nic_dycrypt_encrypt.py", line 86, in decrypt
decode = unp.decode('utf8').strip()
File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xaf in position 3: invalid start byte
Can anyone help me understand what that format is, why the error occurs, and how to resolve it?
Simply put, not all bytes or byte sequences map to Unicode characters. In fact, most arbitrary byte sequences are not valid UTF-8.
The common solution is to convert the binary data to an encoding that can represent all byte values; the most common are Base64 and hexadecimal.
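As a rough illustration of that suggestion, the sketch below uses a few made-up bytes (not the ciphertext from the question) to show that a UTF-8 decode fails while Base64 and hexadecimal can represent any byte values and round-trip cleanly.

```python
import base64
import binascii

# Made-up sample bytes, not real ciphertext.
raw = bytes(bytearray([0x29, 0x5E, 0xBB, 0xAF, 0xFF]))

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xBB cannot start a UTF-8 sequence, so decoding fails.
    utf8_ok = False

# Both encodings produce plain ASCII text for any input bytes.
as_base64 = base64.b64encode(raw).decode("ascii")
as_hex = binascii.hexlify(raw).decode("ascii")

print(utf8_ok)
print(as_base64)
print(as_hex)
```

This only makes binary data printable or transportable; it does not turn it into the expected XML.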

Django encoding error when reading from a CSV

When I try to run:
import csv
with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )
Most of my data gets created in the database, except for one particular row. When my script reaches the row, I receive the error:
ProgrammingError: You must not use 8-bit bytestrings unless you use a
text_factory that can interpret 8-bit bytestrings (like text_factory = str).
It is highly recommended that you instead just switch your application to Unicode strings.
The particular row in the CSV that causes this error is:
>>> row
{'FR\xed\x8aD\xed\x8aRIC.ST-DENIS', 'BOS', 'G'}
I've looked at other Stack Overflow threads with the same or similar issues, but most aren't specific to using SQLite with Django. Any advice?
If it matters, I'm running the script by going into the Django shell by calling python manage.py shell, and copy-pasting it in, as opposed to just calling the script from the command line.
This is the stacktrace I get:
Traceback (most recent call last):
File "<console>", line 4, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
row = self.reader.next()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 302, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte
EDIT: I decided to just manually import this entry into my database, rather than try to read it from my CSV, based on Alastair McCormack's feedback
Based on the output from your question, it looks like the person who made the CSV mojibaked it - it doesn't seem to represent FRÉDÉRIC.ST-DENIS. You can try using windows-1252 instead of utf-8 but I think you'll end up with FRíŠDíŠRIC.ST-DENIS in your database.
I suspect you're using Python 2 - open() returns str which are simply byte strings.
The error is telling you that you need to decode your text to Unicode string before use.
The simplest method is to decode each cell:
with open('data.csv', 'r') as csvfile:  # 'U' means universal newline mode and is not necessary
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].decode('utf-8'),
            team=row['Team'].decode('utf-8'),
            position=row['Position'].decode('utf-8')
        )
That'll work, but it's ugly to add decodes everywhere, and it won't work in Python 3. Python 3 improves things by opening files in text mode and returning Python 3 strings, which are the equivalent of Unicode strings in Python 2.
To get the same functionality in Python 2, use the io module. This gives you an open() function that has an encoding option. Annoyingly, the Python 2.x csv module is broken with Unicode, so you need to install a backported version:
pip install backports.csv
To tidy your code and future proof it, do:
import io
from backports import csv
with io.open('data.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # now every row is automatically decoded from UTF-8
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )
Encode the player name as UTF-8 by calling .encode('utf-8') on it:
import csv
with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].encode('utf-8'),
            team=row['Team'],
            position=row['Position']
        )
In Django, decode with latin-1: csv.DictReader(io.StringIO(csv_file.read().decode('latin-1'))). Because latin-1 maps every byte value to a character, it accepts every special character and avoids the decode exceptions you get with utf-8.
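A small Python 3 sketch of the latin-1 approach, using made-up in-memory CSV bytes in place of the uploaded file (the name raw and the sample row are assumptions for illustration). Note that latin-1 never raises, but it only yields the right characters if the file really was latin-1; a file that was already mangled, as suspected above, will still come out wrong.

```python
import csv
import io

# Made-up sample: a latin-1 encoded CSV, which is not valid UTF-8.
raw = b"Player,Team,Position\nFR\xc9D\xc9RIC.ST-DENIS,BOS,G\n"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin-1 assigns a character to each of the 256 byte values, so this cannot raise.
rows = list(csv.DictReader(io.StringIO(raw.decode("latin-1"))))

print(utf8_ok)
print(rows[0]["Player"])
```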

Python 2.7 : UnicodeDecodeError when I use character point with import socket connect

I work with Python under Windows. I get the error "UnicodeDecodeError: 'utf8' codec can't decode byte 0x92" when I execute this simple code:
import socket
s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((controlAddr, 9051))
controlAddr is "127.0.0.1" and I know that it is the character '.' which causes the problem, so I tried different conversions, but each time I get the same error. I tried these different ways:
controlAddr = u'127.0.0.1'
controlAddr = unicode('127.0.0.1')
controlAddr.encode('utf-8')
controlAddr = u'127'+unichr(ord('\x2e'))+u'0'+unichr(ord('\x2e'))+'0'+unichr(ord('\x2e'))+u'1'
I added # -*- coding: utf-8 -*- at the beginning of the main file and the socket.py file.
... I still have the same error
Your error says "'utf8' codec can't decode byte 0x92". In Windows code page 1252, this byte maps to U+2019, the right single quotation mark ’.
It is likely that the editor you use for your Python script is configured to replace the single quote ('\x27' or ') by the right quotation mark. It may be nicer for text, but is terrible in source code. You must fix it in your editor, or use another editor.
The error message says you have a byte 0x92 in your file somewhere, which is not a valid start byte in utf-8, but is valid in other encodings, for example:
>>> b'\x92'.decode('windows-1252')
'’'
That means that your file encoding is not utf-8, but probably windows-1252, and the problematic character is the right single quotation mark (’), not the dot, even if that character is found only in a comment.
So either change your file encoding to utf-8 in your editor, or the encoding line to
# -*- coding: windows-1252 -*-
The error message doesn't mention the file the interpreter choked on, but it may be your "main" file, not socket.py.
Also, don't name your file socket.py; that will shadow the standard-library socket module and lead to further errors.
Setting an encoding line only affects that one file, so you need to do this for every file. Setting it only in your "main" file would not be enough.
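The claim about byte 0x92 can be checked directly. This is a minimal sketch that decodes the single byte with both codecs; the variable names are arbitrary.

```python
byte = b"\x92"
reason = None

try:
    byte.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    reason = exc.reason  # the short explanation carried by the exception

# windows-1252 does define this byte: it is U+2019.
as_cp1252 = byte.decode("windows-1252")

print(utf8_ok, reason)
print(hex(ord(as_cp1252)))
```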
Thank you! Indeed, this character doesn't exist in utf-8.
However, I didn't send the character "’", which corresponds to 0x92 in windows-1252 and to nothing in utf-8. Furthermore, this error appears when a character "." is in controlAddr, and it has the same hexadecimal code in both encodings, i.e. 0x2e.
The complete error message is given below:
Traceback (most recent call last):
File "C:\Python27\Lib\site-packages\spyderlib\widgets\externalshell\pythonshell.py", line 566, in write_error
self.shell.write_error(self.get_stderr())
File "C:\Python27\Lib\site-packages\spyderlib\widgets\externalshell\baseshell.py", line 272, in get_stderr
return self.transcode(qba)
File "C:\Python27\Lib\site-packages\spyderlib\widgets\externalshell\baseshell.py", line 258, in transcode
return to_text_string(qba.data(), 'utf8')
File "C:\Python27\Lib\site-packages\spyderlib\py3compat.py", line 134, in to_text_string
return unicode(obj, encoding)
File "C:\Python27\Lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 736: invalid start byte
For this code :
controlPort = 9051
controlAddr = unicode("127.0.0.1")
import socket
s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((controlAddr, controlPort))

Casting in Jython 2.5.3

There is a Python function that runs in CPython 2.5.3 but crashes in Jython 2.5.3.
It is part of a user-defined function in Apache Pig, which uses Jython 2.5.3, so I cannot change the interpreter.
The input is an array of signed bytes, but they are in fact unsigned bytes, so I need to cast them.
from StringIO import StringIO
import array
import ctypes
assert isinstance(input, array.array), 'unexpected input parameter'
assert input.typecode == 'b', 'unexpected input type'
buffer = StringIO()
for byte in input:
    s_byte = ctypes.c_byte(byte)
    s_byte_p = ctypes.pointer(s_byte)
    u_byte = ctypes.cast(s_byte_p, ctypes.POINTER(ctypes.c_ubyte)).contents.value
    buffer.write(chr(u_byte))
buffer.seek(0)
output = buffer.getvalue()
assert isinstance(output, str)
The error is:
s_byte = ctypes.cast(u_byte_p, ctypes.POINTER(ctypes.c_byte)).contents.value
AttributeError: 'module' object has no attribute 'cast'
I guess the ctypes.cast function is not implemented in Jython 2.5.3. Is there a workaround for that issue?
Thanks,
Steffen
Here is my solution, which is quite ugly but works without additional dependencies.
It uses the bit representation of unsigned and signed bytes (two's complement, https://de.wikipedia.org/wiki/Zweierkomplement).
import array
assert isinstance(input, array.array), 'unexpected input parameter'
assert input.typecode == 'b', 'unexpected input type'
output = array.array('b', [])
for byte in input:
    if byte > 127:
        byte = byte & 127
        byte = -128 + byte
    output.append(byte)
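The loop above maps unsigned values to signed ones. For the direction described in the question (a signed 'b' array whose bytes should be read as unsigned), one possible sketch that also relies on two's complement and needs neither ctypes nor bytearray is to mask each value with 0xFF. The sample values and variable names below are made up; under Python 2 or Jython the joined result is a byte string, while under Python 3 it is a text string with code points 0 to 255.

```python
import array

# Made-up sample input with typecode 'b' (signed bytes).
signed = array.array('b', [-1, 0, 127, -128, -60])

# Masking with 0xFF reinterprets a two's-complement signed byte as unsigned.
unsigned = [b & 0xFF for b in signed]

# Build the output string one character per byte value.
output = ''.join([chr(u) for u in unsigned])

print(unsigned)
```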

Python: POSTing binary data gives UnicodeDecodeError or Ascii decode error

When POSTing binary data using urllib2, urllib3, or httplib2, I receive the error UnicodeDecodeError: 'utf8' codec can't decode... or UnicodeDecodeError: 'ascii' codec can't decode..., depending on whether the Python script is in Unicode or ASCII mode.
I first thought that the library was the issue, so I tried different libraries but that didn't solve the problem.
End of the stack trace:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 895, in _send_output
msg += message_body
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 627: invalid continuation byte
The problem, as noted in the comments on Python bug 11898, is that the url string had become tagged at some point as either a Unicode or ASCII string.
Then, when the httplib library is creating the byte string for the entire HTTP/S message, and the line
msg += message_body
is executed, Python tries to convert message_body (which contains binary data) to either ASCII or Unicode. In either case, the conversion fails.
Solution
Use str() when making any modifications to the url. In my case:
url = baseUrl + "/envelopes" # throws UnicodeDecodeError
url = str(baseUrl + "/envelopes") # works great
If that is not enough, check your other strings to ensure that they haven't been tagged as Unicode or ASCII.
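The implicit conversion cannot be reproduced as-is in Python 3, which refuses to mix text and bytes, but its effect can be approximated by making the decode explicit. The sketch below is only an illustration under that assumption: the URL and body are made up, and "ascii" stands in for whatever default codec the Python 2 process uses (the stack trace above shows utf8).

```python
url = u"https://example.invalid/envelopes"      # hypothetical URL, held as text
message_body = b"\xc4\x01\x02 binary payload"   # made-up binary body

# Python 2 implicitly attempts the equivalent of this decode when a
# unicode string is concatenated with a byte string.
try:
    message_body.decode("ascii")
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False

# Keeping both sides as byte strings avoids decoding the body at all.
msg = url.encode("ascii") + b"\r\n" + message_body

print(implicit_ok, len(msg))
```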