Python 2 str.decode('hex') in Python 3? - python-2.7

I want to send hex encoded data to another client via sockets in python. I managed to do everything some time ago in python 2. Now I want to port it to python 3.
Data looks like this:
""" 16 03 02 """
Then I used this function to get it into a string:
x.replace(' ', '').replace('\n', '').decode('hex')
It then looks like this (which is a type str by the way):
'\x16\x03\x02'
Now I managed to find this in python 3:
codecs.decode('160302', 'hex')
but it returns another type:
b'\x16\x03\x02'
And since what I encode is not proper text, I cannot use UTF-8 or similar codecs, as the data contains invalid bytes (e.g. \x00, \xFF). Any ideas how I can get the escaped string again, just like in Python 2?
Thanks

'str' objects in python 3 are not sequences of bytes but sequences of unicode code points.
If by "send data" you mean calling send then bytes is the right type to use.
If you really want the string (not 3 bytes but 12 unicode code points):
>>> import codecs
>>> s = str(codecs.decode('16ff00', 'hex'))[2:-1]
>>> s
'\\x16\\xff\\x00'
>>> print(s)
\x16\xff\x00
Note that you need to double backslashes in order to represent them in code.
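If the goal is just to call send(), a minimal Python 3 sketch (using the hex string from the question):

```python
# Minimal Python 3 sketch: hex text -> raw bytes for socket.send().
# bytes.fromhex() ignores spaces, so the replace() chain is not needed.
hex_data = """ 16 03 02 """
payload = bytes.fromhex(hex_data)
print(payload)  # b'\x16\x03\x02'
```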

There is a standard solution for both Python 2 and Python 3. No imports needed:
hex_string = """ 16 03 02 """
some_bytes = bytearray.fromhex(hex_string)
In Python 3 you can treat it much like a str (slice it, iterate over it, etc.), and you can concatenate it with byte strings: b'\x00', b'text' or bytes('text', 'utf8').
You also mentioned something about encoding to UTF-8. To turn the bytes back into a str, decode:
some_bytes.decode()
As you can see, you don't need to clean the input first; fromhex() ignores the spaces. If you want to get back to a hexadecimal string, some_bytes.hex() (Python 3.5+) will do it for you.
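For illustration, a quick round trip in Python 3 (.hex() needs 3.5+):

```python
hex_string = """ 16 03 02 """
some_bytes = bytearray.fromhex(hex_string)
print(some_bytes)        # bytearray(b'\x16\x03\x02')
print(some_bytes.hex())  # 160302
```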

a = """ 16 03 02 """.encode("utf-8")
#Send things over socket
print(a.decode("utf-8"))
Why not encode with UTF-8, send it over the socket, and decode with UTF-8 on the other end?

Related

How do I access binary data via python registry?

The data in the registry key looks like:
Name Type Value
Data REG_BINARY 60 D0 DB 9E 2D 47 Cf 01
The data represent 8 bytes (a QWORD, little endian) of a FILETIME value. Why they chose REG_BINARY rather than REG_QWORD is anyone's guess.
In the Python 2.7 code I can see the data value has been located, and a value object contains the key information, such as:
print "***", value64.name(), value64.value_type(), value64.value
*** Data 3 <bound method RegistryValue.value of <Registry.Registry.RegistryValue object at 0x7f2d500b3990>>
The name 'Data' is correct and the value_type of 3 means REG_BINARY so that is correct.
The documentation to the python.registry (assuming I have the right doc) is
https://github.com/williballenthin/python-registry/blob/master/documentation/registry.html
However, I can't figure out what methods/functions have been provided to process binary data.
Because I know this binary data will always be 8 bytes, I'm tempted to cast the object pointer to a QWORD (double) pointer and read the value directly, but I'm not sure whether the object points to the data, or how I would do this in Python anyway.
Any pointers appreciated.
I figured out that the type of value64.value() was a 'str', so I used simple character indexing to reference each of the 8 bytes and converted the value to an integer.
def bin_to_longlong(binval):
    return ord(binval[7])*(2**56) + ord(binval[6])*(2**48) + ord(binval[5])*(2**40) + ord(binval[4])*(2**32) + \
           ord(binval[3])*(2**24) + ord(binval[2])*(2**16) + ord(binval[1])*(2**8) + ord(binval[0])
Code by me.
which can be tidied up by using struct.unpack like so:
    return struct.unpack('<Q', binval)[0]  # '<Q' = little-endian unsigned long long
And converted the integer (FILETIME value) to a date:
EPOCH_AS_FILETIME = 116444736000000000 # January 1, 1970 as MS file time
HUNDREDS_OF_NANOSECONDS = 10000000
def filetime_to_dt(ft):
    return datetime.fromtimestamp((ft - EPOCH_AS_FILETIME) / HUNDREDS_OF_NANOSECONDS)
Code from : https://gist.github.com/Mostafa-Hamdy-Elgiar/9714475f1b3bc224ea063af81566d873
Like so :
value64date = filetime_to_dt(bin_to_longlong(value64.value()))
Now hopefully someone can show me how to do that elegantly in python!
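Putting the pieces together, a tidier sketch using only the standard library (utcfromtimestamp is used here so the result does not depend on the local time zone, unlike fromtimestamp above):

```python
import struct
from datetime import datetime

EPOCH_AS_FILETIME = 116444736000000000  # January 1, 1970 as MS FILETIME
HUNDREDS_OF_NANOSECONDS = 10000000

def bin_to_longlong(binval):
    # '<Q' = little-endian unsigned 64-bit integer
    return struct.unpack('<Q', binval)[0]

def filetime_to_dt(ft):
    return datetime.utcfromtimestamp((ft - EPOCH_AS_FILETIME) / HUNDREDS_OF_NANOSECONDS)

# Example: the FILETIME of the Unix epoch decodes to 1970-01-01.
raw = struct.pack('<Q', EPOCH_AS_FILETIME)
print(filetime_to_dt(bin_to_longlong(raw)))  # 1970-01-01 00:00:00
```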

Logging fails with UnicodeEncodeError when attempting to use .format(myVar) where myVar contains unicode characters

I am using the str.format() function in my logging:
logging.debug('querying author: {}, track: {}'.format(artist, track))
When the artist variable contains unicode characters such as this: u'Ry Cooder & Ali Farka Tour\xe9' format fails as follows:
artists = {u'A Tribe Called Quest': [u"People's Instinctive Travels and the Paths of Rhythm"],
           u'All': [u'Percolater', u'Pummel'],
           u'Andrew Bird': [u'The Mysterious Production of Eggs',
                            u'Noble Beast',
                            u'Break It Yourself',
                            u'Weather Systems',
                            u'Hands of Glory'],
           u'April Smith And The Great Picture Show': [u'Songs For A Sinking Ship'],
           u'Ry Cooder & Ali Farka Tour\xe9': [u'Talking Timbuktu']}
for each in artists:
    print 'this is the string: {}'.format(each)
>>> this is the string: A Tribe Called Quest
>>> ---------------------------------------------------------------------------
>>> UnicodeEncodeError Traceback (most recent call last)
>>> <ipython-input-28-4770333e9fbf> in <module>()
>>> 1 for each in artists:
>>> ----> 2 print 'this is the string: {}'.format(each)
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 26: ordinal not in range(128)
What is the proper way to deal with this for all logging instances? I know that I can use str.encode('ascii', 'ignore') to dump the unicode characters and side-step this issue as such:
for each in artists:
    print 'this is the string: {}'.format(each.encode('ascii', 'ignore'))
>>> this is the string: A Tribe Called Quest
>>> this is the string: Ry Cooder & Ali Farka Tour
>>> this is the string: Andrew Bird
>>> this is the string: All
>>> this is the string: April Smith And The Great Picture Show
The above solution would mean hunting down every logging instance that might encounter unicode characters and adding str.encode() and that doesn't feel very "pythonic."
EDIT 10 Jan 2019
This is especially problematic when another module's logging attempts to deal with this data. Other than make sure that the unicode characters never make it out of my controlled environs, is there another solution?
end EDIT
Is there a more elegant and appropriate way to deal with this? What is the appropriate way to handle unicode when using the str.format() function?
For completeness:
The artist variable is always forced to unicode using the following code as the API I am interacting with requires UTF-8.
def _forceUnicode(self, text):
    '''
    force text into unicode
    https://gist.github.com/gornostal/1f123aaf838506038710
    '''
    return text if isinstance(text, unicode) else text.encode('utf-8')
TL;DR: text.encode('utf-8') should be text.decode('utf-8'), and the str.format() strings should be unicode literals, as in u'some text: {}'.format(myVar).
Long Answer:
I fundamentally did not understand the difference between 'utf-8' and 'unicode.' I can't say that I fully understand it now, but I realize now that my attempt to force all my text to unicode was actually forcing it to utf-8 and str.format() was choking on the utf-8 that I was feeding it.
The function above should read:
def _forceUnicode(text):
    '''
    force text into unicode
    https://gist.github.com/gornostal/1f123aaf838506038710
    '''
    return text if isinstance(text, unicode) else text.decode('utf-8')
That forces everything into unicode, and str.format() behaves as expected, but it requires a unicode format string, i.e. print u'{}'.format():
for each in artists:
    print u'this is the string: {}'.format(each)
>>> this is the string: A Tribe Called Quest
>>> this is the string: Ry Cooder & Ali Farka Touré
>>> this is the string: Andrew Bird
>>> this is the string: All
>>> this is the string: April Smith And The Great Picture Show
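The underlying distinction is between decoded text and encoded bytes; in Python 3, where the split is explicit, it looks like this (a sketch using the artist name from the question):

```python
name = 'Ry Cooder & Ali Farka Tour\xe9'  # text (code points), ends in 'é'
encoded = name.encode('utf-8')           # bytes; 'é' becomes b'\xc3\xa9'
assert encoded.decode('utf-8') == name   # decoding restores the text
print('this is the string: {}'.format(name))
```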

'ascii' codec can't decode byte 0xdb in position 942: ordinal not in range(128) SQLAlchemy (Django)

I use an SQLAlchemy query with UTF-8 encoding. When I run the query in mysqldb I get output, but when I run the code from Python I get this error:
'ascii' codec can't decode byte 0xdb in position 942: ordinal not in range(128)
query :
query = """SELECT * FROM (SELECT p.ID AS 'persons_ID', p.FirstName AS 'persons_FirstName', p.LastName AS 'persons_LastName',p.NationalCode AS 'persons_NationalCode', p.CityID AS 'persons_CityID', p.Mobile AS 'persons_Mobile',p.Address AS 'persons_Address', cities_1.ID AS 'cities_1_ID', cities_1.Name AS 'cities_1_Name',cities_1.ParentID AS 'cities_1_ParentID', cities_2.ID AS 'cities_2_ID', cities_2.Name AS 'cities_2_Name',cities_2.ParentID AS 'cities_2_ParentID' , cast(@row := @row + 1 as unsigned) as 'persons_row_number' FROM Persons p LEFT OUTER JOIN cities AS cities_2 ON cities_2.ID = p.CityID LEFT OUTER JOIN cities AS cities_1 ON cities_1.ID = cities_2.ParentID , (select @row := 0) as init WHERE 1=1 AND p.FirstName LIKE N'{}%'""".format('رامین')
Connector charset for MySQL:
e = create_engine("mysql+pymysql://@localhost/test?charset=utf8")
Do you have an idea how to resolve this?
Thanks,
Python 2 uses bytestrings (ASCII strings) by default, which support only Latin characters. Python 3 uses Unicode strings by default.
As I can see, you use some Arabic script in your query, so you probably get some in the response as well. The error says that Python can't decode the Arabic characters to ASCII. To handle Arabic (or any other non-Latin) characters you have to use unicode in Python. Note: this has nothing to do with the charset setting you provide, which affects only the database.
So your options are:
Switch to Python 3.
Stay as you are, but add from __future__ import unicode_literals at the start of every module to make string literals unicode by default.
Use encode/decode every time you move between unicode and bytestrings, but that's the worst solution.
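A sketch of the distinction in Python 3 (option 1 above), using the literal from the query; the failing decode reproduces the same class of error:

```python
name = 'رامین'                     # in Python 3, str is already unicode
utf8_bytes = name.encode('utf-8')  # what travels to/from the database
assert utf8_bytes.decode('utf-8') == name
try:
    utf8_bytes.decode('ascii')     # pretending the bytes are ASCII...
except UnicodeDecodeError as e:
    print(e)                       # ...fails: ordinal not in range(128)
```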

Reproducing legacy binary file with Python

I'm trying to write a legacy binary file format in Python 2.7 (the file will be read by a C program).
Is there a way to output the hex representation of integers to a file? I suspect I'll have to roll my own (not least because I don't think Python has the concept of short int, int and long int), but just in case I thought I'd ask. If I have a list:
[0x20, 0x3AB, 0xFFFF]
Is there an easy way to write that to a file so a hex editor would show the file contents as:
20 00 AB 03 FF FF
(note the endianness)?
Since you have some specific formatting needs, I think that using hex() is out - you don't want the '0x' prefix. We can use format() to inspect the values, but the file itself needs raw bytes:
data = [0x20, 0x3AB, 0xFFFF]

def split_digit(n):
    """ Bitmasks out the low and high bytes of a <=16-bit number.
    Consider checking isinstance(n, long) and raising an error.
    """
    return (0x00ff & n, (0xff00 & n) >> 8)

[hex(x) + ' ' + hex(y) for x, y in [split_digit(d) for d in data]]
# ['0x20 0x0', '0xab 0x3', '0xff 0xff']

with open('myFile.bin', 'wb') as fh:
    for datum in data:
        little, big = split_digit(datum)
        fh.write(chr(little))  # chr() writes the raw byte; format(n, '02x')
        fh.write(chr(big))     # would write ASCII hex text instead
...or something like that? Note that in Python 2 chr() gives a one-character byte string, which is what a binary file wants.
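An alternative sketch using the standard struct module, which handles endianness and the 16-bit width directly ('<' = little-endian, 'H' = unsigned short), so no manual bitmasking is needed:

```python
import struct

data = [0x20, 0x3AB, 0xFFFF]
packed = struct.pack('<3H', *data)  # three little-endian unsigned shorts

with open('myFile.bin', 'wb') as fh:
    fh.write(packed)  # file bytes: 20 00 AB 03 FF FF
```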

Xively read data in Python

I have written a python 2.7 script to retrieve all my historical data from Xively.
Originally I wrote it in C#, and it works perfectly.
I am limiting the request to 6 hour blocks, to retrieve all stored data.
My version in Python is as follows:
requestString = 'http://api.xively.com/v2/feeds/41189/datastreams/0001.csv?key=YcfzZVxtXxxxxxxxxxxORnVu_dMQ&start=' + requestDate + '&duration=6hours&interval=0&per_page=1000'
response = urllib2.urlopen(requestString).read()
The request date is in the correct format, I compared the full c# requestString version and the python one.
Using the above request, I only get 101 lines of data, which equates to a few minutes of results.
My suspicion is that it is the .read() function; it returns about 34k characters, which is far less than the C# version. I tried passing 100000 as an argument to the read function, but there was no change in the result.
Here is another solution, also written in Python 2.7.
In my case, I got the data in 30-minute blocks, because many sensors sent values every minute and the Xively API limits the response to half an hour of data at that send frequency.
This is the general loop:
for day in datespan(start_datetime, end_datetime, deltatime):  # step from start_datetime to end_datetime in deltatime increments
    while True:  # retry until the data is retrieved correctly
        try:
            response = urllib2.urlopen('https://api.xively.com/v2/feeds/'+str(feed)+'.csv?key='+apikey_xively+'&start='+day.strftime("%Y-%m-%dT%H:%M:%SZ")+'&interval='+str(interval)+'&duration='+duration)  # get data
            break
        except:
            time.sleep(0.3)  # wait, then try again
    cr = csv.reader(response)  # iterate over the returned rows
    print '.'
    for row in cr:
        if row[0] in id:  # choose desired data
            f.write(row[0]+","+row[1]+","+row[2]+"\n")  # write "id,timestamp,value"
The full script you can find it here: https://github.com/CarlosRufo/scripts/blob/master/python/retrievalDataXively.py
Hope it helps; delighted to answer any questions :)
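The while True retry in the loop above can also be factored into a small helper (a sketch; flaky below just stands in for the urlopen call):

```python
import time

def retry(fn, attempts=5, delay=0.3):
    """Call fn() until it succeeds or attempts run out."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)

# Example with a call that fails twice, then succeeds:
state = {'calls': 0}
def flaky():
    state['calls'] += 1
    if state['calls'] < 3:
        raise IOError('transient')
    return 'ok'

print(retry(flaky, delay=0))  # ok
```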