I have a function that reads hex-ascii encoded data, built under Python 2.7. I am porting the code to run on 3.x and hit an unforeseen issue; the function worked flawlessly under 2.7. Here is what I have:
# works with 2.7
data = open('hexascii_file.dat', 'rU').read()
When I run that under 3.x I get a UnicodeError:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 500594: invalid start byte
I thought the default codec under Python 2.7 was ascii, so I tried the following under 3.x:
data = open('hexascii_file.dat', 'rU', encoding='ascii')
This did not work (same error as above, but naming 'ascii' instead of 'utf-8'). However, when I use the latin-1 codec all works well:
data = open('hexascii_file.dat', 'rU', encoding='latin-1')
I guess I am looking for a quick sanity check here to ensure I have made the proper change to the script. Does this change make sense?
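The latin-1 change is safe in the sense that latin-1 maps every byte 0x00–0xFF to a code point, so nothing can fail to decode. An alternative that sidesteps codec guessing entirely is to open the file in binary mode and decode the hex-ascii payload explicitly. A minimal, self-contained sketch (the filename and payload are illustrative, not from the question):

```python
import binascii

# Create a small demo file holding hex-ascii text.
with open('demo_hexascii.dat', 'wb') as f:
    f.write(b'48656c6c6f')              # hex-ascii for b'Hello'

# Binary mode: no text codec is applied at open() time, so a stray
# byte like the 0x85 from the traceback cannot trip a decoder here.
with open('demo_hexascii.dat', 'rb') as f:
    raw = f.read()

payload = binascii.unhexlify(raw)       # raises binascii.Error on non-hex bytes
print(payload)                          # b'Hello'
```

This moves any bad-byte failure from `open()` to the point where you actually interpret the data, which tends to give clearer errors.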
Related
I'd like to log (or even print) some messages containing UTF-8 encoded Unicode text; having run into this a lot, I collected a bunch of fixes, none of which actually seems to work in my case (Python 3 in a Jupyter notebook). What I have so far is:
#!/usr/bin/env LC_ALL=en_US.UTF-8 /usr/local/bin/python3
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
!export PYTHONIOENCODING=UTF-8
import logging
logging.basicConfig(level=logging.INFO)
Then I try any of:
logging.info(u'שלום')
logging.info(unicode('שלום','utf-8'))
logging.info(u'שלום'.encode('utf-8'))
all of which hit the dreaded
'ascii' codec can't decode byte 0xd7 in position 10: ordinal not in range(128)
At this point I am willing to sacrifice a goat to the unicode monkey-god if that would help. Can anyone weigh in (e.g. what kind of goat)?
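For what it's worth, on Python 3 none of the wrappers above should be needed: `str` is already Unicode, and the `codecs.getwriter` trick (and the Python-2-only `unicode()` builtin) is what tends to cause these errors. A sketch of the plain approach, logging to a `StringIO` so the output is easy to inspect:

```python
import io
import logging

buf = io.StringIO()                      # a text stream: no byte codec involved
handler = logging.StreamHandler(buf)
log = logging.getLogger('unicode-demo')
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info('שלום')                         # pass the str itself, no .encode()
print(buf.getvalue().strip())            # שלום
```

Passing `'שלום'.encode('utf-8')` instead would log the bytes repr (`b'\xd7\xa9...'`), which is usually not what you want.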
I'm communicating over a serial port, and currently using python2 code which I want to convert to python3. I want to make sure the bytes I send over the wire are the same, but I'm having trouble verifying that that's the case.
In the original code the commands are sent like this:
serial.Serial().write("\xaa\xb4" + chr(2))
If I print "\xaa\xb4" in python2 I get this: ��.
If I print("\xaa\xb4") in python3 I get this: ª´
Encoding and decoding seem opposite too:
Python2: print "\xaa".decode('latin-1') -> ª
Python3: print("\xaa".encode('latin-1')) -> b'\xaa'
To be crude, what do I need to send in serial.write() in python3 to make sure exactly the same sequence of 1s and 0s are sent down the wire?
Use a bytes sequence.
ser.write(b'\xaa\xb4')
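To translate the original call in full, the Python 2 `chr(2)` becomes `bytes([2])` (or simply `b'\x02'`) in Python 3. A sketch, with `ser` assumed to be a pyserial `serial.Serial` instance:

```python
# Python 3 equivalent of the Python 2 call:
#   serial.Serial().write("\xaa\xb4" + chr(2))
payload = b'\xaa\xb4' + bytes([2])   # chr(2) -> bytes([2]) in Python 3
print(payload)                       # b'\xaa\xb4\x02'
# ser.write(payload)                 # the same three bytes go down the wire
```

Since a `bytes` object is a plain sequence of integers 0–255, no encoding step is involved and the bit pattern on the wire is identical to the Python 2 version.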
Two microservices are communicating via a message queue (RabbitMQ). The data is encoded using MessagePack.
I have the following scenarios:
python3 -> python3: working fine
python2 -> python3: encoding issues
Encoding is done with:
umsgpack.packb(data)
Decoding with:
umsgpack.unpackb(body)
When doing encoding and decoding in python3 I get:
data={'sender': 'producer-big-red-tiger', 'json': '{"msg": "hi"}', 'servicename': 'echo', 'command': 'run'}
When doing encoding in python2 and decoding on python3 I get:
data={b'command': b'run', b'json': b'{"msg": ""}', b'servicename': b'echo', b'sender': b'bla-blah'}
Why is the data not completely decoded? What should I do on the sender / receiver side to achieve compatibility between Python 2 and Python 3?
Look at the "Notes" section of the README from msgpack-python:
msgpack can now distinguish between string and binary types, but not in the same way Python 2 does. Python 2 added a separate unicode string type, whereas msgpack renamed raw to str and added a bin type, in order to keep compatibility with data created by old libraries; raw was used for text more than binary.
Currently, while msgpack-python supports new bin type, default setting doesn't use it and decodes raw as bytes instead of unicode (str in Python 3).
You can change this by using use_bin_type=True option in Packer and encoding="utf-8" option in Unpacker.
>>> import msgpack
>>> packed = msgpack.packb([b'spam', u'egg'], use_bin_type=True)
>>> msgpack.unpackb(packed, encoding='utf-8')
['spam', u'egg']
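Applied to the scenario above: the Python 2 sender should pack with `use_bin_type=True`, and the Python 3 receiver should ask for str-typed data back as unicode. A sketch, assuming the msgpack-python library on both sides (note that recent releases replaced the `encoding='utf-8'` unpack option with `raw=False`, which is now the default):

```python
import msgpack

# Sender (same call on Python 2 or 3): mark text as the msgpack str type
# rather than raw bytes.
packed = msgpack.packb({u'command': u'run'}, use_bin_type=True)

# Receiver (Python 3): decode str-typed fields to str instead of bytes.
data = msgpack.unpackb(packed, raw=False)
print(data)                    # {'command': 'run'}
```

With these two options the keys and values come back as `str` on Python 3, matching the python3 → python3 case.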
I am playing around with pyglet 1.2alpha-1 and Python 3.3. I have the following (extremely simple) application and cannot figure out what my issue is:
import pyglet

window = pyglet.window.Window()
#image = pyglet.resource.image('img1.jpg')
image = pyglet.image.load('img1.jpg')
label = pyglet.text.Label('Hello, World!!',
                          font_name='Times New Roman',
                          font_size=36,
                          x=window.width//2, y=window.height//2,
                          anchor_x='center', anchor_y='center')

@window.event
def on_draw():
    window.clear()
    label.draw()
    # image.blit(0,0)

pyglet.app.run()
With the above code, my text label will appear as long as image.blit(0, 0) is commented out. However, if I try to display the image, the program crashes with the following error:
File "C:\Python33\lib\site-packages\pyglet\gl\lib.py", line 105, in errcheck
raise GLException(msg)
pyglet.gl.lib.GLException: b'invalid value'
I also get the above error if I try to use pyglet.resource.image instead of pyglet.image.load (the image and py file are in the same directory).
Any one know how I can fix this issue?
I am using Python 3.3, pyglet 1.2alpha-1, and Windows 8.
The code, including the image.blit, runs fine for me using Python 2.7.3 and pyglet 1.1.4.
There's nothing wrong with the code; you might consider trying other Python and pyglet versions for the time being (until pyglet has a new stable release).
This isn't a "fix", but might at least determine if it's fixable or not (mine was not). (From the Pyglet mailing group.)
You can check whether your system even supports textures larger than 1024 by running this code (Python 3+):
from ctypes import c_long
from pyglet.gl import glGetIntegerv, GL_MAX_TEXTURE_SIZE

i = c_long()
glGetIntegerv(GL_MAX_TEXTURE_SIZE, i)
print(i)  # output: c_long(1024) (or higher)
That is the maximum texture size your system supports. If it's 1024, then any larger image will raise an exception, and the only fix is better hardware.
I'm trying to set up the sorl-thumbnail Django app to provide thumbnails of PDF files for a web site, running on Windows Server 2008 R2 with an Apache web server.
I've had sorl-thumbnail functional with the PIL backend for thumbnail generation of jpeg images - which was working fine.
Since PIL cannot read pdf-files I wanted to switch to the graphicsmagick backend.
I've installed and tested the graphicsmagick/ghostscript combination. From the command line
gm convert foo.pdf -resize 400x400 bar.jpg
generates the expected jpg thumbnail. It also works for jpg to jpg thumbnail generation.
However, when called from sorl-thumbnail, ghostscript crashes.
From django python shell (python manage.py shell) I use the low-level command described in the sorl docs and pass in a FieldFile instance (ff) pointing to foo.pdf and get the following error:
In [8]: im = get_thumbnail(ff, '400x400', quality=95)
**** Warning: stream operator isn't terminated by valid EOL.
**** Warning: stream Length incorrect.
**** Warning: An error occurred while reading an XREF table.
**** The file has been damaged. This may have been caused
**** by a problem while converting or transfering the file.
**** Ghostscript will attempt to recover the data.
**** Error: Trailer is not found.
GPL Ghostscript 9.07: Unrecoverable error, exit code 1
Note that ff is pointing to the same file that converts fine when using gm convert from command line.
I've tried also passing an ImageFieldFile instance (iff) and get the following error:
In [5]: im = get_thumbnail(iff, '400x400', quality=95)
identify.exe: Corrupt JPEG data: 1 extraneous bytes before marker 0xdb `c:\users\thin\appdata\local\temp\tmpxs7m5p' # warning/jpeg.c/JPEGWarningHandler/348.
identify.exe: Corrupt JPEG data: 1 extraneous bytes before marker 0xc4 `c:\users\thin\appdata\local\temp\tmpxs7m5p' # warning/jpeg.c/JPEGWarningHandler/348.
identify.exe: Corrupt JPEG data: 1 extraneous bytes before marker 0xda `c:\users\thin\appdata\local\temp\tmpxs7m5p' # warning/jpeg.c/JPEGWarningHandler/348.
Invalid Parameter - -auto-orient
Changing back sorl settings to use the default PIL backend and repeating the command for jpg to jpg conversion, the thumbnail image is generated without errors/warnings and available through the cache.
It seems that sorl is copying the source file to a temporary file before passing it to gm, and that the problem originates in this copy operation.
I've found what I believe to be the copy operation in the sources of sorl_thumbnail-11.12-py2.7.egg\sorl\thumbnail\engines\convert_engine.py lines 47-55:
class Engine(EngineBase):
    ...
    def get_image(self, source):
        """
        Returns the backend image objects from a ImageFile instance
        """
        handle, tmp = mkstemp()
        with open(tmp, 'w') as fp:
            fp.write(source.read())
        os.close(handle)
        return {'source': tmp, 'options': SortedDict(), 'size': None}
Could the problem be here? I don't see it!
Any suggestions of how to overcome this problem would be greatly appreciated!
I'm using django 1.4, sorl-thumbnail 11.12 with memcached and ghostscript 9.07.
After some trial and error, I found that the problem could be solved by changing the write mode from 'w' to 'wb', so that the sources of sorl_thumbnail-11.12-py2.7.egg\sorl\thumbnail\engines\convert_engine.py lines 47-55 now read:
class Engine(EngineBase):
    ...
    def get_image(self, source):
        """
        Returns the backend image objects from a ImageFile instance
        """
        handle, tmp = mkstemp()
        with open(tmp, 'wb') as fp:
            fp.write(source.read())
        os.close(handle)
        return {'source': tmp, 'options': SortedDict(), 'size': None}
There are I believe two other locations in the convert_engine.py file, where the same change should be made.
After that, the gm convert command was able to process the file.
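A small self-contained demonstration of why 'wb' matters here (the sample bytes are illustrative): binary mode writes the PDF bytes verbatim, whereas text mode on Windows translates b'\n' to b'\r\n' and corrupts the stream, which is exactly the kind of damage Ghostscript was reporting.

```python
import os
import tempfile

data = b'%PDF-1.4\nstream\x00\x85binary\n'   # bytes containing raw newlines
handle, tmp = tempfile.mkstemp()
os.close(handle)

with open(tmp, 'wb') as fp:     # 'w' on Windows would mangle the b'\n' bytes
    fp.write(data)

with open(tmp, 'rb') as fp:
    assert fp.read() == data    # round-trips byte-for-byte

os.remove(tmp)
```

Any binary format with embedded 0x0A bytes (PDF, JPEG, zip, ...) is vulnerable to the same silent corruption in text mode.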
However, since my PDFs are fairly large multipage files, I then ran into other problems, the most important being that the get_image method makes a full copy of the file before the thumbnail is generated. With file sizes around 50 MB this turns out to be a very slow process, so I finally opted for bypassing sorl and calling gm directly, storing the thumbnail in a standard ImageField. Not so elegant, but much faster.
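A hypothetical sketch of that bypass (the function names and paths are illustrative): it mirrors the `gm convert` command-line call from the question, with `[0]` limiting GraphicsMagick to the first page of a multipage PDF so the whole file need not be rendered.

```python
import subprocess

def gm_thumbnail_cmd(src_pdf, dest_jpg, size='400x400'):
    # "[0]" restricts GraphicsMagick to the first page of a multipage PDF.
    return ['gm', 'convert', '%s[0]' % src_pdf, '-resize', size, dest_jpg]

def make_pdf_thumbnail(src_pdf, dest_jpg, size='400x400'):
    # Requires the gm binary (and Ghostscript for PDF input) on PATH.
    subprocess.check_call(gm_thumbnail_cmd(src_pdf, dest_jpg, size))

print(gm_thumbnail_cmd('foo.pdf', 'bar.jpg'))
# ['gm', 'convert', 'foo.pdf[0]', '-resize', '400x400', 'bar.jpg']
```

Because gm reads the source file in place, this avoids the full temporary copy that sorl's convert engine makes.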