We are writing a complex website with i18n.
To make translation easier, we store the translations in models.
Our staff writes and edits the translations via the Django admin.
When a translation is complete, a management command runs that writes the po-files and afterwards executes Django's compilemessages for all of them.
I know the po-files have to be written in UTF-8.
But after opening the app, I still get the error "'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)" when using languages with special characters, like Spanish or French.
What am I doing wrong?
Here is my (shortened) code:
import codecs
import os
from django.core.management.base import NoArgsCommand
# XLanguage, XTranslation and create_path are defined elsewhere in the project
class Command(NoArgsCommand):
    def handle_noargs(self, **options):
        languages = XLanguage.objects.all()
        currPath = os.getcwd()
        for lang in languages:
            path = "{}/framework/locale/{}/LC_MESSAGES/".format(currPath, lang.langToplevel)
            # check and create path
            create_path(path)
            # add filename
            path = path + "django.po"
            with codecs.open(path, "w", encoding='utf-8') as file:
                # select all textitems for this language from XTranslation
                translation = XTranslation.objects.filter(langID=lang)
                for item in translation:
                    # check if menu-item
                    if item.textID.templateID:
                        msgid = u"menu_{}_label".format(item.textID.templateID.id)
                    else:
                        msgid = u"{}".format(item.textID.text_id)
                    trans = u"{}".format(item.textTranslate)
                    text = u'msgid "{}" msgstr "{}"\n'.format(msgid, trans)
                    file.write(text)
                file.close()
Traceback:
Environment:
Request Method: GET
Request URL: http://127.0.0.1:8000/
Django Version: 1.7
Python Version: 3.4.0
Installed Applications:
('django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'simple_history',
'datetimewidget',
'payroll',
'framework',
'portal',
'pool',
'billing')
Installed Middleware:
('django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.common.CommonMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware',
'simple_history.middleware.HistoryRequestMiddleware')
Traceback:
File "c:\python34\lib\site-packages\django\core\handlers\base.py" in get_response
111. response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "E:\python\sarlex\framework\views.py" in init
34. activate("de")
File "c:\python34\lib\site-packages\django\utils\translation\__init__.py" in activate
145. return _trans.activate(language)
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in activate
225. _active.value = translation(language)
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in translation
210. current_translation = _fetch(language, fallback=default_translation)
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in _fetch
195. res = _merge(apppath)
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in _merge
177. t = _translation(path)
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in _translation
159. t = gettext_module.translation('django', path, [loc], DjangoTranslation)
File "c:\python34\lib\gettext.py" in translation
410. t = _translations.setdefault(key, class_(fp))
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in __init__
107. gettext_module.GNUTranslations.__init__(self, *args, **kw)
File "c:\python34\lib\gettext.py" in __init__
160. self._parse(fp)
File "c:\python34\lib\gettext.py" in _parse
300. catalog[str(msg, charset)] = str(tmsg, charset)
Exception Type: UnicodeDecodeError at /
Exception Value: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
Whenever you have an encoding/decoding error, it means you are handling Unicode incorrectly. This is most often when you mix Unicode with byte strings, which will prompt Python 2.x to implicitly decode your byte strings to Unicode with the default encoding, 'ascii', which is why you get errors like these:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
The best way to avoid these errors is to work ONLY with Unicode within your program, i.e. explicitly decode all input byte strings to Unicode with 'utf-8' (or whatever encoding the input actually uses), and mark the string literals in your code as Unicode with the u'' prefix. When you write out to a file, explicitly encode the text back to byte strings with 'utf-8'.
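A minimal sketch of that "decode on input, encode on output" rule, written for Python 3, where the two types are bytes and str (Python 2's str and unicode) and the u'' prefix is no longer needed. The function names are illustrative only. It also shows where a byte like 0xc3 comes from: it is the first byte of the two-byte UTF-8 encoding of accented letters such as ó.

```python
def decode_input(data: bytes, encoding: str = "utf-8") -> str:
    """Turn incoming bytes into text once, at the edge of the program."""
    return data.decode(encoding)


def encode_output(text: str, encoding: str = "utf-8") -> bytes:
    """Turn text back into bytes only when it leaves the program."""
    return text.encode(encoding)


raw = "Función".encode("utf-8")  # b'Funci\xc3\xb3n': 'ó' becomes the bytes 0xc3 0xb3
text = decode_input(raw)          # work with str everywhere in between
assert encode_output(text.upper()) == "FUNCIÓN".encode("utf-8")

try:
    raw.decode("ascii")  # this is what an implicit ASCII decode amounts to
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
```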
Specifically to your code, my guess is either
msgid = "menu_{}_label".format(item.textID.templateID.id)
or
text = 'msgid "{}" msgstr "{}"\n'.format(msgid, item.textTranslate)
is throwing the error. Try making msgid and text Unicode strings instead of byte strings by declaring them like so:
msgid = u"menu_{}_label".format(item.textID.templateID.id)
and
text = u'msgid "{}" msgstr "{}"\n'.format(msgid, item.textTranslate)
I'm assuming that the values of item.textID.templateID.id and item.textTranslate are both in Unicode. If they aren't (i.e. they are byte strings), you'd have to decode them first.
Lastly, this is a very good presentation on how to handle Unicode in Python: http://nedbatchelder.com/text/unipain.html. I highly recommend you go through it if you do a lot of i18n work.
EDIT 1: since item.textID.templateID.id and item.textTranslate are byte strings, your code should be:
for item in translation:
    # check if menu-item
    if item.textID.templateID:
        msgid = u"menu_{}_label".format(item.textID.templateID.id.decode('utf-8'))
    else:
        msgid = item.textID.text_id.decode('utf-8')  # you don't need to do u"{}".format() here since there's only one replacement field
    trans = item.textTranslate.decode('utf-8')  # same here, no need for u"{}".format()
    text = u'msgid "{}" msgstr "{}"\n'.format(msgid, trans)  # msgid and trans should both be Unicode at this point
    file.write(text)
EDIT 2: Original code was in Python 3.x, so all of the above is NOT applicable.
I had the same error and this helped me https://stackoverflow.com/a/23278373/2571607
Basically, for me, it's an issue with Python. My solution is to open C:\Python27\Lib\mimetypes.py and
replace
default_encoding = sys.getdefaultencoding()
with
if sys.getdefaultencoding() != 'gbk':
    reload(sys)
    sys.setdefaultencoding('gbk')
default_encoding = sys.getdefaultencoding()
Solution found!
I was writing msgid and msgstr on one line, separated by a space, to make the file more readable.
This works in English but throws an error in languages with special characters, like Spanish or French.
After writing msgid and msgstr on two separate lines, it works.
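For reference, a PO entry normally has msgid and msgstr on separate lines, and the file starts with a header entry (an empty msgid) whose Content-Type declares the charset. The traceback in the question ends in gettext's _parse decoding entries with the 'ascii' codec, and Python's gettext falls back to ASCII when the compiled catalog declares no charset, so writing that header is a sensible safeguard too. A minimal Python 3 sketch, assuming the entries have already been collected as (msgid, msgstr) pairs; write_po and po_escape are hypothetical helper names, not Django APIs.

```python
def po_escape(text):
    """Escape backslashes, double quotes and newlines for a PO string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def write_po(path, entries):
    """Write (msgid, msgstr) pairs to a UTF-8 encoded PO file."""
    with open(path, "w", encoding="utf-8") as f:
        # Header entry: empty msgid, metadata in msgstr, charset declared explicitly.
        f.write('msgid ""\n')
        f.write('msgstr ""\n')
        f.write('"Content-Type: text/plain; charset=UTF-8\\n"\n\n')
        for msgid, msgstr in entries:
            # msgid and msgstr each get their own line; a blank line separates entries.
            f.write('msgid "%s"\n' % po_escape(msgid))
            f.write('msgstr "%s"\n\n' % po_escape(msgstr))
```

Django's compilemessages can then compile the generated files as before.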
Related
When I try to run:
import csv
with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )
Most of my data gets created in the database, except for one particular row. When my script reaches the row, I receive the error:
ProgrammingError: You must not use 8-bit bytestrings unless you use a
text_factory that can interpret 8-bit bytestrings (like text_factory = str).
It is highly recommended that you instead just switch your application to Unicode strings.
The particular row in the CSV that causes this error is:
>>> row
{'FR\xed\x8aD\xed\x8aRIC.ST-DENIS', 'BOS', 'G'}
I've looked at the other similar Stackoverflow threads with the same or similar issues, but most aren't specific to using Sqlite with Django. Any advice?
If it matters, I'm running the script by going into the Django shell by calling python manage.py shell, and copy-pasting it in, as opposed to just calling the script from the command line.
This is the stacktrace I get:
Traceback (most recent call last):
File "<console>", line 4, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
row = self.reader.next()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 302, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte
EDIT: I decided to just manually import this entry into my database, rather than try to read it from my CSV, based on Alastair McCormack's feedback.
Based on the output from your question, it looks like the person who made the CSV mojibaked it - it doesn't seem to represent FRÉDÉRIC.ST-DENIS. You can try using windows-1252 instead of utf-8 but I think you'll end up with FRíŠDíŠRIC.ST-DENIS in your database.
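That diagnosis can be checked directly. A small Python 3 sketch (decode_with_fallback is a hypothetical helper) that tries UTF-8 first and falls back to windows-1252: the bytes shown in the question are not valid UTF-8, and windows-1252 accepts them but produces the garbled name rather than FRÉDÉRIC.

```python
def decode_with_fallback(raw, encodings=("utf-8", "windows-1252")):
    """Return (text, encoding) for the first encoding that decodes raw without error."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError("none of %r could decode the data" % (encodings,))


cell = b"FR\xed\x8aD\xed\x8aRIC.ST-DENIS"  # the bytes shown in the question's row
text, used = decode_with_fallback(cell)    # ('FRíŠDíŠRIC.ST-DENIS', 'windows-1252')
```

A successful decode therefore only proves the bytes are legal in that encoding, not that the text is what the author meant.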
I suspect you're using Python 2, where open() returns str objects, which are simply byte strings.
The error is telling you that you need to decode your text to Unicode string before use.
The simplest method is to decode each cell:
import csv
with open('data.csv', 'r') as csvfile:  # 'U' means universal newline mode and is not necessary
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].decode('utf-8'),
            team=row['Team'].decode('utf-8'),
            position=row['Position'].decode('utf-8')
        )
That'll work, but it's ugly to add decodes everywhere, and it won't work in Python 3. Python 3 improves things by opening files in text mode and returning Python 3 strings, which are the equivalent of Unicode strings in Python 2.
To get the same functionality in Python 2, use the io module. It gives you an open() function which has an encoding option. Annoyingly, the Python 2.x csv module does not handle Unicode input, so you need to install a backported version:
pip install backports.csv
To tidy your code and future-proof it, do:
import io
from backports import csv
with io.open('data.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # now every row is automatically decoded from UTF-8
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )
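On Python 3 no backport is needed: the built-in open() accepts an encoding, and the standard csv module reads text directly. A minimal sketch; read_players is a hypothetical helper, and since Player is the model from the question, the database call appears only as a comment.

```python
import csv
import io


def read_players(csvfile):
    """Yield (player, team, position) tuples from a text-mode CSV file object."""
    for row in csv.DictReader(csvfile):
        yield row["Player"], row["Team"], row["Position"]


# With a real file, open it in text mode with an explicit encoding;
# the csv docs recommend newline="" for files handed to the csv module:
#
# with open("data.csv", newline="", encoding="utf-8") as csvfile:
#     for player, team, position in read_players(csvfile):
#         Player.objects.get_or_create(player_name=player, team=team, position=position)

sample = io.StringIO("Player,Team,Position\nFRÉDÉRIC.ST-DENIS,BOS,G\n")
rows = list(read_players(sample))  # [('FRÉDÉRIC.ST-DENIS', 'BOS', 'G')]
```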
Encode the player name in UTF-8 using .encode('utf-8'):
import csv
with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].encode('utf-8'),
            team=row['Team'],
            position=row['Position']
        )
In Django, you can decode with latin-1: csv.DictReader(io.StringIO(csv_file.read().decode('latin-1'))). Latin-1 maps every possible byte to a character, so it accepts all the special characters that raise exceptions with utf-8.
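A small Python 3 sketch of that approach, with the trade-off made visible: decoding with latin-1 never raises, but text that was really UTF-8 comes out garbled. rows_from_upload is a hypothetical helper; in a Django view, data would be the bytes returned by the uploaded file's read().

```python
import csv
import io


def rows_from_upload(data, encoding="latin-1"):
    """Decode raw uploaded bytes and parse them as CSV dictionaries."""
    return list(csv.DictReader(io.StringIO(data.decode(encoding))))


data = "Player,Team\nFRÉDÉRIC,BOS\n".encode("utf-8")     # a UTF-8 encoded upload
garbled = rows_from_upload(data)[0]["Player"]           # 'É' (bytes C3 89) becomes 'Ã' plus U+0089
correct = rows_from_upload(data, "utf-8")[0]["Player"]  # 'FRÉDÉRIC'
```

So latin-1 is a safe choice only when the file really is Latin-1; otherwise it trades an exception for silently wrong text.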
I work with Python on Windows. I get the error "UnicodeDecodeError: 'utf8' codec can't decode byte 0x92" when I execute this simple code:
import socket
s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((controlAddr, 9051))
controlAddr is "127.0.0.1", and I know that it is the character '.' which causes the problem, so I tried different conversions, but each time I get the same error. I tried these different ways:
controlAddr = u'127.0.0.1'
controlAddr = unicode('127.0.0.1')
controlAddr.encode('utf-8')
controlAddr = u'127'+unichr(ord('\x2e'))+u'0'+unichr(ord('\x2e'))+'0'+unichr(ord('\x2e'))+u'1'
I added # -*- coding: utf-8 -*- at the beginning of the main file and of the socket.py file.
... I still have the same error
Your error says "'utf8' codec can't decode byte 0x92". In Windows code page 1252, this byte maps to U+2019, the right single quotation mark ’.
It is likely that the editor you use for your Python script is configured to replace the single quote ('\x27' or ') with the right quotation mark. That may be nicer for text, but it is terrible in source code. You must fix it in your editor, or use another editor.
The error message says you have a byte 0x92 in your file somewhere, which is not valid in utf-8, but in other encodings it may be, for example:
>>> b'\x92'.decode('windows-1252')
'’'
That means that your file encoding is not utf-8, but probably windows-1252, and the problematic character is the right single quotation mark (’), not the dot, even if that character appears only in a comment.
So either change your file encoding to utf-8 in your editor, or the encoding line to
# -*- coding: windows-1252 -*-
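Because the error message does not name the offending file, it can help to scan your source files for bytes that are not valid UTF-8. A minimal Python 3 sketch (find_non_utf8 is a hypothetical helper) that reports the line number, byte offset and value of the first invalid byte on each offending line; checking line by line is safe because the newline byte never occurs inside a multi-byte UTF-8 sequence.

```python
import sys


def find_non_utf8(path):
    """Return (line_number, byte_offset, byte_value) for lines that are not valid UTF-8."""
    problems = []
    with open(path, "rb") as f:  # read raw bytes so nothing is decoded implicitly
        for line_number, line in enumerate(f, start=1):
            try:
                line.decode("utf-8")
            except UnicodeDecodeError as exc:
                problems.append((line_number, exc.start, line[exc.start]))
    return problems


if __name__ == "__main__":
    # Pass the source files to check as command-line arguments.
    for filename in sys.argv[1:]:
        for line_number, offset, value in find_non_utf8(filename):
            print("%s:%d: byte 0x%02x at offset %d" % (filename, line_number, value, offset))
```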
The error message doesn't mention the file the interpreter choked on, but it may be your "main" file, not socket.py.
Also, don't name your file socket.py, that will shadow the builtin socket module and lead to further errors.
Setting an encoding line only affects that one file; you need to do this for every file. Setting it only in your "main" file is not enough.
Thank you! Indeed, this character doesn't exist in UTF-8.
However, I didn't send the character corresponding to 0x92 in windows-1252, which corresponds to nothing in UTF-8. Furthermore, this error appears when a "." character is in controlAddr, and that character has the same hexadecimal code in both encodings, i.e. 0x2e.
The complete error message is given below:
Traceback (most recent call last):
File "C:\Python27\Lib\site-packages\spyderlib\widgets\externalshell\pythonshell.py", line 566, in write_error
self.shell.write_error(self.get_stderr())
File "C:\Python27\Lib\site-packages\spyderlib\widgets\externalshell\baseshell.py", line 272, in get_stderr
return self.transcode(qba)
File "C:\Python27\Lib\site-packages\spyderlib\widgets\externalshell\baseshell.py", line 258, in transcode
return to_text_string(qba.data(), 'utf8')
File "C:\Python27\Lib\site-packages\spyderlib\py3compat.py", line 134, in to_text_string
return unicode(obj, encoding)
File "C:\Python27\Lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 736: invalid start byte
For this code :
controlPort = 9051
controlAddr = unicode("127.0.0.1")
import socket
s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((controlAddr, controlPort))
I have a scrapy Pipeline defined that should write any Item Field crawled by a scraper to text. One of the fields contains HTML code. I'm having issues writing it to file due to the notorious Unicode error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 100: ordinal not in range(128)
Scrapy can write out all of the fields as json in the logfile. Could someone explain what needs to be done to handle the character encoding for writing files? Thanks in advance.
import scrapy
import codecs
class SupportPipeline(object):
    def process_item(self, item, spider):
        for key, value in item.iteritems():
            with codecs.open("%s.%s" % (prefix, key), 'wb', 'utf-8') as f:
                # with open("%s.%s" % (prefix, key), 'wb') as f:
                f.write(value.encode('utf-8'))
        return item
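The usual rule for writing text files is that when the file object already does the encoding (codecs.open(..., 'utf-8') in Python 2, open(..., encoding='utf-8') in Python 3), you pass it text and do not also call .encode() yourself. A minimal Python 3 sketch of such a pipeline; the prefix constructor argument is hypothetical (the original snippet uses an undefined prefix), and joining list values with newlines is an assumption about how your fields are populated.

```python
class SupportPipeline(object):
    """Write each item field to its own UTF-8 text file (Python 3 sketch)."""

    def __init__(self, prefix="item"):
        # Hypothetical: stands in for the undefined `prefix` in the question.
        self.prefix = prefix

    def process_item(self, item, spider):
        for key, value in item.items():
            if isinstance(value, (list, tuple)):
                value = "\n".join(value)  # assumption: list fields hold strings
            elif not isinstance(value, str):
                value = str(value)
            # The file object does the UTF-8 encoding, so write text, not bytes.
            with open("%s.%s" % (self.prefix, key), "w", encoding="utf-8") as f:
                f.write(value)
        return item
```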
I've got this site running on Tornado with its template engine, and I want to internationalize it, so I thought of using gettext to help me with that.
Since my site is already in Portuguese, my message.po (template) file has all msgids in Portuguese as well (example):
#: base.html:30 base.html:51
msgid "Início"
msgstr ""
It was generated with xgettext:
xgettext -i *.html -L Python --from-code UTF-8
Later I used Poedit to generate the translation file en_US.po and later compile it as en_US.mo.
Stored in my translation folder:
translation/en_US/LC_MESSAGES/site.mo
So far, so good.
I've created a really simple RequestHandler that would render and return the translated site.
import os
import logging
from tornado.web import RequestHandler
import tornado.locale as locale
LOG = logging.getLogger(__name__)
class SiteHandler(RequestHandler):
    def initialize(self):
        locale.load_gettext_translations(os.path.join(os.path.dirname(__file__), '../translations'), "site")
    def get(self, page):
        LOG.debug("PAGE REQUESTED: %s", page)
        self.render("site/%s.html" % page)
As far as I know that should work perfectly, but somehow I've encountered some issues:
1 - How do I tell Tornado that my template has its text in Portuguese so it won't go looking for a pt locale which I don't have?
2 - When asking for the site with en_US locale, it loads ok but when Tornado is going to translate, it throws an encoding exception.
TypeError: not all arguments converted during string formatting
ERROR:views.site:Could not load template
Traceback (most recent call last):
File "/Users/ademarizu/Dev/git/new_plugin/site/src/main/py/views/site.py", line 20, in get
self.render("site/%s.html" %page)
File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/web.py", line 664, in render
html = self.render_string(template_name, **kwargs)
File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/web.py", line 771, in render_string
return t.generate(**namespace)
File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/template.py", line 278, in generate
return execute()
File "site/home_html.generated.py", line 11, in _tt_execute
_tt_tmp = _("Início") # site/base.html:30
File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/locale.py", line 446, in translate
return self.gettext(message)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gettext.py", line 406, in ugettext
return self._fallback.ugettext(message)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gettext.py", line 407, in ugettext
return unicode(message)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
Any help?
Ah, I'm running python 2.7 btw!
1 - How do I tell Tornado that my template has its text in Portuguese so it won't go looking for a pt locale which I don't have?
This is what the set_default_locale method is for. Call tornado.locale.set_default_locale('pt') (or pt_BR, etc) once at startup to tell tornado that your template source is in Portuguese.
2 - When asking for the site with en_US locale, it loads ok but when Tornado is going to translate, it throws an encoding exception.
Remember that in Python 2, strings containing non-ascii characters need to be marked as unicode. Instead of _("Início"), use _(u"Início").
I've added an image to a product, but it is not shown in the product preview. This error message appears: "The request content cannot be loaded. Please try again later".
The web page is served from localhost, the DB uses the utf8_general_ci collation (MySQL), Django 1.8, Python 2.7.
Also, when I try to open an attachment (I've put the image there), I receive an error; the traceback is below:
Environment:
Request Method: GET
Request URL: http://localhost:8000/media/files/book1.png
Django Version: 1.8
Python Version: 2.7.6
Installed Applications:
('lfs_theme',
'compressor',
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.staticfiles',
'django.contrib.sites',
'django.contrib.flatpages',
'django.contrib.redirects',
'django.contrib.sitemaps',
'django_countries',
'pagination',
'reviews',
'portlets',
'lfs.addresses',
'lfs.caching',
'lfs.cart',
'lfs.catalog',
'lfs.checkout',
'lfs.core',
'lfs.criteria',
'lfs.customer',
'lfs.customer_tax',
'lfs.discounts',
'lfs.export',
'lfs.gross_price',
'lfs.mail',
'lfs.manage',
'lfs.marketing',
'lfs.manufacturer',
'lfs.net_price',
'lfs.order',
'lfs.page',
'lfs.payment',
'lfs.portlet',
'lfs.search',
'lfs.shipping',
'lfs.supplier',
'lfs.tax',
'lfs.tests',
'lfs.utils',
'lfs.voucher',
'lfs_contact',
'lfs_order_numbers',
'localflavor',
'postal',
'paypal.standard.ipn',
'paypal.standard.pdt')
Installed Middleware:
('django.middleware.csrf.CsrfViewMiddleware',
'django.middleware.common.CommonMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.redirects.middleware.RedirectFallbackMiddleware',
'pagination.middleware.PaginationMiddleware')
Traceback:
File "/home/stp/Рабочий стол/lfs-installer/eggs/Django-1.8-py2.7.egg/django/core/handlers/base.py" in get_response
132. response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/home/stp/Рабочий стол/lfs-installer/eggs/Django-1.8-py2.7.egg/django/views/static.py" in serve
54. fullpath = os.path.join(document_root, newpath)
File "/usr/lib/python2.7/posixpath.py" in join
80. path += '/' + b
Exception Type: UnicodeDecodeError at /media/files/book1.png
Exception Value: 'ascii' codec can't decode byte 0xd0 in position 10: ordinal not in range(128)
Your problem is that the media root path contains non-ASCII characters, such as "Рабочий стол" (Russian for "Desktop").
Possible solutions:
Move your project to an ASCII-only (non-Cyrillic, in your case) path
Use Python 3 instead of Python 2, which does not have these Unicode problems
Change MEDIA_ROOT setting to Unicode string, e.g. u'/home/stp/Рабочий стол/myproject/media/'
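The traceback comes from mixing a byte string and a Unicode string in one path: posixpath.join concatenated a byte string holding the UTF-8 bytes of "Рабочий стол" with a Unicode string, which forced an implicit ASCII decode (byte 0xd0 at position 10 is the first byte of "Р", right after "/home/stp/"). Python 3 refuses to mix bytes and str in os.path.join and raises TypeError instead, so the remedy is the same in both versions: make every component text. A minimal Python 3 sketch; join_media_path is a hypothetical helper, and UTF-8 as the path encoding is an assumption.

```python
import os


def join_media_path(document_root, relative_path, encoding="utf-8"):
    """Join path components as text, decoding any bytes components first.

    Python 3's os.path.join raises TypeError when str and bytes are mixed,
    instead of silently decoding with ASCII the way Python 2 did.
    """
    if isinstance(document_root, bytes):
        document_root = document_root.decode(encoding)
    if isinstance(relative_path, bytes):
        relative_path = relative_path.decode(encoding)
    return os.path.join(document_root, relative_path)
```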
To get the images to load and avoid the ASCII decoding error, I did the following:
Added the line # -*- coding: utf-8 -*- at the beginning of settings.py in my LFS project;
As Mr. Tikhonov said, changed the MEDIA_ROOT setting to a Unicode string.
After that, the images started to load normally.