I'm a newbie. I use Django 1.7.5 with Python 2.7. When I run the command
django-admin makemessages -a
I receive an error:
'ascii' codec can't encode characters in position 374-378: ordinal not in range(128)
Is there a way to make Django print more information about the error? How can I find the file with the offending characters? The traceback doesn't give the file name. I checked all templates and other files but found nothing.
I have # -*- coding: utf-8 -*- in every file, and my model has a def __unicode__(self) method.
Your issue might be in the place where you convert the Unicode text to ASCII (or to whatever encoding you are converting it to). Keep everything you already have in place, find the code that does the conversion, and test it in isolation to see whether it produces the result you expect.
I may be a bit vague here; this should have been a comment, but I don't have the 50 reputation points needed to post one yet.
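As a rough way to answer "which file has the wrong characters", here is a minimal sketch of a hypothetical helper (not part of Django) that walks a project directory and reports the first non-ASCII byte on each line of the files makemessages scans by default (.html, .txt and .py). Non-ASCII bytes in a UTF-8 file are legitimate, so treat each hit as a lead to check rather than as the error itself; the error may also come from a non-ASCII file or directory name, which this sketch does not check. It runs on both Python 2.7 and 3.

```python
import os


def find_non_ascii(root, extensions=(".html", ".txt", ".py")):
    """Yield (path, line_number, column, byte_value) for the first
    non-ASCII byte on each line of every matching file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Read raw bytes so nothing gets decoded (or fails to) on the way in.
            with open(path, "rb") as fh:
                for lineno, line in enumerate(fh, start=1):
                    for col, byte in enumerate(bytearray(line), start=1):
                        if byte > 0x7F:
                            yield path, lineno, col, byte
                            break  # one report per line is enough
```

Usage: `for hit in find_non_ascii("."): print("%s:%d:%d: 0x%02x" % hit)` from the project root.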
Well, my Python script is supposed to open all the UTF-8 YAML files in a directory and show their content to the user. But some words have accents, such as the French word présenter, which is shown like this: u'pr\xe9senter'. I need it to be shown properly to the user.
Here is my code:
import glob
import yaml

files = glob.glob("data/*.yaml")

def read_yaml_file(filename):
    with open(filename, 'r') as stream:
        try:
            print(yaml.safe_load(stream))
        except yaml.YAMLError as exc:
            print(exc)

for file in files:
    read_yaml_file(file)
I already tried importing from __future__, but it didn't work. Does anyone know how to solve this?
Unicode in 2.x is painful. If you can, use a current Python 3, in which text (str) is Unicode and is printed without a 'u' prefix, while bytes are now printed with a 'b' prefix.
>>> print(u"pr\xe9senter")  # 3.8
présenter
You also need a system console/terminal or IDE that displays glyphs for the codepoints in your yaml files.
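To see why the question's output showed escapes at all: printing a whole container prints the repr() of each element. The Python 3 sketch below uses a dict as a stand-in for whatever yaml.safe_load() returns (the structure of the YAML files is an assumption). It shows that Python 3's repr keeps é as is, while ascii() reproduces the escaped style that Python 2's repr used, minus the u prefix.

```python
# Python 3: str holds Unicode text, so there is no u prefix at all.
word = "pr\xe9senter"          # the same text as 'présenter'
data = {"verb": word}          # stand-in for what yaml.safe_load() might return

# print(word) shows the text itself. print(data) shows repr(data), and
# Python 3's repr keeps printable non-ASCII characters as they are.
shown = repr(data)             # {'verb': 'présenter'}

# ascii() escapes non-ASCII characters, like Python 2's repr did.
escaped = ascii(data)          # {'verb': 'pr\xe9senter'}
print(escaped)
```

If you want the bare words rather than the dict syntax, print the values one at a time, e.g. `for key, value in data.items(): print(key, value)`.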
If you are a masochist or otherwise stuck on 2.7, use sys.stdout.write(). Note that you must write the '\n' yourself.
>>> import sys; sys.stdout.write(u"pr\xe9senter\n") # 2.7
présenter
This question is not really about IDLE. However, the above lines work in both standard interactive Python on Windows 10 and in IDLE. IDLE uses tkinter which uses tcl/tk. Tk itself can handle all Basic Multilingual Plane (BMP) characters (the first 64K), but only those. Which BMP characters it can display depends on your OS and its current fonts.
This is the encoding declaration I have at the top of my program:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
And yet I still get this error:
SyntaxError: Non-ASCII character '\xfe' in file C:/Users/aaron/Desktop/Python/Bicycle_Diagnosis_System/Main.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
I have looked at the website it links to and trawled other websites, and still can't find the answer. Any ideas? (I'm using PyCharm Community Edition as my IDE, if that affects it.)
Any help is much appreciated!
Try adding #coding=utf-8 on line 1 and re-running.
Your file is saved as UTF-16 with a BOM (big-endian). I saved your sample code in Notepad++ with that encoding and reproduced the error:
File "test.py", line 1
SyntaxError: Non-ASCII character '\xfe' in file x.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Make sure your file is saved in the encoding it declares. The file must use an ASCII-compatible encoding for the shebang and encoding lines to be read properly. UTF-16 is not ASCII-compatible, hence the error message when Python reads the non-ASCII bytes of the byte order mark (BOM), which for big-endian UTF-16 are FE FF.
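A minimal Python 3 sketch for checking a file like Main.py: it looks for a byte order mark and, if it finds one, rewrites the file as plain UTF-8 without a BOM, which then matches the # -*- coding: utf-8 -*- line. The helper names sniff_bom and resave_as_utf8 are hypothetical; re-saving the file as UTF-8 from your editor achieves the same thing.

```python
import codecs

# Check longer BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
# Each codec named here reads the BOM and drops it while decoding.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(path):
    """Return a codec name that decodes the file and drops its BOM, or None."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    return None


def resave_as_utf8(path):
    """Rewrite a BOM-marked file as UTF-8 without a BOM; return True if changed."""
    codec = sniff_bom(path)
    if codec is None:
        return False
    with open(path, "rb") as fh:
        text = fh.read().decode(codec)
    # newline="" writes the text back with its original line endings.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(text)
    return True
```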
In my Django application, users can upload CSV files to import data. It works fine for UTF-8 files with CRLF line endings.
But there are two issues:
When the file is not encoded as UTF-8, I keep getting 'utf8' codec can't decode byte 0xdc in position 393: invalid continuation byte. I tried to resolve that with the following code:
file = codecs.EncodedFile(request.FILES['import'],"utf-8")
dialect = csv.Sniffer().sniff(file.read(2048))
file.open() # seek to 0
reader = csv.reader(file,dialect=dialect)
When the file uses CR line breaks, they are either not recognized or I get: new-line character seen in unquoted field - do you need to open the file in universal-newline mode? But the InMemoryUploadedFile is already an open file object.
My issue is very similar to this one, but the solution mentioned for point 1 didn't work for me (as you can see, my code is very similar), and point 2 isn't answered at all:
Proccessing a Django UploadedFile as UTF-8 with universal newlines
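For what it's worth, here is a minimal Python 3 sketch of both fixes, under two assumptions: the whole upload fits in memory, and uploads that are not UTF-8 are cp1252 (Western Windows, e.g. spreadsheet exports; in cp1252, 0xdc is Ü). It decodes the raw bytes itself and then hands csv.reader an io.StringIO created with newline="", which splits lines on CR, LF or CRLF. (With only one encoding argument, codecs.EncodedFile uses it for both sides and passes the bytes through unchanged, which is likely why the attempt above had no effect.) The function name read_uploaded_csv is hypothetical.

```python
import csv
import io


def read_uploaded_csv(raw, encodings=("utf-8-sig", "cp1252")):
    """Decode uploaded CSV bytes and parse them, whatever the line endings.

    Encodings are tried in order. cp1252 as the fallback is an assumption:
    it accepts almost any byte sequence, so files in other encodings may be
    decoded wrongly without raising an error.
    """
    for encoding in encodings:
        try:
            text = raw.decode(encoding)  # utf-8-sig also strips a UTF-8 BOM
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("could not decode upload as any of %r" % (encodings,))
    # newline="" enables universal-newline splitting (CR, LF, CRLF) while
    # leaving the line endings in place, which is what csv.reader expects.
    return list(csv.reader(io.StringIO(text, newline="")))
```

In a view, this could be called as `rows = read_uploaded_csv(request.FILES["import"].read())`. If you still want dialect sniffing, run csv.Sniffer().sniff() on a slice of the decoded text and pass the result to csv.reader inside the function.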
I am aware that there are plenty of discussions on the "UTF-8" encoding issue in Python 2, but I have been unable to find a solution to my problem so far. I am writing a script that gets the name of a file and hyperlinks it with xlwt, so that the file can be opened by clicking in the spreadsheet. The problem is that some of the file names include non-ASCII characters.
Question 1
I used the following line to retrieve the name of the file. There is only one file in the folder by the way.
>>f = filter(os.path.isfile, os.listdir(tmp_path))[0]
And then
>>print f
'521001ldrAvisoAcionistas(Retifica\xe7\xe3o)_doc'
>>print sys.stdout.encoding
'UTF-8'
>>f.decode("UTF-8")
*** UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 76: invalid continuation byte
From browsing the discussions here, I realized that "\xe7\xe3o" is not valid UTF-8. Running the following line seems to confirm this.
>>f.decode("latin-1")
u'521001ldrAvisoAcionistas(Retifica\xe7\xe3o)_doc'
My question is then, why is the variable f being encoded in "latin-1" when the system encoding is set to "UTF-8"?
Question 2
While f.decode("latin-1") gives me the output that I want, I am still unable to supply the variable to the hyperlink function in the spreadsheet.
>>data.append(["File", xlwt.Formula('HYPERLINK("%s";"%s")' % (os.path.join(dl_path,f.decode("latin-1")),f.decode("latin-1")))])
*** FormulaParseException: can't parse formula HYPERLINK("u'H:\\Mad Lab\\SE Doc Crawler\\bovespa\\download\\521001ldrAvisoAcionistas(Retifica\xe7\xe3o)_doc's;"u'521001ldrAvisoAcionistas(Retifica\xe7\xe3o)_doc's)
Apparently, the closing double quote got eaten and was replaced by an "'s" suffix. Can somebody explain what's going on here?
Oh, and if someone can suggest a solution to Question 2 above, I will be very grateful: you would have saved my weekend from misery!
Thanks in advance all!
Welcome to the confusing world of encoding! There are at least file encoding, terminal encoding and filename encoding to deal with, and all three could be different.
In Python 2.x, the goal is to get a Unicode string (different from str) from an encoded str. The problem is that you don't always know the encoding used for the str so it's difficult to decode it.
When using listdir() to get filenames, there's a documented but often overlooked quirk: if you pass a str to listdir(), you get encoded strs back. These are encoded according to your locale; on Windows, that will be an 8-bit code page such as windows-1252.
Instead, pass listdir() a Unicode string.
E.g.
os.listdir(u'C:\\mydir')
Note the u prefix.
This returns properly decoded Unicode filenames. On Windows and OS X, this is pretty reliable as long as your environment locale hasn't been messed with.
In your case, listdir() would return:
u'521001ldrAvisoAcionistas(Retifica\xe7\xe3o)_doc'
Again, note the u prefix. You can now print this to your PyCharm console with no modification.
E.g.
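import os
names = os.listdir(unicode(tmp_path))  # a unicode path makes listdir return unicode names
f = filter(lambda n: os.path.isfile(os.path.join(tmp_path, n)), names)[0]
print f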
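The same rule carries over to Python 3, where the pair is bytes and str: os.listdir() returns bytes names for a bytes path and str names for a str path. The short Python 3 sketch below also reproduces Question 1's decoding result: the bytes \xe7\xe3 are not valid UTF-8 but decode as ç and ã in cp1252 (and latin-1), which fits the Windows 8-bit code page explanation above. The temporary folder and report.txt are illustrative only.

```python
import os
import tempfile

raw_name = b"Retifica\xe7\xe3o"        # the bytes from Question 1

try:
    raw_name.decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason                 # 'invalid continuation byte', as in the question

decoded = raw_name.decode("cp1252")    # 'Retificação'; latin-1 gives the same here

# listdir() answers in the type it was asked in.
folder = tempfile.mkdtemp()
open(os.path.join(folder, "report.txt"), "w").close()
str_names = os.listdir(folder)                   # ['report.txt']
bytes_names = os.listdir(os.fsencode(folder))    # [b'report.txt']
print(reason, str_names, bytes_names)
```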
As for Question 2, I did not investigate further; due to time constraints, I just printed the output as Unicode strings rather than xlwt objects. I'm able to continue with the project, though without understanding what went wrong. In that sense, both questions above have been answered.
I have this line:
#str = u'Harsha: This has unicode character ♭.\n'
This line causes SyntaxError: Non-ASCII character '\xe2' even though it's commented out.
If I remove this line, the error goes away. Can anyone tell me what's wrong here?
I'm using PyCharm as IDE.
You want to add the following line at the top of your source file:
# -*- coding: utf-8 -*-
This tells Python the encoding of your source file.
Source: Working with utf-8 encoding in Python source
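Python 3's standard library exposes the same PEP 263 rules through tokenize.detect_encoding(), which is handy for checking what Python will think a file's encoding is. Two things to note: Python 3 falls back to UTF-8 rather than ASCII when there is no declaration, and, as in 2.x, a declaration only counts on the first or second line. The source_encoding wrapper is a hypothetical name.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding Python 3 would use for these source bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


declared = b"# -*- coding: utf-8 -*-\n#s = u'\xe2\x99\xad'\n"
latin = b"# coding=latin-1\nx = 1\n"
too_late = b"#!/usr/bin/env python\n\n# -*- coding: latin-1 -*-\n"

print(source_encoding(declared))   # utf-8
print(source_encoding(latin))      # iso-8859-1 (names are normalized)
print(source_encoding(too_late))   # utf-8: a declaration on line 3 is ignored
```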
You need to declare the file's encoding.
The byte 0xE2 is 1110 0010 in binary. The leading 1110 marks it as the first byte of a three-byte UTF-8 sequence, which is exactly what ♭ (U+266D) becomes in UTF-8: E2 99 AD. The same byte could equally be a single character in some 8-bit "extended ASCII" code page, so without a hint the interpreter cannot tell which was meant.
Python 2 assumes source files are ASCII (7-bit), so without an encoding declaration any byte above 0x7F is ambiguous and raises an error, even inside a comment, because the check happens when the source is read, before comments are discarded.
Alternatively, escape the character (for example, u'\u266d' instead of the literal ♭). Declaring the encoding is the other option; the proposal for it, PEP 263, has been implemented since Python 2.3.
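The byte arithmetic above can be checked directly. This sketch, which runs on both Python 2.7 and 3, writes the flat sign with an ASCII-only escape (so the source file needs no encoding declaration) and prints the bits of its UTF-8 encoding.

```python
import unicodedata

flat = u"\u266d"                  # ASCII-only escape, so no coding line is needed
utf8 = flat.encode("utf-8")       # three bytes: e2 99 ad

print(unicodedata.name(flat))     # MUSIC FLAT SIGN
print(" ".join(format(b, "08b") for b in bytearray(utf8)))
# 11100010 10011001 10101101  (a 1110.... lead byte, then two 10...... continuation bytes)
```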