Python: How to solve SyntaxError: Non-ASCII character? - python-2.7

This is the encoding declaration that I have at the top of my program:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
And yet I still get this error
SyntaxError: Non-ASCII character '\xfe' in file C:/Users/aaron/Desktop/Python/Bicycle_Diagnosis_System/Main.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
I have looked at the website it provides and trawled other websites and still can't find the answer. Any ideas? (I'm using PyCharm Community Edition as my IDE, if that affects it.)
Any help is much appreciated!

Try adding # coding=utf-8 on line 1 and re-run.

Your file is saved as UTF-16 with BOM (big-endian). I saved your sample code in Notepad++ with that encoding and reproduced the error:
File "test.py", line 1
SyntaxError: Non-ASCII character '\xfe' in file x.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Make sure your file is saved in the encoding declared. You have to use an encoding compatible with ASCII for the hash bang and encoding lines to be read properly. UTF-16 is not compatible, hence the error message when Python reads the non-ASCII bytes of the byte order mark (BOM) character.
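A minimal sketch of how one might check for this outside the editor, assuming the file starts with one of the standard byte order marks. The helper names sniff_bom and resave_as_utf8 are made up for illustration, and the code is written for Python 3 (it only uses io.open and codecs, which also exist in Python 2.7):

```python
import codecs
import io

# Longer BOMs first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(path):
    """Return a codec name if the file starts with a known BOM, else None."""
    with io.open(path, "rb") as f:
        head = f.read(4)
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    return None


def resave_as_utf8(path):
    """Rewrite a BOM-marked file as BOM-less UTF-8; return the codec found."""
    codec = sniff_bom(path)
    if codec is not None:
        # newline="" keeps the file's existing line endings untouched.
        with io.open(path, "r", encoding=codec, newline="") as f:
            text = f.read()
        with io.open(path, "w", encoding="utf-8", newline="") as f:
            f.write(text)
    return codec
```

If sniff_bom reports "utf-16" for Main.py, the coding line cannot be read as ASCII, which matches the '\xfe' in the error. Changing the file encoding in the editor and saving again should have the same effect as resave_as_utf8.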

Related

How to set the UTF-8 as default encoding in Odoo Build?

Can anyone tell me how to set UTF-8 as the default encoding option in an Odoo build?
Note: I have put "# -*- coding: utf-8 -*-" in all the files, but it has no effect on the encoding I expect.
If you put # coding: utf-8 at the top of a Python module, this affects how Python interprets the source code. This is important if you have string literals with non-ASCII characters in your code, in order to have them represent the correct characters.
However, since you talk about "default encoding", I assume you care about the encoding of text files opened for reading or writing. In Python 2.x, the default for reading and writing files is not to decode/encode at all. I don't think you can change this default (because the built-in function open simply doesn't support encoding), but you can use io.open() or codecs.open() to open files with an explicit encoding.
Thus, to read from a file encoded with UTF-8, open it as follows:
import io

with io.open(filename, encoding='utf-8') as f:
    for line in f:
        ...
In Python 3, built-in open() is the same as io.open(), and the default encoding is platform-dependent.
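As a rough illustration of the difference between bytes on disk and decoded text, here is a small round trip with io.open(). The file name demo.txt and the sample string are arbitrary choices for the example. The code should behave the same way on Python 2.7 and Python 3:

```python
import io
import os
import tempfile

text = u"Retifica\u00e7\u00e3o\n"

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)          # text is encoded to UTF-8 on the way out

with io.open(path, "rb") as f:
    raw = f.read()         # undecoded bytes exactly as stored on disk

with io.open(path, encoding="utf-8") as f:
    decoded = f.read()     # bytes are decoded back to text on the way in
```

Here raw holds two bytes for each accented character, while decoded compares equal to the original text. The source coding declaration plays no part in either step.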

Django InMemoryUploadedFile with universal line breaks and utf8

In my Django application users can upload their csv files to import data into Django. It works fine for CRLF unicode files.
But there are two issues:
1. When the file is not encoded with utf8 I keep getting 'utf8' codec can't decode byte 0xdc in position 393: invalid continuation byte. I tried to resolve that by using the following code:
file = codecs.EncodedFile(request.FILES['import'],"utf-8")
dialect = csv.Sniffer().sniff(file.read(2048))
file.open() # seek to 0
reader = csv.reader(file,dialect=dialect)
2. When the file uses CR line breaks they are not recognized or I get: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?. But the InMemoryUploadedFile is already an opened file object.
My issue is very similar to this one but the solution mentioned for point 1 didn't work for me (as you can see my code is very similar) and point 2 isn't answered at all:
Proccessing a Django UploadedFile as UTF-8 with universal newlines
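One possible direction for both points, sketched for Python 3 rather than the Python 2 setup in the question: read the upload as bytes, decode it explicitly with a fallback, and normalise the line endings before handing the text to csv. The function name rows_from_upload and the cp1252 fallback are assumptions for illustration, not anything Django provides. The bytes would come from something like request.FILES['import'].read().

```python
import csv
import io


def rows_from_upload(raw, delimiter=",", fallback="cp1252"):
    """Decode uploaded bytes and parse them as CSV with any newline style."""
    try:
        text = raw.decode("utf-8-sig")   # also drops a UTF-8 BOM if present
    except UnicodeDecodeError:
        # Assumed fallback encoding; replace undecodable bytes, do not fail.
        text = raw.decode(fallback, errors="replace")
    # Normalise CRLF and bare CR to LF so csv sees ordinary line ends.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
```

Note that the normalisation also rewrites line breaks inside quoted fields, which may or may not matter for the data being imported.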

Python 2.7 "latin-1" encoding used instead of "UTF-8"

I am aware that there are plenty of discussions on the "UTF-8" encoding issue on Python 2 but I was unable to find a solution to my problem so far. I am currently creating a script to get the name of a file and hyperlink it in xlwt, so that the file can be accessed by clicks in the spreadsheet. Problem is, some of the names of these files include non-ASCII characters.
Question 1
I used the following line to retrieve the name of the file. There is only one file in the folder by the way.
>>f = filter(os.path.isfile, os.listdir(tmp_path))[0]
And then
>>print f
'521001ldrAvisoAcionistas(Retifica\xe7\xe3o)_doc'
>>print sys.stdout.encoding
'UTF-8'
>>f.decode("UTF-8")
*** UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 76: invalid continuation byte
From browsing the discussions here, I realized that "\xe7\xe3o" is not a "UTF-8" encoding. Running the following line seems to back this point.
>>f.decode("latin-1")
u'521001ldrAvisoAcionistas(Retifica\xe7\xe3o)_doc'
My question is then, why is the variable f being encoded in "latin-1" when the system encoding is set to "UTF-8"?
Question 2
While f.decode("latin-1") gives me the output that I want, I am still unable to supply the variable to the hyperlink function in the spreadsheet.
>>data.append(["File", xlwt.Formula('HYPERLINK("%s";"%s")' % (os.path.join(dl_path,f.decode("latin-1")),f.decode("latin-1")))])
*** FormulaParseException: can't parse formula HYPERLINK("u'H:\\Mad Lab\\SE Doc Crawler\\bovespa\\download\\521001ldrAvisoAcionistas(Retifica\xe7\xe3o)_doc's;"u'521001ldrAvisoAcionistas(Retifica\xe7\xe3o)_doc's)
Apparently, the closing double quote got eaten up and was replaced by a " 's" suffix. Can somebody help to explain what's going on here? 0.0
Oh and if someone can suggest a solution to Question 2 above then I will be very grateful - for you would have saved my weekend from misery!
Thanks in advance all!
Welcome to the confusing world of encoding! There's at least file encoding, terminal encoding and filename encoding to deal with, and all three could be different.
In Python 2.x, the goal is to get a Unicode string (different from str) from an encoded str. The problem is that you don't always know the encoding used for the str so it's difficult to decode it.
When using listdir() to get filenames, there's a documented but often overlooked quirk: if you pass a str to listdir() you get encoded strs back. These will be encoded according to your locale. On Windows these will be in an 8-bit character set, like windows-1252.
Alternatively, pass listdir() a Unicode string instead.
E.g.
os.listdir(u'C:\\mydir')
Note the u prefix.
This will return properly decoded Unicode filenames. On Windows and OS X, this is pretty reliable as long as your environment locale hasn't been messed with.
In your case, listdir() would return:
u'521001ldrAvisoAcionistas(Retifica\xe7\xe3o)_doc'
Again, note the u prefix. You can now print this to your PyCharm console with no modification.
E.g.
# tmp_path must be a unicode string, e.g. u'C:\\mydir'
f = filter(os.path.isfile, os.listdir(tmp_path))[0]
print f
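The same "type in, type out" behaviour can be seen in Python 3, where bytes and str play the roles of Python 2's str and unicode. The sketch below is only an illustration under that assumption. The directory, the file name report.txt and the helper list_files are made up for the example:

```python
import os
import tempfile

# Hypothetical scratch directory holding a single ASCII-named file.
tmp_path = tempfile.mkdtemp()
with open(os.path.join(tmp_path, "report.txt"), "w") as f:
    f.write("demo")


def list_files(directory):
    """Return names of regular files; the result type follows the argument."""
    return [name for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))]


text_names = list_files(tmp_path)                # text path -> text names
byte_names = list_files(os.fsencode(tmp_path))   # bytes path -> bytes names
```

The helper joins each name with the directory before calling os.path.isfile(), because listdir() returns bare names and isfile() would otherwise resolve them against the current working directory.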
As for Question 2, I did not investigate further but just printed the output as unicode strings, rather than xlwt objects, due to time constraints. I'm able to continue with the project, though without understanding what went wrong here. In that sense, the 2 questions above have been answered.

Unicode and more information about errors in Django

I'm a newbie. I use Django 1.7.5 with Python 2.7. When I execute the command
django-admin makemessages -a
I receive an error:
'ascii' codec can't encode characters in position 374-378: ordinal not in range(128)
Is there a way in Django to print out more information about errors? How can I find the file with the wrong characters? The traceback doesn't give the name of this file. I checked all templates and other files but found nothing.
I have # -*- coding: utf-8 -*- everywhere and my model has def __unicode__(self) method.
Your issue might be in the place where you convert the unicode to ASCII or whatever format you are trying to convert it to. Keep everything you already have in place, go over the code that does this conversion, and isolate and test it separately to see if it produces the result you want.
I think I might be a bit vague in this answer. This should have been a comment; however, I do not have 50 reputation points yet to make one.
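To narrow down which file contains the offending characters, one rough approach is to scan the project for bytes outside the ASCII range. This is only a sketch: the function name find_non_ascii and the list of extensions are assumptions, and makemessages may look at other file types as well.

```python
import io
import os


def find_non_ascii(root, extensions=(".py", ".html", ".txt")):
    """Yield (path, line_number, zero_based_column) for non-ASCII lines."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with io.open(path, "rb") as f:
                for lineno, line in enumerate(f, 1):
                    # bytearray yields integers on Python 2 and Python 3.
                    for col, byte in enumerate(bytearray(line)):
                        if byte > 127:
                            yield path, lineno, col
                            break  # one report per line is enough
```

Calling list(find_non_ascii('.')) from the project root gives a list of candidate locations to inspect by hand.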

Error thrown even if a line is commented

I have this line:
#str = u'Harsha: This has unicode character ♭.\n'
This line causes SyntaxError: Non-ASCII character '\xe2' even if it's commented.
If I remove this line the error is gone. Can anyone tell me what's wrong here?
I'm using PyCharm as IDE.
You want to add the following line at the top of your source file:
# -*- coding: utf-8 -*-
This tells Python the encoding of your source file.
Source: Working with utf-8 encoding in Python source
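For what it's worth, Python 3 exposes the lookup of this declaration through tokenize.detect_encoding(), which makes it easy to see what the interpreter would pick. The wrapper declared_encoding below is just a made-up convenience name. One difference from Python 2.7 to keep in mind: without a declaration Python 3 falls back to UTF-8, whereas Python 2 falls back to ASCII.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding Python 3 would use for these bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


with_cookie = b"# -*- coding: latin-1 -*-\n# caf\xe9\nx = 1\n"
with_shebang = b"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nx = 1\n"
without_cookie = b"x = 1\n"
```

The declaration is only honoured on the first or second line of the file, which is why it can follow a shebang line but cannot sit further down.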
You need to hint the proper file encoding.
The byte 0xE2 is 1110 0010 in binary.
This is ambiguous: it could be the lead byte of a three-byte UTF-8 sequence (which is the case here, since ♭ is encoded in UTF-8 as E2 99 AD) or a single character in an 8-bit extended-ASCII code page.
Python 2 defaults to ASCII (7-bit characters) for source files, so without a hint for parsing the code, every byte above 7 bits is considered ambiguous and leads to an error, even inside a comment.
You should either escape that character (for example u'\u266d') or hint the encoding to the Python interpreter with a coding declaration, which is the mechanism described in PEP 263.
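The byte values are easy to check in an interpreter. The snippet below is a small sketch that should run on both Python 2.7 and Python 3. The variable names are arbitrary.

```python
flat = u"\u266d"                      # the flat sign from the question
encoded = flat.encode("utf-8")        # three bytes in UTF-8
lead = bytearray(encoded)[0]          # first byte as an integer

# A UTF-8 lead byte of the form 1110xxxx announces a three-byte sequence.
is_three_byte_lead = (lead & 0xF0) == 0xE0

try:
    encoded.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False                  # 0xE2 is outside the 7-bit range
```

The first byte is the '\xe2' reported in the SyntaxError, and the ASCII decode fails on it, which is the same check the Python 2 parser applies to an undeclared source file.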