I have this line:
#str = u'Harsha: This has unicode character ♭.\n'
This line causes SyntaxError: Non-ASCII character '\xe2' even though it is commented out.
If I remove this line, the error is gone. Can anyone tell me what's wrong here?
I'm using PyCharm as IDE.
You want to add the following line at the top of your source file:
# -*- coding: utf-8 -*-
This tells Python which encoding your source file uses.
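As a rough illustration of how that declaration is read, the sketch below uses Python 3's tokenize.detect_encoding, which applies the PEP 263 rules to the first two lines of a source file. Two assumptions to keep in mind: it runs on Python 3, where a missing declaration falls back to UTF-8 rather than ASCII as in Python 2, and the sample sources and the helper name declared_encoding are made up for the example.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding the tokenizer would use to decode this source.

    Hypothetical helper: it only wraps tokenize.detect_encoding.
    """
    readline = io.BytesIO(source_bytes).readline
    encoding, _first_lines = tokenize.detect_encoding(readline)
    return encoding


# Made-up sample sources, one with a PEP 263 declaration and one without.
with_declaration = b"# -*- coding: latin-1 -*-\nname = 'x'\n"
without_declaration = b"name = 'x'\n"

print(declared_encoding(with_declaration))
print(declared_encoding(without_declaration))
```

The declaration only has an effect if it appears on the first or second line of the file.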
Source: Working with utf-8 encoding in Python source
You need to hint the proper file encoding.
As you know, the byte e2 is represented by the binary string
1110 ...
This is ambiguous because it could be the UTF-8 starting byte of a triplet, or just an Extended ASCII character (which is what you wanted).
Python 2 defaults to ASCII (7-bit characters), which means that without a hint for parsing the code, everything over 7 bits is considered ambiguous and hence leads to an error.
You should instead escape that character or, if possible, hint the encoding to the Python interpreter (I don't know if that is possible; I only found a proposal for it, and I don't know whether it has been implemented yet).
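To make the byte-level picture concrete, here is a small sketch. It assumes Python 3, whereas the original question concerns Python 2. It shows that 0xE2 is the first of the three UTF-8 bytes of ♭ (U+266D), that the same byte read as Latin-1 is a different character, and that writing the character as the escape \u266d keeps the source file pure ASCII.

```python
# MUSIC FLAT SIGN, written as an ASCII-only escape sequence.
flat = u"\u266d"

# Its UTF-8 encoding is a three-byte sequence whose first byte is 0xE2.
utf8_bytes = flat.encode("utf-8")
print([hex(b) for b in bytearray(utf8_bytes)])

# The leading bits 1110 mark the start of a three-byte UTF-8 sequence.
lead_byte = bytearray(utf8_bytes)[0]
print(format(lead_byte, "08b"))

# The same single byte, interpreted as Latin-1, is a different character.
as_latin1 = b"\xe2".decode("latin-1")
print(repr(as_latin1))

# The line from the question, rewritten so the source contains only ASCII.
line = u"Harsha: This has unicode character \u266d.\n"
```

With the escape, no encoding declaration is needed for this particular line, because the file no longer contains any non-ASCII bytes.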
When a Python script containing a non-ASCII character is compiled using py_compile.compile, it does not complain about the encoding. But when the script is imported in Python 2.7, it gives
SyntaxError: Non-ASCII character '\xe2' in file
Why is this happening? What's the difference between importing and compiling with py_compile?
It seems that Python provides two variants of its lexer: one used internally when Python itself parses files, and one that is exposed to Python code through e.g. __builtins__.compile or tokenize.generate_tokens. Only the former checks for non-ASCII characters, it seems. This is controlled by an #ifdef PGEN in Parser/tokenizer.c.
I have an educated guess as to why they did it this way: in Python 3, non-ASCII characters are permitted in .py files and are interpreted as UTF-8, IIRC. By silently permitting UTF-8 in the lexer, 2.7's tokenize.generate_tokens() function can accept all valid Py3 code.
This is the encoding declaration that I have at the top of my program
#!/usr/bin/env python
# -*- coding: utf-8 -*-
And yet I still get this error
SyntaxError: Non-ASCII character '\xfe' in file C:/Users/aaron/Desktop/Python/Bicycle_Diagnosis_System/Main.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
I have looked at the website it points to and trawled other websites, and I still can't find the answer. Any ideas? (I'm using PyCharm Community Edition as my IDE, if that affects it.)
Any help is much appreciated!
Try adding #coding=utf-8 on line 1 and re-running.
Your file is saved as UTF-16 with BOM (big-endian). I saved your sample code in Notepad++ with that encoding and reproduced the error:
File "test.py", line 1
SyntaxError: Non-ASCII character '\xfe' in file x.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Make sure your file is saved in the encoding you declared. You have to use an encoding compatible with ASCII for the hash bang and encoding lines to be read properly. UTF-16 is not compatible, hence the error message when Python reads the non-ASCII bytes of the byte order mark (BOM) character.
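One way to check for this situation is to look at the first bytes of the file. The sketch below is only an illustration: sniff_bom and to_utf8 are hypothetical helper names, and it assumes the file really does start with a BOM, which not every UTF-16 file does. The BOM constants come from the standard codecs module.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def to_utf8(data):
    """Strip a leading BOM, if any, and re-encode the content as plain UTF-8."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name).encode("utf-8")
    return data  # no BOM found: leave the bytes untouched


# A made-up file body saved as big-endian UTF-16 with a BOM.
sample = codecs.BOM_UTF16_BE + u"#!/usr/bin/env python\n".encode("utf-16-be")
print(sniff_bom(sample))
```

The first byte of the big-endian UTF-16 BOM is 0xFE, which matches the '\xfe' on line 1 reported in the error message.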
Can anyone tell me how to set UTF-8 as the default encoding in an Odoo build?
Note: I have put "# -*- coding: utf-8 -*-" in all the files, but it has no effect on the encoding I expect.
If you put # coding: utf-8 at the top of a Python module, this affects how Python interprets the source code. This is important if you have string literals with non-ASCII characters in your code, so that they represent the correct characters.
However, since you talk about "default encoding", I assume you care about the encoding of text files opened for reading or writing. In Python 2.x, the default for reading and writing files is not to decode/encode at all. I don't think you can change this default (because the built-in function open simply doesn't support encoding), but you can use io.open() or codecs.open() to open files with an explicit encoding.
Thus, to read from a file encoded with UTF-8, open it as follows:
import io

with io.open(filename, encoding='utf-8') as f:
    for line in f:
        ...
In Python 3, built-in open() is the same as io.open(), and the default encoding is platform-dependent.
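A minimal sketch of why passing the encoding explicitly is the safer habit: the code below prints the platform's preferred encoding, then writes and reads a throwaway file with encoding='utf-8' so the result does not depend on that platform setting. The file name and sample text are made up for the example.

```python
import io
import locale
import os
import tempfile

# The platform-dependent encoding that open() typically falls back to
# when no encoding argument is given.
print(locale.getpreferredencoding(False))

text = u"caf\u00e9 \u00a7\n"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # throwaway file

# Write and read with an explicit encoding, independent of the platform.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

with io.open(path, encoding="utf-8") as f:
    round_tripped = f.read()

# Inspect the raw bytes that ended up on disk.
with io.open(path, "rb") as f:
    raw = f.read()
```

io.open accepts the encoding argument on both Python 2 and Python 3, so the same code works on either.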
When I run my tests I get a syntax error: SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xa7 in position 0: invalid start byte
The cause seems to be that I use a § in a string on line 62. I'm using Python 3.4.2 for the project and have used § elsewhere without getting an error. I got a friend to open the project as well; on his screen the § in tests.py showed up as question marks, but only in the test files. Everywhere else it had been used, it showed up as normal. I got him to change the § that were showing up as question marks to § on his PC and it worked, which is really weird. How would I go about fixing something like this on my computer, though? I can't really get him to load the file and insert special characters every time I want to use them in tests.
Edit: So I found out that PyCharm had, for some reason, set only tests.py to an encoding other than UTF-8. I changed this to UTF-8, and it then showed the § I had written as question marks. However, swapping them out for § did not work for me. The reason is that, even though the encoding is set to UTF-8, PyCharm still displays Latin-1 for me and types Latin-1 characters instead of UTF-8. I've tested on two other computers (one Mac, and one Windows 8.1 machine, the same as the one I have problems with) where it correctly displays UTF-8. On those computers my § still appear as question marks, but if I change it on the other computer, it then appears as § on the computer with the problem. So my problem now is to get PyCharm to properly use UTF-8 instead of Latin-1.
OK, so I found the problem. As patrys suggested in a comment, the file didn't use UTF-8 as its encoding. To change that in PyCharm I had to go to File -> Settings -> Editor -> File Encodings and change the file encoding for tests to UTF-8. After I did that, I had to go into the file and re-edit the §, as they had now turned into question marks. However, it still didn't work. I found out that I also had to change it to UTF-8 down in the right corner of PyCharm. For some reason tests is the only .py file that was affected by this (even though I deleted the original tests.py file and remade it).
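Outside the IDE, the same problem can be checked and repaired with a few lines of Python. This is only a sketch under a stated assumption: that the offending bytes really are Latin-1, as the byte 0xa7 (§ in Latin-1) in the error message suggests. The helper names and the sample bytes are hypothetical.

```python
def first_invalid_utf8(data):
    """Return (offset, byte value) of the first byte that is not valid UTF-8.

    Returns None when the whole input decodes cleanly.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, bytearray(data)[exc.start]
    return None


def latin1_to_utf8(data):
    """Re-encode bytes as UTF-8, assuming they are currently Latin-1."""
    return data.decode("latin-1").encode("utf-8")


# Made-up file content containing a Latin-1 encoded section sign (0xa7).
broken = b"label = '\xa7 12'\n"

print(first_invalid_utf8(broken))
fixed = latin1_to_utf8(broken)
print(first_invalid_utf8(fixed))
```

Note that latin1_to_utf8 should only be applied to a file that is not already UTF-8; running it on valid UTF-8 would double-encode the non-ASCII characters.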
I get a 'Malformed UTF-8 character' error when I pass some scalar data to XML::Simple or Data::Dumper. There are regular expressions on the lines where the error occurs.
Malformed UTF-8 character (fatal) at /usr/share/perl5/XML/Simple.pm line 1690.
Malformed UTF-8 character (fatal) at /usr/lib/perl/5.10/Data/Dumper.pm line 682.
So far I have been unable to reproduce the error with a small piece of code.
XML::Simple 2.18
Data::Dumper 2.124
perl v5.10.1
The problem arose because somewhere deep in the application's code there was a call to Encode::_utf8_on on a scalar that wasn't a proper UTF-8 string.
You could try piping your data through Encoding::FixLatin. If the 'binary' bytes you're encountering are actually Latin-1 characters, then they'll get converted to valid UTF-8. If they really are random binary bytes, then they should at least get converted to random (but valid) UTF-8 characters :-)
The core Encode module provides facilities for Handling Malformed Data. I have never used them myself, though.