PyCharm issue on encoding - python-2.7

I am trying PyCharm and am facing an encoding issue. Can you please help me resolve it?
code:
# -*- coding: utf-8 -*-
__author__ = 'me'
import os, sys
def main():
    print repeat('mike', False)
    print repeat('mok', True)
"""
comments here..
"""
def repeat(s, exclaim):
    result = s * 3
    if exclaim:
        result = result + '!!!'
    return result
if __name__ == '__main__':
    main()
error:
C:\Python27\python.exe C:\Python27\python.exe C:/Users/prakashs/PycharmProjects/GooglePython/WarmUp.py
File "C:\Python27\python.exe", line 1
SyntaxError: Non-ASCII character '\x90' in file C:\Python27\python.exe on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Process finished with exit code 1
I have also set PyCharm's default encoding to UTF-8, but I need to know where in PyCharm the settings have to be edited.
Thank you.

Googling for Non-ASCII character '\x90' in file gives, as the first hit, the Stack Overflow question titled Using #-*- coding: utf-8 -*- does not remove "Non-ASCII character '\x90' in file hello.exe on line 1, but no encoding declared" error. There you'll find the answer to your question.
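Your command is wrong: it starts with C:\Python27\python.exe C:\Python27\python.exe... (python.exe is mentioned twice), which means you are trying to run the executable (python.exe) as a script instead of the script file (WarmUp.py). Make sure the script path in your PyCharm run configuration points to WarmUp.py, not to python.exe.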
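Why '\x90' in particular? A Windows executable is a binary file that typically begins with the bytes MZ followed by 0x90 0x00, so when python.exe is handed to the interpreter as the script, 0x90 is the first non-ASCII byte on its "line 1". The sketch below is a hypothetical helper (not part of PyCharm or Python) that mimics, in simplified form, the check behind this SyntaxError: it reports the first byte >= 0x80 and the line it is on. The sample bytes are an assumed stand-in for the start of an .exe file.

```python
def first_non_ascii(data):
    """Return (line_number, byte_value) of the first byte >= 0x80, or None."""
    line = 1
    for byte in bytearray(data):  # bytearray yields ints on both Python 2 and 3
        if byte == 0x0A:  # '\n' starts a new line
            line += 1
        elif byte >= 0x80:
            return line, byte
    return None


# Typical first bytes of a Windows executable ("MZ" header)
exe_start = b"MZ\x90\x00\x03\x00"
script = b"print('hello')\n"

print(first_non_ascii(exe_start))  # (1, 144), i.e. byte 0x90 on line 1
print(first_non_ascii(script))     # None
```

A real .py script contains only text, so this check passes for it; a binary file fails on its very first line, which matches the error above.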

Related

UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 12

I'm developing a chatbot with the ChatterBot library. The chatbot is in my native language, Slovene, which has a lot of special characters (for example: š, č, ž). I'm using Python 2.7.
When I try to train the bot, the library has trouble with the characters mentioned above. For example, when I run the following code:
chatBot.set_trainer(ListTrainer)
chatBot.train([
"Koliko imam še dopusta?",
"Letos imate še 19 dni dopusta.",
])
it throws the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 12: invalid start byte
I added the # -*- coding: utf-8 -*- line to the top of my file, changed the encoding of all the files I use to UTF-8 in my editor (Sublime Text 3), and changed the system default encoding with the following code:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
The strings are of type unicode.
When I try to get a response containing these special characters, it works without any issues. For example, running the following code in the same execution as the training code above (after changing 'š' to 's' and 'č' to 'c' in the training strings) throws no errors:
chatBot.set_trainer(ListTrainer)
chatBot.train([
"Koliko imam se dopusta?",
"Letos imate se 19 dni dopusta.",
])
chatBot.get_response("Koliko imam še dopusta?")
I can't find a solution to this issue. Any suggestions?
Thanks loads in advance. :)
EDIT: I used from __future__ import unicode_literals to make the strings of type unicode, and I checked that they really were unicode with type(myString).
I would also like to paste this link.
EDIT 2: @MallikarjunaraoKosuri's code works, but in my case there was one more argument in the chatbot instance initialization:
chatBot = ChatBot(
'Test',
trainer='chatterbot.trainers.ListTrainer',
storage_adapter='chatterbot.storage.JsonFileStorageAdapter'
)
This is the cause of my error: the JSON storage file the chatbot creates is written in my local encoding, not in UTF-8. The default storage (.sqlite3) doesn't seem to have this issue, so for now I'll just avoid the JSON storage. But I am still interested in a solution to this error.
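The general fix for the JSON case is to write and read the file with an explicit encoding instead of the platform's locale encoding. This sketch does not touch ChatterBot's JsonFileStorageAdapter (its internals are not shown here); save_json_utf8 and load_json_utf8 are hypothetical helpers that illustrate the idea with io.open, which accepts an encoding argument on both Python 2.7 and 3.

```python
import io
import json


def save_json_utf8(path, data):
    """Write data as JSON encoded in UTF-8, regardless of the locale encoding (e.g. cp1252)."""
    text = json.dumps(data, ensure_ascii=False)
    if isinstance(text, bytes):  # Python 2 may return a byte str here
        text = text.decode('utf-8')
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)


def load_json_utf8(path):
    """Read back a JSON file that was written in UTF-8."""
    with io.open(path, 'r', encoding='utf-8') as f:
        return json.load(f)
```

For example, save_json_utf8('conversation.json', {u'text': u'Koliko imam \u0161e dopusta?'}) stores š as the UTF-8 bytes 0xC5 0xA1, so the file reads back the same on any machine. Leaving ensure_ascii at its default (True) would also work, since every non-ASCII character is then escaped as \uXXXX and the file is pure ASCII.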
The strings from your example are not of type unicode.
Otherwise Python would not throw the UnicodeDecodeError.
This error means that at some point during the program's execution Python tries to decode a byte string into unicode and fails.
In your case the reason is that the decoding is done with UTF-8, while your source file is not in UTF-8 but almost certainly in cp1252:
import unicodedata
b = '\x9a'
# u = b.decode('utf-8') # UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a
# in position 0: invalid start byte
u = b.decode('cp1252')
print unicodedata.name(u) # LATIN SMALL LETTER S WITH CARON
print u # š
So, the 0x9a byte from your cp1252 source can't be decoded with utf-8.
The best solution is simply to convert your source file to UTF-8.
With Sublime Text 3 you can easily do it by: File -> Reopen with Encoding -> UTF-8.
But don't forget to copy (Ctrl+C) your source code before the conversion, because right after it all your š, č, ž characters will be replaced with ?; paste the copied code back in afterwards.
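If you would rather not rely on the editor (and risk losing the š, č, ž characters), the conversion can also be scripted. convert_to_utf8 is a hypothetical helper, sketched under the assumption that the file really is cp1252 as diagnosed above: it decodes the file with that codec and writes the same text back as UTF-8.

```python
import io


def convert_to_utf8(src_path, dst_path, src_encoding='cp1252'):
    """Re-encode a text file from src_encoding (assumed cp1252) to UTF-8."""
    # newline='' disables newline translation, so \r\n line endings are kept as-is
    with io.open(src_path, 'r', encoding=src_encoding, newline='') as src:
        text = src.read()
    with io.open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)
```

Usage would be, e.g., convert_to_utf8('bot.py', 'bot_utf8.py'). The cp1252 byte 0x9a becomes the UTF-8 pair 0xC5 0xA1 (š). If the file contains bytes that cp1252 leaves undefined (such as 0x81 or 0x90), decoding raises UnicodeDecodeError, which is a sign the cp1252 assumption is wrong.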
Others have already suggested good partial solutions, but I would like to combine them into one.
The library's author, @gunthercox, has also described some guidelines here: http://chatterbot.readthedocs.io/en/stable/encoding.html#how-do-i-fix-python-encoding-errors
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
# Create a new chat bot named Test
chatBot = ChatBot(
'Test',
trainer='chatterbot.trainers.ListTrainer'
)
chatBot.train([
"Koliko imam še dopusta?",
"Letos imate še 19 dni dopusta.",
])
Python Terminal
>>> # -*- coding: utf-8 -*-
... from chatterbot import ChatBot
>>>
>>> # Create a new chat bot named Test
... chatBot = ChatBot(
... 'Test',
... trainer='chatterbot.trainers.ListTrainer'
... )
>>>
>>> chatBot.train([
... "Koliko imam še dopusta?",
... "Letos imate še 19 dni dopusta.",
... ])
List Trainer: [####################] 100%
>>>

Why doesn't my text-cleaning function work without decoding from UTF-8?

I wrote the following function in Python 2.7 to clean text, but it doesn't work unless I decode the tweet variable from UTF-8:
# -*- coding: utf-8 -*-
import re
def clean_tweet(tweet):
    tweet = re.sub(u"[^\u0622-\u064A]", ' ', tweet, flags=re.U)
    return tweet
if __name__ == "__main__":
    s="sadfas سيبس sdfgsdfg/dfgdfg ffeee منت منشس يت??بمنشس//تبي منشكسميكمنشسكيمنك ٌاإلا رًاٌااًٌَُ"
    print "not working "+clean_tweet(s)
    print "working "+clean_tweet(s.decode("utf-8"))
Could anyone explain why?
I don't want to decode, because it makes manipulating the text in an SFrame in GraphLab too slow.
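The behaviour can be reproduced with a short sketch. In Python 2 a plain "..." literal is a byte str, so each Arabic letter in s is stored as two UTF-8 bytes (for example س, U+0633, is 0xD8 0xB3), and the pattern compares those byte values, not the letters, against the range U+0622 to U+064A. Every byte falls outside that range and is replaced with a space. Because Python 3 refuses to apply a text pattern to bytes, the sketch simulates a Python 2 byte str by decoding the UTF-8 bytes as Latin-1, which maps each byte to the code point with the same value; simulate_py2_str is a hypothetical helper for this illustration only.

```python
# -*- coding: utf-8 -*-
import re

# Pattern from the question: anything outside U+0622..U+064A is replaced
NON_ARABIC = re.compile(u"[^\u0622-\u064A]", flags=re.U)


def clean_tweet(tweet):
    return NON_ARABIC.sub(u' ', tweet)


def simulate_py2_str(text):
    """Hypothetical helper: one character per UTF-8 byte, like a Python 2 str."""
    return text.encode('utf-8').decode('latin-1')


tweet = u"sadfas \u0633\u064a\u0628\u0633 sdfg"  # "sadfas سيبس sdfg"

print(len(clean_tweet(simulate_py2_str(tweet)).strip()))  # 0: every byte was replaced
print(clean_tweet(tweet).strip() == u"\u0633\u064a\u0628\u0633")  # True
```

So the decode step is what turns the bytes into letters the pattern can match. If decoding on every call is the bottleneck, one option (not measured here) is to decode each tweet once when it is loaded and keep the unicode text.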

Python error: must be char, not unicode

def get_point_id(point_type, id):  # id is a str, like "001" or "1001.0"
    if not id:
        return None
    if True:
        str_id = str(int(id))
        if point_type == 'alarm':
            return str_id.rjust(3, '0')  # error
When I run it, the console displays:
File "b2_insert_standard_template_signals.py", line 118, in get_point_id
return str_id.rjust(3,'0')
TypeError: must be char, not unicode
str_id is a str, not unicode, so I don't know why this happens. This .py file already declares # -*- coding: utf-8 -*-.
My Python version is 2.7.3. Please help me.
OK, I know what happened: I declared # -*- coding: utf-8 -*- at the top of the .py file, so the second argument of the rjust method was treated as unicode rather than as a byte string.
That is wrong: the second argument of rjust must be a str.
It's funny.
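One caveat and a sketch. In Python 2 a coding declaration on its own only tells the parser how to decode the source file; plain '0' literals stay byte strs. The fill character becomes unicode if, for example, from __future__ import unicode_literals is in effect (an assumption, since the rest of the file isn't shown), and str.rjust then rejects it with this TypeError. The sketch below sidesteps the fill character entirely with zfill(), which pads with zeros by itself; for non-negative ids like these the result matches rjust(3, '0'). It also parses through float(), because int("1001.0") raises ValueError for ids like the "1001.0" mentioned in the code comment. The parameter is renamed point_id only to avoid shadowing the built-in id.

```python
def get_point_id(point_type, point_id):
    """Zero-pad alarm ids to three digits; other types fall through to None, as in the question."""
    if not point_id:
        return None
    # int("1001.0") raises ValueError, so go through float() first
    str_id = str(int(float(point_id)))
    if point_type == 'alarm':
        # zfill() pads with '0' itself, so no str/unicode fill character is involved
        return str_id.zfill(3)
    return None
```

For example, get_point_id('alarm', '1') gives '001' and get_point_id('alarm', '1001.0') gives '1001'.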

wolframalpha api syntax error

I am working with the 'wolframalpha' API and keep getting this error. I tried searching but didn't find any post that fixes it. If you know how, please help me fix this error:
File "jal.py", line 9
app_id=’PR5756-H3EP749GGH'
^
SyntaxError: invalid syntax
Please help; I have to show the project tomorrow :(
My code is:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import wolframalpha
import sys
app_id=’PR5756-H3EP749GGH'
client = wolframalpha.Client(app_id)
query = ‘ ‘.join(sys.argv[1:])
res = client.query(query)
if len(res.pods) > 0:
    texts = “”
    pod = res.pods[1]
    if pod.text:
        texts = pod.text
    else:
        texts = “I have no answer for that”
    # skip non-ASCII characters in case of error
    texts = texts.encode(‘ascii’, ‘ignore’)
    print texts
else:
    print “Sorry, I am not sure.”
You used a typographic (curly) quote (’) instead of a straight single quote (') on that line:
app_id='PR5756-H3EP749GGH'
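Python points you directly at the error with the ^ marker. The other curly quotes in your code (‘ ’ and “ ”) will cause the same error, so replace them with straight quotes too.
Also, use an editor with syntax highlighting.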
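Curly quotes often sneak in when code is typed in a word processor or copied from a web page that "prettifies" quotes. As a sketch, the hypothetical helpers below list every typographic quote in a piece of source and replace each with its ASCII counterpart; they check only the four common characters ‘ ’ “ ”, which is an assumption, not a complete list.

```python
# -*- coding: utf-8 -*-

# Typographic quotes that Python rejects where a quote is expected (not exhaustive)
CURLY_QUOTES = {
    u'\u2018': u"'",  # left single quotation mark
    u'\u2019': u"'",  # right single quotation mark
    u'\u201c': u'"',  # left double quotation mark
    u'\u201d': u'"',  # right double quotation mark
}


def find_curly_quotes(source):
    """Return (line_number, character) pairs for every curly quote in source."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for char in line:
            if char in CURLY_QUOTES:
                hits.append((number, char))
    return hits


def straighten_quotes(source):
    """Replace each curly quote with its ASCII counterpart."""
    return u''.join(CURLY_QUOTES.get(char, char) for char in source)
```

Review the result before saving: a curly apostrophe inside a string literal may be intentional, and straightening it inside a single-quoted string would end the string early.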

django.http.HttpResponse does not deal with unicode properly

Following Tutorial 3, I have written this trivial views.py:
# coding = UTF-8
from django.http import HttpResponse
def index(request):
    return HttpResponse( u"Seznam kontaktů" )
I also tried other tricks, such as using django.utils.encoding.smart_unicode(...), the u"%s" % ... trick, etc.
Whatever I try, I always get "Non-ASCII character" error:
SyntaxError at /kontakty/
Non-ASCII character '\xc5' in file C:\Users\JindrichVavruska\eclipse\workspace\ars\src\ars_site\party\views.py
on line 5, but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details (views.py, line 5)
It is even more mysterious because I used a lot of national character strings in other files, such as models.py, e.g. text = models.CharField( u"Všechen text", max_length = 150), and there was absolutely no problem at all.
I found the other answers on this site irrelevant; the suggested changes make no difference in my views.py.
Jindra
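It should be # -*- coding: utf-8 -*-. In # coding = UTF-8 the space between coding and = stops Python from recognizing the declaration (the name UTF-8 itself would be accepted). See PEP-263 for more details. You should also save the file as UTF-8; check your editor's settings.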
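To check whether Python will actually see a declaration, the sketch below applies the regular expression given in PEP 263 to the first two lines of a file, which is where the declaration has to be. declared_encoding is a hypothetical helper, and it is simplified: the pattern is reproduced from the PEP rather than taken from Python's own tokenizer, and it ignores finer rules such as what the first line may contain when the declaration is on line 2.

```python
import re

# Pattern from PEP 263: "coding" must be followed directly by ":" or "="
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def declared_encoding(source):
    """Return the declared source encoding, or None if Python would not see one."""
    for line in source.splitlines()[:2]:  # only lines 1 and 2 are checked
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


print(declared_encoding('# coding = UTF-8\n'))         # None: space before '='
print(declared_encoding('# -*- coding: utf-8 -*-\n'))  # utf-8
print(declared_encoding('# coding=UTF-8\n'))           # UTF-8
```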