Python error: must be char, not unicode (Python 2.7)

def get_point_id(point_type, id):  # id is a str, like "001" or "1001.0"
    if not id:
        return None
    if True:
        str_id = str(int(id))
        if point_type == 'alarm':
            return str_id.rjust(3, '0')  # error
When I run it, the console displays:
File "b2_insert_standard_template_signals.py", line 118, in get_point_id
return str_id.rjust(3,'0')
TypeError: must be char, not unicode
str_id is a str, not unicode, so I don't know why this happens. The .py file already declares # -*- coding: utf-8 -*-
My Python version is 2.7.3. Please help me.

OK, I know what happened. It is because I declared # -*- coding: utf-8 -*- at the top of the .py file,
so the second argument of the rjust method ended up as unicode. That is wrong:
the second argument of the rjust method must be a str.
Funny.
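A minimal sketch of one way to avoid the error, under some assumptions: on Python 2.7 a plain '0' literal normally becomes unicode when from __future__ import unicode_literals is in effect, and whether that applies to the original file is a guess. The parameter name point_id, the float() conversion (added so that "1001.0" parses) and the final return for non-alarm points are additions for illustration, not part of the original function.

```python
def get_point_id(point_type, point_id):
    """Return a zero-padded id for 'alarm' points, else the plain integer string."""
    if not point_id:
        return None
    # float() first so that inputs like "1001.0" are accepted as well as "001"
    # (an assumption about the intended behaviour).
    str_id = str(int(float(point_id)))
    if point_type == 'alarm':
        # str('0') is a native str on both Python 2 and Python 3, so rjust()
        # never receives a unicode fill character on Python 2.
        return str_id.rjust(3, str('0'))
    return str_id
```

Wrapping the fill character in str() keeps the call working regardless of how string literals are interpreted in the module.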

Related

UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 12

I'm developing a chatbot with the chatterbot library. The chatbot is in my native language --> Slovene, which has a lot of strange characters (for example: š, č, ž). I'm using python 2.7.
When I try to train the bot, the library has trouble with the characters mentioned above. For example, when I run the following code:
chatBot.set_trainer(ListTrainer)
chatBot.train([
    "Koliko imam še dopusta?",
    "Letos imate še 19 dni dopusta.",
])
it throws the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 12: invalid start byte
I added the # -*- coding: utf-8 -*- line to the top of my file, I also changed the encoding of all used files via my editor (Sublime text 3) to utf-8, I changed the system default encoding with the following code:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
The strings are of type unicode.
When I try to get a response containing these strange characters, it works without any issues. For example, running the following code in the same execution as the training code above (after changing 'š' to 's' and 'č' to 'c' in the training strings) throws no errors:
chatBot.set_trainer(ListTrainer)
chatBot.train([
    "Koliko imam se dopusta?",
    "Letos imate se 19 dni dopusta.",
])
chatBot.get_response("Koliko imam še dopusta?")
I can't find a solution to this issue. Any suggestions?
Thanks loads in advance. :)
EDIT: I used from __future__ import unicode_literals, to make strings of type unicode. I also checked if they really were unicode with the method type(myString)
I would also like to paste this link.
EDIT 2: @MallikarjunaraoKosuri's code works, but in my case I had one more thing inside the chatbot instance initialization, which is the following:
chatBot = ChatBot(
    'Test',
    trainer='chatterbot.trainers.ListTrainer',
    storage_adapter='chatterbot.storage.JsonFileStorageAdapter'
)
This is the cause of my error. The JSON storage file that the chatbot creates is written in my local encoding and not in utf-8. The default storage (.sqlite3) doesn't seem to have this issue, so for now I'll just avoid the JSON storage. But I am still interested in finding a solution to this error.
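For reference, here is a generic sketch of writing and reading a JSON file with an explicit UTF-8 encoding instead of the platform default. It is not ChatterBot's storage adapter code; the helper names and the file name database.json are hypothetical and only illustrate the idea.

```python
import io
import json
import os
import tempfile


def save_json_utf8(path, data):
    # ensure_ascii=False keeps characters such as "š" readable in the file;
    # io.open(..., encoding="utf-8") fixes the on-disk encoding explicitly.
    text = json.dumps(data, ensure_ascii=False)
    if isinstance(text, bytes):  # Python 2 may return a byte string here
        text = text.decode("utf-8")
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def load_json_utf8(path):
    # Read back with the same explicit encoding, never the local code page.
    with io.open(path, "r", encoding="utf-8") as handle:
        return json.loads(handle.read())


demo_path = os.path.join(tempfile.mkdtemp(), "database.json")  # hypothetical name
save_json_utf8(demo_path, {"text": u"Koliko imam \u0161e dopusta?"})
loaded = load_json_utf8(demo_path)
```

Because both helpers name the encoding, the file content no longer depends on the local code page of the machine.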
The strings from your example are not of type unicode.
Otherwise Python would not throw the UnicodeDecodeError.
This type of error means that at some step of the program's execution Python tries to decode a byte string into unicode but fails.
In your case the reasons are:
decoding is configured to use utf-8
your source file is not in utf-8 and is almost certainly in cp1252:
import unicodedata
b = '\x9a'
# u = b.decode('utf-8') # UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a
# in position 0: invalid start byte
u = b.decode('cp1252')
print unicodedata.name(u) # LATIN SMALL LETTER S WITH CARON
print u # š
So the 0x9a byte from your cp1252 source can't be decoded as utf-8.
The best solution is simply to convert your source file to utf-8.
With Sublime Text 3 you can easily do this via File -> Reopen with Encoding -> UTF-8.
But don't forget to copy (Ctrl+C) your source code before the conversion, because right after it all your š, č, ž characters will be replaced with ?.
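The same conversion can also be scripted. This is a minimal sketch that assumes the file really is cp1252; the function name convert_to_utf8 and the demo file train.py are made up for illustration.

```python
import io
import os
import tempfile


def convert_to_utf8(path, source_encoding="cp1252"):
    # Decode with the encoding the file was really saved in ...
    with io.open(path, "r", encoding=source_encoding, newline="") as handle:
        text = handle.read()
    # ... then write the same text back as UTF-8 (newline="" keeps line endings).
    with io.open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


# Demo: a throwaway file that contains the cp1252 byte 0x9a ("š").
demo_path = os.path.join(tempfile.mkdtemp(), "train.py")
with io.open(demo_path, "wb") as handle:
    handle.write(b'text = "Koliko imam \x9ae dopusta?"\n')

convert_to_utf8(demo_path)

with io.open(demo_path, "rb") as handle:
    converted = handle.read()
```

Keep a backup of the original file before running something like this, since decoding with the wrong source encoding silently produces wrong characters.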
Others have already suggested good partial solutions, but I would like to combine them into one.
The author, @gunthercox, also describes some guidelines here: http://chatterbot.readthedocs.io/en/stable/encoding.html#how-do-i-fix-python-encoding-errors
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
# Create a new chat bot named Test
chatBot = ChatBot(
    'Test',
    trainer='chatterbot.trainers.ListTrainer'
)
chatBot.train([
    "Koliko imam še dopusta?",
    "Letos imate še 19 dni dopusta.",
])
Python Terminal
>>> # -*- coding: utf-8 -*-
... from chatterbot import ChatBot
>>>
>>> # Create a new chat bot named Test
... chatBot = ChatBot(
...     'Test',
...     trainer='chatterbot.trainers.ListTrainer'
... )
>>>
>>> chatBot.train([
...     "Koliko imam še dopusta?",
...     "Letos imate še 19 dni dopusta.",
... ])
List Trainer: [####################] 100%
>>>

Python 2.7 : UnicodeDecodeError when I use character point with import socket connect

I work with Python under Windows. I get the error "UnicodeDecodeError: 'utf8' codec can't decode byte 0x92" when I execute this simple code:
import socket
s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((controlAddr, 9051))
controlAddr is "127.0.0.1", and I believe it is the '.' character that causes the problem, so I tried different conversions, but each time I get the same error. I tried these different ways:
controlAddr = u'127.0.0.1'
controlAddr = unicode('127.0.0.1')
controlAddr.encode('utf-8')
controlAddr = u'127'+unichr(ord('\x2e'))+u'0'+unichr(ord('\x2e'))+'0'+unichr(ord('\x2e'))+u'1'
I added # -*- coding: utf-8 -*- at the beginning of the main file and of the socket.py file.
... but I still get the same error.
Your error says "'utf8' codec can't decode byte 0x92". In Windows code page 1252, this byte maps to U+2019, the right single quotation mark ’.
It is likely that the editor you use for your Python script is configured to replace the single quote ('\x27' or ') by the right quotation mark. It may be nicer for text, but is terrible in source code. You must fix it in your editor, or use another editor.
The error message says you have a byte 0x92 somewhere in your file, which is not valid in utf-8 but may be valid in other encodings, for example:
>>> b'\x92'.decode('windows-1252')
'’'
That means that your file encoding is not utf-8 but probably windows-1252, and the problematic character is the right single quotation mark, not the dot, even if that character appears only in a comment.
So either change your file encoding to utf-8 in your editor, or change the encoding line to
# -*- coding: windows-1252 -*-
The error message doesn't mention the file the interpreter choked on, but it may be your "main" file, not socket.py.
Also, don't name your file socket.py: that will shadow the standard-library socket module and lead to further errors.
An encoding line only affects the file it appears in, so you need one in every file; setting it only in your "main" file is not enough.
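To find where such a byte sits, one can try to decode the raw bytes and read the position from the exception. A small sketch follows; the function name first_invalid_utf8_byte and the sample line are invented for illustration.

```python
def first_invalid_utf8_byte(data):
    """Return (offset, byte_value) of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as error:
        # error.start is the offset of the offending byte in the input.
        return error.start, bytearray(data)[error.start]
    return None


# Invented sample: a comment typed with a "smart" quote and saved as windows-1252.
sample = b"s.connect((controlAddr, 9051))  # don\x92t forget\n"
offset, value = first_invalid_utf8_byte(sample)

# Decoding just that byte as windows-1252 shows which character it was meant to be.
decoded = sample[offset:offset + 1].decode("cp1252")
```

In practice the bytes would come from reading the suspect file in binary mode rather than from a literal.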
Thank you! Indeed, this byte is not valid in utf-8.
However, I didn't send the character "’", which corresponds to 0x92 in windows-1252 and to nothing in utf-8. Furthermore, this error appears when a "." character is in controlAddr, and that character has the same hexadecimal code, 0x2e, in both encodings.
The complete error message is given below:
Traceback (most recent call last):
  File "C:\Python27\Lib\site-packages\spyderlib\widgets\externalshell\pythonshell.py", line 566, in write_error
    self.shell.write_error(self.get_stderr())
  File "C:\Python27\Lib\site-packages\spyderlib\widgets\externalshell\baseshell.py", line 272, in get_stderr
    return self.transcode(qba)
  File "C:\Python27\Lib\site-packages\spyderlib\widgets\externalshell\baseshell.py", line 258, in transcode
    return to_text_string(qba.data(), 'utf8')
  File "C:\Python27\Lib\site-packages\spyderlib\py3compat.py", line 134, in to_text_string
    return unicode(obj, encoding)
  File "C:\Python27\Lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 736: invalid start byte
For this code :
controlPort = 9051
controlAddr = unicode("127.0.0.1")
import socket
s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((controlAddr, controlPort))

PyCharm issue on encoding

I am trying PyCharm and facing an encoding issue. Can you please help me resolve it?
code:
# -*- coding: utf-8 -*-
__author__ = 'me'
import os, sys
def main():
    print repeat('mike',False)
    print repeat('mok', True)
"""
comments here..
"""
def repeat(s,exclaim):
    result = s*3
    if exclaim:
        result = result +'!!!'
    return result
if __name__ == '__main__':
    main()
error:
C:\Python27\python.exe C:\Python27\python.exe C:/Users/prakashs/PycharmProjects/GooglePython/WarmUp.py
File "C:\Python27\python.exe", line 1
SyntaxError: Non-ASCII character '\x90' in file C:\Python27\python.exe on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Process finished with exit code 1
I have set the default encoding in PyCharm to utf-8 as well, but I need to know where in PyCharm the settings have to be edited.
Thank you.
Googling for "Non-ASCII character '\x90' in file" gives the Stack Overflow question "Using #-*- coding: utf-8 -*- does not remove "Non-ASCII character '\x90' in file hello.exe on line 1, but no encoding declared" error" as the first hit. There you'll find the answer to your question.
Your command is wrong: it starts with C:\Python27\python.exe C:\Python27\python.exe... (python.exe is mentioned twice), which means you are trying to run the executable (python.exe) as a script instead of the script file (WarmUp.py).
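The '\x90' on "line 1" fits this explanation: Windows executables start with the two-byte signature MZ, typically followed by the byte 0x90. A quick sketch for checking whether a path points at an executable rather than a script is shown below; the helper name and the demo files are made up and only stand in for python.exe and WarmUp.py.

```python
import io
import os
import tempfile


def looks_like_windows_executable(path):
    """Return True if the file starts with the 'MZ' signature of a Windows executable."""
    with io.open(path, "rb") as handle:
        return handle.read(2) == b"MZ"


# Hypothetical stand-ins for python.exe and WarmUp.py, created only for the demo.
demo_dir = tempfile.mkdtemp()
fake_exe = os.path.join(demo_dir, "fake_python.exe")
fake_script = os.path.join(demo_dir, "WarmUp.py")
with io.open(fake_exe, "wb") as handle:
    handle.write(b"MZ\x90\x00")
with io.open(fake_script, "wb") as handle:
    handle.write(b"# -*- coding: utf-8 -*-\n")

exe_flag = looks_like_windows_executable(fake_exe)
script_flag = looks_like_windows_executable(fake_script)
```

If the check is true for the file passed to the interpreter, the run configuration is pointing at the wrong file and no encoding declaration will help.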

string vs unicode encoding - Struct() argument

I am experiencing a strange problem that returns the same error, regardless of the encoding I use. The code works well, without the encoding part in Python 2.7.8, but it breaks in 2.7.6 which is the version that I use for all my development.
import MIDI_PY2 as md
import glob
import ast
import os
dir = '/Users/user/Desktop/sample midis/'
os.chdir(dir)
file_list = []
for file in glob.glob('*.mid'):
    file_list.append((dir + file))
file_list returns this:
[u'/Users/user/Desktop/sample midis/M1.mid',
u'/Users/user/Desktop/sample midis/M2.mid',
u'/Users/user/Desktop/sample midis/M3.mid',
u'/Users/user/Desktop/sample midis/M4.mid']
md.concatenate_midis(file_list,'/Users/luissanchez/Desktop/temp/out.mid') returns this error:
-
TypeError Traceback (most recent call last)
<ipython-input-73-2d7eef92f566> in <module>()
----> 1 md.concatenate_midis(file_list_1,'/Users/user/Desktop/temp/out.mid')
/Users/user/Desktop/sample midis/MIDI_PY2.pyc in concatenate_midis(paths, outPath)
/Users/user/Desktop/sample midis/MIDI_PY2.pyc in midi2score(midi)
/Users/user/Desktop/sample midis/MIDI_PY2.pyc in midi2opus(midi)
TypeError: Struct() argument 1 must be string, not unicode
Then I modified the code so that the first argument is a string, not unicode:
file_list_1 = [str(x) for x in file_list]
which returns:
['/Users/user/Desktop/sample midis/M1.mid',
'/Users/user/Desktop/sample midis/M2.mid',
'/Users/user/Desktop/sample midis/M3.mid',
'/Users/user/Desktop/sample midis/M4.mid']
Running the function concatenate_midis with this last list (file_list_1) returns exactly the same error: TypeError: Struct() argument 1 must be string, not unicode.
Does anybody know what's going on here? concatenate_midis works well in Python 2.7.8, but I can't figure out why it doesn't work in what I use, Enthought Canopy Python 2.7.6 | 64-bit.
Thanks
The error
error: TypeError: Struct() argument 1 must be string, not unicode.
is usually raised by the struct.unpack() function, which in older versions of Python requires its format argument to be a str and not unicode. Check that the arguments passed to struct.unpack() are str and not unicode.
One possible cause is a from __future__ import unicode_literals statement.
>>> type('a')
<type 'str'>
>>> from __future__ import unicode_literals
>>> type('a')
<type 'unicode'>
Check whether your code contains the statement.
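If the statement is there and cannot be removed, one workaround is to coerce the format string with str() before it reaches struct. The sketch below is not the code inside MIDI_PY2; the function name unpack_midi_header is hypothetical, and it only reads the standard 14-byte MIDI file header as an example.

```python
import struct


def unpack_midi_header(data):
    # str(...) makes the format a native str, which is what older Python 2.7
    # releases require even when unicode_literals turns literals into unicode.
    fmt = str(">4sLHHH")
    return struct.unpack(fmt, data[:struct.calcsize(fmt)])


# Demo header: chunk id "MThd", length 6, format 1, 2 tracks, 96 ticks per beat.
header = b"MThd" + struct.pack(str(">LHHH"), 6, 1, 2, 96)
chunk_id, length, midi_format, tracks, division = unpack_midi_header(header)
```

Note that converting the file paths to str, as tried in the question, does not help if the unicode value is the format string inside the library.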

django.http.HttpResponse does not deal with unicode properly

Following Tutorial 3, I have written this trivial views.py:
# coding = UTF-8
from django.http import HttpResponse
def index(request):
    return HttpResponse( u"Seznam kontaktů" )
I tried also other tricks, such as using django.utils.encoding.smart_unicode(...), the u"%s" % ... trick, etc.
Whatever I try, I always get "Non-ASCII character" error:
SyntaxError at /kontakty/
Non-ASCII character '\xc5' in file C:\Users\JindrichVavruska\eclipse\workspace\ars\src\ars_site\party\views.py
on line 5, but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details (views.py, line 5)
It is even more mysterious because I used a lot of national character strings in other files, such as models.py, e.g. text = models.CharField( u"Všechen text", max_length = 150), and there was absolutely no problem at all.
I found other answers on this site irrelevant, the suggested changes make no difference in my views.py
Jindra
It should be # -*- coding: utf-8 -*-, not # coding = UTF-8. See PEP 263 for more details. You should also save the file as UTF-8; check your editor's settings.
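To see why the original comment is not recognized, one can test candidate lines against the regular expression that PEP 263 gives for the encoding declaration. In this sketch the helper name declared_encoding is invented; the pattern is the one from the PEP. The space before the "=" sign is what breaks the match, since the letter case of the encoding name does not matter to the pattern.

```python
import re

# Pattern for the encoding declaration as given in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line):
    """Return the encoding name declared on this comment line, or None."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


from_question = declared_encoding("# coding = UTF-8")        # space before "="
from_answer = declared_encoding("# -*- coding: utf-8 -*-")
no_space = declared_encoding("# coding=UTF-8")
```

The declaration must also appear on the first or second line of the file to take effect.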