django.http.HttpResponse does not handle Unicode properly - django

Following Tutorial 3, I have written this trivial views.py:
# coding = UTF-8
from django.http import HttpResponse
def index(request):
    return HttpResponse(u"Seznam kontaktů")
I also tried other tricks, such as using django.utils.encoding.smart_unicode(...), the u"%s" % ... trick, etc.
Whatever I try, I always get a "Non-ASCII character" error:
SyntaxError at /kontakty/
Non-ASCII character '\xc5' in file C:\Users\JindrichVavruska\eclipse\workspace\ars\src\ars_site\party\views.py
on line 5, but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details (views.py, line 5)
It is even more mysterious because I used a lot of national character strings in other files, such as models.py, e.g. text = models.CharField( u"Všechen text", max_length = 150), and there was absolutely no problem at all.
I found the other answers on this site irrelevant; the suggested changes make no difference in my views.py.
Jindra

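It should be # -*- coding: utf-8 -*-, not # coding = UTF-8. PEP 263 only recognizes a comment on the first or second line of the file in which the word coding is followed immediately by : or =, so the space before the = sign makes Python ignore your declaration. See PEP-263 for more details. You should also save the file as UTF-8. Check your editor's settings.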
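To see why a comment such as # coding = UTF-8 is ignored, here is a minimal sketch that applies the regular expression given in PEP 263 to a candidate first line. The helper name declared_encoding is made up for this illustration; it is not part of Django or the standard library.

```python
import re

# Pattern quoted in PEP 263; Python checks it against lines 1 and 2 only.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def declared_encoding(line):
    """Return the encoding named by a PEP 263 comment, or None if the line does not match."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None

for candidate in ("# coding = UTF-8", "# coding=UTF-8", "# -*- coding: utf-8 -*-"):
    print(repr(candidate), "->", declared_encoding(candidate))
```

The upper-case spelling UTF-8 is accepted; only the form with a space between coding and = fails to match.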

Related

UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 12

I'm developing a chatbot with the chatterbot library. The chatbot is in my native language, Slovene, which has a lot of non-ASCII characters (for example: š, č, ž). I'm using Python 2.7.
When I try to train the bot, the library has trouble with the characters mentioned above. For example, when I run the following code:
chatBot.set_trainer(ListTrainer)
chatBot.train([
    "Koliko imam še dopusta?",
    "Letos imate še 19 dni dopusta.",
])
it throws the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 12: invalid start byte
I added the # -*- coding: utf-8 -*- line to the top of my file, changed the encoding of all the files involved to utf-8 in my editor (Sublime Text 3), and changed the system default encoding with the following code:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
The strings are of type unicode.
Getting a response that contains these characters works without any issues. For example, running the following code in the same execution as the training code above (with 'š' changed to 's' and 'č' to 'c' in the training strings) throws no errors:
chatBot.set_trainer(ListTrainer)
chatBot.train([
    "Koliko imam se dopusta?",
    "Letos imate se 19 dni dopusta.",
])
chatBot.get_response("Koliko imam še dopusta?")
I can't find a solution to this issue. Any suggestions?
Thanks loads in advance. :)
EDIT: I used from __future__ import unicode_literals to make the strings of type unicode. I also checked that they really were unicode with type(myString).
I would also like to paste this link.
EDIT 2: @MallikarjunaraoKosuri's code works, but in my case I had one more thing in the chatbot instance initialization, which is the following:
chatBot = ChatBot(
    'Test',
    trainer='chatterbot.trainers.ListTrainer',
    storage_adapter='chatterbot.storage.JsonFileStorageAdapter'
)
This is the cause of my error. The JSON storage file that the chatbot creates is written in my local encoding and not in utf-8. It seems the default storage (.sqlite3) doesn't have this issue, so for now I'll just avoid the JSON storage. But I am still interested in finding a solution to this error.
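For anyone writing their own JSON files in a similar situation, the general remedy is to state the encoding explicitly instead of relying on the platform default. The sketch below uses only the standard library and is written for Python 3; it is not ChatterBot's storage code, and the function names and the file name are hypothetical.

```python
import io
import json
import os
import tempfile

def save_json_utf8(path, data):
    """Write data as JSON, forcing UTF-8 regardless of the local code page."""
    with io.open(path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps š, č, ž as real characters instead of \uXXXX escapes
        handle.write(json.dumps(data, ensure_ascii=False))

def load_json_utf8(path):
    """Read the file back with the same explicit encoding."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)

demo_path = os.path.join(tempfile.mkdtemp(), "database.json")  # hypothetical file name
save_json_utf8(demo_path, ["Koliko imam še dopusta?"])
print(load_json_utf8(demo_path))
```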
The strings from your example are not of type unicode.
Otherwise Python would not throw the UnicodeDecodeError.
This type of error means that at some step of the program's execution Python tries to decode a byte string into unicode and fails.
In your case the reasons are:
decoding is configured to use utf-8
your source file is not in utf-8 but almost certainly in cp1252:
import unicodedata
b = '\x9a'
# u = b.decode('utf-8') # UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a
# in position 0: invalid start byte
u = b.decode('cp1252')
print unicodedata.name(u) # LATIN SMALL LETTER S WITH CARON
print u # š
So, the 0x9a byte from your cp1252 source can't be decoded with utf-8.
The best solution is to do nothing except convert your source to utf-8.
With Sublime Text 3 you can easily do it via File -> Reopen with Encoding -> UTF-8.
But don't forget to Ctrl+C your source code before the conversion, because right after it all your š, č, ž chars will be replaced with ?.
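The same conversion can be sketched in code, assuming the file really is cp1252 as argued above. This is a Python 3 illustration; reencode_to_utf8 is a made-up helper name, and the sample bytes stand in for the contents of the source file.

```python
def reencode_to_utf8(raw, source_encoding="cp1252"):
    """Decode bytes from the legacy code page and return them as UTF-8 bytes."""
    return raw.decode(source_encoding).encode("utf-8")

# 'š' stored as the single cp1252 byte 0x9a, at offset 12 as in the traceback
legacy = b"Koliko imam \x9ae dopusta?"
converted = reencode_to_utf8(legacy)

print(legacy.index(b"\x9a"))
print(converted.decode("utf-8"))
```

After the conversion the character occupies two bytes (0xc5 0xa1), which is what the utf-8 codec expects.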
Others have already suggested good partial solutions; I would like to combine them into one.
The author, @gunthercox, has also described some guidelines here: http://chatterbot.readthedocs.io/en/stable/encoding.html#how-do-i-fix-python-encoding-errors
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
# Create a new chat bot named Test
chatBot = ChatBot(
    'Test',
    trainer='chatterbot.trainers.ListTrainer'
)
chatBot.train([
    "Koliko imam še dopusta?",
    "Letos imate še 19 dni dopusta.",
])
Python Terminal
>>> # -*- coding: utf-8 -*-
... from chatterbot import ChatBot
>>>
>>> # Create a new chat bot named Test
... chatBot = ChatBot(
...     'Test',
...     trainer='chatterbot.trainers.ListTrainer'
... )
>>>
>>> chatBot.train([
...     "Koliko imam še dopusta?",
...     "Letos imate še 19 dni dopusta.",
... ])
List Trainer: [####################] 100%
>>>

My telegram bot does not support Persian language

I built a Telegram bot with python-telegram-bot, and I want the bot to send the user a message in Persian when the user sends /start, but the bot does not work.
My Code:
from telegram.ext import Updater,CommandHandler
updater = Updater(token='TOKEN')
def start_method(bot, update):
    bot.sendMessage(update.message.chat_id, "سلام")
start_command = CommandHandler('start', start_method)
updater.dispatcher.add_handler(start_command)
updater.start_polling()
If you want to use unicode text in your code, you have to specify the file encoding according to PEP 263.
Place this comment at the beginning of your script:
#!/usr/bin/python
# -*- coding: utf-8 -*-
You can also use Python 3, which has much better unicode support in general and assumes utf-8 encoding for source files by default.
First, you need to use urllib. If your text is in a variable such as txt1, you need to quote it first and then send it as a message, like this:
from urllib.parse import quote
......
txt1 = 'سلام. خوش آمدید!'
txt = quote(txt1.encode('utf-8'))
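For reference, this is what the quoting step produces, as a small Python 3 sketch using only urllib.parse. Percent-encoding matters when a URL is assembled by hand; whether it is needed when the message goes through the library's sendMessage is not established in this thread, so treat that as an open assumption.

```python
from urllib.parse import quote, unquote

text = "سلام"

# Each UTF-8 byte of the Persian text becomes a %XX escape
encoded = quote(text.encode("utf-8"))
print(encoded)

# unquote decodes the escapes as UTF-8 by default, restoring the original text
print(unquote(encoded) == text)
```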

How can I test output of non-ASCII characters using Sphinx doctest?

I'm at a loss as to how to test printed output that includes non-ASCII characters using Sphinx doctest.
When I have tests that include code that generates non-ASCII characters, or that contain expected results that include non-ASCII characters, I get encoding errors.
For example, if I have:
def foo():
    return 'γ'
then a doctest including
>>> print(foo())
will produce an error of the form
Encoding error:
'ascii' codec can't encode character u'\u03b3' in position 0: ordinal not in range(128)
as will any test of the form
>>> print('γ')
γ
Only by ensuring that none of my functions whose results I'm attempting to print, and none of the expected printed results, contain such characters can I avoid these errors. As a result I've had to disable many important tests.
At the head of all my code I have
# encoding: utf8
from __future__ import unicode_literals
and (in desperation) I've tried things like
doctest_global_setup = (
    '#encoding: utf8\n\n'
    'from __future__ import unicode_literals\n'
)
and
.. testsetup::

    from __future__ import unicode_literals
but these (of course) don't change the outcome.
How can I test output of non-ASCII characters using Sphinx doctest?
I believe it is due to your from __future__ import unicode_literals statement. print will implicitly encode Unicode strings to the terminal encoding. Lacking a terminal, Python 2 will default to the ascii codec.
If you skip an explicit print, it will work with or without import:
>>> def foo():
...     return 'ë'
...
>>> foo()
'\x89'
Or:
>>> from __future__ import unicode_literals
>>> def foo():
...     return 'ë'
...
>>> foo()
u'\xeb'
Then you can test for the escaped representation of the string.
You can also try changing the encoding of print itself with PYTHONIOENCODING=utf8.
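A small sketch of the escaped-representation idea, written for Python 3 and using only built-in codecs. The helper names escaped and can_print_as_ascii are invented for this example; the point is that the escaped form contains only ASCII, so it survives an ascii output stream.

```python
def escaped(text):
    """Return an ASCII-only escaped form of text, usable as expected doctest output."""
    return text.encode("unicode_escape").decode("ascii")

def can_print_as_ascii(text):
    """Report whether text would survive an output stream that uses the ascii codec."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(can_print_as_ascii("γ"))
print(escaped("γ"))
```

A doctest can then compare against the escaped form rather than the printed character.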

PyCharm issue on encoding

I am trying PyCharm and facing an encoding issue. Can you please help me resolve it?
code:
# -*- coding: utf-8 -*-
__author__ = 'me'
import os, sys
def main():
    print repeat('mike', False)
    print repeat('mok', True)

"""
comments here..
"""
def repeat(s, exclaim):
    result = s*3
    if exclaim:
        result = result + '!!!'
    return result

if __name__ == '__main__':
    main()
error:
C:\Python27\python.exe C:\Python27\python.exe C:/Users/prakashs/PycharmProjects/GooglePython/WarmUp.py
File "C:\Python27\python.exe", line 1
SyntaxError: Non-ASCII character '\x90' in file C:\Python27\python.exe on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Process finished with exit code 1
I have set the default encoding in PyCharm to utf-8 as well, but I need to know where in PyCharm to edit the settings.
Thank you.
Googling for Non-ASCII character '\x90' in file gives the Stack Overflow question Using #-*- coding: utf-8 -*- does not remove "Non-ASCII character '\x90' in file hello.exe on line 1, but no encoding declared" error as the first hit. There you'll find the answer to your question.
Your command is wrong: it starts with C:\Python27\python.exe C:\Python27\python.exe... (python.exe is mentioned twice), which means you are trying to run the executable (python.exe) as a script instead of the script file (WarmUp.py).
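To illustrate why the interpreter complains about '\x90' on line 1, here is a minimal sketch that scans bytes for the first non-ASCII value. The four header bytes are an assumption about what a typical Windows executable starts with ("MZ" followed by 0x90 0x00); the helper name first_non_ascii is invented for this example.

```python
def first_non_ascii(data):
    """Return (offset, byte value) of the first byte above 0x7f, or None if all are ASCII."""
    for offset, value in enumerate(bytearray(data)):
        if value > 0x7F:
            return offset, value
    return None

exe_header = b"MZ\x90\x00"        # assumed typical start of a Windows .exe
script_start = b"print('hello')"  # an ordinary ASCII-only script

print(first_non_ascii(exe_header))
print(first_non_ascii(script_start))
```

So the encoding setting in the IDE is unrelated: Python is parsing a binary file as source code.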

Selecting nodes with non-ASCII characters in Scrapy

I have the following simple web scraper written in Scrapy:
#!/usr/bin/env python
# -*- coding: latin-1 -*-
from scrapy.http import Request
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
class MySpiderTest(BaseSpider):
    name = 'MySpiderTest'
    allowed_domains = ["boliga.dk"]
    start_urls = ["http://www.boliga.dk/bbrinfo/3B71489C-AEA0-44CA-A0B2-7BD909B35618",]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        item = bbrItem()
        print hxs.select("id('unitControl')/div[2]/table/tbody/tr[td//text()[contains(.,'Antal Badeværelser')]]/td[2]/text()").extract()
but when I run the spider I get the following syntax error:
SyntaxError: Non-ASCII character '\xe6' in file... on line 32, but no encoding declared
because of the æ in the xpath. The xpath is working in Xpath Checker for Firefox. I tried URL-encoding the æ, but that didn't work. What am I missing?
thanks!
UPDATE: I have added the encoding declaration in the beginning of the code (Latin-1 should support Danish characters)
Use a unicode string for your XPath expression
hxs.select(u"id('unitControl')/div[2]/table/tbody/tr[td//text()[contains(.,'Antal Badeværelser')]]/td[2]/text()").extract()
or
hxs.select(u"id('unitControl')/div[2]/table/tbody/tr[td//text()[contains(.,'Antal Badev\u00e6relser')]]/td[2]/text()").extract()
See Unicode Literals in Python Source Code
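The two spellings above denote the same string, which a short Python 3 sketch can confirm. It also shows where the '\xe6' in the error message comes from: that is how æ is stored in a Latin-1 file, while UTF-8 uses two bytes. The variable names are only for illustration.

```python
xpath_literal = u"contains(., 'Antal Badeværelser')"
xpath_escaped = u"contains(., 'Antal Badev\u00e6relser')"

# Both forms produce the same text once parsed
print(xpath_literal == xpath_escaped)

# The byte the SyntaxError reported, and the UTF-8 equivalent
print(u"æ".encode("latin-1"))
print(u"æ".encode("utf-8"))
```

The escaped form has the advantage that the source file itself stays pure ASCII, so it works whatever encoding the file is saved in.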
SyntaxError: Non-ASCII character '\xe2' in file … on line 40,
but no encoding declared …
This is caused by standard characters such as the apostrophe (') being replaced with typographic characters such as the curly quotation mark (’) during copying.
Try editing the text copied from the PDF.
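A quick way to locate such characters is to scan the pasted text for anything outside ASCII. The following Python 3 sketch uses only the standard library; find_non_ascii and the sample line are made up for this illustration.

```python
import unicodedata

def find_non_ascii(source):
    """Return (line number, column, character, Unicode name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((line_no, col, char, unicodedata.name(char, "UNKNOWN")))
    return hits

# A line pasted from a PDF, with curly quotes instead of straight apostrophes
pasted = "name = \u2018Jindra\u2019"

for hit in find_non_ascii(pasted):
    print(hit)
```

Both curly quotes start with the byte 0xe2 in UTF-8, which matches the character named in the error.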
response.xpath("//tr[contains(., '" + u'中文字符' + "')]").extract()