Solved with your help
#!/usr/bin/python
# -*- coding: utf-8 -*-
message = {'message1':'நாம்','message2':'செய்தி'}
a={}
for i in message.keys():
    if "message" in i:
        a[i]=message[i]
        status="success"
print a
Got output:
{'message2':'செய்தி','message1':'நாம்'}
Thanks for all your help!!!!
You need to decode the byte string as UTF-8 in order to print or read it as written:
print message['message1'].decode('utf-8')
This will print correctly.
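To make the decode step concrete, here is a minimal sketch of the round trip between UTF-8 bytes and text. The variable names `raw` and `text` are only illustrative, and the snippet is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# UTF-8 bytes for the Tamil word from the question. In Python 2 this is
# what a plain 'நாம்' literal holds when the source file is UTF-8.
raw = u'நாம்'.encode('utf-8')

# Decoding turns the bytes back into text (unicode in Python 2, str in 3).
text = raw.decode('utf-8')

# The byte string is longer than the text: each of these letters
# takes three bytes in UTF-8.
print(len(raw), len(text))
print(text == u'நாம்')
```

The decoded value is what you want to pass around inside the program; encoding back to bytes is only needed at the boundaries (files, sockets, terminals).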
Have you tried writing them as raw string literals?
message = {'message1': r'நாம்','message2':r'செய்தி'}
Related
I want to use xlwt to create an Excel file. However, some of the strings contain letters like ä, ü and ö, so I get a UnicodeDecodeError. Can this be fixed?
I transferred my code from 3.5 (IDLE) to 2.7 (PyCharm). It worked in 3.5, probably because I didn't need to put
# coding=utf-8
# -*- coding: iso-8859-1 -*-
at the beginning of the code in 3.5...
# coding=utf-8
# -*- coding: iso-8859-1 -*-
import xlwt
name_of_new_file = "Test.xls"
workbook_new = xlwt.Workbook()
worksheet = workbook_new.add_sheet("Testing")
worksheet.write(0, 0, "ä") # it works if I write an "a" instead
workbook_new.save("C:\\...\\Test123.xls")
I'm pretty sure the problem is caused by the first two lines. The error message says:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)
In Python 2, encoding is troublesome. Use a unicode literal:
worksheet.write(0, 0, u"ä")
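The traceback suggests that the plain "ä" literal reached xlwt as UTF-8 bytes and was then decoded with the ASCII codec. That reading is an assumption based on the error text, not on xlwt's source. The sketch below reproduces the same class of error without xlwt and should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# In a UTF-8 source file, a Python 2 literal "ä" holds these two bytes.
utf8_bytes = u'ä'.encode('utf-8')

# Decoding them as ASCII fails on the first byte, 0xc3.
try:
    utf8_bytes.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc
print(error)

# Decoding with the right codec, or starting from a u"..." literal,
# gives real text and avoids the implicit ASCII step.
decoded = utf8_bytes.decode('utf-8')
print(decoded == u'ä')
```

Passing `u"ä"` hands the library text from the start, so no implicit decode is needed.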
I'm scraping this site www.soundkartell.de, and I'm facing some unicode issues:
results =[]
for article in soup.find_all('article'):
    if article.select('a[href*="alternative"]'):
        artist = article.h2.text
        results.append(artist.encode('latin1').decode("utf-8"))
print artist # Din vän Skuggan
print results # [u'Din v\xe4n Skuggan']
I have -*- coding: utf-8 -*- at the top of my file.
Why does Python print the scraped data correctly but not the appended data?
How do I fix the Unicode issue?
I am using Python 2.7.x
You likely do not actually have a problem. What you are seeing is a side effect of how Python prints things:
Sample Code:
artist = 'Din vän Skuggan'
artists = [artist]
print 'artist:', artist
print 'artists:', artists
print 'str:', str(artist)
print 'repr:', repr(artist)
Produces:
artist: Din vän Skuggan
artists: ['Din v\xc3\xa4n Skuggan']
str: Din vän Skuggan
repr: 'Din v\xc3\xa4n Skuggan'
So as can be seen above, when Python prints a list, it uses repr() for the items in the list. In both cases the contents are the same; Python is just displaying them differently.
Side Note:
# -*- coding: utf-8 -*-
Placing this line at the top of your script is useful for string literals that contain Unicode text.
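The repr() behaviour described in this answer can be seen in isolation with a small sketch. The `Artist` class is a hypothetical helper, introduced only so that the two forms are easy to tell apart; joining the items is one possible way to print a list's contents as text.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


class Artist(object):
    """Hypothetical helper whose str() and repr() differ visibly."""

    def __str__(self):
        return 'str form'

    def __repr__(self):
        return 'repr form'


artist = Artist()
print(str(artist))    # printing the object itself uses __str__
print(str([artist]))  # the list formats its items with __repr__

# To show the text of the items rather than their repr,
# join them instead of printing the list object.
names = [u'Din vän Skuggan']
joined = u', '.join(names)
print(joined == u'Din vän Skuggan')
```

So the escapes in the printed list are only a display form; the stored strings are unchanged.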
I wrote the following function in Python 2.7 to clean the text, but it doesn't work unless I first decode the tweet variable from UTF-8:
# -*- coding: utf-8 -*-
import re
def clean_tweet(tweet):
    tweet = re.sub(u"[^\u0622-\u064A]", ' ', tweet, flags=re.U)
    return tweet
if __name__ == "__main__":
    s="sadfas سيبس sdfgsdfg/dfgdfg ffeee منت منشس يت??بمنشس//تبي منشكسميكمنشسكيمنك ٌاإلا رًاٌااًٌَُ"
    print "not working "+clean_tweet(s)
    print "working "+clean_tweet(s.decode("utf-8"))
Could anyone explain why?
I don't want to decode, because it makes manipulating the text in an SFrame in GraphLab too slow.
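One plausible explanation, offered as a sketch and not a definitive diagnosis: the character class is written in terms of code points, while an undecoded Python 2 string is a sequence of bytes. Each Arabic letter in the range is two bytes in UTF-8, and no single byte value can fall between U+0622 and U+064A, so every byte is treated as "not Arabic". The sample text below is a shortened, hypothetical version of the one in the question.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re

pattern = re.compile(u'[^\u0622-\u064A]', flags=re.U)

# On decoded text the Arabic letters survive and the rest become spaces.
text = u'abc سيبس'
cleaned = pattern.sub(u' ', text)
print(cleaned)

# The same Arabic letters as UTF-8: two bytes per letter, and every
# byte value is far below 0x0622, so none is inside the range.
byte_values = bytearray(u'سيبس'.encode('utf-8'))
print([hex(b) for b in byte_values])
print(max(byte_values) < 0x0622)
```

Under that reading, the pattern can only match the letters once the input is text and not bytes.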
I would like to extract the id from this statement. How can I do this in Python?
I am a beginner in Python.
javascript:return WebForm_FireDefaultButton(event, 'ctl00_ibtnFind')
#!/usr/bin/python2
# -*- coding: utf-8 -*-
import re
input = """
javascript:return WebForm_FireDefaultButton(event, 'ctl00_ibtnFind')
javascript:return WebForm_FireDefaultButton(event, 'ctl00_ibtnFind2')
"""
m = re.findall(r"javascript:return WebForm_FireDefaultButton\(event, '([^']+)'\)", input)
print m
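The answer above relies on a capturing group: with one group in the pattern, `re.findall` returns only the text matched by that group. Here is a minimal sketch of the same idea. The names `source`, `pattern` and `ids` are illustrative, and the pattern is shortened on the assumption that the function name alone is enough to identify the call.

```python
import re

# Sample input copied from the question.
source = """
javascript:return WebForm_FireDefaultButton(event, 'ctl00_ibtnFind')
javascript:return WebForm_FireDefaultButton(event, 'ctl00_ibtnFind2')
"""

# Raw string so the backslashes reach the regex engine unchanged.
# \( and \) match literal parentheses; ([^']+) captures everything
# between the single quotes.
pattern = re.compile(r"WebForm_FireDefaultButton\(event, '([^']+)'\)")

ids = pattern.findall(source)
print(ids)
```

If only the first id is needed, `pattern.search(source).group(1)` returns it as a single string.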
I can't get the output in Russian; I only get Unicode escape sequences =(
I use Python v2.7.9
Microsoft 8
How can I do that with a list?
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup
r = requests.get("http://fs.to/video/films/group/film_genre/")
response = r.content.decode('utf-8')
page = BeautifulSoup(response)
for tag in page.findAll('li'):
    a = tag.find('a')
    for b in a.contents:
        print (u'{0}'.format(u'○'),unicode(b.string))
The output should look like this:
Аниме
Биография
...
Фэнтези
Эротика
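Change the last line so that print receives two arguments instead of a single tuple (Python 2 prints a tuple using the repr() of its items):
print u'{0}'.format(u'○'), b.string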