python print str.decode("utf-8") UnicodeEncodeError - python-2.7

I want to convert a python string (utf-8) to unicode.
word = "3–5" # – is u'\u2013', not an English symbol
print type(word).__name__ # output is str
print repr(word) # output is '3\xe2\x80\x935'
print word.decode("utf-8", errors='ignore')
I got this error
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 1: ordinal not in range(128)
But when I change
word.decode("utf-8", errors='ignore')
to
word.decode(errors='ignore')
the error disappears.
Why? word is a UTF-8 string, so why can't I specify utf-8 when decoding?
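One way to see what is going on, as a minimal sketch: the decode itself succeeds, and the UnicodeEncodeError (Encode, not Decode) presumably comes from the implicit ASCII encode that print performs when stdout's encoding is ASCII. Without an explicit codec, decode falls back to ASCII in Python 2, and errors='ignore' silently drops the three non-ASCII bytes, so nothing is left for print to fail on. The variable names below are illustrative, and the bytes literal stands in for the Python 2 str.

```python
# -*- coding: utf-8 -*-
word = b"3\xe2\x80\x935"  # same bytes as repr(word) above

# Step 1: decoding as UTF-8 works and yields a unicode object.
decoded = word.decode("utf-8")

# Step 2: mimic what print does on an ASCII-only stdout (assumption).
try:
    decoded.encode("ascii")
    ascii_encode_failed = False
except UnicodeEncodeError:
    ascii_encode_failed = True

# Decoding as ASCII with errors='ignore' drops the dash bytes entirely.
dropped = word.decode("ascii", errors="ignore")

# Encoding explicitly to UTF-8 before output avoids the implicit ASCII step.
utf8_bytes = decoded.encode("utf-8")
```

So the version without "utf-8" does not really work; it just loses the dash. Printing decoded.encode("utf-8") is one option when the terminal expects UTF-8.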

Related

python .format() can't interpolate accented letters - Python 2.7

I'm trying to interpolate strings that are stored correctly, with accented letters, in a database. When I retrieve them I get an error:
'<html><div>{ragioneSociale}{iva}{sdi}{cuu}{indirizzo}{metodoDiPagamento}{iban_bic}</div></html>'.format(
ragioneSociale=generaleViewRes.getString('ragioneSociale'),
iva=generaleViewRes.getString('iva'),
sdi=generaleViewRes.getString('sdi'),
cuu=generaleViewRes.getString('cuu'),
indirizzo=generaleViewRes.getString('indirizzo'),
metodoDiPagamento=generaleViewRes.getString('metodoDiPagamento'),
iban_bic=generaleViewRes.getString('iban_bic')
)
Then I tried to use encode('utf-8') on each single element individually, then encode('utf-8').decode('utf-8') and finally .decode('utf-8'). The errors were:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd2' in position 57: ordinal not in range(128)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd2' in position 57: ordinal not in range(128)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 57-58: invalid data
Unfortunately this is an error that I often run into with the .format method. In smaller contexts I worked around it with the + operator, but for large interpolations + hurts readability and loses the other features that .format provides. I wonder, is it possible that this problem has never been solved?
Thanks in advance.
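A minimal sketch of one common cause, assuming getString returns unicode objects: in Python 2, calling .format on a byte str template pushes each unicode argument through the ASCII codec, whereas a unicode template (u'...') keeps everything as unicode. The field values and the shortened template below are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Hypothetical values standing in for generaleViewRes.getString(...)
ragione_sociale = u"Societ\u00e0 \u00d2 S.r.l."
iva = u"01234567890"

# The u prefix makes the template unicode, so no implicit ASCII encode happens.
template = u"<html><div>{ragioneSociale}{iva}</div></html>"
html = template.format(ragioneSociale=ragione_sociale, iva=iva)

# Encode once, at the boundary where bytes are needed (file, socket, etc.).
html_bytes = html.encode("utf-8")
```

The idea is to keep text as unicode internally and encode only on output, rather than calling encode/decode on each field.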

python + unicodeEncodeError \xb5 while reading from excel and writing to msqldatabase

I have a Python 2.7 script that reads data from an Excel file, where users may enter special characters (e.g. µ), and writes it to a msqldatabase.
I've added the following line at the top of the file:
# -*- coding: utf-8 -*-
But it still uses the ascii codec. How can I solve this error?
This is the error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 19: ordinal not in range(128)
Thanks in advance.
I faced this issue while inserting values from a third-party app. There I inserted the values as escaped strings.
from re import escape
r = escape('µ')
Result:
'\\\xc2\\\xb5'
Pass the value of r in the insert statement.
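For what it's worth, the coding line only tells Python how to read the source file; it does not change the codec used for implicit conversions. A hedged sketch of an alternative to escaping is to keep the cell value as unicode and encode it explicitly. The variable names are hypothetical, and the actual insert call depends on the database driver, so it is not shown.

```python
# -*- coding: utf-8 -*-
cell_value = u"concentration in \u00b5g"  # hypothetical value read from Excel

# Explicit encode: no implicit ASCII conversion is involved.
encoded = cell_value.encode("utf-8")

# Round trip back to unicode to confirm nothing was lost.
restored = encoded.decode("utf-8")
```

Whether the driver wants unicode or UTF-8 bytes is an assumption to check against its documentation.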

Reading and writing UTF-8 from file

I have some text encoded in UTF-8. 'Before – after.' It was fetched from the web. The '–' character is the issue. If you try to print directly from the command line, using copy and paste:
>>> text = 'Before – after.'
>>> print text
Before – after.
But if you save to a text file and try to print:
>>> for line in open('file.txt','r'):
...     print line
Before û after.
I'm pretty sure this is some sort of UTF-8 encode/decode error, but it is eluding me. I have tried to decode or re-encode, but that doesn't fix it either.
>>> for line in open('file.txt','r'):
...     print line.decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 7: invalid start byte
>>> for line in open('file.txt','r'):
...     print line.encode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 7: invalid start byte
It's happening because the file contains a non-ASCII byte (0x96) that the UTF-8 codec cannot decode. You can strip it out and then print the ASCII values.
Take a look at this question: UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte
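As a sketch of an alternative to stripping: byte 0x96 is the en dash in Windows-1252, so the file may have been saved in that encoding rather than UTF-8. That is an assumption about this particular file. The bytes literal below stands in for one line read from file.txt.

```python
# -*- coding: utf-8 -*-
raw = b"Before \x96 after."  # bytes as they presumably sit in file.txt

# Assumption: the file is Windows-1252, where 0x96 is U+2013 (en dash).
text = raw.decode("cp1252")

# Stripping approach: drop anything that is not ASCII.
stripped = raw.decode("ascii", "ignore")

# Re-encode as real UTF-8 if that is the target encoding.
utf8_bytes = text.encode("utf-8")
```

Decoding with the file's real encoding keeps the dash, while the stripping approach loses it and leaves a double space.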

How do I write a capital Greek "delta" as a string in Python 2.7?

I am looking for this character: Δ, which I need for a legend item in matplotlib. Python 3.x features a str type that contains Unicode characters, but I couldn't find any useful information about how to do it in Python 2.7.
import matplotlib.pyplot as plt
x = range(10)
y = [5] * 10
z = [y[i] - x[i] for i in xrange(10)]
plt.plot(x,z,label='Δ x,y')
plt.legend()
plt.show()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)
Although @berna1111's comment is correct, you don't need to use LaTeX format to get a ∆ character.
In Python 2, you need to specify that a string is unicode by using the u'' construct (see doc here). E.g.:
plt.plot(x,z,label=u'Δ x,y')
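A small sketch of an equivalent spelling: the escape u'\u0394' gives the same character without depending on the source file's encoding. The byte 0xce in the traceback is the first byte of Δ in UTF-8. The plotting call is left out so the snippet runs without matplotlib; the names are illustrative.

```python
# -*- coding: utf-8 -*-
label = u"\u0394 x,y"  # GREEK CAPITAL LETTER DELTA via an escape

# The same text as UTF-8 bytes, which is what a plain 'Δ x,y' holds
# in Python 2 when the source file is saved as UTF-8.
label_bytes = b"\xce\x94 x,y"

# Decoding the bytes explicitly gives the same unicode label.
same = (label_bytes.decode("utf-8") == label)
```

Either label or label_bytes.decode("utf-8") can then be passed as the label= argument.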

Replace utf8 characters

I want to replace one set of UTF-8 characters with another set of UTF-8 characters, but whatever I try ends in errors.
I am a noob at Python, so please be patient.
What I want to achieve is converting characters by Unicode values or by HTML entities (more readable, for maintenance).
Attempts (with examples):
1. First
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
# Found this function
def multiple_replace(dic, text):
    pattern = "|".join(map(re.escape, dic.keys()))
    return re.sub(pattern, lambda m: dic[m.group()], text)
text="Larry Wall is ùm© some text"
replace_table = {
    u'\x97' : u'\x82' # ù -> é
}
text2=multiple_replace(dic,text)
print text #Expected:Larry Wall is ém© some text
#Got: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
2. HTML entities
dic = {
    "ú" : "é" # ù -> é
}
some_text="Larry Wall is ùm© some text"
some_text2=some_text.encode('ascii', 'xmlcharrefreplace')
some_text2=multiple_replace(dic,some_text2)
print some_text2
#Got:UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)
Any ideas are welcome
Your problem is due to the fact that your input strings are in non-unicode representation (<type 'str'> rather than <type 'unicode'>). You must define the input string using the u"..." syntax:
text=u"Larry Wall is ùm© some text"
#    ^
(Besides you will have to fix the last statement in your first example - currently it prints the input string (text), whereas I am pretty sure that you meant to see the result (text2)).
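Putting this together, here is a minimal sketch with unicode on both sides. It assumes the intent is ù (U+00F9) to é (U+00E9); the \x97 and \x82 values in the question are not those code points. The escapes are used only to keep the snippet independent of the file encoding.

```python
# -*- coding: utf-8 -*-
import re

def multiple_replace(dic, text):
    # Build one alternation pattern from the (escaped) dictionary keys.
    pattern = u"|".join(map(re.escape, dic.keys()))
    # Replace each match with its mapped value.
    return re.sub(pattern, lambda m: dic[m.group()], text)

# Keys, values and text are all unicode, so no implicit ASCII conversion occurs.
replace_table = {u"\u00f9": u"\u00e9"}  # ù -> é
text = u"Larry Wall is \u00f9m\u00a9 some text"
text2 = multiple_replace(replace_table, text)
```

Printing text2 (rather than text) then shows the replaced string, provided the terminal can encode é and ©.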