Python UnicodeEncodeError on \xb5 while reading from Excel and writing to an msql database - Python 2.7

I have a Python 2.7 script that reads data from an Excel file, where the user may have entered special characters (e.g. µ), and writes it to an msql database.
I've added the following line at the top of the file:
# -*- coding: utf-8 -*-
But it still uses the ascii codec. How can I solve this error?
This is the error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 19: ordinal not in range(128)
Thanks in advance.

I faced this issue while inserting values from a third-party app. There I inserted the values as escaped strings:
from re import escape
r = escape('µ')
Result:
'\\\xc2\\\xb5'
Then pass the value of r in the insert statement.
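As a hedged illustration of why the coding line alone does not help: it only tells Python how to read literals in the source file, not how to convert values that arrive at runtime. The sketch below assumes the cell value comes back as a unicode object (the name cell_value and its content are made up for illustration) and encodes it explicitly instead of relying on the implicit ascii conversion. Whether the database driver expects unicode objects or UTF-8 bytes is an assumption to check against its documentation.

```python
# Hypothetical value, standing in for a cell read from the Excel file.
cell_value = u"5 \xb5m"  # \xb5 is the micro sign

# The implicit conversion Python 2 performs uses the ascii codec and fails.
try:
    cell_value.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly avoids the implicit ascii step.
utf8_bytes = cell_value.encode("utf-8")

print(ascii_ok)         # False: the micro sign is outside the ascii range
print(len(utf8_bytes))  # 5: the micro sign takes two bytes in UTF-8
```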

Related

python .format() can't interpolate accented letters - Python 2.7

I'm trying to interpolate strings that are saved correctly with accented letters in a database. When I retrieve them I get an error:
'<html><div>{ragioneSociale}{iva}{sdi}{cuu}{indirizzo}{metodoDiPagamento}{iban_bic}</div></html>'.format(
    ragioneSociale=generaleViewRes.getString('ragioneSociale'),
    iva=generaleViewRes.getString('iva'),
    sdi=generaleViewRes.getString('sdi'),
    cuu=generaleViewRes.getString('cuu'),
    indirizzo=generaleViewRes.getString('indirizzo'),
    metodoDiPagamento=generaleViewRes.getString('metodoDiPagamento'),
    iban_bic=generaleViewRes.getString('iban_bic')
)
Then I tried calling encode('utf-8') on each element individually, then encode('utf-8').decode('utf-8'), and finally .decode('utf-8'). The respective errors were:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd2' in position 57: ordinal not in range(128)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd2' in position 57: ordinal not in range(128)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 57-58: invalid data
Unfortunately, this is an error I often run into with the .format method. In smaller contexts I worked around it with the + operator, but that does not scale to large interpolations: readability suffers, and I would lose the other things .format provides. Is it possible that this problem has never been solved?
Thanks in advance.
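A minimal sketch of one likely cause, assuming the values come back as unicode objects: in Python 2.7, calling .format on a byte-string template converts each unicode argument with the ascii codec, which raises UnicodeEncodeError for accented letters. Making the template itself a unicode literal avoids that conversion. The sample value below is invented for illustration, and generaleViewRes is not used here.

```python
# Invented sample value containing U+00D2 (capital O with grave accent).
ragione_sociale = u"CAFF\xd2 SRL"

# The u prefix makes the template unicode, so no ascii conversion happens.
template = u"<html><div>{ragioneSociale}</div></html>"
html = template.format(ragioneSociale=ragione_sociale)

print(type(html).__name__)  # unicode on Python 2, str on Python 3
```

The same idea extends to the full template from the question: only the leading quote of the template needs the u prefix.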

python print str.decode("utf-8") UnicodeEncodeError

I want to convert a Python string (UTF-8) to unicode.
word = "3–5" # – is u'\u2013' (en dash), not an ASCII character
print type(word).__name__ # output is str
print repr(word) # output is '3\xe2\x80\x935'
print word.decode("utf-8", errors='ignore')
I got this error
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 1: ordinal not in range(128)
But when I change
word.decode("utf-8", errors='ignore')
to
word.decode(errors='ignore')
the error disappears.
Why? word is a UTF-8 string, so why can't I specify utf-8 when decoding?
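A sketch of what the two calls return, which may explain the difference: decoding as UTF-8 succeeds and yields a unicode string containing u'\u2013'; the error comes afterwards, when print tries to write that character to an output stream that uses the ascii codec. Calling decode without an encoding falls back to Python 2's default ascii codec, and errors='ignore' then silently drops the three non-ASCII bytes, so nothing unprintable is left. The encodings are passed explicitly below so the snippet behaves the same on Python 2 and 3.

```python
word = b"3\xe2\x80\x935"  # UTF-8 bytes for 3, en dash, 5

decoded_utf8 = word.decode("utf-8", "ignore")    # keeps the en dash
decoded_ascii = word.decode("ascii", "ignore")   # drops the non-ASCII bytes

print(len(decoded_utf8))   # 3
print(len(decoded_ascii))  # 2

# This is the step print performs implicitly on an ascii-only stream.
try:
    decoded_utf8.encode("ascii")
    printable = True
except UnicodeEncodeError:
    printable = False
print(printable)  # False
```

So the second form does not fix anything; it just loses the dash.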

How do I write a capital Greek "delta" as a string in Python 2.7?

I am looking for this character: Δ which I need for a legend item in matplotlib. Python 3.x features a str type that contains Unicode characters, but I couldn't find any valuable information about how to do it in Python 2.7.
import matplotlib.pyplot as plt

x = range(10)
y = [5] * 10
z = [y[i] - x[i] for i in xrange(10)]
plt.plot(x,z,label='Δ x,y')
plt.legend()
plt.show()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)
Although @berna1111's comment is correct, you don't need to use LaTeX format to get a ∆ character.
In Python 2, you need to specify that a string is unicode by using the u'' prefix (see the documentation). E.g.:
plt.plot(x,z,label=u'Δ x,y')
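As a small sketch of the same idea without matplotlib: the label can also be written with the \u0394 escape, which keeps the source file pure ASCII. The name delta_label is made up for illustration. The byte 0xce in the error message is the first of the two UTF-8 bytes of this character.

```python
delta_label = u"\u0394 x,y"  # \u0394 is GREEK CAPITAL LETTER DELTA

print(len(delta_label))                  # 5 characters
print(len(delta_label.encode("utf-8")))  # 6 bytes: the delta takes two
```

If the literal Δ is typed directly instead of the escape, Python 2 also needs the coding declaration at the top of the file.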

Replace utf8 characters

I want to replace one set of UTF-8 characters with another, but whatever I try ends in errors.
I am a noob at Python, so please be patient.
What I want to achieve is converting characters by Unicode values or by HTML entities (more readable, for maintenance).
Attempts (with examples):
1. First
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re

# Found this function
def multiple_replace(dic, text):
    pattern = "|".join(map(re.escape, dic.keys()))
    return re.sub(pattern, lambda m: dic[m.group()], text)

text="Larry Wall is ùm© some text"
replace_table = {
    u'\x97' : u'\x82' # ù -> é
}
text2=multiple_replace(replace_table,text)
print text #Expected:Larry Wall is ém© some text
#Got: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
2. HTML entities
dic = {
    "ú" : "é" # ù -> é
}
some_text="Larry Wall is ùm© some text"
some_text2=some_text.encode('ascii', 'xmlcharrefreplace')
some_text2=multiple_replace(dic,some_text2)
print some_text2
#Got:UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)
Any ideas are welcome
Your problem is due to the fact that your input strings are in non-unicode representation (<type 'str'> rather than <type 'unicode'>). You must define the input string using the u"..." syntax:
text=u"Larry Wall is ùm© some text"
#    ^
(Besides, you will have to fix the last statement in your first example: it currently prints the input string (text), whereas I am pretty sure that you meant to see the result (text2).)
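A minimal sketch of the first attempt with everything as unicode, under two assumptions that go beyond the answer: the keys are written as Unicode code points (ù is U+00F9 and é is U+00E9; the values 0x97 and 0x82 used in the question are the CP437 byte values for those letters, not their code points), and the result is inspected rather than printed to avoid terminal encoding issues.

```python
import re

def multiple_replace(dic, text):
    # Build one alternation pattern from all keys, escaped for safety.
    pattern = u"|".join(map(re.escape, dic.keys()))
    return re.sub(pattern, lambda m: dic[m.group()], text)

replace_table = {
    u"\xf9": u"\xe9",  # u with grave -> e with acute, as code points
}

text = u"Larry Wall is \xf9m\xa9 some text"
text2 = multiple_replace(replace_table, text)

print(text2 == u"Larry Wall is \xe9m\xa9 some text")  # True
```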

Python: ascii codec can't encode en-dash

I'm trying to print a poem from the Poetry Foundation's daily poem RSS feed on a thermal printer that supports the CP437 encoding. This means I need to translate some characters; in this case, an en-dash to a hyphen. But Python won't even handle the en-dash to begin with. When I try to decode the string and replace the en-dash with a hyphen, I get the following error:
Traceback (most recent call last):
  File "pftest.py", line 46, in <module>
    str = str.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 140: ordinal not in range(128)
And here is my code:
#!/usr/bin/python
#-*- coding: utf-8 -*-
# This string is actually a variable entitled d['entries'][1].summary_detail.value
str = """Love brought by night a vision to my bed,
One that still wore the vesture of a child
But eighteen years of age – who sweetly smiled"""
str = str.decode('utf-8')
str = str.replace("\u2013", "-") #en dash
str = str.replace("\u2014", "--") #em dash
print (str)
I can actually print the output using the following code without errors in my terminal window (Mac), but my printer spits out sets of 3 CP437 characters:
str = u''.str.encode('utf-8')
I'm using Sublime Text as my editor, and I've saved the page with UTF-8 encoding, but I'm not sure that will help things. I would greatly appreciate any help with this code. Thank you!
I don't fully understand what's happening in your code, but I've also been trying to replace en-dashes with hyphens in a string I got from the Web, and here's what's working for me. My code is just this:
txt = re.sub(u"\u2013", "-", txt)
I'm using Python 2.7 and Sublime Text 2, but I don't bother setting -*- coding: utf-8 -*- in my script, as I'm trying not to introduce any new encoding issues. (Even though my variables may contain Unicode I like to keep my code pure ASCII.) Do you need to include Unicode in your .py file, or was that just to help with debugging?
I'll note that my txt variable is already a unicode string, i.e.
print type(txt)
produces
<type 'unicode'>
I'd be curious to know what type(str) would produce in your case.
One thing I noticed in your code is
str = str.replace("\u2013", "-") #en dash
Are you sure that does anything? My understanding is that \u only means "unicode character" inside a u"" string, and what you've created there is a string with 6 characters: a backslash, a "u", a "2", a "0", etc. (The backslash stays because, when an escape has no special meaning in that kind of string, unlike '\n' or '\t', Python leaves it in the string unchanged.)
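To illustrate the point about the escape, a small sketch: in a Python 2 byte string, "\u2013" is stored literally, backslash included, so the replace never finds an en dash. The doubled backslash below spells out that literal text in a way that behaves identically on Python 2 and 3; the sample text is shortened from the question.

```python
text = u"age \u2013 who"     # unicode string containing a real en dash

literal = u"\\u2013"         # backslash, u, 2, 0, 1, 3
print(len(literal))          # 6
print(len(u"\u2013"))        # 1

unchanged = text.replace(literal, u"-")   # no match, nothing replaced
fixed = text.replace(u"\u2013", u"-")     # the en dash becomes a hyphen

print(unchanged == text)     # True
print(fixed)                 # age - who
```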
Also, the fact that you get 3 CP437 characters from your printer makes me suspect that you still have an en-dash in your string. The UTF-8 encoding of an en-dash is 3 bytes: 0xe2 0x80 0x93. When you call str.encode('utf-8') on a unicode string that contains an en-dash you get those three bytes in the returned string. I'm guessing that your terminal knows how to interpret that as an en-dash and that's what you're seeing.
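The byte count described above can be checked directly. This sketch also decodes those bytes as CP437 to mimic why three characters come out, under the assumption that the printer maps each byte it receives to one CP437 glyph.

```python
en_dash = u"\u2013"
utf8_bytes = en_dash.encode("utf-8")

print(len(utf8_bytes))                 # 3
print(utf8_bytes == b"\xe2\x80\x93")   # True

# Assumption: the printer shows one CP437 glyph per byte it receives.
glyphs = utf8_bytes.decode("cp437")
print(len(glyphs))                     # 3
```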
If you can't get my first method to work, I'll mention that I also had success with this:
txt = txt.encode('utf-8')
txt = re.sub("\xe2\x80\x93", "-", txt)
Maybe that re.sub() would work for you if you put it after your call to encode(). And in that case you might not even need that call to decode() at all. I'll confess that I really don't understand why it's there.
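Putting the pieces together, one possible approach (a sketch, not tested against a real printer) is to do the replacement while the text is still unicode and then encode to CP437 for the printer. The sample line is taken from the question; the "replace" error handler is an added assumption, so that any remaining unsupported character becomes "?" instead of raising.

```python
line = u"But eighteen years of age \u2013 who sweetly smiled"

# Replace dashes while the text is still unicode.
line = line.replace(u"\u2013", u"-")    # en dash
line = line.replace(u"\u2014", u"--")   # em dash

# Encode for the printer; unsupported characters become "?".
printer_bytes = line.encode("cp437", "replace")

print(printer_bytes.decode("ascii"))
```

If the feed text arrives as UTF-8 bytes rather than unicode, it would need a single decode('utf-8') before the first replace.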