Python .format() can't interpolate accented letters - Python 2.7

I'm trying to interpolate strings that are stored correctly, accented letters included, in a database. When I retrieve them and run the following, I get an error:
'<html><div>{ragioneSociale}{iva}{sdi}{cuu}{indirizzo}{metodoDiPagamento}{iban_bic}</div></html>'.format(
ragioneSociale=generaleViewRes.getString('ragioneSociale'),
iva=generaleViewRes.getString('iva'),
sdi=generaleViewRes.getString('sdi'),
cuu=generaleViewRes.getString('cuu'),
indirizzo=generaleViewRes.getString('indirizzo'),
metodoDiPagamento=generaleViewRes.getString('metodoDiPagamento'),
iban_bic=generaleViewRes.getString('iban_bic')
)
Then I tried to use encode('utf-8') on each single element individually, then encode('utf-8').decode('utf-8') and finally .decode('utf-8'). The errors were:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd2' in position 57: ordinal not in range(128)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd2' in position 57: ordinal not in range(128)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 57-58: invalid data
Unfortunately, this is an error I often run into with the .format method; in smaller cases I worked around it with the + operator. For large interpolations, though, + is not an option, both for readability and because of other features that .format provides. Is it possible that this problem has never been solved?
Thanks in advance.
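One likely cause, assuming getString returns unicode objects: the template is a plain byte string, so in Python 2 .format() has to encode every argument with the default ASCII codec, which fails on the first accented letter. A minimal sketch of the usual workaround, making the template itself unicode and encoding once at the end. The row dictionary and its values are hypothetical stand-ins for the database results, and the template is shortened to two fields:

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for generaleViewRes.getString(...) results,
# assumed here to be unicode objects.
row = {
    'ragioneSociale': u'Societ\xe0 \xd2mega',
    'iva': u'IT00000000000',
}

# The u prefix makes the template a unicode string, so format() never
# needs to encode its arguments with the default ASCII codec.
template = u'<html><div>{ragioneSociale} {iva}</div></html>'
html = template.format(**row)

# Encode once, at the output boundary (file, socket, HTTP response).
html_bytes = html.encode('utf-8')
```

The sketch runs unchanged on Python 2.7 and 3. If the values actually arrive as UTF-8 byte strings rather than unicode, they would need a .decode('utf-8') before being passed to format().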

Related

python + unicodeEncodeError \xb5 while reading from excel and writing to msqldatabase

I have a Python 2.7 script that reads data from an Excel file, in which the user may have used special characters (e.g. µ), and writes it to a msqldatabase.
I've added the following line at the top of the file:
# -*- coding: utf-8 -*-
But it still uses the ascii codec. How can I solve this error?
This is the error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 19: ordinal not in range(128)
Thanks in advance.
I faced this issue while inserting values from a third-party app. There, I inserted the values as escaped strings:
from re import escape
r = escape('µ')
Result:
'\\\xc2\\\xb5'
Then pass the value of r in the INSERT statement.
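As background, the coding declaration only tells Python how to read literals in the source file; it does not change the ASCII codec used for implicit unicode-to-bytes conversions. A minimal sketch, assuming the Excel value arrives as a unicode object (cell_value is a hypothetical name), of what the implicit step does and how an explicit encode avoids it:

```python
# Hypothetical value as it might come back from the spreadsheet reader.
cell_value = u'10 \xb5m'

# This is effectively what Python 2 does behind the scenes when a unicode
# object is handed to an API that expects a byte string.
try:
    cell_value.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly sidesteps the ASCII default. 'utf-8' is an assumption
# about what the database connection expects.
utf8_bytes = cell_value.encode('utf-8')
```

Whether to pass unicode or encoded bytes depends on the database driver, which the question does not name.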

python print str.decode("utf-8") UnicodeEncodeError

I want to convert a python string (utf-8) to unicode.
word = "3–5" # – is u'\u2013' (en dash), not an ASCII character
print type(word).__name__ # output is str
print repr(word) # output is '3\xe2\x80\x935'
print word.decode("utf-8", errors='ignore')
I got this error
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 1: ordinal not in range(128)
But when I change
word.decode("utf-8", errors='ignore')
to
word.decode(errors='ignore')
the error disappears.
Why? word is a UTF-8 string, so why can't I specify utf-8 when decoding?
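For what it's worth, the decode call itself most likely succeeds here. The traceback names the 'ascii' codec and an encode error, which is what happens when print has to write a unicode object to a stdout whose encoding is ASCII. Dropping the codec name makes decode fall back to ASCII, and errors='ignore' then silently discards the three UTF-8 bytes of the dash, so nothing non-ASCII is left to print. A small sketch of both behaviours, written to run on Python 2.7 and 3:

```python
word = b'3\xe2\x80\x935'          # UTF-8 bytes, matching repr(word) above

decoded = word.decode('utf-8')    # u'3\u20135': decoding works fine
lossy = word.decode('ascii', errors='ignore')  # the dash bytes are dropped

# Roughly what print does when stdout only accepts ASCII.
try:
    decoded.encode('ascii')
    printable_as_ascii = True
except UnicodeEncodeError:
    printable_as_ascii = False

# Encoding explicitly for the terminal avoids the implicit ASCII step.
# 'utf-8' is an assumption about the terminal, not taken from the question.
terminal_bytes = decoded.encode('utf-8')
```

So the second variant does not fix anything; it only hides the problem by losing the character.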

Reading and writing UTF-8 from file

I have some text encoded in UTF-8. 'Before – after.' It was fetched from the web. The '–' character is the issue. If you try to print directly from the command line, using copy and paste:
>>> text = 'Before – after.'
>>> print text
Before – after.
But if you save to a text file and try to print:
>>> for line in open('file.txt','r'):
...     print line
Before û after.
I'm pretty sure this is some sort of UTF-8 encode/decode error, but it is eluding me. I have tried to decode and to re-encode, but neither works.
>>> for line in open('file.txt','r'):
...     print line.decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 7: invalid start byte
>>> for line in open('file.txt','r'):
...     print line.encode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 7: invalid start byte
This happens because the file contains a non-ASCII character that the codec in use cannot decode. You can strip it out and then print the ASCII values.
Take a look at this question: UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte
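The byte 0x96 is a hint about what went wrong: it is not valid UTF-8 on its own, but it is the en dash in Windows-1252, and it renders as û in CP437. Assuming that is what happened here, i.e. that file.txt was saved as Windows-1252 rather than UTF-8, reading it with that codec recovers the text. The sketch below writes a temporary file so that it is self-contained; the path and contents are stand-ins for the real file:

```python
import io
import os
import tempfile

# The bytes presumably stored in file.txt (an assumption based on the
# 0x96 in the error message).
raw = b'Before \x96 after.\n'
path = os.path.join(tempfile.mkdtemp(), 'file.txt')
with open(path, 'wb') as f:
    f.write(raw)

# What a CP437 console makes of the same bytes.
as_console = raw.decode('cp437')

# io.open accepts an encoding on both Python 2.7 and 3 and yields unicode.
with io.open(path, 'r', encoding='cp1252') as f:
    lines = [line.rstrip(u'\n') for line in f]
```

If the file really were UTF-8, the dash would appear as the three bytes 0xe2 0x80 0x93 instead of a single 0x96.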

How do I write a capital Greek "delta" as a string in Python 2.7?

I am looking for this character: Δ which I need for a legend item in matplotlib. Python 3.x features a str type that contains Unicode characters, but I couldn't find any valuable information about how to do it in Python 2.7.
import matplotlib.pyplot as plt

x = range(10)
y = [5] * 10
z = [y[i] - x[i] for i in xrange(10)]
plt.plot(x,z,label='Δ x,y')
plt.legend()
plt.show()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)
Although @berna1111's comment is correct, you don't need to use LaTeX format to get a ∆ character.
In Python 2, you need to mark a string as Unicode by using the u'' prefix. E.g.:
plt.plot(x,z,label=u'Δ x,y')
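The 0xce in the error message is the first byte of Δ in UTF-8 (0xce 0x94): without the u prefix the label is a byte string, and the implicit ASCII decode fails on it. A small sketch of equivalent ways to spell the label. The \u0394 and \N{...} escapes keep the source file pure ASCII, which avoids depending on the coding declaration:

```python
# -*- coding: utf-8 -*-
label = u'Δ x,y'                                      # needs the coding line on Python 2
label_escaped = u'\u0394 x,y'                         # same string, ASCII-only source
label_named = u'\N{GREEK CAPITAL LETTER DELTA} x,y'   # same string, by name

# The byte-string form that Python 2 produces without the u prefix.
utf8_bytes = label.encode('utf-8')

# Decoding those bytes as ASCII reproduces the error from the question.
try:
    utf8_bytes.decode('ascii')
    ascii_decodable = True
except UnicodeDecodeError:
    ascii_decodable = False
```

Any of the three unicode spellings can be passed as the label argument.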

Python: ascii codec can't encode en-dash

I'm trying to print a poem from the Poetry Foundation's daily poem RSS feed with a thermal printer that supports an encoding of CP437. This means I need to translate some characters; in this case an en-dash to a hyphen. But python won't even encode the en dash to begin with. When I try to decode the string and replace the en-dash with a hyphen I get the following error:
Traceback (most recent call last):
  File "pftest.py", line 46, in <module>
    str = str.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 140: ordinal not in range(128)
And here is my code:
#!/usr/bin/python
#-*- coding: utf-8 -*-
# This string is actually a variable entitled d['entries'][1].summary_detail.value
str = """Love brought by night a vision to my bed,
One that still wore the vesture of a child
But eighteen years of age – who sweetly smiled"""
str = str.decode('utf-8')
str = str.replace("\u2013", "-") #en dash
str = str.replace("\u2014", "--") #em dash
print (str)
I can actually print the output using the following code without errors in my terminal window (Mac), but my printer spits out sets of 3 CP437 characters:
str = u''.str.encode('utf-8')
I'm using Sublime Text as my editor, and I've saved the page with UTF-8 encoding, but I'm not sure that will help things. I would greatly appreciate any help with this code. Thank you!
I don't fully understand what's happening in your code, but I've also been trying to replace en-dashes with hyphens in a string I got from the Web, and here's what's working for me. My code is just this:
txt = re.sub(u"\u2013", "-", txt)
I'm using Python 2.7 and Sublime Text 2, but I don't bother setting -*- coding: utf-8 -*- in my script, as I'm trying not to introduce any new encoding issues. (Even though my variables may contain Unicode I like to keep my code pure ASCII.) Do you need to include Unicode in your .py file, or was that just to help with debugging?
I'll note that my txt variable is already a unicode string, i.e.
print type(txt)
produces
<type 'unicode'>
I'd be curious to know what type(str) would produce in your case.
One thing I noticed in your code is
str = str.replace("\u2013", "-") #en dash
Are you sure that does anything? My understanding is that \u only means "unicode character" inside a u"" string, and what you've created there is a string with 6 characters: a backslash, a "u", a "2", a "0", etc. (In a plain string, Python 2 does not recognize \u as an escape, and unrecognized escape sequences are left in the string unchanged, backslash included.)
Also, the fact that you get 3 CP437 characters from your printer makes me suspect that you still have an en-dash in your string. The UTF-8 encoding of an en-dash is 3 bytes: 0xe2 0x80 0x93. When you call str.encode('utf-8') on a unicode string that contains an en-dash you get those three bytes in the returned string. I'm guessing that your terminal knows how to interpret that as an en-dash and that's what you're seeing.
If you can't get my first method to work, I'll mention that I also had success with this:
txt = txt.encode('utf-8')
txt = re.sub("\xe2\x80\x93", "-", txt)
Maybe that re.sub() would work for you if you put it after your call to encode(). And in that case you might not even need that call to decode() at all. I'll confess that I really don't understand why it's there.
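A note on the traceback in the question: an encode error raised from inside decode() usually means, in Python 2, that the value was already a unicode object, which gets implicitly encoded with ASCII before the decode runs. A minimal sketch that ties the points above together, assuming the feed text may arrive either as UTF-8 bytes or as unicode. The names to_text and for_printer are hypothetical helpers, and errors='replace' is just one possible policy for characters CP437 lacks:

```python
def to_text(value, encoding='utf-8'):
    """Return a unicode string, decoding only when given bytes."""
    if isinstance(value, bytes):   # bytes is the same type as str on Python 2.7
        return value.decode(encoding)
    return value


def for_printer(value):
    """Swap dashes for ASCII equivalents, then encode for a CP437 printer."""
    text = to_text(value)
    # The u prefix matters: only then is \u2013 a single character.
    text = text.replace(u'\u2013', u'-')    # en dash
    text = text.replace(u'\u2014', u'--')   # em dash
    return text.encode('cp437', 'replace')


line = b'But eighteen years of age \xe2\x80\x93 who sweetly smiled'
printer_bytes = for_printer(line)
```

Sending printer_bytes to the printer, rather than a UTF-8 encoding of the text, avoids the three-character garbage described in the question.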