How can the Python interpreter read the charset? - python-2.7

I'm learning Python, and I learned that every comment starts with a hash "#". So how can the Python interpreter read this line
# -*- coding: utf-8 -*-
and set the charset to UTF-8? (I'm using Python 2.7.3)
Thank you in advance.

Yes, it is a comment. But that does not mean Python doesn't see it: the interpreter still reads the line, so it can parse it, too.
What Python actually does is apply the regular expression coding[:=]\s*([-\w.]+) to the first two lines of the file. Most likely this is done even before the actual Python parser steps in.
See PEP 263 for details.
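The lookup described in this answer can be sketched in a few lines. This is only an illustration, not the interpreter's real implementation (which lives in the tokenizer); the function name find_declared_encoding and the "ascii" fallback (the Python 2 default) are assumptions made for the example.

```python
from __future__ import print_function
import re

# The pattern quoted in the answer above.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def find_declared_encoding(source_text, default="ascii"):
    """Return the encoding named in a comment on line 1 or 2, else `default`.

    Hypothetical helper for illustration only.
    """
    for line in source_text.splitlines()[:2]:
        # Only comment lines are treated as declarations.
        if not line.lstrip().startswith("#"):
            continue
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return default


print(find_declared_encoding("# -*- coding: utf-8 -*-\nx = 1\n"))
print(find_declared_encoding("#!/usr/bin/python2.7\n# coding: latin-1\n"))
# A declaration on line 3 is too late, so the default is returned.
print(find_declared_encoding("x = 1\n\n# coding: utf-8\n"))
```

On Python 3, the standard library exposes the real logic as tokenize.detect_encoding, which is probably the better choice outside of a demonstration.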

Related

How to add non ASCII characters in a python list?

I am a new learner of python. I want to have a list of strings with non-ASCII characters.
This answer suggested a way to do this, but when I tried it, I got some weird results. Please see the following MWE -
#-*- coding: utf-8 -*-
mylist = ["अ,ब,क"]
print mylist
The output was ['\xe0\xa4\x85,\xe0\xa4\xac,\xe0\xa4\x95']
When I use ASCII characters in the list, let's say ["a,b,c"] the output also is ['a,b,c']. I want the output of my code to be ["अ,ब,क"]
How to do this?
PS - I am using python 2.7.16
You want to mark these as Unicode strings.
mylist = [u"अ,ब,क"]
Depending on what you want to accomplish, if the data is just a single string, it might not need to be in a list. Or perhaps you want a list of strings?
mylist = [u"अ", u"ब", u"क"]
Python 3 brings a lot of relief to working with Unicode (and doesn't need the u sigil in front of Unicode strings, because all strings are Unicode), and should definitely be your learning target unless you are specifically tasked with maintaining legacy software after Python 2 is officially abandoned at the end of this year.
Regardless of your Python version, there may still be issues with displaying Unicode on your system, in particular on older systems and on Windows.
If you are unfamiliar with encoding issues, you'll want to read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) and perhaps the Python-specific Pragmatic Unicode.
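To see where the \xe0\xa4\x85 escapes in the question come from, here is a small sketch that compares each character with its UTF-8 encoding. It assumes the file is saved as UTF-8, and it uses unicode_literals so the same code behaves alike on Python 2.7 and Python 3; the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# With unicode_literals (or on Python 3) these literals are text, not bytes.
mylist = ["अ", "ब", "क"]

for s in mylist:
    encoded = s.encode("utf-8")
    # One character of text becomes three bytes in UTF-8.
    print(len(s), len(encoded))

# Joining text strings gives text: three letters plus two commas.
joined = ",".join(mylist)
print(len(joined))
```

A Python 2 byte string holds the encoded form, and printing a list shows the repr of each element, which is why the escaped bytes appear instead of the letters.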
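Use:
#-*- coding: utf-8 -*-
mylist = ["अ,ब,क"]
# unicode(i) alone would try the ASCII codec and raise UnicodeDecodeError, so name the encoding
print [unicode(i, "utf-8") for i in mylist]
Or use:
#-*- coding: utf-8 -*-
mylist = ["अ,ब,क"]
print map(lambda s: s.decode("utf-8"), mylist)
# Printing a list shows each element's repr (u'\u0905,...'); print the elements themselves to see the characters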

Using French text in Python script

I am new to Python, and I am trying to change some text from English to French in a series of ArcGIS maps, using a Python script (running version 2.7.12) and editing it in IDLE. Following the suggestions in these posts
Write french characters in python 2.7
How to make the python interpreter correctly handle non-ASCII characters in string operations?
I used
#!/usr/bin/python2.7
# coding: utf-8
as the first lines of my script, and included a 'u' inside the brackets before the text with the French character. However, when I make the substitution, I can no longer save or run the script.
The following code generates the English text correctly:
if name[0] == "Alcids":
    elm_spp.text = '\r\n'.join(textwrap.wrap("Alcids: ANMU, CAAU, COMU, MAMU, PIGU, RHAU, UNAL",30))
The following does not allow me to save or run the script:
if name[0] == "Alcids":
    elm_spp.text = '\r\n'.join(textwrap.wrap(u"Alcidés: GUCB, SCAS, GUMA, GMRB, GUCO, MARH, ALSP",30))
Can anyone tell me what I am missing?
Thanks.

Different base64 encoding between python versions

I'm having trouble sending an html code through JSON.
I'm noticing my string values are different between python versions (2.7 and 3.5)
My string being something like: <html><p>PAÇOCA</p></html>
on Python 2.7:
x = '<html><p>PAÇOCA</p></html>'
base64.b64encode(x)
=> PGh0bWw+PHA+UEGAT0NBPC9wPjwvaHRtbD4=
on Python 3.5:
x = '<html><p>PAÇOCA</p></html>'
base64.b64encode(x)
=> b'PGh0bWw+PHA+UEHDh09DQTwvcD48L2h0bWw+'
Why are these values different?
How can I make the 3.5 string equal to the 2.7?
This is causing me troubles with receiving e-mails due to the accents being lost.
Your example x values are not valid Python so it is difficult to tell where the code went wrong, but the answer is to use Unicode strings and explicitly encode them to get consistent answers. The below code gives the same answer in Python 2 and 3, although Python 3 decorates byte strings with b'' when printed. Save the source file in the encoding declared via #coding. The source code encoding can be any encoding that supports the characters used in the source file. Typically UTF-8 is used for non-ASCII source code, but I made it deliberately different to show it doesn't matter.
#coding:cp1252
from __future__ import print_function
import base64
x = u'<html><p>PAÇOCA</p></html>'.encode('utf8')
enc = base64.b64encode(x)
print(enc)
Output using Pylauncher to choose the major Python version:
C:\>py -2 test.py
PGh0bWw+PHA+UEHDh09DQTwvcD48L2h0bWw+
C:\>py -3 test.py
b'PGh0bWw+PHA+UEHDh09DQTwvcD48L2h0bWw+'
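As for why the two values in the question differ, a likely explanation is that the Python 2.7 byte string held Ç as the single byte 0x80, while Python 3.5 encoded it as the two UTF-8 bytes 0xC3 0x87. The sketch below reproduces both results; cp850 is an assumption about the asker's console encoding (other code pages also map Ç to 0x80), and the character is written as an escape so the file's own encoding plays no part.

```python
from __future__ import print_function
import base64

# \u00c7 is the letter Ç; the escape keeps this source file pure ASCII.
text = u"<html><p>PA\u00c7OCA</p></html>"

for codec in ("cp850", "utf-8"):
    raw = text.encode(codec)
    # Different bytes in, different base64 text out.
    print(codec, len(raw), base64.b64encode(raw).decode("ascii"))

# Decoding with the same codec that was used for encoding restores the text.
round_trip = base64.b64decode(base64.b64encode(text.encode("utf-8"))).decode("utf-8")
print(round_trip == text)
```

Whichever encoding is chosen, the receiving side has to decode with the same one, otherwise the accented characters are lost.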

Is it possible to create string decorators?

In Python I can automatically create a unicode object by prepending a u (as in u"test").
Is it possible to build something like that myself?
All things are possible - but in this case, only by modifying the source code of the Python interpreter and recompiling.
A related question with the same answer: Can you add new statements to Python's syntax?
Yes you can.
All you need to do is type the following:
ur"\u<hex>"
For example, if you were to type
print ur"\u0489"
it would output the following character (provided your font contains it)
҉
To print this, you could simply type
print "҉"
but for this to be allowed, you must put the following line as the first or second line of the file
# -*- coding: utf-8 -*-
Yes, I know it has the # symbol, that is supposed to be there.
Hope this helps! Have fun with unicoding! :)

Accents/special characters (e.g., ñ) in verbose_name or help_text ?

How do I use letters with accent marks or special characters like ñ in verbose_name or help_text?
Include this at the top of your file:
# -*- coding: utf-8 -*-
and then use this:
u'áéíóú'
I did it:
import os, sys
#encoding= utf-8
Thanks
@diegueus9 has the right answer for using raw Unicode characters in the source file. Use whatever characters you like as long as you declare the encoding as per the instructions in PEP 263. However, if you only need a few special characters, you may find this easier: declare the string as Unicode with the u prefix and use the character's code point. The following are equivalent ways of writing "ñ":
help_text=u'\xF1 \u00F1 \U000000F1'
When it comes to actually finding the code point for a character...that's a little harder. Windows has the useful Character Map utility. gucharmap is similar. The charts at unicode.org provide alphabet-specific PDFs you can search through. Anyone know an easier way?
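One possible answer to the lookup question, offered as a sketch rather than the definitive way: the built-in ord and the standard unicodedata module can report a character's code point and official name, and the \N{...} escape lets you write a character by name.

```python
from __future__ import print_function
import unicodedata

ch = u"\u00f1"  # the letter ñ, written as an escape

# Code point and official Unicode name of the character.
print(hex(ord(ch)))
print(unicodedata.name(ch))

# Going the other way: from the name back to the character.
by_name = unicodedata.lookup("LATIN SMALL LETTER N WITH TILDE")
print(by_name == ch)

# All of these spellings denote the same one-character string.
print(u"\xF1" == u"\u00F1" == u"\U000000F1" == u"\N{LATIN SMALL LETTER N WITH TILDE}")
```

If the character can be pasted into an interactive session, ord gives the code point directly, so no chart lookup is needed.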