how to use stdscr.addstr() (curses) to print unicode characters - python-2.7

I know how to print Unicode characters with the print() function, but I do not know how to do it with stdscr.addstr().
I'm using Python 2.7 on a Linux operating system.
Thanks

I'm pretty sure you need to encode the string.
The docs read:
Since version 5.4, the ncurses library decides how to interpret non-ASCII data using the nl_langinfo function. That means that you have to call locale.setlocale() in the application and encode Unicode strings using one of the system’s available encodings.
This example worked for me in 2.7.12:
import curses
import locale

locale.setlocale(locale.LC_ALL, '')  # must happen before initscr()
stdscr = curses.initscr()  # or use the stdscr you already have
stdscr.addstr(0, 0, mystring.encode('UTF-8'))  # mystring is a unicode string
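If a terminal isn't handy, the encoding half of that recipe can be checked on its own; a minimal sketch, with an arbitrary sample string standing in for mystring:

```python
import locale

# Adopt the system locale, as the ncurses docs require.
locale.setlocale(locale.LC_ALL, '')

# addstr() on Python 2 wants a byte string, so encode explicitly.
text = u'pr\xe9senter'
data = text.encode('UTF-8')  # b'pr\xc3\xa9senter'
```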

Related

Problems in codification - unicode vs. utf-8 in python 2.7

Well, my Python script is supposed to open all the UTF-8 YAML files in a directory and show their content to the user. But words with accents, such as the French word présenter, are shown like this: u'pr\xe9senter'. I need them to be shown properly to the user.
Here is my code:
import glob
import yaml

files = glob.glob("data/*.yaml")

def read_yaml_file(filename):
    with open(filename, 'r') as stream:
        try:
            print(yaml.safe_load(stream))
        except yaml.YAMLError as exc:
            print(exc)

for file in files:
    read_yaml_file(file)
I already tried an import from __future__, but it didn't work. Does anyone know how to solve it?
Unicode in 2.x is painful. If you can, use current Python 3, in which text is unicode (printed without a 'u' prefix) rather than bytes (printed with a 'b' prefix).
>>> print(u"pr\xe9senter") # 3.8
présenter
You also need a system console/terminal or IDE that displays glyphs for the codepoints in your yaml files.
If you are a masochist or otherwise stuck on 2.7, use sys.stdout.write(). Note that you must explicitly write '\n's.
>>> import sys; sys.stdout.write(u"pr\xe9senter\n") # 2.7
présenter
This question is not really about IDLE. However, the above lines work in both standard interactive Python on Windows 10 and in IDLE. IDLE uses tkinter which uses tcl/tk. Tk itself can handle all Basic Multilingual Plane (BMP) characters (the first 64K), but only those. Which BMP characters it can display depends on your OS and its current fonts.
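As a rough illustration of that BMP limit, one can test whether every codepoint in a string falls within the first 64K; a Python 3 sketch (fits_bmp is a made-up helper, not a Tk API):

```python
def fits_bmp(text):
    # Tk can only display Basic Multilingual Plane codepoints (U+0000..U+FFFF).
    return all(ord(ch) <= 0xFFFF for ch in text)

in_bmp = fits_bmp(u'pr\xe9senter')   # True: every codepoint is in the BMP
beyond = fits_bmp(u'\U0001f601')     # False: an SMP emoji lies beyond it
```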

encoding in py_compile vs import

When a Python script with a non-ASCII character is compiled using py_compile.compile, it does not complain about the encoding. But when imported, Python 2.7 raises:
SyntaxError: Non-ASCII character '\xe2' in file
Why is this happening? What's the difference between importing and compiling with py_compile?
It seems that Python provides two variants of its lexer: one used internally when Python itself parses files, and one exposed to Python through e.g. __builtins__.compile or tokenize.generate_tokens. Only the former checks for non-ASCII characters, it seems. This is controlled by an #ifdef PGEN in Parser/tokenizer.c.
I have a qualified guess on why they did it this way: in Python 3, non-ASCII characters are permitted in .py files and are interpreted as UTF-8, IIRC. By silently permitting UTF-8 in the lexer, 2.7's tokenize.generate_tokens() function can accept all valid Python 3 code.
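For completeness, the usual way to make the import-time lexer accept non-ASCII source in 2.x is a PEP 263 coding declaration at the top of the file; a minimal sketch:

```python
# -*- coding: utf-8 -*-
# With this declaration at the top of the file, the import-time lexer
# decodes the source as UTF-8 instead of rejecting non-ASCII bytes.
greeting = u'présenter'
```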

Correct len() for 32-bit unicode strings in Python

I am facing a problem with 32-bit unicode strings in Python 2.7. A simple declaration such as:
s = u'\U0001f601'
print s
Will print a nice 😁 (smiley face) in the shell (if the shell supports unicode). The problem is that when I try:
print len(s), s.encode('latin-1', errors='replace')
I get different responses for different platforms. In Linux, I get:
1 ?
But in Mac, I get:
2 ??
Is the string declaration correct? Is this a bug in Python for Mac?
The OS X Python has been compiled with UCS-2 (really UTF-16) support, versus UCS-4 support on Linux. This means that on OS X the SMP character is represented by a surrogate pair, which has a length of 2 characters.
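That surrogate pair can be reproduced explicitly by encoding to UTF-16, which is what a narrow build stores internally; a sketch that runs on any modern Python:

```python
# U+1F601 lies outside the BMP, so UTF-16 stores it as two 16-bit code
# units: a high surrogate 0xD83D followed by a low surrogate 0xDE01.
pair = u'\U0001f601'.encode('utf-16-be')
units = len(pair) // 2  # 2 code units, hence len() == 2 on a narrow build
```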

Python 2.7 range regex matching unicode emoticons

How do I count the number of unicode emoticons in a string using a Python 2.7 regex? I tried the first answer posted for this question, but it keeps raising an invalid-expression error.
re.findall(u'[\U0001f600-\U0001f650]', s.decode('utf-8')) is not working and raises the invalid-expression error.
How to find and count emoticons in a string using python?
"Thank you for helping out 😊(Emoticon1) Smiley emoticon rocks!😉(Emoticon2)"
Count : 2
The problem is probably due to using a "narrow build" of Python 2. That is, if you fire up your interpreter, you'll find that sys.maxunicode == 0xffff is True.
This site has a few interesting notes on wide builds of Python (which are commonly found on Linux, but not, as the link suggests, on OS X in my experience). These builds use UCS-4 internally to encode characters, and as a result seem to have saner support for higher range Unicode code points, such as the ranges you are talking about. Narrow builds apparently use UTF-16 internally, and as a result encode these higher code points using "surrogate pairs". I presume this is the reason you see a bad character range error when you try and compile this regular expression.
The only solution I know is to switch to Python >= 3.3, which no longer has the wide/narrow distinction, or to install a wide Python build.
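On a build without the narrow/wide split (Python 3.3+), the original range compiles and counts as expected; a quick sketch using the question's sample string:

```python
import re

# On Python 3.3+ the astral range compiles directly; on a narrow 2.x
# build the same pattern fails with a bad-character-range error.
emoji = re.compile(u'[\U0001f600-\U0001f650]')
s = u'Thank you for helping out \U0001f60a Smiley emoticon rocks!\U0001f609'
count = len(emoji.findall(s))  # 2
```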

How do I create a unicode filename in Linux?

I heard fopen supports UTF-8, but I don't know how to convert an array of shorts to UTF-8.
How do I create a file with unicode letters in its name? I prefer to use only built-in libraries (no Boost, which is not installed on the Linux box). I do need to use fopen, but it's pretty simple to use.
fopen(3) accepts any valid byte sequence; the encoding is unimportant to it. Use nl_langinfo(3) with CODESET to find out which charset you should encode to, and libiconv or ICU for the actual conversion.
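The same CODESET lookup is exposed in Python on Unix, for comparison; a sketch (the reported charset depends on the environment, and the filename here is an arbitrary example):

```python
import locale

locale.setlocale(locale.LC_ALL, '')           # adopt the environment's locale
codeset = locale.nl_langinfo(locale.CODESET)  # e.g. 'UTF-8' on most desktops

# A byte filename ready for open()/fopen(3), encoded explicitly here:
name = u'r\xe9sum\xe9.txt'.encode('utf-8')
```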