ISO-10646 XFont encoding issue

I'm trying to use an ISO-10646 fixed font in my X Window program. It has to support English, Russian and Greek letters, but it doesn't. In the xfontsel window everything looks fine, but in my program only English displays correctly. Compiling with something like g++ -fexec-charset=ISO-10646 ... or g++ -finput-charset=ISO-10646 ... didn't help. How can I fix it?
Test app window screenshot:
xfontsel window screenshot:
Test app code:
#include <X11/StringDefs.h>
#include <X11/Xaw/Command.h>
int main(int argc, char **argv) {
Widget widget = XtInitialize(argv[0], "simple", NULL, 0, &argc, argv);
XtVaCreateManagedWidget(
"English Русский ελληνικά", labelWidgetClass, widget,
XtNfont, XLoadQueryFont(XtDisplay(widget),
"-Misc-Fixed-Medium-R-Normal--20-200-75-75-C-100-ISO10646-1"
), XtNwidth, 500, XtNheight, 100, NULL
);
XtRealizeWidget(widget);
XtMainLoop();
}

Your program passes a UTF-8-encoded char string, which is not what Xt/Xaw expects. -fexec-charset won't help here.
With pure Xlib (no toolkit) you would use Xutf8DrawString and friends, but Xt and Xaw have no provision for that.
Xaw theoretically supports 2-byte encodings of labels with XtNencoding set to XawTextEncodingChar2b, but I could never make it work with UTF-16.
XChar2b lbl[] = { {0x04, 0x40}, {0x04, 0x43},
{0x04, 0x41}, {0x04, 0x41},
{0x04, 0x3a}, {0x04, 0x38},
{0x04, 0x39}, {0, 0}};
XtVaCreateManagedWidget(
"w00t", labelWidgetClass, widget,
XtNfont, XLoadQueryFont(XtDisplay(widget),
"-Misc-Fixed-Medium-R-Normal--20-200-75-75-C-100-ISO10646-1"
), XtNwidth, 500, XtNheight, 100,
XtNencoding, XawTextEncodingChar2b,
XtNlabel, lbl,
NULL
);
This particular string works. I hope you can figure out how to produce an arbitrary label. However, a null byte at either high or low position terminates the string, so no English text can be displayed with this method.
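For producing an arbitrary label, a rough sketch of a UTF-8 to 2-byte converter might look like the following. The names Char2b and utf8ToChar2b are made up for this illustration; Char2b only mirrors the byte1/byte2 layout of Xlib's XChar2b so that the sketch compiles without the X11 headers. It assumes the input stays within the Basic Multilingual Plane and substitutes U+FFFD for anything else.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for Xlib's XChar2b (byte1 = high byte, byte2 = low byte),
// declared locally so the sketch compiles without the X11 headers.
struct Char2b {
    unsigned char byte1;
    unsigned char byte2;
};

// Decode UTF-8 into big-endian 2-byte characters, terminated by {0, 0}.
// Code points above U+FFFF and malformed bytes are replaced by U+FFFD.
std::vector<Char2b> utf8ToChar2b(const std::string &utf8) {
    std::vector<Char2b> out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else                            { cp = 0xFFFD;      extra = 0; }
        ++i;
        for (std::size_t k = 0; k < extra; ++k, ++i) {
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
                cp = 0xFFFD;  // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }
        if (cp > 0xFFFF) cp = 0xFFFD;  // does not fit into two bytes
        Char2b c = { static_cast<unsigned char>(cp >> 8),
                     static_cast<unsigned char>(cp & 0xFF) };
        out.push_back(c);
    }
    Char2b terminator = { 0, 0 };
    out.push_back(terminator);
    return out;
}
```

For the UTF-8 bytes of "русский" this yields the same seven byte pairs as the hand-written array above. It does not lift the limitation just described: any character with a zero high or low byte, which includes all of ASCII, still cuts the label short.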
Perhaps a copy of Xaw on my machine is not up to date. There is a patch that should make it work here but I don't know if it's applied on this copy and cannot be bothered to build a patched version from sources. Perhaps you shouldn't rely on it if you want to distribute your code dynamically linked, as not every machine will have an up-to-date Xaw library. This patch was made in 2014.
My advice would be not to rely on i18n abilities of Xaw. Use raw X11 with Xutf8DrawString, or a modern toolkit such as Qt or Gtk or FLTK or wxWidgets, which should all work with UTF-8 seamlessly. As the last resort, subclass Xaw widgets as needed and make them work with Xutf8DrawString.
Update: I have checked the source from a Gentoo ebuild, which ought to be up to date. The patch is not applied there; strlen is used everywhere, so XChar2b cannot work. A typical Xaw code fragment:
len = strlen(label);
...
if (len) {
if (w->label.encoding)
XDrawString16(XtDisplay(gw), XtWindow(gw), gc,
w->label.label_x, y, (XChar2b *)label, len / 2);
else
XDrawString(XtDisplay(gw), XtWindow(gw), gc,
w->label.label_x, y, label, len);
}
Clearly this cannot possibly have any hope of working correctly.
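To make the failure concrete, here is a tiny sketch that reproduces only the length computation from the fragment above. The function name charsDrawnViaStrlen is invented for this illustration and is not part of Xaw.

```cpp
#include <cstddef>
#include <cstring>

// Number of 2-byte characters the quoted fragment would hand to
// XDrawString16: strlen() stops at the first zero byte, then the
// byte count is halved.
std::size_t charsDrawnViaStrlen(const char *label) {
    return std::strlen(label) / 2;
}
```

With the bytes 04 40 04 43 (two Cyrillic letters) the result is 2, but with 00 41 00 42 (the Latin letters "AB" as 2-byte characters) it is 0, because the very first byte is already a terminator. A mixed label such as 04 40 00 41 is cut off after the first character.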

Related

Not eliding correctly on QListView in Windows (OS)

I work with both operating systems (Windows and Linux). In Linux (image 1) ElideRight works well, but in Windows (image 2) it does not: an "à" is displayed where the ellipsis (...) is supposed to be.
I use the code below for "Eliding".
Also you need to know, this is happening in QListView.
ui->geometry_list->setTextElideMode(Qt::ElideRight);

I tried to reproduce the OP's issue with the following MCVE testQListViewElide.cc:
// Qt header:
#include <QtWidgets>
// main application
int main(int argc, char **argv)
{
qDebug() << "Qt Version:" << QT_VERSION_STR;
QApplication app(argc, argv);
// setup GUI
QListWidget qLst;
qLst.resize(200, 200);
qLst.setTextElideMode(Qt::ElideRight);
qLst.addItem(QString("A very long item text to make the elide feature visible"));
qLst.show();
// runtime loop
return app.exec();
}
My platform is Visual Studio 2019 on Windows 10.
Output:
Qt Version: 5.15.1
There are in fact no ellipses. However… Please, note the horizontal scrollbar.
So, I added one line to switch the scrollbar off:
qLst.setHorizontalScrollBarPolicy(Qt::ScrollBarAlwaysOff);
Output:
So, this is working in Windows in general. (I would have been surprised if not.)
The OP's claim that an à is shown instead of the ellipsis made me a bit suspicious. This could be a sign of encoding problems. However, as Qt uses Unicode in QString, such issues are unlikely. (I have never experienced such issues in my daily work with Qt on Windows.)
Just, out of curiosity, I compared the Windows 1252 encoding of à (224 = 0xE0) with the Unicode of ellipses (U+2026), in UTF-8 encoding: e2 80 a6, in UTF-16 encoding: 26 20 (LE) or 20 26 (BE).
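The byte values in this comparison can be reproduced with a few lines of plain C++. This is only an illustrative sketch; encodeUtf8 and encodeUtf16le are helper names made up here and are not Qt functions.

```cpp
#include <string>

// Encode one Unicode code point (up to U+10FFFF) as UTF-8 bytes.
std::string encodeUtf8(unsigned long cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode a code point from the Basic Multilingual Plane as two
// UTF-16 little-endian bytes (surrogate pairs are not handled).
std::string encodeUtf16le(unsigned int cp) {
    std::string out;
    out += static_cast<char>(cp & 0xFF);         // low byte first
    out += static_cast<char>((cp >> 8) & 0xFF);  // then high byte
    return out;
}
```

Calling encodeUtf8(0x2026) gives the bytes e2 80 a6 and encodeUtf16le(0x2026) gives 26 20. Neither sequence contains the byte 0xE0, which is the Windows-1252 code of à.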
This doesn't look like a misinterpreted encoding, at least not an obvious one. However, to sort this out, the OP would have to provide a bit more information, e.g. an MCVE which makes the issue reproducible.
(The OP could use my MCVE to check whether it reproduces the issue on the OP's platform.)
I suspect that this is an encoding problem, but one that happens while the item texts are stored in QStrings. The list view merely exposes the broken item text. Consider that strings retrieved in Linux are very likely already UTF-8 encoded. If a QString is assigned from a std::string, UTF-8 encoding is assumed as well (QString::fromStdString()).
This is different in Windows, where the internal encoding is UTF-16 but the ANSI flavors of the system functions (whose character values mean different things depending on the current code page) are still available, and these are a reliable source of encoding damage.

Visualisation of UTF-8 (Polish) not working properly

My software supports multiple languages (English, German, Polish, Russian, ...). For this reason I have language-specific files containing the dialog texts in each language (encoded as UTF-8).
In my MFC application I open and read those files and insert the text into my AfxMessageBoxes and other UI windows.
// Get the codepage number. 65001 = UTF-8
// In the real code this is a parameter in the function I call (just for clarification)
LANGID languageID = 65001;
TCHAR szCodepage[10];
GetLocaleInfo (MAKELCID (languageID, SORT_DEFAULT), LOCALE_IDEFAULTANSICODEPAGE, szCodepage, 10);
int nAnsiCodePage = _ttoi (szCodepage);
// Open the file
CFile file;
CString filename = getName();
if (!file.Open(filename, CFile::modeRead, NULL))
{
//Check if everything is fine, else break
}
// Read the file
CString inString;
int len = file.GetLength ();
UINT n = file.Read (inString.GetBuffer(len), len);
inString.ReleaseBuffer ();
int size = MultiByteToWideChar (CP_ACP, 0, strAllItems, -1, NULL, 0);
WCHAR *ubuf = new WCHAR[size + 1];
MultiByteToWideChar ((UINT) nAnsiCodePage, (nAnsiCodePage == CP_UTF8 ?
0 : MB_PRECOMPOSED), inString, -1, ubuf, (int) size);
outString = ubuf;
file.Close ();
Result:
This mechanism works fine for the special letters of Russian and German, but not for Polish. I already checked the UTF-8 table (http://www.utf8-chartable.de/unicode-utf8-table.pl?number=1024) and the Polish characters are part of it.
I also checked the hex values of my CString and everything seems to be all right, but it is not displayed correctly. Just for testing, I changed the codepage from UTF-8 to 1250 (Eastern Europe, Polish included) and it did not work either.
What am I doing wrong?
EDIT:
When I use:
MultiByteToWideChar (CP_UTF8 , 0, inString, -1, ubuf, (int) size);
The hex values are shortened to the "best match" letters, meaning my result is: mezczyzna
I am using Windows 7 with the English language selected.

Well, you have two options:
A. Make your application Unicode. You don't tell us whether it actually is, but I conclude it's not. This is the "best" solution technically, but it may require a lot of effort, and it may not even be feasible at all (e.g. because of non-Unicode libraries).
B. If your app is non-Unicode, you have some limitations:
- Your application will only be capable of displaying correctly one codepage using the non-unicode APIs & messages, and this unfortunately cannot be set per application, it's globally set in Windows with the "Language for non-Unicode programs" option, and requires a reboot.
- To display correctly strings containing characters not in the default codepage, you need to convert them to Unicode and use the "wide" versions of APIs & messages explicitly, to display them (eg MessageBoxW()). A little cumbersome, but doable, if the operation concerns only a small number of controls.
The machine you're working on has some Western European language as the "Language for non-Unicode programs". I come to this conclusion because "This mechanism is working fine for special letters of russian and german" and "Using MessageBoxA(0, "mężczyzna", 0, 0) does not work", as you said (though I'm not sure at all about Russian, as it uses a different codepage).
Apart from this, as IInspectable said, int size = MultiByteToWideChar (CP_ACP, 0, strAllItems, -1, NULL, 0); makes no sense at all, as the string is known to be UTF-8 and not in the default codepage. You may also need to remove the UTF-8 BOM header, if your file contains it.
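Removing the BOM is simple enough to sketch in portable C++. The helper name stripUtf8Bom is invented for this example, and it uses std::string rather than CString, so treat it as an illustration of the check and not as drop-in MFC code.

```cpp
#include <string>

// Return the text without a leading UTF-8 byte order mark (EF BB BF).
// Text that does not start with a complete BOM is returned unchanged.
std::string stripUtf8Bom(const std::string &text) {
    if (text.size() >= 3 && text.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        return text.substr(3);
    }
    return text;
}
```

The stripped bytes would then be the input for the UTF-8 to wide-character conversion.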

Writing unicode(?) character directly from source code to WriteConsoleOutput

I'm trying to use WriteConsoleOutput from the WinApi to write characters to the command prompt window buffer. The thing is, I'd really like to be able to write characters such as ☺ directly into the source code, as-is, instead of using some kind of encoding/notation like '\uFFFF' or '0xFF', since I don't understand them too well (differences between codepages/character sets/etc.)
The code below showcases the simplest form of my problem. Running this code does not print ☺ into the command prompt window, but a question mark (?) instead.
#include <Windows.h>
int main()
{
HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
CHAR_INFO c[1] = {0};
COORD cS = {1, 1};
COORD cH = {0, 0};
SMALL_RECT sr = {0, 0, 0, 0};
c[0].Attributes = FOREGROUND_INTENSITY;
c[0].Char.UnicodeChar = '☺';
WriteConsoleOutput(h, c, cS, cH, &sr);
Sleep(5000);
return 0;
}
It is vital for my code to display output identically between all Windows versions, regardless of the languages installed/used. So to my knowledge (which admittedly is absolutely minimal), I'd need to set a specific codepage (one which would hopefully be supported by the command prompt in any language Windows).
I've tried:
• Changing from using the CHAR_INFO.UnicodeChar to CHAR_INFO.AsciiChar
• Fiddling around with SetConsoleCP and SetConsoleOutputCP functions, but I haven't got a clue on how to utilize them to help me with this problem.
• Changing the Visual Studio -> Project -> Project properties.. -> Character Set setting to every possible value.
• Using specifically either WriteConsoleOutputA or WriteConsoleOutputW in addition to the aforementioned settings
• Changing the source code file encoding to UTF-8 with(/out) signature.
In my project I'm programmatically setting the command prompt font to 8x8 Terminal, which to my knowledge does not support actual unicode characters. The available characters are displayed here. Those characters do include '☺', so I'm not entirely sure my question is about unicode. I have no idea anymore. Please help.

C source should be ASCII only. If you embed non-ASCII characters in a C source file, an IDE might show them in what appears to be the correct form, but the compiler quite likely treats them differently, and the function you pass them to can treat them differently still. It's just not portable or reliable. But you can use the escape sequence \x to embed arbitrary bytes in C strings.
UTF-8 is good for internal use, but Windows APIs don't yet support it, so you need to convert to Windows 16-bit chars (nearly, but not quite, UTF-16) to display extended characters. You also have to ensure that you are calling the wide-character version of the Windows API. Most Windows API functions that take strings come in an A and a W version (ANSI and wide) for binary backwards compatibility. If you look up the identifier in the IDE (go to definition, etc.), you should see which version you have.
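As an illustration of the \x approach, the following sketch turns a byte string into the body of an ASCII-only string literal. The function escapeNonAscii is a made-up helper, not a standard or Windows facility, and it deliberately ignores other characters that would need escaping in a literal, such as quotes, backslashes and control characters.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Rewrite a byte string as the body of an ASCII-only C string literal.
// Bytes >= 0x80 become \xNN escapes. If a hex digit follows an escape,
// the literal is split with "" so the digit is not absorbed by the escape.
std::string escapeNonAscii(const std::string &bytes) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    bool lastWasEscape = false;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b >= 0x80) {
            out += "\\x";
            out += hex[b >> 4];
            out += hex[b & 0x0F];
            lastWasEscape = true;
        } else {
            if (lastWasEscape && std::isxdigit(b)) {
                out += "\"\"";  // end the literal and start a new one
            }
            out += static_cast<char>(b);
            lastWasEscape = false;
        }
    }
    return out;
}
```

Given the UTF-8 bytes of ☺ (U+263A), the result is the text \xE2\x98\xBA, which can be pasted between quotes in a source file. Note that these are still UTF-8 bytes; for a wide-character API they would have to be converted to 16-bit characters first.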

Why Non-Unicode apps system locale makes Unicode fonts with symbol charset displayed incorrectly?

I'm trying to display Unicode chars from Wingdings font (it's Unicode TrueType font supporting symbol charset only).
It's displayed correctly on my Win7/64 system using corresponding regional OS settings:
Formats: Russian
Location: Russia
System locale (AKA Language for Non-Unicode applications): English
But if I switch System locale to Russian, Unicode characters with codes > 127 are displayed incorrectly (replaced with boxes).
My application is created as using Unicode Charset in Visual Studio, it calls only Unicode Windows API functions.
Also I noted that several Windows apps also display such chars incorrectly with symbol fonts (Symbol, Wingdings, Webdings etc), e.g. Notepad, Beyond Compare 3. But WordPad and MS Office apps aren't affected.
Here is minimal code snippet (resources cleanup skipped for brevity):
LOGFONTW lf = { 0 };
lf.lfCharSet = SYMBOL_CHARSET;
lf.lfHeight = 50;
wcscpy_s(lf.lfFaceName, L"Wingdings");
HFONT f = CreateFontIndirectW(&lf);
SelectObject(hdc, f);
// First two chars displayed OK, 3rd and 4th aren't (replaced with boxes) if
// Non-Unicode apps language is NOT English.
TextOutW(hdc, 10, 10, L"\x7d\x7e\x81\xfc");
So the question is: why the hell Non-Unicode apps language setting affects Unicode apps?
And what is the correct (and most simple) way to display SYMBOL_CHARSET fonts without dependency to OS system locale?

The root cause of the problem is that Wingdings is actually a non-Unicode font. It supports Unicode only partially, so some symbols are still displayed correctly. See @Adrian McCarthy's answer for details about how it probably works under the hood.
Also see more info here: http://www.fileformat.info/info/unicode/font/wingdings
and here: http://www.alanwood.net/demos/wingdings.html
So what can we do to avoid such problems? I found several ways:
1. Quick & dirty
Fall back to the ANSI version of the API, as @user1793036 suggested:
TextOutA(hdc, 10, 10, "\x7d\x7e\x81\xfc"); // Displayed correctly!
2. Quick & clean
Use the Unicode range U+F000 to U+F0FF (in the Private Use Area) instead of the ASCII character codes. Wingdings supports it:
TextOutW(hdc, 10, 10, L"\xf07d\xf07e\xf081\xf0fc"); // Displayed correctly!
To explore which Unicode symbols are actually supported by a font, a font viewer can be used, e.g. dp4 Font Viewer.
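The remapping used in option 2 is just an offset of 0xF000 added to each 8-bit code. A small portable sketch, with toSymbolPua as a hypothetical helper name and std::u16string standing in for the 16-bit wide strings used on Windows:

```cpp
#include <cstddef>
#include <string>

// Map 8-bit symbol-font character codes to U+F000..U+F0FF
// by adding 0xF000 to each code.
std::u16string toSymbolPua(const std::string &codes) {
    std::u16string out;
    out.reserve(codes.size());
    for (std::size_t i = 0; i < codes.size(); ++i) {
        unsigned char code = static_cast<unsigned char>(codes[i]);
        out += static_cast<char16_t>(0xF000u + code);
    }
    return out;
}
```

For the codes 7d 7e 81 fc this produces U+F07D U+F07E U+F081 U+F0FC, the same values as in the TextOutW call above.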
3. Slow & clean, but generic
But what should you do if you don't know which characters you have to display and which font will actually be used? Here is the most universal solution: draw the text by glyphs to avoid any undesired translations:
void TextOutByGlyphs(HDC hdc, int x, int y, const CStringW& text)
{
CStringW glyphs;
GCP_RESULTSW gcpRes = {0};
gcpRes.lStructSize = sizeof(GCP_RESULTS);
gcpRes.lpGlyphs = glyphs.GetBuffer(text.GetLength());
gcpRes.nGlyphs = text.GetLength();
const DWORD flags = GetFontLanguageInfo(hdc) & FLI_MASK;
GetCharacterPlacementW(hdc, text.GetString(), text.GetLength(), 0,
&gcpRes, flags);
glyphs.ReleaseBuffer(gcpRes.nGlyphs);
ExtTextOutW(hdc, x, y, ETO_GLYPH_INDEX, NULL, glyphs.GetString(),
glyphs.GetLength(), NULL);
}
TextOutByGlyphs(hdc, 10, 10, L"\x7d\x7e\x81\xfc"); // Displayed correctly!
Note the use of the GetCharacterPlacementW() function. For some unknown reason, the similar function GetGlyphIndicesW() does not work here: it returns 'unsupported' dummy values for chars > 127.

Here's what I think is happening:
The Wingdings font doesn't have Unicode mappings (a cmap table?). (You can see this by using charmap.exe: the Character set drop down control is grayed out.)
For fonts without Unicode mappings, I think Windows assumes that it depends on the "Language for Non-Unicode applications" setting.
When that's English, Windows (probably) uses code page 1252, and all the values map to themselves.
When that's Russian, Windows (probably) uses code page 1251, and then tries to remap them.
The '\x81' value in code page 1251 maps to U+0403, which obviously doesn't exist in the font, so you get a box. Similarly, '\xFC' maps to U+044C.
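The two mappings mentioned here can be checked with a partial code page 1251 table. The sketch below covers only ASCII, the byte 0x81 and the contiguous block 0xC0..0xFF (which maps to U+0410..U+044F); every other byte is outside the scope of this illustration and yields U+FFFD. The function name cp1251ToUnicode is made up for this example.

```cpp
// Partial Windows-1251 to Unicode mapping: ASCII, 0x81, and the block
// 0xC0..0xFF (U+0410..U+044F). Bytes not covered by this sketch
// yield U+FFFD.
unsigned int cp1251ToUnicode(unsigned char b) {
    if (b < 0x80) return b;                        // ASCII maps to itself
    if (b == 0x81) return 0x0403;                  // the value from the text
    if (b >= 0xC0) return 0x0410u + (b - 0xC0u);   // contiguous Cyrillic block
    return 0xFFFD;                                 // not covered here
}
```

The first two characters of the test string, 0x7D and 0x7E, map to themselves, which fits the observation that they are displayed correctly, while 0x81 and 0xFC end up at U+0403 and U+044C.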
I assumed that if you used ExtTextOutW with the ETO_GLYPH_INDEX flag, Windows wouldn't try to interpret the values at all and just treat them as glyph indexes into the font. But that assumption is wrong.
However, there is another flag called ETO_IGNORELANGUAGE, which is reserved, but, empirically, it seems to solve the problem.

Linux send unicode character to active application

Ok, so I'm trying to develop an app using C++ and Qt4 for Linux that will map certain key sequences to special Unicode characters. Also, I'm trying to make it bilingual, so the special Unicode character sent depends on the selected language. Example: AltGr+s will send ß or ș, depending whether German or Romanian is selected. On Windows, I have achieved this using AutoHotKey. However, I couldn't get IronAHK to work on Linux so I have written myself a nice Qt Application for it, using Qxt to register "global" shortcuts. I have tried this snippet:
void mainWnd::sendKeypress( unsigned int keycode )
{
Display *display = QX11Info::display();
Window curr_focus;
int revert_to;
XGetInputFocus( display, &curr_focus, &revert_to );
XTestFakeKeyEvent( display, keycode, true, 0 );
XTestFakeKeyEvent( display, keycode, false, 1 );
XFlush( display );
}
copied from another application (where it works), but here it seems to do nothing. There might also be a problem with the fact that the characters I'm trying to send aren't found on the US 101 keyboard that I currently use on my laptop (and as the layout in the OS).
So my question is: how do I make the app send a Unicode character to whichever app has focus, inserting a special character(sort of like KCharMap)? Remember, these are special characters which are not found on a normal US Keyboard. Thanks in advance.