Show Arabic text in GUI - C++

Is there a way to show the following as Arabic text?
I use char *sql and ANSI strings frequently in my project. What can I do?

Basically, a char alone is not enough, since it can hold only 256 distinct values. In your code you can use wchar_t instead of char.
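A minimal sketch of why char falls short: every letter in the Arabic block has a code point above 255, so it fits in a single wchar_t element but not in a single char. The word below is spelled with universal character names so the example does not depend on the encoding the source file is saved in; the helper name count_beyond_8_bits is purely illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "لعبة" ("game"): LAM, AIN, BEH, TEH MARBUTA (U+0644, U+0639, U+0628, U+0629).
// wchar_t is 16 bits on Windows and 32 bits on Linux/macOS; both are wide
// enough to hold each of these letters in one element.
const std::wstring kArabicWord = L"\u0644\u0639\u0628\u0629";

// Counts the characters of `s` whose value is above 255, i.e. those that
// could not be stored in a single 8-bit char.
std::size_t count_beyond_8_bits(const std::wstring& s) {
    std::size_t n = 0;
    for (wchar_t c : s) {
        if (static_cast<unsigned long>(c) > 0xFF) {
            ++n;
        }
    }
    return n;
}
```

Storage is only half of the problem: displaying the text still requires a GUI toolkit that performs Arabic letter joining and right-to-left layout.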
I would use Qt and its internationalization functionality. It supports Unicode and would thus solve your problem. Here's an example showing how to use it for different languages as well. There's also ICU (International Components for Unicode); look at this.

You may want to read this article by Joel Spolsky on Unicode and character sets to understand what you are facing.
Then find out what kind of internationalization support your GUI toolkit offers.

Related

Qt C++ write to / read from "IBM037 / CP037"

I was looking for a way to write to and read from IBM037 encoding in Qt. I was able to achieve that in C# by using
Encoding.GetEncoding("IBM037")
However, I am currently porting an application from C# to C++ using Qt, and I wasn't able to find a way to do so.
Thanks in advance.
Edit: I am aware of QTextCodec, but it does not contain a definition for IBM 037. Using it returns plain (unencoded) text.
You can implement your own class derived from QTextCodec and use tables (like the ones available here) to perform the translation character by character.
As suggested in the comments, check what is stated in the QTextCodec documentation here.
With tables like these you can translate to 8-bit extended ASCII (Latin-1). Then convert those characters to Unicode using the functions already provided by the Qt framework, such as QString::fromLatin1().
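A minimal sketch of the table-driven approach, using std::u16string in place of QString so it compiles without Qt. CP037 assigns every byte to a character in the Latin-1 range, but the table here is deliberately partial: it fills in only the space, the digits and the unaccented letters, and maps every other byte to U+FFFD. A real codec needs the complete 256-entry table (for example from the tables linked above), and the decode/encode functions could then back a QTextCodec subclass. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string>

// Builds a PARTIAL IBM037 (EBCDIC) -> Unicode table; unfilled bytes map to U+FFFD.
std::array<char16_t, 256> make_cp037_table() {
    std::array<char16_t, 256> t;
    t.fill(u'\uFFFD');
    t[0x40] = u' ';
    // EBCDIC letters come in three runs: A-I, J-R, S-Z (uppercase from 0xC1,
    // lowercase from 0x81), with gaps between the runs.
    const int run_offset[3] = {0x00, 0x10, 0x21};
    const int run_length[3] = {9, 9, 8};
    int letter = 0;
    for (int r = 0; r < 3; ++r) {
        for (int k = 0; k < run_length[r]; ++k, ++letter) {
            t[0xC1 + run_offset[r] + k] = static_cast<char16_t>(u'A' + letter);
            t[0x81 + run_offset[r] + k] = static_cast<char16_t>(u'a' + letter);
        }
    }
    for (int d = 0; d < 10; ++d) {
        t[0xF0 + d] = static_cast<char16_t>(u'0' + d);  // digits 0xF0..0xF9
    }
    return t;
}

// Decodes IBM037 bytes character by character via the table.
std::u16string decode_cp037(const std::string& bytes) {
    static const auto table = make_cp037_table();
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(table[b]);
    }
    return out;
}

// Encodes to IBM037; returns std::nullopt if a character is not in the
// (partial) table. A linear search keeps the sketch short; a reverse
// lookup table would be faster.
std::optional<std::string> encode_cp037(const std::u16string& text) {
    static const auto table = make_cp037_table();
    std::string out;
    for (char16_t c : text) {
        bool found = false;
        for (int b = 0; b < 256 && c != u'\uFFFD'; ++b) {
            if (table[b] == c) {
                out.push_back(static_cast<char>(b));
                found = true;
                break;
            }
        }
        if (!found) {
            return std::nullopt;
        }
    }
    return out;
}
```

For example, the bytes C8 85 93 93 96 decode to "Hello".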

Qt internationalization from native language

I am going to write software in Qt. Its string literals should be written in a native (non-English) language, and they should support internationalization. The Qt docs advise using the tr() function for this: http://doc.qt.io/qt-5/i18n-source-translation.html
So I tried writing:
edit->setText(tr("Фильтр"));
and I see only question marks in the running app.
I replaced it with QString::fromStdWString:
edit->setText(QString::fromStdWString(L"Фильтр"));
and I see the correct text in my language.
So the question is: how should I write non-ASCII strings so that they display correctly and can be translated using Qt Linguist?
PS: I use UTF-8 encoding for all source files; the compiler is VS2013.
PS2: I have found the QTextCodec::setCodecForTr() function, but it is not available in Qt 5.4.
I think that the best option is to use some kind of Latin1 transliteration inside program source. Then, it's possible to implement both Russian and English versions as normal Qt translations.
BTW, with some additional work it's even possible to use plain numbers as translation placeholders, just like MFC did.
I found a strange solution to my problem:
By default VS saves files as UTF-8 with BOM. In File -> Advanced Save Options I chose to save the file as UTF-8 without BOM, and everything works like a charm:
edit->setText(tr("Фильтр"));
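It looks like a VS compiler bug. Interestingly, MS claims that its compiler supports Unicode only for UTF-8 with BOM: https://msdn.microsoft.com/en-us/library/xwy0e8f2.aspx (A more likely explanation is that this is documented behavior: when the file has a BOM, MSVC converts narrow string literals to the system code page, so tr(), which in Qt 5 expects UTF-8, receives non-UTF-8 bytes; without a BOM the bytes are copied through unchanged. From Visual Studio 2015 Update 2 on, the /utf-8 compiler option keeps such literals in UTF-8.)
PS: the length of "Фильтр" is 12 bytes, so it really is a UTF-8 string.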
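To check the 12-byte figure from the PS above without depending on how the compiler treats narrow string literals, here is a minimal UTF-8 encoder applied to "Фильтр" written as code points. Each Cyrillic letter lies in U+0400 to U+04FF and therefore takes two bytes. The function name to_utf8 is illustrative and not a Qt API; inside Qt, QString::toUtf8() does this conversion.

```cpp
#include <cassert>
#include <string>

// Encodes UTF-32 code points as UTF-8. This is only a sketch: it does not
// reject surrogates or values above U+10FFFF.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {                       // 1 byte: ASCII
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {               // 2 bytes: e.g. Cyrillic, Arabic
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {             // 3 bytes: rest of the BMP
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                               // 4 bytes: supplementary planes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Six letters at two bytes each give the 12 bytes observed.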

Arabic: 'source' Unicode to final display Unicode

Simple question:
this is the final display string I am looking for:
لعبة ديدة
Below is each of the separate characters, before being 'glued' together (I've put a space between each of them to stop the joining):
ل ع ب ة د ي د ة
Note how they are NOT the same characters; there is some magical transform that melds them together and converts them to new Unicode characters.
And in the string above, the characters actually appear right to left (in memory, they are left to right).
So my simple question is this: where do I get a platform-independent C/C++ function that will take my source 16-bit Unicode string and transform it into the Unicode string that produces the first one quoted above, doing the RTL conversion and the joining?
That's all I want, one function that does that.
UPDATE:
OK, yes, I know that the 'characters' are the same in the two examples above; they are the same 'letters', but (viewing in Chrome, or the latest IE) anyone can CLEARLY see that the glyphs are different. Now I'm fairly confident that this transform can be done at the Unicode level, because my font file and the Unicode standard seem to specify different glyphs for both the separate and the various joined versions of the characters/letters (unicode.org/charts/PDF/UFB50.pdf, unicode.org/charts/PDF/UFE70.pdf).
So, can I just put my Unicode into a function and get the transformed Unicode out?
The joining and RTL conversion don't happen at the level of Unicode characters.
In other words: the order of the characters and the actual Unicode code points are not changed during this process.
In fact, the joining and the handling of RTL/LTR transitions are done by the text rendering engine.
This quote from the Wikipedia article on the Arabic alphabet explains it quite nicely:
Finally, the Unicode encoding of Arabic is in logical order, that is, the characters are entered, and stored in computer memory, in the order that they are written and pronounced without worrying about the direction in which they will be displayed on paper or on the screen. Again, it is left to the rendering engine to present the characters in the correct direction, using Unicode's bi-directional text features. In this regard, if the Arabic words on this page are written left to right, it is an indication that the Unicode rendering engine used to display them is out-of-date.
The processing you're looking for is called ligature substitution (more generally, shaping). Unlike many Latin-based languages, where you can simply put one character after another to render the text, ligatures are fundamental in Arabic. The substitution is done in the text rendering engine, and the ligature information is generally stored in font files.
note how they are NOT the same characters
They are the same for an Arabic reader. It is still readable.
There is no transform to apply to your 16-bit Unicode source text. You must provide the whole string to your text renderer. Since you want a platform-independent solution in C/C++, you can use Pango for rendering.
Note: perhaps you wanted to write لعبة جديدة (i.e. "new game")? What you give as an example has no meaning in Arabic.
I realise this is an old question, but what you're looking for is FriBidi, the GNU implementation of the Unicode bidirectional algorithm.
This library does the glyph selection that was asked about in the question, and also handles bidirectional text (a mixture of right-to-left and left-to-right text).
What you are looking for is an Arabic script synthesis algorithm. I'm not aware of one that exists as open source. If you find one, please post it.
Some points:
At the storage level, there is no Unicode transform. There is an abstract representation of the string as pointed out by other answers.
At the rendering level, you could choose to use Unicode Presentation Forms, but you could also choose to use other forms. Unicode Presentation Forms are not a standard for what presentation output encoding should be - rather they are just one example of presentation codes that can be output by the rendering engine using script synthesis.
To make it clearer: there wouldn't be a single standard transform (i.e. synthesis algorithm) that converts A to B, where A is standard Unicode Arabic text and B is standard Unicode Arabic Presentation Forms. Rather, there would be different transformations that can vary in complexity and can use different encoding systems for B, one of which is the Unicode Presentation Forms.
For example, a simple typewriter style would need only a simple rendering algorithm that does not require Presentation Forms. Indeed, there do exist modern writing styles (not in common usage, though) where A and B are actually identical, except that a different font is used for rendering. On the other hand, the transform needed to render typeset or traditional calligraphic forms would be more complex and require something similar to the Unicode Presentation Forms.
Here are a couple of pointers for more information on the topic:
http://unicode.org/faq/ligature_digraph.html#Pf1
http://www.decotype.com/publications/unicode-tutorial.pdf
Please see http://www.fileformat.info/info/unicode/block/arabic_presentation_forms_b/list.htm and have a look at this repo: https://github.com/Accorpa/Arabic-Converter-From-and-To-Arabic-Presentation-Forms-B
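A minimal sketch of such a conversion, assuming the output encoding is Arabic Presentation Forms-B (one of several possible output encodings, as noted in the answer above). It covers only the eight letters used in this thread, ignores lam-alef ligatures, diacritics and transparent characters, and does not reorder anything: the result is still in logical order. It is not a substitute for a real shaping engine such as Pango or FriBidi, and the table and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified joining behaviour (cf. Unicode's Joining_Type property).
enum class Joining { None, Right, Dual };

struct LetterInfo {
    char16_t base;      // letter in the Arabic block (U+06xx)
    Joining joining;    // Right: joins only to the previous letter
    char16_t isolated;  // isolated form in Presentation Forms-B
};

// Presentation Forms-B stores each letter's forms consecutively:
// isolated, final, then (dual-joining letters only) initial, medial.
constexpr LetterInfo kLetters[] = {
    {u'\u0627', Joining::Right, u'\uFE8D'},  // ALEF
    {u'\u0628', Joining::Dual,  u'\uFE8F'},  // BEH
    {u'\u0629', Joining::Right, u'\uFE93'},  // TEH MARBUTA
    {u'\u062C', Joining::Dual,  u'\uFE9D'},  // JEEM
    {u'\u062F', Joining::Right, u'\uFEA9'},  // DAL
    {u'\u0639', Joining::Dual,  u'\uFEC9'},  // AIN
    {u'\u0644', Joining::Dual,  u'\uFEDD'},  // LAM
    {u'\u064A', Joining::Dual,  u'\uFEF1'},  // YEH
};

const LetterInfo* lookup(char16_t c) {
    for (const LetterInfo& info : kLetters) {
        if (info.base == c) return &info;
    }
    return nullptr;  // not an (supported) Arabic letter: non-joining
}

// Can this letter connect to the one that follows it?
bool joins_forward(const LetterInfo* info) {
    return info != nullptr && info->joining == Joining::Dual;
}

// Can this letter connect to the one that precedes it?
bool joins_backward(const LetterInfo* info) {
    return info != nullptr && info->joining != Joining::None;
}

std::u16string to_presentation_forms_b(const std::u16string& text) {
    std::u16string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        const LetterInfo* cur = lookup(text[i]);
        if (cur == nullptr) {  // copy anything else unchanged
            out.push_back(text[i]);
            continue;
        }
        const LetterInfo* prev = i > 0 ? lookup(text[i - 1]) : nullptr;
        const LetterInfo* next = i + 1 < text.size() ? lookup(text[i + 1]) : nullptr;
        const bool join_prev = joins_forward(prev) && joins_backward(cur);
        const bool join_next = joins_forward(cur) && joins_backward(next);

        int offset = 0;                                // isolated
        if (join_prev && join_next) offset = 3;        // medial
        else if (join_next) offset = 2;                // initial
        else if (join_prev) offset = 1;                // final
        out.push_back(static_cast<char16_t>(cur->isolated + offset));
    }
    return out;
}
```

For لعبة this yields LAM initial, AIN medial, BEH medial, TEH MARBUTA final (U+FEDF U+FECC U+FE92 U+FE94). In جديدة, DAL cannot connect to the following letter, which is why YEH takes its initial form and the closing TEH MARBUTA stays isolated.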

What is the native narrow string encoding on Windows?

The Subversion API has a number of functions for converting from "natively-encoded" strings to strings that are encoded in UTF-8. My question is: what is this native encoding on Windows? Does it depend on locale?
"Natively encoded" strings are strings written in whatever code page the user is using. That is, they are numbers that are translated to the appropriate glyphs based on the correct code page. This assumes the file was saved that way and not as a UTF-8 file.
This is a candidate question for Joel's article on Unicode.
Specifically:
Eventually this OEM free-for-all got codified in the ANSI standard. In the ANSI standard, everybody agreed on what to do below 128, which was pretty much the same as ASCII, but there were lots of different ways to handle the characters from 128 and on up, depending on where you lived. These different systems were called code pages. So for example in Israel DOS used a code page called 862, while Greek users used 737. They were the same below 128 but different from 128 up, where all the funny letters resided. The national versions of MS-DOS had dozens of these code pages, handling everything from English to Icelandic and they even had a few "multilingual" code pages that could do Esperanto and Galician on the same computer! Wow! But getting, say, Hebrew and Greek on the same computer was a complete impossibility unless you wrote your own custom program that displayed everything using bitmapped graphics, because Hebrew and Greek required different code pages with different interpretations of the high numbers.
Windows 1252. Jukka Korpela has an excellent page on character encodings, with an extensive discussion of the Windows character set.
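As an illustration of what "native to UTF-8" means when the native code page is Windows-1252 (an assumption: the actual code page depends on the system locale), here is a minimal table-based converter. Bytes below 0x80 are ASCII, bytes 0xA0 to 0xFF coincide with Latin-1 (so the byte value is the code point), and bytes 0x80 to 0x9F need a lookup table; the five bytes that Windows-1252 leaves undefined are mapped to U+FFFD here. On Windows itself you would normally call MultiByteToWideChar with CP_ACP and then WideCharToMultiByte with CP_UTF8 instead. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Windows-1252 assignments for bytes 0x80..0x9F (0xFFFD marks undefined bytes).
constexpr char32_t kCp1252High[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178,
};

// Appends a code point below U+10000 (all that Windows-1252 needs) as UTF-8.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Converts a Windows-1252 ("native" on many English systems) string to UTF-8.
std::string cp1252_to_utf8(const std::string& native) {
    std::string out;
    for (unsigned char b : native) {
        char32_t cp = b;  // ASCII and 0xA0..0xFF map one-to-one
        if (b >= 0x80 && b <= 0x9F) {
            cp = kCp1252High[b - 0x80];
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The same byte means something else under another code page: in Windows-1255 (Hebrew), 0xE0 is א (U+05D0) rather than à, which is exactly the ambiguity the quoted passage describes.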
From the header svn_string.h you can see that the relevant svn_string_t structures are just a plain old const char* plus a length field.
I would guess that the "natively encoded" svn strings are interpreted according to your system locale (I do not know this for sure, but this is the convention). On Windows 7 you can check your locale by selecting "Start-->Control Panel-->Region and Language-->Administrative-->Change system locale" where any value of English would probably entail the character encoding Windows 1252. However, a different system locale, for example Hebrew (Israel), would entail a different character encoding (Windows 1255 for the case of Hebrew).
Sadly the MSVC version of the C library does not support UTF-8 and uses legacy codepages only, but cygwin provides a UTF-8 locale as part of its emulation layer. If your svn is built on cygwin, you should be able to use UTF-8 just fine.

Rendering unicode characters correctly on textbox

I am working on a translation application in which users enter English text, which I need to translate into a target language and display in a text box. I am having problems displaying Unicode characters.
Complex characters are not rendered correctly. I know Windows uses Uniscribe for rendering complex characters. Do I need to use it explicitly to get correct rendering? What is the equivalent of Uniscribe on Linux and Mac?
I am using C++ with the wxWidgets framework and trying to display Unicode characters in a text box. Any help would be great!
Considering that Uniscribe support in wxWidgets was merely a Google Summer of Code idea this year, it seems unlikely that it's working today.
There's no trivial Linux or Mac equivalent for Uniscribe.
Read up on Pango. It's the library that supports full OpenType rendering on Linux. Mac's another story.