Issue with compound character font rendering in OpenGL using FTGL - C++

In my application, FTGL renders Unicode characters from TrueType fonts well, except for compound characters. A compound character is a combination of a Unicode consonant and a vowel sign.
For example, in Tamil, an Indic language, the following input string
"கா கி கீ கு கூ கெ கே கை கொ கோ கௌ"
is displayed incorrectly in the OpenGL viewer: the output for some characters is not the same as the input.
Can anybody help with this?
My code snippet:
std::ifstream fontFile("latha.ttf", std::ios::binary); // Tamil Unicode font
if (fontFile.fail())
    return NULL;
fontFile.seekg(0, std::ios::end);
std::fstream::pos_type fontFileSize = fontFile.tellg();
fontFile.seekg(0);
unsigned char *fontBuffer = new unsigned char[fontFileSize];
fontFile.read((char *)fontBuffer, fontFileSize);
FTBitmapFont* m_pFTFont = new FTBitmapFont(fontBuffer, fontFileSize);
m_pFTFont->Render("கா கி கீ கு கூ கெ கே கை கொ கோ கௌ");

Welcome to the wonderful world of "complex text layout". Enjoy your stay ;)
To put it simply, a script that uses complex text layout does not have a simple 1:1 mapping between Unicode codepoints and the glyph to be rendered. And Tamil is such a script.
FTGL only handles scripts that have simple layout, because complex text layout is complex (very much so). So it's not going to be able to reproduce Tamil script correctly.
You will need to use a layout engine like HarfBuzz to lay out your text. FTGL can still render the script, but you'll need HarfBuzz to tell you which glyphs in the font to render.
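To see why there is no 1:1 mapping, it can help to look at the code points behind a single Tamil syllable. The following is a minimal sketch in plain C++ (no FTGL or HarfBuzz involved); decodeUtf8 is a hypothetical helper written for this illustration, and it assumes well-formed UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decode UTF-8 into Unicode code points.
// Minimal sketch: assumes well-formed input and does not reject overlong forms.
std::vector<char32_t> decodeUtf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else                            { cp = lead & 0x07; extra = 3; }
        assert(i + extra < s.size() && "truncated UTF-8 sequence");
        ++i;
        for (std::size_t k = 0; k < extra; ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

Calling decodeUtf8("\xE0\xAE\x95\xE0\xAF\x8A") (the UTF-8 bytes of "கொ") yields two code points, U+0B95 (TAMIL LETTER KA) and U+0BCA (TAMIL VOWEL SIGN O). On screen that vowel sign is drawn on both sides of the consonant, so rendering one glyph per code point, left to right, cannot produce the right shape. Choosing and ordering the glyphs is the job of the layout engine.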

Related

What locale does wstring support?

In my program I used wstring to print out the text I needed, but it gave me random garbage characters (the kind caused by a mismatched encoding scheme). For example, I have this block of code.
wstring text;
text.append(L"Some text");
Then I use DirectX to render it on screen. I used to use wchar_t, but I heard it has portability problems, so I switched to wstring. wchar_t worked fine, although as far as I can tell it only accepted English characters (the printout simply ignored any non-English character entered). That was fine until I switched to wstring: now I only get random characters that look like Chinese and Korean mixed together. Interestingly, my computer's locale for non-Unicode text is Chinese, so I suspected that Chinese characters would render correctly. I tried it, and the character is displayed correctly but with a square in front of it, which is still not quite right. I then guessed that the encoding might depend on the language locale, so I switched the locale to English (US) (I use Windows 8) and restarted. The Chinese test character in my source file turned into random garbage (the file is not saved in a Unicode format, since all the text is English). I then tried English characters again, but no luck: the display looked exactly the same, so it seems to have nothing to do with the locale. I still don't understand why the text is not displayed correctly and looks like Asian characters, even under the English locale.
Is there some conversion I should be doing, or should I save my file in a different encoding? The problem is that I want to display English characters correctly, which should be the default case.
In the absence of code that demonstrates your problem, I will give you a correspondingly general answer.
You are trying to display English characters, but see Chinese characters. That is what happens when you pass 8-bit ANSI text to an API that expects UTF-16 text. Look for somewhere in your program where you cast from char* to wchar_t*.
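To make the effect concrete, here is a small sketch of what a UTF-16 API effectively reads when it is handed 8-bit text through a pointer cast. Instead of performing the cast itself, it combines the bytes by hand in little-endian order (as on Windows). misreadAsUtf16 and isCjkIdeograph are hypothetical helper names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pair up the bytes of an 8-bit string the way a little-endian UTF-16 reader
// would if the buffer were passed to it through a char* -> wchar_t* cast.
std::vector<std::uint16_t> misreadAsUtf16(const std::string& ansi) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < ansi.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(ansi[i]);
        const unsigned hi = static_cast<unsigned char>(ansi[i + 1]);
        units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return units;
}

// True if the code unit falls in the CJK Unified Ideographs block (U+4E00..U+9FFF).
bool isCjkIdeograph(std::uint16_t unit) {
    return unit >= 0x4E00 && unit <= 0x9FFF;
}
```

For the input "Some", the bytes 'S' (0x53) and 'o' (0x6F) combine into 0x6F53, and 'm' (0x6D) and 'e' (0x65) into 0x656D. Both values land in the CJK ideograph block, which is why misinterpreted English text tends to look like Chinese.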
First of all, what type of file are you trying to store the text in? Plain .txt files are stored as ANSI by default (so does Excel). So when you try to write a Unicode character to an ANSI file, it will come out as junk. There are two ways of overcoming this problem:
Open the file in UTF-8 or UTF-16 mode and then write to it.
Convert the Unicode text to ANSI before writing it to the file. If you are using Windows, MSDN documents an API for converting Unicode to ANSI and vice versa. If you are using Linux, search for Unicode-to-ANSI conversion; there are a lot of solutions out there.
Hope this helps!
std::wstring does not have any locale/internationalisation support at all. It is just a container for storing sequences of wchar_t.
The problem with wchar_t is that its encoding is unspecified. It might be Unicode UTF-16, or Unicode UTF-32, or Shift-JIS, or something completely different. There is no way to tell from within a program.
You will have the best chances of getting things to work if you ensure that the encoding of your source code is the same as the encoding used by the locale under which the program will run.
But, the use of third-party libraries (like DirectX) can place additional constraints due to possible limitations in what encodings those libraries expect and support.
Bug solved: it turned out to be a casting problem (not a rendering problem, as I said previously).
The garbled text is an intermediate product of an internal conversion using wstringstream (which I forgot to mention). The code is as follows:
wstringstream wss;
wstring text;
text.append(L"some text");
wss << timer->getTime();
text.append(wss.str());
Right after this step the debugger shows the text as a bunch of random stuff, but later it somehow converts back and becomes readable. The actual problem appears at the rendering stage with DirectX: I had somehow left in a cast to wchar_t*, which caused the incorrect rendering.
old:
LPCWSTR lpcwstrText = (LPCWSTR)textToDraw->getText();
new:
LPCWSTR lpcwstrText = (*textToDraw->getText()).c_str();
Changing that solves the problem.
So this was caused by a bad cast; thanks to the kind people who corrected my earlier statement.
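For anyone hitting the same thing, here is a minimal sketch of the difference between the "old" and "new" lines. TextToDraw is a hypothetical stand-in for the class in the post, assuming getText() returns a pointer to a std::wstring (which is what the "new" line implies). It uses const wchar_t* rather than LPCWSTR so that it also compiles outside Windows.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the poster's text object.
class TextToDraw {
public:
    explicit TextToDraw(std::wstring text) : text_(std::move(text)) {}
    const std::wstring* getText() const { return &text_; }
private:
    std::wstring text_;
};

// Correct: ask the string object for its null-terminated character buffer.
const wchar_t* asWideCString(const TextToDraw& t) {
    return t.getText()->c_str();
}

// Wrong (left as a comment on purpose): the cast only relabels the address of
// the std::wstring object itself as character data, so the renderer reads the
// string's internal bookkeeping instead of its characters.
//   const wchar_t* p = (const wchar_t*)t.getText();
```

The returned pointer stays valid only as long as the TextToDraw object lives and its string is not modified, so it should be passed to the drawing call right away rather than stored.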

Unicode troubles in FreeType

So, I've got an implementation that parses an XML file containing, among other things, the positions and strings of Wikipedia's main page. The parsing is done with rapidxml, after which the strings are converted from UTF-8 to UTF-32 by http://utfcpp.sourceforge.net/. The UTF-32 code is then used in FreeType's:
unsigned long c = FT_Get_Char_Index(face,*p);
FT_Load_Glyph(face,c,FT_LOAD_RENDER);
where *p is the UTF-32 char code. This glyph is then rendered in OpenGL.
Now, I can't seem to get Cyrillic characters to work, nor any Chinese, Japanese or Vietnamese ones. I am sure that *p corresponds to the correct code, and I would be thankful for any pointers I can get.
For these characters Microsoft's arial.ttf is used, from the Arch Linux package, and from what I've seen in font-viewing programs, it should contain the characters that I want.
Two things to suggest:
First, have you called FT_Select_Charmap to specify you're using a Unicode encoding?
FT_Select_Charmap(face , ft_encoding_unicode);
Second, not all Arial fonts have all characters, and some font viewers (on Windows, anyway) can mislead by automatically substituting glyphs from different faces. Try ArialUni.ttf if you can find it.
Do not forget to set the font size right after loading the face.
FT_Error err = FT_Set_Pixel_Sizes(face, (width), (height));

Unicode character for superscript shows a square box: ࠚ

Using the following code to create a Unicode string:
wchar_t HELLO[20];
wsprintf(HELLO, TEXT("%c"), 0x2074);
When I display this on a Win32 control such as a text box or a button, it is rendered as a square box ([]).
How do I fix this?
I tried compiling with both Eclipse (MinGW) and Microsoft Visual C++ (2010).
Also, UNICODE is defined at the top.
Edit:
I think it might be something to do with my system, because when I visit: http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
some of the unicode characters don't appear.
The font you are using does not contain a glyph for that character. You will likely need to install some new fonts to overcome this deficiency.
The character you have picked out is 'SAMARITAN MODIFIER LETTER EPENTHETIC YUT' (U+081A). Perhaps you were after 'SUPERSCRIPT FOUR' (U+2074). You need hex for that: 0x2074.
Note that you changed the question to read 0x2074, but the original version read 2074. Either way, if you see a box, that indicates your font is missing that glyph.
The characters you are getting from Wikipedia are expressed in hexadecimal, so your code should be:
wchar_t HELLO[20];
wsprintf(HELLO, TEXT("%c"), (wchar_t)0x2074); // or TEXT('\x2074')
If it still doesn't work, it's a font problem; if you need a pan-Unicode font, it seems that Code2000 is one of the most complete out there.
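A quick way to check which character a numeric literal actually refers to is to print it in the U+XXXX form that the Unicode charts use. A small sketch (toUPlus is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format a code point the way Unicode charts write it:
// "U+" followed by at least four uppercase hex digits.
std::string toUPlus(unsigned long codePoint) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "U+%04lX", codePoint);
    return buf;
}
```

toUPlus(2074) gives "U+081A", while toUPlus(0x2074) gives "U+2074", which shows the decimal/hex mix-up at a glance.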
Funny fact: the character that has the decimal code 2074 (i.e. hex 81a) seems to actually be a box (or it's such a strange beast that even the image outline at FileFormat.Info is wrong). :)
For the curious: 0x081A turns out to be the Samaritan modifier letter epenthetic yut (U+081A).

Convert Glyph indices to Unicode character

I am working on a printer driver sample in which I hook the DrvTextOut() call. When the callback is called, I get the text as glyph indices, and I want to convert these glyph indices to Unicode characters.
Please let me know how to do this.
In general the answer is "you cannot". In PDFs, for instance, you might have an embedded character map that lets you look up the characters corresponding to the glyphs (e.g. if you used the cmap package with pdfLaTeX to make the PDF), but glyphs are private to a font, and there may be many glyphs that get used for the same character and vice versa, thanks to the magic of the GSUB tables.
If you're really desperate and have access to the font in question, you could try to build a character map yourself from the font file, but you had better know which font you are currently looking at.
Edit: I think your question is tagged poorly; are you referring to this function? Perhaps the FONTOBJ structure that you already own exposes some sort of character map of the font, I wouldn't know.
If you mean you need to handle the case where STROBJ->flAccel has SO_GLYPHINDEX_TEXTOUT set, then see this answer here from Microsoft's Bobby Mattappally:
http://www.winvistatips.com/glyph-handles-drvtextout-t183048.html
There is not always a 1:1 mapping between glyph indices and character codes and vice versa. This is especially true with international character sets like Hebrew, Arabic etc.
So once you get this as SO_GLYPHINDEX_TEXTOUT, your driver should handle the glyph itself instead of trying to convert it back to Unicode.
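If you do decide to build a reverse character map yourself, the following sketch shows the general shape and why it is lossy. It does not touch any driver or font API: the character-to-glyph table is assumed to have been read from the font's cmap already, and invertCharMap and the glyph numbers in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using GlyphId = std::uint16_t;

// Invert a character-to-glyph table. Several characters can share one glyph,
// so each glyph maps to a list of candidate characters rather than a single one.
std::map<GlyphId, std::vector<char32_t>>
invertCharMap(const std::map<char32_t, GlyphId>& charToGlyph) {
    std::map<GlyphId, std::vector<char32_t>> glyphToChars;
    for (const auto& entry : charToGlyph)
        glyphToChars[entry.second].push_back(entry.first);
    return glyphToChars;
}
```

With a table such as {U+0020 -> 3, U+00A0 -> 3, U+0041 -> 36}, glyph 3 comes back with two candidate characters, so the original text cannot be recovered unambiguously. A glyph that is only reachable through substitution (a ligature, say) never appears in the table at all, so a lookup for it finds nothing. Both cases are consistent with the advice above to handle the glyphs directly.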

Problem rendering non-English unicode text using freetype font on OpenGL

I am currently following NeHe tutorial lesson 43 (http://nehe.gamedev.net/data/lessons/lesson.asp?lesson=43). The code works satisfactorily only for English text, not for other Unicode text. Fortunately, I followed a link from NeHe lesson 43 to http://www.cs.northwestern.edu/~sco590/fonts_tutorial.html and found a nearly identical tutorial sample with only one difference: it uses wchar_t, and the site claims that it can handle languages other than English.
So I gave it a try:
freetype::print(our_font, 320, 200, (unsigned short*)L"Active FreeType Text หกโด้กี่ดุ öáæé おはよ。- %7.2f", cnt1);
The function print in namespace freetype takes a const unsigned short* as its 4th argument, so I cast the string to that type. I also put an L in front of the double-quoted string to make it a wide-character string, and added some Asian characters for testing purposes.
The result is that all the English text is displayed just fine, but all the Thai characters become "[]B[]I[]5H[]8", where the [] are square boxes. From what I understand, this implies that the font does not cover the language in question, so I tried other fonts, but all the other Thai fonts give the same square boxes. For the Japanese font it is the same: all boxes, along with some English characters next to them. The substring öáæé is rendered just fine, without any problem.
Am I forgetting something here? How can I display non-English Unicode text here?
Fortunately, the author has uploaded a modified version of his tutorial to his website (the one given in the question). It uses wchar_t (in the original version, the author uses const unsigned short* as an argument to the print function), which allows non-English languages.
It looks like print() in lesson 43 is not even anywhere near Unicode capable. All NeHe is doing is creating 256 display lists for the first 256 character codes; it does not accept a UTF-8 string and convert it to UTF-32 for FreeType.
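One way around the fixed 256-entry table is to build glyphs lazily and key them by the full code point. The following is only a sketch of that idea with the OpenGL and FreeType parts left out: GlyphCache, CachedGlyph and the builder callback are hypothetical names, and in a real renderer the builder would load the glyph through FreeType and create the display list or texture.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical per-glyph record; a real renderer would keep a texture or
// display-list id plus metrics here.
struct CachedGlyph {
    std::uint32_t handle;
};

// Builds glyphs on demand for any code point instead of pre-building a fixed
// table for codes 0..255.
class GlyphCache {
public:
    using Builder = std::function<CachedGlyph(char32_t)>;

    explicit GlyphCache(Builder build) : build_(std::move(build)) {}

    const CachedGlyph& get(char32_t codePoint) {
        auto it = cache_.find(codePoint);
        if (it == cache_.end())  // first use of this code point: build and remember it
            it = cache_.emplace(codePoint, build_(codePoint)).first;
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Builder build_;
    std::unordered_map<char32_t, CachedGlyph> cache_;
};
```

Asking the cache for U+0E2B (Thai "ห") twice and for U+304A (Japanese "お") once calls the builder only twice and leaves two entries in the cache. This keeps memory proportional to the characters actually drawn, which matters once the text is no longer limited to 8-bit codes.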
Transliterating this into C++ has worked quite well for me.
Also, grab a copy of the GNU Unifont to make sure you have glyphs for all of the BMP.