I am developing a cocos2d game that supports multiple languages. I created a font file (.png and .fnt) containing all the supported characters.
The issue is that some of the character IDs are in the range 917505-917631, so I set kCCBMFontMaxChars = 917632. But this uses a lot of memory.
Can anyone please tell me how to handle this situation?
kCCBMFontMaxChars = 0xffff; // 65k
This covers every character in the Unicode Basic Multilingual Plane (code points up to U+FFFF). It certainly works for all the Asian and Cyrillic languages. The memory usage will be about 2 MB.
Don't worry about the ID, I believe they are offsets into the BMFont char array and not indexes. Each entry is 32 Bytes. 917632 divided by 32 gives you 28676, which if it is an index fits within the unicode character range.
I'm making a C++ console game that uses UTF-16 characters to make it a little more interesting, but some characters are a different size than others. So when I print the level, everything after such a character is shifted further than on other lines. Is there a way to add spacing between characters with some console function? I tried to google something helpful, but found nothing.
I tried to change the font size with CONSOLE_FONT_INFOEX, but it changed nothing. Maybe I implemented it the wrong way, or it does not work with UTF-16 characters.
// I tried this
CONSOLE_FONT_INFOEX cfi;
cfi.cbSize = sizeof(cfi);
cfi.dwFontSize.X = 24;
cfi.dwFontSize.Y = 24;
Unfortunately I expect that this will heavily depend on the particular console you're using. Some less Unicode-friendly consoles will treat all characters as the same size (possibly cutting off the right half of larger characters), and some consoles will cause larger characters to push the rest of the line to the right (which is what I see in the linked image). The most reasonable consoles I've observed have a set of characters considered "double-wide" and reserve two monospace columns for those characters instead of one, so the rest of the line still fits into the grid.
That said, you may want to experiment with different console programs. Can I assume you are on Windows? In that case, you might want to give Windows Terminal a try. If that doesn't work, there are other console programs available, such as msys's Mintty, or ConEmu.
So, after some intense googling I found the solution, and the solution is to fight fire with fire. Unicode includes the character THIN SPACE (U+2009), which is narrower than a normal space (nominally 1/5 of an em), so if I put two of them plus one normal space after my problematic character, the output displays how I want. If anybody runs into a similar issue, Unicode has a lot of differently sized spaces. I found a website that shows all of them, with their properties.
fixed output picture
I am using the C++ ICU library. I wish to split a utf-8 string into approximately equal chunks. However, I want the chunks to be demarcated at grapheme cluster boundaries. I do not wish to convert my entire string into utf-16 to do this for both memory and speed efficiency. Instead, I want to translate a small number of utf-8 codepoints close to my estimated chunk boundaries into utf-16. I can then use ICU's BreakIterator to work out the exact boundaries.
Is there a hard upper limit of the number of codepoints that can make up a grapheme cluster? If so, what is it? I need to know this in order to determine the minimum codepoints that I need to translate from utf-8 to utf-16.
Is there a hard upper limit of the number of codepoints that can make up a grapheme cluster?
No. There is no hard upper limit on how many code points a grapheme cluster (i.e. a user-perceived character) can consist of.
You could, for example, keep appending ZERO WIDTH JOINER followed by another joined character.
Just to add an example to the accepted answer.
You can for example create arbitrarily large grapheme clusters using this page:
https://glitchtextgenerator.com/
As an example here is a "letter X" that occupies 73 bytes on disk:
x̧̡̬̘͓̖̲̻̻̲̠̪̻͓͙̜̂̓̊̔̀̀͗̑̀̅̀̂̚͘̕̚͘͢͜͠
I also created another that is close to 10 kilobytes, but it is probably better not to post such monsters here because they could cause problems. Depending on the software, they render in interesting ways.
In my c++ textbook, there is an "ASCII Table of Printable Characters."
I noticed a few odd things that I would appreciate some clarification on:
Why do the values start with 32? I tested a simple program with the following lines of code: char ch = 1; std::cout << ch << "\n"; and nothing printed out. So I am curious as to why the values start at 32.
I noticed the last value, 127, was "Delete." What is this for, and what does it do?
I thought char can store 256 values, why is there only 127? (Please let me know if I have this wrong.)
Thanks in advance!
The printable characters start at 32. Below 32 there are non-printable characters (or control characters), such as BELL, TAB, NEWLINE etc.
DEL is a non-printable character that is equivalent to delete.
char can indeed store 256 values, but its signedness is implementation-defined. If you need to store values from 0 to 255, you need to explicitly specify unsigned char. Similarly, for values from -128 to 127, you have to specify signed char.
EDIT
The so-called extended ASCII characters with codes >127 are not part of the ASCII standard. Their representation depends on the so-called "code page" chosen by the operating system. For example, MS-DOS used such extended ASCII characters for drawing directory trees, window borders etc. If you changed the code page, you could also use them to display non-English characters etc.
It's a mapping between integers and characters plus other "control" "characters" like space, line feed and carriage return interpreted by display devices (possibly virtual). As such it is arbitrary, but they are organized by binary values.
32 is a power of 2 and an alphabet starts there.
Delete is the signal from your keyboard delete key.
At the time the code was designed, only 7 bits were standard. Not all bytes (parts of words) were 8 bits.
All the ASCII codes greater than 127 are replaced by a diamond with a question mark (the replacement character). How can I display those characters? I have an unsigned char buffer[1024] which contains values from 0 to 255.
Use the QString class's fromAscii() method. By default this will treat chars above 127 as Latin-1 chars. To change this behavior, use the QTextCodec::setCodecForCStrings method to set the correct codec for your usage.
I believe Qt 5 may have taken out the setCodecForCStrings method.
EDIT: Adnan supplied the Qt 5 alternative to the setCodecForCStrings method; adding it to the answer for completeness.
Qt5 alternative for setCodecForCStrings is QTextCodec::setCodecForLocale(QTextCodec::codecForName("UTF-8"));
This is a rabbit hole with no end. Qt does not fully support printing character codes above 127, as they are not well defined. The current method is to use fromLocal8Bit(), which takes a char array and transforms it into the "right" Unicode string (the only thing Qt supports printing).
QTextCodec::setCodecForLocale can be used to identify the character set you wish to transform from. Many codecs are supported, but for some reason IBM437 (the character set used by IBM PCs in the US for decades) is not supported, where several other codecs used by Europe, etc. are. Probably some characters in IBM437 were never assigned proper code points in Unicode, so transforming it isn't possible?
What's frustrating is that there are fonts with all 256 ascii code points, but it is simply not possible to display these in Qt as they only work with Unicode strings. There are a handful of glyphs they don't support, and it seems to grow with newer versions of Qt. Currently I know of 9, 10, 12, 13, and 173. Some of these are for obvious reasons (usually you don't want to print a carriage return glyph, though it did exist in DOS), but others used to work in Qt and now do not.
In my application, I resorted to creating a new font that has copies of the unprintable glyphs in higher unicode codepoints, and translate them before printing them on the screen. It's quite silly but Qt gave up on ascii many years ago, so it's the best option I could find.
Question: What is the correct order of Unicode extended symbols by value?
If I excel sort a list of Unicode chars the order is different than if I use the excel "=code()" and sort by those values. The purpose is that I want to measure the distance between chars, for example a-b = 1 and &-% = 1; when sorted with the excel sort function, two chars that are ordered within three appear to have values that are 134 away.
Also, some char symbols are blank in excel and several are found twice with 'find' and are two different symbols - and a couple are not found at all. Please explain the details of these 'special' chars.
http://en.wikipedia.org/wiki/List_of_Unicode_characters
sample code:
int charDist = abs(alpha[index] - code[0]);
EDIT:
To figure out the UNICODE values in c++ vs2008 I ran each code as a comparison from code 1 to code 255 against code 1
cout << mem << " code " << key << " is " << abs(key[0] - '') << " from " << endl;
In the brackets is a black happy face that this website does not have the font for but the command window does, in vs2008 it looks like a half-post | with the right half of a T. Excel leaves a blank.
The following Unicodes are not handled in c++ vs2008 with the std library and #include
9, 10, 13, 26, 34, 44,
And, the numerical 'distance' for codes 1 through 127 are correct, but at 128 the distance skips an extra and is one further away for some reason. Then from 128 to 255 the distance reverses and becomes closer; 255 is 2 away from 1 ''
It'd be nice if these followed something more logical and were just 1 through 255 without hiccups or skips and reversals, and 255-1 = 254 but hey, what do I know.
EDIT2: I found it - without the absolute - the collation for UNIFORMAT is 128 to 255 then 1 to 127 and yields 1 to 255 with the 6 skips for 9, 10, 13, 26, 34, 44 that are garbage. That was not intuitive. In the new order 128->255,1->127 the strange skip from 127 to 128 is clearer, it is because there is no 0 so the value is missing between 255 and 1.
SOLUTION: make my own hashtable with values for each symbol and do not rely on c++ std library or vs2008 to provide the UNIFORMAT values since they are not correct for measuring the char distance outside of several specific subsets of UNIFORMAT.
Sorting by Unicode code point value is not the same as collation, and there is no single sort order that is right for every language. When Excel sorts, it's using tables based on the currently selected language. For example, someone using Excel in English mode may get different sorting results than someone using Excel in Portuguese.
There are also issues of normalization. With Unicode, one "character" doesn't necessarily correspond to one value. Some characters can be represented in different ways. For example, a capital omega can be coded as a Greek letter or as a symbol for representing units of electrical resistance. In some languages, a single character may be composed from several consecutive values.
The blank values probably correspond to glyphs that you don't have any font coverage for. Some systems use so-called "Unicode fonts" which have a large percentage of the glyphs you need for every script. Windows tends to switch fonts on the fly when the current font doesn't have a necessary glyph. Neither approach will have every glyph necessary. Also, some Unicode values don't encode to a visible glyph (e.g., there are many different kinds of spaces in Unicode), some values act more like ASCII-style controls codes (e.g., paragraph separator or bidi controls), and some values only make sense when they combine with another character, like many of the "combining" accents.
So there's not an answer you're going to be satisfied with. Perhaps if you gave more information about what you're ultimately trying to do, we could suggest a different approach.
I don't think you can do what you want to do in Excel without limiting your approach significantly.
By experimentation, the Code function will never return a value higher than 255. If you use any Unicode text that cannot be generated by the VBA code below, it will be interpreted as a question mark (?), i.e. 63.
For x = 1 To 255
Cells(x, 1).Value = Chr(x)
Next
You should be able to determine the difference using Code. But if the character doesn't fall in that realm, you'll need to go outside of Excel, because even VBA will convert any other Unicode characters to the question mark(?) or 63.