simple sum of Unicode symbol codes - c++

I want to do this:
1) Click event of Convert ! button
The user must type two values into the writable edit controls. After Convert ! is pressed, the program must put the sum of these characters' Unicode values into the first read-only edit control (the one next to the " = " symbol). For example, if the first edit control contains Є (UTF-16 value 0x0404, also known as Cyrillic Capital Letter Ukrainian IE) and the second edit control contains @ (UTF-16 value 0x0040, also known as Commercial At), then the result must be the symbol ф (UTF-16 value 0x0444). That is, its value equals the sum of the other edit controls' UTF-16 values. How can I do this?
2) Click event of Undo button
Clicking the Undo button must set the value of the edit control below this button. This value should be the Є symbol (as you can see, its Unicode value is the sum minus the second edit control's value). How can I do this?
I've searched for this problem for 2 weeks on Google, MSDN and various forums, but I couldn't find any helpful topic. I could find only the MultiByteCharacterSet, _mbclen, mblen and _mblen_l functions. If these functions are useful for me, how can I use them in my program? Please give me advice. I'm new to VC++.
Edit
The user must enter a single character. It may be a digit or a letter, not a word, a sequence of characters, or a number.
Thanks for any attention.
P.S: If there are too many and poor mistakes in my grammar, and if the question is duplicate so sorry...
Best regards,
Mirjalal.

The input value is already equal to its UTF-16 value. No conversion is needed.
CString in1(L'1');
CString in2(L'2');
CString sum(wchar_t(in1[0] + in2[0]));
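The same idea can be sketched without MFC, covering both the Convert and the Undo step. This is only a minimal sketch: the helper names sumCodeUnits and undoSum are made up for illustration, and it assumes each input is a single UTF-16 code unit (one character from the Basic Multilingual Plane).

```cpp
#include <cassert>

// Hypothetical helper for the Convert button: add two UTF-16 code units.
// Assumption: a result above 0xFFFF simply wraps around.
constexpr char16_t sumCodeUnits(char16_t first, char16_t second) {
    return static_cast<char16_t>(first + second);
}

// Hypothetical helper for the Undo button: recover the first character
// from the sum and the second character.
constexpr char16_t undoSum(char16_t sum, char16_t second) {
    return static_cast<char16_t>(sum - second);
}
```

With the values from the question, sumCodeUnits(0x0404, 0x0040) gives 0x0444 and undoSum(0x0444, 0x0040) gives 0x0404 back. In an MFC dialog the result could presumably be wrapped the same way as above, e.g. CString result(static_cast<wchar_t>(sumCodeUnits(in1[0], in2[0])));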

printing string with c++ for language other than english

Hi, I am trying to print a string in C++ which is not in English, and the output is always ????. For example, I want to print the Korean word '선배' or the Thai word 'ยิ่ง'. The simple code snippet is as follows:
#include <iostream>
#include <string>
using namespace std;

int main(){
string name("선배");// string name("ยิ่ง");
int len=name.size();
cout<<"\n name: "<<name;
cout<<"\n length "<<len;
}
OUTPUT:
name: ??
length 2
Whereas if I change the string line to English characters:
string name("ab");
OUTPUT:
name: ab
length 2
Update: I also tried wchar_t, which is also printing question marks.
code-
wchar_t *a=L"อดีตรักงานไหม";
wprintf(L"sss : %s \n" , a);
I checked the project properties (Project Properties -> Configuration Properties -> General), and the Character Set is set to 'Use Unicode Character Set'.
Can anybody please tell me what is going wrong? How can I get it to print different languages?
regards
I'm not familiar with Korean, but in general you need to do two things:
Set the correct code page using std::locale, or use Unicode (for example std::wstring and std::wcout).
Set your console to a font that can display those characters. The default font in Windows cannot do this.
If you are using Windows, you can set the console font by using SetCurrentConsoleFontEx
CONSOLE_FONT_INFOEX cfi;
cfi.cbSize = sizeof cfi;
cfi.nFont = 0;
cfi.dwFontSize.X = 0;
cfi.dwFontSize.Y = 16;
cfi.FontFamily = FF_DONTCARE;
cfi.FontWeight = FW_NORMAL;
wcscpy_s(cfi.FaceName, L"Consolas");
SetCurrentConsoleFontEx(GetStdHandle(STD_OUTPUT_HANDLE), FALSE, &cfi);
If you want to set it independently of your actual application, or you do not have the prerequisites for the function above, you can have a look at the different guides on the internet.
I have no clue which font supports Asian characters; you will need to check this yourself. Any Unicode font should do.
You need to write a byte order mark (BOM) first; then you can print this.
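The first point of the answer above (wide strings plus a matching output mode) might look roughly like this. It is a sketch, not a guaranteed fix: enableWideOutput, describe and printName are hypothetical names, the Windows branch relies on the Microsoft CRT call _setmode with _O_U16TEXT, and the other branch assumes the user's locale can encode the text (for example UTF-8).

```cpp
#include <cassert>
#include <clocale>
#include <cstdio>
#include <iostream>
#include <string>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

// Hypothetical helper: prepare stdout for wide-character output.
void enableWideOutput() {
#ifdef _WIN32
    // Microsoft CRT: switch stdout to UTF-16 text mode.
    _setmode(_fileno(stdout), _O_U16TEXT);
#else
    // Elsewhere: adopt the user's locale (assumed to be able to encode the text).
    std::setlocale(LC_ALL, "");
#endif
}

// Builds the same two lines the question prints, but from a wide string.
std::wstring describe(const std::wstring& name) {
    return L"name: " + name + L"\nlength " + std::to_wstring(name.size()) + L"\n";
}

// Writes the description through the wide stream.
void printName(const std::wstring& name) {
    std::wcout << describe(name);
}
```

Typical use would be enableWideOutput(); followed by printName(L"\uC120\uBC30"); where the universal character names spell the Korean word from the question. As far as I know, once stdout is in _O_U16TEXT mode on Windows, narrow output through cout or printf should not be mixed in.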
I am working on a project in Hebrew using Microsoft Visual Studio Community 2019.
When trying to output string literals of non-English characters in any way, I get either question marks in boxes, or just question marks. I checked to see how the command line handles the situation by saving a file with a Hebrew name in Explorer and then accessing it through CMD. Again, question marks.
I am guessing there is a way to include the language packs in the c++ script (that's what I am looking up now), but saw this and wanted to share what else I found out. By looking at the Disassembly of my code I noticed that the Assembler is mishandling the assignment to the register. When the value (characters) are loaded, the Unicode formatting (right to left) causes the Assembler to flip the parenthesis and shift the first two (last two) values to the opposite side resulting in an unusable value in the register:
eax,dword ptr [ב (0BDF3A0h)]
Even as I try to save this it comes out wrong: what it should be is a zero, a right parenthesis, a space, and then the Hebrew character,
which should be [(0בBDF3A0h)].
(Somehow in my code, I now have the Unicode for א outputting the value assigned to it...)
I'm looking at how to handle the issue. Hopefully, you know more than I do :-) Good luck!

C++ - Displaying and Storing Japanese Characters

So, I'm attempting to use C++ and Windows Forms to create an application that will help me study Japanese (for now, Hiragana and possibly Katakana only). The aim is to be able to create a program that has the user select the character sets they want to use (A through O, KA through KO, etc.), and either view the cards freely or have the program test them over the characters. For debugging purposes, I currently have the View button set to output 5 values to 5 different text boxes - the Roman pronunciation, the corresponding character, its position in an array in which all of the characters are stored, and a Boolean value.
My problem lies in the fact that the characters all show up as "?", and I get multiple warnings when I compile. An example of this warning:
1>c:\users\cameron\documents\visual studio 2010\projects\japanesecards\japanesecards\Form1.h(218): warning C4566: character represented by universal-character-name '\u3093' cannot be represented in the current code page (1252)
This shows up 46 times, 1 for each Japanese character in the array. The array's declaration line is,
std::string hiraList[5][11][2];
An example of inserting a Romaji-Hiragana pair is,
hiraList[0][0][0] = "A";
hiraList[0][0][1] = "あ";
Finally, the Hiragana is being inserted into a text box using the following code:
System::String^ displayText = gcnew String(hiraList[x][y][1].c_str());
textBox5 -> Text = displayText;
Basically, given all of this, my question is - How can I get my form to display Japanese characters properly in a text box?
Okay! I've done a bit of tweaking and experimenting with wchar_t, and I've found out a solution.
First, I reduced the hiraList array to a two-dimensional array and moved the Hiragana characters into their own array, defined like so:
wchar_t hiraChar[5][11];
And added values like so:
hiraChar[0][0] = L'あ';
Then, I went down to the code for the 'View' button and made a few changes:
Deleted the method for declaring and filling the displayText variable
Updated the line of code which assigns textBox5 its text value to read from hiraChar[x][y]
A line of the new code has been pasted below:
textBox5 -> Text = hiraChar[x][y].ToString();
In essence, the program now creates three variables for Hiragana: one to monitor check boxes, one to store the romaji values, and one to store Hiragana characters. When at least one check box is selected and the View button is pressed, five things are output to text boxes: the character, its position in the array (x and y are separate boxes), its romaji equivalent, and a 'True' value which was used earlier in development for debugging purposes.
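For comparison, here is a small standard C++ sketch of the same storage idea. The KanaCard type, the kCards table and kanaFor are hypothetical names, not part of the original program. It assumes the kana are written as universal character names in wide literals, which keeps the source file ASCII-only so the compiler's code page no longer matters.

```cpp
#include <cassert>
#include <string>

// Hypothetical card type: romaji as narrow ASCII, kana as one wide character.
struct KanaCard {
    const char* romaji;
    wchar_t kana;
};

// First row of the hiragana table, written as universal character names.
const KanaCard kCards[] = {
    {"A", L'\u3042'},
    {"I", L'\u3044'},
    {"U", L'\u3046'},
    {"E", L'\u3048'},
    {"O", L'\u304A'},
};

// Returns the kana for a romaji string, or an empty string if it is unknown.
std::wstring kanaFor(const std::string& romaji) {
    for (const KanaCard& card : kCards) {
        if (romaji == card.romaji) {
            return std::wstring(1, card.kana);
        }
    }
    return std::wstring();
}
```

In a C++/CLI form the result could presumably be shown with something like textBox5->Text = gcnew System::String(kanaFor("A").c_str()); since System::String can be constructed from a wide character pointer.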

hyphen character and apostrophe character - the same ASCII code in different languages?

I need to specify a regex for validation of user input that allows the user to enter a hyphen character or apostrophe character on Windows Desktop operating systems or Mac OS/X desktop operating systems.
The user may have configured for the following languages:
English
French
Spanish
Portuguese
Hawaiian
I want to understand whether, if I use a standard ASCII regex for hyphen and apostrophe (e.g. ['-]), it will catch the hyphen or apostrophe keys typed by the user in most cases. I appreciate my definition is quite loose, as there are many different keyboard layouts, OS versions, and language definitions (e.g. fr_FR, ca_FR).
I have checked the following resources and searched Google generally, but could not find anything that says the code generated by a hyphen key or apostrophe key will always be ASCII code 45 and ASCII code 39 respectively.
http://en.wikipedia.org/wiki/Keyboard_layout
http://en.wikipedia.org/wiki/Hyphen
http://en.wikipedia.org/wiki/Apostrophe
NOTE: If you feel this question is badly worded, please add a comment to help me improve it.
You're mixing up a couple of things:
keyboard layout is what determines which value gets assigned to a scancode.
localization settings determine in what language you should address the user, and whether the user expects a decimal point or comma.
character encoding is how a glyph is encoded into bits in memory and, in reverse, how to decode bits into glyphs.
If you're validating user input, you shouldn't be interested in scancodes. A DVORAK layout user on a QWERTY keyboard will be pressing the Q key to input an '. And you shouldn't mess with that. So you have no business dealing with keyboard layouts.
The existence of this keyboard should remind you that what keys do is not your headache, but up to the user.
The localization settings will matter to you, but not for your regex. They will, however, tell you in what language you should put your error message, in case the user input is invalid. A good coding practice is to use a library like gettext to manage this.
What matters most when you are validating input is just these two things: what is valid and what the input is.
You (or your domain expert) decide what is valid, for example whether a hyphen-minus is just as acceptable as a hyphen or en dash.
The input will be encoded; computers work with bits, not strings of glyphs. It could be ASCII, but I'd steer towards Unicode if I could help it.
As for your real concern, if I may rephrase it: "Can all users easily enter ' and -?". I guess they probably can. Many important programming languages use those glyphs to denote strings and as a subtraction operator, respectively. And if your application needs to (dis)allow certain glyphs, you can put Unicode code points or categories in your regex.
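To illustrate putting explicit code points in the pattern, here is a rough C++ sketch using std::wregex. The function name isValidName and the accepted set are assumptions for the example only: plain ASCII letters and spaces, the apostrophe (U+0027), the right single quotation mark (U+2019), the ʻokina (U+02BB), the hyphen (U+2010) and the hyphen-minus (U+002D). A real validator for French, Spanish or Portuguese input would also need accented letters.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical validator: accepts letters, spaces, and a chosen set of
// apostrophe-like and hyphen-like code points (an assumption, see above).
bool isValidName(const std::wstring& input) {
    // \u2019 = right single quotation mark, \u02BB = okina, \u2010 = hyphen;
    // the final \\- is an escaped hyphen-minus inside the character class.
    static const std::wregex pattern(L"^[A-Za-z '\u2019\u02BB\u2010\\-]+$");
    return std::regex_match(input, pattern);
}
```

Which code points belong in the class is exactly the "what is valid" decision described above; the regex only encodes that decision.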

Incorrect conversion when decimal point embedded in VT_BSTR and German locale used

I have a piece of code (C++) that writes some floating point values to Excel like this:
...
values[ position ].bstrVal = formattedValue;
values[ position ].vt = VT_BSTR;
...
as you can see those floating point values are stored in the form of string and the decimal point is formatted in different ways, for example:
"110.000000", "20.11" etc. (this example is for English locale)
This works perfectly when the English locale is used. However, when I switch to the German locale in the Control Panel, the decimal point changes to "," (and that's fine), but after those localized strings are passed to Excel they are not converted correctly. For example, when writing "110,000000" I get 110 million in Excel. Other values like "20,11" stay as text.
The only way to fix this is to overwrite the decimal point with "." in my program before writing to Excel. Any ideas why the conversion is not locale-aware when using VT_BSTR?
I should also add that I tried to switch the locale in my program from default one to German - still no luck.
Thank you in advance
It is never a good idea to let Excel guess at the value type. Do not use VT_BSTR, a currency value should be of variant type VT_CY. Assign the cyVal member with the value. It is an 8 byte integer value (int64 member of type LONGLONG), the currency amount multiplied by 10,000. Ten thousand :)
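The scaling described above can be sketched in portable C++. The helper name toCurrencyUnits is made up for illustration, and rounding to the nearest unit is an assumption about what is wanted.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: scale an amount to the fixed-point form VT_CY uses,
// i.e. the value multiplied by 10,000 and stored in a 64-bit integer.
// Assumption: round to the nearest unit instead of truncating.
std::int64_t toCurrencyUnits(double amount) {
    return static_cast<std::int64_t>(std::llround(amount * 10000.0));
}
```

On Windows the result would then presumably be assigned along the lines of values[position].vt = VT_CY; values[position].cyVal.int64 = toCurrencyUnits(20.11); so that no locale-dependent string is involved at all.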

Can all keys be represented as a single char in c++?

I've searched around and I can't seem to find a way to represent arrow keys or the escape key as single char in c++. Is this even possible? I would expect that it would be similar to \t or \n for tab and new line respectively. Whenever I search for escaped characters, there's only ever a list of five or six well known characters.
The short answer is no.
The long answer is that there are a number of control characters in the standard ANSI character set (from decimal 1 to decimal 31, inclusive), among which are the control codes for linefeed, carriage return, end-of-file, and so on. A few are commonly interpreted as arrows and the escape key, but only for compatibility with terminals.
Standard PC keyboards send a 2- or 3-byte control code that represents the key that was pressed, what state it's in, which control/alt/shift key is pressed, and a few other things. You'll want to look up "key codes" to see how to handle them. Handling them differs between operating systems and the base libraries you use, and their meaning differs based on the operating system's configured keyboard layout (which may include characters not found in the ANSI character set).
Not possible; keyboards built for some languages have characters that can't be represented in a char, and anyway, how do you represent control-option-command-shift-F11 in a char?
Keyboards send scancodes, which are either some kind of event in a GUI system or a short string of bytes that represent the key. What codes depends on your system, but on most terminal-like systems, ncurses knows how to deal with them.
char variables usually represent elements in the ASCII table.
http://www.asciitable.com/
There is also man ascii on Unix. If you want arrow keys, you'll need a more direct way to access keyboard input. The arrow keys get translated into sequences of characters before hitting stdio. If you want direct keyboard access, consider a GUI library, SDL, or DirectInput, to name a few.
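As a small illustration of those character sequences, here is a sketch that decodes the bytes a VT100/ANSI-style terminal commonly sends for the arrow keys. The Key enum and decodeKey are hypothetical names, and the ESC [ A to ESC [ D mapping is an assumption about the terminal; other terminals and the Windows console behave differently.

```cpp
#include <cassert>
#include <string>

enum class Key { Up, Down, Right, Left, Escape, Other };

// Decodes one key from the raw bytes read from a terminal.
// Assumption: arrows arrive as ESC '[' followed by 'A', 'B', 'C' or 'D'.
Key decodeKey(const std::string& bytes) {
    const char esc = '\x1b';
    if (bytes.size() == 1 && bytes[0] == esc) {
        return Key::Escape;  // a lone ESC byte is the escape key itself
    }
    if (bytes.size() == 3 && bytes[0] == esc && bytes[1] == '[') {
        switch (bytes[2]) {
            case 'A': return Key::Up;
            case 'B': return Key::Down;
            case 'C': return Key::Right;
            case 'D': return Key::Left;
            default:  break;
        }
    }
    return Key::Other;
}
```

This shows why a single char is not enough: the arrow keys only exist as multi-byte sequences, and reading them unbuffered still needs a platform facility or a library such as ncurses.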
There aren't any escape characters for the arrow keys. They are represented as Keycodes, afaik. I suggest using a higher level input library to detect key presses. If you want to do it from scratch, the approach taken might vary with the specific platform you are programming for.
In any case, back in the days of TURBO C, I used -
//Wait for keypress
//store input in ch
//see if the ASCII code matches the code for one of the arrow keys