C++ - Displaying and Storing Japanese Characters

So, I'm attempting to use C++ and Windows Forms to create an application that will help me study Japanese (for now, Hiragana and possibly Katakana only). The aim is to be able to create a program that has the user select the character sets they want to use (A through O, KA through KO, etc.), and either view the cards freely or have the program test them over the characters. For debugging purposes, I currently have the View button set to output 5 values to 5 different text boxes - the Roman pronunciation, the corresponding character, its position in an array in which all of the characters are stored, and a Boolean value.
My problem lies in the fact that the characters all show up as "?", and I get multiple warnings when I compile. An example of this warning:
1>c:\users\cameron\documents\visual studio 2010\projects\japanesecards\japanesecards\Form1.h(218): warning C4566: character represented by universal-character-name '\u3093' cannot be represented in the current code page (1252)
This shows up 46 times, 1 for each Japanese character in the array. The array's declaration line is,
std::string hiraList[5][11][2];
An example of inserting a Romaji-Hiragana pair is,
hiraList[0][0][0] = "A";
hiraList[0][0][1] = "あ";
Finally, the Hiragana is being inserted into a text box using the following code:
System::String^ displayText = gcnew String(hiraList[x][y][1].c_str());
textBox5 -> Text = displayText;
Basically, given all of this, my question is - How can I get my form to display Japanese characters properly in a text box?

Okay! I've done a bit of tweaking and experimenting with wchar_t, and I've found a solution.
First, I reduced the hiraList array to a two-dimensional array and moved the Hiragana characters into their own array, defined like so:
wchar_t hiraChar[5][11];
And added values like so:
hiraChar[0][0] = L'あ';
Then, I went down to the code for the 'View' button and made two changes:
- Deleted the code that declared and filled the displayText variable
- Updated the line that assigns textBox5 its text value so that it reads from hiraChar[x][y]
The new line of code is pasted below:
textBox5 -> Text = hiraChar[x][y].ToString();
In essence, the program now creates three variables for Hiragana: one to monitor check boxes, one to store the romaji values, and one to store the Hiragana characters. When at least one check box is selected and the View button is pressed, five things are output to text boxes: the character, its position in the array (x and y are separate boxes), its romaji equivalent, and a 'True' value which was used earlier in development for debugging purposes.
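The storage side of this fix can be sketched in standard C++, outside of Windows Forms. The names KanaCard, kVowelRow and cardText are hypothetical and only illustrate the idea: the romaji stays in a narrow string, while each Hiragana character is held in a wchar_t and written as a universal-character-name, so the source file does not depend on code page 1252.

```cpp
#include <string>

// Hypothetical pairing of a romaji label with one Hiragana character.
struct KanaCard {
    const char* romaji;    // plain ASCII, representable in any code page
    wchar_t     hiragana;  // wide character, written as \uXXXX
};

// Only the first row (A, I, U, E, O) is shown; other rows would follow
// the same pattern.
static const KanaCard kVowelRow[] = {
    {"A", L'\u3042'},  // あ
    {"I", L'\u3044'},  // い
    {"U", L'\u3046'},  // う
    {"E", L'\u3048'},  // え
    {"O", L'\u304A'},  // お
};

// Build the one-character wide string that a text box would receive.
std::wstring cardText(const KanaCard& card) {
    return std::wstring(1, card.hiragana);
}
```

In the form itself, the wide character would still be handed to the text box as in the line above (hiraChar[x][y].ToString()); this sketch only covers how the characters are stored.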

Related

Bolding certain characters in a CComboBox entry (C++, MFC)

I have a CComboBox containing strings of serial numbers. I want to be able to bold individual characters in a string. For instance, I would like the second string, 87650123, to show up with only some of its digits in bold.
I have seen some posts about bolding a whole individual string entry in a CComboBox, but not individual characters. Would this be possible? Thanks in advance.
As a first step I tried bolding an entire entry, which worked, but I have not been able to bold individual characters.

(C++) Can the color of text change as it is typed?

I have a list of instructions in my program and they are activated by entering a string. There are a large number of possible instructions. You could call them commands if you like.
I already have a program that can successfully execute the instructions I've added so far.
For example, adding a person to the database would require the user to enter add "John" "Doe".
This would output to the screen Added John Doe to the database, ID#1234. The IDs are random.
I know how to add colors; in this output text, "John Doe" would be colored green.
What I'm wondering is whether the color can change as one types. I already know how to switch the terminal's keyboard mode so that, when I type a password, each character is displayed as * in any color I like, or not displayed at all (by "displaying" \0). I was wondering whether the color could also change before the complete string (extracted with std::getline(std::cin, str);) has been typed.
The first reason I want the colors to change is because when there are so many commands, and I already have more complex commands than that, I want to provide a way for the user to be able to correct syntax mistakes before they press enter. Something like Windows PowerShell, perhaps, which was written in C#. I know that C# is a very different language than C++, but if C# can achieve something like that, I want to see if C++ can as well. My hope is that it doesn't require thousands of lines of application-specific code, especially considering that PowerShell is an actual application and not a simple terminal-run executable. And while PowerShell appears to be open-source, I don't understand C#. See the bottom for the second reason.
I have no idea if this is possible, but because similar manipulation of entered text is possible (as I said, I know how to add colors and also mask text as some other single character like *), I want to know if this is also possible.
Simple examples:
Firstly, the user should know when they have not entered a valid command, and when they have. I want the text to be in red until the letters entered so far consist of an actual keyword, like add. So, the text would be red until the second d is added, when it reverts to white, and if another letter is entered, it becomes red again.
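The keyword rule just described can be sketched as a pure function that picks a color for whatever has been typed so far. This is only an assumption about how it might be structured: the command list kCommands and the function name keywordColor are made up for illustration, and the sketch does not cover reading keystrokes one at a time or redrawing the line, which the terminal would have to be set up for separately.

```cpp
#include <array>
#include <string>

// ANSI SGR escape sequences: 31 selects a red foreground, 0 resets
// to the terminal's default colors.
const std::string kRed     = "\033[31m";
const std::string kDefault = "\033[0m";

// Hypothetical command list, taken from the examples in this question.
const std::array<std::string, 3> kCommands = {"add", "list", "list-info"};

// Return the color for the text typed so far: default when it is exactly
// a known command, red otherwise.
std::string keywordColor(const std::string& typed) {
    for (const std::string& command : kCommands) {
        if (typed == command) {
            return kDefault;
        }
    }
    return kRed;
}
```

After each keystroke, the program would call keywordColor on the current buffer and reprint the line in the returned color.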
Example 1: add "John Elias" "Doe"
After the space, the text after "add" should be red no matter what, unless the character after the space is a quotation mark. In order to tell the user that they have not terminated the string, the text beyond the (orange?) quotation mark should be orange. When the final quotation mark is entered, the entire content of the quotation marks (including the quotation marks) should be some other color (probably green?) to tell the user that they have successfully entered an argument. The same applies to any instances of quotation-mark arguments. Note that a space is allowed in a quotation argument.
Example 2: list-info 1234
In this command, it gets more complex. list is a separate command, so the text should be red until t is entered, and it turns white. But then it turns red again after that, until o is entered, and it turns white again. The numerical argument following it should be red if the entered character isn't a digit. If it is, it's still red, because the only valid IDs are 3- or 4-digit numbers. It should turn green(?) once a third digit is entered, and still stay green when another digit is entered. But if a fifth digit is entered (or another character for that matter), the number turns red again. Although this would better be implemented as returning an error if the entered number is invalid, I would still like to know if this can be done as well.
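The rule for the numeric argument (red until there are three digits, green for three or four digits, red again for anything longer or non-numeric) reduces to a small check. The function name isValidIdSoFar is hypothetical, and this sketch only decides validity; mapping the result to green or red is left to the caller.

```cpp
#include <cctype>
#include <string>

// True when the text typed so far is a complete, valid ID:
// exactly 3 or 4 characters, all of them decimal digits.
bool isValidIdSoFar(const std::string& typed) {
    if (typed.size() < 3 || typed.size() > 4) {
        return false;
    }
    for (unsigned char c : typed) {
        if (!std::isdigit(c)) {
            return false;
        }
    }
    return true;
}
```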
Example 3: add "John" "Elias" "Doe" "fourth-string"
Since there is an overloaded function that enables an explicit first-middle-last name to be stored as well, it should be ok if there is a third string. But if there is a fourth string added, then it should be in red no matter what because add cannot take more than 3 arguments.
My question is, are any of these things possible? And yes, I am aware that it is almost certainly better to just implement an error system, but my intention is to expand my coding ability, and that is the second reason, and coding an error system will not do that because I have already done that for every command.
For reference, I'm operating in Linux Ubuntu 18.04, I compile with g++, my code conforms to C++17, I use ANSI escape sequences for color, bold, etc., and for masking characters with something like * I use a pointer to a char array (passed by address as char**) and a C-style FILE* to reference the input stream stdin (because I haven't bothered to conform it to a typical C++ implementation yet, learning ways to advance my current skills is my priority at this point in time).

How many spaces does \t produce?

Why is the number of spaces different in case 3?
How is the result affected by the \t character?
In the outputs below, - stands for the whitespace produced by \t.
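Case 1
#include <stdio.h>
int main(void)
{
    int a, b;
    /* the inner printf runs first; the outer one prints its return value */
    printf("%d", printf("hello%d\t", scanf("%d%d", &a, &b)));
    return 0;
}
Here the output is: hello2-7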
Case 2
#include <stdio.h>
int main(void)
{
    int a, b;
    /* tab placed between "hello" and the scanf result */
    printf("%d", printf("hello\t%d", scanf("%d%d", &a, &b)));
    return 0;
}
Here the output is: hello-27
Case 3
#include <stdio.h>
int main(void)
{
    int a, b;
    /* tab placed at the very start of the line */
    printf("%d", printf("\thello%d", scanf("%d%d", &a, &b)));
    return 0;
}
Here the output is: --------hello27
Why are there 8 spaces in the third case?

Most terminal programs will have a tab stop at every 8th column - so I'd expect output to be determined like this (I know your output's a little different - discussed below):
.                           column
.                                    1         2
input                       12345678901234567890
"%d",printf("hello%d\t"     hello2__7
"%d",printf("hello\t%d"     hello___27
"%d",printf("\thello%d"     ________hello27
To understand this, you have to understand the order of evaluation of your (unnecessarily complex) code. Examining the first printf line...
printf("%d",printf("hello%d\t",scanf("%d%d",&a,&b)));
Above, the arguments to the left-hand (outer) printf have to be prepared before it can print anything itself, and those arguments include the result of calling the right-hand (inner) printf. The inner printf outputs hello, then the number of items scanf successfully read from standard input (2 if you typed two integers), then the tab. Having finished, it returns 7, the number of characters it printed, and the outer printf prints that value. I would expect a tab to take you to the 9th column on screen, which suggests TWO spaces before the 7, whereas your question says you're observing 1. Your terminal apparently works a little differently, probably treating the 8th, 16th, 24th etc. columns as tab stops.
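The value returned by the inner printf can be checked without printing anything, as a small sketch. It relies on the standard behavior that snprintf with a null buffer and size 0 returns the number of characters the format would produce; the function name innerPrintfLength is made up for this illustration.

```cpp
#include <cstdio>

// Number of characters the inner call printf("hello%d\t", n) would write,
// where n stands for the value returned by scanf.
int innerPrintfLength(int scanfResult) {
    return std::snprintf(nullptr, 0, "hello%d\t", scanfResult);
}
```

For a scanf result of 2 this counts five letters, one digit and one tab character, which is why the outer printf prints 7 regardless of how wide the tab looks on screen.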
More about tabs
There is no universal interpretation of the \t TAB character... how it's rendered depends on the terminal software or rendering device you're using (e.g. an xterm, vt220, vt100 terminal, MS-DOS command window, printer, IDE, text editor etc.).
Some display/printing/formatting programs will consider there to be a tab stop every N characters, where N is often 8, such that if you issue a tab from the first column through to the 8th column you're taken to the 9th, a tab from the 9th to 16th column takes you to the 17th, and so on. But many programs have ways to set arbitrary columns for tab placements. Some programs like MS Word can use variable-width fonts, with which the number of characters between tab stops varies: if your C++ program prints some text that you import into Word, you may find it practically impossible to work out how many tabs are needed to get the desired alignment of output. It's generally easier to just put one tab between values and change your tab stops inside Word so it all looks ok, or stick to a fixed-width font such as Courier.
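The "tab stop every N characters" rule can be sketched as a function that replaces each tab with the spaces such a display would show. The name expandTabs is hypothetical, the default of 8 is the common setting mentioned above rather than anything guaranteed, and the sketch assumes a single line of fixed-width text with no newlines.

```cpp
#include <cstddef>
#include <string>

// Expand every '\t' to spaces, assuming a tab stop every tabWidth columns.
std::string expandTabs(const std::string& text, std::size_t tabWidth = 8) {
    std::string out;
    for (char c : text) {
        if (c == '\t') {
            // Pad up to the next multiple of tabWidth (always at least 1 space).
            std::size_t pad = tabWidth - (out.size() % tabWidth);
            out.append(pad, ' ');
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

With the default width, "hello2\t7" gets two spaces, "hello\t27" gets three, and "\thello27" gets eight, matching the table earlier in this answer.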
C++ IDEs often let you set the value ("N" above) for columns per tab stop - 4 and 8 are both common settings, with 8 often meaning your source code indentation is a mix of tabs and spaces to reach the desired left-hand-column: that's kind of messy to navigate with naive cursor movement implementation. Many people prefer to set a "insert spaces when tab is pressed" option so the file is always saved with actual spaces, and displays more predictably with a wide variety of display/printing software.

A TAB has only the space given to it as it is rendered (so does any character, really); however, one subtle difference with tabs is that they are often taken to mean advance to the next "virtual column" (I'm sure there is a better term), where these virtual columns are, say, 8 characters wide; although this width can often be changed.
Here is an ugly graphic, where n..- represents a "virtual column" and T..t represents the space "taken up" by the tab:
                  1-------2-------3-------
hello\tworld      helloTttworld
\thelloworld      Tttttttthelloworld
hello\t\tworld    helloTttTtttttttworld

In C99 and C11,
\t ( horizontal tab ) Moves the active position to the next horizontal
tabulation position on the current line. If the active position is at
or past the last defined horizontal tabulation position, the behavior
of the display device is unspecified.
C++03 and C++11 do not specify any behavior for '\t' that differs from C.

\t does not contain any spaces. It is a character in its own right that may be displayed with different widths, but it is still only one char.

The displayed width depends on the compiler environment and the software showing the output; it is commonly 4 or 8 columns (not bits). For example, here is a C program compiled with gcc 6.3 on Windows 10 Pro, where a width of 4 columns was used:
#include <stdio.h>
int main(void) {
printf("a12345678patil\n");
printf("a\tpat\til");
}
output:
a12345678patil
a pat il

get char on screen

I've looked through the NCurses function list, and I can't seem to find a function that returns the characters already printed on the screen. Is there an accessible value for the char stored in each character cell? If not, is there a similar function in the Windows terminal?
I want to use this to replace all the characters on the screen of a certain value (ex: all the a's) with a different character, or with new attributes.

The function inch() gets the character and returns it as a chtype. Use winch() to get a character from a window other than stdscr.

Unicode Woes! Ms-Access 97 migration to Ms-Access 2007

The problem falls into two steps:
Problem Step 1. Access 97 db containing XML strings that are encoded in UTF-8.
The problem boils down to this: the Access 97 db contains XML strings that are encoded in UTF-8. So I created a patch tool that separately converts the XML strings from UTF-8 to Unicode. To convert a UTF-8 string to Unicode, I used the function
MultiByteToWideChar(CP_UTF8, 0, PChar(OriginalName), -1, @newName, Size); (where newName is an array declared as "newName : Array[0..2048] of WideChar;").
This function works well in most cases; I have checked it with Spanish and Arabic characters. But with Greek and Chinese characters it chokes.
For some Greek characters like "Î•Ï…Î³. ÎšÎ±ÏÎ±Î²Î¹Î¬" (as stored in Access 97), the resulting string contains null characters in between, and when it is stored to a wide string the characters get clipped.
For some Chinese characters like "?Â¢Â»?Âµ?" (as stored in Access 97), the result is completely wrong, such as "?¢»?µ?".
Problem Step 2. Access 97 db text strings: the application GUI takes Unicode input and saves it in Access 97.
First I checked with Arabic and Spanish characters, and it seemed that no explicit character encoding was required. But again the problem comes with Greek and Chinese characters.
I tried the same function mentioned above for the text conversion (is that correct?), and the result was again disappointing. The Spanish characters, which are fine without conversion, are either lost or converted to regular ASCII letters.
The Greek and Chinese characters show behaviour similar to that mentioned in step 1.
Please guide me. Am I taking the right approach? Is there some other way around this?
Right now I am confused and full of questions :)

There is no special requirement for working with Greek characters. The real problem is that the characters were stored in an encoding that Access doesn't recognize in the first place. When the application stored the UTF-8 values in the database, it tried to convert every single byte to the equivalent byte in the database's codepage. Every byte that had no counterpart in that encoding was replaced with "?". That may mean that the Greek text is OK, while the Chinese text may be gone.
In order to convert the data to something readable you have to know the codepage they are stored in. Using this you can get the actual bytes and then convert them to Unicode.
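The last step, going from the recovered bytes to Unicode, can be illustrated with a small portable decoder. This is a sketch under assumptions, not a replacement for MultiByteToWideChar: the name decodeUtf8 is hypothetical, it assumes the original UTF-8 bytes have already been recovered from the database's codepage, and it does only minimal validation (it does not reject overlong or surrogate encodings).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into Unicode code points.
std::u32string decodeUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t codePoint = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { codePoint = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { codePoint = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { codePoint = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { codePoint = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (bytes.size() - i < extra + 1) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            codePoint = (codePoint << 6) | (c & 0x3F);
        }
        out.push_back(codePoint);
        i += extra + 1;
    }
    return out;
}
```

As a check against the question's data: the stored text "Î•Ï…Î³" is what the bytes CE 95 CF 85 CE B3 look like in code page 1252, and those bytes decode to U+0395 U+03C5 U+03B3, the Greek letters Ευγ.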
