In the console the user types several characters, including ú. I would like to store these characters in a char array using std::cin, but the character ú is stored as 163 '£'. I really want it stored as 163 'ú'. How could I do that?
The character set of the console defines how a char value will be displayed. For example:
if the console uses the ISO 8859-1 or Windows-1252 character set, the value 163 is a £;
if the console uses the old DOS code page 850, the same value 163 is a ú.
In principle, if you input a char from the console and output this char on the same console, you should graphically get the same result.
However, if there is some mixing, this is no longer the case. For example, if you input ú in a CMD window that uses code page 850, but then output the result in a Unicode window, you get £ as output. The same phenomenon occurs if you write a file to disk and open it in an editor that uses another character encoding.
Unfortunately, console settings and default encodings are very much system dependent, and more information is needed to give you accurate advice on the best way to solve the issue.
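If the program runs in a Windows console, one way to make the two sides agree is to set the console code page explicitly from the program. Below is a minimal, hedged sketch for that case only; GetConsoleCP, SetConsoleCP and SetConsoleOutputCP are the documented Win32 calls, but the end result still depends on the console font and how the terminal is configured.
#include <windows.h>
#include <iostream>

int main() {
    // Report the code pages the console currently uses for input and output.
    std::cout << "input code page:  " << GetConsoleCP() << '\n';
    std::cout << "output code page: " << GetConsoleOutputCP() << '\n';

    // Ask the console to use Windows-1252 in both directions, so the byte 163
    // read by std::cin means the same thing when it is echoed back.
    SetConsoleCP(1252);
    SetConsoleOutputCP(1252);

    char c;
    std::cin >> c;   // type ú here
    std::cout << c << " = " << static_cast<int>(static_cast<unsigned char>(c)) << '\n';
}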
I'm getting stuck trying to convert an input string in a char* to a Chinese character encoding. An application accepts a Chinese string input, e.g. "啊说到", and when it is written into a file it turns into "°¡Ëµµ½". I'm able to take this input and feed it to _mbstowcs_s_l(), but the solution needs to be locale independent, so I'm forced to use either mbstowcs() or WideCharToMultiByte(). It looks like both would work for me only if the input had already gone through an MBCS-to-UTF-8 conversion, which in our case it hasn't.
The project is using the Multi-byte Character Set, and I'm struggling to understand what is going on. One other thing: the input comes from a different application, which stores it into a file.
The application that accepted the Chinese input is an MFC application set to Multi-byte Character Set, and the OS was set to the Chinese Simplified regional setting. The UI accepts the input and places it in a CString, which is copied to a char*. This is the part where I don't know what is going on with the encoding. That application stores the string into a file; then we read it with the other application into a char*, and that's when the characters seem to become "°¡Ëµµ½".
The question is: how can I turn this encoded char array "°¡Ëµµ½" back into its Chinese form "啊说到" without setting the locale in _mbstowcs_s_l()? The problem is that we could be reading strings written under other regional settings, and the application wouldn't know which character map to use unless we tell it.
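"°¡Ëµµ½" is exactly what the GBK bytes of "啊说到" look like when they are reinterpreted as Windows-1252, so one way to recover the text is to convert with an explicit code page instead of the current locale. A minimal sketch, assuming the bytes really are GBK (Windows code page 936); if the writing machine used a different regional setting you would need that machine's code page instead (or, better, have the writing application store UTF-8 or UTF-16 with a marker):
#include <windows.h>
#include <string>

// Convert raw GBK bytes to UTF-16 without touching the CRT locale.
// The code page number (936) is the assumption here.
std::wstring GbkToWide(const std::string& bytes) {
    if (bytes.empty()) return std::wstring();
    int len = MultiByteToWideChar(936, 0, bytes.data(), static_cast<int>(bytes.size()), nullptr, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(936, 0, bytes.data(), static_cast<int>(bytes.size()), &wide[0], len);
    return wide;   // should hold L"啊说到" for the example above
}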
I'm writing a C++ program in Visual Studio for class. I am using certain Unicode characters within my program like:
╚, █, ╗, ╝, & ║
I have figured out how to print these characters properly onto the console, but I have yet to find a way to output them to a file properly.
In Visual Studio, choosing the [OEM United States - Codepage 437] encoding when saving the .cpp file allows them to display properly on the console.
Now I just need a way to output these characters to a file without errors.
Hopefully someone knows how. Thank You!
Create the file using a std::wofstream, which uses wide (wchar_t) characters, instead of a std::ofstream (which uses char).
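As a hedged sketch: a wofstream still narrows its wide characters through the locale imbued on the stream, so you typically also have to give it a conversion facet. The example below assumes writing the file as UTF-8 is acceptable (std::codecvt_utf8 is standard, although deprecated since C++17):
#include <codecvt>
#include <fstream>
#include <locale>

int main() {
    std::wofstream out("box.txt");
    // Encode the wide characters as UTF-8 bytes when they are written out.
    out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
    out << L"\u2554\u2550\u2557\n";   // writes ╔═╗
}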
Hello, I was just wondering how I can display the infinity symbol (∞) in C++? I am using Code::Blocks. I read a couple of Q&As on this topic, but I'm a newbie at this stuff, especially with hex codes and such. What do I have to include and what do I type out exactly? If someone could write the code and explain it, that'd be great! Thanks!
The symbol is not part of ASCII. However, in code page 437 (most of the time the default of the Windows Command Prompt with English/US regional settings) it is represented as character #236. So in principle
std::cout << static_cast<unsigned char>(236);   // ∞ in code page 437
should display it, but the result depends on the current locale/encoding. On my Mac (OS X) it is not displayed properly.
The best way to go about it is to use the Unicode character set (which standardizes a huge number of characters and symbols). In this case,
std::cout << "\u221E";
should do the job, as the Unicode code point U+221E represents the infinity sign.
However, to be able to display Unicode, your output device should support a UTF encoding. On my Mac, the Terminal uses UTF-8, but the Windows Command Prompt still defaults to the old code page 437 (thanks to @chris for pointing this out). According to this answer, you can switch it to UTF-8 (code page 65001) by typing
chcp 65001
in a Command Prompt.
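Putting that together, a minimal sketch, assuming the compiler's narrow execution character set is UTF-8 (the default for GCC and Clang; MSVC needs /utf-8) and a UTF-8 capable terminal such as the Mac Terminal or a Command Prompt after chcp 65001:
#include <iostream>

int main() {
    std::cout << "\u221E" << '\n';        // infinity sign, U+221E, as a UTF-8 string literal
    std::cout << "\xE2\x88\x9E" << '\n';  // the same three UTF-8 bytes spelled out by hand
}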
You can show it through its Unicode code point.
∞ has the value \u221E.
You can show any character from the Character Map by its Unicode code point.
The character I'm first looking for is usually 201 in the "normal" ASCII code, but it's different on a Mac. How do I work around this?
It's possible to input the Unicode characters on a Mac by switching to the Unicode Hex Input keyboard layout.
Open System Preferences
Choose Keyboard
Add Unicode Hex Input to the list
Select "Show Input menu in menu bar"
Close the preferences
Click on the flag that's appeared in the menu bar
Select Unicode Hex Input
Then you need the codes and you can find a nice summary of the box codes here at Wikipedia.
To enter a code:
Hold down Option (alt)
Type the code, without the preceding U+, e.g. for U+2560, type 2560
Release Option
I drew this example using that method: ╠╩╬╩╣
After you're finished, you can change your keyboard input back to your normal one using the flag in the menu bar.
This character is not available in any single-byte character set on OS X.
Unlike the Windows environment (which requires special coding to use Unicode), Unicode is readily available on OS X.
Use Unicode U+2554 or UTF-8 E2 95 94
You can just use the character directly in a char or string literal: ╔
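A minimal sketch, assuming the source file is saved as UTF-8; both lines print the same character on a UTF-8 terminal such as the OS X Terminal:
#include <iostream>

int main() {
    std::cout << "╔" << '\n';       // the literal character pasted into a UTF-8 source file
    std::cout << "\u2554" << '\n';  // the same character written as a universal character name
}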
There is no such thing as ASCII character 201. ASCII is a 7-bit single byte character encoding, where code points go from 0 to 127, inclusive. Maybe you are referring to “╔” in the original IBM PC character set?
Then you can do this:
Use a Windows PC with a keyboard that has a numeric keypad.
In a console window with input (e.g. the command interpreter), hold down Alt and type 201 on the numeric keypad, in number mode (NumLock on).
Start Word or Windows’ WordPad.
Copy and paste the character into Word or WordPad.
Type Alt+X.
On my laptop WordPad reports 2554, which means it's Unicode character U+2554 (hexadecimal).
In C++ you can express that character as L'\u2554', which is of type wchar_t.
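A hedged sketch of printing that wide character with std::wcout; whether it actually renders depends on the standard library and terminal, so the program first adopts the environment's locale (typically UTF-8 on a Mac):
#include <iostream>
#include <locale>

int main() {
    std::locale::global(std::locale(""));   // use the environment's locale (e.g. UTF-8)
    std::wcout.imbue(std::locale());        // let wcout use it for the wide-to-byte conversion
    std::wcout << L'\u2554' << L'\n';       // ╔
}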
On the other hand, if you prefer names to numbers, ncurses has supported double- and thick-line drawing characters in Unicode terminals since late 2009. That is after the (rather old) ncurses 5.7 bundled with OS X, but newer releases are available with MacPorts, etc.
In my program I used wstring to print out the text I needed, but it gave me random gibberish (due to a different encoding scheme). For example, I have this block of code:
wstring text;
text.append(L"Some text");
Then I use DirectX to render it on screen. I used to use wchar_t, but I heard it has portability problems, so I switched to wstring. wchar_t worked fine, but it seemed to take only English characters from what I could tell (the printout just totally ignored the non-English characters entered), which was fine, until I switched to wstring: I only got random gibberish that looked like Chinese and Korean mixed together. Interestingly, my computer's locale for non-Unicode text is Chinese. Based on what I saw, I suspected it would render Chinese characters correctly, so I tried that, and it does display the characters correctly but with a square in front (which is still somewhat incorrect). I then guessed the encoding might depend on the language locale, so I switched the locale to English (US) (I use Windows 8). After restarting, the Chinese test characters in my source file had become random bytes (my file is not saved in a Unicode format, since all the texts are English). Then I tried with English characters again, but no luck: the display looked exactly the same and seemed to have nothing to do with the locale. I don't understand why it doesn't display correctly and looks like Asian characters (even with the English locale).
Is there some conversion that should be done, or should I save my file in a different encoding format? The problem is that I want to display English characters correctly, which is the default.
In the absence of code that demonstrates your problem, I will give you a correspondingly general answer.
You are trying to display English characters but see Chinese characters. That is what happens when you pass 8-bit ANSI text to an API that expects UTF-16 text. Look for somewhere in your program where you cast from char* to wchar_t*.
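A hedged sketch of the kind of bug to look for, plus one proper way to convert (MultiByteToWideChar with the process ANSI code page); the surrounding names are only illustrative:
#include <windows.h>
#include <string>

// Bug pattern: the bytes of an ANSI string get reinterpreted as UTF-16, so
// pairs of narrow characters turn into arbitrary CJK code units.
//   const char* ansi = "Some text";
//   const wchar_t* bad = reinterpret_cast<const wchar_t*>(ansi);   // mojibake

// Proper conversion from the ANSI code page to UTF-16:
std::wstring AnsiToWide(const std::string& ansi) {
    if (ansi.empty()) return std::wstring();
    int len = MultiByteToWideChar(CP_ACP, 0, ansi.data(), static_cast<int>(ansi.size()), nullptr, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_ACP, 0, ansi.data(), static_cast<int>(ansi.size()), &wide[0], len);
    return wide;
}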
First of all, what type of file are you trying to store the text in? Plain .txt files are stored as ANSI by default (as does Excel). So when you try to print a Unicode character into an ANSI file it will print junk. Two ways of overcoming this problem are:
open the file in UTF-8 (or UTF-16) mode and then write;
convert Unicode to ANSI before writing to the file. If you are using Windows, the Win32 API provides functions to do Unicode-to-ANSI conversion and vice versa (a sketch follows below). If you are using Linux, search for Unicode-to-ANSI conversion; there are lots of solutions out there.
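A hedged sketch of option 2 on Windows, using WideCharToMultiByte with the system ANSI code page (characters that do not exist in that code page are replaced):
#include <windows.h>
#include <string>

std::string WideToAnsi(const std::wstring& wide) {
    if (wide.empty()) return std::string();
    int len = WideCharToMultiByte(CP_ACP, 0, wide.data(), static_cast<int>(wide.size()),
                                  nullptr, 0, nullptr, nullptr);
    std::string ansi(len, '\0');
    WideCharToMultiByte(CP_ACP, 0, wide.data(), static_cast<int>(wide.size()),
                        &ansi[0], len, nullptr, nullptr);
    return ansi;
}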
Hope this helps!!!
std::wstring does not have any locale/internationalisation support at all. It is just a container for storing sequences of wchar_t.
The problem with wchar_t is that its encoding is unspecified. It might be Unicode UTF-16, or Unicode UTF-32, or Shift-JIS, or something completely different. There is no way to tell from within a program.
You will have the best chances of getting things to work if you ensure that the encoding of your source code is the same as the encoding used by the locale under which the program will run.
But, the use of third-party libraries (like DirectX) can place additional constraints due to possible limitations in what encodings those libraries expect and support.
Bug solved; it turned out to be a CASTING problem (not a rendering problem as previously said).
The garbled text is an intermediate product of an internal conversion process using wstringstream (which I forgot to mention). The code is as follows:
wstringstream wss;
wstring text;
text.append(L"some text");
wss << timer->getTime();   // insert the current time into the stream
text.append(wss.str());
Right after this step the debugger shows the text as a bunch of random characters, but later it somehow converts back and becomes readable. The problem appears at the rendering stage using DirectX: I had left in a cast to LPCWSTR, which resulted in the incorrect rendering.
old:
LPCWSTR lpcwstrText = (LPCWSTR)textToDraw->getText();      // casts the std::wstring pointer itself to LPCWSTR
new:
LPCWSTR lpcwstrText = (*textToDraw->getText()).c_str();    // dereference the std::wstring and use its character data
Changing that solved the problem.
So, this was caused by a bad cast, as some kind people pointed out when correcting my statement.