Hi, I am trying to print a string in C++ that is not in English, and the output is always ????. For example, I want to print the Korean word '선배' or the Thai word 'ยิ่ง'. The simple code snippet is as follows:
#include <iostream>
#include <string>
using namespace std;

int main() {
    string name("선배"); // or: string name("ยิ่ง");
    int len = name.size();
    cout << "\n name: " << name;
    cout << "\n length " << len;
}
OUTPUT:
name: ??
length 2
Whereas if I change the string line to use English characters:
string name("ab");
OUTPUT:
name: ab
length 2
Update: I also tried wchar_t, which also prints question marks.
Code:
const wchar_t *a=L"อดีตรักงานไหม";
wprintf(L"sss : %s \n" , a);
I checked the project properties (Project Properties -> Configuration Properties -> General), and the Character Set is set to 'Use Unicode Character Set'.
Can anybody please tell me what is going wrong? How can I get it to print different languages?
regards
I'm not familiar with Korean, but in general you need to do two things:
Set the correct code page using std::locale, or use Unicode (for example std::wstring and std::wcout).
Set your console to a font that can display those characters. The default font in Windows cannot do this.
If you are using Windows, you can set the console font by using SetCurrentConsoleFontEx:
CONSOLE_FONT_INFOEX cfi;
cfi.cbSize = sizeof cfi;
cfi.nFont = 0;
cfi.dwFontSize.X = 0;
cfi.dwFontSize.Y = 16;
cfi.FontFamily = FF_DONTCARE;
cfi.FontWeight = FW_NORMAL;
wcscpy_s(cfi.FaceName, L"Consolas");
SetCurrentConsoleFontEx(GetStdHandle(STD_OUTPUT_HANDLE), FALSE, &cfi);
If you want to set it independently of your application, or you do not have the prerequisites for the function above, you can have a look at the different guides on the internet, for example this one.
I have no clue which fonts support Asian characters; you will need to check this yourself. Any Unicode font should do.
You need to write a byte order mark (BOM) first; then you can print this.
I am working on a project in Hebrew using Microsoft Visual Studio Community 2019.
When trying to output string literals of non-English characters in any way, I get either question marks in boxes, or just question marks. I checked to see how the command line handles the situation by saving a file with a Hebrew name in Explorer and then accessing it through CMD. Again, question marks.
I am guessing there is a way to include the language packs in the C++ project (that's what I am looking up now), but I saw this and wanted to share what else I found out. By looking at the disassembly of my code I noticed that the assembler is mishandling the assignment to the register. When the value (the characters) is loaded, the Unicode formatting (right to left) causes the assembler to flip the parentheses and shift the first two (last two) values to the opposite side, resulting in an unusable value in the register:
eax,dword ptr [ב (0BDF3A0h)]
Even as I try to save this it comes out wrong: what it should be is a zero, a right parenthesis, a space, and then the Hebrew character,
which should be [(0בBDF3A0h)].
(Somehow in my code, I now have the Unicode for א outputting the value assigned to it...)
I'm looking at how to handle the issue. Hopefully, you know more than I do :-) Good luck!
More:
As variable
as string literal
I'm trying to make a program which prints ASCII art in the console with characters such as ⣿, but when it runs, the program just prints question marks (?). I understand that it's either because I'm using the wrong encoding or because Microsoft Visual Studio doesn't have the dictionary of these ASCII characters.
If you have any idea how to either change the encoding or fix the issue, it would be much appreciated.
Possible solutions:
Try to change the source file encoding to UTF-8 without signature
or UTF-8 with signature.
Try to use wchar_t literal, i.e. std::wcout << L"Your String";.
Learn more:
how to change source file encoding in csharp project (visual studio / msbuild machine)? (Also applies to C++)
What does the 'L' in front a string mean in C++?
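To see what "UTF-8 source encoding" means for a character like ⣿ (U+28FF, a Braille pattern rather than an ASCII character), here is a small sketch that encodes a code point to UTF-8 by hand. The function name encode_utf8 is made up for this example, and the sketch does not reject surrogates or out-of-range values.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8.
// Surrogates and values above U+10FFFF are not rejected in this sketch.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

encode_utf8(0x28FF) gives the three bytes E2 A3 BF. If the console is not set to read its input as UTF-8, those three bytes are what turn into question marks or garbage.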
There is not a problem with your code but rather a problem with the console that shows your output. It does not show Unicode characters correctly. In order for it to show these characters correctly, it needs to recognize Unicode and use a font that actually has those characters. To verify this, simply open a cmd window, copy/paste the character into it, and see what happens.
Does anyone have experience with Unicode?
I am facing a tough problem with Farsi Unicode text.
I have an std::wstring s = (L"\u0634\u0646\u0628\u0647"); which is a Farsi word. When I debug it, I see that the underlying word is exactly what I want, but reversed. So I researched and found that U+2067 is for reading the string right to left.
NOTE:
I cannot reverse the string manually because Farsi characters are changing their shape regardless of their position in the string.
So I added U+2067 at the beginning and got
std::wstring s = (L"\u2067\u0634\u0646\u0628\u0647");.
But now the underlying string is the same; it just has a square added at the beginning of the string instead of being reversed.
Does anyone have experience with this stuff? Please suggest a solution. Thanks!
The underlying string will be the same. You haven't changed the order of bytes, which is written right there in the code. But a renderer that understands Unicode should take those bytes and display the characters right-to-left. That's a visual thing. It has nothing to do with the encoding. From your question, it's not entirely clear what else you expected. It may be that you are viewing the string in a debugger, and the debugger does not support this feature of Unicode. If you try outputting the string to a proper console you ought to see it as you expect.
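A small sketch of that point: the string keeps its characters in logical order, and prepending U+2067 (RIGHT-TO-LEFT ISOLATE) only adds one more element at the front. The function names farsi_word and with_rtl_isolate are made up for illustration.

```cpp
#include <string>

// The Farsi word from the question, stored in logical (reading) order.
std::wstring farsi_word() {
    return L"\u0634\u0646\u0628\u0647";
}

// Prepends U+2067 (RIGHT-TO-LEFT ISOLATE). This adds one element and leaves
// the order of the existing ones untouched.
std::wstring with_rtl_isolate(const std::wstring& text) {
    return std::wstring(1, L'\u2067') + text;
}
```

A viewer that does not implement the Unicode bidirectional algorithm will show the stored order left to right and may draw the extra control character as a square, which matches what the question describes.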
I have a function which returns a string.
I have to define that string with greek characters in the function itself and should return that string.
I am working on Linux platform and my code is in C++.
My function is as follows:
string gen_string()
{
string str = "αγρω";
return str;
}
But I am not able to give the input.
When I try to copy paste the greek characters I want, it is appearing as some garbage characters.
Can someone please help me with this?
Thanks in advance.
EDIT:
Thanks for all your responses.
It's not about using wstring or string.
When I copy the string into vim to give it as input, it appears as something like this:
▒~^▒~T▒~A▒~A201604¸▒~B▒žMDF_F▒~S123▒~T▒~B▒▒~B▒
I also tried by keeping the text in the file and opening the text file from vim.
But still it's the same.
string is only for ASCII characters, I believe.
You have international, likely Unicode characters. Consider using std::wstring for a multibyte "wide" string.
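For comparison, std::string is a sequence of bytes, so it can also carry UTF-8 encoded Greek text unchanged; the difference is that size() then counts bytes rather than characters. A minimal sketch, assuming the text is UTF-8. The helper count_code_points is made up for illustration, and the literal is spelled with byte escapes so the result does not depend on how the editor saved the file.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (those of the form 10xxxxxx).
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// The four Greek letters from the question, written as UTF-8 bytes.
std::string gen_string() {
    return "\xCE\xB1\xCE\xB3\xCF\x81\xCF\x89";
}
```

Here gen_string().size() is 8 while count_code_points(gen_string()) is 4. On a terminal set to UTF-8, writing the string to std::cout should show the four letters.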
If you mean copying from some text into the terminal input, then how to do this depends on the terminal. If it's GNOME Terminal you need to specify UTF-8 in the locale settings, though I'm not sure whether that would get you the Greek alphabet.
The locale command lists the current locale settings from locale.conf. You likely want to change the LANG setting. A way to do this system-wide is
localectl set-locale LANG=en_country_code.UTF-8
Change country_code. It's US for the United States but I don't know what the Greek code is. You may need to be root. To change it just for yourself modify
~/.config/locale.conf
(or $XDG_CONFIG_HOME/locale.conf or $HOME/.config/locale.conf),
whichever gets you to the locale.conf file. On most systems all of them do.
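From the C++ side, you can check which locale the program actually picks up from the environment. This is only a sketch: the helper names adopt_environment_locale and names_utf8 are made up, and the string returned by std::setlocale is implementation-defined, so the UTF-8 check is a rough heuristic.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: adopts the locale named by the environment (LANG,
// LC_ALL, ...) and returns its name. If the request fails, the program stays
// in its previous locale, which is "C" at startup, so "C" is returned.
std::string adopt_environment_locale() {
    const char* name = std::setlocale(LC_ALL, "");
    return name ? std::string(name) : std::string("C");
}

// Reports whether a locale name mentions UTF-8, e.g. "en_US.UTF-8".
bool names_utf8(const std::string& locale_name) {
    return locale_name.find("UTF-8") != std::string::npos
        || locale_name.find("utf8") != std::string::npos;
}
```

If names_utf8(adopt_environment_locale()) is false, the LANG change described above has probably not taken effect for the current shell.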
I'm a beginner in C++.
I'd like to print an Arabic statement in C++ using Borland C, but I failed. I also tried saving the file as UTF-8, but it didn't help.
So if anyone knows what the problem is or how to configure the compiler to print Arabic, please help me.
#include<iostream.h>
#include<conio.h>
void main()
{
clrscr();
char x [5] = {'ا','ح','م','د'};
for(int i = 0; i< 5; i++)
cout << x[i];
getche();
}
First of all, you are assuming that your source code can contain Arabic characters. This is a very bad idea, and depends on the assumption that the compiler is interpreting your source file in the same code page as your editor is writing it in.
The safest way to handle Arabic or other arbitrary Unicode in Windows C++ is to compile with _UNICODE, declare variables of wchar_t (and friends), and use Unicode constants like L'\u0641' for your Arabic characters. If you must do things with 'char', you will have to come up with the multi-byte \x sequences in the right code page for your Arabic characters, and deal with the fact that a single char can't hold an Arabic character in UTF-8.
Finally, since you are using cout, this will only show you Arabic if the current code page of your DOS box is an Arabic code page.
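As a sketch of both options, assuming a modern compiler rather than Borland C: the name from the question written with \u escapes in a wide string, and its first letter written as UTF-8 bytes to show why one char is not enough. The function names arabic_name and alef_utf8 are made up for illustration.

```cpp
#include <string>

// The name from the question as a wide string, using \u escapes so the
// result does not depend on how the compiler decodes the source file.
std::wstring arabic_name() {
    return L"\u0627\u062D\u0645\u062F";  // alef, hah, meem, dal
}

// The first letter (U+0627) as UTF-8: two bytes, so it cannot fit in one char.
std::string alef_utf8() {
    return "\xD8\xA7";
}
```

arabic_name() holds four wchar_t elements, one per letter, while alef_utf8() already needs two bytes for a single letter. That is why the char x[5] array in the question cannot hold the four letters as UTF-8.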
If your Borland C++ is running under DOS:
By default there is no character set that can display Arabic. Back in those days, though, there were applications that remapped the extended ASCII characters to other languages such as Arabic or Persian.
Steps you should follow:
If you are using Windows Vista/7 or later, first you should use DOSBox (you need full-screen mode).
You must change the default ASCII font table in memory.
Use something like vegaf.com, which defines the Persian/Arabic alphabet.
Note: UTF-8 is undefined for this system
C++11 is the first C++ standard to offer native support for UTF-8 (and other UTF charsets) encoding.
In pre-C++11 releases you can simply use a third-party library if you need UTF-8 support, like this one.
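A short sketch of the C++11 additions, using the Korean word 선배 as sample text. It shows the char16_t and char32_t string types with their u and U literal prefixes. The u8 prefix is left out on purpose because the type of u8 literals changed in C++20. The function names are made up for illustration.

```cpp
#include <string>

// The Korean word as a C++11 UTF-32 literal: one char32_t per code point.
std::u32string korean_utf32() {
    return U"\uC120\uBC30";
}

// The same word as a C++11 UTF-16 literal. Both characters are in the Basic
// Multilingual Plane, so each one takes a single char16_t.
std::u16string korean_utf16() {
    return u"\uC120\uBC30";
}
```

These types only fix how the text is stored in memory. Printing it still requires converting to whatever encoding the console or terminal expects.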
Using the following code to create a Unicode string:
wchar_t HELLO[20];
wsprintf(HELLO, TEXT("%c"), 0x2074);
When I display this on a Win32 control like a text box or a button, it gets rendered as a square ([]).
How do I fix this?
I tried compiling with both Eclipse(MinGW) and Microsoft Visual C++ (2010).
Also, UNICODE is defined at the top
Edit:
I think it might be something to do with my system, because when I visit: http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts
some of the unicode characters don't appear.
The font you are using does not contain a glyph for that character. You will likely need to install some new fonts to overcome this deficiency.
The character you have picked out is 'SAMARITAN MODIFIER LETTER EPENTHETIC YUT' (U+081A). Perhaps you were after 'SUPERSCRIPT FOUR' (U+2074). You need hex for that: 0x2074.
Note you changed the question to read 0x2074 but the original version read 2074. Either way, if you see a box that indicates your font is missing that glyph.
The characters you are getting from Wikipedia are expressed in hexadecimal, so your code should be:
wchar_t HELLO[20];
wsprintf(HELLO, TEXT("%c"), (wchar_t)0x2074); // or TEXT('\x2074')
If it still doesn't work, it's a font problem; if you need a pan-Unicode font, it seems that Code2000 is one of the most complete out there.
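The decimal versus hexadecimal mix-up can be checked at compile time. This is only a sketch; it builds the string with std::wstring instead of the Win32 wsprintf call, and the function name superscript_four is made up for illustration.

```cpp
#include <string>

// 2074 in decimal is 0x81A, which is a different character from U+2074.
static_assert(2074 == 0x81A, "decimal 2074 is hex 81A");
static_assert(0x2074 == 8308, "hex 2074 is decimal 8308");

// Builds a one-character wide string holding SUPERSCRIPT FOUR (U+2074).
std::wstring superscript_four() {
    return std::wstring(1, static_cast<wchar_t>(0x2074));
}
```

If the control still shows a box for this string, the value is right and the font is what is missing the glyph.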
Funny fact: the character that has the decimal code 2074 (i.e. hex 81a) seems to actually be a box (or it's such a strange beast that even the image outline at FileFormat.Info is wrong). :)
For the curious ones: it turns out that 0x081a is this thing: