Printing Arabic string in C++

I'm a beginner in C++.
I'd like to print an Arabic statement in C++ using Borland C, but I failed. I also tried saving the file as UTF-8, but it didn't help.
So if anyone knows anything about this, about what the problem is, or about how to configure the compiler to print Arabic, please help me.
#include<iostream.h>
#include<conio.h>
void main()
{
clrscr();
char x[4] = {'ا','ح','م','د'};
for(int i = 0; i < 4; i++)
cout << x[i];
getche();
}

First of all, you are assuming that your source code can contain Arabic characters. This is a very bad idea, and depends on the assumption that the compiler is interpreting your source file in the same code page as your editor is writing it in.
The safest way to handle Arabic or other arbitrary Unicode in Windows C++ is to compile with _UNICODE, declare variables of wchar_t (and friends), and use Unicode constants like L'\u0641' for your Arabic characters. If you must do things with char, you will have to come up with the multi-byte \x sequences in the right code page for your Arabic characters, and deal with the fact that a single char cannot hold an Arabic character in UTF-8.
Finally, since you are using cout, this will only show you Arabic if the current code page of your DOS box is an Arabic code page.
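For a modern Windows compiler (not DOS-era Borland), a minimal sketch of the wide-character approach might look like this; the _setmode call that switches stdout to UTF-16 is my assumption about the console setup, and the console font must still contain Arabic glyphs (right-to-left shaping is not guaranteed):

#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int main(void)
{
    // Sketch only: put stdout into UTF-16 mode so the wide literal reaches
    // the console without a code-page conversion. The \u escapes spell
    // "احمد" (alef, hah, meem, dal), so the source file's own encoding no
    // longer matters.
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"\u0627\u062D\u0645\u062F\n");
    return 0;
}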

If your Borland C++ is under DOS
By default there is no character set that will display this as Arabic. Back then, there were applications that remapped the extended ASCII characters to other scripts such as Arabic, Persian, and so on.
Steps you should take:
If you are using Windows Vista/7+, you should first use DOSBox (you need fullscreen mode)
You must change the default ASCII font table in memory
Something like vegaf.com, which defines a Persian/Arabic alphabet
Note: UTF-8 is not available on this system

C++11 is the first C++ standard to offer native support for UTF-8 (and other UTF encodings).
In pre-C++11 code you can simply use a third-party library if you need UTF-8 support, like this one.
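As a rough illustration of what C++11 adds (a sketch, not from the original answer): a u8"" literal guarantees that the stored bytes are UTF-8 regardless of how the source file is saved, although whether they display correctly still depends on the terminal expecting UTF-8.

#include <iostream>

int main()
{
    // C++11 UTF-8 string literal: the compiler encodes these code points as
    // UTF-8 bytes no matter what encoding the source file uses.
    const char* name = u8"\u0627\u062D\u0645\u062F"; // "احمد"
    std::cout << name << '\n'; // renders only if the console expects UTF-8
}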

Related

Printing ASCII code in C++ (Visual Studio not recognizing encoding)

I'm trying to make a program which prints ASCII art in the console with characters such as ⣿, but when run, the program just prints question marks (?). I understand that it's either because I'm using the wrong encoding or because Microsoft Visual Studio doesn't have these characters available.
If you have any idea on how to either change the encoding or fix the issue, it would be much appreciated.
Possible solutions:
Try to change the source file encoding to UTF-8 without signature, or UTF-8 with signature.
Try to use wchar_t literal, i.e. std::wcout << L"Your String";.
Learn more:
how to change source file encoding in csharp project (visual studio / msbuild machine)? (Also applies to C++)
What does the 'L' in front a string mean in C++?
There is not a problem with your code but rather a problem with the console that shows your output. It does not show Unicode characters correctly. In order for it to show these characters correctly it needs to recognize Unicode and use a font that actually has those characters. To verify this, simply open a cmd window and copy/paste the character into it and see what happens.
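Putting the wchar_t-literal suggestion together with the console point above, a minimal sketch (the _setmode call is my assumption about using the Microsoft CRT, and the console font must still contain the Braille glyphs):

#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <iostream>

int main()
{
    // Sketch: switch stdout to UTF-16 so the wide literal is not squashed
    // through the current code page (which is what turns it into '?').
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"\u28FF\u28FF\u28FF\n"; // U+28FF BRAILLE PATTERN DOTS-12345678
}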

C++ Infinity Sign

Hello, I was just wondering how I can display the infinity sign (∞) in C++? I am using Code::Blocks. I read a couple of Q&As on this topic but I'm a newbie at this stuff, especially with hex codes and such. What do I have to include and what do I type out exactly? If someone could write the code and explain it, that'd be great! Thanks!
The symbol is not part of the ASCII character set. However, in code page 437 (usually the default in a Windows Command Prompt with English locales/US regional settings) it is represented by character #236. So in principle
std::cout << static_cast<unsigned char>(236);
should display it, but the result depends on the current locale/encoding. On my Mac (OS X) it is not displayed properly.
The best way to go about it is to use the UNICODE set of characters (which standardizes a large number of characters/symbols). In this case,
std::cout << "\u221E";
should do the job, as the Unicode character U+221E represents infinity.
However, to be able to display Unicode, your output device must support UTF encoding. On my Mac, the Terminal uses UTF, but Windows Command Prompt still uses the old code page 437 (thanks to @chris for pointing this out). According to this answer, you can switch to UTF-8 by typing
chcp 65001
in a Command Prompt.
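The same switch can be made from inside the program instead of typing chcp by hand; a sketch under two assumptions: Windows only, and the compiler encodes the narrow literal as UTF-8 (the GCC/Clang default; MSVC needs /utf-8):

#include <windows.h>
#include <iostream>

int main()
{
    // Programmatic equivalent of `chcp 65001`: tell the console to interpret
    // output as UTF-8, then write the UTF-8 bytes for U+221E.
    SetConsoleOutputCP(CP_UTF8);
    std::cout << "\u221E" << std::endl;
}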
You can show it through its Unicode value
∞ has the value \u221E
You can show any character from the Character Map by its Unicode code point.

Windows UTF8 to UTF16

I've been trying to convert between UTF8 and UTF16 LE with BOM using C++ to make the characters output correctly on Windows, without having to change the font of the terminal.
I tried changing the code pages, but they didn't work.
I have 2 questions,
How can I convert a normal string to a wide string?
Is it a bad idea to create a C++ map that maps each unicode character to the character in the Windows code page?
For example,
wcout << L"\u00A0" << endl;
This code outputs the letter á on Windows when using Code page 850. How can I put a variable in place of the "\u00A0" to convert a normal string to a wide character on Windows?
What I'd like is this:
wcout << Lsome_variable << endl;
I realise it's not valid C++ syntax, but does anyone know how I can do this? Or is there a better way?
As noted in the comments, the standard library now provides things like std::wstring_convert (and other functions/classes in the See Also section of that page).
Since you're on Windows, the WinAPI also has conversion functions. In this case you would be looking for MultiByteToWideChar which can be used to convert from UTF-8 to UTF-16.
Between those options, something should fit your use case. Generally speaking, you should never need to write your own conversion map.
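As a rough sketch of the MultiByteToWideChar route (error handling trimmed; Utf8ToUtf16 is just an illustrative helper name):

#include <windows.h>
#include <string>

// Sketch: convert a UTF-8 std::string to a UTF-16 std::wstring via the WinAPI.
std::wstring Utf8ToUtf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    // First call asks for the required length, second call does the conversion.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                  static_cast<int>(utf8.size()), nullptr, 0);
    std::wstring utf16(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), static_cast<int>(utf8.size()),
                        &utf16[0], len);
    return utf16;
}

With something like this, the Lsome_variable idea from the question becomes wcout << Utf8ToUtf16(some_variable) << endl; — the L prefix only exists for literals, so a runtime conversion is the way to get the same effect for a variable.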

what locale does wstring support?

In my program I used wstring to print out the text I needed, but it gave me random ciphers (presumably due to a different encoding scheme). For example, I have this block of code.
wstring text;
text.append(L"Some text");
Then I use DirectX to render it on screen. I used to use wchar_t, but I heard it has portability problems, so I switched to wstring. wchar_t worked fine, but it seemed to only take English characters from what I can tell (the printout just totally ignored any non-English characters entered), which was fine, until I switched to wstring: I only got random ciphers that looked like Chinese and Korean mixed together. Interestingly, my computer's locale for non-Unicode text is Chinese. Based on what I saw, I suspected it would render a Chinese character correctly, so I tried, and it does display the character correctly but with a square in front (which is still a somewhat incorrect display).
I then guessed the encoding might depend on the language locale, so I switched the locale to English (US) (I use Win8), restarted, and saw the Chinese test character in my source file turn into some random stuff (my file is not saved in a Unicode format since all the text is English). I then tried with English characters, but no luck; the display looked exactly the same and seemed to have nothing to do with the locale. I don't understand why it doesn't display correctly and looks like Asian characters (even though I use the English locale).
Is there some conversion that should be done, or should I save my file in a different encoding format? The problem is I want to display English characters correctly, which is the default.
In the absence of code that demonstrates your problem, I will give you a correspondingly general answer.
You are trying to display English characters, but see Chinese characters. That is what happens when you pass 8 bit ANSI text to an API that receives UTF-16 text. Look for somewhere in your program where you cast from char* to wchar_t*.
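A tiny sketch of that failure mode (illustrative only, not taken from the asker's program):

#include <iostream>

int main()
{
    // Reinterpreting 8-bit text as UTF-16: each wchar_t now packs two of the
    // original bytes, e.g. 'S' (0x53) and 'o' (0x6F) become U+6F53, a CJK
    // ideograph -- which is why the output looks like random Chinese text.
    const char* ansi = "Some text";
    const wchar_t* bogus = reinterpret_cast<const wchar_t*>(ansi); // the bad cast
    std::wcout << bogus << L'\n'; // garbage, and may read past the buffer
}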
First of all, what type of file are you trying to store the text in? Normal txt files are stored as ANSI by default (so does Excel). So when you try to print a Unicode character to an ANSI file it will print junk. Two ways of overcoming this problem are:
open the file in UTF-8 or UTF-16 mode and then write
convert Unicode to ANSI before writing to the file (see the sketch after this answer). If you are using Windows, MSDN provides a dedicated API to do Unicode-to-ANSI conversion and vice versa. If you are using Linux, Google for conversion of Unicode to ANSI; there are lots of solutions out there.
Hope this helps!
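For the Windows case, the API the answer alludes to is WideCharToMultiByte; a minimal sketch under that assumption (the helper name is mine, and characters with no ANSI equivalent are replaced by the system default character):

#include <windows.h>
#include <string>

// Sketch: Unicode (UTF-16) to ANSI using the WinAPI.
std::string Utf16ToAnsi(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    // First call asks for the required length, second call does the conversion.
    int len = WideCharToMultiByte(CP_ACP, 0, wide.data(),
                                  static_cast<int>(wide.size()),
                                  nullptr, 0, nullptr, nullptr);
    std::string ansi(len, '\0');
    WideCharToMultiByte(CP_ACP, 0, wide.data(), static_cast<int>(wide.size()),
                        &ansi[0], len, nullptr, nullptr);
    return ansi;
}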
std::wstring does not have any locale/internationalisation support at all. It is just a container for storing sequences of wchar_t.
The problem with wchar_t is that its encoding is unspecified. It might be Unicode UTF-16, or Unicode UTF-32, or Shift-JIS, or something completely different. There is no way to tell from within a program.
You will have the best chances of getting things to work if you ensure that the encoding of your source code is the same as the encoding used by the locale under which the program will run.
But, the use of third-party libraries (like DirectX) can place additional constraints due to possible limitations in what encodings those libraries expect and support.
Bug solved: it turns out to be a CASTING problem (not a rendering problem as previously stated).
The bugged text is an intermediate product of some internal conversion process using wstringstream (which I forgot to mention); the code is as follows
wstringstream wss;
wstring text;
text.append(L"some text");
wss << timer->getTime();
text.append(wss.str());
Right after this step the debugger shows the text as a bunch of random stuff, but later it somehow converts back so it is readable. The problem appears at the rendering stage using DirectX: I had left a cast to wchar_t* in place, which results in the incorrect rendering.
old:
LPCWSTR lpcwstrText = (LPCWSTR)textToDraw->getText();
new:
LPCWSTR lpcwstrText = (*textToDraw->getText()).c_str();
Changing that solves the problem.
So, this was caused by a bad cast, as some kind people pointed out when correcting my statement.

Does Visual Studio 2010 Support C++ Source Code in Unicode with Unicode Chars in String Literals

I want to directly embed non-ASCII Unicode characters in string literals and use them in printf. This implies my source code must be saved in UTF-8 or UTF-16. Visual Studio 2010 does support editing and saving C++ source files in either format. But when compiled and executed, it does not produce the correct Unicode characters. Does the compiler support string literals with embedded Unicode characters?
e.g.
wprintf(L" chinese characters:中文字\n"); — the trailing Chinese characters cannot be displayed.
I don't have a Chinese version of Windows to test with, so this is complete speculation.
The console and file output functions are aware that files are not coded in UTF-16, so they attempt to convert the characters to a code page before output. Just as the default locale is "C" rather than anything based on your system settings, so too the default code page is probably an inappropriate one that does not include Chinese characters.
There is a function SetConsoleOutputCP to change the code page for the console. It is not clear if this function changes the code page used by the actual console window, or if it only affects conversions from Unicode within the program.
The easy way to test wide literals is to skip the formatting part of printf, and give your string straight to the OS: WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L" chinese characters:中文字", ....
It's possible that #pragma setlocale may be what you need.
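Expanding the WriteConsoleW suggestion into a complete sketch (the \u escapes spell 中文字; whether the glyphs actually appear still depends on the console font):

#include <windows.h>
#include <wchar.h>

int main()
{
    // Hand the UTF-16 literal straight to the console, bypassing the CRT's
    // code-page conversion. WriteConsoleW takes a character count, not bytes.
    const wchar_t text[] = L"chinese characters:\u4E2D\u6587\u5B57\n";
    DWORD written = 0;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), text,
                  static_cast<DWORD>(wcslen(text)), &written, NULL);
    return 0;
}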