How to write portable C++ code with Unicode support?

I have the program below, which tries to print Unicode characters to the console by enabling the _O_U16TEXT mode:
#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int main()
{
    // Switch stdout to UTF-16 mode so the CRT writes wide output correctly.
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"test\n \x263a\x263b Hello from C/C++\n");
    return 0;
}
What is unclear to me is that I have seen many C++ projects (with the same code running on Windows and Linux) that use a macro called _UNICODE. I have the following questions:
Under what circumstances do I need to define the _UNICODE macro?
Does enabling the _UNICODE macro mean I need to separate the ASCII-related code with #ifdef _UNICODE? In the case of the above program, do I need to put the code under #ifdef UNICODE and the ASCII processing in #else?
What else do I need to do to give the code Unicode support in C/C++? Do I need to link against any specific libraries on Windows and Linux?
Why does my sample program above not need to define the _UNICODE macro?
When I define the _UNICODE macro, how does the code know whether it uses UTF-8, UTF-16, or UTF-32? How do I decide between these encodings?

Related

Why #define UNICODE has no effect in Windows

I have the following code:
#define UNICODE
// so strange??
GetModuleBaseName(hProcess, hMod, szProcessName,
                  sizeof(szProcessName) / sizeof(TCHAR));
But the compiler still report error like this:
error C2664: "DWORD K32GetModuleBaseNameA(HANDLE,HMODULE,LPSTR,DWORD)": cannot convert argument 3 from "wchar_t [260]" to "LPSTR" [E:\source\mh-gui\build\src\mhgui.vcxproj]
That is, it can't convert parameter 3 from wchar_t[260] to LPSTR. It looks like the compiler is still resolving to the A version of the API?
You must put
#define UNICODE
#define _UNICODE
BEFORE
#include <Windows.h>
The Windows header uses #ifdef UNICODE (et al), so if you want to make the distinction count, the #defines must occur before the #include.
edit: Because these #defines are effectively global, the most reliable place to add them is in your compiler options; then the ordering doesn't matter.
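For illustration, a minimal sketch of the fix (GetModuleBaseName is declared in Psapi.h; passing /DUNICODE /D_UNICODE on the cl command line works too):
#define UNICODE
#define _UNICODE
#include <Windows.h>
#include <Psapi.h>

// With UNICODE defined, the generic name expands to the wide (W) variant,
// so the wchar_t[260] buffer from the question is accepted:
// GetModuleBaseName(hProcess, hMod, szProcessName,
//                   sizeof(szProcessName) / sizeof(TCHAR));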
Since you are using Visual Studio, instead of defining UNICODE yourself you should enable the W versions by right-clicking the project in the Solution Explorer -> Properties -> Advanced -> changing the "Character Set" option to "Use Unicode Character Set".

Using Unicode DLLs in MBCS projects or vice versa

I have a VC++ DLL that is compiled with the character set set to 'Use Unicode Character Set'.
Now I want to use this DLL in my VC++ exe whose character set is 'Use Multi-Byte Character Set'. I know that theoretically nothing stops me from doing this: after compiling the VC++ DLL, all the function signatures would be either wchar_t or LPCWSTR, and in my VC++ exe I just create strings of that format and call the exported functions. But the problem I am facing is that the header in the Unicode DLL takes TCHAR parameters, for example:
class MYLIBRARY_EXPORT PrintableInt
{
public:
    PrintableInt(int value);
    ...
    void PrintString(TCHAR* somestr);
private:
    int m_val;
};
Now I want to use this PrintString() in my exe, so I included the header and used it like below:
#include "unicodeDllHeaders.h"
PrintableInt p(2);
wchar_t* someStr = L"Some str";
p.PrintString(someStr);
This, as expected, gives me a compiler error:
error C2664: 'PrintableInt::PrintString' : cannot convert parameter 1 from 'wchar_t *' to 'TCHAR *'
As the exe is built with MBCS, TCHAR is defined as char. So what I thought would solve this issue is:
#define _UNICODE
#include "unicodeDllHeaders.h"
#undef _UNICODE
But even after defining _UNICODE I still got the compilation error. My next guess was that tchar.h was already included before the #include "unicodeDllHeaders.h"; when I searched for its inclusion, it was indeed included elsewhere in the project. Moving that inclusion after the definition of _UNICODE solved the compilation error here, but it fails in the other places where TCHAR is expected to be char. So my question is:
Can I somehow make TCHAR resolve to both char and wchar_t in the same project? I tried #define TCHAR char, #undef TCHAR, and #define TCHAR wchar_t, but it fails in the standard headers like xutils.
You cannot retroactively change the binary interface of your DLL, regardless of what you do to your header file with macros. The typical solution is to drop the whole legacy MBCS configuration first. It stopped being interesting 10 years ago; nowadays all targets supporting the Win32 API support the wide-character interface with full Unicode support.
If you want that retro feeling of the nineties, you can also compile your DLL twice, once with char and once with wchar_t. The latter typically gets a "u" suffix (for Unicode). In your header, you then check the character set and delegate to the corresponding import library using #pragma comment(lib, ...).
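A sketch of that delegation (the library names here are illustrative, not from the question):
#ifdef _UNICODE
#pragma comment(lib, "mylibraryu.lib") // wide-character (wchar_t) build
#else
#pragma comment(lib, "mylibrary.lib")  // MBCS (char) build
#endif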

Unicode output on the Windows console

The article Unicode apps in the MinGW-w64 wiki explains the following example of a Unicode application, e.g. main.c:
#define UNICODE
#define _UNICODE
#include <tchar.h>

int _tmain(int argc, TCHAR * argv[])
{
    _tprintf(argv[1]);
    return 0;
}
The above code makes use of the tchar.h mapping, which allows it to compile in both Unicode and non-Unicode mode. [...] The -municode option is still required when linking if Unicode mode is used.
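Roughly, the tchar.h mapping works like this (a simplified sketch, not the actual header):
#ifdef _UNICODE
typedef wchar_t TCHAR;
#define _tmain   wmain
#define _tprintf wprintf
#else
typedef char TCHAR;
#define _tmain   main
#define _tprintf printf
#endif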
So I used
C:\> i686-w64-mingw32-gcc main.c -municode -o hello
_tmain.c:1:0: warning: "UNICODE" redefined
#define UNICODE
^
<command-line>:0:0: note: this is the location of the previous definition
to compile a Unicode application. But when I run it, it returns
C:\> hello Süßer
S³▀er
So the Unicode string is wrong. I used the latest version, 4.9.2, of MinGW-w64, i686 architecture, and tried both the Win32 and POSIX threads variants; both result in the same error. My operating system is 32-bit German Windows 7. When I use the Unicode code page (chcp 65001), I have to use the font "Lucida Console". With this setting I get a similar error:
C:\> hello Süßer
S��er
I want to use a parameter with "ü" or "ß" in a Windows C++ program.
Solution
nwellnhof is right: the problem is the output on the console. This problem is explained in Unicode part 1: Windows console i/o approaches and Unicode part 2: UTF-8 stream mode. The latter gives a solution for Visual C++; it also worked with Intel C++ 15. The blog post does "not yet consider the g++ compiler. All this code is Visual C++ specific. However, [the blog author has] done generally the same with g++, and [he] will probably discuss that in a third installment."
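The core of the UTF-8 stream-mode approach from those posts, as a minimal sketch (not the blog's exact code; it assumes the Microsoft CRT):
#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int main()
{
    // UTF-8 stream mode: the CRT converts wide-character output
    // to UTF-8 bytes on the way out.
    _setmode(_fileno(stdout), _O_U8TEXT);
    wprintf(L"Süßer\n");
    return 0;
}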
I want to open a file whose name is given by a parameter. This is simple, e.g. main.cpp:
#include <iostream>
#include <fstream>
using namespace std;

int main(int argc, char* argv[])
{
    if (argc > 1) {
        // The output will be wrong, ...
        cout << argv[1] << endl;
        // but the name of this file will be right:
        fstream fl_rsl(argv[1], ios::out);
        fl_rsl.close();
    }
    return 0;
}
and compile it without Unicode mode:
C:\> g++ main.cpp -o hello && hello Süßer
Its console output is still wrong, but the created filename is right. This is okay for me.

Compiling the MongoDB C++ driver without _UNICODE

I'm trying to compile the MongoDB C++ driver into my project, and I've run across an interesting error.
In util/text.h, you can find this code:
/* like toWideString but UNICODE macro sensitive */
#if !defined(_UNICODE)
#error temp error
inline std::string toNativeString(const char *s) { return s; }
#else
inline std::wstring toNativeString(const char *s) { return toWideString(s); }
#endif
It looks like you should be able to compile it without _UNICODE defined, yet there is this seemingly arbitrary line, #error temp error, which causes the failure. On GitHub, this seems to have been the case for the lifetime of the file. Does anyone know if it's safe to remove it?
Unfortunately, I can't just compile this project with Unicode, because there are a number of Unicode-incompatible sources in the project as well.
Cheers
Kyle

_O_WTEXT, _O_U16TEXT, _O_U8TEXT - are these modes possible with the MinGW compiler, and are there any workarounds?

#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int main(void) {
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");
    return 0;
}
fails at compile time with the error: _O_U16TEXT was not declared in this scope
Is this a show-stopper with this compiler?
Well, there's a simple workaround: just use the values of these constants instead of their names. For example, _O_U16TEXT is 0x00020000 and _O_U8TEXT is 0x00040000.
I've just confirmed that it works with _setmode using g++ 4.8.1 on Windows 10:
#include <iostream>
#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int main() {
    _setmode(_fileno(stdout), 0x00020000); // _O_U16TEXT
    std::wcout << L"Русский текст\n";
}
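If you prefer names to magic numbers, you can define the missing constants yourself (a sketch; the values come from the MSVC <fcntl.h>):
#ifndef _O_U16TEXT
#define _O_U16TEXT 0x00020000
#endif
#ifndef _O_U8TEXT
#define _O_U8TEXT 0x00040000
#endif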
Don't even bother with it.
Your user needs to set the console font to a Unicode one (Lucida Console, for example), or else it won't display anything. If you're targeting East Asia, where the default font is Unicode-aware, it'll display fine, but in Europe you have to change the font manually.
Users won't change their console font just because you say so in the readme. They won't even read the readme; they'll just see that the program displays garbage and conclude that it doesn't work.
Also, it's a fairly new feature (VS 2005), so you need to link against the VS 2005 redistributable, msvcr80.dll, or newer (MinGW links against msvcrt.dll by default). If you don't, the text won't be displayed at all.
And I think a simple pipe will kill your new and shiny Unicode text. Say your exe is named a.exe; I'm pretty sure that a.exe | more will NOT display the Unicode text.