Is it possible to set the execution character set for Visual C++ compiler?
Problem
When trying to convert a string literal containing universal character names (UCNs) to a wide string, a runtime crash occurs when the code is compiled with Visual Studio 2015:
std::string narrowUCN = "\u00E4\u00F6\u00FC\u00DF\u20AC\u0040";
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convertWindows;
std::wstring wide = convertWindows.from_bytes(narrowUCN); // Unhandled C++ exception in xlocbuf, line 426.
Using narrowUCN = u8"\u00E4\u00F6\u00FC\u00DF\u20AC\u0040" works, so I assume the problem is with the execution character set?
Since Visual Studio 2015 Update 2, it is possible to set the execution character set to UTF-8 using the compiler option /utf-8. The conversion of narrow string literals that don't use the u8 prefix will then work, because those literals are encoded as UTF-8 instead of the system's codepage (the Visual C++ compiler's default behaviour).
The option /utf-8 is a synonym for /source-charset:utf-8 and /execution-charset:utf-8. From the link above:
In those cases where BOM-less UTF-8 files already exist or where changing to a BOM is a problem, use the /source-charset:utf-8 option to correctly read these files.
Use of /execution-charset or /utf-8 can help when targeting code between Linux and Windows as Linux commonly uses BOM-less UTF-8 files and a UTF-8 execution character set.
PS: Don't confuse this with the Character Set setting on the common project configuration page, which only sets the UNICODE/_MBCS preprocessor macros (for historical reasons).
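For reference, here is a minimal sketch of the fixed build. The cl command in the first comment is the assumption here (it requires Update 2 or later); the rest is the code from the question:

// Compile with: cl /EHsc /utf-8 main.cpp
#include <codecvt>
#include <locale>
#include <string>

int main()
{
    // With /utf-8 the narrow literal is stored as UTF-8 bytes, so
    // from_bytes receives valid UTF-8 and no longer throws.
    std::string narrowUCN = "\u00E4\u00F6\u00FC\u00DF\u20AC\u0040";
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convertWindows;
    std::wstring wide = convertWindows.from_bytes(narrowUCN);
    return 0;
}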
Related
I faced the following problem: when I try to store a Cyrillic character in a char16_t variable, I get a C2015 error (too many characters in constant). The most interesting thing is that this happens only when I build with MSVC via CMake. If I build directly from the IDE, or with CMake and another compiler, it's fine. Moreover, in the same setup that fails with MSVC, it is possible to store Cyrillic characters in a wchar_t. I use the Visual Studio 16 (2019) generator with CMake.
Example of broken code:
char16_t val = u'б';
EDIT:
The trouble is with the UTF literal itself. If I assign an integer value instead:
char16_t val = some_int_val;
it works correctly. Also, the IDE correctly shows the integer value of the literal, but everything breaks at compile time.
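One plausible explanation, consistent with the /utf-8 answer above: if the source file is saved as BOM-less UTF-8 and MSVC is invoked without /utf-8 (a bare CMake configuration adds no such flag), the compiler reads the two UTF-8 bytes of 'б' in the system codepage and sees a two-character literal, hence C2015. A minimal CMake-side sketch of that fix, assuming this diagnosis (the target name mytarget is illustrative):

if(MSVC)
  # Assumes the sources are BOM-less UTF-8; /utf-8 sets both the
  # source character set and the execution character set to UTF-8.
  target_compile_options(mytarget PRIVATE /utf-8)
endif()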
I'm trying to open some C++ source files from a programming textbook companion CD, but Dev-C++ throws me this message:
No mapping for the Unicode character exists in the target multi-byte code page Address : 0x0014206D
It seems there's no support for characters in my language (Brazilian Portuguese), like acute accents. If I convert the encoding to UTF-8 with Notepad++, Dev-C++ can then open and compile the source without errors, but it doesn't render the non-US/ANSI characters correctly. In the provided source file, "salário" is output as "sal├írio" by the program.
Here's the source file:
https://drive.google.com/file/d/1MltK2dLY_ElUIMuWM8xlO8i_fr6yEdns/view?usp=sharing
edit: I'm using version 6.3, downloaded from GitHub
I'm trying to set up an arduino uno for serial port communication with a C++ program in visual studio 2010. I'm working from the code found here: http://playground.arduino.cc/Interfacing/CPPWindows
Unfortunately, the .cpp file gives me the following error on line 9, for the variable 'portName':
Error: argument of type "char *" is incompatible with parameter of type "LPCWSTR"
I don't understand this error message, and have tried a few different things to fix it. Any help would be greatly appreciated!
Given the code link in your question, it seems the problem is here:
Serial::Serial(char *portName)
{
...
this->hSerial = CreateFile(portName, // <--- ERROR
CreateFile is a Win32 API that expects an LPCTSTR as its first string parameter.
LPCTSTR is a Win32 typedef, which is expanded to:
const char* in ANSI/MBCS builds
const wchar_t* in Unicode builds (which have been the default since VS2005)
Since you are using VS2010, probably you are in the default Unicode build mode.
Actually, there is no "physical" CreateFile API exposed, but there are two distinct functions: CreateFileA and CreateFileW. The former takes a const char* input string, the latter takes a const wchar_t*.
In Unicode builds, CreateFile is a preprocessor macro expanded to CreateFileW; in ANSI/MBCS builds, CreateFile is expanded to CreateFileA.
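In sketch form, the mapping inside <windows.h> behaves like this (simplified; not the literal header text):

#ifdef UNICODE
#define CreateFile CreateFileW
#else
#define CreateFile CreateFileA
#endif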
So, if you are in Unicode build mode, your CreateFile call is expanded to CreateFileW(const wchar_t*, ...). Since portName is defined as a char*, there is a mismatch between wchar_t* and char*, and you get a compiler error.
To fix that, you have some options.
For example, you could be explicit in your code, and just call CreateFileA() instead of CreateFile(). In this way, you will be using the ANSI/MBCS version of the function (i.e., the one taking a const char*), independently of the actual ANSI/MBCS/Unicode settings in Visual Studio.
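A sketch of that explicit ANSI call (only the relevant line of the Serial constructor is shown; the other parameters are typical values for a serial port, shown for illustration):

// CreateFileA always takes const char*, so it matches portName
// regardless of the project's Character Set setting.
this->hSerial = CreateFileA(portName,
                            GENERIC_READ | GENERIC_WRITE,
                            0,
                            NULL,
                            OPEN_EXISTING,
                            FILE_ATTRIBUTE_NORMAL,
                            NULL);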
Another option would be to change your current build settings from the default Unicode mode to ANSI/MBCS. To do that, you can follow the path:
Project Properties | Configuration Properties | General | Character Set
and select "Use Multi-Byte Character Set", as showed in the following screenshot:
Your settings in Visual Studio are probably set to Unicode, but the code you're compiling expects ANSI strings.
Go to Project Properties -> Configuration Properties -> General -> Character Set and choose "Use Multi-Byte Character Set".
You should also remove UNICODE or _UNICODE from C++ -> Preprocessor -> Preprocessor definitions, if they are defined there.
This will make your code call the ANSI versions of the Windows API functions, which accept char strings.
I have been using this:
char appdata[250];
ExpandEnvironmentStrings("%AppData%\\\MCPCModLocator\\", appdata, 250);
in C++ Win32 to get the AppData folder in one of my projects. It worked fine, no issues. Now in my newest project (same PC, still Visual Studio 2013), when I try to do the same, I get an error on the first string saying "const char* is incompatible with type LPCWSTR", and on the second parameter it says "char * is incompatible with type LPWSTR". I have no idea why it works in the first project but not the second. I assume it's a settings change, but looking through each project's settings, I see nothing. Any help is appreciated! Thanks!
ExpandEnvironmentStrings is a macro that expands to ExpandEnvironmentStringsA or ExpandEnvironmentStringsW depending on whether UNICODE was defined when you included <windows.h>.
In a Visual Studio project UNICODE is defined by default, but this is not so for command line use of the compiler.
Since modern Windows programming is better based on Unicode, the best fix is not to remove the definition of UNICODE but to add an L prefix to your literals, like L"Hello", which makes it a “wide” wchar_t based literal, and correspondingly change the type of appdata.
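A minimal sketch of that wide-character variant, keeping the buffer size and path from the question:

#include <windows.h>

int main()
{
    // With UNICODE defined (the VS project default), the macro
    // expands to ExpandEnvironmentStringsW, which now matches the
    // wide literal and the wchar_t buffer.
    wchar_t appdata[250];
    ExpandEnvironmentStrings(L"%AppData%\\MCPCModLocator\\", appdata, 250);
    return 0;
}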
By default, a newly created project in VS2013 is set to use the Unicode APIs, which take LPWSTR (i.e., wchar_t*) instead of LPSTR (i.e., char*).
You can call the old ANSI-version APIs by explicitly adding "A" to the end of the function name, e.g. ExpandEnvironmentStringsA, or change the project configuration to use the multi-byte character set (Project Property Pages -> Configuration Properties -> General -> Character Set).
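For example, a sketch of the explicit ANSI call, keeping the original char buffer:

#include <windows.h>

int main()
{
    // ExpandEnvironmentStringsA always takes char-based strings,
    // regardless of whether UNICODE is defined for the project.
    char appdata[250];
    ExpandEnvironmentStringsA("%AppData%\\MCPCModLocator\\", appdata, 250);
    return 0;
}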
I'm trying to compile a UTF-16BE C++ source file in g++ with -finput-charset compiler option but I'm always getting a bunch of errors. More details follow.
My environment (CentOS Linux):
g++: 4.1.2
iconv: 2.5
Linux language(in Terminal): LANG="en_US.UTF-8"
My sample source file (stored in UTF-16BE encoding):
// main.cpp:
#include <iostream>
int main()
{
std::cout << "Hello, UTF-16" << std::endl;
return 0;
}
My steps:
I read the manual of g++ about the -finput-charset option. The g++ manual says:
-finput-charset=charset
Set the input character set, used for translation from the character set of the input file to the source character set used by GCC. If the locale does not specify, or GCC cannot get this information from the locale, the default is UTF-8. This can be overridden by either the locale or this command line option. Currently the command line option takes precedence if there's a conflict. charset can be any encoding supported by the system's "iconv" library routine.
Thus I entered the command as follows:
g++ -finput-charset=UTF-16BE main.cpp
and I got these errors:
In file included from main.cpp:1:
/usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/iostream:1:
error: stray ‘\342’ in program
/usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/iostream:1:
error: stray ‘\274’ in program
...(repeatedly, A LOT, around 4000+)...
/usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/iostream:1:
error: stray ‘\257’ in program
main.cpp: In function ‘int main()’:
main.cpp:5: error: ‘cout’ is not a member of ‘std’
main.cpp:5: error: ‘endl’ is not a member of ‘std’
The manual says the charset can be any encoding supported by the 'iconv' library routine, so I guessed the compilation errors might be caused by my iconv library. I then tested iconv:
iconv --from-code=UTF-16BE --to-code=UTF-8 --output=main_utf8.cpp main.cpp
A "main_utf8.cpp" file is generated as expected. I then tried to compile it:
g++ -finput-charset=UTF-8 main_utf8.cpp
Note that I specified the input charset explicitly to see if I did anything wrong, but this time an "a.out" was generated without any errors. When I ran it, it produced the correct output.
Finally...
I couldn't figure out what I did wrong. I searched the web for examples of this compiler option but couldn't find any.
Please advise! Thanks!
Further edits:
Thanks, guys! Your replies are quick! Some updates:
When I said "UTF-16" I meant "UTF-16 + BOM". In fact I used UTF-16BE. I have updated the text above.
Some answers say the errors are caused by the non-UTF-16 header files. Here are my thoughts if this is the case: We'll always include some standard header files when writing a C/C++ project, right? Such as stdio.h or iostream. If the G++ compiler only deals with the encoding of the source files created by us but never with the source files in the standard library, then what does this -finput-charset option exist for??
Final edit:
At last, my solution is like this:
At first, I changed the encoding of my source files to GB2312, as "Mr Lister" suggested below. This worked fine for a while, but I later found it unsuitable for my situation, because most other parts of the system still use UTF-8 for communication and interfaces, so I had to convert encodings in many places... Besides the extra work, this could also cost my program some performance.
Later I converted all my source files to UTF-8 + BOM. That way, Visual Studio on Windows could compile them happily, but GCC on Linux would complain. I then wrote a shell script to remove the BOM, and I run it before compiling the code with GCC.
Luckily, I don't have to build the code on Linux manually, because the continuous integration tool TeamCity is used in my project to generate builds automatically; I changed the TeamCity build steps to run this script before the daily build starts.
With this UTF-8 + BOM + script method, I decided not to edit my source code on Linux: to do so, I would have to make sure the code builds before I commit, which means running the BOM-removal script first, which in turn makes SVN report EVERY file as modified (BOM removed) and makes it very easy to commit a wrong file by mistake. To solve this, I wrote another shell script that adds the BOM back to the source files. I still don't edit my code on Linux very often, but when I really need to, I no longer face a terribly long change list in the commit dialog.
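For reference, the BOM-removal step can be a one-liner (a sketch, assuming GNU sed; adapt the file list to your project):

# Strip a leading UTF-8 BOM (bytes EF BB BF) from each file in place.
sed -i '1s/^\xEF\xBB\xBF//' *.cpp *.h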
Encoding Blues
You cannot use UTF-16 for source code files, because the header you are including, <iostream>, is not UTF-16-encoded. As #include includes the files verbatim, this means that you suddenly have a UTF-16-encoded file with a large chunk (approximately 4k, apparently) of invalid data.
There is almost no good reason to ever use UTF-16 for anything, so this is just as well.
Edit: Regarding problems with encoding support: The OSes themselves are not responsible for providing encoding support, this comes down to the compilers used.
g++ on Windows supports absolutely all of the same encodings as g++ on Linux, because it's the same program, unless whatever version of g++ you are using on Windows relies on a deeply broken iconv library.
Inspect your toolchain and ensure that all your tools are in working order.
As an alternative; don't use Chinese in the source files, but write them in English, using English-language literals, or simple TOKEN_STYLE_PLACEHOLDERs, using l10n and i18n to replace these in the running executable.
Threedit: -finput-charset is almost certainly a holdover from the days of codepages and other nonsense of the kind; an ISO-8859-n file will almost always be compatible with the UTF-8 standard headers. However, see the reedit below.
Reedit: For next time; remember a simple mantra: "N'DUUH!"; "Never Don't Use UTF-8!"
I18N
A common solution to this kind of problem is to remove the problem entirely, by way of, for instance, gettext.
When using gettext, you usually end up with a function loc(char *) that abstracts away most of the translation tool specific code. So, instead of
#include <iostream>
int main () {
std::cout << "瓜田李下" << std::endl;
}
you would have
#include <iostream>
#include "translation.h"
int main () {
std::cout << loc("DEEPER_MEANING") << std::endl;
}
and, in zh.po:
msgid "DEEPER_MEANING"
msgstr "瓜田李下"
Of course, you could also then have a en.po:
msgid "DEEPER_MEANING"
msgstr "Still waters run deep"
This can be expanded upon, and the gettext package has tools for expansion of strings with variables and such, or you could use printf, to account for different grammars.
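For instance, the format string itself can be one of the translated messages, so each language controls its own word order (a sketch reusing the illustrative loc() helper and translation.h from above):

#include <cstdio>
#include "translation.h"

int main () {
    // en.po might contain: msgid "BALANCE_FMT" / msgstr "You have %d coins\n"
    // A translation can rephrase around the %d as its grammar requires,
    // because the whole format string goes through loc().
    std::printf(loc("BALANCE_FMT"), 42);
}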
The Third Option
Instead of having to deal with multiple compilers with different requirements for file encodings, line endings, byte order marks, and other problems of the kind, it is possible to cross-compile using MinGW or similar tools.
This option requires some setup, but may very well reduce future overhead and headaches.
The error message says the problem is in the include files, so I presume what happens is that the include files are normal UTF-8, but the compiler wants to treat them as UTF-16 because of the compiler switch.
So I'm afraid the solution is to always convert the source to UTF-8 first; perhaps in the makefile. Or to find a solution that doesn't contain include files in other encodings...
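A sketch of that conversion step as a makefile fragment (assuming GNU make, and reusing the iconv invocation from the question; file names are illustrative):

main_utf8.cpp: main.cpp
	iconv --from-code=UTF-16BE --to-code=UTF-8 --output=$@ $<

a.out: main_utf8.cpp
	g++ -o $@ $<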
Edit:
Maybe a GB encoding would work, if and only if none of the system source files contain any non-ASCII characters. Then you could tell the compiler they were GB encoded without problem.
This does not work because the compiler will also try to read the header files as UTF-16, which they are not.
UTF-16 is not an encoding for bytes. It's an encoding where your basic storage unit is 16 bits large.
When you want to store UTF-16 in a byte sequence you have to choose between UTF-16BE and UTF-16LE.