I'm trying to write Cyrillic characters into a spreadsheet that's being generated in an OSX app:
sheet = xlBookAddSheet(book, "Output", 0);
detailFormat = xlBookAddFormat(book, 0);
wchar_t *w_user = (wchar_t*)malloc(max_chars * sizeof(wchar_t));
//get wchar_t characters...
xlSheetWriteStr(sheet, row, column, w_user, detailFormat);
but the last line won't compile as the xlSheetWriteStr is expecting const char* in place of w_user.
According to the docs I should be able to pass in const wchar_t*. Is there any way of writing my international characters to a cell?
LibXL provides headers for the ANSI and UNICODE versions of the functions.
Programs that want to use the Unicode version of an API should define the UNICODE macro. If you are using Visual Studio, this is as easy as going to Project -> Properties -> Character Set and setting it to Use Unicode Character Set.
What you see as xlSheetWriteStr is actually a macro that expands to either xlSheetWriteStrA or xlSheetWriteStrW: the former if UNICODE is not defined, the latter if it is.
The function xlSheetWriteStrA is declared in SheetA.h, whereas xlSheetWriteStrW is declared in SheetW.h.
Note: It is better if you are consistent. If you use the unicode version of a function, use the unicode version of all functions. I say this because in your example I see you are using the ansi version of xlBookAddSheet.
Update: For the OSX version of the library the macro is _UNICODE, with the preceding underscore (See libxl.h for details).
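The macro dispatch described above can be sketched roughly as follows. The names demoWriteStrA, demoWriteStrW, demoWriteStr and DEMO_TEXT are hypothetical stand-ins, not LibXL's real declarations; only the _UNICODE switch mirrors what the answer describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical narrow ("A") variant: accepts const char*.
inline std::size_t demoWriteStrA(const char* text) {
    assert(text != nullptr);
    return std::strlen(text);   // pretend we wrote this many chars
}

// Hypothetical wide ("W") variant: accepts const wchar_t*.
inline std::size_t demoWriteStrW(const wchar_t* text) {
    assert(text != nullptr);
    return std::wcslen(text);   // pretend we wrote this many wide chars
}

// The generic name is only a macro; the preprocessor picks one variant.
#ifdef _UNICODE
  #define demoWriteStr demoWriteStrW
  #define DEMO_TEXT(s) L##s      // wide literal
#else
  #define demoWriteStr demoWriteStrA
  #define DEMO_TEXT(s) s         // narrow literal
#endif
```

With this layout, building with -D_UNICODE (or defining the macro before the header is included) makes demoWriteStr accept const wchar_t*, which is the situation the question needs for Cyrillic text.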
A simple problem: I'm writing a chatroom program in C++ (but it's primarily C-style) for a class, and I'm trying to print, “#help — display a list of commands...” to the output window. While I could use two hyphens (--) to achieve roughly the same effect, I'd rather use an em-dash (—). printf(), however, doesn't seem to support printing em-dashes. Instead, the console just prints out the character, ù, in its place, despite the fact that entering em-dashes directly into the prompt works fine.
How do I get this simple Unicode character to show up?
Looking at Windows alt key codes, I find it interesting how alt+0151 is "—" and alt+151 is "ù". Is this related to my problem, or a simple coincidence?
Windows is a Unicode (UTF-16) system, and so is the console. If you want to print Unicode text, the most effective way is to use WriteConsoleW:
BOOL PrintString(PCWSTR psz)
{
DWORD n;
return WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), psz, (ULONG)wcslen(psz), &n, 0);
}
PrintString(L"—");
In this case your binary file contains the wide character — (2 bytes, 0x2014), and the console prints it as is.
If an ANSI (multi-byte) function is called for console output, such as WriteConsoleA or WriteFile, the console first translates the multi-byte string to Unicode via MultiByteToWideChar, using the code page returned by GetConsoleOutputCP. This translation is where problems can arise if you use characters outside the 7-bit ASCII range (0x80 and above).
First of all, the compiler can give you a warning: The file contains a character that cannot be represented in the current code page (number). Save the file in Unicode format to prevent data loss. (C4819). But even after you save the source file in Unicode format, you can get the following:
wprintf(L"ù"); // no warning
printf("ù"); //warning C4566
Because L"ù" is saved in the binary as a wide-character string (as is), everything is fine there and no warning is issued. But "ù" is saved as a char string (a single-byte string). The compiler needs to convert the wide string "ù" from the source file to a multi-byte string in the binary (the .obj file, from which the linker then creates the PE). For this the compiler uses WideCharToMultiByte with CP_ACP (the current system default Windows ANSI code page).
So what happens if you call, say, printf("ù");?
1. At compile time, the Unicode string "ù" is converted to multi-byte with WideCharToMultiByte(CP_ACP, ...), and the resulting multi-byte string is saved in the binary file.
2. At run time, the console converts your multi-byte string to wide characters with MultiByteToWideChar(GetConsoleOutputCP(), ...) and prints that string.
So you get 2 conversions: unicode -> CP_ACP -> multi-byte -> GetConsoleOutputCP() -> unicode
By default GetConsoleOutputCP() == CP_OEMCP != CP_ACP, even if you run the program on the computer where you compiled it (and especially on another computer with a different CP_OEMCP).
The problem is the incompatible conversions: different code pages are used. And even if you change the console code page to your CP_ACP, the conversion can still translate some characters wrongly.
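The ù from the question can be reproduced on paper with a minimal sketch of this double conversion. It assumes the ANSI code page is Windows-1252 and the console code page is OEM 437 (a typical US-English setup, not something stated in the question); the helper names and the tiny one-entry tables are hypothetical and cover only the em dash.

```cpp
#include <cassert>
#include <cstdint>

// Compile time: wide -> narrow using the ANSI code page (assumed Windows-1252).
// Only the character needed for the demonstration is mapped.
inline std::uint8_t toCp1252(char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) return static_cast<std::uint8_t>(cp);  // ASCII is shared
    if (cp == 0x2014) return 0x97;                        // EM DASH
    return '?';                                           // not in this tiny table
}

// Run time: narrow -> wide using the console code page (assumed OEM 437).
inline char32_t fromCp437(std::uint8_t byte) {
    if (byte < 0x80) return byte;                         // ASCII is shared
    if (byte == 0x97) return 0x00F9;                      // LATIN SMALL LETTER U WITH GRAVE
    return 0xFFFD;                                        // not in this tiny table
}

// What ends up on screen after both conversions.
inline char32_t printedCharacter(char32_t source) {
    return fromCp437(toCp1252(source));
}
```

Byte 0x97 (decimal 151) is the em dash in Windows-1252 but ù in code page 437, which also matches the alt+0151 versus alt+151 observation in the question.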
As for the CRT API wprintf, the situation is as follows:
wprintf first converts the given string from Unicode to multi-byte using its internal current locale (note that the CRT locale is independent of, and different from, the console locale), and then calls WriteFile with the multi-byte string. The console converts this multi-byte string back to Unicode:
unicode -> current_crt_locale -> multi-byte -> GetConsoleOutputCP() -> unicode
So to use wprintf we first need to set the current CRT locale to GetConsoleOutputCP():
char sz[16];
sprintf(sz, ".%u", GetConsoleOutputCP());
setlocale(LC_ALL, sz);
wprintf(L"—");
Even so, on my machine I see - on screen instead of —, and I get -— if I call PrintString(L"—") (which uses WriteConsoleW) right after this.
So the only reliable way to print any Unicode character (supported by Windows) is to use the WriteConsoleW API.
After going through the comments, I've found eryksun's solution to be the simplest (...and the most comprehensible):
#include <stdio.h>
#include <io.h>
#include <fcntl.h>
int main()
{
//other stuff
_setmode(_fileno(stdout), _O_U16TEXT);
wprintf(L"#help — display a list of commands...");
}
Portability isn't a concern of mine, and this solves my initial problem—no more ù—my beloved em-dash is on display.
I acknowledge this question is essentially a duplicate of the one linked by sata300.de. Albeit, with printf in the place of cout, and unnecessary ramblings in the place of relevant information.
I'm just getting back into programming with C++, MFC and Unicode. A lot has changed over the past 20 years.
Code from another project compiled just fine, but had errors when I pasted it into my code. It took me 1-1/2 days of wasted time to solve the function call below:
CString CFileOperation::ChangeFileName(CString sFileName)
{
char drive[MAX_PATH], dir[MAX_PATH], name[MAX_PATH], ext[MAX_PATH];
_splitpath_s(sFileName, drive, dir, name, ext); //error
------- other code
}
After reading help, I changed the CString sFileName to use a cast:
_splitpath_s((LPCTSTR)sFileName, drive, dir, name, ext); //error
This created an error too. So then I used GetBuffer() which is really the same as above.
char* s = sFileName.GetBuffer(300);
_splitpath_s(s, drive, dir, name, ext); //same error for the 3rd time
sFileName.ReleaseBuffer();
At this point I was pretty upset, but finally realized that I needed to change the CString to Ascii (I think because I'm set up as Unicode).
hence;
CT2A strAscii(sFileName); //convert CString to ascii, for splitpath()
then use strAscii.m_psz in the function _splitpath_s()
This finally worked. So after all this, to make a story short, I need help focusing on:
1. Unicode vs Multi-Byte (library calls)
2. Variables to use
I'm willing to purchase another book, please recommend.
Also, is there a way to filter my help in VS2015 so that when I'm on a variable and press F1, it only gives me help for Unicode and ways to convert old code to Unicode or convert Multi-Byte to Unicode?
Hope this is not too confusing, but I have some catching up to do. Be patient if my verbiage is not perfect.
Thanks in advance.
The documentation of _splitpath lists a Unicode (wchar_t based) version _wsplitpath. That's the one you should be using. Don't convert to ASCII or Windows ANSI, that will in general lose information and not produce a valid path when you recombine the pieces.
Modern Windows programming is Unicode based.
A Visual Studio C++ project is Unicode-based by default, in particular it defines the macro symbol UNICODE, which affects the declarations from <windows.h>.
All supported versions of Windows use Unicode internally throughout, and your application should, too. Windows uses UTF-16 encoding.
To make your application Unicode-enabled you need to perform the following steps:
Set up your project's Character Set to "Use Unicode Character Set" (if it's currently set to "Use Multi-Byte Character Set"). This is not strictly required, but it deals with those cases, where you aren't using the Unicode version explicitly.
Use wchar_t (in place of char or TCHAR) for your strings.
Use wide character string literals (L"..." in place of "...").
Use CStringW (in place of CStringA or CString) in an MFC project.
Explicitly call the Unicode version of the CRT (e.g. wcslen in place of strlen or _tcslen).
Explicitly call the Unicode version of any Windows API call where it exists (e.g. CreateWindowExW in place of CreateWindowExA or CreateWindowEx).
Try using _tsplitpath_s and TCHAR.
So the final code looks something like:
CString CFileOperation::ChangeFileName(CString sFileName)
{
TCHAR drive[MAX_PATH], dir[MAX_PATH], name[MAX_PATH], ext[MAX_PATH];
_tsplitpath_s(sFileName, drive, dir, name, ext);
// ... other code
}
This enables the C++ compiler to use the correct character width at build time, depending on the project settings.
I am trying to use Lithuanian in my C++ application, but every attempt has been unsuccessful.
The multi-byte character set is used. I have tried everything I could think of; I am new to C++ and have never tried to do anything in Lithuanian before.
I tried every variation: setlocale(LC_ALL, "en_US.utf8"); setlocale(LC_ALL, "Lithuanian");...
I researched for 2 hours and didn't find any proper examples or a solution.
I have an average-sized project which needs Lithuanian translations from a database, and it can't handle most of "ĄČĘĖĮŠŲŪąčęėįšųū".
Compiler - "Visual studio 2013"
Database - sqlite3.
I can't even get simple strings (defined by myself) to work and be output as Lithuanian in a Win32 application.
In Windows use wide character strings (UTF-16 encoding¹, wchar_t type) for internal text handling, and preferably UTF-8 for external text files and networking.
Note that Visual C++ will translate narrow text literals from the source encoding to Windows ANSI, which is a platform-dependent usually single-byte encoding (you can check which one via the GetACP API function), i.e., Visual C++ has the platform-specific Windows ANSI as its narrow C++ execution character set.
But also do note that for an app restricted to non-Windows platforms, i.e. Unix-land, it makes practical sense to do everything in UTF-8, based on char type.
For the database communication you may need to translate to and from the program's internal text representation.
This depends on what the database interface requires, which is not stated.
Example for console output in Windows:
#include <iostream>
#include <fcntl.h>
#include <io.h>
auto main() -> int
{
_setmode( _fileno( stdout ), _O_WTEXT );
using namespace std;
wcout << L"ĄČĘĖĮŠŲŪąčęėįšųū" << endl;
}
To make this compile by default with g++, the source code encoding needs to be UTF-8. Then, to make it produce correct results with Visual C++ the source code encoding needs to be UTF-8 with BOM, which happily is also accepted by modern versions of g++. For otherwise the Visual C++ compiler will assume the Windows ANSI encoding and produce an incorrect UTF-16 string.
Not coincidentally this is the default meaning of UTF-8 in Windows, e.g. in the Notepad editor, namely UTF-8 with BOM.
But note that while in Windows the problem is that the main system compiler requires a BOM for UTF-8, in Unix-land the problem is the opposite, that many old tools can't handle the BOM (for example, even MinGW g++ 4.9.1 isn't yet entirely up to speed: it sometimes includes the BOM bytes, then incorrectly interpreted, in error messages).
1) On other platforms wide character text can be encoded in other ways, e.g. with UTF-32. In fact the Windows convention is in direct conflict with the C and C++ standards which require that a single wchar_t should be able to encode any character in the extended character set. However, this requirement was, AFAIK, imposed after Windows adopted UTF-16, so the fault probably lies with the politics of the C and C++ standardization process, not yet another Microsoft'ism.
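The footnote's point, that a 16-bit wchar_t cannot hold every character in a single unit, can be illustrated with a small sketch. The helper toSurrogates and the constant kWcharHoldsAnyCodePoint are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <utility>

// Splits a code point above U+FFFF into a UTF-16 surrogate pair (high, low).
inline std::pair<char16_t, char16_t> toSurrogates(char32_t cp) {
    assert(cp > 0xFFFF && cp <= 0x10FFFF);
    const std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;  // 20 bits remain
    const char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));   // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF)); // bottom 10 bits
    return {high, low};
}

// True when a single wchar_t is wide enough for any code point (21 bits).
// Expected to be false with Visual C++ (16-bit wchar_t) and true with
// typical Linux toolchains (32-bit wchar_t).
constexpr bool kWcharHoldsAnyCodePoint = sizeof(wchar_t) * CHAR_BIT >= 21;
```

For example, U+1F600 needs the two units 0xD83D and 0xDE00 in UTF-16, so wcslen reports 2 for that single character on Windows.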
Complexity of internationalisation
There are several related but distinct topics that can cause mismatches between them, making a trial-and-error approach very tedious:
type used for storing strings and chars: Windows uses wchar_t by default, but for most APIs there are also char-based equivalent functions.
character set encoding: this defines how the chars stored in the type are to be understood, for example Unicode (UTF-8, UTF-16, UTF-32), 7-bit ASCII or 8-bit ANSI. In Windows, the default is UTF-16 for wchar_t and the ANSI/Windows code page for char.
locale: defines, among other things, the character set assumptions made when processing strings. This permits the use of language-independent functions like isalpha(i, loc), islower(i, loc), ispunct(i, loc) to find out if a given character is alphabetic, a lower-case letter, or punctuation, for example to break down a user text into words. C++ offers portable functions here.
output code page or font: used to show a character to the user. This assumes that the font displays characters using the same character set as the code internals.
source code encoding: for example, your editor could assume an ANSI encoding with the Windows-1252 character set.
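As a rough illustration of the locale item in the list above, here is a minimal sketch that breaks text into words with the two-argument std::isalpha from <locale>. The function names splitWords and splitWordsSelfCheck are hypothetical, and the self-check uses only the classic "C" locale, which classifies ASCII letters; handling Lithuanian letters would need a suitable named locale to be installed, which is not assumed here.

```cpp
#include <cassert>
#include <locale>
#include <string>
#include <vector>

// Splits text into words, treating every run of characters that the given
// locale classifies as alphabetic as one word.
inline std::vector<std::wstring> splitWords(const std::wstring& text,
                                            const std::locale& loc) {
    std::vector<std::wstring> words;
    std::wstring current;
    for (wchar_t ch : text) {
        if (std::isalpha(ch, loc)) {
            current += ch;                 // still inside a word
        } else if (!current.empty()) {
            words.push_back(current);      // a non-letter ends the word
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Small self-check with the classic locale.
inline void splitWordsSelfCheck() {
    const auto words = splitWords(L"Hello, wide world!", std::locale::classic());
    assert(words.size() == 3);
    assert(words[0] == L"Hello");
    assert(words[2] == L"world");
}
```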
Most typical errors
Problem no. 1 is Win32 console output, as Unicode is not well supported by the console. But this is not your problem here.
Another cause of mismatch is the encoding of your text editor. It might not be Unicode but a Windows code page. In this case, you type "Č" and the editor displays it as such, but the editor might use the Windows-1257 encoding for Lithuanian and store 0xC8 in the file. If you then display this literal with a Windows Unicode function, it will interpret 0xC8 as "Latin E with grave accent" and print something else, as the right Unicode code point for "Č" is 0x010C!
It can be even worse: the compiler may have its own assumptions about the character set encoding used and convert your literals into Unicode using false assumptions (it happened to me when I used some exotic code generation switch).
What to do?
To figure out what goes wrong, proceed by elimination:
First, for plain Windows, use the native Unicode setting. OK, it's UTF-16 and wchar_t instead of UTF-8 and thus comes with some drawbacks, but it's native and well supported.
Then use explicit Unicode escapes in literals, for example TEXT("\u010C") instead of TEXT("Č"). This avoids editor and compiler mismatches.
If it's still not the right character, make sure that your font FULLY supports Unicode. The default system font, for instance, doesn't, while most others do. You can easily check with the Windows font panel (WindowKey+R, type fonts, then click on "search char") to display the character table of your font.
Set fonts explicitly in your code.
For example, a very tiny experiment:
...
case WM_PAINT:
{
hdc = BeginPaint(hWnd, &ps);
auto hf = CreateFont(24, 0, 0, 0, 0, TRUE, 0, 0, 0, 0, 0, 0, 0, L"Times New Roman");
auto hfOld = SelectObject(hdc, hf); // if you comment this out, € and Č won't display
TextOut(hdc, 50, 50, L"Test with éç € \u010C special chars", 30);
SelectObject(hdc, hfOld);
DeleteObject(hf);
EndPaint(hWnd, &ps);
break;
}
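The advice about explicit escapes in literals can be checked with a short sketch. The names kCWithCaron, decodeByteC8 and escapeSelfCheck are hypothetical; the two code page values are the ones discussed in this answer (Windows-1257 and a Latin-1 style interpretation such as Windows-1252).

```cpp
#include <cassert>

// A \u escape names the code point directly, so the stored value does not
// depend on how the editor saved the file or which code page the compiler assumes.
constexpr wchar_t kCWithCaron = L'\u010C';
static_assert(kCWithCaron == 0x010C, "escape yields U+010C");

// The single byte 0xC8 is ambiguous: it only becomes a character once a
// code page is chosen (Windows-1257 -> U+010C, Windows-1252 -> U+00C8).
inline char32_t decodeByteC8(bool useWindows1257) {
    return useWindows1257 ? 0x010C : 0x00C8;
}

// Small self-check: the escaped literal is exactly one wide character.
inline void escapeSelfCheck() {
    const wchar_t text[] = L"\u010C";
    assert(text[0] == 0x010C);
    assert(text[1] == L'\0');
}
```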
I have been struggling to print Copyright symbol in Windows using Visual Studio. I understand that 0xA9 is the ASCII code for Copyright symbol and it works on non-windows platform. But on Windows I can't print the Copyright symbol using the same code.
#include "iostream.h"
using namespace std;
int main(int argc, char * argv[])
{
cout << (char)0xA9 << " Copyright symbol" << endl;
return 0;
}
Output on Linux/HP-UX and AIX: © Copyright symbol
Output on Windows: ⌐ Copyright symbol
I am new to Windows, can someone help me out.
As Basile points out, the copyright symbol (©) is not an ASCII character. In other words, it is not one of the characters included in the 7-bit ASCII character set.
You need to switch to a Unicode encoding in order to use "special" characters like this that extend beyond the range of 7-bit ASCII. That's not difficult in Windows, it just requires that you use wide characters (wchar_t) instead of narrow characters (char). Unlike most Unix-based systems that implement Unicode support using UTF-8 (which uses the regular char data type), Windows does not have built-in support for UTF-8. It uses UTF-16 instead, which requires that you use the larger wchar_t type.
Conveniently, the C++ standard library also supports wide character strings; you just need to use the appropriate versions of the classes. The ones you want have a w prefixed to their names. So, rewritten to use wide (Unicode) characters on Windows, your code would look like this:
#include <iostream> // (standard C++ headers should be in angle brackets)
int main(int argc, char * argv[])
{
std::wcout << (wchar_t)0xA9 << " Copyright symbol" << std::endl;
return 0;
}
The reason you're getting that strange ⌐ character when you try the original code on Windows is that that character is what the value 0xA9 maps to in your default Windows character set. You see, the char type supports 8-bit values, but I said above that the ASCII character set only defines 7 bits worth of characters. That extra bit is used on Windows to define some additional useful characters.
There are two different sets of extended narrow (non-Unicode) character sets, one is called the OEM character set and the other is (for historical reasons) called the ANSI character set. Generally, the Command Prompt uses the OEM character set, which fills most of the upper range with characters for drawing lines, boxes, and other simulated graphics in a text-based environment. Legacy, non-Unicode Windows applications generally use the ANSI character set, which is specific to your localized version of Windows and fills the upper range with characters needed to display all of the letters/symbols in your language.
If it sounds complicated, that's because it is. That's why everyone has forgotten all of this stuff and uses exclusively Unicode on Windows. I strongly recommend that path to you as well. :-)
Edit: Nuts, I forgot it was more complicated than this. Changing your code to output wide characters may not be sufficient. The Windows Command Prompt is "backwards-compatible" (that is, broken) in all sorts of ways, severely hobbling its support for Unicode characters.
By default, it uses raster fonts which probably don't even provide symbols for most of the Unicode characters (the copyright symbol is likely common enough to be an exception). You need to change the font used by the Command Prompt to something else like Lucida Console or Consolas to ensure that it works correctly. Fortunately, you can set the defaults for all Command Prompt windows. Unfortunately, this is a per-user setting.
Additionally, the Command Prompt still uses the active code page, so all of that stuff I was explaining above is still relevant and you can't forget about it. You can change the particular code page that it uses with the chcp xxxx command, where xxxx is the number of the code page you wish to use. Unfortunately, this applies only to the current console session and must be reset each time. Not a good solution for an application program that needs to output Unicode characters.
More information on these problems and how to output Unicode strings on the Command Prompt is available in the answers to these questions:
What encoding/code page is cmd.exe using?
Can command prompt display unicode characters?
Notice that 0xa9 is not ASCII (which has 7-bit characters, limited to the 0 - 0x7f range). It could be ISO/IEC 8859-1. Many current systems (including most Linux terminals today) use UTF-8, in which the copyright glyph is encoded by two bytes, so you would code "\302\251" or "\xc2\xa9" in your C or C++ source. So your program doesn't display a copyright sign in my Linux xfce4-terminal, which uses UTF-8.
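To see where the two bytes come from, here is a minimal sketch of UTF-8 encoding for a single code point. The function name encodeUtf8 is hypothetical; it is an illustration of the standard bit layout, not a replacement for a real conversion library.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point as a UTF-8 byte sequence.
inline std::string encodeUtf8(char32_t cp) {
    // Reject values beyond Unicode and the UTF-16 surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                                   // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                           // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                         // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                           // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+00A9 this yields the bytes 0xC2 0xA9, the same pair written as "\302\251" in octal above.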
Some Windows machines had different encoding systems.
I would set up your system (be it Linux or Windows) to use the UTF-8 character encoding, if possible, on its terminal (or use UTF-16 wide chars). Read about UTF-8 everywhere.
A conventional ASCII rendering of the copyright sign is very commonly (C), precisely because the ASCII encoding does not have any copyright glyph.
Taken and adapted from here:
#if defined(WIN32)
#include <windows.h>
#endif
#include <stdio.h>
void print_copyright_hint() {
printf("Copyright ");
#if defined(WIN32)
auto copyright = const_cast<wchar_t *>(L"©");
auto handle = GetStdHandle(STD_OUTPUT_HANDLE);
WriteConsoleW(handle, copyright, static_cast<DWORD>(wcslen(copyright)), nullptr, nullptr);
#else
printf("©");
#endif
printf(" my Company");
}
You can use alt+0169.
Forgive me if I am wrong.
I'm using Visual Studio 6 to debug a C++ application. The application is compiled for unicode string support. The CString type is used for manipulating strings. When I am using the debugger, the watch window will display the first character of the string, but will not display the full string. I tried using XDebug, but this tool does not handle unicode strings properly. As a work around, I can create a custom watch for each character of the string by indexing into the private array the CString maintains, but this is very tedious.
How can I view the full, unicode value of a CString in the VC6 debugger?
Go to tools->options->Debug, and check the "Display unicode string" check-box. That would probably fix the problem. Two other options:
In the watch window, if you have a Unicode string variable named szText, add it to the watch as szText,su. This will tell VS to interpret it as a Unicode string (See Symbols for Watch Variables for more of this sort).
If worst comes to worst, you can have a global ANSI string buffer and a global function that takes a Unicode CString and stores its content as ANSI in that global variable. Then, when needed, call that function with the string whose content you'd like to see in the watch window, and watch the ANSI buffer.
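The last option could look roughly like the sketch below. It is simplified on purpose: the names g_debugAnsi and DebugToAnsi are hypothetical, the function takes a plain const wchar_t* rather than a CString, and anything outside 7-bit ASCII is replaced with '?' instead of being converted through a real code page.

```cpp
#include <cassert>
#include <cstddef>

// Global narrow buffer to add to the watch window (hypothetical name).
char g_debugAnsi[256];

// Copies a wide string into g_debugAnsi, replacing every character outside
// 7-bit ASCII with '?'. Output is truncated to fit the buffer.
inline const char* DebugToAnsi(const wchar_t* wide) {
    assert(wide != 0);
    std::size_t i = 0;
    for (; wide[i] != L'\0' && i + 1 < sizeof(g_debugAnsi); ++i) {
        const unsigned long code = static_cast<unsigned long>(wide[i]);
        g_debugAnsi[i] = (code < 0x80) ? static_cast<char>(code) : '?';
    }
    g_debugAnsi[i] = '\0';   // always terminate
    return g_debugAnsi;
}
```

Calling DebugToAnsi with the string of interest and then watching g_debugAnsi gives a readable approximation of the string's content.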
But the "Display unicode string" thing is probably the problem...