Why does vsnwprintf not translate normal strings to wide strings? - c++

The following code does not produce the expected output. Why?
wchar_t* wchar_t_printf_return(wchar_t* formatstring, ...){
va_list argp;
va_start(argp, formatstring);
int templen = 256;
templen = vsnwprintf(NULL, 0, formatstring, argp)+3;
wchar_t *buffer = (wchar_t *) malloc ((templen+1)*sizeof(wchar_t));
memset(buffer, 0, (templen+1)*sizeof(*buffer));
int retval;
while ((retval = vsnwprintf(buffer, templen, formatstring, argp)) == -1 || (retval >= (templen-1))){
templen = templen << 1;
buffer = (wchar_t *) realloc (buffer, (templen+1)*sizeof(wchar_t));
va_end(argp);
va_start(argp, formatstring);
}
va_end(argp);
buffer[templen] = L'\0';
return buffer;
}
int main(){
int i;
char *id = "2923BE84E16CD6AE529049F1F1BBE9EB";
wchar_t *val = wchar_t_printf_return(L"'%s'", id);
printf("%ls\n", val);
}
EDIT: To be more specific, the printf in main should wrap the id in single quotes, printing: '2923BE84E16CD6AE529049F1F1BBE9EB'. The main function exists only to illustrate the bug in the function. The function is meant to be an alternative to the printf family that returns the result in a newly allocated buffer instead of writing into an existing one. This is built in Cygwin with gcc-3 and the -mno-cygwin option (i.e. MinGW). Sorry for the confusion!

The meaning of %s depends on whether you are using a printf or a wprintf family function. In the Microsoft C runtime (which MinGW uses), %s in a wprintf family function means a wide string, but you are passing a narrow string. You need %hs to say "this is a narrow string."
(You seem to be aware of this, because you use %ls to print a wide string with a printf family function, but you forgot about it when going the other way.)

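Try using %S instead of %s. In the Microsoft wide-character functions, %S means a string of the opposite width, so it reads your char* argument and converts it.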
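A minimal portable sketch of the same idea, assuming a C++11 compiler with a standard std::vswprintf. It returns a std::wstring instead of a malloc'd buffer. It also restarts the va_list on every attempt: the function in the question passes argp to the sizing call and then reuses it in the loop without restarting it, which is undefined behaviour. The sketch uses %ls for the argument, because %ls means a wide string in both the Microsoft CRT and C99/glibc, while %hs and the Microsoft meaning of %S are not portable. The name wformat and the 1 Mi-character cap are hypothetical choices.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: printf-style formatting that returns a new std::wstring.
std::wstring wformat(const wchar_t *fmt, ...)
{
    std::wstring out(64, L'\0');
    const std::size_t maxChars = 1u << 20;  // arbitrary cap so an encoding error cannot loop forever
    while (out.size() <= maxChars) {
        va_list args;
        va_start(args, fmt);  // restart the argument list for every attempt
        int n = std::vswprintf(&out[0], out.size(), fmt, args);
        va_end(args);
        if (n >= 0 && static_cast<std::size_t>(n) < out.size()) {
            out.resize(static_cast<std::size_t>(n));  // drop the unused tail
            return out;
        }
        // vswprintf returns a negative value when the buffer is too small,
        // so grow the buffer and try again
        out.resize(out.size() * 2);
    }
    throw std::runtime_error("wformat: output too long or formatting failed");
}
```

For the question's case, pass a wide argument, as in wformat(L"'%ls'", L"2923BE84E16CD6AE529049F1F1BBE9EB"). A narrow id would first have to be widened (for example with mbstowcs).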

Related

Can "const char[18]" be changed to an entity of type LPCWSTR (C++)? [duplicate]

After getting a struct from C# to C++ using C++/CLI:
public value struct SampleObject
{
LPWSTR a;
};
I want to print its instance:
printf(sampleObject->a);
but I got this error:
Error 1 error C2664: 'printf' : cannot convert parameter 1 from
'LPWSTR' to 'const char *'
How can I convert from LPWSTR to char*?
Thanks in advance.
Use the wcstombs() function, which is declared in <stdlib.h>. Here's how to use it:
LPCWSTR wideStr = L"Some message";
char buffer[500];
// First arg is the destination char buffer, second arg is the source
// wchar_t string, last arg is the size of the destination buffer in bytes
wcstombs(buffer, wideStr, sizeof buffer);
// wcstombs does not null-terminate if the result fills the whole buffer
buffer[sizeof buffer - 1] = '\0';
printf("%s", buffer);
Hope this helped someone! This function saved me from a lot of frustration.
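The fixed 500-byte buffer can be avoided with the usual two-pass pattern. Here is a sketch assuming C++11, where narrowString is a hypothetical name. The conversion uses the current C locale, so call std::setlocale first if the text is not plain ASCII.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Convert a null-terminated wide string to a multibyte std::string
// using the current C locale.
std::string narrowString(const wchar_t *wide)
{
    // First pass: a null destination makes wcstombs report the byte count
    // (excluding the terminator) without writing anything.
    std::size_t len = std::wcstombs(nullptr, wide, 0);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("wide string has no representation in this locale");
    std::string out(len, '\0');
    // Second pass: exactly len bytes are allowed, so wcstombs stops before the
    // terminator; std::string keeps its own terminator.
    if (len > 0)
        std::wcstombs(&out[0], wide, len);
    return out;
}
```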
Just use printf("%ls", sampleObject->a). The l in %ls means that you can pass a wide string (wchar_t*), such as L"Wide String".
(No, I don't know why the L and w prefixes are mixed all the time)
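The same %ls conversion also works with snprintf if you need the narrow text in a variable rather than on stdout. This sketch assumes a C++11 library that provides std::snprintf (MSVC only has it from Visual Studio 2015 on). The helper name narrowViaLs is hypothetical, and the conversion follows the current C locale.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: let snprintf do the wide-to-multibyte conversion via %ls.
std::string narrowViaLs(const wchar_t *wide)
{
    int n = std::snprintf(nullptr, 0, "%ls", wide);  // bytes needed, or negative on failure
    if (n < 0)
        return std::string();  // some character could not be converted
    std::string out(static_cast<std::size_t>(n) + 1, '\0');
    std::snprintf(&out[0], out.size(), "%ls", wide);
    out.resize(static_cast<std::size_t>(n));  // drop the terminator slot
    return out;
}
```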
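// CP_ACP converts to the ANSI code page; use CP_UTF8 for UTF-8 output
int length = WideCharToMultiByte(CP_ACP, 0, sampleObject->a, -1, NULL, 0, NULL, NULL);
char* output = new char[length];
WideCharToMultiByte(CP_ACP, 0, sampleObject->a, -1, output, length, NULL, NULL);
printf("%s", output); // never pass data as the format string
delete[] output;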
Use the WideCharToMultiByte() function to convert a wide-character string to a multibyte string.
Here is an example of converting from LPWSTR to char* (wide characters to narrow characters):
/* LPWSTR to char* example.c */
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <windows.h>
void LPWSTR_2_CHAR(LPWSTR, LPSTR, size_t);
int main(void)
{
    wchar_t w_char_str[] = L"This is wide character string test!";
    size_t w_len = wcslen(w_char_str);
    char char_str[w_len + 1];              /* C99 VLA: enough for ASCII text plus the terminator */
    memset(char_str, '\0', sizeof char_str);
    LPWSTR_2_CHAR(w_char_str, char_str, sizeof char_str);
    puts(char_str);
    return 0;
}
/* out_size is the size of out_char in bytes, including room for the terminator */
void LPWSTR_2_CHAR(LPWSTR in_char, LPSTR out_char, size_t out_size)
{
    WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, in_char, -1, out_char, (int)out_size, NULL, NULL);
}
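Here is a simple solution: wsprintfA with the %S specifier.
LPCWSTR wideStr = L"some text";
char* resultStr = new char [wcslen(wideStr) + 1];
wsprintfA(resultStr, "%S", wideStr);
// ... use resultStr ...
delete[] resultStr;
In wsprintfA, %S means a wide (Unicode) string, so it is converted to ANSI. Note that wsprintfA writes at most 1024 bytes, and the buffer size above assumes one byte per character.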
Don't convert.
Use wprintf instead of printf; the wprintf documentation has examples that explain how to use it.
Alternatively, you can use std::wcout as:
const wchar_t *wstr1 = L"string";
LPCWSTR wstr2 = L"string"; //same type as above
std::wcout << wstr1 << L", " << wstr2;
Similarly, use functions designed for wide characters and forget the idea of converting wchar_t to char, as it may lose data.
Have a look at the functions which deal with wide-char here:
Unicode in Visual C++

What causes the heap corruption in my method?

So I have tracked down an annoying heap corruption to a single method.
DWORD gdwCounter = 0;
TCHAR* GetName(const TCHAR* format, size_t len)
{
len += (snprintf(NULL, 0, "%lu", gdwCounter) * sizeof(TCHAR));
TCHAR *c = (TCHAR*)malloc(len);
_stprintf_s(c, len, __TEXT("%s%lu"), format, gdwCounter);
return c;
}
To make sure I had found the correct method, I changed it to just copy the 'format' buffer it gets passed as a parameter to the output buffer. The heap corruption went away and everything was fine again.
I decided to look at the documentations of snprintf and _stprintf_s.
snprintf is supposed to return the number of characters required, excluding the null terminator, so that the buffer can actually be printed in a second call.
My len parameter already contains the full size of format (including the null terminator).
I also couldn't find any hint of what is wrong in the documentation of _stprintf_s.
So what am I missing?
Edit: After further testing I found out that _stprintf_s apparently causes the error, since snprintf does return the correct size.
TCHAR* GetName(const TCHAR* format, size_t len)
{
len += snprintf(NULL, 0, "%lu", gdwCounter);
TCHAR *c = (TCHAR*)malloc(len*sizeof(TCHAR));
_stprintf_s(c, len, __TEXT("%s%lu"), format, gdwCounter);
return c;
}
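_stprintf_s takes the "Maximum number of characters to store" (the buffer size in TCHARs), not the size in bytes. In the original code len is a byte count, so in a Unicode build _stprintf_s is told the buffer holds twice as many characters as it really does and can write past the end of the malloc'd block (the debug CRT versions of these functions also fill the whole stated buffer size). The corrected version above keeps len in characters and multiplies by sizeof(TCHAR) only for malloc.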
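To see the unit rule in isolation, here is a portable sketch. It assumes wchar_t in place of TCHAR and the standard std::swprintf in place of _stprintf_s; like _stprintf_s, std::swprintf takes the capacity in characters. Unlike the question's version, it computes the prefix length itself instead of taking a len parameter, and gdwCounter is an unsigned long standing in for the DWORD.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cwchar>

unsigned long gdwCounter = 0;  // stands in for the question's DWORD counter

// Sketch of GetName with wchar_t in place of TCHAR: every size is counted in
// characters, and only the malloc call converts characters to bytes.
wchar_t *GetName(const wchar_t *format)
{
    int digits = std::snprintf(nullptr, 0, "%lu", gdwCounter);  // characters, no terminator
    std::size_t len = std::wcslen(format) + static_cast<std::size_t>(digits) + 1;  // + terminator
    wchar_t *c = static_cast<wchar_t *>(std::malloc(len * sizeof(wchar_t)));
    if (c)
        std::swprintf(c, len, L"%ls%lu", format, gdwCounter);  // len = capacity in wchar_t, not bytes
    return c;
}
```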

Display the first characters of an unsigned char*

I have the following code (I stripped down the useless parts):
unsigned char* decrypted= (unsigned char *) malloc(500);
bufSize = operations.RSADecrypt(newEncrypted, bufSize, key, decrypted);
printf("Test: %s", decrypted);
I would like to display only the first bufSize characters of decrypted, because right now it displays a lot of nonsense characters after them.
You can use the "%.*s" format specifier:
printf("Test: %.*s", (int)bufSize, decrypted);
which instructs printf() to write at most the first bufSize characters from decrypted (the precision argument must be an int, hence the cast).
You can limit the length with the format specifier:
printf ("Test: %-20.20s", decrypted);
For a version using a variable bufSize:
printf ("Test: %-*.*s", bufSize, bufSize, decrypted);
Note that this forces the output to exactly that many characters, padded with spaces on the right if needed. If you want a shorter string to stay shorter in the output (irrelevant in your case if the string is, as indicated, always longer than what you want to output), you can use:
printf ("Test: %.*s", bufSize, decrypted);
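The precision form can be checked without printing by formatting into a local buffer. In this sketch, firstChars and the 128-byte output buffer are hypothetical choices. The point is that with a precision, printf-family functions read at most that many bytes, so the source buffer does not need a terminator.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Format the first `count` bytes of a buffer that is not null-terminated.
std::string firstChars(const unsigned char *buf, std::size_t count)
{
    char out[128];  // longer results are truncated by snprintf
    // %.*s takes an int precision argument before the pointer
    std::snprintf(out, sizeof out, "Test: %.*s", static_cast<int>(count),
                  reinterpret_cast<const char *>(buf));
    return out;
}
```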
If you are allowed to modify the decrypted buffer, you can simply add a terminator to it (this requires bufSize < 500, the size of the allocation):
decrypted[bufSize] = 0;
Then printf() will print only the decrypted bytes.
If you are not allowed to add a character to the decrypted buffer, you need to copy the contents to a temporary buffer and use that buffer in your printf():
unsigned char* tmp = (unsigned char *) malloc(bufSize + 1);
memcpy(tmp, decrypted, bufSize); // copies exactly bufSize bytes; strncpy would need casts from unsigned char*
tmp[bufSize] = 0;
printf("Test: %s", tmp);
free(tmp);
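In C++, the temporary copy can be a std::string, which stores its own terminator and frees itself. This is a sketch; toString is a hypothetical name. printf("Test: %s", toString(decrypted, bufSize).c_str()) would then print the bytes, although printf still stops at any zero byte inside the data.

```cpp
#include <cstddef>
#include <string>

// Copy exactly bufSize bytes into a std::string; the original buffer is left
// untouched and nothing has to be freed manually.
std::string toString(const unsigned char *decrypted, std::size_t bufSize)
{
    return std::string(reinterpret_cast<const char *>(decrypted), bufSize);
}
```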
I don't like that you said the buffer contained nonsense. It's not nonsense, it's residual memory. There is a good chance you expect and want this area to be zero. Try the following, where calloc zeroes the allocated memory (this only helps if the decrypted data is shorter than 500 bytes and contains no zero bytes):
unsigned char* decrypted= (unsigned char *) calloc(500,sizeof(char));
bufSize = operations.RSADecrypt(newEncrypted, bufSize, key, decrypted);
printf("Test: %s", decrypted);

libxml2 xmlChar * to std::wstring

libxml2 seems to store all its strings in UTF-8, as xmlChar *.
/**
* xmlChar:
*
* This is a basic byte in an UTF-8 encoded string.
* It's unsigned allowing to pinpoint case where char * are assigned
* to xmlChar * (possibly making serialization back impossible).
*/
typedef unsigned char xmlChar;
As libxml2 is a C library, it provides no routines to get a std::wstring out of an xmlChar *. I'm wondering whether the prudent way to convert an xmlChar * to a std::wstring in C++11 is to use the mbstowcs C function, via something like this (work in progress):
std::wstring xmlCharToWideString(const xmlChar *xmlString) {
if(!xmlString){abort();} //provided string was null
int charLength = xmlStrlen(xmlString); //excludes null terminator
wchar_t *wideBuffer = new wchar_t[charLength];
size_t wcharLength = mbstowcs(wideBuffer, (const char *)xmlString, charLength);
if(wcharLength == (size_t)(-1)){abort();} //mbstowcs failed
std::wstring wideString(wideBuffer, wcharLength);
delete[] wideBuffer;
return wideString;
}
Edit: Just an FYI, I'm very aware of what xmlStrlen returns: the number of xmlChar used to store the string. I know it's not the number of characters but rather the number of unsigned char. It would have been less confusing to name it byteLength, but I thought charLength would be clearer next to wcharLength. As for the correctness of the code, I believe wideBuffer will always be at least as large as required, and characters that need more space than a wchar_t will be truncated (I think).
xmlStrlen() returns the number of UTF-8 encoded code units in the xmlChar* string. That is not the same as the number of wchar_t code units needed once the data is converted, so do not use xmlStrlen() to size your wchar_t string. You need to call std::mbstowcs() once with a null destination to get the correct length, then allocate the memory, and call mbstowcs() again to fill it. You will also have to use std::setlocale() to tell mbstowcs() to use UTF-8 (changing the locale may not be a good idea, especially if multiple threads are involved). For example:
std::wstring xmlCharToWideString(const xmlChar *xmlString)
{
    if (!xmlString) { abort(); } //provided string was null
    std::wstring wideString;
    int charLength = xmlStrlen(xmlString);
    if (charLength > 0)
    {
        // copy the locale name: the returned pointer may be invalidated by the next setlocale() call
        std::string origLocale = setlocale(LC_CTYPE, NULL);
        setlocale(LC_CTYPE, "en_US.UTF-8"); // locale names are platform-specific
        size_t wcharLength = mbstowcs(NULL, (const char*) xmlString, 0); //excludes null terminator
        if (wcharLength != (size_t)(-1))
        {
            wideString.resize(wcharLength);
            mbstowcs(&wideString[0], (const char*) xmlString, wcharLength);
        }
        setlocale(LC_CTYPE, origLocale.c_str());
        if (wcharLength == (size_t)(-1)) { abort(); } //mbstowcs failed
    }
    return wideString;
}
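A better option, since you mention C++11, is to use std::codecvt_utf8 with std::wstring_convert (from <codecvt> and <locale>) so you do not have to deal with locales. Both were deprecated in C++17, and on Windows, where wchar_t is UTF-16, std::codecvt_utf8_utf16 is needed to handle characters outside the Basic Multilingual Plane: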
std::wstring xmlCharToWideString(const xmlChar *xmlString)
{
if (!xmlString) { abort(); } //provided string was null
try
{
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> conv;
return conv.from_bytes((const char*)xmlString);
}
catch(const std::range_error& e)
{
abort(); //wstring_convert failed
}
}
An alternative option is to use an actual Unicode library, such as ICU or iconv, to handle Unicode conversions.
There are some problems in this code, besides the fact that you are using wchar_t and std::wstring which is a bad idea unless you're making calls to the Windows API.
xmlStrlen() does not do what you think it does. It counts the number of UTF-8 code units (a.k.a. bytes) in a string. It does not count the number of characters. This is all stuff in the documentation.
Counting characters will not portably give you the correct size for a wchar_t array anyway. So not only does xmlStrlen() not do what you think it does, what you wanted isn't the right thing either. The problem is that the encoding of wchar_t varies from platform to platform, making it 100% useless for portable code.
The mbstowcs() function is locale-dependent. It only converts from UTF-8 if the current locale is a UTF-8 locale!
This code will leak memory if the std::wstring constructor throws an exception.
My recommendations:
Use UTF-8 if at all possible. The wchar_t rabbit hole is a lot of extra work for no benefit (except the ability to make Windows API calls).
If you need UTF-32, then use std::u32string. Remember that wstring has a platform-dependent encoding: it could be a variable-length encoding (Windows) or fixed-length (Linux, OS X).
If you absolutely must have wchar_t, then chances are good that you are on Windows. Here is how you do it on Windows:
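#include <windows.h>
#include <cstring>
#include <stdexcept>
#include <string>

std::wstring utf8_to_wstring(const char *utf8)
{
    int utf8len = static_cast<int>(std::strlen(utf8));
    if (utf8len == 0)
        return std::wstring();
    // First call: ask how many wchar_t are needed (no terminator, because the length is explicit)
    int wclen = MultiByteToWideChar(CP_UTF8, 0, utf8, utf8len, NULL, 0);
    if (wclen == 0)
        throw std::runtime_error("MultiByteToWideChar failed");
    // std::wstring owns the buffer, so nothing leaks if an exception is thrown
    std::wstring wstr(wclen, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8, utf8len, &wstr[0], wclen);
    return wstr;
}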
If you absolutely must have wchar_t and you are not on Windows, use iconv() (see man 3 iconv, man 3 iconv_open and man 3 iconv_close for the manual). You can specify "WCHAR_T" as one of the encodings for iconv().
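Here is a minimal sketch of the iconv route. It assumes glibc, where "WCHAR_T" is an accepted encoding name and iconv() takes a char ** input argument; some other platforms declare it with const char **. The output buffer holds one wchar_t per input byte, which is always enough because every code point takes at least one UTF-8 byte. The name utf8_to_wstring_iconv is hypothetical.

```cpp
#include <iconv.h>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert UTF-8 to the platform's wchar_t encoding with iconv.
std::wstring utf8_to_wstring_iconv(const std::string &utf8)
{
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");  // (to, from)
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");
    std::wstring out(utf8.size(), L'\0');  // upper bound: one wchar_t per input byte
    char *in = const_cast<char *>(utf8.data());
    std::size_t inLeft = utf8.size();
    char *outp = reinterpret_cast<char *>(&out[0]);
    std::size_t outLeft = out.size() * sizeof(wchar_t);
    std::size_t rc = iconv(cd, &in, &inLeft, &outp, &outLeft);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid or incomplete UTF-8 input");
    out.resize(out.size() - outLeft / sizeof(wchar_t));  // trim the unused tail
    return out;
}
```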
Remember: You probably don't want wchar_t or std::wstring. What wchar_t does portably isn't useful, and making it useful isn't portable. C'est la vie.
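Add
#include <boost/locale.hpp>
Convert the xmlChar* to a std::string:
std::string strGbk((char*)node);
Then convert the std::string to a std::wstring:
std::string strGbk = "china powerful forever";
std::wstring wstr = boost::locale::conv::to_utf<wchar_t>(strGbk, "gbk");
std::cout << strGbk << std::endl;
std::wcout << wstr << std::endl;
It works for me, good luck. Note that the second argument of to_utf names the source encoding; libxml2's xmlChar strings are UTF-8, so for them pass "UTF-8" instead of "gbk".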

Why is the following C++ code printing only the first character?

I am trying to convert a char string to a wchar string.
In more detail: I am trying to convert a char[] to a wchar_t[] first, then append " 1" to that string and print it.
char src[256] = "c:\\user";
wchar_t temp_src[256];
mbtowc(temp_src, src, 256);
wchar_t path[256];
StringCbPrintf(path, 256, _T("%s 1"), temp_src);
wcout << path;
But it prints just c
Is this the right way to convert from char to wchar_t? I have since learned of another way, but I'd like to know why the above code behaves the way it does.
mbtowc converts only a single character. Did you mean to use mbstowcs?
Typically you call this function twice; the first to obtain the required buffer size, and the second to actually convert it:
#include <cstdlib> // for mbstowcs
const char* mbs = "c:\\user";
size_t requiredSize = ::mbstowcs(NULL, mbs, 0);
wchar_t* wcs = new wchar_t[requiredSize + 1];
if(::mbstowcs(wcs, mbs, requiredSize + 1) != (size_t)(-1))
{
// Do what's needed with the wcs string
}
delete[] wcs;
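For the question's actual goal (convert, then append " 1"), the same two calls can fill a std::wstring directly, which avoids manual new/delete and fixed-size buffers. This is a sketch assuming C++11 and the current C locale; widen is a hypothetical name. With it, std::wstring path = widen("c:\\user") + L" 1"; followed by std::wcout << path; prints the whole string.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a multibyte string (current C locale) to std::wstring.
std::wstring widen(const char *mbs)
{
    std::size_t len = std::mbstowcs(nullptr, mbs, 0);  // wide chars needed, no terminator
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for this locale");
    std::wstring out(len, L'\0');
    if (len > 0)
        std::mbstowcs(&out[0], mbs, len);  // writes exactly len characters, no terminator
    return out;
}
```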
If you would rather use mbstowcs_s (because of deprecation warnings), do this:
#include <cstdlib> // also for mbstowcs_s
const char* mbs = "c:\\user";
size_t requiredSize = 0;
::mbstowcs_s(&requiredSize, NULL, 0, mbs, 0);
wchar_t* wcs = new wchar_t[requiredSize + 1];
::mbstowcs_s(&requiredSize, wcs, requiredSize + 1, mbs, requiredSize);
if(requiredSize != 0)
{
// Do what's needed with the wcs string
}
delete[] wcs;
Make sure you take care of locale issues via setlocale(), or use the versions of mbstowcs() that take a locale argument (such as _mbstowcs_l() or _mbstowcs_s_l()).
Why are you using C code, and why not write it in a more portable way? What I would do here is use the standard library:
std::string src = std::string("C:\\user") +
std::string(" 1");
std::wstring dne = std::wstring(src.begin(), src.end());
wcout << dne;
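It's so simple! Note that this copies each char into a wchar_t unchanged, so it is only correct for ASCII text; it is not a real multibyte conversion.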
L"Hello World"
The L prefix in front of a string literal makes it a wide string literal.