I have a problem with the wcscpy_s function. After wcscpy_s returns, the parameters of my function (stringOne and stringTwo) are no longer readable.
Here is a simple demo that shows the problem.
void testFunc(LPCWSTR stringOne, LPCWSTR stringTwo) {
wchar_t* defaultVal = L"Default";
wchar_t tmp[100];
int lenBefore = wcslen(stringOne); // Works
auto result = wcscpy_s(tmp, sizeof(tmp), defaultVal);
int len = wcslen(tmp);
int len2 = wcslen(stringOne); // Throws Exception Access violation
}
int main() {
testFunc(L"Test", L"Test");
}
The documentation of wcscpy_s states that the debug version of this function fills the destination buffer with the special value 0xFE.
When you call wcscpy_s(tmp, sizeof(tmp), defaultVal); you pass the size of the tmp buffer in bytes, but wcscpy_s wants the size in characters. With a 2-byte wchar_t, the size you pass is therefore twice as large as it should be. Because the debug version fills the whole claimed destination with 0xFE, it writes past the end of tmp. That is a buffer overflow and undefined behaviour, even though the source string (L"Default") is short.
So use _countof(tmp) instead of sizeof(tmp).
This said, I suggest you learn how to use the Visual Studio debugger.
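To make the bytes-versus-characters mismatch concrete, here is a minimal portable sketch. The names count_of and byte_size are hypothetical helpers written for this illustration (count_of only mimics what the MSVC-specific _countof macro reports); they are not part of the CRT.

```cpp
#include <cstddef>

// Hypothetical stand-in for _countof: deduces the element count of a real array.
template <typename T, std::size_t N>
constexpr std::size_t count_of(T (&)[N]) {
    return N;
}

// Bytes occupied by a wchar_t array, i.e. what sizeof reports for it.
template <std::size_t N>
constexpr std::size_t byte_size(wchar_t (&arr)[N]) {
    return sizeof(arr);
}
```

For wchar_t tmp[100], count_of(tmp) is 100 while byte_size(tmp) is 100 * sizeof(wchar_t), which is the value the question passed by mistake.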
As already explained in Michael Walz's answer, you have a buffer overflow caused by passing an incorrect buffer size.
In addition to his suggestion of using _countof(tmp) instead of sizeof(tmp), I'd like to add that in C++ there's a convenient overload of wcscpy_s() that automatically deduces the correct buffer size:
template <size_t size>
errno_t wcscpy_s(
wchar_t (&strDestination)[size],
const wchar_t *strSource
); // C++ only
Basically, you can write simpler code like this, that will just work:
wchar_t tmp[100];
// Use the C++-only template overload of wcscpy_s
// that automatically deduces the destination buffer size
auto result = wcscpy_s(tmp, defaultVal);
If you use this overload, you are immune to this kind of sizeof/_countof mismatch bug.
Note that this C++ overload works only if you have a static buffer like your wchar_t tmp[100], as the C++ compiler must be able to figure out the buffer size at compile-time. On the other hand, when you have pointers to dynamically-allocated buffers, you have to pass the correct buffer size explicitly.
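As a rough illustration of how such an overload can deduce the size, here is a portable sketch. copy_wide is a hypothetical helper, not the CRT's wcscpy_s, and its error handling (returning ERANGE and emptying the destination) is an assumption chosen for the example.

```cpp
#include <cerrno>
#include <cstddef>
#include <cwchar>

// Hypothetical helper, not the CRT's wcscpy_s: N is deduced from the array type,
// so the caller cannot pass a wrong size.
template <std::size_t N>
int copy_wide(wchar_t (&dst)[N], const wchar_t* src) {
    const std::size_t len = std::wcslen(src);
    if (len >= N) {              // no room for the text plus the terminator
        dst[0] = L'\0';
        return ERANGE;
    }
    std::wmemcpy(dst, src, len + 1);  // copy including the terminating L'\0'
    return 0;
}
```

Passing a pointer instead of an array does not compile with this signature, which is the same restriction the note above describes for dynamically allocated buffers.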
So I have this function that has a string with a pre-defined buffer (the buffer is defined when calling a function).
My question is, why doesn't the compiler give me an error when I do the following (without the new operator)?
int crc32test(unsigned char *write_string, int buffer_size){
// Append CRC32 to string
int CRC_NBYTES = 4;
int new_buffer_size = buffer_size + CRC_NBYTES; // Current buffer size + CRC
// HERE (DECLARATION OF THE STRING)
unsigned char appendedcrc_string[new_buffer_size];
return 0;
}
isn't THIS the correct way to do it..?
int crc32test(unsigned char *write_string, int buffer_size){
// Append CRC32 to string
int CRC_NBYTES = 4;
int new_buffer_size = buffer_size + CRC_NBYTES; // Current buffer size + CRC
// HERE (DECLARATION OF THE STRING USING NEW)
unsigned char * appendedcrc_string = new unsigned char[new_buffer_size+1];
delete[] appendedcrc_string ;
return 0;
}
And I actually compiled both, and both worked. Why isn't the compiler throwing me any error?
And is there a reason to use the new operator if apparently the former function works too?
There are a few answers here already, and I'm going to repeat several things said already. The first form you use is not valid C++, but will work in certain versions of GCC and Clang... It is decidedly non-portable.
There are a few options that you have as alternatives:
Use std::basic_string<unsigned char> for your input and s.append(reinterpret_cast<unsigned char*>(crc), 4);
Similarly, you can use std::vector<unsigned char>
If your need is just for a simple resizable buffer, you can use std::unique_ptr<unsigned char[]> and use memcpy & std::swap, etc to move the data into a resized buffer and then free the old buffer.
As a non-portable alternative for temporary buffer creation, the alloca() function carves out a buffer by twiddling the stack pointer. It doesn't play very well with C++ features but it can be used if extremely careful about ensuring that the function will never have an exception thrown from it.
Store the CRC with the buffer in a structure like
struct input {
std::unique_ptr<unsigned char[]> buffer;
uint32_t crc;
};
And deal with the concatenation of the CRC and buffer someplace else in your code (i.e. on output). This, I believe is the best method.
The first code is ill-formed, however some compilers default to a mode where non-standard extensions are accepted.
You should be able to specify compiler switches for standard conformance. For example, in gcc, -std=c++17 -pedantic.
The second code is "correct", although it is not the preferred way either. You should use a container which frees the memory when execution leaves the scope, instead of a manual delete. For example, std::vector<unsigned char> buf(new_buffer_size + 1);.
The first example uses a C99 feature called Variable Length Arrays (VLA), that e.g. g++ by default supports as a C++ language extension. It's non-standard code.
Instead of the second example and similar, you should preferably use std::vector.
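A minimal sketch of the std::vector approach recommended above. append_crc is a hypothetical helper; the CRC value is assumed to be computed elsewhere, and the big-endian byte order used here is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies the payload and appends the 4 CRC bytes,
// most significant byte first (the byte order is an assumption).
std::vector<unsigned char> append_crc(const unsigned char* data,
                                      std::size_t size,
                                      std::uint32_t crc) {
    std::vector<unsigned char> out(data, data + size);
    out.reserve(size + 4);
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<unsigned char>((crc >> shift) & 0xFFu));
    }
    return out;  // the vector frees its memory automatically
}
```

The buffer size is decided at run time, as in the question, but no variable length array and no manual delete[] are needed.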
I've tried implementing a function like this, but unfortunately it doesn't work:
const wchar_t *GetWC(const char *c)
{
const size_t cSize = strlen(c)+1;
wchar_t wc[cSize];
mbstowcs (wc, c, cSize);
return wc;
}
My main goal here is to be able to integrate normal char strings in a Unicode application. Any advice you guys can offer is greatly appreciated.
In your example, wc is a local variable which will be deallocated when the function call ends. This puts you into undefined behavior territory.
The simple fix is this:
const wchar_t *GetWC(const char *c)
{
const size_t cSize = strlen(c)+1;
wchar_t* wc = new wchar_t[cSize];
mbstowcs (wc, c, cSize);
return wc;
}
Note that the calling code will then have to deallocate this memory, otherwise you will have a memory leak.
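One way to keep the heap allocation while removing the leak risk is to return an owning smart pointer. This is only a sketch: GetWCOwned is a hypothetical variant of the function above, and returning an empty string on invalid input is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical variant of GetWC: the returned smart pointer owns the buffer,
// so the caller does not have to call delete[].
std::unique_ptr<wchar_t[]> GetWCOwned(const char* c) {
    // The number of wide characters never exceeds the number of bytes.
    const std::size_t cSize = std::strlen(c) + 1;
    std::unique_ptr<wchar_t[]> wc(new wchar_t[cSize]());  // zero-initialized
    if (std::mbstowcs(wc.get(), c, cSize) == static_cast<std::size_t>(-1)) {
        wc[0] = L'\0';  // invalid multibyte input: return an empty string
    }
    return wc;
}
```

The buffer is released when the returned std::unique_ptr goes out of scope; wc.get() yields the const wchar_t* for APIs that need one.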
Use a std::wstring instead of a C99 variable length array. The current standard guarantees a contiguous buffer for std::basic_string. E.g.,
std::wstring wc( cSize, L'#' );
mbstowcs( &wc[0], c, cSize );
C++ does not support C99 variable length arrays, and so if you compiled your code as pure C++, it would not even compile.
With that change your function return type should also be std::wstring.
Remember to set relevant locale in main.
E.g., setlocale( LC_ALL, "" ).
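Putting those pieces together, a complete function might look like the sketch below. to_wide is a hypothetical name, it assumes the program has already set a suitable locale, and returning an empty string on a conversion error is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper; assumes the locale has already been set by the program.
std::wstring to_wide(const char* c) {
    // A multibyte string never yields more wide characters than it has bytes.
    std::wstring wc(std::strlen(c), L'\0');
    const std::size_t n = std::mbstowcs(&wc[0], c, wc.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid multibyte sequence
    }
    wc.resize(n);  // drop the unused trailing elements
    return wc;
}
```

Because the result is returned by value, there is no dangling pointer and nothing for the caller to free.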
const char* text_char = "example of mbstowcs";
size_t length = strlen(text_char );
Example of usage "mbstowcs"
std::wstring text_wchar(length, L'#');
//#pragma warning (disable : 4996)
// Or add to the preprocessor: _CRT_SECURE_NO_WARNINGS
mbstowcs(&text_wchar[0], text_char , length);
Example of usage "mbstowcs_s"
Microsoft suggests using "mbstowcs_s" instead of "mbstowcs".
Links:
Mbstowcs example
mbstowcs_s, _mbstowcs_s_l
wchar_t text_wchar[30];
mbstowcs_s(&length, text_wchar, text_char, length);
You're returning the address of a local variable allocated on the stack. When your function returns, the storage for all local variables (such as wc) is deallocated and is subject to being immediately overwritten by something else.
To fix this, you can pass the size of the buffer to GetWC, but then you've got pretty much the same interface as mbstowcs itself. Or, you could allocate a new buffer inside GetWC and return a pointer to that, leaving it up to the caller to deallocate the buffer.
Andrew Shepherd's answer works well for me; I would add two fixes:
1. Remove the trailing L'\0', because it sometimes causes trouble.
2. Use mbstowcs_s.
std::wstring wtos(std::string& value){
const size_t cSize = value.size() + 1;
std::wstring wc;
wc.resize(cSize);
size_t cSize1;
mbstowcs_s(&cSize1, (wchar_t*)&wc[0], cSize, value.c_str(), cSize);
wc.pop_back();
return wc;
}
The question has several problems, but so do some of the answers. The idea of returning a pointer to allocated memory "and leaving it up to the caller to de-allocate" is asking for trouble. As a rule the best pattern is always to allocate and de-allocate within the same function. For example, something like:
wchar_t* buffer = new wchar_t[get_wcb_size(str) + 1]; // + 1 for the terminating L'\0'
mbstowcs(buffer, str, get_wcb_size(str) + 1);
...
delete[] buffer;
In general, this requires two functions, one the caller calls to find out how much memory to allocate and a second to initialize or fill the allocated memory.
Unfortunately, the basic idea of using a function to return a "new" object is problematic -- not inherently, but because of the C++ inheritance of C memory handling. Using C++ and STL's strings/wstrings/strstreams is a better solution, but I felt the memory allocation thing needed to be better addressed.
Your problem has nothing to do with encodings, it's a simple matter of understanding basic C++. You are returning a pointer to a local variable from your function, which will have gone out of scope by the time anyone can use it, thus creating undefined behaviour (i.e. a programming error).
Follow this Golden Rule: "If you are using naked char pointers, you're Doing It Wrong. (Except for when you aren't.)"
I've previously posted some code to do the conversion and communicating the input and output in C++ std::string and std::wstring objects.
I did something like this. The first 2 zeros are because I don't know what kind of ascii type things this command wants from me. The general feeling I had was to create a temp char array. pass in the wide char array. boom. it works. The +1 ensures that the null terminating character is in the right place.
char tempFilePath[MAX_PATH] = "I want to convert this to wide chars";
int len = strlen(tempFilePath);
// Converts the path to wide characters
int needed = MultiByteToWideChar(0, 0, tempFilePath, len + 1, strDestPath, len + 1);
auto Ascii_To_Wstring = [](int code)->std::wstring
{
if (code>255 || code<0 )
{
throw std::runtime_error("Incorrect ASCII code");
}
std::string s{ char(code) };
std::wstring w{ s.begin(),s.end() };
return w;
};
I have been working with C++ strings and trying to load char * strings into std::string by using C functions such as strcpy(). Since strcpy() takes char * as a parameter, I have to cast it which goes something like this:
std::string destination;
unsigned char *source;
strcpy((char*)destination.c_str(), (char*)source);
The code works fine and when I run the program in a debugger, the value of *source is stored in destination, but for some odd reason it won't print out with the statement
std::cout << destination;
I noticed that if I use
std::cout << destination.c_str();
The value prints out correctly and all is well. Why does this happen? Is there a better method of copying an unsigned char* or char* into a std::string (stringstreams?) This seems to only happen when I specify the string as foo.c_str() in a copying operation.
Edit: To answer the question "why would you do this?", I am using strcpy() as a plain example. There are other times that it's more complex than assignment. For example, having to copy only X amount of string A into string B using strncpy() or passing a std::string to a function from a C library that takes a char * as a parameter for a buffer.
Here's what you want
std::string destination = source;
What you're doing is wrong on so many levels... you're writing over the inner representation of a std::string... I mean... not cool man... it's much more complex than that, arrays being resized, read-only memory... the works.
This is not a good idea at all for two reasons:
destination.c_str() is a const pointer, and casting away its const and writing to it is undefined behavior.
You haven't set the size of the string, meaning that it won't even necessarily have a large enough buffer to hold the string, which is likely to cause an access violation.
std::string has a constructor which allows it to be constructed from a char* so simply write:
std::string destination = source;
Well, what you are doing is undefined behavior. Your c_str() returns a const char * and is not meant to be assigned to. Why not use the defined constructor or assignment operator?
std::string defines an implicit conversion from const char* to std::string... so use that.
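Note that the question's source is an unsigned char*, and std::string has no constructor for that type, so the conversion needs a cast on the source side instead of on c_str(). A small sketch, with from_unsigned as a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds a std::string from a null-terminated
// unsigned char buffer by casting the source, never the destination.
std::string from_unsigned(const unsigned char* source) {
    return std::string(reinterpret_cast<const char*>(source));
}

// Copies only the first `count` bytes (the strncpy-style case).
// The caller must make sure at least `count` bytes are readable.
std::string from_unsigned(const unsigned char* source, std::size_t count) {
    return std::string(reinterpret_cast<const char*>(source), count);
}
```

In both cases the string allocates and sizes its own storage, so size() is correct and std::cout << destination prints as expected.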
You decided to cast away an error as c_str() returns a const char*, i.e., it does not allow for writing to its underlying buffer. You did everything you could to get around that and it didn't work (you shouldn't be surprised at this).
c_str() returns a const char* for good reason. You have no idea if this pointer points to the string's underlying buffer. You have no idea if this pointer points to a memory block large enough to hold your new string. The library is using its interface to tell you exactly how the return value of c_str() should be used and you're ignoring that completely.
Do not do what you are doing!!!
I repeat!
DO NOT DO WHAT YOU ARE DOING!!!
That it seems to sort of work when you do some weird things is a consequence of how the string class was implemented. You are almost certainly writing in memory you shouldn't be and a bunch of other bogus stuff.
When you need to interact with a C function that writes to a buffer, there are two basic methods:
std::string read_from_sock(int sock) {
char buffer[1024] = "";
int recv = read(sock, buffer, 1024);
if (recv > 0) {
return std::string(buffer, buffer + recv);
}
return std::string();
}
Or you might try the peek method:
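std::string read_from_sock(int sock) {
// MSG_PEEK is a flag for recv(), not read(); whether a zero-length peek
// reports the pending size depends on the platform and socket type
int n = recv(sock, 0, 0, MSG_PEEK);
if (n > 0) {
std::vector<char> buf(n);
n = recv(sock, &buf[0], n, 0);
if (n > 0) {
return std::string(buf.begin(), buf.begin() + n); // only the bytes actually read
}
}
return std::string();
}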
Of course, these are not very robust versions...but they illustrate the point.
First you should note that the value returned by c_str is a const char* and must not be modified. Actually it does not even have to point to the internal buffer of the string.
In response to your edit:
having to copy only X amount of string A into string B using strncpy()
If string A is a char array, and string B is std::string, and strlen(A) >= X, then you can do this:
B.assign(A, A + X);
passing a std::string to a function from a C library that takes a char * as a parameter for a buffer
If the parameter is actually const char *, you can use c_str() for that. But if it is just plain char *, and you are using a C++11 compliant compiler, then you can do the following:
c_function(&B[0]);
However, you need to ensure that there is room in the string for the data(same as if you were using a plain c-string), which you can do with a call to the resize() function. If the function writes an unspecified amount of characters to the string as a null-terminated c-string, then you will probably want to truncate the string afterward, like this:
B.resize(B.find('\0'));
The reason you can safely do this in a C++11 compiler and not a C++03 compiler is that in C++03, strings were not guaranteed by the standard to be contiguous, but in C++11, they are. If you want the guarantee in C++03, then you can use std::vector<char> instead.
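The resize, write, truncate sequence described above can be sketched as follows. c_function here is a hypothetical stand-in for a C library routine; unlike the one-argument call shown above it also takes the buffer capacity, which is an assumption made so the example stays safe.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C library function that fills a caller-supplied
// buffer with a null-terminated string.
void c_function(char* out, std::size_t capacity) {
    std::strncpy(out, "hello", capacity - 1);
    out[capacity - 1] = '\0';
}

std::string call_c_function() {
    std::string B;
    B.resize(64);                    // make room before handing out the pointer
    c_function(&B[0], B.size());     // contiguous storage is guaranteed since C++11
    const std::size_t end = B.find('\0');
    if (end != std::string::npos) {
        B.resize(end);               // truncate to what was actually written
    }
    return B;
}
```

Checking the result of find() before resizing avoids calling resize(std::string::npos) when the C function fills the whole buffer without a terminator.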
I'm writing a small proof-of-concept console program with Visual Studio 2008 and I wanted it to output colored text for readability. For ease of coding I also wanted to make a quick printf-replacement, something where I could write like this:
MyPrintf(L"Some text \1[bright red]goes here\1[default]. %d", 21);
This will be useful because I also build and pass strings around in some places so my strings will be able to contain formatting info.
However I hit a wall against wsprintf because I can't find a function that would allow me to find out the required buffer size before passing it to the function. I could, of course, allocate 1MB just-to-be-sure, but that wouldn't be pretty and I'd rather leave that as a backup solution if I can't find a better way.
Also, alternatively I'm considering using std::wstring (I'm actually more of a C guy with little C++ experience so I find plain-old-char-arrays easier for now), but that doesn't have anything like wsprintf where you could build a string with values replaced in them.
So... what should I do?
Your question is tagged C++, in which case I'd say std::wstringstream is the way to go. Example:
#include <sstream>
void func()
{
// ...
std::wstringstream ss; // the string stream
// like cout, you can add strings and numbers by operator<<
ss << L"Some text \1[bright red]goes here\1[default]. " << 21;
// function takes a C-style const wchar_t* string
some_c_function(ss.str().c_str()); // convert to std::wstring then const wchar_t*
// note: lifetime of the returned pointer probably temporary
// you may need a permanent std::wstring to return the c_str() from
// if you need it for longer.
// ...
}
You want _snwprintf. That function takes a buffer size, and if the buffer isn't big enough, just double the size of the buffer and try again. To keep from having to do multiple _snwprintf calls each time, keep track of what the buffer size was that you ended up using last time, and always start there. You'll make a few excess calls here and there, and you'll waste a bit of ram now and then, but it works great, and can't over-run anything.
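The grow-and-retry idea can be sketched portably with the standard vswprintf, which also reports failure when the buffer is too small. format_wide is a hypothetical helper; the starting size of 64 and the upper limit are arbitrary assumptions, and this version does not remember the last size between calls.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: formats into a buffer and doubles it until the text fits.
std::wstring format_wide(const wchar_t* fmt, ...) {
    std::vector<wchar_t> buf(64);
    const std::size_t max_len = 1u << 20;  // arbitrary cap so real errors cannot loop forever
    while (buf.size() <= max_len) {
        va_list args;
        va_start(args, fmt);               // restart the argument list for every attempt
        const int written = std::vswprintf(buf.data(), buf.size(), fmt, args);
        va_end(args);
        if (written >= 0 && static_cast<std::size_t>(written) < buf.size()) {
            return std::wstring(buf.data(), static_cast<std::size_t>(written));
        }
        buf.resize(buf.size() * 2);        // too small (or error): try a bigger buffer
    }
    return std::wstring();                 // gave up
}
```

A call such as format_wide(L"%d", 21) fits on the first attempt; longer output simply takes a few extra rounds.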
I'd go for a C++ stringstream. It's not as compact as sprintf but it will give you the functionality you want.
If you can afford using boost, you could consider boost::format. It would give you the flexibility of std::strings, and formatting features of sprintf. It is fairly different from C-style, but is also fairly easy to use. Here's an example.
_scprintf, _scprintf_l, _scwprintf, _scwprintf_l
These functions return the number of characters in the formatted string.
Using std::wstring seems like a good solution if you plan on passing strings between your objects - it handles the size and has a nice c_str method that will give you the array of wide chars.
The additional benefit is that you can pass it by reference instead of by pointer.
When you need the actual string, just use the c_str method:
wprintf(L"string %s received!", myWString.c_str());
This answer is an expansion of the answer from @mheyman that uses vswprintf().
I also struggled with the same problem. The Microsoft documentation is weak, but this page was helpful: https://en.cppreference.com/w/c/io/vfwprintf
CppRef Description: If bufsz is greater than zero, writes the results to a wide string buffer. At most bufsz-1 wide characters are written followed by null wide character. If bufsz is zero, nothing is written (and buffer may be a null pointer).
CppRef Return value: Number of wide characters written (not counting the terminating null wide character) if successful or negative value if an encoding error occurred or if the number of characters to be generated was equal or greater than size (including when size is zero).
Roughly:
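Measure required buffer size by calling vswprintf() with buffer == NULL and bufsz == 0 (this relies on the Microsoft CRT's behaviour; per the return-value text quoted above, a conforming implementation may return a negative value here)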
Call malloc() (or friends) to allocate a buffer.
Again, call vswprintf() with allocated buffer and buffer size + 1
Use result
Call free() on allocated buffer
Your example uses wchar_t: MyPrintf(L"Some text \1[bright red]goes here\1[default]. %d", 21);, so I recommend something like this:
#include <stdarg.h> // va_list, va_start(), va_end()
#include <stdio.h>
#include <stdlib.h> // calloc(), free()
#include <wchar.h> // vswprintf()
void MyPrintf(const wchar_t *lpFormatWCharArr,
...)
{
// Ref: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/va-arg-va-copy-va-end-va-start?view=msvc-172
va_list ap;
va_start(ap, lpFormatWCharArr);
// does not include trailing NUL char
const int cch = vswprintf(NULL, // wchar_t *buffer
0, // size_t bufsz
lpFormatWCharArr, // const wchar_t *format
ap); // va_list vlist
va_end(ap);
if (cch < 0)
{
// handle error
return;
}
const size_t NUL_CHAR_LEN = 1;
const size_t buf_len = cch + NUL_CHAR_LEN;
// malloc() is faster, but does not memset() result to zero
wchar_t *buf = (wchar_t *)calloc(buf_len, // size_t number
sizeof(wchar_t)); // size_t size
if (NULL == buf)
{
// handle error
return;
}
va_list ap2;
va_start(ap2, lpFormatWCharArr);
// does not include trailing NUL char
const int cch2 = vswprintf(buf, // wchar_t *buffer
buf_len, // size_t bufsz
lpFormatWCharArr, // const wchar_t *format
ap2); // va_list vlist
va_end(ap2);
if (cch2 < 0 || cch != cch2)
{
// handle error
free(buf);
return;
}
// use 'buf' and 'buf_len'
free(buf);
}
There might be a (code) typo in this answer, but similar code was tested against 64-bit Win 10.