How to declare appropriate size for the buffer - c++

I'm using TCHAR in the Visual C++ poject I'm working on, which definition is shown below:
#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif
I need to put some data into buffer buff:
char buff[size] = {0}; // how to declare the buffer size - what should be its value ?
sprintf(buff, "%s (ID: %i)", res->name(), res->id());
where:
name() returns TCHAR*
id() returns int
How to calculate the value of size - exact buffer capacity for actual needs (smaller if no unicode is defined, bigger if unicode is defined) ? In addition I'd like to protect myself from buffer overflow possibility, what kind of protection should I use ?
What's more, I've declared here the buffer as char. If I declare the buffer as int, would it be any difference for the size value (i.e 4 times smaller if compared to declared as char) ?
UPDATE
What I come up with partially based on Mats Petersson answer is:
size_t len;
const char *FORMAT;
#ifndef _UNICODE
len = strlen((char*)res->name());
FORMAT = "%s (ID: %i)";
#else
len = wcslen(res->name());
FORMAT = "%S (ID: %i)";
#endif
int size = 7 * sizeof(TCHAR) + /* place for characters inside format string */
len * sizeof(TCHAR) + /* place for "name" characters */
strlen(_itoa(id, ioatmp, 10)) * sizeof(TCHAR) + /* place for "id" digits */
1 * sizeof(TCHAR); /* zero byte(s) string terminator */
char *buff = new char[size]; /* buffer has to be declared dynamically on the heap,
* because its exact size is not known at compilation time */
sprintf(buff, FORMAT, name, id);
delete[] buff;
Is it correct thinking or did I miss something ?

To begin from the back, buff should always be char, because that's what is being stored by sprintf.
Second, if your res->name() is returning a wide-char (unicode) string, your format string should use "%S", for regular ASCII you should use "%s".
Now, to calculate the length required for the buffer, and avoid overflows. It's not that hard to do something like
const TCHAR *nm = res->name();
size_t len;
#ifndef UNICODE
len = strlen(nm);
#else
... see below.
#endif
and then guesstimate the length of the number (an integer can't take more than 12 places), along with the exact number of characters produced as constants in the format string.
This works fine for the standard ASCII variant.
However, it gets more fun with the wide char variant, as that can take up multiple bytes in the output string (e.g. writing Chinese characters that always require multibyte encoding). One solution is:
len = snprintf(0, NULL, "%S", nm);
which should give you the correct number [I think]. It's a pretty cumbersome method, but it will work. I'm not sure there is an easy way to convert a wide-string to "number of bytes needed to store this string" in another way.
Edit: I would seriously consider if it's much point in supporting non-UNICOD veariant, and then just convert the whole thing to using swprintf(...) instead. You still need the length, but it should just be the result of of wcslen(res->name()), rather than requiring some complex conversion calculation.

you can use: snprintf / swnprintf, it will return you number of chars/wchars needed.
here char buff[size] = {0}; you are writing outside of the buffer. UPDATE: I'll take that back - it just a declaration with initialization if size is constant.
this "%s (ID: %i)" shall be changed to this: "%s (ID: %d)" if last parameter is int.

Related

What causes the heap corruption in my method?

So I have tracked down an annoying heap corruption to a single method.
DWORD gdwCounter = 0;
TCHAR* GetName(const TCHAR* format, size_t len)
{
len += (snprintf(NULL, 0, "%lu", gdwCounter) * sizeof(TCHAR));
TCHAR *c = (TCHAR*)malloc(len);
_stprintf_s(c, len, __TEXT("%s%lu"), format, gdwCounter);
return c;
}
To make sure I found the correct method, I tried to change it and just copy the 'format' buffer it gets passed as an parameter to the output buffer. Heap corruption went away and everything was fine again.
I decided to look at the documentations of snprintf and _stprintf_s.
snprintf is supposed to return the required characters without the null-terminating character to actually print your buffer in a second call to it.
My len parameter already contains the full size (with null-terminating character) of format.
Also I couldn't find any hints to what is wrong in the documentation of _stprintf_s.
So what am I missing?
Edit: After further testing I found out that apparently _stprintf_s causes the error as snprintf does return the correct size.
TCHAR* GetName(const TCHAR* format, size_t len)
{
len += snprintf(NULL, 0, "%lu", gdwCounter);
TCHAR *c = (TCHAR*)malloc(len*sizeof(TCHAR));
_stprintf_s(c, len, __TEXT("%s%lu"), format, gdwCounter);
return c;
}
_stprintf_s takes the "Maximum number of characters to store" instead of maximum number of bytes.

String is not null terminated error

I'm having a string is not null terminated error, though I'm not entirely sure why. The usage of std::string in the second part of the code is one of my attempt to fix this problem, although it still doesn't work.
My initial codes was just using the buffer and copy everything into client_id[]. The error than occurred. If the error is correct, that means I've got either client_ id OR theBuffer does not have a null terminator. I'm pretty sure client_id is fine, since I can see it in debug mode. Strange thing is buffer also has a null terminator. No idea what is wrong.
char * next_token1 = NULL;
char * theWholeMessage = &(inStream[3]);
theTarget = strtok_s(theWholeMessage, " ",&next_token1);
sendTalkPackets(next_token1, sizeof(next_token1) + 1, id_clientUse, (unsigned int)std::stoi(theTarget));
Inside sendTalkPackets is. I'm getting a string is not null terminated at the last line.
void ServerGame::sendTalkPackets(char * buffer, unsigned int buffersize, unsigned int theSender, unsigned int theReceiver)
{
std::string theMessage(buffer);
theMessage += "0";
const unsigned int packet_size = sizeof(Packet);
char packet_data[packet_size];
Packet packet;
packet.packet_type = TALK;
char client_id[MAX_MESSAGE_SIZE];
char theBuffer[MAX_MESSAGE_SIZE];
strcpy_s(theBuffer, theMessage.c_str());
//Quick hot fix for error "string not null terminated"
const char * test = theMessage.c_str();
sprintf_s(client_id, "User %s whispered: ", Usernames.find(theSender)->second.c_str());
printf("This is it %s ", buffer);
strcat_s(client_id, buffersize , theBuffer);
Methinks that problem lies in this line:
sendTalkPackets(next_token1, sizeof(next_token1) + 1, id_clientUse, (unsigned int)std::stoi(theTarget));
sizeof(next_token1)+1 will always gives 5 (on 32 bit platform) because it return size of pointer not size of char array.
One thing which could be causing this (or other problems): As
buffersize, you pass sizeof(next_token1) + 1. next_token1 is
a pointer, which will have a constant size of (typically) 4 or 8. You
almost certainly want strlen(next_token1) + 1. (Or maybe without the
+ 1; conventions for passing sizes like this generally only include
the '\0' if it is an output buffer. There are a couple of other
places where you're using sizeof, which may have similar problems.
But it would probably be better to redo the whole logic to use
std::string everywhere, rather than all of these C routines. No
worries about buffer sizes and '\0' terminators. (For protocol
buffers, I've also found std::vector<char> or std::vector<unsigned char>
quite useful. This was before the memory in std::string was
guaranteed to be contiguous, but even today, it seems to correspond more
closely to the abstraction I'm dealing with.)
You can't just do
std::string theMessage(buffer);
theMessage += "0";
This fails on two fronts:
The std::string constructor doesn't know where buffer ends, if buffer is not 0-terminated. So theMessage will potentially be garbage and include random stuff until some zero byte was found in the memory beyond the buffer.
Appending string "0" to theMessage doesn't help. What you want is to put a zero byte somewhere, not value 0x30 (which is the ascii code for displaying a zero).
The right way to approach this, is to poke a literal zero byte buffersize slots beyond the start of the buffer. You can't do that in buffer itself, because buffer may not be large enough to accomodate that extra zero byte. A possibility is:
char *newbuffer = malloc(buffersize + 1);
strncpy(newbuffer, buffer, buffersize);
newbuffer[buffersize] = 0; // literal zero value
Or you can construct a std::string, whichever you prefer.

SetWindowText with a single dimensional array

Is it possible to display a single dimensional array of values using SetWindowsText() in a text box on windows api?
for example. SetWindowText(hwndStatic3, sArray);
******************EDIT************
I have a textbox on the windows api where I use GetWindowText() to retrieve the string written in the text box then I convert the string to decimal array. I then convert this decimal array value to hexadecimal value as I am trying to print those values using SetwindowsText within another textbox. However only the last value of the array is printing. How can I print all the values?
******************EDIT************
code:
GetWindowText(hwndtext1, value, 256);
for (i = 15; i >= 0; i--)
{
temp[i] = atoll(value); //converts sting to decimal
ulltoa(temp[i] , sArray, 16); //converts decimal to hexadecimal
buf[i] = temp[i];
}
SetWindowText(hwndStatic3, sArray);
SetWindowText is just a macro with signature:
BOOL SetWindowText(HWND, const TCHAR*);
Depending on your build settings, it will call one of the following:
BOOL SetWindowTextA(HWND, const char*); //ansi version
BOOL SetWindowTextW(HWND, const wchar_t*); //unicode version
where TCHAR is defined as:
#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif
So, an array of strings is not compatible with SetWindowText but an array of characters will work, provided that the array is of type TCHAR *, or of type (char * or wchar_t *) that is compatible with your settings.
First, atoll and ulltoa aren't documented with the Microsoft Visual C/C++ (which is what I use for Windows) so I'm working from documentation I found online. Either your versions do more than those I've found documented, or you've left out some significant code from your example.
Based on the loop control, I'm guessing that you expect to always find 15 values in the string you read from the first control. BUT... the atoll and ulltoa functions only operate on one value at a time and do nothing to advance through the input list. So your loop is converting the first number from string to 64 bit int and then converting that into a string 15 times.
Since you say the last value is the only one you see, your functions must actually be parsing the value string in some way that is not apparent in your example. However, ulltoa seems to always be placing the value into the same place in the same string variable, with each subsequent call in the loop overwriting the previous call. My lazy self would add a bit like this:
int len = 0;
char szOutput[15*20]; // enough space for 15 64 bit hex strings
GetWindowText(hwndtext1, value, 256);
for (i = 15; i >= 0; i--)
{
temp[i] = atoll(value); //converts sting to decimal
ulltoa(temp[i] , sArray, 16); //converts decimal to hexadecimal
buf[i] = temp[i];
len += sprintf( szOutput+len, "%s ", sArray );
}
szOutput[len-1] - '\0'; // remove the final space
SetWindowText(hwndStatic3, szOutput);
Of course, with the sprintf you could also skip the ulltoa call entirely and change the sprintf line to:
len += sprintf( szOutput+len, "%16.16I64X", temp[i] );
(or whatever flavor/form of the hex output you want (see the printf format documentation for details.) If you want your list to be one item per line, then replace the trailing space with a newline. Oh, the I64 in the %16.16I64X is a Microsoft thing that might be different in other compilers/libraries.
FYI, the sprintf technique I used lets the function keep appending to the end of the buffer but incrementing the offset into the buffer (len) by the length of the string just appended, which is the value returned by sprintf. It is a quick and easy way to assembling string lists such as yours.

Convert wchar_t to char

I was wondering is it safe to do so?
wchar_t wide = /* something */;
assert(wide >= 0 && wide < 256 &&);
char myChar = static_cast<char>(wide);
If I am pretty sure the wide char will fall within ASCII range.
Why not just use a library routine wcstombs.
assert is for ensuring that something is true in a debug mode, without it having any effect in a release build. Better to use an if statement and have an alternate plan for characters that are outside the range, unless the only way to get characters outside the range is through a program bug.
Also, depending on your character encoding, you might find a difference between the Unicode characters 0x80 through 0xff and their char version.
You are looking for wctomb(): it's in the ANSI standard, so you can count on it. It works even when the wchar_t uses a code above 255. You almost certainly do not want to use it.
wchar_t is an integral type, so your compiler won't complain if you actually do:
char x = (char)wc;
but because it's an integral type, there's absolutely no reason to do this. If you accidentally read Herbert Schildt's C: The Complete Reference, or any C book based on it, then you're completely and grossly misinformed. Characters should be of type int or better. That means you should be writing this:
int x = getchar();
and not this:
char x = getchar(); /* <- WRONG! */
As far as integral types go, char is worthless. You shouldn't make functions that take parameters of type char, and you should not create temporary variables of type char, and the same advice goes for wchar_t as well.
char* may be a convenient typedef for a character string, but it is a novice mistake to think of this as an "array of characters" or a "pointer to an array of characters" - despite what the cdecl tool says. Treating it as an actual array of characters with nonsense like this:
for(int i = 0; s[i]; ++i) {
wchar_t wc = s[i];
char c = doit(wc);
out[i] = c;
}
is absurdly wrong. It will not do what you want; it will break in subtle and serious ways, behave differently on different platforms, and you will most certainly confuse the hell out of your users. If you see this, you are trying to reimplement wctombs() which is part of ANSI C already, but it's still wrong.
You're really looking for iconv(), which converts a character string from one encoding (even if it's packed into a wchar_t array), into a character string of another encoding.
Now go read this, to learn what's wrong with iconv.
An easy way is :
wstring your_wchar_in_ws(<your wchar>);
string your_wchar_in_str(your_wchar_in_ws.begin(), your_wchar_in_ws.end());
char* your_wchar_in_char = your_wchar_in_str.c_str();
I'm using this method for years :)
A short function I wrote a while back to pack a wchar_t array into a char array. Characters that aren't on the ANSI code page (0-127) are replaced by '?' characters, and it handles surrogate pairs correctly.
size_t to_narrow(const wchar_t * src, char * dest, size_t dest_len){
size_t i;
wchar_t code;
i = 0;
while (src[i] != '\0' && i < (dest_len - 1)){
code = src[i];
if (code < 128)
dest[i] = char(code);
else{
dest[i] = '?';
if (code >= 0xD800 && code <= 0xD8FF)
// lead surrogate, skip the next code unit, which is the trail
i++;
}
i++;
}
dest[i] = '\0';
return i - 1;
}
Technically, 'char' could have the same range as either 'signed char' or 'unsigned char'. For the unsigned characters, your range is correct; theoretically, for signed characters, your condition is wrong. In practice, very few compilers will object - and the result will be the same.
Nitpick: the last && in the assert is a syntax error.
Whether the assertion is appropriate depends on whether you can afford to crash when the code gets to the customer, and what you could or should do if the assertion condition is violated but the assertion is not compiled into the code. For debug work, it seems fine, but you might want an active test after it for run-time checking too.
Here's another way of doing it, remember to use free() on the result.
char* wchar_to_char(const wchar_t* pwchar)
{
// get the number of characters in the string.
int currentCharIndex = 0;
char currentChar = pwchar[currentCharIndex];
while (currentChar != '\0')
{
currentCharIndex++;
currentChar = pwchar[currentCharIndex];
}
const int charCount = currentCharIndex + 1;
// allocate a new block of memory size char (1 byte) instead of wide char (2 bytes)
char* filePathC = (char*)malloc(sizeof(char) * charCount);
for (int i = 0; i < charCount; i++)
{
// convert to char (1 byte)
char character = pwchar[i];
*filePathC = character;
filePathC += sizeof(char);
}
filePathC += '\0';
filePathC -= (sizeof(char) * charCount);
return filePathC;
}
one could also convert wchar_t --> wstring --> string --> char
wchar_t wide;
wstring wstrValue;
wstrValue[0] = wide
string strValue;
strValue.assign(wstrValue.begin(), wstrValue.end()); // convert wstring to string
char char_value = strValue[0];
In general, no. int(wchar_t(255)) == int(char(255)) of course, but that just means they have the same int value. They may not represent the same characters.
You would see such a discrepancy in the majority of Windows PCs, even. For instance, on Windows Code page 1250, char(0xFF) is the same character as wchar_t(0x02D9) (dot above), not wchar_t(0x00FF) (small y with diaeresis).
Note that it does not even hold for the ASCII range, as C++ doesn't even require ASCII. On IBM systems in particular you may see that 'A' != 65

Using istringstream to process a memory block of variable length

I'm trying to use istringstream to recreate an encoded wstring from some memory. The memory is laid out as follows:
1 byte to indicate the start of the wstring encoding. Arbitrarily this is '!'.
n bytes to store the character length of the string in text format, e.g. 0x31, 0x32, 0x33 would be "123", i.e. a 123-character string
1 byte separator (the space character)
n bytes which are the wchars which make up the string, where wchar_t's are 2-bytes each.
For example, the byte sequence:
21 36 20 66 00 6f 00 6f 00
is "!6 f.o.o." (using dots to represent char 0)
All I've got is a char* pointer (let's call it pData) to the start of the memory block with this encoded data in it. What's the 'best' way to consume the data to reconstruct the wstring ("foo"), and also move the pointer to the next byte past the end of the encoded data?
I was toying with using an istringstream to allow me to consume the prefix byte, the length of the string, and the separator. After that I can calculate how many bytes to read and use the stream's read() function to insert into a suitably-resized wstring. The problem is, how do I get this memory into the istringstream in the first place? I could try constructing a string first and then pass that into the istringstream, e.g.
std::string s((const char*)pData);
but that doesn't work because the string is truncated at the first null byte. Or, I could use the string's other constructor to explicitly state how many bytes to use:
std::string s((const char*)pData, len);
which works, but only if I know what len is beforehand. That's tricky given that the data is variable length.
This seems like a really solvable problem. Does my rookie status with strings and streams mean I'm overlooking an easy solution? Or am I barking up the wrong tree with the whole string approach?
Try setting your stringstream's rdbuf:
char* buffer = something;
std::stringbuf *pbuf;
std::stringstream ss;
std::pbuf=ss.rdbuf();
std::pbuf->sputn(buffer, bufferlength);
// use your ss
Edit: I see that this solution will have a similar problem to your string(char*, len) situation. Can you tell us more about your buffer object? If you don't know the length, and it isn't null terminated, it's going to be very hard to deal with.
Is it possible to modify how you encode the length, and make that a fixed size?
unsigned long size = 6; // known string length
char* buffer = new char[1 + sizeof(unsigned long) + 1 + size];
buffer[0] = '!';
memcpy(buffer+1, &size, sizeof(unsigned long));
buffer should hold the start indicator (1 byte), the actual size (size of unsigned long), the delimiter (1 byte) and the text itself (size).
This way, you could get the size "pretty" easy, then set the pointer to point beyond the overhead, and then use the len variable in the string constructor.
unsigned long len;
memcpy(&len, pData+1, sizeof(unsigned long)); // +1 to avoid the start indicator
// len now contains 6
char* actualData = pData + 1 + sizeof(unsigned long) + 1;
std::string s(actualData, len);
It's low level and error prone :) (for instance if you read anything that isn't encoded the way that you expect it to be, the len can get pretty big), but you avoid dynamically reading the length of the string.
It seems like something on this order should work:
std::wstring make_string(char const *input) {
if (*input != '!')
return "";
char length = *++input;
return std::wstring(++input, length);
}
The difficult part is dealing with the variable length of the size. Without something to specify the length it's hard to guess when to stop treating the data as specifying the length of the string.
As for moving the pointer, if you're going to do it inside a function, you'll need to pass a reference to the pointer, but otherwise it's a simple matter of adding the size you found to the pointer you received.
It's tempting to (ab)use the (deprecated but nevertheless standard) std::istrstream here:
// Maximum size to read is
// 1 for the exclamation mark
// Digits for the character count (digits10() + 1)
// 1 for the space
const std::streamsize max_size = 3 + std::numeric_limits<std::size_t>::digits10;
std::istrstream s(buf, max_size);
if (std::istream::traits_type::to_char_type(s.get()) != '!'){
throw "missing exclamation";
}
std::size_t size;
s >> size;
if (std::istream::traits_type::to_char_type(s.get()) != ' '){
throw "missing space";
}
std::wstring(reinterpret_cast<wchar_t*>(s.rdbuf()->str()), size/sizeof(wchar_t));