Why is vsnprintf safe? - c++

I have looked at this question as well as these PDFs' 1 and 2, this page and pretty much understand what happens if I do this printf(SOME_TEST_STRING). But what I do not understand is why exactly by ensuring the size of buffer vsnprintf becomes safe as compared to vsprintf?

What happens in these 2 cases ?
Case 1
char buf[3];
vsprint(buf, "%s", args);
Case 2
char buf[3];
vsnprint(buf, sizeof buf, "%s", args);
In case 1, if the string you're formatting has a length of 3 or greater, you have a buffer overrun, vsprintf might write to memory past the storage of the buf array, which is undefined behavior, possibly causing havoc/security concerns/crashes/etc.
In case 2. vsnprintf knows how big the buffer that will contain the result is, and it will make sure not to go past that(instead truncating the result to fit within buf ).

It's because vsnprintf has an additional size_t count parameter that vsprintf (and other non-n *sprintf methods) does not have. The implementation uses this to ensure that the data it writes to your buffer will not run off the end.
Data that runs off the end of a buffer can result in data corruption, or when maliciously exploited can be used as a buffer overrun attack.

The "n" in vsnprintf() means it takes the max size of the output string to avoid a buffer overflow. This makes it safe from buffer overflow, but does not make it safe if the format string comes from unsanitized user input. If your user gives you a giant format string, you'll avoid overflowing the target string, but if the user gives you %s and you don't pass a C string in the argument list at compile time, you are still left with undefined behavior.

I'm not sure what the problem is, since your question basically contains the answer already.
By passing your buffer size to vsnprintf you provide that function with information about your buffer size. The function now knows where the buffer ends and can make sure that it does not write past the end of the buffer.
vsprintf does not have information about buffer size, which is why it does not know where the buffer ends and cannot prevent buffer overflow.

Related

What happens if strncpy copies random data into a buffer?

Suppose you have a char buffer that you want to copy an std::string into. Are there consequences of copying extra data into the buffer, outside of the strings scope, even though the buffer has adequate size?
Example
std::string my_string = "hello";
char my_buffer[128];
memset(my_buffer, 0, 128);
strncpy(my_buffer, my_string.c_str(), 128);
So "hello" gets copied into my_buffer, but so will 123 other bytes of data that comes after my_string. Are there any consequences of this? Is it harmful for the buffer to hold this other data?
but so will 123 other bytes of data that comes after my_string
This assumption is incorrect: strncpy pays attention to null termination of the source string, never reading past null terminator. The remaining data will be set to '\0' characters:
destination is padded with zeros until a total of num characters have been written to it. [reference]
This is in contrast to memcpy, which requires both the source and the destination to be of sufficient size in order to avoid undefined behavior.
OK, let's assume what you wanted is:
strncpy(my_buffer, my_string.c_str(), 128);
Thsi is always a 0-terminated string by definition, so considering:
Copies at most count characters of the character array pointed to by src (including the terminating null character, but not any of the characters that follow the null character) to character array pointed to by dest.
You won't get anything copied after "hello" from the original string, the rest will be 0s:
If, after copying the terminating null character from src, count is not reached, additional null characters are written to dest until the total of count characters have been written.
According to strncpy() description here 1, the copy is done up to the length you provided, for null terminated string, so that when end of the string come before, like in this case, copy is done up to it and no more copy is done, so rest of the "123 bytes" are not copied, and the copy loop terminates
The other answers to this question have addressed what happens with strncpy() (i.e. it will copy your string correctly because it stops at the 0-terminator byte), so perhaps what gets more to the intent of the question would be, what if you had this instead?
memcpy(my_buffer, my_string.c_str(), 128);
In this case, the function (memcpy()) doesn't know about 0-terminated-string semantics, and will always just blindly copy 128 bytes (starting at the address returned by my_string.c_str()) to the address my_buffer. The first 6 (or so) of those bytes will be from my_string's internal buffer, and the remaining bytes will be from whatever happens to be in memory after that.
So the question is, what happens then? Well, this memcpy() call reads from "mystery memory" whose purpose you're not aware of, so you're invoking undefined behavior by doing that, and therefore in principle anything could happen. In practice, the likely result is that your buffer will contain a copy of whatever bytes were read (although you probably won't notice them, since you'll be using string functions that don't look past the 0/terminator byte in your array anyway).
There is a small chance, though, that the "extra" memory bytes that memcpy() read could be part of a memory-page that is marked as off-limits, in which case trying to read from that page would likely cause a segmentation fault.
And finally, there's the real bugaboo of undefined behavior, which is that your C++ compiler's optimizer is allowed to do all kinds of crazy modifications to your code's logic, in the name of making your program more efficient -- and (assuming the optimizer isn't buggy) all of those optimizations will still result in the program running as intended -- as long as the program follows the rules and doesn't invoke undefined behavior. Which is to say, if your program invokes undefined behavior in any way, the optimizations may be applied in ways that are very difficult to predict or understand, resulting in bizarre/unexpected behavior in your program. So the general rule is, avoid undefined behavior like the plague, because even if you think it "should be harmless", there's a very real possibility that it will end up doing things you wouldn't expect it to do, and then you're in for a long, painful debugging session as you try to figure out what's going on.
Assuming you mean definitely copying that data, as your current code would end at a null terminator.
Essentially, no. Whatever data is in there will be used only as string data so unless you then tried to do something weird with that spare data (try to put a function pointer to it and execute it etc.) then it's basically safe.
The issue is with initially copying random data past the end of your original string, that could overflow into protected data that you don't have access to and throw a segfault(Inaccessible memory exception)
Function strncpy will copy everything up to NUL character or specified size. That's it.
In addition to other answers please remember that strncpy does not NUL terminate the dst buffer when the src string length (excluding NUL) is equal to buffer's size given. You should always do something like my_buffer[127]=0 to prevent buffer overruns when handling the string later on. In libbsd there is strlcpy defined that always NUL terminate the buffer.
Check out this snippet:
#include <string.h>
int main(int argc, char ** argv)
{
char buf[8];
char *str = "deadbeef";
strlcpy(buf, str, sizeof(buf));
printf("%s\n", buf);
strncpy(buf, str, sizeof(buf));
printf("%s\n", buf);
return 0;
}
To see the problem compile it with flag -fsanitize=address. The second buffer is not NUL terminated!

Is it safe to call memcpy with count greater than memory allocated for src?

std::string str = "hello world!";
char dest[16];
memcpy(dest, str.c_str(), 16);
str.c_str() will return a null-terminated char array. However, if I call memcpy with a count greater than 13, what would happen? Is dest going to be a null-terminated char array? Is there anything I need to be cautious about?
Your code has undefined behaviour. When using memcpy, you need to make sure that the number of bytes being copied is not greater than the min(size_of_recepient, size_of_source).
In your case, the size of your source is 13 bytes, so copying more than that is not ok.
Is dest going to be a null-terminated char array? Is there anything I
need to be cautious about?
Unfortunately, the first parts of dest will be a null-terminated character array since str.c_str() returns a null terminated character array... But the remaining parts of dest will surely contain some additional garbage.
You are accessing additional memory that you have no idea about... Your code could reformat your PC ...its called Undefined behavior,
If you give it a count that's exactly how many sequential bytes will be copied. This means that all those bytes are read from after your string if you pass in a count larger than your actual source data. If this memory does not belong to your program to use, this might cause a crash due to a Segmentation Fault.
strcpy works differently in that sense since it checks for '\0'. Also if you want to copy bytes from a std::string to a char[], you could simply not rely on dangerous C functions and use std::copy.
It's not safe. You should use strncpy, which was made for this. What's after the end of the "hello world!" std::string is undefined. It could be an undefined part of your heap, in which case you'll be copying what might be called garbage, or it could be venturing into an unallocated memory page, in which case you'll get a segfault and your program dies. (Unless it has some clever magic to handle segfaults, which, judging by your question, it probably does not have.)

c++ what happens if you print more characters with sprintf, than the char pointer has allocated?

I assume this is a common way to use sprintf:
char pText[x];
sprintf(pText, "helloworld %d", Count );
but what exactly happens, if the char pointer has less memory allocated, than it will be print to?
i.e. what if x is smaller than the length of the second parameter of sprintf?
i am asking, since i get some strange behaviour in the code that follows the sprintf statement.
It's not possible to answer in general "exactly" what will happen. Doing this invokes what is called Undefined behavior, which basically means that anything might happen.
It's a good idea to simply avoid such cases, and use safe functions where available:
char pText[12];
snprintf(pText, sizeof pText, "helloworld %d", count);
Note how snprintf() takes an additional argument that is the buffer size, and won't write more than there is room for.
This is a common error and leads to memory after the char array being overwritten. So, for example, there could be some ints or another array in the memory after the char array and those would get overwritten with the text.
See a nice detailed description about the whole problem (buffer overflows) here. There's also a comment that some architectures provide a snprintf routine that has a fourth parameter that defines the maximum length (in your case x). If your compiler doesn't know it, you can also write it yourself to make sure you can't get such errors (or just check that you always have enough space allocated).
Note that the behaviour after such an error is undefined and can lead to very strange errors. Variables are usually aligned at memory locations divisible by 4, so you sometimes won't notice the error in most cases where you have written one or two bytes too much (i.e. forget to make place for a NUL), but get strange errors in other cases. These errors are hard to debug because other variables get changed and errors will often occur in a completely different part of the code.
This is called a buffer overrun.
sprintf will overwrite the memory that happens to follow pText address-wise. Since pText is on the stack, sprintf can overwrite local variables, function arguments and the return address, leading to all sorts of bugs. Many security vulnerabilities result from this kind of code — e.g. an attacker uses the buffer overrun to write a new return address pointing to his own code.
The behaviour in this situation is undefined. Normally, you will crash, but you might also see no ill effects, strange values appearing in unrelated variables and that kind of thing. Your code might also call into the wrong functions, format your hard-drive and kill other running programs. It is best to resolve this by allocating more memory for your buffer.
I have done this many times, you will receive memory corruption error. AFAIK, I remember i have done some thing like this:-
vector<char> vecMyObj(10);
vecMyObj.resize(10);
sprintf(&vecMyObj[0],"helloworld %d", count);
But when destructor of vector is called, my program receive memory corruption error, if size is less then 10, it will work successfully.
Can you spell Buffer Overflow ? One possible result will be stack corruption, and make your app vulnerable to Stack-based exploitation.

Using sprintf without a manually allocated buffer

In the application that I am working on, the logging facility makes use of sprintf to format the text that gets written to file. So, something like:
char buffer[512];
sprintf(buffer, ... );
This sometimes causes problems when the message that gets sent in becomes too big for the manually allocated buffer.
Is there a way to get sprintf behaviour without having to manually allocate memory like this?
EDIT: while sprintf is a C operation, I'm looking for C++ type solutions (if there are any!) for me to get this sort of behaviour...
You can use asprintf(3) (note: non-standard) which allocates the buffer for you so you don't need to pre-allocate it.
No you can't use sprintf() to allocate enough memory. Alternatives include:
use snprintf() to truncate the message - does not fully resolve your problem, but prevent the buffer overflow issue
double (or triple or ...) the buffer - unless you're in a constrained environment
use C++ std::string and ostringstream - but you'll lose the printf format, you'll have to use the << operator
use Boost Format that comes with a printf-like % operator
I dont also know a version wich avoids allocation, but if C99 sprintfs allows as string the NULL pointer. Not very efficient, but this would give you the complete string (as long as enough memory is available) without risking overflow:
length = snprintf(NULL, ...);
str = malloc(length+1);
snprintf(str, ...);
"the logging facility makes use of sprintf to format the text that gets written to file"
fprintf() does not impose any size limit. If you can write the text directly to file, do so!
I assume there is some intermediate processing step, however. If you know how much space you need, you can use malloc() to allocate that much space.
One technique at times like these is to allocate a reasonable-size buffer (that will be large enough 99% of the time) and if it's not big enough, break the data into chunks that you process one by one.
With the vanilla version of sprintf, there is no way to prevent the data from overwriting the passed in buffer. This is true regardless of wether the memory was manually allocated or allocated on the stack.
In order to prevent the buffer from being overwritten you'll need to use one of the more secure versions of sprintf like sprintf_s (windows only)
http://msdn.microsoft.com/en-us/library/ybk95axf.aspx

memset() causing data abort

I'm getting some strange, intermittent, data aborts (< 5% of the time) in some of my code, when calling memset(). The problem is that is usually doesn't happen unless the code is running for a couple days, so it's hard to catch it in the act.
I'm using the following code:
char *msg = (char*)malloc(sizeof(char)*2048);
char *temp = (char*)malloc(sizeof(char)*1024);
memset(msg, 0, 2048);
memset(temp, 0, 1024);
char *tempstr = (char*)malloc(sizeof(char)*128);
sprintf(temp, "%s %s/%s %s%s", EZMPPOST, EZMPTAG, EZMPVER, TYPETXT, EOL);
strcat(msg, temp);
//Add Data
memset(tempstr, '\0', 128);
wcstombs(tempstr, gdevID, wcslen(gdevID));
sprintf(temp, "%s: %s%s", "DeviceID", tempstr, EOL);
strcat(msg, temp);
As you can see, I'm not trying to use memset with a size larger that what's originally allocated with malloc()
Anyone see what might be wrong with this?
malloc can return NULL if no memory is available. You're not checking for that.
There's a couple of things. You're using sprintf which is inherently unsafe; unless you're 100% positive that you're not going to exceed the size of the buffer, you should almost always prefer snprintf. The same applies to strcat; prefer the safer alternative strncat.
Obviously this may not fix anything, but it goes a long way in helping spot what might otherwise be very annoying to spot bugs.
malloc can return NULL if no memory is
available. You're not checking for
that.
Right you are... I didn't think about that as I was monitoring the memory and it there was enough free. Is there any way for there to be available memory on the system but for malloc to fail?
Yes, if memory is fragmented. Also, when you say "monitoring memory," there may be something on the system which occasionally consumes a lot of memory and then releases it before you notice. If your call to malloc occurs then, there won't be any memory available. -- Joel
Either way...I will add that check :)
wcstombs doesn't get the size of the destination, so it can, in theory, buffer overflow.
And why are you using sprintf with what I assume are constants? Just use:
EZMPPOST" " EZMPTAG "/" EZMPVER " " TYPETXT EOL
C and C++ combines string literal declarations into a single string.
Have you tried using Valgrind? That is usually the fastest and easiest way to debug these sorts of errors. If you are reading or writing outside the bounds of allocated memory, it will flag it for you.
You're using sprintf which is
inherently unsafe; unless you're 100%
positive that you're not going to
exceed the size of the buffer, you
should almost always prefer snprintf.
The same applies to strcat; prefer the
safer alternative strncat.
Yeah..... I mostly do .NET lately and old habits die hard. I likely pulled that code out of something else that was written before my time...
But I'll try not to use those in the future ;)
You know it might not even be your code... Are there any other programs running that could have a memory leak?
It could be your processor. Some CPUs can't address single bytes, and require you to work in words or chunk sizes, or have instructions that can only be used on word or chunk aligned data.
Usually the compiler is made aware of these and works around them, but sometimes you can malloc a region as bytes, and then try to address it as a structure or wider-than-a-byte field, and the compiler won't catch it, but the processor will throw a data exception later.
It wouldn't happen unless you're using an unusual CPU. ARM9 will do that, for example, but i686 won't. I see it's tagged windows mobile, so maybe you do have this CPU issue.
Instead of doing malloc followed by memset, you should be using calloc which will clear the newly allocated memory for you. Other than that, do what Joel said.
NB borrowed some comments from other answers and integrated into a whole. The code is all mine...
Check your error codes. E.g. malloc can return NULL if no memory is available. This could be causing your data abort.
sizeof(char) is 1 by definition
Use snprintf not sprintf to avoid buffer overruns
If EZMPPOST etc are constants, then you don't need a format string, you can just combined several string literals as STRING1 " " STRING2 " " STRING3 and strcat the whole lot.
You are using much more memory than you need to.
With one minor change, you don't need to call memset in the first place. Nothing
really requires zero initialisation here.
This code does the same thing, safely, runs faster, and uses less memory.
// sizeof(char) is 1 by definition. This memory does not require zero
// initialisation. If it did, I'd use calloc.
const int max_msg = 2048;
char *msg = (char*)malloc(max_msg);
if(!msg)
{
// Allocaton failure
return;
}
// Use snprintf instead of sprintf to avoid buffer overruns
// we write directly to msg, instead of using a temporary buffer and then calling
// strcat. This saves CPU time, saves the temporary buffer, and removes the need
// to zero initialise msg.
snprintf(msg, max_msg, "%s %s/%s %s%s", EZMPPOST, EZMPTAG, EZMPVER, TYPETXT, EOL);
//Add Data
size_t len = wcslen(gdevID);
// No need to zero init this
char* temp = (char*)malloc(len);
if(!temp)
{
free(msg);
return;
}
wcstombs(temp, gdevID, len);
// No need to use a temporary buffer - just append directly to the msg, protecting
// against buffer overruns.
snprintf(msg + strlen(msg),
max_msg - strlen(msg), "%s: %s%s", "DeviceID", temp, EOL);
free(temp);