String error-checking - c++

I am using a lot of string functions like strncpy, strncat, sprintf etc. in my code. I know there are better alternatives to these, but I was handed over an old project, where these functions were used, so I have to stick with them for compatibility and consistency. My supervisor is very fussy about error checking and robustness, and insists that I check for buffer-overflow violations everytime I use these functions. This has created a lot of if-else statements in my code, which do not look pretty. My question is, is it really necessary to check for overflow everytime I call one of these functions? Even if I know that a buffer overflow can't possibly occur e.g. when storing an integer in a string using the sprintf function
sprintf(buf,"%d",someInteger);
I know that the maximum length of an unsigned integer on a 64-bit system can be 20 digits. buf on the other hand is well over 20 characters long. Should I still check for buffer overflow in this case?

I think the way to go is using exceptions. Exceptions are very useful when you must decouple the normal control-flow of a program and error-checking.
What you can do is create a wrapper for every string function in which you perform the error checking and throw an exception if a buffer overflow would occur.
Then, in your client code, you can simply call your wrappers inside a try block, and then check for exceptions and return error codes inside the catch block.
Sample code (not tested):
int sprintf_wrapper( char *buffer, int buffer_size, const char *format, ... )
{
if( /* check for buffer overflow */ )
throw my_buffer_exception;
va_list arg_ptr;
va_start( arg_ptr, format );
int ret = sprintf( buffer, , format, arg_ptr );
va_end(arg_ptr);
return ret;
}
Error foo()
{
//...
try{
sprintf_wrapper(buf1, 100, "%d", i1);
sprintf_wrapper(buf2, 100, "%d", i2);
sprintf_wrapper(buf3, 100, "%d", i3);
}
catch( my_buffer_exception& )
{
return err_code;
}
}

Maybe write a test case that you can invoke to simply test the buffer to reduce code duplication and ugliness.
You could abstract the if/else statements into a method of another class, and then pass in the buffer and length expected.
By nature, these buffers are VERY susceptible to overwrites, so be careful ANYTIME you take input in from a user/outside source. You could also try getting a string length (using strlen), or checking for the /0 end string character yourself, and comparing that to the buffer size. If you loop for the /0 character,and it's not there, you will get into an infinite loop if you don't constrain the max size of your loop by the expected buffer size, so check for this too.
Another option, is to refactor code, such that every time those methods are used, you replace them with a length safe version you write, where it calls a method with those checks already in place (but have to pass the buffer size to it). This may not be possible for some projects, as the complexity may be very hard to unit test.

Let me address your last paragraph first: You write code once, in contrast to how long it will be maintained and used. Guess how long you think your code will be in use, and then multiply that by 10-20 to figure out how long it will actually be in use. At the end of that window it's completely likely that an integer could be much bigger and overflow you buffer, so yes you must do buffer checking.
Given that you have a few options:
Use the "n" series of functions like snprintf to prevent buffer overflows and tell your users that it's undefined what will happen if the buffers overflow.
Consider it fatal and either abort() or throw an uncaught exception when a length violation occurs.
Try to notify the user there's a problem and either abort the operation or attempt to let the user modify input and retry.
The first two approaches are definitely going to be easier to implement and maintain because you don't have to worry about getting the right information back to the user in a reasonable way. In any of the cases you could most likely factored into a function as suggested in other answers.
Finally let me say since you've tagged this question C++ and not C, think long and hard about slowly migrating your code base to C++ (because your code base is C right now) and utilize the C++ facilities which then totally remove the need for these buffer checks, as it will happen automatically for you.

You can use gcc "-D_FORTIFY_SOURCE=1 D_FORTIFY_SOURCE=2" for buffer overflow detection.
https://securityblog.redhat.com/2014/03/26/fortify-and-you/

Related

vswprintf() without buffer size crashes on small buffer instead of EOF. How to pass buffer size

Using Borland C++ Builder 2009
I use vswprintf per RAD Studio's Help (F1) :
int vswprintf(wchar_t *buffer, const wchar_t *format, va_list arglist);
Up to now I always provided a big buffer wchar_t OutputStr[1000] and never had any issues. As a test and wanting to do an improvement action, I tried a small buffer wchar_t OutputStr[12] and noticed that the program crashes entirely. Even try{}catch(...){} doesn't catch it. Codeguard reports that a memcpy() fails, which seems to be the internal implementation. I had expected an EOF as return value instead.
When searching online for vswprintf I find the c++ variant takes a buffer size as input, but I can't seem to convince my compiler to use that variant ?
Any idea how to force it using BCB2009 ?
The whole point of the exercise was to implement a fall back scenario for when the buffer is too small in possibly one or two freak situations, so that I can allocate more memory for the function and try again. But this mechanism doesn't seem to work at all.
Not sure how to best test for the exact amount of bytes/characters needed either ?
You can use vswprintf_s. It returns negative value on failure

Does buffer overflow happen in C++ strings?

This is concerning strings in C++. I have not touched C/C++ for a very long time; infact I did programming in those languages only for the first year in my college, about 7 years ago.
In C to hold strings I had to create character arrays(whether static or dynamic, it is not of concern). So that would mean that I need to guess well in advance the size of the string that an array would contain. Well I applied the same approach in C++. I was aware that there was a std::string class but I never got around to use it.
My question is that since we never declare the size of an array/string in std::string class, does a buffer overflow occur when writing to it. I mean, in C, if the array’s size was 10 and I typed more than 10 characters on the console then that extra data would be writtein into some other object’s memory place, which is adjacent to the array. Can a similar thing happen in std::string when using the cin object.
Do I have to guess the size of the string before hand in C++ when using std::string?
Well! Thanks to you all. There is no one right answer on this page (a lot of different explanations provided), so I am not selecting any single one as such. I am satisfied with the first 5. Take Care!
Depending on the member(s) you are using to access the string object, yes. So, for example, if you use reference operator[](size_type pos) where pos > size(), yes, you would.
Assuming no bugs in the standard library implementation, no. A std::string always manages its own memory.
Unless, of course, you subvert the accessor methods that std::string provides, and do something like:
std::string str = "foo";
char *p = (char *)str.c_str();
strcpy(p, "blah");
You have no protection here, and are invoking undefined behaviour.
The std::string generally protects against buffer overflow, but there are still situations in which programming errors can lead to buffer overflows. While C++ generally throws an out_of_range exception when an operation references memory outside the bounds of the string, the subscript operator [] (which does not perform bounds checking) does not.
Another problem occurs when converting std::string objects to C-style strings. If you use string::c_str() to do the conversion, you get a properly null-terminated C-style string. However, if you use string::data(), which writes the string directly into an array (returning a pointer to the array), you get a buffer that is not null terminated. The only difference between c_str() and data() is that c_str() adds a trailing null byte.
Finally, many existing C++ programs and libraries have their own string classes. To use these libraries, you may have to use these string types or constantly convert back and forth. Such libraries are of varying quality when it comes to security. It is generally best to use the standard library (when possible) or to understand the semantics of the selected library. Generally speaking, libraries should be evaluated based on how easy or complex they are to use, the type of errors that can be made, how easy these errors are to make, and what the potential consequences may be.
refer https://buildsecurityin.us-cert.gov/bsi/articles/knowledge/coding/295-BSI.html
In c the cause is explained as follow:
void function (char *str) {
char buffer[16];
strcpy (buffer, str);
}
int main () {
char *str = "I am greater than 16 bytes"; // length of str = 27 bytes
function (str);
}
This program is guaranteed to cause unexpected behavior, because a string (str) of 27 bytes has been copied to a location (buffer) that has been allocated for only 16 bytes. The extra bytes run past the buffer and overwrites the space allocated for the FP, return address and so on. This, in turn, corrupts the process stack. The function used to copy the string is strcpy, which completes no checking of bounds. Using strncpy would have prevented this corruption of the stack. However, this classic example shows that a buffer overflow can overwrite a function's return address, which in turn can alter the program's execution path. Recall that a function's return address is the address of the next instruction in memory, which is executed immediately after the function returns.
here is a good tutorial that can give your answer satisfactory.
In C++, the std::string class starts out with a minimum size (or you can specify a starting size). If that size is exceeded, std::string allocates more dynamic memory.
Assuming the library providing std::string is correctly written, you cannot cause a buffer overflow by adding characters to a std::string object.
Of course, bugs in the library are not impossible.
"Do buffer overflows occur in C++ code?"
To the extent that C programs are legal C++ code (they almost all are), and C programs have buffer overflows, C++ programs can have buffer overflows.
Being richer than C, I'm sure C++ can have buffer overflows in ways that C cannot :-}

(How) do you handle possible integer overflows in C++ code?

Every now and then, especially when doing 64bit builds of some code base, I notice that there are plenty of cases where integer overflows are possible. The most common case is that I do something like this:
// Creates a QPixmap out of some block of data; this function comes from library A
QPixmap createFromData( const char *data, unsigned int len );
const std::vector<char> buf = createScreenShot();
return createFromData( &buf[0], buf.size() ); // <-- warning here in 64bit builds
The thing is that std::vector::size() nicely returns a size_t (which is 8 bytes in 64bit builds) but the function happens to take an unsigned int (which is still only 4 bytes in 64bit builds). So the compiler warns correctly.
If possible, I try to fix up the signatures to use the correct types in the first place. However, I'm often hitting this problem when combining functions from different libraries which I cannot modify. Unfortunately, I often resort to some reasoning along the lines of "Okay, nobody will ever do a screenshot generating more than 4GB of data, so why bother" and just change the code to do
return createFromData( &buf[0], static_cast<unsigned int>( buf.size() ) );
So that the compiler shuts up. However, this feels really evil. So I've been considering to have some sort of runtime assertion which at least yields a nice error in the debug builds, as in:
assert( buf.size() < std::numeric_limits<unsigned int>::maximum() );
This is a bit nicer already, but I wonder: how do you deal with this sort of problem, that is: integer overflows which are "almost" impossible (in practice). I guess that means that they don't occur for you, they don't occur for QA - but they explode in the face of the customer.
If you can't fix the types (because you can't break library compatibility), and you're "confident" that the size will never get that big, you can use boost::numeric_cast in place of the static_cast. This will throw an exception if the value is too big.
Of course the surrounding code then has to do something vaguely sensible with the exception - since it's a "not expected ever to happen" condition, that might just mean shutting down cleanly. Still better than continuing with the wrong size.
The solution depends on context. In some cases, you know where the data
comes from, and can exclude overflow: an int that is initialized with
0 and incremented once a second, for example, isn't going to overflow
anytime in the lifetime of the machine. In such cases, you just convert
(or allow the implicit conversion to do its stuff), and don't worry
about it.
Another type of case is fairly similar: in your case, for example, it's
probably not reasonable for a screen schot to have more data that can be
represented by an int, so the conversion is also safe. Provided the
data really did come from a screen shot; in such cases, the usual
procedure is to validate the data on input, ensuring that it fulfills
your constraints downstream, and then do no further validation.
Finally, if you have no real control over where the data is coming from,
and can't validate on entry (at least not for your constraints
downstream), you're stuck with using some sort of checking conversion,
validating immediately at the point of conversion.
If you push a 64-bit overflowing number into a 32-bit library you open pandora's box -- undefined behaviour.
Throw an exception. Since exceptions can in general spring up arbitrarily anywhere you should have suitable code to catch it anyway. Given that, you may as well exploit it.
Error messages are unpleasant but they're better than undefined behaviour.
Such scenarios can be held in one of four ways or using a combination of them:
use right types
use static assertions
use runtime assertions
ignore until hurts
Usually the best is to use right types right until your code gets ugly and then roll in static assertions. Static assertions are much better than runtime assertions for this very purpose.
Finally, when static assertions won't work (like in your example) you use runtime assertions - yes, they get into customers' faces, but at least your program behaves predictably. Yes, customers don't like assertions - they start panic ("we have error!" in all caps), but without an assertion the program would likely misbehave and no way to easily diagnose the problem would be.
One thing just came to my mind: since I need some sort of runtime check (whether or not the value of e.g. buf.size() exceeds the range of unsigned int can only be tested at runtime), but I do not want to have a million assert() invocations everywhere, I could do something like
template <typename T, typename U>
T integer_cast( U v ) {
assert( v < std::numeric_limits<T>::maximum() );
return static_cast<T>( v );
}
That way, I would at least have the assertion centralized, and
return createFromData( &buf[0], integer_cast<unsigned int>( buf.size() ) );
Is a tiny bit better. Maybe I should rather throw an exception (it is quite exceptional indeed!) instead of assert'ing, to give the caller a chance to handle the situation gracefully by rolling back previous work and issueing diagnostic output or the like.

C++ checking return codes

Is it worthwhile checking return codes for methods that should not fail ?
For example, I usually do:
char buf[MAXBUF];
snprintf(buf, sizeof(MAXBUF), "%s.%d", str, time);
Is it good practice to check the return code for snprintf even if I know
that MAXBUF is large enough for my purposes? It seems to make sense to do this
even though the code becomes more verbose.
Short Answer: Yes
Long Answer: Yes because it catches silly mistakes like the below.
char buf[MAXBUF];
snprintf(buf, sizeof(MAXBUF), "%s.%d", str, time);
// sizeof(MAXBUF) is probably equal to sizeof(int)
The main problem with C code is that people don't actually check the return codes (because they thought the code could never fail). So the moral of the story is don't assume and check. It does not actually add much to the code. You should probably exit/abort if things that should not go wrong actually go wrong and then you will find them early in the testing cycle.
C++ solution:
std::stringstream buf;
buf << str << "." << time; // No chance of error as compiler does the work.
It depends. Is it possible that either MAXBUF or the format string or the input values are ever going to change in the future? What realistic course of action could your code take if the call were to fail? The answer depends entirely on your application.
One possibility is to simply assert that the return values are as expected, rather than failing silently. This will cost you nothing in production builds, and will add little to the verbosity of your source code.
It would probably be best, just incase, but if your sure the size of MAXBUF will never be exceeded then it will only be an added few clock cycles.
You are weighing the onetime cost of a simple error check when the code is written, versus the repeated cost of deciding whether or not to check it depending on context, and if not, possible production bugs due to misunderstood assumptions or later maintenance by other people.
That's a no brainer in most cases.
If the buffer ever is somehow too small, then the string will be truncated. You'd hope that your tests will catch this, since you will produce incorrect results.
Then again, checking the return value needn't add much verbosity. As Oli says, assert is cheap:
int result = snprintf(buf, sizeof buf, "%s.%d", str, time);
assert(result >= 0 && result <= (sizeof buf) - 1);
To be honest I wouldn't always check, but it depends why I think str can't be that long. If it's for a really fundamental reason (like it's a filename from a dirent structure, and MAXBUF is defined in terms of MAX_FILENAME), then I probably wouldn't bother. If it's because there's some check elsewhere, or it's the caller's responsibility to pass in a string only of a certain length, then it might be an idea to assert, just on the off-chance of catching someone else's bug some day. Obviously if str is any kind of unchecked external input then it's essential to test.

c++ what happens if you print more characters with sprintf, than the char pointer has allocated?

I assume this is a common way to use sprintf:
char pText[x];
sprintf(pText, "helloworld %d", Count );
but what exactly happens, if the char pointer has less memory allocated, than it will be print to?
i.e. what if x is smaller than the length of the second parameter of sprintf?
i am asking, since i get some strange behaviour in the code that follows the sprintf statement.
It's not possible to answer in general "exactly" what will happen. Doing this invokes what is called Undefined behavior, which basically means that anything might happen.
It's a good idea to simply avoid such cases, and use safe functions where available:
char pText[12];
snprintf(pText, sizeof pText, "helloworld %d", count);
Note how snprintf() takes an additional argument that is the buffer size, and won't write more than there is room for.
This is a common error and leads to memory after the char array being overwritten. So, for example, there could be some ints or another array in the memory after the char array and those would get overwritten with the text.
See a nice detailed description about the whole problem (buffer overflows) here. There's also a comment that some architectures provide a snprintf routine that has a fourth parameter that defines the maximum length (in your case x). If your compiler doesn't know it, you can also write it yourself to make sure you can't get such errors (or just check that you always have enough space allocated).
Note that the behaviour after such an error is undefined and can lead to very strange errors. Variables are usually aligned at memory locations divisible by 4, so you sometimes won't notice the error in most cases where you have written one or two bytes too much (i.e. forget to make place for a NUL), but get strange errors in other cases. These errors are hard to debug because other variables get changed and errors will often occur in a completely different part of the code.
This is called a buffer overrun.
sprintf will overwrite the memory that happens to follow pText address-wise. Since pText is on the stack, sprintf can overwrite local variables, function arguments and the return address, leading to all sorts of bugs. Many security vulnerabilities result from this kind of code — e.g. an attacker uses the buffer overrun to write a new return address pointing to his own code.
The behaviour in this situation is undefined. Normally, you will crash, but you might also see no ill effects, strange values appearing in unrelated variables and that kind of thing. Your code might also call into the wrong functions, format your hard-drive and kill other running programs. It is best to resolve this by allocating more memory for your buffer.
I have done this many times, you will receive memory corruption error. AFAIK, I remember i have done some thing like this:-
vector<char> vecMyObj(10);
vecMyObj.resize(10);
sprintf(&vecMyObj[0],"helloworld %d", count);
But when destructor of vector is called, my program receive memory corruption error, if size is less then 10, it will work successfully.
Can you spell Buffer Overflow ? One possible result will be stack corruption, and make your app vulnerable to Stack-based exploitation.