Is it worthwhile checking return codes for methods that should not fail?
For example, I usually do:
char buf[MAXBUF];
snprintf(buf, sizeof(MAXBUF), "%s.%d", str, time);
Is it good practice to check the return code for snprintf even if I know
that MAXBUF is large enough for my purposes? It seems to make sense to do this
even though the code becomes more verbose.
Short Answer: Yes
Long Answer: Yes, because it catches silly mistakes like the one below.
char buf[MAXBUF];
snprintf(buf, sizeof(MAXBUF), "%s.%d", str, time);
// sizeof(MAXBUF) is probably equal to sizeof(int)
The main problem with C code is that people don't actually check the return codes (because they assumed the code could never fail). So the moral of the story is: don't assume; check. It does not actually add much to the code. You should probably exit/abort if things that should never go wrong actually do go wrong, and then you will find them early in the testing cycle.
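For illustration, a minimal sketch of that check-and-abort style, reusing the names from the question (the error message is just an example; it needs <stdio.h> and <stdlib.h>):
char buf[MAXBUF];
int n = snprintf(buf, sizeof buf, "%s.%d", str, time);
if (n < 0 || (size_t)n >= sizeof buf) {   // encoding error or truncated output
    fprintf(stderr, "snprintf failed or truncated\n");
    abort();
}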
C++ solution:
std::stringstream buf;
buf << str << "." << time; // No chance of overflow: the stream grows its buffer as needed.
It depends. Is it possible that either MAXBUF or the format string or the input values are ever going to change in the future? What realistic course of action could your code take if the call were to fail? The answer depends entirely on your application.
One possibility is to simply assert that the return values are as expected, rather than failing silently. This will cost you nothing in production builds, and will add little to the verbosity of your source code.
It would probably be best, just in case, but if you're sure the size of MAXBUF will never be exceeded, then the check only costs a few extra clock cycles.
You are weighing the one-time cost of a simple error check when the code is written against the repeated cost of deciding whether or not to check it depending on context, and, if you don't check, possible production bugs due to misunderstood assumptions or later maintenance by other people.
That's a no-brainer in most cases.
If the buffer is ever somehow too small, then the string will be truncated. You'd hope that your tests will catch this, since you will produce incorrect results.
Then again, checking the return value needn't add much verbosity. As Oli says, assert is cheap:
int result = snprintf(buf, sizeof buf, "%s.%d", str, time);
assert(result >= 0 && result <= (sizeof buf) - 1);
To be honest I wouldn't always check, but it depends why I think str can't be that long. If it's for a really fundamental reason (like it's a filename from a dirent structure, and MAXBUF is defined in terms of MAX_FILENAME), then I probably wouldn't bother. If it's because there's some check elsewhere, or it's the caller's responsibility to pass in a string only of a certain length, then it might be an idea to assert, just on the off-chance of catching someone else's bug some day. Obviously if str is any kind of unchecked external input then it's essential to test.
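As a small illustration of asserting a caller-side contract up front (a sketch only; the 13-byte slack assumes time is formatted as a 32-bit int with %d, and it needs <assert.h> and <string.h>):
char buf[MAXBUF];
// 13 = '.' + sign + up to 10 digits + terminating '\0'
assert(strlen(str) + 13 <= sizeof buf);
snprintf(buf, sizeof buf, "%s.%d", str, time);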
Related
I understand that at runtime, const char text[] = "some char array" is the same as const char text[16] = "some char array".
Is there any difference in compile time? I reckon there would be, as telling the compiler how many elements there are in the array ought to be faster than it counting the number of elements. I understand that, even if it makes a difference, it will be inconceivably small, but curiosity got the better of me, so I thought I'd ask.
You should favour readable code that reduces cognitive load over code that might compile faster. Your assumption that something compiles faster could as well be the other way round (see below). You simply do not know the compiler implementation and in general, the difference in compile speeds is negligible.
In the case of a constant string (that you never reassign a value to), I would omit the length, as it adds clutter and the compiler is perfectly able to determine the length it needs.
You can also reason that adding the number is slower. After all, the compiler needs to parse the string and thus knows its length anyway. Adding the 16 in the declaration forces the compiler to also parse a number and check whether the string ain't too long. That might make compilation slower. Who knows? But again: the difference is likely negligible, compared to all the other wonders that compilers do (and quite efficiently). So don't worry about it.
I am using a lot of string functions like strncpy, strncat, sprintf etc. in my code. I know there are better alternatives to these, but I was handed an old project where these functions were used, so I have to stick with them for compatibility and consistency. My supervisor is very fussy about error checking and robustness, and insists that I check for buffer-overflow violations every time I use these functions. This has created a lot of if-else statements in my code, which do not look pretty. My question is: is it really necessary to check for overflow every time I call one of these functions? Even if I know that a buffer overflow can't possibly occur, e.g. when storing an integer in a string using the sprintf function
sprintf(buf,"%d",someInteger);
I know that the maximum length of an unsigned integer on a 64-bit system can be 20 digits. buf on the other hand is well over 20 characters long. Should I still check for buffer overflow in this case?
I think the way to go is using exceptions. Exceptions are very useful when you must decouple the normal control-flow of a program and error-checking.
What you can do is create a wrapper for every string function in which you perform the error checking and throw an exception if a buffer overflow would occur.
Then, in your client code, you can simply call your wrappers inside a try block, and then check for exceptions and return error codes inside the catch block.
Sample code (not tested):
int sprintf_wrapper( char *buffer, int buffer_size, const char *format, ... )
{
    va_list arg_ptr;
    va_start( arg_ptr, format );
    // vsnprintf never writes past buffer_size; its return value tells us
    // how long the full output would have been.
    int ret = vsnprintf( buffer, buffer_size, format, arg_ptr );
    va_end( arg_ptr );
    if( ret < 0 || ret >= buffer_size )   // error or truncated output
        throw my_buffer_exception();
    return ret;
}
Error foo()
{
//...
try{
sprintf_wrapper(buf1, 100, "%d", i1);
sprintf_wrapper(buf2, 100, "%d", i2);
sprintf_wrapper(buf3, 100, "%d", i3);
}
catch( my_buffer_exception& )
{
return err_code;
}
}
Maybe write a test case that you can invoke to simply test the buffer to reduce code duplication and ugliness.
You could abstract the if/else statements into a method of another class, and then pass in the buffer and length expected.
By nature, these buffers are VERY susceptible to overwrites, so be careful ANYTIME you take input from a user/outside source. You could also try getting the string length (using strlen), or checking for the '\0' terminator yourself, and comparing that to the buffer size. If you loop looking for the '\0' character and it's not there, you will get into an infinite loop unless you constrain the maximum size of your loop by the expected buffer size, so check for this too.
Another option is to refactor the code so that every time those functions are used, you replace them with a length-safe version you write that performs those checks (but you have to pass the buffer size to it), as in the sketch below. This may not be possible for some projects, as the complexity may be very hard to unit test.
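For illustration, a minimal sketch of such a length-safe wrapper around strncpy (the name safe_copy and its truncation policy are my own assumptions, not something from the original project):
#include <cstddef>
#include <cstring>

// Copies src into dst (dst_size bytes total), always null-terminating,
// and reports whether the whole string fit.
bool safe_copy(char *dst, std::size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return false;                        // nothing we can safely write
    std::size_t len = std::strlen(src);
    std::strncpy(dst, src, dst_size - 1);    // leave room for the terminator
    dst[dst_size - 1] = '\0';
    return len < dst_size;                   // false means the copy was truncated
}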
Let me address your last paragraph first: You write code once, in contrast to how long it will be maintained and used. Guess how long you think your code will be in use, and then multiply that by 10-20 to figure out how long it will actually be in use. At the end of that window it's completely likely that an integer could be much bigger and overflow your buffer, so yes, you must do buffer checking.
Given that you have a few options:
Use the "n" series of functions like snprintf to prevent buffer overflows and tell your users that it's undefined what will happen if the buffers overflow.
Consider it fatal and either abort() or throw an uncaught exception when a length violation occurs.
Try to notify the user there's a problem and either abort the operation or attempt to let the user modify input and retry.
The first two approaches are definitely going to be easier to implement and maintain because you don't have to worry about getting the right information back to the user in a reasonable way. In any of these cases, the check could most likely be factored into a function as suggested in other answers.
Finally, let me say: since you've tagged this question C++ and not C, think long and hard about slowly migrating your code base to C++ (because your code base is C right now) and utilizing the C++ facilities, such as std::string and string streams, which remove the need for these buffer checks entirely because the buffers grow automatically.
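For example, a minimal sketch (the function name is illustrative only): std::ostringstream grows its buffer as needed, so there is no fixed-size buffer to overflow:
#include <sstream>
#include <string>

std::string int_to_string(int someInteger)
{
    std::ostringstream out;
    out << someInteger;      // the stream allocates whatever space it needs
    return out.str();
}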
You can compile with gcc using -D_FORTIFY_SOURCE=1 or -D_FORTIFY_SOURCE=2 (together with optimization, e.g. -O2) for buffer overflow detection.
https://securityblog.redhat.com/2014/03/26/fortify-and-you/
I've recently heard that in some cases, programmers believe that you should never use literals in your code. I understand that in some cases, assigning a variable name to a given number can be helpful (especially in terms of maintenance if that number is used elsewhere). However, consider the following case studies:
Case Study 1: Use of Literals for "special" byte codes.
Say you have an if statement that checks for a specific value stored in (for the sake of argument) a uint16_t. Here are the two code samples:
Version 1:
// Descriptive comment as to why I'm using 0xBEEF goes here
if (my_var == 0xBEEF) {
//do something
}
Version 2:
const uint16_t kSuperDescriptiveVarName = 0xBEEF;
if (my_var == kSuperDescriptiveVarName) {
// do something
}
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once. Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
Case Study 2: Use of sizeof
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns. Take the two code examples into account. The scenario is that you are computing the offset into a packet buffer (an array of uint8_t) where the first part of the packet is stored as my_packet_header, which let's say is a uint32_t.
Version 1:
const int offset = sizeof(my_packet_header);
Version 2:
const int offset = 4; // good comment telling reader where 4 came from
Clearly, version 1 is preferred, but what about for cases where you have multiple data fields to skip over? What if you have the following instead:
Version 1:
const int offset = sizeof(my_packet_header) + sizeof(data_field1) + sizeof(data_field2) + ... + sizeof(data_fieldn);
Version 2:
const int offset = 47;
Which is preferred in this case? Does it still make sense to show all the steps involved with computing the offset or does the literal usage make sense here?
Thanks for the help in advance as I attempt to better my code practices.
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
Sounds like you understand the main point... factoring values (and their comments) that are used in multiple places. Further, it can sometimes help to have a group of constants in one place - so their values can be inspected, verified, modified etc. without concern for where they're used in the code. Other times, there are many constants used in proximity and the comments needed to properly explain them would obfuscate the code in which they're used.
Countering that, having a const variable means all the programmers studying the code will wonder whether it's used anywhere else, keeping it in mind as they inspect the rest of the scope in which it's declared, etc. The fewer unnecessary things there are to remember, the surer their understanding of the important parts of the code will be.
Like so many things in programming, it's "an art" balancing the pros and cons of each approach, and best guided by experience and knowledge of the way the code's likely to be studied, maintained, and evolved.
Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
There are no performance implications in optimised code.
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns.
And other reasons too. A big factor in good programming is reducing the points of maintenance when changes are done. If you can modify the type of a variable and know that all the places using that variable will adjust accordingly, that's great - saves time and potential errors. Using sizeof helps with that.
Which is preferred [for calculating offsets in a struct]? Does it still make sense to show all the steps involved with computing the offset or does the literal usage make sense here?
The offsetof macro (#include <cstddef>) is better for this... again reducing maintenance burden. With the this + that approach you illustrate, if the compiler decides to use any padding your offset will be wrong, and further you have to fix it every time you add or remove a field.
Ignoring the offsetof issues and just considering your this + that example as an illustration of a more complex value to assign, again it's a balancing act. You'd definitely want some explanation/comment/documentation re intent here (are you working out the binary size of earlier fields? calculating the offset of the next field?, deliberately missing some fields that might not be needed for the intended use or was that accidental?...). Still, a named constant might be enough documentation, so it's likely unimportant which way you lean....
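For illustration, a minimal sketch of offsetof; the packet layout here is hypothetical, just to show the idea:
#include <cstddef>   // offsetof
#include <cstdint>

struct Packet {
    std::uint32_t my_packet_header;
    std::uint16_t data_field1;
    std::uint8_t  data_field2;
    std::uint8_t  payload[32];
};

// Tracks the struct automatically, padding included; no hand-counted literal.
const std::size_t offset = offsetof(Packet, payload);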
In every example you list, I would go with the name.
In your first example, you almost certainly used that special 0xBEEF number at least twice - once to write it and once to do your comparison. If you didn't write it, that number is still part of a contract with someone else (perhaps a file format definition).
In the last example, it is especially useful to show the computation that yielded the value. That way, if you encounter trouble down the line, you can easily see either that the number is trustworthy, or what you missed and fix it.
There are some cases where I prefer literals over named constants though. These are always cases where a name is no more meaningful than the number. For example, you have a game program that plays a dice game (perhaps Yahtzee), where there are specific rules for specific die rolls. You could define constants for One = 1, Two = 2, etc. But why bother?
Generally it is better to use a name instead of a value. After all, if you need to change it later, you can find it more easily. Also it is not always clear why this particular number is used, when you read the code, so having a meaningful name assigned to it, makes this immediately clear to a programmer.
Performance-wise there is no difference, because the optimizers should take care of it. And it is rather unlikely, even if there would be an extra instruction generated, that this would cause you troubles. If your code would be that tight, you probably shouldn't rely on an optimizer effect anyway.
I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
I think kSuperDescriptiveVarName will definitely be used more than once: once for the check and at least once for the assignment, maybe in different parts of your program.
There will be no difference in performance, since an optimization called Constant Propagation exists in almost all compilers. Just enable optimization for your compiler.
I have a super-simple class representing a decimal # with fixed precision, and when I want to format it I do something like this:
assert(d.DENOMINATOR == 1000000);
char buf[100];
sprintf(buf, "%d.%06d", d._value / d.DENOMINATOR, d._value % d.DENOMINATOR);
Astonishingly (to me at least) this does not work. The %06d term comes out all 0s even when d.DENOMINATOR does not evenly divide d._value. And if I throw an extra %d in the format string, I see the right value show up in the third spot -- so it's like something is secretly creating an extra argument between my two.
If I compute the two terms outside of the call to sprintf, everything behaves how I expect. I thought to reproduce this with a more simple test case:
char testa[200];
char testb[200];
int x = 12345, y = 1000;
sprintf(testa, "%d.%03d", x/y, x%y);
int term1 = x/y, term2 = x%y;
sprintf(testb, "%d.%03d", term1, term2);
...but this works properly. So I'm completely baffled as to exactly what's going on, how to avoid it in the future, etc. Can anyone shed light on this for me?
(EDIT: Problem ended up being that d._value and d.DENOMINATOR are both long longs so %d doesn't suffice. Thanks very much to Serge's comment below which pointed to the problem, and Mark's answer submitted shortly thereafter.)
Almost certainly your term components are a 64-bit type (perhaps long on a 64-bit system) which is getting passed into the non-type-safe sprintf. Thus when you create an intermediate int the size is right and it works fine.
g++ will warn about this and many other useful things with -Wall. The preferred solution is of course to use C++ iostreams for your formatting as they're totally type safe.
The alternate solution is to cast the result of your expression to the type that you told sprintf to expect so it pulls the proper number of bytes out of memory.
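For illustration, a sketch of both fixes, assuming d._value and d.DENOMINATOR are long long as noted in the edit above:
// Option 1: cast the expressions down to the int that "%d" promises
sprintf(buf, "%d.%06d",
        (int)(d._value / d.DENOMINATOR),
        (int)(d._value % d.DENOMINATOR));

// Option 2: keep the long long values and fix the format string instead
sprintf(buf, "%lld.%06lld",
        d._value / d.DENOMINATOR,
        d._value % d.DENOMINATOR);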
Finally, never use sprintf when almost every compiler supports snprintf which prevents all sorts of silly mistakes. Your code is fine now but when someone modifies it later and it runs off the end of the buffer you may spend days tracking down the corruption.
Every now and then, especially when doing 64bit builds of some code base, I notice that there are plenty of cases where integer overflows are possible. The most common case is that I do something like this:
// Creates a QPixmap out of some block of data; this function comes from library A
QPixmap createFromData( const char *data, unsigned int len );
const std::vector<char> buf = createScreenShot();
return createFromData( &buf[0], buf.size() ); // <-- warning here in 64bit builds
The thing is that std::vector::size() nicely returns a size_t (which is 8 bytes in 64bit builds) but the function happens to take an unsigned int (which is still only 4 bytes in 64bit builds). So the compiler warns correctly.
If possible, I try to fix up the signatures to use the correct types in the first place. However, I'm often hitting this problem when combining functions from different libraries which I cannot modify. Unfortunately, I often resort to some reasoning along the lines of "Okay, nobody will ever do a screenshot generating more than 4GB of data, so why bother" and just change the code to do
return createFromData( &buf[0], static_cast<unsigned int>( buf.size() ) );
So that the compiler shuts up. However, this feels really evil. So I've been considering having some sort of runtime assertion which at least yields a nice error in the debug builds, as in:
assert( buf.size() <= std::numeric_limits<unsigned int>::max() );
This is a bit nicer already, but I wonder: how do you deal with this sort of problem, that is: integer overflows which are "almost" impossible (in practice). I guess that means that they don't occur for you, they don't occur for QA - but they explode in the face of the customer.
If you can't fix the types (because you can't break library compatibility), and you're "confident" that the size will never get that big, you can use boost::numeric_cast in place of the static_cast. This will throw an exception if the value is too big.
Of course the surrounding code then has to do something vaguely sensible with the exception - since it's a "not expected ever to happen" condition, that might just mean shutting down cleanly. Still better than continuing with the wrong size.
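A minimal sketch of that approach, assuming the createFromData signature from the question:
#include <boost/numeric/conversion/cast.hpp>

// Throws boost::numeric::bad_numeric_cast (derived from std::bad_cast)
// if buf.size() does not fit into an unsigned int.
return createFromData( &buf[0], boost::numeric_cast<unsigned int>( buf.size() ) );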
The solution depends on context. In some cases, you know where the data comes from, and can exclude overflow: an int that is initialized with 0 and incremented once a second, for example, isn't going to overflow anytime in the lifetime of the machine. In such cases, you just convert (or allow the implicit conversion to do its stuff), and don't worry about it.
Another type of case is fairly similar: in your case, for example, it's probably not reasonable for a screen shot to have more data than can be represented by an int, so the conversion is also safe. Provided the data really did come from a screen shot; in such cases, the usual procedure is to validate the data on input, ensuring that it fulfills your constraints downstream, and then do no further validation.
Finally, if you have no real control over where the data is coming from, and can't validate on entry (at least not for your constraints downstream), you're stuck with using some sort of checking conversion, validating immediately at the point of conversion.
If you push an overflowing 64-bit value into a 32-bit library you open Pandora's box: undefined behaviour.
Throw an exception. Since exceptions can in general spring up almost anywhere, you should have suitable code to catch them anyway. Given that, you may as well exploit it.
Error messages are unpleasant but they're better than undefined behaviour.
Such scenarios can be handled in one of four ways, or using a combination of them:
use the right types
use static assertions
use runtime assertions
ignore it until it hurts
Usually the best approach is to use the right types, right up until your code gets ugly, and then roll in static assertions. Static assertions are much better than runtime assertions for this very purpose.
Finally, when static assertions won't work (like in your example), you use runtime assertions. Yes, they get into customers' faces, but at least your program behaves predictably. Yes, customers don't like assertions - they panic ("we have an error!" in all caps) - but without an assertion the program would likely misbehave, and there would be no easy way to diagnose the problem.
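For illustration, a minimal sketch of the two kinds of assertion (the types are examples only; in the question's case only a runtime check can help, because the value of buf.size() is not known at compile time):
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Static assertion: checked at compile time, costs nothing at run time.
// Useful when the property depends only on the types involved.
static_assert(sizeof(std::int64_t) >= sizeof(int),
              "int must fit into int64_t on this platform");

// Runtime assertion: needed when only the actual value decides,
// as with size_t versus unsigned int in the question.
unsigned int narrow_size(std::size_t n)
{
    assert(n <= std::numeric_limits<unsigned int>::max());
    return static_cast<unsigned int>(n);
}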
One thing just came to my mind: since I need some sort of runtime check (whether or not the value of e.g. buf.size() exceeds the range of unsigned int can only be tested at runtime), but I do not want to have a million assert() invocations everywhere, I could do something like
template <typename T, typename U>
T integer_cast( U v ) {
    assert( v <= std::numeric_limits<T>::max() );   // needs <cassert> and <limits>
    return static_cast<T>( v );
}
That way, I would at least have the assertion centralized, and
return createFromData( &buf[0], integer_cast<unsigned int>( buf.size() ) );
is a tiny bit better. Maybe I should rather throw an exception (it is quite exceptional indeed!) instead of asserting, to give the caller a chance to handle the situation gracefully by rolling back previous work and issuing diagnostic output or the like.
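For illustration, a minimal sketch of that throwing variant (the name checked_cast and the exception type are my own choices; it also assumes both types are unsigned, as in the size_t to unsigned int case above):
#include <limits>
#include <stdexcept>

template <typename T, typename U>
T checked_cast( U v )
{
    if ( v > std::numeric_limits<T>::max() )
        throw std::range_error( "checked_cast: value does not fit in the target type" );
    return static_cast<T>( v );
}

// e.g. return createFromData( &buf[0], checked_cast<unsigned int>( buf.size() ) );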