Safe Use of strcpy - c++

Plain old strcpy is prohibited by our company's coding standard because of its potential for buffer overflows. I was looking at the source of a third-party library that we link against in our code. The library source uses strcpy like this:
for (int i = 0; i < newArgc; i++)
{
newArgv[i] = new char[strlen(argv[i]) + 1];
strcpy(newArgv[i], argv[i]);
}
Since strlen is used while allocating memory for the destination buffer, this looks fine. Is there any possible way someone could exploit this plain strcpy, or is it as safe as I think it looks?
I have seen naïve uses of strcpy that lead to buffer overflows, but this does not seem to be one of them, since it always allocates exactly the right amount of space for the buffer using strlen and then copies into that buffer from argv[], which should always be null-terminated.
I am honestly curious whether someone running this code under a debugger could exploit it, or whether there are any other tactics an attacker targeting our binary (which links against the compiled version of this library) could use to exploit this use of strcpy. Thank you for your input and expertise.

It is possible to use strcpy safely - it's just quite hard work (which is why your coding standards forbid it).
However, the code you have posted is not a vulnerability. There is no way to overwrite memory with it; I would not bother rewriting it. (If you do decide to rewrite it, use std::string instead.)
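For illustration, a minimal sketch of what the loop might look like with std::string; the function name copyArgs is my own, not something from the library:
#include <string>
#include <vector>
// Sketch only: copy the first newArgc entries of argv into owning std::strings.
std::vector<std::string> copyArgs(int newArgc, char** argv)
{
    return std::vector<std::string>(argv, argv + newArgc);
}
Each element's c_str() then yields a null-terminated char* whenever the library needs one, and cleanup is automatic.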

Well, there are multiple problems with that code:
If an allocation throws, you get a memory leak.
Using strcpy() instead of reusing the length is sub-optimal. Use std::copy_n() or memcpy() instead.
Presumably there are no data races, though we can't tell from this snippet.
Anyway, that slight drop in performance is the only thing "wrong" with using strcpy() there, at least if you insist on managing your strings manually.
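If you do keep the manual management, a rough sketch of those first two points (reusing the measured length, and noting the leak) could look like this; the function name is mine:
#include <cstddef>
#include <cstring>   // std::strlen, std::memcpy
// Sketch: reuse the measured length instead of letting strcpy re-scan the source.
void copyArgvReusingLength(int newArgc, char** argv, char** newArgv)
{
    for (int i = 0; i < newArgc; ++i)
    {
        const std::size_t len = std::strlen(argv[i]) + 1;  // include the '\0'
        newArgv[i] = new char[len];
        std::memcpy(newArgv[i], argv[i], len);             // copies the terminator too
    }
    // If any `new` throws, the earlier allocations still leak (the first point above);
    // an owning container such as std::vector<std::string> sidesteps that entirely.
}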

Deviating from a coding standard should always be possible, but document clearly why you decided to do so.
The main problem with strcpy is that it has no length limit. With care this is not an issue, but it means that strcpy always has to be accompanied by some safeguarding code. Many less experienced coders have fallen into this pitfall, hence the coding guideline came into practice.
Possible ways to handle string copying safely are:
Check the string length before copying (a sketch of this follows below the list).
Use a bounded variant such as strlcpy or, on Microsoft compilers, strncpy_s.
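As a hedged sketch of the first option, a length-checked copy might look like this; the name copy_checked and its contract are my own invention, not a standard API:
#include <cstddef>
#include <cstring>
// Illustrative helper: copy src into dst only if it fits, terminator included.
// Returns false instead of overflowing; the caller decides how to handle that.
bool copy_checked(char* dst, std::size_t dst_size, const char* src)
{
    const std::size_t needed = std::strlen(src) + 1;   // text plus '\0'
    if (needed > dst_size)
        return false;
    std::memcpy(dst, src, needed);
    return true;
}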

As a general strcpy replacement idiom, assuming you're okay with the slight overhead of the print formatting functions, use snprintf:
snprintf(dest, dest_total_buffer_length, "%s", source);
e.g.
snprintf(newArgv[i], strlen(argv[i]) + 1, "%s", argv[i]);
It's safe, simple, and you don't need to think about the +1/-1 size adjustment.

Related

How do I avoid using a constant for filename size?

It seems like standard programming practice and the POSIX standard are at odds with each other. I'm working on a program and I see a lot of stuff like:
char buf[NAME_MAX + 1]
And I'm also seeing that a lot of operating systems don't define NAME_MAX, and that technically they don't have to according to POSIX, because you're supposed to use pathconf to get the configured value at runtime rather than hard-coding it as a constant.
The problem is that the compiler won't let me use pathconf this way with arrays. Even if I store the result of pathconf in a const int, it still complains that the array size has to be a compile-time constant. So in order to actually use pathconf I apparently can't use a plain char array for the buffer. I'm caught between a rock and a hard place, because the C++ standard seemingly won't allow me to do what POSIX says I must do: determine the size of a character buffer for a filename at runtime rather than at compile time.
The only information I've been able to find on this suggests that I would need to replace the array with a vector, but it's not clear how I would do it. When I test using a simple program, I can get this to work:
std::vector<char> buf((pathconf("/", _PC_NAME_MAX) + 1));
And then I can figure out the size by calling buf.size() or something. But I'm not sure if this is the right approach at all. Does anyone have any experience with trying to get a program to stop depending on constants like NAME_MAX or MAXNAMLEN being defined in the system headers and getting the implementation to use pathconf at runtime instead?
Halfway measures do tend to result in conflicts of some sort.
const unsigned NAME_MAX = /* get the value at runtime */;
char buf[NAME_MAX + 1];
The second line declares a C-style array (presumably) intended to hold a C-style string. In C, this is fine. In C++, there is an issue because the value of NAME_MAX is not known at compile time. That's why I called this a halfway measure: it mixes C-style code with C++ compilation. (Some compilers allow this in C++ as a variable-length-array extension. Apparently yours does not.)
The C++ approach would use C++-style strings, as in:
std::string buf;
That's it. The size does not need to be specified since memory will be allocated as needed, provided you avoid C-style interfaces. Use streaming (>>) when reasonable. If the buffer is being filled by user or file input, this should be all you need.
If you need to use C-style strings (perhaps this buffer is being filled by a system call written for C?), there are a few options for allocating the needed space. The simplest is probably a vector, much like you were thinking.
std::vector<char> buf(NAME_MAX + 1); // Parentheses, not braces: NAME_MAX + 1 elements.
system_call(buf.data()); // Send a char* to the system call.
Alternatively, you could use a C++-style string, which could make manipulating the data more convenient.
std::string buf(NAME_MAX + 1, '\0'); // Parentheses again: NAME_MAX + 1 copies of '\0'.
system_call(&buf[0]); // Send a writable char* (std::string::data() is const-only before C++17).
There is also a smart pointer option, but the vector approach might play nicer with existing code written for a C-style array.
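Putting the pieces together, a rough sketch (reusing the placeholder system_call from the snippets above, and assuming a fallback value when pathconf reports no limit) might be:
#include <unistd.h>   // pathconf, _PC_NAME_MAX
#include <cstddef>
#include <vector>
// Placeholder for whatever C-style API actually fills the buffer, as above.
extern "C" void system_call(char* buf);
void fill_name_buffer()
{
    long name_max = pathconf("/", _PC_NAME_MAX);
    if (name_max < 0)
        name_max = 255;   // no limit or indeterminate: fall back to a common default
    std::vector<char> buf(static_cast<std::size_t>(name_max) + 1, '\0');
    system_call(buf.data());   // contiguous, writable char* sized at runtime
}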

Safe counterparts of itoa()?

I am converting some old C code to a more secure version. The following functions are used heavily; could anyone tell me their secure counterparts? Either Windows functions or C runtime library functions. Thanks.
itoa()
getchar()
strcat()
memset()
itoa() is safe as long as the destination buffer is big enough to receive the largest possible representation (i.e. of INT_MIN with the trailing NUL). So you can simply check the buffer size. Still, it's not a very good function to use, because if you change your data type to a larger integral type you need to switch to a different conversion routine (ltoa, lltoa, etc., where they exist at all). If you want a dynamic buffer that handles whatever type you throw at it with fewer maintenance issues, consider a std::ostringstream (from the <sstream> header).
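A minimal sketch of that ostringstream approach, not tied to any particular code from the question:
#include <sstream>
#include <string>
// Works unchanged for int, long, long long, and other streamable numeric types.
template <typename Integer>
std::string to_text(Integer value)
{
    std::ostringstream oss;
    oss << value;
    return oss.str();
}
(In C++11 and later, std::to_string covers the built-in numeric types directly.)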
getchar() has no "secure counterpart" - it's not insecure to begin with and has no buffer overrun potential.
Re memset(): it's dangerous in that it accepts the programmer's judgement that memory should be overwritten without any confirmation of the content/address/length, but when used properly it leaves no issue, and sometimes it's the best tool for the job even in modern C++ programming. To check for security issues with it, you need to inspect the code and ensure it's aimed at a suitable buffer or object to be zeroed, and that the length is computed properly (hint: use sizeof where possible).
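For example, a tiny sketch of that sizeof hint, using a made-up trivially copyable struct:
#include <cstring>
struct Packet { int id; char name[32]; };   // plain struct, safe to memset
void zero_packet()
{
    Packet p;
    std::memset(&p, 0, sizeof p);   // length comes from the object, not a hand-typed constant
}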
strcat() can be dangerous if the strings being concatenated aren't known to fit into the destination buffer. For example, char buf[16]; strcpy(buf, "one,"); strcat(buf, "two"); is totally safe (but fragile: further operations or changing either string might require more than 16 chars, and the compiler won't warn you), whereas strcat(buf, argv[0]) is not. The best replacement tends to be a std::ostringstream, although that can require significant reworking of the code. You may get away with strncat(), or even, if you have it, asprintf(&result, "%s%s", first, second), which allocates the required amount of memory on the heap (do remember to free() it). You could also use std::string and operator+ to concatenate strings.
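And a sketch of the std::string alternative mentioned at the end (the names are mine):
#include <string>
// Concatenation with std::string: the buffer grows as needed,
// so there is no fixed 16-byte limit to outgrow.
std::string join(const std::string& first, const std::string& second)
{
    return first + "," + second;
}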
None of these functions is "insecure" provided you understand their behaviour and limitations. itoa is not standard C and should be replaced with sprintf(buf, "%d", ...) if that's a concern to you.
The others are all fine to the experienced practitioner. If you have specific cases which you think may be unsafe, you should post them.
I'd replace itoa(), because it's not standard, with sprintf or, better, snprintf if your goal is code security. I'd also replace strcat() with strncat(), but since you mention C++ as well, a better idea would be to use the std::string class.
As for the other two functions, I can't see how you could make the code more secure without seeing your code.

The way to compare strings efficiently in C++

Is it efficient to compare a string with another string or string literal like this?
string a;
string b;
if (a == "test")
or
if (a == b)
My coworker asked me to use memcmp
Any comments about this?
Thanks.
Yes use a == b, do not listen to your co-worker.
You should always prefer code readability and using STL over using C functions unless you have a specific bottleneck in your program that you need to optimize and you have proven that it is truly a bottleneck.
Obviously you should use a == b and rely on its implementation.
For the record, std::char_traits<char>::compare() in a popular implementation relies on memcmp(), so calling it directly would only be more painful and error-prone.
If you really need to know, you should write a test-application and see what the timing is.
That being said, you should rely on the provided implementation being quite efficient. It usually is.
I think your coworker is a bit hung up on possible optimizations.
memcmp isn't intended to compare strings (that would be strcmp)
to compare only up to the length of the shorter string, you would need strlen on both strings
memcmp returns <0, =0, >0, which is a nuisance to always remember
strcmp and strlen can behave strangely with malformed C-style strings (ones not terminated with '\0')
It's less efficient. std::string::operator== can do one very quick check: for equal length. If the string lengths aren't equal (quite common), it can return false without looking at even one character.
In C, memcmp must be told the length to compare, which means you need to call strlen twice, and that looks at every character of both strings.
STL best practice is to always prefer member functions to perform a given task. In this case that's basic_string::operator==.
Your coworker needs to think a bit more in C++ and get away from the CRT. Sometimes I think this is just fear of the unknown: if you can educate them on the C++ options, perhaps you will have an easier time.
Only If Speed is Very Important
Use strings of fixed size (32-64 bytes is very good), initialized to all zeros and then filled with string data.
(Note that here, by "string" I mean raw C code or your own custom string class, not the std::string class.)
Use memcpy to fill and memcmp to compare these strings, always using the fixed buffer size.
You can get even faster than memcmp if you make sure your string buffers are 16-byte aligned so you can use SSE2 and you only need to test for equality and not greater or less-than. Even without SSE2 you can do an equality compare using subtraction in word-sized chunks.
The reason that these techniques speed things up is that they remove the byte-by-byte comparison test from the equation. Looking for the terminating '\0' or the byte that is different is expensive because test-and-branch is hard to predict and pipeline.
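For illustration only, a sketch of what such a fixed-size comparison might look like; the 64-byte size and the FixedName type are my own choices, and the buffers must be zero-padded as described above:
#include <cstring>
// Illustrative fixed-size "string" record (not std::string); 64 bytes, zero-padded.
// alignas(16) matches the note above about 16-byte alignment for SSE2-style compares.
struct FixedName
{
    alignas(16) char data[64];
};
// Because both buffers are fully zero-padded after the text, one memcmp over the
// whole fixed size decides equality with no byte-by-byte scan for '\0'.
bool equal(const FixedName& a, const FixedName& b)
{
    return std::memcmp(a.data, b.data, sizeof a.data) == 0;
}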
Maybe or maybe not
If your C++ implementation uses a highly optimized memcmp (as GCC does), and its C++ string comparison does the trivial while (*p++ == *q++) ... equivalent, then yes, memcmp would be faster on large strings, because it compares multiple characters at a time using aligned 32-bit loads.
On shorter strings these optimizations won't be visible in the timings, but on larger strings (around 10K or so) the speedup should be clearly visible.
The Answer: it depends ;-) Check your C++ strings implementation.

Why is snprintf faster than ostringstream or is it?

I read somewhere that snprintf is faster than ostringstream. Does anyone have any experience with this? If so, why is it faster?
std::ostringstream is not required to be slower, but typical implementations are slower in practice. FastFormat's website has some benchmarks.
The Standard library design for streams supports much more than snprintf does. The design is meant to be extensible, and includes protected virtual methods that are called by the publicly exposed methods. This allows you to derive from one of the stream classes, with the assurance that if you overload the protected method you will get the behavior you want. I believe that a compiler could avoid the overhead of the virtual function call, but I'm not aware of any compilers that do.
Additionally, stream operations often use growable buffers internally, which implies relatively slow memory allocations.
We replaced some stringstreams in inner loops with sprintf (using statically allocated buffers), and this made a big difference, both in MSVC and GCC. I imagine that the memory management of this code:
{
char buf[100];
int i = 100;
sprintf(buf, "%d", i);
// do something with buf
}
is much simpler than
{
std::stringstream ss;
int i = 100;
ss << i;
std::string s = ss.str();
// do something with s
}
but I am very happy with the overall performance of stringstreams.
Some people will tell you that one function can't be faster than another as such, only their implementations can be. That's right, and I would agree.
You are unlikely to ever notice a difference outside of benchmarks. The reason C++ streams generally tend to be slower is that they are much more flexible. Flexibility most often comes at the cost of either time or code growth.
In this case, C++ streams are built on stream buffers. In themselves, streams are just the shell that keeps formatting and error flags in place and calls the right I/O facets of the C++ standard library (for example, num_put to print numbers), which then write the values, well formatted, into the underlying stream buffer connected to the stream.
All of this machinery - the facets and the buffers - is implemented with virtual functions. While nothing says those functions must be implemented to be slower than their C stdio counterparts, the indirection will normally make them somewhat slower than calling the C stdio functions directly (I benchmarked this some time ago with gcc/libstdc++ and did notice a slowdown, though one you hardly notice in day-to-day usage).
Absolutely this is implementation-specific.
But if you really want to know, write two small programs, and compare them. You would need to include typical usage for what you have in mind, the two programs would need to generate the same string, and you would use a profiler to look at the timing information.
Then you would know.
One issue would probably be that the type safety added by ostringstream carries extra overhead. I've not done any measurements, though.
As litb said, standard streams support many things we don't always need.
Some stream implementations get rid of this rarely used flexibility; see FAStream for instance.
It's quite possible that it's because sprintf is part of the CRT, which is written partly in assembly. ostringstream is part of the standard library, is probably written a little more generically, and has OOP code/overhead to deal with.
Yes: if you run the function below on a few million numbers with Visual C++ 5.0, the first version takes about twice as long as the second, and both produce the same output.
Compiling tight loops into a .exe and running the Windows 'timethis something.exe' or the Linux 'time something' is how I investigate most of my performance curiosities. (timethis is available on the web somewhere.)
void Hex32Bit(unsigned int n, string &result)
{
#if 0
stringstream ss;
ss
<< hex
<< setfill('0')
<< "0x" << setw(8) << n
;
result = ss.str();
#else
const size_t len = 11;
char temp[len];
_snprintf(temp, len, "0x%08x", n);
temp[len - 1] = '\0';
result = temp;
#endif
}
One reason the printf family of functions can be faster than the corresponding C++ streams (cout, cin, and the others) is that the latter do type checking. As this usually involves calls to overloaded operators, it can take some time.
In fact, in programming competitions it is often recommended to use printf et al. rather than cout/cin for precisely this reason.

Using sprintf without a manually allocated buffer

In the application that I am working on, the logging facility makes use of sprintf to format the text that gets written to file. So, something like:
char buffer[512];
sprintf(buffer, ... );
This sometimes causes problems when the message that gets sent in becomes too big for the manually allocated buffer.
Is there a way to get sprintf behaviour without having to manually allocate memory like this?
EDIT: while sprintf is a C operation, I'm looking for C++ type solutions (if there are any!) for me to get this sort of behaviour...
You can use asprintf(3) (note: non-standard) which allocates the buffer for you so you don't need to pre-allocate it.
No, sprintf() itself won't allocate memory for you. Alternatives include:
use snprintf() to truncate the message - this does not fully solve your problem, but it prevents the buffer overflow
double (or triple, or ...) the buffer size - unless you're in a constrained environment
use C++ std::string and std::ostringstream - you'll lose the printf formatting and use the << operator instead (a small sketch follows below this list)
use Boost Format, which comes with a printf-like % operator
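For the std::string/ostringstream option, a minimal sketch (the field names are invented for illustration):
#include <sstream>
#include <string>
// Build the log line in a growable buffer instead of a fixed char array.
std::string format_log_line(const std::string& user, int error_code)
{
    std::ostringstream oss;
    oss << "user=" << user << " failed with error " << error_code;
    return oss.str();   // hand .c_str() to the file-writing code if it wants a char*
}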
I don't know of a version that avoids allocation either, but C99's snprintf accepts a NULL destination (with size 0) so you can measure first. It is not very efficient - the string is formatted twice - but it gives you the complete string (as long as enough memory is available) without risking overflow:
length = snprintf(NULL, 0, ...);  /* first pass: measure */
str = malloc(length + 1);
snprintf(str, length + 1, ...);   /* second pass: format for real */
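If you want that idiom wrapped up for C++, a sketch along these lines is possible; the name format_string is mine, and error handling is deliberately minimal:
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
// Measure with a null/0 pass, then format into a std::string of exactly that size.
std::string format_string(const char* fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    va_list args_copy;
    va_copy(args_copy, args);
    const int length = std::vsnprintf(nullptr, 0, fmt, args);    // first pass: measure
    va_end(args);
    std::string result;
    if (length > 0)
    {
        result.resize(static_cast<std::size_t>(length) + 1);     // room for the '\0'
        std::vsnprintf(&result[0], result.size(), fmt, args_copy);
        result.resize(static_cast<std::size_t>(length));         // drop the '\0' again
    }
    va_end(args_copy);
    return result;
}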
"the logging facility makes use of sprintf to format the text that gets written to file"
fprintf() does not impose any size limit. If you can write the text directly to file, do so!
I assume there is some intermediate processing step, however. If you know how much space you need, you can use malloc() to allocate that much space.
One technique at times like these is to allocate a reasonable-size buffer (that will be large enough 99% of the time) and if it's not big enough, break the data into chunks that you process one by one.
With the vanilla version of sprintf, there is no way to prevent the data from overrunning the passed-in buffer. This is true regardless of whether the memory was allocated manually or on the stack.
In order to prevent the buffer from being overrun, you'll need to use one of the more secure variants of sprintf such as sprintf_s (Windows only):
http://msdn.microsoft.com/en-us/library/ybk95axf.aspx