Safe counterparts of itoa()? - c++

I am converting an old C program to a more secure version. The following functions are used heavily; could anyone tell me their secure counterparts? Either Windows functions or C runtime library functions. Thanks.
itoa()
getchar()
strcat()
memset()

itoa() is safe as long as the destination buffer is big enough to receive the largest possible representation (i.e. of INT_MIN with the trailing NUL). So, you can simply check the buffer size. Still, it's not a very good function to use because if you change your data type to a larger integral type, you need to switch to a different conversion function (ltoa, lltoa, etc., where available). If you want a dynamic buffer that handles whatever type you throw at it with fewer maintenance issues, consider a std::ostringstream (from the <sstream> header).
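For illustration, a minimal sketch of the std::ostringstream approach (the function name is just an example):
#include <sstream>
#include <string>
std::string int_to_string_example(long long value)
{
    std::ostringstream oss;
    oss << value;       // works unchanged if the integral type widens later
    return oss.str();   // no fixed-size destination buffer involved
}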
getchar() has no "secure counterpart" - it's not insecure to begin with and has no buffer overrun potential.
Re memset(): it's dangerous in that it accepts the programmer's judgement that memory should be overwritten without any confirmation of the content/address/length, but when used properly it leaves no issue, and sometimes it's the best tool for the job even in modern C++ programming. To check security issues with this, you need to inspect the code and ensure it's aimed at a suitable buffer or object to be zeroed, and that the length is computed properly (hint: use sizeof where possible).
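As a hedged example of the sizeof hint, something along these lines avoids hard-coding the length (the Packet type is made up for illustration):
#include <cstring>
struct Packet { int id; char payload[64]; };   // hypothetical, trivially copyable type
void clear_packet(Packet& p)
{
    // sizeof tracks the object's real size, so the length cannot drift out of sync with the type.
    std::memset(&p, 0, sizeof p);
}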
strcat() can be dangerous if the strings being concatenated aren't known to fit into the destination buffer. For example: char buf[16]; strcpy(buf, "one,"); strcat(buf, "two"); is all totally safe (but fragile, as further operations or changing either string might require more than 16 chars and the compiler won't warn you), whereas strcat(buf, argv[0]) is not. The best replacement tends to be a std::ostringstream, although that can require significant reworking of the code. You may get away with using strncat(), or even - if you have it - asprintf(&result, "%s%s", first, second), which will allocate the required amount of memory on the heap (do remember to free() it). You could also consider std::string and use operator+ to concatenate strings.
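A small sketch of those alternatives (the literals are placeholders):
#include <cstdio>
#include <string>
void concat_examples()
{
    // std::string grows as needed; there is no fixed buffer to overflow.
    std::string joined = std::string("one,") + "two";
    // Staying with a C-style buffer, snprintf enforces the bound and truncates rather than overruns.
    char buf[16];
    std::snprintf(buf, sizeof buf, "%s%s", "one,", "two");
    (void)joined;
}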

None of these functions are "insecure" provided you understand the behaviour and limitations. itoa is not standard C and should be replaced with sprintf(buf, "%d", ...) if that's a concern to you.
The others are all fine to the experienced practitioner. If you have specific cases which you think may be unsafe, you should post them.

I'd replace itoa(), because it's not standard, with sprintf or, better, snprintf if your goal is code security. I'd also replace strcat() with strncat() but, since you specified the C++ language too, a much better idea would be to use the std::string class.
As for the other two functions, I can't see how you could make the code more secure without seeing your code.
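For what it's worth, a minimal sketch of the snprintf replacement for itoa (the size calculation assumes a plain int):
#include <cstdio>
#include <limits>
void itoa_replacement_example(int value)
{
    // digits10 + 3 leaves room for the extra digit, the sign, and the trailing NUL.
    char buf[std::numeric_limits<int>::digits10 + 3];
    std::snprintf(buf, sizeof buf, "%d", value);
}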

Related

How do I avoid using a constant for filename size?

It seems like standard programming practice and the POSIX standard are at odds with each other. I'm working with a program and I noticed that I see a lot of stuff like:
char buf[NAME_MAX + 1]
And I'm also seeing that a lot of operating systems don't define NAME_MAX and say that they technically don't have to according to POSIX, because you're supposed to use pathconf to get the value it's configured to at runtime rather than hard-coding it as a constant anyway.
The problem is that the compiler won't let me use pathconf this way with arrays. Even if I try storing the result of pathconf in a const int, it still throws a fit and says it has to be a constant. So it looks like in order to actually use pathconf, I would have to avoid using an array of chars for the buffer here, because that apparently isn't good enough. So I'm caught between a rock and a hard place, because the C++ standard seemingly won't allow me to do what POSIX says I must do, that is, determine the size of a character buffer for a filename at runtime rather than at compile time.
The only information I've been able to find on this suggests that I would need to replace the array with a vector, but it's not clear how I would do it. When I test using a simple program, I can get this to work:
std::vector<char> buf((pathconf("/", _PC_NAME_MAX) + 1));
And then I can figure out the size by calling buf.size() or something. But I'm not sure if this is the right approach at all. Does anyone have any experience with trying to get a program to stop depending on constants like NAME_MAX or MAXNAMLEN being defined in the system headers and getting the implementation to use pathconf at runtime instead?
Halfway measures do tend to result in conflicts of some sort.
const unsigned NAME_MAX = /* get the value at runtime */;
char buf[NAME_MAX + 1];
The second line declares a C-style array (presumably) intended to hold a C-style string. In C, this is fine. In C++, there is an issue because the value of NAME_MAX is not known at compile time. That's why I called this a halfway measure: there is a mix of C-style code being compiled as C++. (Some compilers will allow this in C++. Apparently yours does not.)
The C++ approach would use C++-style strings, as in:
std::string buf;
That's it. The size does not need to be specified since memory will be allocated as needed, provided you avoid C-style interfaces. Use streaming (>>) when reasonable. If the buffer is being filled by user or file input, this should be all you need.
If you need to use C-style strings (perhaps this buffer is being filled by a system call written for C?), there are a few options for allocating the needed space. The simplest is probably a vector, much like you were thinking.
std::vector<char> buf(NAME_MAX + 1); // Parentheses: construct with NAME_MAX + 1 elements.
system_call(buf.data()); // Send a char* to the system call.
Alternatively, you could use a C++-style string, which could make manipulating the data more convenient.
std::string buf(NAME_MAX + 1, '\0'); // Parentheses: NAME_MAX + 1 copies of '\0'.
system_call(buf.data()); // Send a char* to the system call.
There is also a smart pointer option, but the vector approach might play nicer with existing code written for a C-style array.
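Putting the pieces together, a hedged sketch of sizing the buffer from pathconf at runtime (the 255 fallback and the function names are assumptions for illustration):
#include <unistd.h>
#include <vector>
std::vector<char> make_name_buffer(const char* dir)
{
    long name_max = pathconf(dir, _PC_NAME_MAX);
    if (name_max < 0)
        name_max = 255;   // limit indeterminate or error: fall back to a common default
    return std::vector<char>(name_max + 1);   // +1 for the terminating NUL
}
// Usage (hypothetical system call):
// std::vector<char> buf = make_name_buffer("/");
// some_system_call(buf.data(), buf.size());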

C++17 Purpose of std::from_chars and std::to_chars?

Before C++17, there existed a variety of methods to convert integers, floats, and doubles to and from strings. For example, std::stringstream, std::to_string, std::atoi, std::stoi, and others could be used to accomplish these tasks, and there exist plenty of posts discussing the differences between those methods.
However, C++17 has now introduced std::from_chars and std::to_chars, and I'd like to know the reasons for introducing another means of converting to and from strings.
For one, what advantages and functionality do these new functions provide over the previous methods?
Not only that, but are there any notable disadvantages for this new method of string conversion?
std::stringstream is the heavyweight champion. It takes into consideration things like the stream's imbued locale, and its functionality involves things like constructing a sentry object for the duration of the formatted operation, in order to deal with exception-related issues. Formatted input and output operations in the C++ libraries have a reputation for being heavyweight and slow.
std::to_string is less intensive than std::stringstream, but it still returns a std::string, whose construction likely involves dynamic allocation (less likely with modern short string optimization techniques, but still possible). And, in most cases the compiler still needs to generate all the verbiage, at the call site, to support a std::string object, including its destructor.
std::to_chars is designed to have as little footprint as possible. You provide the buffer, and std::to_chars does very little beyond actually formatting the numeric value into the buffer, in a specific format, without any locale-specific considerations, with the only overhead being a check that the buffer is big enough. Code that uses std::to_chars does not need to do any dynamic allocation.
std::to_chars is also a bit more flexible in terms of formatting options, especially with floating point values. std::to_string has no formatting options.
std::from_chars is, similarly, a lightweight parser, that does not need to do any dynamic allocation, and does not need to sacrifice any electrons to deal with locale issues, or overhead of stream operations.
to/from_chars are designed to be elementary string conversion functions. They have two basic advantages over the alternatives.
They are much lighter weight. They never allocate memory (you allocate memory for them). They never throw exceptions. They also never look at the locale, which also improves performance.
Basically, they are designed such that it is impossible to have faster conversion functions at an API level.
These functions could even be constexpr (in C++17 they aren't, though the integral overloads were later made constexpr in C++23), while the more heavyweight allocating and/or throwing versions can't.
They have explicit round-trip guarantees. If you convert a float/double to a string (without a specified precision), the implementation is required to make it so that taking that exact sequence of characters and converting it back into a float/double will produce a binary-identical value. You won't get that guarantee from snprintf, stringstream or to_string/stof.
This guarantee is only good however if the to_chars and from_chars calls are using the same implementation. So you can't expect to send the string across the Internet to some other computer that may be compiled with a different standard library implementation and get the same float. But it does give you on-computer serialization guarantees.
All these pre-existing methods were bound to work based on a so-called locale. A locale is basically a set of formatting options that specify, e.g., what characters count as digits, what symbol to use for the decimal point, what thousands separator to use, and so on. Very often, however, you don't really need that. If you're just, e.g., reading a JSON file, you know the data is formatted in a particular way: there is no reason to be looking up whether a '.' should be a decimal point or not every time you see one. The new functions introduced in <charconv> are basically hardcoded to read and write numbers based on the formatting laid out for the default C locale. There is no way to change the formatting, but since the formatting doesn't have to be flexible, they can be very fast…
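For reference, a minimal sketch of round-tripping an int through <charconv> (error handling via std::errc; the value is arbitrary):
#include <charconv>
#include <system_error>
void charconv_example()
{
    char buf[16];
    // to_chars writes into the caller-supplied buffer; no allocation, no locale.
    auto [end, ec] = std::to_chars(buf, buf + sizeof buf, 12345);
    if (ec != std::errc{})
        return;   // buffer too small
    // from_chars parses the same range back; it reports errors instead of throwing.
    int value = 0;
    auto res = std::from_chars(buf, end, value);
    if (res.ec == std::errc{}) {
        // value == 12345 here
    }
}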

Safe Use of strcpy

Plain old strcpy is prohibited by our company's coding standard because of its potential for buffer overflows. I was looking at the source for a 3rd-party library that we link against in our code. The library source code uses strcpy like this:
for (int i = 0; i < newArgc; i++)
{
    newArgv[i] = new char[strlen(argv[i]) + 1];
    strcpy(newArgv[i], argv[i]);
}
Since strlen is used while allocating memory for the buffer to be copied to, this looks fine. Is there any possible way someone could exploit this normal strcpy, or is this safe as I think it looks to be?
I have seen naïve uses of strcpy that lead to buffer overflow situations, but this does not seem to have that since it is always allocating the right amount of space for the buffer using strlen and then copying to that buffer using the argv[] as the source which should always be null terminated.
I am honestly curious if someone running this code with a debugger could exploit this or if there are any other tactics someone that was trying to hack our binary (with this library source that we link against in its compiled version) could use to exploit this use of strcpy. Thank you for your input and expertise.
It is possible to use strcpy safely - it's just quite hard work (which is why your coding standards forbid it).
However, the code you have posted is not a vulnerability. There is no way to overwrite bits of memory with it; I would not bother rewriting it. (If you do decide to rewrite it, use std::string instead.)
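If you did rewrite it, a sketch along these lines removes both the manual allocation and the copy (assuming the surrounding code can accept std::string; the names are illustrative):
#include <string>
#include <vector>
std::vector<std::string> copy_args(int newArgc, char** argv)
{
    std::vector<std::string> newArgs;
    newArgs.reserve(newArgc);
    for (int i = 0; i < newArgc; ++i)
        newArgs.emplace_back(argv[i]);   // each std::string allocates and copies safely
    return newArgs;
}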
Well, there are multiple problems with that code:
If an allocation throws, you get a memory-leak.
Using strcpy() instead of reusing the length is sub-optimal. Use std::copy_n() or memcpy() instead.
Presumably there are no data races; at least none that we can tell from this snippet.
Anyway, that slight drop in performance is the only thing "wrong" with using strcpy() there, at least if you insist on managing your strings manually.
Deviating from a coding standard should always be possible, but document clearly why you decided to do so.
The main problem with strcpy is that it has no length limitation. When care is taken this is no problem, but it means that strcpy always has to be accompanied by some safeguarding code. Many less experienced coders have fallen into this pitfall, hence the coding guideline came into practice.
Possible ways to handle string copy safely are:
Check the string length
Use a safer variant like strlcpy or, on Microsoft compilers, strncpy_s.
As a general strcpy replacement idiom, assuming you're okay with the slight overhead of the print formatting functions, use snprintf:
snprintf(dest, dest_total_buffer_length, "%s", source);
e.g.
snprintf(newArgv[i], strlen(argv[i]) + 1, "%s", argv[i]);
It's safe, simple, and you don't need to think about the +1/-1 size adjustment.

The way to compare strings in an efficient way in C++

Is it efficient to compare a string with another string or string literal like this?
string a;
string b;
if (a == "test")
or
if (a == b)
My coworker asked me to use memcmp
Any comments about this?
Thanks.
Yes use a == b, do not listen to your co-worker.
You should always prefer code readability and using STL over using C functions unless you have a specific bottleneck in your program that you need to optimize and you have proven that it is truly a bottleneck.
Obviously you should use a == b and rely on its implementation.
For the record, std::char_traits<char>::compare() in a popular implementation relies on memcmp(), so calling it directly would only be more painful and error-prone.
If you really need to know, you should write a test-application and see what the timing is.
That being said, you should rely on the provided implementation being quite efficient. It usually is.
I think your coworker is a bit hooked up on possible optimization.
memcmp isn't intended to compare strings (that would be strcmp)
to only compare up to the size of the shortest string, you would need strlen on both strings
memcmp returns <0, =0, >0, which is a nuisance to always remember
strcmp and strlen can cause weird behaviour with bad C-style strings (not ending with a '\0' terminator)
It's less efficient. std::string::operator== can do one very quick check, for equal length. If the string lengths aren't equal (quite common), it can return false without looking at even one character.
In C, memcmp must be told the length to compare, which means you need to call strlen twice, and that looks at all characters in both strings.
STL best practice is to always prefer member functions to perform a given task. In this case that's basic_string::operator==.
Your coworker needs to think a bit more in C++ and get away from the CRT. Sometimes I think this is just caused by fear of the unknown - if you can educate on C++ options, perhaps you will have an easier time.
Only If Speed is Very Important
Use strings of fixed size (32-64 bytes is very good), initialized to all zeros and then filled with string data.
(Note that here, by "string" I mean raw C code or your own custom string class, not the std::string class.)
Use memcpy and memcmp to compare these strings always using the fixed buffer size.
You can get even faster than memcmp if you make sure your string buffers are 16-byte aligned so you can use SSE2 and you only need to test for equality and not greater or less-than. Even without SSE2 you can do an equality compare using subtraction in word-sized chunks.
The reason that these techniques speed things up is that they remove the byte-by-byte comparison test from the equation. Looking for the terminating '\0' or the byte that is different is expensive because test-and-branch is hard to predict and pipeline.
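A hedged sketch of that idea (the 32-byte size and the type name are illustrative, not a recommendation for any particular workload):
#include <cstring>
struct FixedString
{
    char data[32];   // zero-initialized, then filled with string data
};
bool equal(const FixedString& a, const FixedString& b)
{
    // Comparing the whole fixed buffer avoids the byte-by-byte terminator scan;
    // both buffers must be zero-padded past the text for this to be correct.
    return std::memcmp(a.data, b.data, sizeof a.data) == 0;
}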
Maybe or maybe not
If your C++ implementation uses a highly optimized memcmp (as GCC has) and
its C++ string comparison does the trivial while (*p++ == *q++) ... equivalent,
then, yes, memcmp would be faster on large strings because it utilizes multiple character comparisons at a time and aligned 32-bit loads.
On shorter strings, these optimizations wouldn't be visible in the timings - but on larger strings (some 10K or so), the speedup should be clearly visible.
The Answer: it depends ;-) Check your C++ string implementation.

Using sprintf without a manually allocated buffer

In the application that I am working on, the logging facility makes use of sprintf to format the text that gets written to file. So, something like:
char buffer[512];
sprintf(buffer, ... );
This sometimes causes problems when the message that gets sent in becomes too big for the manually allocated buffer.
Is there a way to get sprintf behaviour without having to manually allocate memory like this?
EDIT: while sprintf is a C operation, I'm looking for C++ type solutions (if there are any!) for me to get this sort of behaviour...
You can use asprintf(3) (note: non-standard) which allocates the buffer for you so you don't need to pre-allocate it.
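A minimal sketch of the asprintf pattern (a GNU/BSD extension; on glibc it may require _GNU_SOURCE; the message is a placeholder):
#include <cstdio>
#include <cstdlib>
void asprintf_example(int request_id)
{
    char* msg = nullptr;
    // asprintf allocates exactly enough space for the formatted text.
    if (asprintf(&msg, "request %d failed", request_id) != -1)
    {
        // ... write msg to the log ...
        free(msg);   // the caller owns the buffer
    }
}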
No, you can't get sprintf() to allocate the memory for you. Alternatives include:
use snprintf() to truncate the message - this does not fully resolve your problem, but it prevents the buffer overflow issue
double (or triple or ...) the buffer - unless you're in a constrained environment
use C++ std::string and std::ostringstream - but you'll lose the printf format; you'll have to use the << operator (a short sketch follows this list)
use Boost Format that comes with a printf-like % operator
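A brief sketch of the ostringstream option from the list above (the message pieces are placeholders):
#include <sstream>
#include <string>
std::string format_log_line(const char* user, int code)
{
    std::ostringstream oss;
    oss << "user " << user << " failed with code " << code;
    return oss.str();   // grows as needed; no fixed-size buffer involved
}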
I also don't know a version which avoids allocation, but C99's snprintf allows a NULL pointer as the destination (with a size of zero) and returns the length that would have been written. Not very efficient, but this gives you the complete string (as long as enough memory is available) without risking overflow:
length = snprintf(NULL, 0, ...);   /* with size 0, snprintf just reports the needed length */
str = malloc(length + 1);
snprintf(str, length + 1, ...);
"the logging facility makes use of sprintf to format the text that gets written to file"
fprintf() does not impose any size limit. If you can write the text directly to file, do so!
I assume there is some intermediate processing step, however. If you know how much space you need, you can use malloc() to allocate that much space.
One technique at times like these is to allocate a reasonable-size buffer (that will be large enough 99% of the time) and if it's not big enough, break the data into chunks that you process one by one.
With the vanilla version of sprintf, there is no way to prevent the data from overwriting the passed-in buffer. This is true regardless of whether the memory was manually allocated or allocated on the stack.
In order to prevent the buffer from being overwritten, you'll need to use one of the more secure versions of sprintf, like sprintf_s (Windows only):
http://msdn.microsoft.com/en-us/library/ybk95axf.aspx