Testing buffer overrun of input - C++

For example, if I input more than 10 characters, why doesn't it throw an exception or error?
Would you get the input with getline instead?
#include <iostream>

int main()
{
    char c[10];                  // fixed-size buffer: a token longer than 9 characters overruns it
    while (std::cin >> c)
    {
        std::cout << c << std::endl;
    }
}

Why doesn't it throw an exception or error?
A buffer overflow is an example of undefined behavior. The behavior is literally undefined: if you overflow a buffer, there are no guarantees whatsoever about what your program will do. This doesn't generate an exception because doing so would require lots of relatively costly checks even in correct code, and in C++ the general philosophy is that you don't pay for what you don't need.
If you avoid raw arrays and raw (non-smart) pointers and use the C++ Standard Library containers, strings, and algorithms, you can easily avoid most situations that would result in a buffer overflow.
Would you get the input with getline instead?
You can either use std::getline, which allows you to extract a "line" of characters into a std::string, or you can use >> and extract into a std::string object directly, depending on what, exactly, you want to extract.
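For instance, a minimal sketch of both approaches; each reads into a std::string, which grows as needed, so there is no fixed buffer to overrun:

#include <iostream>
#include <string>

int main()
{
    std::string word;
    std::cin >> word;                         // extracts one whitespace-delimited token

    std::string line;
    std::getline(std::cin >> std::ws, line);  // skips leading whitespace, then reads a whole line

    std::cout << word << '\n' << line << '\n';
}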

There are tools which attempt to expose these issues; Valgrind and Guard Malloc are examples. MSVC also lets you specify build options which can expose such issues.
Note also that different compilers emit different instructions for your program, and different instructions when optimizing or not. This means the consequences may appear in some builds and not in others.
I occasionally test my programs using the tools/techniques I've mentioned. I also use more dynamic allocations in unit tests, in order to expose failure cases more easily when running programs under these tools.
If you're coming from Java or another language with smarter built-in arrays: that's not how C programs are interpreted by the compiler, nor how they are represented in memory. Instead, we typically use proper containers in C++, and these will detect many of these issues. For example, a std::vector may throw if you attempt to access an invalid element.
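A quick sketch of checked versus unchecked element access (the names here are illustrative only):

#include <iostream>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> v(10);
    // v[42] = 1;             // out of range: undefined behavior, no guarantee of a crash
    try {
        v.at(42) = 1;         // checked access: throws std::out_of_range
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
}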
Good luck.


Is it legal to read into an array that doesn't have space for the null terminator?

Every C string is terminated with the null character when it is stored in a C-string variable, and this always consumes one array position.
So why is this legal:
char name[3];
std::cin >> name;
when the three-letter word "cow" is input to std::cin?
Why is it allowing this?
The compiler cannot know what input you will be giving at runtime, so it cannot say that you will be doing something illegal. If the input fits in the array, then it's legal.
Prior to C++20:
The behaviour of the program is undefined. Never do this.
Since C++20:
The input will be truncated to fit into the array, so the array will contain the string "co".
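A minimal sketch of that C++20 behaviour, assuming a conforming implementation:

#include <iostream>

int main()
{
    char name[3];
    std::cin >> name;            // C++20: extracts at most 2 characters, then null-terminates
    std::cout << name << '\n';   // input "cow" prints "co"; the 'w' stays in the stream
}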
It is allowed because the memory management is left to the programmer.
In this case the error occurs at run-time and it's impossible for the compiler to spot it at compile time.
What you are getting here is undefined behavior: the program can crash, can carry on smoothly, or can exhibit some odd behavior later.
Most likely you are overflowing the allocated memory by one byte, which will be stored in the first byte of the next declared variable, or at some other address which may or may not already hold relevant information.
You may eventually see the execution abort if, for example, some read-only memory area is overwritten, or if the instruction pointer ends up pointing at an invalid area (again, by its value in memory being overwritten).
It depends what you mean by "legal"; in the context of your question, the word has no precise meaning.
According to the standard (at least before C++20) your code (if supplied with that input) has undefined behaviour. When behaviour is undefined, any observable outcome is permitted, and no diagnostics are required.
In particular, there is no requirement that an implementation (the compiler, the host system, or anything else) diagnose a problem (e.g. issue an error message), take preventative action (e.g. electrocute the programmer for entering "cow" before hitting the enter key), or take recovery action (e.g. truncate the input to "co").
The reason the standard allows such things is a combination of:
allowing implementation freedoms. For example, the standard permits but does not require your host system to electrocute the user for entering bad input to your program - but doesn't prevent an evil-minded system developer providing hardware and driver support to do exactly that;
technical infeasibility or technical difficulty. In a simple case like yours the problem might be easy to detect. But in more complicated programs (e.g. your code is buried away among other code) it might be impossible to identify a problem. Or it might be possible, but take much longer than a programmer is willing or able to spend waiting.
In short, the C++ standard does not coddle the programmer. The programmer - not the standard, not the compiler - is responsible for avoiding undefined behaviour, and responsible if a program with undefined behaviour does undesirable things.

How do I avoid using a constant for filename size?

It seems like standard programming practice and the POSIX standard are at odds with each other. I'm working with a program and I noticed that I see a lot of stuff like:
char buf[NAME_MAX + 1];
And I'm also seeing that a lot of operating systems don't define NAME_MAX and say that they technically don't have to according to POSIX, because you're supposed to use pathconf to get the value it's configured to at runtime rather than hard-coding it as a constant anyway.
The problem is that the compiler won't let me use pathconf this way with arrays. Even if I try storing the result of pathconf in a const int, it still throws a fit and says it has to be a constant. So it looks like in order to actually use pathconf, I would have to avoid using an array of chars for the buffer here, because that apparently isn't good enough. So I'm caught between a rock and a hard place, because the C++ standard seemingly won't allow me to do what POSIX says I must do: determine the size of a character buffer for a filename at runtime rather than at compile time.
The only information I've been able to find on this suggests that I would need to replace the array with a vector, but it's not clear how I would do it. When I test using a simple program, I can get this to work:
std::vector<char> buf((pathconf("/", _PC_NAME_MAX) + 1));
And then I can figure out the size by calling buf.size() or something. But I'm not sure if this is the right approach at all. Does anyone have any experience with trying to get a program to stop depending on constants like NAME_MAX or MAXNAMLEN being defined in the system headers and getting the implementation to use pathconf at runtime instead?
Halfway measures do tend to result in conflicts of some sort.
const unsigned NAME_MAX = /* get the value at runtime */;
char buf[NAME_MAX + 1];
The second line declares a C-style array (presumably) intended to hold a C-style string. In C, this is fine. In C++, there is an issue because the value of NAME_MAX is not known at compile time. That's why I called this a halfway measure: there is a mix of C-style code and C++ compilation. (Some compilers will allow this in C++. Apparently yours does not.)
The C++ approach would use C++-style strings, as in:
std::string buf;
That's it. The size does not need to be specified since memory will be allocated as needed, provided you avoid C-style interfaces. Use streaming (>>) when reasonable. If the buffer is being filled by user or file input, this should be all you need.
If you need to use C-style strings (perhaps this buffer is being filled by a system call written for C?), there are a few options for allocating the needed space. The simplest is probably a vector, much like you were thinking.
std::vector<char> buf(NAME_MAX + 1); // Parentheses: the size constructor, not an initializer list.
system_call(buf.data()); // Send a char* to the system call.
Alternatively, you could use a C++-style string, which could make manipulating the data more convenient.
std::string buf(NAME_MAX + 1, '\0'); // Parentheses again: a count and a fill character.
system_call(buf.data()); // Send a char* to the system call.
There is also a smart pointer option, but the vector approach might play nicer with existing code written for a C-style array.
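Putting it together, a minimal sketch combining pathconf with a vector; the fallback size and the some_c_api call are illustrative assumptions, not part of your code:

#include <unistd.h>   // pathconf (POSIX)
#include <vector>

int main()
{
    long name_max = pathconf("/", _PC_NAME_MAX);
    if (name_max < 0)
        name_max = 255;                      // no limit reported; pick a fallback (assumption)

    std::vector<char> buf(name_max + 1);     // sized at runtime, no compile-time constant needed
    // some_c_api(buf.data(), buf.size());   // hypothetical C interface taking char* and length
    return 0;
}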

C++17 Purpose of std::from_chars and std::to_chars?

Before C++17, there existed a variety of methods to convert integers, floats, and doubles to and from strings. For example, std::stringstream, std::to_string, std::atoi, and std::stoi, among others, could be used to accomplish these tasks, and there are plenty of posts discussing the differences between them.
However, C++17 has now introduced std::from_chars and std::to_chars, and I'd like to know the reasons for introducing another means of converting to and from strings.
For one, what advantages and functionality do these new functions provide over the previous methods?
Also, are there any notable disadvantages to this new method of string conversion?
std::stringstream is the heavyweight champion. It takes into consideration things like the stream's imbued locale, and its functionality involves constructing a sentry object for the duration of the formatted operation, in order to deal with exception-related issues. Formatted input and output operations in the C++ libraries have some reputation for being heavyweight and slow.
std::to_string is less intensive than std::ostringstream, but it still returns a std::string, whose construction likely involves dynamic allocation (less likely with modern short string optimization techniques, but still likely). And in most cases the compiler still needs to generate all the verbiage, at the call site, to support a std::string object, including its destructor.
std::to_chars is designed to have as little footprint as possible. You provide the buffer, and std::to_chars does very little beyond actually formatting the numeric value into the buffer, in a specific format, without any locale-specific considerations, with the only overhead of making sure that the buffer is big enough. Code that uses std::to_chars does not need to do any dynamic allocation.
std::to_chars is also a bit more flexible in terms of formatting options, especially with floating point values. std::to_string has no formatting options.
std::from_chars is, similarly, a lightweight parser, that does not need to do any dynamic allocation, and does not need to sacrifice any electrons to deal with locale issues, or overhead of stream operations.
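To illustrate, a minimal sketch of both directions through a caller-provided buffer (C++17), with error handling for each step:

#include <charconv>
#include <cstdio>

int main()
{
    char buf[32];

    // int -> chars: writes into the caller's buffer, no allocation, no locale
    auto [end, ec] = std::to_chars(buf, buf + sizeof buf, 12345);
    if (ec != std::errc{}) return 1;          // errc{} means success

    // chars -> int: parses the same range back
    int value = 0;
    auto res = std::from_chars(buf, end, value);
    if (res.ec != std::errc{}) return 1;

    std::printf("%.*s -> %d\n", static_cast<int>(end - buf), buf, value);
}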
to/from_chars are designed to be elementary string conversion functions. They have two basic advantages over the alternatives.
They are much lighter weight. They never allocate memory (you allocate memory for them). They never throw exceptions. They also never look at the locale, which also improves performance.
Basically, they are designed such that it is impossible to have faster conversion functions at an API level.
These functions could even be constexpr (they aren't, though I'm not sure why), while the more heavyweight allocating and/or throwing versions can't.
They have explicit round-trip guarantees. If you convert a float/double to a string (without a specified precision), the implementation is required to make it so that taking that exact sequence of characters and converting it back into a float/double will produce a binary-identical value. You won't get that guarantee from snprintf, stringstream or to_string/stof.
This guarantee is only good however if the to_chars and from_chars calls are using the same implementation. So you can't expect to send the string across the Internet to some other computer that may be compiled with a different standard library implementation and get the same float. But it does give you on-computer serialization guarantees.
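A sketch of that round-trip guarantee, assuming your standard library implements the C++17 floating-point overloads of to_chars/from_chars:

#include <cassert>
#include <charconv>

int main()
{
    double original = 0.1;       // not exactly representable in binary
    char buf[64];

    auto [end, ec] = std::to_chars(buf, buf + sizeof buf, original);
    assert(ec == std::errc{});

    double restored = 0.0;
    std::from_chars(buf, end, restored);
    assert(restored == original);  // bit-identical round trip on the same implementation
}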
All these pre-existing methods were bound to work based on a so-called locale. A locale is basically a set of formatting options that specify, e.g., what characters count as digits, what symbol to use for the decimal point, what thousand's separator to use, and so on. Very often, however, you don't really need that. If you're just, e.g., reading a JSON file, you know the data is formatted in a particular way, there is no reason to be looking up whether a '.' should be a decimal point or not every time you see one. The new functions introduced in <charconv> are basically hardcoded to read and write numbers based on the formatting laid out for the default C locale. There is no way to change the formatting, but since the formatting doesn't have to be flexible, they can be very fast…

Safe counterparts of itoa()?

I am converting an old C program to a more secure version. The following functions are used heavily; could anyone tell me their secure counterparts? Either Windows functions or C runtime library functions. Thanks.
itoa()
getchar()
strcat()
memset()
itoa() is safe as long as the destination buffer is big enough to receive the largest possible representation (i.e. of INT_MIN with trailing NUL). So, you can simply check the buffer size. Still, it's not a very good function to use, because if you change your data type to a larger integral type you need to change to ltoa, lltoa, etc. If you want a dynamic buffer that handles whatever type you throw at it with fewer maintenance issues, consider a std::ostringstream (from the <sstream> header).
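For example, a minimal sketch of such a replacement (the helper name to_text is illustrative, not standard):

#include <sstream>
#include <string>

template <typename T>
std::string to_text(T value)
{
    std::ostringstream oss;
    oss << value;        // works unchanged for int, long, long long, doubles, ...
    return oss.str();
}

// usage: std::string s = to_text(12345);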
getchar() has no "secure counterpart" - it's not insecure to begin with and has no buffer overrun potential.
Re memset(): it's dangerous in that it accepts the programmer's judgement that memory should be overwritten without any confirmation of the content/address/length, but when used properly it causes no issue, and sometimes it's the best tool for the job even in modern C++ programming. To check security issues with this, you need to inspect the code and ensure it's aimed at a suitable buffer or object to be zeroed, and that the length is computed properly (hint: use sizeof where possible).
strcat() can be dangerous if the strings being concatenated aren't known to fit into the destination buffer. For example: char buf[16]; strcpy(buf, "one,"); strcat(buf, "two"); is all totally safe (but fragile, as further operations or changing either string might require more than 16 chars and the compiler won't warn you), whereas strcat(buf, argv[0]) is not. The best replacement tends to be a std::ostringstream, although that can require significant reworking of the code. You may get away with using strncat(), or even - if you have it - asprintf(&result, "%s%s", first, second), which will allocate the required amount of memory on the heap (do remember to free() it). You could also consider std::string and use operator+ to concatenate strings.
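A short sketch of those safer alternatives; neither has a fixed buffer to overflow:

#include <sstream>
#include <string>

int main(int argc, char* argv[])
{
    // std::string grows as needed
    std::string joined = std::string("one,") + "two";
    if (argc > 0) joined += argv[0];

    // or build up pieces with an ostringstream
    std::ostringstream oss;
    oss << "prefix: " << joined;
    std::string result = oss.str();
    (void)result;
}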
None of these functions are "insecure" provided you understand the behaviour and limitations. itoa is not standard C and should be replaced with sprintf(buf, "%d", ...) if that's a concern to you.
The others are all fine to the experienced practitioner. If you have specific cases which you think may be unsafe, you should post them.
I'd replace itoa(), because it's not standard, with sprintf or, better, snprintf if your goal is code security. I'd also replace strcat() with strncat(); but since you specified the C++ language too, an even better idea would be to use the std::string class.
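For example, a minimal snprintf-based replacement for itoa() might look like this (the buffer sizing assumes a 32-bit int):

#include <cstdio>

int main()
{
    int value = -2147483647;                      // near INT_MIN; worst case is 11 chars + NUL
    char buf[12];
    std::snprintf(buf, sizeof buf, "%d", value);  // never writes past the end of buf
    std::puts(buf);
}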
As for the other two functions, I can't see how you could make the code more secure without seeing your code.

Why is snprintf faster than ostringstream, or is it?

I read somewhere that snprintf is faster than ostringstream. Has anyone had any experience with it? If so, why is it faster?
std::ostringstream is not required to be slower, but implementations generally are. FastFormat's website has some benchmarks.
The Standard library design for streams supports much more than snprintf does. The design is meant to be extensible, and includes protected virtual methods that are called by the publicly exposed methods. This allows you to derive from one of the stream classes, with the assurance that if you overload the protected method you will get the behavior you want. I believe that a compiler could avoid the overhead of the virtual function call, but I'm not aware of any compilers that do.
Additionally, stream operations often use growable buffers internally, which implies relatively slow memory allocations.
We replaced some stringstreams in inner loops with sprintf (using statically allocated buffers), and this made a big difference, both in MSVC and GCC. I imagine that the memory management of this code:
{
    char buf[100];
    int i = 100;
    sprintf(buf, "%d", i);
    // do something with buf
}
is much simpler than
{
    std::stringstream ss;
    int i = 100;
    ss << i;
    std::string s = ss.str();
    // do something with s
}
But I am very happy with the overall performance of stringstreams.
Some people will tell you that the functions themselves can't be faster than one another, only their implementations can. I think I would agree.
You are unlikely to ever notice a difference outside of benchmarks. The reason C++ streams generally tend to be slower is that they are much more flexible. Flexibility most often comes at the cost of either time or code growth.
In this case, C++ streams are based on stream buffers. In themselves, streams are just the shell that keeps formatting and error flags in place and calls the right I/O facets of the C++ standard library (for example, num_put to print numbers), which print the values, well formatted, into the underlying stream buffer connected to the C++ stream.
All these mechanisms - the facets and the buffers - are implemented with virtual functions. While there is indeed no mandate that those functions be slower than their C stdio counterparts, that indirection will normally make them somewhat slower than using the C stdio functions directly (I benchmarked this some time ago with GCC/libstdc++ and did in fact notice a slowdown - but one you would hardly notice in day-to-day usage).
Absolutely this is implementation-specific.
But if you really want to know, write two small programs, and compare them. You would need to include typical usage for what you have in mind, the two programs would need to generate the same string, and you would use a profiler to look at the timing information.
Then you would know.
One issue would probably be that the type safety added by ostringstream carries extra overhead. I've not done any measurements, though.
As litb said, standard streams support many things we don't always need.
Some stream implementations get rid of this never-used flexibility; see FAStream for instance.
It's quite possible, because sprintf is part of the CRT, which is written in assembly. The ostringstream is part of the STL, is probably written a little more generically, and has OOP code/overhead to deal with.
Yes, if you run the function below on a few million numbers with Visual C++ 5.0, the first version takes about twice as long as the second and produces the same output.
Compiling tight loops into a .exe and running the Windows "timethis something.exe" or the Linux "time something" is how I investigate most of my performance curiosities. (timethis is available on the web somewhere.)
#include <cstdio>    // _snprintf (MSVC); portable code can use snprintf
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;

void Hex32Bit(unsigned int n, string &result)
{
#if 0
    // Stream version: about twice as slow in this test
    stringstream ss;
    ss
        << hex
        << setfill('0')
        << "0x" << setw(8) << n
        ;
    result = ss.str();
#else
    // snprintf version: formats into a small stack buffer ("0x" + 8 hex digits + NUL)
    const size_t len = 11;
    char temp[len];
    _snprintf(temp, len, "0x%08x", n);
    temp[len - 1] = '\0';   // _snprintf may not null-terminate on truncation
    result = temp;
#endif
}
One reason I know that the printf family of functions is faster than the corresponding C++ streams (cout, cin, and the others) is that the latter do typechecking. As this usually involves calls to overloaded operators, it can take some time.
In fact, in programming competitions it is often recommended that you use printf et al rather than cout/cin for precisely this reason.