Overflowing char array overwrites the exact same string every time - why? - c++

I have the following code that shows the dangers of using char arrays over strings:
#include <cstdio>
#include <iostream>
using namespace std;

int main(){
    char password[] = "SECRET";
    char msg[10], ch;
    int i = 0;
    cout << "Please enter your name:";
    while((ch = getchar()) != '\n'){
        msg[i++] = ch;  // no bounds check: this is the overflow being asked about
    }
    msg[i] = '\0';
    cout << "\n\nHello " << msg << endl;
    cout << "The password is " << password;
}
When I enter a name (stored in char msg[10]) that is longer than 16 characters, everything after those 16 characters replaces the value stored in char password[] ("SECRET").
1. Why is this the case? (a general curiosity)
2. Why 16 characters and not 10 - the size of the array?
3. Why is it always password that gets overwritten and not some other variable or some other part of the memory where I wouldn't notice immediately?
4. What's the benefit of using char[] over strings then?
EDIT: Updated with follow up questions:
5. In response to the argument that password and msg are declared next to each other, I shuffled the declaration block as follows:
char password[] = "SECRET";
char ch;
int i = 0;
char msg[10];
However, no change.
6. In response to the argument that it was chance that caused the gap between msg and password to be 6 (bytes?) long, I have recompiled the code many times, including the reshuffling above. Still, no change.
Any suggestions as to why?

The answer for your first three questions is the same: because that's how your compiler chose to lay out these variables on the stack. Nothing in the standard guarantees that - in fact, what you're doing is undefined behavior - anything could happen.
Change compilers, or even compiler settings, and other things might happen. Or not. There's no telling.
As for 4, except for interoperability with C code, or other APIs that require C-style strings, essentially none.

1. In your case, memory is laid out like this:
msg | |i |password
| | | | | | | | | | |1|2|3|4|5|6|S|E|C|R|E|T|\0
Then you write on msg progressively:
msg | |password
|A|Z|E|R|T|Y|U|I|O|P|1|2|3|4|5|6|S|E|C|R|E|T|\0
But if you continue:
msg | |password
|A|Z|E|R|T|Y|U|I|O|P|1|2|3|4|5|6|Q|W|E|R|T|Y|
This happens because a char array doesn't check its length. (Search for "buffer overflow".)
2. As you write to memory, you overwrite everything in between, maybe i or something that doesn't belong to your program.
3. So it takes 6 chars before you overwrite password. It could just as well have been 0 chars or millions.
4. Unless you need to store a fixed-size array of bytes... nothing. That is the point this code proves.
UPDATE:
Changing the order of the declarations won't change the padding. Add a variable or an array, or better, use a different compiler, so that the binary changes even after optimisation.
Recompiling will not change the binary produced, because the compiler will do the exact same thing.
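To see what layout a particular build actually chose, the addresses of the two arrays can be compared directly. The following is only a sketch: describe_layout is a hypothetical helper, and the distance it reports is specific to one compiler and one set of flags, so nothing about the number should be relied on.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: reports where two locals ended up in this build.
std::string describe_layout() {
    char password[] = "SECRET";
    char msg[10] = {};

    // Convert the addresses to integers before comparing them;
    // subtracting pointers into two different arrays is itself undefined.
    const auto msg_addr = reinterpret_cast<std::uintptr_t>(&msg[0]);
    const auto pw_addr = reinterpret_cast<std::uintptr_t>(&password[0]);

    std::ostringstream out;
    out << "sizeof(msg) = " << sizeof msg << '\n';
    if (pw_addr > msg_addr) {
        out << "password starts " << (pw_addr - msg_addr) << " bytes above msg\n";
    } else {
        out << "password starts " << (msg_addr - pw_addr) << " bytes below msg\n";
    }
    return out.str();
}
```

Printing the result of describe_layout() under different compilers or optimisation levels is one way to check whether the gap stays the same or moves.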

Your two arrays msg and password are local variables with automatic storage, so both have been placed on the stack, meaning they're near each other.
The specifics are implementation dependent and are likely to change between compilers and optimisation levels. It's possible that the compiler has padded the stack a bit when allocating memory and there is a 16 byte gap between msg[0] and password[0].
password gets overwritten every time because it just happens to be above msg on your stack. If you used a different compiler, or swapped their positions around in code, it might not be. How things are allocated on the stack isn't going to change between executions; it's determined at compile time (it's static), not runtime.
Note that, in principle, the compiler is free to do anything it wants! We can only make educated guesses about what'll happen given typical compiler behaviour.
If you really want to know what's going on, you have to look at the output assembly.
std::string (for C++) is usually preferable to char[] - it's far safer as it manages its own memory, grows as needed, and offers bounds-checked access through at().

1) writing outside an array will access something else.
2) alignment probably.
3) chance. anything can happen.
4) nothing!

Related

Char array returns four times more data than expected

Before I continue, here's the code:
#include <iostream>
using namespace std;
int main() {
char array[] = {'a','b','c'};
cout << array << endl;
return 0;
}
My system:
VisualStudio 2019, default C++ settings
Using Debug build instead of release
When I run this code sample, I get something like this in my console output:
abcXXXXXXXXX
Those X's represent seemingly random characters. I know they're from existing values in memory at that address, but I don't understand why I'm getting 12 bytes back instead of the three from my array.
Now, I know that if I were doing this with ints, which are four bytes long, maybe this would make sense, but sizeof(array) returns three (i.e. three bytes long; I know the sizeof(array) / sizeof(array[0]) trick). And when I do try it with ints, I'm even more confused because I get some four-byte hex number instead (maybe a memory address?)
This may be some trivial question, I'm sorry, but I'm just trying to figure out why it behaves like this. No vectors please, I'm trying to stay as non-STL as possible here.
cout takes this char array and addresses it as a null-terminated string.
Since the terminating character in this array is not the null character (i.e., char(0)), it attempts to print until encountering the null character.
At this point, it attempts to read memory outside of the array which you have allocated, and technically, anything could happen.
For example, there can be different data in that memory every time the function is called, or the memory access operation may even be illegal, depending on the address where array was allocated at the time the function was called.
So the behavior of your program is generally considered undefined (or non-deterministic).

Purpose of char a[0] in converting integer to string using itoa()

I have this code,
char a[0];
cout << "What is a? " << a << endl;
char *word = itoa(123,a,10);
string yr = string(word);
but I have trouble comprehending the array a[0]. I tried to change its size to see if anything changes, but it seems to make no difference at all.
For example, even if I change a[0] to a[1], or any other integer, the output stays the same:
char a[1];
cout << "What is a? " << a << endl;
char *word = itoa(123,a,10);
string yr = string(word);
What is its purpose here?
Since the itoa function is non-standard, this answer discusses the popular signature itoa(int, char*, int).
Second parameter represents a buffer into which a null-terminated string representing the value is copied. It must provide enough space for the entire string: in your case, that is "123", which takes four characters. Your code passes a[] as the buffer, but the size of a[] is insufficient to accommodate the entire "123" string. Hence, the call causes undefined behavior.
You need to make a large enough to fit the destination string. Passing a buffer of size 12 is sufficient to accommodate the longest decimal number that can be produced by itoa on a 32-bit system (i.e. -2147483648). Replace char a[0] with char a[12] in the declaration.
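The same conversion can be sketched with the standard std::snprintf in place of the non-standard itoa (a substitution made here for portability, not something the question used). int_to_text is a hypothetical name, and the buffer size of 12 assumes a 32-bit int.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: converts an int to its decimal text.
std::string int_to_text(int value) {
    char buffer[12];  // room for "-2147483648" plus '\0', assuming 32-bit int
    // snprintf never writes more than sizeof buffer bytes, terminator included.
    std::snprintf(buffer, sizeof buffer, "%d", value);
    return std::string(buffer);
}
```

If a std::string is the end goal anyway, std::to_string(123) does the whole job without a hand-sized buffer.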
What is its purpose here?
A zero-length array is an array with no elements in it.
You can't [legally] print or modify its contents, because it doesn't have any.
There are arcane reasons to want to use one, but speaking generally it has no purpose for you. It's not even allowed by the standard (although compilers tend to support it for those arcane reasons).
even if I change a[0] to a[1], or any other integer, the output still makes no difference
Well, if you have an array with n elements in it, and you write more than n elements' worth of data to it, that's a "buffer overrun" and has undefined behaviour. It could appear to work as you overwrite somebody else's memory, or your program could crash, or your dog could suddenly turn into a zombie and eat you alive. Best avoided tbh.

cin >> writing out of range?

I have a code
char s[5];
cin >> s;
cout << strlen(s);
cout << endl;
cout << s;
It works even if I input more than 5 chars, for example "qwertyui". Does it mean that I am using not allocated memory?
strlen(s)
returns some value, but it has nothing to do with 5. strlen applies to C strings, which are char arrays whose length is defined as the number of characters before the first zero byte.
Now, cin in your second line cannot know how long your char[] is, so it just accepts as much input as there is. You must never use fixed-size char buffers for input you can't be sure is well-formed. What you're seeing is a buffer overflow in action. Writing over memory that doesn't belong to any variable you allocated results in undefined behaviour, so your program might appear to work, crash with e.g. a segfault (accessing memory that the OS never gave you), overwrite existing parts of your process's memory, or … just do anything, because it's really undefined.
Since you're writing C++, not C, just use
string s;
cin >> s;
cout << s.length()
<< endl
<< s;
to avoid dealing with the (very dangerous) C strings.
You're right, it might still echo correctly if you write more than 5 characters. You're simply writing off the end of the buffer, and just blasting the memory that's next to the memory allocated for char s[5]. This is bad for many reasons, including security vulnerabilities. See here for details.
If you can't use string (for whatever reason), use fgets. See here for the documentation on fgets and how it is used. NEVER USE gets. It's almost equivalent to what you've done above, see here for why gets is so dangerous.
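If the char buffer has to stay, the extraction itself can be bounded. The sketch below uses std::setw, which limits operator>> on a char array to width - 1 characters plus the terminator; read_short_word is a hypothetical helper wrapping the question's char s[5].

```cpp
#include <iomanip>
#include <istream>
#include <string>

// Hypothetical helper: reads at most 4 characters of the next word.
std::string read_short_word(std::istream& in) {
    char s[5] = {};
    // The width caps the extraction at sizeof s - 1 characters plus '\0'.
    in >> std::setw(static_cast<int>(sizeof s)) >> s;
    return std::string(s);
}
```

For the input "qwertyui" the first call returns "qwer" and leaves "tyui" in the stream for the next read, rather than writing past the end of s.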

Buffer overflow - The changes of variables

void go()
{
//{1}
char buffer[2];
gets(buffer);
//{2}
cout << allow;
}
I tried to run the procedure above in 2 cases:
-1st: I declare "int allow;" at position 1
-2nd: I declare "int allow;" at position 2
In both cases, when I entered the string "123" (without the quotation marks), the value of allow was 51.
However, from what I read about the memory layout, only in the first case is "allow" placed before buffer on the stack, which would mean that only then does a string longer than the buffer change the value of "allow".
Then I tried declaring "char sth[10]" at both positions. This time, its value was changed only when I declared sth at position 1.
Can anyone explain what happened?
Since changing allow via overflow is Undefined Behavior, the compiler might not even have a variable allow at all and might change your code to cout << 0 instead when compiling with optimization. This is not a valid way to check for overflow, regardless of where you put allow.
To emphasize: All changes of allow you observe are the result of UB. There are no guarantees on this in the standard whatsoever. You can go ahead and speculate on why you see this output today, on your system, with this very toolchain, but the outcome might change to anything (like your program mowing your lawn or stealing the crown jewels) for any reason.
Indeed, there is no way to use gets safely. This is why it has been removed from both the current C and C++ standards.
You can use std::string and std::getline instead:
string buffer;
std::getline(std::cin, buffer);

What are the potential security vulnerabilities? C++

My boss told me to look at the following code and tell him what the potential security vulnerabilities were. I'm not very good at this kind of thing, since I don't think in the way of trying to hack code. All I see is that nothing is declared private, but other than that I just don't know.
#define NAME_SIZE (unsigned char) 255
// user input should contain the user’s name (first name space
// middle initial space last name and a null
// character), and was entered directly by the user.
// Returns the first character in the user input, or -1 if the method failed.
char poor_method(char* user_input, char* first, char *middle, char* last)
{
char*buffer;
char length;
// find first name
buffer = strtok(user_input, " ");
if(buffer==0)
{
return -1;
}
length = strlen(buffer);
if(length <= NAME_SIZE)
{
strcpy(first, buffer);
}
// find middle name
buffer = strtok(NULL, " ");
if(buffer==0)
{
return-1;
}
if(middle)
*middle = buffer[0];
// find last name
buffer = strtok(NULL, "\0");
length = strlen(buffer);
if(length <= NAME_SIZE)
{
strcpy(last, buffer);
}
// Check to make sure that all of the user input was used
buffer = strtok(NULL, "\0");
if(buffer != NULL)
{
return-1;
}
return first[0];
}
What security vulnerabilities are there?
Get good at writing secure code
You most likely don't want systems that you are responsible for finding their way onto bugtraq or cve. If you don't understand it, be honest with your boss. Tell him you don't understand and you want to work on it. Pick up Writing Secure Code. Read it, learn it, love it. Asking this question on SO and giving your boss the answer definitely doesn't help you in the long run.
Then look at the sample code again :)
What I saw (by no means a complete list):
There's no guarantee you're going to get a char pointer that points to a null-terminated string (unless you're allowed to make that assumption, which is not really a safe one to make).
strtok and strcpy are the C way of doing things and come with the fun stuff of programming C code. If you must use them, so be it (just make sure you can guarantee your inputs to these functions will indeed be valid). Otherwise, try switching your code to use std::string and the "C++ way" (as Cat Plus Plus put it).
I'm assuming this is a typo:
charpoor_method(
You're missing a space between char and poor_method(
You're not checking if first or last are indeed valid pointers (unfortunately, the best you can do is to check them against NULL).
There's no guarantee that the buffers first or last can indeed hold whatever you're copying to them.
Another typo:
returnfirst[0];
missing space between return and first[0]
Learning to write secure code is something that's very important to do. Follow Brecht's advice and get good at it.
OK, strtok assumes user_input is null-terminated; this might not be true.
charlength = strlen(buffer);
if(length <= NAME_SIZE)
{
strcpy(first, buffer);
}
charlength here is undeclared, and so is length; they should be declared as unsigned int.
strlen won't count the '\0' as part of the length, so if buffer holds 255 characters plus the '\0', strcpy will later copy the '\0' into whatever comes after first.
It is also unknown what size the buffer behind char *first is. It should be NAME_SIZE, in which case the comparison should be
length <= NAME_SIZE - 1
or first should be allocated with NAME_SIZE + 1 bytes.
I'd probably rewrite the whole thing; it is quite ugly.
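The off-by-one described above goes away if the copy is told the destination size and counts the terminator itself. A minimal sketch, assuming the caller can supply that size; copy_if_fits and its dst_size parameter are hypothetical and not part of the original function.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into dst only if it fits, '\0' included.
bool copy_if_fits(char* dst, std::size_t dst_size, const char* src) {
    if (dst == nullptr || src == nullptr || dst_size == 0) {
        return false;
    }
    // std::size_t, not char: a char cannot hold every possible length.
    const std::size_t length = std::strlen(src);
    if (length >= dst_size) {  // length + 1 bytes are needed for the '\0'
        return false;
    }
    std::memcpy(dst, src, length + 1);
    return true;
}
```

The function still trusts that src is null-terminated, so it addresses the destination side of the problem only.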
Rather than using strcpy(), use strncpy() with an explicit length parameter. Like strtok(), strcpy() assumes a null-terminated source buffer, and that may not be the case, giving you a buffer overflow in the buffer pointed to by either first or last. Additionally, you have no idea how long the buffers allocated for first and last are. Don't assume that the caller of your function has allocated enough memory to copy into unless they've passed you a parameter telling you how much room the buffers have. Otherwise you could (and most likely will) end up with buffer overflows.
Also you may want to use the restrict keyword if you're using C99 in order to prevent the caller of your function from aliasing the same memory location for buffer, first, and last.
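For comparison, here is a rough sketch of the "C++ way" mentioned in the answers, using std::string and std::istringstream in place of strtok and strcpy. ParsedName and parse_name are hypothetical names, and unlike the original this version rejects input with more than three space-separated fields, which is a design choice made for the sketch.

```cpp
#include <sstream>
#include <string>

struct ParsedName {  // hypothetical type, not part of the original code
    std::string first;
    char middle_initial = '\0';
    std::string last;
};

// Returns true only for input of the form "first middle last".
bool parse_name(const std::string& input, ParsedName& out) {
    std::istringstream in(input);
    std::string first, middle, last, extra;
    if (!(in >> first >> middle >> last)) {
        return false;  // fewer than three fields
    }
    if (in >> extra) {
        return false;  // unexpected trailing input
    }
    out.first = first;
    out.middle_initial = middle[0];  // middle is non-empty after a successful >>
    out.last = last;
    return true;
}
```

Because every field is a std::string, there are no fixed-size buffers to overflow and no caller-supplied pointers to validate, and success is reported separately from the data instead of through a char that doubles as an error code.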