I'm doing an intensive course in C++, and I'm the kind of person who wants to understand everything under the sun before I can continue with my homework.
char anything[5]{};
cin >> anything;
cout << anything << endl;
I can write 'Hello' in the console and it's all good. Writing 'H' is fine too, and so is 'Hel'. But what shouldn't be right is writing 'Helloo' or 'Hellooo', and yet it works! The moment I write 'Helloooooooooooooooooooooooooooooooo' the program crashes, which is fine. The same thing happens when using numbers.
Why is this happening? I'm using CodeLite.
To my understanding, if I write over 5 characters the program shouldn't allow it, but it does.
What's the explanation behind this?
[trigger warning: car crashes, amputations, cats]
if I write over 5 characters the program shouldn't allow it, but it does.
That's an overstatement.
The language doesn't allow it, but the compiler and runtime don't enforce that. Performing bounds checking all the time just to catch programmer errors would cost performance, penalising those who didn't make any.
So, you broke a contract and what happens next is undefined. You could fall foul of some "optimisation" in the compiler that assumes you wrote a correct program; you could overwrite some memory at runtime and get a crash; you could overwrite some memory at runtime and get an apparently "valid" string up to the next null byte that happens to be there; or you could open a wormhole to the local cattery. You just don't know.
This is called undefined behaviour.
An analogy would be driving over the speed limit. Your car won't stop you from speeding, but the law says not to, and if you do it anyway you may get arrested, or run someone over, or lose control then crash into a tree and lose all your limbs. Generally it's best not to take the chance.
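For what it's worth, here is a minimal sketch (not from the question, just one common fix) showing two ways to keep the read inside the buffer: std::setw limits how many characters operator>> will store in a char array, and std::string avoids the fixed-size buffer altogether.

#include <iomanip>   // std::setw
#include <iostream>
#include <string>

int main() {
    char anything[5]{};
    std::cin >> std::setw(5) >> anything;   // stores at most 4 chars plus '\0'
    std::cout << anything << '\n';

    std::string word;                       // no fixed limit; grows as needed
    std::cin >> word;
    std::cout << word << '\n';
}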
Related
Every C string is terminated with the null character when it is stored in a C-string variable, and this always consumes one array position.
So why is this legal:
char name[3];
std::cin >> name;
when a 3 letter word "cow" is input to std::cin?
Why is it allowing this?
The compiler cannot know what input you will be giving at runtime, so it cannot say that you will be doing something illegal. If the input fits in the array, then it's legal.
Prior to C++20:
The behaviour of the program is undefined. Never do this.
Since C++20:
The input will be truncated to fit into the array, so the array will contain the string "co".
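A minimal sketch of the difference (assuming the "cow" input from the question): with a C++20 standard library the extraction stops after size - 1 characters and null-terminates the array; before C++20 the same code overruns the array and is undefined.

#include <iostream>

int main() {
    char name[3];
    std::cin >> name;            // input "cow": C++20 stores "co"; pre-C++20 this is UB
    std::cout << name << '\n';
}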
It is allowed because the memory management is left to the programmer.
In this case the error occurs at run-time and it's impossible for the compiler to spot it at compile time.
What you are getting here is undefined behaviour: the program may crash, carry on smoothly, or exhibit some odd behaviour later.
Most likely you are overflowing the allocated memory by one byte, and that byte ends up in the first byte of the next declared variable, or at some other address which may or may not already hold relevant information.
You may eventually see the execution abort if, for example, some read-only memory area is overwritten, or if you point the instruction pointer to some invalid area (again, by overwriting its value in memory).
It depends on what you mean by "legal". In the context of your question, there is no precise meaning.
According to the standard (at least before C++20) your code (if supplied with that input) has undefined behaviour. When behaviour is undefined, any observable outcome is permitted, and no diagnostics are required.
In particular, there is no requirement that an implementation (the compiler, the host system, or anything else) diagnose a problem (e.g. issue an error message), take preventative action (e.g. electrocute the programmer for entering "cow" before hitting the enter key), or take recovery action (e.g. truncate the input to "co").
The reason the standard allows such things is a combination of:
allowing implementation freedoms. For example, the standard permits but does not require your host system to electrocute the user for entering bad input to your program - but doesn't prevent an evil-minded system developer providing hardware and driver support to do exactly that;
technical infeasibility or technical difficulty. In a simple case like yours the problem might be easy to detect. But in more complicated programs (e.g. your code is buried away among other code) it might be impossible to identify, or possible but taking much longer than a programmer is willing or able to spend waiting.
In short, the C++ standard does not coddle the programmer. The programmer - not the standard, not the compiler - is responsible for avoiding undefined behaviour, and responsible if a program with undefined behaviour does undesirable things.
Even though it is bad practice, is there any way the following code could cause trouble in real life? Note that I am only reading out of bounds, not writing:
#include <iostream>
int main() {
    int arr[] = {1, 2, 3};
    std::cout << arr[3] << '\n';
}
As mentioned, it is not "safe" to read beyond the end of the stack. But it sounds like you're really trying to ask "what could go wrong?", and typically the answer is "not much". Your program would ideally crash with a segfault, but it might just keep on happily running, unaware that it's entered undefined behavior. The results of such a program would be garbage, of course, but nothing's going to catch on fire (probably...).
People mistakenly write code with undefined behavior all the time, and a lot of effort has been spent trying to help them catch such issues and minimize their harm. Programs run in user space cannot affect other programs on the same machine thanks to isolated address spaces and other features, and software like sanitizers can help detect UB and other issues during development. Typically you can just fix the issue and move on to more important things.
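As a concrete illustration of the sanitizer point (a sketch assuming a gcc or clang toolchain): the snippet from the question, rebuilt with AddressSanitizer, typically gets the out-of-bounds read reported at runtime instead of silently producing garbage.

// Build (assumed toolchain):  g++ -g -fsanitize=address oob.cpp -o oob
#include <iostream>

int main() {
    int arr[] = {1, 2, 3};
    std::cout << arr[3] << '\n';   // ASan typically reports a stack-buffer-overflow here
}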
That said, UB is, as the name suggests, undefined. Which means your computer is allowed to do whatever it wants once you ask it to execute UB. It could format your hard drive, fry your processor, or even "make demons fly out of your nose". A reasonable computer wouldn't do those things, but it could.
The most significant issue with a program that enters UB is simply that it's not going to do what you wanted it to do. If you are trying to delete /foo but you read off the end of the stack you might end up passing /bar to your delete function instead. And if you access memory that an attacker also has access to you could wind up executing code on their behalf. A large number of major security vulnerabilities boil down to some line of code that triggers UB in just the wrong way that a malicious user can take advantage of.
Depends on what you mean by stack. If it is the whole stack, then no, you can't do that, it will lead to a segmentation fault. Not because there is the memory of other processes there (that's not how it works), but rather because there is NOTHING there. You can heuristically see this by looking at the various addresses the program uses. The stack for example is at ~0x7f7d4af48040, which is beyond what any computer would have as memory. The memory your program sees is different from the physical memory.
If you mean reading beyond the stack frame of the current function: yes, you can technically do that safely. Here is an example:
#include <cstddef>    // size_t
#include <cstdlib>    // exit
#include <iostream>

void stacktrace() {
    std::cerr << "Received SIGSEGV. Stack trace:\n";
    void** bp;
    // Grab the current frame pointer (rbp) so the loop below can walk the
    // chain of saved frame pointers up the stack.
    asm(R"(
        .intel_syntax noprefix
        mov %[bp], rbp
        .att_syntax
    )"
    : [bp] "=r" (bp));
    size_t i = 0;
    while (true) {
        std::cerr << "[" << i++ << "] " << bp[1] << '\n';  // bp[1] holds the return address
        if (bp > *bp) break;   // stop once the chain no longer moves up the stack
        bp = (void**) *bp;     // follow the saved rbp into the caller's frame
    }
    exit(1);
}
This is a very basic program I wrote to see whether I could manually generate a stack trace. It might not be obvious if you are unfamiliar, but on x64 the address contained in rbp is the base of the current stack frame. In C++, the stack frame would look like:
return pointer
previous value of rbp [rbp = base pointer] <- rbp points here
local variables (may be some other stuff like a stack cookie)
...
local variables <- rsp [rsp = stack pointer] points here
The addresses decrease the lower you go. In the example I gave above, you can see that I get the value of rbp and, from there, read memory beyond the current stack frame. So you can read memory beyond the stack frame, but you generally shouldn't, and even if you can, why would you want to?
Note: Evg pointed this out. If you read some object beyond the stack, that will probably trigger a segfault depending on the object type, so this should only be done if you are very sure of what you're doing.
If you don't own the memory, or you do own it but you haven't initialized it, you are not allowed to read it. This might seem like a pedantic and useless rule. After all, the memory is there and I am not trying to overwrite anything, right? What is a byte among friends, let me read it.
The point is that C++ is a high-level language. The compiler only tries to interpret what you have coded and translate it to assembly. If you type in nonsense, you will get out nonsense. It's a bit like forcing someone to translate "askjds" from English to German.
But does this ever cause problems in real life? I roughly know what asm instructions are going to be generated. Why bother?
This video talks about a bug in Facebook's string implementation where they read a byte of uninitialized memory which they did own, but it nevertheless caused a very difficult-to-find bug.
The point is that silicon is not intuitive. Do not try to rely on your intuitions.
I have just found a problem and I have no idea what it could be. I started learning programming a few weeks ago and I am learning about pointers.
I compiled exactly the same code in 2 different PC's. In the first, the program runs perfectly. In the second, it stops working when it reaches a certain line.
I use 2 PC's.
The one at my workplace runs Windows XP SP3. In this one, the program worked fine.
The one at my home runs Windows 7 SP1. It compiled the code, but the program did not work.
I am writing and compiling using DEV C++ and TDM GCC 5.1.0 in both systems.
#include <iostream>
using namespace std;

int main(void) {
    int* pointer;
    cout << "pointer == " << pointer << "\n";
    cout << "*pointer == " << *pointer << "\n"; // this is the line where the program stops.
    cout << "&pointer == " << &pointer << "\n";
    return 0;
}
The output in the first computer was something like:
pointer == 0x000001234
*pointer == some garbage value
&pointer == 0x000007865
In the second computer, it stops at second line.
pointer == 0x1
I do understand that the pointer has not been assigned to a variable. Therefore, it does not store any correct address. Even so, it should at least show the garbage value inside it, or a "0" to indicate it does not yet have an address to point to. I know the code is right because it worked fine in the first PC. But I do not understand why it failed on the other computer.
I know the code is right because it worked fine in the first PC
You know no such thing.
You have undefined behaviour, and one entirely valid consequence is a program that always works. Or always works except on Saturdays, or always works until after you finished testing and shipped it to a paying customer, or always works on one machine and always fails on another.
The behaviour is undefined, not "defined to some specific consistent observable mode of failure".
Specifically, the real risk of undefined behaviour isn't simply that the result of some operation has an unspecified value, but that it may have undefined and unpredictable side-effects - on apparently-unrelated areas of your program, or on the system as a whole.
Even so, it should at least show the garbage value inside it
It did. But then you asked it to dereference that garbage value.
Reading any variable with an unspecified value is itself Undefined Behaviour, so the first piece of UB is reading the value of the pointer.
Following (dereferencing) a pointer which doesn't point to a valid object is also undefined behaviour, because you don't know whether the unspecified value you illegally interpreted as an address is correctly aligned for the type, or is mapped in your process' address space.
If you successfully load some integer from that address, that is a third piece of undefined behaviour, because again its value is unspecified.
So, the worst-case immediate pitfalls (with hardware trap values and restrictive alignment) are:
read the unspecified pointer value, get a trap representation, die with a hardware trap
OR read the unspecified pointer value, interpret it as an address which is misaligned, die with a bus error
OR follow the unspecified pointer to an unmapped address, die with a segment violation
OR survive all the previous steps - by pure chance - load some random value from some location in memory. Then die because that value is a trap representation.
But if your process just dies, reproducibly, you can easily debug and fix it with no ill effects. In that sense, crashing at the point of invoking UB is actually the best possible outcome. The alternatives are worse, less predictable, and harder to debug.
I do understand that the pointer has not been assigned to a variable. Therefore, it does not store any correct address. Even so, it should at least show the garbage value inside it, or a "0" to indicate it does not yet have an address to point to.
It did! That was the 0x000001234.
Unfortunately you then tried to dereference this invalid pointer, and print the value of an int that does not exist. You cannot do that.
If you hadn't done that, we'd have made it to the third line, where the 0x000007865 would correctly represent the address of the pointer, which is an object with name pointer and type int* that does indeed exist.
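For reference, a minimal corrected sketch (mine, not the asker's code): give the pointer a real object to point at before dereferencing it, and all three lines print meaningful values.

#include <iostream>
using namespace std;

int main() {
    int value = 42;
    int* pointer = &value;                       // now points at an int that exists
    cout << "pointer == " << pointer << "\n";    // address of value
    cout << "*pointer == " << *pointer << "\n";  // 42, no undefined behaviour
    cout << "&pointer == " << &pointer << "\n";  // address of the pointer itself
    return 0;
}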
I know the code is right because it worked fine in the first PC.
One of the things you'll have to get used to with C++ is that "it appears to work on one computer" is very far from proof that the code is correct. Read about undefined behaviour and weep slow tears.
But I do not understand why it failed on the other computer.
Because the code isn't right, and you didn't get "lucky" this time.
We could analyse a few reasons why it appeared to work on one system and not the other, and there are reasons for that. But it's late, and you're just starting out, and since this is undefined behaviour it doesn't matter. :)
This one is WTF city.
The below program is crashing after a few thousand loops.
void doTurn();   // defined elsewhere in the game

unsigned long int nTurn = 1;
bool quit = false;

int main() {
    while (!quit) {
        doTurn();
        ++nTurn;
    }
}
That's, of course, simplified from my game, but nTurn is at the moment used nowhere except in that increment, and when I comment out the ++nTurn line, the program will reliably loop forever. Shouldn't it run into the millions?
WTF, stackoverflow?
Your problem is elsewhere.
Some other part of the program is reading from a wild pointer that ends up pointing to nTurn, and when this loop changes the value the other code acts differently. Or there's a race condition, and the increment makes this loop take just a tiny bit longer so the race-y thing doesn't cause trouble. There are an infinite number of things you could have wrong elsewhere.
Can you run your program under valgrind? Some errors it won't find, but a lot it will.
This may sound silly, but if I were looking at this, I'd perhaps output the nTurn variable and see if it always crashes at the same value. Then, perhaps, initialize nTurn to that value and see if that also causes the crash. You could always run this under a debugger and see what is happening with the various registers and so forth. Did you try different compilers?
I'd use a debugger to catch the fault and see the value of nTurn. Or if you have core dump from the crash load it into the debugger to see the var values at the crash time.
One more point: could the problem be when nTurn wraps around and goes to zero?
++nTurn can't be the source of the crash directly. You may have some sort of buffer overflow causing the memory for the nTurn variable to be accessed by pointer arithmetic when it shouldn't be. That would cause weird behavior when combined with the incrementing.
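To illustrate the kind of bug being suggested, here is a hypothetical sketch (not the asker's code): an off-by-one write through a nearby array can silently land on an unrelated variable such as a turn counter, which is exactly the sort of thing that makes a crash appear or disappear when ++nTurn is commented out.

#include <iostream>

unsigned long int nTurn = 1;
int scores[4] = {0, 0, 0, 0};   // hypothetical game data living near nTurn

int main() {
    // Off-by-one bug: the loop also writes scores[4], which is out of bounds.
    // Depending on how the compiler laid out these globals, that write may
    // clobber nTurn (or anything else): classic undefined behaviour.
    for (int i = 0; i <= 4; ++i)
        scores[i] = -1;
    std::cout << nTurn << '\n';  // may print garbage, crash, or look perfectly fine
}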
const char* src = "hello";
Calling strlen(src) returns 5...
Now say I do this:
char* dest = new char[strlen(src)];
strcpy(dest, src);
That doesn't seem like it should work, but when I output everything it looks right. It seems like I'm not allocating space for the null terminator on the end... is this right? Thanks
You are correct that you are not allocating space for the terminator; however, failing to do so will not necessarily cause your program to fail. You may be overwriting information that follows on the heap, or your heap manager may round the allocation size up to a multiple of 16 bytes or something similar, so you won't necessarily see any visible effect of this bug.
If you run your program under Valgrind or other heap debugger, you may be able to detect this problem sooner.
Yes, you should allocate at least strlen(src)+1 characters.
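A minimal corrected sketch of the snippet from the question: allocate strlen(src) + 1 bytes so the terminator fits, and release the buffer when done.

#include <cstring>
#include <iostream>

int main() {
    const char* src = "hello";
    char* dest = new char[std::strlen(src) + 1];  // +1 for the '\0'
    std::strcpy(dest, src);
    std::cout << dest << '\n';
    delete[] dest;
}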
That doesn't seem like it should work, but when I output everything it looks right.
Welcome to the world of Undefined Behavior. When you do this, anything can happen. Your program can crash, your computer can crash, your computer can explode, demons can fly out of your nose.
And worst of all, your program could run just fine, inconspicuously looking like it's working correctly, until one day it starts spitting out garbage because it's overwriting sensitive data somewhere, all because someone somewhere allocated one too few characters for their array. Now you've corrupted the heap and you get a segfault a million miles away, or, even worse, your program happily chugs along with a corrupted heap, your functions operate on corrupted credit card numbers, and you're in huge trouble.
Even if it looks like it works, it doesn't. That's Undefined Behavior. Avoid it, because you can never be sure what it will do, and even when what it does when you try it is okay, it may not be okay on another platform.
The best description I have read (it was on Stack Overflow) went like this:
If the speed limit is 50 and you drive at 60, you may get lucky and not get a ticket, but one day, maybe not today, maybe not tomorrow, but one day, that cop will be waiting for you. On that day you will pay, and you will pay dearly.
If somebody can find the original I would much rather point at that; they were much more eloquent than my explanation.
strcpy will copy the null terminator as well as all of the other chars.
So you are copying strlen("hello") + 1 characters, which is 6, into a buffer whose size is 5.
So you have a buffer overflow here, and overwriting memory that is not your own will have undefined results.
Alternatively, you could also use dest = strdup(src) which will allocate enough memory for the string + 1 for the null terminator (+1 for Juliano's answer).
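A short sketch of that alternative (strdup is POSIX rather than standard C++, though most toolchains provide it; note it allocates with malloc, so the buffer must be released with free, not delete[]):

#include <cstdlib>   // free
#include <cstring>   // strdup (POSIX; exposed by most toolchains)
#include <iostream>

int main() {
    const char* src = "hello";
    char* dest = strdup(src);    // allocates strlen(src) + 1 bytes and copies
    if (dest) {
        std::cout << dest << '\n';
        free(dest);              // strdup uses malloc, so pair it with free
    }
}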
This is why you should always, always, always run valgrind on any C program that appears to work.
Yeah, everyone has covered the major point; you are not guaranteed to fail. The fact is that the null terminator is usually 0 and 0 is a pretty common value to be sitting in any particular memory address. So it just happens to work. You could test this by taking a set of memory, writing a bunch of garbage to it and then writing that string there and trying to work with it.
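A rough sketch of that experiment (my own construction, kept within one buffer so the demonstration itself stays well-defined): pre-fill the memory with non-zero bytes, copy the characters without a terminator, and watch strlen run past the intended end of the string.

#include <cstring>
#include <iostream>

int main() {
    char buf[16];
    std::memset(buf, 'X', sizeof buf);   // garbage instead of convenient zeros
    buf[15] = '\0';                      // keep the experiment itself well-defined
    std::memcpy(buf, "hello", 5);        // copy 5 chars, deliberately no '\0' after them
    // With no terminator at buf[5], strlen keeps scanning and reports 15, not 5.
    // The "looks right" effect in the question relied on the byte after the
    // copy happening to be 0 by luck.
    std::cout << std::strlen(buf) << '\n';
}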
Anyway, the major issue I see here is that you are talking about C but you have this line of code:
char* dest = new char[strlen(src)];
This won't compile in any standard C compiler. There's no new keyword in C. That is C++. In C, you would use one of the memory allocation functions, usually malloc. I know it seems nitpicky, but really, it's not.