Reading off the end of an array: running in terminal vs. debugger - c++

I encountered an error in my code where an if() statement was checking a value off the end of an array, i.e.,
int arrayX[2];
if (arrayX[2])
    FunctionCall();
This was leading to a function call that, for reasons related to the length of the above array, tried to subscript a vector with an out-of-bounds index, causing the error. However, the error only occurred when running under the Xcode debugger; whenever I ran under the terminal it didn't happen. This leads me to suspect that when I run under the terminal, memory outside the array is being zeroed or tends to be zero for some other reason. The if statement gets tested for 80 different 'faulty' arrays per cycle, so it seems unlikely that it's a coincidence that it never pops up under the terminal.
Just to be clear, my question is: why would unallocated or unrelated memory hold zeroes when run under the terminal but not when run under a debugger?

Many debuggers fill unused memory with some distinct pattern, so that exactly the behaviour you describe happens.
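For the code in the question, one way to make the off-the-end read impossible, regardless of what the surrounding memory happens to hold, is to switch to std::array with a checked accessor. A minimal sketch, with FunctionCall standing in for the real function:
#include <array>
#include <iostream>
#include <stdexcept>

void FunctionCall() { std::cout << "called\n"; } // stand-in for the real function

int main() {
    std::array<int, 2> arrayX{}; // valid indices are 0 and 1, both value-initialised to 0
    try {
        if (arrayX.at(2))        // .at() checks the index; arrayX[2] would be undefined behaviour
            FunctionCall();
    } catch (const std::out_of_range& e) {
        std::cout << "out of range: " << e.what() << '\n';
    }
}
With the checked accessor the mistake surfaces as an exception in every build, rather than depending on whether the debugger's fill pattern happens to be non-zero.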

What exactly is the question?
Whatever the question, the answer is likely: the implementation can do that if it wants to. The behavior of the sample code is undefined, so the resulting program's behavior is wholly unpredictable.

You can't really tell what data lies outside the array. If anything were going to zero that part of memory, it would more likely be the Xcode debugger than the terminal, so it's surprising that you had no problems in the terminal.
You said "The if statement gets tested for 80 different 'faulty' arrays per cycle"; consider this: are you sure those "different" faulty arrays actually reside in different areas of RAM (if it's static data, the compiler may put it in one place and reuse it)? Also, the compiler may optimize your code in ways that change how that memory is handled.

Related

Underlying reasons for different output after removing a print statement in an "out of range of vector error"

I have a minimal piece of code that causes said behavior:
vector<int> result(9);
int count = 0;
cout << "test1\n"; // removing this line causes 'core dump'
for (int j = 0; j < 12; j++)
    result[count++] = 1;
cout << "test2\n";
result is a vector of size 9, and inside the 'for' loop I am accessing elements out of range.
Now, removing the test1 line, the code runs without any errors; but with this cout line, I get
* Error in `./out_of_range_vector2': free(): invalid next size (fast): 0x0000000001b27c20 *
I understand that this is telling me that free() encountered some memory that was not allocated by malloc(), but what role does this cout line play here? I'd like to know a little bit more about what's going on. More specifically, I have two questions:
Is this caused by the different state of the heap in these two cases? If so, what exactly is different?
Why does accessing out-of-range elements sometimes not cause an error? Is it because it hasn't exceeded the vector's capacity?
The line
cout << "test1\n";
does a lot of things, and it can possibly allocate and free memory.
Writing outside the bounds of a vector is "undefined behavior", and this is very different from "getting an error". The meaning of "undefined behavior" is that whoever writes the compiler and the runtime library is simply free to ignore those possibilities, and whatever happens happens. They can do so because a programmer is never ever supposed to do those things, and when one does, it's 100% his/her fault. Preventing, protecting against, or even simply reporting errors when accessing std::vector elements using operator[] is not part of the contract.
Technically, what could have happened is that the write operation destroyed some internal data structure used by the memory allocator, and this resulted in crazy behavior afterwards that may or may not end in a segfault.
A segfault is when things get crazy to the point that even the operating system can detect that the program is not doing what it is supposed to do, because it requests access to locations that do not even exist (so for sure they cannot be the correct locations that were supposed to contain the data being looked for).
You can however get undefined behavior and corrupted data without reaching that "segfault" point, and the program will simply read or write incorrect data from or to the wrong locations, even without any observable difference from a correct program. This is actually what happens most of the time (unfortunately).
So what happens when you read or write outside the size of an std::vector using the unchecked operator[]? Most of the time, nothing (apparently). It may however do whatever it likes after that mistake, including misbehaving in places where the code is perfectly correct, one billion machine instructions later, and only when that causes serious damage. Just don't do it.
When programming in C++ you simply cannot afford to make this kind of mistake. There are no "runtime error angels" to protect you as in some higher-level languages.
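As a hedged sketch of how the code from the question can be kept inside the contract instead of relying on luck, bounding the loop by the vector's real size (result.at(j) would be the throwing alternative):
#include <cstddef>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> result(9);
    cout << "test1\n";
    // The loop condition now matches the vector's size, so every write is in bounds;
    // result.at(j) would instead throw std::out_of_range on a bad index.
    for (std::size_t j = 0; j < result.size(); ++j)
        result[j] = 1;
    cout << "test2\n";
}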

Difference between accessing non-existent array index and existing-but-empty index

Suppose I wrote
vector<int> example(5);
example[6];
What difference would it make with the following?
vector<int> example(6);
example[5];
In the first case I'm trying to access a non-existent, non-declared index. Could that result in malicious code execution? Would it be possible to put some sort of code in the portion of memory corresponding to example[5] and have it executed by a program written like the first above?
What about the second case? Would it still be possible to place code in the area of memory of example[5], even though it should be reserved for my program, even if I haven't written anything in it?
Could that result in malicious code execution?
No, this causes 'only' undefined behaviour.
Simple code execution exploits usually write past the end of a stack-allocated buffer, thereby overwriting a return address. When the function returns, it jumps to the malicious code. A write is always required, because otherwise there is no malicious code in your program's address space.
With a vector the chances that this happens are low, because the storage for the elements is not allocated on the stack.
By writing to a wrong location on the heap, exploits are possible too, but they are much more complicated.
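To see the distinction being drawn here, one can print where the element storage actually lives; a small illustrative sketch (the printed addresses will of course vary per run):
#include <iostream>
#include <vector>

int main() {
    int onStack[4] = {};          // element storage is part of this stack frame
    std::vector<int> onHeap(4);   // element storage is allocated on the heap

    std::cout << "stack array elements:   " << static_cast<void*>(onStack) << '\n';
    std::cout << "vector object (stack):  " << static_cast<void*>(&onHeap) << '\n';
    std::cout << "vector elements (heap): " << static_cast<void*>(onHeap.data()) << '\n';
    // Overrunning onStack tramples other stack data such as the return address;
    // overrunning onHeap.data() tramples heap allocator bookkeeping instead.
}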
The first case reaches beyond the vector's buffer and thus invokes Undefined Behaviour. Technically, this means literally anything can happen. But it's unlikely to be directly exploitable to run malicious code: either the program will try to read the invalid memory (getting a garbage value or a memory error), or the compiler has eliminated the code path altogether (because it's allowed to assume undefined behaviour doesn't happen). Depending on what's done with the result, it might potentially reveal unintended data from memory, though.
In the second case, all is well. Your program has already written into this memory: it has value-initialised all six int objects in the vector (which happens in std::vector's constructor). So you're guaranteed to find a 0 of type int there.
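A minimal sketch of the guarantee described for the second case:
#include <cassert>
#include <vector>

int main() {
    std::vector<int> example(6);   // six ints, each value-initialised to 0
    assert(example[5] == 0);       // in bounds, guaranteed to read 0
    // example[6] would be out of bounds for this vector (undefined behaviour);
    // example.at(6) would throw std::out_of_range instead.
}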

Why does this line of code cause a computer to crash?

Why does this line of code cause the computer to crash? What happens at the memory level?
for (int *p = 0; ; *(p++) = 0)
    ;
I have found the "answer" on Everything2, but I want a specific technical answer.
This code formally just sets an integer pointer to null, then writes a 0 to the integer it points to and increments the pointer, looping forever.
The null pointer is not pointing to anything, so writing a 0 to it is undefined behavior (i.e. the standard doesn't say what should happen). Also you're not allowed to use pointer arithmetic outside arrays and so even just the increment is also undefined behavior.
Undefined behavior means that the compiler and library authors don't need to care at all about these cases and still the system is a valid C/C++ implementation. If a programmer does anything classified as undefined behavior then whatever happens happens and s/he cannot blame compiler and library authors. A programmer entering the undefined behavior realm cannot expect an error message or a crash, but cannot complain if getting one (even one million executed instructions later).
On systems where the null pointer is represented as zeros and there is no support for memory protection, the effect of such a loop could be to start wiping all the addressable memory, until some vital part of memory such as an interrupt table is corrupted, or until the code writes zeros over the code itself, self-destructing. On other systems with memory protection (most common desktop systems today), execution may instead simply stop at the very first write operation.
Undoubtedly, the cause of the problem is that p has not been assigned a reasonable address.
By not properly initializing a pointer before writing to where it points, it is probably going to do Bad Things™.
It could merely segfault, or it could overwrite something important, like the function's return address where a segfault wouldn't occur until the function attempts to return.
In the 1980s, a theoretician I worked with wrote a program for the 8086 to, once a second, write one word of random data at a randomly computed address. The computer was a process controller with watchdog protection and various types of output. The question was: How long would the system run before it ceased usefully functioning? The answer was hours and hours! This was a vivid demonstration that most of memory is rarely accessed.
It may cause an OS to crash, or it may do any number of other things. You are invoking undefined behavior. You don't own the memory at address 0 and you don't own the memory past it. You're just trouncing on memory that doesn't belong to you.
It works by overwriting all the memory at all the addresses, starting from 0 and going upwards. Eventually it will overwrite something important.
On any modern system, this will only crash your program, not the entire computer. CPUs designed since, oh, 1985 or so, have this feature called virtual memory which allows the OS to redirect your program's memory addresses. Most addresses aren't directed anywhere at all, which means trying to access them will just crash your program - and the ones that are directed somewhere will be directed to memory that is allocated to your program, so you can only crash your own program by messing with them.
On much older systems (older than 1985, remember!), there was no such protection and this loop could access memory addresses allocated to other programs and the OS.
The loop is not necessary to explain what's wrong. We can simply look at only the first iteration.
int *p = 0; // Declare a null pointer
*p = 0; // Write to null pointer, causing UB
The second line is causing undefined behavior, since that's what happens when you write to a null pointer.
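For contrast, a minimal sketch of the well-defined version, where the pointer is made to refer to a real object before anything is written through it:
#include <iostream>

int main() {
    int value = 0;
    int *p = &value;              // p now points at an actual int, not at address 0
    *p = 42;                      // well-defined: writes through a valid pointer
    std::cout << value << '\n';   // prints 42
}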

Why does the compiler not complain about accessing elements beyond the bounds of a dynamic array? [duplicate]

I am defining an array of size 9, but when I access array index 10 it is not giving any error.
int main() {
    bool* isSeedPos = new bool[9];
    isSeedPos[10] = true;
}
I expected to get a compiler error, because there is no array element isSeedPos[10] in my array.
Why don't I get an error?
It's not a problem.
There is no bound-check in C++ arrays. You are able to access elements beyond the array's limit (but this will usually cause an error).
If you want to use an array, you have to check that you are not out of bounds yourself (you can keep the size in a separate variable, as you did).
Of course, a better solution would be to use the standard library containers such as std::vector.
With std::vector you can either
use the myVector.at(i) method to get the i-th element (which will throw an exception if you are out of bounds)
use myVector[i] with the same syntax as C-style arrays, but you have to do bound-checking yourself ( e.g. try if (i < myVector.size()) ... before accessing it)
Also note that in your case, std::vector<bool> is a specialized version implemented so that each bool takes only one bit of memory (therefore it uses less memory than an array of bool, which may or may not be what you want).
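Applied to the code from the question, a hedged sketch (keeping the question's variable name):
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<bool> isSeedPos(9);    // valid indices are 0..8

    try {
        isSeedPos.at(10) = true;       // bounds-checked: throws std::out_of_range
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }

    if (isSeedPos.size() > 10)         // manual check needed with operator[]
        isSeedPos[10] = true;          // never executed here
}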
Use std::vector instead. Some implementations will do bounds checking in debug mode.
No, the compiler is not required to emit a diagnostic for this case. The compiler does not perform bounds checking for you.
It is your responsibility to make sure that you don't write broken code like this, because the compiler will not error on it.
Unlike in other languages like Java and Python, array access is not bounds-checked in C or C++. That makes accessing arrays faster. It is your responsibility to make sure that you stay within bounds.
However, in such a simple case such as this, some compilers can detect the error at compile time.
Also, some tools such as valgrind can help you detect such errors at run time.
What compiler/debugger are you using?
MSVC++ would complain about it and tell you that you write out of bounds of an array.
But it is not required to do it by the standard.
It can crash anytime, it causes undefined behaviour.
Primitive arrays do not do bounds checking. If you want bounds checking, you should use std::vector instead. You are accessing invalid memory after the end of the array, and it is purely by luck that it works.
There is no runtime checking of the index you are giving; accessing element 10 is incorrect but possible. Two things can happen:
if you are "unlucky", this will not crash and will return some data located after your array.
if you are "lucky", the data after the array is not allocated by your program, so access to the requested address is forbidden. This will be detected by the operating system and will produce a "segmentation fault".
There is no rule stating that memory access is checked in C, plain and simple. When you ask for an array of bools, it might be faster for the operating system to give you a 16-bit or 32-bit array instead of a 9-bit one. This means that you might not even be writing or reading into someone else's space.
C++ is fast, and one of the reasons it is fast is because there are very few checks on what you are doing; if you ask for some memory, the programming language will assume that you know what you are doing, and if the operating system does not complain, then everything will run.
There is no problem! You are just accessing memory that you shouldn't access. You get access to memory after the array.
isSeedPos doesn't know how big the array is. It is just a pointer to a position in memory. When you access isSeedPos[10] the behaviour is undefined. Chances are that sooner or later this will cause a segfault, but there is no requirement for a crash, and there is certainly no standard error checking.
Writing to that position is dangerous.
But the compiler will let you do it: effectively you're writing past the end of the memory assigned to that array, which is not a good thing.
C++ isn't like many other languages: it assumes that you know what you are doing!
Both C and C++ let you write to arbitrary areas of memory. This is because they originally derived from (and are still used for) low-level programming, where you may legitimately want to write to a memory-mapped peripheral or similar, and because it's more efficient to omit bounds checking when the programmer already knows the value will be in range (e.g. for a loop from 0 to N over an array, the programmer knows 0 and N are within the bounds, so checking each intermediate value is superfluous; a small sketch of this loop case follows at the end of this answer).
However, in truth, nowadays you rarely want to do that. If you use the arr[i] syntax, you essentially always want to write to the array declared in arr, and never do anything else. But you still can if you want to.
If you do write to arbitrary memory (as you do in this case) either it will be part of your program, and it will change some other critical data without you knowing (either now, or later when you make a change to the code and have forgotten what you were doing); or it will write to memory not allocated to your program and the OS will shut it down to prevent worse problems.
Nowadays:
Many compilers will spot it if you make an obvious mistake like this one
There are tools which will test if your program writes to unallocated memory
You can and should use std::vector instead, which is there for the 99% of the time you want bounds checking. (Check whether you're using at() or [] to access it)
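As a hedged illustration of the bounded-loop case mentioned above, where the loop condition itself already guarantees every index is in range, so a per-access check would only cost time:
#include <cstddef>
#include <iostream>

// The condition i < n guarantees every arr[i] access below is in bounds,
// so an additional per-element bounds check would be redundant.
long sum(const int *arr, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += arr[i];
    return total;
}

int main() {
    int values[] = {1, 2, 3, 4};
    std::cout << sum(values, 4) << '\n';   // prints 10
}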
This is not Java. In C or C++ there is no bounds checking; it's pure luck that you can write to that index.

Array index out of bound behavior

Why does C/C++ differentiate between these cases of out-of-bounds array indexing?
#include <stdio.h>

int main()
{
    int a[10];
    a[3] = 4;
    a[11] = 3;    // does not give segmentation fault
    a[25] = 4;    // does not give segmentation fault
    a[20000] = 3; // gives segmentation fault
    return 0;
}
I understand that it's trying to access memory allocated to process or thread in case of a[11] or a[25] and it's going out of stack bounds in case of a[20000].
Why doesn't the compiler or linker give an error? Aren't they aware of the array size? If not, then how does sizeof(a) work correctly?
The problem is that C/C++ doesn't actually do any boundary checking with regards to arrays. It depends on the OS to ensure that you are accessing valid memory.
In this particular case, you are declaring a stack-based array. Depending upon the particular implementation, accessing outside the bounds of the array will simply access another part of the already allocated stack space (most OSes and threads reserve a certain portion of memory for the stack). As long as you just happen to be playing around in the pre-allocated stack space, everything will not crash (note I did not say work).
What's happening on the last line is that you have now accessed beyond the part of memory that is allocated for the stack. As a result you are indexing into a part of memory that is not allocated to your process or is allocated in a read only fashion. The OS sees this and sends a seg fault to the process.
This is one of the reasons that C/C++ is so dangerous when it comes to boundary checking.
The segfault is not an intended action of your C program that would tell you that an index is out of bounds. Rather, it is an unintended consequence of undefined behavior.
In C and C++, if you declare an array like
type name[size];
You are only allowed to access elements with indexes from 0 up to size-1. Anything outside of that range causes undefined behavior. If the index is near the range, most probably you read your own program's memory. If the index is far out of range, most probably your program will be killed by the operating system. But you can't know; anything can happen.
Why does C allow that? Well, the basic gist of C and C++ is to not provide features if they cost performance. C and C++ have been used for ages for highly performance-critical systems. C has been used as an implementation language for kernels and programs where out-of-bounds array access can be useful to get fast access to objects that lie adjacent in memory. Having the compiler forbid this would be for naught.
Why doesn't it warn about that? Well, you can put warning levels high and hope for the compiler's mercy. This is called quality of implementation (QoI). If some compiler uses open behavior (like, undefined behavior) to do something good, it has a good quality of implementation in that regard.
[js#HOST2 cpp]$ gcc -Wall -O2 main.c
main.c: In function 'main':
main.c:3: warning: array subscript is above array bounds
[js#HOST2 cpp]$
If it would instead format your hard disk upon seeing the array accessed out of bounds (which would be legal for it), the quality of implementation would be rather bad. I enjoyed reading about that stuff in the ANSI C Rationale document.
You generally only get a segmentation fault if you try to access memory your process doesn't own.
What you're seeing in the case of a[11] (and a[10], by the way) is memory that your process does own but doesn't belong to the a[] array. a[20000] is so far from a[] that it's probably outside your memory altogether.
Changing a[11] is far more insidious, as it silently affects a different variable (or the stack frame, which may cause a different segmentation fault when your function returns).
C isn't doing this. The OS's virtual memory subsystem is.
In the case where you are only slightly out of bounds, you are addressing memory that is allocated for your program (on the call stack in this case). In the case where you are far out of bounds, you are addressing memory not given over to your program, and the OS is throwing a segmentation fault.
On some systems there is also an OS-enforced concept of "writable" memory, and you might be trying to write to memory that you own but that is marked unwritable.
Just to add to what other people are saying, you cannot rely on the program simply crashing in these cases; there is no guarantee of what will happen if you attempt to access a memory location beyond the "bounds of the array". It's just the same as if you did something like:
int *p;
p = (int *) 135;
*p = 14;
That is just random; this might work. It might not. Don't do it. Code to prevent these sorts of problems.
As litb mentioned, some compilers can detect some out-of-bounds array accesses at compile time. But bounds checking at compile time won't catch everything:
int a[10];
int i = some_complicated_function();
printf("%d\n", a[i]);
To detect this, runtime checks would have to be used, and they're avoided in C because of their performance impact. Even with knowledge of a's array size at compile time, i.e. sizeof(a), it can't protect against that without inserting a runtime check.
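A hedged sketch of what such a runtime check has to look like when the index is only known at run time; checked_get and the stand-in index are made up for illustration:
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: refuses the access instead of invoking undefined behaviour.
int checked_get(const int *a, std::size_t len, std::size_t i) {
    if (i >= len) {
        std::fprintf(stderr, "index %zu is out of bounds (size %zu)\n", i, len);
        std::exit(EXIT_FAILURE);
    }
    return a[i];
}

int main() {
    int a[10] = {};
    std::size_t i = 7;   // stands in for some_complicated_function()
    std::printf("%d\n", checked_get(a, sizeof a / sizeof a[0], i));
}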
As I understand the question and comments, you understand why bad things can happen when you access memory out of bounds, but you're wondering why your particular compiler didn't warn you.
Compilers are allowed to warn you, and many do at the highest warning levels. However the standard is written to allow people to run compilers for all sorts of devices, and compilers with all sorts of features so the standard requires the least it can while guaranteeing people can do useful work.
There are a few times the standard requires that a certain coding style will generate a diagnostic. There are several other times where the standard does not require a diagnostic. Even when a diagnostic is required I'm not aware of any place where the standard says what the exact wording should be.
But you're not completely out in the cold here. If your compiler doesn't warn you, Lint may. Additionally, there are a number of tools to detect such problems (at run time) for arrays on the heap, one of the more famous being Electric Fence (or DUMA). But even Electric Fence doesn't guarantee it will catch all overrun errors.
That's not a C issue, it's an operating system issue. Your program has been granted a certain memory space, and anything you do inside of that is fine. The segmentation fault only happens when you access memory outside of your process space.
Not all operating systems have separate address spaces for each process, in which case you can corrupt the state of another process or of the operating system with no warning.
The C philosophy is to always trust the programmer. Also, not checking bounds allows the program to run faster.
As JaredPar said, C/C++ doesn't always perform range checking. If your program accesses a memory location outside your allocated array, your program may crash, or it may not because it is accessing some other variable on the stack.
To answer your question about sizeof operator in C:
You can reliably use sizeof(array)/sizeof(array[0]) to determine array size, but using it doesn't mean the compiler will perform any range checking.
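A minimal sketch of that idiom; note it only works on an actual array, not on a pointer the array has decayed to:
#include <iostream>

int main() {
    int a[10];
    // sizeof reports sizes in bytes, so dividing the array's total size by the
    // size of one element gives the element count, entirely at compile time.
    std::cout << sizeof a / sizeof a[0] << '\n';   // prints 10
}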
My research showed that C/C++ developers believe that you shouldn't pay for something you don't use, and they trust the programmers to know what they are doing. (see accepted answer to this: Accessing an array out of bounds gives no error, why?)
If you can use C++ instead of C, maybe use vector? You can use vector[] when you need the performance (but no range checking) or, more preferably, use vector.at() (which has range checking at the cost of performance). Note that vector doesn't automatically increase capacity if it is full: to be safe, use push_back(), which automatically increases capacity if necessary.
More information on vector: http://www.cplusplus.com/reference/vector/vector/