Array index out of bounds behavior - C++

Why do C and C++ behave differently depending on how far an array index is out of bounds?
#include <stdio.h>

int main()
{
    int a[10];
    a[3] = 4;
    a[11] = 3;    // does not give segmentation fault
    a[25] = 4;    // does not give segmentation fault
    a[20000] = 3; // gives segmentation fault
    return 0;
}
I understand that it's accessing memory allocated to the process or thread in the case of a[11] or a[25], and that it's going out of the stack bounds in the case of a[20000].
Why doesn't the compiler or linker give an error? Aren't they aware of the array size? If not, then how does sizeof(a) work correctly?

The problem is that C/C++ doesn't actually do any bounds checking on arrays. It depends on the OS to ensure that you are accessing valid memory.
In this particular case, you are declaring a stack-based array. Depending upon the particular implementation, accessing outside the bounds of the array will simply access another part of the already allocated stack space (most OSes and threads reserve a certain portion of memory for the stack). As long as you just happen to be playing around in the pre-allocated stack space, everything will not crash (note I did not say work).
What's happening on the last line is that you have now accessed beyond the part of memory that is allocated for the stack. As a result you are indexing into a part of memory that is not allocated to your process or is allocated in a read only fashion. The OS sees this and sends a seg fault to the process.
This is one of the reasons that C/C++ is so dangerous when it comes to boundary checking.

The segfault is not an intended action of your C program that would tell you that an index is out of bounds. Rather, it is an unintended consequence of undefined behavior.
In C and C++, if you declare an array like
type name[size];
You are only allowed to access elements with indexes from 0 up to size-1. Anything outside of that range causes undefined behavior. If the index is near the valid range, most probably you read your own program's memory. If the index is far out of range, most probably your program will be killed by the operating system. But you can't know; anything can happen.
Why does C allow that? Well, the basic gist of C and C++ is to not provide features if they cost performance. C and C++ have been used for ages for highly performance-critical systems. C has been used as an implementation language for kernels and programs where access out of array bounds can be useful to get fast access to objects that lie adjacent in memory. Having the compiler forbid this would be for naught.
Why doesn't it warn about that? Well, you can turn warning levels up high and hope for the compiler's mercy. This is called quality of implementation (QoI): if a compiler uses open behavior (like undefined behavior) to do something good, it has a good quality of implementation in that regard.
[js#HOST2 cpp]$ gcc -Wall -O2 main.c
main.c: In function 'main':
main.c:3: warning: array subscript is above array bounds
[js#HOST2 cpp]$
If it instead would format your hard disk upon seeing the array accessed out of bounds - which would be legal for it - the quality of implementation would be rather bad. I enjoyed reading about that stuff in the ANSI C Rationale document.

You generally only get a segmentation fault if you try to access memory your process doesn't own.
What you're seeing in the case of a[11] (and a[10], by the way) is memory that your process does own but that doesn't belong to the a[] array. a[20000] is so far from a[] that it's probably outside your memory altogether.
Changing a[11] is far more insidious as it silently affects a different variable (or the stack frame which may cause a different segmentation fault when your function returns).
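A minimal sketch of that insidiousness, assuming (purely for illustration; the compiler may lay out locals however it likes, or optimize the out-of-bounds write away entirely) that a neighboring variable happens to sit just past the end of the array:

#include <cstdio>

int main() {
    int victim = 42;   // hypothetical neighbor; real layout is up to the compiler
    int a[10] = {0};

    a[11] = 3;         // undefined behavior: may silently land on 'victim'

    // On some compilers/optimization levels this prints 3 instead of 42;
    // on others nothing visible happens at all.
    std::printf("victim = %d\n", victim);
    return 0;
}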

C isn't doing this. The OS's virtual memory subsystem is.
In the case where you are only slightly out of bounds you are addressing memory that is allocated for your program (on the call stack in this case). In the case where you are far out of bounds you are addressing memory not given over to your program, and the OS is throwing a segmentation fault.
On some systems there is also an OS-enforced concept of "writable" memory, and you might be trying to write to memory that you own but that is marked unwritable.

Just to add to what other people are saying: you cannot rely on the program simply crashing in these cases; there is no guarantee of what will happen if you attempt to access a memory location beyond the "bounds of the array." It's just the same as if you did something like:
int *p;
p = (int *)135; /* arbitrary address; the cast is needed for this to compile */
*p = 14;        /* undefined behavior */
That is just random; this might work. It might not. Don't do it. Code to prevent these sorts of problems.

As litb mentioned, some compilers can detect some out-of-bounds array accesses at compile time. But bounds checking at compile time won't catch everything:
int a[10];
int i = some_complicated_function();
printf("%d\n", a[i]);
To detect this, runtime checks would have to be used, and they're avoided in C because of their performance impact. Even though the compiler knows a's size at compile time (indeed, that's how sizeof(a) works), it can't protect this access without inserting a runtime check.
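For illustration, here is roughly what such a check would look like written by hand; checked_read is a hypothetical helper, not anything the compiler or standard library provides:

#include <cstdio>
#include <cstdlib>
#include <cstddef>

// Hypothetical helper: roughly the check the compiler would have to
// emit on every access if C did runtime bounds checking.
int checked_read(const int *arr, std::size_t size, std::size_t i) {
    if (i >= size) {                       // the per-access conditional
        std::fprintf(stderr, "index %zu out of bounds\n", i);
        std::abort();
    }
    return arr[i];
}

int main() {
    int a[10] = {0};
    std::size_t i = 11;  // stand-in for some_complicated_function()
    std::printf("%d\n", checked_read(a, sizeof a / sizeof a[0], i));
}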

As I understand the question and comments, you understand why bad things can happen when you access memory out of bounds, but you're wondering why your particular compiler didn't warn you.
Compilers are allowed to warn you, and many do at the highest warning levels. However the standard is written to allow people to run compilers for all sorts of devices, and compilers with all sorts of features so the standard requires the least it can while guaranteeing people can do useful work.
There are a few times the standard requires that a certain coding style will generate a diagnostic. There are several other times where the standard does not require a diagnostic. Even when a diagnostic is required I'm not aware of any place where the standard says what the exact wording should be.
But you're not completely out in the cold here. If your compiler doesn't warn you, Lint may. Additionally, there are a number of tools to detect such problems (at run time) for arrays on the heap, one of the more famous being Electric Fence (or DUMA). But even Electric Fence doesn't guarantee it will catch all overrun errors.
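For the curious, the core trick behind Electric Fence is a guard page. Here is a minimal, POSIX-only sketch of the idea (an illustration of the technique, not Electric Fence's actual code):

#include <sys/mman.h>
#include <unistd.h>

int main() {
    // Guard-page idea: put the buffer at the end of one page and make
    // the following page inaccessible, so the very first write past
    // the end faults immediately instead of silently corrupting data.
    long page = sysconf(_SC_PAGESIZE);
    char *mem = static_cast<char *>(mmap(nullptr, 2 * page,
                                         PROT_READ | PROT_WRITE,
                                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    if (mem == MAP_FAILED) return 1;
    mprotect(mem + page, page, PROT_NONE);  // the guard page

    char *buf = mem + page - 16;  // a 16-byte buffer ending at the guard
    buf[15] = 'x';                // fine: last valid byte
    buf[16] = 'x';                // SIGSEGV: first byte of the guard page
}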

That's not a C issue, it's an operating system issue. Your program has been granted a certain memory space, and anything you do inside of that is fine. The segmentation fault only happens when you access memory outside of your process space.
Not all operating systems have separate address spaces for each process, in which case you can corrupt the state of another process or of the operating system with no warning.

The C philosophy is to always trust the programmer. Also, not checking bounds allows the program to run faster.

As JaredPar said, C/C++ doesn't always perform range checking. If your program accesses a memory location outside your allocated array, your program may crash, or it may not because it is accessing some other variable on the stack.
To answer your question about the sizeof operator in C:
You can reliably use sizeof(array)/sizeof(array[0]) to determine the number of elements in an array, but using it doesn't mean the compiler will perform any range checking.
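A quick sketch of the idiom and its one big pitfall: it only works while the array has not decayed to a pointer (the function name take below is just for illustration):

#include <cstdio>

void take(int *p) {
    // Here 'p' is just a pointer: this computes sizeof(int*)/sizeof(int),
    // NOT the element count.
    std::printf("inside a function: %zu\n", sizeof p / sizeof p[0]);
}

int main() {
    int a[10];
    std::printf("element count: %zu\n", sizeof a / sizeof a[0]);  // 10
    take(a);  // the array decays to a pointer and the idiom silently breaks
}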
My research showed that C/C++ developers believe that you shouldn't pay for something you don't use, and they trust the programmers to know what they are doing. (see accepted answer to this: Accessing an array out of bounds gives no error, why?)
If you can use C++ instead of C, maybe use vector? You can use vector[] when you need the performance (but with no range checking) or, preferably, use vector.at() (which has range checking at the cost of performance). Note that vector doesn't automatically increase its capacity if it is full: to be safe, use push_back(), which increases capacity if necessary.
More information on vector: http://www.cplusplus.com/reference/vector/vector/

Related

Why is overflow in C++ arrays operating system/compiler dependent? [duplicate]

I have a question regarding C++ arrays. I am learning C++, and in the "Arrays" section I came across something that I could not understand.
Suppose we have an integer array with 5 elements. The instructor says that in the next line, if we try to read input into index 5 of the array (one past the last valid index), we cannot know what will happen (e.g. a crash or whatnot), because it is operating system and compiler dependent.
int testArray[5] {0};
std::cin >> testArray[5];
Can you enlighten me why this situation is OS/compiler dependent?
C++ is a systems language. It doesn't mandate any checking on operations: a program which is implemented correctly would be impeded by unnecessary checks, e.g., to verify that an array access is within bounds. The checks would consist of a conditional potentially needed on every access (unless the optimizer can see that some checks can be elided) and potentially also extra memory as the system would need to represent the size of the array for the checks.
As there are no checks, you may end up manipulating bytes outside the memory allocated for the array in question. That may overwrite values used for something else, or it may access memory beyond a page boundary. In the latter case the system may be set up to cause a segmentation fault. In the former case it may access otherwise unused memory and there may be no ill effect... or it may modify the stack in a way that prevents the program from correctly returning from a function. However, how the stack is laid out depends on many aspects, like the CPU used, the calling conventions on the given system, and some compiler preferences. To avoid impeding correct programs, there is no mandated behavior when illegal operations are performed. Instead, the behavior of the program becomes undefined and anything is allowed to happen.
Having the behavior be undefined is kind of bad if you cause it. The simple solution is not to cause undefined behavior (yes, I know, that is easier said than done). However, leaving behavior undefined is often good for performance, and it also enables different implementations to actually define the behavior to do something helpful. For example, it makes it legal for implementations to check for undefined behavior and report these cases. Detecting undefined behavior is, e.g., what -fsanitize=undefined, provided by some compilers, does (I don't think it causes all kinds of undefined behavior to be detected, but it certainly detects some kinds).
Arrays are not meant to be indexed outside of. C++ made the decision to not actually check these indexes, which means that you can read out of the bounds of an array. It simply tells you that you shouldn't do that, and delegates the out-of-bounds checks to the programmer for added speed.
Since compilers and operating systems and many other factors can determine how memory is laid out, reading out of the bounds of an array can do basically anything, including returning garbage values, segfaulting, or summoning nasal demons.

Detecting that the stack is full

When writing C++ code I've learned that using the stack to store data is a good idea.
But recently I ran into a problem:
I had an experiment that had code that looked like this:
void fun(const unsigned int N) {
    float data_1[N*N];
    float data_2[N*N];
    /* Do magic */
}
The code exploded with a segmentation fault at random, and I had no idea why.
It turned out that the problem was that I was trying to store things that were too big on my stack. Is there a way of detecting this? Or at least detecting that it has gone wrong?
    float data_1[N*N];
    float data_2[N*N];
These are variable-length arrays (VLAs), as N is not a constant expression. The const-ness of the parameter only ensures that N is read-only; it doesn't make N a constant expression.
VLAs are allowed in C99 only; in other versions of C, and in all versions of C++, they're not allowed. However, some compilers provide VLAs as a compiler extension. If you're compiling with GCC, try the -pedantic option; it will tell you they are not allowed.
Now, why does your program segfault? Probably because of a stack overflow due to the large value of N * N.
Consider using std::vector as:
#include <vector>

void fun(const unsigned int N)
{
    std::vector<float> data_1(N*N);
    std::vector<float> data_2(N*N);
    // your code
}
It's extremely difficult to detect that the stack is full, and not at all portable. One of the biggest problems is that stack frames are of variable size (especially when using variable-length arrays, which are really just a more standard way of doing what people were doing before with alloca()) so you can't use simple proxies like the number of stack frames.
One of the simplest methods that is mostly portable is to put a variable (probably of type char, so that a pointer to it is a char*) at a known depth on the stack, and to then measure the distance from that point to a variable (of the same type) in the current stack frame by simple pointer arithmetic. Add in an estimate of how much space you're about to allocate, and you can have a good guess as to whether the stack is about to blow up on you. The problems with this are that you don't know the direction the stack is growing in (no, they don't all grow in the same direction!) and working out the size of the stack space is itself rather messy (you can try things like system limits, but they're really quite awkward). Plus the hack factor is very high.
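Here is a minimal sketch of that measuring trick, with all the caveats above baked in: it assumes a downward-growing stack, and the pointer subtraction between frames is itself formally undefined:

#include <cstdio>
#include <cstddef>

static char *stack_base;  // address of a char near the bottom of the stack

// Rough, non-portable estimate of stack usage. Assumes the stack grows
// downward (true on x86 and ARM, but not guaranteed anywhere), and the
// subtraction of unrelated addresses is formally undefined behavior.
static std::size_t stack_used() {
    char here;
    return static_cast<std::size_t>(stack_base - &here);
}

static void recurse(int depth) {
    char pad[1024];  // burn some stack so the number visibly grows
    pad[0] = 0;
    std::printf("depth %d: ~%zu bytes used\n", depth, stack_used());
    if (depth < 4) recurse(depth + 1);
}

int main() {
    char base;
    stack_base = &base;
    recurse(0);
}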
The other trick I've seen used on 32-bit Windows only was to try to alloca() sufficient space and handle the system exception that would occur if there was insufficient room.
int have_enough_stack_space(void) {
    int enough_space = 0;
    __try { /* Yes, that's got a double-underscore. */
        alloca(SOME_VALUE_THAT_MEANS_ENOUGH_SPACE);
        enough_space = 1;
    } __except (EXCEPTION_EXECUTE_HANDLER) {}
    return enough_space;
}
This code is very non-portable (e.g., don't count on it working on 64-bit Windows) and building with older gcc requires some nasty inline assembler instead! Structured exception handling (which this is a use of) is amongst the blackest of black arts on Windows. (And don't return from inside the __try construct.)
Try instead using functions like malloc. It will return NULL explicitly if it fails to find a block of memory of the size you requested.
Of course, in that case don't forget to free this memory at the end of the function, after you are done.
Also, you can check your compiler's settings to see what stack memory limit it uses when generating binaries.
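As a sketch, the question's fun rewritten along those lines (in C++ the std::vector version shown earlier is preferable, but this makes the failure check explicit):

#include <cstdio>
#include <cstdlib>

void fun(const unsigned int N) {
    // Heap allocation fails loudly (returns NULL) instead of silently
    // overflowing the stack.
    float *data_1 = static_cast<float *>(std::malloc(sizeof(float) * N * N));
    float *data_2 = static_cast<float *>(std::malloc(sizeof(float) * N * N));
    if (!data_1 || !data_2) {
        std::fprintf(stderr, "allocation failed\n");
    } else {
        /* Do magic */
    }
    std::free(data_2);  // free(NULL) is a safe no-op
    std::free(data_1);
}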
One of the reasons people say it is better to use the stack instead of heap memory is that variables allocated on the stack are popped off automatically when you leave the body of the function. For storing big blocks of data it is usual to use heap memory and other data structures like linked lists or trees. Also, memory allocated on the stack is limited, and much less than you can allocate in the heap space. I think it is better to manage memory allocation and release carefully instead of trying to use the stack to store big data.
You can use a framework that manages your memory allocations, as well as tools such as VLD (Visual Leak Detector) to check for memory leaks and memory that is never released.
is there a way of detecting this?
No, in general.
Stack size is platform dependent. Typically, the operating system decides the size of the stack, so you can check your OS (ulimit -s on Linux) to see how much stack memory it allocates for your program.
If your compiler supports stackavail(), then you can check it. It's better to use heap-allocated memory in situations where you are unsure whether you'd exceed the stack limit.
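On POSIX systems you can also query that limit from inside the program; a small sketch using getrlimit (not portable beyond POSIX):

#include <sys/resource.h>
#include <cstdio>

int main() {
    // Ask the OS for the soft stack limit; this is the same number
    // `ulimit -s` reports (there it is printed in kilobytes).
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) == 0)
        std::printf("stack limit: %llu bytes\n",
                    static_cast<unsigned long long>(rl.rlim_cur));
}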

Why does strcpy "work" when writing to malloc'ed memory that is not large enough? [duplicate]

See the code below:
char* stuff = (char*)malloc(2);
strcpy(stuff, "abc");
cout << "The size of stuff is : " << strlen(stuff);
Even though I allocated only 2 bytes for stuff, why does strcpy still work, and why is the output of strlen 3? Shouldn't this throw something like an index out of bounds?
C and C++ don't do automatic bounds checking like Java and C# do. This code will overwrite stuff in memory past the end of the string, corrupting whatever was there. That can lead to strange behavior or crashes later, so it's good to be cautious about such things.
Accessing past the end of an array is deemed "undefined behavior" by the C and C++ standards. That means the standard doesn't specify what must happen when a program does that, so a program that triggers UB is in never-never-land where anything might happen. It might continue to work with no apparent problems. It might crash immediately. It might crash later when doing something else that shouldn't have been a problem. It might misbehave but not crash. Or velociraptors might come and eat you. Anything can happen.
Writing past the end of an array is called a buffer overflow, by the way, and it's a common cause of security flaws. If that "abc" string were actually user input, a skilled attacker could put bytes into it that end up overwriting something like the function's return pointer, which can be used to make the program run different code than it should, and do different things than it should.
You just overwrote heap memory; usually there's no crash, but bad things can happen later. C does not prevent you from shooting yourself in the foot; there is no such thing as an array-out-of-bounds check.
No, your char pointer now points to a string of length 3. Generally this will not cause any problems, but you might overwrite some critical memory region and cause the program to crash (you can expect to see a segmentation fault then), especially when you perform such operations over a large amount of memory.
Here is a typical implementation of strcpy:
#include <assert.h>

char *strcpy(char *strDestination, const char *strSource)
{
    assert(strDestination && strSource);
    char *strD = strDestination;
    /* copy each character, including the terminating '\0' */
    while ((*strDestination++ = *strSource++) != '\0')
        ;
    return strD;
}
You should ensure the destination has enough space; strcpy itself will not check. However, it is what it is.
strcpy does not check for sufficient space in strDestination before copying strSource. It also does not perform bounds checking, and thus risks overrunning both the source and the destination; it is a common cause of buffer overruns.
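The fix on the caller's side is to size the destination before copying. A minimal corrected version of the question's snippet, assuming the source string is what you saw there:

#include <cstring>
#include <cstdlib>
#include <iostream>

int main() {
    const char *src = "abc";
    // Size the destination first: strlen does not count the
    // terminating '\0', so allocate one extra byte for it.
    char *stuff = static_cast<char *>(std::malloc(std::strlen(src) + 1));
    if (!stuff) return 1;
    std::strcpy(stuff, src);  // now the destination is large enough
    std::cout << "The size of stuff is : " << std::strlen(stuff) << std::endl;
    std::free(stuff);
}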

Why does the compiler not complain about accessing elements beyond the bounds of a dynamic array? [duplicate]

I am defining an array of size 9, but when I access array index 10 it does not give any error.
int main() {
    bool* isSeedPos = new bool[9];
    isSeedPos[10] = true;
}
I expected to get a compiler error, because there is no array element isSeedPos[10] in my array.
Why don't I get an error?
It's not a problem.
There is no bounds checking in C++ arrays. You are able to access elements beyond the array's limit (though doing so is undefined behavior and may cause an error).
If you want to use an array, you have to check that you are not out of bounds yourself (you can keep the size in a separate variable, as you did).
Of course, a better solution would be to use the standard library containers such as std::vector.
With std::vector you can either
use the myVector.at(i) method to get the ith element (this will throw an exception if you are out of bounds)
use myVector[i] with the same syntax as C-style arrays, but then you have to do the bounds checking yourself (e.g. test if (i < myVector.size()) before accessing it); both options are sketched below
Also note that in your case, std::vector<bool> is a specialized version implemented so that each bool takes only one bit of memory (it therefore uses less memory than an array of bool, which may or may not be what you want).
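A small sketch of the difference between the two, using the question's array size:

#include <vector>
#include <iostream>
#include <stdexcept>

int main() {
    std::vector<bool> isSeedPos(9);
    // isSeedPos[10] = true;     // unchecked: undefined behavior, like a raw array
    try {
        isSeedPos.at(10) = true; // checked: throws instead of corrupting memory
    } catch (const std::out_of_range &e) {
        std::cout << "caught: " << e.what() << std::endl;
    }
}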
Use std::vector instead. Some implementations will do bounds checking in debug mode.
No, the compiler is not required to emit a diagnostic for this case. The compiler does not perform bounds checking for you.
It is your responsibility to make sure that you don't write broken code like this, because the compiler will not error on it.
Unlike in other languages such as Java and Python, array access is not bounds-checked in C or C++. That makes array access faster. It is your responsibility to make sure that you stay within bounds.
However, in such a simple case such as this, some compilers can detect the error at compile time.
Also, some tools such as valgrind can help you detect such errors at run time.
What compiler/debugger are you using?
MSVC++ would complain about it and tell you that you are writing out of the bounds of an array.
But it is not required to do it by the standard.
It can crash at any time; it causes undefined behaviour.
Primitive arrays do not do bounds checking. If you want bounds checking, you should use std::vector instead. You are accessing invalid memory after the end of the array, and it is purely by luck that it works.
There is no runtime checking on the index you are giving, accessing element 10 is incorrect but possible. Two things can happen:
if you are "unlucky", this will not crash and will return some data located after your array.
if you are "lucky", the data after the array is not allocated by your program, so access to the requested address is forbidden. This will be detected by the operating system and will produce a "segmentation fault".
There is no rule stating that memory access is checked in C, plain and simple. When you ask for an array of bools, it might be faster for the operating system to give you a 16-bit or 32-bit array instead of a 9-bit one. This means that you might not even be writing or reading into someone else's space.
C++ is fast, and one of the reasons it is fast is because there are very few checks on what you are doing. If you ask for some memory, the programming language will assume that you know what you are doing, and if the operating system does not complain, then everything will run.
There is no problem! You are just accessing memory that you shouldn't access: memory right after the array.
isSeedPos doesn't know how big the array is; it is just a pointer to a position in memory. When you access isSeedPos[10] the behaviour is undefined. Chances are sooner or later this will cause a segfault, but there is no requirement for a crash, and there is certainly no standard error checking.
Writing to that position is dangerous.
But the compiler will let you do it - effectively you're writing one past the last byte of memory assigned to that array, which is not a good thing.
C++ isn't a lot like many other languages - It assumes that you know what you are doing!
Both C and C++ let you write to arbitrary areas of memory. This is because they originally derived from (and are still used for) low-level programming where you may legitimately want to write to a memory-mapped peripheral or similar, and because it's more efficient to omit bounds checking when the programmer already knows the value will be within bounds (e.g. for a loop from 0 to N over an array, the programmer knows 0 and N are within the bounds, so checking each intermediate value is superfluous).
However, in truth, nowadays you rarely want to do that. If you use the arr[i] syntax, you essentially always want to write to the array declared in arr, and never do anything else. But you still can if you want to.
If you do write to arbitrary memory (as you do in this case) either it will be part of your program, and it will change some other critical data without you knowing (either now, or later when you make a change to the code and have forgotten what you were doing); or it will write to memory not allocated to your program and the OS will shut it down to prevent worse problems.
Nowadays:
Many compilers will spot it if you make an obvious mistake like this one
There are tools which will test if your program writes to unallocated memory
You can and should use std::vector instead, which is there for the 99% of the time you want bounds checking. (Check whether you're using at() or [] to access it)
This is not Java. In C or C++ there is no bounds checking; it's pure luck that you can write to that index.

Pointers to statically allocated objects

I'm trying to understand how pointers to statically allocated objects work and where they can go wrong.
I wrote this code:
int* pinf = NULL;
for (int i = 0; i < 1; i++) {
    int inf = 4;
    pinf = &inf;
}
cout << "inf" << (*pinf) << endl;
I was surprised that it worked, because I thought that inf would disappear when the program left the block, and the pointer would point to something that no longer exists. I expected a segmentation fault when trying to access pinf. At what stage in the program does inf die?
Your understanding is correct. inf disappears when you leave the scope of the loop, and so accessing *pinf yields undefined behavior. Undefined behavior means the compiler and/or program can do anything, which may be to crash, or in this case may be to simply chug along.
This is because inf is on the stack. Even when it is out of scope pinf still points to a useable memory location on the stack. As far as the runtime is concerned the stack address is fine, and the compiler doesn't bother to insert code to verify that you're not accessing locations beyond the end of the stack. That would be prohibitively expensive in a language designed for speed.
For this reason you must be very careful to avoid undefined behavior. C and C++ are not nice the way Java or C# are where illegal operations pretty much always generate an immediate exception and crash your program. You the programmer have to be vigilant because the compiler will miss all kinds of elementary mistakes you make.
You are using a so-called dangling pointer. It results in undefined behavior according to the C++ Standard.
It probably will never die, because pinf points to something on the stack, and stacks don't often shrink.
Keep using the stack, though, and you're pretty much guaranteed an overwrite.
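To see the overwrite happen, compare the value before and after another call reuses the stack. This sketch is itself undefined behavior, so the output may be anything; set and clobber are just illustrative names:

#include <iostream>

int *pinf = nullptr;

void set()     { int inf = 4; pinf = &inf; }      // pinf dangles once set() returns
void clobber() { int junk = 1234; (void)junk; }   // may reuse the same stack slot

int main() {
    set();
    std::cout << *pinf << std::endl;  // UB: often still prints 4
    clobber();
    std::cout << *pinf << std::endl;  // UB: now often prints 1234 (or anything)
}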
If you are asking about this:
int main() {
    int* pinf = NULL;
    for (int i = 0; i < 1; i++) {
        int inf = 4;
        pinf = &inf;
    }
    cout << "inf" << (*pinf) << endl;
}
Then what you have is undefined behaviour. The automatically allocated (not static) object inf has gone out of scope and notionally been destroyed when you access it via the pointer. In this case, anything might happen, including it appearing to "work".
You won't necessarily get a SIGSEGV (segmentation fault). inf's memory is probably allocated on the stack, and the stack memory region is probably still allocated to your process at that point; that's probably why you are not getting a segfault.
The behaviour is undefined, but in practice, "destructing" an int is a noop, so most compilers will leave the number alone on the stack until something else comes along to reuse that particular slot.
Some compilers might set the int to 0xDEADBEEF (or some such garbage) when it goes out of scope in debug mode, but that won't make the cout << ... fail; it will simply print the nonsensical value.
The memory may or may not still contain a 4 when it gets to your cout line. It might contain a 4 strictly by accident. :)
First things first: your operating system can only detect memory access gone astray on page boundaries. So, if you're off by 4k or 8k or 16k or more. (Check /proc/self/maps on a Linux system some day to see the memory layout of a process; any addresses in the listed ranges are allowed, any outside the listed ranges aren't allowed. Every modern OS on protected-memory CPUs will support a similar mechanism, so it'll be instructive even if you're just not that interested in Linux. I just know it is easy on Linux.) So, the OS can't help you when your data is so small.
Also, your int inf = 4; might very well be stashed in the .rodata, .data or .text segments of your program. Static variables may be stuffed into any of these sections (I have no idea how the compiler/linker decides; I consider it magic) and they will therefore be valid throughout the entire duration of the program. Check size /bin/sh next time you are on a Unix system for an idea how much data gets put into which sections. (And check out readelf(1) for way too much information. objdump(1) if you're on older systems.)
If you change inf = 4 to inf = i, then the storage will be allocated on the stack, and you stand a much better chance of having it get overwritten quickly.
A protection fault occurs when the memory page you point to is not valid anymore for the process.
Luckily most OS's don't create a separate page for each integer's worth of stack space.