C++ new[] operator creates array of length = length + 1? - c++

Why does the new[] operator in C++ actually create an array of length + 1? For example, see this code:
#include <iostream>
int main()
{
std::cout << "Enter a positive integer: ";
int length;
std::cin >> length;
int *array = new int[length]; // use array new. Note that length does not need to be constant!
//int *array;
std::cout << "I just allocated an array of integers of length " << length << '\n';
for (int n = 0; n<=length+1; n++)
{
array[n] = 1; // set element n to value 1
}
std::cout << "array[0] " << array[0] << '\n';
std::cout << "array[length-1] " << array[length-1] << '\n';
std::cout << "array[length] " << array[length] << '\n';
std::cout << "array[length+1] " << array[length+1] << '\n';
delete[] array; // use array delete to deallocate array
array = 0; // use nullptr instead of 0 in C++11
return 0;
}
We dynamically create an array of length "length" but we are able to assign a value at the index length+1. If we try to do length+2, we get an error.
Why is this? Why does C++ make the length = length + 1?

It doesn’t. You’re allowed to calculate the address array + n, for the purpose of checking that another address is less than it. Trying to access the element array[n] is undefined behavior, which means the program becomes meaningless and the compiler is allowed to do anything whatsoever. Literally anything; one old version of GCC, if it saw a #pragma directive, started a roguelike game on the terminal. (Thanks, Revolver_Ocelot, for reminding me: that was technically implementation-defined behavior, a different category.) Even calculating the address array + n + 1 is undefined behavior.
Because it can do anything, the particular compiler you tried that on decided to let you shoot yourself in the foot. If, for example, the next two words after the array were the header of another block in the heap, you might get a memory-corruption bug. Or maybe a compiler stored the array at the top of your memory space, the address &array[n+1] is aNULL` pointer, and trying to dereference it causes a segmentation fault. Or maybe the next page of memory is not readable or writable and trying to access it crashes the program with a protection fault. Or maybe the implementation bounds-checks your array accesses at runtime and crashes the program. Maybe the runtime stuck a canary value after the array and checks later to see if it was overwritten. Or maybe it happens, by accident, to work.
In practice, you really want the compiler to catch those bugs for you instead of trying to track down the bugs that buffer overruns cause later. It would be better to use a std::vector than a dynamic array. If you must use an array, you want to check that all your accesses are in-bounds yourself, because you cannot rely on the compiler to do that for you and skipping them is a major cause of bugs.

If you write or read beyond the end of an array or other object you create with new, your program's behaviour is no longer defined by the C++ standard.
Anything can happen and the compiler and program remain standard compliant.
The most likely thing to happen in this case is you are corrupting memory in the heap. In a small program this "seems to work" as the section of the heap ypu use isn't being used by any other code, in a larger one you will crash or behave randomly elsewhere in a seemingoy unrelated bit of code.
But arbitrary things could happen. The compiler could prove a branch leads to access beyond tue end of an array and dead-code eliminate paths that lead to it (UB that time travels), or it could hit a protected memory region and crash, or it could corrupt heap management data and cause a future new/delete to crash, or nasal demons, or whatever else.

At the for loop you are assigning elements beyond the bounds of the loop and remember that C++ does not do bounds checking.
So when you initialize the array you are initializing beyond the bounds of the array (Say the user enters 3 for length you are initializing 1 to array[0] through array[5] because the condition is n <= length + 1;
The behavior of the array is unpredictable when you go beyond its bounds, but most likely your program will crash. In this case you are going 2 elements beyonds its bounds because you have used = in the condition and length + 1.

There is no requirement that the new [] operator allocate more memory than requested.
What is happening is that your code is running past the end of the allocated array. It therefore has undefined behaviour.
Undefined behaviour means that the C++ standard imposes no requirements on what happens. Therefore, your implementation (compiler and standard library, in this case) will be equally correct if your program SEEMS to work properly (as it does in your case), produces a run time error, trashes your system drive, or anything else.
In practice, all that is happening is that your code is writing to memory, and later reading from that memory, past the end of the allocated memory block. What happens depends on what is actually in that memory location. In your case, whatever happens to be in that memory location is able to be modified (in the loop) or read (in order to print to std::cout).
Conclusion: the explanation is not that new[] over-allocates. It is that your code has undefined behaviour, so can seem to work anyway.

Related

Problem : Element stored outside defined array is accessible [duplicate]

I am assigning values in a C++ program out of the bounds like this:
#include <iostream>
using namespace std;
int main()
{
int array[2];
array[0] = 1;
array[1] = 2;
array[3] = 3;
array[4] = 4;
cout << array[3] << endl;
cout << array[4] << endl;
return 0;
}
The program prints 3 and 4. It should not be possible. I am using g++ 4.3.3
Here is compile and run command
$ g++ -W -Wall errorRange.cpp -o errorRange
$ ./errorRange
3
4
Only when assigning array[3000]=3000 does it give me a segmentation fault.
If gcc doesn't check for array bounds, how can I be sure if my program is correct, as it can lead to some serious issues later?
I replaced the above code with
vector<int> vint(2);
vint[0] = 0;
vint[1] = 1;
vint[2] = 2;
vint[5] = 5;
cout << vint[2] << endl;
cout << vint[5] << endl;
and this one also produces no error.
Welcome to every C/C++ programmer's bestest friend: Undefined Behavior.
There is a lot that is not specified by the language standard, for a variety of reasons. This is one of them.
In general, whenever you encounter undefined behavior, anything might happen. The application may crash, it may freeze, it may eject your CD-ROM drive or make demons come out of your nose. It may format your harddrive or email all your porn to your grandmother.
It may even, if you are really unlucky, appear to work correctly.
The language simply says what should happen if you access the elements within the bounds of an array. It is left undefined what happens if you go out of bounds. It might seem to work today, on your compiler, but it is not legal C or C++, and there is no guarantee that it'll still work the next time you run the program. Or that it hasn't overwritten essential data even now, and you just haven't encountered the problems, that it is going to cause — yet.
As for why there is no bounds checking, there are a couple aspects to the answer:
An array is a leftover from C. C arrays are about as primitive as you can get. Just a sequence of elements with contiguous addresses. There is no bounds checking because it is simply exposing raw memory. Implementing a robust bounds-checking mechanism would have been almost impossible in C.
In C++, bounds-checking is possible on class types. But an array is still the plain old C-compatible one. It is not a class. Further, C++ is also built on another rule which makes bounds-checking non-ideal. The C++ guiding principle is "you don't pay for what you don't use". If your code is correct, you don't need bounds-checking, and you shouldn't be forced to pay for the overhead of runtime bounds-checking.
So C++ offers the std::vector class template, which allows both. operator[] is designed to be efficient. The language standard does not require that it performs bounds checking (although it does not forbid it either). A vector also has the at() member function which is guaranteed to perform bounds-checking. So in C++, you get the best of both worlds if you use a vector. You get array-like performance without bounds-checking, and you get the ability to use bounds-checked access when you want it.
Using g++, you can add the command line option: -fstack-protector-all.
On your example it resulted in the following:
> g++ -o t -fstack-protector-all t.cc
> ./t
3
4
/bin/bash: line 1: 15450 Segmentation fault ./t
It doesn't really help you find or solve the problem, but at least the segfault will let you know that something is wrong.
g++ does not check for array bounds, and you may be overwriting something with 3,4 but nothing really important, if you try with higher numbers you'll get a crash.
You are just overwriting parts of the stack that are not used, you could continue till you reach the end of the allocated space for the stack and it'd crash eventually
EDIT:
You have no way of dealing with that, maybe a static code analyzer could reveal those failures, but that's too simple, you may have similar(but more complex) failures undetected even for static analyzers
It's undefined behavior as far as I know. Run a larger program with that and it will crash somewhere along the way. Bounds checking is not a part of raw arrays (or even std::vector).
Use std::vector with std::vector::iterator's instead so you don't have to worry about it.
Edit:
Just for fun, run this and see how long until you crash:
int main()
{
int arr[1];
for (int i = 0; i != 100000; i++)
{
arr[i] = i;
}
return 0; //will be lucky to ever reach this
}
Edit2:
Don't run that.
Edit3:
OK, here is a quick lesson on arrays and their relationships with pointers:
When you use array indexing, you are really using a pointer in disguise (called a "reference"), that is automatically dereferenced. This is why instead of *(array+1), array[1] automatically returns the value at that index.
When you have a pointer to an array, like this:
int arr[5];
int *ptr = arr;
Then the "array" in the second declaration is really decaying to a pointer to the first array. This is equivalent behavior to this:
int *ptr = &arr[0];
When you try to access beyond what you allocated, you are really just using a pointer to other memory (which C++ won't complain about). Taking my example program above, that is equivalent to this:
int main()
{
int arr[1];
int *ptr = arr;
for (int i = 0; i != 100000; i++, ptr++)
{
*ptr++ = i;
}
return 0; //will be lucky to ever reach this
}
The compiler won't complain because in programming, you often have to communicate with other programs, especially the operating system. This is done with pointers quite a bit.
Hint
If you want to have fast constraint size arrays with range error check, try using boost::array, (also std::tr1::array from <tr1/array> it will be standard container in next C++ specification). It's much faster then std::vector. It reserve memory on heap or inside class instance, just like int array[].
This is simple sample code:
#include <iostream>
#include <boost/array.hpp>
int main()
{
boost::array<int,2> array;
array.at(0) = 1; // checking index is inside range
array[1] = 2; // no error check, as fast as int array[2];
try
{
// index is inside range
std::cout << "array.at(0) = " << array.at(0) << std::endl;
// index is outside range, throwing exception
std::cout << "array.at(2) = " << array.at(2) << std::endl;
// never comes here
std::cout << "array.at(1) = " << array.at(1) << std::endl;
}
catch(const std::out_of_range& r)
{
std::cout << "Something goes wrong: " << r.what() << std::endl;
}
return 0;
}
This program will print:
array.at(0) = 1
Something goes wrong: array<>: index out of range
C or C++ will not check the bounds of an array access.
You are allocating the array on the stack. Indexing the array via array[3] is equivalent to *(array + 3), where array is a pointer to &array[0]. This will result in undefined behavior.
One way to catch this sometimes in C is to use a static checker, such as splint. If you run:
splint +bounds array.c
on,
int main(void)
{
int array[1];
array[1] = 1;
return 0;
}
then you will get the warning:
array.c: (in function main)
array.c:5:9: Likely out-of-bounds
store:
array[1]
Unable to resolve constraint:
requires 0 >= 1
needed to satisfy precondition:
requires maxSet(array # array.c:5:9) >= 1 A memory write may
write to an address beyond the
allocated buffer.
Run this through Valgrind and you might see an error.
As Falaina pointed out, valgrind does not detect many instances of stack corruption. I just tried the sample under valgrind, and it does indeed report zero errors. However, Valgrind can be instrumental in finding many other types of memory problems, it's just not particularly useful in this case unless you modify your bulid to include the --stack-check option. If you build and run the sample as
g++ --stack-check -W -Wall errorRange.cpp -o errorRange
valgrind ./errorRange
valgrind will report an error.
You are certainly overwriting your stack, but the program is simple enough that effects of this go unnoticed.
libstdc++, which is part of gcc, has a special debug mode for error checking. It is enabled by compiler flag -D_GLIBCXX_DEBUG. Among other things it does bounds checking for std::vector at the cost of performance. Here is online demo with recent version of gcc.
So actually you can do bounds checking with libstdc++ debug mode but you should do it only when testing because it costs notable performance compared to normal libstdc++ mode.
Undefined behavior working in your favor. Whatever memory you're clobbering apparently isn't holding anything important. Note that C and C++ do not do bounds checking on arrays, so stuff like that isn't going to be caught at compile or run time.
When you write 'array[index]' in C it translates it to machine instructions.
The translation is goes something like:
'get the address of array'
'get the size of the type of objects array is made up of'
'multiply the size of the type by index'
'add the result to the address of array'
'read what's at the resulting address'
The result addresses something which may, or may not, be part of the array. In exchange for the blazing speed of machine instructions you lose the safety net of the computer checking things for you. If you're meticulous and careful it's not a problem. If you're sloppy or make a mistake you get burnt. Sometimes it might generate an invalid instruction that causes an exception, sometimes not.
When you initialize the array with int array[2], space for 2 integers is allocated; but the identifier array simply points to the beginning of that space. When you then access array[3] and array[4], the compiler then simply increments that address to point to where those values would be, if the array was long enough; try accessing something like array[42] without initializing it first, you'll end up getting whatever value happened to already be in memory at that location.
Edit:
More info on pointers/arrays: http://home.netcom.com/~tjensen/ptr/pointers.htm
As I understand, local variables are allocated on stack, so going out of bounds on your own stack can only overwrite some other local variable, unless you go oob too much and exceed your stack size.
Since you have no other variables declared in your function - it does not cause any side effects. Try declaring another variable/array right after your first one and see what will happen with it.
A nice approach that i have seen often and I had been used actually is to inject some NULL type element (or a created one, like uint THIS_IS_INFINITY = 82862863263;) at end of the array.
Then at the loop condition check, TYPE *pagesWords is some kind of pointer array:
int pagesWordsLength = sizeof(pagesWords) / sizeof(pagesWords[0]);
realloc (pagesWords, sizeof(pagesWords[0]) * (pagesWordsLength + 1);
pagesWords[pagesWordsLength] = MY_NULL;
for (uint i = 0; i < 1000; i++)
{
if (pagesWords[i] == MY_NULL)
{
break;
}
}
This solution won't word if array is filled with struct types.
As mentioned now in the question using std::vector::at will solve the problem and make a bound check before accessing.
If you need a constant size array that is located on the stack as your first code use the C++11 new container std::array; as vector there is std::array::at function. In fact the function exists in all standard containers in which it have a meaning,i.e, where operator[] is defined :( deque, map, unordered_map) with the exception of std::bitset in which it is called std::bitset::test.
If you change your program slightly:
#include <iostream>
using namespace std;
int main()
{
int array[2];
INT NOTHING;
CHAR FOO[4];
STRCPY(FOO, "BAR");
array[0] = 1;
array[1] = 2;
array[3] = 3;
array[4] = 4;
cout << array[3] << endl;
cout << array[4] << endl;
COUT << FOO << ENDL;
return 0;
}
(Changes in capitals -- put those in lower case if you're going to try this.)
You will see that the variable foo has been trashed. Your code will store values into the nonexistent array[3] and array[4], and be able to properly retrieve them, but the actual storage used will be from foo.
So you can "get away" with exceeding the bounds of the array in your original example, but at the cost of causing damage elsewhere -- damage which may prove to be very hard to diagnose.
As to why there is no automatic bounds checking -- a correctly written program does not need it. Once that has been done, there is no reason to do run-time bounds checking and doing so would just slow down the program. Best to get that all figured out during design and coding.
C++ is based on C, which was designed to be as close to assembly language as possible.
when you declare int array[2]; you reserve 2 memory spaces of 4 bytes each(32bit program).
if you type array[4] in your code it still corresponds to a valid call but only at run time will it throw an unhandled exception. C++ uses manual memory management. This is actually a security flaw that was used for hacking programs
this can help understanding:
int * somepointer;
somepointer[0]=somepointer[5];
The behavior can depend on your system. Typically, you will have a margin for out of bounds, sometimes with value of 0 or garbage values. For the details you can check with memory allocation mechanism used in your OS. On top of that, if you use the programming language like c/c++, it will not check the bounds when you using some containers, like array. So, you will meet "undefined event" because you do not know what the OS did below the surface. But like the programming language Java, it will check the bound. If you step outside of the bound, you will get an exception.

How is the shifting taking place in the character array? [duplicate]

I am assigning values in a C++ program out of the bounds like this:
#include <iostream>
using namespace std;
int main()
{
int array[2];
array[0] = 1;
array[1] = 2;
array[3] = 3;
array[4] = 4;
cout << array[3] << endl;
cout << array[4] << endl;
return 0;
}
The program prints 3 and 4. It should not be possible. I am using g++ 4.3.3
Here is compile and run command
$ g++ -W -Wall errorRange.cpp -o errorRange
$ ./errorRange
3
4
Only when assigning array[3000]=3000 does it give me a segmentation fault.
If gcc doesn't check for array bounds, how can I be sure if my program is correct, as it can lead to some serious issues later?
I replaced the above code with
vector<int> vint(2);
vint[0] = 0;
vint[1] = 1;
vint[2] = 2;
vint[5] = 5;
cout << vint[2] << endl;
cout << vint[5] << endl;
and this one also produces no error.
Welcome to every C/C++ programmer's bestest friend: Undefined Behavior.
There is a lot that is not specified by the language standard, for a variety of reasons. This is one of them.
In general, whenever you encounter undefined behavior, anything might happen. The application may crash, it may freeze, it may eject your CD-ROM drive or make demons come out of your nose. It may format your harddrive or email all your porn to your grandmother.
It may even, if you are really unlucky, appear to work correctly.
The language simply says what should happen if you access the elements within the bounds of an array. It is left undefined what happens if you go out of bounds. It might seem to work today, on your compiler, but it is not legal C or C++, and there is no guarantee that it'll still work the next time you run the program. Or that it hasn't overwritten essential data even now, and you just haven't encountered the problems, that it is going to cause — yet.
As for why there is no bounds checking, there are a couple aspects to the answer:
An array is a leftover from C. C arrays are about as primitive as you can get. Just a sequence of elements with contiguous addresses. There is no bounds checking because it is simply exposing raw memory. Implementing a robust bounds-checking mechanism would have been almost impossible in C.
In C++, bounds-checking is possible on class types. But an array is still the plain old C-compatible one. It is not a class. Further, C++ is also built on another rule which makes bounds-checking non-ideal. The C++ guiding principle is "you don't pay for what you don't use". If your code is correct, you don't need bounds-checking, and you shouldn't be forced to pay for the overhead of runtime bounds-checking.
So C++ offers the std::vector class template, which allows both. operator[] is designed to be efficient. The language standard does not require that it performs bounds checking (although it does not forbid it either). A vector also has the at() member function which is guaranteed to perform bounds-checking. So in C++, you get the best of both worlds if you use a vector. You get array-like performance without bounds-checking, and you get the ability to use bounds-checked access when you want it.
Using g++, you can add the command line option: -fstack-protector-all.
On your example it resulted in the following:
> g++ -o t -fstack-protector-all t.cc
> ./t
3
4
/bin/bash: line 1: 15450 Segmentation fault ./t
It doesn't really help you find or solve the problem, but at least the segfault will let you know that something is wrong.
g++ does not check for array bounds, and you may be overwriting something with 3,4 but nothing really important, if you try with higher numbers you'll get a crash.
You are just overwriting parts of the stack that are not used, you could continue till you reach the end of the allocated space for the stack and it'd crash eventually
EDIT:
You have no way of dealing with that, maybe a static code analyzer could reveal those failures, but that's too simple, you may have similar(but more complex) failures undetected even for static analyzers
It's undefined behavior as far as I know. Run a larger program with that and it will crash somewhere along the way. Bounds checking is not a part of raw arrays (or even std::vector).
Use std::vector with std::vector::iterator's instead so you don't have to worry about it.
Edit:
Just for fun, run this and see how long until you crash:
int main()
{
int arr[1];
for (int i = 0; i != 100000; i++)
{
arr[i] = i;
}
return 0; //will be lucky to ever reach this
}
Edit2:
Don't run that.
Edit3:
OK, here is a quick lesson on arrays and their relationships with pointers:
When you use array indexing, you are really using a pointer in disguise (called a "reference"), that is automatically dereferenced. This is why instead of *(array+1), array[1] automatically returns the value at that index.
When you have a pointer to an array, like this:
int arr[5];
int *ptr = arr;
Then the "array" in the second declaration is really decaying to a pointer to the first array. This is equivalent behavior to this:
int *ptr = &arr[0];
When you try to access beyond what you allocated, you are really just using a pointer to other memory (which C++ won't complain about). Taking my example program above, that is equivalent to this:
int main()
{
int arr[1];
int *ptr = arr;
for (int i = 0; i != 100000; i++, ptr++)
{
*ptr++ = i;
}
return 0; //will be lucky to ever reach this
}
The compiler won't complain because in programming, you often have to communicate with other programs, especially the operating system. This is done with pointers quite a bit.
Hint
If you want to have fast constraint size arrays with range error check, try using boost::array, (also std::tr1::array from <tr1/array> it will be standard container in next C++ specification). It's much faster then std::vector. It reserve memory on heap or inside class instance, just like int array[].
This is simple sample code:
#include <iostream>
#include <boost/array.hpp>
int main()
{
boost::array<int,2> array;
array.at(0) = 1; // checking index is inside range
array[1] = 2; // no error check, as fast as int array[2];
try
{
// index is inside range
std::cout << "array.at(0) = " << array.at(0) << std::endl;
// index is outside range, throwing exception
std::cout << "array.at(2) = " << array.at(2) << std::endl;
// never comes here
std::cout << "array.at(1) = " << array.at(1) << std::endl;
}
catch(const std::out_of_range& r)
{
std::cout << "Something goes wrong: " << r.what() << std::endl;
}
return 0;
}
This program will print:
array.at(0) = 1
Something goes wrong: array<>: index out of range
C or C++ will not check the bounds of an array access.
You are allocating the array on the stack. Indexing the array via array[3] is equivalent to *(array + 3), where array is a pointer to &array[0]. This will result in undefined behavior.
One way to catch this sometimes in C is to use a static checker, such as splint. If you run:
splint +bounds array.c
on,
int main(void)
{
int array[1];
array[1] = 1;
return 0;
}
then you will get the warning:
array.c: (in function main)
array.c:5:9: Likely out-of-bounds
store:
array[1]
Unable to resolve constraint:
requires 0 >= 1
needed to satisfy precondition:
requires maxSet(array # array.c:5:9) >= 1 A memory write may
write to an address beyond the
allocated buffer.
Run this through Valgrind and you might see an error.
As Falaina pointed out, valgrind does not detect many instances of stack corruption. I just tried the sample under valgrind, and it does indeed report zero errors. However, Valgrind can be instrumental in finding many other types of memory problems, it's just not particularly useful in this case unless you modify your bulid to include the --stack-check option. If you build and run the sample as
g++ --stack-check -W -Wall errorRange.cpp -o errorRange
valgrind ./errorRange
valgrind will report an error.
You are certainly overwriting your stack, but the program is simple enough that effects of this go unnoticed.
libstdc++, which is part of gcc, has a special debug mode for error checking. It is enabled by compiler flag -D_GLIBCXX_DEBUG. Among other things it does bounds checking for std::vector at the cost of performance. Here is online demo with recent version of gcc.
So actually you can do bounds checking with libstdc++ debug mode but you should do it only when testing because it costs notable performance compared to normal libstdc++ mode.
Undefined behavior working in your favor. Whatever memory you're clobbering apparently isn't holding anything important. Note that C and C++ do not do bounds checking on arrays, so stuff like that isn't going to be caught at compile or run time.
When you write 'array[index]' in C it translates it to machine instructions.
The translation is goes something like:
'get the address of array'
'get the size of the type of objects array is made up of'
'multiply the size of the type by index'
'add the result to the address of array'
'read what's at the resulting address'
The result addresses something which may, or may not, be part of the array. In exchange for the blazing speed of machine instructions you lose the safety net of the computer checking things for you. If you're meticulous and careful it's not a problem. If you're sloppy or make a mistake you get burnt. Sometimes it might generate an invalid instruction that causes an exception, sometimes not.
When you initialize the array with int array[2], space for 2 integers is allocated; but the identifier array simply points to the beginning of that space. When you then access array[3] and array[4], the compiler then simply increments that address to point to where those values would be, if the array was long enough; try accessing something like array[42] without initializing it first, you'll end up getting whatever value happened to already be in memory at that location.
Edit:
More info on pointers/arrays: http://home.netcom.com/~tjensen/ptr/pointers.htm
As I understand, local variables are allocated on stack, so going out of bounds on your own stack can only overwrite some other local variable, unless you go oob too much and exceed your stack size.
Since you have no other variables declared in your function - it does not cause any side effects. Try declaring another variable/array right after your first one and see what will happen with it.
A nice approach that i have seen often and I had been used actually is to inject some NULL type element (or a created one, like uint THIS_IS_INFINITY = 82862863263;) at end of the array.
Then at the loop condition check, TYPE *pagesWords is some kind of pointer array:
int pagesWordsLength = sizeof(pagesWords) / sizeof(pagesWords[0]);
realloc (pagesWords, sizeof(pagesWords[0]) * (pagesWordsLength + 1);
pagesWords[pagesWordsLength] = MY_NULL;
for (uint i = 0; i < 1000; i++)
{
if (pagesWords[i] == MY_NULL)
{
break;
}
}
This solution won't word if array is filled with struct types.
As mentioned now in the question using std::vector::at will solve the problem and make a bound check before accessing.
If you need a constant size array that is located on the stack as your first code use the C++11 new container std::array; as vector there is std::array::at function. In fact the function exists in all standard containers in which it have a meaning,i.e, where operator[] is defined :( deque, map, unordered_map) with the exception of std::bitset in which it is called std::bitset::test.
If you change your program slightly:
#include <iostream>
using namespace std;
int main()
{
int array[2];
INT NOTHING;
CHAR FOO[4];
STRCPY(FOO, "BAR");
array[0] = 1;
array[1] = 2;
array[3] = 3;
array[4] = 4;
cout << array[3] << endl;
cout << array[4] << endl;
COUT << FOO << ENDL;
return 0;
}
(Changes in capitals -- put those in lower case if you're going to try this.)
You will see that the variable foo has been trashed. Your code will store values into the nonexistent array[3] and array[4], and be able to properly retrieve them, but the actual storage used will be from foo.
So you can "get away" with exceeding the bounds of the array in your original example, but at the cost of causing damage elsewhere -- damage which may prove to be very hard to diagnose.
As to why there is no automatic bounds checking -- a correctly written program does not need it. Once that has been done, there is no reason to do run-time bounds checking and doing so would just slow down the program. Best to get that all figured out during design and coding.
C++ is based on C, which was designed to be as close to assembly language as possible.
when you declare int array[2]; you reserve 2 memory spaces of 4 bytes each(32bit program).
if you type array[4] in your code it still corresponds to a valid call but only at run time will it throw an unhandled exception. C++ uses manual memory management. This is actually a security flaw that was used for hacking programs
this can help understanding:
int * somepointer;
somepointer[0]=somepointer[5];
The behavior can depend on your system. Typically, you will have a margin for out of bounds, sometimes with value of 0 or garbage values. For the details you can check with memory allocation mechanism used in your OS. On top of that, if you use the programming language like c/c++, it will not check the bounds when you using some containers, like array. So, you will meet "undefined event" because you do not know what the OS did below the surface. But like the programming language Java, it will check the bound. If you step outside of the bound, you will get an exception.

Undefined behaviour when taking string input through scanf [duplicate]

I am assigning values in a C++ program out of the bounds like this:
#include <iostream>
using namespace std;
int main()
{
int array[2];
array[0] = 1;
array[1] = 2;
array[3] = 3;
array[4] = 4;
cout << array[3] << endl;
cout << array[4] << endl;
return 0;
}
The program prints 3 and 4. It should not be possible. I am using g++ 4.3.3
Here is compile and run command
$ g++ -W -Wall errorRange.cpp -o errorRange
$ ./errorRange
3
4
Only when assigning array[3000]=3000 does it give me a segmentation fault.
If gcc doesn't check for array bounds, how can I be sure if my program is correct, as it can lead to some serious issues later?
I replaced the above code with
vector<int> vint(2);
vint[0] = 0;
vint[1] = 1;
vint[2] = 2;
vint[5] = 5;
cout << vint[2] << endl;
cout << vint[5] << endl;
and this one also produces no error.
Welcome to every C/C++ programmer's bestest friend: Undefined Behavior.
There is a lot that is not specified by the language standard, for a variety of reasons. This is one of them.
In general, whenever you encounter undefined behavior, anything might happen. The application may crash, it may freeze, it may eject your CD-ROM drive or make demons come out of your nose. It may format your harddrive or email all your porn to your grandmother.
It may even, if you are really unlucky, appear to work correctly.
The language simply says what should happen if you access the elements within the bounds of an array. It is left undefined what happens if you go out of bounds. It might seem to work today, on your compiler, but it is not legal C or C++, and there is no guarantee that it'll still work the next time you run the program. Or that it hasn't overwritten essential data even now, and you just haven't encountered the problems, that it is going to cause — yet.
As for why there is no bounds checking, there are a couple aspects to the answer:
An array is a leftover from C. C arrays are about as primitive as you can get. Just a sequence of elements with contiguous addresses. There is no bounds checking because it is simply exposing raw memory. Implementing a robust bounds-checking mechanism would have been almost impossible in C.
In C++, bounds-checking is possible on class types. But an array is still the plain old C-compatible one. It is not a class. Further, C++ is also built on another rule which makes bounds-checking non-ideal. The C++ guiding principle is "you don't pay for what you don't use". If your code is correct, you don't need bounds-checking, and you shouldn't be forced to pay for the overhead of runtime bounds-checking.
So C++ offers the std::vector class template, which allows both. operator[] is designed to be efficient. The language standard does not require that it performs bounds checking (although it does not forbid it either). A vector also has the at() member function which is guaranteed to perform bounds-checking. So in C++, you get the best of both worlds if you use a vector. You get array-like performance without bounds-checking, and you get the ability to use bounds-checked access when you want it.
Using g++, you can add the command line option: -fstack-protector-all.
On your example it resulted in the following:
> g++ -o t -fstack-protector-all t.cc
> ./t
3
4
/bin/bash: line 1: 15450 Segmentation fault ./t
It doesn't really help you find or solve the problem, but at least the segfault will let you know that something is wrong.
g++ does not check for array bounds, and you may be overwriting something with 3,4 but nothing really important, if you try with higher numbers you'll get a crash.
You are just overwriting parts of the stack that are not used, you could continue till you reach the end of the allocated space for the stack and it'd crash eventually
EDIT:
You have no way of dealing with that, maybe a static code analyzer could reveal those failures, but that's too simple, you may have similar(but more complex) failures undetected even for static analyzers
It's undefined behavior as far as I know. Run a larger program with that and it will crash somewhere along the way. Bounds checking is not a part of raw arrays (or even std::vector).
Use std::vector with std::vector::iterator's instead so you don't have to worry about it.
Edit:
Just for fun, run this and see how long until you crash:
int main()
{
int arr[1];
for (int i = 0; i != 100000; i++)
{
arr[i] = i;
}
return 0; //will be lucky to ever reach this
}
Edit2:
Don't run that.
Edit3:
OK, here is a quick lesson on arrays and their relationships with pointers:
When you use array indexing, you are really using a pointer in disguise (called a "reference"), that is automatically dereferenced. This is why instead of *(array+1), array[1] automatically returns the value at that index.
When you have a pointer to an array, like this:
int arr[5];
int *ptr = arr;
Then the "array" in the second declaration is really decaying to a pointer to the first array. This is equivalent behavior to this:
int *ptr = &arr[0];
When you try to access beyond what you allocated, you are really just using a pointer to other memory (which C++ won't complain about). Taking my example program above, that is equivalent to this:
int main()
{
int arr[1];
int *ptr = arr;
for (int i = 0; i != 100000; i++, ptr++)
{
*ptr++ = i;
}
return 0; //will be lucky to ever reach this
}
The compiler won't complain because in programming, you often have to communicate with other programs, especially the operating system. This is done with pointers quite a bit.
Hint
If you want to have fast constraint size arrays with range error check, try using boost::array, (also std::tr1::array from <tr1/array> it will be standard container in next C++ specification). It's much faster then std::vector. It reserve memory on heap or inside class instance, just like int array[].
This is simple sample code:
#include <iostream>
#include <boost/array.hpp>
int main()
{
boost::array<int,2> array;
array.at(0) = 1; // checking index is inside range
array[1] = 2; // no error check, as fast as int array[2];
try
{
// index is inside range
std::cout << "array.at(0) = " << array.at(0) << std::endl;
// index is outside range, throwing exception
std::cout << "array.at(2) = " << array.at(2) << std::endl;
// never comes here
std::cout << "array.at(1) = " << array.at(1) << std::endl;
}
catch(const std::out_of_range& r)
{
std::cout << "Something goes wrong: " << r.what() << std::endl;
}
return 0;
}
This program will print:
array.at(0) = 1
Something goes wrong: array<>: index out of range
C or C++ will not check the bounds of an array access.
You are allocating the array on the stack. Indexing the array via array[3] is equivalent to *(array + 3), where array is a pointer to &array[0]. This will result in undefined behavior.
One way to catch this sometimes in C is to use a static checker, such as splint. If you run:
splint +bounds array.c
on,
int main(void)
{
int array[1];
array[1] = 1;
return 0;
}
then you will get the warning:
array.c: (in function main)
array.c:5:9: Likely out-of-bounds
store:
array[1]
Unable to resolve constraint:
requires 0 >= 1
needed to satisfy precondition:
requires maxSet(array # array.c:5:9) >= 1 A memory write may
write to an address beyond the
allocated buffer.
Run this through Valgrind and you might see an error.
As Falaina pointed out, valgrind does not detect many instances of stack corruption. I just tried the sample under valgrind, and it does indeed report zero errors. However, Valgrind can be instrumental in finding many other types of memory problems, it's just not particularly useful in this case unless you modify your bulid to include the --stack-check option. If you build and run the sample as
g++ --stack-check -W -Wall errorRange.cpp -o errorRange
valgrind ./errorRange
valgrind will report an error.
You are certainly overwriting your stack, but the program is simple enough that effects of this go unnoticed.
libstdc++, which is part of gcc, has a special debug mode for error checking. It is enabled by compiler flag -D_GLIBCXX_DEBUG. Among other things it does bounds checking for std::vector at the cost of performance. Here is online demo with recent version of gcc.
So actually you can do bounds checking with libstdc++ debug mode but you should do it only when testing because it costs notable performance compared to normal libstdc++ mode.
Undefined behavior working in your favor. Whatever memory you're clobbering apparently isn't holding anything important. Note that C and C++ do not do bounds checking on arrays, so stuff like that isn't going to be caught at compile or run time.
When you write 'array[index]' in C it translates it to machine instructions.
The translation is goes something like:
'get the address of array'
'get the size of the type of objects array is made up of'
'multiply the size of the type by index'
'add the result to the address of array'
'read what's at the resulting address'
The result addresses something which may, or may not, be part of the array. In exchange for the blazing speed of machine instructions you lose the safety net of the computer checking things for you. If you're meticulous and careful it's not a problem. If you're sloppy or make a mistake you get burnt. Sometimes it might generate an invalid instruction that causes an exception, sometimes not.
When you initialize the array with int array[2], space for 2 integers is allocated; but the identifier array simply points to the beginning of that space. When you then access array[3] and array[4], the compiler then simply increments that address to point to where those values would be, if the array was long enough; try accessing something like array[42] without initializing it first, you'll end up getting whatever value happened to already be in memory at that location.
Edit:
More info on pointers/arrays: http://home.netcom.com/~tjensen/ptr/pointers.htm
As I understand, local variables are allocated on stack, so going out of bounds on your own stack can only overwrite some other local variable, unless you go oob too much and exceed your stack size.
Since you have no other variables declared in your function - it does not cause any side effects. Try declaring another variable/array right after your first one and see what will happen with it.
A nice approach that i have seen often and I had been used actually is to inject some NULL type element (or a created one, like uint THIS_IS_INFINITY = 82862863263;) at end of the array.
Then at the loop condition check, TYPE *pagesWords is some kind of pointer array:
int pagesWordsLength = sizeof(pagesWords) / sizeof(pagesWords[0]);
realloc (pagesWords, sizeof(pagesWords[0]) * (pagesWordsLength + 1);
pagesWords[pagesWordsLength] = MY_NULL;
for (uint i = 0; i < 1000; i++)
{
if (pagesWords[i] == MY_NULL)
{
break;
}
}
This solution won't word if array is filled with struct types.
As mentioned now in the question using std::vector::at will solve the problem and make a bound check before accessing.
If you need a constant size array that is located on the stack as your first code use the C++11 new container std::array; as vector there is std::array::at function. In fact the function exists in all standard containers in which it have a meaning,i.e, where operator[] is defined :( deque, map, unordered_map) with the exception of std::bitset in which it is called std::bitset::test.
If you change your program slightly:
#include <iostream>
using namespace std;
int main()
{
int array[2];
INT NOTHING;
CHAR FOO[4];
STRCPY(FOO, "BAR");
array[0] = 1;
array[1] = 2;
array[3] = 3;
array[4] = 4;
cout << array[3] << endl;
cout << array[4] << endl;
COUT << FOO << ENDL;
return 0;
}
(Changes in capitals -- put those in lower case if you're going to try this.)
You will see that the variable foo has been trashed. Your code will store values into the nonexistent array[3] and array[4], and be able to properly retrieve them, but the actual storage used will be from foo.
So you can "get away" with exceeding the bounds of the array in your original example, but at the cost of causing damage elsewhere -- damage which may prove to be very hard to diagnose.
As to why there is no automatic bounds checking -- a correctly written program does not need it. Once that has been done, there is no reason to do run-time bounds checking and doing so would just slow down the program. Best to get that all figured out during design and coding.
C++ is based on C, which was designed to be as close to assembly language as possible.
when you declare int array[2]; you reserve 2 memory spaces of 4 bytes each(32bit program).
if you type array[4] in your code it still corresponds to a valid call but only at run time will it throw an unhandled exception. C++ uses manual memory management. This is actually a security flaw that was used for hacking programs
this can help understanding:
int * somepointer;
somepointer[0]=somepointer[5];
The behavior can depend on your system. Typically, you will have a margin for out of bounds, sometimes with value of 0 or garbage values. For the details you can check with memory allocation mechanism used in your OS. On top of that, if you use the programming language like c/c++, it will not check the bounds when you using some containers, like array. So, you will meet "undefined event" because you do not know what the OS did below the surface. But like the programming language Java, it will check the bound. If you step outside of the bound, you will get an exception.

In this queue code my Array size is 5 but I can insert more than 5 element, how? [duplicate]

I am assigning values in a C++ program out of the bounds like this:
#include <iostream>
using namespace std;
int main()
{
int array[2];
array[0] = 1;
array[1] = 2;
array[3] = 3;
array[4] = 4;
cout << array[3] << endl;
cout << array[4] << endl;
return 0;
}
The program prints 3 and 4. It should not be possible. I am using g++ 4.3.3
Here is compile and run command
$ g++ -W -Wall errorRange.cpp -o errorRange
$ ./errorRange
3
4
Only when assigning array[3000]=3000 does it give me a segmentation fault.
If gcc doesn't check for array bounds, how can I be sure if my program is correct, as it can lead to some serious issues later?
I replaced the above code with
vector<int> vint(2);
vint[0] = 0;
vint[1] = 1;
vint[2] = 2;
vint[5] = 5;
cout << vint[2] << endl;
cout << vint[5] << endl;
and this one also produces no error.
Welcome to every C/C++ programmer's bestest friend: Undefined Behavior.
There is a lot that is not specified by the language standard, for a variety of reasons. This is one of them.
In general, whenever you encounter undefined behavior, anything might happen. The application may crash, it may freeze, it may eject your CD-ROM drive or make demons come out of your nose. It may format your harddrive or email all your porn to your grandmother.
It may even, if you are really unlucky, appear to work correctly.
The language simply says what should happen if you access the elements within the bounds of an array. It is left undefined what happens if you go out of bounds. It might seem to work today, on your compiler, but it is not legal C or C++, and there is no guarantee that it'll still work the next time you run the program. Or that it hasn't overwritten essential data even now, and you just haven't encountered the problems, that it is going to cause — yet.
As for why there is no bounds checking, there are a couple aspects to the answer:
An array is a leftover from C. C arrays are about as primitive as you can get. Just a sequence of elements with contiguous addresses. There is no bounds checking because it is simply exposing raw memory. Implementing a robust bounds-checking mechanism would have been almost impossible in C.
In C++, bounds-checking is possible on class types. But an array is still the plain old C-compatible one. It is not a class. Further, C++ is also built on another rule which makes bounds-checking non-ideal. The C++ guiding principle is "you don't pay for what you don't use". If your code is correct, you don't need bounds-checking, and you shouldn't be forced to pay for the overhead of runtime bounds-checking.
So C++ offers the std::vector class template, which allows both. operator[] is designed to be efficient. The language standard does not require that it performs bounds checking (although it does not forbid it either). A vector also has the at() member function which is guaranteed to perform bounds-checking. So in C++, you get the best of both worlds if you use a vector. You get array-like performance without bounds-checking, and you get the ability to use bounds-checked access when you want it.
Using g++, you can add the command line option: -fstack-protector-all.
On your example it resulted in the following:
> g++ -o t -fstack-protector-all t.cc
> ./t
3
4
/bin/bash: line 1: 15450 Segmentation fault ./t
It doesn't really help you find or solve the problem, but at least the segfault will let you know that something is wrong.
g++ does not check for array bounds, and you may be overwriting something with 3,4 but nothing really important, if you try with higher numbers you'll get a crash.
You are just overwriting parts of the stack that are not used, you could continue till you reach the end of the allocated space for the stack and it'd crash eventually
EDIT:
You have no way of dealing with that, maybe a static code analyzer could reveal those failures, but that's too simple, you may have similar(but more complex) failures undetected even for static analyzers
It's undefined behavior as far as I know. Run a larger program with that and it will crash somewhere along the way. Bounds checking is not a part of raw arrays (or even std::vector).
Use std::vector with std::vector::iterator's instead so you don't have to worry about it.
Edit:
Just for fun, run this and see how long until you crash:
int main()
{
int arr[1];
for (int i = 0; i != 100000; i++)
{
arr[i] = i;
}
return 0; //will be lucky to ever reach this
}
Edit2:
Don't run that.
Edit3:
OK, here is a quick lesson on arrays and their relationships with pointers:
When you use array indexing, you are really using a pointer in disguise (called a "reference"), that is automatically dereferenced. This is why instead of *(array+1), array[1] automatically returns the value at that index.
When you have a pointer to an array, like this:
int arr[5];
int *ptr = arr;
Then the "array" in the second declaration is really decaying to a pointer to the first array. This is equivalent behavior to this:
int *ptr = &arr[0];
When you try to access beyond what you allocated, you are really just using a pointer to other memory (which C++ won't complain about). Taking my example program above, that is equivalent to this:
int main()
{
int arr[1];
int *ptr = arr;
for (int i = 0; i != 100000; i++, ptr++)
{
*ptr++ = i;
}
return 0; //will be lucky to ever reach this
}
The compiler won't complain because in programming, you often have to communicate with other programs, especially the operating system. This is done with pointers quite a bit.
Hint
If you want to have fast constraint size arrays with range error check, try using boost::array, (also std::tr1::array from <tr1/array> it will be standard container in next C++ specification). It's much faster then std::vector. It reserve memory on heap or inside class instance, just like int array[].
This is simple sample code:
#include <iostream>
#include <boost/array.hpp>
int main()
{
boost::array<int,2> array;
array.at(0) = 1; // checking index is inside range
array[1] = 2; // no error check, as fast as int array[2];
try
{
// index is inside range
std::cout << "array.at(0) = " << array.at(0) << std::endl;
// index is outside range, throwing exception
std::cout << "array.at(2) = " << array.at(2) << std::endl;
// never comes here
std::cout << "array.at(1) = " << array.at(1) << std::endl;
}
catch(const std::out_of_range& r)
{
std::cout << "Something goes wrong: " << r.what() << std::endl;
}
return 0;
}
This program will print:
array.at(0) = 1
Something goes wrong: array<>: index out of range
C or C++ will not check the bounds of an array access.
You are allocating the array on the stack. Indexing the array via array[3] is equivalent to *(array + 3), where array is a pointer to &array[0]. This will result in undefined behavior.
One way to catch this sometimes in C is to use a static checker, such as splint. If you run:
splint +bounds array.c
on,
int main(void)
{
int array[1];
array[1] = 1;
return 0;
}
then you will get the warning:
array.c: (in function main)
array.c:5:9: Likely out-of-bounds
store:
array[1]
Unable to resolve constraint:
requires 0 >= 1
needed to satisfy precondition:
requires maxSet(array # array.c:5:9) >= 1 A memory write may
write to an address beyond the
allocated buffer.
Run this through Valgrind and you might see an error.
As Falaina pointed out, valgrind does not detect many instances of stack corruption. I just tried the sample under valgrind, and it does indeed report zero errors. However, Valgrind can be instrumental in finding many other types of memory problems, it's just not particularly useful in this case unless you modify your bulid to include the --stack-check option. If you build and run the sample as
g++ --stack-check -W -Wall errorRange.cpp -o errorRange
valgrind ./errorRange
valgrind will report an error.
You are certainly overwriting your stack, but the program is simple enough that effects of this go unnoticed.
libstdc++, which is part of gcc, has a special debug mode for error checking. It is enabled by compiler flag -D_GLIBCXX_DEBUG. Among other things it does bounds checking for std::vector at the cost of performance. Here is online demo with recent version of gcc.
So actually you can do bounds checking with libstdc++ debug mode but you should do it only when testing because it costs notable performance compared to normal libstdc++ mode.
Undefined behavior working in your favor. Whatever memory you're clobbering apparently isn't holding anything important. Note that C and C++ do not do bounds checking on arrays, so stuff like that isn't going to be caught at compile or run time.
When you write 'array[index]' in C it translates it to machine instructions.
The translation is goes something like:
'get the address of array'
'get the size of the type of objects array is made up of'
'multiply the size of the type by index'
'add the result to the address of array'
'read what's at the resulting address'
The result addresses something which may, or may not, be part of the array. In exchange for the blazing speed of machine instructions you lose the safety net of the computer checking things for you. If you're meticulous and careful it's not a problem. If you're sloppy or make a mistake you get burnt. Sometimes it might generate an invalid instruction that causes an exception, sometimes not.
When you initialize the array with int array[2], space for 2 integers is allocated; but the identifier array simply points to the beginning of that space. When you then access array[3] and array[4], the compiler then simply increments that address to point to where those values would be, if the array was long enough; try accessing something like array[42] without initializing it first, you'll end up getting whatever value happened to already be in memory at that location.
Edit:
More info on pointers/arrays: http://home.netcom.com/~tjensen/ptr/pointers.htm
As I understand, local variables are allocated on stack, so going out of bounds on your own stack can only overwrite some other local variable, unless you go oob too much and exceed your stack size.
Since you have no other variables declared in your function - it does not cause any side effects. Try declaring another variable/array right after your first one and see what will happen with it.
A nice approach that i have seen often and I had been used actually is to inject some NULL type element (or a created one, like uint THIS_IS_INFINITY = 82862863263;) at end of the array.
Then at the loop condition check, TYPE *pagesWords is some kind of pointer array:
int pagesWordsLength = sizeof(pagesWords) / sizeof(pagesWords[0]);
realloc (pagesWords, sizeof(pagesWords[0]) * (pagesWordsLength + 1);
pagesWords[pagesWordsLength] = MY_NULL;
for (uint i = 0; i < 1000; i++)
{
if (pagesWords[i] == MY_NULL)
{
break;
}
}
This solution won't word if array is filled with struct types.
As mentioned now in the question using std::vector::at will solve the problem and make a bound check before accessing.
If you need a constant size array that is located on the stack as your first code use the C++11 new container std::array; as vector there is std::array::at function. In fact the function exists in all standard containers in which it have a meaning,i.e, where operator[] is defined :( deque, map, unordered_map) with the exception of std::bitset in which it is called std::bitset::test.
If you change your program slightly:
#include <iostream>
using namespace std;
int main()
{
int array[2];
INT NOTHING;
CHAR FOO[4];
STRCPY(FOO, "BAR");
array[0] = 1;
array[1] = 2;
array[3] = 3;
array[4] = 4;
cout << array[3] << endl;
cout << array[4] << endl;
COUT << FOO << ENDL;
return 0;
}
(Changes in capitals -- put those in lower case if you're going to try this.)
You will see that the variable foo has been trashed. Your code will store values into the nonexistent array[3] and array[4], and be able to properly retrieve them, but the actual storage used will be from foo.
So you can "get away" with exceeding the bounds of the array in your original example, but at the cost of causing damage elsewhere -- damage which may prove to be very hard to diagnose.
As to why there is no automatic bounds checking -- a correctly written program does not need it. Once that has been done, there is no reason to do run-time bounds checking and doing so would just slow down the program. Best to get that all figured out during design and coding.
C++ is based on C, which was designed to be as close to assembly language as possible.
when you declare int array[2]; you reserve 2 memory spaces of 4 bytes each(32bit program).
if you type array[4] in your code it still corresponds to a valid call but only at run time will it throw an unhandled exception. C++ uses manual memory management. This is actually a security flaw that was used for hacking programs
this can help understanding:
int * somepointer;
somepointer[0]=somepointer[5];
The behavior can depend on your system. Typically, you will have a margin for out of bounds, sometimes with value of 0 or garbage values. For the details you can check with memory allocation mechanism used in your OS. On top of that, if you use the programming language like c/c++, it will not check the bounds when you using some containers, like array. So, you will meet "undefined event" because you do not know what the OS did below the surface. But like the programming language Java, it will check the bound. If you step outside of the bound, you will get an exception.

Assigning values to an array, beyond initialized elements [duplicate]

I am assigning values in a C++ program out of the bounds like this:
#include <iostream>
using namespace std;
int main()
{
int array[2];
array[0] = 1;
array[1] = 2;
array[3] = 3;
array[4] = 4;
cout << array[3] << endl;
cout << array[4] << endl;
return 0;
}
The program prints 3 and 4. It should not be possible. I am using g++ 4.3.3
Here is compile and run command
$ g++ -W -Wall errorRange.cpp -o errorRange
$ ./errorRange
3
4
Only when assigning array[3000]=3000 does it give me a segmentation fault.
If gcc doesn't check for array bounds, how can I be sure if my program is correct, as it can lead to some serious issues later?
I replaced the above code with
vector<int> vint(2);
vint[0] = 0;
vint[1] = 1;
vint[2] = 2;
vint[5] = 5;
cout << vint[2] << endl;
cout << vint[5] << endl;
and this one also produces no error.
Welcome to every C/C++ programmer's bestest friend: Undefined Behavior.
There is a lot that is not specified by the language standard, for a variety of reasons. This is one of them.
In general, whenever you encounter undefined behavior, anything might happen. The application may crash, it may freeze, it may eject your CD-ROM drive or make demons come out of your nose. It may format your harddrive or email all your porn to your grandmother.
It may even, if you are really unlucky, appear to work correctly.
The language simply says what should happen if you access the elements within the bounds of an array. It is left undefined what happens if you go out of bounds. It might seem to work today, on your compiler, but it is not legal C or C++, and there is no guarantee that it'll still work the next time you run the program. Or that it hasn't overwritten essential data even now, and you just haven't encountered the problems, that it is going to cause — yet.
As for why there is no bounds checking, there are a couple aspects to the answer:
An array is a leftover from C. C arrays are about as primitive as you can get. Just a sequence of elements with contiguous addresses. There is no bounds checking because it is simply exposing raw memory. Implementing a robust bounds-checking mechanism would have been almost impossible in C.
In C++, bounds-checking is possible on class types. But an array is still the plain old C-compatible one. It is not a class. Further, C++ is also built on another rule which makes bounds-checking non-ideal. The C++ guiding principle is "you don't pay for what you don't use". If your code is correct, you don't need bounds-checking, and you shouldn't be forced to pay for the overhead of runtime bounds-checking.
So C++ offers the std::vector class template, which allows both. operator[] is designed to be efficient. The language standard does not require that it performs bounds checking (although it does not forbid it either). A vector also has the at() member function which is guaranteed to perform bounds-checking. So in C++, you get the best of both worlds if you use a vector. You get array-like performance without bounds-checking, and you get the ability to use bounds-checked access when you want it.
Using g++, you can add the command line option: -fstack-protector-all.
On your example it resulted in the following:
> g++ -o t -fstack-protector-all t.cc
> ./t
3
4
/bin/bash: line 1: 15450 Segmentation fault ./t
It doesn't really help you find or solve the problem, but at least the segfault will let you know that something is wrong.
g++ does not check for array bounds, and you may be overwriting something with 3,4 but nothing really important, if you try with higher numbers you'll get a crash.
You are just overwriting parts of the stack that are not used, you could continue till you reach the end of the allocated space for the stack and it'd crash eventually
EDIT:
You have no way of dealing with that, maybe a static code analyzer could reveal those failures, but that's too simple, you may have similar(but more complex) failures undetected even for static analyzers
It's undefined behavior as far as I know. Run a larger program with that and it will crash somewhere along the way. Bounds checking is not a part of raw arrays (or even std::vector).
Use std::vector with std::vector::iterator's instead so you don't have to worry about it.
Edit:
Just for fun, run this and see how long until you crash:
int main()
{
int arr[1];
for (int i = 0; i != 100000; i++)
{
arr[i] = i;
}
return 0; //will be lucky to ever reach this
}
Edit2:
Don't run that.
Edit3:
OK, here is a quick lesson on arrays and their relationships with pointers:
When you use array indexing, you are really using a pointer in disguise (called a "reference"), that is automatically dereferenced. This is why instead of *(array+1), array[1] automatically returns the value at that index.
When you have a pointer to an array, like this:
int arr[5];
int *ptr = arr;
Then the "array" in the second declaration is really decaying to a pointer to the first array. This is equivalent behavior to this:
int *ptr = &arr[0];
When you try to access beyond what you allocated, you are really just using a pointer to other memory (which C++ won't complain about). Taking my example program above, that is equivalent to this:
int main()
{
int arr[1];
int *ptr = arr;
for (int i = 0; i != 100000; i++, ptr++)
{
*ptr++ = i;
}
return 0; //will be lucky to ever reach this
}
The compiler won't complain because in programming, you often have to communicate with other programs, especially the operating system. This is done with pointers quite a bit.
Hint
If you want to have fast constraint size arrays with range error check, try using boost::array, (also std::tr1::array from <tr1/array> it will be standard container in next C++ specification). It's much faster then std::vector. It reserve memory on heap or inside class instance, just like int array[].
This is simple sample code:
#include <iostream>
#include <boost/array.hpp>
int main()
{
boost::array<int,2> array;
array.at(0) = 1; // checking index is inside range
array[1] = 2; // no error check, as fast as int array[2];
try
{
// index is inside range
std::cout << "array.at(0) = " << array.at(0) << std::endl;
// index is outside range, throwing exception
std::cout << "array.at(2) = " << array.at(2) << std::endl;
// never comes here
std::cout << "array.at(1) = " << array.at(1) << std::endl;
}
catch(const std::out_of_range& r)
{
std::cout << "Something goes wrong: " << r.what() << std::endl;
}
return 0;
}
This program will print:
array.at(0) = 1
Something goes wrong: array<>: index out of range
C or C++ will not check the bounds of an array access.
You are allocating the array on the stack. Indexing the array via array[3] is equivalent to *(array + 3), where array is a pointer to &array[0]. This will result in undefined behavior.
One way to catch this sometimes in C is to use a static checker, such as splint. If you run:
splint +bounds array.c
on,
int main(void)
{
int array[1];
array[1] = 1;
return 0;
}
then you will get the warning:
array.c: (in function main)
array.c:5:9: Likely out-of-bounds
store:
array[1]
Unable to resolve constraint:
requires 0 >= 1
needed to satisfy precondition:
requires maxSet(array # array.c:5:9) >= 1 A memory write may
write to an address beyond the
allocated buffer.
Run this through Valgrind and you might see an error.
As Falaina pointed out, valgrind does not detect many instances of stack corruption. I just tried the sample under valgrind, and it does indeed report zero errors. However, Valgrind can be instrumental in finding many other types of memory problems, it's just not particularly useful in this case unless you modify your bulid to include the --stack-check option. If you build and run the sample as
g++ --stack-check -W -Wall errorRange.cpp -o errorRange
valgrind ./errorRange
valgrind will report an error.
You are certainly overwriting your stack, but the program is simple enough that effects of this go unnoticed.
libstdc++, which is part of gcc, has a special debug mode for error checking. It is enabled by compiler flag -D_GLIBCXX_DEBUG. Among other things it does bounds checking for std::vector at the cost of performance. Here is online demo with recent version of gcc.
So actually you can do bounds checking with libstdc++ debug mode but you should do it only when testing because it costs notable performance compared to normal libstdc++ mode.
Undefined behavior working in your favor. Whatever memory you're clobbering apparently isn't holding anything important. Note that C and C++ do not do bounds checking on arrays, so stuff like that isn't going to be caught at compile or run time.
When you write 'array[index]' in C it translates it to machine instructions.
The translation is goes something like:
'get the address of array'
'get the size of the type of objects array is made up of'
'multiply the size of the type by index'
'add the result to the address of array'
'read what's at the resulting address'
The result addresses something which may, or may not, be part of the array. In exchange for the blazing speed of machine instructions you lose the safety net of the computer checking things for you. If you're meticulous and careful it's not a problem. If you're sloppy or make a mistake you get burnt. Sometimes it might generate an invalid instruction that causes an exception, sometimes not.
When you initialize the array with int array[2], space for 2 integers is allocated; but the identifier array simply points to the beginning of that space. When you then access array[3] and array[4], the compiler then simply increments that address to point to where those values would be, if the array was long enough; try accessing something like array[42] without initializing it first, you'll end up getting whatever value happened to already be in memory at that location.
Edit:
More info on pointers/arrays: http://home.netcom.com/~tjensen/ptr/pointers.htm
As I understand, local variables are allocated on stack, so going out of bounds on your own stack can only overwrite some other local variable, unless you go oob too much and exceed your stack size.
Since you have no other variables declared in your function - it does not cause any side effects. Try declaring another variable/array right after your first one and see what will happen with it.
A nice approach that i have seen often and I had been used actually is to inject some NULL type element (or a created one, like uint THIS_IS_INFINITY = 82862863263;) at end of the array.
Then at the loop condition check, TYPE *pagesWords is some kind of pointer array:
int pagesWordsLength = sizeof(pagesWords) / sizeof(pagesWords[0]);
realloc (pagesWords, sizeof(pagesWords[0]) * (pagesWordsLength + 1);
pagesWords[pagesWordsLength] = MY_NULL;
for (uint i = 0; i < 1000; i++)
{
if (pagesWords[i] == MY_NULL)
{
break;
}
}
This solution won't word if array is filled with struct types.
As mentioned now in the question using std::vector::at will solve the problem and make a bound check before accessing.
If you need a constant size array that is located on the stack as your first code use the C++11 new container std::array; as vector there is std::array::at function. In fact the function exists in all standard containers in which it have a meaning,i.e, where operator[] is defined :( deque, map, unordered_map) with the exception of std::bitset in which it is called std::bitset::test.
If you change your program slightly:
#include <iostream>
using namespace std;
int main()
{
int array[2];
INT NOTHING;
CHAR FOO[4];
STRCPY(FOO, "BAR");
array[0] = 1;
array[1] = 2;
array[3] = 3;
array[4] = 4;
cout << array[3] << endl;
cout << array[4] << endl;
COUT << FOO << ENDL;
return 0;
}
(Changes in capitals -- put those in lower case if you're going to try this.)
You will see that the variable foo has been trashed. Your code will store values into the nonexistent array[3] and array[4], and be able to properly retrieve them, but the actual storage used will be from foo.
So you can "get away" with exceeding the bounds of the array in your original example, but at the cost of causing damage elsewhere -- damage which may prove to be very hard to diagnose.
As to why there is no automatic bounds checking -- a correctly written program does not need it. Once that has been done, there is no reason to do run-time bounds checking and doing so would just slow down the program. Best to get that all figured out during design and coding.
C++ is based on C, which was designed to be as close to assembly language as possible.
when you declare int array[2]; you reserve 2 memory spaces of 4 bytes each(32bit program).
if you type array[4] in your code it still corresponds to a valid call but only at run time will it throw an unhandled exception. C++ uses manual memory management. This is actually a security flaw that was used for hacking programs
this can help understanding:
int * somepointer;
somepointer[0]=somepointer[5];
The behavior can depend on your system. Typically, you will have a margin for out of bounds, sometimes with value of 0 or garbage values. For the details you can check with memory allocation mechanism used in your OS. On top of that, if you use the programming language like c/c++, it will not check the bounds when you using some containers, like array. So, you will meet "undefined event" because you do not know what the OS did below the surface. But like the programming language Java, it will check the bound. If you step outside of the bound, you will get an exception.