std::array and operator[] in C++

The reference for std::array::operator[] states:
Returns a reference to the element at specified location pos. No
bounds checking is performed.
I wrote this small program to check the behavior of operator[]:
#include <array>
#include <cstddef>
#include <iostream>
using std::array;
using std::size_t;
using std::cout;
using std::endl;
#define MAX_SZ 5
int main(void) {
    array<int, MAX_SZ> a;
    size_t idx = MAX_SZ - 1;
    while (idx < MAX_SZ) {
        cout << idx << ":" << a[idx] << endl;
        --idx;
    }
    cout << idx << ":" << a[idx] << endl;
    return 0;
}
When compiled and run, the above program produces the following output:
4:-13104
3:0
2:-12816
1:1
0:-2144863424
18446744073709551615:0
Based on the above output, my question is:
Why doesn't the above code give a segmentation fault when idx assumes the value 18446744073709551615?

operator[] is not required to do bounds checks, so this is an out-of-bounds access. Out-of-bounds access causes undefined behavior, meaning anything could happen. I really do mean anything. For example, it could order pizza.

As already said, you face undefined behaviour.
Nevertheless, if you would like a bounds check, you can use .at() instead of operator[]. This will be a bit slower, since it performs the check on every access to the array.
A memory checker such as valgrind is also able to find errors like this at run time.

>> Why doesn't above code give a segmentation fault error, when the value of idx assumes the value 18446744073709551615 ?
Because this large number is 2**64 - 1, that is, 2 raised to the 64th power, minus 1.
As far as the array indexing logic is concerned, this is exactly the same thing as -1, because 2**64 is beyond what 64-bit hardware can represent. So you are (illegally) accessing a[-1], which happens to contain 0 on your machine.
In your memory, this is the word just before a[0]. It is memory in your stack, which the hardware allows you to access perfectly well, so no segmentation fault is expected to occur.
Your while loop uses a size_t index, which is essentially an unsigned 64-bit quantity. So when the index is decremented from 0, the resulting -1 is interpreted by the loop control test as 18446744073709551615 (a bit pattern consisting of 64 bits all set to 1), which is way bigger than MAX_SZ = 5, so the test fails and the while loop stops there.
If you have the slightest doubt about that, you can check by controlling the memory values around array a[]. To do this, you can "sandwich" array a between 2 smaller arrays, say magica and magicb, which you properly initialize. Like this:
#include <array>
#include <cstddef>
#include <iostream>
using std::array;
using std::size_t;
using std::cout;
using std::endl;
#define MAX_SZ 5
int main(void) {
    array<int, 2> magica;
    array<int, MAX_SZ> a;
    size_t idx = MAX_SZ - 1;
    array<int, 2> magicb;
    magica[0] = 111222333;
    magica[1] = 111222334;
    magicb[0] = 111222335;
    magicb[1] = 111222336;
    cout << "magicb[1] : " << magicb[1] << endl;
    while (idx < MAX_SZ) {
        cout << idx << ":" << a[idx] << endl;
        --idx;
    }
    cout << idx << ":" << a[idx] << endl;
    return 0;
}
My machine is an x86-based one, so its stack grows towards numerically lower memory addresses. Array magicb is defined after array a in source order, so it is allocated last on the stack and thus has a numerically lower address than array a.
Hence, the memory layout is: magicb[0], magicb[1], a[0], ... , a[4], magica[0], magica[1]. So you expect the hardware to give you magicb[1] when you ask for a[-1].
This is indeed what happens:
magicb[1] : 111222336
4:607440832
3:0
2:4199469
1:0
0:2
18446744073709551615:111222336
As other people have pointed out, the C++ language rules do not define what you are expected to get from negative array indexes, so the people who wrote the compiler were at liberty to return whatever value suited them for a[-1]. Their sole concern was probably to emit machine code that does not hurt the performance of well-behaved source code.

Related

Printing array of doubles gives unexpected results

In a program I am writing I am experiencing unexpected output when printing data from an array. I have tried with float and double. Here is the code:
#include <cstdlib>  // for system()
#include <iostream>
int main()
{
    double vector[3]{ 193.09375, 338.5411682, -4.0 };
    double pVecX{ 193.09375 };
    double pVecY{ 338.5411682 };
    double pVecZ{ -4 };
    std::cout << std::dec << vector[1] << '\n' << vector[2] << '\n' << vector[3] << '\n' << '\n';
    std::cout << std::dec << pVecX << '\n' << pVecY << '\n' << pVecZ << '\n';
    system("Pause");
    return 0;
}
This is the output:
338.541
-4
1.42292e-306
193.094
338.541
-4
Press any key to continue . . .
Issues:
I expected the vectors to print in reverse order from how they were entered into the array.
(Even though I ask for [1]..[2]..[3], it is printing [2]..[3]..[1], I think.)
When part of the array, the number "193.09375" becomes a (seemingly) random number in scientific notation, and it is different every time the program runs.
I was reading about variables and understand that a variable stored outside of the range it is initialized as can cause wrap-around, I just do not know why that is happening here. (I assume it is based on the negative notation.)
I am certain that I am missing something simple, and I am fairly new.
An array's index starts at 0. So when you write vector[3] you are actually going out of bounds.
You only have indices (subscripts) 0, 1, and 2, even though you have 3 elements: 0 refers to your first element, 1 to your second, and 2 to your third, and so on and so forth.
(Like I mentioned in my comment.)
You should have something like this instead:
std::cout << std::dec << vector[0] << '\n' << vector[1] << '\n' << vector[2] << '\n' << '\n';
This should fix your problem. Also consider using a std::vector.
Also read about why you should not use system("Pause");.
As mentioned in other answers, the valid indexes for an array of size 3 is 0, 1, and 2. Using any other index invokes undefined behavior.
You can also avoid explicitly indexing into the array, if you use a loop:
for (auto v : vector)
    std::cout << std::dec << v << '\n';
vector[3] is outside the bounds of the array. Behaviour of the program is undefined. Valid indices are 0, 1 and 2.
I expected the vectors to print in reverse order from how they were entered into the array.
Why? The indexes you print are ordered from lowest to highest (and << prints in source-code order, etc.). If you wanted to reverse them, you would need to print the highest index first: 2, then 1, then 0.
When part of the array, the number "193.09375" becomes a (seemingly) random notated number, and is different every time the program runs.
The vector array goes from index 0 to 2, not from 1 to 3. When you try to access vector[3], it is undefined behavior, and the program will likely print whatever memory happens to end up there. Every time you run the program that memory may contain something different; that is a fairly normal result of undefined behavior.
Arrays in C++ are zero-indexed, which means that the first element is accessed by the index 0, e.g.
int array[3] {5,6,7};
So array[0] == 5, array[1] == 6, array[2] == 7.
The reason you get a random number is that you are trying to print an element of the array which never got defined. In the example above, if I tried to print array[3], that index corresponds to a place in memory which is not part of my array and can hold any value; that is called undefined behavior.
If you want to print out every element of an array, you could make use of range based for loops:
for (auto a : my_array) std::cout << a << std::endl;

Accessing memory at negative indexes of array in C++ does not return garbage

I wrote the following program to search of a particular string in a given array of string. I made an error in the search function and wrote i-- instead of i++.
#include <iostream>
#include <string>
using namespace std;
int search(string S[], int pos, string s)
{
    for (int i = 0; i < pos; i--) {
        cout << i << " : " << S[i] << "\n";
        if (S[i] == s) {
            cout << "Inside Return ->\n";
            cout << i << " / " << S[i] << " / " << s << "\n";
            return i;
        }
    }
    return -1;
}
int main()
{
    string S[] = {"abc", "def", "pqr", "xyz"};
    string s = "def";
    cout << search(S, 2, s) << "\n";
    return 0;
}
Logically the loop is infinite and should not stop, but what I observed was that the if condition became true on each search and the function returned -1.
I printed the values and noticed that the value of S[-1] is always same as the third argument passed to the function (the string to be searched) due to which the loop was returning -1 every time.
Is this something that g++ is doing or is it related to the way memory is allocated for the formal arguments of the function?
Output of the above code -
0 : abc
-1 : def
Inside Return ->
-1 / def / def
PS - I am using g++ (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
Edit - I understand that g++ doesn't check for bounds, but I was intrigued by the fact that the value of S[-1] was always the same as s. I was wondering if there are any possible theories for this.
Out-of-bounds access is undefined behaviour.
An undefined-behaviour read is not "garbage" or "segfault"; it is literally anything. The read could time travel and make code earlier in the program behave differently. The behaviour of the program, from start to finish, is completely unspecified by the C++ standard whenever any undefined behaviour happens anywhere.
In this case, a naive translation to assembly, together with the platform ABI, tells you that local variables on the "stack" at run time sit adjacent to things like the function's arguments.
So a naive rewriting of your code into assembly results in negative indexes reading from the arguments to the function.
But a whole myriad of completely innocuous, common and safe alternative translations of your program into machine code, starting with inlining and going far beyond, make this not happen.
When compiling without LTO or across dynamic library boundaries, you can have some small amount of confidence that the compiler's published ABI will be used to make the call; assuming anything more anywhere else is dangerously fragile. And if you are compiling without LTO and relying on this, it means you now have to audit every build of your code from now until eternity, or risk a bug showing up with no apparent cause long from now.

Extra number while looping through an array in C++

I am trying to loop through an array of integers using pointers using the following code:
#include <iostream>
int main(int argc, char** argv)
{
    int ar[] = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55};
    char s[] = "string";
    std::cout << "Print fibonacci until ten using pointers" << std::endl;
    for (int* p = ar; *p; p++)
    {
        std::cout << *p << std::endl;
    }
    // for (char* cp = s; *cp; cp++)
    //     std::cout << "char is " << *cp << std::endl;
    return 0;
}
On running this code, I get all 10 elements plus a number, 4196368.
But on uncommenting the second for-loop and running it again, the extra number vanishes.
Can someone explain why this happens? If needed, the code is compiled in a 64-bit Linux box.
You're lucky the loop stopped at all; you could have blown up your entire neighbourhood!
Your loop expects to find a "zero" to terminate the array iteration, but your array doesn't have one. Thus, your loop will just keep incrementing past the end of the array until god knows what. The practical results depend on too many practical factors to be either predictable or usefully explained.
I presume that this is an exercise, because using "null-termination" to iterate over an int array is mighty peculiar. In reality you'd just write:
for (auto x : ar)
    std::cout << x << '\n';
You are invoking undefined behavior.
The first for loop's termination condition is *p, so it reads memory past what ar actually owns. The loop then runs until it finds a memory location that contains 0 (read: false). In your case, it ran just one extra time (lucky you!); at my end, it ran four more times before terminating.
You should loop only as many times as the size of the array, which is sizeof(ar)/sizeof(ar[0]).
Ensure that you have a terminating zero:
int ar[] = {1,1,2,3,5,8,13,21,34,55, 0};
Well, actually this will produce a different outcome on a different machine or under different conditions. What causes it is your for statement:
for (int* p = ar; *p; p++)
{
    std::cout << *p << std::endl;
}
Here you used *p as the condition that keeps the for loop running. C++ treats any nonzero value as true and 0 as false. Once past the end of the array, your program checks whether the value at the next memory address is zero (false) or not (true). In this particular case, on your particular PC, the value at the next address was 4196368, so the loop kept going until it reached an address whose value was zero. You can see this by printing the address as well:
for (int* p = ar; *p; p++)
{
    std::cout << *p << " " << p << std::endl;
}
Here you can see that your code checks the next address's value, and if it is indeed not zero, it continues the loop.

C++11 Why does cout print large integers from a boolean array?

#include <iostream>
using namespace std;
int main() {
    bool *a = new bool[10];
    cout << sizeof(bool) << endl;
    cout << sizeof(a[0]) << endl;
    for (int i = 0; i < 10; i++) {
        cout << a[i] << " ";
    }
    delete[] a;
}
The above code outputs:
1
1
112 104 151 0 0 0 0 0 88 1
The last line should contain garbage values, but why are they not all 0 or 1? The same thing happens for a stack-allocated array.
Solved: I forgot that sizeof counts bytes, not bits as I thought.
You have an array of default-initialized bools. Default-initialization for primitive types entails no initialization, so they all have indeterminate values.
You can zero-initialize them by providing a pair of parentheses:
bool *a = new bool[10]();
Booleans are 1-byte integral types, so you're probably seeing whatever data happened to be in that memory at the moment, viewed one byte at a time. Notice how all the values are at most 255 (the largest number an unsigned 1-byte integer can hold).
OTOH, printing an indeterminate value is undefined behavior, so there really is no logic to reason about in this program.
sizeof(bool) on your machine returns 1.
That's 1 byte, not 1 bit, so the values you show can certainly be present.
What you are seeing is uninitialized values; different compilers generate different code. With GCC I see everything as 0; on Windows I see junk values.
Generally, char is the smallest byte-addressable unit, so even though a bool holds only 1 or 0, memory-access-wise it behaves like a char. Thus you will never see a junk value greater than 255.
The following initialization (memset) fixes things for you:
#include <cstring>  // for memset
#include <iostream>
using namespace std;
int main() {
    bool* a = new bool[10];
    memset(a, 0, 10 * sizeof(bool));
    cout << sizeof(bool) << endl;
    cout << sizeof(a[0]) << endl;
    for (int i = 0; i < 10; ++i)
    {
        bool b = a[i];
        cout << b << " ";
    }
    delete[] a;
    return 0;
}
Formally speaking, as pointed out in this answer, reading any uninitialized variable is undefined behaviour, which basically means everything is possible.
More practically, the memory used by those bools is filled with what you called garbage. ostream's operator<< inserts booleans via std::num_put::put(), which, if boolalpha is not set, converts the value to an int and outputs the result.
I do not know why you put a * sign before the variable a.
Is it a pointer to the address of the array's first element?

Why does a for loop in C++ read a non-initialized memory location differently from a normal cout?

I found this occurrence rather interesting; let me explain:
When I initialized an int array, I started to wonder how C++ handles an index whose value was never initialized. When using cout directly, C++ outputs the values as 0. However, when I insert a for loop right afterwards with the same purpose, it instead prints the values found at those memory locations, as if they were never initialized.
To reproduce this, copy & paste the code below into a compiler. Run it once without the for loop, and once with the for loop.
I am just interested to find out why this occurs.
#include <iostream>
using namespace std;
int main() {
    int myArray[4];
    myArray[2] = 32;
    cout << "\n Val 1: " << myArray[0] << "\n Val 2: " << myArray[1] << "\n Val 3: " << myArray[2] << "\n Val 4: " << myArray[3] << "\n Val 5: " << myArray[4];
    cout << "\n ----------------------------";
    /*
    for (int i = 0; i < 5; i++) {
        cout << "\n Val " << i << ": " << myArray[i];
    }
    */
    return 0;
}
You are likely witnessing the work of a (clever) optimizer:
Without the for loop, you access the array elements with fixed constants, and the optimizer can easily prove that each such access yields an uninitialized value which is never used again. As such, it can optimize away actually reading the element from the uninitialized memory, because it is perfectly entitled to substitute some constant instead.
With the for loop, you have a second use of the values (through a dynamic index), and the optimizer has to ensure that the undefined value read from the array elements in the first cout is the same one later read within the loop. Evidently it does not try to unroll the loop; after doing that, it would know how to optimize the reads away too.
In any case, whenever you access an uninitialized value, that value is indeterminate; it can be anything, including zero (even when you are not yet invoking undefined behavior). Whenever you use such a value for memory access (an uninitialized pointer, etc.), you have undefined behavior at its worst.