Weird C++ array initialization behaviour - c++

The following code
int x;
cin >> x;
int b[x];
b[5] = 8;
cout << sizeof(b)/sizeof(b[0]) << endl << b[5];
with x inputted as 10 gives the output:
10
8
which seems very weird to me because:
According to http://www.cplusplus.com/doc/tutorial/arrays/ we shouldn't even be able to initialize an array using a value obtained from cin stream.
NOTE: The elements field within square brackets [], representing the number of elements in the array, must be a constant expression, since arrays are blocks of static memory whose size must be determined at compile time, before the program runs.
But that's not the whole story! The same code with x inputted as 4 sometimes gives the output
Segmentation fault. Core dumped.
and sometimes gives the output:
4
8
What the heck is going on? Why doesn't the compiler behave consistently? Why can I assign a value to an array index that is larger than the array? And why can we even initialize an array using a variable in the first place?

I initially mentioned this as a comment, but seeing how no one has answered, I'm gonna add it here.
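What you have demonstrated above is undefined behavior. It means that you can't tell what the outcome will be. As Brian adds in the comments, declaring the array with a non-constant size requires the compiler to issue a diagnostic message (which could be just a warning). If the compiler goes ahead anyway, the result can be called UB, since the standard does not define it. Separately, writing to b[5] when x is 4 is an out-of-bounds access, which is undefined behavior on its own and explains the occasional segmentation fault.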
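For comparison, here is a minimal sketch of how the same experiment could be written in standard C++ with std::vector, whose size may be chosen at run time. The helper name store_checked is hypothetical and only serves the illustration; at() is the bounds-checked accessor, so an index that is too large produces an exception rather than undefined behavior.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: create a buffer of n ints (size chosen at run time)
// and try to store `value` at `index`.
// Returns true on success, false if the index is out of range.
bool store_checked(std::size_t n, std::size_t index, int value) {
    std::vector<int> b(n);       // a run-time size is fine for std::vector
    try {
        b.at(index) = value;     // at() throws std::out_of_range instead of UB
    } catch (const std::out_of_range&) {
        return false;
    }
    return true;
}
```

With n = 10 and index = 5 the store succeeds; with n = 4 and index = 5 the function reports failure instead of crashing only some of the time.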

Related

Same array giving garbage value at one place and an unrelated value at the other place

In the following code:
#include<iostream>
using namespace std;
int main()
{
int A[5] = {10,20,30,40,50};
// Let us try to print A[5] which does NOT exist but still
cout <<"First A[5] = "<< A[5] << endl<<endl;
//Now let us print A[5] inside the for loop
for(int i=0; i<=5; i++)
{
cout<<"Second A["<<i<<"]"<<" = "<<A[i]<<endl;
}
}
Output:
The first A[5] gives one output (is it called a garbage value?) and the second A[5], which is inside the for loop, gives a different output (in this case, A[i] gives the output as i). Can anyone explain why?
Also, if I declare some other variable inside the for loop, like int sax = 100;, then A[5] takes the value 100, and I don't have the slightest clue why this is happening.
I am on Windows, CodeBlocks, GNUGCC Compiler
Well, you invoke Undefined Behaviour, so behaviour is err... undefined, and anything can happen, including what you show here.
In common implementations, data past the end of array could be used by a different element, and only implementation details in the compiler could tell which one.
Here your implementation has placed the next variable (i) just after the array, so A[5] is an (invalid) accessor for i.
But please do not rely on that. Different compilers or different compilation options could give a different result. And since a compiler is free to assume that your code does not invoke UB, an optimizing compiler could just optimize out all of your code, and only you would be to blame.
TL/DR: Never, ever try to experiment with UB: anything can happen, from consistent behaviour to an immediate crash, by way of various inconsistent outputs. And what you see may not be reproduced in a different context (context here can even be just a different run of the same code).
I don't think there is any syntax issue in your program: when I run the same code with my compiler, I don't see the behaviour you describe.
It gives the same garbage value both in the direct access and in the loop.
The problem is that when you wrote:
cout <<"First A[5] = "<< A[5] << endl<<endl;//this is Undefined behavior
In the above statement you're going out of bounds. This is because array indices start from 0 and not 1.
Since your array size is 5, you can safely access A[0], A[1], A[2], A[3] and A[4].
On the other hand you cannot access A[5]. If you try to do so, you will get undefined behavior.
Undefined behavior means anything [1] can happen, including but not limited to the program giving your expected output. But never rely on (or draw conclusions from) the output of a program that has undefined behavior.
So the output that you're seeing is a result of undefined behavior. And as I said, don't rely on the output of a program that has UB.
So the first step to make the program correct would be to remove UB. Then and only then you can start reasoning about the output of the program.
For the same reason, in your for loop you should replace i<=5 with i<5.
[1] For a more technically accurate definition of undefined behavior, see this, where it is mentioned that there are no restrictions on the behavior of the program.
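As a sketch of the corrected loop, the version below uses std::array (a substitution for the raw array in the question) and takes the loop bound from the container itself, so the index can never reach 5. The function name format_elements is hypothetical, and the output is collected in a string only to keep the example easy to check.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper that formats every valid element of the array.
std::string format_elements() {
    const std::array<int, 5> A = {10, 20, 30, 40, 50};
    std::ostringstream out;
    // The bound comes from the container: i < 5, never i <= 5.
    for (std::size_t i = 0; i < A.size(); ++i) {
        out << "A[" << i << "] = " << A[i] << '\n';
    }
    return out.str();
}
```

Streaming the returned string to cout prints the five valid elements, A[0] through A[4], and nothing else.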

Why is sizeof (array A[n]) fixed in C++ when n is not defined?

When I try to find sizeof(A), where A is of type int with size as 'n' and n is an int that has not been given a value, I get an output of 496. When I give n a value and check again, sizeof(A) gives me the same value of 496.
I know an array is a static data type, so it will have memory irrespective of 'n', but can anyone explain where the value 496 came from?
int main()
{
int n;
int A[n];
cout<<sizeof(A)<<"\n";
cin>>n;
cout<<sizeof(A);
return 0;
}
where A is of type int with size as 'n'
int n;
int A[n];
The type of A is not "int with size as 'n'". The type of A is int[n] which is array of n integers. However, since n is not a compile time constant, the program is ill-formed. If we were to look past the ill-formedness, the value of n is indeterminate. Reading an indeterminate value results in undefined behaviour.
can anyone explain where the value 496 came from?
It came from undefined behaviour. You can find more details by reading the assembly of the compiled program that produced that result.
The first cout statement cout<<sizeof(A)<<"\n"; in your code is giving 0 as output. Irrespective of what I take n as input, the next cout statement is also giving a 0. There are two declarations here, int n and int A[n]. As a beginner, it is fair to assume that n remains the same in both cases or has the same value, therefore the size shouldn't change. However, one is an integer(n), the other is an array of integer(A[n]). That makes all the difference!
The first time you print the size of A, you are getting a 0 most likely because n happened to hold 0 when A was declared; it has nothing to do with the array being declared but not initialized. Reading n from cin afterwards does not change the result either, because the size of A was fixed at the point of its declaration, using whatever value n held then.
Having said that, it really depends on the compiler and operating system you are using. I got 4 as an output in one of the online compilers, and when I tried it in Code::Blocks and VS Code, I got 32 and 80 respectively. Essentially, this is undefined behaviour, because n holds a garbage value when it is used!
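To make the contrast concrete, here is a small sketch under the assumption that a well-defined size is what the question is after. For a fixed-size array, sizeof is determined at compile time; for a size only known at run time, std::vector (a substitute for the array in the question) reports its element count through size(). Both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Size in bytes of a fixed-size array: determined at compile time.
std::size_t fixed_array_bytes() {
    int A[10] = {};
    return sizeof(A);            // always 10 * sizeof(int)
}

// Element count of a container whose size is chosen at run time.
std::size_t runtime_element_count(std::size_t n) {
    std::vector<int> A(n);       // n must have a real value before this line
    return A.size();             // number of elements, not bytes
}
```

Note that the vector is created only after n has a value, which is the ordering the code in the question gets wrong.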

C++/Address Space: 2 Bytes per address?

I was just trying something and I was wondering how this could be. I have the following code:
int var1 = 132;
int var2 = 200;
int *secondvariable = &var2;
cout << *(secondvariable+2) << endl << sizeof(int) << endl;
I get the Output
132
4
So how is it possible that the second int is only 2 addresses higher? I mean shouldn't it be 4 addresses? I'm currently under WIN10 x64.
Regards
With cout << *(secondvariable+2) you don't print a pointer, you print the value at secondvariable[2], which is an invalid index and leads to undefined behavior.
If you want to print a pointer then drop the dereference and print secondvariable+2.
While you already are far in the field of undefined behaviour (see Some programmer dude's answer) due to indexing an array out of bounds (a single variable is considered an array of length 1 for such matters), some technical background:
Alignment! Compilers are allowed to place variables at addresses such that they can be accessed most efficiently. As you seem to have gotten valid output by adding 2*sizeof(int) to the second variable's address, you apparently have reached the first one by accident. Apparently, the compiler decided to leave a gap in between the two variables so that both can be aligned to addresses divisible by 8.
Be aware, though, that you don't have any guarantee for such alignment, different compilers might decide differently (or same compiler on another system), and alignment even might be changed via compiler flags.
On the other hand, arrays are guaranteed to occupy contiguous memory, so you would have gotten the expected result in the following example:
int array[2];
int* a0 = &array[0];
int* a1 = &array[1];
uintptr_t diff = reinterpret_cast<uintptr_t>(a1) - reinterpret_cast<uintptr_t>(a0); // pointer-to-integer conversion needs reinterpret_cast
std::cout << diff;
The cast to uintptr_t (or alternatively to char*) assures that you get address difference in bytes, not sizes of int...
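The difference between counting in elements and counting in bytes can be sketched as follows, using the char* alternative mentioned above. The two function names are hypothetical; both stay inside a single array, so the pointer arithmetic is well defined.

```cpp
#include <cstddef>

// Distance between two adjacent elements, measured in elements.
std::ptrdiff_t element_distance() {
    int array[2] = {0, 0};
    return &array[1] - &array[0];    // int* arithmetic counts ints
}

// The same distance, measured in bytes.
std::ptrdiff_t byte_distance() {
    int array[2] = {0, 0};
    const char* a0 = reinterpret_cast<const char*>(&array[0]);
    const char* a1 = reinterpret_cast<const char*>(&array[1]);
    return a1 - a0;                  // char* arithmetic counts bytes
}
```

The first function returns 1, and the second returns sizeof(int), which is 4 on the platform described in the question.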
This is not how C++ works.
You can't "navigate" your scope like this.
Such pointer antics have completely undefined behaviour and shall not be relied upon.
You are not punching holes in tape now, you are writing a description of a program's semantics, that gets converted by your compiler into something executable by a machine.
Code to these abstractions and everything will be fine.

C++: float value reused across iteration

Let's look at the following piece of code which I unintentionally wrote:
void test (){
for (int i = 1; i <=5; ++i){
float newNum;
newNum +=i;
cout << newNum << " ";
}
}
Now, this is what happened in my head:
I have always been thinking that float newNum would create a new variable newNum with a brand-new value for each iteration since the line is put inside the loop. And since float newNum doesn't throw a compile error, C++ must be assigning some default value (huhm, must be 0). I then expected the output to be "1 2 3 4 5". What was printed was "1 3 6 10 15".
Please help me know what's wrong with my expectation that float newNum would create a new variable for each iteration?
Btw, in Java, this piece of code won't compile due to newNum not initialized and that's probably better for me since I would know I need to set it to 0 to get the expected output.
Since newNum is not initialized explicitly, it will have a random value (determined by the garbage data contained in the memory block it is allocated to), at least on the first iteration.
On subsequent iterations, it may have its earlier values reused (as the compiler may allocate it repeatedly to the same memory location - this is entirely up to the compiler's discretion). Judging from the output, this is what actually happened here: in the first iteration newNum had the value 0 (by pure chance), then 1, 3, 6 and 10, respectively.
So to get the desired output, initialize the variable explicitly inside the loop:
float newNum = 0.0;
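Put together, the fix might look like the sketch below. The function name test_fixed is hypothetical, and the output is collected in a string instead of being written to cout only so that it is easy to check.

```cpp
#include <sstream>
#include <string>

// Variant of test() with the variable initialized on every iteration.
std::string test_fixed() {
    std::ostringstream out;
    for (int i = 1; i <= 5; ++i) {
        float newNum = 0.0f;     // fresh, explicitly initialized on every pass
        newNum += i;
        out << newNum << " ";
    }
    return out.str();            // "1 2 3 4 5 "
}
```

Because newNum starts from 0 on every pass, no value can carry over from the previous iteration.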
C++ must be assigning some default value (huhm, must be 0)
This is the mistake in your assumptions. C++ doesn't attempt to assign default values, you must explicitly initialise everything.
Most likely it will assign the same location in memory each time around the loop and so (in this simple case) newNum will probably seem to persist from each iteration to the next.
In a more complicated scenario the memory assigned to newNum would be in an essentially random state and you could expect weird behaviour.
http://www.cplusplus.com/doc/tutorial/variables/
The float you are creating is not initialised at all. Looks like you got lucky and it turned out to be zero on the first pass, though it could have had any value in it.
In each iteration of the loop a new float is created, but it uses the same bit of memory as the last one, so ended up with the old value that you had.
To get the effect you wanted, you will need to initialise the float on each pass.
float newNum = 0.0;
The mistake in the thoughts you've expressed is "C++ must be assigning some default value". It will not. newNum contains garbage.
You are using an uninitialized automatic stack variable. In each loop iteration it is located at the same place on the stack, so even though it has an undefined value, in your case it will be the value of the previous iteration.
Also beware that in the first iteration it could potentially have any value, not only 0.0.
You might have got your expected answer in a debug build but not release as debug builds sometimes initialise variables to 0 for you. I think this is undefined behaviour - because C++ doesn't auto initialise variables for you, every time around the loop it is creating a new variable but it keeps using the same memory as it was just released and doesn't scrub out the previous value. As other people have said you could have ended up with complete nonsense printing out.
It is not a compile error to use an uninitialised variable, but there should usually be a warning about it. It is always good to turn warnings on and try to remove all of them, in case something nastier is hidden amongst them.
Using garbage value in your code invokes Undefined Behaviour in C++. Undefined Behavior means anything can happen i.e the behavior of the code is not defined.
float newNum; //uninitialized (may contain some garbage value)
newNum +=i; // same as newNum=newNum+i
^^^^^
Whoa!!
So better try this
float newNum=0; //initialize the variable
for (int i = 1; i <=5; ++i){
newNum +=i;
cout << newNum << " ";
}
C++ must be assigning some default value (huhm, must be 0).
C++ doesn't initialize things without default constructor (it may set it to something like 0xcccccccc on debug build, though) - because as any proper tool, compiler "thinks" that if you haven't provided initialization then it is what you wanted. float doesn't have default constructor, so it is unknown value.
I then expected the output to be "1 2 3 4 5". What was printed was "1 3 6 10 15".
Please help me know what's wrong with my expectation that float newNum would create a new variable for each iteration?
A variable is a block of memory. In this case the variable is allocated on the stack. You didn't initialize it, and on each iteration it just happens to be placed at the same memory address, which is why it holds the previous value. Of course, you shouldn't rely on such behavior. If you want the value to persist across iterations, declare it outside of the loop.
Btw, in Java, this piece of code won't compile due to newNum not initialized
BTW, in C++ normal compiler would give you a warning that variable is not initialized (Example: "warning C4700: uninitialized local variable 'f' used"). And on debug build you would get crt debug error (Example: "Run-Time check failure #3 - The variable 'f' is being used without being initialized").
and that's probably better
Negative. You DO need uninitialized variables from time to time (normally - to initialize them without "standard" assignment operator - by passing into function by pointer/reference, for example), and forcing me to initialize every one of them will be a waste of my time.
I don't do too much C++ since last month, but:
The float value is newly allocated for each iteration. I'm a bit surprised about the initial zero value, though. The thing is, after each iteration the float value goes out of scope, and the next step (re-entering the loop scope) first reallocates the float memory, which will often return the same memory block that was just freed.
(I'm waiting for any bashing :-P)

What's C++ Really Doing When I Accidentally Use a Variable to Declare Array Length?

I was helping a friend with some C++ homework. I warned said friend that the kind of programming I do (PHP, Perl, Python) is pretty different from C++, and there were no guarantees I wouldn't tell horrible lies.
I was able to answer his questions, but not without stumbling over my own dynamic background. While I was reacquainting myself with C++ array semantics, I did something stupid like this (simplified example to make my question clearer)
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
char easy_as_one_two_three[] = {'A','B','C'};
int an_int = 1;
//I want an array that has a length of the value
//that's currently in an_int (1)
//This clearly (to a c++ programmer) doesn't do that.
//but what is it doing?
char breaking_things[an_int];
cout << easy_as_one_two_three << endl;
return 1;
}
When I compile and run this program, it produces the following output
ABC????
However, if I comment out my bogus array declaration
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
char easy_as_one_two_three[] = {'A','B','C'};
int an_int = 1;
//I want an array that has a length of the value
//that's currently in an_int (1)
//This clearly (to a c programmer) doesn't do that.
//but what is it doing?
//char breaking_things[an_int];
cout << easy_as_one_two_three << endl;
return 1;
}
I get the output I expect:
ABC
So, what exactly is happening here? I understand (vaguely) that when you create an array, you're pointing to a specific memory address, and when you give an array a length, you're telling the computer "reserve the next X blocks for me".
What I don't understand is, when I use a variable in an array declaration, what am I telling the computer to do, and why does it have an effect on a completely separate array?
Compiler is g++, version string is
science:c++ alanstorm$ g++ -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5493~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5493)
Update:
Neil pointed out in his comment to the question that you will get an error if you compile this with the -Wall and -pedantic flags in g++.
error: ISO C++ forbids variable-size array
You are getting ABC???? because it prints the contents of the array (ABC) and continues to print until it encounters a \0.
Had the array been {'A','B','C', '\0'};, the output will be just ABC as expected.
Variable-length arrays were introduced in C99 - this doesn't seem to apply to C++ though.
It is undefined behavior. Even if you comment out the bogus declaration, the printed output is not always what you expect (ABC). Try giving ASCII values of some printable character (something between 32 and 126) to an_int instead of 1 and you will see the difference.
an_int output
------------------------
40 ABC(
65 ABCA
66 ABCB
67 ABCC
296 ABC(
552 ABC(
1064 ABC(
1024*1024 + 40 ABC(
See the pattern here? Apparently it interprets the last byte (LSB) of the an_int as a char, prints it, somehow finds a null char afterwards and stops printing. I think the "somehow" has to do something with the MSB portion of an_int being filled with zeros, but I'm not sure (and couldn't get any results to support this argument either).
UPDATE: It is about the MSB being filled with zeros. I got the following results.
ABC( for 40 - (3 zero bytes and a 40),
ABC(( for 10280 (which is (40 << 8) + 40) - (2 zero bytes and two 40s),
ABC((( for 2631720 (which is (10280 << 8) + 40) - (1 zero byte and three 40s),
ABC((((°¿® for 673720360 (which is (2631720 << 8) + 40) - no zero bytes and hence prints random chars until a zero byte is found.
ABCDCBA0á´¿á´¿® for (((((65 << 8) + 66) << 8) + 67) << 8) + 68;
These results were obtained on a little endian processor with 8-bit atomic element size and 1-byte address increment, where 32 bit integer 40 (0x28 in hex) is represented as 0x28-0x00-0x00-0x00 (LSB at the lowest address). Results might vary from compiler to compiler and platform to platform.
Now if you try uncommenting the bogus declaration, you will find that all the outputs are of the form ABC-randomchars-char_corresponding_to_an_int. This again is the result of undefined behavior.
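The byte layout described in this answer can be inspected without any out-of-bounds read. The sketch below copies the object representation of a 32-bit integer into a byte vector with std::memcpy; the helper name bytes_of is hypothetical, and the order of the bytes depends on the platform's endianness.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Copy the object representation of a 32-bit integer into a byte vector.
std::vector<unsigned char> bytes_of(std::uint32_t value) {
    std::vector<unsigned char> bytes(sizeof value);
    std::memcpy(bytes.data(), &value, sizeof value);  // well defined for any object
    return bytes;
}
```

On a little-endian machine such as the one described above, bytes_of(40) should hold 0x28 followed by three zero bytes, which matches the "ABC(" output: the '(' is the low byte of an_int and the zero byte after it ends the print.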
That will not "reacquaint" you "with c++ array semantics" since in C++ it is simply illegal. In C++ arrays can only be declared with sizes defined by Integral Constant Expressions (ICE). In your example the size is not an ICE. It only compiles because of GCC-specific extension.
From the C point of view, this is actually perfectly legal in C99 version of the language. And it does produce a so-called Variable Length Array of length 1. So your "clearly" comment is incorrect.
It isn't invalid syntax. It's syntactically just fine.
It's semantically invalid C++, and rejected by my compiler (VC++). g++ seems to have an extension that allow the use of C99 VLAs in C++.
The reason for the question marks is that your array of three characters is not null terminated; it's printing until it finds a null on the stack. The layout of the stack is influenced by the variables declared on the stack. With the array, the layout is such that there's garbage prior to the first null; without the array there isn't. That is all.
You get the output that you expect or don't expect by dumb luck. Because you didn't null terminate the characters in your array, when you go to print it out to cout it'll print the A, the B, and the C, and whatever else it finds until it hits a NULL character. With the array declaration, there's probably something that the compiler is pushing onto the stack to make the array sized at runtime that's leaving you with garbage characters after the A, B, and C whereas when you don't there just happens to be a 0 after the C on the stack.
Again, it's just dumb luck. To always get what you expect you should do: char easy_as_one_two_three[] = { 'A','B','C','\0'}; or, probably more usefully char easy_as_one_two_three[] = "ABC";, which will properly null terminate the string.
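A short sketch of both points, with hypothetical function names: an array without a terminator can still be handled safely if its length is passed explicitly, and a string literal adds the terminator automatically.

```cpp
#include <cstddef>
#include <string>

// Build a std::string from a char array that has NO terminating '\0'.
// The explicit length stops the read at the end of the array.
std::string from_unterminated() {
    const char letters[] = {'A', 'B', 'C'};         // 3 bytes, no '\0'
    return std::string(letters, sizeof(letters));   // copies exactly 3 chars
}

// A string literal adds the terminator automatically.
std::size_t literal_array_size() {
    const char letters[] = "ABC";                   // 4 bytes: 'A','B','C','\0'
    return sizeof(letters);
}
```

The first function yields "ABC" regardless of what happens to sit next to the array on the stack; the second shows that the literal form occupies four bytes, not three.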
char breaking_things[an_int] allocates a char array of size an_int (in your case 1). It's called a variable-length array, and it's a relatively new feature (standard in C99, and accepted by g++ in C++ as an extension).
In case like this it's more common to dynamically allocate memory using new:
char* breaking_things = new char[an_int]; // C++ way, C programmer would use malloc
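As an alternative to a raw new[], which needs a matching delete[], a run-time sized buffer could also be sketched with std::vector<char>. This is a substitution, not what the answer above proposes, and the function name make_buffer is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Create a char buffer whose length is only known at run time and
// report how many elements it holds.
std::size_t make_buffer(std::size_t length) {
    std::vector<char> breaking_things(length);  // zero-initialized, freed automatically
    return breaking_things.size();
}
```

The vector releases its memory when it goes out of scope, so there is no delete[] to forget.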
It's probably not breaking_things that broke things. The first array is not a NUL (\0) terminated string, which explains the output - cout will print whatever comes after ABC up until the first NUL it encounters.
As for the size of breaking_things, I would suspect it differs between compilers. I believe at least earlier versions of gcc used whatever value the variable happened to have at compile time, which can be tricky to determine.
The output looks like this because cout prints the contents of the char array until it finds a null character.
Make sure that the char array is a null-terminated string, and size the array as total chars + 1 (for the null char).