Is it legal to initialize a possibly invalid reference without using it? - c++

I would like to save typing in some loop, creating reference to an array element, which might not exist. Is it legal to do so? A short example:
#include<vector>
#include<iostream>
#include<initializer_list>
using namespace std;
int main(void){
vector<int> nn={0,1,2,3,4};
for(size_t i=0; i<10; i++){
int& n(nn[i]); // this is just to save typing, and is not used if invalid
if(i<nn.size()) cout<<n<<endl;
}
};
https://ideone.com/nJGKdW compiles and runs the code just fine (I tried locally with both g++ and clang++), but I am not sure if I can count on that.
PS: Neither gcc not clang complain, even when compiled+run with -Wall and -g.
EDIT 2: The discussion focuses on array indexing. The real code actually uses std::list and a fragment would look like this:
std::list<int> l;
// the list contains something or not, don't know yet
const int& i(*l.begin());
if(!l.empty()) /* use i here */ ;
EDIT 3: Legal solution to what I was doing is to use iterator:
std::list<int> l;
const std::list<int>::iterator I(l.begin()); // if empty, I==l.end()
if(!l.empty()) /* use (*I) here */ ;

No it's not legal. You are reading data out of bounds from the vector in the declaration of n and therefore your program have undefined behavior.

No, for two reasons:
The standard states (8.3.2):
A reference shall be initialized to refer to a valid object or function
std::vector::operator[] guarantees that even if N exceeds the container size, the function never throws exceptions (no-throw guarantee, no bounds checking other than at()). However, in that case, the behavior is undefined.
Therefore, your program is not well-formed (bullet point 1) and invoke undefined behaviour (bullet point 2).

I'd be surprised if this is "allowed" by the specification. However, what it does is store the address of an element that is outside the range of its allocation, which shouldn't in itself cause a problem in most cases - in extreme cases, it may overflow the pointer type, which could cause problems, I suppose.
In other words, if i is WAY outside the size of nn, it could be a problem, not necessarily saying i has to be enormous - if each element in the vector is several megabytes (or gigabytes in a 64-bit machine), you can quite quickly run into problems with address range.
But don't ask me to quote the specification - someone else will probably do that.
Edit: As per comment, since you are requesting the address of a value outside of the valid size, at least in debug builds, this may well cause the vector implementation to assert or otherwise "warn you that this is wrong".

Related

Same array giving garbage value at one place and an unrelated value at the other place

In the following code:
#include<iostream>
using namespace std;
int main()
{
int A[5] = {10,20,30,40,50};
// Let us try to print A[5] which does NOT exist but still
cout <<"First A[5] = "<< A[5] << endl<<endl;
//Now let us print A[5] inside the for loop
for(int i=0; i<=5; i++)
{
cout<<"Second A["<<i<<"]"<<" = "<<A[i]<<endl;
}
}
Output:
The first A[5] is giving different output (is it called garbage value?) and the second A[5] which is inside the for loop is giving different output (in this case, A[i] is giving the output as i). Can anyone explain me why?
Also inside the for loop, if I declare a random variable like int sax = 100; then A[5] will take the value 100 and I don't have the slightest of clue why is this happening.
I am on Windows, CodeBlocks, GNUGCC Compiler
Well you invoke Undefined Behaviour, so behaviour is err... undefined and anything can happen including what your show here.
In common implementations, data past the end of array could be used by a different element, and only implementation details in the compiler could tell which one.
Here your implementation has placed the next variable (i) just after the array, so A[5] is an (invalid) accessor for i.
But please do not rely on that. Different compilers or different compilation options could give a different result. And as a compiler is free to assume that you code shall not invoke UB an optimizing compiler could just optimize out all of your code and only you would be to blame.
TL/DR: Never, ever try to experiment UB: anything can happen from a consistent behaviour to an immediate crash passing by various inconsistent outputs. And what you see will not be reproduced in a different context (context here can even be just a different run of same code)
In your Program, I think "there is no any syntax issue" because when I execute this same code in my compiler. Then there is no any issue likes you.
It gives same garbage value at direct as well as in loop.
enter image description here
The problem is that when you wrote:
cout <<"First A[5] = "<< A[5] << endl<<endl;//this is Undefined behavior
In the above statement you're going out of bounds. This is because array index starts from 0 and not 1.
Since your array size is 5. This means you can safely access A[0],A[1],A[2],A[3] and A[4].
On the other hand you cannot access A[5]. If you try to do so, you will get undefined behavior.
Undefined behavior means anything1 can happen including but not limited to the program giving your expected output. But never rely(or make conclusions based) on the output of a program that has undefined behavior.
So the output that you're seeing is a result of undefined behavior. And as i said don't rely on the output of a program that has UB.
So the first step to make the program correct would be to remove UB. Then and only then you can start reasoning about the output of the program.
For the same reason, in your for loop you should replace i<=5 with i<5.
1For a more technically accurate definition of undefined behavior see this where it is mentioned that: there are no restrictions on the behavior of the program.

Code running successfully using vector but showing error using array

I was practicing an array manipulation question. While solving I declared an array (array A in code).
For some test cases, I got a segmentation fault. I replaced the array with vector and got AC. I don't know the reason for this. Plz, explain.
#include <bits/stdc++.h>
using namespace std;
int main()
{
int n,m,a,b,k;
cin>>n>>m;
vector<long int> A(n+2);
//long int A[n+2]={0};
for(int i=0;i<m;i++)
{
cin>>a>>b>>k;
A[a]+=k;
A[b+1]-=k;
}
long res=0;
for(int i=1;i<n+2;i++)
{
A[i]+=A[i-1];
if(res<A[i])
res=A[i];
}
cout<<res;
return 0;
}
Since it looks like you haven't being programming in C++ for very long I will try to break it down for you to make it simpler to understand:
First of all c++ does not intialize any values for you this is not Java, so please do not do:
int n,m,a,b,k;
And then use:
A[a]+=k;
A[b+1]-=k;
At this point we have no idea what a and b are it might be -300 for all we know, you never intialized it. Hence, occasically you get lucky and the number that is initalized by the compiler does not cause a segmentation fault, and other times you are not so lucky and the value intialized by the compiler does cause a segmentation fault.
long int A[n+2]={0}; is not legal in Standard C++. There are a bunch of reasons for this and I think you stumbled over one of them.
Compilers that allow Variable Length Arrays follow the example of C99 and the array is allocated on the stack. Stack is a limited resource, usually between 1 and 10 MB for a desktop computer. If the user inputs an n of sufficient size, the array will take up too much of the stack or breach the bounds of the stack resulting in Undefined Behaviour. of then this behaviour manifests in a segmentation fault from accessing memory that is so far off the end of the stack that it's not controlled by the program. There are typically no warnings when you overflow the stack. Often a program crash or corrupted data is the the way you find out, and it's too late to salvage the program by then.
On the other hand, a vector allocates it's internal buffer from the freestore, and on a modern PC with virtual memory and 64 bit addressing the freestore is fantastically huge and throws an exception if you attempt to exceed what it can allocate.
Another important difference is
long int A[n+2]={0};
likely did not zero initialize the array. This is the case with g++. The first byte will be set to zero and the remainder are uninitialized. Such is the curse of using non-Standard extensions. You cannot count on the behaviour guaranteed by the Standard.
std::vector will zero initialize the whole array or set the array to whatever value you tell it to use.

Why declaring the size of an array as n, before inputting n, works for the first time but not the second time?

I was solving a problem, and I declared the size of an array as n, before inputting n's value and it worked for the first test case but not for the second test case. Why?
P.S: I couldn't find any relevant information online.
Here is the code snippet
int n,arr[n];
cin>>n;
int n,arr[n];
cin>>n;
This attempts to define a VLA (variable length array). However, VLAs are not part of C++.
This might probably supported as an extension of your compiler (e.g. g++ supports as an extension). In that case, you still have a problem. When you define the array, n is uninitialized. So it triggers undefined behaviour.
You'd want to read n before defining the VLA:
int n;
std::cin >> n;
int arr[n];
Beware that VLAs are allocated on stack. So if n value is sufficiently large, you will have undefined behaviour due to overflow (= undefined behaviour). For that reason, VLAs are best avoided. You could use std::vector<int> instead.
In your example, you are using the value of n before initializing it. This is UB.
Additionaly, variable length arrays are not allowed in c++. e.g.
int n;
// compute n somehow
int arr[n];
is also not allowed.
If your program doesn't follow the rules of the language, then anything can happen, e.g. working sometimes, working on some inputs, but not others, working on some compilers, but not others, etc. Basically, you can't have any expectations of a program that has undefined behaviour.

How are these two pieces of code different?

I tried these lines of code and found out shocking output. I am expecting some reason related to initialisation either in general or in for loop.
1.)
int i = 0;
for(i++; i++; i++){
if(i>10) break;
}
printf("%d",i);
Output - 12
2.)
int i;
for(i++; i++; i++){
if(i>10) break;
}
printf("%d",i);
Output - 1
I expected the statements "int i = 0" and "int i" to be the same.What is the difference between them?
I expected the statements "int i = 0" and "int i" to be the same.
No, that was a wrong expectation on your part. If a variable is declared outside of a function (as a "global" variable), or if it is declared with the static keyword, it's guaranteed to be initialized to 0 even if you don't write = 0. But variables defined inside functions (ordinary "local" variables without static) do not have this guaranteed initialization. If you don't explicitly initialize them, they start out containing indeterminate values.
(Note, though, that in this context "indeterminate" does not mean "random". If you write a program that uses or prints an uninitialized variable, often you'll find that it starts out containing the same value every time you run your program. By chance, it might even be 0. On most machines, what happens is that the variable takes on whatever value was left "on the stack" by the previous function that was called.)
See also these related questions:
Non-static variable initialization
Static variable initialization?
See also section 4.2 and section 4.3 in these class notes.
See also question 1.30 in the C FAQ list.
Addendum: Based on your comments, it sounds like when you fail to initialize i, the indeterminate value it happens to start out with is 0, so your question is now:
"Given the program
#include <stdio.h>
int main()
{
int i; // note uninitialized
printf("%d\n", i); // prints 0
for(i++; i++; i++){
if(i>10) break;
}
printf("%d\n", i); // prints 1
}
what possible sequence of operations could the compiler be emitting that would cause it to compute a final value of 1?"
This can be a difficult question to answer. Several people have tried to answer it, in this question's other answer and in the comments, but for some reason you haven't accepted that answer.
That answer again is, "An uninitialized local variable leads to undefined behavior. Undefined behavior means anything can happen."
The important thing about this answer is that it says that "anything can happen", and "anything" means absolutely anything. It absolutely does not have to make sense.
The second question, as I have phrased it, does not really even make sense, because it contains an inherent contradiction, because it asks, "what possible sequence of operations could the compiler be emitting", but since the program contains Undefined behavior, the compiler isn't even obliged to emit a sensible sequence of operations at all.
If you really want to know what sequence of operations your compiler is emitting, you'll have to ask it. Under Unix/Linux, compile with the -S flag. Under other compilers, I don't know how to view the assembly-language output. But please don't expect the output to make any sense, and please don't ask me to explain it to you (because I already know it won't make any sense).
Because the compiler is allowed to do anything, it might be emitting code as if your program had been written, for example, as
#include <stdio.h>
int main()
{
int i; // note uninitialized
printf("%d\n", i); // prints 0
i++;
printf("%d\n", i); // prints 1
}
"But that doesn't make any sense!", you say. "How could the compiler turn "for(i++; i++; i++) ..." into just "i++"? And the answer -- you've heard it, but maybe you still didn't quite believe it -- is that when a program contains undefined behavior, the compiler is allowed to do anything.
The difference is what you already observed. The first code initializes i the other does not. Using an unitialized value is undefined behaviour (UB) in c++. The compiler assumes UB does not happen in a correct program, and hence is allowed to emit code that does whatever.
Simpler example is:
int i;
i++;
Compiler knows that i++ cannot happen in a correct program, and the compiler does not bother to emit correct output for wrong input, hece when you run this code anything could happen.
For further reading see here: https://en.cppreference.com/w/cpp/language/ub
The is a rule of thumb that (among other things) helps to avoid uninitialized variables. It is called Almost-Always-Auto, and it suggests to use auto almost always. If you write
auto i = 0;
You cannot forget to initialize i, because auto requires an initialzer to be able to deduce the type.
PS: C and C++ are two different languages with different rules. Your second code is UB in C++, but I cannot answer your question for C.

Accessing unavailable memory location in character array (Out of range value)

I have a piece of code:
#include <iostream>
using namespace std;
int main(){
char a[5][5] = {{'x','y','z','a','v'},{'d','g','h','v','x'}};
for(int i=0; i<2; i++){
for(int j = 0; j<6; j++)
{
cout << a[i][j];
}
}
return 0;
}
As you can see the first and second dimensions are or size 5 elements each. With the double for loop's I am just printing what is initialized for variable a.
The size of int "j", as it increases the output changes dramatically.
Why is this happen?
Does pointer is the solution to this? If yes how? If no what can we do to avoid run time errors caused by this incorrect access?
You might be treating this issue like an out-of-bounds error in Java, where the behavior is strictly defined: you'll get an ArrayIndexOutOfBoundsException, and the program will immediately terminate, unless the exception is caught and handled.
In C++, this kind of out-of-bounds error is undefined behavior, which means the compiler is allowed to do whatever silly thing it thinks will achieve the best performance. Generally speaking, this results in the compiler just blindly performing the same pointer arithmetic it would perform on array accesses that are in-bounds, regardless of whether the memory is valid or not.
In your case, because you've allocated 25 chars worth of memory, you'll access valid memory (in most environments, UB withstanding) at least until i * 5 + j >= 25, at which point any number of things could happen:
You could get garbage data off the stack
You could crash the program with a Segmentation Fault (Access Violation in Windows/Visual Studio)
The loop could refuse to terminate at the index you expect it to terminate.
That last one is an incredible bug: If aggressive loop optimization is occurring, you could get some very odd behavior when you make mistakes like this in your code.
What's almost certainly happening in the code you wrote is that that first point: though you allocated space for 25 chars, you only defined the contents of 10 of them, meaning any accesses beyond those first 10 will invoke a different kind of undefined behavior (access of an uninitialized variable), which the vast majority of the time, results in their values being filled in with whatever coincidentally was in that memory space before the variable was used.