Why can we use uninitialized variables in C++?

In programming languages like Java, C# or PHP we can't use uninitialized variables. This makes sense to me.
C++ dot com states that uninitialized variables have an undetermined value until they are assigned a value for the first time. But for integer case it's 0?
I've noticed we can use it without initializing and the compiler shows no error and the code is executed.
Example:
#include <iostream>
using namespace std;
int main()
{
    int a;
    char b;
    a++; // This works... No error
    cout<< a << endl; // Outputs 1
    // This is false but also no error...
    if(b == '0'){
        cout << "equals" << endl;
    }
    return 0;
}
If I tried to replicate above code in other languages like C#, it gives me compilation error. I can't find anything in the official documentation.
I highly value your help.

C++ gives you the ability to shoot yourself in the foot.
Initialising an integral type variable to 0 is a machine instruction typically of the form
REG XOR REG
That instruction is wasted if you want to initialise the variable to something else, which is abhorrent to a language that prides itself on being the fastest. Your assertion that integers are initialised to zero is not correct.
The behaviour of using an uninitialised variable in C++ is undefined.
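To make the zero question concrete, here is a minimal sketch of the cases where C++ does guarantee a zero: objects with static storage duration, and objects that are explicitly value-initialised. The names global_counter, next_call_number and value_initialised are made up for illustration.

```cpp
// Static storage duration: zero-initialised before any other initialisation.
int global_counter;

// A function-local static is also zero-initialised, so this counts 1, 2, 3, ...
int next_call_number() {
    static int calls;
    return ++calls;
}

// Automatic storage: only an explicit initialiser gives a known value.
// `int a{};` value-initialises a to 0; a plain `int a;` would leave it
// indeterminate, and reading it would be undefined behaviour.
int value_initialised() {
    int a{};
    ++a;
    return a;
}
```

A plain local `int a;` falls into neither category, which is why any 0 or 1 it appears to produce is an accident of that particular build and run.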

It isn't feasible, or even possible, to detect or prove that a variable is used uninitialized in all cases. For example:
int a;
if (<complex condition>)
    a = 0;
if (<another complex condition>)
    a = 1;
++a;
Can there be a case when both conditions are false? You wouldn't know unless you do an extensive analysis of your program. Pointers to variables can be passed around and multithreading might be involved, making the analysis even harder.
So the decision was made to trust the programmer and simply declare such uses undefined behavior (UB).
Modern compilers can issue warnings in many cases of uninitialized variable usage, and you should always use maximum warning level.
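One way to sidestep the analysis problem is to give the variable a value on every path yourself. A minimal sketch, assuming the two complex conditions are reduced to plain bool parameters (first and second are placeholders, as is the fallback value -1):

```cpp
// Every path through the function assigns `a` before it is read, so neither
// the compiler nor the reader has to prove anything about the conditions.
int pick_and_increment(bool first, bool second) {
    int a = -1;      // explicit fallback for the "neither condition" case
    if (first)
        a = 0;
    if (second)
        a = 1;
    ++a;             // always operates on a definite value
    return a;
}
```

The fallback value is a design choice: pick one that makes the "neither condition held" case easy to recognise in the caller.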

Anything is possible when your code has undefined behavior.
Correct code does not contain undefined behavior. Using the value of an uninitialized variable is undefined behavior.
The concept of undefined behavior is not unique to C++, but in C++ it is more important than elsewhere because there are so many chances to write wrong code without getting a compiler error.
However, the compiler is your friend. Use it! For example, with gcc, -Wall -Werror should be your default; with it you get these error messages:
<source>: In function 'int main()':
<source>:9:6: error: 'a' is used uninitialized [-Werror=uninitialized]
9 | a++; // This works... No error
| ~^~
<source>:13:5: error: 'b' is used uninitialized [-Werror=uninitialized]
13 | if(b == '0'){
| ^~
cc1plus: all warnings being treated as errors
Though, not all cases of undefined behavior can be caught by warnings (that can be treated as errors).
C++ dot com states that uninitialized variables have an undetermined value until they are assigned a value for the first time. But for integer case it's 0?
The correct term is indeterminate. As you can see in the above compiler output, there is no difference for your int a;. When anything can happen, undefined behavior can look like correct behavior; nevertheless, it must be fixed.
TL;DR: You cannot use the value of an uninitialized variable. Code that compiles without errors is not necessarily correct.

There is no way to "mark" a variable as being uninitialized unless you store an extra bit of information somewhere, or reserve a value in the range of values that the data type covers. Plus every reference to the variable would have to test for uninitializedness.
All of this is completely unacceptable.
Also note that automatic variables are not implicitly initialized to some value (say 0) because this has a cost at run-time, even if the variable is not used.
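To illustrate the cost described above, here is a hypothetical wrapper (TrackedInt is not a standard type) that stores the extra bit and tests it on every read. It is only a sketch of the idea, not something a compiler does for you.

```cpp
#include <stdexcept>

// Hypothetical illustration: an int that remembers whether it was ever set.
class TrackedInt {
public:
    void set(int v) {
        value_ = v;
        initialized_ = true;
    }

    // Every read pays for a branch on the extra flag.
    int get() const {
        if (!initialized_)
            throw std::logic_error("read of uninitialized TrackedInt");
        return value_;
    }

    bool is_set() const { return initialized_; }

private:
    int value_ = 0;            // placeholder, never observable before set()
    bool initialized_ = false; // the "extra bit of information"
};
```

The object is necessarily larger than a bare int, and every call to get() includes a test, which is exactly the overhead the language declines to impose on every variable.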

As others have stated it's not always feasible for the compiler to detect if the variable is uninitialized and C and C++ prefer performance in those cases.
However, there are some additional points:
There are dynamic checkers that will detect if any of your test-cases uses an uninitialized variable. That only works if you don't zero-initialize them "just in case".
In C++ you can mix statements and declarations, so instead of
int a,b,c;
...
c=2;
a=12*c;
b=...;
you can write:
...
int c=2;
int a=12*c;
int b=...;
and if you don't modify them further you can add const as well, and lambdas are also useful for this.
If you really need to represent a possibly uninitialized variable use std::optional<...>. It can avoid some of those 'possibly uninitialized' cases and can detect if you try to access it when uninitialized. But it has a cost.
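The last two points can be sketched as follows. The function names and the rules inside them are invented for the example; the only library facilities used are std::optional and std::string (C++17).

```cpp
#include <optional>
#include <string>

// An immediately invoked lambda lets a const variable receive a value that
// needs branching to compute, so it is never observable uninitialized.
int shipping_cost(int weight_kg) {
    const int cost = [&] {
        if (weight_kg <= 1) return 5;
        if (weight_kg <= 10) return 12;
        return 30;
    }();
    return cost;
}

// std::optional makes "no value yet" an explicit, checkable state.
std::optional<int> parse_digit(const std::string& text) {
    if (text.size() == 1 && text[0] >= '0' && text[0] <= '9')
        return text[0] - '0';
    return std::nullopt;
}
```

Calling value() on an empty optional throws std::bad_optional_access, whereas operator* on an empty optional is undefined behavior, so the check is only as good as the accessor you pick.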

Related

Why is it allowed to use a variable in its own declaration? [duplicate]

I noticed just now that the following code can be compiled with clang/gcc/clang++/g++, using c99, c11, c++11 standards.
int main(void) {
int i = i;
}
and even with -Wall -Wextra, none of the compilers even reports warnings.
By modifying the code to int i = i + 1; and with -Wall, they may report:
why.c:2:13: warning: variable 'i' is uninitialized when used within its own initialization [-Wuninitialized]
int i = i + 1;
~ ^
1 warning generated.
My questions:
Why is this even allowed by compilers?
What does the C/C++ standards say about this? Specifically, what's the behavior of this? UB or implementation dependent?

Because i is uninitialized when used to initialize itself, it has an indeterminate value at that time. An indeterminate value can be either an unspecified value or a trap representation.
If your implementation supports padding bits in integer types and if the indeterminate value in question happens to be a trap representation, then using it results in undefined behavior.
If your implementation does not have padding in integers, then the value is simply unspecified and there is no undefined behavior.
EDIT:
To elaborate further, the behavior can still be undefined if i never has its address taken at some point. This is detailed in section 6.3.2.1p2 of the C11 standard:
If the lvalue designates an object of automatic storage
duration that could have been declared with the register storage
class (never had its address taken), and that object is uninitialized
(not declared with an initializer and no assignment to it
has been performed prior to use), the behavior is undefined.
So if you never take the address of i, then you have undefined behavior. Otherwise, the statements above apply.

This is a warning, it's not related to the standard.
Warnings are heuristic, with an "optimistic" approach. The warning is issued only when the compiler is sure that it's going to be a problem. In cases like this you have better luck with clang or the newest versions of gcc, as stated in the comments (see another related question of mine: why am I not getting a "used uninitialized" warning from gcc in this trivial example?).
Anyway, in the first case:
int i = i;
does nothing, since i==i already. It is possible that the assignment is completely optimized out as it's useless. With compilers which don't "see" self-initialization as a problem you can do this without a warning:
int i = i;
printf("%d\n",i);
Whereas this triggers a warning all right:
int i;
printf("%d\n",i);
Still, it's bad enough not to be warned about this, since from now on i is seen as initialized.
In the second case:
int i = i + 1;
A computation between an uninitialized value and 1 must be performed. Undefined behaviour happens there.

I believe you are okay with getting the warning in case of
int i = i + 1;
as expected, however, you expect the warning to be displayed even in case of
int i = i;
also.
Why is this even allowed by compilers?
There is nothing inherently wrong with the statement. See the related discussions:
Why does the compiler allow initializing a variable with itself?
Why is initialization of a new variable by itself valid?
for more insight.
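Part of the reason the syntax is legal is that a variable's name is already in scope inside its own initializer, and some uses of it there are perfectly well defined because they never read the value. A small C++ sketch (the function names are illustrative):

```cpp
#include <cstddef>

// sizeof does not evaluate its operand, so no indeterminate value is read.
std::size_t size_of_self() {
    std::size_t n = sizeof n;
    return n;
}

// Taking the address of an object does not read its value either.
bool pointer_holds_own_address() {
    void* p = &p;
    return p == static_cast<void*>(&p);
}
```

int i = i; uses the same scoping rule, but unlike these two cases it does read the not-yet-initialized value.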
What does the C/C++ standards say about this? Specifically, what's the behavior of this? UB or implementation dependent?
This is undefined behavior, as the type int can have a trap representation and you have never taken the address of the variable in question. So, technically, you'll face UB as soon as you try to use the (indeterminate) value stored in variable i.
You should turn on your compiler warnings. In gcc, compile with -Winit-self to get a warning in C.
For C++, -Winit-self is already enabled by -Wall.

Why gcc and clang both don't emit any warning?

Suppose we have code like this:
int check(){
    int x = 5;
    ++x; /* line 1.*/
    return 0;
}
int main(){
    return check();
}
If line 1 is commented out and the compiler is started with all warnings enabled, it emits:
warning: unused variable ‘x’ [-Wunused-variable]
However if we un-comment line 1, i.e. increase x, then no warning is emitted.
Why is that? Increasing the variable is not really using it.
This happens in both GCC and Clang, for both C and C++.

Yes.
++x is the same as x = x + 1, an assignment. When you assign to something, you cannot avoid using it. The result is not discarded.
Also, from the online gcc manual, regarding -Wunused-variable option
Warn whenever a local or static variable is unused aside from its declaration.
So, when you comment out the ++x;, it satisfies the condition to generate and emit the warning message. When you uncomment it, the usage is visible to the compiler (the "usefulness" of this particular "usage" is questionable, but it's a usage nonetheless) and there is no warning.

With the preincrement you are incrementing and assigning the value to the variable again. It is like:
x=x+1
As the gcc documentation says:
-Wunused-variable:
Warn whenever a local or static variable is unused aside from its declaration.
If you comment out that line, you are not using the variable aside from the line in which you declare it.

increasing variable not really using it.
Sure this is using it. It's doing a read and a write access on the stored object. This operation doesn't have any effect in your simple toy code, and the optimizer might notice that and remove the variable altogether. But the logic behind the warning is much simpler: warn iff the variable is never used.
This has actually the benefit that you can silence that warning in cases where it makes sense:
void someCallback(void *data)
{
(void)data; // <- this "uses" data
// [...] handler code that doesn't need data
}

Why is that? increasing variable not really using it.
Yes, it is really using it. At least from the language point of view. I would hope that an optimizer removes all trace of the variable.
Sure, that particular use has no effect on the rest of the program, so the variable is indeed redundant. I would agree that warning in this case would be helpful. But that is not the purpose of the warning about being unused, that you mention.
However, consider that analyzing whether a particular variable has any effect on the execution of the program in general is quite difficult. There has to be a point where the compiler stops checking whether a variable is actually useful. It appears that the stages that generate warnings of the compilers that you tested only check whether the variable is used at least once. That once was the increment operation.

I think there is a misconception about the word 'using' and what the compiler means with that. When you have a ++i you are not only accessing the variable, you are even modifying it, and AFAIK this counts as 'use'.
There are limitations to what the compiler can identify as 'how' variables are being used, and if the statements make any sense. In fact both clang and gcc will try to remove unnecessary statements, depending on the -O-flag (sometimes too aggressively). But these optimizations happen without warnings.
Detecting a variable that is never ever accessed or used though (there is no further statement mentioning that variable) is rather easy.

I agree with you, it could generate a warning about this. I think it doesn't generate a warning because the compiler developers just didn't bother handling this case (yet). Maybe it is because it is too complicated to do. But maybe they will do this in the future (hint: you can suggest this warning to them).
Compilers are getting more and more warnings. For example, there is -Wunused-but-set-variable in GCC (which is a "new" warning, introduced in GCC 4.6 in 2011), which warns about this:
void fn() {
int a;
a = 2;
}
So it is completely fine to expect that this emits a warning too (there is nothing different here; neither snippet does anything useful):
void fn() {
int a = 1;
a++;
}
Maybe they could add a new warning, like -Wmeaningless-variable

As per the C standard (ISO/IEC 9899:201x), expressions are always evaluated so that their side effects are produced, unless the compiler can be sufficiently sure that removing the evaluation does not alter program execution.
5.1.2.3 Program execution
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
When removing the line
++x;
The compiler can deduce that the local variable x is defined and initialized, but not used.
When you add it, the expression itself can be considered a void expression, that must be evaluated for side effects, as stated in:
6.8.3 Expression and null statements
The expression in an expression statement is evaluated as a void expression for its side effects.
On the other hand, to remove compiler warnings about an unused variable it is very common to cast the expression to void. For example, for an unused parameter in a function you can write:
int MyFunc(int unused)
{
(void)unused;
...
return a;
}
In this case we have a void expression that references the symbol unused.

Need help regarding macro definition

I'm reading C++ code and I have found this definition:
#define USE_VAL(X) if (&X-1) {}
Does anybody have an idea what it means?

Based on the name, it looks like a way of getting rid of an "unused variable" warning. The intended use is probably something like this:
int function(int i)
{
USE_VAL(i)
return 42;
}
Without this, you could get a compiler warning that the parameter i is unused inside the function.
However, it's a rather dangerous way of going about this, because it introduces Undefined Behaviour into the code (pointer arithmetic beyond bounds of an actual array is Undefined by the standard). It is possible to add 1 to an address of an object, but not subtract 1. Of course, with + 1 instead of - 1, the compiler could then warn about "condition always true." It's possible that the optimiser will remove the entire if and the code will remain valid, but optimisers are getting better at exploiting "undefined behaviour cannot happen," which could actually mess up the code quite unexpectedly.
Not to mention the fact that operator& could be overloaded for the type involved, potentially leading to undesired side effects.
There are better ways of implementing such functionality, such as casting to void:
#define USE_VAL(X) static_cast<void>(X)
However, my personal preference is to comment out the name of the parameter in the function definition, like this:
int function(int /*i*/)
{
return 42;
}
The advantage of this is that, unlike the macro, it actually prevents you from accidentally using the parameter.
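Since C++17 there is also a standard attribute for this purpose, [[maybe_unused]], which keeps the parameter name for documentation while suppressing the warning. A brief sketch next to the cast-to-void form (the function names are made up):

```cpp
// C++17: the attribute tells the compiler the parameter may go unused.
int function_with_attribute([[maybe_unused]] int i) {
    return 42;
}

// Pre-C++17 alternative: evaluate the name and discard the result.
int function_with_cast(int i) {
    static_cast<void>(i);
    return 42;
}
```

Neither form does pointer arithmetic, so neither introduces the undefined behaviour that the original macro does.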

Typically it's to avoid an "unused return value" warning. Even if the usual "cast to void" idiom normally works for unused function parameters, gcc with -pedantic is particularly strict when ignoring the return values of functions such as fread (in general, functions marked with __attribute__((warn_unused_result))), so a "fake if" is often used to trick the compiler into thinking you are doing something with the return value.
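Standard C++17 has a similar facility in [[nodiscard]]. Rather than tricking the compiler, the usual remedy is to actually inspect the result. A minimal sketch, with a made-up function read_some standing in for a call like fread (both function names here are hypothetical):

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a read call: copies up to `wanted` bytes out of
// `src` and reports how many were actually copied.
[[nodiscard]] std::size_t read_some(char* dst, std::size_t wanted,
                                    const char* src, std::size_t available) {
    const std::size_t n = wanted < available ? wanted : available;
    std::memcpy(dst, src, n);
    return n;
}

// The caller uses the return value for real error handling instead of
// wrapping the call in an empty if.
bool read_exact(char* dst, std::size_t wanted,
                const char* src, std::size_t available) {
    return read_some(dst, wanted, src, available) == wanted;
}
```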

A macro is a preprocessor directive, meaning that wherever it's used, it will be replaced by the relevant piece of code.
Here, the text after USE_VAL(X) and the space defines what USE_VAL(X) expands to.
It first takes the address of X and then subtracts 1 from it. The result is tested by an if whose body is empty, so nothing is done either way.
Wherever USE_VAL(X) is used, it will be replaced by if (&X-1) {}.

“Uninitialized use” warning in the g++ compiler

I’m using g++ with warning level -Wall -Wextra and treating warnings as errors (-Werror).
Now I’m sometimes getting an error “variable may be used uninitialized in this function”.
By “sometimes” I mean that I have two independent compilation units that both include the same header file. One compilation unit compiles without error, the other gives the above error.
The relevant piece of code in the header files is as follows. Since the function is pretty long, I’ve only reproduced the relevant bit below.
The exact error is:
'cmpres' may be used uninitialized in this function
And I’ve marked the line with the error by * below.
for (; ;) {
    int cmpres; // *
    while (b <= c and (cmpres = cmp(b, pivot)) <= 0) {
        if (cmpres == 0)
            ::std::iter_swap(a++, b);
        ++b;
    }
    while (c >= b and (cmpres = cmp(c, pivot)) >= 0) {
        if (cmpres == 0)
            ::std::iter_swap(d--, c);
        --c;
    }
    if (b > c) break;
    ::std::iter_swap(b++, c--);
}
(cmp is a functor that takes two pointers x and y and returns –1, 0 or +1 if *x < *y, *x == *y or *x > *y respectively. The other variables are pointers into the same array.)
This piece of code is part of a larger function but the variable cmpres is used nowhere else. Hence I fail to understand why this warning is generated. Furthermore, the compiler obviously understands that cmpres will never be read uninitialized (or at least, it doesn’t always warn, see above).
Now I have two questions:
Why the inconsistent behaviour? Is this warning generated by a heuristic? (This is plausible since emitting this warning requires a control flow analysis which is NP hard in the general case and cannot always be performed.)
Why the warning? Is my code unsafe? I have come to appreciate this particular warning because it has saved me from very hard to detect bugs in other cases – so this is a valid warning, at least sometimes. Is it valid here?

An algorithm that diagnoses uninitialized variables with no false negatives or positives must (as a subroutine) include an algorithm that solves the Halting Problem. Which means there is no such algorithm. It is impossible for a computer to get this right 100% of the time.
I don't know how GCC's uninitialized variable analysis works exactly, but I do know it's very sensitive to what early optimization passes have done to the code. So I'm not at all surprised you get false positives only sometimes. It does distinguish cases where it's certain from cases where it can't be certain --
int foo() { int a; return a; }
produces "warning: ‘a’ is used uninitialized in this function" (emphasis mine).
EDIT: I found a case where recent versions of GCC (4.3 and later) fail to diagnose an uninitialized variable:
int foo(int x)
{
int a;
return x ? a : 0;
}
Early optimizations notice that if x is nonzero, the function's behavior is undefined, so they assume x must be zero and replace the entire body of the function with "return 0;" This happens well before the pass that generates the used-uninitialized warnings, so there's no diagnostic. See GCC bug 18501 for gory details.
I bring this up partially to demonstrate that production-grade compilers can get uninitialized-variable diagnostics wrong both ways, and partially because it's a nice example of the point that undefined behavior can propagate backward in execution time. There's nothing undefined about testing x, but because code control-dependent on x has undefined behavior, a compiler is allowed to assume that the control dependency is never satisfied and discard the test.

There was an interesting discussion on clang dev-mailing list related to those heuristics this week.
The bottom line is: it's actually quite difficult to diagnose uninitialized values without getting exponential behavior...
Apparently (from the discussion), gcc uses a predicate-based approach, but given your experience it seems that it is not always sufficient.
I suspect it's got something to do with the fact that the assignment is mixed within the condition (and after a short-circuiting operator at that...). Have you tried without it?
I think both the gcc and clang folks would be very interested in this example, since it's relatively common practice in C or C++ and thus could benefit from some tuning.

The code is correct, but the compiler is failing to identify that the variable is never used without initialization.

I would suggest that it's likely a heuristic error; that's what the "may" is for. I suspect that not many loop conditions look quite like that. That code is not unsafe because in all control paths, cmpres is assigned before use. However, I certainly wouldn't find it wrong to initialize it first.
You could, however, have some kind of variable shadowing going on here. That would be the only explanation I could think of for only one of the two translation units giving errors.
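Initializing the variable at its declaration, as suggested above, keeps the assignment-inside-the-condition idiom and should give the analysis nothing to complain about. A reduced sketch of the same loop shape (three_way and count_not_greater are illustrative names, not the asker's code):

```cpp
// Returns -1, 0 or +1, like the comparator described in the question.
int three_way(int x, int y) {
    return (x < y) ? -1 : (x > y) ? 1 : 0;
}

// Counts the leading elements of [b, end) that are <= pivot and reports how
// many of them equal the pivot. cmpres is assigned inside a short-circuit
// condition, as in the original loop.
int count_not_greater(const int* b, const int* end, int pivot, int& equal) {
    int cmpres = 0;  // definite value before the loop is entered
    int count = 0;
    equal = 0;
    while (b != end && (cmpres = three_way(*b, pivot)) <= 0) {
        if (cmpres == 0)
            ++equal;
        ++b;
        ++count;
    }
    return count;
}
```

The initial 0 is never actually read, because the loop body only runs after the condition has assigned cmpres; it exists purely so that no path can leave the variable indeterminate.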
