What are the dangers of uninitialised variables? - c++

In a program I am writing I currently have several uninitialised variables in my .h files, all of which are initialised at run-time. However, in Visual Studio it warns me every time I do this to "Always initialise a member variable" despite how seemingly pointless it feels to do so. I am well aware that attempting to use a variable when uninitialised will lead to undefined behaviour, but as far as I know, this can be avoided by not doing so. Am I overlooking something?
Thanks.

These variables could contain any value if you don't initialize them and reading them in an uninitialized stated is undefined behavior. (except if they are zero initalized)
And if you forgot to initialize one of them, and reading from it by accident results in the value you expect it should have on your current system configuration (due to undefined behavior), then your program might behave unpredictable/unexpected after a system update, on a different system or when you do changes in your code.
And these kinds of errors are hard to debug. So even if you set them at runtime it is suggested to initialize them to known values so that you have a controlled environment with predictable behavior.
There are a few exceptions, e.g. if you set the variable right after you declared it and you can't set it directly, like if you set its value using a streaming operator.

You have not included the source so we have to guess about why it happens, and I can see possible reasons with different solutions (except just zero-initializing everything):
You don't initialize at the start of the constructor, but you combine member initialization with some other code that calls some functions for the not fully initialized object. That's a mess - and you never know when some functions will call another function using some non-initialized member. If you really need this, don't send in the entire object - but only the parts you need (might need more refactoring).
You have the initialization in an Init-function. Just use the recent C++-feature of having one constructor call another instead.
You don't initialize some members in the constructor, but even later. If you really don't want to initialize it having a pointer (or std::unique_ptr) containing that data, and create it when needed; or don't have it in the object.

It's a safety measure to not allow uninitialized variables, witch is a good thing, but if you are sure of what you are doing and you make sure your variables are always initialzed before use, you can turn this off, right click on your project in solution explorer -> properties -> C/C++ -> SDL checks, this should be marked as NO. It comes as YES by default.
Note that these compile-time checks do more than just check for unitialized variables, so before you turn this off I advise reading https://learn.microsoft.com/en-us/cpp/build/reference/sdl-enable-additional-security-checks?view=vs-2019
You can also disable a specific warning in you code using warning pragma
Personally I keep these on because IMO in the tradeoff safety/annoyance I prefer safety, but I reckon that someone else can have a different opinion.

There are two parts to this question: first is reading uninitialized variables dangerous and second is defining variables uninitialized dangerous even if I make sure I never access uninitialized variables.
What are the dangers of accessing uninitialized variables?
With very few exceptions, accessing an uninitialized variable makes the whole program have Undefined Behavior. There is a common misconception (which unfortunately is taught) that uninitialized variables have "garbage values" and so reading an uninitialized variable will result in reading some value. This is completely false. Undefined Behavior means the program can have any behavior: it can crash, it can behave as the variable has some value, it can pretend the variable doesn't even exist or all sorts of weird behaviors.
For instance:
void foo();
void bar();
void test(bool cond)
{
int a; // uninitialized
if (cond)
{
a = 24;
}
if (a == 24)
{
foo();
}
else
{
bar();
}
}
What is the result of calling the above function with true? What about with false?
test(true) will cleary call foo().
What about test(false)? If you answer: "Well it depends on what garbage value is in variable a, if it is 24 it will call foo, else it will call bar" Then you are completely wrong.
If you call test(false) the program accesses an uninitialized variable and has Undefined Behavior, it is an illegal path and so the compilers are free to assume cond is never false (because otherwise the program would be illegal). And surprise surprise both gcc and clang with optimizations enabled actually do this and generate this assembly for the function:
test(bool):
jmp foo()
So don't do this! Never access uninitialized variable! It is undefined behavior and it's much much worse than "the variable has some garbage value". Furthermore, on your system could work as you expect, on other systems or with other compiler flags it can behave in unexpected ways.
What are the dangers of defining uninitialized variables if I make sure I always initialize them later, before accessing them?
Well, the program is correct from this respect, but the source code is prone to errors. You have to mentally burden yourself with always checking if somewhere you actually initialized the variable. And if you did forget to initialize a variable finding the bug will be difficult as you have a lot of variables in your code who are defined uninitialized.
As opposed, if you always initialize your variables you and the programmers after you have a much much easier job and ease of mind.
It's just a very very good practice.

Related

How function is able to read the value of user input variable from main function without passing the value as param? [duplicate]

From "C++ Primer" by Lippman,
When we define a variable, we should give it an initial value unless we are certain that the initial value will be overwritten before the variable is used for any other purpose. If we cannot guarantee that the variable will be reset before being read, we should initialize it.
What happens if an uninitialized variable is used in say an operation? Will it crash/ will the code fail to compile?
I searched the internet for answer to the same but there were differing 'claims'. Hence the following questions,
Will C and C++ standards differ in how they treat an uninitialized variable?
Regarding similar queries, how and where can I find an 'official' answer? Is it practical for an amateur to look up the C and C++ standards?
Q.1) What happens if an uninitialized variable is used in say an operation? Will it crash/ will the code fail to compile?
Many compilers try to warn you about code that improperly uses the value of an uninitialized variable. Many compilers have an option that says "treat warnings as errors". So depending on the compiler you're using and the option flags you invoke it with and how obvious it is that a variable is uninitialized, the code might fail to compile, although we can't say that it will fail to compile.
If the code does compile, and you try to run it, it's obviously impossible to predict what will happen. In most cases the variable will start out containing an "indeterminate" value. Whether that indeterminate value will cause your program to work correctly, or work incorrectly, or crash, is anyone's guess. If the variable is an integer and you try to do some math on it, you'll probably just get a weird answer. But if the variable is a pointer and you try to indirect on it, you're quite likely to get a crash.
It's often said that uninitialized local variables start out containing "random garbage", but that can be misleading, as evidenced by the number of people who post questions here pointing out that, in their program where they tried it, the value wasn't random, but was always 0 or was always the same. So I like to say that uninitialized local variables never start out holding what you expect. If you expected them to be random, you'll find that (at least on any given day) they're repeatable and predictable. But if you expect them to be predictable (and, god help you, if you write code that depends on it), then by jingo, you'll find that they're quite random.
Whether use of an uninitialized variable makes your program formally undefined turns out to be a complicated question. But you might as well assume that it does, because it's a case you want to avoid just as assiduously as you avoid any other dangerous, imperfectly-defined behavior.
See this old question and this other old question for more (much more!) information on the fine distinctions between undefined and indeterminate behavior in this case.
Q.2) Will C and C++ standards differ in how they treat an uninitialized variable?
They might differ. As I alluded to above, and at least in C, it turns out that not all uses of uninitialized local variables are formally undefined. (Some are merely "indeterminate".) But the passages quoted from the C++ standards by other answers here make it sound like it's undefined there all the time. Again, for practical purposes, the question probably doesn't matter, because as I said, you'll want to avoid it no matter what.
Q.3) Regarding similar queries, how and where can I find an 'official' answer? Is it practical for an amateur to look up the C and C++ standards?
It is not always easy to obtain copies of the standards (let alone official ones, which often cost money), and the standards can be difficult to read and to properly interpret, but yes, given effort, anyone can obtain, read, and attempt to answer questions using the standards. You might not always make the correct interpretation the first time (and you may therefore need to ask for help), but I wouldn't say that's a reason not to try. (For one thing, anyone can read any document and end up not making the correct interpretation the first time; this phenomenon is not limited to amateur programmers reading complex language standards documents!)
C++
The C++ Standard, [dcl.init], paragraph 12 [ISO/IEC 14882-2014], states the following:
If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced. If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:
(end quote)
So using an uninitialized variable will result in undefined behavior.
Undefined behavior means anything1 can happen including but not limited to the program giving your expected output. But never rely(or make conclusions based) on the output of a program that has undefined behavior. The program may give your expected output or it may crash.
C
The C Standard, 6.7.9, paragraph 10, specifies [ISO/IEC 9899:2011]
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
Uninitialized automatic variables or dynamically allocated memory has indeterminate values, which for objects of some types, can be a trap representation. Reading such trap representations is undefined behavior.
1For a more technically accurate definition of undefined behavior see this where it is mentioned that: there are no restrictions on the behavior of the program.
What happens if an uninitialized variable is used in say an operation?
It depends. If the operation uses the value of the variable, and the type of the variable and the expression aren't excepted from it, then the behaviour of the program is undefined.
If the value isn't used - such as in the case of sizeof operator, then nothing particular happens that wouldn't happen with an initialised variable.
Will C and C++ standards differ in how they treat an uninitialized variable?
They use different wording, but are essentially quite similar in this regard.
C defines the undefined behaviour of indeterminate values through the concept of "trap representation" and specifies types that are guaranteed to not have trap representations.
C++ doesn't define the concept of "trap representation", but rather lists exceptional cases where producing an indeterminate value doesn't result in undefined behaviour. These cases have some overlap with the exceptional types in C, but aren't exactly the same.
Regarding similar queries, how and where can I find an 'official' answer?
The official answer - if there is one - is always in the language standard document.
The standard says it is undefined.
However on a Unix or VMS based system (Gnu/Linux, UNIX, BSD, MS-Windows > XP or NT, MacOS > X) then the stack and heap are initialised to zero (this is done for security reasons. Now to make your code work.)
However if you go up and down the stack or free then malloc then the data will be random rubish. (there may be other causes of random rubish. Don't rely on undefined behaviours).
Could the program crash? (By this, you mean detect error at run-time.)
Probably not, but again this is undefined behaviour. A C interpreter may do this.
Note also, some C++ types have a constructor that does well-defined initialisation.
You have tagged both C and C++. In C, an uninitialized variable probably has junk bits. Often your compiler with put zero bits there, but you can not count on it. So if you use that variable without explicitly initializing, the result may be sensible and it may not. And strictly speaking this is undefined behavior, so anything at all may happen.
C++ has the same for simple variables, but there is an interesting exception: while a int x[3] contains junk, std::vector x(3) contains zeros.

What happens to uninitialized variables in C/C++?

From "C++ Primer" by Lippman,
When we define a variable, we should give it an initial value unless we are certain that the initial value will be overwritten before the variable is used for any other purpose. If we cannot guarantee that the variable will be reset before being read, we should initialize it.
What happens if an uninitialized variable is used in say an operation? Will it crash/ will the code fail to compile?
I searched the internet for answer to the same but there were differing 'claims'. Hence the following questions,
Will C and C++ standards differ in how they treat an uninitialized variable?
Regarding similar queries, how and where can I find an 'official' answer? Is it practical for an amateur to look up the C and C++ standards?
Q.1) What happens if an uninitialized variable is used in say an operation? Will it crash/ will the code fail to compile?
Many compilers try to warn you about code that improperly uses the value of an uninitialized variable. Many compilers have an option that says "treat warnings as errors". So depending on the compiler you're using and the option flags you invoke it with and how obvious it is that a variable is uninitialized, the code might fail to compile, although we can't say that it will fail to compile.
If the code does compile, and you try to run it, it's obviously impossible to predict what will happen. In most cases the variable will start out containing an "indeterminate" value. Whether that indeterminate value will cause your program to work correctly, or work incorrectly, or crash, is anyone's guess. If the variable is an integer and you try to do some math on it, you'll probably just get a weird answer. But if the variable is a pointer and you try to indirect on it, you're quite likely to get a crash.
It's often said that uninitialized local variables start out containing "random garbage", but that can be misleading, as evidenced by the number of people who post questions here pointing out that, in their program where they tried it, the value wasn't random, but was always 0 or was always the same. So I like to say that uninitialized local variables never start out holding what you expect. If you expected them to be random, you'll find that (at least on any given day) they're repeatable and predictable. But if you expect them to be predictable (and, god help you, if you write code that depends on it), then by jingo, you'll find that they're quite random.
Whether use of an uninitialized variable makes your program formally undefined turns out to be a complicated question. But you might as well assume that it does, because it's a case you want to avoid just as assiduously as you avoid any other dangerous, imperfectly-defined behavior.
See this old question and this other old question for more (much more!) information on the fine distinctions between undefined and indeterminate behavior in this case.
Q.2) Will C and C++ standards differ in how they treat an uninitialized variable?
They might differ. As I alluded to above, and at least in C, it turns out that not all uses of uninitialized local variables are formally undefined. (Some are merely "indeterminate".) But the passages quoted from the C++ standards by other answers here make it sound like it's undefined there all the time. Again, for practical purposes, the question probably doesn't matter, because as I said, you'll want to avoid it no matter what.
Q.3) Regarding similar queries, how and where can I find an 'official' answer? Is it practical for an amateur to look up the C and C++ standards?
It is not always easy to obtain copies of the standards (let alone official ones, which often cost money), and the standards can be difficult to read and to properly interpret, but yes, given effort, anyone can obtain, read, and attempt to answer questions using the standards. You might not always make the correct interpretation the first time (and you may therefore need to ask for help), but I wouldn't say that's a reason not to try. (For one thing, anyone can read any document and end up not making the correct interpretation the first time; this phenomenon is not limited to amateur programmers reading complex language standards documents!)
C++
The C++ Standard, [dcl.init], paragraph 12 [ISO/IEC 14882-2014], states the following:
If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced. If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:
(end quote)
So using an uninitialized variable will result in undefined behavior.
Undefined behavior means anything1 can happen including but not limited to the program giving your expected output. But never rely(or make conclusions based) on the output of a program that has undefined behavior. The program may give your expected output or it may crash.
C
The C Standard, 6.7.9, paragraph 10, specifies [ISO/IEC 9899:2011]
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
Uninitialized automatic variables or dynamically allocated memory has indeterminate values, which for objects of some types, can be a trap representation. Reading such trap representations is undefined behavior.
1For a more technically accurate definition of undefined behavior see this where it is mentioned that: there are no restrictions on the behavior of the program.
What happens if an uninitialized variable is used in say an operation?
It depends. If the operation uses the value of the variable, and the type of the variable and the expression aren't excepted from it, then the behaviour of the program is undefined.
If the value isn't used - such as in the case of sizeof operator, then nothing particular happens that wouldn't happen with an initialised variable.
Will C and C++ standards differ in how they treat an uninitialized variable?
They use different wording, but are essentially quite similar in this regard.
C defines the undefined behaviour of indeterminate values through the concept of "trap representation" and specifies types that are guaranteed to not have trap representations.
C++ doesn't define the concept of "trap representation", but rather lists exceptional cases where producing an indeterminate value doesn't result in undefined behaviour. These cases have some overlap with the exceptional types in C, but aren't exactly the same.
Regarding similar queries, how and where can I find an 'official' answer?
The official answer - if there is one - is always in the language standard document.
The standard says it is undefined.
However on a Unix or VMS based system (Gnu/Linux, UNIX, BSD, MS-Windows > XP or NT, MacOS > X) then the stack and heap are initialised to zero (this is done for security reasons. Now to make your code work.)
However if you go up and down the stack or free then malloc then the data will be random rubish. (there may be other causes of random rubish. Don't rely on undefined behaviours).
Could the program crash? (By this, you mean detect error at run-time.)
Probably not, but again this is undefined behaviour. A C interpreter may do this.
Note also, some C++ types have a constructor that does well-defined initialisation.
You have tagged both C and C++. In C, an uninitialized variable probably has junk bits. Often your compiler with put zero bits there, but you can not count on it. So if you use that variable without explicitly initializing, the result may be sensible and it may not. And strictly speaking this is undefined behavior, so anything at all may happen.
C++ has the same for simple variables, but there is an interesting exception: while a int x[3] contains junk, std::vector x(3) contains zeros.

Why doesn't Valgrind detect usage of uninitialized variable?

As I understand Valgrind should report errors when code contains usage of uninitialized variables. In this toy example below printer is uninitialized, but program "happily" prints message anyway.
#include <iostream>
class Printer {
public:
void print() {
std::cout<<"I PRINT"<<std::endl;
}
};
int main() {
Printer* printer;
printer->print();
};
When I test this program with Valgrind it doesn't report any errors.
Is it expected behavior? And if yes, why so?
The variable is actually never used.
The method call is inlined1, so the variable is not passed as an argument.
The method itself doesn't use this in any way, so the variable is not used at all.
Above is independent of turning optimizations on or off.
As a matter of fact, in optimized code the variable will never exist at all - not even as memory allocation.
Question about a similar case: Extern variable only in header unexpectedly working, why?
.
1 All methods defined in the class body are inlined by default.
Is it an Undefined Behavior?
Yes it is. Calling the method requires this to point at an actual, initialized intance of object to be well-formed. As Nir Friedman points out, compiler is free to assume that and optimize on that base (and IIRC this kind of optimizations can happen even with -O0!).
I'd personally expect the specific code in question to work in any practical conditions (as the pointer value is really irrelevant), but I would never rely on that. You should fix your code right now.
Detection
To detect usage of uninitialized variables in Clang/GCC, use option -Wuninitialized (or simply use -Wall, which includes this flag).
-Wuninitialized should mostly cover use of stack-allocated memory, though I guess some use of stack-allocated arrays may still slip. Some compilers may support including extra runtime checks for uninitialized reads with -fsanitize=... options, like -fsanitize=memory in Clang (thx, chtz). These checks should cover the edge cases as well as the use of heap-allocated memory.
The main() function has undefined behaviour, since printer is uninitialised and the statement printer->print() both accesses the value of printer and dereferences it via -> and the call of the member function.
Practically, however, a compiler is permitted to handle undefined behaviour by simply assuming it is not present. Then the compiler can, if it chooses, follow a chain of logic;
When it sees a statement like printer->print() this means it is allowed to reason that printer has a value that can be accessed and dereferenced without introducing undefined behaviour.
Based on this reasoning, it is then permitted to assume that printer must have been initialised (by some means invisible to the compiler) to point at a valid object.
Based on this assumption, it can reason that the statement printer->print() will result in a call of Printer::print().
Since the compiler can see the definition of Printer::print(), it can simply inline it, and execute the statement std::cout<<"I PRINT"<<std::endl.
Since it doesn't need to access printer at all to produce that output, it can optimise out any reference to the variable named printer in main().
If a compiler follows the above sequence of logic, the program will simply print I PRINT and exit, without accessing any memory in a way that might trigger a report from Valgrind.
If you think the above sounds far-fetched, then you are mistaken. LLVM/Clang is one compiler that notionally follows a chain of logic very similar to what I have described. For more information have a look at the the LLVM Project Blog link to first article, second article, and
third article.

What happens when I do int*p=p in c/cpp?

Below code is getting compiled in MinGw. How does it get compiled? How is it possible to assign a variable which is not yet created?
int main()
{
int*p=p;
return 0;
}
How does it get compiled?
The point of declaration of a variable starts at the end of its declarator, but before its initialiser. This allows more legitimate self-referential declarations like
void * p = &p;
as well as undefined initialisations like yours.
How is it possible to assign a variable which is not yet created?
There is no assignment here, just initialisation.
The variable has been created (in the sense of having storage allocated for it), but not initialised. You initialise it from whatever indeterminate value happened to be in that storage, with undefined behaviour.
Most compilers will give a warning or error about using uninitialised values, if you ask them to.
Let's take a look at what happens with the int*p=p; statement:
The compiler allocates space on the stack to hold the yet uninitialized value of variable p
Then the compiler initializes p with its uninitialized value
So, essentially there should be no problem with the code except that it assigns a variable an uninitialized value.
Actually there is no much difference than the following code:
int *q; // define a pointer and do not initialize it
int *p = q; // assign the value of the uninitizlized pointer to another pointer
The likely result ("what it compiles to") will be the declaration of a pointer variable that is not initialized at all (which is subsequently optimized out since it is not used, so the net result would be "empty main").
The pointer is declared and initialized. So far, this is an ordinary and legal thing. However, it is initialized to itself, and its value is only in a valid, initialized state after the end of the statement (that is, at the location of the semicolon).
This, unsurprisingly, makes the statement undefined behavior.
By definition, invoking undefined behavior could in principle cause just about everything (although often quoted dramatic effects like formatting your harddrive or setting the computer on fire are exaggerated).
The compiler might actually generate an instruction that moves a register (or memory location) to itself, which would be a no-op instruction on most architectures, but could cause a hardware exception killing your process on some exotic architectures which have special validating registers for pointers (in case the "random" value is incidentially an invalid address).
The compiler will however not insert any "format harddisk" statements.
In practice, optimizing compilers will nowadays often assume "didn't happen" when they encounter undefined behavior, so it is most likely that the compiler will simply honor the declaration, and do nothing else.
This is, in every sense, perfectly allowable in the light of undefined behavior. Further, it is the easiest and least troublesome option for the compiler.

Why don't C++ compilers zero-initialize integer, floating-point, and pointer variables by default?

Sometimes, we can use tools like valgrind to find out if we forgot to initialize a pointer variable. But why don't modern compilers relieve us from this common mistake which is hard to reproduce?
Initialization is not always desirable:
For performance reasons; initializing memory takes time which may not be necessary. There are even algorithms that depend on constant time allocation to achieve their desired performance (initialization is linear time). D recognizes this but has a different (probably better) approach; variables are initialized by default but has special syntax to prevent initialization.
Sometimes there is no correct default value. Static analysis or runtime debugging features can help detect when a variable is used without initialization. Simply assigning some (incorrect) value to them by default could hide a bug that would be detected using these.
There are several aspects to your question. First of all, the standard doesn't mandate this behavior. But where it wouldn't violate the standard it's probably possible. In fact debug builds, e.g. in MSVC, do quite the opposite and initialize not with 0 but with some magic value - another magic value is used to fill "freed" memory. Again, this is specific to debug builds and can help detect certain classes of errors while it might hide other bugs or obscure there whereabouts in a code base.
But I think the performance issue weighs more here. If you were to initialize integers and floats, why not all kinds of structs or arrays both on the heap and on the stack? The problem with this is of course that if I write my program and allocate a buffer to read data into it, why should I waste time to zero the buffer first if I know I'm going to read x bytes and the buffer is only x bytes long?
I agree, relying on this "random" behavior like in the Debian SSH disaster wasn't correct, but it shows that people rely on the existing behavior and consequences of changes to this behavior will not always be obvious.
Not exactly. Apart from the reasons presented in the other answers, I'd like to point out that:
a) Global variables, static local variables are always initialized, if not explicitly, then implicitly zero.
b) For a custom class `class Test`, creating a new instance by `Test *ptr=new Test()` or `Test test;` gives an initialized object.
To sum up, things that end up in '.data' '.bss' sections of ELF file are initialized either explicitly or implicitly ( zero ). Other variables in primitive types that doesn't lie in the area mentioned above has undefined content before explicit initialization (depends on your compiler). This is because, the ELF file itself provides a way to initialize variables in a) kind at very low cost, while initializing variables in stack, heap, etc, can be a terrible overhead.
You might argue that class Object might be more costly to initialize. But, above all, omitting initialization for primitive types is how things works in C and therefore C++ comply with it.