I’m using g++ with warning level -Wall -Wextra and treating warnings as errors (-Werror).
Now I’m sometimes getting an error “variable may be used uninitialized in this function”.
By “sometimes” I mean that I have two independent compilation units that both include the same header file. One compilation unit compiles without error, the other gives the above error.
The relevant piece of code in the header files is as follows. Since the function is pretty long, I’ve only reproduced the relevant bit below.
The exact error is:
'cmpres' may be used uninitialized in this function
And I’ve marked the line with the error by * below.
for (; ;) {
int cmpres; // *
while (b <= c and (cmpres = cmp(b, pivot)) <= 0) {
if (cmpres == 0)
::std::iter_swap(a++, b);
++b;
}
while (c >= b and (cmpres = cmp(c, pivot)) >= 0) {
if (cmpres == 0)
::std::iter_swap(d--, c);
--c;
}
if (b > c) break;
::std::iter_swap(b++, c--);
}
(cmp is a functor that takes two pointers x and y and returns –1, 0 or +1 if *x < *y, *x == *y or *x > *y respectively. The other variables are pointers into the same array.)
This piece of code is part of a larger function but the variable cmpres is used nowhere else. Hence I fail to understand why this warning is generated. Furthermore, the compiler obviously understands that cmpres will never be read uninitialized (or at least, it doesn’t always warn, see above).
Now I have two questions:
Why the inconsistent behaviour? Is this warning generated by a heuristic? (This is plausible since emitting this warning requires a control flow analysis which is NP hard in the general case and cannot always be performed.)
Why the warning? Is my code unsafe? I have come to appreciate this particular warning because it has saved me from very hard to detect bugs in other cases – so this is a valid warning, at least sometimes. Is it valid here?
An algorithm that diagnoses uninitialized variables with no false negatives or positives must (as a subroutine) include an algorithm that solves the Halting Problem. Which means there is no such algorithm. It is impossible for a computer to get this right 100% of the time.
I don't know how GCC's uninitialized variable analysis works exactly, but I do know it's very sensitive to what early optimization passes have done to the code. So I'm not at all surprised you get false positives only sometimes. It does distinguish cases where it's certain from cases where it can't be certain --
int foo() { int a; return a; }
produces "warning: ‘a’ is used uninitialized in this function" (emphasis mine).
EDIT: I found a case where recent versions of GCC (4.3 and later) fail to diagnose an uninitialized variable:
int foo(int x)
{
int a;
return x ? a : 0;
}
Early optimizations notice that if x is nonzero, the function's behavior is undefined, so they assume x must be zero and replace the entire body of the function with "return 0;" This happens well before the pass that generates the used-uninitialized warnings, so there's no diagnostic. See GCC bug 18501 for gory details.
I bring this up partially to demonstrate that production-grade compilers can get uninitialized-variable diagnostics wrong both ways, and partially because it's a nice example of the point that undefined behavior can propagate backward in execution time. There's nothing undefined about testing x, but because code control-dependent on x has undefined behavior, a compiler is allowed to assume that the control dependency is never satisfied and discard the test.
There was an interesting discussion on clang dev-mailing list related to those heuristics this week.
The bottom line is: it's actually quite difficult to diagnose unitialized values without getting exponential behavior...
Apparently (from the discussion), gcc uses a predicate base approach, but given your experience it seems that it is not always sufficient.
I suspect it's got something to do with the fact that the assignment is mixed within the condition (and after a short-circuiting operator at that...). Have you tried without ?
I think both the gcc and clang folks would be very interested by this example since it's relatively common practice in C or C++ and thus could benefit from some tuning.
The code is correct, but the compiler is failing to identify that the variable is never used without initialization.
I would suggest that it's likely a heuristical error- that's what the "may" is for. I suspect that not many loop conditions look quite like that. That code is not unsafe because in all control paths, cmpres is assigned before use. However, I certainly wouldn't find it wrong to initialize it first.
You could, however, have some kind of variable shadowing going on here. That would be the only explanation I could think of for only one of the two translation units giving errors.
Related
I'm trying to understand what's going on inside with this (weird?) g++ behavior.
#include <iostream>
using namespace std;
int& f(void) {
int a = 9;
int& b = a;
return a;
}
int main(void) {
int& l = f();
cout << ++l << '\n' << l << '\n';
}
When returning a itself and bind it to l, I get a warning (reference to local variable) and a seg.fault if I acess it from l, but when returning b itself not only do I not get a seg.fault but I can access it once from l (UB I am guessing) before the value of l randomly changes. But what exactly happens here?
Aren't the two returns identical? Does g++ automatically mark a's area as unusable after the return, hence the seg.fault while for some reason allowing b to live longer?
Your main question is why gcc doesn't issue a warning in one of the alternatives. Both alternatives are undefined behavior and the only difference is that in one case the compiler can detect it and warn you about it.
The C++ standard does not require a diagnostic for undefined behavior. Any diagnostic to that effect, from your compiler, is just an extra bonus; and although modern C++ compilers are very smart, they can't always figure out that the compiled code will result in demons flying out of your nose.
P.S. gcc 10.2 does issue a warning with the -O3 option, for the return b; alternative. With -Wall only, gcc also issues a 2nd warning for undefined behavior, you can discover what it is by yourself.
No, C++ compilers do not mark areas as unusable.
Segfaults are just one of many ways undefined behavior can exhibit itself. It is in fact one of the friendlier ways, as it makes you notice it early.
You are in charge of lifetime. If you get it wrong, the result is undefined behavior. Not an exception. Not a segfault. Literally anything.
One possibly symptom is "it appears to work". Another is segfault. Others include literal time travel (where UB later in the program makes ealier code behave differently), your computer hard drive being rendered unusable, someone getting your credit card information, your browser history being emailed to your contact list, etc.
Some compilers, in debug mode, mark deallocated memory with a bit pattern to aid debugging.
But what exactly happens here?
Undefined behaviour.
Aren't the two returns identical?
The source code is clearly syntactically different. The programs don't have same semantic meaning because neither program has any semantic meaning because the meaning of the both programs is undefined. As such, the behaviour of the programs is not guaranteed to be the same.
Does g++ automatically mark a's area as unusable after the return
Perhaps. I wouldn't assume this to be the case based on that one observation but this may be true. See the source code of GCC to confirm.
#include <vector>
std::vector<int>::iterator foo();
void bar(void*) {}
int main()
{
void* p;
while (foo() != foo() && (p = 0, true))
{
bar(p);
}
return 0;
}
Results in error:
c:\users\jessepepper\source\repos\testcode\consoleapplication1\consoleapplication1.cpp(15): error C4703: potentially uninitialized local pointer variable 'p' used
It's kind of a bug, but very typical for the kind of code you write.
First, this isn't an error, it's a warning. C4703 is a level 4 warning (meaning that it isn't even enabled by default). So in order to get it reported as an error (and thus interrupt compilation), compiler arguments or pragmas were passed to enable this warning and turn it into an error (/W4 and /Werror are the most likely I think).
Then there's a trade-off in the compiler. How complex should the data flow analysis be to determine whether a variable is actually uninitialized? Should it be interprocedural? The more complex it is, the slower the compiler gets (and because of the halting problem, the issue may be undecidable anyway). The simpler it is, the more false positives you get because the condition that guarantees initialization is too complex for the compiler to understand.
In this case, I suspect that the compiler's analysis works as follows: the assignment to p is behind a conditional (it only happens if foo() != foo()). The usage of p is also behind a conditional (it only happens if that complex and-expression is true). The compiler cannot establish a relationship between these conditions (the analysis is not complex enough to realize that foo() != foo() is a precondition to the entire while loop condition being true). Thus, the compiler errs on the side of assuming that the access could happen without prior initialization and emits the warning.
So it's an engineering trade-off. You could report the bug, but if you do, I suggest you supply a more compelling real-world example of idiomatic code to argue in favor of making the analysis more complex. Are you sure you can't restructure your original code to make it more approachable to the compiler, and more readable for humans at the same time?
I did some experimenting with VC++2017 Preview.
It's definitely a bug bug. It makes it impossible to compile and link code that might be correct, albetit smelly.
A warning would be acceptable. (See #SebastianRedl answer.) But in the latest and greatest VC++2017, it is being treated as an error, not warning, even with warnings turned off, and "Treat warnings as errors" set to No. Something odd is happening. The "error" is being thrown late - after it says, "Generating code". I would guess, and it's only a guess, that the "Generating code" pass is doing global analysis to determine if un-initialized access is possible, and it's getting it wrong. Even then, you should be able to disable the error, IMO.
I do not know if this is new behavior. Reading Sebastian's answer, I presume it is. When I get any kind of warning at any level, I always fix it in the code, so I would not know.
Jesse, click on the triangular flag near the top right of Visual Studio, and report it.
For sure it's a bug. I tried to remove it in all possible ways, including #pragma. The real thing is that this is reported as an error, not as a warning as Microsoft say. This is a big mistake from Microsoft. It's NOT a WARNING, it's an ERROR. Please, do not repeat again that it's a warning, because it's NOT.
What I'm doing is trying to compile some third party library whose sources I do not want to fix in any way, and should compile in normal cases, but it DOESN'T compile in VS2017 because the infamous "error C4703: potentially uninitialized local pointer variable *** used".
Someone found a solution for that?
This question already has answers here:
Why does flowing off the end of a non-void function without returning a value not produce a compiler error?
(11 answers)
Closed 6 years ago.
I found this in one of my libraries this morning:
static tvec4 Min(const tvec4& a, const tvec4& b, tvec4& out)
{
tvec3::Min(a,b,out);
out.w = min(a.w,b.w);
}
I'd expect a compiler error because this method doesn't return anything, and the return type is not void.
The only two things that come to mind are
In the only place where this method is called, the return value isn't being used or stored. (This method was supposed to be void - the tvec4 return type is a copy-and-paste error)
a default constructed tvec4 is being created, which seems a bit unlike, oh, everything else in C++.
I haven't found the part of the C++ spec that addresses this. References (ha) are appreciated.
Update
In some circumstances, this generates an error in VS2012. I haven't narrowed down specifics, but it's interesting, nonetheless.
This is undefined behavior from the C++11 draft standard section 6.6.3 The return statement paragraph 2 which says:
[...] Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function. [...]
This means that the compiler is not obligated provide an error nor a warning usually because it can be difficult to diagnose in all cases. We can see this from the definition of undefined behavior in the draft standard in section 1.3.24 which says:
[...]Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).[...]
Although in this case we can get both gcc and clang to generate a wanring using the -Wall flag, which gives me a warning similar to this:
warning: control reaches end of non-void function [-Wreturn-type]
We can turn this particular warning into an error using the -Werror=return-type flag. I also like to use -Wextra -Wconversion -pedantic for my own personal projects.
As ComicSansMS mentions in Visual Studio this code would generate C4716 which is an error by default, the message I see is:
error C4716: 'Min' : must return a value
and in the case where not all code paths would return a value then it would generate C4715, which is a warning.
Maybe some elaboration on the why part of the question:
As it turns out, it is actually quite hard† for a C++ compiler to determine whether a function exits without a return value. In addition to the code paths that end in explicit return statements and the ones that fall off the end of the function, you also have to consider potential exception throws or longjmps in the function itself, as well as all of its callees.
While it is quite easy for a compiler to identify a function that looks like it might be missing a return, it is considerably harder to prove that it is missing a return. In order to lift compiler vendors of this burden, the standard does not require this to generate an error.
So compiler vendors are free to generate a warning if they are quite sure that a function is missing a return and the user is then free to ignore/mask that warning in those rare cases where the compiler was actually wrong.
†: In the general case, this is equivalent to the halting problem, so it is actually impossible for a machine to decide this reliably.
Compile your code with -Wreturn-type option:
$ g++ -Wreturn-type source.cpp
This will give you warning. You can turn the warning into error if you use -Werror too:
$ g++ -Wreturn-type -Werror source.cpp
Note that this will turn all warnings into errors. So if you want error for specific warning, say -Wreturn-type, just type return-type without -W part as:
$ g++ -Werror=return-type source.cpp
In general you should always use -Wall option which includes most common warnings — this includes missing return statement also. Along with -Wall, you can use -Wextra also, which includes other warnings not included by -Wall.
Maybe some additional elaboration on the why part of the question.
C++ was designed so that a very large body of pre-existing body of C code compiles with minimum amount of changes. Unfortunately, C itself was paying a similar duty to earliest pre-standard C which did not even have the void keyword and instead relied on a default return type of int. C functions usually did return values, and whenever code superficially similar to Algol/Pascal/Basic procedures was written without any return statements, the function was, under the hood, returning whichever garbage was left on the stack. Neither the caller nor the callee assigns the value of the garbage in a reliable way. If the garbage is then ignored by every caller, everything is fine and C++ inherits the moral obligation to compile such code.
(If the returned value is used by the caller, the code may behave non-deterministically, similar to processing of an uninitialized variable. Could the difference be reliably identified by a compiler, in a hypothetical successor language to C? This is hardly possible. The caller and the callee may be in different compilation units.)
The implicit int is just a part of the C legacy involved here. A "dispatcher" function might, depending on a parameter, return a variety of types from some code branches, and return no useful value from other code branches. Such a function would generally be declared to return a type long enough to hold any of the possible types and the caller might need to cast it or extract it from a union.
So the deepest cause is probably the C language creators' belief that procedures that do not return any value are just an unimportant special case of functions that do; this problem got aggravated by the lack of focus on type safety of function calls in the oldest C dialects.
While C++ did break compatibility with some of the worst aspects of C (example), the willingness to compile a return statement without a value (or the implicit value-less return at the end of a function) was not one of them.
As already mentioned, this is undefined behavior and will give you a compiler warning. Most places I've worked require you to turn on compiler settings to treat warnings as errors - which enforces that all your code must compile with 0 errors and 0 warnings. This is a good example of why that is a good idea.
This is more of the standard C++ rule/feature which tends to be flexible with things and which tends to be more close to C.
But when we talk of the compilers, GCC or VS, they are more for professional usage and for variety of development purposes and hence put more strict development rules as per your needs.
That makes sense also, my personal opinion, because the language is all about features and its usage whereas compiler defines the rules for optimal and best way of using it as per your needs.
As mentioned in above post, compiler sometimes gives the error, sometimes gives warning and also it has the option of skipping these warning etc, indicating the freedom to use the language and its features in a way that suits us best.
Along with this there are several other questions mentioning this behaviour of returning a result without having a return statement. One simple example would be:
int foo(int a, int b){ int c = a+b;}
int main(){
int c = 5;
int d = 5;
printf("f(%d,%d) is %d\n", c, d, foo(c,d));
return 0;
}
Could this anomaly be due stack properties and more specifically:
Zero-Address Machines
In zero-address machines, locations of both operands are assumed to be at a default location.
These machines use the stack as the source of the input operands and the result goes back into
the stack. Stack is a LIFO (last-in-first-out) data structure that all processors support, whether
or not they are zero-address machines. As the name implies, the last item placed on the stack
is the first item to be taken out of the stack. All operations on this type of machine assume that the required input operands are the top
two values on the stack. The result of the operation is placed on top of the stack.
In addition to that, for accessing memory to read and write data same registers are used as data source and destination(DS (data segment) register), that store first the variables needed for the calculation and then the returned result.
Note:
with this answer I would like to discuss one possible explanation of the strange behaviour at machine (instruction) level as it has already a context and its covered in adequately wide range.
This question already has answers here:
Why does flowing off the end of a non-void function without returning a value not produce a compiler error?
(11 answers)
Closed 6 years ago.
I found this in one of my libraries this morning:
static tvec4 Min(const tvec4& a, const tvec4& b, tvec4& out)
{
tvec3::Min(a,b,out);
out.w = min(a.w,b.w);
}
I'd expect a compiler error because this method doesn't return anything, and the return type is not void.
The only two things that come to mind are
In the only place where this method is called, the return value isn't being used or stored. (This method was supposed to be void - the tvec4 return type is a copy-and-paste error)
a default constructed tvec4 is being created, which seems a bit unlike, oh, everything else in C++.
I haven't found the part of the C++ spec that addresses this. References (ha) are appreciated.
Update
In some circumstances, this generates an error in VS2012. I haven't narrowed down specifics, but it's interesting, nonetheless.
This is undefined behavior from the C++11 draft standard section 6.6.3 The return statement paragraph 2 which says:
[...] Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function. [...]
This means that the compiler is not obligated provide an error nor a warning usually because it can be difficult to diagnose in all cases. We can see this from the definition of undefined behavior in the draft standard in section 1.3.24 which says:
[...]Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).[...]
Although in this case we can get both gcc and clang to generate a wanring using the -Wall flag, which gives me a warning similar to this:
warning: control reaches end of non-void function [-Wreturn-type]
We can turn this particular warning into an error using the -Werror=return-type flag. I also like to use -Wextra -Wconversion -pedantic for my own personal projects.
As ComicSansMS mentions in Visual Studio this code would generate C4716 which is an error by default, the message I see is:
error C4716: 'Min' : must return a value
and in the case where not all code paths would return a value then it would generate C4715, which is a warning.
Maybe some elaboration on the why part of the question:
As it turns out, it is actually quite hard† for a C++ compiler to determine whether a function exits without a return value. In addition to the code paths that end in explicit return statements and the ones that fall off the end of the function, you also have to consider potential exception throws or longjmps in the function itself, as well as all of its callees.
While it is quite easy for a compiler to identify a function that looks like it might be missing a return, it is considerably harder to prove that it is missing a return. In order to lift compiler vendors of this burden, the standard does not require this to generate an error.
So compiler vendors are free to generate a warning if they are quite sure that a function is missing a return and the user is then free to ignore/mask that warning in those rare cases where the compiler was actually wrong.
†: In the general case, this is equivalent to the halting problem, so it is actually impossible for a machine to decide this reliably.
Compile your code with -Wreturn-type option:
$ g++ -Wreturn-type source.cpp
This will give you warning. You can turn the warning into error if you use -Werror too:
$ g++ -Wreturn-type -Werror source.cpp
Note that this will turn all warnings into errors. So if you want error for specific warning, say -Wreturn-type, just type return-type without -W part as:
$ g++ -Werror=return-type source.cpp
In general you should always use -Wall option which includes most common warnings — this includes missing return statement also. Along with -Wall, you can use -Wextra also, which includes other warnings not included by -Wall.
Maybe some additional elaboration on the why part of the question.
C++ was designed so that a very large body of pre-existing body of C code compiles with minimum amount of changes. Unfortunately, C itself was paying a similar duty to earliest pre-standard C which did not even have the void keyword and instead relied on a default return type of int. C functions usually did return values, and whenever code superficially similar to Algol/Pascal/Basic procedures was written without any return statements, the function was, under the hood, returning whichever garbage was left on the stack. Neither the caller nor the callee assigns the value of the garbage in a reliable way. If the garbage is then ignored by every caller, everything is fine and C++ inherits the moral obligation to compile such code.
(If the returned value is used by the caller, the code may behave non-deterministically, similar to processing of an uninitialized variable. Could the difference be reliably identified by a compiler, in a hypothetical successor language to C? This is hardly possible. The caller and the callee may be in different compilation units.)
The implicit int is just a part of the C legacy involved here. A "dispatcher" function might, depending on a parameter, return a variety of types from some code branches, and return no useful value from other code branches. Such a function would generally be declared to return a type long enough to hold any of the possible types and the caller might need to cast it or extract it from a union.
So the deepest cause is probably the C language creators' belief that procedures that do not return any value are just an unimportant special case of functions that do; this problem got aggravated by the lack of focus on type safety of function calls in the oldest C dialects.
While C++ did break compatibility with some of the worst aspects of C (example), the willingness to compile a return statement without a value (or the implicit value-less return at the end of a function) was not one of them.
As already mentioned, this is undefined behavior and will give you a compiler warning. Most places I've worked require you to turn on compiler settings to treat warnings as errors - which enforces that all your code must compile with 0 errors and 0 warnings. This is a good example of why that is a good idea.
This is more of the standard C++ rule/feature which tends to be flexible with things and which tends to be more close to C.
But when we talk of the compilers, GCC or VS, they are more for professional usage and for variety of development purposes and hence put more strict development rules as per your needs.
That makes sense also, my personal opinion, because the language is all about features and its usage whereas compiler defines the rules for optimal and best way of using it as per your needs.
As mentioned in above post, compiler sometimes gives the error, sometimes gives warning and also it has the option of skipping these warning etc, indicating the freedom to use the language and its features in a way that suits us best.
Along with this there are several other questions mentioning this behaviour of returning a result without having a return statement. One simple example would be:
int foo(int a, int b){ int c = a+b;}
int main(){
int c = 5;
int d = 5;
printf("f(%d,%d) is %d\n", c, d, foo(c,d));
return 0;
}
Could this anomaly be due stack properties and more specifically:
Zero-Address Machines
In zero-address machines, locations of both operands are assumed to be at a default location.
These machines use the stack as the source of the input operands and the result goes back into
the stack. Stack is a LIFO (last-in-first-out) data structure that all processors support, whether
or not they are zero-address machines. As the name implies, the last item placed on the stack
is the first item to be taken out of the stack. All operations on this type of machine assume that the required input operands are the top
two values on the stack. The result of the operation is placed on top of the stack.
In addition to that, for accessing memory to read and write data same registers are used as data source and destination(DS (data segment) register), that store first the variables needed for the calculation and then the returned result.
Note:
with this answer I would like to discuss one possible explanation of the strange behaviour at machine (instruction) level as it has already a context and its covered in adequately wide range.
I came across the following code that compiles fine (using Visual Studio 2005):
SomeObject SomeClass::getSomeThing()
{
for each (SomeObject something in someMemberCollection)
{
if ( something.data == 0 )
{
return something;
}
}
// No return statement here
}
Why does this compile if there is no return statement at the end of the method?
This is to support backwards compatibility with C which did not strictly require a return from all functions. In those cases you were simply left with whatever the last value in the return position (stack or register).
If this is compiling without warning though you likely don't have your error level set high enough. Most compilers will warn about this now.
It's possible to write code that is guaranteed to always return a value, but the compiler might not be able to figure that out. One trivial example would be:
int func(int x)
{
if(x > 0)
return 1;
else if(x == 0)
return 0;
else if(x < 0)
return -1;
}
As far as the compiler is concerned, it's possible that all 3 if statements evaluate to false, in which case control would fall off the end of the function, returning an undefined result. Mathematically, though, we know it's impossible for that to happen, so this function has defined behavior.
Yes, a smarter compiler might be able to figure this out, but imagine that of integer comparisons, we had calls to external functions defined in a separate translation unit. Then, as humans we can prove that all control paths return values, but the compiler certainly can't figure that out.
The reason this is allowed is for compatibility with C, and the reason that C allows it is for compatibility with legacy C code that was written before C was standardized (pre-ANSI). There was code that did exactly this, so to allow such code to remain valid and error-free, the C standard permitted this. Letting control fall off a function without returning a value is still undefined behavior, though.
Any decent compiler should provide a warning about this; depending on your compiler, you may have to turn your warning level way up. I believe the option for this warning with gcc is -Wextra, which also includes a bunch of other warnings.
Set the warning level to 4 and tryout.Not all control path returns a value is the warning I remember getting this warning.
Probably your particular compiler is not doing as good flow control analysis as it should.
What compiler version are you using and what switches are you using to compile with?