C++ Preincrement Undefined Operation vs C - c++

I have this line of code:
front = (++front) % size;
In C I get no warnings but in C++ I get the warning operation on front may be undefined [-Wsequence-point]. How does this preincrement usage cause undefined behavior? In my mind, this line is very unambiguous and will be interpreted as:
increment front
mod front with size
assign new value to front.
Is my compiler just throwing a blanket warning?
P.S. I would understand the warning if I were doing something like front = front++; or, Heaven forbid, front = front++ + front++;.
EDIT: This warning was produced in CodeBlocks on Windows 64 using GCC (tdm-1) 4.6.1

You are changing front twice between sequence points: once through ++, and once through assignment.
This is undefined behaviour.

In C++11 this is well-defined; the structure is the same as that of:
i = ++i + 1;
which is given as an example of well-defined behaviour in the Standard itself. For a more detailed explanation see AndreyT's answer here.
In C++03, C89 and C99 this is undefined behaviour: their coarser sequence-point rules leave the two modifications of front unsequenced.

The old sequencing rules had edge cases where order of operations was well defined but which were still technically undefined behavior. With C++11 and C11 this has been fixed by replacing the sequence point requirements with 'sequenced-before' and 'sequenced-after' relations.
Your example happens to be such a case. If you're getting warnings in a C11 or C++11 mode then the warning simply hasn't been updated for the new rules yet. In earlier C and C++ modes the warning is correct. If you're not getting warnings in earlier modes then they simply weren't implemented, and that's okay as 'no diagnostic is required'.
At the same time, this line can be written more clearly and also be correct under the old rules:
front = (front + 1) % size;
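As a minimal sketch (the helper name advance and the concrete values are mine, not from the thread), the rewrite can be wrapped so that the index is modified exactly once per statement, which is well-defined in C89, C99, C++03 and C++11 alike:

// Sketch: advancing a circular-buffer index with the rewrite above.
#include <cstddef>
#include <cstdio>

static std::size_t advance(std::size_t front, std::size_t size) {
    return (front + 1) % size;      // no side effects here; the caller assigns the result
}

int main() {
    std::size_t size = 4, front = 3;
    front = advance(front, size);   // front is modified exactly once, by this assignment; wraps 3 -> 0
    std::printf("%zu\n", front);    // prints 0
    return 0;
}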

Writing the incremented value back to front can happen at any time (before or after the assignment modifies front), so the warning is valid and the code is unsafe.

Related

Why is it allowed to use a variable in its own declaration? [duplicate]

I noticed just now that the following code can be compiled with clang/gcc/clang++/g++, using c99, c11, c++11 standards.
int main(void) {
int i = i;
}
and even with -Wall -Wextra, none of the compilers even reports warnings.
By modifying the code to int i = i + 1; and with -Wall, they may report:
why.c:2:13: warning: variable 'i' is uninitialized when used within its own initialization [-Wuninitialized]
int i = i + 1;
~ ^
1 warning generated.
My questions:
Why is this even allowed by compilers?
What does the C/C++ standards say about this? Specifically, what's the behavior of this? UB or implementation dependent?
Because i is uninitialized when used to initialize itself, it has an indeterminate value at that time. An indeterminate value can be either an unspecified value or a trap representation.
If your implementation supports padding bits in integer types and if the indeterminate value in question happens to be a trap representation, then using it results in undefined behavior.
If your implementation does not have padding in integers, then the value is simply unspecified and there is no undefined behavior.
EDIT:
To elaborate further, the behavior can still be undefined if the address of i is never taken anywhere in the program. This is detailed in section 6.3.2.1p2 of the C11 standard:
If the lvalue designates an object of automatic storage
duration that could have been declared with the register storage
class (never had its address taken), and that object is uninitialized
(not declared with an initializer and no assignment to it
has been performed prior to use), the behavior is undefined.
So if you never take the address of i, then you have undefined behavior. Otherwise, the statements above apply.
This is just a warning; it is not governed by the standard.
Warnings are heuristics with an "optimistic" approach: a warning is issued only when the compiler is sure it is going to be a problem. In cases like this you have better luck with clang or the newest versions of gcc, as stated in the comments (see another related question of mine: why am I not getting a "used uninitialized" warning from gcc in this trivial example?).
Anyway, in the first case:
int i = i;
does nothing, since i == i already holds. The initialization may well be optimized out entirely, as it is useless. With compilers that don't "see" self-initialization as a problem, you can do this without a warning:
int i = i;
printf("%d\n",i);
Whereas this triggers a warning all right:
int i;
printf("%d\n",i);
Still, it is bad not to be warned about this, since from then on i is treated as initialized.
In the second case:
int i = i + 1;
A computation between an uninitialized value and 1 must be performed. Undefined behaviour happens there.
I believe you are fine with getting the warning for
int i = i + 1;
as expected; what you are really asking is why the warning is not also shown for
int i = i;
Why is this even allowed by compilers?
There is nothing inherently wrong with the statement. See the related discussions:
Why does the compiler allow initializing a variable with itself?
Why is initialization of a new variable by itself valid?
for more insight.
What does the C/C++ standards say about this? Specifically, what's the behavior of this? UB or implementation dependent?
This is undefined behavior, since the type int can have a trap representation and you have never taken the address of the variable in question. So, technically, you'll face UB as soon as you try to use the (indeterminate) value stored in the variable i.
You should turn on your compiler warnings. In gcc,
compile with -Winit-self to get a warning in C.
For C++, -Winit-self is already enabled by -Wall.
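As a sketch of what those flags are meant to catch (the file name and compile line are illustrative, not from the answer):

// self_init.cpp - the self-initialization case discussed above.
// Hypothetical compile line:  g++ -Wall self_init.cpp
// (per the answer, -Wall already implies -Winit-self for C++; for C you pass -Winit-self explicitly)
#include <cstdio>

int main() {
    int i = i;                  // i is read while still indeterminate
    std::printf("%d\n", i);     // whatever prints here is not a value to rely on
    return 0;
}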

Using reference causes no warning about undefined behaviour

I've been studying undefined behavior examples for C++, and I've found the following one:
int a = 0;
a = a++;
Tried it with g++ -Wall -Wextra and it gave me a warning about sequence points.
But then I thought about another situation using a reference:
int a = 0;
int &b = a;
b = a++;
This one didn't shout at me about sequence points, though it seems almost obvious that it should.
Is there any good explanation for why those two examples are treated differently by the compiler?
That might seem like obvious UB, but you have to understand that there are countless different ways to violate the sequencing rules. And proving whether any particular expression is in violation is a slow and complex process, and sometimes turns out to be impossible. This is why violating these rules is specified as undefined behaviour in the standard, rather than making the program ill-formed, which would have required a diagnostic for every possible violation.
So, the compiler has to draw a line somewhere, and not spend resources to validate all expressions. Your test shows two expressions that are on opposite sides of that "line".

Why does the C++17 GCC compiler give a warning about undefined behaviour?

According to C++17, there is no guarantee for the order of evaluation in the following expression; it is called unspecified behaviour.
int i = 0;
std::cout<<i<<i++<<std::endl;
The C++17 GCC compiler gives the following warning:
prog.cc: In function 'int main()':
prog.cc:6:20: warning: operation on 'i' may be undefined [-Wsequence-point]
std::cout<<i<<i++<<std::endl;
I don't understand: in C++17 the above expression is no longer undefined behaviour, so why does the compiler give a warning about it being undefined?
Seems like gcc gives a warning because this is a corner case, or at least very close to being one. Portability seems to be one concern.
From the page https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
The C++17 standard will define the order of evaluation of operands in more cases: in particular it requires that the right-hand side of an assignment be evaluated before the left-hand side, so the above examples are no longer undefined. But this warning will still warn about them, to help people avoid writing code that is undefined in C and earlier revisions of C++.
The standard is worded confusingly, therefore there is some debate over the precise meaning of the sequence point rules in subtle cases. Links to discussions of the problem, including proposed formal definitions, may be found on the GCC readings page, at http://gcc.gnu.org/readings.html.
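If the goal is simply to write something that is well-defined in C and in every C++ revision, and that should not trip any -Wsequence-point heuristic, one possible rewrite (my sketch, not from the answer) separates the side effect into its own statement:

#include <iostream>

int main() {
    int i = 0;
    std::cout << i << i << std::endl;   // prints "00", which is also what the original means under C++17's left-to-right sequencing of <<
    ++i;                                // the increment is now a separate, fully sequenced statement
    return 0;
}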

A C++ implementation that detects undefined behavior?

A huge number of operations in C++ result in undefined behavior, where the spec is completely mute about what the program's behavior ought to be and allows for anything to happen. Because of this, there are all sorts of cases where people have code that works in a debug build but not in a release build, or that works until a seemingly unrelated change is made, or that works on one machine but not another, etc.
My question is whether there is a utility that looks at the execution of C++ code and flags all instances where the program invokes undefined behavior. While it's nice that we have tools like valgrind and checked STL implementations, these aren't as strong as what I'm thinking about - valgrind can have false negatives if you trash memory that you still have allocated, for example, and checked STL implementations won't catch deleting through a base class pointer.
Does this tool exist? Or would it even be useful to have it lying around at all?
EDIT: I am aware that in general it is undecidable to statically check whether a C++ program may ever execute something that has undefined behavior. However, it is possible to determine whether a specific execution of a C++ program produced undefined behavior. One way to do this would be to make a C++ interpreter that steps through the code according to the definitions set out in the spec, at each point determining whether or not the code has undefined behavior. This won't detect undefined behavior that doesn't occur on a particular program execution, but it will find any undefined behavior that actually manifests itself in the program. This is related to how it is Turing-recognizable to determine if a TM accepts some input, even if it's still undecidable in general.
Thanks!
This is a great question, but let me give an idea for why I think it might be impossible (or at least very hard) in general.
Presumably, such an implementation would almost be a C++ interpreter, or at least a compiler for something more like Lisp or Java. It would need to keep extra data for each pointer to ensure you did not perform arithmetic outside of an array or dereference something that was already freed or whatever.
Now, consider the following code:
int *p = new int;
delete p;
int *q = new int;
if (p == q)
    *p = 17;
Is the *p = 17 undefined behavior? On the one hand, it dereferences p after it has been freed. On the other hand, dereferencing q is fine and p == q...
But that is not really the point. The point is that whether the if evaluates to true at all depends on the details of the heap implementation, which can vary from implementation to implementation. So replace *p = 17 by some actual undefined behavior, and you have a program that might very well blow up on a normal compiler but run fine on your hypothetical "UB detector". (A typical C++ implementation will use a LIFO free list, so the pointers have a good chance of being equal. A hypothetical "UB detector" might work more like a garbage collected language in order to detect use-after-free problems.)
Put another way, the existence of merely implementation-defined behavior makes it impossible to write a "UB detector" that works for all programs, I suspect.
That said, a project to create an "uber-strict C++ compiler" would be very interesting. Let me know if you want to start one. :-)
John Regehr in Finding Undefined Behavior Bugs by Finding Dead Code points out a tool called STACK and I quote from the site (emphasis mine):
Optimization-unstable code (unstable code for short) is an emerging class of software bugs: code that is unexpectedly eliminated by compiler optimizations due to undefined behavior in the program. Unstable code is present in many systems, including the Linux kernel and the Postgres database server. The consequences of unstable code range from incorrect functionality to missing security checks.
STACK is a static checker that detects unstable code in C/C++ programs. Applying STACK to widely used systems has uncovered 160 new bugs that have been confirmed and fixed by developers.
Also, in C++11, in the case of constexpr variables and functions, undefined behavior should be caught at compile time.
We also have gcc ubsan:
GCC recently (version 4.9) gained Undefined Behavior Sanitizer
(ubsan), a run-time checker for the C and C++ languages. In order to
check your program with ubsan, compile and link the program with
-fsanitize=undefined option. Such instrumented binaries have to be executed; if ubsan detects any problem, it outputs a “runtime error:”
message, and in most cases continues executing the program.
and the Clang Static Analyzer, which includes many checks for undefined behavior. There are also clang's -fsanitize checks, which include -fsanitize=undefined:
-fsanitize=undefined: Fast and compatible undefined behavior checker. Enables the undefined behavior checks that have small runtime cost and
no impact on address space layout or ABI. This includes all of the
checks listed below other than unsigned-integer-overflow.
and for C we can look at his article It’s Time to Get Serious About Exploiting Undefined Behavior which says:
[..]I confess to not personally having the gumption necessary for cramming GCC or LLVM through the best available dynamic undefined behavior checkers: KCC and Frama-C.[...]
Here is a link to kcc and I quote:
[...]If you try to run a program that is undefined (or one for which we are missing semantics), the program will get stuck. The message should tell you where it got stuck and may give a hint as to why. If you want help deciphering the output, or help understanding why the program is undefined, please send your .kdump file to us.[...]
and here are links to Frama-C, to an article where the first use of Frama-C as a C interpreter is described, and to an addendum to that article.
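To make the ubsan description above concrete, here is a small sketch of my own (not from the article) showing the kind of problem -fsanitize=undefined reports at run time; the exact message wording varies by compiler version:

// ub_demo.cpp - signed integer overflow, one of the checks in -fsanitize=undefined.
// Hypothetical compile and run:  g++ -fsanitize=undefined ub_demo.cpp && ./a.out
#include <climits>
#include <cstdio>

int main() {
    int x = INT_MAX;
    int y = x + 1;            // signed overflow: undefined behaviour; ubsan prints a "runtime error:" message for this line
    std::printf("%d\n", y);
    return 0;
}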
Using g++
-Wall -Werror -pedantic-error
(preferably with an appropriate -std argument as well) will pick up quite a few cases of UB.
Things that these flags get you include:
-pedantic
Issue all the warnings demanded by strict ISO C and ISO C++; reject
all programs that use forbidden extensions, and some other programs
that do not follow ISO C and ISO C++. For ISO C, follows the
version of the ISO C standard specified by any -std option used.
-Winit-self (C, C++, Objective-C and Objective-C++ only)
Warn about uninitialized variables which are initialized with
themselves. Note this option can only be used with the
-Wuninitialized option, which in turn only works with -O1 and
above.
-Wuninitialized
Warn if an automatic variable is used without first being
initialized or if a variable may be clobbered by a "setjmp" call.
plus warnings about various disallowed things you can do with format specifiers for the printf and scanf family of functions.
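As an illustration of that last point (my own sketch, not from the answer): passing an argument whose type does not match its conversion specifier is undefined behaviour, and the format checking pulled in by -Wall will normally flag it at compile time:

// format_demo.cpp - a printf specifier/argument mismatch.
// Hypothetical compile line:  g++ -Wall -Werror format_demo.cpp
#include <cstdio>

int main() {
    double d = 3.5;
    std::printf("%d\n", d);   // "%d" expects an int but receives a double: UB at run time, warned about by -Wformat (part of -Wall)
    return 0;
}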
Clang has a suite of sanitizers that catch various forms of undefined behavior. Their eventual goal is to be able to catch all C++ core language undefined behavior, but checks for a few tricky forms of undefined behavior are missing right now.
For a decent set of sanitizers, try:
clang++ -fsanitize=undefined,address
-fsanitize=address checks for use of bad pointers (not pointing to valid memory), and -fsanitize=undefined enables a set of lightweight UB checks (integer overflow, bad shifts, misaligned pointers, ...).
-fsanitize=memory (for detecting uninitialized memory reads) and -fsanitize=thread (for detecting data races) are also useful, but neither of these can be combined with -fsanitize=address nor with each other because all three have an invasive impact on the program's address space.
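For completeness, a sketch of my own (not from the answer) of the kind of bug -fsanitize=address catches at run time, which the lightweight UB checks alone would not:

// uaf_demo.cpp - heap-use-after-free, detected by AddressSanitizer.
// Hypothetical compile and run:  clang++ -fsanitize=undefined,address -g uaf_demo.cpp && ./a.out
#include <cstdio>

int main() {
    int *p = new int(42);
    delete p;                   // p now dangles
    std::printf("%d\n", *p);    // use-after-free: ASan aborts with a "heap-use-after-free" report
    return 0;
}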
You might want to read about SAFECode.
This is a research project from the University of Illinois, the goal is stated on the front page (linked above):
The purpose of the SAFECode project is to enable program safety without garbage collection and with minimal run-time checks using static analysis when possible and run-time checks when necessary. SAFECode defines a code representation with minimal semantic restrictions designed to enable static enforcement of safety, using aggressive compiler techniques developed in this project.
What is really interesting to me is the elimination of the runtime checks whenever the program can be proved to be correct statically, for example:
const int N = 100;                              // any compile-time constant bound
int array[N];
for (int i = 0; i != N; ++i) { array[i] = 0; }  // indices provably stay within bounds
Should not incur any more overhead than the regular version.
In a lighter fashion, Clang has some guarantees about undefined behavior too as far as I recall, but I cannot get my hands on it...
The clang compiler can detect some undefined behaviors and warn against them. Probably not as complete as you want, but it's definitely a good start.
Unfortunately I'm not aware of any such tool. Typically UB is defined as such precisely because it would be hard or impossible for a compiler to diagnose it in all cases.
In fact your best tool is probably compiler warnings: they often warn about UB-type items (for example, a non-virtual destructor in a base class, abuse of the strict-aliasing rules, etc.).
Code review can also help catch cases where UB is relied upon.
Then you have to rely on valgrind to capture the remaining cases.
Just as a side observation, according to the theory of computability, you cannot have a program that detects all possible undefined behaviours.
You can only have tools that use heuristics and detect some particular cases that follow certain patterns. Or you can in certain cases prove that a program behaves as you want. But you cannot detect undefined behaviour in general.
Edit
If a program does not terminate (hangs, loops forever) on a given input, then its output is undefined.
If you agree on this definition, then determining whether a program terminates is the well-known "Halting Problem", which has been proven to be undecidable, i.e. there exists no program (Turing Machine, C program, C++ program, Pascal program, in whatever language) that can solve this problem in general.
Simply put: there exists no program P that can take as input any program Q and input data I and print as output TRUE if Q(I) terminates, or else print FALSE if Q(I) does not terminate.
For more information you can look at http://en.wikipedia.org/wiki/Halting_problem.
Undefined behaviour is undefined. The best you can do is conform to the standard pedantically, as others have suggested; however, you cannot test for what is undefined, because you don't know what it is. If you knew what it was, and the standard specified it, it would not be undefined.
However, if you for some reason, do actually rely on what the standard says is undefined, and it results in a particular result, then you may choose to define it, and write some unit tests to confirm that for your particular build, it is defined. It is much better, however, to simply avoid undefined behaviour whenever possible.
Take a look at PCLint; it's pretty decent at detecting a lot of bad things in C++.
Here's a subset of what it catches