Related
Updated, see below!
I have heard and read that C++0x allows an compiler to print "Hello" for the following snippet
#include <iostream>
int main() {
while(1)
;
std::cout << "Hello" << std::endl;
}
It apparently has something to do with threads and optimization capabilities. It looks to me that this can surprise many people though.
Does someone have a good explanation of why this was necessary to allow? For reference, the most recent C++0x draft says at 6.5/5
A loop that, outside of the for-init-statement in the case of a for statement,
makes no calls to library I/O functions, and
does not access or modify volatile objects, and
performs no synchronization operations (1.10) or atomic operations (Clause 29)
may be assumed by the implementation to terminate. [ Note: This is intended to allow compiler transfor-
mations, such as removal of empty loops, even when termination cannot be proven. — end note ]
Edit:
This insightful article says about that Standards text
Unfortunately, the words "undefined behavior" are not used. However, anytime the standard says "the compiler may assume P," it is implied that a program which has the property not-P has undefined semantics.
Is that correct, and is the compiler allowed to print "Bye" for the above program?
There is an even more insightful thread here, which is about an analogous change to C, started off by the Guy done the above linked article. Among other useful facts, they present a solution that seems to also apply to C++0x (Update: This won't work anymore with n3225 - see below!)
endless:
goto endless;
A compiler is not allowed to optimize that away, it seems, because it's not a loop, but a jump. Another guy summarizes the proposed change in C++0x and C201X
By writing a loop, the programmer is asserting either that the
loop does something with visible behavior (performs I/O, accesses
volatile objects, or performs synchronization or atomic operations),
or that it eventually terminates. If I violate that assumption
by writing an infinite loop with no side effects, I am lying to the
compiler, and my program's behavior is undefined. (If I'm lucky,
the compiler might warn me about it.) The language doesn't provide
(no longer provides?) a way to express an infinite loop without
visible behavior.
Update on 3.1.2011 with n3225: Committee moved the text to 1.10/24 and say
The implementation may assume that any thread will eventually do one of the following:
terminate,
make a call to a library I/O function,
access or modify a volatile object, or
perform a synchronization operation or an atomic operation.
The goto trick will not work anymore!
To me, the relevant justification is:
This is intended to allow compiler transfor- mations, such as removal of empty loops, even when termination cannot be proven.
Presumably, this is because proving termination mechanically is difficult, and the inability to prove termination hampers compilers which could otherwise make useful transformations, such as moving nondependent operations from before the loop to after or vice versa, performing post-loop operations in one thread while the loop executes in another, and so on. Without these transformations, a loop might block all other threads while they wait for the one thread to finish said loop. (I use "thread" loosely to mean any form of parallel processing, including separate VLIW instruction streams.)
EDIT: Dumb example:
while (complicated_condition()) {
x = complicated_but_externally_invisible_operation(x);
}
complex_io_operation();
cout << "Results:" << endl;
cout << x << endl;
Here, it would be faster for one thread to do the complex_io_operation while the other is doing all the complex calculations in the loop. But without the clause you have quoted, the compiler has to prove two things before it can make the optimisation: 1) that complex_io_operation() doesn't depend on the results of the loop, and 2) that the loop will terminate. Proving 1) is pretty easy, proving 2) is the halting problem. With the clause, it may assume the loop terminates and get a parallelisation win.
I also imagine that the designers considered that the cases where infinite loops occur in production code are very rare and are usually things like event-driven loops which access I/O in some manner. As a result, they have pessimised the rare case (infinite loops) in favour of optimising the more common case (noninfinite, but difficult to mechanically prove noninfinite, loops).
It does, however, mean that infinite loops used in learning examples will suffer as a result, and will raise gotchas in beginner code. I can't say this is entirely a good thing.
EDIT: with respect to the insightful article you now link, I would say that "the compiler may assume X about the program" is logically equivalent to "if the program doesn't satisfy X, the behaviour is undefined". We can show this as follows: suppose there exists a program which does not satisfy property X. Where would the behaviour of this program be defined? The Standard only defines behaviour assuming property X is true. Although the Standard does not explicitly declare the behaviour undefined, it has declared it undefined by omission.
Consider a similar argument: "the compiler may assume a variable x is only assigned to at most once between sequence points" is equivalent to "assigning to x more than once between sequence points is undefined".
Does someone have a good explanation of why this was necessary to allow?
Yes, Hans Boehm provides a rationale for this in N1528: Why undefined behavior for infinite loops?, although this is WG14 document the rationale applies to C++ as well and the document refers to both WG14 and WG21:
As N1509 correctly points out, the current draft essentially gives
undefined behavior to infinite loops in 6.8.5p6. A major issue for
doing so is that it allows code to move across a potentially
non-terminating loop. For example, assume we have the following loops,
where count and count2 are global variables (or have had their address
taken), and p is a local variable, whose address has not been taken:
for (p = q; p != 0; p = p -> next) {
++count;
}
for (p = q; p != 0; p = p -> next) {
++count2;
}
Could these two loops be merged and replaced by the following loop?
for (p = q; p != 0; p = p -> next) {
++count;
++count2;
}
Without the special dispensation in 6.8.5p6 for infinite loops, this
would be disallowed: If the first loop doesn't terminate because q
points to a circular list, the original never writes to count2. Thus
it could be run in parallel with another thread that accesses or
updates count2. This is no longer safe with the transformed version
which does access count2 in spite of the infinite loop. Thus the
transformation potentially introduces a data race.
In cases like this, it is very unlikely that a compiler would be able
to prove loop termination; it would have to understand that q points
to an acyclic list, which I believe is beyond the ability of most
mainstream compilers, and often impossible without whole program
information.
The restrictions imposed by non-terminating loops are a restriction on
the optimization of terminating loops for which the compiler cannot
prove termination, as well as on the optimization of actually
non-terminating loops. The former are much more common than the
latter, and often more interesting to optimize.
There are clearly also for-loops with an integer loop variable in
which it would be difficult for a compiler to prove termination, and
it would thus be difficult for the compiler to restructure loops
without 6.8.5p6. Even something like
for (i = 1; i != 15; i += 2)
or
for (i = 1; i <= 10; i += j)
seems nontrivial to handle. (In the former case, some basic number
theory is required to prove termination, in the latter case, we need
to know something about the possible values of j to do so. Wrap-around
for unsigned integers may complicate some of this reasoning further.)
This issue seems to apply to almost all loop restructuring
transformations, including compiler parallelization and
cache-optimization transformations, both of which are likely to gain
in importance, and are already often important for numerical code.
This appears likely to turn into a substantial cost for the benefit of
being able to write infinite loops in the most natural way possible,
especially since most of us rarely write intentionally infinite loops.
The one major difference with C is that C11 provides an exception for controlling expressions that are constant expressions which differs from C++ and makes your specific example well-defined in C11.
I think the correct interpretation is the one from your edit: empty infinite loops are undefined behavior.
I wouldn't say it's particularly intuitive behavior, but this interpretation makes more sense than the alternative one, that the compiler is arbitrarily allowed to ignore infinite loops without invoking UB.
If infinite loops are UB, it just means that non-terminating programs aren't considered meaningful: according to C++0x, they have no semantics.
That does make a certain amount of sense too. They are a special case, where a number of side effects just no longer occur (for example, nothing is ever returned from main), and a number of compiler optimizations are hampered by having to preserve infinite loops. For example, moving computations across the loop is perfectly valid if the loop has no side effects, because eventually, the computation will be performed in any case.
But if the loop never terminates, we can't safely rearrange code across it, because we might just be changing which operations actually get executed before the program hangs. Unless we treat a hanging program as UB, that is.
The relevant issue is that the compiler is allowed to reorder code whose side effects do not conflict. The surprising order of execution could occur even if the compiler produced non-terminating machine code for the infinite loop.
I believe this is the right approach. The language spec defines ways to enforce order of execution. If you want an infinite loop that cannot be reordered around, write this:
volatile int dummy_side_effect;
while (1) {
dummy_side_effect = 0;
}
printf("Never prints.\n");
I think this is along the lines of the this type of question, which references another thread. Optimization can occasionally remove empty loops.
I think the issue could perhaps best be stated, as "If a later piece of code does not depend on an earlier piece of code, and the earlier piece of code has no side-effects on any other part of the system, the compiler's output may execute the later piece of code before, after, or intermixed with, the execution of the former, even if the former contains loops, without regard for when or whether the former code would actually complete. For example, the compiler could rewrite:
void testfermat(int n)
{
int a=1,b=1,c=1;
while(pow(a,n)+pow(b,n) != pow(c,n))
{
if (b > a) a++; else if (c > b) {a=1; b++}; else {a=1; b=1; c++};
}
printf("The result is ");
printf("%d/%d/%d", a,b,c);
}
as
void testfermat(int n)
{
if (fork_is_first_thread())
{
int a=1,b=1,c=1;
while(pow(a,n)+pow(b,n) != pow(c,n))
{
if (b > a) a++; else if (c > b) {a=1; b++}; else {a=1; b=1; c++};
}
signal_other_thread_and_die();
}
else // Second thread
{
printf("The result is ");
wait_for_other_thread();
}
printf("%d/%d/%d", a,b,c);
}
Generally not unreasonable, though I might worry that:
int total=0;
for (i=0; num_reps > i; i++)
{
update_progress_bar(i);
total+=do_something_slow_with_no_side_effects(i);
}
show_result(total);
would become
int total=0;
if (fork_is_first_thread())
{
for (i=0; num_reps > i; i++)
total+=do_something_slow_with_no_side_effects(i);
signal_other_thread_and_die();
}
else
{
for (i=0; num_reps > i; i++)
update_progress_bar(i);
wait_for_other_thread();
}
show_result(total);
By having one CPU handle the calculations and another handle the progress bar updates, the rewrite would improve efficiency. Unfortunately, it would make the progress bar updates rather less useful than they should be.
Does someone have a good explanation of why this was necessary to allow?
I have an explanation of why it is necessary not to allow, assuming that C++ still has the ambition to be a general-purpose language suitable for performance-critical, low-level programming.
This used to be a working, strictly conforming freestanding C++ program:
int main()
{
setup_interrupts();
for(;;)
{}
}
The above is a perfectly fine and common way to write main() in many low-end microcontroller systems. The whole program execution is done inside interrupt service routines (or in case of RTOS, it could be processes). And main() may absolutely not be allowed to return since these are bare metal systems and there is nobody to return to.
Typical real-world examples of where the above design can be used are PWM/motor driver microcontrollers, lighting control applications, simple regulator systems, sensor applications, simple household electronics and so on.
Changes to C++ have unfortunately made the language impossible to use for this kind of embedded systems programming. Existing real-world applications like the ones above will break in dangerous ways if ported to newer C++ compilers.
C++20 6.9.2.3 Forward progress [intro.progress]
The implementation may assume that any thread will eventually do one of the following:
(1.1) — terminate,
(1.2) — make a call to a library I/O function,
(1.3) — perform an access through a volatile glvalue, or
(1.4) — perform a synchronization operation or an atomic operation.
Lets address each bullet for the above, previously strictly conforming freestanding C++ program:
1.1. As any embedded systems beginner can tell the committee, bare metal/RTOS microcontroller systems never terminate. Therefore the loop cannot terminate either or main() would turn into runaway code, which would be a severe and dangerous error condition.
1.2 N/A.
1.3 Not necessarily. One may argue that the for(;;) loop is the proper place to feed the watchdog, which in turn is a side effect as it writes to a volatile register. But there may be implementation reasons why you don't want to use a watchdog. At any rate, it is not C++'s business to dictate that a watchdog should be used.
1.4 N/A.
Because of this, C++ is made even more unsuitable for embedded systems applications than it already was before.
Another problem here is that the standard speaks of "threads", which are higher level concepts. On real-world computers, threads are implemented through a low-level concept known as interrupts. Interrupts are similar to threads in terms of race conditions and concurrent execution, but they are not threads. On low-level systems there is just one core and therefore only one interrupt at a time is typically executed (kind of like threads used to work on single core PC back in the days).
And you can't have threads if you can't have interrupts. So threads would have to be implemented by a lower-level language more suitable for embedded systems than C++. The options being assembler or C.
By writing a loop, the programmer is asserting either that the loop does something with visible behavior (performs I/O, accesses volatile objects, or performs synchronization or atomic operations), or that it eventually terminates.
This is completely misguided and clearly written by someone who has never worked with microcontroller programming.
So what should those few remaining C++ embedded programmers who refuse to port their code to C "because of reasons" do? You have to introduce a side effect inside the for(;;) loop:
int main()
{
setup_interrupts();
for(volatile int i=0; i==0;)
{}
}
Or if you have the watchdog enabled as you ought to, feed it inside the for(;;) loop.
It is not decidable for the compiler for non-trivial cases if it is an infinite loop at all.
In different cases, it can happen that your optimiser will reach a better complexity class for your code (e.g. it was O(n^2) and you get O(n) or O(1) after optimisation).
So, to include such a rule that disallows removing an infinite loop into the C++ standard would make many optimisations impossible. And most people don't want this. I think this quite answers your question.
Another thing: I never have seen any valid example where you need an infinite loop which does nothing.
The one example I have heard about was an ugly hack that really should be solved otherwise: It was about embedded systems where the only way to trigger a reset was to freeze the device so that the watchdog restarts it automatically.
If you know any valid/good example where you need an infinite loop which does nothing, please tell me.
I think it's worth pointing out that loops which would be infinite except for the fact that they interact with other threads via non-volatile, non-synchronised variables can now yield incorrect behaviour with a new compiler.
I other words, make your globals volatile -- as well as arguments passed into such a loop via pointer/reference.
Updated, see below!
I have heard and read that C++0x allows an compiler to print "Hello" for the following snippet
#include <iostream>
int main() {
while(1)
;
std::cout << "Hello" << std::endl;
}
It apparently has something to do with threads and optimization capabilities. It looks to me that this can surprise many people though.
Does someone have a good explanation of why this was necessary to allow? For reference, the most recent C++0x draft says at 6.5/5
A loop that, outside of the for-init-statement in the case of a for statement,
makes no calls to library I/O functions, and
does not access or modify volatile objects, and
performs no synchronization operations (1.10) or atomic operations (Clause 29)
may be assumed by the implementation to terminate. [ Note: This is intended to allow compiler transfor-
mations, such as removal of empty loops, even when termination cannot be proven. — end note ]
Edit:
This insightful article says about that Standards text
Unfortunately, the words "undefined behavior" are not used. However, anytime the standard says "the compiler may assume P," it is implied that a program which has the property not-P has undefined semantics.
Is that correct, and is the compiler allowed to print "Bye" for the above program?
There is an even more insightful thread here, which is about an analogous change to C, started off by the Guy done the above linked article. Among other useful facts, they present a solution that seems to also apply to C++0x (Update: This won't work anymore with n3225 - see below!)
endless:
goto endless;
A compiler is not allowed to optimize that away, it seems, because it's not a loop, but a jump. Another guy summarizes the proposed change in C++0x and C201X
By writing a loop, the programmer is asserting either that the
loop does something with visible behavior (performs I/O, accesses
volatile objects, or performs synchronization or atomic operations),
or that it eventually terminates. If I violate that assumption
by writing an infinite loop with no side effects, I am lying to the
compiler, and my program's behavior is undefined. (If I'm lucky,
the compiler might warn me about it.) The language doesn't provide
(no longer provides?) a way to express an infinite loop without
visible behavior.
Update on 3.1.2011 with n3225: Committee moved the text to 1.10/24 and say
The implementation may assume that any thread will eventually do one of the following:
terminate,
make a call to a library I/O function,
access or modify a volatile object, or
perform a synchronization operation or an atomic operation.
The goto trick will not work anymore!
To me, the relevant justification is:
This is intended to allow compiler transfor- mations, such as removal of empty loops, even when termination cannot be proven.
Presumably, this is because proving termination mechanically is difficult, and the inability to prove termination hampers compilers which could otherwise make useful transformations, such as moving nondependent operations from before the loop to after or vice versa, performing post-loop operations in one thread while the loop executes in another, and so on. Without these transformations, a loop might block all other threads while they wait for the one thread to finish said loop. (I use "thread" loosely to mean any form of parallel processing, including separate VLIW instruction streams.)
EDIT: Dumb example:
while (complicated_condition()) {
x = complicated_but_externally_invisible_operation(x);
}
complex_io_operation();
cout << "Results:" << endl;
cout << x << endl;
Here, it would be faster for one thread to do the complex_io_operation while the other is doing all the complex calculations in the loop. But without the clause you have quoted, the compiler has to prove two things before it can make the optimisation: 1) that complex_io_operation() doesn't depend on the results of the loop, and 2) that the loop will terminate. Proving 1) is pretty easy, proving 2) is the halting problem. With the clause, it may assume the loop terminates and get a parallelisation win.
I also imagine that the designers considered that the cases where infinite loops occur in production code are very rare and are usually things like event-driven loops which access I/O in some manner. As a result, they have pessimised the rare case (infinite loops) in favour of optimising the more common case (noninfinite, but difficult to mechanically prove noninfinite, loops).
It does, however, mean that infinite loops used in learning examples will suffer as a result, and will raise gotchas in beginner code. I can't say this is entirely a good thing.
EDIT: with respect to the insightful article you now link, I would say that "the compiler may assume X about the program" is logically equivalent to "if the program doesn't satisfy X, the behaviour is undefined". We can show this as follows: suppose there exists a program which does not satisfy property X. Where would the behaviour of this program be defined? The Standard only defines behaviour assuming property X is true. Although the Standard does not explicitly declare the behaviour undefined, it has declared it undefined by omission.
Consider a similar argument: "the compiler may assume a variable x is only assigned to at most once between sequence points" is equivalent to "assigning to x more than once between sequence points is undefined".
Does someone have a good explanation of why this was necessary to allow?
Yes, Hans Boehm provides a rationale for this in N1528: Why undefined behavior for infinite loops?, although this is WG14 document the rationale applies to C++ as well and the document refers to both WG14 and WG21:
As N1509 correctly points out, the current draft essentially gives
undefined behavior to infinite loops in 6.8.5p6. A major issue for
doing so is that it allows code to move across a potentially
non-terminating loop. For example, assume we have the following loops,
where count and count2 are global variables (or have had their address
taken), and p is a local variable, whose address has not been taken:
for (p = q; p != 0; p = p -> next) {
++count;
}
for (p = q; p != 0; p = p -> next) {
++count2;
}
Could these two loops be merged and replaced by the following loop?
for (p = q; p != 0; p = p -> next) {
++count;
++count2;
}
Without the special dispensation in 6.8.5p6 for infinite loops, this
would be disallowed: If the first loop doesn't terminate because q
points to a circular list, the original never writes to count2. Thus
it could be run in parallel with another thread that accesses or
updates count2. This is no longer safe with the transformed version
which does access count2 in spite of the infinite loop. Thus the
transformation potentially introduces a data race.
In cases like this, it is very unlikely that a compiler would be able
to prove loop termination; it would have to understand that q points
to an acyclic list, which I believe is beyond the ability of most
mainstream compilers, and often impossible without whole program
information.
The restrictions imposed by non-terminating loops are a restriction on
the optimization of terminating loops for which the compiler cannot
prove termination, as well as on the optimization of actually
non-terminating loops. The former are much more common than the
latter, and often more interesting to optimize.
There are clearly also for-loops with an integer loop variable in
which it would be difficult for a compiler to prove termination, and
it would thus be difficult for the compiler to restructure loops
without 6.8.5p6. Even something like
for (i = 1; i != 15; i += 2)
or
for (i = 1; i <= 10; i += j)
seems nontrivial to handle. (In the former case, some basic number
theory is required to prove termination, in the latter case, we need
to know something about the possible values of j to do so. Wrap-around
for unsigned integers may complicate some of this reasoning further.)
This issue seems to apply to almost all loop restructuring
transformations, including compiler parallelization and
cache-optimization transformations, both of which are likely to gain
in importance, and are already often important for numerical code.
This appears likely to turn into a substantial cost for the benefit of
being able to write infinite loops in the most natural way possible,
especially since most of us rarely write intentionally infinite loops.
The one major difference with C is that C11 provides an exception for controlling expressions that are constant expressions which differs from C++ and makes your specific example well-defined in C11.
I think the correct interpretation is the one from your edit: empty infinite loops are undefined behavior.
I wouldn't say it's particularly intuitive behavior, but this interpretation makes more sense than the alternative one, that the compiler is arbitrarily allowed to ignore infinite loops without invoking UB.
If infinite loops are UB, it just means that non-terminating programs aren't considered meaningful: according to C++0x, they have no semantics.
That does make a certain amount of sense too. They are a special case, where a number of side effects just no longer occur (for example, nothing is ever returned from main), and a number of compiler optimizations are hampered by having to preserve infinite loops. For example, moving computations across the loop is perfectly valid if the loop has no side effects, because eventually, the computation will be performed in any case.
But if the loop never terminates, we can't safely rearrange code across it, because we might just be changing which operations actually get executed before the program hangs. Unless we treat a hanging program as UB, that is.
The relevant issue is that the compiler is allowed to reorder code whose side effects do not conflict. The surprising order of execution could occur even if the compiler produced non-terminating machine code for the infinite loop.
I believe this is the right approach. The language spec defines ways to enforce order of execution. If you want an infinite loop that cannot be reordered around, write this:
volatile int dummy_side_effect;
while (1) {
dummy_side_effect = 0;
}
printf("Never prints.\n");
I think this is along the lines of the this type of question, which references another thread. Optimization can occasionally remove empty loops.
I think the issue could perhaps best be stated, as "If a later piece of code does not depend on an earlier piece of code, and the earlier piece of code has no side-effects on any other part of the system, the compiler's output may execute the later piece of code before, after, or intermixed with, the execution of the former, even if the former contains loops, without regard for when or whether the former code would actually complete. For example, the compiler could rewrite:
void testfermat(int n)
{
int a=1,b=1,c=1;
while(pow(a,n)+pow(b,n) != pow(c,n))
{
if (b > a) a++; else if (c > b) {a=1; b++}; else {a=1; b=1; c++};
}
printf("The result is ");
printf("%d/%d/%d", a,b,c);
}
as
void testfermat(int n)
{
if (fork_is_first_thread())
{
int a=1,b=1,c=1;
while(pow(a,n)+pow(b,n) != pow(c,n))
{
if (b > a) a++; else if (c > b) {a=1; b++}; else {a=1; b=1; c++};
}
signal_other_thread_and_die();
}
else // Second thread
{
printf("The result is ");
wait_for_other_thread();
}
printf("%d/%d/%d", a,b,c);
}
Generally not unreasonable, though I might worry that:
int total=0;
for (i=0; num_reps > i; i++)
{
update_progress_bar(i);
total+=do_something_slow_with_no_side_effects(i);
}
show_result(total);
would become
int total=0;
if (fork_is_first_thread())
{
for (i=0; num_reps > i; i++)
total+=do_something_slow_with_no_side_effects(i);
signal_other_thread_and_die();
}
else
{
for (i=0; num_reps > i; i++)
update_progress_bar(i);
wait_for_other_thread();
}
show_result(total);
By having one CPU handle the calculations and another handle the progress bar updates, the rewrite would improve efficiency. Unfortunately, it would make the progress bar updates rather less useful than they should be.
Does someone have a good explanation of why this was necessary to allow?
I have an explanation of why it is necessary not to allow, assuming that C++ still has the ambition to be a general-purpose language suitable for performance-critical, low-level programming.
This used to be a working, strictly conforming freestanding C++ program:
int main()
{
setup_interrupts();
for(;;)
{}
}
The above is a perfectly fine and common way to write main() in many low-end microcontroller systems. The whole program execution is done inside interrupt service routines (or in case of RTOS, it could be processes). And main() may absolutely not be allowed to return since these are bare metal systems and there is nobody to return to.
Typical real-world examples of where the above design can be used are PWM/motor driver microcontrollers, lighting control applications, simple regulator systems, sensor applications, simple household electronics and so on.
Changes to C++ have unfortunately made the language impossible to use for this kind of embedded systems programming. Existing real-world applications like the ones above will break in dangerous ways if ported to newer C++ compilers.
C++20 6.9.2.3 Forward progress [intro.progress]
The implementation may assume that any thread will eventually do one of the following:
(1.1) — terminate,
(1.2) — make a call to a library I/O function,
(1.3) — perform an access through a volatile glvalue, or
(1.4) — perform a synchronization operation or an atomic operation.
Lets address each bullet for the above, previously strictly conforming freestanding C++ program:
1.1. As any embedded systems beginner can tell the committee, bare metal/RTOS microcontroller systems never terminate. Therefore the loop cannot terminate either or main() would turn into runaway code, which would be a severe and dangerous error condition.
1.2 N/A.
1.3 Not necessarily. One may argue that the for(;;) loop is the proper place to feed the watchdog, which in turn is a side effect as it writes to a volatile register. But there may be implementation reasons why you don't want to use a watchdog. At any rate, it is not C++'s business to dictate that a watchdog should be used.
1.4 N/A.
Because of this, C++ is made even more unsuitable for embedded systems applications than it already was before.
Another problem here is that the standard speaks of "threads", which are higher level concepts. On real-world computers, threads are implemented through a low-level concept known as interrupts. Interrupts are similar to threads in terms of race conditions and concurrent execution, but they are not threads. On low-level systems there is just one core and therefore only one interrupt at a time is typically executed (kind of like threads used to work on single core PC back in the days).
And you can't have threads if you can't have interrupts. So threads would have to be implemented by a lower-level language more suitable for embedded systems than C++. The options being assembler or C.
By writing a loop, the programmer is asserting either that the loop does something with visible behavior (performs I/O, accesses volatile objects, or performs synchronization or atomic operations), or that it eventually terminates.
This is completely misguided and clearly written by someone who has never worked with microcontroller programming.
So what should those few remaining C++ embedded programmers who refuse to port their code to C "because of reasons" do? You have to introduce a side effect inside the for(;;) loop:
int main()
{
setup_interrupts();
for(volatile int i=0; i==0;)
{}
}
Or if you have the watchdog enabled as you ought to, feed it inside the for(;;) loop.
It is not decidable for the compiler for non-trivial cases if it is an infinite loop at all.
In different cases, it can happen that your optimiser will reach a better complexity class for your code (e.g. it was O(n^2) and you get O(n) or O(1) after optimisation).
So, to include such a rule that disallows removing an infinite loop into the C++ standard would make many optimisations impossible. And most people don't want this. I think this quite answers your question.
Another thing: I never have seen any valid example where you need an infinite loop which does nothing.
The one example I have heard about was an ugly hack that really should be solved otherwise: It was about embedded systems where the only way to trigger a reset was to freeze the device so that the watchdog restarts it automatically.
If you know any valid/good example where you need an infinite loop which does nothing, please tell me.
I think it's worth pointing out that loops which would be infinite except for the fact that they interact with other threads via non-volatile, non-synchronised variables can now yield incorrect behaviour with a new compiler.
I other words, make your globals volatile -- as well as arguments passed into such a loop via pointer/reference.
When I compile and run this code with Clang (-O3) or MSVC (/O2)...
#include <stdio.h>
#include <time.h>
static int const N = 0x8000;
int main()
{
clock_t const start = clock();
for (int i = 0; i < N; ++i)
{
int a[N]; // Never used outside of this block, but not optimized away
for (int j = 0; j < N; ++j)
{
++a[j]; // This is undefined behavior (due to possible
// signed integer overflow), but Clang doesn't see it
}
}
clock_t const finish = clock();
fprintf(stderr, "%u ms\n",
static_cast<unsigned int>((finish - start) * 1000 / CLOCKS_PER_SEC));
return 0;
}
... the loop doesn't get optimized away.
Furthermore, neither Clang 3.6 nor Visual C++ 2013 nor GCC 4.8.1 tells me that the variable is uninitialized!
Now I realize that the lack of an optimization isn't a bug per se, but I find this astonishing given how compilers are supposed to be pretty smart nowadays. This seems like such a simple piece of code that even liveness analysis techniques from a decade ago should be able to take care of optimizing away the variable a and therefore the whole loop -- never mind the fact that incrementing the variable is already undefined behavior.
Yet only GCC is able to figure out that it's a no-op, and none of the compilers tells me that this is an uninitialized variable.
Why is this? What's preventing simple liveness analysis from telling the compiler that a is unused? Moreover, why isn't the compiler detecting that a[j] is uninitialized in the first place? Why can't the existing uninitialized-variable-detectors in all of those compilers catch this obvious error?
The undefined behavior is irrelevant here. Replacing the inner loop with:
for (int j = 1; j < N; ++j)
{
a[j-1] = a[j];
a[j] = j;
}
... has the same effect, at least with Clang.
The issue is that the inner loop both loads from a[j] (for some j) and stores to a[j] (for some j). None of the stores can be removed, because the compiler believes they may be visible to later loads, and none of the loads can be removed, because their values are used (as input to the later stores). As a result, the loop still has side-effects on memory, so the compiler doesn't see that it can be deleted.
Contrary to n.m.'s answer, replacing int with unsigned does not make the problem go away. The code generated by Clang 3.4.1 using int and using unsigned int is identical.
It's an interesting issue with regards to optimizing. I would
expect that in most cases, the compiler would treat each element
of the array as an individual variable when doing dead code
analysis. Ans 0x8000 make too many individual variables to
track, so the compiler doesn't try. The fact that a[j]
doesn't always access the the same object could cause problems
as well for the optimizer.
Obviously, different compilers use different heuristics;
a compiler could treat the array as a single object, and detect
that it never affected output (observable behavior). Some
compilers may choose not to, however, on the grounds that
typically, it's a lot of work for very little gain: how often
would such optimizations be applicable in real code?
++a[j]; // This is undefined behavior too, but Clang doesn't see it
Are you saying this is undefined behavior because the array elements are uninitialized?
If so, although this is a common interpretation of clause 4.1/1 in the standard I believe it is incorrect. The elements are 'uninitialized' in the sense that programmers usually use this term, but I do not believe this corresponds exactly to the C++ specification's use of the term.
In particular C++11 8.5/11 states that these objects are in fact default initialized, and this seems to me to be mutually exclusive with being uninitialized. The standard also states that for some objects being default initialized means that 'no initialized is performed'. Some might assume this means that they are uninitialized but this is not specified and I simply take it to mean that no such performance is required.
The spec does make clear that the array elements will have indeterminant values. C++ specifies, by reference to the C standard, that indeterminant values can be either valid representations, legal to access normally, or trap representations. If the particular indeterminant values of the array elements happen to all be valid representations, (and none are INT_MAX, avoiding overflow) then the above line does not trigger any undefined behavior in C++11.
Since these array elements could be trap representations it would be perfectly conformant for clang to act as though they are guaranteed to be trap representations, effectively choosing to make the code UB in order to create an optimization opportunity.
Even if clang doesn't do that it could still choose to optimize based on the dataflow. Clang does know how to do that, as demonstrated by the fact that if the inner loop is changed slightly then the loops do get removed.
So then why does the (optional) presence of UB seem to stymie optimization, when UB is usually taken as an opportunity for more optimization?
What may be going on is that clang has decided that users want int trapping based on the hardware's behavior. And so rather than taking traps as an optimization opportunity, clang has to generate code which faithfully reproduces the program behavior in hardware. This means that the loops cannot be eliminated based on dataflow, because doing so might eliminate hardware traps.
C++14 updates the behavior such that accessing indeterminant values itself produces undefined behavior, independent of whether one considers the variable uninitialized or not: https://stackoverflow.com/a/23415662/365496
That is indeed very interesting. I tried your example with MSVC 2013.
My first idea was that the fact that the ++a[j] is somewhat undefined is the reason why the loop is not removed, because removing this would definetly change the meaning of the program from an undefined/incorrect semantic to something meaningful, so I tried to initialize the values before but the loops still did not dissappear.
Afterwards I replaced the ++a[j]; with an a[j] = 0; which then produced an output without any loop so everything between the two calls to clock() was removed. I can only guess about the reason. Perhaps the optimizer is not able to prove that the operator++ has no side effects for any reason.
I have no idea why this code complies :
int array[100];
array[-50] = 100; // Crash!!
...the compiler still compiles properly, without compiling errors, and warnings.
So why does it compile at all?
array[-50] = 100;
Actually means here:
*(array - 50) = 100;
Take into consideration this code:
int array[100];
int *b = &(a[50]);
b[-20] = 5;
This code is valid and won't crash. Compiler has no way of knowing, whether the code will crash or not and what programmer wanted to do with the array. So it does not complain.
Finally, take into consideration, that you should not rely on compiler warnings while finding bugs in your code. Compilers will not find most of your bugs, they barely try to make some hints for you to ease the bugfixing process (sometimes they even may be mistaken and point out, that valid code is buggy). Also, the standard actually never requires the compiler to emit warning, so these are only an act of good will of compiler implementers.
It compiles because the expression array[-50] is transformed to the equivalent
*(&array[0] + (-50))
which is another way of saying "take the memory address &array[0] and add to it -50 times sizeof(array[0]), then interpret the contents of the resulting memory address and those following it as an int", as per the usual pointer arithmetic rules. This is a perfectly valid expression where -50 might really be any integer (and of course it doesn't need to be a compile-time constant).
Now it's definitely true that since here -50 is a compile-time constant, and since accessing the minus 50th element of an array is almost always an error, the compiler could (and perhaps should) produce a warning for this.
However, we should also consider that detecting this specific condition (statically indexing into an array with an apparently invalid index) is something that you don't expect to see in real code. Therefore the compiler team's resources will be probably put to better use doing something else.
Contrast this with other constructs like if (answer = 42) which you do expect to see in real code (if only because it's so easy to make that typo) and which are hard to debug (the eye can easily read = as ==, whereas that -50 immediately sticks out). In these cases a compiler warning is much more productive.
The compiler is not required to catch all potential problems at compile time. The C standard allows for undefined behavior at run time (which is what happens when this program is executed). You may treat it as a legal excuse not to catch this kind of bugs.
There are compilers and static program analyzers that can do catch trivial bugs like this, though.
True compilers do (note: need to switch the compiler to clang 3.2, gcc is not user-friendly)
Compilation finished with warnings:
source.cpp:3:4: warning: array index -50 is before the beginning of the array [-Warray-bounds]
array[-50] = 100;
^ ~~~
source.cpp:2:4: note: array 'array' declared here
int array[100];
^
1 warning generated.
If you have a lesser (*) compiler, you may have to setup the warning manually though.
(*) ie, less user-friendly
The number inside the brackets is just an index. It tells you how many steps in memory to take to find the number you're requesting. array[2] means start at the beginning of array, and jump forwards two times.
You just told it to jump backwards 50 times, which is a valid statement. However, I can't imagine there being a good reason for doing this...
Updated, see below!
I have heard and read that C++0x allows an compiler to print "Hello" for the following snippet
#include <iostream>
int main() {
while(1)
;
std::cout << "Hello" << std::endl;
}
It apparently has something to do with threads and optimization capabilities. It looks to me that this can surprise many people though.
Does someone have a good explanation of why this was necessary to allow? For reference, the most recent C++0x draft says at 6.5/5
A loop that, outside of the for-init-statement in the case of a for statement,
makes no calls to library I/O functions, and
does not access or modify volatile objects, and
performs no synchronization operations (1.10) or atomic operations (Clause 29)
may be assumed by the implementation to terminate. [ Note: This is intended to allow compiler transfor-
mations, such as removal of empty loops, even when termination cannot be proven. — end note ]
Edit:
This insightful article says about that Standards text
Unfortunately, the words "undefined behavior" are not used. However, anytime the standard says "the compiler may assume P," it is implied that a program which has the property not-P has undefined semantics.
Is that correct, and is the compiler allowed to print "Bye" for the above program?
There is an even more insightful thread here, which is about an analogous change to C, started off by the Guy done the above linked article. Among other useful facts, they present a solution that seems to also apply to C++0x (Update: This won't work anymore with n3225 - see below!)
endless:
goto endless;
A compiler is not allowed to optimize that away, it seems, because it's not a loop, but a jump. Another guy summarizes the proposed change in C++0x and C201X
By writing a loop, the programmer is asserting either that the
loop does something with visible behavior (performs I/O, accesses
volatile objects, or performs synchronization or atomic operations),
or that it eventually terminates. If I violate that assumption
by writing an infinite loop with no side effects, I am lying to the
compiler, and my program's behavior is undefined. (If I'm lucky,
the compiler might warn me about it.) The language doesn't provide
(no longer provides?) a way to express an infinite loop without
visible behavior.
Update on 3.1.2011 with n3225: Committee moved the text to 1.10/24 and say
The implementation may assume that any thread will eventually do one of the following:
terminate,
make a call to a library I/O function,
access or modify a volatile object, or
perform a synchronization operation or an atomic operation.
The goto trick will not work anymore!
To me, the relevant justification is:
This is intended to allow compiler transfor- mations, such as removal of empty loops, even when termination cannot be proven.
Presumably, this is because proving termination mechanically is difficult, and the inability to prove termination hampers compilers which could otherwise make useful transformations, such as moving nondependent operations from before the loop to after or vice versa, performing post-loop operations in one thread while the loop executes in another, and so on. Without these transformations, a loop might block all other threads while they wait for the one thread to finish said loop. (I use "thread" loosely to mean any form of parallel processing, including separate VLIW instruction streams.)
EDIT: Dumb example:
while (complicated_condition()) {
x = complicated_but_externally_invisible_operation(x);
}
complex_io_operation();
cout << "Results:" << endl;
cout << x << endl;
Here, it would be faster for one thread to do the complex_io_operation while the other is doing all the complex calculations in the loop. But without the clause you have quoted, the compiler has to prove two things before it can make the optimisation: 1) that complex_io_operation() doesn't depend on the results of the loop, and 2) that the loop will terminate. Proving 1) is pretty easy, proving 2) is the halting problem. With the clause, it may assume the loop terminates and get a parallelisation win.
I also imagine that the designers considered that the cases where infinite loops occur in production code are very rare and are usually things like event-driven loops which access I/O in some manner. As a result, they have pessimised the rare case (infinite loops) in favour of optimising the more common case (noninfinite, but difficult to mechanically prove noninfinite, loops).
It does, however, mean that infinite loops used in learning examples will suffer as a result, and will raise gotchas in beginner code. I can't say this is entirely a good thing.
EDIT: with respect to the insightful article you now link, I would say that "the compiler may assume X about the program" is logically equivalent to "if the program doesn't satisfy X, the behaviour is undefined". We can show this as follows: suppose there exists a program which does not satisfy property X. Where would the behaviour of this program be defined? The Standard only defines behaviour assuming property X is true. Although the Standard does not explicitly declare the behaviour undefined, it has declared it undefined by omission.
Consider a similar argument: "the compiler may assume a variable x is only assigned to at most once between sequence points" is equivalent to "assigning to x more than once between sequence points is undefined".
Does someone have a good explanation of why this was necessary to allow?
Yes, Hans Boehm provides a rationale for this in N1528: Why undefined behavior for infinite loops?, although this is WG14 document the rationale applies to C++ as well and the document refers to both WG14 and WG21:
As N1509 correctly points out, the current draft essentially gives
undefined behavior to infinite loops in 6.8.5p6. A major issue for
doing so is that it allows code to move across a potentially
non-terminating loop. For example, assume we have the following loops,
where count and count2 are global variables (or have had their address
taken), and p is a local variable, whose address has not been taken:
for (p = q; p != 0; p = p -> next) {
++count;
}
for (p = q; p != 0; p = p -> next) {
++count2;
}
Could these two loops be merged and replaced by the following loop?
for (p = q; p != 0; p = p -> next) {
++count;
++count2;
}
Without the special dispensation in 6.8.5p6 for infinite loops, this
would be disallowed: If the first loop doesn't terminate because q
points to a circular list, the original never writes to count2. Thus
it could be run in parallel with another thread that accesses or
updates count2. This is no longer safe with the transformed version
which does access count2 in spite of the infinite loop. Thus the
transformation potentially introduces a data race.
In cases like this, it is very unlikely that a compiler would be able
to prove loop termination; it would have to understand that q points
to an acyclic list, which I believe is beyond the ability of most
mainstream compilers, and often impossible without whole program
information.
The restrictions imposed by non-terminating loops are a restriction on
the optimization of terminating loops for which the compiler cannot
prove termination, as well as on the optimization of actually
non-terminating loops. The former are much more common than the
latter, and often more interesting to optimize.
There are clearly also for-loops with an integer loop variable in
which it would be difficult for a compiler to prove termination, and
it would thus be difficult for the compiler to restructure loops
without 6.8.5p6. Even something like
for (i = 1; i != 15; i += 2)
or
for (i = 1; i <= 10; i += j)
seems nontrivial to handle. (In the former case, some basic number
theory is required to prove termination, in the latter case, we need
to know something about the possible values of j to do so. Wrap-around
for unsigned integers may complicate some of this reasoning further.)
This issue seems to apply to almost all loop restructuring
transformations, including compiler parallelization and
cache-optimization transformations, both of which are likely to gain
in importance, and are already often important for numerical code.
This appears likely to turn into a substantial cost for the benefit of
being able to write infinite loops in the most natural way possible,
especially since most of us rarely write intentionally infinite loops.
The one major difference with C is that C11 provides an exception for controlling expressions that are constant expressions which differs from C++ and makes your specific example well-defined in C11.
I think the correct interpretation is the one from your edit: empty infinite loops are undefined behavior.
I wouldn't say it's particularly intuitive behavior, but this interpretation makes more sense than the alternative one, that the compiler is arbitrarily allowed to ignore infinite loops without invoking UB.
If infinite loops are UB, it just means that non-terminating programs aren't considered meaningful: according to C++0x, they have no semantics.
That does make a certain amount of sense too. They are a special case, where a number of side effects just no longer occur (for example, nothing is ever returned from main), and a number of compiler optimizations are hampered by having to preserve infinite loops. For example, moving computations across the loop is perfectly valid if the loop has no side effects, because eventually, the computation will be performed in any case.
But if the loop never terminates, we can't safely rearrange code across it, because we might just be changing which operations actually get executed before the program hangs. Unless we treat a hanging program as UB, that is.
The relevant issue is that the compiler is allowed to reorder code whose side effects do not conflict. The surprising order of execution could occur even if the compiler produced non-terminating machine code for the infinite loop.
I believe this is the right approach. The language spec defines ways to enforce order of execution. If you want an infinite loop that cannot be reordered around, write this:
volatile int dummy_side_effect;
while (1) {
dummy_side_effect = 0;
}
printf("Never prints.\n");
I think this is along the lines of the this type of question, which references another thread. Optimization can occasionally remove empty loops.
I think the issue could perhaps best be stated, as "If a later piece of code does not depend on an earlier piece of code, and the earlier piece of code has no side-effects on any other part of the system, the compiler's output may execute the later piece of code before, after, or intermixed with, the execution of the former, even if the former contains loops, without regard for when or whether the former code would actually complete. For example, the compiler could rewrite:
void testfermat(int n)
{
int a=1,b=1,c=1;
while(pow(a,n)+pow(b,n) != pow(c,n))
{
if (b > a) a++; else if (c > b) {a=1; b++}; else {a=1; b=1; c++};
}
printf("The result is ");
printf("%d/%d/%d", a,b,c);
}
as
void testfermat(int n)
{
if (fork_is_first_thread())
{
int a=1,b=1,c=1;
while(pow(a,n)+pow(b,n) != pow(c,n))
{
if (b > a) a++; else if (c > b) {a=1; b++}; else {a=1; b=1; c++};
}
signal_other_thread_and_die();
}
else // Second thread
{
printf("The result is ");
wait_for_other_thread();
}
printf("%d/%d/%d", a,b,c);
}
Generally not unreasonable, though I might worry that:
int total=0;
for (i=0; num_reps > i; i++)
{
update_progress_bar(i);
total+=do_something_slow_with_no_side_effects(i);
}
show_result(total);
would become
int total=0;
if (fork_is_first_thread())
{
for (i=0; num_reps > i; i++)
total+=do_something_slow_with_no_side_effects(i);
signal_other_thread_and_die();
}
else
{
for (i=0; num_reps > i; i++)
update_progress_bar(i);
wait_for_other_thread();
}
show_result(total);
By having one CPU handle the calculations and another handle the progress bar updates, the rewrite would improve efficiency. Unfortunately, it would make the progress bar updates rather less useful than they should be.
Does someone have a good explanation of why this was necessary to allow?
I have an explanation of why it is necessary not to allow, assuming that C++ still has the ambition to be a general-purpose language suitable for performance-critical, low-level programming.
This used to be a working, strictly conforming freestanding C++ program:
int main()
{
setup_interrupts();
for(;;)
{}
}
The above is a perfectly fine and common way to write main() in many low-end microcontroller systems. The whole program execution is done inside interrupt service routines (or in case of RTOS, it could be processes). And main() may absolutely not be allowed to return since these are bare metal systems and there is nobody to return to.
Typical real-world examples of where the above design can be used are PWM/motor driver microcontrollers, lighting control applications, simple regulator systems, sensor applications, simple household electronics and so on.
Changes to C++ have unfortunately made the language impossible to use for this kind of embedded systems programming. Existing real-world applications like the ones above will break in dangerous ways if ported to newer C++ compilers.
C++20 6.9.2.3 Forward progress [intro.progress]
The implementation may assume that any thread will eventually do one of the following:
(1.1) — terminate,
(1.2) — make a call to a library I/O function,
(1.3) — perform an access through a volatile glvalue, or
(1.4) — perform a synchronization operation or an atomic operation.
Lets address each bullet for the above, previously strictly conforming freestanding C++ program:
1.1. As any embedded systems beginner can tell the committee, bare metal/RTOS microcontroller systems never terminate. Therefore the loop cannot terminate either or main() would turn into runaway code, which would be a severe and dangerous error condition.
1.2 N/A.
1.3 Not necessarily. One may argue that the for(;;) loop is the proper place to feed the watchdog, which in turn is a side effect as it writes to a volatile register. But there may be implementation reasons why you don't want to use a watchdog. At any rate, it is not C++'s business to dictate that a watchdog should be used.
1.4 N/A.
Because of this, C++ is made even more unsuitable for embedded systems applications than it already was before.
Another problem here is that the standard speaks of "threads", which are higher level concepts. On real-world computers, threads are implemented through a low-level concept known as interrupts. Interrupts are similar to threads in terms of race conditions and concurrent execution, but they are not threads. On low-level systems there is just one core and therefore only one interrupt at a time is typically executed (kind of like threads used to work on single core PC back in the days).
And you can't have threads if you can't have interrupts. So threads would have to be implemented by a lower-level language more suitable for embedded systems than C++. The options being assembler or C.
By writing a loop, the programmer is asserting either that the loop does something with visible behavior (performs I/O, accesses volatile objects, or performs synchronization or atomic operations), or that it eventually terminates.
This is completely misguided and clearly written by someone who has never worked with microcontroller programming.
So what should those few remaining C++ embedded programmers who refuse to port their code to C "because of reasons" do? You have to introduce a side effect inside the for(;;) loop:
int main()
{
setup_interrupts();
for(volatile int i=0; i==0;)
{}
}
Or if you have the watchdog enabled as you ought to, feed it inside the for(;;) loop.
It is not decidable for the compiler for non-trivial cases if it is an infinite loop at all.
In different cases, it can happen that your optimiser will reach a better complexity class for your code (e.g. it was O(n^2) and you get O(n) or O(1) after optimisation).
So, to include such a rule that disallows removing an infinite loop into the C++ standard would make many optimisations impossible. And most people don't want this. I think this quite answers your question.
Another thing: I never have seen any valid example where you need an infinite loop which does nothing.
The one example I have heard about was an ugly hack that really should be solved otherwise: It was about embedded systems where the only way to trigger a reset was to freeze the device so that the watchdog restarts it automatically.
If you know any valid/good example where you need an infinite loop which does nothing, please tell me.
I think it's worth pointing out that loops which would be infinite except for the fact that they interact with other threads via non-volatile, non-synchronised variables can now yield incorrect behaviour with a new compiler.
I other words, make your globals volatile -- as well as arguments passed into such a loop via pointer/reference.