EDITED and refined my question after Johannes's valuable answer
bool b = true;
volatile bool vb = true;
void f1() { }
void f2() { b = false; }
void (*volatile pf)() = &f1; // a volatile pointer to function

int main()
{
    // different threads start here, some of which may change pf
    while (b && vb)
    {
        pf();
    }
}
So, let's forget synchronization for a while. The question is whether b has to be declared volatile. I have read the standard and sort-of know the formal definition of volatile semantics (I even almost understand them, the word almost being the key). But let's be a bit informal here. If the compiler sees that there is no way for b to change inside the loop, then unless b is volatile it can optimize the read away and treat the condition as equivalent to while(vb). The question is: in this case pf is itself volatile, so is the compiler allowed to assume that b won't change in the loop even if b is not volatile?
Please refrain from comments and answers which address the style of this piece of code, this is not a real-world example, this is an experimental theoretical question.
Comments and answers which, apart from answering my question, also address the semantics of volatile in greater detail which you think I have misunderstood are very much welcome.
I hope my question is clear. TIA
Editing once more:
what about this?
#include <iostream>

bool b = true;
volatile bool vb = true;

void f1() {}
void f2() { b = false; }

void (*pf)() = &f1;

int main()
{
    // threads here
    while (b && vb)
    {
        int x;
        std::cin >> x;
        if (x == 0)
            pf = &f1;
        else
            pf = &f2;
        pf();
    }
}
Is there a fundamental difference between the two programs? If yes, what is the difference?
The question is, in this case pf is itself volatile, so is the compiler allowed to assume that b won't change in the loop even if b is not volatile?
It can't, because you say that pf might be changed by other threads, and calling pf from the while loop can then indirectly change b. So while the compiler is formally not required to re-read a non-volatile b, in practice it must read it to determine whether to short-circuit (once b becomes false, it must not read vb again).
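Informally, the short-circuit in b && vb imposes an evaluation order, which can be sketched roughly as follows (whether the read of b may be hoisted out of the loop is exactly the point in question):

// rough sketch of what each iteration of "while (b && vb) { pf(); }" must do
for (;;) {
    if (!b)      // b is evaluated first
        break;
    if (!vb)     // the volatile read of vb happens only if b was true
        break;
    pf();
}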
Answer to the second part
In this case pf is not volatile anymore, so the compiler can get rid of it and see that f1 has an empty body and f2 sets b to false. It could optimize main as follows
int main()
{
    // threads here (which you say can only change "vb")
    while (vb)
    {
        int x;
        std::cin >> x;
        if (x != 0)
            break;
    }
}
Answer to older revision
One condition for the compiler to be allowed to optimize the loop away is that the loop does not access or modify any volatile object (See [stmt.iter]p5 in n3126). You do that here, so it can't optimize the loop away. In C++03 a compiler wasn't allowed to optimize even the non-volatile version of that loop away (but compilers did it anyway).
Note that another condition for being able to optimize it away is that the loop contains no synchronization or atomic operations. In a multithreaded program, such operations should be present anyway. So even if you get rid of that volatile, if your program is properly coded I don't think the compiler can optimize the loop away entirely.
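For illustration, a minimal sketch of the flag written with an atomic instead of volatile (C++11 and later; the names are mine, not from the question):

#include <atomic>

std::atomic<bool> b{true};   // other threads eventually do b.store(false)

void work() { /* ... */ }

int main()
{
    // threads that may clear b start here
    while (b.load())         // an atomic load; in practice it is re-read every iteration
    {
        work();
    }
}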
The exact requirements on volatile in a case like this are, as I understand it, not entirely well-defined by the current C++ standard, since the standard doesn't really deal with multi-threading. It's basically a compiler hint. So, instead, I'll address what happens in a typical compiler.
First, suppose the compiler is compiling your functions independently, and then linking them together. In either example, you have a loop in which you're checking a variable, and calling a function pointer. Within the context of that function, the compiler has no idea what the function behind that function pointer will do, and thus it must always re-load b from memory after calling it. Thus, volatile is irrelevant there.
Expanding that to your first actual case, and allowing the compiler to make whole-program optimizations, because pf is volatile the compiler still has no idea what it's going to be pointing at (it can't even assume it's either f1 or f2!), and thus likewise cannot make any assumptions about what will be unmodified across the function-pointer call -- and so volatile on b is still irrelevant.
Your second case is actually simpler -- vb in it is a red herring. If you eliminate that, you can see that even in completely single-threaded semantics, the function-pointer call may modify b. You're not doing anything with undefined behavior, and so the program must operate correctly without volatile -- remember that, if you aren't considering a situation with external thread tweaks, volatile is a no-op. Therefore, without vb in the picture, you cannot possibly need volatile, and it's pretty clear that adding vb changes nothing.
Thus, in summary: You don't need volatile in either case. The difference, insofar as there is one, is that in the first case, if pf were not volatile, a sufficiently advanced compiler could possibly optimize b away, whereas in the second case it cannot do so even without volatile anywhere in the program. In practice, I do not expect any compilers would actually make that optimization.
volatile only hurts you if you think you could have benefited from an optimization that can't be done or if it communicates something that isn't true.
In your case, you said that these variables can be changed by other threads. Reading code, that's my assumption when I see volatile, so from a maintainer's perspective, that's good -- it's giving me extra information (which is true).
I don't know whether the optimizations are worth trying to salvage since you said this isn't the real code, but if they aren't then there aren't any reasons to not use volatile.
Not using volatile when you are supposed to results in incorrect behavior, since the optimizations are changing the meaning of the code.
I worry about coding to the minutiae of the standard and the behavior of your compilers, because things like this can change, and even if they don't, your code changes (which could affect what the compiler does) -- so, unless you are looking for micro-optimization improvements on this specific code, I'd just leave it volatile.
Related
Say I have the function
#include <cstdio>

void foo(int * const bar){
    while(!*bar){
        printf("qwertyuiop\n");
    }
}
where I intend to change the value at bar to something other than 0 to stop this loop. Would it be appropriate to instead write it as below?
void foo(int volatile * const bar){
    while(!*bar){
        printf("qwertyuiop\n");
    }
}
volatile was intended for things like memory-mapped device registers, where the pointed-to value could "magically" change "behind the compiler's back" due to the nature of the hardware involved. Assuming you're not writing code that deals with special hardware that might "spontaneously" change the value that bar points to, then you needn't (and shouldn't) use the volatile keyword. Simply omitting the const keyword is sufficient to let the compiler (and any programmer that might call the function) know that the pointed-to value is subject to change.
Note that if you are intending to set *bar from another thread, then the volatile keyword isn't good enough; even if you tag the pointer volatile, the compiler still won't guarantee correct handling. For that use-case to work correctly, you need to either synchronize all reads and writes to *bar with a mutex, or alternatively use a std::atomic<int> instead of a plain int.
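For example, a sketch of the std::atomic variant (C++11; the changed signature is mine, not part of the original question):

#include <atomic>
#include <cstdio>

void foo(std::atomic<int> *bar) {
    // the load happens on every iteration, and a store made by another
    // thread is guaranteed to eventually become visible here
    while (!bar->load()) {
        std::printf("qwertyuiop\n");
    }
}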
In this question, "Will a static variable always use up memory?", it is stated that compilers are allowed to optimize away a static variable if its address is never taken, e.g. like the following:
#include <cstdio>

void f() {
    static int i = 3;
    printf( "%d", i );
}
If there exists a function which takes its arguments by reference, is the compiler still allowed to optimize away the variable, e.g. as in
void ref( int & i ) {
    printf( "%d", i );
}

void f() {
    static int i = 3;
    ref( i );
}
Is the situation different for the "perfect forwarding" case? Here the function body is empty on purpose:
template< typename T >
void fwd( T && i ) {
}

void f() {
    static int i = 3;
    fwd( i );
}
Furthermore, would the compiler be allowed to optimize the call in the following case. (The function body is empty on purpose again):
void ptr( int * i ) {
}

void f() {
    static int i = 3;
    ptr( &i );
}
My questions arise from the fact that references are not pointers according to the standard, but are usually implemented as pointers.
Apart from "is the compiler allowed to?", I am actually more interested in whether compilers actually do this kind of optimization.
that compilers are allowed to optimize away a static variable if the address is never taken
You seem to have concentrated on the wrong part of the answer. The answer states:
the compiler can do anything it wants to your code so long as the observable behavior is the same
The end. You can take the address or not take it, calculate the meaning of life and calculate how to cure cancer; the only thing that matters is the observable effect. As long as you don't actually cure cancer (or output the results of the calculations...), all the calculations are just no-ops.
If there exists a function which takes its arguments by reference, is the compiler still allowed to optimize away the variable
Yes. The code is just putchar('3').
Is the situation different for the "perfect forwarding" case
No. The code is still just putchar('3').
would the compiler be allowed to optimize the call in the following case
Yes. This code has no observable effect, contrary to the previous ones. The call to f() can just be removed.
whether compilers do this kind of optimization?
Copy your code to https://godbolt.org/ and inspect the assembly code. Even with no experience in assembly code, you will see differences with different code and compilers.
Choose x86 gcc (trunk) and remember to enable optimizations with -O. Copy the code with static, then remove static -- did the generated code change? Repeat for all the code snippets.
Compilers are allowed to optimize out variables under the "as-if" rule, meaning that the compiler is allowed to do any optimization that doesn't alter the observable behaviour of the program. Whether the optimization actually occurs depends on how good the compiler's optimizer is, what optimization level you request, and whether the optimization belongs to a class of optimizations that actually improve performance (humans are not very good at predicting this).
In all of the examples you gave, the as-if rule gives the compiler latitude to eliminate the static variable.
In example 1, the definition of f is equivalent to void f() { printf("%d", 3); }. Since this has the exact same observable behaviour as the f you wrote, the compiler is allowed to replace one by the other, optimizing out the variable.
In example 2, since fwd does nothing, the definition of f is equivalent to void f() {}. Again, the as-if rule allows the compiler to replace the f you wrote with this empty function.
Example 3 is very similar to example 2 in terms of the implications of the as-if rule.
If you want to see whether a compiler will actually perform these optimizations, Godbolt is very useful. For example, if you look here, you'll see that at -O2, both GCC and Clang will perform the optimization described for example 1. They probably do this by first inlining ref into f.
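As a rough, conceptual sketch of that pipeline for example 1 (the function names are mine; this is not literally what either compiler emits):

#include <cstdio>

// Step 1: after inlining ref into f, the body is effectively:
void f_after_inlining() {
    static int i = 3;
    std::printf("%d", i);
}

// Step 2: i is never modified and its address never escapes the function,
// so the static can be folded away entirely:
void f_after_constant_propagation() {
    std::printf("%d", 3);
}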
I have code that calculates an array index, and if it is valid accesses that array item. Something like:
int b = rowCount() - 1;
if (b == -1) return;
const BlockInfo& bi = blockInfo[b];
I am worried that this might be triggering undefined behavior. For example, the compiler might assume that b is always non-negative, since I use it to index the array, so it will optimize the if clause away.
Under which circumstances is it safe to "access" an array out-of-bounds, when you do nothing with the invalid result? Does it change if blockInfo is not an actual array, but a container like a vector? If this is unsafe, could I fix it by putting the access in an else clause?
if (b == -1) {
    return;
} else {
    const BlockInfo& bi = blockInfo[b];
}
Lastly, are there compiler flags in the spirit of -fno-strict-aliasing or -fno-delete-null-pointer-checks that make the compiler "do the obvious thing" and prevent any unwanted behavior?
For clarification: My concern is specifically because of a different issue, where you intend to test whether a pointer is non-null before accessing it. The compiler turns this around and reasons that, since you are dereferencing it, it cannot have been null! Something like this (untested):
void someFunc(struct MyStruct *s) {
    if (s != NULL) {
        std::cout << s->someField << std::endl;
        delete s;
    }
}
I recall hearing that simply forming an out-of-bounds array access is UB in C++. Thus the compiler could legally assume the array index is not out of bounds, and remove checks to the contrary.
There is no access to blockInfo[-1] in your program. Your code specifically prohibits that.
For example, the compiler might assume that b is always non-negative, since I use it to index the array, so it will optimize the if clause away.
No, it cannot do that, precisely because an access to index -1 (or, rather, (std::size_t)-1) may or may not be a valid index. The language does let you pass -1 as an index; it'll just be converted first to a std::size_t with the well-defined unsigned wrap-around logic that comes with doing so. So there is not, and cannot be, any rule whereby the compiler is permitted to assume that you will never pass int -1 as an index.
Even if there were, it'd still make no sense to let the compiler completely ignore the if statement. If it could, if our if statements were not reliable, every program in the world would be unsafe! There'd be no way to enforce any of your operations' preconditions.
The compiler may only skip or re-order things when it can prove that doing so results in a well-defined program with the same behaviour as your original instructions, given any possible input.
In fact, this is where UB comes from: where proving correctness is really difficult, that's usually where the standard throws compilers a bone and says something is "undefined" and the compiler can just do whatever it likes.
One interesting example of this is kind of the opposite of your case, where a check is [erroneously] placed after the access, and the compiler therefore assumes the check passes, whether it actually did or not:
void foo(char* ptr)
{
    char x = *ptr;
    if (ptr)
        bar();
    else
        baz();
}
The function foo may call bar() even if ptr is null! That might sound unlikely to you, but it actually does happen (e.g. this crash in a widely-used library).
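For contrast, if the check comes before the dereference, the compiler gets no such licence; the same sketch with the two steps swapped (bar and baz declared only so the snippet is self-contained):

void bar();
void baz();

void foo(char* ptr)
{
    if (ptr)              // test first
    {
        char x = *ptr;    // dereference only on the non-null path
        (void)x;
        bar();
    }
    else
    {
        baz();
    }
}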
could I fix it by putting the access in an else clause?
Those two pieces of code are semantically equivalent; it's the same program.
Lastly, are there compiler flags in the spirit of -fno-strict-aliasing or -fno-delete-null-pointer-checks that make the compiler "do the obvious thing" and prevent any unwanted behavior?
The compiler already does the obvious thing, as long as "obvious" is "according to the C++ standard".
the compiler might assume
If the compiler proceeds from a wrong assumption, then it's wrong and defective.
Under which circumstances is it safe to "access" an array out-of-bounds, when you do nothing with the invalid result?
It is never safe to access an array out of bounds, because that produces UB before you have a chance to use or not-use the result. However, an untaken branch in the code doesn't count as an access, as in your first or second examples. So, if I understand your last question, there's no need for a special flag.
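The same reasoning applies if blockInfo is a container rather than a raw array; a small sketch (the types and the rowCount definition are mine, for illustration only):

#include <vector>

struct BlockInfo { int data; };
std::vector<BlockInfo> blockInfo;

int rowCount() { return static_cast<int>(blockInfo.size()); }

void useLastBlock() {
    int b = rowCount() - 1;
    if (b == -1) return;                  // the guard is what keeps the next line from running
    const BlockInfo& bi = blockInfo[b];   // reached only when b is a valid index
    (void)bi;
}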
I have stumbled upon the following code structure and I'm wondering whether this is intentional or just poor understanding of casting mechanisms:
struct AbstractBase {
    virtual void doThis() {
        // Basic implementation here.
    }
    virtual void doThat() = 0;
};

struct DerivedA : public AbstractBase {
    virtual void doThis() {
        // Other implementation here.
    }
    virtual void doThat() {
        // some stuff here.
    }
};
// More derived classes with similar structure....
// Dubious stuff happening here:
void strangeStuff(AbstractBase* pAbstract, int switcher){
    AbstractBase* a = NULL;
    switch(switcher){
    case TYPE_DERIVED_A:
        // why would someone use the abstract base pointer here???
        a = dynamic_cast<DerivedA*>(pAbstract);
        a->doThis();
        a->doThat();
        break;
    // similar case statements with other derived classes...
    }
}

// "main"
DerivedA* pDerivedA = new DerivedA;
strangeStuff( pDerivedA, TYPE_DERIVED_A );
My guess is that this dynamic_cast statement is just the result of poor understanding and very bad programming style in general (the whole way the code works just feels painful to me), and that it doesn't cause any change in behaviour for this specific use case.
However, since I'm not an expert on casting, I'd like to know whether there are any subtle side-effects that I'm not aware of.
[C++11: 5.2.7/9]: "The value of a failed cast to pointer type is the null pointer value of the required result type."
The dynamic_cast can return NULL if the type was wrong, making the following lines crash. Hence, this can be either 1. an attempt to make (logical) errors more explicit, or 2. some sort of in-code documentation.
So while it doesn't look like the best design, it is not exactly true that the cast has no effect whatsoever.
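If the cast is kept, the usual idiom is to check its result before using it; a minimal sketch reusing the types above (the helper function name is mine):

void handleDerivedA(AbstractBase* pAbstract) {
    DerivedA* d = dynamic_cast<DerivedA*>(pAbstract);
    if (d) {               // dynamic_cast yields a null pointer on a type mismatch
        d->doThis();
        d->doThat();
    } else {
        // switcher and the dynamic type disagree: handle the logic error here
    }
}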
My guess would be that the coder screwed up.
A second guess would be that you skipped a check for a being null in your simplification.
A third, and highly unlikely possibility, is that the coder was exploiting undefined behavior to optimize.
With this code:
a = dynamic_cast<DerivedA*>(pAbstract);
a->doThis();
if pAbstract does not actually point to a DerivedA (or something more derived), then a is null and a->doThis() is undefined behavior. And if it does, then the dynamic_cast does absolutely nothing (guaranteed).
A compiler can, in theory, optimize any other possibility away, even if you did not change the type of a, and remain conforming. Even if someone later checks whether a is null, the execution of undefined behavior on the very next line means the compiler is free not to set a to null on the dynamic_cast line.
I would doubt that a given compiler would do this, but I could be wrong.
There are compilers that detect that certain paths would cause undefined behavior (later on), propagate that knowledge backwards in execution to the point where the undefined behavior would have been set in motion, and then "know" that the code in question cannot be in the state that would trigger the undefined behavior. They can then use this knowledge to optimize the code in question.
Here is an example:
#include <string>
#include <sstream>

std::string foo( unsigned int x ) {
    std::string r;
    if (x == (unsigned)-1) {
        r = "hello ";
    }
    int y = x;
    std::stringstream ss;
    ss << y;
    r += ss.str();
    return r;
}
The compiler can see the y=x line above. If x would overflow an int, then the conversion y=x is undefined behavior. It happens regardless of the result of the first branch.
In short, if the first branch runs, undefined behavior would result. And undefined behavior can do anything, including time travel -- it can go back in time and prevent that branch from being taken.
So the
if (x == (unsigned)-1) {
    r = "hello ";
}
branch can be eliminated by the optimizer, legally in C++.
While the above is just a toy case, gcc does optimizations very much like this. There is a flag to tell it not to.
((unsigned)-1 is defined behavior, but overflowing an int is not, in C++. In practice, this is because there are platforms on which signed int overflow causes problems, and C++ doesn't want to impose extra costs on them to make a conforming implementation.)
dynamic_cast will confirm that the dynamic type does match the type indicated by the switcher variable, making the code slightly less dangerous. However, it will give a null pointer in the case of a mismatch, and the code neglects to check for that.
But it seems more likely that the author didn't really understand the use of virtual functions (for uniform treatment of polymorphic types) and RTTI (for the rarer cases where you need to distinguish between types), and attempted to invent their own form of manual, error-prone type identification.
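For contrast, a sketch of the uniform-treatment version, where the virtual functions make both the switch and the cast unnecessary:

void strangeStuff(AbstractBase* pAbstract) {
    // virtual dispatch selects the DerivedA (or other) overrides automatically
    pAbstract->doThis();
    pAbstract->doThat();
}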
I learned that pointer aliasing may hurt performance, and that a __restrict__ attribute (in GCC, or equivalent attributes in other implementations) may help keep track of which pointers should or should not be aliased. Meanwhile, I also learned that GCC's implementation of valarray stores a __restrict__'ed pointer (line 517 in https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.1/valarray-source.html), which I think hints to the compiler (and responsible users) that the private pointer can be assumed not to be aliased anywhere in valarray methods.
But if we alias a pointer to a valarray object, for example:
#include <valarray>

int main() {
    std::valarray<double> *a = new std::valarray<double>(10);
    std::valarray<double> *b = a;
    return 0;
}
is it valid to say that the member pointer of a is aliased too? And would the very existence of b hurt any optimizations that valarray methods could benefit otherwise? (Is it bad practice to point to optimized pointer containers?)
Let's first understand how aliasing hurts optimization.
Consider this code,
void
process_data(float *in, float *out, float gain, int nsamps)
{
    int i;
    for (i = 0; i < nsamps; i++) {
        out[i] = in[i] * gain;
    }
}
In C or C++, it is legal for the parameters in and out to point to overlapping regions in memory.... When the compiler optimizes the function, it does not in general know whether in and out are aliases. It must therefore assume that any store through out can affect the memory pointed to by in, which severely limits its ability to reorder or parallelize the code (For some simple cases, the compiler could analyze the entire program to determine that two pointers cannot be aliases. But in general, it is impossible for the compiler to determine whether or not two pointers are aliases, so to be safe, it must assume that they are).
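For comparison, a sketch of the same function with the non-aliasing promise spelled out (__restrict__ is a GCC/Clang extension rather than standard C++, and keeping in and out non-overlapping becomes the caller's responsibility):

void
process_data_restrict(float *__restrict__ in, float *__restrict__ out,
                      float gain, int nsamps)
{
    // in and out are promised never to overlap, so a store to out[i]
    // cannot change any in[j]; the loop can be reordered or vectorized freely.
    for (int i = 0; i < nsamps; i++) {
        out[i] = in[i] * gain;
    }
}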
Coming to your code,
#include <valarray>

int main() {
    std::valarray<double> *a = new std::valarray<double>(10);
    std::valarray<double> *b = a;
    return 0;
}
Since a and b are aliases, the underlying storage used by the valarray will also be aliased (I think it uses an array; I am not very sure about this). So, any part of your code that uses a and b in a fashion similar to that shown above will not benefit from compiler optimizations like parallelization and reordering. Note that JUST the existence of b will not hurt optimization -- what matters is how you use it.
Credits:
The quoted part and the code are taken from here. This should serve as a good source for more information about the topic as well.
is it valid to say that the member pointer of a is aliased too?
Yes. For example, (*a)[0] and (*b)[0] reference the same object. That's aliasing.
And would the very existence of b hurt any optimizations that valarray methods could benefit otherwise?
No.
You haven't done anything with b in your sample code. Suppose you have a function much larger than this sample code that starts with the same construct. There's usually no problem if the first several lines of that function use a but never b, and the remaining lines use b but never a. Usually. (Optimizing compilers do rearrange lines of code, however.)
If, on the other hand, you intermingle uses of a and b, you aren't merely hurting the optimizations. You are doing something much worse: you are invoking undefined behavior. "Don't do it" is the best solution to the undefined behavior problem.
Addendum
The C restrict and gcc __restrict__ keywords are not constraints on the developers of the compiler or the standard library. Those keywords are promises to the compiler/library that restricted data do not overlap other data. The compiler/library doesn't check whether the programmer violated this promise. If this promise enables certain optimizations that might otherwise be invalid with overlapping data, the compiler/library is free to apply those optimizations.
What this means is that restrict (or __restrict__) is a restriction on you, not the compiler. You can violate those restrictions even without your b pointer. For example, consider
*a = (*a)[std::slice(a->size() - 1, a->size(), -1)];
This is undefined behavior.