I learned that pointer aliasing may hurt performance, and that a __restrict__ qualifier (in GCC, or equivalent attributes in other implementations) may help the compiler keep track of which pointers can and cannot be aliased. Meanwhile, I also learned that GCC's implementation of valarray stores a __restrict__'ed pointer (line 517 in https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.1/valarray-source.html), which I take as a hint to the compiler (and to responsible users) that the private pointer can be assumed not to be aliased anywhere inside valarray's methods.
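For reference, the member in question looks roughly like this (paraphrased from the libstdc++ source linked above):

    template <typename _Tp>
    class valarray
    {
        // ...
    private:
        size_t _M_size;
        _Tp* __restrict__ _M_data;  // element storage, promised not to alias
    };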
But if we alias a pointer to a valarray object, for example:
#include <valarray>

int main() {
    std::valarray<double> *a = new std::valarray<double>(10);
    std::valarray<double> *b = a;
    return 0;
}
is it valid to say that the member pointer of a is aliased too? And would the very existence of b hurt any optimizations that valarray methods could otherwise benefit from? (Is it bad practice to take pointers to such optimized containers?)
Let's first understand how aliasing hurts optimization.
Consider this code,
void process_data(float *in, float *out, float gain, int nsamps)
{
    int i;
    for (i = 0; i < nsamps; i++) {
        out[i] = in[i] * gain;
    }
}
In C or C++, it is legal for the parameters in and out to point to overlapping regions of memory. When the compiler optimizes the function, it does not in general know whether in and out are aliases. It must therefore assume that any store through out can affect the memory pointed to by in, which severely limits its ability to reorder or parallelize the code. (For some simple cases, the compiler could analyze the entire program to determine that two pointers cannot be aliases. But in general, it is impossible for the compiler to determine whether or not two pointers are aliases, so to be safe, it must assume that they are.)
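For comparison, here is a sketch of what the no-alias promise looks like when the caller really can guarantee that the buffers are disjoint (C99 spells the keyword restrict; GCC accepts __restrict__ in C++ as well):

    // Sketch: with the promise in place, the compiler may keep in[i] in
    // registers, reorder, or vectorize, because stores through out are
    // guaranteed not to touch the memory behind in.
    void process_data(const float * __restrict__ in,
                      float * __restrict__ out,
                      float gain, int nsamps)
    {
        for (int i = 0; i < nsamps; i++) {
            out[i] = in[i] * gain;
        }
    }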
Coming to your code,
#include <valarray>

int main() {
    std::valarray<double> *a = new std::valarray<double>(10);
    std::valarray<double> *b = a;
    return 0;
}
Since a and b are aliases, the underlying storage used by valarray is aliased as well (per the libstdc++ source mentioned in the question, the elements live in a heap-allocated array reached through a __restrict__'ed pointer). So any part of your code that uses a and b in a fashion similar to that shown above will not benefit from compiler optimizations like parallelization and reordering. Note that the mere existence of b will not hurt optimization; what matters is how you use it.
Credits:
The quoted part and the code are taken from here. That page should serve as a good source for more information about the topic as well.
is it valid to say that the member pointer of a is aliased too?
Yes. For example, (*a)[0] and (*b)[0] refer to the same object. That's aliasing.
And would the very existence of b hurt any optimizations that valarray methods could otherwise benefit from?
No.
You haven't done anything with b in your sample code. Suppose you have a function much larger than this sample that starts with the same construct. There's usually no problem if the first several lines of that function use a but never b, and the remaining lines use b but never a. Usually. (Optimizing compilers do rearrange lines of code, however.)
If on the other hand you intermingle uses of a and b, you aren't hurting the optimizations. You are doing something much worse: You are invoking undefined behavior. "Don't do it" is the best solution to the undefined behavior problem.
Addendum
The C restrict and gcc __restrict__ keywords are not constraints on the developers of the compiler or the standard library. Those keywords are promises to the compiler/library that restricted data do not overlap other data. The compiler/library doesn't check whether the programmer violated this promise. If this promise enables certain optimizations that might otherwise be invalid with overlapping data, the compiler/library is free to apply those optimizations.
What this means is that restrict (or __restrict__) is a restriction on you, not the compiler. You can violate those restrictions even without your b pointer. For example, consider
*a = (*a)[std::slice(a->size() - 1, a->size(), -1)];
This is undefined behavior.
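If you actually wanted that reversal, one well-defined way, sketched here, is to build the result in a temporary so that source and destination never overlap:

    // Sketch: reverse *a through a temporary; the final assignment copies
    // from a distinct object, so there is no overlapping access to violate
    // the no-alias promise.
    std::valarray<double>& v = *a;
    std::valarray<double> tmp(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        tmp[i] = v[v.size() - 1 - i];
    v = tmp;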
Related
I have code that calculates an array index, and if it is valid accesses that array item. Something like:
int b = rowCount() - 1;
if (b == -1) return;
const BlockInfo& bi = blockInfo[b];
I am worried that this might be triggering undefined behavior. For example, the compiler might assume that b is always non-negative, since I use it to index the array, so it will optimize the if clause away.
Under which circumstances is it safe to "access" an array out-of-bounds, when you do nothing with the invalid result? Does it change if blockInfo is not an actual array, but a container like std::vector? If this is unsafe, could I fix it by putting the access in an else clause?
if (b == -1) {
    return;
} else {
    const BlockInfo& bi = blockInfo[b];
}
Lastly, are there compiler flags in the spirit of -fno-strict-aliasing or -fno-delete-null-pointer-checks that make the compiler "do the obvious thing" and prevent any unwanted behavior?
For clarification: My concern is specifically because of a different issue, where you intend to test whether a pointer is non-null before accessing it. The compiler turns this around and reasons that, since you are dereferencing it, it cannot have been null! Something like this (untested):
void someFunc(struct MyStruct *s) {
    if (s != NULL) {
        std::cout << s->someField << std::endl;
        delete s;
    }
}
I recall hearing that simply forming an out-of-bounds array access is UB in C++. Thus the compiler could legally assume the array index is not out of bounds, and remove checks to the contrary.
There is no access to blockInfo[-1] in your program. Your code specifically prohibits that.
For example, the compiler might assume that b is always non-negative, since I use it to index the array, so it will optimize the if clause away.
No, it cannot do that, precisely because an access to index -1 (or, rather, (std::size_t)-1) may or may not be a valid index. The language does let you pass -1 as an index; it'll just be converted first to a std::size_t with the well-defined unsigned wrap-around logic that comes with doing so. So there is not, and cannot be, any rule whereby the compiler is permitted to assume that you will never pass int -1 as an index.
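A quick sketch of that conversion (the printed value depends on the width of std::size_t on your platform):

    #include <cstddef>
    #include <iostream>

    int main()
    {
        int b = -1;
        std::size_t idx = static_cast<std::size_t>(b);  // well-defined wrap-around
        std::cout << idx << '\n';  // e.g. 18446744073709551615 on a 64-bit system
        return 0;
    }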
Even if there were, it'd still make no sense to let the compiler completely ignore the if statement. If it could, if our if statements were not reliable, every program in the world would be unsafe! There'd be no way to enforce any of your operations' preconditions.
The compiler may only skip or re-order things when it can prove that doing so results in a well-defined program with the same behaviour as your original instructions, given any possible input.
In fact, this is where UB comes from: where proving correctness is really difficult, that's usually where the standard throws compilers a bone and says something is "undefined" and the compiler can just do whatever it likes.
One interesting example of this is kind of the opposite of your case, where a check is [erroneously] placed after the access, and the compiler therefore assumes the check passes, whether it actually did or not:
void foo(char* ptr)
{
    char x = *ptr;
    if (ptr)
        bar();
    else
        baz();
}
The function foo may call bar() even if ptr is null! That might sound unlikely to you, but it actually does happen (e.g. this crash in a widely-used library).
could I fix it by putting the access in an else clause?
Those two pieces of code are semantically equivalent; it's the same program.
Lastly, are there compiler flags in the spirit of -fno-strict-aliasing or -fno-delete-null-pointer-checks that make the compiler "do the obvious thing" and prevent any unwanted behavior?
The compiler already does the obvious thing, as long as "obvious" is "according to the C++ standard".
the compiler might assume
If the compiler proceeds from a wrong assumption, then it's wrong and defective.
Under which circumstances is it safe to "access" an array out-of-bounds, when you do nothing with the invalid result?
It is never safe to access an array out of bounds, because that produces UB before you have a chance to use or not-use the result. However, an untaken branch in the code doesn't count as an access, as in your first or second examples. So, if I understand your last question, there's no need for a special flag.
Is there any advantage to specifying the MSVC/GCC non-standard __restrict qualifier on a function pointer parameter if it is the only pointer parameter? For example,
int longCalculation(int a, int* __restrict b)
My guess is that it should allow better optimization, since it implies that b does not point to a, but all the examples I've seen apply __restrict to two pointers to indicate that there is no aliasing between them.
As mentioned in the comments, b can't point to a anyway (a is a by-value parameter, so no caller-side pointer can refer to it), so there is no aliasing potential there. So if the function is pure in the sense that it works only on its parameters, there shouldn't be any real benefit.
However, if the function uses global variables internally, then __restrict might offer benefits again, since it makes clear that b doesn't point to any of those globals.
An interesting case is the situation where you allocate and deallocate memory inside the function. The compiler could theoretically be sure that b doesn't point to that memory, but whether it actually realizes that, I'm not sure; it might depend on how the allocation is called.
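A sketch of the global-variable case mentioned above (total is a made-up global for illustration):

    int total;  // hypothetical global also used by the function

    // Because b is __restrict-qualified, the compiler may keep total in a
    // register across the loop: stores through b are promised not to touch it.
    // Without the qualifier, it would have to reload total after every store.
    int longCalculation(int a, int* __restrict b)
    {
        for (int i = 0; i < a; ++i) {
            total += i;
            b[i] = total;
        }
        return total;
    }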
Personally, however, I prefer to keep __restrict out of the signature and do something like this:
int longCalculation(int a, int* b) {
    assert(...);  // ensure that b doesn't point to anything used
    int* __restrict bx = b;
    ...
}
IMO this has the following advantages:
The function signature doesn't expose the non-standard __restrict used.
You can ensure that the variables actually conform to __restrict using the assert, since passing aliasing pointers to a function that expects non-aliasing ones can lead to hard-to-track-down bugs.
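For instance, the assert might look something like this (a sketch; someGlobal is a hypothetical name, and the check can only cover the objects you explicitly name in it):

    #include <cassert>

    int someGlobal;  // hypothetical global the function also reads

    int longCalculation(int a, int* b)
    {
        assert(b != &someGlobal);  // check the no-alias promise at runtime
        int* __restrict bx = b;    // the promise now applies to the body
        *bx = a + someGlobal;
        return *bx;
    }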
I get this warning. I would like defined behavior, but I would like to keep this code as it is. When may I break the aliasing rules?
warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
String is my own string class, which is a POD. This code is called from C. s may actually hold an int rather than a pointer. String is pretty much struct String { RealString* s; }, but templated and with helper functions. I static_assert that String is a POD, that it is 4 bytes, and that int is 4 bytes. I also wrote an assert which checks that all pointers are >= NotAPtr; it's in my new/malloc overload. I may put that assert in String as well if you suggest it.
Considering the rules I am following (mainly that String is a POD and always the same size as int), would it be fine to break the aliasing rules here? Is this one of the few cases where breaking them is done right?
void func(String s) {
    auto v = *(unsigned int*)&s;
    myassert(v);
    if (v < NotAPtr) {
        // v is an int
    }
    else {
        // v is a ptr
    }
}
memcpy is fully supported. So is punning to a char* (you can then use std::copy, for instance).
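A sketch of the memcpy route for your func, reusing the names from your question (String, myassert, NotAPtr) and relying on the same 4-byte layout you already static_assert:

    #include <cstring>  // std::memcpy

    void func(String s) {
        unsigned int v;
        static_assert(sizeof v == sizeof s, "String must have int's size");
        std::memcpy(&v, &s, sizeof v);  // well-defined type pun
        myassert(v);
        if (v < NotAPtr) {
            // v is an int
        }
        else {
            // v is a ptr
        }
    }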
If you cannot change the code to the two functions proposed, why not do this (it requires uintptr_t, which came with C99 — for older MSVC you need to define it yourself; 2008/2010 should be OK):
void f(RealString *s) {
    uintptr_t v = reinterpret_cast<uintptr_t>(s);
    assert(v);
}
The Standard specifies a minimal set of actions that all conforming implementations must process in predictable fashion unless they encounter translation limits (whereupon all bets are off). It does not attempt to define all of the actions that an implementation must support to be suitable for any particular purpose. Instead, support for actions beyond those mandated is treated as a Quality of Implementation issue. The authors acknowledge that an implementation could be conforming and yet be of such poor quality as to be useless.
Code such as yours should be usable for quality implementations that are intended for low-level programming, and which represent things in memory in the expected fashion. It should not be expected to be usable on other kinds of implementations, including those which interpret "quality of implementation" issues as an invitation to try to behave in poor-quality-but-conforming fashion.
The safe way of treating a variable as two different types is to turn it into a union. One part of the union can be your pointer, the other part an integer.
struct String
{
    union
    {
        RealString* s;
        int i;
    };
};
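Reading the i member then inspects the same bytes as the pointer member. A sketch of func rewritten this way (myassert and NotAPtr as in the question; note that reading the union member other than the one last written is well-defined in C, and a widely supported extension in C++):

    void func(String s) {
        myassert(s.i);          // same bytes as the pointer member
        if (s.i < NotAPtr) {
            // it's an int
        }
        else {
            // it's a pointer
        }
    }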
EDITED and refined my question after Johannes's valuable answer
bool b = true;
volatile bool vb = true;

void f1() { }
void f2() { b = false; }

void (* volatile pf)() = &f1;  // a volatile pointer to function

int main()
{
    // different threads start here, some of which may change pf
    while (b && vb)
    {
        pf();
    }
}
So, let's forget synchronization for a while. The question is whether b has to be declared volatile. I have read the standard and sort-of know the formal definition of volatile semantics (I even almost understand it, the word almost being the key). But let's be a bit informal here. If the compiler sees that there is no way for b to change inside the loop, then unless b is volatile, it can optimize it away and assume the loop is equivalent to while(vb). The question is, in this case pf is itself volatile, so is the compiler allowed to assume that b won't change in the loop even if b is not volatile?
Please refrain from comments and answers which address the style of this piece of code, this is not a real-world example, this is an experimental theoretical question.
Comments and answers which, apart from answering my question, also address the semantics of volatile in greater detail which you think I have misunderstood are very much welcome.
I hope my question is clear. TIA
Editing once more:
what about this?
#include <iostream>

bool b = true;
volatile bool vb = true;

void f1() {}
void f2() { b = false; }

void (*pf)() = &f1;

int main()
{
    // threads here
    while (b && vb)
    {
        int x;
        std::cin >> x;
        if (x == 0)
            pf = &f1;
        else
            pf = &f2;
        pf();
    }
}
Is there a fundamental difference between the two programs? If yes, what is the difference?
The question is, in this case pf is itself volatile, so is the compiler allowed to assume that b won't change in the loop even if b is not volatile?
It can't, because you say that pf might be changed by other threads, and that indirectly changes b when pf is then called by the while loop. So while it is theoretically not required to read b as a normal variable, in practice it must read it to determine whether it should short-circuit (when b becomes false, it must not read vb another time).
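Put differently, each iteration of the loop must behave as if it were written like this (a conceptual sketch of the required reads, not actual generated code):

    for (;;) {
        bool b_now = b;  // must re-read b: the call through pf may have changed it
        if (!b_now)
            break;       // short circuit: vb is NOT read in this case
        if (!vb)         // the volatile read happens only when b was true
            break;
        pf();            // pf is volatile: its target is unknown each time
    }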
Answer to the second part
In this case pf is not volatile anymore, so the compiler can get rid of it and see that f1 has an empty body and f2 sets b to false. It could optimize main as follows
int main()
{
    // threads here (which you say can only change "vb")
    while (vb)
    {
        int x;
        std::cin >> x;
        if (x != 0)
            break;
    }
}
Answer to older revision
One condition for the compiler to be allowed to optimize the loop away is that the loop does not access or modify any volatile object (See [stmt.iter]p5 in n3126). You do that here, so it can't optimize the loop away. In C++03 a compiler wasn't allowed to optimize even the non-volatile version of that loop away (but compilers did it anyway).
Note that another condition for being able to optimize it away is that the loop contains no synchronization or atomic operations. In a multithreaded program, such should be present anyway though. So even if you get rid of that volatile, if your program is properly coded I don't think the compiler can optimize it away entirely.
The exact requirements on volatile in the current C++ standard in a case like this are, as I understand it, not entirely well-defined by the standard, since the standard doesn't really deal with multi-threading. It's basically a compiler hint. So, instead, I'll address what happens in a typical compiler.
First, suppose the compiler is compiling your functions independently, and then linking them together. In either example, you have a loop in which you're checking a variable, and calling a function pointer. Within the context of that function, the compiler has no idea what the function behind that function pointer will do, and thus it must always re-load b from memory after calling it. Thus, volatile is irrelevant there.
Expanding that to your first actual case, and allowing the compiler to make whole-program optimizations, because pf is volatile the compiler still has no idea what it's going to be pointing at (it can't even assume it's either f1 or f2!), and thus likewise cannot make any assumptions about what will be unmodified across the function-pointer call -- and so volatile on b is still irrelevant.
Your second case is actually simpler -- vb in it is a red herring. If you eliminate that, you can see that even in completely single-threaded semantics, the function-pointer call may modify b. You're not doing anything with undefined behavior, and so the program must operate correctly without volatile -- remember that, if you aren't considering a situation with external thread tweaks, volatile is a no-op. Therefore, without vb in the picture, you cannot possibly need volatile, and it's pretty clear that adding vb changes nothing.
Thus, in summary: you don't need volatile in either case. The difference, insofar as there is one, is that in the first case, if pf were not volatile, a sufficiently advanced compiler could possibly optimize b away, whereas in the second case it cannot do so even without volatile anywhere in the program. In practice, I do not expect any compilers would actually make that optimization.
volatile only hurts you if you think you could have benefited from an optimization that can't be done or if it communicates something that isn't true.
In your case, you said that these variables can be changed by other threads. Reading code, that's my assumption when I see volatile, so from a maintainer's perspective, that's good -- it's giving me extra information (which is true).
I don't know whether the optimizations are worth trying to salvage since you said this isn't the real code, but if they aren't then there aren't any reasons to not use volatile.
Not using volatile when you are supposed to results in incorrect behavior, since the optimizations are changing the meaning of the code.
I worry about coding to the minutiae of the standard and the behavior of your compilers, because things like this can change, and even if they don't, your code changes (which could affect what the compiler does) -- so, unless you are looking for micro-optimization improvements on this specific code, I'd just leave it volatile.
I got a comment to my answer on this thread:
Malloc inside a function call appears to be getting freed on return?
In short I had code like this:
int * somefunc (void)
{
    int * temp = (int*) malloc (sizeof (int));
    temp[0] = 0;
    return temp;
}
I got this comment:
Can I just say, please don't cast the return value of malloc? It is not required and can hide errors.
I agree that the cast is not required in C. It is mandatory in C++, so I usually add it just in case I have to port the code to C++ one day.
However, I wonder how casts like this can hide errors. Any ideas?
Edit:
Seems like there are very good and valid arguments on both sides. Thanks for posting, folks.
It seems fitting I post an answer, since I left the comment :P
Basically, if you forget to include stdlib.h the compiler will assume malloc returns an int. Without casting, you will get a warning. With casting you won't.
So by casting you get nothing, and run the risk of suppressing legitimate warnings.
Much is written about this, a quick google search will turn up more detailed explanations.
edit
It has been argued that
TYPE * p;
p = (TYPE *)malloc(n*sizeof(TYPE));
makes it obvious when you accidentally don't allocate enough memory (because, say, you thought p was a TYPe, not a TYPE), and thus we should cast malloc because the advantage of this method outweighs the smaller cost of accidentally suppressing compiler warnings.
I would like to point out 2 things:
you should write p = malloc(sizeof(*p)*n); to always ensure you malloc the right amount of space
with the casting approach, you need to make changes in 3 places if you ever change the type of p: once in the declaration, once in the malloc call, and once in the cast.
In short, I still personally believe there is no need for casting the return value of malloc and it is certainly not best practice.
This question is tagged both C and C++, so it has at least two answers, IMHO:
C
Ahem... Do whatever you want.
I believe the reason given above ("if you don't include stdlib.h then you won't get a warning") is not a valid one, because one should not rely on this kind of hack to avoid forgetting to include a header.
The real reason not to write the cast is that the C compiler already silently converts a void * into whatever pointer type you want, so doing it yourself is overkill and useless.
If you want to have type safety, you can either switch to C++ or write your own wrapper function, like:
#include <stdlib.h>  /* malloc, size_t */

int * malloc_Int(size_t p_iSize) /* number of ints wanted */
{
    return malloc(sizeof(int) * p_iSize);
}
C++
Sometimes, even in C++, you have to make use of the malloc/realloc/free utilities. Then you'll have to cast. But you already knew that. Using static_cast<>() will be better, as always, than a C-style cast.
And in C++, you could wrap malloc (and realloc, etc.) in templates to achieve type safety:
template <typename T>
T * myMalloc(const size_t p_iSize)
{
    return static_cast<T *>(malloc(sizeof(T) * p_iSize));
}
Which would be used like:
int * p = myMalloc<int>(25);
free(p);

MyStruct * p2 = myMalloc<MyStruct>(12);
free(p2);
and the following code:
// error: cannot convert ‘int*’ to ‘short int*’ in initialization
short * p = myMalloc<int>(25);
free(p);
won't compile, so, no problemo.
All in all, in pure C++, you now have no excuse if someone finds more than one C malloc inside your code...
:-)
C + C++ crossover
Sometimes, you want to produce code that will compile both in C and in C++ (for whatever reason... isn't that the point of the C++ extern "C" {} block?). In this case, C++ demands the cast, but C won't understand the static_cast keyword, so the solution is the C-style cast (which is still legal in C++ for exactly this kind of reason).
Note that even when writing pure C code, compiling it with a C++ compiler will get you a lot more warnings and errors (for example, attempting to use a function without declaring it first won't compile, unlike the error mentioned above).
So, to be on the safe side, write code that compiles cleanly as C++, study and correct the warnings, and then use the C compiler to produce the final binary. This means, again, writing the cast, as a C-style cast.
One possible error it can introduce is if you are compiling on a 64-bit system using C (not C++).
Basically, if you forget to include stdlib.h, the implicit-declaration rule will apply: the compiler will happily assume that malloc has the prototype int malloc();. On many 64-bit systems an int is 32 bits and a pointer is 64 bits.
Uh oh, the value gets truncated and you only get the lower 32 bits of the pointer! Now if you cast the return value of malloc, this error is hidden by the cast. But if you don't, you will get an error (something to the effect of "cannot convert int to T *").
This does not apply to C++, of course, for two reasons. Firstly, C++ has no implicit-declaration rule; secondly, it requires the cast.
All in all though, you should just use new in C++ code anyway :-P.
Well, I think it's the exact opposite - always directly cast it to the needed type. Read on here!
The "forgot stdlib.h" argument is a straw man. Modern compilers will detect and warn of the problem (gcc -Wall).
You should always cast the result of malloc immediately. Not doing so should be considered an error, and not just because it will fail as C++. If you're targeting a machine architecture with different kinds of pointers, for example, you could wind up with a very tricky bug if you don't put in the cast.
Edit: The commenter Evan Teran is correct. My mistake was thinking that the compiler didn't have to do any work on a void pointer in any context. I freak when I think of FAR pointer bugs, so my intuition is to cast everything. Thanks Evan!
Actually, the only way a cast could hide an error is if you were converting from one datatype to a smaller datatype and losing data, or if you were converting pears to apples. Take the following example:
int int_array[10];
/* initialize array */
int *p = &(int_array[3]);
short *sp = (short *)p;
short my_val = *sp;
in this case the conversion to short would drop some data from the int. And then there's this case:
struct my_struct {
    /* something */
};
int my_int_array[100];
/* initialize array */
struct my_struct *p = (struct my_struct *) &(my_int_array[99]);
in which you'd end up pointing to the wrong kind of data, or even to invalid memory.
But in general, and if you know what you are doing, it's OK to do the casting. Even more so when you are getting memory from malloc, which happens to return a void pointer that you can't use at all unless you cast it, and most compilers will warn you if you are casting to something the lvalue (the value on the left side of the assignment) can't take anyway.
#ifdef __cplusplus
#define MALLOC_CAST(T) (T)
#else
#define MALLOC_CAST(T)
#endif
...
int * p;
p = MALLOC_CAST(int *) malloc(sizeof(int) * n);
or, alternately
#ifdef __cplusplus
#define MYMALLOC(T, N) static_cast<T*>(malloc(sizeof(T) * N))
#else
#define MYMALLOC(T, N) malloc(sizeof(T) * N)
#endif
...
int * p;
p = MYMALLOC(int, n);
People have already cited the reasons I usually trot out: the old (no longer applicable to most compilers) argument about not including stdlib.h and using sizeof *p to make sure the types and sizes always match regardless of later updating. I do want to point out one other argument against casting. It's a small one, but I think it applies.
C is fairly weakly typed. Most safe type conversions happen automatically, and most unsafe ones require a cast. Consider:
int from_f(float f)
{
    return *(int *)&f;
}
That's dangerous code. It's technically undefined behavior, though in practice it's going to do the same thing on nearly every platform you run it on. And the cast helps tell you "This code is a terrible hack."
Consider:
int *p = (int *)malloc(sizeof(int) * 10);
I see a cast, and I wonder, "Why is this necessary? Where is the hack?" It raises hairs on my neck that there's something evil going on, when in fact the code is completely harmless.
As long as we're using C, casts (especially pointer casts) are a way of saying "There's something evil and easily breakable going on here." They may accomplish what you need accomplished, but they indicate to you and future maintainers that the kids aren't alright.
Using casts on every malloc diminishes the "hack" indication of pointer casting. It makes it less jarring to see things like *(int *)&f;.
Note: C and C++ are different languages. C is weakly typed, C++ is more strongly typed. The casts are necessary in C++, even though they don't indicate a hack at all, because of (in my humble opinion) the unnecessarily strong C++ type system. (Really, this particular case is the only place I think the C++ type system is "too strong," but I can't think of any place where it's "too weak," which makes it overall too strong for my tastes.)
If you're worried about C++ compatibility, don't. If you're writing C, use a C compiler. There are plenty of really good ones available for every platform. If, for some inane reason, you have to write C code that compiles cleanly as C++, you're not really writing C. If you need to port C to C++, you should be making lots of changes to make your C code more idiomatic C++.
If you can't do any of that, your code won't be pretty no matter what you do, so it doesn't really matter how you decide to cast at that point. I do like the idea of using templates to make a new allocator that returns the correct type, although that's basically just reinventing the new keyword.
Casting a function which returns (void *) to instead be an (int *) is harmless: you're casting one type of pointer to another.
Casting a function which returns an integer to instead be a pointer is most likely incorrect. The compiler would have flagged it had you not explicitly cast it.
One possible error could be (depending on whether this is what you really want or not) mallocing with one type's size and assigning to a pointer of a different type. E.g.,
int *temp = (int *)malloc(sizeof(double));
There may be cases where you want to do this, but I suspect that they are rare.
I think you should put the cast in. Consider that there are three locations for types:
T1 *p;
p = (T2*) malloc(sizeof(T3));
The two lines of code might be widely separated. Therefore it's good that the compiler will enforce that T1 == T2. It is easier to visually verify that T2 == T3.
If you miss out the T2 cast, then you have to hope that T1 == T3.
On the other hand you have the missing stdlib.h argument - but I think it's less likely to be a problem.
And if you ever need to port the code to C++, it is much better to use the new operator.
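For instance, the first snippet in the question becomes, as a sketch:

    // C++ version of somefunc: new expressions need no cast and are type-safe.
    int * somefunc()
    {
        int * temp = new int;  // throws std::bad_alloc on failure instead of returning NULL
        *temp = 0;
        return temp;
    }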