GCC: program doesn't work with compilation option -O3 - c++

I'm writing a C++ program that doesn't work (I get a segmentation fault) when I compile it with optimizations (options -O1, -O2, -O3, etc.), but it works just fine when I compile it without optimizations.
Is there any chance that the error is in my code? or should I assume that this is a bug in GCC?
My GCC version is 3.4.6.
Is there any known workaround for this kind of problem?
There is a big difference in speed between the optimized and unoptimized version of my program, so I really need to use optimizations.
This is my original functor. The one that works fine with no levels of optimizations and throws a segmentation fault with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
return distance(point,p1) < distance(point,p2) ;
}
} ;
And this one works flawlessly with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
float d1=distance(point,p1) ;
float d2=distance(point,p2) ;
std::cout << "" ; //without this line, I get a segmentation fault anyways
return d1 < d2 ;
}
} ;
Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.

Now that you posted the code fragment and a working workaround was found (#Windows programmer's answer), I can say that perhaps what you are looking for is -ffloat-store.
-ffloat-store
Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory.
This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.
Source: http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Optimize-Options.html

I would assume your code is wrong first.
Though it is hard to tell.
Does your code compile with 0 warnings?
g++ -Wall -Wextra -pedantic -ansi

Here's some code that seems to work, until you hit -O3...
#include <stdio.h>
int main()
{
int i = 0, j = 1, k = 2;
printf("%d %d %d\n", *(&j-1), *(&j), *(&j+1));
return 0;
}
Without optimisations, I get "2 1 0"; with optimisations I get "40 1 2293680". Why? Because i and k got optimised out!
But I was taking the address of j and going out of the memory region allocated to j. That's not allowed by the standard. It's most likely that your problem is caused by a similar deviation from the standard.
I find valgrind is often helpful at times like these.
EDIT: Some commenters are under the impression that the standard allows arbitrary pointer arithmetic. It does not. Remember that some architectures have funny addressing schemes, alignment may be important, and you may get problems if you overflow certain registers!
The words of the [draft] standard, on adding/subtracting an integer to/from a pointer (emphasis added):
"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
Seeing as &j doesn't even point to an array object, &j-1 and &j+1 can hardly point to part of the same array object. So simply evaluating &j+1 (let alone dereferencing it) is undefined behaviour.
On x86 we can be pretty confident that adding one to a pointer is fairly safe and just takes us to the next memory location. In the code above, the problem occurs when we make assumptions about what that memory contains, which of course the standard doesn't go near.

As an experiment, try to see if this will force the compiler to round everything consistently.
volatile float d1=distance(point,p1) ;
volatile float d2=distance(point,p2) ;
return d1 < d2 ;

The error is in your code. It's likely you're doing something that invokes undefined behavior according to the C standard which just happens to work with no optimizations, but when GCC makes certain assumptions for performing its optimizations, the code breaks when those assumptions aren't true. Make sure to compile with the -Wall option, and the -Wextra might also be a good idea, and see if you get any warnings. You could also try -ansi or -pedantic, but those are likely to result in false positives.

You may be running into an aliasing problem (or it could be a million other things). Look up the -fstrict-aliasing option.
This kind of question is impossible to answer properly without more information.

It is very seldom the compiler fault, but compiler do have bugs in them, and them often manifest themselves at different optimization levels (if there is a bug in an optimization pass, for example).
In general when reporting programming problems: provide a minimal code sample to demonstrate the issue, such that people can just save the code to a file, compile and run it. Make it as easy as possible to reproduce your problem.
Also, try different versions of GCC (compiling your own GCC is very easy, especially on Linux). If possible, try with another compiler. Intel C has a compiler which is more or less GCC compatible (and free for non-commercial use, I think). This will help pinpointing the problem.

It's almost (almost) never the compiler.
First, make sure you're compiling warning-free, with -Wall.
If that didn't give you a "eureka" moment, attach a debugger to the least optimized version of your executable that crashes and see what it's doing and where it goes.
5 will get you 10 that you've fixed the problem by this point.

Ran into the same problem a few days ago, in my case it was aliasing. And GCC does it differently, but not wrongly, when compared to other compilers. GCC has become what some might call a rules-lawyer of the C++ standard, and their implementation is correct, but you also have to be really correct in you C++, or it'll over optimize somethings, which is a pain. But you get speed, so can't complain.

I expect to get some downvotes here after reading some of the comments, but in the console game programming world, it's rather common knowledge that the higher optimization levels can sometimes generate incorrect code in weird edge cases. It might very well be that edge cases can be fixed with subtle changes to the code, though.

Alright...
This is one of the weirdest problems I've ever had.
I dont think I have enough proof to state it's a GCC bug, but honestly... It really looks like one.
This is my original functor. The one that works fine with no levels of optimizations and throws a segmentation fault with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
return distance(point,p1) < distance(point,p2) ;
}
} ;
And this one works flawlessly with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
float d1=distance(point,p1) ;
float d2=distance(point,p2) ;
std::cout << "" ; //without this line, I get a segmentation fault anyways
return d1 < d2 ;
}
} ;
Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.

Wow, I didn't expect answers so quicly, and so many...
The error occurs upon sorting a std::vector of pointers using std::sort()
I provide the strict-weak-ordering functor.
But I know the functor I provide is correct because I've used it a lot and it works fine.
Plus, the error cannot be some invalid pointer in the vector becasue the error occurs just when I sort the vector. If I iterate through the vector without applying std::sort first, the program works fine.
I just used GDB to try to find out what's going on. The error occurs when std::sort invoke my functor. Aparently std::sort is passing an invalid pointer to my functor. (of course this happens with the optimized version only, any level of optimization -O, -O2, -O3)

as other have pointed out, probably strict aliasing.
turn it of in o3 and try again. My guess is that you are doing some pointer tricks in your functor (fast float as int compare? object type in lower 2 bits?) that fail across inlining template functions.
warnings do not help to catch this case. "if the compiler could detect all strict aliasing problems it could just as well avoid them" just changing an unrelated line of code may make the problem appear or go away as it changes register allocation.

As the updated question will show ;) , the problem exists with a std::vector<T*>. One common error with vectors is reserve()ing what should have been resize()d. As a result, you'd be writing outside array bounds. An optimizer may discard those writes.

post the code in distance! it probably does some pointer magic, see my previous post. doing an intermediate assignment just hides the bug in your code by changing register allocation. even more telling of this is the output changing things!

The true answer is hidden somewhere inside all the comments in this thread. First of all: it is not a bug in the compiler.
The problem has to do with floating point precision. distanceToPointSort should be a function that should never return true for both the arguments (a,b) and (b,a), but that is exactly what can happen when the compiler decides to use higher precision for some data paths. The problem is especially likely on, but by no means limited to, x86 without -mfpmath=sse. If the comparator behaves that way, the sort function can become confused, and the segmentation fault is not surprising.
I consider -ffloat-store the best solution here (already suggested by CesarB).

Related

How much do C/C++ compilers optimize conditional statements?

I recently ran into a situation where I wrote the following code:
for(int i = 0; i < (size - 1); i++)
{
// do whatever
}
// Assume 'size' will be constant during the duration of the for loop
When looking at this code, it made me wonder how exactly the for loop condition is evaluated for each loop. Specifically, I'm curious as to whether or not the compiler would 'optimize away' any additional arithmetic that has to be done for each loop. In my case, would this code get compiled such that (size - 1) would have to be evaluated for every loop iteration? Or is the compiler smart enough to realize that the 'size' variable won't change, thus it could precalculate it for each loop iteration.
This then got me thinking about the general case where you have a conditional statement that may specify more operations than necessary.
As an example, how would the following two pieces of code compile:
if(6)
if(1+1+1+1+1+1)
int foo = 1;
if(foo + foo + foo + foo + foo + foo)
How smart is the compiler? Will the 3 cases listed above be converted into the same machine code?
And while I'm at, why not list another example. What does the compiler do if you are doing an operation within a conditional that won't have any effect on the end result? Example:
if(2*(val))
// Assume val is an int that can take on any value
In this example, the multiplication is completely unnecessary. While this case seems a lot stupider than my original case, the question still stands: will the compiler be able to remove this unnecessary multiplication?
Question:
How much optimization is involved with conditional statements?
Does it vary based on compiler?
Short answer: the compiler is exceptionally clever, and will generally optimise those cases that you have presented (including utterly ignoring irrelevant conditions).
One of the biggest hurdles language newcomers face in terms of truly understanding C++, is that there is not a one-to-one relationship between their code and what the computer executes. The entire purpose of the language is to create an abstraction. You are defining the program's semantics, but the computer has no responsibility to actually follow your C++ code line by line; indeed, if it did so, it would be abhorrently slow as compared to the speed we can expect from modern computers.
Generally speaking, unless you have a reason to micro-optimise (game developers come to mind), it is best to almost completely ignore this facet of programming, and trust your compiler. Write a program that takes the inputs you want, and gives the outputs you want, after performing the calculations you want… and let your compiler do the hard work of figuring out how the physical machine is going to make all that happen.
Are there exceptions? Certainly. Sometimes your requirements are so specific that you do know better than the compiler, and you end up optimising. You generally do this after profiling and determining what your bottlenecks are. And there's also no excuse to write deliberately silly code. After all, if you go out of your way to ask your program to copy a 50MB vector, then it's going to copy a 50MB vector.
But, assuming sensible code that means what it looks like, you really shouldn't spend too much time worrying about this. Because modern compilers are so good at optimising, that you'd be a fool to try to keep up.
The C++ language specification permits the compiler to make any optimization that results in no observable changes to the expected results.
If the compiler can determine that size is constant and will not change during execution, it can certainly make that particular optimization.
Alternatively, if the compiler can also determine that i is not used in the loop (and its value is not used afterwards), that it is used only as a counter, it might very well rewrite the loop to:
for(int i = 1; i < size; i++)
because that might produce smaller code. Even if this i is used in some fashion, the compiler can still make this change and then adjust all other usage of i so that the observable results are still the same.
To summarize: anything goes. The compiler may or may not make any optimization change as long as the observable results are the same.
Yes, there is a lot of optimization, and it is very complex.
It varies based on the compiler, and it also varies based on the compiler options
Check
https://meta.stackexchange.com/questions/25840/can-we-stop-recommending-the-dragon-book-please
for some book recomendations if you really want to understand what a compiler may do. It is a very complex subject.
You can also compile to assembly with the -S option (gcc / g++) to see what the compiler is really doing. Use -O3 / ... / -O0 / -O to experiment with different optimization levels.

Is uninitialized local variable the fastest random number generator?

I know the uninitialized local variable is undefined behaviour(UB), and also the value may have trap representations which may affect further operation, but sometimes I want to use the random number only for visual representation and will not further use them in other part of program, for example, set something with random color in a visual effect, for example:
void updateEffect(){
for(int i=0;i<1000;i++){
int r;
int g;
int b;
star[i].setColor(r%255,g%255,b%255);
bool isVisible;
star[i].setVisible(isVisible);
}
}
is it that faster than
void updateEffect(){
for(int i=0;i<1000;i++){
star[i].setColor(rand()%255,rand()%255,rand()%255);
star[i].setVisible(rand()%2==0?true:false);
}
}
and also faster than other random number generator?
As others have noted, this is Undefined Behavior (UB).
In practice, it will (probably) actually (kind of) work. Reading from an uninitialized register on x86[-64] architectures will indeed produce garbage results, and probably won't do anything bad (as opposed to e.g. Itanium, where registers can be flagged as invalid, so that reads propagate errors like NaN).
There are two main problems though:
It won't be particularly random. In this case, you're reading from the stack, so you'll get whatever was there previously. Which might be effectively random, completely structured, the password you entered ten minutes ago, or your grandmother's cookie recipe.
It's Bad (capital 'B') practice to let things like this creep into your code. Technically, the compiler could insert reformat_hdd(); every time you read an undefined variable. It won't, but you shouldn't do it anyway. Don't do unsafe things. The fewer exceptions you make, the safer you are from accidental mistakes all the time.
The more pressing issue with UB is that it makes your entire program's behavior undefined. Modern compilers can use this to elide huge swaths of your code or even go back in time. Playing with UB is like a Victorian engineer dismantling a live nuclear reactor. There's a zillion things to go wrong, and you probably won't know half of the underlying principles or implemented technology. It might be okay, but you still shouldn't let it happen. Look at the other nice answers for details.
Also, I'd fire you.
Let me say this clearly: we do not invoke undefined behavior in our programs. It is never ever a good idea, period. There are rare exceptions to this rule; for example, if you are a library implementer implementing offsetof. If your case falls under such an exception you likely know this already. In this case we know using uninitialized automatic variables is undefined behavior.
Compilers have become very aggressive with optimizations around undefined behavior and we can find many cases where undefined behavior has lead to security flaws. The most infamous case is probably the Linux kernel null pointer check removal which I mention in my answer to C++ compilation bug? where a compiler optimization around undefined behavior turned a finite loop into an infinite one.
We can read CERT's Dangerous Optimizations and the Loss of Causality (video) which says, amongst other things:
Increasingly, compiler writers are taking advantage of undefined
behaviors in the C and C++ programming languages to improve
optimizations.
Frequently, these optimizations are interfering with
the ability of developers to perform cause-effect analysis on their
source code, that is, analyzing the dependence of downstream results
on prior results.
Consequently, these optimizations are eliminating
causality in software and are increasing the probability of software
faults, defects, and vulnerabilities.
Specifically with respect to indeterminate values, the C standard defect report 451: Instability of uninitialized automatic variables makes for some interesting reading. It has not been resolved yet but introduces the concept of wobbly values which means the indeterminatness of a value may propagate through the program and can have different indeterminate values at different points in the program.
I don't know of any examples where this happens but at this point we can't rule it out.
Real examples, not the result you expect
You are unlikely to get random values. A compiler could optimize the away the loop altogether. For example, with this simplified case:
void updateEffect(int arr[20]){
for(int i=0;i<20;i++){
int r ;
arr[i] = r ;
}
}
clang optimizes it away (see it live):
updateEffect(int*): # #updateEffect(int*)
retq
or perhaps get all zeros, as with this modified case:
void updateEffect(int arr[20]){
for(int i=0;i<20;i++){
int r ;
arr[i] = r%255 ;
}
}
see it live:
updateEffect(int*): # #updateEffect(int*)
xorps %xmm0, %xmm0
movups %xmm0, 64(%rdi)
movups %xmm0, 48(%rdi)
movups %xmm0, 32(%rdi)
movups %xmm0, 16(%rdi)
movups %xmm0, (%rdi)
retq
Both of these cases are perfectly acceptable forms of undefined behavior.
Note, if we are on an Itanium we could end up with a trap value:
[...]if the register happens to hold a special not-a-thing value,
reading the register traps except for a few instructions[...]
Other important notes
It is interesting to note the variance between gcc and clang noted in the UB Canaries project over how willing they are to take advantage of undefined behavior with respect to uninitialized memory. The article notes (emphasis mine):
Of course we need to be completely clear with ourselves that any such expectation has nothing to do with the language standard and everything to do with what a particular compiler happens to do, either because the providers of that compiler are unwilling to exploit that UB or just because they have not gotten around to exploiting it yet. When no real guarantee from the compiler provider exists, we like to say that as-yet unexploited UBs are time bombs: they’re waiting to go off next month or next year when the compiler gets a bit more aggressive.
As Matthieu M. points out What Every C Programmer Should Know About Undefined Behavior #2/3 is also relevant to this question. It says amongst other things (emphasis mine):
The important and scary thing to realize is that just about any
optimization based on undefined behavior can start being triggered on
buggy code at any time in the future. Inlining, loop unrolling, memory
promotion and other optimizations will keep getting better, and a
significant part of their reason for existing is to expose secondary
optimizations like the ones above.
To me, this is deeply dissatisfying, partially because the compiler
inevitably ends up getting blamed, but also because it means that huge
bodies of C code are land mines just waiting to explode.
For completeness sake I should probably mention that implementations can choose to make undefined behavior well defined, for example gcc allows type punning through unions while in C++ this seems like undefined behavior. If this is the case the implementation should document it and this will usually not be portable.
No, it's terrible.
The behaviour of using an uninitialised variable is undefined in both C and C++, and it's very unlikely that such a scheme would have desirable statistical properties.
If you want a "quick and dirty" random number generator, then rand() is your best bet. In its implementation, all it does is a multiplication, an addition, and a modulus.
The fastest generator I know of requires you to use a uint32_t as the type of the pseudo-random variable I, and use
I = 1664525 * I + 1013904223
to generate successive values. You can choose any initial value of I (called the seed) that takes your fancy. Obviously you can code that inline. The standard-guaranteed wraparound of an unsigned type acts as the modulus. (The numeric constants are hand-picked by that remarkable scientific programmer Donald Knuth.)
Good question!
Undefined does not mean it's random. Think about it, the values you'd get in global uninitialized variables were left there by the system or your/other applications running. Depending what your system does with no longer used memory and/or what kind of values the system and applications generate, you may get:
Always the same.
Be one of a small set of values.
Get values in one or more small ranges.
See many values dividable by 2/4/8 from pointers on 16/32/64-bit system
...
The values you'll get completely depend on which non-random values are left by the system and/or applications. So, indeed there will be some noise (unless your system wipes no longer used memory), but the value pool from which you'll draw will by no means be random.
Things get much worse for local variables because these come directly from the stack of your own program. There is a very good chance that your program will actually write these stack locations during the execution of other code. I estimate the chances for luck in this situation very low, and a 'random' code change you make tries this luck.
Read about randomness. As you'll see randomness is a very specific and hard to obtain property. It's a common mistake to think that if you just take something that's hard to track (like your suggestion) you'll get a random value.
Many good answers, but allow me to add another and stress the point that in a deterministic computer, nothing is random. This is true for both the numbers produced by an pseudo-RNG and the seemingly "random" numbers found in areas of memory reserved for C/C++ local variables on the stack.
BUT... there is a crucial difference.
The numbers generated by a good pseudorandom generator have the properties that make them statistically similar to truly random draws. For instance, the distribution is uniform. The cycle length is long: you can get millions of random numbers before the cycle repeats itself. The sequence is not autocorrelated: for instance, you will not begin to see strange patterns emerge if you take every 2nd, 3rd, or 27th number, or if you look at specific digits in the generated numbers.
In contrast, the "random" numbers left behind on the stack have none of these properties. Their values and their apparent randomness depend entirely on how the program is constructed, how it is compiled, and how it is optimized by the compiler. By way of example, here is a variation of your idea as a self-contained program:
#include <stdio.h>
notrandom()
{
int r, g, b;
printf("R=%d, G=%d, B=%d", r&255, g&255, b&255);
}
int main(int argc, char *argv[])
{
int i;
for (i = 0; i < 10; i++)
{
notrandom();
printf("\n");
}
return 0;
}
When I compile this code with GCC on a Linux machine and run it, it turns out to be rather unpleasantly deterministic:
R=0, G=19, B=0
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
If you looked at the compiled code with a disassembler, you could reconstruct what was going on, in detail. The first call to notrandom() used an area of the stack that was not used by this program previously; who knows what was in there. But after that call to notrandom(), there is a call to printf() (which the GCC compiler actually optimizes to a call to putchar(), but never mind) and that overwrites the stack. So the next and subsequent times, when notrandom() is called, the stack will contain stale data from the execution of putchar(), and since putchar() is always called with the same arguments, this stale data will always be the same, too.
So there is absolutely nothing random about this behavior, nor do the numbers obtained this way have any of the desirable properties of a well-written pseudorandom number generator. In fact, in most real-life scenarios, their values will be repetitive and highly correlated.
Indeed, as others, I would also seriously consider firing someone who tried to pass off this idea as a "high performance RNG".
Undefined behavior means that the authors of compilers are free to ignore the problem because programmers will never have a right to complain whatever happens.
While in theory when entering UB land anything can happen (including a daemon flying off your nose) what normally means is that compiler authors just won't care and, for local variables, the value will be whatever is in the stack memory at that point.
This also means that often the content will be "strange" but fixed or slightly random or variable but with a clear evident pattern (e.g. increasing values at each iteration).
For sure you cannot expect it being a decent random generator.
Undefined behaviour is undefined. It doesn't mean that you get an undefined value, it means that the the program can do anything and still meet the language specification.
A good optimizing compiler should take
void updateEffect(){
for(int i=0;i<1000;i++){
int r;
int g;
int b;
star[i].setColor(r%255,g%255,b%255);
bool isVisible;
star[i].setVisible(isVisible);
}
}
and compile it to a noop. This is certainly faster than any alternative. It has the downside that it will not do anything, but such is the downside of undefined behaviour.
Not mentioned yet, but code paths that invoke undefined behavior are allowed to do whatever the compiler wants, e.g.
void updateEffect(){}
Which is certainly faster than your correct loop, and because of UB, is perfectly conformant.
Because of security reasons, new memory assigned to a program has to be cleaned, otherwise the information could be used, and passwords could leak from one application into another. Only when you reuse memory, you get different values than 0. And it is very likely, that on a stack the previous value is just fixed, because the previous use of that memory is fixed.
Your particular code example would probably not do what you are expecting. While technically each iteration of the loop re-creates the local variables for the r, g, and b values, in practice it's the exact same memory space on the stack. Hence it won't get re-randomized with each iteration, and you will end up assigning the same 3 values for each of the 1000 colors, regardless of how random the r, g, and b are individually and initially.
Indeed, if it did work, I would be very curious as to what's re-randomizing it. The only thing I can think of would be an interleaved interrupt that piggypacked atop that stack, highly unlikely. Perhaps internal optimization that kept those as register variables rather than as true memory locations, where the registers get re-used further down in the loop, would do the trick, too, especially if the set visibility function is particularly register-hungry. Still, far from random.
As most of people here mentioned undefined behavior. Undefined also means that you may get some valid integer value (luckily) and in this case this will be faster (as rand function call is not made).
But don't practically use it. I am sure this will terrible results as luck is not with you all the time.
Really bad! Bad habit, bad result.
Consider:
A_Function_that_use_a_lot_the_Stack();
updateEffect();
If the function A_Function_that_use_a_lot_the_Stack() make always the same initialization it leaves the stack with the same data on it. That data is what we get calling updateEffect(): always same value!.
I performed a very simple test, and it wasn't random at all.
#include <stdio.h>
int main() {
int a;
printf("%d\n", a);
return 0;
}
Every time I ran the program, it printed the same number (32767 in my case) -- you can't get much less random than that. This is presumably whatever the startup code in the runtime library left on the stack. Since it uses the same startup code every time the program runs, and nothing else varies in the program between runs, the results are perfectly consistent.
You need to have a definition of what you mean by 'random'.
A sensible definition involves that the values you get should have little correlation. That's something you can measure. It's also not trivial to achieve in a controlled, reproducible manner. So undefined behaviour is certainly not what you are looking for.
There are certain situations in which uninitialized memory may be safely read using type "unsigned char*" [e.g. a buffer returned from malloc]. Code may read such memory without having to worry about the compiler throwing causality out the window, and there are times when it may be more efficient to have code be prepared for anything memory might contain than to ensure that uninitialized data won't be read (a commonplace example of this would be using memcpy on partially-initialized buffer rather than discretely copying all of the elements that contain meaningful data).
Even in such cases, however, one should always assume that if any combination of bytes will be particularly vexatious, reading it will always yield that pattern of bytes (and if a certain pattern would be vexatious in production, but not in development, such a pattern won't appear until code is in production).
Reading uninitialized memory might be useful as part of a random-generation strategy in an embedded system where one can be sure the memory has never been written with substantially-non-random content since the last time the system was powered on, and if the manufacturing process used for the memory causes its power-on state to vary in semi-random fashion. Code should work even if all devices always yield the same data, but in cases where e.g. a group of nodes each need to select arbitrary unique IDs as quickly as possible, having a "not very random" generator which gives half the nodes the same initial ID might be better than not having any initial source of randomness at all.
As others have said, it will be fast, but not random.
What most compilers will do for local variables is to grab some space for them on the stack, but not bother setting it to anything (the standard says they don't need to, so why slow down the code you're generating?).
In this case, the value you'll get will depend on what was on previously on the stack - if you call a function before this one that has a hundred local char variables all set to 'Q' and then call you're function after that returns, then you'll probably find your "random" values behave as if you've memset() them all to 'Q's.
Importantly for your example function trying to use this, these values wont change each time you read them, they'll be the same every time. So you'll get a 100 stars all set to the same colour and visibility.
Also, nothing says that the compiler shouldn't initialize these value - so a future compiler might do so.
In general: bad idea, don't do it.
(like a lot of "clever" code level optimizations really...)
As others have already mentioned, this is undefined behavior (UB), but it may "work".
Except from problems already mentioned by others, I see one other problem (disadvantage) - it will not work in any language other than C and C++. I know that this question is about C++, but if you can write code which will be good C++ and Java code and it's not a problem then why not? Maybe some day someone will have to port it to other language and searching for bugs caused by "magic tricks" UB like this definitely will be a nightmare (especially for an inexperienced C/C++ developer).
Here there is question about another similar UB. Just imagine yourself trying to find bug like this without knowing about this UB. If you want to read more about such strange things in C/C++, read answers for question from link and see this GREAT slideshow. It will help you understand what's under the hood and how it's working; it's not not just another slideshow full of "magic". I'm quite sure that even most of experienced C/c++ programmers can learn a lot from this.
Not a good idea to rely our any logic on language undefined behaviour. In addition to whatever mentioned/discussed in this post, I would like to mention that with modern C++ approach/style such program may not be compile.
This was mentioned in my previous post which contains the advantage of auto feature and useful link for the same.
https://stackoverflow.com/a/26170069/2724703
So, if we change the above code and replace the actual types with auto, the program would not even compile.
void updateEffect(){
for(int i=0;i<1000;i++){
auto r;
auto g;
auto b;
star[i].setColor(r%255,g%255,b%255);
auto isVisible;
star[i].setVisible(isVisible);
}
}
I like your way of thinking. Really outside the box. However the tradeoff is really not worth it. Memory-runtime tradeoff is a thing, including undefined behavior for runtime is not.
It must give you a very unsettling feeling to know you are using such "random" as your business logic. I woudn't do it.
Use 7757 every place you are tempted to use uninitialized variables. I picked it randomly from a list of prime numbers:
it is defined behavior
it is guaranteed to not always be 0
it is prime
it is likely to be as statistically random as uninitualized
variables
it is likely to be faster than uninitialized variables since its
value is known at compile time
There is one more possibility to consider.
Modern compilers (ahem g++) are so intelligent that they go through your code to see what instructions affect state, and what don't, and if an instruction is guaranteed to NOT affect the state, g++ will simply remove that instruction.
So here's what will happen. g++ will definitely see that you are reading, performing arithmetic on, saving, what is essentially a garbage value, which produces more garbage. Since there is no guarantee that the new garbage is any more useful than the old one, it will simply do away with your loop. BLOOP!
This method is useful, but here's what I would do. Combine UB (Undefined Behaviour) with rand() speed.
Of course, reduce rand()s executed, but mix them in so compiler doesn't do anything you don't want it to.
And I won't fire you.
Using uninitialized data for randomness is not necessarily a bad thing if done properly. In fact, OpenSSL does exactly this to seed its PRNG.
Apparently this usage wasn't well documented however, because someone noticed Valgrind complaining about using uninitialized data and "fixed" it, causing a bug in the PRNG.
So you can do it, but you need to know what you're doing and make sure that anyone reading your code understands this.

How to store doubles in memory

Recently I changed some code
double d0, d1;
// ... assign things to d0/d1 ...
double result = f(d0, d1)
to
double d[2];
// ... assign things to d[0]/d[1]
double result = f(d[0], d[1]);
I did not change any of the assignments to d, nor the calculations in f, nor anything else apart from the fact that the doubles are now stored in a fixed-length array.
However when compiling in release mode, with optimizations on, result changed.
My question is, why, and what should I know about how I should store doubles? Is one way more efficient, or better, than the other? Are there memory alignment issues? I'm looking for any information that would help me understand what's going on.
EDIT: I will try to get some code demonstrating the problem, however this is quite hard as the process that these numbers go through is huge (a lot of maths, numerical solvers, etc.).
However there is no change when compiled in Debug. I will double check this again to make sure but this is almost certain, i.e. the double values are identical in Debug between version 1 and version 2.
Comparing Debug to Release, results have never ever been the same between the two compilation modes, for various optimization reasons.
You probably have a 'fast math' compiler switch turned on, or are doing something in the "assign things" (which we can't see) which allows the compiler to legally reorder calculations. Even though the sequences are equivalent, it's likely the optimizer is treating them differently, so you end up with slightly different code generation. If it's reordered, you end up with slight differences in the least significant bits. Such is life with floating point.
You can prevent this by not using 'fast math' (if that's turned on), or forcing ordering thru the way you construct the formulas and intermediate values. Even that's hard (impossible?) to guarantee. The question is really "Why is the compiler generating different code for arrays vs numbered variables?", but that's basically an analysis of the code generator.
no these are equivalent - you have something else wrong.
Check the /fp:precise flags (or equivalent) the processor floating point hardware can run in more accuracy or more speed mode - it may have a different default in an optimized build
With regard to floating-point semantics, these are equivalent. However, it is conceivable that the compiler might decide to generate slightly different code sequences for the two, and that could result in differences in the result.
Can you post a complete code example that illustrates the difference? Without that to go on, anything anyone posts as an answer is just speculation.
To your concerns: memory alignment cannot effect the value of a double, and a compiler should be able to generate equivalent code for either example, so you don't need to worry that you're doing something wrong (at least, not in the limited example you posted).
The first way is more efficient, in a very theoretical way. It gives the compiler slightly more leeway in assigning stack slots and registers. In the second example, the compiler has to pick 2 consecutive slots - except of course if the compiler is smart enough to realize that you'd never notice.
It's quite possible that the double[2] causes the array to be allocated as two adjacent stack slots where it wasn't before, and that in turn can cause code reordering to improve memory access efficiency. IEEE754 floating point math doesn't obey the regular math rules, i.e. a+b+c != c+b+a

gcc optimization? bug? and its practial implication to project

My questions are divided into three parts
Question 1
Consider the below code,
#include <iostream>
using namespace std;
int main( int argc, char *argv[])
{
const int v = 50;
int i = 0X7FFFFFFF;
cout<<(i + v)<<endl;
if ( i + v < i )
{
cout<<"Number is negative"<<endl;
}
else
{
cout<<"Number is positive"<<endl;
}
return 0;
}
No specific compiler optimisation options are used or the O's flag is used. It is basic compilation command g++ -o test main.cpp is used to form the executable.
The seemingly very simple code, has odd behaviour in SUSE 64 bit OS, gcc version 4.1.2. The expected output is "Number is negative", instead only in SUSE 64 bit OS, the output would be "Number is positive".
After some amount of analysis and doing a 'disass' of the code, I find that the compiler optimises in the below format -
Since i is same on both sides of comparison, it cannot be changed in the same expression, remove 'i' from the equation.
Now, the comparison leads to if ( v < 0 ), where v is a constant positive, So during compilation itself, the else part cout function address is added to the register. No cmp/jmp instructions can be found.
I see that the behaviour is only in gcc 4.1.2 SUSE 10. When tried in AIX 5.1/5.3 and HP IA64, the result is as expected.
Is the above optimisation valid?
Or, is using the overflow mechanism for int not a valid use case?
Question 2
Now when I change the conditional statement from if (i + v < i) to if ( (i + v) < i ) even then, the behaviour is same, this atleast I would personally disagree, since additional braces are provided, I expect the compiler to create a temporary built-in type variable and them compare, thus nullify the optimisation.
Question 3
Suppose I have a huge code base, an I migrate my compiler version, such bug/optimisation can cause havoc in my system behaviour. Ofcourse from business perspective, it is very ineffective to test all lines of code again just because of compiler upgradation.
I think for all practical purpose, these kinds of error are very difficult to catch (during upgradation) and invariably will be leaked to production site.
Can anyone suggest any possible way to ensure to ensure that these kind of bug/optimization does not have any impact on my existing system/code base?
PS :
When the const for v is removed from the code, then optimization is not done by the compiler.
I believe, it is perfectly fine to use overflow mechanism to find if the variable is from MAX - 50 value (in my case).
Update(1)
What would I want to achieve? variable i would be a counter (kind of syncID). If I do offline operation (50 operation) then during startup, I would like to reset my counter, For this I am checking the boundary value (to reset it) rather than adding it blindly.
I am not sure if I am relying on the hardware implementation. I know that 0X7FFFFFFF is the max positive value. All I am doing is, by adding value to this, I am expecting the return value to be negative. I don't think this logic has anything to do with hardware implementation.
Anyways, all thanks for your input.
Update(2)
Most of the inpit states that I am relying on the lower level behavior on overflow checking. I have one questions regarding the same,
If that is the case, For an unsigned int how do I validate and reset the value during underflow or overflow? like if v=10, i=0X7FFFFFFE, I want reset i = 9. Similarly for underflow?
I would not be able to do that unless I check for negativity of the number. So my claim is that int must return a negative number when a value is added to the +MAX_INT.
Please let me know your inputs.
It's a known problem, and I don't think it's considered a bug in the compiler. When I compile with gcc 4.5 with -Wall -O2 it warns
warning: assuming signed overflow does not occur when assuming that (X + c) < X is always false
Although your code does overflow.
You can pass the -fno-strict-overflow flag to turn that particular optimization off.
Your code produces undefined behavior. C and C++ languages has no "overflow mechanism" for signed integer arithmetic. Your calculations overflow signed integers - the behavior is immediately undefined. Considering it form "a bug in the compiler or not" position is no different that attempting to analyze the i = i++ + ++i examples.
GCC compiler has an optimization based on that part of the specification of C/C++ languages. It is called "strict overflow semantics" or something lake that. It is based on the fact that adding a positive value to a signed integer in C++ always produces a larger value or results in undefined behavior. This immediately means that the compiler is perfectly free to assume that the sum is always larger. The general nature of that optimization is very similar to the "strict aliasing" optimizations also present in GCC. They both resulted in some complaints from the more "hackerish" parts of GCC user community, many of whom didn't even suspect that the tricks they were relying on in their C/C++ programs were simply illegal hacks.
Q1: Perhaps, the number is indeed positive in a 64bit implementation? Who knows? Before debugging the code I'd just printf("%d", i+v);
Q2: The parentheses are only there to tell the compiler how to parse an expression. This is usually done in the form of a tree, so the optimizer does not see any parentheses at all. And it is free to transform the expression.
Q3: That's why, as c/c++ programmer, you must not write code that assumes particular properties of the underlying hardware, such as, for example, that an int is a 32 bit quantity in two's complement form.
What does the line:
cout<<(i + v)<<endl;
Output in the SUSE example? You're sure you don't have 64bit ints?
OK, so this was almost six years ago and the question is answered. Still I feel that there are some bits that have not been adressed to my satisfaction, so I add a few comments, hopefully for the good of future readers of this discussion. (Such as myself when I got a search hit for it.)
The OP specified using gcc 4.1.2 without any special flags. I assume the absence of the -O flag is equivalent to -O0. With no optimization requested, why did gcc optimize away code in the reported way? That does seem to me like a compiler bug. I also assume this has been fixed in later versions (for example, one answer mentions gcc 4.5 and the -fno-strict-overflow optimization flag). The current gcc man page states that -fstrict-overflow is included with -O2 or more.
In current versions of gcc, there is an option -fwrapv that enables you to use the sort of code that caused trouble for the OP. Provided of course that you make sure you know the bit sizes of your integer types. From gcc man page:
-fstrict-overflow
.....
See also the -fwrapv option. Using -fwrapv means that integer signed overflow
is fully defined: it wraps. ... With -fwrapv certain types of overflow are
permitted. For example, if the compiler gets an overflow when doing arithmetic
on constants, the overflowed value can still be used with -fwrapv, but not otherwise.

What is the performance implication of converting to bool in C++?

[This question is related to but not the same as this one.]
My compiler warns about implicitly converting or casting certain types to bool whereas explicit conversions do not produce a warning:
long t = 0;
bool b = false;
b = t; // performance warning: forcing long to bool
b = (bool)t; // performance warning
b = bool(t); // performance warning
b = static_cast<bool>(t); // performance warning
b = t ? true : false; // ok, no warning
b = t != 0; // ok
b = !!t; // ok
This is with Visual C++ 2008 but I suspect other compilers may have similar warnings.
So my question is: what is the performance implication of casting/converting to bool? Does explicit conversion have better performance in some circumstance (e.g., for certain target architectures or processors)? Does implicit conversion somehow confuse the optimizer?
Microsoft's explanation of their warning is not particularly helpful. They imply that there is a good reason but they don't explain it.
I was puzzled by this behaviour, until I found this link:
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99633
Apparently, coming from the Microsoft Developer who "owns" this warning:
This warning is surprisingly
helpful, and found a bug in my code
just yesterday. I think Martin is
taking "performance warning" out of
context.
It's not about the generated code,
it's about whether or not the
programmer has signalled an intent to
change a value from int to bool.
There is a penalty for that, and the
user has the choice to use "int"
instead of "bool" consistently (or
more likely vice versa) to avoid the
"boolifying" codegen. [...]
It is an old warning, and may have
outlived its purpose, but it's
behaving as designed here.
So it seems to me the warning is more about style and avoiding some mistakes than anything else.
Hope this will answer your question...
:-p
The performance is identical across the board. It involves a couple of instructions on x86, maybe 3 on some other architectures.
On x86 / VC++, they all do
cmp DWORD PTR [whatever], 0
setne al
GCC generates the same thing, but without the warnings (at any warning-level).
The performance warning does actually make a little bit of sense. I've had it as well and my curiousity led me to investigate with the disassembler. It is trying to tell you that the compiler has to generate some code to coerce the value to either 0 or 1. Because you are insisting on a bool, the old school C idea of 0 or anything else doesn't apply.
You can avoid that tiny performance hit if you really want to. The best way is to avoid the cast altogether and use a bool from the start. If you must have an int, you could just use if( int ) instead of if( bool ). The code generated will simply check whether the int is 0 or not. No extra code to make sure the value is 1 if it's not 0 will be generated.
Sounds like premature optimization to me. Are you expecting that the performance of the cast to seriously effect the performance of your app? Maybe if you are writing kernel code or device drivers but in most cases, they all should be ok.
As far as I know, there is no warning on any other compiler for this. The only way I can think that this would cause a performance loss is that the compiler has to compare the entire integer to 0 and then assign the bool appropriately (unlike a conversion such as a char to bool, where the result can be copied over because a bool is one byte and so they are effectively the same), or an integral conversion which involves copying some or all of the source to the destination, possibly after a zero of the destination if it's bigger than the source (in terms of memory).
It's yet another one of Microsoft's useless and unhelpful ideas as to what constitutes good code, and leads us to have to put up with stupid definitions like this:
template <typename T>
inline bool to_bool (const T& t)
{ return t ? true : false; }
long t;
bool b;
int i;
signed char c;
...
You get a warning when you do anything that would be "free" if bool wasn't required to be 0 or 1. b = !!t is effectively assigning the result of the (language built-in, non-overrideable) bool operator!(long)
You shouldn't expect the ! or != operators to cost zero asm instructions even with an optimizing compiler. It is usually true that int i = t is usually optimized away completely. Or even signed char c = t; (on x86/amd64, if t is in the %eax register, after c = t, using c just means using %al. amd64 has byte addressing for every register, BTW. IIRC, in x86 some registers don't have byte addressing.)
Anyway, b = t; i = b; isn't the same as c = t; i = c; it's i = !!t; instead of i = t & 0xff;
Err, I guess everyone already knows all that from the previous replies. My point was, the warning made sense to me, since it caught cases where the compiler had to do things you didn't really tell it to, like !!BOOL on return because you declared the function bool, but are returning an integral value that could be true and != 1. e.g. a lot of windows stuff returns BOOL (int).
This is one of MSVC's few warnings that G++ doesn't have. I'm a lot more used to g++, and it definitely warns about stuff MSVC doesn't, but that I'm glad it told me about. I wrote a portab.h header file with stubs for the MFC/Win32 classes/macros/functions I used. This got the MFC app I'm working on to compile on my GNU/Linux machine at home (and with cygwin). I mainly wanted to be able to compile-test what I was working on at home, but I ended up finding g++'s warnings very useful. It's also a lot stricter about e.g. templates...
On bool in general, I'm not sure it makes for better code when used as a return values and parameter passing. Even for locals, g++ 4.3 doesn't seem to figure out that it doesn't have to coerce the value to 0 or 1 before branching on it. If it's a local variable and you never take its address, the compiler should keep it in whatever size is fastest. If it has to spill it from registers to the stack, it could just as well keep it in 4 bytes, since that may be slightly faster. (It uses a lot of movsx (sign-extension) instructions when loading/storing (non-local) bools, but I don't really remember what it did for automatic (local stack) variables. I do remember seeing it reserve an odd amount of stack space (not a multiple of 4) in functions that had some bools locals.)
Using bool flags was slower than int with the Digital Mars D compiler as of last year:
http://www.digitalmars.com/d/archives/digitalmars/D/opEquals_needs_to_return_bool_71813.html
(D is a lot like C++, but abandons full C backwards compat to define some nice new semantics, and good support for template metaprogramming. e.g. "static if" or "static assert" instead of template hacks or cpp macros. I'd really like to give D a try sometime. :)
For data structures, it can make sense, e.g. if you want to pack a couple flags before an int and then some doubles in a struct you're going to have quite a lot of.
Based on your link to MS' explanation, it appears that if the value is merely 1 or 0, there is not performance hit, but if it's any other non-0 value that a comparison must be built at compile time?
In C++ a bool ISA int with only two values 0 = false, 1 = true. The compiler only has to check one bit. To be perfectly clear, true != 0, so any int can override bool, it just cost processing cycles to do so.
By using a long as in the code sample, you are forcing a lot more bit checks, which will cause a performance hit.
No this is not premature optimization, it is quite crazy to use code that takes more processing time across the board. This is simply good coding practice.
Unless you're writing code for a really critical inner loop (simulator core, ray-tracer, etc.) there is no point in worrying about any performance hits in this case. There are other more important things to worry about in your code (and other more significant performance traps lurking, I'm sure).
Microsoft's explanation seems to be that what they're trying to say is:
Hey, if you're using an int, but are
only storing true or false information in
it, make it a bool!
I'm skeptical about how much would be gained performance-wise, but MS may have found that there was some gain (for their use, anyway). Microsoft's code does tend to run an awful lot, so maybe they've found the micro-optimization to be worthwhile. I believe that a fair bit of what goes into the MS compiler is to support stuff they find useful themselves (only makes sense, right?).
And you get rid of some dirty, little casts to boot.
I don't think performance is the issue here. The reason you get a warning is that information is lost during conversion from int to bool.