Why does this code compile without warnings?

Why does this code compile without warnings? - c++

I have no idea why this code complies :
int array[100];
array[-50] = 100; // Crash!!
...the compiler still compiles properly, without compiling errors, and warnings.
So why does it compile at all?

array[-50] = 100;
Actually means here:
*(array - 50) = 100;
Take into consideration this code:
int array[100];
int *b = &(a[50]);
b[-20] = 5;
This code is valid and won't crash. Compiler has no way of knowing, whether the code will crash or not and what programmer wanted to do with the array. So it does not complain.
Finally, take into consideration, that you should not rely on compiler warnings while finding bugs in your code. Compilers will not find most of your bugs, they barely try to make some hints for you to ease the bugfixing process (sometimes they even may be mistaken and point out, that valid code is buggy). Also, the standard actually never requires the compiler to emit warning, so these are only an act of good will of compiler implementers.

It compiles because the expression array[-50] is transformed to the equivalent
*(&array[0] + (-50))
which is another way of saying "take the memory address &array[0] and add to it -50 times sizeof(array[0]), then interpret the contents of the resulting memory address and those following it as an int", as per the usual pointer arithmetic rules. This is a perfectly valid expression where -50 might really be any integer (and of course it doesn't need to be a compile-time constant).
Now it's definitely true that since here -50 is a compile-time constant, and since accessing the minus 50th element of an array is almost always an error, the compiler could (and perhaps should) produce a warning for this.
However, we should also consider that detecting this specific condition (statically indexing into an array with an apparently invalid index) is something that you don't expect to see in real code. Therefore the compiler team's resources will be probably put to better use doing something else.
Contrast this with other constructs like if (answer = 42) which you do expect to see in real code (if only because it's so easy to make that typo) and which are hard to debug (the eye can easily read = as ==, whereas that -50 immediately sticks out). In these cases a compiler warning is much more productive.

The compiler is not required to catch all potential problems at compile time. The C standard allows for undefined behavior at run time (which is what happens when this program is executed). You may treat it as a legal excuse not to catch this kind of bugs.
There are compilers and static program analyzers that can do catch trivial bugs like this, though.

True compilers do (note: need to switch the compiler to clang 3.2, gcc is not user-friendly)
Compilation finished with warnings:
source.cpp:3:4: warning: array index -50 is before the beginning of the array [-Warray-bounds]
array[-50] = 100;
^ ~~~
source.cpp:2:4: note: array 'array' declared here
int array[100];
^
1 warning generated.
If you have a lesser (*) compiler, you may have to setup the warning manually though.
(*) ie, less user-friendly

The number inside the brackets is just an index. It tells you how many steps in memory to take to find the number you're requesting. array[2] means start at the beginning of array, and jump forwards two times.
You just told it to jump backwards 50 times, which is a valid statement. However, I can't imagine there being a good reason for doing this...

Related

Constexpr and array

Consider the following code snippet (of course this piece of code is not useful at all but I've simplified it just to demonstrate my question) :
constexpr std::array<char*, 5> params_name
{
"first_param",
"second_param",
"third_param",
"fourth_param",
"fifth_param"
};
int main()
{
std::vector<std::string> a_vector;
for (int i = 0; i < params_name.size(); ++i) {
a_vector.push_back(params_name[i]);
}
}
I would like to be sure understanding what happens to the for loop during compilation. Is the loop unrolled and becomes ? :
a_vector.push_back("first_param")
a_vector.push_back("second_param")
a_vector.push_back("third_param")
a_vector.push_back("fourth_param")
a_vector.push_back("fifth_param")
If it's the case, is the behaviour identical regardless the number of elements contained in the params_name array ? If yes, then I'm wondering whether it could be more interesting just to store those values in a regular array built at run time to avoid code expansion ?
Thanks in advance for your help.

One problem with your code is that at present std::array isn't constexpr-enabled. You can work around this by simply using a regular array such as
constexpr char const * const my_array[5] = { /* ... */ };
As for your question:
All constexpr really means is "this value is known at compile time".
Is the loop unrolled and becomes ?
I don't know. It depends on your compiler, architecture, standard library implementation, and optimization settings. I wouldn't think about this too much. You can be confident that at reasonable optimization levels (-O1 and -O2) your compiler will weigh the benefits and drawbacks of doing this vs not, and pick a good option.
If it's the case, is the behaviour identical regardless the number of elements contained in the params_name array ?
Yes! It doesn't matter whether the compiler unrolls the loop. When your code runs, it will appear to behave exactly like what you wrote. This is called the "as-if" rule, meaning that no matter what optimizations the compiler does, the resulting program must behave "as-if" it does what you wrote (assuming your code doesn't invoke undefined behavior).
could be more interesting just to store those values in a regular array built at run time to avoid code expansion?
From where would these values come if you did? From standard input? From a file? If yes, then the compiler can't know what they will be or how many there will be, so it has little choice but to make a runtime loop. If no, then even if the array is not constexpr, the compiler is likely smart enough to figure out what you mean and optimize the program to be the same as with a constexpr array.
To summarize: don't worry about things like loop unrolling or code duplication. Modern compilers are pretty smart and will usually generate the right code for your situation. The amount of extra memory spent on loop unrolling like this is usually more than offset by the performance improvements. Unless you're on an embedded system where every byte matters, just don't worry about it.

How much do C/C++ compilers optimize conditional statements?

I recently ran into a situation where I wrote the following code:
for(int i = 0; i < (size - 1); i++)
{
// do whatever
}
// Assume 'size' will be constant during the duration of the for loop
When looking at this code, it made me wonder how exactly the for loop condition is evaluated for each loop. Specifically, I'm curious as to whether or not the compiler would 'optimize away' any additional arithmetic that has to be done for each loop. In my case, would this code get compiled such that (size - 1) would have to be evaluated for every loop iteration? Or is the compiler smart enough to realize that the 'size' variable won't change, thus it could precalculate it for each loop iteration.
This then got me thinking about the general case where you have a conditional statement that may specify more operations than necessary.
As an example, how would the following two pieces of code compile:
if(6)
if(1+1+1+1+1+1)
int foo = 1;
if(foo + foo + foo + foo + foo + foo)
How smart is the compiler? Will the 3 cases listed above be converted into the same machine code?
And while I'm at, why not list another example. What does the compiler do if you are doing an operation within a conditional that won't have any effect on the end result? Example:
if(2*(val))
// Assume val is an int that can take on any value
In this example, the multiplication is completely unnecessary. While this case seems a lot stupider than my original case, the question still stands: will the compiler be able to remove this unnecessary multiplication?
Question:
How much optimization is involved with conditional statements?
Does it vary based on compiler?

Short answer: the compiler is exceptionally clever, and will generally optimise those cases that you have presented (including utterly ignoring irrelevant conditions).
One of the biggest hurdles language newcomers face in terms of truly understanding C++, is that there is not a one-to-one relationship between their code and what the computer executes. The entire purpose of the language is to create an abstraction. You are defining the program's semantics, but the computer has no responsibility to actually follow your C++ code line by line; indeed, if it did so, it would be abhorrently slow as compared to the speed we can expect from modern computers.
Generally speaking, unless you have a reason to micro-optimise (game developers come to mind), it is best to almost completely ignore this facet of programming, and trust your compiler. Write a program that takes the inputs you want, and gives the outputs you want, after performing the calculations you want… and let your compiler do the hard work of figuring out how the physical machine is going to make all that happen.
Are there exceptions? Certainly. Sometimes your requirements are so specific that you do know better than the compiler, and you end up optimising. You generally do this after profiling and determining what your bottlenecks are. And there's also no excuse to write deliberately silly code. After all, if you go out of your way to ask your program to copy a 50MB vector, then it's going to copy a 50MB vector.
But, assuming sensible code that means what it looks like, you really shouldn't spend too much time worrying about this. Because modern compilers are so good at optimising, that you'd be a fool to try to keep up.

The C++ language specification permits the compiler to make any optimization that results in no observable changes to the expected results.
If the compiler can determine that size is constant and will not change during execution, it can certainly make that particular optimization.
Alternatively, if the compiler can also determine that i is not used in the loop (and its value is not used afterwards), that it is used only as a counter, it might very well rewrite the loop to:
for(int i = 1; i < size; i++)
because that might produce smaller code. Even if this i is used in some fashion, the compiler can still make this change and then adjust all other usage of i so that the observable results are still the same.
To summarize: anything goes. The compiler may or may not make any optimization change as long as the observable results are the same.

Yes, there is a lot of optimization, and it is very complex.
It varies based on the compiler, and it also varies based on the compiler options
Check
https://meta.stackexchange.com/questions/25840/can-we-stop-recommending-the-dragon-book-please
for some book recomendations if you really want to understand what a compiler may do. It is a very complex subject.
You can also compile to assembly with the -S option (gcc / g++) to see what the compiler is really doing. Use -O3 / ... / -O0 / -O to experiment with different optimization levels.

Why don't modern C++ compilers optimize away simple loops like this? (Clang, MSVC)

When I compile and run this code with Clang (-O3) or MSVC (/O2)...
#include <stdio.h>
#include <time.h>
static int const N = 0x8000;
int main()
{
clock_t const start = clock();
for (int i = 0; i < N; ++i)
{
int a[N]; // Never used outside of this block, but not optimized away
for (int j = 0; j < N; ++j)
{
++a[j]; // This is undefined behavior (due to possible
// signed integer overflow), but Clang doesn't see it
}
}
clock_t const finish = clock();
fprintf(stderr, "%u ms\n",
static_cast<unsigned int>((finish - start) * 1000 / CLOCKS_PER_SEC));
return 0;
}
... the loop doesn't get optimized away.
Furthermore, neither Clang 3.6 nor Visual C++ 2013 nor GCC 4.8.1 tells me that the variable is uninitialized!
Now I realize that the lack of an optimization isn't a bug per se, but I find this astonishing given how compilers are supposed to be pretty smart nowadays. This seems like such a simple piece of code that even liveness analysis techniques from a decade ago should be able to take care of optimizing away the variable a and therefore the whole loop -- never mind the fact that incrementing the variable is already undefined behavior.
Yet only GCC is able to figure out that it's a no-op, and none of the compilers tells me that this is an uninitialized variable.
Why is this? What's preventing simple liveness analysis from telling the compiler that a is unused? Moreover, why isn't the compiler detecting that a[j] is uninitialized in the first place? Why can't the existing uninitialized-variable-detectors in all of those compilers catch this obvious error?

The undefined behavior is irrelevant here. Replacing the inner loop with:
for (int j = 1; j < N; ++j)
{
a[j-1] = a[j];
a[j] = j;
}
... has the same effect, at least with Clang.
The issue is that the inner loop both loads from a[j] (for some j) and stores to a[j] (for some j). None of the stores can be removed, because the compiler believes they may be visible to later loads, and none of the loads can be removed, because their values are used (as input to the later stores). As a result, the loop still has side-effects on memory, so the compiler doesn't see that it can be deleted.
Contrary to n.m.'s answer, replacing int with unsigned does not make the problem go away. The code generated by Clang 3.4.1 using int and using unsigned int is identical.

It's an interesting issue with regards to optimizing. I would
expect that in most cases, the compiler would treat each element
of the array as an individual variable when doing dead code
analysis. Ans 0x8000 make too many individual variables to
track, so the compiler doesn't try. The fact that a[j]
doesn't always access the the same object could cause problems
as well for the optimizer.
Obviously, different compilers use different heuristics;
a compiler could treat the array as a single object, and detect
that it never affected output (observable behavior). Some
compilers may choose not to, however, on the grounds that
typically, it's a lot of work for very little gain: how often
would such optimizations be applicable in real code?

++a[j]; // This is undefined behavior too, but Clang doesn't see it
Are you saying this is undefined behavior because the array elements are uninitialized?
If so, although this is a common interpretation of clause 4.1/1 in the standard I believe it is incorrect. The elements are 'uninitialized' in the sense that programmers usually use this term, but I do not believe this corresponds exactly to the C++ specification's use of the term.
In particular C++11 8.5/11 states that these objects are in fact default initialized, and this seems to me to be mutually exclusive with being uninitialized. The standard also states that for some objects being default initialized means that 'no initialized is performed'. Some might assume this means that they are uninitialized but this is not specified and I simply take it to mean that no such performance is required.
The spec does make clear that the array elements will have indeterminant values. C++ specifies, by reference to the C standard, that indeterminant values can be either valid representations, legal to access normally, or trap representations. If the particular indeterminant values of the array elements happen to all be valid representations, (and none are INT_MAX, avoiding overflow) then the above line does not trigger any undefined behavior in C++11.
Since these array elements could be trap representations it would be perfectly conformant for clang to act as though they are guaranteed to be trap representations, effectively choosing to make the code UB in order to create an optimization opportunity.
Even if clang doesn't do that it could still choose to optimize based on the dataflow. Clang does know how to do that, as demonstrated by the fact that if the inner loop is changed slightly then the loops do get removed.
So then why does the (optional) presence of UB seem to stymie optimization, when UB is usually taken as an opportunity for more optimization?
What may be going on is that clang has decided that users want int trapping based on the hardware's behavior. And so rather than taking traps as an optimization opportunity, clang has to generate code which faithfully reproduces the program behavior in hardware. This means that the loops cannot be eliminated based on dataflow, because doing so might eliminate hardware traps.
C++14 updates the behavior such that accessing indeterminant values itself produces undefined behavior, independent of whether one considers the variable uninitialized or not: https://stackoverflow.com/a/23415662/365496

That is indeed very interesting. I tried your example with MSVC 2013.
My first idea was that the fact that the ++a[j] is somewhat undefined is the reason why the loop is not removed, because removing this would definetly change the meaning of the program from an undefined/incorrect semantic to something meaningful, so I tried to initialize the values before but the loops still did not dissappear.
Afterwards I replaced the ++a[j]; with an a[j] = 0; which then produced an output without any loop so everything between the two calls to clock() was removed. I can only guess about the reason. Perhaps the optimizer is not able to prove that the operator++ has no side effects for any reason.

GCC: program doesn't work with compilation option -O3

I'm writing a C++ program that doesn't work (I get a segmentation fault) when I compile it with optimizations (options -O1, -O2, -O3, etc.), but it works just fine when I compile it without optimizations.
Is there any chance that the error is in my code? or should I assume that this is a bug in GCC?
My GCC version is 3.4.6.
Is there any known workaround for this kind of problem?
There is a big difference in speed between the optimized and unoptimized version of my program, so I really need to use optimizations.
This is my original functor. The one that works fine with no levels of optimizations and throws a segmentation fault with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
return distance(point,p1) < distance(point,p2) ;
}
} ;
And this one works flawlessly with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
float d1=distance(point,p1) ;
float d2=distance(point,p2) ;
std::cout << "" ; //without this line, I get a segmentation fault anyways
return d1 < d2 ;
}
} ;
Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.

Now that you posted the code fragment and a working workaround was found (#Windows programmer's answer), I can say that perhaps what you are looking for is -ffloat-store.
-ffloat-store
Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory.
This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.
Source: http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Optimize-Options.html

I would assume your code is wrong first.
Though it is hard to tell.
Does your code compile with 0 warnings?
g++ -Wall -Wextra -pedantic -ansi

Here's some code that seems to work, until you hit -O3...
#include <stdio.h>
int main()
{
int i = 0, j = 1, k = 2;
printf("%d %d %d\n", *(&j-1), *(&j), *(&j+1));
return 0;
}
Without optimisations, I get "2 1 0"; with optimisations I get "40 1 2293680". Why? Because i and k got optimised out!
But I was taking the address of j and going out of the memory region allocated to j. That's not allowed by the standard. It's most likely that your problem is caused by a similar deviation from the standard.
I find valgrind is often helpful at times like these.
EDIT: Some commenters are under the impression that the standard allows arbitrary pointer arithmetic. It does not. Remember that some architectures have funny addressing schemes, alignment may be important, and you may get problems if you overflow certain registers!
The words of the [draft] standard, on adding/subtracting an integer to/from a pointer (emphasis added):
"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
Seeing as &j doesn't even point to an array object, &j-1 and &j+1 can hardly point to part of the same array object. So simply evaluating &j+1 (let alone dereferencing it) is undefined behaviour.
On x86 we can be pretty confident that adding one to a pointer is fairly safe and just takes us to the next memory location. In the code above, the problem occurs when we make assumptions about what that memory contains, which of course the standard doesn't go near.

As an experiment, try to see if this will force the compiler to round everything consistently.
volatile float d1=distance(point,p1) ;
volatile float d2=distance(point,p2) ;
return d1 < d2 ;

The error is in your code. It's likely you're doing something that invokes undefined behavior according to the C standard which just happens to work with no optimizations, but when GCC makes certain assumptions for performing its optimizations, the code breaks when those assumptions aren't true. Make sure to compile with the -Wall option, and the -Wextra might also be a good idea, and see if you get any warnings. You could also try -ansi or -pedantic, but those are likely to result in false positives.

You may be running into an aliasing problem (or it could be a million other things). Look up the -fstrict-aliasing option.
This kind of question is impossible to answer properly without more information.

It is very seldom the compiler fault, but compiler do have bugs in them, and them often manifest themselves at different optimization levels (if there is a bug in an optimization pass, for example).
In general when reporting programming problems: provide a minimal code sample to demonstrate the issue, such that people can just save the code to a file, compile and run it. Make it as easy as possible to reproduce your problem.
Also, try different versions of GCC (compiling your own GCC is very easy, especially on Linux). If possible, try with another compiler. Intel C has a compiler which is more or less GCC compatible (and free for non-commercial use, I think). This will help pinpointing the problem.

It's almost (almost) never the compiler.
First, make sure you're compiling warning-free, with -Wall.
If that didn't give you a "eureka" moment, attach a debugger to the least optimized version of your executable that crashes and see what it's doing and where it goes.
5 will get you 10 that you've fixed the problem by this point.

Ran into the same problem a few days ago, in my case it was aliasing. And GCC does it differently, but not wrongly, when compared to other compilers. GCC has become what some might call a rules-lawyer of the C++ standard, and their implementation is correct, but you also have to be really correct in you C++, or it'll over optimize somethings, which is a pain. But you get speed, so can't complain.

I expect to get some downvotes here after reading some of the comments, but in the console game programming world, it's rather common knowledge that the higher optimization levels can sometimes generate incorrect code in weird edge cases. It might very well be that edge cases can be fixed with subtle changes to the code, though.

Alright...
This is one of the weirdest problems I've ever had.
I dont think I have enough proof to state it's a GCC bug, but honestly... It really looks like one.
This is my original functor. The one that works fine with no levels of optimizations and throws a segmentation fault with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
return distance(point,p1) < distance(point,p2) ;
}
} ;
And this one works flawlessly with any level of optimization:
struct distanceToPointSort{
indexedDocument* point ;
distanceToPointSort(indexedDocument* p): point(p) {}
bool operator() (indexedDocument* p1,indexedDocument* p2){
float d1=distance(point,p1) ;
float d2=distance(point,p2) ;
std::cout << "" ; //without this line, I get a segmentation fault anyways
return d1 < d2 ;
}
} ;
Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.

Wow, I didn't expect answers so quicly, and so many...
The error occurs upon sorting a std::vector of pointers using std::sort()
I provide the strict-weak-ordering functor.
But I know the functor I provide is correct because I've used it a lot and it works fine.
Plus, the error cannot be some invalid pointer in the vector becasue the error occurs just when I sort the vector. If I iterate through the vector without applying std::sort first, the program works fine.
I just used GDB to try to find out what's going on. The error occurs when std::sort invoke my functor. Aparently std::sort is passing an invalid pointer to my functor. (of course this happens with the optimized version only, any level of optimization -O, -O2, -O3)

as other have pointed out, probably strict aliasing.
turn it of in o3 and try again. My guess is that you are doing some pointer tricks in your functor (fast float as int compare? object type in lower 2 bits?) that fail across inlining template functions.
warnings do not help to catch this case. "if the compiler could detect all strict aliasing problems it could just as well avoid them" just changing an unrelated line of code may make the problem appear or go away as it changes register allocation.

As the updated question will show ;) , the problem exists with a std::vector<T*>. One common error with vectors is reserve()ing what should have been resize()d. As a result, you'd be writing outside array bounds. An optimizer may discard those writes.

post the code in distance! it probably does some pointer magic, see my previous post. doing an intermediate assignment just hides the bug in your code by changing register allocation. even more telling of this is the output changing things!

The true answer is hidden somewhere inside all the comments in this thread. First of all: it is not a bug in the compiler.
The problem has to do with floating point precision. distanceToPointSort should be a function that should never return true for both the arguments (a,b) and (b,a), but that is exactly what can happen when the compiler decides to use higher precision for some data paths. The problem is especially likely on, but by no means limited to, x86 without -mfpmath=sse. If the comparator behaves that way, the sort function can become confused, and the segmentation fault is not surprising.
I consider -ffloat-store the best solution here (already suggested by CesarB).

What is the performance implication of converting to bool in C++?

[This question is related to but not the same as this one.]
My compiler warns about implicitly converting or casting certain types to bool whereas explicit conversions do not produce a warning:
long t = 0;
bool b = false;
b = t; // performance warning: forcing long to bool
b = (bool)t; // performance warning
b = bool(t); // performance warning
b = static_cast<bool>(t); // performance warning
b = t ? true : false; // ok, no warning
b = t != 0; // ok
b = !!t; // ok
This is with Visual C++ 2008 but I suspect other compilers may have similar warnings.
So my question is: what is the performance implication of casting/converting to bool? Does explicit conversion have better performance in some circumstance (e.g., for certain target architectures or processors)? Does implicit conversion somehow confuse the optimizer?
Microsoft's explanation of their warning is not particularly helpful. They imply that there is a good reason but they don't explain it.

I was puzzled by this behaviour, until I found this link:
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99633
Apparently, coming from the Microsoft Developer who "owns" this warning:
This warning is surprisingly
helpful, and found a bug in my code
just yesterday. I think Martin is
taking "performance warning" out of
context.
It's not about the generated code,
it's about whether or not the
programmer has signalled an intent to
change a value from int to bool.
There is a penalty for that, and the
user has the choice to use "int"
instead of "bool" consistently (or
more likely vice versa) to avoid the
"boolifying" codegen. [...]
It is an old warning, and may have
outlived its purpose, but it's
behaving as designed here.
So it seems to me the warning is more about style and avoiding some mistakes than anything else.
Hope this will answer your question...
:-p

The performance is identical across the board. It involves a couple of instructions on x86, maybe 3 on some other architectures.
On x86 / VC++, they all do
cmp DWORD PTR [whatever], 0
setne al
GCC generates the same thing, but without the warnings (at any warning-level).

The performance warning does actually make a little bit of sense. I've had it as well and my curiousity led me to investigate with the disassembler. It is trying to tell you that the compiler has to generate some code to coerce the value to either 0 or 1. Because you are insisting on a bool, the old school C idea of 0 or anything else doesn't apply.
You can avoid that tiny performance hit if you really want to. The best way is to avoid the cast altogether and use a bool from the start. If you must have an int, you could just use if( int ) instead of if( bool ). The code generated will simply check whether the int is 0 or not. No extra code to make sure the value is 1 if it's not 0 will be generated.

Sounds like premature optimization to me. Are you expecting that the performance of the cast to seriously effect the performance of your app? Maybe if you are writing kernel code or device drivers but in most cases, they all should be ok.

As far as I know, there is no warning on any other compiler for this. The only way I can think that this would cause a performance loss is that the compiler has to compare the entire integer to 0 and then assign the bool appropriately (unlike a conversion such as a char to bool, where the result can be copied over because a bool is one byte and so they are effectively the same), or an integral conversion which involves copying some or all of the source to the destination, possibly after a zero of the destination if it's bigger than the source (in terms of memory).
It's yet another one of Microsoft's useless and unhelpful ideas as to what constitutes good code, and leads us to have to put up with stupid definitions like this:
template <typename T>
inline bool to_bool (const T& t)
{ return t ? true : false; }

long t;
bool b;
int i;
signed char c;
...
You get a warning when you do anything that would be "free" if bool wasn't required to be 0 or 1. b = !!t is effectively assigning the result of the (language built-in, non-overrideable) bool operator!(long)
You shouldn't expect the ! or != operators to cost zero asm instructions even with an optimizing compiler. It is usually true that int i = t is usually optimized away completely. Or even signed char c = t; (on x86/amd64, if t is in the %eax register, after c = t, using c just means using %al. amd64 has byte addressing for every register, BTW. IIRC, in x86 some registers don't have byte addressing.)
Anyway, b = t; i = b; isn't the same as c = t; i = c; it's i = !!t; instead of i = t & 0xff;
Err, I guess everyone already knows all that from the previous replies. My point was, the warning made sense to me, since it caught cases where the compiler had to do things you didn't really tell it to, like !!BOOL on return because you declared the function bool, but are returning an integral value that could be true and != 1. e.g. a lot of windows stuff returns BOOL (int).
This is one of MSVC's few warnings that G++ doesn't have. I'm a lot more used to g++, and it definitely warns about stuff MSVC doesn't, but that I'm glad it told me about. I wrote a portab.h header file with stubs for the MFC/Win32 classes/macros/functions I used. This got the MFC app I'm working on to compile on my GNU/Linux machine at home (and with cygwin). I mainly wanted to be able to compile-test what I was working on at home, but I ended up finding g++'s warnings very useful. It's also a lot stricter about e.g. templates...
On bool in general, I'm not sure it makes for better code when used as a return values and parameter passing. Even for locals, g++ 4.3 doesn't seem to figure out that it doesn't have to coerce the value to 0 or 1 before branching on it. If it's a local variable and you never take its address, the compiler should keep it in whatever size is fastest. If it has to spill it from registers to the stack, it could just as well keep it in 4 bytes, since that may be slightly faster. (It uses a lot of movsx (sign-extension) instructions when loading/storing (non-local) bools, but I don't really remember what it did for automatic (local stack) variables. I do remember seeing it reserve an odd amount of stack space (not a multiple of 4) in functions that had some bools locals.)
Using bool flags was slower than int with the Digital Mars D compiler as of last year:
http://www.digitalmars.com/d/archives/digitalmars/D/opEquals_needs_to_return_bool_71813.html
(D is a lot like C++, but abandons full C backwards compat to define some nice new semantics, and good support for template metaprogramming. e.g. "static if" or "static assert" instead of template hacks or cpp macros. I'd really like to give D a try sometime. :)
For data structures, it can make sense, e.g. if you want to pack a couple flags before an int and then some doubles in a struct you're going to have quite a lot of.

Based on your link to MS' explanation, it appears that if the value is merely 1 or 0, there is not performance hit, but if it's any other non-0 value that a comparison must be built at compile time?

In C++ a bool ISA int with only two values 0 = false, 1 = true. The compiler only has to check one bit. To be perfectly clear, true != 0, so any int can override bool, it just cost processing cycles to do so.
By using a long as in the code sample, you are forcing a lot more bit checks, which will cause a performance hit.
No this is not premature optimization, it is quite crazy to use code that takes more processing time across the board. This is simply good coding practice.

Unless you're writing code for a really critical inner loop (simulator core, ray-tracer, etc.) there is no point in worrying about any performance hits in this case. There are other more important things to worry about in your code (and other more significant performance traps lurking, I'm sure).

Microsoft's explanation seems to be that what they're trying to say is:
Hey, if you're using an int, but are
only storing true or false information in
it, make it a bool!
I'm skeptical about how much would be gained performance-wise, but MS may have found that there was some gain (for their use, anyway). Microsoft's code does tend to run an awful lot, so maybe they've found the micro-optimization to be worthwhile. I believe that a fair bit of what goes into the MS compiler is to support stuff they find useful themselves (only makes sense, right?).
And you get rid of some dirty, little casts to boot.

I don't think performance is the issue here. The reason you get a warning is that information is lost during conversion from int to bool.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js