I came up with a thought about the types _Bool/bool (stdbool.h) in C and bool in C++.
We use the boolean types to declare objects that shall only hold the values 0 or 1.
For example:
_Bool bin = 1;
or
bool bin = 1;
(Note: bool is a macro for _Bool defined in the header stdbool.h.)
in C,
or
bool bin = 1;
in C++.
But are the boolean types of _Bool and bool really efficient?
I wrote a test to determine the size of each object in memory:
For C:
#include <stdio.h>
#include <stdbool.h> // for the "bool" macro.

int main(void)
{
    _Bool bin1 = 1;
    bool bin2 = 1; // just for the sake of completeness; bool is a macro for _Bool.
    printf("the size of bin1 in bytes is: %zu \n", sizeof(bin1)); // %zu is the correct specifier for size_t
    printf("the size of bin2 in bytes is: %zu \n", sizeof(bin2));
    return 0;
}
Output:
the size of bin1 in bytes is: 1
the size of bin2 in bytes is: 1
For C++:
#include <iostream>

int main()
{
    bool bin = 1;
    std::cout << "the size of bin in bytes is: " << sizeof(bin);
    return 0;
}
Output:
the size of bin in bytes is: 1
So, objects of a boolean type are stored in 1 byte (8 bits) of memory, not in just 1 bit, which is all they logically require.
The reason why is discussed here: Why is a char and a bool the same size in c++?. This is not what my question is about.
My questions are:
Why do we use the types _Bool/bool (stdbool.h) in C and bool in C++, if they do not provide a benefit in memory storage, which is supposedly the specific purpose of these types?
Why can't I just use int8_t or char instead (assuming char is 8 bits wide, which is usually the case, in the specific implementation)?
Is it just to make it obvious to a reader of the code that the respective objects are used for 0/1, i.e. true/false, purposes only?
Thank you very much for participating.
Why do we use the types _Bool/bool (stdbool.h) in C and bool in C++, if they do not provide a benefit in memory storage, which is supposedly the specific purpose of these types?
You already mentioned the reason in your question:
We use the boolean types to declare objects that shall only hold the values 0 or 1
The advantage of using a boolean datatype specifically is that it can only represent true or false. The other integer types have more representable values, which is undesirable when you want only two.
Why can't I just use int8_t or char instead (assuming char is 8 bits wide, which is usually the case, in the specific implementation)?
You can. In fact, C didn't have a boolean data type until the C99 standard. Note that the downside of using int8_t is that it is not guaranteed to be provided by all implementations. And the problem with char is that it may be either signed or unsigned.
But you don't need to, since you can use boolean data type instead.
This implies that there is a difference when I use true or false with boolean types in comparison to when I use them with char or int8_t. Could you state this difference?
Consider the following trivial example:
int8_t i = some_value;
bool b = some_value;

if (i == true) // true only when i is exactly 1
if (i)         // true for any non-zero i
if (b == true) // b can only hold true or false, so this...
if (b)         // ...and this behave identically
For int8_t, those two conditionals have different behaviour, which creates an opportunity for the behaviour to be wrong if the wrong form is chosen. For a boolean, they have identical behaviour and there is no wrong choice.
P.S. If you want to compactly store multiple boolean values (at the cost of multiple instructions per read and write), you can use std::bitset or std::vector<bool> for example. In C there are no analogous standard library utilities, but such functionality can be implemented with shifting and masking.
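As an illustration, here is a minimal sketch of that shifting and masking (written in C++, though the same idea works in C); the helper names set_flag and get_flag are invented for this example:
#include <cstdint>

// set the bit at position pos (0..7) to the given value
inline void set_flag(std::uint8_t& flags, unsigned pos, bool value) {
    if (value)
        flags |= static_cast<std::uint8_t>(1u << pos);    // set the bit
    else
        flags &= static_cast<std::uint8_t>(~(1u << pos)); // clear the bit
}

// read the bit at position pos (0..7)
inline bool get_flag(std::uint8_t flags, unsigned pos) {
    return ((flags >> pos) & 1u) != 0;
}

int main() {
    std::uint8_t flags = 0;      // storage for 8 boolean values in one byte
    set_flag(flags, 3, true);
    bool b = get_flag(flags, 3); // b == true
    return b ? 0 : 1;
}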
There's more than one sort of efficiency. Memory efficiency is one. Speed efficiency is another.
The original C language did not have a boolean type at all -- typically a programmer would use an int for boolean flags, 0 for false and 1 for true. If they were really concerned about memory efficiency, they might use a bitmap to store eight booleans in a byte, but this was generally only necessary in situations where memory was really scarce. But accessing an int is faster than accessing an int then unpacking its constituent bits.
_Bool/bool was introduced in C99. It reflects the common practice of storing booleans in an int.
However, it has the advantage that the compiler knows it's a boolean, so it's more difficult to accidentally assign it the value 3, add it to an integer, etc.
Most programming languages today store a boolean in a byte. Yes, it uses eight times more memory than necessary -- but it's fast, and it's rare to have so many booleans on the go at once that the waste becomes significant.
In many programming languages, the implementation is separate from the language spec -- Javascript's spec doesn't say how the Javascript runtime should store true or false. In C99 however you can rely on true being equivalent to integer 1.
If booleans are truly using too much of your system's memory, you can work with bitwise operations to store 8 booleans in an unsigned char or more in the larger types.
You can do that for runtime operations if necessary, or just when writing to an output format (if the problem is the size of records on filesystems, or network packets).
It's worth noting, though, that in many, many modern applications, people are perfectly happy to represent false on the wire or on the filesystem as the 7 bytes [ '"', 'f', 'a', 'l', 's', 'e', '"' ].
One of the reasons for having bool as well as int is to increase the comprehensibility of the code to those who come after and try to maintain it.
Consider these
bool b;
int c;
if (b == c)
c = 2;
b = 2;
Now, I'd have said that comparing a boolean (true or false) with a number is very likely an error. So things like 'if (b == 1)' could indicate a coding error. I hope you'd agree that 'b = 2' is just wrong.
Benefits in memory storage (and I don't recall anything anywhere claiming that using boolean types reduced memory requirements) are not the only reason for language features.
There was a similar question here, but the user in that question seemed to have a much larger array, or vector. If I have:
bool boolArray[4];
And I want to check if all elements are false. I can check [0], [1], [2] and [3] either separately, or I can loop through it. Since (as far as I know) false should have the value 0 and anything other than 0 is true, I thought about simply doing:
if ( *(int*) boolArray) { }
This works, but I realize that it relies on bool being one byte and int being four bytes. If I cast to std::uint32_t instead, would it be OK, or is it still a bad idea? I just happen to have 3 or 4 bools in an array and was wondering if this is safe, and if not, whether there is a better way to do it.
Also, in the case I end up with more than 4 bools but less than 8, can I do the same thing with a std::uint64_t or unsigned long long or something?
As πάντα ῥεῖ noticed in comments, std::bitset is probably the best way to deal with that in UB-free manner.
std::bitset<4> boolArray {};
if(boolArray.any()) {
//do the thing
}
If you want to stick to arrays, you could use std::any_of, but this requires the (possibly peculiar-looking) use of a predicate which just returns its argument:
bool boolArray[4];
if (std::any_of(std::begin(boolArray), std::end(boolArray), [](bool b){ return b; })) {
//do the thing
}
Type-punning 4 bools to int might be a bad idea - you cannot be sure of the size of each of the types. It probably will work on most architectures, but std::bitset is guaranteed to work everywhere, under any circumstances.
Several answers have already explained good alternatives, particularly std::bitset and std::any_of(). I am writing separately to point out that, unless you know something we don't, it is not safe to type pun between bool and int in this fashion, for several reasons:
int might not be four bytes, as multiple answers have pointed out.
M.M points out in the comments that bool might not be one byte. I'm not aware of any real-world architectures in which this has ever been the case, but it is nevertheless spec-legal. It (probably) can't be smaller than a byte unless the compiler is doing some very elaborate hide-the-ball chicanery with its memory model, and a multi-byte bool seems rather useless. Note however that a byte need not be 8 bits in the first place.
int can have trap representations. That is, it is legal for certain bit patterns to cause undefined behavior when they are cast to int. This is rare on modern architectures, but might arise on (for example) ia64, or any system with signed zeros.
Regardless of whether you have to worry about any of the above, your code violates the strict aliasing rule, so compilers are free to "optimize" it under the assumption that the bools and the int are entirely separate objects with non-overlapping lifetimes. For example, the compiler might decide that the code which initializes the bool array is a dead store and eliminate it, because the bools "must have" ceased to exist* at some point before you dereferenced the pointer. More complicated situations can also arise relating to register reuse and load/store reordering. All of these infelicities are expressly permitted by the C++ standard, which says the behavior is undefined when you engage in this kind of type punning.
You should use one of the alternative solutions provided by the other answers.
* It is legal (with some qualifications, particularly regarding alignment) to reuse the memory pointed to by boolArray by casting it to int and storing an integer, although if you actually want to do this, you must then pass boolArray through std::launder if you want to read the resulting int later. Regardless, the compiler is entitled to assume that you have done this once it sees the read, even if you don't call launder.
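For the curious, a minimal C++17 sketch of what that legal reuse could look like, assuming sizeof(bool) == 1; this is illustrative, not a recommendation:
#include <new> // placement new, std::launder

alignas(int) bool boolArray[sizeof(int)]; // assuming sizeof(bool) == 1

void reuse_storage()
{
    new (boolArray) int(42); // ends the bools' lifetimes, creates an int in their storage
    // Reading it back through the old name requires std::launder (C++17):
    int v = *std::launder(reinterpret_cast<int*>(boolArray));
    (void)v; // v == 42
}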
You can use std::bitset<N>::any:
Any returns true if any of the bits are set to true, otherwise false.
#include <iostream>
#include <bitset>

int main()
{
    std::bitset<4> foo;
    // modify foo here

    if (foo.any())
        std::cout << foo << " has " << foo.count() << " bits set.\n";
    else
        std::cout << foo << " has no bits set.\n";

    return 0;
}
If you want to return true when all or none of the bits are set, you can use std::bitset<N>::all or std::bitset<N>::none respectively.
The standard library has what you need in the form of the std::all_of, std::any_of, std::none_of algorithms.
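For the original question ("are all elements false?"), std::none_of maps onto it directly; a minimal sketch, with the helper name allFalse invented for this example:
#include <algorithm> // std::none_of
#include <iterator>  // std::begin, std::end

bool boolArray[4] = {}; // all false

bool allFalse()
{
    // true exactly when no element of boolArray is true
    return std::none_of(std::begin(boolArray), std::end(boolArray),
                        [](bool b) { return b; });
}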
...And for the obligatory "roll your own" answer, we can provide a simple "or"-like function for any array bool[N], like so:
template<size_t N>
constexpr bool or_all(const bool (&bs)[N]) {
for (bool b : bs) {
if (b) { return b; }
}
return false;
}
This also has the benefit of both short-circuiting like ||, and being optimised out entirely if calculable at compile time.
Apart from that, if you want to examine the original idea of type-punning bool[N] to some other type to simplify observation, I would very much recommend that you don't; do it by viewing it as char[N2] instead, where N2 == sizeof(bool) * N. This would allow you to provide a simple representation viewer that can automatically scale to the viewed object's actual size, allow iteration over its individual bytes, and allow you to more easily determine whether the representation matches specific values (such as, e.g., zero or non-zero). I'm not entirely sure off the top of my head whether such examination would invoke any UB, but I can say for certain that any such type's construction cannot be a viable constant expression, due to requiring a reinterpret_cast to char* or unsigned char* or similar (either explicitly, or in std::memcpy()), and thus couldn't as easily be optimised out.
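A minimal sketch of such a byte-level view, using std::memcpy to stay clear of strict-aliasing problems (the names bytes and dumpRepresentation are invented for this example):
#include <cstdio>  // std::printf
#include <cstring> // std::memcpy

bool boolArray[4] = { false, true, false, false };

void dumpRepresentation()
{
    unsigned char bytes[sizeof boolArray];
    std::memcpy(bytes, boolArray, sizeof boolArray); // copy the object representation
    for (unsigned char byte : bytes)
        std::printf("%02x ", byte); // e.g. "00 01 00 00" on a typical implementation
    std::printf("\n");
}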
The function std::isdigit is:
int isdigit(int ch);
The return value (non-zero if the character is a numeric character, zero otherwise) smells like the function was inherited from C, but even that does not explain why the parameter type is int and not char, while at the same time...
The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF.
Is there any technical reason why isdigit takes an int and not a char?
The reason is to allow EOF as input. And EOF is (from here):
EOF integer constant expression of type int and negative value
The accepted answer is correct, but I believe the question deserves more detail.
A char in C++ is either signed or unsigned depending on your implementation (and, yet, it's a distinct type from signed char and unsigned char).
Where C grew up, char was typically unsigned and assumed to be an n-bit byte that could represent [0..2^n-1]. (Yes, there were some machines that had byte sizes other than 8 bits.) In fact, chars were considered virtually indistinguishable from bytes, which is why functions like memcpy take char * rather than something like uint8_t *, why sizeof(char) is always 1, and why CHAR_BIT isn't named BYTE_BIT.
But the C standard, which was the baseline for C++, only promised that char could hold any value in the execution character set. They might hold additional values, but there was no guarantee. The source character set (basically 7-bit ASCII minus some control characters) required something like 97 values. For a while, the execution character set could be smaller, but in practice it almost never was. Eventually there was an explicit requirement that a char be large enough to hold an 8-bit byte.
But the range was still uncertain. If unsigned, you could rely on [0..255]. Signed chars, however, could--in theory--use a sign+magnitude representation that would give you a range of [-127..127]. Note that's only 255 unique values, not 256 values ([-128..127]) like you'd get from two's complement. If you were language lawyerly enough, you could argue that you cannot store every possible value of an 8-bit byte in a char even though that was a fundamental assumption throughout the design of the language and its run-time library. I think C++20 finally closed that apparent loophole by requiring that all signed integer types, including signed char, use two's complement.
When it came time to design fundamental input/output functions, they had to think about how to return a value or a signal that you've reached the end of the file. It was decided to use a special value rather than an out-of-band signaling mechanism. But what value to use? The Unix folks generally had [128..255] available and others had [-128..-1].
But that's only if you're working with text. The Unix/C folks thought of textual characters and binary byte values as the same thing. So getc() was also for reading bytes from a binary file. All 256 possible values of a char, regardless of its signedness, were already claimed.
K&R C (before the first ANSI standard) didn't require function prototypes. The compiler made assumptions about parameter and return types. This is why C and C++ have the "default promotions," even though they're less important now than they once were. In effect, you couldn't return anything smaller than an int from a function. If you did, it would just be converted to int anyway.
The natural solution was therefore to have getc() return an int containing either the character value or a special end-of-file value, imaginatively dubbed EOF, a macro for -1.
The default promotions not only mandated a function couldn't return an integral type smaller than an int, they also made it difficult to pass in a small type. So int was also the natural parameter type for functions that expected a character. And thus we ended up with function signatures like int isdigit(int ch).
If you're a Posix fan, this is basically all you need.
For the rest of us, there's a remaining gotcha: If your chars are signed, then -1 might represent a legitimate character in your execution character set. How can you distinguish between them?
The answer is that functions don't really traffic in char values at all. They're really using unsigned char values dressed up as ints.
int x = getc(source_file);
if (x == EOF) { /* reached end of file */ }
else if (0 <= x && x < 128) { /* plain 7-bit character */ }
else if (128 <= x && x < 256) {
    // Here it gets interesting.
    bool b1 = isdigit(x);                             // OK
    bool b2 = isdigit(static_cast<char>(x));          // NOT PORTABLE
    bool b3 = isdigit(static_cast<unsigned char>(x)); // CORRECT!
}
What is the advantage of using the bool data type over integer values 0 and 1 for true and false, and which is more efficient to use?
And how does it differ between languages like C and C++?
int was used in C when there was no bool; there is no advantage1 to using an integer type to represent a boolean value in C++ or in modern C:
bool is meaningful, int is not:
bool try_something();
int try_something(); // What does this return?
It is common in C to have functions returning int to indicate success or failure, but a lot (most?) of these do not return 1 on success and 0 on failure; they follow the UNIX convention, which is to return 0 in case of success and something else in case of error, so you get code like this:
int close(int fd);
if (close(fd)) { /* Something bad happened... */ }
See the close manual: it returns 0 on success and -1 on failure. For a new user this code is disconcerting; you expect close to return something that is true if it succeeds, not the opposite.
If you need to store a large amount of boolean values, you may want to optimize the storage by using only one bit to represent each value (against 32 bits if you store it in an int), and in C++, std::vector<bool>2 is already specialized to do that, so:
std::vector<bool> bvec(8000); // Size ~ 1000 bytes (optimization of std::vector<bool>)
std::vector<int> ivec(8000); // Size ~ 32000 bytes (32-bits int)
1 There are no advantages if you use them in the same way (e.g. by doing bool b = f(); vs. int b = f();, or the vector<bool> above (if you can replace int by bool without problems in your code, then do so)), but if you use an int to store multiple boolean values using bitwise operations, that is another question.
2 Note that this only applies to std::vector<bool>; other containers such as std::array<bool, N> are not optimized, because they cannot use a proxy object to "represent" a bit value.
It's mainly a style issue and so it's hard to prove one way is correct. C allows the syntax if (x), where the branch is executed if x is non-zero. So "true" can be a bit of a trap: if (x == true) doesn't always mean what you think (see the sketch below). On the other hand, return true is a lot clearer than return 1 in a function like is_valid(). bool can be more memory efficient, but that can be an illusion; often it's padded to four bytes anyway for efficiency reasons.
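Here is a minimal sketch of that trap, assuming x is an int holding a generic non-zero value:
#include <cstdio>

int main()
{
    int x = 2; // non-zero, so "truthy" in the if (x) sense

    if (x)
        std::puts("if (x): taken - any non-zero value counts");

    if (x == true) // true promotes to the int 1, and 2 != 1
        std::puts("if (x == true): taken");
    else
        std::puts("if (x == true): NOT taken");
}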
The main issue with bool, though again it is a style issue, is that
mywdgt = openwidget("canvaswidget", 256, 256, true);
obviously means open or create a widget, which is a canvas, and is 256 x 256 pixels. But what is the last parameter?
mywdgt = openwidget("canvaswidget", 256, 256, ALLOW_ALPHA);
is a lot clearer. You know what the parameter is and what it does at a glance. So bool arguments should be avoided in function signatures; use a define instead and say what the flag means.
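A compile-only sketch of that style, using an enum rather than a #define; openwidget, Widget and the flag names are invented for this example:
struct Widget; // opaque widget type, invented for this sketch

enum WidgetFlags { NO_ALPHA = 0, ALLOW_ALPHA = 1 };

// The flag's meaning is now visible at every call site.
Widget *openwidget(const char *kind, int width, int height, WidgetFlags flags);

void example()
{
    Widget *mywdgt = openwidget("canvaswidget", 256, 256, ALLOW_ALPHA);
    (void)mywdgt;
}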
bool - the built-in Boolean type. A bool can have the values true and false.
So, if you have a scenario in which a boolean type makes sense, e.g., a flag or return value to denote yes/no or success/failure, you would consider using bool.
But if the values involved can be more than just true/false, then an int would be more appropriate; bool would not be, because its values are restricted to true/false.
To save memory it is highly recommended to use bool, e.g. if you are just checking a condition (true or false) or something like that.
A bool occupies only 1 byte (generally the smallest addressable size) of memory, whereas an int may take 2 or 4 bytes or more, depending on the compiler.
So it is very much advisable to use bool instead of int to store only 1 or 0. Why waste memory if you can do it in less?
And bool is guaranteed to hold only the values true or false, so it is appropriate for conditional statements.
I'm working on a microcontroller with only 2KB of SRAM and desperately need to conserve some memory. Trying to work out how I can put 8 0/1 values into a single byte using a bitfield but can't quite work it out.
#include <cstdint>  // int8_t
#include <iostream>
using std::cout;

struct Bits
{
    int8_t b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};

int main(){
    Bits b;
    b.b0 = 0;
    b.b1 = 1;
    cout << (int)b.b0; // outputs 0, correct
    cout << (int)b.b1; // outputs -1, should be outputting 1
}
What gives?
All of your bitfield members are signed 1-bit integers. On a two's complement system, that means they can represent only either 0 or -1. Use uint8_t if you want 0 and 1:
struct Bits
{
    uint8_t b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};
As a word of caution - the standard doesn't really enforce an implementation scheme for bitfields. There is no guarantee that Bits will be 1 byte, and hypothetically it is entirely possible for it to be larger.
In practice, however, actual implementations usually follow the obvious logic and it will "almost always" be 1 byte in size; but again, there is no guarantee. Just in case you want to be sure, you can check manually (see below).
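For example, a one-line compile-time check (the message text is arbitrary):
static_assert(sizeof(Bits) == 1, "Bits does not fit in one byte on this implementation");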
BTW, -1 is still "true" in a boolean context, but -1 != true.
As noted, these variables consist of only a sign bit, so the only available values are 0 and -1.
A more appropriate type for these bitfields would be bool. C++14 §9.6/4:
If the value true or false is stored into a bit-field of type bool of any size (including a one bit bit-field), the original bool value and the value of the bit-field shall compare equal.
Yes, std::uint8_t will do the job, but you might as well use the best fit. You won't need things like the cast for std::cout << (int)b.b0;.
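So the struct from the question could, for instance, be declared with bool bit-fields instead:
struct Bits
{
    bool b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};
With this, b.b1 = 1; reads back as true and streams as 1, with no cast needed.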
Signed and unsigned integers are the answer.
Keep in mind that signedness is just an interpretation of bits; -1 or 1 is just the 'print' serializer interpreting the variable's type as it was "revealed" to the cout functions (see operator overloading) by the compiler. The bit is the same, and so is its value (on/off), since you have only 1 bit.
Don't worry too much about that, but it is good practice to be explicit, so prefer to declare your variable as unsigned; this instructs the compiler to emit the proper code when you set or get the value, and tells a serializer like 'print' (cout) how to interpret it.
"COUT" OPERATOR OVERLOADING:
cout works through a series of functions, where parameter overloading instructs the compiler which function to call. So there are two functions, one receiving an unsigned value and another a signed one; thus they can interpret the same data differently, and you can change which is called by instructing the compiler with a cast. See cout << myclass.
Just read this on an internal university thread:
#include <iostream>
using namespace std;

union zt
{
    bool b;
    int i;
};

int main()
{
    zt w;
    bool a,b;
    a=1;
    b=2;
    cerr<<(bool)2<<static_cast<bool>(2)<<endl; //11
    cerr<<a<<b<<(a==b)<<endl; //111
    w.i=2;
    int q=w.b;
    cerr<<(bool)q<<q<<w.b<<((bool)((int)w.b))<<w.i<<(w.b==a)<<endl; //122220
    cerr<<((w.b==a)?'T':'F')<<endl; //F
}
So a,b and w.b are all declared as bool. a is assigned 1, b is assigned 2, and the internal representation of w.b is changed to 2 (using a union).
This way all of a,b and w.b will be true, but a and w.b won't be equal, so this might mean that the universe is broken (true!=true)
I know this problem is more theoretical than practical (a sane programmer doesn't want to change the internal representation of a bool), but here are the questions:
Is this okay? (this was tested with g++ 4.3.3) I mean, should the compiler be aware that during boolean comparison any non-zero value might mean true?
Do you know any case where this corner case might become a real issue? (For example while loading binary data from a stream)
EDIT:
Three things:
bool and int have different sizes, and that's okay. But what if I use char instead of int? Or what about when sizeof(bool) == sizeof(int)?
Please give answer to the two questions I asked if possible. I'm actually interested in answers to the second questions too, because in my honest opinion, in embedded systems (which might be 8bit systems) this might be a real problem (or not).
New question: Is this really undefined behavior? If yes, why? If not, why? Aren't there any assumptions on the boolean comparison operators in the specs?
If you read a member of a union that is a different member than the last member which was written then you get undefined behaviour. Writing an int member and then reading the union's bool member could cause anything to happen at any subsequent point in the program.
The only exception is where the unions is a union of structs and all the structs contain a common initial sequence, in which case the common sequence may be read.
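A minimal sketch of that exception, with invented names:
struct TaggedInt   { int tag; int   i; };
struct TaggedFloat { int tag; float f; };

union Tagged
{
    TaggedInt   ti;
    TaggedFloat tf;
};

// Writing t.ti.tag and then reading t.tf.tag is allowed, because both
// structs share the common initial sequence "int tag;".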
Is this okay? (this was tested with g++ 4.3.3) I mean, should the compiler be aware that during boolean comparison any non-zero value might mean true?
Any integer value that is non zero (or pointer that is non NULL) represents true.
But when comparing integers and bool the bool is converted to int before comparison.
Do you know any case where this corner case might become a real issue? (For example while binary loading of data from a stream)
It is always a real issue.
Is this okay?
I don't know whether the specs specify anything about this. A compiler might always create code like ((a!=0) && (b!=0)) || ((a==0) && (b==0)) when comparing two booleans, although this might decrease performance.
In my opinion this is not a bug but undefined behaviour. Although I think that every implementor should tell the users how boolean comparisons are made in their implementation.
If we go by your last code sample, both a and b are bool and set to true by assigning 1 and 2 respectively (note: the 1 and 2 disappear; they are now just true).
So breaking down your expression:
a!=0 // true (a converted to 1 because of auto-type conversion)
b!=0 // true (b converted to 1 because of auto-type conversion)
((a!=0) && (b!=0)) => (true && true) // true ( no conversion done)
a==0 // false (a converted to 1 because of auto-type conversion)
b==0 // false (b converted to 1 because of auto-type conversion)
((a==0) && (b==0)) => (false && false) // false ( no conversion done)
((a!=0) && (b!=0)) || ((a==0) && (b==0)) => (true || false) => true
So I would always expect the above expression to be well defined and always true.
But I am not sure how this applies to your original question. When assigning an integer to a bool, the integer is converted to bool (as described several times). The actual representation of true is not defined by the standard and could be any bit pattern that fits in a bool (you may not assume any particular bit pattern).
When comparing the bool to int the bool is converted into an int first then compared.
Any real-world case
The only thing that pops into my mind: if someone reads binary data from a file into a struct that has bool members. The problem might arise if the file was made with another program that has written 2 instead of 1 into the place of the bool (maybe because it was written in another programming language).
But this might mean bad programming practice.
Writing data in a binary format is non-portable without knowledge of the underlying representation:
There are problems with the size of each object.
There are problems with representation:
Integers (have endianness)
Float (representation is undefined; it usually depends on the underlying hardware)
Bool (binary representation is undefined by the standard)
Struct (padding between members may differ)
With all these you need to know the underlying hardware and the compiler. Different compilers or different versions of the compiler or even a compiler with different optimization flags may have different behaviors for all the above.
The problem with Union
union X
{
    int a;
    bool b;
};
As people mention, writing to 'a' and then reading from 'b' is undefined.
Why: because we do not know how 'a' or 'b' is represented on this hardware. Writing to 'a' will fill out the bits in 'a', but how does that reflect on the bits in 'b'? If your system uses a 1-byte bool and a 4-byte int with the lowest byte in low memory and the highest byte in high memory, then writing 1 to 'a' will put 1 in 'b'. But then how does your implementation represent a bool? Is true represented by 1 or 255? What happens if you put a 1 in 'b' while all other uses of true are using 255?
So unless you understand both your hardware and your compiler the behavior will be unexpected.
Thus these uses are undefined but not disallowed by the standard. The reason they are allowed is that you may have done the research and found that on your system, with this particular compiler, you can do some freaky optimization by making these assumptions. But be warned: any change in the assumptions will break your code.
Also, when comparing two types the compiler will do some auto-conversions before comparison; the two operands are converted into the same type before being compared. For comparison between integers and bool, the bool is converted into an integer and then compared against the other integer (the conversion converts false to 0 and true to 1). If both operands are bool then no conversion is required and the comparison is done using boolean logic.
Normally, when assigning an arbitrary value to a bool the compiler will convert it for you:
int x = 5;
bool z = x; // automatic conversion here
The equivalent code generated by the compiler will look more like:
bool z = (x != 0) ? true : false;
However, the compiler will only do this conversion once. It would be unreasonable for it to assume that any nonzero bit pattern in a bool variable is equivalent to true, especially for doing logical operations like and. The resulting assembly code would be unwieldy.
Suffice to say that if you're using union data structures, you know what you're doing and you have the ability to confuse the compiler.
The boolean is one byte, and the integer is four bytes. When you assign 2 to the integer, the fourth byte has a value of 2, but the first byte has a value of 0. If you read the boolean out of the union, it's going to grab the first byte.
Edit: D'oh. As Oleg Zhylin points out, this only applies to a big-endian CPU. Thanks for the correction.
I believe what you're doing is called type punning:
http://en.wikipedia.org/wiki/Type_punning
Hmm strange, I am getting different output from codepad:
11
111
122222
T
The code also seems right to me, maybe it's a compiler bug?
Just to write down one more point of view: in embedded systems this bug might be a bigger problem than on a "normal" system, because programmers usually do more "bit-magic" to get the job done.
Addressing the questions posed, I think the behavior is OK and shouldn't be a problem in the real world. As we don't have ^^ (a logical XOR) in C++, I would suggest !bool == !bool as a safe bool comparison technique (see the sketch below).
This way every non-zero value in a bool variable is converted to zero (false), and every zero is converted to some non-zero value, most probably one and the same for any negation operation, so both sides are normalized before the comparison.
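A minimal sketch of that normalization, reusing the union from the question (the read of w.b is still undefined behaviour, as discussed above):
#include <iostream>

union zt { bool b; int i; };

int main()
{
    zt w;
    bool a = true;
    w.i = 2; // forces a non-canonical representation into w.b

    std::cout << (w.b == a) << '\n';   // printed 0 with the questioner's compiler
    std::cout << (!w.b == !a) << '\n'; // ! normalizes both operands first; expected to print 1
}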