I know that in C and C++, when casting bools to ints, (int)true == 1 and (int)false == 0. I'm wondering about casting in the reverse direction...
In the code below, all of the following assertions held true for me in .c files compiled with Visual Studio 2013 and Keil µVision 5. Notice (bool)2 == true.
What do the C and C++ standards say about casting non-zero, non-one integers to bools? Is this behavior specified? Please include citations.
#include <stdbool.h>
#include <assert.h>
void TestBoolCast(void)
{
int i0 = 0, i1 = 1, i2 = 2;
assert((bool)i0 == false);
assert((bool)i1 == true);
assert((bool)i2 == true);
assert(!!i0 == false);
assert(!!i1 == true);
assert(!!i2 == true);
}
Not a duplicate of Can I assume (bool)true == (int)1 for any C++ compiler?:
Casting in the reverse direction (int --> bool).
No discussion there of non-zero, non-one values.
Zero values of basic types (1)(2) map to false.
Other values map to true.
This convention was established in original C, via its flow control statements; C didn't have a boolean type at the time.
It's a common error to assume that as function return values, false indicates failure. But in particular from main it's false that indicates success. I've seen this done wrong many times, including in the Windows starter code for the D language (when you have folks like Walter Bright and Andrei Alexandrescu getting it wrong, then it's just dang easy to get wrong), hence this heads-up beware beware.
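To make the point about main concrete, here is a minimal sketch. The names to_exit_code and naive_exit_code are hypothetical, introduced only for illustration. It maps a "did it work?" flag to an exit status explicitly instead of returning the flag itself.

```cpp
#include <cstdlib>

// Hypothetical helper: map a success flag to a process exit status.
// EXIT_SUCCESS is what main should return on success; on mainstream
// platforms it is 0, i.e. the value that converts to false.
int to_exit_code(bool ok) {
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}

// The mistake described above, for contrast: returning the flag directly
// turns success (true) into exit status 1.
int naive_exit_code(bool ok) {
    return ok;  // true -> 1, false -> 0
}
```

A main that ends with return to_exit_code(ok); reports success the way the environment expects, whereas return ok; reports the opposite.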
There's no need to cast to bool for built-in types because that conversion is implicit. However, Visual C++ (Microsoft's C++ compiler) has a tendency to issue a performance warning (!) for this, a pure silly-warning. A cast doesn't suffice to shut it up, but a conversion via double negation, i.e. return !!x, works nicely. One can read !! as a “convert to bool” operator, much as --> can be read as “goes to”. For those who are deeply into readability of operator notation. ;-)
1) C++14 §4.12/1 “A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true. For direct-initialization (8.5), a prvalue of type std::nullptr_t can be converted to a prvalue of type bool; the resulting value is false.”
2) C99 and C11 §6.3.1.2/1 “When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1.”
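As a small illustration of the two quoted rules, the following C++ sketch checks a few conversions at compile time. The helper names (from_int, from_double, from_pointer) are made up for this example. The interesting case is 0.5: it is nonzero, so it converts to true, even though converting it to int gives 0.

```cpp
// Each function converts one kind of scalar to bool via the standard
// boolean conversion: zero/null -> false, anything else -> true.
constexpr bool from_int(int v)       { return static_cast<bool>(v); }
constexpr bool from_double(double v) { return static_cast<bool>(v); }
inline bool from_pointer(const void* p) { return static_cast<bool>(p); }

// Compile-time checks.
static_assert(from_int(2), "2 converts to true");
static_assert(from_int(-1), "-1 converts to true");
static_assert(!from_int(0), "0 converts to false");
// 0.5 is nonzero, so it is true, even though static_cast<int>(0.5) is 0.
static_assert(from_double(0.5), "0.5 converts to true");
static_assert(static_cast<int>(0.5) == 0, "0.5 truncates to 0 as an int");
// Converting the resulting bool back to int always gives 0 or 1.
static_assert(static_cast<int>(from_int(2)) == 1, "true converts back to 1");
```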
The following cites the C11 standard (final draft).
6.3.1.2: When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1.
bool (mapped by stdbool.h to the internal name _Bool for C) itself is an unsigned integer type:
... The type _Bool and the unsigned integer types that correspond to the standard signed integer types are the standard unsigned integer types.
According to 6.2.5p2:
An object declared as type _Bool is large enough to store the values 0 and 1.
AFAIK these definitions are semantically identical to C++ - with the minor difference of the built-in(!) names. bool for C++ and _Bool for C.
Note that C does not use the term rvalues as C++ does. However, in C pointers are scalars, so assigning a pointer to a _Bool behaves as in C++.
There is an old-school, 'Marxist' way to cast int -> bool without triggering warning C4800 from Microsoft's cl compiler: use the negation of the negation.
int i = 0;
bool bi = !!i;
int j = 1;
bool bj = !!j;
As far as I know, reinterpret_cast must not lead to data loss.
So it is not possible to compile code like this on x86_64, because an int is smaller than a pointer:
#include <cstdio>
int main() {
int a = 123;
int res = reinterpret_cast<int>(reinterpret_cast<void*>(a));
printf("%d", a == res);
}
The question is: why can I compile the following code with GCC and Clang?
#include <cstdio>
int main() {
__uint128_t a = 4000000000000000000;
a *= 100;
__uint128_t res = reinterpret_cast<__uint128_t>(reinterpret_cast<void*>(a));
printf("%d", a == res);
}
And the result I get is "0", which means that there is data loss.
Edit
I think there are three possibilities: a compiler bug, an abuse of the spec, or a consequence of the spec.
Which one is this?
It's explained here https://en.cppreference.com/w/cpp/language/reinterpret_cast
A pointer can be converted to any integral type large enough to hold all values of its type (e.g. to std::uintptr_t)
That's why you get an error in the first case.
A value of any integral or enumeration type can be converted to a pointer type...
That's why you don't get an error in the second case: the conversion is accepted, but the value is truncated to the width of a pointer, so the round trip loses data and the comparison prints 0. The rule in effect assumes that a pointer type has a range at least as large as any integral type, which is not the case for 128-bit integers.
Note that a 128-bit integer is not, strictly speaking, an integral type, but GCC treats it as one in its GNU extension modes:
from https://quuxplusone.github.io/blog/2019/02/28/is-int128-integral/
libstdc++ (in standard, non-gnu++XX mode) leaves is_integral_v<__int128> as false. This makes a certain amount of sense from the library implementor’s point of view, because __int128 is not one of the standard integral types, and furthermore, if you call it integral, then you have to face the consequence that intmax_t (which is 64 bits on every ABI that matters) is kind of lying about being the “max.”
but
In -std=gnu++XX mode, libstdc++ makes is_integral_v<__int128> come out to true
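For the direction that is guaranteed to be lossless, here is a minimal sketch. It assumes std::uintptr_t exists on the platform (it is an optional type), and pointer_round_trips is a name made up for this example.

```cpp
#include <cstdint>

// std::uintptr_t is optional, but where it exists it can hold any void*
// value, so reinterpret_cast to it is well-formed.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "uintptr_t must be able to hold a pointer");

// Pointer -> integer -> pointer: the original pointer value comes back
// when the integer type is large enough to hold it.
inline bool pointer_round_trips(int* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<int*>(bits) == p;
}
```

The opposite direction (integer -> pointer -> integer) has an implementation-defined mapping, which is why the 128-bit example compiles but cannot preserve a value wider than a pointer.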
I came up with a thought about the types of _Bool/ bool (stdbool.h) in C and bool in C++.
We use the boolean types to declare objects, that only shall hold the values of 0 or 1.
For example:
_Bool bin = 1;
or
bool bin = 1;
(Note: bool is a macro for _Bool inside the header file of stdbool.h.)
in C,
or
bool bin = 1;
in C++.
But are the boolean types of _Bool and bool really efficient?
I made a test to determine the size of each object in memory:
For C:
#include <stdio.h>
#include <stdbool.h> // for "bool" macro.
int main()
{
_Bool bin1 = 1;
bool bin2 = 1; // just for the sake of completeness; bool is a macro for _Bool.
printf("the size of bin1 in bytes is: %zu \n",(sizeof(bin1))); // sizeof yields size_t, so use %zu
printf("the size of bin2 in bytes is: %zu \n",(sizeof(bin2)));
return 0;
}
Output:
the size of bin1 in bytes is: 1
the size of bin2 in bytes is: 1
For C++:
#include <iostream>
int main()
{
bool bin = 1;
std::cout << "the size of bin in bytes is: " << sizeof(bin);
return 0;
}
Output:
the size of bin in bytes is: 1
So objects of a boolean type are stored in 1 byte (8 bits) of memory, not in just 1 bit, which is all they should normally require.
The reason why is discussed here: Why is a char and a bool the same size in c++?. This is not what my question is about.
My questions are:
Why do we use the types of _Bool/ bool (stdbool.h) in C and bool in C++, if they do not provide a benefit in memory storage, which is supposedly the purpose of these types?
Why can't I just use int8_t or char (assuming char has 8 bits, which is usually the case, in the specific implementation) instead?
Is it just to make it obvious to a reader of the code that the respective objects are used for 0 or 1/true or false purposes only?
Thank you very much for participating.
Why do we use the types of _Bool/ bool (stdbool.h) in C and bool in C++, if they do not provide a benefit in memory storage, which is supposedly the purpose of these types?
You already mentioned the reason in your question:
We use the boolean types to declare objects, that only shall hold the values of 0 or 1
The advantage of using a boolean data type specifically is that it can only represent true or false. The other integer types have more representable values, which is undesirable when you want only two.
Why can't I just use int8_t or char (assuming char has 8 bits, which is usually the case, in the specific implementation) instead?
You can. In fact, C didn't have a boolean data type until the C99 standard. Note that a downside of using int8_t is that it is not guaranteed to be provided by all systems. And the problem with char is that it may be either signed or unsigned.
But you don't need to, since you can use a boolean data type instead.
This implies that there is a difference when I use true or false with boolean types compared to when I use them with char or int8_t. Could you state this difference?
Consider the following trivial example:
int8_t i = some_value;
bool b = some_value;
if (i == true)
if (i)
if (b == true)
if (b)
For int8_t, those two conditionals have different behaviour, which creates an opportunity for the behaviour to be wrong if the wrong form is chosen. For a boolean, they have identical behaviour and there is no wrong choice.
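The difference can be checked directly with the value 2. This is a small sketch with made-up function names; it wraps the four conditionals from the example above in functions so they can be tested.

```cpp
#include <cstdint>

// For an integer flag, "== true" compares against the integer 1 ...
constexpr bool equals_true(std::int8_t i) { return i == true; }
// ... whereas using the value as a condition tests for nonzero.
constexpr bool is_truthy(std::int8_t i)   { return i ? true : false; }

// For a real bool, the value was normalized on conversion, so both agree.
constexpr bool bool_equals_true(bool b) { return b == true; }
constexpr bool bool_is_truthy(bool b)   { return b ? true : false; }

static_assert(is_truthy(2) && !equals_true(2), "the two forms disagree for 2");
static_assert(bool_is_truthy(2) && bool_equals_true(2), "they agree for bool");
```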
P.S. If you want to compactly store multiple boolean values (at the cost of multiple instructions per read and write), you can use std::bitset or std::vector<bool> for example. In C there are no analogous standard library utilities, but such functionality can be implemented with shifting and masking.
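A hand-rolled version of the shifting-and-masking approach might look like the sketch below. The helper names set_flag and get_flag are hypothetical, and index is assumed to be in the range 0..7.

```cpp
#include <cstdint>

// Hypothetical helpers: eight boolean flags packed into one byte.
// `index` must be in the range 0..7.
inline std::uint8_t set_flag(std::uint8_t bits, unsigned index, bool value) {
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << index);
    return value ? static_cast<std::uint8_t>(bits | mask)
                 : static_cast<std::uint8_t>(bits & ~mask);
}

inline bool get_flag(std::uint8_t bits, unsigned index) {
    return ((bits >> index) & 1u) != 0;
}
```

Each read or write costs a shift and a mask, which is the "multiple instructions" trade-off mentioned above.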
There's more than one sort of efficiency. Memory efficiency is one. Speed efficiency is another.
The original C language did not have a boolean type at all -- typically a programmer would use an int for boolean flags, 0 for false and 1 for true. If they were really concerned about memory efficiency, they might use a bitmap to store eight booleans in a byte, but this was generally only necessary in situations where memory was really scarce. But accessing an int is faster than accessing an int then unpacking its constituent bits.
_Bool/bool was introduced in C99. It reflects the common practice of storing booleans in an int.
However, it has the advantage that the compiler knows it's a boolean, so it's more difficult to accidentally assign it the value 3, add it to an integer, etc.
Most programming languages today store a boolean in a byte. Yes, it uses eight times more memory than necessary -- but it's fast, and it's rare to have so many booleans on the go at once that the waste becomes significant.
In many programming languages, the implementation is separate from the language spec -- Javascript's spec doesn't say how the Javascript runtime should store true or false. In C99 however you can rely on true being equivalent to integer 1.
If booleans are truly using too much of your system's memory, you can work with bitwise operations to store 8 booleans in an unsigned char or more in the larger types.
You can do that for runtime operations if necessary, or just when writing to an output format (if the problem is the size of records on filesystems, or network packets).
It's worth noting though, that in many, many modern applications, people are perfectly happy to represent false on the wire or on the filesystem as the 7 bytes [ '"', 'f', 'a', 'l', 's', 'e', '"' ]
One of the reasons for having bool as well as int is to increase the comprehensibility of the code to those who come after and try to maintain it.
Consider these
bool b;
int c;
if (b == c)
c = 2;
b = 2;
Now, I'd have said that comparing a boolean (true or false) with a number is very likely an error. So things like 'if (b == 1)' could indicate a coding error. I hope you'd agree that 'b = 2' is just wrong.
Benefits in memory storage (and I don't recall anything anywhere claiming that using boolean types reduced memory requirements) are not the only reason for language features.
Consider the program below.
All comparisons are true with a recent gcc but only the value 1 compares equal with the Visual Studio commandline compiler v. 19.16.27031.1 for x86.
I believe that it's generally OK to write into PODs through char pointers; but is there wording in the standard about writing funny values into bool variables? If it is allowed, is there wording about the behavior in comparisons?
#include <iostream>
using namespace std;
void f()
{
if(sizeof(bool) != 1)
{
cout << "sizeof(bool) != 1\n";
return;
}
bool b;
*(char *)&b = 1;
if(b == true) { cout << (int) *(char *)&b << " is true\n"; }
*(char *)&b = 2;
if(b == true) { cout << (int) *(char *)&b << " is true\n"; }
*(char *)&b = 3;
if(b == true) { cout << (int) *(char *)&b << " is true\n"; }
}
int main()
{
f();
}
P.S. gcc 8.3 uses a test instruction to effectively check for non-zero while gcc 9.1 explicitly compares with 1, making only that comparison true. Perhaps this godbolt link works.
No. This is not OK.
Writing arbitrary data into a bool is very much UB (see What is the strict aliasing rule?) and is similar to Does the C++ standard allow for an uninitialized bool to crash a program?
*(char *)&b = 2;
This type-punning hack invokes UB. Depending on your compiler's implementation of bool and the optimizations it is allowed to do, you could have demons flying out of your nose.
Consider:
bool b;
b = char{2}; // 1
(char&)b = 2; // 2
*(char*)&b = 2; // 3
Here, lines 2 and 3 have the same meaning, but 1 has a different meaning. In line 1, since the value being assigned to the bool object is nonzero, the result is guaranteed to be true. However, in lines 2 and 3, the object representation of the bool object is being written to directly.
It is indeed legal to write to an object of any non-const type through an lvalue of type char, but:
In C++17, the standard does not specify the representation of bool objects. The bool type may have padding bits, and may even be larger than char. Thus, any attempt to write directly to a bool value in this way may yield an invalid (or "trap") object representation, which means that subsequently reading that value will yield undefined behaviour. Implementations may (but are not required by the standard to) define the representation of bool objects.
In C++20, my understanding is that thanks to P1236R1, there are no longer any trap representations, but the representation of bool is still not completely specified. The bool object may still be larger than char, so if you write to only the first byte of it, it can still contain an indeterminate value, yielding UB when accessed. If bool is 1 byte (which is likely), then the result is unspecified---it must yield some valid value of the underlying type (which will most likely be char or its signed or unsigned cousin) but the mapping of such values to true and false remains unspecified.
Writing any integer values into a bool through a pointer to a type other than bool is undefined behavior, because those may not match the compiler's representation of the type. And yes, writing something other than 0 or 1 will absolutely break things: compilers often rely on the exact internal representation of boolean true.
But bool b = 3 is fine, and just sets b to true (the rule for converting from integer types to bool is, any nonzero value becomes true and zero becomes false).
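If the byte comes from outside (a file, a buffer, a register), a safer pattern is to convert it by value rather than overwrite the bool's storage. A minimal sketch, with hypothetical helper names:

```cpp
// Turn an externally supplied byte into a bool by conversion, not by
// writing into the bool's object representation.
inline bool bool_from_byte(unsigned char raw) {
    return raw != 0;
}

// Going the other way is also done by value: true -> 1, false -> 0.
inline unsigned char byte_from_bool(bool b) {
    return b ? 1 : 0;
}
```

With this, the bytes 2 and 3 both become a properly formed true, and comparing the results behaves as expected.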
It's OK to assign values other than true and false to a variable of type bool.
The RHS is converted to a bool by using the standard conversion sequence to true/false before the value is assigned.
However, what you are trying to do is not OK.
*(char *)&b = 2; // Not OK
*(char *)&b = 3; // Not OK
Even assigning 1 and 0 by using that mechanism is not OK.
*(char *)&b = 1; // Not OK
*(char *)&b = 0; // Not OK
The following statements are OK.
b = 2; // OK
b = 3; // OK
Update, in response to OP's comment.
From the standard/basic.types#basic.fundamental-6:
Values of type bool are either true or false.
The standard does not mandate that true be represented as 1 and/or false be represented as 0. An implementation can choose a representation that best suits their needs.
The standard goes on to say this about value of bool types:
Using a bool value in ways described by this International Standard as “undefined,” such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false.
Storing the value char(1) or char(0) in its memory location indirectly does not guarantee that the values will be properly converted to true/false. Since those values may not represent either true or false in an implementation, accessing them would lead to undefined behavior.
In general, it's perfectly fine to assign values other than 0 or 1 to a bool:
7.3.14 Boolean conversions
[conv.bool]
1 A prvalue of arithmetic, unscoped enumeration, pointer, or pointer-to-member type can be converted to a prvalue of type bool. A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true.
But your casting is another question entirely.
Be careful thinking it's ok to write to types through pointers to something else. You can get very surprising results, and the optimizer is allowed to assume certain such things are not done. I don't know all the rules for it, but the optimizer doesn't always follow writes through pointers to different types (it is allowed to do all sorts of things in the presence of undefined behavior!) But beware, code like this:
bool f()
{
bool a = true;
bool b = true;
*reinterpret_cast<char*>(&a) = 1;
*reinterpret_cast<char*>(&b) = 2;
return a == b;
}
Live: https://godbolt.org/z/hJnuSi
With optimizations:
g++: -> true (but the value is actually 2)
clang: -> false
int main() {
std::cout << f() << "\n"; // g++ prints 2!!!
}
Though f() returns a bool, g++ actually prints out 2 in main here. Probably not expected.
I am interested in whether the standard says anything about the possible values of the bool type after casting it to an integer type.
For example, the following code:
#include <iostream>
using namespace std;
int main() {
bool someValue=false;
*((int*)(&someValue)) = 50;
cout << someValue << endl;
return 0;
}
prints 1 even though it's forced to store the value 50. Does the standard specify anything about this? Or does the compiler generate some method for type bool like:
operator int(){
return myValue !=0 ? 1 : 0;
}
Also, why is a cast like the following:
reinterpret_cast<int>(someValue) = 50;
forbidden with error
error: invalid cast from type 'bool' to type 'int'
(For all of the above I used the GCC 5.1 compiler.)
The way you are using it exhibits UB, because you write outside of the bool variable's boundaries AND you break the strict aliasing rule.
However, if you have a bool and want to use it as an int (this usually happens when you want to index into an array based on some condition), the standard mandates that a true bool converts into 1 and a false bool converts into 0, no matter what (UB obviously excluded).
For example, this is guaranteed to output 52 as long as should_add == true.
#include <iostream>
int main(){
int arr[] = {0, 10};
bool should_add = 123; // any nonzero value converts to true
int result = 42 + arr[should_add]; // true converts to index 1
std::cout << result << '\n';
}
The line *((int*)(&someValue)) = 50; is at least non-standard. The implementation could use a smaller size for bool (say 1 or 2 bytes) than for int (say 4 bytes). In that case, you would write past the end of the variable, possibly overwriting another variable.
And anyway, as you were told in a comment, thanks to the strict aliasing rule almost any access through a cast pointer can be treated as undefined behaviour by a compiler. The only almost legal one (for the strict aliasing rule) would be:
*((char *) &someValue) = 50;
on a little endian system, and
*(((char *) &someValue) + sizeof(bool) - 1) = 50;
on a big endian one (byte access has still not been forbidden).
Anyway, as the representation of bool is not specified by the standard, directly writing something into a bool can lead to true or false depending on the implementation. For example, one implementation could consider only the lowest bit (true if val&1 is 1, else false), while another could consider all bits (true for any nonzero value, false only for 0). The only thing the standard says is that converting 0 yields false and converting a nonzero value yields true.
But what is mandated by the standard is the conversion from bool to int:
4.5 Integral promotions [conv.prom]
...A prvalue of type bool can be converted to a prvalue of type int, with false becoming zero and true becoming one.
So this fully explains why displaying a bool can only give 0 or 1, even though, as the previous operation invoked UB, anything could have happened here, including this display.
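The promotion rule quoted above can be observed without any UB. A small sketch follows; count_true is a hypothetical helper added only for illustration.

```cpp
#include <type_traits>

// In arithmetic, bool operands are promoted to int first.
static_assert(std::is_same<decltype(true + true), int>::value,
              "bool + bool has type int");
static_assert(true + true == 2, "true promotes to 1");
static_assert(false + 0 == 0, "false promotes to 0");

// Hypothetical helper: count how many of three conditions hold by
// relying on the true -> 1 / false -> 0 promotion.
constexpr int count_true(bool a, bool b, bool c) {
    return a + b + c;
}
```

Passing integers such as 5, 0 and -1 to count_true first converts each argument to bool, so the result is 2 rather than 4.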
You invoked Undefined Behaviour - shame on you
Just read on an internal university thread:
#include <iostream>
using namespace std;
union zt
{
bool b;
int i;
};
int main()
{
zt w;
bool a,b;
a=1;
b=2;
cerr<<(bool)2<<static_cast<bool>(2)<<endl; //11
cerr<<a<<b<<(a==b)<<endl; //111
w.i=2;
int q=w.b;
cerr<<(bool)q<<q<<w.b<<((bool)((int)w.b))<<w.i<<(w.b==a)<<endl; //122220
cerr<<((w.b==a)?'T':'F')<<endl; //F
}
So a,b and w.b are all declared as bool. a is assigned 1, b is assigned 2, and the internal representation of w.b is changed to 2 (using a union).
This way all of a,b and w.b will be true, but a and w.b won't be equal, so this might mean that the universe is broken (true!=true)
I know this problem is more theoretical than practical (a sane programmer doesn't want to change the internal representation of a bool), but here are the questions:
Is this okay? (this was tested with g++ 4.3.3) I mean, should the compiler be aware that during boolean comparison any non-zero value might mean true?
Do you know any case where this corner case might become a real issue? (For example while loading binary data from a stream)
EDIT:
Three things:
bool and int have different sizes, that's okay. But what if I use char instead of int. Or when sizeof(bool)==sizeof(int)?
Please answer the two questions I asked, if possible. I'm actually interested in answers to the second question too, because in my honest opinion, in embedded systems (which might be 8-bit systems) this might be a real problem (or not).
New question: Is this really undefined behavior? If yes, why? If not, why? Aren't there any assumptions on the boolean comparison operators in the specs?
If you read a member of a union that is a different member than the last member which was written then you get undefined behaviour. Writing an int member and then reading the union's bool member could cause anything to happen at any subsequent point in the program.
The only exception is where the union is a union of structs and all the structs share a common initial sequence, in which case the common sequence may be read.
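A minimal sketch of that exception, using hypothetical types (IntMsg, FlagMsg, Msg) invented for this example. Both structs are standard-layout and start with the same int tag member, so tag may be read through either union member.

```cpp
// Two standard-layout structs that share a common initial sequence
// (the leading `int tag`). All names here are hypothetical.
struct IntMsg  { int tag; int  value; };
struct FlagMsg { int tag; bool flag;  };

union Msg {
    IntMsg  i;
    FlagMsg f;
};

// Reading `tag` through either member is permitted whichever of the two
// structs is active, because `tag` is in the common initial sequence.
inline int tag_of(const Msg& m) {
    return m.f.tag;
}

// Reading m.f.flag while m.i is active is NOT covered by this rule.
```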
Is this okay? (this was tested with g++ 4.3.3) I mean, should the compiler be aware that during boolean comparison any non-zero value might mean true?
Any integer value that is non zero (or pointer that is non NULL) represents true.
But when comparing integers and bool the bool is converted to int before comparison.
Do you know any case where this corner case might become a real issue? (For example while binary loading of data from a stream)
It is always a real issue.
Is this okay?
I don't know whether the specs specify anything about this. A compiler might always create a code like this: ((a!=0) && (b!=0)) || ((a==0) && (b==0)) when comparing two booleans, although this might decrease performance.
In my opinion this is not a bug, but an undefined behaviour. Although I think that every implementor should tell the users how boolean comparisons are made in their implementation.
If we go by your last code sample, both a and b are bool and are set to true by assigning 1 and 2 respectively (note that the 1 and 2 disappear; they are now just true).
So breaking down your expression:
a!=0 // true (a converted to 1 because of auto-type conversion)
b!=0 // true (b converted to 1 because of auto-type conversion)
((a!=0) && (b!=0)) => (true && true) // true ( no conversion done)
a==0 // false (a converted to 1 because of auto-type conversion)
b==0 // false (b converted to 1 because of auto-type conversion)
((a==0) && (b==0)) => (false && false) // false ( no conversion done)
((a!=0) && (b!=0)) || ((a==0) && (b==0)) => (true || false) => true
So I would always expect the above expression to be well defined and always true.
But I am not sure how this applies to your original question. When assigning an integer to a bool, the integer is converted to bool (as described several times). The actual representation of true is not defined by the standard and could be any bit pattern that fits in a bool (you may not assume any particular bit pattern).
When comparing the bool to int the bool is converted into an int first then compared.
Any real-world case
The only thing that pops in my mind, if someone reads binary data from a file into a struct, that have bool members. The problem might rise, if the file was made with an other program that has written 2 instead of 1 into the place of the bool (maybe because it was written in another programming language).
But this might mean bad programming practice.
Writing data in a binary format is not portable without knowledge of the platform.
There are problems with the size of each object.
There are problems with representation:
Integers (have endianness)
Float (representation undefined; usually depends on the underlying hardware)
Bool (binary representation is undefined by the standard)
Struct (padding between members may differ)
With all these you need to know the underlying hardware and the compiler. Different compilers or different versions of the compiler or even a compiler with different optimization flags may have different behaviors for all the above.
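One common way around these problems is to serialize field by field into a byte layout you define yourself. The sketch below assumes a made-up Record type and a made-up 5-byte little-endian format; neither comes from the discussion above.

```cpp
#include <array>
#include <cstdint>

// Hypothetical record with the kinds of members discussed above.
struct Record {
    std::uint32_t id;
    bool          active;
};

// Encode field by field into a fixed 5-byte layout: id as 4 little-endian
// bytes, then one byte that is exactly 0 or 1. No struct padding and no
// in-memory bool representation leaks into the output.
inline std::array<unsigned char, 5> encode(const Record& r) {
    std::array<unsigned char, 5> out{};
    for (int k = 0; k < 4; ++k) {
        out[k] = static_cast<unsigned char>((r.id >> (8 * k)) & 0xFFu);
    }
    out[4] = r.active ? 1 : 0;
    return out;
}

// Decode defensively: any nonzero flag byte is treated as true.
inline Record decode(const std::array<unsigned char, 5>& in) {
    Record r{};
    for (int k = 0; k < 4; ++k) {
        r.id |= static_cast<std::uint32_t>(in[k]) << (8 * k);
    }
    r.active = (in[4] != 0);
    return r;
}
```

Because decode converts the flag byte by value, a file written by another program with 2 in that position still yields a well-formed true.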
The problem with Union
union X
{
int a;
bool b;
};
As people mentioned, writing to 'a' and then reading from 'b' is undefined.
Why: because we do not know how 'a' or 'b' is represented on this hardware. Writing to 'a' will fill out the bits in 'a' but how does that reflect on the bits in 'b'. If your system used 1 byte bool and 4 byte int with lowest byte in low memory highest byte in the high memory then writing 1 to 'a' will put 1 in 'b'. But then how does your implementation represent a bool? Is true represented by 1 or 255? What happens if you put a 1 in 'b' and for all other uses of true it is using 255?
So unless you understand both your hardware and your compiler the behavior will be unexpected.
Thus these uses are undefined but not disallowed by the standard. The reason they are allowed is that you may have done the research and found that on your system with this particular compiler you can do some freaky optimization by making these assumptions. But be warned: any change in the assumptions will break your code.
Also when comparing two types the compiler will do some auto-conversions before comparison, remember the two types are converted into the same type before comparison. For comparison between integers and bool the bool is converted into an integer and then compared against the other integer (the conversion converts false to 0 and true to 1). If the objects being converted are both bool then no conversion is required and the comparison is done using boolean logic.
Normally, when assigning an arbitrary value to a bool the compiler will convert it for you:
int x = 5;
bool z = x; // automatic conversion here
The equivalent code generated by the compiler will look more like:
bool z = (x != 0) ? true : false;
However, the compiler will only do this conversion once. It would be unreasonable for it to assume that any nonzero bit pattern in a bool variable is equivalent to true, especially for doing logical operations like and. The resulting assembly code would be unwieldy.
Suffice to say that if you're using union data structures, you know what you're doing and you have the ability to confuse the compiler.
The boolean is one byte, and the integer is four bytes. When you assign 2 to the integer, the fourth byte has a value of 2, but the first byte has a value of 0. If you read the boolean out of the union, it's going to grab the first byte.
Edit: D'oh. As Oleg Zhylin points out, this only applies to a big-endian CPU. Thanks for the correction.
I believe what you're doing is called type punning:
http://en.wikipedia.org/wiki/Type_punning
Hmm strange, I am getting different output from codepad:
11
111
122222
T
The code also seems right to me, maybe it's a compiler bug?
See here
Just to write down my points of view:
Is this okay?
I don't know whether the specs specify anything about this. A compiler might always create a code like this: ((a!=0) && (b!=0)) || ((a==0) && (b==0)) when comparing two booleans, although this might decrease performance.
In my opinion this is not a bug, but an undefined behaviour. Although I think that every implementor should tell the users how boolean comparisons are made in their implementation.
Any real-world case
The only thing that pops in my mind, if someone reads binary data from a file into a struct, that have bool members. The problem might rise, if the file was made with an other program that has written 2 instead of 1 into the place of the bool (maybe because it was written in another programming language).
But this might mean bad programming practice.
One more: in embedded systems this bug might be a bigger problem, than on a "normal" system, because the programmers usually do more "bit-magic" to get the job done.
Addressing the questions posed, I think the behavior is ok and shouldn't be a problem in real world. As we don't have ^^ in C++ I would suggest !bool == !bool as a safe bool comparison technique.
This way every non-zero value in bool variable will be converted to zero and every zero is converted to some non-zero value, but most probably one and the same for any negation operation.
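For integer flags, the suggested technique can be sketched as follows (function names are made up). Note that this only normalizes ordinary values; it does not make reading a bool with a corrupted representation well-defined.

```cpp
// Compare two integers by truthiness rather than by value: operator!
// yields a genuine bool, so both sides are normalized before ==.
constexpr bool same_truth(int x, int y) { return !x == !y; }

// The missing "^^" (logical xor) can be written the same way.
constexpr bool logical_xor(int x, int y) { return !x != !y; }

static_assert(same_truth(1, 2), "1 and 2 are both true");
static_assert(!(1 == 2), "but they are not equal as integers");
static_assert(logical_xor(0, 5), "exactly one operand is true");
static_assert(!logical_xor(3, 5), "both operands are true");
```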