How are bools stored in C++? [duplicate]

How are bools stored in C++? [duplicate] - c++

As already discussed in the docs, a bool data type occupies at least a byte of memory. A similar question was asked on SO before (How a bool type variable is stored in memory? (C++)), but this discussion and the documentation only seem to discuss the amount of space occupied by a boolean data type, not what actually happens in memory when I do this:
bool b = true;
So what does actually happen in memory? What happens to the 7 bits that are not used in storing this information? Does the standard prescribe behavior for this?
Are they undefined? Or did someone at C++ headquarters just do this:
enum bool : char
{
false = 0,
true = 1
};

Standard states that bool values behave as integral types, yet it doesn't specify their concrete representation in memory:
"Values of type bool are either true or false. As described below, bool values behave as integral types. Values of type bool participate in integral promotions" ~ C++03 3.9.1 §6
"A synonym for integral type is integer type. The representations of integral types shall define values by use of a pure binary numeration system" ~ C++03 3.9.1 §7
Yet it defines the integral promotion from bool to int:
"An rvalue of type bool can be converted to an rvalue of type int, with false becoming zero and true becoming one. These conversions are called integral promotions." ~ C++03 4.5 §4-5
as well as conversion from other types to bool:
"A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true." ~ C++03 4.12 §1

The standard doesn't mandate anything for the binary representation of bools; it just says that, when converting to other integral types, a true bool will become 1 and a false bool will become 0.
This of course suggests an implementation similar in spirit to the one you said, where such conversions would become essentially no-ops or plain integer widening (but remember that bool is mandated to be a primitive type, not an enumeration type).

You can test things like this by copying the memory, ignoring which type it is. This program reads the raw memory value of test_bool and puts it into test_int so that you may print it.
int test_int = 0;
bool test_bool;
test_bool = true;
memcpy (&test_int, &test_bool, sizeof(bool));
printf ("True value is: %d\n", test_int);
test_bool = false;
memcpy (&test_int, &test_bool, sizeof(bool));
printf ("False value is: %d\n", test_int);
For me, this program gives:
True value is: 1
False value is: 0

I didn't program in C++ since a couple of months but AFAIR the rule is the following:
0 - false;
any value different than 0 - true; (by default it's 1 but AFAIR if you convert it from other integer value, ex. 2 will be also treated as true).
Thus you can say that C++ in some way waste memory (others too) as it could use one bit only but it's simpler to make compiler then.
In C++ you can also define bit fields (bit fields in wikipedia) but it's not used often.

Related

Why can bool and _Bool only store 0 or 1 if they occupy 1 byte in memory?

I have read the answers to that question: Why is a char and a bool the same size in c++? and made an experiment to determine the size of allocated bytes in memory of a _Bool and a bool(I know that bool is a macro for _Bool in stdbool.h but for the sake of completeness I used it too) object in C, as well a bool object in C++ on my implementation Linux Ubuntu 12.4:
For C:
#include <stdio.h>
#include <stdbool.h> // for "bool" macro.
int main()
{
_Bool bin1 = 1;
bool bin2 = 1; // just for the sake of completeness; bool is a macro for _Bool.
printf("the size of bin1 in bytes is: %lu \n",(sizeof(bin1)));
printf("the size of bin2 in bytes is: %lu \n",(sizeof(bin2)));
return 0;
}
Output:
the size of bin1 in bytes is: 1
the size of bin2 in bytes is: 1
For C++:
#include <iostream>
int main()
{
bool bin = 1;
std::cout << "the size of bin in bytes is: " << sizeof(bin);
return 0;
}
Output:
the size of bin in bytes is: 1
So, objects of a boolean type, regardless of specifically C or C++, occupy 1 byte (8 bits) in memory, not just 1 bit.
My question is:
Why do objects of the types bool and _Bool in C and bool in C++ can store only the values of 0 or 1 if they occupy 1 byte in memory which could hold 256 values?
Of course, Their purpose is to represent only the values of 0 and 1 or true and false, but which unit or macro decides that it only can store 0 or 1?
Additional, but not my main question:
And what would happen if the value of a boolean type is *accidentally modified in memory to a greater value since it can be stored in memory this way?
*With accidentally I mean either: Modified by "Undetectable means" - What are “undetectable means” and how can they change objects of a C/C++ program? or an inappropriate assignment of f.e. bool a; a = 25;.

The C language limits what can be stored in a _Bool, even if it has the capacity to hold other values besides 0 and 1.
Section 6.3.1.2 of the C standard says the following regarding conversions to _Bool:
When any scalar value is converted to _Bool, the result is 0 if the value compares equal
to 0; otherwise, the result is 1.
The C++17 standard has similar language in section 7.14:
A prvalue of arithmetic, unscoped enumeration, pointer, or pointer to member type can be converted to a
prvalue of type bool. A zero value, null pointer value, or null member pointer value is converted to false;
any other value is converted to true. For direct-initialization (11.6), a prvalue of type std::nullptr_t can
be converted to a prvalue of type bool; the resulting value is false.
So even if you attempt to assign some other value to a _Bool the language will convert the value to either 0 or 1 for C and to true or false for C++. If you attempt to bypass this by writing to a _Bool via a pointer to a different type, you invoke undefined behavior.

Answer for C++:
So, objects of a boolean type, regardless of specifically C or C++, occupy 1 byte (8 bits) in memory, not just 1 bit.
That's simply because the fundamental storage unit in the C++ memory model is the byte.
Why do objects of the type [...] bool in C++ can store only the values of 0 or 1 if they occupy 1 byte in memory which could hold 256 values?
But which unit or macro decides that it only can store 0 or 1?
The assumption here is wrong. In C++, a bool does not hold 0 or 1, it holds false or true: http://eel.is/c++draft/basic.fundamental#10.
How those two values are represented in memory is up to the implementation. An implementation could use 0 and 1, or 0 and 255, or 0 and <any nonzero value>, or anything it wants really. You are not guaranteed to find 0 or 1 when inspecting the memory of a bool, because...
If you "assign" e.g. an integer or a pointer to a bool, it is implicitly converted to either true or false according to the usual rules: http://eel.is/c++draft/conv.bool#1
If you "read" an integer from a bool, it is implicitly converted to 0 if it held the value false or 1 if it held the value true: http://eel.is/c++draft/conv.prom#6
It is the job of the compiler to ensure that the above two things hold true, regardless of how the bool values are represented in memory. Remember, C++ is specified on the abstract machine, and your program only has to behave as if executed on the abstract machine.
And what would happen if the value of a boolean type is accidentally modified in memory to a greater value?
Undefined behavior. See one of these:
Why does bool and not bool both return true in this case?
Engineered bool compares equal to both true and false, why?.
Does the C++ standard allow for an uninitialized bool to crash a program?

(Answering for C.)
But which unit or macro decides that it only can store 0 or 1?
In typical C implementations, the compiler implements this. The compiler decides (or is designed to) which instructions to use when manipulating _Bool values. It might test a _Bool with an instruction that sets a condition code according to whether the byte is zero or non-zero it might test it with an instruction that sets a condition code according to whether the low bit (for example) is zero or non-zero. The C standard does not impose any requirements on this. Each C implementation is free to choose its own implementation.
And what would happen if the value of a boolean type is accidentally modified in memory to a greater value?
This depends on the C implementation. A greater value might be treated as 1, if the implementation is testing zero versus non-zero. A greater value might be treated according to its low bit, if the implementation is using that. A greater value might behave differently in different circumstances, if the implementation uses varied instructions according to circumstances. A greater value also might cause results that would be otherwise nonsensical. For example, given int x = 4; and some _Bool y that has been inappropriately modified by writing to its memory, int z = x + y; might set z to 10 even though only 4 or 5 would be possible if y were a proper _Bool. When you modify the representation of a type to something other than bits that represent a proper value as defined by the implementation, the resulting behavior is not defined by the C standard, or, generally, by the C implementation.
Would it even be possible and permissible to assign a greater value to a boolean type?
No, assignments convert the right operand to the type of the assignment expression (which is the type of the left operand, except as a value rather than an lvalue).

Why do objects of the types bool and _Bool in C and bool in C++ can store only the values of 0 or 1 if they occupy 1 byte in memory which could hold 256 values?
Because, in the end, the language specification doesn't say how large a bool is, it only defines what it can do. The C language specification says that _Bool can hold a 0 or a 1. The size of a bool data type is a detail of individual implementations, not a part of the specification itself. It's possible to have an implementation that actually allocates individual bits for a bool, it's possible to have a specification that allocates multiple bytes for a bool. So to stay inline with the specification, the important part isn't the size of the memory allocated, but that it performs according to the specification which means it holds a 0 or a 1.
And what would happen if the value of a boolean type is accidentally modified in memory to a greater value since it can be stored in memory this way?
Undefined behavior I expect. I don't think that the specification says what happens, and as a result what happens is up to the implementor. One implementation may examine the first bit of the underlying memory and ignore the rest. Another implementation may examine the entire underlying memory location and if any of the bits are set, give a value of 1.
The word of caution...
You can write a program to see what your implementation does with such data and write programs that work will for your implementation, but know that you're not testing what 'C' does, you're testing what that particular implementation/compiler will do. Also, know that once you start treading into the waters of undefined behavior, you also start treading into the waters of things that will break programs for reasons that you may not understand. Compilers will apply a wide variety of optimizations based on a number of assumptions. The compiler may write a program that performs perfectly fine while you're doing a bunch of work, you finish it, you tell the compiler to create an optimized release version and because you've been digging into undefined behaviors you broke an assumption the compiler made and, it may apply an optimization that suddenly breaks your code and tracking it down may prove tremendously difficult. Always try to stick within well defined behaviors.

Why do objects of the types bool and _Bool in C and bool in C++ can store only the values of 0 or 1 if they occupy 1 byte in memory which could hold 256 values?
If bool can store the whole value range of a char then why don't just use char?
Of course, Their purpose is to represent only the values of 0 and 1 or true and false, but which unit or macro decides that it only can store 0 or 1?
The compiler will handle the conversion when you assign a value to a bool variable. If it's truthy then the variable will contains true. That behavior was defined in the C and C++ standards. That means bool a; a = 25; is completely valid and not "an inappropriate assignment" as you though. After that a will always contain true/1. You can never set a bool to anything other than 0 and 1 via normal variable assignment
There's no problem using a char or an int as bool like how it was before modern C and C++, but by limiting the value range it also allows the compiler to do a lot of optimizations. For example bool x = !y; will be done by a simple XOR instruction, which will not work if y contains any values other than 0 and 1. If y is a normal integer type then you'll need to normalize y to 0 and 1 first. See demo
As a matter of fact, not all bits in the representation have to involve in value calculation, and not all bit patterns have to be valid. C and C++ allow types to contain padding bits and trap representations, so a 32-bit type may have only 30 value bits, or be able to store only 232-4 different values. This is not to say that bool definitely contains padding bits, just a proof that you are permitted to have a type narrower than the possible range
The only exception we are aware of is _Bool (as observed by Joseph Myers wrt GCC). It seems one could either (a) take non {0,1} values to be trap representations in the current sense, or (b) regard operations on non {0,1} values of that type as giving an unspecified value. The latter would bound possible misbehaviour, which would be good for programmers; the only possible downside we are aware of is that it could limit compilation via computed branch tables indexed by unchecked _Bool values.
N2091: Clarifying Trap Representations (Draft Defect Report or Proposal for C2x)
However some implementations do consider them trap representations
In fact, as implemented by GCC and Clang, the _Bool type has two values and 254 trap representations.
Trap representations and padding bits - Pascal Cuoq
And what would happen if the value of a boolean type is accidentally modified in memory to a greater value?
If you manipulate the value of the bool to another value directly via a pointer then in C++ undefined behavior will happen
6.9.1 Fundamental types
Values of type bool are either true or false. 50 [Note: There are no signed, unsigned, short, or long bool types or values. — end note] Values of type bool participate in integral promotions (7.6).
50) Using a bool value in ways described by this International Standard as “undefined”, such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false.
C++17
I couldn't find the reference in C99 but it will be undefined behavior if the value you set is a trap representation
6.2.6 Representations of types
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.41) Such a representation is called a trap representation.
There are already many questions regarding that "weird" behavior
_Bool type and strict aliasing
Setting extra bits in a bool makes it true and false at the same time
Why does bool and not bool both return true in this case?
Weird results for conditional operator with GCC and bool pointers
Engineered bool compares equal to both true and false, why?

using Boolean to represent integers

In c++, bool is used to represent Boolean. that is it holds true or false. But in some case we can use bool to represent integers also.
what is the meaning of bool a=5; in c++?

"what is the meaning of bool a=5; in c++?"
It's actually equivalent to writing
bool a = (5 != 0);
"But in some case we can use bool to represent integers also."
Not really. The bool value only represents whether the integer used to initialize it was zero (-> false) or not (-> true).
The other way round (as mentioned in #Daniel Frey's comment) false will be converted back to an integer 0, and true will become 1.
So the original integer value's (or any other expression results like pointers, besides nullptr, or double values not exactly representing 0.0) will be lost.
Conclusion
As mentioned in LRiO's answer, it's not possible to store information other than false or true in a bool variable.
There are guaranteed rules of conversion though (citation from cppreference.com):
The safe bool problem
Until the introduction of explicit conversion functions in C++11, designing a class that should be usable in boolean contexts (e.g. if(obj) { ... }) presented a problem: given a user-defined conversion function, such as T::operator bool() const;, the implicit conversion sequence allowed one additional standard conversion sequence after that function call, which means the resultant bool could be converted to int, allowing such code as obj << 1; or int i = obj;.
One early solution for this can be seen in std::basic_ios, which defines operator! and operator void* (until C++11), so that the code such as if(std::cin) {...} compiles because void* is convertible to bool, but int n = std::cout; does not compile because void* is not convertible to int. This still allows nonsense code such as delete std::cout; to compile, and many pre-C++11 third party libraries were designed with a more elaborate solution, known as the Safe Bool idiom.

No, we can't.
It's just that there's an implicit conversion from integer to boolean.
Zero becomes false; anything else becomes true.

In fact this declaration
bool a=5;
is equivalent to
bool a=true;
except that in the first declaration 5 as it is not equal to zero is implicitly converted to true.
From the C++ Standard (4.12 Boolean conversions )
1 A prvalue of arithmetic, unscoped enumeration, pointer, or pointer
to member type can be converted to a prvalue of type bool. A zero
value, null pointer value, or null member pointer value is converted
to false; any other value is converted to true. For
direct-initialization (8.5), a prvalue of type std::nullptr_t can be
converted to a prvalue of type bool; the resulting value is false.
One more funny example
bool a = new int;
Of course it does not mean that we may use bool to represent pointers. Simply if the allocation is successfull then the returned poimter is not equal to zero and implicitly converted to true according to the quote I showed.
Take into account that till now some compilers have a bug and compile this code successfully
bool a = nullptr;
Though according to the same quote a valid declaration will look like
bool a( nullptr );

And, basically, I would opine that bool b=5 is "just plain wrong."
Both C and C++ have a comparatively weak system of typing. Nevertheless, I'm of the opinion that this should have produced a compile error ... and, if not, certainly a rejected code review, because it is (IMHO) more likely to be an unintentional mistake, than to be intention.
You should examine the code to determine whether b is or is not "really" being treated as a bool value. Barring some "magic bit-twiddling purpose" (which C/C++ programs sometimes are called-upon to do ...), this is probably a typo or a bug.
Vlad's response (below) shows what the standard says will happen if you do this, but I suggest that, "since it does not make human-sense to do this, it's probably an error and should be treated accordingly by the team.
A "(typecast)?" Maybe. A data-type mismatch such as this? Nyet.

How is a bool represented in memory?

As already discussed in the docs, a bool data type occupies at least a byte of memory. A similar question was asked on SO before (How a bool type variable is stored in memory? (C++)), but this discussion and the documentation only seem to discuss the amount of space occupied by a boolean data type, not what actually happens in memory when I do this:
bool b = true;
So what does actually happen in memory? What happens to the 7 bits that are not used in storing this information? Does the standard prescribe behavior for this?
Are they undefined? Or did someone at C++ headquarters just do this:
enum bool : char
{
false = 0,
true = 1
};

Standard states that bool values behave as integral types, yet it doesn't specify their concrete representation in memory:
"Values of type bool are either true or false. As described below, bool values behave as integral types. Values of type bool participate in integral promotions" ~ C++03 3.9.1 §6
"A synonym for integral type is integer type. The representations of integral types shall define values by use of a pure binary numeration system" ~ C++03 3.9.1 §7
Yet it defines the integral promotion from bool to int:
"An rvalue of type bool can be converted to an rvalue of type int, with false becoming zero and true becoming one. These conversions are called integral promotions." ~ C++03 4.5 §4-5
as well as conversion from other types to bool:
"A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true." ~ C++03 4.12 §1

The standard doesn't mandate anything for the binary representation of bools; it just says that, when converting to other integral types, a true bool will become 1 and a false bool will become 0.
This of course suggests an implementation similar in spirit to the one you said, where such conversions would become essentially no-ops or plain integer widening (but remember that bool is mandated to be a primitive type, not an enumeration type).

You can test things like this by copying the memory, ignoring which type it is. This program reads the raw memory value of test_bool and puts it into test_int so that you may print it.
int test_int = 0;
bool test_bool;
test_bool = true;
memcpy (&test_int, &test_bool, sizeof(bool));
printf ("True value is: %d\n", test_int);
test_bool = false;
memcpy (&test_int, &test_bool, sizeof(bool));
printf ("False value is: %d\n", test_int);
For me, this program gives:
True value is: 1
False value is: 0

I didn't program in C++ since a couple of months but AFAIR the rule is the following:
0 - false;
any value different than 0 - true; (by default it's 1 but AFAIR if you convert it from other integer value, ex. 2 will be also treated as true).
Thus you can say that C++ in some way waste memory (others too) as it could use one bit only but it's simpler to make compiler then.
In C++ you can also define bit fields (bit fields in wikipedia) but it's not used often.

Magic of casting value to bool

Today I realized that casting a value to a bool is a kind of magic:
int value = 0x100;
unsigned char uc = static_cast<unsigned char>(value);
bool b = static_cast<bool>(value);
Both sizeof(uc) and sizeof(b) return 1. I know that uc will contain 0x00, because only the LSB is copied. But b will be true, so my assumption is that, when casting to bool, the value gets evaluated instead of copied.
Is this assumption correct? And is this a standard C++ behavior?

There's nothing magical about it. The conversion from int to unsigned char is defined as value % 256 (for 8-bit chars), so that is what you get. It can be implemented as copying the LSB, but you should still think about it in the semantic sense, not the implementation.
On the same vein, conversion of int to bool is defined as value != 0, so again, that is what you get.
Integral (and boolean) conversions are covered by the C++11 standard in [conv.integral] and [conv.bool]. For the C-style cast, see [expr.cast] and [expr.static.cast].

It is a part of the standard:
4.12 Boolean conversions [conv.bool]
1 A prvalue of arithmetic, unscoped enumeration, pointer, or pointer
to member type can be converted to a prvalue of type bool. A zero
value, null pointer value, or null member pointer value is converted
to false; any other value is converted to true. A prvalue of type
std::nullptr_t can be converted to a prvalue of type bool; the
resulting value is false.

Yes, when casting to bool, the value gets evaluated, not copying.
In fact, in your example, as long as value is not 0, b will be true.
Update: Quoting from C++ Primer 5th Edition Chapter 2.1.2:
When we assign one of the nonbool arithmetic types to a bool object, the
result is false if the value is 0 and true otherwise.

According to the rules for C-style casts, (bool)value is effectively a static_cast. The first rule for static_cast then kicks in, which computes the value of a temporary "declared and initialized ... as by new_type Temp(expression);", i.e. bool Temp(value);. That's well-defined: Temp is true iff value != 0. So yes, value is "evaluated" in a sense.

Casting to bool is a feature inherited from plain old C. Originally, C didn't have a bool type, and it was useful to use other types in if statements, like so:
int myBool = 1;
if(myBool) {
// C didn't have bools, so we had to use ints!
}
void* p = malloc(sizeof(int));
if(!p) {
// malloc failed to allocate memory!
}
When you're converting to bool, it acts just like you're putting the statement in an if.
Of course, C++ is backwards compatible with C, so it adopted the feature. C++ also added the ability to overload the conversion to bool on your classes. iostream does this to indicate when the stream is in an invalid state:
if(!cout) {
// Something went horribly wrong with the standard output stream!
}

ISO/IEC C++ Standard:
4.12 Boolean conversions [conv.bool] 1 A prvalue of arithmetic... type can be converted to a prvalue of type bool. A zero value... is converted to false; any other value is converted to true.
So, since a prvalue is a value, you might say that the value gets evaluated, albeit that's kind of pleonastic.

Why bool variable can not be set to 0, in addition to the direct assignment of 0?

I can not overflow the bool variable becomes zero. The variables of type bool is 1 byte long，so I consider it will overflow when variables are stored 256，but it's not like I expected.
#include<iostream>
int main()
{
bool b = 0;
std::cout << sizeof(b) << std::endl; // the result is 1 byte
std::cout << b << std::endl; // the result is 0
b = 256;
std::cout << b << std::endl; // the result is 1,rather than the desired zero
return 0;
}
Maybe I did not make it clear that when only 1 byte,both 0 and 256 are stored as 00000000(binary)

The rule for converting numeric types to bool is simple: zero becomes false, all other values become true. The size and layout of the either type is irrelevant.
If you want it in Standardese:
C++11 4.12: A zero value, null pointer value, or null member pointer value is converted to false;
any other value is converted to true.

A bool may use one byte in your platform, but its value representation has only one 1 bit, because there are only two values of type bool: true and false. Not 0, not 1, not 256.
This is mentioned in section 3.9.1 of the C++ standard, paragraph 6:
Values of type bool are either true or false. (...)
You can assign 0 and 256 to a variable of type bool because there is an implicit conversion between int and bool. That conversion converts zero to false and anything else to true. That's what you're seeing here.
This is specified in section 4.12:
A prvalue of arithmetic, unscoped enumeration, pointer, or pointer to
member type can be converted to a prvalue of type bool. A zero value,
null pointer value, or null member pointer value is converted to
false; any other value is converted to true. (...)
b = 256 is not storing 256. It's storing true. You cannot store 256 in a bool variable because 256 is not a bool value. This is C++ not some form of assembly. b = 256 does not mean "write the lower 8 bits of 256 into the memory location referred to by variable b". As the standard quotes above show, it means "if 256 is zero, assign false to the variable b; otherwise assign true".
In common implementations the value false is represented as 00000000 in binary, but there is nothing preventing that. An implementation is free to choose different object representations as appropriate, because that's irrelevant: C++ code cannot see it without subverting the type system and wondering into implementation-defined behaviour. An implementation can be perfectly valid and pick 0xDEADBABE and 0xDEADD00D as the object representations of true and false: only one bit carries the value representation. The thing that matters is that bool is a type meant to represent simple truth values. It's not "a bunch of bits that we can test against zero".

Converting to bool any numeric value other than 0 yields 1/true.
Even though sizeof(bool) == 1, it's semantics are different from that of char.

Becase the C++ standard says something like (too lazy to dig out the exact quote now) "assigning a numeric value to a bool variable, a value of zero is converted to false, any other value is converted to true".

Probably 256 is not directly assigned but checked if it is 0. If so, the boolean will be false (0), otherwise it will be true (!= 0).

That is how bool type works, it can be zero or non-zero

Try
*reinterpret_cast<unsigned char *>(&b) = 256;
It is, nonetheless, unsafe (sizeof(bool) can be 1 or 4 or anything else, depending on implementation).

bool is a system defined type and can accept only true/false or alternately 1/0. The fact that it occupies one byte is because of the restrictions placed by the present storage mechanisms: It is not possible to address data of less than one byte in the present scenario.
To make it simple to understand: think of bool as a system defined type that has an overridden assignment operator. The operator internally converts any assignment to 1/0 and hence, no overflow.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js