Is it safe to memset bool to 0? - c++

Suppose I have some legacy code which cannot be changed unless a bug is discovered, and it contains this code:
bool data[32];
memset(data, 0, sizeof(data));
Is this a safe way to set all bool in the array to a false value?
More generally, is it safe to memset a bool to 0 in order to make its value false?
Is it guaranteed to work on all compilers? Or do I need to request a fix?

Is it guaranteed by the law? No.
C++ says nothing about the representation of bool values.
Is it guaranteed by practical reality? Yes.
I mean, if you wish to find a C++ implementation that does not represent boolean false as a sequence of zeroes, I shall wish you luck. Given that false must implicitly convert to 0, and true must implicitly convert to 1, and 0 must implicitly convert to false, and non-0 must implicitly convert to true … well, you'd be silly to implement it any other way.
Whether that means it's "safe" is for you to decide.
I don't usually say this, but if I were in your situation I would be happy to let this slide. If you're really concerned, you can add a test executable to your distributable to validate the precondition on each target platform before installing the real project.

No. It is not safe (or, more specifically, portable). However, it likely works by virtue of the fact that your typical implementation will:
use all-zero bits to represent false (the C++ specification requires that false convert to the integer 0, but says nothing about its bit pattern)
lay the array out as contiguous elements that memset() can deal with.
However, best practice would dictate using bool data[32] = {false}. Additionally, this frees the compiler to represent the structure differently internally, since using memset() could force it to generate an actual 32-byte array of values rather than, say, a single 4-byte value that fits nicely within your average CPU register.

Update
P1236R1: Alternative Wording for P0907R4 Signed Integers are Two's Complement says the following:
As per EWG decision in San Diego, deviating from P0907R3, bool is specified to have some integral type as its underlying type, but the presence of padding bits for "bool" will remain unspecified, as will the mapping of true and false to values of the underlying type.
Original Answer
I believe this is unspecified, although it seems likely the underlying representation of false would be all zeros. Boost.Container relies on this as well (emphasis mine):
Boost.Container uses std::memset with a zero value to initialize some
types as in most platforms this initialization yields to the desired
value initialization with improved performance.
Following the C11 standard, Boost.Container assumes that for any
integer type, the object representation where all the bits are zero
shall be a representation of the value zero in that type. Since
_Bool/wchar_t/char16_t/char32_t are also integer types in C, it considers all C++ integral types as initializable via std::memset.
This C11 quote they point to as a rationale actually comes from a C99 defect, defect 263: all-zero bits representations, which added the following:
For any integer type, the object representation where all the bits are
zero shall be a representation of the value zero in that type.
So the question then becomes: is that assumption correct? Are the underlying object representations of integer types compatible between C and C++?
The proposal Resolving the difference between C and C++ with regards to object representation of integers sought to answer this to some extent, but as far as I can tell it was not resolved. I cannot find conclusive evidence of this in the draft standard. We have a couple of cases where it links to the C standard explicitly with respect to types. Section 3.9.1 [basic.fundamental] says:
[...] The signed and unsigned integer types shall satisfy the
constraints given in the C standard, section 5.2.4.2.1.
and 3.9 [basic.types] which says:
The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T). The value representation of an object is the set of bits
that hold the value of type T. For trivially copyable types, the value
representation is a set of bits in the object representation that
determines a value, which is one discrete element of an
implementation-defined set of values.44
where footnote 44 (which is not normative) says:
The intent is that the memory model of C++ is compatible with that of
ISO/IEC 9899 Programming Language C.
The farthest the draft standard gets to specifying the underlying representation of bool is in section 3.9.1:
Types bool, char, char16_t, char32_t, wchar_t, and the signed and
unsigned integer types are collectively called integral types.50 A
synonym for integral type is integer type. The representations of
integral types shall define values by use of a pure binary numeration
system.51 [ Example: this International Standard permits 2’s
complement, 1’s complement and signed magnitude representations for
integral types. —end example ]
the section also says:
Values of type bool are either true or false.
but all we know of true and false is:
The Boolean literals are the keywords false and true. Such literals
are prvalues and have type bool.
and we know they are convertible to 0 and 1:
A prvalue of type bool can be converted to a prvalue of type int, with
false becoming zero and true becoming one.
but this gets us no closer to the underlying representation.
As far as I can tell, the only place where the standard referenced an actual underlying bit value (besides padding bits) was removed via defect report 1796: Is all-bits-zero for null characters a meaningful requirement?:
It is not clear that a portable program can examine the bits of the representation; instead, it would appear to be limited to examining the bits of the numbers corresponding to the value representation (3.9.1 [basic.fundamental] paragraph 1). It might be more appropriate to require that the null character value compare equal to 0 or '\0' rather than specifying the bit pattern of the representation.
There are more defect reports that deal with the gaps in the standard with respect to what is a bit and difference between the value and object representation.
Practically, I would expect this to work, but I would not consider it safe, since we cannot nail this down in the standard. Do you need to change it? That is not clear; you clearly have a non-trivial trade-off involved. So, assuming it works now, the question becomes whether we consider it likely to break with future versions of various compilers, and that is unknown.

From 3.9.1/7:
Types bool , char , char16_t , char32_t , wchar_t , and the signed and
unsigned integer types are collectively called integral types. A
synonym for integral type is integer type . The representations of
integral types shall define values by use of a pure binary numeration
system.
Given this I can't see any possible implementation of bool that wouldn't represent false as all 0 bits.

Related

Is the value representation of integral types implementation-defined or unspecified?

To quote from N4868 6.8.2 paragraph 5:
Each value x of an unsigned integer type with width N has a unique representation...
Notably, it avoids specifying "value representation" or "object representation," so it's not clear if either is intended here.
Later on (in the index of implementation-defined behavior), N4868 does call out the value representation of pointers and of floating-point types as implementation-defined, but very notably excludes integral types.
Given this, there are four potential interpretations that I can think of:
The value representation of integral types is uniquely specified
The value representation of integral types is unspecified
The value representation of integral types is implementation-defined, but mistakenly left out of the aforementioned index
The value representation of integral types is undefined
#1 appears impossible, as implementations exist for both big- and little- endian architectures.
#3 appears unlikely, since the absence of integral types from the index is conspicuous, and the actual text of both floating-point and pointer types calls out their being implementation-defined, while the text on integral types goes to great lengths to avoid specifying the value representation.
#2 is the most likely interpretation, but it is conspicuous in that the standard often calls out behavior as unspecified, yet here says no such thing. This would, among other things, imply that behavior can be unspecified even if not actually called out as such, which makes it difficult to distinguish merely unspecified behavior from behavior that is left undefined by the standard simply not defining it at all (as opposed to being called out as "undefined behavior").
#4 seems absurd, as the standard implies that all types (or at least, trivially-copyable ones) have a definite, if otherwise unspecified, object representation (and by extension, value representation). Specifically, 6.7, paragraph 4 states:
For trivially copyable types, the value representation is a set of bits in the object representation
that determines a value, which is one discrete element of an implementation-defined set of values.
Which seems to imply that the value representation of trivially-copyable types (including integral types) is otherwise unspecified.
Scenario #2 probably indicates a failure to call the representation out as "unspecified," since we have the note under the definition of "undefined behavior" in Section 3: "Undefined behavior may be expected when this document omits any explicit definition of behavior." If the value representation of integral types isn't ever explicitly stated as unspecified / implementation-defined, then code that depends on the value representation wouldn't just be unspecified / implementation-defined; it would be undefined by omission.
However, one could also argue that the "explicit definition of behavior" clause does not apply, as the behavior is perfectly well-defined, the object representation being a sequence of objects of type unsigned char, with merely their values being left to the implementation.
After bringing this up as an editorial issue, the correct answer appears to be that the integral representation is "none of the above." It is simply left unspecified, and is not called out as such because the "unspecified" label is only generally applied to behavior.

Using reinterpret_cast to convert integer to pointer and back to integer [duplicate]

According to http://en.cppreference.com/w/cpp/language/reinterpret_cast, it is known that reinterpret_cast-ing a pointer to an integral type of sufficient size and back yields the same value. I'm wondering whether the converse is also true according to the standard. That is, does reinterpret_cast-ing an integral to a pointer type of sufficient size and back yield the same value?
No, that is not guaranteed by the standard. Quoting all parts of C++14 (n4140) [expr.reinterpret.cast] which concern pointer–integer conversions, emphasis mine:
4 A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is
implementation-defined. [ Note: It is intended to be unsurprising to those who know the addressing structure
of the underlying machine. —end note ] ...
5 A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted
to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type
will have its original value; mappings between pointers and integers are otherwise implementation-defined.
[ Note: Except as described in 3.7.4.3, the result of such a conversion will not be a safely-derived pointer
value. —end note ]
So starting with an integral value and converting it to a pointer and back (assuming no size issues) is implementation-defined. Which means you must consult your compiler's documentation to learn whether such a round trip preserves values or not. As such, it is certainly not portable.
I ran into exactly this problem in a library that exported pointers to objects as opaque identifiers; attempting to recover those pointers from external calls did not work on old x86 CPUs (back in the Windows 98 era). So while we can expect that behaviour, it is false in the general case. On the 386, an address is composed of overlapping segment:offset pointers, so the address of a given memory position is not unique, and I found that converting back did not recover the original value.

In C++, what happens when I use static_cast<char> on an integer value outside -128,127 range?

In code compiled on i386 Linux using g++, I used a static_cast<char>() cast on a value that might exceed the valid range of [-128, 127] for a char. There were no errors or exceptions, so I used the code in production.
The problem is now I don't know how this code might behave when a value outside this range is thrown at it. There is no problem if data is modified or truncated, I only need to know how this modification behaves on this particular platform.
Also what would happen if C-style cast ((char)value) had been used? would it behave differently?
In your case this would be an explicit type conversion, or to be more precise, an integral conversion.
The standard says about this (§4.7):
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is implementation-defined.
So your problem is implementation-defined. On the other hand, I have not yet seen a compiler that does anything other than truncate the larger value to the smaller one; I have never seen a compiler actually exercise the latitude that rule grants.
So it should be fairly safe to just cast your integer/short to char.
I don't know the rules for a C-style cast by heart, and I really try to avoid them because it is not easy to say which rule will kick in.
This is dealt with in §4.7 of the standard (integral conversions).
The answer depends on whether, in the implementation in question, char is signed or unsigned. If it is unsigned, then modulo arithmetic is applied. §4.7/2 of C++11 states: "If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type)." This means that if the input integer is not negative, the normal bit truncation you expect will arise. If it is negative, the same will apply if negative numbers are represented by 2's complement; otherwise, the conversion will be bit-altering.
If char is signed, §4.7/3 of C++11 applies: "If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined." So it is up to the documentation for the particular implementation you use. Having said that, on 2's complement systems (ie all those in normal use) I have not seen a case where anything other than normal bit truncation occurs for char types: apart from anything else, by virtue of §3.9.1/1 of the c++11 standard all character types (char, unsigned char and signed char) must have the same object representation and alignment.
The effect of a C-style cast, an explicit static_cast, and an implicit narrowing conversion is the same.
Technically, the language spec imposes a plain base-2 representation for unsigned types, and for unsigned base-2 it is pretty obvious what extension and truncation do.
When going to signed types, however, the spec is more "tolerant", allowing different kinds of processors to represent signed numbers in different ways. And since the same number may have different representations on different platforms, it is practically impossible to describe what happens to it when bits are added or removed.
For this reason, the language specification stays vague, saying that "the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined".
In other words, compiler manufacturers are required to do the best they can to keep the numeric value. But when that cannot be done, they are free to do whatever is most efficient for them.

Difference between object and value representation by example

N3797::3.9/4 [basic.types] :
The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T). The value representation of an object is the set of bits
that hold the value of type T. For trivially copyable types, the value
representation is a set of bits in the object representation that
determines a value, which is one discrete element of an
implementation-defined set of values
N3797::3.9.1 [basic.fundamental] says:
For narrow character types, all bits of the object representation
participate in the value representation.
Consider the following struct:
struct A
{
    char a;
    int b;
};
I think for A not all bits of the object representation participate in the value representation because of padding added by implementation. But what about others fundamentals type?
The Standard says:
N3797::3.9.1 [basic.fundamental]
For narrow character types, all bits of the object representation
participate in the value representation.
These requirements do not hold for other types.
I can't imagine why it doesn't hold for say int or long. What's the reason? Could you clarify?
An example might be the Unisys mainframes, where an int has 48
bits, but only 40 participate in the value representation (and INT_MAX is 2^39-1); the
others must be 0. I imagine that any machine with a tagged
architecture would have similar issues.
EDIT:
Just some further information: the Unisys mainframes are
probably the only remaining architectures which are really
exotic: the Unisys Libra (ex-Burroughs) have a 48 bit word, use signed
magnitude for integers, and have a tagged architecture, where
the data itself contains information concerning its type. The
Unisys Dorado are the ex-Univac: 36 bit one's complement (but no
reserved bits for tagging) and 9 bit char's.
From what I understand, however, Unisys is phasing them out (or
has phased them out in the last year) in favor of Intel based
systems. Once they disappear, pretty much all systems will be
2's complement, 32 or 64 bits, and all but the IBM mainframes
will use IEEE floating point (and IBM is moving or has moved in
that direction as well). So there won't be any motivation for
the standard to continue with special wording to support them;
in the end, in a couple of years at least, C/C++ could probably
follow the Java path, and impose a representation on all of its
basic data types.
This is probably meant to give the compiler headroom for optimizations on some platforms.
Consider for example a 64 bit platform where handling non-64 bit values incurs a large penalty, then it would make sense to have e.g. short only use 16 bits (value repr), but still use 64 bit storage (obj repr).
Similar rationale applies to the Fastest minimum-width integer types mandated by <stdint>. Sometimes larger types are not slower, but faster to use.
As far as I understand, at least one reason for this is dealing with trap representations, usually on exotic architectures. This issue is covered in N2631: Resolving the difference between C and C++ with regards to object representation of integers. It is very long, but I will quote some sections (the author is James Kanze, so if we are lucky maybe he will drop by and comment further) which say (emphasis mine):
In recent discussions in comp.lang.c++, it became clear that C and C++ have different requirements concerning the object representation of integers, and that at least one real implementation of C does not meet the C++ requirements. The purpose of this paper is to suggest wording to align the C++ standard with C.
It should be noted that the issue only concerns some fairly “exotic” hardware. In this regard, it raises a somewhat larger issue
and:
If C compatibility is desired, it seems to me that the simplest and surest way of attaining this is by incorporating the exact words from the C standard, in place of the current wording. I thus propose that we adopt the wording from the C standard, as follows
and:
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.
and:
For signed integer types [...] Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for one's complement), is a trap representation or a normal value. In the case of sign and magnitude and one's complement, if this representation is a normal value it is called a negative zero.

What does the C++ standard say about results of casting value of a type that lies outside the range of the target type?

Recently I had to perform some data type conversions from float to 16 bit integer. Essentially my code reduces to the following
float f_val = 99999.0;
short int si_val = static_cast<short int>(f_val);
// si_val is now -32768
This input value was a problem and in my code I had neglected to check the limits of the float value so I can see my fault, but it made me wonder about the exact rules of the language when one has to do this kind of ungainly cast. I was slightly surprised to find that value of the cast was -32768. Furthermore, this is the value I get whenever the value of the float exceeds the limits of a 16 bit integer. I have googled this but found a surprising lack of detailed info about it. The best I could find was the following from cplusplus.com
Converting to int from some smaller integer type, or to double from
float is known as promotion, and is guaranteed to produce the exact
same value in the destination type. Other conversions between
arithmetic types may not always be able to represent the same value
exactly:
If the conversion is from a floating-point type to an integer type, the value
is truncated (the decimal part is removed).
The conversions from/to bool consider false equivalent to zero (for numeric
types) and to null pointer (for pointer types); and true equivalent to all
other values.
Otherwise, when the destination type cannot represent the value, the conversion
is valid between numerical types, but the value is
implementation-specific (and may not be portable).
This suggestion that the results are implementation defined does not surprise me, but I have heard that cplusplus.com is not always reliable.
Finally, when performing the same cast from a 32-bit integer to a 16-bit integer (again with a value outside the 16-bit range), I saw results clearly indicating integer overflow. Although I was not surprised by this, it has added to my confusion due to the inconsistency with the cast from the float type.
I have no access to the C++ standard, but a lot of C++ people here do so I was wondering what the standard says on this issue? Just for completeness, I am using g++ version 4.6.3.
You're right to question what you've read. The conversion has no defined behaviour, which contradicts what you quoted in your question.
4.9 Floating-integral conversions [conv.fpint]
1 A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates;
that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be
represented in the destination type. [ Note: If the destination type is bool, see 4.12. -- end note ]
One potentially useful permitted result that you might get is a crash.