This is awkward, but the bitwise AND operator is defined in the C++ standard as follows (emphasis mine).
The usual arithmetic conversions are performed; the result is the bitwise AND function of its operands. The operator applies only to integral or unscoped enumeration operands.
This looks kind of meaningless to me. The "bitwise AND function" is not defined anywhere in the standard, as far as I can see.
I get that the AND function is well-understood and thus may not require explanation. The meaning of the word "bitwise" should also be rather clear: the function is applied to corresponding bits of its operands. However, what constitutes the bits of the operands is not clear.
What gives?
This is underspecified. The issue of what the standard means when it refers to bit-wise operations is the subject of a few defect reports.
For example defect report 1857: Additional questions about bits:
The specification of the bitwise operations in 5.11 [expr.bit.and],
5.12 [expr.xor], and 5.13 [expr.or] uses the undefined term “bitwise” in describing the operations, without specifying whether it is the
value or object representation that is in view.
Part of the resolution of this might be to define “bit” (which is
otherwise currently undefined in C++) as a value of a given power of
2.
and the response was:
CWG decided to reformulate the description of the operations
themselves to avoid references to bits, splitting off the larger
questions of defining “bit” and the like to issue 1943 for further
consideration.
and defect report 1943 says:
CWG decided at the 2014-06 (Rapperswil) meeting to address only a
limited subset of the questions raised by issues 1857 and 1861. This
issue is a placeholder for the remaining questions, such as defining a
“bit” in terms of a value of 2^n, specifying whether a bit-field has a
sign bit, etc.
We can see from defect report 1796: Is all-bits-zero for null characters a meaningful requirement? that this issue of what the standard means when it refers to bits affected (and still affects) other sections as well:
According to 2.3 [lex.charset] paragraph 3,
The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic
source character set, plus control characters representing alert,
backspace, and carriage return, plus a null character (respectively,
null wide character), whose representation has all zero bits.
It is not clear that a portable program can examine the bits of the
representation; instead, it would appear to be limited to examining
the bits of the numbers corresponding to the value representation
(3.9.1 [basic.fundamental] paragraph 1). It might be more appropriate
to require that the null character value compare equal to 0 or '\0'
rather than specifying the bit pattern of the representation.
There is a similar issue for the definition of shift, bitwise and, and
bitwise or operators: are those specifications constraints on the bit
pattern of the representation or on the values resulting from the
interpretation of those patterns as numbers?
In this case the resolution was to change:
representation has all zero bits
to:
value is 0.
Note that, as mentioned in ecatmur's answer, the draft C++ standard does defer to the C standard, section 5.2.4.2.1, in section 3.9.1 [basic.fundamental] paragraph 3, but it does not refer to section 6.5/4 of the C standard, which would at least tell us that the results are implementation-defined. I explain in my comment below that the C++ standard can only incorporate text from its normative references explicitly.
[basic.fundamental]/3 defers to C 5.2.4.2.1. It seems reasonable that the underspecified bitwise operators in C++ should similarly defer to C, in this case 6.5.10/4:
The result of the binary & operator is the bitwise AND of the operands (that is, each bit in
the result is set if and only if each of the corresponding bits in the converted operands is
set).
Note that C 6.5/4 has:
Some operators (the unary operator ~, and the binary operators <<, >>, &, ^, and |,
collectively described as bitwise operators) are required to have operands that have
integer type. These operators yield values that depend on the internal representations of
integers, and have implementation-defined and undefined aspects for signed types.
The internal representations of the integers are of course described in 6.2.6.2/1, /2.
The C++ Standard defines storage as a certain number of bits. The implementation might decide what meaning to attribute to a particular bit; that being said, binary AND is supposed to work on the conceptual 0s and 1s forming a particular type's representation.
3.9.1/7: (...) The representations of integral types shall define values by use of a pure binary numeration system.49 (...)
3.9.1, footnote 49) A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive
bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest
position
That means that for whatever physical representation is used, binary AND acts according to the truth table for the AND function: for each bit number i, take bits Ai and Bi from the appropriate operands and produce a 1 for bit Ri only if both are 1, otherwise produce a 0. The resulting value is left for the implementation to interpret, but whatever is chosen, it has to be in line with the expectations for the other binary operations such as OR and XOR.
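To make that concrete, here is a small sketch of my own (illustrative, not normative wording) that rebuilds the AND function bit by bit over unsigned operands, so the signed-representation questions above don't come into play, and checks it against the built-in operator:
#include <cassert>
#include <climits>
// Rebuild a & b one bit at a time using the truth table for AND: for each
// bit position i, produce 1 only if both operands have a 1 at position i.
unsigned bitwise_and(unsigned a, unsigned b) {
    unsigned result = 0;
    for (unsigned i = 0; i < sizeof(unsigned) * CHAR_BIT; ++i) {
        unsigned ai = (a >> i) & 1u;                     // bit Ai
        unsigned bi = (b >> i) & 1u;                     // bit Bi
        unsigned ri = (ai == 1u && bi == 1u) ? 1u : 0u;  // bit Ri
        result |= ri << i;
    }
    return result;
}
int main() {
    assert(bitwise_and(0xF0F0u, 0x0FF0u) == (0xF0F0u & 0x0FF0u));  // 0x00F0
}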
Legally, we could consider all bitwise operations to have undefined behaviour as they are not actually defined.
More reasonably, we are expected to apply common sense and refer to the common meanings of these operations, applying them to the bits of the operands (hence the term "bitwise").
But nothing actually states that. Shame my answer can't be considered normative wording.
Related
In the following C snippet that checks if the first two bits of a 16-bit sequence are set:
bool is_pointer(unsigned short int sequence) {
return (sequence >> 14) == 3;
}
CLion's Clang-Tidy is giving me a "Use of a signed integer operand with a binary bitwise operator" warning, and I can't understand why. Is unsigned short not unsigned enough?
The code for this warning checks if either operand to the bitwise operator is signed. It is not sequence causing the warning, but 14, and you can alleviate the problem by making 14 unsigned by appending a u to the end.
(sequence >> 14u)
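Applied to the function from the question, the suggested change would look like this (whether it actually silences a given clang-tidy version is worth verifying locally, since the integer promotion discussed further below can still leave the left operand signed):
bool is_pointer(unsigned short int sequence) {
    // 14u instead of 14: the right-hand operand of >> is now unsigned.
    return (sequence >> 14u) == 3;
}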
This warning is bad. As Roland's answer describes, CLion is fixing this.
There is a check in clang-tidy that is called hicpp-signed-bitwise. This check follows the wording of the HIC++ standard. That standard is freely available and says:
5.6.1. Do not use bitwise operators with signed operands
Use of signed operands with bitwise operators is in some cases subject to undefined or implementation defined behavior. Therefore, bitwise operators should only be used with operands of unsigned integral types.
The authors of the HIC++ coding standard misinterpreted the intention of the C and C++ standards and either accidentally or intentionally focused on the type of the operands instead of the value of the operands.
The check in clang-tidy implements exactly this wording, in order to conform to that standard. That check is not intended to be generally useful, its only purpose is to help the poor souls whose programs have to conform to that one stupid rule from the HIC++ standard.
The crucial point is that by definition integer literals without any suffix are of type int, and that type is defined as being a signed type. HIC++ now wrongly concludes that positive integer literals might be negative and thus could invoke undefined behavior.
For comparison, the C11 standard says:
6.5.7 Bitwise shift operators
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
This wording is carefully chosen and emphasises that the value of the right operand is important, not its type. It also covers the case of a too large value, while the HIC++ standard simply forgot that case. Therefore, saying 1u << 1000u is ok in HIC++, while 1 << 3 isn't.
The best strategy is to explicitly disable this single check. There are several bug reports for CLion mentioning this, and it is getting fixed there.
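If you go that route, clang-tidy's usual suppression mechanisms apply: list -hicpp-signed-bitwise in the Checks entry of your .clang-tidy file, or suppress it at a single site with a NOLINT comment, for example:
bool is_pointer(unsigned short int sequence) {
    return (sequence >> 14) == 3;  // NOLINT(hicpp-signed-bitwise)
}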
Update 2019-12-16: I asked Perforce what the motivation behind this exact wording was and whether the wording was intentional. Here is their response:
Our C++ team who were involved in creating the HIC++ standard have taken a look at the Stack Overflow question you mentioned.
In short, referring to the object type in the HIC++ rule instead of the value is an intentional choice to allow easier automated checking of the code. The type of an object is always known, while the value is not.
HIC++ rules in general aim to be "decidable". Enforcing against the type ensures that a decidable check is always possible, ie. directly where the operator is used or where a signed type is converted to unsigned.
The rationale explicitly refers to "possible" undefined behavior, therefore a sensible implementation can exclude:
constants unless there is definitely an issue and,
unsigned types that are promoted to signed types.
The best operation is therefore for CLion to limit the checking to non-constant types before promotion.
I think the integer promotion causes the warning here. Operands smaller than an int are widened to int for the arithmetic expression, and int is signed. So your code is effectively return ((int)sequence >> 14) == 3;, which leads to the warning. Try return ((unsigned)sequence >> 14) == 3; or return (sequence & 0xC000) == 0xC000;.
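A sketch of those two alternatives (the first function name is the one from the question, the second is an illustrative name of mine; the masked form avoids the shift entirely by testing the top two bits directly):
bool is_pointer(unsigned short sequence) {
    // Convert explicitly before shifting so the left operand stays unsigned
    // even after the usual integer promotions.
    return ((unsigned)sequence >> 14) == 3u;
}
bool is_pointer_masked(unsigned short sequence) {
    // Equivalent for a 16-bit value: true only if bits 14 and 15 are both set.
    return (sequence & 0xC000u) == 0xC000u;
}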
Result of left shifting can be undefined behavior:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are zero-filled. If E1 has an unsigned type, the value of the
result is E1×2^E2, reduced modulo one more than the maximum value
representable in the result type. Otherwise, if E1 has a signed type
and non-negative value, and E1×2^E2 is representable in the
corresponding unsigned type of the result type, then that value,
converted to the result type, is the resulting value; otherwise, the
behavior is undefined.
Result of right shifting can be implementation-defined:
The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has
an unsigned type or if E1 has a signed type and a non-negative value,
the value of the result is the integral part of the quotient of
E1/2^E2. If E1 has a signed type and a negative value, the resulting
value is implementation-defined.
Now, on all the platforms I know of, this undefined/implementation-defined behavior actually does sensible things:
left shifting a negative number is multiplication by 2^E2, just like if the number would be positive
right shifting a negative number is an "arithmetic shift", which shifts the number normally but fills the vacated most significant bits with copies of the sign bit.
So, the question is: are there any two's-complement platforms/compilers which don't behave like this?
Why ask? Most compilers don't emit optimal code for power-of-2 division in certain circumstances (see the link provided by phuclv and compare the disassembly of test1 and test2); clang generates optimal code, though.
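For background (my own illustration, not the code from the link): C and C++ division truncates toward zero, while an arithmetic right shift on a two's-complement machine rounds toward negative infinity, so a compiler translating x / 4 has to insert a sign-dependent fixup unless it can prove x is non-negative:
// -7 / 4  == -1  (division truncates toward zero)
// -7 >> 2 == -2  (arithmetic shift rounds toward negative infinity,
//                 on the usual two's-complement implementations)
int div_by_4(int x)   { return x / 4; }   // always well-defined, but needs fixup code
int shift_by_2(int x) { return x >> 2; }  // implementation-defined for x < 0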
Some compilers for processors that lack a sign-extending right-shift instruction have historically processed all right-shift operations as zero-filling.
The C89 Standard precisely defined the behavior of left-shifting a negative number on all ones'-complement or two's-complement platforms that don't use padding bits on either their signed or unsigned types. The behavior on ones'-complement machines was not equivalent to multiplication, however. Further, the proper behavior on sign-magnitude machines wasn't clear.
The C99 Standard changed a negative left-shift to Undefined Behavior. Nothing in the Rationale offers any reason for the change, nor even acknowledges it. The lack of mention in the rationale suggests that it was not seen as a breaking change, because there was no reason to expect that any implementations where the behavior was usefully defined under C89 wouldn't continue to define the behavior the same way whether or not the Standard continued to require it. The only intention that would make sense would be to allow ones'-complement or sign-magnitude implementations (in the event any were ever produced for C99) to behave in a fashion which may be more useful than what C89 had required.
When C89 or even C99 was written, the authors didn't perceive much difference between actions whose behavior is mandated by the Standard, versus actions which implementations could be expected to process in predictable and useful fashion whether the Standard required them to or not. Compiler writers seem to believe the Committee intended to eliminate from the language all actions of the latter form, but I've seen no evidence to suggest anything of the sort.
In practice, the only compilers I know of which don't presently abide by that expectation with regard to left-shifts whose result would be representable as a multiplication that doesn't overflow are those which are explicitly configured to squawk on negative left-shifts purely because the Standard no longer defines their behavior. I would not be particularly surprised, however, if some "clever" person were to "optimize" a two's-complement compiler based on the fact that the Standard no longer requires implementations for such platforms to behave in the same sensible and useful way they always have. Such deviation could take two forms:
A compiler could decide that if an operation that requires the carry flag be clear is preceded by a signed left shift, the "clear carry" instruction that would normally precede the latter operation could be omitted. I've used platforms where that would save an instruction, but all the compilers I've seen for such platforms clear the carry even though the Standard would not require it.
A compiler could decide that it's "impossible" for the result of a left shift to be negative, and thus that any comparisons between that result and any negative values may be safely omitted. Going further, it could decide that it's likewise impossible for the operand to be negative, and remove any comparisons with negative numbers that would not prevent the left shift from being performed. Neither gcc nor clang attempts to impose such optimizations yet, but that doesn't mean they never will.
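As a purely hypothetical illustration of that second scenario (not something gcc or clang does today), consider:
// On a typical two's-complement machine, shifting a large x left wraps to a
// negative value, so this function "works" there. A compiler that treats
// signed left-shift overflow as impossible could instead assume the result
// is never negative and fold the whole comparison to false.
bool doubles_to_negative(int x) {
    return (x << 1) < 0;   // undefined behavior when x << 1 overflows
}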
Quality implementations for two's-complement systems will process signed left-shift predictably unless or until there is a particularly compelling reason for them to do otherwise--something that seems rather unlikely to occur. As to what poor-quality implementations might do, who knows.
Suppose I have some legacy code which cannot be changed unless a bug is discovered, and it contains this code:
bool data[32];
memset(data, 0, sizeof(data));
Is this a safe way to set all bool in the array to a false value?
More generally, is it safe to memset a bool to 0 in order to make its value false?
Is it guaranteed to work on all compilers? Or do I need to request a fix?
Is it guaranteed by the law? No.
C++ says nothing about the representation of bool values.
Is it guaranteed by practical reality? Yes.
I mean, if you wish to find a C++ implementation that does not represent boolean false as a sequence of zeroes, I shall wish you luck. Given that false must implicitly convert to 0, and true must implicitly convert to 1, and 0 must implicitly convert to false, and non-0 must implicitly convert to true … well, you'd be silly to implement it any other way.
Whether that means it's "safe" is for you to decide.
I don't usually say this, but if I were in your situation I would be happy to let this slide. If you're really concerned, you can add a test executable to your distributable to validate the precondition on each target platform before installing the real project.
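A sketch of such a test (it only validates the platform the binary actually runs on, which is all it can do; the names are illustrative):
#include <cassert>
#include <cstring>
int main() {
    bool b = true;
    std::memset(&b, 0, sizeof b);
    assert(b == false);              // the precondition the legacy code relies on
    bool data[32];
    std::memset(data, 0, sizeof data);
    for (bool v : data)
        assert(!v);
    return 0;
}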
No. It is not safe (or more specifically, portable). However, it likely works by virtue of the fact that your typical implementation will:
use 0 to represent a boolean false (actually, the C++ specification requires it)
generate an array of elements that memset() can deal with.
However, best practice would dictate using bool data[32] = {false}. Additionally, this frees the compiler to represent the array internally however it likes, whereas using memset() could result in it generating an actual 32-byte array of values rather than, say, a single 4-byte value that fits nicely within your average CPU register.
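For reference, two spellings that carry no assumption about the object representation of bool (plain standard C++; only relevant if the legacy code may in fact be touched):
#include <algorithm>
#include <iterator>
bool data[32] = {};   // aggregate initialization: every element is false
void reset_flags(bool (&flags)[32]) {
    // Re-clears an existing array without relying on memset semantics.
    std::fill(std::begin(flags), std::end(flags), false);
}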
Update
P1236R1: Alternative Wording for P0907R4 Signed Integers are Two's Complement says the following:
As per EWG decision in San Diego, deviating from P0907R3, bool is specified to have some integral type as its underlying type, but the presence of padding bits for "bool" will remain unspecified, as will the mapping of true and false to values of the underlying type.
Original Answer
I believe this is unspecified, although it seems likely the underlying representation of false would be all zeros. Boost.Container relies on this as well (emphasis mine):
Boost.Container uses std::memset with a zero value to initialize some
types as in most platforms this initialization yields to the desired
value initialization with improved performance.
Following the C11 standard, Boost.Container assumes that for any
integer type, the object representation where all the bits are zero
shall be a representation of the value zero in that type. Since
_Bool/wchar_t/char16_t/char32_t are also integer types in C, it considers all C++ integral types as initializable via std::memset.
The C11 quote they point to as a rationale actually comes from a C99 defect report, defect 263: all-zero bits representations, which added the following:
For any integer type, the object representation where all the bits are
zero shall be a representation of the value zero in that type.
So the question here is: is that assumption correct? Are the underlying object representations for integers compatible between C and C++?
The proposal Resolving the difference between C and C++ with regards to object representation of integers sought to answer this to some extent, but as far as I can tell it was not resolved, and I cannot find conclusive evidence of this in the draft standard. We have a couple of cases where it links to the C standard explicitly with respect to types. Section 3.9.1 [basic.fundamental] says:
[...] The signed and unsigned integer types shall satisfy the
constraints given in the C standard, section 5.2.4.2.1.
and 3.9 [basic.types] which says:
The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T). The value representation of an object is the set of bits
that hold the value of type T. For trivially copyable types, the value
representation is a set of bits in the object representation that
determines a value, which is one discrete element of an
implementation-defined set of values.44
where footnote 44 (which is not normative) says:
The intent is that the memory model of C++ is compatible with that of
ISO/IEC 9899 Programming Language C.
The farthest the draft standard gets to specifying the underlying representation of bool is in section 3.9.1:
Types bool, char, char16_t, char32_t, wchar_t, and the signed and
unsigned integer types are collectively called integral types.50 A
synonym for integral type is integer type. The representations of
integral types shall define values by use of a pure binary numeration
system.51 [ Example: this International Standard permits 2’s
complement, 1’s complement and signed magnitude representations for
integral types. —end example ]
the section also says:
Values of type bool are either true or false.
but all we know of true and false is:
The Boolean literals are the keywords false and true. Such literals
are prvalues and have type bool.
and we know they are convertible to 0 and 1:
A prvalue of type bool can be converted to a prvalue of type int, with
false becoming zero and true becoming one.
but this gets us no closer to the underlying representation.
As far as I can tell, the only place where the standard referenced the actual underlying bit values, besides padding bits, was removed via defect report 1796: Is all-bits-zero for null characters a meaningful requirement?:
It is not clear that a portable program can examine the bits of the representation; instead, it would appear to be limited to examining the bits of the numbers corresponding to the value representation (3.9.1 [basic.fundamental] paragraph 1). It might be more appropriate to require that the null character value compare equal to 0 or '\0' rather than specifying the bit pattern of the representation.
There are more defect reports that deal with the gaps in the standard with respect to what is a bit and difference between the value and object representation.
Practically, I would expect this to work, but I would not consider it safe, since we cannot nail this down in the standard. Whether you need to change it is not clear; you clearly have a non-trivial trade-off involved. So, assuming it works now, the question is whether we consider it likely to break with future versions of various compilers, and that is unknown.
From 3.9.1/7:
Types bool , char , char16_t , char32_t , wchar_t , and the signed and
unsigned integer types are collectively called integral types. A
synonym for integral type is integer type . The representations of
integral types shall define values by use of a pure binary numeration
system.
Given this I can't see any possible implementation of bool that wouldn't represent false as all 0 bits.
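If you want to see what your own implementation does (a non-normative probe, nothing more), reading the object representation through unsigned char is well-defined:
#include <cstddef>
#include <cstdio>
int main() {
    bool f = false;
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&f);
    for (std::size_t i = 0; i < sizeof f; ++i)
        std::printf("%02x ", bytes[i]);   // prints "00" on common implementations
    std::printf("\n");
}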
N3797::3.9/4 [basic.types]:
The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T). The value representation of an object is the set of bits
that hold the value of type T. For trivially copyable types, the value
representation is a set of bits in the object representation that
determines a value, which is one discrete element of an
implementation-defined set of values
N3797::3.9.1 [basic.fundamental] says:
For narrow character types, all bits of the object representation
participate in the value representation.
Consider the following struct:
struct A
{
    char a;
    int b;
};
I think that for A not all bits of the object representation participate in the value representation, because of padding added by the implementation. But what about other fundamental types?
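For the struct case, the padding is easy to observe (the exact numbers are implementation-specific; the values in the comment assume a typical 4-byte, 4-aligned int):
#include <cstddef>
#include <cstdio>
struct A {
    char a;
    int  b;
};
int main() {
    // Typically prints sizeof(A)=8 offsetof(A,b)=4: the three bytes after `a`
    // are padding that belongs to the object representation but to no value.
    std::printf("sizeof(A)=%zu offsetof(A,b)=%zu\n", sizeof(A), offsetof(A, b));
}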
The Standard says:
N3797::3.9.1 [basic.fundamental]
For narrow character types, all bits of the object representation
participate in the value representation.
These requirements do not hold for other types.
I can't imagine why it doesn't hold for, say, int or long. What's the reason? Could you clarify?
An example might be the Unisys mainframes, where an int has 48
bits, but only 40 participate in the value representation (and INT_MAX is 2^39-1); the
others must be 0. I imagine that any machine with a tagged
architecture would have similar issues.
EDIT:
Just some further information: the Unisys mainframes are
probably the only remaining architectures which are really
exotic: the Unisys Libra (ex-Burroughs) have a 48 bit word, use signed
magnitude for integers, and have a tagged architecture, where
the data itself contains information concerning its type. The
Unisys Dorado are the ex-Univac: 36 bit one's complement (but no
reserved bits for tagging) and 9 bit char's.
From what I understand, however, Unisys is phasing them out (or
has phased them out in the last year) in favor of Intel based
systems. Once they disappear, pretty much all systems will be
2's complement, 32 or 64 bits, and all but the IBM mainframes
will use IEEE floating point (and IBM is moving or has moved in
that direction as well). So there won't be any motivation for
the standard to continue with special wording to support them;
in the end, in a couple of years at least, C/C++ could probably
follow the Java path, and impose a representation on all of its
basic data types.
This is probably meant to give the compiler headroom for optimizations on some platforms.
Consider, for example, a 64-bit platform where handling non-64-bit values incurs a large penalty; there it would make sense to have e.g. short use only 16 bits (value representation), but still use 64 bits of storage (object representation).
Similar rationale applies to the fastest minimum-width integer types mandated by <cstdint>. Sometimes larger types are not slower, but faster to use.
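For instance (the printed widths are implementation-specific; on many 64-bit platforms int_fast16_t comes out wider than 16 bits precisely because the wider type is cheaper to operate on):
#include <climits>
#include <cstdint>
#include <cstdio>
int main() {
    std::printf("int16_t: %zu bits, int_fast16_t: %zu bits\n",
                sizeof(std::int16_t) * CHAR_BIT,
                sizeof(std::int_fast16_t) * CHAR_BIT);
}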
As far as I understand, at least one case for this is dealing with trap representations, usually on exotic architectures. This issue is covered in N2631: Resolving the difference between C and C++ with regards to object representation of integers. It is very long, but I will quote some sections (the author is James Kanze, so if we are lucky maybe he will drop by and comment further) which say (emphasis mine):
In recent discussions in comp.lang.c++, it became clear that C and C++ have different requirements concerning the object representation of integers, and that at least one real implementation of C does not meet the C++ requirements. The purpose of this paper is to suggest wording to align the C++ standard with C.
It should be noted that the issue only concerns some fairly “exotic” hardware. In this regard, it raises a somewhat larger issue
and:
If C compatibility is desired, it seems to me that the simplest and surest way of attaining this is by incorporating the exact words from the C standard, in place of the current wording. I thus propose that we adopt the wording from the C standard, as follows
and:
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.
and:
For signed integer types [...] Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for one's complement), is a trap representation or a normal value. In the case of sign and magnitude and one's complement, if this representation is a normal value it is called a negative zero.