Result of bitwise operator in C++

Testing a couple of compilers (Comeau, g++) confirms that the result of a bitwise operator applied to operands of a small "integer type" is an int:
void foo( unsigned char );
void foo( unsigned short );
unsigned char a, b;
foo (a | b);
I would have expected the type of "a | b" to be an unsigned char, as both operands are unsigned char, but the compilers say that the result is an int, and the call to foo() is ambiguous. Why is the language designed so that the result is an int, or is this implementation dependent?
Thanks,

This is in fact standard C++ behavior (ISO/IEC 14882):
5.13/1 Bitwise inclusive OR operator
The usual arithmetic conversions are performed; the result is the bitwise inclusive OR function of its operands. The operator applies only to integral or enumeration operands.
5/9 Usual arithmetic conversions
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
If either operand is of type long double, the other shall be converted to long double.
Otherwise, if either operand is double, the other shall be converted to double.
Otherwise, if either operand is float, the other shall be converted to float.
Otherwise, the integral promotions shall be performed on both operands.
...
4.5/1 Integral Promotions
An rvalue of type char, signed char, unsigned char, short int, or unsigned short int can be converted to an rvalue of type int if int can represent all the values of the source type; otherwise, the source rvalue can be converted to an rvalue of type unsigned int.
I think it has to do with int supposedly being the "natural" size for the execution environment to allow for efficient arithmetic (see Charles Bailey's answer).

I would have expected the type of "a | b" to be an unsigned char, as both operands are unsigned char,
My reading of some beginner C books in the past left the impression that bitwise operators were kept in the language solely for systems programming and should generally be avoided.
The operators are performed by the CPU itself, which holds the operands in registers (and registers are at least as wide as int). The compiler cannot know how many bits of a register an operation will affect, so to avoid losing the full result it widens the operands before the operation. AFAICT.
Why is the language designed so that the result is an int, or is this implementation dependent?
Bit-level representation of data types is in fact implementation defined. That might be the reason why apparently bit-wise operations are also implementation defined.
Though C99 defines in 6.2.6.2 Integer types how they should appear and behave (and later how bitwise operations should work) the particular chapter gives a lot of freedom to the implementation.

It seems this is the same thing as in Java:
Short and char (and other integer types smaller than an int) are weaker types than int. Every operation on these weaker types is therefore automatically promoted to an int.
If you really want to get a short, you will have to typecast it.
Unfortunately I can't tell you why this was done, but it seems this is a relatively common language decision...

Isn't short the same as short int, in the same way that long is synonymous with long int? E.g. a short is an int taking up less memory than a standard int?

int is supposed to be the natural word size for any given machine architecture and many machines have instructions that only (or at least optimally) perform arithmetic operations on machine words.
If the language were defined without integer promotion, many multistep calculations which could otherwise naturally be mapped directly into machine instructions might have to be interspersed with masking operations performed on the intermediates in order to generate 'correct' results.

Neither C nor C++ ever perform any arithmetic operations on types smaller than int. Any time you specify a smaller operand (any flavor of char or short), the operand gets promoted to either int or unsigned int, depending on the range.

Related

Why is signed and unsigned addition converted differently for 16 and 32 bit integers?

It seems that GCC and Clang interpret addition between signed and unsigned integers differently, depending on their size. Why is this, and is the conversion consistent across all compilers and platforms?
Take this example:
#include <cstdint>
#include <iostream>
int main()
{
std::cout <<"16 bit uint 2 - int 3 = "<<uint16_t(2)+int16_t(-3)<<std::endl;
std::cout <<"32 bit uint 2 - int 3 = "<<uint32_t(2)+int32_t(-3)<<std::endl;
return 0;
}
Result:
$ ./out.exe
16 bit uint 2 - int 3 = -1
32 bit uint 2 - int 3 = 4294967295
In both cases the mathematical result is -1, but in the 32-bit case the arithmetic was carried out on unsigned integers and the value wrapped around. I would have expected both to be converted in the same way.
So again, why do the compilers convert these so differently, and is this guaranteed to be consistent? I tested this with g++ 11.1.0, clang 12.0, and g++ 11.2.0 on Arch Linux and Debian, getting the same result.
When you do uint16_t(2)+int16_t(-3), both operands are types that are smaller than int. Because of this, each operand is promoted to an int and signed + signed results in a signed integer and you get the result of -1 stored in that signed integer.
When you do uint32_t(2)+int32_t(-3), since both operands are the size of an int or larger, no promotion happens and now you are in a case where you have unsigned + signed which results in a conversion of the signed integer into an unsigned integer, and the unsigned value of -1 wraps to being the largest value representable.
So again, why do the compilers convert these so differently,
Standard quotes for [language-lawyer]:
[expr.arith.conv]
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way.
The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
...
Otherwise, the integral promotions ([conv.prom]) shall be performed on both operands.
Then the following rules shall be applied to the promoted operands:
If both operands have the same type, no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
[conv.prom]
A prvalue of an integer type other than bool, char8_t, char16_t, char32_t, or wchar_t whose integer conversion rank ([conv.rank]) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
These conversions are called integral promotions.
std::uint16_t type may have a lower conversion rank than int in which case it will be promoted when used as an operand. int may be able to represent all values of std::uint16_t in which case the promotion will be to int. The common type of two int is int.
std::uint32_t type may have the same or a higher conversion rank than int in which case it won't be promoted. The common type of an unsigned type and a signed of same rank is an unsigned type.
For an explanation of why this conversion behaviour was chosen, see chapter "6.3.1.1 Booleans, characters, and integers" of the "Rationale for International Standard—Programming Languages—C". I won't quote the entire chapter here.
is this guaranteed to be consistent?
The consistency depends on relative sizes of the integer types which are implementation defined.
Why is this,
C (and hence C++) has a rule that effectively says when a type smaller than int is used in an expression it is first promoted to int (the actual rule is a little more complex than that to allow for multiple distinct types of the same size).
Section 6.3.1.1 of the Rationale for International Standard Programming Languages C says that in early C compilers there were two versions of the promotion rule, "unsigned preserving" and "value preserving", and explains why they chose the "value preserving" option. To summarise, they believed it would produce correct results in a greater proportion of situations.
It does not however explain why the concept of promotion exists in the first place. I would speculate that it existed because on many processors, including the PDP-11 for which C was originally designed, arithmetic operations only operated on words, not on units smaller than words. So it was simpler and more efficient to convert everything smaller than a word to a word at the start of an expression.
On most platforms today int is 32 bits. So both uint16_t and int16_t are promoted to int. The arithmetic proceeds to produce a result of type int with a value of -1.
OTOH uint32_t and int32_t are not smaller than int, so they retain their original size and signedness through the promotion step. The rules for when the operands to an arithmetic operator are of different types come into play and since the operands are the same size the signed operand is converted to unsigned.
The rationale does not seem to talk about this rule, which suggests it goes back to pre-standard C.
and is the conversion consistent on all compilers and platforms?
On an ANSI C or ISO C++ platform it depends on the size of int. With a 16-bit int both examples would give large positive values. With a 64-bit int both examples would give -1.
On pre-standard implementations it's possible that both expressions might return large positive numbers.
A 16-bit unsigned int can be promoted to a 32-bit int without losing any values to range differences, so that's what happens. Not so for the 32-bit integers.

Integer promotion for implementations where sizeof(short) == sizeof(int)

Background
I'm looking into integer promotion rules in C++, and came across the following (taken from n4296):
4.5/1 [conv.prom]
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
4.13.1.3 [conv.rank]
The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
5.10 [expr]
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
[many points omitted]
Otherwise, the integral promotions (4.5) shall be performed on both operands.
[further omissions]
Question
Given that both the size and range of short int may be equal to those of int, it seems unnecessary to promote a short int to an int in those circumstances. Yet the above makes no mention of such an exception.
Does the standard require that short int is promoted to int in these circumstances (even if implementation optimisations result in the same executable being produced)?
Note / additional question
I've also noticed that the wording uses "can be" rather than "shall be", is this intentional?
It is perfectly legitimate, and at one time was extremely commonplace, for short and int to have identical ranges and representations; even today it is not uncommon for embedded systems to use the same 16-bit representation for both. The C specification does contain some language specific to such implementations, since on such platforms unsigned short promotes to unsigned int, whereas it promotes to signed int on platforms where the latter type can represent all values of unsigned short. It does not exempt such implementations from the rule requiring that types smaller than int or unsigned int be promoted to one of those, because nothing would be gained from such an exemption.
The standard allows that an implementation may perform computations in any fashion it sees fit if, in all cases where the indicated promotions to int or unsigned int would yield defined behavior, the implementations' computations yield that same behavior (the as-if rule). If the behaviors of rvalues of types int and short would be indistinguishable, that would imply that an implementation could perform computations on short rather than int if it chose to do so. There's no need to add a rule to the standard exempting int-sized short types from promotion since the as-if rule is already adequate.

size_t divided by int type conversion rules

When I am doing arithmetic operations with the size_t type (or unsigned long), how careful should I be about decorating integer constants with type suffixes? For example,
size_t a = 1111111;
if (a/2 > 0) ...;
What happens when compiler does the division? Does it treat 2 as integer or as unsigned integer? If the former, then what is the resulting type for (unsigned int)/(int)?
Should I always carefully write 'u' literals
if (a/2u > 0) ...;
for (a=amax; a >= 0u; a -= 3u) ...;
or compiler will correctly guess that I want to use operations with unsigned integers?
2 is indeed treated as an int, which is then implicitly converted to size_t. In a mixed operation size_t / int, the unsigned type "wins" and the signed type gets converted to the unsigned one, assuming the unsigned type is at least as wide as the signed one. The result is unsigned, i.e. size_t in your case. (See Usual arithmetic conversions for details).
It is a better idea to just write it as a / 2. No suffixes, no type casts. Keep the code as type-independent as possible. Type names (and suffixes) belong in declarations, not in statements.
The C standard guarantees that size_t is an unsigned integer.
The literal 2 is always of type int.
The "usual arithmetic conversions" guarantee that whenever an unsigned and a signed integer of the same size ("rank") are used as operands in a binary operation, the signed operand gets converted to unsigned type.
So the compiler actually interprets the expression like this:
a/(size_t)2 > (size_t)0
(In C, the result of the > operator, or any relational operator, is of type int, as a special case for that group of operators; in C++ the result is bool.)
Should I always carefully write 'u' literals
Some coding standards, most notably MISRA-C, would have you do this, to make sure that no implicit type promotions exist in the code. Implicit promotions or conversions are very dangerous and they are a flaw in the C language.
For your specific case, there is no real danger with implicit promotions. But there are cases when small integer types are used and you might end up with unintentional change of signedness because of implicit type promotions.
There is never any harm in being explicit, although writing a u suffix on every literal in your code may arguably reduce readability.
Now what you really must do as a C programmer to deal with type promotion dangers, is to learn how the integer promotions and usual arithmetic conversions work (here's some example on the topic). Sadly, there are lots of C programmers who don't, veterans included. The result is subtle, but sometimes critical bugs. Particularly when using the bit-wise operators such as shifts, where change of signedness could invoke undefined behavior.
These rules can be somewhat tricky to learn as they aren't really behaving rationally or consistently. But until you know these rules in detail you have to be explicit with types.
EDIT: To be picky, the size of size_t is actually not specified; all the standard says is that it must be large enough to hold at least the value 65535 (i.e. at least 16 bits). So in theory size_t could be equal to unsigned short, in which case the promotions would turn out quite differently. But in practice I doubt that scenario is of any interest, since I don't believe there exists any implementation where size_t is smaller than unsigned int.
Both C++ and C promote signed types to unsigned types when evaluating an operator like division which takes two arguments, and one of the arguments is an unsigned type.
So the literal 2 will be converted to an unsigned type.
Personally, I believe it's better to leave promotion to the compiler rather than be explicit: if your code was ever refactored and a became a signed type then a / 2u would cause a to be promoted to an unsigned type, with potentially disastrous consequences.
size_t sz = 11;
sz / 2 = 5
sz / (-2) = 0
WHY?
sz is treated as an unsigned integer (size_t) because a size cannot be negative. When an arithmetic operation mixes an unsigned int and an int, the int is converted to unsigned int.
From "CS dummies"

Why does C++ converts a signed int to an unsigned int in an expression?

When comparing an int to an unsigned int, for example:
signed int si = -1;
unsigned int ui = 0;
if (ui > si)
{
// Do something
}
si will be converted to an unsigned int and so it will be bigger than ui. So why is this even allowed if the result will not be as expected, is it made this way for historical reasons, and if they had to do it again they wouldn't allow it?
C++ has the following rules for deciding the type to which the two values are converted after integer promotions have been done (chapter 5, clause 9):
If both operands have the same type, no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
The third of these rules applies here: unsigned int has rank equal to that of int, so the operand of signed type is converted to unsigned int.
This rule exists because it is the best solution to the problem.
You can't compare apples and oranges. The only options are:
convert orange to apple
convert apple to orange
convert both to some other common type
Out of the first two options, converting both to unsigned makes more sense than converting both to signed.
How about the third option? I suppose a possibility would be to convert both values to long and then do the comparison. This might seem like a good idea at first, but if you think about it some more then there are some problems:
If long and int are the same size then this doesn't actually help
Even if long is bigger than int, you have just moved the problem to the case of comparing a long with an unsigned long.
It would be harder to write portable code.
The last point is important. The historical rules about short and char being promoted to int are actually extremely annoying when you are writing template code or code with overloaded functions, because it changes which overload is called.
We would not want to introduce any more rules of the same type (e.g. promote int to long if it is in comparison with unsigned int but only if sizeof(long) > sizeof(int) yada yada yada).
The reason is mostly historical. C++ is big on being compatible with C code even today. You can take a C code-base and convert it verbatim to C++ and it will probably work, even though there are some minor differences and incompatibilities. C has defined it that way and C++ will not change it, because otherwise it would change the meaning of code and therefore break programs that would otherwise work.
In the current working draft (N4296) you can find the rules in section 5.10.5.
There are only two choices for the language:
treat the signed as unsigned
treat the unsigned as signed
As dasblinkenlight says, the language mandates the former. The reason is that it makes the code simpler. In modern machines, the top bit is the sign bit, and the hardware can perform either a signed or an unsigned comparison, so the code is just a compare followed by an unsigned conditional jump.
To treat the unsigned as signed, the compiler could throw away (mask out) the top bit in the unsigned word ui and then perform a signed test, but that would change its value. Alternatively it could test the top bit of ui first and return greater if set, then perform the masking above.
Bottom line, the language choice was made because it's more code-efficient.

Why does C/C++ automatically convert char/wchar_t/short/bool/enum types to int?

So, if I understood it well, integral promotion means that the char, wchar_t, bool, enum, and short types are ALWAYS converted to int (or unsigned int). Then, if there are different types in an expression, further conversions will be applied.
Am I understanding this well?
And if yes, then my question: why is this good? Don't char/wchar_t/bool/enum/short become unnecessary? I mean, for example:
char c1;
char c2;
c1 = c2;
As I described before, char is ALWAYS converted to int, so after the automatic conversion this case looks like this:
int c1;
int c2;
c1 = c2;
But I can't understand why this is good, if I know that the char type is enough for my needs.
Storage types are never automatically converted. You only get automatic integer promotion as soon as you start doing integer arithmetic (+, -, bit shifts, ...) on those variables.
char c1, c2; // stores them as char
char c3 = c1 + c2; // equivalent to
char c3 = (char)((int)c1 + (int)c2);
The conversions you're asking about are the usual arithmetic conversions and the integer promotions, defined in section 6.3.1.8 of the latest ISO C standard. They're applied to the operands of most binary operators ("binary" meaning that they take two operands, such as +, *, etc.). (The rules are similar for C++. In this answer, I'll just refer to the C standard.)
Briefly the usual arithmetic conversions are:
If either operand is long double, the other operand is converted to long double.
Otherwise, if either operand is double, the other operand is converted to double.
Otherwise, if either operand is float, the other operand is converted to float.
Otherwise, the integer promotions are performed on both operands, and then some other rules are applied to bring the two operands to a common type.
The integer promotions are defined in section 6.3.1.1 of the C standard. For a type narrower than int, if the type int can hold all the values of the type, then an expression of that type is converted to int; otherwise it's converted to unsigned int. (Note that this means that an expression of type unsigned short may be converted either to int or to unsigned int, depending on the relative ranges of the types.)
The integer promotions are also applied to function arguments when the declaration doesn't specify the type of the parameter. For example:
short s = 2;
printf("%d\n", s);
promotes the short value to int, which is why "%d" works here. The promotion applies to variadic arguments and to calls to functions declared without a prototype; it does not occur for arguments that match parameter types declared in a prototype.
The quick answer for why this is done is that the standard says so.
The underlying reason for all this complexity is to allow for the restricted set of arithmetic operations available on most CPUs. With this set of rules, all arithmetic operators (other than the shift operators, which are a special case) are only required to work on operands of the same type. There is no short + long addition operator; instead, the short operand is implicitly converted to long. And there are no arithmetic operators for types narrower than int; if you add two short values, both arguments are promoted to int, yielding an int result (which might then be converted back to short).
Some CPUs can perform arithmetic on narrow operands, but not all can do so. Without this uniform set of rules, either compilers would have to emulate narrow arithmetic on CPUs that don't support it directly, or the behavior of arithmetic expressions would vary depending on what operations the target CPU supports. The current rules are a good compromise between consistency across platforms and making good use of CPU operations.
if I understood it well, integral promotion means that the char, wchar_t, bool, enum, short types are ALWAYS converted to int (or unsigned int).
Your understanding is only partially correct: short types are indeed promoted to int, but only when you use them in expressions. The conversion is done immediately before the use. It is also "undone" when the result is stored back.
The way the values are stored remains consistent with the properties of the type, letting you control the way you use your memory for the variables that you store. For example,
struct Test {
char c1;
char c2;
};
will be a quarter of the size of
struct Test {
int c1;
int c2;
};
on systems with 32-bit ints.
The conversion is not performed when you store the value in the variable. The conversion is done when you cast the value or when you perform some operation on it, such as arithmetic.
It really depends on your underlying microprocessor architecture. For example, if your processor is 32-bit, that is its native integer size, and integer computations using the native size are better optimized.
Type conversion takes place when arithmetic operations, shift operations, unary operations are performed. See what standard says about it:
C11; 6.3.1.1 Boolean, characters, and integers:
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.58) All other types are unchanged by the integer promotions.
58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclauses.