Is comparing a short to a long an implicit conversion? - c++

From what I understand, comparing two different types, such as a short and a long, will still cause a conversion. I believe that a short will be promoted to an int. However, I can't seem to find a direct answer on comparing a short to a long.
For example:
Is it improper to compare a Uint32 to a Uint8?
Is it improper to add a Uint32 to a Uint8?
Uint32/Uint8 are shorthand typedefs in SDL for uint32_t and uint8_t, respectively.
EDIT:
I suppose I should be a bit more explicit about my overall question. I'm really wondering whether or not comparing or evaluating with two different integer types that have the same signedness (in the example case, unsigned) but differ in SIZE (uint8_t and uint32_t) is an improper thing to do.
Perhaps it is improper due to an implicit conversion. Perhaps it is improper because of a performance issue other than conversion. Maybe it is frowned upon because of some sort of readability issue I am unaware of.
In the comments two similar questions were linked; however, they compare an int to a long, which I believe is very similar. But doesn't an int just take the form of whichever version is needed (uint8_t, sint16_t, etc.)?

I believe this question is answered with this: http://en.cppreference.com/w/cpp/language/operator_arithmetic under the subsection 'Conversions'.
Otherwise, the operand has integer type (because bool, char, char16_t, char32_t, wchar_t, and unscoped enumeration were promoted at this point) and integral conversions are applied to produce the common type, as follows:
If both operands are signed or both are unsigned, the operand with lesser conversion rank is converted to the operand with the greater integer conversion rank
So the overall answer to my question is: yes, there is a conversion. And from what I can tell, there are no issues with comparing two unsigned integer types, as long as you don't compare signed and unsigned.

Related

Why do arithmetic operations on unsigned chars promote them to signed integers?

Many answers to similar questions point out that it is so due to the standard. But, I cannot understand the reasoning behind this decision by the standard setters.
From my understanding an unsigned char does not store the value in 2's complement form. So, I don't see a situation where let's say XORing two unsigned chars would produce unexpected behavior. Therefore, promoting them to int just seems like a waste of space (in most cases) and CPU cycles.
Moreover, why int? If a variable is being declared as unsigned, clearly the unsignedness is important to the programmer, therefore a promotion to an unsigned int would still make more sense than an int, in my opinion.
[EDIT #1] As pointed out in the comments, promotion to unsigned int will take place if an int cannot represent all the values of an unsigned char.
[EDIT #2] To clarify the question: if this is about the performance benefit of operating on int rather than char, why is it in the standard? It could have been left as a suggestion to compiler designers for better optimization. As it stands, if someone designed a compiler that didn't do this, it would not fully conform to the C/C++ standard, even if it hypothetically supported every other required feature of the language. In a nutshell, I cannot figure out why I cannot operate directly on unsigned chars, so the requirement to promote them to ints seems unnecessary. Can you give me an example which proves this wrong?
You can find this document on-line: Rationale for International Standard - Programming Languages - C (Revision 5.10, 2003).
Chapter 6.3 (pp. 44-45) is about conversions:
Between the publication of K&R and the development of C89, a serious divergence had occurred among implementations in the evolution of integer promotion rules. Implementations fell into two major camps which may be characterized as unsigned preserving and value preserving.
The difference between these approaches centered on the treatment of unsigned char and unsigned short when widened by the integer promotions, but the decision had an impact on the typing of constants as well (see §6.4.4.1).
The unsigned preserving approach calls for promoting the two smaller unsigned types to unsigned int. This is a simple rule, and yields a type which is independent of execution environment.
The value preserving approach calls for promoting those types to signed int if that type can properly represent all the values of the original type, and otherwise for promoting those types to unsigned int.
Thus, if the execution environment represents short as something smaller than int, unsigned short becomes int; otherwise it becomes unsigned int. Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with two's complement arithmetic and quiet wraparound on signed overflow - that is, in most current implementations. In such implementations, differences between the two only appear when these two conditions are both true:
An expression involving an unsigned char or unsigned short produces an int-wide result in which the sign bit is set, that is, either a unary operation on such a type, or a binary operation in which the other operand is an int or “narrower” type.
The result of the preceding expression is used in a context in which its signedness is significant:
• sizeof(int) < sizeof(long) and it is in a context where it must be widened to a long type, or
• it is the left operand of the right-shift operator in an implementation where this shift is defined as arithmetic, or
• it is either operand of /, %, <, <=, >, or >=.
In such circumstances a genuine ambiguity of interpretation arises. The result must be dubbed questionably signed, since a case can be made for either the signed or unsigned interpretation. Exactly the same ambiguity arises whenever an unsigned int confronts a signed int across an operator, and the signed int has a negative value. Neither scheme does any better, or any worse, in resolving the ambiguity of this confrontation. Suddenly, the negative signed int becomes a very large unsigned int, which may be surprising, or it may be exactly what is desired by a knowledgeable programmer. Of course, all of these ambiguities can be avoided by a judicious use of casts.
One of the important outcomes of exploring this problem is the understanding that high-quality compilers might do well to look for such questionable code and offer (optional) diagnostics, and that conscientious instructors might do well to warn programmers of the problems of implicit type conversions.
The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the C89 Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.
QUIET CHANGE IN C89
A program that depends upon unsigned preserving arithmetic conversions will behave differently, probably without complaint. This was considered the most serious semantic change made by the C89 Committee to a widespread current practice.
For reference, you can find more details about those conversions updated to C11 in this answer by Lundin.

Why does C++ standard specify signed integer be cast to unsigned in binary operations with mixed signedness?

The C and C++ standards stipulate that, in binary operations between a signed and an unsigned integer of the same rank, the signed integer is cast to unsigned. There are many questions on SO caused by this... let's call it strange behavior: unsigned to signed conversion, C++ Implicit Conversion (Signed + Unsigned), A warning - comparison between signed and unsigned integer expressions, % (mod) with mixed signedness, etc.
But none of these give any reasons as to why the standard goes this way, rather than casting towards signed ints. I did find a self-proclaimed guru who says it's the obvious right thing to do, but he doesn't give a reasoning either: http://embeddedgurus.com/stack-overflow/2009/08/a-tutorial-on-signed-and-unsigned-integers/.
Looking through my own code, wherever I combine signed and unsigned integers, I always need to cast from unsigned to signed. There are places where it doesn't matter, but I haven't found a single example of code where it makes sense to cast the signed integer to unsigned.
What are the cases where casting to unsigned is the correct thing to do? Why is the standard the way it is?
Casting from unsigned to signed results in implementation-defined behaviour if the value cannot be represented. Casting from signed to unsigned is always modulo two to the power of the unsigned's bitsize, so it is always well-defined.
The standard conversion is to the signed type if every possible unsigned value is representable in the signed type. Otherwise, the unsigned type is chosen. This guarantees that the conversion is always well-defined.
Notes
As indicated in comments, the conversion algorithm for C++ was inherited from C to maintain compatibility, which is technically the reason it is so in C++.
When this note was written, the C++ standard allowed three binary representations, including sign-magnitude and ones' complement. That's no longer the case, and there's every reason to believe that it won't be the case for C either in the reasonably near future. I'm leaving the footnote as a historical relic, but it says nothing relevant to the current language.
It has been suggested that the decision in the standard to define signed to unsigned conversions and not unsigned to signed conversions is somehow arbitrary, and that the other possible decision would be symmetric. However, the possible conversions are not symmetric.
In both of the non-2's-complement representations contemplated by the standard, an n-bit signed representation can represent only 2^n - 1 values, whereas an n-bit unsigned representation can represent 2^n values. Consequently, a signed-to-unsigned conversion is lossless and can be reversed (although one unsigned value can never be produced). The unsigned-to-signed conversion, on the other hand, must collapse two different unsigned values onto the same signed result.
In a comment, the formula sint = uint > sint_max ? uint - uint_max : uint is proposed. This coalesces the values uint_max and 0; both are mapped to 0. That's a little weird even for non-2s-complement representations, but for 2's-complement it's unnecessary and, worse, it requires the compiler to emit code to laboriously compute this unnecessary conflation. By contrast the standard's signed-to-unsigned conversion is lossless and in the common case (2's-complement architectures) it is a no-op.
If casting towards signed were chosen, then a simple a + 1 would always result in a signed type (unless the constant were written as 1U).
Assume a is an unsigned int; then this seemingly innocent increment a + 1 could lead to things like undefined overflow or "index out of bounds", in the case of arr[a + 1].
Thus, "unsigned casting" seems like the safer approach, because people probably don't even expect a conversion to be happening in the first place when simply adding a constant.
This is sort of a half-answer, because I don't really understand the committee's reasoning.
From the C90 committee's rationale document: https://www.lysator.liu.se/c/rat/c2.html#3-2-1-1
Since the publication of K&R, a serious divergence has occurred among implementations of C in the evolution of integral promotion rules. Implementations fall into two major camps, which may be characterized as unsigned preserving and value preserving. The difference between these approaches centers on the treatment of unsigned char and unsigned short, when widened by the integral promotions, but the decision has an impact on the typing of constants as well (see §3.1.3.2).
... and apparently also on the conversions done to match the two operands for any operator. It continues:
Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with twos-complement arithmetic and quiet wraparound on signed overflow --- that is, in most current implementations.
It then specifies a case where ambiguity of interpretation arises, and states:
The result must be dubbed questionably signed, since a case can be made for either the signed or unsigned interpretation. Exactly the same ambiguity arises whenever an unsigned int confronts a signed int across an operator, and the signed int has a negative value. (Neither scheme does any better, or any worse, in resolving the ambiguity of this confrontation.) Suddenly, the negative signed int becomes a very large unsigned int, which may be surprising --- or it may be exactly what is desired by a knowledgable programmer. Of course, all of these ambiguities can be avoided by a judicious use of casts.
and:
The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.
Thus, they consider the case of int + unsigned an unwanted situation, and chose conversion rules for char and short that yield as few of those situations as possible, even though most compilers at the time followed a different approach. If I understand right, this choice then forced them to follow the current choice of int + unsigned yielding an unsigned operation.
I still find all of this truly bizarre.
Why does C++ standard specify signed integer be cast to unsigned in binary operations with mixed signedness?
I suppose that you mean converted rather than "cast". A cast is an explicit conversion.
As I'm not the author, nor have I encountered documentation about this decision, I cannot promise that my explanation is the truth. However, there is a fairly reasonable potential explanation: because that's how C works, and C++ was based on C. Unless there was an opportunity to improve upon the rules, there would be no reason to change what works and what programmers have been used to. I don't know if the committee even deliberated changing this.
I know what you may be thinking: "Why does the C standard specify signed integer...". Well, I'm also not the author of the C standard, but there is at least a fairly extensive document titled "Rationale for American National Standard for Information Systems - Programming Language - C". As extensive as it is, it unfortunately doesn't cover this question (it does cover the very similar question of how to promote integer types narrower than int, in which regard the standard differs from some of the C implementations that pre-date it).
I don't have access to pre-standard K&R documents, but I did find a passage from the book "Expert C Programming: Deep C Secrets" which quotes rules from pre-standard K&R C (in the context of comparing the rule with the standardised ones):
Section 6.6 Arithmetic Conversions
A great many operators cause conversions and yield result types in a similar way. This pattern will be called the "usual arithmetic conversions."
First, any operands of type char or short are converted to int, and any of type float are converted to double. Then if either operand is double, the other is converted to double and that is the type of the result. Otherwise, if either operand is long, the other is converted to long and that is the type of the result. Otherwise, if either operand is unsigned, the other is converted to unsigned and that is the type of the result. Otherwise, both operands must be int, and that is the type of the result.
So, it appears that this has been the rule since before the standardisation of C and was presumably chosen by the designer himself. Unless someone can find a written rationale, we may never know the answer.
What are the cases where casting to unsigned is the correct thing to do?
Here is an extremely simple case:
unsigned u = INT_MAX;
u + 42;
The type of the literal 42 is signed, so with your proposed rule (convert towards signed), u + 42 would also be signed. This would be quite surprising and would result in the shown program having undefined behaviour due to signed integer overflow.
Basically, implicit conversion to signed and to unsigned each have their problems.

Is promotion and widening the same thing?

Is there a difference between promotion and widening? I've heard that widening only describes integral promotion.
Widening "typically" refers to integral/floating point types (as in a char going to a long or float to double), but it can also refer to character widening (as in going from a char type to a wchar_t type).
Widening conversions are also known as "promotions" and narrowing conversions are known as "coercions".
The notions of "promotion" and "coercion" can also be used in the OO sense as well (polymorphism); as in promotion of a base class to a derived type, or coercion of a derived type to base. In this sense it's still a "widening" and "narrowing", as the address space used for the base is "less" than the derived type (hence you are widening/promoting your types when "up-casting", or narrowing/coercing your types when "down-casting").
So to answer directly: is there a difference between promotion and widening? No, not really (unless you are feeling pedantic), though I probably wouldn't say "widen that class type" over "promote that class type" if I was talking about non-integrals (just to avoid any possible initial confusion).
It really depends on context, because the term "widening" is an informal term, and the meaning varies a bit depending on who is telling the story. I'll describe some common interpretations (but not the only ones).
Before doing that, it is necessary to describe what promotions are.
The C++ standard describes integral promotions (between integral types) and floating point promotions (between floating point types). Conversion between an integral type and a floating point type is not described as a promotion.
The common features are that promotions are generally value preserving (except from signed to unsigned integral types, which uses modulo arithmetic) but need not involve increasing the size of a variable (or range of values it can represent). For example, a short may be promoted to an int, but a short and an int may also be the same size (albeit that is implementation/compiler dependent).
The C++ standard doesn't use the term "widening" at all (except in some contexts in the library, unrelated to type conversions). A common informal meaning, in context of integral and floating point conversions, is a promotion that is BOTH value preserving AND to a larger type. The implementation is typically setting the additional bits in the result to zero (i.e. making the value wider without fiddling the bits that represent it). So signed char to short, short to long, unsigned char to unsigned short are widening conversions (assuming none of the types are equal size). Similarly, float to double is a widening conversion (the standard guarantees that the values a float can represent is a strict subset of the values that a double can represent). Conversion from int to double is not a widening (e.g. not necessarily value preserving, bits may be fiddled).
Widening is also sometimes used to describe a conversion of a pointer to derived class into a pointer to base class (or between similar references). The reverse is called "narrowing" and - in C++ - can only be forced with an explicit type conversion.

In C++, what happens when I use static_cast<char> on an integer value outside -128,127 range?

In code compiled on i386 Linux using g++, I have used a static_cast<char>() cast on a value that might exceed the valid range of [-128, 127] for a char. There were no errors or exceptions, and so I used the code in production.
The problem is that now I don't know how this code might behave when a value outside this range is thrown at it. There is no problem if data is modified or truncated; I only need to know how this modification behaves on this particular platform.
Also, what would happen if a C-style cast ((char)value) had been used? Would it behave differently?
In your case this would be an explicit type conversion, or, to be more precise, an integral conversion.
The standard says about this (§4.7):
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
So your result is implementation-defined. On the other hand, I have not yet seen a compiler that does anything other than truncate the larger value to the smaller one, nor one that takes advantage of the latitude the rule above allows.
So it should be fairly safe to just cast your integer/short to the char.
I don't know the rules for a C cast by heart, and I really try to avoid them because it is not easy to say which rule will kick in.
This is dealt with in §4.7 of the standard (integral conversions).
The answer depends on whether, in the implementation in question, char is signed or unsigned. If it is unsigned, then modulo arithmetic is applied. §4.7/2 of C++11 states: "If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type)." This means that if the input integer is not negative, the normal bit truncation you expect will arise. If it is negative, the same will apply if negative numbers are represented by 2's complement; otherwise the conversion will be bit-altering.
If char is signed, §4.7/3 of C++11 applies: "If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined." So it is up to the documentation for the particular implementation you use. Having said that, on 2's complement systems (i.e. all those in normal use) I have not seen a case where anything other than normal bit truncation occurs for char types: apart from anything else, by virtue of §3.9.1/1 of the C++11 standard all character types (char, unsigned char and signed char) must have the same object representation and alignment.
The effect of a C-style cast, an explicit static_cast, and an implicit narrowing conversion is the same.
Technically, the language specs for unsigned types agree in imposing a plain base-2 representation, and for unsigned plain base-2 it's pretty obvious what extension and truncation do.
For signed types, however, the specs are more "tolerant", allowing potentially different kinds of processors to use different ways to represent signed numbers. And since the same number may have different representations on different platforms, it is practically impossible to give a single description of what happens to it when bits are added or removed.
For this reason, the language specification remains vague, saying only that "the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined".
In other words, compiler implementers are required to do the best they can to keep the numeric value. When that cannot be done, they are free to adopt whatever is most efficient for them.

understanding widening and narrowing conversions C++

I wanted to understand C++ types along with widening and narrowing conversions in a little bit more detail (C++03 specific). I am aware that you shouldn't put multiple questions in a single question thread; however, all these questions are related to the same topic. For reference I have the C++ types and ranges from MSDN.
Now consider the following example:
int a = 35000;
short b;
b=a;
Now in the above example I know a is 4 bytes and b is 2 bytes. I am aware that a widening conversion increases precision, whereas a narrowing conversion decreases precision. The above example, I believe, would be a narrowing conversion (since there is loss of data): 4 bytes are going into 2 bytes. However, in certain cases I get confused about which conversion is taking place: is it widening or narrowing? Are we supposed to look at the number of bytes of the types to determine whether it will be a widening or a narrowing conversion, or are we supposed to look at the ranges? Also, I read that a narrowing conversion requires an explicit cast; however, in this case the compiler did not point out any error and implicitly did the conversion, which was wrong. Any suggestions on why that happened? I am using VS2010.
Narrowing doesn't necessarily depend solely on size. For example, a long long and a double, are both normally the same size (64 bits apiece). Nonetheless, conversion in either direction is a narrowing conversion, because information can be lost either way. The same can happen with a conversion between a signed integer type and an unsigned integer type of the same size--conversion in either direction can lose information (i.e., result in a value that doesn't match the input) so both are considered narrowing conversions.
If you work solely within the same...class of types (e.g., all signed integers or all floating point), then you can use the size of the item to figure out whether a conversion will involve narrowing (in theory there might be times/situations where this was false, but it's true for the types provided by most hardware and compilers).
Also note that C++ does add some exceptions for conversions when the source is a literal, and the source value can be put in the destination without losing information, even though some values of the source type could lead to information loss.
For example, if I do something like float a = 3.5;, the literal 3.5 has type double, but since it's a literal and it can be converted to float without losing information, it's allowed even in places where narrowing conversions are prohibited (e.g., braced initialization lists).
For integral types, the precision is always 1. Widening conversion increases the range, narrowing conversion decreases the range.
http://msdn.microsoft.com/en-us/library/s3f49ktz.aspx
The range of a short is -32,768 to 32,767. Your input, 35,000, is bigger than that. What ends up happening is that the number gets crammed into the smaller number of bytes, causing a wraparound error (b is negative). (The specifics of that have to do with two's complement and whatnot, but at your level suffice it to say "you broke it, it don't work".)
To catch this, compile with -Wnarrowing or -Wall with gcc, or /Wall with the Visual Studio compiler (I don't actually know if /Wall will catch it, though).