Consider the following code:
void foo(unsigned int x)
{
}
int main()
{
foo(-5);
return 0;
}
This code compiles without complaint. Errors like this can cause lots of problems and are hard to find. Why does C++ allow such a conversion?
The short answer is because C supported such conversions originally and they didn't want to break existing software in C++.
Note that some compilers will warn on this. For example g++ -Wconversion will warn on that construct.
In many cases the implicit conversion is useful, for example when int was used in calculations, but the end result will never be negative (known from the algorithm and optionally asserted upon).
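For instance, here is a minimal sketch of such a case (printRepeated and example are hypothetical names used purely for illustration):

#include <cassert>
#include <cstdio>

// Hypothetical helper that expects a non-negative count.
void printRepeated(unsigned int count)
{
    for (unsigned int i = 0; i < count; ++i)
        std::puts("*");
}

void example(int total, int perPage)
{
    int remaining = total - perPage;   // the intermediate arithmetic is naturally done in int
    assert(remaining >= 0);            // the algorithm guarantees a non-negative result here
    printRepeated(remaining);          // implicit int -> unsigned int conversion is harmless
}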
EDIT: Additional probable explanation: Remember that originally C was a much looser-typed language than C++ is now. With K&R style function declarations there would have been no way for the compiler to detect such implicit conversions, so why bother restricting it in the language. For example your code would look roughly like this:
int foo(x)
unsigned int x;
{
}
int main()
{
foo(-5);
return 0;
}
while the declaration alone would have been just int foo(); (K&R declarations carried no parameter information at all).
The compiler actually relied on the programmer to pass the right types into each function call and did no conversions at the call site. When the function actually got called, the data on the stack (etc.) was interpreted in the way the function definition indicated.
Once code was written that relied on that sort of implicit conversion it would have become much harder to remove it from ANSI C even when function prototypes were added with actual type information. This is likely why it remains in C even now. Then C++ came along and again decided to not break backwards compatibility with C, continuing to allow such implicit conversions.
Just another quirk of a language that has lots of silly quirks.
The conversion is well-defined to wrap around, which may be useful in some cases.
It's backward-compatible with C, which does it for the above reasons.
Take your pick.
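As a minimal sketch of the wrap-around behaviour (the exact value depends on the width of unsigned int; 32 bits is assumed here):

#include <iostream>

int main()
{
    unsigned int u = -5;      // well-defined: -5 is converted modulo 2^32
    std::cout << u << '\n';   // prints 4294967291 with a 32-bit unsigned int
    return 0;
}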
@user168715 is right. C++ was initially designed to be a superset of C, aiming to be as backward-compatible as possible.
The "C" philosophy is to deliver most of the responsibility to the programmer, instead of disallowing dangerous things. For C programmers it is heaven, for Java programmers, it is hell... a matter of taste.
I will dig the standards to see where exactly it is written, but I have no time for this right now. I'll edit my answer as soon as I can.
I also agree that some of the inherited freedom can lead to errors that are really hard to debug, so I am adding to what was said that in g++ you can turn on a warning to prevent you from doing this kind of mistake: -Wconversion flag.
-Wconversion
Warn for implicit conversions that may alter a value. This includes conversions between real and integer, like abs (x) when x is double; conversions between signed and unsigned, like unsigned ui = -1; and conversions to smaller types, like sqrtf (M_PI). Do not warn for explicit casts like abs ((int) x) and ui = (unsigned) -1, or if the value is not changed by the conversion like in abs (2.0). Warnings about conversions between signed and unsigned integers can be disabled by using -Wno-sign-conversion.
For C++, also warn for confusing overload resolution for user-defined conversions; and conversions that will never use a type conversion operator: conversions to void, the same type, a base class or a reference to them. Warnings about conversions between signed and unsigned integers are disabled by default in C++ unless -Wsign-conversion is explicitly enabled.
Other compilers may have similar flags.
By the time of the original C standard, the conversion was already allowed by many (all?) compilers. Based on the C rationale, there appears to have been little (if any) discussion of whether such implicit conversions should be allowed. By the time C++ came along, such implicit conversions were sufficiently common that eliminating them would have rendered the language incompatible with a great deal of C code. It would probably have made C++ cleaner; it would certainly have made it much less used -- to the point that it would probably never have gotten beyond the "C with Classes" stage, and even that would just be a mostly-ignored footnote in the history of Bell Labs.
The only real question along this line was between "value preserving" and "unsigned preserving" rules when promoting unsigned values "smaller" than int. The difference between the two arises when you have (for example) an unsigned short being added to an unsigned char.
Unsigned preserving rules say that you promote both to unsigned int. Value preserving rules say that you promote both values to int, if it can represent all values of the original type (e.g., the common case of 8-bit char, 16-bit short, and 32-bit int). On the other hand, if int and short are both 16 bits, so int cannot represent all values of unsigned short, then you promote the unsigned short to unsigned int (note that it's still considered a promotion, even though it only happens when it's really not a promotion -- i.e., the two types are the same size).
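As a minimal sketch of what the value-preserving choice means in practice (assuming the common case of a 16-bit unsigned short and a 32-bit int):

#include <cstdio>

int main()
{
    unsigned short us = 0;
    // Value-preserving promotion: us is promoted to (signed) int because a
    // 32-bit int can hold every unsigned short value, so us - 1 is the int -1.
    // Under the rejected unsigned-preserving rules, us would have been promoted
    // to unsigned int and us - 1 would have wrapped to UINT_MAX instead.
    std::printf("%d\n", us - 1);        // prints -1
    std::printf("%d\n", (us - 1) < 0);  // prints 1: the result is signed
    return 0;
}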
For better or worse, (and it's been argued both directions many times) the committee chose value preserving rather than unsigned preserving promotions. Note, however, that this deals with a conversion in the opposite direction: rather than from signed to unsigned, it's about whether you convert unsigned to signed.
Because the standard allows implicit conversion from signed to unsigned types.
Also, (int)a + (unsigned)b yields an unsigned result: when the operands have the same rank, the usual arithmetic conversions in the C++ standard convert the signed operand to unsigned.
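A minimal sketch of that rule (assuming C++17 and 32-bit int and unsigned int):

#include <iostream>
#include <type_traits>

int main()
{
    int a = -2;
    unsigned int b = 1;
    // The usual arithmetic conversions turn a into unsigned int, so the sum
    // has type unsigned int and the "negative" operand wraps around.
    static_assert(std::is_same_v<decltype(a + b), unsigned int>);
    std::cout << (a + b) << '\n';   // prints 4294967295, not -1
    return 0;
}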
I'm attempting to write a generic version of __builtin_clz that handles all integer types, including signed ones. To ensure that conversion of signed to unsigned types doesn't change the bit representation, I decided to use reinterpret_cast.
I've got stuck on int64_t which unlike the other types doesn't seem to work with reinterpret_cast.
I would think the code below is correct but it generates a warning in GCC.
#include <cstdint>
int countLeadingZeros(const std::int64_t value)
{
static_assert(sizeof(std::int64_t) == sizeof(unsigned long long));
return __builtin_clzll(reinterpret_cast<const unsigned long long&>(value));
}
GCC shows a warning: dereferencing type-punned pointer will break strict-aliasing rules.
Clang compiles it without a complaint.
Which compiler is right?
If it is GCC, what is the reason for the violation of strict-aliasing?
Edit: After reading the answers, I can see that the described behavior applies not only to conversion int64_t -> unsigned long long but also to long -> long long. The latter one makes the problem a little more obvious.
If you have a signed integer type T, you can access its value through a pointer/reference to the unsigned version of T and vice-versa.
What you cannot do is access its value through a pointer/reference to the unsigned version of U, where U is not the original type. That's undefined behavior.
long and long long are not the same type, no matter what the size of those types say. int64_t may be an alias for a long, a long long, or some other type. But unless you know that int64_t is an alias for signed long long (and no, testing its size is not good enough), you cannot access its value through a reference to unsigned long long.
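To stay inside the rule described above, one option is to alias through the unsigned version of whatever type int64_t actually is, and only then convert the value for the builtin. A minimal sketch, assuming C++17 and a 64-bit unsigned long long:

#include <cstdint>
#include <type_traits>

int countLeadingZeros(const std::int64_t value)
{
    // Unsigned counterpart of the *actual* underlying type of int64_t,
    // whatever that alias happens to be on this implementation.
    using U = std::make_unsigned_t<std::int64_t>;
    static_assert(sizeof(U) == sizeof(unsigned long long));
    // Accessing a signed type through its own unsigned version is permitted,
    // so this does not violate strict aliasing.
    const U bits = reinterpret_cast<const U&>(value);
    // Converting the value (not reinterpreting storage) to unsigned long long
    // is an ordinary value-preserving conversion.
    return __builtin_clzll(static_cast<unsigned long long>(bits));
}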
On compilers where both long and long long are 64-bit types without padding bits, an implementation may at its leisure define type int64_t as a synonym for long, a synonym for long long, a synonym for an extended integer type which will be treated as compatible with both, or a synonym for an extended integer type which is incompatible with both.
The C++ Standard allows, but does not require, that implementations treat types which are representation-compatible as alias-compatible. According to the C++ Draft:
Although this document states only requirements on C++ implementations, those requirements are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution of programs.
If a program targets an implementation or configuration that documents that operations involving representation-compatible types will be processed "in a documented manner characteristic of the implementation"--a treatment which is explicitly provided for in the Standard--then a program which relies upon such types being alias-compatible would have defined behavior on that implementation, and could thus be correct.
If an implementation opts, or is configured, not to define the behavior of such actions, instead processing them in a gratuitously useless fashion, then the behavior would not be defined on that implementation, and an attempt to use the code on that platform, rather than on one that defined the behavior, would be erroneous.
Because the C++ Standard explicitly states that it does not impose requirements on C++ programs, the correctness of many programs may only be judged in reference to particular implementations. Almost all implementations can be configured to define the behavior of code that relies upon representation-compatible types being alias-compatible, and such code would have defined behavior on such implementations or configurations.
In a Wikipedia article on type punning it gives an example of pointing an int type pointer at a float to extract the signed bit:
However, supposing that floating-point comparisons are expensive, and also supposing that float is represented according to the IEEE floating-point standard, and integers are 32 bits wide, we could engage in type punning to extract the sign bit of the floating-point number using only integer operations:
bool is_negative(float x) {
unsigned int *ui = (unsigned int *)&x;
return *ui & 0x80000000;
}
Is it true that pointing a pointer to a type not its own is undefined behavior? The article makes it seem as if this operation is a legitimate and common thing. What are the things that can possibly go wrong in this particular piece of code? I'm interested in both C and C++, if it makes any difference. Both have the strict aliasing rule, right?
Is it true that pointing a pointer to a type not its own is undefined behavior?
No, both C and C++ allow an object pointer to be converted to a different pointer type, with some caveats.
But with a few narrow exceptions, accessing the pointed-to object via the differently-typed pointer does have undefined behavior. Such undefined behavior arises from evaluating the expression *ui in the example function.
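In other words (a minimal sketch reusing the types from the example above), the pointer conversion itself is fine; the problem is the later access:

float x = -1.0f;
unsigned int *ui = (unsigned int *)&x;  // the conversion is allowed (alignment permitting)
unsigned int v = *ui;                   // undefined behavior: x is accessed through the wrong type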
The article makes it seem as if this operation is a legitimate and common thing. What are the things that can possibly go wrong in this particular piece of code?
The behavior is undefined, so anything and everything within the power of the program to do is possible. In practice, the observed behavior might be exactly what the author(s) of the Wikipedia article expected, and if not, then the most likely misbehaviors are variations on the function computing incorrect results.
I'm interested in both C and C++, if it makes any difference. Both have the strict aliasing rule, right?
To the best of my knowledge, the example code has undefined behavior in both C and C++, for substantially the same reason.
The fact that it is technically undefined behaviour to call this is_negative function implies that compilers are legally allowed to "exploit" this fact, e.g., in the below code:
if (condition) {
is_negative(bar);
} else {
// do something
}
the compiler may "optimize out" the branch, by evaluating condition and then unconditionally proceeding to the else substatement even if the condition is true.
However, because this would break enormous amounts of existing code, "real" compilers are practically forced to treat is_negative as if it were legitimate. In legal C++, the author's intent is expressed as follows (using memcpy from <cstring>):
unsigned int ui;
memcpy(&ui, &x, sizeof(x));
return ui & 0x80000000;
So the reinterpret_cast approach to type punning, while undefined according to the standard in this case, is thought of by many people as "de facto implementation-defined" and equivalent to the memcpy approach.
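Since C++20 there is also a standard-blessed way to express the same bit-for-bit reinterpretation; a minimal sketch, assuming float is 32 bits wide:

#include <bit>
#include <cstdint>

bool is_negative(float x)
{
    // std::bit_cast copies the object representation, like the memcpy idiom,
    // but with no aliasing concerns; it only compiles if the sizes match.
    return std::bit_cast<std::uint32_t>(x) & 0x80000000u;
}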
Why
If this is undefined behavior then why is it given as a seemingly legitimate example?
This was a common practice before C was standardized and added the rules about aliasing, and it has unfortunately persisted in practice. Nonetheless, Wikipedia pages should not be offering it as examples.
Aliasing Via Pointer Conversions
Is it true that pointing a pointer to a type not its own is undefined behavior?
The rules are more complicated than that, but, yes, many uses of an object through an lvalue of a different type are not defined by the C or C++ standards, including this one. There are also rules about pointer conversions that may be violated.
The fact that many compilers support this behavior even though the C and C++ standards do not require them to is not a reason to do so, as there is a simple alternative defined by the standards (use memcpy, below).
Using Unions
In C, an object may be reinterpreted as another type using a union. C++ does not define this:
union { float f; unsigned int ui; } u = { .f = x };
unsigned int ui = u.ui;
or the new value may be obtained more tersely using a compound literal:
(union { float f; unsigned int ui; }) {x} .ui
Naturally, float and unsigned int should have the same size when using this.
Copying Bytes
Both C and C++ support reinterpreting an object by copying the bytes that represent it:
unsigned int ui;
memcpy(&ui, &x, sizeof ui);
Naturally, float and unsigned int should have the same size when using this. The above is C code; C++ requires std::memcpy or a suitable using declaration.
Accessing data through pointers (or unions) is pretty common in (embedded) C code but often requires extra knowledge:
If float were smaller than int, you would be reading outside the object.
The code makes several assumptions about where and how the sign bit is stored (little- vs. big-endian layout, two's complement, IEEE 754 representation).
When the C Standard characterizes an action as invoking Undefined Behavior, that implies that at least one of the following is true:
The code is non-portable.
The code is erroneous.
The code is acting upon erroneous data.
One of the reasons the Standard leaves some actions Undefined is, among other things, to "identify areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." A common extension, listed in the Standard as one of the ways implementations may process constructs that invoke "Undefined Behavior", is to process some such constructs by "behaving during translation or program execution in a documented manner characteristic of the environment".
I don't think the code listed in the example claims to be 100% portable. As such, the fact that it invokes Undefined Behavior does not preclude the possibility of it being non-portable but correct. Some compiler writers believe that the Standard was intended to deprecate non-portable constructs, but such a notion is contradicted by both the text of the Standard and the published Rationale. According to the published Rationale, the authors of the Standard wanted to give programmers a "fighting chance" [their term] to write portable code, and defined a category of maximally-portable programs, but did not specify portability as a requirement for anything other than strictly conforming C programs, and they expressly did not wish to demean programs that were conforming but not strictly conforming.
Here is an MWE of something I came across in some C++ code.
int a = (int)(b/c);
Why is (int) after the assignment operator?
Is it not recommended to use it like this?
This is simply a C-style typecast. It is used to make the author's intentions explicit, especially when the result of b/c is of another type (such as unsigned or float).
Without the cast, you will often get a compiler warning about an implicit conversion that can lose information. By using the explicit cast, you are stating that you accept this conversion as fine within whatever limits your program enforces, and the compiler will perform the conversion without emitting a warning.
In C++, we use static_cast<int>(b/c) to make the cast even more explicit and intentional.
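For example, a minimal sketch (assuming b and c are doubles and a warning flag such as -Wconversion is enabled; the variable names here are just illustrative):

double b = 7.0, c = 2.0;
int implicit_result = b / c;                    // implicit double -> int conversion, may warn
int c_style_result  = (int)(b / c);             // C-style cast: the conversion is intentional
int cpp_result      = static_cast<int>(b / c);  // preferred C++ spelling of the same intent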
This is a cast used to convert a variable or expression to a given type. In this case if b and c were floating point numbers, adding the cast (int) forces the result to an integer.
Specifically this is a "C style cast", modern C++ has some additional casts to given even more control (static_cast, dynamic_cast, const_cast etc)
It is not "(int) after the assignment operator".
It is "(int) before a float - the result of b/c".
It casts the float to an int.
This is a mistake. The code:
int a = b/c;
may cause undefined behaviour if the result of the division is a floating-point value that is out of range of int (e.g. it exceeds INT_MAX after truncation). Compilers may warn about this if you use warning flags.
Changing the code to int a = (int)(b/c); has no effect on the behaviour of the code, but it may cause the compiler to suppress the warning (compilers sometimes treat a cast as the programmer expressing the intent that they do not want to see the warning).
So now you just have silent undefined behaviour, unless the previous code is designed in such a way that the division result can never be out of range.
A better solution to the problem would be:
long a = std::lrint(b/c);
If the quotient is out of range then this will store an unspecified value in a, and you can detect the error using floating-point error handling (see the documentation for std::lrint).
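A minimal sketch of that error handling, assuming the quotient is computed from doubles (divide_checked is a hypothetical helper name):

#include <cfenv>
#include <cmath>
#include <cstdio>

// Hypothetical helper: convert b/c to long, reporting out-of-range results.
long divide_checked(double b, double c)
{
    std::feclearexcept(FE_INVALID);
    long a = std::lrint(b / c);
    // std::lrint raises FE_INVALID when the rounded value cannot be represented.
    if (std::fetestexcept(FE_INVALID))
        std::fprintf(stderr, "division result out of range for long\n");
    return a;
}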
Why does C++ (and probably C as well) allow me to assign an int to a char without at least giving me a warning?
Is it okay to directly assign the value, like in
int i = 12;
char c = i;
i.e. do an implicit conversion, or shall I use a static_cast<>?
EDIT
Btw, I'm using gcc.
It was allowed in C before an explicit cast syntax was invented.
Then it remained a common practice, so C++ inherited it in order to not break a huge amount of code.
Actually most compilers issue a warning. If yours doesn't, try changing its settings.
C as originally designed wasn't really a strongly-typed language. The general philosophy was that you the programmer must know what you are doing, and the compiler is just there to help you do it. If you asked to convert between float, int, and unsigned char six or seven times in a single expression, well that must be what you wanted.
C++ sort of picked that up just so that all the existing C code wouldn't be too much of a bear to port. They are slowly trying to make it stronger with each revision though. Today just about any C++ compiler will give you a warning for that if you turn the warning levels up (which I highly recommend you do).
Otherwise, perhaps you should look into true strongly-typed languages, like Java and Ada. The equivalent Ada code would not compile without an explicit conversion.
Short answer: It's okay (by the c++ standard) in your example.
Slightly longer answer: It's not okay if char is signed and you are trying to assign it a value outside its range. It's okay if it is unsigned, though (whether char is signed depends on your environment); then you'll get modular arithmetic. Compilers usually have a switch to warn you because of the first case, but as long as you stay within the bounds it's perfectly fine (however, an explicit cast to make your intentions clear does not hurt).
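A minimal sketch of the two cases (assuming an 8-bit char and a warning flag such as -Wconversion; the names are just illustrative):

int small = 12;
int big = 300;
char ok = small;                      // 12 fits in char: fine, though -Wconversion may still warn
char trunc = static_cast<char>(big);  // 300 does not fit: wraps to 44 if char is unsigned,
                                      // implementation-defined (typically 44) for signed char before C++20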
char is a smaller type than int (just as short is). So there should be a warning about possible loss of information. Maybe you have warnings switched off; try configuring your compiler/IDE.
I know that if part of the data type is omitted in C/C++ code in this way: unsigned test=5;, the compiler automatically makes this variable an int (an unsigned int in this case). I've heard that this is part of the C standard and it will work in all compilers.
But I've also heard that doing this is considered a bad practice.
What do you think? Should I really type unsigned int instead of just unsigned?
Are short, long and long long also datatypes?
unsigned is a data type! And it happens to alias to unsigned int.
When you’re writing unsigned x; you are not omitting any data type.
This is completely different from "default int", which exists in C89 (it was removed in C99 and never existed in C++!), where you really omit the type on a declaration and C automatically infers that type to be int.
As for style, I personally prefer to be explicit and thus to write unsigned int. On the other hand, I’m currently involved in a library where it’s convention to just write unsigned, so I do that instead.
I would even take it one step further and use stdint's uint32_t type.
It might be a matter of taste, but I prefer to know what primitive I'm using over some ancient consideration of optimising per platform.
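A one-line sketch of that preference (assuming <cstdint> provides the fixed-width alias on your platform):

#include <cstdint>
std::uint32_t test = 5;   // exactly 32 bits wherever uint32_t exists, unlike plain unsigned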
As @Konrad Rudolph says, unsigned is a datatype. It's really just an alias for unsigned int.
As to the question of using unsigned being bad practice? I would say no, there is nothing wrong with using unsigned as a datatype specifier. Professionals won't be thrown by this, and any coding standard that says you have to use unsigned int is needlessly draconian, in my view.
Gratuitous verbosity considered harmful. I would never write unsigned int or long int or signed anything (except char or bitfields) because it increases clutter and decreases the amount of meaningful code you can fit in 80 columns. (Or more likely, encourages people to write code that does not fit in 80 columns...)