Type Conversion/Casting Confusion in C++

Type Conversion/Casting Confusion in C++ - c++

What is Type Conversion and what is Type Casting?
When should I use each of them?
Detail: Sorry if this is an obvious question; I'm new to C++, coming from a ruby background and being used to to_s and to_i and the like.

Conversion is when a value is, um, converted to a different type. The result is a value of the target type, and there are rules for what output value results from what input (of the source type).
For example:
int i = 3;
unsigned int j;
j = i; // the value of "i" is converted to "unsigned int".
The result is the unsigned int value that is equal to i modulo UINT_MAX+1, and this rule is part of the language. So, in this case the value (in English) is still "3", but it's an unsigned int value of 3, which is subtly different from a signed int value of 3.
Note that conversion happened automatically, we just used a signed int value in a position where an unsigned int value is required, and the language defines what that means without us actually saying that we're converting. That's called an "implicit conversion".
"Casting" is an explicit conversion.
For example:
unsigned int k = (unsigned int)i;
long l = long(i);
unsigned int m = static_cast<unsigned int>(i);
are all casts. Specifically, according to 5.4/2 of the standard, k uses a cast-expression, and according to 5.2.3/1, l uses an equivalent thing (except that I've used a different type). m uses a "type conversion operator" (static_cast), but other parts of the standard refer to those as "casts" too.
User-defined types can define "conversion functions" which provide specific rules for converting your type to another type, and single-arg constructors are used in conversions too:
struct Foo {
int a;
Foo(int b) : a(b) {} // single-arg constructor
Foo(int b, int c) : a(b+c) {} // two-arg constructor
operator float () { return float(a); } // conversion function
};
Foo f(3,4); // two-arg constructor
f = static_cast<Foo>(4); // conversion: single-arg constructor is called
float g = f; // conversion: conversion function is called

Classic casting (something like (Bar)foo in C, used in C++ with reinterpret_cast<>) is when the actual memory contents of a variable are assumed to be a variable of a different type. Type conversion (ie. Boost's lexical_cast<> or other user-defined functions which convert types) is when some logic is performed to actually convert a variable from one type to another, like integer to a string, where some code runs to logically form a string out of a given integer.
There is also static and dynamic casting, which are used in inheritance, for instance, to force usage of a parent's member functions on a child's type (dynamic_cast<>), or vice-versa (static_cast<>). Static casting also allows you to perform the typical "implicit" type conversion that occurs when you do something like:
float f = 3.14;
int i = f; //float converted to int by dropping the fraction
which can be rewritten as:
float f = 3.14;
int i = static_cast<int>(f); //same thing

In C++, any expression has a type. when you use an expression of one type (say type S) in a context where a value of another type is required (say type D), the compiler tries to convert the expression from type S to type D. If such an implicit conversion doesn't exist, this results in an error. The word type cast is not standard but is the same as conversion.
E.G.
void f(int x){}
char c;
f(c); //c is converted from char to int.
The conversions are ranked and you can google for promotions vs. conversions for more details.
There are 5 explicit cast operators in C++ static_cast, const_cast, reinterpret_cast and dynamic_cast, and also the C-style cast

Type conversion is when you actually convert a type in another type, for example a string into an integer and vice-versa, a type casting is when the actual content of the memory isn't changed, but the compiler interpret it in a different way.

Type casting indicates you are treating a block of memory differently.
int i = 10;
int* ip = &i;
char* cp = reinterpret_cast<char*>(ip);
if ( *cp == 10 ) // Here, you are treating memory that was declared
{ // as int to be char.
}
Type conversion indicates that you are converting a value from one type to another.
char c = 'A';
int i = c; // This coverts a char to an int.
// Memory used for c is independent of memory
// used for i.

Related

What is a diference between ((int) a) and (int(a))?

What is a difference between ((int) a) and (int(a))?
Is the second expression valid in pure "С" (not "C" under "C++")?

There's no difference between them in C++. However, C supports only the first cast operation.
See this example from tutorial:
double x = 10.3;
int y;
y = (int) x; // c-like cast notation
y = int (x); // functional notation

(type_name)identifier (or more specifically (type_name)cast_expression (6.5.4)) is a C-style cast. (int(a)) is syntactically invalid in C unless a is a type. Then it could be part of a cast to a function taking type a and returning int, which would be a syntactically valid but semantically invalid cast, so useless too. int(a); in C would be a declaration equivalent to int a;.
C++ does support the int(a) syntax for casts (the type name must be a single word; it doesn't work with e.g., unsigned long(a)) on the grounds that int (the type name) then becomes kind of like a type with a parametrized constructor (although even this is in C++ grouped together with C-style casts as a kind of a deprecated way of casting, and the more fine-grained/visible static_cast/reinterpret_cast/const_cast casts are preferred).
The C++ syntax then appears to be quite interesting because this works (C++):
typedef int type_name;
type_name (a); //a declaration
a=0;
printf("%d\n", type_name(a)); //type_name(a) is a cast expr here

How to convert between int * and enum *?

From some C legacy code I get a number of constants as int *. In the C++ part, I have an enum of underlying type int. Conversion between the enum and int on a single value basis works. However, conversion between int * and enum * is not possible. See code example below.
Why is that and how would I convert a pointer to some int values to a pointer to int enums and vice versa? I kind of expect it to work since the single value conversions work and the underlying types are the same. I read about What happens if you static_cast invalid value to enum class?
but could not determine if potentially invalid values play a role here.
int i = 3;
enum E : int;
E e;
e = static_cast<E>(i); // ok
i = static_cast<int>(e); // ok
int *j;
E * f;
j = static_cast<int *>(&i); // ok
f = static_cast<E *>(&i); // 'static_cast': cannot convert from 'int *' to 'E *'
j = static_cast<int *>(&e); // 'static_cast': cannot convert from 'E *' to 'int *'
// now use j and f
*j = *f;

Why is that?
From the compiler point of view int* and E* are pointers of different non-related types, that is why static_cast is not applicable here.
How would I convert a pointer to some int values to a pointer to int enums and vice versa?
You might try reinterpret_cast instead of static_cast:
f = reinterpret_cast<E *>(&i);
j = reinterpret_cast<int *>(&e);
From reinterpret_cast:
Any pointer to object of type T1 can be converted to pointer to object of another type cv T2
However, note, that dereferencing f or j (i.e. with *f or *j) will be a violation of the strict aliasing rule (for more details see the discussion below). This means that this kind of conversion, though strictly possible, is usually not useful.

The default 'base type' of an enum is int and can be explicitly specified in the OP. Logically the value stored at E* e where E is an enumeration with base type int is an int. It can't be statically cast.
There's no guarantee in C++ that an enum of base type (say) is layout compatible with short but even if the language tightened up that point there could be issues of type compatibility/
One issue is that E* to int*pi would violate type-safety because pi could be used to quietly set values outside the enumeration.
Similarly int* to E* may violate type safety if the integer value isn't in the enumeration.
Note however the standard makes a clear note that there's nothing to preclude an enum taking a value outside its defined set of values:
This set of values is used to define promotion and conversion semantics for the enumeration type. It does not preclude an
expression of enumeration type from having a value that falls outside this range.
See here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf (see note 95 bottom of p. 156)
The only case that could (if layout-compatibility were assured) be valid is E* to const int* because all values of E* are ints and the value cannot be (correctly) modified through the int * pointer without a further violation of the type system.
But I think the language definition is not that subtle.

how to make c++11 auto smarter?

I am now beginning to use auto keyword in c++ 11. One thing I found it not so smart is illustrated in the following code:
unsigned int a = 7;
unsigned int b = 23;
auto c = a - b;
std::cout << c << std::endl;
As you can see, the type of c variable is unsigned int. But my intention is the difference of two unsigned int should be int. So I expect variable c is equal to -16. How could I use auto more wisely so that it can infer the type of c variable to int? Thanks.

Both a and b have a type of unsigned int. Consequently, the type of expression a - b is deduced as unsigned int and c has a type of unsigned int. So auto here works as it is supposed to do.
If you want to change a type from unsigned int to int you might use static_cast:
auto c = static_cast<int>(a - b);
Or explicitly specify the type for c:
int c = a - b;

You misunderstand what unsigned int type is.
unsigned int is a mod-2^k for some k (usually 32) integer. Subtracting two such mod-2^k integers is well defined, and is not a signed integer.
If you want a type that models a bounded set of integers from -2^k to 2^k-1 (with k usually equal to 31), use int instead of unsigned int. If you want them to be positive, simply make them positive.
Despite its name, unsigned int is not an int that has no sign and is thus positive. Instead, it is a very specific integral type that happens to have no notion of sign.
If you don't need mod-2^k math for some unknown implementation defined k, and are not desparate for every single bit of of magnitude to be packed into a value, don't use unsigned int.
What you appear to want is something like
positive<int> a = 7;
positive<int> b = 23;
auto c = a-b; // c is of type `int`, because the difference of two positive values may be negative
With a few syntax changes and a lot of work that might be possible, but it isn't what unsigned means.

So just because you don't know how basic expressions work, the C++11 auto keyword is dumb? How does that even make any sense to you?
In the expression auto c = a - b;, the auto c = has nothing whatsoever to do with the type used in the sub-expression a - b.
Ever since the very first C language draft, the type used by an expression is determined by the operands of that expression. This is true for everything from pre-standard K&R C to C++17.
Now what you need to do if you want negative numbers is, not too surprisingly, to use negative types. Change the operands to (signed) int or cast them to that type before invoking the - operator.
Declaring the result as a signed type without changing the type of the operands of - is not a good idea, because then you force a conversion from unsigned to signed, which isn't necessarily well-defined behavior.
Now, if the operands have different types, or are of small integer types, they will get implicitly promoted as per the usual arithmetic conversions. That does not apply in this specific case, since both operands are of the same type and not of a small integer type.
But... this is why the auto keyword is dumb and dangerous. Consider:
unsigned short a = 7;
unsigned short b = 23;
auto c = a - b;
Since both operands were unsigned, the programmer intended to use unsigned arithmetic. But here both operands are implicitly promoted to int, with an unintended change of signedness. The auto keyword pretends that nothing happened, whereas unsigned int c = a - b; would increase the chance of compiler diagnostic messages, or at least increase the chance of warnings from external static analysis tools. And it will also accidentally smoother out what could have otherwise been a change-of-signedness bug.
In addition, with auto we would end up with the wrong, unintended type for the declared variable.

What precedence are types derived when using auto?

I'd like to know in what order are types derived when using auto in C++? For example if I have
auto x = 12.5;
Will that result in a float or a double? Is there any reason it chooses one over the other in terms of speed,efficiency or size? And in what order are the types derived? Does it try int then double then string or is it not that simple?
Thanks

While C++ allows initialization of different typed variables with the same kind of literal, all literals in C++ have one specific type. Therefore the type deduction for auto variables does not need to be special for initialization with literals, it just takes the type of the right hand side (the single, unambiguous type of the literal, in your case) and applies it to the variable.
Examples for literals and their different types:
12.5 //double
12.5f //float
13 //int
13u //unsigned int
13l //long
13ull //unsigned long long
"foo" //char const [4]
'f' //char
So what about float f = 12.5;? Very simple: here the float f is initialized with a literal of type double and an implicit conversion takes place. 12.5 for itself never is float, it always is double.
An exception where the type of the auto variable does not have the type of the literal is when array-to-pointer decay takes place, which is the case for all string literals:
auto c = "bar"; //c has type char const*, while "bar" has type char const[4]
But this again is not special for literals but holds for all kinds of arrays:
int iarr[5] = {};
auto x = iarr; //x has type int*

How do C/C++ compilers handle type casting between types with different value ranges?

How do type casting happen without loss of data inside the compiler?
For example:
int i = 10;
UINT k = (UINT) k;
float fl = 10.123;
UINT ufl = (UINT) fl; // data loss here?
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p;
How does the compiler handle this type of typecasting? A low-level example showing the bits would be highly appreciated.

Well, first note that a cast is an explicit request to convert a value of one type to a value of another type. A cast will also always produce a new object, which is a temporary returned by the cast operator. Casting to a reference type, however, will not create a new object. The object referenced by the value is reinterpreted as a reference of a different type.
Now to your question. Note that there are two major types of conversions:
Promotions: This type can be thought of casting from a possibly more narrow type to a wider type. Casting from char to int, short to int, float to double are all promotions.
Conversions: These allow casting from long to int, int to unsigned int and so forth. They can in principle cause loss of information. There are rules for what happens if you assign a -1 to an unsigned typed object for example. In some cases, a wrong conversion can result in undefined behavior. If you assign a double larger than what a float can store to a float, the behavior is not defined.
Let's look at your casts:
int i = 10;
unsigned int k = (unsigned int) i; // :1
float fl = 10.123;
unsigned int ufl = (unsigned int) fl; // :2
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p; // :3
This cast causes a conversion to happen. No loss of data happens, since 10 is guaranteed to be stored by an unsigned int. If the integer were negative, the value would basically wrap around the maximal value of an unsigned int (see 4.7/2).
The value 10.123 is truncated to 10. Here, it does cause lost of information, obviously. As 10 fits into an unsigned int, the behavior is defined.
This actually requires more attention. First, there is a deprecated conversion from a string literal to char*. But let's ignore that here. (see here). More importantly, what does happen if you cast to an unsigned type? Actually, the result of that is unspecified per 5.2.10/7 (note the semantics of that cast is the same as using reinterpret_cast in this case, since that is the only C++ cast being able to do that):
A pointer to an object can be explicitly converted to a pointer to
an object of different type. Except that converting an rvalue of type “pointer to T1” to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
So you are only safe to use the pointer after you cast back to char * again.

The two C-style casts in your example are different kinds of cast. In C++, you'd normally write them
unsigned int uf1 = static_cast<unsigned int>(fl);
and
unsigned char* up = reinterpret_cast<unsigned char*>(p);
The first performs an arithmetic cast, which truncates the floating point number, so there is data loss.
The second makes no changes to data - it just instructs the compiler to treat the pointer as a different type. Care needs to be taken with this kind of cast: it can be very dangerous.

"Type" in C and C++ is a property assigned to variables when they're handled in the compiler. The property doesn't exist at runtime anymore, except for virtual functions/RTTI in C++.
The compiler uses the type of variables to determine a lot of things. For instance, in the assignment of a float to an int, it will know that it needs to convert. Both types are probably 32 bits, but with different meanings. It's likely that the CPU has an instruction, but otherwise the compiler would know to call a conversion function. I.e.
& __stack[4] = float_to_int_bits(& __stack[0])
The conversion from char* to unsigned char* is even simpeler. That is just a different label. At bit level, p and up are identical. The compiler just needs to remember that *p requires sign-extension while *up does not.

Casts mean different things depending on what they are. They can just be renamings of a data type, with no change in the bits represented (most casts between integral types and pointers are like this), or conversions that don't even preserve length (such as between double and int on most compilers). In many cases, the meaning of a cast is simply unspecified, meaning the compiler has to do something reasonable but doesn't have to document exactly what.
A cast doesn't even need to result in a usable value. Something like
char * cp;
float * fp;
cp = malloc(100);
fp = (float *)(cp + 1);
will almost certainly result in a misaligned pointer to float, which will crash the program on some systems if the program attempts to use it.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js