Casting positive 'int' to 'size_t'

Casting positive 'int' to 'size_t' - c++

The difference between size_t and int is well-documented, which I recapitulate: the former is an alias to some unsigned integer type that's implementation-dependent, whereas the latter is signed; the former is preferable for memory declarations, whereas the latter is better for arithmetic operations.
My question is, if I do some arithmetic computations to give an int (which is never too large and is always positive) and assign it to a size_t variable (that's used for accessing array locations), is there any situation in which a problem may arise?

Assigning a signed type to an unsigned type is always well-defined (even for negative values). If the signed variable is no larger than the unsigned type, and has only non-negative numbers the value will not change in such a conversion.

Related

What is the recommended way to convert from long long int to uint64_t?

Consider a function that returns a long long int value. Even though it returns a long long int, the logic guarantees that it is always positive. I want to use the return value assigned to an uint64_t. Given the logic is correct what is the recommended way to do this cast? Should I just assign to it or do a static cast?

This is an implicit conversion, integral convertion, no cast is required:
If the destination type is unsigned, the resulting value is the smallest unsigned value equal to the source value modulo 2n
where n is the number of bits used to represent the destination type.
That is, depending on whether the destination type is wider or narrower, signed integers are sign-extended or truncated and unsigned integers are zero-extended or truncated respectively.
static_cast adds no value.
A static_assert can be used to prevent truncation, e.g.:
static_assert(sizeof(uint64_t) >= sizeof(long long), "Truncation detected.");`
There is also boost::numeric_cast:
The fact that the behavior for overflow is undefined for all conversions (except the aforementioned unsigned to unsigned) makes any code that may produce positive or negative overflows exposed to portability issues.
numeric_cast returns the result of converting a value of type Source to a value of type Target. If out-of-range is detected, an overflow policy is executed whose default behavior is to throw an an exception (see bad_numeric_cast, negative_overflow and positive_overflow ).

size_t divided by int type conversion rules

When I am doing arithmetic operations with size_t type (or unsigned long), how careful should I be with decorating integer constants with type literals. For example,
size_t a = 1111111;
if (a/2 > 0) ...;
What happens when compiler does the division? Does it treat 2 as integer or as unsigned integer? If the former, then what is the resulting type for (unsigned int)/(int)?
Should I always carefully write 'u' literals
if (a/2u > 0) ...;
for (a=amax; a >= 0u; a -= 3u) ...;
or compiler will correctly guess that I want to use operations with unsigned integers?

2 is indeed treated as an int, which is then implicitly converted to size_t. In a mixed operation size_t / int, the unsigned type "wins" and signed type gets converted to unsigned one, assuming the unsigned type is at least as wide as the signed one. The result is unsigned, i.e. size_t in your case. (See Usual arithmetic conversions for details).
It is a better idea to just write it as a / 2. No suffixes, no type casts. Keep the code as type-independent as possible. Type names (and suffixes) belong in declarations, not in statements.

The C standard guarantees that size_t is an unsigned integer.
The literal 2 is always of type int.
The "usual artihmetic converstions guarantee that whenever an unsigned and a signed integer of the same size ("rank") are used as operands in a binary operation, the signed operand gets converted to unsigned type.
So the compiler actually interprets the expression like this:
a/(size_t)2 > (size_t)0
(The result of the > operator or any relational operator is always of type int though, as a special case for that group of operators.)
Should I always carefully write 'u' literals
Some coding standards, most notably MISRA-C, would have you do this, to make sure that no implicit type promotions exist in the code. Implicit promotions or conversions are very dangerous and they are a flaw in the C language.
For your specific case, there is no real danger with implicit promotions. But there are cases when small integer types are used and you might end up with unintentional change of signedness because of implicit type promotions.
There is never any harm with being explicit, although writing an u suffix to every literal in your code may arguably reduce readability.
Now what you really must do as a C programmer to deal with type promotion dangers, is to learn how the integer promotions and usual arithmetic conversions work (here's some example on the topic). Sadly, there are lots of C programmers who don't, veterans included. The result is subtle, but sometimes critical bugs. Particularly when using the bit-wise operators such as shifts, where change of signedness could invoke undefined behavior.
These rules can be somewhat tricky to learn as they aren't really behaving rationally or consistently. But until you know these rules in detail you have to be explicit with types.
EDIT: To be picky, the size of size_t is actually not specified, all the standard says is that it must be large enough to at least hold the value 65535 (2 bytes). So in theory, size_t could be equal to unsigned short, in which case promotions would turn out quite different. But in practice I doubt that scenario is of any interest, since I don't believe there exists any implementation where size_t is smaller than unsigned int.

Both C++ and C promote signed types to unsigned types when evaluating an operator like division which takes two arguments, and one of the arguments is an unsigned type.
So the literal 2 will be converted to an unsigned type.
Personally, I believe it's better to leave promotion to the compiler rather than be explicit: if your code was ever refactored and a became a signed type then a / 2u would cause a to be promoted to an unsigned type, with potentially disastrous consequences.

size_t sz = 11;
sz / 2 = 5
sz / (-2) = 0
WHY?
sz is treated as unsigned int because size can not be negative. When doing an arithmetic operation with an unsigned int and an int, the int is turned into unsigned int.
From "CS dummies"

Aliasing of otherwise equivalent signed and unsigned types

The C and C++ standards both allow signed and unsigned variants of the same integer type to alias each other. For example, unsigned int* and int* may alias. But that's not the whole story because they clearly have a different range of representable values. I have the following assumptions:
If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
If an int is read through an unsigned int*, negative values wrap around as if they were casted to unsigned int. Is this correct?
If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?
Additionally, what about compatible but not equivalent integer types?
On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.

If an unsigned int is read through an int*, the value must be
within the range of int or an integer overflow occurs and the
behaviour is undefined. Is this correct?
Why would it be undefined? there is no integer overflow since no conversion or computation is done. We take an object representation of an unsigned int object and see it through an int. In what way the value of the unsigned int object transposes to the value of an int is completely implementation defined.
If an int is read through an unsigned int*, negative values wrap
around as if they were casted to unsigned int. Is this correct?
Depends on the representation. With two's complement and equivalent padding, yes. Not with signed magnitude though - a cast from int to unsigned is always defined through a congruence:
If the destination type is unsigned, the resulting value is the
least unsigned integer congruent to the source integer (modulo
2n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this
conversion is conceptual and there is no change in the bit pattern (if
there is no truncation). — end note ]
And now consider
10000000 00000001 // -1 in signed magnitude for 16-bit int
This would certainly be 215+1 if interpreted as an unsigned. A cast would yield 216-1 though.
If the value is within the range of both int and unsigned int,
accessing it through a pointer of either type is fully defined and
gives the same value. Is this correct?
Again, with two's complement and equivalent padding, yes. With signed magnitude we might have -0.
On systems where int and long have the same range, alignment,
etc., can int* and long* alias? (I assume not.)
No. They are independent types.
Can char16_t* and uint_least16_t* alias?
Technically not, but that seems to be an unneccessary restriction of the standard.
Types char16_t and char32_t denote distinct types with the same
size, signedness, and alignment as uint_least16_t and
uint_least32_t, respectively, in <cstdint>, called the underlying
types.
So it should be practically possible without any risks (since there shouldn't be any padding).

If an int is read through an unsigned int*, negative values wrap around as if they were casted to unsigned int. Is this correct?
For a system using two's complement, type-punning and signed-to-unsigned conversion are equivalent, for example:
int n = ...;
unsigned u1 = (unsigned)n;
unsigned u2 = *(unsigned *)&n;
Here, both u1 and u2 have the same value. This is by far the most common setup (e.g. Gcc documents this behaviour for all its targets). However, the C standard also addresses machines using ones' complement or sign-magnitude to represent signed integers. In such an implementation (assuming no padding bits and no trap representations), the result of a conversion of an integer value and type-punning can yield different results. As an example, assume sign-magnitude and n being initialized to -1:
int n = -1; /* 10000000 00000001 assuming 16-bit integers*/
unsigned u1 = (unsigned)n; /* 11111111 11111111
effectively 2's complement, UINT_MAX */
unsigned u2 = *(unsigned *)&n; /* 10000000 00000001
only reinterpreted, the value is now INT_MAX + 2u */
Conversion to an unsigned type means adding/subtracting one more than the maximum value of that type until the value is in range. Dereferencing a converted pointer simply reinterprets the bit pattern. In other words, the conversion in the initialization of u1 is a no-op on 2's complement machines, but requires some calculations on other machines.
If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
Not exactly. The bit pattern must represent a valid value in the new type, it doesn't matter if the old value is representable. From C11 (n1570) [omitted footnotes]:
6.2.6.2 Integer types
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2N-1, so that objects of that type shall be capable of representing values from 0 to 2N-1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M≤N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value -2M (two's complement);
the sign bit has the value -2M-1 (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
E.g., an unsigned int could have value bits, where the corresponding signed type (int) has a padding bit, something like unsigned u = ...; int n = *(int *)&u; may result in a trap representation on such a system (reading of which is undefined behaviour), but not the other way round.
If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?
I think, the standard would allow for one of the types to have a padding bit, which is always ignored (thus, two different bit patterns can represent the same value and that bit may be set on initialization) but be an always-trap-if-set bit for the other type. This leeway, however, is limited at least by ibid. p5:
The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.
On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
Sure they can, if you don't use them ;) But no, the following is invalid on such platforms:
int n = 42;
long l = *(long *)&n; // UB
Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.
I'm not sure about C++, but at least for C, char16_t is a typedef, but not necessarily for uint_least16_t, it could very well be a typedef of some implementation-specific __char16_t, some type incompatible with uint_least16_t (or any other type).

It is not defined that happens since the c standard does not exactly define how singed integers should be stored. so you can not rely on the internal representation. Also there does no overflow occur. if you just typecast a pointer nothing other happens then another interpretation of the binary data in the following calculations.
Edit
Oh, i misread the phrase "but not equivalent integer types", but i keep the paragraph for your interest:
Your second question has much more trouble in it. Many machines can only read from correctly aligned addresses there the data has to lie on multiples of the types width. If you read a int32 from a non-by-4-divisable address (because you casted a 2-byte int pointer) your CPU may crash.
You should not rely on the sizes of types. If you chose another compiler or platform your long and int may not match anymore.
Conclusion:
Do not do this. You wrote highly platform dependent (compiler, target machine, architecture) code that hides its errors behind casts that suppress any warnings.

Concerning your questions regarding unsigned int* and int*: if the
value in the actual type doesn't fit in the type you're reading, the
behavior is undefined, simply because the standard neglects to define
any behavior in this case, and any time the standard fails to define
behavior, the behavior is undefined. In practice, you'll almost always
obtain a value (no signals or anything), but the value will vary
depending on the machine: a machine with signed magnitude or 1's
complement, for example, will result in different values (both ways)
from the usual 2's complement.
For the rest, int and long are different types, regardless of their
representations, and int* and long* cannot alias. Similarly, as you
say, in C++, char16_t is a distinct type in C++, but a typedef in
C (so the rules concerning aliasing are different).

Why int plus uint returns uint?

int plus unsigned int returns an unsigned int. Should it be so?
Consider this code:
#include <boost/static_assert.hpp>
#include <boost/typeof/typeof.hpp>
#include <boost/type_traits/is_same.hpp>
class test
{
static const int si = 0;
static const unsigned int ui = 0;
typedef BOOST_TYPEOF(si + ui) type;
BOOST_STATIC_ASSERT( ( boost::is_same<type, int>::value ) ); // fails
};
int main()
{
return 0;
}

If by "should it be" you mean "does my compiler behave according to the standard": yes.
C++2003: Clause 5, paragraph 9:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
blah
Otherwise, blah,
Otherise, blah, ...
Otherwise, if either operand is unsigned, the other shall be converted to unsigned.
If by "should it be" you mean "would the world be a better place if it didn't": I'm not competent to answer that.

Unsigned integer types mostly behave as members of a wrapping abstract algebraic ring of values which are equivalent mod 2^N; one might view an N-bit unsigned integer not as representing a particular integer, but rather the set of all integers with a particular value in the bottom N bits. For example, if one adds together two binary numbers whose last 4 digits are ...1001 and ...0101, the result will be ...1110. If one adds ...1111 and ...0001, the result will be ...0000; if one subtracts ...0001 from ...0000 the result will be ...1111. Note that concepts of overflow or underflow don't really mean anything, since the upper-bit values of the operands are unknown and the upper-bit values of the result are of no interest. Note also that adding a signed integer whose upper bits are known to one whose upper bits are "don't know/don't care" should yield a number whose upper bits are "don't know/don't care" (which is what unsigned integer types mostly behave as).
The only places where unsigned integer types fail to behave as members of a wrapping algebraic ring is when they participate in comparisons, are used in numerical division (which implies comparisons), or are promoted to other types. If the only way to convert an unsigned integer type to something larger was to use an operator or function for that purpose, the use of such an operator or function could make clear that it was making assumptions about the upper bits (e.g. turning "some number whose lower bits are ...00010110" into "the number whose lower bits are ...00010110 and whose upper bits are all zeroes). Unfortunately, C doesn't do that. Adding a signed value to an unsigned value of equal size yields a like-size unsigned value (which makes sense with the interpretation of unsigned values above), but adding a larger signed integer to an unsigned type will cause the compiler to silently assume that all upper bits of the latter are zeroes. This behavior can be especially vexing in cases where, depending upon a compilers' promotion rules, some compilers may deem two expressions as having the same size while others may view them as different sizes.

It is likely that the behavior stems from the logic behind pointer types (memory location, e.g. std::size_t) plus a memory location difference (std::ptrdiff_t) is also a memory location.
In other words, std::size_t = std::size_t + std::ptrdiff_t.
When this logic is translated to underlaying types this means, unsigned long = unsigned long + long, or unsigned = unsigned + int.
The "other" explanation from #supercat is also possibly correct.
What is clear is that unsigned integer were not designed or should not be interpreted to be mathematical positive numbers, no even in principle. See https://www.youtube.com/watch?v=wvtFGa6XJDU

When does casting change a value's bits in C++?

I have a C++ unsigned int which is actually storing a signed value. I want to cast this variable to a signed int, so that the unsigned and signed values have the same binary value.
unsigned int lUnsigned = 0x80000001;
int lSigned1 = (int)lUnsigned; // Does lSigned == 0x80000001?
int lSigned2 = static_cast<int>(lUnsigned); // Does lSigned == 0x80000001?
int lSigned3 = reinterpret_cast<int>(lUnsigned); // Compiler didn't like this
When do casts change the bits of a variable in C++? For example, I know that casting from an int to a float will change the bits because int is twos-complement and float is floating-point. But what about other scenarios? I am not clear on the rules for this in C++.
In section 6.3.1.3 of the C99 spec it says that casting from an unsigned to a signed integer is compiler-defined!

A type conversion can
keep the conceptual value (the bitpattern may have to be changed), or
keep the bitpattern (the conceptual value may have to be changed).
The only C++ cast that guaranteed always keeps the bitpattern is const_cast.
A reinterpret_cast is, as its name suggests, intended to keep the bitpattern and simply reinterpret it. But the standard allows an implementation very much leeway in how to implement reinterpret_cast. In some case a reinterpret_cast may change the bitpattern.
A dynamic_cast generally changes both bitpattern and value, since it generally delves into an object and returns a pointer/reference to a sub-object of requested type.
A static_cast may change the bitpattern both for integers and pointers, but, nearly all extant computers use a representation of signed integers (called two's complement) where static_cast will not change the bitpattern. Regarding pointers, suffice it to say that, for example, when a base class is non-polymorphic and a derived class is polymorphic, using static_cast to go from pointer to derived to pointer to base, or vice versa, may change the bitpattern (as you can see when comparing the void* pointers). Now, integers...
With n value bits, an unsigned integer type has 2^n values, in the range 0 through 2^n-1 (inclusive).
The C++ standard guarantees that any result of the type is wrapped into that range by adding or subtracting a suitable multiple of 2^n.
Actually that's how the C standard describes it; the C++ standard just says that operations are modulo 2^n, which means the same.
With two's complement form a signed value -x has the same bitpattern as the unsigned value -x+2^n. That is, the same bitpattern as the C++ standard guarantees that you get by converting -x to unsigned type of the same size. That's the simple basics of two's complement form, that it is precisely the guarantee that you're seeking. :-)
And nearly all extant computers use two's complement form.
Hence, in practice you're guaranteed an unchanged bitpattern for your examples.

If you cast from a smaller signed integral type to a larger signed integral type, copies of the original most significant bit (1 in the case of a negative number) will be prepended as necessary to preserve the integer's value.
If you cast an object pointer to a pointer of one of its superclasses, the bits can change, especially if there is multiple inheritance or virtual superclasses.
You're kind of asking for the difference between static_cast and reinterpret_cast.

If your implementation uses 2's complement for signed integer types, then casting from signed to unsigned integer types of the same width doesn't change the bit pattern.
Casting from unsigned to signed could in theory do all sorts of things when the value is out of range of the signed type, because it's implementation-defined. But the obvious thing for a 2's complement implementation to do is to use the same bit pattern.
If your implementation doesn't use 2's complement, then casting between signed and unsigned values will change the bit pattern, when the signed value involved is negative. Such implementations are rare, though (I don't specifically know of any use of non-2's complement in C++ compilers).

Using a C-style cast, or a static_cast, to cast an unsigned int to a signed int may still allow the compiler to assign the former to the latter directly as if a cast were not performed, and thus may change the bits if the unsigned int value is larger than what the signed int can hold. A reinterpret_cast should work though, or you can type-cast using a pointer instead:
unsigned int lUnsigned = 0x80000001;
int lSigned1 = *((int*)&lUnsigned);
int lSigned2 = *(reinterpret_cast<int*>(&lUnsigned));

unsigned int is always the same size as int. And every computer on the planet uses 2's complement these days. So none of your casts will change the bit representation.

You're looking for int lSigned = reinterpret_cast<int&>(lUnsigned);
You don't want to reinterpret the value of lUnsigned, you want to reinterpret the object lUnsigned. Hence, the cast to a reference type.

Casting is just a way to override the type-checker, it shouldn't actually modify the bits themselves.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js