C++ Converting unsigned to signed integer portability

I know that in C the conversion of unsigned to signed integers is implementation defined, but what is it for C++? I figured someone would have asked this already, and I searched but I couldn't find it.
I have a function that operates on an unsigned integer and returns a related unsigned integer. I am passing that function a signed integer by casting to unsigned similar to int num = -6; unsigned ret = func((unsigned)num); int ret_as_signed = (int)ret;. In Visual Studio that works fine, but I wonder how portable it is.
Is there a portable way to convert unsigned integers to signed integers? Is it possible to just reverse how signed integers are converted to unsigned via wraparound? Thanks

Since C++20 finally got rid of ones' complement and sign-magnitude integers, conversion between signed and unsigned integers is well-defined and reversible. All standard integer types are now 2's complement and conversion between signed and unsigned does not alter any bits in the representation.
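For example, this round trip is now guaranteed (a minimal sketch; the intermediate value assumes 32-bit int and unsigned):

int num = -6;
unsigned u = (unsigned)num; // well-defined: num modulo 2^32, i.e. 4294967290
int back = (int)u;          // guaranteed to be -6 again since C++20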
For versions of C++ prior to C++20, the original answer still applies. I'm leaving it as a historical remnant.
Conversion of an unsigned integer to a signed integer where the unsigned value is outside of the range of the signed type is implementation-defined. You cannot count on being able to round-trip a negative integer to unsigned and then back to signed. [1]
C++ standard, [conv.integral], § 4.7/3:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
[1] It seems likely that it will work, but there are no guarantees.

For the portable version of the inverse of signed->unsigned conversion, how about:
if (ret <= INT_MAX)
    ret_as_signed = ret;
else
    ret_as_signed = -(int)(UINT_MAX - ret) - 1;
You could probably generalize this using the templates in <limits>.
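A sketch of such a generalization (the helper name and template shape are mine, not from the answer), using std::numeric_limits together with std::make_signed:

#include <limits>
#include <type_traits>

// Hypothetical helper: invert the standard signed->unsigned conversion.
// U must be an unsigned integer type; S defaults to its signed counterpart.
template <typename U, typename S = typename std::make_signed<U>::type>
S to_signed(U u) {
    if (u <= static_cast<U>(std::numeric_limits<S>::max()))
        return static_cast<S>(u);                                   // value fits as-is
    return -static_cast<S>(std::numeric_limits<U>::max() - u) - 1;  // undo the wraparound
}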

Related

c++ safeness of code with implicit conversion between signed and unsigned

According to the rules on implicit conversions between signed and unsigned integer types, discussed here and here, when summing an unsigned int with an int, the signed int is first converted to an unsigned int.
Consider, e.g., the following minimal program
#include <iostream>
int main()
{
    unsigned int n = 2;
    int x = -1;
    std::cout << n + x << std::endl;
    return 0;
}
The output of the program is, nevertheless, 1 as expected: x is first converted to an unsigned int, and the sum with n wraps around modulo 2^N, giving the "right" answer.
In a code like the previous one, if I know for sure that n + x is positive, can I assume that the sum of unsigned int n and int x gives the expected value?
In a code like the previous one, if I know for sure that n + x is positive, can I assume that the sum of unsigned int n and int x gives the expected value?
Yes.
First, the signed value is converted to unsigned using modular arithmetic:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
Then the two unsigned values are added, again using modular arithmetic:
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
This means that you'll get the expected answer.
Even if the result would be negative in the mathematical sense, the result in C++ is the number congruent to it modulo 2^n.
Note that I've assumed here that you add two same-sized integers.
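A minimal sketch of those two steps with concrete values (assuming 32-bit unsigned int):

unsigned int n = 2;
int x = -1;
unsigned int ux = (unsigned int)x; // step 1: -1 becomes 2^32 - 1 = 4294967295
unsigned int sum = n + ux;         // step 2: (2 + 4294967295) mod 2^32 = 1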
I think you can be sure and it is not implementation-defined, although this statement requires some interpretation of the standard when it comes to systems that do not use two's complement for representing negative values.
First, let's state the things that are clear: unsigned integral types do not overflow but wrap around modulo 2^n, where n is the number of bits (cf. this online C++ standard draft):
6.7.1 Fundamental types
(7) Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
So it's just a matter of whether a negative value nv is converted correctly into an unsigned integral bit pattern nv(conv) such that x + nv(conv) is always the same as x - nv. For a system using two's complement, things are clear, since two's complement is designed precisely so that this arithmetic works immediately.
For systems using other representations of negative values, we'll have to read the standard carefully:
7.8 Integral conversions
(2) If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note]
As the note explicitly says that in a two's complement representation there is no change in the bit pattern, we may assume that on systems other than two's complement a real conversion takes place, such that x + nv(conv) == x - nv.
So due to 7.8 (2), I'd say that your assumption is valid.

Is Sign Extension in C++ a compiler option, or compiler dependent or target dependent?

The following code has been compiled on 3 different compilers and 3 different processors and gave 2 different results:
typedef unsigned long int u32;
typedef signed long long s64;

int main()
{
    u32 Operand1, Operand2;
    s64 Result;
    Operand1 = 95;
    Operand2 = 100;
    Result = (s64)(Operand1 - Operand2);
}
Depending on the platform, Result comes out as one of two values:
either
-5 or 4294967291
I do understand that the operation (Operand1-Operand2) is done as a 32-bit unsigned calculation; when cast to s64, sign extension was done correctly in the first case but not in the second.
My question is whether sign extension can be controlled via compiler options, or whether it is compiler-dependent or target-dependent.
Your issue is that you assume unsigned long int to be 32 bits wide and signed long long to be 64 bits wide. This assumption is wrong.
We can visualize what's going on by using types that have a guaranteed (by the standard) bit width:
#include <cstdint>
#include <iostream>

int main() {
    {
        uint32_t large = 100, small = 95;
        int64_t result = (small - large);
        std::cout << "32 and 64 bits: " << result << std::endl;
    } // 4294967291
    {
        uint32_t large = 100, small = 95;
        int32_t result = (small - large);
        std::cout << "32 and 32 bits: " << result << std::endl;
    } // -5
    {
        uint64_t large = 100, small = 95;
        int64_t result = (small - large);
        std::cout << "64 and 64 bits: " << result << std::endl;
    } // -5
    return 0;
}
In each of these three cases, the expression small - large produces a result of unsigned integer type (of the corresponding width). This result is calculated using modular arithmetic.
In the first case, because that unsigned result can be stored in the wider signed integer, no conversion of the value is performed.
In the other cases the result cannot be stored in the signed integer. Thus an implementation-defined conversion is performed, which usually means interpreting the bit pattern of the unsigned value as a signed value. Because the result is "large", the highest bits will be set, which when treated as a signed value (under two's complement) is equivalent to a "small" negative value.
To highlight the comment from Lưu Vĩnh Phúc:
Operand1-Operand2 is unsigned therefore when casting to s64 it's always zero extension. [..]
A widening conversion happens only in the first case, and because the source type is unsigned it is indeed zero extension, never sign extension.
Quotes from the standard, emphasis mine. Regarding small - large:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [..]
§ 4.7/2
Regarding the conversion from unsigned to signed:
If the destination type [of the integral conversion] is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.
§ 4.7/3
Sign extension is platform dependent, where platform is a combination of a compiler, target hardware architecture and operating system.
Moreover, as Paul R mentioned, the width of built-in types (like unsigned long) is platform-dependent too. Use types from <cstdint> to get fixed-width types. Nevertheless, they are just platform-dependent typedefs, so their sign extension behavior still depends on the platform.
Here is a good almost-duplicate question about type sizes. And here is a good table about type size relations.
Type promotions, and the corresponding sign-extensions, are specified by the C++ language.
What's not specified, but is platform-dependent, is the range of integer types provided. It's even Standard-compliant for char, short int, int, long int and long long int all to have the same range, provided that range satisfies the C++ Standard requirements for long long int. On such a platform, no widening or narrowing would ever happen, but signed<->unsigned conversion could still alter values.

Is static_cast<T>(-1) the right way to generate all-one-bits data without numeric_limits?

I'm writing C++ code in an environment in which I don't have access to the C++ standard library, specifically not to std::numeric_limits. Suppose I want to implement
template <typename T> constexpr T all_ones( /* ... */ )
Focusing on unsigned integral types, what do I put there? Specifically, is static_cast<T>(-1) good enough? (Other types I could treat as an array of unsigned chars based on their size I guess.)
Use the bitwise NOT operator ~ on 0.
T allOnes = ~(T)0;
A static_cast<T>(-1) assumes two's complement, which is not portable. If you are only concerned about unsigned types, hvd's answer is the way to go.
Working example: https://ideone.com/iV28u0
Focusing on unsigned integral types, what do I put there? Specifically, is static_cast<T>(-1) good enough?
If you're only concerned about unsigned types, yes, converting -1 is correct for all standard C++ implementations. Operations on unsigned types, including conversions of signed types to unsigned types, are guaranteed to work modulo (max+1).
This disarmingly direct way:
T allOnes;
memset(&allOnes, ~0, sizeof(T)); // requires <cstring>
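Note that memset fills each byte individually: its int argument ~0 (that is, -1) is converted to unsigned char, which is all one bits, so every byte of allOnes ends up filled with ones.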
Focusing on unsigned integral types, what do I put there? Specifically, is static_cast<T>(-1) good enough?
Yes, it is good enough.
But I prefer a hex value because my background is embedded systems, and I have always had to know the sizeof(T).
Even in desktop systems, we know the sizes of the following T:
uint8_t allones8 = 0xff;
uint16_t allones16 = 0xffff;
uint32_t allones32 = 0xffffffff;
uint64_t allones64 = 0xffffffffffffffff;
Another way is
static_cast<T>(-1ull)
which would be more correct and works with any signed integer representation, regardless of one's complement, two's complement or sign-magnitude. You can also use static_cast<T>(-UINTMAX_C(1)).
Because unary minus of an unsigned value is defined as:
"The negative of an unsigned quantity is computed by subtracting its value from 2^n, where n is the number of bits in the promoted operand."
Therefore -1u always yields an all-one-bits value in unsigned int. The ll suffix makes it work for any type no wider than unsigned long long. There are no extended integer types (yet) in C++, so this should be fine.
However a solution that expresses the intention clearer would be
static_cast<T>(~0ull)
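A quick sanity check of the suggestions above (a sketch, assuming the usual fixed-width types from <cstdint>):

#include <cstdint>
#include <iostream>

int main() {
    std::cout << std::hex
              << static_cast<uint16_t>(-1)    << '\n'  // ffff
              << static_cast<uint64_t>(-1ull) << '\n'  // ffffffffffffffff
              << static_cast<uint32_t>(~0ull) << '\n'; // ffffffff
}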

Unsigned and Signed int and printf

I understand that I am assigning a signed int a value that is larger than what it can handle. Also, I should be using %d for signed and %u for unsigned. Similarly, I should not be assigning a negative value to an unsigned. But if I make such assignments and use printf as below, I get the results shown below.
My understanding is that in each case the number is converted to its two's complement binary representation, which is the same for -1 and 4294967295. That is why %u for the signed int prints 4294967295, ignoring the negative leftmost bit. When %d is used for the signed int, it uses the leftmost bit as a negative flag and prints -1. Similarly, %u for the unsigned prints the unsigned value, but %d causes it to treat the number as signed and thus prints -1. Is that correct?
signed int si = 4294967295;
unsigned int ui = 4294967295;
printf("si = u=%u d=%d\n", si, si);
printf("ui = u=%u d=%d\n", ui, ui);
Output:
si = u=4294967295 d=-1
ui = u=4294967295 d=-1
signed int si = -1;
unsigned int ui = -1;
printf("si = u=%u d=%d\n", si, si);
printf("ui = u=%u d=%d\n", ui, ui);
Output:
si = u=4294967295 d=-1
ui = u=4294967295 d=-1
That is why %u for the signed int prints 4294967295, ignoring the negative leftmost bit. When %d is used for the signed int, it uses the leftmost bit as a negative flag and prints -1.
In the case of unsigned, the "leftmost" or most significant bit is not ignored, and is not negative; rather it has a place value of 2^31.
In the negative case, the sign bit is not a flag; instead it is a bit with a place value of -2^31.
In both cases the value of the integer is equal to the sum of the place values of all the binary digits (bits) set to 1.
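For example, with 32 bits and all bits set, the unsigned reading sums to 2^31 + 2^30 + ... + 2^0 = 2^32 - 1 = 4294967295, while the two's complement reading sums to -2^31 + (2^31 - 1) = -1: the same bits, two interpretations.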
The encoding of signed values in this way is known as two's complement. It is not the only possible encoding; what you described is known as sign and magnitude for example, and one's complement is another possibility. However, these alternative encodings are seldom encountered in practice, not least because two's complement is how arithmetic works on modern hardware in all but perhaps the most arcane architectures.
There are a few things going on here. Let's start by saying that using an incorrect format specifier with printf is undefined behavior, which means the results of your program are unpredictable; what actually happens will depend on many factors, including your compiler, architecture, optimization level, etc.
For signed/unsigned conversions, the behavior is defined by the respective standards: both C and C++ make it implementation-defined to convert a value that is larger than can be stored in a signed integer type. From the C++ draft standard:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
For example, gcc chooses to use the same modulo convention as for unsigned types:
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
When you assign -1 to an unsigned type, in both C and C++ the result will always be the maximum unsigned value of that type. From the draft C++ standard:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note]
The wording from C99 is easier to digest:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
So we have the following:
-1 + (UNSIGNED_MAX + 1)
the result of which is UNSIGNED_MAX
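With 32-bit unsigned int, for instance: -1 + (4294967295 + 1) = 4294967295.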
As for printf and incorrect format specifiers, we can see from the draft C99 standard, section 7.19.6.1 (The fprintf function):
If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
fprintf covers printf with respect to format specifiers, and C++ falls back on C with respect to printf.
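For completeness, a sketch of the well-defined variant, using only matching specifiers:

#include <cstdio>

int main() {
    unsigned int ui = -1;          // well-defined conversion: ui == UINT_MAX
    std::printf("ui = %u\n", ui);  // matching specifier, no undefined behavior
    std::printf("si = %d\n", -1);
}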

The "unsigned" keyword [duplicate]

This question already has answers here:
Difference between unsigned and unsigned int in C
(5 answers)
Closed 9 years ago.
I saw in some C++ code the keyword "unsigned" in the following form:
const int HASH_MASK = unsigned(-1) >> 1;
and later:
unsigned hash = HASH_SEED;
(it is taken from the CS106B/X reader - of Stanford - by Eric S. Roberts - on the topic of "implementation of the hash code function for strings").
Can someone please tell me what that keyword means and when to use it?
Thanks!
Take a look: https://stackoverflow.com/a/7176690/1758762
unsigned is a modifier which can apply to any integral type (char, short, int, long, etc.) but on its own it is identical to unsigned int.
It's a short version of unsigned int. Syntactically, you can use it anywhere you would use any other datatype like float or short.
Unsigned types are types that can't represent negative numbers; only zero and positive numbers. In C++, they use modular arithmetic; the modulus for an N-bit type is 2^N. It's a good idea to use unsigned rather than signed types when messing around with bit patterns (for example, when calculating hash codes), since C++ allows several different representations of negative numbers which could lead to portability issues.
unsigned can be used as a qualifier for any integer type (e.g. unsigned int or unsigned long long); or on its own as shorthand for unsigned int.
So the first converts -1 into unsigned int. Due to modular arithmetic, this gives the largest representable value. This could also be written (more clearly, in my opinion) as std::numeric_limits<unsigned>::max().
The second declares and initialises a variable of type unsigned int.
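To see the effect concretely (a sketch assuming 32-bit int and unsigned int):

const int HASH_MASK = unsigned(-1) >> 1; // 0xFFFFFFFFu >> 1 == 0x7FFFFFFF
// HASH_MASK is INT_MAX; used as a mask it clears the top (sign) bit.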
Values are signed by default, which means they can be positive or negative. The unsigned keyword is used to specify that a value cannot be negative.
Signed variables use one bit to indicate the sign of the value. The unsigned keyword actually makes this bit part of the value (thus allowing bigger numbers to be stored).
Lastly, unsigned hash is interpreted by compilers as unsigned int hash (int being the default type in C programming).
To get a good idea what unsigned means, one has to understand signed and unsigned integers. For a full explanation of two's complement, search Wikipedia, but in a nutshell, a computer stores negative numbers by subtracting their magnitude from 2^32 (for a 32-bit integer). In this way, -1 is stored as 2^32 - 1. This does mean that you only have 2^31 positive numbers, but that is by the by. This is known as a signed integer (as it can have a positive or negative sign).
Unsigned tells the compiler that you don't want two's complement and are dealing only in non-negative numbers. When -1 is typecast (as it is in the code) to an unsigned int it becomes
2^32-1 = 0b111111111...
Thus that is an easy way of getting a whole lot of 1s in binary.
Use unsigned rarely: when you need to do bit operations, or when for some reason you need positive integers bigger than 2^31. Otherwise, if you leave it out, C++ assumes signed integers.
C allows chars to be signed or unsigned, depending on which is more efficient for the host computer. If you want to be sure your char is unsigned, you can declare your variable as unsigned char; you can use signed char if you want to ensure a signed interpretation.
Incidentally, C and C++ compilers treat char, signed char, and unsigned char as three distinct types, even though char has the same representation as one of the other two.
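A short illustration of that three-distinct-types rule, using standard <type_traits> (valid C++11):

#include <type_traits>

// char is a distinct type from both signed char and unsigned char,
// even though it shares a representation with one of them.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");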